How to copy millions of files from a dedicated server to AWS EC2? [closed]
I have a website that needs to move from a dedicated server to an AWS EC2 instance. It has more than 650 GB of data in over 3 million files.
I tried SCP like this, but with this volume of files it is taking a very long time:
scp -r remote_username@10.10.0.2:/remote/directory /local/directory
The source server runs CentOS 7.5 with cPanel and has a 1 TB HDD holding the 650 GB of data; the destination server runs Ubuntu 18.04 and has a 700 GB HDD.
I know there are other options such as LFTP, SFTP, and rsync. Please help me find the quickest method.
ssh sftp scp amazon-ec2 lftp
asked Feb 27 at 15:21 by Mi2, edited Feb 27 at 16:37
closed as primarily opinion-based by bertieb, music2myear, karel, Moab, Seth Mar 25 at 11:09
Many good questions generate some degree of opinion based on expert experience, but answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise. If this question can be reworded to fit the rules in the help center, please edit the question.
Please Edit the question (to the bottom left of the question text) to indicate the OS of the source machine, and any other specifications like confirming exact copy.
– Christopher Hostage
Feb 27 at 16:31
If you're willing to spend money, there are commercial file-transfer solutions which are much faster than scp, rsync, or sftp.
– Kenster
Feb 27 at 20:55
@Kenster Thank you, but I have already started using SCP and the transfer is almost 50% complete, so in this situation I don't want to spend money on transferring files.
– Mi2
Mar 1 at 15:58
5 Answers
I would suggest zipping the files into chunks of, say, 1 GB and uploading those.
When unzipping, each file is checked against its CRC checksum. You can use zip's built-in splitting so that it automatically generates segments named .z01, .z02, .z03, ... plus a final .zip.
Alternatively, you can use the RAR format, which allows creation of parity data to repair damaged segments.
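A minimal sketch of the chunked-archive idea, using `tar` piped into `split` with a checksum per chunk instead of zip or rar (a swapped-in technique, not what the answer itself names). The paths, the `site.tar.part-` prefix and the `CHUNK_SIZE` variable are hypothetical; the demo runs on throwaway data with 1 MB chunks so that it finishes quickly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo paths; replace with the real site directory and a staging area.
WORK=$(mktemp -d)
SRC="$WORK/site"
OUT="$WORK/chunks"
RESTORE="$WORK/restore"
CHUNK_SIZE="${CHUNK_SIZE:-1M}"   # assumption: set to 1G for a real transfer

mkdir -p "$SRC/images" "$OUT" "$RESTORE"
# Throwaway sample data standing in for the website files.
head -c 3000000 /dev/urandom > "$SRC/images/sample.bin"
echo "hello" > "$SRC/index.html"

# 1. Stream the tree into fixed-size chunks: site.tar.part-aa, -ab, ...
tar -C "$SRC" -cf - . | split -b "$CHUNK_SIZE" - "$OUT/site.tar.part-"

# 2. Record a checksum per chunk so each piece can be verified after upload.
( cd "$OUT" && sha256sum site.tar.part-* > SHA256SUMS )

# 3. On the destination: verify the chunks, then reassemble and unpack.
( cd "$OUT" && sha256sum -c --quiet SHA256SUMS )
cat "$OUT"/site.tar.part-* | tar -C "$RESTORE" -xf -

CHUNK_COUNT=$(ls "$OUT"/site.tar.part-* | wc -l)
echo "chunks: $CHUNK_COUNT"
```

Note that the chunks need staging space on the source disk in addition to the original files, which may matter with 650 GB on a 1 TB drive.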
answered Feb 27 at 16:22 by cybernard
AWS has a service for transferring your data:
https://aws.amazon.com/snowball/?nc1=h_ls
As far as I know, you receive a device by courier (for example DHL).
You copy your data onto this device, and Amazon then uploads the data for you.
answered Feb 27 at 15:38 by Dmytro
I don't understand why I would need the device; I can copy all the files over the network. I know I can do this using SCP, lftp, rsync, or sftp, but I want to know which one is fastest and carries no risk of missing data. If possible, I would also like some help with the SSH command.
– Mi2
Feb 27 at 15:48
@user219457 please Edit the original question with your specifications. You've found some of the right tools, and figuring out how to use those is important.
– Christopher Hostage
Feb 27 at 16:28
The only way to speed up the upload is to do it in multiple parts in parallel.
If you can divide the job among several computers using distinct connections, this will speed up the upload.
If a single computer does not reach full throughput, you can opt for a multi-threaded method where each thread opens its own connection in parallel.
See the post "Which is the fastest way to copy 400G of files from an ec2 elastic block store volume to s3?" for suggestions of products and scripts.
See also the article "FS File Sync – Faster File Transfer To Amazon EFS File Systems".
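The parallel approach can be sketched on a single machine with `find` and `xargs -P`, which starts one copy process per top-level entry of the source directory. This is only an illustration under stated assumptions: `parallel_copy` and the `COPY_CMD` variable are hypothetical names, the default copy tool is assumed to be `rsync -a`, and the demo substitutes `cp -a` between local throwaway directories so it runs without a remote host.

```shell
#!/usr/bin/env bash
set -euo pipefail

# parallel_copy SRC DEST [JOBS]
# Starts up to JOBS copy processes, one per top-level entry of SRC.
# COPY_CMD is a hypothetical knob: it defaults to rsync and is swapped
# for cp in the local demo below.
parallel_copy() {
    local src=$1 dest=$2 jobs=${3:-4}
    find "$src" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -P "$jobs" -I{} ${COPY_CMD:-rsync -a} {} "$dest"
}

# Local demo on throwaway data.
WORK=$(mktemp -d)
mkdir -p "$WORK/src/a" "$WORK/src/b" "$WORK/src/c" "$WORK/dest"
echo one > "$WORK/src/a/1.txt"
echo two > "$WORK/src/b/2.txt"
echo three > "$WORK/src/c/3.txt"

COPY_CMD="cp -a" parallel_copy "$WORK/src" "$WORK/dest/" 3
COPIED=$(find "$WORK/dest" -type f | wc -l)
echo "copied files: $COPIED"
```

For a real transfer the destination would be a remote rsync target (for example a `user@host:/path/` string, also an assumption here). Splitting by top-level entry balances poorly when one directory holds most of the files, so the split point may need to go one level deeper.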
answered Feb 27 at 16:55 by harrymc
scp does not retry or resume partially transferred files.
Try using rsync instead, e.g.:
rsync -vuaz remote_username@10.10.0.2:/remote/directory/ /local/directory/
Arguments:
-v/--verbose: increase verbosity.
-u/--update: skip files that are newer on the receiver.
-a/--archive: archive mode; equals -rlptgoD.
-z/--compress: compress file data during the transfer.
answered Feb 27 at 17:02 by kenorb
Most of my files are images smaller than 2 MB. Do you think rsync will be faster than SCP? I have already started copying an 80 GB directory, so if I close PuTTY now and switch to rsync, do you think I will have problems with the files already downloaded?
– Mi2
Feb 27 at 17:18
With scp, once you get a transfer error, you have to transfer everything over again, as you don't know which files were copied fully and which were not. rsync builds the list of all files that need to be updated before copying anything. You can use rsync after you have used scp, so it can continue from the point where scp finished. I'm not sure it's faster; the speed could be the same. You can leave PuTTY running (to avoid unnecessary changes) and, once you hit any transfer issue, continue with rsync.
– kenorb
Feb 27 at 17:30
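One way to address the worry about not knowing which files were copied fully is to build a checksum manifest on the source and check it on the destination. The sketch below is an illustration, not part of the answer: `make_manifest` and `verify_manifest` are hypothetical helper names, and the demo uses two local throwaway directories in place of the two servers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# make_manifest DIR FILE: write sorted SHA-256 checksums of every file under DIR.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# verify_manifest DIR FILE: succeed only if every listed file exists in DIR
# with the same checksum. FILE must be an absolute path.
verify_manifest() {
    ( cd "$1" && sha256sum -c --quiet "$2" )
}

# Local demo: "src" stands in for the old server, "dst" for the EC2 copy.
WORK=$(mktemp -d)
mkdir -p "$WORK/src/img" "$WORK/dst"
echo "photo" > "$WORK/src/img/a.jpg"
echo "page" > "$WORK/src/index.html"
cp -a "$WORK/src/." "$WORK/dst/"

make_manifest "$WORK/src" "$WORK/manifest.sha256"
if verify_manifest "$WORK/dst" "$WORK/manifest.sha256"; then
    RESULT_OK=pass
else
    RESULT_OK=fail
fi

# Simulate a file that was only partially transferred.
echo "pho" > "$WORK/dst/img/a.jpg"
if verify_manifest "$WORK/dst" "$WORK/manifest.sha256" >/dev/null 2>&1; then
    RESULT_BAD=pass
else
    RESULT_BAD=fail
fi
echo "intact copy: $RESULT_OK, truncated copy: $RESULT_BAD"
```

Hashing 650 GB takes a while on both ends. rsync's own `--checksum` option is an alternative that compares file contents during a later pass.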
Try installing the AWS CLI on your dedicated server.
Then use the aws s3 command to transfer the files to an S3 bucket first, e.g.:
aws s3 sync local/directory s3://mybucket/local/directory
Then, on the EC2 instance, transfer them from the bucket to local storage:
aws s3 sync s3://mybucket/local/directory local/directory
The command is designed to copy large numbers of files, and it can continue after a failure.
You can also decide to serve the files to the EC2 instance directly from S3.
Check aws s3 sync help for details.
answered Feb 27 at 17:08 by kenorb, edited Feb 27 at 17:33