Incremental backups with tar where current file has most recent and previous files only have different...











I am somewhat familiar with using tar's --listed-incremental flag to take incremental backups. The end result is a backup-0 file that holds the first full backup, followed by backup-1, backup-2, ..., backup-x holding the changes in the order the backups were taken.



In the past I have used rsync and hard links to make backups where backup-0 is the current state and each backup-x folder holds the files that were specific to that backup. This is basically what is outlined at http://www.mikerubel.org/computers/rsync_snapshots/ and http://www.admin-magazine.com/Articles/Using-rsync-for-Backups/(offset).



I want to mimic that functionality with tar. I cannot use hard links because the tar files will ultimately be uploaded to a cloud provider that doesn't preserve or understand links. I also want to tar the backups so that I can encrypt them before they are uploaded to the cloud.



So the idea is to have a growing list of files like so:





  • backup-0.tar.bz2 - this is the current backup and will be the biggest because it is a full backup


  • backup-1.tar.bz2 - this is yesterday's backup but it will only have the files that are different from what is in current (backup-0.tar.bz2)


  • backup-2.tar.bz2 - this is the backup from two days ago but it will only have the files that are different from yesterday (backup-1.tar.bz2)


  • backup-3.tar.bz2 - ...


  • backup-4.tar.bz2 - ...


  • backup-5.tar.bz2 - ...


If that doesn't make sense hopefully this will.



First time:




  1. $ touch /tmp/file1

  2. $ touch /tmp/file2

  3. make backup-0.tar.bz2


At this point backup-0.tar.bz2 has /tmp/file1 and /tmp/file2.



Second time:




  1. $ touch /tmp/file3

  2. $ rm /tmp/file2

  3. ..do the magic


At this point:





  • backup-0.tar.bz2 has /tmp/file1 and /tmp/file3


  • backup-1.tar.bz2 has /tmp/file2; it doesn't have file1 because it didn't change, so it's still in backup-0.tar.bz2


Third time:




  1. $ touch /tmp/file1

  2. $ touch /tmp/file4

  3. ..do the magic


At this point:





  • backup-0.tar.bz2 has /tmp/file1, /tmp/file3, and /tmp/file4


  • backup-1.tar.bz2 has the previous version of /tmp/file1 because it was changed


  • backup-2.tar.bz2 has /tmp/file2


Like so:



|       | first time | second time | third time              |
|-------|------------|-------------|-------------------------|
| file1 | backup-0 | backup-0 | backup-0 and backup-1 |
| file2 | backup-0 | backup-1 | backup-2 |
| file3 | | backup-0 | backup-0 |
| file4 | | | backup-0 |


I figured this is one way to approach it but it seems horribly inefficient to me. Maybe there are features/flags I can use that would make this more efficient.




  1. first time = take backup-0

  2. second time


    1. rename backup-0 to backup-1

    2. take backup-0

    3. remove everything from backup-1 that matches backup-0



  3. third time


    1. rename backup-1 to backup-2

    2. rename backup-0 to backup-1

    3. take backup-0

    4. remove everything from backup-1 that matches backup-0



  4. fourth time


    1. rename backup-2 to backup-3

    2. rename backup-1 to backup-2

    3. rename backup-0 to backup-1

    4. take backup-0

    5. remove everything from backup-1 that matches backup-0




I feel like it's that last step (remove everything from backup-1 that matches backup-0) that is inefficient.
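One hypothetical way to handle the "remove everything from backup-1 that matches backup-0" step is to avoid editing the archive at all: compare the previous state with the current data and archive only the files that are missing or different. The sketch below is an illustration under assumptions, not a built-in tar feature. The names work, old, cur and changed.list are made up, old stands in for the unpacked previous backup, and compression is left out to keep it short.

```shell
#!/bin/sh
# Sketch: build backup-1 from only those files of the previous state that
# are gone from, or different in, the current state. All names are hypothetical.
set -eu

work=$(mktemp -d)          # throwaway scratch directory
old="$work/old"            # stands in for the unpacked previous backup
cur="$work/cur"            # stands in for the current data
mkdir "$old" "$cur"

# Previous state: file1 and file2. Current state: file1 (unchanged) and file3.
echo one > "$old/file1"
echo two > "$old/file2"
echo one > "$cur/file1"
echo three > "$cur/file3"

# Collect every file of the old tree that is missing or different in cur.
# Note: file names containing newlines would break this simple loop.
list="$work/changed.list"
: > "$list"
(cd "$old" && find . -type f) | while IFS= read -r f; do
    if ! cmp -s "$old/$f" "$cur/$f" 2>/dev/null; then
        printf '%s\n' "$f" >> "$list"
    fi
done

# Archive only the collected files, with paths relative to the old tree.
tar -cf "$work/backup-1.tar" -C "$old" -T "$list"

# Show what ended up in the reduced archive.
tar -tf "$work/backup-1.tar"
```

A plain archive built this way does not record that file3 should be absent from the older state, so going back in time would need extra bookkeeping for files added later.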



My question is, how can I do this? If I use tar's --listed-incremental it'll do the reverse of what I am trying.










  • How to do this. If I use tar's --listed-incremental it'll do the reverse of what I am trying.
    – IMTheNachoMan
    Nov 17 at 5:52















linux backup tar






edited Nov 17 at 6:38

























asked Nov 17 at 4:19









IMTheNachoMan

1 Answer

If I use tar's --listed-incremental it'll do the reverse of what I am trying.




It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:




  1. Rename backup-N to backup-(N+1) looping from Nmax down to 0.

  2. Restore full backup (now backup-1) to a temporary directory.

  3. Create backup-0 from the current data with a new snapshot file.

  4. Remove backup-1 (previous full backup).

  5. Treat the temporary directory as a "new" version. Create backup-1 as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).
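The five steps above can be tried on scratch data with something like the following sketch. It assumes GNU tar, leaves out compression and encryption, and every path and variable name (work, data, store, prev, the .snar snapshot files) is hypothetical. GNU tar judges changes from the timestamps and directory metadata recorded in the snapshot file rather than by comparing file contents, so how small backup-1 turns out is worth checking with tar -tvf on your own system before relying on it.

```shell
#!/bin/sh
# Sketch of the reversed procedure on throwaway data. All names are
# hypothetical; GNU tar is assumed.
set -eu

work=$(mktemp -d)          # scratch area standing in for a backup host
data="$work/data"          # the live data being backed up
store="$work/store"        # where the backup-N.tar files are kept
mkdir "$data" "$store"

# State at the previous run: file1 and file2, saved as a full backup-0.
echo one > "$data/file1"
echo two > "$data/file2"
(cd "$data" && tar -cf "$store/backup-0.tar" -g "$store/old.snar" .)

# The data changes before the next run.
rm "$data/file2"
echo three > "$data/file3"

# Step 1: shift backup-N to backup-(N+1), highest N first.
n=0
while [ -e "$store/backup-$n.tar" ]; do n=$((n + 1)); done
while [ "$n" -gt 0 ]; do
    mv "$store/backup-$((n - 1)).tar" "$store/backup-$n.tar"
    n=$((n - 1))
done

# Step 2: restore the previous full backup (now backup-1) to a temp directory.
prev="$work/prev"
mkdir "$prev"
tar -xf "$store/backup-1.tar" -g /dev/null -C "$prev"

# Step 3: new full backup-0 of the current data with a fresh snapshot file.
rm -f "$store/backup-0.snar"
(cd "$data" && tar -cf "$store/backup-0.tar" -g "$store/backup-0.snar" .)

# Step 4: drop the previous full backup.
rm "$store/backup-1.tar"

# Step 5: dump the restored old state as an incremental on top of backup-0,
# working on a copy of the snapshot so the original stays untouched.
cp "$store/backup-0.snar" "$store/backup-1.snar"
(cd "$prev" && tar -cf "$store/backup-1.tar" -g "$store/backup-1.snar" .)

# Inspect what each archive holds.
tar -tvf "$store/backup-0.tar"
tar -tvf "$store/backup-1.tar"
```

Both tar invocations that create archives run from inside the directory being dumped and archive ".", which is one way to keep the relative paths identical as the note in step 5 requires.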


You may wonder if this will keep the old (kept) backup-N files coherent with the new ones. A reasonable doubt, since the manual says:




-g, --listed-incremental=FILE

Handle new GNU-format incremental backups. FILE is the name of a snapshot file, where tar stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. If FILE does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level 0 dump). To create incremental archives of non-zero level N, create a copy of the snapshot file created during the level N-1, and use it as FILE.




So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N files every time you perform a full backup. But then:




When listing or extracting, the actual contents of FILE is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use /dev/null in its place.




This means if you extract backup-N files in increasing sequence to get a state from some time ago, any backup-M file (M>0) only expects a valid M-1 state to exist. It doesn't matter if this state is obtained from a full or incremental backup, the point is these states should be identical anyway. So it shouldn't matter if you created the backup-M file based on a full backup (as you will do, every backup-M will start as backup-1 where backup-0 is a full backup) or based on a chain of incremental backups (as the manual suggests).
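To see the mechanism the manual describes, here is a minimal sketch of a conventional two-level chain followed by an in-order restore. It assumes GNU tar; the names work, src, out, dst and the level-N files are invented for the example, and compression is omitted.

```shell
#!/bin/sh
# Sketch: level-0 and level-1 dumps with a copied snapshot file, then an
# in-order restore using /dev/null as the snapshot placeholder.
# All names are hypothetical; GNU tar is assumed.
set -eu

work=$(mktemp -d)
src="$work/src"            # data to back up
out="$work/out"            # archives and snapshot files
dst="$work/dst"            # restore target
mkdir "$src" "$out" "$dst"

echo one > "$src/file1"
echo two > "$src/file2"

# Level 0: the snapshot file does not exist yet, so everything is dumped.
(cd "$src" && tar -cf "$out/level-0.tar" -g "$out/level-0.snar" .)

sleep 1                    # keep later timestamps clearly after the snapshot
rm "$src/file2"
echo three > "$src/file3"

# Level 1: work on a copy of the level-0 snapshot, as the manual advises.
cp "$out/level-0.snar" "$out/level-1.snar"
(cd "$src" && tar -cf "$out/level-1.tar" -g "$out/level-1.snar" .)

# Restore in increasing order; the snapshot argument is only a placeholder.
tar -xf "$out/level-0.tar" -g /dev/null -C "$dst"
tar -xf "$out/level-1.tar" -g /dev/null -C "$dst"

# List the restored state.
ls "$dst"
```

The second extraction only needs the level-0 state to be present in the target directory. It does not consult either snapshot file, which is the property the argument above relies on.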





I understand your point is to keep backup-0 as an up-to-date full backup and to be able to "go back in time" with backup-0, backup-1, backup-2, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1 and upload a full new backup-0 every time. If your data is huge then uploading a full backup every time will be a pain.



For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup a few times:




rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync.




Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.






share|improve this answer





















    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "3"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














     

    draft saved


    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsuperuser.com%2fquestions%2f1376154%2fincremental-backups-with-tar-where-current-file-has-most-recent-and-previous-fil%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    0
    down vote














    If I use tar's --listed-incremental it'll do the reverse of what I am trying.




    It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:




    1. Rename backup-N to backup-(N+1) looping from Nmax down to 0.

    2. Restore full backup (now backup-1) to a temporary directory.

    3. Create backup-0 from the current data with a new snapshot file.

    4. Remove backup-1 (previous full backup).

    5. Treat the temporary directory as a "new" version. Create backup-1 as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).


    You may wonder if this will keep the old (kept) backup-N files coherent with the new ones. A reasonable doubt, since the manual says:




    -g, --listed-incremental=FILE

    Handle new GNU-format incremental backups. FILE is the name of a snapshot file, where tar stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. If FILE does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level 0 dump). To create incremental archives of non-zero level N, create a copy of the snapshot file created during the level N-1, and use it as FILE.




    So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N files every time you perform a full backup. But then:




    When listing or extracting, the actual contents of FILE is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use /dev/null in its place.




    This means if you extract backup-N files in increasing sequence to get a state from some time ago, any backup-M file (M>0) only expects a valid M-1 state to exist. It doesn't matter if this state is obtained from a full or incremental backup, the point is these states should be identical anyway. So it shouldn't matter if you created the backup-M file based on a full backup (as you will do, every backup-M will start as backup-1 where backup-0 is a full backup) or based on a chain of incremental backups (as the manual suggests).





    I understand your point is to keep backup-0 as an up-to-date full backup and to be able to "go back in time" with backup-0, backup-1, backup-2, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1 and upload a full new backup-0 every time. If your data is huge then uploading a full backup every time will be a pain.



    For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup few times:




    rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync.




    Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.






    share|improve this answer

























      up vote
      0
      down vote














      If I use tar's --listed-incremental it'll do the reverse of what I am trying.




      It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:




      1. Rename backup-N to backup-(N+1) looping from Nmax down to 0.

      2. Restore full backup (now backup-1) to a temporary directory.

      3. Create backup-0 from the current data with a new snapshot file.

      4. Remove backup-1 (previous full backup).

      5. Treat the temporary directory as a "new" version. Create backup-1 as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).


      You may wonder if this will keep the old (kept) backup-N files coherent with the new ones. A reasonable doubt, since the manual says:




      -g, --listed-incremental=FILE

      Handle new GNU-format incremental backups. FILE is the name of a snapshot file, where tar stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. If FILE does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level 0 dump). To create incremental archives of non-zero level N, create a copy of the snapshot file created during the level N-1, and use it as FILE.




      So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N files every time you perform a full backup. But then:




      When listing or extracting, the actual contents of FILE is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use /dev/null in its place.




      This means if you extract backup-N files in increasing sequence to get a state from some time ago, any backup-M file (M>0) only expects a valid M-1 state to exist. It doesn't matter if this state is obtained from a full or incremental backup, the point is these states should be identical anyway. So it shouldn't matter if you created the backup-M file based on a full backup (as you will do, every backup-M will start as backup-1 where backup-0 is a full backup) or based on a chain of incremental backups (as the manual suggests).





      I understand your point is to keep backup-0 as an up-to-date full backup and to be able to "go back in time" with backup-0, backup-1, backup-2, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1 and upload a full new backup-0 every time. If your data is huge then uploading a full backup every time will be a pain.



      For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup few times:




      rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync.




      Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.






      share|improve this answer























        up vote
        0
        down vote










        up vote
        0
        down vote










        If I use tar's --listed-incremental it'll do the reverse of what I am trying.




        It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:




        1. Rename backup-N to backup-(N+1) looping from Nmax down to 0.

        2. Restore full backup (now backup-1) to a temporary directory.

        3. Create backup-0 from the current data with a new snapshot file.

        4. Remove backup-1 (previous full backup).

        5. Treat the temporary directory as a "new" version. Create backup-1 as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).


        You may wonder if this will keep the old (kept) backup-N files coherent with the new ones. A reasonable doubt, since the manual says:




        -g, --listed-incremental=FILE

        Handle new GNU-format incremental backups. FILE is the name of a snapshot file, where tar stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. If FILE does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level 0 dump). To create incremental archives of non-zero level N, create a copy of the snapshot file created during the level N-1, and use it as FILE.




        So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N files every time you perform a full backup. But then:




        When listing or extracting, the actual contents of FILE is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use /dev/null in its place.




        This means if you extract backup-N files in increasing sequence to get a state from some time ago, any backup-M file (M>0) only expects a valid M-1 state to exist. It doesn't matter if this state is obtained from a full or incremental backup, the point is these states should be identical anyway. So it shouldn't matter if you created the backup-M file based on a full backup (as you will do, every backup-M will start as backup-1 where backup-0 is a full backup) or based on a chain of incremental backups (as the manual suggests).





        I understand your point is to keep backup-0 as an up-to-date full backup and to be able to "go back in time" with backup-0, backup-1, backup-2, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1 and upload a full new backup-0 every time. If your data is huge then uploading a full backup every time will be a pain.



        For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup few times:




        rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync.




        Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.




































        answered Nov 17 at 7:26









        Kamil Maciorowski































             
