Does compression into one large archive result in better compression than individual compression of folders?
I have several folders of around 8 GB each. Together these folders total around 60 GB of data. I can compress these folders in one of two ways: either individually, creating one compressed archive for each of them, or all together into a single large compressed archive.



Generally speaking, assuming all the data to be compressed is of the same type and the compression algorithm used is the same (and that I also don't care about the time it would take to decompress the larger file), will either method result in better compression than the other, or will the total sizes of the compressed files in the two scenarios tend to be equal?
Tags: windows, compression, 7-zip, archiving
asked Dec 5 at 0:07 by Hashim
3 Answers
Accepted answer (score 3), answered Dec 5 at 21:03 by Daniel B
Does compression into one large archive result in better compression than individual compression of folders? Not necessarily.

Only if the archive uses solid compression. A non-solid archive (like a Zip archive) compresses each file individually. This lets you easily decompress single files from the archive, and it lets you add files to the archive without recompressing everything.

With solid archives, all of this is much harder: to decompress a file at the very end of the stream, everything before it has to be decompressed (though not necessarily written to disk). Adding a file likewise requires the algorithm to go through everything.

There is a middle ground, however: "solid blocks". The archiver then never has to process the entire archive, only the block containing the file in question.

In the 7-Zip GUI, it's this option:

[screenshot: 7-Zip's Add to Archive dialog, Solid Block size setting]

Without taking the data being compressed into account, it's really simple:

• Non-solid: fast interactive access, worst compression
• Solid blocks: reasonably efficient interactive access, better compression
• Solid: no interactive access, best compression

Pick the variant that matches your predicted access pattern.
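On the 7-Zip command line, the same choice is made with the -ms switch; a minimal sketch, with the archive and folder names as placeholders:

    rem non-solid: every file compressed on its own
    7z a -ms=off archive.7z data
    rem solid blocks of 64 MB each
    7z a -ms=64m archive.7z data
    rem fully solid: one continuous stream (7-Zip's default for .7z)
    7z a -ms=on archive.7z data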






Answer (score 3), answered Dec 5 at 0:36 by Keltari
While it is impossible to say with absolute certainty, one larger archive should in theory result in a smaller total size, as more blocks of data can be found to be repetitive. This assumes the data is as homogeneous as you say.

However, it is entirely possible that certain folders contain files with more similar blocks of data, and those folders might therefore compress better as their own individual archives.

The only true way to know which method is best is to test both ways.
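One way to run that test, sketched with the 7-Zip command line (folder and archive names here are placeholders):

    rem compress each folder individually, then everything together
    7z a folder1.7z folder1
    7z a folder2.7z folder2
    7z a combined.7z folder1 folder2
    rem dir lists each archive's size, with a byte total at the bottom
    dir *.7z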






Answer (score 1), answered Dec 5 at 20:31 by Austin Hemmelgarn
The single archive will almost always be smaller, though not for the reason you might think.

Put simply, with only one archive you don't waste space on multiple archive file headers. There is some minimal amount of space an archive takes up just to be a valid archive, and you pay that cost for every archive you create. The only widely used exception is the cpio format, which has no header for the archive itself, only per-file headers.
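You can see this fixed overhead directly by archiving a single tiny file; a quick demonstration, with hypothetical file names:

    rem the resulting archive is noticeably larger than the one-line input,
    rem because the container format needs its own headers
    echo x > tiny.txt
    7z a tiny.7z tiny.txt
    dir tiny.txt tiny.7z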



More realistically, you will usually get at least as good a compression ratio using one archive instead of several, and with some archivers it can be significantly better (for example, zpaq deduplicates data within the archive, so it can save a lot of space if there is a lot of duplicated data).
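If zpaq is the archiver, the single-archive approach might look like this (names are placeholders):

    rem zpaq deduplicates identical blocks across everything it archives,
    rem so data duplicated between the folders is stored only once
    zpaq a combined.zpaq folder1 folder2 folder3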



There is another question to ask before you decide, though: is the overhead of handling a single large archive, instead of multiple smaller ones, worth the space savings? Depending on where you store the data, it may be more economical to use the smaller archives, especially if you are likely to need only one of the folders at a time.

Overall, though, Keltari is correct: the only way to know for sure is to test it.





