Key derivation: bit lengths











This is a follow-up question to "HKDF: ikm, salt and info values".



Based on the feedback, I have now decided to implement my key derivation for AES-GCM-256 file encryption roughly as follows:



[Image: diagram of the key derivation scheme (not preserved)]



Everything up to scrypt is run once; the HKDF-Expand part is repeated for every file to be encrypted. The random salt and info values are stored (unencrypted) with the encrypted file.
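Since the diagram itself is not available here, the following is only a rough Python sketch of the scheme as it can be pieced together from the text and the comments: a SHA-256 pre-hash, scrypt with a random salt producing a 512-bit master key, then HKDF-Expand (RFC 5869) with a per-file info value. The function names, the scrypt cost parameters, the 128-bit salt and info sizes, and the use of SHA-512 inside HKDF are assumptions made for illustration, not details confirmed by the question.

```python
import hashlib
import hmac
import os


def hkdf_expand(prk: bytes, info: bytes, length: int, hash_name: str = "sha512") -> bytes:
    """HKDF-Expand as specified in RFC 5869, section 2.3."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF-Expand")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_master_key(secret: bytes, salt: bytes) -> bytes:
    """Run once: pre-hash the password or key file, then stretch it with scrypt."""
    prehash = hashlib.sha256(secret).digest()
    # Cost parameters are illustrative assumptions; tune them for the target hardware.
    return hashlib.scrypt(prehash, salt=salt, n=2**14, r=8, p=1, dklen=64)


def derive_file_key(master_key: bytes, info: bytes) -> bytes:
    """Run per file: expand the master key and keep 256 bits for AES-256-GCM."""
    # Asking for 32 bytes yields the first half of the 512-bit HKDF output.
    return hkdf_expand(master_key, info, 32)


salt = os.urandom(16)  # stored unencrypted with the file
master_key = derive_master_key("correct horse battery staple".encode("utf-8"), salt)
info = os.urandom(16)  # stored unencrypted with the file
file_key = derive_file_key(master_key, info)
```

The password is encoded as UTF-8 before hashing so that a typed password and a key file with the same contents produce the same pre-hash.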



My questions:




  1. Do you see any serious flaws?


  2. What is your opinion about the various bit lengths? In particular: Does it make any sense at all to blow up everything to 512 bits temporarily when, in the end, I use only 256 bits for the data protection key?











  • Yes, this is the kind of key derivation scheme that I would hope for when performing password-based encryption (PBE). Note that performing a full analysis of cryptographic designs is off topic on crypto.SE. The initial hash is generally skipped for this kind of design (having a crypto hash followed by a password hash is kind of meaningless), but it won't alter the security properties of the function as far as I can see. Do indicate a version number; you may want to upgrade things later.
    – Maarten Bodewes
    Nov 23 at 11:13












  • The reason for the initial crypto hash is simple: In my context it is crucial that people can switch back and forth between using a key file and entering the contents of the key file as password. However, I also have to allow for LARGE key files (those aren't entered as password, of course), and scrypt can't process them block-wise (scrypt isn't based on Merkle–Damgård). So I'm forced to feed them to scrypt via their hash value.
    – FineJoe
    Nov 23 at 11:31












  • Ah, yeah, that's a fine reason to use a hash there. Don't forget to specify how the input needs to be formatted, though. Password hashes generally have a default encoding for passwords (usually UTF-8 nowadays), but SHA-256 has none.
    – Maarten Bodewes
    Nov 23 at 11:39












  • Is SHA-256 enough here, or should I go for SHA-512 (after all, scrypt and HKDF-Expand in my model use 512-bit outputs, too)?
    – FineJoe
    Nov 23 at 13:19










  • See Ilmari's answer for that. SHA-256 is OK, but SHA-512 is already used in other parts of your scheme, so as long as your PBKDF (scrypt) can handle the expanded output you can just use the 512-bit output. Otherwise you could still truncate the output as well, of course.
    – Maarten Bodewes
    Nov 23 at 13:22

















aes key-derivation hkdf scrypt






edited Nov 23 at 12:50

























asked Nov 23 at 9:09









FineJoe

2 Answers

There is no real reason for the info values to be 512 bits long. The only requirement for them is to be unique, and for that, even 128 bits of randomness is enough (at least assuming that you won't be encrypting more than $2^{64}$ files with the same key). The same goes for the salt, too. Of course, using longer values won't really hurt security, it just makes your encrypted files a bit larger.



I see no problem with using 512-bit intermediate values, even if you're only generating a 256-bit key at the end. In fact, I'd consider replacing SHA-256 in the first step with SHA-512, if only to standardize on a single hash function. I believe SHA-512 is even somewhat faster on modern 64-bit CPUs, although that's unlikely to make any significant difference in practice compared to I/O and other overhead costs.



Also, as long as you're using a distinct IV/nonce for each file, you don't also need a distinct key. So you could just use the master key (truncated to whatever length you need) directly as the AES-GCM key, and dispense with HKDF entirely. Or you could keep using HKDF-Expand (e.g. if you need to derive other key material from the master key for some reason), but only call it once with a fixed info string (say, "AES-256-GCM-key" or something else reasonably distinct and informative) to derive the encryption key.



You could even consider deriving your GCM IVs from the master key using HKDF, with an info string like "AES-256-GCM-IV-<counter>", where <counter> is an incremental counter to make all the info strings for a given master key unique. (You could also append e.g. the file name and the current time to the info string too, just in case the master key somehow got reused.) You won't need to store this info string anywhere, since you can just store the IV derived using it instead. The primary advantage to using this method, instead of just using random IVs, is that it protects you from the (small but non-zero) risk of system RNG failure. Of course, if you want, you can also still include a bunch of random bits in the info string used to derive the IVs, too.
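As a rough illustration of the two HKDF-based options above, the sketch below derives one encryption key with a fixed info string and one 96-bit IV per file from a counter-based info string. The `hkdf_expand` helper follows RFC 5869; the function names, the decimal ASCII encoding of the counter, the 96-bit IV length, and the placeholder master key are assumptions made for the example.

```python
import hashlib
import hmac


def hkdf_expand(prk: bytes, info: bytes, length: int, hash_name: str = "sha512") -> bytes:
    """HKDF-Expand as specified in RFC 5869, section 2.3."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF-Expand")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_encryption_key(master_key: bytes) -> bytes:
    """Called once per master key: a single 256-bit AES-GCM key."""
    return hkdf_expand(master_key, b"AES-256-GCM-key", 32)


def derive_iv(master_key: bytes, counter: int) -> bytes:
    """Called once per file: a 96-bit IV bound to a unique counter value."""
    # Hypothetical encoding: the counter is appended as decimal ASCII digits.
    info = b"AES-256-GCM-IV-" + str(counter).encode("ascii")
    return hkdf_expand(master_key, info, 12)


# Placeholder master key for demonstration only; in practice this is the scrypt output.
master_key = bytes(range(64))
key = derive_encryption_key(master_key)
ivs = [derive_iv(master_key, i) for i in range(3)]
```

Only the derived IV has to be stored with each file. Because the derived IVs look random, they are presumably subject to the same collision considerations as randomly generated ones.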





Note that, per NIST SP 800-38D section 8.3, you should not encrypt more than $2^{32}$ files with GCM mode using the same key and random IVs. This is to keep the risk of IV collision sufficiently small. If you do find yourself needing to encrypt more files than that at once, probably the easiest way to sidestep the limit is just to re-derive a new master key from the hashed password with a different salt. That means re-running scrypt, but doing that every $2^{32}$ files is probably not a significant performance issue.
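To see roughly where the $2^{32}$ figure comes from, here is a back-of-the-envelope birthday bound. It assumes uniformly random 96-bit IVs (the IV length is an assumption here, although it is the usual GCM default) and uses the SP 800-38D requirement that the probability of an IV repeating under one key stay at or below $2^{-32}$; $n$ denotes the number of files encrypted under that key.

```latex
\Pr[\text{IV collision}] \;\le\; \binom{n}{2}\, 2^{-96} \;<\; \frac{n^2}{2^{97}},
\qquad
n = 2^{32} \;\Rightarrow\; \frac{2^{64}}{2^{97}} = 2^{-33} \;\le\; 2^{-32}.
```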



A more significant limit, in practice, is that SP 800-38D section 5.2.1.1 also limits the length of files encrypted with AES-GCM to at most $2^{39}-256$ bits, i.e. 64 GiB minus 32 bytes. If you ever need to encrypt a file longer than that with GCM, you'll need to break it into shorter segments and combine them with something like the CHAIN construction from this paper.
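The arithmetic behind that limit can be checked with a few lines of Python. This is only a sketch: the 1 GiB segment size and the function names are hypothetical, and the code merely counts segments. It does not implement CHAIN or any other construction that binds the segments together.

```python
# Maximum plaintext length for a single GCM invocation (SP 800-38D, section 5.2.1.1).
GCM_MAX_PLAINTEXT_BITS = 2**39 - 256
GCM_MAX_PLAINTEXT_BYTES = GCM_MAX_PLAINTEXT_BITS // 8  # 64 GiB minus 32 bytes


def fits_in_one_gcm_message(size_bytes: int) -> bool:
    """True if a file of this size can be encrypted in a single GCM invocation."""
    return size_bytes <= GCM_MAX_PLAINTEXT_BYTES


def segments_needed(size_bytes: int, segment_bytes: int = 2**30) -> int:
    """Number of segments for a file, using a hypothetical 1 GiB segment size."""
    if segment_bytes > GCM_MAX_PLAINTEXT_BYTES:
        raise ValueError("segment size exceeds the GCM plaintext limit")
    # Ceiling division; an empty file still occupies one (empty) segment.
    return max(1, -(-size_bytes // segment_bytes))
```

For example, `fits_in_one_gcm_message(2**36)` is false, because exactly 64 GiB is 32 bytes over the limit.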





As for other issues, one potential concern is that using the same master key derived from the same salt for multiple files, and storing the salt unencrypted in the file itself, makes it possible to see if two files have been encrypted by the same user at the same time. That might be an unwanted information leak.



Unfortunately there's no easy way to fix that, unless you can store the salt somewhere else (where?), use a different salt for each file (which makes encryption slower, since you'd need to re-run scrypt for each file) or omit the salt entirely (which is not recommended, as it makes your scheme vulnerable to attacks using precomputed password ↔ scrypt(password) tables). Still, at the very least, you should clearly inform your users of it, and maybe provide an option to use a new salt for each file, at the cost of performance.







  • I was writing that SHA-512 would be faster as well, but for a password input it is probably slower as the input may not fill a single SHA-256 block of 440 bits of data (excluding padding and length).
    – Maarten Bodewes
    Nov 23 at 13:44








  • @MaartenBodewes: True, but the OP is using the same hash for key files, which might be longer. Then again, in that case it's probably I/O bound anyway.
    – Ilmari Karonen
    Nov 23 at 13:57










  • Yeah, but I'd rather have a speedy hash for files than for passwords, so SHA-512 makes even more sense that way. No need to burn CPU instructions for zero benefit. SHA-512 it is. Except of course that the latest Intel and AMD CPUs have SHA-256 instructions embedded. Um. Whatever, I give up :)
    – Maarten Bodewes
    Nov 23 at 14:22












  • @Ilmari: Two questions. 1. Using HKDF for AES-GCM IV generation: Why not use a simple 96-bit counter as IV? To my knowledge, the IV is simply a nonce and doesn't need to be (pseudo-)random. 2. In your model, why would I run scrypt again after $2^{32}$ files? Why not simply HKDF-Expand, just with a different info value?
    – FineJoe
    Nov 23 at 14:39












  • @FineJoe: That's OK too, but it triggers the same extra information leak that I... uh... seem to have left out from the actually posted version of my answer. Oops. Anyway, the problem with sequential IVs is that if you observe a file with the salt ABCXYZ and the sequential IV 0005, you'll know that the same user must have encrypted at least four (or five, if you start counting from zero) other files at the same time. That extra information leak might be harmless, or it might not, depending on the user's specific security needs. In any case, it's something that the user may not expect.
    – Ilmari Karonen
    Nov 23 at 14:44



















Generally you should be fine with a minimum size of 128 bits and a maximum size of 256 bits for parameters such as salts. Although you should halve the security parameter because of the birthday problem, I think most cryptographers would still choose 256-bit salts as the maximum.



It is unlikely that the birthday problem can be used to reduce the security below the, say, 192-bit security that such a salt still offers after about $2^{64}$ files. So you would have a very large margin with 256 bits. Salts are sometimes even set to 64 bits (for instance in the OpenSSL command line for encryption), but like anything of 64 bits, that may be on the low side by now.





WRT the security of the construction rather than the salt size



Beware that advertising a security strength of 256 bits for password based encryption is rather insincere. Passwords commonly have a security strength far below 64 bits. Even a strong key strengthening function such as scrypt will not add significant strength to this.
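A rough way to put numbers on this, as an illustration only: let $H$ be the entropy of the password in bits and $c$ the factor by which scrypt makes each guess more expensive than a single plain hash. Both values below ($H = 40$, $c = 2^{20}$) are assumptions picked for the example, not measurements.

```latex
\text{effective strength} \;\approx\; H + \log_2 c
\qquad\Rightarrow\qquad
40 + \log_2 2^{20} \;=\; 60 \text{ bits} \;\ll\; 256 \text{ bits}.
```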



Such issues may be partially resolved by using public key encryption, where the public key is used for encryption and the private decryption key may be kept in a less accessible location until it is used. That private key may need to be wrapped as well, possibly with a scheme such as the one specified in the question. OpenPGP is an (old) format that could be described this way.






share|improve this answer























    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "281"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    noCode: true, onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcrypto.stackexchange.com%2fquestions%2f64276%2fkey-derivation-bit-lengths%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    2
    down vote













    There is no real reason for the info values to be 512 bits long. The only requirement for them is to be unique, and for that, even 128 bits of randomness is enough (at least assuming that you won't be encrypting more than $2^{64}$ files with the same key). The same goes for the salt, too. Of course, using longer values won't really hurt security, it just makes your encrypted files a bit larger.



    I see no problem with using 512-bit intermediate values, even if you're only generating a 256-bit key at the end. In fact, I'd consider replacing SHA-256 in the first step with SHA-512, if only to standardize on a single hash function. I believe SHA-512 is even somewhat faster on modern 64-bit CPUs, although that's unlikely to make any significant difference in practice compared to I/O and other overhead costs.



    Also, as long as you're using a distinct IV/nonce for each file, you don't also need a distinct key. So you could just use the master key (truncated to whatever length you need) directly as the AES-GCM key, and dispense with HKDF entirely. Or you could keep using HKDF-Expand (e.g. if you need to derive other key material from the master key for some reason), but only call it once with a fixed info string (say, "AES-256-GCM-key" or something else reasonably distinct and informative) to derive the encryption key.



    You could even consider deriving your GCM IVs from the master key using HKDF, with an info string like "AES-256-GCM-IV-<counter>", where <counter> is an incremental counter to make all the info strings for a given master key unique. (You could also append e.g. the file name and the current time to the info string too, just in case the master key somehow got reused.) You won't need to store this info string anywhere, since you can just store the IV derived using it instead. The primary advantage to using this method, instead of just using random IVs, is that it protects you from the (small but non-zero) risk of system RNG failure. Of course, if you want, you can also still include a bunch of random bits in the info string used to derive the IVs, too.





    Note that, per NIST SP 800-38D section 8.3, you should not encrypt more than $2^{32}$ files with GCM mode using the same key and random IVs. This is to keep the risk of IV collision sufficiently small. If you do find yourself needing to encrypt more files than that at once, probably the easiest way to sidestep the limit is just to re-derive a new master key from the hashed password with a different salt. That means re-running scrypt, but doing that every $2^{32}$ files is probably not a significant performance issue.



    A more significant limit, in practice, is the SP 800-38D section 5.2.1.1 also limits the length of files encrypted with AES-GCM to less than $2^{39}-256$ bits = 64 GiB (minus 32 bytes, to be exact). If you ever need to encrypt a file longer than that with GCM, you'll need to break it into shorter segments and combine them with something like the CHAIN construction from this paper.





    As for other issues, one potential concern is that using the same master key derived from the same salt for multiple files, and storing the salt unencrypted in the file itself, makes it possible to see if two files have been encrypted by the same user at the same time. That might be an unwanted information leak.



    Unfortunately there's no easy way to fix that, unless you can store the salt somewhere else (where?), use a different salt for each file (which makes encryption slower, since you'd need to re-run scrypt for each file) or omit the salt entirely (which is not recommended, as it makes your scheme vulnerable to attacks using precomputed password &lrarr; scrypt(password) tables). Still, at the very least, you should clearly inform your users of it, and maybe provide an option to use a new salt for each file, at the cost of performance.






    share|improve this answer

















    • 1




      I was writing that SHA-512 would be faster as well, but for a password input it is probably slower as the input may not fill a single SHA-256 block of 440 bits of data (excluding padding and length).
      – Maarten Bodewes
      Nov 23 at 13:44








    • 1




      @MaartenBodewes: True, but the OP is using the same hash for key files, which might be longer. Then again, in that case it's probably I/O bound anyway.
      – Ilmari Karonen
      Nov 23 at 13:57










    • Yeah, but I'd rather have a speedy hash for files than for passwords, so SHA-512 makes even more sense that way. No need to burn CPU instructions for zero benefit. SHA-512 it is. Except of course that the latest Intel's and AMD's have SHA-256 instructions embedded. Um. Whatever, I give up :)
      – Maarten Bodewes
      Nov 23 at 14:22












    • @Ilmari: Two questions 1. Using HKDF for AES-GCM iv generation: Why not use a simple 96-bit counter as iv? To my knowledge, the iv is simply a nonce and doesn't need to be (pseudo-)random. 2. In your model, why would I run Scrypt again after 2^32 files? Why not simply HKDF-expand - just with a different info value?
      – FineJoe
      Nov 23 at 14:39












    • @FineJoe: That's OK too, but it triggers the same extra information leak that I... uh... seem to have left out from the actually posted version of my answer. Oops. Anyway, the problem with sequential IVs is that if you observe a file with the salt ABCXYZ and the sequential IV 0005, you'll know that the same user must have encrypted at least four (or five, if you start counting from zero) other files at the same time. That extra information leak might be harmless, or it might not, depending on the user's specific security needs. In any case, it's something that the user may not expect.
      – Ilmari Karonen
      Nov 23 at 14:44















    up vote
    2
    down vote













    There is no real reason for the info values to be 512 bits long. The only requirement for them is to be unique, and for that, even 128 bits of randomness is enough (at least assuming that you won't be encrypting more than $2^{64}$ files with the same key). The same goes for the salt, too. Of course, using longer values won't really hurt security, it just makes your encrypted files a bit larger.



    I see no problem with using 512-bit intermediate values, even if you're only generating a 256-bit key at the end. In fact, I'd consider replacing SHA-256 in the first step with SHA-512, if only to standardize on a single hash function. I believe SHA-512 is even somewhat faster on modern 64-bit CPUs, although that's unlikely to make any significant difference in practice compared to I/O and other overhead costs.



    Also, as long as you're using a distinct IV/nonce for each file, you don't also need a distinct key. So you could just use the master key (truncated to whatever length you need) directly as the AES-GCM key, and dispense with HKDF entirely. Or you could keep using HKDF-Expand (e.g. if you need to derive other key material from the master key for some reason), but only call it once with a fixed info string (say, "AES-256-GCM-key" or something else reasonably distinct and informative) to derive the encryption key.



    You could even consider deriving your GCM IVs from the master key using HKDF, with an info string like "AES-256-GCM-IV-<counter>", where <counter> is an incremental counter to make all the info strings for a given master key unique. (You could also append e.g. the file name and the current time to the info string too, just in case the master key somehow got reused.) You won't need to store this info string anywhere, since you can just store the IV derived using it instead. The primary advantage to using this method, instead of just using random IVs, is that it protects you from the (small but non-zero) risk of system RNG failure. Of course, if you want, you can also still include a bunch of random bits in the info string used to derive the IVs, too.





    Note that, per NIST SP 800-38D section 8.3, you should not encrypt more than $2^{32}$ files with GCM mode using the same key and random IVs. This is to keep the risk of IV collision sufficiently small. If you do find yourself needing to encrypt more files than that at once, probably the easiest way to sidestep the limit is just to re-derive a new master key from the hashed password with a different salt. That means re-running scrypt, but doing that every $2^{32}$ files is probably not a significant performance issue.



    A more significant limit, in practice, is that SP 800-38D section 5.2.1.1 also limits the length of files encrypted with AES-GCM to at most $2^{39}-256$ bits, i.e. 64 GiB minus 32 bytes. If you ever need to encrypt a file longer than that with GCM, you'll need to break it into shorter segments and combine them with something like the CHAIN construction from this paper.
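    The size arithmetic is easy to check in code. The snippet below is only a sketch of such a check; the constant and function names are made up for illustration.

```python
# Maximum GCM plaintext length per SP 800-38D, expressed in bits and in bytes.
MAX_PLAINTEXT_BITS = 2**39 - 256
MAX_PLAINTEXT_BYTES = MAX_PLAINTEXT_BITS // 8
GIB = 2**30


def fits_in_one_gcm_message(size_bytes: int) -> bool:
    """Return True if a file of this size can be encrypted as a single GCM message."""
    return size_bytes <= MAX_PLAINTEXT_BYTES


# The limit in bytes, relative to 64 GiB.
shortfall = 64 * GIB - MAX_PLAINTEXT_BYTES
```

    Here `shortfall` works out to 32 bytes, so a file of exactly 64 GiB is already too long for a single GCM invocation.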





    As for other issues, one potential concern is that using the same master key derived from the same salt for multiple files, and storing the salt unencrypted in the file itself, makes it possible to see if two files have been encrypted by the same user at the same time. That might be an unwanted information leak.



    Unfortunately there's no easy way to fix that, unless you can store the salt somewhere else (where?), use a different salt for each file (which makes encryption slower, since you'd need to re-run scrypt for each file) or omit the salt entirely (which is not recommended, as it makes your scheme vulnerable to attacks using precomputed password ⇆ scrypt(password) tables). Still, at the very least, you should clearly inform your users of this leak, and maybe provide an option to use a new salt for each file, at the cost of performance.























    • 1




      I was writing that SHA-512 would be faster as well, but for a password input it is probably slower as the input may not fill a single SHA-256 block of 440 bits of data (excluding padding and length).
      – Maarten Bodewes
      Nov 23 at 13:44








    • 1




      @MaartenBodewes: True, but the OP is using the same hash for key files, which might be longer. Then again, in that case it's probably I/O bound anyway.
      – Ilmari Karonen
      Nov 23 at 13:57










    • Yeah, but I'd rather have a speedy hash for files than for passwords, so SHA-512 makes even more sense that way. No need to burn CPU instructions for zero benefit. SHA-512 it is. Except of course that the latest Intel's and AMD's have SHA-256 instructions embedded. Um. Whatever, I give up :)
      – Maarten Bodewes
      Nov 23 at 14:22












    • @Ilmari: Two questions 1. Using HKDF for AES-GCM iv generation: Why not use a simple 96-bit counter as iv? To my knowledge, the iv is simply a nonce and doesn't need to be (pseudo-)random. 2. In your model, why would I run Scrypt again after 2^32 files? Why not simply HKDF-expand - just with a different info value?
      – FineJoe
      Nov 23 at 14:39












    • @FineJoe: That's OK too, but it triggers the same extra information leak that I... uh... seem to have left out from the actually posted version of my answer. Oops. Anyway, the problem with sequential IVs is that if you observe a file with the salt ABCXYZ and the sequential IV 0005, you'll know that the same user must have encrypted at least four (or five, if you start counting from zero) other files at the same time. That extra information leak might be harmless, or it might not, depending on the user's specific security needs. In any case, it's something that the user may not expect.
      – Ilmari Karonen
      Nov 23 at 14:44























    answered Nov 23 at 13:01









    Ilmari Karonen

    34k268134














    up vote
    2
    down vote













    Generally you should be fine with having a minimum size of 128 bits and a maximum size of 256 bits for parameters such as salts. Although you should halve the security parameter because of the birthday problem, I think most cryptographers would still choose 256-bit salts as the maximum.



    It is unlikely that the birthday problem can be used to reduce the security below the, say, 192-bit security that 256-bit values still offer after about $2^{64}$ files. So you would have a very large margin with 256 bits. Salts are sometimes even set to 64 bits (for instance in the OpenSSL command line for encryption), but like anything of 64 bits, that may be on the low side by now.
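    To get a feel for the margins involved, the birthday approximation can be evaluated directly. The helper below is a hypothetical illustration (the name and interface are mine), assuming salts are drawn uniformly at random.

```python
def log2_collision_probability(log2_count: float, bits: int) -> float:
    """Approximate log2 of the chance that any two of 2**log2_count
    uniformly random `bits`-bit values collide.

    Uses the birthday approximation p ~ n^2 / 2^(bits + 1), which is only
    meaningful while the result is well below 0 (i.e. p is much less than 1).
    """
    return 2 * log2_count - (bits + 1)


# 2^64 files with 256-bit salts versus 2^32 files with 128-bit or 64-bit salts.
wide = log2_collision_probability(64, 256)
medium = log2_collision_probability(32, 128)
narrow = log2_collision_probability(32, 64)
```

    Under these assumptions `wide` is -129 and `medium` is -65, while `narrow` is only -1: with 64-bit salts a collision becomes likely after a few billion files.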





    With regard to the security of the construction rather than the salt size:



    Beware that advertising a security strength of 256 bits for password based encryption is rather insincere. Passwords commonly have a security strength far below 64 bits. Even a strong key strengthening function such as scrypt will not add significant strength to this.
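    One hedged way to put numbers on this: if the password has about $H$ bits of entropy and a single scrypt evaluation costs the attacker roughly $W$ times as much as testing one raw key, the work for a brute-force search is roughly

$$
2^{H} \cdot W = 2^{H + \log_2 W}.
$$

    With purely illustrative values of $H = 40$ and $W = 2^{20}$, that is about $2^{60}$ operations, which is nowhere near the $2^{256}$ suggested by the key size.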



    Such issues may be partially resolved by using public key encryption, where the public key is used for encryption and the private decryption key may be kept in a less accessible location until it is used. That private key may need to be wrapped as well, possibly with a scheme such as the one specified in the question. OpenPGP is an (old) format that could be described this way.














        edited Nov 23 at 13:18

























        answered Nov 23 at 11:11









        Maarten Bodewes

        51.9k674189

































