Key derivation: bit lengths
This is a follow-up question to HKDF: ikm, salt and info values.
Based on the feedback, I have now decided to implement my key derivation for AES-GCM-256 file encryption roughly as follows: hash the password or key-file contents with SHA-256, feed the digest to scrypt with a random salt to obtain a 512-bit master key, then derive the per-file key material with HKDF-Expand using a random info value. Everything up to scrypt is run once; the HKDF-Expand part is repeated for every file to be encrypted. The random salt and info values are stored (unencrypted) with the encrypted file.
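For concreteness, a minimal Python sketch of this pipeline (standard library only; the scrypt cost parameters, the 32-byte salt and info sizes, and the helper names are illustrative assumptions, not fixed parts of the scheme):

```python
import hashlib
import hmac
import os

def hkdf_expand_sha512(prk: bytes, info: bytes, length: int) -> bytes:
    # HKDF-Expand as specified in RFC 5869, instantiated with HMAC-SHA-512.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]

# Run once per password / key file:
ikm = hashlib.sha256("correct horse battery staple".encode("utf-8")).digest()
salt = os.urandom(32)  # random salt, stored unencrypted alongside the files
master_key = hashlib.scrypt(ikm, salt=salt, n=2**14, r=8, p=1, dklen=64)  # 512-bit master key

# Run once per file to be encrypted:
info = os.urandom(32)  # random info value, stored unencrypted with this file
file_key = hkdf_expand_sha512(master_key, info, 32)  # first 256 bits of the HKDF output, used as the AES-GCM key
```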
My questions:
Do you see any serious flaws?
What is your opinion about the various bit lengths? In particular: does it make any sense at all to blow everything up to 512 bits temporarily when, in the end, I use only 256 bits for the data protection key?
aes key-derivation hkdf scrypt
asked Nov 23 at 9:09 by FineJoe (edited Nov 23 at 12:50)
Yes, this is the kind of key derivation scheme that I would hope for when performing password-based encryption (PBE). Note that performing full analysis of cryptographic designs is off topic on crypto.SE. The initial hash is generally skipped for these kinds of designs (having a crypto hash followed by a password hash is kind of meaningless), but it won't alter the security properties of the function as far as I can see. Do indicate a version number; you may want to upgrade things later.
– Maarten Bodewes
Nov 23 at 11:13
The reason for the initial crypto hash is simple: In my context it is crucial that people can switch back and forth between using a key file and entering the contents of the key file as password. However, I also have to allow for LARGE key files (those aren't entered as password, of course), and scrypt can't process them block-wise (scrypt isn't based on Merkle–Damgård). So I'm forced to feed them to scrypt via their hash value.
– FineJoe
Nov 23 at 11:31
Ah, yeah, that's a fine reason to use a hash there. Don't forget to specify how the input needs to be formatted, though. Password hashes generally have a default encoding for passwords (usually UTF-8 nowadays), but SHA-256 doesn't.
– Maarten Bodewes
Nov 23 at 11:39
Is SHA-256 enough here, or should I go for SHA-512 (after all, scrypt and HKDF-Expand in my model use 512-bit outputs, too)?
– FineJoe
Nov 23 at 13:19
See Ilmari's answer for that. SHA-256 is OK, but SHA-512 is already used in other parts of your scheme, so as long as your PBKDF (scrypt) can handle the expanded output you can just use the 512-bit output. Otherwise you could still truncate the output as well, of course.
– Maarten Bodewes
Nov 23 at 13:22
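To illustrate the interchangeability point discussed in these comments, a small sketch (the function names and the streaming chunk size are illustrative; the hash could equally be SHA-512 as discussed above). The two paths yield the same IKM only if the key file's raw bytes exactly match the UTF-8 encoding of what the user types, which is why the input formatting needs to be pinned down:

```python
import hashlib

def ikm_from_password(password: str) -> bytes:
    # Typed secret: pin the encoding explicitly (UTF-8 here) before hashing.
    return hashlib.sha256(password.encode("utf-8")).digest()

def ikm_from_key_file(path: str) -> bytes:
    # Key file: hash the raw bytes, streamed so arbitrarily large files work.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.digest()
```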
2 Answers
There is no real reason for the info values to be 512 bits long. The only requirement for them is to be unique, and for that, even 128 bits of randomness is enough (at least assuming that you won't be encrypting more than $2^{64}$ files with the same key). The same goes for the salt, too. Of course, using longer values won't really hurt security, it just makes your encrypted files a bit larger.
I see no problem with using 512-bit intermediate values, even if you're only generating a 256-bit key at the end. In fact, I'd consider replacing SHA-256 in the first step with SHA-512, if only to standardize on a single hash function. I believe SHA-512 is even somewhat faster on modern 64-bit CPUs, although that's unlikely to make any significant difference in practice compared to I/O and other overhead costs.
Also, as long as you're using a distinct IV/nonce for each file, you don't also need a distinct key. So you could just use the master key (truncated to whatever length you need) directly as the AES-GCM key, and dispense with HKDF entirely. Or you could keep using HKDF-Expand (e.g. if you need to derive other key material from the master key for some reason), but only call it once with a fixed info string (say, "AES-256-GCM-key" or something else reasonably distinct and informative) to derive the encryption key.
You could even consider deriving your GCM IVs from the master key using HKDF, with an info string like "AES-256-GCM-IV-<counter>", where <counter> is an incremental counter to make all the info strings for a given master key unique. (You could also append e.g. the file name and the current time to the info string too, just in case the master key somehow got reused.) You won't need to store this info string anywhere, since you can just store the IV derived using it instead. The primary advantage to using this method, instead of just using random IVs, is that it protects you from the (small but non-zero) risk of system RNG failure. Of course, if you want, you can also still include a bunch of random bits in the info string used to derive the IVs, too.
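A sketch of these two suggestions (the info strings follow the examples given above; the placeholder master key, the helper name and the decimal counter format are illustrative assumptions):

```python
import hashlib
import hmac

def hkdf_expand_sha512(prk: bytes, info: bytes, length: int) -> bytes:
    # HKDF-Expand (RFC 5869) instantiated with HMAC-SHA-512.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]

master_key = b"\x00" * 64  # placeholder for the 512-bit scrypt output

# One fixed info string -> one 256-bit data key per master key.
data_key = hkdf_expand_sha512(master_key, b"AES-256-GCM-key", 32)

# Deterministic 96-bit GCM IVs from a per-master-key counter.
def gcm_iv(counter: int) -> bytes:
    info = b"AES-256-GCM-IV-" + str(counter).encode("ascii")
    return hkdf_expand_sha512(master_key, info, 12)

iv_first_file = gcm_iv(1)   # store the derived IV (or the counter) with the file
iv_second_file = gcm_iv(2)
```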
Note that, per NIST SP 800-38D section 8.3, you should not encrypt more than $2^{32}$ files with GCM mode using the same key and random IVs. This is to keep the risk of IV collision sufficiently small. If you do find yourself needing to encrypt more files than that at once, probably the easiest way to sidestep the limit is just to re-derive a new master key from the hashed password with a different salt. That means re-running scrypt, but doing that every $2^{32}$ files is probably not a significant performance issue.
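As a rough check of where that figure comes from (a standard birthday-bound estimate, not a quotation from the standard): with $n$ uniformly random 96-bit IVs under one key, the collision probability is about $n(n-1)/2^{97} \approx n^2/2^{97}$; for $n = 2^{32}$ that is roughly $2^{64}/2^{97} = 2^{-33}$, which stays below the $2^{-32}$ IV-collision risk that SP 800-38D requires.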
A more significant limit, in practice, is that SP 800-38D section 5.2.1.1 also limits the length of files encrypted with AES-GCM to less than $2^{39}-256$ bits = 64 GiB (minus 32 bytes, to be exact). If you ever need to encrypt a file longer than that with GCM, you'll need to break it into shorter segments and combine them with something like the CHAIN construction from this paper.
As for other issues, one potential concern is that using the same master key derived from the same salt for multiple files, and storing the salt unencrypted in the file itself, makes it possible to see if two files have been encrypted by the same user at the same time. That might be an unwanted information leak.
Unfortunately there's no easy way to fix that, unless you can store the salt somewhere else (where?), use a different salt for each file (which makes encryption slower, since you'd need to re-run scrypt for each file) or omit the salt entirely (which is not recommended, as it makes your scheme vulnerable to attacks using precomputed password ⇆ scrypt(password) tables). Still, at the very least, you should clearly inform your users of it, and maybe provide an option to use a new salt for each file, at the cost of performance.
answered Nov 23 at 13:01 by Ilmari Karonen
I was writing that SHA-512 would be faster as well, but for a password input it is probably slower, as the input may not even fill a single SHA-256 block (at most 440 bits of data, excluding padding and the length field).
– Maarten Bodewes
Nov 23 at 13:44
@MaartenBodewes: True, but the OP is using the same hash for key files, which might be longer. Then again, in that case it's probably I/O bound anyway.
– Ilmari Karonen
Nov 23 at 13:57
Yeah, but I'd rather have a speedy hash for files than for passwords, so SHA-512 makes even more sense that way. No need to burn CPU instructions for zero benefit. SHA-512 it is. Except, of course, that the latest Intel and AMD CPUs have SHA-256 instructions built in. Um. Whatever, I give up :)
– Maarten Bodewes
Nov 23 at 14:22
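For reference, the block-size arithmetic behind this exchange (not stated explicitly in the thread): SHA-256 compresses 512-bit blocks, and after the mandatory padding byte and the 64-bit length field at most $512 - 64 - 8 = 440$ bits (55 bytes) of message fit in a single block; SHA-512 compresses 1024-bit blocks, fitting up to $1024 - 128 - 8 = 888$ bits (111 bytes) per block, but each compression call is more expensive, which is why it tends to lose on password-sized inputs while winning on long inputs on 64-bit CPUs without SHA-256 hardware instructions.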
@Ilmari: Two questions. 1. Using HKDF for AES-GCM IV generation: why not use a simple 96-bit counter as the IV? To my knowledge, the IV is simply a nonce and doesn't need to be (pseudo-)random. 2. In your model, why would I run scrypt again after $2^{32}$ files? Why not simply HKDF-Expand, just with a different info value?
– FineJoe
Nov 23 at 14:39
@FineJoe: That's OK too, but it triggers the same extra information leak that I... uh... seem to have left out from the actually posted version of my answer. Oops. Anyway, the problem with sequential IVs is that if you observe a file with the salt ABCXYZ and the sequential IV 0005, you'll know that the same user must have encrypted at least four (or five, if you start counting from zero) other files at the same time. That extra information leak might be harmless, or it might not, depending on the user's specific security needs. In any case, it's something that the user may not expect.
– Ilmari Karonen
Nov 23 at 14:44
Generally you should be fine with a minimum size of 128 bits and a maximum size of 256 bits for parameters such as salts. Although you should halve the security parameter because of the birthday problem, I think most cryptographers would still choose 256-bit salts as a maximum.
It is unlikely that the birthday problem can be used to enhance the security beyond the, say, 192-bit security it offers after about $2^{64}$ files. So you would have a very large margin with 256 bits. Salts are sometimes even set to 64 bits - for instance in the OpenSSL command line for encryption - but, like anything of 64 bits, that may be on the low side by now.
Regarding the security of the construction rather than the salt size:
Beware that advertising a security strength of 256 bits for password-based encryption is rather insincere. Passwords commonly have a security strength far below 64 bits. Even a strong key-strengthening function such as scrypt will not add significant strength to this.
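As a rough, illustrative estimate (the numbers are assumptions, not from this answer): a password with about 40 bits of entropy, stretched by scrypt parameters that cost an attacker roughly $2^{20}$ times the work of a single fast hash per guess, resists offline guessing about as well as a uniformly random key of $40 + 20 = 60$ bits, still nowhere near the nominal 256-bit key size.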
Such issues may be partially resolved by using public-key encryption, where the public key is used for encryption and the private decryption key may be kept in a less accessible location until it is used. That private key may need to be wrapped as well, possibly with a scheme such as the one specified in the question. OpenPGP is an (old) format that could be described this way.
answered Nov 23 at 11:11 by Maarten Bodewes (edited Nov 23 at 13:18)