Is it safe (no data loss) to convert ANSI to UTF-8 and then back to ANSI?
I have read that you can lose data going from UTF-8 to ANSI.
But if the file was changed from ANSI to UTF-8 (and not changed further while in UTF-8) and then changed back to ANSI, is this 100% safe?
encoding unicode utf-8 ansi
If you are really worried, you can compare the files afterwards to see if they're identical; then you know whether it's 100% safe. There may be a command to do a byte-by-byte check. The fc command might do a byte-by-byte check.
– barlop
Feb 14 at 16:37
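The byte-by-byte check suggested in this comment can also be scripted. The following is a minimal sketch in Python, not a definitive tool: the helper name `files_identical`, the file names, and the sample text are made up for illustration, and the demo writes two throwaway files that stand in for the original and the round-tripped copy. On Windows, `fc /b` performs a comparable binary comparison.

```python
import filecmp
import os
import tempfile


def files_identical(path_a: str, path_b: str) -> bool:
    """Return True if both files contain exactly the same bytes."""
    # shallow=False forces a content comparison instead of trusting
    # file size and modification time alone.
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demo: hypothetical files standing in for "before" and "after" the round trip.
with tempfile.TemporaryDirectory() as tmp:
    before = os.path.join(tmp, "before.txt")
    after = os.path.join(tmp, "after.txt")

    data = "café".encode("cp1252")  # assumed "ANSI" code page: Windows-1252
    with open(before, "wb") as f:
        f.write(data)

    # ANSI -> UTF-8 -> ANSI, using the same code page in both directions.
    utf8 = data.decode("cp1252").encode("utf-8")
    with open(after, "wb") as f:
        f.write(utf8.decode("utf-8").encode("cp1252"))

    identical = files_identical(before, after)

print(identical)
```

If the comparison reports a difference, the round trip was not lossless for that file and code page.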
asked Feb 14 at 16:01
Mr. Smith
1 Answer
It's probably safe, but only if you convert between the same encodings both times.
UTF-8 by itself isn't a character set – it is a way to encode Unicode into bytes. It can represent the same characters as UTF-16, the encoding that modern Windows uses. So the real question is whether converting to Unicode can lose information – and AFAIK, the answer is "it shouldn't, but it sometimes might":
The Old New Thing has a footnote about this:
Bonus chatter: Even the round trip from ANSI to Unicode and back to ANSI can be lossy, depending on the flags you pass regarding use of precomposed characters, for example.
Unicode allows several canonically equivalent forms of the same text. For example, ã can be stored either as a single code point (precomposed, U+00E3) or as a plain a followed by a combining tilde (decomposed, U+0061 U+0303). Windows prefers the former, macOS prefers the latter.
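To make the precomposed/decomposed distinction concrete, here is a small Python sketch. It assumes Windows-1252 (`cp1252`) as the "ANSI" code page purely for illustration, and it shows the behaviour of Python's codecs, not of the Windows conversion APIs the quoted footnote refers to.

```python
import unicodedata

precomposed = "\u00e3"   # ã as a single code point
decomposed = "a\u0303"   # a followed by COMBINING TILDE

# The two strings look the same on screen but are different code point sequences.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Only the precomposed form maps directly to a single byte in Windows-1252.
print(precomposed.encode("cp1252"))                             # b'\xe3'
try:
    decomposed.encode("cp1252")
except UnicodeEncodeError as exc:
    print("decomposed form is not encodable as-is:", exc.reason)
```

So if some step in between decomposes the text, converting back can fail or substitute characters unless the text is normalized to NFC first.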
I'm not entirely sure whether e.g. Windows-932 counts as "ANSI", but I wouldn't be surprised if there were issues (as mentioned on Wikipedia) due to the same byte (0x5C) doubling as both a ¥ symbol and a path separator that's normally a backslash...
Meanwhile, there is no encoding or codepage called "ANSI". It's the name of a standards organization which has defined several text encodings. Within Windows, the term means a large set of "Windows-125x" encodings for various countries and languages (somewhat corresponding to ISO 8859 encodings, and allegedly based on early drafts written by ANSI).
So it is quite possible that one system calls Windows-1251 "ANSI" and another uses Windows-1257 for the same, and as a result, each can represent characters that the other cannot. (In fact, Windows 10 version 1809 even allows UTF-8 to be the "ANSI" encoding.) Between differently configured systems, even if the initial conversion to Unicode doesn't lose information, converting back to "ANSI" will lose every character that the target code page cannot represent.
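The difference between a matched and a mismatched round trip can be sketched in Python. This is only an illustration under stated assumptions: the helper `round_trip` and the sample text are invented here, and `errors="replace"` stands in for whatever substitution a real conversion tool would apply.

```python
# Hypothetical sample: Cyrillic text stored as Windows-1251 bytes.
original = "Привет".encode("cp1251")


def round_trip(data: bytes, decode_as: str, encode_as: str) -> bytes:
    """Decode 'ANSI' bytes, pass them through UTF-8, and encode back to 'ANSI'."""
    text = data.decode(decode_as)
    utf8 = text.encode("utf-8")  # the intermediate UTF-8 file
    # Unencodable characters are replaced with '?' instead of raising an error.
    return utf8.decode("utf-8").encode(encode_as, errors="replace")


# Same code page in both directions: the bytes survive unchanged.
same = round_trip(original, "cp1251", "cp1251")
print(same == original)   # True

# Different "ANSI" on the way back: Windows-1257 has no Cyrillic letters.
mixed = round_trip(original, "cp1251", "cp1257")
print(mixed == original)  # False
print(mixed)              # b'??????'
```

The UTF-8 step itself is lossless in both cases; the loss happens only when encoding back into a code page that lacks the characters.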
You write "It's probably safe, but only if you convert between the same encodings both times." <--- Do you mean it's not 100% safe even if you convert between the same encodings both times?
– barlop
Feb 14 at 16:38
edited Feb 15 at 7:15
answered Feb 14 at 16:24
grawity