How do I parse JSON sprinkled unpredictably into a string?
Suppose that I've got a node.js application that receives input in a weird format: strings with JSON arbitrarily sprinkled into them, like so:
This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text
I have a couple guarantees about this input text:
The bits of literal text in between the JSON objects are always free from curly braces.
The top level JSON objects shoved into the text are always object literals, never arrays.
My goal is to split this into an array, with the literal text left alone and the JSON parsed out, like this:
[
"This is a string ",
{"with":"json","in":"it"},
" followed by more text ",
{"and":{"some":["more","json"]}},
" and more text"
]
So far I've written a naive solution that simply counts curly braces to decide where the JSON starts and stops. But this wouldn't work if the JSON contains strings with curly braces in them {"like":"this one } right here"}
. I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse
's job. Is there a better way to solve this problem?
javascript node.js json
|
show 2 more comments
Suppose that I've got a node.js application that receives input in a weird format: strings with JSON arbitrarily sprinkled into them, like so:
This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text
I have a couple guarantees about this input text:
The bits of literal text in between the JSON objects are always free from curly braces.
The top level JSON objects shoved into the text are always object literals, never arrays.
My goal is to split this into an array, with the literal text left alone and the JSON parsed out, like this:
[
"This is a string ",
{"with":"json","in":"it"},
" followed by more text ",
{"and":{"some":["more","json"]}},
" and more text"
]
So far I've written a naive solution that simply counts curly braces to decide where the JSON starts and stops. But this wouldn't work if the JSON contains strings with curly braces in them {"like":"this one } right here"}
. I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse
's job. Is there a better way to solve this problem?
javascript node.js json
3
There is no foolproof way to do this short of writing a full-blown parser yourself.
– Jared Smith
Feb 9 at 16:30
7
Wow, that's a nightmare of a data format. You could maybe brute-force it by starting from each{
, greedy-matching to the last}
, then backtracking until JSON.parse stops throwing errors on the contents; rinse and repeat. Computationally expensive (and ridiculous) but at least you wouldn't have to write your own parser.
– Daniel Beck
Feb 9 at 16:33
2
"strings with JSON arbitrarily sprinkled into them, like so:" Note,"This is a string "
is validJSON
.
– guest271314
Feb 9 at 16:42
Does the actual input data contain space characters following the comma, between the keys{"with":"json", "in":"it"}
?
– guest271314
Feb 9 at 17:03
2
"Is there a better way to solve this problem?" - Any chance you can contact the guys who generate that input and ask them to work decently?
– Pedro A
Feb 10 at 2:33
|
show 2 more comments
Suppose that I've got a node.js application that receives input in a weird format: strings with JSON arbitrarily sprinkled into them, like so:
This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text
I have a couple guarantees about this input text:
The bits of literal text in between the JSON objects are always free from curly braces.
The top level JSON objects shoved into the text are always object literals, never arrays.
My goal is to split this into an array, with the literal text left alone and the JSON parsed out, like this:
[
"This is a string ",
{"with":"json","in":"it"},
" followed by more text ",
{"and":{"some":["more","json"]}},
" and more text"
]
So far I've written a naive solution that simply counts curly braces to decide where the JSON starts and stops. But this wouldn't work if the JSON contains strings with curly braces in them {"like":"this one } right here"}
. I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse
's job. Is there a better way to solve this problem?
javascript node.js json
Suppose that I've got a node.js application that receives input in a weird format: strings with JSON arbitrarily sprinkled into them, like so:
This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text
I have a couple guarantees about this input text:
The bits of literal text in between the JSON objects are always free from curly braces.
The top level JSON objects shoved into the text are always object literals, never arrays.
My goal is to split this into an array, with the literal text left alone and the JSON parsed out, like this:
[
"This is a string ",
{"with":"json","in":"it"},
" followed by more text ",
{"and":{"some":["more","json"]}},
" and more text"
]
So far I've written a naive solution that simply counts curly braces to decide where the JSON starts and stops. But this wouldn't work if the JSON contains strings with curly braces in them {"like":"this one } right here"}
. I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse
's job. Is there a better way to solve this problem?
javascript node.js json
javascript node.js json
edited Feb 9 at 17:07
R. Davidson
asked Feb 9 at 16:26
R. DavidsonR. Davidson
1708
1708
3
There is no foolproof way to do this short of writing a full-blown parser yourself.
– Jared Smith
Feb 9 at 16:30
7
Wow, that's a nightmare of a data format. You could maybe brute-force it by starting from each{
, greedy-matching to the last}
, then backtracking until JSON.parse stops throwing errors on the contents; rinse and repeat. Computationally expensive (and ridiculous) but at least you wouldn't have to write your own parser.
– Daniel Beck
Feb 9 at 16:33
2
"strings with JSON arbitrarily sprinkled into them, like so:" Note,"This is a string "
is validJSON
.
– guest271314
Feb 9 at 16:42
Does the actual input data contain space characters following the comma, between the keys{"with":"json", "in":"it"}
?
– guest271314
Feb 9 at 17:03
2
"Is there a better way to solve this problem?" - Any chance you can contact the guys who generate that input and ask them to work decently?
– Pedro A
Feb 10 at 2:33
|
show 2 more comments
3
There is no foolproof way to do this short of writing a full-blown parser yourself.
– Jared Smith
Feb 9 at 16:30
7
Wow, that's a nightmare of a data format. You could maybe brute-force it by starting from each{
, greedy-matching to the last}
, then backtracking until JSON.parse stops throwing errors on the contents; rinse and repeat. Computationally expensive (and ridiculous) but at least you wouldn't have to write your own parser.
– Daniel Beck
Feb 9 at 16:33
2
"strings with JSON arbitrarily sprinkled into them, like so:" Note,"This is a string "
is validJSON
.
– guest271314
Feb 9 at 16:42
Does the actual input data contain space characters following the comma, between the keys{"with":"json", "in":"it"}
?
– guest271314
Feb 9 at 17:03
2
"Is there a better way to solve this problem?" - Any chance you can contact the guys who generate that input and ask them to work decently?
– Pedro A
Feb 10 at 2:33
3
3
There is no foolproof way to do this short of writing a full-blown parser yourself.
– Jared Smith
Feb 9 at 16:30
There is no foolproof way to do this short of writing a full-blown parser yourself.
– Jared Smith
Feb 9 at 16:30
7
7
Wow, that's a nightmare of a data format. You could maybe brute-force it by starting from each
{
, greedy-matching to the last }
, then backtracking until JSON.parse stops throwing errors on the contents; rinse and repeat. Computationally expensive (and ridiculous) but at least you wouldn't have to write your own parser.– Daniel Beck
Feb 9 at 16:33
Wow, that's a nightmare of a data format. You could maybe brute-force it by starting from each
{
, greedy-matching to the last }
, then backtracking until JSON.parse stops throwing errors on the contents; rinse and repeat. Computationally expensive (and ridiculous) but at least you wouldn't have to write your own parser.– Daniel Beck
Feb 9 at 16:33
2
2
"strings with JSON arbitrarily sprinkled into them, like so:" Note,
"This is a string "
is valid JSON
.– guest271314
Feb 9 at 16:42
"strings with JSON arbitrarily sprinkled into them, like so:" Note,
"This is a string "
is valid JSON
.– guest271314
Feb 9 at 16:42
Does the actual input data contain space characters following the comma, between the keys
{"with":"json", "in":"it"}
?– guest271314
Feb 9 at 17:03
Does the actual input data contain space characters following the comma, between the keys
{"with":"json", "in":"it"}
?– guest271314
Feb 9 at 17:03
2
2
"Is there a better way to solve this problem?" - Any chance you can contact the guys who generate that input and ask them to work decently?
– Pedro A
Feb 10 at 2:33
"Is there a better way to solve this problem?" - Any chance you can contact the guys who generate that input and ask them to work decently?
– Pedro A
Feb 10 at 2:33
|
show 2 more comments
6 Answers
6
active
oldest
votes
Here's a comparatively simple brute-force approach: split the whole input string on curly braces, then step through the array in order. Whenever you come across an open brace, find the longest chunk of the array from that starting point that successfully parses as JSON. Rinse and repeat.
This will not work if the input contains invalid JSON and/or unbalanced braces (see the last two test cases below.)
const tryJSON = input => {
try {
return JSON.parse(input);
} catch (e) {
return false;
}
}
const parse = input => {
let output = ;
let chunks = input.split(/([{}])/);
for (let i = 0; i < chunks.length; i++) {
if (chunks[i] === '{') {
// found some possible JSON; start at the last } and backtrack until it works.
for (let j = chunks.lastIndexOf('}'); j > i; j--) {
if (chunks[j] === '}') {
// Does it blend?
let parsed = tryJSON(chunks.slice(i, j + 1).join(""))
if (parsed) {
// it does! Grab the whole thing and skip ahead
output.push(parsed);
i = j;
}
}
}
} else if (chunks[i]) {
// neither JSON nor empty
output.push(chunks[i])
}
}
console.log(output)
return output
}
parse(`{"foo": "bar"}`)
parse(`test{"foo": "b}ar{{[[[{}}}}{}{}}"}`)
parse(`this {"is": "a st}ri{ng"} with {"json": ["in", "i{t"]}`)
parse(`{}`)
parse(`this {"i{s": invalid}`)
parse(`So is {this: "one"}`)
add a comment |
You can check if JSON.parse throws an error to determine if the chunk is a valid JSON object or not. If it throws an error then the unquoted }
are unbalanced:
const tests = [
'{"just":"json }}{}{}{{}}}}","x":[1,2,3]}',
'Just a string',
'This string has a tricky case: {"like":"this one } right here"}',
'This string {} has a tiny JSON object in it.',
'.{}.',
'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text',
];
tests.forEach( test => console.log( parse_json_interleaved_string( test ) ) );
function parse_json_interleaved_string ( str ) {
const chunks = [ ];
let last_json_end_index = -1;
let json_index = str.indexOf( '{', last_json_end_index + 1 );
for ( ; json_index !== -1; json_index = str.indexOf( '{', last_json_end_index + 1 ) ) {
// Push the plain string before the JSON
if ( json_index !== last_json_end_index + 1 )
chunks.push( str.substring( last_json_end_index, json_index ) );
let json_end_index = str.indexOf( '}', json_index + 1 );
// Find the end of the JSON
while ( true ) {
try {
JSON.parse( str.substring( json_index, json_end_index + 1 ) );
break;
} catch ( e ) {
json_end_index = str.indexOf( '}', json_end_index + 1 );
if ( json_end_index === -1 )
throw new Error( 'Unterminated JSON object in string' );
}
}
// Push JSON
chunks.push( str.substring( json_index, json_end_index + 1 ) );
last_json_end_index = json_end_index + 1;
}
// Push final plain string if any
if ( last_json_end_index === - 1 )
chunks.push( str );
else if ( str.length !== last_json_end_index )
chunks.push( str.substr( last_json_end_index ) );
return chunks;
}
add a comment |
I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse's job. Is there a better way to solve this problem?
I don't think so. Your input is pretty far from JSON.
But accounting for all those things isn't that hard.
The following snippet should work:
function construct(str) {
const len = str.length
let lastSavedIndex = -1
let bracketLevel = 0
let inJsonString = false
let lastCharWasEscapeChar = false
let result =
for(let i = 0; i < len; ++i) {
if(bracketLevel !== 0 && !lastCharWasEscapeChar && str[i] === '"') {
inJsonString = !inJsonString
}
else if (!inJsonString && str[i] === '{') {
if (bracketLevel === 0) {
result.push(str.substring(lastSavedIndex + 1, i))
lastSavedIndex = i - 1
}
++bracketLevel
}
else if (!inJsonString && str[i] === '}') {
--bracketLevel
if (bracketLevel === 0) {
result.push(JSON.parse(str.substring(lastSavedIndex + 1, i + 1)))
lastSavedIndex = i
}
}
else if (inJsonString && str[i] === '\') {
lastCharWasEscapeChar = !lastCharWasEscapeChar
}
else {
lastCharWasEscapeChar = false
}
}
if(lastSavedIndex !== len -1) {
result.push(str.substring(lastSavedIndex + 1, len))
}
return result
}
const standardText = 'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text. {"foo": "bar}"}'
const inputTA = document.getElementById('input')
const outputDiv = document.getElementById('output')
function updateOutput() {
outputDiv.innerText =
JSON.stringify(
construct(inputTA.value),
null,
2
)
}
inputTA.oninput = updateOutput
inputTA.value = standardText
updateOutput()
<textarea id="input" rows="5" cols="50"></textarea>
<pre id="output"><pre>
add a comment |
You can use RegExp
/(s(?=[{]))|s(?=[ws]+[{])/ig
to .split()
space character followed by opening curly brace {
or space character followed by one or more word or space characters followed by opening curly brace, .filter()
to remove undefined
values from resulting array, create a new array, then while
the resulting split array has .length
get the index where the value contains only space characters, .splice()
the beginning of the matched array to the index plus 1
, if array .length
is 0
.push()
empty string ''
else space character ' '
with match .join()
ed by space character ' '
.replace()
last space character and .shift()
matched array, which is JSON
, then next element of the matched array.
const str = `This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text {"like":"this one } right here"}`;
const formatStringContainingJSON = s => {
const r = /(s(?=[{]))|s(?=[ws]+[{])/ig;
const matches = s.split(r).filter(Boolean);
const res = ;
while (matches.length) {
const index = matches.findIndex(s => /^s+$/.test(s));
const match = matches.splice(0, index + 1);
res.push(
`${!res.length ? '' : ' '}${match.join(' ').replace(/s$/, '')}`
, `${matches.shift()}`
);
};
return res;
}
let result = formatStringContainingJSON(str);
console.log(result);
add a comment |
Here you one approach that iterates char by char. First we create an array from the input and then use reduce()
on it. When we detect an opening curly bracket {
we push the current accumulated chunk on an array of detected results, and then we set a flag on the accumulator
object we are using on reduce
. While this flag is set to true
we will try to parse for a JSON
and only when success we put the chunk
representing the JSON
on the array of detected results and set the flag again to false
.
The accumulator
of the reduce()
method will hold next data:
res
: an array with detected results:strings
orjsons
.
chunk
: a string representing the current accumulated chunk of chars.
isJson
: a boolean indicating if the currentchunk
isjson
or not.
const input = 'This is a string {"with":"json", "in":"it"} followed by more text {"and":{"some":["more","json","data"]}} and more text';
let obj = Array.from(input).reduce(({res, isJson, chunk}, curr) =>
{
if (curr === "{")
{
if (!isJson) res.push(chunk);
chunk = isJson ? chunk + curr : curr;
isJson = true;
}
else if (isJson)
{
try
{
chunk += curr;
JSON.parse(chunk);
// If no error, we found a JSON.
res.push(chunk);
chunk = "";
isJson = false;
}
catch(e) {/* Ignore error */}
}
else
{
chunk += curr;
}
return {res, isJson, chunk};
}, {res:, isJson:false, chunk:""})
// First stage done, lets debug obtained data.
obj.res.push(obj.chunk);
console.log(obj.res);
// Finally, we map the pieces.
let res = obj.res.map(x => x.match("{") ? JSON.parse(x) : x);
console.log(res);
add a comment |
Obligatory answer: this is an improper format (because of this complication, and the guarantee is a security hole if the parser is improperly designed); it should ideally be redesigned. (Sorry, it had to be said.)
Barring that, you can generate a parser using your favorite parser generator that outputs to javascript as a target language. It might even have a demo grammar for JSON.
However, the glaring security issue is incredibly scary (if any JSON gets past the 'guarantee', suddenly it's a vector). An array interspersed representation seems nicer, with the constraint that assert(text.length == markup.length+1)
:
'{
"text": ['Hello', 'this is red text!'],
"markup": [{"text":"everyone", "color":"red"}]
}'
or even nicer:
'[
{"type":"text", "text":"Hello"},
{"type":"markup", "text":"everyone", "color":"red"} # or ,"val":{"text":.., "color":..}}
{"type":"text", "text":"this is red text!"},
...
]'
Store compressed ideally. Unserialize without any worries with JSON.parse.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54608178%2fhow-do-i-parse-json-sprinkled-unpredictably-into-a-string%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
Here's a comparatively simple brute-force approach: split the whole input string on curly braces, then step through the array in order. Whenever you come across an open brace, find the longest chunk of the array from that starting point that successfully parses as JSON. Rinse and repeat.
This will not work if the input contains invalid JSON and/or unbalanced braces (see the last two test cases below.)
const tryJSON = input => {
try {
return JSON.parse(input);
} catch (e) {
return false;
}
}
const parse = input => {
let output = ;
let chunks = input.split(/([{}])/);
for (let i = 0; i < chunks.length; i++) {
if (chunks[i] === '{') {
// found some possible JSON; start at the last } and backtrack until it works.
for (let j = chunks.lastIndexOf('}'); j > i; j--) {
if (chunks[j] === '}') {
// Does it blend?
let parsed = tryJSON(chunks.slice(i, j + 1).join(""))
if (parsed) {
// it does! Grab the whole thing and skip ahead
output.push(parsed);
i = j;
}
}
}
} else if (chunks[i]) {
// neither JSON nor empty
output.push(chunks[i])
}
}
console.log(output)
return output
}
parse(`{"foo": "bar"}`)
parse(`test{"foo": "b}ar{{[[[{}}}}{}{}}"}`)
parse(`this {"is": "a st}ri{ng"} with {"json": ["in", "i{t"]}`)
parse(`{}`)
parse(`this {"i{s": invalid}`)
parse(`So is {this: "one"}`)
add a comment |
Here's a comparatively simple brute-force approach: split the whole input string on curly braces, then step through the array in order. Whenever you come across an open brace, find the longest chunk of the array from that starting point that successfully parses as JSON. Rinse and repeat.
This will not work if the input contains invalid JSON and/or unbalanced braces (see the last two test cases below.)
const tryJSON = input => {
try {
return JSON.parse(input);
} catch (e) {
return false;
}
}
const parse = input => {
let output = ;
let chunks = input.split(/([{}])/);
for (let i = 0; i < chunks.length; i++) {
if (chunks[i] === '{') {
// found some possible JSON; start at the last } and backtrack until it works.
for (let j = chunks.lastIndexOf('}'); j > i; j--) {
if (chunks[j] === '}') {
// Does it blend?
let parsed = tryJSON(chunks.slice(i, j + 1).join(""))
if (parsed) {
// it does! Grab the whole thing and skip ahead
output.push(parsed);
i = j;
}
}
}
} else if (chunks[i]) {
// neither JSON nor empty
output.push(chunks[i])
}
}
console.log(output)
return output
}
parse(`{"foo": "bar"}`)
parse(`test{"foo": "b}ar{{[[[{}}}}{}{}}"}`)
parse(`this {"is": "a st}ri{ng"} with {"json": ["in", "i{t"]}`)
parse(`{}`)
parse(`this {"i{s": invalid}`)
parse(`So is {this: "one"}`)
add a comment |
Here's a comparatively simple brute-force approach: split the whole input string on curly braces, then step through the array in order. Whenever you come across an open brace, find the longest chunk of the array from that starting point that successfully parses as JSON. Rinse and repeat.
This will not work if the input contains invalid JSON and/or unbalanced braces (see the last two test cases below.)
const tryJSON = input => {
try {
return JSON.parse(input);
} catch (e) {
return false;
}
}
const parse = input => {
let output = ;
let chunks = input.split(/([{}])/);
for (let i = 0; i < chunks.length; i++) {
if (chunks[i] === '{') {
// found some possible JSON; start at the last } and backtrack until it works.
for (let j = chunks.lastIndexOf('}'); j > i; j--) {
if (chunks[j] === '}') {
// Does it blend?
let parsed = tryJSON(chunks.slice(i, j + 1).join(""))
if (parsed) {
// it does! Grab the whole thing and skip ahead
output.push(parsed);
i = j;
}
}
}
} else if (chunks[i]) {
// neither JSON nor empty
output.push(chunks[i])
}
}
console.log(output)
return output
}
parse(`{"foo": "bar"}`)
parse(`test{"foo": "b}ar{{[[[{}}}}{}{}}"}`)
parse(`this {"is": "a st}ri{ng"} with {"json": ["in", "i{t"]}`)
parse(`{}`)
parse(`this {"i{s": invalid}`)
parse(`So is {this: "one"}`)
Here's a comparatively simple brute-force approach: split the whole input string on curly braces, then step through the array in order. Whenever you come across an open brace, find the longest chunk of the array from that starting point that successfully parses as JSON. Rinse and repeat.
This will not work if the input contains invalid JSON and/or unbalanced braces (see the last two test cases below.)
const tryJSON = input => {
try {
return JSON.parse(input);
} catch (e) {
return false;
}
}
const parse = input => {
let output = ;
let chunks = input.split(/([{}])/);
for (let i = 0; i < chunks.length; i++) {
if (chunks[i] === '{') {
// found some possible JSON; start at the last } and backtrack until it works.
for (let j = chunks.lastIndexOf('}'); j > i; j--) {
if (chunks[j] === '}') {
// Does it blend?
let parsed = tryJSON(chunks.slice(i, j + 1).join(""))
if (parsed) {
// it does! Grab the whole thing and skip ahead
output.push(parsed);
i = j;
}
}
}
} else if (chunks[i]) {
// neither JSON nor empty
output.push(chunks[i])
}
}
console.log(output)
return output
}
parse(`{"foo": "bar"}`)
parse(`test{"foo": "b}ar{{[[[{}}}}{}{}}"}`)
parse(`this {"is": "a st}ri{ng"} with {"json": ["in", "i{t"]}`)
parse(`{}`)
parse(`this {"i{s": invalid}`)
parse(`So is {this: "one"}`)
const tryJSON = input => {
try {
return JSON.parse(input);
} catch (e) {
return false;
}
}
const parse = input => {
let output = ;
let chunks = input.split(/([{}])/);
for (let i = 0; i < chunks.length; i++) {
if (chunks[i] === '{') {
// found some possible JSON; start at the last } and backtrack until it works.
for (let j = chunks.lastIndexOf('}'); j > i; j--) {
if (chunks[j] === '}') {
// Does it blend?
let parsed = tryJSON(chunks.slice(i, j + 1).join(""))
if (parsed) {
// it does! Grab the whole thing and skip ahead
output.push(parsed);
i = j;
}
}
}
} else if (chunks[i]) {
// neither JSON nor empty
output.push(chunks[i])
}
}
console.log(output)
return output
}
parse(`{"foo": "bar"}`)
parse(`test{"foo": "b}ar{{[[[{}}}}{}{}}"}`)
parse(`this {"is": "a st}ri{ng"} with {"json": ["in", "i{t"]}`)
parse(`{}`)
parse(`this {"i{s": invalid}`)
parse(`So is {this: "one"}`)
const tryJSON = input => {
try {
return JSON.parse(input);
} catch (e) {
return false;
}
}
const parse = input => {
let output = ;
let chunks = input.split(/([{}])/);
for (let i = 0; i < chunks.length; i++) {
if (chunks[i] === '{') {
// found some possible JSON; start at the last } and backtrack until it works.
for (let j = chunks.lastIndexOf('}'); j > i; j--) {
if (chunks[j] === '}') {
// Does it blend?
let parsed = tryJSON(chunks.slice(i, j + 1).join(""))
if (parsed) {
// it does! Grab the whole thing and skip ahead
output.push(parsed);
i = j;
}
}
}
} else if (chunks[i]) {
// neither JSON nor empty
output.push(chunks[i])
}
}
console.log(output)
return output
}
parse(`{"foo": "bar"}`)
parse(`test{"foo": "b}ar{{[[[{}}}}{}{}}"}`)
parse(`this {"is": "a st}ri{ng"} with {"json": ["in", "i{t"]}`)
parse(`{}`)
parse(`this {"i{s": invalid}`)
parse(`So is {this: "one"}`)
edited Feb 9 at 19:06
answered Feb 9 at 18:22
Daniel BeckDaniel Beck
13.6k51844
13.6k51844
add a comment |
add a comment |
You can check if JSON.parse throws an error to determine if the chunk is a valid JSON object or not. If it throws an error then the unquoted }
are unbalanced:
const tests = [
'{"just":"json }}{}{}{{}}}}","x":[1,2,3]}',
'Just a string',
'This string has a tricky case: {"like":"this one } right here"}',
'This string {} has a tiny JSON object in it.',
'.{}.',
'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text',
];
tests.forEach( test => console.log( parse_json_interleaved_string( test ) ) );
function parse_json_interleaved_string ( str ) {
const chunks = [ ];
let last_json_end_index = -1;
let json_index = str.indexOf( '{', last_json_end_index + 1 );
for ( ; json_index !== -1; json_index = str.indexOf( '{', last_json_end_index + 1 ) ) {
// Push the plain string before the JSON
if ( json_index !== last_json_end_index + 1 )
chunks.push( str.substring( last_json_end_index, json_index ) );
let json_end_index = str.indexOf( '}', json_index + 1 );
// Find the end of the JSON
while ( true ) {
try {
JSON.parse( str.substring( json_index, json_end_index + 1 ) );
break;
} catch ( e ) {
json_end_index = str.indexOf( '}', json_end_index + 1 );
if ( json_end_index === -1 )
throw new Error( 'Unterminated JSON object in string' );
}
}
// Push JSON
chunks.push( str.substring( json_index, json_end_index + 1 ) );
last_json_end_index = json_end_index + 1;
}
// Push final plain string if any
if ( last_json_end_index === - 1 )
chunks.push( str );
else if ( str.length !== last_json_end_index )
chunks.push( str.substr( last_json_end_index ) );
return chunks;
}
add a comment |
You can check if JSON.parse throws an error to determine if the chunk is a valid JSON object or not. If it throws an error then the unquoted }
are unbalanced:
const tests = [
'{"just":"json }}{}{}{{}}}}","x":[1,2,3]}',
'Just a string',
'This string has a tricky case: {"like":"this one } right here"}',
'This string {} has a tiny JSON object in it.',
'.{}.',
'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text',
];
tests.forEach( test => console.log( parse_json_interleaved_string( test ) ) );
function parse_json_interleaved_string ( str ) {
const chunks = [ ];
let last_json_end_index = -1;
let json_index = str.indexOf( '{', last_json_end_index + 1 );
for ( ; json_index !== -1; json_index = str.indexOf( '{', last_json_end_index + 1 ) ) {
// Push the plain string before the JSON
if ( json_index !== last_json_end_index + 1 )
chunks.push( str.substring( last_json_end_index, json_index ) );
let json_end_index = str.indexOf( '}', json_index + 1 );
// Find the end of the JSON
while ( true ) {
try {
JSON.parse( str.substring( json_index, json_end_index + 1 ) );
break;
} catch ( e ) {
json_end_index = str.indexOf( '}', json_end_index + 1 );
if ( json_end_index === -1 )
throw new Error( 'Unterminated JSON object in string' );
}
}
// Push JSON
chunks.push( str.substring( json_index, json_end_index + 1 ) );
last_json_end_index = json_end_index + 1;
}
// Push final plain string if any
if ( last_json_end_index === - 1 )
chunks.push( str );
else if ( str.length !== last_json_end_index )
chunks.push( str.substr( last_json_end_index ) );
return chunks;
}
add a comment |
You can check if JSON.parse throws an error to determine if the chunk is a valid JSON object or not. If it throws an error then the unquoted }
are unbalanced:
const tests = [
'{"just":"json }}{}{}{{}}}}","x":[1,2,3]}',
'Just a string',
'This string has a tricky case: {"like":"this one } right here"}',
'This string {} has a tiny JSON object in it.',
'.{}.',
'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text',
];
tests.forEach( test => console.log( parse_json_interleaved_string( test ) ) );
function parse_json_interleaved_string ( str ) {
const chunks = [ ];
let last_json_end_index = -1;
let json_index = str.indexOf( '{', last_json_end_index + 1 );
for ( ; json_index !== -1; json_index = str.indexOf( '{', last_json_end_index + 1 ) ) {
// Push the plain string before the JSON
if ( json_index !== last_json_end_index + 1 )
chunks.push( str.substring( last_json_end_index, json_index ) );
let json_end_index = str.indexOf( '}', json_index + 1 );
// Find the end of the JSON
while ( true ) {
try {
JSON.parse( str.substring( json_index, json_end_index + 1 ) );
break;
} catch ( e ) {
json_end_index = str.indexOf( '}', json_end_index + 1 );
if ( json_end_index === -1 )
throw new Error( 'Unterminated JSON object in string' );
}
}
// Push JSON
chunks.push( str.substring( json_index, json_end_index + 1 ) );
last_json_end_index = json_end_index + 1;
}
// Push final plain string if any
if ( last_json_end_index === - 1 )
chunks.push( str );
else if ( str.length !== last_json_end_index )
chunks.push( str.substr( last_json_end_index ) );
return chunks;
}
You can check if JSON.parse throws an error to determine if the chunk is a valid JSON object or not. If it throws an error then the unquoted }
are unbalanced:
const tests = [
'{"just":"json }}{}{}{{}}}}","x":[1,2,3]}',
'Just a string',
'This string has a tricky case: {"like":"this one } right here"}',
'This string {} has a tiny JSON object in it.',
'.{}.',
'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text',
];
tests.forEach( test => console.log( parse_json_interleaved_string( test ) ) );
function parse_json_interleaved_string ( str ) {
const chunks = [ ];
let last_json_end_index = -1;
let json_index = str.indexOf( '{', last_json_end_index + 1 );
for ( ; json_index !== -1; json_index = str.indexOf( '{', last_json_end_index + 1 ) ) {
// Push the plain string before the JSON
if ( json_index !== last_json_end_index + 1 )
chunks.push( str.substring( last_json_end_index, json_index ) );
let json_end_index = str.indexOf( '}', json_index + 1 );
// Find the end of the JSON
while ( true ) {
try {
JSON.parse( str.substring( json_index, json_end_index + 1 ) );
break;
} catch ( e ) {
json_end_index = str.indexOf( '}', json_end_index + 1 );
if ( json_end_index === -1 )
throw new Error( 'Unterminated JSON object in string' );
}
}
// Push JSON
chunks.push( str.substring( json_index, json_end_index + 1 ) );
last_json_end_index = json_end_index + 1;
}
// Push final plain string if any
if ( last_json_end_index === - 1 )
chunks.push( str );
else if ( str.length !== last_json_end_index )
chunks.push( str.substr( last_json_end_index ) );
return chunks;
}
const tests = [
'{"just":"json }}{}{}{{}}}}","x":[1,2,3]}',
'Just a string',
'This string has a tricky case: {"like":"this one } right here"}',
'This string {} has a tiny JSON object in it.',
'.{}.',
'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text',
];
tests.forEach( test => console.log( parse_json_interleaved_string( test ) ) );
function parse_json_interleaved_string ( str ) {
const chunks = [ ];
let last_json_end_index = -1;
let json_index = str.indexOf( '{', last_json_end_index + 1 );
for ( ; json_index !== -1; json_index = str.indexOf( '{', last_json_end_index + 1 ) ) {
// Push the plain string before the JSON
if ( json_index !== last_json_end_index + 1 )
chunks.push( str.substring( last_json_end_index, json_index ) );
let json_end_index = str.indexOf( '}', json_index + 1 );
// Find the end of the JSON
while ( true ) {
try {
JSON.parse( str.substring( json_index, json_end_index + 1 ) );
break;
} catch ( e ) {
json_end_index = str.indexOf( '}', json_end_index + 1 );
if ( json_end_index === -1 )
throw new Error( 'Unterminated JSON object in string' );
}
}
// Push JSON
chunks.push( str.substring( json_index, json_end_index + 1 ) );
last_json_end_index = json_end_index + 1;
}
// Push final plain string if any
if ( last_json_end_index === - 1 )
chunks.push( str );
else if ( str.length !== last_json_end_index )
chunks.push( str.substr( last_json_end_index ) );
return chunks;
}
const tests = [
'{"just":"json }}{}{}{{}}}}","x":[1,2,3]}',
'Just a string',
'This string has a tricky case: {"like":"this one } right here"}',
'This string {} has a tiny JSON object in it.',
'.{}.',
'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text',
];
tests.forEach( test => console.log( parse_json_interleaved_string( test ) ) );
function parse_json_interleaved_string ( str ) {
const chunks = [ ];
let last_json_end_index = -1;
let json_index = str.indexOf( '{', last_json_end_index + 1 );
for ( ; json_index !== -1; json_index = str.indexOf( '{', last_json_end_index + 1 ) ) {
// Push the plain string before the JSON
if ( json_index !== last_json_end_index + 1 )
chunks.push( str.substring( last_json_end_index, json_index ) );
let json_end_index = str.indexOf( '}', json_index + 1 );
// Find the end of the JSON
while ( true ) {
try {
JSON.parse( str.substring( json_index, json_end_index + 1 ) );
break;
} catch ( e ) {
json_end_index = str.indexOf( '}', json_end_index + 1 );
if ( json_end_index === -1 )
throw new Error( 'Unterminated JSON object in string' );
}
}
// Push JSON
chunks.push( str.substring( json_index, json_end_index + 1 ) );
last_json_end_index = json_end_index + 1;
}
// Push final plain string if any
if ( last_json_end_index === - 1 )
chunks.push( str );
else if ( str.length !== last_json_end_index )
chunks.push( str.substr( last_json_end_index ) );
return chunks;
}
edited Feb 9 at 18:02
answered Feb 9 at 17:28
PaulproPaulpro
114k15226232
114k15226232
add a comment |
add a comment |
I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse's job. Is there a better way to solve this problem?
I don't think so. Your input is pretty far from JSON.
But accounting for all those things isn't that hard.
The following snippet should work:
function construct(str) {
const len = str.length
let lastSavedIndex = -1
let bracketLevel = 0
let inJsonString = false
let lastCharWasEscapeChar = false
let result =
for(let i = 0; i < len; ++i) {
if(bracketLevel !== 0 && !lastCharWasEscapeChar && str[i] === '"') {
inJsonString = !inJsonString
}
else if (!inJsonString && str[i] === '{') {
if (bracketLevel === 0) {
result.push(str.substring(lastSavedIndex + 1, i))
lastSavedIndex = i - 1
}
++bracketLevel
}
else if (!inJsonString && str[i] === '}') {
--bracketLevel
if (bracketLevel === 0) {
result.push(JSON.parse(str.substring(lastSavedIndex + 1, i + 1)))
lastSavedIndex = i
}
}
else if (inJsonString && str[i] === '\') {
lastCharWasEscapeChar = !lastCharWasEscapeChar
}
else {
lastCharWasEscapeChar = false
}
}
if(lastSavedIndex !== len -1) {
result.push(str.substring(lastSavedIndex + 1, len))
}
return result
}
const standardText = 'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text. {"foo": "bar}"}'
const inputTA = document.getElementById('input')
const outputDiv = document.getElementById('output')
function updateOutput() {
outputDiv.innerText =
JSON.stringify(
construct(inputTA.value),
null,
2
)
}
inputTA.oninput = updateOutput
inputTA.value = standardText
updateOutput()
<textarea id="input" rows="5" cols="50"></textarea>
<pre id="output"><pre>
add a comment |
I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse's job. Is there a better way to solve this problem?
I don't think so. Your input is pretty far from JSON.
But accounting for all those things isn't that hard.
The following snippet should work:
function construct(str) {
const len = str.length
let lastSavedIndex = -1
let bracketLevel = 0
let inJsonString = false
let lastCharWasEscapeChar = false
let result =
for(let i = 0; i < len; ++i) {
if(bracketLevel !== 0 && !lastCharWasEscapeChar && str[i] === '"') {
inJsonString = !inJsonString
}
else if (!inJsonString && str[i] === '{') {
if (bracketLevel === 0) {
result.push(str.substring(lastSavedIndex + 1, i))
lastSavedIndex = i - 1
}
++bracketLevel
}
else if (!inJsonString && str[i] === '}') {
--bracketLevel
if (bracketLevel === 0) {
result.push(JSON.parse(str.substring(lastSavedIndex + 1, i + 1)))
lastSavedIndex = i
}
}
else if (inJsonString && str[i] === '\') {
lastCharWasEscapeChar = !lastCharWasEscapeChar
}
else {
lastCharWasEscapeChar = false
}
}
if(lastSavedIndex !== len -1) {
result.push(str.substring(lastSavedIndex + 1, len))
}
return result
}
const standardText = 'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text. {"foo": "bar}"}'
const inputTA = document.getElementById('input')
const outputDiv = document.getElementById('output')
function updateOutput() {
outputDiv.innerText =
JSON.stringify(
construct(inputTA.value),
null,
2
)
}
inputTA.oninput = updateOutput
inputTA.value = standardText
updateOutput()
<textarea id="input" rows="5" cols="50"></textarea>
<pre id="output"><pre>
add a comment |
I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse's job. Is there a better way to solve this problem?
I don't think so. Your input is pretty far from JSON.
But accounting for all those things isn't that hard.
The following snippet should work:
function construct(str) {
const len = str.length
let lastSavedIndex = -1
let bracketLevel = 0
let inJsonString = false
let lastCharWasEscapeChar = false
let result =
for(let i = 0; i < len; ++i) {
if(bracketLevel !== 0 && !lastCharWasEscapeChar && str[i] === '"') {
inJsonString = !inJsonString
}
else if (!inJsonString && str[i] === '{') {
if (bracketLevel === 0) {
result.push(str.substring(lastSavedIndex + 1, i))
lastSavedIndex = i - 1
}
++bracketLevel
}
else if (!inJsonString && str[i] === '}') {
--bracketLevel
if (bracketLevel === 0) {
result.push(JSON.parse(str.substring(lastSavedIndex + 1, i + 1)))
lastSavedIndex = i
}
}
else if (inJsonString && str[i] === '\') {
lastCharWasEscapeChar = !lastCharWasEscapeChar
}
else {
lastCharWasEscapeChar = false
}
}
if(lastSavedIndex !== len -1) {
result.push(str.substring(lastSavedIndex + 1, len))
}
return result
}
const standardText = 'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text. {"foo": "bar}"}'
const inputTA = document.getElementById('input')
const outputDiv = document.getElementById('output')
function updateOutput() {
outputDiv.innerText =
JSON.stringify(
construct(inputTA.value),
null,
2
)
}
inputTA.oninput = updateOutput
inputTA.value = standardText
updateOutput()
<textarea id="input" rows="5" cols="50"></textarea>
<pre id="output"><pre>
I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse's job. Is there a better way to solve this problem?
I don't think so. Your input is pretty far from JSON.
But accounting for all those things isn't that hard.
The following snippet should work:
function construct(str) {
const len = str.length
let lastSavedIndex = -1
let bracketLevel = 0
let inJsonString = false
let lastCharWasEscapeChar = false
let result =
for(let i = 0; i < len; ++i) {
if(bracketLevel !== 0 && !lastCharWasEscapeChar && str[i] === '"') {
inJsonString = !inJsonString
}
else if (!inJsonString && str[i] === '{') {
if (bracketLevel === 0) {
result.push(str.substring(lastSavedIndex + 1, i))
lastSavedIndex = i - 1
}
++bracketLevel
}
else if (!inJsonString && str[i] === '}') {
--bracketLevel
if (bracketLevel === 0) {
result.push(JSON.parse(str.substring(lastSavedIndex + 1, i + 1)))
lastSavedIndex = i
}
}
else if (inJsonString && str[i] === '\') {
lastCharWasEscapeChar = !lastCharWasEscapeChar
}
else {
lastCharWasEscapeChar = false
}
}
if(lastSavedIndex !== len -1) {
result.push(str.substring(lastSavedIndex + 1, len))
}
return result
}
const standardText = 'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text. {"foo": "bar}"}'
const inputTA = document.getElementById('input')
const outputDiv = document.getElementById('output')
function updateOutput() {
outputDiv.innerText =
JSON.stringify(
construct(inputTA.value),
null,
2
)
}
inputTA.oninput = updateOutput
inputTA.value = standardText
updateOutput()
<textarea id="input" rows="5" cols="50"></textarea>
<pre id="output"><pre>
function construct(str) {
const len = str.length
let lastSavedIndex = -1
let bracketLevel = 0
let inJsonString = false
let lastCharWasEscapeChar = false
let result =
for(let i = 0; i < len; ++i) {
if(bracketLevel !== 0 && !lastCharWasEscapeChar && str[i] === '"') {
inJsonString = !inJsonString
}
else if (!inJsonString && str[i] === '{') {
if (bracketLevel === 0) {
result.push(str.substring(lastSavedIndex + 1, i))
lastSavedIndex = i - 1
}
++bracketLevel
}
else if (!inJsonString && str[i] === '}') {
--bracketLevel
if (bracketLevel === 0) {
result.push(JSON.parse(str.substring(lastSavedIndex + 1, i + 1)))
lastSavedIndex = i
}
}
else if (inJsonString && str[i] === '\') {
lastCharWasEscapeChar = !lastCharWasEscapeChar
}
else {
lastCharWasEscapeChar = false
}
}
if(lastSavedIndex !== len -1) {
result.push(str.substring(lastSavedIndex + 1, len))
}
return result
}
const standardText = 'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text. {"foo": "bar}"}'
const inputTA = document.getElementById('input')
const outputDiv = document.getElementById('output')
function updateOutput() {
outputDiv.innerText =
JSON.stringify(
construct(inputTA.value),
null,
2
)
}
inputTA.oninput = updateOutput
inputTA.value = standardText
updateOutput()
<textarea id="input" rows="5" cols="50"></textarea>
<pre id="output"><pre>
function construct(str) {
const len = str.length
let lastSavedIndex = -1
let bracketLevel = 0
let inJsonString = false
let lastCharWasEscapeChar = false
let result =
for(let i = 0; i < len; ++i) {
if(bracketLevel !== 0 && !lastCharWasEscapeChar && str[i] === '"') {
inJsonString = !inJsonString
}
else if (!inJsonString && str[i] === '{') {
if (bracketLevel === 0) {
result.push(str.substring(lastSavedIndex + 1, i))
lastSavedIndex = i - 1
}
++bracketLevel
}
else if (!inJsonString && str[i] === '}') {
--bracketLevel
if (bracketLevel === 0) {
result.push(JSON.parse(str.substring(lastSavedIndex + 1, i + 1)))
lastSavedIndex = i
}
}
else if (inJsonString && str[i] === '\') {
lastCharWasEscapeChar = !lastCharWasEscapeChar
}
else {
lastCharWasEscapeChar = false
}
}
if(lastSavedIndex !== len -1) {
result.push(str.substring(lastSavedIndex + 1, len))
}
return result
}
const standardText = 'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text. {"foo": "bar}"}'
const inputTA = document.getElementById('input')
const outputDiv = document.getElementById('output')
function updateOutput() {
outputDiv.innerText =
JSON.stringify(
construct(inputTA.value),
null,
2
)
}
inputTA.oninput = updateOutput
inputTA.value = standardText
updateOutput()
<textarea id="input" rows="5" cols="50"></textarea>
<pre id="output"><pre>
answered Feb 9 at 17:55
SourceOverflowSourceOverflow
1,6181416
1,6181416
add a comment |
add a comment |
You can use RegExp
/(s(?=[{]))|s(?=[ws]+[{])/ig
to .split()
space character followed by opening curly brace {
or space character followed by one or more word or space characters followed by opening curly brace, .filter()
to remove undefined
values from resulting array, create a new array, then while
the resulting split array has .length
get the index where the value contains only space characters, .splice()
the beginning of the matched array to the index plus 1
, if array .length
is 0
.push()
empty string ''
else space character ' '
with match .join()
ed by space character ' '
.replace()
last space character and .shift()
matched array, which is JSON
, then next element of the matched array.
const str = `This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text {"like":"this one } right here"}`;
const formatStringContainingJSON = s => {
const r = /(s(?=[{]))|s(?=[ws]+[{])/ig;
const matches = s.split(r).filter(Boolean);
const res = ;
while (matches.length) {
const index = matches.findIndex(s => /^s+$/.test(s));
const match = matches.splice(0, index + 1);
res.push(
`${!res.length ? '' : ' '}${match.join(' ').replace(/s$/, '')}`
, `${matches.shift()}`
);
};
return res;
}
let result = formatStringContainingJSON(str);
console.log(result);
add a comment |
You can use RegExp
/(s(?=[{]))|s(?=[ws]+[{])/ig
to .split()
space character followed by opening curly brace {
or space character followed by one or more word or space characters followed by opening curly brace, .filter()
to remove undefined
values from resulting array, create a new array, then while
the resulting split array has .length
get the index where the value contains only space characters, .splice()
the beginning of the matched array to the index plus 1
, if array .length
is 0
.push()
empty string ''
else space character ' '
with match .join()
ed by space character ' '
.replace()
last space character and .shift()
matched array, which is JSON
, then next element of the matched array.
const str = `This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text {"like":"this one } right here"}`;
const formatStringContainingJSON = s => {
const r = /(s(?=[{]))|s(?=[ws]+[{])/ig;
const matches = s.split(r).filter(Boolean);
const res = ;
while (matches.length) {
const index = matches.findIndex(s => /^s+$/.test(s));
const match = matches.splice(0, index + 1);
res.push(
`${!res.length ? '' : ' '}${match.join(' ').replace(/s$/, '')}`
, `${matches.shift()}`
);
};
return res;
}
let result = formatStringContainingJSON(str);
console.log(result);
add a comment |
You can use RegExp
/(s(?=[{]))|s(?=[ws]+[{])/ig
to .split()
space character followed by opening curly brace {
or space character followed by one or more word or space characters followed by opening curly brace, .filter()
to remove undefined
values from resulting array, create a new array, then while
the resulting split array has .length
get the index where the value contains only space characters, .splice()
the beginning of the matched array to the index plus 1
, if array .length
is 0
.push()
empty string ''
else space character ' '
with match .join()
ed by space character ' '
.replace()
last space character and .shift()
matched array, which is JSON
, then next element of the matched array.
const str = `This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text {"like":"this one } right here"}`;
const formatStringContainingJSON = s => {
const r = /(s(?=[{]))|s(?=[ws]+[{])/ig;
const matches = s.split(r).filter(Boolean);
const res = ;
while (matches.length) {
const index = matches.findIndex(s => /^s+$/.test(s));
const match = matches.splice(0, index + 1);
res.push(
`${!res.length ? '' : ' '}${match.join(' ').replace(/s$/, '')}`
, `${matches.shift()}`
);
};
return res;
}
let result = formatStringContainingJSON(str);
console.log(result);
You can use RegExp
/(s(?=[{]))|s(?=[ws]+[{])/ig
to .split()
space character followed by opening curly brace {
or space character followed by one or more word or space characters followed by opening curly brace, .filter()
to remove undefined
values from resulting array, create a new array, then while
the resulting split array has .length
get the index where the value contains only space characters, .splice()
the beginning of the matched array to the index plus 1
, if array .length
is 0
.push()
empty string ''
else space character ' '
with match .join()
ed by space character ' '
.replace()
last space character and .shift()
matched array, which is JSON
, then next element of the matched array.
const str = `This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text {"like":"this one } right here"}`;
const formatStringContainingJSON = s => {
const r = /(s(?=[{]))|s(?=[ws]+[{])/ig;
const matches = s.split(r).filter(Boolean);
const res = ;
while (matches.length) {
const index = matches.findIndex(s => /^s+$/.test(s));
const match = matches.splice(0, index + 1);
res.push(
`${!res.length ? '' : ' '}${match.join(' ').replace(/s$/, '')}`
, `${matches.shift()}`
);
};
return res;
}
let result = formatStringContainingJSON(str);
console.log(result);
const str = `This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text {"like":"this one } right here"}`;
const formatStringContainingJSON = s => {
const r = /(s(?=[{]))|s(?=[ws]+[{])/ig;
const matches = s.split(r).filter(Boolean);
const res = ;
while (matches.length) {
const index = matches.findIndex(s => /^s+$/.test(s));
const match = matches.splice(0, index + 1);
res.push(
`${!res.length ? '' : ' '}${match.join(' ').replace(/s$/, '')}`
, `${matches.shift()}`
);
};
return res;
}
let result = formatStringContainingJSON(str);
console.log(result);
const str = `This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text {"like":"this one } right here"}`;
const formatStringContainingJSON = s => {
const r = /(s(?=[{]))|s(?=[ws]+[{])/ig;
const matches = s.split(r).filter(Boolean);
const res = ;
while (matches.length) {
const index = matches.findIndex(s => /^s+$/.test(s));
const match = matches.splice(0, index + 1);
res.push(
`${!res.length ? '' : ' '}${match.join(' ').replace(/s$/, '')}`
, `${matches.shift()}`
);
};
return res;
}
let result = formatStringContainingJSON(str);
console.log(result);
answered Feb 9 at 19:43
guest271314guest271314
1
1
add a comment |
add a comment |
Here you one approach that iterates char by char. First we create an array from the input and then use reduce()
on it. When we detect an opening curly bracket {
we push the current accumulated chunk on an array of detected results, and then we set a flag on the accumulator
object we are using on reduce
. While this flag is set to true
we will try to parse for a JSON
and only when success we put the chunk
representing the JSON
on the array of detected results and set the flag again to false
.
The accumulator
of the reduce()
method will hold next data:
res
: an array with detected results:strings
orjsons
.
chunk
: a string representing the current accumulated chunk of chars.
isJson
: a boolean indicating if the currentchunk
isjson
or not.
const input = 'This is a string {"with":"json", "in":"it"} followed by more text {"and":{"some":["more","json","data"]}} and more text';
let obj = Array.from(input).reduce(({res, isJson, chunk}, curr) =>
{
if (curr === "{")
{
if (!isJson) res.push(chunk);
chunk = isJson ? chunk + curr : curr;
isJson = true;
}
else if (isJson)
{
try
{
chunk += curr;
JSON.parse(chunk);
// If no error, we found a JSON.
res.push(chunk);
chunk = "";
isJson = false;
}
catch(e) {/* Ignore error */}
}
else
{
chunk += curr;
}
return {res, isJson, chunk};
}, {res:, isJson:false, chunk:""})
// First stage done, lets debug obtained data.
obj.res.push(obj.chunk);
console.log(obj.res);
// Finally, we map the pieces.
let res = obj.res.map(x => x.match("{") ? JSON.parse(x) : x);
console.log(res);
add a comment |
Here you one approach that iterates char by char. First we create an array from the input and then use reduce()
on it. When we detect an opening curly bracket {
we push the current accumulated chunk on an array of detected results, and then we set a flag on the accumulator
object we are using on reduce
. While this flag is set to true
we will try to parse for a JSON
and only when success we put the chunk
representing the JSON
on the array of detected results and set the flag again to false
.
The accumulator
of the reduce()
method will hold next data:
res
: an array with detected results:strings
orjsons
.
chunk
: a string representing the current accumulated chunk of chars.
isJson
: a boolean indicating if the currentchunk
isjson
or not.
const input = 'This is a string {"with":"json", "in":"it"} followed by more text {"and":{"some":["more","json","data"]}} and more text';
let obj = Array.from(input).reduce(({res, isJson, chunk}, curr) =>
{
if (curr === "{")
{
if (!isJson) res.push(chunk);
chunk = isJson ? chunk + curr : curr;
isJson = true;
}
else if (isJson)
{
try
{
chunk += curr;
JSON.parse(chunk);
// If no error, we found a JSON.
res.push(chunk);
chunk = "";
isJson = false;
}
catch(e) {/* Ignore error */}
}
else
{
chunk += curr;
}
return {res, isJson, chunk};
}, {res:, isJson:false, chunk:""})
// First stage done, lets debug obtained data.
obj.res.push(obj.chunk);
console.log(obj.res);
// Finally, we map the pieces.
let res = obj.res.map(x => x.match("{") ? JSON.parse(x) : x);
console.log(res);
add a comment |
Here you one approach that iterates char by char. First we create an array from the input and then use reduce()
on it. When we detect an opening curly bracket {
we push the current accumulated chunk on an array of detected results, and then we set a flag on the accumulator
object we are using on reduce
. While this flag is set to true
we will try to parse for a JSON
and only when success we put the chunk
representing the JSON
on the array of detected results and set the flag again to false
.
The accumulator
of the reduce()
method will hold next data:
res
: an array with detected results:strings
orjsons
.
chunk
: a string representing the current accumulated chunk of chars.
isJson
: a boolean indicating if the currentchunk
isjson
or not.
const input = 'This is a string {"with":"json", "in":"it"} followed by more text {"and":{"some":["more","json","data"]}} and more text';
let obj = Array.from(input).reduce(({res, isJson, chunk}, curr) =>
{
if (curr === "{")
{
if (!isJson) res.push(chunk);
chunk = isJson ? chunk + curr : curr;
isJson = true;
}
else if (isJson)
{
try
{
chunk += curr;
JSON.parse(chunk);
// If no error, we found a JSON.
res.push(chunk);
chunk = "";
isJson = false;
}
catch(e) {/* Ignore error */}
}
else
{
chunk += curr;
}
return {res, isJson, chunk};
}, {res:, isJson:false, chunk:""})
// First stage done, lets debug obtained data.
obj.res.push(obj.chunk);
console.log(obj.res);
// Finally, we map the pieces.
let res = obj.res.map(x => x.match("{") ? JSON.parse(x) : x);
console.log(res);
Here you one approach that iterates char by char. First we create an array from the input and then use reduce()
on it. When we detect an opening curly bracket {
we push the current accumulated chunk on an array of detected results, and then we set a flag on the accumulator
object we are using on reduce
. While this flag is set to true
we will try to parse for a JSON
and only when success we put the chunk
representing the JSON
on the array of detected results and set the flag again to false
.
The accumulator
of the reduce()
method will hold next data:
res
: an array with detected results:strings
orjsons
.
chunk
: a string representing the current accumulated chunk of chars.
isJson
: a boolean indicating if the currentchunk
isjson
or not.
const input = 'This is a string {"with":"json", "in":"it"} followed by more text {"and":{"some":["more","json","data"]}} and more text';
let obj = Array.from(input).reduce(({res, isJson, chunk}, curr) =>
{
if (curr === "{")
{
if (!isJson) res.push(chunk);
chunk = isJson ? chunk + curr : curr;
isJson = true;
}
else if (isJson)
{
try
{
chunk += curr;
JSON.parse(chunk);
// If no error, we found a JSON.
res.push(chunk);
chunk = "";
isJson = false;
}
catch(e) {/* Ignore error */}
}
else
{
chunk += curr;
}
return {res, isJson, chunk};
}, {res:, isJson:false, chunk:""})
// First stage done, lets debug obtained data.
obj.res.push(obj.chunk);
console.log(obj.res);
// Finally, we map the pieces.
let res = obj.res.map(x => x.match("{") ? JSON.parse(x) : x);
console.log(res);
const input = 'This is a string {"with":"json", "in":"it"} followed by more text {"and":{"some":["more","json","data"]}} and more text';
let obj = Array.from(input).reduce(({res, isJson, chunk}, curr) =>
{
if (curr === "{")
{
if (!isJson) res.push(chunk);
chunk = isJson ? chunk + curr : curr;
isJson = true;
}
else if (isJson)
{
try
{
chunk += curr;
JSON.parse(chunk);
// If no error, we found a JSON.
res.push(chunk);
chunk = "";
isJson = false;
}
catch(e) {/* Ignore error */}
}
else
{
chunk += curr;
}
return {res, isJson, chunk};
}, {res:, isJson:false, chunk:""})
// First stage done, lets debug obtained data.
obj.res.push(obj.chunk);
console.log(obj.res);
// Finally, we map the pieces.
let res = obj.res.map(x => x.match("{") ? JSON.parse(x) : x);
console.log(res);
const input = 'This is a string {"with":"json", "in":"it"} followed by more text {"and":{"some":["more","json","data"]}} and more text';
let obj = Array.from(input).reduce(({res, isJson, chunk}, curr) =>
{
if (curr === "{")
{
if (!isJson) res.push(chunk);
chunk = isJson ? chunk + curr : curr;
isJson = true;
}
else if (isJson)
{
try
{
chunk += curr;
JSON.parse(chunk);
// If no error, we found a JSON.
res.push(chunk);
chunk = "";
isJson = false;
}
catch(e) {/* Ignore error */}
}
else
{
chunk += curr;
}
return {res, isJson, chunk};
}, {res:, isJson:false, chunk:""})
// First stage done, lets debug obtained data.
obj.res.push(obj.chunk);
console.log(obj.res);
// Finally, we map the pieces.
let res = obj.res.map(x => x.match("{") ? JSON.parse(x) : x);
console.log(res);
edited Feb 9 at 23:27
answered Feb 9 at 19:03
ShiderszShidersz
9,0612833
9,0612833
add a comment |
add a comment |
Obligatory answer: this is an improper format (because of this complication, and the guarantee is a security hole if the parser is improperly designed); it should ideally be redesigned. (Sorry, it had to be said.)
Barring that, you can generate a parser using your favorite parser generator that outputs to javascript as a target language. It might even have a demo grammar for JSON.
However, the glaring security issue is incredibly scary (if any JSON gets past the 'guarantee', suddenly it's a vector). An array interspersed representation seems nicer, with the constraint that assert(text.length == markup.length+1)
:
'{
"text": ['Hello', 'this is red text!'],
"markup": [{"text":"everyone", "color":"red"}]
}'
or even nicer:
'[
{"type":"text", "text":"Hello"},
{"type":"markup", "text":"everyone", "color":"red"} # or ,"val":{"text":.., "color":..}}
{"type":"text", "text":"this is red text!"},
...
]'
Store compressed ideally. Unserialize without any worries with JSON.parse.
add a comment |
Obligatory answer: this is an improper format (because of this complication, and the guarantee is a security hole if the parser is improperly designed); it should ideally be redesigned. (Sorry, it had to be said.)
Barring that, you can generate a parser using your favorite parser generator that outputs to javascript as a target language. It might even have a demo grammar for JSON.
However, the glaring security issue is incredibly scary (if any JSON gets past the 'guarantee', suddenly it's a vector). An array interspersed representation seems nicer, with the constraint that assert(text.length == markup.length+1)
:
'{
"text": ['Hello', 'this is red text!'],
"markup": [{"text":"everyone", "color":"red"}]
}'
or even nicer:
'[
{"type":"text", "text":"Hello"},
{"type":"markup", "text":"everyone", "color":"red"} # or ,"val":{"text":.., "color":..}}
{"type":"text", "text":"this is red text!"},
...
]'
Store compressed ideally. Unserialize without any worries with JSON.parse.
add a comment |
Obligatory answer: this is an improper format (because of this complication, and the guarantee is a security hole if the parser is improperly designed); it should ideally be redesigned. (Sorry, it had to be said.)
Barring that, you can generate a parser using your favorite parser generator that outputs to javascript as a target language. It might even have a demo grammar for JSON.
However, the glaring security issue is incredibly scary (if any JSON gets past the 'guarantee', suddenly it's a vector). An array interspersed representation seems nicer, with the constraint that assert(text.length == markup.length+1)
:
'{
"text": ['Hello', 'this is red text!'],
"markup": [{"text":"everyone", "color":"red"}]
}'
or even nicer:
'[
{"type":"text", "text":"Hello"},
{"type":"markup", "text":"everyone", "color":"red"} # or ,"val":{"text":.., "color":..}}
{"type":"text", "text":"this is red text!"},
...
]'
Store compressed ideally. Unserialize without any worries with JSON.parse.
Obligatory answer: this is an improper format (because of this complication, and the guarantee is a security hole if the parser is improperly designed); it should ideally be redesigned. (Sorry, it had to be said.)
Barring that, you can generate a parser using your favorite parser generator that outputs to javascript as a target language. It might even have a demo grammar for JSON.
However, the glaring security issue is incredibly scary (if any JSON gets past the 'guarantee', suddenly it's a vector). An array interspersed representation seems nicer, with the constraint that assert(text.length == markup.length+1)
:
'{
"text": ['Hello', 'this is red text!'],
"markup": [{"text":"everyone", "color":"red"}]
}'
or even nicer:
'[
{"type":"text", "text":"Hello"},
{"type":"markup", "text":"everyone", "color":"red"} # or ,"val":{"text":.., "color":..}}
{"type":"text", "text":"this is red text!"},
...
]'
Store compressed ideally. Unserialize without any worries with JSON.parse.
answered Feb 15 at 9:45
ninjageckoninjagecko
60.9k17113122
60.9k17113122
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54608178%2fhow-do-i-parse-json-sprinkled-unpredictably-into-a-string%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
3
There is no foolproof way to do this short of writing a full-blown parser yourself.
– Jared Smith
Feb 9 at 16:30
7
Wow, that's a nightmare of a data format. You could maybe brute-force it by starting from each
{
, greedy-matching to the last}
, then backtracking until JSON.parse stops throwing errors on the contents; rinse and repeat. Computationally expensive (and ridiculous) but at least you wouldn't have to write your own parser.– Daniel Beck
Feb 9 at 16:33
2
"strings with JSON arbitrarily sprinkled into them, like so:" Note,
"This is a string "
is validJSON
.– guest271314
Feb 9 at 16:42
Does the actual input data contain space characters following the comma, between the keys
{"with":"json", "in":"it"}
?– guest271314
Feb 9 at 17:03
2
"Is there a better way to solve this problem?" - Any chance you can contact the guys who generate that input and ask them to work decently?
– Pedro A
Feb 10 at 2:33