Chemical Formula Parser
$begingroup$
As an exercise, I wrote a simple context free parser modeled from a pushdown automaton for chemical formulas such as NO2
or Fe(NO3)3
.
To explain a bit about the decisions made:
The state functions (
pstart
,pchem
andpnum
) are lambdas because they carry around a lot of context. They could be methods of a class, it doesn't really matter.The state functions have the input character injected because I realized halfway that I have to process some more after the line ended and I don't want to duplicate the state functions.
I am mostly concerned about the general messiness of the parser. There is no obvious way to know where variables are mutated, control flow is all over the place and consequently I find it difficult to reason about the program. Is there a better code structure?
Finally, I'd like to know if I caught all illegal syntax. Note repeating chemicals like NN
is allowed.
#include<iostream>
#include<stack>
#include<unordered_map>
#include<string>
#include<iterator>
using namespace std::literals;
enum class state_t { start, num, chem, error };
double parse(std::string line, const std::unordered_map<std::string, double>& m)
{
auto b = line.begin(), e = line.end();
std::stack<double> stk;
int num = 0;
std::string chem;
state_t state = state_t::start;
auto pstart = [&](char c) {
if(std::isupper(c))
{
chem = ""s + c;
state = state_t::chem;
return true;
}
else if(std::isdigit(c))
{
if(stk.empty())
state = state_t::error;
else
{
num = c - '0';
state = state_t::num;
return true;
}
}
else if(c == '(')
{
stk.push(-1);
return true;
}
else if(c == ')')
{
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.empty())
state = state_t::error;
else
{
stk.pop();
stk.push(sum);
return true;
}
}
else
state = state_t::error;
return false;
};
auto pnum = [&](char c) {
if(std::isdigit(c))
{
num = 10 * num + c - '0';
return true;
}
else
{
stk.top() *= num;
state = state_t::start;
}
return false;
};
auto pchem = [&](char c){
if(std::islower(c))
{
chem += c;
return true;
}
else
{
if(auto it = m.find(chem); it != m.end())
{
stk.push(it->second);
state = state_t::start;
}
else
state = state_t::error;
}
return false;
};
while(b != e)
{
switch(state)
{
case state_t::start:
if(pstart(*b))
b++;
break;
case state_t::num:
if(pnum(*b))
b++;
break;
case state_t::chem:
if(pchem(*b))
b++;
break;
default:
return -1;
}
}
switch(state)
{
case state_t::num:
pnum('n');
break;
case state_t::chem:
pchem('n');
break;
}
if(state == state_t::error)
return -1;
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.size() > 0) // expected ')'
return -1;
return sum;
}
int main()
{
std::unordered_map<std::string, double> m;
m["Na"] = 23.5;
m["Cl"] = 35.5;
m["O"] = 16;
m["N"] = 14;
m["Fe"] = 55.8;
std::string line;
while(getline(std::cin, line))
std::cout << parse(std::move(line), m) << 'n';
}
c++ parsing
$endgroup$
|
show 2 more comments
$begingroup$
As an exercise, I wrote a simple context free parser modeled from a pushdown automaton for chemical formulas such as NO2
or Fe(NO3)3
.
To explain a bit about the decisions made:
The state functions (
pstart
,pchem
andpnum
) are lambdas because they carry around a lot of context. They could be methods of a class, it doesn't really matter.The state functions have the input character injected because I realized halfway that I have to process some more after the line ended and I don't want to duplicate the state functions.
I am mostly concerned about the general messiness of the parser. There is no obvious way to know where variables are mutated, control flow is all over the place and consequently I find it difficult to reason about the program. Is there a better code structure?
Finally, I'd like to know if I caught all illegal syntax. Note repeating chemicals like NN
is allowed.
#include<iostream>
#include<stack>
#include<unordered_map>
#include<string>
#include<iterator>
using namespace std::literals;
enum class state_t { start, num, chem, error };
double parse(std::string line, const std::unordered_map<std::string, double>& m)
{
auto b = line.begin(), e = line.end();
std::stack<double> stk;
int num = 0;
std::string chem;
state_t state = state_t::start;
auto pstart = [&](char c) {
if(std::isupper(c))
{
chem = ""s + c;
state = state_t::chem;
return true;
}
else if(std::isdigit(c))
{
if(stk.empty())
state = state_t::error;
else
{
num = c - '0';
state = state_t::num;
return true;
}
}
else if(c == '(')
{
stk.push(-1);
return true;
}
else if(c == ')')
{
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.empty())
state = state_t::error;
else
{
stk.pop();
stk.push(sum);
return true;
}
}
else
state = state_t::error;
return false;
};
auto pnum = [&](char c) {
if(std::isdigit(c))
{
num = 10 * num + c - '0';
return true;
}
else
{
stk.top() *= num;
state = state_t::start;
}
return false;
};
auto pchem = [&](char c){
if(std::islower(c))
{
chem += c;
return true;
}
else
{
if(auto it = m.find(chem); it != m.end())
{
stk.push(it->second);
state = state_t::start;
}
else
state = state_t::error;
}
return false;
};
while(b != e)
{
switch(state)
{
case state_t::start:
if(pstart(*b))
b++;
break;
case state_t::num:
if(pnum(*b))
b++;
break;
case state_t::chem:
if(pchem(*b))
b++;
break;
default:
return -1;
}
}
switch(state)
{
case state_t::num:
pnum('n');
break;
case state_t::chem:
pchem('n');
break;
}
if(state == state_t::error)
return -1;
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.size() > 0) // expected ')'
return -1;
return sum;
}
int main()
{
std::unordered_map<std::string, double> m;
m["Na"] = 23.5;
m["Cl"] = 35.5;
m["O"] = 16;
m["N"] = 14;
m["Fe"] = 55.8;
std::string line;
while(getline(std::cin, line))
std::cout << parse(std::move(line), m) << 'n';
}
c++ parsing
$endgroup$
1
$begingroup$
I realizedif(stk.size() > 1)
should beif(stk.size() > 0)
, should I edit?
$endgroup$
– 友人A
Jan 27 at 11:08
2
$begingroup$
Feel free to edit as long as there are no answers. But try to not edit as time goes by.
$endgroup$
– Incomputable
Jan 27 at 11:26
2
$begingroup$
Note, the singular is “automaton”.
$endgroup$
– Konrad Rudolph
Jan 27 at 14:26
1
$begingroup$
There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
$endgroup$
– Martin York
Jan 27 at 18:50
1
$begingroup$
You said that you wrote a parser but the result of the function isdouble
so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the functionparse
should be renamed appropriately.
$endgroup$
– Konrad Rudolph
Jan 28 at 11:32
|
show 2 more comments
$begingroup$
As an exercise, I wrote a simple context free parser modeled from a pushdown automaton for chemical formulas such as NO2
or Fe(NO3)3
.
To explain a bit about the decisions made:
The state functions (
pstart
,pchem
andpnum
) are lambdas because they carry around a lot of context. They could be methods of a class, it doesn't really matter.The state functions have the input character injected because I realized halfway that I have to process some more after the line ended and I don't want to duplicate the state functions.
I am mostly concerned about the general messiness of the parser. There is no obvious way to know where variables are mutated, control flow is all over the place and consequently I find it difficult to reason about the program. Is there a better code structure?
Finally, I'd like to know if I caught all illegal syntax. Note repeating chemicals like NN
is allowed.
#include<iostream>
#include<stack>
#include<unordered_map>
#include<string>
#include<iterator>
using namespace std::literals;
enum class state_t { start, num, chem, error };
double parse(std::string line, const std::unordered_map<std::string, double>& m)
{
auto b = line.begin(), e = line.end();
std::stack<double> stk;
int num = 0;
std::string chem;
state_t state = state_t::start;
auto pstart = [&](char c) {
if(std::isupper(c))
{
chem = ""s + c;
state = state_t::chem;
return true;
}
else if(std::isdigit(c))
{
if(stk.empty())
state = state_t::error;
else
{
num = c - '0';
state = state_t::num;
return true;
}
}
else if(c == '(')
{
stk.push(-1);
return true;
}
else if(c == ')')
{
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.empty())
state = state_t::error;
else
{
stk.pop();
stk.push(sum);
return true;
}
}
else
state = state_t::error;
return false;
};
auto pnum = [&](char c) {
if(std::isdigit(c))
{
num = 10 * num + c - '0';
return true;
}
else
{
stk.top() *= num;
state = state_t::start;
}
return false;
};
auto pchem = [&](char c){
if(std::islower(c))
{
chem += c;
return true;
}
else
{
if(auto it = m.find(chem); it != m.end())
{
stk.push(it->second);
state = state_t::start;
}
else
state = state_t::error;
}
return false;
};
while(b != e)
{
switch(state)
{
case state_t::start:
if(pstart(*b))
b++;
break;
case state_t::num:
if(pnum(*b))
b++;
break;
case state_t::chem:
if(pchem(*b))
b++;
break;
default:
return -1;
}
}
switch(state)
{
case state_t::num:
pnum('n');
break;
case state_t::chem:
pchem('n');
break;
}
if(state == state_t::error)
return -1;
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.size() > 0) // expected ')'
return -1;
return sum;
}
int main()
{
std::unordered_map<std::string, double> m;
m["Na"] = 23.5;
m["Cl"] = 35.5;
m["O"] = 16;
m["N"] = 14;
m["Fe"] = 55.8;
std::string line;
while(getline(std::cin, line))
std::cout << parse(std::move(line), m) << 'n';
}
c++ parsing
$endgroup$
As an exercise, I wrote a simple context free parser modeled from a pushdown automaton for chemical formulas such as NO2
or Fe(NO3)3
.
To explain a bit about the decisions made:
The state functions (
pstart
,pchem
andpnum
) are lambdas because they carry around a lot of context. They could be methods of a class, it doesn't really matter.The state functions have the input character injected because I realized halfway that I have to process some more after the line ended and I don't want to duplicate the state functions.
I am mostly concerned about the general messiness of the parser. There is no obvious way to know where variables are mutated, control flow is all over the place and consequently I find it difficult to reason about the program. Is there a better code structure?
Finally, I'd like to know if I caught all illegal syntax. Note repeating chemicals like NN
is allowed.
#include<iostream>
#include<stack>
#include<unordered_map>
#include<string>
#include<iterator>
using namespace std::literals;
enum class state_t { start, num, chem, error };
double parse(std::string line, const std::unordered_map<std::string, double>& m)
{
auto b = line.begin(), e = line.end();
std::stack<double> stk;
int num = 0;
std::string chem;
state_t state = state_t::start;
auto pstart = [&](char c) {
if(std::isupper(c))
{
chem = ""s + c;
state = state_t::chem;
return true;
}
else if(std::isdigit(c))
{
if(stk.empty())
state = state_t::error;
else
{
num = c - '0';
state = state_t::num;
return true;
}
}
else if(c == '(')
{
stk.push(-1);
return true;
}
else if(c == ')')
{
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.empty())
state = state_t::error;
else
{
stk.pop();
stk.push(sum);
return true;
}
}
else
state = state_t::error;
return false;
};
auto pnum = [&](char c) {
if(std::isdigit(c))
{
num = 10 * num + c - '0';
return true;
}
else
{
stk.top() *= num;
state = state_t::start;
}
return false;
};
auto pchem = [&](char c){
if(std::islower(c))
{
chem += c;
return true;
}
else
{
if(auto it = m.find(chem); it != m.end())
{
stk.push(it->second);
state = state_t::start;
}
else
state = state_t::error;
}
return false;
};
while(b != e)
{
switch(state)
{
case state_t::start:
if(pstart(*b))
b++;
break;
case state_t::num:
if(pnum(*b))
b++;
break;
case state_t::chem:
if(pchem(*b))
b++;
break;
default:
return -1;
}
}
switch(state)
{
case state_t::num:
pnum('n');
break;
case state_t::chem:
pchem('n');
break;
}
if(state == state_t::error)
return -1;
double sum = 0;
while(!stk.empty() && stk.top() > 0)
{
sum += stk.top();
stk.pop();
}
if(stk.size() > 0) // expected ')'
return -1;
return sum;
}
int main()
{
std::unordered_map<std::string, double> m;
m["Na"] = 23.5;
m["Cl"] = 35.5;
m["O"] = 16;
m["N"] = 14;
m["Fe"] = 55.8;
std::string line;
while(getline(std::cin, line))
std::cout << parse(std::move(line), m) << 'n';
}
c++ parsing
c++ parsing
edited Jan 27 at 14:38
友人A
asked Jan 27 at 10:26
友人A友人A
436
436
1
$begingroup$
I realizedif(stk.size() > 1)
should beif(stk.size() > 0)
, should I edit?
$endgroup$
– 友人A
Jan 27 at 11:08
2
$begingroup$
Feel free to edit as long as there are no answers. But try to not edit as time goes by.
$endgroup$
– Incomputable
Jan 27 at 11:26
2
$begingroup$
Note, the singular is “automaton”.
$endgroup$
– Konrad Rudolph
Jan 27 at 14:26
1
$begingroup$
There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
$endgroup$
– Martin York
Jan 27 at 18:50
1
$begingroup$
You said that you wrote a parser but the result of the function isdouble
so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the functionparse
should be renamed appropriately.
$endgroup$
– Konrad Rudolph
Jan 28 at 11:32
|
show 2 more comments
1
$begingroup$
I realizedif(stk.size() > 1)
should beif(stk.size() > 0)
, should I edit?
$endgroup$
– 友人A
Jan 27 at 11:08
2
$begingroup$
Feel free to edit as long as there are no answers. But try to not edit as time goes by.
$endgroup$
– Incomputable
Jan 27 at 11:26
2
$begingroup$
Note, the singular is “automaton”.
$endgroup$
– Konrad Rudolph
Jan 27 at 14:26
1
$begingroup$
There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
$endgroup$
– Martin York
Jan 27 at 18:50
1
$begingroup$
You said that you wrote a parser but the result of the function isdouble
so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the functionparse
should be renamed appropriately.
$endgroup$
– Konrad Rudolph
Jan 28 at 11:32
1
1
$begingroup$
I realized
if(stk.size() > 1)
should be if(stk.size() > 0)
, should I edit?$endgroup$
– 友人A
Jan 27 at 11:08
$begingroup$
I realized
if(stk.size() > 1)
should be if(stk.size() > 0)
, should I edit?$endgroup$
– 友人A
Jan 27 at 11:08
2
2
$begingroup$
Feel free to edit as long as there are no answers. But try to not edit as time goes by.
$endgroup$
– Incomputable
Jan 27 at 11:26
$begingroup$
Feel free to edit as long as there are no answers. But try to not edit as time goes by.
$endgroup$
– Incomputable
Jan 27 at 11:26
2
2
$begingroup$
Note, the singular is “automaton”.
$endgroup$
– Konrad Rudolph
Jan 27 at 14:26
$begingroup$
Note, the singular is “automaton”.
$endgroup$
– Konrad Rudolph
Jan 27 at 14:26
1
1
$begingroup$
There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
$endgroup$
– Martin York
Jan 27 at 18:50
$begingroup$
There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
$endgroup$
– Martin York
Jan 27 at 18:50
1
1
$begingroup$
You said that you wrote a parser but the result of the function is
double
so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the function parse
should be renamed appropriately.$endgroup$
– Konrad Rudolph
Jan 28 at 11:32
$begingroup$
You said that you wrote a parser but the result of the function is
double
so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the function parse
should be renamed appropriately.$endgroup$
– Konrad Rudolph
Jan 28 at 11:32
|
show 2 more comments
2 Answers
2
active
oldest
votes
$begingroup$
Using a state machine is fine, but is usually harder to get right than writing the parser with focus on the grammar. You're also mixing a computation with the parsing which adds more information to consider when analyzing the code.
I would recommend to separate the parser code from the computation and then writing the parser strictly following the grammar you want to parse. I'll try to illustrate what I mean by giving a simplified version of your parser. Say you have a grammar:
formula = *(group)
group = element [ count ]
element = uppercase [ lowercase ]
count = "0" .. "9"
You can now give a function for each non-terminal:
// formula = *(group) ; * becomes while (...) ...
std::list<group> parse_formula(std::stringstream& s)
{
std::list<group> rv;
while (!s.eof())
{
rv.push_back(parse_group(s));
}
return rv;
}
// group = element [ count ]
group parse_group(std::stringstream& s)
{
group g;
group.element = parse_element(s);
try
{
group.count = parse_count(s);
}
catch (const parse_failed&)
{
group = 1;
}
}
// element = uppercase [lowercase]
std::string parse_element(std::stringstream& s)
{
if (!std::isupper(s.peek()))
{
throw parse_failed(...);
}
std::string element = s.get();;
if (std::islower(s.peek()))
{
s.get(ch);
element += ch;
}
return element;
}
// count = [0-9]
unsigned parse_count(std::stringstream& s)
{
if (!std::isdigit(s.peek()))
{
throw parse_failed(...);
}
unsigned rv;
s >> rv; // this actually violates the grammar as it reads multiple digits
return rv;
}
You can then iterate over the list of groups and compute the sum.
$endgroup$
add a comment |
$begingroup$
Lambdas Too Complex
You say that the reason behind your lambdas is that they carry a lot of state around, but it is mutating plenty of it as well. Extracting your lambdas into methods in their own right would make it easier to keep track of where your state is used and where it is mutated.
Methods Should Calculate Or Mutate, Not Both
Lambdas that calculate something and then mutate something else are notoriously difficult to debug and maintain. When you create your methods, it should either calculate something (such as the atomic weight) or change something (such as changing your state).
Variable Names Unclear
Names like pstart
, num
and m
do not tell me anything about the purpose of these variables. There is nothing wrong with LongVariableNamesThatExplainWhatTheyAreDoing
.
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f212312%2fchemical-formula-parser%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Using a state machine is fine, but is usually harder to get right than writing the parser with focus on the grammar. You're also mixing a computation with the parsing which adds more information to consider when analyzing the code.
I would recommend to separate the parser code from the computation and then writing the parser strictly following the grammar you want to parse. I'll try to illustrate what I mean by giving a simplified version of your parser. Say you have a grammar:
formula = *(group)
group = element [ count ]
element = uppercase [ lowercase ]
count = "0" .. "9"
You can now give a function for each non-terminal:
// formula = *(group) ; * becomes while (...) ...
std::list<group> parse_formula(std::stringstream& s)
{
std::list<group> rv;
while (!s.eof())
{
rv.push_back(parse_group(s));
}
return rv;
}
// group = element [ count ]
group parse_group(std::stringstream& s)
{
group g;
group.element = parse_element(s);
try
{
group.count = parse_count(s);
}
catch (const parse_failed&)
{
group = 1;
}
}
// element = uppercase [lowercase]
std::string parse_element(std::stringstream& s)
{
if (!std::isupper(s.peek()))
{
throw parse_failed(...);
}
std::string element = s.get();;
if (std::islower(s.peek()))
{
s.get(ch);
element += ch;
}
return element;
}
// count = [0-9]
unsigned parse_count(std::stringstream& s)
{
if (!std::isdigit(s.peek()))
{
throw parse_failed(...);
}
unsigned rv;
s >> rv; // this actually violates the grammar as it reads multiple digits
return rv;
}
You can then iterate over the list of groups and compute the sum.
$endgroup$
add a comment |
$begingroup$
Using a state machine is fine, but is usually harder to get right than writing the parser with focus on the grammar. You're also mixing a computation with the parsing which adds more information to consider when analyzing the code.
I would recommend to separate the parser code from the computation and then writing the parser strictly following the grammar you want to parse. I'll try to illustrate what I mean by giving a simplified version of your parser. Say you have a grammar:
formula = *(group)
group = element [ count ]
element = uppercase [ lowercase ]
count = "0" .. "9"
You can now give a function for each non-terminal:
// formula = *(group) ; * becomes while (...) ...
std::list<group> parse_formula(std::stringstream& s)
{
std::list<group> rv;
while (!s.eof())
{
rv.push_back(parse_group(s));
}
return rv;
}
// group = element [ count ]
group parse_group(std::stringstream& s)
{
group g;
group.element = parse_element(s);
try
{
group.count = parse_count(s);
}
catch (const parse_failed&)
{
group = 1;
}
}
// element = uppercase [lowercase]
std::string parse_element(std::stringstream& s)
{
if (!std::isupper(s.peek()))
{
throw parse_failed(...);
}
std::string element = s.get();;
if (std::islower(s.peek()))
{
s.get(ch);
element += ch;
}
return element;
}
// count = [0-9]
unsigned parse_count(std::stringstream& s)
{
if (!std::isdigit(s.peek()))
{
throw parse_failed(...);
}
unsigned rv;
s >> rv; // this actually violates the grammar as it reads multiple digits
return rv;
}
You can then iterate over the list of groups and compute the sum.
$endgroup$
add a comment |
$begingroup$
Using a state machine is fine, but is usually harder to get right than writing the parser with focus on the grammar. You're also mixing a computation with the parsing which adds more information to consider when analyzing the code.
I would recommend to separate the parser code from the computation and then writing the parser strictly following the grammar you want to parse. I'll try to illustrate what I mean by giving a simplified version of your parser. Say you have a grammar:
formula = *(group)
group = element [ count ]
element = uppercase [ lowercase ]
count = "0" .. "9"
You can now give a function for each non-terminal:
// formula = *(group) ; * becomes while (...) ...
std::list<group> parse_formula(std::stringstream& s)
{
std::list<group> rv;
while (!s.eof())
{
rv.push_back(parse_group(s));
}
return rv;
}
// group = element [ count ]
group parse_group(std::stringstream& s)
{
group g;
group.element = parse_element(s);
try
{
group.count = parse_count(s);
}
catch (const parse_failed&)
{
group = 1;
}
}
// element = uppercase [lowercase]
std::string parse_element(std::stringstream& s)
{
if (!std::isupper(s.peek()))
{
throw parse_failed(...);
}
std::string element = s.get();;
if (std::islower(s.peek()))
{
s.get(ch);
element += ch;
}
return element;
}
// count = [0-9]
unsigned parse_count(std::stringstream& s)
{
if (!std::isdigit(s.peek()))
{
throw parse_failed(...);
}
unsigned rv;
s >> rv; // this actually violates the grammar as it reads multiple digits
return rv;
}
You can then iterate over the list of groups and compute the sum.
$endgroup$
Using a state machine is fine, but is usually harder to get right than writing the parser with focus on the grammar. You're also mixing a computation with the parsing which adds more information to consider when analyzing the code.
I would recommend to separate the parser code from the computation and then writing the parser strictly following the grammar you want to parse. I'll try to illustrate what I mean by giving a simplified version of your parser. Say you have a grammar:
formula = *(group)
group = element [ count ]
element = uppercase [ lowercase ]
count = "0" .. "9"
You can now give a function for each non-terminal:
// formula = *(group) ; * becomes while (...) ...
std::list<group> parse_formula(std::stringstream& s)
{
std::list<group> rv;
while (!s.eof())
{
rv.push_back(parse_group(s));
}
return rv;
}
// group = element [ count ]
group parse_group(std::stringstream& s)
{
group g;
group.element = parse_element(s);
try
{
group.count = parse_count(s);
}
catch (const parse_failed&)
{
group = 1;
}
}
// element = uppercase [lowercase]
std::string parse_element(std::stringstream& s)
{
if (!std::isupper(s.peek()))
{
throw parse_failed(...);
}
std::string element = s.get();;
if (std::islower(s.peek()))
{
s.get(ch);
element += ch;
}
return element;
}
// count = [0-9]
unsigned parse_count(std::stringstream& s)
{
if (!std::isdigit(s.peek()))
{
throw parse_failed(...);
}
unsigned rv;
s >> rv; // this actually violates the grammar as it reads multiple digits
return rv;
}
You can then iterate over the list of groups and compute the sum.
edited Jan 27 at 14:38
answered Jan 27 at 12:13
CornholioCornholio
48617
48617
add a comment |
add a comment |
$begingroup$
Lambdas Too Complex
You say that the reason behind your lambdas is that they carry a lot of state around, but it is mutating plenty of it as well. Extracting your lambdas into methods in their own right would make it easier to keep track of where your state is used and where it is mutated.
Methods Should Calculate Or Mutate, Not Both
Lambdas that calculate something and then mutate something else are notoriously difficult to debug and maintain. When you create your methods, it should either calculate something (such as the atomic weight) or change something (such as changing your state).
Variable Names Unclear
Names like pstart
, num
and m
do not tell me anything about the purpose of these variables. There is nothing wrong with LongVariableNamesThatExplainWhatTheyAreDoing
.
$endgroup$
add a comment |
$begingroup$
Lambdas Too Complex
You say that the reason behind your lambdas is that they carry a lot of state around, but it is mutating plenty of it as well. Extracting your lambdas into methods in their own right would make it easier to keep track of where your state is used and where it is mutated.
Methods Should Calculate Or Mutate, Not Both
Lambdas that calculate something and then mutate something else are notoriously difficult to debug and maintain. When you create your methods, it should either calculate something (such as the atomic weight) or change something (such as changing your state).
Variable Names Unclear
Names like pstart
, num
and m
do not tell me anything about the purpose of these variables. There is nothing wrong with LongVariableNamesThatExplainWhatTheyAreDoing
.
$endgroup$
add a comment |
$begingroup$
Lambdas Too Complex
You say that the reason behind your lambdas is that they carry a lot of state around, but it is mutating plenty of it as well. Extracting your lambdas into methods in their own right would make it easier to keep track of where your state is used and where it is mutated.
Methods Should Calculate Or Mutate, Not Both
Lambdas that calculate something and then mutate something else are notoriously difficult to debug and maintain. When you create your methods, it should either calculate something (such as the atomic weight) or change something (such as changing your state).
Variable Names Unclear
Names like pstart
, num
and m
do not tell me anything about the purpose of these variables. There is nothing wrong with LongVariableNamesThatExplainWhatTheyAreDoing
.
$endgroup$
Lambdas Too Complex
You say that the reason behind your lambdas is that they carry a lot of state around, but it is mutating plenty of it as well. Extracting your lambdas into methods in their own right would make it easier to keep track of where your state is used and where it is mutated.
Methods Should Calculate Or Mutate, Not Both
Lambdas that calculate something and then mutate something else are notoriously difficult to debug and maintain. When you create your methods, it should either calculate something (such as the atomic weight) or change something (such as changing your state).
Variable Names Unclear
Names like pstart
, num
and m
do not tell me anything about the purpose of these variables. There is nothing wrong with LongVariableNamesThatExplainWhatTheyAreDoing
.
answered Jan 27 at 12:15
Joe CJoe C
1,045211
1,045211
add a comment |
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f212312%2fchemical-formula-parser%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
$begingroup$
I realized
if(stk.size() > 1)
should beif(stk.size() > 0)
, should I edit?$endgroup$
– 友人A
Jan 27 at 11:08
2
$begingroup$
Feel free to edit as long as there are no answers. But try to not edit as time goes by.
$endgroup$
– Incomputable
Jan 27 at 11:26
2
$begingroup$
Note, the singular is “automaton”.
$endgroup$
– Konrad Rudolph
Jan 27 at 14:26
1
$begingroup$
There is a standard for writing chemical formulas to be machine parsable so that they are not ambiguous. Its called SMILE Simple Molecule Input Line Entry. The trouble with your system is that for more complex molecules you can't be unambiguous.
$endgroup$
– Martin York
Jan 27 at 18:50
1
$begingroup$
You said that you wrote a parser but the result of the function is
double
so clearly you’re doing way more than just parsing, and it’s not at all clear (from the names, nor the usage) what transformation you’re performing. I’m guessing you are calculating the molecular weight? In which case the functionparse
should be renamed appropriately.$endgroup$
– Konrad Rudolph
Jan 28 at 11:32