Parsing Chemistry
5 comments
·October 24, 2025the__alchemist
Note: There are two standardized formats for this called SMILES and SELFIES. SMILES is much better supported, but SELFIES is more robust. I'm integrating them into some bio and chem software I'm working on.
You can do things like look up, using PubChem's API, similar molecules etc to a SMILES string.
I believe most molecule editors can load and save SMILES.
dachrillz
What about inchi? Isn’t that a common way of describing molecules as well?
the__alchemist
Good point!
whitten
Does the SMILE (or Simplified Molecular Input Line Entry System) code have an EBNF definition ? https://en.wikipedia.org/wiki/Simplified_Molecular_Input_Lin... Claims there is a context free grammar.
Does this do structural formulae too?
Was thinking of InChI[0] but on Googling SMILES and SELFIES I found this[1] talk, this[2] paper and my goodness I've been down a few rabbit holes since...
[0] https://en.wikipedia.org/wiki/International_Chemical_Identif... [1] https://www.inchi-trust.org/wp/wp-content/uploads/2019/12/18... [2] https://pubs.rsc.org/en/content/articlehtml/2022/dd/d1dd0001...