
Slightly better named character reference tokenization than Chrome, Safari, FF

masfuerte

That was a good read. I reread the relevant section of the HTML5 spec and noticed an error in an example:

> For example, &not;in will be parsed as "¬in" whereas &notin will be parsed as "∉".

Only a small minority of the named character references are permitted without a closing semicolon, and notin is not one of them. So &notin is actually parsed as "¬in". &notin; is parsed as "∉".

https://html.spec.whatwg.org/#parse-error-missing-semicolon-...
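The three cases are easy to check. A minimal sketch, assuming Python's standard-library `html.unescape` follows the HTML5 rules for text content (it does not model the different handling inside attribute values); the variable names are only for illustration:

```python
import html

# Semicolon-terminated "not;" followed by the plain text "in".
not_then_text = html.unescape("&not;in")

# No semicolon: "notin" may not omit it, so only the "not" prefix is decoded.
notin_no_semicolon = html.unescape("&notin")

# Fully terminated "notin;" decodes to the single "not an element of" character.
notin_with_semicolon = html.unescape("&notin;")

print(not_then_text, notin_no_semicolon, notin_with_semicolon)
```

If that assumption holds, the first two inputs should decode to the same string, and only the third should produce "∉".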

squeek502

Good catch, that does indeed look like a mistake in the spec. Everything past the first sentence of that error description is suspect, honestly (seems like it was naively adapted from the example in [1] but that example isn't relevant to the missing-semicolon-after-character-reference error).

Will submit an issue/PR to correct it when I get a chance.

[1] https://html.spec.whatwg.org/multipage/parsing.html#named-ch...

Ndymium

Thanks to your article I just realised my HTML entity codec library doesn't support decoding those named entities that can omit the semicolon at the end. More work for me, good thing my summer vacation just started! :)
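For what it's worth, the core of supporting this seems to be a longest-prefix match against the table of names. A rough sketch in Python, assuming the standard-library table `html.entities.html5`, whose keys include both `not` and `notin;`. `decode_named`, `LEGACY_NAMES` and `MAX_NAME_LEN` are hypothetical names made up for illustration, and the sketch ignores numeric references and the special rules for attribute values:

```python
from html.entities import html5
from typing import Tuple

# Names that may omit the semicolon are the table keys without a trailing ";".
LEGACY_NAMES = {name for name in html5 if not name.endswith(";")}
# Longest key in the table, used to bound how far we look ahead.
MAX_NAME_LEN = max(len(name) for name in html5)

def decode_named(text: str, start: int) -> Tuple[str, int]:
    """Decode one named reference, where text[start] is '&'.

    Returns (replacement, index just past the consumed characters).
    Hypothetical helper, for illustration only.
    """
    body = text[start + 1 : start + 1 + MAX_NAME_LEN]
    # Try the longest candidate first so "notin;" wins over "not".
    for end in range(len(body), 0, -1):
        candidate = body[:end]
        if candidate in html5:
            return html5[candidate], start + 1 + end
    # No name matched: keep the ampersand as literal text.
    return "&", start + 1

print("not" in LEGACY_NAMES, "notin" in LEGACY_NAMES)
print(decode_named("&notin", 0))
print(decode_named("&notin;", 0))
```

The lookup here is a naive loop over prefixes, not the kind of compact data structure the article is about; it is only meant to show the matching rule.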

deepdarkforest

This might not get a lot of traction because it's very technical, but I wanted to say a massive well done for the effort. 20k words on anything this specific is not a joke. I wish I would put this level of commitment into anything in life; this was inspiring if nothing else.

squeek502

Appreciate it (I'm the author). I'd like to think there's a good bit of interesting stuff in here outside of the specific topic of named character reference tokenization.

chaps

"no[t] a 'data structures' person"

says the person who wrote an extremely technical 20k word blog post on data structures! <3

arthurcolle

Congratulations on your newfound promotion to data structures person btw