Inspect ANSI control codes and escape sequences

SpaceL10n

The things they don't prepare you for in school...

I was working at my first job and we had a ColdFusion app that was displaying some data from the database. I get a ticket one day saying our search page would crash when searching for a very specific document. The other 1 million+ documents all loaded fine to our knowledge, so why this one?

I was pretty junior back then and feeling mighty defeated as to why I couldn't figure it out. I debugged every single line and condition, trying to find some reason. After ruling out the code as a culprit, I took the data we were loading and placed it into Notepad++. Don't remember why exactly. I was wracking my brain trying to come up with an explanation and lazily moving the text cursor left and right through the text, mostly out of boredom and despair.

That's when I noticed that I had pressed the right arrow key on my keyboard and the text cursor position hadn't changed! I pressed it again and nothing. Again, nothing. It took eight key presses to move the text cursor from one letter in a word to the adjacent letter. I was utterly bamboozled. Why was the text cursor getting stuck in the middle of this word?!

Shortly thereafter, I discovered the "Show all hidden characters" setting in the menu. I toggled it and sure enough there were little black boxes with weird three-letter strings in them. NUL, ESC, and others - right where my cursor was getting hung up.

That was the day I learned about ANSI control characters and the importance of data sanitization.
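For anyone without Notepad++ at hand, here is a minimal Python sketch of the same "show hidden characters" idea. The function name `reveal` and the `<NAME>` tag format are made up for illustration; the abbreviations are the standard C0 control names.

```python
import unicodedata

# Short names for the C0 controls (0x00-0x1F), in code-point order.
C0_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

def reveal(text: str) -> str:
    """Return text with control characters replaced by visible <NAME> tags."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            out.append(f"<{C0_NAMES[cp]}>")
        elif cp == 0x7F:
            out.append("<DEL>")
        elif unicodedata.category(ch) in ("Cc", "Cf"):
            # C1 controls and invisible format characters (e.g. zero-width space)
            out.append(f"<U+{cp:04X}>")
        else:
            out.append(ch)
    return "".join(out)

print(reveal("doc\x00\x1bument"))  # doc<NUL><ESC>ument
```

Note that this also tags tabs and newlines, so for sanitizing real data you would probably want to allow those through and strip or reject the rest.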

112233

"\u001b[0m — reset" ... what? Why SGR is not called by name, while, e.g. CUU is? strange... According to which terminal or standard it interperts sequences?

Is this tool really helpful? It does look nice! But it does not help with the corneriest cases that would benefit from such a tool the most.

tronster

This is a fantastic web util; bookmarked for the future.

I wish I had this when I was making [Dragon's Oven](https://tronster.itch.io/dragon). It was a lot of nights and weekends of tinkering with ANSI codes in TypeScript. I learned a lot that surprised me, such as: most modern OSes still don't support 16m colors out of the box and the default Linux shell doesn't support beyond 16 colors. Also, there are no really good modern ANSI editors out there. I tried bringing back "TheDraw" in DOSBox for some art, but ended up using a mishmash of more modern utilities, false starting one of my own, and working on an image to ASCII/ANSI converter.

Maybe it's growing up in the BBS days, but something about ANSI is really charming.

prometheus76

TheDraw was a cornerstone of my teenage years. I would log into different BBSs just to see their ANSI welcome screens, then I would try and re-create them to learn the art. It was a unique form of animation and I was hoping you had figured out how to get TheDraw working.

I also later used ANSI to make my own cool command line prompts in DOS and later, Linux.

webpro

Working with and debugging ANSI control codes and escape sequences can be a challenge.

This free web-based tool helps to inspect the input, visualize colors and styling, and list control codes. By using a proper tokenizer and parser (not just regex hacks), it supports all sorts of control codes. The parser is open source and available too (find links in "about").

Type or paste text in the black text area, or try out the examples. Use the lookup table to filter & find specific codes.

Feedback welcome, I’d love to know what’s confusing, missing, or especially useful.

michaelmior

Very cool! Seems like this should be a Show HN post.

mnurzia

Neat tool, I could see this being handy for debugging TUI tools.

I noticed that it works with _escaped_ ESC characters ("\x1b", "\u001b", "\033") but it didn't recognize raw ESC characters that I had in my clipboard. It might be useful to support those (maybe highlight them similarly to how VS Code highlights whitespace characters). The characters show up as numbered Unicode error glyphs (I'm on Firefox, if that helps).
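One way to accept both forms is to normalize the textual spellings to the raw character before parsing. This is only a sketch of that idea, not how the tool does it; `normalize_esc` and the list of spellings are assumptions chosen for illustration.

```python
import re

ESC = "\x1b"

# Textual spellings of ESC as they appear in pasted source code:
# \x1b, \u001b, \033 and \e (matched case-insensitively).
ESCAPED_ESC = re.compile(r"\\(?:x1b|u001b|033|e)", re.IGNORECASE)

def normalize_esc(text: str) -> str:
    """Replace textual spellings of ESC with the raw character.

    Raw ESC characters already in the input pass through unchanged.
    """
    return ESCAPED_ESC.sub(ESC, text)
```

The `\e` alternative is greedy in the unhelpful sense: it would also rewrite the start of something like `\end`, so a real implementation would need to be more careful about context.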

ryan-c

This is really cool - I've been experimenting with terminal escape sequences recently, and they go deep. Thanks for sharing! Get in touch (email in profile) if you'd like to collaborate.

codesnik

I wonder how many languages have the nice-looking "\e" for "\u001b". Ruby, Perl, Bash, anything else?

JdeBP

The revealing shibboleth is when people call it "ANSI". (-: "ANSI" is what people call it when they are working from paltry and incomplete samizdat doco of how this stuff works, from Microsoft's old ANSI.SYS appendix to its MS-DOS user manual, to innumerable modern WWW sites all repeating received wisdom.

The thing to remember is that the "E" in "ECMA" does not stand for "ANSI".

* https://ecma-international.org/publications-and-standards/st...

* https://www.itu.int/rec/T-REC-T.416-199303-I

If you read ECMA-35, you'll find that there's actually a whole system to escape sequences and control sequences. As I pointed out last month, it's often the case that people who haven't read ECMA-35 don't realize that parameter characters can be more than digits, don't handle intermediate characters, and don't grasp how DEC's question mark and SCO's equals sign fit into the overall picture. People who haven't read ECMA-48 and traced its history don't realize that there's subtlety to missing parameters in control sequences. And people who haven't read ITU/IEC T.416 do what many of us did years ago and get 24-bit colour wrong. Twice over.

* https://github.com/tattoy-org/tattoy/issues/105#issuecomment...
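To make the sub-parameter and missing-parameter points concrete, here is a rough Python sketch. It assumes the parameter string contains only digits, `;` and `:` (so no DEC `?` or SCO `=` prefixes), and `parse_params` is a hypothetical helper, not taken from any of the linked projects.

```python
from typing import List, Optional

def parse_params(param_bytes: str) -> List[List[Optional[int]]]:
    """Split a CSI parameter string into parameters (';') and sub-parameters (':').

    An empty field is kept as None so the caller can apply the default
    for that particular control function instead of guessing a value.
    """
    return [
        [int(sub) if sub else None for sub in param.split(":")]
        for param in param_bytes.split(";")
    ]

# Colon form: one parameter with sub-parameters; the empty third field is,
# as far as I read T.416, the omitted colour space identifier.
print(parse_params("38:2::255:128:0"))  # [[38, 2, None, 255, 128, 0]]

# The widespread semicolon form: five separate parameters.
print(parse_params("38;2;255;128;0"))   # [[38], [2], [255], [128], [0]]
```

A decoder that treats `;` and `:` as interchangeable cannot tell these two apart, and one that turns every empty field into 0 loses the difference between "omitted" and "zero".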

Other common errors include missing out on all of the other 7-bit aliases for C1 characters. Or not realising that the ECMA-35/ECMA-48 syntax allows for any control sequence to have sub-parameters, not just SGR. Or using regular expressions and pattern matching instead of a state machine. Only a state machine truly handles the fact that in the real world terminals allowed, and enacted, various C0 and C1 control characters in the middle of control sequences, as well as had ways of cancelling or restarting control sequences mid-sequence.

* https://github.com/jdebp/nosh/blob/trunk/source/ECMA48Decode...
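As a toy illustration of why a state machine helps here, the sketch below handles C0 controls in the middle of a control sequence, CAN/SUB cancellation, ESC restarting a sequence, and the 7-bit aliases for C1 characters. It is loosely modelled on the usual DEC-style parser design and is not the nosh code; the event names are invented, and control strings (OSC, DCS and friends) are deliberately not handled.

```python
from typing import Iterator, Tuple

GROUND, ESCAPE, CSI = "ground", "escape", "csi"

def decode(stream: str) -> Iterator[Tuple[str, ...]]:
    """Yield ('print', ch), ('execute', ch), ('c1', ch), ('esc', chars) or
    ('csi', params, intermediates, final) events from a character stream."""
    state = GROUND
    params = inter = ""
    for ch in stream:
        cp = ord(ch)
        if cp == 0x1B:                      # ESC always (re)starts a sequence
            state, params, inter = ESCAPE, "", ""
        elif cp in (0x18, 0x1A):            # CAN and SUB cancel a sequence in progress
            state = GROUND
            yield ("execute", ch)
        elif cp < 0x20:                     # other C0 controls act immediately,
            yield ("execute", ch)           # leaving any open sequence intact
        elif cp == 0x9B:                    # 8-bit CSI
            state, params, inter = CSI, "", ""
        elif 0x80 <= cp <= 0x9F:            # any other 8-bit C1 control
            state = GROUND
            yield ("c1", ch)
        elif state == ESCAPE:
            if ch == "[" and not inter:     # ESC [ is the 7-bit alias of CSI
                state = CSI
            elif 0x20 <= cp <= 0x2F:        # intermediate byte, keep collecting
                inter += ch
            elif not inter and 0x40 <= cp <= 0x5F:
                state = GROUND              # ESC Fe: 7-bit alias of a C1 control
                yield ("c1", chr(cp + 0x40))
            elif 0x30 <= cp <= 0x7E:        # final byte of a plain escape sequence
                state = GROUND
                yield ("esc", inter + ch)
        elif state == CSI:
            if 0x30 <= cp <= 0x3F and not inter:
                params += ch                # parameter bytes
            elif 0x20 <= cp <= 0x2F:
                inter += ch                 # intermediate bytes
            elif 0x40 <= cp <= 0x7E:        # final byte: dispatch
                state = GROUND
                yield ("csi", params, inter, ch)
            # anything else is silently dropped in this sketch
        else:
            yield ("print", ch)
```

For example, `list(decode("\x1b[1\n;31mA"))` executes the newline immediately and still delivers the complete `1;31` SGR afterwards, which a single pattern match over the text would not do.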

But it gets even worse for a real world control sequence decoder.

In the real world, not only do terminals interpret the same control sequences, and their parameters, differently depending on whether the terminal is sending or receiving them; but several terminal emulators like the one in Interix, rxvt, the one built in to Linux, and even XTerm, send control sequences that not only break ECMA-35 but also conflict with received control sequences. So if one wants to be comprehensive and be capable of decoding real data, one needs a switch to tell the program whether to decode the character stream as if it is being received by the terminal or as if it is being sent by the terminal.

* https://jdebp.uk/Softwares/nosh/guide/commands/console-decod...

Microsoft Terminal tries to do things properly, which many modern terminal emulators and tools do not, and handles this with two distinct entire state machines, one for input and one for output.

* https://github.com/microsoft/terminal/tree/main/src/terminal...

I handled it with a few goto statements and a handful of flags. (-:

* https://github.com/jdebp/nosh/blob/trunk/source/console-deco...

blueflow

I think this rant is out of place here. Type "\x1b[:<=>$t" and check for yourself; it parses correctly. You do learn about the allowed character ranges for CSI sequences from ECMA-48 only, not from the Microsoft docs, so I guess the author did their homework.

JdeBP

That tells me that you are writing from ignorance, as for starters that's a truly pathetic test that even misses one of the characters that I explicitly mentioned above, let alone thoroughly tests the full range that the specs define. I had an actual poke around the parser code, in contrast to your superficial experimentation. (-: One can, with knowledge, actually find the point where the only three unusual characters that you in fact tested are special cased.

blueflow

They are not special cased:

  https://github.com/webpro/ANSI.tools/blob/main/packages/parser/src/parsers/csi.ts#L12

The comment correctly identifies the 0x30-0x3f range as parameter bytes and the following as intermediate bytes. Both the range and the names for the bytes match ECMA-48 Chapter 5.4.
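For reference, those ranges can be checked in a few lines of Python. This is a standalone sketch of the ECMA-48 byte classes (parameter 0x30-0x3F, intermediate 0x20-0x2F, final 0x40-0x7E), not a copy of the linked parser; `split_csi` is a made-up name and it takes the text that follows the CSI introducer.

```python
from typing import Tuple

def split_csi(body: str) -> Tuple[str, str, str]:
    """Split the text after CSI into (parameter bytes, intermediate bytes, final byte)."""
    i = 0
    while i < len(body) and 0x30 <= ord(body[i]) <= 0x3F:   # parameter bytes
        i += 1
    j = i
    while j < len(body) and 0x20 <= ord(body[j]) <= 0x2F:   # intermediate bytes
        j += 1
    # Exactly one byte must remain, and it must be a final byte.
    if j != len(body) - 1 or not 0x40 <= ord(body[j]) <= 0x7E:
        raise ValueError(f"not a well-formed control sequence: {body!r}")
    return body[:i], body[i:j], body[j]

print(split_csi(":<=>$t"))  # (':<=>', '$', 't')
print(split_csi("?25h"))    # ('?25', '', 'h')
```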

But you seem to think that everyone except yourself is incompetent, are you trying to make up for something?