Go, PET, Let Hen - Curious adventures in (Commodore) BASIC tokenizing
6 comments
· July 5, 2025 OhMeadhbh
jim_lawless
You could do similar things on a C64 and other computers. You might try this out on a C64 emulator such as VICE.
10 REM NOTHING TO SEE HERE
20 PRINT "HELLO!"
POKE 2049,1
Run it. You'll see HELLO! LIST it and you'll continuously see line 10. If you try to LIST 20 the machine pretty much locks up.
Screen image is here:
https://jimlawless.net/images/remtrick.gif
(Note that the image above shows two RUN lines; it appears that I captured the screen in mid-scroll.)
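For anyone wondering why LIST loops: on a stock C64, BASIC text starts at 2049 ($0801), and each stored line begins with a 2-byte little-endian pointer to the next line. The Python sketch below rebuilds line 10 roughly as it would sit in memory and shows that the POKE turns that pointer into the address of line 10 itself. The helper names are made up for illustration, and the layout is assumed to be the standard unexpanded one.

```python
BASIC_START = 0x0801  # 2049, assumed start of BASIC text on a stock C64
REM = b"\x8f"         # single-byte token for REM in Commodore BASIC V2


def tokenized_line(next_addr, number, body):
    """Link pointer and line number (both little-endian), body, zero terminator."""
    header = bytes([next_addr & 0xFF, next_addr >> 8, number & 0xFF, number >> 8])
    return header + body + b"\x00"


# Line 10 as stored: the REM token followed by the comment text.
body10 = REM + b" NOTHING TO SEE HERE"
line20_addr = BASIC_START + 4 + len(body10) + 1
memory = bytearray(tokenized_line(line20_addr, 10, body10))


def link_at(addr):
    """Read the 2-byte next-line pointer stored at addr."""
    offset = addr - BASIC_START
    return memory[offset] | (memory[offset + 1] << 8)


link_before = link_at(BASIC_START)
memory[2049 - BASIC_START] = 1  # the equivalent of POKE 2049,1
link_after = link_at(BASIC_START)

print(hex(link_before), hex(link_after))
```

Only the low byte changes, so the high byte ($08) stays put and the pointer becomes $0801. A LIST that follows the links would keep arriving back at line 10.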
LocalH
At least on the C64, you could also put a line containing REM shift-L in the program, and the LIST command would crash out when encountering it.
jklowden
Why do I remember that every C64 BASIC keyword was a 2-byte integer? A typing shortcut was to enter the first letter, followed by a "shifted" high-bit character. Every keyword was represented that way.
Variable names were also 2 bytes, but ASCII. The user could enter a longer name, but only the first two characters were significant.
masswerk
Yes, variable names are 2 bytes in their stored memory location in RAM. As these must be 7-bit ASCII bytes, the sign bits and their distribution over these two bytes are used to encode the type. And all simple variables take 7 bytes of memory in total, regardless of whether the remaining 5 bytes are actually needed to store the data or not.
sign bits  type (payload)
0 0        floating-point number (1 byte exponent, 4 bytes mantissa)
1 1        integer (2 bytes)
0 1        string (1 byte length, 2-byte pointer to location)
1 0        FN function (2-byte pointer to BASIC, 2-byte pointer to parameter variable)
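As a rough illustration of the table above, here is a small Python sketch that decodes the two name bytes of a variable entry into a name and a type. The function name is hypothetical, and it assumes that a one-letter name leaves the low 7 bits of the second byte at zero.

```python
# Maps (sign bit of first byte, sign bit of second byte) to the variable type.
TYPES = {
    (0, 0): "float",
    (1, 1): "integer",
    (0, 1): "string",
    (1, 0): "function",
}


def decode_variable_header(first, second):
    """Return (name, type) for the two name bytes of a variable entry."""
    kind = TYPES[(first >> 7, second >> 7)]
    name = chr(first & 0x7F)
    if second & 0x7F:  # assumption: zero here means a one-letter name
        name += chr(second & 0x7F)
    return name, kind


print(decode_variable_header(0x41, 0x42))  # both sign bits clear
print(decode_variable_header(0xC1, 0xC2))  # both sign bits set
print(decode_variable_header(0x41, 0x80))  # only the second sign bit set
```

The three calls correspond to what BASIC would show as AB, AB% and A$.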
In a program (the BASIC text), though, variable names are stored in full and in plain ASCII, at whatever length of characters.
LocalH
Not every keyword could be abbreviated with only two characters. The linked article actually discusses this mechanism. Once tokenized, the keywords only took up a single byte.
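To make the single-byte point concrete, here is a deliberately simplified Python sketch of keyword crunching. It only knows a handful of keywords (the token values are the Commodore BASIC V2 ones as far as I know), leaves quoted strings alone, and copies everything after REM verbatim. The real tokenizer has more quirks than this, which is what the linked article is about, so treat it as a toy.

```python
# A small subset of keyword tokens; the full table is much longer.
TOKENS = {"PRINT": 0x99, "POKE": 0x97, "REM": 0x8F, "GOTO": 0x89, "LET": 0x88}


def tokenize(text):
    """Replace known keywords outside quotes with their one-byte tokens."""
    out = bytearray()
    i = 0
    in_string = False
    while i < len(text):
        if text[i] == '"':
            in_string = not in_string
        word = None
        if not in_string:
            word = next((w for w in TOKENS if text.startswith(w, i)), None)
        if word is None:
            out.append(ord(text[i]))  # ordinary character, stored as-is
            i += 1
            continue
        out.append(TOKENS[word])
        i += len(word)
        if word == "REM":
            out += text[i:].encode("ascii")  # rest of a REM line is not crunched
            break
    return bytes(out)


crunched = tokenize('PRINT "HELLO!"')
print(len('PRINT "HELLO!"'), len(crunched), crunched)
```

The five letters of PRINT collapse into one byte, while the quoted text keeps its full length.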
This takes me back a few years. I spent HOURS writing BASIC programs to analyze other BASIC programs as a kid. My favourite PET trick was to hide the BASIC source by putting a comment (REM statement) at the beginning and end of the program, then POKEing the address of the ending comment into the "next line" link of the first line. It turns out that when the interpreter was running the program, it didn't use the "next line" link; it just assumed the bytes following the current line were the beginning of the next line. But the LIST command did use the link. So you could get a program to run perfectly fine, but when someone did a LIST, the only thing they saw was the two comments.
I can't remember if this worked on the C64, but it worked on the 4016s and 4032s in our high school's computer lab.
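A minimal Python model of the trick, for the curious. It lays out four lines the way they would be stored (2-byte link, 2-byte line number, body, zero terminator), patches the first link to point at the last line, and then compares a LIST-style walk that follows links with a RUN-style walk that just steps past each terminator. The start address of $0401 for the PET, the function names, and the sample program are all assumptions for illustration, and the two walkers model only the behaviour described above, not the actual ROM routines.

```python
BASIC_START = 0x0401  # assumed start of BASIC text on a PET
REM, PRINT = 0x8F, 0x99


def build(lines):
    """Lay out (number, body) pairs as linked lines; return memory and line addresses."""
    memory, addresses = bytearray(), []
    for number, body in lines:
        addr = BASIC_START + len(memory)
        addresses.append(addr)
        next_addr = addr + 4 + len(body) + 1
        memory += bytes([next_addr & 0xFF, next_addr >> 8, number & 0xFF, number >> 8])
        memory += body + b"\x00"
    memory += b"\x00\x00"  # a null link marks the end of the program
    return memory, addresses


def word(memory, addr):
    """Read a 2-byte little-endian value at addr."""
    offset = addr - BASIC_START
    return memory[offset] | (memory[offset + 1] << 8)


def list_line_numbers(memory):
    """Follow the next-line links, as LIST is described as doing."""
    numbers, addr = [], BASIC_START
    while word(memory, addr) != 0:
        numbers.append(word(memory, addr + 2))
        addr = word(memory, addr)
    return numbers


def run_line_numbers(memory):
    """Ignore the links and step past each line's zero terminator."""
    numbers, addr = [], BASIC_START
    while word(memory, addr) != 0:
        numbers.append(word(memory, addr + 2))
        end = memory.index(0, addr - BASIC_START + 4)
        addr = BASIC_START + end + 1
    return numbers


memory, addresses = build([
    (10, bytes([REM]) + b" NOTHING HERE"),
    (20, bytes([PRINT]) + b' "SECRET"'),
    (30, bytes([PRINT]) + b' "MORE"'),
    (40, bytes([REM]) + b" THE END"),
])
before = list_line_numbers(memory)

# Patch line 10's link so it points straight at line 40.
memory[0] = addresses[-1] & 0xFF
memory[1] = addresses[-1] >> 8

print(before, list_line_numbers(memory), run_line_numbers(memory))
```

After the patch, the link-following walk sees only the two REM lines, while the sequential walk still visits every line.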