Machine Code Isn't Scary
232 comments · June 4, 2025
HeyLaughingBoy
Reading this thread leaves me with the impression that most posters advocating learning assembly language have never had to use it in a production environment. It sucks!
For the overwhelming majority of programmers, assembly offers absolutely no benefit. I learned (MC6809) assembly after learning BASIC. I went on to become an embedded systems programmer in an era where compilers were still pretty expensive, and I worked for a cheapskate. I wrote an untold amount of assembly for various microcontrollers over the first 10 years of my career. I honestly can't say I got any more benefit out of it than programming in C; it just made everything take so much longer.
I once, for a side gig, had to write a 16-bit long-division routine on a processor with only one 8-bit accumulator. That was the point at which I declared that I'd never write another assembly program. Luckily, by then gcc supported some smaller processors so I could switch to using the Atmel AVR series.
a_cardboard_box
> I once, for a side gig, had to write a 16-bit long-division routine on a processor with only one 8-bit accumulator. That was the point at which I declared that I'd never write another assembly program.
This is exactly the kind of job I'd enjoy! A perfectly doable technical challenge with clear requirements. Some people like solving Sudoku puzzles, I like solving programming puzzles.
I guess I'm just not "the overwhelming majority of programmers".
HeyLaughingBoy
> doable technical challenge with clear requirements
That's a Project Management issue, not an implementation concern.
In my case, there was no requirement that said "use 16-bit long division." However, we had committed to a particular processor family (MC68HC05), and the calculation precision required 16-bit math. IIRC, there was a compiler available, but it cost more than the rest of the project and the code it produced wouldn't have fit into the variant of the processor that I was using anyway.
The actual requirement would have looked more like "detect a 0.1% decrease in signal that persists for 10 seconds, then do X."
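For a sense of what such a routine involves: below is the textbook shift-and-subtract division loop as a C sketch (illustrative, not the original routine). On a CPU like the HC05, every 16-bit shift, compare and subtract in this loop expands into a multi-instruction sequence over two 8-bit halves.

    #include <stdint.h>

    /* 16-bit unsigned restoring division, one quotient bit per iteration.
       Assumes divisor != 0. */
    uint16_t udiv16(uint16_t dividend, uint16_t divisor, uint16_t *remainder_out) {
        uint16_t quotient = 0, remainder = 0;
        for (int i = 15; i >= 0; i--) {
            /* shift the next dividend bit into the running remainder */
            remainder = (uint16_t)((remainder << 1) | ((dividend >> i) & 1));
            if (remainder >= divisor) {   /* does the trial subtraction fit? */
                remainder -= divisor;
                quotient |= (uint16_t)(1u << i);
            }
        }
        if (remainder_out) *remainder_out = remainder;
        return quotient;
    }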
Animats
Oh, yes, that era. I had to program a MC68HC11 in Forth because the C compiler was so expensive.
favorited
I think the majority of programmers would enjoy it, but most would first need to pick an ISA (something older is probably going to be more approachable for beginners), learn enough about it to understand basic arithmetic instructions, learn enough about the dev tools to be able to assemble, link, and execute their code, etc.
For most folks, that's going to be a couple days of prep work before they can get to the fun part of solving the puzzle.
markus_zhang
I totally agree. I read and commented the source code of Woz's SWEET16 and it was a blast to fully understand it.
But of course, it might not be that rosy under tight time constraints.
commandlinefan
> the kind of job I'd enjoy
I feel the same way, but I also can't help but imagine the boss jumping up and down and throwing chairs and screaming "how can you not be done yet? You're a programmer and this is a program and it's been three _hours_ already".
1vuio0pswjnm7
"I guess I'm just not the "overwhelming majority of programmers"."
The "overwhelming" majority of programmers may be underwhelming
Some readers may be unimpressed by programmers who complain about and criticise assembly language, e.g., claiming it offers "no benefit" to others, especially when no one is forcing these programmers to use it
WalterBright
I did a lot of assembler programming before discovering C. I learned C in maybe an hour because of that.
Not knowing assembler means programmers have a bit of a blind spot towards what are expensive things to do in C vs what generates the best code.
For example, debugging a program sometimes requires looking at the generated assembler. Recently I was wondering why my deliberate null pointer dereference wasn't generating an exception. Looking at the assembler, there were no instructions generated for it. It turns out that since a null pointer dereference was undefined behavior, the compiler didn't need to generate any instructions for it.
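A minimal sketch of the situation described (the pattern, not the original code; whether the load survives depends on the compiler and optimization level):

    /* At -O2, a compiler may emit no instructions at all for the line
       below: dereferencing a null pointer is undefined behavior, so the
       compiler is allowed to assume it never happens and drop the load. */
    int main(void) {
        int *p = 0;
        int x = *p;   /* deliberate null dereference, result unused */
        (void)x;
        return 0;     /* may run to completion without any exception */
    }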
ozim
I build web applications that run on top of databases, web servers and frameworks.
I do need to understand how indexes in a DB engine work, I need to understand that there might be socket exhaustion in the system, I do need to understand how my framework allocates data on the heap vs the stack.
Having to drop down to instructions is for the web server, DB and framework developers, not for me. I do have a clue how the low level works, but there is no need for me to go there.
That is the part where the parent poster is correct: there are better things for developers to spend time on. Trust your database, web servers and framework, and learn how those work in depth; you can skip assembler, because those will take a lot of time anyway, and most likely they are what you should/can tweak to fix performance, not assembler.
marssaxman
> since a null pointer dereference was undefined behavior, the compiler didn't need to generate any instructions for it.
I deeply hate this attitude in modern compiler design.
WalterBright
Me too. My compilers don't do that.
cogman10
The issue is it's a moving target. What was expensive yesterday could be fast today based on compiler optimizations (and potentially vice versa).
Further, changes in the ISA can open up gains in performance that weren't available in yesteryear. An example of this would be SIMD instruction usage.
It's not a bad idea to know enough assembly language to understand why code is slow. However, the general default should be to avoid writing it. Your time would be better spent getting a newer version of your compiler and potentially enabling things like PGO.
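For reference, PGO with GCC is a two-pass build; a minimal sketch using GCC's actual flag names (other compilers spell them differently):

    gcc -O2 -fprofile-generate prog.c -o prog   # pass 1: instrumented build
    ./prog typical-workload                     # run it to record a profile
    gcc -O2 -fprofile-use prog.c -o prog        # pass 2: optimize using the profile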
zahlman
> most posters advocating learning assembly language have never had to use it in a production environment... For the overwhelming majority of programmers, assembly offers absolutely no benefit.
I don't follow. Why should assembly have to be useful or pleasant in a production environment, for learning it to be useful?
I was taught a couple different flavours of assembly in university, and I found it quite useful for appreciating what the machine actually does. Certainly more so than C. Abstractions do ultimately have to be rooted in something.
atoav
You and the post you commented on both make valid points. If we're talking about using assembly as a broad general-purpose programming environment, that would be a mess (which is precisely why it has no broad adoption). When we talk about assembly as a niche special-purpose solution, we come to a different conclusion; coincidentally, this is where assembly is still used today: environments where we need highly optimized code.
Your point about education is orthogonal to the point made. I agree with you that learning assembly can be a good way to teach people how computers work on a low level, but that has nothing to do with whether it is useful as a skill to learn.
As someone teaching similar things at the university level to a non-tech audience, I always have to carefully weigh how many "practically useless" lessons a typical art student can stomach. And which kind of lesson will just deter them, potentially forever.
zahlman
> I agree with you that learning assembly can be a good way to teach people how computers work on a low level, but that has nothing to do with whether it is useful as a skill to learn.
I don't understand the distinction you're trying to make. The post I was replying to specifically discussed "learning assembly language". My entire point is that "learning assembly language" has purposes other than creating practical real-world programs in assembly.
bongodongobob
Is it useful to learn bagpipes? I guess learning for its own sake is good, but if you want to join a band, guitar or keyboards are going to be a better bet and learning bagpipes first isn't going to do much for you.
barrkel
Do bagpipes explain the mystery of sand performing calculations and taking actions? Do they give you an intuition for connecting how CPUs and memory accesses and cache hierarchies work with high level code, in such a way that you can start to understand why one version of code might be faster or slower than another?
If you can't see through field accesses and function calls to memory indirections, anything you might read about how TLBs and caches and branch prediction work doesn't connect to much.
strken
If a guitar was an abstraction layer that was implemented by low-level bagpipes then a) that would be awesome and b) guitar players would find their guitar playing to benefit from bagpipe lessons. At the very least they'd be able to understand and maintain their guitar better.
spc476
Learning the accordion didn't hurt Weird Al's career, nor did using the flute hurt Ian Anderson (lead vocalist and flutist of Jethro Tull).
mabster
I started my career in assembly and it's reduced over time. Towards the end of the gamedev work I was still reading a lot of assembly but no longer writing it (using intrinsics instead). It was definitely a lot slower to write.
But there are a number of things we did that are not available or difficult in C:
- Guaranteed tail calls
- Returning multiple values without touching memory
- using the stack pointer as a general purpose pointer for writing to memory
- Changing the stack pointer to support co-routines
- Using our own register / calling convention (e.g. setting aside a register to be available to all routines)
- Unpicking the stack to reduce register setup for commonly used routines or fast longjmps
- VM "jump tables" without requiring an indirection to know where to jump to
flohofwoe
On 8-bit home computer CPUs like the 6502 or Z80, high level programming languages like C simply were not an option, you left too much performance on the table (not to mention BASIC which was easily 100x slower than handwritten assembly).
Forth was quite acceptable performance wise, but that's barely above a good macro assembler.
And after the 8-bitters, assembly coding on the Amiga was pure pleasure - also for large programs, since apart from the great 68k ISA the entire Amiga hardware and operating system was written to make assembly coding convenient (and even though C was much better on the 68k, most serious programs used a mix of C and assembly).
(also, while writing assembly code today isn't all that important, reading assembly code definitely is when looking at compiler output and trying to figure out why and how the compiler butchered my high level code).
jamesfinlayson
> (also, while writing assembly code today isn't all that important, reading assembly code definitely is when looking at compiler output and trying to figure out why and how the compiler butchered my high level code).
Agreed - I wouldn't be able to write any x86 assembly without a bit of help, but having done some game reverse engineering I've learned enough to make sense of compiler generated code.
pjmlp
To add to that, there is a reason why even all modern JITs also have ways to look into generated code.
Anyone curious how their JVM, CLR, V8, ART, Julia,.... gets massaged into machine code only needs to learn about the related tools in the ecosystem.
Some of them are available on online playgrounds like Compiler Explorer, Sharpio,....
acegopher
> the entire Amiga hardware and operating system was written to make assembly coding convenient
I am curious what specific examples do you have of the HW and OS being made/written to make ASM convenient?
flohofwoe
The hardware could be controlled via memory mapped 16-bit registers, e.g. checking whether the left mouse button is down is a single instruction:
btst #6, $bfe001
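; bit 6 of CIA-A port A ($BFE001) reflects the left mouse button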
The OS used a simple assembly-friendly calling convention: parameters were passed in registers instead of on the stack (and the API documentation mentioned which parameters are expected in which registers), and the reference manuals usually had both C and assembly examples, etc... basically lots of little things to make the lives of assembly coders easier.
This YouTube playlist gives a nice overview of assembly coding on the Amiga (mostly via direct hardware access though): https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hT...
ferguess_k
Most of us do not have the chance to use it in production. I think that's where the fascination comes from.
We are also getting burned out by the modern Agile web/data/whatever development scene and would like to drill really deep into one specific area without stakeholders breathing down our necks every few hours, which assembly programming conveniently provides.
I also consider the grit (forced or voluntary) to be a bath of fire which significantly improved two important things: the programmer's understanding of the system, and the programmer's capability to run low-level code in his brain. Is it suffering? Definitely, but this is a suffering that brings technical prowess.
Most of us do not have the privilege to suffer properly. Do you prefer to suffer from incomplete documentation, very low-level code and banging your head on a wall over tough technical problems, or do you prefer to suffer from incomplete documentation, layers and layers of abstraction, stakeholders changing requirements every day, and actually knowing very little about technical stuff? I think it is an easy choice, at least for me. If there is an assembly language / C job that is willing to take me in, I'll do it for half the salary I'm earning.
craftkiller
I never used it in production and yet learning it absolutely provided me with benefits. I didn't understand pointers until I spent a weekend learning assembly.
Const-me
I think writing assembly indeed offers no benefit for most developers. However, being able to read and understand assembly is generally useful.
Being able to read it enables debugging binaries and crash dumps without complete source code, like DLLs shipped with Windows or third-party DLLs. It also allows you to understand what compilers (both traditional and JIT) did to your source code, which is useful when doing performance optimizations.
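As a small taste of what reading compiler output looks like, here is a trivial C function with representative optimized output (exact instructions vary; this assumes gcc or clang at -O2 on x86-64, System V ABI):

    /* C source */
    int sum3(int a, int b, int c) {
        return a + b + c;
    }

    /* Representative x86-64 at -O2 (arguments arrive in edi, esi, edx):
           lea  eax, [rdi + rsi]
           add  eax, edx
           ret
       No stack frame, no memory traffic; recognizing such patterns is
       most of the skill when reading optimized code. */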
anta40
I write mobile apps for a living (mostly Java/Kotlin, a little bit of Flutter/RN), so yeah, I agree assembly is practically useless for professional work.
But for tinkering (e.g. writing GBA/NES games), hell why not? It's fun.
Lerc
I have tried to convince people that ASM is reasonable as a first stage teaching language. The reputation as a nearly mystical art practiced by a few doesn't help. The thing is, instructions are simple. Getting them to do things is not hard, the difficulty comes from tasks exceeding a scale where you can think about things at their most basic level.
It quickly becomes tedious to do large programs: not really hard, just unmanageable, which is precisely why it should be taught as a first language. You learn how to do simple things, and you learn why programming languages are used. You teach the problem that is being solved before teaching the more advanced programming concepts that solve it.
tsimionescu
The biggest problem with using ASM as a first language to teach beginners is that it is extremely tedious, error-prone, and sensitive to details. It is also unstructured: it uses entirely different control-flow primitives than any language they will learn in the future, meaning they will not be well prepared for learning a real language that scales to programs more complex than a few additions and calling an OS output routine.
So why teach someone a language that doesn't have if, while, (local) variables, scopes, types, nor even real function calls?
It's a very nice exercise for understanding how a computer functions, and it has a clear role in education - I'm not arguing people shouldn't learn it at all. But I think it's a terrible first language to learn.
MobiusHorizons
Because these are the primitives that are in use when programming in any language, and there is a benefit to learning the primitives before learning higher-level abstractions. For instance, we teach arithmetic before calculus.
I see lots of people become pretty helpless when their framework isn’t working as expected or abstraction becomes leaky. Most people don’t really need to know assembly in order to get past this, but the general intuition of “there is something underneath the subtraction that I could understand” is very useful.
jcranmer
The primitives of control flow in programming languages are sequencing, if, while, for, switch, return, and "early return" (goto restricted to exit a containing block). We might compile these into a form that represents everything using conditional jumps, unconditional jumps, and jump tables, but that's not how people think about it, definitely not at the level of programming languages (and even in the compiler IR phase, we're often mentally retranslating the conditional jump/unconditional jump model back into the high-level control flows).
And I could go on with other topics. High-level languages, even something like C, are just a completely different model of looking at the world from machine language, and truly understanding how machines work is actually quite an alien model. There's a reason that people try to pretend that C is portable assembler rather than actually trying to work with a true portable assembler language.
The relationship you're looking for is not arithmetic to calculus, but set theory to arithmetic. Yes, you can build the axioms of arithmetic on top of set theory as a purer basis. But people don't think about arithmetic in terms of set theory, and we certainly don't try to teach set theory before arithmetic.
jader201
> For instance we teach arithmetic before calculus.
I don’t think that’s a fitting analogy.
Nearly everyone on the planet uses (basic) arithmetic. Very few use calculus.
By contrast, very few (programmers) use ASM, but nearly all of them use higher level languages.
tsimionescu
I think comparing assembly with arithmetic is dead wrong. Arithmetic is something that you use constantly in virtually any mathematical activity you will ever do, at least at the under-graduate level. There is literally 0 calculus, statistics, or algebra you could understand if you didn't know arithmetic.
In contrast, you can have a very successful, very advanced career in computer science or in programming without once in your life touching a line of assembler code. It's not very likely, and you'll be all the poorer for it, but it's certainly possible.
Assembly language is much more like learning the foundations of mathematics, like Hilbert's program (except, of course, historically that came a few millennia after).
wavemode
> extremely tedious, error prone, and sensitive to details
I've taught people Python as their first language, and this was their exact opinion of it.
When you're an experienced programmer you tend to have a poor gauge of how newcomers internalize things. For people who are brand new it is basically all noise. We're just trying to gradually get them used to the noise. Getting used to the noise while also trying to figure out the difference between strings, numbers, booleans, lists, etc. is more difficult for newcomers than many people realize. Even the concept of scoping can sometimes be too high-level for a beginner, IME.
I like asm from the perspective that, its semantics are extremely simple to explain. And JMP (GOTO) maps cleanly from the flowchart model of programming that most people intuit first.
cameldrv
IMO Python used to be a great first language, but it's gotten much more complicated over the years. When I'm teaching programming, I want an absolute minimum number of things where I have to say "don't worry about that, it's just boilerplate, you'll learn what it means later."
In particular, Python having generators and making range() be a generator means that in order to fully explain a simple for loop that's supposed to do something X times, I have to explain generators, which are conceptually complicated. When range() just returned a list, it was much easier to explain that it was iterating over a list that I could actually see.
tsimionescu
I do realize how difficult this all is, I still have some recollection of how I started to program and how alien it all seemed. And note that I first started with 4 years of C in high school.
However, I don't agree at all that having strings and numbers as different things was ever a problem. On the contrary, explaining that the same data can be interpreted as both 48 and "0" is mystifying and very hard to grok, in my experience. And don't get me started on how hard it is to conceptualize pointers. Or working with the (implicit) stack in assembly instead of being able to use named variables.
wat10000
My kid is just finishing up a high school intro CS class. A full school year in, and they still have trouble with the fact that their variable and type names must have the exact same capitalization everywhere they're used.
yetihehe
> So why teach someone a language that doesn't have if, while, (local) variables, scopes, types, nor even real function calls?
You can teach them how to implement function calls, variables and loops using assembly, to show them how they work under the hood and how they should be thankful for having simple if in their high level languages like C.
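A sketch of that kind of under-the-hood lesson, staying in C: the structured loop a student writes, followed by the compare-and-branch shape it lowers to (names and labels are illustrative):

    /* what the student writes */
    int sum_to(int n) {
        int total = 0;
        while (n > 0) {
            total += n;
            n--;
        }
        return total;
    }

    /* the branch form the compiler lowers it to, spelled with goto */
    int sum_to_lowered(int n) {
        int total = 0;
    loop_test:
        if (!(n > 0)) goto loop_end;   /* conditional branch */
        total += n;
        n--;
        goto loop_test;                /* unconditional jump back */
    loop_end:
        return total;
    }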
tsimionescu
That often leaves people with very bad mental models of how programs actually compile in modern optimizing compilers and in modern operating systems (e.g. people end up believing that variables always live on the stack, that function parameters are passed on the stack, that loops are executed in the same way regardless of how you write them, etc).
diggan
> it is extremely tedious, error prone, and sensitive to details.
That sounds like the perfect beginner language! If they survive that experience, they'll do very well in almost any type of programming, as it's mostly the same just a tiny bit less tedious. A bit like "hardening" but for programmers.
HeyLaughingBoy
Perhaps if you want to gatekeep for the most stubborn individuals, but you'll lose a lot of talent that way.
horsawlarway
So much this.
This is like learning to read by first being asked to memorize all the rules of grammar and being quizzed on them, or being forced to learn all the ins and outs of book binding and ink production.
It's tedious, unproductive, miserable.
There's very little reward for a lot of complexity, and the complexity isn't the "stimulating" complexity of thinking through a problem; it's complexity in the sense of "I put the wrong bit in the wrong spot and everything is broken with very little guidance on why, and I don't have the mental model to even understand".
There's a perfectly fine time to learn assembly and machine instructions, and they're useful skills to have - but they really don't need to be present at the beginning of the learning process.
---
My suggestion is to go even farther the other way. Start at the "I can make a real thing happen in the real world with code" stage as soon as possible.
Kids & adults both light up when they realize they can make motor turn, or an LED blink with code.
It's similarly "low level" in that there isn't much going on and they'll end up learning more about computers as machines, but much more satisfying and rewarding.
skydhash
The best way to go about that is to use a simulator for an old CPU, like EdSim51[0]. You can do a lot of things with just a few lines of code.
> it's complexity in the sense of "I put the wrong bit in the wrong spot and everything is broken with very little guidance on why, and I don't have the mental model to even understand"
That's the nice thing about assembly: it always works, but the result may not be as expected. But instead of having a whole lot of magic between what is happening and how you model it, it's easy to reason about the program. You don't have to deal with stack traces, types, garbage collection and null pointer exceptions. Execution and programming share the same mental model: linear unless you say so.
You can start with assembly and then switch to C or Python and tell them: for bigger projects, assembly is tedious, and this is what we invented instead.
mousethatroared
"is that it is extremely tedious, error prone, and sensitive to details. It is also unstructured,"
That's why it's such an important first language! Pedagogically it's the foundation motivating all the fancy things languages give you.
You don't teach a kid to cut wood with a table saw. You give them a hand saw!
tsimionescu
No, it is not the foundation motivating what other languages give you, not at all.
Programming languages are usually designed based on formal semantics. They include constructs that have been found either through experience or certain formal reasons to be good ways to structure programs.
Haskell's lazy evaluation model, for example, has no relationship to assembly code. It was not in any way designed with thought to how assembly code works, it was designed to have certain desirable theoretical properties like referential transparency.
It's also important to realize that there is no "assembly language". Each processor family has its own specific assembly code with its own particular semantics that may vary wildly from any other processor. Not to mention, there are abstract assembly codes like WebAssembly or JVM bytecode, which often have even more alien semantics.
ziml77
You give them a hand saw because power tools are far easier to inflict serious injuries with. But if you're teaching a kid who's old enough, there's no reason to start on a hand saw if you have the power tools available.
taco_emoji
> You don't teach a kid to cut wood with a table saw. You give them a hand saw!
Okay but that's not for pedagogical reasons, it's because power saws are MUCH more dangerous than hand saw.
Contrariwise, you don't teach a kid to drill wood with a brace & bit, because a power drill is easier to use.
mystified5016
Controversial opinion but we should be teaching new programmers how a CPU works and not hand-wave the physical machine away to the cloud.
Not doing this is how you get Electron.
flohofwoe
Starting with assembly makes it pretty clear why higher level languages had been invented. E.g. a speed run through computing:
- machine code
- assembly
- Lisp and Forth
- C
- Pascal
- maybe a short detour into OOP and functional languages
...but in the end, all you need to understand for programming computers are "sequences, conditions and loops" (that's what my computer club teacher used to say - still good advice).
TheOtherHobbes
I'd change the end of that list to C, Pascal, Lisp, Python.
But in the end no one learns "assembler". Everyone learns a specific ISA, and they all have different strengths and limitations. Assembler on a 36-bit PDP-10, with 16 registers and native floating point, is a completely different experience to assembler on a Z80 with an 8-bit accumulator and no multiply or divide.
You can learn about the heap and the stack and registers and branches and jumps on both, but you're still thinking in terms of toy matchstick architecture, not modern building design.
tsimionescu
> but in the end, all you need to understand for programming computers are "sequences, conditions and loops"
I fully agree - and assembly language teaches you precisely 0 of these.
flohofwoe
IMO It depends a lot on the assembly flavour.
The best ISA for learning is probably the Motorola 68000, followed by some 8-bit CPUs (6502, 6809, Z80), also probably ARM1, although I never had to deal with it. I always thought that x86 assembly is ugly (no matter if Intel or AT&T).
> It quickly becomes tedious to do large programs
IME with modern tooling, assembly coding can be surprisingly productive. For instance I wrote a VSCode extension for 8-bit home computers [1], and dog-fooded a little demo with it [2], and that felt a lot more productive than back in the day with an on-device assembler (or even typing in machine code by numbers).
[1] https://marketplace.visualstudio.com/items?itemName=floooh.v...
[2] https://floooh.github.io/kcide-sample/kc854.html?file=demo.k...
Lerc
Oh nice, I was talking just yesterday about how I like chips as a programming paradigm.
I agree about tooling, I made a pacman game in a dcpu16 emulator in a couple of days.
https://fingswotidun.com/dcpu16/pac.html
I experimented with a fantasy console idea using an in-browser assembler as well: https://k8.fingswotidun.com/static/ide/
I think you can build environments that give immediate feedback and the ability to do real things quickly in ASM. I would still recommend moving swiftly on to something higher level as soon as it starts to feel like a grind.
tsimionescu
Sure, but learning an old ISA can leave you with a very very wrong idea about how modern processors work. Even x86 assembly paints a very misleading image of how modern processors actually work. For example, someone learning x86-64 assembly will likely believe all of the following:
- assembly instructions are executed in the order they appear in in the source code
- an x86 processor only has a handful of registers
- writing to a register is an instruction like any other and will take roughly the same time
- the largest registers on an x86 processor are 64-bit
t-3
They will be disabused of any of those notions simply by reading the relevant portions of the architecture handbook. In a pedagogical environment that's very simple to arrange.
ThrowawayR2
All of which are completely irrelevant implementation details hidden behind the ISA. The x86-64 ISA promises execution of instructions in the specified order, a certain number of registers, etc. and that's all they need to know.
flohofwoe
Peeking under the hood is a later step after getting comfortable with assembly coding. E.g. none of those details are really relevant when starting out, instead it makes a lot of sense to do a speed run through computing history in order to really understand why modern CPUs (and computers as a whole) work like they do.
whobre
I agree that M68k is nice, as are the 8-bit ones you mention. I just find it strange that you like Z80 and dislike x86 - they are fundamentally not that different and both are descended from 8080.
flohofwoe
Yeah the Z80 instruction set is quite messy (mainly because it had to fill gaps of the 8080 instruction set for backward compatibility). But as an evolution of the 8080 instruction set, the Z80 is still cleaner than x86 (IMHO!).
Also, the Z8000 looks quite interesting and like the better 16-bit alternative to the x86, but it never took off: https://en.wikipedia.org/wiki/Zilog_Z8000
aaronbaugher
I started with Z-80 assembly, then BASIC, then 6502 assembly, then higher-level languages like C and perl, and I think the assembly gave me a useful foundation for what was going on under the hood. I'm not sure I'd even call assembly a "language" in the sense of the others. It has instructions, not statements, and there's really no syntax.
If I were teaching a general-interest programming course, I'd probably start with just a bit of BASIC to introduce a few broad concepts like variables and looping, then a little assembly to say, "And this is what's going on when you do those things," and then move up the chain. Then when they get to something like C and go to look at the assembly it produces for debugging, they'll at least be familiar with concepts like registers and branching. So not quite the order I happened to do it in, but similar.
criddell
If you have a good macro assembler, it is only a little more difficult than C. There's just more to learn up front (things like calling conventions, register usage, etc...).
I wouldn't teach it first, but after a person knows the basics in another language, seeing how it all actually works can be fun.
jcranmer
I was a TA for an intro to assembly language course, which means I got my office hours full of all of the students who struggled with assembly language and had to work with them one-on-one to get them over their roadblocks to pass the class.
Assembly language is not a reasonable first programming language. There's just so many things about it that make it a poor choice for programming instruction.
Chiefly, assembly lacks structure. There's no such thing as variables. There's no such thing as functions. You can fake some of this stuff with convention, but if you make mistakes--and students in intro-to-programming will make mistakes--there is nothing that's going to poke you that you did something wrong, you just get the wrong result.
ferguess_k
I think in most CS programs, students do learn assembly early on, perhaps not as the first language, but definitely as a second language, as required by most Arch courses.
cameldrv
Personally way back when, I first learned BASIC, then tried to learn C, but didn't get pointers, then learned ASM, and then pointers became obvious, and went back to C. If you're going to be using C or doing anything with hardware, learning ASM IMO is very useful just to understand how the machine really works.
zahlman
> Getting them to do things is not hard, the difficulty comes from tasks exceeding a scale where you can think about things at their most basic level.
Indeed - you don't actually need to work on difficult tasks to get the intellectual benefit. Once you've properly understood what a computer is, you can absorb the ideas of SICP.
flohofwoe
Here's a similar (and much more in-depth) opcode decoding recipe for the Z80, very useful for emulator development:
http://www.z80.info/decoding.htm
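A minimal C sketch of the bit-field split that recipe is built on (each Z80 opcode byte decomposes into x/y/z fields, with y further split into p/q):

    #include <stdint.h>
    #include <stdio.h>

    /* x = bits 7-6, y = bits 5-3, z = bits 2-0; p = y >> 1, q = y & 1 */
    void decode(uint8_t op) {
        uint8_t x = op >> 6, y = (op >> 3) & 7, z = op & 7;
        printf("x=%u y=%u z=%u p=%u q=%u\n", x, y, z, y >> 1, y & 1);
    }

    int main(void) {
        decode(0x41);  /* 01 000 001 -> x=1 block: LD r[y], r[z] = LD B, C */
        return 0;
    }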
For actually programming in machine code, this understanding of the internal opcode structure isn't all that useful though. Usually - without an assembler at hand - you had a lookup table with all possible assembly instructions on the left side and the corresponding machine code bytes on the right side.
Programming by typing machine code into a hex editor is possible, but really only recommended as an absolute fallback if there's no assembler at hand - mainly because you have to keep track of all global constant and subroutine entry addresses (the main thing an assembler does for you), and you have to leave gaps at strategic locations so that it's possible to patch the code without having to move things around.
tasty_freeze
For the past year or so, a couple teen boys from my neighborhood come by on sunday afternoon for a couple hours of programming in python. I started very simply and built up with text based tasks, then showed them pygame.
I am thinking about showing them what is under the hood, that python itself is just a program. When I learned to program it was the late 70s, and trs-80s and apple-IIs were easy to understand at the machine code level.
I could recapitulate that experience for them, via an emulator, but that again just feels like an abstraction. I want them to have the bare-metal experience. But x86 is such a sprawling, complicated instruction set that it is very intimidating. Of course I'd stick to a simplified subset of the instructions, but even then, it seems like a lot more work to get output onto the screen on a PC vs the old 8-bit machines, where you write to a specific location and it shows up on the screen.
ThrowawayR2
Buy them a copy of "Human Resource Machine" on Steam or (preferably since there's no DRM) Good Old Games. It's a gamified version of what writing machine language on the old 8 bit CPUs of yore was like. The puzzle challenge in HRM is authentic in the sense that it derives from the natural constraints of having a single accumulator and very simple instructions rather than unnaturally injected constraints like the Zachtronics games, which are good but I wouldn't recommend as a learning tool.
TuringTourist
You could try the thing that made it click for me, long after x86 was dominant.
Show them a CPU running on Logisim (or the like, such as the newer Digital) and show how when you plug a program into a ROM, it turns into wires lighting up and flipping gates/activating data lines/read registers etc.
boricj
Machine code isn't scary, but its nature is severely misunderstood.
Skipping over the bundling of instructions into code blocks, the next logical construct is functions. These have references to code and data in memory; if you want to relocate functions around in memory, you introduce the concept of relocations to annotate these references, and of a linker to fix them to a particular location.
But once the linker has done its job, the function is no longer relocatable, you can't move it around... or that is what someone sane might say.
If you can undo the work of the linker, you can extract relocatable functions from executables. These functions can then be reused in new executables, without decompiling them first; after all, if what you've extracted is equivalent to the original relocatable function, you can do the same things with it.
Repeat this process over the entire executable and you've stripped it for parts, ready to be put back together with the linker. Change some parts and you can modify the executable as if you were replacing object files, instead of binary patching it in place with all the constraints that come with that.
Machine code is like Lego bricks, it just takes a rather unconventional point of view (and quite a bit of time to perfect the art of delinking) to realize it.
jebarker
ASM programming is fun. Machine code (as in what ASM encodes to) isn't scary, but it is extremely tedious to work with. I recommend the first part of Casey Muratori's Performance Aware Programming course if you want to feel that pain.
ferguess_k
I think you need to do it in production to retain the knowledge. If you just do it as a hobby, then most people just give up at certain point because there is no point to bang your head on the wall for nothing. You need to have a real problem to solve to go a long way.
shakna
I started building a Forth recently, but decided that instead of interpreter or transpiler or whatever, I'd map to bytes in memory and just straight execute them.
This non-optimising JIT has been far, far easier than all the scary articles and comments I've seen led me to believe.
I'm already in the middle of making it work on both AArch64 and RISC-V, a couple of weeks in.
pjmlp
We did a similar approach back in the day, when going through the Tiger language[0], on the Java version.
Our approach was to model the compiler IR into Assembly macros, and follow the classical UNIX compiler build pipeline, thus even though it wasn't the most performant compiler in the world, we could nonetheless enjoy having our toy compiler generate real executables in the end.
cnity
I did this for WebAssembly WAT (an IR that is syntactically similar to lisp) by mapping the AST for my lisp more or less directly to the WAT IR, then emitting the bytecode from there. It was pretty fun.
simpleui
Very interesting, care to share the source?
shakna
Oh, it's still a while off that. I do plan to make it public at some point, but when I'm actually happy the code isn't completely vomit.
But for a simple taste, the push to stack function currently looks like this. (All the emit stuff just writes bytes into a mmap that gets executed later.)
void compile_push_literal(Value val) {
#if ARCH_X86_64
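    // x86-64: mov rdi, imm64 (the value), mov rax, imm64 (address of push), then call rax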
emit_bytes((uint8_t[]){X86_MOV_RDI_IMM64_0, X86_MOV_RDI_IMM64_1}, 2); emit_uint64_le(val);
emit_bytes((uint8_t[]){X86_MOV_RAX_IMM64_0, X86_MOV_RAX_IMM64_1}, 2); emit_uint64_le((uint64_t)push);
emit_bytes((uint8_t[]){X86_CALL_RAX_0, X86_CALL_RAX_1}, 2);
#elif ARCH_ARM64
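    // ARM64: build each 64-bit immediate 16 bits at a time (movz + three movk), then blr x1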
uint64_t imm = val;
emit_uint32_le(ARM64_MOVZ_OP | (ARM64_REG_X0 << 0) | ((imm & 0xFFFF) << 5));
emit_uint32_le(ARM64_MOVK_OP_LSL16 | (ARM64_REG_X0 << 0) | (((imm >> 16) & 0xFFFF) << 5));
emit_uint32_le(ARM64_MOVK_OP_LSL32 | (ARM64_REG_X0 << 0) | (((imm >> 32) & 0xFFFF) << 5));
emit_uint32_le(ARM64_MOVK_OP_LSL48 | (ARM64_REG_X0 << 0) | (((imm >> 48) & 0xFFFF) << 5));
uint64_t func_addr = (uint64_t)push;
emit_uint32_le(ARM64_MOVZ_OP | (ARM64_REG_X1 << 0) | ((func_addr & 0xFFFF) << 5));
emit_uint32_le(ARM64_MOVK_OP_LSL16 | (ARM64_REG_X1 << 0) | (((func_addr >> 16) & 0xFFFF) << 5));
emit_uint32_le(ARM64_MOVK_OP_LSL32 | (ARM64_REG_X1 << 0) | (((func_addr >> 32) & 0xFFFF) << 5));
emit_uint32_le(ARM64_MOVK_OP_LSL48 | (ARM64_REG_X1 << 0) | (((func_addr >> 48) & 0xFFFF) << 5));
emit_uint32_le(ARM64_BLR_OP | (ARM64_REG_X1 << 5));
#elif ARCH_RISCV64
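    // RISC-V: load the value into a0 and push's address into t0, then jalr ra, t0, 0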
emit_load_imm_riscv(val, RISCV_REG_A0, RISCV_REG_T1);
emit_load_imm_riscv((uint64_t)push, RISCV_REG_T0, RISCV_REG_T1);
emit_uint32_le((0 << 20) | (RISCV_REG_T0 << 15) | (RISCV_F3_JALR << 12) | (RISCV_REG_RA << 7) | RISCV_OP_JALR);
#endif
}
0x000xca0xfe
Machine code generation for RISC-V is so easy. Excellent for teaching.
User23
This is super cool!
Creating an assembler with Lisp syntax and then using that to bootstrap a Lisp compiler (with Lisp macros instead of standard assembler macros) is one of those otherwise pointless educational projects I’ve been wanting to do for years. One day perhaps.
simpleui
Thanks Shakna!
mananaysiempre
I mean, it’s not hard as such, the encodings of some instruction sets are just ass, with 32- and 64-bit x86 as the foremost example and Thumb-2 not far behind it. Also, if you’re dynamically patching existing code, you’ll have to contend with both modern OSes (especially “hardening” patches thereto) making your life harder in bespoke incompatible ways (see: most of libffi) and modern CPUs being very buggy around self-modifying code. Other than that, it just takes quite a bit of tedious but straightforward work to get anywhere.
shakna
I haven't had any issues with the OS.
I mmap, insert, mark as executable and done. Patchjumping and everything "just works".
I'm not modifying my own process, so there's no hardening issues. Just modifying an anonymous memory map.
PlunderBunny
I taught myself to program on an 8-bit BBC micro-computer in the mid-80s by typing in BASIC listings. I understood BASIC quite well, and could write my own structured BASIC programs, but machine code was always a bit out-of-reach. I would try to read books that started by demonstrating how to add, subtract etc, but I couldn’t see how that could build up to more complicated stuff that I could do in BASIC, like polling for input, or playing sounds, or drawing characters on the screen. Only once I got an advanced users guide and discovered the operating system commands, then it started to click with me - the complicated stuff was just arranging all the right data in the right bits of memory or registers, then (essentially) calling a particular OS command and saying ‘here’s the data you want’.
bowsamic
Yeah, the issue is that the pedagogy doesn't make it clear how to bridge the "calculator" with the OS stuff. I had this issue when I was a kid. How does adding eventually make something draw on the screen? Of course, it doesn't by itself; you need some hardware- or OS-specific information.
eterm
It wasn't until I read Petzold's CODE that this stuff (especially the role of the motherboard in bridging processing, memory and I/O, and what an OS is for) started to click for me.
bowsamic
Yeah my stepdad bought me CODE as a kid and that really helped. A similar book that took a "top down" approach as much as possible would have also helped though, but would be much harder to pull off
tux3
But if this doesn't satisfy your curiosity, you might realize this is just pushing the magic blackbox/question mark a little further down the chain
How does the OS and the hardware draw on the screen, actually? All they have is also just calculator stuff, super basic primitives. You can't even do loops in hardware, or even real branches (hardware always "executes both sides" of a branch at once)
Anyways, if you keep digging long enough you eventually end up finding this XKCD https://xkcd.com/435/ =)
tsimionescu
> hardware always "executes both sides" of a branch at once
Unless you're talking about quantum hardware, that is very much not true. The whole point of transistors is to choose whether to power one part of a circuit or another.
Plus, even for hardware, the solution to all this is to modularize all the way down. One piece of hardware sets up the right state and powers up another piece of hardware - this type of logic doesn't stop at the OS level. For drawing on the screen, ultimately you reach a piece of hardware that lights up in one of three colors based on that state - but all the way there, it's the same kind of "function calls" (and even more indirection, such as communication protocols) on many levels.
neomech
In 1982, I programmed my ZX81 by converting assembly to hex by hand because BASIC was just too slow. I'd write my assembly on paper, convert it to hex using reference tables, then use a simple BASIC FOR loop to POKE the values into memory; we'd reserved space for the machine code in a REM statement at a fixed position in memory.
When all the values were POKEd in, I'd save to tape and execute it with RAND USR 16514.
That memory address is permanently etched in my brain even now.
It wasn't good, bad or scary it was just what I had to do to make the programs I wanted to make.
stevekemp
I did the same thing, on the 48k Spectrum, a year or two later. I also remembered to add some NOPs between functions, to avoid me having to recalculate all the relative jump instructions if I made changes.
dedicate
For me the 'scary' part of machine code was never the actual logic. It was always just staring at that wall of hex or mnemonics and feeling like I needed a secret decoder ring!
jiehong
Yes, that does not help.
To me, it looks like some kind of complex tetris game. I guess we could maybe represent a program as such, with pieces for registers, instructions, etc.
Yet, the tooling we have is very terse, and textual.
amelius
Machine code was only "scary" in the old days when you had to reboot your system when you made a small mistake.
rausr
I enjoyed having to reload everything from tape (compact cassette tape - ie the kind of thing you'd use with a home computer in the early eighties), after a crash due to my poor code. I think the term used then was "character building" ;)
chopin
One of my first C programs wrote straight into the BIOS memory (1989 iirc). The machine froze and refused to reboot. We had to remove the BIOS battery to reset the BIOS. Luckily, the battery wasn't soldered to the main board.
renox
The thing is: most programmers only ever see assembly language generated by a compiler, so no comments, and in optimised code with vector operations, it IS scary.