
Go-attention: A full attention mechanism and transformer in pure Go

atomic128

You can do a lot better than this, by using Go as a JIT code generator, dynamically linking the result, and jumping into it with cgo. Easily saturates the CPU vector math units.

I use exactly this approach for the futures/options prediction transformers on my website.
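In rough strokes: emit specialized C for the hot kernels, compile it to a shared object, dlopen it, and call in through a small cgo trampoline. A minimal sketch of that pipeline (illustrative only; the kernel, compiler flags, and names are placeholders, not the production code):

    package main

    /*
    #cgo LDFLAGS: -ldl
    #include <dlfcn.h>
    #include <stddef.h>

    typedef void (*dot_fn)(const double*, const double*, double*, size_t);

    // cgo cannot call a raw function pointer directly, so bounce through C.
    static void call_dot(void *fp, const double *a, const double *b,
                         double *out, size_t n) {
        ((dot_fn)fp)(a, b, out, n);
    }
    */
    import "C"

    import (
        "fmt"
        "os"
        "os/exec"
        "unsafe"
    )

    func main() {
        // 1. Generate a C kernel. A real system would specialize this to the
        //    model's shapes at runtime; a fixed dot product stands in here.
        src := `#include <stddef.h>
    void dot(const double *a, const double *b, double *out, size_t n) {
        double s = 0;
        for (size_t i = 0; i < n; i++) s += a[i] * b[i];
        *out = s;
    }`
        if err := os.WriteFile("kernel.c", []byte(src), 0o644); err != nil {
            panic(err)
        }

        // 2. Compile to a shared object. -O3 -march=native lets the C
        //    compiler emit SIMD for the loop, which is where the win is.
        if out, err := exec.Command("cc", "-O3", "-march=native", "-shared",
            "-fPIC", "-o", "kernel.so", "kernel.c").CombinedOutput(); err != nil {
            panic(string(out))
        }

        // 3. Dynamically link the result and resolve the symbol.
        h := C.dlopen(C.CString("./kernel.so"), C.RTLD_NOW)
        if h == nil {
            panic("dlopen failed")
        }
        fp := C.dlsym(h, C.CString("dot"))

        // 4. Jump into it with cgo.
        a := []float64{1, 2, 3}
        b := []float64{4, 5, 6}
        var res float64
        C.call_dot(fp, (*C.double)(unsafe.Pointer(&a[0])),
            (*C.double)(unsafe.Pointer(&b[0])),
            (*C.double)(unsafe.Pointer(&res)), C.size_t(len(a)))
        fmt.Println(res) // 32
    }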

But I will never "open" another piece of software, now that it's all grist for the LLM code generator industry. Anonymous common property, sold by the LLM companies. No credit to the author.

Why anyone opens any software anymore is a mystery to me. We are witnessing the greatest theft of intellectual property in the history of Man.

badsectoracula

> Why anyone opens any software anymore is a mystery to me.

Because I open my software to be useful to others, including others that may benefit from my code indirectly via an LLM being trained on it. If anything, just recently I was thinking about how to make a documentation generator that produces documents in a format that's easier for LLMs to "grok", so that people can feed them to an LLM and ask questions about it.

I'd advocate for using a local LLM instead, though: they may not be as good as the cloud stuff you rent, but they are good enough, they run on most mid-to-high-end PCs, and you are in control.

godelski

> We are witnessing the greatest theft of intellectual property in the history of Man.

The issue is that this has been going on for decades. We've been really bad at allocating capital to the people who build important and highly influential software. These big companies do play a role, but it is a shame that even a small portion of their profits does not go back to the people whose work is a cornerstone of their success. I often wonder what the world would look like if being an open source developer were actually profitable. But we definitely know what the world looks like when being an open source developer essentially means having two full-time jobs and getting paid for one.

I think the problem is that people see it as "charity" and see charity work as less valuable. I remember Steve Levitt talking about a conversation he had with Bill Gates over this. I think it was in the PIMA episode where they discuss CEO pay at charities, and how it is far less than at a corporation even when the work is the same.

robertlagrant

> We've been really bad at allocating capital to the people who build important and highly influential software

What does this mean? Can you give an example?

whoiscroberts

I took it to mean that we give money to people who ask for it.

ncruces

Because some people just don't care where their code ends up.

Many people release code to the "public domain" (or under very liberal licenses). If those people never cared whether corporate entity™ used it in proprietary software, why should they care if an LLM chews on it and regurgitates it?

Also, it's far worse if entitled user® posts abusive issues to my repo than if they copy snippets of my code through an LLM and are forced to support their inferior spitballed copy all by themselves.

csdvrx

> Because some people just don't care where their code ends up.

Yes, take me for example.

> Many people release code to the "public domain" (or under very liberal licenses).

In my case, the MIT license, because I saw it was popular, and I was afraid that in some places "public domain" might cause unexpected legal issues for whoever wants to "play by the book" and use my code.

> if an LLM chews on it and regurgitates it

As work coming from a machine does not have copyright protection, whoever gets an LLM to spit my code back out can then claim it as their own, under whatever terms they like.

If this person wants to contribute to a free software project and release the code under the GPL v2 or v3, good: it may help create a new feature that users will enjoy!

If this person wants to contribute to their company's private software that's only available on a subscription basis (and let's say the subscription is sold at an eye-watering price), good: it means whoever pays for this subscription will get more for their money, and whoever uses the software may get a new feature they will enjoy!

Software has nearly zero marginal cost. An LLM is the closest thing we have to a Star Trek-level "replicator", giving everyone everything they want.

On what moral grounds would you object to a Star Trek-level replicator for physical goods? (Please make them good ones: offering anyone any food they want would fix world hunger once and for all.)

Then why object to one for virtual goods?

Maybe I'm reading too much into your reply, but I don't see it as trolling or bad faith.

I see variants of it in many places, and they all look to me very close to Luddism: rejecting a new technology because you fear for your own work, while ignoring what this technology will enable in the greater picture. In the original case of Luddism, mechanization reduced the price of clothing for everyone by increasing production and decreasing labor, freeing workers to move into other fields where they could try to satisfy other human wants, some of them (like video games) inconceivable to the original Luddites.

We should feel grateful we get more technology, as it removes constraints and makes more people happy.

hnlmorg

I don't think fearing for one's job is necessarily a bad reason, because as much as I love the idea of a Star Trek utopia, real and present people have real responsibilities, like children who are cared for with money generated by their careers.

This is particularly relevant in societies which take a dim view of their social responsibilities (I'm looking at you, America), which means there's less of a safety net should that career disappear.

We are already seeing more developers than job vacancies in the tech market, so this isn't a theoretical concern either.

That all said, I don't think hiding our valuable code for fear of LLMs is the right solution either. If your code is really that good, then you'll be more likely to secure your career by sharing it, because it builds a visible reputation that extends further than any verbiage on a CV might.

So while I don’t agree with the LLM excuse I can still completely understand why someone might cite it as a reason not to open their source.

Another valid reason is that some people have been completely burnt out dealing with entitled complaints from users. Thankfully I've had a mostly positive experience personally, but I've read that others haven't been so fortunate.

TeMPOraL

> On what moral grounds would you object to a Star Trek-level replicator for physical goods? Then why object to one for virtual goods?

This just made me realize a distressing thing: if we ever built a replicator, a lot of people might then want to destroy it, for the same reason I believe they object to LLMs: greed and entitlement. Because they don't get to benefit personally, and don't get the right of first refusal, the instinct is to deny the value to others. The Dog in the Manger.

ncruces

> Maybe I'm reading too much into your reply, but I don't see it as trolling or bad faith.

Maybe you are. All my repos are either MIT (where I'm a little proud, and would appreciate the acknowledgement - though realistically, I'd never sue anyone over it) or MIT-0.

So yeah, if it ends up in an LLM and people copy it, great. Fewer "please give me free support" requests coming my way.

ben_w

> On what moral grounds would you object to a Star Trek-level replicator for physical goods? (Please make them good ones: offering anyone any food they want would fix world hunger once and for all.)

Unfortunately, this is one topic where my philosophy qualification comes in handy: "moral grounds" vary so much from person to person that they're almost useless as an argument.

Consider the following list of examples. I expect most people in the world will object to at least one of these arguments, but which one(s) they object to will vary wildly:

1. Kantian Ethics: Replicators risk devaluing human labor by reducing work to a mere means, thereby undermining the inherent dignity derived from effort.

2. Aristotelian Virtue Ethics: By eliminating the need for craftsmanship and effort, replicators could impair the cultivation of virtues essential for personal and communal flourishing.

3. Marxist Ethics: The obsolescence of traditional labor due to replicators may intensify alienation and disrupt social structures central to class solidarity.

4. Existentialism: Removing material struggle through replicators might strip life of the challenges necessary for authentic self-creation and personal meaning.

5. Confucian Ethics: Such technology could erode the social harmony built on mutual effort and well-defined communal roles, destabilizing moral and familial bonds.

6. Environmental Ethics: Unlimited production enabled by replicators may encourage overconsumption and waste, endangering ecological balance and sustainable resource management.

7. Amish Ethics: Replicators could undermine the values of simplicity, humility, and communal labor by promoting dependence on technology instead of human effort and cooperation.

8. Consequentialism: While replicators, as defined by your question, can solve world hunger, they're also demonstrably able to build weapons (as can current 3D printers), and can function as Von Neumann self-replicating machines. A literal minefield made of these appeared in DS9, and giving such tech to the world almost unavoidably means also giving it to every psychopath. Also, grey-goo/paperclip scenarios become plausible.

> Then why object to one for virtual goods?

You can't eat virtual cheese, and unlike in The Matrix, if you die in a video game you don't die in real life, so the arguments for/against AI don't even need to be the same as those for/against Trek replicators.

umvi

The software I open is usually a gift to humanity/public service. I'm not seeking to gain anything. Anyone can use it for anything - for profit or not.

diggan

Or put the way I usually say it, in completely normal conversations:

> free of charge, to any person obtaining a copy of this software, to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software

anonym29

If I had a nickel for every time I repeated this exact line, verbatim in casual conversation...

ninininino

I suppose there's a different angle, which is that the open community can distill the privately trained models and then open the distilled model in turn, as many believe DeepSeek did. In effect, this lets private corps pay for the expensive training (without paying the authors of the data they train on, as you correctly point out), while the open community benefits from that training labor/cost by copying it back and making it free again.

pona-a

That does make me optimistic. "Stealing back" our stolen data does in the end a free model make, unless the current, very... unprecedented US administration decides distributing unauthorized distilled models carries a prison sentence.

But I think most of it is psychological. There used to be goodwill between the public and NLP researchers: what heartless monster would object to some linguists using the by-product of their conversations to make a computer learn that "king - man + woman = queen", or to generate some unintentionally comedic writing?

Now the honeymoon is over. You see that what you've been feeding your public life into is now a monster with a hundred vices and a few good deeds. It is behind the tidal wave of spam and misinformation; it is the oracle breeding ignorance among the gullible; it is the iron hand of censorship for many a police state; and, most insulting of all, it is sold by its makers as a replacement for any genuine talent or minimal human effort.

"Why learn to draw when you can have an AI produce a cheap imitation instead? Why learn math, CS, or foreign languages when you can delegate any and all thinking to the great machine? What did we even have you all for, anyway — intellectuals, artists, and craftsman — with your constant complaining and demands we learn a skill? Who do they think they are? Experts?"

No, the future belongs to the lazy and talentless, to thieves and usurpers, who will sit at the top with an aristocratic, borderline catatonic brainlessness, while you will be at their knees polishing their boots, since the machine to do so costs an order of magnitude more than your meager wage.

It is anti-intellectualism in a form purified to industrial potency, directed at the very people by whose generosity their rather inept "replacements" were manufactured.

I can't say what the rational response to all this is. I can tell you which emotional response seems most appealing.

aaa_aaa

Unlike you, many people do not care if what they think, write, or utter is copied or used. Also, some believe intellectual property is not property, and that the real thieves are the ones who got phony monopoly grants to protect it.

spudlyo

I personally hold that intellectual property isn't property, and is increasingly becoming a net negative to humanity as a whole. I see AI as an accelerant in the erosion of IP's relevance and enforceability. With AI being able to crank out derivative works at scale, it blurs the lines between infringement and transformation. The flood of such content makes enforcement increasingly impractical.

While I'm not unsympathetic to the plight of creatives, and their need to eat, I feel like the pendulum has swung so far to the interests of the copyright holders and away from the needs of the public that the bargain is no longer one I support.

treyd

Intellectual property isn't real. It's a fiction we constructed to try to control expression in order to allow extraction of profit from ideas. We had to keep exceptions like expiration and "fair use" to make it not absurd and obviously self-contradictory.

All LLMs are doing is shuffling around the roles to bring light to an underlying contradiction. Yes they are profiting off of unpaid labor, but what that actually means is the models themselves should be "property" of everyone just as the training data should be.

golergka

> Intellectual property isn't real. It's a fiction we constructed

Regardless of my opinion about IP in particular, the argument "X isn't real, it's a fiction we constructed" is silly. We have "constructed" things like justice, duty, charity, mercy, and a lot of other social and moral constructs, and it's good that we did. They're also just as real as potential energy in physics: not a material object that you can see or touch, but something that greatly affects what happens in reality.

treyd

Sure, but I would argue that those are concepts that have existed for millennia and have real material grounding in reality, whereas intellectual property is entirely a fictional construction.

At its core, owning property involves the ability to use force to assert your control over it. This is completely impossible with ideas (and information more broadly) since they're non-physical, so it's not really property in the way real world property like land is.

So, because it's not reflective of how the material world works, that's the heart of the contradiction I alluded to in my previous comment. There is no way to resolve the problem of LLMs from within that logical framework without landing on some further counterintuitive result. There has to be legislation around it if we want to keep the charade going, or, ideally, we'd drop it altogether.

zbobet2012

It depends on how much time you spend inside your C function: cgo has substantial per-call overhead. I tend to prefer just writing assembly functions for critical-path code. You can use libraries like https://github.com/mmcloughlin/avo to make them easier to write and maintain.
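For a flavor of what that looks like, avo's introductory example is roughly this: a generator program you run with something like go run asm.go -out add_amd64.s -stubs stub.go (a real kernel would use the vector registers instead of a trivial add):

    //go:build ignore

    package main

    import . "github.com/mmcloughlin/avo/build"

    func main() {
        // Declares the Go signature the generated stub will carry.
        TEXT("Add", NOSPLIT, "func(x, y uint64) uint64")
        Doc("Add adds x and y.")
        x := Load(Param("x"), GP64())
        y := Load(Param("y"), GP64())
        ADDQ(x, y)
        Store(y, ReturnIndex(0))
        RET()
        // Writes the .s file and matching Go stub to the paths given
        // by the -out and -stubs flags.
        Generate()
    }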

bborud

Have you tried writing Go assembler instead of x86?

https://go.dev/doc/asm

(I'm not suggesting it, merely asking, since I haven't written any assembler for Intel in 30+ years and I have never written Go assembler.)
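For reference, hand-written Go assembler for a trivial function looks roughly like this (amd64, classic ABI0 calling convention; it's essentially what an avo generator like the one above would emit):

    // add.go
    package mathx

    // Add returns x + y. The body lives in add_amd64.s.
    func Add(x, y uint64) uint64

    // add_amd64.s
    #include "textflag.h"

    // func Add(x, y uint64) uint64
    TEXT ·Add(SB), NOSPLIT, $0-24
        MOVQ x+0(FP), AX    // load the first argument
        ADDQ y+8(FP), AX    // add the second argument
        MOVQ AX, ret+16(FP) // store the result
        RET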

nikolayasdf123

Doesn't mention benchmarks. Go is unacceptably slow when it comes to math. With the complete absence of SIMD CPU instructions (aka "do it yourself in assembly") and of GPU/CUDA support, Go is orders of magnitude slower than what you would get in C/C++/Rust, or even in Python or Java (which call into C/C++).

stpedgwdgfhgdd

Great to see these algorithms in Go. Finally I can study them at the implementation level as opposed to reading blogs.

neonsunset

Just scalar code? I was hoping to see some Go asm here for acceptable performance (or you could rewrite it in F#/C#, which provide appropriate SIMD primitives).

edit: to answer my own question: when inspected with Ghidra, this implementation indeed compiles to very slow scalar code (it operates on single fp64 values).

chrsig

I just hope for a sufficiently smart compiler *shrug* (I'm pretty sure Go has some autovectorization).

Before jumping to another language, I'd suggest examining the memory layout and access patterns.

neonsunset

The code there is written in a fairly auto-vectorizable way. But the actual capabilities of Go's compiler are very far from that, despite public expectation (and autovectorization is brittle; writing inference or training in a way that relies on it is the last thing you want). To put it in perspective, until 2021 Go always passed function arguments on the stack. It has improved since then, but the overall design aims to ensure common scenarios are fast (e.g. comparisons against string literals are unrolled); once you venture outside those, or need an optimization that requires more compiler complexity, Go is far less likely to employ it.

chrsig

> and autovectorization is brittle; writing inference or training in a way that relies on it is the last thing you want

Could you speak more to this? Is the concern that operations may get reordered?

> To put it in perspective, until 2021 Go always passed function arguments on the stack. It has improved since then, but the overall design aims to ensure common scenarios are fast (e.g. comparisons against string literals are unrolled); once you venture outside those, or need an optimization that requires more compiler complexity, Go is far less likely to employ it.

I agree with this assessment.

The individual operations in the repository (e.g., the dot product) look like they could be autovectorized. I'm assuming they aren't because of the use of a slice. I'm mildly curious whether they could be massaged into something that autovectorizes.

Most of my observations re: autovectorization in Go have been on fixed-size vectors and matrices, where SSE2 instructions are pretty readily available and loop unrolling is pretty simple.

I'm curious what it would produce with the matrix in a single slice rather than as independent allocations. Not curious enough to start poking at it, just curious enough to ramble about it conversationally.
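To make the two layouts concrete, this is the comparison being described (illustrative types, not from the repository):

    package main

    import "fmt"

    // Rows as independent allocations: each row is its own slice header
    // pointing somewhere on the heap, so walking a column hops around memory.
    type MatrixSlices [][]float64

    // Single backing slice: all rows*cols elements are contiguous, which is
    // friendlier to the prefetcher and easier for a compiler to vectorize.
    type MatrixFlat struct {
        rows, cols int
        data       []float64 // row-major, len == rows*cols
    }

    func (m *MatrixFlat) Row(i int) []float64 {
        return m.data[i*m.cols : (i+1)*m.cols]
    }

    // dot is the hot inner loop under discussion. Go currently emits scalar
    // code for it either way, but the flat layout at least keeps every
    // access sequential in memory.
    func dot(a, b []float64) float64 {
        var s float64
        for i := range a {
            s += a[i] * b[i]
        }
        return s
    }

    func main() {
        m := MatrixFlat{rows: 2, cols: 3, data: []float64{1, 2, 3, 4, 5, 6}}
        fmt.Println(dot(m.Row(0), m.Row(1))) // 32
    }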

ein0p

Inadvisable, IMO. This is not going to perform well. There are bindings for llama.cpp; I'd use those if I had to do things in Go. And yes, I'm aware that it calls into icky and uncouth C++, but it will be way faster, especially if you have some sort of acceleration hardware.

truth_seeker

Without using SIMD CPU instructions, it's going to be super expensive.

Something like the viterin/vek or kelindar/simd packages could be helpful.
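For example, a sketch of what that might look like with viterin/vek (I'm assuming vek exposes a Dot over float64 slices; check the package docs for the exact API before leaning on this):

    package main

    import (
        "fmt"

        "github.com/viterin/vek" // assumed API: vek.Dot(x, y []float64) float64
    )

    func main() {
        a := []float64{1, 2, 3}
        b := []float64{4, 5, 6}
        // vek dispatches to SIMD kernels at runtime where the CPU supports
        // them, falling back to scalar Go otherwise.
        fmt.Println(vek.Dot(a, b)) // 32
    }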

Art9681

Here is an interesting article on Go optimizations with SIMD:

https://sourcegraph.com/blog/slow-to-simd