Define policy forbidding use of AI code generators

benlivengood

Open source and libre/free software are particularly vulnerable to a future where AI-generated code is ruled to be either infringing or public domain.

In the former case, disentangling AI edits from human edits could tie a project up in legal proceedings for years, and projects don't have any funding to fight a copyright suit. Specifically, code that is AI-generated and subsequently modified or incorporated into the rest of the code would raise the question of whether subsequent human edits were non-fair-use derivative works.

In the latter case, the license restrictions no longer apply to portions of the codebase, raising similar issues for derived code; a project that is only 98% OSS/FS licensed suddenly has much less leverage in takedowns against companies abusing the license terms, since it has to prove that infringers are definitely using the human-generated and licensed code.

Proprietary software is only mildly harmed in either case; it would require speculative copyright owners to disassemble their binaries and try to make the case that AI-generated code infringed without being able to see the codebase itself. And plenty of proprietary software has public domain code in it already.

zer00eyz

raincole

It's sailed, but towards the other way: https://www.bbc.com/news/articles/cg5vjqdm1ypo

jssjsnj

QEMU: Define policy forbidding use of AI code generators

deadbabe

If a piece of software is truly wide-open source in the sense of “do whatever the fuck you want with this code, we don’t care”, then it has nothing to fear from AI.

candiddevmike

That won't apply to closed-source, non-public code, which the GPL (which QEMU uses) is quite good at ensuring becomes open source...

kgwxd

Can't release someone else's proprietary source under a "do whatever the fuck you want" license and actually do whatever the fuck you want, without getting sued.

deadbabe

It’d be like trying to squeeze blood from a stone

AJ007

I understand why experienced developers don't want random AI contributions from no-knowledge "developers" contributing to a project. In any situation, if a human had to review AI code line by line, that would tie up humans for years, even ignoring the legal questions.

#1 There will be no verifiable way to prove something was AI generated beyond early models.

#2 Software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects. The only room for debate on that is an apocalypse level scenario where humans fail to continue producing semiconductors or electricity.

#3 If a project successfully excludes AI contributions (not clear how other than controlling contributions to a tight group of anti-AI fanatics), it's just going to be cloned, and the clones will leave it in the dust. If the license permits forking then it could be forked too, but cloning and purging any potential legal issues might be preferred.

There still is a path for open source projects. It will be different. There's going to be much, much more software in the future and it's not going to be all junk (although 99% might.)

amake

> #2 Software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects

Still waiting to see evidence of AI-driven projects eating the lunch of "traditional" projects.

viraptor

It's happening slowly all around. It's not obvious because people producing high-quality stuff have no incentive at all to mark their changes as AI-generated. But there are also local tools being generated faster than you could adjust existing tools to do what you want. I'm running 3 things now just for myself that I generated from scratch instead of trying to send feature requests to existing apps I can buy.

It's only going to get more pervasive from now on.

mcoliver

80-90% of Claude is now written by Claude

luqtas

that's like driving big personal vehicles and having a bunch of children and eating a bunch of meat and doing nothing about it because marine and terrestrial ecosystems haven't been fully destroyed by global warming yet

blibble

> #2 Software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects

"competitive", meaning: "most features/lines of code emitted" might matter to a PHB or Microsoft

but has never mattered to open source

basilgohar

I feel like this is mostly a proofless assertion. I'm aware that what you hint at is happening, but the conclusions you arrive at are far from proven or even reasonable at this stage.

For what it's worth, I think AI for code will arrive at a place like where other coding tools sit – hinting, intellisense, linting, maybe even static or dynamic analysis – but I doubt that using AI or not will be the critical factor for productivity.

Someone else in the thread already mentioned it's a bit of an amplifier. If you're good, it can make you better, but if you're bad it just spreads your poor skills like a robot vacuum spreads animal waste.

galangalalgol

I think that was his point: the project full of bad developers isn't the competition. It is a peer whose skill matches yours and who uses agents on top of that. By myself I am no match for myself + cline.

heavyset_go

Regarding #1, at least in the mainframe/cloud model of hosted LLMs, the operators have a history of model prompts and outputs.

For example, if using Copilot, Microsoft also has every commit ever made if the project is on GitHub.

They could, theoretically, determine what did or didn't come out of their models and was integrated into source trees.

Regarding #2 and #3, with relatively novel software like QEMU that models platforms that other open source software doesn't, LLMs might not be a good fit for contributions. Especially where emulation and hardware accuracy, timing, quirks, errata etc matter.

For example, modeling a new architecture or emulating new hardware might have LLMs generating convincing looking nonsense. Similarly, integrating them with newly added and changing APIs like in kvm might be a poor choice for LLM use.

A4ET8a8uTh0_v2

I am of two minds about it, having now seen both good coders augmented by AI and bad coders further diminished by it (I would even argue it's worse than Stack Overflow, because back then they would at least have had to adjust the code a little bit).

I am personally somewhere in the middle, just good enough to know I am really bad at this, so I make sure that I don't contribute to anything that is actually important (like QEMU).

But how many people recognize their own strengths and weaknesses? That is part of the problem and now we are proposing that even that modicum of self-regulation ( as flawed as it is ) be removed.

FWIW, I hear you. I also don't have an answer. Just thinking out loud.

alganet

Quoting them:

> The policy we set now must be for today, and be open to revision. It's best to start strict and safe, then relax.

So, no need for the drama.

XorNot

A reasonable conclusion about this would simply be that the developers are saying "we're not merging anything which you can't explain".

Which is entirely reasonable. The trend of people, say on HN, saying "I asked an LLM and this is what it said..." is infuriating.

It's just an upfront declaration that if your answer to something is "it's what Claude thinks" then it's not getting merged.

Filligree

That’s not what the policy says, however. You could be the world’s most honest person, using Claude only to generate code you described to it in detail and fully understand, and would still be forbidden.

Eisenstein

If AI can generate software so easily and which performs the expected functions, why do we even need to know that it did so? Isn't the future really just asking an AI for a result and getting that result? The AI would be writing all sorts of bespoke code to do the thing we ask, and then discard it immediately after. That is what seems more likely, and not 'so much software we have to figure out rights to'.

JonChesterfield

Interesting. Harder line than the LLVM one found at https://llvm.org/docs/DeveloperPolicy.html#ai-generated-cont...

I'm very old man shouting at clouds about this stuff. I don't want to review code the author doesn't understand and I don't want to merge code neither of us understand.

compton93

> I don't want to review code the author doesn't understand

This really bothers me. I've had people ask me to do some task except they get AI to provide instructions on how to do the task and send me the instructions, rather than saying "Hey can you please do X". It's insulting.

andy99

Had someone higher up ask about something in my area of expertise. I said I didn't think it was possible; he followed up with a chatGPT conversation he had where it "gave him some ideas that we could use as an approach", as if that was some useful insight.

These are the same people who think that "learning to code" is a translation issue they don't have time for, as opposed to experience they don't have.

a4isms

> These are the same people who think that "learning to code" is a translation issue they don't have time for, as opposed to experience they don't have.

This is very, very germane and a very quotable line. And these people have been around from long before LLMs appeared. These are the people who dash off an incomplete idea on Friday afternoon and expect to see a finished product in production by next Tuesday, latest. They have no self-awareness of how much context and disambiguation is needed to go from "idea in my head" to working, deterministic software that drives something like a process change in a business.

joshstrange

I’ve started to experience/see this and it makes me want to scream.

You can’t dismiss it out of hand (especially with it coming from up the chain), but it takes no time at all for someone who knows nothing about the problem space (or worse, just enough to be dangerous) to generate, and it could take hours or more to debunk/disprove the suggestion.

I don’t know what to call this. Cognitive DDoS? Amplified Plausibility Attack? There should be a name for it, and it should be ridiculed.

candiddevmike

Imagine a boring dystopia where everyone is given hallucinated tasks from LLMs that may in some crazy way be feasible but aren't, and you can't argue that they're impossible without being fired since leadership lacks critical thinking.

alluro2

A friend experienced a similar thing at work - he gave a well-informed assessment of why something is difficult to implement and it would take a couple of weeks, based on the knowledge of the system and experience with it - only for the manager to reply within 5 min with a screenshot of an (even surprisingly) idiotic ChatGPT reply, and a message along the lines of "here's how you can do it, I guess by the end of the day".

I know several people like this, and it seems they feel like they have god powers now - and that they alone can communicate with "the AI" in this way that is simply unreachable by the rest of the peasants.

petesergeant

> Had someone higher up ask about something in my area of expertise. I said I didn't think it was possible; he followed up with a chatGPT conversation he had where it "gave him some ideas that we could use as an approach", as if that was some useful insight.

I would find it very insulting if someone did this to me, for sure, as well as a huge waste of my time.

On the other hand I've also worked with some very intransigent developers who've actively fought against things they simply didn't want to do on flimsy technical grounds, knowing it couldn't be properly challenged by the requester.

On yet another hand, I've also been subordinate to people with a small amount of technical knowledge -- or a small amount of knowledge about a specific problem -- who'll do the exact same thing without ChatGPT: fire a bunch of mid-wit ideas downstream that you have already thought about, but you then need to spend a bunch of time explaining why their hot-takes aren't good. Or the CEO of a small digital agency I worked at circa 2004 asking us if we'd ever considered using CSS for our projects (which were of course CSS heavy).

colechristensen

People keep asking me if AI is going to take my job and recent experience shows that it very much is not. AI is great for being mostly correct and then giving someone without enough context a mostly correct way to shoot themselves in the foot.

AI further encourages the problem in DevOps/Systems Engineering/SRE where someone comes to you and says "hey can you do this for me", having already come up with the solution, instead of giving you the problem: "hey can you help me accomplish this"... AI gives them solutions, which are more steps away from being untangled into what really needs to be done.

AI has knowledge, but it doesn't have taste. Especially when it doesn't have all of the context a person with experience has, it just has bad taste in solutions, or just the absence of taste, with the additional problem that it makes it much easier for people to do things.

Permissions on what people have access to read and permission to change are now going to have to be more restricted, because not only are we dealing with folks who have limited experience with permissions, we now have them empowered by AI to do more things which are less advisable.

alganet

In a corporate setting, you are _forced_ to trust your coworkers somehow and swallow it. Especially higher-ups.

In free software though, these kinds of nonsense suggestions always happened, way before AI. Just look at any project mailing list.

It is expected that any new suggestion will encounter some resistance, and the new contributor should be aware of that. For serious projects specifically, the levels of skepticism are usually way higher than in corporations, and that's healthy and desirable.

nijave

Especially when you try to correct them and they insist AI is the correct one

Sometimes it's fun reverse engineering the directions back into various forum, Stack Overflow, and documentation fragments and pointing out how AI assembled similar things into something incorrect

halostatue

I have just started adding DCO to _all_ of the open source code that I maintain and will be adding text like this in `CONTRIBUTING.md`:

---

LLM-Generated Contribution Policy

Color is a library full of complex math and subtle decisions (some of them possibly even wrong). It is extremely important that any issues or pull requests be well understood by the submitter and that, especially for pull requests, the developer can attest to the Developer Certificate of Origin for each pull request (see LICENCE).

If LLM assistance is used in writing pull requests, this must be documented in the commit message and pull request. If there is evidence of LLM assistance without such declaration, the pull request will be declined.

Any contribution (bug, feature request, or pull request) that uses unreviewed LLM output will be rejected.

---

I am also adding this to my `SECURITY.md` entries:

---

LLM-Generated Security Report Policy

Absolutely no security reports will be accepted that have been generated by LLM agents.

---

As it's mostly just me, I'm trying to strike a balance, but my preference is against LLM generated contributions.
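
For reference, the DCO attestation itself is just a Signed-off-by trailer on each commit, which git adds with `git commit -s`; the name and address below are placeholders:

    Signed-off-by: Jane Developer <jane@example.com>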

phire

I do use GitHub copilot on my personal projects.

But I refuse to use it as anything more than a fancy autocomplete. If it suggests code that's pretty close to what I was about to type anyway, I accept it.

This ensures that I still understand my code, that there shouldn't be any hallucination derived bugs, [1] and there really shouldn't be any questions about copyright if I was about to type it.

I find using copilot this way speeds me up. Not really because my typing is slow, it's more that I have a habit of getting bored and distracted while typing. Copilot helps me get to the next thinking/debugging part sooner.

My brain really can't comprehend the idea that anyone would not want to understand their own code. Especially if they are going to submit it as a PR.

And I'm a little annoyed that the existence of such people is resulting in policies that will stop me from using LLMs as autocomplete when submitting to open source projects.

I have tried using copilot in other ways. I'd love for it to be able to do menial refactoring tasks for me. But every time I experiment, it seems to fall off the rails so fast. Or it just ends up slower than what I could do manually because it has to re-generate all my code instead of just editing it.

[1] Though I find it really interesting that if I'm in the middle of typing a bug, copilot is very happy to autocomplete it in its buggy form. Even when the bug is obvious from local context, like I've typoed a variable name.

jitl

When I use an LLM for coding tasks, it's like "hey, please translate this YAML to structs and extract any repeated patterns into re-used variables". It's possible to do this transform with deterministic tools, but AI will do a fine job in 30s, and it's trivial to test that the new output is identical to the prompt input.

My high-level work is absolutely impossible to delegate to AI, but AI really helps with tedious or low-stakes incidental tasks. The other day I asked Claude Code to wire up some graphs and outlier analysis for some database benchmark result CSVs. Something conceptually easy, but it takes a fair bit of time to figure out libraries and get everything hooked up unless you're already an expert at CSV processing.
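
A minimal sketch of that kind of throwaway analysis, assuming pandas/matplotlib; the CSV name and column names here are illustrative, not taken from the comment above:

    # Hedged sketch: flag benchmark outliers and plot them.
    # Assumes a hypothetical results.csv with "run" and "latency_ms" columns.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("results.csv")

    # Treat anything more than 3 standard deviations from the mean as an outlier.
    mean, std = df["latency_ms"].mean(), df["latency_ms"].std()
    df["outlier"] = (df["latency_ms"] - mean).abs() > 3 * std

    colors = df["outlier"].map({True: "red", False: "blue"}).tolist()
    plt.scatter(df["run"], df["latency_ms"], c=colors)
    plt.xlabel("run")
    plt.ylabel("latency (ms)")
    plt.title("Benchmark latency, outliers in red")
    plt.savefig("latency_outliers.png")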

mistrial9

oh agree and amplify this -- graphs are worlds unto themselves. some of the high end published research papers have astounding contents, for example..

hsbauauvhabzb

You’re the exact kind of person I want to work with. Self-reflective and in opposition to lazy behaviours.

rodgerd

This to me is interesting when it comes to free software projects; sure there are a lot of people contributing as their day job. But if you contribute or manage a project for the pleasure of it, things which undermine your enjoyment - cleaning up AI slop - are absolutely a thing to say "fuck off" over.

dheera

> I don't want to review code the author doesn't understand

The author is me and my silicon buddy. We understand this stuff.

recursive

Of course we understand it. Just ask us!

acedTrex

Oh hey, the thing I predicted in my blog titled "yes i will judge you for using AI" happened lol

Basically I think open source has traditionally HEAVILY relied on hidden competency markers to judge the quality of incoming contributions. LLMs turn that entire concept on its head by presenting code that has the competency markers but none of the backing experience. It is a very, very jarring experience for experienced individuals.

I suspect that virtual or in person meetings and other forms of social proof independent of the actual PR will become far more crucial for making inroads in large projects in the future.

SchemaLoad

I've started seeing this at work with coworkers using LLMs to generate code reviews. They submit comments which are way above their skill level, which almost trick you into thinking they are correct, since only a very skilled developer would make these suggestions. And then ultimately you end up wasting tons of time proving how these suggestions are wrong. Spending far more time than the person pasting the suggestions spent to generate them.

Groxx

By far the largest review-effort PRs of my career have been in the past year, due to mid-sized LLM-built features. Multiple rounds of other signoffs saying "lgtm" with only minor style comments, only for me to finally read it and see that no, it is not even remotely acceptable, and we have several uses built by the same team that would fail immediately if it were merged, to say nothing of the thousands of other users that might also be affected. Stuff the reviewers have experience with and didn't think about, because they got stuck in the "looks plausible" rut rather than "is correct".

So it goes back for changes. It returns the next day with complete rewrites of large chunks. More "lgtm" from others. More incredibly obvious flaws, race conditions, the works.

And then round three repeats mistakes that came up in round one, because LLMs don't learn.

This is not a future style of work that I look forward to participating in.

diabllicseagull

Funny enough, I had coworkers who similarly had a hold of the jargon but without any substance. They would always turn out to be time sinks for others doing the useful work. AI imitating that type of drag on the workplace is kinda funny, ngl.

beej71

I'm not really in the field any longer, but one of my favorite things to do with LLMs is ask for code reviews. I usually end up learning something new. And a good 30-50% of the suggestions are useful. Which actually isn't skillful enough to give it a title of "code reviewer", so I certainly wouldn't foist the suggestions on someone else.

acedTrex

Yep, 100%, it is something I have also observed. Frankly, it has been frustrating to the point that I spun up a quick one-off HTML site to rant/get my thoughts out. https://jaysthoughts.com/aithoughts1

itsmekali321

send your blog link please

ants_everywhere

This is signed off primarily by Red Hat, and they tend to be pretty serious/corporate.

I suspect their concern is not so much whether users own the copyright to AI output, but rather the risk that AI will spit out code from its training set that belongs to another project.

Most hypervisors are closed source and some are developed by litigious companies.

blibble

> but rather the risk that AI will spit out code from its training set that belongs to another project.

this is everything that it spits out

ants_everywhere

This is an uninformed take

Groxx

It is a legally untested take

duskwuff

I'd also worry that a language model is much more likely to introduce subtle logical errors, potentially ones which violate the hypervisor's security boundaries - and a user relying heavily on that model to write code for them will be much less prepared to detect those errors.

ants_everywhere

Generally speaking AI will make it easier to write more secure code. Tooling and automation help a lot with security and AI makes it easier to write good tooling.

I would wager good money that in a few years the most security-focused companies will be relying heavily on AI somewhere in their software supply chain.

So I don't think this policy is about security posture. No doubt human experts are reviewing the security-relevant patches anyway.

abhisek

> It's best to start strict and safe, then relax.

Makes total sense.

I am just wondering how we differentiate between AI-generated code and human-written code that is influenced by or copied from some unknown source. The same licensing problem may happen with human code as well, especially for OSS, where anyone can contribute.

Given the current usage, I am not sure AI-generated code has an identity of its own. It’s really a tool in the hand of a human.

Havoc

I wonder whether the motivation is really legal? I get the sense that some projects are just sick of reviewing crap AI submissions

esjeon

Possibly, but QEMU is such a critical piece of software in our industry. Its application stretches from one end to the other - desktop VM, cloud/remote instance, build server, security sandbox, cross-platform environment, etc. Even a small legal risk can hurt the industry pretty badly.

gerdesj

The policy is concise and well bounded. It seems to me to assert that you cannot safely assign attribution of authorship of software code that you think was generated algorithmically.

I use the term algorithmic because I think it is stronger than "AI lol". I note they use terms like "AI code generator" in the policy, which might be just as strong but looks to me unlikely to become a useful legal term (it's hardly "a man on the Clapham omnibus").

They finish with this rather reasonable flourish:

"The policy we set now must be for today, and be open to revision. It's best to start strict and safe, then relax."

No doubt they do get a load of slop, but they seem to want to close the legal angles down first, and attribution seems a fair place to start. This playbook looks way better than curl's.

Lerc

I'm not sure which way AI would move the dial when it comes to the median submission. Humans can, and do, make some crap code.

If the problem is too many submissions, that would suggest there need to be structures in place to manage that.

Perhaps projects receiving large quantities of updates need triage teams. I suspect most of the submissions are done in good faith.

I can see some people choosing to avoid AI due to the possibility of legal issues. I'm doubtful of the likelihood of such problems, but some people favour eliminating all possibility over minimizing likelihood. The philosopher in me feels like people who think they have eliminated the possibility of something just haven't thought about it enough.

ehnto

Barrier to entry and automated submissions are two aspects I see changing with AI. You at least had to be able to code before submitting bad code.

With AI you're going to get job hunters automating PRs for big name projects so they can stick the contributions in their resume.

catlifeonmars

> If the problem is too many submissions, that would suggest there need to be structures in place to manage that.

> Perhaps projects receiving large quantities of updates need triage teams. I suspect most of the submissions are done in good faith.

This ignores the fact that many open source projects do not have the resources to dedicate to a large number of contributions. A side effect of LLM generated code is probably going to be a lot of code. I think this is going to be an issue that is not dependent on the overall quality of the code.

bobmcnamara

Have you seen how Monsanto enforces their seed rights?

SchemaLoad

This could honestly break open source, with how quickly you can generate bullshit, and how long it takes to review and reject it. I can imagine more projects going the way of Android where you can download the source, but realistically you can't contribute as a random outsider.

b00ty4breakfast

I have an online acquaintance that maintains a very small and not widely used open-source project and the amount of (what we assume to be) automated AI submissions* they have to wade through is kinda wild given the very small number of contributors and users the thing has. It's gotta be clogging up these big projects like a DDoS attack.

*"Automated" as in bots and "AI submissions" as in ai-generated code

zahlman

For many projects you realistically can't contribute as a random outsider anyway, simply because of the effort involved in grokking enough of the existing architecture to figure out where to make changes.

hollerith

I've always thought that the possibility of forking the project is the main benefit to open-source licensing, and we know Android can be forked.

ants_everywhere

the primary benefit of open source is freedom

api

Quality contributions to OSS are rare unless the project is huge.

loeg

Historically the opposite of quality contributions has been no contributions, not net-negative contributions (random slop that costs more in review than it provides benefit).

disconcision

i mean they say the policy is open for revision and it's also possible to make exceptions; if it's an excuse, they are going out of their way to let people down easy

hughw

I'd hope there could be some distinction between using an LLM as a super-autocomplete in your IDE vs. giving it high-level guidelines and making it generate substantive code. It's a gray area, sure, but if I made a contribution I'd want to be able to use the labor-saving features of Copilot, say, without danger of it copying an algorithm from open source code. For example, today I generated a series of case statements and Copilot detected the pattern and saved me tons of typing.

dheera

That and also just AI glasses that become an extension of my mind and body, just giving me clues and guidance on everything I do including what's on my screen.

I see those glasses as becoming just a part of me, just like my current dumb glasses are a part of me that enables me to see better, the smart glasses will help me to see AND think better.

My brain was trained on a lot of proprietary code as well, the copyright issues around AI models are pointless western NIMBY thinking and will lead to the downfall of western civilization if they keep pursuing legal what-ifs as an excuse to reject awesome technology.

wyldfire

I understand where this comes from, but I think it's a mistake. I agree it would be nice if there were "well settled law" regarding AI and copyright, but there are relatively few rulings and next to zero legislation on which to base their feelings.

In addition to a policy to reject contributions from AI, I think it may make sense to point out places where AI-generated content can be used. For example - how much of the QEMU project's (copious) CI setup is really critical content to protect? What about ever-more interesting test cases or environments that could be enabled? Something like "contribute those things here instead, and make judicious use of AI there, with these kinds of guard rails..."

dclowd9901

What's the risk of not doing this? Better code but slower velocity for an open source project?

I think that particular brand of risk makes sense for this particular project, and the authors don't seem particularly negative toward GenAI as a concept, just going through a "one way door" with it.

dijksterhuis

A simpler solution is just to wait until the legal situation is clearer.

QEMU is (mostly) GPL 2.0 licensed, meaning (most) code contributions need to be GPL 2.0 compatible [0]. Let's say, hypothetically, there's a code contribution added by some patch involving gen AI code which is derived/memorised/copied from non-GPL compatible code [1]. Then, hypothetically, a legal case sets precedent that gen AI FOSS code must re-apply the license of the original derived/memorised/copied code. QEMU maintainers would probably need to roll back all those incompatible code contributions. After some time, those code contributions could have ended up with downstream callers which also need to be rewritten (even in CI code).

It might be possible to first say "only CI code which is clearly labelled as 'DO NOT RE-USE: AI' or some such". But the maintainers would still need to go through and rewrite those parts of the CI code if this hypothetical plays out. Plus it adds extra work to reviews and merge processes etc.

it's just less work and less drama for everyone involved to say "no thank you (for now)".

----

caveat: IANAL, and licensing is not my specific expertise (but i would quite like it to be one day)

[0]: https://github.com/qemu/qemu/blob/master/LICENSE

[1]: e.g. No license / MPL / Apache / Artistic / Creative Commons https://www.gnu.org/licenses/license-list.html#NonFreeSoftwa...

pavon

This isn't like some other legal questions that go decades before being answered in court. There are dozens of cases working through the courts today that will shed light on some aspects of the copyright questions within a few years. QEMU has made great progress over the last 22 years without the aid of AI, waiting a few more years isn't going to hurt them.

hinterlands

I think you need to read between the lines here. Anything you do is a legal risk, but this particular risk seems acceptable to many of the world's largest and richest companies. QEMU isn't special, so if they're taking this position, it's most likely simply because they don't want to deal with LLM-generated code for some other reason and are eager to use legal risk as cover to avoid endless arguments on mailing lists.

We do that in corporate environments too. "I don't like this" -> "let me see what lawyers say" -> "a-ha, you can't do it because legal says it's a risk".

kazinator

There is a well settled practice in computing that you just don't plagiarize code. Even a small snippet. Even if copyright law would consider such a small thing "fair use".

bfLives

> There is a well settled practice in computing that you just don't plagiarize code. Even a small snippet.

I think the way many developers use StackOverflow suggests otherwise.

kazinator

In the first place, in order to post to StackOverflow, you are required to have the copyright over the code, and be able to grant them a perpetual license.

They redistribute the material under the CC BY-SA 4.0 license. https://creativecommons.org/licenses/by-sa/4.0/

This allows visitors to use the material, with attribution. One can, of course, use the ideas in a SO answer to develop one's own solution.

9283409232

This isn't 100% true, meaning it isn't well settled. Have people already forgotten Google vs Oracle? Google ended up winning that after years and years, but the judgments went back and forth, and there are around 4 or 5 guidelines to determine whether something is or isn't fair use, and generative AI would fail at a few of those.

kazinator

Google vs. Oracle was about whether APIs are copyrightable, which is an important issue that speaks to antitrust. Oracle wanted the interface itself to be copyrighted so that even if someone reproduced the API from a description of it, it would infringe. The implication being that components which clone an API would be infringing, even though their implementation is original, discouraging competitors from making API-compatible components.

My comment didn't say anything about the output of AI being fair use or not, rather that fair use (no matter where you are getting material from) ipso facto doesn't mean that copy paste is considered okay.

Every employer I ever had discouraged copy and paste from anywhere as a blanket rule.

At least, that had been the norm, before the LLM takeover. Obviously, organizations that use AI now for writing code are plagiarizing left and right.

sysmax

I wish people would make a distinction regarding the size/scope of the AI-generated parts. Like with video copyright laws, where a 5-second clip from a copyrighted movie is usually considered fair use and not frowned upon.

Because for projects like QEMU, current AI models can actually do mind-boggling stuff. You can give it a PDF describing an instruction set, and it will generate you wrapper classes for emulating particular instructions. Then you can give it one class like this and a few paragraphs from the datasheet, and it will spit out unit tests checking that your class works as the CPU vendor describes.

Like, you can get from 0% to 100% test coverage several orders of magnitude faster than doing it by hand. Or refactoring, where you want to add support for a particular memory virtualization trick and need to update 100 instruction classes based on a straightforward, but not 100% formal, rule. A human developer would be pulling their hair out, while an LLM will do it faster than you can get a coffee.
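
As a toy illustration of that pattern (not QEMU code; the instruction and all names here are imaginary), the generated pieces might look roughly like one small wrapper class per instruction plus a datasheet-driven unit test:

    # Toy sketch: one wrapper class per (imaginary) instruction, plus a unit
    # test checking datasheet-style behaviour such as 32-bit wraparound.
    import unittest

    class AddImmediate:
        """Hypothetical "ADD rd, rn, #imm" with 32-bit wraparound."""
        def __init__(self, rd, rn, imm):
            self.rd, self.rn, self.imm = rd, rn, imm

        def execute(self, regs):
            # Result wraps at 32 bits, as a real datasheet would specify.
            regs[self.rd] = (regs[self.rn] + self.imm) & 0xFFFFFFFF

    class TestAddImmediate(unittest.TestCase):
        def test_wraps_at_32_bits(self):
            regs = {0: 0xFFFFFFFF, 1: 0}
            AddImmediate(rd=1, rn=0, imm=1).execute(regs)
            self.assertEqual(regs[1], 0)  # 0xFFFFFFFF + 1 wraps to 0

    if __name__ == "__main__":
        unittest.main()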

762236

It sounds like you're saying someone could rewrite Qemu on their own, with the help of AI. That would be pretty funny.

echelon

Qemu can make the choice to stay in the "stone age" if they want. Contributors who prefer AI assistance can spend their time elsewhere.

It might actually be prudent for some (perhaps many foundational) OSS projects to reject AI until the full legal case law precedent has been established. If they begin taking contributions and we find out later that courts find this is in violation of some third party's copyright (as shocking as that outcome may seem), that puts these projects in jeopardy. And they certainly do not have the funding or bandwidth to avoid litigation. Or to handle a complete rollback to pre-AI background states.

Aeolun

This seems absolutely impossible to enforce. All my editors give me AI-assisted code hints. Zed, Cursor, VS Code. All of them now show me autocomplete that comes from an LLM. There's absolutely no distinction between that code and code that I've typed out myself.

It's like complaining that I may have no legal right to submit my stick figure because I potentially copied it from the drawing of another stick figure.

I'm firmly convinced that these policies are only written to have plausible deniability when stuff with generated code gets inevitably submitted anyway. There's no way the people that write these things aren't aware they're completely unenforceable.

luispauloml

> I'm firmly convinced that these policies are only written to have plausible deniability when stuff with generated code gets inevitably submitted anyway.

Of course it is. And nobody said otherwise, because that is explicitly stated in the commit message:

    [...] More broadly there is,
    as yet, no broad consensus on the licensing implications of code
    generators trained on inputs under a wide variety of licenses
And in the patch itself:

    [...] With AI
    content generators, the copyright and license status of the output is
    ill-defined with no generally accepted, settled legal foundation.
What other commenters pointed out is that, beyond the legal issue, other problems also arise from the use of AI-generated code.

teeray

It’s like the seemingly confusing gates you pass through at customs that say “nothing to declare” when you’ve already made your declarations. Walking through that gate is a conscious act that places culpability on you, so you can’t simply say “oh, I forgot” or something.

The thinking here is probably similar: if AI-generated code becomes poisonous and is detected in a project, the DCO could allow shedding liability onto the contributor that said it wasn’t AI-generated.

Filligree

> Of course it is. And nobody said otherwise, because that is explicitly stated on the commit message

Don’t be ridiculous. The majority of people are in fact honest, and won’t submit such code; the major effect of the policy is to prevent those contributions.

Then you get plausible deniability for code submitted by villains, sure, but I’d like to hope that’s rare.

raincole

I think most people don't make money by submitting code to QEMU, so there isn't that much incentive to cheat.

shmerl

Neovim doesn't force you to use AI, unless you configure it yourself. If your editor doesn't allow you to switch it off, there must be a big problem with it.

naveed125

Coolest thing I've seen today.