Watching AI drive Microsoft employees insane

579 comments

·May 21, 2025

diggan

Interesting that every comment has "Help improve Copilot by leaving feedback using the or buttons" suffix, yet none of the comments received any feedback, either positive or negative.

> This seems like it's fixing the symptom rather than the underlying issue?

This is also my experience when you haven't setup a proper system prompt to address this for everything an LLM does. Funniest PRs are the ones that "resolves" test failures by removing/commenting out the test cases, or change the assertions. Googles and Microsofts models seems more likely to do this than OpenAIs and Anthropics models, I wonder if there is some difference in their internal processes that are leaking through here?

The same PR as the quote above continues with 3 more messages before the human seemingly gives up:

> please take a look

> Your new tests aren't being run because the new file wasn't added to the csproj

> Your added tests are failing.

I can't imagine how the people who have to deal with this are feeling. It's like you have a junior developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

Another PR: https://github.com/dotnet/runtime/pull/115732/files

How are people reviewing that? 90% of the page height is taken up by "Check failure", can hardly see the code/diff at all. And as a cherry on top, the unit test has a comment that say "Test expressions mentioned in the issue". This whole thing would be fucking hilarious if I didn't feel so bad for the humans who are on the other side of this.

surgical_fire

> I can't imagine how the people who have to deal with this are feeling. It's like you have a junior developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

That comparison is awful. I work with quite a few Junior developers and they can be competent. Certainly don't make the silly mistakes that LLMs do, don't need nearly as much handholding, and tend to learn pretty quickly so I don't have to keep repeating myself.

LLMs are decent code assistants when used with care, and can do a lot of heavy lifting, they certainly speed me up when I have a clear picture of what I want to do, and they are good to bounce off ideas when I am planning for something. That said, I really don't see how it could meaningfully replace an intern however, much less an actual developer.

safety1st

These GH interactions remind me of one of those offshore software outsourcing firms on Upwork or Freelancer.com that bid $3/hr on every project that gets posted. There's a PM who takes your task and gives it to a "developer" who potentially has never actually written a line of code, but maybe they've built a WordPress site by pointing and clicking in Elementor or something. After dozens of hours billed you will, in fact, get code where the new file wasn't added to the csproj or something like that, and when you point it out, they will bill another 20 hours, and send you a new copy of the project, where the test always fails. It's exactly like this.

Nice to see that Microsoft has automated that, failure will be cheaper now.

dkdbejwi383

This gives me flashbacks to when my big corporate former employer outsourced a bunch of work offshore.

An outsourced contractor was tasked with a very simple job as their first task - update a single dependency, which required just a bump of the version and no code changes - after three days of them seemingly struggling to even understand what they were asked to do, inability to clone the repo, failure to install the necessary tooling on their machine, they ended up getting fired from the project. Complete waste of money, and the time of those of us having to delegate and review this work.

AbstractH24

> These GH interactions remind me of one of those offshore software outsourcing firms on Upwork or Freelancer.com that bid $3/hr on every project that gets posted

Those have long been the folks I’ve seen at the biggest risk of being replaced by AI. Tasks that didn’t rely on human interaction or much training, just brute force which can be done from anywhere.

And for them, that $3/hr was really good money.

voxic11

Actually the AI might still be more expensive at this point. But give it a few years I'm sure they will get the costs down.

kamaal

>>These GH interactions remind me of one of those offshore software outsourcing firms on Upwork or Freelancer.com that bid $3/hr on every project that gets posted.

This level of smugness is why outsourcing still continues to exist. The kind of things you talk about were rare. And were mostly exaggerated to create anti-outsourcing narrative. None of that led to outsourcing actually going away simply because people are actually getting good work done.

Bad quality things are cheap != All cheap things are bad.

Same will work with AI too, while people continue to crap on AI, things will only improve, people will be more productive with AI, get more and bigger things done for cheaper and better. This is just inevitable given how things are going now.

>>There's a PM who takes your task and gives it to a "developer" who potentially has never actually written a line of code, but maybe they've built a WordPress site by pointing and clicking in Elementor or something.

In the peak of outsourcing wave. Both the call center people and IT services people had internal training and graduation standards that were quite brutal and mad attrition rates.

Exams often went along the lines of having to write whole ass projects without internet help in hours. Theory exams that had like -2 marks on getting things wrong. Dozens of exams, projects, coding exams, on-floor internships, project interviews.

>>After dozens of hours billed you will, in fact, get code where the new file wasn't added to the csproj or something like that, and when you point it out, they will bill another 20 hours, and send you a new copy of the project, where the test always fails. It's exactly like this.

Most IT services billing had pivoted away from hourly billing, to fixed time and material in the 2000s itself.

>>It's exactly like this.

Very much like outsourcing. AI is here to stay man. Deal with it. Its not going anywhere. For like $20 a month, companies will have same capability as a full time junior dev.

This is NOT going away. Its here to stay. And will only get better with time.

sbarre

I think that was the point of the comparison..

It's not like a regular junior developer, it's much worse.

spacemadness

And yet it got the job and lots of would be juniors didn’t, and it seems to be costing the company more in compute and senior dev handholding. Nice work silicon valley.

preisschild

> That said, I really don't see how it could meaningfully replace an intern however

And even if it could, how do you get senior devs without junior devs? ^^

surgical_fire

What is making it difficult for Junior devs to be hired is not AI. That is a diversion.

The raise in interest rates a couple of years ago triggered many layoffs in the industry. When that happens salaries are squeezed. Experienced people work for less, and juniors have trouble finding job because they are now competing against people with plenty of experience.

theamk

Simple, there are always people who are intentionally using the hard way. There is a community programming old 16-bit machines for example, which are much harder than modern tools. Or someone learning assembly language "just for fun".

Some of those (or similar) people will actually learn new stuff and become senior devs. Yes, there will be much fewer of them, so they'll command a higher salary, and they will deliver amazing stuff. The rest, who spend their entire carrier being AI handlers, will never raise above junior/mid level.

(Well, either that or people who cannot program by themselves will get promoted anyway, the software will get more bugs and less features, and things will be generally worse for both consumers and programmers... but I prefer not to think about this option)

lazide

Sounds like a next quarter problem (I wish it was /s).

kaycey2022

> It's like you have a senior phd level intelligence developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

Is that better?

PKop

Did you miss the "except" in his sentence? He was making the point this is worse than junior devs for all reasons listed.

surgical_fire

I was agreeing with him, by saying that the comparison is awful.

Not sure how it can be read otherwise.

yubblegum

This field (SE - when I started out back in late 80s) was enjoyable. Now it has become toxic, from the interview process, to imitating "big tech" songs and dances by small fry companies, and now this. Is there any joy left in being a professional software developer?

bluefirebrand

Making quite a bit of money brings me a lot of joy compared to other industries

But the actual software part? I'm not sure anymore

diggan

> This field (SE - when I started out back in late 80s) was enjoyable. Now it has become toxic

I feel the same way today, but I got started around 2012 professionally. I wonder how much of this is just our fading optimism after seeing how shit really works behind the scenes, and how much the industry itself is responsible for it. I know we're not the only two people feeling this way either, but it seems all of us have different timescales from when it turned from "enjoyable" to "get me out of here".

salawat

My issue stems from the attitudes of the people we're doing it for. I started out doing it for humanity. To bring the bicycle for the mind to everyone.

Then one day I woke up and realized the ones paying me were also the ones using it to run over or do circles around everyone else not equipped with a bicycle yet; and were colluding to make crippled bicycles that'd never liberate the masses as much as they themselves had been previously liberated; bicycles designed to monitor, or to undermine their owner, or more disgustingly, their "licensee".

So I'm not doing it anymore. I'm not going to continue making deliberately crippled, overly complex, legally encumbered bicycles for the mind, purely intended as subjects for ARR extraction.

vrighter

I started coding at a young age, but entered the professional world in 2012, just like you. I feel the same. I just can't come to grips with the fact that the goal is not to write good software anymore, but to get something, anything out the door on which we can then sell by marketing it based on stuff it doesn't do yet (but it will, we promise!) so that we can make more money and fake making something "new" again (putting a textbox and a button, and hooking it up to an LLM api). Software is nowadays assumed to not work properly. And we're not allowed to fix it anymore!

null

[deleted]

bwfan123

It happens in waves. For a period, there was an oversupply of cs engineers, and now, the supply will shrink. On top of this, the BS put out by AI code will require experienced engineers to fix.

So, for experienced engineers, I see a great future fixing the shit show that is AI-code.

throwaway2037

Each time that I arrive at a new job, I take some time to poke around at the various software projects. If the state of the union is awful, I always think: "Great: No where to go but up." If the state of the union is excellent, I think: "Uh oh. I will probably drag down the average here and make it a little bit worse, because I am an average software dev."

raxxorraxor

So many little scripts are spawned and they are all shit for production. I stopped reviewing them, pray to the omnissiah now and spray incense into our server room to placate the machine gods.

Because that shit makes you insane as well.

iamleppert

No, there is absolutely no joy left.

coldpie

I've been looking at getting a CDL and becoming a city bus driver, or maybe a USPS driver or deliveryman or clerk or something.

yubblegum

I hear you. Same boat just can't figure out the life jacket yet. (You do fine wood work, why not that? I am considering finding entry level work in architecture myself - kicking myself for giving that up for software now. Did not see this shit show coming.)

sweman

[flagged]

camdenreslink

A very very small percentage of professional software developers get that.

mrweasel

At least we can tell the junior developers to not submit a pull-request before they have the tests running locally.

At what point does the human developers just give up and close the PRs as "AI garbage". Keep the ones that works, then just junk the rest. I feel that at some point entertaining the machine becomes unbearable and people just stops doing it or rage close the PRs.

pydry

When their performance reviews stop depending upon them not doing that.

Microsoft's stock price is dependent on them proving that this is a success.

Qem

> Microsoft's stock price is dependent on them proving that this is a success.

Perhaps this explains the recent firings that affected faster CPython and other projects. While they throw money at AI but sucess still doesn't materialize, they need to make the books look good for yet another quarter through the old-school reliable method of laying off people left and right.

mrweasel

What happens when they can't prove that and development efficiency starts falling, because developers spend 50% of their time battling copilot?

null

[deleted]

throwaway2037

    > rage close the PRs

I am shaking with laughter reading this phrase. You got me good here. It is the perfect repurpose of "rage quit" for the AI slop era. I hope that we see some MSFT employees go insane from responding to so many shitty PRs from LLMs.

One of my all time "rage quit" stories is Azer Koçulu of npm left-pad incident infamy. That guy is my Internet hero -- "fight the power".

microtherion

Better yet, deploy their own LLM to close the PRs.

throwup238

> Interesting that every comment has "Help improve Copilot by leaving feedback using the or buttons" suffix, yet none of the comments received any feedback, either positive or negative.

The feedback buttons open a feedback form modal, they don’t reflect the number of feedback given like the emoji button. If you leave feedback, it will reflect your thumbs up/down (hiding the other button), it doesn’t say anything about whether anyone else has left feedback (I’ve tried it on my own repos).

belter

This whole thread from yesterday take a whole different meaning: https://news.ycombinator.com/item?id=44031432

Comment in the GitHub discussion:

"...You and I and every programmer who hasn't been living under a rock knows that AI isn't ready to be adopted at this scale yet, on the premier; 100M-user code-hosting platform. It doesn't make any sense except in brain-washed corporate-talk like "we are testing today what it can do tomorrow".

I'm not saying that this couldn't be an adequate change some day, perhaps even in a few years but we all know this isn't it today. It's 100% financial-driven hype with a pinch of we're too big to fail mentality..."

namaria

"Big data" -> "Cloud" -> "LLM-as-A(G)I"

It's all just recycled rent seeking corporate hype for enterprise compute.

The moment I had decided to learn Kubernetes years ago, got a book and saw microservices compared to 'object-oriented' programming I realized that. The 'big ball of mud' paper and the 'worse is better' rant frame it all pretty well in my view. Prioritize velocity, get slop in production, cope with the accidental complexity, rinse repeat. Eventually you get to a point where GPU farms seem like a reasonable way to auto-complete code.

When you find yourself in a hole, stop digging. Any bigger excavator you send down there will only get buried when the mud crashes down.

vasco

> improve Copilot by leaving feedback using the or buttons" suffix, yet none of the comments received any feedback, either positive or negative

Why do they even need it? Success is code getting merged 1st shot, failure gets worse the more requests for changes the agent gets. Asking for manual feedback seems like a waste of time. Measure cycle time and rate of approvals and change failure rate like you would for any developer.

dfxm12

It's like you have a junior developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

Anyone who has dealt with Microsoft support knows this feeling well. Even talking to the higher level customer success folks feels like talking to a brick wall. After dozens of support cases, I can count on zero hands the number of issues that were closed satisfactorily.

I appreciate Microsoft eating their dogfood here, but please don't make me eat it too! If anyone from MS is reading this, please release finished products that you are prepared to support!

xnorswap

> How are people reviewing that? 90% of the page height is taken up by "Check failure",

Typically, you wouldn't bother manually reviewing something until the automated checks have passed.

diggan

I dunno, when I review code, I don't review what's automatically checked anyways, but thinking about the change/diff in a broader context, and whatever isn't automatically checked. And the earlier you can steer people in the right direction, the better. But maybe this isn't the typical workflow.

Cthulhu_

It's a waste of time tbh; fixing the checks may require the author to rethink or rewrite their entire solution, which means your review no longer applies.

Let them finish a pull request before spending time reviewing it. That said, a merge request needs to have an issue written before it's picked up, so that the author does not spend time on a solution before the problem is understood. That's idealism though.

xnorswap

The reality is more nuanced, there are situations where you'd want to glance over it anyway, such as looking for an opportunity to coach a junior dev.

I'd rather hop in and get them on the right path rather than letting them struggle alone, particularly if they're struggling.

If it's another senior developer though I'd happily leave them to it to get the unit tests all passing before I take a proper look at their work.

But as a general principle, please at least get a PR through formatting checks before assigning it to a person.

phkahler

>> And the earlier you can steer people in the right direction, the better.

The earliest feedback you can get comes from the compiler. If it won't build successfully don't submit the PR.

kruuuder

A comment on the first pull request provides some context:

> The stream of PRs is coming from requests from the maintainers of the repo. We're experimenting to understand the limits of what the tools can do today and preparing for what they'll be able to do tomorrow. Anything that gets merged is the responsibility of the maintainers, as is the case for any PR submitted by anyone to this open source and welcoming repo. Nothing gets merged without it meeting all the same quality bars and with us signing up for all the same maintenance requirements.

abxyz

The author of that comment, an employee of Microsoft, goes on to say:

> It is my opinion that anyone not at least thinking about benefiting from such tools will be left behind.

The read here is: Microsoft is so abuzz with excitement/panic about AI taking all software engineering jobs that Microsoft employees are jumping on board with Microsoft's AI push out of a fear of "being left behind". That's not the confidence inspiring the statement they intended it to be, it's the opposite, it underscores that this isn't the .net team "experimenting to understand the limits of what the tools" but rather the .net team trying to keep their jobs.

Verdex

The "left behind" mantra that I've been hearing for a while now is the strange one to me.

Like, I need to start smashing my face into a keyboard for 10000 hours or else I won't be able to use LLM tools effectively.

If LLM is this tool that is more intuitive than normal programming and adds all this productivity, then surely I can just wait for a bunch of others to wear themselves out smashing the faces on a keyboard for 10000 hours and then skim the cream off of the top, no worse for wear.

On the other hand, if using LLMs is a neverending nightmare of chaos and misery that's 10x harder than programming (but with the benefit that I don't actually have to learn something that might accidentally be useful), then yeah I guess I can see why I would need to get in my hours to use it. But maybe I could just not use it.

"Left behind" really only makes sense to me if my KPIs have been linked with LLM flavor aid style participation.

Ultimately, though, physics doesn't care about social conformity and last I checked the machine is running on physics.

spiffytech

There's a third way things might go: on the way to "superpower for everyone", we go through an extended phase where AI is only a superpower in skilled hands. The job market bifurcates around this. People who make strong use of it get first pick of the good jobs. People not making effective use of AI get whatever's left.

Kinda like how word processing used to be an important career skill people put on their resumes. Assuming AI becomes as that commonplace and accessible, will it happen fast enough that devs who want good jobs can afford to just wait that out?

Vicinity9635

If you're not using it where it's useful to you, then I still wouldn't say you're getting left behind, but you're making your job harder than it has to be. Anecdotally I've found it useful mostly for writing unit tests and sometimes debugging (can be as effective as a rubber duck).

It's like the 2025 version not not using an IDE.

It's a powerful tool. You still need to know when to and when not to use it.

marcosdumay

> It's like the 2025 version not not using an IDE.

That's right on the mark. It will save you a little bit of work on tasks that aren't the bottleneck on your productivity, and disrupt some random tasks that may or may not be important.

It's makes so little difference that plenty of people in 2025 don't use an IDE, and looking at their performance from the outside one just can't tell.

Except that LLMs have less potential to improve your tasks and more potential to be disruptive.

static_void

Tests are one of the areas where it performs least well. I can ask an LLM to summarize the functionality of code and be happy with the answer, but the tests it writes are the most facile unit tests, just the null hypothesis tests and the like. "Here's a test that the constructor works." Cool.

null

[deleted]

the-lazy-guy

This is Stephen Toub, who is the lead of many important .NET projects. I don't think he is worried about losing job anytime soon.

I think, we should not read too much into it. He is honestly exploring how much this tool can help him to resolve trivial issues. Maybe he was asked to do so by some of his bosses, but unlikely to fear the tool replacing him in the near future.

n8cpdx

They don’t have any problem firing experienced devs for no reason. Including on the .NET team (most of the .NET Android dev team was laid off recently).

https://www.theregister.com/2025/05/16/microsofts_axe_softwa...

Perhaps they were fired for failing to show enthusiasm for AI?

low_tech_love

I love the fact that they seem to be asking it to do simple things because ”AI can do the simple boring things for us so we can focus on the important problems” and then it floods them with so many meaningless mumbo jumbo that they could have probably done the simple thing in a fraction of the time they take to keep correcting it continuously.

sensanaty

Didn't M$ just fire like 7000 people, many of which were involved in big important M$ projects? The CPython guys, for example.

spacemadness

Anyone not showing open AI enthusiasm at that level will absolutely be fired. Anyone speaking for MS will have to be openly enthusiastic or silent on the topic by now.

hnthrow90348765

TBF they are dogfooding this (good) but it's just not going well

davidgerard

"eating our own dogshit"

dmix

> Microsoft employees are jumping on board with Microsoft's AI push out of a fear of "being left behind"

If they weren't experimenting with AI and coding and took a more conservative approach, while other companies like Anthropic was running similar experiments, I'm sure HN would also be critiquing them for not keeping up as a stodgy big corporation.

As long as they are willing to take risks by trying and failing on their own repos, it's fine in my books. Even though I'd never let that stuff touch a professional github repo personally.

jayGlow

exactly ignoring new technologies can be a death sentence for a company even one as large as Microsoft. even if this technology doesn't pay off its still a good idea to at least look into potential uses.

username135

i dont think hey are mutually exclusive. jumping on board seems like the smart move if you're worried about losing your career. you also get to confirm your suspicions.

lcnPylGDnU4H9OF

This is important context given that it would be absurd for the managers to have already drawn a definitive conclusion about the models’ capabilities. An explicit understanding that the purpose of the exercise is to get a better idea of the current strengths and weaknesses of the models in a “real world” context makes this actually very reasonable.

mrguyorama

So why in public, and why in the most ham-fisted way, and why on important infrastructure, and why in such a terrible integration that it can't even verify that things compile before opening a PR!

In my org, we would have had to bypass precommit hooks to do this!

rsynnott

Beyond every other absurdity here, well, maybe Microsoft is different, but I would never assign a PR that was _failing CI_ to somebody. That that's happening feels like an admission that the thing doesn't _really_ work at all; if it worked even slightly, it would at least only assign passing PRs, but presumably it's bad enough that if they put in that requirement there would be no PRs.

sbarre

I feel like everyone is applying a worse-case narrative to what's going on here..

I see this as a work in progress.. I am almost certain the humans in the loop on these PRs are well aware of what's going on and have their expectations in check, and this isn't just "business as usual" like any other PR or work assignment.

This is a test. You can't improve a system without testing it on real world conditions.

How do we know they're not tweaking the Copilot system prompts and settings behind the scenes while they're doing this work?

Can no one see the possibility that what is happening in those PRs is exactly what all the people involved expected to have happen, and they're just going through the process of seeing what happens when you try to refine and coach the system to either success or failure?

When we adopted AI coding assist tools internally over a year ago we did almost exactly this (not directly in GitHub though).

We asked a bunch of senior engineers to see how far they could get by coaching the AI to write code rather than writing it themselves. We wanted to calibrate our expectations and better understand the limits, strengths and weaknesses of these new tools we wanted to adopt.

In most of those early cases we ended up with worse code than if it had been written by humans, but we learned a ton. We can also clearly see how much better things have gotten over time, since we have that benchmark to look back on.

rco8786

I think people would be more likely to adopt this view if the overall narrative about AI is that it’s a work in progress and we expect it to get magnitudes better. But the narrative is that AI is already replacing human software engineers.

codyvoda

[flagged]

phkahler

>> I see this as a work in progress.. I am almost certain the humans in the loop on these PRs are well aware of what's going on and have their expectations in check, and this isn't just "business as usual" like any other PR or work assignment.

>> This is a test. You can't improve a system without testing it on real world conditions.

Software developers know to fix build problems before asking for a review. The AIs are submitting PRs in bad faith because they don't know any better. Compilers and other build tools produce errors when they fail, and the AI is ignoring this first line of feedback.

It is not a maintainers job to review code for syntax errors, or use of APIs that don't actually exist, or other silly mistakes. That's the compilers job and it does it well. The AI needs to take that feedback and fix the issues before escalating to humans.

sbarre

Like I said, I think you may be missing the point of the whole exercise.

mieubrisse

I was looking for exactly this comment. Everybody's gloating, "Wow look how dumb AI is! Haha, schadenfreude!" but this seems like just a natural part of the evolution process to me.

It's going to look stupid... until the point it doesn't. And my money's on, "This will eventually be a solved problem."

roxolotl

The question though is what is the time horizon of “eventually”. Very different decisions should be made if it’s 1 year, 2 years, 4 years, 8 years etc. To me it seems as if everyone is making decisions which are only reasonable if the time horizon is 1 year. Maybe they are correct and we’re on the cusp. Maybe they aren’t.

Good decision making would weigh the odds of 1 vs 8 vs 16 years. This isn’t good decision making.

Qem

> It's going to look stupid... until the point it doesn't. And my money's on, "This will eventually be a solved problem."

AI can remain stupid longer than you can remain solvent.

grewsome

Sometimes the last 10% takes 90% of the time. It'll be interesting to see how this pans out, and whether it will eventually get to something that could be considered a solved problem.

I'm not so sure they'll get there. If the solved problem is defined as a sub-standard but low cost, then I wouldn't bet against that. A solution better than that though, I don't think I'd put my money on that.

spacemadness

People seem like they’re gloating as the message received in this period of the hype cycle is that AI is as good as a junior dev without caveats and it in no way is suppose to be stupid.

Workaccount2

To some people, it will always look stupid.

I have met people who believe that automobile engineering peaked in the 1960's, and they will argue that until you are blue in the face.

null

[deleted]

solids

You are not addressing the point in the comment, why are failing CI changes assigned?

sbarre

I believe I did address that when I said "this is not business as usual work"..

So the typical expectations or norms of how code reviews and PRs work between humans don't really apply here.

That's my guess at least. I have no more insider information than you.

beefnugs

This is the exact reason AI sucks : there is no proper feedback loop.

EVERY single prompt should have the opportunity to get copied off into a permanent log where the end user triggers it : log all input, all output, human writes a summary of what he wanted to happen but did not, what he thinks might have went wrong, what he thinks should have happened (domain specific experts giving feedback about how things are fucking up) And then its still only useful with long term tracking like how someone actually made a training change to fix this exact failure scenario.

None of that exists, so just like "full self driving" was a pie in the sky bullshit dream that proved machine learning has an 80/20 never gonna fully work problem, same thing here

munksbeer

> I feel like everyone is applying a worse-case narrative to what's going on here..

Unfortunately, just about every thread on this genre is like that now.

Dlanv

They said in the comments that currently the firewall is blocking it from checking tests for passing, and they need to fix that.

Otherwise it would check the tests are passing.

robotcapital

Replace the AI agent with any other new technology and this is an example of a company:

1. Working out in the open

2. Dogfooding their own product

3. Pushing the state of the art

Given that the negative impact here falls mostly (completely?) on the Microsoft team which opted into this, is there any reason why we shouldn't be supporting progress here?

JB_Dev

100% agree. i’m not sure why everyone is clowning on them here. This process is a win. Do people want this all being hidden instead in a forked private repo?

It’s showing the actual capabilities in practice. That’s much better and way more illuminating than what normally happens with sales and marketing hype.

rco8786

Satya says: "I’d say maybe 20%, 30% of the code that is inside of our repos today and some of our projects are probably all written by software".

Zuckerberg says: "Our bet is sort of that in the next year probably … maybe half the development is going to be done by AI, as opposed to people, and then that will just kind of increase from there".

It's hard to square those statements up with what we're seeing happen on these PRs.

SketchySeaBeast

These are AI companies selling AI to executives, there's no need to square the circle, the people that they are talking to have no interest in what's happening in a repo, it's about convincing people to buy in early so they can start making money off their massive investments.

daveguy

> Satya says: "I’d say maybe 20%, 30% of the code that is inside of our repos today and some of our projects are probably all written by software".

Well, that makes sense to me. Microsoft's software has gotten noticably worse in the last few years. So much that I have abandoned it for my daily driver for the first time since the early 2000s.

polishdude20

The fact that Zuck is saying "sort of" and "probably" is a big giveaway it's not going to happen.

constantcrying

Who is "we" and how and why would "we" "support" or not "support" anything.

Personally I just think it is funny that MS is soft launching a product into total failure.

throwaway844498

"Pushing the state of the art" and experimenting on a critical software development framework is probably not the best idea.

Dlanv

Why not, when it goes through code review by experienced software engineers who are experts on the subject in a codebase that is covered by extensive unit tests?

Draiken

I don't know about you, but it's much more likely for me to let a bug slip when I'm reviewing someone else's code than when I'm writing it myself.

This is what's happening right now: they are having to review every single line produced by this machine and trying to understand why it wrote what it wrote.

Even with experienced developers reviewing and lots of tests, the likelihood of bugs in this code compared to a real engineer working on it is much higher.

Why not do this on less mission critical software at the very least?

Right now I'm very happy I don't write anything on .NET if this is what they'll use as a guinea pig for the snake oil.

mrguyorama

>supporting progress

This presupposes AI IS progress.

Nevermind that what this actually shows is an executive or engineering team that so buys their own hype that they didn't even try to run this locally and internally before blasting to the world that their system can't even ensure tests are passing before submitting a PR. They are having a problem with firewall rules blocking the system from seeing CI outcomes and that's part of why it's doing so badly, so why wasn't that verified BEFORE doing this on stage?

"Working out in the open" here is a bad thing. These are issues that SHOULD have been caught by an internal POC FIRST. You don't publicly do bullshit.

"Dogfooding" doesn't require throwing this at important infrastructure code. Does VS code not have small bugs that need fixing? Infrastructure should expect high standards.

"Pushing the state of the art" is comedy. This is the state of the art? This is pushing the state of the art? How much money has been thrown into the fire for this result? How much did each of those PRs cost anyway?

lawn

Because they're using it on an extremely popular repository that many people depend on?

And given the absolute garbage the AI is putting out the quality of the repo will drop. Either slop code will get committed or the bots will suck away time from people who could've done something productive instead.

globalise83

Malicious compliance should be the order of the day. Just approve the requests without reviewing them and wait until management blinks when Microsoft's entire tech stack is on fire. Then quit your job and become a troubleshooter on x3 the pay.

sbarre

I know this is meant to sound witty or clever, but who actually wants to behave this way at their job?

I'll never understand the antagonistic "us vs. them" mentality people have with their employer's leadership, or people who think that you should be actively sabotaging things or be "maliciously compliant" when things aren't perfect or you don't agree with some decision that was made.

To each their own I guess, but I wouldn't be able to sleep well at night.

HelloMcFly

It’s worth recognizing that the tension between labor and capital historical reality, not just a modern-day bad attitude. Workers and leadership don’t automatically share goals, especially when senior management incentives often prioritize reducing labor costs which they always do now (and no, this wasn't always universally so).

Most employees want to do good work, but pretending there’s no structural divergence in interests flattens decades of labor history and ignores the power dynamics baked into modern orgs. It’s not about being antagonistic, it’s about being clear-eyed where there are differences between the motivations of your org. leadership and your personal best interests. After a few levels remove from your position, you're just headcount with loaded cost.

sbarre

Great comment.. It's of course more complex than I made it out to be, I was mostly reacting to the idea of "malicious compliance" at your place of employment and how at odds that is with my own personal morals and approach.

But 100% agreed that everyone should maintain a realistic expectation and understanding of their relationship with their employer, and that job security and employment guarantees are possibly at an all-time low in our industry.

gthrowaway2342

[dead]

Frost1x

I suppose that depends on your relationship with your employer. If your goals are highly aligned (e.g. lots of equity based compensation, some degree of stability and security, interest in your role, healthy management practices that value their workforce, etc.) then I agree, it’s in your own self interest to push back because it can effect you directly.

Meanwhile a lot of folks have very unhealthy to non-existent relationships with their employers. There may be some mixture where they may be temporary hired/viewed as highly disposable or transient in nature having very little to gain from the success of the business, they may be compensated regardless of success/failure, they may have toxic management who treat them terribly (condescendingly, constantly critical, rarely positive, etc.). Bad and non-existent relationships lead to this sort of behavior. In general we’re moving towards “non-existent” relationships with employers broadly speaking for the labor force.

The counter argument is often floated here “well why work there” and the fact is money is necessary to survive, the number of positions available hiring at any given point is finite, and many almost by definition won’t ever be the top performers in their field to the point they truly choose their employers and career paths with full autonomy. So lots of people end up in lots of places that are toxic or highly misaligned with their interests as a survival mechanism. As such, watching the toxic places shoot themselves in the foot can be some level of justice people find where generally unpleasant people finally get to see consequences of their actions and take some responsibility.

People will prop others up from their own consequences so long as there’s something in it for them. As you peel that away, at some point there’s a level of poetic justice to watch the situation burn. This is why I’m not convinced having completely transactional relationships with employers is a good thing. Even having self interest and stability in mind, certain levels of toxicity in business management can fester. At some point no amount of money is worth dealing with that and some form of correction is needed there. The only mechanism is to typically assure poor decision making and action is actually held accountable.

sbarre

Another great comment, thanks! Like I said elsewhere I agree things are more complicated than I made them out to be in my short and narrow response.

I agree with all your points here, the broader context of one's working conditions really matter.

I do think there's a difference between sitting back and watching things go bad (vs struggling to compensate for other people's bad decisions) and actively contributing to the problems (the "malicious compliance" part)..

Letting things fail is sometimes the right choice to make, if you feel like you can't effect change otherwise.

Being the active reason that things fail, I don't think is ever the right choice.

nope1000

On the other hand: why should you accept that your employer is trying to fire you but first wants you to train the machine that will replace you? For me this is the most "them vs us" it can be.

early_exit

To be fair, "them" are actively working to replace "us" with AI.

bluefirebrand

Do you sleep well at night just doing what you're told by people who don't really care about your well being?

I don't get that

sbarre

There's a whole lot of assumptions in your statement/question there, don't you think?

Hamuko

Considering that there's daily employee protests against Microsoft now, probably a lot of Microsoft employees want to behave like that.

Xori71

I agree. It doesn’t help that once things start breaking down, the employer will ask the employees to fix the issue themselves, and thus they’ll have to deal with so much broken code that they’ll be miserable. It’ll become a spiral.

anonymousab

When the issues arise because of the tool being trained explicitly to respect/fire you, then that sounds like an apt and appropriate resulting level of job security.

mrguyorama

>I'll never understand the antagonistic "us vs. them" mentality

Your manager understands it. Their manager understands it. Department heads understand it. The execs understand it. The shareholders understand it.

Who does it benefit for the laborers to refuse to understand it?

It's not like I hate my job. It's just being realistic that if a company could make more money by firing me, they would, and if you have good managers and leadership, they will make sure you understand this in a way that respects you as a human and a professional.

sbarre

What you are describing is not "antagonistic" though..

> antagonism: actively expressed opposition or hostility

I agree with you that everyone should have a clear and realistic understanding of their relationship with their employer. And that is entirely possible in a professional and constructive manner.

But that's not the same thing as being actively hostile towards your place of work.

tantalor

> when Microsoft's entire tech stack is on fire

Too late?

MonkeyClub

Just in time for marshmallows!

hello_computer

Might as well when they’re going to lay you off no matter what you do (like the guy who made an awesome TypeScript compiler in Go).

xyst

At some point code pilot will just delete the whole codebase. Can’t fail integration tests if there is no code :)

otabdeveloper4

That would be logical, but alas LLMs can't into logic.

Bloating the codebase with dead code is much more likely.

weird-eye-issue

That's cute, but the maintainers themselves submitted the requests with Copilot.

balazstorok

At least opening PRs is a safe option, you can just dump the whole thing if it doesn't turn out to be useful.

Also, trying something new out will most likely have hiccups. Ultimately it may fail. But that doesn't mean it's not worth the effort.

The thing may rapidly evolve if it's being hard-tested on actual code and actual issues. For example it will be probably changed so that it will iterate until tests are actually running (and maybe some static checking can help it, like not deleting tests).

Waiting to see what happens. I expect it will find its niche in development and become actually useful, taking off menial tasks from developers.

Frost1x

It might be a safer option in a forked version of the project that the public can’t see. I have to wonder about the optics here from a sales perspective. You’d think they’d test this out more internally before putting it in public access.

Now when your small or medium size business management reads about CoPilot in some Executive Quarterly magazine and floats that brilliant idea internally, someone can quite literally point to these as examples of real world examples and let people analyze and pass it up the management chain. Maybe that wasn’t thought through all the way.

Usually businesses tend to hide this sort of performance of their applications to the best of their abilities, only showcasing nearly flawless functionality.

xnickb

> I expect it will find its niche in development and become actually useful, taking off menial tasks from developers.

Reading AI generated code is arguably far more annoying than any menial task. Especially if the said code happens to have subtle errors.

Speaking from experience.

balazstorok

This is probably version 0.1 or 0.2.

Reviewing what the AI does now is not to be compared with human PRs. You are not doing the work as it is expected in the (hopefully near?) future but you are training the AI and the developers of the AI and more crucially: you are digging out failure modes to fix.

xnickb

While I admire your optimism regarding those errors getting fixed, I myself am sceptical about the idea of that happening in my lifetime (I'm in my mid 30s).

It would definitely be nice to be wrong though. That'd make life so much easier.

ecb_penguin

This is true for all code and has nothing to do with AI. Reading code has always been harder than writing code.

The joke is that PERL was a write-once, read-none language.

> Speaking from experience.

My experience is all code can have subtle errors, and I wouldn't treat any PR differently.

xnickb

I agree, but when working with code written by your teammate you have a rough idea what kind of errors to expect.

AI however is far more creative than any given single person.

That's my gut feeling anyway. I don't have numbers or any other rigorous data. I only know that Linus Torvalds made a very good point about chain of trust. And I don't see myself ever trysting AI the same way I can trust a human.

cesarb

> At least opening PRs is a safe option, you can just dump the whole thing if it doesn't turn out to be useful.

There's however a border zone which is "worse than failure": when it looks good enough that the PRs can be accepted, but contain subtle issues which will bite you later.

UncleMeat

Yep. I've been on teams that have good code review culture and carefully review things so they'd be able to catch subtle issues. But I've also been on teams where reviews are basically "tests pass, approved" with no other examination. Those teams are 100% going to let garbage changes in.

camdenreslink

Even when you review human-written code carefully, subtle bugs can sneak through. Software development is hard.

ecb_penguin

Funny enough, this happens literally every day with millions of developers. There will be thousands upon thousands of incidents in the next hour because a PR looked good, but contained a subtle issue.

6uhrmittag

> At least opening PRs is a safe option, you can just dump the whole thing if it doesn't turn out to be useful.

However, every PR adds load and complexity to community projects.

As another commenter suggested, doing these kind of experiments on separate forks sound a bit less intrusive. Could be a take away from this experiment and set a good example.

There are many cool projects on GitHub that are just accumulating PRs for years, until the maintainer ultimately gives up and someone forks it and cherry-picks the working PRs. I've than that myself.

I'm super worried that we'll end up with more and more of these projects and abandoned forks :/

cyanydeez

Unfortunately,if you believe LLMs really can learn to code with bugs, then the nezt step would be to curate a sufficiently bug free data set. Theres no evidence this has occured, rather, they just scraped whayecer

petetnt

GitHub has spent billions of dollars building an AI that struggles with things like whitespace related linting errors on one of the most mature repositories available. This would be probably okay for a hobbyist experiment, but they are selling this as a groundbreaking product that costs real money.

marcosdumay

> This would be probably okay for a hobbyist experiment

It's perfectly ok for a professional research experiment.

What's not ok is their insistence on selling the partial research results.

sexy_seedbox

Nat Friedman must be rolling in his grave...

oh wait

ocdtrekkie

He's rolling in money for sure.

Philpax

Stephen Toub, a Partner Software Engineer at MS, explaining that the maintainers are intentionally requesting these PRs to test Copilot: https://github.com/dotnet/runtime/pull/115762#issuecomment-2...

Quarrelsome

rah, we might be in trouble here. The primary issue at play is that we don't have a reliable means of measuring developer performance, outside of subjective judgement like end of year reviews.

This means its probably quite hard to measure the gain or the drag of using these agents. On one side, its a lot cheaper than a junior, but on the other side it pulls time from seniors and doesn't necessarily follow instruction well (i.e. "errr your new tests are failing").

This combined with the "cult of the CEO" sets the stage for organisational dissonance where developer complaints can be dismissed as "not wanting to be replaced" and the benefits can be overstated. There will be ways of measuring this, to project it as huge net benefit (which the cult of the CEO will leap upon) and there will be ways of measuring this to project it as a net loss (rabble rousing developers). All because there is no industry standard measure accepted by both parts of the org that can be pointed at which yields the actual truth (whatever that may be).

If I might add absurd conjecture: We might see interesting knock-on effects like orgs demanding a lowering of review standards in order to get more AI PRs into the source.

rco8786

> its a lot cheaper than a junior

I’m not even sure if this is true when considering training costs of the model. It takes a lot of junior engineer salaries to amortize the billions spent building this thing in the first place.

Quarrelsome

sure, but for an org just buying tokens its cheaper and more disposable than an employee. At least it looks better on paper for the bean counters.

BugheadTorpeda6

Yes it's going to cause many problems forcompanies I think, but at least they will deserve it (the employees won't unfortunately unless they've drank the kool-aid, I rarely meet ICs that have drank it fwiw, which means I'm either in a serious bubble, or this is being pushed from the top down). The only clear winners are going to be chip companies.

There's never going to be an industry standard measure either. Measuring productivity as I'm sure you know is incredibly dumb for a job like this because the beneficialness of our work product can be both insanely positive and put the company on top or it can be so negative that it goes bankrupt. And ultimately a lot of what goes into people choosing whether they like the work product or not is subjective. A large part of our work is more of an art than a science and I say that as somebody that works about as far away from the frontend as one can get.

Crosseye_Jack

I do love one bot asking another bot to sign a CLA! - https://github.com/dotnet/runtime/pull/115732#issuecomment-2...

pm215

That's funny, but also interesting that it didn't "sign" it. I would naively have expected that being handed a clear instruction like "reply with the following information" would strongly bias the LLM to reply as requested. I wonder if they've special cased that kind of thing in the prompt; or perhaps my intuition is just wrong here?

Bedon292

A comment on one of the threads, when a random person tried to have copilot change something, said that copilot will not respond to anyone without write access to the repo. I would assume that bot doesn't have write access, so copilot just ignores them.

Quarrel

AI can't, as I understand it, have copyright over anything they do.

Nor can it be an entity to sign anything.

I assume the "not-copyrightable" issue, doesn't in anyway interfere with the rights trying to be protected by the CLA, but IANAL ..

I assume they've explicitly told it not to sign things (perhaps, because they don't want a sniff of their bot agreeing to things on behalf of MSFT).

candiddevmike

Are LLM contributions effectively under public domain?

90s_dev

Well?? Did it sign it???

jsheard

Not sure if a chatbot can legally sign a contract, we'd better ask ChatGPT for a second opinion.

gortok

At least currently, to qualify for copyright, there must be a human author. https://www.reuters.com/world/us/us-appeals-court-rejects-co...

I have no idea how this will ultimately shake out legally, but it would be absolutely wild for Microsoft to not have thought about this potential legal issue.

Hamuko

I would imagine it can't sign it, especially with the options given.

>I have sole ownership of intellectual property rights to my Submissions

I would assume that the AI cannot have IP ownership considering that an AI cannot have copyright in the US.

>I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer.

Surely an AI would not be classified as an employee and therefore would not have an employer. Has Microsoft drafted an employment contract with Copilot? And if we consider an AI agent to be an employee, is it protected by the Fair Labor Standards Act? Is it getting paid at least minimum wage?

TuringNYC

There is some unfortunate history here, though not a perfect analog: https://en.wikipedia.org/wiki/2010_United_States_foreclosure...

tessierashpool9

offer it more money, then it will sign

b0ner_t0ner

Just need the chatbot to connect to an MCP to call my robotic arm to sign it.

marcosdumay

It didn't. It completely ignored the request.

(Turns out the AI was programmed to ignore bots. Go figure.)

nikolayasdf123

that's the future, AI talking to other AI, everywhere, all the time

thallium205

Is this the first instance of an AI cyber bullying another AI?

margorczynski

With how stochastic the process is it makes it basically unusable for any large scale task. What's the plan? To roll the dice until the answer pops up? That would be maybe viable if there was a way to automatically evaluate it 100% but with a human in the loop required it becomes untenable.

diggan

> What's the plan?

Call me old school, but I find the workflow of "divide and conquer" to be as helpful when working with LLMs, as without them. Although what is needed to be considered a "large scale task" varies by LLMs and implementation. Some models/implementations (seemingly Copilot) struggles with even the smallest change, while others breeze through them. Lots of trial and error is needed to find that line for each model/implementation :/

mjburgess

The relevant scale is the number of hard constraints on the solution code, not the size of task as measured by "hours it would take the median programmer to write".

So eg., one line of code which needed to handle dozens of hard-constraints on the system (eg., using a specific class, method, with a specific device, specific memory management, etc.) will very rarely be output correctly by an LLM.

Likewise "blank-page, vibe coding" can be very fast if "make me X" has only functional/soft-constraints on the code itself.

"Gigawatt LLMs" have brute-forced there way to having a statistical system capable of usefully, if not universally, adhreading to one or two hard constraints. I'd imagine the dozen or so common in any existing application is well beyond a Terawatt range of training and inference cost.

cyanydeez

Keep in mind that the model of using LLM assumes the underlying dataset converges to production ready code. Thats never been proven, cause we know they scraped sourcs code without attribution.

nonethewiser

Its hard for me to think of a small, clearly defined coding problem an LLM cant solve.

mrguyorama

There are several in the linked post, primarily:

"Your code does not compile" and "Your tests fail"

If you have to tell an intern that more than once on a single task, there's going to be conversations.

jodrellblank

"Find a counter example to the Collatz conjecture".

safety1st

I mean I guess this isn't very ambitious, but it's a meaningful time saver if I basically just write code in natural language, and then Copilot generates the real code based on that. I don't have to look up syntax details, or what some function somewhere was named, etc. It will perform very accurately this way. It probably makes me 20% more efficient. It doubles my efficiency in a language I'm unfamiliar with.

I can't fire half my dev org tomorrow with that approach, I can't really fire anyone, so I guess it would be a big letdown for a lot of execs. Meanwhile though we just keep incrementally shipping more stuff faster at higher quality so I'm happy...

This works because it treats the LLM like what it actually is: an exceptionally good if slightly random text transformer.

rsynnott

I suspect that the plan is that MS has spent a lot, really a LOT, of money on this nonsense, and there is now significant pressure to put, something, anything, out even if it is worse than useless.

Traubenfuchs

> to roll the dice

This was discussed here

https://news.ycombinator.com/item?id=43988913

eterevsky

The plan is to improve AI agents from their current ~intern level to a level of a good engineer.

ehnto

They are not intern level.

Even if it could perform at a similar level to an intern at a programming task, it lacks a great deal of the other attributes that a human brings to the table, including how they integrate into a team of other agents (human or otherwise). I won't bother listing them, as we are all humans.

I think the hype is missing the forest for the trees, and I think exactly this multi-agent dynamic might be where the trees start to fall down in front of us. That and the as currently insurmountable issues of context and coherence over long time horizons.

Tade0

My impression is that Copilot acts a lot like one of my former coworkers, who struggled with:

-Being a parent to a small child and the associated sleep deprivation.

-His reluctance to read documentation.

-There being a language barrier between him the project owners. Emphasis here, as the LLM acts like someone who speaks through a particularly good translation service, but otherwise doesn't understand the language spoken.

Workaccount2

The real missing the forest for the trees is thinking that software and the way users will use computers is going to remain static.

Software today is written to accommodate every possible need of every possible user, and then a bunch of unneeded selling point features on top of that. These massive sprawling code bases made to deliver one-size fits all utility.

I don't need 3 million LOC Excel 365 to keep track of who is working on the floor on what day this week. Gemini 2.5 can write an applet that does that perfectly in 10 minutes.

ethanol-brain

Seems like that is taking a very long time, on top of some very grandiose promises being delivered today.

infecto

I look back over the past 2-3 years and am pretty amazed with how quick change and progress have been made. The promises are indeed large but the speed of progress has been fast. Not defending the promise but “taking a very long time” does not seem to be an accurate representation.

DrillShopper

Third AI Winter from overpromise/underdeliver when?

interimlojd

You are really underselling interns. They learn from a single correction, sometimes even without a correction, all by themselves. Their ability to integrate previous experience in the context of new problems is far, far above what I've ever seen in LLMs

mnky9800n

Yes but they are supposed to be PhD level 5 years ago if you are listening to sama et al.

rchaud

Especially ironic considering he's neither a developer nor a PhD. He's the smooth talking "MBA idea guy looking for a technical cofounder" type that's frequently decried on HN.

einsteinx2

Without handholding (aka being used as a tool by a competent programmer instead of as an independent “agent”), they’re currently significantly worse than an intern.

serial_dev

This looks much worse than an intern. This feels like a good engineer who has brain damage.

When you look at it from afar, it looks potentially good, but as you start looking into it for real, you start realizing none of it makes any sense. Then you make simple suggestions, it does something that looks like what you asked, yet completely missing the point.

An intern, no matter how bad it is, could only waste so much time and energy.

This makes wasting time and introducing mind-bogglingly stupid bugs infinitely scalable.

marmakoide

The plan went from the AI being a force multiplier, to a resource hungry beast that have to be fed in the hope it's good enough to justify its hunger.

rsynnott

I mean, I think this is a _lot_ worse than an intern. An intern isn't constantly going to make PRs with failing CI, for a start.

le-mark

The real tragedy is the management mandating this have their eyes clearly set on replacing the very same software engineers with this technology. I don’t know what’s more Kafka than Kafka but this situation certainly is!

strogonoff

When tasked to train a technology that deprecates yourself, it’s relatively OK (you’re getting paid handsomely, and many of the developers at Microsoft etc. are probably ready to retire soon anyway). It’s another thing to realize that the same technology will also deprecate your children.

solarwindy

The managers may believe that's what they're asking their developers to do, but doesn't this whole charade expose the fact that this technology just does not have even close to the claimed capabilities?

I see it as wishful thinking in the extreme to suppose that probabilistic mashing together of plagiarized jigsaw pieces of code could somehow approach human intelligence and reasoning—and yet, the parlour trick is convincing enough that this has escalated into a mass delusion.

strogonoff

Philosophy becomes key. True human intelligence is not very well defined, and possibly cannot be divorced from concepts like “consciousness” or “agency”, at which point claiming that the thing is “like human” opens the operator to accusations of running a torture chamber or being a slave owner of entities that can feel.

tossandthrow

Management obviously also know, that when they do not have anybody to manage, then they are also obselete.