GitHub Copilot Coding Agent
371 comments · May 19, 2025
taurath
> Copilot excels at low-to-medium complexity tasks in well-tested codebases, from adding features and fixing bugs to extending tests, refactoring, and improving documentation.
Bounds bounds bounds bounds. The important part for humans seems to be maintaining boundaries for AI. If your well-tested codebase has its tests built through AI, it's probably not going to work.
I think it's somewhat telling that they can't share numbers for how they're using it internally. I want to know that Microsoft, the company famous for dog-fooding is using this day in and day out, with success. There's real stuff in there, and my brain has an insanely hard time separating the trillion dollars of hype from the usefulness.
timrogers
We've been using Copilot coding agent internally at GitHub, and more widely across Microsoft, for nearly three months. That dogfooding has been hugely valuable, with tonnes of useful feedback (and bug bashing!) that has helped us get the agent ready to launch today.
So far, the agent has been used by about 400 GitHub employees in more than 300 of our repositories, and we've merged almost 1,000 pull requests contributed by Copilot.
In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
(Source: I'm the product lead at GitHub for Copilot coding agent.)
overfeed
> we've merged almost 1,000 pull requests contributed by Copilot
I'm curious to know how many Copilot PRs were not merged and/or required human take-overs.
sethammons
textbook survivorship bias https://en.wikipedia.org/wiki/Survivorship_bias
Every bullet hole in that plane is one of the 1k PRs contributed by Copilot. The missing dots, and the whole missing planes, are unaccounted for. I.e., "AI ruined my morning".
philipwhiuk
I'm curious how many were much more than Dependabot changes.
xeromal
I see number of PRs as modern LOC, something that doesn't tell me anything about quality.
literalAardvark
"We need to get 1000 PRs merged from Copilot" "But that'll take more time" "Doesn't matter"
NitpickLawyer
> In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
Really cool, thanks for sharing! Would you perhaps consider implementing something like these stats that aider keeps on "aider writing itself"? - https://aider.chat/HISTORY.html
timrogers
Nice idea! We're going to try to get together a blog post in the next couple of weeks on how we're using Copilot coding agent at GitHub - including to build Copilot coding agent ;) - and having some live stats would be pretty sweet too.
taurath
> In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
That's a fun stat! Are humans in the #1-4 slots? It's hard to know what processes are automated (300 repos sounds like a lot of repos!).
Thank you for sharing the numbers you can. Every time a product launch is announced, I feel like it's a gleeful announcement of a decrease in my usefulness. I've got imposter syndrome enough; perhaps Microsoft might want to speak to the developer community and let us know what they see happening? Right now it's mostly the pink slips that are doing the speaking.
timrogers
Humans are indeed in slots #1-4.
After hearing feedback from the community, we’re planning to share more on the GitHub Blog about how we’re using Copilot coding agent at GitHub. Watch this space!
_heimdall
How strong was the push from leadership to use the agents internally?
As part of the dogfooding I could see them really pushing hard to try having agents make and merge PRs, at which point the data is tainted and you don't know if the 1,000 PRs were created or merged to meet demand or because devs genuinely found it useful and accurate.
mirkodrummer
> 1,000 pull requests contributed by Copilot
I'd like a breakdown of this phrase: how much human work vs. Copilot, and in what form, autocomplete vs. agent? It's not specified, which makes it seem more like marketing trickery than real data.
timrogers
The "1,000 pull requests contributed by Copilot" datapoint is specifically referring to Copilot coding agent over the past 2.5 months.
Pretty much every developer at GitHub is using Copilot in their day-to-day work, so its influence touches virtually every code change we make ;)
binarymax
So I need to ask: what is the overall goal of your project? What will you do in, say, 5 years from now?
timrogers
What I'm most excited about is allowing developers to spend more of their time working on the work they enjoy, and less of their time working on mundane, boring or annoying tasks.
Most developers don't love writing tests, or updating documentation, or working on tricky dependency updates - and I really think we're heading to a world where AI can take on that load and free me up to work on the most interesting and complex problems.
ilaksh
That's a completely nonsensical question given how quickly things are evolving. No one has a five year project timeline.
dsl
> In the repo where we're building the agent, the agent itself is actually the #5 contributor
How does this align with Microsoft's AI safety principles? What controls are in place to prevent Copilot from deciding that it could be more effective with fewer limitations?
timrogers
Copilot only does work that has been assigned to it by a developer, and all the code that the agent writes has to go through a pull request before it can be merged. In fact, Copilot has no write access to GitHub at all, except to push to its own branch.
That ensures that all of Copilot's code goes through our normal review process which requires a review from an independent human.
bamboozled
Haha
meindnoch
Yeah, Product Managers always swear by their products.
mjr00
From talking to colleagues at Microsoft it's a very management-driven push, not developer-driven. Friend on an Azure team had a team member who was nearly put on a PIP because they refused to install the internal AI coding assistant. Every manager has "number of developers using AI" as an OKR, but anecdotally most devs are installing the AI assistant and not using it or using it very occasionally. Allegedly it's pretty terrible at C# and PowerShell which limits its usefulness at MS.
shepherdjerred
[flagged]
antihipocrat
That's exactly what senior executives who aren't coding are saying everywhere.
Meanwhile, engineers are using it for code completion and as a Google search alternative.
I don't see much difference here at all; the only habit to change is learning to trust an AI solution as much as a Stack Overflow answer. Though the benefit of SO is that each comment is timestamped and there are alternative takes, corrections, and caveats in the comments.
mjr00
What does this have to do with my comment? Did you mean to reply to someone else?
I don't understand what this has to do with AI adoption at MS (and Google/AWS, while we're at it) being management-driven.
adamsb6
There's a large group of people who claim that AI tools are no good, and I can't tell if they're in some niche where the tools truly aren't useful, if they don't care to put any effort into learning the tools, or if they're simply in denial.
evantbyrne
It's just tooling. It costs nothing to wait for it to get better. It's not like you're going to miss out on AGI. The cost of actually testing every slop code generator is non-trivial.
rsoto2
AIs are boring
karn97
A better Stack Exchange search isn't that revolutionary.
sensanaty
> I want to know that Microsoft, the company famous for dog-fooding is using this day in and day out, with success
Have they tried dogfooding their dogshit little tool called Teams in the last few years? Cause if that's what their "famed" dogfooding gets us, I'm terrified to see what lies in wait with Copilot.
twodave
I feel like I saw a quote recently that said 20-30% of MS code is generated in some way. [0]
In any case, I think this is the best use case for AI in programming: as a force multiplier for the developer. It's in the best interest of both AI and humanity for AI to avoid diminishing the creativity, agency and critical thinking skills of its human operators. AI should be task oriented, but high-level decision-making and planning should always be a human task.
So I think our use of AI for programming should remain heavily human-driven for the long term. Ultimately, its use should involve enriching humans’ capabilities over churning out features for profit, though there are obvious limits to that.
[0] https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-a...
greatwhitenorth
How much was previously generated by intellisense and other code gen tools before AI? What is the delta?
DeepYogurt
> I feel like I saw a quote recently that said 20-30% of MS code is generated in some way. [0]
Similar to Google. MS now requires devs to use AI.
spooneybarger
I know a lot of devs at MSFT, none of them are required to use AI.
beefnugs
So demanding all employees use it... results in less than 30% compliance. That does tell me a lot
tmpz22
How much of that is protobuf stubs and other forms of banal autogenerated code?
twodave
Updated my comment to include the link. As much as 30% specifically generated by AI.
rcarmo
That quote was completely misrepresented.
ilaksh
You might want to study the history of technology and how rapidly compute efficiency has increased as well as how quickly the models are improving.
In this context, assuming that humans will still be able to do high level planning anywhere near as well as an AI, say 3-5 years out, is almost ludicrous.
_se
Reality check time for you: people were saying this exact thing 3 years ago. You cannot extrapolate like that.
k__
"I want to know that Microsoft, the company famous for dog-fooding is using this day in and day out, with success."
They just cut down their workforce, letting some of their AI people go. So, I assume there isn't that much success.
lacoolj
They have released numbers, but I can't say whether they are for this specific product or something else. They are apparently having AI generate "30%" of their code.
https://techcrunch.com/2025/04/29/microsoft-ceo-says-up-to-3...
Scene_Cast2
I tried doing some vibe coding on a greenfield project (using gemini 2.5 pro + cline). On one hand - super impressive, a major productivity booster (even compared to using a non-integrated LLM chat interface).
I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions (putting things where they don't belong). Unfortunately, there's not that much self-retrospection on these aspects if you ask about the quality of the code or if there are any better ways of doing it. Of course, if you pick up that something is in the wrong spot and prompt better, they'll pick up on it immediately.
I also ended up blowing through $15 of LLM tokens in a single evening. (Previously, as a heavy LLM user including coding tasks, I was averaging maybe $20 a month.)
candiddevmike
> I also ended up blowing through $15 of LLM tokens in a single evening.
This is a feature, not a bug. LLMs are going to be the next "OMG my AWS bill" phenomenon.
Scene_Cast2
Cline very visibly displays the ongoing cost of the task. Light edits are about 10 cents, and heavy stuff can run a couple of bucks. It's just that the tab accumulates faster than I expect.
eterm
> Light edits are about 10 cents
Some well-paid developers will excuse this with, "Well, if it saved me 5 minutes, it's worth an order of magnitude more than 10 cents."
Which is true, however there's a big caveat: time saved isn't time gained.
You can "save" 1,000 hours every night, but you don't actually get those 1,000 hours back.
PretzelPirate
> Cline very visibly displays the ongoing cost of the task
LLMs are now being positioned as "let them work autonomously in the background" which means no one will be watching the cost in real time.
Perhaps I can set limits on how much money each task is worth, but very few would estimate that properly.
Cthulhu_
Especially at companies (hence this github one), where the employees don't care about cost because it's the boss' credit card.
philkuz
I think that models are gonna commoditize, if they haven't already. The cost of switching over is rather small, especially when you have good evals on what you want done.
Also there's no way you can build a business without providing value in this space. Buyers are not that dumb.
raincole
They are already quite commoditized. Commoditization doesn't mean "cheap", and it doesn't mean you won't spend $15 a night like the GP did.
BeetleB
> I also ended up blowing through $15 of LLM tokens in a single evening.
Consider using Aider, and aggressively managing the context (via /add, /drop and /clear).
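For anyone who hasn't tried it, this is roughly what that looks like inside an aider session (file names invented; /add, /drop and /clear are real aider commands, and the inline comments are just for illustration):

    $ aider main.py
    > /add parser.py         # pull a second file into the chat context
    > refactor load_rows() in parser.py to return dataclasses
    > /drop parser.py        # done with it, stop spending tokens on it
    > /clear                 # wipe the chat history before the next task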
gen220
I, too, recommend aider whenever these discussions crop up; it converted me from the "AI tools suck" side of this discussion to the "you're using the wrong tool" side.
I'd also recommend creating little `README`'s in your codebase that are mainly written with aider as the intended audience. In it, I'll explain architecture, what code makes (non-)sense to write in this directory, and so on. Has the side-effect of being helpful for humans, too.
Nowadays when I'm editing with aider, I'll include the project README (which contains a project overview + pointers to other README's), and whatever README is most relevant to the scope of my session. It's super productive.
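To make that concrete, one of those per-directory READMEs can be just a handful of lines; the layout below is invented for illustration, not taken from my actual repo:

    # api/ - HTTP layer
    - Handlers stay thin: parse the request, call into services/, map errors to HTTP codes.
    - No persistence code here; that belongs in storage/.
    - To add an endpoint: new handler module, register the route, add a test under tests/api/.

A session then starts with /add README.md api/README.md before touching anything in that directory.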
I'm yet to find a model that beats the cost-effectiveness of Sonnet 3.7. I've tried the latest DeepSeek models, and while I love the price (nearly 50x cheaper?), they're just far too error-prone compared to Sonnet 3.7. They generate solid plans / architecture discussions, but, unlike Sonnet, the code they generate is often confidently off the mark.
BeetleB
Have you tried Gemini 2.5? It's cheaper and scores higher on the Aider leaderboard.
mattlondon
Why create READMEs and not just comments in the code?
rcarmo
There is a better way than just READMEs: https://taoofmac.com/space/blog/2025/05/13/2230
danenania
My tool Plandex[1] allows you to switch between automatic and manual context management. It can be useful to begin a task with automatic context while scoping it out and making the high level plan, then switch to the more 'aider-style' manual context management once the relevant files are clearly established.
1 - https://github.com/plandex-ai/plandex
Also, a bit more on auto vs. manual context management in the docs: https://docs.plandex.ai/core-concepts/context-management
SkyPuncher
I loathe using AI in a greenfield project. There are simply too many possible paths, so it seems to randomly switch between approaches.
In a brownfield code base, I can often provide it reference files to pattern match against. So much easier to get great results when it can anchor itself in the rest of your code base.
imiric
The trick for greenfield projects is to use it to help you design detailed specs and a tentative implementation plan. Just bounce some ideas off of it, as with a somewhat smarter rubber duck, and hone the design until you arrive at something you're happy with. Then feed the detailed implementation plan step by step to another model or session.
This is a popular workflow I first read about here[1].
This has been the most useful use case for LLMs for me. Actually getting them to implement the spec correctly is the hard part, and you'll have to take the reins and course correct often.
[1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
rcarmo
Here’s my workflow, it takes that a few steps further: https://taoofmac.com/space/blog/2025/05/13/2230
jollyllama
The trouble occurs when the brownfield project is crap already.
tmpz22
While it's being touted for greenfield projects, I've noticed a lot of failures when it comes to bootstrapping a stack.
For example, it (Gemini 2.5) really struggles with newer ecosystems like FastAPI when wiring libraries like SQLAlchemy, Pytest, Python-playwright, etc., together.
I find more value in bootstrapping myself, and then using it to help with boilerplate once an effective safety harness is in place.
jim180
I've vibe coded a small project as well using Claude Code. It's for visitor registration at the company. Simple project: one form, a couple of checkboxes, everything stored in SQLite, plus an endpoint for getting .xlsx.
Initial cost was around $20 USD, which later grew to (mostly polishing) $40 with some manual work.
I've intentionally picked up simple stack: html+js+php.
A couple of things:
* I'd say I'm happy about the result from a product perspective
* The codebase could be better, but I could not care less in this case
* By default, AI does not care about security unless I specifically tell it to
* Claude insisted on using old libs. When I specifically told it to use the latest and greatest, it upgraded them but left code that only works with an old version. It also mixed the latest DaisyUI with some old version of tailwindcss :)
On one hand it was super easy and fun to do, on the other hand if I was a junior engineer, I bet it would have cost more.
jstummbillig
If you want to use Cline and are at all price sensitive (in these ranges), you have to do manual context management just to keep costs down. I find that too cumbersome and use Windsurf (currently with Gemini 2.5 Pro) for that reason.
shepherdjerred
$15 in an evening sounds like a great deal when you consider the cost of highly-paid software engineers
echelon
> highly-paid software engineers
For now.
ipaddr
The money won't be flowing forever. This will cost you $6,000 a year.
shepherdjerred
A new grad at a FANG costs ~$200k-$250k a year after benefits
falcor84
> LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt
I wonder if the next phase would be the rise of (AI-driven?) "linters" that check that the implementation matches the architecture definition.
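Even before the AI part, the deterministic half of that idea is easy to sketch. Below is a minimal, hypothetical Python example (the layer names and rule format are invented; a real project would more likely use an AST-based tool such as import-linter): a declared layering, plus a check that source files only import the layers they're allowed to.

    import re
    import sys
    from pathlib import Path

    # Hypothetical "architecture definition": which sibling top-level packages
    # each layer may import from. A real tool would derive this from a doc/config.
    ALLOWED_IMPORTS = {
        "api": {"services", "models"},
        "services": {"storage", "models"},
        "storage": {"models"},
        "models": set(),
    }

    IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+(\w+)", re.MULTILINE)

    def check(root: Path) -> list[str]:
        violations = []
        for layer, allowed in ALLOWED_IMPORTS.items():
            for path in (root / layer).rglob("*.py"):
                for target in IMPORT_RE.findall(path.read_text(encoding="utf-8")):
                    # Only police imports of our own layers; ignore stdlib/third-party.
                    if target in ALLOWED_IMPORTS and target != layer and target not in allowed:
                        violations.append(f"{path}: layer '{layer}' may not import '{target}'")
        return violations

    if __name__ == "__main__":
        problems = check(Path(sys.argv[1] if len(sys.argv) > 1 else "."))
        print("\n".join(problems) if problems else "architecture: OK")
        sys.exit(1 if problems else 0)

The AI-driven version would presumably generate and maintain a map like ALLOWED_IMPORTS from prose architecture docs, and explain violations rather than just listing them.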
dontlikeyoueith
And now we've come full circle back to UML-based code generation.
Everything old is new again!
nodja
I wish they optimized things before adding more crap that will slow things down even more. The only thing that's fast with copilot is the autocomplete, it sometimes takes several minutes to make edits on a 100 line file regardless of the model I pick (some are faster than others). If these models had a close to 100% hit rate this would be somewhat fine, but going back and forth with something that takes this long is not productive. It's literally faster to open claude/chatgpt on a new tab and paste the question and code there and paste it back into vscode than using their ask/edit/agent tools.
I cancelled my Copilot subscription last week, and when it expires in two weeks I'll most likely shift to local models for autocomplete/simple stuff.
brushfoot
My experience has mostly been the opposite -- changes to several-hundred-line files usually only take a few seconds.
That said, months ago I did experience the kind of slow agent edit times you mentioned. I don't know where the bottleneck was, but it hasn't come back.
I'm on library WiFi right now, "vibe coding" (as much as I dislike that term) a new tool for my customers using Copilot, and it's snappy.
nodja
Here's a video of what it looks like with sonnet 3.7.
The claude and gemini models tend to be the slowest (yes, including flash). 4o is currently the fastest but still not great.
NicuCalcea
For me, the speed varies from day to day (Sonnet 3.7), but I've never seen it this slow.
notsylver
I've had this too, especially it getting stuck at the very end and just... never finishing. Once the usage-based billing comes into effect I think I'll try Cursor again. What local models are you using? The local models I tried for autocomplete were unusable, though based on aider's benchmark I never really tried larger models for chat. If I could, I would love to go local-only instead.
BeetleB
Several minutes? Something is seriously wrong. For most models, it takes seconds.
nodja
2m27s for a partial response editing a 178 line file (it failed with an error, which seems to happen a lot with claude, but that's another issue).
porridgeraisin
It takes minutes for me too sometimes.
Cursor is quicker, I guess it's a response parsing thing - when they make the decision to show it in the UI.
bencyoung
Some example PRs if people want to look:
https://github.com/dotnet/runtime/pull/115733 https://github.com/dotnet/runtime/pull/115732 https://github.com/dotnet/runtime/pull/115762
sensanaty
That first PR (115733) would make me quit after a week if we were to implement this crap at my job and someone forced me to babysit an AI in its PRs in this fashion. The others are also rough.
A wall of noise that tells you nothing of any substance, but delivered in an authoritative tone as if what it's doing is objective and truthful - immediately followed by:
- The 8 actual lines of code (discounting the tests & boilerplate) it wrote to actually fix the issue are being questioned by the person reviewing the code; it seems he's not convinced this is actually fixing what it should be fixing.
- Not running the "comprehensive" regression tests at all
- When they do run, they fail
- When they get "fixed" oh-so confidently, they still fail. Fifty-nine failing checks. Some of these tests take upward of an hour to run.
So the reviewer here has to read all the generated slop in the PR description and try to grok what the PR is about, read through the changes himself anyway (thankfully it's only a ~50 line diff in this situation, but imagine if this was a large refactor of some sort with a dozen files changed), and then drag it by the hand multiple times to try to fix issues it itself is causing. All the while you have to tag the AI as if it's another colleague and talk to it as if it's not just going to spit out whatever inane bullshit it thinks you want to hear based on the question asked. Test failed? Well, tests fixed! (No, they weren't.)
And we're supposed to be excited about having this crap thrust on us, with clueless managers being sold on this being a replacement for an actual dev? We're being told this is what peak efficiency looks like?
joejoo
[dead]
acdha
Thanks, that’s really interesting to see - especially with the exchange around whether something is the problem or the symptom, where the confident tone belies the lack of understanding. As an open source maintainer I wonder about the best way to limit usage to cases where someone has time to spend on those interactions.
bencyoung
Seems amazingly similar to the changes a junior would make (jumping to the solution that "fixes" it in the most shallow way) at the moment.
bearjaws
That first PR is rough. Why does it have to wait for a comment to fix failing tests?
yahoozoo
lol, those first two… poor Stephen
replwoacause
Thanks. I wonder what model they're using under the hood? I have such a good experience working with Cline and Claude Sonnet 3.7 and a comparatively much worse time with anything Github offers. These PRs are pretty consistent with the experience I've had in the IDE too. Incidentally, what has MSFT done to Claude Sonnet 3.7 in VSCode? It's like they lobotomized it compared to using it through Cline or the API directly. Trying to save on tokens or something?
bionhoward
Major scam alert, they are training on your code in private repos if you use this
You can tell because they advertise “Pro” and “Pro+” but then the FAQ reads,
> Does GitHub use Copilot Business or Enterprise data to train GitHub’s model? > No. GitHub does not use either Copilot Business or Enterprise data to train its models.
Aka, even paid individual plans are getting brain raped
manmal
Might have been the case, but no longer:
https://docs.github.com/en/copilot/managing-copilot/managing...
dankwizard
If you're programming on Windows, your screen is being screenshotted every few seconds anyway. If you don't think OCR is analysing everything resembling a letter on your screen, boy do I have some news for you.
malfist
Windows recall is not installed by default
dankwizard
Windows Recall is the local storage, user enabled AI thing. Not what I was talking about.
jagged-chisel
I’ve been trying to use Copilot for a few days to get some help writing against code stored on GitHub.
Copilot has been pretty useless. It couldn’t maintain context for more than two exchanges.
Copilot: here’s some C code to do that
Me: convert that to $OTHER_LANGUAGE
Copilot: what code would you like me to convert?
Me: the code you just generated
Copilot: if you can upload a file or share a link to the code, I can help you translate it …
It points me in a direction that’s a minimum of 15 degrees off true north (“true north” being the goal for which I am coding), usually closer to 90 degrees. When I ask for code, it hallucinates over half of the API calls.
rcarmo
Be more methodical, it isn’t magic: https://taoofmac.com/space/blog/2025/05/13/2230
jagged-chisel
I’m sure you have no idea what my method is. Besides, this whole “you’re holding it wrong” mentality isn’t productive - our technology should be adapting to us, we shouldn’t need to adapt ourselves to it.
Anyway, I can just use another LLM that serves me better.
shwouchk
I played around with it quite a bit. It is both impressive and scary. Most importantly, it tends to indiscriminately use dependencies from random tiny repos, and often enough not the correct ones, for major projects. Buyer beware.
PhilipRoman
This is something I've noticed as well with different AIs. They seem to disproportionately trust data read from the web. For example, I asked one to check whether some obvious phishing pages were scams, and multiple times I got just a summary of the content as if it were authoritative. Several times I've gotten some random Chinese repo with 2 stars presented as if it were the industry standard solution, since that's what it said in the README.
On an unrelated note, it also suggested I use the "Strobe" protocol for encryption and sent me to https://strobe.cool which is ironic considering that page is all about making one hallucinate.
avs733
>They seem to disproportionately trust data read from the web.
I doubt LLMs have anything like what we would conceptualize as trust. They have information, which is regurgitated because it is activated as relevant.
That being said, many humans don't really have a strong concept of information validation as part of day to day action and thinking. Development theory talks about this in terms of 'formal operational' thinking and 'personal epistemology' - basically how does thinking happen and then how is knowledge in those models conceptualized. Learning Sciences research generally talks about Piaget and formal operational before adulthood and stages of personal epistemology in higher education.
Research consistently suggests that about 50% of adults are not able to consistently operate in the formal thinking space. The behavior you are talking about is also typical of 'absolutist' epistemic perspectives, where answers are right or wrong and aren't meaningfully evaluated - just identified as relevant or not. Credibility is established by the information coming from a trusted source - most often an authority figure - rather than by the knower evaluating it themselves.
jaymzcampbell
> ... sent me to...
Oh wow, that was great - especially if I then look at my own body parts (like my palm) that I know are not moving, it's particularly disturbing. That's a really well done effect; I've seen something similar but nothing quite like that.
meindnoch
>On an unrelated note, it also suggested I use the "Strobe" protocol for encryption and sent me to https://strobe.cool which is ironic considering that page is all about making one hallucinate.
That's not hallucination. That's just an optical illusion.
timrogers
Thanks for flagging this! That isn't a behavior I've seen before in testing, and I'd love to dig into it more to see what's happening.
Would you be able to drop me an email? My address is my HN login @github.com.
(I work on the product team for Copilot coding agent.)
yellow_lead
Given that PRs run actions in a more trusted context for private repos, this is a bit concerning.
timrogers
As we've built Copilot coding agent, we've put a lot of thought and work into our security story.
One of the things we've done here is to treat Copilot's commits like commits from a first-time contributor to an open source project.
When Copilot pushes changes, your GitHub Actions workflows won't run by default, and you'll have to click the "Approve and run workflows" button in the merge box.
That gives you the chance to review Copilot's code before it runs in Actions and has access to your secrets.
(Source: I'm on the product team for Copilot coding agent.)
ThierryAbalea
The announcement https://github.blog/news-insights/product-news/github-copilo... seems to position GitHub Actions as a core part of the Copilot coding agent’s architecture. From what I understand in the documentation and your comment, GitHub Actions is triggered later in the flow, mainly for security reasons. Just to clarify, is GitHub Actions also used in the development environment of the agent, or only after the code is generated and pushed?
yellow_lead
Nice! Thanks for that info
cmrdporcupine
So like the typical junior developer, then.
porridgeraisin
No, lol. Even the enthusiastic junior developer would go around pestering people asking if the dependency is OK.
th0ma5
No, not at all. Why do people keep saying shit like these thought-terminating sentences? Try to see the glass of Kool-Aid, please. People are trying to understand how to communicate important, valuable things about failure states, and you're advocating ignorance.
sensanaty
Because the marketing started with "This is literally the singularity and will take over everything and everyone's jobs".
Then people realized that was BS, so the marketing moved on to "This will enhance everyone's jobs, as a companion that will help everyone".
People also realized that was pure BS. A few more marketing rebrands later and we're at the current situation, where we try to equate it to the lowest possible rung of employee they can think of, because surely Junior == Incompetent Idiot You Can't Trust Not To Waste Your Time†. The funny part is that these systems have been objectively and undeniably only getting better since the early days of the hype bubble, yet the push now is that they're "basically junior level!". Speaks volumes IMO, how those goal posts keep getting moved whenever people actually use these systems in real work.
---
† IMO, working with Juniors has given me some of the best moments of my career. It allowed space for me to grow my own knowledge, while I got to talk to and help someone extremely passionate, if a bit overeager. This stance on Juniors is, frankly, baffling to me because it's so far from my experience of how they tend to work; oftentimes they're a million times better than those "10x rockstars" you hear about all the time.
allthenopes25
"Drowning in technical debt?"
Stop fighting and sink!
But rest assured that with Github Copilot Coding Agent, your codebase will develop larger and larger volumes of new, exciting, underexplored technical debt that you can't be blamed for, and your colleagues will follow you into the murky depths soon.
muglug
> Copilot excels at low-to-medium complexity tasks
Oh cool!
> in well-tested codebases
Oh ok never mind
lukehoban
As peer commenters have noted, coding agent can be really good at improving test coverage when needed.
But also as a slightly deeper observation - agentic coding tools really do benefit significantly from good test coverage. Tests are a way to “box in” the agent and allow it to check its work regularly. While they aren’t necessary for these tools to work, they can enable coding agents to accomplish a lot more on your behalf.
(I work on Copilot coding agent)
CSMastermind
In my experience they write a lot of pointless tests that technically increase coverage while not actually adding much more value than a good type system/compiler would.
They also have a tendency to suppress errors instead of fixing them, especially when the right thing to do is throw an error on some edge case.
shepherdjerred
You can tell the AI not to suppress errors
throwaway12361
In my experience it works well even without good testing, at least for greenfield projects. It just works best if there are already tests when creating updates and patches.
abraham
Have it write tests for everything and then you've got a well tested codebase.
danielbln
Caveat emptor: I've seen some LLMs mock the living hell out of everything, to the point of not testing much of anything. Something to be aware of.
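A hypothetical before/after of that pattern (all names invented), to make it concrete: in the first test every collaborator is a MagicMock and the assertion can't really fail, while the second fakes only the side effect and pins down the actual behaviour.

    from unittest.mock import MagicMock

    # Hypothetical function under test: totals an order via a pricing service,
    # applies a discount, and persists the result through a repository.
    def checkout(items, pricing, repo, discount_pct=0):
        total = sum(pricing.price(i) for i in items) * (1 - discount_pct / 100)
        repo.save(total)
        return total

    # Over-mocked: both collaborators are mocks and the assertion only checks
    # that *something* came back, so the discount math is never exercised.
    def test_checkout_over_mocked():
        assert checkout(["x"], MagicMock(), MagicMock(), discount_pct=50) is not None

    # More useful: fake only the persistence side effect, assert the arithmetic.
    def test_checkout_applies_discount():
        pricing, repo = MagicMock(), MagicMock()
        pricing.price.return_value = 100
        assert checkout(["x"], pricing, repo, discount_pct=50) == 50.0
        repo.save.assert_called_once_with(50.0)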
yen223
I've seen too many human operators do that too. Definitely a problem to watch out for
eikenberry
You forgot the /s
boomskats
My buddy is at GH working on an adjacent project & he hasn't stopped talking about this for the last few days. I think I've been reminded to 'make sure I tune into the keynote on Monday' at least 8 times now.
I gave up trying to watch the stream after the third authentication timeout, but if I'd known it was this I'd maybe have tried a fourth time.
tmpz22
I’m always hesitant to listen to the line coders on projects because they’re getting a heavy dose of the internal hype every day.
I’d love for this to blow past cursor. Will definitely tune in to see it.
dontlikeyoueith
>I’m always hesitant to listen to the line coders on projects because they’re getting a heavy dose of the internal hype every day.
I'm senior enough that I get to frequently see the gap between what my dev team thinks of our work and what actual customers think.
As a result, I no longer care at all what developers (including myself on my own projects) think about the quality of the thing they've built.
sethammons
These do not need to be mutually exclusive. Define the quality of the software in terms of customer experience and give developers ownership to improve those markers. You can think service level objectives.
In many cases, this means pushing for more stable deployments which requires other quality improvements.
throwaway12361
Word of advice: just go to YouTube and skip the MS registration tax
unshavedyak
What specific keynote are they referring to? I'm curious, but thus far my searches have failed
babelfish
MS Build is today
quantadev
I love Copilot in VSCode. I have it set to use Claude most of the time, but it lets you pick your favorite LLM for it to use. I just open the files I'm going to refactor, type into the chat window what I want done, and click 'accept' on every code change it recommends in its answer, causing VSCode to auto-merge the changes into my code. Couldn't possibly be simpler. Then I scrutinize and test. If anything went wrong I just use GitLens to roll back the change, but that's very rare.
Especially now that Copilot supports MCP, I can plug in my own custom "Tools" (i.e. function calling done by the AI agent), and I have everything I need. I never even bothered trying Cursor or Windsurf, which I'm sure are great too, but mainly because they're just forks of VSCode as the IDE.
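For anyone wondering what "plug in my own custom Tools via MCP" looks like in practice, here's a minimal sketch using the official MCP Python SDK's FastMCP helper; the tool itself is invented, and the editor-side registration is configured separately, so treat this as an outline rather than a recipe.

    # pip install mcp  (the official Model Context Protocol Python SDK)
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("my-local-tools")

    @mcp.tool()
    def count_todos(path: str) -> int:
        """Count TODO markers in a source file."""
        with open(path, encoding="utf-8") as f:
            return sum(line.count("TODO") for line in f)

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default, which editor integrations expect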
SkyBelow
Have you tried the agent mode instead of the ask mode? With just a bit more prompting, it does a pretty good job of finding the files it needs to use on its own. Then again, I've only used it in smaller projects so larger ones might need more manual guidance.
quantadev
I assumed I was using 'Agent mode' but now that you mentioned it, I checked and you're right I've been in 'Ask mode' instead. oops. So thanks for the tip!
I'm looking forward to seeing how Agent Mode is better. Copilot has been such a great experience so far I haven't tried to keep up with every little new feature they add, and I've fallen behind.
SkyBelow
I find Agent mode much more powerful, as it can search your code base for further reference and even has access to other systems (I haven't seen exactly what the extent of that access is; I'm guessing it isn't full access to the web, but it can reach certain online info repositories). I do find it sometimes a little overeager to do instead of explain, so Ask mode is still useful when you want explanations. It also appears that Agent has the search capabilities while Ask does not, but that might be something recently added to both and I just don't recall it from Ask mode, since I'm used to when it wasn't present.
rcarmo
Try doing https://taoofmac.com/space/blog/2025/05/13/2230 - you'll have some fun.
quantadev
I've come to the same conclusions mentioned in most of that, and have done most of it already. I was an early adopter of LLM tech and have my own coding agent system written in Python. Soon I'll port those tools over to MCP so that I can just use VSCode for almost everything and never even need the Gradio chatbot I wrote to learn how to write and use tools.
My favorite tool that I've written is one that simply lets me refer to named blocks of code in a prompt, and the AI figures out how to use the tool to read each block. A named block is defined like:
    # block_begin MyBlock
    ...lines of code
    # block_end
So I can just embed those markers around the code rather than pasting it into prompts.
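The reading side of a tool like that is only a few lines of Python; here's a rough sketch, where the delimiter format follows the example above and everything else is a guess at how the original works:

    from pathlib import Path

    def read_named_block(path: str, name: str) -> str:
        """Return the lines between '# block_begin <name>' and '# block_end'."""
        captured, capturing = [], False
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            stripped = line.strip()
            if stripped == f"# block_begin {name}":
                capturing = True
            elif capturing and stripped == "# block_end":
                return "\n".join(captured)
            elif capturing:
                captured.append(line)
        raise ValueError(f"block {name!r} not found in {path}")

Exposed as a tool (over MCP or otherwise), the model can call read_named_block("app.py", "MyBlock") itself whenever a prompt mentions MyBlock, instead of the block being pasted in by hand.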
fvold
The biggest change Copilot has brought for me so far is to have me replace my VSCode with VSCodium, to be sure it doesn't sneak any uploading of my code to a third party without my knowing.
I'm all for new tech getting introduced and made useful, but let's make it all opt-in, shall we?
qwertox
Care to explain? Where are they uploading code to?
bluefirebrand
Whatever servers run Copilot for code suggestions
That isn't running locally