
A Research Preview of Codex

234 comments

May 16, 2025

johnjwang

Some engineers on my team at Assembled and I have been a part of the alpha test of Codex, and I'll say it's been quite impressive.

We’ve long used local agents like Cursor and Claude Code, so we didn’t expect too much. But Codex shines in a few areas:

Parallel task execution: You can batch dozens of small edits (refactors, tests, boilerplate) and run them concurrently without context juggling. It's super nice to run a bunch of tasks at the same time (something that's really hard to do in Cursor, Cline, etc.)

It kind of feels like a junior engineer on steroids: you just need to point it at a file or function, specify the change, and it scaffolds out most of a PR. You still need to do a lot of work to get it production ready, but it's as if you have an infinite number of junior engineers at your disposal now, all working on different things.

Model quality is good, but it's hard to say it's that much better than other models. In side-by-side tests with Cursor + Gemini 2.5 Pro, naming, style and logic are relatively indistinguishable, so quality meets our bar but doesn't yet exceed it.

criddell

If you aren't hiring junior engineers to do these kinds of things, where do you think the senior engineers you need in the future will come from?

My kid recently graduated from a very good school with a degree in computer science, and what she's told me about the job market is scary. It seems that, relatively speaking, there are a lot of postings for senior engineers and very few for new grads.

My employer has hired recently and the flood of resumes after posting for a relatively low level position was nuts. There was just no hope of giving each candidate a fair chance and that really sucks.

My kid's classmates who did find work did it mostly through personal connections.

voidspark

This is exactly the problem. The top level executives are setting up to retire with billions in the bank, while the workers develop their own replacements before they retire with millions in the bank. Senior developers will be mostly obsolete too.

I have mentored junior developers and found it to be a rewarding part of the job. My colleagues mostly ignore juniors, provide no real guidance, couldn't care less. I see this attitude from others in the comments here, relieved they don't have to face that human interaction anymore. There are too many antisocial weirdos in this industry.

Without a strong moral and cultural foundation the AGI paradigm will be a dystopia. Humans obsolete across all industries.

oytis

> I have mentored junior developers and found it to be a rewarding part of the job.

Can totally relate. Unfortunately, the trend toward all-senior teams and companies started long before ChatGPT, so the opportunities have been quite scarce, at least in a professional environment.

criddell

> I have mentored junior developers and found it to be a rewarding part of the job.

That's really awesome. I hope my daughter finds a job somewhere that values professional development. I'd hate for her to quit the industry before she sees just how interesting and rewarding it can be.

I didn't have many mentors when starting out, but the ones I had were so unbelievably helpful both professionally and personally. If I didn't have their advice and encouragement, I don't think I'd still be doing what I'm doing.

geekraver

Same here. Mine is about to graduate with a CS master's from a great school, couldn't get any internships, and is now incredibly negative about ever being able to find work, which doesn't help. We're pretty much looking at minimum-wage jobs doing tech support for NGOs at this point (and the current wave of federal funding cuts for those kinds of orgs is certainly not going to help with that).

sam0x17

Hiring of juniors is basically dead these days and it has been like this for about 10 years and I hate it. I remember when I was a junior in 2014 there were actually startups who would hire cohorts of juniors (like 10 at a time, fresh out of CS degree sort of folks with almost no applied coding experience) and then train them up to senior for a few years, and then a small number will stay and the rest will go elsewhere and the company will hire their next batch of juniors. Now no one does this, everyone wants a senior no matter how simple the task. This has caused everyone in the industry to stuff their resume, so you end up in a situation where companies are looking for 10 years of experience in ecosystems that are only 5 years old.

That said, back in the early 00s there was much more of a culture where everyone was expected to be self-taught and doing real web dev probably before they even got to college, so by the time they graduated they were in reality quite senior. This was true for me and a lot of my friends, but I feel like these days there are many CS grads who haven't done a lot of applied stuff. To be fair, though, this was a way easier task in the early 00s: if you knew JS/HTML/CSS/SQL, C++ and maybe some .NET language, that was pretty much it, and you could do everything (there were virtually no frameworks). Now there are thousands of frameworks and languages and ecosystems, and you could spend 5+ years learning any one of them. It is no longer possible for one person to learn all of tech; people are much more specialized these days.

But I agree that eventually someone is going to have to start hiring juniors again or there will be no seniors.

dgb23

I recently read an article about the US having relatively weak occupational training.

To contrast, CH and GER are known to have very robust and regulated apprenticeship programs. Meaning you start working at a much earlier age (16) and go to vocational school at the same time for about 4 years. This path is then supported with all kinds of educational stepping stones later down the line.

There are many software developers who went that route in CH, for example: starting with an application development apprenticeship, then going to technical college in their mid-20s, and so on.

I think this model has a lot of advantages. University is for kids who like school and the academic approach to learning. An apprenticeship plus further education, or an autodidactic path, casts a much broader net, and you learn practical skills much earlier.

There are several advantages and disadvantages of both paths. In summary I think the academic path provides deeper CS knowledge which can be a force multiplier. The apprenticeship path leads to earlier high productivity and pragmatism.

My opinion is that having both as strongly supported paths creates more opportunities for people and strengthens the economy as a whole.

polskibus

I think the bigger problem, which started around 2022, is the much lower volume of jobs in software development. Projects were shut down, funding was retracted, and even the big wave of migrations to the cloud died down.

Today startups mostly wrap LLMs, as this is what VCs expect. Larger companies have smaller IT budgets than before (adjusted for inflation). This is the real problem causing the job shortage.

_bin_

This is a bit of a game theory problem. "Training senior engineers" is an expensive and thankless task: you bear essentially all the cost, and most of the total benefit accrues to others as a positive externality. Griping at companies that they should undertake to provide this positive externality isn't really a constructive solution.

I think some people are betting that AI can replace junior devs in 2-5 years and seniors in 10-20, when the old ones are largely gone. But that's sort of beside the point as far as most corporate decision-making goes.

dorian-graph

This hyper-fixation on replacing engineers at writing code is hilarious, and dangerous, to me. Many people, even in tech companies, have no idea how software is built, maintained, and run.

I think instead we should focus on getting rid of managers and product owners.

nopinsight

With Agentic RL training and sufficient data, AI operating at the level of average senior engineers should become plausible in a couple to a few years.

Top-tier engineers who integrate a deep understanding of business and user needs into technical design will likely be safe until we get full-fledged AGI.

hooverd

I think it'll be great if you're working in software but not for a software company.

al_borland

That sounds like a dangerous bet.

johnjwang

To be clear, we still hire engineers who are early in their careers (and we've found them to be some of the best folks on our team).

All the same principles apply as before: smart, driven, high ownership engineers make a huge difference to a company's success, and I find that the trend is even stronger now than before because of all the tools that these early career engineers have access to. Many of the folks we've hired have been able to spin up on our codebase much faster than in the past.

We're mainly helping them develop taste for what good code / good practices look like.

criddell

> we still hire engineers who are early in their careers

That's really great to hear.

Your experience that a new engineer equipped with modern tools is more effective and productive than in the past is important to highlight. It makes total sense.

startupsfail

More recent models are not without drive and are not stupid either.

There’s still quite a bit of a gap in terms of trust.

oytis

I guess the industry leaders think we'll not need senior engineers either as capabilities evolve.

But also, I think this significantly underestimates what junior engineers do. Junior engineers are people who have spent 4 to 6 years receiving a specialised education at a university - and they normally need to be good at school math already. All they lack is experience applying this education on the job - but they are professionals: educated, proactive and mostly smart.

The market is tough indeed, and as tough as it is for a senior engineer like myself, I don't envy the current cohort of fresh grads. Its toughness is only tangentially related to AI, though. The main factor is the general economic slowdown, with AI contributing by diverting already scarce investment away from non-AI companies and producing a lot of uncertainty about how many and what kind of employees companies will need in the future. AI's current capabilities are nowhere near having a real economic impact.

I wish your kid and you a lot of patience, grit and luck.

hintymad

> If you aren't hiring junior engineers to do these kinds of things, where do you think the senior engineers you need in the future will come from?

Unfortunately, this is not how companies think. I read something more than 20 years ago about outsourcing and manufacturing offshoring. The author basically asked the same question: if we move out the so-called low-end jobs, where do we think we will get the senior engineers? Yet companies continued offshoring, and the West lost talent and know-how while watching our competitor, you-know-who, become the world leader in ever more industries.

lurking_swe

ahh, the classic “i shall please my investors next quarter while ignoring reality, so i can disappoint my shareholders in 10 years”. lol.

As you say, happens all the time. Also doesn’t make sense because so few people are buying individual stocks anyway. Goal should be to consistently outperform over the long term. Wall street tends to be very myopic.

Thinking long term is a hard concept for the bean counters at these tech companies i guess…

echelon

It's happening to Hollywood right now. In the past three years, since roughly 2022, the majority of IATSE folks (film crew, grips, etc.) have seen their jobs disappear to Eastern Europe where the labor costs one tenth of what it does here. And there are no rules for maximum number of consecutive hours worked.

hintymad

It looks like we are in an interesting cycle: millions of engineers contribute to open source on GitHub. The best of our minds use that code to develop powerful models to replace exactly those engineers. In fact, the more code a group contributes to GitHub, the easier it is for companies to replace that group. Case in point: frontend engineers have been impacted most so far.

Does this mean people will be less incentivized to contribute to open source as time goes by?

P.S., I think the current trend is a wakeup call to us software engineers. We thought we were doing highly creative work, but in reality we spend a lot of time doing the basic job of knowledge workers: retrieving knowledge and interpolating some basic and highly predictable variations. Unfortunately, the current AI is really good at replacing this type of work.

My optimistic view is that in the long term we will invent or expand into more interesting work, but I'm not sure how long we will have to wait. The current generation of software engineers may suffer from high supply and low demand for our profession for years to come.

lispisok

As much as I support community-developed software and "free as in freedom", "Open Source" got completely perverted into tricking people into working for free while others reap huge financial benefits. Your comment is just one example of that.

For that reason, all my silly little side projects are now in private repos. I don't care that the chance somebody builds a business around them is slim to none. Don't think putting a license on them will protect you either: you'd have to know somebody is violating your license before you can even think about doing anything, and that's basically impossible if the code gets ripped into a private codebase and isn't obvious externally.

hintymad

> "Open Source" got completely perverted into tricking people to work for free for huge financial benefits for others

I'm quite conflicted on this assessment. On one hand, I wonder whether we would have a better job market if there weren't so many open-sourced systems. We might have had much slower growth, but that growth would have lasted many more years, meaning we might have enjoyed our profession until retirement and beyond. On the other hand, open source did create large markets, right? Like the "big data" market, the ML market, the distributed systems market, and so on. Like the millions of data scientists who could barely use Pandas and scipy, or the hundreds of thousands of ML engineers who couldn't even be bothered to know what a positive semi-definite matrix is.

Interesting times.

electrondood

> doing the basic job of knowledge workers

If you extrapolate and generalize further... what is at risk is any task that involves taking information input (text, audio, images, video, etc.), and applying it to create some information output or perform some action which is useful.

That's basically the definition of work. It's not just knowledge work, it's literally any work.

Daishiman

> P.S., I think the current trend is a wakeup call to us software engineers. We thought we were doing highly creative work, but in reality we spend a lot of time doing the basic job of knowledge workers: retrieving knowledge and interpolating some basic and highly predictable variations. Unfortunately, the current AI is really good at replacing this type of work.

Most of the working hours in most creative work involve this type of drudgery. Professional painters and designers spend most of their time replicating ideas that are already well fleshed out. Musicians spend most of their time rehearsing existing compositions.

There is a point to be made that these repetitive tasks are a prerequisite to come up with creative ideas.

rowanG077

I disagree. AI has shown itself to be most capable in what we consider creative jobs: music creation, voice acting, text/story writing, art creation, video creation and more.

woah

> Parallel task execution: You can batch dozens of small edits (refactors, tests, boilerplate) and run them concurrently without context juggling. It's super nice to run a bunch of tasks at the same time (something that's really hard to do in Cursor, Cline, etc.)

> It kind of feels like a junior engineer on steroids, you just need to point it at a file or function, specify the change, and it scaffolds out most of a PR. You still need to do a lot of work to get it production ready, but it's as if you have an infinite number of junior engineers at your disposal now all working on different things.

What's the benefit of this? It sounds like it's just a gimmick for the "AI will replace programmers" headlines. In reality, LLMs complete their tasks within seconds, and the time consuming part is specifying the tasks and then reviewing and correcting them. What is the point of parallelizing the fastest part of the process?

johnjwang

In my experience, it still does take quite a bit of time (minutes) to run a task on these agentic LLMs (especially with the latest reasoning models), and in Cursor / Cline / other code editor versions of AI, it's enough time for you to get distracted, lose context, and start working on another task.

So the benefit is really that during this "down" time, you can do multiple useful things in parallel. Previously our engineers were just waiting on the Cursor agent to finish; with parallelization, you explicitly turn your brain off one task and move on to a different one.

woah

In my experience in Cursor with Claude 3.5 and Gemini 2.5, if an agent has run for more than a minute, it has usually lost the plot. Maybe the model used in Codex is a new breed?

kfajdsl

A single response can take a few seconds, but agentic flows can involve dozens of back-and-forths. I've had a fairly complicated Roo Code task take 10 minutes (multiple subtasks).

ctoth

> Each task is processed independently in a separate, isolated environment preloaded with your codebase. Codex can read and edit files, as well as run commands including test harnesses, linters, and type checkers. Task completion typically takes between 1 and 30 minutes, depending on complexity, and you can monitor Codex’s progress in real time.

fourside

> You still need to do a lot of work to get it production ready, but it's as if you have an infinite number of junior engineers at your disposal now all working on different things.

One issue with junior devs is that, because they're not fully autonomous, you have to spend a non-trivial amount of time guiding them and reviewing their code. Even if I had easy access to a lot of them, that overhead would pretty quickly become the bottleneck.

Do you think that managing a lot of these virtual devs could get overwhelming, or are they pretty autonomous?

fabrice_d

They wrote "You still need to do a lot of work to get it production ready". So I would say it's not much better than real colleagues. Especially since junior devs will improve to the point where they don't need your hand-holding (remember, you were also a junior at some point), which is not proven to happen with AI tools.

bmcahren

Counter-point A: AI coding assistance tools are rapidly advancing, at a clip that is inarguably faster than humans improve.

Counter-point B: AI does not get tired, does not need space, does not need catering to its experience. AI is fine being interrupted and redirected. AI is fine spending two days on something that gets overwritten and thrown away (no morale loss).

rfoo

You don't need to be nice to your virtual junior devs. That saves quite a lot of time too.

As long as I spend less time reviewing and guiding than doing it myself, it's a win for me. I don't have any fun doing these things, and I'd rather yell at a bunch of "agents". For those who enjoy doing a bunch of small edits, I guess it's the opposite.

HappMacDonald

I'm definitely wary of the concept of dismissing courtesy when working with AI agents, because I certainly don't want to lose that habit when I turn around and have to interact with humans again.

Jimmc414

> We’ve long used local agents like Cursor and Claude Code, so we didn’t expect too much.

If you don't mind, what were the strengths and limitations of Claude Code compared to Codex? You mentioned parallel task execution being a standout feature for Codex - was this a particular pain point with Claude Code? Any other insights on how Claude Code performed for your team would be valuable. We are pleased with Claude Code at the moment and were a bit underwhelmed by the comparable Codex CLI tool OAI released earlier this month.

t_a_mm_acq

Once we realized CC can operate on the same code base and same file tree in different terminal instances, it's been a significant unlock for us. Most devs have 3 running concurrently: 1. master task list + checks for completion on tasks; 2. operating on the current task + documentation; 3. side quests, bugs, additional context.

Rinse and repeat: once a task is done, update #1 and cycle again. Add in another CC window if you need more tasks running concurrently.

The downside is cost, but if that's not an issue, it's great for getting stuff done across distributed teams. One possible way to wire this up is sketched below.
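A minimal sketch of that three-instance layout, assuming tmux and Claude Code's claude CLI; the pane roles are just the convention described above, nothing CC enforces, and ~/repo is a placeholder path:

    # three Claude Code instances on the same checkout, one pane per role
    tmux new-session -d -s cc -c ~/repo claude    # pane 1: master task list + completion checks
    tmux split-window -t cc -c ~/repo claude      # pane 2: current task + documentation
    tmux split-window -t cc -c ~/repo claude      # pane 3: side quests, bugs, extra context
    tmux select-layout -t cc even-horizontal
    tmux attach -t cc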

naiv

Do you then have instances 2 and 3 listening to instance 1 with just a prompt? Or how does this work?

runako

> Parallel task execution: You can batch dozens of small edits (refactors, tests, boilerplate) and run them concurrently without context juggling.

This is also part of a recent update to Zed. I typically use Zed with my own Claude API key.

ai-christianson

Is Zed managing the containerized dev environments, or creating multiple worktrees or anything like that? Or are they all sharing the same work tree?

runako

As far as I know, they are sharing a single work tree. So I suppose that could get messy by default.

That said, it might be possible to tell each agent to create a branch and do work there? I haven't tried that.

I haven't seen anything about Zed using containers, but you might be able to tell each agent to use whatever container tooling you have in place, since it can run commands if you give it permission. A lower-tech alternative is sketched below.
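For what it's worth, git worktrees are one low-tech way to get that isolation without containers. A sketch (not anything Zed does for you; the branch and path names here are made up):

    # one working tree per agent, all sharing the same repository history
    git worktree add -b agent/refactor ../repo-refactor   # agent 1 works here
    git worktree add -b agent/tests ../repo-tests         # agent 2 works here
    git worktree list                                     # show all checkouts
    # once an agent's branch passes review, merge it and clean up
    git merge agent/refactor
    git worktree remove ../repo-refactor

Each agent gets its own directory and branch, so concurrent edits can't clobber each other, while all commits still land in one repo.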

_bin_

I believe Cursor now supports parallel tasks, no? I haven't done much with it personally, but I have buddies who have.

If you want one idiot's perspective, please hyper-focus on model quality. The barrier right now is not tooling; it's the fact that models are not good enough for a large amount of work. More importantly, they're still closer to interns than junior devs: you must give them a ton of guidance, constant feedback, and a very stern eye for them to do even pretty simple tasks.

I'd like to see something with an o1-preview/pro level of quality that isn't insanely expensive, particularly since a lot of programming isn't about syntax (which most SotA models have down pat) but about understanding the underlying concepts, an area in which they remain weak.

At this point I really don't care if the tooling sucks. Just give me really, really good models that don't cost a kidney.

dakiol

This whole "LLMs == junior engineers" thing is so pedantic. Don't we realize that the same way senior engineers think LLMs can just replace junior engineers, high-level executives think LLMs will soon replace senior ones?

Junior engineers are not cattle. They are the future senior ones, they bring new insights into teams, new perspectives; diversity. I can tell you the times I have learnt so many valuable things from so-called junior engineers (and not only tech-wise things).

LLMs have their place, but ffs, stop with the "junior engineer replacement" shit.

obsolete_wagie

You need someone that's technical to look at the agent output, so senior engineers will be around. Junior engineers are certainly being replaced.

dakiol

Thanks, Sherlock. Now, tell me, when senior engineers start to retire, who will replace them? Ah, yeah, I can hear you say "LLMs!". And LLMs will rewrite themselves so we won't need seniors anymore writing code. And LLMs will write all the code companies need. So obvious, of course. We won't need a single senior because we won't have them, because they are not hired these days anymore. Perfect plan.

nadis

In the preview video, I appreciated Katy Shi's comment: "I think this is a reflection of where engineering work has moved over the past where a lot of my time now is spent reviewing code rather than writing it."

Preview video from Open AI: https://www.youtube.com/watch?v=hhdpnbfH6NU&t=878s

As I think about what "AI-native" development, or just the future of building software, looks like, it's interesting to me that, right now, developers are still just reading code and tests rather than looking at simulations.

While a new(ish) concept for software development, simulations could provide a wider range of outcomes and, especially for the front end, are far easier to evaluate than code/tests alone. I'm biased because this is something I've been exploring, but it really hit me over the head looking at the Codex launch materials.

ai-christianson

> rather than looking at simulations

You mean like automated test suites?

tough

automated visual fuzzy-testing with some self-reinforcement loops

There are already libraries for QA testing, and VLMs can give critique on a series of screenshots automated by a Playwright script per branch.
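As a rough sketch of that per-branch loop, assuming a branch preview served at $PREVIEW_URL and an OpenAI-style vision endpoint (the model name and prompt are placeholders, not a recommendation):

    # screenshot the branch's UI, then ask a VLM for a critique
    npx playwright screenshot --viewport-size=1280,800 "$PREVIEW_URL" shot.png
    IMG=$(base64 < shot.png | tr -d '\n')
    PAYLOAD='{"model":"gpt-4o","messages":[{"role":"user","content":[
      {"type":"text","text":"Critique this UI for layout bugs and visual regressions."},
      {"type":"image_url","image_url":{"url":"data:image/png;base64,'"$IMG"'"}}]}]}'
    curl -s https://api.openai.com/v1/chat/completions \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d "$PAYLOAD"

Run that once per branch in CI and you have the crude version; the self-reinforcement part would be feeding the critique back into the agent's next prompt.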

ai-christianson

Cool. Putting vision in the loop is a great idea.

Ambitious idea, but I like it.

ofirpress

[I'm one of the co-creators of SWE-bench] The team managed to improve on the already very strong o3 results on SWE-bench, but it's interesting that we're only seeing an improvement of a few percentage points. I wonder if getting from 75% to 85% on Verified will take as long as it took to get from 20% to 75%.

Snuggly73

I could be completely off base, but it feels to me like benchmaxxing is going on with SWE-bench.

Look at the results from multi swe bench - https://multi-swe-bench.github.io/#/

swe polybench - https://amazon-science.github.io/SWE-PolyBench/

Kotlin bench - https://firebender.com/leaderboard

mr_north_london

How long did it take to go from 20% to 75%?

ionwake

I'm sorry if I'm being silly, but I have paid for the Pro version ($200 a month), and every time I click on Try Codex, it takes me to a pricing page with the "Team Plan": https://chatgpt.com/codex#pricing

Is this still rolling out? I don't need the team plan, do I?

I have been using openAI products for years now and I am keen to try but I have no idea what I am doing wrong.

jdee

I'm the same, and it appeared for me 2 mins ago. Looks like it's still rolling out.

throwaway314155

They do this with every major release. Never going to understand why.

mr_north_london

It's still rolling out

ionwake

Thx for the reply. I'm in London too (atm).

bionhoward

What about privacy, training opt out?

What about using it for AI / developing models that compete with our new overlords?

Seems like using this is just asking to get rug pulled for competing with em when they release something that competes with your thing. Am I just an old who’s crowing about nothing? It’s ok for them to tell us we own outputs we can’t use to compete with em?

piskov

What the video: there is an explicit switch at one of the steps about (not) allowing to train on your repo.

lurking_swe

That's nice. And we trust that it does what it says because…? The AI company (OpenAI, Anthropic, etc.) pinky promised? Have we seen their source code? How do we know they don't train on it?

Facebook has been caught in recent DOJ hearings breaking the law with how they run their business, just as one example. They claimed under oath, previously, to not be doing X, and then years later there was proof they did exactly that.

https://youtu.be/7ZzxxLqWKOE?si=_FD2gikJkSH1V96r

A company's "word" means nothing, imo. None of this makes sense, if I'm being honest. Unless you personally have a negotiated contract with the provider, and can somehow be certain they are doing what they claim, and can later sue for damages, all of this is just crossing your fingers and hoping for the best.

tough

On the other hand you can enable explicit sharing of your data and get a few million free tokens daily

blixt

They mentioned "microVM" in the live stream. Notably, there's no browser or internet access. It makes sense: running specialized microVMs and unikernels (Firecracker, Unikraft, etc.) is way faster and cheaper, so you can scale it up. But there will be a big jump in technical difficulty from this to "agents with their own computers". ChatGPT Operator already has a browser, so they can definitely do this, but I imagine the demand is orders of magnitude different.

There must be room for a Modal/Cloudflare/etc. infrastructure company that focuses only on providing full-fledged computer environments specifically for AI, with forking/snapshotting (pause/resume), screen access, human-in-the-loop support, and so forth, and it would be very lucrative. We have browser-use, etc., but they don't (yet) capture the whole flow.


alvis

I used to work for a bank, and the legal team used to ping us to make tiny changes to the app for compliance-related issues. Now they can fix those themselves. I think they'd be very proud and happy.

ajkjk

Hopefully nobody lets legal touch anything without the ability to run the code to test it, plus code reviews. So probably not.

singularity2001

That will be an interesting new bug tracker: anyone in the company will be able to report any bug or add any feature request. If the model can solve it automatically, perfect; otherwise some human might take over. The interesting question then will be which code changes are legal and within the standards of what the company wants. So non-technical code/issue reviewer will become a super important and ubiquitous job.

SketchySeaBeast

Not just legal/within the standards, but which actually meet the unspoken requirements of the request. "We just need a new checkbox that asks if you're left-handed" might seem easy, but then it has ramifications for the Application PDF that gets generated, as well as any systems downstream, and maybe it requires a data conversion of some sort somewhere. I know that the POs I work with miss stuff or assume that the request will just have features by default.

asdev

I promise you the legal team is not pushing any code changes


SketchySeaBeast

Is this the same idea as when we switched to multicore machines? Has the rate of change in the capabilities of a single agent slowed enough that the only way for OpenAI to appear to be making decent progress is to have many?

yanis_t

So it looks like it only runs in the cloud; that is, it will push commits to my remote repo before I have a chance to see if they work?

When I'm using aider, after it makes a commit, I immediately run git reset HEAD^ and then git diff (actually I use the GitHub Desktop client to see the diff) to evaluate what exactly it did and whether I like it or not. Then I usually make some adjustments, and only after that do I commit and push.
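For anyone who wants that loop spelled out, these are just the commands described above:

    git reset HEAD^    # undo aider's auto-commit, keep the changes in the working tree
    git diff           # review what it actually did (or use a GUI client)
    # ...adjust by hand as needed, then commit for real...
    git add -A
    git commit -m 'reviewed change'
    git push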

danielbln

You may want to pass --no-auto-commits to Aider if you peel the commits off HEAD afterwards anyway.
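That is:

    aider --no-auto-commits

which leaves aider's edits uncommitted in your working tree for review, instead of committing after every change.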

flakiness

You can think of this as a managed (cloud) version of their codex command line tool, which runs locally on your laptop.

The secret sauce here seems to be their new model, but I expect it to come to the API at some point.

codemac

Watch the live stream: it shows you the diff as the completed task, and you decide whether or not to generate a GitHub PR once you've seen the diff.

asdev

Is the point of this to actually assign tasks to an AI to complete end to end? Every task I do with AI requires at least some bit of hand-holding, sometimes reprompting, etc. So I don't see why I would want to run tasks in parallel; I don't think it would increase throughput. Curious if others have better experiences with this.

simianwords

I wonder if tools like these are best for semi-structured refactors, like upgrading to Python 3, migrating to Postgres, etc.