Has Google solved two of AI's oldest problems?
15 comments · November 11, 2025
snickerbockers
I'm skeptical that they're actually capable of making something novel. There are thousands of hobby operating systems and video game emulators on GitHub for it to train on, so it's not particularly surprising that it can copy somebody else's homework.
flatline
I believe they can create a novel instance of a system from a sufficient number of relevant references - i.e. implement a set of already-known features without (much) code duplication. LLMs are certainly capable of this level of generalization thanks to their huge corpus of non-relevant references. Whether they can expand beyond that into something truly novel from a feature/functionality standpoint is a whole other, and less well-defined, question.
I tend to agree that they are closed systems relative to their corpus. But then, aren't we? I feel like the aperture for true novelty to enter is vanishingly small, and cultures put a premium on it in the arts, technological innovation, etc. Almost every human endeavor is just copying and iterating on prior examples.
WhyOhWhyQ
"> Whatever it is, users have reported some truly wild things: it codes fully functioning Windows and Apple OS clones, 3D design software, Nintendo emulators, and productivity suites from single prompts."
Wow I'm doing it way wrong. How do I get the good stuff?
zer00eyz
You're not.
I want you to go into the kitchen and bake a cake, but replace all the flour with baking soda. If it comes out looking limp and lifeless, just dress it up with extra layers of frosting.
You can make something that looks like a cake but would not be good to eat.
The cake, sometimes, is a lie. And in this case, so, most likely, are these results... or they're the actual source code of some other project, just regurgitated.
hinkley
We got the results back. You are a horrible person. I’m serious, that’s what it says: “Horrible person.”
We weren’t even testing for that.
jvreeland
I'd love to find more info on this, but from what I can find it seems to be making webpages that look like those products, ones that can seemingly "run Python" or "emulate a game". But writing something that approximates an iPhone or an emulator in JavaScript/CSS/HTML, based on all of GitHub, is very, very, very different from writing an OS.
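To make the distinction concrete, here's a rough sketch (everything below is invented for illustration, not actual model output) of the kind of DOM trick behind a browser "OS clone" demo - a draggable div dressed up as a window, with no processes, drivers, or syscalls anywhere:

```typescript
// Roughly what a browser-based "OS clone" demo amounts to:
// a div styled to look like a window, not a kernel.
function makeWindow(title: string): HTMLElement {
  const win = document.createElement("div");
  win.style.cssText =
    "position:absolute;top:80px;left:80px;width:320px;" +
    "border:1px solid #999;background:#fff;font-family:sans-serif";
  const bar = document.createElement("div");
  bar.textContent = title;
  bar.style.cssText = "background:#245edb;color:#fff;padding:4px;cursor:move";
  win.appendChild(bar);

  // "Window management" is just updating the div's position on mousemove.
  bar.addEventListener("mousedown", (down: MouseEvent) => {
    const dx = down.clientX - win.offsetLeft;
    const dy = down.clientY - win.offsetTop;
    const move = (e: MouseEvent) => {
      win.style.left = `${e.clientX - dx}px`;
      win.style.top = `${e.clientY - dy}px`;
    };
    const up = () => {
      document.removeEventListener("mousemove", move);
      document.removeEventListener("mouseup", up);
    };
    document.addEventListener("mousemove", move);
    document.addEventListener("mouseup", up);
  });

  document.body.appendChild(win);
  return win;
}

makeWindow("My Computer"); // looks like Windows; schedules no processes
```

Generating a few hundred lines of this from one prompt looks spectacular in a screen recording, but it's pattern-matching on web UI code, not OS development.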
nestorD
Oh! That's a nice use-case and not too far from stuff I have been playing with! (happily I do not have to deal with handwriting, just bad scans of older newspapers and texts)
I can vouch for the fact that LLMs are great at searching in the original language, summarizing key points to let you know whether a document might be of interest, then providing you with a translation where you need one.
The fun part has been building tools to turn Claude Code and Codex CLI into capable research assistants for that type of project.
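The core of it is simple enough to sketch with the Anthropic TypeScript SDK - the model alias, prompt, and file path below are placeholders, and the real tooling adds retries, batching, and an index over the outputs:

```typescript
import fs from "node:fs";
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Send one scanned page to a vision-capable model and get back a
// transcription, a translation, and a relevance note in a single pass.
async function readPage(path: string): Promise<string> {
  const data = fs.readFileSync(path).toString("base64");
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5", // placeholder: any vision-capable model
    max_tokens: 2000,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: { type: "base64", media_type: "image/png", data },
          },
          {
            type: "text",
            text:
              "Transcribe this scanned page in its original language, " +
              "then give an English translation and a two-sentence note " +
              "on whether it is relevant to the research question at hand.",
          },
        ],
      },
    ],
  });
  // Concatenate the text blocks of the response.
  return msg.content.map((b) => (b.type === "text" ? b.text : "")).join("");
}

readPage("scans/gazette-1854-p3.png").then(console.log);
```

From there it's mostly plumbing: walk a directory of scans, cache the outputs, and let the agent grep the transcriptions.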
conception
I will note that the 2.5 Pro preview… March? was maybe the best model I've used yet. The actual release model was… less. I suspect Google found the preview too expensive and optimized it down, but it was interesting to see there was some hidden horsepower there. Google has always been poised to be the AI leader/winner - excited to see whether this is fluff, the real deal, or another preview that gets nerfed.
efitz
I haven’t seen this new google model but now must try it out.
I will say that other frontier models are starting to surprise me with their reasoning/understanding - I really have a hard time making (or believing) the argument that they are just predicting the next word.
I’ve been using Claude Code heavily since April; Sonnet 4.5 frequently surprises me.
Two days ago I told the AI to read all the documentation from my 5 projects related to a tool I’m building, and create a wiki, focused on audience and task.
I'm hand-reviewing the 50 wiki pages it created, but overall it did a great job.
I got frustrated about one issue: I have a GitHub issue to create a way to integrate with issue trackers (like Jira), but it's still TODO, and the AI featured issue tracker integration on the wiki home page as if we already had it. It created a page for it and everything; I figured it was hallucinating.
I went to edit the page and replace it with placeholder text and was shocked that the LLM had (unprompted) figured out how to use existing features to integrate with issue trackers, and wrote sample code for GitHub, Jira and Slack (notifications). That truly surprised me.
pavlov
I’ve seen those A/B choices on Google AI Studio recently, and there wasn’t a substantial difference between the outputs. It felt more like a different random seed for the same model.
Of course it’s very possible my use case wasn’t terribly interesting so it wouldn’t reveal model differences, or that it was a different A/B test.
bgwalter
No, just another academic with the ominous handle @generativehistory who is beguiled by "AI". It is strange that others can never reproduce such amazing feats.
pksebben
I don't know if I'd call it an 'amazing feat', but claude had me pause for a moment recently.
Some time ago, I'd been working on a framework that involved a series of servers (not the only one I've talked to Claude about) that had to pass messages around in a particular fashion. Mostly technical implementation details and occasional questions about architecture.
Fast forward a ways, and on a lark I decided to ask in the abstract about the best way to structure such an interaction. Note that this was not in the same chat or project and didn't include any identifying information about the original, save for the structure of the abstraction (in this case, a message bus server and some translation and processing services, all accessed via clients).
so:
- we were far enough removed that the whole conversation pertaining to the original was for sure not in the context window
- we only referred to the abstraction (with, like, an A=>B=>C=>B=>A kind of notation and a very brief question)
- most of the work on the original was in claude code
and it knew. In the answer it gave, it mentioned the project by name. I can think of only two ways this could have happened:
- they are doing some real fancy tricks to cram your entire corpus of chat history into the current context somehow
- the model has access to some kind of fact database where it was keeping an effective enough abstraction to make the connection
I find either one mindblowing for different reasons.
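The second one is at least easy to sketch. This is pure speculation about what such a memory layer could look like - the embedding is a toy stub and "Project Hermes" is a name I made up, not anything Anthropic has documented:

```typescript
// Toy sketch of the "fact database" hypothesis: embed summaries of past
// chats, then retrieve the nearest ones for a new query and inject them
// into the context. Invented for illustration only.

type Memory = { text: string; vector: number[] };

const store: Memory[] = []; // a persistent vector DB in a real system

// Stand-in for a real embedding model; a toy hash so the sketch runs.
async function embed(text: string): Promise<number[]> {
  const v = new Array(64).fill(0);
  for (let i = 0; i < text.length; i++) v[i % 64] += text.charCodeAt(i) / 255;
  return v;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

async function remember(summary: string): Promise<void> {
  store.push({ text: summary, vector: await embed(summary) });
}

// A question phrased purely in the abstract can still land near the old
// project in embedding space, which would explain the model naming it
// without the original conversation being in the context window.
async function recall(query: string, k = 3): Promise<string[]> {
  const qv = await embed(query);
  return store
    .map((m) => ({ m, score: cosine(qv, m.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.m.text);
}

await remember(
  "Project Hermes: message bus server plus translation and processing " +
  "services, accessed via clients"
);
console.log(await recall("best way to structure A=>B=>C=>B=>A message passing?"));
```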
netsharc
Author says "It is the most amazing thing I have seen an LLM do, and it was unprompted, entirely accidental." and then jumps back to the "beginning of the story", including talking about a trip to Canada.
Skip to the section headed "The Ultimate Test" for the resolution of the clickbait of "the most amazing thing...". (According to him, it correctly interpreted a line in an 18th century merchant ledger using maths and logic)
throwup238
I really hope they have because I've also been experimenting with LLMs to automate searching through old archival handwritten documents. I'm interested in the Conquistadors and their extensive accounts of their expeditions, but holy cow, reading 16th century handwritten Spanish and translating it at the same time is a nightmare, requiring a ton of expertise and inside field knowledge. It doesn't help that they were often written in the field by semi-literate people who misused lots of words. Even the simplest accounts require quite a lot of detective work to decipher, with subtle signals like that pound sign for the sugar loaf.
> Whatever it is, users have reported some truly wild things: it codes fully functioning Windows and Apple OS clones, 3D design software, Nintendo emulators, and productivity suites from single prompts.
This I’m a lot more skeptical of. The linked twitter post just looks like something it would replicate via HTML/CSS/JS. Whats the kernel look like?