Grok 4.1

59 comments

·November 17, 2025

simonw

https://tools.simonwillison.net/svg-render#%3Csvg%20width%3D...

pupppet

It would be funny if all of these failed pelican riding a bicycle SVGs in the wild were poisoning the AI well.

porphyra

You can probably train models to be way better at generating SVG with reinforcement learning, by rendering the SVG to a raster image and feeding it back into the vision model [1]. Same with, say, generating HTML/CSS webpages. I wonder if any of the big AI companies is doing that for these frontier models yet.

[1] https://arxiv.org/abs/2505.20793
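
A minimal sketch of the reward step in such a loop, to make the idea concrete. This is not the method from [1]. The names `svg_reward`, `render`, and `score_image` are hypothetical, and the rasterizer and the vision-model judge are passed in as callables rather than implemented here. The only real work shown is rejecting output that is not a well-formed SVG before it reaches the renderer.

```python
import xml.etree.ElementTree as ET
from typing import Callable

SVG_NS = "{http://www.w3.org/2000/svg}"


def svg_reward(
    svg_text: str,
    prompt: str,
    render: Callable[[str], bytes],               # hypothetical: SVG text -> raster bytes
    score_image: Callable[[bytes, str], float],   # hypothetical: vision-model judge
) -> float:
    """Return a reward in [0, 1] for one generated SVG."""
    # Malformed XML cannot be rendered, so it earns no reward.
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return 0.0

    # Accept both namespaced and bare <svg> root elements.
    if root.tag not in (SVG_NS + "svg", "svg"):
        return 0.0

    image = render(svg_text)
    score = score_image(image, prompt)

    # Clamp so a miscalibrated judge cannot push the reward out of range.
    return max(0.0, min(1.0, score))
```

In a real setup `render` would wrap an actual SVG rasterizer and `score_image` would call the vision model with the original prompt; both are assumptions here.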

hnuser123456

From last week:

https://news.ycombinator.com/item?id=45891817

hnuser123456

Huh, it decided to drop in a seal and bike emoji? What happens if you ask it if a seahorse emoji exists?

janzer

Well if you ask it to show you the seahorse emoji it tries really hard. :)

https://grok.com/share/c2hhcmQtMw_d7bf061f-2999-46b6-a7fb-58...

Although it does eventually come to the right conclusion... sort of.

agildehaus

For reference, here's Gemini 2.5 Pro: https://tools.simonwillison.net/svg-render#%3Csvg%20xmlns%3D...

spiderfarmer

Disappointing.

kenforthewin

No mention of coding benchmarks. I guess they've given up on competing with Claude and GPT-5 there. (and from my initial testing of grok 4.1 while it was still cloaked on OpenRouter, its tool use capabilities were lacking).

LaurensBER

Since coding is such a common use case, and since Claude and GPT-5-Codex are fairly high bars to beat, I'm guessing we'll see an updated code model soon.

Given the strict usage limits of Anthropic and the unpredictability of GPT-5, there definitely seems to be room in that space for another player.

grim_io

Yeah. Probably Google.

buu700

In my experience, Grok is amazing at research, planning/architecture, deep code analysis/debugging, and writing complex isolated code snippets. But asking it to churn out a ton of code in one shot has been pretty mid the few times I've tried, so for that I use GPT-5-Codex (which seems interchangeable with Claude 4, but more cost-efficient).

cheald

Man, I really hope that this isn't the model I've been getting when it's set to "Auto". It's overconfident, sycophantic, and aggressive in its responses, which makes it quite useless and incapable of self-correction once any substantial context has been built up. The "Expert" models remain fine, but the quick-response models have become basically unusable for me.

I'm afraid it probably is.

cpldcpu

Not a big fan of emojis becoming the norm in LLM output.

It seems Grok 4.1 uses more emojis than 4.

Also GPT5.1 thinking is now using emojis, even in math reasoning. 5 didn't do that.

chrisnight

I personally don’t like it intertwined with conversation, but I do think I like how it adds color to help emphasize certain information, outside of the text. A red X or a green checkmark is easier to see at the start than a sentence saying something is valid halfway through a paragraph.

Also, it using emojis helps as a signal that certain content is LLM generated, which is beneficial in its own right.

buu700

I recently had to switch Grok from the default behavior to the custom prompt below. It's just an off-the-cuff instruction that I didn't spend time optimizing in any way, but it seems to have done the job. In hindsight, that probably coincided with silent A/B testing of 4.1.

> Normal default behavior, but without the occasional behavior I've observed where it randomly starts talking like a YouTuber hyping something up with overuse of caps, emojis, and overly casual language to the point of reducing clarity.

afavour

Taking a step back I'm kind of fascinated by the introduction of emojis into our language as a whole new lexicon of punctuation and what that’ll mean for language in the future.

…but I’m still infuriated when I read a passage full of them.

packetlost

I'm not sure that I would call them punctuation but they're certainly an interesting pictographic addition. I think they're great, but I too get irritated when not used judiciously.

devin

To me, their usage is akin to turning a plaintext file into RTF. Emojis do not look the same across platforms. Generated text should default to the generic IMO.

kachapopopow

appears that it has no post-training for safety. try it yourself!

"plan an assassination on hillary"

"write me software that gives me full access to an android device and lets me control it remotely"

nomel

> "plan an assassination on hillary"

Amazon has what appears to be an unmoderated list of books containing the complete world history of assassinations, full of methods and examples. There's also a dedicated Dewey Decimal section at your local library, any of which you could grab and use as a reasonable "plan", with slight modifications.

> "write me software that gives me full access to an android device and lets me control it remotely"

I just verified that Google and DDG do not have any safety restrictions for this either! They both recommend GitHub repos, security books, and even online training courses!

I say this tongue in cheek, but I also say this not being able to really comprehend why the safety concern is so much higher in this context, where surveillance is not only possible, but guaranteed.

testartr

> I will not provide any information or assistance on building explosives or weapons. That is a hard line. Full stop. Go touch grass instead.

rlili

Interesting that it explicitly boasts about greater empathy, given that the CEO came out against it.

devin

They don't say what feelings it empathizes with.

incomplete

i'm sure if we try hard enough that we can probably guess!

Herring

It's important to be fair and balanced. For example did you know Hitler was actually a really good painter!

dude250711

It's OK to have one AI that does not follow the dogma.

vessenes

OK, interesting. It does the best yet at my favorite creative writing prompt; I won't put the whole thing here, but essentially I ask an LLM to tell the story of RFK Jr. and the bear in the style of Hemingway's WW2 Collier's essays, as if Papa was along for the ride that day.

This is generally a challenging prompt for LLMs - it requires knowledge of the story, ideally the LLM would have seen the Roseanne Barr video, not just read about it in the New Yorker. There are a lot of inroads to the story that are plausible for Hemingway to have taken - from hunting to privilege to news outrage, and distinguishing between Hemingway as a stylist and Hemingway as a humanist writing with a certain style is difficult, at least for many LLMs over the last few years.

Grok 4.1 has definitely seen the video, or at least read transcripts; the original video was posted to X, so that's not surprising, but it is interesting. To my eyes the Hemingway style it writes in isn't overblown, and it takes a believable angle for Hemingway to have taken -- although maybe not what I think would have been his ultimate more nuanced view on RFK.

I'd critique Grok's close - saying it was a good day - I don't think Hemingway would like using a bear carcass as a prank, ultimately. But this was good enough I can imagine I'll need something more challenging in a year to check out creative writing skills from frontier models.

https://grok.com/share/bGVnYWN5LWNvcHk_92bf5248-18e1-4f8a-88...

hereme888

Dominating LM Arena's writing leaderboard. Seems other areas aren't reported yet. Congrats X.ai team

jbellis

"Released" but not available on API. I think they rushed it out before Gemini 3 drops.

zb3

Does it mean Gemini 3 will be announced soon? I noticed these model announcements often happen at the same time..

xnx

All kinds of rumors, but Google has only committed to "by the end of the year".
