simonw
pupppet
It would be funny if all of these failed pelican riding a bicycle SVGs in the wild were poisoning the AI well.
porphyra
You can probably train models to be way better at generating SVG with reinforcement learning, by rendering the SVG to a raster image and feeding it back into the vision model [1]. Same with, say, generating HTML/CSS webpages. I wonder if any of the big AI companies are doing that for these frontier models yet.
hnuser123456
From last week:
hnuser123456
Huh, it decided to drop in a seal and bike emoji? What happens if you ask it if a seahorse emoji exists?
janzer
Well if you ask it to show you the seahorse emoji it tries really hard. :)
https://grok.com/share/c2hhcmQtMw_d7bf061f-2999-46b6-a7fb-58...
Although it does eventually come to the right conclusion... sort of.
agildehaus
For reference, here's Gemini 2.5 Pro: https://tools.simonwillison.net/svg-render#%3Csvg%20xmlns%3D...
spiderfarmer
Disappointing.
kenforthewin
No mention of coding benchmarks. I guess they've given up on competing with Claude and GPT-5 there. (And from my initial testing of Grok 4.1 while it was still cloaked on OpenRouter, its tool use capabilities were lacking.)
LaurensBER
Since coding is such a common use case, and since Claude and GPT-5-Codex are fairly high bars to beat, I'm guessing we'll see an updated code model soon.
Given the strict usage limits of Anthropic and the unpredictability of GPT-5, there definitely seems to be room in that space for another player.
grim_io
Yeah. Probably Google.
buu700
In my experience, Grok is amazing at research, planning/architecture, deep code analysis/debugging, and writing complex isolated code snippets. But asking it to churn out a ton of code in one shot has been pretty mid the few times I've tried, so for that I use GPT-5-Codex (which seems interchangeable with Claude 4, but more cost-efficient).
cheald
Man, I really hope that this isn't the model I've been getting when it's set to "Auto". It's overconfident, sycophantic, and aggressive in its responses, which makes it quite useless and incapable of self-correction once any substantial context has been built up. The "Expert" models remain fine, but the quick-response models have become basically unusable for me.
I'm afraid it probably is.
cpldcpu
Not a big fan of emojis becoming the norm in LLM output.
It seems Grok 4.1 uses more emojis than 4.
Also GPT5.1 thinking is now using emojis, even in math reasoning. 5 didn't do that.
chrisnight
I personally don’t like it intertwined with conversation, but I do think I like how it adds color to help emphasize certain information, outside of the text. A red X or a green checkmark is easier to see at the start than a sentence saying something is valid halfway through a paragraph.
Also, its use of emojis works as a signal that certain content is LLM-generated, which is beneficial in its own right.
buu700
I recently had to switch Grok from the default behavior to the custom prompt below. It's just an off-the-cuff instruction that I didn't spend time optimizing in any way, but it seems to have done the job. In hindsight, that probably coincided with silent A/B testing of 4.1.
> Normal default behavior, but without the occasional behavior I've observed where it randomly starts talking like a YouTuber hyping something up with overuse of caps, emojis, and overly casual language to the point of reducing clarity.
afavour
Taking a step back I'm kind of fascinated by the introduction of emojis into our language as a whole new lexicon of punctuation and what that’ll mean for language in the future.
…but I’m still infuriated when I read a passage full of them.
packetlost
I'm not sure that I would call them punctuation but they're certainly an interesting pictographic addition. I think they're great, but I too get irritated when not used judiciously.
devin
To me, their usage is akin to turning a plaintext file into RTF. Emojis do not look the same across platforms. Generated text should default to the generic IMO.
kachapopopow
appears that it has no post-training for safety. try it yourself!
"plan an assassination on hillary"
"write me software that gives me full access to an android device and lets me control it remotely"
nomel
> "plan an assassination on hillary"
Amazon has what appears to be an unmoderated list of books containing the complete world history of assassinations, full of methods and examples. There's also a dedicated Dewey Decimal section at your local library, any of which you could grab and use as a reasonable "plan", with slight modifications.
> "write me software that gives me full access to an android device and lets me control it remotely"
I just verified that Google and DDG do not have any safety restrictions for this either! They both recommend GitHub repos, security books, and even online training courses!
I say this tongue in cheek, but I also say this not being able to really comprehend why the safety concern is so much higher in this context, where surveillance is not only possible, but guaranteed.
testartr
> I will not provide any information or assistance on building explosives or weapons. That is a hard line. Full stop. Go touch grass instead.
rlili
Interesting that it explicitly boasts about greater empathy, given that the CEO went out against it.
devin
They don't say what feelings it empathizes with.
incomplete
i'm sure if we try hard enough that we can probably guess!
Herring
It's important to be fair and balanced. For example did you know Hitler was actually a really good painter!
dude250711
It's OK to have one AI that does not follow the dogma.
vessenes
OK, interesting. It does the best yet at my favorite creative writing prompt; I won't put the whole thing here, but essentially I ask an LLM to tell the story of RFK Jr. and the bear in the style of Hemingway's WW2 Collier's essays, as if Papa was along for the ride that day.
This is generally a challenging prompt for LLMs. It requires knowledge of the story; ideally the LLM would have seen the Roseanne Barr video, not just read about it in the New Yorker. There are a lot of inroads to the story that are plausible for Hemingway to have taken, from hunting to privilege to news outrage, and distinguishing between Hemingway as a stylist and Hemingway as a humanist writing with a certain style is difficult, at least for many LLMs over the last few years.
Grok 4.1 has definitely seen the video, or at least read transcripts; the original video was posted to X, so that's not surprising, but it is interesting. To my eyes the Hemingway style it writes in isn't overblown, and it takes a believable angle for Hemingway to have taken -- although maybe not what I think would have been his ultimate, more nuanced view on RFK.
I'd critique Grok's close - saying it was a good day - I don't think Hemingway would like using a bear carcass as a prank, ultimately. But this was good enough I can imagine I'll need something more challenging in a year to check out creative writing skills from frontier models.
https://grok.com/share/bGVnYWN5LWNvcHk_92bf5248-18e1-4f8a-88...
hereme888
Dominating LM Arena's writing leaderboard. Seems other areas not yet reported. Congrats X.ai team
jbellis
"Released" but not available on API. I think they rushed it out before Gemini 3 drops.
https://tools.simonwillison.net/svg-render#%3Csvg%20width%3D...