Honda: 2 years of ML vs 1 month of prompting - here's what we learned
November 10, 2025 · pards
> Over multiple years, we built a supervised pipeline that worked. In 6 rounds of prompting, we matched it. That’s the headline, but it’s not the point. The real shift is that classification is no longer gated by data availability, annotation cycles, or pipeline engineering.
stego-tech
And this is where the strengths of LLMs really lie: making performant ML available to a wider audience, without requiring a PhD in Computer Science or Mathematics to build it. It’s consistently where I spend my time tinkering with these, albeit in a local-only environment.
If all the bullshit hype and marketing (“LLMs will replace all jobs!”) would just evaporate already, stuff like this would float to the top, and companies with large data sets would almost certainly be clamoring for drop-in analysis solutions built on prompt construction. They’d likely be far happier with the results, too, instead of fielding complaints from workers about AI being rammed down their throats at every turn.
Veliladon
^ This. I'm waiting for an LLM I can just point at a repo, have it slurp the whole thing up, and then ask questions about it.
nmfisher
    $ git clone repo && cd repo
    $ claude
Ask away. Best method I’ve found so far for this.
cpursley
GitHub Copilot somewhat does this.
etothet
This is exactly what Devin (https://devin.ai) is designed to do. Their DeepWiki feature is free. I’ve personally had decent success with it, but YMMV.
stogot
This was fun to read:
“Fun fact: Translating French and Spanish claims into German first improved technical accuracy—an unexpected perk of Germany’s automotive dominance.”
happimess
I wonder how they came up with that. Was it a human idea, or did the AI stumble upon it?
Given that it was inside a 9-step text preprocessing pipeline, it would be surprising if the AI had that much autonomy.
yahoozoo
I wonder if text embeddings and semantic similarity would be effective here?
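A minimal sketch of what that could look like: embed each incoming claim along with a few labeled examples, then assign the label of the nearest neighbor by cosine similarity. The sentence-transformers model and the toy data here are assumptions, not anything from the article.

    # One way the embeddings idea could look: nearest labeled example
    # by cosine similarity. Model choice and toy data are assumptions.
    from sentence_transformers import SentenceTransformer
    from sklearn.metrics.pairwise import cosine_similarity

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Hypothetical labeled examples (the article's real data is warranty claims).
    examples = ["engine stalls at idle", "customer requested new floor mats"]
    labels = ["defect", "not_defect"]

    example_vecs = model.encode(examples)
    query_vec = model.encode(["car hesitates and dies at stop lights"])

    # Assign the label of the most similar labeled example.
    sims = cosine_similarity(query_vec, example_vecs)[0]
    print(labels[sims.argmax()])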
davidsainez
> We tried multiple vectorization and classification approaches. Our data was heavily imbalanced and skewed towards negative cases. We found that TF-IDF with 1-gram features paired with XGBoost consistently emerged as the winner.
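For reference, a rough sketch of that baseline, assuming scikit-learn and xgboost; the toy claims and the scale_pos_weight handling of the class imbalance are illustrative guesses, not details from the article.

    # TF-IDF unigram features fed into XGBoost, per the quote above.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from xgboost import XGBClassifier

    texts = [
        "engine stalls at idle",
        "transmission slips when cold",
        "customer requested new floor mats",
        "paint scratch noted at delivery",
    ]
    labels = [1, 1, 0, 0]  # 1 = claim of interest, 0 = negative case

    vec = TfidfVectorizer(ngram_range=(1, 1))  # 1-gram features only
    X = vec.fit_transform(texts)

    # scale_pos_weight is one common lever for skewed classes; the
    # article doesn't say how the imbalance was actually handled.
    clf = XGBClassifier(scale_pos_weight=labels.count(0) / labels.count(1))
    clf.fit(X, labels)

    print(clf.predict(vec.transform(["engine hesitates and stalls"])))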
Crucially, this is: