
Show HN: Beating Pokemon Red with RL and <10M Parameters

46 comments

March 5, 2025

Hi everyone!

After spending hundreds of hours, we're excited to finally share our progress in developing a reinforcement learning system to beat Pokémon Red. Our system successfully completes the game using a policy under 10M parameters, PPO, and a few novel techniques. With the release of Claude Plays Pokémon, now feels like the perfect time to showcase our work.

We'd love to get feedback!

levocardia

Really cool work. It seems like some critical areas (team rocket, safari zone) rely on encoding game knowledge into the reward function somehow, which "smuggles in" external intelligence about the game. A lot of these are related to planning, which makes me wonder whether you could "bolt on" an LLM to do things like steer the RL agent, dynamically choose what to reward, or even do some of the planning itself. Do you think there's any low-hanging fruit on this front?

Xelynega

For well-known games like "Pokemon Red" I wonder how much of that game knowledge would be "smuggled in" by an LLM in its training data if you just replaced the external info in the reward function with it/used it to make up for other deficiencies.

I think they allude to this in their conclusion, but it's less about the low-hanging fruit and more about designing a system to feed game dialogue back into the RL decision-making process in a way that can be mutated as part of the RL (be it an LLM or something else).

drubs

Wrote about this in the results section. I think there is a way to mix the two and simplify the rewards in the process. A lot of the magic behind getting the agent to teach and use Cut probably could have been handled by an LLM.

novia

Please stream the gameplay to Twitch so people can compare.

tehsauce

We have a shared community map where you can watch hundreds of agents from multiple people's training runs playing in real time!

https://pwhiddy.github.io/pokerl-map-viz/

bubblyworld

What an awesome project! I'm curious - I would have thought that rewarding unique coordinates would be enough to get the agent to (eventually) explore all areas, including the key ones. What did the agents end up doing before key areas got an extra reward?

(and how on earth did you port Pokémon red to a RL environment? O.o)

drubs

The environments wouldn't concentrate enough in the Rocket Hideout beneath Celadon Game Corner. The agent would have the player wander the world reward hacking. With wild battles enabled, the environments would end up in Lavender Tower fighting Gastly.

> (and how on earth did you port Pokémon red to a RL environment? O.o)

Read and find out :)
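The unique-coordinate rewarding discussed above can be sketched in a few lines. This is a hypothetical illustration of the general technique, not the project's actual reward code: keep a set of visited (map, x, y) tiles and pay a bonus only on first visit.

```python
# Hypothetical sketch of a unique-coordinate exploration reward
# (illustrative only; not the project's actual reward code).
class ExplorationReward:
    """Reward `bonus` the first time each (map_id, x, y) tile is visited."""

    def __init__(self, bonus: float = 1.0):
        self.bonus = bonus
        self.visited = set()

    def step(self, map_id: int, x: int, y: int) -> float:
        coord = (map_id, x, y)
        if coord in self.visited:
            return 0.0          # revisits earn nothing
        self.visited.add(coord)
        return self.bonus       # first visit earns the bonus


r = ExplorationReward()
assert r.step(0, 5, 5) == 1.0   # new tile: rewarded
assert r.step(0, 5, 5) == 0.0   # revisit: no reward
```

The reward-hacking failure described above follows directly from this shape: once nearby tiles are exhausted, wandering anywhere new pays just as well as entering the Rocket Hideout.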

wegfawefgawefg

you don't port it, you wrap it. you can put anything in an RL environment. usually emulators are wrapped with BizHawk and some Lua. worst case there's FFI or screen capture.
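The "wrap, don't port" idea amounts to putting the emulator behind a gym-style reset/step interface. A minimal sketch, with a stand-in `FakeEmulator` class so it runs on its own (a real wrapper would drive PyBoy or BizHawk instead; the RAM address shown is the commonly cited player X coordinate in Pokémon Red and should be double-checked):

```python
class FakeEmulator:
    """Stand-in for a real emulator such as PyBoy or BizHawk."""

    def __init__(self):
        self.frame = 0

    def press(self, button: str):
        self.frame += 1          # pretend each input advances one frame

    def read_memory(self, addr: int) -> int:
        return self.frame % 256  # placeholder for a real RAM read


class PokemonEnv:
    """Gym-style wrapper: reset() and step() over raw emulator inputs."""

    ACTIONS = ["up", "down", "left", "right", "a", "b", "start"]

    def __init__(self, emulator):
        self.emu = emulator

    def reset(self):
        self.emu = FakeEmulator()   # real code would reload a save state
        return self._observe()

    def step(self, action_idx: int):
        self.emu.press(self.ACTIONS[action_idx])
        obs = self._observe()
        reward = 0.0                # reward shaping plugs in here
        done = False                # e.g. True once the Elite Four falls
        return obs, reward, done

    def _observe(self):
        # 0xD362 is widely documented as the player's X coordinate in Red.
        return self.emu.read_memory(0xD362)


env = PokemonEnv(FakeEmulator())
obs = env.reset()
obs, reward, done = env.step(4)     # press "a"
```

From the agent's side this looks like any other RL environment; all the game-specific knowledge lives in `_observe` and the reward logic.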

drubs

My first version of this project 5 years ago involved a Python-Lua named pipe with BizHawk, actually. No clue where that code went.

benopal64

Incredible work. I am just learning about PyBoy from your project, and it made me think of many fun ways to use that library to play Pokemon autonomously.

endofreach

> Pokémon Red takes 25 hours on average for a new player to complete.

Seriously? I've never really played video games, but I remember spending so much time on Pokémon Red when I was young. Not sure if I ever finished it more than once, but I'm pretty sure I played for more than 50h before getting anywhere close to finishing. My memory might be tricking me, though.

Not sure which Pokémon version it was, but I got so hooked trying to get this "secret" Pokémon which was just a bunch of pixels. Some kind of bug (in the game, not the Pokémon type). You had to do specific things in a park and other things, and then surf up and down x times on the right shore of an island... or something like that. I had no idea how it worked and got so hooked; I must have spent most of my playing time on things like that.

Oh boy, memories...

ludicity

It definitely took me way more than 25 hours as a kid to beat Pokemon Blue! But I was so young that I didn't understand that "Oak: Hello!" meant that someone called Oak was talking.

The glitched Pokemon you're talking about is Missingno by the way! I remember surfing up and down Cinnabar Island to do the same thing.

jononor

Very nice! Nice to see demonstrations of reinforcement learning being used to solve non-trivial tasks.

differintegral

This is very cool, congrats!

I wonder, does anyone have a sense of the approximate raw number of button presses required to beat the game? Mostly curious to see how that compares to the parameter count.

tarentel

I imagine < 10,000. https://github.com/KeeyanGhoreshi/PokemonFireredSingleSequen... and https://www.youtube.com/watch?v=6gjsAA_5Agk. I believe this is something like 200k and is a slightly different game. Quite a bit less than 10M either way.

nimish

Considering how many things are less complicated than Pokemon, this is very cool

worble

Heads up, clicking "Next Page" just takes you to an empty screen; you have to use the navigation links on the left if you want to read past the first screen.

drubs

Thanks for the heads up. I just pushed a fix.

worble

I think you fixed the one below the puffer.ai image, but not the one above Authors.

drubs

and...fixed!

bee_rider

Ah, very neat.

Maybe some day the “rival” character in Pokemon can be played by a RL system, haha. That way you can have a “real player (simulated)” for your rival.

xinpw8

a cool idea, except that battling actually doesn't even matter to the AI. if you look at what the agent is doing during a battle, it is sort of spamming options and picking damaging attacks. it would be a stretch to say that agents were 'good' at battling...

wegfawefgawefg

if you've done the work to make the rival RL-based and able to move around, you'd probably have added basic battle controls

modeless

Can't Pokemon be beaten by almost random play?

tehsauce

It's impossible to beat with random actions or brute force, but you can get surprisingly far. It doesn't take too long to get halfway through Route 1, but even with insane compute you'll never make it even to Viridian Forest.

drdeca

Judging by the “pi plays Pokemon Sapphire”, uh, not in a reasonable amount of time? It’s been at it for over 3 years, hasn’t gotten a gym badge yet, mostly stays in the starting town.

VertanaNinjai

It can be brute forced if that’s what you mean. It has a fairly low difficulty curve and these old games have a grid system for movement and action selections. That’s why they’re pointing out the lower parameter amount and CPU. The point I took away is doing more with less.

xinpw8

It definitely cannot be beaten using random inputs. It doesn't even get out of Pallet Town after billions of steps. We tested...

fancyswimtime

the game has been beaten by fish

bloomingkales

The win condition of the game is the entire state of the game configured a certain way. So there are many winning states; you just have to do a search.

mclau156

Could you have used the decompilations of pokemon on github? https://github.com/pret/pokered

drubs

There's an entire section on how the decompilations were used :)

mclau156

Ok, sorry. I thought maybe there was a chance that the decomp project could be edited in a way that would create a ROM that allowed RL to be done more easily, but it seems like it just came in handy for looking up values, along with the GB ASM tutorial. The alternative in my thought process was re-creating Pokémon Red in a modern language, which you also mentioned.
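Looking up values via the decompilation typically means using its symbol files, which map labels to RAM addresses an RL wrapper can read for reward signals. A hypothetical sketch (the addresses below are the commonly cited pokered locations for these labels, but should be verified against the symbol files; the `read_byte` callback stands in for a real emulator memory read):

```python
# Labels from the pret/pokered decompilation mapped to RAM addresses.
# Addresses are illustrative and should be checked against the .sym files.
POKERED_SYMBOLS = {
    "wObtainedBadges": 0xD356,  # bitmask of the eight gym badges
    "wYCoord": 0xD361,          # player Y position on the current map
    "wXCoord": 0xD362,          # player X position on the current map
}


def count_badges(read_byte) -> int:
    """Count set bits in the badge bitmask via a read_byte(addr) callback."""
    value = read_byte(POKERED_SYMBOLS["wObtainedBadges"])
    return bin(value & 0xFF).count("1")


# With stubbed memory holding three badge bits set:
fake_ram = {0xD356: 0b00000111}
assert count_badges(fake_ram.get) == 3
```

A badge counter like this is a natural reward-function ingredient, since badge count is a monotone measure of game progress.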

xinpw8

if you helped with pret then god bless you