OmniParser for Pure Vision Based GUI Agent

trq_

This is awesome, can't wait for evals against Claude Computer Use!

jauntywundrkind

I have a bit of a vice: I enjoy some "idle" games. I've been meaning to do some very basic manual screen carving, OCR, and computer vision to try to "read" my state in these games, and to build multi-actor "play" models for them, just for fun really, and to decrease time sunk gaming (by spending significant time coding/learning).

This certainly seems like it has a lot of promise to make that much, much, much easier. Game UIs are less uniform, so this might be harder or not easily applicable, but hopefully