Automated Capability Discovery via Foundation Model Self-Exploration
5 comments
February 12, 2025
kittikitti
While I appreciate arxiv.org, I think there should be more peer-reviewed work.
viraptor
Ideally we would see some peer review on arxiv itself. There are some... wrappers? of that kind of functionality on https://www.scienceopen.com/ and others, but it would be amazing to see those reviews closer to the source.
SubiculumCode
Personally, I've come to see the peer-review process as a big reinforcer of the publish-or-perish culture in academia. Merit review committees are encouraged to rely on the count (and impact-factor scores) of published peer-reviewed papers to measure impact, letting the peer-review publishing process mint tokens that signify a researcher's value. While this saves the committees time and gives them an excuse not to actually evaluate the content of the researcher's output, it imposes costs on researchers.
For good and careful scientists, the peer-review process rarely adds much value to the original submission, yet it requires a lot of tedious work and energy spent responding to minor concerns. That time and energy could go toward more research. Peer review adds the most value to bad manuscripts reporting bad research, where good reviewers coach the authors on how to do science better. That, too, takes up a lot of time.
If I had my way, I'd publish to an archive and move on once the research is to my satisfaction.
SubiculumCode
Is it the case that the authors of these ML papers frequently don't even try to get them published in a peer-reviewed venue?
The study focuses on evaluating GPT-4o, Claude 3.5, and Llama3-8B, but it might benefit from testing across more architectures (like Mixtral, DeepSeek, Gemini). That would help show that ACD generalizes.