
Show HN: OpenNutrition – A free, public nutrition database

151 comments

· April 3, 2025

Hi HN!

Today I’m excited to launch OpenNutrition: a free, ODbL-licensed nutrition database of everyday generic, branded, and restaurant foods; a search engine that can browse the web to import new foods; and a companion app that bundles the database and search into a free macro tracking app.

Consistently logging the foods you eat has been shown to support long-term health outcomes (1)(2), but doing so easily depends on having a large, accurate, and up-to-date nutrition database. Free, public databases are often out-of-date, hard to navigate, and missing critical coverage (like branded restaurant foods). User-generated databases can be unreliable or closed-source. Commercial databases come with ongoing, often per-seat licensing costs, and usage restrictions that limit innovation.

As an amateur powerlifter and long-term weight loss maintainer, helping others pursue their health goals is something I care about deeply. After exiting my previous startup last year, I wanted to investigate using LLMs to create the database and infrastructure required for a great food logging app, one cost-engineered for free and accessible distribution, as I believe the availability of these tools is a public good. That led to the dataset I’m releasing today; nutritional data is public record, and its organization and dissemination should be, too.

What’s in the database?

- 5,287 common everyday foods, 3,836 prepared and generic restaurant foods, and 4,182 distinct menu items from ~50 popular US restaurant chains; foods have standardized naming, consistent numeric serving sizes, estimated micronutrient profiles, descriptions, and citations/groundings to USDA, AUSNUT, FRIDA, CNF, etc., when possible.

- 313,442 of the most popular US branded grocery products with standardized naming, parsed serving sizes, and additive/allergen data, grounded in branded USDA data; the most popular 1% have estimated micronutrient data, with the goal of full coverage.

Even the largest commercial databases can be frustrating to work with when searching for foods or customizations without existing coverage. To solve this, I created a real-time version of the same approach used to build the core database that can browse the web to learn about new foods or food customizations if needed (e.g., a highly customized Starbucks order). There is a limited demo on the web, and in-app you can log foods with text search, via barcode scan, or by image, all of which can search the web to import foods for you if needed. Foods discovered via these searches are fed back into the database, and I plan to publish updated versions as coverage expands.

- Search & Explore: https://www.opennutrition.app/search

- Methodology/About: https://www.opennutrition.app/about

- Get the iOS App: https://apps.apple.com/us/app/opennutrition-macro-tracker/id...

- Download the dataset: https://www.opennutrition.app/download

OpenNutrition’s iOS app offers free essential logging and a limited number of agentic searches, plus expenditure tracking and ongoing diet recommendations like best-in-class paid apps. A paid tier ($49/year) unlocks additional searches and features (data backup, prioritized micronutrient coverage for logged foods), and helps fund further development and broader library coverage.

I’d love to hear your feedback, questions, and suggestions—whether it’s about the database itself, a really great/bad search result, or the app.

1. Burke et al., 2011, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268700/

2. Patel et al., 2019, https://mhealth.jmir.org/2019/2/e12209/

Cheer2171

> Final nutritional data is generated by providing a reasoning model with a large corpus of grounding data. The LLM is tasked with creating complete nutritional values, explicitly explaining the rationale behind each value it generates. Outputs undergo rigorous validation steps, including cross-checking with advanced auditing models such as OpenAI’s o1-pro, which has proven especially proficient at performing high-quality random audits. In practice, o1-pro frequently provided clearer and more substantive insights than manual audits alone.

This is not a dataset. This is an insult to the very idea of data. This is the most anti-scientific post I have ever seen voted to the top of HN. Truth about the world is not derived from three LLMs stacked on top of each other in a trenchcoat.

justsid

I find this actually very upsetting. My wife does calorie counting and all of the apps for it are horrible, especially the market leaders. But those have one thing going for them: Databases of nutritional information, which can be used for easy meal calorie counting. Just enter the ingredients (usually you can scan a barcode) and how much you ate of the total and it tells you where you are standing on caloric and nutritional intake. But even those datasets aren’t always bang on, especially here in Canada where some products share bar codes with US products but they have different nutritional values. Reading the title, I was very excited about the ability to make my wife a better app to support her needs. Unfortunately this is not at all usable for this use case or really any? What’s the point of having data that you just can’t trust at all?

joshdickson

Why don't you think you can trust the dataset at all? There's a 100+ comment discussion here where we've really only found that perhaps the Australian database's "unsweetened oat milk" (which your wife could log in the app by barcode for an exact product match) doesn't match an expected result.

I use the dataset every day, and when I find something unexpected, it's generally been my own understanding of the food's contents and not a database inaccuracy, although those certainly do exist and I squash them whenever I see a report.

rendaw

By 100+ comment discussion I assume you mean this HN post in its whole? People here aren't checking the facts, so the fact that only one person found an issue doesn't mean much.

creativeCak3

I agree so much with you. This is not a dataset. This is the vomit of an LLM making stuff up. Like... why couldn't you just collect the data that already exists?? Why do you need an LLM?

Adding an LLM to this just adds an unnecessary layer of complexity, and for what benefit? For street cred?

joshdickson

There's an in-depth review of the reasoning for undertaking this project in general and this approach in particular in the Methodology/About section below, see "Current State of Nutritional Data".

Millions of people use food logging apps to drive behavioral change and help adhere to healthy lifestyles. I believe there is immense societal good in continuing to offer improved tools to accomplish this, especially for free, and that's why I created the project and chose to open source the data.

https://www.opennutrition.app/about#current-state-of-nutriti...

ZunarJ5

As soon as I saw "AI enhanced for Accuracy" I laughed and wondered if this was a belated April Fools joke.

tmpz22

Imagine how much more efficient government would be if we just generate all the data with LLMs.

pmichaud

[flagged]

rmah

It doesn't matter how accurate the models are, it's not a "data set" (in the scientific sense), it's more of a conclusion set. Maybe the conclusions are spot on. Maybe not. I have no idea.

Cheer2171

Right. At my most generous, this is a dataset about LLM behavior when asked to infer nutritional value. It is in no way a nutrition dataset. It is perhaps useful as half of a benchmark for accuracy, compared to actual ground truth. Unlike a scientist, you're not motivated or resourced enough to create the ground truth dataset. So you took a shortcut and hid it from the landing page.

This workflow, this motivation, this business model, this marketing is an affront to truth itself.

joshdickson

I envisioned many lines of inquiry from HN but the idea that a compressed TSV of nutritional data is not a "dataset" (definition: a collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer) was unexpected.
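For anyone who wants to work with the download programmatically, a tab-separated dump like the one described can be parsed with nothing beyond the standard library. This is a minimal sketch over made-up rows; the column names here are an assumption for illustration, not the published schema:

```python
import csv
import io

# Hypothetical sample rows standing in for the real TSV; the actual
# OpenNutrition file's columns may be named and ordered differently.
sample_tsv = (
    "name\tserving_size_g\tcalories_kcal\tprotein_g\n"
    "Oat milk, unsweetened\t240\t45\t1.0\n"
    "Egg, large\t50\t72\t6.3\n"
)

# Parse the tab-separated data into dicts, coercing numeric fields.
rows = []
for row in csv.DictReader(io.StringIO(sample_tsv), delimiter="\t"):
    for field in ("serving_size_g", "calories_kcal", "protein_g"):
        row[field] = float(row[field])
    rows.append(row)

# Rescale a food's nutrients to an arbitrary portion, e.g. per 100 g.
egg = rows[1]
per_100g_cal = egg["calories_kcal"] / egg["serving_size_g"] * 100
print(round(per_100g_cal))  # 144
```

The same per-gram rescaling is what a logging app does whenever a user enters a portion that differs from the reference serving size.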

thi2

Tried it with unsweetened oat milk and the info was off in nearly every column.

Not representative, because I don't have US food, but since it's AI-enhanced I can't compare my stuff with the stuff in the "dataset" and be sure that it's a US vs. Germany thing.

joshdickson

Would you mind posting/messaging me in some way (links in bio) what you expected it to show?

It looks like for unsweetened oat milk:

https://www.opennutrition.app/search/unsweetened-oat-milk-mt...

...it is leaning into a citation from the Australian Nutrient Database (e.g. Oat beverage, fluid, unfortified. Australian Nutrient Database. Public Food Key F006132. ), which is what I instructed it to do if it thought there was an exact match from a governmental database.

It's possible this is a poor general source for oat milk or that's not the beverage intended for the entry to stand for. I'll check it out, thank you for the report.

yamihere

>> User-generated databases can be unreliable

>> Foods discovered via these searches are fed back into the database,

Aren’t LLMs also unreliable? How do you ensure the new content is from an authoritative, accurate source? How do you ensure the numbers that make it into the database are actually what the source provided?

According to the Methodology/About page

>> The LLM is tasked with creating complete nutritional values, explicitly explaining the rationale behind each value it generates. Outputs undergo rigorous validation steps,

Those rigorous validation steps were also created with LLMs, correct?

>> whose core innovations leveraged AI but didn’t explicitly market themselves as “AI products.”

Odd choice for an entirely AI-based service. First thought I had after reading that was: must be because people don't trust AI-generated information. Seems disingenuous to minimize the AI aspect in marketing when this product only exists because of AI.

Great idea though, thanks for giving it a shot!

joshdickson

> Those rigorous validation steps were also created with LLMs, correct?

Not really. I do explain in the methodology post how good o1-pro is at the task, but there was a lot of manual effort involved in coming to that conclusion with my own effort to review the LLM's reasoning, and even still, o1-pro is not perfect.

yamihere

Nice! Thanks for responding.

>> Outputs undergo rigorous validation steps, including cross-checking with advanced auditing models such as OpenAI’s o1-pro, which has proven especially proficient at performing high-quality random audits.

>> there was a lot of manual effort involved in coming to that conclusion with my own effort to review the LLM's reasoning

So, the randomly audited entries seemed reasonable to you – not even the data itself, just the reasoning about the generated data. Did the manual reviews stop once things started looking good enough? Are the audits ongoing, to fill out the rest of the dataset? Would those be manually double-checked as well?

>> I became interested in exploring how recent advances in generative AI could enable entirely new kinds of consumer products—ones whose core innovations leveraged AI but didn’t explicitly market themselves as “AI products.”

Once again: Why not market this as an AI product? This is LLMs all the way down.

People are already interested in using this dataset. I was. Now, LLM-generated "usually close enough to not be actively harmful" data is being distributed as a source for any and all to use. I think your disclaimer is excellent. Does your license require an equivalent disclaimer from those using this data?

joshdickson

> not even the data itself, just the reasoning about the generated data

Poor phrasing on my end -- yes, absolutely the end data as well as the reasoning, as the reasoning tends to include the final answer.

Maybe I should! Appreciate the feedback.

rob

Not really sure how the author thinks anybody who tracks their calories/macros seriously is going to trust a website that literally just makes up values for the vitamins, minerals, etc:

> TL;DR: They are estimates from giving an LLM (generally o3 mini high due to cost, some o1 preview) a large corpus of grounding data to reason over and asking it to use its general world knowledge to return estimates it was confident in, which, when escalating to better LLMs like o1-pro and manual verification, proved to be good enough that I thought they warranted release.

yamihere

That’s the best part! People don’t care and won’t check! They’ll just pay money!

Most of the data being close enough to be better than nothing and not actively harmful + a disclaimer and the author is absolved of all responsibility!

Even better, this will now be used in all sorts of other apps, analyses, and for training other LLMs! And I expect all of those will also prominently include an "all of this was generated by an LLM" disclaimer. For sure.

XorNot

Also https://world.openfoodfacts.org/ exists, and has an app with everything you'd need. And is just crowd sourcing nutrition labels and barcodes.

joshdickson

OpenFoodFacts is a huge inspiration to this project, obviously. However, as someone with a normal diet, OFF lacks:

1. Generic, non-branded foods

2. Simple prepared foods that ease food entry

3. Restaurant foods

4. Micronutrients beyond those reported by the brand.

OFF is a fantastic project but OpenNutrition is really trying to fit a different niche. OFF does what it does very well; I would never be able to use it to track my food intake.

joshdickson

I have tracked my macro intake seriously for years and use the database every day, as do many folks who used the initial app releases. It's actually more valuable to me to have the data in this format, even estimated, because with other apps you get gaps in reporting on things like omega-3s, and you wonder, 'Am I not eating any omega-3s, or does the database entry for the food I ate just not include them?' In that case I'd much rather have an LLM that had access to as much relevant data as I could feed it reason through the approximate nutrient distribution and give me the best estimate it could.

Appreciate the feedback!

lm28469

The search is broken on Safari: every time it refreshes you lose focus on the text input, which means you have to click on the search bar after every single character you type. The filters are broken too: type "chocolate", choose the M&M's brand, and none of the labels return a result despite showing (xxx).

> I wanted to investigate the possibility of using LLMs

ah, yeah, I guess it makes sense then...

joshdickson

Ah, that is an embarrassing bug. Mobile Safari does not do that. Thank you for the report; looking into why that is now.

Edit: Should be patched in Desktop Safari now.

jonesy827

It's still erroring in Firefox on macOS and Windows. I see a CORS error on the XHR request

joshdickson

Should be back up now, I didn't scale up quickly enough for the traffic. My apologies and thank you for the report.

fastball

The search/filtering is broken in Chrome as well, seems to be a deeper issue than something browser-specific.

bhatfiel

LLM generated nutrition for accuracy.

The first item I manually looked up has about double the calories listed in the "dataset" versus reality: Honey Bunches of Oats, Honey Roasted.

joshdickson

Both products show a 1 cup (41g) and 160 calorie serving size to me?

OpenNutrition: https://www.opennutrition.app/search/honey-bunches-of-oats-h...

Via Manufacturer: https://www.honeybunchesofoats.com/product/honey-bunches-of-...

If you wouldn't mind DM'ing me the barcode you're looking at that would be helpful to understand what the nature of the discrepancy is.

throwway120385

Oof. That makes it completely useless for counting calories. It would be especially bad because the labeling for a lot of ready-made products is available from the manufacturer's website so it should be pretty easy to get it right.

johnisgood

Just at a quick glance...

How can a large egg (50 g) contain 147 g choline?

https://www.opennutrition.app/search/eggs-eeG7JQCQipwf

Additionally, on https://www.opennutrition.app/search/brown-lentils-VwKWF7CQq... it says:

> Unlike larger legumes, they require no pre-soaking and cook in 20-30 minutes, making them ideal for soups, stews, and salads

That is not necessarily true. In my experience, they do require pre-soaking; otherwise you will have to cook them for a long time, as opposed to red lentils (which are done in under 15 minutes, no pre-soaking needed), although red lentils taste more like yellow peas.

In any case, I think this could be really useful, once accurate enough. One could even implement other features on top, such as a calorie tracker and so forth, but that is a huge project on its own.

I wish you luck!

joshdickson

That is missing a milligram label, thank you for pointing that out. Fix uploading now.
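Unit slips like this (milligrams recorded as grams) lend themselves to automated plausibility checks before release: no single nutrient's mass can exceed the mass of the serving itself. A minimal sketch, with hypothetical field names rather than OpenNutrition's actual schema:

```python
def implausible_nutrients(food: dict) -> list[str]:
    """Return nutrients whose recorded mass exceeds the serving's total mass.

    Catches unit slips such as a milligram value logged as grams. Field
    names ("serving_size_g", "nutrients_g") are hypothetical.
    """
    serving_g = food["serving_size_g"]
    return [name for name, grams in food["nutrients_g"].items()
            if grams > serving_g]

egg = {
    "serving_size_g": 50.0,
    # 147 g of choline in a 50 g egg is the slip discussed above;
    # the plausible value is roughly 0.147 g (147 mg).
    "nutrients_g": {"protein": 6.3, "choline": 147.0},
}
print(implausible_nutrients(egg))  # ['choline']
```

A check like this would not validate the values themselves, only flag physically impossible ones for review.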

johnisgood

That is what I thought.

BTW when you hover over the ingredients, you just get back the name. Are you guys going to do something with it in the future? Right now there is a visual feedback (the cursor changes), but it is not useful yet. I am not entirely sure what I would have expected, perhaps a description of what it is, and upon clicking on it, it could have information gathered from various sources, like examine.com and what have you, but that would be a huge change on its own, the short description upon mouse hover-over should work for now and may not be a huge change.

joshdickson

The goal, without question, is 100% full coverage on citations for every piece of data that's in the database, even if the citation is an LLM's general reasoning (which for o1-pro is both quite good and often includes study citations).

Right now you'll see that aggregated on some items like this where the reported data is an ensemble of all of the linked resources: https://www.opennutrition.app/search/eggs-eeG7JQCQipwf

Frankly, I just couldn't justify the additional time and monetary expense of doing that before releasing this initial version, in case nobody cared or found it useful. This dataset was also compiled before tools like Claude Citations came out, which could make it easier. That is the nature of AI-driven data: I think this is useful now, and it is also the worst it will ever be.


ramon156

I think they meant mg. Eggs are 293 mg/100 g.


diggan

As this seems US-focused, I'll share an alternative that works really well with European products (and a lot of US ones too, apparently): https://yuka.io/en/

Really easy to use (just scan the barcode and you get easily digested data about the product), has every product imaginable, also analyzes cosmetics, and best of all, all the basic functionality is free.

Not affiliated, been using it for years at this point and now it's an essential partner when going shopping. That they let people decide their own premium pricing per year is just icing on the cake.

8mobile

I'm not going to discuss whether or not to use an LLM, I just want to thank you for opennutrition because it's very useful for checking the nutritional values of each food, especially for diabetics.

briandoll

Real and open nutritional datasources exist: https://support.cronometer.com/hc/en-us/articles/36001823947...

joshdickson

OpenNutrition uses many of the same open datasources included there, including USDA SR, CNF, AUSNUT, etc. The other datasources are licensed and not open, and I do not use those so that I can deliver a free app with a more generous set of features.

papa_bear

This is neat. I've spent a lot of time thinking about implementing something similar for my company Eat This Much, but end up pushing it off in favor of focusing on our core meal planning features.

When something doesn't have a reference listed, and just says "sourced from a publicly available first-party datasource", what does that mean? Crawled from other sources and you'd prefer not to say? The wording does feel a little sketchy when contrasted with entries that do list sources.

When something does list references that don't seem super close to the actual food, what is the process like there for interpreting those values? Example, this Chicken Salad inheriting from Chicken Spread: https://www.opennutrition.app/search/chicken-salad-37mAX17YX...

The quality of the data might feel rough now, but I can see this being valuable for our users even if it's just an opt-in "show estimated micronutrients" or something. Would require labeling values as not being directly from a source of truth.

One thing that a lot of people are missing is that there is already a lot of inaccurate nutrition data out there. Even on information directly from the manufacturer, sometimes there are errors, or just old versions of the product that never get scrubbed from the internet (I imagine the latter case would be tricky for an LLM to deal with too). Just logging your dietary intake in any form will get you 80% of the benefit of tracking via some self awareness of your intake. Of course, it's an easy argument to point out that if you had the choice between verified data and fuzzy LLM data, you should go for the human verified data (for now).

joshdickson

Thank you for your questions and feedback.

> When something doesn't have a reference listed, and just says "sourced from a publicly available first-party datasource", what does that mean?

It depends, and the degree to which it depends is why the citation is ambiguous (although it is true, if imprecise). My goal is to individually cite the individual nutrients but it was simply too costly and time-consuming at the stage of the project at which I did this work.

> what is the process like there for interpreting those values?

Because the degree to which something in the database might be related to those values is so varied, it depends. The reasoning agent had access to those database entries, which is helpful because they tend to contain micronutrient data. It also had access to web data, as well as its own world knowledge, and considers sources in that order. Ultimately it was left up to the agent to decide what the most reasonable fit for each food was, thinking through what an average user likely meant by that entry (e.g. a typical user probably assumes a 'Tomato' is raw), and then to choose the best sources from there. For the chicken salad, it used approximate micronutrient values from the listed references to inform its answer, but adapted the end values for how the dish is described in the description.

> if you had the choice between verified data and fuzzy LLM data, you should go for the human verified data (for now)

Human verification isn't free, and that means it is not available to a lot of people who can't or don't want to pay for something. But if that's something that someone values, I would certainly not diss the human effort!

papa_bear

Very cool, thanks for elaborating on the process. Good luck, I'll be keeping an eye on your progress!

monkburger

There’s an important caveat to keep in mind when it comes to food databases, especially those relying on branded or restaurant items:

U.S. law does not require food manufacturers to disclose everything that goes into their products. Under the Code of Federal Regulations (21 CFR § 101.100), there are exemptions to ingredient labeling. For example, flavorings, spices, and incidental additives (like processing aids or anti-caking agents) are not always listed explicitly. Also, proprietary blends and "natural flavors" can legally conceal dozens of chemicals (some synthetic), which consumers have no way of identifying.

Micronutrient data is often estimated or missing from labels and restaurant menus, which limits the accuracy of even the best-intentioned databases. Studies show that the nutritional information provided by restaurants and brands is frequently incomplete or inaccurate, especially when it comes to sodium, sugar, and actual serving sizes. (Urban et al. "The Energy Content of Restaurant Foods Without Stated Calorie Information" ; Labuza et al., 2008 and others)

IMO, food databases are only as accurate as the source data allows. Until food labeling laws mandate full disclosure and third-party verification, apps like this can support health awareness, but they shouldn't be treated as precise medical or dietary guidance, particularly for people with allergies, sensitivities, or chronic health conditions that require strict tracking.

adamas

What's the main difference between this and OpenFoodFacts, really?

masijo

Well, OpenFoodFacts are actual facts. This seems to rely on LLMs to do the job.

adamas

Oh, it's worse then.

hombre_fatal

The problem with OpenFoodFacts is that it just has nutrition label info for packaged goods.

So, very little nutrient info beyond calories and protein. No info about micronutrients. No info about minerals, vitamins, amino acids, fatty acids.

It's useless for nutrition tracking since if you're eating packaged food, then you already have that information yourself.

It doesn't answer basic questions like "I ate 100g of extra firm tofu, how did it move me towards my daily mineral/vitamin targets?"

sodality2

> So, very little nutrient info beyond calories and protein. No info about micronutrients. No info about minerals, vitamins, amino acids, fatty acids.

Many items do have these things.

https://world.openfoodfacts.org/product/5060495116377/huel-b...

hombre_fatal

That is one exception, and it's only because Huel reports that info since it's a fortified meal replacement product. The same way a multivitamin would have that info on its label.

But consider that OpenFoodFacts can't give you that info on just about anything else, especially not basic foods like "apples" or "tofu" or "chicken breast".

I'm not dumping on the project. It's really useful to have a database of packaged food labels. It's just not trying to solve this problem.

NewJazz

You can add micro nutrients to those foods, they just don't always have them. Or so I thought.

teolemon

Hi, Pierre, Open Food Facts NGO co-founder here. We have an open issue proposing approximation of micronutrients from reputable databases. Feel free to join the project and contribute your time/coding skills to help us solve this: https://github.com/openfoodfacts/openfoodfacts-server/issues...

octotep

Overall, very cool and seriously much needed! How does the micronutrient estimation work? Or is that part of the secret sauce?

I was looking at this page: https://www.opennutrition.app/search/original-shells-cheese-... and saw the amino acid, vitamin, and mineral sections; there are many things listed that aren't covered by the official nutritional data. These entries also have very precise numbers, but I'm not sure where or how they're derived, or whether I could put any serious weight on them. I'd love to hear more if you're willing to share!


joshdickson

TL;DR: They are estimates from giving an LLM (generally o3 mini high due to cost, some o1 preview) a large corpus of grounding data to reason over and asking it to use its general world knowledge to return estimates it was confident in, which, when escalating to better LLMs like o1-pro and manual verification, proved to be good enough that I thought they warranted release.

You can read about the background on how I did them in more detail in the about/methodology section: https://www.opennutrition.app/about (see "Technical Approach")

Xiol32

You need to add a disclaimer for this data. People could rely on them being accurate, and you simply can't prove they are.

joshdickson

There is a large disclaimer that states, among other things, "We strive to ensure accuracy and quality using authoritative sources and AI-based validation; however, we make no guarantees regarding completeness, accuracy, or timeliness. Always confirm nutritional data independently when accuracy is critical." on every page on the website where that kind of in-depth data is available.