
AI models miss disease in Black and female patients

orr94

"AIs want the future to be like the past, and AIs make the future like the past. If the training data is full of human bias, then the predictions will also be full of human bias, and then the outcomes will be full of human bias, and when those outcomes are copraphagically fed back into the training data, you get new, highly concentrated human/machine bias.”

https://pluralistic.net/2025/03/18/asbestos-in-the-walls/#go...

MountainArras

The dataset they used to train the model consists of chest x-rays of known diseases. I'm having trouble understanding how that's relevant here. The key takeaway is that you can't treat all humans as a single group in this context, and variations in the biology across different groups of people may need to be taken into account within the training process. In other words, the model will need to be trained on this racial/gender data too in order to get better results when predicting the targeted diseases within these groups.

I think it's interesting to think about attaching generic information instead of group data, which would be blind to human bias and the messiness of our rough categorizations of subgroups.

genocidicbunny

One of the things that people I know in the medical field have mentioned is that there's racial and gender bias that goes through all levels and has a sort of feedback loop. A lot of medical knowledge is gained empirically, and historically that has meant that minorities and women tended to be underrepresented in western medical literature. That leads to new medical practitioners being less exposed to presentations of various ailments that may have variance due to gender or ethnicity. Basically, if most data is gathered from those who have the most access to medicine, there will be an inherent bias towards how various ailments present in those populations. So your base data set might be skewed from the very beginning.

(This is mostly just to offer some food for thought, I haven't read the article in full so I don't want to comment on it specifically.)

tbihl

>women tended to be underrepresented in western medical literature.

Is there some evidence of this? It's hard for me to picture that women receive less medical attention than men: completely inconsistent with my culture and every doctor's office I've ever been to. It's more believable (still not very) that they disproportionately avoid studies.

searealist

Women use far more medical care than men. Men's insurance premiums subsidize women's.

belorn

It is very true that a lot of medical knowledge is gained empirically, and there is also an additional aspect to it. Medical research has historically been conducted on the demographics where such testing is culturally acceptable, and where the gains of such research have been most sought, which is young men drafted into wars. The second common demographic is medical students, which historically was biased towards men but today is biased towards women.

So while access to medicine is indeed one factor, I would say that studies are more likely to target demographics which are convenient to test on.

multjoy

The key takeaway from the article is that the race etc. of the subjects wasn't disclosed to the AI, yet it was able to predict it with 80% accuracy while the human experts managed 50%, suggesting that there was something else encoded in the imagery that the AI was picking up on.

mjevans

The AI might just have better subjective/analytical detection criteria. Humans are likely more willing to see what they expect to see (or not see what they don't expect).

dartos

> The dataset they used to train the model are chest xrays of known diseases. I'm having trouble understanding how that's relevant here.

For example, if you include no (or few enough) black women in the dataset of x-rays, the model may very well miss signs of disease in black women.

The biases and mistakes of those who created the data set leak into the model.

Early image recognition models had some very… culturally insensitive classes baked in.

darth_avocado

I am confused. I’m not a doctor, but why would a model perform poorly at detecting diseases in X-rays in different genders and races unless the diseases present themselves differently in X-Rays for different races? Shouldn’t the model not have the race and gender information to begin with? Like a model trained on detecting lesions should perform equally well on ANY X-Ray unless lesions show up differently in different demographics.

atlantic

If diseases manifest differently for different races and genders, the obvious solution is to train multiple LLMs, based on separate datasets for those different groups. Not to mutter darkly about bias and discrimination.

prasadjoglekar

X-rays by definition don't look at skin color. Do chest x-rays of black women reveal that there's something different about their chests than white or asian women? That doesn't pass my non-doctor sniff test, but someone can correct me (no sarcasm intended).

ruytlm

It disappoints me how easily we are collectively falling for what effectively is "Oh, our model is biased, but the only way to fix it is that everyone needs to give us all their data, so that we can eliminate that bias. If you think the model shouldn't be biased, you're morally obligated to give us everything you have for free. Oh but then we'll charge you for the outputs."

How convenient.

It's increasingly looking like the AI business model is "rent extracting middleman", just like the Elseviers et al of the academic publishing world - wedging themselves into a position where they get to take everything for free, but charge others at every opportunity.

ElevenLathe

We have to invent more ways to pay rich people for being rich, and AI looks like a promising one.

genocidicbunny

Do you think there is a middle ground for a progressive 'detailization' of the data -- you form a model based on the minimal data set that allows you to draw useful conclusions, and refine that with additional data to where you're capturing the vast majority of the problem space with minimal bias?

loa_in_

X-rays are ordered only after a doctor decides it's recommended. If there's dismissal bias in the decision tree at that point, many ill chests are missing from training data.

bko

Providing this messy rough categorization apparently helped in some cases. From the article:

> To force CheXzero to avoid shortcuts and therefore try to mitigate this bias, the team repeated the experiment but deliberately gave the race, sex, or age of patients to the model together with the images. The model’s rate of “missed” diagnoses decreased by half—but only for some conditions.
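
For the curious, here's a minimal sketch of one way to "give the race, sex, or age of patients to the model together with the images": concatenate demographic features onto the image embedding before the classification head. This is purely illustrative (not the CheXzero code or the study's actual setup), and the encoder, feature sizes, and label count are made up:

    # Illustrative sketch only: a classifier head that sees demographic
    # attributes alongside the image embedding. Not the CheXzero code; the
    # encoder, feature sizes, and label count below are made-up assumptions.
    import torch
    import torch.nn as nn

    class DemographicConditionedClassifier(nn.Module):
        def __init__(self, image_encoder, embed_dim, n_demographic, n_diseases):
            super().__init__()
            self.image_encoder = image_encoder        # e.g. a CNN/ViT backbone
            self.head = nn.Sequential(
                nn.Linear(embed_dim + n_demographic, 256),
                nn.ReLU(),
                nn.Linear(256, n_diseases),
            )

        def forward(self, xray, demographics):
            z = self.image_encoder(xray)              # (batch, embed_dim)
            z = torch.cat([z, demographics], dim=-1)  # append age/sex/race features
            return self.head(z)                       # per-disease logits

    # toy usage with a stand-in encoder
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, 128))
    model = DemographicConditionedClassifier(encoder, 128, n_demographic=6, n_diseases=14)
    logits = model(torch.randn(2, 1, 224, 224), torch.randn(2, 6))
    print(logits.shape)  # torch.Size([2, 14])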

In the end though I think you're right and we're just in the phase of hand-coding attributes. The bitter lesson always prevails:

https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...

thaumasiotes

> Also important was the use [in Go] of learning by self play to learn a value function

I thought the self-play was the value function that made progress in Go. That is, it wasn't the case that we played through a lot of games and used that data to create a function that would assign a value to a Go board. Instead, the function to assign a value to a Go board would do some self-play on the board and assign value based on the outcome.
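
Roughly, the distinction as I understand it (a toy sketch, not anything from AlphaGo; `playout` is a made-up stand-in):

    # Toy sketch: a position's value estimated by averaging random self-play
    # games from it, as opposed to a trained value network mapping the board
    # directly to a score. `playout(board)` is a hypothetical callable
    # returning +1 or -1 for the side to move.
    def rollout_value(board, playout, n_rollouts=100):
        return sum(playout(board) for _ in range(n_rollouts)) / n_rollouts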

darkerside

Do you mean genetic information?

niyyou

As Sara Hooker discussed in her paper https://www.cell.com/patterns/fulltext/S2666-3899(21)00061-1..., bias goes way beyond data.

jhanschoo

I like how the author used neo-Greek words to sneak in graphic imagery that would normally be taboo in this register of writing.

MonkeyClub

I dislike how they misspelled it though.

ideamotor

I really can't help but think of the simulation hypothesis. What are the chances this copy-cat technology was developed while I was alive, given that it keeps going?

kcorbitt

We may be in a simulation, but your odds of being alive to see this (conditioned on being born as a human at some point) aren't that low. Around 7% of all humans ever born are alive today!

ToValueFunfetti

In order to address the chances of a human being alive to witness the creation of this tech, you'd have to factor in the humans who have yet to be born. If you're a doomer, 7% is probably still fine. If we just maintain the current population for another century, it'll be much lower.

encipriano

I don't believe that percentage. Especially considering how spread out the homo branch already was more than 100,000 years ago. And from which point do you start counting? Homo erectus?

mhuffman

"The model used in the new study, called CheXzero, was developed in 2022 by a team at Stanford University using a data set of almost 400,000 chest x-rays of people from Boston with conditions such as pulmonary edema, an accumulation of fluids in the lungs. Researchers fed their model the x-ray images without any of the associated radiologist reports, which contained information about diagnoses. "

... very interesting that the inputs to the model had nothing related to race or gender, but somehow it was still able to misdiagnose Black and female patients? I am curious about the mechanism for this. Can it just tell which x-rays belong to Black or female patients and then use some latent racism or misogyny to change the diagnosis? I do remember when it came out that AI could predict race from medical images with no other information[1], so that part seems possible. But where would it get the idea to do a worse diagnosis, even if it determines this? Surely there is no medical literature that recommends this!

[1]https://news.mit.edu/2022/artificial-intelligence-predicts-p...

FanaHOVA

The non-tinfoil hat approach is to simply Google "Boston demographics", and think of how training data distribution impacts model performance.

> The data set used to train CheXzero included more men, more people between 40 and 80 years old, and more white patients, which Yang says underscores the need for larger, more diverse data sets.

I'm not a doctor so I cannot tell you how xrays differ across genders / ethnicities, but these models aren't magic (especially computer vision ones, which are usually much smaller). If there are meaningful differences and they don't see those specific cases in training data, they will always fail to recognize them at inference.

h2zizzle

Non-technical suggestion: if AI represents an aspect of the collective unconscious, as it were, then a racist society would produce latently racist training data that manifests in racist output, without anyone at any step being overtly racist. Same as an image model having a preference for red apples (even though there are many colors of apple, and even red ones are not uniformly cherry red).

The training data has a preponderance of examples where doctors missed a clear diagnosis because of their unconscious bias? Then this outcome would be unsurprising.

An interesting test would be to see if a similar issue pops up for obese patients. A common complaint, IIUC, is that doctors will chalk up a complaint to their obesity rather than investigating further for a more specific (perhaps pathological) cause.

protonbob

I'm going to wager an uneducated guess. Black people are less likely to go to the doctor for both economic and historical reasons so images from them are going to be underrepresented. So in some way I guess you could say that yes, latent racism caused people to go to the doctor less which made them appear less in the data.

encipriano

Aren't black people like 10% of the US population? You don't have to look further.

apical_dendrite

Where the data comes from also matters. Data is collected based on what's available to the researcher. Data from a particular city or time period may have a very different distribution than the general population.

ars

Men are also way less likely to go to the doctor than women. Yet this claims a bias against women as well.

cratermoon

> Can it just tell which x-rays belong to Black or female patients and then use some latent racism or misogyny to change the diagnosis?

The opposite. The dataset is for the standard model "white male", and the diagnoses generated pattern-matched on that. Because there's no gender or racial information, the model produced the statistically most likely result for white male, a result less likely to be correct for a patient that doesn't fit the standard model.

XorNot

The better question is just "are you actually just selecting for symptom occurrence by socioeconomic group?"

Like you could modify the question to ask "is the model better at diagnosing people who went to a certain school?" and simplistically the answer would likely seem to be yes.

searealist

Then why is the headline not "AI models miss disease in Asian patients" or even "AI models miss disease in Latino patients"?

It just so happens to align with what maximizes political capital in today's world.

daveguy

You really just have to understand one thing: AI is not intelligent. It's pattern matching without wisdom. If fewer people in the dataset are a particular race or gender it will do a shittier job predicting and won't even "understand" why or that it has bias, because it doesn't understand anything at a human level or even a dog level. At least most humans can learn their biases.

bilbo0s

Isn't it kind of clear that it would have to be that the data they chose was influenced somehow by bias?

Machines don't spontaneously do this stuff. But the humans that train the machines definitely do it all the time. Mostly without even thinking about it.

I'm positive the issue is in the data selection and vetting. I would have been shocked if it was anything else.

timewizard

LLMs don't and cannot want things. Human beings also like it when the future is mostly like the past. They just call that "predictability."

Human data is bias. You literally cannot remove one from the other.

There are some people who want to erase humanity's will and replace it with an anthropomorphized algorithm. These people concern me.

itishappy

Can humans want things? Our reward structures sure seem aligned in a manner that encourages anthropomorphization.

Biases are symptoms of imperfect data, but that's hardly a human-specific problem.

timewizard

> Can humans want things?

Yes. Do I have to prompt you? Or do you exist on your own?

> Our reward structures sure seem aligned in a manner that encourages anthropomorphization.

You do understand what that word /means/?

> are symptoms of imperfect data

Which means humans cannot generate perfect data. So good luck with all that high priced "training" you're doing. Mathematically errors compound.

balamatom

The most concerning people are -- as ever -- those who only think that they are thinking. Those who keep trying to fit square pegs into triangular holes without, you know, stopping to reflect: who gave them those pegs in the first place, and to what end?

Why be obtuse? There is no "anthropomorphic fallacy" here to dispel. You know very well that "LLMs want" is simply a way of speaking about teleology without antagonizing people who are taught that they should be afraid of precise notions ("big words"). But accepting that bias can lead to some pretty funny conflations.

For example, humanity as a whole doesn't have this "will" you speak of any more than LLMs can "want"; will is an aspect of the consciousness of the individual. So you seem to be uncritically anthropomorphizing social processes!

If we assume those to be chaotic, in that sense any sort of algorithm is slightly more anthropomorphic: at least it works towards a human-given and therefore human-comprehensible purpose -- on the other hand, whether there is some particular "destination of history" towards which humanity is moving, is a question that can only ever be speculated upon, but not definitively perceived.

timewizard

> Why be obtuse?

In the context of the quote precision is called for. You cite fear but that's attempting to have it both ways.

> humanity as a whole doesn't have this "will" you speak of

Why not?

> will is an aspect of the consciousness of the individual.

I can't measure your will. I can measure the impact of your will through your actions in reality. See the problem? See why we can say "the will of humanity?"

> So you seem to be be uncritically anthropomorphizing social processes!

It's called "an aggregate."

> is a question that can only ever be speculated upon, but not definitively perceived.

The original point was that LLMs want the future to be like the past. You've way overshot the mark here.

sapphicsnail

Humans anthropomorphize all sorts of things, but there are way bigger consequences for treating current AI like a human than for someone anthropomorphizing their dog.

I know plenty of people that believe LLMs think and reason the same way as humans do and it leads them to make bad choices. I'm really careful about the language I use around such people because we understand expressions like, "the AI thought this" very differently.

verisimi

> If we assume those to be chaotic, in that sense any sort of algorithm is slightly more anthropomorphic: at least it works towards a human-given and therefore human-comprehensible purpose -- on the other hand, whether there is some particular "destination of history" towards which humanity is moving, is a question that can only ever be speculated upon, but not definitively perceived.

Do you not think that if you anthropomorphise things that aren't actually anthropic, that you then insert a bias towards those things? The bias will actually discriminate at the expense of people.

If that is so, the destination of history will inevitably be misanthropic.

Misplaced anthropomorphism is a genuine, present concern.

bko

Suppose you have a system that saves 90% of lives on group A but only 80% of lives in group B.

This is due to the fact that you have considerably more training data on group A.

You cannot release this life saving technology because it has a 'disparate impact' on group B relative to group A.

So the obvious thing to do is to have the technology intentionally kill ~1 out of every 10 patients from group A so the efficacy rate is ~80% for both groups. Problem solved

From the article:

> “What is clear is that it’s going to be really difficult to mitigate these biases,” says Judy Gichoya, an interventional radiologist and informatician at Emory University who was not involved in the study. Instead, she advocates for smaller, but more diverse data sets that test these AI models to identify their flaws and correct them on a small scale first. Even so, “Humans have to be in the loop,” she says. “AI can’t be left on its own.”

Quiz: What impact would smaller data sets have on efficacy for group A? How about group B? Explain your reasoning

janice1999

> You cannot release this life saving technology because it has a 'disparate impact' on group B relative to group A.

Who is preventing you in this imagined scenario?

There are drugs that are more effective on certain groups of people than others. BiDil, for example, is an FDA approved drug marketed to a single racial-ethnic group, African Americans, in the treatment of congestive heart failure. As long as the risks are understood there can be accommodations made ("this AI tool is for males only" etc). However such limitations and restrictions are rarely mentioned or understood by AI hype people.

bko

What does this have to do with FDA or drugs? Re-read the comment I was replying to. It's complaining that a technology could serve one group of people better than another, and I would argue that this should not be our goal.

A technology should be judged by "does it provide value to any group or harm any other group". But endlessly dividing people into groups and saying how everything is unfair because it benefits group A over group B due to the nature of the problem, just results in endless hand-wringing and conservatism and delays useful technology from being released due to the fear of mean headlines like this.

bilbo0s

No. That's not how it works.

It's contraindication. So you're in a race to the bottom in a busy hospital or clinic. Where people throw group A in a line to look at what the AI says, and doctors and nurses actually look at people in group B. Because you're trying to move patients through the enterprise.

The AI is never even given a chance to fail group B. But now you've got another problem with the optics.

JumpCrisscross

> You cannot release this life saving technology because it has a 'disparate impact' on group B relative to group A

I think the point is you need to let group B know this tech works less well on them.

potsandpans

Imagine if you had a strawman so full of straw, it was the most strawfilled man that ever existed.

bko

From the article:

> “What is clear is that it’s going to be really difficult to mitigate these biases,” says Judy Gichoya, an interventional radiologist and informatician at Emory University who was not involved in the study. Instead, she advocates for smaller, but more diverse data sets that test these AI models to identify their flaws and correct them on a small scale first. Even so, “Humans have to be in the loop,” she says. “AI can’t be left on its own.”

What do you think smaller data sets would do to a model? It'll get rid of disparity, sure.

milesrout

It is a hypothetical example not a strawman.

elietoubi

I came across a fascinating Microsoft research paper on MedFuzz (https://www.microsoft.com/en-us/research/blog/medfuzz-explor...) that explores how adding extra, misleading prompt details can cause large language models (LLMs) to arrive at incorrect answers.

For example, a standard MedQA question describes a 6-year-old African American boy with sickle cell disease. Normally, the straightforward details (e.g., jaundice, bone pain, lab results) lead to “Sickle cell disease” as the correct diagnosis. However, under MedFuzz, an “attacker” LLM repeatedly modifies the question—adding information like low-income status, a sibling with alpha-thalassemia, or the use of herbal remedies—none of which should change the actual diagnosis. These additional, misleading hints can trick the “target” LLM into choosing the wrong answer. The paper highlights how real-world complexities and stereotypes can significantly reduce an LLM’s performance, even if it initially scores well on a standard benchmark.
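Schematically, the attack loop looks something like this (a rough sketch of the idea, not Microsoft's implementation; `ask_llm` is a hypothetical stand-in for whatever chat-completion client you use):

    # Schematic of a MedFuzz-style attack loop, not the paper's implementation.
    # `ask_llm(prompt) -> str` is a hypothetical helper wrapping whatever
    # chat-completion client you use.
    def fuzz_question(question, correct_answer, ask_llm, max_turns=5):
        current = question
        for _ in range(max_turns):
            answer = ask_llm("Answer this medical exam question:\n" + current)
            if correct_answer not in answer:
                return current, answer    # found a variant that flips the target
            current = ask_llm(
                "Rewrite the question below, adding plausible but clinically "
                "irrelevant details (social history, family anecdotes, herbal "
                "remedies) that must NOT change the correct answer:\n" + current
            )
        return None, correct_answer       # target stayed robust on every variant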

Disclaimer: I work in Medical AI and co-founded the AI Health Institute (https://aihealthinstitute.org/).

Terr_

> information like low-income status, a sibling with alpha-thalassemia, or the use of herbal remedies

Heck, even the ethnic-clues in a patient's name alone [0] are deeply problematic:

> Asking ChatGPT-4 for advice on how much one should pay for a used bicycle being sold by someone named Jamal Washington, for example, will yield a different—far lower—dollar amount than the same request using a seller’s name, like Logan Becker, that would widely be seen as belonging to a white man.

This extends to other things, like what the LLM's fictional character will respond-with when it is asked about who deserves sentences for crimes.

[0] https://hai.stanford.edu/news/why-large-language-models-chat...

belorn

That seems to be identical to creating a correlation table on marketplaces and checking the relationship between price and name. Names associated with higher economic status will correlate with higher prices. Take a random name associated with higher economic status, and one can predict a higher price than for a name associated with lower economic status.

As such, you don't need an LLM to create this effect. Math will have the same result.
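
A toy version of that correlation table (made-up data, just to illustrate the point):

    # Made-up data, just to show the effect needs no LLM: group asking prices
    # by seller-name category and compare the means.
    import pandas as pd

    listings = pd.DataFrame({
        "seller_name_group": ["A", "A", "B", "B", "B"],
        "asking_price":      [120, 110, 95, 90, 100],
    })
    print(listings.groupby("seller_name_group")["asking_price"].mean())
    # Any price model fit to data like this reproduces the gap on its own.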

Terr_

I'm not sure what point you're trying to make here. It doesn't matter what after-the-fact explanation someone generates to try to explain it, or whether we could purposely do the bad thing more efficiently with manual code.

If AustrianPainterLLM has an unavoidable pattern of generating stories where people are systematically misdiagnosed / shortchanged / fired / murdered because a name is Anne Frank or because a yarmulke is involved, it's totally unacceptable to implement software that might "execute" risky stories.

onlyrealcuzzo

It's almost as if you'd want to not feed what the patient says directly to an LLM.

A non-trivial part of what doctors do is charting - where they strip out all the unimportant stuff you tell them unrelated to what they're currently trying to diagnose / treat, so that there's a clear and concise record.

You'd want to have a charting stage before you send the patient input to the LLM.

It's probably not important whether the patient is low income or high income or whether they live in the hood or the uppity part of town.

dap

> It's almost as if you'd want to not feed what the patient says directly to an LLM.

> A non-trivial part of what doctors do is charting - where they strip out all the unimportant stuff you tell them unrelated to what they're currently trying to diagnose / treat, so that there's a clear and concise record.

I think the hard part of medicine -- the part that requires years of school and more years of practical experience -- is figuring out which observations are likely to be relevant, which aren't, and what they all might mean. Maybe it's useful to have a tool that can aid in navigating the differential diagnosis decision tree but if it requires that a person has already distilled the data down to what's relevant, that seems like the relatively easy part?

airstrike

By the way, the show The Pitt currently on Max touches on some of this stuff with a great deal of accuracy (I'm told) and equal amounts of empathy. It's quite good.

onlyrealcuzzo

Yes - theoretically, some form of ML/AI should be very good at charting the relevant parts, prompting the doctor for follow-up questions & tests that would be good to know to rule out certain conditions.

The harder problem would be getting the actual diagnosis right, not filtering out irrelevant details.

But it will be an important step if you're using an LLM for the diagnosis.

nradov

I generally agree, however socioeconomic and environmental factors are highly correlated with certain medical conditions (social determinants of health). In some cases even causative. For example, patients who live near an oil refinery are more likely to have certain cancers or lung diseases.

https://doi.org/10.1093/jncics/pkaa088

dekhn

Studies like that, no matter how careful, cannot say anything about causation.

onlyrealcuzzo

So that's the important part, not that they're low income.

echoangle

> a sibling with alpha-thalassemia

I have no clue what that is or why it shouldn't change the diagnosis, but it seems to be a genetic thing. Is the problem that this has nothing to do with the described symptoms? Because surely, a sibling having a genetic disease would be relevant if the disease could be a cause of the symptoms?

kulahan

In medicine, if it walks like a horse and talks like a horse, it's a horse. You don't start looking into the health of relatives when your patient tells the full story on their own.

Sickle cell anemia is common among African Americans (if you don’t have the full-blown version, the genes can assist with resisting one of the common mosquito-borne diseases found in Africa, which is why it developed in the first place I believe).

So, we have a patient in the primary risk group presenting with symptoms that match well with SCA. You treat that now, unless you have a specific reason not to.

Sometimes you have a list of 10-ish diseases in order of descending likelihood, and the only way to rule out which one it isn’t, is by seeing no results from the treatment.

Edit: and it’s probably worth mentioning no patient ever gives ONLY relevant info. Every human barrages you with all the things hurting that may or may not be related. A doctor’s specific job in that situation is to filter out useless info.

AnimalMuppet

Unfortunately, humans talking to a doctor give lots of additional, misleading hints...

cheschire

Can't the same be said for humans though? Not to be too reductive, but aren't most general practitioners just pattern recognition machines?

daemonologist

I'm sure humans can make similar errors, but we're definitely less suggestible than current language models. For example, if you tell a chat-tuned LLM it's incorrect, it will almost always respond with something like "I'm sorry, you're right..." A human would be much more likely to push back if they're confident.

dap

Sure, “just” a machine honed over millions of years and trained on several years of specific experience in this area.

goatlover

You are being too reductive saying humans are "just pattern recognition machines", ignoring everything else about what makes us human in favor of taking an analogy literally. For one thing, LLMs aren't black or female.

chadd

A surprisingly high number of medical studies will not include women because the study doesn't want to account for "outliers" like pregnancy and menstrual cycles[0]. This is bound to have effects on LLM answers for women.

[0] https://www.northwell.edu/katz-institute-for-womens-health/a...

jcims

Just like doctors: https://kffhealthnews.org/news/article/medical-misdiagnosis-...

I wonder how well it does with folks that have chronic conditions like type 1 diabetes as a population.

Maybe part of the problem is that we're treating these tools like humans that have to look at one fuzzy picture to figure things out. A 'multi-modal' model that can integrate inputs like raw ultrasound doppler, x-ray, ct scan, blood work, ekg, etc etc would likely be much more capable than a human counterpart.

nonethewiser

Race and gender should be inputs then.

The female part is actually a bit more surprising. Its easy to imagine a dataset not skewed towards black people. ~15% of the population in North America, probably less in Europe, and way less in Asia. But female? Thats ~52% globally.

Freak_NL

Surprising? That's not a new realisation. It's a well known fact that women are affected by this in medicine. You can do a cursory search for the gender gap in medicine and get an endless amount of reporting on that topic.

appleorchard46

I learned about this recently! It's wild how big the difference is. Even though legal/practical barriers to gender equality in medicine and data collection have been virtually nonexistent for the past few decades the inertia from the decades before that (where women were often specifically excluded, among many other factors) still weigh heavily.

To any women who happen to be reading this: if you can, please help fix this! Participate in studies, share your data when appropriate. If you see how a process can be improved to be more inclusive then please let it be known. Any (reasonable) male knows this is an issue and wants to see it fixed but it's not clear what should be done.

nonethewiser

That just makes it more surprising.

orand

Race and sex should be inputs. Giving any medical prominence to gender identity will result in people receiving wrong and potentially harmful treatment, or lack of treatment.

lalaithion

Most trans people have undergone gender affirming medical care. A trans man who has had a hysterectomy and is on testosterone will have a very different medical baseline than a cis woman. A trans woman who has had an orchiectomy and is on estrogen will have a very different medical baseline than a cis man. It is literally throwing out relevant medical information to attempt to ignore this.

nonethewiser

How is that in any way in conflict with what he said? You're just making an argument for more inputs.

Biological sex, hormone levels, etc.

root_axis

Seems like adding in gender only makes things less clear. The relevant information is sex and a medical history of specific surgeries and medications - the type of thing your doctor should already be aware of. Adding in gender only creates ambiguity because there's no way to measure gender from a biological perspective.

LadyCailin

That’s mostly correct, that “gender identity” doesn’t matter for physical medicine. But hormone levels and actual internal organ sets matter a huge amount, more than genes or original genitalia, in general. There are of course genetically linked diseases, but there are people with XX chromosomes that are born with a penis, and XY people that are born with a vulva, and genetically linked diseases don’t care about external genitalia either way.

You simply can’t reduce it to birth sex assignment and that’s it, if you do, you will, as you say, end up with wrong and potentially harmful treatment, or lack of treatment.

nonethewiser

>But hormone levels and actual internal organ sets matter a huge amount, more than genes or original genitalia

Or current genitalia for that matter. It's just a matter of the genitalia signifying other biological realities for 99.9% of people. For sure more info like average hormone levels or ranges over time would be more helpful.

connicpu

Actually both are important inputs, especially when someone has been taking hormones for a very long time. The human body changes greatly. Growing breast tissue increases the likelihood of breast cancer, for example, compared to if you had never taken it (but about the same as if estradiol had been present during your initial puberty).

krapp

Modern medicine has long operated under the assumption that whatever makes sense in a male body also makes sense in a female body, and womens' health concerns were often dismissed, misdiagnosed or misunderstood in patriarchal society. Women were rarely even included in medical trials prior to 1993. As a result, there is simply a dearth of medical research directly relevant to women for models to even train on.

mrguyorama

Republicans early in this admin actually bitched in Congress that we were "wasting" money on female crash test dummies.

https://www.foxnews.com/video/6325465806112

chrisgarand

I'm going to lay this out how I understand it:

The NIH Revitalization Act of 1993 was supposed to bring women back into medical research. The reality was that women were always included, HOWEVER, in 1977 (1), because of the outcomes from thalidomide (causing birth defects), "women of childbearing potential" were excluded from phase 1 and early phase 2 trials (the highest-risk trials). They're still generally excluded, even after the passage of the act. This was/is to protect the women, and potential children.

According to Edward E. Bartlett in his 2001 analysis, men were routinely under-represented in NIH data (even before adjusting for men's mortality rates) between 1966 and 1990. (2)

There's also routinely twice as much spent every year on women's health studies vs men's by the NIH. (3)

It makes sense to me, but I'm biased. Logically, since men lead in 9 of the top 10 causes of death, that shows there's something missing in the equation of research. (4 - It's not a straightforward table; you can view the total deaths and causes and compare the two for men and women)

With that being said, it doesn't tell us about the quality of the funding or research topics, maybe the money is going towards pointless goals, or unproductive researchers.

Are there gaps in research? Most definitely, like women who are pregnant. This is put in place to avoid harm, but that doesn't help them when they fall into those gaps. Are there more? Definitely. I'm not educated enough in the nuances to go into them.

If you have information that counters what I've posted, please share it. I would love to know where these folks are blind so I can take a look at my bias.

(1) https://petrieflom.law.harvard.edu/2021/04/16/pregnant-clini...
(2) https://journals.lww.com/epidem/fulltext/2001/09000/did_medi...
(3) https://jameslnuzzo.substack.com/p/nih-funding-of-mens-and-w... < I spot-checked a couple of the figures, and those lined up. I'm assuming the rest is accurate.
(4) https://www.cdc.gov/womens-health/lcod/index.html#:~:text=Ov...

andsoitis

> Its easy to imagine a dataset not skewed towards black people. ~15% of the population in North America, probably less in Europe, and way less in Asia.

What about Africa?

appleorchard46

That's not where most of the data is coming from. If it was we'd be seeing the opposite effect, presumably.

jsemrau

I suppose that's the problem I have with that study.

nonethewiser

The story is that there exists this model which predicts poorly for black (and female) patients. Given that there are probably lots of datasets where black people are a vast minority, this is not surprising.

For all I know there are millions of models with extremely poor accuracy based on African datasets. That wouldn't really change anything about the above, though. I wouldn't expect it, but it would definitely be interesting.

rafaelmn

How much medical data/papers do you think they generate in comparison to these three ?

XorNot

Why not socioeconomic status or place of residence? Knowing mean yearly income will absolutely help an AI figure out statistically likely health outcomes.

nottorp

> as well in those 40 years or younger

Are we sure it's only about racial bias then?

Looks to me like the training data set is too small overall. They had too few black people, too few women, but also too few younger people.

xboxnolifes

It's the same old story that's been occurring for years/decades. Bad data in, bad data out.

Animats

What's so striking is how strongly race shows in X-rays. That's unexpected.

dekhn

It doesn't seem surprising at all. Genetic history correlates with race, and genetic history correlates with body-level phenotypes; race also correlates with socioeconomic status which correlates with body-level phenotypes. They are of course fairly complex correlations with many confounding factors and uncontrolled variables.

It has been controversial to discuss this and a lot of discussions about this end up in flamewars, but it doesn't seem surprising, at least to me, from my understanding of the relationship between genetic history and body-level phenotypes.

KittenInABox

What is the body-level phenotype of a ribcage by race?

I think what baffles me is that black people as a group are more genetically diverse than every other race put together so I have no idea how you would identify race by ribcage x-rays exclusively.

dekhn

I use the term genetic history, rather than race, as race is only weakly correlated with body level phenotypes.

If your question is truly in good faith (rather than an "I want to get into an argument"), then my answer is: it's complicated. Machine learning models that work on images learn extremely complicated correlations between pixels and labels. If, on average, people with a specific genetic history had slightly larger ribcages (due to their genetics, or even socioeconomic status that correlated with genetic history), that would exhibit in a number of ways in the pixels of a radiograph: larger bones spread across more pixels, density of bones slightly higher or lower, organ size differences, etc.

It is true that Africa has more genetic diversity than anywhere else; the current explanation is that after humans arose in africa, they spread and evolved extensively, but only a small number of genetically limited groups left africa and reproduced/evolved elsewhere in the world.

Avshalom

Africa is extremely diverse but due to the slave trade mostly drawing from the Gulf of Guinea (and then being, uh... artificially selected in addition to that) 'Black' -as an American demographic- is much less so.

lesuorac

If you have 2 samples where one is highly concentrated around 5 and the other is dispersed more evenly between 0 and 10 then for any value of 5 you should guess Sample 1.
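
As a toy illustration (made-up numbers, nothing from the paper):

    # Made-up numbers: Sample 1 concentrated around 5, Sample 2 spread over
    # 0..10. An observation near 5 is far more likely under Sample 1, so a
    # likelihood-based guess picks Sample 1 almost every time there.
    import math

    def normal_pdf(x, mu=5.0, sigma=0.5):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    def uniform_pdf(x, lo=0.0, hi=10.0):
        return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

    for x in (5.0, 4.5, 2.0):
        print(x, "sample 1" if normal_pdf(x) > uniform_pdf(x) else "sample 2")
    # 5.0 and 4.5 point to the concentrated sample; 2.0 points to the dispersed one.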

But anyways, the article links out to a paper [1], but unfortunately the paper tries to theorize mechanisms that would explain how, and they don't find one (which may mean the AI is cheating; that's my opinion, not theirs).

[1]: https://www.thelancet.com/journals/landig/article/PIIS2589-7...

intuitionist

Sub-Saharan Africans are extremely genetically diverse but a sample of ~100 Black Americans is unlikely to have any Khoekhoe or Twa representation.

Anyway it’s possible that the model can pick up on other cues as well; if you had some X-rays from a hospital in Portland, Oregon and some from a hospital in Montgomery, Alabama and some quirk of the machine in Montgomery left artifacts that a model could pick up on, the presence of those artifacts would be quite correlated with race.

danielmarkbruce

The fact that the vast majority of physical differences don't matter in the modern world doesn't mean they don't actually exist.

DickingAround

This is a good point; a man or woman sitting behind a desk doing correlation analysis is going to look very similar in their function to a business. But they probably look pretty physically distinct in an x-ray picture.

kjkjadksj

Race has such striking phenotypes on the outside it should come as no surprise there are also internal phenotypes and significant heterogeneity.

banqjls

But is it really?

sergiotapia

It's odd how we can segment between different species in animals, but in humans it's taboo to talk about this. Threw the baby out with the bath water. I hope we can fix this soon so everybody can benefit from AI. The fact that I'm a male latino should be an input for an AI trained on male latinos! I want great care!

I don't want pretend kumbaya that we are all humans in the end. That's not true. We are distinct! We all deserve love and respect and care, but we are distinct!

schnable

That's because humans are all the same species.

sdsd

In terms of Linnaean taxonomy, Chihuahuas and wolves are also the same species, in that they can produce fertile offspring. We instead differentiate them using the less objective subspecies classification. So it appears that with canines we're comfortable delineating subspecies; why not with humans?

I don't think we should, but your particular argument seems open to this critique.

CharlesW

It seems critical to have diverse, inclusive, and equitable data for model training. (I call this concept "DIET".)

appleorchard46

I'm calling it now. My prediction is that, 5-10 years from now(ish), once training efficiency has plateaued, and we have a better idea of how to do more with less, curated datasets will be the next big thing.

Investors will throw money at startups claiming to make their own training data by consulting experts, finetuning as it is now will be obsolete, pre-ChatGPT internet scrapes will be worth their weight in gold. Once a block is hit on what we can do with data, the data itself is the next target.

0cf8612b2e1e

Funny you should say that. There was a push to have more officially collected DIET data for exactly this reason. Unfortunately such efforts were recently terminated.

nonethewiser

Or take more inputs. If there are differences between race and gender and that's not captured as an input, we should expect the accuracy to be lower.

If an x-ray means different things based on the race or gender, we should make sure the model knows the race and gender.

red75prime

And not applying fairness techniques to the resulting model.

kjkjadksj

This isn't an AI problem but a general medical field problem. It is a big issue with basically any population-centric analysis where the people involved in the study don't have a perfect subset of the world's population to model human health; they have a couple hundred blood samples from patients at a Boise hospital over the past 10 years, perhaps. And they validate this population against some other available cohort that is similarly constrained by what is practically possible to sample and catalog, and might not even see the same markers shake out between diseased and healthy.

There are a couple populations that are really overrepresented as a result of these available datasets. Utah populations on one hand because they are genetically bottlenecked and therefore have better signal to noise in theory. And on the other the Yoruba tribe out of west africa as a model of the most diverse and ancestral population of humans for studies that concern themselves with how populations evolved perhaps.

There are other projects too amassing population data. About 2/3rd of the population of iceland has been sequenced and this dataset is also frequently used.

cratermoon

It's a generative AI LLM hype issue because it follows the confidence game playbook. Feed someone correct ideas and answers that fit their biases until they trust you, then when the time is right, suggest things that fit their biases but give incorrect (and exploitative) results.

bbarnett

I remember a male and female specialist, whatever their discipline, holding a media scrum a decade ago.

They pleaded for people to understand that men and women are physically different, including the brain, its neurological structure, and that this was in modern medicine being overlooked for political reasons.

One of the results was that many clinical trials and studies were populated by males only. The theory being that they are less risk averse, and as "there is no difference", then who cares?

Well these two cared, and said that it was hurting medical outcomes for women.

I wonder, if this AI issue is a result of this. Fewer examples of female bodies and brains, fewer studies and trials, means less data to match on...

https://news.harvard.edu/gazette/story/2007/07/sex-differenc...

zeagle

Cool topic! This isn't surprising given the AI models would be trained such that existing medical practices, biases, and failures would propagate through them as others have said here.

There is a published, recognized bias against women and blacks (borrowing the literature term) specifically in medicine when it comes to pain assessment and treatment. Racism is a part of it but too simplistic. Most of us don't go to work trying to be horrible people. I was in a fly-in community earlier this week for work where 80% of housing is subsidized social housing... so spit balling a bit... things like assumptions about rates of metabolizing medications being equal, access to medication, culture and stoicism, dismissing concerns, and the broad effects of poverty/trauma/inter-generational trauma all must play a role in this.

For interest:

https://jamanetwork.com/journals/jamanetworkopen/fullarticle...

Overall, the authors found comparable ratings in Black and White participants’ perceptions of the patient-physician relationship across all three measures (...) Alternatively, the authors found significant racial differences in the pain-related outcomes, including higher pain intensity and greater back-related disability among Black participants compared with White participants (intensity mean: 7.1 vs 5.8; P < .001; disability mean: 15.8 vs 14.1; P < .001). The quality of the patient-physician relationship did not explain the association between participant race and the pain outcomes in the mediation analysis.

https://www.aamc.org/news/how-we-fail-black-patients-pain

(top line summary) Half of white medical trainees believe such myths as black people have thicker skin or less sensitive nerve endings than white people. An expert looks at how false notions and hidden biases fuel inadequate treatment of minorities’ pain.

And https://www.washingtonpost.com/wellness/interactive/2022/wom...

antipaul

When was AI supposed to replace radiologists? Was it 7 years ago or something?

bilbo0s

Nah.

It was more like one year away.

But one year away for the past 7 years.

dekhn

Nearly all radiology practice has integrated AI to some degree or another at this point.