Microsoft's new Dragon Copilot is an AI assistant for healthcare
101 comments
· March 4, 2025
amoxichillin
I've been helping test it as well - your experience sounds identical to mine. I was initially very excited for it, but nowadays I don't really bother turning it on unless I feel the conversation will be a long one. Although I am very much looking forward to them rolling out the automated pending of orders based on what was said during the conversation.
LLMs have so much potential in medicine, and I think one of the most important applications they will have is the ability to ingest a patient's medical chart within their context window and present key information to clinicians that would've otherwise been overlooked in the bloated mess that most EMRs are nowadays (including Epic).
There have been so many times where I've found critically important details hidden away as a sidenote in some lab/path note, overlooked for years, that very likely could've been picked up by an LLM. Just a recent example - a patient with repeated admissions over the years due to severe anemia would usually be scoped and/or given a transfusion without much further workup and discharged once Hgb >7. A blood bank path note from 10 years ago mentions the presence of warm autoantibodies as a sidenote; for some reason the diagnosis of AIHA is never mentioned nor carried forward in their chart. A few missed words which would've saved millions of dollars in prolonged admissions and diagnostic costs over the years.
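Just to make the idea concrete, here's a rough sketch of the kind of chart-review pass I'm imagining - purely illustrative, with call_llm standing in for whatever model endpoint you'd actually use, and the prompt wording made up:

    # Hypothetical sketch: scan each historical note for findings that never
    # made it onto the problem list. Not how Dragon Copilot works - just the idea.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("wire this up to your LLM provider of choice")

    PROMPT = """You are reviewing one note from a patient's chart.

    Current problem list:
    {problem_list}

    Note ({note_date}, {note_type}):
    {note_text}

    List any clinically significant findings in this note (antibody results,
    incidental imaging findings, etc.) that are NOT reflected in the problem list.
    Quote the exact sentence for each so it can be verified. If none, reply NONE.
    """

    def flag_overlooked_findings(problem_list: str, notes: list[dict]) -> list[str]:
        flags = []
        for note in notes:  # one call per note keeps each prompt small and citable
            answer = call_llm(PROMPT.format(
                problem_list=problem_list,
                note_date=note["date"],
                note_type=note["type"],
                note_text=note["text"],
            ))
            if answer.strip() != "NONE":
                flags.append(f'{note["date"]} {note["type"]}: {answer}')
        return flags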
skywhopper
Given everything I hear about LLMs for similar summary purposes, including your description and the one given above, it seems unlikely that the LLM would actually “notice” a side note in a huge chart. I agree that'd be great, but I'm curious why you think it would necessarily pick up on that sort of thing.
amoxichillin
> it seems unlikely that the LLM would actually “notice” a side note in a huge chart
I respectfully disagree - I think LLMs have already made significant advances in this area, as shown in the various "needle in a haystack" demonstrations we've seen over the past couple of years. I've already been impressed by the minute but relevant details they can "recall" after being fed very dense journal articles, and the technology is only getting better. Also keep in mind that the raw text / "data" itself found in many patients' charts is not always that expansive (though it certainly can be for patients with recurrent admissions). It's more an issue of finding the actual information, given that EMRs are a nightmare to navigate effectively.
Hallucinations are always a consideration too, but any implementation of the sort I mentioned before would certainly contain in-text backlinks to actual notes in the EMR. Epic already does this with their basic text search function. So I don't think hallucinations would be too problematic as clinicians should always be verifying this type of information at the source in good practice.
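To be concrete about the backlink part, all I mean is that every summarized claim would carry the note it came from plus a verbatim quote, something like this (field names and IDs invented for illustration - not Epic's or Copilot's actual data model):

    from dataclasses import dataclass

    @dataclass
    class SourcedFinding:
        statement: str   # what the summary asserts
        note_id: str     # EMR identifier of the source note (made-up format here)
        note_date: str
        quote: str       # verbatim text from that note, for one-click verification

    def render(findings: list[SourcedFinding]) -> str:
        return "\n".join(
            f'- {f.statement} [note {f.note_id}, {f.note_date}: "{f.quote}"]'
            for f in findings
        )

    print(render([SourcedFinding(
        statement="Warm autoantibodies reported by blood bank; consider AIHA workup.",
        note_id="BB-2015-0142",
        note_date="2015-03-02",
        quote="Warm autoantibody detected on antibody screen.",
    )]))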
marcellus23
> A few missed words which would've saved millions of dollars in prolonged admissions and diagnostic costs over the years.
I don't mean to come off antagonistic here. But surely the more important benefit is the patient who would've avoided years of sickness and repeated hospital visits?
amoxichillin
> But surely the more important benefit is the patient who would've avoided years of sickness and repeated hospital visits?
The patient experience is always important, and maybe I could've been more explicit about that in what I wrote. I think I was focusing more on the collective/societal impact this would have, which I felt would resonate more with the readers here.
dmurray
I don't know. If it was really millions of dollars for a single patient - I wouldn't pay a few million dollars to avoid a few bouts of illness for a random member of my health insurance group scheme cohort, and that seems like the correct comparison to make. Increase the costs by another order of magnitude and I'd rather let them die.
eminence32
I think that is what "prolonged admissions" was meant to cover.
stuartjohnson12
I just wanted to jump in and say - don't give them too much credit on transcribing medication. I'm guessing this is Deepgram behind the scenes, and their medication transcription works pretty well out of the box in my experience.
voidUpdate
Screwing up gender and years sounds pretty serious to me?
beng-nl
Maybe they mean that it either doesn’t matter in context or it’s easy to catch and correct. Either way it seems reasonable to trust the judgement of the professional reporting on their experience with a new tool.
stvltvs
I worry that we'll get complacent and not check details like that when they are important - not just in the medical field but everywhere.
johnisgood
If it is easy to catch and correct, why can't Copilot do it? Sounds like something it should know.
dumbmrblah
It's more in scenarios where I enter the room and ask the patient whether this is their wife/husband etc. It's not like I'm going into the room and saying "hello patient, you appear to be a human female". The model is having difficulty figuring out who the actors are if there are multiple different people talking. Not a big issue if all you're doing is rewriting information, but if multi-modal context is required, it's not the best.
userbinator
> The notes it generates are too verbose for most medical notes even with all the customization enabled.
I've noticed that seems to be a common trend for any AI-generated text in general.
TeMPOraL
I think this might be because of what GP said later:
> it treats all the information equally as important when that’s not really the case
In the general case (and I imagine, in the specific case of GP), the model doesn't have any prior to weigh the content - people usually just prompt it with "summarize this for me please <pasted link or text>"[0], without telling it what to focus on. And, more importantly, you probably have some extra preferences that aren't consciously expressed - the overall situational context, your particular ideas, etc. translate to a different weighting than the one the model has, and you can't communicate that via the prompt.
Without a more specific prior, the model has to treat every piece of information equally, and this also means erring on the side of verbosity, so as not to omit anything the user may care about.
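Concretely, the difference is roughly between these two prompts - a toy example, not anything Dragon Copilot actually exposes:

    def bare_prompt(text: str) -> str:
        # the usual "summarize this for me please" - no prior about what matters
        return f"Summarize this for me please:\n\n{text}"

    def focused_prompt(text: str) -> str:
        # the same request with the user's (normally unstated) priorities spelled out
        return (
            "Summarize the following clinic conversation.\n"
            "Prioritize: symptom onset and timeline, medication changes, red-flag symptoms.\n"
            "Compress or drop: small talk, repeated questions, information already charted.\n"
            "Target length: under 150 words.\n\n"
            f"{text}"
        )

Same model, very different output length and emphasis.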
--
[0] - Or such prompt is hidden in the "AI summarizer" feature of some tool.
gmerc
Are they charging per token?
hakaneskici
Same for AI coding assistants: most tools generate way too much unnecessary code. The scary part is that the code seems to run OK.
ksaxena
Yes, the biggest problem with Healthcare AI assistants right now is that there is no way to "prompt" the AI on what a physician needs in a given scenario - eg. "only include medically relevant information in HPI", "don't give me a layman explanation of radiographic reports", "include direct patient quotes when a neurological symptom is being described" etc.
And the prompt landscape in the field is vast. And fascinating. Every specialist has their own preference for what is important to include in a note vs what should be excluded, and this preference changes by disease - what a neurologist wants in an epilepsy note is very different from what they need in a dementia note, for example.
Note preferences also change widely between physicians, even in the same practice and same specialty! I'm the founder of Marvix AI (www.marvixapp.ai), an AI assistant for specialty care; we work with several small specialty care practices where every physician has their own preferences on which details they want to retain in their note.
But if you can get the prompts to really align with a physician's preferences, this tech is magical - physicians regularly confess to us that it saves them ~2 hours every day. We have now had half a dozen physicians tell us in their feedback calls that their wives asked them to communicate their 'thanks' to us for getting their husbands back home for dinner on an important occasion!
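To give a flavour of what "aligning with a physician's preferences" can mean in practice, here's a minimal sketch - the preference fields and wording are invented for illustration, not our actual implementation:

    from dataclasses import dataclass, field

    @dataclass
    class NotePreferences:
        specialty: str
        visit_type: str
        include: list[str] = field(default_factory=list)
        exclude: list[str] = field(default_factory=list)
        style: str = "concise; written for other physicians, not laymen"

    def to_system_prompt(p: NotePreferences) -> str:
        # compile one physician's stored preferences into note-drafting instructions
        return (
            f"You are drafting a {p.specialty} note for a {p.visit_type} visit.\n"
            f"Always include: {'; '.join(p.include)}.\n"
            f"Never include: {'; '.join(p.exclude)}.\n"
            f"Style: {p.style}."
        )

    epilepsy_prefs = NotePreferences(
        specialty="neurology",
        visit_type="epilepsy follow-up",
        include=["seizure frequency since last visit",
                 "direct patient quotes describing semiology",
                 "current AED doses and recent levels"],
        exclude=["layman explanations of radiographic reports",
                 "review of systems unrelated to the presenting problem"],
    )

Each physician gets their own stored preferences, and the same transcript then renders into very different notes.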
[Edit: typo and phrasing]
visarga
> there is no way to "prompt" the AI on what a physician needs in a given scenario - eg. "only include medically relevant information in HPI", "don't give me a layman explanation of radiographic reports", "include direct patient quotes when a neurological symptom is being described" etc.
There is, it is called RLHF.
burnte
We tried it at my job - I got us into the beta. Go try Nudge AI and tell me what you think. Our providers found Nudge to be a far better product at a fifth of the price.
monkeydreams
> Hallucinations are present, but usually they’re pretty minor (screwing up gender, years).
And if all hospitals were doing was having doctors treat patients, this would be OK. But healthcare is fueled by these "minor" details, and this will result in delays in payment and reimbursement, trouble with patient identification, corruption of clinical coding, etc.
pintxo
Did you encounter any instances of hallucinations or omissions?
One would imagine those to be the biggest dangers.
dumbmrblah
Hallucinations are pretty minimal but present. Some lazy physicians are gonna get burned by thinking they can just zone out during the interview and let this do all the work.
I edited my original post. Omissions are less worrisome; it's more about too much information being captured that isn't relevant. So you get these super long notes and it's hard to separate the "wheat from the chaff".
dartos
> Some lazy physicians are gonna get burned by thinking they can just zone out during the interview and let this do all the work.
However many patients are going to that physician - they're going to get burned too.
bluefirebrand
Seems like capturing too much irrelevant detail would be preferable to potentially missing important details, though?
eig
As a medical student, I used the Dragon dictation software (no AI) to write notes in the ED, and more recently I used a pilot of this AI version to write clinic notes.
Overall, I was quite impressed. It definitely made writing notes - which all doctors hate doing - much faster. While it had some problems with where to put key pieces of information (like putting details from the physical exam back in the history), it only took 5 mins of rearrangement after the visit to complete the note.
For simple diagnoses, it does a decent job coming up with the assessment and plan, probably because all the simple diagnoses were in the training set. For more complex ones though, it needs to be exactly dictated by the doctor. I can see this being used very well in primary care.
Edit: When I said “coming up with an assessment and plan” I mean documenting the assessment and plan based on the AI's recording of the conversation with the patient. The conversation with the patient is meant to be understandable; the “assessment and plan” documentation, on the other hand, is jargony and meant to be read by other physicians.
conartist6
This still sounds bad. 5 mins to rework your notes after each patient visit? I didn't assume doctors had that kind of time.
And let me make this clear. I, as your patient, I never NEVER want the AI's treatment plan. If you aren't capable of thinking with your own brain, I have no desire to trust you with my health, just like I would never "trust" an AI to do any technical job I was personally responsible for, due to the fact that it doesn't care at all if it causes a disaster. It's just a stochastic word picker. YOU are a doctor.
diggan
> This still sounds bad. 5 mins to rework your notes after each patient visit? I didn't assume doctors had that kind of time.
Compared to what, though? It reads as not additional work, but less work than doing all of that manually, which seems likely to need more than 5 minutes.
> And let me make this clear. I, as your patient, I never NEVER want the AI's treatment plan.
Where are you getting this from? Neither the parent's comment nor the article talks about the AI assistant coming up with a treatment plan, and it seems to be all about voice dictation and "ambient listening" with the goal of "free clinicians from much of the administrative burden of healthcare", so it seems a bit needlessly antagonistic.
conartist6
If you should ever couch its knowledge as your knowledge, I would think you could be in serious trouble. You would have to say something like "the AI's plan to treat you, which I think might be correct", when what I want to hear is "my plan to treat you is: ..."
But I think it's more subtle than that, because I expect the AI to reinforce all your biases. Whatever biases (human biases, medical biases, biases that arise from what a patient isn't telling you) go into the question you feed it, it will take cues you didn't even know you were giving and use those cues to formulate the answer it thinks you expect to hear. That seems really dangerous to me, sort of like you're conceptually introducing AI imposter doctors to the staff, whose main goal is to act knowledgeable all the time so people don't think they are imposters...
I dunno. I'd like to give this particular strain of techno-futurism back. Can I have a different one please?
rsynnott
From the post they're replying to:
> For simple diagnoses, it does a decent job coming up with the assessment and plan
(Somewhere, a medical liability insurance actuary just woke up in a cold sweat)
Yeah, personally I'd be looking for a second opinion.
ceejayoz
The AI companies absolutely hope to be the ones to come up with the treatment plans eventually.
ilikecakeandpie
> 5 mins to rework your notes after each patient visit? I didn't assume doctors had that kind of time.
I worked in healthcare for over a decade (actually for a company that Nuance acquired, before Nuance itself was acquired) and the previous workflow was that they'd pick up a phone, call a number, say all their notes, and then have to revisit the transcription to make sure it was accurate. Surgeons in particular have to spend a ton of time on documentation.
eig
I think you may be misunderstanding how the tool is used (at least the version I used).
The doctor talks to the patient, does an exam, then formulates and discusses the plan with the patient. The whole conversation is recorded and converted to a note after the patient has left the room.
The diagnosis and plan were already worked out while talking to the patient. The AI has to convert that conversation into a note. The AI can't influence the plan because the plan was already discussed and the patient is gone.
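In pseudocode, the flow as I experienced it is roughly this (placeholders, not the vendor's actual code) - the model only ever sees a conversation in which the assessment and plan have already been stated out loud:

    def transcribe(audio_path: str) -> str:
        raise NotImplementedError("speech-to-text over the recorded visit audio")

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("LLM endpoint of your choice")

    NOTE_PROMPT = """Convert this visit transcript into a SOAP-format note.
    Document only the assessment and plan the physician actually stated;
    do not add recommendations of your own.

    Transcript:
    {transcript}
    """

    def draft_note(audio_path: str) -> str:
        transcript = transcribe(audio_path)   # runs after the patient has left
        return call_llm(NOTE_PROMPT.format(transcript=transcript))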
zeagle
AI is an assistive tool at best, but it can probably speed things up by reflowing text. I use Dragon dictation with one of the Philips microphones and it makes enough mistakes that I would probably spend the same time editing/proofing. Had a good example yesterday where it missed a key NOT in an impression.
As an aside, the after-hours work is what burns out physicians. There is time after the visit to do a note; 5 min for a very simple one is reasonable to create and dictate it, fax it, do the workflow for billing, and request a follow-up within a given system. A new consult might take 10 min between visits if you have time.
For after hours, ER is in my opinion a bad example because when you are done, you are done.
Take a chronic disease specialty or GP and it is hours of paperwork after clinic to finish notes (worse if teaching students), triage referrals, deal with patient phone calls that came in, deal with results and act on them, read faxes, etc. I saw my last patient ~4:30 yesterday and left for home at 7, dealing with notes and stuff that had come in since Thursday night.
bpodgursky
> I, as your patient, I never NEVER want the AI's treatment plan.
You as a patient are going to get an AI treatment plan. Come to peace with it.
You may have some mild input as to whether it's laundered through a doctor, packaged software, a SaaS, or LLM generated clinical guidelines... but you're not escaping an AI guiding the show. Sorry.
_qua
You'd be horrified to learn how many doctors spend hours at the end of their day finishing notes on patients. It's a nightmare.
Ukv
> And let me make this clear. I, as your patient, I never NEVER want the AI's treatment plan. If you aren't capable of thinking with your own brain, I have no desire to trust you with my health,
To my understanding this tool is for transcription/summarization, replacing administrative work rather than any critical decision making.
> just like I would never "trust" an AI to do any technical job
I'd trust a model (whether machine-learning or traditional) to the degree of its measured accuracy on the given task. If some deep neural network for tumor detection/classification has been independently verified as having higher recall/precision than the human baseline, then I have no real issue with it. I don't see the sense in having a seemingly absolute rejection ("never NEVER").
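To be concrete about what "measured accuracy" means here, the comparison is just precision/recall on a labeled evaluation set versus the human baseline - toy numbers below, made up for illustration:

    def precision_recall(preds: list[int], labels: list[int]) -> tuple[float, float]:
        tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
        fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
        fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    labels      = [1, 1, 1, 0, 0, 1, 0, 1]   # e.g. biopsy-confirmed ground truth
    model_preds = [1, 1, 1, 0, 1, 1, 0, 1]
    human_preds = [1, 0, 1, 0, 0, 1, 0, 1]

    print("model:", precision_recall(model_preds, labels))   # ~(0.83, 1.0)
    print("human:", precision_recall(human_preds, labels))   # (1.0, 0.8)

If the model's numbers are independently verified and better on the task that matters, I don't see why its output deserves categorically less trust than the human's.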
zora_goron
This doesn't necessarily apply to this particular offering, but having worked in clinical AI previously from a CS POV, and currently as a resident physician, something I'm a little wary of is the "shunting" of reasoning away from physicians to these tools (implicitly). One can argue that it's not always a bad thing, but I think the danger lies in this happening surreptitiously, with these tools deciding what's important and what's not.
I wrote a little bit more of my thoughts here, in case it’s of interest to anyone: [0]
In that same vein, I recently made a tool I wrote for myself public [1] - it's a "copilot" for writing medical notes that's heavily focused on letting the clinician do the clinical reasoning, with the tool exclusively augmenting the flow rather than attempting to replace even a little bit of it.
[0] https://samrawal.substack.com/p/the-human-ai-reasoning-shunt
mbb70
It does feel like we are hurtling towards a world where every industry will have a high volume producer of generated content, which will force the creation of a high volume summarizer of generated content.
"Having trouble processing a medical claim with 50+ pages of notes? Not to worry, Dragon Copilot Claim Review(tm) trims the fluff and tells you what really happened!"
"Having trouble understanding a large convoluted PR? Not to worry, Copilot(tm) Automated Review has your back!"
"Having trouble decided which cordless vacuum to buy? Not to worry, Amazon's Customers Say(tm) shows you what people think!"
There is definitely _some_ real-world utility to this arms race, but is it enough?
lm28469
It's dumb and I hate it. It's exactly the same with job applications: AI-generated resumes and AI-generated cover letters read by AIs. We might as well save the compute time and send bullet points, but no, we all have to continue the dance even though the music has stopped. So many bright minds working on such degenerate technology... The flip side is that I spend less and less time online, as LLMs have greatly accelerated the slow rot that had taken hold of the web.
TeMPOraL
The way you described it, that's not a problem at all, but a clear improvement. Thing is, every industry already has "a high volume producer of generated content" that, except for the last case, arose organically, due to reasons other than trying to confuse the reader. The creation of "a high volume summarizer" doesn't automatically mean an arms race.
Medical claims won't be growing in pages just because a doctor can parse them a bit faster. They may grow initially, because it's likely that people's mental capacity is what keeps other factors from ballooning the claims further - but it'll level out when some other practical limit is reached. Same with coding and PRs, same with research and all kinds of activities - except advertising.
There, AI will (already is) causing an arms race, because the "high volume producer"'s goal is to overwhelm their victims, so if the victims start protecting themselves with AI tools, the producer will keep increasing production to compensate. But that's not the fault of AI - it's the fault of allowing the advertising industry to exist.
supriyo-biswas
Personally all I can hope for is that people start seeing it for what it is and just shorten their communication, foregoing the use of LLMs.
AtreidesTyrant
Didn't this have issues recently where symptoms or stories were hallucinated and attributed to the patient?
This seems like a tool that insurance companies would love to get a copy of the data stream, and that could get very sticky quite quickly.
burnte
I got my company into the beta of DAX Copilot, and it's ok. It's not fabulous. After a year only a third of doctors were still using it. We switched to another product that works better for our providers, but also costs a fifth as much as Dragon. Dragon Copilot is MASSIVELY overpriced, and it is not the premier healthcare note summary product now.
DebtDeflation
This sounds like a basic STT/transcription app. What makes it a "Healthcare Virtual Assistant"? Presumably it's been trained on a medical dictionary to recognize vocabulary from this domain? Dragon has been making transcription apps since 1997, originally based on Hidden Markov Models, which I assume have since been updated to use transformers.
potatoman22
It reformulates the visit transcription as medical notes. That's the "virtual assistant" part afaik.
davikr
Interesting, but there is a lot of "intent" in writing notes and I am not convinced it could capture the full picture without significant human supervision. Would it really save time writing paperwork if you have to go through it anyways and check if there's anything wrong? At least when I write, I know it's correct.
Ukv
> Interesting, but there is a lot of "intent" in writing notes and I am not convinced it could capture the full picture without significant human supervision. [...] At least when I write, I know it's correct.
To my understanding, notes would otherwise largely be written from memory after the visit - which adds a fairly significant opportunity for omissions and errors to sneak in.
It seems plausible to me that by fixing that low-hanging fruit, this tool could potentially reach current human levels of accuracy overall even if it has shortcomings in other areas, like not being as good at non-shallow reasoning. Not to necessarily say it's currently at human-level.
> Would it really save time writing paperwork if you have to go through it anyways and check if there's anything wrong?
Five minutes saved per encounter, allegedly[0]. The reported decrease in clinician burnout and increase in patient satisfaction also seem pretty significant. But I'm not sure how much Microsoft has massaged those figures.
tantalor
Not surprisingly there is a lot of competition in AI medical scribe software.
Some other companies in this space are Epic, Freed, Nuance, DeepScribe, Nabla, Ambience, Tali, Augmedix
Closi
This is Nuance, Microsoft acquired them in 2021 :)
MangoCoffee
I'm not sure why some people are so hostile to this tool. It sounds like Dragon Speak plus AI. It's not going to replace your doctor.
bluefirebrand
I think people are understandably worried that their doctors are going to start relying more on the AI for diagnoses
The outcome of this is essentially that AI generated healthcare decisions will be superficially laundered through a human doctor, rather than a human doctor simply using the AI as a tool
This may be compounded by insurance companies using the AI as their guideline for their plans and payouts, and government healthcare agencies using the AI as their guideline for acceptable treatment practices
Healthcare is already not an ideal industry from a patient perspective in many places. It is difficult to imagine AI making this situation better for patients
amedeus72
I was selected as one of the testers for MS Copilot at a large hospital in Florida after testing Microsoft DAX for a while (dog shit). Copilot has nice features, like customization of the sections, but I doubt anybody would spend much time playing with that.
I tested 5 other ambient AI tools in the past 6 months. All of them can extract the Chief Complaint and do a decent job with the HPI, but as soon as you get a more complicated case, they all fall apart. Sections like the Physical Exam become a mess and I have to rewrite the whole thing.
So far the best, in my opinion, is still LucasAI by Lucas Health. Super simple to use, very basic interface. It just works and produces the best notes; I barely touch them. With the ICD-10 codes, it sometimes doesn't pick the very best one. That has improved over the past month, but either way it's still much better than Copilot.
The AVS from both are good, but with LucasAI I can translate it immediately into Spanish and send it over to the patient via email/SMS.
I haven't explored the integration with Copilot. Bad news: I know for a fact LucasAI is not fully integrated with Meditech yet, but I just copy and paste the whole note in 10 seconds, so not a problem. My buddy is using ECW and he said it's integrated; I haven't seen it with my own eyes.
dumbmrblah
I've been beta testing this for several months. It's OK. The notes it generates are too verbose for most medical notes even with all the customization enabled. Most medical interviews jump around chronologically, and Dragon Copilot does a poor job of organizing that, which means I had to go back and edit my note - which kind of defeated the purpose of the app in the first place.
It does a really good job with recognizing medications though, which most patients butcher the names of.
Hallucinations are present, but usually they’re pretty minor (screwing up gender, years).
It doesn't really seem to understand what the most important part of the conversation is; it treats all the information equally as important when that's not really the case. So you end up with long stretches of text full of information that the patient thought was useful but that isn't at all relevant to their current presentation. That's where having an actual physician is useful - to parse through what is important or not.
At baseline it doesn’t take me long to write a note so it really wasn’t saving me that much more time.
What I do use it for is recording the conversation and then referencing back to it when I’m writing the note. Useful to “jog my memory” in a structured format.
I have to put a disclaimer in my note saying that I was using it. I also have to let the patient know upfront that the conversation is getting recorded and I’m testing something for Microsoft, etc. etc. You can tell who the programmer patients are because they immediately ask if it’s “copilot“ lol