OpenAI (backed by Microsoft) created ChatGPT, a massive and powerful generative language model that has the potential to transform several areas of the healthcare industry and reduce physician burnout. In this blog, I’ll explain what generative language models are and walk through a few ways physicians can use ChatGPT every day. Lastly, we’ll talk about how generative models operate and the risks and pitfalls. Let's begin by understanding what a language model is.
A language model is a mathematical model that tells us the probability of a sequence of words. For example, an English model will tell us that the sequence of “Physicians treat patients” is more probable than the sequence of “Physicians treat dinosaurs”. In other words, a language model is a type of artificial intelligence model that is trained to predict text based on an extensive amount of input data. The goal of a language model is to learn the patterns and structures in a language.
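To make the "probability of a sequence" idea concrete, here is a toy bigram model in Python. This is only an illustration of the underlying concept: ChatGPT itself is a large neural network trained on vastly more data, not a bigram counter. The tiny corpus below is invented for the example.

```python
from collections import Counter

# Toy corpus; a real model is trained on billions of words.
corpus = (
    "physicians treat patients . physicians examine patients . "
    "patients visit physicians . physicians treat diseases ."
).split()

# Count single words and adjacent word pairs.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def sequence_probability(words):
    """Approximate P(w1..wn) as a product of bigram probabilities."""
    prob = unigrams[words[0]] / len(corpus)
    for prev, curr in zip(words, words[1:]):
        prob *= bigrams[(prev, curr)] / unigrams[prev]
    return prob

p_plausible = sequence_probability("physicians treat patients".split())
p_implausible = sequence_probability("physicians treat dinosaurs".split())
print(p_plausible > p_implausible)  # prints True
```

Because "treat dinosaurs" never appears in the corpus, the second sentence gets probability zero, while the first gets a positive score. Modern models replace these raw counts with learned neural representations, which is what lets them generalize to word sequences they have never seen.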
Generative models, on the other hand, are a subset of language models that can generate new data, as opposed to simply recognizing or categorizing existing data. In the case of language models, generative models are trained to generate new text based on the patterns and structures they have learned from the input data, allowing them to generate text that is grammatically correct, semantically meaningful, and coherent. The significance of generative models lies in their ability to create new, original content, such as text, images, or music. This opens up a range of potential applications, from content creation to data augmentation. With that in mind, let’s explore how ChatGPT can be applied to healthcare processes.
Listed below are a few ways that ChatGPT can help physicians reduce burnout and become more productive:
EHR documentation: ChatGPT can assist physicians with EHR documentation, reducing the time they spend behind their machines and freeing up time for other important duties. With its natural language processing capabilities, ChatGPT can automate the tasks of filling out EHR forms, drafting physician notes, or writing referrals, allowing physicians to focus on patient care.
Patient communication: ChatGPT can also be used for patient communication, reducing the burden on physicians to respond to every query. With ChatGPT's ability to understand and respond to patient questions, for example in a chatbot style, physicians can spend more time on important tasks, such as diagnosing and treating patients.
Administrative tasks: ChatGPT can also help with administrative tasks, such as appointment scheduling and prescription renewals. It can also generate billing codes for a visit, speeding up data entry for the physician.
It is important to note that there are several risks associated with using generative language models such as ChatGPT, including:
Bias and fairness: Generative language models may contain biases and inaccuracies based on the data they were trained on. For example, researchers found that earlier large language models exhibited significant gender bias (e.g., associating certain occupations with one gender) and racial bias (e.g., attaching systematically more positive or negative sentiment to specific races). This can result in inappropriate or incorrect information being generated in the healthcare context, potentially leading to harmful consequences.
Privacy and security: Healthcare information is sensitive and personal, and the use of generative language models in this context raises concerns about data privacy and security. In this regard, it is important to ensure the data are encrypted and stored securely and all Protected Health Information (PHI) is de-identified or handled properly according to the applicable laws.
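As a rough illustration of the de-identification point, the sketch below redacts a few obvious identifier patterns before text leaves a system. This is a deliberately naive example: real de-identification (e.g., the HIPAA Safe Harbor method, which covers eighteen identifier categories) requires certified tooling and review, not four regular expressions. The placeholder names and patterns are my own.

```python
import re

# Naive patterns for a few common identifiers; real PHI handling
# needs far more robust, validated tooling than this.
PHI_PATTERNS = {
    "[PHONE]": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "[SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "[DATE]": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with labeled placeholders."""
    for placeholder, pattern in PHI_PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

note = "Seen on 03/14/2023, callback 555-867-5309, email jdoe@example.com."
print(redact(note))
# prints: Seen on [DATE], callback [PHONE], email [EMAIL].
```

Even with redaction in place, free-text clinical notes can contain identifying details no pattern will catch, which is why encryption, access control, and compliance review remain necessary.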
Reliability and trust: As with any AI-generated information, there is a risk that the information generated by a generative language model may not be reliable or trustworthy, which can have severe implications in a healthcare context. Refer to the examples in the next section.
Regulatory compliance: Healthcare is a heavily regulated industry, and there may be concerns about using generative language models in this context from a compliance perspective. For example, if a language model is being used as a decision support system, the software benefiting from that model can be classified as medical software where different regulations are applied and various safeguards should be implemented.
Clinical decision-making: While generative language models can provide valuable insights and assistance, they should not replace the expertise and judgment of trained healthcare professionals in making clinical decisions. Refer to our next section, where the model makes a few decisions on behalf of the doctor.
As mentioned above, one use case for ChatGPT is drafting physician notes from a summary of a visit. In the following example, I passed an imaginary summary of a patient visit to ChatGPT. I do not have a medical background, so I evaluated the results from a data scientist’s perspective.
Me: Create a SOAP note for my patient who is complaining of stomach ache early in the morning. He's a smoker and he eats fast food regularly. He suffers from heartburn and over-the-counter anti-acid medications do not help. He experiences a significant amount of stress at work. I ordered a PCR test for H. pylori and have prescribed pantoprazole 20mg taken daily one hour before breakfast.
Subjective: The patient is a smoker and regularly eats fast food. Complaints of stomach ache early in the morning, heartburn and over-the-counter anti-acid medications have not provided relief. Reports significant stress at work.
Objective: Physical examination reveals mild abdominal tenderness. PCR test for H. pylori ordered.
Assessment: Gastroesophageal reflux disease (GERD) is likely exacerbated by lifestyle factors and stress.
Plan: Prescribe pantoprazole 20mg to be taken daily one hour before breakfast. Re-evaluate in 1 week and adjust treatment as needed. Advise the patient to make lifestyle changes such as quitting smoking, eating a healthier diet, and finding ways to manage stress.
When it came to summarizing the symptoms reported by the patient, ChatGPT performed really well. It also placed the key information correctly within the Objective, Assessment, and Plan sections. However, like other generative models, it produced a few sequences of words that are highly probable and related to the context (this is actually where ChatGPT outperforms previous models) but dangerous and non-compliant, because they were not mentioned in the original summary given to it. Here are a few examples:
Diagnosing Gastroesophageal reflux disease (GERD)
Drafting unreported symptoms: Physical examination reveals mild abdominal tenderness
A creative/imaginary plan: Advise the patient to make lifestyle changes such as quitting smoking, eating a healthier diet, and finding ways to manage stress
Pretty interesting stuff, isn’t it? So I thought I'd try another example, this time a case with a bit of peculiarity.
My father is a seventy-year-old infectious disease specialist and he has been practicing for more than three decades in my hometown. I remember when I was a kid my father was approached on so many occasions by friends, family, neighbors, or on the street asking questions about some of their strange conditions and symptoms. “Excuse me Dr! When I drink tea, I feel pain in my eye. What do you think it is? Hi Dr! Does eating soup permanently cure diabetes?”
My dad teased them with this lame joke whenever he wasn't in the mood: “One day a man went to his family doc and told him: ‘I feel pain in my hair.’ The doc said: ‘Interesting! What did you eat last night?’ The man said: ‘Bread and ice.’ The doctor replied: ‘Neither your food nor your condition belongs to humans, and I don’t know how to treat it,’” or something along those lines. The person usually understood the joke, laughed a little, and left my father alone. My father would then talk to him patiently after he finished whatever he was doing. Now, let’s ask ChatGPT to create a physician SOAP note for us based on this scenario.
Me: Create a SOAP note for my patient who is complaining of feeling pain in his hair. He has eaten ice and bread for the past week. I don't think he has a disease that belongs to a human.
Subjective: The patient is complaining of hair pain. He reports that he has been eating ice and bread for the past week.
Objective: The patient appears to be in discomfort, but no obvious physical abnormalities are present.
Assessment: It is unlikely that the patient has a human disease, as the symptoms reported do not align with any known conditions. The cause of the hair pain is likely related to the patient's diet of ice and bread.
Plan: Recommend that the patient diversify his diet to include a variety of nutrient-rich foods, and advise him to avoid consuming excessive amounts of ice or bread. Closely monitor the patient's symptoms and re-evaluate in 1-2 weeks. If no improvement is seen, refer to a veterinarian for further evaluation.
Similar to the previous example, ChatGPT performed well at text summarization, but it did not pick up the irony in the note and eventually concluded that the patient could be referred to a veterinarian if no improvement was seen. The summary is indeed a breakthrough in computer science and natural language processing: it is very hard to tell that the text was generated by a machine. Everything is surprisingly meaningful and related to the context, yet it poses risks if included in a workflow without careful consideration and monitoring.
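One simple form such monitoring could take is automatically flagging generated sentences that introduce words never present in the source summary, so a human reviewer knows where to look. The sketch below is a crude word-overlap heuristic of my own devising, not a real hallucination detector; production systems would rely on clinician review and more sophisticated checks.

```python
import re

# Common function words to ignore when comparing vocabularies.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "for", "and", "is",
             "his", "he", "has", "been", "that", "with", "on", "are"}

def content_words(text):
    """Lowercased alphabetic words, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def flag_unsupported(source: str, generated: str):
    """Return generated sentences containing words absent from the source.

    A deliberately crude proxy for hallucination detection: it only
    surfaces candidates for a human reviewer to check.
    """
    source_vocab = content_words(source)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", generated.strip()):
        novel = content_words(sentence) - source_vocab
        if novel:
            flagged.append((sentence, sorted(novel)))
    return flagged

summary = "Patient complains of stomach ache. He is a smoker."
note = ("Patient complains of stomach ache. "
        "Physical exam reveals abdominal tenderness.")
for sentence, novel in flag_unsupported(summary, note):
    print(sentence, "->", novel)
```

Run on the example above, the second sentence is flagged because "abdominal tenderness" never appeared in the source, which is exactly the kind of drafted-but-unreported finding we saw in the SOAP note experiments.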
Ultimately, ChatGPT has the potential to transform the digital health industry and reduce physician burnout. ChatGPT can improve the overall experience for both physicians and patients by automating tedious tasks, freeing up time for important duties, and enhancing patient communication. The implementation of ChatGPT in healthcare is just the beginning of a new era of AI-assisted medicine, but as it stands, there is a great need to keep a human eye on the responses it generates for safe use.
- Hesam Dadafarin, CTO Tali