In Deconstructing Harry, Woody Allen sums up the top priority of almost everyone who fears they are sick: "The most beautiful words in the English language are not 'I love you,' but 'it's benign.'" For years, people with health concerns have turned to Google for a diagnosis. Often, what they get instead is more anxiety, which their doctors then have to deal with. Now the company through which so many people find information, navigate cities or make dinner reservations could strengthen its position as a source of answers to these existential questions, thanks to artificial intelligence models built to respond accurately to medical queries.
In an article published today in the journal Nature, a company team presents the results of its work on Med-PaLM, a generative artificial intelligence model similar to ChatGPT: it is trained on large databases and organizes that information into answers that make sense, although they are not always true. The second version of this technology, Med-PaLM 2, already achieves 86.5% accuracy on multiple-choice exams like the MIR that doctors in Spain must pass, an improvement of 19% over the previous version, the one presented in this article.
In the work published today, the authors, most of them members of Google Research, test their models on large databases of medical questions and answers that include more than 3,000 of the questions users search for most often on the Internet. According to Shek Azizi, one of the authors of the article, writing by email, the results have gone "in three months from a barely passing performance to an expert level" on the tests that measure the models' ability to answer these questions. A panel of physicians judged that 92.9% of the long-form answers generated by Med-PaLM agreed with the scientific consensus, slightly above the 92.6% of answers given by human physicians. When the share of answers that could lead to harmful outcomes was compared, the machines also came out ahead, with 5.8% versus 6.5% for the doctors. Although the data is promising, the authors say more research is needed before bringing these models into healthcare settings, and Azizi says she does not envision "these systems being used autonomously or replacing physicians."
Josep Munuera, head of Diagnostic Radiology at the Hospital de la Santa Creu i Sant Pau in Barcelona and an expert in technologies applied to health, believes these models can be useful, but warns that "the job of doctors does not consist only of answering questions" like those put to the models. "You have to examine the patient or pay attention to non-verbal language to offer a diagnosis," he points out. Afterwards, technologies like the one developed by Google can help lighten the workload, preparing a report the patient can understand or a treatment plan. "It can also be useful as support, suggesting ideas for a diagnosis or helping to search for scientific information in large databases," he notes. "But then we need a human who checks what the AI proposes and who also takes responsibility for the decision," he concludes. "What clinicians do is multifaceted, far-reaching and deeply dependent on human interaction. Our goal is to use AI to increase doctors' ability to provide better treatment," Azizi agrees.
In an interview with EL PAÍS, Regina Barzilai, an MIT scientist and expert in AI applied to medicine, warned that machines, which learn on their own from the guidelines they are given, can surpass humans in some abilities, while "our ability to see if they are doing something wrong is minimal." "We have to learn to live in this world where technology makes a lot of decisions that we can't oversee," she warned. Anyone who has used ChatGPT will have seen these systems' ability to generate completely credible answers dotted with falsehoods that, precisely because they are well expressed, are harder to detect. Azizi, like Barzilai, knows that some of the answers the machines give us may be correct even though we do not know exactly where they come from, something that in matters as delicate as medicine can generate insecurity.
In some applications of this technology, which involve not diagnosing patients' diseases but searching for knowledge, hallucinations, as the invented passages in AI-generated texts are known, may not be a problem. "Hallucinations and creativity are two sides of the same coin, and some applications, such as drug repurposing or the discovery of gene-disease associations, require a certain degree of creativity, which in turn makes the process of discovery and innovation possible," explains Azizi.
José Ibeas, a nephrologist at the Parc Taulí Hospital in Sabadell and secretary of the Big Data and Artificial Intelligence Group of the Spanish Society of Nephrology, believes this type of technology is the future and will be very useful for improving medical treatment, but he thinks there is still much to learn. "For example, these systems get their information from high-quality sources, but not all publications are equal, and negative data, from experiments in which something is tested and the expected result is not obtained, often goes unpublished. The AI builds a text from those sources, but I don't know which ingredients it has taken from each type of article, and that can introduce biases," Ibeas points out. "The same treatment can be useful for one population group with a disease in a specific setting and not for another population group," he offers as an example.
For now, Ibeas sees these models as a resource for doctors, but in the future their usefulness will have to be verified, as is done with other medical products before approval, "comparing the results of doctors in regular practice with those of doctors who use this technology." The specialist also argues that the technology should be applied with care, that doctors should be trained in its use, and that it should be deployed only where it will really be useful, so that it does not happen "as with some very good products in medicine, where commercial pressure to apply them to everyone produces errors and the chance to use a very useful technology ends up being lost."
One last aspect that will matter in the use of these generative language models is the possibility of giving access to quality answers to the many people who currently lack it. The authors themselves point out that their comparisons, in which the AI already fares very well, were made against very high-level experts. Some physicians worry that this possibility could become an excuse to cut healthcare resources, even though they recognize the usefulness of models like Med-PaLM in these contexts.