Artificial Intelligence (AI) is such a hip term that it can be difficult to sort out what it means, how it connects to related terms like “machine learning” and whether it is something we should be using in our personal and professional lives. For the busy neurosurgeon, we provide a brief overview of AI and some food for thought on whether and how it may be valuable to your clinical practice or research.
AI refers to a computer’s capability to simulate human knowledge or cognitive processing, and it has three levels of functionality. Artificial narrow intelligence is the lowest level of AI and the only one currently in existence. It takes one of two forms: reactive AI, which has no memory and can only work with present information to perform a very specific task (e.g., IBM’s chess-playing Deep Blue); and limited-memory AI, which can use stored information from training. Subcategories of limited-memory AI tools include machine learning, natural language processing/large language models, generative AI, virtual assistants, computer vision, self-driving vehicles and robotics. The middle and highest tiers of AI functionality are only theoretical as of July 2024, with no imminent need for worry. These are artificial general intelligence, which would build on unrelated learnings to perform new tasks in different contexts and be able to understand human emotion and relationships, and artificial superintelligence, which would give the computer cognitive abilities exceeding those of its human creators and self-awareness of its own beliefs and desires. As this author aspires to write neither science fiction nor horror, we will continue our discussion with types of limited-memory artificial narrow intelligence.
One major category of AI is machine learning, which broadly refers to the process of statistical learning and optimization using an algorithm (program) without direct human input. This allows for fast processing of vast quantities of data and identification of insights too subtle or unexpected for a human to find on their own. Machine learning can be further subdivided based on the type of data fed into the model and the type of algorithm. The data categories refer to the degree of “supervision,” or human processing, of the data before the algorithm sees it. Supervised machine learning involves a pre-labeled, pre-classified dataset, with the downside of more manual work up-front but the advantage of the machine learning model being able to check its work as it goes. Unsupervised learning eliminates that up-front work, but the computer must identify patterns of similarities and differences without human help. Semi-supervised learning is a combination of these approaches, and all three require a training set of data to create the model before it is applied to a test or new set of data. A model that does not use training data, but rather is based on trial-and-error with “rewards” or “punishments” as feedback, is called a reinforcement model. An example of this is the IBM Watson computer playing Jeopardy!, where the feedback signal was maximizing its winnings. Yet in all these cases, the rules of the road or the correct answers are known ahead of time. Deep learning pushes the frontiers of machine learning, requiring the computer to learn without human rules or knowledge, such as in cases where it is not known which features of a dataset are important for deriving the desired result.
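To make the idea of supervised learning concrete, here is a minimal sketch in Python of one of the simplest possible supervised classifiers, a nearest-centroid model. The data points, labels and the “benign”/“malignant” framing are invented for illustration only; real clinical models would be far more sophisticated and rigorously validated.

```python
# Toy supervised machine learning: a nearest-centroid classifier is
# "trained" on a small, pre-labeled dataset, then applied to new points.
# All data here are invented for demonstration purposes.

def train(points, labels):
    """Compute the mean (centroid) of the points in each labeled class."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

def predict(centroids, point):
    """Classify a new point by its nearest class centroid."""
    x, y = point
    return min(centroids,
               key=lambda c: (centroids[c][0] - x) ** 2
                           + (centroids[c][1] - y) ** 2)

# Supervised: the training set comes pre-labeled by a human.
training_points = [(1, 1), (2, 1), (8, 9), (9, 8)]
training_labels = ["benign", "benign", "malignant", "malignant"]

model = train(training_points, training_labels)
print(predict(model, (1.5, 2)))   # lands near the "benign" cluster
print(predict(model, (8.5, 8)))   # lands near the "malignant" cluster
```

The up-front labeling is exactly the “manual work” the paragraph above describes; an unsupervised method would have to discover the two clusters on its own.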
But what exactly are machine learning “algorithms”? You are already familiar with two main types: linear regression and logistic regression. These extremely common statistical techniques form the basis of machine learning.2 A linear regression model attempts to predict a continuous numerical outcome based on independent input variables with different weights of importance, where there is an expected straight-line relationship (a change in an independent variable affects the dependent variable at a fixed ratio). Logistic regression predicts the probability of a binary outcome based on input variables; this forms the basis of a supervised machine learning algorithm for classification of data into one bucket or another. Decision tree algorithms take this a step further by linking multiple binary steps to sort input (much like clinical protocols such as ACLS), and random forest algorithms combine an ensemble of decision trees to either classify discrete variables or perform regression analysis on continuous variables. You may have heard of a newer, more nuanced approach called neural networks. Much like a neuron receiving myriad dendritic inputs of different weights and summating them to make a go/no-go firing decision, neural networks link multiple neuron-like processing nodes together to learn quickly from training data and adjust the weights of inputs to improve model performance.
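The logistic regression described above can be written out in a few lines. This is a minimal sketch, not a production implementation: the single input variable and binary outcomes are a made-up toy dataset, and real analyses would use a vetted library such as scikit-learn. Note that the model is trained by gradient descent, nudging a weight and an intercept to reduce prediction error; a neural network node does essentially the same thing, just wired together with many others.

```python
import math

# Minimal sketch of logistic regression trained by gradient descent.
# xs/ys are an invented toy dataset: one input variable, binary outcome.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0          # weight and intercept, learned from data
lr = 0.1                 # learning rate (step size)

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(5000):    # repeated small corrections to w and b
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(w * x + b) - y   # prediction error on this point
        grad_w += err * x
        grad_b += err
    w -= lr * grad_w / len(xs)
    b -= lr * grad_b / len(xs)

# The fitted model outputs a probability of the "1" outcome.
print(sigmoid(w * 2.0 + b))   # low for an input resembling the 0 group
print(sigmoid(w * 7.0 + b))   # high for an input resembling the 1 group
```

Thresholding that probability at 0.5 turns the regression into the binary classifier the paragraph describes.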
There are of course challenges in leveraging powerful AI models for real-world work. An obvious one is that trained models are highly dependent on the training data: garbage in, garbage out. How generalizable the training set is to the test or broader dataset is another issue. The algorithms themselves can be faulty. You have likely heard of the concern of “bias” in algorithms. While we often think of this as related to historical stereotypes, bias in the context of AI more generally means that the model is rushing to a conclusion in consistent ways, ignoring factors that it should pay attention to when classifying or labeling a data point. Another term for this is “underfitting.” If, in the process of addressing this, the model becomes “overfit,” it pays too much attention to noise in the training data and demonstrates excessive variance, misclassifying data in inconsistent ways. This tradeoff between bias and variance is an inherent challenge in machine learning models generally, and in decision-tree-based models in particular. Improving AI models often requires huge amounts of data and multiple teams working to refine algorithms. An open-source approach, in which anyone can access the underlying code and create new versions, is seen as one way to collaboratively improve model development and expand uses, but many large firms working on AI do not simultaneously make their vast, valuable and proprietary datasets available, challenging the notion that they truly are open-source.
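The underfitting/overfitting tradeoff can be demonstrated with a toy experiment. Below is a minimal sketch, using invented, roughly linear data with a little noise: an underfit model (always predict the average) ignores the trend entirely, while an overfit model (1-nearest-neighbor, which memorizes every training point) achieves a perfect score on the training data but does worse than a simple fitted line on new data.

```python
# Invented toy data: roughly y = 2x, with noise added to the training set.
train_x = [1, 2, 3, 4, 5]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]
test_x  = [1.5, 2.5, 3.5, 4.5]
test_y  = [3.0, 5.0, 7.0, 9.0]        # noise-free held-out targets

def mse(pred, actual):
    """Mean squared error: lower is better."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

# Underfit (high bias): always predict the training-set mean.
mean_y = sum(train_y) / len(train_y)
underfit_test = mse([mean_y] * len(test_x), test_y)

# Overfit (high variance): 1-nearest-neighbor memorizes every training
# point, noise included, so its training error is exactly zero.
def nn_predict(x):
    return train_y[min(range(len(train_x)),
                       key=lambda i: abs(train_x[i] - x))]

overfit_train = mse([nn_predict(x) for x in train_x], train_y)  # 0.0
overfit_test  = mse([nn_predict(x) for x in test_x], test_y)

# Well-fit: a simple least-squares line through the training data.
n = len(train_x)
mx = sum(train_x) / n
slope = (sum((x - mx) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mx) ** 2 for x in train_x))
line_test = mse([mean_y + slope * (x - mx) for x in test_x], test_y)

print(overfit_train, overfit_test, underfit_test, line_test)
```

The memorizer’s perfect training score is an illusion: on held-out data, both the underfit and overfit models lose to the modest straight line, which captures the real pattern without chasing the noise.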
How does one begin to explore the potential of AI and consider its use in neurosurgery? The form of AI that is perhaps easiest to take for a test drive is the large language model (LLM), which processes language input, can search and “read” relevant sources and provides answers to queries. Examples of cloud-based LLMs that offer a free version are ChatGPT from OpenAI, Claude from Anthropic, Gemini from Google and Llama 3 from Meta, which is one of a few open-source LLMs. These models undergo frequent refinement and are evaluated against benchmarks that test rote factual knowledge, graduate-student-level reasoning and even the ability to write computer code. Potential use-cases for neurosurgery include identifying patients for enrollment in clinical trials, searching medical literature, responding to patient or nurse phone calls and even writing clinical notes. In any of these applications, data will be the fuel that powers AI utility and scale, so integration with the electronic medical record and imaging data, and interoperability of information across health systems, will be paramount to making AI useful for neurosurgeons and patients. Although AI development is currently in its early stages, understanding the basic building blocks can equip neurosurgeons to provide input and collaborate on optimizing results.
Jay Nathan, MD, is a neurosurgeon with Trinity Health IHA Medical Group in Livonia, Mich., and focuses on treating degenerative spinal disorders. He is the co-chair of the Neurosurgery Quality Council of the AANS-CNS Washington Committee and is actively involved in the areas of health policy and quality improvement.