Breast cancer is the most common malignancy in women. However, what we term breast cancer is actually a heterogeneous group of cancers that can be considered as a combination of a near-finite number of parameters in which individual patient outcomes are driven by an interplay between tumor- and patient-characteristics. This complexity makes it challenging to predict outcomes and tailor treatments for individual patients. Indeed, owing to the sheer diversity of combinations, the prognosis for breast cancer varies greatly.
Intracranial metastatic disease (IMD) is a particularly aggressive and life-threatening complication of breast cancer that dramatically worsens prognosis, leads to significant neurological impairments and substantially reduces survival rates. The insidious nature of IMD and costs associated with screening complicate efforts for early detection and intervention, further, identifying at-risk patients has been to date impracticable. Conventional statistical techniques struggle to manage the high dimensionality and complex interdependencies of the potential prognostic factors involved and often fall short in predictive power and reliability. These difficulties are further exacerbated by the rarity of IMD cases.
Our research aims to bridge this gap by leveraging machine learning (ML) techniques to analyze large, complex datasets to identify patterns and interactions that would be impossible to discern using traditional methods. Specifically, we are employing Random Forest Classifiers on a high-dimensional dataset of more than 50,000 patients with breast cancer, about 15% of whom developed IMD. Our preliminary results have been quite promising: our initial models to identify patients who will develop IMD have exhibited a positive predictive value of 93% and a sensitivity of 68%, on a held-out validation set. Furthermore, in the hopes that the results of our model will not merely constitute a computational exercise but will occupy a clinical sphere, our model architecture addresses a foundational hurdle in medical ML: the lack of epistemic insight. By opting for the inherently interpretable decision-tree-derivative models as well as employing permutation testing, we are able to elucidate the relative magnitude and direction contribution of each parameter in our model.
Ultimately, our work demonstrates the utility of ML algorithms in addressing questions impregnable to conventional statistical techniques. By accurately identifying those at higher risk, we open the possibility of earlier and more intensive evidence-based surveillance to identify patients with IMD before it manifests clinically. This alteration would transform the management of breast cancer patients at risk of IMD from a reactive to a proactive approach and hopefully improve outcomes for breast cancer patients.



