In this era of evidence-based medicine, substantial pressure is being brought to bear on physicians to measure quality of care. For many physicians, published clinical guidelines for patient care are the obvious tool of measurement. A basic tenet of evidence-based medicine is that of accepting randomized, controlled trials, or RCTs, as “class I evidence,” and thus as the basis for determining standard of care. Guidelines in surgery rarely are based on RCTs, the most stringent level of evidence — most guidelines rely on class II or III evidence, such as nonrandomized studies and case reports — but even when they are based on class I evidence, inherent problems with RCTs limit their usefulness in determining the effectiveness of neurosurgical procedures.
The effectiveness of neurosurgical procedures could be determined and improved by the use of prospective, continuous data collection and analysis in a well-designed, risk-adjusted, procedure-specific registry. Such a system would encourage continuous quality improvement and would be applicable to a wide range of neurosurgical procedures and practice sites. This article will explore the weaknesses of RCTs for determining quality care in neurosurgery, and the value of outcomes data analysis available through a procedure-specific registry.
Randomized, Controlled Trials Examined
The RCT methodology was developed to address three problems common to clinical research — bias, confounding and chance. To do this, the properly designed RCT has four essential components: concurrent comparisons to eliminate temporal bias; objective observation of clear endpoints to eliminate physician and patient bias; randomization to equalize the effects of unknown, confounding variables; and a representative, adequately sized patient population to reduce the likelihood of chance errors. The ideal RCT, the adequately powered, double-blind study with unambiguous endpoints, has all of these components. Unfortunately, most surgical RCTs cannot approximate this ideal.
An RCT is performed to determine the presence or absence of a treatment effect. Before the trial begins, the null hypothesis (a statement that there is no difference between treatments) is assumed to be true. In a positive study, the null hypothesis is rejected, indicating a statistically significant difference between treatments. If the null hypothesis cannot be rejected, a negative result, the study concludes that no statistically significant difference between the treatments was demonstrated. For positive trials, the P value represents the probability of observing a difference at least as large as the one seen if the null hypothesis were true. A P value of less than 0.05 tells us that, if there were truly no difference between treatments, results as different as those observed in the study would occur by chance alone less than 5 percent of the time.
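To make the P value concrete, here is a minimal sketch of one common way to compute it for a two-arm trial with a yes-or-no endpoint such as mortality. It uses a pooled two-proportion z-test, which is only one of several tests a trial statistician might choose. The function name and the event counts are hypothetical, chosen for illustration rather than taken from any trial.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided P value for the difference between two event rates (pooled z-test)."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    # Under the null hypothesis both arms share one underlying event rate
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / standard_error
    # Probability of a difference at least this large, in either direction,
    # if the null hypothesis were true
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 100 deaths among 1,000 control patients,
# 70 deaths among 1,000 treated patients
p_value = two_proportion_p_value(100, 1000, 70, 1000)
print(f"P = {p_value:.4f}")
print("positive study" if p_value < 0.05 else "negative study")
```

With identical event rates in both arms the function returns 1.0, meaning the observed result is exactly what the null hypothesis predicts.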
For negative studies the power of the study is important. Power is the likelihood of obtaining a positive result if there is a real therapeutic difference between treatments. Stated simplistically, a study with a power of 0.80 has an 80 percent chance of finding a difference of a predetermined magnitude if such a difference really exists. The power of a study depends on sample size, the magnitude of the treatment effect chosen, the significance level and the statistical tests employed.
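The dependence of power on sample size can be sketched with a standard normal approximation for comparing two event rates. This is an illustrative calculation, not the method of any particular trial. The function name, the assumed event rates and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two event rates."""
    normal = NormalDist()
    # Critical value for the chosen two-sided significance level
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    standard_error = math.sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treated * (1 - p_treated) / n_per_arm
    )
    # Chance that the observed difference clears the critical value when the
    # true difference is |p_control - p_treated| (far tail ignored)
    return normal.cdf(abs(p_control - p_treated) / standard_error - z_alpha)

# Hypothetical design: 10 percent event rate with control, 7 percent with treatment
for n in (500, 1000, 1500):
    print(n, round(power_two_proportions(0.10, 0.07, n), 2))
```

Under these assumed rates, the same true difference that is likely to be missed with 500 patients per arm is found more than 80 percent of the time with 1,500 per arm, which is why a negative result from an underpowered study says little.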
For many clinical studies the well-designed RCT is an immensely powerful tool. Consider a double-blind RCT evaluating mortality from myocardial infarction in patients who receive either placebo or aspirin after the event. In this RCT neither the patient nor the investigator knows which compound is administered, there are no patients who cross over from one treatment to the other, and the endpoint is unequivocal. If this study is adequately powered it will produce unambiguous results. Furthermore, if the patient population studied in this RCT is representative of the universal population of patients suffering myocardial infarction, and the trial shows a significantly better outcome with aspirin, then aspirin can be given to myocardial infarction patients with a high degree of assurance that one is delivering quality care.
Application of RCTs to Surgery
However, surgical trials differ from this example in several important ways. Because nearly all surgical trials are unblinded, patients may elect to cross over from one treatment arm to another, such as from medicine to surgery. To preserve the benefits of randomization, it is necessary to analyze patients in their assigned groups even if they cross over to another treatment arm (intention-to-treat analysis).
Crossovers create problems in any clinical trial. In trials comparing medical to surgical treatment the problems are compounded because the crossover periods often are asymmetrical. After assignment to surgery there is a short period of time, preoperatively, during which the patient may elect other treatment. Patients have a comparatively longer time span in which to consider changing from medical to surgical treatment. For example, in a trial comparing surgical to nonsurgical treatment of back pain, the patient who is randomized to medical treatment may try this for weeks or months, have persistent pain, choose to have surgery and then do well. However, the good outcome at follow-up will be assigned to the medical treatment arm; is there anyone who would consider this to be reasonable? Statistical methods exist to deal with crossovers, but these methods ameliorate rather than eliminate the problem.
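The attribution problem can be shown with a small, entirely hypothetical cohort. The sketch below tallies good outcomes two ways: by the arm to which each patient was randomized (intention-to-treat) and by the treatment each patient actually received (as-treated). The patient counts and the `Patient` and `success_rate` names are invented for illustration. Note that the as-treated tally gives up the protection of randomization, so it is shown for contrast and not as a remedy.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    assigned: str       # arm chosen by randomization: "medical" or "surgical"
    received: str       # treatment actually delivered
    good_outcome: bool

def success_rate(patients, arm, by):
    """Share of good outcomes in `arm`, grouping by 'assigned' or 'received'."""
    group = [p for p in patients if getattr(p, by) == arm]
    return sum(p.good_outcome for p in group) / len(group)

# Hypothetical cohort: 3 of 10 medical-arm patients cross over to surgery and do well
cohort = (
    [Patient("medical", "medical", True)] * 3
    + [Patient("medical", "medical", False)] * 4
    + [Patient("medical", "surgical", True)] * 3
    + [Patient("surgical", "surgical", True)] * 8
    + [Patient("surgical", "surgical", False)] * 2
)

for by, label in (("assigned", "intention-to-treat"), ("received", "as-treated")):
    for arm in ("medical", "surgical"):
        print(f"{label:18s} {arm:8s} {success_rate(cohort, arm, by):.2f}")
```

In this made-up cohort the three crossover successes raise the medical arm's intention-to-treat success rate, even though every one of those patients improved after an operation.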
It is also difficult in many neurosurgical trials to define clear endpoints. A neurosurgical RCT does not eliminate bias if endpoints are ambiguous and neither the patient nor the evaluator is blinded. Patients may experience a substantial placebo effect with surgery and investigators may harbor a surgical or nonsurgical bias. Having someone other than the operating surgeon evaluate patients postoperatively does not solve this problem. Any unblinded observer will bring his or her bias to the evaluation.
It is also more difficult in surgical trials to choose a representative patient population because of the problems of therapeutic imperative and equipoise. The surgeon has an implicit contract with the patient to offer the best care available (therapeutic imperative). If the surgeon does not believe that surgical and nonsurgical treatment arms are equally efficacious (equipoise) he or she will offer surgical treatment outside the trial to those patients he or she believes are most likely to benefit. Only those patients less likely to benefit from surgery are randomized, skewing the patient population to the detriment of the surgical treatment arm.
Surgical RCTs also suffer from problems with surgeon selection. In a study comparing aspirin to placebo it really doesn’t matter if the medical student or the chief of cardiology writes the order to administer the agent. This is not the case with surgical trials, where the skill and experience of the surgeon have profound effects on outcome. A study showing a benefit from surgery with a highly experienced group of surgeons will not be applicable if the outcomes of an individual surgeon fail to match those of surgeons in the study. Similarly, a study showing no surgical benefit may not be applicable if the study surgeons have outcomes significantly worse than those of a surgeon with exceptional skill and experience.
A final issue with surgical RCTs is their cost in time, effort and money. In order to have enough patients to properly power a study, large multicenter trials often are necessary. These are expensive, time consuming and labor intensive, making it difficult or impossible to repeat a trial, even if there are grave concerns about the validity of the study. Because RCTs often take many years to complete, their results may be meaningless if new technology has developed during the trial that could affect patient outcomes.
SPORT, Scrutinized
The results of the Spine Patient Outcomes Research Trial (SPORT), the first multicenter prospective randomized trial of surgical versus nonsurgical treatment of patients with lumbar spinal stenosis, spondylolisthesis, and disc herniation, currently are being analyzed and are likely to be published in the near future. While I do not know the results of this study, I have grave concerns because the problems inherent in surgical RCTs exist in SPORT. A summary of SPORT, based on my May 2000 report to the AANS/CNS Washington Committee, may help to illustrate some of these points.
One problem is that of patient selection. Primary care physicians will send patients with severe pain and radiographically documented structural spine problems directly for neurosurgical or orthopedic evaluation and treatment. Patients with equivocal findings are more likely to be sent to a comprehensive spine clinic. Even within the spine clinic population, the investigators estimate that they will be able to randomize only 15 percent to 40 percent of patients who meet study criteria. Those patients who are evaluated but elect not to be involved in the randomized study will be followed. It is likely that patients with more severe symptoms and more impressive structural pathology will be triaged to surgical care. This will eliminate patients from the randomized study who are most likely to respond to surgical intervention and raises the question as to whether or not the study population will be representative of lumbar surgery patients. If the patient in agony with a large free disc fragment benefits more from surgery than the patient with intermittent sciatica from a bulging disc, and if the former patient type is underrepresented and the latter, overrepresented in the study, the benefits of surgical intervention will be underestimated.
There also are problems with the methodology in regard to crossover patients. In order to retain the benefits of randomization, the study is designed as an intention-to-treat analysis. Patients are considered members of the group to which they were randomized, even when they have crossed over to the other treatment group. The investigators anticipate that up to 25 percent of the patients originally randomized to nonsurgical therapy may cross over to the surgical group. If this group of patients then does well in long-term follow-up, the benefit will be credited to the nonsurgical treatment group. This design will maximize the benefits of nonsurgical treatment and minimize the benefits of surgical care.
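The size of this effect can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, success rates of 80 percent with surgery and 50 percent without, crossover in one direction only, and crossover patients who do as well as other surgical patients. None of these figures comes from SPORT; only the 25 percent crossover estimate is taken from the investigators' projection cited above.

```python
def itt_observed_difference(p_surgical, p_nonsurgical, crossover_fraction):
    """Difference in success rates seen by intention-to-treat analysis when a
    fraction of the nonsurgical arm crosses over to surgery."""
    # The nonsurgical arm's reported rate blends both treatments
    nonsurgical_arm = (
        (1 - crossover_fraction) * p_nonsurgical
        + crossover_fraction * p_surgical
    )
    return p_surgical - nonsurgical_arm

# Hypothetical success rates: 0.80 with surgery, 0.50 without
for fraction in (0.0, 0.10, 0.25):
    difference = itt_observed_difference(0.80, 0.50, fraction)
    print(f"crossover {fraction:.0%}: observed difference {difference:.3f}")
```

Under these assumptions the observed difference is the true difference multiplied by one minus the crossover fraction, so a 25 percent crossover rate hides a quarter of the surgical benefit.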
Attributes of a Procedure-Specific Registry
Infrastructure and opportunity currently exist to develop a procedure-specific registry that would produce the data necessary to improve quality patient care in neurosurgery. Two examples are the NeuroLog system developed by the American Board of Neurological Surgery and the NPH Registry developed by the AANS with Outcome through the Neuro-Knowledge program.
NeuroLog is an Internet-based data collection system that has been used to collect case information for residents. The system catalogs operative data that can be compared to national benchmarks established by the Residency Review Committee. The ABNS has considered plans to adapt the system to collect the case information needed for the practice performance component of ABNS Maintenance of Certification and to expand use of the system to other practitioners for practice assessment.
If each ABNS-certified neurosurgeon were to continuously submit outcomes data on one procedure that he or she performs frequently, the data generated could become a very valuable quality improvement tool. Analysis of outcomes and practice variations over wide geographic areas could be conducted efficiently, and neurosurgeons in solo practice would be able to participate in the database as easily as those at academic centers. Data in the central database could be analyzed and hypotheses generated to determine best clinical practices. Individual outcomes that differed substantially from the universal database norms would trigger educational intervention. It would then be possible to determine if the intervention had a positive effect on subsequent outcomes. Such a system could be used for Maintenance of Certification, pay-for-performance requirements, state reporting requirements and hospital-based quality improvement efforts in neurosurgery.
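One way a registry might identify outcomes that differ substantially from the norm is to compare each surgeon's observed complication count with the count expected from the risk profile of his or her patients. The following is a minimal sketch under stated assumptions: the per-patient expected risks would come from a risk-adjustment model fitted to the whole registry, which is not shown, and the function name, the threshold of two standard deviations and the case data are all hypothetical.

```python
import math

def review_flag(outcomes, expected_risks, z_threshold=2.0):
    """Compare observed complications with the risk-adjusted expectation.

    outcomes: 1 if the case had a complication, else 0
    expected_risks: predicted complication probability for each case
    """
    observed = sum(outcomes)
    expected = sum(expected_risks)
    # Variance of the complication count if each case follows its predicted risk
    variance = sum(risk * (1 - risk) for risk in expected_risks)
    z = (observed - expected) / math.sqrt(variance)
    return {
        "observed": observed,
        "expected": expected,
        "oe_ratio": observed / expected,
        "z": z,
        "flag": z > z_threshold,  # candidate for educational intervention
    }

# Hypothetical surgeon: 50 cases, each with a predicted 10 percent complication risk
risks = [0.10] * 50
print(review_flag([1] * 6 + [0] * 44, risks))   # near the expected count
print(review_flag([1] * 11 + [0] * 39, risks))  # well above the expected count
```

Because the comparison is against expected rather than raw rates, a surgeon who takes on higher-risk patients is not penalized for doing so. Rerunning the same calculation on cases performed after an intervention would show whether outcomes moved back toward the norm.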
The Normal Pressure Hydrocephalus Registry is an example of a procedure-specific registry that establishes reliable longitudinal data. The participating surgeon establishes a patient’s baseline information by completing an “initiation form” during the first visit. The form details demographics, NPH history and etiology, comorbidities that are present, imaging procedures that have been done, treatment thus far, and supplementary tests such as spinal tap. At the end of the form, the surgeon indicates a decision to follow the patient or to treat the patient surgically. If the decision is to treat the patient, the surgeon completes a “surgical treatment form” following surgery that describes the shunt procedure (new shunt, revision, endoscopic third ventriculostomy), whether a fixed or variable valve was used as well as the shunt’s brand name and valve setting, and the shunt configuration at placement or revision.
Six months after the initial visit or surgical treatment, the “follow-up form” is completed. The same form is completed again after each annual visit for five years thereafter. On it, the surgeon records an assessment of the patient’s status, the degree of improvement, any imaging procedures since the last visit, comorbidities that affect outcome, complications of surgery and recovery, and, if applicable, the date and cause of death.
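The follow-up timetable described above lends itself to a simple calculation of when each form falls due. This sketch is not the registry's software. It assumes that the annual visits are counted from the six-month follow-up, which is one reading of the description, and the function names and baseline date are hypothetical.

```python
import calendar
from datetime import date

def add_months(start, months):
    """Return the date `months` after `start`, clamped to the end of short months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def follow_up_schedule(baseline, annual_visits=5):
    """Due dates for the follow-up form: six months after the initial visit or
    surgery, then yearly (assumed to be counted from the six-month visit)."""
    offsets = [6] + [6 + 12 * year for year in range(1, annual_visits + 1)]
    return [add_months(baseline, months) for months in offsets]

# Hypothetical baseline: shunt surgery on January 15, 2006
for due in follow_up_schedule(date(2006, 1, 15)):
    print(due.isoformat())
```

A schedule of this kind could drive reminders to the participating surgeon, which matters because longitudinal data are only as reliable as the completeness of follow-up.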
Participating surgeons can access the data they have provided to the registry and compare their data on patient symptoms, complications, and shunt procedures to the aggregate information. The aggregate data will also be reviewed regularly by an advisory board whose responsibilities include guiding the scientific direction of the NPH Registry, reviewing and modifying the data collection protocol as necessary, creating and implementing a data analysis and publication review process, reviewing and evaluating domestic and international proposals for analysis and publication of data, and encouraging neurosurgeon participation.
Key elements underlying the success of the NPH Registry are unencumbered accessibility and ease of use. The Web-based information platform supports electronic practice and research tools including electronic data capture that is compliant with regulations set forth by the U.S. Food and Drug Administration and privacy laws. The system allows individual surgeons to access patient information securely through the Internet, including through a hand-held computer, whenever a registry patient presents, and to customize data forms to include additional data elements of interest to them.
A Viable Alternative
I have been involved in clinical trial design and application for many years, including service as chair of the AANS/CNS Committee for the Assessment of Quality, as the AANS/CNS Cerebrovascular Section representative to the AANS Guidelines Committee, as chair of the AANS/CNS Outcomes Committee, as a member of the American Heart Association Stroke Council’s Guidelines Oversight Committee and presently as chair of the AANS/CNS Washington Committee’s Quality Improvement Workgroup. I also have participated in guidelines development for carotid endarterectomy, secondary stroke prevention and management of subarachnoid hemorrhage. This experience has led me to question the application of RCTs as class I evidence in surgery.
Although RCTs are powerful tools for clinical research, their inherent problems for surgical trials make it unwise to rely solely on RCTs to establish standards of care in surgery and then to codify these results in clinical guidelines. To do so lends credence to bad science. If continuous quality improvement and applicability to a wide range of neurosurgical procedures and practice sites are desired, a necessary addition, in my judgment, is a registry that allows continuous collection of data on neurosurgical procedures and risk-adjusted analysis of outcomes. Such data, collected and shared in a nonpunitive environment, have been shown to improve the quality of surgical care. So far, nothing else we have tried has worked.
Robert E. Harbaugh, MD, FACS, is chair of the Quality Improvement Workgroup of the AANS/CNS Washington Committee.