Common Sense: The Case Against EPIDs for QA of Each Fraction

Introduction: Turn Down the Hype (So You Can Hear the Logic)

As medicine evolves, it is important to use logic and common sense when assessing new technologies and methods. Radiation therapy is no exception. Before we adopt something new, it should be proven to add new value. It should improve patient outcomes, or increase efficiency, or decrease variability by driving out sources of errors; or, ideally, all of the above. Likewise, we should not adopt something new if it is inferior to competing methods or modalities, and especially not if it adds layers of complexity and cost.

Some promote the on-board megavoltage electronic portal imaging device (MV EPID) as a vital tool for the quality assurance (QA) of each radiation treatment fraction, presenting it as some sort of panacea for in vivo QA and dosimetry. However, a common sense evaluation of the alleged merits of EPID-based solutions (for per-fraction applications) reveals that the arguments quickly unravel, especially when you consider other more robust and sensible methods to detect the same errors.

Look No Further: A Recent Publication on EPID-Based in vivo Dosimetry

To analyze the potential of EPID-based methods for per-fraction applications, one need look no further than the pioneering efforts detailed in the literature. This has been a hot topic not just lately but for the last five to ten years (ever since the MV EPID became a standard offering with modern linear accelerators), as evidenced by a rich library of publications. Even the publications that seem to endorse EPID-based in vivo dosimetry cannot help but reveal many of its inherent weaknesses.

Consider a new paper by Bojechko and Ford, “Quantifying the performance of in vivo portal dosimetry in detecting four types of treatment parameter variations,” published in the December 2015 issue of Medical Physics (volume 42). This paper centers on sensitivity analyses in which the authors: 1) introduce errors of various types, then 2) quantify how well an EPID-based system detects those errors. In summary, the EPID-based system studied was able to detect errors in overall dose level, systematic MLC leaf positional errors, and large changes in patient contour (wrong patient or extreme weight loss, for example). The EPID-based system failed, however, to perform well at detecting random MLC leaf errors or, more importantly, patient setup errors.

It is an interesting exercise to first examine not where the EPID-based system failed, but where it apparently succeeded, because even here it is easy to see that these errors are best detected with other more sensitive and simple tests.

  • Dose Level. First let’s take the “overall dose level” error (e.g. linac running hot or cold). Clearly, this error is best detected and quantified before any patient treatments have begun, and machine output checks are part of a standard and quick routine that can be performed one or more times per day. To find an error like this during the QA of a single fraction for a single patient (and most likely not be alerted until after fractions on multiple patients have been delivered) is nonsensical and inefficient. And, on the off chance that a single patient’s fraction received too many or too few monitor units (MU), there are other checks that would have been quicker and more sensitive than a blurry transmission image, not the least of which is an alert from the R&V system or a comparison of the machine log file against the expected MU from the plan.
  • Systematic MLC Leaf Positional Errors. Again, a systematic error like MLC leaf bank shifts is not something you want to be learning from an actual patient fraction! Checking for this is also part of routine machine QA, and there are easy-to-perform, highly sensitive tests for such errors. Planning to detect such a systematic fault during actual patient treatments is inefficient and reflects poor adherence to the basics of quality management.
  • Changes in the Patient Body Contour. Two obvious real-world examples of changes or differences in the patient body contour would be: 1) gross weight loss or gain/swelling (compared to when the planning images were taken) or 2) the wrong patient is on the table. On the latter, it is obvious that if we are depending on EPID images and software to warn of the wrong patient on the table, then there are bigger problems in the process. As for the former (i.e. weight loss, swelling, or weight gain), transmission EPID images are a crude tool at best. Modern radiation therapy employs volumetric image guidance techniques such as cone beam CT (CBCT) to find anatomy differences and display them clearly before the fraction is delivered. The only caveat is if the field of view (FOV) is much smaller than the patient volume. In any case, assessing whether the patient shape is different enough to warrant a revised treatment plan will require a volumetric image series anyway, and the EPID-based data are of limited value, as they “flatten” the data and make it almost impossible to diagnose what the anatomy changes actually were. It’s also worth noting that if transmission EPID images are the only data input, then a decreased patient volume will result in higher EPID pixel values, which could be hard to differentiate from a delivery error of high dose output, and an increased patient volume will result in lower EPID pixel values, which will resemble a low dose output. To correctly diagnose the true error would require the log files and a CBCT or other volumetric image series, and if you had those to begin with then the EPID is merely an extraneous distraction.
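
The ambiguity between a change in patient thickness and a change in machine output can be made concrete with a toy model. The sketch below assumes primary-only exponential attenuation (no scatter, no beam hardening, no EPID response modeling) and an effective attenuation coefficient of 0.05 per cm in water, a rough illustrative value for a 6 MV beam. The function names, the 20 cm path length, and the 3% output error are hypothetical and do not come from the paper discussed above.

```python
import math

# Assumed effective linear attenuation coefficient of water (1/cm).
# Illustrative only; a real beam is polyenergetic and the EPID also sees scatter.
MU_WATER_PER_CM = 0.05


def transit_signal(output_scale, thickness_cm, mu=MU_WATER_PER_CM):
    """Relative on-axis transit signal for a given machine output and path length."""
    return output_scale * math.exp(-mu * thickness_cm)


def mimicking_thickness_loss(output_scale, mu=MU_WATER_PER_CM):
    """Thickness decrease (cm) that raises the signal as much as the output error does."""
    return math.log(output_scale) / mu


planned_thickness_cm = 20.0  # hypothetical path length through the patient
output_error = 1.03          # hypothetical linac running 3% hot

# Case A: machine output is 3% high, patient unchanged.
hot_linac = transit_signal(output_error, planned_thickness_cm)

# Case B: machine output is correct, patient is thinner along the beam path.
thickness_loss_cm = mimicking_thickness_loss(output_error)
thin_patient = transit_signal(1.0, planned_thickness_cm - thickness_loss_cm)

print(f"Thickness loss mimicking a 3% output error: {thickness_loss_cm:.2f} cm")
print(f"Signals indistinguishable: {math.isclose(hot_linac, thin_patient)}")
```

Under these assumptions, a 3% output error and roughly 0.6 cm of lost tissue along the beam path give the same on-axis transit signal, so the transmission image alone cannot say which one occurred.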

Now that we have examined where the EPID-based system “succeeded,” we can discuss the two error types where it failed.

  • Random MLC Leaf Positional Errors. The authors correctly explain that random errors tend to cancel each other out over time. However, if significant random errors in MLC leaf position are happening, it may speak to a bigger problem that is worth finding and not ignoring. To study MLC leaf positional accuracy with a fine-tooth comb requires the log file data with sub-millimeter precision; EPID images with blurry pixels whose sizes are on the order of the positional errors themselves are of little to no value. The results of the aforementioned publication bear this out.
  • Patient Setup Errors. Perhaps the one potential (but not realized) advantage of an EPID-based system over a log-file-based system would be if the transmission image could accurately detect and diagnose patient position errors. Contrary to how such systems are being marketed, this latest study shows that an EPID-based system does not effectively detect such errors. Then again, perhaps it is a moot point, because patient position errors are best detected prior to treatment by image guidance such as CBCT, which allows the patient to be repositioned accurately before the beam is turned on. If a patient position error happens after the image guidance procedure but before (or during) treatment, then there is little that could be done anyway outside of real-time imaging to shut off the treatment if a positional change exceeds a threshold, or retrospective dose reconstruction on the shifted patient volume, which itself would require a volumetric image of the shifted patient. In any case, transmission EPID images of highly modulated treatment beams are of little to no value in detecting whether there were such errors, much less what the positional error was. Simply put: patient position errors are a job for high quality, volumetric image guidance, not low quality 2D transmission images of complex, intensity-modulated fields.

Now, it is important to realize that the limitations found in this particular study are inherent to any EPID-based system, and not to one specific implementation or design. That is, it is not likely that any improved system with EPIDs as the primary input would do much better.

Don’t Forget: Specificity Matters Too

We have summarized some of the sensitivity results from the aforementioned paper. If a system lacks sensitivity, one logical way to try to increase it is to tighten the error tolerance, or alert threshold. This by definition will increase the number of “alerts,” but it does not mean that each alert is warranted, i.e. clinically relevant.

In fact, another recent publication that used the same general EPID-based methodology clearly illustrates this point: “Overview of 3-year experience with large-scale electronic portal imaging device–based 3-dimensional transit dosimetry” by Mijnheer et al., in Practical Radiation Oncology. In their article they state that almost 4,700 of approximately 15,000 plans that were evaluated by their system exceeded their alert criteria. Yet, when those 4,700 results were analyzed in detail and using other data sources, only 35 were found to actually have clinically relevant deviations. In other words, roughly 99 of every 100 alerts were false positives, meaning the system has poor specificity. This could create inefficiencies and logjams in the workflow. A large rate of false positives may also cause the users to distrust, and perhaps eventually ignore, the alerts. This is akin to the story of “the boy who cried wolf” (and as we all remember, it didn’t end well for the boy).
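
The arithmetic behind these figures is worth spelling out. The short calculation below uses the rounded counts quoted above (about 15,000 plans, almost 4,700 alerts, 35 clinically relevant deviations) and assumes, for illustration only, that no clinically relevant deviation went un-alerted. The variable names are ours, and the results are approximate because the inputs are rounded.

```python
# Rounded counts as quoted from Mijnheer et al. in the text above.
total_plans = 15_000
alerts = 4_700
true_positives = 35

# Alerts that did not correspond to a clinically relevant deviation.
false_positives = alerts - true_positives

# Assumption (not stated in the text): every relevant deviation raised an alert,
# so all plans without an alert count as true negatives.
true_negatives = total_plans - alerts

# Fraction of alerts that were clinically relevant (positive predictive value).
ppv = true_positives / alerts

# Fraction of alerts that were false alarms.
false_alarm_fraction = false_positives / alerts

# Fraction of plans without a relevant deviation that correctly raised no alert.
specificity = true_negatives / (true_negatives + false_positives)

print(f"Alerts that were clinically relevant: {ppv:.2%}")
print(f"Alerts that were false alarms:        {false_alarm_fraction:.2%}")
print(f"Specificity under the assumption:     {specificity:.2%}")
```

On these numbers, fewer than 1 in 100 alerts pointed to a clinically relevant deviation, and roughly 3 in 10 plans without such a deviation still triggered an alert.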

Final Thoughts

It is clear that clinicians want to do what is best for their patients given the constraints of budget and time. This means we must be diligent in assessing, for lack of a better term, the “bang for the buck” of our options. Our goals are to deliver the right amount of radiation dose to the right volume in space, and in the case of imperfections or errors we need to not only detect them right away, but also diagnose root causes efficiently so we can immediately remedy the situation. A rational reading of publications on EPID-based in vivo dosimetry systems makes a few things clear. First, the things these systems purport to do are not done particularly well, and can be done more effectively in other ways with tools likely already at your disposal. Second, there are potential per-fraction errors that are not detected well (if at all) by EPID-based systems. Third, if and when errors are flagged by EPID-based systems, the specificity is low, and fully diagnosing the error requires log file data and volumetric IGRT (e.g. CBCT); if you had the log file data and CBCT, the EPID data become extraneous. Those are three big strikes, and as we know, three strikes and you’re out. That’s just common sense.