We make all kinds of decisions every day. Some are small yet seem difficult at the time. One I sometimes joke about is ordering off a restaurant menu that has too many good choices. When I finally make my order, I tell the server that I have made my “major life decision” for the night.
Sometimes a group reaches a decision only after weeks or months of deliberation: many stakeholders weigh in, voice their concerns, ask their questions, and refine the plan or recommendation before finally giving their support.
And then there are potentially high-impact decisions that must be made in minutes, with the best information available and only a quick weighing of the risks. I had to make one of those decisions last Friday.
We had scheduled our Epic version 2014 upgrade for the weekend. The plan was to bring down the production system at 12:30 AM Saturday. The system would stay down until 5:00 AM while the final conversion tasks were completed. IT and operations staff were scheduled to work in the command center to monitor the upgrade and address any problems, and daily leadership calls to review issues were set to begin on Saturday.
At 11:51 AM on Friday, I got a text from the executive director responsible for the application infrastructure: “We have a significant issue with the upgrade. We are having an emergency meeting with Epic at noon to discuss.” I excused myself from my meeting with our new UM CISO and joined the call. Within 10 minutes, I understood that a “database lock” error had occurred even though the preparatory steps had been completed. We didn’t know why.
Epic offered two options. One was to cancel the upgrade. The other was an untested alternative upgrade path that would have required a longer downtime. I had to decide which way to go.
All I needed to know was that if we continued with the upgrade as planned, we would not be able to ensure data integrity. With an electronic health record (EHR), we deal with critical clinical patient data. This isn’t just about having a correct bill or getting the patient’s address right. The EHR carries the information our clinicians depend on to care for critically ill patients. A question about data integrity and possible data corruption was enough for me to say “cancel it.”
We asked a few more questions of Epic to be sure we weren’t being too hasty. We weren’t. We all agreed it was the right decision.
When such a decision is made, you need a plan to communicate it. In this situation, we had to pull one together quickly, and it had to cover multiple audiences. The first communication step was mine: I sent a note to our top executives explaining my decision and the reasons for it. I told them we were developing a phased communication plan that would be implemented within the hour, while we worked to determine the root cause of the error.
We held a closing conference call later that afternoon to make sure we had handled all the communication and logistics involved in the cancellation. By then we had identified the root cause: a manual error, and a process that allowed it. Although we have several levels of change management controlling how and when changes are applied to the system, this event showed that they could be tighter. A post-event review determined that we need tighter oversight of the change management process when required application patches are applied to the system in the final days and hours of an upgrade.
We have rescheduled the upgrade for the weekend of May 9th. The necessary changes to our change control process will be in place before we begin the upgrade again.
No one has questioned the decision to cancel the upgrade. Had we gone forward and run into data corruption, you can bet there would have been many questions about why we let that happen. Patient safety always comes first, and we made the right call.