Newark Air Traffic Control Failures: A cautionary tale for managing aging technology and systems

One of the busiest airports in the eastern United States, Newark Liberty International Airport, has over the last several months been affected by multiple technical failures in the Air Traffic Control (ATC) systems that govern its airspace.

Aside from the obvious safety concerns, this is a fascinating story about incredibly complex and critical systems, and what happens when those systems begin to age. Many of us work with complex systems, and there are lessons here that can help us avoid catastrophes and near-misses.

So, what exactly happened? Let’s look at the recent timeline:

  • April 28 – A radar facility in Philadelphia that routes planes in and out of Newark goes “dark”, effectively causing controllers to lose track of aircraft, and leading to hundreds of flights being delayed or cancelled. Several controllers go on trauma leave.
  • May 9 – A similar incident occurs in the early hours and lasts for 90 seconds. The FAA says that upgrades to communication lines between the Philadelphia and New York facilities are planned, with officials noting that some very old copper wiring is to be replaced with fibre optic cable.
  • May 21 – The FAA announces limits on the number of aircraft arriving at and departing from Newark, initially 28 arrivals and 28 departures per hour.
  • September 25 – The FAA announces that the limits are still in place but have been increased to 36 arrivals and departures per hour.

Meanwhile, other factors complicate the situation, including an upgrade to one of the runways and a severe shortage of air traffic controllers.

But the systems themselves were failing. Why? As revealed in a 2024 Government Accountability Office report to congressional committees, well over half of the systems used by ATC are either unsustainable or potentially unsustainable. Of these systems, over half “…have critical operational impacts on the safety and efficiency of the national airspace.”

Why are these systems considered (potentially) unsustainable? To quote from the GAO report, “…aging systems have been difficult to maintain due to the unavailability of parts and retirement of technicians with expertise in maintaining the aging systems.” While some of these systems are subject to ongoing investments by the FAA, many of the upgrades are not due for completion for several years, or longer.

Clearly the FAA faces enormous challenges, and we’ve seen the ATC staff continue to work tirelessly to keep travellers safe despite limitations and shutdowns. But the key question is this: what lessons can we learn for our own internal systems?

Whether we are dealing with production systems, R&D systems, or something else, we need a plan for maintaining them and keeping them running even as available technology changes, and as new components and perhaps new requirements are introduced.

Let’s break down some of the common issues:

Legacy Hardware

Quality hardware can last a very long time, especially when it is fit for purpose. Are you keeping track of the availability of replacement components? Are available replacement components identical, mostly identical, or very different? What are the unknown impacts of replacing a key component of a system with a slightly newer version of the original component? For example, do your systems still rely on floppy disks or optical drives for removable storage?

Legacy Software

When software works, we can easily be lulled into a false sense of security. Does the software you rely on get routine updates, particularly for security? Is it still supported by its creator, or has the community moved on to newer solutions? One prime example is the operating system. Microsoft recently announced the end-of-life for Windows 10. Do you have PCs still running on even older versions, just to support out-of-date software?
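One way to keep these questions from being forgotten is to hold end-of-support dates alongside your asset list and check them on a schedule. The following is only a minimal sketch in Python: the asset names, the one-year warning window, and the function names are assumptions made for illustration, and the dates shown should be confirmed against the vendor's own published lifecycle information before you rely on them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Asset:
    name: str           # hypothetical asset label
    software: str       # operating system or application it depends on
    support_ends: date  # vendor's published end-of-support date


def support_status(asset: Asset, today: date, warn_days: int = 365) -> str:
    """Classify an asset as 'unsupported', 'expiring', or 'supported'."""
    days_left = (asset.support_ends - today).days
    if days_left < 0:
        return "unsupported"
    if days_left <= warn_days:  # warning window is an arbitrary assumption
        return "expiring"
    return "supported"


def audit(inventory: list[Asset], today: date) -> dict[str, str]:
    """Map each asset name to its support status on the given day."""
    return {asset.name: support_status(asset, today) for asset in inventory}


# Illustrative inventory; the machine names are made up.
inventory = [
    Asset("lab-pc-01", "Windows 7", date(2020, 1, 14)),
    Asset("office-pc-07", "Windows 10", date(2025, 10, 14)),
]

if __name__ == "__main__":
    for name, status in audit(inventory, date.today()).items():
        print(f"{name}: {status}")
```

Even a list this simple makes it obvious which machines are already running unsupported software and which will join them within the year.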

Staffing & Support

Who is responsible for maintaining the hardware, monitoring it, and sounding the alarm if they detect anything of concern? Rather than leaving this to the end users, a proactive approach of routine security and system health audits by IT professionals can help prevent system failures before they occur. Appropriate security training for users and clearly defined responsibilities are also necessary.

Redundancy

For mission-critical systems, a failure response plan is not enough. Sooner or later, all hardware will fail. Are there redundant components in place to eliminate downtime? It could be as simple as using a RAID array for storage drives, or power supplies with hot-swappable modules. These allow the system to continue running without data loss despite the failure of one or two components. With larger virtual computing environments, multiple host systems can provide instant failover support to keep your applications online. What measures do you have in place when something critical inevitably fails?
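To get a rough feel for what redundancy buys you, a back-of-the-envelope calculation can help. The sketch below assumes that redundant components fail independently and that the system stays up as long as at least one copy is working. Real components often share a power feed, a room, or a firmware bug, so treat the result as an optimistic upper bound rather than a prediction. The function names and the 99% figure are illustrative assumptions.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components in parallel.

    Assumes independent failures, with the system counted as up
    whenever at least one copy is up: 1 - (1 - a) ** n.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a fraction between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Compare one, two, and three copies of a component that is up 99% of the time.
    for copies in (1, 2, 3):
        a = combined_availability(0.99, copies)
        print(f"{copies} copies: {a:.6f} available, "
              f"about {expected_downtime_hours(a):.2f} hours down per year")
```

Under these assumptions, a single component that is available 99% of the time is down for roughly 88 hours a year, while a redundant pair is down for under an hour.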

These are sensible questions that effective managers should ask. The IT professionals at General Technics can guide you to the answers and help you manage the hardware of those systems. We specialize in designing custom systems that can be supported over longer life cycles than most consumer-level equipment can deliver. And we can help you plan upgrades to aging systems without disrupting your workflow.

Call us today at 1-800-487-2538 or drop us an email at sales@gtweb.net, or fill out the form below.