All major changes, configuration updates and release rollouts on the IDnow system are carried out outside the active working hours of the IDnow call centres to ensure there is no impact on system uptime.
However, in certain cases smaller changes are made during working hours in response to customer requests.
On 13th of May, the rollout of one such small change caused downtime of our system. The script change turned out to be faulty, and the error was not detected before rollout by either our automated checks or our internal review process.
Issues: All servers were impacted.
Timeline: 13th of May, 11:50 - 12:35 CET
Customer Impact:
End users could not connect to IDnow agents during the outage because the servers became extremely slow and sometimes non-responsive. End users remained in the queue during this time, however, and could connect to an IDnow agent as soon as the system was back up.
Actions taken:
To recover from the outage, we reverted the change immediately. The system then came back online and was able to handle all incoming traffic.
Mitigation:
We are improving the rollout process for such changes by adding more automated checks and additional testing, and by implementing tighter restrictions on what can be changed in the system, especially during working hours.
We sincerely apologise for the trouble this has caused.
Your IDnow Team