European Data-Center - System Unavailability
Incident Report for IDnow GmbH
Postmortem

POST MORTEM ANALYSIS of Incidents on 2019-11-15

Between 2019-11-15 10:06 UTC+1 and 2019-11-15 11:00 UTC+1 we experienced a full outage of our system.

During this time frame the IDnow system served neither API calls nor video calls.

The root cause analysis showed that the master node of our database cluster failed due to a physical hardware fault in a DIMM. The ECC memory could not correct the errors because multi-bit errors were detected. As this hardware node had been in service for more than 1.5 years without any hardware issue, we decided to restart the node to restore service and to initiate a master node switch-over, which will move the master role to another node. Once the switch-over is completed, we will be able to have the DIMM replaced.
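
To illustrate why ECC memory can correct a single flipped bit but only detect a multi-bit error, the following is a minimal sketch of a SECDED (single-error-correcting, double-error-detecting) code on a 4-bit value. It is an illustration under assumptions, not a description of the affected hardware: the exact ECC scheme of the failed DIMM is not stated in this report, server memory typically uses much wider code words, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED code word (list of bits).

    Positions 1, 2 and 4 hold Hamming parity bits, positions 3, 5, 6 and 7
    hold the data bits, and position 0 holds an overall parity bit.
    """
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of set bits even
    return code


def decode(code):
    """Return (status, nibble); nibble is None if the error is uncorrectable."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits (1..7).
    # It is 0 for a valid code word and equals the position of a single flip.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    parity_odd = sum(code) % 2 == 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Odd number of flips: treat as a single-bit error and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped, and the
        # code cannot tell which ones.
        return "uncorrectable", None

    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)

    one_flip = list(word)
    one_flip[5] ^= 1
    print(decode(one_flip))   # single-bit error is repaired

    two_flips = list(word)
    two_flips[5] ^= 1
    two_flips[6] ^= 1
    print(decode(two_flips))  # multi-bit error is only detected
```

With one flipped bit the decoder recovers the original value; with two flipped bits it can only report the error, which corresponds to the situation described above.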

We also started rearranging our current slave nodes into a cascading (tree) topology in order to allow faster switch-overs in the future.
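
One plausible reason a cascading topology speeds up a switch-over is that fewer nodes have to be re-pointed to a new upstream when the master role moves. The sketch below models this with a simple mapping; the node names (`db1` to `db4`), the function `repoints_needed`, and the cost model itself are assumptions for illustration and do not describe the actual cluster layout.

```python
def repoints_needed(topology, old_master, new_master):
    """Count nodes that must be re-pointed when new_master takes over.

    topology maps each node to the node it replicates from
    (None for the current master).
    """
    return sum(
        1
        for node, upstream in topology.items()
        if upstream == old_master and node != new_master
    )


# Flat layout: every slave replicates directly from the master.
flat = {"db1": None, "db2": "db1", "db3": "db1", "db4": "db1"}

# Cascading layout: one slave follows the master, the rest follow that slave.
cascade = {"db1": None, "db2": "db1", "db3": "db2", "db4": "db2"}

if __name__ == "__main__":
    print(repoints_needed(flat, "db1", "db2"))     # 2
    print(repoints_needed(cascade, "db1", "db2"))  # 0
```

In the flat layout, promoting `db2` leaves two nodes still attached to the failed master; in the cascading layout the remaining nodes already replicate from the promoted node.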

We sincerely apologize for the incident and for not providing our services with the availability we aim for.

Posted Nov 18, 2019 - 15:55 CET

Resolved
We monitored the system closely. The system has been working within normal parameters during the extended monitoring phase.

Impact: During the mentioned time frame our services were not available.

We will finalize our root cause analysis and publish a post mortem for this incident within the next few days.

Incident start: 2019-11-15 10:00 UTC+1
Incident end: 2019-11-15 11:01 UTC+1
Posted Nov 15, 2019 - 13:28 CET
Monitoring
We restored the services. Our monitoring shows that the system is working again within normal parameters. We are monitoring the system closely. We will provide more information about the incident upon closing this issue.

Incident start: 2019-11-15 10:00 UTC+1
Posted Nov 15, 2019 - 11:06 CET
Identified
We identified the root cause of the service unavailability. We are working on restoring the services.

Incident start: 2019-11-15 10:00 UTC+1

We will provide an update within the next hour.
Posted Nov 15, 2019 - 10:56 CET
Investigating
We are currently experiencing a severe service outage in our European data-center. We are investigating the issue with the highest priority and all available resources.

Incident start: 2019-11-15 10:00 UTC+1

We will provide an update within the next hour.
Posted Nov 15, 2019 - 10:19 CET
This incident affected: Europe - IDnow (Video-Ident, eSigning QES, eSigning AES, Photo-Ident, API, AutoIdent).