Google’s cloud infrastructure team has published a detailed analysis of last week’s massive outage. The error made authenticated use of all Google services impossible, including Gmail, YouTube, the Play Store and many others. The provider had already confirmed that the cause was an error related to internal quota rules. As Google now writes, the quota was 0.
According to the team, a number of different circumstances led to this main cause. On authentication itself, the report says: “The Google User ID Service maintains a unique identifier and authentication data for OAuth tokens and cookies for each account. It stores account data in a distributed database that uses Paxos protocols to coordinate updates. For security reasons this service rejects requests if outdated data is detected.”
The report goes on to say that Google uses a number of automation tools to manage quota rules for various resources. The company had also been migrating the Google User ID Service to a new quota system. However, parts of the old system remained in place, which ultimately led to the usage of the service itself being reported as 0. This case was not yet covered by the quota system’s validation rules.
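To illustrate the kind of validation rule that was missing, here is a minimal sketch in Python. It is not Google’s implementation: the names `QuotaProposal`, `proposed_limit` and `validate`, the headroom factor and the 50 percent threshold are all hypothetical. It only assumes that an automated tool derives a new limit from reported usage, so a reported usage of 0 produces a limit of 0 unless a check intervenes.

```python
from dataclasses import dataclass


@dataclass
class QuotaProposal:
    """Hypothetical input to an automated quota change."""
    service: str
    current_limit: int
    reported_usage: int


def proposed_limit(p: QuotaProposal, headroom: float = 1.5) -> int:
    """Naive automation: size the new limit from the reported usage."""
    return int(p.reported_usage * headroom)


def validate(p: QuotaProposal, new_limit: int, max_drop: float = 0.5) -> list[str]:
    """Return reasons to reject the change; an empty list means it passes."""
    problems = []
    # A service that has a quota but reports no usage at all is suspicious:
    # the usage figure may come from a stale or partially migrated source.
    if p.reported_usage == 0 and p.current_limit > 0:
        problems.append("reported usage is 0 for a service with a non-zero limit")
    # Large automatic reductions should not go through unreviewed.
    if new_limit < p.current_limit * (1 - max_drop):
        problems.append(f"limit would drop by more than {max_drop:.0%}")
    return problems


broken = QuotaProposal("user-id-service", current_limit=1000, reported_usage=0)
healthy = QuotaProposal("user-id-service", current_limit=1000, reported_usage=800)

for p in (broken, healthy):
    limit = proposed_limit(p)
    print(p.reported_usage, limit, validate(p, limit))
```

In this sketch the proposal with a reported usage of 0 is flagged by both checks, while the proposal with plausible usage passes.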
Ultimately, the quota for the account database was reduced so far that no new data could be written. Shortly afterwards, the data returned by read operations was outdated, which led to errors when looking up authentication data. Since all Google services rely on the company’s own login service, none of them could be used with a login. Use without authentication, however, was still possible.
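The chain of events described here (blocked writes, then outdated data, then rejected lookups) can be sketched as a toy model in Python. Everything in it is an assumption made for illustration: the class `AccountStore`, the exception names, the single `last_write` timestamp as a freshness signal and the 60-second limit do not come from Google’s report, and the real system is a distributed, Paxos-coordinated database.

```python
class QuotaExceeded(Exception):
    """Raised when a write would exceed the store's quota."""


class StaleData(Exception):
    """Raised when a lookup would have to serve outdated data."""


class AccountStore:
    """Toy account store with a write quota and a freshness check."""

    def __init__(self, quota: int, max_age: float):
        self.quota = quota          # maximum number of writes allowed
        self.max_age = max_age      # seconds before data counts as outdated
        self.used = 0
        self.last_write = 0.0
        self.data = {}

    def write(self, now: float, key: str, value: str) -> None:
        if self.used + 1 > self.quota:
            raise QuotaExceeded("write rejected, quota exhausted")
        self.data[key] = value
        self.used += 1
        self.last_write = now

    def lookup(self, now: float, key: str) -> str:
        # For security reasons, refuse to answer from outdated data.
        if now - self.last_write > self.max_age:
            raise StaleData("lookup rejected, data is outdated")
        return self.data[key]


store = AccountStore(quota=10, max_age=60)
store.write(0, "alice", "token-1")
print(store.lookup(30, "alice"))    # fresh data, lookup succeeds

store.quota = 0                     # quota is wrongly reduced to 0
try:
    store.write(40, "alice", "token-2")
except QuotaExceeded as err:
    print(err)

try:
    store.lookup(120, "alice")      # no writes since t=0, data is too old
except StaleData as err:
    print(err)
```

The point of the model is that the lookup failure follows the quota change with a delay: reads keep working until the last successful write is older than the freshness limit.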
As an immediate fix, enforcement of the quota rules was disabled, and this change was then rolled out to all of the company’s data centers so that the individual services quickly became available again. As a long-term solution, Google wants to improve the reliability of the authentication service so that similar incidents do not recur.