Microsoft’s AI researchers inadvertently made terabytes of confidential internal data publicly accessible

Microsoft’s AI researchers accidentally made tens of terabytes of sensitive data, including private keys and passwords, accessible when they published a storage bucket of open-source training data on GitHub. Cloud security startup Wiz, which is actively investigating inadvertent data exposures, uncovered a GitHub repository associated with Microsoft’s AI research division. This repository was intended to provide open-source code and AI models for image recognition. Users were instructed to download the models from an Azure Storage URL. However, Wiz discovered that this URL had been misconfigured to grant permissions to the entire storage account, inadvertently exposing additional private data.

Among the exposed data were 38 terabytes of sensitive information, including personal backups from two Microsoft employees’ personal computers. The dataset also contained other sensitive personal data, such as passwords for Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from hundreds of Microsoft employees.

Wiz pointed out that the misconfiguration in question was related to an overly permissive shared access signature (SAS) token embedded in the URL rather than direct exposure of the storage account. SAS tokens are a mechanism used by Azure to create shareable links that grant access to an Azure Storage account’s data.
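To make the risk concrete, the sketch below parses the query string of a hypothetical account-level SAS URL and flags the kinds of over-broad grants Wiz described: permissions beyond read (the `sp` parameter), scope covering the whole account rather than individual objects (the `srt` parameter), and a far-future expiry (the `se` parameter). The URL and the `audit_sas_url` helper are illustrative assumptions, not code from Microsoft or Wiz.

```python
from urllib.parse import urlparse, parse_qs

def audit_sas_url(url: str) -> list[str]:
    """Flag risky grants in a hypothetical Azure account SAS URL (illustrative only)."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    warnings = []
    # 'sp' lists granted permissions: r = read, w = write, d = delete, l = list, ...
    perms = set(params.get("sp", ""))
    if perms - {"r"}:
        warnings.append(f"grants more than read access: sp={params.get('sp')}")
    # 'srt' scopes an account SAS: s = service, c = container, o = object.
    if set(params.get("srt", "")) & {"s", "c"}:
        warnings.append("scoped beyond individual objects (srt includes service/container)")
    # 'se' is the expiry timestamp; a far-future date keeps the link live for years.
    expiry = params.get("se", "")
    if expiry[:4].isdigit() and int(expiry[:4]) > 2030:
        warnings.append(f"expires far in the future: se={expiry}")
    return warnings

# Hypothetical over-permissive SAS URL (not a real token or account):
url = ("https://example.blob.core.windows.net/models?"
       "sv=2021-08-06&ss=b&srt=sco&sp=rwdl&se=2051-10-01T00:00:00Z&sig=FAKE")
for w in audit_sas_url(url):
    print("WARNING:", w)
```

All three checks fire for this synthetic URL, mirroring the failure mode in the incident: a link meant to share read access to a few files instead granted broad, long-lived access to an entire storage account.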

Ami Luttwak, co-founder and CTO of Wiz, emphasized the importance of enhanced security measures as AI technology continues to advance. While AI offers significant potential for tech companies, the vast amounts of data handled by data scientists and engineers necessitate additional security checks and safeguards. The incident involving Microsoft highlights the challenges in monitoring and preventing such cases.

Wiz reported its findings to Microsoft on June 22, prompting Microsoft to revoke the SAS token on June 24. Microsoft completed its investigation into the potential organizational impact on August 16. According to Microsoft’s Security Response Center, no customer data was exposed, and no other internal services were jeopardized due to this issue.

In response to Wiz’s research, Microsoft has expanded GitHub’s secret scanning service, which now monitors all public open-source code changes for plaintext exposure of credentials and other secrets, including SAS tokens with overly permissive expirations or privileges.
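As a rough illustration of what such scanning looks for, the sketch below flags lines containing the telltale combination of an Azure SAS signature (`sig=`) and a service-version parameter (`sv=YYYY-MM-DD`). GitHub's actual secret scanning uses provider-supplied patterns and validation, so this simple regex heuristic is an assumption for demonstration, not the real detection logic.

```python
import re

# A SAS-bearing URL carries a base64-ish 'sig=' signature alongside an
# 'sv=' service-version date; seeing both in one line is a strong hint.
SIG_HINT = re.compile(r"[?&]sig=[A-Za-z0-9%/+=]{20,}")
SV_HINT = re.compile(r"[?&]sv=\d{4}-\d{2}-\d{2}")

def looks_like_sas(line: str) -> bool:
    """Heuristic check: does this line of text appear to embed a SAS token?"""
    return bool(SIG_HINT.search(line) and SV_HINT.search(line))

# Synthetic example (the signature is fake):
leaked = "https://acct.blob.core.windows.net/c?sv=2021-08-06&sp=rwdl&sig=" + "A" * 24
print(looks_like_sas(leaked))                 # prints True
print(looks_like_sas("no secrets here"))      # prints False
```

A real scanner would also need to run on every pushed commit, check historical revisions, and report findings to the token's issuer so it can be revoked, as happened here.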