Internet Archive Was Exposing User Email Addresses for Years Before Recent Breach

The Internet Archive was recently the target of a data breach that exposed information related to 31 million users, including their usernames and email addresses, among other materials. The group SN_Blackmeta has claimed responsibility for a concurrent DDoS attack that took the site offline. The party responsible for the data breach has not yet been identified.



The nonprofit Internet Archive plays a vital role in online culture, preserving web content and other digitized materials and operating the popular Wayback Machine, which lets visitors see historic versions of websites.

It is not yet clear how the data breach occurred, though some in the information security community have speculated that credentials for the Internet Archive’s servers may have been found in the logs of “information stealer” malware, which exfiltrates sensitive information from infected systems.  

The recent data breach is not the only way that Internet Archive user email addresses have been vulnerable online. For more than a decade, the Internet Archive has been exposing the email addresses of anyone who uploaded a file to its library, despite its claims that it does not share uploader email addresses with anyone.

When content is uploaded to the Internet Archive, a metadata file is automatically generated that includes a variety of information about the content, such as the date of upload, any user-entered description of the file’s contents, the subject, and the media type. Alongside this metadata, however, there is an “uploader” field that shows the uploader’s email address. The metadata file is publicly viewable by clicking the “Show All” link on the main page of any uploaded content. The metadata can also be accessed by going to a specific metadata URL for the file.
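For uploaders who want to check their own items, the following is a minimal sketch of how a saved copy of such a metadata record could be inspected for a visible email address. It assumes the record is JSON with a top-level `metadata` object containing an `uploader` key; that layout, the sample record, and the helper names are illustrative assumptions, not a description of the Archive's actual format. It makes no network request.

```python
import re
from typing import Optional

# Loose "looks like an email address" pattern; not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def exposed_uploader_email(record: dict) -> Optional[str]:
    """Return the uploader value if it looks like an email address, else None.

    Assumes a layout like {"metadata": {"uploader": "..."}} (an assumption).
    """
    uploader = record.get("metadata", {}).get("uploader")
    if isinstance(uploader, str) and EMAIL_RE.match(uploader):
        return uploader
    return None


def redact(email: str) -> str:
    """Keep only the first character of the local part when printing."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


# Hypothetical record, shaped like the fields the article describes.
sample = {
    "metadata": {
        "identifier": "example-item",
        "mediatype": "texts",
        "uploader": "jane.doe@example.com",
    }
}

found = exposed_uploader_email(sample)
if found:
    print("Uploader email is visible in this record:", redact(found))
else:
    print("No uploader email found in this record.")
```

The check reads only the `uploader` field, so an address typed into a free-text description would not be caught by it.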

Users have been raising concerns about the visibility of email addresses at Internet Archive for more than a decade. On its own site, in response to the question of “How can I contact the person / group who uploaded an item?”, the Internet Archive states that it is “unable to release any contact information for patrons.” Similarly, in a section of its guide titled “Why do you need my email address?”, the Internet Archive explains that it needs email addresses to verify accounts, allow users to log into accounts, help recover passwords, and receive notifications. The Archive goes on to “promise we will not share your data with anyone.”

Despite these assurances, however, the Internet Archive appears to readily reveal the email address of content uploaders, ignoring support requests from users who flagged the issue for years. In 2013, a user made a post on the Archive’s support forums pointing out that uploader information, specifically the uploader’s email address, was made available in a metadata file the Archive generated for every upload. The post didn’t receive a response from anyone at the Archive. 

In 2024, another user opened an issue on the Internet Archive’s GitHub page, referencing the earlier 2013 post and similarly detailing the fact that uploader emails are publicly viewable. “There is nothing on the website warning users that their email addresses are going to be exposed,” the post states. It goes on to describe this as a “betrayal of uploaders’ privacy.” Even if users subsequently updated the email address affiliated with their account, older uploads still revealed the email address that was associated with the account at the time of the upload, the user noted. As with the earlier post from 2013, no one from the Internet Archive publicly responded to the raised issue.

The Internet Archive did not immediately respond to questions about the breach or about why uploader emails are made public, despite documentation stating that uploader emails are not shared with anyone.

To mitigate the adverse impact of potential account leaks, users should have a unique, random password for each of their accounts. That way, if a particular service is breached, attackers can’t reuse the same password to try to get into other accounts, in what’s known as a credential stuffing attack. In this case, password materials included in the breach were hashed, or scrambled, using a secure algorithm, meaning victims of the attack shouldn’t be immediately at risk.
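As an illustration of what “unique, random” means in practice, here is a minimal sketch that draws each password from Python's standard `secrets` module, which is designed for security-sensitive randomness. The function name, the default length of 20, the minimum of 12, and the example service names are assumptions chosen for illustration; most people would let a password manager do this instead.

```python
import secrets
import string

# Characters to draw from: letters, digits, and ASCII punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password built with a cryptographically secure RNG."""
    if length < 12:
        # The minimum is an illustrative policy choice, not a standard.
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# One independent password per (hypothetical) service, never reused.
services = ("archive.example", "mail.example", "bank.example")
passwords = {service: generate_password() for service in services}

for service in passwords:
    print(service, "-> password of length", len(passwords[service]))
```

Because every password is generated independently, a password leaked from one service gives an attacker nothing to try against the others.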

To further safeguard yourself against data breaches, choose random and unique usernames for each online service. Setting up a unique email address for every online account makes things even more secure, and it isn’t as cumbersome as one might think, thanks to new services offered by some email providers.
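One simple way to approximate a per-account address is plus-addressing, where a tag is added after a `+` in the part before the `@`. This is a different and weaker technique than the dedicated alias services mentioned above, and not every provider or signup form supports it. The sketch below uses a hypothetical helper, `make_alias`, and a placeholder address.

```python
import secrets


def make_alias(base_address: str, service: str) -> str:
    """Build a plus-addressed alias such as local+service-1a2b3c4d@domain."""
    local, sep, domain = base_address.partition("@")
    if not sep or not local or not domain:
        raise ValueError("expected an address of the form local@domain")
    # Keep only letters and digits from the service name for the label.
    label = "".join(ch for ch in service.lower() if ch.isalnum())
    # Random suffix (8 hex characters) so the alias cannot be guessed
    # from the service name alone.
    tag = secrets.token_hex(4)
    return f"{local}+{label}-{tag}@{domain}"


alias = make_alias("jane@example.com", "Internet Archive")
print(alias)
```

A plus-addressed alias still contains the base address, so it mainly helps with tracing which service leaked or sold an address. Alias services that issue unrelated addresses hide the real one entirely.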

The post Internet Archive Was Exposing User Email Addresses for Years Before Recent Breach appeared first on The Intercept.