The history of the internet and the many old websites that are going to disappear.

Author: Adam Rogers

Adam Rogers is a writer and journalist recognized for his work on science, technology and culture. He has contributed to various publications, exploring how innovation and scientific advances impact everyday life.





November 24, 2024

Historians once used the term Dark Age for the first centuries of medieval Europe, a period marked by a scarcity of sources that makes it hard to reconstruct reliably. Now it seems we are facing a new Dark Age, this time a digital one, as many URLs are disappearing, and with them, part of our memory.

The prophesied digital apocalypse is here, and it has been proclaimed by a post published on a blog.

The headline of the post, from July 18th, sounded quite cryptic. “Google’s URL shortener links will no longer be available,” it stated. I know, I know, calling it an apocalypse might sound exaggerated (it’s not exactly an alien zombie attack from a deadly dimension). But the news scared me. It means that part of the web is about to disappear. This – it’s important to remember – is not a new issue; we’ve seen these disappearance effects before, even when there were projects that tried to preserve the information that was about to vanish. There is a certain logic, as the growth of the Internet – something we discussed in a research paper a few years ago – is unstoppable. At that time, we calculated that, if someone wanted to read all the content of the web, they would need to spend approximately one hundred years doing so, without moving from their spot, 24 hours a day. We had also examined the so-called “redundant information,” meaning the repetitions within the Web, which at that time amounted to 30%-40% of the information online.

Here is the issue: Google used to run an online service that generated short, easy-to-use versions of long and cumbersome URLs, the addresses that identify everything on the internet. Shorter URLs are easier to track and better for e-commerce. Google stopped creating new ones in 2019, but the short URLs it had already generated continued to do their job. You would click on one, and it would take you to the correct webpage, as it should.

Well, that’s history now. In the post published on its blog, Google announced that starting next year, all shortened links will be deactivated. Poof. On the internet, if your URL doesn’t work, it’s as if you don’t exist. You become impossible to find. Without a laborious redirection process, everything behind those links – billions of them, a whole decade of digital content – will become inaccessible. It will disappear completely.

Now, making a quantity of web content invisible doesn’t mean the end of the world. Not by itself. The problem is that these kinds of things keep happening. And it’s getting worse. Social media is crumbling. Digital media outlets are facing closure. Companies are removing their online products. Links are getting corrupted. Files can’t be found. The cloud, as some jokers have pointed out, is nothing more than a concept actually based on “other people’s computers.” And when those clouds shut down, there’s no trace of what they were storing.

Maybe none of this matters much right now. But it will. The internet has become the default archive of our history and culture, and right now, it’s burning before our eyes, like the Library of Alexandria, only worse. For the first time since humans began carving on rocks, we are creating an era without history. We are about to enter the Digital Dark Age.

Attempts to quantify the extent of the problem are heartbreaking. Half of the links in U.S. Supreme Court decisions no longer lead to the cited information. A 2021 report found that a quarter of the more than 2.2 million hyperlinks on The New York Times’ website were broken. Even worse, the Pew Research Center estimates that a quarter of everything published on the web between 2013 and 2023 is currently inaccessible, and that nearly 40% of the web as it existed in 2013 simply doesn’t exist today, barely a decade later.

The destruction of those links wouldn’t worry me so much if they hadn’t replaced what was there before, if museum halls and dusty library shelves were still serving as the warehouses of our collective memory. It’s not that I miss the days of old newspapers preserved on microfiche, or trying to convince a librarian to get an international interlibrary loan. I’m glad many old movies are streaming, and that many out-of-print books are just a click away. But archives and databases are more than places to store old things; what we preserve defines who we are. Today, so much of everything is just digital that when it disappears, it leaves a void in our shared culture.

Gawker is gone. So is the archive of The Awl, the beloved cultural criticism site. You can go to a library and read the entire output of long-dead newspapers like the Los Angeles Herald Examiner or New York Newsday, but God help you if you want to read old articles from Vice. Ownership issues over what was once Paramount have resulted in the removal of decades of shows from MTV and Comedy Central.

The Cartoon Network archive is also gone, as are Yahoo Groups, Yahoo Answers, large parts of the Imgur photo service, the spicy parts of Tumblr that were removed in a pornography purge, everything that happened on Friendster and other social networks before Facebook, Club Penguin, Neopets, Geocities, AOL, and Prodigy. Or Tuenti in Spain. Large stretches of video games made for now-obsolete systems are today unreachable memories.

Hard drives have a finite lifespan, and those that the music industry used to store data in the 1990s, before the transition to digital formats, are deteriorating. The U.S. Department of Veterans Affairs is legally obligated to preserve all medical records for 75 years after a veteran’s death, but it’s facing problems, partly due to a faulty digital records system. And that’s not even mentioning things like personal photographs, most of which now only exist on your phone and nowhere else.

All the emails you sent or received at your last job, or anything a deceased family member kept on a now-unusable computer? Gone. These are the things that make us who we are. Yet I challenge you to find them.

There are always brave souls trying to rescue scrolls from a burning library. But it’s hard to rescue something that only exists on an ethereal level. “If a library catches fire, it’s a tragedy, but most of the books survive somewhere else,” notes Mark Graham, a leading internet archivist. “But the digital world is inherently fragile and potentially ephemeral,” he adds.

Graham is the director of the Wayback Machine, a project created decades ago to collect and store digital copies of web pages to prevent them from being lost. Gawker? Yes, they managed to preserve most of it. Regarding the Pew study I mentioned earlier, which stated that more than a third of recent internet content had disappeared, Graham explains: “When we repeated their study using their data, we found that about two-thirds of that material was securely stored in the Wayback Machine. So, in reality, only one-ninth has been lost.”

The Wayback Machine automatically archives more than 1 billion URLs every day. It also maintains the hundreds of millions of links in the 320 language editions of Wikipedia, which are being lost at a rate of no less than 10,000 daily. Recently, Graham worked on preserving 5,000 videos from a YouTube channel run by Rohingya activists, whose people were subjected to genocide in 2017. “We were asked to archive them because YouTube regularly deletes videos from its platform. They don’t even leave the metadata, so it’s impossible to know what content has been removed,” Graham says. He adds that he managed to preserve all the videos except one, which had age restrictions.

Usually, the biggest obstacle for the Wayback Machine is paywalls. Most articles in the world’s scientific journals, for example, are widely available to anyone with a university subscription. But the articles are prohibitively expensive for the rest of us, even though our taxes paid for the research they describe. An archive is not truly an archive if it’s not accessible to everyone.

And currently, there’s a new threat to archiving our history: artificial intelligence. When websites don’t want AI to absorb their content, they block a certain type of digital crawler bot, the same type used by the Wayback Machine. “This has happened almost overnight,” says Graham. AI, with its insatiable hunger for training data, can’t access many pages. But neither can archivists. As a result of AI’s rise, more intelligence, paradoxically, will disappear.

Let’s be clear: this goes beyond the disappearance of some news articles or content from your favorite comic. What an archive is able to preserve, even the formats that fit in its file cabinets or databases, literally determines what gets remembered. If you preserve, for example, 18th-century banking records, but not sewing patterns, you’ll forget many people. Similarly, if your digital archive only preserves the records of profitable companies—because those that went bankrupt end up destroying their servers—you lose the memory of everything those vanished companies worked on. And what is remembered from the past determines what we can do in the present. “Society is memory. When you lose that memory, what does it imply?” sums up Marlene Manoff, former chief collection strategist at MIT Libraries.

Unreadable hard drives and disappearing links aren’t the only threats to the historical record. Think about selfies. Fifteen years ago, a researcher from the Scripps Institution of Oceanography named Loren McClenachan wanted to know if commercial overfishing and environmental changes were making fish smaller. To find out, she reviewed five decades of photos from winning catches in sport fishing competitions in Key West, Florida. It turned out that the fishing boat company organizing the competitions had kept all the physical photos, most with the date handwritten on the back.

Armed with these archives, McClenachan was able to show that, in the last half-century, the sizes of award-winning catches had decreased by more than 50%. None of that data would have been available if all the fishermen had kept records of their catches on their phones. Instead, we would be subject to what is known as the “shifting baseline syndrome,” the common assumption that what is normal today was also normal in the past.

As the internet disappears and we store our lives on our devices, we are actively choosing to create massive gaps in our historical record. It’s a self-inflicted cultural amnesia, worsened by the fact that much of the web is in the hands of large corporations that care little about preservation. “In the long term, you can’t preserve a digital object in its original form,” says Manoff, the former MIT librarian. “But in the case of corporate ownership, the likelihood of responsible long-term management of digital content in any form becomes increasingly unlikely,” she adds.

The Dark Age, as historians used to call the early centuries of medieval Europe, lasted 500 years. Our digital version may never end. A post-literate society leaves exactly the same trace in the world as a pre-literate one. That is, it leaves virtually none.

Translation by Cristina Gálvez

Comments from the Director of the Laboratory, Dr. Ricardo Petrissans (in italics)
