
As sites try to block AI crawlers, is the ‘open web’ shutting down?

Tai Neilson, senior lecturer at Macquarie University, examines how data has become a ‘hot commodity’ for companies training AI systems.

When the World Wide Web went live in the early 1990s, its founders hoped it would be a space where anyone could share information and collaborate. But today, the free and open web is in decline.

The Internet Archive has been recording the history of the internet and making it available to the public through its Wayback Machine since 1996. Now, some of the biggest news outlets in the world are blocking the archive’s access to their pages.

Major publishers – including the Guardian, the New York Times, the Financial Times, and USA Today – have confirmed that they are ending the Internet Archive’s access to their content.

Publishers say they support the archive’s goal of preservation, but argue that unrestricted access creates unintended consequences, exposing their journalism to AI crawlers and to members of the public trying to get past their paywalls.

However, publishers don’t just want to lock out AI crawlers. They also want to sell their content to data-hungry tech companies. Their back catalogs of news, books and other media have become a hot commodity as training data for AI systems.

Robot readers

Generative AI programs such as ChatGPT, Copilot and Gemini need access to large repositories of content (such as media articles, books, art and academic research) for training and to respond to user prompts.

Publishers claim that technology companies have scraped much of this content for free and without the permission of copyright owners. Some have begun taking technology companies to court, alleging theft of their intellectual property. High-profile examples include The New York Times’ lawsuit against ChatGPT’s parent company OpenAI and News Corp’s case against Perplexity AI.

Old news, new money

In response, some technology companies have struck deals to pay for access to publishers’ content. News Corp’s contract with OpenAI is reportedly worth more than $250m over five years.

Similar agreements have been made between academic publishers and technology companies. Publishing houses such as Taylor & Francis and Elsevier have previously been criticized for locking publicly funded research behind commercial paywalls.

Now, Taylor & Francis has signed a $10m non-exclusive deal with Microsoft that gives the company access to more than 3,000 journals.

Publishers are also using technology to block unwanted AI bots from accessing their content, including the crawlers the Internet Archive uses to record the history of the internet. News publishers have described the Internet Archive as a “back door” to their catalogs, allowing unscrupulous tech companies to keep scraping their content.
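In practice, the most common blocking mechanism is a site’s robots.txt file, which asks crawlers to stay away. A minimal sketch of the kind of rules publishers now deploy (the user-agent tokens below – OpenAI’s GPTBot, Common Crawl’s CCBot and the Internet Archive’s ia_archiver – are real crawler names, but the exact rules any given publisher uses vary and are assumptions here):

```
# robots.txt – illustrative example of blocking AI and archive crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ia_archiver
Disallow: /

# Everyone else may still crawl the site
User-agent: *
Allow: /
```

Note that robots.txt is purely advisory: compliant crawlers honor it, but nothing technically prevents a bot from ignoring these rules, which is why some publishers have moved to harder measures such as paywalls and IP blocking.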

The cost of free news

The Wayback Machine has also been used by members of the public to get around newspaper paywalls. Understandably, media outlets want readers to pay for the news.

The news business and its advertising revenue model have come under increasing pressure from the same tech companies that now use news content for AI training and retrieval. But charging for access comes at the cost of public access to reliable information.

When newspapers began moving their content online and making it free to the public in the late 1990s, they contributed to the early web’s ethos of sharing and collaboration.

In retrospect, however, one commentator called free access the “original sin” of online news. The public got used to getting its news online for free, and as online business models changed, many medium and small media companies struggled to fund their operations.

The opposite approach – putting all commercial news behind paywalls – has its own problems. As news publishers move to subscription-only models, people must either pay for multiple expensive subscriptions or make do with a limited number of free articles. Otherwise, they are left with whatever news remains free on the internet or is delivered by social media algorithms. The result is a closed, commercial internet.

This is not the first time an online archive has clashed with publishers: the Internet Archive has previously been sued and found to have infringed copyright over its Open Library program.

The past and future of the internet

The Wayback Machine has served as the web’s public record for almost three decades, used by researchers, educators, journalists and amateur internet historians.

Restricting its access to major news outlets would leave significant holes in the public record of the internet.

Today, you can use the Wayback Machine to see the front page of the New York Times from June 1997: the Internet Archive’s first crawl of a newspaper website. In another 30 years, internet researchers and curious members of the public may not have access to today’s front page, even if the Internet Archive still exists.

Today’s websites are tomorrow’s historical records. Without the preservation efforts of non-profit organizations such as the Internet Archive, we are in danger of losing important records.

Despite the actions of commercial publishers and the emerging challenges of AI, non-profit organizations such as the Internet Archive and Wikipedia aim to keep the dream of an open, interactive and transparent internet alive.

Written by Tai Neilson

Tai Neilson is a senior lecturer in media at Macquarie University. His areas of expertise include the political economy of digital media and critical cultural theory. He is the author of Journalism and Digital Labor and editor of Research Methods for Digital Humanities. Tai has published work on journalism and digital media in Digital Journalism, Journalism, Media International Australia, Journalism and Media, Triple-C, Fast Capitalism, and Global Media Journal.

