I know we can’t do this with any copyrighted materials. But a lot of books, music, art, and knowledge are in the Creative Commons. Is it possible to create one massive torrent that includes everything that can legally be included, and then have people download only what they actually want to enjoy?
I think there’s a handful of problems with the idea. For starters (I’m just going with the first returned result because the actual numbers don’t matter as much as the magnitudes), the internet held around 64 zettabytes as of 2020, which is 64 trillion GB. That’s going to be one hell of a zip file. In fact, pretty much the only thing capable of storing that much information is, well, the Internet itself.
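To make the magnitude concrete, here is a rough back-of-the-envelope sketch of that conversion. It assumes decimal (SI) units, and the 20 TB drive size is purely a hypothetical figure I picked for illustration; the function names are mine too.

```python
# Decimal (SI) units: 1 GB = 10**9 bytes, 1 TB = 10**12, 1 ZB = 10**21.
GB = 10**9
TB = 10**12
ZB = 10**21


def zettabytes_to_gigabytes(zb):
    """Convert a whole number of zettabytes to gigabytes."""
    return zb * ZB // GB


def drives_needed(total_bytes, drive_bytes):
    """Smallest number of drives of size drive_bytes that holds total_bytes."""
    return -(-total_bytes // drive_bytes)  # ceiling division on integers


if __name__ == "__main__":
    print(zettabytes_to_gigabytes(64))      # 64000000000000, i.e. 64 trillion GB
    # Hypothetical 20 TB drives, ignoring redundancy and overhead.
    print(drives_needed(64 * ZB, 20 * TB))  # 3200000000
```

So even with no redundancy at all, that is on the order of a few billion large hard drives, which is why "the Internet itself" is the only realistic container.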
Second is the rate at which information is being produced. Estimates vary wildly, but the total is growing exponentially. If that keeps up, we will soon be writing more data to the internet per day than it currently holds in total.
So maybe we don’t need every product page from every store website around the world. Maybe we don’t need the tens of millions of pages of corporate training manuals. Maybe we need curation rather than SELECT * FROM INTERNET.
That’s what things like Project Gutenberg and the Internet Archive do. They’re very limited in what they catch, of course. It’s also sort of what Wikipedia does, although curation there includes summarization. It’s also a feature of historical archives from existing media - like New York Times records that go back a century (or wherever they’re at now), or back issues of Nature and Science going back to when they started publication. Those are obviously doable - people are doing them - but each alone is a microscopic piece of the puzzle.
So, given that those exist, alongside the rest of the internet, what value are we creating? Storing something digitally doesn’t give it permanence, and I have an 8” floppy disk for a cash register POS system, written by some unknown OS, to prove it.
Someday (hopefully soon) PDFs will go away and nothing will read them. Hell, the concept of “file” could go away in 50 years. There are written texts from thousands of years ago that we cannot read, and others we’ve deciphered only very recently and imperfectly. All of that archived stuff will have to be ported over, and again that’s going to mean yet more curation. At the rate information is growing you’re going to make Sisyphus look like he’s on a vacation in Tahiti.
Does that mean it’s all one big library of Alexandria? Not necessarily.
Rather than thinking of all those data as a library, think of them as an ecosystem of knowledge. Once Amazon goes out of business, no one’s going to care about that one page of theirs with the nose hair trimmer. We will still have a copy of the NYT from the day we landed on the moon, or from when Nazi Germany was defeated. We’ll also have other information about space programs and 20th century history. We probably won’t have my mom’s recipes or all those pictures I’ve taken of my pets over the years, and my MySpace page is thankfully gone forever. I even deleted all of my Reddit content before moving on.
Maybe my scientific publications will end up archived someplace, but there we get into the tree-falling-in-the-forest problem. If no one reads them from now to the end of time, are they really there? Maybe physically, but they’ve sort of passed out of the ecosystem of human knowledge and are now part of the fossil record, if anything.
We’ve also researched how to communicate over millennia. There’s the (kind of silly but a little cool) Long Now project. We’ve also tried to invent symbology that will allow us to put warning signs outside hazardous/nuclear waste storage facilities that will continue to communicate “Danger - Do Not Enter” for tens of thousands of years.
In short, I think that the problem you’re trying to solve is impermanence or entropy, which both Buddhists and physicists will tell you aren’t things we’re going to solve.