From Usenet Big-8 Management Board

Revision as of 20:59, 25 September 2021

Usenet Archaeology


I admit that Usenet archaeology is a somewhat strange idea. Usenet has been around since 1979. While that's only 41 years as of the writing of this tutorial, 41 years is prehistoric in Internet terms. Usenet began as a project between Duke University and the University of North Carolina at Chapel Hill in late 1979. The Internet as we know it has only been around since 1989, when the first internet service providers appeared. How did articles get sent and received without the Internet? At first, news articles were sent only from machine to machine via modem. These modems were often very slow (300 bits per second), and long-distance telephone calls were very expensive. Storage was also expensive, so articles were often saved on magnetic tape, if they were saved at all.


The instructions in this tutorial are as platform-independent as possible so that separate versions don't have to be written for different operating systems. It is also assumed that the reader has a basic understanding of their operating system and of file management, such as how to decompress files.

UTZOO Tapes:

This brings us to the first trove of Usenet history, the UTZOO tapes. From Wikipedia, "Between 1981 and 1991, while running the zoology department's computer system at the University of Toronto, [Henry] Spencer copied more than 2 million Usenet messages onto magnetic tapes. The 141 tapes wound up at the University of Western Ontario, where Google's Michael Schmidt tracked them down and, with the help of David Wiseman and others, got them transferred onto disks and into Google's archives." The story doesn't end there, unfortunately. The UTZOO tapes still exist in some form on Google Groups, but the original files on archive.org are no longer available. According to archive.org, "In 2020 after sustained legal demands requesting a set of messages within the Usenet Archive be redacted, and to avoid further costs and accusations of manipulation should those demands be met, the archive has been removed from this URL and is not currently accessible to the public." With that said, the files are still available online, just not in a repository like archive.org where you might expect them. A link is here. If it is dead when you read this, search online for other copies.

The UTZOO files are organized by the original tapes. They begin with news001f1 and end with news141f1. The compressed "tarball" files are between 4 and 65 MB each. That may not sound like much, but together they hold over a million messages covering around a decade of Usenet history.
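
For readers who would rather script the decompression step than do it by hand, here is a minimal Python sketch. It assumes the download is a gzip-compressed tar file; the file name news086f1.tgz and the function name extract_tape are illustrative and not taken from the archive itself. Members whose paths would land outside the destination directory are skipped as a precaution.

```python
import tarfile
from pathlib import Path


def extract_tape(archive_path, dest_dir):
    """Unpack one tape archive into dest_dir and return the extracted file paths.

    Assumption: the archive is a tar file in a compression format that the
    tarfile module can detect on its own (gzip, bzip2 or xz).
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:*") as tar:
        safe_members = []
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            # Keep only members that stay inside the destination directory.
            if dest in target.parents:
                safe_members.append(member)
        tar.extractall(dest, members=safe_members)
    # Report what was written, relative to dest, with forward slashes.
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )


# Hypothetical usage:
# files = extract_tape("news086f1.tgz", "utzoo/news086f1")
# print(len(files), "articles extracted")
```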

Let's sample one of these files. I chose news086f1 at random. Its articles all appear to date from late 1988 to early 1989. Here is a partial breakdown of the directories in this file:

└── b107
    ├── alt
    │   └── sources
    ├── bionet
    │   ├── general
    │   ├── jobs
    │   ├── molbio
    │   │   ├── bio-matrix
    │   │   ├── evolution
    │   │   ├── genbank
    │   │   ├── methds-reagnts
    │   │   ├── news
    │   │   ├── proteins
    │   │   └── yeast
    │   └── sci-resources
    ├── biz
    │   └── comp
    │       └── software
    ├── can
    │   ├── general
    │   ├── jobs
    │   └── sun-stroke
    ├── comp
    │   ├── ai
    │   │   ├── digest
    │   │   ├── neural-nets
    │   │   ├── nlang-know-rep
    │   │   └── vision
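
The tree above suggests that each newsgroup maps to a nested directory, with one level per dot-separated part of the group name. A small Python sketch of that mapping follows. The helper name article_path is hypothetical, and the default top-level directory b107 is simply the one seen in this sample file; other tapes may use a different name.

```python
from pathlib import PurePosixPath


def article_path(newsgroup, article_number=None, batch_dir="b107"):
    """Build the relative path for a newsgroup, or for one article in it.

    Assumption: directories follow the newsgroup name, one level per
    dot-separated component, under a single top-level directory.
    """
    parts = [batch_dir, *newsgroup.split(".")]
    if article_number is not None:
        # Individual articles appear to be stored as numbered files.
        parts.append(str(article_number))
    return PurePosixPath(*parts)


print(article_path("comp.ai.vision"))       # directory for a newsgroup
print(article_path("bionet.molbio.yeast"))  # another group from the tree
```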

To read biz.comp.software articles from that time, we would go to the biz/comp/software subdirectory. You might ask, isn't this stuff saved on Google Groups? Yes and no. Let's take an article at random.

/b107/news/misc/1832 in news086f1:

Xref: utzoo news.admin:3599 news.misc:1832
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!bcm!watson!sob
From: sob@watson.bcm.tmc.edu (Stan Barber)
Newsgroups: news.admin,news.misc
Subject: Re: Opinion on a "problem"
Message-ID: <1301@gazette.bcm.tmc.edu>
Date: 2 Oct 88 08:18:29 GMT
References: <1169@nmtsun.nmt.edu>
Sender: usenet@bcm.tmc.edu
Reply-To: sob@watson.UUCP (Stan Barber)
Organization: Baylor College of Medicine, Houston, TX
Lines: 40

In article <1169@nmtsun.nmt.edu> caasnsr@nmtsun.nmt.edu (Clifford Adams) writes:
>   I definitely favor KILL files for individuals, to "remove" a
>troublesome person.  KILLing a group of people based on a few is
>counter to many of the dreams of USENET.  Wasn't the Net supposed to
>be the "great electronic anarchy", where everyone could speak, without
>censorship or repression (at least in alt. groups)?  A community where
>all can share ideas, speaking the same language (ASCII), without
>arbitrary differences like race or gender.
> Clifford A. Adams  ---  "I understand only inasmuch as I become."
> ForthLisp Project Programmer   (Goal: LISP interpreter in Forth)
> caasnsr@nmtsun.nmt.edu     ...cmcl2!lanl!unm-la!unmvax!nmtsun!caasnsr
> (505) 835-6104 | US Mail: Box 2439 Campus Station / Socorro, NM 87801

I have always saw USENET as a great electronic community, not an anarchy.
Anarchy implies no control. This is not the case. Newsadmins can impose
whatever restrictions they choose. Some restrictions affect only the
local site. Some can actually affect the net. Within this concept of
community is the idea of citizenship. By this, I mean that there is
implied responsibility for one's own actions. USENET has attempted to
encourage everyone to use this resource responsibly. Some choose to abuse
it. Others attempt to deal with this abuse either locally or by encouraging
joint actions of a large group of newsadmins. Defining abuse and dealing
with it are always sensitive issues, but they must be done for the 
community to continue to exist. Otherwise, more sites will exit this
community for another.

I don't disagree with your basic ideas regarding the alt-net. However,
USENET remains a product of people and will continue to show all that is
good and bad (racism and sexism) about people. To expect otherwise defeats
the idea of USENET as a community. I applaude any who continue to encourage
people to follow the guidelines that are defined. This encourges responsible
usage of this resource. This resonsible use should be the hallmark of USENET.

Stan           internet: sob@bcm.tmc.edu         Baylor College of Medicine
Olan           uucp: {rice,killer,hoptoad}!academ!sob
Barber         Opinions expressed are only mine.

Now let's look at the same article in Google Groups. While the body of the article is the same, the headers are missing. Let's say you wanted to write a book on Usenet and decided to play it safe with copyright by trying to contact the original authors. This is considerably more difficult if you only have the Google Groups version of the article, because details such as the Reply-To address and the Organization live in the headers. Also, Google Groups lacks any kind of advanced "power" search functionality. If you are working on a Linux or BSD-based system, you have tools like grep and bash, which make searching files like these much more convenient. Windows users can get the same functionality through tools like WSL or Cygwin. We'll talk more about Google Groups later.
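
As a cross-platform alternative to grep, the header block of each extracted article can be read with Python's standard email package, since Usenet articles use the same "Name: value" header layout followed by a blank line. The sketch below is only an illustration: the function names read_headers and find_articles are made up for this example, and it assumes each article is stored as its own file, as in the sample above.

```python
from email.parser import BytesHeaderParser
from pathlib import Path


def read_headers(article_path):
    """Parse only the header block of one article file."""
    with open(article_path, "rb") as handle:
        return BytesHeaderParser().parse(handle)


def find_articles(root, header, needle):
    """Yield (path, value) for articles whose header contains needle.

    The match is a case-insensitive substring test, so it is closer to
    'grep -i' on a single header than to a full-text search.
    """
    needle = needle.lower()
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        value = read_headers(path).get(header)
        if value is not None and needle in str(value).lower():
            yield path, str(value)


# Hypothetical usage, assuming the tape was unpacked into utzoo/news086f1:
# for path, sender in find_articles("utzoo/news086f1", "From", "bcm.tmc.edu"):
#     print(path, sender)
```

Because only headers are parsed, a line in the body that happens to start with "From:" is not matched, which is one thing a plain grep over the whole file would get wrong.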


In 2001, Dejanews.com, the website that had been the web's archive and public gateway to Usenet, was bought by Google, and so Google Groups was born. More than a decade later, in 2013, the Usenet Historical Collection was created at archive.org with donations of newsgroup articles from Google.

The first thing we want to do when researching the files from archive.org is to download the files for the hierarchy we are interested in. Let's first go to https://archive.org and look for the REC Usenet hierarchy.


"Usenet-rec" is just a handy shortcut I've found that takes me to the archive I want. Replace "rec" with whichever hierarchy you want.