The Deep Web

Back in 1994, author Jill Ellsworth coined the term invisible web to refer to websites that weren't registered with the search engines of the time. Two years later, the term was used again by Bruce Mount and Matthew B. Koll of the now-defunct Personal Library Software. The term invisible web evolved into deep web in 2001, when computer scientist Michael K. Bergman coined the new name as a search-indexing term.

Uses

Think of deep web content as anything hidden behind the HTTP forms found across the web. This can include webmail, online banking, social media pages and profiles, web forums, and anything else that isn't open to the public. It can also include anything behind a paywall, such as an online newspaper or magazine. Like other web content, information on the deep web is reachable through a web address. However, accessing it usually requires a password or another form of authentication, such as a fingerprint or retina scan, depending on the site's requirements. Besides invisible web and deep web, you might also hear the term hidden web used for the same information.
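Why does a form hide content from search engines? A simple crawler discovers new pages by following links, and a form is not a link: it has to be filled in and submitted. The minimal sketch below, which assumes a made-up page with one public link and one login form (the paths `/help` and `/account/login` are hypothetical), uses Python's built-in `html.parser` to separate the two. A link-following crawler would queue `/help` but would never get past the form.

```python
from html.parser import HTMLParser

# Hypothetical page: one ordinary link plus a login form guarding an account area.
PAGE = """
<html><body>
  <a href="/help">Help</a>
  <form action="/account/login" method="post">
    <input name="username"><input name="password" type="password">
    <button type="submit">Sign in</button>
  </form>
</body></html>
"""


class LinkAndFormFinder(HTMLParser):
    """Collects <a href> targets and <form> targets into separate lists."""

    def __init__(self):
        super().__init__()
        self.links = []  # pages a link-following crawler could visit
        self.forms = []  # (method, action) pairs that need user input first

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "form":
            method = (attributes.get("method") or "get").lower()
            self.forms.append((method, attributes.get("action")))


def find_targets(html):
    """Return (links, forms) found in the given HTML string."""
    finder = LinkAndFormFinder()
    finder.feed(html)
    return finder.links, finder.forms


if __name__ == "__main__":
    links, forms = find_targets(PAGE)
    print("links a crawler can follow:", links)
    print("forms it would have to fill in:", forms)
```

Real search engine crawlers are far more sophisticated than this, but the basic gap is the same: whatever sits behind the form's credentials stays out of reach.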

Types of Deep Web Content

From a computer science perspective, content on the deep web falls into one of nine categories, which can be grouped into primary and secondary types.

Primary Content

The contextual web features information that goes beyond what you actually searched for, shaped by the context of your search. For example, in December you might search for Christmas decorations and later see information about last-minute end-of-year vacation ideas. Conversely, in July you might look for Fourth of July fireworks and also see back-to-school content. ReadWrite described the critical properties of the contextual web experience back in 2008.

There's also dynamic content, which appears only after you submit a query or fill in a form online.

Limited access content includes sites that restrict access using technical tools such as the Robots Exclusion Standard or CAPTCHAs. The latter is a type of challenge-response test used in computing to tell a human apart from an automated program. You'll often see CAPTCHAs when you sign up for a website for the first time.

Non-HTML/text content consists of multimedia files, such as images or videos, that search engines don't handle and that are therefore also part of the deep web.

Moving on, you'll also find private web content. Perhaps the most recognized type of deep web content, this is online information protected by usernames and passwords. Think banking or other account information.
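The Robots Exclusion Standard works through a plain-text `robots.txt` file that tells well-behaved crawlers which paths to skip. As a minimal sketch, assuming a made-up `robots.txt` and hypothetical `example.com` URLs, Python's standard `urllib.robotparser` module can check which pages a crawler named `ExampleBot` (also hypothetical) would be allowed to fetch. The rules are parsed from a string, so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to skip members-only pages and search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search
"""


def crawlable(url, user_agent="ExampleBot"):
    """Return True if the rules in ROBOTS_TXT allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # load the rules from memory instead of fetching them
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/articles/deep-web",
        "https://example.com/members/profile",
        "https://example.com/search?q=deep+web",
    ):
        print(url, "->", "allowed" if crawlable(url) else "disallowed")
```

Keep in mind that `robots.txt` is a request, not a lock: it keeps pages out of compliant search engines, which is why such pages end up on the deep web, but it doesn't stop anyone from visiting them directly.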

Secondary Content

With scripted content, you'll find pages that must be accessed using JavaScript, or content downloaded dynamically using old-school Flash or Ajax techniques.

Then there's online content that requires software beyond a standard web browser to access. The Tor Browser is an example of specialized software used for this purpose.

The deep web also includes unlinked content. As the name implies, these are pages that nothing else online links to. Because they have no backlinks, traditional web crawlers typically can't discover them.

Finally, there are web archives such as the popular Wayback Machine, which is designed to keep snapshots of web pages at different points in time. These archived snapshots aren't searchable through public web search engines.
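Unlinked content is easy to picture once you see how a basic crawler explores a site. The minimal sketch below assumes a small, made-up site stored as a Python dictionary (every page path is hypothetical) and runs a breadth-first crawl from the home page, following links only. The draft page links out to the home page, but because nothing links to it, the crawl never reaches it.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": ["/blog"],
    "/blog/post-2": ["/blog", "/about"],
    "/drafts/unlinked-page": ["/"],  # links out, but no page links in
}


def crawl(start):
    """Breadth-first crawl that discovers pages only by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    found = crawl("/")
    hidden = set(LINKS) - found
    print("crawled:", sorted(found))
    print("never discovered:", sorted(hidden))
```

Outgoing links don't help a page get found; only incoming links do. That's why a page with no backlinks stays invisible to a crawler like this one even though anyone with its address can open it.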

Nothing Nefarious Here

Deep web content isn't necessarily secret or illegal, unlike the similar-sounding dark web, which is full of content that's not for everyone and is often inappropriate. Instead, it's simply content that, for whatever reason, isn't searchable online.
