The Bare Minimum You Want to Know About Regular Expressions

Many web indexes are searchable using regular expressions as the query language. Regular expressions comprise a powerful pattern matching language that pervades Unix programming tools (like grep and egrep). As a result, Unix geeks are very comfortable with them, and since Unix geeks write most of the web indexing services, they think that regular expressions make a good query language.

Here's the very least you need to know about regular expressions to use them for web searching; there is a lot more you could learn.

.*
If you type .* in a query, it matches any sequence of characters, including no characters at all. So:
.*communication
would match communication and telecommunication, among others. Since right truncation is implicit, it would also match telecommunications (.*communication.* would be redundant). You can use .* in the middle of a query as well.
.
. by itself matches any single character.
|
The vertical bar (|) is a Boolean OR operator. It separates whole alternatives, so you could form a search for communications or networking with a query like:
.*communication|network
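
If you want to try the three operators above without a web index handy, here is a minimal sketch in Python. It is not how any particular index works: the word list and the search_words helper are made up for illustration, and Python's re.search (which matches anywhere in a word) is assumed to be a rough stand-in for the implicit truncation described above.

```python
import re

# Hypothetical stand-in for an index: a handful of words to search.
WORDS = [
    "communication",
    "telecommunication",
    "telecommunications",
    "networking",
    "commute",
]

def search_words(query, words=WORDS):
    """Return the words that the regular expression `query` matches.

    re.search is unanchored, so a match may begin and end anywhere in
    the word. That loosely models implicit truncation; a real index
    may behave differently.
    """
    pattern = re.compile(query)
    return [w for w in words if pattern.search(w)]

if __name__ == "__main__":
    for query in [".*communication", "communicat.on", ".*communication|network"]:
        print(query, "->", search_words(query))
```

The first query finds the three "communication" words, the second shows . standing in for a single character (the i in "communication"), and the third adds networking through the | alternative.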

Much More Than You Want to Know About Regular Expressions

For a technical introduction to regular expressions, you can read some notes I wrote for a class.
k-waclena@uchicago.edu
Send me email.
This page last updated: Sat Aug 3 17:25:30 CDT 1996