Examples

The examples in this section illustrate common character classes and shortcuts used to construct grep patterns, and show how to handle some tasks you are likely to encounter in your work.

Matching Identifiers

One of the most common things you will use grep patterns for is to find and modify identifiers, such as variables in computer source code or object names in HTML source documents. To match an arbitrary identifier in most programming languages, you might use this search pattern:

    [a-z][a-zA-Z0-9]*

In a case-sensitive search, this pattern matches any sequence that begins with a lowercase letter and is followed by zero or more alphanumeric characters. If other characters are allowed in the identifier, add them to the pattern. This pattern allows underscores, but only as the first character of the identifier:

    [a-z_][a-zA-Z0-9]*

The following pattern allows underscores anywhere except the first character, and allows identifiers to begin with either an uppercase or a lowercase letter:

    [a-zA-Z][a-zA-Z0-9_]*
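
If you want to try these patterns outside BBEdit, the following is a minimal sketch in Python. It assumes that Python's re module treats these simple character classes the same way BBEdit's grep engine does in a case-sensitive search. The names LOWER_START, UNDERSCORE_START, UNDERSCORE_INSIDE, and is_identifier are invented for this illustration.

```python
import re

# The three identifier patterns from the text, compiled case-sensitively.
LOWER_START = re.compile(r"[a-z][a-zA-Z0-9]*")
UNDERSCORE_START = re.compile(r"[a-z_][a-zA-Z0-9]*")
UNDERSCORE_INSIDE = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")


def is_identifier(pattern, text):
    """Return True if the whole of `text` matches `pattern` (hypothetical helper)."""
    return pattern.fullmatch(text) is not None
```

With these definitions, is_identifier(UNDERSCORE_START, "_count") is true, while is_identifier(LOWER_START, "_count") and is_identifier(UNDERSCORE_START, "my_count") are both false.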

Matching White Space

Often you will want to match two sequences of data that are separated by tabs or spaces, either simply to identify them or to rearrange them.

For example, suppose you have a list of formatted label-data pairs like this:

User name: Bernard Rubble
Occupation: Actor
Spouse: Betty

You can see that there are tabs or spaces between the labels on the left and the data on the right, but you have no way of knowing how many spaces or tabs there will be on any given line. The following pattern, a character class followed by the + quantifier, means "match one or more white space characters":

    [ \t]+

So, if you wanted to transform the list above to look like this:

User name("Bernard Rubble")
Occupation("Actor")
Spouse("Betty")

you would use this search pattern, with the Case Sensitive option turned off so that [a-z ] also matches capital letters:

    ([a-z ]+):[ \t]+([a-z ]+)

and this replacement pattern:

    \1\("\2"\)
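
The same transformation can be sketched in Python for comparison. This is an illustration under stated assumptions, not a description of BBEdit's behavior: the re.IGNORECASE flag stands in for a case-insensitive search, the helper name to_call_form is invented, and the parentheses in the replacement are left unescaped because Python's replacement syntax does not treat them as special.

```python
import re

# re.IGNORECASE is an assumption: without it, [a-z ] would not match
# capital letters such as the "U" in "User name".
PAIR = re.compile(r"([a-z ]+):[ \t]+([a-z ]+)", re.IGNORECASE)


def to_call_form(line):
    """Rewrite 'label:<white space>data' as 'label("data")' (hypothetical helper)."""
    # \1 is the label, \2 is the data; the run of tabs or spaces is dropped.
    return PAIR.sub(r'\1("\2")', line)
```

For example, to_call_form("User name:\t\tBernard Rubble") returns User name("Bernard Rubble"), regardless of how many tabs or spaces separate the label from the data.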

Matching Delimited Strings

In some cases, you may want to match all the text that appears between a pair of delimiters. One way to do this is to bracket the search pattern with the delimiters, like this:

    ".*"

This works well if you have only one delimited string on the line. But suppose the line looked like this:

    "apples", "oranges, kiwis, mangos", "penguins"

The search string above would match the entire line. (This is another instance of the "longest match" behavior of BBEdit's grep engine, which was discussed previously.)

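Once again, non-greedy quantifiers come to the rescue. The following pattern matches each quote-delimited string separately. (Because + requires at least one character, it does not match an empty pair of quotes on its own; use .*? between the quotes if empty strings can occur.)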

    ".+?"

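The difference between the two quantifiers can be checked with a short Python sketch. It assumes that Python's re module handles greedy and non-greedy quantifiers the same way for these two patterns; the variable names are invented for this illustration.

```python
import re

LINE = '"apples", "oranges, kiwis, mangos", "penguins"'

# Greedy: .* runs to the last quote on the line, so there is one long match.
greedy = re.findall(r'".*"', LINE)

# Non-greedy: .+? stops at the first closing quote, so each string matches separately.
lazy = re.findall(r'".+?"', LINE)
```

Here greedy holds a single match covering the whole line, while lazy holds three matches, one for each quoted string.
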
Marking Structured Text

Suppose you are reading a long text document that doesn't have a table of contents, but you notice that all the sections are numbered like this:

3.2.7 Prehistoric Cartoon Communities
5.19.001 Restaurants of the Mesozoic

You can use a grep pattern to create marks for these headings, which will appear in the Mark pop-up menu. Choose Find & Mark All from the Mark pop-up menu in the status bar. Then, decide how many levels you want to mark. In this example, the section numbers always have at least two numeric parts and at most four, separated by periods.

Use this pattern to find the headings:

    ^(\d+\.\d+\.?\d*\.?\d*)[ \t]+([a-z ]+)

and this pattern to make the file marks:

    \1 \2

The ^ before the first search group ensures that BBEdit matches the numeric string at the beginning of a line. The pattern

    \.?\d*

matches an optional period followed by zero or more digits. The other groups use the white space idiom and a variation of the identifier idiom that also allows spaces. You can use a similar technique to mark any section whose heading can be described with a grep pattern.
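
As a rough check of the heading pattern, here is a minimal Python sketch that builds the mark text for each heading in a block of text. It assumes a case-insensitive search (re.IGNORECASE) and uses re.MULTILINE so that ^ matches at the start of every line; the helper name heading_marks and the sample text in the note below are invented.

```python
import re

# re.IGNORECASE and re.MULTILINE are assumptions that mimic a
# case-insensitive search in which ^ anchors to the start of each line.
HEADING = re.compile(
    r"^(\d+\.\d+\.?\d*\.?\d*)[ \t]+([a-z ]+)",
    re.IGNORECASE | re.MULTILINE,
)


def heading_marks(text):
    """Return the mark text (section number, then title) for each heading found."""
    return [m.expand(r"\1 \2") for m in HEADING.finditer(text)]
```

Given the two sample headings with a line of body text between them, heading_marks returns one entry per heading and skips the body text, because that line does not begin with a section number.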

Marking a Mail Digest

You can extend the structured text technique to create markers for mail digests. Assume that each message in the digest begins with lines like the following:

From: Sadie Burke <sadie@burke.com>
Date: Sun, 16 Jul 1995 13:17:45 -0700
Subject: Fishing with the judge

Suppose you want the marker text to list the subject and the sender. You would use the following search string:

    ^From:[ \t]+(.*)\r.*\rSubject:[ \t]+(.*)

and mark the text with this replacement string:

    \2 \1

Note the sequence \r.*\r in the middle of the search string. The .* skips over the Date line, and the \r on either side of it is necessary because, as previously discussed, the special character . does not match carriage returns. (At least, not by default. See "Advanced Topics", below, for details on how to make dot match any character, including carriage returns.)
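
The digest pattern can be sketched in Python as follows. One deliberate change is an assumption of this example: the sample text uses newline characters as line breaks, so the pattern uses \n where the search string above uses \r. The names DIGEST_HEADER, SAMPLE, and digest_marks are invented, and re.MULTILINE is used so that ^ matches at the start of each line.

```python
import re

# \n replaces \r here because the sample text below uses newline line breaks
# (an assumption of this sketch, not a change to the BBEdit pattern).
DIGEST_HEADER = re.compile(
    r"^From:[ \t]+(.*)\n.*\nSubject:[ \t]+(.*)",
    re.MULTILINE,
)

SAMPLE = (
    "From: Sadie Burke <sadie@burke.com>\n"
    "Date: Sun, 16 Jul 1995 13:17:45 -0700\n"
    "Subject: Fishing with the judge\n"
    "\n"
    "Message body goes here.\n"
)


def digest_marks(text):
    """Return 'subject sender' mark text for each message header found."""
    return [m.expand(r"\2 \1") for m in DIGEST_HEADER.finditer(text)]
```

Because . does not match a line break by default in Python either, each (.*) group stops at the end of its own line, just as described above.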

Rearranging Name Lists

You can use grep patterns to transform a list of names from first-name-first form to last-name-first order (for later sorting, for instance). Assume that the names are in the form:

Junior X. Potter
Jill Safai
Dylan Schuyler Goode
Walter Wang

If you use this search pattern:

    ^(.*) ([^ ]+)$

and this replacement string:

    \2, \1

the transformed list becomes:

Potter, Junior X.
Safai, Jill
Goode, Dylan Schuyler
Wang, Walter
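
The rearrangement can also be sketched in Python. This example applies the pattern to one name at a time, which is a design choice of the sketch rather than something the text prescribes: it keeps [^ ]+ from ever running across a line break. The names NAME, last_name_first, NAMES, and REORDERED are invented for this illustration.

```python
import re

# Everything up to the last space is captured as \1; the final word is \2.
NAME = re.compile(r"^(.*) ([^ ]+)$")


def last_name_first(name):
    """Rewrite 'First [Middle] Last' as 'Last, First [Middle]'."""
    # A name with no space does not match and is returned unchanged.
    return NAME.sub(r"\2, \1", name)


NAMES = ["Junior X. Potter", "Jill Safai", "Dylan Schuyler Goode", "Walter Wang"]
REORDERED = [last_name_first(n) for n in NAMES]
```

The greedy (.*) takes as much of the line as it can, so only the final word ends up in the second group; this is why middle names and initials stay with the first name.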