Writing Replacement Patterns

Subpatterns Make Replacement Powerful

We covered subpatterns earlier when discussing search patterns, where parentheses were used to limit the scope of the alternation operator. Another reason to use subpatterns in your grep searches is that they provide a powerful and flexible way to change or reuse found text as part of a search-and-replace operation. If you do not use subpatterns, you can still access the complete match with the & metacharacter, but you cannot reorganize the matched text as it is replaced.

Pattern	Matches...
&	the entire matched pattern [replacement only]
(p)	the pattern p and remembers it [search only]
\1, \2, ..., \99	the nth subpattern in the entire search pattern

Using the Entire Matched Pattern

The & character is useful when you want to use the entire matched string as the basis of a replacement. Suppose that in your text every product name that begins with the company name "ACME" needs to end with a trademark symbol (™). The following search pattern finds two-word combinations that begin with "ACME":

    ACME [A-Za-z]+

The following replacement string adds the trademark symbol to the matched text:

    &™

For example, if you start with

    ACME Magnets, ACME Anvils, and ACME TNT are all premium products.

and perform a replace operation with the above patterns, you will get:

    ACME Magnets™, ACME Anvils™, and ACME TNT™ are all premium products.
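
For readers who want to try the same idea outside the editor, here is a minimal sketch in Python. It is an illustration only, not the editor's own engine: Python's re module has no & metacharacter, so the sketch assumes \g<0>, which is Python's way of referring to the entire match in a replacement string.

```python
import re

text = "ACME Magnets, ACME Anvils, and ACME TNT are all premium products."

# Same search pattern as above. In Python, \g<0> plays the role of &:
# it inserts the entire matched text into the replacement.
result = re.sub(r"ACME [A-Za-z]+", r"\g<0>™", text)

print(result)
```

The search pattern is unchanged; only the spelling of "the entire match" differs between the two tools.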

Using Parts of the Matched Pattern

While using the entire matched pattern in a replacement string is useful, it's often more useful to use only a portion of the matched pattern and to rearrange the parts in the replacement string.

For example, suppose a source file contains C-style declarations of this type:

    #define Util_Menu 284
    #define Tool_Menu 295

and you want to convert them into typed constant declarations that look like this:

    const int Util_Menu = 284;
    const int Tool_Menu = 295;

The pattern to find the original text is straightforward:

    #define[ \t]+.+[ \t]+\d+[^0-9]*$

This pattern matches the word "#define" followed by one or more tabs or spaces, followed by one or more characters of any type, followed by one or more tabs or spaces, followed by one or more digits, followed by zero or more characters that are not digits (to allow for comments), followed by the end of the line.

The problem with this pattern is that it matches the entire line. It doesn't provide a way to remember the individual parts of the found string.

If you use subpatterns to rewrite the above search pattern slightly, you get this:

    #define[ \t]+(.+)[ \t]+(\d+)[^0-9]*$

The first set of parentheses defines a subpattern which remembers the name of the constant. The second set remembers the value of the constant.

The replacement string would look like this:

    const int \1 = \2;

The sequence \1 is replaced by the name of the constant (the first subpattern from the search pattern), and the sequence \2 is replaced by the value of the constant (from the second subpattern).
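
The same search and replacement can be sketched in Python to see the subpatterns at work. This is an approximation under stated assumptions: the sample text and variable names are made up for illustration, and re.MULTILINE is assumed so that $ matches at the end of each line, as it does in the editor.

```python
import re

# Hypothetical sample input; the second line carries a trailing comment.
source = "#define Util_Menu 284\n#define Tool_Menu 295 /* tools */"

# Group 1 remembers the constant's name, group 2 remembers its value.
pattern = re.compile(r"#define[ \t]+(.+)[ \t]+(\d+)[^0-9]*$", re.MULTILINE)

# \1 and \2 insert the remembered name and value into the new declaration.
converted = pattern.sub(r"const int \1 = \2;", source)

print(converted)
```

Because the trailing [^0-9]* is matched but never captured, the comment on the second line is discarded by the replacement.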

Our example throws out any comment that may follow the C-style constant declaration. As an exercise, try rewriting the search and replace patterns so they preserve the comment, enclosing it in (*...*) style Pascal comment markers.

Here are some more examples:

Data	Search for	Replace	Result
4+2	(\d+)\+(\d+)	\1+\2	4+2
1234+5829	(\d+)\+(\d+)	\1+\1	1234+1234
2152 B.C.	(\d\d\d\d)[\t ]B\.C\.	\1 A.D.	2152 A.D.
1,234.56	\$?([0-9,]+)\.(\d+)	\1 dollars and \2 cents	1,234 dollars and 56 cents
$4,296,459.19	\$?([0-9,]+)\.(\d+)	\1 dollars and \2 cents	4,296,459 dollars and 19 cents
$3,5,6,4.00000	\$?([0-9,]+)\.(\d+)	\1 dollars and \2 cents	3,5,6,4 dollars and 00000 cents
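
A few of these rows can be checked with a short Python sketch. It assumes each digit wildcard is written as \d, which is the form Python's re module expects; the rows list and results name are illustrative only.

```python
import re

# (data, search pattern, replacement) triples taken from the table above.
rows = [
    ("4+2", r"(\d+)\+(\d+)", r"\1+\2"),
    ("1234+5829", r"(\d+)\+(\d+)", r"\1+\1"),
    ("2152 B.C.", r"(\d\d\d\d)[\t ]B\.C\.", r"\1 A.D."),
    ("$4,296,459.19", r"\$?([0-9,]+)\.(\d+)", r"\1 dollars and \2 cents"),
]

# Apply each search and replacement to its own data string.
results = [re.sub(search, replace, data) for data, search, replace in rows]

for line in results:
    print(line)
```

Note that a subpattern may be used more than once in a replacement (as \1+\1 does) or not at all.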

Case Transformations

New in 6.5: replace patterns can also change the case of the original text when using subpattern replacements. The syntax is similar to Perl's:

Modifier	Effect
\u	Make the next character uppercase
\U	Make all following characters uppercase until reaching another case specifier (\u, \L, \l) or \E
\l	Make the next character lowercase
\L	Make all following characters lowercase until reaching another case specifier (\u, \U, \l) or \E
\E	End case transformation opened by \U or \L

Here are some examples to illustrate how case transformations can be used.

Given some text:

    mumbo-jumbo

and the search pattern:

    (\w+)(\W)(\w+)

the following replace patterns will produce the following output:

    \U\1\E\2\3      MUMBO-jumbo

    \u\1\2\u\3      Mumbo-Jumbo

Note that case transformations also affect literal strings in the replace pattern:

    \U\1\2fred      MUMBO-FRED

    \lMUMBLE\2\3    mUMBLE-jumbo

Finally, note that \E is not necessary to close off a modifier; if another modifier appears before an \E is encountered, then that modifier will take effect immediately:

    \Ufred-\uwilma  FRED-Wilma
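
Not every regular expression engine supports these modifiers in the replacement string. Python's re module, for example, does not, so the sketch below emulates three of the replace patterns above with a replacement function instead. The helper ucfirst and the variable names are hypothetical, chosen only for this illustration.

```python
import re

def ucfirst(s: str) -> str:
    # Uppercase only the first character and leave the rest untouched,
    # which is what \u does to the text that follows it.
    return s[:1].upper() + s[1:]

pattern = re.compile(r"(\w+)(\W)(\w+)")
text = "mumbo-jumbo"

# Equivalent of \U\1\E\2\3: uppercase the first subpattern only.
upper_first = pattern.sub(
    lambda m: m.group(1).upper() + m.group(2) + m.group(3), text)

# Equivalent of \u\1\2\u\3: capitalize the first and third subpatterns.
capitalized = pattern.sub(
    lambda m: ucfirst(m.group(1)) + m.group(2) + ucfirst(m.group(3)), text)

# Equivalent of \U\1\2fred: the literal "fred" is uppercased as well.
with_literal = pattern.sub(
    lambda m: (m.group(1) + m.group(2) + "fred").upper(), text)

print(upper_first, capitalized, with_literal)
```

The function-based form is more verbose, but it makes explicit which parts of the replacement each modifier covers.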