Altova DiffDog 2024 Professional Edition

Example: Replacing Text Using Regular Expressions


This example illustrates how to find and replace text using regular expressions. In many cases, finding and replacing text is straightforward and does not require regular expressions at all. Sometimes, however, you need to manipulate text in a way that a standard find and replace operation cannot handle. Suppose, for example, that you have an XML file of several thousand lines in which you need to rename certain elements in one operation, without affecting the content enclosed within them. Or you may need to change the order of multiple attributes of an element. In such cases, regular expressions can save you a lot of work that would otherwise have to be done manually.

 

Example 1: Renaming elements

The sample XML code listing below contains a list of books. Suppose your goal is to rename the <category> element of each book to <genre>. One way to achieve this is by using regular expressions.

 

<?xml version="1.0" encoding="UTF-8"?>
<books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="books.xsd">
  <book id="1">
    <author>Mark Twain</author>
    <title>The Adventures of Tom Sawyer</title>
    <category>Fiction</category>
    <year>1876</year>
  </book>
  <book id="2">
    <author>Franz Kafka</author>
    <title>The Metamorphosis</title>
    <category>Fiction</category>
    <year>1912</year>
  </book>
  <book id="3">
    <author>Herman Melville</author>
    <title>Moby Dick</title>
    <category>Fiction</category>
    <year>1851</year>
  </book>
</books>

 

To solve the requirement, follow the steps below:

 

1. Press Ctrl+H to open the Find and Replace dialog box.

2. Click Use regular expressions.

3. In the Find field, enter the following text: <category>(.+)</category>. This regular expression matches all category elements in the document, which then become highlighted.


To match the inner text of each element (which is not known in advance), we use the tagged expression (.+). This tagged expression means "match one or more occurrences of any character (that is, .+) and remember this match". The next step refers back to this remembered match.

 

4. In the Replace field, enter the following text: <genre>\1</genre>. This expression defines the replacement text. Notice that it uses the back-reference \1 to the tagged expression from the Find field. In other words, \1 in this context means "the inner text of the currently matched <category> element".

5. Click Replace All and observe the results. All category elements have now been renamed to genre, which was the intended goal.
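The same find and replace can be reproduced outside DiffDog to check what the pattern captures. The sketch below uses Python's `re` module, not DiffDog's own regex engine; it assumes the two behave the same for this input, since both use (…) as a tagged group and \1 as a back-reference. The names SAMPLE, find_categories, and rename_category are illustrative only.

```python
import re

# Abbreviated copy of the books listing above (xsi attributes omitted for brevity).
SAMPLE = """<books>
  <book id="1">
    <author>Mark Twain</author>
    <title>The Adventures of Tom Sawyer</title>
    <category>Fiction</category>
    <year>1876</year>
  </book>
  <book id="2">
    <author>Franz Kafka</author>
    <title>The Metamorphosis</title>
    <category>Fiction</category>
    <year>1912</year>
  </book>
</books>"""

# Find pattern from step 3: (.+) is a tagged (capturing) group that remembers the inner text.
FIND = r"<category>(.+)</category>"
# Replace text from step 4: \1 is a back-reference to the text captured by (.+).
REPLACE = r"<genre>\1</genre>"


def find_categories(xml: str) -> list[str]:
    """Return the inner text captured by (.+) for every <category> match."""
    return re.findall(FIND, xml)


def rename_category(xml: str) -> str:
    """Rename every <category> element to <genre>, keeping its inner text."""
    return re.sub(FIND, REPLACE, xml)


print(find_categories(SAMPLE))  # ['Fiction', 'Fiction']
print(rename_category(SAMPLE))
```

In Python, `.` does not match a newline by default, so each match stays within one line here. If several category elements shared a single line, the greedy (.+) could span from the first opening tag to the last closing tag; a non-greedy (.+?) avoids that.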

 

Example 2: Changing the order of attributes

The sample XML code listing below contains a list of products. Each product element has two attributes: id and size. Suppose your goal is to swap the order of these attributes in each product element, so that the size attribute comes before id. One way to achieve this is by using regular expressions.

 

<?xml version="1.0" encoding="UTF-8"?>
<products xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="products.xsd">
  <product id="1" size="10"/>
  <product id="2" size="20"/>
  <product id="3" size="30"/>
  <product id="4" size="40"/>
  <product id="5" size="50"/>
  <product id="6" size="60"/>
</products>

 

To solve the requirement, follow the steps below:

 

1. Press Ctrl+H to open the Find and Replace dialog box.

2. Click Use regular expressions.

3. In the Find field, enter the following text: <product id="(.+)" size="(.+)"/>. This regular expression matches each product element in the XML document. Notice that, to match the value of each attribute (which is not known in advance), the tagged expression (.+) is used twice. Each occurrence matches the value of one attribute, assumed to be one or more occurrences of any character (that is, .+).

4. In the Replace field, enter the following text: <product size="\2" id="\1"/>. This expression defines the replacement text for each matched product element. Notice that it uses two back-references, \1 and \2, which correspond to the first and second tagged expressions in the Find field. In other words, \1 means "the value of attribute id" and \2 means "the value of attribute size".


5. Click Replace All and observe the results. All product elements have now been updated so that attribute size comes before attribute id.
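The attribute swap can be checked the same way. This is a sketch in Python's `re` module rather than DiffDog itself, assuming both treat (…) groups and the \1 and \2 back-references alike for this input; SAMPLE and swap_attributes are illustrative names.

```python
import re

# Abbreviated copy of the products listing above (xsi attributes omitted for brevity).
SAMPLE = """<products>
  <product id="1" size="10"/>
  <product id="2" size="20"/>
  <product id="3" size="30"/>
</products>"""

# Find pattern from step 3: group 1 captures the id value, group 2 the size value.
FIND = r'<product id="(.+)" size="(.+)"/>'
# Replace text from step 4: write group 2 (size) first, then group 1 (id).
REPLACE = r'<product size="\2" id="\1"/>'


def swap_attributes(xml: str) -> str:
    """Move the size attribute before the id attribute in each <product> element."""
    return re.sub(FIND, REPLACE, xml)


print(swap_attributes(SAMPLE))
```

Each line of the result reads like <product size="10" id="1"/>. Because (.+) is greedy, it relies on backtracking to stop at the closing quote. A tighter variant, ([^"]+), only matches characters other than a double quote, so each group can never run past the end of its attribute value.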

© 2017-2023 Altova GmbH