Altova MapForce 2024 Enterprise Edition

The Collage object takes all the pages it receives from its parent and glues them into one large page. The Collage presents this page to its children (which are often splitters) as a single group that contains a single page. Collages are useful when, for example, a row of data starts at the bottom of one page and continues on the next page. The Collage object enables you to merge the parts of this row into one (see example below).
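The gluing idea can be pictured with a small conceptual model. The sketch below is not MapForce code and does not reflect how the Collage object is implemented; the `Snippet` class, the `collage` function, and the coordinate values are all hypothetical and serve only to show how content from several pages might end up in one shared coordinate space.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """A piece of text with a vertical position (hypothetical model)."""
    text: str
    top: float      # distance from the top of its own page
    height: float


def collage(pages, page_heights):
    """Stack pages vertically and return all snippets on one large page.

    pages        -- list of pages, each a list of Snippet objects
    page_heights -- height of each page, in the same units as Snippet.top
    """
    glued = []
    offset = 0.0
    for snippets, page_height in zip(pages, page_heights):
        for s in snippets:
            # Shift each snippet down by the total height of earlier pages.
            glued.append(Snippet(s.text, s.top + offset, s.height))
        offset += page_height
    return glued


# A row that starts at the bottom of page 1 and continues on page 2.
page1 = [Snippet("Bloomsbury", top=90.0, height=10.0)]
page2 = [Snippet("Children's Books", top=0.0, height=10.0)]
one_page = collage([page1, page2], [100.0, 100.0])
```

In this model, the two parts of the row end up directly below each other (at positions 90 and 100), so a splitter that works on the glued page can treat them as one row.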

For information about how to add objects to the model tree, see Insert an Object.

Example

The example discussed in this topic is similar to the template described in the topic Merge Source and Target. The sample PDF file is also similar to the one used there, except that one row spans two pages (screenshot below).

PDFEX_CollageSplitRow

To extract data from the sample document correctly, we will take the following steps:

1. We will create a separate Merge Source for each page. For each Merge Source, we will define the region manually (as opposed to using automatic table suggestions).

2. We will then combine the Merge Sources into one Merge Target.

3. We will add a Collage object as a child of the Merge Target. The Collage will glue together the snippets collected by the Merge Target.

4. The Collage will contain a Split object with a group of Text Captures, each Capture representing a particular column of the table.

Model tree

For details about the Merge Sources, the Merge Target, and the Filter/Group object wrapped in the Split, see Merge Source and Target. The steps described above produce the following model tree:

PDFEX_CollageModelTree

Regions of Merge Sources

Because the first page ends with an edge and the second page starts with an edge, the bottom border of the region on the first page must lie above that bottom edge, and the top border of the region on the second page must lie below that top edge. The Collage then ignores both edges, and the two parts of the row that spans the page break are merged into one row. You can adjust the size of a region manually at any time: click the Region label in the PDF View pane and drag the border of the region to the desired location. In the screenshot below, the top border of the region on the second page has been dragged down.
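The effect of excluding the two edges can be illustrated with a toy model. The sketch below is only an illustration under simplifying assumptions, not MapForce behavior: each region is modeled as a list of strings, the hypothetical marker `EDGE` stands for a horizontal table edge, and the hypothetical `split_rows` function starts a new row at every edge.

```python
EDGE = "---"  # hypothetical marker for a horizontal table edge


def split_rows(lines):
    """Group consecutive text lines into rows, splitting at every edge."""
    rows, current = [], []
    for line in lines:
        if line == EDGE:
            if current:          # close the row collected so far
                rows.append(current)
                current = []
        else:
            current.append(line)
    if current:                  # last row if the input does not end with an edge
        rows.append(current)
    return rows


# Regions that include the bottom edge of page 1 and the top edge of page 2.
page1_region = [EDGE, "Row 1", EDGE, "Bloomsbury", EDGE]
page2_region = [EDGE, "Children's Books", EDGE, "Row 3", EDGE]

# Gluing the full regions keeps the two edges between the row parts.
untrimmed = split_rows(page1_region + page2_region)

# Regions adjusted so that the two edges fall outside them.
trimmed = split_rows(page1_region[:-1] + page2_region[1:])
```

In this model, `untrimmed` contains "Bloomsbury" and "Children's Books" as two separate rows, whereas `trimmed` contains them together in one row, which mirrors the reason for moving the region borders past the edges.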

PDFEX_CollageModifyRegion

The Collage now looks as follows:

PDFEX_CollageBeforeMerge

The Split object inside the Collage has correctly identified the split positions, and the two parts of the row that spanned the page break are now treated as a single row (screenshot below).

PDFEX_CollageSplitPositions

Output

As a result of splitting the Collage into rows, the row that spanned across two pages now looks as follows in the Output pane:

<Book>
   <Title>Harry Potter and the Philosopher's Stone</Title>
   <Author>J.K. Rowling</Author>
   <ISBN>1408855895</ISBN>
   <Publisher>Bloomsbury
   Children's Books</Publisher>
   <PrintLength>352</PrintLength>
   <Year>2014</Year>
   <Genre>Fantasy</Genre>
   <Price>14.28</Price>
</Book>
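Note that the merged Publisher value still contains the line break that separated its two parts in the PDF. If the extracted XML is processed further outside MapForce, this whitespace may need to be normalized. The following is a minimal sketch that uses Python's standard `xml.etree.ElementTree` module; the variable names are illustrative, and the XML string simply repeats the output shown above.

```python
import xml.etree.ElementTree as ET

# The <Book> element as it appears in the Output pane.
xml_text = """<Book>
<Title>Harry Potter and the Philosopher's Stone</Title>
<Author>J.K. Rowling</Author>
<ISBN>1408855895</ISBN>
<Publisher>Bloomsbury
Children's Books</Publisher>
<PrintLength>352</PrintLength>
<Year>2014</Year>
<Genre>Fantasy</Genre>
<Price>14.28</Price>
</Book>"""

book = ET.fromstring(xml_text)

# Collapse any run of whitespace (including line breaks) into one space.
record = {child.tag: " ".join((child.text or "").split()) for child in book}

print(record["Publisher"])
```

After normalization, `record["Publisher"]` holds the publisher name on a single line, and the other fields are unchanged.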

© 2018-2024 Altova GmbH