Posts

Use Built-In XPath Functions


While developing one of the Altova Online Training courses, I sorted a list of books by author. I realized that my author field was a string holding the author’s full name, so the books were sorted by the first letter of that string, that is, by the author’s first name. Fixing the sort did not fit into the course, but you can easily extract the last name from the string and use it as the sort key with XPath functions. If you then use the books’ titles as a secondary sort key, you run into an issue with titles that start with “A”, “An”, or “The”. I want to use the title as the secondary sort key but ignore a leading definite or indefinite article.

[Figure: XSLT that outputs the book list with a sort corrected using XPath expressions]

Let’s take a look at how we created this XSLT code.

This article was written using XMLSpy as the platform, but the same XPath expressions can be used inside MapForce or StyleVision to achieve similar results.

We can start with a simple XML book list. We have four books with author and title nodes.

[Figure: XML book list]

An XSLT stylesheet to create a list of the books would look like this:

[Figure: XSLT that outputs the book list without a sort]

This generates the following output:

[Figure: Unsorted book list]

The books are output in the order they appear in the original data file. If we add xsl:sort to the xsl:for-each loop, we can arrange our output in other ways.

[Figure: XSLT that outputs the book list with a basic sort]

This generates a sorted list, but not one that is sorted properly.

[Figure: Output from XSL with basic sort]

Sorting on author as a plain string results in “Jules Verne” appearing ahead of “Mark Twain”. Also, “A Connecticut Yankee in King Arthur’s Court” appears ahead of “Adventures of Huckleberry Finn”. We want to ignore the indefinite article “A”, so that “Adventures of Huckleberry Finn” appears ahead of “A Connecticut Yankee in King Arthur’s Court”. We can use XPath expressions to extract the sorting keys we want.

[Figure: XSLT that outputs the book list with a sort corrected using XPath expressions]

Let’s examine the code before we look at the output. We replace “author” with “reverse(tokenize(author, ' '))[1]”. The tokenize function breaks the author string into tokens, using a single space as the separator. So, “Jules Verne” is tokenized into “Jules” and “Verne”. The reverse function reverses the order of the tokens to “Verne” and “Jules”. The 1 in square brackets selects the first item in the list, “Verne”. This is the value that xsl:sort uses to arrange the books. This is not a perfect solution, but it works in our case.

The title key looks convoluted, but the logic is straightforward. The “tokenize(title, ' ')[1]” expression extracts the first word of the title. So, the first if test is “Is the first word of the title the word ‘A’?”
If it is, we return the substring of the title that starts at its third character, thus eliminating “A” and the space. If the first word of the title is not “A”, we test again to see whether it is “The”. If it is, we use the substring of the title starting at its fifth character, thus eliminating “The” and the space. If both tests fail, we just pass the title along as the sorting key. We could add another test to see whether the first word is “An”, but it is not needed for this data set.

Executing this last XSLT, we get the following output.

[Figure: Output from XSL with corrected sort]

“Mark Twain” is now ahead of “Jules Verne”. “Adventures of Huckleberry Finn” appears ahead of “The Celebrated Jumping Frog of Calaveras County” and “A Connecticut Yankee in King Arthur’s Court”.

The flaw in our approach to the author string is that we want “Jules Verne” to be treated as “Verne, Jules” for the sort, so that if we had a book by “Jimmy Verne”, the sort would treat them as different authors. Our code does not, because it compares only the last name. Using “concat(reverse(tokenize(author, ' '))[1], reverse(tokenize(author, ' '))[2])” would sort “Jules Verne” and “Jimmy Verne” correctly, but this solution works only with two-word names. If an author had a suffix (“Martin Luther King, Jr.”) or more than two names (“George Herbert Walker Bush”), the code would fail. There are many exceptions to the general rules on alphabetizing names, and the code to allow for all variants goes far beyond the scope of this article.

What we wanted to show was the ability to manipulate XML data on the fly using XPath expressions. We do not always have complete control over the format of our data sources, but using the power of XPath expressions, we can transform the data into the format that we need. A copy of the files used in these examples is available here.
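For readers without access to the example files, the corrected sort described above can be sketched roughly as follows. This is a reconstruction from the prose, not the original stylesheet: the element names “books” and “book” and the plain-text output format are assumptions, while “author”, “title”, and the two sort-key expressions follow the article. It needs an XSLT 2.0 processor, since tokenize, reverse, and if/then/else are XPath 2.0 features.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumed input shape (the "books" and "book" element
     names are hypothetical; "author" and "title" come from the article):
     <books>
       <book>
         <author>Mark Twain</author>
         <title>A Connecticut Yankee in King Arthur's Court</title>
       </book>
       (further book elements follow)
     </books> -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="books/book">
      <!-- Primary key: the last space-separated token of the author name -->
      <xsl:sort select="reverse(tokenize(author, ' '))[1]"/>
      <!-- Secondary key: the title with a leading "A" or "The" dropped -->
      <xsl:sort select="if (tokenize(title, ' ')[1] = 'A') then substring(title, 3)
                        else if (tokenize(title, ' ')[1] = 'The') then substring(title, 5)
                        else title"/>
      <!-- Write one "author: title" line per book -->
      <xsl:value-of select="concat(author, ': ', title, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The two xsl:sort elements are applied in document order, so the author key takes precedence and the title key only breaks ties between books by the same author. Handling “An” would be one more else-if branch that uses substring(title, 4).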


XML & Digital Textbooks


Last Sunday’s New York Times had an interesting front-page article about digital textbooks for the K-12 market. The piece was undoubtedly inspired in part by Governor Arnold Schwarzenegger’s (he’s from California, by the way) recently announced initiative to replace some high school textbooks with digital versions. In fact, compared to standard printed texts, digital textbooks:

  • Can be more quickly and readily updated by publishers
  • Can often be purchased as individual chapters or a complete text
  • Are easier to store and transport, if downloaded to a portable computer
  • Can be combined with other digital materials, such as portions of other textbooks, periodical articles, instructor-provided materials, etc.
  • Can offer enormous cost savings because the elimination of materials, shipping, and storage costs is partially passed on to purchasers
  • Provide purchasing and procurement efficiencies
  • May feature learning tools such as hyperlinks to related learning modules, electronic annotation by students, keyword searches, additional graphics, and pop-up modules that furnish additional information

And so XML will finally have a chance to truly demonstrate its power in the K-12 market. For my part, I cannot think of a better example of the efficiencies of XML publishing than education. Certainly most, if not all, of the major educational publishers are already using XML workflows internally because of benefits like validation, single-source publishing, and amenability to standards and metadata tagging.

XML also gives publishers the ability to easily manage multi-dimensional educational content. Educational content, like textbooks and other learning materials, is usually structured around a fairly simple content model using word forms such as titles, paragraphs, and quotes. The second dimension of the content is contextual information: footnotes, glossary terms, highlighted items, anything that may be necessary to target a specific audience. For instance, a piece of content to be included in a sixth grade textbook would have different markup than if it were to be used in an eighth grade classroom.

The third dimension of K-12 educational content is the standards dimension. Standards are in most cases set at the state level and are used to ensure that teachers know exactly what topics they are teaching in a particular piece of content, so that they cover the complete set of standards for state aptitude tests, like the MCAS. The standards dimension itself has the potential for further layering as content producers adopt their own standards to guide teachers to other relevant standards and topics that the content is aligned to.

XML is particularly well-suited to digital publishing of educational content because it can easily separate or layer these dimensions and repurpose them in nearly unlimited ways without the need to rekey information. For example, one company mentioned in the article, CK-12 Foundation, develops free “flexbooks” that can be customized to correlate with state standards.
Without XML, this would be a nearly (if not completely) impossible undertaking. With XML, you can use many of the existing XML content creation tools to streamline the process.

So what has taken so long for the K-12 market to embrace XML-enabled digital learning materials? Well, it appears that the issue is an economic one. We still live in a country where many students do not have access to a computer, and few school districts have the means to provide one. Perhaps in the near future there will be a solution to this problem, and perhaps, just perhaps, California has just taken the first steps to lead us in the right direction.

So, where does Altova fit into this equation? Well, the Altova MissionKit offers support for intelligent XML content creation and editing for both technical and non-technical users. These tools give educational publishers and other content contributors the ability to work with structured XML content in a comfortable environment, with easy-to-use interfaces, entry helpers, drag-and-drop functionality, and a wide variety of options that make working in a team a flexible and even seamless process. Visit the Altova website to read more about the MissionKit, or download a free 30-day trial today!
