Resist Data Integration Redundancy


The Internet makes massive amounts of data available for lots of interesting applications. But whenever you design a unique analysis and presentation of information you don’t privately control, you risk that the owner will offer the same view at some point in the future, instantly making your application redundant.

That’s exactly what happened to the Groupon API data-mining project we originally wrote about in August, 2011. Fortunately, the core of our project is a MapForce graphical data mapping. We can quickly and easily tweak the mapping and repurpose it to present an entirely different data set that provides new value.

HTML output from MapForce and StyleVision

Read more…

Tags: , , ,

Building Web Pages – HTML Design with StyleVision


The rapid pace of today’s business environment means that information – along with the format in which it is required – changes often. Although some Web pages contain content that doesn’t often change (e.g., About Us and directions pages), the majority of today’s corporate Websites are continually updated with new data.
For this reason, many organizations choose to store Web content in XML. This allows organizations to develop content in a highly efficient manner because information in the XML file can be used for multiple purposes and in multiple output formats – the XML Schema associated with the XML file describes the content model.
StyleVision is a powerful stylesheet and report designer that can help you leverage XML. StyleVision will allow you to build Web pages with sophisticated formatting in a template-based, drag and drop design window. StyleVision auto-generates XSLT stylesheets so that you can integrate your design into a new or existing site – you can even generate ASPX Web applications right from the File menu.
clip_image001

In this post we’ll design a Web page that will show off some of StyleVision’s HTML formatting capabilities. Although StyleVision’s built in formatting capabilities allow you to create sophisticated designs via simple drag-and-drop, for this example we’ll use CSS3, images, and other standard design elements to create a Web page that doesn’t need to be reformatted when content changes.

Read more…

Tags: , ,

Processing the Groupon API – Epilogue


Rare edge cases can derail loosely coupled data mapping applications. This is especially true when you are consuming large datasets available over the Internet and have little or no influence over the source data. In this article we describe a debugging technique that lets developers working on data mapping and transformation projects quickly identify and accommodate unexpected data in a stream from a remote source. The Problem Last summer we wrote a series of blog posts describing how to work with the Groupon API to retrieve a subset of offers in all Groupon cities and format the list for a web browser or mobile device. MapForce output from the Groupn API, displayed on a mobile device We concluded with a command line to run a MapForce data mapping that calls the Groupon API over 150 times — once for each Groupon city, then filters the data to extract deals sold on the Internet instead of a physical location, and formats the results in HTML using StyleVision. Every morning we run the command line in a batch file that saves the HTML output on a local server so our colleagues can check it out with any Web browser to find interesting offers from all over the country. The mapping ran fine for more than two months until one day it failed with this error message: “Source-value “” of type dateTime could not be converted into target-type dateTime.” The specific explanation is that somewhere in the mapping where we expected a dateTime, we received an empty value. On a more abstract level, the error suggests a potential defect in the logic of our mapping strategy. Every time we call the Groupon API we receive a well-formed XML data stream enclosed in a <response> element, but the API specs do not include an XML Schema defining the data that may be returned. When we developed our mapping we needed to analyze the raw data and select the output we wanted, so our first step was to call the API to capture all the Groupon deals for one large metro area. We assumed we would get a large enough data sample to include every possible option in the API response. After our mapping ran successfully for two months, the API finally delivered a rare edge case that did not fit the pattern we expected. Debugging Tools MapForce provides debugging help. We can run our data mapping using the MapForce built in execution engine to see more details in the Messages window. MapForce Messages window siplays data mapping error The lines labeled Related location are hyperlinked back to components in the mapping where the error occurred. Clicking on the result error takes us to a format-dateTime function. format-dateTime function in MapForce We can either click the “” error or trace the value connector to identify the input element to the format-dateTime function. Either way, we locate the element that triggered the error. clip_image004 The suspect element resides in the input component that captures all the data returned by our calls to the Groupon API before any filtering or conversion takes place. When we designed the mapping, the endAt element in our sample data always reported the ending date and time for each Groupon offer, but for some reason we must have received an empty value in this field. If the error had occurred by running a local input file we could simply examine the file contents, but in this case the data came from multiple URLs, and is only held temporarily until it is mapped to the output component. Fortunately, we can apply a trick to easily modify the mapping and preserve all data received from the Groupon API. We simply copy the input component and paste a duplicate into the mapping. We can connect the response element from the original to the duplicate, which simultaneously maps all the child elements between the components. clip_image005 Our original input component is now connected to two output components. We can select which output component will be generated by the MapForce built-in execution engine by clicking the eye icon at the top right corner of any output component. The new output component simply saves a copy of everything in the input component. When we examine the raw data using XMLSpy, sure enough we find an empty element where we expected a date and time: clip_image006 The Solution Now that we know an offer might have no specific end time, we can plan for that possibility in the mapping. In the revised treatment of the endAt element, we do an if-test before the original format-dateTime function and provide an alternate output when the endAt element is empty. clip_image007 We had to work fast because all Groupon data is time sensitive. The edge case would eventually expire and disappear from the data stream. This experience showed us how important it is to have powerful debugging tools and to use them creatively, even after you think a data mapping project is running successfully! Altova MapForce is available in a free trial – the next edge case you solve could be your own. Editor’s Note: Our original series on mapping data from the Groupon API ran in three parts you can see by clicking the links here: Part 1 of Processing the Groupon API with Altova MapForce describes how to create dynamic input by collecting data from multiple URLs. Processing the Groupon API with MapForce – Part 2 describes how we filtered data from the API and defined the output to extract only the most interesting details. Processing the Groupon API – Part 3 describes formatting the output as a single HTML document optimized for desktop and mobile devices, and reviews ways to automate repeat execution.

Tags: , , , ,

DiffDog Takes to the Cloud


Techy folks generally have a good diff tool they rely on to compare and sync files and directories. But what happens when, as more and more info is bound for the cloud, your data lives on servers accessed via URL? DiffDog diff/merge tool There are myriad applications today that live on servers accessed via HTPP – but let’s take a look at a common example: SVN. Subversion (SVN) repositories include WebDAV as a commonly used server option. WebDAV is a natural protocol for SVN because its concern is hierarchy, structured metadata, and versions. Since WebDAV is an extension of HTTP it gives easy access to basic information about files and folders to any HTTP-aware client, including DiffDog – Altova’s diff/merge tool for files, directories, and databases. However, DiffDog knows a few tricks that set it apart from the other breeds.

Diff/Merge via WebDAV

SVN clients typically support command line differencing; however, a text-only representation of the changes in even one file can be hard to read and use. When you want to compare the trunk against a tagged version, the problem is magnified.  There are several visual differencing tools available that can help with analyzing version changes in SVN. They have varying degrees of compatibility with how SVN works. Some tools are well integrated with the SVN command line. DiffDog includes all the common comparison options for a tool that is tightly integrated with SVN clients.  Where it excels is its ability to talk to SVN servers.  Accessing an SVN repository with DiffDog using WebDAV is simple. The easiest starting point is to open Directory Comparison View and paste in the URLs of the folders you want to compare. In this case we’re comparing SVN branches on Projectlocker.com. The two sets of files open, and DiffDog provides a color-coded, browsable view of the differences between the two directories. Directory Comparison in DiffDog   Clicking on either one of a pair of files opens a detailed file comparison.   File comparison in DiffDog DiffDog’s ability to distinguish between changes to XML and meaningful changes is key in this situation – most development trees have some amount of XML in them.  DiffDog also supports comparing Word docs and databases – so all bases are covered. XML-aware diff options Of course, folders you compare do not have to both be WebDAV SVN folders.  It is equally straightforward to compare the SVN server with a local directory. DiffDog’s ability to access servers via HTTP (or FTP) opens a world of possibilities: comparing a local directory with a Google Docs directory, or diffing a local Web server against files hosted on the Amazon CloudFront , or even just synching photos between your local drive and your chosen back- up service.   If you’d like to try DiffDog, it’s available for a 30-day trial over on the Altova Web site.

Tags: , , ,

Digging deeper with the Twitter API: iPhone 4S vs. Galaxy Nexus


We found some interesting data when we dug below the surface of the iPhone 4S vs. Galaxy Nexus debate using the Twitter Search API.In today’s world there is a vast quantity of data available online that can be used for research, market analysis, and competitive intelligence. While “Big Data” can be a problem for those who produce it, store it, and compile it, it is highly beneficial for those of us who are looking for answers.Some of that data is fortunately available to be queried online, and, in particular, there is a vast quantity of data on social media interactions out there.TweetsQueryingSearchAPIIn this article we will explore how to use the Twitter Search API from MapForce, Altova’s data mapping/conversion/integration tool, to aggregate data on recent user submissions (“tweets”) on two highly popular topics – the Apple “iPhone 4S” vs. the “Galaxy Nexus” as the latest hot Android phone – and extract some statistical data about the users engaged in those discussions. One of the benefits of this abundance of data available to us today is that we can query it in interesting ways and extract new meaning from it. While there are undoubtedly many existing services that already provide trends over Twitter topics (e.g., Trendistic), those services only offer very simple trends and do not allow us to query any deeper.But all of the underlying data is available for grabs if you are just willing to learn a tiny bit about web service APIs and how to use them to extract XML data for further processing. As a starting point, let’s use the Twitter Search API to query the stream of recent tweets for the last 100 postings that are about the “Galaxy Nexus”. The Usage Guidelines for Twitter Search tell us that using both words in a query will result in the use of the default operator, which is AND, so we are going to search for posts that contain “Galaxy AND Nexus”. So let’s try that and request the most recent 100 items:

http://search.twitter.com/search.atom?q=galaxy+nexus&rpp=100

If you follow this link, you will get a second window with a lot of raw XML data that is formatted according to the Atom Syndication Format specifications. Alternatively, you could request the data in JSON format, if you wanted to directly process it via JavaScript code by hand, but we will use the XML-based Atom format so that we can easily analyze the data and extract the information we want.Viewing the above search result in a browser is not very user-friendly, so we can take a quick peek at the XML data in our favorite XML Editor using the Open from URL function:TweetsAtomGridAs you can see, the data for each entry includes a language code, so for this example we will extract data from this Twitter feed as well as from a second search result on the “iPhone 4S” and combine them into one intermediate XML file for further analysis.Extracting XML data is really easy in MapForce: using the “Insert XML File” option to drop in an XML source, we can again specify the same URL as before. If needed, MapForce will automatically create an XML Schema for the supplied data so we can visualize it and extract information from it:TweetAtomMappingIn our mapping we have dropped in two sources on the left side – one using a query string to search for “Galaxy Nexus” and the other to search for “iPhone 4S” – and on the right side we have dropped in a simple XML Schema that will allow us to aggregate our data and analyze it more conveniently going forward. In this case the mapping between the two sides is straight-forward as we are only extracting basic information about the user, the date, and the language of the tweet, but in other applications the mapping could be more complicated and include functions as well as queries to other data sources, databases, or web services…Previewing the resulting XML data can be done directly inside MapForce using the output tab, and this is what we see as a result of our data transformation:TweetsRawDataNow we can easily use the reporting capabilities of StyleVision to group this data by language within each topic and count the number of posts in each language. We can then report this data in the form of pie charts, which produces the following interesting results:TweetsByLanguageObviously, this data is highly dependent on the date of execution and time of day, as well as the particular announcements happening about these products, so the numbers will fluctuate quite a bit, but it can be used as a nice monitoring for seeing different language-specific trends. And once this has been set up, the report can be refreshed easily with the click of a button to get a snapshot at that point in time. For more long-term analysis it would of course be necessary to modify the mapping a bit to query more than 100 recent tweets.In this article we have used Twitter’s Search API as one example data source and only looked at language as one unique data point, but there are many more interesting sources of data available online today, and this approach can be used on all of them in a similar fashion.If you want to experiment with other data sources and other kinds of information that you want to extract, we invite you to try for yourself. A free 30-day evaluation version of MapForce is available, and there are no limits on how you can use the other features of Altova’s data mapping and conversion tool for data processing tasks that go beyond analyzing social media trends…

Tags: , , , ,