Altova Mailing List Archives>Archive Index >microsoft.public.xsl Archive Home >Recent entries >Thread Prev - Re: PHP 5, XSL transformations of big files >Thread Next - Re: PHP 5, XSL transformations of big files Re: PHP 5, XSL transformations of big filesTo: NULL Date: 5/5/2008 10:07:00 PM On Sun, 4 May 2008 13:12:49 -0700 (PDT), Meglio <x.meglio@g...> wrote: >> >> >> > No problem with time execution, no problem with memory limits. The >> >> > problem is script execution time - it do not completed even in 10 >> >> > minutes. >> >> >> You had three quick responses yet seem to have picked on the one that least >> >> suited you, did any of the other suggestions help? >> >> >No. I do not know how to move forward to solve this problem. >> In the meantime, profiling PHP using XDebug and analysing the output >I have not skills to use CacheGrind and XDebug with PHP Actually, you can just run those on a windows workstation with minimal effort - cachegrind just reads the xdebug extension's output profile - I set those up last week first time in about 20 minutes. I'm sure you could too. >> I'll take a look at the XSL to see if it's doing unnecessary work. Well the bad news is the XSLT you provided is doing at least twice the work it needs to - the are two instances of the call to process <xsl:for-each select="./Category"> at the start of the stylesheet. The next things I noticed are (1) you're using XML default output - setting <xsl:output type="text" /> would be more appopriate for generating a SQL script for consumption by the DB server. (2) You're using xsl:for-each, which is slower than using xsl:apply-templates in pretty much all XSL processors I can think of (3) You're unnecessarily asking the processor to traverse to the current parent node (which you're already in) by using the syntax like <xsl:value-of select="./Gravity" /> - try - <xsl:value-of select="Gravity" /> instead (4) You're doing a lot of tail recursion to quote strings, and the actual quoting can still be bypassed in several ways (which is a security risk if you don't trust the source provider) As Joe noted, PHP5 supports XMLTextReader. In the document above, you're not doing anything other than a forward read - there's no need to invoke XSL to transform to flat text, as other tools exist. Similarly, although SimpleXML might look attractive, it will also load the entire document into memory, which isn't what we want here. I spent a while to run through this and generated the following script, which is close to your requirements (obviously run locally checking SQL field order and that no unexpected blank entries are made). It outputs the SQL lines as fast as possible, and discards used data by resetting sitenodes array on each pass. Probably you'd want to use this as a starting point to create a class rather than just a bunch of recursive functions. Note we're using PDO to generate the quoted string (enable that extension, which you should be using anyway), and only quoting the listed strings in your original XSL (adding Id field, which you forgot to quote in the original XSL - which is a string) After a few optimisations, I ran this on your data in 170 seconds - including building the SQL output script (14.5MB). I did that on a Celeron 400 Mhz with 160MB of memory, PHP5, MySQL5 and Apache 2. It should be a very small part of a minute on a modern machine ;-)) HTH Cheers - Neil ================================================ <?php error_reporting(E_ALL); ini_set('max_execution_time', 3600); $start = microtime(true); define('CONFIG_OUTPUT_SQL', 'D:\\htdocs\\XML\\sqlout.sql'); define('CONFIG_SQL_USER', 'your_db_username'); define('CONFIG_SQL_PASS', 'your_db_password'); define('CONFIG_SQL_CONN', 'mysql:host=127.0.0.1;port=3306;dbname=your_default_db'); define('CONFIG_INPUT_XML', 'D:\\htdocs\\XML\\marketplace_feed_v1.xml'); $conn = new PDO(CONFIG_SQL_CONN, CONFIG_SQL_USER, CONFIG_SQL_PASS); $fp = fopen(CONFIG_OUTPUT_SQL, 'w+'); $reader = new XMLReader; $reader->open(CONFIG_INPUT_XML); while ($reader->read()) { if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'Category') { parseCategoryNodes(); } } $reader->close(); $conn->close(); fclose($fp); function parseCategoryNodes() { global $reader, $conn, $fp; $sQuery = "INSERT INTO marketplace_tmp (category, subcategory, id, title, description, popularityrank, hasrecurringproducts, gravity, earnedpersale, percentpersale, totalearningspersale, totalrebillamt, referred, commission) VALUES "; while ($reader->read()) { if ($reader->nodeType == XMLReader::ELEMENT) { switch ($reader->name) { case 'Name' : $reader->read(); if ($reader->nodeType == XMLReader::TEXT) { $current_category = $conn->quote($reader->value); } break; case 'Site' : $sitenodes = parseSiteNodes(); $query = $sQuery. "("; $query .= $current_category.", "; $query .= $current_category.", "; $query .= join(", ", $sitenodes); $query .= ")\r\n"; fwrite($fp, $query); } // End switch ($reader->name) } } // End while ($reader->read()) fclose($fp); } function parseSiteNodes() { global $reader, $conn; $sitenodes = array(); while ($reader->read()) { // Check if we reached the end of the 'Site' container node, return the resultset if ($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'Site') { return $sitenodes; } if ($reader->nodeType == XMLReader::ELEMENT) { // DB field names happen to be lower case here (but needn't be for MySQL) $nodename = strtolower($reader->name); // Read next text node which follows the field name element $reader->read(); if ($reader->nodeType == XMLReader::TEXT || $reader->nodeType == XMLReader::CDATA) { // Process site nodes with category info into quoted DB strings depending on driver if ($nodename == 'id' || $nodename == 'title' || $nodename == 'description') { $sitenodes[$nodename] = $conn->quote($reader->value); } else { $sitenodes[$nodename] = ($reader->value == '' ? 0 : $reader->value); } } } } } $end = microtime(true); print("Duration : ".(($end - $start) * 1000)." msec"); ?> ================================================ ------------------------------------------------ Digital Media MVP : 2004-2008 http://mvp.support.microsoft.com/mvpfaqs | ||||||
| Company | Legal | Press | Partners | Careers | Sitemap | Contact Us | Altova Blog | Mobile | Full Site | |||
|
