Altova Mailing List Archives>Archive Index >microsoft.public.xsl Archive Home >Recent entries >Thread Prev - Re: PHP 5, XSL transformations of big files >Thread Next - Re: PHP 5, XSL transformations of big files Re: PHP 5, XSL transformations of big filesTo: NULL Date: 5/12/2008 9:36:00 AM On May 6, 1:07 am, "Neil Smith [MVP Digital Media]" <n...@nospam.com> wrote: > On Sun, 4 May 2008 13:12:49 -0700 (PDT), Meglio <x.meg...@gmail.com> > wrote: > > > > >> >> > No problem with time execution, no problem with memory limits. The > >> >> > problem is script execution time - it do not completed even in 10 > >> >> > minutes. > > >> >> You had three quick responses yet seem to have picked on the one that least > >> >> suited you, did any of the other suggestions help? > > >> >No. I do not know how to move forward to solve this problem. > >> In the meantime, profiling PHP using XDebug and analysing the output > >I have not skills to use CacheGrind and XDebug with PHP > > Actually, you can just run those on a windows workstation with minimal > effort - cachegrind just reads the xdebug extension's output profile - > I set those up last week first time in about 20 minutes. I'm sure you > could too. > > >> I'll take a look at the XSL to see if it's doing unnecessary work. > > Well the bad news is the XSLT you provided is doing at least twice the > work it needs to - the are two instances of the call to process > <xsl:for-each select="./Category"> at the start of the stylesheet. > > The next things I noticed are > > (1) you're using XML default output - setting <xsl:output type="text" > /> would be more appopriate for generating a SQL script for > consumption by the DB server. > > (2) You're using xsl:for-each, which is slower than using > xsl:apply-templates in pretty much all XSL processors I can think of > > (3) You're unnecessarily asking the processor to traverse to the > current parent node (which you're already in) by using the syntax like > <xsl:value-of select="./Gravity" /> - try - > <xsl:value-of select="Gravity" /> instead > > (4) You're doing a lot of tail recursion to quote strings, and the > actual quoting can still be bypassed in several ways (which is a > security risk if you don't trust the source provider) > > As Joe noted, PHP5 supports XMLTextReader. In the document above, > you're not doing anything other than a forward read - there's no need > to invoke XSL to transform to flat text, as other tools exist. > > Similarly, although SimpleXML might look attractive, it will also load > the entire document into memory, which isn't what we want here. > > I spent a while to run through this and generated the following > script, which is close to your requirements (obviously run locally > checking SQL field order and that no unexpected blank entries are > made). It outputs the SQL lines as fast as possible, and discards used > data by resetting sitenodes array on each pass. > > Probably you'd want to use this as a starting point to create a class > rather than just a bunch of recursive functions. > > Note we're using PDO to generate the quoted string (enable that > extension, which you should be using anyway), and only quoting the > listed strings in your original XSL (adding Id field, which you forgot > to quote in the original XSL - which is a string) > > After a few optimisations, I ran this on your data in 170 seconds - > including building the SQL output script (14.5MB). I did that on a > Celeron 400 Mhz with 160MB of memory, PHP5, MySQL5 and Apache 2. > > It should be a very small part of a minute on a modern machine ;-)) > > HTH > Cheers - Neil > > ================================================ > > <?php > > error_reporting(E_ALL); > ini_set('max_execution_time', 3600); > $start = microtime(true); > > define('CONFIG_OUTPUT_SQL', 'D:\\htdocs\\XML\\sqlout.sql'); > define('CONFIG_SQL_USER', 'your_db_username'); > define('CONFIG_SQL_PASS', 'your_db_password'); > define('CONFIG_SQL_CONN', > 'mysql:host=127.0.0.1;port=3306;dbname=your_default_db'); > define('CONFIG_INPUT_XML', > 'D:\\htdocs\\XML\\marketplace_feed_v1.xml'); > > $conn = new PDO(CONFIG_SQL_CONN, CONFIG_SQL_USER, CONFIG_SQL_PASS); > > $fp = fopen(CONFIG_OUTPUT_SQL, 'w+'); > > $reader = new XMLReader; > $reader->open(CONFIG_INPUT_XML); > > while ($reader->read()) { > if ($reader->nodeType == XMLReader::ELEMENT && $reader->name > == 'Category') { > parseCategoryNodes(); > } > > } > > $reader->close(); > $conn->close(); > fclose($fp); > > function parseCategoryNodes() { > > global $reader, $conn, $fp; > > $sQuery = "INSERT INTO marketplace_tmp (category, subcategory, > id, title, description, popularityrank, hasrecurringproducts, gravity, > earnedpersale, percentpersale, totalearningspersale, totalrebillamt, > referred, commission) VALUES "; > > while ($reader->read()) { > > if ($reader->nodeType == XMLReader::ELEMENT) { > switch ($reader->name) { > case 'Name' : > $reader->read(); > if ($reader->nodeType == > XMLReader::TEXT) { > $current_category = > $conn->quote($reader->value); > } > break; > > case 'Site' : > $sitenodes = parseSiteNodes(); > > $query = $sQuery. "("; > $query .= $current_category.", > "; > $query .= $current_category.", > "; > $query .= join(", ", > $sitenodes); > $query .= ")\r\n"; > fwrite($fp, $query); > > } // End switch ($reader->name) > } > } // End while ($reader->read()) > > fclose($fp); > > } > > function parseSiteNodes() { > > global $reader, $conn; > $sitenodes = array(); > > while ($reader->read()) { > // Check if we reached the end of the 'Site' container node, > return the resultset > if ($reader->nodeType == XMLReader::END_ELEMENT && > $reader->name == 'Site') { > return $sitenodes; > } > > if ($reader->nodeType == XMLReader::ELEMENT) { > // DB field names happen to be lower case here (but needn't be > for MySQL) > $nodename = strtolower($reader->name); > // Read next text node which follows the field name element > $reader->read(); > > if ($reader->nodeType == XMLReader::TEXT || > $reader->nodeType == XMLReader::CDATA) { > > // Process site nodes with category info into quoted DB strings > depending on driver > if ($nodename == 'id' || $nodename == > 'title' || $nodename == 'description') { > $sitenodes[$nodename] = > $conn->quote($reader->value); > } else { > $sitenodes[$nodename] = > ($reader->value == '' ? 0 : $reader->value); > } > } > } > } > > } > > $end = microtime(true); > > print("Duration : ".(($end - $start) * 1000)." msec"); > ?> > > ================================================ > ------------------------------------------------ > Digital Media MVP : 2004-2008http://mvp.support.microsoft.com/mvpfaqs Hi, Neil. Thank you again for you comments and advices. I improved your PHP script (you do not take into account that there are subcategories inside categories) and I embedded it to my automation and it works good and fast now (up to 1 minute). Thank you very much for you help! I can donate some encouragement if you have PayPal or Google Checkout account ;) You helped me a lot really!! Anton | ||||||
| Company | Legal | Press | Partners | Careers | Sitemap | Contact Us | Altova Blog | Mobile | Full Site | |||
|
