| kbulgrien | Newbie | Longview, TX | None Specified | Monday, June 27, 2011 | Monday, June 27, 2011 9:21:48 PM | Posts: 1 |
| Since this is a topic of interest to me, I'll add a few numbers as a data point of success with a large file.  On a 3 GHz Xeon with Windows XP and 2 GB of RAM, it was possible to load two ~55 MB XML data files and diff them as XML.  Diff'd as text, the files showed 1800+ differences; diff'd as XML, only about 450. 
 In my case, with a single processor, the CPU is pegged at 100% while the diff runs and the system is basically unusable... though this might be because the system disk is quite full, so the page file may be badly fragmented.  It might be worth moving the page file to a clean, unfragmented disk.
 
 I could retain a modicum of usability by taking the priority of the DiffDog process down a notch, but the system was still very sluggish.
 
 During the text diff, the page file appeared to be about 1.1 GB.
 
 I'm not sure how much it matters under the hood, but the native encoding of the files being diff'd was UTF-16LE.  I converted them to UTF-8 before diff'ing, which halved the files from over 100 MB each to about 55 MB.
 |  |
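The UTF-16LE to UTF-8 conversion mentioned above can be done without loading a whole file into memory. The following is only a minimal sketch of one way to do it, not the tool the poster actually used; the function name `convert_utf16le_to_utf8` and the `chunk_chars` parameter are made up for illustration. It assumes the input really is UTF-16LE, with or without a leading byte-order mark.

```python
import os


def convert_utf16le_to_utf8(src_path, dst_path, chunk_chars=1 << 20):
    """Stream-convert a UTF-16LE text file to UTF-8 (hypothetical helper).

    Returns (source_size_bytes, destination_size_bytes).
    """
    # newline="" disables newline translation so line endings are preserved.
    with open(src_path, "r", encoding="utf-16-le", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        first = True
        while True:
            chunk = src.read(chunk_chars)  # read a bounded number of characters
            if not chunk:
                break
            if first:
                # The "utf-16-le" codec keeps a BOM as U+FEFF; drop it so the
                # UTF-8 output does not start with a BOM.
                chunk = chunk.lstrip("\ufeff")
                first = False
            dst.write(chunk)
    return os.path.getsize(src_path), os.path.getsize(dst_path)
```

For mostly-ASCII XML, each character shrinks from two bytes to one, which is consistent with the files roughly halving in size. If the XML declaration names the encoding (for example `encoding="UTF-16"`), it would probably need to be updated by hand after conversion; this sketch does not touch it.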