|
|
Rank: Advanced Member
Joined: 1/6/2011 Posts: 31
|
Hi all,
This might be a bit off-topic, but we need to find out in advance whether this is possible in MapForce before we look for alternatives.
We have two main source text files (each can contain up to 20 million records and be up to 30 GB in size).
Since the mapping is relatively simple, MapForce is definitely our preferred tool, but the files are very large (30 GB). Will MapForce be able to handle that? If yes, how is the performance?
Thanks a lot for helping.
|
|
Rank: Advanced Member
Joined: 12/13/2005 Posts: 2,856 Location: Mauritius
|
No, MapForce will be unable to handle such files. In its current version MapForce is able to write files of any size (for example, this lets you dump a whole database into text or XML files, or combine many small files into one), but not to read them - this is planned for the future, but is not trivial.
If you want to stick with MapForce, you will have to split your files into much smaller chunks beforehand.
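For the splitting step, a minimal sketch in Python (not a MapForce feature) might look like the following. It assumes one record per line and no header row; the function name `split_by_lines` and the chunk file naming are made up for illustration. It reads one line at a time, so memory use does not grow with the size of the source file.

```python
from pathlib import Path


def split_by_lines(source, out_dir, lines_per_chunk):
    """Split a text file into numbered chunk files, one line at a time.

    Assumes one record per line and no header row (hypothetical layout).
    Returns the list of chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    out = None
    try:
        # newline="" keeps the original line endings untouched
        with open(source, "r", encoding="utf-8", newline="") as src:
            for i, line in enumerate(src):
                if i % lines_per_chunk == 0:
                    # start a new chunk file every lines_per_chunk lines
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(chunk_paths):05d}.txt"
                    chunk_paths.append(path)
                    out = open(path, "w", encoding="utf-8", newline="")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

The right `lines_per_chunk` has to be found by experiment, since the workable chunk size depends on the machine and on the mapping.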
|
|
Rank: Newbie
Joined: 8/3/2011 Posts: 2 Location: Belgium
|
Hi Vlad,
You said "you will have to split your files into much smaller chunks".
Can you be more precise about the size of the files?
==> What's the maximum size of a source file that MapForce can handle?
Thanks, Bayram
|
|
Rank: Advanced Member
Joined: 12/13/2005 Posts: 2,856 Location: Mauritius
|
This depends on a great many factors, but the main ones are:
- whether you are using the 64-bit or the 32-bit variant
- in the 64-bit case, how much memory you have
- the structure of your file
- the mapping itself
So, basically, if you really want an answer to this question, you will have to try it in your particular case.
By the way, Altova has streaming reading on its to-do list. Together with streaming writing, which is already available, this would remove any real size limit. But nobody can say for sure when they will be able to implement it...
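To illustrate what streaming reading combined with streaming writing means in general, here is a small Python sketch using the standard library's `xml.etree.ElementTree.iterparse`. It says nothing about how MapForce implements streaming; the function name `extract_fields` and the tag names passed to it are hypothetical. Each record is written out and then cleared as soon as it has been read, so the whole document is never held in memory at once.

```python
import csv
import xml.etree.ElementTree as ET


def extract_fields(xml_source, csv_target, record_tag, field_tags):
    """Stream records from an XML file into a CSV file.

    record_tag and field_tags are hypothetical element names chosen by
    the caller. Returns the number of rows written.
    """
    rows = 0
    with open(csv_target, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(field_tags)
        # "end" fires once an element has been read completely
        for _event, elem in ET.iterparse(xml_source, events=("end",)):
            if elem.tag == record_tag:
                writer.writerow(
                    [elem.findtext(tag, default="") for tag in field_tags]
                )
                rows += 1
                # drop the record's children and text; the empty element
                # itself stays attached to its parent
                elem.clear()
    return rows
```

A mapping of this shape needs memory for roughly one record at a time, which is why a sort (which needs every record before it can emit the first one) behaves so differently.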
|
|
Rank: Newbie
Joined: 7/22/2011 Posts: 5 Location: Germany
|
Hi, according to the release notes, 'New features in MapForce Version 2012 include: Streaming input/reading for XML using Built-in execution engine'. I tried this latest 64-bit version with a huge XML input and CSV output and found it working with a simple 3-field extract. The XML input file was 450 MB and produced 2.1 million rows in a 70 MB CSV output file. The simple mapping extracts 3 fields from the lowest level of a complex XML structure (see screenshot below). Watching the MapForce process in Sysinternals Process Explorer, I saw it consume about 1.2 GB of memory. For the announced 'Streaming input/reading' that seems like a lot of memory.
In a second mapping I then added a sort, which caused it to fail because all memory on my 4 GB PC was consumed.
I was wondering:
1) whether 'Streaming input/reading' was actually used, or is applicable at all, in my second scenario
2) whether I could change something in the second mapping to make it work
Can anybody comment or (even better) explain how to verify that a mapping actually uses streaming?
[Screenshot: simple mapping]
[Screenshot: simple mapping with sort failed]
Michael
|
|
Rank: Advanced Member
Joined: 12/13/2005 Posts: 2,856 Location: Mauritius
|
Starting with v2012, MapForce always processes files this way, provided you run the transformation in MapForce itself (the built-in execution engine) rather than in generated code. If you believe that MapForce is using too much memory in your particular case, you need to give Altova a chance to see what is going on - contact Altova Support with your files (or at least snippets of them).
But it is clear why MapForce fails when you add a sort component - a sort has to load all data at once in order to actually sort it, so in this case streaming brings no advantage.
|
|
Rank: Newbie
Joined: 7/22/2011 Posts: 5 Location: Germany
|
Hi Vlad,
Thanks for answering. Actually, I'd conclude that 'sort has to load all data at once in order to actually sort it' is more precisely implemented as 'sort loads all input data at once into memory in order to actually sort it', when really only the data required for the output needs to be sorted (which is a fraction of the input). Sadly, this implementation approach will never live up to the marketing claim of 'unlimited file size', because sorting is such a common requirement in any data transformation! I started my IT life in the 'IT Stone Age', when there was never enough main memory available, so I know there are implementation approaches that can deal successfully with resource constraints. Let's hope a marketing or development manager responds to this shortcoming and adds it to the requirements list for the next release.
Regards
Michael
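One well-known approach of this kind is an external merge sort: sort batches that fit in memory, write each batch to a temporary file, then merge the sorted files. The Python sketch below illustrates the technique only; it is not how MapForce sorts, and the function name `external_sort` and its `max_lines_in_memory` parameter are made up for this example. It sorts whole lines as plain strings.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(source, target, max_lines_in_memory):
    """Sort the lines of a text file while holding only a bounded
    number of lines in memory at any time (external merge sort)."""
    run_paths = []
    # Phase 1: cut the input into sorted runs stored in temporary files.
    with open(source, "r", encoding="utf-8") as src:
        while True:
            batch = list(itertools.islice(src, max_lines_in_memory))
            if not batch:
                break
            # make sure every line ends with a newline so that lines
            # from different runs cannot be glued together when merged
            batch = [ln if ln.endswith("\n") else ln + "\n" for ln in batch]
            batch.sort()
            fd, path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w", encoding="utf-8") as run:
                run.writelines(batch)
            run_paths.append(path)
    # Phase 2: merge the runs; heapq.merge reads them lazily, so only
    # one pending line per run is held in memory.
    runs = [open(p, "r", encoding="utf-8") for p in run_paths]
    try:
        with open(target, "w", encoding="utf-8") as out:
            out.writelines(heapq.merge(*runs))
    finally:
        for f in runs:
            f.close()
        for p in run_paths:
            os.remove(p)
```

The price is extra disk I/O and temporary disk space roughly equal to the size of the data being sorted.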
|
|
Rank: Advanced Member
Joined: 12/13/2005 Posts: 2,856 Location: Mauritius
|
OK, maybe I was not very exact - yes, of course, only the data required for the output must be loaded for sorting. Nevertheless, in order to sort all the data, MapForce has to collect all of it - and this means the success of the mapping depends on how much memory you have.
This is a matter of common sense, not of marketing or development management.
P.S. It would indeed be a good idea for Altova to discuss such things in the documentation, even though they are self-evident to most developers.
|
|
Rank: Newbie
Joined: 7/22/2011 Posts: 5 Location: Germany
|
Well, 'common sense' led me to believe that a 70 MB CSV output file would fit into 4 GB of memory, but in reality it doesn't. That is the reason for my frustration with the marketing claims.
|
|
Rank: Advanced Member
Joined: 12/13/2005 Posts: 2,856 Location: Mauritius
|
I don't know why you keep talking about the output file, which is 70 MB, when the problem is with the input file, which is 450 MB. I don't know how much experience you have with loading whole XML files into memory, but in the past XMLSpy was documented to require about ten times the original file size. And that was on a 32-bit system. On a 64-bit system every pointer is twice as big - you can usually count on another 50% more space. I'm not sure how big the part of the source that needs to be sorted is, or how big the index representing the sort is, but if you have a 4 GB 64-bit system, then you most probably have about 2 GB available for a normal application.
Sorry, but you simply do not have enough memory. If you really have no chance to add memory, then you should stick to a 32-bit system, which is a much better fit for a 4 GB machine.
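As a rough back-of-the-envelope check, the rule of thumb above can be written out in a few lines of Python. The factor of 10 and the extra 50% are the figures quoted in this post, not measured values, and the function name is made up for illustration. The result is an upper bound for loading the whole document; a mapping that only has to hold the fields being sorted would need less.

```python
def estimated_memory_mb(file_size_mb, expansion_factor=10.0, pointer_overhead=1.5):
    """Rule-of-thumb memory estimate for a fully loaded XML file.

    expansion_factor: in-memory size relative to file size (32-bit figure
    quoted in the thread). pointer_overhead: extra factor for 64-bit
    pointers (1.0 for a 32-bit process). Both are rough assumptions.
    """
    return file_size_mb * expansion_factor * pointer_overhead


input_mb = 450
available_mb = 2 * 1024  # "about 2 GB available for a normal application"

needed_32bit = estimated_memory_mb(input_mb, pointer_overhead=1.0)  # 4500.0
needed_64bit = estimated_memory_mb(input_mb)                        # 6750.0
fits = needed_64bit <= available_mb                                 # False
```

Both estimates are well above the roughly 2 GB assumed to be available on a 4 GB machine, which is consistent with the sort mapping failing.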
|
|