Altova User Support Forum

Source File Size Limit

gymac

Posted: Tuesday, April 12, 2011 6:00:11 PM

Rank: Advanced Member

Joined: 1/6/2011
Posts: 31

Hi all,

This might be off-topic, but we need to find out in advance whether this is possible in MapForce before we look at alternatives.

We have two main source text files (each can contain up to 20 million records and be up to 30 GB in size).

Since the mapping is relatively simple, MapForce is definitely our preferred tool, but the files are very large (30 GB). Will MapForce be able to handle that? If so, how is the performance?

Thanks a lot for helping.

vlad

Posted: Tuesday, April 12, 2011 7:22:30 PM

Rank: Advanced Member

Joined: 12/13/2005
Posts: 2,856
Location: Mauritius

No, MapForce will be unable to handle such files. In its current version MapForce can write files of any size (for example, this lets you dump a whole database into text or XML files, or combine many small files into one), but it cannot read them. Reading large files is planned for the future, but it is not trivial.

If you want to stick with MapForce, you will have to split your files into much smaller chunks beforehand.
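
For readers who want to try the splitting approach, here is a minimal sketch in Python. It is not part of MapForce; the function name `split_by_lines`, the chunk file naming, and the UTF-8 encoding are all assumptions made for illustration. It reads the source line by line, so memory use stays small regardless of the file size. Note that it does not repeat a header row in each chunk, which you may need for CSV input.

```python
from pathlib import Path


def split_by_lines(source, out_dir, lines_per_chunk):
    """Split a text file into numbered chunks of at most lines_per_chunk lines.

    Hypothetical helper: names and encoding are assumptions, not a MapForce API.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    out = None
    # newline="" keeps the original line endings untouched.
    with open(source, "r", encoding="utf-8", newline="") as src:
        for index, line in enumerate(src):
            if index % lines_per_chunk == 0:
                # Start a new chunk file every lines_per_chunk lines.
                if out is not None:
                    out.close()
                path = out_dir / f"chunk_{len(chunk_paths):05d}.txt"
                chunk_paths.append(path)
                out = open(path, "w", encoding="utf-8", newline="")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths
```

A suitable `lines_per_chunk` has to be found by experiment, since the thread gives no fixed size limit.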

Bayram

Posted: Wednesday, August 3, 2011 12:54:12 PM

Rank: Newbie

Joined: 8/3/2011
Posts: 2
Location: Belgium

Hi Vlad,

You said "you will have to split your files into much smaller chunks".

Can you be more precise about the size of the files?

==> What is the maximum source file size that MapForce can handle?

Thanks,
Bayram

vlad

Posted: Wednesday, August 3, 2011 6:35:16 PM

Rank: Advanced Member

Joined: 12/13/2005
Posts: 2,856
Location: Mauritius

This depends on a million factors, but the main ones are:

- whether you are using the 64-bit or the 32-bit version
- in the case of 64-bit, how much memory you have
- the structure of your file
- the mapping itself

So, basically, if you really want an answer to this question, you will have to try it with your particular case.

By the way, Altova has streaming reading on its to-do list. Together with streaming writing, which is already available, this would remove any real limits. But nobody can say for sure when they will be able to implement it...
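
To illustrate what "streaming reading" means in general, here is a small Python sketch. It says nothing about how MapForce implements it; the function name `extract_fields` and the tag names in the usage note are made up, and it assumes the XML uses no namespaces. The idea is that each record is written out and then detached from the tree as soon as its end tag is seen, so the whole document is never held in memory.

```python
import csv
import xml.etree.ElementTree as ET


def extract_fields(xml_source, csv_target, record_tag, field_tags):
    """Stream record_tag elements and write the given child fields as CSV rows.

    Hypothetical helper for illustration; assumes no XML namespaces.
    """
    writer = csv.writer(csv_target)
    writer.writerow(field_tags)
    rows = 0
    stack = []  # currently open elements, used to find each record's parent
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()
        if elem.tag == record_tag:
            writer.writerow([elem.findtext(tag, default="") for tag in field_tags])
            rows += 1
            if stack:
                # Detach the finished record so it can be garbage collected.
                stack[-1].remove(elem)
    return rows
```

Usage would look like `extract_fields(open("input.xml", "rb"), open("out.csv", "w", newline=""), "item", ["a", "b", "c"])`, with file and tag names chosen to match your data.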

miho

Posted: Monday, March 26, 2012 1:55:36 PM

Rank: Newbie

Joined: 7/22/2011
Posts: 5
Location: Germany

Hi,
According to the release notes, 'New features in MapForce Version 2012 include: Streaming input/reading for XML using Built-in execution engine'.
I tried this latest 64-bit version with a huge XML input and CSV output and found that it works with a simple 3-field extract.
The XML input file was 450 MB in size and produced 2.1 million rows in a 70 MB CSV output file.
The simple mapping extracts 3 fields from the lowest level of a complex XML structure (see screenshot below).
Looking at the MapForce process with Sysinternals Process Explorer, it consumed about 1.2 GB of memory.
For the announced 'Streaming input/reading', I thought this was actually a lot of memory.

In a second mapping I then added a sort, which caused it to fail because all the memory on my 4 GB PC was consumed.

I was wondering if:
1) 'Streaming input/reading' was actually used, or is applicable at all, in my second scenario
2) I could change something in the second mapping to make it work

Is anybody in a position to comment or (even better) show us a way to verify whether a mapping uses streaming?

simple mapping

simple mapping with sort failed

Michael

vlad

Posted: Tuesday, March 27, 2012 7:16:58 AM

Rank: Advanced Member

Joined: 12/13/2005
Posts: 2,856
Location: Mauritius

Starting with v2012, MapForce always processes files this way, provided you transform files with MapForce itself rather than generating code. If you believe that MapForce is using too much memory in your particular case, you need to give Altova a chance to see what is going on: contact Altova Support with your files (or at least snippets of them).

But it is clear why MapForce fails when you add a sort component: a sort has to load all data at once in order to actually sort it, so in this case streaming brings no advantage.

miho

Posted: Tuesday, March 27, 2012 8:31:37 AM

Rank: Newbie

Joined: 7/22/2011
Posts: 5
Location: Germany

Hi Vlad,
Thanks for answering.
Actually, I'd conclude that 'sort has to load all data at once in order to actually sort it' is, more precisely, implemented as 'sort loads all input data at once into memory in order to actually sort it', whereas really only the data required for output needs to be sorted (which is a fraction of the input).
Sadly, this implementation approach will never live up to the marketing claim of 'unlimited file size', because sorting is such a common requirement in any data transformation!
I started my IT life in the 'IT Stone Age', when there was never enough main memory available, so I know there are implementation approaches that can deal successfully with resource constraints.
Let's hope a marketing or development manager responds to this shortcoming and adds it to the requirements list for the next release.
Regards
Michael
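
One well-known approach of the kind mentioned above is an external merge sort: sort the data in memory-sized runs, spill each run to a temporary file, then merge the sorted runs. The Python sketch below only illustrates the general technique and is not a description of anything MapForce does. The name `external_sort` and its parameters are made up, and it assumes every input line ends with a newline.

```python
import heapq
import tempfile


def external_sort(lines, output, max_lines_in_memory, key=None):
    """Sort newline-terminated lines while holding only a bounded number in memory.

    Hypothetical helper illustrating an external merge sort.
    """
    run_files = []

    def flush(buffer):
        # Sort one run in memory and spill it to a temporary file.
        buffer.sort(key=key)
        run = tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="")
        run.writelines(buffer)
        run.seek(0)
        run_files.append(run)

    buffer = []
    for line in lines:
        buffer.append(line)
        if len(buffer) >= max_lines_in_memory:
            flush(buffer)
            buffer = []
    if buffer:
        flush(buffer)

    try:
        # heapq.merge reads the sorted runs lazily, one line per run at a time.
        output.writelines(heapq.merge(*run_files, key=key))
    finally:
        for run in run_files:
            run.close()
```

With very many runs this simple version can hit the operating system's limit on open files; a fuller implementation would merge in several passes.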

vlad

Posted: Tuesday, March 27, 2012 10:36:50 AM

Rank: Advanced Member

Joined: 12/13/2005
Posts: 2,856
Location: Mauritius

OK, maybe I was not very exact. Yes, of course, only the data required for output must be loaded for sorting. Nevertheless, in order to sort all the data, MapForce has to collect all of it, and this means the success of the mapping depends on how much memory you have.

This is a matter of common sense, not of marketing or development management.

P.S. It would indeed be a good idea for Altova to discuss such things in the documentation, even though they are self-evident to most developers.

miho

Posted: Tuesday, March 27, 2012 3:05:19 PM

Rank: Newbie

Joined: 7/22/2011
Posts: 5
Location: Germany

Well, 'common sense' led me to believe that a 70 MB CSV output file would fit into 4 GB of memory, but in reality it doesn't.
That is the reason for my frustration with the marketing claims.

vlad

Posted: Friday, March 30, 2012 9:10:52 PM

Rank: Advanced Member

Joined: 12/13/2005
Posts: 2,856
Location: Mauritius

I don't know why you keep talking about the 70 MB output file when the problem is the 450 MB input file. I don't know how much experience you have with loading whole XML files into memory, but in the past XMLSpy was documented to require about 10 times the original file size. And that was on a 32-bit system. On a 64-bit system every pointer is twice as big, so you can usually count on another 50% more space. I'm not sure how big the part of the source that needs to be sorted is, or how big the index representing the sort is, but if you have a 4 GB 64-bit system, then you most probably have about 2 GB available for a normal application.

Sorry, but you simply do not have enough memory. If you really cannot add more memory, then you should stick to a 32-bit system, which is a much better fit for a machine with 4 GB.
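
The rule of thumb above can be put into numbers. The Python sketch below uses only the figures quoted in this post (10-fold expansion, 50% extra on 64-bit, roughly 2 GB available); the function name `estimate_in_memory_mb` is made up. It is a rough upper bound for loading the whole document, not a measurement of what MapForce actually allocates.

```python
def estimate_in_memory_mb(file_size_mb, expansion_factor=10.0, pointer_overhead=1.5):
    """Rough in-memory size of a fully loaded XML file, per the rule of thumb above.

    Hypothetical helper; the default factors are the figures quoted in the post.
    """
    return file_size_mb * expansion_factor * pointer_overhead


needed_mb = estimate_in_memory_mb(450)  # the 450 MB input file
available_mb = 2 * 1024                 # about 2 GB for a normal application
print(f"estimated need: {needed_mb:.0f} MB, available: {available_mb} MB")
```

For the 450 MB input this gives an estimate of 6750 MB, well above the roughly 2048 MB assumed to be available.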
