Home All Groups Group Topic Archive Search About
Author
20 Feb 2006 12:01 PM
James
I am using vb.net and need to keep in memory a large data structure, so I am
looking for the best option. And after several test I am pretty confused. So
I will be grateful if anyone can help me.



My basic need is:



Keep an "array" (or other structure) of pointers to "arrays" (or other
structure) of bytes. Example:



arrayPointer(1,000,000) each one point to arraByte1(), arrayByte2(),
arrayByte3().., arrayByte1000000().

So it is possible to redim each arrayByte() to the dimension I need.

I supposed that a configuration like the above (suppose each arrayByte hold
1 Byte) needs:



Pointers: 4 bytes x 1 Million = 4 Mb

Arrays: 1 byte per array x 1 Million arrays = 1 Mb



I have trying different option and I obtained a very high memory
comsumption, is there any way to keep memory low and achieve what I need?,

I am using "GC.GetTotalMemory(True)" to find out the memory usage is that
right?





Array of structures

Public Structure stbyte

        Public byt() As Byte

End Structure



Memory = GC.GetTotalMemory(True)

Dim intPo(1000000) As stbyte ' 1 Million structures

For intI = 1 To 1000000

ReDim intPo(intI - 1).byt(0) ' Redim to 1 byte each

Next

Memory = GC.GetTotalMemory(True) - Memory

Memory = Memory / 1000000



I obtained Memory= 20 Mbytes





Jagged Array of Bytes

Memory = GC.GetTotalMemory(True)

Dim bytes(1000000)() As Byte ' 1 Million Jagged array

For intI = 1 To 1000000

ReDim bytes(intI - 1)(1) ' Each vector to 1 byte

Next

Memory = GC.GetTotalMemory(True) - Memory

Memory = Memory / 1000000



I obtained Memory= 20 Mbytes





Rectangular Array of Bytes

Memory = GC.GetTotalMemory(True)

Dim bytes(1000000, 0) As Byte ' 1 Million array of 1 byte each

Memory = GC.GetTotalMemory(True) - Memory

Memory = Memory / 1000000



I obtained Memory = 1 Mbytes (but in this case I can not redim each row).





Array of BitsArrays

Memory = GC.GetTotalMemory(True)

Dim bitArray(1000000) As BitArray ' 1 Million bitarrays

For intI = 1 To 1000000

bitArray(intI - 1) = New BitArray(8) ' Each bitarray 8 bits

Next

Memory = GC.GetTotalMemory(True) - Memory

Memory = Memory / 1000000



I obtained Memory = 40 Mbytes

Author
20 Feb 2006 12:29 PM
Cor Ligthert [MVP]
James,

The first question you have to ask yourself forever with arrays is. "Are all
elements of the array used". If not than in what amount not.

Maybe is by instance the HashTable than a better choise.

However in my opinion is there not to give a general answer based on your
question as it is.

Cor
Author
20 Feb 2006 1:17 PM
James
Probably the question is not well define. What I need:

I need the less consuming memory data structure that allow me to:

1) Keep a list of "Pointers" or "references" to data structures (for this a
Hashtable is fine).
2) Each data structure can be resize dynamically, each one with different
sizes.

An example could be:
An array of pointers, each pointer points to an array of bytes. That is: 1
array of M pointers, and M arrays of bytes (each one can hod different
number of bytes).

Thanks,

james



Show quoteHide quote
"Cor Ligthert [MVP]" <notmyfirstn***@planet.nl> escribió en el mensaje
news:%23nuNPkhNGHA.2912@tk2msftngp13.phx.gbl...
> James,
>
> The first question you have to ask yourself forever with arrays is. "Are
> all elements of the array used". If not than in what amount not.
>
> Maybe is by instance the HashTable than a better choise.
>
> However in my opinion is there not to give a general answer based on your
> question as it is.
>
> Cor
>
>
Author
20 Feb 2006 2:04 PM
Patrice
The big picture may help (is this for things like spreadsheet cells ? What
is the nature of those informations ?).

For example a "sparse array" structure is a combination of elementar data
structures that allows to store only filled in cells. (i.e. you can have a
200x200 cells array but non used cells won't take storage).

Also as a side note that there is AFAIK not a single solution for this (for
example basically it uses a list of rows/columns and each list holds a list
of the cells on the row/column) but as long as the external interface is
exposed as an array (cells(x,y)) you can actually change the actual
implementation with no or minimal impact on the main code...

--
Patrice

Show quoteHide quote
"James" <i***@pricetech.es> a écrit dans le message de
news:u4556$hNGHA.2560@TK2MSFTNGP09.phx.gbl...
> Probably the question is not well define. What I need:
>
> I need the less consuming memory data structure that allow me to:
>
> 1) Keep a list of "Pointers" or "references" to data structures (for this
a
> Hashtable is fine).
> 2) Each data structure can be resize dynamically, each one with different
> sizes.
>
> An example could be:
> An array of pointers, each pointer points to an array of bytes. That is: 1
> array of M pointers, and M arrays of bytes (each one can hod different
> number of bytes).
>
> Thanks,
>
> james
>
>
>
> "Cor Ligthert [MVP]" <notmyfirstn***@planet.nl> escribió en el mensaje
> news:%23nuNPkhNGHA.2912@tk2msftngp13.phx.gbl...
> > James,
> >
> > The first question you have to ask yourself forever with arrays is. "Are
> > all elements of the array used". If not than in what amount not.
> >
> > Maybe is by instance the HashTable than a better choise.
> >
> > However in my opinion is there not to give a general answer based on
your
> > question as it is.
> >
> > Cor
> >
> >
>
>
>
Author
20 Feb 2006 4:39 PM
James
Ok! These are my needs:

I have a list of words (more than a million different words) I keep this
words in a hashtable for easy access. For each of the words I need to keep
in compress (binary form) the documents ID where each word belong (it can be
hundreds of documents for each word). Sometimes I need to add more documents
ID to some of the words. That is:

Words            DocumentID
Paris -> doc123, doc234, doc345, doc345,...
London->doc2, doc34, doc234,...
....
So I need to keep in memory (that is what i am looking for low memory usage)
the Words (actully I used a hashtable) and some structure to hold the
document ID's for each word. Sometimes I need to add new documents to some
of the words (normally after the last document for that word), so the data
structure that hold the documents must be able to increase it lenght.

My first thought was to use a jagged array but it consume a lot of overhead
memory:
Dim bytes(1000000)() As Byte ' 1 Million Jagged array
For intI = 1 To 1000000
ReDim bytes(intI - 1)(1) ' Each vector to 1 byte
Next
This consume: 20 Mbytes of data?? when I was just supposing it will consume
1 Mbyte

Any help??






Show quoteHide quote
"Patrice" <a@bc.c> escribió en el mensaje
news:%230JjGaiNGHA.2528@TK2MSFTNGP12.phx.gbl...
> The big picture may help (is this for things like spreadsheet cells ? What
> is the nature of those informations ?).
>
> For example a "sparse array" structure is a combination of elementar data
> structures that allows to store only filled in cells. (i.e. you can have a
> 200x200 cells array but non used cells won't take storage).
>
> Also as a side note that there is AFAIK not a single solution for this
> (for
> example basically it uses a list of rows/columns and each list holds a
> list
> of the cells on the row/column) but as long as the external interface is
> exposed as an array (cells(x,y)) you can actually change the actual
> implementation with no or minimal impact on the main code...
>
> --
> Patrice
>
> "James" <i***@pricetech.es> a écrit dans le message de
> news:u4556$hNGHA.2560@TK2MSFTNGP09.phx.gbl...
>> Probably the question is not well define. What I need:
>>
>> I need the less consuming memory data structure that allow me to:
>>
>> 1) Keep a list of "Pointers" or "references" to data structures (for this
> a
>> Hashtable is fine).
>> 2) Each data structure can be resize dynamically, each one with different
>> sizes.
>>
>> An example could be:
>> An array of pointers, each pointer points to an array of bytes. That is:
>> 1
>> array of M pointers, and M arrays of bytes (each one can hod different
>> number of bytes).
>>
>> Thanks,
>>
>> james
>>
>>
>>
>> "Cor Ligthert [MVP]" <notmyfirstn***@planet.nl> escribió en el mensaje
>> news:%23nuNPkhNGHA.2912@tk2msftngp13.phx.gbl...
>> > James,
>> >
>> > The first question you have to ask yourself forever with arrays is.
>> > "Are
>> > all elements of the array used". If not than in what amount not.
>> >
>> > Maybe is by instance the HashTable than a better choise.
>> >
>> > However in my opinion is there not to give a general answer based on
> your
>> > question as it is.
>> >
>> > Cor
>> >
>> >
>>
>>
>>
>
>
>
Author
21 Feb 2006 12:42 AM
Dennis
Have you considered using a class object that has a "Word" property and a
Collection of Document ID"s.  You would have a class instance for each word.
--
Dennis in Houston


Show quoteHide quote
"James" wrote:

> Ok! These are my needs:
>
> I have a list of words (more than a million different words) I keep this
> words in a hashtable for easy access. For each of the words I need to keep
> in compress (binary form) the documents ID where each word belong (it can be
> hundreds of documents for each word). Sometimes I need to add more documents
> ID to some of the words. That is:
>
> Words            DocumentID
> Paris -> doc123, doc234, doc345, doc345,...
> London->doc2, doc34, doc234,...
> ....
> So I need to keep in memory (that is what i am looking for low memory usage)
> the Words (actully I used a hashtable) and some structure to hold the
> document ID's for each word. Sometimes I need to add new documents to some
> of the words (normally after the last document for that word), so the data
> structure that hold the documents must be able to increase it lenght.
>
> My first thought was to use a jagged array but it consume a lot of overhead
> memory:
> Dim bytes(1000000)() As Byte ' 1 Million Jagged array
> For intI = 1 To 1000000
> ReDim bytes(intI - 1)(1) ' Each vector to 1 byte
> Next
> This consume: 20 Mbytes of data?? when I was just supposing it will consume
> 1 Mbyte
>
> Any help??
>
>
>
>
>
>
> "Patrice" <a@bc.c> escribió en el mensaje
> news:%230JjGaiNGHA.2528@TK2MSFTNGP12.phx.gbl...
> > The big picture may help (is this for things like spreadsheet cells ? What
> > is the nature of those informations ?).
> >
> > For example a "sparse array" structure is a combination of elementar data
> > structures that allows to store only filled in cells. (i.e. you can have a
> > 200x200 cells array but non used cells won't take storage).
> >
> > Also as a side note that there is AFAIK not a single solution for this
> > (for
> > example basically it uses a list of rows/columns and each list holds a
> > list
> > of the cells on the row/column) but as long as the external interface is
> > exposed as an array (cells(x,y)) you can actually change the actual
> > implementation with no or minimal impact on the main code...
> >
> > --
> > Patrice
> >
> > "James" <i***@pricetech.es> a écrit dans le message de
> > news:u4556$hNGHA.2560@TK2MSFTNGP09.phx.gbl...
> >> Probably the question is not well define. What I need:
> >>
> >> I need the less consuming memory data structure that allow me to:
> >>
> >> 1) Keep a list of "Pointers" or "references" to data structures (for this
> > a
> >> Hashtable is fine).
> >> 2) Each data structure can be resize dynamically, each one with different
> >> sizes.
> >>
> >> An example could be:
> >> An array of pointers, each pointer points to an array of bytes. That is:
> >> 1
> >> array of M pointers, and M arrays of bytes (each one can hod different
> >> number of bytes).
> >>
> >> Thanks,
> >>
> >> james
> >>
> >>
> >>
> >> "Cor Ligthert [MVP]" <notmyfirstn***@planet.nl> escribió en el mensaje
> >> news:%23nuNPkhNGHA.2912@tk2msftngp13.phx.gbl...
> >> > James,
> >> >
> >> > The first question you have to ask yourself forever with arrays is.
> >> > "Are
> >> > all elements of the array used". If not than in what amount not.
> >> >
> >> > Maybe is by instance the HashTable than a better choise.
> >> >
> >> > However in my opinion is there not to give a general answer based on
> > your
> >> > question as it is.
> >> >
> >> > Cor
> >> >
> >> >
> >>
> >>
> >>
> >
> >
> >
>
>
>
>
Author
21 Feb 2006 2:05 AM
Brian Gideon
Dennis,

That might not be a bad idea.  That class could encapsulate the
compression/decompression of the word and associated document names for
more efficient storage.  Sure, there'll be a performance hit, but with
1,000,000 word associations the OP is going to have to get creative
with the data structure.  It'll be an excerise in balancing space vs.
performance.

That's one thing you didn't tell us James.  I think we have a pretty
good handle on your memory requirements, but what about performance.
Is that important?  What is the typical scenario for accessing the data
structure?

Brian

Dennis wrote:
Show quoteHide quote
> Have you considered using a class object that has a "Word" property and a
> Collection of Document ID"s.  You would have a class instance for each word.
> --
> Dennis in Houston
>
>