Home All Groups Group Topic Archive Search About

Defining a block of consecutive rows in a datatable

Author
11 Apr 2010 10:32 AM
prodata
I've got a large set of time-series data which is organised _strictly_
in time sequence and contained in a (VB2005) datatable.

I want to iterate through this data to extract various hourly means,
which means identifying blocks of consecutive rows and computing
hourly aggregate functions (mean, SD etc).

I do have this working but by progressively working through the table
from top to bottom, identifying a block of rows with the same hour
value, copying this block of rows to a separate scratchpad datatable
and using various datatable.compute aggregate functions on all of the
rows in the scratchpad table.

I'm sure this isn't the most efficient approach in that it would be
better to work on the full original datatable and to pass a filter
expression to the datatable.compute command to indicate which
particular block of row numbers should be used in the calculation, but
being only an occasional ADO.Net user I don't know how to build the
filter.

What I want to say for the filter argument is something like 'Where
Rownumber Between R1 And R2" or 'Rownumber >=R1 AND Rownumber <=R2".
(I will know which row numbers constitute the relevant block and I
don't want to filter by datetime value - each hourly block would be
little more than 0.01% of an annual datatable, so not efficient to
repeatedly test the whole datatable just to select a particular hour's
worth of data.)

So I guess the question is whether there is any way of accessing the
row number property from a filter expression. (I guess I could
manually build in a row number column into the table and filter on
that, but maybe this is unnecesssary.)

Anyone able to help please?

Author
11 Apr 2010 1:10 PM
Cor Ligthert[MVP]
Hi gproData,

Where did you get the idea that loops take a lot of time?

If you see some short code, then it often done behind the scene as well in a
loop.

But you can of course use Linq to Dataset, but don't assume it goes quicker.

(At least it uses some more code)

http://msdn.microsoft.com/en-us/vbasic/bb688086.aspx

Success

Cor

Show quoteHide quote
"prodata" <gprod***@googlemail.com> wrote in message
news:418fa776-d981-414d-b457-3c8fbcc346a9@v20g2000yqv.googlegroups.com...
> I've got a large set of time-series data which is organised _strictly_
> in time sequence and contained in a (VB2005) datatable.
>
> I want to iterate through this data to extract various hourly means,
> which means identifying blocks of consecutive rows and computing
> hourly aggregate functions (mean, SD etc).
>
> I do have this working but by progressively working through the table
> from top to bottom, identifying a block of rows with the same hour
> value, copying this block of rows to a separate scratchpad datatable
> and using various datatable.compute aggregate functions on all of the
> rows in the scratchpad table.
>
> I'm sure this isn't the most efficient approach in that it would be
> better to work on the full original datatable and to pass a filter
> expression to the datatable.compute command to indicate which
> particular block of row numbers should be used in the calculation, but
> being only an occasional ADO.Net user I don't know how to build the
> filter.
>
> What I want to say for the filter argument is something like 'Where
> Rownumber Between R1 And R2" or 'Rownumber >=R1 AND Rownumber <=R2".
> (I will know which row numbers constitute the relevant block and I
> don't want to filter by datetime value - each hourly block would be
> little more than 0.01% of an annual datatable, so not efficient to
> repeatedly test the whole datatable just to select a particular hour's
> worth of data.)
>
> So I guess the question is whether there is any way of accessing the
> row number property from a filter expression. (I guess I could
> manually build in a row number column into the table and filter on
> that, but maybe this is unnecesssary.)
>
> Anyone able to help please?
Author
11 Apr 2010 1:48 PM
Cor Ligthert[MVP]
By the way a block of consecutive rows in a datatable don't exist, in fact
they are ordered in the ways they are added without looking at the content.

But you can use a DataView to get a view on the rows in a consecutive way by
using the sort and rowfilter properties.

Show quoteHide quote
"prodata" <gprod***@googlemail.com> wrote in message
news:418fa776-d981-414d-b457-3c8fbcc346a9@v20g2000yqv.googlegroups.com...
> I've got a large set of time-series data which is organised _strictly_
> in time sequence and contained in a (VB2005) datatable.
>
> I want to iterate through this data to extract various hourly means,
> which means identifying blocks of consecutive rows and computing
> hourly aggregate functions (mean, SD etc).
>
> I do have this working but by progressively working through the table
> from top to bottom, identifying a block of rows with the same hour
> value, copying this block of rows to a separate scratchpad datatable
> and using various datatable.compute aggregate functions on all of the
> rows in the scratchpad table.
>
> I'm sure this isn't the most efficient approach in that it would be
> better to work on the full original datatable and to pass a filter
> expression to the datatable.compute command to indicate which
> particular block of row numbers should be used in the calculation, but
> being only an occasional ADO.Net user I don't know how to build the
> filter.
>
> What I want to say for the filter argument is something like 'Where
> Rownumber Between R1 And R2" or 'Rownumber >=R1 AND Rownumber <=R2".
> (I will know which row numbers constitute the relevant block and I
> don't want to filter by datetime value - each hourly block would be
> little more than 0.01% of an annual datatable, so not efficient to
> repeatedly test the whole datatable just to select a particular hour's
> worth of data.)
>
> So I guess the question is whether there is any way of accessing the
> row number property from a filter expression. (I guess I could
> manually build in a row number column into the table and filter on
> that, but maybe this is unnecesssary.)
>
> Anyone able to help please?
Author
11 Apr 2010 2:50 PM
prodata
On 11 Apr, 14:48, "Cor Ligthert[MVP]" <Notmyfirstn***@planet.nl>
wrote:
> By the way a block of consecutive rows in a datatable don't exist, in fact
> they are ordered in the ways they are added without looking at the content.

Yes I understand what you're saying. But in fact I control the order
that the rows are added in when creating the initial datatable so - at
least assuming that this order doesn't change during the lifetime of
the datatable - I can be confident that when I imagine that a given
block of rows is in strict datetime order then it really is.

But perhaps I might take from your comment that what I'm seeking to do
actually represents a relatively unusual and artificial situation and
so maybe is not well provided for by ADO.Net functions. Maybe I would
be better adding a column to the datatable that represents integer
hours since the very first entry in the datatable (which would be
calculated as each individual datarow was added) and then simply
performing the aggregate functions on each valid hour value in turn
(ie using the hour value as the filter in the .compute command).
Author
11 Apr 2010 3:22 PM
Mr. Arnold
prodata wrote:
Show quoteHide quote
> On 11 Apr, 14:48, "Cor Ligthert[MVP]" <Notmyfirstn***@planet.nl>
> wrote:
>> By the way a block of consecutive rows in a datatable don't exist, in fact
>> they are ordered in the ways they are added without looking at the content.
>
> Yes I understand what you're saying. But in fact I control the order
> that the rows are added in when creating the initial datatable so - at
> least assuming that this order doesn't change during the lifetime of
> the datatable - I can be confident that when I imagine that a given
> block of rows is in strict datetime order then it really is.
>
> But perhaps I might take from your comment that what I'm seeking to do
> actually represents a relatively unusual and artificial situation and
> so maybe is not well provided for by ADO.Net functions. Maybe I would
> be better adding a column to the datatable that represents integer
> hours since the very first entry in the datatable (which would be
> calculated as each individual datarow was added) and then simply
> performing the aggregate functions on each valid hour value in turn
> (ie using the hour value as the filter in the .compute command).


Datatable? Why bother? Why are you not using a List<T> of accessor
objects? Why are you not using Linq with a where clause to query the
List<T> of objects, along with using the Linq aggregate functions?
Author
11 Apr 2010 4:31 PM
Cor Ligthert[MVP]
Why do you create an extra collections when you have already a Ilist
implementing collection of rows.

Are you afraid that you don't use all the memory in a computer.

A datatable is a very good IListSource implementing Type



Show quoteHide quote
"Mr. Arnold" <Arn***@Arnold.com> wrote in message
news:OU2vNrY2KHA.5660@TK2MSFTNGP04.phx.gbl...
> prodata wrote:
>> On 11 Apr, 14:48, "Cor Ligthert[MVP]" <Notmyfirstn***@planet.nl>
>> wrote:
>>> By the way a block of consecutive rows in a datatable don't exist, in
>>> fact
>>> they are ordered in the ways they are added without looking at the
>>> content.
>>
>> Yes I understand what you're saying. But in fact I control the order
>> that the rows are added in when creating the initial datatable so - at
>> least assuming that this order doesn't change during the lifetime of
>> the datatable - I can be confident that when I imagine that a given
>> block of rows is in strict datetime order then it really is.
>>
>> But perhaps I might take from your comment that what I'm seeking to do
>> actually represents a relatively unusual and artificial situation and
>> so maybe is not well provided for by ADO.Net functions. Maybe I would
>> be better adding a column to the datatable that represents integer
>> hours since the very first entry in the datatable (which would be
>> calculated as each individual datarow was added) and then simply
>> performing the aggregate functions on each valid hour value in turn
>> (ie using the hour value as the filter in the .compute command).
>
>
> Datatable? Why bother? Why are you not using a List<T> of accessor
> objects? Why are you not using Linq with a where clause to query the
> List<T> of objects, along with using the Linq aggregate functions?
Author
11 Apr 2010 5:29 PM
Mr. Arnold
Cor Ligthert[MVP] wrote:
> Why do you create an extra collections when you have already a Ilist
> implementing collection of rows.

I moved away from datatables and datasets the moment I went to ADO.NET
Entity Framework and Linq-2-Objects.
>
> Are you afraid that you don't use all the memory in a computer.

Accessing a datatable has been proven to be a slow means of accessing
data, much slower to access than a collections of objects when it comes
to querying.
>
> A datatable is a very good IListSource implementing Type

I prefer the List<T> of objects with its ability to be quired using
Linq-2-Objects. Objects are much more flexible. I'll take objects every
time over datatable and dataset.


Show quoteHide quote
>
>
>
> "Mr. Arnold" <Arn***@Arnold.com> wrote in message
> news:OU2vNrY2KHA.5660@TK2MSFTNGP04.phx.gbl...
>> prodata wrote:
>>> On 11 Apr, 14:48, "Cor Ligthert[MVP]" <Notmyfirstn***@planet.nl>
>>> wrote:
>>>> By the way a block of consecutive rows in a datatable don't exist,
>>>> in fact
>>>> they are ordered in the ways they are added without looking at the
>>>> content.
>>>
>>> Yes I understand what you're saying. But in fact I control the order
>>> that the rows are added in when creating the initial datatable so - at
>>> least assuming that this order doesn't change during the lifetime of
>>> the datatable - I can be confident that when I imagine that a given
>>> block of rows is in strict datetime order then it really is.
>>>
>>> But perhaps I might take from your comment that what I'm seeking to do
>>> actually represents a relatively unusual and artificial situation and
>>> so maybe is not well provided for by ADO.Net functions. Maybe I would
>>> be better adding a column to the datatable that represents integer
>>> hours since the very first entry in the datatable (which would be
>>> calculated as each individual datarow was added) and then simply
>>> performing the aggregate functions on each valid hour value in turn
>>> (ie using the hour value as the filter in the .compute command).
>>
>>
>> Datatable? Why bother? Why are you not using a List<T> of accessor
>> objects? Why are you not using Linq with a where clause to query the
>> List<T> of objects, along with using the Linq aggregate functions?
>
Author
11 Apr 2010 4:26 PM
Cor Ligthert[MVP]
Why not, looks for me without your current need something you could do.

Seems to me vital for the data you are storing, as far as I understand it
from your reply.


Show quoteHide quote
"prodata" <gprod***@googlemail.com> wrote in message
news:df95b179-7f4d-4e51-9bcb-aba3496bab8d@8g2000yqz.googlegroups.com...
> On 11 Apr, 14:48, "Cor Ligthert[MVP]" <Notmyfirstn***@planet.nl>
> wrote:
>> By the way a block of consecutive rows in a datatable don't exist, in
>> fact
>> they are ordered in the ways they are added without looking at the
>> content.
>
> Yes I understand what you're saying. But in fact I control the order
> that the rows are added in when creating the initial datatable so - at
> least assuming that this order doesn't change during the lifetime of
> the datatable - I can be confident that when I imagine that a given
> block of rows is in strict datetime order then it really is.
>
> But perhaps I might take from your comment that what I'm seeking to do
> actually represents a relatively unusual and artificial situation and
> so maybe is not well provided for by ADO.Net functions. Maybe I would
> be better adding a column to the datatable that represents integer
> hours since the very first entry in the datatable (which would be
> calculated as each individual datarow was added) and then simply
> performing the aggregate functions on each valid hour value in turn
> (ie using the hour value as the filter in the .compute command).
Author
11 Apr 2010 4:33 PM
Cor Ligthert[MVP]
Why Not? Looks for me, without knowing all your current needs, something you
could do.

Show quoteHide quote
"Cor Ligthert[MVP]" <Notmyfirstn***@planet.nl> wrote in message
news:O#l8BPZ2KHA.4912@TK2MSFTNGP06.phx.gbl...
> Why not, looks for me without your current need something you could do.
>
> Seems to me vital for the data you are storing, as far as I understand it
> from your reply.
>
>
> "prodata" <gprod***@googlemail.com> wrote in message
> news:df95b179-7f4d-4e51-9bcb-aba3496bab8d@8g2000yqz.googlegroups.com...
>> On 11 Apr, 14:48, "Cor Ligthert[MVP]" <Notmyfirstn***@planet.nl>
>> wrote:
>>> By the way a block of consecutive rows in a datatable don't exist, in
>>> fact
>>> they are ordered in the ways they are added without looking at the
>>> content.
>>
>> Yes I understand what you're saying. But in fact I control the order
>> that the rows are added in when creating the initial datatable so - at
>> least assuming that this order doesn't change during the lifetime of
>> the datatable - I can be confident that when I imagine that a given
>> block of rows is in strict datetime order then it really is.
>>
>> But perhaps I might take from your comment that what I'm seeking to do
>> actually represents a relatively unusual and artificial situation and
>> so maybe is not well provided for by ADO.Net functions. Maybe I would
>> be better adding a column to the datatable that represents integer
>> hours since the very first entry in the datatable (which would be
>> calculated as each individual datarow was added) and then simply
>> performing the aggregate functions on each valid hour value in turn
>> (ie using the hour value as the filter in the .compute command).
>