GetDirectories Performance

Author
22 Apr 2006 11:07 PM
Tom Scales
I'm writing a VB.NET 2003 program that uses a Treeview to display the drive
structure on the computer.  I am having a major problem with performance.
There are many files on one drive (over a million) and it is killing me.  For
example, one directory has a structure:

Main
------ Sub directory
-------------- 10 Sub directories, each with roughly 30,000 files.


When I execute this code on the Sub directory (i.e. strPath = SubDirectory)

'------------------------------------------------------------------------
' Requires: Imports System.IO
Dim strPath As String = tn.FullPath ' Get the parent's path
Dim diDirectory As New DirectoryInfo(strPath)
Dim adiDirectories() As DirectoryInfo

Try
    ' Get an array of all sub-directories as DirectoryInfo objects.
    adiDirectories = diDirectory.GetDirectories()
Catch exp As Exception
    ' The directory could not be read (e.g. access denied); skip this node.
    Exit Sub
End Try
'------------------------------------------------------------------------

it takes over 15 minutes to complete.  This is on a P4-2.66 running XP Pro.

Any suggestions for improvement?

Thanks,

Tom

Author
22 Apr 2006 11:32 PM
Herfried K. Wagner [MVP]
"Tom Scales" <tjsca***@gmail.com> wrote:
> I'm writing a VB.NET 2003 program that uses a Treeview to display the
> drive structure on the computer.  I am having a major problem with
> performance.  There are many files on one drive (over a million) and it
> is killing me.  For example, one directory has a structure:

Instead of populating the whole treeview control on startup, add only the
nodes on the first level and check whether the folders contain subfolders.  If
they do, add a dummy subnode to the node representing the folder.  Then handle
the node expand event and replace the dummy node with nodes for the actual
files and folders contained in the folder.  This should give acceptable
performance, and memory usage will be much lower than when populating the
whole control.  In addition, in many cases it's very unlikely that the user
will expand every single node, so your application will occupy much less
memory in total.
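
The dummy-node idea described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's code: the class name `LazyTreeForm`, the control name `tvFolders`, the marker text `<dummy>` and the root `C:\` are all hypothetical. As a simplification it adds the placeholder to every folder without checking for subfolders first, and it lists only subfolders, not files.

```vbnet
Imports System
Imports System.IO
Imports System.Windows.Forms

Public Class LazyTreeForm
    Inherits Form

    ' Hypothetical marker text for the placeholder child node.
    Private Const DummyText As String = "<dummy>"

    Private WithEvents tvFolders As New TreeView()

    Public Sub New()
        tvFolders.Dock = DockStyle.Fill
        Controls.Add(tvFolders)
        ' Only the root is added up front (assumed to be C:\ here).
        AddFolderNode(tvFolders.Nodes, "C:\", "C:\")
    End Sub

    ' Adds a folder node with one placeholder child so the + sign is shown.
    Private Sub AddFolderNode(ByVal target As TreeNodeCollection, _
                              ByVal caption As String, ByVal folderPath As String)
        Dim node As New TreeNode(caption)
        node.Tag = folderPath
        node.Nodes.Add(DummyText)
        target.Add(node)
    End Sub

    ' Replaces the placeholder with the real subfolders on first expansion.
    Private Sub tvFolders_BeforeExpand(ByVal sender As Object, _
            ByVal e As TreeViewCancelEventArgs) Handles tvFolders.BeforeExpand
        Dim node As TreeNode = e.Node
        ' Nothing to do unless the only child is still the placeholder.
        If node.Nodes.Count <> 1 OrElse node.Nodes(0).Text <> DummyText Then Return

        tvFolders.BeginUpdate()
        Try
            node.Nodes.Clear()
            For Each subDir As String In Directory.GetDirectories(CStr(node.Tag))
                AddFolderNode(node.Nodes, Path.GetFileName(subDir), subDir)
            Next
        Catch ex As UnauthorizedAccessException
            ' Folder is not readable; leave the node empty.
        Catch ex As IOException
            ' Folder vanished or the drive is not ready; leave the node empty.
        Finally
            tvFolders.EndUpdate()
        End Try
    End Sub
End Class
```

With this variation a folder is only scanned when the user expands it; an empty folder shows a + sign until its first expansion. The form can be shown with `Application.Run(New LazyTreeForm())`.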

--
M S   Herfried K. Wagner
M V P  <URL:http://dotnet.mvps.org/>
V B   <URL:http://classicvb.org/petition/>
Author
23 Apr 2006 12:08 AM
Tom Scales
Unfortunately, that is essentially what I am doing.  I am adding enough
nodes to tell me if I need to add the + sign.  That's where I get bitten,
because the directory UNDER the one I am working with has 30,000+ files.
GetDirectories must search every file to see if it is a directory.  Very
inefficient code on MS's part.
Author
23 Apr 2006 5:07 AM
Frank Rizzo
Tom, it sounds like you would be better served by P/Invoking the
FindFirstFile and FindNextFile API calls.  They work through a directory
one entry at a time, so you'll have much more control over the operation.
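
A minimal sketch of that approach, assuming the goal is only to learn whether a folder has at least one subfolder. The module name `NativeDirectoryScan` and the function name `HasSubdirectory` are hypothetical; the Win32 functions are `FindFirstFile`, `FindNextFile` and `FindClose` from kernel32.dll. The loop stops at the first subdirectory it sees instead of building a complete array, but if the folder contains no subdirectory it still visits every entry.

```vbnet
Imports System
Imports System.IO
Imports System.Runtime.InteropServices

Public Module NativeDirectoryScan

    Private Const FILE_ATTRIBUTE_DIRECTORY As Integer = &H10
    Private ReadOnly INVALID_HANDLE_VALUE As New IntPtr(-1)

    ' Mirrors the Win32 WIN32_FIND_DATA layout; each FILETIME is two 32-bit fields.
    <StructLayout(LayoutKind.Sequential, CharSet:=CharSet.Auto)> _
    Private Structure WIN32_FIND_DATA
        Public dwFileAttributes As Integer
        Public ftCreationTimeLow As Integer
        Public ftCreationTimeHigh As Integer
        Public ftLastAccessTimeLow As Integer
        Public ftLastAccessTimeHigh As Integer
        Public ftLastWriteTimeLow As Integer
        Public ftLastWriteTimeHigh As Integer
        Public nFileSizeHigh As Integer
        Public nFileSizeLow As Integer
        Public dwReserved0 As Integer
        Public dwReserved1 As Integer
        <MarshalAs(UnmanagedType.ByValTStr, SizeConst:=260)> Public cFileName As String
        <MarshalAs(UnmanagedType.ByValTStr, SizeConst:=14)> Public cAlternateFileName As String
    End Structure

    <DllImport("kernel32.dll", CharSet:=CharSet.Auto, SetLastError:=True)> _
    Private Function FindFirstFile(ByVal lpFileName As String, _
            ByRef lpFindFileData As WIN32_FIND_DATA) As IntPtr
    End Function

    <DllImport("kernel32.dll", CharSet:=CharSet.Auto, SetLastError:=True)> _
    Private Function FindNextFile(ByVal hFindFile As IntPtr, _
            ByRef lpFindFileData As WIN32_FIND_DATA) As Boolean
    End Function

    <DllImport("kernel32.dll", SetLastError:=True)> _
    Private Function FindClose(ByVal hFindFile As IntPtr) As Boolean
    End Function

    ' Returns True as soon as one real subdirectory is found in folderPath.
    Public Function HasSubdirectory(ByVal folderPath As String) As Boolean
        Dim data As New WIN32_FIND_DATA()
        Dim hFind As IntPtr = FindFirstFile(Path.Combine(folderPath, "*"), data)
        ' An invalid handle means the folder is empty, missing or unreadable.
        If hFind.Equals(INVALID_HANDLE_VALUE) Then Return False

        Try
            Do
                Dim isDir As Boolean = _
                    (data.dwFileAttributes And FILE_ATTRIBUTE_DIRECTORY) <> 0
                ' Skip the "." and ".." pseudo-entries.
                If isDir AndAlso data.cFileName <> "." AndAlso data.cFileName <> ".." Then
                    Return True
                End If
            Loop While FindNextFile(hFind, data)
        Finally
            FindClose(hFind)
        End Try
        Return False
    End Function
End Module
```

A caller could use `If HasSubdirectory(strPath) Then` to decide whether a node needs the + sign.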

In addition, another thing that may be slowing you down is the actual
treeview (which is also very inefficient).  I'd advise you to devise
a quick test that loads 30,000 nodes and see how fast it loads.
You'll be surprised at how slow it is.  To get around this problem
I went with a third-party tree list control from
http://www.bennet-tec.com/ called TList.  Their claim to fame is
speed, and based on my usage it is not idle talk.  It truly is pedal to
the metal.
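
One way such a quick test might look, as a rough sketch only: the module name `TreeViewLoadTest` and the function name `TimeNodeLoad` are hypothetical, and the timing relies on `Environment.TickCount`, whose resolution is coarse. No timings are claimed here; the numbers depend on the machine.

```vbnet
Imports System
Imports System.Windows.Forms

Public Module TreeViewLoadTest

    ' Fills the TreeView with nodeCount top-level nodes and returns the
    ' elapsed time in milliseconds. When batch is True the adds are wrapped
    ' in BeginUpdate/EndUpdate so the two cases can be compared.
    Public Function TimeNodeLoad(ByVal tv As TreeView, _
            ByVal nodeCount As Integer, ByVal batch As Boolean) As Integer
        tv.Nodes.Clear()
        Dim started As Integer = Environment.TickCount

        If batch Then tv.BeginUpdate()
        Dim i As Integer
        For i = 1 To nodeCount
            tv.Nodes.Add("Node " & i.ToString())
        Next
        If batch Then tv.EndUpdate()

        Return Environment.TickCount - started
    End Function
End Module
```

For example, from a form that hosts a TreeView named `TreeView1` (an assumed name): `MessageBox.Show(TimeNodeLoad(TreeView1, 30000, True).ToString() & " ms")`.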

Regards



Author
23 Apr 2006 7:30 AM
Cor Ligthert [MVP]
Tom,

I don't see the BeginUpdate and EndUpdate calls in your code. If they are
missing from the real code as well, then it will be very inefficient.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemwindowsformstreeviewclassbeginupdatetopic.asp
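
For readers unfamiliar with the pattern, a minimal sketch: `TreeView.BeginUpdate` suspends repainting and `TreeView.EndUpdate` resumes it, so a batch of nodes is drawn once rather than after every add. The module name `TreeFill` and the sub name `AddSubdirectoryNodes` are hypothetical.

```vbnet
Imports System.IO
Imports System.Windows.Forms

Public Module TreeFill

    ' Adds one child node per subdirectory of folderPath under parent,
    ' with repainting suspended while the nodes are added.
    Public Sub AddSubdirectoryNodes(ByVal tv As TreeView, _
            ByVal parent As TreeNode, ByVal folderPath As String)
        tv.BeginUpdate()
        Try
            For Each subDir As String In Directory.GetDirectories(folderPath)
                parent.Nodes.Add(Path.GetFileName(subDir))
            Next
        Finally
            ' Always resume painting, even if GetDirectories throws.
            tv.EndUpdate()
        End Try
    End Sub
End Module
```

Putting `EndUpdate` in the `Finally` block is a design choice so that an exception during enumeration cannot leave the control frozen.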

I hope this helps,

Cor

Author
23 Apr 2006 12:11 PM
Tom Scales
My snippet didn't show it, but yes, I do call BeginUpdate/EndUpdate.
Author
23 Apr 2006 11:26 AM
Herfried K. Wagner [MVP]
"Tom Scales" <tjsca***@gmail.com> wrote:
> Unfortunately, that is essentially what I am doing.  I am adding enough
> nodes to tell me if I need to add the + sign.  That's where I get bitten,
> because the directory UNDER the one I am working with has 30,000+ files.
> GetDirectories must search every file to see if it is a directory.  Very
> inefficient code on MS's part.

I think it doesn't really matter whether there are files or folders in the
folder, because if there is at least one entry, you simply add a single
dummy node.  You can use 'Directory.GetFileSystemEntries' for this purpose.
No need to deal with 'FileInfo' and 'DirectoryInfo'.  Instead you can use
the 'Directory' class.
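
A small sketch of that check, assuming a hypothetical helper named `HasAnyEntry` in a hypothetical module `FolderProbe`. Note that `Directory.GetFileSystemEntries` still returns the name of every file and subfolder in the folder, so whether it is fast enough for a folder with 30,000 files is something to measure rather than assume.

```vbnet
Imports System.IO

Public Module FolderProbe

    ' Returns True when the folder holds at least one file or subfolder,
    ' in which case the caller would add a single dummy node.
    Public Function HasAnyEntry(ByVal folderPath As String) As Boolean
        Try
            Return Directory.GetFileSystemEntries(folderPath).Length > 0
        Catch ex As UnauthorizedAccessException
            ' Not readable: treat as having nothing to expand.
            Return False
        Catch ex As IOException
            ' Missing folder or drive not ready: treat as empty.
            Return False
        End Try
    End Function
End Module
```

The result only says that something is inside the folder; it does not distinguish files from subfolders, which matches the suggestion above of adding the dummy node either way.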

--
M S   Herfried K. Wagner
M V P  <URL:http://dotnet.mvps.org/>
V B   <URL:http://classicvb.org/petition/>
Author
23 Apr 2006 11:30 AM
Tom Scales
OK, I understand.  Let me play around with it some more.  I'm not adding the
30,000 entries to the treeview, of course, as they are not directories.

Good advice from all.

Thanks!

Tom