
Outlook MSG file reading

Author
14 Apr 2006 10:39 PM
Dmitry Akselrod
Hello everyone,

I am attempting to extract some header information from typical Microsoft
Outlook MSG files in VB.NET.  I am not after a complete message or
attachments that may be enclosed.  I am particularly interested in the
Message ID field.  I have examined MSG files in notepad and hex editors.  I
can see that the Internet Headers are there and present.  I can do a search
for Message-ID and locate it without any problems in notepad.  The only
display issue I have seen so far is that each letter is separated by hex
character 00.  Thus the Message-ID string would actually be, M e s s a g e -
I D.
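
The 00 byte after each letter is what UTF-16 little-endian text looks like for plain ASCII characters. A minimal sketch of scanning raw bytes for a header stored that way, written in Python for brevity; the function name and the 2048-byte window are assumptions made for illustration, not anything from the MSG format. A raw scan like this can miss or truncate a value when the container splits a stream across non-adjacent sectors, and it matches the header name case-sensitively, so treat it as a quick probe only.

```python
from typing import Optional


def find_header_in_raw_bytes(data: bytes,
                             name: str = "Message-ID",
                             max_bytes: int = 2048) -> Optional[str]:
    """Look for 'name:' stored as UTF-16LE text and return the rest of that line."""
    # "Message-ID:" becomes M 00 e 00 s 00 ... in UTF-16LE.
    needle = (name + ":").encode("utf-16-le")
    start = data.find(needle)
    if start < 0:
        return None
    # Decode a bounded window after the header name; undecodable bytes are replaced.
    begin = start + len(needle)
    window = data[begin:begin + max_bytes]
    text = window.decode("utf-16-le", errors="replace")
    # The header value ends at the first CRLF.
    return text.split("\r\n", 1)[0].strip()
```

Reading the file with `open(path, "rb").read()` and passing the bytes to this function is enough to try it on a sample file.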

I don't want to use Outlook automation.  I have found it to be cumbersome
and slow.  I also don't want to be reliant on an installation of Office.

Since the file is binary, I have attempted to use the System.IO.FileStream
object to read the file.  However, I have not been able to successfully walk
through the file and obtain any readable text.  I have played around with
various encodings, such as ASCII and Unicode.  I think that MSG files are
BASE64/MIME encoded though.  Perhaps that could be part of my trouble.

I have downloaded several example applications that mimic Notepad.  However,
none of them have been able to read the encoding of MSG files.  I have
gained a new level of appreciation for Notepad :).  I wonder what it is that
Notepad uses to detect the file encoding and display it in such a readable
way.

Does anyone have any experience with reading Outlook data?  Again, I am not
after pretty formatting, I just want to extract certain text fragments from
these binary files.  Can someone point me in the right direction?  I would
think that I just need to be able to read a byte stream from the file with
the correct encoding and convert it to ASCII text.  I have been totally
unsuccessful so far.

Thanks,
Dmitry

Author
14 Apr 2006 11:35 PM
Homer J Simpson
"Dmitry Akselrod" <dmitry@nospam.com> wrote in message
news:hKKdnRpyzo83ud3Z4p2dnA@comcast.com...

> Does anyone have any experience with reading Outlook data?  Again, I am
> not after pretty formatting, I just want to extract certain text fragments
> from these binary files.  Can someone point me in the right direction?  I
> would think that I just need to be able to read a byte stream from the file
> with the correct encoding and convert it to ASCII text.  I have been
> totally unsuccessful so far.

Outlook can be automated, just like Word, Excel etc. It's a bit cranky, but
I have done it. Have you tried adding a reference to it?
Author
15 Apr 2006 12:10 AM
Dmitry Akselrod
Hi,

That's the whole point: I don't want to automate Outlook.  It's very
clunky.  I need to be able to process millions of MSG files, and Office
products (e.g. Access) suck with that many files.

Thank you though.

dmitry
Author
15 Apr 2006 12:52 AM
Homer J Simpson
"Dmitry Akselrod" <dmitry@nospam.com> wrote in message
news:27OdnVKqzY5GpN3ZnZ2dnUVZ_vqdnZ2d@comcast.com...

> That's my whole thing is that I don't want to automate Outlook.  It's very
> clunky.  I need to be able to process millions of MSG files and Office
> products (i.e. Access) suck with that many files.

In that case I'd start searching for third party tools. I assume that MSFT
aren't offering to divulge the details of the format.
Author
15 Apr 2006 1:46 AM
Dmitry Akselrod
No, MS is definitely not documenting their MSG format.   I did find this
article:

http://www.msusenet.com/archive/topic.php/t-288764.html

A gentleman named Eduardo A. Morcillo has developed some .NET classes that
wrap the Office OLE storage.  They are pretty good so far.  The classes are
here:

http://www.mvps.org/emorcillo/en/code/grl/storage.shtml

I have been able to take a couple of MSG files and obtain a list of streams
(properties) and their values.  However, I am still missing the Internet
Headers.  They must lie somewhere else in the file.  All of this is quite
annoying, thanks to Microsoft.
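
For context, the OLE storage mentioned above is the compound file container, and files in that format begin with a fixed 8-byte signature. A minimal sketch in Python, with `looks_like_compound_file` as a hypothetical helper name, for screening a large batch of files before trying to enumerate their streams:

```python
# The 8-byte signature that opens every OLE compound file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_compound_file(data: bytes) -> bool:
    """Return True if the buffer starts with the compound file signature."""
    return data[:8] == CFB_SIGNATURE
```

Only the first 8 bytes of each file need to be read for this check, which keeps it cheap across millions of files.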

The only known working API I have seen so far (used by many forensic
applications) is from Fookes software.  These guys are great and their tools
are phenomenal, but the API is a little outside my price range.

Being able to obtain the Sender, Recipient, Subject, etc. is definitely a
plus, but I need the Message ID.  I guess it's back to more research.

Dmitry


Basically, the MSG file format is a series of binary streams.
Author
15 Apr 2006 4:29 AM
Dmitry Akselrod
Actually, never mind about the Internet Headers; they are there.  They are
stored in the stream __substg1.0_007D001F.  I just had some issues with data
formatting and conversion.  I think that my problem is solved, thanks to Mr.
Morcillo.
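
The stream name itself explains the conversion issue: the eight hex digits are a property ID (007D, the transport message headers) followed by a property type (001F, Unicode text stored as UTF-16LE). Below is a minimal Python sketch of splitting such a name and pulling the Message-ID out of the stream's bytes once a storage library has returned them. The helper names are hypothetical, and falling back to latin-1 for 8-bit string properties is an assumption, since the real code page depends on the message.

```python
import re
from email.parser import HeaderParser
from typing import Optional, Tuple

PT_STRING8 = 0x001E  # 8-bit string in the message's code page
PT_UNICODE = 0x001F  # UTF-16LE string

_SUBSTG = re.compile(r"^__substg1\.0_([0-9A-Fa-f]{4})([0-9A-Fa-f]{4})$")


def parse_substg_name(name: str) -> Optional[Tuple[int, int]]:
    """Split '__substg1.0_XXXXYYYY' into (property id, property type)."""
    match = _SUBSTG.match(name)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def message_id_from_headers(raw: bytes,
                            prop_type: int = PT_UNICODE) -> Optional[str]:
    """Decode the header stream and return its Message-ID value, if any."""
    # Assumption: latin-1 stands in for the unknown code page of PT_STRING8 data.
    encoding = "utf-16-le" if prop_type == PT_UNICODE else "latin-1"
    text = raw.decode(encoding, errors="replace").rstrip("\x00")
    # HeaderParser reads only the header block; lookup is case-insensitive.
    value = HeaderParser().parsestr(text).get("Message-ID")
    return value.strip() if value is not None else None
```

Using a real header parser rather than a substring search handles case differences in the header name such as "Message-Id".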

dmitry

Author
15 Apr 2006 1:40 AM
John Bailo
Check out the Redemption COM object:

http://www.dimastr.com/redemption/



--
Texeme Textcasting powers
http://www.you-read-it-here-first.com