
UTF8 Encoding and MP3 tags

Author
24 Sep 2006 2:52 AM
gene kelley
I have an app that has a class for reading mp3 tags.
According to id3.org, ID3V2.3 tags are always UTF8 encoded (unless flagged
otherwise).

In a routine, I have this line which returns an array of characters:
System.Text.Encoding.UTF8.GetChars(tBytes)

The returned characters are correct 99% of the time.  The 1% problem cases are those
tag strings that contain characters > Chr(128).
For example:
André Rieu is an expected result.  The actual result is Andr  Rieu.

Can anyone tell me if I'm missing something here, or, is this just another case of
non-compliant ID3V2.3 tags (not unusual)?

Gene

Author
24 Sep 2006 3:04 PM
Dennis
All text frames have the encoding byte as the first byte after the frame
header.  A $00 indicates ISO-8859-1 (Latin-1, not UTF-8) and a $01 indicates
Unicode (16-bit, UCS-2/UTF-16).  If it's Unicode, you have to check for the BOM as well.  I have
written my own tag reader/writer and I've not had any problems reading any
mp3 files.  I do, however, check for invalid characters and eliminate them as
some tags are written by poorly written tag writers.  Even the best tag
writers I've found don't fully implement the ID3V2.3 specs with regard to
encoding.
--
Dennis in Houston
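
The encoding-byte check described above can be sketched roughly as follows. This is Python rather than .NET, purely to keep it short, and `decode_text_frame` is a made-up name, not part of any library. It assumes an ID3v2.3 text frame whose 10-byte frame header has already been stripped. It also shows why the UTF-8 decode loses the é: in ISO-8859-1 that character is the single byte 0xE9, which is not a valid UTF-8 sequence on its own.

```python
def decode_text_frame(payload: bytes) -> str:
    """Decode the body of an ID3v2.3 text frame (frame header already removed).

    Hypothetical helper for illustration only.
    """
    if not payload:
        return ""
    encoding_byte, data = payload[0], payload[1:]
    if encoding_byte == 0x00:
        # $00: ISO-8859-1, one byte per character
        text = data.decode("latin-1")
    elif encoding_byte == 0x01:
        # $01: 16-bit Unicode; the "utf-16" codec reads the BOM to pick byte order
        text = data.decode("utf-16", errors="replace")
    else:
        # ID3v2.4 adds $02 and $03, which this sketch does not handle
        raise ValueError(f"unsupported encoding byte: {encoding_byte:#04x}")
    # Drop a trailing terminator and anything after it
    return text.split("\x00", 1)[0]


raw = b"Andr\xe9 Rieu"  # Latin-1 bytes as they might sit in the tag
as_utf8 = raw.decode("utf-8", errors="replace")  # 0xE9 becomes U+FFFD
as_latin1 = decode_text_frame(b"\x00" + raw)     # "André Rieu"
```

Decoding the same bytes as UTF-8 yields a replacement character where the é should be, which would match the "Andr  Rieu" symptom in the original question.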


Author
24 Sep 2006 9:27 PM
gene kelley
On Sun, 24 Sep 2006 08:04:01 -0700, Dennis <Den***@discussions.microsoft.com> wrote:

>All text frames have the encoding byte as the first byte after the frame
>header.  A $00 indicates ISO-8859-1 (Latin-1, not UTF-8) and a $01 indicates
>Unicode (16-bit, UCS-2/UTF-16).  If it's Unicode, you have to check for the BOM as well.  I have
>written my own tag reader/writer and I've not had any problems reading any
>mp3 files.  I do, however, check for invalid characters and eliminate them as
>some tags are written by poorly written tag writers.  Even the best tag
>writers I've found don't fully implement the ID3V2.3 specs with regard to
>encoding.


OK, I reworked the routine and it seems to work as expected now.  I have randomly tried
about 200 mp3 albums from what I have on hand and had no issues show up with any of
those tested.

Do you happen to know how the name of the mp3 file's CODEC is retrieved?

Gene
Author
24 Sep 2006 10:50 PM
Dennis
Not sure what you mean by CODEC name.
--
Dennis in Houston


Author
25 Sep 2006 3:22 AM
gene kelley
On Sun, 24 Sep 2006 15:50:01 -0700, Dennis <Den***@discussions.microsoft.com> wrote:

>Not sure what you mean by CODEC name.

I guess it's also referred to as "Encoder" in some apps.

Typical expected result would be something like "Lame 3.98" or "FhG".

Gene
Author
26 Sep 2006 12:28 AM
Dennis
The ID3 tags have no frames that I know of for embedding any CODEC
information.  The MP3 music header should provide sufficient information to
determine what decoder is needed, i.e., MPEG1 or MPEG2 as encoded in the 4th
bit of the second byte of the header (ID bit).
--
Dennis in Houston
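
A rough sketch of reading that version field, in Python for brevity. `mpeg_version` is a hypothetical name. It assumes the four bytes passed in start exactly at a frame sync, and it treats bits 4 and 3 of the second byte as a two-bit version ID (the ID bit mentioned above is the lower of the two; the upper one distinguishes the unofficial MPEG 2.5 extension). Note that this only identifies the stream format, not the name of the encoder that wrote the file.

```python
def mpeg_version(header: bytes) -> str:
    """Return the MPEG audio version named by a 4-byte frame header.

    Hypothetical helper; expects `header` to begin at a frame sync.
    """
    # Frame sync: the first 11 bits must all be set
    if len(header) < 4 or header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("no frame sync at start of header")
    version_bits = (header[1] >> 3) & 0x03
    names = {
        0b00: "MPEG 2.5",
        0b01: "reserved",
        0b10: "MPEG 2",
        0b11: "MPEG 1",
    }
    return names[version_bits]
```

For example, a header beginning with the bytes FF FB has version bits 11 and is reported as MPEG 1, while FF F3 has version bits 10 and is reported as MPEG 2.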


Author
26 Sep 2006 3:54 AM
gene kelley
On Mon, 25 Sep 2006 17:28:02 -0700, Dennis <Den***@discussions.microsoft.com> wrote:

>The ID3 tags have no frames that I know of for embedding any CODEC
>information.  The MP3 music header should provide sufficient information to
>determine what decoder is needed, i.e., MPEG1 or MPEG2 as encoded in the 4th
>bit of the second byte of the header (ID bit).

Frame TSSE was apparently designed for that purpose, but it is very rarely used.
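
For the cases where TSSE is present, a minimal sketch of locating it in an ID3v2.3 tag might look like this (Python, for brevity). `find_frame_v23` is a made-up name. The sketch assumes the tag sits at the start of the buffer, has no extended header, and does not use unsynchronisation; a real reader would need to handle all three.

```python
import struct


def find_frame_v23(tag: bytes, frame_id: bytes):
    """Return the raw body of the first matching frame in an ID3v2.3 tag, or None.

    Hypothetical helper; ignores extended headers and unsynchronisation.
    """
    if len(tag) < 10 or tag[:3] != b"ID3" or tag[3] != 3:
        return None
    # Tag size is stored as four "syncsafe" bytes, 7 useful bits each
    size = (tag[6] << 21) | (tag[7] << 14) | (tag[8] << 7) | tag[9]
    pos, end = 10, min(10 + size, len(tag))
    while pos + 10 <= end:
        fid = tag[pos:pos + 4]
        if fid == b"\x00\x00\x00\x00":
            break  # reached padding
        # In v2.3 the frame size is a plain big-endian 32-bit integer
        (frame_size,) = struct.unpack(">I", tag[pos + 4:pos + 8])
        if fid == frame_id:
            return tag[pos + 10:pos + 10 + frame_size]
        pos += 10 + frame_size
    return None
```

The returned body still starts with the text-encoding byte, so it has to be decoded accordingly before display.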

The encoder info, though, would be more a function of the encoding software as to
where, in the mp3 file, it writes its signature.  "Lame3.xx" is plainly visible in
several locations of an mp3 file when viewed in a hex reader, but the others are not.

If you are familiar with a small utility called EncSpot, its primary use is mainly
to display the encoder name used in given mp3 files.  So, I assume that if one knows
where to look and what to look for in the file, the encoder info can be found, but I
have yet to find any leads.
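
For the one case that is plainly visible, a crude byte scan is enough to illustrate the idea. This is a Python sketch; `find_lame_versions` and the pattern are assumptions based only on the "Lame3.xx" strings seen in a hex viewer, and it says nothing about how EncSpot or any other tool identifies encoders that leave no readable signature.

```python
import re

# Assumed signature shape: the ASCII text "LAME" followed by a version like 3.98
LAME_SIGNATURE = re.compile(rb"LAME(\d\.\d{2})")


def find_lame_versions(data: bytes) -> set:
    """Return the set of LAME version strings found anywhere in the raw bytes.

    Hypothetical helper; a plain pattern scan, not a parse of the file format.
    """
    return {m.group(1).decode("ascii") for m in LAME_SIGNATURE.finditer(data)}
```

Calling it on the full contents of a file encoded with LAME 3.98 would be expected to return a set containing "3.98"; an empty set means only that no such string was found, not that the encoder is unknown to other tools.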

Thanks,

Gene
Author
25 Sep 2006 12:44 PM
Terry Olsen
Are you using a self-written class?  I've been using UltraId3 which claims
to be 100% standards compliant.


Author
25 Sep 2006 8:12 PM
gene kelley
On Mon, 25 Sep 2006 06:44:49 -0600, "Terry Olsen" <tolse***@hotmail.com> wrote:

>Are you using a self-written class?  I've been using UltraId3 which claims
>to be 100% standards compliant.
>
>

I've looked at UltraId3.  My app is only concerned with reading tags.  While it's
valid to write 100% compliant tags, I disagree with UltraID3's design to only read
compliant tags.  That's just not the "real world".

Gene


Author
25 Sep 2006 8:18 PM
darren.smurf
I have found it very difficult to pinpoint the exact location of
information within an ID3v2.x-tagged MP3 file, because even though the
standard (according to http://www.id3.org/easy.html) is for tags to be
before the audio data, I have encountered tags anywhere inside
the file.  I suppose you could use a hex viewer on several files, see
where the encoder shows up and then write a routine to search the
incoming MP3s for a common value between the files to locate your
encoder.
Author
26 Sep 2006 12:22 AM
Dennis
ID3v2.4.0 allows tags to be prepended or appended.  In the appended case, you
have to search from the back of the file forward for the start header.  Also,
I believe it is legitimate to embed part of the tag in the music file somewhere, as
specified in the Seek Frame.  It should be a rare occasion that someone
embeds part of a tag within the music data...I've never run across it.
--
Dennis in Houston
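
The appended case can be sketched like this (Python, for brevity). It relies on the ID3v2.4 footer, a 10-byte copy of the tag header that uses the identifier "3DI", and works back from it to the header. `find_appended_tag` is a hypothetical name, and skipping a trailing 128-byte ID3v1 "TAG" block first is an assumption about how real files are commonly laid out.

```python
def find_appended_tag(data: bytes):
    """Return (start, end) offsets of an appended ID3v2.4 tag, or None.

    Hypothetical helper; looks only at the very end of the buffer.
    """
    end = len(data)
    # Assumption: an ID3v1 tag, if present, occupies the last 128 bytes
    if end >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    # Footer: "3DI", version (2 bytes), flags (1 byte), syncsafe size (4 bytes)
    if end < 20 or data[end - 10:end - 7] != b"3DI":
        return None
    size_bytes = data[end - 4:end]
    if any(b & 0x80 for b in size_bytes):
        return None  # not a valid syncsafe integer
    size = ((size_bytes[0] << 21) | (size_bytes[1] << 14)
            | (size_bytes[2] << 7) | size_bytes[3])
    # The stored size excludes the 10-byte header and the 10-byte footer
    start = end - (size + 20)
    if start < 0 or data[start:start + 3] != b"ID3":
        return None
    return start, end
```

The returned offsets bracket the whole tag, header and footer included, so the frames themselves begin 10 bytes after `start`.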

