UTF8 Encoding and MP3 tags

I have an app that has a class for reading mp3 tags.
According to id3.org, ID3V2.3 tags are always UTF8 encoded (unless flagged otherwise).

In a routine, I have this line which returns an array of characters:
System.Text.Encoding.UTF8.GetChars(tBytes)

The returned characters are correct 99% of the time. The 1% problem cases are those tag strings that contain characters > Chr(128).
For example:
André Rieu is an expected result. The actual result is Andr Rieu.

Can anyone tell me if I'm missing something here, or is this just another case of non-compliant ID3V2.3 tags (not unusual)?

Gene

All text frames have the encoding byte as the first byte after the frame header. A $00 indicates ISO-8859-1 (Latin-1, not UTF-8) and a $01 indicates Unicode. If it's Unicode, you have to check for the BOM as well. I have written my own tag reader/writer and I've not had any problems reading any mp3 files. I do, however, check for invalid characters and eliminate them, as some tags are written by poorly written tag writers. Even the best tag writers I've found don't fully implement the ID3V2.3 specs with regard to encoding.

--
Dennis in Houston

On Sun, 24 Sep 2006 08:04:01 -0700, Dennis <Den***@discussions.microsoft.com> wrote:
>All text frames have the encoding byte as the first byte after the frame
>header. A $00 indicates ISO-8859-1 (Latin-1, not UTF-8) and a $01 indicates
>Unicode. If it's Unicode, you have to check for the BOM as well. I have
>written my own tag reader/writer and I've not had any problems reading any
>mp3 files. I do, however, check for invalid characters and eliminate them, as
>some tags are written by poorly written tag writers. Even the best tag
>writers I've found don't fully implement the ID3V2.3 specs with regard to
>encoding.

OK, I reworked the routine and it seems to work as expected now. I have randomly tried about 200 mp3 albums from what I have on hand and had no issues show up with any of those tested.

Do you happen to know how the name of the mp3 file's CODEC is retrieved?

Gene

Not sure what you mean by CODEC name.

--
Dennis in Houston

On Sun, 24 Sep 2006 15:50:01 -0700, Dennis <Den***@discussions.microsoft.com> wrote:
>Not sure what you mean by CODEC name.

I guess it's also referred to as "Encoder" in some apps.

A typical expected result would be something like "Lame 3.98" or "FhG".

Gene

The ID3 tags have no frames that I know of for embedding any CODEC information. The MP3 music header should provide sufficient information to determine what decoder is needed, i.e., MPEG1 or MPEG2 as encoded in the 4th bit of the second byte of the header (ID bit).

--
Dennis in Houston

On Mon, 25 Sep 2006 17:28:02 -0700, Dennis <Den***@discussions.microsoft.com> wrote:
>The ID3 tags have no frames that I know of for embedding any CODEC
>information. The MP3 music header should provide sufficient information to
>determine what encoder is needed, i.e., MPEG1 or MPEG2 as encoded in the 4th
>bit of the second byte of the header (ID bit).

Frame TSSE was apparently designed for that purpose, but it is very rarely used.

The encoder info, though, would be more a function of the encoding software as to where, in the mp3 file, it writes its signature. "Lame3.xx" is plainly visible in several locations of an mp3 file when viewed in a hex reader, but the others are not.

If you are familiar with a small utility called EncSpot, its primary use is to display the encoder name used in given mp3 files. So, I assume that if one knows where to look and what to look for in the file, the encoder info can be found, but I have yet to find any leads.

Thanks,
Gene

Are you using a self-written class? I've been using UltraId3 which claims to be 100% standards compliant.

"gene kelley" <o***@by.me> wrote in message news:9dqbh2dp9e30ha5fffdt0ua1kk22mn12cr@4ax.com...
> I have an app that has a class for reading mp3 tags.
> According to id3.org, ID3V2.3 tags are always UTF8 encoded (unless flagged
> otherwise).
>
> In a routine, I have this line which returns an array of characters:
> System.Text.Encoding.UTF8.GetChars(tBytes)
>
> The returned characters are correct 99% of the time. The 1% problem cases
> are those tag strings that contain characters > Chr(128).
> For example:
> André Rieu is an expected result. The actual result is Andr Rieu.
>
> Can anyone tell me if I'm missing something here, or, is this just another
> case of non-compliant ID3V2.3 tags (not unusual)?
>
> Gene

On Mon, 25 Sep 2006 06:44:49 -0600, "Terry Olsen" <tolse***@hotmail.com> wrote:
>Are you using a self-written class? I've been using UltraId3 which claims
>to be 100% standards compliant.

I've looked at UltraId3. My app is only concerned with reading tags. While it's valid to write 100% compliant tags, I disagree with UltraID3's design to only read compliant tags. That's just not the "real world".

Gene

I have found it very difficult to pinpoint the exact location of
information within an MP3 ID3v2x encoded file, because even though the standard (according to http://www.id3.org/easy.html) for tags is to be before the audio data, I have encountered the tags being anywhere inside the file. I suppose you could use a hex viewer on several files, see where the encoder shows up and then write a routine to search the incoming MP3s for a common value between the files to locate your encoder.

ID3v2.4.0 allows tags to be prepended or appended. In the appended case, you have to search from the back of the file forward for the start header. Also, I believe it is legit to embed part of the tag in the music file somewhere, as specified in the Seek Frame. It should be a rare occasion that someone embeds part of a tag within the music data... I've never run across it.

--
Dennis in Houston

"darren.sm***@gmail.com" wrote:
> I have found it very difficult to pinpoint the exact location of
> information within an MP3 ID3v2x encoded file, because even though the
> standard (according to http://www.id3.org/easy.html) for tags is to be
> before the audio data I have encountered the tags being anywhere inside
> the file. I suppose you could use a hex viewer on several files, see
> where the encoder shows up and then write a routine to search the
> incoming MP3s for a common value between the files to locate your
> encoder.
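
For reference, the encoding-byte rule Dennis describes earlier in the thread can be sketched roughly as follows. This is a minimal Python sketch, not the .NET routine from the thread: the function name decode_text_frame is hypothetical, and it assumes the input is the frame body only (the bytes that follow the 10-byte frame header) of an ID3v2.3 text frame. In .NET, the corresponding decoders would likely be Encoding.GetEncoding("iso-8859-1") for $00 and Encoding.Unicode or Encoding.BigEndianUnicode (chosen from the BOM) for $01.

```python
def decode_text_frame(body: bytes) -> str:
    """Decode the body of an ID3v2.3 text frame (hypothetical helper).

    Assumes `body` starts with the one-byte text encoding marker and is
    followed by the text itself, optionally null-terminated.
    """
    if not body:
        return ""

    encoding_byte, data = body[0], body[1:]

    if encoding_byte == 0x00:
        # $00: ISO-8859-1 (Latin-1), one byte per character.
        text = data.decode("latin-1")
    elif encoding_byte == 0x01:
        # $01: Unicode with a byte order mark. Python's "utf-16" codec
        # reads the BOM, picks the byte order, and strips the BOM.
        text = data.decode("utf-16", errors="replace")
    else:
        # ID3v2.3 defines only $00 and $01; treat anything else as an error
        # here (a lenient reader might fall back to Latin-1 instead).
        raise ValueError(f"unsupported text encoding byte: {encoding_byte:#04x}")

    # Drop a trailing terminator and anything after it.
    return text.split("\x00", 1)[0]


# Latin-1 frame body: encoding byte $00, then "André Rieu" with é as 0xE9.
latin1_body = b"\x00Andr\xe9 Rieu"

# Unicode frame body: encoding byte $01, then BOM + UTF-16 text.
unicode_body = b"\x01" + "André Rieu".encode("utf-16")

print(decode_text_frame(latin1_body))
print(decode_text_frame(unicode_body))
```

Both calls should yield the full name. By contrast, decoding the Latin-1 bytes as UTF-8 while discarding invalid sequences drops the 0xE9 byte and leaves "Andr Rieu", which is consistent with the symptom in the original question.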