UTF-8 encoding problem

Hi All,

I have a GUI which accepts a Unicode string and searches a given set of XML files for that string.

I have two XML files, both saved in UTF-8 format and both containing characters from another language. Both files start with a UTF-8 BOM, but only the first file also declares UTF-8 in the XML declaration at the top of the file.

When I search that directory for a character from the other language using a third-party desktop search GUI, it reports that the character exists in the first file (the one with the XML declaration) but not in the second file (the one with only the BOM).

Initially I thought the problem was related to UTF-8 supporting both multibyte and Unicode, but I could not find much on it.

Please help.

Regards,
Shreshth
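One way to see why a search tool has to decode a file correctly before searching it: the same characters produce entirely different bytes in UTF-8 and UTF-16, so a tool that guesses the wrong encoding will not find the text. The Python sketch below is illustrative only. The sample text and the helper `contains_text` are hypothetical, and nothing here describes how the third-party search tool actually works.

```python
import codecs

# Hypothetical sample: the same XML text saved two different ways.
text = "<root>日本語</root>"
term = "日本語"

utf8_with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
utf16_le_with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def contains_text(data: bytes, term: str, encoding: str) -> bool:
    """Decode the raw bytes with `encoding`, then search the decoded text.

    'utf-8-sig' strips a leading UTF-8 BOM; 'utf-16' reads the BOM to pick
    the byte order. errors="replace" stops a wrong guess from raising.
    """
    return term in data.decode(encoding, errors="replace")


# A raw byte search only works if the term is encoded the same way as the file.
print(term.encode("utf-8") in utf8_with_bom)      # True
print(term.encode("utf-16-le") in utf8_with_bom)  # False

# Decoding with the right encoding finds the term in both files...
print(contains_text(utf8_with_bom, term, "utf-8-sig"))   # True
print(contains_text(utf16_le_with_bom, term, "utf-16"))  # True

# ...while decoding UTF-8 bytes as a legacy code page does not.
print(contains_text(utf8_with_bom, term, "cp1252"))      # False
```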
<shreshth.lut***@gmail.com> wrote in message news:1161173560.157591.225610@h48g2000cwc.googlegroups.com...
> Although both of them have a UTF-8 BOM, only the first file also has
> UTF-8 defined in the XML declaration at the top of the file.

What does the second file have in its XML declaration (what specifically does its declaration look like)?

It sounds like you have a bug in the application that wrote the second XML file.

I suspect (hope) that when that application created the XML (the XmlWriter), it encoded the characters according to what the XML declaration states. I would then expect (but not hope) that when it (the underlying text writer) wrote the file, it "transposed" (read: mangled) the correctly encoded characters into UTF-8. I consider this double transposition to be bad, very bad.

--
Hope this helps
Jay B. Harlow
.NET Application Architect, Enthusiast, & Evangelist
T.S. Bradley - http://www.tsbradley.net

By "XML declaration at the beginning of the file" I mean the XML declaration with the "encoding" attribute at the beginning of the file (encoding = UTF-8; I do not remember the exact format). It is the same as what MSDN shows.

Do you still mean the same thing in that case? Actually, I am not able to understand completely what exactly you want to say.

By the way, the XML writer here is Notepad.

Thanks for your reply.
Shreshth
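Jay's question comes down to whether the encoding named in the XML declaration (if there is one) agrees with how the bytes are actually stored. A minimal Python sketch of such a consistency check follows. It assumes only the three BOMs discussed in this thread (UTF-8, UTF-16 BE, UTF-16 LE), and the names `sniff_bom`, `declared_encoding` and `check_declaration` are hypothetical helpers, not part of any library.

```python
import codecs
import re

# Only the BOMs discussed in this thread are checked (UTF-32 is ignored).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# encoding="..." or encoding='...' inside a leading <?xml ... ?> declaration.
DECL_RE = re.compile(r"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff_bom(data: bytes):
    """Return (codec name, BOM length) for a known BOM, else (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None."""
    bom_enc, skip = sniff_bom(data)
    # The declaration is plain ASCII, so the BOM's codec (or latin-1 when
    # there is no BOM) is enough to read it from the first few bytes.
    head = data[skip:skip + 200].decode(bom_enc or "latin-1", errors="replace")
    match = DECL_RE.match(head)
    return match.group(1) if match else None


def _family(name: str) -> str:
    """Normalise aliases (UTF8, utf_8, ...) and drop a -be/-le suffix."""
    canonical = codecs.lookup(name).name
    for suffix in ("-be", "-le"):
        if canonical.endswith(suffix):
            return canonical[: -len(suffix)]
    return canonical


def check_declaration(data: bytes) -> str:
    """Compare the physical encoding (BOM) with the declared one."""
    bom_enc, _ = sniff_bom(data)
    decl = declared_encoding(data)
    if decl is None:
        return "no declaration"
    if bom_enc is None:
        return "no BOM"
    try:
        declared = _family(decl)
    except LookupError:
        return "unknown encoding"
    return "consistent" if declared == _family(bom_enc) else "mismatch"


with_decl = codecs.BOM_UTF8 + '<?xml version="1.0" encoding="UTF-8" ?><r>é</r>'.encode("utf-8")
bom_only = codecs.BOM_UTF8 + "<r>é</r>".encode("utf-8")
wrong_decl = codecs.BOM_UTF8 + '<?xml version="1.0" encoding="ISO-8859-1"?><r>é</r>'.encode("utf-8")

print(check_declaration(with_decl))   # consistent
print(check_declaration(bom_only))    # no declaration
print(check_declaration(wrong_decl))  # mismatch
```

A "mismatch" result is the situation Jay describes: the bytes on disk say one encoding while the declaration claims another.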
<shreshth.lut***@gmail.com> wrote in message news:1161176427.440331.61020@m7g2000cwm.googlegroups.com...
> By "XML declaration at the beginning of the file" I mean the XML
> declaration with the "encoding" attribute at the beginning of the file
> (encoding = UTF-8; I do not remember the exact format). It is the same
> as what MSDN shows.

Yes, but what specifically does your file say? Cut and paste the declaration from your file into your reply to this message, or alternatively email the files to me.

> By the way, the XML writer here is Notepad.

Ah! There's the rub!

What I am saying is that the "encoding" of your physical file (the one on disk) is different from that of the logical file (the XML itself). (My example may have been backwards, but the net effect is the same: the characters are not encoded the way you think they are.)

It sounds like your physical file is UTF-8, while I'm concerned that your logical file is declared as "whatever", where "whatever" is the text you blindly copied from an MSDN article.

--
Hope this helps
Jay B. Harlow
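To make the "physical versus logical encoding" distinction concrete: if bytes written as UTF-8 are read back as a legacy Windows code page, each non-ASCII character turns into two or three wrong characters, and a search for the original text fails. A small Python sketch; code page 1252 is only an assumed example of a wrong guess, not something the thread establishes about the search tool.

```python
# Bytes as they sit on disk (the "physical" file): UTF-8 encoded text.
physical = "café".encode("utf-8")
print(physical)                      # b'caf\xc3\xa9'

# Read back with the wrong encoding: é (two bytes) becomes two characters.
misread = physical.decode("cp1252")
print(misread)                       # cafÃ©
print("café" in misread)             # False

# Read back with the encoding the bytes were actually written in.
correct = physical.decode("utf-8")
print("café" in correct)             # True

# The damage is reversible only if you know both encodings involved.
print(misread.encode("cp1252").decode("utf-8"))  # café
```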
<?xml version="1.0" encoding="UTF-8" ?>

This is the XML declaration I was speaking about. The rest of the file is the same as a normal XML file.

I will try what you have told me in the office tomorrow, but one thing I can tell you right now is that I have already tried the same file (the one with only the BOM and no XML declaration) saved as UTF-16 LE and as UTF-16 BE, and my third-party desktop search works with both of them. The only problem is with the UTF-8 format.

Thanks.
Shreshth
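For reference, the XML 1.0 specification lets a reader identify UTF-8 and UTF-16 from the byte order mark alone, and an entity with no encoding declaration (and no external encoding information) must be UTF-8 or UTF-16. A reader following those rules should therefore decode both of the UTF-8 files in this thread identically. The Python sketch below is a simplified version of the detection described in the spec's non-normative Appendix F; it omits the UTF-32, EBCDIC and BOM-less UTF-16 cases, and `detect_xml_encoding` and `read_xml_text` are hypothetical names. If a tool handles the BOM-only UTF-8 file differently, that suggests it is not applying these rules, though that is a hypothesis rather than something confirmed in this thread.

```python
import codecs
import re

# encoding="..." in a leading XML declaration, matched on raw ASCII bytes.
DECL_RE = re.compile(rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def detect_xml_encoding(data: bytes) -> str:
    """Simplified XML 1.0 encoding detection (UTF-32/EBCDIC cases omitted)."""
    # 1. A byte order mark settles UTF-8 vs UTF-16 and the byte order.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # Python codec that also strips the BOM
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"     # Python's utf-16 codec reads the BOM itself
    # 2. No BOM: an ASCII-compatible file may name its encoding in the declaration.
    match = DECL_RE.match(data)
    if match:
        return match.group(1).decode("ascii")
    # 3. No BOM and no declaration: UTF-16 requires a BOM, so this must be UTF-8.
    return "utf-8"


def read_xml_text(data: bytes) -> str:
    """Decode an XML file's bytes using the detected encoding."""
    return data.decode(detect_xml_encoding(data))


body = "<root>日本語</root>"
files = {
    "UTF-8, BOM + declaration": codecs.BOM_UTF8
    + ('<?xml version="1.0" encoding="UTF-8" ?>' + body).encode("utf-8"),
    "UTF-8, BOM only": codecs.BOM_UTF8 + body.encode("utf-8"),
    "UTF-16 LE, BOM only": codecs.BOM_UTF16_LE + body.encode("utf-16-le"),
    "UTF-16 BE, BOM only": codecs.BOM_UTF16_BE + body.encode("utf-16-be"),
}

for name, data in files.items():
    print(name, "日本語" in read_xml_text(data))  # True for all four
```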