Foreign Characters in XML

I posted a couple of weeks ago with what I thought was a problem with the file system reading accented characters. However, after debugging line by line, I have now found the true problem.

I am storing a list of files in an XML file as a sort of database. Some of these filenames have accented characters (e.g. á é í ó ú or ñ). However, upon writing the filename to the XML file, the accented character is dropped. This causes a problem upon re-reading the filenames: the program cannot find the files because their 'saved' filename is now different. For example, the word "más" is saved in the XML file as "ms".

Any ideas how I can work around this? I could strip out the accents and replace them with their "normal" equivalent, i.e. á becomes a. But this is a sort of bodge fix, as I will lose the link to the original file. Also, I can see a scenario where a file may get overwritten because the modified filename is the same as that of an existing file.

So, to put it bluntly, I'm stuck! Help!

Thanks

"Hugh Janus" <my-junk-acco***@hotmail.com> wrote:
> I am storing a list of files in an XML file as a sort of database.
> Some of these filenames have accented characters (i.e. á é í ó ú
> or ñ). However, upon writing the filename to the XML file, the
> accented character is dropped. This causes a problem upon re-reading
> the filenames because the program can not find the files because their
> 'saved' filename is now different. For example, the word "más" is
> saved in the XML file as "ms".

How are you currently writing data to the XML file? Which classes are you using? It's likely that the problem is caused by a wrong encoding used to persist the data.

-- 
 M S   Herfried K. Wagner
M V P  <URL:http://dotnet.mvps.org/>
 V B   <URL:http://classicvb.org/petition/>

Herfried K. Wagner [MVP] wrote:
> How are you currently writing data to the XML file? Which classes are you
> using? It's likely that the problem is caused by a wrong encoding used to
> persist the data.

I am using the StreamReader and StreamWriter classes. Can I specify the encoding with these in order to keep the accented characters?

The StreamReader and StreamWriter classes have overloaded constructors to specify the encoding:

public StreamReader ( System.String path , System.Text.Encoding encoding )

public StreamWriter ( System.String path , System.Boolean append , System.Text.Encoding encoding )
Member of System.IO.StreamWriter

You will have to use the System.Text.Encoding.Default encoding.

-- 
Best regards,

Carlos J. Quintero

MZ-Tools: Productivity add-ins for Visual Studio 2005, Visual Studio .NET, VB6, VB5 and VBA
You can code, design and document much faster in VB.NET, C#, C++ or VJ#
Free resources for add-in developers: http://www.mztools.com

Carlos J. Quintero [VB MVP] wrote:
> The StreamReader and StreamWriter classes have overloaded constructors to
> specify the encoding:
>
> public StreamReader ( System.String path , System.Text.Encoding encoding )
>
> public StreamWriter ( System.String path , System.Boolean append ,
> System.Text.Encoding encoding )
> Member of System.IO.StreamWriter
>
> You will have to use the System.Text.Encoding.Default encoding.

Thanks Carlos for this. I'll give it a try and post back if it fails.

I assume that System.Text.Encoding.Default will cater for all accents and the Ñ?

> I assume that the System.Text.Encoding.Default will cater for all accents
> and the Ñ ?

The Default encoding uses your Windows code page instead of Unicode. As long as your Windows code page (Control Panel, Regional Settings) matches the code page of the computer used to generate the files, it will work.

-- 
Best regards,

Carlos J. Quintero

Carlos J. Quintero [VB MVP] wrote:
> > I assume that the System.Text.Encoding.Default will cater for all accents
> > and the Ñ ?
>
> The Default encoding uses your Windows code page instead of Unicode. As long
> as your Windows code page (Control Panel, Regional Settings) matches the
> code page of the computer used to generate the files, it will work.

Ah, well, there is a problem. I am developing on a computer that is set to Spanish regional settings, but the app could very possibly be installed on a computer with different regional settings. Is there a universal one I could use?

"Hugh Janus" <my-junk-acco***@hotmail.com> wrote:
> Ah, well there is a problem. I am developing on a computer that is set
> to Spanish regional settings but the app very possibly could be
> installed on a computer with different regional settings. Is there a
> universal one I could use?

I'd go with 'Encoding.UTF8' or 'Encoding.Unicode' (which is UTF-16).

-- 
Herfried K. Wagner [MVP]

Hugh,

Have a look at these code tables for Unicode (OS versions):

http://www.microsoft.com/globaldev/reference/oslocversion.mspx

As you can see on the last page, the countries where Western European languages are spoken use code page 1252 as standard.

I hope this helps a little bit.

Cor

The code pages are not per country, but for larger regions or alphabets. For example, Western European languages use code page 1252 (ANSI Latin I), if I remember correctly. So, if you are exchanging data between, say, France and Spain, it will work. China or Russia would be a problem, though.

Also, if you know the code page that was used to create the file, you can obtain that specific encoding instead of using "Default":

System.Text.Encoding.GetEncoding(codepage)

and pass it to your reader.

If you want to avoid the code page mess, then the writer and the reader should use Unicode, which was invented to avoid this kind of problem.

-- 
Best regards,

Carlos J. Quintero
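A minimal sketch of the Unicode suggestion above, assuming the file list is written and read with the StreamWriter and StreamReader constructors already quoted in this thread and with Encoding.UTF8 passed to both. The file name "filelist.txt" and the sample filenames are hypothetical, chosen only for illustration; this is not the original poster's code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module Utf8RoundTrip
    Sub Main()
        ' Hypothetical output path and sample names (not from the thread).
        Dim path As String = "filelist.txt"

        ' ChrW builds the non-ASCII characters from their code points, so the
        ' sample does not depend on how this source file itself is saved.
        Dim names() As String = { _
            "m" & ChrW(&HE1) & "s.txt", _
            "se" & ChrW(&HF1) & "or.doc", _
            ChrW(&H4E2D) & ChrW(&H6587) & ".txt"}

        ' Write with an explicit Unicode encoding instead of Encoding.Default,
        ' so the result does not depend on the machine's Windows code page.
        Using writer As New StreamWriter(path, False, Encoding.UTF8)
            For Each name As String In names
                writer.WriteLine(name)
            Next
        End Using

        ' Read back with the same encoding that was used for writing.
        Dim readBack As New List(Of String)
        Using reader As New StreamReader(path, Encoding.UTF8)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                readBack.Add(line)
                line = reader.ReadLine()
            End While
        End Using

        ' Compare each original name with what was read back.
        For i As Integer = 0 To names.Length - 1
            Console.WriteLine(names(i) = readBack(i))
        Next
    End Sub
End Module
```

The point of the sketch is only that writer and reader agree on one Unicode encoding; Encoding.Unicode (UTF-16) could be substituted in both places.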
> The code pages are not per country, but for greater regions or alphabets.
> For example, western european languages use code page 1252 (ANSI Latin I) if
> I remember correctly. So, if you are exchanging data from, say, France to
> Spain, it will work. China or Russia would be a problem, though.
>
> Also, if you know the code page that was used to create the file, you can
> obtain that specific encoding instead of using "Default":
>
> System.Text.Encoding.GetEncoding(codepage)
>
> and pass it to your reader.
>
> If you want to avoid the code page mess, then the writer and the reader
> should use Unicode, which was invented to avoid this kind of problems.

:-O Carlos, I am impressed! Here, have another MVP!

I think my safest option is to use Unicode, as China is one of the markets that might be targeted in the future.

This raises one other question. If Unicode was invented to avoid all this, then what is the benefit of NOT using Unicode?

"Hugh Janus" <my-junk-acco***@hotmail.com> wrote in message news:1136895474.608045.76720@g44g2000cwa.googlegroups.com...
> I think my safest option is to use unicode as China is one of the
> markets that might be targeted in the future.

Yes, the safest is to use Unicode.

> This raises one other question. If unicode was invented to avoid all
> this, then what is the benefit of NOT using unicode?

Unicode has the drawback that it increases the size of the file, since it uses 2 bytes per character, compared to 1 byte per character when using code pages. It is the price to pay to accommodate all the characters of all alphabets. So, NOT using Unicode has the benefit of smaller files.

-- 
Best regards,

Carlos J. Quintero

You may also enjoy this article:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
http://www.joelonsoftware.com/articles/Unicode.html

-- 
Best regards,

Carlos J. Quintero

> The Absolute Minimum Every Software Developer Absolutely, Positively Must
> Know About Unicode and Character Sets (No Excuses!)
> http://www.joelonsoftware.com/articles/Unicode.html

Carlos, this is superb. Thanks. However, when I read the filenames in via StreamReader, I add them into a hashtable. Some of the filenames are getting added to the hashtable as just "?????????????????????????????", which when written back via StreamWriter become what looks like Chinese characters. Any ideas?

P.S. I have specified the same encoding for both writer and reader.

One last thing: the 2 bytes per character for storage that I mentioned applies only when you save as Unicode UTF-16; saving in UTF-8 consumes less space.

-- 
Best regards,

Carlos J. Quintero

"Hugh Janus" <my-junk-acco***@hotmail.com> wrote in message news:1136962484.815460.126650@o13g2000cwo.googlegroups.com...
> don't worry, i solved it. it was a typo.

Hi, Hugh,
I'm not sure what your code looks like, but you may need to "tokenize" (encode) these characters. They should be stored or read in as either UTF-8 or Unicode (XML processors are supposed to recognize these). This should "just work" if you are using the .NET Framework's System.Xml code to generate or read an XML document; you shouldn't have to do anything.

Are you generating your own XML instead, and parsing it on your own? If so, you will need to do the encoding yourself, and will need to make sure the file you create has the appropriate header detailing the text type. It sounds like you're translating them to bare ASCII when you're writing them out. You can use the System.Text.UTF8Encoding class to translate strings between "normal" strings and UTF-8, for example.

Let us know how it goes. (I'll be away for a few days, but will check back when I get back Friday.)

--Matt Gertz--*
VB Compiler Dev Lead

-----Original Message-----
From: Hugh Janus
Posted At: Monday, January 09, 2006 11:41 AM
Posted To: microsoft.public.dotnet.languages.vb
Conversation: Foreign Characters in XML
Subject: Foreign Characters in XML

[...]
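Matt's suggestion to let System.Xml handle the encoding can be sketched roughly as follows. This assumes the list is written with XmlTextWriter and read back with XmlDocument; the file name "filelist.xml" and the element names "files" and "file" are hypothetical, since the thread never shows the actual XML layout.

```vbnet
Imports System
Imports System.Text
Imports System.Xml

Module XmlFileList
    Sub Main()
        ' Hypothetical path and sample filename (not from the thread).
        Dim path As String = "filelist.xml"
        Dim fileName As String = "m" & ChrW(&HE1) & "s.txt"

        ' The writer is given the encoding explicitly; WriteStartDocument
        ' emits the XML declaration, including the matching encoding name.
        Dim writer As New XmlTextWriter(path, Encoding.UTF8)
        writer.Formatting = Formatting.Indented
        writer.WriteStartDocument()
        writer.WriteStartElement("files")
        writer.WriteElementString("file", fileName)
        writer.WriteEndElement()
        writer.WriteEndDocument()
        writer.Close()

        ' XmlDocument reads the declaration and decodes the file accordingly,
        ' so no encoding has to be specified on the reading side.
        Dim doc As New XmlDocument()
        doc.Load(path)
        Dim stored As String = doc.SelectSingleNode("/files/file").InnerText

        ' Compare the stored name with the original.
        Console.WriteLine(stored = fileName)
    End Sub
End Module
```

Compared with writing the markup by hand through StreamWriter, this also leaves the escaping of characters such as "&" in filenames to the XML writer.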