Replacing a string inside of a PDF

I am having a lot more trouble with this than I thought I would. Here is what I want to do in pseudocode.

Open c:\some.pdf
Replace "Replace this" with "Replaced!"
Save c:\some_edited.pdf

I can do this in Notepad and it works fine, but when I start reading the files in code I think I run into an encoding problem. I tried saving the file with every encoding option. When I open a PDF in the text editor I normally use, it says the file is ANSI with Mac-style carriage returns. WinMerge will not let me compare the files because it says they are binary.

Does anyone know what I have to do?

Please explain: are you trying to read the file as a binary string and then write that binary string out to another file?

"Josh Baltzell" <joshbaltz***@gmail.com> wrote in message news:1153424519.544440.222000@h48g2000cwc.googlegroups.com...
> I am having a lot more trouble with this than I thought I would.
<snip>

Samuel,
I have tried it several ways. The end goal is just to end up with an edited PDF. If I have to overwrite the original file, that is fine.

Samuel Shulman wrote:
> Please explain, are you trying to read the file using a binary string and
> then using a binary string you try to write another file
<snip>

I'm assuming that I should somehow be using a BinaryReader and a BinaryWriter; I just don't know how to work with the data inside as strings and then put it back into the writer.

Josh Baltzell wrote:
> Samuel,
>
> I have tried it several ways. The end goal is just to end up with an
> edited PDF. If I have to overwrite the original file that is fine.
<snip>

I think that the key to your question is how to actually read the file (I
should have realized before that this is the main issue).

Did you manage to read parts of the file? Only if you can do that can you replace the text.

"Josh Baltzell" <joshbaltz***@gmail.com> wrote in message news:1153491098.516072.81100@m79g2000cwm.googlegroups.com...
> I'm assuming that I should somehow be using a binaryreader and a
> binarywriter, I just don't know how to work with the data inside as
> strings and then put it back in to the writer.
<snip>

I have written the code to at least read the internals of the file as a
string or a stream, and then I can find the chunk I want to replace easily enough, but I think it loses some special characters, or maybe messes up the line endings (I believe PDF files have Mac-style CR-only line endings instead of the CR LF that a lot of Windows-based files have).

So I guess my problem is actually reading and writing. I can write code that looks like it is reading the file with a StreamReader, but I think I am really losing data. I can write code that reads it as binary, but then I have trouble working with the contents. After all that is worked out, I have to figure out how to write the edited file back to disk (I believe the BinaryWriter will do that, but I have not tested much).

I'm not sure what else I can tell you. This is just a matter of me not fully understanding how I am supposed to read and edit a file like this, as opposed to the other formats that I have worked with, which were all plain text.

Thanks a lot for the feedback. I looked at the other post you linked to and read the linked page. I think that would be useful to me if the PDFs were compressed, but I can open these in Notepad and find my string right now (and the edit works when I do it that way).

Samuel Shulman wrote:
> I think that the key to your question is how to actually read the file (I
> should have realized before that this is the main issue),
<snip>

You may be able to create a string identical to the one that you want to
replace, convert it to a binary stream (it doesn't have to be a file), then look for that byte sequence within the main binary buffer that holds the PDF file, and replace it with another byte sequence created from the string you want to use as the replacement.

You still have the problem of the funny characters, which you can imitate by adding CR instead of CRLF (or whatever is normal).

And finally, once the code works, please send it over; it seems interesting to me (if that is OK with you/your company).

Regards,
Samuel

"Josh Baltzell" <joshbaltz***@gmail.com> wrote in message news:1153497761.898484.15620@i3g2000cwc.googlegroups.com...
> I have written the code to at least read the internals of the file as a
> string or a stream and then I can find the chunk I want to replace easy
> enough, but I think it loses some special characters
<snip>

I'm not sure I know how to do what you are saying, but here is a test I
made to write the file using a string converted into a byte array. This is not working.

:::::::::::::::::::::::::::::::::::::::::::::::::::
Public Sub ByteTest()
    Response.Write("Start Byte:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")

    For Each PDFFile As String In IO.Directory.GetFiles(Server.MapPath("PDF"))
        'Open the file (File.OpenText decodes the bytes as UTF-8)
        Dim Reader As IO.StreamReader = IO.File.OpenText(PDFFile)

        'Load the file into a string
        Dim Contents As String = Reader.ReadToEnd

        'Replace text in string
        Contents = Contents.Replace("ABC1234567890", "ABC1111111111")

        'Close stream
        Reader.Close()

        'Convert the string to bytes
        Dim info As Byte() = New System.Text.UTF8Encoding(True).GetBytes(Contents)

        'Create byte-based output file and write the bytes to it
        Dim OutputFileName As String = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "BYTE.pdf")
        Dim fs As IO.FileStream = IO.File.Create(OutputFileName)
        fs.Write(info, 0, info.Length)
        fs.Close()
    Next

    Response.Write("Stop Byte:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")
End Sub
:::::::::::::::::::::::::::::::::::::::::::::::::::

Here is another test I wrote that successfully generates a bunch of
useless files encoded in different ways.

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Public Sub StringTest()
    Response.Write("Start String:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")

    For Each PDFFile As String In IO.Directory.GetFiles(Server.MapPath("PDF"))
        'Open the file (File.OpenText decodes the bytes as UTF-8)
        Dim Reader As IO.StreamReader = IO.File.OpenText(PDFFile)

        'Load the file into a string
        Dim Contents As String = Reader.ReadToEnd

        'Replace text in string
        Contents = Contents.Replace("ABC1234567890", "ABC1111111111")

        'Close stream
        Reader.Close()

        'Create ASCII output file
        Dim OutputFileName As String = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-ASCII.pdf")
        Dim fs As IO.FileStream = IO.File.Create(OutputFileName)
        Dim PDFStream As IO.StreamWriter = New IO.StreamWriter(fs, System.Text.Encoding.ASCII)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()

        'Create BigEndianUnicode output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-BigEndianUnicode.pdf")
        fs = IO.File.Create(OutputFileName)
        PDFStream = New IO.StreamWriter(fs, System.Text.Encoding.BigEndianUnicode)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()

        'Create default formatted output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-Default.pdf")
        fs = IO.File.Create(OutputFileName)
        PDFStream = New IO.StreamWriter(fs, System.Text.Encoding.Default)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()

        'Create Unicode output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-Unicode.pdf")
        fs = IO.File.Create(OutputFileName)
        PDFStream = New IO.StreamWriter(fs, System.Text.Encoding.Unicode)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()

        'Create UTF7 output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-UTF7.pdf")
        fs = IO.File.Create(OutputFileName)
        PDFStream = New IO.StreamWriter(fs, System.Text.Encoding.UTF7)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()

        'Create UTF8 output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-UTF8.pdf")
        fs = IO.File.Create(OutputFileName)
        PDFStream = New IO.StreamWriter(fs, System.Text.Encoding.UTF8)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()
    Next

    Response.Write("Stop String:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")
End Sub
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Finally,
can you achieve what you actually need?

Samuel

"Josh Baltzell" <joshbaltz***@gmail.com> wrote in message news:1153509414.857395.142850@i42g2000cwa.googlegroups.com...
> Here is another test I wrote that sucessfully generates a bunch of
> useless files encoded in different ways.
<snip>

Maybe this link will be useful:
http://groups.google.com/group/microsoft.public.dotnet.languages.csharp/browse_thread/thread/c3c5a40cda61918/9cace184fa716b5a?lnk=st&q=&rnum=7#9cace184fa716b5a

"Josh Baltzell" <joshbaltz***@gmail.com> wrote in message news:1153491098.516072.81100@m79g2000cwm.googlegroups.com...
> I'm assuming that I should somehow be using a binaryreader and a
> binarywriter, I just don't know how to work with the data inside as
> strings and then put it back in to the writer.
<snip>

Josh Baltzell wrote:
> I am having a lot more trouble with this than I thought I would.
<snip>
> Winmerge will not let me compare the files because it says
> they are binary.

Winmerge is right: a PDF file is actually a binary image, not plain text in a given encoding. You should load it as a stream of bytes.

On the other hand, since you want to perform text replacements in the file, you may load it with an encoding that doesn't apply transformations to the bytes in the file, such as the ANSI encoding (code page 1252):

    Sub PDFReplaceText(ByVal Path As String, ByVal OldText As String, _
                       ByVal OutPath As String, ByVal NewText As String)
        Const ANSI As Integer = 1252
        Dim Encoding As Text.Encoding = Text.Encoding.GetEncoding(ANSI)

        'Read the whole file into a string using the ANSI code page
        Dim sr As New IO.StreamReader(Path, Encoding)
        Dim Data As String = sr.ReadToEnd
        sr.Close()

        'Replace the text in the string
        Data = Data.Replace(OldText, NewText)

        'Write the string back out with the same encoding
        Dim sw As New IO.StreamWriter(OutPath, False, Encoding)
        sw.Write(Data)
        sw.Close()
    End Sub

HTH.

Regards,

Branco.

Branco,
This worked perfectly. My knowledge of the encoding options in general is very weak, so thanks for spelling it out for me with some code.

Samuel,

Thank you too. You have both been a big help.

Thank you,
Josh Baltzell

I am glad to hear it.
Does Branco's code work as is?

"Josh" <joshbaltz***@gmail.com> wrote in message news:1153747378.716993.125910@75g2000cwc.googlegroups.com...
> Branco,
>
> This worked perfect.
<snip>

I put the encoding options into my own code, so I am not positive.
This is the final sub I ended up with.

    Public Sub ReplaceText(ByVal FilePath As String, ByVal OriginalText As String, ByVal NewText As String)
        'Code page 1252 is the ANSI encoding from Branco's example
        Dim Encoding As System.Text.Encoding = System.Text.Encoding.GetEncoding(1252)

        'Open the file
        Dim Reader As New IO.StreamReader(FilePath, Encoding)

        'Load the file into a string
        Dim Contents As String = Reader.ReadToEnd

        'Replace text in string
        Contents = Contents.Replace(OriginalText, NewText)

        'Close stream
        Reader.Close()

        'Write the string back over the original file with the same encoding
        Dim OutputFileName As String = FilePath
        Dim sw As New IO.StreamWriter(OutputFileName, False, Encoding)
        sw.Write(Contents)
        sw.Close()
    End Sub

Samuel Shulman wrote:
> I am glad to hear,
>
> Is Branco's code works as is?
<snip>
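
Following up on two suggestions made earlier in the thread (Samuel's idea of searching the byte buffer for the encoded search string, and Branco's remark that the file should really be loaded as a stream of bytes), here is a minimal sketch of a byte-level replace that never decodes the file into a String. The module and routine names (PdfByteReplace, IndexOfBytes, ReplaceBytes, ReplaceInFile) are made up for illustration and are not from anyone's posted code. It assumes the search and replacement text are plain ASCII and that the text sits uncompressed in the file, as in Josh's case. Keep in mind that a PDF's cross-reference table stores byte offsets, so a replacement of a different length may leave those offsets stale; a same-length replacement such as the ABC1234567890 example avoids that.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

' Hypothetical helper module: illustrative names, not from the thread.
Module PdfByteReplace

    ' Returns the index of the first occurrence of pattern in data at or
    ' after start, or -1 if there is none.
    Function IndexOfBytes(ByVal data As Byte(), ByVal pattern As Byte(), ByVal start As Integer) As Integer
        If pattern.Length = 0 Then Return -1
        For i As Integer = start To data.Length - pattern.Length
            Dim matched As Boolean = True
            For j As Integer = 0 To pattern.Length - 1
                If data(i + j) <> pattern(j) Then
                    matched = False
                    Exit For
                End If
            Next
            If matched Then Return i
        Next
        Return -1
    End Function

    ' Returns a copy of data with every occurrence of oldBytes replaced
    ' by newBytes. All other bytes are copied through untouched.
    Function ReplaceBytes(ByVal data As Byte(), ByVal oldBytes As Byte(), ByVal newBytes As Byte()) As Byte()
        Dim output As New List(Of Byte)(data.Length)
        Dim pos As Integer = 0
        Dim hit As Integer = IndexOfBytes(data, oldBytes, pos)
        While hit >= 0
            ' Copy the bytes between the previous match and this one.
            For k As Integer = pos To hit - 1
                output.Add(data(k))
            Next
            output.AddRange(newBytes)
            pos = hit + oldBytes.Length
            hit = IndexOfBytes(data, oldBytes, pos)
        End While
        ' Copy whatever follows the last match.
        For k As Integer = pos To data.Length - 1
            output.Add(data(k))
        Next
        Return output.ToArray()
    End Function

    ' Reads inPath as raw bytes, replaces oldText with newText, and writes
    ' the result to outPath. Assumes both strings are plain ASCII.
    Sub ReplaceInFile(ByVal inPath As String, ByVal outPath As String, _
                      ByVal oldText As String, ByVal newText As String)
        Dim data As Byte() = File.ReadAllBytes(inPath)
        Dim oldBytes As Byte() = Encoding.ASCII.GetBytes(oldText)
        Dim newBytes As Byte() = Encoding.ASCII.GetBytes(newText)
        File.WriteAllBytes(outPath, ReplaceBytes(data, oldBytes, newBytes))
    End Sub

End Module
```

With that in place, a call such as PdfByteReplace.ReplaceInFile("c:\some.pdf", "c:\some_edited.pdf", "ABC1234567890", "ABC1111111111") would do the same job as the string-based subs above. Because the file is never decoded, the choice of encoding matters only for the search and replacement strings, not for the rest of the file.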