Can we read the text contents from a PDF using .NET?

Can we read the text contents from a PDF using .NET? If it is possible, what do we need to do?

"B.N.Prabhu" <prabh***@officetiger.com> wrote in
news:1D395A97-D922-4795-BA82-510C94FB4C0A@microsoft.com:

> Can we Read the text contents from PDF using .net.
>
> If possible means what to do.

PDF is a collection of objects; it is not formatted text. So you can read the text, though maybe not in a meaningful manner.

Here is what I do:
    ''' <summary>
    ''' Gets the PDF text from a file.
    ''' Requires pdftotext.exe from http://www.foolabs.com/xpdf
    ''' </summary>
    ''' <param name="filename">The filename.</param>
    ''' <returns>PDF text</returns>
    Public Function getPDFtext(ByVal filename As String) As String
        Dim txtStdout As String = ""

        Try
            Using p As New System.Diagnostics.Process()
                ' Location of pdftotext.exe; adjust to where it is installed
                p.StartInfo.FileName = "Asset Search\pdftotext.exe"
                ' Quote the filename so paths with spaces work;
                ' the trailing "-" sends the text to standard output
                p.StartInfo.Arguments = """" & filename & """ -"
                p.StartInfo.UseShellExecute = False
                p.StartInfo.CreateNoWindow = True
                p.StartInfo.RedirectStandardOutput = True

                p.Start()

                ' Get the text from standard output
                Using std_out As IO.StreamReader = p.StandardOutput
                    txtStdout = std_out.ReadToEnd()
                End Using
                p.WaitForExit()
            End Using
        Catch ex As Exception
            MsgBox("Error while extracting PDF text, the error is: " & ex.Message)
        End Try

        Return txtStdout
    End Function

I wouldn't use it for anything serious, business critical, or real-time. For that you should probably go with a commercial control like http://www.pdfonline.com/. But for quick and dirty text extraction it works fine for me.

Best Regards,

Chris

Just to clarify, the following line should point to the actual pdftotext.exe program:

    p.StartInfo.FileName = "Asset Search\pdftotext.exe"   <--- points to the location of pdftotext.exe

Chris

If you want to pull information in some type of format, my advice is to purchase some SDK software (suggestions: OmniPage or ABBYY). The software will allow you to extract information or images from a PDF. I have never used the SDK; however, I have used the software, and it does extremely well when we use it to extract data from a PDF and export it as an Excel file.
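
Following up on Chris's note that FileName has to point at the actual pdftotext.exe: below is a minimal sketch of one way to resolve and check that path before starting the process. It assumes the executable sits in an "Asset Search" folder under the application's base directory, which is only a guess based on the relative path used in the thread. The module and function names (PdfToolLocator, FindPdfToText) are made up for illustration.

```vbnet
Imports System
Imports System.IO

Module PdfToolLocator
    ' Returns the full path to pdftotext.exe, or Nothing if it is not there.
    ' Assumption: the tool lives in "Asset Search" under the given base directory.
    Public Function FindPdfToText(ByVal baseDir As String) As String
        Dim toolDir As String = Path.Combine(baseDir, "Asset Search")
        Dim candidate As String = Path.Combine(toolDir, "pdftotext.exe")
        If File.Exists(candidate) Then
            Return candidate
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Start looking from the directory the application runs from
        Dim exePath As String = FindPdfToText(AppDomain.CurrentDomain.BaseDirectory)
        If exePath Is Nothing Then
            Console.WriteLine("pdftotext.exe not found; check the install location.")
        Else
            Console.WriteLine("Using " & exePath)
        End If
    End Sub
End Module
```

The returned path could then be assigned to p.StartInfo.FileName, so a missing executable is reported up front instead of surfacing as an exception from p.Start().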