Home > Forum Home > Developing Business Administration Solutions > Importing Data from PDF Files Share

Importing Data from PDF Files

Excel Help for Importing Data From Pdf Files in Developing Business Administration Solutions


Forum TopicPost Reply Login

Importing Data From Pdf Files

Rate this:
(4.2/5 from 24 votes)
Happy Business Spreadsheets has developed a free Excel program to extract and import PDF data into Excel which can be downloaded and used without restriction.

There is a common need to extract and import specific data from PDF files into Excel. Since Excel does not natively support the reading of PDF content, utilities are needed to convert the PDF file content for the Excel format. Several commercial applications accomplish this; however it is often the case where only specific data is required to be imported from multiple PDF files into one structured format.

We created such an application by using VBA code in conjunction with an open source PDF to Text conversion utility, which can be found at Foolabs.

[Download the free PDF data import Excel program here]

The program relies on the conversion utility (included in the download) and all PDF files to reside in the same directory as the Excel application. Text or data to extract are defined in the Control sheet by specifying start text, end text and multiple replacements routines with wildcard support. This enables flexibility to obtain comparable data from multiple PDF files based on patterns independent of different PDF file structures.

As many extraction rules as required can be set in order to create a table of information imported by extraction rule and PDF file name. Information on how to set up rules is available within the Excel application with a help icon and cell comments. The VBA code is commented and open for modification.

Any improvements or new features to the code are welcome to be posted here so that we can update the download version to the benefit of everyone.
 Excel Business Forums Administrator
 Posted by on
 
Replies - Displaying 31 to 40 of 88Order Replies By: Most Recent | Chronological | Highest Rated
Confused
Rate this:
(4/5 from 2 votes)
Thanks for the excellent tool. I am able to extract text from pdf files perfectly.

However, the columns in the output text are spaced differently for different pages of the pdf. This makes importing to excel a bit of a problem when dealing with a large number of pages.

Is it possible to delimit the output columns with a delimiter like # or * ?
 Posted by on
Surprised
Rate this:
(3/5 from 1 vote)
The handling of differing column widths depends a lot on the PDF content. 

If you're able to get the data into the Excel output cleanly, then the best option would be to tidy it thereafter.  Options for this include the Text to Columns feature in Excel using spaces or tabs as the delimiter, and the TRIM function to remove white space. 
 Excel Business Forums Administrator
 Posted by on
Confused
Rate this:
(3/5 from 1 vote)
Hi Guys, not sure if im just being daft but how do i use this to import data like this in the pdf:

[edit: the single numbers were meant to appear under the ref in a column format, just like the treble digits under amount) REF Amount 1 100.00 2 125.00 3 130.00 4 150.00 5 160.00 I basically want it to pull the data into excel in the exact same format. anyone got any ideas?
 Posted by on
Confused
Rate this:
(3/5 from 1 vote)
Once you have established the start and end text, the row of numbers will be extracted as one text string. Often in this case, we need to clean the extracted data to reproduce the columns.  The Text to Columns menu item in Excel can accomplish this.
 Excel Business Forums Administrator
 Posted by on
Confused
Rate this:
(3/5 from 1 vote)
Thanks for the reply. Sadly, i am still not getting anywhere with it. The file does appear to even be getting checked despite it being in the same folder as the excel document but opens the test pdf's. I also unsure on what commands to enter.
 Posted by on
Oops
Rate this:
(3/5 from 1 vote)
OK - if you can attach an example PDF by replying to this post notification email, we can take a look and see what is going on.
 Excel Business Forums Administrator
 Posted by on
Confused
Rate this:
(3/5 from 1 vote)
Hi I am new member to this site.. I downlaoded this tool. I could see one pdfttotext.exe file. when i double click it dont see any activity. Its not getting installed anywhere. I am using windows xp. Could someone help me please.. coz i use texts from around 70-80 pdfs in a day. this tool wud really help me...
 Posted by on
Oops
Rate this:
(3/5 from 1 vote)
The executable file in Pdftotext.exe should be in the same folder as the Excel application and PDF files.  This is launched via command shell from the Excel application to import data.
 Excel Business Forums Administrator
 Posted by on
Confused
Rate this:
(3/5 from 1 vote)
Hi All,My pdf files look similar to the below format. I couldnt attach  sample file as i couldnt see the option. Can you please help, with the below set up of pdf files, I need the fields underlined in a new excel sheet. How can i use your tool for the same. Kindly help me. As i work on around 80 - 90 pdf files on a daily basis to extract these underlined rows manually from each one.Thanks a lot for your help.The fields are 1.Reference number2. member id3.   Name4.Policy Number 5.Date of Birth6.mm/dd/yyyy7. Drug Name                                                      Reference Number : XXXXXXX         Member ID   XXXXXXXXXX              Telephone Number            xxxxxxxxxxxxx   Name                         XXXXXXXX         Fax Number   Policy Number         XXXXXXX               E-mail    xxxxxxxxx                            Primary Insured          Gender                     NA                    Date of Birth    NA          Marital Status             NA              Dependent Insured          Name   xxxxxxx                   (mm/dd/yyyy)     xxxxxxx          Gender  xxxxxxx                                                Relation         Spouse          Current Country of Residence           Switzerland             Date of Purchase:                                         10/01/2012     Drug Name:   XXXXXXXXX
 Posted by on
Shocked
Rate this:
(4/5 from 2 votes)
If data within the PDF files is tabulated we need to extract the data by rows and then use the text to columns function in Excel in order to replicate the table structure. 

To extract rows we need to specify the start and end text. End text is simple as it will be a new line by using the [new line] option.  The start text is more complicated as the extraction will return whatever is found after the starting text.  We can specify the first column text and replace it afterward is there is a common pattern. Alternatively if a common pattern exists the the last column of the prevoius row or text then we can use that with [new line] to extract the entire subsequent row. 
 Excel Business Forums Administrator
 Posted by on
 Displaying page 4 of 9 

Excel templates and solutions matched for Importing Data from PDF Files:

Solutions: Export MapPoint Waypoints Survey Data Analysis