Importing Data from PDF Files

Business Spreadsheets has developed a free Excel program to extract PDF data and import it into Excel. It can be downloaded and used without restriction.

There is a common need to extract specific data from PDF files and import it into Excel. Since Excel cannot read PDF content natively, a utility is needed to convert the PDF content into a format Excel can use. Several commercial applications do this; however, often only specific data needs to be imported from multiple PDF files into one structured format.

We created such an application using VBA code together with an open-source PDF-to-text conversion utility, which can be found at Foolabs.

[Download the free PDF data import Excel program here]

Update (19-Feb-2012):
A new version also extracts multiple instances of data matching the same pattern from one or more PDF files.

The program requires the conversion utility (included in the download) and all PDF files to reside in the same directory as the Excel workbook. The text or data to extract is defined on the Control sheet by specifying a start text, an end text and multiple replacement routines, with wildcard support. This makes it possible to obtain comparable data from multiple PDF files based on text patterns, regardless of differences in their structure.
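The start text / end text / replacement idea can be sketched outside Excel. The Python below is not the workbook's VBA; it is a minimal illustration that assumes '*' and '?' behave like Excel wildcards (any run of characters, and exactly one character). The function names and the sample invoice text are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple


def wildcard_to_regex(pattern: str) -> str:
    """Translate a start/end text containing * and ? wildcards into a regex.

    Assumption: '*' matches any run of characters and '?' matches exactly one
    character, as in Excel wildcards; the tool's own rules may differ.
    """
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*?")  # lazy, so the match stays as short as possible
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))  # everything else is matched literally
    return "".join(out)


def extract_between(text: str, start: str, end: str,
                    replacements: Iterable[Tuple[str, str]] = ()) -> Optional[str]:
    """Return the text between the first match of `start` and the next match
    of `end`, after applying each (find, replace) pair in order."""
    rx = re.compile(wildcard_to_regex(start) + "(.*?)" + wildcard_to_regex(end),
                    re.DOTALL)  # DOTALL lets wildcards span line breaks
    match = rx.search(text)
    if match is None:
        return None
    value = match.group(1)
    for find, replace in replacements:
        value = value.replace(find, replace)
    return value.strip()


text = "Invoice No: 10234\nTotal Due: $1,250.00\n"
print(extract_between(text, "Total Due:*$", "\n", [(",", "")]))  # prints 1250.00
```

Here the wildcard in the start text absorbs whatever sits between "Total Due:" and the dollar sign, the end text is a line break, and the replacement strips the thousands separator.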

You can set as many extraction rules as required to create a table of imported information, organised by extraction rule and PDF file name. Help on setting up rules is available within the Excel application through a help icon and cell comments. The VBA code is commented and open for modification.
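As a rough sketch of how several rules turn into one table, the Python below assumes a hypothetical rule layout of (name, start text, end text), plain text matching without wildcards, and one row per PDF file with one column per rule; the workbook's actual layout may differ. The PDF text is assumed to have already been produced by the conversion utility, and the file names and values are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule layout: (rule name, start text, end text).
Rule = Tuple[str, str, str]


def between(text: str, start: str, end: str) -> Optional[str]:
    """Plain (non-wildcard) start/end match; None when either text is missing."""
    i = text.find(start)
    if i < 0:
        return None
    i += len(start)
    j = text.find(end, i)
    if j < 0:
        return None
    return text[i:j].strip()


def build_table(pdf_texts: Dict[str, str], rules: List[Rule]) -> List[List[str]]:
    """One row per PDF file, one column per extraction rule."""
    table = [["PDF file"] + [name for name, _, _ in rules]]
    for file_name, text in pdf_texts.items():
        row = [file_name]
        for _, start, end in rules:
            value = between(text, start, end)
            row.append("" if value is None else value)  # blank when no match
        table.append(row)
    return table


texts = {
    "inv1.pdf": "Invoice No: 101\nTotal: 50.00\n",
    "inv2.pdf": "Invoice No: 102\n",
}
rules = [("Invoice", "Invoice No:", "\n"), ("Total", "Total:", "\n")]
for row in build_table(texts, rules):
    print(row)
```

A file that lacks a match for a rule simply gets an empty cell, so one irregular PDF does not stop the whole import.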

Please post any improvements or new features to the code here so that we can update the download version for everyone's benefit.
Excel Business Forums Administrator

Replies (displaying 1 to 10 of 63)

Sometimes we need to get creative to match the start and end text. In your example, the item number is followed by a closing bracket. Perhaps this can be used, or something on the third line can be matched using a wildcard.

After the extraction, we can run replacements to remove unwanted text, which also comes in handy when defining the extraction rules.
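To illustrate the closing-bracket suggestion, here is a Python sketch (not the tool's rule syntax) that starts after "Invoice for", skips the line break, and stops at the first closing bracket. It assumes the first ')' after the marker closes the item number; the function name is hypothetical.

```python
import re
from typing import List


def extract_to_closing_bracket(text: str, start: str) -> List[str]:
    """Return every value that follows `start` and runs up to and including
    the next closing bracket. Whitespace (including the line break) right
    after `start` is skipped, so the value can sit on the following line."""
    pattern = re.escape(start) + r"\s*([^)]*\))"
    return [value.strip() for value in re.findall(pattern, text)]


# Sample text laid out as in the two-line invoice example in this thread.
sample = "Invoice for\nPartial Private Circuit Charges (123456789)\n"
print(extract_to_closing_bracket(sample, "Invoice for"))
# prints ['Partial Private Circuit Charges (123456789)']
```

Because the end text is the bracket rather than the description, the varying wording before the item number does not matter.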
Excel Business Forums Administrator

Thank you very much for this tool. I need to get data from 5,000 invoices, and there is one part I can't work out. My data appears on two lines in the PDF, as follows:

Line 1: Invoice for
Line 2: Partial Private Circuit Charges (123456789)


I need to extract line 2. It always appears on the line that follows "Invoice for", but the ending text varies. Please help.

Sir,
I have PDF files in which the fields are arranged row-wise. I am a layman. Could you explain in detail how to extract the data? I have read all the posts on this site but could not understand them. I also read the readme file in the folder, but I didn't understand the procedure.

So please explain it very simply. You can also reply to me by email, because the explanation would be lengthy.

Thanks,
BMVLU

To extract multiple instances of text, both within a PDF and across multiple PDFs, we need to make sure that the start and end text surrounding the required content is generic enough to match every instance. If each instance needs to be identified individually, then we need to specify multiple start and end texts, one pair to pick up each instance.
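A minimal Python sketch of the "generic start and end text" idea follows: every match in every file is returned and numbered by instance. The function names and sample text are hypothetical. The form feed in the sample stands in for a page break, which pdftotext typically emits between pages.

```python
import re
from typing import Dict, List, Tuple


def all_instances(text: str, start: str, end: str) -> List[str]:
    """Every non-overlapping value between `start` and the next `end`."""
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    return [value.strip() for value in re.findall(pattern, text, flags=re.DOTALL)]


def instances_across_files(pdf_texts: Dict[str, str], start: str,
                           end: str) -> List[Tuple[str, int, str]]:
    """(file name, instance number, value) for every match in every file."""
    rows = []
    for file_name, text in pdf_texts.items():
        for n, value in enumerate(all_instances(text, start, end), start=1):
            rows.append((file_name, n, value))
    return rows


# Illustrative text: the same generic start text repeated on two pages.
texts = {"contracts.pdf": "Contractor Name: Acme Ltd\nRate: 45.00\n\f"
                          "Contractor Name: Beta Co\nRate: 52.50\n"}
for row in instances_across_files(texts, "Contractor Name:", "\n"):
    print(row)
```

Running the same function with "Rate:" as the start text gives the matching rates in the same order, so the two lists can be lined up by file name and instance number.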
Excel Business Forums Administrator

This is an excellent utility; thanks for sharing it in an open forum. It is really valuable. The issue I have is pulling multiple instances across the pages of a PDF. I read through the various upgrades and responses from 2012 but could not pinpoint exactly how to do this. The data is not in tabular format, but it has clear qualifiers such as "contractor name" and "rate".

Can you please help?

In the example text file, it seems you want the third column of data for each product type. In this case, you can specify the required items (e.g. 'Food') as the start texts, and the end text can be a new line. When the data comes in as one column, we can use Excel's Text to Columns command with the tab delimiter to split it out, so that all the data lines up next to the items.
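The same steps (start text = item name, end text = new line, then split into columns) can be sketched in Python using a trimmed excerpt of the sales report quoted in this thread. This is an illustration, not the tool itself. It assumes item names are single words, and it splits on runs of spaces or tabs rather than on a single tab delimiter. The function name is hypothetical.

```python
from typing import Dict, Iterable


def third_column_by_item(report: str, items: Iterable[str]) -> Dict[str, str]:
    """Find each line that starts with one of `items` (the start text), take
    the rest of that line (end text = new line) and split it into columns on
    runs of whitespace, like Text to Columns; return the third column."""
    wanted = set(items)
    results = {}
    for line in report.splitlines():
        columns = line.split()  # splits on spaces and tabs alike
        if len(columns) >= 3 and columns[0] in wanted:
            results[columns[0]] = columns[2]
    return results


report = """101 - System Tracking
 Food                            27              216.29        Emp Disc 50%                 0                0.00
 Liquor                         123              452.89        Manager Comp Open            0                0.00
 Beer                            59              178.81        Open Food Disc               0                0.00
 Daquiri                        131              690.99        Mgr Comp 100%                1              -93.08
 T-Shirt                          0                0.00        Mgr Comp 50%                 0                0.00
"""
items = ["Food", "Liquor", "Beer", "Daquiri", "T-Shirt"]
for item, amount in third_column_by_item(report, items).items():
    print(item, amount)
```

Multi-word item names (for example "Gift Cert Redeemed") would need the item text matched as a whole before splitting the rest of the line.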
Excel Business Forums Administrator

This is a great tool, if I can figure out how to do a couple of things. Each month I have sales reports for every day the bar was open. These reports are PDF files. Below is an excerpt from the text file that was created:

101 - System Tracking
 Food                            27              216.29        Emp Disc 50%                 0                0.00                                     0              0.00
 Liquor                         123              452.89        Manager Comp Open            0                0.00                                     0              0.00
 Beer                            59              178.81        Open Food Disc               0                0.00      Gift Cert Redeemed             0              0.00
 Daquiri                        131              690.99        Mgr Comp 100%                1              -93.08      Emp Charge                     0              0.00
 T-Shirt                          0                0.00        Mgr Comp 50%                 0                0.00      Total Other Payments                          0.00

I need the following in a spreadsheet:
Item        Result
Food        216.29
Liquor      452.89
Beer        178.81
Daquiri     690.99
T-Shirt       0.00

Please let me know the best way to get these results. Thank you for your help and for creating a great tool.
Andy

To be sure, we have modified the Excel file for importing PDF data to include an option for appending results, and we have tested that it works with the test PDF files in the zip file. The new version can be downloaded from the same link in the original post above.
Excel Business Forums Administrator

In my opinion, that is exactly what it is not doing.

This Excel VBA code (or pdftotext.exe) doesn't know how to avoid overwriting cells that already contain data. It doesn't matter if the code that deletes the data is commented out: every time the program runs, it starts from the beginning and reads every PDF file from the beginning. That is intended, but it also overwrites the old data, because it was not programmed to resume writing from the first empty cell.

During the extraction it copies the data from each PDF file to a new row, but once the extraction is over it forgets where it left off. The next time you run the extraction, it starts from the top of the sheet and overwrites the old data. That is how it runs, because it was designed that way.

I have tried to change the mrow variable in the VBA code so that it finds the first empty cell, but every time I get errors on the following line, and that is where I get stuck.

VBA Code:
Call Run_Replacements(CStr(Cells(mrow, (j + 3)).Address), CStr(arrTmp(j, 2)))

The VBA modification for appending results does not depend on the PDF files being imported, so it should work. The logic is that existing results are not deleted at the start, and the first-row variable 'mrow' is set to the first empty row of the results.
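The append logic described above (keep the existing results and start writing at the first empty row) is sketched below in Python, with the sheet modelled as a list of rows. In Excel VBA the first empty row is commonly found with Cells(Rows.Count, 1).End(xlUp).Row + 1, assuming column 1 is filled for every result. The Python names here are hypothetical, and this is not the workbook's code.

```python
from typing import List, Optional, Sequence

Sheet = List[List[Optional[str]]]


def first_empty_row(sheet: Sheet, key_col: int = 0, header_rows: int = 1) -> int:
    """Index of the row just below the last non-empty cell in `key_col`,
    never above the header rows."""
    last_used = header_rows - 1
    for i, row in enumerate(sheet):
        if key_col < len(row) and row[key_col] not in (None, ""):
            last_used = i
    return max(last_used + 1, header_rows)


def append_results(sheet: Sheet, new_rows: Sequence[Sequence[str]],
                   key_col: int = 0, header_rows: int = 1) -> int:
    """Write `new_rows` starting at the first empty row, leaving existing
    results untouched; return the starting row (the 'mrow' of the thread)."""
    mrow = first_empty_row(sheet, key_col, header_rows)
    for offset, values in enumerate(new_rows):
        target = mrow + offset
        while len(sheet) <= target:
            sheet.append([])  # grow the model sheet as needed
        sheet[target] = list(values)
    return mrow


sheet: Sheet = [["PDF file", "Total"], ["a.pdf", "10.00"]]
print(append_results(sheet, [["b.pdf", "20.00"]]))  # prints 2
print(append_results(sheet, [["c.pdf", "30.00"]]))  # prints 3
print(sheet)
```

Each run recomputes the starting row from what is already on the sheet, so nothing needs to be remembered between extractions.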
Excel Business Forums Administrator

© 2014 Business Spreadsheets. All Rights Reserved.