
Importing Data from PDF Files


(4.2/5 from 24 votes)
Happy Business Spreadsheets has developed a free Excel program to extract and import PDF data into Excel. It can be downloaded and used without restriction.

There is a common need to extract and import specific data from PDF files into Excel. Since Excel does not natively support reading PDF content, a utility is needed to convert the PDF content into a format Excel can use. Several commercial applications accomplish this; however, it is often the case that only specific data needs to be imported from multiple PDF files into one structured format.

We created such an application by using VBA code in conjunction with an open source PDF to Text conversion utility, which can be found at Foolabs.
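As a rough illustration of the conversion step, the sketch below calls the pdftotext command-line utility from Python. It is not the VBA code shipped with the download. The function names are hypothetical, it assumes pdftotext is on the system path, and the use of the -layout option is an assumption rather than a setting the program is known to use.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, layout=True):
    """Return the pdftotext argument list for one PDF file (hypothetical helper)."""
    pdf = Path(pdf_path)
    txt = pdf.with_suffix(".txt")  # text output sits next to the PDF
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # assumption: try to keep the page's column layout
    cmd += [str(pdf), str(txt)]
    return cmd


def convert(pdf_path):
    """Run pdftotext and return the path of the text file it writes."""
    cmd = build_command(pdf_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if conversion fails
    return Path(cmd[-1])
```

Calling convert("invoice.pdf") would write invoice.txt to the same folder, which can then be searched as plain text.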

[Download the free PDF data import Excel program here]

The program requires the conversion utility (included in the download) and all PDF files to reside in the same directory as the Excel application. The text or data to extract is defined in the Control sheet by specifying start text, end text and multiple replacement routines with wildcard support. This provides the flexibility to obtain comparable data from multiple PDF files based on patterns, independent of differing PDF file structures.
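To make the start text, end text and replacement idea concrete, here is a minimal Python sketch of that kind of rule. It is an illustration only and not the program's VBA code: the function names, the treatment of * as a wildcard, the (find, replace) tuple format and the sample invoice text are all assumptions.

```python
import re


def to_regex(pattern):
    """Escape literal text and turn each * wildcard into a lazy match."""
    return ".*?".join(re.escape(part) for part in pattern.split("*"))


def extract(text, start, end, replacements=()):
    """Return the text between the first start match and the next end match."""
    regex = to_regex(start) + "(.*?)" + to_regex(end)
    match = re.search(regex, text, flags=re.DOTALL)
    if match is None:
        return ""  # rule did not match this file
    value = match.group(1)
    # Apply each (find, replace) pair in order to clean up the extracted text.
    for find, replace in replacements:
        value = re.sub(to_regex(find), replace, value)
    return value.strip()


# Hypothetical text as it might look after PDF-to-text conversion.
sample = "Invoice No: 12345 Date: 01/02/2013\nTotal: $1,250.00 due"
invoice_no = extract(sample, "Invoice No:", "Date:")
total = extract(sample, "Total:", " due", [("$", ""), (",", "")])
```

Because the rule only depends on the surrounding text, the same rule can be run against files whose layout differs, as long as the start and end text appear in each.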

As many extraction rules as required can be set in order to create a table of information imported by extraction rule and PDF file name. Information on how to set up rules is available within the Excel application with a help icon and cell comments. The VBA code is commented and open for modification.

Improvements or new features for the code are welcome and can be posted here so that we can update the download version for the benefit of everyone.
 Excel Business Forums Administrator
 Posted by on
 
Replies - Displaying 21 to 30 of 88
Sad
(3/5 from 1 vote)
If there is a space after 'number', then what about a start text of 'number ' and an end text of ' '? Logically this would capture only the first number.
 Excel Business Forums Administrator
 Posted by on
Confused
(3/5 from 1 vote)
Never mind, I see what you're saying. But is there a way to do it without this formula? It would save me a step.

Please let me know. Thanks
 Posted by on
Confused
(3/5 from 1 vote)
Thanks again.
And yes there is more text on that same line after the %, i.e., the % is not at the end of the line.
Could you please elaborate on where to insert this formula? I appreciate it.
 Posted by on
Oops
(3/5 from 1 vote)
From the example given it is hard to know whether it will work, but if the end text were simply '% ' then the extraction should stop at the first number, since the last % would be at the end of the line and have no space after it.

Otherwise, one could extract the first number from the output by using a LEFT(Cell,FIND(" ",Cell)-1) formula.
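Both suggestions can be checked outside Excel with a small Python sketch. This is an illustration under assumptions, not the program's code: the function names are hypothetical, and first_token splits on any run of whitespace, which is slightly more forgiving than the formula's search for a single space.

```python
def between(line, start_text, end_text):
    """Return the text between start_text and the first end_text that follows it."""
    start = line.index(start_text) + len(start_text)
    end = line.index(end_text, start)  # first end text after the start text
    return line[start:end].strip()


def first_token(value):
    """Return the text before the first whitespace, similar to LEFT with FIND."""
    parts = value.split()
    return parts[0] if parts else ""


pdf_line = "Making up a number 0.5 %   10.0 %"
with_end_text = between(pdf_line, "number", "% ")   # end text includes the space
from_output = first_token("0.5      10.0")          # clean up the extracted output
```

The first approach stops at the first % that is followed by a space; the second leaves the extraction rule alone and trims the result afterwards.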
 Excel Business Forums Administrator
 Posted by on
Confused
(4/5 from 2 votes)
Hi,
Thanks a lot for this great tool. It works like a charm.
There is only one thing I am not having any luck with. I have a number displayed as a percentage in the PDF document and I want to import only the number, not the percentage symbol. I would appreciate any help on how to do this.

Here is an example:

In the PDF I have this line:
Making up a number 0.5 %   10.0 %

I used the following to import the numbers:
Start Text     End Text      Replacement Pairs
number        %*              number,|%

When I 'Run Extraction' I get the following output:
0.5      10.0

So my question is: how do I get the program to return only the number 0.5 and not the 10.0?

Thanks
 Posted by on
Oops
(3/5 from 1 vote)
Sometimes we need to get creative to match beginning and end text.  In your example you have a closing bracket with the item number. Perhaps this can be used, or something on the third line using a wildcard.

After the extraction we can run replacements to remove unwanted text content, which also comes in handy for defining the extraction rules.
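When the end text is too volatile to pin down, another option is to work on the converted text directly and take the line that follows a known marker. The Python sketch below shows that idea. It is a different technique from the program's start and end text rules, and the function name and sample text are hypothetical.

```python
def line_after(text, marker):
    """Return the first non-empty line that follows a line containing marker."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            # Skip blank lines that the text conversion may have inserted.
            for following in lines[i + 1:]:
                if following.strip():
                    return following.strip()
    return ""  # marker not found, or nothing after it


# Hypothetical converted text for one invoice.
invoice_text = "Header\nInvoice for\n\nPartial Private Circuit Charges (123456789)\nTotal"
description = line_after(invoice_text, "Invoice for")
```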
 Excel Business Forums Administrator
 Posted by on
Confused
(3/5 from 1 vote)
Thank you very much for this tool. I need to get data from 5000 invoices and have one bit that I can't work out. My data appears on two lines in the PDF as follows:

(line 1) Invoice for
(line 2) Partial Private Circuit Charges (123456789)

I need to extract line 2. It always occurs on the line that follows "Invoice for", but the ending text is volatile. Please help.
 Posted by on
Confused
(3/5 from 1 vote)
Sir,
I have PDF files in which the fields are arranged row-wise. I am a layman.
Will you explain in detail how to extract the data?
I read all the posts on this site but was not able to understand them.
I read the readme file in the folder, but I didn't understand the procedure.

So, please explain it very simply. You can also reply to my mail, because the explanation would be lengthy.

Thanks,
BMVLU 
 Posted by on
Oops
(3/5 from 1 vote)
To extract multiple instances of text from both within and across multiple PDFs, we need to make sure that the start and end text that surrounds the content required is generic enough to match the multiple instances.  If the text needs to be identified by instance, then we need to specify multiple start and end texts that will pick up each instance.
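As an illustration of a generic start and end text matching every instance, the Python sketch below collects all spans between the two markers from converted text. It is not the program's VBA code. The function name and sample data are hypothetical, and the newline is used as the end text on the assumption that each value ends at the end of its line.

```python
import re


def extract_all(text, start, end):
    """Return every span found between start and end, in document order."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]


# Hypothetical converted text spanning several entries.
pages = (
    "Contractor name: Alice Smith\nRate: 50\n"
    "Contractor name: Bob Jones\nRate: 65\n"
)
names = extract_all(pages, "Contractor name:", "\n")
rates = extract_all(pages, "Rate:", "\n")
```

Pairing the two lists by position gives one row per instance, provided every entry contains both qualifiers.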
 Excel Business Forums Administrator
 Posted by on
Confused
(3/5 from 1 vote)
This is an excellent utility - thanks for sharing it in an open forum. It really adds a lot of value. The issue I have is with pulling multiple instances across the pages of a PDF. I did read through the different upgrades and responses from 2012 but was not able to pinpoint exactly how to do this. The data is not in tabular format but has clear qualifiers: "contractor name", "rate".

Can you please help?

 Posted by on