Importing Data from PDF Files

Happy Business Spreadsheets has developed a free Excel program to extract and import PDF data into Excel which can be downloaded and used without restriction.

There is a common need to extract and import specific data from PDF files into Excel. Since Excel does not natively support reading PDF content, a utility is needed to convert the PDF content into a format Excel can read. Several commercial applications accomplish this; however, it is often the case that only specific data needs to be imported from multiple PDF files into one structured format.

We created such an application by using VBA code in conjunction with an open source PDF to Text conversion utility, which can be found at Foolabs.
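As a rough illustration of how VBA can drive such a utility, here is a minimal sketch. It assumes the utility is the pdftotext command-line program from Xpdf and that pdftotext.exe sits in the same folder as the PDF. The function names BuildPdfToTextCommand and ConvertPdfToText are hypothetical and are not taken from the download.

```vba
Option Explicit

' Build the command line for pdftotext (assumed to be in strDir, next to the PDF).
' -layout asks pdftotext to keep the physical layout of the page.
' strPdf is assumed to carry an extension such as ".pdf".
Function BuildPdfToTextCommand(ByVal strDir As String, ByVal strPdf As String) As String
    Dim strTxt As String
    If Right$(strDir, 1) <> "\" Then strDir = strDir & "\"
    ' Swap the extension for .txt to name the output file
    strTxt = Left$(strPdf, InStrRev(strPdf, ".") - 1) & ".txt"
    BuildPdfToTextCommand = """" & strDir & "pdftotext.exe"" -layout """ & _
        strDir & strPdf & """ """ & strDir & strTxt & """"
End Function

' Run the conversion and wait for it to finish; returns the exit code.
Function ConvertPdfToText(ByVal strDir As String, ByVal strPdf As String) As Long
    Dim objShell As Object
    Set objShell = CreateObject("WScript.Shell")
    ' 0 = hidden window, True = wait until pdftotext has exited
    ConvertPdfToText = objShell.Run(BuildPdfToTextCommand(strDir, strPdf), 0, True)
End Function
```

Calling ConvertPdfToText ThisWorkbook.Path, "invoice.pdf" (a made-up file name) would, under these assumptions, write invoice.txt next to the workbook, ready to be read line by line.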

[Download the free PDF data import Excel program here]

The program requires the conversion utility (included in the download) and all PDF files to reside in the same directory as the Excel application. Text or data to extract is defined in the Control sheet by specifying start text, end text and multiple replacement routines with wildcard support. This gives the flexibility to obtain comparable data from multiple PDF files based on patterns, independent of the differing PDF file structures.
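To make the idea of a rule concrete, here is a simplified sketch of a start text / end text rule followed by replacement pairs. It is not the code from the download: it matches literal text only (no wildcards), the helper names ExtractBetween and ApplyReplacements are hypothetical, and the "find,replace|find,replace" pair format is an assumption based on the rule examples posted in this thread.

```vba
Option Explicit

' Return the text between the first strStart and the next strEnd
' (literal matches only; "" when either marker is missing).
Function ExtractBetween(ByVal strText As String, ByVal strStart As String, _
                        ByVal strEnd As String) As String
    Dim lngFrom As Long, lngTo As Long
    lngFrom = InStr(1, strText, strStart, vbTextCompare)
    If lngFrom = 0 Then Exit Function
    lngFrom = lngFrom + Len(strStart)
    lngTo = InStr(lngFrom, strText, strEnd, vbTextCompare)
    If lngTo = 0 Then Exit Function
    ExtractBetween = Trim$(Mid$(strText, lngFrom, lngTo - lngFrom))
End Function

' Apply replacement pairs written as "find,replace|find,replace".
' A pair without a comma simply removes the find text.
Function ApplyReplacements(ByVal strText As String, ByVal strPairs As String) As String
    Dim varPair As Variant, lngComma As Long
    For Each varPair In Split(strPairs, "|")
        lngComma = InStr(varPair, ",")
        If lngComma = 0 And Len(varPair) > 0 Then
            strText = Replace(strText, varPair, "")
        ElseIf lngComma > 1 Then
            strText = Replace(strText, Left$(varPair, lngComma - 1), _
                              Mid$(varPair, lngComma + 1))
        End If
    Next varPair
    ApplyReplacements = Trim$(strText)
End Function
```

With a made-up line such as "Invoice No: 12345 Date: 01/02", ExtractBetween(line, "Invoice No:", "Date:") picks out the text between the two markers, and ApplyReplacements can then strip any leftover labels or symbols.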

As many extraction rules as required can be set in order to create a table of information imported by extraction rule and PDF file name. Information on how to set up rules is available within the Excel application with a help icon and cell comments. The VBA code is commented and open for modification.

Please post any improvements or new features for the code here so that we can update the download version for the benefit of everyone.
 Excel Business Forums Administrator
 Posted by on
 
Replies - Displaying 11 to 20 of 88
Applaud
Great addition!
Testing the new version I've discovered that another problem is now fixed. :)
Here in Brazil we use a comma as the decimal separator. In the old version that caused the script to end the field when a comma was found inside the text. Now it's working fine.
Multiple instances of the text are now found and placed in the "Multiple Instances Data" table.
Is it possible to get the results in a table like the Combined Last Instances one, with the columns in order?
 Alexandre
 Posted by on
Grateful
What can I say...

CONGRATS!!!

It's excellent!
It is working correctly for the test file. I'll be running more tests tomorrow when I get more test files.

THANKS!

 Alexandre
 Posted by on
Confused
Thanks for the excellent tool. I am able to extract text from PDF files perfectly.

However, the columns in the output text are spaced differently for different pages of the PDF. This makes importing into Excel a bit of a problem when dealing with a large number of pages.

Is it possible to delimit the output columns with a delimiter like # or * ?
 Posted by on
Confused
Hi,
Thanks a lot for this great tool. It works like a charm.
There is only one thing I am not having any luck with. I have a number displayed as a percentage in the PDF document, and I only want to import the number as is, without the percentage symbol. I would appreciate any help on how to do this.

Here is an example:

In the PDF I have this line:
Making up a number 0.5 %   10.0 %

I used the following to import the numbers:
Start Text     End Text      Replacement Pairs
number        %*              number,|%

When I 'Run Extraction' I get the following output:
0.5      10.0

So my question is: how do I get the program to return only the number 0.5 and not the 10.0?

Thanks
 Posted by on
Confused
Thank you so much for creating such a useful program. Could I get the code written for this program in VBA? It would be very beneficial to me.

Warm Regards.

 Posted by on
Oops
This is definitely something that would be good to add to the code as an option. At the moment, the last instance is returned as the result. The code loops through each line of the text content of the PDF looking for the start pattern and end pattern, and saves the part in between in memory. The part of the code in the Run_Extraction() routine that gathers the text sits right before the end of the loop:

VBA Code:
'test if in text gathering blCont and append
If blCont = True And strTemp <> "" And Not strLine Like "*" & cs & "*" Then
        strTemp = strTemp & Chr(10) & Trim(strLine)
End If

This needs to be modified with another variable to hold what was previously found and appended to the overall result.
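One way this could look is sketched below. It is a self-contained illustration, not the tool's actual Run_Extraction() code: the function name CollectInstances is hypothetical, it matches literal start and end text only, and it keeps every instance in a Collection so that the caller can choose the first, the last, or all of them.

```vba
Option Explicit

' Collect every instance of the text found between strStart and strEnd.
' Hypothetical helper: a sketch only, not the tool's Run_Extraction code.
Function CollectInstances(ByVal strContent As String, ByVal strStart As String, _
                          ByVal strEnd As String) As Collection
    Dim colFound As New Collection
    Dim varLine As Variant, strLine As String, strTemp As String
    Dim blCont As Boolean, lngPos As Long

    For Each varLine In Split(strContent, vbLf)
        strLine = varLine
        If Not blCont Then
            lngPos = InStr(1, strLine, strStart, vbTextCompare)
            If lngPos > 0 Then
                ' Start pattern found: begin gathering after it
                blCont = True
                strTemp = ""
                strLine = Mid$(strLine, lngPos + Len(strStart))
            End If
        End If
        If blCont Then
            lngPos = InStr(1, strLine, strEnd, vbTextCompare)
            If lngPos > 0 Then
                ' End pattern found: store this instance instead of overwriting
                strTemp = strTemp & Left$(strLine, lngPos - 1)
                colFound.Add Trim$(strTemp)
                blCont = False
            Else
                ' Still inside an instance: keep the line and carry on
                strTemp = strTemp & strLine & Chr(10)
            End If
        End If
    Next varLine
    Set CollectInstances = colFound
End Function
```

With the result in col, col(1) is the first instance and col(col.Count) is the last one, which corresponds to what is returned at the moment. Only one instance is captured per start text, so a second value on the same line that has no start text in front of it is ignored.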
 Excel Business Forums Administrator
 Posted by on
Grateful
Hi there,

Firstly, I would like to thank you for a really nice tool. I have tried it with your test files successfully, but it doesn't work for my own PDF file. The result is always empty... I already put the PDF file in the same folder, like you said.

My pdf is here:
https://drive.google.com/file/d/0B822d4OBtHm5Mi1hM0dMaW1LMzg/edit?usp=sharing

I tried different rules following your instructions, but the result is always empty... Could you please review my case and give me your advice?

Thanks a lot.

Neo.
 Posted by on
Grateful
Dear Admin,

I did follow your suggestion but it's still empty. Now I am sharing with you my rule definition:

Start Text              End Text                   Replacement Pairs
PERSONAL INFORMATION    STRENGTHS AND ABILITIES    PERSONAL INFORMATION,|STRENGTHS AND ABILITIES,
 
I am also sharing the Excel file with you, so you can review it:

https://drive.google.com/file/d/0B822d4OBtHm5ckN2Z3FzUnNNX2s/edit?usp=sharing

For your advice - thanks a lot.

neo. 
 Posted by on
Sad
Dear admin,

I already changed the file name to lowercase but I am sorry...the result is still empty.

For your advice.

Thank you.

neo. 
 Posted by on
Confused
Hi,
I need help with two things.
1) The replacement text is not working well for me.
For example, my start text is "Receiving Office Engagement Number", which is similar in most of the PDFs, but a few PDFs have "Receiving office codeblock" or "Engagement Number" instead, so it is not picked up for those PDFs. How do I set up the replacements?

2) My PDF has the number in multiple places.
For example, Number ;ABC123 appears in several places in my PDF, so the output gives 3 to 4 lines. I need it to be picked up only once. How do I change the code?

Regards,
Vijeth
 Posted by on