Geekzone: technology news, blogs, forums
447 posts

Ultimate Geek
+1 received by user: 12

Subscriber

Topic # 146971 4-Jun-2014 18:14

Hi Guys.

On a regular basis I get emails with PDF attachments. These attachments are scans of proofs of delivery for goods delivered to my clients.
I'm trying to figure out if there is any way of saving these attachments in a folder and then using a search function to find a specific file. Each scan has a unique number, e.g. S123456, as part of the text within the document. I am using OS X.

I trust this makes sense and explains what I'm trying to achieve. 

Thanks.

4407 posts

Uber Geek
+1 received by user: 1240


  Reply # 1059295 4-Jun-2014 19:04

Spotlight in OS X should search file contents automatically. What is the content of the PDFs though - do they contain actual text, or just an image of a page?
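For anyone who wants to run that Spotlight search from a script, here is a minimal sketch. It assumes the PDFs contain real text that Spotlight has indexed (it will not find text inside image-only scans), and the folder path and function names are made up for illustration. `mdfind` is the command-line front end to Spotlight and `-onlyin` limits the search to one folder.

```python
import subprocess
from pathlib import Path
from typing import List


def build_mdfind_command(folder: str, term: str) -> List[str]:
    """Build an mdfind invocation that searches only inside `folder`."""
    return ["mdfind", "-onlyin", str(Path(folder).expanduser()), term]


def spotlight_search(folder: str, term: str) -> List[str]:
    """Run the search and return the matching file paths (OS X only)."""
    result = subprocess.run(
        build_mdfind_command(folder, term),
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


if __name__ == "__main__":
    # Hypothetical folder; change it to wherever the attachments are saved.
    for path in spotlight_search("~/Documents/PODs", "S123456"):
        print(path)
```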

2474 posts

Uber Geek
+1 received by user: 912

Subscriber

  Reply # 1059310 4-Jun-2014 19:21

How the PDFs are generated will be a big factor. As RunningMan pointed out, if the content is just a scanned image rather than text data, what you are asking would require some fairly in-depth OCR functionality. Would it be more practical to save the documents in such a way that their unique identifier is part of the filename? This would then be easily searchable.
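As a rough sketch of the filename idea: if each PDF is saved with its identifier(s) somewhere in the name, a few lines of Python can find it again. The folder layout, the example file names and the function name `find_scans` are assumptions for illustration only.

```python
from pathlib import Path
from typing import List


def find_scans(folder: str, identifier: str) -> List[Path]:
    """Return PDFs under `folder` whose file name contains `identifier`."""
    needle = identifier.lower()
    return sorted(
        p for p in Path(folder).rglob("*.pdf") if needle in p.name.lower()
    )


if __name__ == "__main__":
    # Hypothetical folder; a file named POD_S123456_S123457.pdf would match.
    for match in find_scans("PODs", "S123456"):
        print(match)
```

Because the match is a plain substring test, a file name that carries several identifiers is still found by any one of them.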




Windows 7 x64 // i5-3570K // 16GB DDR3-1600 // GTX660Ti 2GB // Samsung 830 120GB SSD // OCZ Agility4 120GB SSD // Samsung U28D590D @ 3840x2160 & Asus PB278Q @ 2560x1440
Samsung Galaxy S5 SM-G900I w/Spark

 
 
 
 




447 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059319 4-Jun-2014 19:34

RunningMan: Spotlight in OS X should search file contents automatically. What is the content of the PDFs though - do they contain actual text, or just an image of a page?


It's just an image of a page.



447 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059320 4-Jun-2014 19:36

Inphinity: How the PDFs are generated will be a big factor. As RunningMan pointed out, if the content is just a scanned image rather than text data, what you are asking would require some fairly in-depth OCR functionality. Would it be more practical to save the documents in such a way that their unique identifier is part of the filename? This would then be easily searchable.


I have just saved some with the unique identifier as part of the file name. It works, but if I have 20 scans with S123456 etc. as the unique identifiers, the file name gets very long and saving them this way is time-consuming. lol.

4407 posts

Uber Geek
+1 received by user: 1240


  Reply # 1059324 4-Jun-2014 19:45

What about the email subject line or body - does that have anything useful?



447 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059341 4-Jun-2014 19:54

RunningMan: What about the email subject line or body - does that have anything useful?


No.

Subject line always says... POD's

Body always says....Hello Dxxxxx. Please find POD's attached. Thanks Bxxxxxxx.

20256 posts

Uber Geek
+1 received by user: 3822

Trusted
Subscriber

  Reply # 1059345 4-Jun-2014 20:05

You need a proper document storage system, not just a bunch of files in email. As a makeshift option, I think Acrobat on its own can OCR things, which should then make them searchable by the OS search tools.




Richard rich.ms

4407 posts

Uber Geek
+1 received by user: 1240


  Reply # 1059346 4-Jun-2014 20:07

Unless you can change the format that it gets sent in - either the PDF isn't just a scanned image, or the email has some useful info in it - then you'll need some software to OCR the PDF so it is searchable.
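Once some OCR tool has turned a scan into text, pulling out the identifiers is the easy part. The sketch below covers only that step and assumes the identifiers always look like the S123456 example (a capital S followed by six digits). The OCR itself is not shown, and the names `ID_PATTERN` and `extract_ids` are made up for illustration.

```python
import re
from typing import List

# Assumed identifier format: capital "S" followed by exactly six digits.
ID_PATTERN = re.compile(r"\bS\d{6}\b")


def extract_ids(ocr_text: str) -> List[str]:
    """Return the distinct identifiers found in OCR output, in order of appearance."""
    found: List[str] = []
    for match in ID_PATTERN.findall(ocr_text):
        if match not in found:
            found.append(match)
    return found


if __name__ == "__main__":
    sample = "Waybill S123456 signed by customer. Ref S123456 / S654321"
    print(extract_ids(sample))
```

OCR often misreads characters (for example 5 as S, or 0 as O), so in practice the pattern may need loosening or the results may need a manual check.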



447 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059365 4-Jun-2014 20:11

RunningMan: Unless you can change the format that it gets sent in - either the PDF isn't just a scanned image, or the email has some useful info in it - then you'll need some software to OCR the PDF so it is searchable.


It's an automated process from the carrier. They load the signed waybills into their scanner and then when it's finished the email is sent out automatically.

2272 posts

Uber Geek
+1 received by user: 648

Trusted
Subscriber

  Reply # 1059537 5-Jun-2014 08:33

Ask the carrier to look at whether they can OCR the PDFs.  If they are scanning via a photocopier to PDF, some of the copier vendors have basic document management solutions which include the OCRing of files.

3rd party options would monitor a folder for PDFs, OCR them, and then move the result into a different folder.  You could also do this at your end.  If there are high volumes you can script the detaching of PDFs to a folder.
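As a rough illustration of scripting the detaching step, the sketch below uses Python's standard `email` package to pull PDF attachments out of a saved message and write them to a folder. It assumes the emails have been exported as `.eml` files; the paths and the function name `save_pdf_attachments` are made up for illustration. The OCR step would then run over the output folder.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path
from typing import List


def save_pdf_attachments(eml_path: str, out_dir: str) -> List[Path]:
    """Write every PDF attachment in one .eml file to `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    with open(eml_path, "rb") as fh:
        msg = BytesParser(policy=policy.default).parse(fh)

    saved: List[Path] = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        is_pdf = (
            part.get_content_type() == "application/pdf"
            or name.lower().endswith(".pdf")
        )
        if not is_pdf:
            continue
        # Keep only the base name so an odd attachment name cannot escape out_dir.
        target = out / Path(name or "attachment.pdf").name
        target.write_bytes(part.get_payload(decode=True))
        saved.append(target)
    return saved


if __name__ == "__main__":
    # Hypothetical paths: a folder of exported emails and a folder for the PDFs.
    for eml in Path("exported_emails").glob("*.eml"):
        print(save_pdf_attachments(str(eml), "PODs"))
```

Attachments that share a file name overwrite each other in this sketch, so a real script would want to add a date or counter to the name.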




"4 wheels move the body.  2 wheels move the soul."

“Don't believe anything you read on the net. Except this. Well, including this, I suppose.” Douglas Adams
