Geekzone: technology news, blogs, forums
453 posts

Ultimate Geek
+1 received by user: 12

Subscriber

Topic # 146971 4-Jun-2014 18:14

Hi Guys.

On a regular basis I get emails with PDF attachments. These attachments are scans of proofs of delivery for goods delivered to my clients.
I'm trying to figure out if there is any way of saving these attachments in a folder and then using a search function to find a specific file. Each scan has a unique number, e.g. S123456, as part of the text within the document. I am using OS X.

I trust this makes sense and explains what I'm trying to achieve. 

Thanks.

4471 posts

Uber Geek
+1 received by user: 1287


  Reply # 1059295 4-Jun-2014 19:04

Spotlight in OS X should search file contents automatically. What is the content of the PDFs though - do they contain actual text, or just an image of a page?
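
For reference, Spotlight can also be queried from a script through the `mdfind` command-line tool that ships with OS X. The following is only a minimal sketch: the folder `~/PODs` is a hypothetical location, and the search only finds a number if the PDF contains real text rather than just an image.

```python
import subprocess
from pathlib import Path


def build_mdfind_command(folder, query):
    """Return the argument list for a Spotlight search limited to one folder."""
    # -onlyin restricts the search to the given directory tree.
    return ["mdfind", "-onlyin", str(Path(folder).expanduser()), query]


def spotlight_search(folder, query):
    """Run mdfind and return the matching file paths (works on OS X only)."""
    result = subprocess.run(
        build_mdfind_command(folder, query),
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Calling `spotlight_search("~/PODs", "S123456")` on a Mac would return the paths of indexed files under that folder whose contents or metadata match the number.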

2488 posts

Uber Geek
+1 received by user: 919

Subscriber

  Reply # 1059310 4-Jun-2014 19:21

How the PDFs are generated will be a big factor. As RunningMan pointed out, if the content is just a scanned image rather than text data, what you are asking would require some fairly in-depth OCR functionality. Would it be more practical to save the documents in such a way that their unique identifier is part of the filename? They would then be easily searchable.




Windows 7 x64 // i5-3570K // 16GB DDR3-1600 // GTX660Ti 2GB // Samsung 830 120GB SSD // OCZ Agility4 120GB SSD // Samsung U28D590D @ 3840x2160 & Asus PB278Q @ 2560x1440
Samsung Galaxy S5 SM-G900I w/Spark

 
 
 
 




453 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059319 4-Jun-2014 19:34

RunningMan: Spotlight in OS X should search file contents automatically. What is the content of the PDFs though - do they contain actual text, or just an image of a page?


It's just an image of a page.



453 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059320 4-Jun-2014 19:36

Inphinity: How the PDFs are generated will be a big factor. As RunningMan pointed out, if the content is just a scanned image rather than text data, what you are asking would require some fairly in-depth OCR functionality. Would it be more practical to save the documents in such a way that their unique identifier is part of the filename? They would then be easily searchable.


I have just saved some with the unique identifier as part of the file name. It works, but if I have 20 scans, each with its own identifier (S123456 etc.), the file name will get very long and saving them this way is time-consuming. lol.
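
One way around long file names is to keep a separate index that maps each number to the files that mention it. The sketch below assumes the text of each scan is already available (for image-only PDFs that means running OCR first) and that every number looks like the S123456 example, i.e. an "S" followed by six digits. Both points are assumptions, and `extract_ids` and `build_index` are hypothetical helper names.

```python
import re

# Assumed pattern: "S" followed by exactly six digits, as in S123456.
POD_ID = re.compile(r"\bS\d{6}\b")


def extract_ids(text):
    """Return the unique POD numbers found in text, in sorted order."""
    return sorted(set(POD_ID.findall(text)))


def build_index(texts_by_file):
    """Map each POD number to the list of files whose text mentions it."""
    index = {}
    for filename, text in sorted(texts_by_file.items()):
        for pod_id in extract_ids(text):
            index.setdefault(pod_id, []).append(filename)
    return index
```

Looking up a number in the resulting dictionary gives the file or files to open, so the original file names can stay short.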

4471 posts

Uber Geek
+1 received by user: 1287


  Reply # 1059324 4-Jun-2014 19:45

What about the email subject line or body - does that have anything useful?



453 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059341 4-Jun-2014 19:54

RunningMan: What about the email subject line or body - does that have anything useful?


No.

The subject line always says: POD's

The body always says: Hello Dxxxxx. Please find POD's attached. Thanks Bxxxxxxx.

20429 posts

Uber Geek
+1 received by user: 3899

Trusted
Subscriber

  Reply # 1059345 4-Jun-2014 20:05

You need a proper document storage system, not just a bunch of files in email. As a makeshift option, I think Acrobat on its own can OCR things, which should then make them searchable by the OS search tools.




Richard rich.ms

4471 posts

Uber Geek
+1 received by user: 1287


  Reply # 1059346 4-Jun-2014 20:07

Unless you can change the format that it gets sent in - either the PDF isn't just a scanned image, or the email has some useful info in it - then you'll need some software to OCR the PDF so it is searchable.



453 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059365 4-Jun-2014 20:11

RunningMan: Unless you can change the format that it gets sent in - either the PDF isn't just a scanned image, or the email has some useful info in it - then you'll need some software to OCR the PDF so it is searchable.


It's an automated process from the carrier. They load the signed waybills into their scanner, and when it's finished the email is sent out automatically.
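
Because the emails arrive automatically and always look the same, saving the attachments into a folder is something a script could take over. The following is a rough sketch using only Python's standard `email` module. It assumes each message is available as raw bytes (for example an exported `.eml` file) and that the scanner labels the attachment as `application/pdf`. `save_pdf_attachments` is a hypothetical helper name.

```python
from email import message_from_bytes, policy
from pathlib import Path


def save_pdf_attachments(raw_message, dest_dir):
    """Write every PDF attachment in a raw email to dest_dir; return the paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    msg = message_from_bytes(raw_message, policy=policy.default)
    saved = []
    for part in msg.iter_attachments():
        if part.get_content_type() != "application/pdf":
            continue
        # Keep only the base name so an odd attachment name cannot leave dest.
        name = Path(part.get_filename() or "attachment.pdf").name
        target = dest / name
        counter = 1
        while target.exists():
            # Add a numeric suffix rather than overwrite an earlier scan.
            target = dest / f"{Path(name).stem}_{counter}{Path(name).suffix}"
            counter += 1
        target.write_bytes(part.get_payload(decode=True))
        saved.append(target)
    return saved
```

If the carrier's scanner reuses the same attachment name for every email, the numeric suffix keeps earlier files from being replaced.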

2293 posts

Uber Geek
+1 received by user: 655

Trusted
Subscriber

  Reply # 1059537 5-Jun-2014 08:33

Ask the carrier to look at whether they can OCR the PDFs.  If they are scanning via a photocopier to PDF, some of the copier vendors have basic document management solutions which include the OCRing of files.

3rd party options would monitor a folder for PDFs, OCR them, and then move the result into a different folder.  You could also do this at your end.  If there are high volumes you can script the detaching of PDFs to a folder.
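
The folder workflow described above (pick up PDFs, OCR them, move the result elsewhere) could be sketched roughly as follows if you wanted to do it at your end. This is an illustration under assumptions, not a finished tool: it assumes the open-source `ocrmypdf` command-line program is installed, and the folder names and the `process_inbox` helper are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def run_ocrmypdf(src, dst):
    """OCR one PDF with the ocrmypdf command-line tool (assumed installed)."""
    subprocess.run(["ocrmypdf", str(src), str(dst)], check=True)


def process_inbox(inbox, searchable, originals, ocr=run_ocrmypdf):
    """OCR each PDF in inbox into searchable, then archive the source file."""
    inbox, searchable, originals = Path(inbox), Path(searchable), Path(originals)
    searchable.mkdir(parents=True, exist_ok=True)
    originals.mkdir(parents=True, exist_ok=True)
    done = []
    for src in sorted(inbox.glob("*.pdf")):
        dst = searchable / src.name
        ocr(src, dst)
        # Move the source only after OCR succeeded, so a failed file
        # stays in the inbox and is picked up again on the next run.
        shutil.move(str(src), str(originals / src.name))
        done.append(dst)
    return done
```

Running `process_inbox` on a schedule (every few minutes, say) gives the "monitor a folder" behaviour, and once the output PDFs carry a text layer, Spotlight should be able to find a number such as S123456 inside them.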




"4 wheels move the body.  2 wheels move the soul."

“Don't believe anything you read on the net. Except this. Well, including this, I suppose.” Douglas Adams
