Geekzone: technology news, blogs, forums




472 posts

Ultimate Geek
+1 received by user: 12

Subscriber

Topic # 146971 4-Jun-2014 18:14

Hi Guys.

On a regular basis I get emails with PDF attachments. These attachments are scans of proofs of delivery (PODs) for goods delivered to my clients.
I'm trying to figure out whether there is a way to save these attachments in a folder and then search them to find a specific file. Each scan has a unique number, e.g. S123456, as part of the text within the document. I am using OS X.

I trust this makes sense and explains what I'm trying to achieve. 

Thanks.

4626 posts

Uber Geek
+1 received by user: 1396


  Reply # 1059295 4-Jun-2014 19:04

Spotlight in OS X should search file contents automatically. What is the content of the PDFs though - do they contain actual text, or just an image of a page?
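
For reference, Spotlight can also be queried from a script through the `mdfind` command-line tool that ships with OS X. The sketch below is only an illustration: the folder path and the helper names are made up, and it will only find a POD number once the PDFs actually contain text rather than a bare page image.

```python
import subprocess
from pathlib import Path


def build_spotlight_command(folder, term):
    """Return the mdfind argument list that searches `folder` for `term`."""
    # -onlyin restricts the Spotlight search to one directory tree.
    return ["mdfind", "-onlyin", str(Path(folder).expanduser()), term]


def spotlight_search(folder, term):
    """Run mdfind (OS X only) and return the matching file paths."""
    result = subprocess.run(
        build_spotlight_command(folder, term),
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

On a Mac, something like `spotlight_search("~/Documents/PODs", "S123456")` would list the files Spotlight has indexed that mention that number; the folder name here is hypothetical.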

2499 posts

Uber Geek
+1 received by user: 927

Subscriber

  Reply # 1059310 4-Jun-2014 19:21

How the PDFs are generated will be a big factor. As RunningMan pointed out, if the content is just a scanned image rather than text data, what you are asking would require some fairly in-depth OCR functionality. Would it be more practical to save the documents so that their unique identifier is part of the filename? They would then be easily searchable.




Windows 7 x64 // i5-3570K // 16GB DDR3-1600 // GTX660Ti 2GB // Samsung 830 120GB SSD // OCZ Agility4 120GB SSD // Samsung U28D590D @ 3840x2160 & Asus PB278Q @ 2560x1440
Samsung Galaxy S5 SM-G900I w/Spark

 
 
 
 




472 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059319 4-Jun-2014 19:34

RunningMan: Spotlight in OS X should search file contents automatically. What is the content of the PDFs though - do they contain actual text, or just an image of a page?


It's just an image of a page.



472 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059320 4-Jun-2014 19:36

Inphinity: How the PDFs are generated will be a big factor. As RunningMan pointed out, if the content is just a scanned image rather than text data, what you are asking would require some fairly in-depth OCR functionality. Would it be more practical to save the documents so that their unique identifier is part of the filename? They would then be easily searchable.


I have just saved some with the unique identifier as part of the file name. It works, but if a file has 20 scans, each with its own identifier (S123456 etc.), the file name gets very long and saving them this way is time-consuming. lol.
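
One alternative to long file names is a small index that maps each POD number to the files that mention it. The sketch below is a rough illustration, not a finished tool. It assumes the text of each PDF has already been extracted (for example by OCR), and it assumes every number looks like the one example given, a capital "S" followed by six digits. The function names are made up.

```python
import re
from collections import defaultdict

# Assumed ID shape: "S" followed by exactly six digits, based on the
# single example S123456. Adjust the pattern if real numbers differ.
POD_ID = re.compile(r"\bS\d{6}\b")


def extract_ids(text):
    """Return the unique POD numbers in `text`, in order of first appearance."""
    seen = []
    for match in POD_ID.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def build_index(texts_by_file):
    """Map each POD number to the sorted list of files that mention it.

    `texts_by_file` is a dict of {filename: extracted text}.
    """
    index = defaultdict(list)
    for filename, text in texts_by_file.items():
        for pod_id in extract_ids(text):
            index[pod_id].append(filename)
    return {pod_id: sorted(files) for pod_id, files in index.items()}
```

Looking up `index["S123456"]` then gives the file or files to open, and the files themselves can keep whatever short names they arrived with.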

4626 posts

Uber Geek
+1 received by user: 1396


  Reply # 1059324 4-Jun-2014 19:45

What about the email subject line or body - does that have anything useful?



472 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059341 4-Jun-2014 19:54

RunningMan: What about the email subject line or body - does that have anything useful?


No.

Subject line always says...POD's

Body always says....Hello Dxxxxx. Please find POD's attached. Thanks Bxxxxxxx.

20715 posts

Uber Geek
+1 received by user: 4015

Trusted
Subscriber

  Reply # 1059345 4-Jun-2014 20:05

You need a proper document storage system, not just a bunch of files in email. As a makeshift option, I think Acrobat on its own can OCR the files, which should then make them searchable by the OS search tools.




Richard rich.ms

4626 posts

Uber Geek
+1 received by user: 1396


  Reply # 1059346 4-Jun-2014 20:07

Unless you can change the format that it gets sent in - either the PDF isn't just a scanned image, or the email has some useful info in it - then you'll need some software to OCR the PDF so it is searchable.
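
As a rough idea of what the OCR step could look like when scripted, here is a minimal sketch. It assumes the open-source OCRmyPDF command-line tool (`ocrmypdf`) is installed; nobody in the thread has named a specific tool, so treat that choice, the folder layout and the function names as assumptions. The OCR function is passed in as a parameter so a different tool can be swapped in.

```python
import shutil
import subprocess
from pathlib import Path


def run_ocrmypdf(src, dest):
    """OCR one PDF with the OCRmyPDF command-line tool (assumed installed)."""
    subprocess.run(["ocrmypdf", str(src), str(dest)], check=True)


def ocr_folder(inbox, searchable, originals, ocr=run_ocrmypdf):
    """OCR every PDF in `inbox`, then move the original out of the way.

    The searchable copy goes to `searchable`; the untouched scan goes to
    `originals`. Returns the list of searchable files created.
    """
    inbox, searchable, originals = Path(inbox), Path(searchable), Path(originals)
    searchable.mkdir(parents=True, exist_ok=True)
    originals.mkdir(parents=True, exist_ok=True)
    created = []
    for pdf in sorted(inbox.glob("*.pdf")):
        target = searchable / pdf.name
        ocr(pdf, target)  # write the copy with a text layer
        shutil.move(str(pdf), str(originals / pdf.name))  # keep the raw scan
        created.append(target)
    return created
```

Once the searchable copies exist, Spotlight should be able to index their contents like any other text PDF.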



472 posts

Ultimate Geek
+1 received by user: 12

Subscriber

  Reply # 1059365 4-Jun-2014 20:11

RunningMan: Unless you can change the format that it gets sent in - either the PDF isn't just a scanned image, or the email has some useful info in it - then you'll need some software to OCR the PDF so it is searchable.


It's an automated process from the carrier. They load the signed waybills into their scanner, and when it's finished the email is sent out automatically.

2330 posts

Uber Geek
+1 received by user: 682

Trusted
Subscriber

  Reply # 1059537 5-Jun-2014 08:33

Ask the carrier to look at whether they can OCR the PDFs.  If they are scanning via a photocopier to PDF, some of the copier vendors have basic document management solutions which include the OCRing of files.

Third-party options would monitor a folder for PDFs, OCR them, and then move the result into a different folder. You could also do this at your end. If volumes are high, you can also script detaching the PDFs from the emails into a folder.
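
Scripting the detaching of PDFs could look something like the sketch below, which uses only Python's standard `email` module. It assumes you can get at each message as raw bytes (for example an exported `.eml` file); how you get them out of your mail client is not covered here, and the function name is made up.

```python
from email import message_from_bytes, policy
from pathlib import Path


def save_pdf_attachments(raw_email, folder):
    """Write every PDF attachment in a raw email message to `folder`.

    Returns the paths written. Existing files are not overwritten: a
    numeric suffix is added to the name instead.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    message = message_from_bytes(raw_email, policy=policy.default)
    saved = []
    for part in message.iter_attachments():
        name = part.get_filename() or "attachment.pdf"
        is_pdf = (part.get_content_type() == "application/pdf"
                  or name.lower().endswith(".pdf"))
        if not is_pdf:
            continue
        safe = Path(name).name  # drop any path the sender included
        target = folder / safe
        counter = 1
        while target.exists():
            target = folder / f"{Path(safe).stem}_{counter}{Path(safe).suffix}"
            counter += 1
        target.write_bytes(part.get_payload(decode=True))
        saved.append(target)
    return saved
```

The numeric suffix matters here because the carrier's automated emails may well reuse the same attachment name every time.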




"4 wheels move the body.  2 wheels move the soul."

“Don't believe anything you read on the net. Except this. Well, including this, I suppose.” Douglas Adams
