For nearly 3 years now, I (and a development team now numbering 4) have been working on a web-based data extraction system. It's designed for data capture (and associated workflow) of scanned/photographed/digital documents, and it's called Xtracta.
At the moment the main use case we have targeted is accounting documents like bank statements, invoices, receipts etc., but it really can be used in any situation where you want to get data out of unstructured formats (or structured ones!). We have a concept of a workflow, where you set the fields you want to extract and where the data/images go at the end (e.g. a CSV onto a file server, or maybe directly into an API such as Xero). Each workflow also has its own unique SFTP/FTP and email address for getting documents into it. Currently extraction is done by click or click+drag OCR. Because you can tab between fields, and because incoming emails and files (e.g. digital documents like invoices) are all processed automatically, there is a significant time saving.
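To make the workflow idea a bit more concrete, here is a rough sketch in Python. It is not Xtracta's actual code or API: the `Workflow` class, the `export_csv` function and the field names are all made up for illustration, and it only covers the "chosen fields go out as a CSV" part.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical workflow: the fields to capture and where the output goes."""
    name: str
    fields: list[str]
    destination: str  # illustrative label only, e.g. "csv" or "api"


def export_csv(workflow, documents):
    """Return CSV text with one row per captured document.

    Only the fields configured on the workflow are written; anything else
    captured for a document is left out, and missing fields become blank.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=workflow.fields, lineterminator="\n")
    writer.writeheader()
    for doc in documents:
        writer.writerow({name: doc.get(name, "") for name in workflow.fields})
    return buffer.getvalue()


# Example usage with made-up data
invoices = Workflow(name="invoices", fields=["supplier", "total"], destination="csv")
captured = [{"supplier": "Acme", "total": "12.50", "notes": "not exported"}]
print(export_csv(invoices, captured))
```

The point is just that the workflow owns the field list, so every document that comes in through it ends up in the same shape at the other end.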
At the end of last year we were very fortunate to get a government R&D grant (thanks guys!) to develop an artificial intelligence system that automatically learns how to capture data by working from empirical examples and uncovering patterns autonomously. We have this in testing now and will be launching the first phase in July (we still need to finish testing and scale up, but the results so far are extremely impressive).
We have been in testing with our first big customer (an outsourcing bureau that puts 100k+ documents through Xtracta monthly for some big companies and government agencies) and everything is running very smoothly. So it's launch time!
So anyway, we are keen to get some people on board to start using the system for whatever they can think of really (although we are especially interested in more Xero users). As such, I would like to offer 3 free months to any Geekzone member, or to anyone you wish to refer this to. Drop me a PM if interested.
And feedback is welcome from anyone.
Jonathan (aka Zeon)
Data Capture screen:
An excerpt from the new AI auto capture system within the data capture screen with selected fields chosen for auto extraction:
One of the home dashboards: