We have a network drive at work where the whole team stores information relevant to our projects. It's packed with more than 250,000 files, most of them Microsoft Word and Microsoft Excel documents. I've been thinking about how to index this monster, and I think I've found a cheap option.
It should be something easy for the end user - not everyone is a geek around here. But almost everyone knows how to use Google to search for information. So what could be better than using the Google interface?
At first I thought of the new Google Mini, a small search appliance recently released. It's not that expensive (US$5,000 at launch), but it is limited to 50,000 documents. The next step up is the Google Search Appliance. It can index up to 15 million documents, but it costs considerably more.
So what can we do? I looked for a way to change some of the Google Desktop Search settings to allow indexing of network drives. According to the FAQ, the tool will not index a network drive. But with a few registry changes we can have the Google Desktop Search engine scan mapped network drives. For example, locate the following registry key:
By entering entries such as !C: and !M: (where C: and M: are the drives to index) into CRAWL_DIRS and removing DONE from CRAWL_FILE, we can instruct the engine to actually index remote drives. Note that each entry is followed by the TAB character. The best way to enter it is using Notepad: type !C: and press the TAB key, then press CTRL-A (Select All) and CTRL-C (Copy), and paste the result into the key.
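To avoid the Notepad copy-and-paste step, the value can also be built programmatically. The snippet below is only a sketch: the function name is made up for illustration, and the exact layout (each drive prefixed with ! and followed by a TAB) is an assumption inferred from the typing instructions above, so check it against what your own installation stores before relying on it. It only builds the string; it does not touch the registry.

```python
def build_crawl_dirs(drives):
    """Build a CRAWL_DIRS-style value from a list of drive letters.

    Assumption (not confirmed by any official documentation): each
    entry is '!' + drive, followed by a single TAB character.
    """
    return "".join("!" + drive + "\t" for drive in drives)


if __name__ == "__main__":
    value = build_crawl_dirs(["C:", "M:"])
    # repr() makes the otherwise invisible TAB characters visible.
    print(repr(value))
```

Printing with repr() is handy here because the TAB characters are invisible when the value is pasted into the registry editor.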
However, how can I make this Google Desktop Search engine available to a team? What if I install it on a computer that is always on and could be used as a "Search Server"?
Google Desktop Search can be installed on any PC, but the built-in web server will only allow localhost connections. Even this can be changed, though. I've found that DNKA will act as a proxy and allow external connections to the server! The program is very flexible, offering user control (anonymous or logged-in use), IP allow/deny lists, and logging. What's more, it lets the user define a list of drives to index, including mapped network drives. This is so much easier than manually changing the registry.
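To give an idea of what an IP allow/deny rule amounts to, here is a minimal sketch of such a check. It is not DNKA's actual code or configuration format; the function name, the rule that deny entries win over allow entries, and the example addresses are all assumptions made for illustration.

```python
import ipaddress


def is_client_allowed(client_ip, allow, deny):
    """Return True if client_ip may use the search server.

    allow and deny are lists of networks or single addresses,
    e.g. "192.168.1.0/24" or "192.168.1.13". In this sketch a deny
    entry always wins over an allow entry (an assumed policy).
    """
    addr = ipaddress.ip_address(client_ip)
    if any(addr in ipaddress.ip_network(net) for net in deny):
        return False
    return any(addr in ipaddress.ip_network(net) for net in allow)


if __name__ == "__main__":
    team_lan = ["192.168.1.0/24"]   # hypothetical team subnet
    blocked = ["192.168.1.13"]      # hypothetical blocked PC
    for ip in ("192.168.1.20", "192.168.1.13", "10.0.0.5"):
        print(ip, is_client_allowed(ip, team_lan, blocked))
```

A proxy would run a check like this on the address of each incoming connection before forwarding the request to the local search server.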
There are a couple of disadvantages though. First, the server will only run in the context of the user who installed it. So this user must always be logged on to the server, or you can create a scheduled task that runs at startup without anyone having to log in. Simply create a batch file, google.bat for example, with the following lines:
"C:\Program Files\Google\Google Desktop Search\GoogleDesktop.exe" /startup
"C:\Program Files\DNKA\ServerOptions.exe" /restart
Create a new scheduled task that runs when your computer starts, and have it run as the user with privileges to run the server programs (this is the username used to install the programs). Now, even if the computer is restarted, there's no need for someone to come around and log in!
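The task can be created from the Scheduled Tasks wizard, but the same thing can be scripted with the Windows schtasks command. The sketch below only assembles the command line; the task name, the batch file location and the account name are placeholders I made up, so replace them with your own. Since /RP is left out, schtasks should prompt for the account's password when the command is run.

```python
import subprocess


def build_schtasks_command(task_name, batch_path, run_as_user):
    """Assemble a schtasks command that runs batch_path at system startup."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,      # name shown in the Scheduled Tasks list
        "/TR", batch_path,     # program to run: the batch file above
        "/SC", "ONSTART",      # trigger: when the computer starts
        "/RU", run_as_user,    # account the task runs under
    ]


if __name__ == "__main__":
    # All three values are hypothetical examples.
    cmd = build_schtasks_command("SearchServer", r"C:\google.bat", "searchuser")
    print(" ".join(cmd))
    # On the server itself, uncomment the next line to create the task:
    # subprocess.run(cmd, check=True)
```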
Second, Google Desktop Search will not update the network drive index automatically. But DNKA offers a "touch" option to help with this, and it also lets you change the server port number.
And if the whole team maps the network drive to the same drive letter as the "Search Server", result links will always open the correct document.
One of the advantages of this approach over having each team member run her own Google Desktop Search is the reduced network traffic. Instead of multiple users trying to index a huge mapped drive, we have only one machine doing the job. There you go. A cheap search server...
PS. DNKA is free for personal use, with a cheap licence for commercial use.