Geekzone: technology news, blogs, forums


1 post

Wannabe Geek


Topic # 13368 6-May-2007 22:01

I am hoping someone can shed some light on something I am very curious about. I have a website with a shopping cart where I can view who is online and what is in their shopping cart. Every night without fail (starting late evening), there are shoppers with an IP address starting with 65 filling their shopping carts with thousands of dollars' worth of goods. If the orders ever came through (they don't) you would know that they were fraudulent. Examples are IP addresses 65.55.209.179 and 65.55.209.178 (there are also others with similar numbers). I have looked them up and they appear to be US based, but I do not have enough knowledge to work out what they are and what they are doing. Is this a good thing or is it detrimental to my website? Your thoughts would be appreciated, thank you.

BDFL - Memuneh
61189 posts

Uber Geek
+1 received by user: 11971

Administrator
Trusted
Geekzone
Lifetime subscriber

Reply # 69736 6-May-2007 22:55

Welcome to Geekzone... If you look up the IP addresses you posted you will find these are bots - more specifically Microsoft Live bots...

I think they are indexing your site for the Live search engine, and probably what happens is that the "Add to Cart" functionality is a link - they are just following the links.

If you don't want the "Add to cart" links to be followed you could use a robots.txt file on your server to prevent this happening.
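As a rough illustration of the robots.txt suggestion, here is a minimal sketch using Python's standard urllib.robotparser to check what a well-behaved crawler would do with a given rule set. The /cart/ path, the example.com URLs and the "msnbot" agent name are assumptions made up for the example; the real paths depend on how the shopping cart builds its "Add to cart" links.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The /cart/ prefix is an assumption;
# use whatever path your own "Add to cart" links actually live under.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant crawler would skip the cart link but still index products.
    print(is_allowed("msnbot", "http://www.example.com/cart/add/123"))
    print(is_allowed("msnbot", "http://www.example.com/products/123"))
```

This only models crawlers that honour the file; nothing here stops a bot that chooses to ignore it.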







460 posts

Ultimate Geek
Inactive user


  Reply # 69750 6-May-2007 23:57

Wow I never knew about that. I guess you could put /robots.txt after lots of domains and find all kinds of interesting things.

122 posts

Master Geek


  Reply # 69760 7-May-2007 01:08

Yeah, there was a big furore about the White House robots.txt file. Basically, it was huge, forbidding all spiders from caching pages on Iraq-related sites. It was speculated that they did this to prevent internet archives from producing historical statements that contradicted current policy i.e. the porkies that led US forces into Iraq.

Full story here: http://www.2600.com/news/view/article/

Incidentally, don't you have to 'invite' the bots to your page these days a la Google's URL submission page? (http://www.google.com/addurl/). I guess the worst things that can happen with uninvited bots are damaged demographic analysis, bandwidth hogging (if they steal your images or crawl frequently), invalid search indexing (out-of-date product catalogues), etc.

Also, remember that robots.txt is purely ADVISORY. A spider/bot can choose to blatantly ignore it.

BDFL - Memuneh
61189 posts

Uber Geek
+1 received by user: 11971

Administrator
Trusted
Geekzone
Lifetime subscriber

Reply # 69767 7-May-2007 07:59
122 posts

Master Geek


Reply # 69769 7-May-2007 08:20

That's actually quite interesting. Do you get many bots crawling GZ? You can forge agent strings, but if hits are logged it shouldn't be too hard to identify browse patterns that were clearly too fast for a human, assuming all the requests were coming from the same IP.
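To make the "too fast for a human" idea concrete, here is a minimal sketch that flags IPs making more than a set number of requests inside a short sliding window. The function name, the thresholds (10 hits in 5 seconds) and the (ip, timestamp) input format are all assumptions for illustration; a real log would need parsing first, and sensible limits depend on the site.

```python
from collections import defaultdict, deque


def flag_fast_clients(hits, max_hits=10, window_seconds=5.0):
    """Return the set of IPs with more than max_hits requests in any
    window_seconds span.

    hits: iterable of (ip, unix_timestamp) pairs, in any order.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ip, ts in sorted(hits, key=lambda hit: hit[1]):
        window = recent[ip]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_hits:
            flagged.add(ip)
    return flagged


if __name__ == "__main__":
    # 20 requests 0.1 s apart from one address, 5 requests 30 s apart from another.
    bot_hits = [("65.55.209.179", i * 0.1) for i in range(20)]
    human_hits = [("192.0.2.10", i * 30.0) for i in range(5)]
    print(flag_fast_clients(bot_hits + human_hits))
```

As noted above, this only works when the requests come from the same IP; a crawler spread across many addresses would slip under a per-IP threshold.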

I know a lot of spam bots used to trawl forums for e-mail addresses. Is that still the case? Most decent forums shield addresses these days.

Speaking of indexing, your blog made the front page of Google News last night, Mauricio! It was the article on that phishing app that politely requested users enter their credit card & PIN details.

BDFL - Memuneh
61189 posts

Uber Geek
+1 received by user: 11971

Administrator
Trusted
Geekzone
Lifetime subscriber

  Reply # 69770 7-May-2007 08:23

We get a lot of bots every day. On average we serve about 20,000 pages/day to Google, MSN and other bots.

As for Google News, that's not unheard of - posts on Geekzone Blogs and in our News and Reviews sections reach Google News about ten minutes after posting and are indexed on the main Google index in less than 24 hours. No problem there...






122 posts

Master Geek


  Reply # 69771 7-May-2007 08:37

Nicely done! I notice you even out-ranked some major IT news sites like ZDNet et al. 20,000! That's like... 1 every 4 seconds. Do you actually have a say in how often the bots can/should crawl?
