Geekzone: technology news, blogs, forums


erica

1 post

Wannabe Geek


#13368 6-May-2007 22:01

I am hoping someone can shed some light on something I am very curious about. I have a website with a shopping cart where I can view who is online and what is in their shopping cart. Every night without fail (starting late evening), there are shoppers with an IP address starting with 65 filling their shopping carts with thousands of dollars' worth of goods. If the orders ever came through (they don't), you would know that they were fraudulent. Examples are IP addresses 65.55.209.179 and 65.55.209.178 (there are also others with similar numbers). I have looked them up and they appear to be US-based, but I do not have enough knowledge to work out what they are and what they are doing. Is this a good thing or is it detrimental to my website? Your thoughts would be appreciated, thank you.

freitasm
BDFL - Memuneh
80646 posts

Uber Geek
+1 received by user: 41026

Administrator
ID Verified
Trusted
Geekzone
Lifetime subscriber

#69736 6-May-2007 22:55

Welcome to Geekzone... If you look up the IP addresses you posted you will find these are bots, more specifically Microsoft Live bots...

I think they are indexing your site for the Live search engine, and probably what happens is that the "Add to Cart" functionality is a link - they are just following the links.

If you don't want the "Add to cart" links to be followed, you could use a robots.txt file on your server to prevent this from happening.
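As a rough illustration of the robots.txt suggestion, here is a minimal sketch using Python's standard `urllib.robotparser` to check how a well-behaved crawler would read such a file. The `/cart/` path, the `shop.example` host and the `is_allowed` helper are all assumptions made up for the example; the real shop's "Add to cart" URLs will look different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The /cart/ path is an assumption;
# substitute whatever prefix the shop's "Add to cart" links really use.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The cart link is blocked, ordinary product pages stay crawlable.
    print(is_allowed(ROBOTS_TXT, "msnbot", "http://shop.example/cart/add?item=42"))
    print(is_allowed(ROBOTS_TXT, "msnbot", "http://shop.example/products/42"))
```

This only models crawlers that choose to obey the file; it does nothing to stop a bot that ignores it.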











weblordpepe
460 posts

Ultimate Geek
Inactive user


  #69750 6-May-2007 23:57

Wow I never knew about that. I guess you could put /robots.txt after lots of domains and find all kinds of interesting things.

rwales
122 posts

Master Geek


  #69760 7-May-2007 01:08

Yeah, there was a big furore about the White House robots.txt file. Basically, it was huge, forbidding all spiders from caching pages on Iraq-related sites. It was speculated that they did this to prevent internet archives from producing historical statements that contradicted current policy i.e. the porkies that led US forces into Iraq.

Full story here: http://www.2600.com/news/view/article/

Incidentally, don't you have to 'invite' the bots to your page these days a la Google's URL submission page? (http://www.google.com/addurl/). I guess the worst things that can happen with uninvited bots are damaged demographic analysis, bandwidth hogging (if they steal your images or crawl frequently), invalid search indexing (out-of-date product catalogues), etc.

Also, remember that robots.txt is purely ADVISORY. A spider/bot can choose to blatantly ignore it.




All your base are belong to us.



freitasm
BDFL - Memuneh
80646 posts

Uber Geek

#69767 7-May-2007 07:59

You don't need to invite bots, they find pages through links... And most bots respect the robots.txt file.







rwales
122 posts

Master Geek


#69769 7-May-2007 08:20

That's actually quite interesting. Do you get many bots crawling GZ? You can forge agent strings, but if hits are logged it shouldn't be too hard to identify browse patterns that were clearly too fast for a human, assuming all the requests were coming from the same IP.
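A minimal sketch of that idea, assuming the hits have already been parsed out of the server log into (IP, timestamp) pairs. The `find_fast_clients` name and the thresholds (more than 10 hits in any 10-second window) are hypothetical and would need tuning against real traffic.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_fast_clients(hits, max_hits=10, window=timedelta(seconds=10)):
    """Return the set of IPs with more than max_hits requests in any window.

    hits is an iterable of (ip, timestamp) pairs; the thresholds are
    illustrative guesses, not measured values.
    """
    by_ip = defaultdict(list)
    for ip, ts in hits:
        by_ip[ip].append(ts)

    flagged = set()
    for ip, times in by_ip.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Shrink the window from the left until it spans at most `window`.
            while t - times[start] > window:
                start += 1
            if end - start + 1 > max_hits:
                flagged.add(ip)
                break
    return flagged


if __name__ == "__main__":
    base = datetime(2007, 5, 6, 22, 0, 0)
    # One client requesting a page every half second, one every 30 seconds.
    bot = [("65.55.209.179", base + timedelta(seconds=0.5 * i)) for i in range(20)]
    human = [("192.0.2.10", base + timedelta(seconds=30 * i)) for i in range(5)]
    print(find_fast_clients(bot + human))
```

As noted, this only works when the requests come from the same IP; a crawler spread over many addresses would need grouping by network or user agent instead.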

I know a lot of spam bots used to trawl forums for e-mail addresses. Is that still the case? Most decent forums shield addresses these days.

Speaking of indexing, your blog made the front page of Google News last night, Mauricio! It was the article on that phishing app that politely requested users enter their credit card & PIN details.




All your base are belong to us.

freitasm
BDFL - Memuneh
80646 posts

Uber Geek

  #69770 7-May-2007 08:23

We get a lot of bots every day. On average we serve about 20,000 pages/day to Google, MSN and other bots.

As for Google News, that's not unheard of: posts on Geekzone Blogs and our News and Reviews sections show up in Google News about ten minutes after posting and are indexed in the main Google index in less than 24 hours. No problem there...






rwales
122 posts

Master Geek


  #69771 7-May-2007 08:37

Nicely done! I notice you even out-ranked some major IT news sites like ZDNet et al. 20,000! That's like... one every 4 seconds. Do you actually have a say in how often the bots can/should crawl?




All your base are belong to us.

freitasm
BDFL - Memuneh
80646 posts

Uber Geek

#69773 7-May-2007 08:50

You can with Google if you register the domain with their Webmaster tools.
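For crawlers other than Google, one option that may help is the non-standard `Crawl-delay` line in robots.txt. Some crawlers honour it, and as far as I know Google's does not, which is why the Webmaster tools setting exists. The sketch below only shows how Python's standard `urllib.robotparser` reads such a line; the 10-second value and the `crawl_delay_for` helper are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask msnbot to wait 10 seconds between requests and
# leave every other crawler unrestricted. Crawl-delay is a non-standard
# extension, so each crawler decides for itself whether to respect it.
CRAWL_RULES = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay (seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(CRAWL_RULES, "msnbot"))
    print(crawl_delay_for(CRAWL_RULES, "Googlebot"))
```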




