Restricting access from badly behaved bots

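A site of mine that calls the Instagram API has been dying with API rate-limit errors lately.
Traffic is down, so why??? I looked at the logs, and...

They were completely filled with Baiduspider and Googlebot, plus the occasional bingbot (´Д`)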

180.76.5.145 - - [09/Sep/2013:00:45:56 +0900] "GET /u/rawanalhumaidi HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.58 - - [09/Sep/2013:00:45:56 +0900] "GET /u/wafiim HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
157.55.34.181 - - [09/Sep/2013:00:45:56 +0900] "GET /u/kc246 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
66.249.74.76 - - [09/Sep/2013:00:45:57 +0900] "GET /u/mitchiethekid29 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.135 - - [09/Sep/2013:00:45:57 +0900] "GET /u/numznueng HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.66 - - [09/Sep/2013:00:45:58 +0900] "GET /u/o_mandani HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.195 - - [09/Sep/2013:00:45:58 +0900] "GET /u/zisi_stern HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.144 - - [09/Sep/2013:00:45:58 +0900] "GET /u/zzlljayllzz HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:45:58 +0900] "GET /u/dumbestnameever HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:45:58 +0900] "GET /u/michael_bartaby HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.154 - - [09/Sep/2013:00:45:59 +0900] "GET /u/nada_334 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:45:59 +0900] "GET /u/masusun_0826 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.48 - - [09/Sep/2013:00:46:00 +0900] "GET /u/zbeebz HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:46:00 +0900] "GET /u/asa_lindback HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:00 +0900] "GET /u/arniofficial HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:01 +0900] "GET /u/ariana_luvaa HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:01 +0900] "GET /u/czamb HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:02 +0900] "GET /u/alfred_kos HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.107 - - [09/Sep/2013:00:46:03 +0900] "GET /u/ooy_yochi/525105240056315613_187999142 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.111 - - [09/Sep/2013:00:46:04 +0900] "GET /u/mrs_sharoof HTTP/1.1" 200 3283 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.233 - - [09/Sep/2013:00:46:05 +0900] "GET /u/memintd HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.183 - - [09/Sep/2013:00:46:05 +0900] "GET /u/nelawan HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:46:05 +0900] "GET /p/299028344015110693_214274815 HTTP/1.1" 200 3909 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:05 +0900] "GET /u/a_wilde_idea HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.19 - - [09/Sep/2013:00:46:06 +0900] "GET /u/shelllllll HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.90 - - [09/Sep/2013:00:46:06 +0900] "GET /u/sophiensyy HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:46:06 +0900] "GET /u/jokesforyouandme HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:06 +0900] "GET /u/potatovanipi HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.186 - - [09/Sep/2013:00:46:07 +0900] "GET /u/oum_waritta HTTP/1.1" 200 3283 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.137 - - [09/Sep/2013:00:46:07 +0900] "GET /u/nyzakoo HTTP/1.1" 200 3285 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:46:07 +0900] "GET /u/jacintaonusnavi HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:07 +0900] "GET /u/milaskc HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.181 - - [09/Sep/2013:00:46:08 +0900] "GET /u/mmarisa23 HTTP/1.1" 200 3283 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.162 - - [09/Sep/2013:00:46:08 +0900] "GET /u/paw_print_chicago HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
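
To put numbers on an excerpt like this, the requests can be tallied per crawler. The following is a minimal sketch, assuming the log is in the combined format shown above; the function name `count_bots` and the token list are illustrative choices, not part of the original setup.

```python
import re
from collections import Counter

# The last quoted field of a combined-format log line is the User-Agent.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

# Crawler tokens to look for (the three that appear in the excerpt above).
BOT_TOKENS = ("Baiduspider", "Googlebot", "bingbot")


def count_bots(lines):
    """Return a Counter of requests per crawler token; anything else is 'other'."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        user_agent = match.group(1).lower()
        for token in BOT_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break
        else:
            counts["other"] += 1
    return counts


# Three lines copied from the excerpt above.
sample = [
    '180.76.5.145 - - [09/Sep/2013:00:45:56 +0900] "GET /u/rawanalhumaidi HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '157.55.34.181 - - [09/Sep/2013:00:45:56 +0900] "GET /u/kc246 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '66.249.74.76 - - [09/Sep/2013:00:45:57 +0900] "GET /u/mitchiethekid29 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
print(count_bots(sample))
```

For a real log, passing an open file object to `count_bots` should work the same way, since it only iterates over lines.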

The API only allows 5,000 calls per hour, so let's have them show some restraint.
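
As a rough back-of-the-envelope check, the excerpt above can be extrapolated to an hourly rate. This sketch assumes that every page request triggers exactly one API call with no caching, which the log itself does not confirm, and twelve seconds of log is a very small sample.

```python
from datetime import datetime

API_LIMIT_PER_HOUR = 5000  # limit stated in the text

# First and last timestamps in the excerpt, and the number of lines in it.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"
first = datetime.strptime("09/Sep/2013:00:45:56 +0900", LOG_TIME_FORMAT)
last = datetime.strptime("09/Sep/2013:00:46:08 +0900", LOG_TIME_FORMAT)
requests_seen = 34

elapsed_seconds = (last - first).total_seconds()
# Assumption: one API call per page request.
requests_per_hour = requests_seen / elapsed_seconds * 3600

print(f"{requests_per_hour:.0f} requests/hour against a limit of {API_LIMIT_PER_HOUR}")
```

Under that assumption the crawlers alone would use roughly twice the hourly quota.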

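For Google

The crawl rate can be adjusted in the Webmaster Tools settings:
https://www.google.com/webmasters/tools/settings?hl=ja&siteUrl=[site URL]

robots.txt reportedly does not help here (Googlebot does not honor the Crawl-delay directive), so use Webmaster Tools.
The setting seems to take effect within a day or two.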

For Baidu

robots.txt reportedly has no effect on it, but let's try refusing it there first anyway.

User-agent: baiduspider
Disallow: /

User-agent: baiduspider+
Disallow: /

User-agent: baiduimagespider
Disallow: /

User-agent: baidumobaider
Disallow: /
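
Before deploying, the rules can be sanity-checked with Python's standard `urllib.robotparser`. This is only a sketch of how a compliant parser reads the file; it says nothing about whether Baiduspider actually obeys it. The path `/u/wafiim` is taken from the log excerpt.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: baiduspider
Disallow: /

User-agent: baiduspider+
Disallow: /

User-agent: baiduimagespider
Disallow: /

User-agent: baidumobaider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pass the crawler's product token, not the full User-Agent header.
baidu_allowed = parser.can_fetch("Baiduspider", "/u/wafiim")
google_allowed = parser.can_fetch("Googlebot", "/u/wafiim")

print("Baiduspider allowed:", baidu_allowed)
print("Googlebot allowed:", google_allowed)
```

Since the file has no `User-agent: *` section, crawlers that are not listed remain allowed.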

If that has no effect, block it in earnest.

Using .htaccess

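# Apache 2.2-style access control.
# Hostname rules make Apache do a reverse DNS lookup on each client address.
order deny,allow
deny from baidu.jp
deny from baidu.com
# IP range rule; check it against the addresses that actually appear in your log.
deny from 119.63.192.0/21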
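
It is worth checking whether an IP range rule covers the addresses that actually show up in the log. A small sketch with the standard `ipaddress` module, using a few Baiduspider client addresses copied from the excerpt above:

```python
import ipaddress

# The range used in the deny rule.
blocked = ipaddress.ip_network("119.63.192.0/21")

# Client addresses from Baiduspider lines in the log excerpt.
seen = ["180.76.5.145", "180.76.5.58", "180.76.6.233"]

for addr in seen:
    covered = ipaddress.ip_address(addr) in blocked
    print(addr, "covered" if covered else "not covered")

uncovered = [a for a in seen if ipaddress.ip_address(a) not in blocked]
```

None of these addresses fall inside the range, so for this log the range rule alone would not stop the requests. Whether the hostname rules catch them depends on reverse DNS, which this sketch does not check.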

With Apache

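# The user agents in the log start with "Mozilla/5.0 (compatible; ...",
# so the patterns are not anchored with ^.
SetEnvIfNoCase User-Agent "Baiduspider" deny_ua nolog
SetEnvIfNoCase User-Agent "BaiduImagespider" deny_ua nolog
SetEnvIfNoCase User-Agent "BaiduMobaider" deny_ua nolog

# Inside the <Directory> or <Location> block for the site:
    Order Allow,Deny
    Allow from all
    Deny from env=deny_ua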

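With Nginx

# Unanchored, case-insensitive match: the token sits in the middle of the
# User-Agent string, so a leading ^ would not match the log entries above.
if ($http_user_agent ~* Baiduspider) {
  return 403;
}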
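
Whether the pattern is anchored with `^` matters, because the user agents in the log begin with `Mozilla/5.0`. A quick check with Python's `re` module (which behaves the same as the server regex engines for patterns this simple) illustrates the difference:

```python
import re

# User-Agent copied from a Baiduspider line in the log excerpt.
UA = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

# Anchored: only matches if the string starts with "Baiduspider".
anchored = re.search(r"^Baiduspider", UA, re.IGNORECASE)
# Unanchored: matches the token anywhere in the string.
unanchored = re.search(r"Baiduspider", UA, re.IGNORECASE)

print("anchored matches:", anchored is not None)
print("unanchored matches:", unanchored is not None)
```

An anchored pattern would only catch clients whose User-Agent literally starts with `Baiduspider`.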

In every case, check the logs: if 403 is being returned, or the requests no longer appear in the access log, it worked.
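
The user-agent rules can also be tested without waiting for the crawler to come back, by sending a request that carries its User-Agent. A minimal sketch using only the standard library; `status_for_user_agent` is a hypothetical helper name, and the User-Agent string is copied from the log excerpt.

```python
import urllib.error
import urllib.request

BAIDU_UA = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"


def status_for_user_agent(url, user_agent, timeout=10):
    """Send a GET request with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx responses; the status code is still available.
        return error.code
```

Calling `status_for_user_agent(your_site_url, BAIDU_UA)` against your own site should return 403 once a user-agent rule is active, while an ordinary browser User-Agent should still get 200. IP-based and hostname-based rules cannot be checked this way, since the request comes from your own address.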