Restricting access from badly behaved bots


A site of mine that calls the Instagram API has been going down lately with API rate-limit errors.
Traffic is down, so why??? I checked the logs and...

They were filled with Baiduspider and Googlebot (´Д`)

~~~
180.76.5.145 - - [09/Sep/2013:00:45:56 +0900] "GET /u/rawanalhumaidi HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.58 - - [09/Sep/2013:00:45:56 +0900] "GET /u/wafiim HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
157.55.34.181 - - [09/Sep/2013:00:45:56 +0900] "GET /u/kc246 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
66.249.74.76 - - [09/Sep/2013:00:45:57 +0900] "GET /u/mitchiethekid29 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.135 - - [09/Sep/2013:00:45:57 +0900] "GET /u/numznueng HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.66 - - [09/Sep/2013:00:45:58 +0900] "GET /u/o_mandani HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.195 - - [09/Sep/2013:00:45:58 +0900] "GET /u/zisi_stern HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.144 - - [09/Sep/2013:00:45:58 +0900] "GET /u/zzlljayllzz HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:45:58 +0900] "GET /u/dumbestnameever HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:45:58 +0900] "GET /u/michael_bartaby HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.154 - - [09/Sep/2013:00:45:59 +0900] "GET /u/nada_334 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:45:59 +0900] "GET /u/masusun_0826 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.48 - - [09/Sep/2013:00:46:00 +0900] "GET /u/zbeebz HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:46:00 +0900] "GET /u/asa_lindback HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:00 +0900] "GET /u/arniofficial HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:01 +0900] "GET /u/ariana_luvaa HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:01 +0900] "GET /u/czamb HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:02 +0900] "GET /u/alfred_kos HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.107 - - [09/Sep/2013:00:46:03 +0900] "GET /u/ooy_yochi/525105240056315613_187999142 HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.111 - - [09/Sep/2013:00:46:04 +0900] "GET /u/mrs_sharoof HTTP/1.1" 200 3283 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.233 - - [09/Sep/2013:00:46:05 +0900] "GET /u/memintd HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.183 - - [09/Sep/2013:00:46:05 +0900] "GET /u/nelawan HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:46:05 +0900] "GET /p/299028344015110693_214274815 HTTP/1.1" 200 3909 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:05 +0900] "GET /u/a_wilde_idea HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.19 - - [09/Sep/2013:00:46:06 +0900] "GET /u/shelllllll HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.90 - - [09/Sep/2013:00:46:06 +0900] "GET /u/sophiensyy HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:46:06 +0900] "GET /u/jokesforyouandme HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:06 +0900] "GET /u/potatovanipi HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.186 - - [09/Sep/2013:00:46:07 +0900] "GET /u/oum_waritta HTTP/1.1" 200 3283 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.137 - - [09/Sep/2013:00:46:07 +0900] "GET /u/nyzakoo HTTP/1.1" 200 3285 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.74.76 - - [09/Sep/2013:00:46:07 +0900] "GET /u/jacintaonusnavi HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.74.76 - - [09/Sep/2013:00:46:07 +0900] "GET /u/milaskc HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
180.76.5.181 - - [09/Sep/2013:00:46:08 +0900] "GET /u/mmarisa23 HTTP/1.1" 200 3283 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.162 - - [09/Sep/2013:00:46:08 +0900] "GET /u/paw_print_chicago HTTP/1.1" 200 3284 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
~~~
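
To see which crawlers dominate without eyeballing the log, something like the following rough Python sketch could tally requests per bot. The function name `count_bots` and the token list are my own choices for illustration. The match is a plain case-insensitive substring test, so it does not depend on the exact log format.

```python
from collections import Counter

# Crawler tokens taken from the User-Agent strings in the log excerpt above.
BOT_TOKENS = ("Baiduspider", "Googlebot", "bingbot")


def count_bots(lines):
    """Count log lines per crawler token (case-insensitive substring match)."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for token in BOT_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break
        else:
            # No known crawler token found on this line.
            counts["other"] += 1
    return counts
```

For example, `count_bots(open("access.log"))` would return a `Counter` you can print with `.most_common()`. The file name `access.log` is a placeholder for wherever your server writes its log.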

The API can only be called 5,000 times per hour, so let's ask them to show some restraint.
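
As a back-of-the-envelope check, the excerpt above holds 34 requests between 00:45:56 and 00:46:08. The sketch below extrapolates that to an hourly rate. It assumes the excerpt is representative and that each page view triggers roughly one API call, neither of which I have verified. The helper name `hourly_rate` is made up for this example.

```python
from datetime import datetime

API_LIMIT_PER_HOUR = 5000  # the hourly limit mentioned in this post


def hourly_rate(first_ts, last_ts, request_count):
    """Extrapolate an hourly request rate from a short log window."""
    fmt = "%d/%b/%Y:%H:%M:%S %z"  # timestamp format used in the access log
    window = datetime.strptime(last_ts, fmt) - datetime.strptime(first_ts, fmt)
    return request_count / window.total_seconds() * 3600


# First and last timestamps of the excerpt, and the number of lines in it.
rate = hourly_rate("09/Sep/2013:00:45:56 +0900", "09/Sep/2013:00:46:08 +0900", 34)
print(f"{rate:.0f} requests/hour against a limit of {API_LIMIT_PER_HOUR}")
```

Under those assumptions the crawlers alone would burn through the hourly budget about twice over.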

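For Google

You can adjust the crawl rate in the Webmaster Tools settings.

[site URL]

Apparently this can't be done through robots.txt (Googlebot ignores the Crawl-delay directive), so use Webmaster Tools instead.
The setting seems to take effect in a day or two.

For Baidu

I hear it doesn't really work, but as a first step let's try refusing it with robots.txt anyway.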

~~~
User-agent: baiduspider
Disallow: /

User-agent: baiduspider+
Disallow: /

User-agent: baiduimagespider
Disallow: /

User-agent: baidumobaider
Disallow: /
~~~
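
Before deploying, you can at least confirm that the rules parse the way you expect. Here is a small sketch using Python's standard `urllib.robotparser`. The helper `is_allowed` and the path `/u/example` are made up for illustration. This only checks how a standards-following parser reads the file. It says nothing about whether Baidu's crawler actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from this post.
ROBOTS_TXT = """\
User-agent: baiduspider
Disallow: /

User-agent: baiduspider+
Disallow: /

User-agent: baiduimagespider
Disallow: /

User-agent: baidumobaider
Disallow: /
"""


def is_allowed(user_agent, path):
    """Return True if the rules above let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


# Pass the crawler's product token, not the full User-Agent header.
print("Baiduspider:", is_allowed("Baiduspider", "/u/example"))
print("Googlebot:", is_allowed("Googlebot", "/u/example"))
```

Since there is no `User-agent: *` section, every crawler not listed stays allowed.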

If that has no effect, block it in earnest.

Using .htaccess

~~~
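# Evaluate Deny rules first; anything not denied is allowed
order deny,allow
# Hostname rules make Apache do a reverse DNS lookup on the client address
deny from baidu.jp
deny from baidu.com
deny from 119.63.192.0/21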
~~~
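
It is worth checking which addresses the CIDR rule really covers. A quick sketch with Python's standard `ipaddress` module is below. `180.76.5.145` comes from the log excerpt, while `119.63.193.10` is just a made-up address inside the range for comparison.

```python
import ipaddress

# The network denied by the CIDR rule in the .htaccess snippet.
DENIED_NETWORK = ipaddress.ip_network("119.63.192.0/21")


def is_denied_by_cidr(ip):
    """Return True if ip falls inside the denied network."""
    return ipaddress.ip_address(ip) in DENIED_NETWORK


for ip in ("180.76.5.145", "119.63.193.10"):
    print(ip, "denied by CIDR rule:", is_denied_by_cidr(ip))
```

The 180.76.x.x addresses seen in the log fall outside 119.63.192.0/21, so they would only be caught by the hostname rules, and only if their reverse DNS resolves under baidu.com or baidu.jp.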

Using Apache

~~~
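# Flag Baidu crawlers by User-Agent. The patterns are not anchored with ^
# because the UA in the logs starts with "Mozilla/5.0".
# nolog only suppresses logging if CustomLog is set up with env=!nolog.
SetEnvIfNoCase User-Agent "Baiduspider" deny_ua nolog
SetEnvIfNoCase User-Agent "BaiduImagespider" deny_ua nolog
SetEnvIfNoCase User-Agent "BaiduMobaider" deny_ua nolog

# Allow is evaluated first, then Deny, so flagged requests are rejected
Order Allow,Deny
Allow from all
Deny from env=deny_ua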
~~~

Using Nginx

~~~
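# Case-insensitive match anywhere in the User-Agent
# (the UA in the logs starts with "Mozilla/5.0", so no ^ anchor)
if ($http_user_agent ~* Baiduspider) {
    return 403;
}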
~~~
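
Whether the User-Agent pattern is anchored with `^` matters here, because the Baiduspider User-Agent in the log excerpt begins with `Mozilla/5.0`. The sketch below uses Python's `re` as a stand-in for the regex engines in Apache and Nginx. For a pattern this simple I'd expect them to behave the same, but treat that as an assumption.

```python
import re

# Baiduspider User-Agent string as it appears in the access log excerpt.
LOG_UA = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

# Case-insensitive patterns, with and without the leading ^ anchor.
anchored = re.compile(r"^Baiduspider", re.IGNORECASE)
unanchored = re.compile(r"Baiduspider", re.IGNORECASE)


def matches(pattern, user_agent):
    """Return True if pattern is found in user_agent."""
    return pattern.search(user_agent) is not None


print("anchored:", matches(anchored, LOG_UA))
print("unanchored:", matches(unanchored, LOG_UA))
```

An anchored pattern only catches User-Agents that begin with `Baiduspider`, while the unanchored one also catches the `Mozilla/5.0 (compatible; Baiduspider/2.0; ...)` form.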

In either case, check the logs: if 403 is being returned, or the requests no longer show up in the access log, it worked.
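
One way to do that check without scrolling through the log is to tally the status codes of Baiduspider requests. Below is a rough Python sketch. The function name `baidu_statuses` is my own, and the regex assumes the combined log format shown earlier in this post.

```python
import re
from collections import Counter

# The status code is the 3-digit field right after the quoted request line.
# Both straight and curly closing quotes are accepted.
STATUS_RE = re.compile(r'HTTP/\d\.\d["”]\s+(\d{3})\s')


def baidu_statuses(lines):
    """Count HTTP status codes for log lines that mention Baiduspider."""
    counts = Counter()
    for line in lines:
        if "baiduspider" not in line.lower():
            continue
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

After a successful block, recent log entries should show only `403` for Baiduspider, or nothing at all if logging is suppressed for it.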


