
How Do Proxy IPs Solve the Problem of Crawlers Getting Banned?

Published: 2018/9/6 9:55:18  Views: 10296

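While crawling a website at scale, my IP was suddenly banned by the site and the crawler could go no further. Looking into the site's anti-crawling policy, I found that once a single IP's request count reaches a certain threshold, that IP is blocked for the rest of the day. But the crawler could not stop and the job had to be finished on time, so what to do? A colleague's advice: use proxy IPs.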

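On that colleague's recommendation, I bought dynamic high-quality proxy IPs from 开心代理, and the next step was to use them to keep the crawler running. According to the official Python documentation, proxy IPs can be used through the ProxyHandler class and the build_opener and install_opener functions of the urllib.request module.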

The official documentation is rather formal and a bit hard to follow. Here are the key parts:

class urllib.request.ProxyHandler(proxies=None)

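Cause requests to go through a proxy. If proxies is given, it must be a dictionary mapping protocol names to URLs of proxies. (In other words, requests are sent through a proxy; when proxies is supplied, it must be a dictionary whose keys are protocol names and whose values are the proxy URLs.)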
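To make the dictionary shape concrete, here is a minimal sketch. The address 203.0.113.10:8080 is a placeholder from a documentation-only IP range, not a real proxy, and the variable names are chosen purely for illustration.

```python
from urllib import request

# Placeholder address for illustration only; substitute a proxy you actually control.
PROXY_ADDR = "203.0.113.10:8080"

# Keys are protocol names, values are the proxy URLs used for that protocol.
proxies = {
    "http": f"http://{PROXY_ADDR}",
    "https": f"http://{PROXY_ADDR}",
}

# Constructing the handler does not open any connection yet.
handler = request.ProxyHandler(proxies)

for scheme, proxy_url in proxies.items():
    print(scheme, "->", proxy_url)
```

Listing both "http" and "https" matters if the target site serves HTTPS pages, since a scheme missing from the dictionary is not routed through this proxy.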

urllib.request.build_opener([handler, ...])

Return an OpenerDirector instance, which chains the handlers in the order given. (In other words, build_opener returns an OpenerDirector instance that chains the supplied handlers in the order they were passed.)
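A short sketch of what build_opener hands back may help here. It only inspects the resulting object and makes no network request; the proxy address is again a placeholder, and the opener's handlers list is read purely to see what was chained.

```python
from urllib import request

# Placeholder proxy address for illustration only.
proxy_handler = request.ProxyHandler({"http": "http://203.0.113.10:8080"})

# build_opener chains our handler together with urllib's default handlers.
# Because we pass our own ProxyHandler, the default one is not added as well.
opener = request.build_opener(proxy_handler)

print(type(opener).__name__)  # OpenerDirector

# List the handler classes that ended up in the chain.
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)
```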

urllib.request.install_opener(opener)

Install an OpenerDirector instance as the default global opener. (In other words, install_opener makes the given OpenerDirector instance the default opener used globally.)
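Because install_opener changes the default for every later urlopen call in the process, it can be useful to know that an opener can also be used directly through its open method. The following is a sketch of that alternative, not the approach this article takes; fetch_with_opener is a hypothetical helper name and the proxy address is a placeholder.

```python
from urllib import request

def fetch_with_opener(opener, url, headers=None, timeout=10):
    """Fetch url through one specific opener, leaving the global default untouched."""
    req = request.Request(url, headers=headers or {})
    with opener.open(req, timeout=timeout) as rsp:
        return rsp.read()

# Placeholder proxy address for illustration only.
proxy_opener = request.build_opener(
    request.ProxyHandler({"http": "http://203.0.113.10:8080"})
)

# Calling request.install_opener(proxy_opener) here would instead make every
# subsequent request.urlopen() call in this process go through the proxy.
```

A call such as fetch_with_opener(proxy_opener, url, headers) then sends only that one request through the proxy, which is convenient when different requests need different proxies.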

Still confusing? Laid out as steps, it turns out to be quite simple:

1. Pass the proxy IP and its protocol to ProxyHandler and assign the result to a variable, opener_support;

2. Pass opener_support to build_opener to create an opener;

3. Install the opener.

The code is as follows:

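    from urllib import request

    def ProxySpider(url, proxy_ip, header):
        # Step 1: map each protocol to the proxy. 'https' is included so that
        # HTTPS pages are also fetched through the proxy.
        opener_support = request.ProxyHandler({'http': proxy_ip, 'https': proxy_ip})
        # Step 2: build an opener that chains the proxy handler.
        opener = request.build_opener(opener_support)
        # Step 3: install it as the global default, so urlopen() uses it.
        request.install_opener(opener)
        # Send the request with the caller's headers and return the raw bytes.
        req = request.Request(url, headers=header)
        rsp = request.urlopen(req).read()
        return rsp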
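A single proxy can itself fail or get blocked, so with a pool of dynamic proxy IPs it is common to fall back to the next one on error. The sketch below is one possible way to do that and is not part of the original code: make_opener, fetch_with_rotation, max_attempts, and opener_factory are all hypothetical names, and the list of proxy addresses is assumed to come from your proxy provider.

```python
import itertools
from urllib import request

def make_opener(proxy_ip):
    """Build an opener that routes HTTP and HTTPS traffic through proxy_ip."""
    handler = request.ProxyHandler({"http": proxy_ip, "https": proxy_ip})
    return request.build_opener(handler)

def fetch_with_rotation(url, proxy_ips, header, max_attempts=3, timeout=10,
                        opener_factory=make_opener):
    """Try the proxies in turn, cycling through the list, until one succeeds."""
    last_error = None
    # Take at most max_attempts proxies, wrapping around if the list is shorter.
    for proxy_ip in itertools.islice(itertools.cycle(proxy_ips), max_attempts):
        opener = opener_factory(proxy_ip)
        req = request.Request(url, headers=header)
        try:
            with opener.open(req, timeout=timeout) as rsp:
                return rsp.read()
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            last_error = exc
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

Each attempt uses its own opener rather than install_opener, so switching proxies does not alter global state. The opener_factory parameter exists only so the retry logic can be exercised with a stand-in opener, without touching the network.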

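With proxy IPs in hand and the know-how to use them in a crawler, the target site's per-IP limit became much less of a worry. Crawling throughput rose sharply, and finishing the task on time was no longer a problem.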
