Managing multiple proxies #128

yxssfxwzy · 2014-05-19T09:25:56Z

Import a list of proxies first; all threads then share these proxies evenly, with faster proxies preferred.

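A minimal sketch of the scheme described above (import a list, spread use across threads, favour fast proxies), assuming a `java.util.concurrent.DelayQueue` in which each returned proxy rests for a fixed reuse interval plus its last response time. The class names `SimpleProxy` and `SimpleProxyPool`, and the delay formula, are hypothetical and are not the code in this PR.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical proxy entry: it becomes borrowable again once its rest period expires.
class SimpleProxy implements Delayed {
    final String host;
    final int port;
    private long availableAtNanos;   // System.nanoTime() after which the proxy may be reused
    long lastResponseMillis = -1;    // last observed response time, -1 if unknown

    SimpleProxy(String host, int port) {
        this.host = host;
        this.port = port;
        this.availableAtNanos = System.nanoTime(); // usable immediately after import
    }

    // Only called while the proxy is borrowed (outside the queue), so queue order stays valid.
    void restFor(long millis) {
        availableAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(availableAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

// Hypothetical thread-safe pool shared by all crawler threads.
class SimpleProxyPool {
    private final DelayQueue<SimpleProxy> queue = new DelayQueue<>();
    private final long reuseIntervalMillis;

    SimpleProxyPool(long reuseIntervalMillis) {
        this.reuseIntervalMillis = reuseIntervalMillis;
    }

    void add(SimpleProxy proxy) {
        queue.add(proxy);
    }

    // Blocks until some proxy has finished resting.
    SimpleProxy borrow() throws InterruptedException {
        return queue.take();
    }

    // Non-blocking variant: returns null if every proxy is still resting.
    SimpleProxy tryBorrow() {
        return queue.poll();
    }

    // Slower proxies rest longer, so faster ones are handed out more often.
    void giveBack(SimpleProxy proxy, long responseMillis) {
        proxy.lastResponseMillis = responseMillis;
        proxy.restFor(reuseIntervalMillis + responseMillis);
        queue.add(proxy);
    }

    int size() {
        return queue.size(); // includes proxies that are still resting
    }
}
```

Because every proxy must rest before it can be borrowed again, no single proxy is hammered by all threads at once, while fast proxies come back sooner and are therefore picked more often.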

code4craft · 2014-05-27T23:30:21Z

Great feature. I reviewed it, and the implementation looks good too.

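But I'm not sure how to test it. Judging by the dates, it's from February, so I assume this code has already been running in your own project for a while?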

code4craft · 2014-05-27T23:55:55Z

I made a small change and moved the following code from Spider into HttpClientDownloader:

    if (site.getHttpProxyPool().isEnable()) {
        site.returnHttpProxyToPool((HttpHost) request.getExtra(Request.PROXY),
                (Integer) request.getExtra(Request.STATUS_CODE));
    }

Proxy handling is HttpClientDownloader's concern, so it shouldn't intrude into the main crawl flow.
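One way a pool could act on the status code passed to `returnHttpProxyToPool` is sketched below: a 2xx response resets the proxy's failure streak, 403/429 make it rest much longer, other failures back off exponentially, and repeated failures evict it. `ProxyReturnPolicy`, the `"host:port"` key, the thresholds and the choice of status codes are illustrative assumptions, not the logic in this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical return policy keyed by "host:port"; thresholds are illustrative only.
class ProxyReturnPolicy {
    static final long NORMAL_DELAY_MS = 1_500;
    static final long BLOCKED_DELAY_MS = 10 * 60_000;
    static final int MAX_CONSECUTIVE_FAILURES = 3;

    private final Map<String, Integer> failureStreaks = new HashMap<>();

    /** Returns how long the proxy should rest in ms, or -1 if it should be dropped. */
    synchronized long onReturn(String proxyKey, int statusCode) {
        if (statusCode >= 200 && statusCode < 300) {
            failureStreaks.remove(proxyKey); // success resets the streak
            return NORMAL_DELAY_MS;
        }
        int streak = failureStreaks.merge(proxyKey, 1, Integer::sum);
        if (streak >= MAX_CONSECUTIVE_FAILURES) {
            failureStreaks.remove(proxyKey);
            return -1; // too many failures in a row: evict from the pool
        }
        // 403/429 usually mean the target site is throttling or blocking this proxy.
        if (statusCode == 403 || statusCode == 429) {
            return BLOCKED_DELAY_MS;
        }
        // Other failures (e.g. 5xx): back off exponentially with the streak length.
        return NORMAL_DELAY_MS << streak;
    }
}
```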

Also, could statusCode and proxy be handled entirely inside HttpClientDownloader instead of being stored in the request? Or do they serve some particular purpose there?

code4craft · 2014-05-28T00:09:18Z

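After a closer look, it seems the PageProcessor can also act on the proxy to some extent, so this code has to stay in Spider after all. I've moved it back for now and will think about whether there's a better approach.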

yxssfxwzy · 2014-05-29T01:56:20Z

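I originally developed this feature to crawl one particular website, kept refining it while crawling, and it ended up working quite well, so I submitted it. statusCode is stored in the request because when a site blocks the crawler, the block often only becomes visible while parsing the page (for example, the site asks for a CAPTCHA); HttpClientDownloader has no way of telling that it has been blocked. The PageProcessor then needs to set page.statusCode, which is passed on to request.statusCode, so that the proxy is adjusted accordingly when it is returned to the pool. I haven't written tests for it yet.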
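The flow described above (the downloader sees HTTP 200, but the page processor recognises a block page and rewrites the status code so the pool can penalise the proxy) can be sketched with minimal stand-in classes. `CrawlRequest`, `CrawlPage`, `BlockAwareProcessor`, the `"captcha"` marker and the use of 403 as the "blocked" code are all hypothetical; they are not WebMagic's `Request`/`Page` API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a crawl request that carries extras between components.
class CrawlRequest {
    static final String STATUS_CODE = "statusCode"; // assumed extra key
    private final Map<String, Object> extras = new HashMap<>();

    void putExtra(String key, Object value) { extras.put(key, value); }
    Object getExtra(String key) { return extras.get(key); }
}

// Hypothetical stand-in for a downloaded page.
class CrawlPage {
    final CrawlRequest request;
    final String html;
    private int statusCode;

    CrawlPage(CrawlRequest request, String html, int httpStatus) {
        this.request = request;
        this.html = html;
        this.statusCode = httpStatus;
        request.putExtra(CrawlRequest.STATUS_CODE, httpStatus); // what the downloader observed
    }

    int getStatusCode() { return statusCode; }

    // Overriding the code also updates the request, which is what the proxy pool reads.
    void setStatusCode(int code) {
        this.statusCode = code;
        request.putExtra(CrawlRequest.STATUS_CODE, code);
    }
}

class BlockAwareProcessor {
    static final String CAPTCHA_MARKER = "captcha"; // assumed detection rule
    static final int BLOCKED = 403;                 // assumed "proxy was blocked" code

    void process(CrawlPage page) {
        if (page.html.toLowerCase(Locale.ROOT).contains(CAPTCHA_MARKER)) {
            page.setStatusCode(BLOCKED); // HTTP said 200, but the site has blocked us
            return;                      // nothing useful to extract from a block page
        }
        // Normal field extraction would go here.
    }
}
```

When the proxy is later returned to the pool, the pool reads the rewritten code from the request rather than the raw HTTP status, which is why the value has to travel with the request instead of staying inside the downloader.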

yxssfxwzy added 2 commits May 19, 2014 15:56

change_gitignore

07ea042

add proxy pool

c146e2c

yxssfxwzy mentioned this pull request May 19, 2014

Multiple proxies #114

Closed

code4craft added a commit that referenced this pull request May 27, 2014

Merge pull request #128 from yxssfxwzy/proxy

e310139

Managing multiple proxies

code4craft merged commit e310139 into code4craft:master May 27, 2014

code4craft added a commit that referenced this pull request May 27, 2014

spell mistake fix #128

1f21d9c

code4craft added a commit that referenced this pull request May 27, 2014

change return proxy from spider to httpclientdownloader #128

40bf8ca

code4craft added a commit that referenced this pull request May 28, 2014

change back return proxy from spider to httpclientdownloader #128

8d67fd0

yxssfxwzy deleted the proxy branch June 5, 2014 05:22
