Python Web Scraping with the urllib Module: POST Requests

Published: 2020-06-18 19:56:59 | Source: Internet | Views: 1595 | Author: 虎皮喵的喵 | Category: Programming Languages

This article uses 'http://httpbin.org/post' as the example target.

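General pattern:

Import urllib.request.

Import urllib.parse.

URL-encode the form data, then convert it to UTF-8 bytes: data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')

Open the target page: response = urllib.request.urlopen('URL', data=data). Supplying the data argument is what makes urlopen send a POST request instead of a GET.

Read the response body: html = response.read()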
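To make the encoding step concrete, here is a minimal sketch that runs the two conversions separately and needs no network access. The extra 'page' field is an illustrative addition, not part of the original example.

```python
import urllib.parse

# Form fields to send; 'page' is a hypothetical extra field for illustration.
fields = {'word': 'hello', 'page': 2}

# Step 1: urlencode turns the dict into a query string (type str).
query = urllib.parse.urlencode(fields)
print(query)   # word=hello&page=2

# Step 2: convert the str to UTF-8 bytes, the type urlopen expects for data.
data = bytes(query, encoding='utf-8')
print(data)    # b'word=hello&page=2'
```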

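Print the result:

1. Without decode

print(html) # prints the raw bytes as a single b'...' literal, with line breaks shown as \n escapes, which is hard to read

2. With decode

print(html.decode()) # decodes the bytes to a str (UTF-8 by default), so line breaks and indentation render normally and the output is easy to read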
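The difference between the two print calls can be seen without any network access. In the sketch below, the html value is a small hand-written stand-in for what response.read() returns, not real server output.

```python
# Hand-written stand-in for the bytes returned by response.read().
html = b'{\n  "form": {\n    "word": "hello"\n  }\n}\n'

# Printing bytes shows their repr: one line, newlines as \n escapes.
print(html)

# decode() converts bytes to str (UTF-8 by default), so newlines render.
text = html.decode()
print(text)

print(type(html).__name__, type(text).__name__)   # bytes str
```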

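Checking the request result:

a. response.status # 200 means the request succeeded

b. response.getcode() # returns the same status code as response.status

Note that for an error status such as 404 (page not found), urlopen does not return a response object. It raises urllib.error.HTTPError, and the status code is available as the exception's code attribute.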
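Because urlopen raises urllib.error.HTTPError for error statuses such as 404, reading the status code reliably means handling that exception. The following is a minimal sketch; post_status is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption.

```python
import urllib.error
import urllib.parse
import urllib.request


def post_status(url, fields, timeout=10):
    """POST form fields to url and return the HTTP status code."""
    data = urllib.parse.urlencode(fields).encode('utf-8')
    try:
        with urllib.request.urlopen(url, data=data, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Raised for error statuses such as 404; the code is on the exception.
        return err.code
```

Calling post_status('http://httpbin.org/post', {'word': 'hello'}) should return 200 as long as the service is reachable.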

1. The program without decode:

    import urllib.request
    import urllib.parse

    # URL-encode the form data and convert it to UTF-8 bytes
    data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
    # Passing data makes urlopen send a POST request
    response = urllib.request.urlopen('http://httpbin.org/post', data=data)
    html = response.read()
    print(html)  # raw bytes, not decoded
    print("------------------------------------------------------------------")
    print("------------------------------------------------------------------")
    print(response.status)
    print(response.getcode())

Output: the original screenshot was not preserved. The response body prints as a single unbroken bytes literal of the form b'...'.

2. The program with decode:

    import urllib.request
    import urllib.parse

    # URL-encode the form data and convert it to UTF-8 bytes
    data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
    # Passing data makes urlopen send a POST request
    response = urllib.request.urlopen('http://httpbin.org/post', data=data)
    html = response.read()
    print(html.decode())  # decode the bytes to a str before printing
    print("------------------------------------------------------------------")
    print("------------------------------------------------------------------")
    print(response.status)
    print(response.getcode())

Output:

    {
      "args": {},
      "data": "",
      "files": {},
      "form": {
        "word": "hello"
      },
      "headers": {
        "Accept-Encoding": "identity",
        "Connection": "close",
        "Content-Length": "10",
        "Content-Type": "application/x-www-form-urlencoded",
        "Host": "httpbin.org",
        "User-Agent": "Python-urllib/3.4"
      },
      "json": null,
      "origin": "106.14.17.222",
      "url": "http://httpbin.org/post"
    }
    ------------------------------------------------------------------
    ------------------------------------------------------------------
    200
    200
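The response body is JSON, so once decoded it can be turned into a Python dict rather than only printed. The sketch below assumes the standard json module and uses a trimmed copy of the output above as its input, so it runs without network access.

```python
import json

# Trimmed copy of the decoded response body shown above.
text = '''{
  "args": {},
  "data": "",
  "form": {
    "word": "hello"
  },
  "json": null,
  "url": "http://httpbin.org/post"
}'''

result = json.loads(text)
print(result['form']['word'])   # hello
print(result['url'])            # http://httpbin.org/post
print(result['json'])           # None (JSON null maps to None)
```

The "form" entry echoes the field that was posted, which is a quick way to confirm the server received the data.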

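Why is the bytes conversion needed?

Because urlencode returns a str, and passing that str directly as data fails:

    data = urllib.parse.urlencode({'word': 'hello'})  # not converted to bytes
    response = urllib.request.urlopen('http://httpbin.org/post', data=data)
    html = response.read()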

Error message:

    Traceback (most recent call last):
      File "/usercode/file.py", line 15, in <module>
        response = urllib.request.urlopen('http://httpbin.org/post', data = data )
      File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/lib/python3.4/urllib/request.py", line 453, in open
        req = meth(req)
      File "/usr/lib/python3.4/urllib/request.py", line 1104, in do_request_
        raise TypeError(msg)
    TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.

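As this shows, the body of a POST request must be passed to urlopen as bytes, so the urlencoded string has to be encoded before it is sent.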
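The failure and its fix can be reproduced side by side. This is a sketch that assumes, as the traceback above suggests, that urllib checks the type of data while preparing the request, before any connection is opened. The exact wording of the message varies between Python versions.

```python
import urllib.parse
import urllib.request

query = urllib.parse.urlencode({'word': 'hello'})   # a str: 'word=hello'

try:
    # Passing a str as data fails while the request is being prepared.
    urllib.request.urlopen('http://httpbin.org/post', data=query)
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# Encoding the str first yields an acceptable request body.
data = query.encode('utf-8')
print(type(query).__name__, '->', type(data).__name__)   # str -> bytes
```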

The Python documentation describes the bytes constructor as follows:

class bytes([source[, encoding[, errors]]])

Return a new “bytes” object, which is an immutable sequence of integers in the range 0 <= x < 256. bytes is an immutable version of bytearray – it has the same non-mutating methods and the same indexing and slicing behavior.

Accordingly, constructor arguments are interpreted as for bytearray().
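The properties named in this description can be checked directly. The short sketch below uses only built-in types; the string 'word=hello' is taken from the example above.

```python
# Two equivalent ways to build bytes from a str.
a = bytes('word=hello', encoding='utf-8')
b = 'word=hello'.encode('utf-8')
print(a == b)      # True

# Indexing yields integers in range(256); slicing yields bytes.
print(a[0])        # 119, the code for 'w'
print(a[:4])       # b'word'

# bytes is immutable: item assignment raises TypeError.
try:
    a[0] = 87
except TypeError:
    print('bytes objects cannot be modified in place')

# bytearray is the mutable counterpart.
m = bytearray(a)
m[0] = 87          # 87 is the code for 'W'
print(bytes(m))    # b'Word=hello'
```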
