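Python 3.x: Fixing Garbled Chinese Text with BeautifulSoup()

Problem:

When BeautifulSoup parses a fetched web page, the Chinese text comes out garbled.

Solution:

This case was an odd one. I used chardet to detect the page encoding and passed the result to the BeautifulSoup constructor through the from_encoding= parameter, yet the output was still garbled.

With no better option, I searched around online. I never did pin down the root cause, but I did find a fix that works.

I am sharing it here for anyone who runs into the same problem: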

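If the Chinese page is encoded as gb2312 or gbk, passing from_encoding="gb18030" to the BeautifulSoup constructor resolves the garbled text.

In my tests, pages that were actually utf8 also came out correctly with gb18030. This is not guaranteed in general, because some UTF-8 byte sequences also decode as GB18030 without raising an error, so check the output if the page might be UTF-8.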

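import requests
from bs4 import BeautifulSoup
all_url = ""  # URL of the page to fetch (left blank in the original)
Hostreferer = {"User-Agent": "Mozilla/5.0"}  # placeholder request headers, not from the original; substitute your own
start_html = requests.get(all_url, headers=Hostreferer)
# For Chinese pages encoded as gb2312 or gbk, pass from_encoding="gb18030" to avoid garbled text
soup = BeautifulSoup(start_html.content, "html.parser", from_encoding="gb18030")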
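
The cause was never pinned down above. One plausible explanation (an assumption on my part, not something confirmed for the page in question) is that pages labelled gb2312 often contain characters outside the GB2312 set, so a strict GB2312 decode fails, while GB18030, being a superset of GBK and GB2312, decodes the same bytes. The sketch below illustrates this with the standard library only; the sample text and the `try_decode` helper are made up for illustration.

```python
# Sample markup containing a character (龍) that GB2312 does not include.
text = "<p>中文 龍</p>"
raw = text.encode("gbk")  # simulate the bytes a "gb2312" site might actually serve


def try_decode(data, encoding):
    """Return the decoded string, or None if strict decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


print(try_decode(raw, "gb2312"))   # strict GB2312 cannot decode these bytes
print(try_decode(raw, "gb18030"))  # the superset encoding decodes them
```

If this is what happened on the original page, it would explain why the encoding name reported by chardet failed while gb18030 worked.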

For reference, here is the chardet approach as well:

import urllib.request
import chardet
from bs4 import BeautifulSoup
all_url = ""  # URL of the page to fetch (left blank in the original)
raw = urllib.request.urlopen(all_url).read()  # fetch the raw bytes once and reuse them
charset1 = chardet.detect(raw)
print(charset1)
# Output: {'encoding': 'GB2312', 'confidence': 0.99, 'language': 'Chinese'}
bmfs = charset1['encoding']
print(bmfs)
# Output: GB2312

soup = BeautifulSoup(raw, "html.parser", from_encoding=bmfs)
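
The two approaches can be combined: keep chardet's answer, but widen any GB2312 or GBK result to gb18030 before handing it to BeautifulSoup. The following is a minimal sketch of that idea, not part of the original solution; `widen_chinese_encoding` is a hypothetical helper name and belongs to neither chardet nor bs4.

```python
def widen_chinese_encoding(name):
    """Map GB2312/GBK labels to their superset gb18030; pass anything else through.

    `name` is an encoding label such as the 'encoding' value returned by
    chardet.detect(), which may be None when detection fails.
    """
    if name and name.lower().replace("-", "").replace("_", "") in {"gb2312", "gbk", "gb18030"}:
        return "gb18030"
    return name


print(widen_chinese_encoding("GB2312"))  # gb18030
print(widen_chinese_encoding("utf-8"))   # utf-8
print(widen_chinese_encoding(None))      # None
```

With this helper, the last line of the chardet snippet would pass from_encoding=widen_chinese_encoding(bmfs) instead of from_encoding=bmfs, so UTF-8 pages keep their detected encoding and only the GB family is widened.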

Original article by ItWorker. If you repost it, please credit the source: https://blog.ytso.com/16779.html
