jsoup timeout not taking effect (timeout not working as expected)

Problem

When requesting data with jsoup (version 1.11.2), the timeout was set to 1 minute, yet the request timed out after only 30 seconds with SocketTimeoutException: Read timed out.

Sample code

Connection.Response res = Jsoup.connect(url).timeout(60000) // 60,000 ms = 1 minute
        .ignoreContentType(true).execute();

Stack trace

java.net.SocketTimeoutException: Read timed out 
 
	at java.net.SocketInputStream.socketRead0(Native Method) 
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) 
	at java.net.SocketInputStream.read(SocketInputStream.java:171) 
	at java.net.SocketInputStream.read(SocketInputStream.java:141) 
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) 
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345) 
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) 
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) 
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587) 
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492) 
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) 
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:734) 
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:706) 
	at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:299) 

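To be safe, I captured the traffic with Wireshark. After the client (192.168.8.12) sent the request (packet 366), the server (172.19.80.110) responded immediately (packet 368, a bare ACK carrying no data). Thirty seconds later still no data had arrived, so the client closed the connection by sending a FIN (packet 369).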

Solution

The first step was to check the Javadoc:

/**
 * Set the total request timeout duration. If a timeout occurs, an {@link java.net.SocketTimeoutException} will be thrown.
 * <p>The default timeout is 30 seconds (30,000 millis). A timeout of zero is treated as an infinite timeout.
 * <p>Note that this timeout specifies the combined maximum duration of the connection time and the time to read
 * the full response.
 * @param millis number of milliseconds (thousandths of a second) before timing out connects or reads.
 * @return this Connection, for chaining
 * @see #maxBodySize(int)
 */
Connection timeout(int millis);

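According to the Javadoc, the timeout is the combined total of the connect time and the read time (30 seconds by default), so setting it to 60 seconds should have allowed a full minute. That clearly does not match what happened.
Following the stack trace, I found the relevant source code: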

 
org.jsoup.helper.HttpConnection

private static HttpURLConnection createConnection(Connection.Request req) throws IOException {
    final HttpURLConnection conn = (HttpURLConnection) (
        req.proxy() == null ?
        req.url().openConnection() :
        req.url().openConnection(req.proxy())
    );

    conn.setRequestMethod(req.method().name());
    conn.setInstanceFollowRedirects(false); // don't rely on native redirection support
    conn.setConnectTimeout(req.timeout());
    conn.setReadTimeout(req.timeout() / 2); // gets reduced after connection is made and status is read

    // ... unrelated code omitted ...
    return conn;
}
 

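Note that conn.setConnectTimeout(req.timeout()) sets the connect timeout to 60 s, while conn.setReadTimeout(req.timeout() / 2) sets the read timeout to 30 s (60 / 2). This matches exactly the 30-second gap between packet 368 and packet 369. At this point the cause is clear: jsoup's timeout does not fully behave as the Javadoc describes. More accurately, the connect timeout is the value you pass in, and the read timeout (which applies while waiting for the response status and headers) is half of that value.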
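To observe this split without jsoup on the classpath, the following sketch copies only the two setter calls from createConnection onto a plain HttpURLConnection. It assumes a hypothetical local "silent" server that completes the TCP handshake but never sends a byte, similar to the ACK-only server in the capture. It illustrates the behavior under those assumptions and is not jsoup itself; the class name JsoupTimeoutRepro and the method measureTimeout are made up for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

final class JsoupTimeoutRepro {

    // Mirrors the two setters in jsoup 1.11.2's createConnection():
    // connect timeout = timeoutMillis, read timeout = timeoutMillis / 2.
    // Returns how many milliseconds passed before the read timed out.
    static long measureTimeout(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        // Hypothetical silent server: it listens but never calls accept() or writes.
        // The kernel still completes the handshake for connections in the backlog.
        try (ServerSocket silentServer = new ServerSocket(0, 50, loopback)) {
            HttpURLConnection conn = (HttpURLConnection) URI
                    .create("http://127.0.0.1:" + silentServer.getLocalPort() + "/")
                    .toURL()
                    .openConnection();
            conn.setConnectTimeout(timeoutMillis);   // jsoup: the full value
            conn.setReadTimeout(timeoutMillis / 2);  // jsoup: half of it
            long start = System.nanoTime();
            try {
                conn.getResponseCode();              // blocks waiting for the status line
                throw new IllegalStateException("server unexpectedly answered");
            } catch (SocketTimeoutException e) {
                return (System.nanoTime() - start) / 1_000_000;
            } finally {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long elapsed = measureTimeout(2000);
        System.out.println("read timed out after about " + elapsed + " ms");
    }
}
```

Called with 2000 ms, it should give up after roughly 1000 ms rather than 2000 ms, the same halving seen with timeout(60000).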

Summary

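In the end, this problem comes back to basic networking knowledge: a TCP socket has two separate timeouts, a connect timeout and a read timeout, which correspond to the following Java APIs: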

java.net.Socket
connect timeout

connect(SocketAddress endpoint, int timeout)
          Connects this socket to the server with a specified timeout value.

The connect timeout bounds the time allowed for the TCP three-way handshake.
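A minimal sketch of what the connect timeout covers, assuming a loopback listener that never calls accept() or writes anything (the class ConnectTimeoutDemo and method connectsWithin are hypothetical names). On typical operating systems the kernel finishes the handshake for connections waiting in the listen backlog, so connect() returns well within the timeout even though no data will ever arrive. This mirrors the capture, where the connection was established and only the subsequent read stalled.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class ConnectTimeoutDemo {

    // Returns true if connect() completed the handshake within connectTimeoutMillis.
    static boolean connectsWithin(int connectTimeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = new Socket()) {
            // The timeout only bounds the three-way handshake. The server never
            // accepts or writes, yet the handshake is completed by the kernel.
            client.connect(new InetSocketAddress(loopback, server.getLocalPort()),
                    connectTimeoutMillis);
            return client.isConnected();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connected: " + connectsWithin(1000));
    }
}
```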


read timeout

setSoTimeout(int timeout)
          Enable/disable SO_TIMEOUT with the specified timeout, in milliseconds.

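The read timeout limits how long a single read may block waiting for data, i.e. the gap between one data segment and the next. It is not the time allowed for reading the entire response.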
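The difference between a per-read timeout and a total timeout can be sketched with a hypothetical drip-feed server on loopback that writes one byte at a time, pausing before each byte. The names SoTimeoutDemo and readDripFeed are invented for illustration; the only real APIs used are java.net.Socket and ServerSocket.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SoTimeoutDemo {

    // Hypothetical drip-feed server: sends `chunks` single bytes, sleeping
    // `gapMillis` before each one, then closes. The client reads with
    // SO_TIMEOUT = soTimeoutMillis and returns the byte count, or -1 if a read timed out.
    static int readDripFeed(int chunks, long gapMillis, int soTimeoutMillis) throws Exception {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Thread writer = new Thread(() -> {
                try (Socket peer = server.accept();
                     OutputStream out = peer.getOutputStream()) {
                    for (int i = 0; i < chunks; i++) {
                        Thread.sleep(gapMillis);   // silence between data segments
                        out.write('x');
                        out.flush();
                    }
                } catch (IOException | InterruptedException ignored) {
                    // the client may already have given up and closed the connection
                }
            });
            writer.setDaemon(true);
            writer.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setSoTimeout(soTimeoutMillis);  // limits each read(), not the whole transfer
                InputStream in = client.getInputStream();
                int count = 0;
                try {
                    while (in.read() != -1) {
                        count++;
                    }
                } catch (SocketTimeoutException e) {
                    return -1;
                }
                return count;
            } finally {
                writer.join(5_000);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readDripFeed(5, 200, 500));
        System.out.println(readDripFeed(1, 1000, 500));
    }
}
```

With a 500 ms SO_TIMEOUT, five bytes spaced 200 ms apart (about 1 s in total) should all be read, so the first call returns 5; a single 1000 ms pause makes one read() exceed the limit, so the second call returns -1.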


Understanding these two concepts correctly makes this kind of problem much easier to solve.

Original article by ItWorker. If you repost it, please credit the source: https://blog.ytso.com/tech/pnotes/20265.html
