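A few days ago I interviewed at Alibaba, and the interviewer asked how I would parse the HTTP protocol. I gave a rough outline of my approach, and his conclusion was that I did not really understand HTTP, which hit me hard. Afterwards I went back to the book How Tomcat Works (《深入剖析Tomcat》) to study how Tomcat parses HTTP.
1. Environment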
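- How Tomcat Works is based on tomcat-4.1.12, which was released in 2002 and is something of an antique by now. It is still a good tool for learning, though.
- HTTP parsing happens in the connector. A connector is an independent module that can be plugged into a container. tomcat-4.1.12 ships a default connector, although it is already marked as deprecated.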
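2. Source code analysis
2.1 Connector
The default connector lives in the org.apache.catalina.connector.http package. It implements the Connector and Runnable interfaces, and its run method is the entry point for this analysis.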
2.1.1 run()
public void run() {
while (!stopped) {
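// Accept the next incoming connection from the ServerSocket
Socket socket = null;
try {
// Blocks in serverSocket.accept() until a client connects
socket = serverSocket.accept();
if (connectionTimeout > 0)
socket.setSoTimeout(connectionTimeout);
socket.setTcpNoDelay(tcpNoDelay); // Worth noting: sets TCP_NODELAY; when true this disables Nagle's algorithm (it is not about delayed ACKs)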
} catch (AccessControlException ace) {
log("socket accept security exception", ace);
continue;
} catch (IOException e) {
try {
// If reopening the server socket fails, exit the loop
synchronized (threadSync) {
if (started && !stopped)
log("accept error: ", e);
if (!stopped) {
serverSocket.close();
serverSocket = open();
}
}
} catch (IOException ioe) {
log("socket reopen, io problem: ", ioe);
break;
} catch (KeyStoreException kse) {
log("socket reopen, keystore problem: ", kse);
break;
} catch (NoSuchAlgorithmException nsae) {
log("socket reopen, keystore algorithm problem: ", nsae);
break;
} catch (CertificateException ce) {
log("socket reopen, certificate problem: ", ce);
break;
} catch (UnrecoverableKeyException uke) {
log("socket reopen, unrecoverable key: ", uke);
break;
} catch (KeyManagementException kme) {
log("socket reopen, key management problem: ", kme);
break;
}
continue;
}
// Hand this socket to an appropriate processor
HttpProcessor processor = createProcessor(); // see 2.1.2
if (processor == null) {
try {
log(sm.getString("httpConnector.noProcessor"));
socket.close();
} catch (IOException e) {
;
}
continue;
}
processor.assign(socket);//2.2.3
// The processor will recycle itself when it finishes
}
synchronized (threadSync) {
threadSync.notifyAll();
}
}
2.1.2 createProcessor()
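The real work happens in HttpProcessor. One connector creates multiple processors, and the number of processors is bounded by maxProcessors and minProcessors. Since the focus today is HTTP parsing, I will skip the details of how an HttpProcessor is created.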
private HttpProcessor createProcessor() {
synchronized (processors) {
if (processors.size() > 0) {
return ((HttpProcessor) processors.pop());
}
if ((maxProcessors > 0) && (curProcessors < maxProcessors)) {
return (newProcessor());
} else {
if (maxProcessors < 0) {
return (newProcessor());
} else {
return (null);
}
}
}
}
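The pooling policy in createProcessor() can be looked at in isolation. The following is a minimal sketch rather than Tomcat's code: the class ProcessorPoolSketch and its acquire/recycle methods are hypothetical names, and plain Object instances stand in for HttpProcessor.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the connector's processor pool; not Tomcat code.
final class ProcessorPoolSketch {
    private final Deque<Object> idle = new ArrayDeque<>();
    private final int maxProcessors; // a negative value means "no upper bound"
    private int curProcessors = 0;

    ProcessorPoolSketch(int maxProcessors) {
        this.maxProcessors = maxProcessors;
    }

    // Same decision order as the listing above: reuse, then create, else give up.
    synchronized Object acquire() {
        if (!idle.isEmpty()) {
            return idle.pop();
        }
        if (maxProcessors < 0 || curProcessors < maxProcessors) {
            curProcessors++;
            return new Object(); // stands in for newProcessor()
        }
        return null; // pool exhausted; the caller closes the socket
    }

    // Called when a processor has finished its request.
    synchronized void recycle(Object processor) {
        idle.push(processor);
    }

    public static void main(String[] args) {
        ProcessorPoolSketch pool = new ProcessorPoolSketch(1);
        Object first = pool.acquire();
        System.out.println("second acquire refused: " + (pool.acquire() == null));
        pool.recycle(first);
        System.out.println("reused after recycle: " + (pool.acquire() == first));
    }
}
```

When acquire() returns null, the connector in the listing above simply logs the condition and closes the socket, so a saturated pool turns into dropped connections rather than queued ones.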
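2.2 Processor
HttpProcessor handles requests on its own thread. The connector hands a request to a processor by calling the processor's assign() method, and once the processor has finished it is recycled for reuse.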
2.2.1 run()
HttpProcessor also implements the Runnable interface. It runs in the background the whole time (as a daemon thread), waiting for requests to process.
public void run() {
while (!stopped) {
// Wait for the next socket
Socket socket = await(); // see 2.2.2
if (socket == null)
continue;
// Process the request
try {
process(socket); // see 2.2.4
} catch (Throwable t) {
log("process.invoke", t);
}
// Finish up this request
connector.recycle(this);
}
synchronized (threadSync) {
threadSync.notifyAll();
}
}
2.2.2 await()
await() watches the available flag. When there is no new request it blocks, which in turn blocks the run() method.
private synchronized Socket await() {
// Wait for the Connector to provide a new Socket
while (!available) {
try {
wait();
} catch (InterruptedException e) {
}
}
// Notify the Connector that we have received this Socket
Socket socket = this.socket;
available = false;
notifyAll();
if ((debug >= 1) && (socket != null))
log(" The incoming request has been awaited");
return (socket);
}
2.2.3 assign(Socket socket)
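The connector calls assign to hand over a request, which wakes up the blocked processor thread. This is effectively a producer-consumer model in which the available flag coordinates passing the socket from the connector to the processor. The implementation is not elegant, though, and not very efficient: every handoff goes through the processor's monitor with wait()/notifyAll(), and assign() blocks while a previously assigned socket has not yet been picked up.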
synchronized void assign(Socket socket) {
// Wait for the Processor to get the previous Socket
while (available) {
try {
wait();
} catch (InterruptedException e) {
}
}
// Store the newly available Socket and notify our thread
this.socket = socket;
available = true;
notifyAll();
if ((debug >= 1) && (socket != null))
log(" An incoming request is being assigned");
}
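For comparison, the same single-slot handoff can be written with java.util.concurrent, which only arrived in Java 5 and so was not available to Tomcat 4. This is a sketch under that assumption, not how Tomcat implements it: HandoffSketch is a hypothetical class, and a String stands in for the Socket.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical replacement for the available flag plus wait()/notifyAll().
final class HandoffSketch {
    // A SynchronousQueue has no capacity: put() blocks until another thread calls take().
    private final SynchronousQueue<String> slot = new SynchronousQueue<>();

    // Producer side (the connector).
    void assign(String connection) throws InterruptedException {
        slot.put(connection);
    }

    // Consumer side (the processor thread).
    String await() throws InterruptedException {
        return slot.take();
    }

    // Runs one handoff between the calling thread and a worker thread.
    static String demo() {
        final HandoffSketch handoff = new HandoffSketch();
        ExecutorService processor = Executors.newSingleThreadExecutor();
        try {
            Future<String> handled = processor.submit(() -> "handled " + handoff.await());
            handoff.assign("socket-1");
            return handled.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            processor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The behavior is the same as in the listing above (the producer waits until the consumer has taken the item), but the flag, the loops around wait(), and the swallowed InterruptedException all disappear.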
2.2.4 process(Socket socket)
The process(Socket socket) method handles the request. Much of the code is omitted here.
private void process(Socket socket) {
boolean ok = true;
boolean finishResponse = true;
SocketInputStream input = null;
OutputStream output = null;
// Construct and initialize the objects we need
try {
input = new SocketInputStream(socket.getInputStream(),
connector.getBufferSize());
} catch (Exception e) {
log("process.create", e);
ok = false;
}
keepAlive = true;
while (!stopped && ok && keepAlive) {
finishResponse = true;
try {
// The request and response objects here are reused across requests
request.setStream(input);
request.setResponse(response);
output = socket.getOutputStream();
response.setStream(output);
response.setRequest(request);
((HttpServletResponse) response.getResponse()).setHeader
("Server", SERVER_INFO);
} catch (Exception e) {
//...
}
// Parse the request
try {
if (ok) {
parseConnection(socket); // see 2.2.5
parseRequest(input, output); // see 2.2.6
if (!request.getRequest().getProtocol().startsWith("HTTP/0"))
parseHeaders(input); // see 2.2.8
if (http11) {
// parseHeaders sets sendAck to true if the request carries "Expect: 100-continue"
// ackRequest checks sendAck and, if it is true, sends "HTTP/1.1 100 Continue\r\n\r\n" to the client; whether chunking is allowed is checked separately below
ackRequest(output);
// If the protocol is HTTP/1.1, chunking is allowed.
if (connector.isChunkingAllowed())
response.setAllowChunking(true);
}
}
} catch (EOFException e) {
// Most likely the client or the server closed the connection
ok = false;
finishResponse = false;
} catch (ServletException e) {
//...
} catch (InterruptedIOException e) {
//...
} catch (Exception e) {
//...
}
try {
((HttpServletResponse) response).setHeader("Date", FastHttpDateFormat.getCurrentDate());
if (ok) {
connector.getContainer().invoke(request, response); // If parsing succeeded, call the container's invoke method
}
} catch (ServletException e) {
//...
} catch (InterruptedIOException e) {
//...
} catch (Throwable e) {
//...
}
// Finish handling the request
if (finishResponse) {
// Omitted...
// Mainly calls response.finishResponse();
}
// We have to check whether Connection was set to close, or whether we are on HTTP/1.0
if ( "close".equals(response.getHeader("Connection")) ) {
keepAlive = false;
}
// If keepAlive is true and parsing did not fail, go around the while loop again
status = Constants.PROCESSOR_IDLE;
// Recycle the request and response objects
request.recycle();
response.recycle();
}
try {
shutdownInput(input);
socket.close();
} catch (IOException e) {
//...
} catch (Throwable e) {
//...
}
socket = null;
}
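The keep-alive decision is spread over process(), parseRequest(), and parseHeaders(), so it helps to see it in one place. The sketch below condenses only what the listings in this article show; KeepAliveSketch is a hypothetical name, and comparing the header value case-insensitively is my assumption rather than something the source confirms.

```java
// Hypothetical condensation of the keep-alive logic shown in this article.
final class KeepAliveSketch {
    /**
     * @param protocol         protocol string from the request line, e.g. "HTTP/1.1"
     * @param connectionHeader value of the Connection header, or null if absent
     */
    static boolean keepAlive(String protocol, String connectionHeader) {
        // process() starts each connection with keepAlive = true;
        // parseRequest() switches it off for anything other than HTTP/1.1.
        boolean keepAlive = "HTTP/1.1".equals(protocol);
        // parseHeaders() switches it off when the client sends Connection: close.
        if (connectionHeader != null && "close".equalsIgnoreCase(connectionHeader.trim())) {
            keepAlive = false;
        }
        return keepAlive;
    }

    public static void main(String[] args) {
        System.out.println(keepAlive("HTTP/1.1", null));
        System.out.println(keepAlive("HTTP/1.1", "close"));
        System.out.println(keepAlive("HTTP/1.0", null));
    }
}
```

Note that in the excerpts shown here an HTTP/1.0 request never gets its connection kept alive, because the only Connection value that parseHeaders() inspects is close.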
2.2.5 parseConnection(Socket socket)
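Parses connection information: records the client's Internet address and sets the server port, using the proxy port when one is configured.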
private void parseConnection(Socket socket)
throws IOException, ServletException {
if (debug >= 2)
log(" parseConnection: address=" + socket.getInetAddress() +
", port=" + connector.getPort());
((HttpRequestImpl) request).setInet(socket.getInetAddress());
if (proxyPort != 0)
request.setServerPort(proxyPort);
else
request.setServerPort(serverPort);
request.setSocket(socket);
}
2.2.6 parseRequest(SocketInputStream input, OutputStream output)
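requestLine is an HttpRequestLine instance that holds three char[] buffers, one each for the method, URI, and protocol. parseRequest calls SocketInputStream's readRequestLine() method to fill in the request line, then extracts the request method, URI, and protocol version, plus the query string and session ID when present.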
private void parseRequest(SocketInputStream input, OutputStream output)
throws IOException, ServletException {
// Parse the request line
input.readRequestLine(requestLine); // see 2.2.7
status = Constants.PROCESSOR_ACTIVE;
String method = new String(requestLine.method, 0, requestLine.methodEnd); // the request method
String uri = null;
String protocol = new String(requestLine.protocol, 0, requestLine.protocolEnd); // the protocol version
if (protocol.length() == 0)
protocol = "HTTP/0.9";
// For HTTP/1.1 the connection is kept alive after the request has been parsed
if ( protocol.equals("HTTP/1.1") ) {
http11 = true;
sendAck = false;
} else {
http11 = false;
sendAck = false;
// For HTTP/1.0, connections are not kept alive by default unless Connection: Keep-Alive is specified
keepAlive = false;
}
// Validate the request line
if (method.length() < 1) {
throw new ServletException(sm.getString("httpProcessor.parseRequest.method"));
} else if (requestLine.uriEnd < 1) {
throw new ServletException(sm.getString("httpProcessor.parseRequest.uri"));
}
// Parse any query parameters on the request URI
int question = requestLine.indexOf("?");
if (question >= 0) {
request.setQueryString(new String(requestLine.uri, question + 1,requestLine.uriEnd - question - 1)); // set the query string
if (debug >= 1)
log(" Query string is " +
((HttpServletRequest) request.getRequest())
.getQueryString());
uri = new String(requestLine.uri, 0, question); // the URI without the query string
} else {
request.setQueryString(null);
uri = new String(requestLine.uri, 0, requestLine.uriEnd);
}
// Check for an absolute URI (with the HTTP protocol)
if (!uri.startsWith("/")) {
int pos = uri.indexOf("://");
// Skip past the scheme and host name
if (pos != -1) {
pos = uri.indexOf('/', pos + 3);
if (pos == -1) {
uri = "";
} else {
uri = uri.substring(pos);
}
}
}
// Parse the session ID out of the request URI
int semicolon = uri.indexOf(match); // match = ";jsessionid="
if (semicolon >= 0) {
String rest = uri.substring(semicolon + match.length());
int semicolon2 = rest.indexOf(';');
if (semicolon2 >= 0) {
request.setRequestedSessionId(rest.substring(0, semicolon2)); // set the session ID
rest = rest.substring(semicolon2);
} else {
request.setRequestedSessionId(rest);
rest = "";
}
request.setRequestedSessionURL(true);
uri = uri.substring(0, semicolon) + rest;
if (debug >= 1)
log(" Requested URL session id is " +
((HttpServletRequest) request.getRequest())
.getRequestedSessionId());
} else {
request.setRequestedSessionId(null);
request.setRequestedSessionURL(false);
}
// Normalize the URI (using string operations)
String normalizedUri = normalize(uri);
if (debug >= 1) log("Normalized: '" + uri + "' to '" + normalizedUri + "'");
// Set the request properties
((HttpRequest) request).setMethod(method); // set the request method
request.setProtocol(protocol); // set the protocol version
if (normalizedUri != null) {
((HttpRequest) request).setRequestURI(normalizedUri);
} else {
((HttpRequest) request).setRequestURI(uri);
}
request.setSecure(connector.getSecure());
request.setScheme(connector.getScheme());
if (normalizedUri == null) {
log(" Invalid request URI: '" + uri + "'");
throw new ServletException("Invalid URI: " + uri + "'");
}
if (debug >= 1)
log(" Request is '" + method + "' for '" + uri +
"' with protocol '" + protocol + "'");
}
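The URI handling in parseRequest() is easier to follow on plain strings. Below is a minimal sketch of the three steps (split off the query string, strip the scheme and host from an absolute URI, pull out the jsessionid path parameter). RequestUriSketch and its fields are hypothetical names, and the normalize() step is left out.

```java
// Hypothetical string-based version of the URI steps in parseRequest(); normalize() is omitted.
final class RequestUriSketch {
    final String path;      // request URI without query string or session ID
    final String query;     // query string, or null if there was no '?'
    final String sessionId; // value of ;jsessionid=, or null if absent

    private RequestUriSketch(String path, String query, String sessionId) {
        this.path = path;
        this.query = query;
        this.sessionId = sessionId;
    }

    static RequestUriSketch parse(String raw) {
        // 1. Everything after the first '?' is the query string.
        String uri = raw;
        String query = null;
        int question = raw.indexOf('?');
        if (question >= 0) {
            query = raw.substring(question + 1);
            uri = raw.substring(0, question);
        }
        // 2. For an absolute URI, drop "scheme://host" and keep the path.
        if (!uri.startsWith("/")) {
            int pos = uri.indexOf("://");
            if (pos != -1) {
                pos = uri.indexOf('/', pos + 3);
                uri = (pos == -1) ? "" : uri.substring(pos);
            }
        }
        // 3. Extract ";jsessionid=..." up to the next ';' (or the end of the path).
        final String match = ";jsessionid=";
        String sessionId = null;
        int semicolon = uri.indexOf(match);
        if (semicolon >= 0) {
            String rest = uri.substring(semicolon + match.length());
            int semicolon2 = rest.indexOf(';');
            if (semicolon2 >= 0) {
                sessionId = rest.substring(0, semicolon2);
                rest = rest.substring(semicolon2);
            } else {
                sessionId = rest;
                rest = "";
            }
            uri = uri.substring(0, semicolon) + rest;
        }
        return new RequestUriSketch(uri, query, sessionId);
    }

    public static void main(String[] args) {
        RequestUriSketch r = parse("http://example.com/app/index.jsp;jsessionid=ABC123?x=1");
        System.out.println(r.path + " | " + r.query + " | " + r.sessionId);
    }
}
```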
2.2.7 readRequestLine(HttpRequestLine requestLine)
The readRequestLine method fills in the method, URI, and protocol fields of the request line, one after another.
public void readRequestLine(HttpRequestLine requestLine)
throws IOException {
// Recycle the request line if it has been used before
if (requestLine.methodEnd != 0)
requestLine.recycle();
// Skip blank lines
int chr = 0;
do { // Skip CR (\r) or LF (\n)
try {
chr = read();
} catch (IOException e) {
chr = -1;
}
} while ((chr == CR) || (chr == LF));
if (chr == -1)
throw new EOFException (sm.getString("requestStream.readline.error"));
pos--;
// Read the method name
int maxRead = requestLine.method.length; // this char[] starts out with length 8
int readStart = pos;
int readCount = 0;
boolean space = false;
// Reaching a space means the method name is complete
while (!space) {
// If the char[] is full, double its capacity
if (readCount >= maxRead) {
if ((2 * maxRead) <= HttpRequestLine.MAX_METHOD_SIZE) {
char[] newBuffer = new char[2 * maxRead];
System.arraycopy(requestLine.method, 0, newBuffer, 0,
maxRead);
requestLine.method = newBuffer;
maxRead = requestLine.method.length;
} else {
throw new IOException
(sm.getString("requestStream.readline.toolong"));
}
}
// Check whether we have reached the end of the internal buffer
if (pos >= count) {
int val = read();
if (val == -1) {
throw new IOException
(sm.getString("requestStream.readline.error"));
}
pos = 0;
readStart = 0;
}
// Check whether we have read a space
if (buf[pos] == SP) {
space = true;
}
// Fill char[] method
requestLine.method[readCount] = (char) buf[pos];
readCount++;
pos++;
}
requestLine.methodEnd = readCount - 1; // end position of the request method
// Parse the URI
maxRead = requestLine.uri.length;
readStart = pos;
readCount = 0;
space = false;
boolean eol = false;
while (!space) {
if (readCount >= maxRead) {
if ((2 * maxRead) <= HttpRequestLine.MAX_URI_SIZE) {
char[] newBuffer = new char[2 * maxRead];
System.arraycopy(requestLine.uri, 0, newBuffer, 0,
maxRead);
requestLine.uri = newBuffer;
maxRead = requestLine.uri.length;
} else {
throw new IOException(sm.getString("requestStream.readline.toolong"));
}
}
// Check whether we have reached the end of the internal buffer
if (pos >= count) {
int val = read();
if (val == -1)
throw new IOException(sm.getString("requestStream.readline.error"));
pos = 0;
readStart = 0;
}
// Check whether we have read a space
if (buf[pos] == SP) {
space = true;
} else if ((buf[pos] == CR) || (buf[pos] == LF)) {
// HTTP/0.9 style request
eol = true;
space = true;
}
// Fill char[] uri
requestLine.uri[readCount] = (char) buf[pos];
readCount++;
pos++;
}
requestLine.uriEnd = readCount - 1; // end position of the URI
// Parse the protocol
maxRead = requestLine.protocol.length;
readStart = pos;
readCount = 0;
// Loop until end of line
while (!eol) {
if (readCount >= maxRead) {
if ((2 * maxRead) <= HttpRequestLine.MAX_PROTOCOL_SIZE) {
char[] newBuffer = new char[2 * maxRead];
System.arraycopy(requestLine.protocol, 0, newBuffer, 0,
maxRead);
requestLine.protocol = newBuffer;
maxRead = requestLine.protocol.length;
} else {
throw new IOException(sm.getString("requestStream.readline.toolong"));
}
}
// Check whether we have reached the end of the internal buffer
if (pos >= count) {
int val = read();
if (val == -1)
throw new IOException(sm.getString("requestStream.readline.error"));
pos = 0;
readStart = 0;
}
// Check for end of line
if (buf[pos] == CR) {
// Skip \r
} else if (buf[pos] == LF) {
eol = true;
} else {
// Fill char[] protocol
requestLine.protocol[readCount] = (char) buf[pos];
readCount++;
}
pos++;
}
requestLine.protocolEnd = readCount; // end position of the protocol version
}
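Stripped of the buffer management, readRequestLine() is a three-field scanner. The sketch below shows the same scanning rules on a String instead of a refillable byte buffer, so it is an illustration of the logic rather than a substitute for the streaming code. RequestLineSketch is a hypothetical name.

```java
// Hypothetical String-based version of the request-line scan; no buffer refills or size limits.
final class RequestLineSketch {
    /** Returns {method, uri, protocol}; protocol defaults to "HTTP/0.9" when absent. */
    static String[] parse(String line) {
        int i = 0;
        // Skip leading blank lines (CR or LF), as the do/while loop in the listing does.
        while (i < line.length() && isEol(line.charAt(i))) {
            i++;
        }
        // Method: everything up to the first space.
        int sp1 = line.indexOf(' ', i);
        if (sp1 < 0) {
            throw new IllegalArgumentException("request line has no URI");
        }
        String method = line.substring(i, sp1);
        // URI: up to the next space, or up to end of line for an HTTP/0.9 style request.
        int end = sp1 + 1;
        while (end < line.length() && line.charAt(end) != ' ' && !isEol(line.charAt(end))) {
            end++;
        }
        String uri = line.substring(sp1 + 1, end);
        // Protocol: the rest of the line, if the URI was followed by a space.
        String protocol = "";
        if (end < line.length() && line.charAt(end) == ' ') {
            int pEnd = end + 1;
            while (pEnd < line.length() && !isEol(line.charAt(pEnd))) {
                pEnd++;
            }
            protocol = line.substring(end + 1, pEnd);
        }
        if (protocol.isEmpty()) {
            protocol = "HTTP/0.9"; // same default that parseRequest() applies
        }
        return new String[] { method, uri, protocol };
    }

    private static boolean isEol(char c) {
        return c == '\r' || c == '\n';
    }

    public static void main(String[] args) {
        String[] parts = parse("GET /index.html HTTP/1.1\r\n");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

What the real method adds on top of this is the part that makes it fast: it reads straight from the socket buffer, grows each char[] by doubling up to a fixed maximum, and allocates no String objects until parseRequest() needs them.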
2.2.8 parseHeaders(SocketInputStream input)
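An HttpHeader holds a name array and a value array. The readHeader method of SocketInputStream fills in the HttpHeader object, in much the same way as readRequestLine.
The HttpHeader object is then used to set the corresponding properties on the request object.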
private void parseHeaders(SocketInputStream input)
throws IOException, ServletException {
while (true) {
HttpHeader header = request.allocateHeader(); // allocate an HttpHeader object from the pool
// Parse one request header
input.readHeader(header);
if (header.nameEnd == 0) {
if (header.valueEnd == 0) {
return;
} else {
throw new ServletException
(sm.getString("httpProcessor.parseHeaders.colon"));
}
}
String value = new String(header.value, 0, header.valueEnd); // the header value
if (debug >= 1)
log(" Header " + new String(header.name, 0, header.nameEnd)+ " = " + value);
// Set the corresponding request property
if (header.equals(DefaultHeaders.AUTHORIZATION_NAME)) { // authorization header
request.setAuthorization(value);
} else if (header.equals(DefaultHeaders.ACCEPT_LANGUAGE_NAME)) { // accept-language header
parseAcceptLanguage(value);
} else if (header.equals(DefaultHeaders.COOKIE_NAME)) { // cookie header
Cookie cookies[] = RequestUtil.parseCookieHeader(value); // parse the value into a Cookie array
for (int i = 0; i < cookies.length; i++) {
if (cookies[i].getName().equals
(Globals.SESSION_COOKIE_NAME)) { // is the cookie name JSESSIONID?
if (!request.isRequestedSessionIdFromCookie()) {
// Accept only the first session ID
request.setRequestedSessionId(cookies[i].getValue()); // set the session ID
request.setRequestedSessionCookie(true);
request.setRequestedSessionURL(false);
if (debug >= 1)
log(" Requested cookie session id is " +
((HttpServletRequest) request.getRequest())
.getRequestedSessionId());
}
}
if (debug >= 1)
log(" Adding cookie " + cookies[i].getName() + "=" +
cookies[i].getValue());
request.addCookie(cookies[i]); // add the cookie to the request object
}
} else if (header.equals(DefaultHeaders.CONTENT_LENGTH_NAME)) { // content-length header
int n = -1;
try {
n = Integer.parseInt(value);
} catch (Exception e) {
throw new ServletException
(sm.getString("httpProcessor.parseHeaders.contentLength"));
}
request.setContentLength(n);
} else if (header.equals(DefaultHeaders.CONTENT_TYPE_NAME)) { // content-type header
request.setContentType(value);
} else if (header.equals(DefaultHeaders.HOST_NAME)) { // host header
int n = value.indexOf(':');
if (n < 0) {
if (connector.getScheme().equals("http")) {
request.setServerPort(80); // default port for http
} else if (connector.getScheme().equals("https")) {
request.setServerPort(443); // default port for https
}
if (proxyName != null)
request.setServerName(proxyName);
else
request.setServerName(value);
} else {
if (proxyName != null)
request.setServerName(proxyName);
else
request.setServerName(value.substring(0, n).trim());
if (proxyPort != 0)
request.setServerPort(proxyPort);
else {
int port = 80;
try {
port =Integer.parseInt(value.substring(n+1).trim());
} catch (Exception e) {
throw new ServletException
(sm.getString("httpProcessor.parseHeaders.portNumber"));
}
request.setServerPort(port);
}
}
} else if (header.equals(DefaultHeaders.CONNECTION_NAME)) { // connection header
if (header.valueEquals(DefaultHeaders.CONNECTION_CLOSE_VALUE)) { // value is "close"
keepAlive = false;
response.setHeader("Connection", "close");
}
} else if (header.equals(DefaultHeaders.EXPECT_NAME)) { // expect header
if (header.valueEquals(DefaultHeaders.EXPECT_100_VALUE)) // value is "100-continue"
sendAck = true;
else
throw new ServletException
(sm.getString("httpProcessor.parseHeaders.unknownExpectation"));
} else if (header.equals(DefaultHeaders.TRANSFER_ENCODING_NAME)) { // transfer-encoding header
//request.setTransferEncoding(header);
}
request.nextHeader(); // move on to the next request header
}
}
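Of the header branches above, Host has the most logic in it. Here is a minimal sketch of just the name/port split, assuming no proxy override (the proxyName and proxyPort branches are left out). HostHeaderSketch is a hypothetical name. Unlike the listing, it falls back to port 80 for any scheme other than https, and like the listing it splits on the first colon, so it would mishandle IPv6 literals such as [::1]:8080.

```java
// Hypothetical version of the Host header branch in parseHeaders(), without proxy overrides.
final class HostHeaderSketch {
    final String name;
    final int port;

    private HostHeaderSketch(String name, int port) {
        this.name = name;
        this.port = port;
    }

    /**
     * @param value  raw Host header value, e.g. "example.com:8080"
     * @param scheme connector scheme, "http" or "https"
     */
    static HostHeaderSketch parse(String value, String scheme) {
        int colon = value.indexOf(':');
        if (colon < 0) {
            // No explicit port: use the scheme's default.
            int port = "https".equals(scheme) ? 443 : 80;
            return new HostHeaderSketch(value.trim(), port);
        }
        String name = value.substring(0, colon).trim();
        try {
            int port = Integer.parseInt(value.substring(colon + 1).trim());
            return new HostHeaderSketch(name, port);
        } catch (NumberFormatException e) {
            // The listing above throws ServletException at this point.
            throw new IllegalArgumentException("bad port in Host header: " + value, e);
        }
    }

    public static void main(String[] args) {
        HostHeaderSketch h = parse("example.com:8080", "http");
        System.out.println(h.name + " on port " + h.port);
    }
}
```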
2.3 Request object
Calls to getParameter, getParameterMap, getParameterNames, and getParameterValues first invoke the parseParameters method to parse the request parameters.
2.3.1 parseParameters()
parseParameters stores its results in a ParameterMap object. ParameterMap extends HashMap and adds a locked property; while the map is locked it cannot be modified.
protected void parseParameters() {
if (parsed) // already parsed, return immediately
return;
ParameterMap results = parameters; // initialize the ParameterMap object
if (results == null)
results = new ParameterMap();
results.setLocked(false); // unlock
String encoding = getCharacterEncoding(); // get the character encoding
if (encoding == null) encoding = "ISO-8859-1"; // default encoding
// Parse the parameters in the query string
String queryString = getQueryString();
try {
RequestUtil.parseParameters(results, queryString, encoding); // parse the query string
} catch (UnsupportedEncodingException e) {
;
}
// Parse the parameters in the request body
String contentType = getContentType();
if (contentType == null)
contentType = "";
int semicolon = contentType.indexOf(';');
if (semicolon >= 0) {
contentType = contentType.substring(0, semicolon).trim();
} else {
contentType = contentType.trim();
}
if ("POST".equals(getMethod()) && (getContentLength() > 0)&& (this.stream == null)
&& "application/x-www-form-urlencoded".equals(contentType)) {
// Conditions: POST method, content-length > 0, this.stream is still null (no input stream has been handed out yet), content-type = application/x-www-form-urlencoded
try {
int max = getContentLength();
int len = 0;
byte buf[] = new byte[getContentLength()];
ServletInputStream is = getInputStream();
while (len < max) { // read the body
int next = is.read(buf, len, max - len);
if (next < 0 ) {
break;
}
len += next;
}
is.close();
if (len < max) {
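// FIX ME: the actual length received is less than the declared content-length.
// The next < 0 check above prevents an infinite loop,
// but the underlying bug must be in mod_jk.
// Log extra information to help debug mod_jk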
StringBuffer msg = new StringBuffer();
msg.append("HttpRequestBase.parseParameters content length mismatch\n");
msg.append(" URL: ");
msg.append(getRequestURL());
msg.append(" Content Length: ");
msg.append(max);
msg.append(" Read: ");
msg.append(len);
msg.append("\n Bytes Read: ");
if ( len > 0 ) {
msg.append(new String(buf,0,len));
}
log(msg.toString());
throw new RuntimeException
(sm.getString("httpRequestBase.contentLengthMismatch"));
}
RequestUtil.parseParameters(results, buf, encoding); // parse the body parameters
} catch (UnsupportedEncodingException ue) {
;
} catch (IOException e) {
throw new RuntimeException
(sm.getString("httpRequestBase.contentReadFail") +
e.getMessage());
}
}
results.setLocked(true);
parsed = true;
parameters = results;
}
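The listing delegates the actual decoding to RequestUtil.parseParameters, which is not shown. As a rough idea of what parsing application/x-www-form-urlencoded data involves, here is a minimal sketch built on the JDK's URLDecoder. FormParamsSketch is a hypothetical name and this is not Tomcat's implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical decoder for "a=1&b=2" style data; works for a query string or a form body.
final class FormParamsSketch {
    static Map<String, List<String>> parse(String data, String encoding) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (data == null || data.isEmpty()) {
            return params;
        }
        for (String pair : data.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String rawName = (eq < 0) ? pair : pair.substring(0, eq);
            String rawValue = (eq < 0) ? "" : pair.substring(eq + 1);
            try {
                // URLDecoder turns '+' into a space and %XX into the encoded byte.
                String name = URLDecoder.decode(rawName, encoding);
                String value = URLDecoder.decode(rawValue, encoding);
                List<String> values = params.get(name);
                if (values == null) {
                    values = new ArrayList<>();
                    params.put(name, values);
                }
                values.add(value); // a name may repeat, so keep every value
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("unsupported encoding: " + encoding, e);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("a=1&b=hello+world&a=2", "ISO-8859-1"));
    }
}
```

Keeping a list per name is what makes getParameterValues possible, and calling the same routine first on the query string and then on the body gives the merge order described in the listing above.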
That completes the whole HTTP parsing flow.
3. Summary
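- The connector accepts connections and the processor parses requests. Each processor owns its own Request and Response objects, which are reused across requests.
- Processor workflow
  - Parse connection information: set the Internet address and proxy information
  - Parse the request line: request method, URI, protocol version, query string (if any), the keep-alive flag, session ID (carried in the URL, for example when cookies are disabled), and URI normalization
  - Parse the request headers: copy headers into the corresponding properties. A few matter most: cookie, content-length and content-type (used when handling the body), and connection (mainly checking for the close value)
  - Parse parameters (lazily, on the first getParameter-style call): first the query parameters in the URI, then the parameters in the body
  - Call the container's invoke method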
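PS:
Only after reading how Tomcat parses requests did I realize how little I understood HTTP. I had only thought about the message format, not about the differences between protocol versions, the handling of special request headers, the handling of different request methods, cookie parsing, or session handling.
As HTTP has evolved, parsing has become harder and the bar has risen (it must be both efficient and correct). The code above is already deprecated in Tomcat 4, which replaced it with a more efficient connector. When I have time I will read the Tomcat 8 source.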