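This article follows on from "An Analysis of the django manage.py Command Processing Flow". Taking manage.py runserver as the example, we trace the whole path: starting the web server, handling a request, and finally returning the response to the HTTP client.
About manage.py runserver
We only need to look at the handle function in django/core/management/commands/runserver.py: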
def handle(self, *args, **options):
    from django.conf import settings
    if not settings.DEBUG and not settings.ALLOWED_HOSTS:
        raise CommandError('You must set settings.ALLOWED_HOSTS if DEBUG is False.')
    # ... configuration checks and environment preparation; details omitted ...
    self.run(**options)
The part to focus on is run. run calls inner_run(), and inside inner_run the following block is the one that matters:
try:
    # handler is the application object returned by get_internal_wsgi_application();
    # it is passed on to run() as an argument
    handler = self.get_handler(*args, **options)
    run(self.addr, int(self.port), handler,
        ipv6=self.use_ipv6, threading=threading)
except socket.error as e:
    ...  # error handling omitted in this excerpt
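First, get_handler obtains a WSGI application object that handles requests (in a deployed Django project, this is what the project's wsgi.py creates). Then the run function in django/core/servers/basehttp.py is called (at the top of runserver.py you can see the statement from django.core.servers.basehttp import get_internal_wsgi_application, run).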
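To make "a WSGI application object" concrete, here is a minimal sketch of the calling convention. It is not Django's handler: application and fake_start_response are hypothetical names used only for illustration.

```python
def application(environ, start_response):
    """A minimal WSGI application: a callable taking (environ, start_response)."""
    body = ("Hello from " + environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # the return value is an iterable of bytes


# Call it by hand, roughly the way a WSGI server would.
captured = {}


def fake_start_response(status, headers, exc_info=None):
    # A real server would prepare the status line and headers here.
    captured["status"] = status
    captured["headers"] = headers


result = application({"PATH_INFO": "/demo"}, fake_start_response)
print(captured["status"], b"".join(result))
```

Django's handler follows the same contract, except that it is an object with a __call__ method rather than a plain function.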
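The run function in django/core/servers/basehttp.py first creates a TCP-based server that plays the role Apache or nginx would normally play as the web server: it listens on the port. (A web server's job is to listen on a port, receive and manage request data, and send back the response data.) Web servers such as Apache and nginx are application-layer programs that transfer data over HTTP, and underneath they are still TCP socket servers, so the built-in development server uses a socket server as its web server. The source is as follows: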
def run(addr, port, wsgi_handler, ipv6=False, threading=False):
    server_address = (addr, port)
    if threading:
        httpd_cls = type(str('WSGIServer'), (socketserver.ThreadingMixIn, WSGIServer), {})
    else:
        httpd_cls = WSGIServer
    httpd = httpd_cls(server_address, WSGIRequestHandler, ipv6=ipv6)
    if threading:
        httpd.daemon_threads = True
    httpd.set_app(wsgi_handler)
    httpd.serve_forever()
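It first creates a WSGIServer. One argument deserves special attention here: WSGIRequestHandler. This class is used later, when finish_request() instantiates it to handle a request. The wsgi_handler passed to run is the WSGI application object returned by get_internal_wsgi_application, and it is also used at the very end.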
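The same steps can be tried without Django, using the standard library's wsgiref classes that Django's server builds on. This is only a sketch under assumptions: demo_app, QuietHandler and ThreadedWSGIServer are hypothetical names, and port 0 asks the operating system for any free port.

```python
import http.client
import socketserver
import threading
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer


def demo_app(environ, start_response):
    """Hypothetical stand-in for the Django WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"served by " + threading.current_thread().name.encode("ascii")]


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress the per-request log line to keep the demo quiet


# Same trick as run(): build a threaded server class on the fly with type().
ThreadedWSGIServer = type(
    "ThreadedWSGIServer", (socketserver.ThreadingMixIn, WSGIServer), {}
)

httpd = ThreadedWSGIServer(("127.0.0.1", 0), QuietHandler)
httpd.daemon_threads = True
httpd.set_app(demo_app)

# serve_forever() blocks, so the demo runs it in a background thread.
server_thread = threading.Thread(target=httpd.serve_forever, daemon=True)
server_thread.start()

port = httpd.server_address[1]
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/")
resp = conn.getresponse()
status = resp.status
body = resp.read()
conn.close()

httpd.shutdown()
httpd.server_close()
print(status, body)
```

The thread name in the body shows that, with ThreadingMixIn, each request is handled in its own worker thread rather than in the thread running serve_forever().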
Next comes serve_forever. It uses select to listen, and when a request arrives it calls _handle_request_noblock(). The real work happens in process_request() -> finish_request() -> RequestHandlerClass. Note that RequestHandlerClass is the WSGIRequestHandler passed in when the socket server was created, that is, the WSGIRequestHandler class in django/core/servers/basehttp.py, the second argument of httpd_cls(server_address, WSGIRequestHandler, ipv6=ipv6) in the run source. When WSGIRequestHandler is instantiated, it calls its own handle function. Why? Let's look at the source.
class BaseRequestHandler:
    """Base class for request handler classes."""
    def __init__(self, request, client_address, server):
        self.request = request
        self.client_address = client_address
        self.server = server
        self.setup()  # important: we come back to this later
        try:
            self.handle()  # important
        finally:
            self.finish()  # important
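The effect of that constructor can be observed directly. In this small sketch, RecordingHandler is a hypothetical subclass, and no real socket is involved because the base class only stores its three arguments.

```python
import socketserver

calls = []


class RecordingHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler that records which hooks the constructor calls."""

    def setup(self):
        calls.append("setup")

    def handle(self):
        calls.append("handle")

    def finish(self):
        calls.append("finish")


# Merely instantiating the class runs setup(), handle() and finish() in order.
RecordingHandler(request=object(), client_address=("127.0.0.1", 0), server=None)
print(calls)
```

So "handling a request" in socketserver simply means constructing the handler object.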
The source quoted above is the BaseRequestHandler class, whose __init__ calls handle. What does that have to do with WSGIRequestHandler? From its inheritance chain we can see that WSGIRequestHandler ultimately derives from BaseRequestHandler, so instantiating WSGIRequestHandler ends with a call to handle. The source of WSGIRequestHandler's handle function is as follows:
def handle(self):
    """Copy of WSGIRequestHandler, but with different ServerHandler"""
    self.raw_requestline = self.rfile.readline(65537)
    # ... middle part omitted ...
    handler = ServerHandler(
        self.rfile, self.wfile, self.get_stderr(), self.get_environ()
    )
    handler.request_handler = self  # backpointer for logging
    handler.run(self.server.get_app())
handle calls ServerHandler.run. ServerHandler inherits run from BaseHandler, and the source of run is as follows:
def run(self, application):
    """Invoke the application"""
    # ... middle part omitted ...
    self.setup_environ()
    self.result = application(self.environ, self.start_response)
    self.finish_response()
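The key line in run is self.result = application(self.environ, self.start_response). Here application is the wsgi_handler that was passed in. It is an instance that defines __call__, so application() invokes its __call__ method. From this point on, the flow is the same as Django's URL request handling in a production deployment, which finally returns a response. __call__ then calls its start_response argument to validate and prepare the response headers, and finally returns the response, which is stored in self.result on the handler (ServerHandler, a BaseHandler subclass) by the line self.result = application(self.environ, self.start_response). For details see "An Analysis of the Django Request Handling Flow in Production" (www.jianshu.com/writer#/notebooks/14133407/notes/14482608).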
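The three steps of run (set up environ, call the application, write the result) can be watched in isolation by giving wsgiref's SimpleHandler in-memory streams instead of socket files. This is a sketch, not Django's code path: demo_app is a hypothetical application, and the environ is filled with wsgiref's testing defaults.

```python
import io
from wsgiref.handlers import SimpleHandler
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
    """Hypothetical application standing in for Django's WSGI handler."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


environ = {}
setup_testing_defaults(environ)  # fills in REQUEST_METHOD, wsgi.input, etc.

stdout = io.BytesIO()  # plays the role of self.wfile
handler = SimpleHandler(io.BytesIO(), stdout, io.StringIO(), environ)

# run() performs setup_environ(), application(...), finish_response().
handler.run(demo_app)

raw = stdout.getvalue()
print(raw.decode("iso-8859-1"))
```

Everything the application returned, plus the headers prepared through start_response, ends up in the stdout stream; in the development server that stream is the socket's write file.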
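---------------------------------------------Key points--------------------------------------------------#
In other words, everything before self.result = application(self.environ, self.start_response) is preparation for handling the HTTP request. application(self.environ, self.start_response) is where the request is really handled: it goes through the middleware first, and then reaches the view you wrote.
Finally, self.finish_response() [a socket underneath] returns the result produced by application to the HTTP client.
In a real production environment, the work done by self.setup_environ() and self.finish_response() is handled by a WSGI server such as uWSGI together with a web server such as nginx. What we actually write is the request-handling logic inside a framework like Django.
---------------------------------------------Key points--------------------------------------------------#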
Next, look at the source of self.finish_response:
def finish_response(self):
    """Send any iterable data, then close self and the iterable

    Subclasses intended for use in asynchronous servers will
    want to redefine this method, such that it sets up callbacks
    in the event loop to iterate over the data, and to call
    'self.close()' once the response is finished.
    """
    try:
        if not self.result_is_file() or not self.sendfile():
            for data in self.result:
                self.write(data)
            self.finish_content()
    finally:
        self.close()
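Now for the important part: what is self.result? Look back at the source of ServerHandler.run: self.result is simply the response that was returned. And what does self.write do? self.write ends up calling self._write, so analysing self._write is enough. To understand self._write, go back to the handle function of WSGIRequestHandler. Here is the source again: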
def handle(self):
    """Copy of WSGIRequestHandler, but with different ServerHandler"""
    self.raw_requestline = self.rfile.readline(65537)
    # ... middle part omitted ...
    handler = ServerHandler(
        self.rfile, self.wfile, self.get_stderr(), self.get_environ()
    )
    handler.request_handler = self  # backpointer for logging
    handler.run(self.server.get_app())
Look at the ServerHandler class here. It inherits from class SimpleHandler(BaseHandler), and SimpleHandler overrides _write(self, data):
def _write(self,data):
    self.stdout.write(data)
    self._write = self.stdout.write
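The last line of that excerpt rebinds the method on the instance, so later calls go straight to the stream's write. The sketch below reproduces only that pattern with a hypothetical Writer class; newer versions of wsgiref may implement _write differently, so treat it as an illustration of the quoted excerpt rather than of the current library.

```python
import io


class Writer:
    """Hypothetical class mirroring the self-rebinding pattern in _write."""

    def __init__(self, stdout):
        self.stdout = stdout
        self.method_calls = 0  # how often the class-level _write actually ran

    def _write(self, data):
        self.method_calls += 1
        self.stdout.write(data)
        # Store the stream's bound write as an instance attribute; it shadows
        # the class-level method for every later call on this instance.
        self._write = self.stdout.write


buf = io.BytesIO()
writer = Writer(buf)
writer._write(b"a")  # runs Writer._write
writer._write(b"b")  # now calls buf.write directly
writer._write(b"c")
print(buf.getvalue(), writer.method_calls)
```

Either way the data lands in self.stdout, which is why the next question is what stdout really is.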
The next step is to work out what self.stdout is. Remember the arguments self.rfile and self.wfile passed to ServerHandler? They end up in class SimpleHandler(BaseHandler), whose __init__ is as follows:
def __init__(self,stdin,stdout,stderr,environ,
    multithread=True, multiprocess=False
):
    self.stdin = stdin  # self.rfile
    self.stdout = stdout  # self.wfile
    self.stderr = stderr
    self.base_env = environ
    self.wsgi_multithread = multithread
    self.wsgi_multiprocess = multiprocess
So we just need to track down where self.rfile and self.wfile come from. Remember the self.setup() call flagged earlier in BaseRequestHandler.__init__? Here is the source of setup:
class StreamRequestHandler(BaseRequestHandler):
    def setup(self):
        self.connection = self.request
        if self.timeout is not None:
            self.connection.settimeout(self.timeout)
        if self.disable_nagle_algorithm:
            self.connection.setsockopt(socket.IPPROTO_TCP,
                                       socket.TCP_NODELAY, True)
        self.rfile = self.connection.makefile('rb', self.rbufsize)
        self.wfile = self.connection.makefile('wb', self.wbufsize)
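So this is where self.rfile and self.wfile come from, and self.connection = self.request. But what is self.request?
Remember the serve_forever function of class BaseServer? If not, go back to the last line of the run function quoted from basehttp.py, httpd.serve_forever(). Its source is as follows: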
def serve_forever(self, poll_interval=0.5):
    """Handle one request at a time until shutdown.

    Polls for shutdown every poll_interval seconds. Ignores
    self.timeout. If you need to do periodic tasks, do them in
    another thread.
    """
    self.__is_shut_down.clear()
    try:
        while not self.__shutdown_request:
            # XXX: Consider using another file descriptor or
            # connecting to the socket to wake this up instead of
            # polling. Polling reduces our responsiveness to a
            # shutdown request and wastes cpu at all other times.
            r, w, e = _eintr_retry(select.select, [self], [], [],
                                   poll_interval)
            if self in r:
                self._handle_request_noblock()  # important
    finally:
        self.__shutdown_request = False
        self.__is_shut_down.set()
As you can see, it keeps using select to watch for incoming client connections on the socket. When one arrives, it calls self._handle_request_noblock(), whose source is as follows:
def _handle_request_noblock(self):
    """Handle one request, without blocking.

    I assume that select.select has returned that the socket is
    readable before this function was called, so there should be
    no risk of blocking in get_request().
    """
    try:
        request, client_address = self.get_request()  # important
    except socket.error:
        return
    if self.verify_request(request, client_address):
        try:
            self.process_request(request, client_address)  # important
        except:
            self.handle_error(request, client_address)
            self.shutdown_request(request)
self.get_request() actually resolves to get_request() of class TCPServer(BaseServer), which is just socket.accept():
class TCPServer(BaseServer):
    def get_request(self):
        """Get the request and client address from the socket.

        May be overridden.
        """
        return self.socket.accept()
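The select-then-accept sequence used by serve_forever and get_request can be reproduced with plain sockets. This is a minimal sketch on the loopback interface; the variable names are chosen for illustration, and port 0 again lets the operating system pick a free port.

```python
import select
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

# Nothing has connected yet, so select times out and reports nothing readable.
readable_before, _, _ = select.select([listener], [], [], 0.1)

client = socket.create_connection(("127.0.0.1", port), timeout=5)

# A pending connection makes the listening socket "readable".
readable_after, _, _ = select.select([listener], [], [], 5)

# This is what TCPServer.get_request() returns: (connection socket, address).
request, client_address = listener.accept()
print(type(request).__name__, client_address)

request.close()
client.close()
listener.close()
```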
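So request here is a socket connection and client_address is the client's address. Now the whole chain is clear:
self.wfile = self.connection.makefile('wb', self.wbufsize), self.connection is the request argument, and request is the return value of get_request(), that is, the connection returned by socket.accept(). So self.write(data) writes the response to the socket, and the socket delivers it to the client.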
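How makefile turns a connection into rfile and wfile can be shown with a connected socket pair. This is a sketch only: socket.socketpair() stands in for the accepted connection, and the buffer sizes -1 and 0 are the default rbufsize and wbufsize of StreamRequestHandler.

```python
import socket

# server_side plays the role of the socket returned by accept().
server_side, client_side = socket.socketpair()

rfile = server_side.makefile("rb", -1)  # buffered reader, like self.rfile
wfile = server_side.makefile("wb", 0)   # unbuffered writer, like self.wfile

# The client sends a request line; the server reads it through rfile.
client_side.sendall(b"GET / HTTP/1.0\r\n")
request_line = rfile.readline(65537)

# Writing to wfile is what self.write(data) ultimately does.
wfile.write(b"HTTP/1.0 200 OK\r\n\r\nhi")

# Close the server side so the client sees end-of-stream after the response.
wfile.close()
rfile.close()
server_side.close()

received = b""
while True:
    chunk = client_side.recv(1024)
    if not chunk:
        break
    received += chunk
client_side.close()
print(request_line, received)
```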
One more note: the WSGIRequestHandler class is instantiated in the function below, where RequestHandlerClass is WSGIRequestHandler.
def finish_request(self, request, client_address):
    """Finish one request by instantiating RequestHandlerClass."""
    self.RequestHandlerClass(request, client_address, self)
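Putting the socket-level pieces together, the sketch below runs a real socketserver.TCPServer for exactly one request. UpperHandler is a hypothetical handler, and handle_request() is used in place of serve_forever() so that the demo stops by itself.

```python
import socket
import socketserver
import threading


class UpperHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: reads one line and echoes it back in upper case."""

    def handle(self):
        # rfile and wfile were created by StreamRequestHandler.setup().
        line = self.rfile.readline()
        self.wfile.write(line.upper())


# The handler class is passed in here and stored as RequestHandlerClass.
server = socketserver.TCPServer(("127.0.0.1", 0), UpperHandler)
port = server.server_address[1]

# handle_request() serves a single request: accept -> process_request ->
# finish_request -> UpperHandler(request, client_address, server).
worker = threading.Thread(target=server.handle_request)
worker.start()

with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"hello\n")
    with client.makefile("rb") as reader:
        reply = reader.readline()

worker.join(timeout=5)
server.server_close()
print(reply)
```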
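Well, it's already 23:18 and my head is spinning too... Next time I'll try to draw a diagram.
To repeat one point: what makes this hard to follow is the multiple layers of class inheritance and the overridden methods.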
The function call flow is as follows:
1. handle in runserver.py -> self.run -> self.inner_run -> run in basehttp.py -> WSGIServer.serve_forever -> WSGIServer._handle_request_noblock()
[This stage covers the socket flow, which corresponds to what nginx does in production.]
2. self.process_request -> finish_request -> WSGIRequestHandler.__init__ -> WSGIRequestHandler.setup, WSGIRequestHandler.handle -> ServerHandler inherits from SimpleHandler, so SimpleHandler.__init__ is called (it initialises stdin and stdout with the socket file objects)
[This stage does what the uWSGI service does in production: it receives the message from the socket.]
3. ServerHandler.run -> (self.setup_environ(), self.result = application(self.environ, self.start_response), self.finish_response())
[This is where the Django framework is really called to handle the request.]
The inheritance relationships of the objects in this analysis are as follows (each class inherits from the one to its right; the methods it contributes are in parentheses):
WSGIRequestHandler (handle) <- simple_server.WSGIRequestHandler <- BaseHTTPRequestHandler <- SocketServer.StreamRequestHandler (setup, finish) <- BaseRequestHandler (__init__)
ServerHandler <- simple_server.ServerHandler <- SimpleHandler (__init__, _write, _flush) <- BaseHandler (run, write)
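Since the inheritance chain is the hard part, it can help to ask Python which class supplies each method. The sketch below inspects only the standard-library wsgiref classes that Django's versions derive from (in Python 3 the module is named socketserver); chain and defining_class are hypothetical helper names.

```python
from wsgiref import simple_server


def chain(cls):
    """Return the class names in method resolution order, without object."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


def defining_class(cls, name):
    """Return the name of the first class in the MRO that defines `name`."""
    for c in cls.__mro__:
        if name in vars(c):
            return c.__name__
    return None


handler_chain = chain(simple_server.WSGIRequestHandler)
server_handler_chain = chain(simple_server.ServerHandler)
print(" <- ".join(handler_chain))
print(" <- ".join(server_handler_chain))

# Which class actually provides each method discussed in this article?
for cls, names in [
    (simple_server.WSGIRequestHandler, ["handle", "setup"]),
    (simple_server.ServerHandler, ["run", "_write"]),
]:
    for name in names:
        print(cls.__name__, name, "->", defining_class(cls, name))
```

The same two helpers can be pointed at the classes in django.core.servers.basehttp in a project where Django is installed.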