A First Look at Network Programming

The socket module: server/client operations

If you want to write a program in Python in which a server and a client communicate, the standard library's socket module is the tool to reach for; it supports several network protocols, including TCP/IP, UDP/IP, and ICMP.

  • The most basic building block of networking is the socket, whose job is to establish a channel for exchanging information between two machines or processes.
  • A socket program involves two sockets: one on the server side and one on the client side. The program creates the server socket and has it wait for client connections, listening on a given IP address and port.
    • Handling a client socket is usually somewhat easier than handling a server socket, because the server must be ready to accept client connections at any time and may have to juggle several connections at once.
    • The client, by contrast, only needs to be given the server's IP address and port to do its job.

Two key socket methods, send and recv, are used to transfer data.
You call send with a bytes object to transmit data (in Python 3, a str must be encoded first), and you call recv with the maximum number of bytes you are willing to receive; a short sketch follows.
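A minimal sketch of send()/recv(), using socket.socketpair(), which returns two already-connected sockets so no server setup is needed (available on Unix and, since Python 3.5, on Windows):

import socket

a, b = socket.socketpair()   # two connected sockets
a.send(b'hello')             # data must be bytes in Python 3
print(b.recv(1024))          # b'hello'; 1024 is the maximum number of bytes to read
a.close()
b.close()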

For more information on the socket module, see the Python standard library documentation: socket — Low-level networking interface.

Docstring:  
This module provides socket operations and some related functions.
On Unix, it supports IP (Internet Protocol) and Unix domain sockets.
On other systems, it only supports IP. Functions specific for a
socket are available as methods of the socket object.

The socket.socket() function creates a socket:

import socket
# Create a TCP/IP socket
socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Create a UDP/IP socket
socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
<socket.socket fd=388, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0>

Commonly used socket functions

Function  Description
socket() create a new socket object for the given protocol type (TCP/UDP)
connect() actively initiate a connection to the passive side, triggering the TCP three-way handshake (initiates the connection to a TCP server; raises an exception on error)
bind() bind an address (host and port) to the socket (TCP/UDP)
listen() turn the socket into a passive (listening) socket and specify the maximum number of queued TCP connections
accept() return the next completed TCP connection; passively accepts a TCP client connection and blocks until one arrives
connect_ex() like connect(), but returns an error code on failure instead of raising
recv() receive TCP data
send() send TCP data
sendall() send TCP data in its entirety
recvfrom() receive UDP data
sendto() send UDP data
socketpair() create a pair of new socket objects [*]
fromfd() create a socket object from an open file descriptor [*]
fromshare() create a socket object from data received from socket.share() [*]
gethostname() return the current hostname
gethostbyname() map a hostname to its IP number
gethostbyaddr() map an IP number or hostname to DNS info
getservbyname() map a service name and a protocol name to a port number
getprotobyname() map a protocol name (e.g. 'tcp') to a number
ntohs(), ntohl() convert 16, 32 bit int from network to host byte order
htons(), htonl() convert 16, 32 bit int from host to network byte order
inet_aton() convert IP addr string (123.45.67.89) to 32-bit packed format
inet_ntoa() convert 32-bit packed format IP to string (123.45.67.89)
socket.getdefaulttimeout() get the default timeout value
socket.setdefaulttimeout() set the default timeout value
create_connection() connects to an address, with an optional timeout and optional source address
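A minimal sketch of how the calls in the table above are typically sequenced (server: socket → bind → listen → accept; client: socket → connect); the loopback address and OS-chosen port are only for illustration:

import socket

# Server side: create, bind, listen.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))            # port 0 lets the OS pick a free port
server.listen(1)
host, port = server.getsockname()

# Client side: create, then connect to the listening address.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))

conn, addr = server.accept()             # returns immediately: the connection is already queued
client.sendall(b'ping')
print(conn.recv(1024))                   # b'ping'
conn.close()
client.close()
server.close()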

A port here is not a physical connector or a hardware register: in TCP/IP networking, a port is a 16-bit number that, together with an IP address, identifies one endpoint of a connection, which is what allows a single host to run many network services at once. Well-known services conventionally use fixed port numbers, as the lookup below shows.
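For example, socket.getservbyname() looks up the standard port for a well-known service in the system services database:

import socket

print(socket.getservbyname('http', 'tcp'))    # 80
print(socket.getservbyname('https', 'tcp'))   # 443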

To see which ports your own machine is using (on Windows):

  1. Press Win+X and choose "Command Prompt (Admin)";
  2. If you open a plain (non-administrator) Command Prompt instead, some of the later steps may fail;
  3. In the console, run "netstat" first to see basic connection statistics; the number after the colon in each address is the port;
  4. "netstat -nao" adds a PID column on the right, so you can kill the owning process directly from the command line;
  5. "netstat -nab" shows detailed information about connections, port usage, and the programs involved;
  6. Once you spot suspicious ports or programs, you can end the process tree and investigate further;
  7. For ongoing monitoring and control of port usage you will need third-party tools (network-management software such as 聚生网管), which make it quick and intuitive to watch and restrict ports; a simple programmatic check from Python is sketched after this list.
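As a quick alternative to netstat for a single port, connect_ex() (from the table above) returns 0 when a TCP connection succeeds instead of raising an exception, so it is handy for checking whether anything is listening on a port; the port number below is only an example:

import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0   # 0 means connected; anything else is an error code

print(port_open('127.0.0.1', 13014))   # True only if something is listening on that port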
# Server-side socket usage
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create a TCP socket
sock.bind(('localhost', 13014))    # bind the address and port
sock.listen(5)                     # start listening (backlog of 5)
while True:
    connection, address = sock.accept()   # blocks until a client connects
    print('client ip is ')                # print the client address
    print(address)
    try:
        connection.settimeout(5)          # set a timeout on this connection
        buf = connection.recv(1024)       # receive up to 1024 bytes
        if buf == b'1':                   # recv() returns bytes in Python 3
            connection.send(b'welcome to python server!')    # send data
        else:
            connection.send(b'please go out!')                # send data
    except socket.timeout:                # note: socket.timeout, not sock.timeout
        print('time out')
    finally:
        connection.close()

# Client-side socket usage
import socket
import time

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create a TCP socket
sock.connect(('localhost', 13014))    # connect to the server's address and port
time.sleep(2)                         # pause briefly
sock.send(b'1')                       # send data (bytes in Python 3)
print(sock.recv(1024))                # print the server's reply
sock.close()

The urllib module

In Python 3, urllib2 no longer exists as a separate module (import urllib2 simply reports that the module is missing); its functionality was merged into urllib, chiefly as urllib.request and urllib.error.

The urllib package is divided into urllib.request, urllib.parse, urllib.error, and urllib.response.
For example (see the import sketch below):

  • urllib2.urlopen() became urllib.request.urlopen()
  • urllib2.Request() became urllib.request.Request()
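In practice the Python 3 imports look like this (a minimal sketch; the URL and header are only illustrative):

from urllib.request import urlopen, Request
from urllib.error import URLError, HTTPError

req = Request('http://www.python.org/', headers={'User-Agent': 'Mozilla/5.0'})
try:
    with urlopen(req, timeout=10) as resp:
        print(resp.status)        # HTTP status code, e.g. 200
except HTTPError as e:            # HTTP-level errors (4xx/5xx)
    print(e.code)
except URLError as e:             # network-level failures (DNS, refused connection, ...)
    print(e.reason)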

urllib.request

Type:        module
Docstring:  
An extensible library for opening URLs using a variety of protocols

The simplest way to use this module is to call the urlopen function,
which accepts a string containing a URL or a Request object (described
below).  It opens the URL and returns the results as file-like
object; the returned object has some extra methods described below.

The OpenerDirector manages a collection of Handler objects that do
all the actual work.  Each Handler implements a particular protocol or
option.  The OpenerDirector is a composite object that invokes the
Handlers needed to open the requested URL.  For example, the
HTTPHandler performs HTTP GET and POST requests and deals with
non-error returns.  The HTTPRedirectHandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
deals with digest authentication.

urlopen(url, data=None) -- Basic usage is the same as original
urllib.  pass the url and optionally data to post to an HTTP URL, and
get a file-like object back.  One difference is that you can also pass
a Request instance instead of URL.  Raises a URLError (subclass of
OSError); for HTTP errors, raises an HTTPError, which can also be
treated as a valid response.

build_opener -- Function that creates a new OpenerDirector instance.
Will install the default handlers.  Accepts one or more Handlers as
arguments, either instances or Handler classes that it will
instantiate.  If one of the argument is a subclass of the default
handler, the argument will be installed instead of the default.

install_opener -- Installs a new opener as the default opener.

objects of interest:

OpenerDirector -- Sets up the User Agent as the Python-urllib client and manages
the Handler classes, while dealing with requests and responses.

Request -- An object that encapsulates the state of a request.  The
state can be as simple as the URL.  It can also include extra HTTP
headers, e.g. a User-Agent.

BaseHandler --

internals:
BaseHandler and parent
_call_chain conventions

Example usage:

import urllib.request

# set up authentication info
authinfo = urllib.request.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
                      uri='https://mahler:8092/site-updates.py',
                      user='klem',
                      passwd='geheim$parole')

proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)

# install it
urllib.request.install_opener(opener)

f = urllib.request.urlopen('http://www.python.org/')

Getting started with requests (via urllib.request)

urllib.request.urlopen

Signature: urllib.request.urlopen(url, data=None, timeout=<object object at 0x000002BE3FA59760>, *, cafile=None, capath=None, cadefault=False, context=None)
Docstring:
Open the URL url, which can be either a string or a Request object.

*data* must be an object specifying additional data to be sent to
the server, or None if no such data is needed.  See Request for
details.

urllib.request module uses HTTP/1.1 and includes a "Connection:close"
header in its HTTP requests.

The optional *timeout* parameter specifies a timeout in seconds for
blocking operations like the connection attempt (if not specified, the
global default timeout setting will be used). This only works for HTTP,
HTTPS and FTP connections.

If *context* is specified, it must be a ssl.SSLContext instance describing
the various SSL options. See HTTPSConnection for more details.

The optional *cafile* and *capath* parameters specify a set of trusted CA
certificates for HTTPS requests. cafile should point to a single file
containing a bundle of CA certificates, whereas capath should point to a
directory of hashed certificate files. More information can be found in
ssl.SSLContext.load_verify_locations().

The *cadefault* parameter is ignored.

This function always returns an object which can work as a context
manager and has methods such as

* geturl() - return the URL of the resource retrieved, commonly used to
  determine if a redirect was followed

* info() - return the meta-information of the page, such as headers, in the
  form of an email.message_from_string() instance (see Quick Reference to
  HTTP Headers)

* getcode() - return the HTTP status code of the response.  Raises URLError
  on errors.

For HTTP and HTTPS URLs, this function returns a http.client.HTTPResponse
object slightly modified. In addition to the three new methods above, the
msg attribute contains the same information as the reason attribute ---
the reason phrase returned by the server --- instead of the response
headers as it is specified in the documentation for HTTPResponse.

For FTP, file, and data URLs and requests explicitly handled by legacy
URLopener and FancyURLopener classes, this function returns a
urllib.response.addinfourl object.

Note that None may be returned if no handler handles the request (though
the default installed global OpenerDirector uses UnknownHandler to ensure
this never happens).

In addition, if proxy settings are detected (for example, when a *_proxy
environment variable like http_proxy is set), ProxyHandler is default
installed and makes sure the requests are handled through the proxy.

Type:      function

urlopen creates a file-like object representing the remote URL; you then read the remote data from it just as you would from a local file.

  • The url parameter is the path of the remote resource, usually a URL. For more complex operations, such as modifying HTTP headers, create a Request instance and pass it as the url argument;
  • The data parameter is data to POST to the url; it must be URL-encoded (and, in Python 3, converted to bytes);
  • timeout is an optional timeout in seconds.

Return value: a file-like object.
The object provides these methods:
read(), readline(), readlines(), fileno(), close(): these behave exactly as they do on ordinary file objects

Method  Description
read([bytes]) read all remaining data from the object, or at most bytes bytes
readline() read a single line as a bytes string
readlines() read all input lines and return them as a list
fileno() return the integer file descriptor
close() close the connection
info() return the headers sent by the remote server as an http.client.HTTPMessage mapping object
geturl() return the real URL ("real" because, for redirected URLs, the URL after redirection is returned)
getcode() return the HTTP response code as an integer

Fetching an HTML page

import urllib.request
response = urllib.request.urlopen('http://www.cnblogs.com/linxiyue/p/3537486.html')
response.getcode()
200
response.geturl()
'http://www.cnblogs.com/linxiyue/p/3537486.html'
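The body returned by read() is bytes, so to get the HTML as text it is usually decoded (a sketch, assuming the page is UTF-8 encoded):

import urllib.request

with urllib.request.urlopen('http://www.cnblogs.com/linxiyue/p/3537486.html') as resp:
    html = resp.read().decode('utf-8')   # bytes -> str, assuming UTF-8
print(html[:200])                        # first 200 characters of the page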

urllib.request.urlretrieve

Signature: urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)
Docstring:
Retrieve a URL into a temporary location on disk.

Requires a URL argument. If a filename is passed, it is used as
the temporary file location. The reporthook argument should be
a callable that accepts a block number, a read size, and the
total file size of the URL target. The data argument should be
valid URL encoded data.

If a filename is passed and the URL points to a local resource,
the result is a copy from local file to new file.

Returns a tuple containing the path to the newly created
data file as well as the resulting HTTPMessage object.
Type:      function

Parameters

  • filename: the local path to save to (if this parameter is omitted, urllib generates a temporary file to hold the data);
  • reporthook: a callback invoked when the connection is established and each time a data block finishes transferring; it can be used to display download progress;
  • data: data to POST to the server. The method returns a (filename, headers) tuple, where filename is the path the data was saved to locally and headers is the server's response headers.

Example:

Fetch the Bing page's HTML and save it locally as E://bing_images.html; the second snippet below also shows download progress.

import urllib.request
url = 'https://cn.bing.com/images/trending?form=Z9LH'
local = 'e://bing_images.html'
urllib.request.urlretrieve(url, local)
('e://bing_images.html', <http.client.HTTPMessage at 0x1e788975940>)
# urlretrieve() download example that reports progress through a callback
import urllib.request
import os

def Schedule(a, b, c):
    '''
    Progress callback for urlretrieve
    @a: number of data blocks downloaded so far
    @b: size of each block in bytes
    @c: total size of the remote file in bytes
    '''
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print('%.2f%%' % per)

url = 'http://pic.7y7.com/Uploads/Former/20154/2015040338924433_0_0_water.jpg'
local = os.path.join(r'E:\图片', 'water.jpg')
urllib.request.urlretrieve(url, local, Schedule)
0.00%
5.64%
11.28%
16.92%
22.56%
28.20%
33.84%
39.48%
45.12%
50.76%
56.40%
62.05%
67.69%
73.33%
78.97%
84.61%
90.25%
95.89%
100.00%





('E:\\图片\\water.jpg', <http.client.HTTPMessage at 0x1e7882a0048>)

These exercises show that urlopen() makes it easy to fetch a remote HTML page; you can then use Python regular expressions to pick out the data you need and use urlretrieve() to download it locally. For remote URLs with access restrictions or connection limits you can go through a proxy, and if the remote data is large and a single-threaded download is too slow you can download with multiple threads. That, in essence, is the fabled web crawler.
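A minimal sketch of that idea (the page URL and regular expression are purely illustrative; a real page usually calls for a more careful pattern or an HTML parser):

import os
import re
import urllib.request

page_url = 'https://example.com/gallery.html'          # hypothetical page
with urllib.request.urlopen(page_url) as resp:
    html = resp.read().decode('utf-8', errors='replace')

# Very rough pattern for absolute .jpg links; adjust it to the real page structure.
img_urls = re.findall(r'https?://[^"\']+\.jpg', html)

os.makedirs('downloads', exist_ok=True)
for i, img in enumerate(img_urls):
    local = os.path.join('downloads', 'img_%d.jpg' % i)
    urllib.request.urlretrieve(img, local)              # download each matched image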

urllib.request.quote

Signature: urllib.request.quote(string, safe='/', encoding=None, errors=None)
Docstring:
quote('abc def') -> 'abc%20def'

Each part of a URL, e.g. the path info, the query, etc., has a
different set of reserved characters that must be quoted.

RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists
the following reserved characters.

reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
              "$" | ","

Each of these characters is reserved in some component of a URL,
but not necessarily in all of them.

By default, the quote function is intended for quoting the path
section of a URL.  Thus, it will not encode '/'.  This character
is reserved, but in typical usage the quote function is being
called on a path where the existing slash characters are used as
reserved characters.

string and safe may be either str or bytes objects. encoding and errors
must not be specified if string is a bytes object.

The optional encoding and errors parameters specify how to deal with
non-ASCII characters, as accepted by the str.encode method.
By default, encoding='utf-8' (characters are encoded with UTF-8), and
errors='strict' (unsupported characters raise a UnicodeEncodeError).

Type:      function

urllib.request.unquote

Signature: urllib.request.unquote(string, encoding='utf-8', errors='replace')
Docstring:
Replace %xx escapes by their single-character equivalent. The optional
encoding and errors parameters specify how to decode percent-encoded
sequences into Unicode characters, as accepted by the bytes.decode()
method.
By default, percent-encoded sequences are decoded with UTF-8, and invalid
sequences are replaced by a placeholder character.

unquote('abc%20def') -> 'abc def'.
Type:      function
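A quick example of quoting and unquoting (these functions live in urllib.parse and are re-exported by urllib.request):

from urllib.parse import quote, unquote

print(quote('abc def'))          # 'abc%20def'
print(quote('价格', safe=''))     # non-ASCII text is UTF-8 encoded: '%E4%BB%B7%E6%A0%BC'
print(unquote('abc%20def'))      # 'abc def'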

urllib.request.urlopen(url).read().decode

Signature: resp.decode(encoding='utf-8', errors='strict')
Docstring:
Decode the bytes using the codec registered for encoding.

encoding
  The encoding with which to decode the bytes.
errors
  The error handling scheme to use for the handling of decoding errors.
  The default is 'strict' meaning that decoding errors raise a
  UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
  as well as any other name registered with codecs.register_error that
  can handle UnicodeDecodeErrors.
Type:      builtin_function_or_method

urllib.parse.urlencode

Signature: urllib.parse.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=<function quote_plus at 0x000002BE403499D8>)
Docstring:
Encode a dict or sequence of two-element tuples into a URL query string.

If any values in the query arg are sequences and doseq is true, each
sequence element is converted to a separate parameter.

If the query arg is a sequence of two-element tuples, the order of the
parameters in the output will match the order of parameters in the
input.

The components of a query arg may each be either a string or a bytes type.

The safe, encoding, and errors parameters are passed down to the function
specified by quote_via (encoding and errors only if a component is a str).
Type: function
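A short example of building a query string from a dict, which is the usual way to prepare GET parameters or POST data (the endpoint and parameter names are hypothetical):

from urllib.parse import urlencode

params = urlencode({'q': 'python socket', 'page': 2})
print(params)                                    # 'q=python+socket&page=2'

# For a GET request, append the query string to the URL:
url = 'https://example.com/search?' + params     # hypothetical endpoint

# For a POST request, encode the same string to bytes and pass it as data:
# urllib.request.urlopen('https://example.com/search', data=params.encode('ascii'))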

import urllib.request
from urllib.error import HTTPError

try:
    web = urllib.request.urlopen('http://www.python.org/')
    resp = web.read()
except HTTPError as e:     # an HTTPError can still be read like a response
    resp = e.read()
web
<http.client.HTTPResponse at 0x2be438c1e48>

Proxies

import urllib.request

enable_proxy = True
# Build one ProxyHandler that routes through a proxy and one that uses no proxy
proxy_handler = urllib.request.ProxyHandler({'http': 'http://proxy-host:8080'})   # placeholder proxy address
null_proxy_handler = urllib.request.ProxyHandler({})
# Build an opener from whichever handler we want; the if decides whether the proxy is enabled
if enable_proxy:
    opener = urllib.request.build_opener(proxy_handler)
else:
    opener = urllib.request.build_opener(null_proxy_handler)
# Install the opener as the global opener used by urlopen
urllib.request.install_opener(opener)

UDP programming & TCP programming

See the book 《Python程序设计与实现》 for details.

(The original article presented this material as screenshots, which are not reproduced here.)
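In their place, here is a minimal, hedged sketch of the UDP pattern: SOCK_DGRAM sockets exchange datagrams with sendto()/recvfrom(), and no connection is established (the port number is arbitrary):

import socket

# UDP "server": bind to an address and wait for datagrams.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(('127.0.0.1', 13015))

# UDP "client": no connect() needed, just send a datagram to the address.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b'hello udp', ('127.0.0.1', 13015))

data, addr = server.recvfrom(1024)    # returns (bytes, sender address)
print(data, addr)                     # b'hello udp' plus the client's address
server.close()
client.close()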
