This article is a set of notes I put together after reading the official Etcd client v3 code, combined with my own understanding. Corrections are welcome wherever my understanding is inaccurate or incomplete.
Etcd version information
- Repository: https://github.com/etcd-io/etcd
- Version: v3.3.18
Etcd client implementation
Overview
- Etcd client v3 is built on gRPC, which in turn is built on HTTP/2. The client therefore reuses the gRPC framework for address management, connection management, load balancing, and so on, while at the bottom it only needs to maintain a single HTTP/2 connection to each Etcd server.
- Etcd client v3 wraps a gRPC ClientConn and implements a Resolver and a Balancer to manage it and interact with it.
- Etcd client v3 implements the gRPC Resolver interface for managing Etcd server addresses. When the client is initialized or the server cluster addresses change (periodic address refresh can be configured), the Resolver resolves the new connection addresses and notifies the gRPC ClientConn to react to the change.
- Etcd client v3 implements the gRPC Balancer interface for connection lifecycle management and load balancing. When the ClientConn or one of its SubConns changes state, the Balancer is notified through its interface. The Balancer then produces a snapshot view, a Picker, for the ClientConn to pick an available SubConn from.
- The underlying SubConn in turn wraps an HTTP/2 connection, the ClientTransport. The ClientTransport maintains a heartbeat with the server; when the connection drops it notifies the layer above, so the ClientConn can tell the Balancer to rebuild the connection.
- In short, the Balancer and Resolver are like the ClientConn's hands, with their concrete implementations supplied by the user, while the SubConns are the ClientConn's internal organs, the foundation on which it does its work.
Source code
Resolver
Code path: https://github.com/grpc/grpc-go/blob/v1.26.0/resolver/resolver.go
- The Resolver interface defines the methods the Resolver exposes to the ClientConn
// Resolver watches for the updates on the specified target.
// Updates include address updates and service config updates.
type Resolver interface {
// ResolveNow will be called by gRPC to try to resolve the target name
// again. It's just a hint, resolver can ignore this if it's not necessary.
//
// It could be called multiple times concurrently.
ResolveNow(ResolveNowOptions)
// Close closes the resolver.
Close()
}
- ResolveNow: called by the ClientConn to resolve the target again before creating new connections
- Close: shuts the Resolver down, releasing its goroutines and other resources
- The ClientConn interface defines the methods the ClientConn exposes to the Resolver
// ClientConn contains the callbacks for resolver to notify any updates
// to the gRPC ClientConn.
//
// This interface is to be implemented by gRPC. Users should not need a
// brand new implementation of this interface. For the situations like
// testing, the new implementation should embed this interface. This allows
// gRPC to add new methods to this interface.
type ClientConn interface {
// UpdateState updates the state of the ClientConn appropriately.
UpdateState(State)
// ReportError notifies the ClientConn that the Resolver encountered an
// error. The ClientConn will notify the load balancer and begin calling
// ResolveNow on the Resolver with exponential backoff.
ReportError(error)
// NewAddress is called by resolver to notify ClientConn a new list
// of resolved addresses.
// The address list should be the complete list of resolved addresses.
//
// Deprecated: Use UpdateState instead.
NewAddress(addresses []Address)
// NewServiceConfig is called by resolver to notify ClientConn a new
// service config. The service config should be provided as a json string.
//
// Deprecated: Use UpdateState instead.
NewServiceConfig(serviceConfig string)
// ParseServiceConfig parses the provided service config and returns an
// object that provides the parsed config.
ParseServiceConfig(serviceConfigJSON string) *serviceconfig.ParseResult
}
- UpdateState: when the Resolver's state changes (i.e. re-resolution produced a new set of Addresses), it notifies the ClientConn to handle it
- ReportError: when the Resolver hits an error, it notifies the ClientConn to handle it
- NewAddress: (deprecated, use UpdateState instead)
- NewServiceConfig: (deprecated, use UpdateState instead)
- ParseServiceConfig: parses the service config JSON string
Summary: the Resolver provides a user-defined hook for resolving and updating addresses, so that users can implement their own address resolution, service discovery, address updates, and so on. In Etcd client v3 it mainly does two things: 1. parse an address such as 127.0.0.1:2379 into a form the underlying connection can use; 2. update the configured addresses when the cluster membership changes.
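To make the division of labor concrete, here is a minimal hand-written static resolver against the grpc-go v1.26-era resolver API. It is only an illustrative sketch, not etcd's actual implementation (which lives under clientv3/balancer/resolver/endpoint); the package name, the "static" scheme, and the address list are all made up.

package staticresolver

import "google.golang.org/grpc/resolver"

// staticBuilder builds a resolver that always reports a fixed address list.
type staticBuilder struct {
	addrs []string
}

func (b *staticBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
	r := &staticResolver{cc: cc, addrs: b.addrs}
	r.ResolveNow(resolver.ResolveNowOptions{}) // push the initial address list
	return r, nil
}

func (b *staticBuilder) Scheme() string { return "static" } // made-up scheme

// staticResolver pushes the fixed list to the ClientConn via UpdateState.
type staticResolver struct {
	cc    resolver.ClientConn
	addrs []string
}

func (r *staticResolver) ResolveNow(resolver.ResolveNowOptions) {
	state := resolver.State{}
	for _, a := range r.addrs {
		state.Addresses = append(state.Addresses, resolver.Address{Addr: a})
	}
	// UpdateState is how the resolver tells the ClientConn that the address
	// list changed; gRPC then hands the list to the balancer.
	r.cc.UpdateState(state)
}

func (r *staticResolver) Close() {}

// Register makes the "static" scheme usable in dial targets,
// e.g. grpc.Dial("static:///whatever", ...).
func Register(addrs []string) {
	resolver.Register(&staticBuilder{addrs: addrs})
}

Etcd's endpoint resolver performs essentially the same UpdateState dance, with extra target parsing plus the ResolverGroup bookkeeping that shows up later in newClient.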
Balancer
Code path: https://github.com/grpc/grpc-go/blob/v1.26.0/balancer/balancer.go
- The Balancer interface defines the methods the Balancer exposes to the ClientConn
// Balancer takes input from gRPC, manages SubConns, and collects and aggregates
// the connectivity states.
//
// It also generates and updates the Picker used by gRPC to pick SubConns for RPCs.
//
// HandleSubConnectionStateChange, HandleResolvedAddrs and Close are guaranteed
// to be called synchronously from the same goroutine.
// There's no guarantee on picker.Pick, it may be called anytime.
type Balancer interface {
// HandleSubConnStateChange is called by gRPC when the connectivity state
// of sc has changed.
// Balancer is expected to aggregate all the state of SubConn and report
// that back to gRPC.
// Balancer should also generate and update Pickers when its internal state has
// been changed by the new state.
//
// Deprecated: if V2Balancer is implemented by the Balancer,
// UpdateSubConnState will be called instead.
HandleSubConnStateChange(sc SubConn, state connectivity.State)
// HandleResolvedAddrs is called by gRPC to send updated resolved addresses to
// balancers.
// Balancer can create new SubConn or remove SubConn with the addresses.
// An empty address slice and a non-nil error will be passed if the resolver returns
// non-nil error to gRPC.
//
// Deprecated: if V2Balancer is implemented by the Balancer,
// UpdateClientConnState will be called instead.
HandleResolvedAddrs([]resolver.Address, error)
// Close closes the balancer. The balancer is not required to call
// ClientConn.RemoveSubConn for its existing SubConns.
Close()
}
- HandleSubConnStateChange: called by the ClientConn when the connectivity state of an underlying SubConn changes, so the Balancer can maintain its internal state
- HandleResolvedAddrs: called by the ClientConn when the Resolver has resolved new addresses and called UpdateState; this is where new connections get created, among other work
- Close: shuts the Balancer down, releasing its goroutines and other resources
- The ClientConn interface defines the methods the ClientConn exposes to the Balancer (note that this is the ClientConn interface in balancer.go, different from the one in resolver.go, though both end up implemented by the same struct)
// ClientConn represents a gRPC ClientConn.
//
// This interface is to be implemented by gRPC. Users should not need a
// brand new implementation of this interface. For the situations like
// testing, the new implementation should embed this interface. This allows
// gRPC to add new methods to this interface.
type ClientConn interface {
// NewSubConn is called by balancer to create a new SubConn.
// It doesn't block and wait for the connections to be established.
// Behaviors of the SubConn can be controlled by options.
NewSubConn([]resolver.Address, NewSubConnOptions) (SubConn, error)
// RemoveSubConn removes the SubConn from ClientConn.
// The SubConn will be shutdown.
RemoveSubConn(SubConn)
// UpdateBalancerState is called by balancer to notify gRPC that some internal
// state in balancer has changed.
//
// gRPC will update the connectivity state of the ClientConn, and will call pick
// on the new picker to pick new SubConn.
//
// Deprecated: use UpdateState instead
UpdateBalancerState(s connectivity.State, p Picker)
// UpdateState notifies gRPC that the balancer's internal state has
// changed.
//
// gRPC will update the connectivity state of the ClientConn, and will call pick
// on the new picker to pick new SubConns.
UpdateState(State)
// ResolveNow is called by balancer to notify gRPC to do a name resolving.
ResolveNow(resolver.ResolveNowOptions)
// Target returns the dial target for this ClientConn.
//
// Deprecated: Use the Target field in the BuildOptions instead.
Target() string
}
- NewSubConn: called by the Balancer to create an underlying SubConn; the connection is established asynchronously (the call does not block)
- RemoveSubConn: called by the Balancer to remove an existing SubConn
- UpdateBalancerState: (deprecated, use UpdateState instead)
- UpdateState: called by the Balancer to tell the ClientConn that its internal state has changed and that a new Picker has been produced
- ResolveNow: called by the Balancer to tell the ClientConn to call the Resolver's ResolveNow and re-resolve
- Target: (deprecated)
- The Picker interface defines the methods the Picker exposes to the ClientConn (the Picker is really a snapshot produced by the Balancer, containing the current state of the underlying SubConns; each time it picks an available SubConn for the ClientConn, ensuring availability and load balancing)
// Picker is used by gRPC to pick a SubConn to send an RPC.
// Balancer is expected to generate a new picker from its snapshot every time its
// internal state has changed.
//
// The pickers used by gRPC can be updated by ClientConn.UpdateBalancerState().
//
// Deprecated: use V2Picker instead
type Picker interface {
// Pick returns the SubConn to be used to send the RPC.
// The returned SubConn must be one returned by NewSubConn().
//
// This functions is expected to return:
// - a SubConn that is known to be READY;
// - ErrNoSubConnAvailable if no SubConn is available, but progress is being
// made (for example, some SubConn is in CONNECTING mode);
// - other errors if no active connecting is happening (for example, all SubConn
// are in TRANSIENT_FAILURE mode).
//
// If a SubConn is returned:
// - If it is READY, gRPC will send the RPC on it;
// - If it is not ready, or becomes not ready after it's returned, gRPC will
// block until UpdateBalancerState() is called and will call pick on the
// new picker. The done function returned from Pick(), if not nil, will be
// called with nil error, no bytes sent and no bytes received.
//
// If the returned error is not nil:
// - If the error is ErrNoSubConnAvailable, gRPC will block until UpdateBalancerState()
// - If the error is ErrTransientFailure or implements IsTransientFailure()
// bool, returning true:
// - If the RPC is wait-for-ready, gRPC will block until UpdateBalancerState()
// is called to pick again;
// - Otherwise, RPC will fail with unavailable error.
// - Else (error is other non-nil error):
// - The RPC will fail with the error's status code, or Unknown if it is
// not a status error.
//
// The returned done() function will be called once the rpc has finished,
// with the final status of that RPC. If the SubConn returned is not a
// valid SubConn type, done may not be called. done may be nil if balancer
// doesn't care about the RPC status.
Pick(ctx context.Context, info PickInfo) (conn SubConn, done func(DoneInfo), err error)
}
- Pick: picks an available SubConn for the ClientConn
- The SubConn interface defines its behavior as a sub-connection. In gRPC one SubConn maps to multiple Addresses; gRPC tries them in sequence until the first connection succeeds. In Etcd client v3, in practice, one endpoint address maps to one SubConn.
When the connection state is IDLE, no connection is attempted until the Balancer explicitly calls Connect; if an error occurs while connecting, it retries immediately. Once the state goes back to IDLE it will not reconnect automatically unless Connect is called explicitly.
// SubConn represents a gRPC sub connection.
// Each sub connection contains a list of addresses. gRPC will
// try to connect to them (in sequence), and stop trying the
// remainder once one connection is successful.
//
// The reconnect backoff will be applied on the list, not a single address.
// For example, try_on_all_addresses -> backoff -> try_on_all_addresses.
//
// All SubConns start in IDLE, and will not try to connect. To trigger
// the connecting, Balancers must call Connect.
// When the connection encounters an error, it will reconnect immediately.
// When the connection becomes IDLE, it will not reconnect unless Connect is
// called.
//
// This interface is to be implemented by gRPC. Users should not need a
// brand new implementation of this interface. For the situations like
// testing, the new implementation should embed this interface. This allows
// gRPC to add new methods to this interface.
type SubConn interface {
// UpdateAddresses updates the addresses used in this SubConn.
// gRPC checks if currently-connected address is still in the new list.
// If it's in the list, the connection will be kept.
// If it's not in the list, the connection will gracefully closed, and
// a new connection will be created.
//
// This will trigger a state transition for the SubConn.
UpdateAddresses([]resolver.Address)
// Connect starts the connecting for this SubConn.
Connect()
}
- UpdateAddresses: updates the SubConn's Address list; if the currently connected address is still in the new list, the connection is kept; if it is not, the connection is gracefully closed and a new one is created
- Connect: starts connecting
Summary: the Balancer provides a user-defined structure for load balancing and internal state management. Etcd client v3 implements a round-robin load balancer.
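To illustrate the Picker contract quoted above, here is a minimal round-robin picker against the (v1) balancer.Picker interface. It is only a sketch: etcd's real picker lives under clientv3/balancer/picker and carries extra error handling and logging; the package and type names here are made up.

package rrpicker

import (
	"context"
	"sync"

	"google.golang.org/grpc/balancer"
)

// rrPicker hands out READY SubConns in round-robin order. The balancer is
// expected to build a fresh picker whenever the set of READY SubConns changes.
type rrPicker struct {
	mu   sync.Mutex
	next int
	scs  []balancer.SubConn // SubConns believed to be READY
}

func newRRPicker(ready []balancer.SubConn) balancer.Picker {
	return &rrPicker{scs: ready}
}

// Pick implements the (v1) balancer.Picker interface quoted above.
func (p *rrPicker) Pick(ctx context.Context, info balancer.PickInfo) (balancer.SubConn, func(balancer.DoneInfo), error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.scs) == 0 {
		// No READY SubConn: gRPC blocks (or fails, depending on
		// wait-for-ready) until the balancer publishes a new picker.
		return nil, nil, balancer.ErrNoSubConnAvailable
	}
	sc := p.scs[p.next]
	p.next = (p.next + 1) % len(p.scs)
	return sc, nil, nil
}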
Etcd client creation flow
New
Creating a new Etcd client v3 object mainly involves:
- 1. Create the Client struct -> 2. Configure parameters -> 3. Configure the Resolver -> 4. Configure the Balancer -> 5. grpc dial -> 6. Create the sub-module structs -> 7. Start the address auto-sync goroutine -> 8. Return the Client struct
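Before walking through the core code, here is a minimal sketch of the call that triggers this whole flow. The endpoint, timeouts, and package name are placeholders; the import path shown is the one used by the v3.3 series.

package clientexample

import (
	"log"
	"time"

	"github.com/coreos/etcd/clientv3" // import path of the v3.3 series; v3.4+ uses "go.etcd.io/etcd/clientv3"
)

func newClientDemo() *clientv3.Client {
	// clientv3.New runs the newClient flow below: it registers the endpoint
	// resolver, dials with the round-robin balancer, and wires up the
	// KV/Lease/Watcher/Auth/Maintenance sub-modules.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:        []string{"127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout:      5 * time.Second,
		AutoSyncInterval: 30 * time.Second, // optional; enables the autoSync goroutine (step 7)
	})
	if err != nil {
		log.Fatal(err)
	}
	return cli // remember to call cli.Close() when done
}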
Core code
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/client.go
func newClient(cfg *Config) (*Client, error) {
// some preprocessing
// ……
// 1. Create the Client struct
ctx, cancel := context.WithCancel(baseCtx)
client := &Client{
conn: nil,
cfg: *cfg,
creds: creds,
ctx: ctx,
cancel: cancel,
mu: new(sync.RWMutex),
callOpts: defaultCallOpts,
}
// 2. Configure parameters
// ……
// 3. Configure the Resolver
// Etcd actually wraps the Resolver in one more layer, a ResolverGroup, probably to support multiple clusters; normally there is just one cluster and hence just one Resolver
// Prepare a 'endpoint://<unique-client-id>/' resolver for the client and create a endpoint target to pass
// to dial so the client knows to use this resolver.
client.resolverGroup, err = endpoint.NewResolverGroup(fmt.Sprintf("client-%s", uuid.New().String()))
if err != nil {
client.cancel()
return nil, err
}
client.resolverGroup.SetEndpoints(cfg.Endpoints)
if len(cfg.Endpoints) < 1 {
return nil, fmt.Errorf("at least one Endpoint must is required in client config")
}
dialEndpoint := cfg.Endpoints[0]
// 4. Configure the Balancer / 5. grpc dial
// A Balancer registered under the name roundRobinBalancerName already exists in the Etcd client v3 package, so it is enough to call grpc.WithBalancerName here; gRPC will invoke the corresponding Builder to create the pre-configured Balancer.
// dialWithBalancer ends up calling grpc.DialContext, which creates the ClientConn and the underlying SubConns
// Use a provided endpoint target so that for https:// without any tls config given, then
// grpc will assume the certificate server name is the endpoint host.
conn, err := client.dialWithBalancer(dialEndpoint, grpc.WithBalancerName(roundRobinBalancerName))
if err != nil {
client.cancel()
client.resolverGroup.Close()
return nil, err
}
// TODO: With the old grpc balancer interface, we waited until the dial timeout
// for the balancer to be ready. Is there an equivalent wait we should do with the new grpc balancer interface?
client.conn = conn
// 6. Create the sub-module structs
// Wrap the sub-modules and create their structs
client.Cluster = NewCluster(client)
client.KV = NewKV(client)
client.Lease = NewLease(client)
client.Watcher = NewWatcher(client)
client.Auth = NewAuth(client)
client.Maintenance = NewMaintenance(client)
if cfg.RejectOldCluster {
if err := client.checkVersion(); err != nil {
client.Close()
return nil, err
}
}
// 7. Start the address auto-sync goroutine
// If AutoSyncInterval is configured, this goroutine periodically fetches the current member list from the Etcd cluster and updates the addresses
// If it is not configured, the goroutine exits immediately
go client.autoSync()
// 8. Return the Client struct
return client, nil
}
Etcd client call flow
As the Client struct shows, the Client is composed of several functional modules
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/client.go
// Client provides and manages an etcd v3 client session.
type Client struct {
Cluster
KV
Lease
Watcher
Auth
Maintenance
// other fields
// ……
}
- Cluster: the cluster module, covering cluster membership; cluster members can be added, removed, updated, and listed via RPC requests. Not covered further in this article.
- KV: the key-value module, providing the core key-value create/read/update/delete operations; the most commonly used module.
- Lease: the lease module, which manages auto-renewed sessions. Key-value records written under such a session are kept by the server until the session ends; when the session ends normally or the renewal lapses, the records are automatically removed by the server. Many features such as service discovery and distributed locks can be built on this module.
- Watcher: the watch module, used to watch for insert, update, and delete events on records.
- Auth: the authentication module. Not covered further in this article.
- Maintenance: the maintenance module. Not covered further in this article.
Get/Put/Delete
This functionality is provided by the KV module, whose main underlying type is KVClient
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/etcdserver/etcdserverpb/rpc.pb.go
// KVClient is the low-level backing of the KV module
// KVClient is the client API for KV service.
//
// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream.
type KVClient interface {
// Range gets the keys in the range from the key-value store.
Range(ctx context.Context, in *RangeRequest, opts ...grpc.CallOption) (*RangeResponse, error)
// Put puts the given key into the key-value store.
// A put request increments the revision of the key-value store
// and generates one event in the event history.
Put(ctx context.Context, in *PutRequest, opts ...grpc.CallOption) (*PutResponse, error)
// DeleteRange deletes the given range from the key-value store.
// A delete request increments the revision of the key-value store
// and generates a delete event in the event history for every deleted key.
DeleteRange(ctx context.Context, in *DeleteRangeRequest, opts ...grpc.CallOption) (*DeleteRangeResponse, error)
// Txn processes multiple requests in a single transaction.
// A txn request increments the revision of the key-value store
// and generates events with the same revision for every completed request.
// It is not allowed to modify the same key several times within one txn.
Txn(ctx context.Context, in *TxnRequest, opts ...grpc.CallOption) (*TxnResponse, error)
// Compact compacts the event history in the etcd key-value store. The key-value
// store should be periodically compacted or the event history will continue to grow
// indefinitely.
Compact(ctx context.Context, in *CompactionRequest, opts ...grpc.CallOption) (*CompactionResponse, error)
}
// The kVClient struct implements the KVClient interface
type kVClient struct {
// it holds the ClientConn
cc *grpc.ClientConn
}
// Take the Put method as an example; it ultimately calls the ClientConn's Invoke method
func (c *kVClient) Put(ctx context.Context, in *PutRequest, opts ...grpc.CallOption) (*PutResponse, error) {
out := new(PutResponse)
err := c.cc.Invoke(ctx, "/etcdserverpb.KV/Put", in, out, opts...)
if err != nil {
return nil, err
}
return out, nil
}
// ……
As you can see, the gRPC call goes through the ClientConn's Invoke method, which eventually calls the following:
Code path: https://github.com/grpc/grpc-go/blob/v1.26.0/call.go
func invoke(ctx context.Context, method string, req, reply interface{}, cc *ClientConn, opts ...CallOption) error {
// Create a ClientStream; this step acquires an underlying Transport and creates a Stream on top of it as the communication stream for this request
cs, err := newClientStream(ctx, unaryStreamDesc, cc, method, opts...)
if err != nil {
return err
}
// Send the request over the stream
if err := cs.SendMsg(req); err != nil {
return err
}
// Receive the reply over the stream
return cs.RecvMsg(reply)
}
Note: Get/Put/Delete and the other KV methods are synchronous and blocking; if a network error occurs during a request, the error is returned directly and there is no automatic retry.
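For reference, a minimal usage sketch of the KV module; because the calls are synchronous, a per-call timeout context is the usual way to bound them. Key names, the timeout, and the package name are placeholders.

package kvexample

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3" // v3.4+ uses "go.etcd.io/etcd/clientv3"
)

func kvDemo(cli *clientv3.Client) {
	// Each call gets its own timeout; Put/Get/Delete block until the RPC
	// finishes and hand any network error straight back to the caller.
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	if _, err := cli.Put(ctx, "/demo/key", "value"); err != nil {
		log.Fatal(err) // no automatic retry happened before this point
	}

	resp, err := cli.Get(ctx, "/demo/", clientv3.WithPrefix())
	if err != nil {
		log.Fatal(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s -> %s\n", kv.Key, kv.Value)
	}

	if _, err := cli.Delete(ctx, "/demo/key"); err != nil {
		log.Fatal(err)
	}
}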
Watch
This functionality is implemented by the Watcher module; the main implementing struct is watcher
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/watch.go
type Watcher interface {
// Watch watches on a key or prefix. The watched events will be returned
// through the returned channel. If revisions waiting to be sent over the
// watch are compacted, then the watch will be canceled by the server, the
// client will post a compacted error watch response, and the channel will close.
// If the context "ctx" is canceled or timed out, returned "WatchChan" is closed,
// and "WatchResponse" from this closed channel has zero events and nil "Err()".
// The context "ctx" MUST be canceled, as soon as watcher is no longer being used,
// to release the associated resources.
//
// If the context is "context.Background/TODO", returned "WatchChan" will
// not be closed and block until event is triggered, except when server
// returns a non-recoverable error (e.g. ErrCompacted).
// For example, when context passed with "WithRequireLeader" and the
// connected server has no leader (e.g. due to network partition),
// error "etcdserver: no leader" (ErrNoLeader) will be returned,
// and then "WatchChan" is closed with non-nil "Err()".
// In order to prevent a watch stream being stuck in a partitioned node,
// make sure to wrap context with "WithRequireLeader".
//
// Otherwise, as long as the context has not been canceled or timed out,
// watch will retry on other recoverable errors forever until reconnected.
//
// TODO: explicitly set context error in the last "WatchResponse" message and close channel?
// Currently, client contexts are overwritten with "valCtx" that never closes.
// TODO(v3.4): configure watch retry policy, limit maximum retry number
// (see https://github.com/etcd-io/etcd/issues/8980)
Watch(ctx context.Context, key string, opts ...OpOption) WatchChan
// RequestProgress requests a progress notify response be sent in all watch channels.
RequestProgress(ctx context.Context) error
// Close closes the watcher and cancels all watch requests.
Close() error
}
// watcher implements the Watcher interface
type watcher struct {
remote pb.WatchClient
callOpts []grpc.CallOption
// mu protects the grpc streams map
mu sync.RWMutex
// streams holds all the active grpc streams keyed by ctx value.
streams map[string]*watchGrpcStream
}
As you can see, the watcher struct contains a map, streams map[string]*watchGrpcStream, which holds all active watchGrpcStreams. A watchGrpcStream represents one communication stream built on gRPC, and each watchGrpcStream in turn carries a number of watcherStreams. Every Watch request creates a watcherStream, which is kept alive until it is explicitly canceled or an unrecoverable internal error occurs; for recoverable errors such as a dropped network connection, the client tries to rebuild the watcherStream.
Now look at the Watch method, which has three main phases:
- 1. Preparation: configure options and build the watchRequest struct
- 2. Obtain a watchGrpcStream: reuse an existing one or create a new one
- 3. Submission: submit the watchRequest to the watchGrpcStream's reqc channel, then wait on the watchRequest's return channel, retc, for the WatchChan
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/watch.go
// Watch posts a watch request to run() and waits for a new watcher channel
func (w *watcher) Watch(ctx context.Context, key string, opts ...OpOption) WatchChan {
// 1. Preparation: configure options and build the watchRequest struct
ow := opWatch(key, opts...)
var filters []pb.WatchCreateRequest_FilterType
if ow.filterPut {
filters = append(filters, pb.WatchCreateRequest_NOPUT)
}
if ow.filterDelete {
filters = append(filters, pb.WatchCreateRequest_NODELETE)
}
wr := &watchRequest{
ctx: ctx,
createdNotify: ow.createdNotify,
key: string(ow.key),
end: string(ow.end),
rev: ow.rev,
progressNotify: ow.progressNotify,
fragment: ow.fragment,
filters: filters,
prevKV: ow.prevKV,
retc: make(chan chan WatchResponse, 1),
}
ok := false
ctxKey := streamKeyFromCtx(ctx)
// 2. Reuse or create a watchGrpcStream
// find or allocate appropriate grpc watch stream
w.mu.Lock()
if w.streams == nil {
// closed
w.mu.Unlock()
ch := make(chan WatchResponse)
close(ch)
return ch
}
wgs := w.streams[ctxKey]
if wgs == nil {
wgs = w.newWatcherGrpcStream(ctx)
w.streams[ctxKey] = wgs
}
// Grab the watchGrpcStream's reqc and donec channels here, used respectively to submit requests to it and to receive its failure signal
donec := wgs.donec
reqc := wgs.reqc
w.mu.Unlock()
// couldn't create channel; return closed channel
closeCh := make(chan WatchResponse, 1)
// 3. Submit the request to the watchGrpcStream's reqc, then wait on the watchRequest's return channel retc for the WatchChan
// submit request
select {
case reqc <- wr:
ok = true
case <-wr.ctx.Done():
case <-donec:
if wgs.closeErr != nil {
closeCh <- WatchResponse{Canceled: true, closeErr: wgs.closeErr}
break
}
// retry; may have dropped stream from no ctxs
return w.Watch(ctx, key, opts...)
}
// receive channel
if ok {
select {
case ret := <-wr.retc:
return ret
case <-ctx.Done():
case <-donec:
if wgs.closeErr != nil {
closeCh <- WatchResponse{Canceled: true, closeErr: wgs.closeErr}
break
}
// retry; may have dropped stream from no ctxs
return w.Watch(ctx, key, opts...)
}
}
close(closeCh)
return closeCh
}
The line to pay attention to is wgs = w.newWatcherGrpcStream(ctx), which creates a watchGrpcStream.
This method creates a watchGrpcStream and then starts a goroutine to execute its run method, where most of the logic lives.
It mainly does the following:
- 1. Some preparation work
- 2. Create a gRPC stream, the watch client, to carry the substreams produced by each Watch
- 3. Enter a big wait loop
- 3.1 Wait for new watch requests. Whenever a new watch arrives, create a substream, start a goroutine to actually receive the WatchResponses for that substream, and finally send the request out through the watch client
- 3.2 When a response event is received from the watch client, dispatch it to the corresponding substream
- 3.3 If the watch client's communication fails (the network may be unreachable, or the server may be down), try to obtain a new watch client and resend the requests; this keeps watches from being interrupted by network blips and the like
- 3.4 Clean-up on error or shutdown
There are of course many more details, which are not covered here.
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/watch.go
func (w *watcher) newWatcherGrpcStream(inctx context.Context) *watchGrpcStream {
ctx, cancel := context.WithCancel(&valCtx{inctx})
wgs := &watchGrpcStream{
owner: w,
remote: w.remote,
callOpts: w.callOpts,
ctx: ctx,
ctxKey: streamKeyFromCtx(inctx),
cancel: cancel,
substreams: make(map[int64]*watcherStream),
respc: make(chan *pb.WatchResponse),
reqc: make(chan watchStreamRequest),
donec: make(chan struct{}),
errc: make(chan error, 1),
closingc: make(chan *watcherStream),
resumec: make(chan struct{}),
}
go wgs.run()
return wgs
}
// run is the root of the goroutines for managing a watcher client
func (w *watchGrpcStream) run() {
// 1. Some preparation work
// ……
// 2. Create a gRPC stream, the watch client, to carry the substreams produced by each Watch
// start a stream with the etcd grpc server
if wc, closeErr = w.newWatchClient(); closeErr != nil {
return
}
cancelSet := make(map[int64]struct{})
// 3. Enter a big wait loop
var cur *pb.WatchResponse
for {
select {
// 3.1 Wait for new watch requests
// Whenever a new watch arrives, create a substream, start a goroutine to actually receive the WatchResponses for that substream, and finally send the request out through the watch client
// Watch() requested
case req := <-w.reqc:
switch wreq := req.(type) {
case *watchRequest:
outc := make(chan WatchResponse, 1)
// TODO: pass custom watch ID?
ws := &watcherStream{
initReq: *wreq,
id: -1,
outc: outc,
// unbuffered so resumes won't cause repeat events
recvc: make(chan *WatchResponse),
}
ws.donec = make(chan struct{})
w.wg.Add(1)
// Start a goroutine to actually handle the WatchResponses for this substream
go w.serveSubstream(ws, w.resumec)
// queue up for watcher creation/resume
w.resuming = append(w.resuming, ws)
if len(w.resuming) == 1 {
// head of resume queue, can register a new watcher
wc.Send(ws.initReq.toPB())
}
case *progressRequest:
wc.Send(wreq.toPB())
}
// 3.2 A response event was received from the watch client
// Dispatch the event to the corresponding substream
// new events from the watch client
case pbresp := <-w.respc:
if cur == nil || pbresp.Created || pbresp.Canceled {
cur = pbresp
} else if cur != nil && cur.WatchId == pbresp.WatchId {
// merge new events
cur.Events = append(cur.Events, pbresp.Events...)
// update "Fragment" field; last response with "Fragment" == false
cur.Fragment = pbresp.Fragment
}
switch {
case pbresp.Created:
// response to head of queue creation
if ws := w.resuming[0]; ws != nil {
w.addSubstream(pbresp, ws)
w.dispatchEvent(pbresp)
w.resuming[0] = nil
}
if ws := w.nextResume(); ws != nil {
wc.Send(ws.initReq.toPB())
}
// reset for next iteration
cur = nil
case pbresp.Canceled && pbresp.CompactRevision == 0:
delete(cancelSet, pbresp.WatchId)
if ws, ok := w.substreams[pbresp.WatchId]; ok {
// signal to stream goroutine to update closingc
close(ws.recvc)
closing[ws] = struct{}{}
}
// reset for next iteration
cur = nil
case cur.Fragment:
// watch response events are still fragmented
// continue to fetch next fragmented event arrival
continue
default:
// dispatch to appropriate watch stream
ok := w.dispatchEvent(cur)
// reset for next iteration
cur = nil
if ok {
break
}
// watch response on unexpected watch id; cancel id
if _, ok := cancelSet[pbresp.WatchId]; ok {
break
}
cancelSet[pbresp.WatchId] = struct{}{}
cr := &pb.WatchRequest_CancelRequest{
CancelRequest: &pb.WatchCancelRequest{
WatchId: pbresp.WatchId,
},
}
req := &pb.WatchRequest{RequestUnion: cr}
wc.Send(req)
}
// 3.3 If the watch client's communication fails (the network may be unreachable, or the server may be down), try to obtain a new watch client and resend the requests; this keeps watches from being interrupted by network blips and the like
// watch client failed on Recv; spawn another if possible
case err := <-w.errc:
if isHaltErr(w.ctx, err) || toErr(w.ctx, err) == v3rpc.ErrNoLeader {
closeErr = err
return
}
if wc, closeErr = w.newWatchClient(); closeErr != nil {
return
}
if ws := w.nextResume(); ws != nil {
wc.Send(ws.initReq.toPB())
}
cancelSet = make(map[int64]struct{})
// 3.4 Clean-up on error or shutdown
case <-w.ctx.Done():
return
case ws := <-w.closingc:
w.closeSubstream(ws)
delete(closing, ws)
// no more watchers on this stream, shutdown
if len(w.substreams)+len(w.resuming) == 0 {
return
}
}
}
}
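Seen from the caller's side, all of this machinery hides behind a single channel. Below is a minimal usage sketch (the key prefix and package name are placeholders), wrapping the context with WithRequireLeader as the interface comment above recommends.

package watchexample

import (
	"context"
	"fmt"

	"github.com/coreos/etcd/clientv3" // v3.4+ uses "go.etcd.io/etcd/clientv3"
)

func watchDemo(cli *clientv3.Client) {
	// WithRequireLeader keeps the watch from getting stuck on a partitioned member.
	ctx, cancel := context.WithCancel(clientv3.WithRequireLeader(context.Background()))
	defer cancel() // always cancel so the watcherStream is released

	wch := cli.Watch(ctx, "/demo/", clientv3.WithPrefix())
	for wresp := range wch {
		if err := wresp.Err(); err != nil {
			// e.g. compaction (ErrCompacted) or no leader; the channel
			// closes after a response carrying a non-nil error.
			fmt.Println("watch closed:", err)
			return
		}
		for _, ev := range wresp.Events {
			fmt.Printf("%s %q -> %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}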
Lease
This functionality is implemented by the Lease module; the main implementing struct is lessor
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/lease.go
type Lease interface {
// Grant creates a new lease.
Grant(ctx context.Context, ttl int64) (*LeaseGrantResponse, error)
// Revoke revokes the given lease.
Revoke(ctx context.Context, id LeaseID) (*LeaseRevokeResponse, error)
// TimeToLive retrieves the lease information of the given lease ID.
TimeToLive(ctx context.Context, id LeaseID, opts ...LeaseOption) (*LeaseTimeToLiveResponse, error)
// Leases retrieves all leases.
Leases(ctx context.Context) (*LeaseLeasesResponse, error)
// KeepAlive keeps the given lease alive forever. If the keepalive response
// posted to the channel is not consumed immediately, the lease client will
// continue sending keep alive requests to the etcd server at least every
// second until latest response is consumed.
//
// The returned "LeaseKeepAliveResponse" channel closes if underlying keep
// alive stream is interrupted in some way the client cannot handle itself;
// given context "ctx" is canceled or timed out. "LeaseKeepAliveResponse"
// from this closed channel is nil.
//
// If client keep alive loop halts with an unexpected error (e.g. "etcdserver:
// no leader") or canceled by the caller (e.g. context.Canceled), the error
// is returned. Otherwise, it retries.
//
// TODO(v4.0): post errors to last keep alive message before closing
// (see https://github.com/coreos/etcd/pull/7866)
KeepAlive(ctx context.Context, id LeaseID) (<-chan *LeaseKeepAliveResponse, error)
// KeepAliveOnce renews the lease once. The response corresponds to the
// first message from calling KeepAlive. If the response has a recoverable
// error, KeepAliveOnce will retry the RPC with a new keep alive message.
//
// In most of the cases, Keepalive should be used instead of KeepAliveOnce.
KeepAliveOnce(ctx context.Context, id LeaseID) (*LeaseKeepAliveResponse, error)
// Close releases all resources Lease keeps for efficient communication
// with the etcd server.
Close() error
}
type lessor struct {
mu sync.Mutex // guards all fields
// donec is closed and loopErr is set when recvKeepAliveLoop stops
donec chan struct{}
loopErr error
remote pb.LeaseClient
stream pb.Lease_LeaseKeepAliveClient
streamCancel context.CancelFunc
stopCtx context.Context
stopCancel context.CancelFunc
keepAlives map[LeaseID]*keepAlive
// firstKeepAliveTimeout is the timeout for the first keepalive request
// before the actual TTL is known to the lease client
firstKeepAliveTimeout time.Duration
// firstKeepAliveOnce ensures stream starts after first KeepAlive call.
firstKeepAliveOnce sync.Once
callOpts []grpc.CallOption
}
Three methods are worth highlighting:
- Grant: through one gRPC call, asks the server to create a lease with the given TTL and returns the lease id
- Revoke: through one gRPC call, asks the server to destroy the lease with the given lease id
- KeepAlive: creates a stream plus two goroutines, one sending keepalives and one receiving responses, thereby keeping the lease alive (the lease is recorded in the lessor's keepAlives map)
First, Grant and Revoke
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/lease.go
func (l *lessor) Grant(ctx context.Context, ttl int64) (*LeaseGrantResponse, error) {
r := &pb.LeaseGrantRequest{TTL: ttl}
resp, err := l.remote.LeaseGrant(ctx, r, l.callOpts...)
if err == nil {
gresp := &LeaseGrantResponse{
ResponseHeader: resp.GetHeader(),
ID: LeaseID(resp.ID),
TTL: resp.TTL,
Error: resp.Error,
}
return gresp, nil
}
return nil, toErr(ctx, err)
}
func (l *lessor) Revoke(ctx context.Context, id LeaseID) (*LeaseRevokeResponse, error) {
r := &pb.LeaseRevokeRequest{ID: int64(id)}
resp, err := l.remote.LeaseRevoke(ctx, r, l.callOpts...)
if err == nil {
return (*LeaseRevokeResponse)(resp), nil
}
return nil, toErr(ctx, err)
}
The underlying call works the same way as Get/Put/Delete:
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/etcdserver/etcdserverpb/rpc.pb.go
func (c *leaseClient) LeaseGrant(ctx context.Context, in *LeaseGrantRequest, opts ...grpc.CallOption) (*LeaseGrantResponse, error) {
out := new(LeaseGrantResponse)
err := c.cc.Invoke(ctx, "/etcdserverpb.Lease/LeaseGrant", in, out, opts...)
if err != nil {
return nil, err
}
return out, nil
}
func (c *leaseClient) LeaseRevoke(ctx context.Context, in *LeaseRevokeRequest, opts ...grpc.CallOption) (*LeaseRevokeResponse, error) {
out := new(LeaseRevokeResponse)
err := c.cc.Invoke(ctx, "/etcdserverpb.Lease/LeaseRevoke", in, out, opts...)
if err != nil {
return nil, err
}
return out, nil
}
Now the key part, the KeepAlive implementation:
- 1. Preparation
- 2. Create or reuse the keepAlive struct for the LeaseID, associating the current ctx and ch with it
- 3. Start the related goroutines. The most important line is go l.recvKeepAliveLoop(), which implements the periodic renewal with the server
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/lease.go
func (l *lessor) KeepAlive(ctx context.Context, id LeaseID) (<-chan *LeaseKeepAliveResponse, error) {
// 1. Preparation
ch := make(chan *LeaseKeepAliveResponse, LeaseResponseChSize)
l.mu.Lock()
// ensure that recvKeepAliveLoop is still running
select {
case <-l.donec:
err := l.loopErr
l.mu.Unlock()
close(ch)
return ch, ErrKeepAliveHalted{Reason: err}
default:
}
// 2. Create or reuse the keepAlive struct for this LeaseID, associating the current ctx and ch with it
ka, ok := l.keepAlives[id]
if !ok {
// create fresh keep alive
ka = &keepAlive{
chs: []chan<- *LeaseKeepAliveResponse{ch},
ctxs: []context.Context{ctx},
deadline: time.Now().Add(l.firstKeepAliveTimeout),
nextKeepAlive: time.Now(),
donec: make(chan struct{}),
}
l.keepAlives[id] = ka
} else {
// add channel and context to existing keep alive
ka.ctxs = append(ka.ctxs, ctx)
ka.chs = append(ka.chs, ch)
}
l.mu.Unlock()
// 3. Start the related goroutines
// The most important line is go l.recvKeepAliveLoop()
// It implements the periodic renewal with the server
go l.keepAliveCtxCloser(id, ctx, ka.donec)
l.firstKeepAliveOnce.Do(func() {
go l.recvKeepAliveLoop()
go l.deadlineLoop()
})
return ch, nil
}
Next, the recvKeepAliveLoop method:
- 1. Preparation
- 2. Enter a big loop that implements the renewal logic; its purpose is to retry when a stream becomes invalid
- 2.1 resetRecv creates a stream and starts a goroutine inside that keeps sending renewal requests
- 2.2 Enter a small loop that keeps reading the server's responses from the stream created above
- 2.2.1 stream.Recv() blocks until the server returns a message or an error occurs
- 2.2.2 The server's responses are delivered to the channels associated with the keepAlive
- 2.3 If the small loop exits, the stream has become invalid; wait 500ms and start the big loop again
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/lease.go
func (l *lessor) recvKeepAliveLoop() (gerr error) {
// 1. Preparation
defer func() {
l.mu.Lock()
close(l.donec)
l.loopErr = gerr
for _, ka := range l.keepAlives {
ka.close()
}
l.keepAlives = make(map[LeaseID]*keepAlive)
l.mu.Unlock()
}()
// 2. Enter a big loop that implements the renewal logic; its purpose is to retry when a stream becomes invalid
for {
// 2.1 resetRecv creates a stream and starts a goroutine inside that keeps sending renewal requests
stream, err := l.resetRecv()
if err != nil {
if canceledByCaller(l.stopCtx, err) {
return err
}
} else {
// 2.2 Enter a small loop that keeps reading the server's responses from the stream created above
for {
// 2.2.1 stream.Recv() blocks until the server returns a message or an error occurs
resp, err := stream.Recv()
// error handling
if err != nil {
if canceledByCaller(l.stopCtx, err) {
return err
}
if toErr(l.stopCtx, err) == rpctypes.ErrNoLeader {
l.closeRequireLeader()
}
break
}
// 2.2.2 Deliver the server's response to the channels associated with the keepAlive
l.recvKeepAlive(resp)
}
}
// 2.3 If the small loop exits, the stream has become invalid; wait 500ms and start the big loop again
select {
case <-time.After(retryConnWait):
continue
case <-l.stopCtx.Done():
return l.stopCtx.Err()
}
}
}
Then the resetRecv method:
- 1. Call the remote's LeaseKeepAlive method to create a gRPC stream
- 2. Start a goroutine running sendKeepAliveLoop, which keeps sending renewal requests
- 3. Return the stream
sendKeepAliveLoop renews roughly every 500ms.
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/lease.go
// resetRecv opens a new lease stream and starts sending keep alive requests.
func (l *lessor) resetRecv() (pb.Lease_LeaseKeepAliveClient, error) {
// 1. Call the remote's LeaseKeepAlive method to create a gRPC stream
sctx, cancel := context.WithCancel(l.stopCtx)
stream, err := l.remote.LeaseKeepAlive(sctx, l.callOpts...)
if err != nil {
cancel()
return nil, err
}
l.mu.Lock()
defer l.mu.Unlock()
if l.stream != nil && l.streamCancel != nil {
l.streamCancel()
}
l.streamCancel = cancel
l.stream = stream
// 2. Start a goroutine that keeps sending renewal requests
go l.sendKeepAliveLoop(stream)
// 3. Return the stream
return stream, nil
}
// sendKeepAliveLoop sends keep alive requests for the lifetime of the given stream.
func (l *lessor) sendKeepAliveLoop(stream pb.Lease_LeaseKeepAliveClient) {
for {
var tosend []LeaseID
now := time.Now()
l.mu.Lock()
for id, ka := range l.keepAlives {
if ka.nextKeepAlive.Before(now) {
tosend = append(tosend, id)
}
}
l.mu.Unlock()
for _, id := range tosend {
r := &pb.LeaseKeepAliveRequest{ID: int64(id)}
if err := stream.Send(r); err != nil {
// TODO do something with this error?
return
}
}
select {
case <-time.After(500 * time.Millisecond):
case <-stream.Context().Done():
return
case <-l.donec:
return
case <-l.stopCtx.Done():
return
}
}
}
Note: from the code above we can see that:
- No matter what TTL was set at Grant time, KeepAlive always sends renewal requests at an interval of roughly 500ms; adding the 500ms retry wait in recvKeepAliveLoop's big loop, under normal conditions the gap between two renewal requests is at most about 1s.
- Unless a lease has been revoked or has already expired, renewal is retried automatically through network blips.
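A minimal sketch tying Grant, a leased Put, and KeepAlive together (the TTL, key, and package name are placeholders); the returned keep-alive channel is drained in a goroutine here, much like NewSession does below.

package leaseexample

import (
	"context"
	"log"

	"github.com/coreos/etcd/clientv3" // v3.4+ uses "go.etcd.io/etcd/clientv3"
)

func leaseDemo(cli *clientv3.Client) error {
	// Grant a lease with a 10s TTL (one LeaseGrant RPC).
	grant, err := cli.Grant(context.Background(), 10)
	if err != nil {
		return err
	}

	// Attach a key to the lease; it disappears when the lease expires.
	if _, err := cli.Put(context.Background(), "/demo/leased", "v", clientv3.WithLease(grant.ID)); err != nil {
		return err
	}

	// KeepAlive starts the send/recv loops described above and renews the
	// lease roughly every 500ms regardless of the TTL.
	ch, err := cli.KeepAlive(context.Background(), grant.ID)
	if err != nil {
		return err
	}
	go func() {
		for resp := range ch {
			log.Printf("lease %d renewed, TTL=%d", resp.ID, resp.TTL)
		}
		log.Println("keep-alive channel closed; lease is no longer renewed")
	}()
	return nil
}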
Session
Session is really just a wrapper around the Lease functionality, as NewSession makes clear:
it simply calls Grant and KeepAlive and bundles the result into a Session struct for convenience.
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/concurrency/session.go
// NewSession gets the leased session for a client.
func NewSession(client *v3.Client, opts ...SessionOption) (*Session, error) {
ops := &sessionOptions{ttl: defaultSessionTTL, ctx: client.Ctx()}
for _, opt := range opts {
opt(ops)
}
id := ops.leaseID
if id == v3.NoLease {
resp, err := client.Grant(ops.ctx, int64(ops.ttl))
if err != nil {
return nil, err
}
id = v3.LeaseID(resp.ID)
}
ctx, cancel := context.WithCancel(ops.ctx)
keepAlive, err := client.KeepAlive(ctx, id)
if err != nil || keepAlive == nil {
cancel()
return nil, err
}
donec := make(chan struct{})
s := &Session{client: client, opts: ops, id: id, cancel: cancel, donec: donec}
// keep the lease alive until client error or cancelled context
go func() {
defer close(donec)
for range keepAlive {
// eat messages until keep alive channel closes
}
}()
return s, nil
}
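A minimal usage sketch of Session (TTL, key, and package name are placeholders): it is exactly the Grant + KeepAlive pair shown above bundled into one object, and Done() reports when the keep-alive loop has given up.

package sessionexample

import (
	"context"
	"log"

	"github.com/coreos/etcd/clientv3" // v3.4+ uses "go.etcd.io/etcd/clientv3"
	"github.com/coreos/etcd/clientv3/concurrency"
)

func sessionDemo(cli *clientv3.Client) error {
	// NewSession = Grant(TTL) + KeepAlive, bundled with a cancel func.
	s, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		return err
	}
	defer s.Close() // revokes the lease and stops the keep-alive

	// Keys written with the session's lease live only as long as the session.
	if _, err := cli.Put(context.Background(), "/demo/session-key", "v",
		clientv3.WithLease(s.Lease())); err != nil {
		return err
	}

	go func() {
		<-s.Done() // closed when the keep-alive channel closes
		log.Println("session expired or closed; its leased keys will be removed")
	}()
	return nil
}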
Mutex
This module implements a distributed lock. It is built on the client side on top of Session (that is, Lease); beyond Lease support, the server does not need to provide anything extra. The overall idea is:
- Create a key under the lock's key prefix and keep renewing it with the Session
- All keys under the prefix have different revisions according to the order in which they were created; the key with the smallest revision holds the lock
- Holders of keys whose revision is not the smallest block until all keys with smaller revisions have been deleted, at which point they acquire the lock
First, the Mutex struct:
- s is the associated Session
- pfx is the lock's prefix
- myKey is the key currently held
- myRev is the revision of the key currently held
- hdr is the Header of the latest response
Then the Lock method:
- 1. Register myKey under pfx. This uses a transaction, i.e. all of the following happen in one gRPC request:
- 1.1 Check whether myKey already exists
- 1.2 If it does not exist, execute a put to write it; if it does, execute a get to read it
- 1.3 Execute getOwner to read the pfx information
- 2. Take the CreateRevision out of the put or get response and use it as myRev
- 3. If there is no key under pfx, or the smallest CreateRevision equals myRev, the lock has been acquired
- 4. Otherwise, wait until all keys under pfx with a revision smaller than myRev have been deleted before acquiring the lock
Note: implementing the method this way has a couple of problems, arguably design bugs in Etcd client v3:
- If a server-side problem causes all keys under pfx to expire, or those keys are deleted externally, every node competing for the lock will believe it has acquired it
- If a client holds the lock but its key expires, it gets no notification and still believes it holds the lock, while at the same time another client acquires it
To work around this, two things have to be done by hand (see the sketch after the code below):
- 1. Watch whether the Session associated with the Mutex has expired; if it has, give up the lock manually
- 2. Watch whether the current myKey has been deleted; if it has, likewise give up the lock manually
Finally, Unlock: it is simple, it just deletes myKey.
Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/concurrency/mutex.go
// Mutex implements the sync Locker interface with etcd
type Mutex struct {
s *Session
pfx string
myKey string
myRev int64
hdr *pb.ResponseHeader
}
func NewMutex(s *Session, pfx string) *Mutex {
return &Mutex{s, pfx + "/", "", -1, nil}
}
// Lock locks the mutex with a cancelable context. If the context is canceled
// while trying to acquire the lock, the mutex tries to clean its stale lock entry.
func (m *Mutex) Lock(ctx context.Context) error {
s := m.s
client := m.s.Client()
// 1. Register myKey under pfx
// A transaction is used, i.e. all of the following happen in one gRPC request:
// 1.1 Check whether myKey already exists
// 1.2 If it does not exist, execute a put to write it; if it does, execute a get to read it
// 1.3 Execute getOwner to read the pfx information
m.myKey = fmt.Sprintf("%s%x", m.pfx, s.Lease())
cmp := v3.Compare(v3.CreateRevision(m.myKey), "=", 0)
// put self in lock waiters via myKey; oldest waiter holds lock
put := v3.OpPut(m.myKey, "", v3.WithLease(s.Lease()))
// reuse key in case this session already holds the lock
get := v3.OpGet(m.myKey)
// fetch current holder to complete uncontended path with only one RPC
getOwner := v3.OpGet(m.pfx, v3.WithFirstCreate()...)
resp, err := client.Txn(ctx).If(cmp).Then(put, getOwner).Else(get, getOwner).Commit()
if err != nil {
return err
}
// 2. Take the CreateRevision out of the put or get response and use it as myRev
m.myRev = resp.Header.Revision
if !resp.Succeeded {
m.myRev = resp.Responses[0].GetResponseRange().Kvs[0].CreateRevision
}
// 3. If there is no key under pfx, or the smallest CreateRevision equals myRev, the lock has been acquired
// if no key on prefix / the minimum rev is key, already hold the lock
ownerKey := resp.Responses[1].GetResponseRange().Kvs
if len(ownerKey) == 0 || ownerKey[0].CreateRevision == m.myRev {
m.hdr = resp.Header
return nil
}
// 4. Otherwise, wait until all keys under pfx with a revision smaller than myRev have been deleted before acquiring the lock
// wait for deletion revisions prior to myKey
hdr, werr := waitDeletes(ctx, client, m.pfx, m.myRev-1)
// release lock key if wait failed
if werr != nil {
m.Unlock(client.Ctx())
} else {
m.hdr = hdr
}
return werr
}
func (m *Mutex) Unlock(ctx context.Context) error {
client := m.s.Client()
if _, err := client.Delete(ctx, m.myKey); err != nil {
return err
}
m.myKey = "\x00"
m.myRev = -1
return nil
}
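Finally, a hedged sketch of the two manual guards mentioned above. Only Session.Done and Mutex.Key come from the clientv3 API; the function shape, the TTL, and how the caller reacts to losing the lock are application-specific and made up here.

package mutexexample

import (
	"context"

	"github.com/coreos/etcd/clientv3" // v3.4+ uses "go.etcd.io/etcd/clientv3"
	"github.com/coreos/etcd/clientv3/concurrency"
)

// lockWithGuards acquires the lock and returns a channel that is closed as
// soon as ownership can no longer be assumed (session expired or key deleted).
func lockWithGuards(ctx context.Context, cli *clientv3.Client, pfx string) (*concurrency.Mutex, *concurrency.Session, <-chan struct{}, error) {
	s, err := concurrency.NewSession(cli, concurrency.WithTTL(10)) // TTL is a placeholder
	if err != nil {
		return nil, nil, nil, err
	}
	m := concurrency.NewMutex(s, pfx)
	if err := m.Lock(ctx); err != nil {
		s.Close()
		return nil, nil, nil, err
	}

	lost := make(chan struct{})
	go func() {
		defer close(lost)
		// Guard 2: watch our own lock key (m.Key()) for deletion; WithFilterPut
		// leaves only delete events on the channel.
		wch := cli.Watch(clientv3.WithRequireLeader(context.Background()),
			m.Key(), clientv3.WithFilterPut())
		select {
		case <-s.Done(): // Guard 1: the session's lease expired or was closed.
		case <-wch: // The key was deleted externally (or the watch itself died).
		}
	}()
	return m, s, lost, nil
}

The caller would select on the returned channel while doing the protected work, and call m.Unlock plus s.Close once it is finished.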