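Having covered how the Bitcoin P2P network is formed, this article turns to the protocol messages exchanged between peers. A Bitcoin node broadcasts transactions and blocks to the whole network by synchronizing them from peer to peer, which is exactly what the Bitcoin protocol is designed to do. To establish and maintain peer relationships, the protocol also defines messages such as the ping/pong heartbeat and getaddr/addr, all of which came up in the earlier articles. The message definitions live in the btcd/wire package. wire does not define protocol interaction, that is, it contains no logic for handling or responding to a received message; it only defines the message formats and the methods that encode and decode them. Responses and interactions are implemented in serverPeer, and the block-handling logic they rely on is implemented in blockmanager and btcd/blockchain. This article focuses on the protocol messages defined in btcd/wire and briefly outlines how they interact; the detailed response and interaction flow is left until after we have analyzed block processing and consensus in btcd/blockchain.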
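btcd/wire mainly contains the following files:
- protocol.go: defines constants such as the Bitcoin protocol version numbers, the network identifiers, and the ServiceFlag values;
- common.go: defines the methods that read and write basic data types from and to a binary stream, as well as variable-length integers and variable-length strings;
- message.go: defines the Message interface and the "factory methods" for encoding and decoding messages;
- msgXXX.go: defines the format of each concrete message and its implementation of the interface methods;
- blockheader.go: defines the BlockHeader type, used by the block and headers messages;
- invvect.go: defines the InvVect type, used by the inv message;
- netaddress.go: defines the NetAddress type;
- msgXXX_test.go: the test files for the corresponding messages;
- doc.go: the package documentation for btcd/wire;
Readers can consult the "Protocol documentation" page on the Bitcoin wiki for a complete picture of each message format, which will help with the code analysis that follows. Let us start with the definitions of the Message interface and the message header: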
//btcd/wire/message.go
// Message is an interface that describes a bitcoin message. A type that
// implements Message has complete control over the representation of its data
// and may therefore contain additional or fewer fields than those which
// are used directly in the protocol encoded message.
type Message interface {
BtcDecode(io.Reader, uint32) error
BtcEncode(io.Writer, uint32) error
Command() string
MaxPayloadLength(uint32) uint32
}
......
// messageHeader defines the header structure for all bitcoin protocol messages.
type messageHeader struct {
magic BitcoinNet // 4 bytes
command string // 12 bytes
length uint32 // 4 bytes
checksum [4]byte // 4 bytes
}
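As shown, the header of a Bitcoin message contains four fields, with the following meanings:
- magic: the "magic number" that marks a Bitcoin protocol message and also distinguishes between Bitcoin networks: MainNet, TestNet, TestNet3, and SimNet. A node can choose which network to run on at startup;
- command: the command string carried by every protocol message, such as version or addr, which identifies the message type;
- length: the length of the message payload;
- checksum: the first 4 bytes of the double SHA256 hash of the message payload;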
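The structure of a protocol message is shown in the figure below:
BtcDecode() and BtcEncode() in the Message interface define how the message payload is decoded and encoded. Each concrete message type implements them; their job is to serialize a structured payload into a byte stream, or to instantiate a message from a byte stream, using the methods in common.go that read and write basic data types. Let us look at those methods first.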
//btcd/wire/common.go
// readElement reads the next sequence of bytes from r using little endian
// depending on the concrete type of element pointed to.
func readElement(r io.Reader, element interface{}) error {
// Attempt to read the element based on the concrete type via fast
// type assertions first.
switch e := element.(type) {
case *int32:
rv, err := binarySerializer.Uint32(r, littleEndian)
if err != nil {
return err
}
*e = int32(rv)
return nil
......
case *bool:
rv, err := binarySerializer.Uint8(r)
if err != nil {
return err
}
if rv == 0x00 {
*e = false
} else {
*e = true
}
return nil
......
// Message header checksum.
case *[4]byte:
_, err := io.ReadFull(r, e[:])
if err != nil {
return err
}
return nil
// Message header command.
case *[CommandSize]uint8:
_, err := io.ReadFull(r, e[:])
if err != nil {
return err
}
return nil
......
}
// Fall back to the slower binary.Read if a fast path was not available
// above.
return binary.Read(r, littleEndian, element)
}
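The main flow is to determine, through a type switch, the data type of the bytes to be read, read as many bytes as that type occupies, and convert them into a value of that type. writeElement() is the exact reverse, so we will not repeat it. Note that basic types such as uint8, uint32, and uint64 are read and written through binarySerializer's methods rather than by calling the Read() or Write() method of the io.Reader or io.Writer directly, and that these types are serialized in little-endian byte order. binarySerializer is a channel buffered to hold 1024 byte slices, each with a capacity of 8 bytes. Here the channel is not used for communication between goroutines but as a cache queue. To avoid frequently allocating and freeing memory while serializing or deserializing basic types, binarySerializer provides a fixed-size "buffer pool": a caller "borrows" a byte slice of the required size and "returns" it when done. Although the pool has a fixed size, a request made after the pool is exhausted does not block; the slice is allocated directly from memory instead and left to the GC after use.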
//btcd/wire/common.go
// binaryFreeList defines a concurrent safe free list of byte slices (up to the
// maximum number defined by the binaryFreeListMaxItems constant) that have a
// cap of 8 (thus it supports up to a uint64). It is used to provide temporary
// buffers for serializing and deserializing primitive numbers to and from their
// binary encoding in order to greatly reduce the number of allocations
// required.
//
// For convenience, functions are provided for each of the primitive unsigned
// integers that automatically obtain a buffer from the free list, perform the
// necessary binary conversion, read from or write to the given io.Reader or
// io.Writer, and return the buffer to the free list.
type binaryFreeList chan []byte
// Borrow returns a byte slice from the free list with a length of 8. A new
// buffer is allocated if there are not any available on the free list.
func (l binaryFreeList) Borrow() []byte {
var buf []byte
select {
case buf = <-l:
default:
buf = make([]byte, 8)
}
return buf[:8]
}
// Return puts the provided byte slice back on the free list. The buffer MUST
// have been obtained via the Borrow function and therefore have a cap of 8.
func (l binaryFreeList) Return(buf []byte) {
select {
case l <- buf:
default:
// Let it go to the garbage collector.
}
}
......
// binarySerializer provides a free list of buffers to use for serializing and
// deserializing primitive integer values to and from io.Readers and io.Writers.
var binarySerializer binaryFreeList = make(chan []byte, binaryFreeListMaxItems)
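In Borrow() and Return() above, each select has a default branch, so neither borrowing from an empty pool nor returning to a full pool blocks. We can also see that a byte slice borrowed from the pool has a capacity of 8 bytes, enough to hold a value as large as a uint64. We could of course design our own data structure for the pool, such as a queue of []byte or a heap, but a channel is the more concise choice. Note that a slice, unlike an array, refers to an underlying array, so passing a slice into Return() or out of Borrow() does not copy the array. To see how a basic data type is serialized, let us look at binaryFreeList's PutUint16():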
//btcd/wire/common.go
// PutUint16 serializes the provided uint16 using the given byte order into a
// buffer from the free list and writes the resulting two bytes to the given
// writer.
func (l binaryFreeList) PutUint16(w io.Writer, byteOrder binary.ByteOrder, val uint16) error {
buf := l.Borrow()[:2]
byteOrder.PutUint16(buf, val)
_, err := w.Write(buf)
l.Return(buf)
return err
}
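Besides the basic data types, the Bitcoin protocol also defines a variable-length integer to reduce the amount of data transmitted. It is defined as follows:
Value | Length (bytes) | Format | Description |
---|---|---|---|
< 0xFD | 1 | uint8 | Encoded in a single byte |
<= 0xFFFF | 3 | 0xFD + uint16 | Encoded in 3 bytes: 0xFD followed by the integer's bytes |
<= 0xFFFF FFFF | 5 | 0xFE + uint32 | Encoded in 5 bytes: 0xFE followed by the integer's bytes |
<= 0xFFFF FFFF FFFF FFFF | 9 | 0xFF + uint64 | Encoded in 9 bytes: 0xFF followed by the integer's bytes |
The serialization method WriteVarInt() makes this definition concrete: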
//btcd/wire/common.go
// WriteVarInt serializes val to w using a variable number of bytes depending
// on its value.
func WriteVarInt(w io.Writer, pver uint32, val uint64) error {
if val < 0xfd {
return binarySerializer.PutUint8(w, uint8(val))
}
if val <= math.MaxUint16 {
err := binarySerializer.PutUint8(w, 0xfd)
if err != nil {
return err
}
return binarySerializer.PutUint16(w, littleEndian, uint16(val))
}
if val <= math.MaxUint32 {
err := binarySerializer.PutUint8(w, 0xfe)
if err != nil {
return err
}
return binarySerializer.PutUint32(w, littleEndian, uint32(val))
}
err := binarySerializer.PutUint8(w, 0xff)
if err != nil {
return err
}
return binarySerializer.PutUint64(w, littleEndian, val)
}
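As shown, WriteVarInt() follows the definition exactly, encoding the integer into a byte sequence whose length depends on the range the value falls in. ReadVarInt() is the exact reverse, and readers can analyze it on their own. Besides variable-length integers, there are variable-length strings and variable-length byte arrays, both implemented by prefixing the string or byte array with a variable-length integer that gives its actual length. Some message formats define variable-length integers or strings, as we will see when we analyze the BtcEncode() or BtcDecode() of individual messages.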
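With the reading and writing of basic data types covered, we can go on to analyze how a Message is encoded and decoded. This corresponds to the WriteMessage() and ReadMessage() methods, which call WriteMessageN() and ReadMessageN() respectively. We analyze ReadMessageN() to understand decoding; WriteMessageN() is the reverse process and is not covered further.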
//btcd/wire/message.go
// ReadMessageN reads, validates, and parses the next bitcoin Message from r for
// the provided protocol version and bitcoin network. It returns the number of
// bytes read in addition to the parsed Message and raw bytes which comprise the
// message. This function is the same as ReadMessage except it also returns the
// number of bytes read.
func ReadMessageN(r io.Reader, pver uint32, btcnet BitcoinNet) (int, Message, []byte, error) {
totalBytes := 0
n, hdr, err := readMessageHeader(r) // (1)
totalBytes += n
if err != nil {
return totalBytes, nil, nil, err
}
// Enforce maximum message payload.
if hdr.length > MaxMessagePayload { // (2)
str := fmt.Sprintf("message payload is too large - header "+
"indicates %d bytes, but max message payload is %d "+
"bytes.", hdr.length, MaxMessagePayload)
return totalBytes, nil, nil, messageError("ReadMessage", str)
}
// Check for messages from the wrong bitcoin network.
if hdr.magic != btcnet { // (3)
discardInput(r, hdr.length)
str := fmt.Sprintf("message from other network [%v]", hdr.magic)
return totalBytes, nil, nil, messageError("ReadMessage", str)
}
// Check for malformed commands.
command := hdr.command
if !utf8.ValidString(command) { // (4)
discardInput(r, hdr.length)
str := fmt.Sprintf("invalid command %v", []byte(command))
return totalBytes, nil, nil, messageError("ReadMessage", str)
}
// Create struct of appropriate message type based on the command.
msg, err := makeEmptyMessage(command) // (5)
if err != nil {
discardInput(r, hdr.length)
return totalBytes, nil, nil, messageError("ReadMessage",
err.Error())
}
// Check for maximum length based on the message type as a malicious client
// could otherwise create a well-formed header and set the length to max
// numbers in order to exhaust the machine's memory.
mpl := msg.MaxPayloadLength(pver)
if hdr.length > mpl { // (6)
discardInput(r, hdr.length)
str := fmt.Sprintf("payload exceeds max length - header "+
"indicates %v bytes, but max payload size for "+
"messages of type [%v] is %v.", hdr.length, command, mpl)
return totalBytes, nil, nil, messageError("ReadMessage", str)
}
// Read payload.
payload := make([]byte, hdr.length)
n, err = io.ReadFull(r, payload) // (7)
totalBytes += n
if err != nil {
return totalBytes, nil, nil, err
}
// Test checksum.
checksum := chainhash.DoubleHashB(payload)[0:4] // (8)
if !bytes.Equal(checksum[:], hdr.checksum[:]) {
str := fmt.Sprintf("payload checksum failed - header "+
"indicates %v, but actual checksum is %v.",
hdr.checksum, checksum)
return totalBytes, nil, nil, messageError("ReadMessage", str)
}
// Unmarshal message. NOTE: This must be a *bytes.Buffer since the
// MsgVersion BtcDecode function requires it.
pr := bytes.NewBuffer(payload)
err = msg.BtcDecode(pr, pver) // (9)
if err != nil {
return totalBytes, nil, nil, err
}
return totalBytes, msg, payload, nil
}
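The main steps are:
- Read and parse the message header, as at (1) in the code. Note that the io.Reader here is actually a net.Conn object, so this reads from the TCP socket;
- Check whether the payload length declared in the header exceeds the 32 MB limit. If it does, the packet may be malicious, so parsing stops and an error is returned;
- Next, check the magic number in the header. If the packet did not come from the expected network (MainNet or TestNet, for example), it is discarded;
- Check the command field in the header. If it contains invalid UTF-8 characters, the packet is discarded;
- Next, construct an empty message of the type named by command, in preparation for decoding the payload, as at (5);
- Before reading the payload, further check whether the declared payload length exceeds the maximum allowed for that message type, and discard the packet if it does;
- Once these checks pass, read the payload, as at (7). Before the payload is finally decoded, verify its hash checksum to detect whether it has been altered, as at (8);
- Finally, call the Message "abstract method" BtcDecode() to decode the payload, and return the result if decoding succeeds;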
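As shown, decoding depends on each concrete message type's BtcDecode() and MaxPayloadLength() implementations; likewise, encoding depends on BtcEncode() and Command(). With the encoding and decoding process understood, we can start on the concrete message types. The Bitcoin protocol defines about 27 messages. The core ones are version, verack, addr, inv, getdata, notfound, getblocks, getheaders, tx, block, headers, getaddr, ping, and pong, 14 in all; the rest are either deprecated or were added in later versions through BIPs. To make the discussion easier to follow, we first give an interaction diagram of the core messages so that readers have an intuitive picture.
In the diagram, green and red denote the two directions of communication, and one green and one red together make up one message exchange. Peers exchange version and verack to negotiate the protocol version and use ping and pong to keep the connection alive, as described in 《Btcd区块在P2P网络上的传播之Peer》 ("Btcd Block Propagation on the P2P Network: Peer"). Peers synchronize their address stores through getaddr and addr, as mentioned in the discussion of AddrManager. Most importantly, peers synchronize transactions and blocks through the getblocks-inv-getdata-block|tx exchange and thereby reach consensus on the blockchain. Some lightweight Bitcoin clients do not want to download the full transaction history on the blockchain and download only block headers, which they synchronize through the getheaders and headers messages. When a node receives a new transaction or block, it can proactively announce it to its peers with an inv message. The alert message was used by core nodes to broadcast notices to the whole network and is now deprecated. Next, we mainly cover the formats of the version, inv, getblocks, getdata, block, and tx messages.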
version
The format of the version message is defined as follows:
//btcd/wire/msgversion.go
// MsgVersion implements the Message interface and represents a bitcoin version
// message. It is used for a peer to advertise itself as soon as an outbound
// connection is made. The remote peer then uses this information along with
// its own to negotiate. The remote peer must then respond with a version
// message of its own containing the negotiated values followed by a verack
// message (MsgVerAck). This exchange must take place before any further
// communication is allowed to proceed.
type MsgVersion struct {
// Version of the protocol the node is using.
ProtocolVersion int32
// Bitfield which identifies the enabled services.
Services ServiceFlag
// Time the message was generated. This is encoded as an int64 on the wire.
Timestamp time.Time
// Address of the remote peer.
AddrYou NetAddress
// Address of the local peer.
AddrMe NetAddress
// Unique value associated with message that is used to detect self
// connections.
Nonce uint64
// The user agent that generated messsage. This is a encoded as a varString
// on the wire. This has a max length of MaxUserAgentLen.
UserAgent string
// Last block seen by the generator of the version message.
LastBlock int32
// Don't announce transactions to peer.
DisableRelayTx bool
}
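The meaning of each field is explained clearly in the comments, so we will not repeat it. Note that in newer Bitcoin client implementations (Satoshi:0.14.1 and later), AddrMe no longer carries the local IP and port, because a node may reach the network through a proxy, in which case its local address is meaningless. Nonce guards against a node sending a version message to itself, as analyzed in 《Btcd区块在P2P网络上的传播之Peer》 ("Btcd Block Propagation on the P2P Network: Peer"). UserAgent is encoded as a variable-length string and can be used to tell client implementations apart. The current default UserAgent is "/btcwire:0.5.0/", and more can be appended with the AddUserAgent() method; for example, the UserAgent of the current btcd implementation is "/btcwire:0.5.0/0.12.0". We can capture a version packet with Wireshark to see the concrete packet format (note that the client used for the capture has the UserAgent "/btcwire:0.3.0/0.12.0"):
The definition of MsgVersion refers to NetAddress, which is defined as follows: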
//btcd/wire/netaddress.go
// NetAddress defines information about a peer on the network including the time
// it was last seen, the services it supports, its IP address, and port.
type NetAddress struct {
// Last time the address was seen. This is, unfortunately, encoded as a
// uint32 on the wire and therefore is limited to 2106. This field is
// not present in the bitcoin version message (MsgVersion) nor was it
// added until protocol version >= NetAddressTimeVersion.
Timestamp time.Time
// Bitfield which identifies the services supported by the address.
Services ServiceFlag
// IP address of the peer.
IP net.IP
// Port the peer is using. This is encoded in big endian on the wire
// which differs from most everything else.
Port uint16
}
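The fields have the following meanings:
- Timestamp: the most recent time at which the node learned of this address from an "external" source. The earlier that time is, the older the address record and the more likely the address is no longer valid. Note that the sender address (AddrMe) and the receiver address (AddrYou) in a version message do not include this field;
- Services: the services the node supports, that is, the node type; the flags include SFNodeNetwork, SFNodeGetUTXO, and SFNodeBloom;
- IP: the IP address;
- Port: the port number;
Once the format of version is familiar, BtcEncode() and BtcDecode() are easy to follow: they simply call methods such as writeElement() and readElement() to write and read the different data types. MsgVersion's BtcEncode() and BtcDecode() are simple enough that we do not analyze them separately.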
inv
The inv message is defined as follows:
//btcd/wire/msginv.go
// MsgInv implements the Message interface and represents a bitcoin inv message.
// It is used to advertise a peer's known data such as blocks and transactions
// through inventory vectors. It may be sent unsolicited to inform other peers
// of the data or in response to a getblocks message (MsgGetBlocks). Each
// message is limited to a maximum number of inventory vectors, which is
// currently 50,000.
//
// Use the AddInvVect function to build up the list of inventory vectors when
// sending an inv message to another peer.
type MsgInv struct {
InvList []*InvVect
}
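inv is mainly used to announce blocks or transactions to a peer. It is the response to a getblocks message and can also be sent unsolicited. The inv payload consists of a variable-length integer Count, giving the number of InvVect entries, and the InvVect list itself. InvVect is defined as follows: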
//btcd/wire/invvect.go
// InvVect defines a bitcoin inventory vector which is used to describe data,
// as specified by the Type field, that a peer wants, has, or does not have to
// another peer.
type InvVect struct {
Type InvType // Type of data
Hash chainhash.Hash // Hash of the data
}
......
// These constants define the various supported inventory vector types.
const (
InvTypeError InvType = 0
InvTypeTx InvType = 1
InvTypeBlock InvType = 2
InvTypeFilteredBlock InvType = 3
)
InvVect contains two fields:
- Type: the type of the data, such as Tx, Block, or FilteredBlock;
- Hash: the hash of the data, such as the hash of a transaction or of a block header;
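Putting the two pieces together, an inv payload is a varint count followed by 36 bytes per entry (a 4-byte type and a 32-byte hash). The sketch below shows that layout with hypothetical names (`invVect`, `putVarInt`, `encodeInv`) and made-up hash bytes; it does not enforce the per-message limit on the number of entries.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

const (
	invTypeTx    uint32 = 1
	invTypeBlock uint32 = 2
)

// invVect mirrors the two fields of wire.InvVect.
type invVect struct {
	Type uint32
	Hash [32]byte
}

// putVarInt appends n as a Bitcoin variable-length integer.
func putVarInt(buf *bytes.Buffer, n uint64) {
	switch {
	case n < 0xfd:
		buf.WriteByte(byte(n))
	case n <= 0xffff:
		buf.WriteByte(0xfd)
		binary.Write(buf, binary.LittleEndian, uint16(n))
	case n <= 0xffffffff:
		buf.WriteByte(0xfe)
		binary.Write(buf, binary.LittleEndian, uint32(n))
	default:
		buf.WriteByte(0xff)
		binary.Write(buf, binary.LittleEndian, n)
	}
}

// encodeInv writes the count, then each (type, hash) pair in order.
func encodeInv(list []invVect) []byte {
	var buf bytes.Buffer
	putVarInt(&buf, uint64(len(list)))
	for _, iv := range list {
		binary.Write(&buf, binary.LittleEndian, iv.Type)
		buf.Write(iv.Hash[:])
	}
	return buf.Bytes()
}

func main() {
	list := []invVect{
		{Type: invTypeTx, Hash: [32]byte{0xaa}},    // made-up hash bytes
		{Type: invTypeBlock, Hash: [32]byte{0xbb}}, // made-up hash bytes
	}
	payload := encodeInv(list)
	fmt.Printf("%d bytes, count byte %#x\n", len(payload), payload[0])
}
```

The getdata and notfound messages carry the same kind of list, so the same layout would apply to them.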
getblocks
The getblocks message is defined as follows:
//btcd/wire/msggetblocks.go
type MsgGetBlocks struct {
ProtocolVersion uint32
BlockLocatorHashes []*chainhash.Hash
HashStop chainhash.Hash
}
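The fields have the following meanings:
- ProtocolVersion: the protocol version;
- BlockLocatorHashes: holds a BlockLocator, which locates the block given by the first element of the list within the blockchain;
- HashStop: the end of the block range requested by getblocks;
The blocks requested by getblocks lie between the block the BlockLocator points to and the block HashStop points to, excluding the former. If HashStop is zero, up to 500 blocks following the block the BlockLocator points to are returned. This requires an understanding of BlockLocator, which is defined as follows: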
//btcd/blockchain/blocklocator.go
// BlockLocator is used to help locate a specific block. The algorithm for
// building the block locator is to add the hashes in reverse order until
// the genesis block is reached. In order to keep the list of locator hashes
// to a reasonable number of entries, first the most recent previous 10 block
// hashes are added, then the step is doubled each loop iteration to
// exponentially decrease the number of hashes as a function of the distance
// from the block being located.
//
// For example, assume you have a block chain with a side chain as depicted
// below:
// genesis -> 1 -> 2 -> ... -> 15 -> 16 -> 17 -> 18
// \-> 16a -> 17a
//
// The block locator for block 17a would be the hashes of blocks:
// [17a 16a 15 14 13 12 11 10 9 8 6 2 genesis]
type BlockLocator []*chainhash.Hash
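As shown, BlockLocator is a slice of *chainhash.Hash that records the hashes of a set of blocks; the first element of the slice is the block the BlockLocator points to. Because the blockchain may fork, the BlockLocator identifies the block's position by recording a path from that block back toward the genesis block: the first 10 hashes are those of the specified block and the 9 blocks before it (at lower heights), one step apart. From the 11th element on, the step grows geometrically, doubling on each move backward, so the walk reaches the genesis block quickly and the BlockLocator does not hold too many elements. In short, a BlockLocator records the position of the block given by its first element.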
The BtcEncode method of MsgGetBlocks shows the format of the getblocks payload:
//btcd/wire/msggetblocks.go
// BtcEncode encodes the receiver to w using the bitcoin protocol encoding.
// This is part of the Message interface implementation.
func (msg *MsgGetBlocks) BtcEncode(w io.Writer, pver uint32) error {
count := len(msg.BlockLocatorHashes)
if count > MaxBlockLocatorsPerMsg {
str := fmt.Sprintf("too many block locator hashes for message "+
"[count %v, max %v]", count, MaxBlockLocatorsPerMsg)
return messageError("MsgGetBlocks.BtcEncode", str)
}
err := writeElement(w, msg.ProtocolVersion)
if err != nil {
return err
}
err = WriteVarInt(w, pver, uint64(count))
if err != nil {
return err
}
for _, hash := range msg.BlockLocatorHashes {
err = writeElement(w, hash)
if err != nil {
return err
}
}
return writeElement(w, &msg.HashStop)
}
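As shown, MsgGetBlocks is serialized by writing, in order, the protocol version, the number of hashes in the BlockLocator, the list of BlockLocator hashes, and the stop hash. This is the format of the getblocks payload.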
getdata
The getdata message is defined as follows:
//btcd/wire/msggetdata.go
// MsgGetData implements the Message interface and represents a bitcoin
// getdata message. It is used to request data such as blocks and transactions
// from another peer. It should be used in response to the inv (MsgInv) message
// to request the actual data referenced by each inventory vector the receiving
// peer doesn't already have. Each message is limited to a maximum number of
// inventory vectors, which is currently 50,000. As a result, multiple messages
// must be used to request larger amounts of data.
//
// Use the AddInvVect function to build up the list of inventory vectors when
// sending a getdata message to another peer.
type MsgGetData struct {
InvList []*InvVect
}
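After receiving an inv announcement from a peer, a node that finds blocks or transactions it does not yet have can send the peer a getdata request to fetch them. getdata is simple and resembles inv: its payload contains an InvVect list naming the hashes of the blocks or transactions the node wants. On receiving it, the peer replies with block or tx messages that deliver the blocks or transactions to the node.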
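The decision of what to request can be pictured as a simple filter over the announced inventory. The sketch below is only an illustration of that idea under simplifying assumptions: `buildGetData` is a hypothetical helper, and the `have` map stands in for whatever lookup a real node performs against its transaction pool and blockchain.

```go
package main

import "fmt"

// invVect mirrors the two fields of wire.InvVect.
type invVect struct {
	Type uint32
	Hash [32]byte
}

// buildGetData returns the announced items that are not yet known locally,
// in the order they were announced. These would form the getdata InvList.
func buildGetData(advertised []invVect, have map[[32]byte]bool) []invVect {
	var want []invVect
	for _, iv := range advertised {
		if !have[iv.Hash] {
			want = append(want, iv)
		}
	}
	return want
}

func main() {
	// Made-up hashes for illustration.
	known := [32]byte{0x01}
	fresh := [32]byte{0x02}

	advertised := []invVect{{Type: 1, Hash: known}, {Type: 2, Hash: fresh}}
	have := map[[32]byte]bool{known: true}

	for _, iv := range buildGetData(advertised, have) {
		fmt.Printf("request type=%d hash=%x...\n", iv.Type, iv.Hash[:4])
	}
}
```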
tx
The tx message is used to synchronize transactions between peers. It is defined as follows:
//btcd/wire/msgtx.go
// MsgTx implements the Message interface and represents a bitcoin tx message.
// It is used to deliver transaction information in response to a getdata
// message (MsgGetData) for a given transaction.
//
// Use the AddTxIn and AddTxOut functions to build up the list of transaction
// inputs and outputs.
type MsgTx struct {
Version int32
TxIn []*TxIn
TxOut []*TxOut
LockTime uint32
}
......
// TxIn defines a bitcoin transaction input.
type TxIn struct {
PreviousOutPoint OutPoint
SignatureScript []byte
Sequence uint32
}
......
// OutPoint defines a bitcoin data type that is used to track previous
// transaction outputs.
type OutPoint struct {
Hash chainhash.Hash
Index uint32
}
......
// TxOut defines a bitcoin transaction output.
type TxOut struct {
Value int64
PkScript []byte
}
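From the definition of MsgTx, a transaction mainly consists of a list of TxIn and a list of TxOut. A TxIn refers to a UTXO of an earlier (input) transaction, and a TxOut is a UTXO created by the current transaction. The fields of MsgTx have the following meanings:
- Version: the transaction version, currently 1. Higher versions interpret LockTime and the Sequence of a TxIn differently;
- TxIn: the UTXO(s) of the input transactions being referenced. PreviousOutPoint contains the hash of the previous transaction and an Index, which is the position of the output within that transaction (a transaction may have several output UTXOs, numbered from 0). SignatureScript is the unlocking script. Sequence is the sequence number of the input: among versions of the same transaction, miners prefer the one with the larger Sequence for inclusion in a block, but a value of 0xffffffff means the transaction can be added to any block;
- TxOut: the output UTXO(s) of the current transaction. Each contains a locking script (PkScript) and the amount of bitcoin, where Value is denominated in satoshis, one hundred-millionth of a bitcoin. The Index in PreviousOutPoint is the index into the previous transaction's []*TxOut;
- LockTime: either a UTC time or a block height. A value below 5 x 10^8 (which as a time would be Tue Nov 5 00:53:20 1985 UTC) denotes a block height. The transaction can only be included in a block above that height or after that time. A value of 0 means the transaction can be added to any block. Transactions of version 2 and above introduce relative lock-time (RLT), which combines LockTime with the Sequence field of a TxIn to control whether a miner may include a transaction in a given block. See BIP0068 for details; we do not go into it here and will return to it when we cover block processing and consensus;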
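Two of these conventions are easy to check with a few lines of code: the threshold that separates block heights from timestamps, and the satoshi unit. The helper names and constants below are illustrative only and are not taken from btcd; relative lock-time is not modeled.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// lockTimeThreshold separates block heights (below) from Unix times.
	lockTimeThreshold = 500000000
	// satoshiPerBitcoin is the number of satoshis in one bitcoin.
	satoshiPerBitcoin = 100000000
)

// describeLockTime interprets an absolute LockTime value.
func describeLockTime(lockTime uint32) string {
	switch {
	case lockTime == 0:
		return "no lock: eligible for any block"
	case lockTime < lockTimeThreshold:
		return fmt.Sprintf("locked until block height %d", lockTime)
	default:
		t := time.Unix(int64(lockTime), 0).UTC()
		return "locked until " + t.Format(time.RFC3339)
	}
}

// formatValue renders a non-negative satoshi amount in bitcoins.
func formatValue(satoshi int64) string {
	return fmt.Sprintf("%d.%08d BTC", satoshi/satoshiPerBitcoin, satoshi%satoshiPerBitcoin)
}

func main() {
	fmt.Println(describeLockTime(0))
	fmt.Println(describeLockTime(420000))
	fmt.Println(describeLockTime(500000000))
	fmt.Println(formatValue(2506238530))
}
```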
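The definition of PreviousOutPoint shows that every transaction refers back to earlier ones, forming a "transaction chain" that ends at a coinbase transaction. The structure of UTXO references was described in 《曲线上的“加密货币”(一)》 ("Cryptocurrency on the Curve, Part 1"). With the definition and structure in hand, let us see how a transaction's hash is computed in MsgTx's TxHash():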
//btcd/wire/msgtx.go
// TxHash generates the Hash for the transaction.
func (msg *MsgTx) TxHash() chainhash.Hash {
// Encode the transaction and calculate double sha256 on the result.
// Ignore the error returns since the only way the encode could fail
// is being out of memory or due to nil pointers, both of which would
// cause a run-time panic.
buf := bytes.NewBuffer(make([]byte, 0, msg.SerializeSize()))
_ = msg.Serialize(buf)
return chainhash.DoubleHashH(buf.Bytes())
}
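As shown, the hash of a transaction is the result of applying SHA256 twice to the byte stream of the whole transaction structure. The Serialize() method simply calls BtcEncode() to serialize the MsgTx. BtcEncode() and BtcDecode() write or read the elements one by one following the definition of MsgTx; the logic is clear, so we will not go through it. Note that BtcDecode() reads locking and unlocking scripts through scriptFreeList, which, like the binaryFreeList described earlier, is a "memory pool" implemented with a channel. Readers can analyze it on their own.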
block
Besides tx, block is the most important concept in btcd/wire. It defines the structure of a block:
//btcd/wire/msgblock.go
// MsgBlock implements the Message interface and represents a bitcoin
// block message. It is used to deliver block and transaction information in
// response to a getdata message (MsgGetData) for a given block hash.
type MsgBlock struct {
Header BlockHeader
Transactions []*MsgTx
}
A block contains a block header and a list of transactions. The block header is defined as:
//btcd/wire/blockheader.go
// BlockHeader defines information about a block and is used in the bitcoin
// block (MsgBlock) and headers (MsgHeaders) messages.
type BlockHeader struct {
// Version of the block. This is not the same as the protocol version.
Version int32
// Hash of the previous block in the block chain.
PrevBlock chainhash.Hash
// Merkle tree reference to hash of all transactions for the block.
MerkleRoot chainhash.Hash
// Time the block was created. This is, unfortunately, encoded as a
// uint32 on the wire and therefore is limited to 2106.
Timestamp time.Time
// Difficulty target for the block.
Bits uint32
// Nonce used to generate the block.
Nonce uint32
}
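The fields have the following meanings:
- Version: the block version, which is different from the protocol version;
- PrevBlock: the hash of the previous block on the chain. Every block points to its predecessor through this field, all the way back to the genesis block, which is what forms the chain;
- MerkleRoot: the root hash of the Merkle tree built from the hashes of all transactions in the block. It covers every transaction in the block; Merkle trees are introduced in a later article;
- Timestamp: the time the block was created;
- Bits: the block's difficulty target in compact form. "Mining" is the search for a Nonce that makes the block hash no greater than this target;
- Nonce: the value varied during "mining" and used when verifying the block's difficulty;
The BtcEncode() method of MsgBlock shows the format of a serialized block: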
//btcd/wire/msgblock.go
// BtcEncode encodes the receiver to w using the bitcoin protocol encoding.
// This is part of the Message interface implementation.
// See Serialize for encoding blocks to be stored to disk, such as in a
// database, as opposed to encoding blocks for the wire.
func (msg *MsgBlock) BtcEncode(w io.Writer, pver uint32) error {
err := writeBlockHeader(w, pver, &msg.Header)
if err != nil {
return err
}
err = WriteVarInt(w, pver, uint64(len(msg.Transactions)))
if err != nil {
return err
}
for _, tx := range msg.Transactions {
err = tx.BtcEncode(w, pver)
if err != nil {
return err
}
}
return nil
}
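As shown, the serialized block consists of the block header, an integer giving the number of transactions, and the list of transactions. Its structure is shown in the figure below:
Note that the block header does not include the transaction count. Because MerkleRoot already covers all the transactions, computing a block's hash does not require hashing the whole block: only the block header is hashed, and the transaction count is not part of it.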
//btcd/wire/msgblock.go
// BlockHash computes the block identifier hash for this block.
func (msg *MsgBlock) BlockHash() chainhash.Hash {
return msg.Header.BlockHash()
}
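Below is a block message captured with Wireshark:
As shown, the formats of block and tx match the description above. Note that the first transaction in the capture is a coinbase transaction: the Hash of its input is all zeros, the Index is 0xffffffff, and the output Value is 25.06238530 bitcoins. The hash values in the capture are all in little-endian order and must be converted to big-endian before being looked up on blockchain.info.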
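This concludes our look at how wire encodes and decodes protocol messages, with a focus on core messages and concepts such as inv, tx, and block. But what does a node do with a transaction or block once it has synchronized it from a peer? And when a node receives a getblocks or getdata request from a peer, how does it find the requested transactions or blocks in its own transaction pool or blockchain? Only by answering these questions can we fully understand the whole Bitcoin protocol exchange, so before continuing with message interaction we will address them in the next article, 《Btcd区块链的构建》 ("Building the Btcd Blockchain").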