[Original] Techniques: Chinese Translation (Google Protocol Buffers Chinese Tutorial)

Techniques


· Streaming Multiple Messages
· Large Data Sets
· Union Types
· Self-describing Messages

This page describes some commonly-used design patterns for dealing with Protocol Buffers. You can also send design and usage questions to the Protocol Buffers discussion group.


Streaming Multiple Messages

If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read that many bytes into a separate buffer, then parse from that buffer. If you want to avoid copying bytes to a separate buffer, check out the CodedInputStream class (in both C++ and Java), which can be told to limit reads to a certain number of bytes.


Large Data Sets

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.


That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.


Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do, while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.



Union Types

You may sometimes want to send a message that could be one of several different types. However, protocol buffer parsers cannot necessarily determine the type of a message based on the contents alone. So how do you make sure that the recipient application knows how to decode your message? One solution is to create a wrapper message that has one optional field for each possible message type.


For example, if you have message types Foo, Bar, and Baz, you can combine them with a type like:


message OneMessage {
  // One of the following will be filled in.
  optional Foo foo = 1;
  optional Bar bar = 2;
  optional Baz baz = 3;
}


You may also want to have an enum field that identifies which message is filled in, so that you can switch on it:


message OneMessage {
  enum Type { FOO = 1; BAR = 2; BAZ = 3; }

  // Identifies which field is filled in.
  required Type type = 1;

  // One of the following will be filled in.
  optional Foo foo = 2;
  optional Bar bar = 3;
  optional Baz baz = 4;
}
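On the receiving side, switching on the type tag might look roughly like the sketch below. It does not use the protobuf library: the `OneMessage` dataclass is a hand-written stand-in for the generated class, the payloads are plain strings, and `dispatch` is a hypothetical helper. The one point it tries to illustrate is that the tag and the filled-in field can disagree, so the receiver should check before trusting the tag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Type(Enum):
    FOO = 1
    BAR = 2
    BAZ = 3


@dataclass
class OneMessage:
    # Stand-in for the generated class; None means "field not set".
    type: Type
    foo: Optional[str] = None
    bar: Optional[str] = None
    baz: Optional[str] = None


def dispatch(msg: OneMessage) -> str:
    """Switch on the type tag, checking that the matching field is filled in."""
    field_name = msg.type.name.lower()  # FOO -> "foo"
    payload = getattr(msg, field_name)
    if payload is None:
        raise ValueError(f"type says {msg.type.name} but '{field_name}' is empty")
    if msg.type is Type.FOO:
        return f"handled Foo: {payload}"
    if msg.type is Type.BAR:
        return f"handled Bar: {payload}"
    return f"handled Baz: {payload}"


result = dispatch(OneMessage(type=Type.BAR, bar="hello"))
```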


If you have a very large number of possible types, listing every one of them in your container type may be unwieldy. Instead, you should consider using extensions:


message OneMessage {
  extensions 100 to max;
}

// Elsewhere...
extend OneMessage {
  optional Foo foo_ext = 100;
  optional Bar bar_ext = 101;
  optional Baz baz_ext = 102;
}


Note that you can use the ListFields reflection method (in C++, Java, and Python) to get a list of all fields present in the message, including extensions. You might use this as part of a scheme for registering handlers for diverse message types.
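A handler-registration scheme of the kind mentioned above could be sketched as follows. Everything here is illustrative: a message is modelled as a plain dict from field name to value, `list_fields` is a stand-in that mimics the idea of ListFields by returning only the fields that are set (it is not the real reflection API), and `register` and `handle` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: field (or extension) name -> handler for its payload.
_handlers: Dict[str, Callable[[object], str]] = {}


def register(field_name: str):
    """Decorator that registers a handler for one field name."""
    def wrap(fn):
        _handlers[field_name] = fn
        return fn
    return wrap


def list_fields(msg: dict) -> List[Tuple[str, object]]:
    """Stand-in for reflection: (name, value) pairs of the fields that are set."""
    return [(name, value) for name, value in msg.items() if value is not None]


def handle(msg: dict) -> List[str]:
    """Run the registered handler for every field present in the message."""
    results = []
    for name, value in list_fields(msg):
        handler = _handlers.get(name)
        if handler is None:
            raise KeyError(f"no handler registered for '{name}'")
        results.append(handler(value))
    return results


@register("foo_ext")
def handle_foo(value):
    return f"Foo({value})"


@register("bar_ext")
def handle_bar(value):
    return f"Bar({value})"


outputs = handle({"foo_ext": 1, "bar_ext": None, "baz_ext": None})
```

With this shape, supporting a new message type means registering one more handler rather than editing a central switch statement.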



Self-describing Messages

Protocol Buffers do not contain descriptions of their own types. Thus, given only a raw message without the corresponding .proto file defining its type, it is difficult to extract any useful data.

However, note that the contents of a .proto file can itself be represented using protocol buffers. The file src/google/protobuf/descriptor.proto in the source code package defines the message types involved. protoc can output a FileDescriptorSet (which represents a set of .proto files) using the --descriptor_set_out option. With this, you could define a self-describing protocol message like so:


message SelfDescribingMessage {
  // Set of .proto files which define the type.
  required FileDescriptorSet proto_files = 1;

  // Name of the message type.  Must be defined by one of the files in
  // proto_files.
  required string type_name = 2;

  // The message data.
  required bytes message_data = 3;
}


By using classes like DynamicMessage (available in C++ and Java), you can then write tools which can manipulate SelfDescribingMessages.

All that said, this functionality is not included in the Protocol Buffer library because we have never had a use for it inside Google.
