A Google Protobuf Network Transport Scheme with Automatic Reflection of Message Types

Chen Shuo (giantchen_AT_gmail)

Blog.csdn.net/Solstice t.sina.com.cn/giantchen

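This article addresses one question: after receiving protobuf data, how do we automatically create the concrete Protobuf Message object and then deserialize into it? "Automatically" means that when a new protobuf Message type is added to the program, this part of the code does not need to change and nobody has to register the new type by hand. In fact, Google Protobuf itself has strong reflection capabilities and can create a Message object of a concrete type from its type name; we can simply use that.

This article assumes the reader knows what Google Protocol Buffers is; it is not a protobuf tutorial.

The examples are in C++. Other languages probably have similar solutions; additions are welcome.

The sample code for this article is at: https://github.com/chenshuo/recipes/tree/master/protobuf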

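Two problems with using protobuf in network programming

Google Protocol Buffers (Protobuf) is an excellent library. It defines a compact, extensible binary message format that is particularly well suited to network data transfer. It provides bindings for many languages, which greatly simplifies the development of distributed programs and frees a system from being written in a single language.

Using protobuf in network programming requires solving two problems:

Length: data packed by protobuf carries no length information or terminator, so the application must split the byte stream correctly when sending and receiving.
Type: data packed by protobuf carries no type information, so the sender must pass the type to the receiver, which then creates the concrete Protobuf Message object and deserializes into it.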

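The first problem is easy to solve. The usual approach is to put a fixed-size length header in front of every message, as in the LengthHeaderCodec I implemented in "Muduo network programming examples, part 2: a Boost.Asio chat server"; the code is at http://code.google.com/p/muduo/source/browse/trunk/examples/asio/chat/codec.h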
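As a rough illustration of the length-header idea, here is a minimal sketch that works on std::string instead of a real network buffer. The function names frame() and unframe() and the 4-byte big-endian header are assumptions made for this example; this is not the LengthHeaderCodec code itself.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Prepend a 4-byte big-endian length header to the payload.
// (Hypothetical helper; the header layout is an assumption of this sketch.)
std::string frame(const std::string& payload)
{
  assert(payload.size() <= 0x7fffffff);  // must fit in a signed 32-bit length
  const uint32_t n = static_cast<uint32_t>(payload.size());
  std::string out;
  out.push_back(static_cast<char>((n >> 24) & 0xFF));
  out.push_back(static_cast<char>((n >> 16) & 0xFF));
  out.push_back(static_cast<char>((n >> 8) & 0xFF));
  out.push_back(static_cast<char>(n & 0xFF));
  out += payload;
  return out;
}

// Remove every complete message from the front of buf and return them.
// An incomplete trailing message stays in buf until more bytes arrive.
std::vector<std::string> unframe(std::string& buf)
{
  std::vector<std::string> messages;
  while (buf.size() >= 4)
  {
    uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
    {
      n = (n << 8) | static_cast<unsigned char>(buf[i]);
    }
    if (buf.size() - 4 < n)
    {
      break;  // header seen, body not fully received yet
    }
    messages.push_back(buf.substr(4, n));
    buf.erase(0, 4 + n);
  }
  return messages;
}
```

The receiver appends whatever bytes arrive to buf and calls unframe() after each read, so a message split across several TCP segments is delivered only once it is complete.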

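The second problem is actually just as easy to solve, because Protobuf has built-in support for it. Strangely, though, a quick search of the web turned up many makeshift solutions.

Makeshift approaches

All of them put a header in front of the protobuf data, containing an int length and type information. There are two main makeshift ways to carry the type information:

Put an int typeId in the header; the receiver uses a switch-case to select the message type and handler.
Put a string typeName in the header; the receiver uses a look-up table to select the message type and handler.

Both approaches have problems.

The first approach requires typeId to be unique and to map one-to-one to protobuf message types. If the messages are used in a narrow scope, for example when sender and receiver are both programs you maintain yourself, uniqueness is easy to guarantee with a version control tool. If the messages are used widely, say across an entire company where distributed programs developed by different departments may talk to each other, then a company-wide authority is needed to allocate typeIds, and every new message type has to be registered with it, which is tedious.

The second approach is slightly better. Uniqueness of typeName is easy to arrange because the package name can be included (that is, use the message's fully qualified type name); departments agree on namespaces in advance, so names neither clash nor repeat. However, every new message type still requires editing the look-up table's initialization code by hand, which is tedious.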
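To make the maintenance cost of the look-up table concrete, here is a small sketch of what such hand-written code tends to look like. The classes Message, Query and Answer below are hypothetical stand-ins, not the generated protobuf classes, and createByTable() is a name invented for this example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for generated message classes.
struct Message
{
  virtual ~Message() = default;
  virtual std::string typeName() const = 0;
};
struct Query : Message
{
  std::string typeName() const override { return "muduo.Query"; }
};
struct Answer : Message
{
  std::string typeName() const override { return "muduo.Answer"; }
};

using Creator = std::function<std::unique_ptr<Message>()>;

// The hand-maintained table: every new message type needs another entry.
const std::map<std::string, Creator>& creators()
{
  static const std::map<std::string, Creator> table = {
    { "muduo.Query",  [] { return std::unique_ptr<Message>(new Query); } },
    { "muduo.Answer", [] { return std::unique_ptr<Message>(new Answer); } },
    // A newly added type (say muduo.Empty) is silently missing until
    // somebody remembers to add a line here.
  };
  return table;
}

// Returns an empty pointer when the type name is not in the table.
std::unique_ptr<Message> createByTable(const std::string& typeName)
{
  assert(!typeName.empty());
  auto it = creators().find(typeName);
  if (it == creators().end())
  {
    return nullptr;
  }
  return it->second();
}
```

Nothing in the compiler or the linker notices a forgotten entry; the failure only shows up at run time when the first message of the new type arrives.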

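In fact there is no need to reinvent the wheel: protobuf already ships with a solution.

Creating Message objects by reflection from the type name

Google Protobuf itself has strong reflection capabilities and can create a Message object of a concrete type from its type name. Strangely, the official tutorial does not mention this usage explicitly, and I suspect many people are unaware of it, so I thought it worth a blog post.

Below is a Protobuf class diagram drawn by Chen Shuo.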

protobuf_classdiagram

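I suspect most people care about and use the left half of the diagram: MessageLite, Message, and the generated message types (Person, AddressBook), and pay less attention to the right half: Descriptor, DescriptorPool, MessageFactory.

In the diagram, the key player is the Descriptor class: every concrete Message type has a corresponding Descriptor object. Although we never call its member functions directly, Descriptor acts as the bridge in "creating a Message object of a concrete type from its type name". The red arrows in the diagram trace that process, which is described in detail below.

How it works in brief

The Protobuf Message class uses the prototype pattern. Message declares a virtual function New() that returns a new instance of the object, of the same dynamic type as the object itself. In other words, given a Message* pointer we can create an object of the same concrete Message type without knowing what that type is.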
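For readers who have not met the prototype pattern, the following sketch shows the mechanism in isolation. FakeMessage and FakeQuery are hypothetical classes written for this example; only the idea of a virtual New() is taken from protobuf.

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class that plays the role of Message.
class FakeMessage
{
 public:
  virtual ~FakeMessage() = default;
  // Returns a new, default-constructed object of the same dynamic type.
  virtual FakeMessage* New() const = 0;
};

// Hypothetical concrete type that plays the role of a generated class.
class FakeQuery : public FakeMessage
{
 public:
  FakeQuery* New() const override { return new FakeQuery; }
};

// Works for any concrete type without naming it: only the base-class
// interface is used.
std::unique_ptr<FakeMessage> cloneType(const FakeMessage& prototype)
{
  std::unique_ptr<FakeMessage> obj(prototype.New());
  assert(obj != nullptr);
  return obj;
}
```

cloneType() never mentions FakeQuery, yet passing it a FakeQuery yields another FakeQuery. This is the property that makes a per-type default instance usable as a factory.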

Every concrete Message type has a default instance, available through ConcreteMessage::default_instance() or through MessageFactory::GetPrototype(const Descriptor*). So the problem becomes: 1. how to get a MessageFactory; 2. how to get a Descriptor*.

Of course ConcreteMessage::descriptor() returns exactly the Descriptor* we want, but how can we call a static member function of ConcreteMessage when we do not know ConcreteMessage? It looks like a chicken-and-egg problem.

Our hero is DescriptorPool, which can look up a Descriptor* by type name: find a suitable DescriptorPool, then call DescriptorPool::FindMessageTypeByName(const string& type_name). See the light?

Before solving the problem for real, let's run a simple test to check that what I said above is right.

A simple test

The proto file used as the example in this article is query.proto, see https://github.com/chenshuo/recipes/blob/master/protobuf/query.proto

package muduo;

message Query {
  required int64 id = 1;
  required string questioner = 2;
  repeated string question = 3;
}

message Answer {
  required int64 id = 1;
  required string questioner = 2;
  required string answerer = 3;
  repeated string solution = 4;
}

message Empty {
  optional int32 id = 1;
}

Query.questioner and Answer.answerer are the process identifiers I described in my previous article, "Process identification in distributed systems".

The following code verifies the invariants among ConcreteMessage::default_instance(), ConcreteMessage::descriptor(), MessageFactory::GetPrototype() and DescriptorPool::FindMessageTypeByName():

https://github.com/chenshuo/recipes/blob/master/protobuf/descriptor_test.cc#L15

typedef muduo::Query T;

// The Descriptor found by name must be the same object the generated class exposes.
std::string type_name = T::descriptor()->full_name();
cout << type_name << endl;

const Descriptor* descriptor = DescriptorPool::generated_pool()->FindMessageTypeByName(type_name);
assert(descriptor == T::descriptor());
cout << "FindMessageTypeByName() = " << descriptor << endl;
cout << "T::descriptor()         = " << T::descriptor() << endl;
cout << endl;

// The prototype returned by the factory must be the type's default instance.
const Message* prototype = MessageFactory::generated_factory()->GetPrototype(descriptor);
assert(prototype == &T::default_instance());
cout << "GetPrototype()        = " << prototype << endl;
cout << "T::default_instance() = " << &T::default_instance() << endl;
cout << endl;

// New() must return a distinct object of the same dynamic type.
T* new_obj = dynamic_cast<T*>(prototype->New());
assert(new_obj != NULL);
assert(new_obj != prototype);
assert(typeid(*new_obj) == typeid(T::default_instance()));
cout << "prototype->New() = " << new_obj << endl;
cout << endl;
delete new_obj;

Key code: creating a Message automatically from its type name

Now everything is in place, so let's get going:

Use DescriptorPool::generated_pool() to get a DescriptorPool object; it contains every protobuf Message type linked into the program at build time.
Use DescriptorPool::FindMessageTypeByName() to look up the Descriptor by type name.
Use MessageFactory::generated_factory() to get a MessageFactory object; it can create every protobuf Message type linked into the program at build time.
Then use MessageFactory::GetPrototype() to get the default instance of the concrete Message type.
Finally, call prototype->New() to create the object.

The sample code is at https://github.com/chenshuo/recipes/blob/master/protobuf/codec.h#L69

Message* createMessage(const std::string& typeName)
{
  Message* message = NULL;
  const Descriptor* descriptor = DescriptorPool::generated_pool()->FindMessageTypeByName(typeName);
  if (descriptor)
  {
    const Message* prototype = MessageFactory::generated_factory()->GetPrototype(descriptor);
    if (prototype)
    {
      message = prototype->New();
    }
  }
  return message;
}

Usage: https://github.com/chenshuo/recipes/blob/master/protobuf/descriptor_test.cc#L49

Message* newQuery = createMessage("muduo.Query");
assert(newQuery != NULL);
assert(typeid(*newQuery) == typeid(muduo::Query::default_instance()));
cout << "createMessage(\"muduo.Query\") = " << newQuery << endl;

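The ancients did not deceive me :-)

Note that createMessage() returns a pointer to a dynamically allocated object, and the caller is responsible for freeing it; otherwise memory leaks. In muduo I use shared_ptr<Message> to manage the lifetime of Message objects automatically.

Thread safety

According to Google's documentation, the MessageFactory and DescriptorPool functions we use are thread-safe, and so is Message::New(). They are all const member functions as well.

With the key problem solved, the remaining work is to design a protobuf transport format that carries the length and the message type.

Protobuf transport format

Chen Shuo designed a simple format that contains the protobuf data together with its length and type information, plus a check sum at the end of the message. The format is shown in the figure below; each box in the figure is 32 bits wide.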

protobuf_wireformat1

In C struct pseudocode:

struct ProtobufTransportFormat __attribute__ ((__packed__))
{
  int32_t  len;
  int32_t  nameLen;
  char     typeName[nameLen];
  char     protobufData[len-nameLen-8];
  int32_t  checkSum; // adler32 of nameLen, typeName and protobufData
};

Note that this format does not require 32-bit alignment; our decoder handles unaligned messages automatically.
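The layout can be made concrete with a small packing sketch. It is not the codec.h implementation: it treats the serialized protobuf bytes as an opaque std::string, writes the integers in big-endian order (an assumption of this example), lets nameLen count the trailing '\0', and spells out Adler-32 by hand so that the example does not depend on zlib. The names adler32(), appendBigEndian32() and encode() are chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Adler-32 as defined in RFC 1950, written out so the sketch needs no zlib.
uint32_t adler32(const std::string& data)
{
  const uint32_t kMod = 65521;
  uint32_t a = 1, b = 0;
  for (unsigned char c : data)
  {
    a = (a + c) % kMod;
    b = (b + a) % kMod;
  }
  return (b << 16) | a;
}

// Assumption of this sketch: integers are big-endian on the wire.
void appendBigEndian32(std::string* out, uint32_t v)
{
  out->push_back(static_cast<char>((v >> 24) & 0xFF));
  out->push_back(static_cast<char>((v >> 16) & 0xFF));
  out->push_back(static_cast<char>((v >> 8) & 0xFF));
  out->push_back(static_cast<char>(v & 0xFF));
}

// typeName is the fully qualified type name; protobufData stands for the
// bytes that serializing the message would produce.
std::string encode(const std::string& typeName, const std::string& protobufData)
{
  assert(!typeName.empty());

  // The check-summed part: nameLen, typeName (with '\0'), protobufData.
  std::string body;
  const uint32_t nameLen = static_cast<uint32_t>(typeName.size() + 1);
  appendBigEndian32(&body, nameLen);
  body.append(typeName.c_str(), nameLen);  // copies the trailing '\0' too
  body += protobufData;

  // len counts everything after itself: body plus the 4-byte check sum.
  std::string result;
  appendBigEndian32(&result, static_cast<uint32_t>(body.size() + 4));
  result += body;
  appendBigEndian32(&result, adler32(body));
  return result;
}
```

With these conventions the protobuf data occupies len - nameLen - 8 bytes, which matches the array bound in the struct above.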

Example

Packing a muduo.Query object in this format gives:

protobuf_wireexample

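Design decisions

These were my considerations when designing the transport format:

signed int. The length fields use signed 32-bit ints rather than unsigned ints. This is for portability, because Java has no unsigned types. Besides, Protobuf is generally used for packing data smaller than 1 MB, so the extra range of unsigned int would be of no use.
check sum. Although TCP is a reliable transport protocol and Ethernet has a CRC-32 check, network transfer must still allow for corrupted data; for critical network applications a check sum is indispensable. With a compact binary format like protobuf, you cannot tell by eye whether the data is damaged, so a check sum is needed.
adler32. I chose adler32 instead of the more common CRC-32 because it is cheap to compute and fairly fast, and its strength is roughly comparable to CRC-32 (it is somewhat weaker on very short messages). In addition, zlib and java.util.zip both support it directly, so we do not have to implement it ourselves.
type name ends with '\0'. This makes troubleshooting easier: in a packet captured with tcpdump the type name can be read by eye, without counting bytes according to nameLen. At the same time nameLen is included for the receiver's convenience; it saves a strlen() call, trading space for time.
No version number. A prominent advantage of Protobuf Message is that optional fields remove the need for a protocol version number (anyone who puts a version number in a protobuf Message has not understood protobuf's design), so the programs on both sides can be upgraded independently and the system can evolve. Adding a version number to this transport format would be superfluous. For details see page 57, "Choice of message format", of my "Engineering development methods for distributed systems".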
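The checks that these decisions imply on the receiving side can be sketched as follows. This is an illustration under assumptions, not the decode code in codec.h: integers are read as big-endian, nameLen is taken to include the trailing '\0', Adler-32 is written out by hand, and DecodeError, Decoded and decode() are names invented for this example. The input is one complete packet, starting with the len field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Adler-32 as defined in RFC 1950, written out so the sketch needs no zlib.
uint32_t adler32(const std::string& data)
{
  const uint32_t kMod = 65521;
  uint32_t a = 1, b = 0;
  for (unsigned char c : data)
  {
    a = (a + c) % kMod;
    b = (b + a) % kMod;
  }
  return (b << 16) | a;
}

// Assumption of this sketch: integers are big-endian on the wire.
uint32_t readBigEndian32(const std::string& buf, std::size_t pos)
{
  assert(pos + 4 <= buf.size());
  uint32_t value = 0;
  for (std::size_t i = 0; i < 4; ++i)
  {
    value = (value << 8) | static_cast<unsigned char>(buf[pos + i]);
  }
  return value;
}

enum class DecodeError { kOk, kTooShort, kBadLength, kBadChecksum, kBadNameLen, kBadName };

struct Decoded
{
  DecodeError error;
  std::string typeName;      // without the trailing '\0'
  std::string protobufData;  // bytes to hand to the Message's parse function
};

Decoded decode(const std::string& packet)
{
  Decoded result = { DecodeError::kOk, "", "" };
  // Smallest packet: len + nameLen + one-character name + '\0' + check sum.
  if (packet.size() < 4 + 4 + 2 + 4)
  {
    result.error = DecodeError::kTooShort;
    return result;
  }
  // A negative signed length shows up here as a huge value and is rejected.
  const uint32_t len = readBigEndian32(packet, 0);
  if (len != packet.size() - 4)
  {
    result.error = DecodeError::kBadLength;
    return result;
  }
  // body = nameLen + typeName + protobufData, the check-summed part.
  const std::string body = packet.substr(4, len - 4);
  if (adler32(body) != readBigEndian32(packet, packet.size() - 4))
  {
    result.error = DecodeError::kBadChecksum;
    return result;
  }
  const uint32_t nameLen = readBigEndian32(body, 0);
  if (nameLen < 2 || nameLen > body.size() - 4)
  {
    result.error = DecodeError::kBadNameLen;
    return result;
  }
  if (body[4 + nameLen - 1] != '\0')
  {
    result.error = DecodeError::kBadName;
    return result;
  }
  result.typeName.assign(body, 4, nameLen - 1);
  result.protobufData = body.substr(4 + nameLen);
  return result;
}
```

After a successful decode, typeName is what would be passed to createMessage(), and protobufData is what the newly created Message would parse.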

Sample code

For simplicity, std::string is used as the packed result; this is for illustration only.

Encoding code: https://github.com/chenshuo/recipes/blob/master/protobuf/codec.h#L35

Decoding code: https://github.com/chenshuo/recipes/blob/master/protobuf/codec.h#L99

Test code: https://github.com/chenshuo/recipes/blob/master/protobuf/codec_test.cc

If the code above compiles but fails at run time with "cannot open shared object file", running sudo ldconfig usually fixes it, provided that libprotobuf.so is in /usr/local/lib and /etc/ld.so.conf lists that directory.

$ make all # if you have boost installed, you can run make whole

$ ./codec_test
./codec_test: error while loading shared libraries: libprotobuf.so.6: cannot open shared object file: No such file or directory

$ sudo ldconfig

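Integration with muduo

The muduo network library will integrate support for the transport format described here (planned for version 0.1.9). I will write a separate short article on converting between Protobuf Message and muduo::net::Buffer. Packing with muduo::net::Buffer is even simpler than the std::string code above, since it is a buffer class designed specifically for non-blocking network libraries.

In addition, we can write a codec that performs the conversion automatically, just like asio/chat/codec.h. Client code then receives Message objects directly and sends Message objects directly, without dealing with Buffer objects.

Message dispatching

So far we have solved automatic creation of messages. Another common task in network programming is dispatching Messages of different types to different handler functions, which can also be done with the help of Descriptor. In muduo I have implemented two dispatchers, ProtobufDispatcherLite and ProtobufDispatcher, with which users register handlers for different message types. They are planned for release in version 0.1.9; you can have a preview here:

Basic version, where the user does the down casting: https://github.com/chenshuo/recipes/blob/master/protobuf/dispatcher_lite.cc

Advanced version, which uses a template trick to save the user typing: https://github.com/chenshuo/recipes/blob/master/protobuf/dispatcher.cc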
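The template trick can be illustrated without protobuf. The sketch below is not the muduo dispatcher: it uses hypothetical Message, Query and Answer classes and keys the table by std::type_index, where a protobuf-based dispatcher would use the Descriptor mentioned above. The Dispatcher class and its member names are invented for this example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical stand-ins for protobuf's Message and two generated types.
struct Message { virtual ~Message() = default; };
struct Query : Message { std::string question; };
struct Answer : Message { std::string solution; };

class Dispatcher
{
 public:
  // The user registers a handler that takes the concrete type; the
  // down cast is written once here instead of in every handler.
  template <typename T>
  void registerCallback(std::function<void(const T&)> callback)
  {
    assert(callback);
    callbacks_[std::type_index(typeid(T))] =
        [callback](const Message& message) {
          callback(static_cast<const T&>(message));
        };
  }

  // Returns false when no handler is registered for the dynamic type.
  bool dispatch(const Message& message) const
  {
    auto it = callbacks_.find(std::type_index(typeid(message)));
    if (it == callbacks_.end())
    {
      return false;
    }
    it->second(message);
    return true;
  }

 private:
  std::map<std::type_index, std::function<void(const Message&)>> callbacks_;
};
```

A handler is registered with dispatcher.registerCallback<Query>(...) and receives a const Query& directly. The static_cast is safe because the entry is only looked up when the dynamic type of the message is exactly T.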

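Protobuf RPC based on muduo?

Google Protobuf also supports RPC, but unfortunately it provides only a framework and the network-related code has not been open-sourced; muduo could fill that gap nicely. I have not yet decided whether muduo should support RPC with protobuf messages as the message format. muduo still has a lot to be done and I have many blog posts planned, so RPC will have to wait.

Note: Remote Procedure Call (RPC) has a narrow and a broad meaning. In the narrow sense it usually refers to ONC RPC, the thing used to implement NFS. In the broad sense, anything that "does network communication in the name of a function call" can be called RPC, for example Java RMI, .Net Remoting, Apache Thrift, libevent RPC, XML-RPC and so on.

(To be continued)

Posted on 2011-04-03 15:56 by Chen Shuo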
