C++ - Reading multiple delimited protobuf messages from a file on Windows
I'm writing a tool for my master's thesis which needs to read protobuf data streams from a file. Until now I worked exclusively on Mac OS and everything was fine, but now I'm trying to run the tool on Windows too.
Sadly, on Windows I'm not able to read multiple consecutive messages from a single stream. I tried to narrow the problem down and came up with the following small program, which reproduces it.
#include "tokens.pb.h"
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <fstream>

int main(int argc, char* argv[]) {
    std::fstream tokenfile(argv[1], std::ios_base::in);
    if (!tokenfile.is_open()) return -1;

    google::protobuf::io::IstreamInputStream iis(&tokenfile);
    google::protobuf::io::CodedInputStream cis(&iis);

    while (true) {
        google::protobuf::io::CodedInputStream::Limit l;
        unsigned int msgSize;
        if (!cis.ReadVarint32(&msgSize)) return 0;  // reached EOF
        l = cis.PushLimit(msgSize);

        tokenio::Union msg;
        if (!msg.ParseFromCodedStream(&cis)) return -2;  // couldn't read msg
        if (cis.BytesUntilLimit() > 0) return -3;        // msg not read completely
        cis.PopLimit(l);

        if (!msg.has_string() && !msg.has_file() && !msg.has_token() && !msg.has_type())
            return -4;  // msg contains no data
    }
    return 0;
}
On Mac OS this runs fine and returns 0 after reading the whole file, as expected.
On Windows the first message is read without problems. For the second message, ParseFromCodedStream still returns true but does not read any data. This results in a BytesUntilLimit value larger than 0 and a return value of -3. Of course the message also does not contain usable data. Further reads from cis fail as if the end of the stream had been reached, even though the file has not been read completely.
I tried using a FileInputStream with a file descriptor as input, with the same result. Removing the PushLimit/PopLimit calls and instead reading the data with ReadString calls using the explicit message sizes, then parsing the string, didn't help either.
The following protobuf file is used:
package tokenio;

message TokenType {
    required uint32 id = 1;
    required string name = 2;
}

message StringInstance {
    required string value = 1;
    optional uint64 id = 2;
}

message BeginOfFile {
    required uint64 name = 1;
    optional uint64 type = 2;
}

message Token {
    required uint32 type = 1;
    required uint32 offset = 2;
    optional uint32 line = 3;
    optional uint32 column = 4;
    optional uint64 value = 5;
}

message Union {
    optional TokenType type = 1;
    optional StringInstance string = 2;
    optional BeginOfFile file = 3;
    optional Token token = 4;
}
The input file seems to be OK: at least it is readable by a protobuf editor (on both Windows and Mac OS) and by the C++ implementation on Mac OS.
The code was tested:
- working on Mac OS 10.8.4, compiled with Xcode 4.6.3 and protobuf 2.5.0
- not working on Windows 8 64-bit, compiled with Visual Studio 2012 Ultimate and protobuf 2.5.0
What am I doing wrong?
Make it

    std::fstream tokenfile(argv[1], std::ios_base::in | std::ios_base::binary);

The default is text mode. On Mac and other Unix-like systems this doesn't matter, but on Windows, in text mode, CRLF sequences are translated to LF, and the ^Z (aka '\x1a') character is treated as an end-of-file indicator. These characters might, by coincidence, occur in a binary stream and cause trouble.