My personal notes and sample code.
Aditya Hajare (Linkedin).
WIP (Work In Progress)!
Open-sourced software licensed under the MIT license.
- Data is fully typed.
- Data is encoded in a compact binary format automatically (less CPU usage).
- A `Schema` (defined using a `.proto` file) is needed to generate code and read the data.
- Documentation can be embedded in the `Schema`.
- Data can be read across any language (C#, Java, Go, Python, JavaScript, etc.).
- The `Schema` can evolve over time in a safe manner (`Schema` evolution).
- 3-10x smaller, 20-100x faster than XML.
- Code is generated for us automatically.
- `Protobuf` support for some languages might be lacking (but the main ones are fine).
- Can't open the serialized data with a text editor (because it's serialized to a compact binary format, not text).
- In protocol buffers, field names are not important! But when programming, fields/field names are important.
- For `protobuf`, the important element is the `tag`.
- The smallest tag number we can have is `1`.
- The largest tag number we can have is `2²⁹-1`, i.e. `536,870,911`.
- We cannot use numbers from `19000` to `19999`. These are reserved by Google for special use.
- Tags numbered from `1` to `15` use `1 byte` of space, so use them for frequently populated fields.
- For fields that are less frequently populated, use tag numbers from `16` to `2047`. They use `2 bytes` of space.
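
The byte counts for tag numbers follow from how the field key is written on the wire: the tag number is shifted left by 3 bits (to make room for the wire type) and the result is stored as a base-128 varint. The sketch below is a hand-written illustration of that arithmetic, not code from the protobuf library; `tagSize` is a hypothetical helper name.

```go
package main

import "fmt"

// tagSize returns how many bytes the field key (tag number plus 3-bit wire
// type) occupies on the wire when encoded as a base-128 varint.
func tagSize(fieldNumber uint32) int {
	key := uint64(fieldNumber) << 3 // low 3 bits hold the wire type
	size := 1
	for key >= 0x80 { // each varint byte carries 7 bits of payload
		key >>= 7
		size++
	}
	return size
}

func main() {
	for _, n := range []uint32{1, 15, 16, 2047, 2048} {
		fmt.Printf("tag %d -> %d byte(s)\n", n, tagSize(n))
	}
}
```

Running it shows the one-byte range ending at tag 15 and the two-byte range ending at tag 2047.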
- To make a `list` or an `array`, we can use the concept of `Repeated Fields`.
- The list can take any number (0 or more) of elements we want.
- The opposite of `repeated` is `singular` (we don't write it).
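
In generated Go code a repeated field shows up as a slice. The struct below is a hand-written stand-in for such a generated type, assuming a hypothetical field `repeated string phone_numbers = 2;`. It only illustrates that an unset repeated field is empty and can hold any number of elements.

```go
package main

import "fmt"

// Person is a hand-written stand-in for a generated message type that
// declares a hypothetical field: repeated string phone_numbers = 2;
type Person struct {
	Id           int32
	PhoneNumbers []string // repeated field: zero or more elements
}

func main() {
	p := Person{Id: 1}
	fmt.Println(len(p.PhoneNumbers)) // 0: an unset repeated field is simply empty

	p.PhoneNumbers = append(p.PhoneNumbers, "111-1111", "222-2222")
	fmt.Println(len(p.PhoneNumbers)) // 2
}
```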
- If we know all the values a field can take in advance, we can leverage an `Enum` type.
- The first value of an `Enum` is the default value.
- An `Enum` must start with the tag `0` (which is the default value).
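
The reason the first value is the default becomes visible on the Go side, where an enum is an integer type and an unset field holds the zero value. The types below are hand-written and hypothetical (`EyeColor`, `Person`); they mimic the shape of generated code rather than reproduce it.

```go
package main

import "fmt"

// EyeColor mirrors a hypothetical proto enum whose first value has tag 0.
type EyeColor int32

const (
	EyeColorUnknown EyeColor = iota // 0: the default value
	EyeColorGreen                   // 1
	EyeColorBrown                   // 2
)

// Person is a hand-written stand-in for a generated message type.
type Person struct {
	Name     string
	EyeColor EyeColor
}

func main() {
	var p Person                               // EyeColor was never set
	fmt.Println(p.EyeColor == EyeColorUnknown) // true
}
```

This is why the value with tag `0` is usually given a neutral name such as "unknown": a reader cannot tell it apart from a field that was never set.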
- Following command is used to generate Golang code:

      # Browse to the directory
      cd ~/work/Golang/golang_protocol_buffers/02-Protoc-To-Generate-Golang-Code

      # protoc: Compiler for protocol buffers
      # -I: specifies source directory where protocol buffer files reside
      # --go_out: specifies output directory
      # At the end specify path to protocol buffer file(s)
      protoc -I=proto --go_out=go proto/*.proto

- Don't change the numeric tags for any existing fields.
- We can add new fields, and old code will just ignore them.
- Likewise, if old/new code reads unknown data, the defaults will take place.
- Fields can be removed as long as the tag number is not used again in our updated message type. We may want to rename the field instead, perhaps adding the prefix `OBSOLETE_`, or make the tag `reserved` so that future users of our `.proto` can't accidentally reuse the number.
- For changing data types (e.g. `int32` to `int64`) we must refer to the documentation.
- When removing a field, we must always `reserve` the tag and the field name! This prevents the `tag` and the field name from being re-used. For e.g.

      // Original Message
      message Message {
        int32 id = 1;
        string first_name = 2;
      }

      // Lets remove field "first_name"
      message Message {
        reserved 2;            // Reserved tag number
        reserved "first_name"; // Reserved field name
        int32 id = 1;
      }

- We can reserve `TAGS` and `FIELD NAMES`.
- We can't mix `TAGS` and `FIELD NAMES` in the same `reserved` statement. For e.g.

      // Correct way to use "reserved" keyword:
      message Message {
        reserved 2, 4, 15, 20 to 30;
        reserved "first_name", "last_name";
      }

- Do not EVER remove any RESERVED tags!
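
These rules work because readers match fields by tag number, not by name. The sketch below hand-rolls a tiny subset of the wire format to show an "old" reader skipping a tag it does not know and falling back to the default for a tag that is absent. It assumes every field is a varint (wire type 0), and the helper names `appendVarintField` and `decodeKnown` are hypothetical. Real code would use the generated types and the protobuf library instead.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendVarintField appends one varint-typed field (wire type 0) to buf.
func appendVarintField(buf []byte, fieldNumber, value uint64) []byte {
	tmp := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(tmp, fieldNumber<<3) // key = (field number << 3) | wire type 0
	buf = append(buf, tmp[:n]...)
	n = binary.PutUvarint(tmp, value)
	return append(buf, tmp[:n]...)
}

// decodeKnown reads varint fields and keeps only the tags listed in known.
// Tags it does not recognise are skipped, as an old reader would do.
// Assumption: every field in data is a varint, so no wire type check is made.
func decodeKnown(data []byte, known map[uint64]bool) map[uint64]uint64 {
	out := make(map[uint64]uint64)
	for len(data) > 0 {
		key, n := binary.Uvarint(data)
		if n <= 0 {
			break // malformed input; this sketch just stops
		}
		data = data[n:]
		value, m := binary.Uvarint(data)
		if m <= 0 {
			break
		}
		data = data[m:]
		if tag := key >> 3; known[tag] {
			out[tag] = value
		}
	}
	return out
}

func main() {
	// A "new" writer sends tag 1 (id) and a newly added tag 3.
	var msg []byte
	msg = appendVarintField(msg, 1, 42)
	msg = appendVarintField(msg, 3, 7)

	// An "old" reader only knows tag 1 and tag 2.
	got := decodeKnown(msg, map[uint64]bool{1: true, 2: true})
	fmt.Println(got[1]) // 42
	fmt.Println(got[2]) // 0: absent field falls back to the default
	_, seen := got[3]
	fmt.Println(seen) // false: unknown tag 3 was ignored
}
```

If tag 3 were later reused for a field with a different meaning, the same bytes would be silently misread, which is what reserving removed tags protects against.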
- We can use `oneof` to tell protocol buffers that only one field can have a value set to it. For e.g.:

      message HelloAditya {
        int32 id = 1;
        oneof some_name_field {
          // In "some_name_field" either "name" or
          // "first_name" field will have value set to it.
          string name = 2;
          string first_name = 3;
        }
      }

- Maps can be used to map scalar keys (e.g. `string`, `int32`) to values of any type. It's like a `dictionary` in python or a `map` in go:

      message Message {
        int32 id = 1;
        map<string, Result> results = 2;
      }

- Map fields cannot be repeated.
- There's no ordering for a map (it's a `key => value` store).
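
On the Go side a map field becomes a regular Go `map`, which has no defined iteration order either. When stable output is needed, one common approach is to sort the keys first. The sketch below uses `int32` values instead of a `Result` message for brevity, and `sortedKeys` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order.
func sortedKeys(m map[string]int32) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	results := map[string]int32{"math": 90, "art": 75, "physics": 82}

	// Ranging over results directly may visit keys in any order,
	// so iterate over the sorted keys instead.
	for _, k := range sortedKeys(results) {
		fmt.Println(k, results[k])
	}
}
```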
- Protocol Buffers contains a set of `Well Known Types`, i.e. advanced types known to all programming languages.
- Full list of `Well Known Types`: https://developers.google.com/protocol-buffers/docs/reference/google.protobuf
- One of the types is `Timestamp` - its fields are `seconds` and `nanoseconds` (UTC).
- Don't forget to use the `import` statement.
- For e.g.

      syntax = "proto3";

      import "google/protobuf/timestamp.proto";

      message Sample {
        google.protobuf.Timestamp my_timestamp = 1;
      }

- `Duration` is yet another `Well Known Type`.
- It represents the time span between 2 timestamps.
- Just like `Timestamp`, it contains `seconds` and `nanoseconds`.
- For e.g.

      syntax = "proto3";

      import "google/protobuf/timestamp.proto";
      import "google/protobuf/duration.proto";

      message Sample {
        google.protobuf.Timestamp my_timestamp = 1;
        google.protobuf.Duration validity = 2;
      }