Understanding Protocol Buffers: Part 1 - Introduction and Core Concepts

Protocol Buffers explained: what they are, why Google uses them for billions of RPCs, and when you should choose protobuf over JSON. A practical introduction without overwhelming technical details.

What is Protocol Buffers?

Protocol Buffers (protobuf) is a method for serializing structured data. Think of it as a replacement for JSON or XML, but:

  • Smaller - 3-10x less data over the wire
  • Faster - 5-10x faster to encode/decode
  • Type-safe - Compile-time validation instead of runtime errors
  • Language-agnostic - One schema works for Go, Python, Java, C++, JavaScript

Google developed protobuf internally and uses it for virtually all inter-service communication. When you’re handling billions of requests per second, those performance gains matter.

The Core Idea: Schema-First Design

Unlike JSON (which is schema-optional), protobuf requires you to define your data structure upfront:

JSON approach (no schema):

// Send this, hope the other side knows what to expect
{
  "name": "John",
  "email": "john@example.com",
  "age": 30
}

Protobuf approach (schema required):

// user.proto - define the schema
syntax = "proto3";

message User {
  string name = 1;
  string email = 2;
  int32 age = 3;
}

The schema becomes the contract between services. Both sides know exactly what fields exist, what types they are, and what the message structure looks like.

How It Works: The Three-Step Process

Step 1: Define Your Schema

Write a .proto file describing your data:

syntax = "proto3";
package example;

message Person {
  string name = 1;
  string email = 2;
  int32 age = 3;
  repeated string hobbies = 4;  // Array/list
}

message Team {
  string name = 1;
  repeated Person members = 2;  // Nested messages
}

Key concepts:

  • message = struct/class (a collection of fields)
  • Numbers (1, 2, 3) = field identifiers (not values!)
  • repeated = array/list of values
  • Nested messages allowed

Step 2: Generate Code

Run the protobuf compiler:

# Generate Go code (recent protoc-gen-go also requires an
# "option go_package" line in the .proto, or a --go_opt=M mapping)
protoc --go_out=. user.proto

# Generate Python code
protoc --python_out=. user.proto

# Generate Java code
protoc --java_out=. user.proto

This creates language-specific code with:

  • Structs/classes matching your schema
  • Serialization methods (message → bytes)
  • Deserialization methods (bytes → message)
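
For Go, the generated file boils down to something like this (a heavily simplified sketch - real generated code includes struct tags, internal state, and reflection support):

// person.pb.go (simplified sketch)
type Person struct {
    Name    string
    Email   string
    Age     int32
    Hobbies []string
}

// Nil-safe accessors are generated for every field:
func (x *Person) GetName() string { /* ... */ }

// Serialization itself lives in the runtime library:
//   data, err := proto.Marshal(person)    // message → bytes
//   err = proto.Unmarshal(data, person)   // bytes → message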

Step 3: Use in Your Application

Go example:

import (
    pb "example.com/generated/user"

    "google.golang.org/protobuf/proto"
)

// Create a message
person := &pb.Person{
    Name:    "Alice",
    Email:   "alice@example.com",
    Age:     28,
    Hobbies: []string{"coding", "hiking"},
}

// Serialize to bytes (binary format)
data, err := proto.Marshal(person)
if err != nil {
    // handle the error
}
// data is now a compact binary representation

// Send over network, write to file, etc.

// Deserialize back
var person2 pb.Person
if err := proto.Unmarshal(data, &person2); err != nil {
    // handle the error
}
// person2 now has all the fields

The binary format is what makes it fast and small.

Why Binary Format Matters

JSON representation:

{
  "name": "Alice",
  "email": "alice@example.com",
  "age": 28,
  "hobbies": ["coding", "hiking"]
}

Size: 94 bytes (human-readable text)

Protobuf binary:

[binary data]

Size: ~35 bytes (optimized binary)

Why smaller:

  • No field names in the binary (uses field numbers instead)
  • Compact integer encoding (small numbers use 1 byte)
  • No whitespace or formatting
  • Efficient string encoding

Why faster:

  • No text parsing
  • Direct memory access
  • Optimized for CPU cache
  • Predictable structure
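
You can check the size difference yourself with a few lines of Go (a sketch; pb is the generated example.com/person package from "Your First Protobuf Message" below, so only the three fields it defines are compared):

package main

import (
    "encoding/json"
    "fmt"
    "log"

    pb "example.com/person" // hypothetical generated package (see below)
    "google.golang.org/protobuf/proto"
)

func main() {
    jsonBytes, err := json.Marshal(map[string]any{
        "name": "Alice", "email": "alice@example.com", "age": 28,
    })
    if err != nil {
        log.Fatal(err)
    }

    pbBytes, err := proto.Marshal(&pb.Person{
        Name: "Alice", Email: "alice@example.com", Age: 28,
    })
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("JSON:     %d bytes\n", len(jsonBytes))
    fmt.Printf("Protobuf: %d bytes\n", len(pbBytes))
}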

Field Numbers: The Secret Sauce

Those numbers (1, 2, 3) in your schema aren’t arbitrary:

message User {
  string name = 1;   // Field number 1
  string email = 2;  // Field number 2
  int32 age = 3;     // Field number 3
}

In the binary format:

  • Field names (“name”, “email”) are never sent
  • Only field numbers (1, 2, 3) are encoded
  • Receiver uses the schema to map numbers → names
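
You can see this on the wire. Marshaling a Person with only the name set (a sketch, reusing a generated Person type) prints a handful of bytes:

// pb.Person is the type generated from Step 1's schema
data, _ := proto.Marshal(&pb.Person{Name: "Alice"})
fmt.Printf("% x\n", data)
// Output: 0a 05 41 6c 69 63 65
//
// 0x0a = tag byte: (field number 1 << 3) | wire type 2 (length-delimited)
// 0x05 = length 5, followed by the raw bytes of "Alice"
// The string "name" never appears - only the number 1 does.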

This enables backward compatibility:

// Version 1
message User {
  string name = 1;
  string email = 2;
}

// Version 2 - add a field
message User {
  string name = 1;
  string email = 2;
  int32 age = 3;      // New field!
}

Old clients (using Version 1) can still read messages from new servers (Version 2). They just ignore field 3. New clients can read old messages - field 3 simply comes back as its default value (0).

Golden rule: Never reuse field numbers. Once you assign number 3 to “age”, that number is forever “age”.
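
You can demonstrate the compatibility yourself with two generated packages, one per schema version (pbv1 and pbv2 are hypothetical names):

// Encode with the new schema...
newBytes, err := proto.Marshal(&pbv2.User{
    Name:  "Alice",
    Email: "alice@example.com",
    Age:   30,
})
if err != nil {
    log.Fatal(err)
}

// ...decode with the old one. No error: field 3 is carried along
// as an "unknown field" rather than rejected.
var old pbv1.User
if err := proto.Unmarshal(newBytes, &old); err != nil {
    log.Fatal(err)
}
fmt.Println(old.GetName()) // Alice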

Types in Protobuf

Scalar types:

message Example {
  string text = 1;       // UTF-8 string
  int32 number = 2;      // 32-bit integer
  int64 big_number = 3;  // 64-bit integer
  bool flag = 4;         // true/false
  bytes data = 5;        // Raw bytes
  double price = 6;      // Floating point
}

Collections:

message Example {
  repeated string tags = 1;           // Array of strings
  map<string, int32> counts = 2;      // Key-value map
}
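
In generated Go code these become ordinary slices and maps (a sketch against the hypothetical generated Example type):

ex := &pb.Example{
    Tags:   []string{"go", "protobuf"},
    Counts: map[string]int32{"views": 42},
}
ex.Tags = append(ex.Tags, "grpc") // repeated fields are plain slices
ex.Counts["likes"]++              // map fields are plain maps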

Nested messages:

message Address {
  string street = 1;
  string city = 2;
}

message Person {
  string name = 1;
  Address address = 2;  // Nested message
}

Enums:

enum Status {
  UNKNOWN = 0;
  ACTIVE = 1;
  INACTIVE = 2;
}

message User {
  string name = 1;
  Status status = 2;
}
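
In Go, each enum value becomes a typed constant named EnumName_VALUE (a sketch; note that proto3 requires the first enum value to be zero, and that zero value doubles as the default for unset fields):

user := &pb.User{
    Name:   "Alice",
    Status: pb.Status_ACTIVE,
}
if user.GetStatus() == pb.Status_ACTIVE {
    fmt.Println("active user")
}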

Protobuf vs JSON: When to Use Each

Use Protobuf When:

Performance is critical:

  • High-throughput systems (thousands of requests/sec)
  • Mobile apps (bandwidth costs money)
  • IoT devices (limited CPU/memory)
  • Real-time systems (latency matters)

Type safety matters:

  • Multiple teams consuming your API
  • Long-term API stability required
  • Cross-language communication
  • Compile-time error catching

Examples:

  • Microservices (gRPC between services)
  • Mobile backends (reduce data usage)
  • Streaming systems (Kafka, Pub/Sub)
  • Internal APIs at scale

Use JSON When:

Human interaction needed:

  • REST APIs for web browsers
  • Public APIs (easier to document/debug)
  • Configuration files
  • Quick prototyping

Simplicity matters:

  • Small projects
  • Infrequent requests
  • Developer experience > performance
  • Debugging with curl/browser

Examples:

  • Public REST APIs
  • Web dashboards
  • Config files
  • Development/testing

gRPC: Protobuf’s Most Common Use

gRPC is a framework for building APIs that uses protobuf for serialization:

// Define both data structures AND service methods
service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc CreateUser(CreateUserRequest) returns (User);
  rpc ListUsers(ListUsersRequest) returns (ListUsersResponse);
}

message GetUserRequest {
  string user_id = 1;
}

message User {
  string id = 1;
  string name = 2;
  string email = 3;
}

This generates:

  • Server interface - implement these methods in your language
  • Client code - call remote methods like local functions
  • Network protocol - HTTP/2 + protobuf encoding

Server (Go):

type server struct {
    pb.UnimplementedUserServiceServer
}

func (s *server) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.User, error) {
    // Fetch user from database
    return &pb.User{
        Id:    req.UserId,
        Name:  "Alice",
        Email: "alice@example.com",
    }, nil
}
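
Wiring that implementation into a running server takes a few more lines (a minimal sketch; RegisterUserServiceServer is generated from the service name in the .proto):

func main() {
    lis, err := net.Listen("tcp", ":9090")
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }

    s := grpc.NewServer()
    pb.RegisterUserServiceServer(s, &server{})

    if err := s.Serve(lis); err != nil {
        log.Fatalf("failed to serve: %v", err)
    }
}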

Client (Go):

// Plaintext credentials for local development
// (import "google.golang.org/grpc/credentials/insecure")
conn, err := grpc.Dial("localhost:9090",
    grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
    log.Fatal(err)
}
defer conn.Close()
client := pb.NewUserServiceClient(conn)

user, err := client.GetUser(ctx, &pb.GetUserRequest{
    UserId: "123",
})
// Looks like a local function call, but it's network RPC

Protobuf Without gRPC: REST APIs

COMMON MISUNDERSTANDING

Many developers think protobuf and gRPC are inseparable - that you can’t use one without the other.

This is false.

Protobuf is a serialization format (like JSON). gRPC is an RPC framework that happens to use protobuf. They’re separate technologies that work well together but don’t require each other.

You can use:

  • Protobuf with REST APIs (HTTP/1.1)
  • Protobuf with WebSockets
  • Protobuf with message queues (Kafka, RabbitMQ)
  • Protobuf for file storage
  • gRPC with other serialization formats (though protobuf is the standard)

Don’t skip protobuf just because you don’t want gRPC. They’re decoupled.

REST + Protobuf Example

You can build traditional REST APIs using protobuf instead of JSON:

// Standard REST endpoint with protobuf
func CreateUser(w http.ResponseWriter, r *http.Request) {
    // Read protobuf from request body
    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "failed to read body", http.StatusBadRequest)
        return
    }

    var req pb.CreateUserRequest
    if err := proto.Unmarshal(body, &req); err != nil {
        http.Error(w, "invalid protobuf payload", http.StatusBadRequest)
        return
    }

    // Process request
    user := &pb.User{
        Id:    generateID(),
        Name:  req.Name,
        Email: req.Email,
    }

    // Return protobuf response
    data, err := proto.Marshal(user)
    if err != nil {
        http.Error(w, "encoding error", http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "application/x-protobuf")
    w.Write(data)
}

Standard REST structure:

POST /api/v1/users          → Create user (protobuf body)
GET /api/v1/users/123       → Get user (protobuf response)
PUT /api/v1/users/123       → Update user
DELETE /api/v1/users/123    → Delete user

Same RESTful URLs and HTTP methods, just binary protobuf bodies instead of JSON text.
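
Calling such an endpoint from Go is plain HTTP plus Marshal/Unmarshal (a sketch; the URL and types are the hypothetical ones from above):

body, err := proto.Marshal(&pb.CreateUserRequest{
    Name:  "Alice",
    Email: "alice@example.com",
})
if err != nil {
    log.Fatal(err)
}

resp, err := http.Post("http://localhost:8080/api/v1/users",
    "application/x-protobuf", bytes.NewReader(body))
if err != nil {
    log.Fatal(err)
}
defer resp.Body.Close()

respBody, err := io.ReadAll(resp.Body)
if err != nil {
    log.Fatal(err)
}
var user pb.User
if err := proto.Unmarshal(respBody, &user); err != nil {
    log.Fatal(err)
}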

Three Ways to Use Protobuf

1. REST + Protobuf (HTTP/1.1)

  • Traditional REST endpoints
  • Protobuf binary bodies
  • Standard HTTP status codes
  • Works with existing proxies and load balancers

Use when: You want protobuf performance but need REST semantics or HTTP/1.1 compatibility

2. gRPC + Protobuf (HTTP/2)

  • Service definitions in protobuf
  • Generated client/server code
  • Streaming support
  • Maximum performance

Use when: Building microservices or need streaming/bidirectional communication

3. Message Passing + Protobuf

  • Serialize to bytes, send via Kafka/RabbitMQ/Pub/Sub
  • No HTTP at all
  • Async processing

Use when: Event-driven architectures or async workflows
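
Whichever queue you use, the pattern is marshal-then-publish (a sketch; publish stands in for your queue client's send call):

data, err := proto.Marshal(&pb.User{Id: "123", Name: "Alice"})
if err != nil {
    log.Fatal(err)
}
// publish is a placeholder for e.g. a Kafka producer write
// or a Pub/Sub topic publish - protobuf doesn't care which.
publish("user-events", data)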

Why People Think They’re Coupled

Most tutorials show protobuf with gRPC because:

  • gRPC is protobuf’s most popular use case
  • gRPC shipped with protobuf as its default serialization format (protobuf itself is years older)
  • Google promotes them as a pair

But they’re separate concerns:

  • Protobuf = serialization format (like JSON)
  • gRPC = RPC framework that happens to use protobuf (like REST frameworks use JSON)

Analogy: JSON doesn’t require REST. You can send JSON over websockets, message queues, or any transport. Same with protobuf.

Real Companies Using REST + Protobuf

Google Cloud APIs:

  • Offer BOTH gRPC and REST
  • REST endpoints can accept protobuf OR JSON
  • Same protobuf definitions power both

Twitch:

  • Uses protobuf for message payloads
  • Sends over WebSocket (not gRPC)
  • Custom protocol, not RPC

Square:

  • Internal: gRPC + protobuf
  • Public merchant APIs: REST + JSON
  • Some internal REST APIs: REST + protobuf

Real-World Example: Google Cloud

All Google Cloud APIs are defined in protobuf:

googleapis/
├── google/
│   ├── cloud/
│   │   ├── secretmanager/v1/
│   │   │   └── service.proto
│   │   ├── storage/v1/
│   │   │   └── storage.proto
│   │   └── pubsub/v1/
│   │       └── pubsub.proto

Why this matters:

  • Every GCP SDK (Go, Python, Java, etc.) generates from the same .proto files
  • Guaranteed API compatibility across languages
  • When Google updates the API, everyone gets the same changes
  • Consistent behavior across all languages and platforms

The Trade-Off: Schema Management

Benefit: Type safety and performance

Cost: Schema evolution requires planning

Example challenge:

// Version 1: used a string user ID
message Request {
  string user_id = 1;
}

// Later: want to change the type to int64
message Request {
  int64 user_id = 1;  // BREAKING CHANGE!
}

Solution: Add a new field instead:

message Request {
  string user_id = 1;          // Deprecated but kept for old clients
  int64 user_id_numeric = 2;   // New field
}

Old clients still work. New clients use field 2. Eventually deprecate field 1.
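
During the transition, server code can accept either field (a sketch; the getter names follow protoc-gen-go's conventions for the fields above):

id := req.GetUserIdNumeric()
if id == 0 && req.GetUserId() != "" {
    // Fall back to the legacy string field sent by older clients.
    id, _ = strconv.ParseInt(req.GetUserId(), 10, 64)
}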

Protobuf in the Wild

Who uses it:

  • Google - All internal services (billions of RPCs/day)
  • Netflix - Inter-service communication
  • Uber - Microservices architecture
  • Square - Payment processing
  • Dropbox - File synchronization protocol

Open-source projects:

  • Kubernetes (internal API definitions)
  • Envoy proxy (configuration and APIs)
  • Prometheus (remote write protocol)
  • Kafka ecosystem (Confluent Schema Registry supports protobuf)

Getting Started

Install protoc (protobuf compiler):

# macOS
brew install protobuf

# Ubuntu/Debian
apt-get install protobuf-compiler

# Windows (via Chocolatey)
choco install protoc

Verify installation:

protoc --version
# libprotoc 25.1

Install language plugins:

# Go
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest

# Python (comes with protobuf package)
pip install protobuf

Your First Protobuf Message

1. Create person.proto:

syntax = "proto3";
package example;

option go_package = "example.com/person";

message Person {
  string name = 1;
  string email = 2;
  int32 age = 3;
}

2. Generate code:

protoc --go_out=. person.proto

3. Use it:

package main

import (
    "fmt"
    "log"

    pb "example.com/person"
    "google.golang.org/protobuf/proto"
)

func main() {
    person := &pb.Person{
        Name:  "Alice",
        Email: "alice@example.com",
        Age:   28,
    }

    // Serialize
    data, err := proto.Marshal(person)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Binary size: %d bytes\n", len(data))

    // Deserialize
    var decoded pb.Person
    if err := proto.Unmarshal(data, &decoded); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Name: %s\n", decoded.Name)
}

That’s it. You’re using protobuf.

Choosing Your Approach: Quick Decision Guide

Use Case                 Best Choice                  Why
Internal microservices   gRPC + protobuf              Maximum performance, streaming
Public web API           REST + JSON                  Browser compatibility, easy debugging
Mobile backend           REST + protobuf or gRPC      Reduce bandwidth costs
Real-time features       gRPC + protobuf              Bidirectional streaming
Event processing         Message queue + protobuf     Async, decoupled
Legacy integration       REST + JSON                  Widest compatibility
High throughput          gRPC + protobuf              Lowest latency

The key insight: protobuf is transport-agnostic. Choose your transport (REST, gRPC, message queue) based on requirements, then decide if protobuf’s benefits justify the schema overhead.

What’s Next

In the next parts of this series, we’ll explore:

  • Part 2: Protobuf in Practice - Decision matrix, transport combinations, real-world patterns
  • Part 3: gRPC Deep Dive - Building services, streaming, client-server code
  • Part 4: Advanced Features - Oneofs, any types, well-known types, optimizations
  • Part 5: Production Patterns - Schema evolution, versioning, monitoring, debugging

When NOT to Use Protobuf

Be honest about the trade-offs:

Skip protobuf if:

  • Building a simple REST API for web browsers
  • Data needs to be human-readable (logs, config files)
  • Team isn’t comfortable with schema management
  • Performance isn’t a concern
  • Quick prototyping phase

JSON is fine for:

  • Public REST APIs
  • Configuration files
  • Small-scale systems
  • Web-first applications

Use the right tool for the job. Protobuf shines at scale and in type-safety-critical systems, but it’s overkill for many applications.

The Verdict

Protocol Buffers trades human readability for performance and type safety. If you’re building:

  • Microservices communicating internally
  • Mobile apps where bandwidth costs money
  • High-throughput systems processing thousands of requests
  • Cross-language APIs requiring strict contracts

Then protobuf is worth learning.

If you’re building a REST API consumed by browsers, JSON is probably the right choice.

In the next part, we’ll put protobuf into practice - decision matrices, transport combinations, and real-world patterns - before the series moves on to gRPC, the RPC framework that pairs with protobuf to create type-safe, high-performance APIs.

Coming up in Part 2: Protobuf in practice - the decision matrix for choosing a transport, real-world combinations of protobuf with REST, message queues, and gRPC, and common usage patterns.