Understanding Protocol Buffers: Part 1 - Introduction and Core Concepts

Protocol Buffers explained: what they are, why Google uses them for billions of RPCs, and when you should choose protobuf over JSON. A practical introduction without overwhelming technical details.

What is Protocol Buffers?

Protocol Buffers (protobuf) is a method for serializing structured data. Think of it as a replacement for JSON or XML, but:

  • Smaller - 3-10x less data over the wire
  • Faster - 5-10x faster to encode/decode
  • Type-safe - Compile-time validation instead of runtime errors
  • Language-agnostic - One schema works for Go, Python, Java, C++, JavaScript

Google developed protobuf internally and uses it for virtually all inter-service communication. When you’re handling billions of requests per second, those performance gains matter.

The Core Idea: Schema-First Design

Unlike JSON (which is schema-optional), protobuf requires you to define your data structure upfront:

JSON approach (no schema):

// Send this, hope the other side knows what to expect
{
  "name": "John",
  "email": "john@example.com",
  "age": 30
}

Protobuf approach (schema required):

// user.proto - define the schema
syntax = "proto3";

message User {
  string name = 1;
  string email = 2;
  int32 age = 3;
}

The schema becomes the contract between services. Both sides know exactly what fields exist, what types they are, and what the message structure looks like.

How It Works: The Three-Step Process

Step 1: Define Your Schema

Write a .proto file describing your data:

syntax = "proto3";
package example;

message Person {
  string name = 1;
  string email = 2;
  int32 age = 3;
  repeated string hobbies = 4;  // Array/list
}

message Team {
  string name = 1;
  repeated Person members = 2;  // Nested messages
}

Key concepts:

  • message = struct/class (a collection of fields)
  • Numbers (1, 2, 3) = field identifiers (not values!)
  • repeated = array/list of values
  • Nested messages allowed

Step 2: Generate Code

Run the protobuf compiler:

# Generate Go code (recent protoc-gen-go also requires an
# "option go_package" line in the .proto, or a --go_opt=M mapping)
protoc --go_out=. user.proto

# Generate Python code
protoc --python_out=. user.proto

# Generate Java code
protoc --java_out=. user.proto

This creates language-specific code with:

  • Structs/classes matching your schema
  • Serialization methods (message → bytes)
  • Deserialization methods (bytes → message)
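
For Go, the generated file boils down to something like this (a heavily simplified sketch - real generated code includes struct tags, internal state, and reflection support):

// person.pb.go (simplified sketch)
type Person struct {
    Name    string
    Email   string
    Age     int32
    Hobbies []string
}

// Nil-safe accessors are generated for every field:
func (x *Person) GetName() string { /* ... */ }

// Serialization itself lives in the runtime library:
//   data, err := proto.Marshal(person)    // message → bytes
//   err = proto.Unmarshal(data, person)   // bytes → message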

Step 3: Use in Your Application

Go example:

import (
    pb "example.com/generated/user"

    "google.golang.org/protobuf/proto"
)

// Create a message
person := &pb.Person{
    Name:    "Alice",
    Email:   "alice@example.com",
    Age:     28,
    Hobbies: []string{"coding", "hiking"},
}

// Serialize to bytes (binary format)
data, err := proto.Marshal(person)
if err != nil {
    // handle the error
}
// data is now a compact binary representation

// Send over network, write to file, etc.

// Deserialize back
var person2 pb.Person
if err := proto.Unmarshal(data, &person2); err != nil {
    // handle the error
}
// person2 now has all the fields

The binary format is what makes it fast and small.

Why Binary Format Matters

JSON representation:

{
  "name": "Alice",
  "email": "alice@example.com",
  "age": 28,
  "hobbies": ["coding", "hiking"]
}

Size: 94 bytes (human-readable text)

Protobuf binary:

[binary data]

Size: ~35 bytes (optimized binary)

Why smaller:

  • No field names in the binary (uses field numbers instead)
  • Compact integer encoding (small numbers use 1 byte)
  • No whitespace or formatting
  • Efficient string encoding

Why faster:

  • No text parsing
  • Direct memory access
  • Optimized for CPU cache
  • Predictable structure
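
You can check the size difference yourself with a few lines of Go (a sketch; pb is the generated example.com/person package from "Your First Protobuf Message" below, so only the three fields it defines are compared):

package main

import (
    "encoding/json"
    "fmt"
    "log"

    pb "example.com/person" // hypothetical generated package (see below)
    "google.golang.org/protobuf/proto"
)

func main() {
    jsonBytes, err := json.Marshal(map[string]any{
        "name": "Alice", "email": "alice@example.com", "age": 28,
    })
    if err != nil {
        log.Fatal(err)
    }

    pbBytes, err := proto.Marshal(&pb.Person{
        Name: "Alice", Email: "alice@example.com", Age: 28,
    })
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("JSON:     %d bytes\n", len(jsonBytes))
    fmt.Printf("Protobuf: %d bytes\n", len(pbBytes))
}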

Field Numbers: The Secret Sauce

Those numbers (1, 2, 3) in your schema aren’t arbitrary:

message User {
  string name = 1;   // Field number 1
  string email = 2;  // Field number 2
  int32 age = 3;     // Field number 3
}

In the binary format:

  • Field names (“name”, “email”) are never sent
  • Only field numbers (1, 2, 3) are encoded
  • Receiver uses the schema to map numbers → names
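
You can see this on the wire. Marshaling a Person with only the name set (a sketch, reusing a generated Person type) prints a handful of bytes:

// pb.Person is the type generated from Step 1's schema
data, _ := proto.Marshal(&pb.Person{Name: "Alice"})
fmt.Printf("% x\n", data)
// Output: 0a 05 41 6c 69 63 65
//
// 0x0a = tag byte: (field number 1 << 3) | wire type 2 (length-delimited)
// 0x05 = length 5, followed by the raw bytes of "Alice"
// The string "name" never appears - only the number 1 does.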

This enables backward compatibility:

// Version 1
message User {
  string name = 1;
  string email = 2;
}

// Version 2 - add a field
message User {
  string name = 1;
  string email = 2;
  int32 age = 3;      // New field!
}

Old clients (using Version 1) can still read messages from new servers (Version 2). They just ignore field 3. New clients can read old messages - field 3 simply comes back as its default value (0).

Golden rule: Never reuse field numbers. Once you assign number 3 to “age”, that number is forever “age”.
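
You can demonstrate the compatibility yourself with two generated packages, one per schema version (pbv1 and pbv2 are hypothetical names):

// Encode with the new schema...
newBytes, err := proto.Marshal(&pbv2.User{
    Name:  "Alice",
    Email: "alice@example.com",
    Age:   30,
})
if err != nil {
    log.Fatal(err)
}

// ...decode with the old one. No error: field 3 is carried along
// as an "unknown field" rather than rejected.
var old pbv1.User
if err := proto.Unmarshal(newBytes, &old); err != nil {
    log.Fatal(err)
}
fmt.Println(old.GetName()) // Alice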

Types in Protobuf

Scalar types:

message Example {
  string text = 1;       // UTF-8 string
  int32 number = 2;      // 32-bit integer
  int64 big_number = 3;  // 64-bit integer
  bool flag = 4;         // true/false
  bytes data = 5;        // Raw bytes
  double price = 6;      // Floating point
}

Collections:

message Example {
  repeated string tags = 1;           // Array of strings
  map<string, int32> counts = 2;      // Key-value map
}
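
In generated Go code these become ordinary slices and maps (a sketch against the hypothetical generated Example type):

ex := &pb.Example{
    Tags:   []string{"go", "protobuf"},
    Counts: map[string]int32{"views": 42},
}
ex.Tags = append(ex.Tags, "grpc") // repeated fields are plain slices
ex.Counts["likes"]++              // map fields are plain maps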

Nested messages:

message Address {
  string street = 1;
  string city = 2;
}

message Person {
  string name = 1;
  Address address = 2;  // Nested message
}

Enums:

enum Status {
  UNKNOWN = 0;
  ACTIVE = 1;
  INACTIVE = 2;
}

message User {
  string name = 1;
  Status status = 2;
}
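
In Go, each enum value becomes a typed constant named EnumName_VALUE (a sketch; note that proto3 requires the first enum value to be zero, and that zero value doubles as the default for unset fields):

user := &pb.User{
    Name:   "Alice",
    Status: pb.Status_ACTIVE,
}
if user.GetStatus() == pb.Status_ACTIVE {
    fmt.Println("active user")
}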

Protobuf vs JSON: When to Use Each

Use Protobuf When:

Performance is critical:

  • High-throughput systems (thousands of requests/sec)
  • Mobile apps (bandwidth costs money)
  • IoT devices (limited CPU/memory)
  • Real-time systems (latency matters)

Type safety matters:

  • Multiple teams consuming your API
  • Long-term API stability required
  • Cross-language communication
  • Compile-time error catching

Examples:

  • Microservices (gRPC between services)
  • Mobile backends (reduce data usage)
  • Streaming systems (Kafka, Pub/Sub)
  • Internal APIs at scale

Use JSON When:

Human interaction needed:

  • REST APIs for web browsers
  • Public APIs (easier to document/debug)
  • Configuration files
  • Quick prototyping

Simplicity matters:

  • Small projects
  • Infrequent requests
  • Developer experience > performance
  • Debugging with curl/browser

Examples:

  • Public REST APIs
  • Web dashboards
  • Config files
  • Development/testing

gRPC: Protobuf’s Most Common Use

gRPC is a framework for building APIs that uses protobuf for serialization:

// Define both data structures AND service methods
service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc CreateUser(CreateUserRequest) returns (User);
  rpc ListUsers(ListUsersRequest) returns (ListUsersResponse);
}

message GetUserRequest {
  string user_id = 1;
}

message User {
  string id = 1;
  string name = 2;
  string email = 3;
}

This generates:

  • Server interface - implement these methods in your language
  • Client code - call remote methods like local functions
  • Network protocol - HTTP/2 + protobuf encoding

Server (Go):

type server struct {
    pb.UnimplementedUserServiceServer
}

func (s *server) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.User, error) {
    // Fetch user from database
    return &pb.User{
        Id:    req.UserId,
        Name:  "Alice",
        Email: "alice@example.com",
    }, nil
}
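
Wiring that implementation into a running server takes a few more lines (a minimal sketch; RegisterUserServiceServer is generated from the service name in the .proto):

func main() {
    lis, err := net.Listen("tcp", ":9090")
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }

    s := grpc.NewServer()
    pb.RegisterUserServiceServer(s, &server{})

    if err := s.Serve(lis); err != nil {
        log.Fatalf("failed to serve: %v", err)
    }
}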

Client (Go):

// Plaintext credentials for local development
// (import "google.golang.org/grpc/credentials/insecure")
conn, err := grpc.Dial("localhost:9090",
    grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
    log.Fatal(err)
}
defer conn.Close()
client := pb.NewUserServiceClient(conn)

user, err := client.GetUser(ctx, &pb.GetUserRequest{
    UserId: "123",
})
// Looks like a local function call, but it's network RPC

Protobuf Without gRPC: REST APIs

COMMON MISUNDERSTANDING

Many developers think protobuf and gRPC are inseparable - that you can’t use one without the other.

This is false.

Protobuf is a serialization format (like JSON). gRPC is an RPC framework that happens to use protobuf. They’re separate technologies that work well together but don’t require each other.

You can use:

  • Protobuf with REST APIs (HTTP/1.1)
  • Protobuf with WebSockets
  • Protobuf with message queues (Kafka, RabbitMQ)
  • Protobuf for file storage
  • gRPC with other serialization formats (though protobuf is the standard)

Don’t skip protobuf just because you don’t want gRPC. They’re decoupled.

REST + Protobuf Example

You can build traditional REST APIs using protobuf instead of JSON:

// Standard REST endpoint with protobuf
func CreateUser(w http.ResponseWriter, r *http.Request) {
    // Read protobuf from request body
    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "failed to read body", http.StatusBadRequest)
        return
    }

    var req pb.CreateUserRequest
    if err := proto.Unmarshal(body, &req); err != nil {
        http.Error(w, "invalid protobuf payload", http.StatusBadRequest)
        return
    }

    // Process request
    user := &pb.User{
        Id:    generateID(),
        Name:  req.Name,
        Email: req.Email,
    }

    // Return protobuf response
    data, err := proto.Marshal(user)
    if err != nil {
        http.Error(w, "encoding error", http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "application/x-protobuf")
    w.Write(data)
}

Standard REST structure:

POST /api/v1/users          → Create user (protobuf body)
GET /api/v1/users/123       → Get user (protobuf response)
PUT /api/v1/users/123       → Update user
DELETE /api/v1/users/123    → Delete user

Same RESTful URLs and HTTP methods, just binary protobuf bodies instead of JSON text.
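
Calling such an endpoint from Go is plain HTTP plus Marshal/Unmarshal (a sketch; the URL and types are the hypothetical ones from above):

body, err := proto.Marshal(&pb.CreateUserRequest{
    Name:  "Alice",
    Email: "alice@example.com",
})
if err != nil {
    log.Fatal(err)
}

resp, err := http.Post("http://localhost:8080/api/v1/users",
    "application/x-protobuf", bytes.NewReader(body))
if err != nil {
    log.Fatal(err)
}
defer resp.Body.Close()

respBody, err := io.ReadAll(resp.Body)
if err != nil {
    log.Fatal(err)
}
var user pb.User
if err := proto.Unmarshal(respBody, &user); err != nil {
    log.Fatal(err)
}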

Three Ways to Use Protobuf

1. REST + Protobuf (HTTP/1.1)

  • Traditional REST endpoints
  • Protobuf binary bodies
  • Standard HTTP status codes
  • Works with existing proxies and load balancers

Use when: You want protobuf performance but need REST semantics or HTTP/1.1 compatibility

2. gRPC + Protobuf (HTTP/2)

  • Service definitions in protobuf
  • Generated client/server code
  • Streaming support
  • Maximum performance

Use when: Building microservices or need streaming/bidirectional communication

3. Message Passing + Protobuf

  • Serialize to bytes, send via Kafka/RabbitMQ/Pub/Sub
  • No HTTP at all
  • Async processing

Use when: Event-driven architectures or async workflows
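
Whichever queue you use, the pattern is marshal-then-publish (a sketch; publish stands in for your queue client's send call):

data, err := proto.Marshal(&pb.User{Id: "123", Name: "Alice"})
if err != nil {
    log.Fatal(err)
}
// publish is a placeholder for e.g. a Kafka producer write
// or a Pub/Sub topic publish - protobuf doesn't care which.
publish("user-events", data)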

Why People Think They’re Coupled

Most tutorials show protobuf with gRPC because:

  • gRPC is protobuf’s most popular use case
  • gRPC shipped with protobuf as its default serialization format (protobuf itself is years older)
  • Google promotes them as a pair

But they’re separate concerns:

  • Protobuf = serialization format (like JSON)
  • gRPC = RPC framework that happens to use protobuf (like REST frameworks use JSON)

Analogy: JSON doesn’t require REST. You can send JSON over websockets, message queues, or any transport. Same with protobuf.

Real Companies Using REST + Protobuf

Google Cloud APIs:

  • Offer BOTH gRPC and REST
  • REST endpoints can accept protobuf OR JSON
  • Same protobuf definitions power both

Twitch:

  • Uses protobuf for message payloads
  • Sends over WebSocket (not gRPC)
  • Custom protocol, not RPC

Square:

  • Internal: gRPC + protobuf
  • Public merchant APIs: REST + JSON
  • Some internal REST APIs: REST + protobuf

Real-World Example: Google Cloud

All Google Cloud APIs are defined in protobuf:

googleapis/
├── google/
│   ├── cloud/
│   │   ├── secretmanager/v1/
│   │   │   └── service.proto
│   │   ├── storage/v1/
│   │   │   └── storage.proto
│   │   └── pubsub/v1/
│   │       └── pubsub.proto

Why this matters:

  • Every GCP SDK (Go, Python, Java, etc.) generates from the same .proto files
  • Guaranteed API compatibility across languages
  • When Google updates the API, everyone gets the same changes
  • Consistent behavior across all languages and platforms

The Trade-Off: Schema Management

Benefit: Type safety and performance

Cost: Schema evolution requires planning

Example challenge:

// Version 1: used a string user ID
message Request {
  string user_id = 1;
}

// Later: want to change the type to int64
message Request {
  int64 user_id = 1;  // BREAKING CHANGE!
}

Solution: Add a new field instead:

message Request {
  string user_id = 1;          // Deprecated but kept for old clients
  int64 user_id_numeric = 2;   // New field
}

Old clients still work. New clients use field 2. Eventually deprecate field 1.
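
During the transition, server code can accept either field (a sketch; the getter names follow protoc-gen-go's conventions for the fields above):

id := req.GetUserIdNumeric()
if id == 0 && req.GetUserId() != "" {
    // Fall back to the legacy string field sent by older clients.
    id, _ = strconv.ParseInt(req.GetUserId(), 10, 64)
}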

Protobuf in the Wild

Who uses it:

  • Google - All internal services (billions of RPCs/day)
  • Netflix - Inter-service communication
  • Uber - Microservices architecture
  • Square - Payment processing
  • Dropbox - File synchronization protocol

Open-source projects:

  • Kubernetes (internal API definitions)
  • Envoy proxy (configuration and APIs)
  • Prometheus (remote write protocol)
  • Kafka ecosystem (Confluent Schema Registry supports protobuf)

Getting Started

Install protoc (protobuf compiler):

# macOS
brew install protobuf

# Ubuntu/Debian
apt-get install protobuf-compiler

# Windows (via Chocolatey)
choco install protoc

Verify installation:

protoc --version
# libprotoc 25.1

Install language plugins:

# Go
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest

# Python (comes with protobuf package)
pip install protobuf

Your First Protobuf Message

1. Create person.proto:

syntax = "proto3";
package example;

option go_package = "example.com/person";

message Person {
  string name = 1;
  string email = 2;
  int32 age = 3;
}

2. Generate code:

protoc --go_out=. person.proto

3. Use it:

package main

import (
    "fmt"
    "log"

    pb "example.com/person"
    "google.golang.org/protobuf/proto"
)

func main() {
    person := &pb.Person{
        Name:  "Alice",
        Email: "alice@example.com",
        Age:   28,
    }

    // Serialize
    data, err := proto.Marshal(person)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Binary size: %d bytes\n", len(data))

    // Deserialize
    var decoded pb.Person
    if err := proto.Unmarshal(data, &decoded); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Name: %s\n", decoded.Name)
}

That’s it. You’re using protobuf.

Choosing Your Approach: Quick Decision Guide

Use Case                 Best Choice                  Why
Internal microservices   gRPC + protobuf              Maximum performance, streaming
Public web API           REST + JSON                  Browser compatibility, easy debugging
Mobile backend           REST + protobuf or gRPC      Reduce bandwidth costs
Real-time features       gRPC + protobuf              Bidirectional streaming
Event processing         Message queue + protobuf     Async, decoupled
Legacy integration       REST + JSON                  Widest compatibility
High throughput          gRPC + protobuf              Lowest latency

The key insight: protobuf is transport-agnostic. Choose your transport (REST, gRPC, message queue) based on requirements, then decide if protobuf’s benefits justify the schema overhead.

What’s Next

In the next parts of this series, we’ll explore:

  • Part 2: Protobuf in Practice - Decision matrix, transport combinations, real-world patterns
  • Part 3: gRPC Deep Dive - Building services, streaming, client-server code
  • Part 4: Advanced Features - Oneofs, any types, well-known types, optimizations
  • Part 5: Production Patterns - Schema evolution, versioning, monitoring, debugging

When NOT to Use Protobuf

Be honest about the trade-offs:

Skip protobuf if:

  • Building a simple REST API for web browsers
  • Data needs to be human-readable (logs, config files)
  • Team isn’t comfortable with schema management
  • Performance isn’t a concern
  • Quick prototyping phase

JSON is fine for:

  • Public REST APIs
  • Configuration files
  • Small-scale systems
  • Web-first applications

Use the right tool for the job. Protobuf shines at scale and in type-safety-critical systems, but it’s overkill for many applications.

The Verdict

Protocol Buffers trades human readability for performance and type safety. If you’re building:

  • Microservices communicating internally
  • Mobile apps where bandwidth costs money
  • High-throughput systems processing thousands of requests
  • Cross-language APIs requiring strict contracts

Then protobuf is worth learning.

If you’re building a REST API consumed by browsers, JSON is probably the right choice.

In the next part, we’ll put protobuf into practice - decision matrices, transport combinations, and real-world patterns - before the series moves on to gRPC, the RPC framework that pairs with protobuf to create type-safe, high-performance APIs.

Coming up in Part 2: Protobuf in practice - the decision matrix for choosing a transport, real-world combinations of protobuf with REST, message queues, and gRPC, and common usage patterns.