Understanding Protocol Buffers: Part 1 - Introduction and Core Concepts
Protocol Buffers explained: what they are, why Google uses them for billions of RPCs, and when you should choose protobuf over JSON. A practical introduction without overwhelming technical details.
- tags
- #Protobuf #Protocol-Buffers #Grpc #Serialization #Api-Design #Microservices #Go #Golang #Distributed-Systems #Data-Formats
- categories
- Tutorials Distributed-Systems
- published
- reading time
- 12 minutes
What is Protocol Buffers?
Protocol Buffers (protobuf) is a method for serializing structured data. Think of it as a replacement for JSON or XML, but:
- Smaller - 3-10x less data over the wire
- Faster - 5-10x faster to encode/decode
- Type-safe - Compile-time validation instead of runtime errors
- Language-agnostic - One schema works for Go, Python, Java, C++, JavaScript
Google developed protobuf internally and uses it for virtually all inter-service communication. When you’re handling billions of requests per second, those performance gains matter.
The Core Idea: Schema-First Design
Unlike JSON (which is schema-optional), protobuf requires you to define your data structure upfront:
JSON approach (no schema):
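A minimal JSON payload of the kind being contrasted might look like this (field names and values are illustrative, not taken from the original example):

```json
{
  "name": "Alice",
  "email": "alice@example.com",
  "age": 30
}
```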
Protobuf approach (schema required):
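A proto3 schema for the same data might look like this (message and field names are illustrative):

```proto
syntax = "proto3";

message User {
  string name  = 1;
  string email = 2;
  int32  age   = 3;
}
```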
The schema becomes the contract between services. Both sides know exactly what fields exist, what types they are, and what the message structure looks like.
How It Works: The Three-Step Process
Step 1: Define Your Schema
Write a .proto file describing your data:
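A sketch of such a .proto file, with illustrative names, exercising each of the concepts listed below:

```proto
syntax = "proto3";

message Person {
  string name = 1;                   // 1 is a field number, not a value
  string email = 2;
  repeated string phone_numbers = 3; // repeated = list of values

  Address address = 4;               // nested message used as a field

  message Address {
    string street = 1;
    string city   = 2;
  }
}
```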
Key concepts:
- message = struct/class (a collection of fields)
- Numbers (1, 2, 3) = field identifiers (not values!)
- repeated = array/list of values
- Nested messages are allowed
Step 2: Generate Code
Run the protobuf compiler:
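The exact invocation depends on your target language and paths; for Go it might look like this (file and output paths are illustrative):

```shell
protoc --go_out=. --go_opt=paths=source_relative person.proto
```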
This creates language-specific code with:
- Structs/classes matching your schema
- Serialization methods (message → bytes)
- Deserialization methods (bytes → message)
Step 3: Use in Your Application
Go example:
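A sketch of the round trip in Go, assuming protoc has already generated a package (here called examplepb, an illustrative name) and google.golang.org/protobuf is on the module path:

```go
package main

import (
	"fmt"
	"log"

	"google.golang.org/protobuf/proto"

	examplepb "example.com/gen/examplepb" // hypothetical generated package
)

func main() {
	p := &examplepb.Person{Name: "Alice", Email: "alice@example.com"}

	// Serialize: message -> compact binary bytes.
	data, err := proto.Marshal(p)
	if err != nil {
		log.Fatal(err)
	}

	// Deserialize: bytes -> message.
	var decoded examplepb.Person
	if err := proto.Unmarshal(data, &decoded); err != nil {
		log.Fatal(err)
	}
	fmt.Println(decoded.GetName())
}
```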
The binary format is what makes it fast and small.
Why Binary Format Matters
JSON representation:
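The original example isn't preserved here; a JSON user object in that size range might look like this (fields are illustrative):

```json
{
  "id": 123,
  "name": "Alice Smith",
  "email": "alice@example.com",
  "active": true
}
```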
Size: 94 bytes (human-readable text)
Protobuf binary:
[binary data]
Size: ~35 bytes (optimized binary)
Why smaller:
- No field names in the binary (uses field numbers instead)
- Compact integer encoding (small numbers use 1 byte)
- No whitespace or formatting
- Efficient string encoding
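The size difference is easy to demonstrate with only the Go standard library. The sketch below hand-encodes two string fields in the protobuf wire format (a tag byte encoding field_number and wire type, a length byte, then the raw bytes — sufficient for strings under 128 bytes) and compares the total against the equivalent JSON, which must repeat every field name:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeStringField hand-encodes one protobuf length-delimited field:
// a tag byte (field_number<<3 | wire_type 2), a length byte, then the
// raw string bytes. It only handles strings shorter than 128 bytes,
// which is enough to illustrate the format.
func encodeStringField(fieldNum int, s string) []byte {
	out := []byte{byte(fieldNum<<3 | 2), byte(len(s))}
	return append(out, s...)
}

func main() {
	// JSON carries every field name in every message.
	jsonBytes, _ := json.Marshal(map[string]string{
		"name":  "Alice Smith",
		"email": "alice@example.com",
	})

	// The protobuf wire format carries only field numbers and payloads.
	var pbBytes []byte
	pbBytes = append(pbBytes, encodeStringField(1, "Alice Smith")...)
	pbBytes = append(pbBytes, encodeStringField(2, "alice@example.com")...)

	fmt.Printf("JSON: %d bytes, protobuf: %d bytes\n", len(jsonBytes), len(pbBytes))
}
```

Real protobuf libraries also use variable-length integer encoding, so small numbers cost a single byte; this sketch only covers string fields.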
Why faster:
- No text parsing
- Direct memory access
- Optimized for CPU cache
- Predictable structure
Field Numbers: The Secret Sauce
Those numbers (1, 2, 3) in your schema aren’t arbitrary:
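An illustrative schema (field names are examples, not the article's original):

```proto
message User {
  string name  = 1;  // "1" is what identifies this field on the wire
  string email = 2;
  int32  age   = 3;
}
```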
In the binary format:
- Field names (“name”, “email”) are never sent
- Only field numbers (1, 2, 3) are encoded
- Receiver uses the schema to map numbers → names
This enables backward compatibility:
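A sketch of the two schema versions (names are illustrative):

```proto
// Version 1 - what old clients were compiled against
message User {
  string name  = 1;
  string email = 2;
}

// Version 2 - a field is added; numbers 1 and 2 are untouched
message User {
  string name  = 1;
  string email = 2;
  int32  age   = 3;  // old clients simply skip field 3
}
```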
Old clients (using Version 1) can still read messages from new servers (Version 2). They just ignore field 3. New clients can read old messages - they see field 3 as empty.
Golden rule: Never reuse field numbers. Once you assign number 3 to “age”, that number is forever “age”.
Types in Protobuf
Scalar types:
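A sampling of proto3 scalar types in an illustrative message:

```proto
message ScalarExamples {
  int32  count   = 1;  // variable-length signed integer
  int64  big     = 2;
  bool   active  = 3;
  string name    = 4;  // UTF-8 text
  bytes  payload = 5;  // arbitrary raw bytes
  float  ratio   = 6;
  double precise = 7;
}
```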
Collections:
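Illustrative collection fields:

```proto
message CollectionExamples {
  repeated string tags    = 1;  // ordered list of values
  map<string, int32> hits = 2;  // key/value pairs
}
```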
Nested messages:
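An illustrative nested message:

```proto
message User {
  string name     = 1;
  Address address = 2;  // a message used as a field type

  message Address {     // defined inside User
    string street = 1;
    string city   = 2;
  }
}
```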
Enums:
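An illustrative enum (in proto3, the first value must be zero):

```proto
enum Status {
  STATUS_UNSPECIFIED = 0;  // required zero value
  STATUS_ACTIVE      = 1;
  STATUS_SUSPENDED   = 2;
}

message User {
  string name   = 1;
  Status status = 2;
}
```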
Protobuf vs JSON: When to Use Each
Use Protobuf When:
Performance is critical:
- High-throughput systems (thousands of requests/sec)
- Mobile apps (bandwidth costs money)
- IoT devices (limited CPU/memory)
- Real-time systems (latency matters)
Type safety matters:
- Multiple teams consuming your API
- Long-term API stability required
- Cross-language communication
- Compile-time error catching
Examples:
- Microservices (gRPC between services)
- Mobile backends (reduce data usage)
- Streaming systems (Kafka, Pub/Sub)
- Internal APIs at scale
Use JSON When:
Human interaction needed:
- REST APIs for web browsers
- Public APIs (easier to document/debug)
- Configuration files
- Quick prototyping
Simplicity matters:
- Small projects
- Infrequent requests
- Developer experience > performance
- Debugging with curl/browser
Examples:
- Public REST APIs
- Web dashboards
- Config files
- Development/testing
gRPC: Protobuf’s Most Common Use
gRPC is a framework for building APIs that uses protobuf for serialization:
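A sketch of a gRPC service definition (service and message names are illustrative):

```proto
service UserService {
  rpc GetUser (GetUserRequest) returns (User);
  rpc ListUsers (ListUsersRequest) returns (stream User);  // server streaming
}

message GetUserRequest {
  string id = 1;
}

message ListUsersRequest {}

message User {
  string id   = 1;
  string name = 2;
}
```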
This generates:
- Server interface - implement these methods in your language
- Client code - call remote methods like local functions
- Network protocol - HTTP/2 + protobuf encoding
Server (Go):
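A sketch of the server side, assuming protoc-gen-go-grpc has generated a userpb package for the service above (package and type names are illustrative):

```go
// server implements the generated UserService interface.
type server struct {
	userpb.UnimplementedUserServiceServer
}

func (s *server) GetUser(ctx context.Context, req *userpb.GetUserRequest) (*userpb.User, error) {
	// Look the user up however your application stores them.
	return &userpb.User{Id: req.GetId(), Name: "Alice"}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatal(err)
	}
	s := grpc.NewServer()
	userpb.RegisterUserServiceServer(s, &server{})
	log.Fatal(s.Serve(lis))
}
```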
Client (Go):
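And the client side, assuming the same illustrative userpb package:

```go
conn, err := grpc.NewClient("localhost:50051",
	grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
	log.Fatal(err)
}
defer conn.Close()

client := userpb.NewUserServiceClient(conn)

// Call the remote method like a local function.
user, err := client.GetUser(context.Background(), &userpb.GetUserRequest{Id: "123"})
if err != nil {
	log.Fatal(err)
}
log.Println(user.GetName())
```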
Protobuf Without gRPC: REST APIs
COMMON MISUNDERSTANDING
Many developers think protobuf and gRPC are inseparable - that you can’t use one without the other.
This is false.
Protobuf is a serialization format (like JSON). gRPC is an RPC framework that happens to use protobuf. They’re separate technologies that work well together but don’t require each other.
You can use:
- Protobuf with REST APIs (HTTP/1.1)
- Protobuf with WebSockets
- Protobuf with message queues (Kafka, RabbitMQ)
- Protobuf for file storage
- gRPC with other serialization formats (though protobuf is the standard)
Don't skip protobuf just because you don't want gRPC. They're decoupled: protobuf is just a serialization format, and you can pair it with REST APIs, message queues, WebSockets, or any other transport layer.
REST + Protobuf Example
You can build traditional REST APIs using protobuf instead of JSON:
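A sketch of a plain net/http handler that returns a protobuf body instead of JSON, assuming a generated userpb package (names are illustrative):

```go
func getUser(w http.ResponseWriter, r *http.Request) {
	user := &userpb.User{Id: "123", Name: "Alice"}

	// Binary protobuf body instead of JSON text.
	data, err := proto.Marshal(user)
	if err != nil {
		http.Error(w, "encoding failed", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/x-protobuf")
	w.Write(data)
}
```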
Standard REST structure:
POST /api/v1/users → Create user (protobuf body)
GET /api/v1/users/123 → Get user (protobuf response)
PUT /api/v1/users/123 → Update user
DELETE /api/v1/users/123 → Delete user
Same RESTful URLs and HTTP methods, just binary protobuf bodies instead of JSON text.
Three Ways to Use Protobuf
1. REST + Protobuf (HTTP/1.1)
- Traditional REST endpoints
- Protobuf binary bodies
- Standard HTTP status codes
- Works with existing proxies and load balancers
Use when: You want protobuf performance but need REST semantics or HTTP/1.1 compatibility
2. gRPC + Protobuf (HTTP/2)
- Service definitions in protobuf
- Generated client/server code
- Streaming support
- Maximum performance
Use when: Building microservices or need streaming/bidirectional communication
3. Message Passing + Protobuf
- Serialize to bytes, send via Kafka/RabbitMQ/Pub/Sub
- No HTTP at all
- Async processing
Use when: Event-driven architectures or async workflows
Why People Think They’re Coupled
Most tutorials show protobuf with gRPC because:
- gRPC is protobuf’s most popular use case
- They were released together
- Google promotes them as a pair
But they’re separate concerns:
- Protobuf = serialization format (like JSON)
- gRPC = RPC framework that happens to use protobuf (like REST frameworks use JSON)
Analogy: JSON doesn’t require REST. You can send JSON over websockets, message queues, or any transport. Same with protobuf.
Real Companies Using REST + Protobuf
Google Cloud APIs:
- Offer BOTH gRPC and REST
- REST endpoints can accept protobuf OR JSON
- Same protobuf definitions power both
Twitch:
- Uses protobuf for message payloads
- Sends over WebSocket (not gRPC)
- Custom protocol, not RPC
Square:
- Internal: gRPC + protobuf
- Public merchant APIs: REST + JSON
- Some internal REST APIs: REST + protobuf
Real-World Example: Google Cloud
All Google Cloud APIs are defined in protobuf:
googleapis/
├── google/
│ ├── cloud/
│ │ ├── secretmanager/v1/
│ │ │ └── service.proto
│ │ ├── storage/v1/
│ │ │ └── storage.proto
│ │ └── pubsub/v1/
│ │ └── pubsub.proto
Why this matters:
- Every GCP SDK (Go, Python, Java, etc.) generates from the same .proto files
- Guaranteed API compatibility across languages
- When Google updates the API, everyone gets the same changes
- Consistent behavior across all languages and platforms
The Trade-Off: Schema Management
Benefit: Type safety and performance
Cost: Schema evolution requires planning
Example challenge:
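The article's original example isn't preserved; a typical version of the problem looks like this (names are illustrative):

```proto
message User {
  string id = 1;  // needs to become a structured ID,
                  // but changing the type in place breaks deployed clients
}
```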
Solution: Add a new field instead:
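A sketch of the additive fix (names are illustrative):

```proto
message User {
  string id = 1 [deprecated = true];  // kept so old clients still parse
  UserId structured_id = 2;           // new clients read field 2
}

message UserId {
  string region = 1;
  int64  number = 2;
}
```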
Old clients still work. New clients use field 2. Eventually deprecate field 1.
Protobuf in the Wild
Who uses it:
- Google - All internal services (billions of RPCs/day)
- Netflix - Inter-service communication
- Uber - Microservices architecture
- Square - Payment processing
- Dropbox - File synchronization protocol
Open-source projects:
- Kubernetes (internal API definitions)
- Envoy proxy (configuration and APIs)
- Prometheus (remote write protocol)
- Kafka (schema registry supports protobuf)
Getting Started
Install protoc (protobuf compiler):
Verify installation:
Install language plugins:
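The exact commands vary by platform; a typical setup for Go might look like:

```shell
# Install protoc (pick your platform)
brew install protobuf                   # macOS
sudo apt install -y protobuf-compiler   # Debian/Ubuntu

# Verify installation
protoc --version

# Install the Go code-generation plugins
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
```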
Your First Protobuf Message
1. Create person.proto:
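A minimal version might look like this (the go_package path is illustrative):

```proto
syntax = "proto3";

option go_package = "example.com/personpb";

message Person {
  string name  = 1;
  string email = 2;
  int32  age   = 3;
}
```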
2. Generate code:
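For Go, a typical invocation (paths are illustrative):

```shell
protoc --go_out=. --go_opt=paths=source_relative person.proto
```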
3. Use it:
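A sketch of the generated type in use, assuming the illustrative personpb package from the step above:

```go
p := &personpb.Person{Name: "Alice", Email: "alice@example.com", Age: 30}

data, _ := proto.Marshal(p)          // message -> bytes

var decoded personpb.Person
_ = proto.Unmarshal(data, &decoded)  // bytes -> message
```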
That’s it. You’re using protobuf.
Choosing Your Approach: Quick Decision Guide
| Use Case | Best Choice | Why |
|---|---|---|
| Internal microservices | gRPC + protobuf | Maximum performance, streaming |
| Public web API | REST + JSON | Browser compatibility, easy debugging |
| Mobile backend | REST + protobuf or gRPC | Reduce bandwidth costs |
| Real-time features | gRPC + protobuf | Bidirectional streaming |
| Event processing | Message queue + protobuf | Async, decoupled |
| Legacy integration | REST + JSON | Widest compatibility |
| High throughput | gRPC + protobuf | Lowest latency |
The key insight: protobuf is transport-agnostic. Choose your transport (REST, gRPC, message queue) based on requirements, then decide if protobuf’s benefits justify the schema overhead.
What’s Next
In the next parts of this series, we’ll explore:
- Part 2: Protobuf in Practice - Decision matrix, transport combinations, real-world patterns
- Part 3: gRPC Deep Dive - Building services, streaming, client-server code
- Part 4: Advanced Features - Oneofs, any types, well-known types, optimizations
- Part 5: Production Patterns - Schema evolution, versioning, monitoring, debugging
When NOT to Use Protobuf
Be honest about the trade-offs:
Skip protobuf if:
- Building a simple REST API for web browsers
- Data needs to be human-readable (logs, config files)
- Team isn’t comfortable with schema management
- Performance isn’t a concern
- Quick prototyping phase
JSON is fine for:
- Public REST APIs
- Configuration files
- Small-scale systems
- Web-first applications
Use the right tool for the job. Protobuf shines at scale and in type-safety-critical systems, but it’s overkill for many applications.
The Verdict
Protocol Buffers trades human readability for performance and type safety. If you’re building:
- Microservices communicating internally
- Mobile apps where bandwidth costs money
- High-throughput systems processing thousands of requests
- Cross-language APIs requiring strict contracts
Then protobuf is worth learning.
If you’re building a REST API consumed by browsers, JSON is probably the right choice.
In the next part, we’ll explore gRPC - the RPC framework that pairs with protobuf to create type-safe, high-performance APIs.
Resources
- Protocol Buffers Documentation
- Protobuf Language Guide (proto3)
- gRPC Official Site
- Why We Use gRPC - CNCF Blog
Coming up in Part 2: Building your first gRPC service with protobuf, implementing server and client code, and understanding how RPC methods map to protobuf messages.