Serialization and Deserialization: The Bridge Between Runtime Objects and Bytes
Tags: #Fundamentals #Computer-Science #Data-Formats
Categories: Programming
Reading time: 11 minutes
Every time you save a file, make an API call, or store data in a database, you’re using serialization. Yet many developers use these mechanisms daily without understanding the fundamental transformation happening under the hood.
Let’s demystify serialization and deserialization by understanding what they really are: conversions between runtime objects and bytes.
The Core Concept
SERIALIZATION: Runtime Objects → Bytes
DESERIALIZATION: Bytes → Runtime Objects
That’s it. Everything else is implementation details.
What Are Runtime Objects?
Runtime objects are data structures that exist in your program’s memory while it’s running. They’re language-specific constructs with:
- Type information
- Memory addresses
- Language-specific structure (prototypes, vtables, reference counting)
- Behavior (methods, functions)
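Examples across languages: a Go struct, a JavaScript object, and a Python dict are all runtime objects, each laid out in memory according to its own language's rules.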
The key insight: These objects only exist while your program is running. They live in RAM. When your program exits, they vanish.
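As a concrete Go sketch (the `Profile` type and its `Describe` method are illustrative names, not from any real codebase), the value below has a type known to the runtime, lives at a memory address, and carries behavior through a method. None of that exists outside the running process.

```go
package main

import "fmt"

// Profile is an illustrative runtime object: data plus behavior.
type Profile struct {
	Name     string
	IsActive bool
}

// Describe is behavior attached to the type. It is compiled code, not data,
// so a serializer never writes it out.
func (p *Profile) Describe() string {
	status := "inactive"
	if p.IsActive {
		status = "active"
	}
	return p.Name + " (" + status + ")"
}

func main() {
	profile := &Profile{Name: "my-project", IsActive: true}

	fmt.Printf("type:    %T\n", *profile)          // type information the runtime knows about
	fmt.Printf("address: %p\n", profile)           // an address that means nothing outside this process
	fmt.Printf("method:  %s\n", profile.Describe()) // behavior, not data
}
```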
Why Bytes?
Bytes are universal:
- Files on disk store bytes
- Network packets contain bytes
- Database records are bytes
- HTTP bodies are bytes
- Everything that persists or travels is bytes
Runtime objects are ephemeral and language-specific:
- They only exist in RAM while your program runs
- A Go struct can’t be directly stored on disk
- A JavaScript object can’t be sent over a network socket
- A Python dict can’t be read by a Java program
Bytes bridge this gap. They’re the universal intermediate format that enables:
- Persistence - Survive program restarts
- Communication - Cross machine boundaries
- Interoperability - Cross language boundaries
- Storage - Save to disk, databases, caches
The Transformation
Diagram: in memory (runtime), a Go value of type `Profile struct { Name string; IsActive bool }`, created as `profile := Profile{Name: "my-project", IsActive: true}`. Serializing it (Marshal/Encode) produces bytes (persistent): the byte sequence `[123, 34, 110, 97, 109, 101, 34, 58, 34, ...]`, which read as text is `{"name":"my-project","is_active":true}`. Deserializing (Unmarshal/Decode) turns those bytes back into the object. From there, the bytes can be stored or transmitted as a disk file, a network packet, a database record, or a cache entry.
Serialization: Objects → Bytes
Serialization converts runtime objects into a byte sequence.
Go example:
What happens during serialization:
- Traverse object structure - Walk through all fields/properties
- Convert to format - Apply encoding rules (JSON, protobuf, etc.)
- Generate bytes - Produce sequential byte stream
- Discard metadata - Type info, methods, pointers are lost
The bytes carry none of the runtime structure: they are just a sequence of numbers whose meaning comes entirely from the format's rules. No language-level type information. No methods. Just data.
Deserialization: Bytes → Objects
Deserialization reconstructs runtime objects from bytes.
Go example:
What happens during deserialization:
- Parse bytes - Interpret according to format rules
- Allocate memory - Create new object/struct/dict
- Populate fields - Assign values from parsed data
- Type checking - Validate against schema (if statically typed)
The Lifecycle
Diagram: a Go program creates an object (`profile := Profile{...}`) and uses it (`fmt.Println(profile.Name)`). `json.Marshal()` serializes the object to bytes, which are persisted on disk as `file.json`; `json.Unmarshal()` turns those bytes back into an object the Go program can use again. A second program, written in JavaScript (a different language), reads the same file, parses it with `JSON.parse`, and uses the result (`console.log(obj.name)`).
Notice: The bytes don’t “know” they came from Go. JavaScript can read the same bytes and build a JavaScript object. This is the power of serialization.
Serialization Formats
Different formats offer different tradeoffs:
JSON (JavaScript Object Notation)
Characteristics:
- Human-readable text
- Language-agnostic
- UTF-8 encoded
- Self-describing (field names included)
Tradeoffs:
- + Easy to debug
- + Universal support
- + Works with any language
- - Verbose (field names repeated in every record)
- - Slower to parse than binary formats
- - No built-in schema enforcement
Use cases: Config files, REST APIs, human-readable data
Example:
Protocol Buffers (protobuf)
Characteristics:
- Binary format
- Schema-required (.proto files)
- Strongly typed
- Very compact
Tradeoffs:
- + Extremely fast
- + Very small size
- + Strong typing
- + Forward/backward compatibility
- - Not human-readable
- - Requires schema
- - Requires code generation
Use cases: gRPC, high-performance APIs, microservices
Bytes (hex):
0a 0a 6d 79 2d 70 72 6f 6a 65 63 74 10 01 1a 04 77 6f 72 6b
MessagePack
Characteristics:
- Binary format
- Like “binary JSON”
- No schema required
- More compact than JSON
Tradeoffs:
- + Smaller than JSON
- + Faster than JSON
- + No schema needed
- + Multiple language support
- - Not human-readable
- - Less universal than JSON
Use cases: Redis caching, log shipping, binary APIs
XML (eXtensible Markup Language)
Characteristics:
- Human-readable text
- Tag-based structure
- Schema optional (XSD)
- Verbose
Tradeoffs:
- + Self-describing
- + Schema validation available
- + Mature tooling
- - Very verbose
- - Slow to parse
- - Falling out of favor
Use cases: Legacy systems, SOAP APIs, enterprise integration
Example:
YAML (YAML Ain’t Markup Language)
Characteristics:
- Human-readable text
- Indentation-based
- Superset of JSON (as of YAML 1.2)
- Comments supported
Tradeoffs:
- + Very readable
- + Supports comments
- + Less verbose than JSON
- - Indentation-sensitive
- - Implicit typing surprises (in YAML 1.1, an unquoted `no` parses as a boolean)
- - Slower to parse
Use cases: Config files, CI/CD pipelines (GitHub Actions), Kubernetes manifests, Ansible playbooks
TOML (Tom’s Obvious Minimal Language)
Characteristics:
- Human-readable text
- INI-file inspired
- Explicit and unambiguous
- Table-based structure
Tradeoffs:
- + Very readable
- + Unambiguous syntax
- + Good for config
- - Limited adoption
- - Verbose for nested data
Use cases: Config files (Cargo.toml, pyproject.toml)
Format Comparison
Diagram: a decision tree that starts from what you need most.
- Human-readable → YAML or TOML
- Performance → Protobuf or MessagePack
- Compatibility → JSON
- Config → YAML or TOML
Format notes from the diagram: JSON (universal, REST APIs, debugging); YAML (config files, comments, readable); TOML (simple config, unambiguous, Rust/Python); Protobuf (gRPC, high performance, typed); MessagePack (binary JSON, fast, compact).
Real-World Examples
Example 1: Saving Config
Go - dotclaude saving active profile:
Later, loading config:
Example 2: REST API
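Client (JavaScript) sending data: the client turns its object into JSON text (typically with `JSON.stringify`) and sends those bytes as the HTTP request body.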
Server (Go) receiving data:
Different languages, same data!
Example 3: Database Storage
Saving to database:
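A sketch of saving, assuming a hypothetical `profiles` table with an `id` column and a `data` column that holds the serialized bytes. `saveProfile` accepts a small `execer` interface, which `*sql.DB` and `*sql.Tx` satisfy through their `Exec` method; a fake implementation stands in for the database so the example runs without a driver. The `?` placeholder syntax depends on the driver.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"fmt"
)

type Profile struct {
	Name     string `json:"name"`
	IsActive bool   `json:"is_active"`
}

// execer is satisfied by *sql.DB and *sql.Tx.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// saveProfile serializes p and stores the bytes in a hypothetical profiles table.
func saveProfile(db execer, id int64, p Profile) error {
	data, err := json.Marshal(p)
	if err != nil {
		return fmt.Errorf("encode profile: %w", err)
	}
	// "?" works for SQLite/MySQL drivers; PostgreSQL drivers expect "$1, $2".
	_, err = db.Exec("INSERT INTO profiles (id, data) VALUES (?, ?)", id, data)
	return err
}

// fakeDB records the arguments that would have been sent to the database.
type fakeDB struct{ args []any }

func (f *fakeDB) Exec(query string, args ...any) (sql.Result, error) {
	f.args = args
	return nil, nil
}

func main() {
	db := &fakeDB{}
	if err := saveProfile(db, 1, Profile{Name: "my-project", IsActive: true}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("stored bytes: %s\n", db.args[1]) // {"name":"my-project","is_active":true}
}
```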
Loading from database:
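The matching load, under the same assumptions (hypothetical `profiles` table and `data` column). `loadProfile` takes anything with a `Scan(dest ...any) error` method, which `*sql.Row` provides, so in real code you would pass `db.QueryRow("SELECT data FROM profiles WHERE id = ?", id)`. A fake row keeps the sketch runnable without a driver.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Profile struct {
	Name     string `json:"name"`
	IsActive bool   `json:"is_active"`
}

// rowScanner is satisfied by *sql.Row.
type rowScanner interface {
	Scan(dest ...any) error
}

// loadProfile reads the stored bytes from one row and deserializes them.
func loadProfile(row rowScanner) (Profile, error) {
	var data []byte
	if err := row.Scan(&data); err != nil {
		return Profile{}, fmt.Errorf("scan: %w", err)
	}
	var p Profile
	if err := json.Unmarshal(data, &p); err != nil {
		return Profile{}, fmt.Errorf("decode profile: %w", err)
	}
	return p, nil
}

// fakeRow stands in for *sql.Row so the sketch runs without a database.
type fakeRow struct{ data []byte }

func (r fakeRow) Scan(dest ...any) error {
	*dest[0].(*[]byte) = r.data
	return nil
}

func main() {
	p, err := loadProfile(fakeRow{data: []byte(`{"name":"my-project","is_active":true}`)})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", p) // {Name:my-project IsActive:true}
}
```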
Common Pitfalls
1. Assuming Serialization Preserves Everything
Not preserved:
- Methods/functions
- Private fields (depends on language/serializer)
- Pointer relationships
- Type information (in some formats)
- Circular references
2. Version Compatibility
Schema changes break deserialization:
Solution: Use versioning strategies:
- Optional fields with defaults
- Schema evolution (protobuf)
- Version numbers in serialized data
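A sketch combining two of these strategies, with a hypothetical envelope layout (`{"version": N, "data": {...}}`) and hypothetical v1/v2 types. Defaults are set before decoding because `json.Unmarshal` leaves struct fields that are absent from the input untouched, and the version number decides how to migrate older data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope carries an explicit schema version next to the payload.
type Envelope struct {
	Version int             `json:"version"`
	Data    json.RawMessage `json:"data"`
}

type profileV1 struct {
	Name     string `json:"name"`
	IsActive bool   `json:"is_active"`
}

type ProfileV2 struct {
	DisplayName string `json:"display_name"`
	IsActive    bool   `json:"is_active"`
	Theme       string `json:"theme"` // field added in v2
}

func decodeProfile(b []byte) (ProfileV2, error) {
	var env Envelope
	if err := json.Unmarshal(b, &env); err != nil {
		return ProfileV2{}, err
	}
	p := ProfileV2{Theme: "light"} // default for data that predates the field
	switch env.Version {
	case 1:
		var old profileV1
		if err := json.Unmarshal(env.Data, &old); err != nil {
			return ProfileV2{}, err
		}
		p.DisplayName, p.IsActive = old.Name, old.IsActive // migrate the renamed field
	case 2:
		if err := json.Unmarshal(env.Data, &p); err != nil {
			return ProfileV2{}, err
		}
	default:
		return ProfileV2{}, fmt.Errorf("unsupported version %d", env.Version)
	}
	return p, nil
}

func main() {
	v1 := []byte(`{"version":1,"data":{"name":"my-project","is_active":true}}`)
	v2 := []byte(`{"version":2,"data":{"display_name":"my-project","is_active":true}}`)
	for _, b := range [][]byte{v1, v2} {
		p, err := decodeProfile(b)
		fmt.Printf("%+v %v\n", p, err) // {DisplayName:my-project IsActive:true Theme:light} <nil>
	}
}
```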
3. Performance Assumptions
JSON is slow for large datasets:
4. Security Vulnerabilities
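A classic dangerous pattern is deserializing untrusted bytes with a format that can reconstruct arbitrary objects, such as Python's `pickle.loads`: a crafted payload can execute code while it is being loaded.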
Safe approach:
- Use safe formats (JSON, not pickle)
- Validate after deserialization
- Use schemas (JSON Schema, protobuf)
- Set size limits
Best Practices
1. Choose the Right Format
| Scenario | Format |
|---|---|
| Config files | YAML or TOML |
| REST APIs | JSON |
| High-performance RPC | Protobuf |
| Logs/metrics | MessagePack or JSON |
| Legacy systems | XML |
| Binary caching | MessagePack |
2. Handle Errors
3. Use Schemas When Possible
Benefits:
- Type safety
- Validation
- Documentation
- Code generation
- Version compatibility
4. Consider Size and Speed
Performance comparison (serializing/deserializing 1000 user records):
| Format | Size | Time | Best For |
|---|---|---|---|
| Protobuf | 61 KB | 12ms | Internal services, gRPC |
| MessagePack | 89 KB | 35ms | Caching, binary APIs |
| JSON | 245 KB | 100ms | REST APIs, config files |
| XML | 412 KB | 187ms | Legacy systems (avoid for new projects) |
Key takeaway: in this comparison, the binary formats (protobuf, MessagePack) are about 2.8-4x smaller and 3-8x faster than JSON, and about 4.6-6.8x smaller and 5-16x faster than XML.
Rule of thumb:
- JSON - REST APIs, config files, anything human-readable
- Protobuf - High-performance internal services, gRPC
- MessagePack - Fast caching, log shipping
- XML - Only for legacy integration
Conclusion
Serialization and deserialization are fundamental transformations that enable:
- Persistence - Objects survive program restarts
- Communication - Objects travel across networks
- Interoperability - Objects cross language boundaries
- Storage - Objects live in databases and caches
The key insight: Runtime objects are ephemeral and language-specific. Bytes are persistent and universal. Serialization is the bridge.
Every time you save a file, call an API, or query a database, you’re converting between these two worlds. Understanding this transformation helps you choose the right format, debug issues, and build robust systems.