Serialization and Deserialization: The Bridge Between Runtime Objects and Bytes

tags: #Fundamentals #Computer-Science #Data-Formats
categories: Programming
published: 2025-12-11
reading time: 11 minutes

Every time you save a file, make an API call, or store data in a database, you’re using serialization. Yet many developers use these mechanisms daily without understanding the fundamental transformation happening under the hood.

Let’s demystify serialization and deserialization by understanding what they really are: conversions between runtime objects and bytes.

The Core Concept

SERIALIZATION:   Runtime Objects → Bytes
DESERIALIZATION: Bytes → Runtime Objects

That’s it. Everything else is implementation details.

What Are Runtime Objects?

Runtime objects are data structures that exist in your program’s memory while it’s running. They’re language-specific constructs with:

Type information
Memory addresses
Language-specific structure (prototypes, vtables, reference counting)
Behavior (methods, functions)

Examples across languages:

Go:

1
2
3
4
5
6
7
// Runtime object: struct instance in memory
profile := Profile{
    Name:     "my-project",
    IsActive: true,
    Created:  time.Now(),
}
// Exists as a memory structure with fields at specific offsets

JavaScript:

1
2
3
4
5
6
7
// Runtime object: JS object in V8 heap
const profile = {
    name: "my-project",
    isActive: true,
    created: new Date()
};
// Exists with prototype chain, hidden classes, etc.

Python:

1
2
3
4
5
6
7
# Runtime object: dict in Python heap
profile = {
    "name": "my-project",
    "is_active": True,
    "created": datetime.now()
}
# Exists as PyObject with reference counting

The key insight: These objects only exist while your program is running. They live in RAM. When your program exits, they vanish.

Key Concept: Runtime objects are ephemeral. They exist only in memory while your program runs. Once the program exits or the object goes out of scope, it’s gone forever. This is why we need serialization to preserve data.

Why Bytes?

Bytes are universal:

Files on disk store bytes
Network packets contain bytes
Database records are bytes
HTTP bodies are bytes
Everything that persists or travels is bytes

Runtime objects are ephemeral and language-specific:

They only exist in RAM while your program runs
A Go struct can’t be directly stored on disk
A JavaScript object can’t be sent over a network socket
A Python dict can’t be read by a Java program

Bytes bridge this gap. They’re the universal intermediate format that enables:

Persistence - Survive program restarts
Communication - Cross machine boundaries
Interoperability - Cross language boundaries
Storage - Save to disk, databases, caches

The Transformation

flowchart TB subgraph memory["Runtime Memory (Ephemeral)"] obj["Go Struct
───────────
type Profile struct {
Name string
IsActive bool
}

profile := Profile{
Name: 'my-project',
IsActive: true
}"] end subgraph bytes["Bytes (Persistent)"] data["Byte Sequence
───────────
[123, 34, 110, 97, 109, 101, 34, 58, 34, ...]

OR as text:
{'name':'my-project','is_active':true}"] end subgraph storage["Storage / Transmission"] disk["Disk File"] net["Network Packet"] db["Database Record"] cache["Cache Entry"] end obj -->|"Serialize
(Marshal/Encode)"| data data -->|"Deserialize
(Unmarshal/Decode)"| obj data --> disk data --> net data --> db data --> cache style memory fill:#1e3a5f,stroke:#4a9eff,color:#e2e8f0 style bytes fill:#2c5282,stroke:#4299e1,color:#e2e8f0 style storage fill:#22543d,stroke:#2f855a,color:#e2e8f0 style obj fill:#1a365d,stroke:#2c5282,color:#e2e8f0 style data fill:#2c5282,stroke:#63b3ed,color:#e2e8f0

Serialization: Objects → Bytes

Serialization converts runtime objects into a byte sequence.

Go example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// 1. Runtime object (in memory)
profile := Profile{
    Name:     "my-project",
    IsActive: true,
}

// 2. Serialize to bytes
jsonBytes, _ := json.Marshal(profile)
// jsonBytes = [123, 34, 110, 97, 109, 101, 34, 58, ...] (raw bytes)
// As string: {"name":"my-project","is_active":true}

// 3. Now you can persist/transmit the bytes
os.WriteFile("profile.json", jsonBytes, 0644)  // Write to disk
conn.Write(jsonBytes)                           // Send over network
redis.Set("profile", jsonBytes)                 // Store in cache

What happens during serialization:

Traverse object structure - Walk through all fields/properties
Convert to format - Apply encoding rules (JSON, protobuf, etc.)
Generate bytes - Produce sequential byte stream
Discard metadata - Type info, methods, pointers are lost

The bytes have no structure - they’re just a sequence of numbers. No type information. No methods. Just data.

Deserialization: Bytes → Objects

Deserialization reconstructs runtime objects from bytes.

Go example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
// 1. Read bytes (from disk, network, database)
jsonBytes, _ := os.ReadFile("profile.json")
// jsonBytes = [123, 34, 110, 97, 109, 101, 34, ...]

// 2. Deserialize to runtime object
var profile Profile
json.Unmarshal(jsonBytes, &profile)

// 3. Use the reconstructed object
fmt.Println(profile.Name)       // Access fields
if profile.IsActive {           // Use in logic
    activate(profile)           // Pass to functions
}

What happens during deserialization:

Parse bytes - Interpret according to format rules
Allocate memory - Create new object/struct/dict
Populate fields - Assign values from parsed data
Type checking - Validate against schema (if statically typed)

The Lifecycle

flowchart LR subgraph prog1["Program 1 (Go)"] create["Create Object
profile := Profile{...}"] use1["Use Object
fmt.Println(profile.Name)"] end subgraph serial["Serialization"] marshal["json.Marshal()
Object → Bytes"] end subgraph persist["Persistence"] file["file.json
bytes on disk"] end subgraph deserial["Deserialization"] unmarshal["json.Unmarshal()
Bytes → Object"] end subgraph prog2["Program 2 (JavaScript)"] parse["Parse JSON
JSON.parse(bytes)"] use2["Use Object
console.log(obj.name)"] end create --> use1 use1 --> marshal marshal --> file file --> unmarshal unmarshal --> use1 file -.->|"Different language!"| parse parse --> use2 style prog1 fill:#1e3a5f,stroke:#4a9eff,color:#e2e8f0 style serial fill:#742a2a,stroke:#c53030,color:#e2e8f0 style persist fill:#2c5282,stroke:#4299e1,color:#e2e8f0 style deserial fill:#22543d,stroke:#2f855a,color:#e2e8f0 style prog2 fill:#1e3a5f,stroke:#4a9eff,color:#e2e8f0

Notice: The bytes don’t “know” they came from Go. JavaScript can read the same bytes and build a JavaScript object. This is the power of serialization.

Serialization Formats

Different formats offer different tradeoffs:

JSON (JavaScript Object Notation)

Characteristics:

Human-readable text
Language-agnostic
UTF-8 encoded
Self-describing (field names included)

Tradeoffs:

+ Easy to debug
+ Universal support
+ Works with any language
- Verbose (large size)
- Slow to parse
- No schema enforcement

Use cases: Config files, REST APIs, human-readable data

Example:

1
2
3
4
5
{
  "name": "my-project",
  "is_active": true,
  "tags": ["work", "golang"]
}

Protocol Buffers (protobuf)

Characteristics:

Binary format
Schema-required (.proto files)
Strongly typed
Very compact

Tradeoffs:

+ Extremely fast
+ Very small size
+ Strong typing
+ Forward/backward compatibility
- Not human-readable
- Requires schema
- Requires code generation

Use cases: gRPC, high-performance APIs, microservices

Schema (.proto):

1
2
3
4
5
message Profile {
  string name = 1;
  bool is_active = 2;
  repeated string tags = 3;
}

Bytes (hex):

0a 0a 6d 79 2d 70 72 6f 6a 65 63 74 10 01 1a 04 77 6f 72 6b

MessagePack

Characteristics:

Binary format
Like “binary JSON”
No schema required
More compact than JSON

Tradeoffs:

+ Smaller than JSON
+ Faster than JSON
+ No schema needed
+ Multiple language support
- Not human-readable
- Less universal than JSON

Use cases: Redis caching, log shipping, binary APIs

XML (eXtensible Markup Language)

Characteristics:

Human-readable text
Tag-based structure
Schema optional (XSD)
Verbose

Tradeoffs:

+ Self-describing
+ Schema validation available
+ Mature tooling
- Very verbose
- Slow to parse
- Falling out of favor

Use cases: Legacy systems, SOAP APIs, enterprise integration

Example:

1
2
3
4
5
6
7
8
<profile>
  <name>my-project</name>
  <is_active>true</is_active>
  <tags>
    <tag>work</tag>
    <tag>golang</tag>
  </tags>
</profile>

YAML (YAML Ain’t Markup Language)

Characteristics:

Human-readable text
Indentation-based
Superset of JSON
Comments supported

Tradeoffs:

+ Very readable
+ Supports comments
+ Less verbose than JSON
- Indentation-sensitive
- Ambiguous syntax
- Slower to parse

Use cases: Config files, CI/CD (GitHub Actions, Kubernetes), Ansible

Example:

1
2
3
4
5
name: my-project
is_active: true
tags:
  - work
  - golang

TOML (Tom’s Obvious Minimal Language)

Characteristics:

Human-readable text
INI-file inspired
Explicit and unambiguous
Table-based structure

Tradeoffs:

+ Very readable
+ Unambiguous syntax
+ Good for config
- Limited adoption
- Verbose for nested data

Use cases: Config files (Cargo.toml, pyproject.toml)

Example:

1
2
3
name = "my-project"
is_active = true
tags = ["work", "golang"]

Format Comparison

flowchart TB question{"What's your priority?"} human["Human readability?"] perf["Performance?"] compat["Maximum compatibility?"] config["Configuration files?"] json["JSON
• Universal
• REST APIs
• Debugging"] yaml["YAML
• Config files
• Comments
• Readable"] toml["TOML
• Simple config
• Unambiguous
• Rust/Python"] protobuf["Protobuf
• gRPC
• High perf
• Typed"] msgpack["MessagePack
• Binary JSON
• Fast
• Compact"] question --> human question --> perf question --> compat question --> config human --> yaml human --> toml perf --> protobuf perf --> msgpack compat --> json config --> yaml config --> toml style question fill:#2d3748,stroke:#4a5568,color:#e2e8f0 style human fill:#742a2a,stroke:#c53030,color:#e2e8f0 style perf fill:#742a2a,stroke:#c53030,color:#e2e8f0 style compat fill:#742a2a,stroke:#c53030,color:#e2e8f0 style config fill:#742a2a,stroke:#c53030,color:#e2e8f0 style json fill:#2c5282,stroke:#4299e1,color:#e2e8f0 style yaml fill:#2c5282,stroke:#4299e1,color:#e2e8f0 style toml fill:#2c5282,stroke:#4299e1,color:#e2e8f0 style protobuf fill:#22543d,stroke:#2f855a,color:#e2e8f0 style msgpack fill:#22543d,stroke:#2f855a,color:#e2e8f0

Real-World Examples

Example 1: Saving Config

Go - dotclaude saving active profile:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
// 1. Runtime object
config := Config{
    ActiveProfile: "my-project",
    BackupCount:   5,
    LastSync:      time.Now(),
}

// 2. Serialize to bytes
data, _ := json.MarshalIndent(config, "", "  ")

// 3. Write bytes to disk
os.WriteFile("~/.dotclaude/config.json", data, 0644)

// Program exits. Object gone. Only bytes remain on disk.

Later, loading config:

1
2
3
4
5
6
7
8
9
// 1. Read bytes from disk
data, _ := os.ReadFile("~/.dotclaude/config.json")

// 2. Deserialize to runtime object
var config Config
json.Unmarshal(data, &config)

// 3. Use the object
fmt.Println("Active:", config.ActiveProfile)

Example 2: REST API

Client (JavaScript) sending data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
// 1. Runtime object
const user = {
    name: "Alice",
    email: "alice@example.com",
    role: "admin"
};

// 2. Serialize to JSON bytes
const json = JSON.stringify(user);
// json = '{"name":"Alice","email":"alice@example.com","role":"admin"}'

// 3. Send bytes over network
fetch('/api/users', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: json  // Bytes go over HTTP
});

Server (Go) receiving data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// 1. Read bytes from HTTP request
body, _ := io.ReadAll(r.Body)

// 2. Deserialize to runtime object
var user User
json.Unmarshal(body, &user)

// 3. Use the object
if user.Role == "admin" {
    grantAdminAccess(user)
}

Different languages, same data!

Example 3: Database Storage

Saving to database:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
// 1. Runtime object
session := Session{
    UserID:    123,
    Token:     "abc123",
    ExpiresAt: time.Now().Add(24 * time.Hour),
}

// 2. Serialize to bytes (JSON or protobuf)
bytes, _ := json.Marshal(session)

// 3. Store bytes in database
db.Exec("INSERT INTO sessions (data) VALUES (?)", bytes)

Loading from database:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
// 1. Query bytes from database
var bytes []byte
db.QueryRow("SELECT data FROM sessions WHERE id = ?", id).Scan(&bytes)

// 2. Deserialize to runtime object
var session Session
json.Unmarshal(bytes, &session)

// 3. Use the object
if time.Now().After(session.ExpiresAt) {
    return errors.New("session expired")
}

Common Pitfalls

1. Assuming Serialization Preserves Everything

Not preserved:

Methods/functions
Private fields (depends on language/serializer)
Pointer relationships
Type information (in some formats)
Circular references

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
type User struct {
    Name string
    // Methods are NOT serialized
}

func (u *User) Greet() string {
    return "Hello, " + u.Name
}

// After serialization → deserialization:
// You get the Name field, but NOT the Greet() method

2. Version Compatibility

Schema changes break deserialization:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// Version 1
type Config struct {
    Name string
}

// Serialize with V1, store in database

// Version 2 (field added)
type Config struct {
    Name   string
    Email  string  // NEW FIELD
}

// Deserialize old bytes with V2 struct
// Email will be zero value (empty string)

Solution: Use versioning strategies:

Optional fields with defaults
Schema evolution (protobuf)
Version numbers in serialized data

3. Performance Assumptions

JSON is slow for large datasets:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
// BAD: Repeatedly serialize in hot path
for i := 0; i < 1000000; i++ {
    json.Marshal(largeObject)  // Expensive!
}

// GOOD: Serialize once, reuse bytes
bytes, _ := json.Marshal(largeObject)
for i := 0; i < 1000000; i++ {
    sendBytes(bytes)
}

4. Security Vulnerabilities

Security Warning: Deserializing untrusted data is dangerous and can lead to remote code execution vulnerabilities. Never deserialize data from untrusted sources without proper validation.

Example of dangerous code:

1
2
3
# NEVER do this with untrusted input
import pickle
data = pickle.loads(untrusted_bytes)  # Can execute arbitrary code!

Safe approach:

Use safe formats (JSON, not pickle)
Validate after deserialization
Use schemas (JSON Schema, protobuf)
Set size limits

Best Practices

1. Choose the Right Format

Scenario	Format
Config files	YAML or TOML
REST APIs	JSON
High-performance RPC	Protobuf
Logs/metrics	MessagePack or JSON
Legacy systems	XML
Binary caching	MessagePack

2. Handle Errors

1
2
3
4
5
6
7
8
9
// BAD: Ignoring errors
var config Config
json.Unmarshal(bytes, &config)

// GOOD: Check errors
var config Config
if err := json.Unmarshal(bytes, &config); err != nil {
    return fmt.Errorf("failed to deserialize config: %w", err)
}

3. Use Schemas When Possible

Protobuf schema:

1
2
3
4
message User {
  string name = 1 [(validate.rules).string.min_len = 1];
  string email = 2 [(validate.rules).string.email = true];
}

Benefits:

Type safety
Validation
Documentation
Code generation
Version compatibility

4. Consider Size and Speed

Performance comparison (serializing/deserializing 1000 user records):

Format	Size	Speed	Best For
Protobuf	61 KB	12ms	Internal services, gRPC
MessagePack	89 KB	35ms	Caching, binary APIs
JSON	245 KB	100ms	REST APIs, config files
XML	412 KB	187ms	Legacy systems (avoid for new projects)

Key takeaway: Binary formats (protobuf, MessagePack) are 2-4x smaller and 3-8x faster than text formats (JSON, XML).

Rule of thumb:

JSON - REST APIs, config files, anything human-readable
Protobuf - High-performance internal services, gRPC
MessagePack - Fast caching, log shipping
XML - Only for legacy integration

Conclusion

Serialization and deserialization are fundamental transformations that enable:

Persistence - Objects survive program restarts
Communication - Objects travel across networks
Interoperability - Objects cross language boundaries
Storage - Objects live in databases and caches

The key insight: Runtime objects are ephemeral and language-specific. Bytes are persistent and universal. Serialization is the bridge.

Every time you save a file, call an API, or query a database, you’re converting between these two worlds. Understanding this transformation helps you choose the right format, debug issues, and build robust systems.

Serialization and Deserialization: The Bridge Between Runtime Objects and Bytes

The Core Concept

What Are Runtime Objects?

Why Bytes?

The Transformation

Serialization: Objects → Bytes

Deserialization: Bytes → Objects

The Lifecycle

Serialization Formats

JSON (JavaScript Object Notation)

Protocol Buffers (protobuf)

MessagePack

XML (eXtensible Markup Language)

YAML (YAML Ain’t Markup Language)

TOML (Tom’s Obvious Minimal Language)

Format Comparison

Real-World Examples

Example 1: Saving Config

Example 2: REST API

Example 3: Database Storage

Common Pitfalls

1. Assuming Serialization Preserves Everything

2. Version Compatibility

3. Performance Assumptions

4. Security Vulnerabilities

Best Practices

1. Choose the Right Format

2. Handle Errors

3. Use Schemas When Possible

4. Consider Size and Speed

Conclusion

Further Reading