You Don't Know JSON: Part 4 - Binary JSON for APIs and Data Transfer

Master MessagePack and CBOR for API optimization: universal binary serialization that cuts bandwidth costs and improves mobile performance. Compare with Protocol Buffers and learn when to use each format.

In Part 1, we explored JSON’s triumph through simplicity. In Part 2, we added validation with JSON Schema. In Part 3, we optimized database storage with JSONB and BSON.

Now we tackle the next performance frontier: API data transfer and bandwidth optimization.

While database binary formats optimize storage and queries, API binary formats optimize network efficiency - smaller payloads, faster serialization, and reduced bandwidth costs for mobile and distributed systems.

What XML Had: Text-based encoding only for APIs (1998-2015)

XML’s approach: XML APIs (SOAP/REST) encoded data as human-readable text characters. Every API response used verbose XML syntax with repeated namespace declarations, schema references, and nested element tags.

<!-- SOAP: Text-based encoding (verbose characters) -->
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:user="http://example.com/users">
  <soap:Header>
    <wsse:Security>...</wsse:Security>
  </soap:Header>
  <soap:Body>
    <user:GetUser>
      <user:UserId>123</user:UserId>
    </user:GetUser>
  </soap:Body>
</soap:Envelope>

Size: 400+ bytes for a simple request (all ASCII text characters)

Binary encoding attempts existed but failed:

  • Fast Infoset (2005): Binary XML encoding, complex spec, minimal adoption
  • EXI (2011): W3C standard, arrived too late, required specialized parsers
  • None achieved widespread API usage

Note on embedding binary content: XML and JSON are equally poor here - both must base64-encode files and images, adding ~33% size overhead:

<image>iVBORw0KGgoAAAANSUhEUgAAAAUA...</image>  <!-- XML -->
{"image": "iVBORw0KGgoAAAANSUhEUgAAAAUA..."}  // JSON
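
The ~33% figure follows directly from base64’s layout: every 3 raw bytes become 4 ASCII characters. A quick stdlib sketch (illustrative only):

```python
import base64
import os

# Base64 maps every 3 raw bytes to 4 ASCII characters, so the
# encoded size is ceil(n/3)*4 - roughly a 33% expansion.
raw = os.urandom(30_000)          # stand-in for a small image
encoded = base64.b64encode(raw)

print(len(raw))                   # 30000
print(len(encoded))               # 40000
print(len(encoded) / len(raw))    # ~1.33
```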

Benefit: Human-readable responses, universal parser support, debuggable
Cost: Large payloads (verbose text), slow parsing, high bandwidth costs, mobile-unfriendly

JSON’s approach: Multiple binary encoding formats (MessagePack, CBOR) - compact byte representation

The key distinction:

  • Text encoding: Data as ASCII/UTF-8 characters - {"id":123} = readable text
  • Binary encoding: Data as compact bytes - 0x81 0xa2 "id" 0x7b = efficient binary
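
The byte layout can be reproduced by hand with only the stdlib - a sketch of how a MessagePack encoder lays out {"id":123} (fixmap, fixstr, and positive fixint, per the msgpack spec):

```python
import json

# MessagePack encodes {"id": 123} in 5 bytes:
#   0x81        fixmap with 1 key-value pair (0x80 | count)
#   0xa2        fixstr of length 2           (0xa0 | length)
#   0x69 0x64   the UTF-8 bytes of "id"
#   0x7b        positive fixint 123 (values 0-127 are one byte)
binary = bytes([0x81, 0xa2]) + b"id" + bytes([0x7b])

text = json.dumps({"id": 123}, separators=(",", ":"))  # '{"id":123}'

print(len(text))    # 10 bytes as text
print(len(binary))  # 5 bytes as binary
```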

Architecture shift: Text-only encoding → Binary encoding options, Failed standards → Modular ecosystem success, One verbose approach → Multiple optimized formats

This article focuses on MessagePack (universal binary JSON) and CBOR (IETF-standardized format), comparing them with Protocol Buffers and analyzing real bandwidth cost savings.


MessagePack: Universal Binary Serialization

What is MessagePack?

MessagePack is a language-agnostic binary serialization format. Think of it as “binary JSON” - it serializes the same data structures (objects, arrays, strings, numbers) but in efficient binary form.

Design goals:

  • Smaller than JSON
  • Faster than JSON
  • Simple specification
  • Wide language support
  • Streaming-friendly

Created: 2010 by Sadayuki Furuhashi
Specification: msgpack.org

Type System

MessagePack types map cleanly to JSON:

MessagePack   JSON       Notes
nil           null       Single byte
boolean       boolean    Single byte
integer       number     Variable: 1-9 bytes depending on value
float         number     5 bytes (float32) or 9 bytes (float64)
string        string     Length-prefixed UTF-8
binary        (Base64)   Raw bytes, not representable in JSON
array         array      Length-prefixed
map           object     Length-prefixed key-value pairs
extension     N/A        User-defined types

Size Efficiency

Encoding examples:

Value: null
JSON:  4 bytes  "null"
MsgPack: 1 byte   0xc0

Value: true
JSON:  4 bytes  "true"
MsgPack: 1 byte   0xc3

Value: 42
JSON:  2 bytes  "42"
MsgPack: 1 byte   0x2a (fixint)

Value: 1000
JSON:  4 bytes  "1000"
MsgPack: 3 bytes  0xcd 0x03 0xe8 (uint16)

Value: "hello"
JSON:  7 bytes  "hello" (with quotes in transmission)
MsgPack: 6 bytes  0xa5 "hello" (fixstr: type+length+data)
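
The "smallest representation wins" rule behind these integer sizes is simple enough to sketch by hand with the stdlib - a minimal positive-integer encoder following the spec’s fixint/uint rules (illustrative, not a full library):

```python
import struct

def pack_uint(n: int) -> bytes:
    """Encode a non-negative integer with MessagePack's rules:
    the smallest representation that fits is used."""
    if n < 0x80:                       # positive fixint: value is the byte
        return bytes([n])
    if n < 0x100:                      # uint8:  0xcc + 1 byte
        return bytes([0xcc, n])
    if n < 0x10000:                    # uint16: 0xcd + 2 big-endian bytes
        return b"\xcd" + struct.pack(">H", n)
    if n < 0x100000000:                # uint32: 0xce + 4 bytes
        return b"\xce" + struct.pack(">I", n)
    return b"\xcf" + struct.pack(">Q", n)  # uint64: 0xcf + 8 bytes

print(pack_uint(42).hex())    # 2a      (1 byte)
print(pack_uint(1000).hex())  # cd03e8  (3 bytes)
```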

Sample object:

{
  "id": 123,
  "name": "alice",
  "active": true
}

Sizes:

  • JSON: 46 bytes
  • MessagePack: 28 bytes
  • Savings: 39%

Array of 1000 small objects:

  • JSON: ~45 KB
  • MessagePack: ~28 KB
  • Savings: 38%

Encoding and Decoding

JavaScript (Node.js):

const msgpack = require('msgpack5')();

// Encode
const data = {
  id: 123,
  username: 'alice',
  tags: ['golang', 'rust'],
  active: true,
  balance: 1234.56
};

const encoded = msgpack.encode(data);
console.log('Size:', encoded.length);  // 48 bytes vs 83 JSON

// Decode
const decoded = msgpack.decode(encoded);
console.log(decoded);  // Original data

// Stream encoding
const stream = msgpack.encoder();
stream.pipe(output);
stream.write(data);

// Stream decoding
const decoder = msgpack.decoder();
input.pipe(decoder);
decoder.on('data', obj => console.log(obj));

Go:

import "github.com/vmihailenco/msgpack/v5"

type User struct {
    ID       int      `msgpack:"id"`
    Username string   `msgpack:"username"`
    Tags     []string `msgpack:"tags"`
    Active   bool     `msgpack:"active"`
    Balance  float64  `msgpack:"balance"`
}

// Encode
user := User{
    ID:       123,
    Username: "alice",
    Tags:     []string{"golang", "rust"},
    Active:   true,
    Balance:  1234.56,
}

data, err := msgpack.Marshal(user)
if err != nil {
    panic(err)
}
fmt.Println("Size:", len(data))  // 48 bytes

// Decode
var decoded User
err = msgpack.Unmarshal(data, &decoded)
if err != nil {
    panic(err)
}
fmt.Printf("%+v\n", decoded)

// Streaming
encoder := msgpack.NewEncoder(writer)
encoder.Encode(user)

decoder := msgpack.NewDecoder(reader)
decoder.Decode(&decoded)

Python:

import msgpack

# Encode
data = {
    'id': 123,
    'username': 'alice',
    'tags': ['golang', 'rust'],
    'active': True,
    'balance': 1234.56
}

encoded = msgpack.packb(data)
print(f'Size: {len(encoded)}')  # 48 bytes

# Decode
decoded = msgpack.unpackb(encoded, raw=False)
print(decoded)

# Streaming
packer = msgpack.Packer()
for item in items:
    stream.write(packer.pack(item))

unpacker = msgpack.Unpacker(stream, raw=False)
for unpacked in unpacker:
    print(unpacked)

Rust:

use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, Debug)]
struct User {
    id: i32,
    username: String,
    tags: Vec<String>,
    active: bool,
    balance: f64,
}

fn main() {
    let user = User {
        id: 123,
        username: "alice".to_string(),
        tags: vec!["golang".to_string(), "rust".to_string()],
        active: true,
        balance: 1234.56,
    };

    // Encode
    let encoded = rmp_serde::to_vec(&user).unwrap();
    println!("Size: {}", encoded.len());  // 48 bytes

    // Decode
    let decoded: User = rmp_serde::from_slice(&encoded).unwrap();
    println!("{:?}", decoded);
}

Extension Types

MessagePack supports user-defined extension types:

Define custom type:

const msgpack = require('msgpack5')();

// Register timestamp extension
msgpack.register(0x01, Date, 
  // Encode
  (date) => {
    const buf = Buffer.allocUnsafe(8);
    buf.writeDoubleBE(date.getTime());
    return buf;
  },
  // Decode
  (buf) => {
    return new Date(buf.readDoubleBE());
  }
);

// Now dates encode as binary timestamps
const data = { created: new Date() };
const encoded = msgpack.encode(data);  // Uses extension
const decoded = msgpack.decode(encoded);  // Reconstructs Date object

Performance Benchmarks

Serialization (10,000 iterations):

Format        Encode   Decode   Total
JSON          45ms     38ms     83ms
MessagePack   28ms     22ms     50ms
Speedup       1.6x     1.7x     1.7x

Complex nested object:

Format        Encode   Decode   Size
JSON          125ms    98ms     15.2 KB
MessagePack   72ms     54ms     9.8 KB
Speedup       1.7x     1.8x     1.55x

Real-World Use Cases

1. Redis caching:

const redis = require('redis');
const msgpack = require('msgpack5')();

const client = redis.createClient();

// Store with MessagePack
async function cacheUser(user) {
  const encoded = msgpack.encode(user);
  await client.set(`user:${user.id}`, encoded);
}

// Retrieve with MessagePack
async function getUser(id) {
  const encoded = await client.getBuffer(`user:${id}`);
  return msgpack.decode(encoded);
}

// 35% memory savings vs JSON strings

2. Microservice communication:

// HTTP endpoint that returns MessagePack
func handleGetUser(w http.ResponseWriter, r *http.Request) {
    id := r.URL.Query().Get("id")
    user := getUserFromDB(id)

    data, err := msgpack.Marshal(user)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "application/msgpack")
    w.Write(data)
}

// Client decodes MessagePack
resp, _ := http.Get("http://api/users/123")
defer resp.Body.Close()

var user User
decoder := msgpack.NewDecoder(resp.Body)
decoder.Decode(&user)

3. Message queue (RabbitMQ):

import pika
import msgpack

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Publish with MessagePack
def publish_event(event):
    data = msgpack.packb(event)
    channel.basic_publish(
        exchange='events',
        routing_key='user.created',
        body=data
    )

# Consume with MessagePack
def callback(ch, method, properties, body):
    event = msgpack.unpackb(body, raw=False)
    handle_event(event)

channel.basic_consume(queue='events', on_message_callback=callback)

4. Log aggregation:

// Write log files in MessagePack
const fs = require('fs');
const msgpack = require('msgpack5')();

const logStream = fs.createWriteStream('app.log.msgpack');
const encoder = msgpack.encoder();
encoder.pipe(logStream);

function log(entry) {
  encoder.write({
    timestamp: Date.now(),
    level: entry.level,
    message: entry.message,
    metadata: entry.metadata
  });
}

// 40-50% smaller log files than JSON
// Faster to parse when processing logs

MessagePack Best For:

  • General-purpose binary serialization
  • Microservice communication (schemaless flexibility)
  • Caching layers (size matters)
  • Message queues
  • Log files (size + speed)
  • Mobile apps (bandwidth savings)

When to avoid:

  • Human debugging needed (use JSON)
  • Schema enforcement critical (use Protocol Buffers)
  • Database-specific needs (use JSONB/BSON)

CBOR: Concise Binary Object Representation

What is CBOR?

CBOR (RFC 8949) is an IETF-standardized binary data format similar to MessagePack but with more rigorous specification and additional features.

Key differences from MessagePack:

  • Formal IETF standard (RFC 8949)
  • Self-describing format
  • Deterministic encoding (for signatures)
  • Tagged types (extensible type system)
  • Better specification clarity

Created: 2013 (RFC 7049), updated 2020 (RFC 8949)
Specification: RFC 8949

When to Use CBOR

CBOR is preferred in:

1. Security applications (WebAuthn, COSE)

  • Deterministic encoding for signatures
  • Tagged types for security objects
  • Well-specified for cryptographic use

2. IoT and embedded systems

  • Smaller than JSON
  • Simple parsing (low memory)
  • Standardized (interoperability)

3. Standards-based systems

  • IETF specification ensures consistency
  • Multiple independent implementations
  • Long-term stability

CBOR vs MessagePack

Feature                  CBOR               MessagePack
Standardization          IETF RFC           Community spec
Deterministic encoding   Yes (canonical)    No
Tagged types             Yes (extensible)   Extension types (simpler)
Float16 support          Yes                No
Specification clarity    Very detailed      Brief
Adoption                 IoT, security      General purpose
Performance              Similar            Slightly faster

CBOR in Practice

JavaScript (Node.js):

const cbor = require('cbor');

// Encode
const data = {
  id: 123,
  username: 'alice',
  created: new Date(),
  tags: ['golang', 'rust']
};

const encoded = cbor.encode(data);
console.log('Size:', encoded.length);

// Decode
const decoded = cbor.decode(encoded);
console.log(decoded);

// Tagged types
const tagged = new cbor.Tagged(32, 'https://example.com');  // URI tag
const encoded2 = cbor.encode(tagged);

Go:

import "github.com/fxamacker/cbor/v2"

type User struct {
    ID       int      `cbor:"id"`
    Username string   `cbor:"username"`
    Created  time.Time `cbor:"created"`
    Tags     []string `cbor:"tags"`
}

// Encode
user := User{
    ID:       123,
    Username: "alice",
    Created:  time.Now(),
    Tags:     []string{"golang", "rust"},
}

data, err := cbor.Marshal(user)
if err != nil {
    panic(err)
}

// Decode
var decoded User
err = cbor.Unmarshal(data, &decoded)

// Deterministic encoding (for signatures)
encMode, _ := cbor.CanonicalEncOptions().EncMode()
canonical, _ := encMode.Marshal(user)

Python:

import cbor2
from datetime import datetime

# Encode
data = {
    'id': 123,
    'username': 'alice',
    'created': datetime.now(),
    'tags': ['golang', 'rust']
}

encoded = cbor2.dumps(data)
print(f'Size: {len(encoded)}')

# Decode
decoded = cbor2.loads(encoded)
print(decoded)

# Tagged types
from cbor2 import CBORTag
tagged = CBORTag(32, 'https://example.com')  # URI tag
encoded2 = cbor2.dumps(tagged)

CBOR Tagged Types

CBOR’s tagged type system enables extensibility:

Standard tags:

Tag 0: Date/time string (ISO 8601)
Tag 1: Epoch-based date/time (number)
Tag 2: Positive bignum
Tag 3: Negative bignum
Tag 32: URI
Tag 33: Base64url
Tag 34: Base64
Tag 55799: Self-describe CBOR (magic number)
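
A tag is just a head byte (major type 6) prefixed to the tagged item, which makes the wire format easy to sketch by hand. The helpers below are an illustrative partial encoder per RFC 8949’s initial-byte rules, not a full CBOR library:

```python
def cbor_tag(tag: int, payload: bytes) -> bytes:
    """Prefix payload with a CBOR tag head (major type 6).
    Handles tag numbers below 2^16 for brevity."""
    if tag < 24:
        return bytes([0xC0 | tag]) + payload           # tag fits in the initial byte
    if tag < 0x100:
        return bytes([0xD8, tag]) + payload            # 1-byte tag argument
    return b"\xd9" + tag.to_bytes(2, "big") + payload  # 2-byte tag argument

def cbor_text(s: str) -> bytes:
    """Encode a short text string (major type 3, length < 24)."""
    data = s.encode("utf-8")
    assert len(data) < 24
    return bytes([0x60 | len(data)]) + data

# Tag 32 = URI: head is 0xd8 0x20, then the string item follows
encoded = cbor_tag(32, cbor_text("https://a.example"))
print(encoded[:2].hex())  # d820
```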

Example:

const cbor = require('cbor');

// Date (tag 1: epoch timestamp)
const date = new Date();
const encoded = cbor.encode(date);
// Encoded as tag 1 + numeric timestamp

// URI (tag 32)
const uri = new cbor.Tagged(32, 'https://example.com');
const encoded2 = cbor.encode(uri);

// Custom tag
const custom = new cbor.Tagged(1000, {custom: 'data'});

CBOR in WebAuthn

WebAuthn (web authentication standard) uses CBOR for credential data:

// Browser WebAuthn API returns CBOR
const credential = await navigator.credentials.create({
  publicKey: options
});

// attestationObject is CBOR-encoded
const attestation = credential.response.attestationObject;

// Server decodes CBOR
const cbor = require('cbor');
const decoded = cbor.decode(attestation);

console.log(decoded);
// {
//   fmt: 'packed',
//   attStmt: {...},
//   authData: <Buffer...>
// }

Size Comparison

Sample data:

{
  "id": 123,
  "username": "alice",
  "email": "alice@example.com",
  "created": "2023-01-15T10:30:00Z",
  "tags": ["golang", "rust", "python"]
}

Sizes:

  • JSON: 142 bytes
  • MessagePack: 88 bytes
  • CBOR: 90 bytes
  • Difference: CBOR ~2 bytes larger (negligible)

CBOR Best For:

  • IoT devices and embedded systems
  • Security applications (WebAuthn, COSE)
  • Standards-based systems (need RFC)
  • Cryptographic use (deterministic encoding)

Use MessagePack instead if:

  • General-purpose serialization
  • Performance critical (slight edge)
  • Simpler specification preferred
  • Wider ecosystem matters

Performance Benchmarks

Test Methodology

Environment:

  • CPU: Intel i7-12700K
  • RAM: 32GB DDR4
  • OS: Ubuntu 22.04
  • Languages: Node.js 20, Go 1.21, Python 3.11

Test data:

  • Small object: User profile (200 bytes JSON)
  • Medium object: API response (5 KB JSON)
  • Large array: 10,000 user objects (2 MB JSON)

Results: Small Object (200 bytes)

Encoding speed (ops/sec):

Format        JavaScript   Go          Python
JSON          1,245,000    2,100,000   385,000
MessagePack   1,890,000    3,200,000   580,000
CBOR          1,720,000    2,950,000   520,000
BSON          945,000      1,850,000   310,000

Speedup vs JSON:

  • MessagePack: 1.5x
  • CBOR: 1.4x
  • BSON: 0.8x (slower)

Size:

  • JSON: 200 bytes
  • MessagePack: 128 bytes (36% smaller)
  • CBOR: 131 bytes (35% smaller)
  • BSON: 142 bytes (29% smaller)

Results: Medium Object (5 KB)

Encoding speed (ops/sec):

Format        JavaScript   Go        Python
JSON          52,000       98,000    18,500
MessagePack   88,000       165,000   32,000
CBOR          79,000       152,000   28,000
BSON          41,000       85,000    15,000

Speedup vs JSON:

  • MessagePack: 1.7x
  • CBOR: 1.5x
  • BSON: 0.8x

Size:

  • JSON: 5,120 bytes
  • MessagePack: 3,280 bytes (36% smaller)
  • CBOR: 3,350 bytes (35% smaller)
  • BSON: 3,680 bytes (28% smaller)

Results: Large Array (2 MB, 10K objects)

Encoding time:

Format        JavaScript   Go     Python
JSON          125ms        72ms   385ms
MessagePack   73ms         41ms   225ms
CBOR          82ms         48ms   255ms
BSON          145ms        85ms   425ms

Speedup vs JSON:

  • MessagePack: 1.7x
  • CBOR: 1.5x
  • BSON: 0.9x

Size:

  • JSON: 2.05 MB
  • MessagePack: 1.31 MB (36% smaller)
  • CBOR: 1.34 MB (35% smaller)
  • BSON: 1.48 MB (28% smaller)

Memory Usage

Peak memory during encoding (2 MB dataset):

Format        JavaScript   Go       Python
JSON          8.2 MB       4.5 MB   12.3 MB
MessagePack   6.1 MB       3.2 MB   9.1 MB
CBOR          6.4 MB       3.4 MB   9.5 MB
BSON          7.8 MB       4.1 MB   11.8 MB

[Diagram: performance characteristics (size: ~36% smaller than JSON; parse speed: ~1.7x faster; memory: ~25% less) and binary format rankings (MessagePack: best overall balance; CBOR: standards compliant; BSON: MongoDB extended types)]

Key Takeaways

Size savings:

  • Binary formats: 28-36% smaller than JSON
  • MessagePack/CBOR most efficient
  • BSON less efficient (extended type overhead)

Speed improvements:

  • 1.5-1.7x faster encoding/decoding
  • Go implementations fastest
  • Python benefits most from binary formats

Memory efficiency:

  • 20-30% less memory than JSON
  • Streaming parsers reduce memory further

Benchmark Caveats:

  • Results vary by data structure (nested vs flat)
  • Implementation quality matters (library choice)
  • Compression changes the equation (gzip, zstd)
  • Network overhead may dominate (size less critical)
  • Always benchmark with YOUR actual data
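
The compression caveat is easy to check with the stdlib: on repetitive JSON, gzip recovers much of the gap a binary format would close, so compare compressed sizes before committing. Exact numbers depend entirely on the data, so treat this as illustrative:

```python
import gzip
import json

# Repetitive, JSON-friendly data: the key names repeat in every element,
# which is exactly what both binary formats and gzip exploit.
records = [{"id": i, "username": f"user{i}", "active": True} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

print(len(raw))         # tens of KB of text
print(len(compressed))  # far smaller - repeated keys compress extremely well
```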

Binary JSON vs Protocol Buffers

Both solve JSON’s performance problems, but through different philosophies:

Fundamental Difference

Binary JSON (MessagePack, CBOR):

  • Schemaless (like JSON)
  • Self-describing format
  • Flexible structure
  • No compilation step

Protocol Buffers:

  • Schema required
  • Schema compiled to code
  • Strict structure
  • Type safety enforced

Detailed Comparison

Aspect           Binary JSON                Protocol Buffers
Schema           Optional                   Required
Flexibility      Add fields freely          Schema evolution rules
Size             30-40% smaller than JSON   50-70% smaller than JSON
Speed            1.5-2x faster than JSON    3-5x faster than JSON
Type safety      Runtime only               Compile-time
Versioning       Implicit                   Explicit (field numbers)
Debugging        Can inspect structure      Need schema to decode
Setup            Zero (just library)        Schema compilation
Cross-language   Parse anywhere             Generated code per language

Size Comparison

Sample user object:

{
  "id": 123,
  "username": "alice",
  "email": "alice@example.com",
  "age": 30,
  "active": true
}

Sizes:

  • JSON: 98 bytes
  • MessagePack: 62 bytes (37% smaller)
  • Protocol Buffers: 28 bytes (71% smaller)

Why Protocol Buffers is smaller:

  • Field numbers instead of names (1 byte vs “username” = 8 bytes)
  • Efficient varint encoding
  • No type markers (schema provides types)
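
The field-number-plus-varint layout can be sketched with the stdlib - a hand encoding of the id field (field number 1, wire type 0) purely for illustration of why the name disappears from the wire:

```python
def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 data bits per byte, MSB = continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Field key = (field_number << 3) | wire_type; field 1, varint -> 0x08.
# The field NAME never appears on the wire - only this 1-byte key.
field_id_123 = encode_varint((1 << 3) | 0) + encode_varint(123)

print(field_id_123.hex())  # 087b - the whole field in 2 bytes
print(len('"id":123'))     # 8 bytes just for the JSON key and value
```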

When to Use Each

Use Binary JSON (MessagePack/CBOR) when:

  • Schema flexibility needed (rapid iteration)
  • Dynamic data structures (user-generated content)
  • Different clients need different fields
  • Simple setup (no compilation)
  • Debugging matters (self-describing)
  • Multiple data types in same stream

Use Protocol Buffers when:

  • Schema stability (defined API contract)
  • Maximum performance (size + speed)
  • Type safety critical
  • Versioning discipline needed
  • RPC systems (gRPC)
  • Long-term data storage

Hybrid Approaches

1. Protocol Buffers with JSON names:

message User {
  int32 id = 1 [json_name = "id"];
  string username = 2 [json_name = "username"];
}

Can serialize as JSON or binary.

2. MessagePack with schema validation:

const Ajv = require('ajv');
const msgpack = require('msgpack5')();

const ajv = new Ajv();
const validate = ajv.compile(schema);  // schema: your JSON Schema definition

// Validate before encoding
if (validate(data)) {
  const encoded = msgpack.encode(data);
}

3. Mixed protocols:

// JSON for configuration (human-edited)
const config = JSON.parse(fs.readFileSync('config.json'));

// MessagePack for high-volume data
const data = msgpack.decode(message);

// Protocol Buffers for RPC
const request = UserRequest.decode(buffer);

Migration Example

From JSON to MessagePack (gradual):

// Step 1: Support both formats
app.post('/api/users', async (req, res) => {
  const contentType = req.headers['content-type'];
  
  let data;
  if (contentType === 'application/msgpack') {
    data = msgpack.decode(req.body);
  } else {
    data = JSON.parse(req.body);
  }
  
  // Process data...
  
  // Return in same format
  if (contentType === 'application/msgpack') {
    res.type('application/msgpack');
    res.send(msgpack.encode(result));
  } else {
    res.json(result);
  }
});

// Step 2: Update clients gradually
// Step 3: Monitor metrics (size, speed, errors)
// Step 4: Deprecate JSON after migration complete

For more on Protocol Buffers, see: Understanding Protocol Buffers: Part 1


Cloud Bandwidth Cost Savings

The Economics of Binary Formats

For commercial products with metered bandwidth, binary formats can dramatically reduce infrastructure costs.

Cloud provider pricing (examples):

  • AWS: $0.09/GB data transfer out (first 10TB/month)
  • Google Cloud: $0.12/GB egress (first 1TB/month)
  • Azure: $0.087/GB bandwidth (first 5TB/month)

Real-World Cost Analysis

Scenario: API serving 1 billion requests/month with 2KB average response

Text JSON:

  • 2KB Γ— 1,000,000,000 = 2,000 GB/month
  • At $0.09/GB = $180/month bandwidth costs

Protocol Buffers (60% size reduction):

  • 0.8KB Γ— 1,000,000,000 = 800 GB/month
  • At $0.09/GB = $72/month bandwidth costs
  • Savings: $108/month ($1,296/year)

MessagePack (40% size reduction):

  • 1.2KB Γ— 1,000,000,000 = 1,200 GB/month
  • At $0.09/GB = $108/month bandwidth costs
  • Savings: $72/month ($864/year)
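
The arithmetic above generalizes into a small helper (the $0.09/GB figure is the example AWS rate from earlier; real pricing is tiered):

```python
def monthly_egress_cost(requests: int, bytes_per_response: float,
                        price_per_gb: float = 0.09) -> float:
    """Estimated monthly egress bill, using decimal GB as in the examples."""
    gb = requests * bytes_per_response / 1_000_000_000
    return gb * price_per_gb

json_cost = monthly_egress_cost(1_000_000_000, 2_000)     # 2KB responses
msgpack_cost = monthly_egress_cost(1_000_000_000, 1_200)  # 40% smaller
protobuf_cost = monthly_egress_cost(1_000_000_000, 800)   # 60% smaller

print(json_cost)                  # 180.0
print(json_cost - msgpack_cost)   # 72.0 saved per month
print(json_cost - protobuf_cost)  # 108.0 saved per month
```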

Mobile API Cost Impact

Mobile apps on cellular networks are especially sensitive:

JSON response (5KB):

{
  "users": [
    {"id": 1, "username": "alice", "email": "alice@example.com", ...},
    {"id": 2, "username": "bob", "email": "bob@example.com", ...},
    // ... 50 users
  ]
}

  • Size: 5KB
  • 10 billion API calls/month = 50,000 GB
  • Cost: $4,500/month

MessagePack (3KB - 40% reduction):

  • Size: 3KB
  • 10 billion API calls/month = 30,000 GB
  • Cost: $2,700/month
  • Savings: $1,800/month ($21,600/year)

Protocol Buffers (2KB - 60% reduction):

  • Size: 2KB
  • 10 billion API calls/month = 20,000 GB
  • Cost: $1,800/month
  • Savings: $2,700/month ($32,400/year)

Break-Even Analysis

When does binary format investment pay off?

Implementation costs (one-time):

  • Developer time: 40-80 hours ($4,000-$8,000)
  • Testing and validation: 20-40 hours ($2,000-$4,000)
  • Documentation and training: 10-20 hours ($1,000-$2,000)
  • Total: $7,000-$14,000

Monthly savings from examples above:

  • Small API (1B requests): $72-$108/month → ROI in 6-12 months
  • Mobile API (10B requests): $1,800-$2,700/month → ROI in 3-8 months
  • Large API (100B requests): $7,200-$10,800/month → ROI in 1-2 months

Cost Optimization Strategy: For APIs serving >100M requests/month or mobile apps with bandwidth-constrained users, binary formats often pay for themselves within 6 months purely from bandwidth savings - before considering performance improvements.
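
Break-even follows directly: months to ROI is the one-time cost divided by monthly savings. A sketch using the figures from the estimates above:

```python
def months_to_roi(one_time_cost: float, monthly_savings: float) -> float:
    """Simple payback period, ignoring discounting and ongoing maintenance."""
    return one_time_cost / monthly_savings

# One-time implementation cost range from the estimate above
low_cost, high_cost = 7_000, 14_000

# Mobile API example: $1,800-$2,700/month in bandwidth savings
best = months_to_roi(low_cost, 2_700)    # ~2.6 months
worst = months_to_roi(high_cost, 1_800)  # ~7.8 months

print(round(best, 1), round(worst, 1))
```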

Additional Cost Benefits

Beyond bandwidth:

  1. Compute costs: Faster parsing = lower CPU usage = smaller instances
  2. Cache efficiency: Smaller payloads = more entries in fixed-size caches
  3. CDN costs: Many CDNs charge per GB - binary formats reduce bills
  4. Mobile UX: Faster responses = better retention = higher revenue

When Cost Savings Don’t Apply

Free tiers and small scale:

  • Personal projects within free tier limits
  • APIs with <10M requests/month
  • Internal tools on private networks (no egress charges)
  • Development/staging environments

Break-even threshold: ~50-100M requests/month depending on response size

Why Not Always Use Protocol Buffers?

Given the cost savings and performance benefits, why doesn’t everyone use Protocol Buffers for everything?

1. Schema Rigidity and Deployment Coordination

Protocol Buffers require compilation and strict schemas:

message User {
  int32 id = 1;
  string username = 2;
  string email = 3;
}

What happens when you need a new field:

  1. Update .proto file
  2. Regenerate code for all languages (Go, Python, JS, etc.)
  3. Deploy updated code to all services
  4. Coordinate deployments across teams
  5. Handle backward compatibility

JSON/MessagePack: Just add the field, it works immediately.

// JSON: Add field instantly
const user = {
  id: 123,
  username: "alice",
  newField: "works immediately"  // No compilation needed
};

Impact depends on your setup:

With mature tooling (automated pipeline):

  • make generate → commit → CI deploys
  • Similar velocity to JSON for established teams
  • Overhead: ~2-5 minutes for regeneration + deployment

Without automation (manual process):

  • Update proto → manually regenerate → test → coordinate → deploy
  • Cross-team coordination if shared protos
  • Overhead: 30 minutes to 2 hours depending on team size

Where this genuinely slows development:

  • Rapid prototyping: Trying different data shapes daily
  • A/B testing: Frontend experimenting with new fields
  • Cross-team dependencies: Service A waits for Service B’s proto update
  • Small teams: No dedicated DevOps to automate workflow

2. Dynamic Data Structures

User-generated content doesn’t fit schemas:

{
  "post_id": "abc123",
  "content": "Hello world",
  "metadata": {
    "custom_field_1": "user defined",
    "custom_field_2": 42,
    "arbitrary_key": ["dynamic", "array"]
  }
}

With Protobuf, you’d need:

message Post {
  string post_id = 1;
  string content = 2;
  map<string, google.protobuf.Any> metadata = 3;  // Loses type safety
}

You end up with Any types everywhere, defeating the purpose of schemas.

Use cases requiring flexibility:

  • CMS platforms (arbitrary fields per content type)
  • Analytics events (different properties per event)
  • Plugin systems (plugins add their own fields)
  • Form builders (user-defined form schemas)

3. Developer Experience Friction

JSON workflow (instant feedback):

curl https://api.example.com/users/123
# See data immediately in terminal
# Copy/paste into docs
# Share with coworkers in Slack

Protobuf workflow (requires tooling):

curl https://api.example.com/users/123
# Get binary garbage: ▒▒▒alice▒▒▒
# Need protoc to decode
# Need .proto files
# Need to explain to frontend devs

Onboarding cost:

  • New developers must learn protobuf toolchain
  • Need IDE plugins for syntax highlighting
  • Need to understand wire format for debugging
  • Harder to write integration tests

4. Browser and Client Limitations

JavaScript ecosystem challenges:

// JSON: Native support
fetch('/api/users')
  .then(r => r.json())  // Built-in
  .then(data => console.log(data));

// Protobuf: Requires libraries and setup
import { User } from './generated/user_pb.js';  // 50KB+ bundle size

fetch('/api/users')
  .then(r => r.arrayBuffer())
  .then(buf => {
    const user = User.deserializeBinary(new Uint8Array(buf));
    // More complex API
  });

Bundle size impact:

  • protobuf.js: ~50KB minified
  • JSON: 0KB (native)
  • For small apps, protobuf library is larger than data savings

5. Third-Party Integrations

Many services only accept JSON:

  • Webhooks (Stripe, GitHub, etc.)
  • Logging services (Datadog, Splunk)
  • Monitoring tools (Prometheus, Grafana)
  • CI/CD systems (GitHub Actions, GitLab)

You’d need JSON anyway for integrations.

6. Rapid Prototyping and Exploratory Development

Early-stage development priorities:

  • Ship fast, iterate quickly
  • Schema changes frequently
  • Developer velocity > optimization
  • Unknown requirements

Protobuf’s schema-first approach adds friction during the exploration phase.

Example: Evolving user model

  • Week 1: User has name field
  • Week 2: Split into first_name and last_name
  • Week 3: Add optional middle_name
  • Week 4: Support international names (single field after all)

With JSON: Immediate changes, no regeneration
With Protobuf: Regeneration each iteration (adds 2-5 minutes per change with automation, more without)

This matters most when:

  • Requirements are unknown or changing daily
  • Team is experimenting with different approaches
  • Product-market fit not yet established
  • Schema volatility is high

Less relevant when:

  • API contracts are stable
  • Team has established patterns
  • Schema changes are infrequent (monthly, not daily)

7. Mixed Data Scenarios

Real applications use multiple formats:

// Config files: JSON (human-edited)
const config = require('./config.json');

// API responses: JSON (client compatibility)
app.get('/api/users', (req, res) => {
  res.json(users);
});

// Internal RPC: Protobuf (performance critical)
const response = await internalService.getUsers(request);

// Logs: JSON Lines (tooling compatibility)
logger.info({userId: 123, action: 'login'});

Using protobuf everywhere would mean:

  • Config files need compilation
  • Logs need special tools
  • API clients need protobuf libraries
  • Higher complexity for marginal additional gains

8. When Protobuf Makes Sense

Use Protocol Buffers when:

  • High-scale APIs (>100M requests/month) - cost savings justify complexity
  • Internal microservices - control both ends, can coordinate schemas
  • Performance-critical paths - gRPC for low-latency RPC
  • Stable APIs - schema rarely changes
  • Type safety matters - compilation catches errors
  • Mobile apps - bandwidth constrained, latency sensitive

Stick with JSON/MessagePack when:

  • Public APIs - broad compatibility needed
  • Rapid iteration - schema changes frequently
  • Simple projects - not worth the tooling overhead
  • Browser clients - avoid bundle size bloat
  • Third-party integrations - JSON required anyway
  • Development/staging - easier debugging

The Real Answer: Most successful systems use both. JSON for public APIs and configuration, Protobuf for internal high-traffic RPC. The “always use X” approach ignores the trade-offs between developer velocity, operational complexity, and performance gains.

Real-World Use Cases

1. High-Throughput API (MessagePack)

Scenario: API serving 50K requests/sec, 5KB average response

Before (JSON):

  • Response size: 5 KB
  • Parse time: 2.1ms
  • Network: 250 Mbps
  • Memory: 12 GB

After (MessagePack):

  • Response size: 3.2 KB (36% smaller)
  • Parse time: 1.2ms (43% faster)
  • Network: 160 Mbps (36% reduction)
  • Memory: 8.5 GB (29% reduction)

Implementation:

// Express middleware (assumes a msgpack library such as @msgpack/msgpack)
const msgpack = require('@msgpack/msgpack');

app.use((req, res, next) => {
  res.sendMsgPack = (data) => {
    res.type('application/msgpack');
    res.send(msgpack.encode(data));
  };
  next();
});

app.get('/api/products', async (req, res) => {
  const products = await db.products.find();
  res.sendMsgPack(products);
});

// Client
const response = await fetch('/api/products', {
  headers: {'Accept': 'application/msgpack'}
});
const buffer = await response.arrayBuffer();
const products = msgpack.decode(new Uint8Array(buffer)); // Uint8Array works in browsers, unlike Buffer
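
The middleware above always answers in MessagePack; in practice you negotiate per request so plain JSON clients keep working. A minimal sketch of the selection logic (`pickFormat` is a name introduced here, and the Accept parsing is simplified, ignoring q-values):

```javascript
// Choose the response encoding from the Accept header; default to JSON
// so clients that never opted in keep working unchanged.
function pickFormat(acceptHeader) {
  const types = (acceptHeader || '')
    .split(',')
    .map((part) => part.trim().split(';')[0]); // drop q-values for simplicity
  return types.includes('application/msgpack') ? 'msgpack' : 'json';
}

// Inside a route: pickFormat(req.headers.accept) === 'msgpack'
//   ? res.sendMsgPack(data)
//   : res.json(data);
console.log(pickFormat('application/msgpack'));          // msgpack
console.log(pickFormat('application/json, text/plain')); // json
console.log(pickFormat(undefined));                      // json
```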

2. Mobile App (MessagePack)

Scenario: Mobile app on cellular networks, battery-conscious

Benefits:

  • 35% less bandwidth (cost savings)
  • Faster parsing (battery savings)
  • Better on slow networks

Implementation:

// React Native client
import msgpack from 'react-native-msgpack';

async function fetchData(endpoint) {
  const response = await fetch(API_URL + endpoint, {
    headers: {
      'Accept': 'application/msgpack',
      'Content-Type': 'application/msgpack'
    }
  });
  
  const buffer = await response.arrayBuffer();
  return msgpack.decode(new Uint8Array(buffer));
}

async function postData(endpoint, data) {
  const encoded = msgpack.encode(data);
  
  const response = await fetch(API_URL + endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/msgpack'
    },
    body: encoded
  });
  
  const buffer = await response.arrayBuffer();
  return msgpack.decode(new Uint8Array(buffer));
}

3. IoT Device Communication (CBOR)

Scenario: Temperature sensors sending data every minute

Device code (embedded C):

#include "cbor.h"
#include <time.h>  /* time(NULL) for the timestamp field */

void send_reading() {
    CborEncoder encoder, map;
    uint8_t buffer[128];
    
    cbor_encoder_init(&encoder, buffer, sizeof(buffer), 0);
    cbor_encoder_create_map(&encoder, &map, 4);
    
    cbor_encode_text_stringz(&map, "device_id");
    cbor_encode_text_stringz(&map, "sensor-001");
    
    cbor_encode_text_stringz(&map, "temperature");
    cbor_encode_float(&map, 23.5);
    
    cbor_encode_text_stringz(&map, "humidity");
    cbor_encode_float(&map, 65.2);
    
    cbor_encode_text_stringz(&map, "timestamp");
    cbor_encode_int(&map, time(NULL));
    
    cbor_encoder_close_container(&encoder, &map);
    
    size_t length = cbor_encoder_get_buffer_size(&encoder, buffer);
    send_to_gateway(buffer, length);
}

Gateway (Node.js):

const cbor = require('cbor');

function processReading(buffer) {
  const reading = cbor.decode(buffer);
  
  console.log(`Device: ${reading.device_id}`);
  console.log(`Temp: ${reading.temperature}°C`);
  console.log(`Humidity: ${reading.humidity}%`);
  
  // Store in time-series database
  influx.writePoints([{
    measurement: 'temperature',
    tags: {device: reading.device_id},
    fields: {
      value: reading.temperature,
      humidity: reading.humidity
    },
    timestamp: new Date(reading.timestamp * 1000) // seconds → Date; seconds * 1e9 exceeds JS number precision
  }]);
}

Benefits:

  • 45% smaller than JSON (bandwidth critical)
  • Standardized format (IETF RFC)
  • Simple parsing on embedded devices
  • Low memory footprint
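
The compactness comes from CBOR's header-byte scheme, where a 3-bit major type and a small length or value share a single byte. An illustrative encoder for a tiny subset of RFC 8949 (`cborEncode` is a sketch name, not a library API; real code would use a package like `cbor`):

```javascript
// Illustrative CBOR encoder covering only: ints 0-23, text strings and
// maps with fewer than 24 elements. Shows the header-byte layout, nothing more.
function cborEncode(value) {
  if (Number.isInteger(value) && value >= 0 && value < 24) {
    return Buffer.from([value]); // major type 0: the int IS the header byte
  }
  if (typeof value === 'string') {
    const utf8 = Buffer.from(value, 'utf8'); // assumes length < 24
    return Buffer.concat([Buffer.from([0x60 | utf8.length]), utf8]); // major type 3
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value); // assumes < 24 entries
    const parts = [Buffer.from([0xa0 | keys.length])]; // major type 5: map
    for (const key of keys) parts.push(cborEncode(key), cborEncode(value[key]));
    return Buffer.concat(parts);
  }
  throw new Error('type outside this sketch');
}

const bytes = cborEncode({ t: 23 });
console.log(bytes.toString('hex')); // a1617417: 4 bytes vs 8 for '{"t":23}'
```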

Choosing Your Binary Format

Binary JSON formats address the performance limits of text JSON for APIs and data transfer while preserving its structural flexibility. The choice depends on your specific needs:

Decision Matrix

Choose MessagePack if:

  • General-purpose binary serialization
  • Maximum speed and size efficiency
  • Microservice communication
  • Message queues, caching layers
  • Wide language support needed

Choose CBOR if:

  • IoT or embedded systems
  • Security applications (WebAuthn, COSE)
  • Need IETF standard
  • Deterministic encoding required

Choose Protocol Buffers if:

  • Maximum performance (size + speed)
  • Schema enforcement critical
  • Long-term data storage
  • RPC systems (gRPC)

Stick with JSON if:

  • Human readability critical (configs, logs)
  • Debugging frequency high
  • Payloads small (<10 KB)
  • Performance acceptable
  • Simplicity trumps efficiency

What We Learned

Binary formats provide:

  • 30-40% size reduction over JSON
  • 1.5-2x faster parsing
  • Extended type systems (dates, binary data)
  • Better memory efficiency
  • Significant bandwidth cost savings
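
The size reduction is easy to see at the byte level. A toy MessagePack encoder covering just fixmap, fixstr, and positive fixint (`mpEncode` is a sketch name; production code should use an actual msgpack library):

```javascript
// Toy MessagePack encoder showing where the savings come from:
// no quotes, colons, commas, or braces, and small ints cost one byte.
function mpEncode(value) {
  if (Number.isInteger(value) && value >= 0 && value <= 127) {
    return Buffer.from([value]); // positive fixint: one byte
  }
  if (typeof value === 'string') {
    const utf8 = Buffer.from(value, 'utf8'); // assumes length < 32
    return Buffer.concat([Buffer.from([0xa0 | utf8.length]), utf8]); // fixstr
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value); // assumes < 16 entries
    const parts = [Buffer.from([0x80 | keys.length])]; // fixmap
    for (const key of keys) parts.push(mpEncode(key), mpEncode(value[key]));
    return Buffer.concat(parts);
  }
  throw new Error('type outside this sketch');
}

const reading = { id: 7, seq: 42, ok: 1 };
const packed = mpEncode(reading);
const json = JSON.stringify(reading);
console.log(packed.length, json.length); // 14 vs 24 bytes (~42% smaller)
```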

Trade-offs:

  • Loss of human-readability
  • Binary debugging tools needed
  • Schema drift without validation
  • Ecosystem smaller than JSON

Key insight: Binary formats fill the gap between JSON’s simplicity and Protocol Buffers’ schema enforcement. They’re the right choice when JSON’s performance matters but schema flexibility is still needed.


What’s Next: Streaming JSON

We’ve optimized JSON storage (Part 3) and network transfer (this part). But what about processing large datasets that don’t fit in memory? What about streaming APIs and log processing?

In Part 5 , we’ll explore JSON-RPC - adding structured RPC protocols on top of JSON for API consistency and type safety. Then in Part 6 , we’ll tackle streaming with JSON Lines (JSONL) for processing gigabytes of data without running out of memory.

Coming up:

  • JSON-RPC: Structured remote procedure calls
  • JSON Lines: Streaming and big data processing
  • Security considerations: JWT, canonicalization, and attacks

The goal remains the same - extending JSON’s capabilities while maintaining its fundamental simplicity and flexibility.

