You Don't Know JSON: Part 4 - Binary JSON for APIs and Data Transfer

Master MessagePack and CBOR for API optimization: universal binary serialization that cuts bandwidth costs and improves mobile performance. Compare with Protocol Buffers and learn when to use each format.

In Part 1, we explored JSON’s triumph through simplicity. In Part 2, we added validation with JSON Schema. In Part 3, we optimized database storage with JSONB and BSON.

Now we tackle the next performance frontier: API data transfer and bandwidth optimization.

While database binary formats optimize storage and queries, API binary formats optimize network efficiency - smaller payloads, faster serialization, and reduced bandwidth costs for mobile and distributed systems.

What XML Had: Text-based encoding only for APIs (1998-2015)

XML’s approach: XML APIs (SOAP/REST) encoded data as human-readable text characters. Every API response used verbose XML syntax with repeated namespace declarations, schema references, and nested element tags.

<!-- SOAP: Text-based encoding (verbose characters) -->
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:user="http://example.com/users">
  <soap:Header>
    <wsse:Security>...</wsse:Security>
  </soap:Header>
  <soap:Body>
    <user:GetUser>
      <user:UserId>123</user:UserId>
    </user:GetUser>
  </soap:Body>
</soap:Envelope>

Size: 400+ bytes for a simple request (all ASCII text characters)

Binary encoding attempts existed but failed:

  • Fast Infoset (2005): Binary XML encoding, complex spec, minimal adoption
  • EXI (2011): W3C standard, arrived too late, required specialized parsers
  • None achieved widespread API usage

Note on embedding binary content: XML and JSON are equally poor here - both must base64-encode files and images, adding ~33% size overhead:

<image>iVBORw0KGgoAAAANSUhEUgAAAAUA...</image>  <!-- XML -->
{"image": "iVBORw0KGgoAAAANSUhEUgAAAAUA..."}  // JSON
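
The ~33% figure follows directly from base64’s layout: every 3 raw bytes become 4 ASCII characters. A quick stdlib sketch (illustrative only):

```python
import base64
import os

# Base64 maps every 3 raw bytes to 4 ASCII characters, so the
# encoded size is ceil(n/3)*4 - roughly a 33% expansion.
raw = os.urandom(30_000)          # stand-in for a small image
encoded = base64.b64encode(raw)

print(len(raw))                   # 30000
print(len(encoded))               # 40000
print(len(encoded) / len(raw))    # ~1.33
```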

Benefit: Human-readable responses, universal parser support, debuggable
Cost: Large payloads (verbose text), slow parsing, high bandwidth costs, mobile-unfriendly

JSON’s approach: Multiple binary encoding formats (MessagePack, CBOR) - compact byte representation

The key distinction:

  • Text encoding: Data as ASCII/UTF-8 characters - {"id":123} = readable text
  • Binary encoding: Data as compact bytes - 0x81 0xa2 "id" 0x7b = efficient binary
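
The byte layout can be reproduced by hand with only the stdlib - a sketch of how a MessagePack encoder lays out {"id":123} (fixmap, fixstr, and positive fixint, per the msgpack spec):

```python
import json

# MessagePack encodes {"id": 123} in 5 bytes:
#   0x81        fixmap with 1 key-value pair (0x80 | count)
#   0xa2        fixstr of length 2           (0xa0 | length)
#   0x69 0x64   the UTF-8 bytes of "id"
#   0x7b        positive fixint 123 (values 0-127 are one byte)
binary = bytes([0x81, 0xa2]) + b"id" + bytes([0x7b])

text = json.dumps({"id": 123}, separators=(",", ":"))  # '{"id":123}'

print(len(text))    # 10 bytes as text
print(len(binary))  # 5 bytes as binary
```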

Architecture shift: Text-only encoding → Binary encoding options, Failed standards → Modular ecosystem success, One verbose approach → Multiple optimized formats

This article focuses on MessagePack (universal binary JSON) and CBOR (IETF-standardized format), comparing them with Protocol Buffers and analyzing real bandwidth cost savings.


MessagePack: Universal Binary Serialization

What is MessagePack?

MessagePack is a language-agnostic binary serialization format. Think of it as “binary JSON” - it serializes the same data structures (objects, arrays, strings, numbers) but in efficient binary form.

Design goals:

  • Smaller than JSON
  • Faster than JSON
  • Simple specification
  • Wide language support
  • Streaming-friendly

Created: 2010 by Sadayuki Furuhashi
Specification: msgpack.org

Type System

MessagePack types map cleanly to JSON:

MessagePack   JSON       Notes
nil           null       Single byte
boolean       boolean    Single byte
integer       number     Variable: 1-9 bytes depending on value
float         number     5 bytes (float32) or 9 bytes (float64)
string        string     Length-prefixed UTF-8
binary        (Base64)   Raw bytes, not representable in JSON
array         array      Length-prefixed
map           object     Length-prefixed key-value pairs
extension     N/A        User-defined types

Size Efficiency

Encoding examples:

Value: null
JSON:  4 bytes  "null"
MsgPack: 1 byte   0xc0

Value: true
JSON:  4 bytes  "true"
MsgPack: 1 byte   0xc3

Value: 42
JSON:  2 bytes  "42"
MsgPack: 1 byte   0x2a (fixint)

Value: 1000
JSON:  4 bytes  "1000"
MsgPack: 3 bytes  0xcd 0x03 0xe8 (uint16)

Value: "hello"
JSON:  7 bytes  "hello" (with quotes in transmission)
MsgPack: 6 bytes  0xa5 "hello" (fixstr: type+length+data)
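
The "smallest representation wins" rule behind these integer sizes is simple enough to sketch by hand with the stdlib - a minimal positive-integer encoder following the spec’s fixint/uint rules (illustrative, not a full library):

```python
import struct

def pack_uint(n: int) -> bytes:
    """Encode a non-negative integer with MessagePack's rules:
    the smallest representation that fits is used."""
    if n < 0x80:                       # positive fixint: value is the byte
        return bytes([n])
    if n < 0x100:                      # uint8:  0xcc + 1 byte
        return bytes([0xcc, n])
    if n < 0x10000:                    # uint16: 0xcd + 2 big-endian bytes
        return b"\xcd" + struct.pack(">H", n)
    if n < 0x100000000:                # uint32: 0xce + 4 bytes
        return b"\xce" + struct.pack(">I", n)
    return b"\xcf" + struct.pack(">Q", n)  # uint64: 0xcf + 8 bytes

print(pack_uint(42).hex())    # 2a      (1 byte)
print(pack_uint(1000).hex())  # cd03e8  (3 bytes)
```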

Sample object:

{
  "id": 123,
  "name": "alice",
  "active": true
}

Sizes:

  • JSON: 46 bytes
  • MessagePack: 28 bytes
  • Savings: 39%

Array of 1000 small objects:

  • JSON: ~45 KB
  • MessagePack: ~28 KB
  • Savings: 38%

Encoding and Decoding

JavaScript (Node.js):

const msgpack = require('msgpack5')();

// Encode
const data = {
  id: 123,
  username: 'alice',
  tags: ['golang', 'rust'],
  active: true,
  balance: 1234.56
};

const encoded = msgpack.encode(data);
console.log('Size:', encoded.length);  // 48 bytes vs 83 JSON

// Decode
const decoded = msgpack.decode(encoded);
console.log(decoded);  // Original data

// Stream encoding
const stream = msgpack.encoder();
stream.pipe(output);
stream.write(data);

// Stream decoding
const decoder = msgpack.decoder();
input.pipe(decoder);
decoder.on('data', obj => console.log(obj));

Go:

import "github.com/vmihailenco/msgpack/v5"

type User struct {
    ID       int      `msgpack:"id"`
    Username string   `msgpack:"username"`
    Tags     []string `msgpack:"tags"`
    Active   bool     `msgpack:"active"`
    Balance  float64  `msgpack:"balance"`
}

// Encode
user := User{
    ID:       123,
    Username: "alice",
    Tags:     []string{"golang", "rust"},
    Active:   true,
    Balance:  1234.56,
}

data, err := msgpack.Marshal(user)
if err != nil {
    panic(err)
}
fmt.Println("Size:", len(data))  // 48 bytes

// Decode
var decoded User
err = msgpack.Unmarshal(data, &decoded)
if err != nil {
    panic(err)
}
fmt.Printf("%+v\n", decoded)

// Streaming
encoder := msgpack.NewEncoder(writer)
encoder.Encode(user)

decoder := msgpack.NewDecoder(reader)
decoder.Decode(&decoded)

Python:

import msgpack

# Encode
data = {
    'id': 123,
    'username': 'alice',
    'tags': ['golang', 'rust'],
    'active': True,
    'balance': 1234.56
}

encoded = msgpack.packb(data)
print(f'Size: {len(encoded)}')  # 48 bytes

# Decode
decoded = msgpack.unpackb(encoded, raw=False)
print(decoded)

# Streaming
packer = msgpack.Packer()
for item in items:
    stream.write(packer.pack(item))

unpacker = msgpack.Unpacker(stream, raw=False)
for unpacked in unpacker:
    print(unpacked)

Rust:

use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, Debug)]
struct User {
    id: i32,
    username: String,
    tags: Vec<String>,
    active: bool,
    balance: f64,
}

fn main() {
    let user = User {
        id: 123,
        username: "alice".to_string(),
        tags: vec!["golang".to_string(), "rust".to_string()],
        active: true,
        balance: 1234.56,
    };

    // Encode
    let encoded = rmp_serde::to_vec(&user).unwrap();
    println!("Size: {}", encoded.len());  // 48 bytes

    // Decode
    let decoded: User = rmp_serde::from_slice(&encoded).unwrap();
    println!("{:?}", decoded);
}

Extension Types

MessagePack supports user-defined extension types:

Define custom type:

const msgpack = require('msgpack5')();

// Register timestamp extension
msgpack.register(0x01, Date, 
  // Encode
  (date) => {
    const buf = Buffer.allocUnsafe(8);
    buf.writeDoubleBE(date.getTime());
    return buf;
  },
  // Decode
  (buf) => {
    return new Date(buf.readDoubleBE());
  }
);

// Now dates encode as binary timestamps
const data = { created: new Date() };
const encoded = msgpack.encode(data);  // Uses extension
const decoded = msgpack.decode(encoded);  // Reconstructs Date object

Performance Benchmarks

Serialization (10,000 iterations):

Format        Encode   Decode   Total
JSON          45ms     38ms     83ms
MessagePack   28ms     22ms     50ms
Speedup       1.6x     1.7x     1.7x

Complex nested object:

Format        Encode   Decode   Size
JSON          125ms    98ms     15.2 KB
MessagePack   72ms     54ms     9.8 KB
Speedup       1.7x     1.8x     1.55x

Real-World Use Cases

1. Redis caching:

const redis = require('redis');
const msgpack = require('msgpack5')();

const client = redis.createClient();

// Store with MessagePack
async function cacheUser(user) {
  const encoded = msgpack.encode(user);
  await client.set(`user:${user.id}`, encoded);
}

// Retrieve with MessagePack
async function getUser(id) {
  const encoded = await client.getBuffer(`user:${id}`);
  return msgpack.decode(encoded);
}

// 35% memory savings vs JSON strings

2. Microservice communication:

// HTTP endpoint that returns MessagePack
func handleGetUser(w http.ResponseWriter, r *http.Request) {
    id := r.URL.Query().Get("id")
    user := getUserFromDB(id)

    data, err := msgpack.Marshal(user)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "application/msgpack")
    w.Write(data)
}

// Client decodes MessagePack
resp, _ := http.Get("http://api/users/123")
defer resp.Body.Close()

var user User
decoder := msgpack.NewDecoder(resp.Body)
decoder.Decode(&user)

3. Message queue (RabbitMQ):

import pika
import msgpack

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Publish with MessagePack
def publish_event(event):
    data = msgpack.packb(event)
    channel.basic_publish(
        exchange='events',
        routing_key='user.created',
        body=data
    )

# Consume with MessagePack
def callback(ch, method, properties, body):
    event = msgpack.unpackb(body, raw=False)
    handle_event(event)

channel.basic_consume(queue='events', on_message_callback=callback)

4. Log aggregation:

// Write log files in MessagePack
const fs = require('fs');
const msgpack = require('msgpack5')();

const logStream = fs.createWriteStream('app.log.msgpack');
const encoder = msgpack.encoder();
encoder.pipe(logStream);

function log(entry) {
  encoder.write({
    timestamp: Date.now(),
    level: entry.level,
    message: entry.message,
    metadata: entry.metadata
  });
}

// 40-50% smaller log files than JSON
// Faster to parse when processing logs

MessagePack Best For:

  • General-purpose binary serialization
  • Microservice communication (schemaless flexibility)
  • Caching layers (size matters)
  • Message queues
  • Log files (size + speed)
  • Mobile apps (bandwidth savings)

When to avoid:

  • Human debugging needed (use JSON)
  • Schema enforcement critical (use Protocol Buffers)
  • Database-specific needs (use JSONB/BSON)

CBOR: Concise Binary Object Representation

What is CBOR?

CBOR (RFC 8949) is an IETF-standardized binary data format similar to MessagePack but with more rigorous specification and additional features.

Key differences from MessagePack:

  • Formal IETF standard (RFC 8949)
  • Self-describing format
  • Deterministic encoding (for signatures)
  • Tagged types (extensible type system)
  • Better specification clarity

Created: 2013 (RFC 7049), updated 2020 (RFC 8949)
Specification: RFC 8949

When to Use CBOR

CBOR is preferred in:

1. Security applications (WebAuthn, COSE)

  • Deterministic encoding for signatures
  • Tagged types for security objects
  • Well-specified for cryptographic use

2. IoT and embedded systems

  • Smaller than JSON
  • Simple parsing (low memory)
  • Standardized (interoperability)

3. Standards-based systems

  • IETF specification ensures consistency
  • Multiple independent implementations
  • Long-term stability

CBOR vs MessagePack

Feature                  CBOR               MessagePack
Standardization          IETF RFC           Community spec
Deterministic encoding   Yes (canonical)    No
Tagged types             Yes (extensible)   Extension types (simpler)
Float16 support          Yes                No
Specification clarity    Very detailed      Brief
Adoption                 IoT, security      General purpose
Performance              Similar            Slightly faster

CBOR in Practice

JavaScript (Node.js):

const cbor = require('cbor');

// Encode
const data = {
  id: 123,
  username: 'alice',
  created: new Date(),
  tags: ['golang', 'rust']
};

const encoded = cbor.encode(data);
console.log('Size:', encoded.length);

// Decode
const decoded = cbor.decode(encoded);
console.log(decoded);

// Tagged types
const tagged = new cbor.Tagged(32, 'https://example.com');  // URI tag
const encoded2 = cbor.encode(tagged);

Go:

import "github.com/fxamacker/cbor/v2"

type User struct {
    ID       int      `cbor:"id"`
    Username string   `cbor:"username"`
    Created  time.Time `cbor:"created"`
    Tags     []string `cbor:"tags"`
}

// Encode
user := User{
    ID:       123,
    Username: "alice",
    Created:  time.Now(),
    Tags:     []string{"golang", "rust"},
}

data, err := cbor.Marshal(user)
if err != nil {
    panic(err)
}

// Decode
var decoded User
err = cbor.Unmarshal(data, &decoded)

// Deterministic encoding (for signatures)
encMode, _ := cbor.CanonicalEncOptions().EncMode()
canonical, _ := encMode.Marshal(user)

Python:

import cbor2
from datetime import datetime

# Encode
data = {
    'id': 123,
    'username': 'alice',
    'created': datetime.now(),
    'tags': ['golang', 'rust']
}

encoded = cbor2.dumps(data)
print(f'Size: {len(encoded)}')

# Decode
decoded = cbor2.loads(encoded)
print(decoded)

# Tagged types
from cbor2 import CBORTag
tagged = CBORTag(32, 'https://example.com')  # URI tag
encoded2 = cbor2.dumps(tagged)

CBOR Tagged Types

CBOR’s tagged type system enables extensibility:

Standard tags:

Tag 0: Date/time string (ISO 8601)
Tag 1: Epoch-based date/time (number)
Tag 2: Positive bignum
Tag 3: Negative bignum
Tag 32: URI
Tag 33: Base64url
Tag 34: Base64
Tag 55799: Self-describe CBOR (magic number)
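
A tag is just a head byte (major type 6) prefixed to the tagged item, which makes the wire format easy to sketch by hand. The helpers below are an illustrative partial encoder per RFC 8949’s initial-byte rules, not a full CBOR library:

```python
def cbor_tag(tag: int, payload: bytes) -> bytes:
    """Prefix payload with a CBOR tag head (major type 6).
    Handles tag numbers below 2^16 for brevity."""
    if tag < 24:
        return bytes([0xC0 | tag]) + payload           # tag fits in the initial byte
    if tag < 0x100:
        return bytes([0xD8, tag]) + payload            # 1-byte tag argument
    return b"\xd9" + tag.to_bytes(2, "big") + payload  # 2-byte tag argument

def cbor_text(s: str) -> bytes:
    """Encode a short text string (major type 3, length < 24)."""
    data = s.encode("utf-8")
    assert len(data) < 24
    return bytes([0x60 | len(data)]) + data

# Tag 32 = URI: head is 0xd8 0x20, then the string item follows
encoded = cbor_tag(32, cbor_text("https://a.example"))
print(encoded[:2].hex())  # d820
```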

Example:

const cbor = require('cbor');

// Date (tag 1: epoch timestamp)
const date = new Date();
const encoded = cbor.encode(date);
// Encoded as tag 1 + numeric timestamp

// URI (tag 32)
const uri = new cbor.Tagged(32, 'https://example.com');
const encoded2 = cbor.encode(uri);

// Custom tag
const custom = new cbor.Tagged(1000, {custom: 'data'});

CBOR in WebAuthn

WebAuthn (web authentication standard) uses CBOR for credential data:

// Browser WebAuthn API returns CBOR
const credential = await navigator.credentials.create({
  publicKey: options
});

// attestationObject is CBOR-encoded
const attestation = credential.response.attestationObject;

// Server decodes CBOR
const cbor = require('cbor');
const decoded = cbor.decode(attestation);

console.log(decoded);
// {
//   fmt: 'packed',
//   attStmt: {...},
//   authData: <Buffer...>
// }

Size Comparison

Sample data:

{
  "id": 123,
  "username": "alice",
  "email": "alice@example.com",
  "created": "2023-01-15T10:30:00Z",
  "tags": ["golang", "rust", "python"]
}

Sizes:

  • JSON: 142 bytes
  • MessagePack: 88 bytes
  • CBOR: 90 bytes
  • Difference: CBOR ~2 bytes larger (negligible)

CBOR Best For:

  • IoT devices and embedded systems
  • Security applications (WebAuthn, COSE)
  • Standards-based systems (need RFC)
  • Cryptographic use (deterministic encoding)

Use MessagePack instead if:

  • General-purpose serialization
  • Performance critical (slight edge)
  • Simpler specification preferred
  • Wider ecosystem matters

Performance Benchmarks

Test Methodology

Environment:

  • CPU: Intel i7-12700K
  • RAM: 32GB DDR4
  • OS: Ubuntu 22.04
  • Languages: Node.js 20, Go 1.21, Python 3.11

Test data:

  • Small object: User profile (200 bytes JSON)
  • Medium object: API response (5 KB JSON)
  • Large array: 10,000 user objects (2 MB JSON)

Results: Small Object (200 bytes)

Encoding speed (ops/sec):

Format        JavaScript   Go          Python
JSON          1,245,000    2,100,000   385,000
MessagePack   1,890,000    3,200,000   580,000
CBOR          1,720,000    2,950,000   520,000
BSON          945,000      1,850,000   310,000

Speedup vs JSON:

  • MessagePack: 1.5x
  • CBOR: 1.4x
  • BSON: 0.8x (slower)

Size:

  • JSON: 200 bytes
  • MessagePack: 128 bytes (36% smaller)
  • CBOR: 131 bytes (35% smaller)
  • BSON: 142 bytes (29% smaller)

Results: Medium Object (5 KB)

Encoding speed (ops/sec):

Format        JavaScript   Go        Python
JSON          52,000       98,000    18,500
MessagePack   88,000       165,000   32,000
CBOR          79,000       152,000   28,000
BSON          41,000       85,000    15,000

Speedup vs JSON:

  • MessagePack: 1.7x
  • CBOR: 1.5x
  • BSON: 0.8x

Size:

  • JSON: 5,120 bytes
  • MessagePack: 3,280 bytes (36% smaller)
  • CBOR: 3,350 bytes (35% smaller)
  • BSON: 3,680 bytes (28% smaller)

Results: Large Array (2 MB, 10K objects)

Encoding time:

Format        JavaScript   Go     Python
JSON          125ms        72ms   385ms
MessagePack   73ms         41ms   225ms
CBOR          82ms         48ms   255ms
BSON          145ms        85ms   425ms

Speedup vs JSON:

  • MessagePack: 1.7x
  • CBOR: 1.5x
  • BSON: 0.9x

Size:

  • JSON: 2.05 MB
  • MessagePack: 1.31 MB (36% smaller)
  • CBOR: 1.34 MB (35% smaller)
  • BSON: 1.48 MB (28% smaller)

Memory Usage

Peak memory during encoding (2 MB dataset):

Format        JavaScript   Go       Python
JSON          8.2 MB       4.5 MB   12.3 MB
MessagePack   6.1 MB       3.2 MB   9.1 MB
CBOR          6.4 MB       3.4 MB   9.5 MB
BSON          7.8 MB       4.1 MB   11.8 MB

[Diagram: performance characteristics (size: ~36% smaller than JSON; parse speed: ~1.7x faster; memory: ~25% less) and binary format rankings (MessagePack: best overall balance; CBOR: standards compliant; BSON: MongoDB extended types)]

Key Takeaways

Size savings:

  • Binary formats: 28-36% smaller than JSON
  • MessagePack/CBOR most efficient
  • BSON less efficient (extended type overhead)

Speed improvements:

  • 1.5-1.7x faster encoding/decoding
  • Go implementations fastest
  • Python benefits most from binary formats

Memory efficiency:

  • 20-30% less memory than JSON
  • Streaming parsers reduce memory further

Benchmark Caveats:

  • Results vary by data structure (nested vs flat)
  • Implementation quality matters (library choice)
  • Compression changes the equation (gzip, zstd)
  • Network overhead may dominate (size less critical)
  • Always benchmark with YOUR actual data
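
The compression caveat is easy to check with the stdlib: on repetitive JSON, gzip recovers much of the gap a binary format would close, so compare compressed sizes before committing. Exact numbers depend entirely on the data, so treat this as illustrative:

```python
import gzip
import json

# Repetitive, JSON-friendly data: the key names repeat in every element,
# which is exactly what both binary formats and gzip exploit.
records = [{"id": i, "username": f"user{i}", "active": True} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

print(len(raw))         # tens of KB of text
print(len(compressed))  # far smaller - repeated keys compress extremely well
```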

Binary JSON vs Protocol Buffers

Both solve JSON’s performance problems, but through different philosophies:

Fundamental Difference

Binary JSON (MessagePack, CBOR):

  • Schemaless (like JSON)
  • Self-describing format
  • Flexible structure
  • No compilation step

Protocol Buffers:

  • Schema required
  • Schema compiled to code
  • Strict structure
  • Type safety enforced

Detailed Comparison

Aspect           Binary JSON                Protocol Buffers
Schema           Optional                   Required
Flexibility      Add fields freely          Schema evolution rules
Size             30-40% smaller than JSON   50-70% smaller than JSON
Speed            1.5-2x faster than JSON    3-5x faster than JSON
Type safety      Runtime only               Compile-time
Versioning       Implicit                   Explicit (field numbers)
Debugging        Can inspect structure      Need schema to decode
Setup            Zero (just library)        Schema compilation
Cross-language   Parse anywhere             Generated code per language

Size Comparison

Sample user object:

{
  "id": 123,
  "username": "alice",
  "email": "alice@example.com",
  "age": 30,
  "active": true
}

Sizes:

  • JSON: 98 bytes
  • MessagePack: 62 bytes (37% smaller)
  • Protocol Buffers: 28 bytes (71% smaller)

Why Protocol Buffers is smaller:

  • Field numbers instead of names (1 byte vs “username” = 8 bytes)
  • Efficient varint encoding
  • No type markers (schema provides types)
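
The field-number-plus-varint layout can be sketched with the stdlib - a hand encoding of the id field (field number 1, wire type 0) purely for illustration of why the name disappears from the wire:

```python
def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 data bits per byte, MSB = continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Field key = (field_number << 3) | wire_type; field 1, varint -> 0x08.
# The field NAME never appears on the wire - only this 1-byte key.
field_id_123 = encode_varint((1 << 3) | 0) + encode_varint(123)

print(field_id_123.hex())  # 087b - the whole field in 2 bytes
print(len('"id":123'))     # 8 bytes just for the JSON key and value
```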

When to Use Each

Use Binary JSON (MessagePack/CBOR) when:

  • Schema flexibility needed (rapid iteration)
  • Dynamic data structures (user-generated content)
  • Different clients need different fields
  • Simple setup (no compilation)
  • Debugging matters (self-describing)
  • Multiple data types in same stream

Use Protocol Buffers when:

  • Schema stability (defined API contract)
  • Maximum performance (size + speed)
  • Type safety critical
  • Versioning discipline needed
  • RPC systems (gRPC)
  • Long-term data storage

Hybrid Approaches

1. Protocol Buffers with JSON names:

message User {
  int32 id = 1 [json_name = "id"];
  string username = 2 [json_name = "username"];
}

Can serialize as JSON or binary.

2. MessagePack with schema validation:

const Ajv = require('ajv');
const msgpack = require('msgpack5')();

const ajv = new Ajv();
const validate = ajv.compile(schema);  // schema: your JSON Schema definition

// Validate before encoding
if (validate(data)) {
  const encoded = msgpack.encode(data);
}

3. Mixed protocols:

// JSON for configuration (human-edited)
const config = JSON.parse(fs.readFileSync('config.json'));

// MessagePack for high-volume data
const data = msgpack.decode(message);

// Protocol Buffers for RPC
const request = UserRequest.decode(buffer);

Migration Example

From JSON to MessagePack (gradual):

// Step 1: Support both formats
app.post('/api/users', async (req, res) => {
  const contentType = req.headers['content-type'];
  
  let data;
  if (contentType === 'application/msgpack') {
    data = msgpack.decode(req.body);
  } else {
    data = JSON.parse(req.body);
  }
  
  // Process data...
  
  // Return in same format
  if (contentType === 'application/msgpack') {
    res.type('application/msgpack');
    res.send(msgpack.encode(result));
  } else {
    res.json(result);
  }
});

// Step 2: Update clients gradually
// Step 3: Monitor metrics (size, speed, errors)
// Step 4: Deprecate JSON after migration complete

For more on Protocol Buffers, see: Understanding Protocol Buffers: Part 1


Cloud Bandwidth Cost Savings

The Economics of Binary Formats

For commercial products with metered bandwidth, binary formats can dramatically reduce infrastructure costs.

Cloud provider pricing (examples):

  • AWS: $0.09/GB data transfer out (first 10TB/month)
  • Google Cloud: $0.12/GB egress (first 1TB/month)
  • Azure: $0.087/GB bandwidth (first 5TB/month)

Real-World Cost Analysis

Scenario: API serving 1 billion requests/month with 2KB average response

Text JSON:

  • 2KB Γ— 1,000,000,000 = 2,000 GB/month
  • At $0.09/GB = $180/month bandwidth costs

Protocol Buffers (60% size reduction):

  • 0.8KB Γ— 1,000,000,000 = 800 GB/month
  • At $0.09/GB = $72/month bandwidth costs
  • Savings: $108/month ($1,296/year)

MessagePack (40% size reduction):

  • 1.2KB Γ— 1,000,000,000 = 1,200 GB/month
  • At $0.09/GB = $108/month bandwidth costs
  • Savings: $72/month ($864/year)
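
The arithmetic above generalizes into a small helper (the $0.09/GB figure is the example AWS rate from earlier; real pricing is tiered):

```python
def monthly_egress_cost(requests: int, bytes_per_response: float,
                        price_per_gb: float = 0.09) -> float:
    """Estimated monthly egress bill, using decimal GB as in the examples."""
    gb = requests * bytes_per_response / 1_000_000_000
    return gb * price_per_gb

json_cost = monthly_egress_cost(1_000_000_000, 2_000)     # 2KB responses
msgpack_cost = monthly_egress_cost(1_000_000_000, 1_200)  # 40% smaller
protobuf_cost = monthly_egress_cost(1_000_000_000, 800)   # 60% smaller

print(json_cost)                  # 180.0
print(json_cost - msgpack_cost)   # 72.0 saved per month
print(json_cost - protobuf_cost)  # 108.0 saved per month
```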

Mobile API Cost Impact

Mobile apps on cellular networks are especially sensitive:

JSON response (5KB):

{
  "users": [
    {"id": 1, "username": "alice", "email": "alice@example.com", ...},
    {"id": 2, "username": "bob", "email": "bob@example.com", ...},
    // ... 50 users
  ]
}

  • Size: 5KB
  • 10 billion API calls/month = 50,000 GB
  • Cost: $4,500/month

MessagePack (3KB - 40% reduction):

  • Size: 3KB
  • 10 billion API calls/month = 30,000 GB
  • Cost: $2,700/month
  • Savings: $1,800/month ($21,600/year)

Protocol Buffers (2KB - 60% reduction):

  • Size: 2KB
  • 10 billion API calls/month = 20,000 GB
  • Cost: $1,800/month
  • Savings: $2,700/month ($32,400/year)

Break-Even Analysis

When does binary format investment pay off?

Implementation costs (one-time):

  • Developer time: 40-80 hours ($4,000-$8,000)
  • Testing and validation: 20-40 hours ($2,000-$4,000)
  • Documentation and training: 10-20 hours ($1,000-$2,000)
  • Total: $7,000-$14,000

Monthly savings from examples above:

  • Small API (1B requests): $72-$108/month → ROI in 6-12 months
  • Mobile API (10B requests): $1,800-$2,700/month → ROI in 3-8 months
  • Large API (100B requests): $7,200-$10,800/month → ROI in 1-2 months

Cost Optimization Strategy: For APIs serving >100M requests/month or mobile apps with bandwidth-constrained users, binary formats often pay for themselves within 6 months purely from bandwidth savings - before considering performance improvements.
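
Break-even follows directly: months to ROI is the one-time cost divided by monthly savings. A sketch using the figures from the estimates above:

```python
def months_to_roi(one_time_cost: float, monthly_savings: float) -> float:
    """Simple payback period, ignoring discounting and ongoing maintenance."""
    return one_time_cost / monthly_savings

# One-time implementation cost range from the estimate above
low_cost, high_cost = 7_000, 14_000

# Mobile API example: $1,800-$2,700/month in bandwidth savings
best = months_to_roi(low_cost, 2_700)    # ~2.6 months
worst = months_to_roi(high_cost, 1_800)  # ~7.8 months

print(round(best, 1), round(worst, 1))
```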

Additional Cost Benefits

Beyond bandwidth:

  1. Compute costs: Faster parsing = lower CPU usage = smaller instances
  2. Cache efficiency: Smaller payloads = more entries in fixed-size caches
  3. CDN costs: Many CDNs charge per GB - binary formats reduce bills
  4. Mobile UX: Faster responses = better retention = higher revenue

When Cost Savings Don’t Apply

Free tiers and small scale:

  • Personal projects within free tier limits
  • APIs with <10M requests/month
  • Internal tools on private networks (no egress charges)
  • Development/staging environments

Break-even threshold: ~50-100M requests/month depending on response size

Why Not Always Use Protocol Buffers?

Given the cost savings and performance benefits, why doesn’t everyone use Protocol Buffers for everything?

1. Schema Rigidity and Deployment Coordination

Protocol Buffers require compilation and strict schemas:

message User {
  int32 id = 1;
  string username = 2;
  string email = 3;
}

What happens when you need a new field:

  1. Update .proto file
  2. Regenerate code for all languages (Go, Python, JS, etc.)
  3. Deploy updated code to all services
  4. Coordinate deployments across teams
  5. Handle backward compatibility

JSON/MessagePack: Just add the field, it works immediately.

// JSON: Add field instantly
const user = {
  id: 123,
  username: "alice",
  newField: "works immediately"  // No compilation needed
};

Impact depends on your setup:

With mature tooling (automated pipeline):

  • make generate → commit → CI deploys
  • Similar velocity to JSON for established teams
  • Overhead: ~2-5 minutes for regeneration + deployment

Without automation (manual process):

  • Update proto → manually regenerate → test → coordinate → deploy
  • Cross-team coordination if shared protos
  • Overhead: 30 minutes to 2 hours depending on team size

Where this genuinely slows development:

  • Rapid prototyping: Trying different data shapes daily
  • A/B testing: Frontend experimenting with new fields
  • Cross-team dependencies: Service A waits for Service B’s proto update
  • Small teams: No dedicated DevOps to automate workflow

2. Dynamic Data Structures

User-generated content doesn’t fit schemas:

{
  "post_id": "abc123",
  "content": "Hello world",
  "metadata": {
    "custom_field_1": "user defined",
    "custom_field_2": 42,
    "arbitrary_key": ["dynamic", "array"]
  }
}

With Protobuf, you’d need:

message Post {
  string post_id = 1;
  string content = 2;
  map<string, google.protobuf.Any> metadata = 3;  // Loses type safety
}

You end up with Any types everywhere, defeating the purpose of schemas.

Use cases requiring flexibility:

  • CMS platforms (arbitrary fields per content type)
  • Analytics events (different properties per event)
  • Plugin systems (plugins add their own fields)
  • Form builders (user-defined form schemas)

3. Developer Experience Friction

JSON workflow (instant feedback):

curl https://api.example.com/users/123
# See data immediately in terminal
# Copy/paste into docs
# Share with coworkers in Slack

Protobuf workflow (requires tooling):

curl https://api.example.com/users/123
# Get binary garbage: ▒▒▒alice▒▒▒
# Need protoc to decode
# Need .proto files
# Need to explain to frontend devs

Onboarding cost:

  • New developers must learn protobuf toolchain
  • Need IDE plugins for syntax highlighting
  • Need to understand wire format for debugging
  • Harder to write integration tests

4. Browser and Client Limitations

JavaScript ecosystem challenges:

// JSON: Native support
fetch('/api/users')
  .then(r => r.json())  // Built-in
  .then(data => console.log(data));

// Protobuf: Requires libraries and setup
import { User } from './generated/user_pb.js';  // 50KB+ bundle size

fetch('/api/users')
  .then(r => r.arrayBuffer())
  .then(buf => {
    const user = User.deserializeBinary(new Uint8Array(buf));
    // More complex API
  });

Bundle size impact:

  • protobuf.js: ~50KB minified
  • JSON: 0KB (native)
  • For small apps, protobuf library is larger than data savings

5. Third-Party Integrations

Many services only accept JSON:

  • Webhooks (Stripe, GitHub, etc.)
  • Logging services (Datadog, Splunk)
  • Monitoring tools (Prometheus, Grafana)
  • CI/CD systems (GitHub Actions, GitLab)

You’d need JSON anyway for integrations.

6. Rapid Prototyping and Exploratory Development

Early-stage development priorities:

  • Ship fast, iterate quickly
  • Schema changes frequently
  • Developer velocity > optimization
  • Unknown requirements

Protobuf’s schema-first approach adds friction during the exploration phase.

Example: Evolving user model

  • Week 1: User has name field
  • Week 2: Split into first_name and last_name
  • Week 3: Add optional middle_name
  • Week 4: Support international names (single field after all)

With JSON: Immediate changes, no regeneration
With Protobuf: Regeneration each iteration (adds 2-5 minutes per change with automation, more without)

This matters most when:

  • Requirements are unknown or changing daily
  • Team is experimenting with different approaches
  • Product-market fit not yet established
  • Schema volatility is high

Less relevant when:

  • API contracts are stable
  • Team has established patterns
  • Schema changes are infrequent (monthly, not daily)

7. Mixed Data Scenarios

Real applications use multiple formats:

// Config files: JSON (human-edited)
const config = require('./config.json');

// API responses: JSON (client compatibility)
app.get('/api/users', (req, res) => {
  res.json(users);
});

// Internal RPC: Protobuf (performance critical)
const response = await internalService.getUsers(request);

// Logs: JSON Lines (tooling compatibility)
logger.info({userId: 123, action: 'login'});

Using protobuf everywhere would mean:

  • Config files need compilation
  • Logs need special tools
  • API clients need protobuf libraries
  • Higher complexity for marginal additional gains

8. When Protobuf Makes Sense

Use Protocol Buffers when:

  • High-scale APIs (>100M requests/month) - cost savings justify complexity
  • Internal microservices - control both ends, can coordinate schemas
  • Performance-critical paths - gRPC for low-latency RPC
  • Stable APIs - schema rarely changes
  • Type safety matters - compilation catches errors
  • Mobile apps - bandwidth constrained, latency sensitive

Stick with JSON/MessagePack when:

  • Public APIs - broad compatibility needed
  • Rapid iteration - schema changes frequently
  • Simple projects - not worth the tooling overhead
  • Browser clients - avoid bundle size bloat
  • Third-party integrations - JSON required anyway
  • Development/staging - easier debugging

The Real Answer: Most successful systems use both. JSON for public APIs and configuration, Protobuf for internal high-traffic RPC. The “always use X” approach ignores the trade-offs between developer velocity, operational complexity, and performance gains.

Real-World Use Cases

1. High-Throughput API (MessagePack)

Scenario: API serving 50K requests/sec, 5KB average response

Before (JSON):

  • Response size: 5 KB
  • Parse time: 2.1ms
  • Network: 250 Mbps
  • Memory: 12 GB

After (MessagePack):

  • Response size: 3.2 KB (36% smaller)
  • Parse time: 1.2ms (43% faster)
  • Network: 160 Mbps (36% reduction)
  • Memory: 8.5 GB (29% reduction)

Implementation:

// Express middleware (assumes a msgpack library such as @msgpack/msgpack)
const msgpack = require('@msgpack/msgpack');

app.use((req, res, next) => {
  res.sendMsgPack = (data) => {
    res.type('application/msgpack');
    res.send(msgpack.encode(data));
  };
  next();
});

app.get('/api/products', async (req, res) => {
  const products = await db.products.find();
  res.sendMsgPack(products);
});

// Client
const response = await fetch('/api/products', {
  headers: {'Accept': 'application/msgpack'}
});
const buffer = await response.arrayBuffer();
const products = msgpack.decode(new Uint8Array(buffer)); // Uint8Array works in browsers, unlike Buffer
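
The middleware above always answers in MessagePack; in practice you negotiate per request so plain JSON clients keep working. A minimal sketch of the selection logic (`pickFormat` is a name introduced here, and the Accept parsing is simplified, ignoring q-values):

```javascript
// Choose the response encoding from the Accept header; default to JSON
// so clients that never opted in keep working unchanged.
function pickFormat(acceptHeader) {
  const types = (acceptHeader || '')
    .split(',')
    .map((part) => part.trim().split(';')[0]); // drop q-values for simplicity
  return types.includes('application/msgpack') ? 'msgpack' : 'json';
}

// Inside a route: pickFormat(req.headers.accept) === 'msgpack'
//   ? res.sendMsgPack(data)
//   : res.json(data);
console.log(pickFormat('application/msgpack'));          // msgpack
console.log(pickFormat('application/json, text/plain')); // json
console.log(pickFormat(undefined));                      // json
```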

2. Mobile App (MessagePack)

Scenario: Mobile app on cellular networks, battery-conscious

Benefits:

  • 35% less bandwidth (cost savings)
  • Faster parsing (battery savings)
  • Better on slow networks

Implementation:

// React Native client
import msgpack from 'react-native-msgpack';

async function fetchData(endpoint) {
  const response = await fetch(API_URL + endpoint, {
    headers: {
      'Accept': 'application/msgpack',
      'Content-Type': 'application/msgpack'
    }
  });
  
  const buffer = await response.arrayBuffer();
  return msgpack.decode(new Uint8Array(buffer));
}

async function postData(endpoint, data) {
  const encoded = msgpack.encode(data);
  
  const response = await fetch(API_URL + endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/msgpack'
    },
    body: encoded
  });
  
  const buffer = await response.arrayBuffer();
  return msgpack.decode(new Uint8Array(buffer));
}

3. IoT Device Communication (CBOR)

Scenario: Temperature sensors sending data every minute

Device code (embedded C):

#include "cbor.h"
#include <time.h>  /* time(NULL) for the timestamp field */

void send_reading() {
    CborEncoder encoder, map;
    uint8_t buffer[128];
    
    cbor_encoder_init(&encoder, buffer, sizeof(buffer), 0);
    cbor_encoder_create_map(&encoder, &map, 4);
    
    cbor_encode_text_stringz(&map, "device_id");
    cbor_encode_text_stringz(&map, "sensor-001");
    
    cbor_encode_text_stringz(&map, "temperature");
    cbor_encode_float(&map, 23.5);
    
    cbor_encode_text_stringz(&map, "humidity");
    cbor_encode_float(&map, 65.2);
    
    cbor_encode_text_stringz(&map, "timestamp");
    cbor_encode_int(&map, time(NULL));
    
    cbor_encoder_close_container(&encoder, &map);
    
    size_t length = cbor_encoder_get_buffer_size(&encoder, buffer);
    send_to_gateway(buffer, length);
}

Gateway (Node.js):

const cbor = require('cbor');

function processReading(buffer) {
  const reading = cbor.decode(buffer);
  
  console.log(`Device: ${reading.device_id}`);
  console.log(`Temp: ${reading.temperature}°C`);
  console.log(`Humidity: ${reading.humidity}%`);
  
  // Store in time-series database
  influx.writePoints([{
    measurement: 'temperature',
    tags: {device: reading.device_id},
    fields: {
      value: reading.temperature,
      humidity: reading.humidity
    },
    timestamp: new Date(reading.timestamp * 1000) // seconds → Date; seconds * 1e9 exceeds JS number precision
  }]);
}

Benefits:

  • 45% smaller than JSON (bandwidth critical)
  • Standardized format (IETF RFC)
  • Simple parsing on embedded devices
  • Low memory footprint
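
The compactness comes from CBOR's header-byte scheme, where a 3-bit major type and a small length or value share a single byte. An illustrative encoder for a tiny subset of RFC 8949 (`cborEncode` is a sketch name, not a library API; real code would use a package like `cbor`):

```javascript
// Illustrative CBOR encoder covering only: ints 0-23, text strings and
// maps with fewer than 24 elements. Shows the header-byte layout, nothing more.
function cborEncode(value) {
  if (Number.isInteger(value) && value >= 0 && value < 24) {
    return Buffer.from([value]); // major type 0: the int IS the header byte
  }
  if (typeof value === 'string') {
    const utf8 = Buffer.from(value, 'utf8'); // assumes length < 24
    return Buffer.concat([Buffer.from([0x60 | utf8.length]), utf8]); // major type 3
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value); // assumes < 24 entries
    const parts = [Buffer.from([0xa0 | keys.length])]; // major type 5: map
    for (const key of keys) parts.push(cborEncode(key), cborEncode(value[key]));
    return Buffer.concat(parts);
  }
  throw new Error('type outside this sketch');
}

const bytes = cborEncode({ t: 23 });
console.log(bytes.toString('hex')); // a1617417: 4 bytes vs 8 for '{"t":23}'
```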

Choosing Your Binary Format

Binary JSON formats address the performance limits of text JSON for APIs and data transfer while preserving its structural flexibility. The choice depends on your specific needs:

Decision Matrix

Choose MessagePack if:

  • General-purpose binary serialization
  • Maximum speed and size efficiency
  • Microservice communication
  • Message queues, caching layers
  • Wide language support needed

Choose CBOR if:

  • IoT or embedded systems
  • Security applications (WebAuthn, COSE)
  • Need IETF standard
  • Deterministic encoding required

Choose Protocol Buffers if:

  • Maximum performance (size + speed)
  • Schema enforcement critical
  • Long-term data storage
  • RPC systems (gRPC)

Stick with JSON if:

  • Human readability critical (configs, logs)
  • Debugging frequency high
  • Payloads small (<10 KB)
  • Performance acceptable
  • Simplicity trumps efficiency

What We Learned

Binary formats provide:

  • 30-40% size reduction over JSON
  • 1.5-2x faster parsing
  • Extended type systems (dates, binary data)
  • Better memory efficiency
  • Significant bandwidth cost savings
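
The size reduction is easy to see at the byte level. A toy MessagePack encoder covering just fixmap, fixstr, and positive fixint (`mpEncode` is a sketch name; production code should use an actual msgpack library):

```javascript
// Toy MessagePack encoder showing where the savings come from:
// no quotes, colons, commas, or braces, and small ints cost one byte.
function mpEncode(value) {
  if (Number.isInteger(value) && value >= 0 && value <= 127) {
    return Buffer.from([value]); // positive fixint: one byte
  }
  if (typeof value === 'string') {
    const utf8 = Buffer.from(value, 'utf8'); // assumes length < 32
    return Buffer.concat([Buffer.from([0xa0 | utf8.length]), utf8]); // fixstr
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value); // assumes < 16 entries
    const parts = [Buffer.from([0x80 | keys.length])]; // fixmap
    for (const key of keys) parts.push(mpEncode(key), mpEncode(value[key]));
    return Buffer.concat(parts);
  }
  throw new Error('type outside this sketch');
}

const reading = { id: 7, seq: 42, ok: 1 };
const packed = mpEncode(reading);
const json = JSON.stringify(reading);
console.log(packed.length, json.length); // 14 vs 24 bytes (~42% smaller)
```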

Trade-offs:

  • Loss of human-readability
  • Binary debugging tools needed
  • Schema drift without validation
  • Ecosystem smaller than JSON

Key insight: Binary formats fill the gap between JSON’s simplicity and Protocol Buffers’ schema enforcement. They’re the right choice when JSON’s performance matters but schema flexibility is still needed.


What’s Next: Streaming JSON

We’ve optimized JSON storage (Part 3) and network transfer (this part). But what about processing large datasets that don’t fit in memory? What about streaming APIs and log processing?

In Part 5 , we’ll explore JSON-RPC - adding structured RPC protocols on top of JSON for API consistency and type safety. Then in Part 6 , we’ll tackle streaming with JSON Lines (JSONL) for processing gigabytes of data without running out of memory.

Coming up:

  • JSON-RPC: Structured remote procedure calls
  • JSON Lines: Streaming and big data processing
  • Security considerations: JWT, canonicalization, and attacks

The goal remains the same - extending JSON’s capabilities while maintaining its fundamental simplicity and flexibility.

