You Don't Know JSON: Part 1 - Origins, Evolution, and the Cracks in the Foundation
The complete history of JSON from Douglas Crockford's discovery to today's dominance. Learn why JSON replaced XML, where it fails, and why the JSON ecosystem evolved beyond basic key-value pairs.
- tags
- #Json #Data-Formats #Xml #Yaml #Serialization #Web-Development #Api-Design #Javascript #Rest-Api #Data-Interchange #Json-Schema #Standards #Rfc #History #Web-Standards #Parsing #Validation #Configuration #Distributed-Systems #Microservices
- categories
- Fundamentals Programming
- published
- reading time
- 18 minutes
📚 Series: You Don't Know JSON
- You Don't Know JSON: Part 1 - Origins, Evolution, and the Cracks in the Foundation (current)
- You Don't Know JSON: Part 2 - JSON Schema and the Art of Validation
- You Don't Know JSON: Part 3 - Binary JSON in Databases
- You Don't Know JSON: Part 4 - Binary JSON for APIs and Data Transfer
- You Don't Know JSON: Part 5 - JSON-RPC: When REST Isn't Enough
- You Don't Know JSON: Part 6 - JSON Lines: Processing Gigabytes Without Running Out of Memory
- You Don't Know JSON: Part 7 - Security: Authentication, Signatures, and Attacks
- You Don't Know JSON: Part 8 - Lessons from the JSON Revolution
Every developer knows JSON. You’ve written {"key": "value"} thousands of times. You’ve debugged missing commas, fought with trailing characters, and cursed the lack of comments in configuration files.
But how did we get here? Why does the world’s most popular data format have such obvious limitations? And why, despite being “simple,” has JSON spawned an entire ecosystem of variants, extensions, and workarounds?
This series explores the JSON you don’t know - the one beyond basic syntax. We’ll examine binary formats, streaming protocols, validation schemas, RPC layers, and security considerations. But first, we need to understand why JSON exists and where it falls short.
What XML Had: Everything built-in (1998-2005)
XML’s approach: Monolithic specification with validation (XSD), transformation (XSLT), namespaces, querying (XPath), protocols (SOAP), and security (XML Signature/Encryption) all integrated into one ecosystem.
Benefit: Complete solution with built-in type safety, validation, and extensibility
Cost: Massive complexity, steep learning curve, rigid coupling between features
JSON’s approach: Minimal core with separate standards for each need
Architecture shift: Integrated → Modular, Everything built-in → Composable solutions, Monolithic → Ecosystem-driven
The Pre-JSON Dark Ages: XML Everywhere
The Problem Space (Late 1990s)
The web was growing explosively. Websites evolved from static HTML to dynamic applications. Services needed to communicate across networks, applications needed configuration files, and developers needed a way to move structured data between systems.
The requirements were clear:
- Human-readable (developers must debug it)
- Machine-parseable (computers must process it)
- Language-agnostic (works in any programming language)
- Supports nested structures (real data has hierarchy)
- Self-describing (data carries its own schema)
XML: The Heavyweight Champion
XML (eXtensible Markup Language) emerged as the answer. By the early 2000s, it dominated:
XML everywhere:
- Configuration files (web.xml, applicationContext.xml)
- SOAP web services (the enterprise standard)
- Data exchange (RSS, Atom feeds)
- Document formats (DOCX, SVG)
- Build systems (Maven pom.xml, Ant build.xml)
A simple person record in XML:
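Something along these lines (the exact fields here are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<person>
  <id>1</id>
  <name>Alice Johnson</name>
  <email>alice@example.com</email>
  <age>30</age>
  <address>
    <city>Berlin</city>
    <country>Germany</country>
  </address>
</person>
```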
Size: 247 bytes
XML’s Strengths
XML wasn’t chosen arbitrarily. It had real advantages:
+ Schema validation (XSD, DTD, RelaxNG)
+ Namespaces (avoid naming conflicts)
+ XPath (query language)
+ XSLT (transformation)
+ Comments (documentation support)
+ Attributes and elements (flexible modeling)
+ Mature tooling (parsers in every language)
XML’s Fatal Flaws: The Monolithic Architecture
But XML’s complexity became its downfall. The problem wasn’t any single feature - it was the architectural decision to build everything into one specification.
XML wasn’t just a data format. It was an entire technology stack:
Core XML (parsing and structure):
- DOM (Document Object Model) - load entire document into memory
- SAX (Simple API for XML) - event-driven streaming parser
- StAX (Streaming API for XML) - pull parser
- Namespace handling (xmlns declarations)
- Entity resolution (external references)
- CDATA sections (unparsed character data)
Validation layer (built-in):
- DTD (Document Type Definition) - original schema language
- XSD (XML Schema Definition) - complex type system
- RelaxNG - alternative schema language
- Schematron - rule-based validation
Query layer (built-in):
- XPath - query language for selecting nodes
- XQuery - SQL-like language for XML
- XSLT - transformation and templating
Protocol layer (built-in):
- SOAP (Simple Object Access Protocol)
- WSDL (Web Services Description Language)
- WS-Security, WS-ReliableMessaging, WS-AtomicTransaction
- 50+ WS-* specifications
The architectural problem: Every XML parser had to support this entire stack. You couldn’t use XML without dealing with namespaces. You couldn’t validate without learning XSD. You couldn’t query without XPath.
The result:
- XML parsers: 50,000+ lines of code
- XSD validators: Complex type systems rivaling programming languages
- SOAP toolkits: Megabytes of libraries just to call a remote function
- Learning curve: Months to master the ecosystem
Specific pain points:
Verbosity:
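A single field, sketched in both formats (field name illustrative):

```xml
<user>
  <name>Alice</name>
</user>
```

vs

```json
{ "name": "Alice" }
```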
Namespace confusion:
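A sketch of how quickly prefixes pile up in a SOAP-style document (the `m` namespace URI is illustrative) - which prefix maps to which URI, and why?

```xml
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soap:Body>
    <m:GetUser xmlns:m="http://example.com/users">
      <m:UserId xsi:type="xsd:int">42</m:UserId>
    </m:GetUser>
  </soap:Body>
</soap:Envelope>
```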
Schema complexity (XSD):
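A minimal sketch of declaring one object with a name and an age:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```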
vs JSON Schema:
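The rough equivalent:

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "age": { "type": "integer" }
  }
}
```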
The real killer: Developer experience. Writing XML by hand was tedious. Reading XML logs was painful. Debugging SOAP requests required specialized tools. The monolithic architecture meant you couldn’t use just the parts you needed - it was all or nothing.
JSON’s Accidental Discovery
Douglas Crockford’s Realization (2001)
JSON wasn’t invented - it was discovered. Douglas Crockford realized that JavaScript’s object literal notation was already a perfect data format:
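A plain object literal, as it might have appeared in any script of that era (values illustrative):

```javascript
var person = {
  name: "Alice Johnson",
  email: "alice@example.com",
  active: true
};
```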
Key insight: This notation was:
- Already in JavaScript engines (browsers everywhere)
- Minimal syntax (no closing tags)
- Easy to parse (recursive descent parser is ~500 lines)
- Human-readable
- Machine-friendly
The Same Data in JSON
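The same illustrative record:

```json
{
  "id": 1,
  "name": "Alice Johnson",
  "email": "alice@example.com",
  "age": 30,
  "address": {
    "city": "Berlin",
    "country": "Germany"
  }
}
```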
Size: 129 bytes (roughly half the size of the XML version)
The Simplicity Revolution
JSON’s radical simplification:
Six data types:
- object - { "key": "value" }
- array - [1, 2, 3]
- string - "text"
- number - 123 or 123.45
- boolean - true or false
- null - null
That’s it. No attributes. No namespaces. No CDATA sections. No processing instructions.
Browser Native Support
The killer feature:
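Two built-in functions, no library:

```javascript
// Serialize an object to a JSON string
const payload = JSON.stringify({ name: "Alice", active: true });

// Parse a JSON string back into an object
const user = JSON.parse('{"name": "Alice", "active": true}');
console.log(user.name); // "Alice"
```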
No XML parser library needed. No SAX vs DOM decision. Just two functions.
Why JSON Won
1. The AJAX Revolution (2005)
Google Maps launched and changed everything. AJAX (Asynchronous JavaScript and XML) applications became the future of the web.
Irony: Despite the name, JSON quickly replaced XML in AJAX because:
- Faster to parse in JavaScript
- Smaller payloads (bandwidth mattered on 2005 connections)
- Native browser support
- Easier for front-end developers
2. REST vs SOAP
REST APIs adopted JSON as the default format:
SOAP request (XML):
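Something like this just to fetch one user (operation name and namespace are illustrative):

```xml
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetUser xmlns="http://example.com/users">
      <UserId>12345</UserId>
    </GetUser>
  </soap:Body>
</soap:Envelope>
```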
REST request (JSON):
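The REST counterpart is a plain HTTP call (path illustrative):

```http
GET /api/users/12345 HTTP/1.1
Accept: application/json
```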
REST response:
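And the response, already in the shape the client wants (fields illustrative):

```json
{
  "id": 12345,
  "name": "Alice Johnson",
  "email": "alice@example.com"
}
```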
The difference was stark. REST + JSON became the de facto standard for web APIs.
3. NoSQL Movement (2009+)
MongoDB, CouchDB, and other NoSQL databases chose JSON-like formats:
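A document as you might insert it from the MongoDB shell (a sketch; field names are assumptions):

```javascript
db.users.insertOne({
  name: "Alice Johnson",
  email: "alice@example.com",
  tags: ["admin", "beta"],
  profile: { city: "Berlin", joined: "2024-01-15" }
});
```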
Why JSON for databases:
- Schema flexibility (add fields without migrations)
- Direct JavaScript integration
- Document model matches JSON structure
- Query results are already in API format
4. Configuration Files
JSON displaced XML in configuration:
package.json (Node.js):
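A minimal sketch:

```json
{
  "name": "my-app",
  "version": "1.0.0",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "express": "^4.18.0"
  }
}
```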
tsconfig.json (TypeScript):
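Likewise a minimal sketch:

```json
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "strict": true,
    "outDir": "./dist"
  },
  "include": ["src"]
}
```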
Developers preferred JSON over XML for configuration because it was easier to read and edit.
5. Language Support Explosion
By 2010, every major language had JSON support:
Go:
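A minimal sketch using the standard encoding/json package:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type User struct {
	Name  string `json:"name"`
	Email string `json:"email"`
}

func main() {
	data, _ := json.Marshal(User{Name: "Alice", Email: "alice@example.com"})
	fmt.Println(string(data)) // {"name":"Alice","email":"alice@example.com"}
}
```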
Python:
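A minimal sketch using the standard json module:

```python
import json

data = json.dumps({"name": "Alice", "email": "alice@example.com"})
user = json.loads(data)
print(user["name"])  # Alice
```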
Java:
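A minimal sketch using the widely adopted Jackson library (one of several common choices):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class JsonDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        String json = mapper.writeValueAsString(Map.of("name", "Alice"));
        Map<?, ?> user = mapper.readValue(json, Map.class);
        System.out.println(user.get("name")); // Alice
    }
}
```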
The same illustrative record weighs in at roughly 247 bytes as XML, 129 bytes as JSON, and 98 bytes as YAML. Measured across size, parse speed, write speed, and human readability, JSON strikes the best balance of the three.
JSON’s Fundamental Weaknesses
Now we reach the core problem. JSON won because it was simple. But that simplicity came with trade-offs that become painful at scale.
1. No Schema or Validation
The problem:
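Two payloads, both syntactically valid:

```json
{ "name": "Alice", "age": 30 }
```

```json
{ "name": "Alice", "age": "30" }
```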
Is age a string or a number? Both are valid JSON. The parser accepts both. Your application crashes when it expects a number.
Real-world consequences:
- API breaking changes go undetected
- Invalid data passes validation
- Runtime errors instead of compile-time checks
- Documentation is separate from data format
- Client-server contract is implicit, not explicit
2. No Date/Time Type
JSON has no standard way to represent dates:
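Three common conventions, all of them just strings or numbers to the parser:

```json
{ "created": "2024-01-15T09:30:00Z" }
```

```json
{ "created": 1705311000 }
```

```json
{ "created": "15/01/2024 09:30" }
```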
All are valid JSON. Which format do you use? ISO 8601 string? Unix timestamp? Custom format?
Every project reinvents this. Libraries make assumptions. APIs document their chosen format. Parsing errors happen when formats don’t match.
3. Number Precision Issues
JavaScript uses IEEE 754 double-precision floats for all numbers:
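For example, in the browser console:

```javascript
// Integers above Number.MAX_SAFE_INTEGER (2^53 - 1) silently lose precision
JSON.parse('{"id": 9007199254740993}').id; // 9007199254740992

// Classic floating-point rounding
0.1 + 0.2; // 0.30000000000000004
```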
Critical Production Issue: JSON’s number type causes real-world failures:
- Database IDs beyond 2^53 silently corrupt (Snowflake IDs, Twitter IDs)
- Financial calculations lose cents ($1234.56 becomes $1234.5599999999)
- High-precision timestamps break (nanosecond-epoch values already exceed 2^53)
- Different languages parse differently (Python preserves precision, JavaScript doesn’t)
This isn’t theoretical - Twitter added id_str alongside its numeric IDs precisely because JavaScript clients corrupted 64-bit tweet IDs, and many modern APIs (Stripe, GitHub’s GraphQL node IDs) sidestep the problem by using string identifiers outright. If your API exposes 64-bit numeric identifiers, you WILL hit this.
Problems:
- Large integers lose precision (database IDs, timestamps)
- No distinction between integer and float
- Different languages handle this differently
- Financial calculations require special handling
Common workaround:
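Send the value as a string:

```json
{ "id": "9007199254740993", "balance": "1234.56" }
```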
Represent numbers as strings to preserve precision. But now you need custom parsing logic.
Real-world examples:
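Twitter's API, for instance, ships both forms side by side - the raw number (which JavaScript clients mangle) and a safe string copy:

```json
{
  "id": 1234567890123456789,
  "id_str": "1234567890123456789"
}
```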
4. No Comments
You cannot add comments to JSON:
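A config file you can't annotate (settings illustrative):

```json
{
  "debug": true,
  "maxConnections": 100,
  "timeout": 30
}
```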
Why is debug enabled? What does this configuration do? You can’t document it in the file itself.
Workarounds:
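A commonly seen hack:

```json
{
  "_comment": "debug is enabled until the login issue is resolved",
  "debug": true,
  "maxConnections": 100
}
```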
Use fake fields for comments. But parsers still process these as data.
5. No Binary Data Support
JSON is text-based. Binary data must be encoded:
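Typically as Base64 (the payload below is just the bytes of "hello", for illustration):

```json
{
  "filename": "avatar.png",
  "data": "aGVsbG8="
}
```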
Problems:
- Base64 encoding increases size by ~33%
- Additional encoding/decoding overhead
- Not efficient for large binary files
6. Verbose for Large Datasets
Repeated field names add significant overhead:
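Every record repeats every key:

```json
[
  { "id": 1, "name": "Alice", "email": "alice@example.com" },
  { "id": 2, "name": "Bob", "email": "bob@example.com" },
  { "id": 3, "name": "Carol", "email": "carol@example.com" }
]
```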
Field names (“id”, “name”, “email”) repeat for every record. In a 100,000 row dataset, this is wasteful.
CSV alternative (for comparison):
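The same three rows:

```csv
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
3,Carol,carol@example.com
```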
More compact, but loses type information and nested structure support.
7. No Circular References
JSON cannot represent circular references:
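JSON.stringify simply throws:

```javascript
const node = { name: "root" };
node.self = node; // the object now references itself

JSON.stringify(node);
// TypeError: Converting circular structure to JSON
```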
You must manually break cycles or use a serialization library that detects and handles them.
The Format Comparison Landscape
Let’s compare JSON to its alternatives across key dimensions:
| Feature | JSON | XML | YAML | TOML | Protocol Buffers |
|---|---|---|---|---|---|
| Human-readable | Yes | Yes | Yes | Yes | No |
| Schema validation | No* | Yes | No | No | Yes |
| Comments | No | Yes | Yes | Yes | No |
| Binary support | No | No | No | No | Yes |
| Date types | No | No | No | Yes | Yes |
| Size efficiency | Medium | Large | Medium | Medium | Small |
| Parse speed | Fast | Slow | Medium | Medium | Very Fast |
| Language support | Universal | Universal | Wide | Growing | Wide |
| Nested structures | Yes | Yes | Yes | Limited | Yes |
| Trailing commas | No | N/A | Yes | Yes | N/A |
| Type safety | No | Yes | No | Partial | Yes |
*JSON Schema provides validation but isn’t part of JSON itself.
A rough decision guide: configuration files that need comments point to YAML, simple configs to TOML, and complex schemas to XML; web APIs default to JSON, while microservices and performance-critical paths favor Protocol Buffers (or MessagePack/CBOR when you want binary plus a schema); legacy enterprise integration still means XML/SOAP, and plain tabular data is often best served by CSV.
When NOT to Use JSON
Despite JSON’s dominance, there are clear cases where alternatives are better:
1. High-Performance Systems → Protocol Buffers, FlatBuffers
When you’re handling millions of requests per second, Protocol Buffers offer compelling advantages:
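The schema is defined once and compiled into each language (a minimal sketch; field names illustrative):

```protobuf
syntax = "proto3";

message User {
  int64 id = 1;
  string name = 2;
  string email = 3;
}
```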
Benefits:
- 3-10x smaller than JSON
- 5-20x faster to parse
- Schema enforced at compile time
- Backward/forward compatibility built-in
Trade-off: Not human-readable, requires schema compilation.
Read more: Understanding Protocol Buffers: Part 1
2. Human-Edited Configuration → YAML, TOML, JSON5
When developers edit config files frequently:
TOML:
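A sketch of a typical config (keys illustrative):

```toml
# Server settings
[server]
host = "0.0.0.0"
port = 8080

[features]
debug = false
max_connections = 1000
```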
YAML:
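And the same settings in YAML:

```yaml
# Server settings
server:
  host: 0.0.0.0
  port: 8080

features:
  debug: false
  max_connections: 1000
```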
Benefits: Comments, less syntax noise, more readable.
Trade-off: YAML has subtle parsing gotchas (indentation, special values like no/yes).
3. Large Tabular Datasets → CSV, Parquet, Arrow
For analytics and data pipelines:
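A few event rows as CSV, for illustration; columnar formats like Parquet and Arrow store the same columns compressed and typed:

```csv
user_id,event,timestamp
1,login,2024-01-15T09:30:00Z
1,view_profile,2024-01-15T09:31:12Z
2,login,2024-01-15T09:32:05Z
```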
Benefits: Much more compact, streaming-friendly, tooling optimized for analysis.
Trade-off: No nested structures, limited type information.
4. Document Storage → BSON, MessagePack
When JSON-like flexibility meets binary efficiency:
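In MongoDB's shell, for example, a document can carry types plain JSON lacks (an illustrative sketch):

```javascript
db.users.insertOne({
  name: "Alice",
  createdAt: new Date("2024-01-15T09:30:00Z"), // stored as a native BSON date
  avatar: BinData(0, "aGVsbG8=")               // stored as raw binary, not a text field
});
```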
Benefits: Native date types, binary data support, efficient storage.
Trade-off: Binary format, language-specific implementations.
The Evolution: JSON’s Ecosystem Response
JSON’s limitations didn’t kill it. Instead, an entire ecosystem evolved to address the weaknesses while preserving the core simplicity:
1. Validation Layer: JSON Schema
Problem: No built-in validation
Solution: External schema language
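A sketch of a schema for a simple user record (fields illustrative):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "name": { "type": "string" },
    "email": { "type": "string", "format": "email" }
  },
  "required": ["id", "name", "email"]
}
```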
Next article: Part 2 dives deep into JSON Schema - how it works, why it matters, and how it solves JSON’s validation problem.
2. Binary Variants: JSONB, BSON, MessagePack
Problem: Text format is inefficient
Solution: Binary encoding with JSON-like structure
These formats maintain JSON’s structure while using efficient binary serialization:
- PostgreSQL JSONB: Decomposed binary format, indexable, faster queries
- MongoDB BSON: Binary JSON with extended types
- MessagePack: Universal binary serialization
3. Streaming Format: JSON Lines (JSONL)
Problem: JSON arrays don’t stream
Solution: Newline-delimited JSON objects
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Carol"}
Each line is independent, enabling streaming, log files, and Unix pipeline processing.
4. Protocol Layer: JSON-RPC
Problem: No standard RPC convention
Solution: Structured request/response format
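A request/response pair following the JSON-RPC 2.0 convention (method name illustrative):

```json
{ "jsonrpc": "2.0", "method": "getUserById", "params": { "id": 12345 }, "id": 1 }
```

```json
{ "jsonrpc": "2.0", "result": { "id": 12345, "name": "Alice Johnson" }, "id": 1 }
```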
Used by Ethereum, LSP (Language Server Protocol), and many other systems.
5. Human-Friendly Variants: JSON5, HJSON
Problem: No comments, strict syntax
Solution: Relaxed JSON with comments and trailing commas
JSON deliberately omitted developer-friendly features to stay minimal. For machine-to-machine communication, this is fine. For configuration files humans edit daily, it’s painful.
JSON5 adds convenience features while maintaining JSON compatibility:
```json5
{
  // Single-line comments
  /* Multi-line comments */
  name: 'my-app',         // Unquoted keys, single-quoted strings
  port: 8080,
  features: {
    debug: false,
    maxConnections: 1000, // Trailing comma after the last item is OK
  },
  // Multi-line strings via backslash line continuation
  description: 'This is a \
multi-line description',
}
```
HJSON goes further with extreme readability:
```hjson
{
  # Hash comments (like YAML)
  # Quotes optional for strings
  name: my-app
  port: 8080
  # Commas optional
  features: {
    debug: false
    maxConnections: 1000
  }
  # Multi-line strings without escaping
  description:
    '''
    This is a naturally
    multi-line description
    '''
}
```
Comparison:
| Feature | JSON | JSON5 | HJSON | YAML | TOML |
|---|---|---|---|---|---|
| Comments | No | Yes | Yes | Yes | Yes |
| Trailing commas | No | Yes | Yes | N/A | N/A |
| Unquoted keys | No | Yes | Yes | Yes | Yes |
| Unquoted strings | No | No | Yes | Yes | No |
| Native browser support | Yes | No | No | No | No |
| Designed for configs | No | Partial | Yes | Yes | Yes |
When to use:
JSON5:
- VSCode settings (.vscode/settings.json5)
- Build tool configs where JSON is expected
- Need comments but want JSON compatibility
HJSON:
- Developer-facing configs prioritizing readability
- Local development settings
- Documentation examples
Standard JSON:
- APIs and data interchange
- Production configs (parsed by machines)
- Anything needing browser/native support
Why they’re niche: Unlike JSON Schema (essential for validation) or JSONB (essential for performance), JSON5/HJSON solve a convenience problem that YAML and TOML also solve. Most teams choose YAML or TOML for configuration files - they were designed for this purpose from the start and have broader ecosystem support.
6. Security Layer: JWS, JWE
Problem: No built-in security
Solution: JSON Web Signatures and Encryption standards
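A JWS signs an ordinary JSON payload. The protected header is itself JSON:

```json
{ "alg": "HS256", "typ": "JWT" }
```

Header and payload are Base64url-encoded and joined with the signature into the familiar three-part compact token, header.payload.signature.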
The ecosystem at a glance: a tiny core (JSON itself, RFC 8259) surrounded by layered extensions - JSON Schema for validation, JSONB/BSON for binary storage, JSON Lines for streaming, JSON-RPC for protocols, JSON5/HJSON for human-friendly configs, and JWT/JWS/JWE for security.
Running Example: Building a User API
Throughout this series, we’ll follow a single use case: a User API for a social platform. Each part will show how that layer of the ecosystem improves this real-world scenario.
The scenario:
- REST API for user management
- 10 million users in PostgreSQL
- Mobile and web clients
- Need authentication, validation, performance, and security
Part 1 (this article): The basic JSON structure
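Something like this (fields illustrative):

```json
{
  "id": 12345,
  "name": "Alice Johnson",
  "email": "alice@example.com",
  "created_at": "2024-01-15T09:30:00Z"
}
```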
What’s missing:
- No validation (what if email is invalid?)
- Inefficient storage (text format repeated 10M times)
- Can’t stream user exports (arrays don’t stream)
- No authentication (how do we secure this?)
- No protocol (how do clients call getUserById?)
The journey ahead:
- Part 2: Add JSON Schema validation for type safety
- Part 3: Store users in PostgreSQL JSONB for performance
- Part 5: Add JSON-RPC protocol for structured API calls
- Part 6: Export users with JSON Lines for streaming
- Part 7: Secure the API with JWT authentication
This single API will demonstrate how each ecosystem layer solves a real problem.
Conclusion: JSON’s Success Through Simplicity
JSON won not because it was perfect, but because it was simple enough to understand, implement, and adopt universally. Its weaknesses are real, but they’re addressable through layered solutions.
What made JSON win:
- Minimal syntax (6 data types, simple rules)
- Browser native support (JSON.parse/stringify)
- Perfect timing (AJAX era, REST movement)
- Universal language support (parsers in everything)
- Good enough for most use cases
What JSON lacks:
- Schema validation (solved by JSON Schema)
- Binary efficiency (solved by JSONB, BSON, MessagePack)
- Streaming support (solved by JSON Lines)
- Protocol conventions (solved by JSON-RPC)
- Human-friendly syntax (solved by JSON5, HJSON)
The JSON ecosystem evolved to patch these gaps while preserving the core simplicity that made JSON successful.
Series Roadmap: This series explores the JSON ecosystem:
- Part 1 (this article): Origins and fundamental weaknesses
- Part 2: JSON Schema - validation, types, and contracts
- Parts 3-4: Binary JSON - JSONB, BSON, MessagePack
- Part 5: JSON-RPC and protocol layers
- Part 6: Streaming JSON - JSON Lines and large datasets
- Part 7: Security - JWT, canonicalization, and attacks
- Part 8: Lessons from the JSON revolution
In Part 2, we’ll solve JSON’s most critical weakness: the lack of validation. JSON Schema transforms JSON from “untyped text” into “strongly validated contracts” without sacrificing simplicity. We’ll explore how to define schemas, validate data at runtime, generate code from schemas, and integrate validation into your entire stack.
The core problem JSON Schema solves: How do you maintain the simplicity of JSON while gaining the safety of typed, validated data?
Next: You Don’t Know JSON: Part 2 - JSON Schema and the Art of Validation
📚 Series: You Don't Know JSON
- You Don't Know JSON: Part 1 - Origins, Evolution, and the Cracks in the Foundation (current)
- You Don't Know JSON: Part 2 - JSON Schema and the Art of Validation
- You Don't Know JSON: Part 3 - Binary JSON in Databases
- You Don't Know JSON: Part 4 - Binary JSON for APIs and Data Transfer
- You Don't Know JSON: Part 5 - JSON-RPC: When REST Isn't Enough
- You Don't Know JSON: Part 6 - JSON Lines: Processing Gigabytes Without Running Out of Memory
- You Don't Know JSON: Part 7 - Security: Authentication, Signatures, and Attacks
- You Don't Know JSON: Part 8 - Lessons from the JSON Revolution