You Don't Know JSON: Part 1 - Origins, Evolution, and the Cracks in the Foundation
The complete history of JSON from Douglas Crockford's discovery to today's dominance. Learn why JSON replaced XML, where it fails, and why the JSON ecosystem evolved beyond basic key-value pairs.
- tags
- #Json #Data-Formats #Xml #Yaml #Serialization #Web-Development #Api-Design #Javascript #Rest-Api #Data-Interchange #Json-Schema #Standards #Rfc #History #Web-Standards #Parsing #Validation #Configuration #Distributed-Systems #Microservices
- categories
- Fundamentals Programming
- published
- reading time
- 18 minutes
📚 Series: You Don't Know JSON
- You Don't Know JSON: Part 1 - Origins, Evolution, and the Cracks in the Foundation (current)
- You Don't Know JSON: Part 2 - JSON Schema and the Art of Validation
- You Don't Know JSON: Part 3 - Binary JSON in Databases
- You Don't Know JSON: Part 4 - Binary JSON for APIs and Data Transfer
- You Don't Know JSON: Part 5 - JSON-RPC: When REST Isn't Enough
- You Don't Know JSON: Part 6 - JSON Lines: Processing Gigabytes Without Running Out of Memory
- You Don't Know JSON: Part 7 - Security: Authentication, Signatures, and Attacks
- You Don't Know JSON: Part 8 - Lessons from the JSON Revolution
Every developer knows JSON. You’ve written {"key": "value"} thousands of times. You’ve debugged missing commas, fought with trailing characters, and cursed the lack of comments in configuration files.
But how did we get here? Why does the world’s most popular data format have such obvious limitations? And why, despite being “simple,” has JSON spawned an entire ecosystem of variants, extensions, and workarounds?
This series explores the JSON you don’t know - the one beyond basic syntax. We’ll examine binary formats, streaming protocols, validation schemas, RPC layers, and security considerations. But first, we need to understand why JSON exists and where it falls short.
What XML Had: Everything built-in (1998-2005)
XML’s approach: Monolithic specification with validation (XSD), transformation (XSLT), namespaces, querying (XPath), protocols (SOAP), and security (XML Signature/Encryption) all integrated into one ecosystem.
Benefit: Complete solution with built-in type safety, validation, and extensibility
Cost: Massive complexity, steep learning curve, rigid coupling between features
JSON’s approach: Minimal core with separate standards for each need
Architecture shift: Integrated → Modular, Everything built-in → Composable solutions, Monolithic → Ecosystem-driven
The Pre-JSON Dark Ages: XML Everywhere
The Problem Space (Late 1990s)
The web was growing explosively. Websites evolved from static HTML to dynamic applications. Services needed to communicate across networks, applications needed configuration files, and developers needed a way to move structured data between systems.
The requirements were clear:
- Human-readable (developers must debug it)
- Machine-parseable (computers must process it)
- Language-agnostic (works in any programming language)
- Supports nested structures (real data has hierarchy)
- Self-describing (data carries its own schema)
XML: The Heavyweight Champion
XML (eXtensible Markup Language) emerged as the answer. By the early 2000s, it dominated:
XML everywhere:
- Configuration files (web.xml, applicationContext.xml)
- SOAP web services (the enterprise standard)
- Data exchange (RSS, Atom feeds)
- Document formats (DOCX, SVG)
- Build systems (Maven pom.xml, Ant build.xml)
A simple person record in XML:
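Something along these lines (the exact fields here are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<person>
  <id>1</id>
  <name>Alice Johnson</name>
  <email>alice@example.com</email>
  <age>30</age>
  <address>
    <city>Berlin</city>
    <country>Germany</country>
  </address>
</person>
```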
Size: 247 bytes
XML’s Strengths
XML wasn’t chosen arbitrarily. It had real advantages:
+ Schema validation (XSD, DTD, RelaxNG)
+ Namespaces (avoid naming conflicts)
+ XPath (query language)
+ XSLT (transformation)
+ Comments (documentation support)
+ Attributes and elements (flexible modeling)
+ Mature tooling (parsers in every language)
XML’s Fatal Flaws: The Monolithic Architecture
But XML’s complexity became its downfall. The problem wasn’t any single feature - it was the architectural decision to build everything into one specification.
XML wasn’t just a data format. It was an entire technology stack:
Core XML (parsing and structure):
- DOM (Document Object Model) - load entire document into memory
- SAX (Simple API for XML) - event-driven streaming parser
- StAX (Streaming API for XML) - pull parser
- Namespace handling (xmlns declarations)
- Entity resolution (external references)
- CDATA sections (unparsed character data)
Validation layer (built-in):
- DTD (Document Type Definition) - original schema language
- XSD (XML Schema Definition) - complex type system
- RelaxNG - alternative schema language
- Schematron - rule-based validation
Query layer (built-in):
- XPath - query language for selecting nodes
- XQuery - SQL-like language for XML
- XSLT - transformation and templating
Protocol layer (built-in):
- SOAP (Simple Object Access Protocol)
- WSDL (Web Services Description Language)
- WS-Security, WS-ReliableMessaging, WS-AtomicTransaction
- 50+ WS-* specifications
The architectural problem: Every XML parser had to support this entire stack. You couldn’t use XML without dealing with namespaces. You couldn’t validate without learning XSD. You couldn’t query without XPath.
The result:
- XML parsers: 50,000+ lines of code
- XSD validators: Complex type systems rivaling programming languages
- SOAP toolkits: Megabytes of libraries just to call a remote function
- Learning curve: Months to master the ecosystem
Specific pain points:
Verbosity:
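A single field, sketched in both formats (field name illustrative):

```xml
<user>
  <name>Alice</name>
</user>
```

vs

```json
{ "name": "Alice" }
```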
Namespace confusion:
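A sketch of how quickly prefixes pile up in a SOAP-style document (the `m` namespace URI is illustrative) - which prefix maps to which URI, and why?

```xml
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soap:Body>
    <m:GetUser xmlns:m="http://example.com/users">
      <m:UserId xsi:type="xsd:int">42</m:UserId>
    </m:GetUser>
  </soap:Body>
</soap:Envelope>
```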
Schema complexity (XSD):
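A minimal sketch of declaring one object with a name and an age:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```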
vs JSON Schema:
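The rough equivalent:

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "age": { "type": "integer" }
  }
}
```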
The real killer: Developer experience. Writing XML by hand was tedious. Reading XML logs was painful. Debugging SOAP requests required specialized tools. The monolithic architecture meant you couldn’t use just the parts you needed - it was all or nothing.
JSON’s Accidental Discovery
Douglas Crockford’s Realization (2001)
JSON wasn’t invented - it was discovered. Douglas Crockford realized that JavaScript’s object literal notation was already a perfect data format:
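A plain object literal, as it might have appeared in any script of that era (values illustrative):

```javascript
var person = {
  name: "Alice Johnson",
  email: "alice@example.com",
  active: true
};
```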
Key insight: This notation was:
- Already in JavaScript engines (browsers everywhere)
- Minimal syntax (no closing tags)
- Easy to parse (recursive descent parser is ~500 lines)
- Human-readable
- Machine-friendly
The Same Data in JSON
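The same illustrative record:

```json
{
  "id": 1,
  "name": "Alice Johnson",
  "email": "alice@example.com",
  "age": 30,
  "address": {
    "city": "Berlin",
    "country": "Germany"
  }
}
```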
Size: 129 bytes (roughly half the size of the XML version)
The Simplicity Revolution
JSON’s radical simplification:
Six data types:
- object - { "key": "value" }
- array - [1, 2, 3]
- string - "text"
- number - 123 or 123.45
- boolean - true or false
- null - null
That’s it. No attributes. No namespaces. No CDATA sections. No processing instructions.
Browser Native Support
The killer feature:
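Two built-in functions, no library:

```javascript
// Serialize an object to a JSON string
const payload = JSON.stringify({ name: "Alice", active: true });

// Parse a JSON string back into an object
const user = JSON.parse('{"name": "Alice", "active": true}');
console.log(user.name); // "Alice"
```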
No XML parser library needed. No SAX vs DOM decision. Just two functions.
Why JSON Won
1. The AJAX Revolution (2005)
Google Maps launched and changed everything. AJAX (Asynchronous JavaScript and XML) applications became the future of the web.
Irony: Despite the name, JSON quickly replaced XML in AJAX because:
- Faster to parse in JavaScript
- Smaller payloads (bandwidth mattered on 2005 connections)
- Native browser support
- Easier for front-end developers
2. REST vs SOAP
REST APIs adopted JSON as the default format:
SOAP request (XML):
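Something like this just to fetch one user (operation name and namespace are illustrative):

```xml
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetUser xmlns="http://example.com/users">
      <UserId>12345</UserId>
    </GetUser>
  </soap:Body>
</soap:Envelope>
```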
REST request (JSON):
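The REST counterpart is a plain HTTP call (path illustrative):

```http
GET /api/users/12345 HTTP/1.1
Accept: application/json
```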
REST response:
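And the response, already in the shape the client wants (fields illustrative):

```json
{
  "id": 12345,
  "name": "Alice Johnson",
  "email": "alice@example.com"
}
```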
The difference was stark. REST + JSON became the de facto standard for web APIs.
3. NoSQL Movement (2009+)
MongoDB, CouchDB, and other NoSQL databases chose JSON-like formats:
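A document as you might insert it from the MongoDB shell (a sketch; field names are assumptions):

```javascript
db.users.insertOne({
  name: "Alice Johnson",
  email: "alice@example.com",
  tags: ["admin", "beta"],
  profile: { city: "Berlin", joined: "2024-01-15" }
});
```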
Why JSON for databases:
- Schema flexibility (add fields without migrations)
- Direct JavaScript integration
- Document model matches JSON structure
- Query results are already in API format
4. Configuration Files
JSON displaced XML in configuration:
package.json (Node.js):
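A minimal sketch:

```json
{
  "name": "my-app",
  "version": "1.0.0",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "express": "^4.18.0"
  }
}
```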
tsconfig.json (TypeScript):
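Likewise a minimal sketch:

```json
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "strict": true,
    "outDir": "./dist"
  },
  "include": ["src"]
}
```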
Developers preferred JSON over XML for configuration because it was easier to read and edit.
5. Language Support Explosion
By 2010, every major language had JSON support:
Go:
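A minimal sketch using the standard encoding/json package:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type User struct {
	Name  string `json:"name"`
	Email string `json:"email"`
}

func main() {
	data, _ := json.Marshal(User{Name: "Alice", Email: "alice@example.com"})
	fmt.Println(string(data)) // {"name":"Alice","email":"alice@example.com"}
}
```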
Python:
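A minimal sketch using the standard json module:

```python
import json

data = json.dumps({"name": "Alice", "email": "alice@example.com"})
user = json.loads(data)
print(user["name"])  # Alice
```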
Java:
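A minimal sketch using the widely adopted Jackson library (one of several common choices):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class JsonDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        String json = mapper.writeValueAsString(Map.of("name", "Alice"));
        Map<?, ?> user = mapper.readValue(json, Map.class);
        System.out.println(user.get("name")); // Alice
    }
}
```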
The same illustrative record weighs in at roughly 247 bytes as XML, 129 bytes as JSON, and 98 bytes as YAML. Measured across size, parse speed, write speed, and human readability, JSON strikes the best balance of the three.
JSON’s Fundamental Weaknesses
Now we reach the core problem. JSON won because it was simple. But that simplicity came with trade-offs that become painful at scale.
1. No Schema or Validation
The problem:
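Two payloads, both syntactically valid:

```json
{ "name": "Alice", "age": 30 }
```

```json
{ "name": "Alice", "age": "30" }
```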
Is age a string or a number? Both are valid JSON. The parser accepts both. Your application crashes when it expects a number.
Real-world consequences:
- API breaking changes go undetected
- Invalid data passes validation
- Runtime errors instead of compile-time checks
- Documentation is separate from data format
- Client-server contract is implicit, not explicit
2. No Date/Time Type
JSON has no standard way to represent dates:
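Three common conventions, all of them just strings or numbers to the parser:

```json
{ "created": "2024-01-15T09:30:00Z" }
```

```json
{ "created": 1705311000 }
```

```json
{ "created": "15/01/2024 09:30" }
```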
All are valid JSON. Which format do you use? ISO 8601 string? Unix timestamp? Custom format?
Every project reinvents this. Libraries make assumptions. APIs document their chosen format. Parsing errors happen when formats don’t match.
3. Number Precision Issues
JavaScript uses IEEE 754 double-precision floats for all numbers:
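For example, in the browser console:

```javascript
// Integers above Number.MAX_SAFE_INTEGER (2^53 - 1) silently lose precision
JSON.parse('{"id": 9007199254740993}').id; // 9007199254740992

// Classic floating-point rounding
0.1 + 0.2; // 0.30000000000000004
```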
Critical Production Issue: JSON’s number type causes real-world failures:
- Database IDs beyond 2^53 silently corrupt (Snowflake IDs, Twitter IDs)
- Financial calculations lose cents ($1234.56 becomes $1234.5599999999)
- High-precision timestamps break (nanosecond-epoch values already exceed 2^53)
- Different languages parse differently (Python preserves precision, JavaScript doesn’t)
This isn’t theoretical - Twitter added id_str alongside its numeric IDs precisely because JavaScript clients corrupted 64-bit tweet IDs, and many modern APIs (Stripe, GitHub’s GraphQL node IDs) sidestep the problem by using string identifiers outright. If your API exposes 64-bit numeric identifiers, you WILL hit this.
Problems:
- Large integers lose precision (database IDs, timestamps)
- No distinction between integer and float
- Different languages handle this differently
- Financial calculations require special handling
Common workaround:
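Send the value as a string:

```json
{ "id": "9007199254740993", "balance": "1234.56" }
```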
Represent numbers as strings to preserve precision. But now you need custom parsing logic.
Real-world examples:
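Twitter's API, for instance, ships both forms side by side - the raw number (which JavaScript clients mangle) and a safe string copy:

```json
{
  "id": 1234567890123456789,
  "id_str": "1234567890123456789"
}
```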
4. No Comments
You cannot add comments to JSON:
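A config file you can't annotate (settings illustrative):

```json
{
  "debug": true,
  "maxConnections": 100,
  "timeout": 30
}
```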
Why is debug enabled? What does this configuration do? You can’t document it in the file itself.
Workarounds:
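A commonly seen hack:

```json
{
  "_comment": "debug is enabled until the login issue is resolved",
  "debug": true,
  "maxConnections": 100
}
```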
Use fake fields for comments. But parsers still process these as data.
5. No Binary Data Support
JSON is text-based. Binary data must be encoded:
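Typically as Base64 (the payload below is just the bytes of "hello", for illustration):

```json
{
  "filename": "avatar.png",
  "data": "aGVsbG8="
}
```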
Problems:
- Base64 encoding increases size by ~33%
- Additional encoding/decoding overhead
- Not efficient for large binary files
6. Verbose for Large Datasets
Repeated field names add significant overhead:
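Every record repeats every key:

```json
[
  { "id": 1, "name": "Alice", "email": "alice@example.com" },
  { "id": 2, "name": "Bob", "email": "bob@example.com" },
  { "id": 3, "name": "Carol", "email": "carol@example.com" }
]
```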
Field names (“id”, “name”, “email”) repeat for every record. In a 100,000 row dataset, this is wasteful.
CSV alternative (for comparison):
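The same three rows:

```csv
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
3,Carol,carol@example.com
```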
More compact, but loses type information and nested structure support.
7. No Circular References
JSON cannot represent circular references:
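JSON.stringify simply throws:

```javascript
const node = { name: "root" };
node.self = node; // the object now references itself

JSON.stringify(node);
// TypeError: Converting circular structure to JSON
```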
You must manually break cycles or use a serialization library that detects and handles them.
The Format Comparison Landscape
Let’s compare JSON to its alternatives across key dimensions:
| Feature | JSON | XML | YAML | TOML | Protocol Buffers |
|---|---|---|---|---|---|
| Human-readable | Yes | Yes | Yes | Yes | No |
| Schema validation | No* | Yes | No | No | Yes |
| Comments | No | Yes | Yes | Yes | No |
| Binary support | No | No | No | No | Yes |
| Date types | No | No | No | Yes | Yes |
| Size efficiency | Medium | Large | Medium | Medium | Small |
| Parse speed | Fast | Slow | Medium | Medium | Very Fast |
| Language support | Universal | Universal | Wide | Growing | Wide |
| Nested structures | Yes | Yes | Yes | Limited | Yes |
| Trailing commas | No | N/A | Yes | Yes | N/A |
| Type safety | No | Yes | No | Partial | Yes |
*JSON Schema provides validation but isn’t part of JSON itself.
A rough decision guide: configuration files that need comments point to YAML, simple configs to TOML, and complex schemas to XML; web APIs default to JSON, while microservices and performance-critical paths favor Protocol Buffers (or MessagePack/CBOR when you want binary plus a schema); legacy enterprise integration still means XML/SOAP, and plain tabular data is often best served by CSV.
When NOT to Use JSON
Despite JSON’s dominance, there are clear cases where alternatives are better:
1. High-Performance Systems → Protocol Buffers, FlatBuffers
When you’re handling millions of requests per second, Protocol Buffers offer compelling advantages:
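The schema is defined once and compiled into each language (a minimal sketch; field names illustrative):

```protobuf
syntax = "proto3";

message User {
  int64 id = 1;
  string name = 2;
  string email = 3;
}
```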
Benefits:
- 3-10x smaller than JSON
- 5-20x faster to parse
- Schema enforced at compile time
- Backward/forward compatibility built-in
Trade-off: Not human-readable, requires schema compilation.
Read more: Understanding Protocol Buffers: Part 1
2. Human-Edited Configuration → YAML, TOML, JSON5
When developers edit config files frequently:
TOML:
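A sketch of a typical config (keys illustrative):

```toml
# Server settings
[server]
host = "0.0.0.0"
port = 8080

[features]
debug = false
max_connections = 1000
```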
YAML:
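And the same settings in YAML:

```yaml
# Server settings
server:
  host: 0.0.0.0
  port: 8080

features:
  debug: false
  max_connections: 1000
```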
Benefits: Comments, less syntax noise, more readable.
Trade-off: YAML has subtle parsing gotchas (indentation, special values like no/yes).
3. Large Tabular Datasets → CSV, Parquet, Arrow
For analytics and data pipelines:
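A few event rows as CSV, for illustration; columnar formats like Parquet and Arrow store the same columns compressed and typed:

```csv
user_id,event,timestamp
1,login,2024-01-15T09:30:00Z
1,view_profile,2024-01-15T09:31:12Z
2,login,2024-01-15T09:32:05Z
```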
Benefits: Much more compact, streaming-friendly, tooling optimized for analysis.
Trade-off: No nested structures, limited type information.
4. Document Storage → BSON, MessagePack
When JSON-like flexibility meets binary efficiency:
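In MongoDB's shell, for example, a document can carry types plain JSON lacks (an illustrative sketch):

```javascript
db.users.insertOne({
  name: "Alice",
  createdAt: new Date("2024-01-15T09:30:00Z"), // stored as a native BSON date
  avatar: BinData(0, "aGVsbG8=")               // stored as raw binary, not a text field
});
```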
Benefits: Native date types, binary data support, efficient storage.
Trade-off: Binary format, language-specific implementations.
The Evolution: JSON’s Ecosystem Response
JSON’s limitations didn’t kill it. Instead, an entire ecosystem evolved to address the weaknesses while preserving the core simplicity:
1. Validation Layer: JSON Schema
Problem: No built-in validation
Solution: External schema language
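A sketch of a schema for a simple user record (fields illustrative):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "name": { "type": "string" },
    "email": { "type": "string", "format": "email" }
  },
  "required": ["id", "name", "email"]
}
```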
Next article: Part 2 dives deep into JSON Schema - how it works, why it matters, and how it solves JSON’s validation problem.
2. Binary Variants: JSONB, BSON, MessagePack
Problem: Text format is inefficient
Solution: Binary encoding with JSON-like structure
These formats maintain JSON’s structure while using efficient binary serialization:
- PostgreSQL JSONB: Decomposed binary format, indexable, faster queries
- MongoDB BSON: Binary JSON with extended types
- MessagePack: Universal binary serialization
3. Streaming Format: JSON Lines (JSONL)
Problem: JSON arrays don’t stream
Solution: Newline-delimited JSON objects
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Carol"}
Each line is independent, enabling streaming, log files, and Unix pipeline processing.
4. Protocol Layer: JSON-RPC
Problem: No standard RPC convention
Solution: Structured request/response format
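A request/response pair following the JSON-RPC 2.0 convention (method name illustrative):

```json
{ "jsonrpc": "2.0", "method": "getUserById", "params": { "id": 12345 }, "id": 1 }
```

```json
{ "jsonrpc": "2.0", "result": { "id": 12345, "name": "Alice Johnson" }, "id": 1 }
```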
Used by Ethereum, LSP (Language Server Protocol), and many other systems.
5. Human-Friendly Variants: JSON5, HJSON
Problem: No comments, strict syntax
Solution: Relaxed JSON with comments and trailing commas
JSON deliberately omitted developer-friendly features to stay minimal. For machine-to-machine communication, this is fine. For configuration files humans edit daily, it’s painful.
JSON5 adds convenience features while maintaining JSON compatibility:
```json5
{
  // Single-line comments
  /* Multi-line comments */
  name: 'my-app',         // Unquoted keys, single-quoted strings
  port: 8080,
  features: {
    debug: false,
    maxConnections: 1000, // Trailing comma after the last item is OK
  },
  // Multi-line strings via backslash line continuation
  description: 'This is a \
multi-line description',
}
```
HJSON goes further with extreme readability:
```hjson
{
  # Hash comments (like YAML)
  # Quotes optional for strings
  name: my-app
  port: 8080
  # Commas optional
  features: {
    debug: false
    maxConnections: 1000
  }
  # Multi-line strings without escaping
  description:
    '''
    This is a naturally
    multi-line description
    '''
}
```
Comparison:
| Feature | JSON | JSON5 | HJSON | YAML | TOML |
|---|---|---|---|---|---|
| Comments | No | Yes | Yes | Yes | Yes |
| Trailing commas | No | Yes | Yes | N/A | N/A |
| Unquoted keys | No | Yes | Yes | Yes | Yes |
| Unquoted strings | No | No | Yes | Yes | No |
| Native browser support | Yes | No | No | No | No |
| Designed for configs | No | Partial | Yes | Yes | Yes |
When to use:
JSON5:
- VSCode settings (.vscode/settings.json5)
- Build tool configs where JSON is expected
- Need comments but want JSON compatibility
HJSON:
- Developer-facing configs prioritizing readability
- Local development settings
- Documentation examples
Standard JSON:
- APIs and data interchange
- Production configs (parsed by machines)
- Anything needing browser/native support
Why they’re niche: Unlike JSON Schema (essential for validation) or JSONB (essential for performance), JSON5/HJSON solve a convenience problem that YAML and TOML also solve. Most teams choose YAML or TOML for configuration files - they were designed for this purpose from the start and have broader ecosystem support.
6. Security Layer: JWS, JWE
Problem: No built-in security
Solution: JSON Web Signatures and Encryption standards
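A JWS signs an ordinary JSON payload. The protected header is itself JSON:

```json
{ "alg": "HS256", "typ": "JWT" }
```

Header and payload are Base64url-encoded and joined with the signature into the familiar three-part compact token, header.payload.signature.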
The ecosystem at a glance: a tiny core (JSON itself, RFC 8259) surrounded by layered extensions - JSON Schema for validation, JSONB/BSON for binary storage, JSON Lines for streaming, JSON-RPC for protocols, JSON5/HJSON for human-friendly configs, and JWT/JWS/JWE for security.
Running Example: Building a User API
Throughout this series, we’ll follow a single use case: a User API for a social platform. Each part will show how that layer of the ecosystem improves this real-world scenario.
The scenario:
- REST API for user management
- 10 million users in PostgreSQL
- Mobile and web clients
- Need authentication, validation, performance, and security
Part 1 (this article): The basic JSON structure
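Something like this (fields illustrative):

```json
{
  "id": 12345,
  "name": "Alice Johnson",
  "email": "alice@example.com",
  "created_at": "2024-01-15T09:30:00Z"
}
```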
What’s missing:
- No validation (what if email is invalid?)
- Inefficient storage (text format repeated 10M times)
- Can’t stream user exports (arrays don’t stream)
- No authentication (how do we secure this?)
- No protocol (how do clients call getUserById?)
The journey ahead:
- Part 2: Add JSON Schema validation for type safety
- Part 3: Store users in PostgreSQL JSONB for performance
- Part 5: Add JSON-RPC protocol for structured API calls
- Part 6: Export users with JSON Lines for streaming
- Part 7: Secure the API with JWT authentication
This single API will demonstrate how each ecosystem layer solves a real problem.
Conclusion: JSON’s Success Through Simplicity
JSON won not because it was perfect, but because it was simple enough to understand, implement, and adopt universally. Its weaknesses are real, but they’re addressable through layered solutions.
What made JSON win:
- Minimal syntax (6 data types, simple rules)
- Browser native support (JSON.parse/stringify)
- Perfect timing (AJAX era, REST movement)
- Universal language support (parsers in everything)
- Good enough for most use cases
What JSON lacks:
- Schema validation (solved by JSON Schema)
- Binary efficiency (solved by JSONB, BSON, MessagePack)
- Streaming support (solved by JSON Lines)
- Protocol conventions (solved by JSON-RPC)
- Human-friendly syntax (solved by JSON5, HJSON)
The JSON ecosystem evolved to patch these gaps while preserving the core simplicity that made JSON successful.
Series Roadmap: This series explores the JSON ecosystem:
- Part 1 (this article): Origins and fundamental weaknesses
- Part 2: JSON Schema - validation, types, and contracts
- Parts 3-4: Binary JSON - JSONB, BSON, MessagePack
- Part 5: JSON-RPC and protocol layers
- Part 6: Streaming JSON - JSON Lines and large datasets
- Part 7: Security - JWT, canonicalization, and attacks
- Part 8: Lessons from the JSON revolution
In Part 2, we’ll solve JSON’s most critical weakness: the lack of validation. JSON Schema transforms JSON from “untyped text” into “strongly validated contracts” without sacrificing simplicity. We’ll explore how to define schemas, validate data at runtime, generate code from schemas, and integrate validation into your entire stack.
The core problem JSON Schema solves: How do you maintain the simplicity of JSON while gaining the safety of typed, validated data?
Next: You Don’t Know JSON: Part 2 - JSON Schema and the Art of Validation
📚 Series: You Don't Know JSON
- You Don't Know JSON: Part 1 - Origins, Evolution, and the Cracks in the Foundation (current)
- You Don't Know JSON: Part 2 - JSON Schema and the Art of Validation
- You Don't Know JSON: Part 3 - Binary JSON in Databases
- You Don't Know JSON: Part 4 - Binary JSON for APIs and Data Transfer
- You Don't Know JSON: Part 5 - JSON-RPC: When REST Isn't Enough
- You Don't Know JSON: Part 6 - JSON Lines: Processing Gigabytes Without Running Out of Memory
- You Don't Know JSON: Part 7 - Security: Authentication, Signatures, and Attacks
- You Don't Know JSON: Part 8 - Lessons from the JSON Revolution