You Don't Know JSON: Part 2 - JSON Schema and the Art of Validation

Master JSON Schema: the validation layer that transforms JSON from untyped text into strongly-validated contracts. Learn schemas, composition patterns, code generation, and OpenAPI integration with real-world examples.

In Part 1 , we explored JSON’s triumph over XML and its fundamental weakness: no built-in validation. JSON parsers accept any syntactically valid structure, but they can’t tell you if the data makes sense for your application.

1
{"age": "thirty"}
1
{"age": 30}
1
{"age": null}

All three parse successfully. But which is correct? Your application crashes at runtime when it expects a number.

What XML Had: XSD (XML Schema Definition) - 2001

XML’s approach: Built-in validation system with complex type hierarchies, inheritance, constraints, and namespaces integrated into the core specification.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
<!-- XSD schema -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="user">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="age" type="xs:integer"/>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Benefit: Comprehensive type system with inheritance and built-in validation
Cost: Extreme complexity, tight coupling to XML parsers, difficult to learn

JSON’s approach: External validation layer (JSON Schema) - separate standard

Architecture shift: Built-in validation → External validation, Complex type system → Simple constraint-based, Monolithic → Modular

JSON Schema solves this. It’s a vocabulary for defining the structure, types, and constraints of JSON documents. Think of it as TypeScript for JSON - adding type safety and validation without changing the underlying format.

This article covers:

  • How JSON Schema works (concepts and syntax)
  • Validation in Go, JavaScript, and Python
  • Advanced patterns (composition, references, recursion)
  • Code generation from schemas
  • OpenAPI integration
  • Real-world best practices

Running Example: Validating Our User API

In Part 1 , we introduced a User API for a social platform. We have basic JSON, but no validation:

1
2
3
4
5
6
7
8
9
{
  "id": "user-5f9d88c",
  "username": "alice",
  "email": "alice@example.com",
  "created": "2023-01-15T10:30:00Z",
  "bio": "Software engineer",
  "followers": 1234,
  "verified": true
}

The problems:

  • Clients could send "email": "not-an-email"
  • Nothing prevents "followers": -1000
  • Users could set "verified": true themselves
  • No validation on username length or format

What we need:

  • Email format validation
  • Numeric ranges (followers ≥ 0)
  • Required fields (username, email)
  • String constraints (username 3-20 chars)
  • Read-only fields (id, verified, created)

JSON Schema will solve all of these.


The Core Problem: Trust Nothing

Every system boundary is a vulnerability. Never trust input from external sources - users, other services, configuration files, or databases. Validate at the boundary before data enters your system.


JSON Schema Fundamentals

What is JSON Schema?

JSON Schema is itself a JSON document that describes other JSON documents.

Let’s validate our User API from Part 1:

User data:

1
2
3
4
5
6
7
8
9
{
  "id": "user-5f9d88c",
  "username": "alice",
  "email": "alice@example.com",
  "created": "2023-01-15T10:30:00Z",
  "bio": "Software engineer",
  "followers": 1234,
  "verified": true
}

User schema:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://api.example.com/schemas/user.json",
  "title": "User",
  "description": "Social platform user profile",
  "type": "object",
  "properties": {
    "id": {
      "type": "string",
      "pattern": "^user-[a-z0-9]+$",
      "readOnly": true
    },
    "username": {
      "type": "string",
      "minLength": 3,
      "maxLength": 20,
      "pattern": "^[a-z0-9_]+$"
    },
    "email": {
      "type": "string",
      "format": "email"
    },
    "created": {
      "type": "string",
      "format": "date-time",
      "readOnly": true
    },
    "bio": {
      "type": "string",
      "maxLength": 500
    },
    "followers": {
      "type": "integer",
      "minimum": 0
    },
    "verified": {
      "type": "boolean",
      "readOnly": true
    }
  },
  "required": ["username", "email"],
  "additionalProperties": false
}

This schema enforces:

  • Required fields (username, email)
  • Username format (3-20 chars, lowercase alphanumeric + underscore)
  • Valid email format
  • Non-negative followers count
  • Read-only fields (id, created, verified) - clients can’t set these
  • No additional fields allowed

Key concepts:

  • $schema - Declares which JSON Schema version you’re using
  • type - The data type this schema validates
  • properties - Object field definitions
  • required - Fields that must be present
  • additionalProperties - Whether extra fields are allowed

Schema Evolution: Draft Versions

JSON Schema has evolved through multiple draft versions:

DraftYearKey Features
Draft 42013First widely adopted version
Draft 62017const, contains, property dependencies
Draft 72018if/then/else, readOnly, writeOnly
Draft 2019-092019$recursiveRef, unevaluatedProperties
Draft 2020-122020prefixItems, $dynamicRef (current)

Always specify $schema: Different validators support different drafts. Explicit declaration prevents compatibility issues.

timeline title JSON Schema Evolution 2013 : Draft 4 - First major adoption : Basic validation keywords 2017 : Draft 6 - Property dependencies : const keyword 2018 : Draft 7 - Conditional schemas : readOnly/writeOnly 2019 : Draft 2019-09 - Recursive refs : Vocabulary system 2020 : Draft 2020-12 - Dynamic refs : Tuple validation 2024+ : Widespread tooling support : OpenAPI 3.1 alignment

Core Validation Types

String Validation

1
2
3
4
5
6
7
{
  "type": "string",
  "minLength": 3,
  "maxLength": 100,
  "pattern": "^[A-Za-z0-9_-]+$",
  "format": "email"
}

Constraints:

  • minLength / maxLength - Character count limits
  • pattern - Regular expression (ECMAScript regex flavor)
  • format - Built-in formats (see below)

Built-in formats:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
"format": "date-time"   // "2023-01-15T10:30:00Z"
"format": "date"        // "2023-01-15"
"format": "time"        // "10:30:00"
"format": "email"       // "user@example.com"
"format": "hostname"    // "example.com"
"format": "ipv4"        // "192.168.1.1"
"format": "ipv6"        // "2001:0db8::1"
"format": "uri"         // "https://example.com/path"
"format": "uuid"        // "550e8400-e29b-41d4-a716-446655440000"
"format": "regex"       // Valid regular expression

Example: Username validation

1
2
3
4
5
6
7
{
  "type": "string",
  "minLength": 3,
  "maxLength": 20,
  "pattern": "^[a-z0-9_]+$",
  "description": "Lowercase alphanumeric with underscores"
}

Number Validation

1
2
3
4
5
6
{
  "type": "integer",
  "minimum": 0,
  "maximum": 150,
  "multipleOf": 5
}
1
2
3
4
5
{
  "type": "number",
  "exclusiveMinimum": 0,
  "exclusiveMaximum": 100
}

Number vs Integer:

  • integer - Whole numbers only
  • number - Any numeric value (integers and floats)

Constraints:

  • minimum / maximum - Inclusive bounds
  • exclusiveMinimum / exclusiveMaximum - Exclusive bounds
  • multipleOf - Must be divisible by value

Boolean and Null

1
{"type": "boolean"}
1
{"type": "null"}

Multiple types allowed:

1
2
3
{
  "type": ["string", "null"]
}

This accepts strings or null, useful for optional fields.

Array Validation

Simple arrays (all items same type):

1
2
3
4
5
6
7
{
  "type": "array",
  "items": {"type": "string"},
  "minItems": 1,
  "maxItems": 10,
  "uniqueItems": true
}

Tuple validation (fixed positions):

1
2
3
4
5
6
7
8
9
{
  "type": "array",
  "prefixItems": [
    {"type": "string"},
    {"type": "number"},
    {"type": "boolean"}
  ],
  "items": false
}

This validates ["name", 42, true] but rejects arrays with different types or length.

Example: Tag list

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
{
  "type": "array",
  "items": {
    "type": "string",
    "minLength": 1,
    "maxLength": 50
  },
  "minItems": 1,
  "maxItems": 20,
  "uniqueItems": true
}

Object Validation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "email": {"type": "string", "format": "email"},
    "age": {"type": "integer", "minimum": 0}
  },
  "required": ["name", "email"],
  "additionalProperties": false
}

Key concepts:

  • properties - Expected fields
  • required - Mandatory fields (array of property names)
  • additionalProperties - Controls unexpected fields

additionalProperties strategies:

1
"additionalProperties": false

Rejects any field not in properties. Strict validation.

1
"additionalProperties": true

Allows any extra fields. Flexible validation.

1
"additionalProperties": {"type": "string"}

Allows extra fields but validates their type.

Pattern properties (dynamic field names):

1
2
3
4
5
6
{
  "type": "object",
  "patternProperties": {
    "^[a-z]+_id$": {"type": "integer"}
  }
}

Validates {"user_id": 123, "order_id": 456} where field names match the pattern.

flowchart TB subgraph validation["Validation Process"] start[Receive JSON] parse[Parse JSON] validate[Apply Schema] start --> parse parse --> validate validate --> valid{Valid?} valid -->|Yes| accept[Accept Data] valid -->|No| reject[Reject with Errors] end subgraph schema["Schema Components"] types[Type Checking] constraints[Constraints] required[Required Fields] formats[Format Validation] validate --> types validate --> constraints validate --> required validate --> formats end style validation fill:#3A4A5C,stroke:#6b7280,color:#f0f0f0 style schema fill:#3A4C43,stroke:#6b7280,color:#f0f0f0

Schema Composition: Building Complex Schemas

allOf: Intersection (AND)

Combines multiple schemas - data must satisfy all:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
{
  "allOf": [
    {
      "type": "object",
      "properties": {
        "name": {"type": "string"}
      },
      "required": ["name"]
    },
    {
      "type": "object",
      "properties": {
        "email": {"type": "string", "format": "email"}
      },
      "required": ["email"]
    }
  ]
}

Data must have both name and email. Useful for combining base schemas with extensions.

Use case: Adding audit fields

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
{
  "allOf": [
    {"$ref": "#/$defs/BaseEntity"},
    {
      "properties": {
        "created_at": {"type": "string", "format": "date-time"},
        "updated_at": {"type": "string", "format": "date-time"}
      },
      "required": ["created_at", "updated_at"]
    }
  ]
}

anyOf: Union (OR)

Data must satisfy at least one schema:

1
2
3
4
5
6
7
{
  "anyOf": [
    {"type": "string"},
    {"type": "number"},
    {"type": "null"}
  ]
}

Accepts strings, numbers, or null. Useful for flexible types.

Use case: Multiple contact methods

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
{
  "type": "object",
  "anyOf": [
    {"required": ["email"]},
    {"required": ["phone"]},
    {"required": ["address"]}
  ],
  "properties": {
    "email": {"type": "string", "format": "email"},
    "phone": {"type": "string"},
    "address": {"type": "string"}
  }
}

User must provide at least one contact method.

oneOf: Exclusive OR (XOR)

Data must satisfy exactly one schema:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
{
  "oneOf": [
    {
      "type": "object",
      "properties": {
        "credit_card": {"type": "string"}
      },
      "required": ["credit_card"]
    },
    {
      "type": "object",
      "properties": {
        "paypal_email": {"type": "string", "format": "email"}
      },
      "required": ["paypal_email"]
    }
  ]
}

User must choose exactly one payment method, not both.

not: Negation

Data must NOT match schema:

1
2
3
4
5
{
  "not": {
    "type": "null"
  }
}

Rejects null values. Useful for excluding specific patterns.

Combining composition:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
{
  "allOf": [
    {"$ref": "#/$defs/User"},
    {
      "not": {
        "properties": {
          "role": {"const": "admin"}
        }
      }
    }
  ]
}

Accepts users who are not admins.

flowchart LR subgraph composition["Schema Composition"] allof[allOf
Intersection] anyof[anyOf
Union] oneof[oneOf
Exclusive] notof[not
Negation] end subgraph examples["Use Cases"] e1[Combine schemas] e2[Flexible types] e3[Exclusive choice] e4[Exclude patterns] end allof --> e1 anyof --> e2 oneof --> e3 notof --> e4 style composition fill:#3A4A5C,stroke:#6b7280,color:#f0f0f0 style examples fill:#3A4C43,stroke:#6b7280,color:#f0f0f0

Schema Reuse and References

Local Definitions with $defs

Define reusable schemas within the document:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "user": {"$ref": "#/$defs/User"},
    "manager": {"$ref": "#/$defs/User"}
  },
  "$defs": {
    "User": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "format": "email"}
      },
      "required": ["name", "email"]
    }
  }
}

Benefits:

  • DRY principle (Don’t Repeat Yourself)
  • Single source of truth for shared types
  • Easier maintenance

External References

Reference schemas in other files:

1
2
3
{
  "$ref": "https://example.com/schemas/user.json"
}
1
2
3
{
  "$ref": "./user.json"
}
1
2
3
{
  "$ref": "./user.json#/$defs/Address"
}

Use case: Shared schema library

schemas/
  common/
    address.json
    contact.json
  user.json
  order.json
1
2
3
{
  "$ref": "./common/address.json"
}

Recursive Schemas

Self-referencing schemas for tree structures:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$defs": {
    "Node": {
      "type": "object",
      "properties": {
        "value": {"type": "string"},
        "children": {
          "type": "array",
          "items": {"$ref": "#/$defs/Node"}
        }
      }
    }
  },
  "$ref": "#/$defs/Node"
}

Validates nested tree structures:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
{
  "value": "root",
  "children": [
    {
      "value": "child1",
      "children": []
    },
    {
      "value": "child2",
      "children": [
        {"value": "grandchild", "children": []}
      ]
    }
  ]
}

Validation in Practice: Code Examples

JavaScript with AJV

AJV (Another JSON Validator) is the fastest JSON Schema validator for JavaScript:

1
npm install ajv ajv-formats

Basic validation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
const Ajv = require('ajv');
const addFormats = require('ajv-formats');

const ajv = new Ajv({allErrors: true});
addFormats(ajv);

const schema = {
  type: 'object',
  properties: {
    username: {
      type: 'string',
      minLength: 3,
      maxLength: 20,
      pattern: '^[a-z0-9_]+$'
    },
    email: {
      type: 'string',
      format: 'email'
    },
    age: {
      type: 'integer',
      minimum: 0,
      maximum: 150
    }
  },
  required: ['username', 'email'],
  additionalProperties: false
};

const validate = ajv.compile(schema);

const data = {
  username: 'alice',
  email: 'alice@example.com',
  age: 30
};

if (validate(data)) {
  console.log('Valid!');
} else {
  console.log('Validation errors:', validate.errors);
}

Error output:

1
2
3
4
5
6
7
8
9
[
  {
    instancePath: '/email',
    schemaPath: '#/properties/email/format',
    keyword: 'format',
    params: { format: 'email' },
    message: 'must match format "email"'
  }
]

TypeScript integration:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
import Ajv, {JSONSchemaType} from 'ajv';

interface User {
  username: string;
  email: string;
  age?: number;
}

const schema: JSONSchemaType<User> = {
  type: 'object',
  properties: {
    username: {type: 'string', minLength: 3},
    email: {type: 'string', format: 'email'},
    age: {type: 'integer', nullable: true}
  },
  required: ['username', 'email'],
  additionalProperties: false
};

const ajv = new Ajv();
const validate = ajv.compile(schema);

const data: unknown = JSON.parse(input);

if (validate(data)) {
  // TypeScript knows data is User here
  console.log(data.username);
}

Go with gojsonschema

1
go get github.com/xeipuuv/gojsonschema
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
package main

import (
    "fmt"
    "github.com/xeipuuv/gojsonschema"
)

func main() {
    schemaJSON := `{
        "type": "object",
        "properties": {
            "username": {
                "type": "string",
                "minLength": 3,
                "maxLength": 20
            },
            "email": {
                "type": "string",
                "format": "email"
            },
            "age": {
                "type": "integer",
                "minimum": 0
            }
        },
        "required": ["username", "email"],
        "additionalProperties": false
    }`

    dataJSON := `{
        "username": "alice",
        "email": "alice@example.com",
        "age": 30
    }`

    schemaLoader := gojsonschema.NewStringLoader(schemaJSON)
    documentLoader := gojsonschema.NewStringLoader(dataJSON)

    result, err := gojsonschema.Validate(schemaLoader, documentLoader)
    if err != nil {
        panic(err)
    }

    if result.Valid() {
        fmt.Println("Document is valid")
    } else {
        fmt.Println("Document is invalid:")
        for _, err := range result.Errors() {
            fmt.Printf("- %s: %s\n", err.Field(), err.Description())
        }
    }
}

Struct-based schema generation:

1
2
3
4
5
6
7
type User struct {
    Username string `json:"username" jsonschema:"required,minLength=3,maxLength=20"`
    Email    string `json:"email" jsonschema:"required,format=email"`
    Age      int    `json:"age,omitempty" jsonschema:"minimum=0"`
}

schema := jsonschema.Reflect(&User{})

Python with jsonschema

1
pip install jsonschema
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
from jsonschema import validate, ValidationError, Draft7Validator
import jsonschema

schema = {
    "type": "object",
    "properties": {
        "username": {
            "type": "string",
            "minLength": 3,
            "maxLength": 20,
            "pattern": "^[a-z0-9_]+$"
        },
        "email": {
            "type": "string",
            "format": "email"
        },
        "age": {
            "type": "integer",
            "minimum": 0,
            "maximum": 150
        }
    },
    "required": ["username", "email"],
    "additionalProperties": False
}

data = {
    "username": "alice",
    "email": "alice@example.com",
    "age": 30
}

try:
    validate(instance=data, schema=schema)
    print("Valid!")
except ValidationError as e:
    print(f"Validation error: {e.message}")
    print(f"Failed at path: {e.json_path}")

Detailed error handling:

1
2
3
4
5
6
validator = Draft7Validator(schema)
errors = sorted(validator.iter_errors(data), key=lambda e: e.path)

for error in errors:
    path = '.'.join(str(p) for p in error.path)
    print(f"Error at {path}: {error.message}")

Pydantic integration (Pythonic validation):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
from pydantic import BaseModel, EmailStr, Field

class User(BaseModel):
    username: str = Field(..., min_length=3, max_length=20, regex="^[a-z0-9_]+$")
    email: EmailStr
    age: int = Field(..., ge=0, le=150)

    class Config:
        extra = 'forbid'  # No additional fields

# Validation happens automatically
user = User(username="alice", email="alice@example.com", age=30)

# Export JSON Schema
print(User.schema_json(indent=2))
Performance Tip: Compile schemas once and reuse the validator. Schema compilation is expensive, but validation is fast. In AJV and most libraries, compile at application startup, not per-request.

Code Generation from Schemas

TypeScript from JSON Schema

quicktype generates TypeScript types:

1
npm install -g quicktype
1
quicktype -s schema user-schema.json -o user.ts

Output:

1
2
3
4
5
export interface User {
    username: string;
    email:    string;
    age?:     number;
}

json-schema-to-typescript:

1
npm install -D json-schema-to-typescript
1
2
3
4
5
import {compile} from 'json-schema-to-typescript';

const schema = {...};
const ts = await compile(schema, 'User');
console.log(ts);

Go from JSON Schema

go-jsonschema:

1
go install github.com/atombender/go-jsonschema/cmd/gojsonschema@latest
1
gojsonschema -p models user-schema.json

Output:

1
2
3
4
5
6
7
package models

type User struct {
    Username string `json:"username"`
    Email    string `json:"email"`
    Age      *int   `json:"age,omitempty"`
}

Python from JSON Schema

datamodel-code-generator:

1
pip install datamodel-code-generator
1
datamodel-codegen --input user-schema.json --output user.py

Output:

1
2
3
4
5
6
from pydantic import BaseModel, EmailStr, Field

class User(BaseModel):
    username: str = Field(..., min_length=3, max_length=20)
    email: EmailStr
    age: int = Field(None, ge=0, le=150)
flowchart LR subgraph sources["Schema Sources"] manual[Hand-written
Schema] generated[Generated from
Code] openapi[OpenAPI
Spec] end subgraph targets["Generated Artifacts"] types[Type Definitions
TS, Go, Python] validators[Validators] docs[Documentation] end manual --> targets generated --> targets openapi --> targets style sources fill:#3A4A5C,stroke:#6b7280,color:#f0f0f0 style targets fill:#3A4C43,stroke:#6b7280,color:#f0f0f0

OpenAPI Integration

OpenAPI 3.1 uses JSON Schema for request/response validation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
openapi: 3.1.0
info:
  title: User API
  version: 1.0.0

paths:
  /users:
    post:
      summary: Create user
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/User'
      responses:
        '201':
          description: Created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/User'

components:
  schemas:
    User:
      type: object
      properties:
        username:
          type: string
          minLength: 3
          maxLength: 20
        email:
          type: string
          format: email
        age:
          type: integer
          minimum: 0
      required:
        - username
        - email
      additionalProperties: false

Benefits:

  • Single source of truth (schema + docs + validation)
  • Code generation for clients and servers
  • Contract testing
  • Interactive documentation (Swagger UI)

Generate validators from OpenAPI:

1
2
3
4
openapi-generator-cli generate \
  -i openapi.yaml \
  -g typescript-axios \
  -o ./generated

Schema Evolution and Versioning

Safe Changes (Non-Breaking)

+ Add optional field:

1
2
3
4
5
6
7
8
{
  "properties": {
    "name": {"type": "string"},
    "email": {"type": "string"},
    "phone": {"type": "string"}
  },
  "required": ["name", "email"]
}

Old data still validates. New field is optional.

+ Relax constraints:

1
2
3
{
  "minLength": 3
}

Change to:

1
2
3
{
  "minLength": 1
}

More permissive. Old data still validates.

+ Remove required field:

1
"required": ["name", "email", "age"]

Change to:

1
"required": ["name", "email"]

Breaking Changes (Dangerous)

- Make field required:

1
"required": ["name"]

Change to:

1
"required": ["name", "email"]

Old data without email fails validation.

- Restrict type:

1
{"type": ["string", "null"]}

Change to:

1
{"type": "string"}

Old data with null fails.

- Tighten constraints:

1
{"minLength": 3}

Change to:

1
{"minLength": 10}

Old data with shorter strings fails.

Versioning Strategies

1. Schema $id versioning:

1
2
3
4
5
{
  "$id": "https://example.com/schemas/user/v2.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  ...
}

2. API versioning:

/v1/users  → user-schema-v1.json
/v2/users  → user-schema-v2.json

3. Feature flags in schema:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
{
  "allOf": [
    {"$ref": "#/$defs/BaseUser"},
    {
      "if": {
        "properties": {
          "version": {"const": 2}
        }
      },
      "then": {
        "properties": {
          "new_field": {"type": "string"}
        },
        "required": ["new_field"]
      }
    }
  ]
}
Migration Strategy: When introducing breaking schema changes, support both old and new versions during a transition period. Use API versioning or content negotiation to route requests to appropriate validators.

Best Practices

1. Always Specify $schema

1
2
3
{
  "$schema": "https://json-schema.org/draft/2020-12/schema"
}

Different validators support different drafts. Explicit declaration prevents confusion.

2. Use Descriptive Field Names and Descriptions

1
2
3
4
5
6
7
8
9
{
  "properties": {
    "email": {
      "type": "string",
      "format": "email",
      "description": "User's primary email address for notifications"
    }
  }
}

Schemas are documentation. Make them readable.

3. Leverage $defs for Reusable Types

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
{
  "$defs": {
    "Email": {
      "type": "string",
      "format": "email"
    },
    "Username": {
      "type": "string",
      "minLength": 3,
      "pattern": "^[a-z0-9_]+$"
    }
  }
}

DRY principle. Define once, reference everywhere.

4. Include Examples

1
2
3
4
5
6
7
8
{
  "type": "string",
  "format": "email",
  "examples": [
    "user@example.com",
    "admin@company.org"
  ]
}

Examples help developers understand expected format.

5. Set additionalProperties Explicitly

1
2
3
{
  "additionalProperties": false
}

Or:

1
2
3
{
  "additionalProperties": true
}

Never leave it implicit. Be clear about whether extra fields are allowed.

6. Validate at System Boundaries

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// API endpoint
app.post('/api/users', (req, res) => {
  if (!validate(req.body)) {
    return res.status(400).json({
      error: 'Validation failed',
      details: validate.errors
    });
  }
  
  // Business logic here
});

As discussed earlier - validate at boundaries, reject early.

7. Compile Schemas Once

1
2
3
4
5
6
7
8
9
// At startup (once)
const validateUser = ajv.compile(userSchema);

// Per request (many times)
app.post('/users', (req, res) => {
  if (!validateUser(req.body)) {
    return res.status(400).json(validateUser.errors);
  }
});

Schema compilation is expensive. Do it once at application startup.

8. Test Your Schemas

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
describe('User Schema', () => {
  it('accepts valid user', () => {
    const data = {username: 'alice', email: 'alice@example.com'};
    expect(validate(data)).toBe(true);
  });

  it('rejects missing required field', () => {
    const data = {username: 'alice'};
    expect(validate(data)).toBe(false);
    expect(validate.errors[0].message).toContain('required');
  });

  it('rejects invalid email format', () => {
    const data = {username: 'alice', email: 'not-an-email'};
    expect(validate(data)).toBe(false);
  });
});

Schemas are code. Test them like code.


Common Pitfalls

1. Over-Constraining Schemas

Too strict:

1
2
3
4
{
  "type": "string",
  "pattern": "^[A-Z][a-z]+$"
}

Rejects valid names like “O’Brien”, “van Gogh”, “José”.

Better:

1
2
3
4
5
{
  "type": "string",
  "minLength": 1,
  "maxLength": 100
}

Let application logic handle complex name validation.

2. Regex Performance Issues

Dangerous (exponential backtracking):

1
2
3
{
  "pattern": "^(a+)+b$"
}

Can cause ReDoS (Regular Expression Denial of Service).

Safe:

1
2
3
{
  "pattern": "^a+b$"
}

Avoid nested quantifiers.

3. Not Handling additionalProperties

Forgetting to set it:

1
2
3
4
5
{
  "properties": {
    "name": {"type": "string"}
  }
}

Accepts ANY extra fields by default. Be explicit.

4. Format Validation Inconsistencies

Format keywords are optional in JSON Schema spec. Not all validators implement all formats.

Solution: Use regex patterns for critical validation:

1
2
3
4
{
  "type": "string",
  "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
}

5. Misunderstanding Draft Differences

Draft 4 uses definitions. Draft 2020-12 uses $defs.

1
2
3
4
5
// Draft 4
{"definitions": {...}}

// Draft 2020-12
{"$defs": {...}}

Always specify $schema to avoid confusion.


Alternatives to JSON Schema

Zod (TypeScript-First)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
import {z} from 'zod';

const userSchema = z.object({
  username: z.string().min(3).max(20).regex(/^[a-z0-9_]+$/),
  email: z.string().email(),
  age: z.number().int().nonnegative().optional()
});

type User = z.infer<typeof userSchema>;

const result = userSchema.safeParse(data);
if (result.success) {
  console.log(result.data);
} else {
  console.log(result.error.errors);
}

Benefits:

  • TypeScript-native (types inferred from schema)
  • Better DX (developer experience)
  • Composable validators

Trade-off: JavaScript ecosystem only.

Joi (JavaScript Validation)

1
2
3
4
5
6
7
8
9
const Joi = require('joi');

const schema = Joi.object({
  username: Joi.string().min(3).max(20).pattern(/^[a-z0-9_]+$/),
  email: Joi.string().email(),
  age: Joi.number().integer().min(0).optional()
});

const {error, value} = schema.validate(data);

Benefits: Mature, expressive API, good error messages.

Trade-off: JavaScript only, no JSON Schema compatibility.

Pydantic (Python)

1
2
3
4
5
6
7
8
from pydantic import BaseModel, EmailStr, Field

class User(BaseModel):
    username: str = Field(..., min_length=3, max_length=20)
    email: EmailStr
    age: int = Field(None, ge=0)

user = User(**data)  # Automatic validation

Benefits: Pythonic, integrated with FastAPI, excellent performance.

Trade-off: Python only.

When to Use JSON Schema

Use JSON Schema when:

    • Cross-language validation needed
    • OpenAPI integration required
    • Standard-based validation important
    • Schema portability matters
    • Documentation generation from schema

Use language-specific alternatives when:

    • Single-language project
    • Better DX is priority
    • Type inference important
    • Framework integration available (FastAPI + Pydantic)

Real-World Use Cases

1. API Request Validation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
app.post('/api/users', async (req, res) => {
  if (!validateUser(req.body)) {
    return res.status(400).json({
      error: 'Invalid request',
      details: validateUser.errors
    });
  }

  const user = await db.users.create(req.body);
  res.status(201).json(user);
});

2. Configuration File Validation

1
2
3
4
5
6
7
{
  "$schema": "https://example.com/config-schema.json",
  "database": {
    "host": "localhost",
    "port": 5432
  }
}

IDE provides autocomplete and validation while editing.

3. Contract Testing

1
2
3
4
5
6
7
8
describe('User API Contract', () => {
  it('returns user matching schema', async () => {
    const response = await fetch('/api/users/1');
    const data = await response.json();
    
    expect(validateUser(data)).toBe(true);
  });
});

4. Database Schema Enforcement

PostgreSQL with JSON Schema:

1
2
3
4
5
6
7
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  data JSONB,
  CONSTRAINT valid_user CHECK (
    jsonb_matches_schema('{"type": "object", ...}', data)
  )
);

5. Message Queue Validation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// Producer validates before sending
if (!validateEvent(event)) {
  throw new Error('Invalid event');
}
await queue.publish('events', event);

// Consumer validates on receipt
queue.subscribe('events', (msg) => {
  if (!validateEvent(msg)) {
    logger.error('Invalid message', msg);
    return;
  }
  
  processEvent(msg);
});

Conclusion: JSON + Schema = Type Safety

JSON Schema transforms JSON from “any structure passes” to “only valid structures accepted.” It bridges the gap between dynamic typing and type safety without changing JSON itself.

What you learned:

  • JSON Schema provides validation layer for JSON
  • Schemas define types, constraints, and structure
  • Composition patterns (allOf, anyOf, oneOf) enable complex validation
  • References ($ref, $defs) enable schema reuse
  • Code generation creates types from schemas
  • OpenAPI uses JSON Schema for API contracts
  • Schema evolution requires careful planning

Key insight: JSON Schema adds the contract layer JSON was missing. It enables:

  • Type safety without changing JSON format
  • API contracts that are both docs and validation
  • Code generation from a single source of truth
  • Runtime validation with compile-time-like guarantees

Best Practice Summary:

  • Specify $schema version explicitly
  • Validate at system boundaries (API endpoints, file readers)
  • Compile schemas once at startup
  • Use $defs for reusable components
  • Set additionalProperties explicitly
  • Test your schemas like code
  • Version schemas when making breaking changes

In Part 3, we’ll explore binary JSON formats (JSONB, BSON, MessagePack) - solving JSON’s size and performance limitations while maintaining JSON-like structure.

Next: Part 3 - Binary JSON: When Text Format Isn’t Fast Enough


Further Reading

Specifications:

Tools:

Libraries: