Document databases store data as self-contained, hierarchical documents (typically JSON or BSON) that can contain nested objects and arrays, so related data embedded in a document can be read without joins. Documents in the same collection may have different fields (schema flexibility), which enables fast iteration during development. Queries match documents by field values and can address nested fields using dot notation (for example, author.name). The design philosophy encourages embedding related data within a single document, trading write-time redundancy for fast, join-free reads aligned with specific access patterns.
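Dot-notation matching can be sketched in plain Python. The helper names resolve, matches, and find are hypothetical, and the semantics only roughly follow MongoDB's: a path segment that reaches an array fans out over its elements, and a query value matches if it equals any value found at the path (including a single element of an array). Real query engines add operators, indexes, and many edge cases this sketch ignores.

```python
from typing import Any, Iterator


def resolve(value: Any, parts: list[str]) -> Iterator[Any]:
    """Yield every value reachable at a dotted path, fanning out over arrays."""
    if isinstance(value, list):
        if not parts:
            yield value  # the array itself, for exact-array matches
        for item in value:
            yield from resolve(item, parts)
    elif not parts:
        yield value
    elif isinstance(value, dict) and parts[0] in value:
        yield from resolve(value[parts[0]], parts[1:])


def matches(doc: dict, query: dict) -> bool:
    """True if, for every path in the query, some value at that path equals the target."""
    return all(
        any(found == expected for found in resolve(doc, path.split(".")))
        for path, expected in query.items()
    )


def find(collection: list[dict], query: dict) -> list[dict]:
    """Linear scan standing in for a database query (no indexes)."""
    return [doc for doc in collection if matches(doc, query)]


# Hypothetical sample documents.
posts = [
    {"_id": 10, "title": "Why documents?", "author": {"name": "Ada"},
     "tags": ["nosql", "modeling"],
     "comments": [{"commenter": "Bob", "content": "Nice intro"}]},
    {"_id": 12, "title": "Joins revisited", "author": {"name": "Lin"},
     "tags": ["sql"], "comments": []},
]

print([p["_id"] for p in find(posts, {"author.name": "Ada"})])         # [10]
print([p["_id"] for p in find(posts, {"comments.commenter": "Bob"})])  # [10]
print([p["_id"] for p in find(posts, {"tags": "sql"})])                # [12]
```

The second query reaches into an array of embedded comments with a single path, which is the kind of lookup that would need a join against a separate comments table in a relational schema.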
Model a blog (posts with comments and tags) in both a relational schema and a document schema in which the comments and tags are embedded in each post. Query for common patterns (get all comments on a post, get all posts by an author) and compare how verbose each query is and how much work it takes to execute.
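A minimal sketch of the relational half of this exercise, using Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample rows are hypothetical choices for illustration, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE posts     (id INTEGER PRIMARY KEY,
                        author_id INTEGER NOT NULL REFERENCES authors(id),
                        title TEXT NOT NULL, body TEXT NOT NULL);
CREATE TABLE tags      (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE post_tags (post_id INTEGER NOT NULL REFERENCES posts(id),
                        tag_id  INTEGER NOT NULL REFERENCES tags(id),
                        PRIMARY KEY (post_id, tag_id));
CREATE TABLE comments  (id INTEGER PRIMARY KEY,
                        post_id INTEGER NOT NULL REFERENCES posts(id),
                        commenter TEXT NOT NULL, content TEXT NOT NULL);

INSERT INTO authors   VALUES (1, 'Ada'), (2, 'Lin');
INSERT INTO posts     VALUES (10, 1, 'Why documents?', 'body text'),
                             (11, 1, 'Embedding vs referencing', 'body text'),
                             (12, 2, 'Joins revisited', 'body text');
INSERT INTO tags      VALUES (1, 'nosql'), (2, 'modeling');
INSERT INTO post_tags VALUES (10, 1), (10, 2), (11, 2);
INSERT INTO comments  VALUES (100, 10, 'Bob', 'Nice intro'),
                             (101, 10, 'Cy', 'What about joins?');
""")

# Pattern 1: all comments on a post. One table, filtered by the foreign key.
comments_on_post = conn.execute(
    "SELECT commenter, content FROM comments WHERE post_id = ? ORDER BY id", (10,)
).fetchall()

# Pattern 2: all posts by an author, looked up by name. Needs a join.
posts_by_author = conn.execute(
    """SELECT p.title FROM posts AS p
       JOIN authors AS a ON a.id = p.author_id
       WHERE a.name = ? ORDER BY p.id""",
    ("Ada",),
).fetchall()

# Full post page: post, author, tags, and comment count touch five tables.
# Tags and comments are aggregated in subqueries so the rows do not multiply.
page = conn.execute(
    """SELECT p.title, a.name,
              (SELECT group_concat(t.name) FROM post_tags AS pt
                 JOIN tags AS t ON t.id = pt.tag_id
                WHERE pt.post_id = p.id),
              (SELECT count(*) FROM comments AS c WHERE c.post_id = p.id)
       FROM posts AS p JOIN authors AS a ON a.id = p.author_id
       WHERE p.id = ?""",
    (10,),
).fetchone()

print(comments_on_post)  # [('Bob', 'Nice intro'), ('Cy', 'What about joins?')]
print(posts_by_author)   # [('Why documents?',), ('Embedding vs referencing',)]
print(page)              # tag order from group_concat is not guaranteed
```

Listing comments for a known post ID is a single-table query; looking posts up by author name and assembling a full post page are where the joins (and the extra SQL) appear.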
Coming from relational databases and the NoSQL concepts you already know, document databases represent a fundamentally different way of thinking about data modeling. Instead of spreading related data across multiple tables linked by foreign keys and reassembled with JOINs, a document database stores related data together in a single document — a self-contained, hierarchical structure typically represented as JSON or BSON (Binary JSON). A blog post document might contain the title, body, author info, an array of tags, and an array of comment objects, all nested inside one structure. To display the post, you fetch one document — no joins required.
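The blog post document described above can be sketched with plain Python dicts standing in for stored documents. No database driver is assumed: the dict keyed by _id stands in for a lookup by document ID, and the field names and sample data are hypothetical.

```python
import json

# A collection of post documents keyed by _id. Comments and tags are embedded,
# and the author's name is copied in so the page needs no second lookup.
posts = {
    10: {
        "_id": 10,
        "title": "Why documents?",
        "body": "body text",
        "author": {"id": 1, "name": "Ada"},
        "tags": ["nosql", "modeling"],
        "comments": [
            {"commenter": "Bob", "content": "Nice intro"},
            {"commenter": "Cy", "content": "What about joins?"},
        ],
    },
    11: {
        "_id": 11,
        "title": "Embedding vs referencing",
        "body": "body text",
        "author": {"id": 1, "name": "Ada"},
        "tags": ["modeling"],
        "comments": [],
    },
    12: {
        "_id": 12,
        "title": "Joins revisited",
        "body": "body text",
        "author": {"id": 2, "name": "Lin"},
        "tags": [],
        "comments": [],
    },
}

# Full post page: one read by _id returns title, author, tags, and comments.
page = posts[10]

# All comments on a post: they are already inside the post.
comments_on_post = page["comments"]

# All posts by an author: a linear scan here; a real database would
# typically answer this from an index on "author.name".
posts_by_author = [p["title"] for p in posts.values() if p["author"]["name"] == "Ada"]

print(json.dumps(page, indent=2))  # the whole page as one JSON document
print(posts_by_author)             # ['Why documents?', 'Embedding vs referencing']
```

Reading the page is one key lookup with no joins. The cost is that the author's name is now stored once per post rather than once overall.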
This embedding strategy directly trades normalization for read performance. In a relational schema, displaying that blog post might require joining four tables (posts, authors, tags, comments), or five if tags are shared across posts through a junction table such as post_tags. In a document database, it's a single read by document ID. The tradeoff is write-side complexity: if the author changes their display name, you may need to update that name in every document where it's embedded. This is why document modeling requires thinking carefully about access patterns: which queries will you run most often? Data that is always read together should be embedded; data that is shared across many documents or updated independently should be referenced by storing an ID and resolving it in application code.
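The write-side cost of embedding, and the referencing alternative, can be compared in a small in-memory sketch. The functions rename_author_embedded and with_author are hypothetical helpers, and plain lists and dicts stand in for collections.

```python
# Embedded: each post carries its own copy of the author's display name.
embedded_posts = [
    {"_id": 10, "title": "Why documents?", "author": {"id": 1, "name": "Ada"}},
    {"_id": 11, "title": "Embedding vs referencing", "author": {"id": 1, "name": "Ada"}},
    {"_id": 12, "title": "Joins revisited", "author": {"id": 2, "name": "Lin"}},
]


def rename_author_embedded(posts: list[dict], author_id: int, new_name: str) -> int:
    """Rewrite every embedded copy of the name; return how many documents changed."""
    touched = 0
    for post in posts:
        if post["author"]["id"] == author_id:
            post["author"]["name"] = new_name
            touched += 1
    return touched


# Referenced: posts store only author_id; the name lives in one author document.
authors = {1: {"_id": 1, "name": "Ada"}, 2: {"_id": 2, "name": "Lin"}}
referenced_posts = [
    {"_id": 10, "title": "Why documents?", "author_id": 1},
    {"_id": 11, "title": "Embedding vs referencing", "author_id": 1},
    {"_id": 12, "title": "Joins revisited", "author_id": 2},
]


def with_author(post: dict, authors: dict[int, dict]) -> dict:
    """Resolve the reference in application code: one extra lookup per read."""
    return {**post, "author": authors[post["author_id"]]}


# Renaming: one write per affected post when embedded, one write when referenced.
touched = rename_author_embedded(embedded_posts, 1, "Ada L.")
authors[1]["name"] = "Ada L."

print(touched)                                                      # 2
print(with_author(referenced_posts[1], authors)["author"]["name"])  # Ada L.
```

The embedded rename grows with the number of posts an author has written, while the referenced version pays instead on every read that needs the name.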
Documents within the same collection (the document database analog of a table) do not need to share the same structure. One product document might have a "dimensions" field while another has a "download_size" field. This schema flexibility accelerates early development — you can evolve your data model without running ALTER TABLE migrations. In practice, however, applications enforce an implicit schema through code: your application expects certain fields to exist and have certain types. Libraries like Mongoose (for MongoDB) formalize this with schema definitions at the application layer, recovering some of the structure that relational databases enforce at the storage layer.
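Mongoose is a JavaScript library, so the following is not its API. It is a minimal Python sketch of the same idea: an application-side check that a document has the fields and types the code relies on, while still tolerating extra fields such as dimensions or download_size. PRODUCT_SCHEMA and validate are hypothetical names.

```python
from typing import Any

# Hypothetical application-level schema: required fields and their Python types.
PRODUCT_SCHEMA: dict[str, type] = {"_id": int, "name": str, "price": float}


def validate(doc: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the document conforms.

    Fields not named in the schema are allowed, so products in the same
    collection can still differ in shape.
    """
    errors = []
    for field, expected in schema.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(doc[field]).__name__}"
            )
    return errors


physical = {"_id": 1, "name": "Desk", "price": 199.0, "dimensions": {"w_cm": 120, "d_cm": 60}}
digital = {"_id": 2, "name": "E-book", "price": 9.99, "download_size": "4 MB"}
broken = {"_id": 3, "name": "Lamp", "price": "cheap"}

print(validate(physical, PRODUCT_SCHEMA))  # []
print(validate(digital, PRODUCT_SCHEMA))   # []
print(validate(broken, PRODUCT_SCHEMA))    # ['price: expected float, got str']
```

Because the check runs in application code, any other program writing to the same collection can bypass it; that is the structure relational databases enforce at the storage layer instead.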
The most important limitation to understand is the transaction boundary. Operations on a single document are atomic — the database guarantees you won't see a half-updated document. But operations that span multiple documents lack this guarantee by default. If you need to transfer money between two accounts stored as separate documents, you must either restructure your data so both accounts live in one document, use the database's multi-document transaction support (which carries performance overhead), or accept the risk of partial updates. This is the central design tension in document databases: the unit of atomicity is the document, so your document boundaries are also your consistency boundaries.
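The partial-update risk, and why moving both accounts into one document removes it, can be simulated without a database. In this sketch, Store, Crash, transfer_two_docs, and transfer_one_doc are hypothetical. The store treats replacing one whole document as its only atomic step, an assumption standing in for the real single-document guarantee, and the "crash" is an exception raised on purpose.

```python
import copy


class Crash(Exception):
    """Simulated process failure partway through an operation."""


class Store:
    """Toy collection. Replacing one whole document is the only atomic step,
    standing in for the database's single-document atomicity guarantee."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def get(self, _id: str) -> dict:
        return copy.deepcopy(self.docs[_id])

    def replace(self, doc: dict) -> None:
        self.docs[doc["_id"]] = copy.deepcopy(doc)


def transfer_two_docs(store: Store, src: str, dst: str, amount: int,
                      crash_midway: bool = False) -> None:
    """Accounts in separate documents: two independent writes."""
    a, b = store.get(src), store.get(dst)
    a["balance"] -= amount
    store.replace(a)                      # write 1 is now visible...
    if crash_midway:
        raise Crash("failed between the two writes")
    b["balance"] += amount
    store.replace(b)                      # ...write 2 may never happen


def transfer_one_doc(store: Store, ledger_id: str, src: str, dst: str, amount: int,
                     crash_midway: bool = False) -> None:
    """Both accounts inside one document: a single write applies both changes."""
    ledger = store.get(ledger_id)         # private copy; nothing visible yet
    ledger["accounts"][src] -= amount
    if crash_midway:
        raise Crash("failed before the only write")
    ledger["accounts"][dst] += amount
    store.replace(ledger)                 # both changes land together or not at all


store = Store()
store.replace({"_id": "alice", "balance": 100})
store.replace({"_id": "bob", "balance": 0})
try:
    transfer_two_docs(store, "alice", "bob", 30, crash_midway=True)
except Crash:
    pass
print(store.get("alice")["balance"] + store.get("bob")["balance"])  # 70: 30 lost

store.replace({"_id": "ledger", "accounts": {"alice": 100, "bob": 0}})
try:
    transfer_one_doc(store, "ledger", "alice", "bob", 30, crash_midway=True)
except Crash:
    pass
print(store.get("ledger")["accounts"])  # {'alice': 100, 'bob': 0}
transfer_one_doc(store, "ledger", "alice", "bob", 30)
print(store.get("ledger")["accounts"])  # {'alice': 70, 'bob': 30}
```

Restructuring into one ledger document only works while the accounts can reasonably share a document. When they cannot, multi-document transactions are the remaining option, with the performance overhead mentioned above.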