NoSQL Modeling 101
Data modeling in MongoDB is fundamentally different from relational (SQL) modeling. Rather than normalizing data across tables, MongoDB stores it as documents. The golden rule: data that is accessed together should be stored together. This principle guides whether you embed data or reference it.
The Fundamentals of Document Structure
MongoDB documents are stored as BSON (Binary JSON), which extends JSON with types such as dates, ObjectIds, and binary data. A well-designed document structure should reflect how your application accesses data. Consider your queries first, then structure your documents to satisfy those queries efficiently.
Embedding vs Referencing
This is the most critical decision in MongoDB schema design. The choice between embedding and referencing has major implications for performance, consistency, and scalability.
Embedding (Denormalization)
Use when you have "contains" relationships or one-to-few relationships. Embedding means storing related data within a single document:
// User Document with Embedded Address
{
  "_id": ObjectId("..."),
  "name": "Jane Doe",
  "email": "jane@example.com",
  "createdAt": ISODate("2024-01-01"),
  "addresses": [
    {
      "street": "123 Main St",
      "city": "New York",
      "zipCode": "10001",
      "isDefault": true
    },
    {
      "street": "456 Park Ave",
      "city": "Boston",
      "zipCode": "02101",
      "isDefault": false
    }
  ]
}
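Pros: Fast reads (a single query returns everything). No need for joins. Atomic writes, because an update to a single document is atomic. Well suited to related data that is frequently accessed together.
Cons: Potential data duplication. Documents can grow large. Each document is capped at 16 MB, the BSON document size limit.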
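Because the addresses live inside the user document, adding one is a single-document write. The sketch below only builds the filter and update documents you might pass to db.users.updateOne(filter, update). The helper name buildAddAddressUpdate, the cap of 10 addresses, and the sample values are assumptions made for illustration, not part of any MongoDB API.

```javascript
// Hypothetical helper: builds arguments for db.users.updateOne(filter, update).
// The filter only matches while the array holds fewer than `maxAddresses`
// elements, which keeps the embedded list in "one-to-few" territory.
function buildAddAddressUpdate(userId, address, maxAddresses = 10) {
  const lastAllowedIndex = maxAddresses - 1;
  return {
    filter: {
      _id: userId,
      // e.g. "addresses.9" exists only if the array already has 10 elements.
      [`addresses.${lastAllowedIndex}`]: { $exists: false },
    },
    // $push appends the new address to the embedded array.
    update: { $push: { addresses: address } },
  };
}

// "someUserId" is a placeholder; a real call would pass the user's ObjectId.
const { filter, update } = buildAddAddressUpdate("someUserId", {
  street: "789 Elm St",
  city: "Chicago",
  zipCode: "60601",
  isDefault: false,
});
console.log(JSON.stringify(filter));
console.log(JSON.stringify(update));
```

If the filter matches nothing, either the user does not exist or the address list is already full, which is one signal that the relationship may have outgrown embedding.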
Referencing (Normalization)
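Use when you have unbounded one-to-many relationships, where one parent can have thousands of children, or many-to-many relationships. Referencing means storing related data in separate documents that point to each other by ID: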
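// User Document
// (IDs below are illustrative; an ObjectId string must be 24 hex characters)
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "name": "Jane Doe",
  "email": "jane@example.com"
}
// Order Document - references the user
{
  "_id": ObjectId("507f1f77bcf86cd799439012"),
  "user_id": ObjectId("507f1f77bcf86cd799439011"),
  "total": 150.00,
  "items": ["item1", "item2", "item3"]
}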
Pros: Avoids data duplication. Keeps individual documents small. Easier to scale.
Cons: Reading related data takes multiple queries or a $lookup aggregation stage. MongoDB does not enforce referential integrity, so your application must keep references valid.
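To make the cost of referencing concrete, here is a minimal sketch of a join done in application code, plus the shape of a $lookup stage that asks the server to do the same work. The function name attachOrders and the in-memory sample data are assumptions for illustration; IDs are plain strings here rather than real ObjectId values.

```javascript
// Application-side join: attach each user's orders via the user_id reference.
// IDs are compared by their string form so the same logic would also work
// with ObjectId values, whose toString() returns the hex string.
function attachOrders(users, orders) {
  const ordersByUser = new Map();
  for (const order of orders) {
    const key = String(order.user_id);
    if (!ordersByUser.has(key)) ordersByUser.set(key, []);
    ordersByUser.get(key).push(order);
  }
  return users.map((user) => ({
    ...user,
    orders: ordersByUser.get(String(user._id)) ?? [],
  }));
}

// Server-side alternative: a stage for db.users.aggregate([lookupOrdersStage]).
const lookupOrdersStage = {
  $lookup: {
    from: "orders",          // collection to join
    localField: "_id",       // field on the user document
    foreignField: "user_id", // field on the order document
    as: "orders",            // output array field
  },
};

const users = [{ _id: "user123", name: "Jane Doe" }];
const orders = [
  { _id: "order999", user_id: "user123", total: 150.0 },
  { _id: "order1000", user_id: "someoneElse", total: 20.0 },
];
console.log(attachOrders(users, orders)[0].orders.length); // 1
```

Either way, the join is extra work on every read, which is the trade-off you accept in exchange for smaller documents and no duplication.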
Indexing Strategies for Performance
Indexes support the efficient execution of queries. Without an index that fits the query, MongoDB must perform a collection scan, reading every document to find matches. Proper indexing is crucial for application performance:
- Single Field Index: Index on a single field for basic filtering.
db.users.createIndex({ email: 1 })
- Compound Index: Index on multiple fields, where field order matters. MongoDB can use the index for queries that filter on a leading prefix of its fields (here status alone, or status and createdAt), but not efficiently for createdAt alone.
// Good for queries filtering by status first, then by date
db.orders.createIndex({ status: 1, createdAt: -1 })
- Multikey Index: Created automatically when the indexed field holds an array. Allows efficient queries on array elements.
db.blogs.createIndex({ tags: 1 }) // Works on array fields
- Text Index: For full-text search capabilities on string fields.
- Sparse Index: Only indexes documents where the indexed field exists, saving space.
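The field-order rule for compound indexes is easier to see with a toy model. The function below is a deliberately simplified illustration, not MongoDB's query planner: it ignores sorts, range-versus-equality ordering, and index intersection, and just counts how many leading index fields a filter constrains. The name usablePrefixLength is made up for this sketch.

```javascript
// Simplified model of the compound-index prefix rule (an illustration only):
// walk the index fields in order and stop at the first one the filter skips.
function usablePrefixLength(indexSpec, filter) {
  let count = 0;
  for (const field of Object.keys(indexSpec)) {
    if (!(field in filter)) break; // a gap ends the usable prefix
    count += 1;
  }
  return count;
}

const ordersIndex = { status: 1, createdAt: -1 };
const since = { $gte: new Date("2024-01-01") };

console.log(usablePrefixLength(ordersIndex, { status: "shipped" })); // 1
console.log(usablePrefixLength(ordersIndex, { status: "shipped", createdAt: since })); // 2
console.log(usablePrefixLength(ordersIndex, { createdAt: since })); // 0
```

A result of 0 corresponds to a query that skips the leading field and so cannot seek into the index. To see what the real planner chooses, run the query with explain().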
Key Design Patterns
Pattern 1: Polymorphic Documents - Store different types of documents in the same collection with a type field:
// Both types in one collection, distinguished by type field
{ "type": "comment", "content": "Great post!" }
{ "type": "like", "emoji": "👍" }
{ "type": "share", "platform": "Twitter" }
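On the reading side, application code typically branches on the type field to decide which other fields to expect. Here is a minimal sketch of that dispatch; the function name describeActivity and the output strings are assumptions for illustration.

```javascript
// Dispatch on the discriminator field; each branch reads only the fields
// that its document type is expected to carry.
function describeActivity(doc) {
  switch (doc.type) {
    case "comment":
      return `commented: ${doc.content}`;
    case "like":
      return `reacted with ${doc.emoji}`;
    case "share":
      return `shared on ${doc.platform}`;
    default:
      // Unknown types are reported rather than silently dropped.
      return `unknown activity type: ${doc.type}`;
  }
}

const activities = [
  { type: "comment", content: "Great post!" },
  { type: "like", emoji: "👍" },
  { type: "share", platform: "Twitter" },
];
for (const activity of activities) {
  console.log(describeActivity(activity));
}
```

Handling the default case explicitly matters here, since nothing in the collection itself stops a new type from appearing unless you add schema validation.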
Pattern 2: Denormalization for Reads - Duplicate frequently accessed data to avoid joins, accepting that every copy must be updated when the source changes:
// Store author name in the post document
// (IDs below are illustrative; an ObjectId string must be 24 hex characters)
{
  "_id": ObjectId("507f1f77bcf86cd799439013"),
  "title": "MongoDB Best Practices",
  "authorId": ObjectId("507f1f77bcf86cd799439011"),
  "authorName": "Jane Doe", // Denormalized for quick reads
  "authorAvatar": "https://...",
  "createdAt": ISODate("2024-01-01")
}
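The price of this pattern shows up on writes: when an author changes their name, the copy in every post has to change too. The sketch below builds the two write operations that would be involved. The helper name buildAuthorRenameOps and the shape of the returned objects are assumptions for illustration; each entry would be applied with the matching call, such as db.posts.updateMany(filter, update).

```javascript
// Hypothetical helper: lists the writes needed to rename an author when
// authorName is duplicated into posts.
function buildAuthorRenameOps(authorId, newName) {
  return [
    {
      // Source of truth: the user document itself.
      collection: "users",
      method: "updateOne",
      filter: { _id: authorId },
      update: { $set: { name: newName } },
    },
    {
      // Denormalized copies: every post written by this author.
      collection: "posts",
      method: "updateMany",
      filter: { authorId: authorId },
      update: { $set: { authorName: newName } },
    },
  ];
}

// "user123" is a placeholder; a real call would pass the author's ObjectId.
const ops = buildAuthorRenameOps("user123", "Jane Smith");
for (const op of ops) {
  console.log(`${op.collection}.${op.method}`, JSON.stringify(op.update));
}
```

These two writes are separate operations, so readers may briefly see the old name on posts unless you wrap both in a multi-document transaction. For data that rarely changes, such as a display name, that short window is often an acceptable trade for faster reads.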
Conclusion
MongoDB schema design is less about following rigid rules and more about understanding your application's access patterns. Ask yourself: How will this data be queried? How often? By making deliberate choices about embedding vs referencing and implementing proper indexes, you can build MongoDB applications that are both performant and maintainable.