Data Modeling

Effective data modeling in MongoDB depends on how your application will access and manipulate data. There are two primary patterns: Embedding and Referencing.

Embedding (Denormalized)

{
  _id: 1,
  user: "John",
  address: {
    city: "NYC",
    zip: 10001
  }
}

Referencing (Normalized)

{
  _id: 1,
  user: "John",
  address_id: 50
}
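
The referencing example above shows only the user side of the link. As a minimal sketch of the other half, assume a separate collection named `addresses` (a hypothetical name, not given in the original) whose `_id` matches the user's `address_id`:

```javascript
// Hypothetical document in a separate `addresses` collection (name assumed).
const address = {
  _id: 50,
  city: "NYC",
  zip: 10001,
};

// The user document from the example above stores only the address's _id.
const user = {
  _id: 1,
  user: "John",
  address_id: 50,
};

// Resolving the reference means matching address_id against the address _id.
const isLinked = user.address_id === address._id;
```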

1. Embedding (The "Document" Way)

Related data is nested inside the parent document. This is preferred for One-to-One and One-to-Few relationships, where the nested data is small, bounded, and usually read together with its parent.

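  • Pros: Fast reads (one query returns the parent and its nested data); related data is stored and updated together.
  • Cons: A document cannot exceed 16 MB, so unbounded nested arrays are risky; data shared by several parents may be duplicated.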
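
To make the embedded pattern concrete, the sketch below builds the filter and update objects one might pass to `findOne` and `updateOne` for the embedded user document shown earlier. The helper names `userFilter` and `addressUpdate` and the `users` collection name are assumptions for illustration. The code only constructs plain objects and does not connect to a database.

```javascript
// Filter for reading one user. With embedding, the address comes back inside
// the same document, e.g. db.users.findOne(userFilter(1)) in mongosh
// (the collection name `users` is assumed).
function userFilter(userId) {
  return { _id: userId };
}

// Update spec that changes fields inside the embedded address using dot
// notation. Both fields live in one document, so a single updateOne call
// changes them together, e.g.
//   db.users.updateOne(userFilter(1), addressUpdate("Brooklyn", 11201))
function addressUpdate(city, zip) {
  return { $set: { "address.city": city, "address.zip": zip } };
}
```

Because both address fields sit in the same document, the update is applied as a single-document write, which MongoDB performs atomically.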

2. Referencing (The "SQL" Way)

Related data is stored in separate collections and linked via an ID. This is preferred for large One-to-Many relationships and for Many-to-Many relationships.

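  • Pros: No data duplication; documents stay small and well under the 16 MB limit.
  • Cons: Reading related data requires multiple queries or a `$lookup` aggregation stage, which is typically slower than reading a single document.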
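
As an illustration of the `$lookup` option, here is a sketch of an aggregation pipeline that joins the referenced user document shown earlier with its address. The collection name `addresses` and the variable name are assumptions. The field names follow the referencing example.

```javascript
// Aggregation pipeline that joins a user with its referenced address.
// `addresses` is an assumed collection name.
const userWithAddressPipeline = [
  // Select the user document to read.
  { $match: { _id: 1 } },
  {
    // Join: users.address_id -> addresses._id
    $lookup: {
      from: "addresses",
      localField: "address_id",
      foreignField: "_id",
      as: "address",
    },
  },
  // $lookup outputs an array field; $unwind turns the single matched address
  // into an embedded object (users with no matching address are dropped).
  { $unwind: "$address" },
];

// In mongosh this could be run as:
//   db.users.aggregate(userWithAddressPipeline)
```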

Guiding Principles

  1. Model for the UI: Structure your documents according to what you display on your screens.
  2. Data that changes together should live together: If you always update two pieces of data together, embed them.
  3. Avoid "N+1" Queries: If referencing forces you to run many separate queries, consider embedding.
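
The N+1 problem in the last principle can be sketched with in-memory arrays standing in for collections. Everything here (the `posts` and `comments` data, the `fetchById` helper, and the query counter) is hypothetical and only simulates round trips. No MongoDB driver is involved.

```javascript
// In-memory stand-ins for two collections (hypothetical sample data).
const posts = [{ _id: 1, title: "Hello", comment_ids: [10, 11, 12] }];
const comments = [
  { _id: 10, text: "First" },
  { _id: 11, text: "Nice" },
  { _id: 12, text: "Thanks" },
];

// Counts simulated queries; each fetchById call stands for one round trip.
let queryCount = 0;
function fetchById(collection, id) {
  queryCount += 1;
  return collection.find((doc) => doc._id === id) || null;
}

// Referenced layout: 1 query for the post plus N queries for its comments.
const post = fetchById(posts, 1);
const loadedComments = post.comment_ids.map((id) => fetchById(comments, id));
const referencedQueries = queryCount;

// Embedded layout: the comments travel with the post, so one query suffices.
const embeddedPosts = [{ _id: 1, title: "Hello", comments: comments }];
queryCount = 0;
const embeddedPost = fetchById(embeddedPosts, 1);
const embeddedQueries = queryCount;
```

With three comments, the referenced layout issues four simulated queries and the embedded layout issues one. The gap grows with the number of referenced documents.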

Key takeaway: In MongoDB, there is no single "correct" schema. The best schema is the one that gives the best performance for the queries your application actually runs.