MongoDB Best Practices Guide


MongoDB is a popular NoSQL document database used across many platforms. It stores data in flexible, JSON-like documents, which allow for hierarchical structures instead of rigid tables. This guide offers practical best practices for junior to mid-level back-end engineers using MongoDB.

We’ll cover schema design, performance tips, indexing, query patterns, transactions, sharding, operations, and common pitfalls, with code examples and pros/cons along the way.

Note: For the most up-to-date information, please refer to the official documentation.

🧱 Schema Design

In MongoDB, schemas are flexible, but thoughtful design prevents performance issues. Unlike relational databases, which rely on joins, MongoDB often favors denormalization for faster reads, and that duplication has to be balanced against the cost of keeping it consistent on updates.

To link related data:

  • Embed related data within a single document (denormalized).
  • Reference data in separate collections via IDs (normalized).

Hybrid models combine both for complex apps.

Embedded

Embed when data is accessed together and bounded in size (e.g., fewer than ~100 items, so the document stays well under the 16MB limit). Embedding also lets you rely on atomic single-document operations.

Example: 

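A minimal sketch, assuming a hypothetical orders collection in which each order embeds its (bounded) list of line items:

// A single document holds the order and its items.
db.orders.insertOne({
  _id: 1,
  customerId: 42,
  status: "paid",
  items: [
    { sku: "A-100", qty: 2, price: 9.99 },
    { sku: "B-200", qty: 1, price: 24.5 }
  ]
})

// One read returns the order together with all of its items.
db.orders.findOne({ _id: 1 })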

Pros and Cons:

| Aspect      | Pros                   | Cons                                  |
|-------------|------------------------|---------------------------------------|
| Performance | Single query for reads | Duplication if data is shared         |
| Use cases   | One-to-few, read-heavy | Not for unbounded growth (e.g., logs) |
| Limits      | Atomic updates         | Exceeds 16MB if arrays grow           |

Suitable for: Better read performance, one-to-few relationships, data read/written together, small subdocuments.

Reference

Use references for large, unbounded, or shared data. Resolve them with separate queries or with $lookup in an aggregation pipeline.

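A minimal sketch, assuming hypothetical customers and orders collections: each order stores only the customer’s _id, and the two are joined on demand with $lookup.

db.customers.insertOne({ _id: 42, name: "Ada" })
db.orders.insertOne({ _id: 1, customerId: 42, total: 44.48 })

// Resolve the reference with an aggregation join.
db.orders.aggregate([
  { $match: { _id: 1 } },
  { $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
  } }
])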

Pros and Cons:

| Aspect      | Pros                                | Cons                                             |
|-------------|-------------------------------------|--------------------------------------------------|
| Performance | Reduces duplication, handles growth | Multiple queries (N+1 risk)                      |
| Use cases   | One-to-many, update-heavy           | Slower reads without aggregation                 |
| Limits      | Flexible scaling                    | No atomic cross-doc updates without transactions |

Polymorphic Schemas

For varying document types (e.g., events), use a _type field:

{ _id: ..., _type: "click", data: { ... } }
{ _id: ..., _type: "purchase", data: { ... } }
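Queries can then filter on the discriminator; as a hedged option, a partial index can cover just one heavily queried type (the data.productId field is illustrative):

// Fetch only click events.
db.events.find({ _type: "click" })

// Index only purchase events for a hot query path.
db.events.createIndex(
  { "data.productId": 1 },
  { partialFilterExpression: { _type: "purchase" } }
)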

Link to documentation: Mongo data models

πŸš€ Performance Tips

  • Keep frequently accessed data and indexes in RAM (the WiredTiger cache defaults to 50% of (RAM minus 1GB)). Monitor with db.serverStatus().wiredTiger.cache or tools like mongostat.
  • Use a single shared client with connection pooling (e.g., maxPoolSize: 100 in the Node.js driver); avoid opening a new connection per request to prevent excessive resource consumption (see the client sketch after this list).
  • The defaults for read concern (local) and write concern are fine for many cases (the implicit write concern is w: 1 before MongoDB 5.0 and "majority" from 5.0 on), but use w: "majority" explicitly when durability matters in a replica set. Read concerns control consistency (e.g., majority returns only committed data).
  • Monitor with Atlas, Compass, or Ops Manager. Enable slow query logging with db.setProfilingLevel(1, { slowms: 100 }).
  • Separate read-heavy (analytics) from write-heavy (transactions) using different databases or replica sets.
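As a minimal Node.js sketch of the shared-client advice above (the URI, database, and collection names are placeholders):

const { MongoClient } = require("mongodb");

// One client per process; the driver manages the connection pool.
const client = new MongoClient("mongodb://localhost:27017", {
  maxPoolSize: 100,
  writeConcern: { w: "majority" }
});

async function run() {
  await client.connect();
  const users = client.db("app").collection("users");
  await users.updateOne({ email: "a@example.com" }, { $set: { active: true } });
}

Reusing this one client everywhere avoids repeated connection setup and keeps the number of open connections bounded by the pool.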

🧭 Indexing

  • Index fields that appear in queries and sorts:
db.users.createIndex({ email: 1 })
  • Use compound indexes for multi-field queries:
db.orders.createIndex({ customerId: 1, orderDate: -1 })
  • Avoid over-indexing; each index adds RAM and slows down writes.
  • Check index usage with:
db.collection.aggregate([{ $indexStats: {} }])
  • Use projections to fetch only needed fields:
db.users.find({}, { name: 1, email: 1, _id: 0 })
  • Use partial indexes to only include documents that meet certain conditions:
db.orders.createIndex(
  { email: 1 },
  { partialFilterExpression: { isActive: true } }
)
  • Use TTL (Time-To-Live) indexes to auto-expire documents after a period:
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
  • Use sparse indexes when the field may not exist in every document:
db.users.createIndex({ nickname: 1 }, { sparse: true })
  • Use wildcard indexes to index fields without knowing their exact names. Useful when you have documents with dynamic or unpredictable fields:
db.logs.createIndex({ "$**": 1 })
  • Use geospatial indexes for location-based queries:
db.places.createIndex({ location: "2dsphere" })

This allows efficient querying with $near, $geoWithin, etc.
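For instance, a sketch of a proximity query against that index (the coordinates and distance are illustrative):

// Places within 1 km of a point, nearest first.
db.places.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [13.405, 52.52] },
      $maxDistance: 1000
    }
  }
})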

Link to documentation: https://www.mongodb.com/docs/manual/indexes/

πŸ” Query Patterns

  • Use $in, $elemMatch, $exists, and other operators instead of filtering in the app code.
  • Avoid leading wildcards in regex queries. Use anchored ones like /^start/.
  • Use pagination with ranged queries instead of large .skip() values (see the sketch after this list).
  • Use bulkWrite or insertMany for batches.
  • Use aggregation pipelines for complex filtering and transformation.
  • For large result sets, iterate with a cursor rather than loading everything into memory, and do joins with $lookup in an aggregation pipeline instead of in application code.
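A minimal range-based pagination sketch, assuming documents are paged in _id order (any indexed, strictly increasing sort key works; the products collection is illustrative):

// First page.
const firstPage = db.products.find().sort({ _id: 1 }).limit(20).toArray();
const last = firstPage[firstPage.length - 1];

// Next page: seek past the last _id instead of skipping rows.
db.products.find({ _id: { $gt: last._id } }).sort({ _id: 1 }).limit(20)

Unlike .skip(n), this stays fast for deep pages because the server does not have to scan and discard the skipped documents.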

πŸ”„ Transactions

  • Use multi-document transactions when atomicity is needed (e.g., a money transfer; see the sketch after this list).
  • Keep transactions short and limit the number of modified docs.
  • Single-document operations are atomic without a transaction.
  • Use transactions only in replica sets or sharded clusters.
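A minimal mongosh sketch of a money transfer inside a transaction (the bank database and account IDs are illustrative):

const session = db.getMongo().startSession();
const accounts = session.getDatabase("bank").accounts;

session.startTransaction();
try {
  // Both updates commit or abort together.
  accounts.updateOne({ _id: "A" }, { $inc: { balance: -100 } });
  accounts.updateOne({ _id: "B" }, { $inc: { balance: 100 } });
  session.commitTransaction();
} catch (e) {
  session.abortTransaction();
  throw e;
} finally {
  session.endSession();
}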

🧩 Sharding

  • Don’t shard early. Use vertical scaling first.
  • Choose a high-cardinality shard key with even distribution (see the sketch after this list).
  • Include the shard key in most queries to minimize scatter-gather operations.
  • Monitor chunk distribution with sh.status().
  • Each shard should be a member of a replica set.
  • Enable balancer for chunk migration; use zones for data locality.
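A hedged sketch of sharding an orders collection on a hashed customerId (the shop database and field name are illustrative):

// A hashed key spreads writes evenly across shards.
sh.enableSharding("shop")
sh.shardCollection("shop.orders", { customerId: "hashed" })

// Inspect chunk distribution afterwards.
sh.status()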

βš™οΈ Operations

  • Replica sets for high availability (HA): 3+ nodes, automatic failover via elections.
  • Backups: Use mongodump, filesystem snapshots, or Atlas continuous backups. Test restores.
  • Security: Enable authentication (SCRAM) and TLS encryption. Use role-based access control via db.createRole() and db.createUser() (see the sketch after this list).
  • Updates: Implement rolling upgrades to minimize downtime.
  • Monitoring: CPU, disk IOPS, oplog window.
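A minimal sketch of role-based access, assuming a hypothetical shop database and an analytics user that only needs to read from it:

use admin

// A custom role limited to read operations on one database.
db.createRole({
  role: "analyticsRead",
  privileges: [
    { resource: { db: "shop", collection: "" }, actions: ["find"] }
  ],
  roles: []
})

db.createUser({
  user: "analyst",
  pwd: passwordPrompt(),
  roles: [{ role: "analyticsRead", db: "admin" }]
})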

🚫 Common Mistakes

  • Over-normalizing like SQL.
  • Embedding large arrays that grow without limit.
  • Creating too many collections.
  • Not indexing properly or indexing everything.
  • Replacing full documents when only a field has changed. Prefer a targeted update with $set:
db.users.updateOne({ _id: id }, { $set: { email: newEmail } })
  • Skipping projections and fetching entire documents.
  • N+1 queries. Use $in or aggregation to batch reads (see the sketch after this list).
  • Not batching writes. Use insertMany or bulkWrite.
  • Ignoring errors on writes.
  • Leaving DB open with no auth.
  • Not planning for growth.
  • Indexing array fields creates multikey indexes, which can grow large and slow writes; avoid them unless you actually query the array.
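For instance, a hedged sketch of batching reads with $in and writes with bulkWrite (collection and field names are illustrative):

// One query instead of N separate findOne calls.
const ids = [1, 2, 3];
db.users.find({ _id: { $in: ids } })

// One round trip for several writes.
db.users.bulkWrite([
  { updateOne: { filter: { _id: 1 }, update: { $set: { active: true } } } },
  { insertOne: { document: { _id: 4, name: "New user" } } }
])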
