MongoDB Best Practices Guide


MongoDB is a popular NoSQL document database used across many platforms. It stores data in flexible, JSON-like documents, which allow for hierarchical structures instead of rigid tables. This guide offers practical best practices for junior to mid-level back-end engineers using MongoDB.

We’ll cover schema design, performance tips, indexing, query patterns, transactions, sharding, operations, and common pitfalls, with code examples and pros/cons along the way.

Note: For the most up-to-date information, please refer to the official documentation.

🧱 Schema Design

In MongoDB, schemas are flexible, but thoughtful design prevents performance issues. Unlike relational databases, which rely on joins, MongoDB often favors denormalization for faster reads, and that duplication has to be balanced against the cost of keeping it consistent on updates.

To link related data:

  • Embed related data within a single document (denormalized).
  • Reference data in separate collections via IDs (normalized).

Hybrid models combine both for complex apps.

Embedded

Embed when data is accessed together and bounded in size (e.g., fewer than ~100 items, so the document stays well under the 16MB limit). Embedding also lets you rely on atomic single-document operations.

Example: 

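A minimal sketch, assuming a hypothetical orders collection in which each order embeds its (bounded) list of line items:

// A single document holds the order and its items.
db.orders.insertOne({
  _id: 1,
  customerId: 42,
  status: "paid",
  items: [
    { sku: "A-100", qty: 2, price: 9.99 },
    { sku: "B-200", qty: 1, price: 24.5 }
  ]
})

// One read returns the order together with all of its items.
db.orders.findOne({ _id: 1 })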

Pros and Cons:

| Aspect      | Pros                   | Cons                                  |
|-------------|------------------------|---------------------------------------|
| Performance | Single query for reads | Duplication if data is shared         |
| Use cases   | One-to-few, read-heavy | Not for unbounded growth (e.g., logs) |
| Limits      | Atomic updates         | Exceeds 16MB if arrays grow           |

Suitable for: Better read performance, one-to-few relationships, data read/written together, small subdocuments.

Reference

Use references for large, unbounded, or shared data. Resolve them with separate queries or with $lookup in an aggregation pipeline.

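A minimal sketch, assuming hypothetical customers and orders collections: each order stores only the customer’s _id, and the two are joined on demand with $lookup.

db.customers.insertOne({ _id: 42, name: "Ada" })
db.orders.insertOne({ _id: 1, customerId: 42, total: 44.48 })

// Resolve the reference with an aggregation join.
db.orders.aggregate([
  { $match: { _id: 1 } },
  { $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
  } }
])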

Pros and Cons:

| Aspect      | Pros                                | Cons                                             |
|-------------|-------------------------------------|--------------------------------------------------|
| Performance | Reduces duplication, handles growth | Multiple queries (N+1 risk)                      |
| Use cases   | One-to-many, update-heavy           | Slower reads without aggregation                 |
| Limits      | Flexible scaling                    | No atomic cross-doc updates without transactions |

Polymorphic Schemas

For varying document types (e.g., events), use a _type field:

{ _id: ..., _type: "click", data: { ... } }
{ _id: ..., _type: "purchase", data: { ... } }
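Queries can then filter on the discriminator; as a hedged option, a partial index can cover just one heavily queried type (the data.productId field is illustrative):

// Fetch only click events.
db.events.find({ _type: "click" })

// Index only purchase events for a hot query path.
db.events.createIndex(
  { "data.productId": 1 },
  { partialFilterExpression: { _type: "purchase" } }
)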

Link to documentation: Mongo data models

πŸš€ Performance Tips

  • Keep frequently accessed data and indexes in RAM (the WiredTiger cache defaults to 50% of (RAM minus 1GB)). Monitor with db.serverStatus().wiredTiger.cache or tools like mongostat.
  • Use a single shared client with connection pooling (e.g., maxPoolSize: 100 in the Node.js driver); avoid opening a new connection per request to prevent excessive resource consumption (see the client sketch after this list).
  • The defaults for read concern (local) and write concern are fine for many cases (the implicit write concern is w: 1 before MongoDB 5.0 and "majority" from 5.0 on), but use w: "majority" explicitly when durability matters in a replica set. Read concerns control consistency (e.g., majority returns only committed data).
  • Monitor with Atlas, Compass, or Ops Manager. Enable slow query logging with db.setProfilingLevel(1, { slowms: 100 }).
  • Separate read-heavy (analytics) from write-heavy (transactions) using different databases or replica sets.
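As a minimal Node.js sketch of the shared-client advice above (the URI, database, and collection names are placeholders):

const { MongoClient } = require("mongodb");

// One client per process; the driver manages the connection pool.
const client = new MongoClient("mongodb://localhost:27017", {
  maxPoolSize: 100,
  writeConcern: { w: "majority" }
});

async function run() {
  await client.connect();
  const users = client.db("app").collection("users");
  await users.updateOne({ email: "a@example.com" }, { $set: { active: true } });
}

Reusing this one client everywhere avoids repeated connection setup and keeps the number of open connections bounded by the pool.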

🧭 Indexing

  • Index fields that appear in queries and sorts:
db.users.createIndex({ email: 1 })
  • Use compound indexes for multi-field queries:
db.orders.createIndex({ customerId: 1, orderDate: -1 })
  • Avoid over-indexing; each index adds RAM and slows down writes.
  • Check index usage with:
db.collection.aggregate([{ $indexStats: {} }])
  • Use projections to fetch only needed fields:
db.users.find({}, { name: 1, email: 1, _id: 0 })
  • Use partial indexes to only include documents that meet certain conditions:
db.orders.createIndex(
  { email: 1 },
  { partialFilterExpression: { isActive: true } }
)
  • Use TTL (Time-To-Live) indexes to auto-expire documents after a period:
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
  • Use sparse indexes when the field may not exist in every document:
db.users.createIndex({ nickname: 1 }, { sparse: true })
  • Use wildcard indexes to index fields without knowing their exact names. Useful when you have documents with dynamic or unpredictable fields:
db.logs.createIndex({ "$**": 1 })
  • Use geospatial indexes for location-based queries:
db.places.createIndex({ location: "2dsphere" })

This allows efficient querying with $near, $geoWithin, etc.
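For instance, a sketch of a proximity query against that index (the coordinates and distance are illustrative):

// Places within 1 km of a point, nearest first.
db.places.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [13.405, 52.52] },
      $maxDistance: 1000
    }
  }
})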

Link to documentation: https://www.mongodb.com/docs/manual/indexes/

πŸ” Query Patterns

  • Use $in, $elemMatch, $exists, and other operators instead of filtering in the app code.
  • Avoid leading wildcards in regex queries. Use anchored ones like /^start/.
  • Use pagination with ranged queries instead of large .skip() values (see the sketch after this list).
  • Use bulkWrite or insertMany for batches.
  • Use aggregation pipelines for complex filtering and transformation.
  • For large result sets, iterate with a cursor rather than loading everything into memory, and do joins with $lookup in an aggregation pipeline instead of in application code.
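A minimal range-based pagination sketch, assuming documents are paged in _id order (any indexed, strictly increasing sort key works; the products collection is illustrative):

// First page.
const firstPage = db.products.find().sort({ _id: 1 }).limit(20).toArray();
const last = firstPage[firstPage.length - 1];

// Next page: seek past the last _id instead of skipping rows.
db.products.find({ _id: { $gt: last._id } }).sort({ _id: 1 }).limit(20)

Unlike .skip(n), this stays fast for deep pages because the server does not have to scan and discard the skipped documents.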

πŸ”„ Transactions

  • Use multi-document transactions when atomicity is needed (e.g., a money transfer; see the sketch after this list).
  • Keep transactions short and limit the number of modified docs.
  • Single-document operations are atomic without a transaction.
  • Use transactions only in replica sets or sharded clusters.
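A minimal mongosh sketch of a money transfer inside a transaction (the bank database and account IDs are illustrative):

const session = db.getMongo().startSession();
const accounts = session.getDatabase("bank").accounts;

session.startTransaction();
try {
  // Both updates commit or abort together.
  accounts.updateOne({ _id: "A" }, { $inc: { balance: -100 } });
  accounts.updateOne({ _id: "B" }, { $inc: { balance: 100 } });
  session.commitTransaction();
} catch (e) {
  session.abortTransaction();
  throw e;
} finally {
  session.endSession();
}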

🧩 Sharding

  • Don’t shard early. Use vertical scaling first.
  • Choose a high-cardinality shard key with even distribution (see the sketch after this list).
  • Include the shard key in most queries to minimize scatter-gather operations.
  • Monitor chunk distribution with sh.status().
  • Each shard should be a member of a replica set.
  • Enable balancer for chunk migration; use zones for data locality.
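A hedged sketch of sharding an orders collection on a hashed customerId (the shop database and field name are illustrative):

// A hashed key spreads writes evenly across shards.
sh.enableSharding("shop")
sh.shardCollection("shop.orders", { customerId: "hashed" })

// Inspect chunk distribution afterwards.
sh.status()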

βš™οΈ Operations

  • Replica sets for high availability (HA): 3+ nodes, automatic failover via elections.
  • Backups: Use mongodump, filesystem snapshots, or Atlas continuous backups. Test restores.
  • Security: Enable authentication (SCRAM) and TLS encryption. Use role-based access control via db.createRole() and db.createUser() (see the sketch after this list).
  • Updates: Implement rolling upgrades to minimize downtime.
  • Monitoring: CPU, disk IOPS, oplog window.
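A minimal sketch of role-based access, assuming a hypothetical shop database and an analytics user that only needs to read from it:

use admin

// A custom role limited to read operations on one database.
db.createRole({
  role: "analyticsRead",
  privileges: [
    { resource: { db: "shop", collection: "" }, actions: ["find"] }
  ],
  roles: []
})

db.createUser({
  user: "analyst",
  pwd: passwordPrompt(),
  roles: [{ role: "analyticsRead", db: "admin" }]
})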

🚫 Common Mistakes

  • Over-normalizing like SQL.
  • Embedding large arrays that grow without limit.
  • Creating too many collections.
  • Not indexing properly or indexing everything.
  • Replacing full documents when only a field has changed. Prefer a targeted update with $set:
db.users.updateOne({ _id: id }, { $set: { email: newEmail } })
  • Skipping projections and fetching entire documents.
  • N+1 queries. Use $in or aggregation to batch reads (see the sketch after this list).
  • Not batching writes. Use insertMany or bulkWrite.
  • Ignoring errors on writes.
  • Leaving DB open with no auth.
  • Not planning for growth.
  • Indexing array fields creates multikey indexes, which can grow large and slow writes; avoid them unless you actually query the array.
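For instance, a hedged sketch of batching reads with $in and writes with bulkWrite (collection and field names are illustrative):

// One query instead of N separate findOne calls.
const ids = [1, 2, 3];
db.users.find({ _id: { $in: ids } })

// One round trip for several writes.
db.users.bulkWrite([
  { updateOne: { filter: { _id: 1 }, update: { $set: { active: true } } } },
  { insertOne: { document: { _id: 4, name: "New user" } } }
])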
