MongoDB Best Practices Guide
                        
MongoDB is a popular NoSQL document database used across many platforms. It stores data in flexible, JSON-like documents, which allow hierarchical structures instead of rigid tables. This guide offers practical best practices for junior to mid-level back-end engineers using MongoDB.
We'll cover schema design, performance tips, indexing, query patterns, transactions, sharding, operations, and common pitfalls. Concepts are explained with code examples and pros/cons tables.
Note: For the most up-to-date information, please refer to the official documentation.
🧱 Schema Design
In MongoDB, schemas are flexible, but thoughtful design prevents performance issues. Unlike relational databases, which rely on joins, MongoDB favors denormalization for fast reads; this must be balanced against the cost of keeping duplicated data consistent on updates.
To link related data, you can:
- Embed related data within a single document (denormalized).
- Reference data in separate collections via IDs (normalized).

Hybrid models combine both for complex apps.
Embedded
Embed when data is accessed together and bounded in size (e.g., <100 items to stay under 16MB document limit). This leverages atomic single-document operations.
Example:
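A minimal sketch of embedding, assuming a hypothetical blog where each post stores its (bounded) comments inline:

```javascript
// One document holds the post and its comments, so a single read
// returns everything and updates to it are atomic.
db.posts.insertOne({
  title: "Schema design in MongoDB",
  author: "alice",
  comments: [
    { user: "bob", text: "Great post!", createdAt: new Date() },
    { user: "carol", text: "Very helpful, thanks.", createdAt: new Date() }
  ]
})
```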
Pros and Cons:
| Aspect | Pros | Cons | 
|---|---|---|
| Performance | Single query for reads | Duplication if data shared | 
| Use Cases | One-to-few, read-heavy | Not for unbounded growth (e.g., logs) | 
| Limits | Atomic updates | Exceeds 16MB if arrays grow | 
Suitable for: Better read performance, one-to-few relationships, data read/written together, small subdocuments.
Reference
Use references for extensive or shared data. Resolve via separate queries or $lookup in aggregation.
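A minimal sketch of referencing, assuming hypothetical `users` and `orders` collections joined with `$lookup`:

```javascript
// Each order stores only the user's _id (a reference, not a copy).
// someUserId is a hypothetical variable holding an existing user's _id.
db.orders.insertOne({ userId: someUserId, total: 99.5 });

// Resolve the reference in one round trip with $lookup:
db.orders.aggregate([
  { $lookup: {
      from: "users",          // collection to join against
      localField: "userId",   // field in orders
      foreignField: "_id",    // field in users
      as: "user"              // output array of matching users
  } }
])
```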
Pros and Cons:
| Aspect | Pros | Cons | 
|---|---|---|
| Performance | Reduces duplication, handles growth | Multiple queries (N+1 risk) | 
| Use Cases | One-to-many, update-heavy | Slower reads without aggregation | 
| Limits | Flexible scaling | No atomic cross-doc updates without transactions | 
Polymorphic Schemas
For varying document types (e.g., events), use a _type field:
```javascript
{ _id: ..., _type: "click", data: { ... } }
{ _id: ..., _type: "purchase", data: { ... } }
```

Link to documentation: Mongo data models
🚀 Performance Tips
- Keep frequently accessed data and indexes in RAM (WiredTiger's cache defaults to 50% of RAM minus 1 GB). Monitor with `db.serverStatus().wiredTiger.cache` or tools like `mongostat`.
- Use a single shared client with connection pooling (e.g., `maxPoolSize: 100` in the Node.js driver); see the sketch after this list. Avoid opening a new connection per request to prevent excessive resource consumption.
- The defaults for read concern (`local`) and write concern (`w: 1`) are fine for many cases, but use `w: "majority"` for durability in replica sets. Read concerns control consistency (e.g., `majority` returns only committed data).
- Monitor with Atlas, Compass, or Ops Manager. Enable slow query logging with `db.setProfilingLevel(1, { slowms: 100 })`.
- Separate read-heavy workloads (analytics) from write-heavy ones (transactions) using different databases or replica sets.
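A minimal sketch of a single shared client, assuming the Node.js driver and hypothetical connection string and collection names:

```javascript
const { MongoClient } = require("mongodb");

// One client per process; the driver maintains the connection pool internally.
const client = new MongoClient("mongodb://localhost:27017", {
  maxPoolSize: 100,                 // cap on pooled connections
  writeConcern: { w: "majority" }   // durable writes on replica sets
});

async function main() {
  await client.connect();
  const users = client.db("app").collection("users"); // hypothetical names
  console.log(await users.countDocuments());
  await client.close();
}

main().catch(console.error);
```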
 
🧠 Indexing
- Index fields that appear in queries and sorts:

```javascript
db.users.createIndex({ email: 1 })
```

- Use compound indexes for multi-field queries:

```javascript
db.orders.createIndex({ customerId: 1, orderDate: -1 })
```

- Avoid over-indexing; each index consumes RAM and slows down writes.
- Check index usage with:

```javascript
db.collection.aggregate([{ $indexStats: {} }])
```

- Use projections to fetch only needed fields:

```javascript
db.users.find({}, { name: 1, email: 1, _id: 0 })
```

- Use partial indexes to include only documents that meet certain conditions:

```javascript
db.orders.createIndex(
  { email: 1 },
  { partialFilterExpression: { isActive: true } }
)
```

- Use TTL (Time-To-Live) indexes to auto-expire documents after a period:

```javascript
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
```

- Use sparse indexes when the field may not exist in every document:

```javascript
db.users.createIndex({ nickname: 1 }, { sparse: true })
```

- Use wildcard indexes to index fields without knowing their exact names. Useful when documents have dynamic or unpredictable fields:

```javascript
db.logs.createIndex({ "$**": 1 })
```

- Use geospatial indexes for location-based queries:

```javascript
db.places.createIndex({ location: "2dsphere" })
```

This allows efficient querying with `$near`, `$geoWithin`, etc.
Link to documentation: https://www.mongodb.com/docs/manual/indexes/
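To verify that a query actually uses one of these indexes, inspect its plan with `explain()`; a minimal sketch with a hypothetical query:

```javascript
// An indexed query shows an IXSCAN stage in the winning plan;
// a COLLSCAN stage means the whole collection was scanned.
db.users.find({ email: "a@example.com" }).explain("executionStats")
```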
🔍 Query Patterns
- Use `$in`, `$elemMatch`, `$exists`, and other query operators instead of filtering in application code.
- Avoid leading wildcards in regex queries; use anchored patterns like `/^start/`.
- Paginate with ranged queries instead of large `.skip()` values (see the sketch after this list).
- Use `bulkWrite` or `insertMany` for batches.
- Use aggregation pipelines for complex filtering and transformation.
- For large result sets, stream with a cursor (`find().cursor()`) and perform joins via `$lookup`.
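A minimal sketch of ranged pagination, assuming posts are paged by `_id` and `lastId` holds the last `_id` of the previous page (hypothetical variable):

```javascript
// Each page starts where the previous one ended, so the server never
// has to walk and discard skipped documents.
db.posts.find({ _id: { $gt: lastId } })
        .sort({ _id: 1 })
        .limit(20)
```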
🔄 Transactions
- Use multi-document transactions when atomicity is needed (e.g., a money transfer; see the sketch after this list).
- Keep transactions short and limit the number of modified documents.
- Single-document operations are atomic without a transaction.
- Use transactions only in replica sets or sharded clusters (they are not available on standalone servers).
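A minimal mongosh sketch of a money transfer, assuming a replica set and a hypothetical `bank.accounts` collection:

```javascript
// Both balance updates commit together or not at all.
const session = db.getMongo().startSession();
const accounts = session.getDatabase("bank").getCollection("accounts");

session.startTransaction();
try {
  accounts.updateOne({ _id: "alice" }, { $inc: { balance: -100 } });
  accounts.updateOne({ _id: "bob" }, { $inc: { balance: 100 } });
  session.commitTransaction();   // make both updates visible atomically
} catch (e) {
  session.abortTransaction();    // roll back everything on any error
  throw e;
} finally {
  session.endSession();
}
```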
 
🧩 Sharding
- Don't shard early; use vertical scaling first.
- Choose a high-cardinality shard key with even distribution (see the sketch after this list).
- Include the shard key in most queries to minimize scatter-gather operations.
- Monitor chunk distribution with `sh.status()`.
- Each shard should be a member of a replica set.
- Enable the balancer for chunk migration; use zones for data locality.
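A minimal sketch of picking an evenly distributed key, assuming a hypothetical `app.events` collection sharded on a hashed `userId`:

```javascript
// Hashing spreads monotonically increasing ids evenly across shards.
sh.enableSharding("app");
sh.shardCollection("app.events", { userId: "hashed" });
```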
 
⚙️ Operations
- Use replica sets for high availability (HA): 3+ nodes, automatic failover via elections.
- Backups: use `mongodump`, filesystem snapshots, or Atlas continuous backups. Test your restores.
- Security: enable authentication (SCRAM) and TLS encryption. Use role-based access control, e.g., `db.createRole()` (see the sketch after this list).
- Updates: use rolling upgrades to minimize downtime.
- Monitoring: watch CPU, disk IOPS, and the oplog window.
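A minimal sketch of role-based access, creating a hypothetical read-only analytics role and a user that holds it:

```javascript
// Role that can only run find() against the "app" database.
db.createRole({
  role: "analyticsReader",                      // hypothetical role name
  privileges: [
    { resource: { db: "app", collection: "" },  // all collections in "app"
      actions: ["find"] }
  ],
  roles: []                                     // inherits nothing
});

db.createUser({
  user: "analyst",
  pwd: "change-me",                             // placeholder credential
  roles: ["analyticsReader"]
});
```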
 
🚫 Common Mistakes
- Over-normalizing like SQL.
- Embedding large arrays that grow without limit.
- Creating too many collections.
- Not indexing properly, or indexing everything.
- Replacing full documents when only one field has changed; prefer a targeted update:

```javascript
db.users.updateOne({ _id: id }, { $set: { email: newEmail } })
```

- Skipping projections and fetching entire documents.
- N+1 queries: use `$in` or aggregation to batch reads (see the sketch after this list).
- Not batching writes: use `insertMany` or `bulkWrite`.
- Ignoring errors on writes.
- Leaving the database open with no authentication.
- Not planning for growth.
- Arrays create multikey indexes; avoid them when not needed.
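A minimal sketch of batched reads and writes; the ids and documents below are hypothetical:

```javascript
// One $in query replaces N separate lookups (the N+1 pattern).
const ids = [id1, id2, id3];            // hypothetical ids
db.users.find({ _id: { $in: ids } });

// bulkWrite sends several write operations in one round trip.
db.users.bulkWrite([
  { insertOne: { document: { name: "dave" } } },
  { updateOne: { filter: { name: "erin" }, update: { $set: { active: true } } } },
  { deleteOne: { filter: { name: "frank" } } }
]);
```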