Part 7: SQL Database Interview Questions and Answers
This chapter explores SQL Database questions that .NET engineers should be able to answer in an interview.
The answers are split into sections: what πΌ Junior, π Middle, and π Senior .NET engineers should know about a particular topic.
Also, please take a look at other articles in the series: C# / .NET Interview Questions and Answers
Core Concepts

β What is normalization, and when would you denormalize your schema?
Database normalization organizes data into related tables to reduce duplication and maintain consistency.
Each fact is stored once, and relationships define how data connects.
Normalization happens in stages β Normal Forms (1NF to 5NF) β each with stricter design rules that remove redundancy.

When to denormalize?
Normalization improves consistency, but it can slow down reads because it requires joins.
Sometimes you denormalize for performance β for example:
- Reporting dashboards that repeatedly join multiple large tables.
- E-commerce product listings where you store the category name inside the Products table to avoid joins.
- Analytics systems where query speed matters more than storage efficiency.
In modern systems, normalization is mainly a logical design principle, while denormalization is often applied physically β through materialized views, caching, or summary tables rather than manual duplication.
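To make "physical denormalization" concrete, here is a minimal sketch (assuming hypothetical Orders and OrderLines tables) of a summary table that a scheduled job rebuilds so reporting queries avoid repeated large joins:
-- Denormalized reporting table, rebuilt by a nightly job (illustrative schema)
CREATE TABLE DailySalesSummary (
    SalesDate  DATE NOT NULL PRIMARY KEY,
    OrderCount INT NOT NULL,
    Revenue    DECIMAL(18, 2) NOT NULL
);
-- Rebuild step: aggregate once, instead of joining large tables on every report
TRUNCATE TABLE DailySalesSummary;
INSERT INTO DailySalesSummary (SalesDate, OrderCount, Revenue)
SELECT CAST(o.OrderDate AS DATE), COUNT(DISTINCT o.OrderId), SUM(ol.Quantity * ol.UnitPrice)
FROM Orders o
JOIN OrderLines ol ON ol.OrderId = o.OrderId
GROUP BY CAST(o.OrderDate AS DATE);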
What .NET engineers should know about normalization:
- πΌ Junior: Know normalization avoids duplicate data and organizes tables by rules (1NF, 2NF, 3NFβ¦).
- π Middle: Understand trade-offs: normalized data is consistent, but denormalized data is faster to query, with the risk of duplication.
- π Senior: Decide where to denormalize (e.g., caching, read models, reporting) and design hybrid models that balance performance and maintainability.
π Resources: 5 Database Normalization Forms
β How would you explain the ACID properties to a junior developer, and why are they important?
ACID defines the guarantees that a transactional database provides to ensure data correctness and reliability.
It stands for:
- Atomicity β a transaction is all-or-nothing. Example: when transferring money, either both debit and credit succeed or neither does.
- Consistency β every transaction maintains the database's validity in accordance with rules and constraints (no orphan records, no negative balances).
- Isolation β transactions running at the same time behave as if they were executed sequentially.
- Durability β once a transaction is committed, its changes survive crashes or power loss.
Without ACID, data could be lost or duplicated β e.g., charging a user twice or creating inconsistent order records.
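As a minimal illustration of atomicity, here is a sketch of the money-transfer example as a single transaction (assuming a hypothetical Accounts table):
BEGIN TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountId = 1; -- debit
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountId = 2; -- credit
-- Either both updates are committed together...
COMMIT TRANSACTION;
-- ...or, if anything fails before the commit, everything is rolled back:
-- ROLLBACK TRANSACTION;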

What .NET engineers should know:
- πΌ Junior: Know that ACID keeps database operations safe and consistent.
- π Middle: Understand how transactions enforce atomicity and isolation, and when to use them in business operations.
- π Senior: Tune isolation levels, handle concurrency issues, and design systems that balance strict ACID with performance (e.g., when to use eventual consistency).
π Resources: ACID in Simple Terms
β What does SARGability mean in SQL?
SARGability (Search ARGument Able) describes whether a query condition can efficiently use an index.
A SARGable query allows the database engine to perform an Index Seek rather than scanning the entire table.
β SARGable example
SELECT * FROM Orders WHERE OrderDate >= '2024-01-01';
β Non-SARGable example:
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024;
In the second query, the YEAR() function blocks the use of the index on OrderDate, forcing a full scan.
The idea is to write conditions so that the column stands alone on one side of the comparison β no functions or complex expressions applied to it.
I usually check execution plans for βIndex Seekβ vs βIndex Scanβ to spot non-SARGable patterns.
What .NET engineers should know:
- πΌ Junior: Understand that SARGability helps SQL use indexes efficiently instead of scanning all rows.
- π Middle: Avoid functions, calculations, or mismatched types in WHERE clauses that block index seeks.
- π Senior: Analyze execution plans and ORM-generated SQL (e.g., EF Core LINQ queries) for non-SARGable patterns; design indexes and predicates together for high-performance queries.
π Resources: What Is Sargability in SQL
β Whatβs the difference between a WHERE clause and a HAVING clause? Can you give a practical example of when youβd need HAVING?

The difference lies in when each filter runs during query execution:
- WHERE filters rows before grouping/aggregation: "Give me all orders placed in 2025."
- HAVING filters groups after aggregation: "Give me all customers who placed more than five orders in 2025."
So if you're working with raw rows, use WHERE. If you're filtering based on an aggregate like COUNT(), SUM(), AVG(), you need HAVING.
Example: find customers with more than five orders.
-- Wrong (WHERE doesn't see aggregated COUNT)
SELECT CustomerId, COUNT(*) AS OrderCount
FROM Orders
WHERE COUNT(*) > 5 -- β invalid
GROUP BY CustomerId;
-- Correct (HAVING filters on aggregated result)
SELECT CustomerId, COUNT(*) AS OrderCount
FROM Orders
WHERE OrderDate >= '2025-01-01' AND OrderDate < '2026-01-01' -- β Filter by year first
GROUP BY CustomerId
HAVING COUNT(*) > 5;
Best practice:
- Use WHERE first to reduce the data volume before grouping - it's faster.
- Only use HAVING for aggregate conditions.
- Some databases (e.g., PostgreSQL) may internally optimize HAVING without aggregates into a WHERE.
In EF Core, this maps to:
var result = db.Orders
.Where(o => o.OrderDate.Year == 2025)
.GroupBy(o => o.CustomerId)
.Where(g => g.Count() > 5)
.Select(g => new { CustomerId = g.Key, OrderCount = g.Count() });
EF translates the second Where after the GroupBy into a SQL HAVING clause.
What .NET engineers should know:
- πΌ Junior: Know WHERE filters rows before grouping, HAVING filters after aggregation.
- π Middle: Understand query execution order (FROM → WHERE → GROUP BY → HAVING → SELECT) and filter early for performance.
- π Senior: Optimize aggregates and grouping operations, ensure correct index usage, and review generated SQL from LINQ group queries to avoid performance pitfalls.
π Resources: The Difference Between WHERE and HAVING in SQL
β What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN?
Joins define how rows from two tables are combined. They determine which rows are included when thereβs no matching data between the tables.
- INNER JOIN - returns only matching rows from both tables.
- LEFT JOIN - returns all rows from the left table and matching rows from the right; unmatched ones show NULL on the right side.
- RIGHT JOIN - the opposite of LEFT JOIN; all rows from the right, plus matching rows from the left.
- FULL OUTER JOIN - returns all rows from both tables, filling NULL for missing matches.

Example:
-- Tables: Users and Orders
-- INNER JOIN: only users who have orders
SELECT u.UserName, o.OrderDate
FROM Users u
INNER JOIN Orders o ON u.UserId = o.UserId;
-- LEFT JOIN: all users, even if no orders
SELECT u.UserName, o.OrderDate
FROM Users u
LEFT JOIN Orders o ON u.UserId = o.UserId;
-- RIGHT JOIN: all orders, even if user missing
SELECT u.UserName, o.OrderDate
FROM Users u
RIGHT JOIN Orders o ON u.UserId = o.UserId;
-- FULL OUTER JOIN: everything from both tables
SELECT u.UserName, o.OrderDate
FROM Users u
FULL OUTER JOIN Orders o ON u.UserId = o.UserId;
π‘ Best practices:
- Prefer LEFT JOIN over RIGHT JOIN - it's clearer and more widely supported.
- Use explicit JOIN syntax (not comma joins).
- Always filter joined data explicitly to avoid Cartesian products (huge cross-multiplication of rows).
In EF Core:
// INNER JOIN
var result = from u in db.Users
join o in db.Orders on u.UserId equals o.UserId
select new { u.UserName, o.OrderDate };
// LEFT JOIN
var leftJoin = from u in db.Users
join o in db.Orders on u.UserId equals o.UserId into userOrders
from o in userOrders.DefaultIfEmpty()
select new { u.UserName, OrderDate = (DateTime?)o.OrderDate };
EF Core automatically translates LINQ joins to SQL INNER JOIN or LEFT JOIN.
What .NET engineers should know:
- πΌ Junior: Know INNER JOIN shows only matching rows, LEFT/RIGHT keep all rows from one side.
- π Middle: Understand when to use each join, and how NULL values appear in results.
- π Senior: Optimize joins on large datasets (indexes, query plans), avoid Cartesian products, and design schemas that minimize unnecessary joins.
π Resources: SQL Join types explained visually
β Let's say you have a table for Posts and Comments. How would you model the database to retrieve a post along with all its associated comments efficiently?
The classic way is a one-to-many relationship:
- A Posts table with a primary key (PostId).
- A Comments table with a foreign key (PostId) pointing to Posts.
Schema:
CREATE TABLE Posts (
PostId INT PRIMARY KEY,
Title NVARCHAR(200),
Content NVARCHAR(MAX),
CreatedAt DATETIME2
);
CREATE TABLE Comments (
CommentId INT PRIMARY KEY,
PostId INT NOT NULL,
Author NVARCHAR(100),
Text NVARCHAR(MAX),
CreatedAt DATETIME2,
FOREIGN KEY (PostId) REFERENCES Posts(PostId)
);
Query to fetch a post with its comments:
SELECT p.PostId, p.Title, p.Content, c.CommentId, c.Author, c.Text, c.CreatedAt
FROM Posts p
LEFT JOIN Comments c ON p.PostId = c.PostId
WHERE p.PostId = @postId
ORDER BY c.CreatedAt;
Efficiency considerations:
- Index Comments.PostId for fast lookups.
- Use pagination if a post can have thousands of comments (OFFSET ... FETCH).
- In ORMs like EF Core, you can use eager loading:
// Eager loading (loads post + all comments)
var post = await db.Posts
.Include(p => p.Comments)
.FirstOrDefaultAsync(p => p.PostId == id);
// Lazy loading alternative (loads comments only when accessed)
db.ChangeTracker.LazyLoadingEnabled = true;
var post = await db.Posts.FindAsync(id);
var comments = post.Comments; // triggers lazy load
What .NET engineers should know:
- πΌ Junior: Know a post has many comments, linked by a foreign key.
- π Middle: Know how to query one-to-many related items using JOIN or EF Core .Include(). Handle pagination for large collections.
- π Senior: Optimize read-heavy scenarios - use caching, denormalized read models, or projections; tune EF loading strategies (eager, lazy, explicit) for scalability.
β How would you model a "self-referencing" relationship, like an employee-manager hierarchy, in a SQL table?
A self-referencing relationship means a tableβs rows relate to other rows in the same table.
In an employeeβmanager hierarchy, both employees and managers are stored in a single table β a manager is simply another employee.
Table design
You add a column that references the same tableβs primary key.
CREATE TABLE Employees (
EmployeeId INT PRIMARY KEY,
Name NVARCHAR(100) NOT NULL,
ManagerId INT NULL,
FOREIGN KEY (ManagerId) REFERENCES Employees(EmployeeId)
);
- EmployeeId - uniquely identifies each employee.
- ManagerId - points to another employee (their manager).
- Top-level managers (like CEOs) have NULL as ManagerId.
Example data:
| EmployeeId | Name | ManagerId |
|---|---|---|
| 1 | Alice (CEO) | NULL |
| 2 | Bob | 1 |
| 3 | Carol | 2 |
| 4 | Dave | 2 |
This creates a simple hierarchy: Alice → Bob → Carol/Dave
Querying the hierarchy
Find direct reports:
SELECT * FROM Employees WHERE ManagerId = 2; -- Bob's team
Find employee + manager name (self-join):
SELECT e.Name AS Employee, m.Name AS Manager
FROM Employees e
LEFT JOIN Employees m ON e.ManagerId = m.EmployeeId;
Find the complete hierarchy (recursive CTE):
WITH OrgChart AS (
SELECT EmployeeId, Name, ManagerId, 0 AS Level
FROM Employees
WHERE ManagerId IS NULL
UNION ALL
SELECT e.EmployeeId, e.Name, e.ManagerId, Level + 1
FROM Employees e
JOIN OrgChart o ON e.ManagerId = o.EmployeeId
)
SELECT * FROM OrgChart;
Design considerations
- Add indexes on ManagerId for faster lookups.
- Use recursive CTEs for reporting and hierarchy queries.
- Prevent circular references (e.g., an employee managing themselves) with constraints or triggers - a minimal guard is sketched below.
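A minimal sketch of that guard: a CHECK constraint stops the simplest cycle (an employee managing themselves); longer cycles still need a trigger or application-level validation.
ALTER TABLE Employees
ADD CONSTRAINT CK_Employees_NotOwnManager
CHECK (ManagerId IS NULL OR ManagerId <> EmployeeId);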
What .NET engineers should know:
- πΌ Junior: Know ManagerId references EmployeeId in the same table.
- π Middle: Understand how to query hierarchies using self-joins or recursive CTEs.
- π Senior: Design for large org charts - use indexes, detect cycles, and consider hierarchyid (SQL Server) or closure tables for complex trees.
β Whatβs the difference between a Primary Key, Unique Key, and Foreign Key?
These three types of constraints define how data relates to other data and remains consistent in a relational database.
| Key Type | Purpose | Allows NULLs | Can have duplicates? | Example Use |
|---|---|---|---|---|
| Primary Key | Uniquely identifies each row in a table. | No | No | Id column in Users table |
| Unique Key | Ensures all values in a column (or combination) are unique. | Yes (one in SQL Server, multiple in PostgreSQL/MySQL) | No | Email in Users table |
| Foreign Key | Links a row to another tableβs primary key to maintain referential integrity. | Yes | Yes | UserId in Orders table referencing Users.Id |
Example:
CREATE TABLE Users (
Id INT PRIMARY KEY,
Email NVARCHAR(100) UNIQUE,
Name NVARCHAR(100)
);
CREATE TABLE Orders (
Id INT PRIMARY KEY,
UserId INT,
FOREIGN KEY (UserId) REFERENCES Users(Id)
);
- The Primary Key (Users.Id) uniquely identifies each user.
- The Unique Key (Email) ensures no two users share the same email.
- The Foreign Key (Orders.UserId) ensures each order belongs to an existing user.
Best practices:
- Define indexes on foreign keys to improve join performance.
- Use composite keys when a single column doesn't uniquely identify a record (e.g., (OrderId, ProductId)).
- Avoid natural keys such as email addresses or usernames for PKs - use surrogate keys (INT or GUID).
In EF Core:
[Index(nameof(Email), IsUnique = true)] // EF Core applies [Index] at the entity class level
public class User
{
    [Key] public int Id { get; set; }
    public string Email { get; set; } = null!;
public ICollection<Order> Orders { get; set; } = new List<Order>();
}
public class Order
{
[Key] public int Id { get; set; }
[ForeignKey(nameof(User))] public int UserId { get; set; }
public User User { get; set; } = null!;
}
What .NET engineers should know:
- πΌ Junior: Understand that a Primary Key uniquely identifies a record, a Unique Key prevents duplicates, and a Foreign Key links related tables.
- π Middle: Know how these constraints enforce integrity in SQL and how EF Core represents them with [Key], [Index(IsUnique = true)], and [ForeignKey].
- π Senior: Design schemas with composite keys, cascade behaviors, and indexed foreign keys. Balance normalization with query performance.
π Resources: SQL Constraints
β How do foreign keys affect data integrity and performance?
A foreign key enforces a relationship between two tables. It ensures that the value in one table (the child) matches an existing value in another (the parent), thereby maintaining referential integrity. But this integrity comes with a performance cost β every insert, update, or delete must be validated by the database.
Example:
CREATE TABLE Orders (
Id INT PRIMARY KEY,
CustomerId INT,
FOREIGN KEY (CustomerId) REFERENCES Customers(Id)
);
This guarantees that every CustomerId in Orders exists in Customers.
Impact on performance:
- Every INSERT, UPDATE, or DELETE must check the parent table - adding a small validation cost.
- Foreign keys can improve query plans when indexed properly (faster joins).
- Use ON DELETE CASCADE or ON UPDATE CASCADE carefully - convenient, but they can cause large chained deletions.
When high-throughput writes are required (e.g., ETL, event ingestion), you may temporarily disable constraints or defer them until batch completion β but only when the application guarantees consistency.
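A minimal SQL Server sketch of that approach, assuming the constraint is named FK_Orders_Customers (the name is hypothetical):
-- Disable validation for the duration of the bulk load
ALTER TABLE Orders NOCHECK CONSTRAINT FK_Orders_Customers;
-- ... bulk insert runs here ...
-- Re-enable and re-validate existing rows so the constraint is trusted again
ALTER TABLE Orders WITH CHECK CHECK CONSTRAINT FK_Orders_Customers;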
In EF Core, foreign keys are automatically created for navigation properties unless configured otherwise:
modelBuilder.Entity<Order>()
.HasOne(o => o.Customer)
.WithMany(c => c.Orders)
.HasForeignKey(o => o.CustomerId)
.OnDelete(DeleteBehavior.Restrict);
This maps directly to a SQL foreign key constraint.
What .NET engineers should know:
- πΌ Junior: Understand that foreign keys ensure data consistency β every child row must reference an existing parent.
- π Middle: Know foreign keys add validation overhead on writes, but help query optimization when indexed.
- π Senior: Design trade-offs β when to use cascading deletes, when to disable or defer constraints for bulk inserts, and how to manage integrity in distributed or event-driven systems.
β When would you use a junction table in a many-to-many relationship?
A junction table (or bridge table) links two tables in a many-to-many relationship.
Each record in one table can relate to many in the other, and vice versa.
The junction table breaks this into two one-to-many relationships.
Example:
CREATE TABLE Students (
Id INT PRIMARY KEY,
Name NVARCHAR(100)
);
CREATE TABLE Courses (
Id INT PRIMARY KEY,
Title NVARCHAR(100)
);
CREATE TABLE StudentCourses (
StudentId INT,
CourseId INT,
PRIMARY KEY (StudentId, CourseId),
FOREIGN KEY (StudentId) REFERENCES Students(Id),
FOREIGN KEY (CourseId) REFERENCES Courses(Id)
);
Here, StudentCourses connects students and courses - a student can enroll in many classes, and each course can have many students.
If the link itself has attributes (e.g., EnrollmentDate, Grade), the junction table becomes a fully modeled entity rather than just a connector.
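For example, a sketch of the same junction table extended with assumed EnrollmentDate and Grade columns:
CREATE TABLE StudentCourses (
    StudentId      INT NOT NULL,
    CourseId       INT NOT NULL,
    EnrollmentDate DATE NOT NULL,
    Grade          DECIMAL(4, 2) NULL, -- nullable until the course is graded
    PRIMARY KEY (StudentId, CourseId),
    FOREIGN KEY (StudentId) REFERENCES Students(Id),
    FOREIGN KEY (CourseId) REFERENCES Courses(Id)
);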
In EF Core:
Implicit many-to-many (no explicit join entity):
modelBuilder.Entity<Student>()
.HasMany(s => s.Courses)
.WithMany(c => c.Students);
Explicit junction entity (when you need extra fields):
public class StudentCourse
{
public int StudentId { get; set; }
public int CourseId { get; set; }
public DateTime EnrolledOn { get; set; }
}
What .NET engineers should know:
- πΌ Junior: Understand that a junction table connects two tables when many-to-many relationships are required.
- π Middle: Know how to define and query many-to-many relationships and avoid storing lists of IDs in a single column.
- π Senior: Choose between implicit EF Core many-to-many mappings and explicit junction entities. Optimize for large datasets with proper indexes and handle cascade rules carefully.
Querying and Optimization

β How would you return all users and their last order date, even if some users have no orders?
Youβd join the Users table with the Orders table using a LEFT JOIN, so users without orders still appear.
To get the last order date, use MAX(order_date) and group by the user.
Example:
SELECT
u.UserId,
u.UserName,
MAX(o.OrderDate) AS LastOrderDate
FROM Users u
LEFT JOIN Orders o
ON u.UserId = o.UserId
GROUP BY u.UserId, u.UserName
ORDER BY LastOrderDate DESC;This ensures:
- Users with orders show their latest order date.
- Users with no orders still appear, but LastOrderDate will be NULL.
π‘ Performance tip:
For large datasets, use a window function instead of grouping:
SELECT UserId, UserName, OrderDate
FROM (
SELECT
u.UserId, u.UserName, o.OrderDate,
ROW_NUMBER() OVER (PARTITION BY u.UserId ORDER BY o.OrderDate DESC) AS rn
FROM Users u
LEFT JOIN Orders o ON u.UserId = o.UserId
) AS ranked
WHERE rn = 1;
This avoids complete aggregation when you only need the most recent row per user.
What .NET engineers should know:
- πΌ Junior: Know how to join tables and why to use LEFT JOIN to keep all users.
- π Middle: Understand grouping and aggregation (MAX, GROUP BY), and how to handle NULL results safely.
- π Senior: Optimize for scale - use indexes on UserId and OrderDate, consider window functions for efficiency, and avoid unnecessary sorting in ORM queries.
β How does a subquery differ from a JOIN?
A JOIN combines data from multiple tables into one result set by linking rows that share a related key. A subquery runs a nested query first, then uses its result in the outer query β often as a filter or computed value.
Both return similar results, but the JOIN is usually faster and more readable for multi-table queries.
Example (JOIN):
SELECT o.OrderId, u.UserName
FROM Orders o
JOIN Users u ON o.UserId = u.UserId;
Example (Subquery):
SELECT OrderId
FROM Orders
WHERE UserId IN (SELECT UserId FROM Users WHERE IsActive = 1);
When to use each:
- Use a JOIN when you need data from multiple tables side-by-side.
- Use a subquery when you only need to reference another tableβs value (like filtering or aggregation).
π‘ Performance note: Modern SQL optimizers often rewrite subqueries as joins, but correlated subqueries (those that reference the outer query) can be slower since they may execute once per row. Prefer CTEs or JOINs for clarity and optimization hints in complex logic.
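To illustrate the difference, here is a sketch of a correlated subquery and an equivalent set-based JOIN, using the same Users/Orders tables as above:
-- Correlated subquery: the inner query is logically evaluated per outer row
SELECT u.UserId, u.UserName,
       (SELECT MAX(o.OrderDate) FROM Orders o WHERE o.UserId = u.UserId) AS LastOrderDate
FROM Users u;
-- Equivalent JOIN + GROUP BY: one set-based pass the optimizer can plan freely
SELECT u.UserId, u.UserName, MAX(o.OrderDate) AS LastOrderDate
FROM Users u
LEFT JOIN Orders o ON o.UserId = u.UserId
GROUP BY u.UserId, u.UserName;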
What .NET engineers should know:
- πΌ Junior: Understand that JOIN merges tables, while a subquery runs a query inside another query.
- π Middle: Know that JOINs are typically more efficient, but subqueries can simplify logic when you only need a single value or aggregate (like MAX(), COUNT(), etc.).
- π Senior: Use subqueries carefully - they can hurt performance if executed per row. Replace them with JOINs or CTEs when possible for clarity and optimizer hints. Understand that some databases internally rewrite subqueries into joins.
π Resources: Joins SQL Server
β What is a Common Table Expression (CTE) and how does it differ from a temporary table?
A Common Table Expression (CTE) is a temporary, named result set defined within a query.
A temporary table is a physical object created in the temp database that can be reused within the same session.
Example scenario
We needed to find customers who placed multiple orders in the last 30 days and spent more than 10% above their average purchase value.
The original query had several nested subqueries and was hard to maintain, so it was refactored using a CTE:
CTE
WITH RecentOrders AS (
SELECT CustomerId, SUM(Amount) AS TotalSpent
FROM Orders
WHERE OrderDate >= DATEADD(DAY, -30, GETDATE())
GROUP BY CustomerId
),
AverageSpending AS (
SELECT CustomerId, AVG(Amount) AS AvgSpent
FROM Orders
GROUP BY CustomerId
)
SELECT r.CustomerId, r.TotalSpent, a.AvgSpent
FROM RecentOrders r
JOIN AverageSpending a ON r.CustomerId = a.CustomerId
WHERE r.TotalSpent > a.AvgSpent * 1.1;
CTEs improve readability by breaking complex queries into logical parts. They exist only during query execution.
Example with a temporary table:
SELECT * INTO #RecentOrders
FROM Orders WHERE OrderDate >= DATEADD(DAY, -30, GETDATE());
CREATE INDEX IX_RecentOrders_CustomerId ON #RecentOrders(CustomerId);
Temporary tables persist for the session, can be indexed, and are helpful when the same data is reused multiple times.
When should you use a CTE vs. a temporary table?
| Technique | Use When | Advantages | Limitations |
|---|---|---|---|
| CTE | Query needs multiple logical steps or recursion | Improves readability; no cleanup needed | Re-evaluated on each reference |
| Temp Table | Data reused or needs indexing | Can persist and be optimized | Extra I/O and storage overhead |
CTEs can also be recursive β ideal for hierarchical data (e.g., employee trees, folder structures).
What .NET engineers should know:
- πΌ Junior: Know what CTEs and temp tables are and how they simplify complex queries.
- π Middle: Understand performance trade-offs β CTEs are inline, temp tables can be indexed and reused.
- π Senior: Choose based on workload β use temp tables for large intermediate datasets and CTEs for clarity or recursion. In ORMs like EF Core, CTEs often appear in generated SQL for LINQ groupings or projections.
β What are window functions (ROW_NUMBER, RANK, DENSE_RANK, etc.), and where are they useful?
Window functions perform calculations across a set of rows related to the current row β without collapsing them like GROUP BY does. Theyβre used for ranking, running totals, moving averages, and comparing rows within partitions.
Example:
SELECT
EmployeeId,
DepartmentId,
Salary,
ROW_NUMBER() OVER (PARTITION BY DepartmentId ORDER BY Salary DESC) AS RowNum,
RANK() OVER (PARTITION BY DepartmentId ORDER BY Salary DESC) AS RankNum,
DENSE_RANK() OVER (PARTITION BY DepartmentId ORDER BY Salary DESC) AS DenseRankNum
FROM Employees;
- ROW_NUMBER() gives unique sequential numbers.
- RANK() skips numbers when there are ties.
- DENSE_RANK() doesn't skip numbers on ties.
Other common window functions:
- SUM() / AVG() with OVER() - running totals and moving averages.
- LAG() / LEAD() - compare current and previous/next rows (e.g., change since last month).
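A small sketch of both, assuming a hypothetical MonthlySales table with SalesMonth and Revenue columns:
SELECT
    SalesMonth,
    Revenue,
    SUM(Revenue) OVER (ORDER BY SalesMonth)           AS RunningTotal,
    Revenue - LAG(Revenue) OVER (ORDER BY SalesMonth) AS ChangeVsPreviousMonth
FROM MonthlySales;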
π‘ Performance tips:
- Index partition and order columns used in the window definition.
- Avoid sorting giant unindexed sets - the ORDER BY in a window function can be expensive.
- Prefer window functions over self-joins or correlated subqueries - they're more efficient and readable.
In EF Core, window functions are typically used via raw SQL (e.g., FromSqlRaw), while ordering combined with Take/Skip translates to TOP or OFFSET/FETCH pagination:
var topEarners = db.Employees
    .OrderByDescending(e => e.Salary)
    .Take(10)
    .Select(e => new { e.Name, e.Salary })
    .ToList();
What .NET engineers should know:
- πΌ Junior: Know that window functions let you number or compare rows without grouping them.
- π Middle: Use them for pagination, top-N queries, or comparisons. Know the differences between ROW_NUMBER, RANK, and DENSE_RANK.
- π Senior: Optimize partitioning and ordering; apply window functions to replace self-joins or nested queries for better performance. Ensure indexing aligns with partition keys on large datasets.
π Resources: Introduction to T-SQL Window Functions
β You have a query that needs to filter on a column that can contain NULL values. What are some pitfalls to avoid?
In SQL, NULL means unknown, not βemptyβ β which affects comparisons, filters, and even indexes. SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. Thatβs why queries involving NULL behave differently than expected.
1. Comparisons with NULL
Comparing with = or != never returns TRUE:
SELECT * FROM Users WHERE Email = NULL; -- β Always returns 0 rows
β Correct way:
SELECT * FROM Users WHERE Email IS NULL;
SELECT * FROM Users WHERE Email IS NOT NULL;
2. Beware of NOT IN with NULLs
If the subquery or list contains a NULL, your entire comparison fails because of SQLβs three-valued logic.
-- Suppose one of these Ids is NULL
SELECT * FROM Users WHERE Id NOT IN (SELECT UserId FROM BannedUsers);
-- β Returns 0 rows if any UserId is NULL
β Use NOT EXISTS instead:
SELECT * FROM Users u
WHERE NOT EXISTS (
SELECT 1 FROM BannedUsers b WHERE b.UserId = u.Id
);
3. Indexes and NULLs
- Most databases include NULLs in indexes, but their handling can differ.
- SQL Server and PostgreSQL include them by default; PostgreSQL also allows partial indexes:
Example (exclude NULLs from index):
CREATE INDEX IX_Users_Email_NotNull
ON Users(Email)
WHERE Email IS NOT NULL;
4. Aggregations skip NULLs
- Functions like COUNT(column) ignore NULLs.
- If you want to count all rows, use COUNT(*).
-- Counts only non-null emails
SELECT COUNT(Email) FROM Users;
-- Counts all users
SELECT COUNT(*) FROM Users;
5. Handle NULLs explicitly
Use COALESCE or ISNULL to replace unknowns:
SELECT COALESCE(Phone, 'N/A') FROM Users;
Note:
In C#, null is not the same as SQL NULL. When using ADO.NET or EF Core, DBNull.Value represents a SQL null β ensure proper conversion when reading or writing nullable columns.
What .NET engineers should know:
- πΌ Junior: Understand NULL means "unknown," not empty. Use IS NULL and IS NOT NULL.
- π Middle: Know pitfalls like NOT IN with NULLs, and that aggregations skip nulls. Learn how indexing treats null values.
- π Senior: Design schema and defaults to minimize NULL issues. Handle conversions properly in code (DBNull vs null), and ensure queries, constraints, and ORM mappings behave consistently.
β How do you decide between using a JOIN in the database versus handling the relationship in your application code?
Whether to JOIN in SQL or in application code depends on where itβs cheaper and cleaner to combine the data.
Use a JOIN in the database when:
- Tables have a clear relational link (Orders → Customers).
- The database can perform filtering, aggregation, or sorting more efficiently.
- You want to minimize round-trips between the app and DB.
- Data volumes are moderate and join operations are well-indexed.
Example (JOIN in DB is usually better):
SELECT o.OrderId, u.UserName
FROM Orders o
JOIN Users u ON o.UserId = u.UserId;
Handle in application code when:
- Data comes from multiple sources (SQL, external APIs, caches, or services).
- Joins would require complex logic that cannot be expressed in SQL (e.g., custom business rules, ML scoring, etc.)
- Youβre batching or caching data to improve performance across requests.
- You need loose coupling (e.g., in microservices β separate read models per service).
If Users were in SQL, but order details came from a third-party API, youβd fetch them separately in C# and merge in memory.
In EF Core:
var orders = db.Orders
.Include(o => o.User) // Executes a JOIN
.ToList();
Use .Include() or .ThenInclude() when the data lives in the same DB. If the data comes from another source (e.g., an API), load it separately and merge it in memory.
What .NET engineers should know:
- πΌ Junior: Understand that JOIN combines related tables, while app-side joins merge data after retrieval.
- π Senior: Decide on a join strategy based on latency, data ownership, and scalability. Avoid chatty queries; favor server-side joins for local data and app joins for distributed systems.
β What is the difference between COUNT(*) and COUNT(column_name)?
COUNT(*) counts all rows in the result set β including those with NULL values. COUNT(column_name) counts only rows where the column is not NULL.
Example:
SELECT
COUNT(*) AS TotalUsers, -- Counts all rows
COUNT(Email) AS UsersWithEmail -- Counts only rows where Email IS NOT NULL
FROM Users;
- In most databases (SQL Server, PostgreSQL, MySQL), COUNT(*) does not read all columns - it's internally optimized to count rows from metadata or index statistics.
- COUNT(1) behaves identically to COUNT(*); it's a common myth that it's faster.
In EF Core:
var total = await db.Users.CountAsync();
var withEmail = await db.Users.CountAsync(u => u.Email != null);
Both queries translate directly to the correct SQL form (COUNT(*) vs COUNT(column)).
What .NET engineers should know:
- πΌ Junior: Remember that COUNT(*) counts every row, while COUNT(column) skips NULLs.
- π Middle: Use COUNT(column) when checking for filled fields; understand that COUNT(1) is not faster than COUNT(*).
- π Senior: Know how COUNT is optimized by the DB engine (metadata or index scans) and ensure correct semantic intent when using ORM predicates (!= null).
β How can you pivot or unpivot data in SQL?
Pivoting converts rows into columns, summarizing data (e.g., totals by quarter).
Unpivoting does the opposite β it converts columns into rows, useful for normalizing wide tables.
Example (Pivot):
SELECT *
FROM (
SELECT Year, Quarter, Revenue
FROM Sales
) AS Source
PIVOT (
SUM(Revenue) FOR Quarter IN ([Q1], [Q2], [Q3], [Q4])
) AS PivotTable;
Example (Unpivot):
SELECT Year, Quarter, Revenue
FROM SalesPivoted
UNPIVOT (
Revenue FOR Quarter IN ([Q1], [Q2], [Q3], [Q4])
) AS Unpivoted;
Alternative (cross-platform):
Not all databases support PIVOT syntax (e.g., MySQL).
You can achieve the same result with conditional aggregation:
SELECT
Year,
SUM(CASE WHEN Quarter = 'Q1' THEN Revenue END) AS Q1,
SUM(CASE WHEN Quarter = 'Q2' THEN Revenue END) AS Q2
FROM Sales
GROUP BY Year;
π‘ Performance note:
- Always aggregate before pivoting large datasets.
- Ensure the pivot key (e.g., Quarter) has limited distinct values.
- Avoid pivoting extremely wide datasets in OLTP systems - it's better to handle such reshaping in ETL or reporting layers.
What .NET engineers should know:
- πΌ Junior: Know that PIVOT turns rows into columns for summaries; UNPIVOT does the reverse.
- π Middle: Understand PIVOT syntax and how to emulate it using CASE + GROUP BY when native syntax isn't supported.
- π Senior: Understand that PIVOT can be replaced with conditional aggregation using CASE + GROUP BY for flexibility and performance. Always validate indexing and memory usage when transforming large datasets.
π Resources: Using PIVOT and UNPIVOT
β How do you find duplicates in a table?
To find duplicates, group rows by the columns that define "uniqueness" and filter with HAVING COUNT(*) > 1.
Example:
SELECT Email, COUNT(*) AS Count
FROM Users
GROUP BY Email
HAVING COUNT(*) > 1;
What .NET engineers should know:
- πΌ Junior: Learn to use GROUP BY with HAVING COUNT(*) > 1 to detect duplicate values.
- π Middle: Can join this result back to the original table to inspect complete duplicate rows, or use ROW_NUMBER() to keep only one instance and remove the others (see the sketch below).
- π Senior: Understand root causes (missing constraints, concurrent inserts, ETL logic). Prevent duplicates via unique indexes, transactions, and data validation - not just cleanup queries.
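A sketch of that ROW_NUMBER() cleanup (SQL Server syntax; assumes Users has an Id column that decides which duplicate survives):
WITH Ranked AS (
    SELECT Id,
           ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Id) AS rn
    FROM Users
)
DELETE FROM Ranked WHERE rn > 1; -- deletes every duplicate except the first per Email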
β Whatβs the difference between UNION and UNION ALL?
- UNION combines results from multiple queries and removes duplicates.
- UNION ALL also combines results but keeps all rows, including duplicates - it's faster because it skips the distinct check.
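In SQL, using the same Customers and Subscribers tables as the EF Core example below:
-- UNION removes duplicate rows across both results
SELECT Name, Email FROM Customers
UNION
SELECT Name, Email FROM Subscribers;
-- UNION ALL keeps every row, including duplicates (no distinct step)
SELECT Name, Email FROM Customers
UNION ALL
SELECT Name, Email FROM Subscribers;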
In EF Core:
var query = db.Customers
.Select(c => new { c.Name, c.Email })
.Union(db.Subscribers.Select(s => new { s.Name, s.Email })); // UNION by default
var all = db.Customers
.Select(c => new { c.Name, c.Email })
.Concat(db.Subscribers.Select(s => new { s.Name, s.Email })); // UNION ALL equivalent
In LINQ, Union() performs DISTINCT behavior, while Concat() keeps duplicates.
What .NET engineers should know:
- πΌ Junior: Remember - UNION removes duplicates, UNION ALL doesn't.
- π Middle: Know UNION adds sorting overhead; prefer UNION ALL for large or non-overlapping datasets.
- π Senior: Understand that UNION requires a sort or hash operation to eliminate duplicates. On large datasets, this can be costly - choose UNION ALL with downstream deduplication if performance matters more than strict uniqueness.
π Resources: Set Operators - UNION
Indexing and Query Optimization

β Whatβs the difference between a clustered and a non-clustered index?
- A clustered index defines the physical order of data in a table β the tableβs rows are stored directly in that index.
- A non-clustered index is a separate structure that stores key values and pointers (row locators) to the actual data.
Key difference between Clustered and non-clustered index:
| Feature | Clustered Index | Non-Clustered Index |
|---|---|---|
| Physical data order | Yes | No |
| Number per table | 1 | Many |
| Storage | Data pages are the index | Separate structure |
| Best for | Range scans, sorting | Targeted lookups, filters |
| Lookup cost | Direct (data is in index) | Requires key lookup to table |
Clustered index

- A table can have only one clustered index.
- The data rows themselves are stored in that order.
- The clustered index key is automatically part of every non-clustered index.
Example:
CREATE CLUSTERED INDEX IX_Orders_OrderId
ON Orders(OrderId);
This makes the table physically sorted by OrderId.
When to use: Primary key or frequently range-filtered column (e.g., OrderDate, Id).
Non-clustered index

- Doesnβt change the data order β itβs a lookup table pointing to rows.
- You can have many non-clustered indexes.
- Useful for search-heavy queries on non-key columns.
Example:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON Orders(CustomerId);
π‘ Tip:
- Every table should have a clustered index, usually on the primary key.
- Add non-clustered indexes for frequent filters, joins, or sorts.
- Keep indexes lean β too many slow down inserts/updates.
What .NET engineers should know:
- πΌ Junior: Know clustered = main data order, non-clustered = secondary index.
- π Middle: Understand performance trade-offs and how SQL uses indexes for lookups and sorting.
- π Senior: Design index strategies based on workload β balance reads vs writes, monitor fragmentation, and tune composite or covering indexes based on query patterns.
π Resources: Clustered and nonclustered indexes
β Can you explain what a composite index is and why the order of columns in it matters?
A composite index (or multi-column index) is built on two or more columns of a table.
It helps speed up queries that filter or sort by the same column combinations.
Example
CREATE INDEX IX_Orders_CustomerId_OrderDate
ON Orders(CustomerId, OrderDate);
This index works efficiently for:
-- Uses both columns in the index
SELECT * FROM Orders
WHERE CustomerId = 42 AND OrderDate > '2025-01-01';
...but not this one:
-- Only filters by OrderDate β won't use the composite index efficiently
SELECT * FROM Orders
WHERE OrderDate > '2025-01-01';
That's because the index is sorted first by CustomerId, then by OrderDate. The column order defines which filters can use the index efficiently - the database can use the index for the leftmost prefix of the defined order.
π‘ Best practices:
- Put the most selective column first - the one that filters out the most rows.
- Match column order to your most frequent query patterns.
- Avoid redundant indexes - (A, B) already covers (A) in most databases.
- Use INCLUDE columns (SQL Server) for extra fields used in SELECT to create a covering index.
In EF Core, the same logic applies β queries must align with index column order:
// Uses both parts of the composite index
var orders = await db.Orders
.Where(o => o.CustomerId == 42 && o.OrderDate > new DateTime(2025, 1, 1))
.ToListAsync();
If you filter only by OrderDate, the database may perform an index scan instead of a seek.
What .NET engineers should know:
- πΌ Junior: Know that composite indexes combine multiple columns for faster lookups.
- π Middle: Understand that column order affects which queries can use the index.
- π Senior: Design composite indexes based on query selectivity and workload patterns; avoid redundant indexes and use covering or filtered variants for critical queries.
π Resources: Multicolumn Indexes
β What are the different types of indexes available in SQL databases?
Indexes come in several flavors, each optimized for a different kind of query or storage strategy. Think of them like various kinds of maps β each one helps you find data faster, but in its own way.
1. Clustered Index

- Defines the physical order of table data.
Each table can have only one clustered index β the table is the index.
β Best for range queries, sorting, or primary key lookups.
CREATE CLUSTERED INDEX IX_Orders_OrderId ON Orders(OrderId);
2. Non-Clustered Index
- A separate structure from the table.
- Points to data rows (like a book index).
- You can have many per table.
- Supports quick lookups for frequently filtered columns.
β Great for frequent filters and lookups on non-key columns.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON Orders(CustomerId);
3. Composite (Multicolumn) Index
Combines multiple columns into one index.
- β Useful when queries filter or sort by a combination of columns.
- β οΈ Column order matters - (A, B) ≠ (B, A).
CREATE INDEX IX_Orders_CustomerId_OrderDate ON Orders(CustomerId, OrderDate);
4. Covering Index (with Included Columns)
Includes all columns needed by a query so that the database can serve results without table lookups.
- β Improves read performance, especially for frequent, read-heavy queries.
- β οΈ Increases index size and slows writes.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON Orders(CustomerId)
INCLUDE (OrderDate, Total);
5. Unique Index
Ensures all indexed values are distinct β often created automatically by PRIMARY KEY or UNIQUE constraints.
CREATE UNIQUE INDEX IX_Users_Email ON Users(Email);
6. Filtered / Partial Index
Indexes only a subset of rows based on a condition.
- β Saves space and speeds up targeted queries.
- β οΈ Limited to databases that support it (e.g., SQL Server, PostgreSQL).
CREATE UNIQUE INDEX IX_Subscriptions_Active
ON Subscriptions(UserId)
WHERE IsActive = 1;
7. Full-Text Index
Specialized for searching inside long text fields or documents. Supports CONTAINS / FREETEXT queries.
CREATE FULLTEXT INDEX ON Articles(Title, Body)
KEY INDEX PK_Articles;
8. Spatial Index
Optimized for geolocation or geometric data (e.g., points, polygons).
Enables queries like βfind locations within 10 km.β
CREATE SPATIAL INDEX SIDX_Locations_Geo ON Locations(Geo);
9. Columnstore Index
Stores data in a columnar format rather than in rows.
Ideal for analytics and aggregation-heavy workloads (OLAP).
β Massive compression, optimized for SUM, COUNT, and AVG queries.
β οΈ Slower for frequent single-row lookups.
CREATE CLUSTERED COLUMNSTORE INDEX IX_Sales_ColumnStore ON Sales;
10. Hash Index (Memory-Optimized Tables)
Used in in-memory database tables for constant-time lookups.
Available in SQL Server (In-Memory OLTP) and PostgreSQL hash indexes.
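A minimal SQL Server In-Memory OLTP sketch (the Sessions table is hypothetical, and the database needs a memory-optimized filegroup):
CREATE TABLE dbo.Sessions (
    SessionId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000), -- point lookups in constant time
    UserId    INT NOT NULL,
    StartedAt DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);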
Summary
| Type | Stores Data | Unique | Best For | Trade-off |
|---|---|---|---|---|
| Clustered | Yes | Yes | Range scans, sorting | Only one per table |
| Non-Clustered | No | Optional | Filters and lookups | Extra lookups |
| Composite | No | Optional | Multi-column filters | Column order matters |
| Covering | No | Optional | Read-heavy queries | Larger index size |
| Unique | No | Yes | Data integrity | None |
| Filtered | No | Optional | Partial data sets | Limited support |
| Full-Text | No | Optional | Text search | Storage-heavy |
| Spatial | No | Optional | Geo queries | Complex setup |
| Columnstore | Columnar | Optional | Analytics workloads | Slow single-row ops |
EF Core
Doesn't automatically create indexes, except for primary and foreign keys. You can define custom ones using the Fluent API:
modelBuilder.Entity<Order>()
.HasIndex(o => new { o.CustomerId, o.OrderDate })
.HasDatabaseName("IX_Orders_CustomerId_OrderDate");
What .NET engineers should know:
- πΌ Junior: Know basic index types (clustered, non-clustered, unique).
- π Middle: Understand composite, covering, and filtered indexes and their trade-offs.
- π Senior: Design an overall indexing strategy β analyze query workloads, tune for selectivity, monitor index usage, and remove redundant or overlapping indexes.
π Resources: Index architecture and design guide
β How would you debug a slow query, and what tools would you use?
When a query is slow, start by verifying where the delay comes from β app logic, ORM, or the database itself.
Once itβs confirmed as a SQL issue, follow these steps:
1. Confirm the bottleneck
Before diving into SQL:
- Log timings at the app level (for example, EF Core logs SQL durations).
- Check whether the network, the ORM-generated SQL, or the database itself is causing the slowness.
Once youβre sure the query is the issue, it's time to go deeper.
2. Capture slow queries
| Database | Built-in Tools |
|---|---|
| SQL Server | Query Store, SQL Profiler, Extended Events |
| PostgreSQL | pg_stat_statements, log_min_duration_statement |
| MySQL / MariaDB | Slow Query Log, performance_schema, SHOW PROFILES |
| Oracle | Automatic Workload Repository (AWR) |
| SQLite | EXPLAIN QUERY PLAN |
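For example, a SQL Server sketch: enable Query Store (MyAppDB is a placeholder name) and list the slowest queries it has captured:
ALTER DATABASE MyAppDB SET QUERY_STORE = ON;
SELECT TOP (10)
    qt.query_sql_text,
    rs.avg_duration,      -- microseconds
    rs.count_executions
FROM sys.query_store_query_text qt
JOIN sys.query_store_query q          ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p           ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
ORDER BY rs.avg_duration DESC;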
3. Reproduce and measure
Run the query manually in your SQL tool (SSMS, pgAdmin, MySQL Workbench, etc.)
Note execution time, result size, and resource usage.
4. Check the execution plan
| Database | How to see plan | Example |
|---|---|---|
| SQL Server | Ctrl+M in SSMS > Include Actual Execution Plan | Identifies index scans, key lookups, bad joins. |
| PostgreSQL | EXPLAIN (ANALYZE, BUFFERS) | Shows real runtime and I/O cost. |
| MySQL | EXPLAIN ANALYZE (MySQL 8+) | Displays cost, row estimates, and actual timings. |
Look for:
- Table scans instead of index seeks.
- Large join loops or missing indexes.
- Misestimated row counts (outdated statistics).
5. Optimize
Common fixes:
- Add or tune indexes.
- Simplify joins and filters.
- Use selective
WHEREconditions. - Update statistics and ensure query parameters donβt trigger 'bad' plans (parameter sniffing).
- In MySQL/PostgreSQL, check buffer pool or work_mem settings for memory limits.
π‘ In .NET / EF Core:
Use simple query logging or diagnostic interceptors:
optionsBuilder.LogTo(Console.WriteLine, LogLevel.Information);
You can also tag queries for easy tracking:
var query = db.Orders.TagWith("Top slow query: Orders Dashboard")
.Where(o => o.Status == "Pending");
What .NET engineers should know:
- πΌ Junior: Be able to identify slow queries via logs and EF Core timing; understand that database design affects performance.
- π Middle: Read execution plans, detect missing indexes or bad joins, and fix obvious inefficiencies.
- π Senior: Diagnose advanced issues β parameter sniffing, plan cache reuse, or I/O bottlenecks; know when to recompile, refactor, or redesign data models.
β What is parameter sniffing, and how can it cause performance issues?
Parameter sniffing happens when the database engine caches an execution plan for a query or stored procedure based on the first parameter values it receives β and then reuses that same plan for all future executions, even when parameter values vary drastically.
Example of the problem
Letβs say you have a stored procedure:
CREATE PROCEDURE GetOrdersByCustomer
@CustomerId INT
AS
BEGIN
SELECT *
FROM Orders
WHERE CustomerId = @CustomerId;
END;
- The first call might be for a small customer (10 orders).
- SQL builds a plan optimized for few rows - likely using an Index Seek.
- The next call might be for a large customer (100,000 orders).
- The same cached plan runs, causing slow performance - perhaps a Nested Loop Join instead of a Hash Join.
That's parameter sniffing - a plan optimized for one scenario reused where it doesn't fit.
π Why it happens
SQL Server and other databases cache query plans to save CPU time.
But if data distribution is skewed (some customers have 10 rows, others 100k), the βsniffedβ parameter can cause inefficient plans for future calls.

β How to fix it
- If performance depends heavily on parameter values: use OPTIMIZE FOR UNKNOWN or RECOMPILE.
- For predictable parameters, keep sniffing, as plan caching can improve overall performance.
Option 1 β Use local variables
This prevents the optimizer from βsniffingβ the parameter value.
DECLARE @cid INT = @CustomerId;
SELECT * FROM Orders WHERE CustomerId = @cid;
Downside: the plan might not be fully optimized for any specific parameter, but it avoids extremes
Option 2 β Use OPTIMIZE FOR UNKNOWN
SELECT * FROM Orders
WHERE CustomerId = @CustomerId
OPTION (OPTIMIZE FOR UNKNOWN);
This tells SQL Server to ignore the first parameter value and use general statistics instead.
Option 3 β Use RECOMPILE hint
SELECT * FROM Orders
WHERE CustomerId = @CustomerId
OPTION (RECOMPILE);
This builds a new plan for every execution - always optimal but more CPU-heavy.
β Always optimal for current parameters.
β οΈ More CPU overhead - best for procedures run infrequently or with unpredictable data.
Option 4 β Manually clear or rebuild plans
- Clear the cached plan: DBCC FREEPROCCACHE; (use sparingly).
- Rebuild statistics if they're stale or skewed.
What .NET engineers should know:
- πΌ Junior: Know that stored procs can behave differently for different parameters.
- π Middle: Understand that cached query plans cause parameter sniffing; know how to fix it with RECOMPILE or OPTIMIZE FOR UNKNOWN.
- π Senior: Diagnose sniffing with execution plans, sp_BlitzCache, or Query Store; design queries and statistics to minimize skew effects in high-load systems.
β How does indexing improve read performance but slow down writes?
Indexes make reads faster β but every index adds maintenance work on writes.
When you INSERT, UPDATE, or DELETE a row:
- The tableβs data is modified and
- Every related index must also be updated or rebalanced.
This extra work increases CPU, I/O, and sometimes lock contention.
Example:
CREATE INDEX IX_Users_Email ON Users(Email);
Now queries filtering by Email run much faster, but inserting or changing an email takes slightly longer because SQL must also update the index tree.
π‘ Best practices:
- Create indexes only for columns used in WHERE, JOIN, or ORDER BY.
- Avoid "just-in-case" indexes.
- Drop unused or duplicate indexes - check system views like:
SELECT * FROM sys.dm_db_index_usage_stats;
- Rebuild or reorganize fragmented indexes periodically.
- Use filtered, covering, or composite indexes for precision instead of many single-column ones.
What .NET engineers should know:
- πΌ Junior: Indexes make reads faster but slow down writes because the database must keep them in sync.
- π Middle: Know how to design indexes for query patterns (WHERE, JOIN, ORDER BY) and avoid over-indexing.
- π Senior: Balance read vs write workloads - minimize index count in high-write systems, use filtered or composite indexes, and monitor fragmentation and index maintenance schedules.
π Resources: Index architecture and design guide
β Could you explain what a covering index is and how it can enhance performance?
A covering index (also called a cover index) is an index that contains all the columns needed to satisfy a query, both for filtering and for returning data.
Because everything is already in the index, the database doesnβt have to go back to the main table (called a bookmark lookup or heap lookup) to fetch missing columns.
How it helps
- Fewer reads: The database doesnβt need to jump between the index and the table.
- Less I/O: Data is fetched from smaller index pages instead of whole table pages.
- Better caching: Indexes are often smaller and more likely to stay in memory.
However:
- Covering indexes take up more storage.
- They can slow down inserts and updates (because more data needs to be maintained).
- They should be used strategically for frequently executed, read-heavy queries.

Example:
-- Suppose you often run this query:
SELECT FirstName, LastName
FROM Employees
WHERE DepartmentId = 5;
-- A normal index:
CREATE INDEX IX_Employees_DepartmentId ON Employees(DepartmentId);
-- A covering index:
CREATE INDEX IX_Employees_DepartmentId_Covering
ON Employees(DepartmentId)
INCLUDE (FirstName, LastName);
In SQL Server, the INCLUDE clause adds extra columns to the index that aren't part of the search key but can be returned directly.
In PostgreSQL or MySQL, the same concept applies β the index can βcoverβ the query if all selected columns are in it, even without an explicit INCLUDE clause.
What .NET engineers should know:
- πΌ Junior: Understand that a covering index can make a query faster by including all the needed columns, so the engine doesnβt read the main table.
- π Middle: Know how to identify queries that benefit from covering indexes and how to use the INCLUDE clause in SQL Server. Be aware of the trade-offs - storage and slower writes.
- π Senior: Design indexes based on query patterns, use INCLUDE efficiently, and read execution plans to confirm the index truly covers the query. Know how this works in different engines - SQL Server (INCLUDE), PostgreSQL (multi-column indexes), MySQL (InnoDB clustered indexes).
β What are the best practices for indexing large tables?
Indexing large tables is all about balance β speeding up queries without slowing down inserts, updates, and storage. Done right, indexes make reads lightning-fast; done wrong, they can cripple write performance and bloat your database.
Hereβs how to handle them effectively:
1. Index what you query β not everything
Every index has a cost. Each write (insert, update, delete) must also update all related indexes.
β Index columns used in:
- WHERE, JOIN, ORDER BY, or GROUP BY
- Frequent lookups or range filters
β οΈ Avoid "just in case" indexes - they add cost but no benefit.
2. Use composite indexes for common filters
If queries often combine multiple conditions, build composite indexes matching those patterns.
CREATE INDEX IX_Orders_CustomerId_Date
ON Orders (CustomerId, OrderDate DESC);
β Works efficiently for queries like:
WHERE CustomerId = 42 AND OrderDate > '2025-01-01'
3. Keep indexes lean
The larger the index, the more memory it consumes.
- Avoid wide indexes (more than 3-4 key columns).
- Exclude large text or blob fields.
- Use INCLUDE for non-key columns instead of adding them all to the key.
CREATE INDEX IX_Orders_Status
ON Orders (Status)
INCLUDE (OrderDate, Total);
4. Maintain and rebuild regularly
Large tables suffer from index fragmentation due to inserts and deletes.
β Schedule maintenance:
ALTER INDEX ALL ON Orders REBUILD WITH (ONLINE = ON);
or use REORGANIZE for lighter defragmentation.
π‘ For huge databases, use incremental or partition-level rebuilds.
5. Monitor index usage
Remove unused indexes and tune underused ones.
In SQL Server:
SELECT * FROM sys.dm_db_index_usage_stats;
In PostgreSQL:
SELECT * FROM pg_stat_user_indexes;
Check for:
- Unused indexes - Drop them.
- Missing indexes - Add them where scans dominate.
6. Consider partitioning
For massive datasets, partition tables and indexes by date or range. This reduces scan size and allows parallel operations.
CREATE PARTITION FUNCTION pfOrderRange (datetime2) AS RANGE RIGHT FOR VALUES ('2025-01-01'); -- illustrative boundary
CREATE PARTITION SCHEME psOrderRange AS PARTITION pfOrderRange
TO ([PRIMARY], [FG2025]);
7. Evaluate specialized indexes
- Use filtered or partial indexes to target subsets of data.
- Use columnstore indexes for analytical (OLAP) queries.
- Use hash indexes or memory-optimized tables for hot data in in-memory systems.
What .NET engineers should know:
- πΌ Junior: Understand that indexes make queries faster but slow down writes; add them only where needed.
- π Middle: Know how to design and maintain indexes β use composites, INCLUDE, and monitor fragmentation or usage.
- π Senior: Architect index strategies for high-scale systems β balance read/write ratio, schedule maintenance, partition large tables, and apply columnstore or filtered indexes for workload-specific optimization.
β How do database statistics affect query performance?
Statistics tell the query optimizer how data is distributed in a table β things like row counts, distinct values, and data ranges.
The optimizer uses them to choose the best execution plan. Outdated or missing stats can lead to poor choices, like full table scans instead of index seeks.
-- Manually update statistics
UPDATE STATISTICS Orders WITH FULLSCAN;
Keeping stats fresh helps the optimizer make accurate cost estimates and avoid bad query plans.
π‘ Best practices:
- Keep auto-update statistics turned on (default in SQL Server, PostgreSQL, MySQL).
- Trigger manual updates for large changes (>20% of rows modified):
UPDATE STATISTICS Orders;
- For massive datasets, use:
- FULLSCAN for precision (expensive, but accurate).
- INCREMENTAL STATS on partitioned tables.
- AUTO CREATE STATISTICS for dynamic columns in SQL Server.
- Rebuild indexes periodically - rebuilding also refreshes the related statistics automatically.
What .NET engineers should know:
- πΌ Junior: Know that statistics guide how the DB decides to use indexes β stale stats can cause full scans.
- π Middle: Understand auto-update thresholds and when to manually refresh statistics for large tables.
- π Senior: Diagnose bad query plans caused by skewed or outdated stats; compare before/after execution plans, use FULLSCAN or incremental stats for partitioned data, and monitor the cardinality estimator behavior.
π Resources: Statistics SQL Server
β What is the purpose of the NOLOCK hint in SQL Server, and what are the serious risks of using it?
The NOLOCK table hint tells SQL Server to read data without acquiring shared locks β meaning it doesnβt wait for ongoing transactions.
Itβs equivalent to running the query under the READ UNCOMMITTED isolation level.
Example:
-- Typical usage
SELECT * FROM Orders WITH (NOLOCK);
This can make queries appear faster because they don't block or get blocked.
But the trade-off is serious: the query may read uncommitted, inconsistent, or even corrupt data.
β οΈ Risks of using NOLOCK
- Dirty reads. You might see rows that were inserted or updated but later rolled back.
- Missing or duplicated rows. Page splits or concurrent updates can cause the same row to appear twice β or not at all.
- Phantom reads. Data might change mid-scan, so aggregates like SUM() or COUNT() are unreliable.
- Corrupted results during page splits. SQL Server might read half an old page and half a new one, returning nonsense values.
β Safer alternatives
- Use READ COMMITTED SNAPSHOT ISOLATION (RCSI) β it reads from a versioned snapshot instead of dirty data:
ALTER DATABASE MyAppDB SET READ_COMMITTED_SNAPSHOT ON;
Reads are non-blocking and consistent.
- Optimize queries and indexes to reduce blocking rather than skipping locks.
- Use WITH (READPAST) only when intentionally skipping locked rows (e.g., queue systems).
What .NET engineers should know:
- πΌ Junior: Know NOLOCK skips locks to make reads faster but can return wrong data.
- π Middle: Understand the real effects - dirty reads, missing or duplicated rows, and wrong aggregates. Know safer alternatives like READ COMMITTED SNAPSHOT.
- π Senior: Avoid NOLOCK in production code unless you fully understand the trade-off. Know how to tune queries, isolation levels, and indexing to achieve concurrency without sacrificing data integrity.
Advanced SQL Patterns

β When would you consider using a database view, and what are its limitations?
A view is a saved SQL query that acts like a virtual table. It doesnβt store data β it just stores the definition of a query that runs when you select from it. Views can simplify complex queries, improve security, and standardize access to data.

β When to use a view
1. Simplify complex queries
Instead of repeating joins or aggregations everywhere, wrap them in a view:
CREATE VIEW ActiveCustomers AS
SELECT Id, Name, Email
FROM Customers
WHERE IsActive = 1;
Then use it like a regular table:
SELECT * FROM ActiveCustomers;
2. Encapsulate business logic
Centralize derived calculations, so multiple applications or reports share consistent results.
3. Improve security
Restrict users to specific columns or rows by granting access to the view instead of the base table.
4. Data abstraction layer
If table schemas change, you can update the view definition so dependent applications keep seeing the same columns and their queries donβt break.
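For the abstraction and security points above, a .NET application can also consume a view through EF Core. A minimal sketch, assuming the ActiveCustomers view shown earlier and an illustrative read model (class and property names are not from the original text):
using Microsoft.EntityFrameworkCore;

public class ActiveCustomer
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string Email { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public DbSet<ActiveCustomer> ActiveCustomers => Set<ActiveCustomer>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Map the read model to the view instead of a table; keyless because a view has no primary key.
        modelBuilder.Entity<ActiveCustomer>()
            .HasNoKey()
            .ToView("ActiveCustomers");
    }
}

// Usage: a read-only query against the view.
// var active = await context.ActiveCustomers.ToListAsync();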
β οΈ Limitations
1. Performance overhead
A view doesnβt store data β it re-runs the underlying query each time, which can be slow for heavy joins or aggregations.
2. Read-only in most cases
You usually canβt INSERT, UPDATE, or DELETE through a view unless it maps directly to a single base table (and even then, there are restrictions).
3. No automatic indexing
A view itself isnβt indexed, though you can create indexed/materialized views (if your DB supports them) to store results physically.
4. Maintenance complexity
If base tables change (column names are renamed or types are changed), views can break silently.
π‘ When to avoid views
- For real-time, performance-critical queries, use materialized views or precomputed tables instead.
- When you need full CRUD operations.
- When application logic needs to shape data dynamically (views are static).
What .NET engineers should know:
- πΌ Junior: Know that a view is a saved SQL query that simplifies data access.
- π Middle: Understand how views help with abstraction and security, but can slow down complex queries.
- π Senior: Use indexed/materialized views for performance, manage dependencies carefully, and monitor query plans for inefficiencies.
π Resources: Views SQL
β Whatβs the difference between views and materialized views?
Both views and materialized views represent saved SQL queries, but they differ in how they handle data storage and performance.

| Feature | View | Materialized View |
|---|---|---|
| Data storage | Doesnβt store data β runs query on demand | Stores query result physically |
| Performance | Slower for complex joins/aggregates | Much faster for repeated reads |
| Freshness | Always current | Must be refreshed |
| Indexing | Canβt be indexed directly (except indexed views in SQL Server) | Can be indexed normally |
| Use case | Lightweight abstraction, security, reusable logic | Analytics, dashboards, reporting, read models |
Example (PostgreSQL):
-- Regular view
CREATE VIEW ActiveCustomers AS
SELECT * FROM Customers WHERE IsActive = TRUE;
-- Materialized view
CREATE MATERIALIZED VIEW CustomerStats AS
SELECT Region, COUNT(*) AS Total
FROM Customers
GROUP BY Region;
Refresh when data changes:
REFRESH MATERIALIZED VIEW CustomerStats;
You can schedule refreshes or trigger them via events.
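For scheduled refreshes from .NET, one option is a hosted background job that periodically runs the refresh. A minimal sketch, assuming the Npgsql provider and that CustomerStats has the unique index required for a concurrent refresh (the class name and interval are illustrative):
using Microsoft.Extensions.Hosting;
using Npgsql;

public class CustomerStatsRefresher : BackgroundService
{
    private readonly string _connectionString;
    public CustomerStatsRefresher(string connectionString) => _connectionString = connectionString;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromHours(1));
        while (await timer.WaitForNextTickAsync(stoppingToken))
        {
            await using var conn = new NpgsqlConnection(_connectionString);
            await conn.OpenAsync(stoppingToken);

            // CONCURRENTLY lets readers keep using the old snapshot while the refresh runs.
            await using var cmd = new NpgsqlCommand(
                "REFRESH MATERIALIZED VIEW CONCURRENTLY CustomerStats;", conn);
            await cmd.ExecuteNonQueryAsync(stoppingToken);
        }
    }
}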
π‘ Performance tip:
- Materialized views reduce load on large joins and aggregations.
- You can index them, unlike regular views.
- But stale data risk exists β choose refresh intervals wisely.
- Some databases (like PostgreSQL) support concurrent refreshes to avoid locking.
What .NET engineers should know:
- πΌ Junior: Views show real-time data; materialized views store precomputed data for speed.
- π Middle: Understand refresh trade-offs β materialized views are faster but can become stale.
- π Senior: Design refresh strategies (scheduled, trigger-based, or event-driven) and decide between SQL Server indexed views or PostgreSQL materialized views for reporting and caching layers.
π Resources:
β What are the pros and cons of using stored procedures?
Stored procedures are precompiled SQL scripts stored in the database. They can encapsulate logic, improve performance, and simplify maintenance β but they can also make versioning, testing, and scaling more complex.
Stored Procedures Pros and Cons
- Pros: stable contract, plan caching, reduced wire traffic, centralized security/permissions, sometimes easier hot-fixing.
- Cons: portability and versioning friction, risk of duplicating domain logic in DB, and testing complexity.
Note: procs arenβt literally precompiledβexecution plans are compiled on first use and cached. Use parameters; dynamic SQL inside a proc can still be injectable.
β When to use stored procedures:
- Data-heavy logic that benefits from server-side computation (reporting, aggregations, batch jobs).
- Security-critical operations that require strict validation at the database level.
- Systems with shared databases accessed by multiple services or tools.
β When not to:
- Application-level business logic that changes often.
- Microservices or DDD-based architectures β logic should live closer to the domain layer.
What .NET engineers should know:
- πΌ Junior: Stored procedures are reusable SQL scripts that run faster and help keep database logic secure.
- π Middle: Theyβre great for shared business rules and performance-critical operations β but hard to maintain in CI/CD pipelines.
- π Senior: Use them strategically for data-heavy operations near the database layer. Avoid mixing too much app logic inside them; prefer code-based services when scalability, observability, or versioning matter.
π Resources: Stored procedures (Database Engine)
β How do you efficiently pass a list of values to a stored procedure in SQL Server versus PostgreSQL?
Both SQL Server and PostgreSQL can accept multiple values in a single call, but they handle them differently.
In SQL Server, you use Table-Valued Parameters (TVPs), while in PostgreSQL, you typically use arrays or unnest().
SQL Server: Table-Valued Parameters (TVP)
A TVP lets you pass a structured list (table) to a procedure.
You define a custom type once and reuse it.
-- Define a table type
CREATE TYPE IdList AS TABLE (Id INT);
-- Create stored procedure using TVP
CREATE PROCEDURE GetOrdersByIds @Ids IdList READONLY AS
BEGIN
SELECT * FROM Orders WHERE OrderId IN (SELECT Id FROM @Ids);
END;
-- Pass values from .NET
var ids = new DataTable();
ids.Columns.Add("Id", typeof(int));
ids.Rows.Add(1);
ids.Rows.Add(2);
using var cmd = new SqlCommand("GetOrdersByIds", conn);
cmd.CommandType = CommandType.StoredProcedure;
var param = cmd.Parameters.AddWithValue("@Ids", ids);
param.SqlDbType = SqlDbType.Structured;
cmd.ExecuteReader();
- Pros: Fast, type-safe, avoids string parsing.
- Cons: SQL Serverβspecific.
PostgreSQL: Arrays and unnest()
PostgreSQL doesnβt have TVPs β instead, you can pass an array parameter and unpack it inside the query.
CREATE OR REPLACE FUNCTION get_orders_by_ids(ids int[])
RETURNS TABLE (order_id int, status text) AS $$
BEGIN
RETURN QUERY
SELECT o.id, o.status
FROM orders o
WHERE o.id = ANY(ids);
END;
$$ LANGUAGE plpgsql;
-- Call it
SELECT * FROM get_orders_by_ids(ARRAY[1, 2, 3]);
You can also expand the array with unnest(ids) for joins or complex logic.
- Pros: Native, concise, and efficient.
- Cons: Doesnβt support structured types easily (only arrays of primitives).
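To complement the .NET snippet shown for TVPs, here is a minimal sketch of calling the PostgreSQL function above from .NET with Npgsql, which maps a C# int[] straight to a PostgreSQL integer array (connectionString is assumed to exist):
using Npgsql;

await using var conn = new NpgsqlConnection(connectionString);
await conn.OpenAsync();

await using var cmd = new NpgsqlCommand(
    "SELECT order_id, status FROM get_orders_by_ids(@ids);", conn);
cmd.Parameters.AddWithValue("ids", new[] { 1, 2, 3 }); // int[] is sent as a PostgreSQL integer[]

await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
    Console.WriteLine($"{reader.GetInt32(0)}: {reader.GetString(1)}");
}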
What .NET engineers should know:
- πΌ Junior: Know that SQL Server and PostgreSQL handle lists differently β TVPs vs arrays. Understand that sending one structured parameter is better than concatenating strings like '1,2,3'.
- π Middle: Be able to create and use TVPs in SQL Server and arrays in PostgreSQL, understanding how to map them from .NET (DataTable β TVP, int[] β array).
- π Senior: Know performance characteristics β batching vs TVP vs JSON, how to pass complex types (PostgreSQL composite types or JSONB), and when to offload this logic to the app layer.
π Resources:
β Whatβs the difference between temporary tables and table variables?
Temporary tables (#TempTable) and table variables (@TableVar) are used to store temporary data β but they differ in how they behave, how long they live, and how SQL Server optimizes them.
- Temporary tables act like real tables: they live in tempdb, support indexes and statistics, and are visible to nested stored procedures.
- Table variables live in memory (though also backed by tempdb), donβt maintain statistics, and are scoped only to the current batch or procedure.
Example:
-- Temporary table
CREATE TABLE #TempUsers (Id INT, Name NVARCHAR(100));
INSERT INTO #TempUsers VALUES (1, 'Alice');
-- Table variable
DECLARE @Users TABLE (Id INT, Name NVARCHAR(100));
INSERT INTO @Users VALUES (2, 'Bob');
Key Differences:
| Feature | #Temp Table | @Table Variable |
|---|---|---|
| Storage | tempdb | tempdb (lightweight) |
| Statistics | β Yes | β No (until SQL Server 2019+) |
| Indexing | β Any index | Limited (PK/Unique only) |
| Transaction scope | β Follows transaction | β Not fully transactional |
| Recompilation | Can recompile for better plan | No recompile (fixed plan) |
| Performance | Better for large sets | Better for small sets |
| Scope | Session / connection | Batch / function |
What .NET engineers should know:
- πΌ Junior: Both store temporary data, but temporary tables are more flexible and better suited for larger datasets.
- π Middle: Use temporary tables when you need statistics or multiple joins. Use table variables for small lookups or for simple logic within a single procedure.
- π Senior: Use temp tables for complex or large operations, TVPs for batch inserts from .NET, and monitor tempdb contention in high-load systems.
β How would you enforce a business rule like "a user can only have one active subscription" at the database level?
You can enforce it directly in the database using a filtered unique index or a trigger β depending on what your database supports.
Option 1: Unique filtered index (preferred)
If your database supports it (SQL Server, PostgreSQL), create a unique index that applies only when IsActive = 1.
CREATE UNIQUE INDEX UX_User_ActiveSubscription
ON Subscriptions(UserId)
WHERE IsActive = 1;
This ensures that only one active subscription per user can exist β attempts to insert a second active one will fail automatically.
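If the schema is managed with EF Core migrations, the same filtered unique index can be declared in the model so migrations create it automatically. A minimal sketch, assuming a Subscription entity like the one below (HasFilter takes raw, provider-specific SQL; SQL Server syntax shown):
using Microsoft.EntityFrameworkCore;

public class Subscription
{
    public int Id { get; set; }
    public int UserId { get; set; }
    public bool IsActive { get; set; }
}

public class AppDbContext : DbContext
{
    public DbSet<Subscription> Subscriptions => Set<Subscription>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Unique only among rows where IsActive = 1.
        modelBuilder.Entity<Subscription>()
            .HasIndex(s => s.UserId)
            .IsUnique()
            .HasFilter("[IsActive] = 1");
    }
}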
Option 2: Trigger-based validation
If your database doesnβt support filtered indexes, use a trigger to check before insert/update.
CREATE TRIGGER TR_EnsureSingleActiveSubscription
ON Subscriptions
AFTER INSERT, UPDATE
AS
BEGIN
IF EXISTS (
SELECT UserId
FROM Subscriptions
WHERE IsActive = 1
GROUP BY UserId
HAVING COUNT(*) > 1
)
BEGIN
ROLLBACK TRANSACTION;
RAISERROR('User cannot have more than one active subscription.', 16, 1);
END
END;
This prevents commits that violate the business rule, regardless of how many concurrent requests hit the DB.
What .NET engineers should know:
- πΌ Junior: Know that database constraints can prevent invalid data even if the app misbehaves.
- π Middle: Understand filtered indexes and how they enforce conditional uniqueness.
- π Senior: Design proper DB-level constraints and triggers to enforce rules safely under concurrency and heavy load.
π Resources:
β What are some strategies for efficiently paginating through a huge result set, as opposed to using OFFSET/FETCH?
OFFSET/FETCH (or LIMIT/OFFSET) works fine for small pages, but it gets slower as the offset grows β the database still scans all skipped rows. For large datasets, you need smarter paging.

Better pagination strategies:
Keyset (Seek) Pagination

Use the last seen key instead of OFFSET.
SELECT * FROM Orders
WHERE Id > @LastId
ORDER BY Id
LIMIT 50;
Fast because it uses an index and skips directly to the next page. Works only when the key is sortable and indexed (random UUIDs such as UUID v4 are not suitable).
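In EF Core, the same keyset approach is a simple LINQ query. A minimal sketch, assuming an Orders DbSet with an indexed integer Id (AppDbContext and Order are illustrative names):
using Microsoft.EntityFrameworkCore;

// lastSeenId is the Id of the last row on the previous page (0 for the first page).
public static Task<List<Order>> GetNextPageAsync(AppDbContext context, int lastSeenId, int pageSize = 50)
{
    return context.Orders
        .Where(o => o.Id > lastSeenId)   // seek past what was already returned
        .OrderBy(o => o.Id)              // must match an indexed, sortable key
        .Take(pageSize)
        .AsNoTracking()
        .ToListAsync();
}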
Bookmark Pagination
Used when sorting by multiple columns.
You remember the last recordβs values and continue from them.
SELECT * FROM Orders
WHERE (OrderDate, Id) > (@LastDate, @LastId)
ORDER BY OrderDate, Id
FETCH NEXT 50 ROWS ONLY;
Note: the row-value comparison above works in PostgreSQL and MySQL; in SQL Server, expand it to WHERE OrderDate > @LastDate OR (OrderDate = @LastDate AND Id > @LastId).
Precomputed / Cached / Snapshot Pagination
For reports or exports, materialize results into a temp table or cache so paging doesnβt re-run the same heavy query each time.

Use proper index
Always ensure the ORDER BY columns are indexed β seek-based pagination relies on it.
What .NET engineers should know:
- πΌ Junior: Know that OFFSET gets slower on big tables.
- π Middle: Use keyset pagination for large or live data.
- π Senior: Combine seek-based paging with caching, filtering, or precomputed datasets for scalable APIs.
π Resources: Pagination Strategies
β What is a trigger, and when should it be avoided?
A trigger is a special stored procedure that runs automatically in response to certain database events, such as INSERT, UPDATE, or DELETE. Triggers are useful for enforcing rules or auditing changes, but they can make logic hard to trace, debug, and maintain, especially when multiple triggers chain together.
Example:
CREATE TRIGGER trg_AuditOrders
ON Orders
AFTER INSERT, UPDATE
AS
BEGIN
INSERT INTO OrderAudit (OrderId, ChangedAt)
SELECT Id, GETDATE()
FROM inserted;
END;This trigger automatically logs every order creation or update.
When to Use:
- Enforcing data integrity rules not covered by constraints.
- Creating audit trails or history logs.
- Handling cascading actions (e.g., soft deletes).
When to Avoid:
- When business logic can live in the application layer.
- When triggers create hidden side effects that confuse other developers.
- When performance or scalability matters, triggers can cause recursive updates and slow down bulk operations.
What .NET engineers should know:
- πΌ Junior: Triggers run automatically after data changes β good for enforcing rules, but easy to misuse.
- π Middle: Use triggers mainly for auditing or strict integrity constraints. Avoid complex logic or multiple triggers on the same table.
- π Senior: Keep the database lean β business rules belong in services, not triggers. If auditing is needed, prefer CDC (Change Data Capture), temporal tables, or event-based approaches for transparency and scalability.
π Resources: CREATE TRIGGER (Transact-SQL)
β How can you implement audit logging using SQL features?
Audit logging tracks who changed what and when in your database.
You can implement it using built-in SQL features like triggers, CDC (Change Data Capture), or temporal tables β depending on how detailed and real-time your auditing needs are.
Triggers (manual auditing):
CREATE TRIGGER trg_AuditOrders
ON Orders
AFTER UPDATE
AS
BEGIN
INSERT INTO OrderAudit (OrderId, ChangedAt, ChangedBy)
SELECT Id, GETDATE(), SUSER_SNAME()
FROM inserted;
END;
- Pros: Simple to set up
- Cons: Harder to maintain, can affect performance
Change Data Capture (CDC):
EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'Orders',
@role_name = NULL;
- Pros: Automatically tracks all changes
- Cons: Requires SQL Server Enterprise or equivalent rights, adds storage overhead
Temporal (System-Versioned) Tables:
CREATE TABLE Orders (
Id INT PRIMARY KEY,
Amount DECIMAL(10,2),
SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START,
SysEndTime DATETIME2 GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrdersHistory));
- Pros: Keeps a complete history of row changes automatically
- Cons: More storage, not suited for high-churn tables
What .NET engineers should know:
- πΌ Junior: Auditing means keeping a record of data changes β start with triggers if you need a quick log.
- π Middle: Use CDC or temporal tables for reliable, built-in tracking. Theyβre safer and easier than custom trigger logic.
- π Senior: Design audit logging based on business and compliance needs β choose between real-time CDC, temporal versioning, or external event-driven logs (e.g., outbox or change-feed patterns). Continually monitor storage and retention.
π Resources: Change Data Capture (CDC)
β How can you use SQL window functions to calculate rolling averages or cumulative totals?
You can use window functions like AVG() or SUM() with the OVER() clause to calculate rolling averages or running totals β without collapsing rows like GROUP BY does. They let you look βacrossβ rows related to the current one, often within a defined time or ordering window.
Cumulative total example:
SELECT
CustomerId,
OrderDate,
SUM(TotalAmount) OVER (PARTITION BY CustomerId ORDER BY OrderDate) AS RunningTotal
FROM Orders;
This gives each customerβs running total of order amounts in chronological order.
Rolling average example (last three orders):
SELECT
CustomerId,
OrderDate,
AVG(TotalAmount) OVER (
PARTITION BY CustomerId
ORDER BY OrderDate
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS RollingAvg3
FROM Orders;
What .NET engineers should know:
- πΌ Junior: Use SUM() or AVG() with OVER(ORDER BY β¦) to create running totals or averages.
- π Middle: Learn how PARTITION BY groups data and how the frame (ROWS BETWEEN β¦) controls the range of rows included in the calculation.
- π Senior: Use window functions to replace self-joins or subqueries for analytics β theyβre faster and easier to maintain. Optimize with proper indexing on the ORDER BY column to avoid sorting overhead.
π Resources: SQL Window Functions
β ORM vs Raw SQL Query β When to Use What
Both approaches have their strengths β the key is knowing when each is most suitable. Think of it like this: EF Core (ORM) is your daily driver, while raw SQL is your race car β more control, but more work.
Use ORM (like EF Core) when:
- Youβre building CRUD-heavy apps (create, read, update, delete).
- Queries are either simple or moderately complex.
- You value maintainability and readability over absolute performance.
- You want LINQ, type-safety, and automatic model tracking.
- You donβt want to maintain SQL scripts across environments.
β Great for 90% of app queries.
β οΈ But it may generate less efficient SQL for big joins or analytics.
βοΈ Use Raw SQL when:
- You need complex reporting, aggregations, or window functions.
- Youβre chasing maximum performance β e.g., dashboards, exports, analytics.
- You must use database-specific features (CTEs, stored procedures, hints).
- Youβre working with legacy databases or data sources outside EFβs model.
- You need fine-grained control over indexing, joins, or query plans.
β Perfect for heavy read/reporting queries.
β οΈ Harder to maintain, no compile-time checking.
Real-world approach:
- Most production systems use a hybrid:
- EF Core for standard business logic.
- Raw SQL for specialized data reporting or batch processing.
- Sometimes, stored procedures are used for long-running jobs or bulk operations.
What .NET engineers should know:
- πΌ Junior: Use EF Core by default β itβs easier, safer, and cleaner.
- π Middle: Mix EF and raw SQL when performance or complexity requires it.
- π Senior: Profile, measure, and choose based on data size, complexity, and maintainability β not personal preference.
π Resources:
Concurrency, Transactions, and Isolation

β How would you explain the difference between the DB isolation levels?
Isolation levels define how much one transaction can βseeβ from another before itβs committed. The stricter the level, the safer your data, but the slower your system.
When multiple users read and modify the same data at once, three things can go wrong:
- Dirty Read β You read uncommitted data from another transaction.
Example: someone updates a balance, you read it, and then they roll back β you saw data that never existed.
- Non-Repeatable Read β You read the same row twice, but its value changed between reads.
Example: you check an orderβs status at the start and end of your transaction, and itβs different.
- Phantom Read β You re-run a query and get a new row that didnβt exist before.
Example: between two reads, another transaction inserts a new record matching your WHERE clause.
| Level | Prevents Dirty Reads | Prevents Non-Repeatable Reads | Prevents Phantom Reads | Notes |
|---|---|---|---|---|
| Read Uncommitted | No | No | No | Fastest, but least safe |
| Read Committed | Yes | No | No | Default in SQL Server |
| Repeatable Read | Yes | Yes | No | Keeps read locks until commit |
| Serializable | Yes | Yes | Yes | Most consistent, slowest |
| Snapshot | Yes | Yes | Yes | Uses row versioning instead of locks |
Example: How to set Isolation level:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRAN
SELECT * FROM Orders WHERE Status = 'Pending';
-- Another transaction modifies Orders here
SELECT * FROM Orders WHERE Status = 'Pending'; -- may return different results
COMMIT;
If you switch to REPEATABLE READ, SQL Server will lock the rows you selected, so the second query sees the same data, but updates from others will be blocked.
What .NET engineers should know:
- πΌ Junior: Understand what dirty, non-repeatable, and phantom reads are. Know that isolation levels help prevent these concurrency issues.
- π Middle: Be able to pick the correct isolation level for a use case. Understand trade-offs β more isolation = less concurrency.
- π Senior: Know how isolation levels work under the hood (locking vs MVCC), tune them per operation, and use SNAPSHOT or retry strategies to handle high-load concurrency safely in distributed systems.
β Whatβs the difference between pessimistic and optimistic locking?
Pessimistic locking means you lock a record as soon as you start working with it β no one else can change it until youβre done.
Optimistic locking lets multiple users read and work on the same data, but before saving, it checks if someone else has already updated it.

Pessimistic locking example:
BEGIN TRAN;
SELECT * FROM Accounts WITH (UPDLOCK, ROWLOCK) WHERE Id = 1;
UPDATE Accounts SET Balance = Balance - 100 WHERE Id = 1;
COMMIT;
Here, the row is immediately locked, blocking other writes until the transaction completes.
Optimistic locking example (EF Core):
public class Account
{
public int Id { get; set; }
public decimal Balance { get; set; }
[Timestamp]
public byte[] RowVersion { get; set; }
}
EF Core adds the RowVersion to the WHERE clause during updates. If the row was changed by someone else, the update fails, and the app decides how to handle it (e.g., retry or show an error).
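When that optimistic check fails, EF Core throws DbUpdateConcurrencyException. A minimal sketch of one way to handle it (a refresh-and-retry, "database wins" approach; account and db are assumed from the surrounding example):
using Microsoft.EntityFrameworkCore;

try
{
    account.Balance -= 100;
    await db.SaveChangesAsync();
}
catch (DbUpdateConcurrencyException ex)
{
    foreach (var entry in ex.Entries)
    {
        var databaseValues = await entry.GetDatabaseValuesAsync();
        if (databaseValues is null)
        {
            throw; // the row was deleted by someone else; treat as a business decision
        }
        // Refresh the original values so the next SaveChanges compares against the current row.
        entry.OriginalValues.SetValues(databaseValues);
    }
    // Re-apply the change (or surface the conflict to the user) and call SaveChangesAsync() again.
}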
What .NET engineers should know:
- πΌ Junior: Pessimistic locks the data right away. Optimistic checks for changes only when saving.
- π Middle: Use pessimistic when updates must be serialized, like in finance systems. Use optimistic in web apps or APIs where most operations donβt overlap.
- π Senior: Choose based on workload and contention. Pessimistic is safer but reduces throughput; optimistic provides higher concurrency but requires conflict handling in the code.
π Resources:
β How do you detect and prevent deadlocks?
A deadlock occurs when two or more transactions block each other β each holds a lock that the other needs. SQL Server detects the situation and kills one transaction (the βvictimβ) with error 1205.
The fix is to find which statements lock the same resources and in what order.
How can a deadlock happen?
-- Session 1
BEGIN TRAN
UPDATE Orders SET Status = 'Processing' WHERE Id = 1;
WAITFOR DELAY '00:00:05';
UPDATE Customers SET Name = 'John' WHERE Id = 1;
COMMIT;
-- Session 2
BEGIN TRAN
UPDATE Customers SET Name = 'Alice' WHERE Id = 1;
WAITFOR DELAY '00:00:05';
UPDATE Orders SET Status = 'Pending' WHERE Id = 1;
COMMIT;
Each transaction locks one table and then waits for the other β classic deadlock.
How to handle it in application code
// Example of retrying a transaction that failed due to a deadlock
for (int attempt = 1; attempt <= 3; attempt++)
{
try
{
using var connection = new SqlConnection(connString);
connection.Open();
using var tx = connection.BeginTransaction();
// Some update logic here...
tx.Commit();
break;
}
catch (SqlException ex) when (ex.Number == 1205)
{
// Deadlock detected β retry the transaction
Thread.Sleep(200 * attempt);
}
}
This snippet demonstrates a retry policy β a common approach to recover gracefully when a transaction is rolled back due to a deadlock.
What .NET engineers should know:
- πΌ Junior: Understand what a deadlock is β two processes waiting on each other. Know that SQL Server aborts one with error 1205.
- π Middle: Be able to capture and analyze deadlock graphs, identify lock order conflicts, and fix design issues (consistent table access order, shorter transactions, better indexing).
- π Senior: Understand locking internals, apply retry logic with idempotent operations, tune isolation levels, and design schema or workflows to minimize contention in distributed or high-load systems.
π Resources: Deadlocks guide
β How would you design a retry policy for failed transactions?
A retry policy helps your system recover from temporary issues β like deadlocks, timeouts, or network hiccups β without user intervention.
The key is to retry safely: only for transient errors and with a proper delay between attempts.
Example (C# with Polly):
var retryPolicy = Policy
.Handle<SqlException>(ex => ex.Number == 1205) // deadlock
.Or<TimeoutException>()
.WaitAndRetry(
retryCount: 3,
sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)) // exponential backoff
);
retryPolicy.Execute(() =>
{
// Database operation
SaveChanges();
});
This retries the operation up to 3 times, waiting 2, 4, and 8 seconds between attempts.
Best practices:
- Retry only idempotent operations (safe to repeat).
- Use exponential backoff to avoid overloading the DB.
- Add jitter (random delay) to prevent synchronized retries (see the sketch after this list).
- Log every retry and failure for diagnostics.
- Stop retrying after a few attempts β donβt loop forever.
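As a variation of the Polly policy shown above, jitter can be added to the backoff so clients that fail together do not retry at the same instant. A minimal sketch (the delays and jitter range are illustrative):
using Microsoft.Data.SqlClient;
using Polly;

var retryPolicy = Policy
    .Handle<SqlException>(ex => ex.Number == 1205)   // deadlock victim
    .Or<TimeoutException>()
    .WaitAndRetry(
        retryCount: 3,
        sleepDurationProvider: attempt =>
            TimeSpan.FromSeconds(Math.Pow(2, attempt))                  // 2s, 4s, 8s
            + TimeSpan.FromMilliseconds(Random.Shared.Next(0, 500)));   // plus up to 0.5s of jitter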
What .NET engineers should know:
- πΌ Junior: A retry policy automatically repeats failed operations β useful for temporary errors.
- π Middle: Use exponential backoff and retry only transient issues, such as deadlocks or timeouts. Donβt retry logic errors.
- π Senior: Design retries with idempotency, circuit breakers, and observability in mind. Combine with transaction scopes or outbox patterns for guaranteed consistency in distributed systems.
Database Maintenance and System Design

β Whatβs the difference between OLTP and OLAP systems?
OLTP (Online Transaction Processing) handles real-time operations β inserts, updates, deletes β for day-to-day business activity.
OLAP (Online Analytical Processing) is built for analytics β it processes large volumes of data for reports, dashboards, and trend analysis.
What .NET engineers should know:
- πΌ Junior: OLTP is for live operations (like processing an order). OLAP is for analyzing data later.
- π Middle: OLTP keeps data normalized for consistency; OLAP stores it denormalized for speed. Use OLTP for transactions and ETL pipelines to feed OLAP for reporting.
- π Senior: Architect them separately β OLTP for system-of-record workloads, OLAP for analytical queries.
Use ETL or ELT pipelines to sync them and avoid performance hits on production databases.
π Resources:
β What is table bloat in PostgreSQL, and how does the autovacuum process help manage it?
In PostgreSQL, table bloat occurs when a table or index contains a large number of dead tuples β old row versions that remain after updates or deletes.
Unlike some databases, PostgreSQL uses MVCC (Multi-Version Concurrency Control), which never updates data in place. Instead, it creates a new row version and marks the old one as obsolete once no active transaction needs it. Over time, those dead tuples accumulate, wasting disk space and slowing sequential scans.
Thatβs where autovacuum comes in.
Itβs a background process that automatically:
- Scans tables for dead tuples.
- Marks unused space as reusable (via VACUUM).
- Updates visibility maps so queries can skip clean pages.
- Optionally runs ANALYZE to refresh statistics.
Without autovacuum, bloat grows until queries slow down or the disk fills up.
Example: Manually trigger vacuum if autovacuum can't keep up
VACUUM ANALYZE orders;
-- Check for bloat using pg_stat views
SELECT relname, n_dead_tup, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
You can also tune autovacuum in postgresql.conf:
autovacuum = on
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
This means autovacuum vacuums a table once about 20% of its rows are dead, and re-analyzes it once about 10% of its rows have changed.
What .NET engineers should know:
- πΌ Junior: Understand that PostgreSQL doesnβt delete rows instantly β old versions remain and can cause βbloat.β Know that autovacuum cleans it up automatically.
- π Middle: Know how to check autovacuum activity via pg_stat_activity and tune parameters like autovacuum_vacuum_scale_factor. Recognize symptoms of bloat β increasing table size and slower queries.
- π Senior: Be able to diagnose severe bloat using tools like pgstattuple, decide when to run VACUUM FULL or REINDEX, and design update-heavy workloads to reduce bloat (e.g., avoid frequent minor updates and use partitioning).
π Resources: sql-vacuum
β What are database partitions, and when should you use them?
Partitioning splits a large table or index into smaller, more manageable pieces while keeping them logically as one table.
It improves performance, scalability, and maintenance for large datasets by letting the database work on smaller chunks.
CREATE PARTITION FUNCTION OrderDateRangePFN (DATE)
AS RANGE LEFT FOR VALUES ('2023-12-31', '2024-12-31');
CREATE PARTITION SCHEME OrderDateRangePS
AS PARTITION OrderDateRangePFN
TO ([PRIMARY], [FG_2024], [FG_2025]);
CREATE TABLE Orders (
Id INT,
OrderDate DATE,
Amount DECIMAL(10,2),
CONSTRAINT PK_Orders PRIMARY KEY (Id, OrderDate)
) ON OrderDateRangePS(OrderDate);
This partitions the Orders table by year β each yearβs data can live on a different filegroup. (The partitioning column must be part of any unique index, which is why OrderDate is included in the primary key.)
When to use partitioning:
- Tables grow into tens or hundreds of millions of rows.
- You need faster queries on specific data ranges (e.g. by date).
- You want to archive or delete old data efficiently (switch/drop partitions).
- You run ETL or analytics on large datasets without locking the entire table.
Avoid it when:
- Tables are small, or queries already perform well with indexes.
- It adds unnecessary complexity without measurable benefit.
What .NET engineers should know:
- πΌ Junior: Partitioning breaks a big table into smaller parts, but you still query it as one.
- π Middle: Use it for large, date-driven tables to speed up queries and simplify maintenance, such as archiving or rolling data windows.
- π Senior: Plan partition keys carefullyβthey affect query performance and maintenance costs. Combine partitioning, indexing, and compression for high-scale systems.
π Resources: Partitioned tables and indexes
β How would you archive old records without affecting query performance?
Archiving means moving old, rarely accessed data out of your main tables to keep them small and fast.
The key is to separate hot (active) and cold (historical) data so queries only scan whatβs relevant.
Common approaches:
1. Table partitioning (best for large datasets):
Keep old data in separate partitions and move it to an archive table.
ALTER TABLE Orders
SWITCH PARTITION 1 TO OrdersArchive PARTITION 1;
β Zero downtime, minimal locking
β οΈ Requires partitioned tables set up correctly
2. Scheduled archiving jobs:
Periodically move or delete records with a SQL Agent job or a background service.
INSERT INTO OrdersArchive
SELECT * FROM Orders WHERE OrderDate < '2024-01-01';
DELETE FROM Orders WHERE OrderDate < '2024-01-01';
β Simple and flexible
β οΈ Must run during off-peak hours, requires proper indexing and batching
3. Separate archive database:
Store archived data in a dedicated database or cheaper storage (e.g., Azure SQL Hyperscale, S3).
Your app can still query it through a linked server or API when needed.
β Keeps production DB light
β οΈ Increases query complexity if historical data is often needed.
What .NET engineers should know:
- πΌ Junior: Move old data to another table to keep active queries fast.
- π Middle: Use partitioning or scheduled jobs to archive by date or status. Keep indexes on active data lean.
- π Senior: Design for data lifecycle β use partition switching for instant archival, automate retention policies, and store cold data on cheaper storage tiers. Constantly monitor index fragmentation and query plans after archiving.
β How would you secure sensitive data at rest and in transit?
Data security comes down to two layers:
- At rest: protecting stored data (in files, databases, backups).
- In transit: protecting data while itβs moving over the network.
You use encryption, access control, and secure connections for both.
1. Data at rest (stored data):
- Use Transparent Data Encryption (TDE) to encrypt entire databases on disk.
- Use column-level encryption or Always Encrypted for specific fields, such as SSNs or credit card numbers.
- Secure backups with encryption and restrict file-level permissions.
Example: enabling TDE
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE MyServerCert;
ALTER DATABASE MyAppDB SET ENCRYPTION ON;
2. Data in transit (network data):
- Always connect over TLS (SSL) β encrypts data between client and server.
- Disable plain TCP connections and enforce encrypted connections in connection strings.
- Use VPNs, private endpoints, or firewalls for sensitive environments.
// Example: Enforce encryption in connection string
"Server=myserver;Database=mydb;Encrypt=True;TrustServerCertificate=False;"3. Access control & monitoring:
- Use least privilege β grant only whatβs necessary.
- Rotate credentials and use managed identities or Azure Key Vault for secrets.
- Audit access with built-in logging (e.g., SQL Server Audit, Azure Defender).
What .NET engineers should know:
- πΌ Junior: Encrypt the database and use secure connections (TLS). Never store passwords or secrets in plain text.
- π Middle: Combine TDE for full-database protection with column-level encryption for critical fields. Use Encrypt=True in all DB connections.
π Resources:
β How would you design a reporting database separate from the OLTP database?
Iβd start by separating the operational and analytical workloads. The OLTP database should handle fast inserts and updates, while the reporting database focuses on heavy read and aggregation queries.
To do that, Iβd build a data pipeline β usually ETL or ELT β that copies data from the OLTP system into a reporting or data warehouse environment. The ETL job could run on a schedule, say hourly or nightly, depending on how fresh the reports need to be.
The reporting database would use a denormalized schema, such as a star or snowflake schema, to simplify and speed up queries. Iβd also add indexes, materialized views, or pre-aggregated tables for common reports.
For near-real-time dashboards, I might use Change Data Capture (CDC) or an event-driven pipeline to ensure updates flow continuously.
The main goal is to offload reporting from the OLTP system β so user transactions stay fast, and analytics queries donβt lock or slow down production data.
What .NET engineers should know:
- πΌ Junior: Know that you need to move data from the central database into a separate one for reports to avoid slowing down users.
- π Middle: Use ETL or CDC to sync data into a reporting database thatβs optimized for reads and aggregations.
- π Senior: Design the warehouse with a star schema, partitioning, and caching. Keep compute isolated and automate refreshes with incremental loads or streaming ingestion.
β What is "Sharding" and what are the biggest challenges when implementing it?
Sharding is a technique for horizontally scaling a database by splitting data across multiple servers (called shards). Each shard holds only a portion of the total data β for example, users AβM on one server, NβZ on another. The goal is to handle more data and traffic than a single database can manage, while keeping queries fast and writes balanced.

How it works:
- You pick a shard key (like UserId, Region, or TenantId).
- The system uses that key to decide where a record lives (see the routing sketch after this list).
- Each shard operates independently β with its own storage, indexes, and performance profile.
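A minimal sketch of what that routing can look like in application code (shard count, names, and the modulo strategy are illustrative; production systems often use consistent hashing or a shard directory instead):
public class ShardRouter
{
    private readonly string[] _shardConnectionStrings;

    public ShardRouter(string[] shardConnectionStrings) =>
        _shardConnectionStrings = shardConnectionStrings;

    // All data for a given UserId always lands on the same shard.
    public string GetConnectionString(int userId) =>
        _shardConnectionStrings[userId % _shardConnectionStrings.Length];
}

// Usage: open a connection against the shard that owns this user.
// var connString = router.GetConnectionString(userId);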
Most significant challenges when implementing sharding:
- Choosing the right shard key
- Cross-shard queries and joins
- Queries that need data from multiple shards are slow and complex.
- Joins, aggregates, and transactions across shards often require custom logic or data duplication.
- Rebalancing shards
- If one shard grows faster than the others, you may need to move data (resharding).
- This is difficult without downtime or consistency issues.
- Transactions and consistency
- ACID transactions are typically limited to a single shard.
- Cross-shard transactions require coordination (like two-phase commit).
- Operational complexity
- Backups, migrations, and monitoring become multi-node operations.
- You need tooling to detect hotspots and manage shard health.
What .NET engineers should know:
- πΌ Junior: Should know sharding means splitting data across multiple servers for scalability.
- π Middle: Should understand how shard keys affect performance and what problems arise with cross-shard operations.
- π Senior: Should design resilient shard-aware systems β abstracting routing logic, monitoring data distribution, and planning for rebalancing, multi-region scaling, and observability.
π Resources:
β How would you configure high availability and disaster recovery for a critical SQL Server database?
High availability (HA) and disaster recovery (DR) both aim to keep your database online β but they solve different problems:
- High availability = minimal downtime during local failures (like a server crash).
- Disaster recovery = the ability to recover from larger incidents (such as data center loss).
SQL Server gives you a few main ways to achieve this, depending on your needs and budget:
1. Always On Availability Groups (AG)

Best choice for mission-critical systems. It keeps multiple replicas of the same database on separate servers. One is the primary, and the others are secondaries.
If the primary fails, another replica automatically takes over.
- Sync mode = zero data loss, used for HA (within the same region).
- Async mode = some delay, used for DR (across regions).
-- Simplified setup (conceptually)
CREATE AVAILABILITY GROUP SalesAG
WITH (AUTOMATED_BACKUP_PREFERENCE = SECONDARY)
FOR DATABASE SalesDB
REPLICA ON 'SQLNode1' WITH (ROLE = PRIMARY),
'SQLNode2' WITH (ROLE = SECONDARY);
2. Failover Cluster Instance (FCI)

The entire SQL Server instance (not just a database) runs on a Windows Failover Cluster. Shared storage is used β only one node is active at a time.
Suitable for hardware or OS-level failures, but not storage corruption.
3. Log Shipping

Transaction log backups are automatically shipped and restored to a secondary server.
Thereβs no automatic failover, but itβs simple and suitable for DR setups.
4. Backups + Geo-Replication
For smaller systems, you can combine frequent backups with geo-replicated storage (e.g., Azure Blob Storage or S3).
Itβs slower to recover but cost-effective.
What .NET engineers should know:
- πΌ Junior: Understand that HA/DR keeps the database available during failures. Know basic terms like primary/secondary replicas and failover.
- π Middle: Be familiar with options like Availability Groups, Failover Clusters, and Log Shipping β and when to use each. Know how connection strings use listeners for automatic failover.
- π Senior: Design a complete HA/DR strategy based on business RPO/RTO goals, handle cross-region replication, test failover regularly, and plan for split-brain and backup consistency. Should also integrate monitoring and alerting (SQL Agent, Prometheus, or Azure Monitor).
π Resources:
- Get started with log shipping on Linux
- About log shipping
- Always On failover cluster instances
- What is an Always On availability group?
β How can you reduce locking contention in a high-write environment?
todo
Scenario-Based & Practical

β Can you define a transaction and give an example of a business operation that absolutely needs to be wrapped in one?
A transaction is a unit of work in a database that involves multiple operations, which either succeed or fail as a whole.
If one step fails, everything rolls back β so your data isnβt left in a broken or partial state.
Classic example: bank transfer.
Step 1: Subtract money from Account A
Step 2: Add money to Account B
If the second step fails, the first one must roll back. Otherwise, money βdisappears.β
Example in C#:
using var transaction = await db.Database.BeginTransactionAsync();
try
{
var from = await db.Accounts.FindAsync(1);
var to = await db.Accounts.FindAsync(2);
from.Balance -= 100;
to.Balance += 100;
await db.SaveChangesAsync();
await transaction.CommitAsync();
}
catch
{
await transaction.RollbackAsync();
throw;
}
What .NET engineers should know:
- πΌ Junior: Know that a transaction groups multiple database actions so they either all succeed or all fail, and recognize when an operation needs one.
- π Middle: Understand how to implement transactions in EF Core or SQL, and when theyβre needed (payments, inventory updates).
- π Senior: Tune isolation levels, handle deadlocks, and design distributed transactions or compensating actions in microservices.
π Resources:
β A page in your application is loading slowly. How would you systematically determine if the problem is a database query?
If you suspect the database, you can narrow it down step by step:
- Measure total request time β add timing logs at the start and end of your controller or service.
- Log query durations β most ORMs, such as EF Core, can log SQL commands and execution times.
- Find slow queries β in SQL Server, check the Query Store or Extended Events; in PostgreSQL, use pg_stat_statements or EXPLAIN (ANALYZE).
- Run the query directly β copy the SQL from the logs and execute it in SSMS or psql to see if itβs slow on its own or if something else is causing the delay.
- Check the execution plan β look for missing indexes, table scans, or heavy joins.
- Test with parameters β sometimes the plan works for one input and breaks for another (parameter sniffing).
What .NET engineers should know:
- πΌ Junior: Learn to identify if the slowdown is from the database by logging execution times. Be able to run simple queries manually and compare performance.
- π Middle: Use tools like Query Store or EXPLAIN (ANALYZE) to analyze execution plans, find missing indexes, or optimize queries.
π Resources:
β How does database connection pooling work in .NET, and what are some common mistakes that can break it?
Connection pooling in .NET is a way to reuse existing database connections rather than opening and closing new ones each time your code runs a query. Opening a database connection is expensive β pooling makes it fast and efficient by keeping a small pool of ready-to-use connections in memory.
When you call SqlConnection.Open(), .NET doesnβt always create a brand-new connection. It checks if thereβs an available one in the pool with the exact connection string:
- If yes, it reuses it.
- If not, it creates a new one (up to the pool limit).
When you call Close() or Dispose(), the connection isnβt really closed β itβs returned to the pool for reuse.
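A minimal sketch of what this looks like in code, including the most common mistake that breaks pooling (Microsoft.Data.SqlClient and the connection string are illustrative):
using Microsoft.Data.SqlClient;

const string connectionString = "Server=.;Database=Shop;Integrated Security=true;Encrypt=True";

// Correct: 'using' returns the connection to the pool as soon as the block ends, even on exceptions.
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();                     // reuses a pooled connection if one is free
    // ... execute commands ...
}                                    // Dispose() sends it back to the pool; it is not physically closed

// Common mistake: forgetting using/Dispose. The connection only returns to the pool when the GC
// finalizes it, so under load the pool runs dry and callers fail with pool timeout errors.
// Another pitfall: connection strings that differ slightly per call, since each unique string gets its own pool.
var leaked = new SqlConnection(connectionString);
leaked.Open();
// ... never disposed ...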
What .NET engineers should know:
- πΌ Junior: Know that connection pooling reuses DB connections to improve performance. Always close or properly dispose of connections.
- π Middle: Understand that each unique connection string has its own pool. Know how to adjust pool sizes and identify issues like pool exhaustion.
- π Senior: Be able to debug connection leaks (e.g., via Performance Counters or dotnet-trace), tune pools for async workloads, and understand provider-specific pooling (SQL Server, Npgsql, MySQL). Should design data access code that scales safely under load.
π Resources: SQL Server connection pooling (ADO.NET)
β If you had to improve the performance of a large DELETE or UPDATE operation, what strategies would you consider?
Big DELETE or UPDATE queries can lock tables and fill logs. The trick is to do it in smaller batches and reduce overhead.
Main approaches to handling it:
- Batch processing β run deletes in chunks (see the sketch after this list):
DELETE TOP (10000) FROM Orders WHERE Status = 'Archived';
- Partitioning β drop or truncate old partitions instead of deleting rows.
- Disable extra work β temporarily drop unused indexes or triggers.
- Use the proper recovery/logging mode β e.g., BULK_LOGGED in SQL Server.
- Run off-peak and monitor lock and log sizes.
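A minimal sketch of driving the batched delete from .NET with EF Core (context is an assumed DbContext; ExecuteSqlRawAsync returns the number of affected rows, so the loop stops once a batch deletes nothing):
int deleted;
do
{
    deleted = await context.Database.ExecuteSqlRawAsync(
        "DELETE TOP (10000) FROM Orders WHERE Status = 'Archived';");

    // A short pause between batches keeps lock and transaction-log pressure manageable.
    await Task.Delay(TimeSpan.FromMilliseconds(200));
}
while (deleted > 0);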
What .NET engineers should know:
- πΌ Junior: Avoid deleting millions of rows at once β use batching.
- π Middle: Tune indexes and use partitioning for large tables.
- π Senior: Design cleanup jobs and archiving strategies that donβt block production systems.
π Resources: How to use batching to improve application performance
β How would you handle a situation where a necessary query is just inherently slow due to the volume of data it needs to process?
Sometimes a query is slow, not because itβs written badly β itβs just doing a lot of work. When thatβs the case, the goal isnβt to make it instant, but to make it manageable β by controlling when, how often, and how much data it touches.
Hereβs how you can approach it step by step π
1. Donβt make it faster β make it smarter
Ask:
Do we really need all the data right now?
- Fetch summaries instead of raw rows (e.g., aggregates or pre-computed stats).
- Use incremental updates β process only what has changed since the last run.
- Cache results if they donβt change often.
2. Run it asynchronously or in the background
Move the heavy query out of the request path.
For example:
- Use a background job (Hangfire, Quartz.NET, Azure Function, etc.).
- Trigger it periodically and store results in a reporting table.
- Let the frontend fetch preprocessed data instead of running the query live.
3. Partition or segment data
Split large tables by date, region, or tenant β so queries only scan relevant partitions.
SELECT * FROM Orders
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2024-02-01';
Now the DB only touches one monthβs data instead of years.
4. Use materialized views or pre-aggregated tables
Precompute common aggregations and refresh them periodically.
5. Tune hardware and execution environment
Sometimes youβve already optimized everything logically β then itβs about:
- More memory for caching.
- Faster storage (SSD).
- Parallel query settings or read replicas for scaling reads.
What .NET engineers should know:
- πΌ Junior: Understand that some queries are heavy because of the data size. Learn to cache or limit data when possible.
- π Middle: Move heavy queries to background jobs or precompute results using materialized views or summaries.
- π Senior: Design systems around data volume β partitioning, async pipelines, read replicas, and caching strategies that keep heavy operations off the critical path.
π Resources:
β What's the purpose of the STRING_AGG or FOR XML PATH techniques in SQL? Can you think of a use case?
Both STRING_AGG (modern approach) and FOR XML PATH (older workaround) are used to combine multiple row values into a single string β a process often called string aggregation or grouped concatenation.
Theyβre commonly used when you need to return a list of related values as a single field instead of multiple rows.
Example use case
Suppose you want a list of products purchased by each customer.
With STRING_AGG (SQL Server 2017+, PostgreSQL 9.0+, MySQL 8+):
SELECT c.CustomerName,
STRING_AGG(p.ProductName, ', ') AS Products
FROM Orders o
JOIN Customers c ON o.CustomerId = c.Id
JOIN Products p ON o.ProductId = p.Id
GROUP BY c.CustomerName;
Output:
| CustomerName | Products |
|---|---|
| Alice | Laptop, Mouse, Keyboard |
| Bob | Monitor, Mouse |
Before STRING_AGG existed, SQL Server developers used FOR XML PATH to achieve the same result:
SELECT c.CustomerName,
STUFF((
SELECT ',' + p.ProductName
FROM Orders o2
JOIN Products p ON o2.ProductId = p.Id
WHERE o2.CustomerId = c.Id
FOR XML PATH('')
), 1, 1, '') AS Products
FROM Customers c;
This works by building XML, then flattening it into a string β a creative workaround that was popular before aggregation functions became standard.
What .NET engineers should know:
- πΌ Junior: Understand that STRING_AGG combines multiple row values into one string.
- π Middle: Know how to use it in grouped queries and replace older FOR XML PATH logic in legacy systems.
- π Senior: Optimize its use in reporting or API queries and understand its behavior with ordering (STRING_AGG(... ORDER BY ...)) and null handling.
π Resources:
Entity Framework Core

β What are the trade-offs of using AsNoTracking() in EF Core?
AsNoTracking() tells Entity Framework Core not to track changes for the returned entities. Usually, EF keeps a copy of every loaded entity in its change tracker β so if you modify it later, EF knows how to generate an UPDATE. With AsNoTracking(), EF skips that overhead completely.
// Regular query: tracked entities
var users = context.Users.ToList();
// No tracking: faster, lightweight
var users = context.Users.AsNoTracking().ToList();
AsNoTrackingWithIdentityResolution
Introduced in EF Core 5.0, this version offers a middle ground.
It still disables change tracking for performance, but ensures each unique entity from the result is returned only once β meaning if the same record appears in multiple navigation paths, you get a shared instance instead of duplicates.

Example:
var users = context.Users
.Include(u => u.Orders)
.AsNoTrackingWithIdentityResolution()
.ToList();
This helps when working with complex queries that join or include relationships β avoiding duplicate entity instances while keeping the query read-only.
What .NET engineers should know:
- πΌ Junior: Know that AsNoTracking() speeds up read-only queries by skipping change tracking.
- π Middle: Understand when to use AsNoTrackingWithIdentityResolution() β for read-only queries that include navigation properties.
- π Senior: Balance both options β use regular tracking for write operations, AsNoTracking() for simple reads, and AsNoTrackingWithIdentityResolution() for complex projections where identity consistency matters.
π Resources: Tracking vs. No-Tracking Queries
β You need to insert 100,000 records. What's wrong with using a simple loop with SaveChanges(), and what would you do instead?
Calling SaveChanges() inside a loop is one of the most common performance mistakes when inserting large datasets. Each call sends a separate transaction and round-trip to the database β meaning 100,000 inserts = 100,000 transactions. Thatβs extremely slow and puts unnecessary load on both the application and the database.
foreach (var item in records)
{
context.Items.Add(item);
context.SaveChanges(); // β 100,000 separate database calls
}
This approach can take minutes instead of seconds.
Better approaches to insert multiple records:
Batch inserts with fewer SaveChanges calls
Add multiple records, then save once per batch:
await foreach (var chunk in records.Chunk(1000).ToAsyncEnumerable()) // β Use async streaming
{
context.AddRange(chunk);
await context.SaveChangesAsync(); // β Async, 1000 records per transaction
}
This dramatically reduces round-trip and transaction overhead.
Tip: Disable change tracking and use a using block for scoped DbContext:
using var context = new AppDbContext(); // β Scoped lifetime
context.ChangeTracker.AutoDetectChangesEnabled = false;
await foreach (var chunk in records.Chunk(1000).ToAsyncEnumerable())
{
context.AddRange(chunk);
await context.SaveChangesAsync();
}Use Bulk Operations
Libraries like EFCore.BulkExtensions perform bulk inserts directly via SQL:
context.BulkInsert(records);
These bypass EFβs change tracking, making large inserts 10β100x faster.
Use raw SQL or database-specific tools.
- SQL Server: SqlBulkCopy
- PostgreSQL: COPY command
- MySQL: multi-row INSERT statements
These are ideal for millions of records where EF overhead is too high.
What .NET engineers should know:
- πΌ Junior: Understand that calling SaveChanges() per record is inefficient. Use batching to reduce database calls.
- π Middle: Apply chunked inserts or use bulk libraries to optimize mass inserts.
- π Senior: Choose the best approach per scenario β EF batching for moderate data, raw bulk APIs for massive imports, and tune transaction size to balance performance and memory.
π Resources:
β How can you use projections in EF Core to optimize a query that only needs a few fields?
By default, EF Core maps entire entities β all columns β even if you only use a few of them later. That means extra data is fetched, tracked, and stored in memory unnecessarily. Using projections with .Select() lets you fetch only the columns you actually need, which reduces database I/O, memory usage, and tracking overhead.
Example
Letβs say your entity has many columns, but you only need a userβs name and email for a list view.
// Without projection β loads full User entity
var users = await context.Users.ToListAsync();
// With projection β fetch only required fields
var users = await context.Users
.Select(u => new { u.Name, u.Email })
.ToListAsync();
EF Core translates this directly into SQL, selecting only those columns:
SELECT [u].[Name], [u].[Email]
FROM [Users] AS [u];
The result is smaller, faster to transfer, and uses less memory on the .NET side.
You can also project into DTOs or view models:
var customers = await context.Customers
.Select(c => new CustomerDto
{
Id = c.Id,
Name = c.Name,
TotalOrders = c.Orders.Count
})
.ToListAsync();
What .NET engineers should know:
- πΌ Junior: Use .Select() to fetch only the fields needed β not whole entities.
- π Middle: Project into DTOs or anonymous types to reduce memory and improve performance.
- π Senior: Combine projections with pagination and no-tracking for maximum efficiency in API and reporting layers.
π Resources: Efficient Querying
β What exactly is the "N+1 query problem," and how have you solved it in EF Core?
The N+1 query problem happens when your code runs one main query to fetch a list of items (the β1β), and then executes a new query for each item to load its related data (the βNβ).
So instead of one efficient join or batch, you end up making dozens β or hundreds β of small queries.
It usually occurs when lazy loading or unoptimized navigation properties are used.

Example of the problem
// β Triggers N+1 queries
var orders = context.Orders.ToList();
foreach (var order in orders)
{
Console.WriteLine(order.Customer.Name); // Each access triggers a new query
}
This looks simple, but EF runs: 1 query for all orders, then 1 query per order to fetch its customer β leading to poor performance.
How to fix it
- Option 1: Use eager loading (Include)
// β Single query
var orders = context.Orders
.Include(o => o.Customer)
.ToList();
EF Core generates one SQL query with a join to include customers upfront.
- Option 2: Use projection instead of navigation
var orders = context.Orders
.Select(o => new
{
o.Id,
CustomerName = o.Customer.Name,
o.Total
})
.ToList();
Projections fetch exactly what you need in a single, efficient query.
- Option 3. Disable lazy loading.
Lazy loading is often the root cause. If you donβt explicitly need it, disable it:
context.ChangeTracker.LazyLoadingEnabled = false;
- Option 4. Consider split queries (when using large includes)
When eager loading too much data in one query leads to cartesian explosions, EF Core 5+ can split them safely:
var orders = context.Orders
.Include(o => o.Customer)
.AsSplitQuery()
.ToList();
What .NET engineers should know:
- πΌ Junior: Understand that N+1 means too many queries caused by loading related data separately.
- π Middle: Use .Include() or projections to efficiently load related data. Disable lazy loading if not needed.
- π Senior: Combine eager loading, projections, and split queries strategically based on data shape and query size. Optimize using profiling tools such as EF Core Power Tools or SQL Profiler.
π Resources:
β How would you explain the Unit of Work and Repository patterns to someone just starting with EF Core? Does the DbContext itself implement these?
The Repository and Unit of Work patterns are classic ways to organize data access in an application.
But if youβre using EF Core, youβre already using both, even if you donβt realize it.
Repository pattern
The Repository pattern acts as a collection-like interface for accessing data.
Instead of writing SQL, you interact with objects:
// Repository-style access
var user = await context.Users.FirstOrDefaultAsync(u => u.Email == email);
Unit of Work pattern
The Unit of Work groups a set of changes into a single transaction. In EF Core, DbContext tracks changes to entities and saves them all at once with SaveChanges(), committing them as one atomic operation.
var user = new User { Name = "Alice" };
context.Users.Add(user);
user.Name = "Alice Cooper";
await context.SaveChangesAsync(); // β single transaction (Unit of Work)
Do you still need to implement them manually?
In most modern applications, it's not needed; EF Core already provides these abstractions:
- Each DbSet<TEntity> acts as a Repository.
- The DbContext acts as a Unit of Work.
Creating extra layers often adds complexity without a significant benefit. However, some teams still use custom repository interfaces to:
- Decouple EF Core from business logic (for testing or mocking).
- Enforce consistent query rules across the app.
But for many projects, DbContext is enough.
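If a team does add the abstraction, a minimal sketch might look like this (IUserRepository and UserRepository are hypothetical names; User, Users, and AppDbContext are assumed from the examples above):
public interface IUserRepository
{
    Task<User?> GetByEmailAsync(string email);
    void Add(User user);
}

public class UserRepository : IUserRepository
{
    private readonly AppDbContext _context;

    public UserRepository(AppDbContext context) => _context = context;

    public Task<User?> GetByEmailAsync(string email) =>
        _context.Users.FirstOrDefaultAsync(u => u.Email == email);

    public void Add(User user) => _context.Users.Add(user);

    // SaveChanges stays outside the repository, so one Unit of Work
    // (the DbContext) can span several repositories in a single transaction.
}
Keeping SaveChanges() on the DbContext (or a thin unit-of-work wrapper) preserves the single-transaction behavior described above.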
What .NET engineers should know:
- πΌ Junior: Understand that DbContext tracks entities (Repository) and commits changes together (Unit of Work).
- π Middle: Know when to use EF Core directly vs adding abstraction for testability or specific business rules.
- π Senior: Design clean data layers: avoid redundant abstractions, use generic repositories only when they add value, and keep transaction scope inside the Unit of Work (DbContext).
π Resources:
β How do you manage the lifetime of your DbContext in a modern ASP.NET Core application? Why shouldn't it be a singleton?
In ASP.NET Core, the DbContext is designed to be short-lived and scoped to a single request. That means a new instance is created for each incoming HTTP request and disposed automatically when the request ends.
You register it like this:
services.AddDbContextPool<AppDbContext>(options =>
options.UseSqlServer(Configuration.GetConnectionString("Default")));
By default, this registers the context with a Scoped lifetime (AddDbContextPool additionally reuses pooled DbContext instances to reduce allocation overhead).
Even though a new DbContext is created per request, you're not opening a brand-new database connection every time: ADO.NET connection pooling reuses the underlying connections.
Why shouldn't it be a singleton?
A singleton DbContext lives for the entire app lifetime, which leads to several serious issues:
- Thread safety: DbContext isn't thread-safe. Multiple requests using the same instance can corrupt its internal state.
- Memory leaks: it keeps tracking every entity it's ever loaded. Over time, this can consume a massive amount of memory.
- Stale data: long-lived contexts don't reflect the current database state unless you explicitly reload entities.
- Transaction issues: the context might hold open transactions or connections longer than intended.
Essentially, using a single shared DbContext for all requests turns your app into a ticking time bomb of concurrency and data-consistency issues.
Recommended lifetimes:
| Lifetime | Description | When to use |
|---|---|---|
| Scoped | One per web request (default) | β Standard for web apps and APIs |
| Transient | New instance every time it's requested | For background jobs or short-lived operations |
| Singleton | One instance for the app lifetime | β Never use for DbContext |
What .NET engineers should know:
- πΌ Junior: Know that DbContext should be short-lived: one per request.
- π Middle: Understand why the scoped lifetime prevents concurrency and memory issues.
- π Senior: Manage custom scopes for background jobs or async work (see the sketch below); ensure transactional consistency by keeping DbContext lifetimes aligned with business operations.
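For background jobs, one common approach is a context factory that creates a short-lived DbContext per unit of work. A sketch (CleanupJob and the Logs table are hypothetical; AppDbContext is assumed from above):
// Registration
services.AddDbContextFactory<AppDbContext>(options =>
    options.UseSqlServer(Configuration.GetConnectionString("Default")));

// Usage in a background job or hosted service
public class CleanupJob
{
    private readonly IDbContextFactory<AppDbContext> _factory;

    public CleanupJob(IDbContextFactory<AppDbContext> factory) => _factory = factory;

    public async Task RunAsync(CancellationToken cancellationToken)
    {
        // Each run gets its own DbContext, disposed as soon as the work is done
        await using var context = _factory.CreateDbContext();

        var cutoff = DateTime.UtcNow.AddYears(-1);
        await context.Database.ExecuteSqlRawAsync(
            "DELETE FROM Logs WHERE CreatedAt < {0}", cutoff);
    }
}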
π Resources:
β Let's say you have a complex, pre-tuned SQL query. How do you execute it and get the results back into your .NET code using EF Core?
Sometimes you already have a hand-optimized SQL query (maybe from a DBA or a profiler) and you want to reuse it in your .NET app.
EF Core lets you run raw SQL directly and still map the results to your entities or custom DTOs safely.
Here are the main options for executing such queries:
- Using FromSqlRaw() or FromSqlInterpolated() for entity results (the interpolated form turns interpolated values into SQL parameters)
var users = await context.Users
.FromSqlInterpolated($"SELECT * FROM Users WHERE City = {city}")
    .ToListAsync();
- Using Database.SqlQuery<T>()
If your query doesn't match an entity (e.g., returns aggregates or joins), map it to a custom type:
public record UserStats(string City, int Count);
var stats = await context.Database
    .SqlQuery<UserStats>(
        // SqlQuery<T> expects a FormattableString, so interpolated values become SQL parameters
        $"SELECT City, COUNT(*) AS Count FROM Users GROUP BY City")
    .ToListAsync();
- Using Database.ExecuteSqlRaw() for commands
If the SQL doesn't return rows (e.g., bulk updates or deletes):
await context.Database.ExecuteSqlRawAsync(
    "UPDATE Orders SET Status = 'Archived' WHERE OrderDate < {0}", cutoffDate);
- Fallback: ADO.NET for complete control
For advanced cases (multiple result sets, temp tables, etc.), you can still drop to raw ADO.NET via EF's connection:
using var cmd = context.Database.GetDbConnection().CreateCommand();
cmd.CommandText = "EXEC GetUserStats";
context.Database.OpenConnection();
using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
// manual mapping
}
What .NET engineers should know:
- πΌ Junior: Know you can run raw SQL via FromSqlRaw() and ExecuteSqlRaw().
- π Middle: Use SqlQuery<T>() or DTO projections when results don't match an entity. Handle parameters safely with interpolation.
- π Senior: Mix EF queries with raw SQL strategically: prefer EF for maintainability, but drop to SQL for performance-critical paths. Understand connection management and transaction context when running raw commands.
π Resources: SQL Queries
β How does EF Core handle concurrency conflicts? Can you describe a scenario where you've had to implement this?
EF Core handles concurrency conflicts using optimistic concurrency control. The idea is simple: multiple users can read and modify the same data, but when saving, EF Core checks if the data has changed since it was read. If yes, it throws a DbUpdateConcurrencyException.
To make it work, you add a concurrency token (like a RowVersion column) to your entity. EF Core includes this value in the WHERE clause during updates, ensuring the record is updated only if the token hasn't changed.
Example:
public class Product
{
public int Id { get; set; }
public string Name { get; set; }
public decimal Price { get; set; }
[Timestamp]
public byte[] RowVersion { get; set; }
}
// Handling concurrency
try
{
context.SaveChanges();
}
catch (DbUpdateConcurrencyException)
{
Console.WriteLine("Data was modified by another user. Reload and retry.");
}
In practice, this happens when, say, two admins edit the same product's price. The second SaveChanges() will fail, and you can choose whether to overwrite or reload the entity.
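A minimal reload-and-retry sketch using EF Core's concurrency-handling APIs (here the client's values win; call entry.Reload() instead if the database should win):
var saved = false;
while (!saved)
{
    try
    {
        await context.SaveChangesAsync();
        saved = true;
    }
    catch (DbUpdateConcurrencyException ex)
    {
        foreach (var entry in ex.Entries)
        {
            var databaseValues = await entry.GetDatabaseValuesAsync();
            if (databaseValues == null)
            {
                throw; // the row was deleted by another user; decide how to handle that
            }

            // "Client wins": keep our current values, but refresh the original values
            // (including RowVersion) so the next SaveChanges passes the concurrency check.
            entry.OriginalValues.SetValues(databaseValues);
        }
    }
}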
What .NET engineers should know:
- πΌ Junior: Should know EF Core can detect conflicting updates and that this exception means the record was changed by someone else.
- π Middle: Should understand how to use [Timestamp] or IsConcurrencyToken() and how to handle conflicts properly: retry, reload, or merge data.
- π Senior: Should design system-wide concurrency strategies, combining EF Core mechanisms with DB-level or custom logic to ensure consistency under heavy load.
π Resources: Handling Concurrency Conflicts
β How do you seed reference data (like a list of countries or product categories) using EF Core migrations?
In EF Core, you can seed reference data using the HasData() method in your OnModelCreating() configuration. When you add a migration, EF Core generates INSERT statements for that data. This approach is helpful for static reference data (like countries, product types, or predefined roles) that rarely changes.
Example:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Country>().HasData(
new Country { Id = 1, Name = "Poland" },
new Country { Id = 2, Name = "Germany" },
new Country { Id = 3, Name = "France" }
);
}
Then run:
dotnet ef migrations add SeedCountries
dotnet ef database update
EF Core will insert the seed data during migration. If you change the data later, EF will generate UPDATE or DELETE statements automatically.
What .NET engineers should know:
- πΌ Junior: Should know that HasData() can seed static data and that it runs through migrations, not runtime initialization.
- π Middle: Should understand that seeded data must include primary keys and how EF Core compares data changes to update or delete existing records.
- π Senior: Should know when to avoid HasData() (e.g., large datasets, dynamic configs) and instead use custom scripts, JSON imports, or background seeders for more flexibility (a runtime-seeding sketch follows below).
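A sketch of such a runtime seeder, run once at startup (app, AppDbContext, and the Countries DbSet are assumed names):
using var scope = app.Services.CreateScope();
var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();

await db.Database.MigrateAsync();        // apply pending migrations first

if (!await db.Countries.AnyAsync())      // seed only when the table is empty
{
    db.Countries.AddRange(
        new Country { Name = "Poland" },
        new Country { Name = "Germany" },
        new Country { Name = "France" });

    await db.SaveChangesAsync();
}
Unlike HasData(), this runs at application startup rather than inside a migration, so it suits data that changes more often or comes from an external source.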
π Resources: Data Seeding
β What are owned entities in EF Core, and when would you use them?
Owned entities in EF Core let you group related fields into a value object that doesn't have its own identity or table. They're helpful when you want to logically separate parts of an entity, like an address or a money type, but still keep them stored in the same table as the owner. It's a clean way to model value objects following Domain-Driven Design principles.
Example:
public class User
{
public int Id { get; set; }
public string Name { get; set; }
public Address Address { get; set; }
}
[Owned]
public class Address
{
public string City { get; set; }
public string Street { get; set; }
}
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<User>().OwnsOne(u => u.Address);
}
This generates a single Users table with the address columns embedded in it (by default named after the navigation, e.g. Address_City and Address_Street).
When to use:
Use owned entities when you want to model a complex value object that always belongs to a single parent entity and doesnβt need its own lifecycle, primary key, or repository.
What .NET engineers should know:
- πΌ Junior: Should understand that owned entities group related fields and are stored in the same table as their parent.
- π Middle: Should understand how to configure them with [Owned] or OwnsOne(), and how they differ from regular relationships (no separate key or table).
- π Senior: Should use owned entities to implement value objects and ensure clean domain boundaries, possibly combining them with immutability or record types for better consistency.
π Resources: Owned Entity Types
β How do you configure a many-to-many relationship in EF Core?
In EF Core, many-to-many relationships are easy to set up starting from EF Core 5. You can define them directly without creating a separate join entity class; EF Core automatically builds a join table under the hood.
If you need additional columns in the join table (such as timestamps or metadata), you can define the join entity explicitly (a sketch follows the example below).
Example:
public class Student
{
public int Id { get; set; }
public string Name { get; set; }
public ICollection<Course> Courses { get; set; }
}
public class Course
{
public int Id { get; set; }
public string Title { get; set; }
public ICollection<Student> Students { get; set; }
}
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Student>()
.HasMany(s => s.Courses)
.WithMany(c => c.Students);
}
EF Core will create a join table named CourseStudent automatically, with StudentId and CourseId forming its composite primary key.
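And a sketch of the explicit form with extra data on the join table (Enrollment and EnrolledOn are hypothetical names); EF Core then maps the existing skip navigations onto this entity instead of generating its own join table:
public class Enrollment
{
    public int StudentId { get; set; }
    public Student Student { get; set; }
    public int CourseId { get; set; }
    public Course Course { get; set; }
    public DateTime EnrolledOn { get; set; }
}

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Student>()
        .HasMany(s => s.Courses)
        .WithMany(c => c.Students)
        .UsingEntity<Enrollment>(
            j => j.HasOne(e => e.Course).WithMany().HasForeignKey(e => e.CourseId),
            j => j.HasOne(e => e.Student).WithMany().HasForeignKey(e => e.StudentId),
            j =>
            {
                j.HasKey(e => new { e.StudentId, e.CourseId });
                j.Property(e => e.EnrolledOn).HasDefaultValueSql("GETDATE()");
            });
}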
What .NET engineers should know:
- πΌ Junior: Should know that EF Core supports many-to-many relationships out of the box and automatically creates a join table.
- π Middle: Should understand how to customize the join table and define explicit join entities when additional fields are needed.
- π Senior: Should design data models with clear ownership rules, consider performance implications (like lazy loading or query joins), and optimize navigation properties for large datasets.
π Resources: Many-to-many relationships
β What are global query filters, and what's a practical use case for them?
Global query filters in EF Core let you automatically apply a WHERE condition to all queries for a specific entity type. They're great for scenarios like soft deletes, multi-tenancy, or filtering archived data, so you don't have to repeat the same condition in every query.
Example (soft delete):
public class Product
{
public int Id { get; set; }
public string Name { get; set; }
public bool IsDeleted { get; set; }
}
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Product>().HasQueryFilter(p => !p.IsDeleted);
}
Now every query like context.Products.ToList() automatically includes WHERE IsDeleted = 0. You can still disable the filter temporarily using IgnoreQueryFilters().
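A minimal multi-tenancy sketch follows the same idea (Order with a TenantId column and ITenantProvider are assumptions for illustration). Because the filter references a field on the context, it is evaluated per context instance:
public class AppDbContext : DbContext
{
    private readonly Guid _tenantId;

    public AppDbContext(DbContextOptions<AppDbContext> options, ITenantProvider tenantProvider)
        : base(options)
    {
        _tenantId = tenantProvider.TenantId;   // resolved per request, e.g. from a claim or header
    }

    public DbSet<Order> Orders => Set<Order>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Every query on Orders automatically gets WHERE TenantId = @tenantId
        modelBuilder.Entity<Order>().HasQueryFilter(o => o.TenantId == _tenantId);
    }
}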
What .NET engineers should know:
- πΌ Junior: Should know global filters automatically apply to all queries and simplify logic like soft deletes.
- π Middle: Should understand how to use HasQueryFilter() and disable filters with IgnoreQueryFilters() when needed.
- π Senior: Should design multi-tenant and soft-delete strategies using filters carefully, considering performance, caching, and how filters interact with includes or raw SQL.
π Resources: Global Query Filters
β Can you explain what a shadow property is in EF Core?
A shadow property in EF Core is a field that exists in the model but not in your entity class. It's managed by EF Core internally: you can query, filter, and store values for it, even though it's not part of your C# object.
This is handy when you need extra metadata (like CreatedAt, UpdatedBy, or TenantId) that doesn't belong to the domain model itself but is still helpful in the database.
Example:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Product>()
.Property<DateTime>("CreatedAt")
.HasDefaultValueSql("GETDATE()");
}
Now EF Core will create a CreatedAt column in the Products table, even though Product doesn't have that property in C#. You can access it like this:
var createdAt = context.Entry(product).Property("CreatedAt").CurrentValue;
What .NET engineers should know:
- πΌ Junior: Should know shadow properties exist in the EF model but not in the entity class.
- π Middle: Should understand how to define and use them for audit fields, multi-tenancy, or soft deletes.
- π Senior: Should decide deliberately when to use shadow properties, balancing clean domain models against operational metadata needs, possibly integrating them with interceptors or change tracking (as sketched below).
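For example, a sketch that fills the CreatedAt shadow property from the change tracker instead of relying on the database default (an override inside the DbContext; the synchronous SaveChanges would need the same treatment):
public override async Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
{
    foreach (var entry in ChangeTracker.Entries<Product>())
    {
        if (entry.State == EntityState.Added)
        {
            // Set the shadow property value; it never appears on the Product class itself
            entry.Property("CreatedAt").CurrentValue = DateTime.UtcNow;
        }
    }

    return await base.SaveChangesAsync(cancellationToken);
}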
π Resources: Shadow and Indexer Properties
β How does EF Core decide whether to issue an INSERT or an UPDATE when you call SaveChanges()?
When you call SaveChanges() in Entity Framework Core, the framework looks at the state of each tracked entity in the DbContext to decide what SQL command to send. It's all about change tracking: EF Core keeps track of each entity's lifecycle from when you load or attach it until you save.

Each tracked entity has a state: Added, Modified, Deleted, or Unchanged. EF Core compares current property values with the original ones (snapshotted on load) and, based on that diff and the entity's state, decides which SQL operation to generate: Added entities produce an INSERT, Modified entities an UPDATE (for the changed columns only), Deleted entities a DELETE, and Unchanged entities produce nothing.
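A sketch of how that plays out (Products and the Product entity are reused from earlier examples):
var keyboard = new Product { Name = "Keyboard" };
context.Products.Add(keyboard);                      // State = Added    -> INSERT

var existing = await context.Products.FirstAsync(p => p.Id == 1);
existing.Name = "Mechanical Keyboard";               // State = Modified -> UPDATE (changed columns only)

var discontinued = await context.Products.FirstAsync(p => p.Id == 2);
context.Products.Remove(discontinued);               // State = Deleted  -> DELETE

await context.SaveChangesAsync();                    // one SaveChanges, commands chosen per entity state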
What .NET engineers should know:
- πΌ Junior: Should know Add(), Update(), and Remove() change entity states, and EF Core decides what to save based on them.
- π Middle: Should understand how the Change Tracker works and when to use AsNoTracking() to skip tracking for read-only queries.
- π Senior: Should control state management explicitly, using Attach(), Entry(entity).State, or detached entity updates to avoid unnecessary SQL operations in high-load systems.
π Resources: