Node.js Developer Interview Questions
200 scenario-based questions with detailed model answers, organized skill-wise and tool-wise.
Your Node.js payment service processes 10k webhooks/minute. On Black Friday, latency spikes to 8 seconds and the process CPU hits 100%. Heap is stable. You suspect a synchronous bottleneck in the event loop. How do you diagnose and fix this without a restart?
A junior developer on your team wrote an Express route that calls `fs.readFileSync` inside a request handler, arguing it is simpler than dealing with callbacks. The endpoint serves about 200 requests per second. Walk through the exact problem this creates and how you would refactor it.
You are debugging a Node.js service where Promises resolve but callbacks registered with `setImmediate` run before them unexpectedly in certain code paths. A colleague insists this is a Node.js bug. Explain what is actually happening, with a concrete code trace through the event loop phases.
You inherit an Express app where error handling is scattered — some routes throw, some call `next(err)`, and some errors are silently swallowed. The app has no global error middleware. Describe the minimal, correct refactor to centralize error handling without rewriting every route.
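One possible shape for this refactor, as a minimal sketch assuming Express 4-style handlers, where a rejected promise from an async handler is not forwarded to `next` automatically. The names `asyncHandler` and `errorMiddleware`, and the `status` property on errors, are illustrative assumptions and not part of Express.

```javascript
// Wraps a route handler so that both synchronous throws and rejected
// promises are forwarded to next(err) instead of being lost.
function asyncHandler(handler) {
  return function wrapped(req, res, next) {
    return Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
  };
}

// Express recognizes error middleware by its four-argument signature.
// It would be registered last, after all routes: app.use(errorMiddleware).
function errorMiddleware(err, req, res, next) {
  if (res.headersSent) {
    return next(err); // response already started; defer to the default handler
  }
  // `err.status` is an assumed convention for expected (4xx) failures.
  const status = Number.isInteger(err.status) ? err.status : 500;
  // Hide internal details for unexpected (5xx) failures.
  const message = status >= 500 ? 'Internal Server Error' : err.message;
  res.status(status).json({ error: { message } });
}
```

Existing routes can then be wrapped one at a time, for example `app.get('/orders/:id', asyncHandler(getOrder))`, while routes that already call `next(err)` keep working unchanged.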
Your team is migrating a 3-year-old Express monolith to NestJS. The Express app uses raw middleware for auth, a mix of callback and Promise-based services, and no DI container. Engineers are worried about a big-bang rewrite. Propose a strangler-fig migration strategy with concrete NestJS concepts.
A NestJS microservice handles 50,000 requests per day. After deploying a new interceptor that logs request/response bodies for auditing, p99 latency jumps from 120ms to 900ms. Memory usage also climbs by 400MB per hour. What is your diagnosis and fix?
Your team is designing a bulk REST endpoint for creating up to 500 orders at once. A colleague proposes `POST /orders` with an array body. Another proposes `POST /orders/batch`. A third argues for multiple parallel single-item requests from the client. Referee this debate with concrete trade-offs.
You are the lead on a public REST API serving 200 third-party integrators. You need to introduce a breaking change to the `/users/{id}` response schema — removing the `ssn` field for GDPR compliance. Describe your versioning strategy, migration timeline, and how you communicate this to partners.
An internal service returns HTTP 200 with `{ success: false, error: 'Not found' }` for missing resources. Your new client code is treating every 200 as successful and missing these errors. Make the case for fixing this at the API layer, not the client layer, and propose the correct response contract.
Your Node.js service exports a report endpoint that reads 2 million rows from PostgreSQL, formats them as CSV, and streams the response. In staging, memory usage hits 4GB and the process crashes. Production rows are 3x larger. Walk through a streams-based architecture that keeps memory under 50MB.
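A streams-based export could look roughly like the sketch below. It assumes rows arrive from some async or sync iterable (in a real service, likely a cursor-based query stream rather than a fully loaded result set); `escapeField`, `toCsvLine`, and `exportCsv` are hypothetical helper names.

```javascript
const { Readable, Transform } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Quotes a field only when it contains a delimiter, quote, or line break.
function escapeField(value) {
  const text = String(value);
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Object-mode input (row objects), text output (one CSV line per row).
function toCsvLine(columns) {
  return new Transform({
    writableObjectMode: true,
    transform(row, _encoding, callback) {
      const line = columns.map((column) => escapeField(row[column])).join(',');
      callback(null, line + '\n');
    },
  });
}

// pipeline() propagates backpressure: when `destination` is slow, the
// source iterable is pulled more slowly, so only a small window of rows
// is held in memory at any time.
async function exportCsv(rowSource, columns, destination) {
  await pipeline(Readable.from(rowSource), toCsvLine(columns), destination);
}
```

In an HTTP handler, `destination` would typically be the response object, so a slow client throttles the database read instead of filling the heap.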
A colleague's code reads a large uploaded file into memory with `req.pipe(concat)` before validating it. The endpoint receives files up to 500MB from enterprise clients. Memory spikes to 1.5GB under load. Propose a streaming validation approach that avoids buffering.
You need to build a real-time log aggregator that reads from 200 concurrent TCP sockets, parses newline-delimited JSON, and writes to ClickHouse in batches of 10,000 rows. How do you handle backpressure between the socket ingest layer and the batch writer to avoid memory growth?
Your Node.js API service memory grows from 200MB to 1.8GB over 72 hours in production without a restart. Heap snapshots show the number of `IncomingMessage` objects increasing steadily. The service uses Express and an Axios-based HTTP client. Diagnose and fix the leak.
A Node.js background job processes 1 million records nightly from a CSV file. It completes in 6 hours but you need it under 30 minutes. The current code reads the file line by line, calls an external enrichment API per record, and writes results to Postgres. How do you redesign this?
Your Node.js service runs behind a load balancer with 8 instances. Each instance reports healthy CPU and memory, but end-to-end request latency is bimodal: half the requests complete in 80ms, half take 2+ seconds. The slow requests always hit specific instances. What is your investigation approach?
You are converting a Node.js library from CommonJS to ESM. Your library uses `__dirname` and `__filename` in several places, and consumers import it using both `require()` and `import`. How do you handle the conversion while maintaining backward compatibility?
Your monorepo has 40 packages, a mix of ESM-only and CJS-only. Jest tests work fine, but when running integration tests with Vitest, certain packages throw 'require() of ES Module is not supported'. The packages have not changed. How do you diagnose and resolve this?
Your Node.js API uses JWTs with a 24-hour expiry. A security audit flags that a stolen token can be used for up to 24 hours after an account compromise is detected. The team suggests reducing expiry to 5 minutes. Walk through the trade-offs and propose a balanced solution.
You are building a multi-tenant SaaS API where each tenant has their own RSA key pair for JWT signing. When a tenant's private key is compromised, you need to rotate it without logging out all active users of that tenant. Design this key rotation flow in Node.js.
After deploying a new authentication service, you discover that JWT verification is adding 40ms to every request. The previous implementation was synchronous and added under 1ms. The new implementation uses `jsonwebtoken.verify` with an async callback. Explain the performance regression and fix it.
Your Postgres query `SELECT * FROM orders WHERE customer_id = $1 AND status = 'pending'` takes 800ms on a table with 5 million rows. EXPLAIN ANALYZE shows a sequential scan. You have an index on `customer_id` alone. Why is the planner ignoring it, and how do you fix this?
A MongoDB collection with 50 million documents is being queried for analytics: average order value by city for the last 30 days. The aggregation pipeline takes 45 seconds. The collection has a compound index on `{city: 1, createdAt: 1}`. Why is performance still poor, and what is the optimized approach?
Your Node.js service uses a Postgres connection pool (pg-pool, max 20 connections). Under load tests at 500 concurrent users, you observe connection timeout errors and the database shows 20 active connections all in 'idle in transaction' state. What went wrong and how do you fix it?
You add Redis caching to a Node.js product catalog API. After deploying, you notice that during a cache miss, 50 concurrent requests for the same product all hit the database simultaneously, overwhelming it. This is called a cache stampede. How do you prevent it?
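One common mitigation is request coalescing (sometimes called single-flight): concurrent misses for the same key share one in-flight load. The sketch below is an in-process version only; `createSingleFlight` and `loader` are hypothetical names, and a multi-instance deployment would still need something extra, such as a distributed lock or serving stale values while refreshing.

```javascript
// Coalesces concurrent loads for the same key into one in-flight promise.
function createSingleFlight() {
  const inFlight = new Map(); // key -> promise of the pending load

  return function load(key, loader) {
    if (inFlight.has(key)) {
      return inFlight.get(key); // join the load that is already running
    }
    const promise = Promise.resolve()
      .then(() => loader(key))
      // Clear the entry on success or failure so later misses can retry.
      .finally(() => inFlight.delete(key));
    inFlight.set(key, promise);
    return promise;
  };
}
```

With this in place, 50 simultaneous cache misses for one product would trigger a single database query, and every caller receives the same result or the same error.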
Your Node.js e-commerce platform uses Redis for session storage. A Redis primary node fails at 2am. Your monitoring shows 40,000 active sessions are lost and 15,000 users are forcibly logged out during peak time in Asia. How do you redesign the session architecture to survive primary failure?
You implement a write-through cache for a Node.js user profile service: every profile update writes to Postgres and Redis simultaneously. Six months later, you detect that 3% of Redis profiles differ from Postgres. No errors are logged. What causes this and how do you make the cache consistent?
You build a Node.js WebSocket server using the `ws` library. During load testing with 10,000 concurrent connections, you notice that broadcasting a message to all connected clients takes 4 seconds and blocks new connections from completing their handshake. Diagnose and fix this.
Your collaborative document editing service uses WebSockets. Users complain that concurrent edits from two users occasionally overwrite each other's changes. The backend receives both edits within milliseconds of each other and applies them sequentially to a Mongo document. Design an OT or CRDT-based fix.
You operate a live sports score service delivering updates to 500,000 WebSocket clients. During a World Cup final, you need to push a score update to all clients within 200ms. Your current single-region Node.js cluster handles 50k connections per machine. Describe the fanout architecture.
You are writing Jest tests for a Node.js service that calls an external payment API. A colleague mocks the entire `axios` module globally with `jest.mock('axios')`. Tests pass locally but fail intermittently in CI, and some tests affect others. What is wrong and how do you restructure the mocks?
Your Node.js service has a critical background job that runs every 15 minutes. Testing it with real timers means tests take 15 minutes each. A developer used Jest fake timers but the async Promises inside the job never resolve. Walk through the correct approach to test timer-dependent async code.
A Node.js API has 90% test coverage but still ships a bug where a race condition corrupts user balances during concurrent transfers. The tests all pass. What does this tell you about your test strategy and what would you add?
Your Node.js service crashes with `UnhandledPromiseRejectionWarning` in production three times per week. Each crash takes down the process. The errors come from different parts of the codebase. How do you systematically eliminate this class of error?
You are designing error handling for a Node.js microservice that calls five downstream services. Each downstream has different reliability characteristics. The product requirement is to return partial results when some downstream services fail rather than failing the entire request. Design the error propagation model.
After a deployment, your Node.js service starts returning 500 errors for 2% of requests. Logs show `TypeError: Cannot read property 'id' of undefined` from a data transformation function. The data looks correct in staging. What is your live debugging and rollback decision framework?
Your Node.js order service and inventory service need to be consistent: when an order is placed, inventory must be decremented. Both services have their own Postgres databases. A two-phase commit is off the table. How do you implement reliable, eventually consistent coordination?
You have 12 Node.js microservices in a Kubernetes cluster. A new developer deploys a change to the user-profile service that accidentally introduces a 3-second delay on every response. Within 10 minutes, 6 other services that call user-profile are cascading into 504 timeouts. How do you design resilience to prevent this cascade?
Your team debates whether inter-service communication for a new Node.js order fulfillment system should use REST HTTP calls or a message queue like RabbitMQ. The operations involve: create shipment, send customer email, update inventory, notify warehouse. Make the architectural decision.
Your Node.js service in production is responding correctly to health checks but user-facing requests time out after 30 seconds with no errors in the logs. The process is alive and not restarting. CPU is at 5%, memory is stable. What is happening and how do you debug it?
Three weeks after a major release, you get a PagerDuty alert: your Node.js API's p99 response time has climbed from 200ms to 4 seconds over 48 hours, but error rates are still near zero. The release added a new recommendation algorithm. How do you isolate whether the release caused this?
Your Node.js app uses `console.log` extensively for debugging. In production with 100 RPS, logs fill 500GB of disk in 3 days and the disk-full condition crashes the service. How do you implement structured logging that scales and what log levels should apply to what?
Your Node.js service uses cluster mode with 8 workers. You notice that one worker consistently handles 60% of requests while the others handle 5-10% each. The load balancer is round-robin. Diagnose why one worker is saturated and fix the distribution.
A junior developer wrote `Promise.all([fetchUser(), fetchOrders(), fetchInventory()])` but then added `await` inside each fetch function sequentially. She is confused why the three calls take the same time as sequential calls. Explain what went wrong and show the correct pattern.
Your team runs 8 Node.js microservices. A new DevOps engineer wants to add a centralized API gateway. The services currently talk to each other directly via HTTP. Two senior engineers resist, citing added complexity. Make the case for and against the gateway and recommend a decision.
You are using Mongoose in a Node.js app. A support ticket shows that deleting a user does not delete their associated posts, comments, and sessions. The Mongoose model has no middleware. How do you implement cascading deletes and what are the trade-offs of each approach?
You cache API responses in Redis with a 1-hour TTL. A content editor updates a blog post at 10am, but visitors still see the old version until 11am. Product wants updates visible within 30 seconds. How do you redesign the invalidation strategy?
Your customer support chat built with Socket.IO has been working for 6 months. After migrating to 3 Node.js instances behind a load balancer, agents report that customers' messages sometimes never arrive. The instances are not configured for Socket.IO clustering. Explain why and fix it.
You need to build a Node.js CLI tool that reads a 20GB log file, filters lines matching a regex, and writes matches to an output file. A colleague implements it by splitting the entire file into lines with `fs.readFileSync` and `String.split`. What is wrong and how do you implement it correctly?
Your Node.js service runs with a single replica in Kubernetes. After deploying a change that adds a per-user in-memory rate limiter, memory grows unbounded until the pod OOMKills every 6 hours. Identify the pattern and fix it.
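The usual pattern here is a `Map` keyed by user that only ever grows. A minimal sketch of a fixed-window limiter that evicts expired entries follows; `createRateLimiter`, `sweep`, and the injectable `now` clock are illustrative names, and the sketch assumes `limit` is at least 1.

```javascript
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const windows = new Map(); // userId -> { start, count }

  function allow(userId) {
    const t = now();
    const entry = windows.get(userId);
    if (!entry || t - entry.start >= windowMs) {
      windows.set(userId, { start: t, count: 1 }); // start a fresh window
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  }

  // Removes expired windows. Without this, every user ever seen stays
  // in the Map, which is the unbounded growth described above.
  function sweep() {
    const t = now();
    for (const [userId, entry] of windows) {
      if (t - entry.start >= windowMs) windows.delete(userId);
    }
  }

  return { allow, sweep, size: () => windows.size };
}
```

A caller might schedule the cleanup with `setInterval(limiter.sweep, windowMs).unref()`. Moving the counters to a shared store with key expiry is another option once the service runs more than one replica.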
You publish a Node.js SDK that wraps a REST API. Version 3.0 switches from CJS to pure ESM. Three enterprise customers report that their Webpack 4 build pipelines break after upgrading. Webpack 4 cannot process ESM packages. How do you handle this responsibly?
Your team has 800 Jest tests that take 12 minutes to run in CI. Developers skip tests locally because the feedback cycle is too slow. Propose a concrete strategy to reduce total test runtime to under 3 minutes.
Your Node.js service wraps all errors in a custom `AppError` class. A new developer catches every error in a try/catch and re-throws `new AppError(err.message)` regardless of type. You notice that stack traces are now 2 lines instead of 30. What is the problem and how do you fix the error wrapping pattern?
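A sketch of one wrapping pattern that keeps the original stack, assuming a runtime that supports the standard `cause` option on `Error` (Node.js 16.9 or later). The `code` field and the `wrapError` helper are hypothetical additions for illustration.

```javascript
class AppError extends Error {
  constructor(message, { code = 'INTERNAL', cause } = {}) {
    // Passing `cause` keeps the original error, and its full stack,
    // attached to the wrapper instead of discarding it.
    super(message, cause === undefined ? undefined : { cause });
    this.name = 'AppError';
    this.code = code;
  }
}

// Wraps unknown errors once; errors that are already AppError pass through
// untouched so their type, code, and stack are not overwritten.
function wrapError(err, message, code) {
  if (err instanceof AppError) return err;
  return new AppError(message, { code, cause: err });
}
```

Logging the wrapper together with `err.cause` then shows both where the error was wrapped and where it was first thrown.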
A developer stores JWTs in localStorage on the frontend that consumes your Node.js API. A security review flags XSS risk. The team argues that HttpOnly cookies have CSRF risks. Walk through the correct storage and transmission strategy for JWTs in a web app.
Your Node.js API serves both a mobile app and a web app. The web app needs paginated lists of users with full profile data; the mobile app needs the same users but only needs three fields to render a list view. Both teams complain the API is either over-fetching or under-fetching for them. Architect a solution.
Your fintech API processes 10,000 payment webhooks per minute. Engineers added a synchronous JSON.parse inside a hot webhook handler and P99 latency jumped from 12ms to 340ms overnight. Walk through diagnosing this and what you change architecturally.
A junior developer on your team wrote an Express route that calls await Promise.all([dbQuery(), cacheRead(), externalApiCall()]) and is confused why it still feels sequential when one of the three services is slow. Explain what's actually happening and how to fix it.
An e-commerce platform uses setImmediate to break up large batch jobs in Node. After upgrading from Node 16 to Node 20, the batch jobs run noticeably slower. The event loop phase order didn't change — what else might explain this and how do you investigate?
You join a startup using Express and find that every route handler has its own try/catch with console.error. There's no central error handling, errors reach clients as HTML stack traces, and 500s aren't logged to any system. What do you put in place first?
A healthcare SaaS team migrating from Express to NestJS wants to keep their existing middleware chain (rate limiting, HIPAA audit logging, auth). The NestJS architect proposes rewriting everything as NestJS guards and interceptors. What's your recommendation and why?
Your NestJS monolith handles both public REST API calls and internal background job processing. Under load, background jobs starve HTTP requests because they share the same event loop. Describe your approach to resolving this without splitting into separate services immediately.
You're designing a REST API for a logistics platform where a shipment can move through 12 states (created, picked_up, in_transit, customs_hold, etc.). A frontend team asks for a single PATCH /shipments/:id endpoint that accepts any field. What problems do you foresee and how do you design it?
A mobile team complains that your REST API for a social app returns too much data — a GET /posts feed endpoint returns full user objects nested inside each post, causing 4MB responses on a slow 4G connection. Product won't allow GraphQL. How do you solve this at the API layer?
Your team debates whether to version a breaking API change as /v2/users or via Accept: application/vnd.api+json;version=2 headers. The change is adding a required field to POST /users and deprecating a response field. Which approach do you recommend and why?
A data pipeline service reads 2GB CSV files from S3, transforms each row, and writes to PostgreSQL. The current implementation buffers the entire file in memory and crashes with OOM errors on large files. Redesign this using Node streams.
You're building a real-time log tail API where clients poll GET /logs/tail and receive the last 50 log lines from a file that is continuously written to. Users report the response sometimes returns garbled UTF-8 characters at line boundaries. Explain what's happening and how to fix it.
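One likely cause, if the endpoint reads the file in fixed-size byte ranges, is a multi-byte UTF-8 character being cut in half at a chunk boundary. The sketch below reproduces that and shows `StringDecoder` from Node's standard library buffering the incomplete bytes; `decodeNaively`, `decodeSafely`, and `splitAt` are hypothetical helpers.

```javascript
const { StringDecoder } = require('node:string_decoder');

// Decoding each chunk on its own corrupts characters split across chunks.
function decodeNaively(chunks) {
  return chunks.map((chunk) => chunk.toString('utf8')).join('');
}

// StringDecoder holds back an incomplete trailing sequence until the
// remaining bytes arrive in the next chunk.
function decodeSafely(chunks) {
  const decoder = new StringDecoder('utf8');
  let text = '';
  for (const chunk of chunks) text += decoder.write(chunk);
  return text + decoder.end();
}

// Test helper: cuts a buffer into two chunks at a byte index.
function splitAt(buffer, index) {
  return [buffer.subarray(0, index), buffer.subarray(index)];
}
```

For a tail endpoint that starts reading at an arbitrary byte offset, the first partial line would additionally need to be discarded up to the next newline, since the read may begin in the middle of a character.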
Your Node service streams video uploads from clients to cloud storage. During peak traffic, memory spikes to 8GB and the process gets OOM-killed. Your team says streams are already in use. What would you audit and what's likely wrong?
A Node.js API service at a retail company shows steady memory growth of 50MB per hour under normal load. The service is restarted nightly as a workaround. You're given one working day to diagnose and fix the leak. Walk through your exact approach.
Your Node API handles product search queries. Under load, response times are bimodal — most are 8ms but roughly 5% are 800ms. APM shows no slow DB queries. What event loop phenomena could explain this and how do you confirm?
An ad-tech platform's Node service generates 50,000 personalized HTML email templates per hour using a Handlebars template with 200 partials. CPU pegs at 95% and throughput falls to 30,000/hour. What do you investigate and optimize?
After adding `"type": "module"` to package.json in your Node service, your Jest tests all fail with `SyntaxError: Cannot use import statement outside a module`. Your colleague says to just add `transform: {}` to the Jest config. What's actually going on and what's the right fix?
You're publishing a utility library used by both legacy CJS Node services and modern ESM applications. The library uses named exports and has no default export. How do you set up the package.json exports field so both consumers work correctly without duplication?
Your Node service conditionally requires a module inside a function, and you notice it adds ~200ms latency to the first request that triggers that code path. In production this causes timeout errors. What's happening and how do you fix it?
Your Node API issues short-lived JWTs (15 min) with refresh tokens stored in a Redis cache. After a security incident, you need to immediately invalidate all active sessions for a specific user without waiting for token expiry. Walk through your implementation.
A mobile app team asks why their users are getting logged out every 15 minutes even though they're actively using the app. You trace it to your JWT middleware. Explain the likely bug and what the correct sliding session design looks like.
You're building a multi-tenant SaaS on Node where each tenant has their own JWT signing key. A new requirement asks that rotating a tenant's key should immediately invalidate all existing sessions for that tenant. How do you design the key management and validation flow?
A Node service runs a MongoDB aggregation pipeline that was averaging 40ms but now takes 3–8 seconds after the collection grew from 500K to 10M documents. The collection has the same indexes it always had. Walk through your diagnosis and remediation.
Your Node service uses Sequelize with a PostgreSQL connection pool of size 10. Under moderate load you see 'Error: connection pool exhausted' errors. Increasing pool size to 100 makes the errors go away, but your Postgres instance CPU jumps to 90%. What's actually happening?
You're designing a Node.js service for an edtech platform that tracks student progress events at 50K events/minute. The product team wants both real-time dashboards and historical analytics on the same data. How do you design the data storage and access pattern?
Your Node API serves a product catalog with 100K SKUs. You implement Redis caching with a 5-minute TTL. At 8am, when the marketing team runs a flash sale and updates 10K products simultaneously, your service crashes due to a thundering herd. What happened and how do you fix it?
You use Redis to cache user permission sets as JSON blobs keyed by userId. A security fix requires revoking a specific permission from 50,000 users immediately. You can't iterate the entire user table in production. How do you handle this efficiently?
Your Node microservices use Redis Cluster for distributed caching. You notice that one Redis shard handles 80% of the traffic while others are nearly idle. Your ops team calls it a 'hot key' problem. Walk through diagnosis and the design fix.
A trading platform uses Socket.IO on Node to push price updates to 50,000 concurrent browser clients. After deploying a second Node instance behind a load balancer, clients report missing updates and duplicate connection events. Diagnose and redesign.
You're building a collaborative document editor where multiple users can edit the same document simultaneously. You're using native WebSockets in Node without a framework. A user reports that after a network dropout, their local changes are lost when they reconnect. How do you handle reconnection and state sync?
Your gaming platform's Node WebSocket server handles 20,000 concurrent game sessions. Each session has a 60-second heartbeat to detect dead connections. Under GC pressure, heartbeat timeouts fire incorrectly, disconnecting active users. Explain the failure mode and your fix.
Your Jest test suite takes 14 minutes to run in CI. It has 800 tests across 60 test files. The majority of the time is spent in integration tests that spin up an actual database. How do you reduce this to under 3 minutes without deleting tests?
A team refactored a Node service's database layer from direct pg queries to an ORM, and test coverage dropped from 87% to 62% because many existing unit tests mocked pg directly and those mocks no longer work. How do you restructure the test strategy?
Your Node service has a function that sends an email via SendGrid when an order ships. In your Jest test, you want to verify the email is sent with the correct recipient and subject without actually calling SendGrid. How do you set this up and what mistakes do you avoid?
Your Node microservice calls three downstream services. When Service C fails, you want to: return a partial response using data from A and B, log the C failure with full context, and not expose internal error messages to the client. Design the error handling pattern.
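A minimal sketch of the partial-response pattern using `Promise.allSettled`. The `gatherPartial` name, the `{ data, unavailable }` response shape, and the logger call signature are assumptions for illustration, not a prescribed contract.

```javascript
// `sources` maps a name to a function that returns a promise.
async function gatherPartial(sources, logger = console) {
  const names = Object.keys(sources);
  const settled = await Promise.allSettled(
    // Promise.resolve().then(...) also captures synchronous throws.
    names.map((name) => Promise.resolve().then(() => sources[name]()))
  );

  const data = {};
  const unavailable = [];
  settled.forEach((result, index) => {
    const name = names[index];
    if (result.status === 'fulfilled') {
      data[name] = result.value;
    } else {
      // Full error context goes to the log; the client only learns
      // which section is missing, never the internal message.
      unavailable.push(name);
      logger.error({ source: name, err: result.reason }, 'downstream call failed');
    }
  });
  return { data, unavailable };
}
```

A handler could call `gatherPartial({ a: callA, b: callB, c: callC })` and return the result as is, letting the client render what is available and flag the missing section.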
Developers on your Node team are using async/await everywhere but you still see unhandled promise rejection warnings in production logs. You've verified all async functions are wrapped in try/catch. What other sources of unhandled rejections exist and how do you plug them?
A Node service integrates with a third-party billing API that intermittently returns 503 errors. Your current implementation immediately returns a 500 to the client, causing the frontend to show an error and the user to retry manually. Design a resilient integration pattern.
Your team has five Node microservices communicating via direct HTTP REST calls. During a deployment of Service B, a cascade failure takes down Service A and C, which both depend on B. The postmortem asks you to redesign inter-service communication to prevent this. What do you propose?
You're the tech lead for a Node microservices platform. The team wants to introduce a distributed transaction spanning three services: Order, Inventory, and Payment. They propose 2PC. What do you recommend instead and why?
Your Node API gateway routes requests to five downstream microservices. A new compliance requirement says every inbound request must be logged with a unique trace ID that propagates through all downstream calls. How do you implement this without modifying all five services?
At 3am, your Node API's P95 response time spikes from 80ms to 12 seconds. CPU is normal, memory is normal, DB query times are normal. What's your systematic diagnosis process and what do you look for first?
A Node service deployed on Kubernetes is restarting every 6–8 hours. Logs show no errors before the crash. The only clue is exit code 137. What happened and how do you investigate and fix it?
Your Node service at a media company handles video metadata ingestion. A specific video ID causes the service to hang indefinitely — no response, no error, no CPU usage. Other requests continue normally. Describe how you diagnose an individual request hanging without affecting traffic.
A backend developer asks why setTimeout(fn, 0) and setImmediate(fn) behave differently, and which to use when they want to yield to I/O callbacks between loop iterations. Give a concrete example using file reading.
Your Express app has a middleware that reads a JWT from the Authorization header and attaches the decoded user to req.user. A new endpoint needs to support both JWT auth and API key auth. How do you refactor the middleware to handle both without duplicating route code?
You need to implement a Node.js endpoint that accepts a gzip-compressed JSON body, decompresses it, parses the JSON, and validates the schema — all without loading the full body into memory first. How do you wire this together?
Your Node API renders server-side HTML reports using a template engine. Each report call allocates 50MB of JavaScript objects for the data transformation, releases them after the response, and you observe this causes frequent GC pauses. What are your options for reducing allocation pressure?
Your Node monorepo has internal packages that mix CommonJS and ESM. A new package needs to import from both types. During the build, you hit ERR_REQUIRE_ESM. Walk through why this happens and the options for resolving it sustainably.
A colleague proposes storing JWT tokens in localStorage for your React + Node web app. You disagree. What exactly is the security risk, what's the alternative, and what are the trade-offs of each?
Your Node service has an endpoint that returns a user's full activity history from Postgres. As accounts age, their history grows to 100,000+ rows and the query is timing out. The current query is `SELECT * FROM activities WHERE userId = $1 ORDER BY createdAt DESC`. What do you change?
Your team uses Redis as a rate limiter for an API, implementing it with a simple INCR key and EXPIRE. A load test reveals the rate limiter can be bypassed under race conditions. Explain the race condition and the correct Redis implementation.
You're building a live-score feature for a sports app in Node. Scores update every 30 seconds. You're choosing between WebSockets and Server-Sent Events (SSE). The app only pushes data from server to client. Which do you recommend and what are the implementation considerations?
You're testing a Node service that uses AsyncLocalStorage to propagate a request context (including userId and correlationId) through async operations. Some tests fail intermittently when tests run in parallel — they pick up the wrong context. How do you structure the tests correctly?
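One arrangement that tends to avoid cross-test leakage is for each test to establish its own context with `run()` rather than sharing a store set up in a common hook. A minimal sketch, where `requestContext`, `withContext`, `currentUserId`, and `handle` are hypothetical names:

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

const requestContext = new AsyncLocalStorage();

// Runs `fn` with `context` visible to everything it awaits.
function withContext(context, fn) {
  return requestContext.run(context, fn);
}

function currentUserId() {
  const store = requestContext.getStore();
  return store ? store.userId : undefined;
}

// Stand-in for service code that reads the context after an async hop.
async function handle(delayMs) {
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  return currentUserId();
}
```

Two overlapping calls such as `withContext({ userId: 'a' }, () => handle(20))` and `withContext({ userId: 'b' }, () => handle(5))` each see their own `userId`, even though the second finishes first.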
Your Node API returns different error shapes from different routes — some return {error: 'message'}, others return {message: '...'}, some return HTML from Express's default error handler. The frontend team has to handle all three cases. How do you standardize this?
Your team builds a Node service that must be resilient to its PostgreSQL dependency being temporarily down (rolling restarts, failover). During DB downtime, the service should continue serving cached responses for read endpoints and queue write operations for later. How do you design this?
A Node service is making redundant database calls — the same query with the same parameters is being executed multiple times per request. You can see this in your DB slow-query log. The code uses async/await and looks correct. What are common causes of this and how do you find the specific one?
Your fintech startup's Node.js gateway processes 4,000 payment callbacks per second. Engineers report that heavy JSON parsing inside setImmediate handlers is causing 200ms jitter in downstream latency. How do you diagnose and fix this without a full rewrite?
A Node.js microservice at an e-commerce company resolves three independent database queries sequentially inside an async function, adding roughly 90ms of unnecessary latency. The team lead asks you to fix it without changing the database tier. What is your approach?
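The difference between the two shapes can be sketched with timers standing in for the queries. `fetchUser`, `fetchOrders`, and `fetchInventory` are hypothetical stand-ins of roughly 30ms each, not real database calls.

```javascript
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Hypothetical independent queries, about 30ms each.
const fetchUser = () => delay(30, { id: 1 });
const fetchOrders = () => delay(30, [101, 102]);
const fetchInventory = () => delay(30, { sku: 5 });

// Each await waits for the previous query to finish: roughly 90ms total.
async function loadSequentially() {
  const user = await fetchUser();
  const orders = await fetchOrders();
  const inventory = await fetchInventory();
  return { user, orders, inventory };
}

// All three queries start before anything is awaited: roughly 30ms total.
async function loadConcurrently() {
  const [user, orders, inventory] = await Promise.all([
    fetchUser(),
    fetchOrders(),
    fetchInventory(),
  ]);
  return { user, orders, inventory };
}

// Measures wall-clock time of an async function in milliseconds.
async function timed(fn) {
  const start = process.hrtime.bigint();
  const result = await fn();
  return { result, ms: Number(process.hrtime.bigint() - start) / 1e6 };
}
```

`Promise.all` rejects as soon as any query fails; `Promise.allSettled` is the alternative when partial results are acceptable. Running the queries concurrently also uses up to three pool connections per request, which is worth checking against the pool size.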
Your NestJS monolith at a SaaS company has grown to 80 modules with circular dependency warnings causing occasional undefined-provider errors at startup. The DI container sometimes resolves in a different order across deployments. How do you systematically eliminate the problem?
You are building a REST API with Express. A product manager requests that all API responses include a request-id header for tracing and a consistent JSON envelope with status, data, and error fields. How do you implement this cleanly without touching every route handler?
A payments API you maintain returns full card-object representations in every webhook payload, including fields the receiving merchants never use. Payload sizes have grown to 8KB average, causing downstream timeout spikes on mobile networks. How do you redesign the API without breaking existing integrations?
You are designing an API endpoint for a job-board application that lets employers search job applicants by skills, location, and experience range. The search parameters can be complex and the URL can become very long. How do you handle this in a RESTful way?
A data pipeline at a logistics company reads 2GB CSV files from S3, transforms each row, and writes results to Postgres. The current implementation loads the entire file into memory, causing OOM crashes on the production container with a 512MB limit. Redesign it using Node.js streams.
You need to build an HTTP endpoint in Node.js that accepts a file upload, compresses it with gzip on the fly, and stores the compressed bytes in S3 without buffering the entire file in memory. Walk through the implementation.
A Node.js reporting service at a media company generates PDF reports by aggregating data from five microservices. Memory grows unboundedly over 6 hours, triggering OOM kills. Heap snapshots show thousands of retained EventEmitter instances. How do you find and fix the leak?
Your Node.js API server handles 500 requests per second but CPU is pinned at 95% even though most requests are I/O-bound database lookups. A flame chart shows significant time spent in JSON.stringify. What optimizations do you apply?
Your Node.js library published on npm uses CommonJS. A major consumer wants to import it as an ES module in their Deno- and browser-compatible project. They report that require() stubs in ESM break tree-shaking in their bundler. How do you publish a dual CJS/ESM package without breaking existing consumers?
A colleague runs a Node.js script and gets 'Cannot use import statement outside a module' even though the file has a .js extension. They are confused because another file with identical syntax works fine. What are the likely causes and how do you resolve each?
A Node.js API uses long-lived JWTs with a 7-day expiry. The security team mandates token revocation within 60 seconds of a logout event for enterprise accounts. The architecture must support 50,000 concurrent enterprise users. How do you implement this without rebuilding authentication from scratch?
You are reviewing a Node.js Express application and find that the JWT secret is read from process.env.JWT_SECRET but there is no validation that the env var is set, and the default falls back to the string 'secret'. Explain the risk and fix it properly.
A Node.js application uses Mongoose to query a MongoDB collection of 10 million user documents. Queries on the email field take 800ms in production despite an index existing. A MongoDB Atlas performance advisor shows the index is not being used on some query patterns. How do you diagnose and fix this?
You are building a Node.js service that writes user activity events to Postgres. The table receives 2,000 inserts per second. After two weeks, write latency has increased from 2ms to 40ms. The DBA says the table has 500 million rows and the B-tree index on user_id is bloated. What do you do?
Your Node.js API caches user profile objects in Redis with a 5-minute TTL. After a successful marketing campaign, you experience a cache stampede: 50,000 concurrent requests hit the same expired key simultaneously, overloading your Postgres instance. How do you prevent this?
A Node.js service caches API responses in Redis using the full URL as the cache key. After deploying a fix that changes a response field, users report seeing stale data for up to 10 minutes. How do you implement a proper cache invalidation strategy?
You are architecting a real-time collaborative document editor on Node.js with Socket.IO. The system must support 10,000 concurrent connections across four horizontally-scaled Node.js instances. A user's changes must be broadcast to all other collaborators on the same document regardless of which server they are connected to. Design the pub/sub layer.
A Node.js chat application uses raw WebSocket connections. Users report that they do not receive messages sent while they were briefly disconnected due to a network blip. How do you implement a reliable message delivery mechanism with at-least-once guarantees?
Your Node.js service's integration test suite takes 22 minutes to run in CI because each test file spins up a real Postgres container, runs migrations, and tears it down. The team wants sub-5-minute CI. How do you restructure the test architecture?
A colleague writes a Jest test for a Node.js function that calls an external payment API. The test passes locally but fails intermittently in CI with network timeout errors. How do you fix the test to be reliable and fast without hitting the real API?
Your Express API at a healthcare company has inconsistent error responses — some routes return plain text, some return HTML error pages, some return JSON. A security audit flags that stack traces are leaking to clients. Design a unified error handling architecture.
A junior developer on your team uses try/catch around an async function but the catch block only logs the error and does not rethrow or send a response. The Express route appears to hang indefinitely for clients when the error occurs. Explain the problem and the correct pattern.
Your team runs eight Node.js microservices communicating via REST. Tracing a single user request across services requires correlating eight separate log files. The DevOps team asks for distributed tracing without requiring a full infrastructure overhaul. Design a lightweight solution.
Two of your Node.js microservices need to maintain data consistency: when Service A creates an order, Service B must create a shipping record. Service B is sometimes down for 2-3 minutes during deploys. How do you guarantee at-least-once processing without a distributed transaction?
Your team is splitting a Node.js monolith into microservices. The notifications module currently shares an in-process event emitter with the orders module. After extraction, how do you replace the in-process event bus with an async inter-service communication pattern?
Your Node.js API suddenly starts returning 502 errors on 15% of requests in production. No deployment happened in the last 4 hours. Logs show ECONNRESET errors from outgoing calls to a downstream service. CPU and memory metrics look normal. Walk through your incident investigation process.
A Node.js worker service running in a Kubernetes pod crashes with 'JavaScript heap out of memory' every 6 hours. The pod is restarted by Kubernetes automatically, but the team wants to understand the root cause. What is your debugging approach?
A high-frequency trading firm's Node.js order router uses setInterval every 50ms to check for new orders. Under load, the interval fires at 80-120ms instead of 50ms, causing missed order windows. Explain why this happens and how you redesign the timing mechanism.
Your NestJS application serves both a public REST API and an internal admin API. Security requires the admin endpoints to be accessible only from a specific internal CIDR range and to require a different authentication scheme. How do you implement this without duplicating route handlers?
Your public Node.js API needs to support versioning as the product evolves. You have consumers on v1 who cannot migrate quickly, and you need to ship breaking changes in v2. Design a versioning strategy that minimizes operational burden.
A Node.js service processes video subtitle files uploaded by users. Some files contain mixed encodings (UTF-8, Windows-1252, ISO-8859-1), causing garbled text when stored. How do you detect and normalize encodings in a streaming pipeline?
Your Node.js API has a critical path that instantiates a new Ajv validator and compiles a JSON schema on every incoming request. Under load testing at 3,000 RPS, you see CPU at 80% and schema compilation appears in the flame chart. How do you fix this?
Your Node.js application uses a CommonJS module that relies on __dirname for path resolution. After migrating the file to ESM (.mjs), __dirname is undefined and the application crashes at startup. How do you correctly replicate __dirname and __filename in ESM modules?
Your Node.js API must support multi-tenant authentication where each tenant has its own JWKS endpoint for RS256 tokens. Tenants rotate keys on a 24-hour cycle. Requests arrive at 2,000 RPS with tenant IDs in the JWT header. How do you implement efficient key resolution?
A Node.js service uses Sequelize ORM with Postgres. A code review reveals that a function loads 200 parent records with one query and then, inside a loop over those parents, fetches each parent's 50 children one query at a time, resulting in 10,001 queries per request (1 + 200 × 50). Fix the N+1 problem without removing Sequelize.
A Node.js job queue stores tasks in a Postgres table. Multiple worker instances pick up tasks by querying for status='pending' and updating to status='processing'. Under load, two workers occasionally pick up the same task. How do you prevent duplicate processing?
Your Node.js API uses Redis to cache product listings. During a flash sale, Redis goes down for 90 seconds. The database receives 100x normal query volume and falls over too. How do you make your caching layer resilient to Redis outages without losing the performance benefit during normal operation?
A Node.js API uses Redis INCR to implement rate limiting: 100 requests per minute per IP. A tester reports that they can burst 200 requests if they time the window boundary correctly. Explain the vulnerability and implement a correct sliding window rate limiter.
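A possible starting point for the sliding-window half of this question: the sketch below keeps a per-key log of accepted request timestamps and counts only those inside the trailing window. The class name `SlidingWindowLimiter`, the in-process `Map`, and the injectable `now` argument are assumptions made for illustration. A multi-instance deployment would need the same logic in a shared store such as Redis, which this sketch does not show.

```javascript
// Sliding-window log limiter: a request is allowed only if fewer than
// `limit` requests were accepted in the trailing `windowMs` milliseconds.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = new Map(); // key -> ascending array of accepted timestamps
  }

  // `now` is injectable so the behaviour can be tested deterministically.
  allow(key, now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have aged out of the trailing window.
    const log = (this.hits.get(key) || []).filter((t) => t > cutoff);
    const allowed = log.length < this.limit;
    if (allowed) log.push(now);
    if (log.length > 0) this.hits.set(key, log);
    else this.hits.delete(key);
    return allowed;
  }
}
```

With a fixed one-minute counter, 100 requests at the end of one minute and 100 more at the start of the next would all pass. Here the second burst is rejected until the first has aged out of the window. The trade-off is memory: the log holds up to `limit` timestamps per key.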
A fintech dashboard uses Server-Sent Events to push real-time portfolio updates to 20,000 concurrent browser clients from a Node.js server. After 2 hours, the server stops pushing events to roughly 30% of clients, though those clients still show as connected. What are the likely causes and fixes?
You need to implement a Node.js endpoint where clients subscribe to live price updates for a list of stock symbols. Different clients subscribe to different symbol sets. When a price changes, only the relevant clients should be notified. How do you design the routing logic?
Your team has 1,200 Jest unit tests with a 15-second average run time in CI. A dependency on a shared test utility that imports a heavy ORM causes 400ms of setup overhead per test file. How do you reduce total test time by 50%?
A Jest test for a Node.js function that uses setTimeout passes in isolation but fails when run alongside other tests in the same file, with timers firing in unexpected order. What is the cause and how do you fix it?
Your Node.js service uses Promise chains. A new developer adds an async function that returns a Promise but forgets to await it in the calling function. The error is swallowed silently. How do you prevent unhandled promise rejections across a large codebase?
A Node.js HTTP client in your service retries failed requests with a fixed 1-second delay for up to 5 attempts. During a downstream outage, all retrying requests pile up and exhaust the connection pool. How do you implement a correct retry strategy?
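As a rough illustration of one common alternative to a fixed delay, the sketch below uses exponential backoff with full jitter, so retries spread out rather than arriving in synchronized waves. The names `backoffDelay` and `retryWithBackoff` and the default values are assumptions, not a prescribed answer.

```javascript
// Delay for attempt n (0-based): a random value in [0, min(capMs, baseMs * 2^n)).
function backoffDelay(attempt, baseMs = 100, capMs = 10_000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `fn` until it succeeds or `maxAttempts` is reached, sleeping a
// jittered, exponentially growing delay between attempts.
async function retryWithBackoff(fn, { maxAttempts = 5, baseMs = 100, capMs = 10_000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // No sleep after the final attempt; the error is rethrown below.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt, baseMs, capMs));
      }
    }
  }
  throw lastError;
}
```

Backoff alone does not protect the connection pool during a long outage. A fuller answer would likely also cover per-request timeouts, retrying only idempotent calls, and a circuit breaker or retry budget that stops new attempts while the downstream service is failing.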
Your Node.js API gateway aggregates responses from three downstream microservices for a single endpoint. One of the three services frequently takes 3 seconds to respond, degrading the entire endpoint. The data from that slow service is optional for 80% of use cases. How do you redesign the aggregation?
Your Node.js API's response time p99 degraded from 80ms to 2,400ms starting at 14:22 UTC yesterday. No deployment occurred. CPU and memory are normal. Database query times are normal. The slowdown appears on all endpoints. What is your systematic investigation process?
A Node.js service running in Docker logs the error 'EMFILE: too many open files'. The container's ulimit for open files is 65536. The service handles 200 concurrent requests. How do you diagnose which resources are being leaked?
A Node.js script processes an array of 10,000 items by awaiting an async database call for each item in a for-of loop. It takes 50 seconds to complete. A colleague suggests using Promise.all instead. What are the trade-offs and what is the correct approach for this workload?
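A middle ground between the sequential loop and an unbounded `Promise.all` is to cap the number of calls in flight. The helper below is a minimal sketch of that idea. `mapWithConcurrency` is a hypothetical name, and it assumes `limit` is at least 1 and is chosen to fit the database pool size.

```javascript
// Runs `worker` over `items` with at most `limit` calls in flight.
// Results keep the input order. The first rejection rejects the returned
// promise; runners that are already active are not cancelled.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function runner() {
    while (next < items.length) {
      const index = next++; // safe: no await between the read and the increment
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

A call such as `mapWithConcurrency(items, 20, saveItem)` (with `saveItem` standing in for the per-item database call) keeps 20 queries in flight instead of 1 or 10,000. If the workload is plain inserts, batching rows into multi-row statements may help more than any concurrency setting.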
Your Node.js payment gateway service handles 2,000 req/s but occasionally shows latency spikes of 800ms every few minutes. Metrics show CPU stays low. How do you diagnose whether the event loop is being blocked and what tools would you reach for first?
A junior developer on your team wrapped a Promise in a new Promise constructor and calls resolve inside .then. The code works but senior reviewers flagged it. Explain the exact anti-pattern and how you would refactor it for a fintech API where silently swallowed errors are dangerous.
You're migrating a monolithic Express app serving 5 million daily users to NestJS. The Express app has 40 middleware functions wired in a fragile order-dependent chain. Describe your migration strategy to avoid production regressions and keep the team shipping features during the transition.
Your NestJS API returns a 500 with a generic message when the PostgreSQL connection pool is exhausted under load. Product wants user-facing errors to say 'Service temporarily busy' instead. Walk through exactly where and how you implement this in NestJS without touching every controller.
A mobile app team complains your REST API forces them to make 8 sequential calls to render a single product detail screen, causing 3-second load times on slow networks. The backend team resists GraphQL. What REST-native solutions do you propose and implement in Node.js?
You are building a public REST API for an e-commerce platform. A partner integration team asks why your PATCH endpoint for updating order status sometimes returns 200 and sometimes 204. Explain the correct semantics and implement a consistent contract in Express.
A data pipeline Node.js service reads 2GB CSV files from S3, transforms rows, and writes to PostgreSQL. Memory spikes to 4GB and the process OOMs. Describe how you redesign this using Node.js streams to keep memory under 100MB regardless of file size.
You need to serve dynamically generated PDF reports from a Node.js API. Some reports exceed 50MB. A colleague suggests accumulating the full PDF in a Buffer before sending. Why is this problematic and how do you stream the PDF to the HTTP response instead?
A Node.js microservice that processes webhook events shows a memory leak — RSS grows from 150MB to 1.5GB over 6 hours before the container is restarted. The leak is not obvious from code review. Describe your systematic debugging process using production-safe tooling.
Your Node.js API gateway handles 10,000 req/min but p99 latency jumped from 80ms to 600ms after a deployment. CPU is at 30%, memory is stable. A colleague says to increase the cluster worker count. Why might that not help and what would you investigate first?
Your Node.js monorepo has a shared utility package that some apps import as CommonJS and others as ES modules. Publishing a dual-format package is causing subtle bugs where singleton state is duplicated. Explain the problem mechanically and how you solve it properly.
You're converting a legacy Node.js utility library from CommonJS to ESM. A colleague points out that __dirname and __filename are not available in ES modules. How do you replicate their behavior and what other CJS globals must you account for in the migration?
Your SaaS platform issues JWTs with 15-minute expiry and refresh tokens stored in a database. A security audit flags that compromised refresh tokens can be reused indefinitely until manually revoked. Design a refresh token rotation system in Node.js that automatically detects and handles token theft.
A coworker's Node.js middleware validates JWTs by checking only the signature and trusts the alg header from the token. In a penetration test, the tester changed the alg to none and bypassed authentication entirely. Explain the vulnerability and write a secure validation approach.
A financial reporting query on PostgreSQL via Node.js runs in 4 seconds for a single tenant but takes 40 seconds when other tenants run it concurrently. The table has 50 million rows and uses row-level security. Describe your investigation and optimization strategy.
Your Node.js e-commerce API uses Mongoose. You notice that a product search endpoint performs well in development with 1,000 products but degrades to 8 seconds in production with 500,000 products. The query filters by category, price range, and sorts by rating. Walk through your optimization steps.
Your Node.js API serves personalized recommendation feeds. Redis caches each user's feed for 5 minutes. During a flash sale, 200,000 users hit the endpoint simultaneously as caches expire at the same time. Design a solution that prevents a cache stampede in Node.js.
You're adding Redis caching to a Node.js REST API that lists product categories. A code review shows cache invalidation only happens on category deletion but not on creation or update. What bugs does this cause and how do you implement a correct invalidation strategy?
You're building a collaborative document editor in Node.js with Socket.IO. During load testing, you find that when one server instance broadcasts an edit, clients connected to other instances don't receive it. The app runs on 4 pods in Kubernetes. How do you architect the broadcasting layer?
A Node.js chat application uses the ws library. Users report messages occasionally go missing. You discover that the server sends messages without checking if the socket is still open. Implement a robust send helper that handles this edge case correctly.
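One possible shape for such a helper, offered as a sketch rather than a definitive implementation: check `readyState` before sending, use the completion callback that `ws` accepts on `send`, and report the outcome instead of throwing. The name `safeSend` and the choice to JSON-encode non-string payloads are assumptions.

```javascript
const OPEN = 1; // numeric value of WebSocket.OPEN in both `ws` and browsers

// Resolves true if the message was handed to the socket without error,
// false if it was skipped or the send failed. It never rejects, so one
// bad socket cannot break a broadcast loop.
function safeSend(socket, payload) {
  return new Promise((resolve) => {
    if (!socket || socket.readyState !== OPEN) {
      resolve(false);
      return;
    }
    try {
      const data = typeof payload === 'string' ? payload : JSON.stringify(payload);
      // `ws` invokes the callback with an error if the write fails.
      socket.send(data, (err) => resolve(!err));
    } catch (err) {
      // JSON.stringify or send() may throw synchronously.
      resolve(false);
    }
  });
}
```

A `false` result is the caller's cue to queue the message for redelivery or drop the connection. The helper only reports local send failures; detecting messages lost in transit would still need application-level acknowledgements.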
Your Node.js service's test suite takes 18 minutes to run in CI and the team is pushing fewer commits to avoid waiting. Describe a comprehensive strategy to bring test runtime under 3 minutes without removing coverage.
You write a Jest test for a Node.js function that calls setTimeout internally. The test passes in isolation but flakes intermittently in CI because the timer fires at unexpected times. How do you make this test deterministic without increasing test timeout?
A Node.js API running in production crashes once a day with UnhandledPromiseRejectionWarning. The crash happens in a background job that processes queue messages. The error is intermittent and the stack trace doesn't point to a specific line. Describe your complete debugging and hardening plan.
Your Express API propagates raw database error messages to the client, leaking table names and SQL syntax in 500 responses. Describe how you implement a centralized error handler that sanitizes errors without losing diagnostic information internally.
Your Node.js microservices architecture has a checkout service that calls inventory, pricing, and shipping services synchronously via HTTP. During peak load, a slow shipping service causes the entire checkout to hang for 30 seconds. Redesign this for resilience without requiring a full rewrite.
Two Node.js microservices share a PostgreSQL database to reduce complexity. The teams deploy independently and a schema migration by team A broke team B's service. How do you architect the data layer to allow independent deployments?
A Node.js API in production is returning incorrect data for 0.3% of requests. There are no error logs. Users report seeing another user's data intermittently. Describe your systematic investigation of this data isolation bug.
Your Node.js service's health check endpoint returns 200 but downstream services report intermittent connection refused errors. The process appears healthy from outside. How do you diagnose whether the problem is in the Node.js service or the network layer?
You're profiling a Node.js image-processing service that resizes user-uploaded photos. Even with async I/O, the service handles only 3 concurrent resize operations before latency degrades. The resize uses a pure-JS library. What is the fundamental problem and how do you resolve it architecturally?
Your NestJS application handles file uploads and the team discovered that malicious users upload files with double extensions like shell.php.jpg to bypass MIME type validation. The files end up executed on the server. Describe a defense-in-depth upload validation strategy.
You are designing a REST API for a multi-tenant SaaS. An enterprise customer requires their data never mingles with other tenants' data in API responses, even in edge cases like pagination bugs. Describe the full tenant isolation strategy from routing to database query.
A Node.js log aggregation service receives 50MB/s of log data via TCP. After 2 hours, the process has consumed 8GB of memory. You suspect backpressure is not being respected. Explain how Node.js stream backpressure works and how you diagnose and fix the violation.
A financial institution's Node.js service processes 500 trade events per second. GC pauses are causing 50ms latency spikes every 10 seconds, breaching SLA. The heap is 800MB. How do you reduce GC pressure without reducing throughput?
Your Node.js API issues JWTs and the signing secret is copied to 12 microservices. A secret rotation is required but changing the secret immediately invalidates all active user sessions. Design a zero-downtime secret rotation procedure.
Your Node.js order service using PostgreSQL needs to debit a user's balance and create an order record atomically. The current implementation makes two separate database calls and occasionally creates orders without debiting when the second call fails. Fix this with proper transaction handling.
Your Node.js API caches user permission lists in Redis. A security incident reveals a suspended user could still access protected resources for up to 10 minutes because cached permissions weren't invalidated. Design a real-time cache invalidation system.
A trading platform's Node.js price feed pushes updates to 50,000 concurrent WebSocket clients. Memory grows proportionally with connected clients. Profiling shows 8KB of per-connection state. How do you reduce memory per connection to support 200,000 clients on the same hardware?
Your team's Node.js services have 85% line coverage but still have production bugs that tests don't catch. Post-mortem analysis reveals bugs involve incorrect behavior at the boundaries between mocked dependencies and real ones. How do you redesign the test strategy?
A Node.js service processing financial transactions intermittently loses transactions silently with no error logged. Investigation reveals some async code paths exit without resolving or rejecting their promises. How do you detect and fix these promise black holes systematically?
Your e-commerce platform uses 15 Node.js microservices. A new feature requires a distributed transaction across order, inventory, and payment services. A senior engineer recommends the Saga pattern. Explain choreography versus orchestration sagas and why you would choose one over the other here.
A new Node.js microservice needs to consume events from a Kafka topic. The team proposes polling the topic in a setInterval every 100ms. Why is this approach problematic and how should you implement Kafka consumption in Node.js correctly?