HikeCatalyst

Solutions Architect Interview Questions

200 scenario-based questions with detailed model answers, organized by skill and tool. Filter by topic, level or keyword, reveal the answer, then pressure-test yourself in a real mock.

Q001 | System Design | Mid

A healthcare startup needs to build a patient portal where doctors upload lab reports (PDFs, up to 50 MB each) and patients view them. They expect 10,000 active patients at launch, growing to 500,000 within 18 months. Design the file storage and delivery architecture.

Q002 | Cloud Architecture | Senior

Your fintech client runs everything in a single AWS region (us-east-1). A 4-hour us-east-1 outage cost them $2.4M in SLA penalties. The CTO wants active-active multi-region within 90 days. Their stack is Aurora PostgreSQL, EKS, SQS, and a monolithic Node.js API. What is your architecture and sequencing plan?

Q003 | Cloud Architecture | Mid

A retail company's AWS bill jumped 40% in one month. The engineering team blames a new ML training pipeline. You're asked to audit the spend and present a remediation plan in 48 hours. How do you approach this and what are the most common culprits?

Q004 | Integration Patterns | Senior

A logistics company integrates with 12 third-party carrier APIs, each with different authentication schemes, rate limits, and payload formats. One carrier's API goes down for hours weekly, causing their order management system to stall. Design an integration layer that isolates failures and normalizes data.

Q005 | Integration Patterns | Mid

A mid-sized SaaS company wants to send user activity events to three downstream systems: a data warehouse, a customer success tool, and an in-app notification engine. They currently call all three synchronously from their API server. A slow notification service is adding 800ms to every API response. Redesign this.

Q006 | Microservices vs Monolith | Senior

A Series B startup with 18 engineers has a well-tested Django monolith serving 2M users. A VP of Engineering wants to migrate to microservices citing 'scalability.' Performance metrics show the monolith handles current load fine with p99 under 200ms. How do you advise them?

Q007 | Microservices vs Monolith | Mid

You join a company that has decomposed their platform into 40 microservices, but a single user login flow requires synchronous calls through 7 services. Average login latency is 3 seconds and engineers are afraid to change anything. What has gone wrong and how do you stabilize it?

Q008 | Scalability & Performance | Senior

A media streaming company's recommendation API handles 80,000 requests/second during primetime. Latency spikes to 4 seconds at 7 PM every day, correlating with a recommendation model recomputation job that runs on the same database cluster. Design a solution.

Q009 | Scalability & Performance | Mid

A SaaS application's API response times are fine for 95% of requests but the p99 latency is 8 seconds, causing SLA breaches for enterprise customers. The team has added indexes and the database looks healthy. What is your diagnostic approach?

Q010 | Security Architecture | Senior

A financial services company stores customer PII and transaction data in S3. A security audit finds that 3 S3 buckets were publicly accessible for 11 months. You are brought in to assess the blast radius, remediate, and prevent recurrence. Walk through your response.

Q011 | Security Architecture | Mid

A startup is building a multi-tenant B2B SaaS app where each customer's data must be strictly isolated. They're using a single PostgreSQL database with a `tenant_id` column on every table. An engineer notices a bug where a missing WHERE clause leaked data across tenants. Design an architecture that makes cross-tenant leaks structurally impossible.

Q012 | Data Architecture | Senior

A global retail chain wants to consolidate data from 800 stores (each running a local MySQL instance), 3 ERP systems, and an e-commerce platform into a single analytics platform for real-time inventory and demand forecasting. Data volumes are 200 GB/day. Design the ingestion and serving architecture.

Q013 | Data Architecture | Mid

A product analytics team is complaining that their dashboards take 45 minutes to refresh because they run directly against the production PostgreSQL database. The DB team says adding more queries will destabilize production. How do you solve this without a major re-architecture?

Q014 | Cost & Tradeoffs | Senior

Your team is designing a real-time fraud detection service that must evaluate every transaction within 150ms. The ML team proposes deploying a large transformer model (8B parameters) on GPU instances for best accuracy. The business analyst estimates 2,000 transactions/second at peak. Walk through the cost-accuracy tradeoff analysis.

Q015 | Cost & Tradeoffs | Mid

A startup CTO asks you to choose between a managed Kubernetes service (EKS) and AWS Lambda for their new event processing backend that handles variable workloads between 10 and 50,000 events per hour. What is your recommendation and how do you justify it?

Q016 | Migration Strategy | Senior

A 15-year-old insurance company runs their core policy management system on an on-premises Oracle database with 800 stored procedures, 200 GB of data, and 40 integrating systems. They want to migrate to AWS within 12 months. Design the migration strategy.

Q017 | Migration Strategy | Mid

A company wants to migrate from a self-hosted Jenkins CI/CD pipeline to GitHub Actions. They have 150 pipelines, some with custom Jenkins plugins that have no GitHub Actions equivalent. How do you approach the migration without disrupting ongoing development?

Q018 | API Design | Senior

A platform team is building a public API that will be consumed by 500+ enterprise customers. Some customers are on v1 and cannot upgrade for 18 months due to compliance freezes. New customers need features that require breaking v1 contract changes. How do you design the versioning and lifecycle strategy?

Q019 | API Design | Mid

Your REST API returns a full customer object (45 fields) on every response, even when clients only need 3 fields. Mobile clients are complaining about slow load times on 4G connections. You cannot rewrite the API in the next quarter. What quick wins and longer-term fixes do you implement?

Q020 | Event-Driven Architecture | Senior

A ride-hailing platform processes driver location updates at 50,000 events/second. These feed into three consumers: a real-time map display, a surge pricing engine, and a trip ETA calculator. The team currently uses a single Kafka topic and all three consumers share one consumer group, causing processing delays. Diagnose and fix.

Q021 | Event-Driven Architecture | Mid

A payment processing system uses an event-driven architecture. A bug caused 15,000 payment.processed events to be published twice, resulting in duplicate downstream effects — double entries in accounting, double loyalty points. How do you fix the immediate problem and prevent recurrence?

Q022 | High Availability & DR | Senior

A hospital's electronic health records system has an RTO of 15 minutes and an RPO of 1 minute for their patient data. They currently do nightly database backups to tape. A ransomware attack hits their primary datacenter at 2 AM. Walk through the incident response and the architectural changes needed post-incident.

Q023 | High Availability & DR | Mid

An e-commerce platform runs on a single EC2 instance with an RDS database. The EC2 instance failed on Cyber Monday, causing 3 hours of downtime. The CTO asks you to redesign for 99.9% availability by the next major sale in 6 weeks. What do you build and in what order?

Q024 | Technology Selection | Senior

A fintech startup needs to choose between Apache Kafka and AWS SQS/SNS for their new transaction event streaming platform. They expect 5,000 events/second, need event replay for 30 days, have a 4-person backend team, and are fully on AWS. Make the recommendation with explicit tradeoff analysis.

Q025 | Technology Selection | Mid

A team is debating whether to use Redis or a relational database for storing user session data. Sessions are 4KB on average, 500,000 concurrent users, 30-minute expiry. The application is on AWS. What do you recommend and why?
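
A quick sizing check helps frame this choice. The sketch below multiplies the session size and concurrency given in the question; the 1.5x overhead factor is an assumption for per-key and allocator overhead, not a measured figure.

```python
# Back-of-envelope memory estimate for the session store described above.
SESSION_BYTES = 4 * 1024      # 4 KB average session (from the question)
CONCURRENT_USERS = 500_000    # concurrent sessions (from the question)
OVERHEAD_FACTOR = 1.5         # assumption: per-key and allocator overhead


def session_memory_gib(session_bytes, users, overhead=1.0):
    """Return the estimated memory footprint in GiB."""
    return session_bytes * users * overhead / 1024**3


raw = session_memory_gib(SESSION_BYTES, CONCURRENT_USERS)
padded = session_memory_gib(SESSION_BYTES, CONCURRENT_USERS, OVERHEAD_FACTOR)
print(f"raw payload: {raw:.2f} GiB, with overhead: {padded:.2f} GiB")
```

Under these assumptions the whole working set is only a few GiB, so capacity is unlikely to be the deciding factor.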

Q026 | Stakeholder & Pre-sales | Senior

You are in a pre-sales meeting with a CTO who says 'We don't need a cloud architect, we already have one.' You discover they have a $1.8M/year AWS bill with 60% utilization on reserved instances they can no longer use because their stack changed. How do you reframe the conversation?

Q027 | Stakeholder & Pre-sales | Mid

A prospective customer's engineering director tells you they prefer to keep everything on-premises due to data sovereignty concerns. Their current datacenter is 8 years old, and their colocation contract expires in 14 months. How do you handle this objection and shape the conversation?

Q028 | System Design | Senior

A food delivery platform needs to match orders to nearby drivers in real time. At peak they have 200,000 active drivers and 50,000 open orders across a major metro area. Current approach uses a Postgres table with lat/lon columns and a cron job every 30 seconds. Orders are going unmatched for 4+ minutes. Redesign the matching system.

Q029 | Cloud Architecture | Mid

A company has just acquired a smaller startup that runs entirely on GCP while the acquiring company is AWS-native. Integration is needed within 6 months — customer data must flow bidirectionally. What are your options and how do you choose?

Q030 | Integration Patterns | Senior

A bank needs to integrate a modern cloud-native payments service with a 30-year-old core banking COBOL system that communicates via fixed-width flat files over SFTP, processed nightly. Business wants real-time payments. Design the integration layer without replacing the core banking system.

Q031 | Scalability & Performance | Senior

A social platform's notification service sends 10 million push notifications daily. During major live events, this spikes to 5 million within 10 minutes. Notifications are getting delayed by 20–40 minutes during spikes, and the FCM/APNS delivery rate has dropped to 60%. Diagnose and fix.

Q032 | Security Architecture | Senior

A SaaS company's penetration test reveals that their internal microservices communicate over HTTP without authentication, and any service can call any other service. An attacker who compromises one service can pivot to the payment service. Design a zero-trust service mesh architecture.

Q033 | Data Architecture | Senior

A streaming analytics company ingests 2TB of clickstream data per day. They need to support both real-time dashboards (< 5-second latency) and historical batch reports (querying 18 months of data). They're currently running everything through a single Spark cluster, causing both use cases to degrade each other. Design a Lambda or Kappa architecture.

Q034 | Cost & Tradeoffs | Mid

A team is considering buying a $200,000/year enterprise Elasticsearch license vs. building and operating their own open-source OpenSearch cluster on AWS. They have one senior DevOps engineer. How do you structure the build-vs-buy analysis?

Q035 | Migration Strategy | Senior

A large retailer runs a monolithic e-commerce application on 400 bare-metal servers in two datacenters. They want to migrate to AWS. The CIO mandates zero downtime and a 24-month timeline. Their Black Friday traffic peaks are 10x normal load. How do you structure the migration program?

Q036 | API Design | Senior

You're designing a webhook system for a developer platform. Customers will configure endpoints to receive events. How do you design for reliability, security, and developer experience when a customer's endpoint is unreliable or slow?

Q037 | Event-Driven Architecture | Senior

A supply chain platform needs to coordinate a multi-step order fulfillment process across 5 independent microservices: inventory reservation, payment processing, warehouse picking, shipping label generation, and customer notification. Any step can fail and must be compensated. Design a distributed transaction pattern.

Q038 | High Availability & DR | Senior

A telecommunications company has an SLA of 99.999% availability (five nines) for their network management platform. They suffered a 12-minute outage last year due to a bad database schema migration. How do you architect both the deployment process and the system to prevent a recurrence?
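
To put the five-nines figure in context, here is a minimal sketch that converts an availability percentage into an annual downtime budget. It assumes a 365-day year and counts all downtime against the SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600; ignores leap years


def downtime_budget_minutes(availability_pct):
    """Allowed downtime per year, in minutes, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


budget = downtime_budget_minutes(99.999)
outage = 12  # minutes, from the question
print(f"annual budget: {budget:.2f} min")
print(f"a {outage}-minute outage uses {outage / budget:.1f} years of budget")
```

The budget works out to roughly 5.26 minutes per year, so the single 12-minute outage consumed more than two years of allowance.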

Q039 | Migration Strategy | Mid

A company needs to migrate from MySQL 5.7 to MySQL 8.0 in production. They have 50 databases, 10TB of data, and their application developers have written queries that rely on MySQL 5.7-specific behaviors that changed in 8.0. How do you plan and execute the upgrade?

Q040 | API Design | Mid

Your API currently uses API keys for authentication. Enterprise customers are asking for OAuth 2.0 support with SAML SSO integration for their identity provider. You have 6 weeks. What is your approach?

Q041 | Stakeholder & Pre-sales | Senior

During a technical workshop with a prospective banking client, their lead architect says 'We tried microservices 3 years ago. It was a disaster. We're never doing it again.' You believe microservices could genuinely solve their current scaling problem. How do you handle this?

Q042 | Stakeholder & Pre-sales | Mid

A product manager wants to add real-time collaborative editing (like Google Docs) to your document management SaaS within one quarter. You estimate it's a 12-month project. How do you manage this expectation and what do you propose instead?

Q043 | System Design | Mid

Design a URL shortener service that needs to handle 100 million short URL creations per month and 10 billion redirects per month. A key requirement is that redirect latency must be under 10ms globally.
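
The monthly volumes translate into per-second averages as sketched below. The 30-day month is an assumption, and real traffic will peak well above the average.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60    # assumption: 30-day month

CREATIONS_PER_MONTH = 100_000_000        # from the question
REDIRECTS_PER_MONTH = 10_000_000_000     # from the question

writes_per_s = CREATIONS_PER_MONTH / SECONDS_PER_MONTH
reads_per_s = REDIRECTS_PER_MONTH / SECONDS_PER_MONTH
read_write_ratio = REDIRECTS_PER_MONTH / CREATIONS_PER_MONTH

print(f"average writes/s: {writes_per_s:.1f}")
print(f"average reads/s:  {reads_per_s:.0f}")
print(f"read:write ratio: {read_write_ratio:.0f}:1")
```

The averages come to about 39 writes and 3,858 reads per second, a 100:1 read-heavy workload.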

Q044 | Security Architecture | Mid

Your company's CI/CD pipeline has access to production secrets (database passwords, API keys) stored as environment variables in Jenkins. Three engineers have left the company this month. What is your immediate action plan and longer-term secrets management strategy?

Q045 | High Availability & DR | Mid

A company's primary region (AWS us-east-1) is their only environment. Their SLA requires 99.9% uptime but they have no DR plan. You have an additional budget of $15,000/month. How do you design a DR solution within this constraint?

Q046 | Event-Driven Architecture | Mid

A team wants to migrate from a polling-based integration (checking a database table every 5 seconds) to an event-driven approach. The source system is a PostgreSQL database they don't own and cannot modify. How do you implement this without modifying the source system?

Q047 | System Design | Senior

An edtech platform serves 3 million students with adaptive quiz content. Each student's next question is determined by a real-time ML model based on their last 20 responses. Generating the next question takes 800ms on average, causing students to see a loading spinner between every question. Fix the latency without degrading recommendation quality.

Q048 | Data Architecture | Mid

A marketing team wants to query customer behavior data in real time via SQL, but their data is scattered across Salesforce, Stripe, Mixpanel, and a custom PostgreSQL transactional database. They can't wait for a 6-month data warehouse build. What is the fastest path to a working solution?

Q049 | Microservices vs Monolith | Senior

A company has 6 microservices. Deploying a new feature requires coordinating changes across 4 of them simultaneously, causing deployment windows to stretch to 4 hours with frequent rollbacks. The team is considering merging services back into a monolith. How do you advise?

Q050 | Cost & Tradeoffs | Senior

A startup's CTO wants to use a multi-master, globally distributed database (CockroachDB or Spanner) for a new application that serves 10,000 users in one country. You believe this is architectural overkill. How do you make the case for a simpler alternative without dismissing the CTO's technical sophistication?

Q051 | Integration Patterns | Mid

A company's order management system calls a third-party tax calculation API synchronously. The tax API has an SLA of 99.5% uptime and averages 200ms latency. When the tax API is slow or down, customer checkout fails. How do you decouple this dependency?

Q052 | Scalability & Performance | Mid

A B2B SaaS application has a 'generate monthly report' feature that takes 45 minutes to complete for large enterprise customers, blocking the HTTP request the entire time. Three customers recently hit browser timeouts. How do you redesign this?

Q053 | System Design | Senior

A global e-commerce company asks you to design the checkout service for Black Friday: 10 million concurrent users, peak 500,000 orders per minute, inventory must never oversell. Walk through the architecture you would propose and the trade-offs you would make explicit to the CTO.

Q054 | System Design | Senior

A healthcare SaaS startup needs a patient-messaging platform: real-time chat between patients and providers, message history for 7 years, HIPAA compliance, and the ability to audit every read. The team is four engineers. Propose a design that is simple enough for them to operate.

Q055 | System Design | Mid

You are designing a URL shortener for a marketing platform. It must handle 100 million redirects per day, return links in under 10 ms globally, and let marketers expire links immediately. Describe your architecture and the key data store choices.

Q056 | System Design | Mid

A B2B SaaS company stores each customer's data in a shared PostgreSQL database with a tenant_id column. They are hitting connection pool limits and slow queries at 800 tenants. The CTO wants a path to 5,000 tenants without a full rewrite. Propose a migration path.

Q057 | Cloud Architecture | Senior

A financial services firm runs critical batch jobs on EC2 On-Demand instances costing $180,000 per month. The CFO demands a 60% reduction. The jobs run 6–10 hours daily, are idempotent, and must complete by 6 AM UTC. Design a cost-optimized architecture.

Q058 | Cloud Architecture | Senior

An insurance company wants to deploy a new claims-processing app in AWS but their InfoSec team mandates: no data ever leaves a specific AWS region, all traffic must traverse a private network, and every API call must be logged. Walk through the network and compliance architecture.

Q059 | Cloud Architecture | Mid

Your team deployed a containerized API on ECS Fargate behind an ALB. During a load test, response times degrade from 50 ms to 4 seconds at 2,000 RPS, and ECS shows no CPU or memory pressure. The ALB access logs show 503 errors. Diagnose and fix.

Q060 | Cloud Architecture | Mid

A startup is building on GCP and receives a $120,000 monthly bill despite expecting $40,000. The CTO asks you to do a spend audit. Which services and configuration mistakes do you check first?

Q061 | Integration Patterns | Senior

A logistics company has 14 third-party carrier APIs, each with different authentication schemes, rate limits, and payload formats. The platform team wants to standardize integration and add circuit breakers. Design the integration layer architecture.

Q062 | Integration Patterns | Senior

An enterprise HR system needs to sync employee records bidirectionally between Workday and Salesforce in near-real-time, with conflict resolution when both systems update the same record within minutes of each other. How do you design this?

Q063 | Integration Patterns | Mid

You are onboarding a new partner bank whose API returns paginated responses of up to 50,000 records per hour. Your pipeline must process these records, deduplicate them, and load into a data warehouse within 15 minutes of the partner's batch publish. Design the pipeline.

Q064 | Integration Patterns | Mid

A retail client's ERP system can only send data via SFTP file drops in CSV format every 4 hours. Their new microservices platform expects REST API calls. You need to bridge these without modifying the ERP. Propose the integration.

Q065 | Microservices vs Monolith | Senior

A fintech startup has a well-functioning Rails monolith handling $50M/year in transactions. The engineering team of 8 wants to adopt microservices because they read it scales better. You are the consulting architect. How do you advise them?

Q066 | Microservices vs Monolith | Senior

An online education platform decomposed into 22 microservices 18 months ago. Now the median API latency is 800 ms due to service-to-service call chains 8 hops deep. Engineering wants to re-aggregate some services. How do you approach the consolidation?

Q067 | Microservices vs Monolith | Mid

A media company runs a video transcoding job as part of their upload monolith. Transcoding jobs sometimes run for 45 minutes and cause OOM kills that crash the upload service. How do you extract transcoding without a full microservices rewrite?

Q068 | Microservices vs Monolith | Mid

Your company is starting a new product line alongside an existing monolith. The CTO wants to build the new product as microservices on Kubernetes while the monolith stays on-premises on VMs. How do you handle shared authentication and shared customer data?

Q069 | Scalability & Performance | Senior

A social platform's notification service sends 200 million push notifications per day. During a viral event, the spike reaches 50 million in 10 minutes. The current architecture uses a single Kafka topic with 32 partitions and one consumer group. It falls 40 minutes behind during spikes. Redesign it.
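
The gap between steady-state and spike load can be estimated from the figures in the question. This is a rough sketch that assumes events are spread evenly across time and across the 32 partitions.

```python
DAILY_NOTIFICATIONS = 200_000_000   # from the question
SPIKE_NOTIFICATIONS = 50_000_000    # from the question
SPIKE_SECONDS = 10 * 60             # 10-minute spike window
PARTITIONS = 32                     # from the question

avg_rate = DAILY_NOTIFICATIONS / 86_400          # events/s on a normal day
spike_rate = SPIKE_NOTIFICATIONS / SPIKE_SECONDS  # events/s during the spike
per_partition = spike_rate / PARTITIONS           # assumes even key spread

print(f"average: {avg_rate:,.0f}/s")
print(f"spike:   {spike_rate:,.0f}/s ({spike_rate / avg_rate:.0f}x average)")
print(f"per partition during spike: {per_partition:,.0f}/s")
```

The spike is about 36 times the daily average, which is the headroom any redesign has to absorb or buffer.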

Q070 | Scalability & Performance | Senior

A gaming company's leaderboard API handles 10,000 RPS with sub-5 ms P99. After they add a new in-game event, traffic hits 80,000 RPS and P99 jumps to 350 ms. The leaderboard is backed by a sorted set in Redis. What do you investigate and fix?

Q071 | Scalability & Performance | Mid

A REST API serving product search returns results in 2.3 seconds on average. The product catalog has 5 million SKUs. The team is using a full table scan on PostgreSQL with a LIKE query. What is the right architecture to get to under 200 ms?

Q072 | Scalability & Performance | Mid

A SaaS analytics dashboard loads in 12 seconds for enterprise customers with 2 million rows in their dataset. Users are complaining loudly. The frontend makes 8 sequential API calls to build the dashboard. How do you diagnose and fix this?

Q073 | Security Architecture | Senior

A healthcare company stores PHI in S3 and has discovered that an S3 bucket with 300,000 patient records was publicly accessible for 11 days due to a misconfigured bucket policy. You are called in as the incident architect. Walk through your response and the controls you add afterward.

Q074 | Security Architecture | Senior

An enterprise wants to implement zero-trust network architecture across a hybrid environment: 400 VMs on-premises, 200 EKS workloads in AWS, and 50 SaaS integrations. Their current model is perimeter-based VPN. Design the migration.

Q075 | Security Architecture | Mid

A payments API currently stores API keys as plaintext in a PostgreSQL database. The security team flags this as critical. You need to fix it without downtime and without invalidating existing integrations. Describe your approach.

Q076 | Security Architecture | Mid

A startup deploys all infrastructure via Terraform on AWS. A junior engineer accidentally ran `terraform destroy` on the production workspace, deleting the RDS instance and the EKS cluster. Walk through the recovery and the controls to prevent recurrence.

Q077 | Data Architecture | Senior

A retail conglomerate runs 12 business units each with their own data warehouse. The Chief Data Officer wants a unified data platform where analysts can query across all 12 warehouses without data movement. Design the architecture.

Q078 | Data Architecture | Senior

A streaming media company ingests 80 GB per hour of clickstream events into S3 via Kinesis Firehose. Analysts need to run ad-hoc queries on the last 90 days with results in under 30 seconds. Current Athena queries on raw JSON files take 4–6 minutes. Fix the architecture.
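
It helps to size the 90-day window before choosing a fix. The sketch below uses only the ingest rate and retention from the question, and the scan-rate figure assumes a query that reads every byte of the window uncompressed.

```python
INGEST_GB_PER_HOUR = 80     # from the question
RETENTION_DAYS = 90         # from the question
TARGET_SECONDS = 30         # query latency target from the question

total_gb = INGEST_GB_PER_HOUR * 24 * RETENTION_DAYS
full_scan_gb_per_s = total_gb / TARGET_SECONDS

print(f"90-day window: {total_gb:,} GB ({total_gb / 1000:.1f} TB)")
print(f"full scan in {TARGET_SECONDS}s would need {full_scan_gb_per_s:,.0f} GB/s")
```

At roughly 172.8 TB of raw data, a full scan inside 30 seconds is unrealistic, so the answer has to reduce the bytes each query reads.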

Q079 | Data Architecture | Mid

An e-commerce company's recommendation engine trains daily on a MySQL database export. Training takes 9 hours and the data team wants to cut it to under 2 hours. The dataset is 500 GB of order history. Diagnose and redesign.

Q080 | Data Architecture | Mid

A SaaS startup uses MongoDB as their primary database for all data including financial ledger entries. The auditors flag that financial records need strict double-entry bookkeeping consistency. How do you advise them?

Q081 | Cost & Tradeoffs | Senior

A media streaming service spends $2.4M/year on AWS. The board wants a 30% reduction without degrading user experience. You are given 90 days. What is your action plan?

Q082 | Cost & Tradeoffs | Senior

An engineering team argues they should use Kafka for every service-to-service communication in the platform, replacing all synchronous REST calls. You disagree. How do you make the architectural case for a hybrid approach?

Q083 | Cost & Tradeoffs | Mid

A startup is choosing between a multi-region active-active deployment and active-passive for their SaaS app. The CEO wants 99.99% uptime SLA. Walk them through the cost and complexity trade-offs.

Q084 | Cost & Tradeoffs | Mid

Your team wants to adopt GraphQL for the entire API layer, replacing all existing REST endpoints. The estimated migration cost is 6 months of engineering time. The current REST APIs work well. Make the go/no-go recommendation.

Q085 | Migration Strategy | Senior

A 12-year-old on-premises Java EE application serving 8 million users needs to migrate to AWS. The application has a 1.2 TB Oracle database, 40 interconnected modules, and no test suite. The business requires zero data loss and less than 4 hours downtime. Design the migration strategy.

Q086 | Migration Strategy | Senior

A bank running its core banking system on an IBM mainframe wants to gradually migrate to microservices in AWS. The mainframe processes 2 million transactions per day and has been running for 25 years. The board has a 3-year timeline. Propose a phased approach.

Q087 | Migration Strategy | Mid

A startup is migrating from Heroku to AWS because Heroku's dyno costs are becoming prohibitive at scale. They have 5 Heroku apps, a Heroku Postgres database, and use Heroku Scheduler for cron jobs. The team has no AWS experience. Propose a practical migration path.

Q088 | Migration Strategy | Mid

A company needs to migrate 200 TB of data from on-premises NAS to AWS S3 over a 6-month window without impacting production workloads running on the same NAS. Describe the migration plan.
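
The sustained bandwidth this implies can be estimated as follows. The sketch assumes decimal terabytes, a 180-day window, and no protocol overhead; the 8-hour off-peak transfer window is a hypothetical constraint added for illustration.

```python
def required_mbps(terabytes, days, hours_per_day=24):
    """Sustained throughput in Mbit/s to move `terabytes` within `days`."""
    bits = terabytes * 1e12 * 8
    seconds = days * hours_per_day * 3600
    return bits / seconds / 1e6


around_the_clock = required_mbps(200, 180)
off_peak_only = required_mbps(200, 180, hours_per_day=8)  # hypothetical window

print(f"24h/day transfer: {around_the_clock:.1f} Mbit/s")
print(f"8h/day transfer:  {off_peak_only:.1f} Mbit/s")
```

Running continuously needs about 103 Mbit/s, while restricting transfers to an 8-hour nightly window triples that to about 309 Mbit/s.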

Q089 | API Design | Senior

You are designing a public REST API for a fintech platform that will have 200+ enterprise customers, each requiring custom rate limits, versioning, and audit trails of every API call. How do you design the API platform layer?

Q090 | API Design | Senior

A mobile app's API has grown to 120 endpoints over 5 years. The mobile team complains that each screen requires 6–12 API calls, battery usage is high, and adding any new app feature requires backend changes. Propose a modernization approach.

Q091 | API Design | Mid

You are reviewing a colleague's REST API design for a payment service. The endpoint `POST /payments/delete/{id}` has a 200 response body containing the deleted payment details. List the issues and propose corrections.

Q092 | API Design | Mid

Your API returns complete customer objects (200 fields) on every response, even when the client only needs 3 fields. Mobile clients are complaining about slow load times in poor network conditions. How do you fix this without breaking existing clients?

Q093 | Event-Driven Architecture | Senior

A travel booking platform uses synchronous REST calls between its flight, hotel, and car-rental services. During peak booking periods, a 2-second delay in the hotel service causes a cascade of timeouts that takes the entire booking flow offline. Redesign using event-driven patterns.

Q094 | Event-Driven Architecture | Senior

A supply chain company ingests 500 supplier event feeds via webhooks. Events arrive out of order due to network delays. Consumers need to process events in the correct order per supplier. Design the ordering and deduplication architecture.

Q095 | Event-Driven Architecture | Mid

A new engineer on your team proposes using a single Kafka topic with 3 partitions for all events across the entire platform — orders, inventory updates, user signups, and payment confirmations. What issues do you raise?

Q096 | Event-Driven Architecture | Mid

You implemented an event-driven order processing pipeline. In production, you discover that some orders are being processed twice, resulting in duplicate charges to customers. Diagnose and fix.

Q097 | High Availability & DR | Senior

A logistics company has an RTO of 15 minutes and RPO of 5 minutes for their dispatch system. Currently they run on a single EC2 instance with a single RDS instance. Design the HA architecture to meet these SLAs.

Q098 | High Availability & DR | Senior

A SaaS platform is experiencing 45 minutes of downtime per month due to database-related incidents. The CTO wants to reduce this to under 5 minutes annually. Walk through the architectural changes needed.
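
Converting both figures to annual availability shows the size of the gap. This sketch assumes a 365-day year and that the 45 minutes recurs every month.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600; ignores leap years


def availability_pct(downtime_minutes_per_year):
    """Availability as a percentage, given annual downtime in minutes."""
    return 100 * (1 - downtime_minutes_per_year / MINUTES_PER_YEAR)


current_downtime = 45 * 12   # 45 min/month, from the question
target_downtime = 5          # minutes per year, from the question

current = availability_pct(current_downtime)
target = availability_pct(target_downtime)

print(f"current: {current:.3f}% ({current_downtime} min/year)")
print(f"target:  {target:.4f}% ({target_downtime} min/year)")
print(f"required reduction: {current_downtime / target_downtime:.0f}x")
```

The current state is about 99.897%, while the target sits at roughly five nines, a 108-fold reduction in annual downtime.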

Q099 | High Availability & DR | Mid

Your team has a disaster recovery plan documented but it has never been tested. A compliance audit requires you to demonstrate RTO and RPO adherence. How do you run the DR drill safely?

Q100 | High Availability & DR | Mid

A cold-storage archival system stores 2 PB in S3 Glacier. The business suddenly needs to restore 50 TB of data within 4 hours for a regulatory inquiry. Standard Glacier retrieval takes 3–5 hours per object. How do you handle this?

Q101 | Technology Selection | Senior

A rapidly growing fintech is choosing between AWS, Azure, and GCP for their primary cloud. They process financial transactions, have a small ML team that uses Python/TensorFlow, and plan to expand to India and Singapore in 18 months. Make the recommendation.

Q102 | Technology Selection | Senior

An engineering team needs to choose a message broker: they are evaluating Kafka, RabbitMQ, and AWS SQS/SNS. The use cases are: high-throughput event streaming (500k events/day), task queuing for background jobs (10k tasks/day), and pub/sub fan-out to 20 microservices. Recommend and justify.

Q103 | Technology Selection | Mid

A startup is building an internal analytics dashboard that will serve 50 data analysts running ad-hoc SQL queries on a 500 GB dataset updated nightly. They ask you to choose between PostgreSQL, Redshift, BigQuery, and ClickHouse. Recommend one and explain the others.

Q104 | Technology Selection | Mid

Your company is choosing an API gateway. The finalists are AWS API Gateway, Kong, and Nginx Plus. The requirements are: custom auth plugins, rate limiting per customer tier, and on-premises deployment alongside AWS workloads. Which do you recommend?

Q105 | Stakeholder & Pre-sales | Senior

A prospective enterprise client's IT director says: 'We tried cloud migration three years ago and it cost twice as much as on-premise. We are not doing this again.' You are the solutions architect on the pre-sales call. How do you respond?

Q106 | Stakeholder & Pre-sales | Senior

You are presenting a proposed microservices architecture to a board of directors who have non-technical backgrounds. The CTO, sitting in the room, disagrees with your recommendation and interrupts to advocate for a monolith. How do you handle this in the room?

Q107 | Stakeholder & Pre-sales | Mid

A customer's development lead insists that their existing on-premises IBM MQ implementation be retained as the central message bus for all new cloud-native workloads in Azure. The business case does not justify it technically. How do you navigate this?

Q108 | Stakeholder & Pre-sales | Mid

During a discovery call for a cloud architecture engagement, the client's CTO says their budget is $500K for the year but wants a global, multi-region, active-active deployment with real-time analytics and ML-driven features. How do you set expectations and structure the proposal?

Q109 | System Design | Senior

A global e-commerce platform serving 50M users needs to redesign its checkout service after a Black Friday incident where latency spiked to 8 seconds and 15% of orders failed. The current monolithic checkout handles inventory checks, payment, and order creation synchronously. How do you redesign it?

Q110Cloud ArchitectureSenior

A fintech startup runs all workloads in a single AWS region (us-east-1). After a 4-hour partial outage in that region affects three AZs simultaneously, the CISO mandates active-active multi-region within 90 days. The database is Aurora MySQL with 2TB of data. Walk through your architecture decision process.

Q111Integration PatternsMid

A logistics company needs to integrate a legacy AS400-based warehouse management system with a new React/Node.js order tracking portal. The AS400 team can provide flat-file extracts every 15 minutes but cannot expose APIs or message queues. How do you design the integration layer?

Q112Microservices vs MonolithSenior

A Series B SaaS company has a 6-year-old Django monolith serving 800 enterprise customers. The CTO wants to migrate to microservices because 'Netflix does it.' Engineering has 22 developers. Three specific pain points: deployments take 45 minutes, the billing module has caused 5 outages in 12 months, and the reporting module slows down the primary database. How do you advise?

Q113Scalability & PerformanceSenior

A gaming company's matchmaking service struggles during peak hours: it uses a Redis sorted set to rank 200,000 concurrent players and match them within 1 second. Latency is acceptable at 50K concurrent but degrades to 4 seconds at 150K. The backend is a single Go service with one Redis instance. Diagnose and fix.

Q114Security ArchitectureSenior

A healthcare SaaS company stores PHI in AWS and has just passed a SOC 2 Type II audit. The new CISO wants to achieve HIPAA compliance and specifically asks you to design the encryption-at-rest and in-transit strategy, key management, and audit logging architecture. Where do you start?

Q115Data ArchitectureSenior

A retail analytics team runs daily Spark jobs on 10TB of clickstream data stored in S3. Jobs take 6 hours and analysts complain that yesterday's data isn't available until 2 PM. The business wants hourly freshness. The current stack is EMR on-demand clusters with Hive metastore. What architectural changes do you propose?

Q116Cost & TradeoffsMid

A startup's AWS bill jumped from $12K to $47K/month after launching a new feature that generates thumbnail images. Engineering suspects it's Lambda and S3 costs. You have Cost Explorer access and 2 hours to diagnose and propose fixes. Walk through your approach.

Q117Migration StrategySenior

A 15-year-old insurance company runs a core policy management system on-premise on Oracle Database 12c and Java EE application servers. The board approved a 3-year cloud migration. The system has 400 stored procedures, 200 application tables, 12 integration points with partner systems, and zero automated tests. How do you structure the migration?

Q118API DesignMid

You're designing a public REST API for a B2B SaaS product that will be consumed by 50+ enterprise customers, each with their own integration teams. A customer reports that your API returns 200 OK with an error message in the body when validation fails. How do you redesign the error contract?

Q119Event-Driven ArchitectureSenior

A food delivery platform uses Kafka for order lifecycle events. Consumers are falling behind and consumer group lag is growing to 2 million messages during lunch peaks. The topic has 12 partitions. Investigation shows the bottleneck is a downstream notification service that calls an SMS gateway at 50ms per call. How do you fix this without increasing SMS costs?
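
The numbers in this scenario imply a hard throughput ceiling, which a quick back-of-envelope sketch makes visible. It assumes one consumer per partition and one blocking SMS call in flight per consumer; both are assumptions, not facts stated in the question, and the function names are illustrative.

```python
def max_throughput_per_sec(partitions: int, call_latency_ms: float,
                           in_flight_per_consumer: int = 1) -> float:
    """Upper bound on messages/second when each message needs one gateway call.

    Assumes at most one active consumer per partition (the Kafka consumer
    group limit) and `in_flight_per_consumer` concurrent calls per consumer.
    """
    calls_per_consumer = 1000.0 / call_latency_ms * in_flight_per_consumer
    return partitions * calls_per_consumer


def drain_seconds(lag: int, throughput_per_sec: float,
                  arrival_per_sec: float = 0.0) -> float:
    """Time to clear a backlog; infinite if arrivals keep pace with consumers."""
    net = throughput_per_sec - arrival_per_sec
    if net <= 0:
        return float("inf")
    return lag / net


if __name__ == "__main__":
    ceiling = max_throughput_per_sec(partitions=12, call_latency_ms=50)
    print(f"sequential ceiling: {ceiling:.0f} msg/s")
    print(f"time to drain 2M lag: {drain_seconds(2_000_000, ceiling) / 3600:.1f} h")
    # Hypothetical: 10 concurrent calls per consumer, same partition count.
    print(f"with 10 in flight: {max_throughput_per_sec(12, 50, 10):.0f} msg/s")
```

Under these assumptions, 12 sequential consumers top out at 240 messages per second, so a 2 million message backlog takes more than two hours to clear even with no new arrivals. That is why the interesting levers are concurrency within each consumer and reducing how many events need an SMS at all, rather than consumer count alone.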

Q120High Availability & DRSenior

A media streaming company has an RTO of 15 minutes and RPO of 5 minutes for its video catalog service. The service runs on AWS us-west-2 with Aurora PostgreSQL, ElastiCache Redis, and 200TB of video files in S3. Design the DR architecture and the runbook for a full region failure.

Q121Technology SelectionMid

A startup building a real-time collaborative document editor (think Notion/Google Docs) is choosing between WebSockets, Server-Sent Events, and long-polling for the real-time sync layer. They have 3 engineers, expect 10,000 concurrent users at launch, and are deploying on Vercel and Supabase. Make the call.

Q122Stakeholder & Pre-salesSenior

During a pre-sales discovery call, a prospect's CTO says: 'We've been burned before by vendors who overpromised. Our current system handles 500K requests per day and we need to be confident your platform can scale to 5M before we sign a 3-year contract.' How do you respond and what do you offer?

Q123System DesignMid

Design a URL shortener service (like bit.ly) that needs to handle 10,000 URL creation requests per second and 100,000 redirect requests per second. The shortened URLs must never expire and must be globally unique. Describe your data model and the critical path for redirect latency.
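
Because the URLs never expire, the write rate fixes how large the ID space must be. The sketch below estimates the minimum short-code length, assuming a base-62 alphabet (a-z, A-Z, 0-9) and a sustained write rate; the alphabet and the planning horizons are assumptions chosen for illustration.

```python
SECONDS_PER_YEAR = 86_400 * 365


def ids_needed(creates_per_sec: int, years: int) -> int:
    """Total unique IDs consumed at a sustained creation rate."""
    return creates_per_sec * SECONDS_PER_YEAR * years


def min_code_length(total_ids: int, alphabet_size: int = 62) -> int:
    """Smallest code length whose ID space covers `total_ids`."""
    length = 1
    while alphabet_size ** length < total_ids:
        length += 1
    return length


if __name__ == "__main__":
    for years in (1, 10, 100):
        total = ids_needed(10_000, years)
        print(f"{years:>3} years: {total:,} IDs, "
              f"min length {min_code_length(total)}")
```

At a sustained 10,000 creations per second, a 7-character base-62 code covers roughly a decade and an 8-character code covers a century, which is a useful input when choosing between counter-based and hash-based ID generation.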

Q124Cloud ArchitectureMid

A team runs a containerized Node.js API on AWS ECS Fargate. During a security review, the auditor flags that task IAM roles are overly permissive — all tasks use the same role with S3 full access and RDS full access. How do you remediate and what's the right-sized permission model?

Q125Integration PatternsSenior

A bank wants to integrate 8 different core banking system modules (loans, deposits, cards, FX, trade finance, payments, KYC, reporting) that were built by 3 different vendors over 20 years. Each module has its own data model and synchronous API. They want a unified customer 360 view with sub-second query response. How do you design this?

Q126Microservices vs MonolithMid

A new service you're designing needs to send transactional emails (password reset, order confirmation, invoices). The team proposes building a custom email microservice. A senior engineer argues this should just be an inline library call from the application. Who is right and why?

Q127Scalability & PerformanceMid

An e-commerce site's product catalog page loads in 6 seconds on mobile. Lighthouse shows the LCP is a hero image (400KB unoptimized JPEG) fetched from the origin server in Europe, and the page makes 23 separate API calls to assemble the product details. The engineering team asks you to prioritize fixes. What order and why?

Q128Security ArchitectureMid

A developer on your team accidentally committed AWS credentials to a public GitHub repository. The commit was live for 47 minutes before being detected. Walk through the immediate incident response and the architectural controls you'd add to prevent recurrence.

Q129Data ArchitectureMid

A product analytics team stores events in BigQuery. They've noticed that their most expensive queries scan the entire events table (200TB) even when analysts are only asking about the last 7 days of data. BigQuery billing is per TB scanned. How do you fix this without rewriting every analyst query?

Q130Cost & TradeoffsSenior

A company is choosing between a multi-tenant SaaS architecture (all customers share infrastructure) and a single-tenant architecture (each customer gets dedicated infrastructure). They have 5 enterprise customers now and expect 500 in 3 years. The enterprise customers have strict data residency requirements and varying compliance certifications. Make the architectural recommendation.

Q131Migration StrategyMid

A team wants to migrate their Node.js backend from MongoDB Atlas to PostgreSQL because joins have become too complex to manage. The MongoDB collections have deeply nested documents with optional fields. Production traffic is 2,000 requests/second and they cannot take more than 5 minutes of downtime. How do you approach the migration?

Q132API DesignSenior

You're reviewing a GraphQL API for a social media platform. A single query can join User, Posts, Comments, and Reactions in one request. During a load test, you discover that a query fetching 100 users with their last 10 posts and all reactions generates 1,100 database queries (N+1 problem). How do you resolve this while keeping the flexible GraphQL interface?

Q133Event-Driven ArchitectureMid

A startup uses AWS SQS for background job processing. They notice that when a payment processing job fails, it keeps being retried endlessly, causing the card to be charged multiple times. The queue has no dead-letter queue configured. Fix the architecture and add proper idempotency.

Q134High Availability & DRMid

Your team deploys a new version of a payment service to production and within 3 minutes, error rates spike to 18%. The deployment was a rolling update on Kubernetes and half the pods are still running v1. How do you respond and what does your rollback procedure look like?

Q135Technology SelectionSenior

A real-time fraud detection system needs to evaluate 5,000 transactions per second against 200 ML model features. The current Python ML pipeline takes 350ms per transaction (too slow for real-time). The team is debating between rewriting in Java/Go, using a feature store, or deploying models to a dedicated inference service. What's your recommendation?

Q136Stakeholder & Pre-salesMid

A VP of Engineering at a prospect company asks you: 'Why should we choose your platform over building it ourselves? We have a strong engineering team.' How do you handle this objection without dismissing their capability?

Q137System DesignSenior

Design the notification system for a ride-sharing platform that needs to send push notifications, SMS, and in-app messages across 20 million daily active users. Peak load is during surge events where 500,000 notifications are sent in under 30 seconds. Describe the architecture and the handling of delivery failures.

Q138Cloud ArchitectureSenior

An enterprise customer insists on running your SaaS product in their own AWS account (a 'Bring Your Own Account' deployment) rather than your managed cloud. Their IT team controls the account. How do you design a deployment and update model that maintains your ability to push updates without requiring their IT team's involvement for every release?

Q139Integration PatternsMid

Two microservices need to share data: the Order service needs to know the current inventory level from the Inventory service to decide whether to accept an order. A teammate proposes adding a synchronous REST call from Order to Inventory during the order placement flow. What concerns do you raise and what alternatives do you propose?

Q140Microservices vs MonolithSenior

You're joining a company as their first Solutions Architect. They have 40 microservices, 8 engineers, and most services are owned by one person each. Deployments require coordinating 3-4 services simultaneously. When one engineer leaves, their service becomes unmaintainable. What's your diagnosis and recommendation?

Q141Scalability & PerformanceSenior

A SaaS product's background job queue (Sidekiq on Redis) starts backing up every Sunday evening when a large batch of weekly report generation jobs is enqueued for 50,000 customers. Each report takes 20 seconds of CPU to generate. By Monday morning some customers still haven't received their reports. How do you redesign for predictable delivery?
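
The scenario's numbers define the total CPU work, so the required worker capacity follows directly from the delivery window. A rough sizing sketch, assuming reports are CPU-bound, perfectly parallel, and one report occupies one core; the 12-hour and 4-hour windows are hypothetical, not given in the question.

```python
import math


def cpu_seconds(reports: int, seconds_each: float) -> float:
    """Total CPU time for the whole batch."""
    return reports * seconds_each


def cores_needed(reports: int, seconds_each: float, window_hours: float) -> int:
    """Minimum cores to finish the batch inside the window, with no headroom."""
    return math.ceil(cpu_seconds(reports, seconds_each) / (window_hours * 3600))


if __name__ == "__main__":
    total = cpu_seconds(50_000, 20)
    print(f"total work: {total / 3600:.1f} CPU-hours")
    for window in (12, 4):  # hypothetical delivery windows in hours
        print(f"{window:>2} h window: {cores_needed(50_000, 20, window)} cores")
```

The batch is roughly 278 CPU-hours of work. Whether to provision for that burst, spread the enqueue across the week, or precompute incrementally is the design question; the arithmetic only sets the floor.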

Q142Security ArchitectureSenior

A fintech company runs a microservices architecture on EKS. A penetration test reveals that any compromised pod can call any other service's API without restriction — there is no service-to-service authentication or network segmentation. Design a zero-trust network architecture for the cluster.

Q143Data ArchitectureSenior

A media company has a recommendation engine that runs hourly Spark batch jobs to generate personalized content recommendations for 5M users. Users complain that recommendations don't reflect content they watched 20 minutes ago. The business wants real-time recommendations but the data science team insists their models require batch features. How do you bridge this gap?

Q144Cost & TradeoffsMid

A company is running 50 EC2 On-Demand m5.xlarge instances 24/7 for a production workload. A cloud cost consultant says they should switch to Reserved Instances. The engineering team argues for Spot Instances instead. Evaluate both options and recommend.

Q145Migration StrategySenior

A telecom company needs to migrate 300 million customer records from a Teradata data warehouse to Google BigQuery without disrupting daily ETL pipelines that run at 2 AM, or the BI dashboards that 800 analysts use. The migration window is 6 months. Design the migration plan.

Q146API DesignMid

You need to design a webhook system for a SaaS platform where customers subscribe to receive HTTP callbacks when events occur (payment received, subscription renewed, etc.). A customer reports they missed 3 days of events because their endpoint was returning 500 errors and the platform stopped retrying after 1 hour. Design a robust webhook delivery system.

Q147Event-Driven ArchitectureSenior

A marketplace platform uses an event-driven architecture with Kafka. The team discovers that the 'OrderPlaced' event schema has been changed by the producer (a new required field added) without coordinating with 6 downstream consumers. Three consumers are crashing on deserialization. How do you establish schema governance going forward and fix the immediate crisis?

Q148High Availability & DRSenior

A financial services company's trading platform requires 99.999% uptime (a downtime budget of roughly 5 minutes per year). Their current architecture achieves 99.95% (about 4.4 hours/year). Describe what architectural changes are required to close this gap and what the operational practices must look like.
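
The availability percentages translate into annual downtime budgets as follows. This is a minimal sketch that assumes a 365-day year and counts all downtime, planned and unplanned, against the budget.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.95, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% -> {minutes:.1f} min/year ({minutes / 60:.2f} h)")
```

Moving from 99.95% to 99.999% shrinks the budget by a factor of 50, from about 263 minutes to about 5.3 minutes a year, which is why a single manual failover or a routine maintenance window can consume the whole allowance.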

Q149Technology SelectionMid

A startup is building a geospatial application that needs to find all points of interest within a 5km radius of a user's location, filtering by category, across 10 million POIs. The current implementation using PostgreSQL with lat/lng columns and a Haversine query takes 8 seconds. What database or indexing strategy do you recommend?

Q150Stakeholder & Pre-salesSenior

You're presenting a cloud migration proposal to a CFO and CTO together. The CTO is enthusiastic but the CFO says: 'Our current on-premise setup cost us $800K to build and still has 4 years of useful life. Why would I spend $600K/year on cloud when I've already paid for the hardware?' How do you respond?

Q151System DesignMid

Design a rate limiting system for a public API that needs to enforce: 100 requests/minute per API key, 1,000 requests/minute per IP address, and a global cap of 500,000 requests/minute across all clients. The system must work across 10 horizontally scaled API gateway instances.

Q152Cloud ArchitectureMid

A development team is using a single AWS account for all environments (dev, staging, production). A developer accidentally deleted a production RDS snapshot while cleaning up dev resources. How do you redesign the account structure and access controls to prevent this class of incident?

Q153Integration PatternsSenior

A large retailer has 30 supplier integrations, each with a custom EDI or flat-file format. Adding a new supplier takes 6 weeks of integration work. The integration team is a bottleneck. Propose an integration platform architecture that reduces new supplier onboarding to 2 days.

Q154Microservices vs MonolithMid

A team is building a new B2B SaaS product from scratch. They have 5 engineers and 6 months to get to first paying customer. The lead engineer wants to start with microservices 'to avoid refactoring later.' Make a recommendation.

Q155Scalability & PerformanceMid

A SaaS application's PostgreSQL database is at 95% CPU during business hours. Investigation shows the top CPU-consuming query is a reporting query that joins 4 large tables and runs every 5 minutes via a scheduled task, scanning 50M rows each time. How do you fix this without scaling up the database?

Q156Security ArchitectureMid

A developer asks you to review a new feature that stores user-uploaded profile images in a public S3 bucket because 'it makes serving images simpler.' What security concerns do you raise and what alternative do you recommend?

Q157Data ArchitectureMid

A startup's analytics database is a single PostgreSQL instance that stores both operational data (orders, users, payments) and analytics queries run by the business team. When a BI analyst runs a complex aggregation, the application's API response times spike from 50ms to 2 seconds. How do you fix this without a major architectural overhaul?

Q158Cost & TradeoffsSenior

A company's data engineering team runs 50 Spark jobs nightly on EMR, and the AWS bill shows $45,000/month for EMR. Upon investigation, 80% of jobs finish within 20 minutes but clusters are configured with a 4-hour minimum billing period and jobs are spread across 15 separate clusters. How do you optimize?

Q159Migration StrategyMid

A company runs a cron job on a bare-metal server that processes daily financial reconciliation. The server is end-of-life in 3 months. The cron job is 3,000 lines of Perl, undocumented, and the original developer left 2 years ago. How do you migrate this safely?

Q160API DesignSenior

A mobile team complains that the existing REST API requires 7 round trips to assemble the home screen data (user profile, notifications count, recent orders, recommended products, active promotions, loyalty points, delivery address). Network latency is 80ms and the screen takes 640ms to load. Design a solution.

Q161Event-Driven ArchitectureMid

A team builds an event-sourced order management system using Kafka. After 18 months, the orders topic has 2 billion events and rebuilding the order state from scratch (replaying the full event log) takes 14 hours. New service instances are slow to start. How do you fix the snapshot problem?

Q162High Availability & DRMid

A startup's only database administrator left last month. The PostgreSQL database has no automated backups configured, no monitoring, and no failover. Production traffic is 500 requests/second. You have 1 week to reduce the risk to an acceptable level. What do you prioritize?

Q163Technology SelectionSenior

A logistics company needs to build a route optimization engine that finds optimal delivery routes for 10,000 packages across 500 drivers in under 30 seconds, updated every 15 minutes as new packages arrive and drivers complete stops. Evaluate whether to build this in-house or use a third-party solver, and recommend a tech stack.

Q164Stakeholder & Pre-salesMid

During a technical discovery call, the prospect's lead architect says your proposed microservices-based solution is 'over-engineered for our scale' and suggests a simpler monolithic approach would meet their needs. How do you respond and when do you agree with them?

Q165System DesignSenior

Design an audit logging system for a healthcare platform that must record every read and write access to patient records, be tamper-evident, support queries like 'all accesses by user X in the past 30 days' and 'all users who accessed patient Y's records', and retain logs for 7 years. Expected write rate: 50,000 audit events/second at peak.

Q166Cloud ArchitectureSenior

A gaming company runs a multiplayer game server on EC2 instances behind a Network Load Balancer. Players complain about inconsistent latency — some see 20ms, others see 180ms — even though they're in the same city. Investigation shows players are being routed to game servers in different AWS regions. How did this happen and how do you fix it?

Q167Integration PatternsMid

A team needs to synchronize customer data bidirectionally between their CRM (Salesforce) and their internal PostgreSQL database. Updates can happen in either system, and they need to be reflected in the other within 5 minutes. Two engineers have proposed different approaches: (A) poll both systems every minute and compare, (B) use webhooks from Salesforce and CDC from PostgreSQL. Evaluate both.

Q168Microservices vs MonolithSenior

You're designing the architecture for a new AI-powered customer support platform. The core features are: NLP intent classification, knowledge base search, human escalation routing, ticket management, and reporting. The team has 12 engineers and is building for 500 enterprise customers. What service decomposition do you recommend and why?

Q169System DesignSenior

A large e-commerce retailer runs a product catalog service that handles 200K RPM on normal days but spikes to 2M RPM during flash sales. Their current Redis cache has a 60% hit rate and the PostgreSQL database buckles under the load. How would you redesign this system to handle 10x peaks without downtime?

Q170System DesignMid

You are designing a URL shortener service expected to handle 10K writes per second and 100K reads per second with globally low latency. Walk through the key design decisions, from ID generation to storage choice to read path optimization.

Q171Cloud ArchitectureSenior

Your fintech startup processes payment authorizations on AWS and your CTO wants to achieve 99.99% uptime SLA. You currently deploy in a single region (us-east-1) with Multi-AZ RDS. Regulatory constraints require data residency in the US. Design an architecture that achieves the SLA without violating compliance.

Q172Cloud ArchitectureMid

A SaaS company runs all workloads in a single AWS account shared by dev, staging, and production environments. You've been asked to design a multi-account strategy. What are the key principles, account boundaries, and governance mechanisms you would put in place?

Q173Integration PatternsSenior

A logistics company has 40 internal services and 15 external partner integrations (carriers, customs brokers, ERP systems). They use point-to-point REST calls, resulting in a spaghetti integration landscape. Three incidents in the past quarter were caused by a carrier API change breaking downstream services. How do you re-architect this?

Q174Integration PatternsMid

You need to integrate a legacy mainframe order management system with a new microservices platform. The mainframe processes batch files at midnight and cannot be modified. How do you design the integration layer to make real-time order data available to the new platform?

Q175Microservices vs MonolithSenior

A health-tech startup has a Django monolith serving 50K active users. The CTO wants to break it into microservices. The engineering team is 8 developers. The monolith has 200K lines of code and a shared PostgreSQL database. Make the case for or against microservices and propose an approach.

Q176Microservices vs MonolithMid

Two microservices—Order Service and Inventory Service—need to maintain consistency. When an order is placed, inventory must be reserved atomically. They have separate databases. A 2PC distributed transaction is on the table. How do you approach this?

Q177Scalability & PerformanceSenior

A media streaming company's recommendation engine takes 800ms p99 to serve personalized content recommendations, causing a high abandonment rate on their mobile app. The engine runs Python ML inference on CPU-only EC2 instances. Recommendations depend on a user's 90-day watch history and 300-dimensional embedding vectors. How do you get to sub-100ms p99?

Q178Scalability & PerformanceMid

A SaaS application's background job queue has a 45-minute average processing time for report generation jobs during peak hours, up from 5 minutes three months ago. The queue uses Sidekiq backed by Redis. Nothing changed in the job code. What is your diagnostic and resolution approach?

Q179Security ArchitectureSenior

A healthcare platform stores PHI (Protected Health Information) and is undergoing a HIPAA compliance audit. The auditors flag that your EC2-based application has IAM role credentials with overly broad S3 permissions, database credentials stored in environment variables, and no encryption at rest on some EBS volumes. How do you remediate this systematically?

Q180Security ArchitectureMid

Your API gateway currently accepts JWT tokens signed with RS256 and validates them at every service in your microservices cluster. Each service independently calls the identity provider's JWKS endpoint to fetch public keys. During a high-traffic event, the identity provider's JWKS endpoint returns 429s, causing cascading auth failures. How do you fix this?

Q181Data ArchitectureSenior

A retail company runs analytics on transactional data from a PostgreSQL OLTP database using direct queries on the production database. These analytics queries cause 30-second lock contention events during business hours, slowing down customer-facing operations. Design an analytics architecture that fully isolates analytical workloads.

Q182Data ArchitectureMid

A startup is building a multi-tenant SaaS platform and must decide between a shared database with tenant ID columns, a shared database with separate schemas per tenant, or a separate database per tenant. They expect 500 tenants at launch growing to 10,000 within 2 years. Walk through the trade-offs.

Q183Cost & TradeoffsSenior

Your AWS bill has grown from $80K to $320K per month over 18 months while the customer base grew 2x. The CTO asks you to perform a cost optimization review. You have 3 weeks and a mandate to reduce costs by 40% without degrading performance. What is your methodology?

Q184Cost & TradeoffsMid

A startup is choosing between self-managed Kubernetes on EC2 and AWS EKS for their container orchestration. They have 4 backend engineers and expect to run 15-20 microservices. What are the real cost and operational trade-offs, and what would you recommend?

Q185Migration StrategySenior

A financial services firm has a core trading platform running on Oracle Database 12c on bare-metal servers in their own data center. The CEO mandates a move to AWS within 18 months. The database is 8TB, has 2,000 stored procedures, and 40+ dependent applications. How do you plan this migration?

Q186Migration StrategyMid

An e-commerce company wants to migrate their on-premise Jenkins CI/CD pipelines to GitHub Actions. They have 150 pipelines, some running on-premise agents for accessing internal resources. How would you plan this migration without disrupting ongoing development?

Q187API DesignSenior

Your public REST API serves 50+ third-party partners who have built integrations. You need to introduce breaking changes to the resource model (renaming fields, changing a field type from string to object). How do you manage API versioning and the partner migration without breaking existing integrations?

Q188API DesignMid

A mobile app team is complaining that your REST API requires 6-8 separate calls to assemble the home screen data, causing slow first loads. The backend team suggests GraphQL. How would you evaluate this proposal and what alternatives exist?

Q189Event-Driven ArchitectureSenior

A ride-sharing platform uses Kafka to stream driver location updates at 50K events/second. Downstream consumers (ETA calculator, surge pricing engine, dispatch engine) all subscribe to the same topic. The surge pricing engine is slow and its consumer group is falling 30 minutes behind, causing stale surge prices. How do you fix this without affecting other consumers?

Q190Event-Driven ArchitectureMid

Your team is implementing an event-driven architecture using Amazon SNS and SQS fan-out. A new requirement demands that events be processed exactly once, but your SQS consumers occasionally process a message twice due to network timeouts before deleting it. How do you achieve exactly-once semantics?

Q191High Availability & DRSenior

A global SaaS company has an RTO of 15 minutes and RPO of 1 minute for their primary application database. The database is a 2TB MySQL 8.0 instance on AWS RDS in us-east-1. Design a DR architecture that meets these requirements.

Q192High Availability & DRMid

Your company runs a Node.js API on a single EC2 instance behind an Elastic Load Balancer. The instance goes down for 20 minutes once a month due to hardware failures or updates. The business wants 99.9% uptime. How do you redesign the deployment for high availability?

Q193Technology SelectionSenior

A startup is building a real-time collaborative document editing platform (similar to Google Docs). They need to handle simultaneous edits from multiple users, persist the document state, and show live cursors. Your team is evaluating Operational Transformation vs CRDT for conflict resolution. Make the recommendation.

Q194Technology SelectionMid

Your team needs to choose a message broker for a new order processing system expected to handle 5K orders per minute at launch, growing to 50K per minute in 2 years. Candidates are RabbitMQ, Apache Kafka, and AWS SQS. How do you decide?

Q195Stakeholder & Pre-salesSenior

A prospective enterprise customer (Fortune 500 manufacturing firm) asks you to prove your SaaS platform can meet their 99.95% uptime SLA and their requirement for data to never leave the EU. They have a CISO review next week. How do you prepare and present your architecture for this review?

Q196Stakeholder & Pre-salesMid

During a pre-sales technical discovery call with a mid-market SaaS company, their CTO says 'we tried a similar platform 2 years ago and the migration destroyed 3 months of productivity. We're not doing this again.' How do you handle this objection and what would you commit to?

Q197System DesignSenior

A global news platform serves 10M daily active users and experiences traffic spikes of 100x normal load when a breaking news event occurs. Their monolithic Node.js application crashes under these spikes. Design a resilient architecture that handles unpredictable 100x traffic spikes within 60 seconds.

Q198Cloud ArchitectureSenior

An insurance company wants to adopt a multi-cloud strategy using both AWS and Azure to avoid vendor lock-in. Their platform engineering team of 6 must manage both clouds. What are the real trade-offs, and under what conditions would you recommend multi-cloud vs a committed single-cloud strategy?

Q199Integration PatternsSenior

A B2B SaaS platform must send real-time transaction data to 80 enterprise customers, each of whom has a different preferred integration method: some want webhooks, some want SFTP file drops, some want direct database inserts into their data warehouse, and some want Kafka topic access. How do you design a single event pipeline that supports all four delivery mechanisms?

Q200Microservices vs MonolithSenior

A digital bank's monolith handles account management, payments, fraud detection, and customer notifications in a single Java Spring Boot application. The fraud detection team wants to deploy model updates multiple times per day, but the current monolith release cycle is biweekly. How do you architect the extraction without disrupting banking operations?
