HikeCatalystHikeCatalyst
← All roles

Data Analyst / BI Analyst Interview Questions

200 scenario-based questions with detailed model answers, organized skill-wise and tool-wise. Filter by topic, level or keyword, reveal the answer — then pressure-test yourself in a real mock.

SKILL / TOOL
LEVEL
200 questions
Q001SQL & AnalyticsSenior

Your e-commerce company's revenue report shows a 15% drop last Tuesday, but the engineering team says no deployments happened. You have access to raw transaction tables. Walk through your SQL-based investigation—what queries do you write first, and what are the most likely culprits?

Q002SQL & AnalyticsMid

A product manager asks you to calculate 7-day rolling retention for a mobile gaming app. The events table has user_id, event_name, and event_timestamp. She wants to see cohort retention by the week users first installed. How do you structure this query?

Q003SQL & AnalyticsSenior

You're auditing a finance team's SQL report used for monthly revenue recognition. You discover the query uses a LEFT JOIN between invoices and payments without a WHERE clause on payment status, effectively double-counting partially paid invoices. The report has been used in board presentations for six months. How do you handle this?

Q004Data VisualizationMid

A marketing analyst built a stacked bar chart showing channel attribution across 12 months, with 9 color-coded channels. The VP of Marketing says the chart is 'impossible to read.' What's wrong with this design, and how would you redesign it?

Q005Data VisualizationSenior

Your team is building a real-time operational dashboard for a logistics company showing package delivery status across 50 cities. The data refreshes every 5 minutes. A product manager wants animated transitions and a globe visualization. You have limited front-end resources. How do you push back and what do you build instead?

Q006Tableau / Power BISenior

Your Tableau Server extract refreshes are failing silently every night for a key sales dashboard. By morning, executives are looking at two-day-old data but the dashboard shows no error state. Walk through how you diagnose and fix both the immediate failure and the alerting gap.

Q007Tableau / Power BIMid

A colleague built a Power BI report with 15 visuals on one page, and it takes 45 seconds to load. Your manager asks you to optimize it. What's your diagnostic process, and what are the three most common causes of slow Power BI reports?

Q008Tableau / Power BISenior

Your company is deciding between Tableau and Power BI for a new enterprise BI platform serving 300 analysts, half of whom are in finance (heavy Excel users) and half in product and engineering (Python/SQL users). The IT team prefers Azure. How do you frame this decision?

Q009Dashboard DesignMid

You're designing a weekly performance dashboard for a SaaS customer success team. The team lead wants to track 22 different metrics. How do you prioritize which metrics to show, and how do you structure the information hierarchy?

Q010Dashboard DesignSenior

You've delivered a beautifully designed executive dashboard for a retail chain's COO. Three months later, adoption is near zero—the COO checks it once a week at most, and regional managers don't use it at all. What went wrong and how do you recover?

Q011Dashboard DesignSenior

A business unit head wants a single 'company health score' shown prominently on the executive dashboard as one number. You think this is a bad idea. How do you handle this conversation, and what do you build instead?

Q012Statistics & MetricsSenior

Your fintech company's fraud detection model flags 0.3% of transactions. The compliance team says the false positive rate is unacceptably high and is causing customer complaints. Your data science colleague says the model is 99.7% accurate. Who's right, and how do you reconcile this?

Q013Statistics & MetricsMid

You're analyzing weekly active users for a subscription app. The metric has been flat for three months at 50,000 WAU, but your CEO says growth is stalling. A junior analyst says 'it's fine, the numbers haven't changed.' What deeper analysis would you run?

Q014Statistics & MetricsSenior

An e-commerce team reports that average order value increased by $12 after a UX redesign. The sample size is 500,000 orders. The p-value is 0.001. A VP declares it a huge win and wants to roll out the redesign company-wide immediately. What do you check before agreeing?

Q015Excel & SpreadsheetsMid

A finance analyst shares an Excel model used for monthly P&L reporting. You discover formulas like =C14+C15+C16+C17 instead of =SUM(C14:C17), hardcoded values scattered in formula cells, and no documentation. You're asked to audit and improve it. Where do you start?

Q016Excel & SpreadsheetsSenior

Your company's sales operations team runs a 15-tab Excel workbook with 200 VLOOKUP formulas connecting five data sources. Recalculation takes 3 minutes and crashes once a week. You've been asked to fix it without rebuilding it entirely in a BI tool. What do you do?

Q017ETL & Data PrepSenior

You're running a nightly dbt pipeline that transforms raw Salesforce CRM data into a marketing attribution model. One morning you find that 40,000 lead records silently disappeared from the output table overnight. No dbt test failed. How do you investigate and prevent this in the future?

Q018ETL & Data PrepMid

You're joining two datasets: a user activity log (100 million rows, daily events) and a user profile table (5 million rows). Your Spark job is timing out. What are the likely causes and how do you fix them?

Q019ETL & Data PrepSenior

Your company is migrating from an on-premise Informatica ETL pipeline to dbt + Snowflake. The legacy pipeline has 300 transformation jobs, some of which have undocumented business logic buried in stored procedures. How do you approach the migration without breaking downstream reports?

Q020Stakeholder CommunicationMid

A senior VP asks you for a report showing 'everything about our customer churn.' You have two weeks and a data warehouse full of relevant tables. How do you handle the initial scoping conversation, and what do you deliver?

Q021Stakeholder CommunicationSenior

You've delivered an analysis showing that a major product initiative your company spent $2M on over 6 months has had zero measurable impact on the target KPI. The product VP is about to present this initiative as a success to the board. How do you handle this?

Q022Stakeholder CommunicationSenior

Two business units are in conflict over which one 'owns' a customer who uses both products. Each unit's analyst has produced a different revenue attribution report, and the CFO has asked you, as the central data team lead, to resolve it. What's your process?

Q023KPI & Metric DefinitionSenior

You've joined a B2B SaaS company where the sales team tracks 'revenue' as TCV (total contract value) at signing, but the finance team tracks it as recognized revenue. The marketing team tracks MQL volume. At the weekly leadership meeting, everyone quotes different numbers and the CEO is frustrated. How do you fix this?

Q024KPI & Metric DefinitionMid

Your product team wants to use 'number of logins' as the primary engagement metric for a B2B project management tool. You think this is a bad choice. How do you make your case and what do you propose instead?

Q025KPI & Metric DefinitionSenior

You're defining the North Star Metric for a two-sided marketplace connecting freelance designers with startups. The head of growth wants DAU; the head of revenue wants GMV; the head of product wants 'matches made.' How do you facilitate this decision?

Q026A/B Testing & ExperimentationSenior

You're running an A/B test on a checkout flow redesign for an online retailer. After 2 weeks, treatment shows +8% conversion rate, p=0.04. Your VP of Product wants to ship immediately. What do you check before agreeing, and what would make you refuse to call it a win?

Q027A/B Testing & ExperimentationMid

Your email marketing team ran an A/B test on subject lines. Version A got 22% open rate, Version B got 24% open rate, with 5,000 emails per variant. They declare B the winner and plan to use it for the next 200,000-person send. What concerns do you raise?

Q028A/B Testing & ExperimentationSenior

Your growth team runs 50 A/B tests per quarter. The experiment lead reports an 'experiment win rate of 70%.' Your data science VP suspects p-hacking. How do you audit the experimentation program?

Q029Data StorytellingSenior

You've completed a six-week analysis showing that the company's most profitable customer segment is shrinking while the company is growing revenue overall. The CEO needs to present this to investors next month. How do you structure the narrative?

Q030Data StorytellingMid

You're presenting monthly marketing attribution results to a team of brand managers who are not data-literate. Your analysis shows that performance marketing is under-credited in last-touch models and brand campaigns are over-credited. How do you communicate this without the room going defensive?

Q031Data StorytellingSenior

You've been asked to present a complex cohort retention analysis to a board of directors who have 8 minutes for your section. The full analysis is 40 slides. How do you prepare your 8-minute version, and what do you leave out?

Q032Python & PandasMid

You have a Pandas DataFrame with 10 million rows of clickstream data. Filtering and aggregation operations are taking 4 minutes each. Your colleague says 'just use Spark.' Before rewriting everything, what Pandas-native optimizations do you try?

Q033Python & PandasSenior

You're building an automated weekly report pipeline in Python that pulls data from a Postgres database, runs a cohort analysis, and emails a formatted Excel report to 20 stakeholders. It's been running in production for 3 months. What can go wrong, and how do you build it to be resilient?

Q034Python & PandasSenior

A data analyst on your team wrote a 600-line Python script that generates the company's most-watched sales report. It has no tests, uses global variables, and mixes data fetching, transformation, and rendering in one function. You're asked to refactor it. How do you approach this without breaking production?

Q035Data QualitySenior

After a major data migration to a new warehouse, your business users start reporting that revenue figures in dashboards don't match the ERP system. The discrepancy is about 3% and appears only in certain regions. How do you systematically investigate and resolve this?

Q036Data QualityMid

Your daily user activity table suddenly has 40% NULL values in the device_type column starting two days ago. No one reported a change. How do you investigate?

Q037Data QualitySenior

You're implementing a data quality framework for a healthcare analytics team. The data includes patient records, claims data, and lab results, with regulatory requirements for completeness and accuracy. How do you design the DQ framework?

Q038Business Domain ModelingSenior

You're building the data model for a multi-tenant SaaS platform that bills customers on a usage-based model (per API call, per GB stored, per active seat). The billing team needs monthly invoices; the product team needs daily usage analytics; the finance team needs ARR forecasting. How do you design the core tables?

Q039Business Domain ModelingMid

You're asked to design a data model to track sales rep performance at a company with complex hierarchical territories (regions → districts → teams → reps) and a product hierarchy (category → subcategory → SKU). The VP of Sales wants to slice quota attainment at every level. How do you structure this?

Q040Business Domain ModelingSenior

A media company has three content types (articles, videos, podcasts), each with different engagement metrics. The analytics team is struggling because separate tables exist per content type with incompatible schemas. You're asked to unify them. What are the trade-offs between different modeling approaches?

Q041SQL & AnalyticsMid

Your team's analytics database has a slow-running query that joins an orders table (50M rows) with a customers table (2M rows) on customer_id. The query takes 8 minutes in Postgres. No indexes exist. What do you do?

Q042Statistics & MetricsMid

Your manager asks you to calculate the 'average time to purchase' for an e-commerce site. When you report the mean as 4.2 days, the product team says it doesn't match their intuition. What's going on, and what's the better metric to report?

Q043ETL & Data PrepMid

You're asked to build a customer 360 table that joins CRM data, web analytics, support tickets, and billing data, all with slightly different customer identifiers. Email is in CRM, user_id is in web analytics, account_id is in billing, and ticket_email is in support. How do you approach the identity resolution problem?

Q044Tableau / Power BIMid

A sales analyst built a Tableau dashboard that filters to show only closed-won deals. She now wants to add a conversion funnel showing all pipeline stages. Adding the new chart breaks all the existing filters. How do you help her fix this?

Q045Data VisualizationSenior

Your team is launching a public data transparency report for a healthcare nonprofit showing COVID outcome disparities across racial groups. You're the lead analyst. What are the visualization ethics considerations, and how do they change your design choices?

Q046Dashboard DesignMid

You're asked to build a single dashboard for both a data-savvy engineering VP and a less technical HR director. Both use the same workforce attrition data but have very different needs. How do you design for both audiences in one tool?

Q047KPI & Metric DefinitionMid

A growth team asks you to track 'virality' as a KPI. They define it as 'number of referrals generated per month.' You think this definition is insufficient. What's wrong with it, and what do you propose?

Q048A/B Testing & ExperimentationMid

You want to test whether adding a progress bar to an onboarding flow improves completion rates. You only have 200 new signups per week. How do you design an experiment that gives you a reliable answer within 8 weeks?

Q049Data StorytellingMid

You've been asked to present a complex multi-year trend analysis to a team of non-technical regional sales managers at a quarterly business review. The analysis shows three interacting variables: headcount growth, quota per rep, and attainment rate. How do you structure a 10-minute presentation?

Q050Python & PandasMid

You inherit a Jupyter notebook that generates a weekly churn report. Every cell takes forever to re-run because the raw data query returns 50 million rows and Pandas loads them all into memory. How do you fix this without a full rewrite?

Q051Data QualityMid

You discover that 8% of rows in your transaction table have negative amounts. Your initial assumption is data corruption, but your colleague says it might be legitimate. How do you investigate before deciding what to do?

Q052Business Domain ModelingMid

You're modeling subscription revenue for a SaaS company with monthly and annual plans, mid-cycle upgrades, pro-rated billing, and mid-cycle cancellations. The finance team wants MRR to be accurate to the dollar. How do you design the MRR table?

Q053SQL & AnalyticsSenior

You're tasked with building a near-real-time anomaly detection system for a payment processing company that needs to flag unusual merchant transaction volumes within 5 minutes of occurrence. You have Kafka and a Postgres analytical database. How do you architect the detection logic?

Q054Statistics & MetricsSenior

Your data shows a strong negative correlation (r = -0.72) between customer support ticket volume and retention rate across your 50 customer accounts. Your VP says this proves that bad support causes churn. How do you challenge this interpretation?

Q055Stakeholder CommunicationMid

You've delivered a report to a regional sales director who responds with 'these numbers don't look right' but can't tell you specifically what's wrong. How do you handle this feedback?

Q056SQL & AnalyticsSenior

Your e-commerce company runs a Redshift cluster. A critical daily revenue report that used to finish in 4 minutes now takes 47 minutes. The query joins orders, order_items, products, and users across 200M rows. Walk through your complete investigation and remediation approach.

Q057SQL & AnalyticsMid

A product manager asks you to calculate 30-day rolling retention for a mobile game with 5M daily active users. Your events table has (user_id, event_date, event_type). Write the approach and describe the key SQL constructs you'd use.

Q058SQL & AnalyticsSenior

A fintech startup's fraud team asks you to flag accounts where the total transaction amount in any rolling 7-day window exceeds $50,000. The transactions table has 800M rows. They need results in under 10 minutes on Snowflake. Design the solution.

Q059Data VisualizationMid

Your head of growth reviews a line chart you built showing weekly signups. She says the chart is 'hard to read' but can't explain why. You look at it and see 12 overlapping colored lines for 12 acquisition channels. How do you redesign it?

Q060Data VisualizationSenior

You are designing a real-time operations dashboard for a logistics company that tracks 10,000 active shipments. The NOC team needs to identify delayed shipments within seconds of viewing. Describe your complete visualization design decisions, including chart types, color, and data update strategy.

Q061Tableau / Power BISenior

Your Power BI report serving 400 users refreshes from a 50GB Azure Synapse table every 30 minutes, but refresh jobs are failing after 20 minutes with a gateway timeout. The dataset cannot be moved. What is your remediation plan?

Q062Tableau / Power BIMid

A Tableau dashboard you published last week shows different totals than the source Excel file that a regional sales manager maintains manually. The difference is $240K. How do you systematically debug this discrepancy?

Q063Tableau / Power BISenior

You are migrating 80 Tableau dashboards to Power BI for a bank. Halfway through, stakeholders discover that Power BI's DAX-calculated measures produce different numbers than Tableau's LOD expressions for the same KPIs. How do you manage this?

Q064Dashboard DesignMid

You are asked to build a single dashboard for both the CEO (weekly strategic review, 5 minutes of attention) and the growth team (daily deep-dives, 45-minute sessions). They insist on one URL. How do you design this?

Q065Dashboard DesignSenior

A healthcare SaaS company's customer success team uses a Looker dashboard to monitor patient engagement scores. After a UI refresh, the team's CSAT dropped 18 points because they said the dashboard was 'confusing.' Diagnose and fix this.

Q066Dashboard DesignSenior

You are designing a self-serve analytics portal for 500 non-technical business users at a retail chain. Usage data shows that 90% of users view only 3 of the 40 available reports. How do you redesign the portal?

Q067Statistics & MetricsSenior

You work at a ride-share company. The data science team claims that a new driver incentive program increased average earnings per trip by $1.20 based on a two-sample t-test. You are suspicious. What would you audit before accepting this result?

Q068Statistics & MetricsMid

A B2B SaaS company tracks monthly active users but the metric doubled after engineering changed the definition of 'active' from 'logged in' to 'performed any API call.' The old CEO dashboard shows a 2x spike. How do you handle this?

Q069Statistics & MetricsSenior

Your company's NPS score dropped from 52 to 44 in Q2. The VP of Customer Success insists the product team broke something. You have response-level data with timestamps, respondent segments, and verbatim comments. How do you investigate the root cause?

Q070Excel & SpreadsheetsMid

A finance analyst sends you a 15,000-row Excel pricing model with nested IFs, VLOOKUP chains, and hardcoded values scattered across six sheets. She asks you to add a new discount tier without breaking anything. What is your process?

Q071Excel & SpreadsheetsMid

Your sales operations team uses a shared Excel file to track pipeline across 12 regional reps. Each week, two reps accidentally overwrite each other's data. Leadership refuses to move to a proper CRM. What do you do?

Q072ETL & Data PrepSenior

An airline's data engineering team runs an Airflow DAG that loads 12 million flight records daily from an SFTP file drop into Snowflake. Three times in the past month, the file arrived 2 hours late, causing downstream BI reports to show yesterday's data. How do you redesign the pipeline to handle this reliably?

Q073ETL & Data PrepMid

You receive a CSV file from a third-party vendor containing customer address data that needs to be merged into your CRM. The CSV has 40,000 rows with inconsistent address formatting, duplicate emails, and mixed-case names. Walk through your cleaning process.

Q074ETL & Data PrepSenior

Your dbt project runs 280 models daily on BigQuery. A recent PR added 3 new models that caused the full run time to increase from 22 minutes to 68 minutes. How do you diagnose and fix the performance regression?

Q075Stakeholder CommunicationMid

A regional VP emails you at 9 PM saying the Q3 sales dashboard shows his region underperforming by 22% and he wants to present it to the board tomorrow morning. You know the data has a known processing lag. How do you respond?

Q076Stakeholder CommunicationSenior

Two senior business partners — the head of marketing and the head of finance — are each claiming your data supports their conflicting positions on whether a campaign was profitable. They want you to adjudicate. How do you handle this politically charged situation?

Q077Stakeholder CommunicationSenior

The CEO asks you to present 'the key takeaway' from a 6-month customer churn analysis to the board in 3 minutes. You have 40 slides of detailed analysis. How do you prepare and structure those 3 minutes?

Q078KPI & Metric DefinitionMid

A subscription edtech startup asks you to define their North Star metric. They have course completions, revenue, DAU, and NPS. Walk through your reasoning for choosing one and what makes it a good North Star.

Q079KPI & Metric DefinitionSenior

A marketplace platform defines GMV as its headline metric. The board is questioning whether GMV growth is real since cancellations and returns are growing equally fast. How do you redesign the metric framework?

Q080KPI & Metric DefinitionSenior

Your company measures engineering team productivity using 'story points delivered per sprint.' The engineering VP says this metric is being gamed — teams inflate story point estimates. How do you help them design a better metric system?

Q081A/B Testing & ExperimentationSenior

You are running an A/B test on a checkout flow for a furniture e-commerce site with 20,000 monthly transactions. After 3 weeks, the test variant shows a 14% lift in conversion but p=0.06. The CPO wants to ship it. How do you respond?

Q082A/B Testing & ExperimentationMid

You are asked to design an A/B test for a new email subject line to improve open rates. The current open rate is 22%. How do you determine sample size and duration, and what pitfalls do you flag to the team?

Q083A/B Testing & ExperimentationSenior

Your company runs 15 simultaneous A/B tests across the homepage. A post-hoc analysis finds 6 of them 'significant' at p<0.05. The growth team is celebrating. What is the problem and how do you fix the analysis?

Q084Data StorytellingMid

You have just completed a 3-month analysis showing that the company's highest-revenue segment is actually its least profitable when factoring in support costs and churn. Senior leadership has always publicly championed this segment. How do you present this finding?

Q085Data StorytellingSenior

You are presenting a demand-forecasting model to a retail board. The model has an 8% MAPE which is technically excellent, but in absolute terms it was off by $4.2 million in last quarter's peak season. A board member says 'your model was wrong by $4.2 million.' How do you respond?

Q086Data StorytellingSenior

You have three months of data showing a clear 2-standard-deviation anomaly in returns for one product category. You believe it is an early signal of quality defect, but the ops team dismisses it as 'normal variation.' How do you build a compelling case?

Q087Python & PandasMid

You are loading a 4GB CSV of clickstream data into pandas on a 16GB RAM laptop. The read_csv call throws a MemoryError. Walk through three concrete approaches to handle this without moving to a different tool.

Q088Python & PandasSenior

A data science team at an insurance company runs a weekly pandas pipeline that takes 6 hours on a single machine. The pipeline reads 80M policy records, computes rolling features, and outputs a model training dataset. How would you optimize it?

Q089Python & PandasMid

You need to merge two dataframes: transactions (5M rows, transaction_id, customer_id, amount) and customers (200K rows, customer_id, segment, region). After the merge you notice the transactions dataframe grew to 7M rows. What happened and how do you fix it?

Q090Data QualitySenior

You are the lead analyst for a media company. The ad revenue team discovers that their revenue reporting system has been double-counting a specific advertiser's spend for 8 months due to a pipeline bug. The error amounts to $3.2M in overstated revenue. How do you manage the remediation?

Q091Data QualityMid

A marketing analyst tells you that the new user sign-up count in your dashboard is 30% higher than what Salesforce shows for the same period. Both systems say they are the source of truth. How do you resolve the discrepancy?

Q092Data QualitySenior

You join a healthcare analytics team and discover that the patient data warehouse has no documented data lineage, no freshness SLAs, and no quality checks. Leadership wants a data quality framework built in 60 days. What do you build and in what order?

Q093Business Domain ModelingSenior

A telecom company wants to model customer lifetime value for its 8 million prepaid subscribers. Prepaid subscribers have no contract, irregular top-ups, and high churn. How do you approach the LTV model differently than for postpaid contract customers?

Q094Business Domain ModelingMid

A hotel chain asks you to model occupancy rate at the property level. You have 3 years of booking data but the chain has been expanding — 12 new hotels opened in the last year. How do you handle the cold-start problem for new properties?

Q095Business Domain ModelingSenior

A two-sided marketplace for freelancers defines 'a successful match' differently in its product team, finance team, and legal team. This causes three different match-rate metrics to diverge. You are asked to create a single canonical definition. How do you proceed?

Q096SQL & AnalyticsMid

A SaaS company wants to analyze funnel conversion: users go through Trial → Onboarding → Active → Churned. Some users skip steps. Write the approach to compute step-by-step and overall funnel conversion from an events table.

Q097Statistics & MetricsMid

You build a model to predict which customers will churn in the next 30 days. The model's accuracy is 96%. The product manager is thrilled. You are not. Explain why and what you would report instead.

Q098Dashboard DesignMid

You are asked to design a mobile-first weekly business review dashboard for store managers at a grocery chain. Each manager oversees 1 store. The dashboard must load in under 3 seconds on a 4G connection. What design choices do you make?

Q099ETL & Data PrepMid

A healthcare provider sends you patient appointment data in a flat CSV where multiple appointment dates for the same patient are stored as comma-separated values in a single field (e.g., '2024-01-05,2024-01-19,2024-02-03'). You need one row per appointment. Describe the transformation.

Q100Tableau / Power BIMid

A business analyst on your team built a Power BI report with 14 slicers on a single page. Users say the report is slow and confusing. How do you refactor it?

Q101Data StorytellingMid

You need to present a year-over-year revenue decline of 12% to a sales team that is already demoralized. How do you frame and structure the presentation to be honest, motivating, and actionable?

Q102KPI & Metric DefinitionMid

A marketplace for professional services uses average session duration as a proxy for user engagement. You argue this is the wrong metric. Make the case and propose two better alternatives.

Q103A/B Testing & ExperimentationSenior

A travel booking platform ran an experiment where the treatment group saw social proof ('47 people viewed this hotel today') on the booking page. The experiment ran for 4 weeks. Conversion rate improved 8% but revenue per booking dropped 4.3%. Overall revenue is unclear. How do you analyze and present this?

Q104Stakeholder CommunicationMid

The head of engineering says your analysis proving a data pipeline failure caused a revenue impact is wrong because the code was correct. You have query-level evidence showing the impact. How do you handle this disagreement?

Q105Python & PandasSenior

You are building a reusable Python library for your analytics team that standardizes how they compute cohort retention tables. Describe the API design, key implementation decisions, and how you would test it.

Q106Excel & SpreadsheetsSenior

A financial services firm runs a monthly regulatory reporting process entirely in Excel, with 3 analysts manually copying data between 22 spreadsheets. Errors occurred twice in the past year, leading to regulatory resubmissions. You are asked to improve the process without replacing Excel. How?

Q107Data QualityMid

You run a dbt test suite and notice that a 'not_null' test on orders.customer_id has been passing for 6 months. Today you discover that the ETL pipeline has been inserting a placeholder value of -1 for missing customer IDs rather than NULL. How do you respond?

Q108Stakeholder CommunicationSenior

You have built a rigorous churn prediction model. Legal raises concerns that the model's output could be used to discriminate against protected-class customers in service prioritization. How do you engage with legal and what changes do you make to the model or its use?

Q109Business Domain ModelingMid

A food delivery startup wants to measure 'delivery reliability' as a KPI. Different teams define it differently. Walk through how you would create a single operational definition and implement it in data.

Q110SQL & AnalyticsSenior

An energy company tracks power consumption per building every 15 minutes, producing 96 readings per building per day across 5,000 buildings. A downstream report needs the daily peak 30-minute demand (average of the two consecutive highest readings) per building. Design the query.

Q111Data VisualizationSenior

You are presenting user growth data for an investor update. MAU grew from 200K to 600K over 18 months. However, 70% of the growth came in the last 2 months due to a viral campaign. An investor asks whether growth is 'organic and sustainable.' How do you build the chart that honestly answers this?

Q112SQL & AnalyticsSenior

Your e-commerce company's monthly revenue SQL query suddenly returns results 18% lower than the finance team's Excel model for the same period. Both claim to be correct. How do you resolve the discrepancy and prevent it from recurring?

Q113SQL & AnalyticsMid

You are asked to calculate 7-day rolling average daily active users in BigQuery for a mobile gaming app. The event table has 2 billion rows and your query times out. Walk through how you would optimize it.

Q114SQL & AnalyticsSenior

A fintech startup asks you to build a cohort retention analysis showing 90-day retention by acquisition channel. The transactions table has 50 million rows with no user-level cohort table. How do you design and execute this from scratch in Snowflake?

Q115Data VisualizationMid

A product manager insists on using a 3D pie chart to present feature adoption rates across 8 product features in a board presentation. How do you handle this request?

Q116Data VisualizationSenior

You are designing a real-time operations dashboard for a logistics company tracking 12,000 daily shipments. The operations team complained the previous dashboard caused 'alert fatigue' — too many red indicators for issues that resolved themselves. How do you redesign it?

Q117Data VisualizationMid

Your bar chart comparing regional sales performance is being misread because one region serves enterprise accounts with deal sizes 10x larger than other regions. How do you visually communicate both volume and deal size simultaneously?

Q118Tableau / Power BISenior

Your Power BI report connected to a 200 million row Azure Synapse Analytics table takes 45 seconds to load. The business wants sub-5-second load times. Describe your optimization strategy without losing analytical depth.

Q119Tableau / Power BIMid

A Tableau dashboard you published last week suddenly shows data from two days ago instead of today. Your data source is a live connection to a PostgreSQL database. Walk through how you diagnose this.

Q120Tableau / Power BISenior

Your company is migrating 40 Tableau dashboards to Power BI after acquiring a firm that standardized on Microsoft stack. You have 8 weeks. What is your migration strategy and how do you handle the inevitable LOD calculation rewrites?

Q121Dashboard DesignSenior

The CEO asks for a single 'company health dashboard' that shows everything important. Marketing wants CAC and LTV. Finance wants ARR and burn rate. Product wants DAU/MAU. How do you scope and design this without creating an unreadable mess?

Q122Dashboard DesignMid

You are building a supply chain dashboard for warehouse managers who work 12-hour shifts and often access it on a 10-inch tablet. What specific design decisions do you make for this form factor and use case?

Q123Dashboard DesignSenior

Six months after launch, your customer support dashboard has been abandoned by the team it was built for. Usage analytics show it peaked at 40 views on day 1 and now gets 3 views per week. How do you investigate and revive it?

Q124Statistics & MetricsSenior

Your recommendation engine team claims a 15% lift in click-through rate from their new model. When you audit the experiment, you find the holdout group received no recommendations at all, not a previous model. Why is this a problem and how do you design a proper evaluation?

Q125Statistics & MetricsMid

A marketing team reports that email open rate jumped from 22% to 31% last month. Your manager asks if this is statistically significant or noise. How do you test this?

Q126Statistics & MetricsSenior

Your NPS score dropped from 52 to 44 between Q2 and Q3. The VP wants to know if this is a real signal or sampling variance. You ran surveys with 800 respondents in Q2 and 600 in Q3. Walk through your analysis.

Q127Excel & SpreadsheetsMid

You inherit a 15-tab Excel financial model from a departing analyst. You need to verify its accuracy before presenting results to the CFO next week. What is your audit approach?

Q128Excel & SpreadsheetsSenior

The sales operations team manages a 50,000-row CRM export in Excel. It is slow, breaks formulas when filtered, and loses data when multiple people edit. You've been asked to 'fix it'. What do you propose?

Q129ETL & Data PrepSenior

Your dbt pipeline that transforms raw Stripe payment events into a revenue model starts failing silently — it completes without errors but produces revenue numbers 8% lower than expected. How do you diagnose this?

Q130ETL & Data PrepMid

You receive a 2 GB CSV of customer transaction history from a vendor. It has 200,000 rows with inconsistent date formats (some MM/DD/YYYY, some YYYY-MM-DD, some 'Jan 15, 2024'). Walk through your cleaning approach.

Q131ETL & Data PrepSenior

You are building an Airflow pipeline to load daily sales data from an on-premise Oracle database into Snowflake. The Oracle team warns that queries during business hours slow down their OLTP system. How do you design around this constraint?

Q132Stakeholder CommunicationSenior

You present analysis showing that a key product feature's engagement has been declining 12% month-over-month for 5 consecutive months. The product VP dismisses your finding as 'seasonal' without providing any seasonality evidence. How do you handle this professionally?

Q133Stakeholder CommunicationMid

A sales director asks you to 'prove' that their team's new training program increased deal close rates. Before looking at the data, what questions do you ask to scope this analysis properly?

Q134Stakeholder CommunicationSenior

Finance asks for a report by Friday. Product asks for a dashboard by Thursday. Your manager has given you a high-priority data quality investigation that is already two days in. How do you manage this conflict?

Q135KPI & Metric DefinitionSenior

Your SaaS company's Customer Success team reports 94% retention. The CEO's board deck shows 87% retention for the same period. Both teams swear their number is correct. How do you resolve this and establish a single source of truth?

Q136KPI & Metric DefinitionMid

You are asked to create a single KPI to measure the health of a two-sided marketplace (buyers and sellers). What metric do you propose and why?

Q137KPI & Metric DefinitionSenior

The growth team wants to measure 'activation' for a B2B project management SaaS. Three team members have three different definitions. How do you facilitate alignment and land on a measurable, predictive definition?

Q138A/B Testing & ExperimentationSenior

Your experimentation platform shows a pricing page A/B test reached 95% statistical significance after 3 days with a 12% lift in conversions. Your head of growth wants to ship immediately. What is your recommendation and why?

Q139A/B Testing & ExperimentationMid

You run an A/B test on an email subject line and get a 28% open rate for variant A and 24% for variant B, but your sample size was only 300 per variant. How do you communicate this result to the marketing team?

Q140A/B Testing & ExperimentationSenior

You are analyzing an experiment where the control and treatment groups have significantly different pre-experiment conversion rates due to a randomization bug that assigned more high-intent users to treatment. How do you salvage the analysis?

Q141Data StorytellingSenior

You have 3 months of churn analysis that clearly shows customer support response time above 4 hours is the strongest predictor of churn. The support team lead disagrees and attributes churn to pricing. How do you present this finding persuasively without creating organizational conflict?

Q142Data StorytellingMid

Your analysis is technically correct but the 15-slide deck you prepared is putting executives to sleep during your review. What do you change for the next presentation?

Q143Data StorytellingSenior

You are presenting an uncomfortable finding: a flagship product's core metric has been flat for two years despite significant investment. How do you present this to a room of senior leaders who are emotionally invested in the product?

Q144Python & PandasMid

You are reading a 5 GB CSV into pandas and your laptop's kernel crashes from memory. How do you handle a dataset larger than RAM in Python without switching to a different language?

Q145Python & PandasSenior

You need to merge two pandas DataFrames: 10 million rows of transaction data with 500,000 rows of customer demographics. After the merge, your row count jumps from 10 million to 14 million. Diagnose and fix this.

Q146Python & PandasSenior

You are using a for-loop to apply a function row-by-row to a 2 million row pandas DataFrame. It takes 45 minutes to run. How do you reduce execution time without rewriting in a different language?

Q147Data QualitySenior

Three weeks after a data pipeline migration to a new vendor's warehouse, the head of marketing reports that their campaign attribution numbers look 'off'. You don't know what 'off' means yet. How do you investigate systematically?

Q148Data QualityMid

You notice that 15% of rows in a customer table have NULL values in the 'country' field. Before deciding how to handle them, what questions do you ask?

Q149Data QualitySenior

You are implementing data quality monitoring for a critical revenue reporting pipeline that processes $50M in monthly transactions. What framework do you put in place, and how do you avoid alert fatigue?

Q150Business Domain ModelingSenior

A healthcare analytics team asks you to model patient 'engagement' as a single metric. Patients include both preventive care users and chronic disease management users. Why is a single engagement metric problematic and how do you propose to model it?

Q151Business Domain ModelingMid

An edtech startup wants to model 'learning progress' for students taking asynchronous video courses. Three PMs have three definitions. How do you determine the right one?

Q152Business Domain ModelingSenior

You join a Series B marketplace startup as their first senior analyst. There is no data warehouse, no dbt, no standard metric definitions. The CEO wants a 'north star metric' by end of month. How do you approach this?

Q153SQL & AnalyticsMid

You are asked to identify the top 10% of customers by lifetime value, but the orders table contains duplicated order IDs due to a known ETL bug. How do you write the query correctly despite the duplicates?

Q154Tableau / Power BIMid

A regional sales manager modifies a published Tableau dashboard to fit their team's view and accidentally breaks a key calculated field for all 200 other users. How should your team have prevented this?

Q155Statistics & MetricsMid

Your company's average order value (AOV) increased by $12 last quarter. The CEO is excited and mentions it in an earnings call. You notice the number might be misleading. What issue are you spotting?

Q156Dashboard DesignMid

You are designing a marketing performance dashboard. The CMO wants 'real-time' data. Your data warehouse refreshes every 4 hours. How do you handle this expectation gap?

Q157KPI & Metric DefinitionMid

Your company tracks Monthly Active Users (MAU) as its primary engagement metric. A product manager argues this metric is masking declining quality engagement. How do you evaluate this claim?

Q158ETL & Data PrepMid

You receive a customer data file from a partner where all monetary values are stored as strings with currency symbols ('$1,234.56', '€2,100.00', '£890.00'). You need to convert these to numeric values for analysis. Describe your approach.

Q159A/B Testing & ExperimentationMid

You run an A/B test on a checkout flow redesign. Conversion rate improves 8% but average order value drops 5%. How do you present this result to the product team?

Q160Data StorytellingMid

You are presenting monthly sales performance to a regional team of 25 non-technical account managers. They are intimidated by data. How do you structure the session to make it engaging and actionable?

Q161Excel & SpreadsheetsMid

You are asked to automate a monthly reporting process that currently takes 3 hours of manual copy-paste between Excel and a Salesforce export. What options do you evaluate?

Q162Stakeholder CommunicationSenior

An executive sponsors a data initiative that your analysis shows will not achieve its stated ROI target of $2M based on your modeled assumptions. They have already announced it internally. How do you handle this?

Q163SQL & AnalyticsSenior

You need to calculate day-over-day retention for a mobile app across 90 days for 500,000 users. The raw events table has 300 million rows. Write the query strategy and explain the performance choices.

Q164Data QualityMid

Your user registration table shows 2,000 users with the email address 'test@test.com'. How do you handle these in an analysis of user acquisition trends?

Q165Business Domain ModelingSenior

A media company wants to model 'content performance' across articles, podcasts, and short videos. Each format has different consumption patterns and monetization models. How do you create a unified performance framework?

Q166SQL & AnalyticsSenior

Your e-commerce company's weekly revenue SQL query runs in 45 seconds on Redshift, blocking the Monday morning dashboard refresh. The query joins orders, customers, and product tables across 500M rows. Walk through your systematic optimization approach and the specific changes you'd make.

Q167SQL & AnalyticsMid

A product manager asks you to calculate 30-day rolling retention for a mobile gaming app. The events table has user_id, event_date, and event_type columns with about 2M rows per day. Write the logic and explain the edge cases you must handle.

Q168SQL & AnalyticsSenior

You're a senior analyst at a SaaS company. Finance needs MRR waterfall: new MRR, expansion MRR, contraction MRR, and churn MRR broken out monthly. Your subscriptions table has plan changes but no explicit churn flag. How do you derive the four components reliably?

Q169Data VisualizationMid

You're presenting monthly sales trends to a regional VP. You've built a line chart showing 12 months of data, but the VP says it's 'hard to read' and asks for a bar chart instead. How do you decide which chart type to use, and how do you handle the disagreement professionally?

Q170Data VisualizationSenior

A fintech startup's investor deck shows a chart with a dual Y-axis comparing weekly transaction volume and average ticket size. The axes are scaled so both lines appear to move in near-perfect sync. The CEO wants to use this chart to argue causality. What's the problem and how do you fix it?

Q171Tableau / Power BISenior

Your Power BI report used by 200 sales reps refreshes once daily from an Azure Synapse warehouse. Reps are complaining that numbers don't match what they see in Salesforce. You trace the issue to a complex DAX measure. Walk through your debugging process.

Q172Tableau / Power BIMid

You've built a Tableau dashboard for a logistics company showing on-time delivery rates by region. Users report that clicking a region filter doesn't update the KPI summary tiles at the top of the dashboard. What's most likely causing this and how do you fix it?

Q173Tableau / Power BISenior

A healthcare analytics team wants to migrate 40 Tableau dashboards to Power BI in 90 days. You're leading the migration. What are the three highest-risk areas and how do you mitigate them?

Q174Dashboard DesignMid

You're designing a real-time operations dashboard for a call center with 500 agents. The operations manager wants to show 25 metrics on a single screen. How do you push back and structure the design?

Q175Dashboard DesignSenior

Six months after launch, your executive dashboard is used by only 3 of the 12 intended users. The other nine say it's 'not useful.' You have two weeks to diagnose and fix this. What's your process?

Q176Dashboard DesignSenior

You're designing a multi-tenant SaaS dashboard where each customer sees only their own data but the underlying dataset is shared. The engineering team wants to use row-level security in Tableau. What are the implementation risks and how do you test it correctly?

Q177Statistics & MetricsSenior

Your streaming platform's data science team ran an A/B test on a new recommendation algorithm. The test shows a statistically significant 4% lift in 7-day retention (p=0.02). The product lead wants to ship immediately. You're skeptical. What red flags do you probe for before agreeing?

Q178Statistics & MetricsMid

A marketing analyst presents a cohort analysis showing that users who receive email campaigns have 40% higher LTV than users who don't. The CMO wants to triple the email budget based on this finding. What's the statistical problem and what analysis would you run instead?

Q179Statistics & MetricsSenior

You're analyzing North Star metric data for a B2B SaaS product. The weekly active accounts metric is flat for three months despite new features shipping. Engineering says the metric is broken; product says users just aren't adopting. How do you diagnose which is true?

Q180Excel & SpreadsheetsMid

You inherit an Excel model from a colleague who left the company. The monthly P&L workbook has 15 sheets, circular references, and volatile functions like INDIRECT across thousands of cells. The CFO needs it updated by Friday. How do you approach this safely?

Q181Excel & SpreadsheetsSenior

Your team's weekly sales reconciliation process takes three analysts half a day each using Excel. It involves VLOOKUPs across five workbooks, manual copy-paste, and frequent errors. You have a mandate to fix it but no engineering resources. What do you build?

Q182ETL & Data PrepSenior

You're responsible for an Airflow pipeline that ingests Salesforce data into Snowflake nightly. Last night the DAG failed at 2 AM and the sales team has no data for their 8 AM standup. Walk through your incident response and the changes you make to prevent recurrence.

Q183ETL & Data PrepMid

You're ingesting a vendor CSV file that arrives daily with inconsistent date formats — sometimes MM/DD/YYYY, sometimes YYYY-MM-DD, sometimes written as 'January 5, 2024.' How do you handle this robustly in a production pipeline?

Q184ETL & Data PrepSenior

Your dbt project has grown to 300 models across five domains. Build times are exceeding 40 minutes and the data engineering team is complaining. You have one week to cut build time by 50%. What's your plan?

Q185Stakeholder CommunicationMid

You've completed a two-week analysis showing that a major product initiative has had zero measurable impact on revenue. The product VP who championed the initiative is in the room when you present. How do you deliver this finding?

Q186Stakeholder CommunicationSenior

Three different business units — marketing, sales, and finance — are each requesting conflicting definitions for 'monthly active customer.' You're the analytics lead. How do you resolve this without losing credibility with any of the three teams?

Q187Stakeholder CommunicationSenior

An executive sponsor forwards you a competitor's quarterly report showing they grew 40% YoY while your company grew 10%. They want you to explain the gap by end of week. You have no access to their internal data. What analysis do you actually run?

Q188KPI & Metric DefinitionSenior

You're joining a Series B marketplace startup as their first data analyst. The CEO uses 'GMV,' the CFO uses 'net revenue,' and the head of growth uses 'paid GMV.' None of these are defined anywhere. In your first 30 days, how do you establish a metric governance foundation?

Q189KPI & Metric DefinitionMid

Your subscription box company defines churn as 'customers who cancel their subscription.' A new analyst joins and defines it as 'customers who don't renew.' The monthly churn numbers differ by 3 percentage points. Which is right and how do you adjudicate?

Q190A/B Testing & ExperimentationSenior

Your company runs 50 concurrent A/B tests on the same user base. The VP of Product wants to ship the top 5 winning tests this week. As the analytics lead, what problems does this create and how do you manage it?

Q191A/B Testing & ExperimentationMid

You're analyzing an A/B test on a checkout flow. The test showed a 12% lift in conversion rate but a 4% drop in average order value. Overall revenue per visitor is essentially flat. How do you interpret this and what do you recommend?

Q192A/B Testing & ExperimentationSenior

A growth team at a ride-hailing company ran a surge pricing A/B test. After 3 weeks they found a significant positive effect on rides completed but cannot get the experiment unit right — some users appear in both control and treatment groups. How do you salvage the analysis?

Q193Data StorytellingMid

You've done a 3-month analysis on customer churn. Your findings are in a 45-slide deck. Your time slot in the leadership QBR is 10 minutes. How do you prepare and what do you actually show?

Q194Data StorytellingSenior

You discover that a supply chain disruption is driving a metric drop your company has been attributing to a recent product change. Engineering has already rolled back the product change based on the original (incorrect) analysis. How do you handle this situation?

Q195Data StorytellingSenior

You're presenting a multi-year customer LTV model to the board of a consumer fintech. The model has 12 assumptions. One board member with a quant background starts drilling into your discount rate assumption. How do you handle the scrutiny without losing the room?

Q196Python & PandasMid

You're reading a 4GB CSV file into pandas and your machine runs out of memory halfway through. You need to compute the average transaction amount grouped by merchant_category. Walk through your solutions in order of practicality.

Q197Python & PandasSenior

You're building a Python-based data quality pipeline for a retail company's daily sales feed. The feed has 50 columns and ~500K rows per day. How do you design automated checks that are both comprehensive and maintainable over a 2-year horizon?

Q198Python & PandasSenior

You're handed a pandas codebase that takes 3 hours to process a daily analytical job. The code uses iterrows() extensively. How do you systematically improve performance without breaking the logic?

Q199Data QualitySenior

Your company's marketing attribution model is showing 130% of conversions attributed (more than 100%). The model uses last-touch attribution. The data engineering team says the data is fine. How do you investigate?

Q200Data QualityMid

A junior analyst reports that your customer table has 15% null values in the 'country' field, but the field was marked as required in the source application. Where do the nulls come from and how do you remediate them?

Can you defend these answers under follow-up pressure?

Book a mock interview with a senior Data Analyst / BI Analyst mentor — structured scorecard, replay, and a gap plan.

Book a Mock Interview →
FREE PROFILE AUDIT

Book your free audit

Tell us where you are — a senior mentor reviews your profile and shows you exactly what's blocking interview calls. Only name, email and role are required; the more you share, the sharper your audit. No spam, no obligation.

A FEW MORE DETAILS (OPTIONAL)
I want

* required · Prefer talking? WhatsApp +91 83598 96054 or email connect@hikecatalyst.com

📄 Score My Resume