What to expect in a data analyst interview

Graduate data analyst interviews typically have three layers: a technical skills round (SQL, Excel, statistics, sometimes Python or R), an analytical thinking round (case-style questions testing how you frame and solve problems with data), and a behavioural round (communication, stakeholder management, and working with ambiguity).

Many roles also include a take-home exercise — a dataset you'll be asked to clean, analyse, and present findings from. If you get one, treat it seriously: it's often weighted as heavily as the interview itself.

The non-obvious thing about data analyst interviews

Technical skills are the entry ticket. What separates strong candidates is their ability to communicate insights clearly to non-technical stakeholders. Practise explaining your findings as if the person you're talking to has never opened a spreadsheet.

Technical skills

Question 01
"What SQL queries do you use most often? Can you walk me through an example?"
What they're really asking
SQL is the most fundamental data analyst skill. Do you actually use it, or have you just listed it on your CV?
How to answer it
Be specific. Walk through a query you've actually written — even from a personal project or coursework. Cover the core building blocks you're confident with: SELECT, WHERE, GROUP BY, ORDER BY, JOIN types (INNER, LEFT, RIGHT), aggregate functions (COUNT, SUM, AVG), and subqueries. If you're comfortable with window functions (ROW_NUMBER, RANK, LAG/LEAD), mention them — they signal a stronger level. Give a real example: "I used a LEFT JOIN and GROUP BY to find customers who had placed orders in the previous quarter but not the current one."
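If you want something concrete to rehearse, here's a minimal sketch of that lapsed-customer pattern, run from Python against an in-memory SQLite database. The orders table, dates, and use of DISTINCT subqueries (rather than GROUP BY) are illustrative choices, not the only way to write it:
```python
import sqlite3

# Hypothetical schema and dates; only the join pattern is the point.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (customer_id INTEGER, order_date TEXT);
    INSERT INTO orders VALUES
        (1, '2024-01-15'),                     -- ordered in Q1 only: lapsed
        (2, '2024-01-20'), (2, '2024-04-02');  -- ordered in both quarters
""")

# LEFT JOIN previous-quarter buyers to current-quarter buyers; rows with a
# NULL on the right side are the customers who lapsed.
lapsed = con.execute("""
    SELECT prev.customer_id
    FROM (SELECT DISTINCT customer_id FROM orders
          WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31') AS prev
    LEFT JOIN (SELECT DISTINCT customer_id FROM orders
               WHERE order_date BETWEEN '2024-04-01' AND '2024-06-30') AS curr
      ON prev.customer_id = curr.customer_id
    WHERE curr.customer_id IS NULL
""").fetchall()
print(lapsed)  # [(1,)]
```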
Question 02
"What is the difference between a LEFT JOIN and an INNER JOIN?"
What they're really asking
A classic SQL screening question. Nearly every data analyst interview includes it.
How to answer it
An INNER JOIN returns only the rows where there is a match in both tables. A LEFT JOIN returns all rows from the left table and the matching rows from the right table — where there's no match, NULL values fill in for the right table columns. Use INNER JOIN when you only want matched records; use LEFT JOIN when you want to keep all records from the primary table regardless of whether they have a match. A common use case: LEFT JOIN to find records in one table that have no corresponding entry in another (filtering for NULLs after the join).
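One quick way to see the difference for yourself is with toy tables in pandas, where merge's how argument mirrors SQL join types (names and values here are invented):
```python
import pandas as pd

# Toy tables; names and values are made up.
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2],
                       "amount": [50, 20, 80]})

# INNER: only customers with at least one matching order (Ana, Ben).
inner = customers.merge(orders, on="customer_id", how="inner")

# LEFT: all customers; Cy has no orders, so the order columns are NaN
# (pandas' analogue of SQL NULL).
left = customers.merge(orders, on="customer_id", how="left")

# The anti-join pattern from the answer: keep only the unmatched rows.
print(left[left["amount"].isna()][["customer_id", "name"]])  # 3, Cy
```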
Question 03
"How would you handle missing or null values in a dataset?"
What they're really asking
Real-world data is messy. Do you have a thoughtful approach to cleaning it?
How to answer it
The right approach depends on the context — which is the most important thing to say first. Options include: removing rows with nulls (if nulls are rare and random), imputing values (replacing with mean, median, or mode — appropriate for numerical fields with few nulls), using a sentinel value (e.g. 0 or "Unknown" for categorical fields), or keeping nulls if they're meaningful (a null in a "cancellation date" field means the order hasn't been cancelled). Always investigate why values are missing before deciding — systematic missingness is very different from random missingness.
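As a rough sketch of those options in pandas, with invented columns and values:
```python
import pandas as pd
import numpy as np

# Invented data illustrating the options above.
df = pd.DataFrame({
    "age": [34, np.nan, 29, 41],
    "segment": ["retail", None, "retail", "trade"],
    "cancelled_at": [None, "2024-03-01", None, None],  # null = never cancelled
})

df["age"] = df["age"].fillna(df["age"].median())   # impute a numeric field
df["segment"] = df["segment"].fillna("Unknown")    # sentinel for a categorical field
df["is_cancelled"] = df["cancelled_at"].notna()    # a null that carries meaning
# df = df.dropna()  # or drop rows, if nulls are rare and random
print(df)
```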
Question 04
"What is the difference between mean, median, and mode? When would you use each?"
What they're really asking
Basic statistics — but the "when to use each" part is what separates candidates who understand it from those who just memorised the definitions.
How to answer it
The mean is the arithmetic average — sensitive to outliers, best for normally distributed data. The median is the middle value — robust to outliers, better for skewed distributions (e.g. income data, house prices). The mode is the most frequent value — useful for categorical data or understanding the most common outcome. Classic example: reporting average salary is misleading if a few executives earn ten times what the rest of the workforce does — median is the more honest measure of "typical" pay.
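The effect is easy to demonstrate in a few lines of Python; the salary figures below are made up to exaggerate the skew:
```python
import pandas as pd

# Nine staff salaries plus one executive: an invented, skewed distribution.
salaries = pd.Series([30_000] * 4 + [32_000] * 5 + [300_000])

print(salaries.mean())     # 58000.0 -- dragged up by the outlier
print(salaries.median())   # 32000.0 -- the "typical" salary
print(salaries.mode()[0])  # 32000   -- the most common salary
```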
Question 05
"How would you check whether two variables are correlated?"
What they're really asking
Can you apply statistical thinking to a practical question?
How to answer it
Start visually — a scatter plot is often the fastest way to spot a relationship. Then use a correlation coefficient: Pearson's r for linear relationships between continuous variables, Spearman's rank for monotonic relationships or ordinal data. Values close to +1 or -1 indicate strong correlation; close to 0 indicates weak or no linear relationship. Critical caveat: correlation doesn't imply causation. Always mention this — it signals statistical maturity. Ice cream sales and drowning rates are correlated; both are caused by hot weather.
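In pandas the whole workflow is a handful of lines (the data is invented, echoing the ice cream example):
```python
import pandas as pd

# Invented daily data, echoing the ice cream example.
df = pd.DataFrame({
    "temperature": [14, 18, 21, 25, 28, 31],
    "ice_cream_sales": [120, 150, 210, 260, 330, 420],
})

# Start visually, then quantify.
df.plot.scatter(x="temperature", y="ice_cream_sales")

print(df["temperature"].corr(df["ice_cream_sales"]))                     # Pearson's r
print(df["temperature"].corr(df["ice_cream_sales"], method="spearman"))  # Spearman's rho
```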
Question 06
"What visualisation tools have you used and what makes a good data visualisation?"
What they're really asking
Can you communicate data effectively — not just produce charts?
How to answer it
Name the tools you've actually used — Tableau, Power BI, Excel, Python (matplotlib/seaborn), Looker, Google Data Studio. Then answer the "what makes a good visualisation" part carefully: the right chart type for the data shape (bar charts for comparison, line charts for trends, scatter plots for relationships), a clear title that tells the insight not just the subject, minimal chart junk, and an audience-appropriate level of complexity. The best visualisations answer one question very clearly — they don't try to show everything at once.
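As an illustrative matplotlib sketch of those principles (the churn figures and the story in the title are invented, but note that the title states the insight rather than the subject):
```python
import matplotlib.pyplot as plt

# Invented monthly churn figures; the narrative is hypothetical.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
churn = [5.1, 5.0, 5.2, 6.8, 7.1, 7.4]

fig, ax = plt.subplots()
ax.plot(months, churn, marker="o")  # line chart: the right shape for a trend
ax.set_title("Churn has risen ~2 points since April")  # insight, not subject
ax.set_ylabel("Monthly churn (%)")
for side in ("top", "right"):       # trim chart junk
    ax.spines[side].set_visible(False)
plt.show()
```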
Question 07
"What is the difference between a dimension and a metric in data analysis?"
What they're really asking
Foundational data modelling concept — especially relevant for BI tool users.
How to answer it
A dimension is a categorical attribute used to slice and filter data — things like country, product category, customer segment, or date. A metric (or measure) is a quantitative value you aggregate — revenue, number of orders, session duration. In a typical analysis you group by dimensions and aggregate metrics: "total revenue (metric) by product category (dimension) and month (dimension)." Getting this distinction wrong in a BI context will cause real problems — it's worth knowing cold.
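The distinction maps directly onto a group-by: dimensions are the grouping keys, metrics are what you aggregate. A minimal pandas sketch with made-up order data:
```python
import pandas as pd

# Invented order-level data.
orders = pd.DataFrame({
    "category": ["books", "books", "toys", "toys"],  # dimension
    "month": ["Jan", "Feb", "Jan", "Jan"],           # dimension
    "revenue": [120.0, 80.0, 200.0, 50.0],           # metric
})

# Group by dimensions, aggregate the metric.
print(orders.groupby(["category", "month"])["revenue"].sum())
```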
Question 08
"Walk me through how you would approach a data cleaning task."
What they're really asking
Data cleaning is 60–80% of a junior analyst's job. Do you have a systematic approach?
How to answer it
Walk through your process: first understand the data — check dimensions, data types, and summary statistics. Identify issues — nulls, duplicates, inconsistent formatting (e.g. "USA" vs "United States"), outliers, incorrect data types. Prioritise by impact — not all issues need fixing, only those that affect your analysis. Document every change — so someone else (or future you) can reproduce or audit your work. Validate after cleaning — spot-check that key totals still make sense. The key message: you treat data cleaning as a structured process, not a frantic fix.
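A compact pandas sketch of that process, using deliberately messy invented data (a real task would be far more thorough at each step):
```python
import pandas as pd
import numpy as np

# Invented messy data: inconsistent labels, a null, a duplicate.
df = pd.DataFrame({
    "country": ["USA", "United States", " United States", "UK"],
    "revenue": [100.0, 100.0, np.nan, 250.0],
})

# 1. Understand the data: shape, types, summary statistics.
print(df.shape, df.dtypes, sep="\n")

# 2. Identify issues: nulls, duplicates, inconsistent formatting.
print(df.isna().sum())
df["country"] = df["country"].str.strip().replace({"USA": "United States"})

# 3. Fix what affects the analysis (documenting each change as you go).
df = df.drop_duplicates()

# 4. Validate: spot-check that key totals still make sense.
print(df["revenue"].sum())  # 350.0
```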

Practise explaining your analysis out loud

Technical knowledge only converts to job offers when you can explain it clearly under pressure. InterviewZap helps you practise data analyst questions — both technical and behavioural — with instant feedback.


Analytical thinking

Question 09
"How would you define and measure the success of a new product feature?"
What they're really asking
Can you connect data to business decisions — not just run queries?
How to answer it
Start by clarifying the feature's purpose — what problem does it solve and for whom? Then define metrics that reflect that purpose: adoption rate (are users finding it?), engagement (are they using it repeatedly?), and impact on a downstream business metric (does it improve retention, revenue, or task completion rate?). Define a baseline before launch and establish a time window for evaluation. Mention the importance of not just tracking the primary metric — watch for unintended effects on adjacent metrics too.
Question 10
"If a key metric dropped significantly last week, how would you investigate it?"
What they're really asking
This is one of the most common analytical case questions. They want to see a structured, systematic approach.
How to answer it
Use a structured debugging approach: first confirm the data is accurate — check for tracking issues, pipeline failures, or reporting bugs before assuming something real happened. Narrow the scope — segment by geography, platform, user type, product area to isolate where the drop occurred. Check for external causes — did anything change in the product, marketing, or competitive landscape that week? Form a hypothesis and test it. The key message: you move methodically from "is this real?" to "where is it?" to "why?" rather than jumping straight to conclusions.
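The "narrow the scope" step often comes down to a simple pivot. In this invented example, segmenting weekly sign-ups by platform isolates the drop immediately:
```python
import pandas as pd

# Hypothetical weekly sign-ups by platform; the numbers are invented.
df = pd.DataFrame({
    "week": ["prev"] * 3 + ["last"] * 3,
    "platform": ["ios", "android", "web"] * 2,
    "signups": [900, 1100, 600, 880, 1080, 250],
})

# Segment the drop: which platform accounts for it?
pivot = df.pivot(index="platform", columns="week", values="signups")
pivot["change_%"] = (pivot["last"] / pivot["prev"] - 1) * 100
print(pivot.sort_values("change_%"))  # web is down ~58%; ios/android are flat
```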
Question 11
"What is the difference between correlation and causation? Give an example."
What they're really asking
One of the most important concepts in data analysis — frequently misapplied in real business decisions.
How to answer it
Correlation means two variables move together. Causation means one causes the other. They're not the same. Classic example: users who read more help articles have higher retention rates — but this doesn't mean forcing users to read articles will improve retention. A third factor (user engagement and motivation) drives both. To establish causation you need controlled experiments (A/B tests) or careful natural experiments — observational data alone is rarely sufficient. Analysts who confuse the two lead businesses into expensive, ineffective decisions.
Question 12
"How would you design an A/B test to evaluate a change to a sign-up flow?"
What they're really asking
Experimental design is a core analyst skill. Do you understand the principles?
How to answer it
Walk through the key steps: define the hypothesis (changing X will increase sign-up completion by Y%), choose one primary metric (sign-up conversion rate), randomly split users into control and treatment groups, calculate the required sample size before running (based on expected effect size and desired statistical power — typically 80%), run for a pre-determined duration (don't stop early when you see a result you like — that inflates false positives), then analyse results and check for statistical significance. Mention guardrail metrics — are there other KPIs you'd monitor to make sure the change doesn't hurt something else?
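The sample-size step is the part candidates most often hand-wave, so it's worth being able to sketch. One way, using statsmodels (the baseline and target conversion rates are hypothetical):
```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical numbers: 20% baseline sign-up conversion, hoping to reach 22%.
effect = proportion_effectsize(0.20, 0.22)

# Per-group sample size at 5% significance and 80% power.
n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)
print(round(n))  # roughly 3,200 users per group
```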
Question 13
"How do you prioritise what to analyse when you have multiple requests from stakeholders?"
What they're really asking
Analysts are always overloaded. Can you manage demand and push back constructively?
How to answer it
Prioritise by business impact and time-sensitivity. For each request, ask: what decision will this analysis inform, how significant is that decision, and when does the decision need to be made? Not all requests are equal — a one-off data pull for curiosity is different from analysis that will drive a major product decision. Communicate your prioritisation transparently and set realistic timelines. If two things are genuinely equal priority, escalate to your manager rather than guessing.

Behavioural questions

Use the STAR method for these: Situation, Task, Action, Result.

Question 14
"Tell me about a time you turned data into a clear recommendation."
What they're really asking
Analysis that doesn't drive a decision is wasted. Can you close the loop?
How to answer it
Describe a specific project — even from coursework or a personal data project. What was the business question? What did you find in the data? What did you recommend and why? The recommendation is the point — not the technical process. If the recommendation was acted on and produced a result, say so. If you're newer to the field, frame a coursework project in terms of "if this were a real business, here's the decision I would have recommended based on the data."
Question 15
"Describe a time you had to explain a complex analysis to a non-technical audience."
What they're really asking
Communication is as important as technical skill. Can you translate?
How to answer it
Describe the audience, the analysis, and how you adapted your communication. Key techniques: leading with the conclusion rather than the methodology, using analogies, showing a single clear chart instead of a dashboard, and anticipating the questions a non-technical person would ask. The measure of success: did they understand and could they make a decision from it? If possible, describe a specific moment where your explanation changed someone's view or action.
Question 16
"Tell me about a time your analysis produced a surprising or counterintuitive result."
What they're really asking
Do you have intellectual curiosity? Do you trust the data even when it challenges your assumptions?
How to answer it
Describe a genuine example — a result that contradicted what you expected or what stakeholders assumed. Walk through how you validated it (checked for data errors before accepting the result), communicated it, and what happened next. The best answers show that you didn't just accept the surprising result at face value, but also didn't dismiss it. Intellectual honesty and rigour in both directions are what interviewers want to see.
Question 17
"How do you approach a problem when you don't have the data you need?"
What they're really asking
Real-world data is always incomplete. Can you work with constraints?
How to answer it
Be honest about constraints rather than pretending data doesn't exist. Options include: using proxy metrics, making explicit assumptions and testing sensitivity to those assumptions, seeking alternative data sources, or scoping the analysis to what can be answered reliably. Communicate clearly about what you can and cannot conclude with the available data — overconfident analysis with bad data is worse than no analysis at all.
Question 18
"Tell me about a personal or side project involving data."
What they're really asking
Are you genuinely curious about data beyond what you've been asked to do?
How to answer it
Side projects carry enormous weight here — even something small. A Kaggle competition, a personal dataset you explored (your Spotify history, football results, local housing data), a dashboard you built, or a blog post where you wrote up an analysis. The point isn't the sophistication of the project — it's that you did it because you found it genuinely interesting. That signals the kind of curiosity that makes a good analyst.
Question 19
"How do you make sure your analysis is reproducible?"
What they're really asking
Professional data work needs to be auditable and repeatable. Do you think about this?
How to answer it
Describe your practices: writing documented, commented code rather than one-off scripts; using version control (Git); keeping raw data separate from transformed data; documenting assumptions and data sources; using parameterised queries rather than hardcoded values. Even if you're early in your career, showing that you think about reproducibility signals professional maturity. Analysts who can't reproduce their own work six months later are a liability.
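If asked what a parameterised query looks like in practice, a minimal sqlite3 illustration (table and values invented) might be:
```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (region TEXT, revenue REAL)")
con.execute("INSERT INTO orders VALUES ('EMEA', 120.0), ('APAC', 80.0)")

# Parameterised: the region is an input, not a value buried in the SQL,
# so the same query can be re-run for any region and audited later.
region = "EMEA"
total = con.execute(
    "SELECT SUM(revenue) FROM orders WHERE region = ?", (region,)
).fetchone()[0]
print(total)  # 120.0
```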
Question 20
"Do you have any questions for us?"
What they're really asking
Are you genuinely interested in this role and the data environment?
How to answer it
Strong questions for data analyst roles: "What does the data infrastructure look like — what tools and databases does the team work with day to day?" / "How mature is the data culture here — do stakeholders come to the team with questions, or is the team still building trust and proving value?" / "What does a typical first three months look like for someone in this role?" / "What's the most interesting analysis the team has done in the past year?"
One more thing

Build a portfolio before you start applying. Even two or three well-documented analyses on GitHub or a personal site — with clear write-ups explaining your thinking — will set you apart from the majority of candidates who only list tools on their CV.