
Databricks Interview Questions

Prepare for success with our curated collection of interview questions. Designed to help students practice and build confidence, these questions cover a range of topics and real-world scenarios to get you ready for your next interview.
Q1. IP to CIDR

Given a start IP address and a number n, generate the minimum list of CIDR blocks that will cover the range of n IPs starting from the start IP.

 

Example:

Input: startIP = "255.0.0.7", n = 10

Output: ["255.0.0.7/32","255.0.0.8/29","255.0.0.16/32"]

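Explanation: Each CIDR block must start on an address aligned to its size. 255.0.0.7 is covered alone (/32), 255.0.0.8 through 255.0.0.15 form one /29 block of 8 addresses, and 255.0.0.16 is covered alone (/32), giving 10 addresses in 3 blocks.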
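A minimal Python sketch of one greedy approach: at each step, take the largest block that is aligned at the current address and does not exceed the remaining count. The function names (ip_to_int, int_to_ip, ip_to_cidr) are illustrative choices, not part of the question.

```python
def ip_to_int(ip: str) -> int:
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def int_to_ip(x: int) -> str:
    return ".".join(str((x >> shift) & 255) for shift in (24, 16, 8, 0))


def ip_to_cidr(start_ip: str, n: int) -> list[str]:
    cur = ip_to_int(start_ip)
    blocks = []
    while n > 0:
        # The lowest set bit of cur is the largest block size aligned at cur.
        size = cur & -cur if cur else 1 << 32
        # Shrink the block until it fits in the remaining count.
        while size > n:
            size >>= 1
        prefix = 32 - (size.bit_length() - 1)
        blocks.append(f"{int_to_ip(cur)}/{prefix}")
        cur += size
        n -= size
    return blocks


print(ip_to_cidr("255.0.0.7", 10))
```

On the example input this prints the three blocks listed in the expected output.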

Given a binary tree where each node has a unique value and a target value k, find the value of the closest leaf node to the target.

 

Example:

Input: root = [1,2,3,4,null,null,null], k=2

Output: 4

Explanation: The closest leaf to node 2 is node 4.
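One common way to approach this, sketched below in Python, is to record each node's parent and then run a breadth-first search outward from the target, treating the tree as an undirected graph. The Node class and closest_leaf name are assumptions made for this illustration.

```python
from collections import deque


class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right


def closest_leaf(root, k):
    parent = {}   # child node -> parent node
    target = None
    stack = [root]
    while stack:
        node = stack.pop()
        if node.val == k:
            target = node
        for child in (node.left, node.right):
            if child is not None:
                parent[child] = node
                stack.append(child)
    if target is None:
        return None  # k is not in the tree

    # BFS over children and parent; the first leaf reached is the closest.
    queue = deque([target])
    seen = {target}
    while queue:
        node = queue.popleft()
        if node.left is None and node.right is None:
            return node.val
        for nxt in (node.left, node.right, parent.get(node)):
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


# Tree from the example: 1 -> (2 -> 4), 3
root = Node(1, Node(2, Node(4)), Node(3))
print(closest_leaf(root, 2))
```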

Find the smallest missing positive integer in an unsorted integer array. 

 

Example:

Input: nums =​

Output: 3

Explanation: Smallest missing positive is 3.
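A possible O(n) time, O(1) extra space approach is to place each value v in the range 1..n at index v - 1 and then scan for the first mismatch. The sketch below assumes that technique; the sample input [1, 2, 0] is a hypothetical one chosen for illustration, since the example input above is not shown.

```python
def first_missing_positive(nums: list[int]) -> int:
    n = len(nums)
    for i in range(n):
        # Swap nums[i] into its home slot until the slot already holds it
        # or the value falls outside 1..n.
        while 1 <= nums[i] <= n and nums[nums[i] - 1] != nums[i]:
            j = nums[i] - 1
            nums[i], nums[j] = nums[j], nums[i]
    for i in range(n):
        if nums[i] != i + 1:
            return i + 1
    return n + 1


print(first_missing_positive([1, 2, 0]))  # hypothetical input
```

Note that this version reorders the input list in place.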

Given two sparse matrices, compute their product efficiently.

 

Example:

Input: A = [,]​ B = [,,]​

Output: [,]​

Explanation:  Standard matrix multiplication, but optimized for sparsity.
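As a rough sketch of what "optimized for sparsity" can mean: skip zero entries of A, and pre-filter each row of B down to its non-zero entries so the inner loop only touches products that can contribute. The matrices used below are hypothetical, since the example values above are not shown.

```python
def sparse_multiply(a, b):
    cols = len(b[0])
    # For each row of B keep only (column, value) pairs with non-zero value.
    b_rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in b]
    result = [[0] * cols for _ in range(len(a))]
    for i, row in enumerate(a):
        for k, a_ik in enumerate(row):
            if a_ik == 0:
                continue  # a zero in A contributes nothing
            for j, b_kj in b_rows[k]:
                result[i][j] += a_ik * b_kj
    return result


# Hypothetical sparse inputs for illustration.
A = [[1, 0, 0], [-1, 0, 3]]
B = [[7, 0, 0], [0, 0, 0], [0, 0, 1]]
print(sparse_multiply(A, B))
```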

Design a data structure that supports set, snap, and get operations efficiently. 

 

Example:

Input: set(0,5), snap(), set(0,6), get(0,0)

Output: 5

Explanation: You can take snapshots of the array, and get historical values.
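One way to avoid copying the whole array on every snapshot is to keep, per index, a sorted history of (snap_id, value) records and binary-search it on get. The sketch below assumes integer values, an initial value of 0, and a constructor that takes the array length; none of these details are stated in the question.

```python
from bisect import bisect_right


class SnapshotArray:
    def __init__(self, length: int):
        # history[i] holds (snap_id, value) pairs in increasing snap_id order.
        # The (-1, 0) record represents the assumed initial value 0.
        self.history = [[(-1, 0)] for _ in range(length)]
        self.snap_id = 0

    def set(self, index: int, val: int) -> None:
        records = self.history[index]
        if records[-1][0] == self.snap_id:
            records[-1] = (self.snap_id, val)  # overwrite within same snapshot
        else:
            records.append((self.snap_id, val))

    def snap(self) -> int:
        self.snap_id += 1
        return self.snap_id - 1

    def get(self, index: int, snap_id: int) -> int:
        records = self.history[index]
        # Last record whose snap_id is <= the requested snap_id.
        pos = bisect_right(records, (snap_id, float("inf"))) - 1
        return records[pos][1]


arr = SnapshotArray(1)
arr.set(0, 5)
arr.snap()
arr.set(0, 6)
print(arr.get(0, 0))
```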

Given an integer array, increment elements as many times as needed so that all values are unique, using the minimum total number of increments.

Example:

Input: nums =​

Output: 6

Explanation: Increment the duplicated values until every element is unique; the minimum total number of increments is 6.
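A simple greedy sketch: sort the values, then raise each one to just above its predecessor when it collides. The input below is a hypothetical array chosen for illustration, because the example input above is not shown.

```python
def min_increments_for_unique(nums: list[int]) -> int:
    moves = 0
    prev = None
    for x in sorted(nums):
        if prev is not None and x <= prev:
            # Raise x to the smallest value not yet taken.
            moves += prev + 1 - x
            x = prev + 1
        prev = x
    return moves


print(min_increments_for_unique([3, 2, 1, 2, 1, 7]))  # hypothetical input
```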

Design a set where insert, remove, and getRandom (return a random element) all run in average O(1) time.

 

Example:

Input / Usage: insert(1) → True

insert(2) → True
getRandom() → 1 or 2
remove(1) → True
getRandom() → 2

Output: Random element; accurate insert/remove success/failure.

Explanation: Use a hash table and dynamic array.
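The hash table plus dynamic array idea can be sketched as follows: the array stores the elements, the dictionary maps each element to its array position, and removal swaps the last element into the vacated slot. Method names follow Python conventions (get_random) rather than the usage shown above.

```python
import random


class RandomizedSet:
    def __init__(self):
        self.items = []   # dynamic array of elements
        self.index = {}   # element -> position in self.items

    def insert(self, val) -> bool:
        if val in self.index:
            return False
        self.index[val] = len(self.items)
        self.items.append(val)
        return True

    def remove(self, val) -> bool:
        pos = self.index.pop(val, None)
        if pos is None:
            return False
        last = self.items.pop()
        if pos < len(self.items):
            # val was not the last element: move the last one into its slot.
            self.items[pos] = last
            self.index[last] = pos
        return True

    def get_random(self):
        return random.choice(self.items)


s = RandomizedSet()
print(s.insert(1), s.insert(2), s.remove(1), s.get_random())
```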

Given an elevation map (non-negative integers), compute how much water can be trapped after rainfall. 

 

Example:

Input: height =​

Output: 6

Explanation: Valleys between taller bars can trap water; the sum of the trapped water across all valleys is 6.
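One well-known O(n) time, O(1) space approach uses two pointers that move inward while tracking the tallest bar seen from each side. The sketch below assumes that technique; the input is a hypothetical elevation map, since the example input above is not shown.

```python
def trap_two_pointers(height: list[int]) -> int:
    left, right = 0, len(height) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        # The shorter side limits the water level at that pointer.
        if height[left] < height[right]:
            left_max = max(left_max, height[left])
            water += left_max - height[left]
            left += 1
        else:
            right_max = max(right_max, height[right])
            water += right_max - height[right]
            right -= 1
    return water


print(trap_two_pointers([0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]))  # hypothetical input
```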

Given a binary array, count subarrays where the number of 0’s and 1’s are equal. 

 

Example:

Input: arr =​

Output: 9

Explanation: Count every subarray in which the number of 1's equals the number of 0's.
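Checking every subarray directly is quadratic. A common linear-time alternative, sketched here, treats each 0 as -1 and counts pairs of positions with the same running balance, since equal balances bracket a subarray with as many 0's as 1's. The input below is hypothetical.

```python
from collections import defaultdict


def count_equal_zero_one_subarrays(arr: list[int]) -> int:
    seen = defaultdict(int)  # running balance -> number of prefixes with it
    seen[0] = 1              # the empty prefix
    balance = 0
    count = 0
    for bit in arr:
        balance += 1 if bit == 1 else -1
        # Every earlier prefix with the same balance ends a valid subarray here.
        count += seen[balance]
        seen[balance] += 1
    return count


print(count_equal_zero_one_subarrays([0, 1, 0, 1]))  # hypothetical input
```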

Count the subarrays that contain every distinct element of the original array.

 

Example:

Input: arr =​

Output: 5

Explanation: Check all subarrays, count those with all three values {1,2,3}.
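Instead of checking all subarrays, a sliding window can do this in linear time: once a window ending at index right contains every distinct value, every extension of it to the right does too. A sketch under that assumption, with a hypothetical input:

```python
from collections import Counter


def count_complete_subarrays(arr: list[int]) -> int:
    need = len(set(arr))   # number of distinct values in the whole array
    window = Counter()
    left = 0
    total = 0
    for right, x in enumerate(arr):
        window[x] += 1
        # While the window is complete, count all subarrays that start at
        # `left` and end at `right` or later, then drop the left element.
        while len(window) == need:
            total += len(arr) - right
            y = arr[left]
            window[y] -= 1
            if window[y] == 0:
                del window[y]
            left += 1
    return total


print(count_complete_subarrays([1, 3, 1, 2, 2]))  # hypothetical input
```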

Given the root of a binary tree and a target node, return the value of the closest leaf node to the target. 


Example:
Input:
root = [1,2,3,null,null,4,5], target = 2

Output: 2

Explanation: Target is already a leaf.

Given an array, find the maximum in each subarray (window) of size k.

 

Example:

Input: nums = [1,3,-1,-3,5,3,6,7], k = 3

Output: [3,3,5,5,6,7]

Explanation: Track each window's maximum efficiently with a deque of candidate indices.
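The deque approach can be sketched like this: the deque holds indices whose values are in decreasing order, so the front is always the maximum of the current window. The function name is an illustrative choice.

```python
from collections import deque


def max_sliding_window(nums: list[int], k: int) -> list[int]:
    window = deque()  # indices; their values decrease from front to back
    result = []
    for i, x in enumerate(nums):
        # Smaller or equal values behind x can never be a maximum again.
        while window and nums[window[-1]] <= x:
            window.pop()
        window.append(i)
        # Drop the front index once it slides out of the window.
        if window[0] <= i - k:
            window.popleft()
        if i >= k - 1:
            result.append(nums[window[0]])
    return result


print(max_sliding_window([1, 3, -1, -3, 5, 3, 6, 7], 3))
```

Each index is pushed and popped at most once, so the whole pass is O(n).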

Q1. Web Crawler Multithreaded

Design a multithreaded web crawler that can crawl all reachable URLs from a given starting point.

 

Example:

Input: startUrl = "http://databricks.com"

Output: All reachable URLs from the starting URL.

Explanation: Concurrency control and visited URL marking are central to this system.
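A minimal sketch of one possible design, using Python's ThreadPoolExecutor: worker threads fetch pages, while a single coordinating thread owns the visited set, so no lock is needed around it. The get_links callable is a hypothetical stand-in for a real fetch-and-parse step, and the tiny in-memory graph below replaces actual network access.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def crawl(start_url, get_links, max_workers=8):
    """Return every URL reachable from start_url.

    get_links(url) is a hypothetical callable returning the URLs found
    on a page; it runs on worker threads.
    """
    visited = {start_url}  # touched only by the coordinating thread
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(get_links, start_url)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                for url in future.result():
                    if url not in visited:
                        visited.add(url)  # mark before scheduling
                        pending.add(pool.submit(get_links, url))
    return visited


# Fake link graph standing in for real pages (illustrative only).
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["c"]}
print(sorted(crawl("a", lambda url: graph.get(url, []))))
```

Marking a URL as visited before submitting it is what prevents two workers from fetching the same page.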

Implement a system to add scores, reset users, and query the top K scores. 

 

Example:

Input: addScore(1,73), addScore(2,56), addScore(3,39), addScore(4,51), addScore(5,4), top(1), reset(1), top(2)

Output: ,​

Explanation: Track, update, and query scores with efficiency.
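A small sketch using a dictionary for scores and heapq.nlargest for the query. It assumes top(k) returns the k highest scores as a list; some versions of this question ask for their sum instead, and the question above does not say which. Method names are adapted to Python style.

```python
import heapq


class Leaderboard:
    def __init__(self):
        self.scores = {}  # player id -> accumulated score

    def add_score(self, player_id: int, score: int) -> None:
        self.scores[player_id] = self.scores.get(player_id, 0) + score

    def top(self, k: int) -> list[int]:
        # Assumption: return the k highest scores, highest first.
        return heapq.nlargest(k, self.scores.values())

    def reset(self, player_id: int) -> None:
        self.scores.pop(player_id, None)


board = Leaderboard()
for player, score in [(1, 73), (2, 56), (3, 39), (4, 51), (5, 4)]:
    board.add_score(player, score)
print(board.top(1))
board.reset(1)
print(board.top(2))
```

With n players, top(k) costs O(n log k) here; a sorted container would make queries cheaper at the cost of slower updates.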

Design a system to count the number of hits received in the past 5 minutes. 

 

Example:

Input: hit(1), hit(2), hit(300), getHits(300), getHits(301)

Output: 3, 2

Explanation: Count is maintained in a time window.
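A queue-based sketch of the time window: hits are appended as they arrive, and timestamps older than 300 seconds are dropped on each query. It assumes timestamps are in seconds and arrive in non-decreasing order, which the question does not state explicitly.

```python
from collections import deque


class HitCounter:
    WINDOW = 300  # seconds in 5 minutes

    def __init__(self):
        self.hits = deque()  # timestamps in arrival order

    def hit(self, timestamp: int) -> None:
        self.hits.append(timestamp)

    def get_hits(self, timestamp: int) -> int:
        # Keep only hits in the interval (timestamp - 300, timestamp].
        while self.hits and self.hits[0] <= timestamp - self.WINDOW:
            self.hits.popleft()
        return len(self.hits)


counter = HitCounter()
for t in (1, 2, 300):
    counter.hit(t)
print(counter.get_hits(300), counter.get_hits(301))
```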

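A repeat of the water-trapping question from the DSA round, but also expect to design the solution using stacks (the kind of stack-based reasoning used in patience sort) for the water-trapping logic.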

 

Example:

Input: height =​

Output: 6

Explanation: Algorithm uses a stack for efficient traversal.
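A sketch of the stack-based variant: the stack keeps indices of bars that have not yet found a taller bar to their right, and each pop fills one horizontal layer of water between the new bar and the bar left on the stack. The input is a hypothetical elevation map, since the example input above is not shown.

```python
def trap_with_stack(height: list[int]) -> int:
    stack = []  # indices of bars; heights do not increase from bottom to top
    water = 0
    for i, h in enumerate(height):
        while stack and height[stack[-1]] < h:
            bottom = stack.pop()
            if not stack:
                break  # no left wall, so nothing is trapped
            left = stack[-1]
            width = i - left - 1
            depth = min(height[left], h) - height[bottom]
            water += width * depth
        stack.append(i)
    return water


print(trap_with_stack([0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]))  # hypothetical input
```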

Design a data structure supporting set(key, value, timestamp) and get(key, timestamp), returning value at or before given timestamp. 


Example:

Input: set("foo","bar",1)
get("foo",1) → "bar"
get("foo",3) → "bar"

Output: "bar", "bar"

Explanation: Use hash-map of arrays and binary search for efficient retrieval.
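The hash map of arrays with binary search might look like the sketch below. It assumes timestamps for a given key are set in increasing order and that get returns an empty string when no value exists at or before the timestamp; both are assumptions, not stated in the question.

```python
from bisect import bisect_right
from collections import defaultdict


class TimeMap:
    def __init__(self):
        self.times = defaultdict(list)   # key -> sorted timestamps
        self.values = defaultdict(list)  # key -> values, parallel to times

    def set(self, key: str, value: str, timestamp: int) -> None:
        # Assumes timestamps for a key arrive in increasing order.
        self.times[key].append(timestamp)
        self.values[key].append(value)

    def get(self, key: str, timestamp: int) -> str:
        times = self.times.get(key, [])
        pos = bisect_right(times, timestamp)  # count of timestamps <= query
        return self.values[key][pos - 1] if pos else ""


tm = TimeMap()
tm.set("foo", "bar", 1)
print(tm.get("foo", 1), tm.get("foo", 3))
```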

Implement a Skiplist for O(log n) insert, erase, and search operations.

 

Example:

Input / Usage: add(1)
erase(0)
search(1) → True

Output: True/False for existence

Explanation: Skiplist uses probabilistic levels for fast search/update/delete.
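A compact sketch of a skiplist with coin-flip level selection. MAX_LEVEL = 16 and the promotion probability 0.5 are illustrative choices, and SkipNode is a helper name introduced here. The expected cost of each operation is O(log n).

```python
import random


class SkipNode:
    def __init__(self, val, level):
        self.val = val
        self.forward = [None] * level  # next node at each level


class Skiplist:
    MAX_LEVEL = 16  # illustrative cap on tower height

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)

    def _predecessors(self, target):
        # For each level, the last node whose value is < target.
        preds = [self.head] * self.MAX_LEVEL
        node = self.head
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while node.forward[lvl] is not None and node.forward[lvl].val < target:
                node = node.forward[lvl]
            preds[lvl] = node
        return preds

    def search(self, target) -> bool:
        nxt = self._predecessors(target)[0].forward[0]
        return nxt is not None and nxt.val == target

    def add(self, num) -> None:
        level = 1
        while level < self.MAX_LEVEL and random.random() < 0.5:
            level += 1  # promote with probability 1/2
        node = SkipNode(num, level)
        preds = self._predecessors(num)
        for lvl in range(level):
            node.forward[lvl] = preds[lvl].forward[lvl]
            preds[lvl].forward[lvl] = node

    def erase(self, num) -> bool:
        preds = self._predecessors(num)
        node = preds[0].forward[0]
        if node is None or node.val != num:
            return False
        for lvl in range(len(node.forward)):
            if preds[lvl].forward[lvl] is node:
                preds[lvl].forward[lvl] = node.forward[lvl]
        return True


sl = Skiplist()
sl.add(1)
print(sl.erase(0), sl.search(1))
```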

Q1. Data Lakehouse Architecture

Explain and design a hybrid data architecture combining the best of data warehouses and data lakes (Lakehouse).

Design how teams can share, version, and collaborate on code, data, and output within Databricks notebooks.

Design an end-to-end data streaming pipeline: ingest, process, store, and serve streaming data (using Structured Streaming, Delta Lake, and scalable clusters).

Describe your approach to designing a pipeline for big data ingestion, transformation, storage, and analytics using Databricks and Spark.
