Databricks Interview Questions
- DSA
- LLD
- HLD
DSA
Q1: IP to CIDR
Given a starting IPv4 address and a number n, return the shortest list of CIDR blocks that covers exactly the n consecutive addresses beginning at the start IP.
Example:
Input: startIP = "255.0.0.7", n = 10
Output: ["255.0.0.7/32", "255.0.0.8/29", "255.0.0.16/32"]
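Explanation: Each block is limited both by the alignment of its first address and by how many addresses are still needed. 255.0.0.7 is odd, so it can only start a /32; 255.0.0.8 is aligned to 8, so a /29 covers .8 through .15; the one remaining address, .16, needs another /32.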
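A minimal Python sketch of the greedy approach, assuming IPv4 input and a range that does not run past 255.255.255.255. The function names (ip_to_int, int_to_ip, ip_to_cidr) are my own. At each step it takes the largest block that is both aligned at the current address (its lowest set bit) and no larger than the remaining count.

```python
def ip_to_int(ip: str) -> int:
    """Pack a dotted-quad IPv4 string into a 32-bit integer."""
    value = 0
    for part in ip.split("."):
        value = value * 256 + int(part)
    return value


def int_to_ip(value: int) -> str:
    """Unpack a 32-bit integer into dotted-quad form."""
    return ".".join(str((value >> shift) & 255) for shift in (24, 16, 8, 0))


def ip_to_cidr(start_ip: str, n: int) -> list[str]:
    x = ip_to_int(start_ip)
    blocks = []
    while n > 0:
        # Largest block aligned at x is its lowest set bit (x == 0 is aligned to everything).
        size = x & -x if x else 1 << 32
        # Shrink the block until it covers no more than the n addresses still needed.
        while size > n:
            size //= 2
        # A block of 2**b addresses has prefix 32 - b, and size.bit_length() == b + 1.
        blocks.append(f"{int_to_ip(x)}/{33 - size.bit_length()}")
        x += size
        n -= size
    return blocks


print(ip_to_cidr("255.0.0.7", 10))  # ['255.0.0.7/32', '255.0.0.8/29', '255.0.0.16/32']
```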
Q2: Closest Leaf in a Binary Tree
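Given a binary tree in which every node has a unique value, and a target value k, return the value of the leaf nearest to the node with value k, where distance is the number of edges on the path (the path may go up through parents).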
Example:
Input: root = [1,2,3,4,null,null,null], k=2
Output: 4
Explanation: Leaf 4 is one edge from node 2, while leaf 3 is two edges away (2 → 1 → 3).
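One common approach, sketched here in Python: treat the tree as an undirected graph (so the search can move to a parent as well as to children) and run BFS from the target; the first leaf dequeued is the closest. The TreeNode class, the build_tree helper for LeetCode-style level-order lists, and the function names are assumptions for illustration, and k is assumed to exist in the tree.

```python
from collections import deque


class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right


def build_tree(values):
    """Build a tree from a level-order list where None marks a missing child."""
    if not values or values[0] is None:
        return None
    root = TreeNode(values[0])
    queue, i = deque([root]), 1
    while queue and i < len(values):
        node = queue.popleft()
        if values[i] is not None:
            node.left = TreeNode(values[i])
            queue.append(node.left)
        i += 1
        if i < len(values) and values[i] is not None:
            node.right = TreeNode(values[i])
            queue.append(node.right)
        i += 1
    return root


def find_closest_leaf(root, k):
    # Record each node's neighbours (children and parent) so BFS can move upward too.
    graph = {root: []}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in (node.left, node.right):
            if child is not None:
                graph[child] = [node]
                graph[node].append(child)
                stack.append(child)
    start = next(node for node in graph if node.val == k)
    # BFS visits nodes in order of distance, so the first leaf reached is the closest.
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node.left is None and node.right is None:
            return node.val
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
```

With the example above, find_closest_leaf(build_tree([1, 2, 3, 4, None, None, None]), 2) returns 4.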
Q3: First Missing Positive
Find the smallest missing positive integer in an unsorted integer array.
Example:
Input: nums =
Output: 3
Explanation: Smallest missing positive is 3.
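A sketch of the usual O(n) time, O(1) extra space technique (cyclic placement), written in Python; the function name is my own, and it works on a copy so the caller's list is untouched. Every value v in 1..n is swapped into slot v - 1; afterwards the first slot i that does not hold i + 1 gives the answer.

```python
def first_missing_positive(nums: list[int]) -> int:
    nums = list(nums)  # work on a copy
    n = len(nums)
    for i in range(n):
        # Keep swapping until slot i holds a value that is out of range or already placed.
        while 1 <= nums[i] <= n and nums[nums[i] - 1] != nums[i]:
            j = nums[i] - 1
            nums[i], nums[j] = nums[j], nums[i]
    for i in range(n):
        if nums[i] != i + 1:
            return i + 1
    return n + 1
```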
Q4: Sparse Matrix Multiplication
Given two sparse matrices, compute their product efficiently.
Example:
Input: A = [,] B = [,,]
Output: [,]
Explanation: Standard matrix multiplication, but zero entries are skipped so the work scales with the number of nonzero entries rather than with every row-column pair.
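A minimal Python sketch, assuming A is m×k and B is k×n given as non-empty lists of lists. It pre-compresses each row of B to its nonzero (column, value) pairs and only multiplies where A's entry is nonzero; the function name and test matrices are my own.

```python
def sparse_multiply(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    m, n = len(A), len(B[0])
    # Row p of B reduced to its nonzero (column, value) pairs.
    b_rows = [[(j, v) for j, v in enumerate(row) if v] for row in B]
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for p, a in enumerate(A[i]):
            if a:  # zero entries of A contribute nothing
                for j, b in b_rows[p]:
                    C[i][j] += a * b
    return C
```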
Q5: Snapshot Array
Design an array-like data structure of a given length that supports set(index, val), snap() (which returns a snapshot id), and get(index, snap_id), all efficiently.
Example:
Input: set(0,5), snap(), set(0,6), get(0,0)
Output: 5
Explanation: snap() returns id 0, so get(0,0) returns the value index 0 held at that snapshot, 5, even though it was later set to 6.
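A Python sketch of one standard design, assuming all slots start at 0: instead of copying the array on every snap(), each index keeps a short history of (snapshot id, value) changes, and get binary-searches that history with bisect. The class layout is my own.

```python
import bisect


class SnapshotArray:
    def __init__(self, length: int):
        self.snap_id = 0
        # For each index: the snapshot ids at which it changed, and the value set then.
        self.ids = [[0] for _ in range(length)]
        self.vals = [[0] for _ in range(length)]

    def set(self, index: int, val: int) -> None:
        ids, vals = self.ids[index], self.vals[index]
        if ids[-1] == self.snap_id:
            vals[-1] = val  # overwrite within the current, not-yet-snapped version
        else:
            ids.append(self.snap_id)
            vals.append(val)

    def snap(self) -> int:
        self.snap_id += 1
        return self.snap_id - 1

    def get(self, index: int, snap_id: int) -> int:
        # Last change recorded at or before snap_id.
        pos = bisect.bisect_right(self.ids[index], snap_id) - 1
        return self.vals[index][pos]
```

Memory grows with the number of set calls rather than with length × snapshots.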
Q6: Minimum Increment to Make Array Unique
Given an integer array, where one move increments any element by 1, return the minimum number of moves needed to make every value unique.
Example:
Input: nums =
Output: 6
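Explanation: Sort the array; whenever an element is not larger than its (possibly already raised) predecessor, raise it to one more than that predecessor. The sum of these raises is the minimum number of moves, 6 here.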
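The sort-then-raise greedy can be sketched in Python as follows (the function name and test inputs are my own). need tracks the smallest value the next element may take; any element below it is raised to exactly need.

```python
def min_increment_for_unique(nums: list[int]) -> int:
    moves = 0
    need = None  # smallest value the next element is allowed to take
    for x in sorted(nums):
        if need is not None and x < need:
            moves += need - x  # raise x to need
            need += 1
        else:
            need = x + 1
    return moves
```

For instance, [3, 2, 1, 2, 1, 7] sorts to [1, 1, 2, 2, 3, 7] and becomes [1, 2, 3, 4, 5, 7] at a cost of 0 + 1 + 1 + 2 + 2 + 0 = 6 moves.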
Q7: Insert Delete GetRandom O(1)
Design a set where insert, remove, and getRandom (return an element chosen uniformly at random) all run in average O(1) time.
Example:
Input / Usage: insert(1) → True
insert(2) → True
getRandom() → 1 or 2
remove(1) → True
getRandom() → 2
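Output: insert and remove return whether the set changed; getRandom returns each current element with equal probability.
Explanation: Store the elements in a dynamic array and keep a hash table from each value to its array index. To remove, move the last element into the removed slot and pop the array's tail, so no shifting is needed.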
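A Python sketch of the hash table plus dynamic array design; get_random plays the role of getRandom, and the class and method names are my own.

```python
import random


class RandomizedSet:
    def __init__(self):
        self.vals = []  # dense array of elements, so random.choice is O(1)
        self.pos = {}   # value -> index in self.vals

    def insert(self, val) -> bool:
        if val in self.pos:
            return False
        self.pos[val] = len(self.vals)
        self.vals.append(val)
        return True

    def remove(self, val) -> bool:
        if val not in self.pos:
            return False
        i = self.pos.pop(val)
        last = self.vals.pop()
        if i < len(self.vals):  # removed element was not the tail: move tail into its slot
            self.vals[i] = last
            self.pos[last] = i
        return True

    def get_random(self):
        return random.choice(self.vals)
```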
Q8: Trapping Rain Water
Given an elevation map (non-negative integers), compute how much water can be trapped after rainfall.
Example:
Input: height =
Output: 6
Explanation: The water above each bar is the smaller of the tallest bar to its left and the tallest bar to its right, minus the bar's own height (floored at 0); summing this over all bars gives 6.
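One O(n) time, O(1) space way to compute that sum is the two-pointer technique, sketched below in Python (the function name is my own). The pointer on the lower side is safe to settle, because its water level is limited by its own side's running maximum.

```python
def trap(height: list[int]) -> int:
    left, right = 0, len(height) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        if height[left] < height[right]:
            # The right side has a bar at least this tall, so left_max is the binding limit.
            left_max = max(left_max, height[left])
            water += left_max - height[left]
            left += 1
        else:
            right_max = max(right_max, height[right])
            water += right_max - height[right]
            right -= 1
    return water
```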
Q9: Count Subarrays with Equal Number of 1’s and 0’s
Given a binary array, count the subarrays in which the number of 0’s equals the number of 1’s.
Example:
Input: arr =
Output: 9
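Explanation: Checking every subarray works but is quadratic. Instead, treat each 0 as -1: a subarray is balanced exactly when its sum is 0, that is, when the prefix sums at its two ends are equal, so counting pairs of equal prefix sums gives the answer in O(n).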
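A Python sketch of the prefix-sum counting (the function name is my own): a hash map counts how often each running sum has been seen, and each new prefix pairs with every earlier equal prefix.

```python
from collections import defaultdict


def count_balanced_subarrays(arr: list[int]) -> int:
    seen = defaultdict(int)
    seen[0] = 1  # the empty prefix
    prefix = count = 0
    for x in arr:
        prefix += 1 if x == 1 else -1  # 0 counts as -1
        count += seen[prefix]          # every earlier equal prefix closes a balanced subarray
        seen[prefix] += 1
    return count
```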
Q10: Count Subarrays with Total Distinct Elements Same as Original Array
Count the subarrays that contain every distinct element of the original array.
Example:
Input: arr =
Output: 5
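Explanation: The distinct values here are {1,2,3}, so count the subarrays that contain all three. Checking every subarray works, but a sliding window runs in O(n): once a window [left, right] contains all distinct values, every extension of it to the right does too.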
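A Python sketch of that sliding window (the function name is my own). For each left end it advances right to the shortest window holding all k distinct values, then counts that window and every extension of it in one step.

```python
from collections import defaultdict


def count_full_distinct_subarrays(arr: list[int]) -> int:
    k = len(set(arr))
    n = len(arr)
    freq = defaultdict(int)
    distinct = right = count = 0
    for left in range(n):
        # Grow the half-open window [left, right) until it holds all k distinct values.
        while right < n and distinct < k:
            if freq[arr[right]] == 0:
                distinct += 1
            freq[arr[right]] += 1
            right += 1
        if distinct == k:
            count += n - right + 1  # windows ending at right - 1, right, ..., n - 1
        # Drop arr[left] before the left end moves on.
        freq[arr[left]] -= 1
        if freq[arr[left]] == 0:
            distinct -= 1
    return count
```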
Q11: Closest Leaf in a Binary Tree
Given the root of a binary tree and a target node, return the value of the closest leaf node to the target.
Example:
Input: root = [1,2,3,null,null,4,5], target = 2
Output: 2
Explanation: Target is already a leaf.
Q12: Sliding Window Maximum
Given an array and an integer k, return the maximum of each contiguous window of size k as the window slides from left to right.
Example:
Input: nums = [1,3,-1,-3,5,3,6,7], k = 3
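Output: [3,3,5,5,6,7]
Explanation: Keep a deque of indices whose values are in decreasing order; the front is always the current window's maximum, and each index is pushed and popped at most once, so the whole pass is O(n).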
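A Python sketch of the monotonic-deque technique using collections.deque (the function name is my own):

```python
from collections import deque


def max_sliding_window(nums: list[int], k: int) -> list[int]:
    dq = deque()  # indices whose values are strictly decreasing from front to back
    out = []
    for i, x in enumerate(nums):
        # Smaller values behind x can never be a window maximum again.
        while dq and nums[dq[-1]] <= x:
            dq.pop()
        dq.append(i)
        # Evict the front index once it slides out of the window.
        if dq[0] <= i - k:
            dq.popleft()
        if i >= k - 1:
            out.append(nums[dq[0]])
    return out
```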
LLD
Q1. Web Crawler Multithreaded
Design a multithreaded web crawler that can crawl all reachable URLs from a given starting point.
Example:
Input: startUrl = "http://databricks.com"
Output: All reachable URLs from the starting URL.
Explanation: Concurrency control and visited URL marking are central to this system.
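A minimal Python sketch using concurrent.futures.ThreadPoolExecutor. The get_urls parameter is a hypothetical callable standing in for "fetch a page and return the links on it"; real crawlers would add politeness limits, error handling, and usually a same-host filter. The lock makes the check-then-add on the shared visited set atomic, so no URL is scheduled twice.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from threading import Lock
from typing import Callable, Iterable


def crawl(start_url: str, get_urls: Callable[[str], Iterable[str]], max_workers: int = 8) -> set[str]:
    visited = {start_url}
    lock = Lock()

    def task(url: str) -> list[str]:
        # Fetch in parallel, but mark URLs as visited under the lock.
        found = []
        for nxt in get_urls(url):
            with lock:
                if nxt not in visited:
                    visited.add(nxt)
                    found.append(nxt)
        return found

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(task, start_url)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                for url in fut.result():
                    pending.add(pool.submit(task, url))
    return visited
```

Usage with an in-memory link graph in place of real HTTP: crawl("http://a.com", lambda u: links.get(u, [])), where links is a dict from URL to outgoing URLs.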
Q2. Design Leaderboard
Implement a system to add scores, reset users, and query the top K scores.
Example:
Input: addScore(1,73), addScore(2,56), addScore(3,39), addScore(4,51), addScore(5,4), top(1), reset(1), top(2)
Output: [73], [56, 51]
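Explanation: Keep a hash map from player ID to accumulated score; addScore updates one entry, reset removes the player's entry, and top(K) selects the K largest scores.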
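A small Python sketch, assuming that repeated addScore calls for a player accumulate and that top(K) returns the K highest scores as a list. The snake_case names mirror addScore/top/reset and are my own. heapq.nlargest makes top O(n log K); a sorted structure would make it faster at the cost of slower updates.

```python
import heapq


class Leaderboard:
    def __init__(self):
        self.scores = {}  # player_id -> accumulated score

    def add_score(self, player_id: int, score: int) -> None:
        self.scores[player_id] = self.scores.get(player_id, 0) + score

    def top(self, k: int) -> list[int]:
        return heapq.nlargest(k, self.scores.values())

    def reset(self, player_id: int) -> None:
        self.scores.pop(player_id, None)
```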
Q3. Design Hit Counter
Design a system that counts the hits received in the past 5 minutes (300 seconds).
Example:
Input: hit(1), hit(2), hit(300), getHits(300), getHits(301)
Output: 3, 2
Explanation: Timestamps are in seconds and getHits(t) counts the hits in (t - 300, t], so the hit at time 1 drops out of the window at t = 301.
Q4. Trapping Rain Water (Stack Version)
Repeat of the DSA question, but you are also expected to design stack-based logic, both for patience sort and for trapping the water.
Example:
Input: height =
Output: 6
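Explanation: The stack holds indices of bars with non-increasing heights. When a taller bar arrives, each popped bar is a valley floor, and the water above it is bounded by the new bar and by the bar now on top of the stack.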
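A Python sketch of the stack version (the function name is my own). It adds water in horizontal layers: each pop settles one layer whose width is the gap between the two bounding bars and whose depth is the lower bound minus the floor.

```python
def trap_with_stack(height: list[int]) -> int:
    stack = []  # indices of bars with non-increasing heights
    water = 0
    for i, h in enumerate(height):
        while stack and height[stack[-1]] < h:
            bottom = stack.pop()
            if not stack:
                break  # no left wall, so nothing can be held
            left = stack[-1]
            width = i - left - 1
            depth = min(height[left], h) - height[bottom]
            water += width * depth
        stack.append(i)
    return water
```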
Q5. Time Based Key-Value Store
Design a data structure supporting set(key, value, timestamp) and get(key, timestamp), where get returns the value stored with the largest timestamp at or before the given timestamp.
Example:
Input: set("foo", "bar", 1)
get("foo", 1) → "bar"
get("foo", 3) → "bar"
Output: "bar", "bar"
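Explanation: Use a hash map from each key to its list of (timestamp, value) entries, which stays sorted if timestamps for a key arrive in increasing order, and binary-search that list in get.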
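A Python sketch using bisect, assuming timestamps for each key are set in strictly increasing order and that get returns "" when no value exists at or before the timestamp (both are assumptions, not stated above). Timestamps and values are kept in parallel lists so bisect can search the timestamps directly.

```python
import bisect
from collections import defaultdict


class TimeMap:
    def __init__(self):
        self.times = defaultdict(list)   # key -> sorted timestamps
        self.values = defaultdict(list)  # key -> values, parallel to self.times

    def set(self, key: str, value: str, timestamp: int) -> None:
        self.times[key].append(timestamp)
        self.values[key].append(value)

    def get(self, key: str, timestamp: int) -> str:
        times = self.times.get(key, [])
        # Number of entries with timestamp <= the query; the last of them is the answer.
        i = bisect.bisect_right(times, timestamp)
        return self.values[key][i - 1] if i else ""
```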
Q6. Design Skiplist
Implement a Skiplist with expected O(log n) add, erase, and search operations.
Example:
Input / Usage: add(1)
erase(0) → False
search(1) → True
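Output: search and erase return True or False depending on whether the value is present.
Explanation: A skiplist stacks several sorted linked lists; each node is promoted to the next level with a fixed probability (commonly 1/2), which gives expected O(log n) search, insert, and delete.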
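A compact Python sketch, assuming a promotion probability of 1/2 and a level cap of 16 (both my choices); duplicates are allowed and erase removes one copy. _predecessors finds, for every level, the last node whose value is below the target, which is all that add, erase, and search need.

```python
import random


class _Node:
    __slots__ = ("val", "next")

    def __init__(self, val, level: int):
        self.val = val
        self.next = [None] * level  # next[i] = successor on level i


class Skiplist:
    MAX_LEVEL = 16

    def __init__(self):
        self.head = _Node(None, self.MAX_LEVEL)  # sentinel, never compared
        self.level = 1  # number of levels currently in use

    def _random_level(self) -> int:
        lvl = 1
        while lvl < self.MAX_LEVEL and random.random() < 0.5:
            lvl += 1
        return lvl

    def _predecessors(self, target):
        # update[i] = last node on level i whose value is < target.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.next[i] is not None and node.next[i].val < target:
                node = node.next[i]
            update[i] = node
        return update

    def search(self, target) -> bool:
        node = self._predecessors(target)[0].next[0]
        return node is not None and node.val == target

    def add(self, num) -> None:
        update = self._predecessors(num)
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = _Node(num, lvl)
        for i in range(lvl):  # splice the node into each of its levels
            node.next[i] = update[i].next[i]
            update[i].next[i] = node

    def erase(self, num) -> bool:
        update = self._predecessors(num)
        node = update[0].next[0]
        if node is None or node.val != num:
            return False
        for i in range(len(node.next)):
            if update[i].next[i] is node:
                update[i].next[i] = node.next[i]
        while self.level > 1 and self.head.next[self.level - 1] is None:
            self.level -= 1  # drop levels that became empty
        return True
```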
HLD
Q1. Data Lakehouse Architecture
Explain and design a hybrid data architecture combining the best of data warehouses and data lakes (Lakehouse).
Q2. Collaborative Notebook Environment
Design how code, data, and output can be shared, versioned, and collaborated on by teams within Databricks notebooks.
Q3. Scalable Streaming Pipeline in Databricks
Design an end-to-end data streaming pipeline: ingest, process, store, and serve streaming data (using Structured Streaming, Delta Lake, and scalable clusters).
Q4. Data Pipeline Design
Describe your approach to designing a pipeline for big data ingestion, transformation, storage, and analytics using Databricks and Spark.