Databricks Interview Questions

Prepare for success with our curated collection of interview questions. Designed to help students practice and build confidence, these questions cover a range of topics and real-world scenarios to get you ready for your next interview.
Q1: Fibonacci Tree Shortest Path Navigation

Given a recursively defined Fibonacci tree where T_0 is empty, T_1 contains a single node, and T_n consists of a root with T_{n-2} as left child and T_{n-1} as right child, implement a function that returns the shortest path between two enumerated nodes s and e in a Fibonacci tree of order n. The path should be represented as a sequence of moves (U for up, L for left, R for right). For example, fibPath(order=3, start=1, end=3) should return “URR”. Analyze the time and space complexity of your approach and handle edge cases where nodes may not exist in the tree.
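The question does not state the enumeration scheme. Numbering nodes in preorder from 0 (root, then the left subtree, then the right subtree) reproduces the example answer "URR", so the minimal Python sketch below assumes that numbering. It computes subtree sizes S(n) = 1 + S(n-2) + S(n-1), turns each label into a root-to-node move string, strips the common prefix (the path to the lowest common ancestor), and emits one "U" per remaining step of the start path. The function names are hypothetical.

```python
def fib_sizes(order):
    """sizes[n] = number of nodes in T_n: S(0)=0, S(1)=1, S(n)=1+S(n-2)+S(n-1)."""
    sizes = [0, 1]
    for n in range(2, order + 1):
        sizes.append(1 + sizes[n - 2] + sizes[n - 1])
    return sizes


def root_to_node(order, label, sizes):
    """Moves from the root of T_order to the node with preorder label `label`."""
    if order < 0 or not 0 <= label < sizes[order]:
        raise ValueError(f"node {label} does not exist in T_{order}")
    moves, n = [], order
    while label != 0:              # label 0 is the root of the current subtree
        label -= 1                 # skip past the root
        left_size = sizes[n - 2]
        if label < left_size:      # node lies in the left subtree T_{n-2}
            moves.append("L")
            n -= 2
        else:                      # node lies in the right subtree T_{n-1}
            moves.append("R")
            label -= left_size
            n -= 1
    return moves


def fib_path(order, start, end):
    sizes = fib_sizes(order)
    a = root_to_node(order, start, sizes)
    b = root_to_node(order, end, sizes)
    common = 0                     # shared prefix length = depth of the LCA
    while common < min(len(a), len(b)) and a[common] == b[common]:
        common += 1
    return "U" * (len(a) - common) + "".join(b[common:])
```

Each root-to-node walk takes at most `order` steps, so a query costs O(order) time and space on top of the O(order) size table.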

Given a set of ordered firewall rules containing ALLOW and DENY actions with IP addresses in CIDR notation (e.g., “192.168.100.0/24”) or specific IPs (e.g., “2.3.4.9”), implement a function access_ok(rules, ip_address) that determines whether a given IP address is allowed or denied based on the first matching rule. Return both the boolean result and the index of the matching rule. Handle edge cases including overlapping CIDR ranges, invalid IP formats, and empty rule sets. Optimize for multiple IP queries using appropriate data structures like tries or interval trees.
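A baseline first-match scan can be written with Python's standard `ipaddress` module before any trie optimization. This sketch assumes rules arrive as `(action, target)` tuples and that an IP matching no rule is denied with index -1; both are assumptions, since the question leaves them open.

```python
import ipaddress


def access_ok(rules, ip_address):
    """Return (allowed, index of first matching rule); (False, -1) if nothing matches."""
    addr = ipaddress.ip_address(ip_address)  # raises ValueError on malformed input
    for index, (action, target) in enumerate(rules):
        # A bare IP such as "2.3.4.9" parses as a /32 (or /128) network.
        network = ipaddress.ip_network(target, strict=False)
        if addr in network:  # always False when the IP versions differ
            return action == "ALLOW", index
    return False, -1  # assumed default policy: deny when no rule matches
```

Overlapping ranges are handled by rule order alone: the first match wins. Each query is O(number of rules); a prefix trie over the rule networks that stores the smallest matching rule index per node is the usual next step for many queries.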

Given a directory structure represented as a tree where each Directory contains nested subdirectories and files, and each File has an IsEncrypted boolean property, implement a method that returns the count of encrypted and unencrypted files under the input directory. The directory class provides Name, Directories (dictionary of subdirectories), and Files (dictionary of files) properties. Design an efficient traversal algorithm that handles deeply nested directory structures and analyze its time complexity relative to the total number of files and directories.
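An explicit stack sidesteps recursion limits on very deep trees. In this sketch, `Directory` and `File` are hypothetical Python stand-ins for the classes described (snake_case fields in place of `Name`, `Directories`, `Files`, `IsEncrypted`). Every directory and file is visited once, so time is O(D + F).

```python
from dataclasses import dataclass, field


@dataclass
class File:
    name: str
    is_encrypted: bool


@dataclass
class Directory:
    name: str
    directories: dict = field(default_factory=dict)  # name -> Directory
    files: dict = field(default_factory=dict)        # name -> File


def count_files(root):
    """Return (encrypted, unencrypted) counts under `root`, iteratively."""
    encrypted = unencrypted = 0
    stack = [root]
    while stack:
        directory = stack.pop()
        for f in directory.files.values():
            if f.is_encrypted:
                encrypted += 1
            else:
                unencrypted += 1
        stack.extend(directory.directories.values())
    return encrypted, unencrypted
```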

Building on the directory encryption problem, you are provided two external APIs: EncryptFile(filePath) that encrypts a single file, and EncryptDirectory(directoryPath) that recursively encrypts every file under a directory. Given requestTime (time to make an API call) and fileEncryptTime (time to encrypt one file), implement GetMinimumEncryptTime(requestTime, fileEncryptTime) that calculates the least time needed to encrypt all files in a directory. Consider trade-offs between individual file encryption versus batch directory encryption, and account for N total files in your optimization strategy.
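One way to frame the trade-off is a bottom-up recurrence. For each directory, compare one `EncryptDirectory` call against handling its unencrypted files individually and recursing into subdirectories. This sketch rests on several assumptions the question does not confirm. Calls run sequentially. `EncryptFile` costs `request_time + file_encrypt_time`. `EncryptDirectory` costs `request_time + file_encrypt_time * (all files under it)`, so already-encrypted files are re-encrypted. The function also takes the root directory as an extra parameter.

```python
from dataclasses import dataclass, field


@dataclass
class File:
    is_encrypted: bool


@dataclass
class Directory:
    directories: dict = field(default_factory=dict)
    files: dict = field(default_factory=dict)


def get_minimum_encrypt_time(root, request_time, file_encrypt_time):
    def solve(d):
        """Return (minimum cost to leave no unencrypted file under d, total files under d)."""
        total = len(d.files)
        # Option A: encrypt this directory's unencrypted files one by one, recurse into children.
        piecewise = sum(request_time + file_encrypt_time
                        for f in d.files.values() if not f.is_encrypted)
        for sub in d.directories.values():
            sub_cost, sub_total = solve(sub)
            piecewise += sub_cost
            total += sub_total
        # Option B: one EncryptDirectory call that (by assumption) touches every file.
        whole = request_time + file_encrypt_time * total
        return min(piecewise, whole), total

    return solve(root)[0]
```

The recursion visits every directory once, so it runs in O(D + F). For very deep trees, convert it to an iterative post-order traversal.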

Given a weighted graph where each edge has both distance and cost attributes, implement an algorithm to find the shortest and cheapest route from a source node to a destination node. The solution should balance both distance minimization and cost optimization, potentially using modified Dijkstra’s algorithm or A* search. Handle scenarios where multiple paths have equal distance but different costs, or vice versa. Analyze the time complexity and discuss how you would extend this to handle real-time traffic or dynamic pricing updates in a distributed system.
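One common reading of "shortest and cheapest" is: minimize distance first, then break ties by cost. Dijkstra's algorithm handles that reading unchanged if path weights are compared as `(distance, cost)` tuples, which stays valid as long as both attributes are non-negative. This sketch assumes an adjacency dict of `(neighbor, distance, cost)` triples and comparable node labels. A weighted sum or a cost budget would be different objectives.

```python
import heapq


def shortest_cheapest(graph, source, target):
    """Lexicographic Dijkstra: minimise distance, then cost. Returns ((dist, cost), path) or None."""
    best = {source: (0, 0)}
    prev = {}
    heap = [(0, 0, source)]
    while heap:
        dist, cost, node = heapq.heappop(heap)
        if (dist, cost) != best.get(node):
            continue                      # stale heap entry
        if node == target:
            break
        for nxt, d, c in graph.get(node, []):
            candidate = (dist + d, cost + c)
            if nxt not in best or candidate < best[nxt]:
                best[nxt] = candidate
                prev[nxt] = node
                heapq.heappush(heap, (candidate[0], candidate[1], nxt))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]
```

Complexity matches ordinary Dijkstra with a binary heap: O((V + E) log V).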

You are given n randomly generated connected graphs. Implement an algorithm to merge these n graphs into a single connected graph by strategically picking pairs of graphs and connecting them with minimum additional edges. The goal is to minimize the total number of new edges added while ensuring the resulting graph remains connected. Consider approaches using union-find data structures, analyze the time and space complexity, and discuss how you would handle very large graphs that cannot fit entirely in memory.
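If the n graphs are disjoint, chaining one representative per graph is optimal: n - 1 new edges. If graphs can share vertices, some are already joined. This union-find sketch assumes each graph arrives as a `(nodes, edges)` pair with globally unique node labels. It first merges everything that is already connected, then links one representative per remaining component.

```python
def merge_graphs(graphs):
    """Return the new edges needed to connect all graphs (one fewer than the component count)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for nodes, edges in graphs:
        for v in nodes:
            find(v)
        for a, b in edges:
            union(a, b)

    reps, seen = [], set()
    for v in list(parent):
        r = find(v)
        if r not in seen:
            seen.add(r)
            reps.append(r)
    return [(reps[i], reps[i + 1]) for i in range(len(reps) - 1)]
```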

Given an array of N integers, divide it into exactly K consecutive non-empty subarrays such that a specific cost function is minimized. The cost could be the maximum sum among all subarrays, the difference between maximum and minimum subarray sums, or another metric. Implement an efficient solution using binary search on the answer combined with greedy validation, or dynamic programming approaches. Analyze the time complexity and handle edge cases where K > N or when array elements are negative.
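For the "minimize the maximum subarray sum" variant, binary search on the answer works. The greedy check needs non-negative elements, so the sketch rejects negative input. For negative values, use the DP formulation.

```python
def split_array_min_max_sum(nums, k):
    """Smallest possible maximum subarray sum when nums is cut into exactly k consecutive parts."""
    if not 1 <= k <= len(nums):
        raise ValueError("need 1 <= k <= len(nums)")
    if any(x < 0 for x in nums):
        raise ValueError("greedy feasibility check assumes non-negative elements")

    def parts_needed(cap):
        # Greedily extend each part until adding the next element would exceed cap.
        parts, running = 1, 0
        for x in nums:
            if running + x > cap:
                parts += 1
                running = x
            else:
                running += x
        return parts

    lo, hi = max(nums), sum(nums)
    while lo < hi:
        mid = (lo + hi) // 2
        if parts_needed(mid) <= k:   # fewer parts can always be split further to reach k
            hi = mid
        else:
            lo = mid + 1
    return lo
```

This runs in O(N log(sum(nums))) time and O(1) extra space.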

Given a string S consisting of lowercase letters and an integer K, repeatedly remove the longest prefix of S containing at most K unique characters until the string becomes empty. Return the number of operations required or the sequence of removed prefixes. Implement an efficient solution using sliding window technique or two-pointer approach. Analyze the time complexity and optimize for very long strings (millions of characters) by minimizing character counting operations.
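Each operation takes the longest valid prefix, so the greedy cut is forced. A single left-to-right pass that restarts its distinct-character set at every cut touches each character once, for O(|S|) total. The sketch returns the removed prefixes; their count is the number of operations.

```python
def remove_prefixes(s, k):
    """Repeatedly strip the longest prefix with at most k distinct characters."""
    if k < 1:
        raise ValueError("k must be at least 1")
    pieces, i = [], 0
    while i < len(s):
        seen, j = set(), i
        # Extend while the next char is already seen or there is room for a new one.
        while j < len(s) and (s[j] in seen or len(seen) < k):
            seen.add(s[j])
            j += 1
        pieces.append(s[i:j])
        i = j
    return pieces
```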

Given a 2D array where you must select exactly one element from each row, find the selection that minimizes the difference between the maximum and minimum elements in the resulting array. Return the sorted final array. Implement an efficient solution that first sorts each row, then uses a min-heap to track the current selection across all rows, along with a variable tracking the current maximum. Analyze the time complexity as O(M×N×log M) where M is rows and N is average columns (plus the cost of sorting rows that are not already sorted), and handle edge cases with empty rows.
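This is the "smallest range covering one element from each list" technique. The heap always holds one candidate per row. Popping the minimum and advancing that row is the only move that can shrink the range. The search stops once some row is exhausted. The sketch treats an empty row as an error, since no selection is then possible.

```python
import heapq


def min_range_selection(matrix):
    """Pick one element per row minimising max - min; return the picks sorted."""
    if not matrix or any(len(row) == 0 for row in matrix):
        raise ValueError("every row must be non-empty")
    rows = [sorted(row) for row in matrix]
    current = [row[0] for row in rows]             # current pick per row
    heap = [(row[0], i, 0) for i, row in enumerate(rows)]
    heapq.heapify(heap)
    cur_max = max(current)
    best_span, best = None, None
    while True:
        value, i, j = heapq.heappop(heap)          # current global minimum
        if best_span is None or cur_max - value < best_span:
            best_span, best = cur_max - value, sorted(current)
        if j + 1 == len(rows[i]):                  # cannot raise the minimum further
            return best
        nxt = rows[i][j + 1]
        current[i] = nxt
        cur_max = max(cur_max, nxt)
        heapq.heappush(heap, (nxt, i, j + 1))
```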

Create a custom HashSet class that supports add(), remove(), contains(), and iterator() methods similar to a standard set, but with the additional requirement that the set must support safe modification during iteration. Implement proper iterator invalidation handling, snapshot-based iteration, or copy-on-write semantics. Discuss trade-offs between thread-safety, memory usage, and performance. Handle edge cases including concurrent modifications from multiple threads and ensure consistent iterator behavior.
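Snapshot-based iteration is the simplest way to make modification during iteration safe. Each iterator copies the elements under a lock and then walks its private copy, so later `add`/`remove` calls never invalidate it. This is a minimal separate-chaining sketch. Copying costs O(n) per iterator; copy-on-write would instead pay that cost on writes.

```python
import threading


class SnapshotHashSet:
    def __init__(self, capacity=16):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._lock = threading.Lock()   # one coarse lock for simplicity

    def _bucket(self, item):
        return self._buckets[hash(item) % len(self._buckets)]

    def _resize(self):
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for x in bucket:
                self._buckets[hash(x) % len(self._buckets)].append(x)

    def add(self, item):
        with self._lock:
            bucket = self._bucket(item)
            if item in bucket:
                return False
            bucket.append(item)
            self._size += 1
            if self._size > 2 * len(self._buckets):   # keep chains short
                self._resize()
            return True

    def remove(self, item):
        with self._lock:
            bucket = self._bucket(item)
            if item in bucket:
                bucket.remove(item)
                self._size -= 1
                return True
            return False

    def contains(self, item):
        with self._lock:
            return item in self._bucket(item)

    def iterator(self):
        # Snapshot taken atomically; later modifications do not affect this iterator.
        with self._lock:
            snapshot = [x for bucket in self._buckets for x in bucket]
        return iter(snapshot)

    def __len__(self):
        return self._size
```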

Design a revenue tracking system that supports: insert(revenue) which creates a new customer ID and associates revenue, insert(revenue, referrerId) which also credits the referrer, and get_k_lowest_revenue(k, minRevenue) returning k customer IDs with at least minRevenue. Implement efficient data structures achieving O(k × log n) query time, then optimize to O(1) query time with O(n²) insertion. Handle auto-incrementing customer IDs and discuss space-time trade-offs.
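A sorted list of `(revenue, customer_id)` pairs supports the threshold query with one binary search plus a k-element slice. This sketch assumes that referring a customer adds the new customer's revenue to the referrer's total. Insertion here is O(n) because of list shifting; a balanced BST or skip list would bring it to O(log n).

```python
import bisect


class RevenueTracker:
    def __init__(self):
        self._revenue = {}     # customer_id -> total revenue
        self._sorted = []      # sorted (revenue, customer_id) pairs
        self._next_id = 0      # auto-incrementing customer IDs

    def _set(self, cid, new_revenue):
        old = self._revenue.get(cid)
        if old is not None:
            del self._sorted[bisect.bisect_left(self._sorted, (old, cid))]
        self._revenue[cid] = new_revenue
        bisect.insort(self._sorted, (new_revenue, cid))

    def insert(self, revenue, referrer_id=None):
        if referrer_id is not None and referrer_id not in self._revenue:
            raise KeyError(f"unknown referrer {referrer_id}")
        cid = self._next_id
        self._next_id += 1
        self._set(cid, revenue)
        if referrer_id is not None:   # assumed: referrer is credited with the new revenue
            self._set(referrer_id, self._revenue[referrer_id] + revenue)
        return cid

    def get_k_lowest_revenue(self, k, min_revenue):
        # IDs are >= 0, so (min_revenue, -1) sorts before every pair with that revenue.
        start = bisect.bisect_left(self._sorted, (min_revenue, -1))
        return [cid for _, cid in self._sorted[start:start + k]]
```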

Design and implement a Tic Tac Toe game for an n × m board that supports k consecutive tokens for winning. Implement methods for making moves, checking game state, and determining winners. Add a follow-up feature where an AI can play randomly or using minimax algorithm. Handle edge cases including invalid moves, full boards, and simultaneous wins. Analyze the time complexity of win detection and optimize using incremental state tracking rather than full board scans each move.
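Incremental win detection only needs the four lines through the cell just played. Counting matching tokens outward in both directions, capped at k - 1 each way, makes every move O(k) instead of O(n·m). The sketch below does not enforce turn order, and because only the mover is checked, simultaneous wins cannot arise.

```python
class TicTacToe:
    DIRECTIONS = ((0, 1), (1, 0), (1, 1), (1, -1))   # row, column, two diagonals

    def __init__(self, n, m, k):
        self.n, self.m, self.k = n, m, k
        self.board = [[None] * m for _ in range(n)]
        self.moves = 0
        self.winner = None

    def move(self, player, r, c):
        if self.winner is not None:
            raise ValueError("game is over")
        if not (0 <= r < self.n and 0 <= c < self.m) or self.board[r][c] is not None:
            raise ValueError("invalid move")
        self.board[r][c] = player
        self.moves += 1
        if any(1 + self._run(player, r, c, dr, dc) + self._run(player, r, c, -dr, -dc) >= self.k
               for dr, dc in self.DIRECTIONS):
            self.winner = player
        return self.winner

    def _run(self, player, r, c, dr, dc):
        """Consecutive `player` tokens starting next to (r, c), at most k - 1."""
        count = 0
        r, c = r + dr, c + dc
        while count < self.k - 1 and 0 <= r < self.n and 0 <= c < self.m and self.board[r][c] == player:
            count += 1
            r, c = r + dr, c + dc
        return count

    def is_draw(self):
        return self.winner is None and self.moves == self.n * self.m
```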

You are managing a server receiving large files through multiple data segments. Each segment has a byte count, and segments arrive in order. Implement a function that tracks and returns the count of bytes in consecutive upload segments matching specific criteria (e.g., segments above a threshold, segments within a time window). Use efficient data structures like prefix sums or segment trees to handle range queries. Optimize for high-frequency updates and queries in a production environment.

Given a string representation of a large integer and an integer k, if the string length exceeds k, divide it into groups of 3 digits. Calculate the sum of digits in each group, then recursively apply the grouping if any group’s sum exceeds k. Continue until all groups satisfy the constraint. Return the final grouping structure. Implement an efficient solution handling very large numbers (thousands of digits) and analyze the recursion depth and termination conditions.

Given documentation for an assembly-like language with specific instructions for loading values, performing arithmetic, and storing results, implement a simple division operation that returns both quotient and remainder. Parse the assembly instructions, maintain register state, and execute operations in sequence. Handle edge cases including division by zero, overflow conditions, and negative numbers. Discuss how you would extend this to support more complex operations like multiplication or modulo.

Given a k-th order Fibonacci tree where each node’s position is determined by a specific enumeration scheme, find the path between two nodes using only tree traversal operations. The path should be represented as a sequence of parent/child navigations. Implement an algorithm that first finds the lowest common ancestor (LCA) of the two nodes, then constructs the path by combining the upward path from start to LCA and downward path from LCA to end. Analyze time complexity.

Given an array of integers and a value k, find the length of the longest contiguous subarray that contains at most k distinct elements. Implement an efficient sliding window solution that expands the right boundary while the constraint is satisfied and contracts the left boundary when violated. Use a hash map to track element frequencies within the current window. Analyze the time complexity as O(n) and discuss how to modify the solution for exactly k distinct elements.
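The sliding window described above needs only a frequency map. Each index enters and leaves the window at most once, which gives the O(n) bound.

```python
from collections import defaultdict


def longest_at_most_k_distinct(nums, k):
    if k <= 0:
        return 0
    freq = defaultdict(int)
    left = best = 0
    for right, x in enumerate(nums):
        freq[x] += 1
        while len(freq) > k:          # too many distinct values: shrink from the left
            y = nums[left]
            freq[y] -= 1
            if freq[y] == 0:
                del freq[y]
            left += 1
        best = max(best, right - left + 1)
    return best
```

For "exactly k distinct", a common trick for counting subarrays is atMost(k) - atMost(k - 1), using a counting variant of this same window.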

Given a starting IP address and a count of consecutive IP addresses, convert this range into the minimum number of CIDR blocks that exactly cover the range. For example, starting IP “255.0.0.7” with count 10 should produce CIDR blocks that cover exactly 10 addresses. Implement an algorithm that identifies optimal block boundaries based on binary representation of IP addresses. Handle edge cases including ranges crossing network boundaries and very large counts.
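The greedy rule is to emit, at each step, the largest power-of-two block that starts at the current address (its alignment is the lowest set bit) and does not overshoot the remaining count. This IPv4-only sketch follows that rule. Python's `ipaddress.summarize_address_range` produces the same minimal cover and serves as a cross-check.

```python
import ipaddress


def range_to_cidrs(start_ip, count):
    start = int(ipaddress.IPv4Address(start_ip))
    if count < 1 or start + count > 2 ** 32:
        raise ValueError("range must be non-empty and stay inside IPv4 space")
    blocks = []
    while count > 0:
        size = start & -start if start else 1 << 32   # largest block aligned at `start`
        while size > count:                           # shrink until it fits the remainder
            size //= 2
        prefix = 32 - (size.bit_length() - 1)
        blocks.append(f"{ipaddress.IPv4Address(start)}/{prefix}")
        start += size
        count -= size
    return blocks
```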

Design a time-based key-value store supporting put(key, value) and get(key, timestamp) operations, where get returns the most recent value before the given timestamp. Additionally, implement measure_put_load() and measure_get_load() methods that return the number of operations in the last 5 minutes. Use appropriate data structures like sorted lists with binary search for efficient timestamp queries. Handle concurrent access and discuss memory management for old entries.
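A per-key pair of parallel lists (timestamps, values) stays sorted when writes use a monotonic clock, so `get` becomes one `bisect` call. Two deques of operation times serve the load counters. This sketch assumes `get` returns the latest value written at or before the timestamp, and injects `clock` so tests can control time. Old timestamps are evicted lazily, only when a load method runs.

```python
import bisect
import time
from collections import defaultdict, deque


class TimeKVStore:
    WINDOW_SECONDS = 300.0

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._times = defaultdict(list)    # key -> sorted write timestamps
        self._values = defaultdict(list)   # key -> values aligned with _times
        self._puts, self._gets = deque(), deque()

    def put(self, key, value):
        now = self._clock()
        self._times[key].append(now)
        self._values[key].append(value)
        self._puts.append(now)

    def get(self, key, timestamp):
        self._gets.append(self._clock())
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, timestamp)   # writes at or before `timestamp`
        return self._values[key][i - 1] if i else None

    def _load(self, ops):
        cutoff = self._clock() - self.WINDOW_SECONDS
        while ops and ops[0] <= cutoff:
            ops.popleft()
        return len(ops)

    def measure_put_load(self):
        return self._load(self._puts)

    def measure_get_load(self):
        return self._load(self._gets)
```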

Given a set of intervals representing a cover of a reference string, implement a function delete(cover, index) that removes an element at the specified index and returns the updated cover. After deletion, merge adjacent intervals where possible to maintain a minimal cover representation. Implement a follow-up that returns the maximal cover ensuring no two adjacent intervals are mergeable. Analyze the time complexity and prove correctness of the merging logic by checking at most 3-4 intervals around the modification point.

Given a large graph representing data dependencies in a distributed computing environment, partition the graph into k balanced subgraphs such that edge cuts between partitions are minimized. Each partition should have approximately the same number of vertices for load balancing. Implement an algorithm using techniques like METIS, spectral partitioning, or greedy approaches. Analyze the trade-offs between partition quality and computation time. Consider how this applies to Spark’s data shuffling optimization and distributed join operations in big data pipelines.

Design an algorithm to process a stream of timestamped events where you need to compute aggregations (sum, count, average) over sliding windows. Handle late-arriving data that may belong to already-closed windows using watermarking techniques. Implement efficient data structures to maintain window state while minimizing memory usage. Discuss strategies for window eviction, state backend optimization, and exactly-once processing semantics relevant to Spark Structured Streaming scenarios.
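A watermark can be illustrated on the simpler tumbling-window case; sliding windows would assign each event to several overlapping windows. In this sketch the watermark is the maximum event time seen minus an allowed lateness. A window is emitted and its state freed once its end is at or below the watermark. Events for already-closed windows go to a `dropped` side list. All class and parameter names are hypothetical.

```python
from collections import defaultdict


class TumblingWindowAggregator:
    def __init__(self, window, allowed_lateness):
        self.window = window
        self.lateness = allowed_lateness
        self.state = defaultdict(lambda: [0, 0])   # window start -> [sum, count]
        self.max_ts = float("-inf")
        self.closed_before = float("-inf")         # windows starting earlier are closed
        self.dropped = []                          # too-late events (side output)

    def process(self, ts, value):
        """Add one event; return windows finalised as (start, sum, count, average)."""
        start = ts - ts % self.window
        if start < self.closed_before:
            self.dropped.append((ts, value))
            return []
        acc = self.state[start]
        acc[0] += value
        acc[1] += 1
        self.max_ts = max(self.max_ts, ts)
        return self._emit_ready()

    def _emit_ready(self):
        watermark = self.max_ts - self.lateness
        out = []
        for start in sorted(self.state):
            if start + self.window > watermark:
                break                              # later windows are still open
            total, count = self.state.pop(start)  # evict state once emitted
            out.append((start, total, count, total / count))
            self.closed_before = start + self.window
        return out
```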

Implement a routing table using a trie data structure that supports inserting CIDR blocks with associated next-hop values and performing longest prefix match queries. Given an IP address, find the most specific matching route (longest prefix). Optimize for memory usage using path compression techniques and for query performance using level-compressed tries. Handle IPv6 addresses and discuss how this applies to network firewall implementations and load balancer routing decisions.
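A plain binary trie is the baseline that path compression and level compression improve on. Each CIDR's prefix bits become a root-to-node path. Lookup walks the address bits and remembers the deepest next hop it passes, which is the longest prefix match. This sketch keeps separate tries for IPv4 and IPv6 and is uncompressed.

```python
import ipaddress


class RoutingTable:
    def __init__(self):
        self.roots = {4: {}, 6: {}}   # one binary trie per IP version

    @staticmethod
    def _bits(value, width):
        return [(value >> (width - 1 - i)) & 1 for i in range(width)]   # MSB first

    def insert(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr, strict=False)
        node = self.roots[net.version]
        for bit in self._bits(int(net.network_address), net.max_prefixlen)[: net.prefixlen]:
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, ip):
        addr = ipaddress.ip_address(ip)
        node = self.roots[addr.version]
        best = node.get("hop")                # a /0 default route lives at the root
        for bit in self._bits(int(addr), addr.max_prefixlen):
            node = node.get(bit)
            if node is None:
                break
            best = node.get("hop", best)      # deeper match means a longer prefix
        return best
```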

Implement a consistent hashing algorithm that distributes keys across n cache nodes while minimizing redistribution when nodes are added or removed. Use virtual nodes to improve load balancing. Support operations: addNode(nodeId), removeNode(nodeId), and getNode(key) returning the responsible node. Analyze the expected key redistribution percentage and discuss applications in distributed caching systems like Redis Cluster or DynamoDB partitioning strategies.
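A sorted ring of virtual-node hashes plus a binary search gives `getNode` in O(log(n·v)). The property worth testing is that removing a node moves only the keys that node owned. This sketch hashes with MD5 purely for even spreading, not security. The 100-vnode default is an arbitrary choice.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, vnodes=100):
        self.vnodes = vnodes
        self._ring = []   # sorted (hash, node_id) pairs, one per virtual node

    @staticmethod
    def _hash(s):
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def add_node(self, node_id):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node_id}#{i}"), node_id))

    def remove_node(self, node_id):
        self._ring = [entry for entry in self._ring if entry[1] != node_id]

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # (h,) sorts before any (h, node): find the first vnode clockwise with hash >= h.
        i = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[i % len(self._ring)][1]
```

With n equally weighted nodes, adding or removing one node is expected to move about 1/n of the keys.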

Create a space-efficient probabilistic data structure that tests whether an element is a member of a set. Implement add(element) and contains(element) operations with configurable false positive rate. Use multiple hash functions to set and check bits in a bit array. Calculate optimal bit array size and number of hash functions for given capacity and false positive tolerance. Discuss applications in distributed systems for reducing network calls and database lookups.
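This is a Bloom filter. For capacity n and false-positive rate p, the standard sizing is m = -n·ln p / (ln 2)² bits and k = (m/n)·ln 2 hash functions. This sketch derives k positions from one SHA-256 digest by double hashing (h1 + i·h2, the Kirsch-Mitzenmacher technique) rather than k independent hash functions.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, capacity, fp_rate):
        self.m = math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # odd step avoids degenerate cycles
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def contains(self, item):
        # False means definitely absent; True means probably present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```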

Given k sorted data streams (each potentially infinite), merge them into a single sorted output stream using O(k) memory. Implement using a min-heap to track the current element from each stream. Handle streams that may become exhausted and support lazy evaluation for memory efficiency. Discuss applications in external sorting, database merge joins, and Spark’s shuffle merge operations when dealing with sorted partitions.
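A generator fits the lazy, O(k)-memory requirement. The heap holds exactly one pending element per live stream, and a stream is dropped when its iterator runs out. The stream index breaks ties, so iterators are never compared. Python's `heapq.merge` implements the same idea and is handy as a reference.

```python
import heapq

_DONE = object()   # sentinel for an exhausted stream


def merge_streams(streams):
    heap = []
    for idx, stream in enumerate(streams):
        it = iter(stream)
        first = next(it, _DONE)
        if first is not _DONE:
            heap.append((first, idx, it))
    heapq.heapify(heap)
    while heap:
        value, idx, it = heap[0]
        yield value
        nxt = next(it, _DONE)
        if nxt is _DONE:
            heapq.heappop(heap)                      # stream exhausted
        else:
            heapq.heapreplace(heap, (nxt, idx, it))  # pop + push in one step
```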

Implement a Least Recently Used cache supporting get(key) and put(key, value) in O(1) time using a combination of hash map and doubly-linked list. Add thread-safety using fine-grained locking or lock-free techniques. Include capacity management that evicts the least recently used entry when full. Extend to support getOrDefault, contains, and size operations. Discuss cache eviction policies and applications in query result caching and session management.
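A hash map from keys to list nodes, plus a doubly linked list with sentinel head and tail, makes every operation O(1). The most recently used entry sits next to `head`, and the eviction victim sits next to `tail`. For brevity this sketch uses one coarse lock rather than fine-grained locking. `get` takes a default, which covers getOrDefault.

```python
import threading


class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.map = {}
        self.head, self.tail = _Node(), _Node()   # sentinels
        self.head.next, self.tail.prev = self.tail, self.head
        self.lock = threading.Lock()

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key, default=None):
        with self.lock:
            node = self.map.get(key)
            if node is None:
                return default
            self._unlink(node)
            self._push_front(node)      # mark as most recently used
            return node.value

    def put(self, key, value):
        with self.lock:
            node = self.map.get(key)
            if node is not None:
                node.value = value
                self._unlink(node)
                self._push_front(node)
                return
            if len(self.map) >= self.capacity:
                lru = self.tail.prev    # least recently used entry
                self._unlink(lru)
                del self.map[lru.key]
            node = _Node(key, value)
            self.map[key] = node
            self._push_front(node)

    def __contains__(self, key):
        with self.lock:
            return key in self.map

    def __len__(self):
        with self.lock:
            return len(self.map)
```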

Build a segment tree data structure that supports efficient range sum queries and point updates on an array. Implement build(arr), update(index, value), and query(left, right) operations with O(log n) time complexity. Handle lazy propagation for range updates. Discuss applications in real-time analytics dashboards, financial data processing, and scenarios requiring frequent range aggregations on mutable datasets in distributed systems.
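With lazy propagation, a range add applies to whole covered nodes and records a pending delta, which is pushed to the children only when a later operation descends past that node. This recursive sketch supports range add and range sum in O(log n). It implements the point update as a single-element range add, so `update` costs two O(log n) passes.

```python
class LazySegmentTree:
    def __init__(self, arr):
        self.n = len(arr)
        self.sum = [0] * (4 * self.n)
        self.lazy = [0] * (4 * self.n)   # pending add for each node's children
        if self.n:
            self._build(1, 0, self.n - 1, arr)

    def _build(self, node, lo, hi, arr):
        if lo == hi:
            self.sum[node] = arr[lo]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, arr)
        self._build(2 * node + 1, mid + 1, hi, arr)
        self.sum[node] = self.sum[2 * node] + self.sum[2 * node + 1]

    def _apply(self, node, lo, hi, delta):
        self.sum[node] += delta * (hi - lo + 1)
        self.lazy[node] += delta

    def _push(self, node, lo, hi):
        if self.lazy[node]:
            mid = (lo + hi) // 2
            self._apply(2 * node, lo, mid, self.lazy[node])
            self._apply(2 * node + 1, mid + 1, hi, self.lazy[node])
            self.lazy[node] = 0

    def range_add(self, l, r, delta, node=1, lo=0, hi=None):
        hi = self.n - 1 if hi is None else hi
        if r < lo or hi < l:
            return
        if l <= lo and hi <= r:
            self._apply(node, lo, hi, delta)
            return
        self._push(node, lo, hi)
        mid = (lo + hi) // 2
        self.range_add(l, r, delta, 2 * node, lo, mid)
        self.range_add(l, r, delta, 2 * node + 1, mid + 1, hi)
        self.sum[node] = self.sum[2 * node] + self.sum[2 * node + 1]

    def query(self, l, r, node=1, lo=0, hi=None):
        hi = self.n - 1 if hi is None else hi
        if r < lo or hi < l:
            return 0
        if l <= lo and hi <= r:
            return self.sum[node]
        self._push(node, lo, hi)
        mid = (lo + hi) // 2
        return self.query(l, r, 2 * node, lo, mid) + self.query(l, r, 2 * node + 1, mid + 1, hi)

    def update(self, index, value):
        self.range_add(index, index, value - self.query(index, index))
```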

Given a set of tasks with dependencies represented as a directed graph, implement topological sorting to determine a valid execution order. Detect cycles that would make execution impossible. Support dynamic addition of dependencies and incremental recomputation. Discuss applications in build systems, workflow orchestration, Spark DAG scheduling, and database query optimization where operations must respect dependency constraints.
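Kahn's algorithm produces a valid order and detects cycles at once. Any task whose in-degree never reaches zero lies on or behind a cycle, so an output shorter than the task count means no valid order exists. In this sketch, dependencies are assumed to be `(before, after)` pairs.

```python
from collections import defaultdict, deque


def topological_sort(tasks, dependencies):
    indegree = dict.fromkeys(tasks, 0)
    successors = defaultdict(list)
    for before, after in dependencies:
        indegree.setdefault(before, 0)
        indegree[after] = indegree.get(after, 0) + 1
        successors[before].append(after)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in successors[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(indegree):
        raise ValueError("dependency cycle detected; no valid execution order")
    return order
```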

Implement a disjoint-set (Union-Find) data structure with path compression and union by rank optimizations. Support find(element) returning the representative and union(element1, element2) merging two sets. Achieve O(α(n)) amortized time per operation, where α is the inverse Ackermann function (effectively constant in practice). Apply to problems like finding connected components, detecting cycles in graphs, and Kruskal’s minimum spanning tree algorithm. Discuss applications in network connectivity and image processing.
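The two optimizations described above fit in a few lines. `union` returns False when both elements already share a root, which is exactly the cycle-detection signal. The `count` field tracks the number of connected components.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n                   # number of disjoint sets

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:    # path compression: point the path at the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False                 # already connected (an edge here closes a cycle)
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra             # union by rank: shorter tree under taller
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        self.count -= 1
        return True
```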

Given an array and window size k, find the maximum element in each sliding window position efficiently. Implement an O(n) solution using a deque that holds indices of candidate elements in decreasing order of value, dropping indices that have left the window and elements smaller than the current element. Discuss applications in time-series analysis, moving average calculations, and real-time monitoring systems where windowed aggregations are common in streaming data pipelines.
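The monotonic deque stores indices rather than values so that expired candidates can be recognized. The front always holds the current window's maximum. Every index is pushed and popped at most once, which is why the whole pass is O(n).

```python
from collections import deque


def sliding_window_max(nums, k):
    if k <= 0:
        raise ValueError("window size must be positive")
    dq = deque()   # indices with strictly decreasing values; front = window max
    out = []
    for i, x in enumerate(nums):
        if dq and dq[0] <= i - k:          # front index has left the window
            dq.popleft()
        while dq and nums[dq[-1]] <= x:    # smaller values can never be a max again
            dq.pop()
        dq.append(i)
        if i >= k - 1:
            out.append(nums[dq[0]])
    return out
```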

Implement the k-means clustering algorithm that partitions n data points into k clusters based on feature similarity. Support initialization strategies (random, k-means++), distance metrics (Euclidean, Manhattan), and convergence criteria. Optimize for large datasets using mini-batch updates. Discuss applications in customer segmentation, anomaly detection, and data partitioning strategies for distributed processing where similar data should be co-located.
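The pure-Python sketch below covers k-means++ seeding and Lloyd iterations with Euclidean distance. Manhattan distance would swap `math.dist` for a sum of absolute differences and would strictly call for medians instead of means. An empty cluster keeps its previous centroid, and a fixed `seed` makes runs reproducible. Mini-batch updates are not shown.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    # k-means++: each new centre is drawn with probability proportional to D(x)^2.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        if sum(d2) == 0:
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:   # assignment step
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new = [tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]   # update step
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:   # converged
            break
    return centroids
```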

Implement a Fenwick Tree (Binary Indexed Tree) that supports efficient prefix sum queries and point updates. Achieve O(log n) time for both operations with minimal space overhead. Support range sum queries using prefix sum differences. Discuss applications in frequency counting, cumulative distribution calculations, and scenarios requiring frequent updates and prefix queries on large numeric datasets in analytics systems.
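A Fenwick tree stores partial sums over ranges whose lengths are set by the lowest set bit of each 1-based index. `i & -i` isolates that bit, so both walks take O(log n) steps. The public API in this sketch is 0-based, and `add` applies a delta; to set a value, add `new - old`.

```python
class FenwickTree:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)   # 1-based internally

    @classmethod
    def from_list(cls, values):
        ft = cls(len(values))
        for i, v in enumerate(values):
            ft.add(i, v)
        return ft

    def add(self, index, delta):
        i = index + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i             # next node whose range covers index

    def prefix_sum(self, index):
        """Sum of values[0..index]; returns 0 for index < 0."""
        i, total = index + 1, 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i             # drop the lowest set bit
        return total

    def range_sum(self, left, right):
        return self.prefix_sum(right) - self.prefix_sum(left - 1)
```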

Implement the Huffman coding algorithm, which creates optimal prefix-free binary codes for characters based on their frequencies. Build the Huffman tree using a priority queue, then generate codes by traversing from root to leaves. Support encode(text) and decode(bits) operations. Analyze compression ratio and discuss applications in data serialization, network transmission optimization, and storage compression in distributed file systems.
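Repeatedly merging the two least frequent subtrees from a heap builds the Huffman tree; a unique counter breaks ties so subtrees are never compared directly. In this sketch leaves are characters and internal nodes are `(left, right)` tuples. Bits are represented as a `"0"`/`"1"` string for readability rather than packed bytes. A single-symbol alphabet gets the code "0".

```python
import heapq
from collections import Counter


def build_codes(text):
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (a, b)))
        counter += 1
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):     # internal node: 0 = left, 1 = right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:          # prefix-free: first match is the symbol
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)
```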

Implement a skip list data structure providing O(log n) search, insertion, and deletion with probabilistic balancing. Support ordered operations like range queries, finding successors/predecessors, and iteration. Compare performance characteristics with balanced BSTs and discuss applications in concurrent environments where lock-free implementations are possible, such as in-memory databases and indexing systems.
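A skip list keeps a sorted level-0 linked list plus express lanes. Each node is promoted to the next level with probability P, which gives expected O(log n) search, insert and delete. This single-threaded sketch stores keys only. `MAX_LEVEL = 16` is an arbitrary cap, and the optional `seed` makes level choices reproducible.

```python
import random


class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level   # next node at each level


class SkipList:
    MAX_LEVEL, P = 16, 0.5

    def __init__(self, seed=None):
        self.head = _Node(None, self.MAX_LEVEL)
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self):
        lvl = 1
        while lvl < self.MAX_LEVEL and self.rng.random() < self.P:
            lvl += 1
        return lvl

    def _predecessors(self, key):
        """Last node with key < `key` on each level (head for unused levels)."""
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def contains(self, key):
        nxt = self._predecessors(key)[0].forward[0]
        return nxt is not None and nxt.key == key

    def insert(self, key):
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return False
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = _Node(key, lvl)
        for i in range(lvl):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        return True

    def delete(self, key):
        update = self._predecessors(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):
            if update[i].forward[i] is target:
                update[i].forward[i] = target.forward[i]
        while self.level > 1 and self.head.forward[self.level - 1] is None:
            self.level -= 1
        return True

    def range(self, lo, hi):
        """Keys in [lo, hi], in order, via the level-0 list."""
        node, out = self._predecessors(lo)[0].forward[0], []
        while node is not None and node.key <= hi:
            out.append(node.key)
            node = node.forward[0]
        return out
```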

Implement the Rabin-Karp algorithm for finding all occurrences of a pattern in text using a rolling hash. Achieve O(n+m) average time complexity where n is text length and m is pattern length. Handle hash collisions through verification. Extend to multiple pattern matching and approximate matching. Discuss applications in plagiarism detection, DNA sequence matching, log analysis, and intrusion detection systems scanning for known patterns.
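The rolling hash treats each length-m window as a base-`base` number modulo a large prime. Sliding one position removes the leading character's weighted contribution and appends the next character in O(1). Every hash match is verified by direct comparison, so collisions cost time but never correctness. The Mersenne prime 2^61 - 1 used here is one common choice of modulus, not a requirement.

```python
def rabin_karp(text, pattern, base=256, mod=(1 << 61) - 1):
    """Start indices of all (possibly overlapping) occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)        # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        if p_hash == t_hash and text[i:i + m] == pattern:   # verify to rule out collisions
            hits.append(i)
        if i + m < n:                    # roll: drop text[i], append text[i + m]
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```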

Implement the reservoir sampling algorithm, which randomly selects k elements from a stream of unknown length, ensuring each element has an equal probability of selection. Use O(k) memory and process elements in a single pass. Prove correctness using induction. Discuss applications in random sampling of big data, A/B test assignment, and scenarios where data is too large to store but representative samples are needed for analysis.
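This is Algorithm R. Element i (0-based) enters the reservoir with probability k/(i+1) and evicts a uniformly chosen slot. That is the invariant the inductive proof maintains: after i+1 elements, each has been kept with probability k/(i+1). An injectable `random.Random` keeps tests reproducible.

```python
import random


def reservoir_sample(stream, k, rng=None):
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the first k slots
        else:
            j = rng.randint(0, i)           # uniform over 0..i inclusive
            if j < k:                       # happens with probability k / (i + 1)
                reservoir[j] = item
    return reservoir
```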

Implement Kruskal’s or Prim’s algorithm to find the minimum spanning tree of a weighted undirected graph. Use Union-Find for Kruskal’s or priority queue for Prim’s. Analyze time complexity and discuss applications in network design, clustering, approximation algorithms for NP-hard problems, and optimizing connection costs in distributed systems where minimizing communication overhead is critical.
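Kruskal's algorithm sorts edges by weight and keeps each edge that joins two different union-find components. It runs in O(E log E), dominated by the sort. Edges are assumed to be `(weight, u, v)` tuples over vertices `0..n-1`. On a disconnected graph the sketch returns a minimum spanning forest.

```python
def kruskal(n, edges):
    """Return (total weight, list of (u, v, w)) for a minimum spanning tree/forest."""
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    total, mst = 0, []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                        # would create a cycle
        if rank[ru] < rank[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        if rank[ru] == rank[rv]:
            rank[ru] += 1
        mst.append((u, v, w))
        total += w
        if len(mst) == n - 1:
            break
    return total, mst
```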

Implement a dynamic programming solution for computing the minimum number of operations (insert, delete, substitute) to transform one string into another. Extend to return the actual sequence of operations. Optimize space complexity to O(min(m,n)) using only two rows. Discuss applications in spell checking, DNA sequence alignment, fuzzy matching in search systems, and data deduplication where similar records need identification.
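The two-row optimization works because row i of the DP table depends only on row i-1. Unit-cost edit distance is symmetric, so the strings can be swapped to make the row length min(m, n). Recovering the operation sequence still needs the full table (or Hirschberg's divide-and-conquer), which this sketch does not show.

```python
def edit_distance(a, b):
    if len(a) < len(b):
        a, b = b, a                        # keep rows as short as the shorter string
    prev = list(range(len(b) + 1))         # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            cur[j] = min(prev[j] + 1,              # delete ca
                         cur[j - 1] + 1,           # insert cb
                         prev[j - 1] + (ca != cb)) # substitute (free if equal)
        prev = cur
    return prev[-1]
```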

Given a collection of meeting time intervals, find the minimum number of conference rooms required to schedule all meetings without conflicts. Implement a solution using sorting and a min-heap to track end times of ongoing meetings. Extend to find the actual room assignments. Discuss applications in resource allocation, task scheduling in distributed systems, and Spark executor assignment where tasks compete for limited computational resources.
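Processing meetings in start order, a heap of `(end, room)` pairs shows which rooms have freed up. A second heap of free room IDs makes assignments deterministic. The number of rooms ever opened is the answer. This sketch treats meetings as half-open intervals, so one ending at 10 and another starting at 10 can share a room.

```python
import heapq


def assign_rooms(meetings):
    """Return (rooms needed, room index per meeting) for (start, end) intervals."""
    order = sorted(range(len(meetings)), key=lambda i: meetings[i][0])
    busy, free = [], []                    # (end, room) and free room ids
    assignment = [None] * len(meetings)
    rooms = 0
    for i in order:
        start, end = meetings[i]
        while busy and busy[0][0] <= start:   # release rooms whose meeting has ended
            heapq.heappush(free, heapq.heappop(busy)[1])
        if free:
            room = heapq.heappop(free)
        else:
            room, rooms = rooms, rooms + 1    # open a new room
        assignment[i] = room
        heapq.heappush(busy, (end, room))
    return rooms, assignment
```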

Q1. Cached File System With Chunk-Based Download Optimization

Design a CachedFile class that wraps a StorageClient interface providing getFileSize(uri) and fetch(uri, offset, length, buffer) methods. The class should download remote files efficiently by bucketing data into fixed-size chunks, caching them on disk, and minimizing repeated network calls. Implement thread-safe parallel chunk fetching using a thread pool, handle duplicate fetch requests from concurrent threads, and discuss failure recovery strategies. Analyze trade-offs between chunk size, memory usage, and network efficiency.

Design a key-value data store class supporting put(string key, string value), get(string key), measureGetLoad(), and measurePutLoad() methods. The load measurement methods should return the count of respective operations in the last 5 minutes. Implement using a hashmap for storage combined with timestamp queues for tracking operation times. Discuss thread-safety considerations, memory management for old timestamps, and how to handle high-frequency operations (multiple calls within milliseconds) without performance degradation.

Design a revenue association system supporting insert(revenue) creating auto-incrementing customer IDs, insert(revenue, referrerId) linking referrals, and get_k_lowest_revenue(k, minRevenue) returning k customers meeting revenue thresholds. Implement efficient data structures achieving O(k × log n) queries initially, then optimize to O(1) query time with O(n²) insertion by precomputing answers. Handle referral chain queries like get_nested_referral(customerId, depth) returning cumulative revenue across referral levels with space and time optimizations.

Implement a firewall feature class that processes rules in order, where each rule specifies ALLOW or DENY action with IP ranges in CIDR notation (e.g., “192.168.1.0/24”) or specific IPs. The access_ok(ip_address) method should return whether access is allowed and which rule index matched. Optimize for multiple sequential queries using trie-based IP matching, handle overlapping ranges correctly, and discuss extensions for efficient bulk IP range queries and dynamic rule updates without full reindexing.

Design a custom HashSet class implementing add(), remove(), contains(), and iterator() methods with the unique requirement that the set must support safe modification during active iteration. Implement snapshot-based iterators that capture state at creation time, or use copy-on-write semantics for structural modifications. Discuss trade-offs between memory overhead, iteration consistency guarantees, and concurrent modification performance. Handle edge cases including iterator invalidation scenarios and multiple simultaneous iterators.

Design a rate limiter class supporting allowRequest(clientId) returning true if the request should be permitted based on configured limits (e.g., 100 requests per minute per client). Implement sliding window algorithm for accurate rate tracking, support configurable window sizes and limits per client tier, and discuss extensions for distributed deployment using Redis or similar backends. Handle clock skew, ensure atomicity of check-and-increment operations, and optimize for high-throughput scenarios.
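The most accurate in-process sliding window is a per-client log of request timestamps. Entries older than the window are evicted, and a request is allowed while fewer than `limit` remain. Check-and-append happens under one lock, which keeps them atomic. This sketch injects `clock` for testing and omits per-tier limits, which would add a lookup from client to `(limit, window)`. A distributed deployment would move the log to a shared store.

```python
import threading
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._log = defaultdict(deque)     # client_id -> timestamps of allowed requests
        self._lock = threading.Lock()

    def allow_request(self, client_id):
        with self._lock:                   # check-and-record must be atomic
            now = self.clock()
            log = self._log[client_id]
            while log and log[0] <= now - self.window:
                log.popleft()              # evict requests outside the window
            if len(log) < self.limit:
                log.append(now)
                return True
            return False
```

Memory is O(limit) per active client. For very high limits, a sliding-window counter (two fixed buckets weighted by overlap) trades a little accuracy for O(1) state.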

Implement a thread pool executor supporting submit(task, priority), cancel(taskId), and shutdown() operations. Tasks should execute based on priority ordering with configurable pool size. Use priority queues for task scheduling, maintain worker threads with efficient wait/notify mechanisms, and support graceful shutdown completing pending tasks. Handle task timeouts, rejected task policies, and discuss extensions for scheduled/recurring tasks, work stealing between threads, and dynamic pool size adjustment.

Design a thread-safe LRU cache supporting get(key), put(key, value), and remove(key) in O(1) time with configurable maximum capacity. Track cache statistics including hit rate, miss rate, eviction count, and current size. Use ConcurrentHashMap with fine-grained locking or lock-free algorithms for high concurrency. Discuss eviction policies (LRU, LFU, TTL-based), memory limits, and extensions for write-through/write-behind caching strategies in database integration scenarios.

Design a configuration manager that loads settings from multiple sources (files, environment variables, remote stores), supports hot reloading when configs change, and validates configurations against schemas. Implement the listener pattern for change notifications, support hierarchical configs with override precedence, and provide type-safe accessors. Handle concurrent access during reloads, rollback on validation failures, and discuss strategies for zero-downtime configuration updates in production systems.

Implement an event bus system supporting publish(event, topic), subscribe(topic, listener), and unsubscribe(topic, listener) operations. Events should be delivered to all registered listeners for a topic with optional filtering based on event properties. Support asynchronous delivery with configurable thread pools, guarantee at-least-once delivery semantics, and discuss dead letter handling for failed deliveries. Extensions include event ordering guarantees, replay capabilities, and integration with external message queues.

Design a database connection pool managing a fixed number of reusable connections. Support borrowConnection(timeout) and returnConnection(conn) operations with automatic health checking of idle connections. Track pool metrics including active connections, wait times, and utilization. Implement connection validation before lending, eviction of stale connections, and discuss strategies for pool sizing, handling connection leaks, and integration with connection pool monitoring in observability platforms.

Implement a job scheduler supporting cron-based scheduling (e.g., “0 */5 * * * *” for every 5 minutes in the six-field format with a leading seconds field), job dependencies (job B runs after job A completes), and manual triggering. Use priority queues for next-execution ordering, maintain job state machines, and support retry logic with exponential backoff. Handle overlapping executions, job cancellation, and discuss persistence of job state for crash recovery and distributed scheduling across multiple nodes.

Design a file system watcher that monitors directories for changes (create, modify, delete) with recursive subdirectory support. Implement debouncing to coalesce rapid successive changes, filter events by file patterns, and notify registered listeners. Handle platform-specific file system APIs, symbolic links, and discuss scalability for watching large directory trees. Extensions include batch event delivery, change aggregation, and integration with build systems for incremental compilation.

Implement a generic object pool for expensive-to-create objects (database connections, HTTP clients, etc.) supporting borrowObject(), returnObject(obj), and validateObject(obj) operations. Track borrowed objects to detect leaks, implement idle object eviction, and support pool resizing. Use wait queues for exhausted pool scenarios with configurable timeouts. Discuss factory patterns for object creation, health checking strategies, and comparison with dependency injection approaches for resource management.

Design a metrics collection system supporting counter.increment(), gauge.set(value), timer.record(duration), and histogram.update(value) operations. Aggregate metrics over configurable time windows, support multiple export destinations (console, files, remote systems), and implement efficient lock-free updates for high-throughput scenarios. Handle metric cardinality explosion, discuss memory management for high-cardinality labels, and integration with monitoring systems like Prometheus or Grafana.

Implement a circuit breaker pattern supporting closed (normal), open (failing), and half-open (testing) states. Track failure rates, automatically transition states based on thresholds, and execute fallback logic when the circuit is open. Support configuration for failure thresholds, timeout durations, and recovery testing intervals. Discuss integration with retry policies, bulkhead patterns for resource isolation, and observability through metrics and state change events for debugging distributed system failures.
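Here is a minimal three-state breaker. It counts consecutive failures rather than a true failure rate; a rate would need a sliding window of outcomes. After `reset_timeout` an open breaker lets one trial call through (half-open). Success closes it, and failure reopens it immediately. The class is not thread-safe, and `clock` is injected for testing.

```python
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, fallback=None, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half_open"          # allow one trial request
            elif fallback is not None:
                return fallback()
            else:
                raise CircuitOpenError("circuit is open")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            if fallback is not None:
                return fallback()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.state = "closed"
        self.failures = 0
```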

Design a retry mechanism supporting configurable backoff strategies (fixed delay, exponential, exponential with jitter), retry limits, and exception classification (retryable vs non-retryable). Implement retry context tracking for logging and metrics, support asynchronous retry scheduling, and discuss integration with circuit breakers. Handle idempotency concerns for retries, implement retry budget to prevent cascade failures, and provide patterns for retrying different operation types (HTTP calls, database transactions, message publishing).
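The core retry loop is shown below: exponential backoff capped at `max_delay`, optional "full jitter" (a uniform random delay between zero and the backoff), and classification by exception type. Only exceptions listed in `retryable` are retried; anything else propagates at once. `RetryableError` is a hypothetical marker class. `sleep` and `rand` are injectable so tests run instantly.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for transient failures."""


def retry(operation, max_attempts=5, base_delay=0.1, max_delay=10.0,
          retryable=(RetryableError,), jitter=True, sleep=time.sleep, rand=random.random):
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise                                  # retry budget exhausted
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            if jitter:
                delay = rand() * delay                 # full jitter
            sleep(delay)
```

Retrying is only safe for idempotent operations, or for operations made idempotent with something like a request key.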

Implement the command pattern supporting execute(), undo(), and redo() operations for a text editor or similar application. Each command encapsulates an action and its reversal, maintaining a command history stack. Support command composition (macro commands), memory management for large histories, and discuss strategies for commands that cannot be undone. Extensions include command serialization for persistence, collaborative editing with operation transformation, and command replay for debugging.
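Two stacks cover undo and redo: executing a new command clears the redo stack. A macro command undoes its children in reverse order. The sketch models a hypothetical text document with insert and delete commands. A bounded `deque` caps history memory by silently dropping the oldest entries.

```python
from collections import deque


class Document:
    def __init__(self):
        self.text = ""


class InsertCommand:
    def __init__(self, doc, pos, s):
        self.doc, self.pos, self.s = doc, pos, s

    def execute(self):
        t = self.doc.text
        self.doc.text = t[:self.pos] + self.s + t[self.pos:]

    def undo(self):
        t = self.doc.text
        self.doc.text = t[:self.pos] + t[self.pos + len(self.s):]


class DeleteCommand:
    def __init__(self, doc, pos, length):
        self.doc, self.pos, self.length = doc, pos, length
        self.removed = ""          # captured on execute so undo can restore it

    def execute(self):
        t = self.doc.text
        self.removed = t[self.pos:self.pos + self.length]
        self.doc.text = t[:self.pos] + t[self.pos + self.length:]

    def undo(self):
        t = self.doc.text
        self.doc.text = t[:self.pos] + self.removed + t[self.pos:]


class MacroCommand:
    def __init__(self, commands):
        self.commands = list(commands)

    def execute(self):
        for c in self.commands:
            c.execute()

    def undo(self):
        for c in reversed(self.commands):
            c.undo()


class History:
    def __init__(self, max_size=1000):
        self.undo_stack = deque(maxlen=max_size)
        self.redo_stack = []

    def execute(self, cmd):
        cmd.execute()
        self.undo_stack.append(cmd)
        self.redo_stack.clear()    # a new action invalidates the redo path

    def undo(self):
        if not self.undo_stack:
            return False
        cmd = self.undo_stack.pop()
        cmd.undo()
        self.redo_stack.append(cmd)
        return True

    def redo(self):
        if not self.redo_stack:
            return False
        cmd = self.redo_stack.pop()
        cmd.execute()
        self.undo_stack.append(cmd)
        return True
```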

Design an observer pattern implementation supporting attach(observer, priority), detach(observer), and notify(data) operations. Observers should be notified in priority order, with support for conditional notifications based on event types or data content. Implement weak references to prevent memory leaks, support asynchronous notification delivery, and discuss thread-safety for concurrent observer registration and notification. Extensions include event filtering, observer chaining, and integration with reactive programming patterns.

Implement a strategy pattern framework allowing runtime selection and composition of algorithms. Support registerStrategy(name, strategy), setStrategy(name), and composeStrategies(names) operations. Strategies should be interchangeable implementing common interfaces, with support for strategy parameters and state. Discuss strategies for strategy discovery, validation, and testing. Extensions include strategy versioning, A/B testing different strategies, and dynamic strategy loading from configuration or plugins.

Design a builder pattern for constructing complex objects with many optional parameters, supporting fluent API, validation at build time, and immutable result objects. Implement step-by-step construction with required field enforcement, default value handling, and object reuse through reset. Discuss trade-offs between builder complexity and object flexibility, support for nested builders for hierarchical objects, and integration with serialization frameworks for builder-based object creation from external data.

Implement a factory pattern with dynamic registration of product types, supporting createProduct(type, params), registerFactory(type, factory), and unregisterFactory(type). Factories should manage product lifecycle including initialization and cleanup, support dependency injection for product construction, and handle factory chaining for complex products. Discuss strategies for factory discovery, testing with mock factories, and integration with dependency injection frameworks for automatic factory registration.

Design a generic state machine supporting defineState(name), defineTransition(from, to, event, guard), trigger(event), and currentState() operations. Transitions should execute optional guard functions before allowing state changes and action functions after transitions. Support hierarchical states, parallel states, and history states for returning to previous configurations. Discuss state machine persistence, visualization through state diagrams, and applications in workflow engines, protocol implementations, and UI state management.
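The flat core of such a state machine maps each `(state, event)` pair to candidate transitions. The first transition whose guard accepts the event's context fires, and its action runs after the state changes. Method names in this sketch are snake_case versions of those in the question. Hierarchical, parallel and history states are not modeled.

```python
class StateMachine:
    def __init__(self, initial):
        self.states = {initial}
        self.state = initial
        self.transitions = {}   # (src, event) -> [(dst, guard, action), ...]

    def define_state(self, name):
        self.states.add(name)

    def define_transition(self, src, dst, event, guard=None, action=None):
        if src not in self.states or dst not in self.states:
            raise ValueError("define both states before the transition")
        self.transitions.setdefault((src, event), []).append((dst, guard, action))

    def trigger(self, event, **ctx):
        """Fire the first transition whose guard passes; return whether a transition happened."""
        for dst, guard, action in self.transitions.get((self.state, event), []):
            if guard is None or guard(ctx):
                src, self.state = self.state, dst
                if action is not None:
                    action(src, dst, ctx)   # action runs after the state change
                return True
        return False

    def current_state(self):
        return self.state
```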

Implement a chain of responsibility pattern where requests pass through a configurable sequence of handlers, each optionally processing and passing to next. Support addHandler(handler), removeHandler(handler), configureChain(config), and execute(request) operations. Handlers should be independently testable and composable, with support for short-circuiting and exception handling. Discuss handler ordering strategies, debugging chain execution, and applications in request processing pipelines, middleware frameworks, and event processing systems.

Design a mediator that coordinates communication between multiple components without direct coupling. Components register with the mediator, send messages through it, and receive only the messages that match their subscriptions. Implement message routing based on topics or content, support message transformation, and discuss asynchronous message delivery. Extensions include message persistence for replay, dead-letter handling for undeliverable messages, and integration with event sourcing patterns for audit trails and state reconstruction.
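
A synchronous, in-process sketch of topic-based routing with optional content filters, per-publish transformation, and a simple dead-letter list for messages no subscriber accepted. Asynchronous delivery and persistence are omitted, and all names are illustrative.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Optional, Tuple

Callback = Callable[[str, Any], None]
Filter = Callable[[Any], bool]


class Mediator:
    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Tuple[Callback, Optional[Filter]]]] = defaultdict(list)
        self.dead_letters: List[Tuple[str, Any]] = []

    def subscribe(self, topic: str, callback: Callback, content_filter: Optional[Filter] = None) -> None:
        self._subs[topic].append((callback, content_filter))

    def publish(self, topic: str, message: Any,
                transform: Optional[Callable[[Any], Any]] = None) -> int:
        """Deliver to matching subscribers; return how many received the message."""
        if transform is not None:
            message = transform(message)
        delivered = 0
        for callback, content_filter in self._subs.get(topic, []):
            if content_filter is None or content_filter(message):
                callback(topic, message)
                delivered += 1
        if delivered == 0:
            # Nobody accepted it: keep it for inspection or later replay.
            self.dead_letters.append((topic, message))
        return delivered
```

Components only ever hold a reference to the mediator, never to each other, which is what keeps them decoupled.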

Implement a decorator pattern framework that lets additional behaviors be composed onto objects dynamically at runtime. Support wrap(object, decorator), unwrap(object), and execute(operation) operations, where decorators can augment, replace, or extend object behavior. Discuss decorator ordering for conflicting behaviors, performance implications of deep decorator chains, and applications in I/O stream processing, UI component enhancement, and cross-cutting concerns like logging and caching.

Design an adapter pattern supporting conversion between incompatible interfaces in both directions. Implement adapt(sourceInterface, targetInterface) creating bidirectional adapters, handle method signature mismatches through parameter transformation, and discuss performance overhead of adapter chains. Extensions include automatic adapter generation through reflection, adapter registries for common interface pairs, and applications in legacy system integration and third-party library compatibility layers.

Implement a facade providing simplified interfaces to complex subsystems, supporting configureSubsystem(config), executeOperation(op, params), and getSubsystemStatus() operations. The facade should coordinate multiple subsystem components, handle error aggregation and translation, and provide consistent error handling. Discuss facade granularity trade-offs, extension points for customization, and applications in API gateways, service abstraction layers, and simplifying complex framework usage patterns.

Design a proxy pattern supporting lazy initialization of expensive objects, access control based on caller permissions, and operation logging. Implement createProxy(realSubject, options) with configurable behaviors like caching, rate limiting, and request validation. Discuss proxy transparency to clients, handling of proxy failures and fallback strategies, and applications in ORM lazy loading, security interceptors, and remote procedure call stubs.
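
A sketch combining lazy initialization, role-based access control, and operation logging, plus result caching. It assumes the proxy receives a zero-argument factory rather than a live subject so that construction can be deferred, and that method arguments are hashable so they can serve as cache keys; caching is only safe for side-effect-free methods.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


class Proxy:
    """Stands in for an expensive subject: lazy creation, role check, call log, result cache."""

    def __init__(self, factory: Callable[[], Any], allowed_roles: Iterable[str], cache: bool = True) -> None:
        self._factory = factory
        self._subject: Optional[Any] = None  # real subject is created on first use
        self._allowed = set(allowed_roles)
        self._cache_enabled = cache
        self._cache: Dict[Tuple[str, Tuple[Any, ...]], Any] = {}
        self.log: List[Tuple[str, str, str]] = []

    def call(self, caller_role: str, method: str, *args: Any) -> Any:
        if caller_role not in self._allowed:
            self.log.append((caller_role, method, "denied"))
            raise PermissionError(f"{caller_role} may not call {method}")
        key = (method, args)
        if self._cache_enabled and key in self._cache:
            self.log.append((caller_role, method, "cache-hit"))
            return self._cache[key]
        if self._subject is None:
            self._subject = self._factory()  # lazy initialization on first allowed call
        result = getattr(self._subject, method)(*args)
        if self._cache_enabled:
            self._cache[key] = result
        self.log.append((caller_role, method, "ok"))
        return result
```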

Implement a flyweight pattern for memory-efficient representation of many similar objects by separating intrinsic (shared) and extrinsic (per-instance) state. Support getFlyweight(intrinsicState) returning shared instances, createObject(flyweight, extrinsicState) creating full objects, and memory statistics tracking. Discuss thread-safety of shared flyweights, cleanup of unused flyweights, and applications in text editors (character formatting), game development (particle systems), and graphical rendering (repeated shapes).
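
A text-editor flavored sketch, assuming a hypothetical `Glyph` (font, size, bold) as the intrinsic state and the character plus its position as extrinsic state. Shared glyphs are immutable, so reading them from many threads is safe; the lock only guards pool creation and the request counter.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Glyph:
    """Intrinsic state shared by every character with the same formatting."""
    font: str
    size: int
    bold: bool


@dataclass
class Character:
    """Extrinsic, per-instance state plus a reference to the shared flyweight."""
    char: str
    row: int
    col: int
    glyph: Glyph


class GlyphFactory:
    def __init__(self) -> None:
        self._pool: Dict[Tuple[str, int, bool], Glyph] = {}
        self._lock = threading.Lock()
        self.requests = 0

    def get_flyweight(self, font: str, size: int, bold: bool = False) -> Glyph:
        key = (font, size, bold)
        with self._lock:  # guards pool creation and the request counter
            self.requests += 1
            glyph = self._pool.get(key)
            if glyph is None:
                glyph = self._pool[key] = Glyph(font, size, bold)
            return glyph

    def stats(self) -> Dict[str, int]:
        return {"requests": self.requests, "shared_instances": len(self._pool)}


def create_object(glyph: Glyph, char: str, row: int, col: int) -> Character:
    return Character(char, row, col, glyph)
```

Replacing the pool dict with `weakref.WeakValueDictionary` is one way to let unused glyphs be garbage-collected automatically.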

Q1. Distributed Key-Value Store With Durability Guarantees

Design a durable key-value store system supporting put, get, and delete operations with strong consistency guarantees across multiple nodes. Address data partitioning using consistent hashing, replication strategies for fault tolerance (leader-follower, quorum-based writes), and write-ahead logging for durability. Discuss trade-offs between availability and consistency during network partitions, compaction strategies for storage efficiency, and how to handle hot keys causing uneven load distribution across the cluster.
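
The consistent-hashing part can be sketched as a ring with virtual nodes, assuming SHA-256 as the hash and returning the first `replicas` distinct physical nodes clockwise from the key as its replica set. Removing a node only reassigns the keys that node owned, which is the property that limits data movement on membership changes.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() of str is randomized per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[int] = []        # sorted virtual-node positions
        self._owner: Dict[int, str] = {}  # virtual-node position -> physical node

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            h = _hash(f"{node}#{i}")
            bisect.insort(self._ring, h)
            self._owner[h] = node

    def remove_node(self, node: str) -> None:
        for i in range(self._vnodes):
            h = _hash(f"{node}#{i}")
            self._ring.remove(h)  # O(n) per removal; acceptable for a sketch
            del self._owner[h]

    def nodes_for(self, key: str, replicas: int = 3) -> List[str]:
        """Walk clockwise from the key's position, collecting distinct physical nodes."""
        if not self._ring:
            return []
        wanted = min(replicas, len(set(self._owner.values())))
        result: List[str] = []
        idx = bisect.bisect(self._ring, _hash(key))
        while len(result) < wanted:
            node = self._owner[self._ring[idx % len(self._ring)]]
            if node not in result:
                result.append(node)
            idx += 1
        return result
```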

Design a streaming data pipeline processing millions of events per second from multiple sources (IoT devices, application logs, user interactions). Address data ingestion using Kafka or similar message queues, stream processing with windowed aggregations, late-arrival handling through watermarks, and exactly-once processing semantics. Discuss state management for aggregations, scaling strategies for varying load, fault tolerance through checkpointing, and integration with downstream systems like data lakes and real-time dashboards.
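
Windowed aggregation with watermarks can be sketched in a single process, assuming integer event timestamps, fixed-size tumbling windows, and a fixed allowed lateness. Production engines such as Flink or Spark Structured Streaming also checkpoint this state; here, events for windows that have already been emitted are simply collected in a side list.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class TumblingWindowCounter:
    """Counts events per key in fixed-size event-time windows [start, start + size).

    The watermark trails the largest event time seen by `allowed_lateness`;
    a window is emitted once the watermark reaches its end, and events for
    already-closed windows go to `late_events` instead.
    """

    def __init__(self, window_size: int, allowed_lateness: int) -> None:
        self.size = window_size
        self.lateness = allowed_lateness
        self.max_event_time = float("-inf")
        self.counts: DefaultDict[Tuple[int, str], int] = defaultdict(int)
        self.late_events: List[Tuple[int, str]] = []

    @property
    def watermark(self) -> float:
        return self.max_event_time - self.lateness

    def process(self, event_time: int, key: str) -> List[Tuple[int, str, int]]:
        """Ingest one event; return any (window_start, key, count) results now final."""
        start = event_time - event_time % self.size
        if start + self.size <= self.watermark:
            self.late_events.append((event_time, key))  # its window is already closed
            return []
        self.counts[(start, key)] += 1
        self.max_event_time = max(self.max_event_time, event_time)
        return self._emit_closed()

    def _emit_closed(self) -> List[Tuple[int, str, int]]:
        closed = sorted(wk for wk in self.counts if wk[0] + self.size <= self.watermark)
        return [(start, key, self.counts.pop((start, key))) for start, key in closed]
```

With `window_size=10` and `allowed_lateness=5`, an event at time 3 arriving after one at time 12 is still counted, because the watermark (7) has not yet passed the end of window `[0, 10)`.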

Design a distributed file storage system similar to HDFS or S3 supporting large file storage with replication or erasure coding for durability. Address file chunking strategies, metadata management for file-to-block mappings, data placement policies across racks and datacenters, and consistency models for concurrent writes. Discuss repair mechanisms for failed nodes, bandwidth-efficient rebalancing during cluster changes, and optimization for both throughput-oriented batch workloads and latency-sensitive random reads.

Design an API gateway handling millions of requests per second providing authentication, authorization, rate limiting, request routing, and response transformation. Address distributed rate limiting using Redis or similar stores, JWT-based authentication with key rotation, circuit breaker patterns for backend protection, and request/response caching. Discuss multi-tenant support with isolated quotas, API versioning strategies, observability through distributed tracing, and graceful degradation during backend service failures.
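
A minimal single-threaded circuit breaker sketch with the usual CLOSED, OPEN, and HALF_OPEN states. The thresholds and the injectable `clock` parameter (useful for deterministic tests) are illustrative assumptions; a concurrent gateway would also need to limit how many trial calls are let through in HALF_OPEN.

```python
import time
from typing import Any, Callable


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without contacting the backend."""


class CircuitBreaker:
    """CLOSED -> OPEN after `failure_threshold` consecutive failures;
    OPEN -> HALF_OPEN once `reset_timeout` seconds have passed; a trial call
    in HALF_OPEN closes the circuit on success or re-opens it on failure."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "OPEN":
            if self._clock() - self._opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # let one trial request through
            else:
                raise CircuitOpenError("backend circuit is open; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._failures = 0
        self.state = "CLOSED"
        return result

    def _on_failure(self) -> None:
        self._failures += 1
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()
```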

Design a distributed job scheduler orchestrating complex data pipelines with task dependencies, resource constraints, and failure recovery. Address DAG representation and validation, task scheduling considering data locality and resource availability, speculative execution for straggler mitigation, and checkpoint-based recovery. Discuss integration with cluster managers (Kubernetes, YARN), priority-based scheduling for multi-tenant clusters, backfill capabilities for historical data processing, and monitoring for pipeline health and SLA compliance.
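
DAG validation can be sketched with Kahn's algorithm, which yields a valid execution order in O(V + E) time or reports that some tasks can never run because of a cycle. The dictionary input format (task name mapped to the tasks it depends on) is an assumption.

```python
from collections import deque
from typing import Dict, Iterable, List


def topo_order(deps: Dict[str, Iterable[str]]) -> List[str]:
    """Return tasks in an order where every task follows all of its dependencies.

    Raises ValueError on references to undefined tasks or on cycles.
    """
    indegree = {t: 0 for t in deps}
    children: Dict[str, List[str]] = {t: [] for t in deps}
    for task, parents in deps.items():
        for p in parents:
            if p not in deps:
                raise ValueError(f"{task} depends on undefined task {p}")
            indegree[task] += 1
            children[p].append(task)
    # Start with tasks that have no dependencies (sorted for a deterministic order).
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order: List[str] = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(deps):
        blocked = sorted(t for t, d in indegree.items() if d > 0)
        raise ValueError(f"cycle detected; blocked tasks: {blocked}")
    return order
```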

Design a real-time analytics system serving dashboards with sub-second query latency over billions of events. Address data ingestion through streaming pipelines, pre-aggregation at multiple time granularities, columnar storage for efficient queries, and caching strategies for common queries. Discuss handling concurrent users with personalized filters, incremental updates for live metrics, query optimization for ad-hoc analysis, and cost optimization through data tiering between hot and cold storage.

Design a distributed caching layer reducing database load for read-heavy workloads with billions of requests daily. Address cache partitioning and replication strategies, cache invalidation policies (TTL, write-through, write-behind), and consistency models between cache and database. Discuss handling cache stampedes through request coalescing, warm-up strategies after deployments, multi-level caching (application, distributed, CDN), and monitoring cache hit rates and eviction patterns for optimization.

Design an event sourcing system capturing all state changes as immutable events enabling audit trails, temporal queries, and state reconstruction. Address event storage optimization for high write throughput, event versioning and schema evolution, snapshot strategies for efficient state reconstruction, and event stream processing for projections. Discuss handling event ordering in distributed systems, compaction for storage efficiency, replay capabilities for debugging, and integration with CQRS patterns for read-optimized query models.

Design a globally distributed database supporting active-active replication across multiple regions with conflict resolution for concurrent writes. Address partition strategies for geographic distribution, conflict detection and resolution (last-write-wins, CRDTs, application-defined), and consistency trade-offs for cross-region reads. Discuss failover procedures for regional outages, data sovereignty compliance for regional data residency, replication lag monitoring, and optimization for regional affinity where users primarily access their nearest region.
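
As an illustration of the last-write-wins option, a register can be sketched as a tiny state-based CRDT: its merge is commutative, associative, and idempotent, so replicas converge regardless of delivery order. The sketch assumes reasonably synchronized clocks and breaks timestamp ties by a replica id; it silently discards the losing concurrent write, which is the main trade-off versus CRDTs with richer merges or application-defined resolution.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class LWWRegister:
    value: Any
    timestamp: float
    replica_id: str

    def _stamp(self) -> Tuple[float, str]:
        # Ties on timestamp are broken by replica id so every replica picks the same winner.
        return (self.timestamp, self.replica_id)

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        """Keep whichever write has the larger (timestamp, replica_id) stamp."""
        return self if self._stamp() >= other._stamp() else other
```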

Design a microservices orchestration platform managing hundreds of services with automated deployment, scaling, and service-to-service communication. Address service discovery and registration, load balancing with health-aware routing, circuit breaker implementation, and distributed tracing integration. Discuss configuration management across environments, secrets management for sensitive data, canary deployment strategies, observability through metrics aggregation, and policy enforcement for security and compliance requirements.

Design a data lake architecture storing petabytes of structured and unstructured data supporting batch processing, interactive queries, and ML workloads. Address storage organization (bronze/silver/gold layers), metadata management and data cataloging, access control for multi-tenant environments, and data quality monitoring. Discuss partitioning strategies for query optimization, lifecycle policies for cost management, integration with compute engines (Spark, Presto), and governance for PII data handling and retention policies.

Design a real-time recommendation system serving personalized content to millions of users with millisecond latency requirements. Address feature computation from streaming and batch sources, model serving infrastructure with A/B testing support, candidate generation and ranking pipelines, and feedback loop for model improvement. Discuss handling cold-start scenarios for new users, diversity in recommendations to avoid filter bubbles, cache strategies for popular items, and infrastructure for online learning and model updates.

Design a log aggregation system collecting logs from thousands of servers supporting real-time search, alerting, and long-term retention. Address log ingestion at scale with compression, indexing strategies for fast search (inverted indices), retention policies with tiered storage, and alerting on log patterns. Discuss handling schema evolution in log formats, multi-tenant isolation for different teams, query optimization for complex searches, and integration with incident management systems for automated alerting.

Design a change data capture (CDC) system that captures database changes in real time and replicates them to downstream systems (data warehouse, cache invalidation, search indices). Address log parsing for different database types, ordering guarantees for dependent changes, schema evolution handling, and exactly-once delivery semantics. Discuss handling DDL changes, backfill strategies for historical data, lag monitoring and alerting, and transformation capabilities for data format conversion between source and destination systems.

Design a distributed session store enabling stateless application servers to share user session state with high availability and low latency. Address session partitioning strategies, replication for fault tolerance, expiration and cleanup policies, and session migration during failures. Discuss handling session stickiness requirements, security for session data encryption, scaling strategies for traffic spikes, and integration with authentication systems for session validation and refresh mechanisms.

Design a workflow orchestration system managing complex data pipelines with conditional branching, parallel execution, and error handling. Address workflow definition and versioning, task scheduling with resource awareness, dependency management between workflows, and retry policies with backoff. Discuss handling long-running workflows spanning days, checkpointing for progress persistence, manual intervention points for approvals, and monitoring for pipeline health with alerting on failures or SLA violations.

Design a content delivery network (CDN) distributing static and dynamic content globally with edge caching for low-latency delivery. Address content placement strategies across edge locations, cache invalidation mechanisms, an origin shield for protecting the backend, and request routing to optimal edges. Discuss handling cache misses efficiently, video streaming optimization with adaptive bitrate, security through DDoS protection and WAF integration, and analytics for cache performance and traffic patterns.

Design a time-series database optimized for ingesting and querying metrics from millions of sources with high cardinality. Address data compression for time-series patterns, downsampling strategies for long-term retention, query optimization for time-range aggregations, and retention policies. Discuss handling high-cardinality labels that cause an explosion in the number of series, real-time alerting on metric thresholds, integration with visualization tools, and multi-tenancy with isolated namespaces and resource quotas for different teams.
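
One common compression idea for timestamps is delta-of-delta encoding, popularized by Facebook's Gorilla time-series database: regularly spaced samples produce runs of zeros, which a variable-length bit encoding can then store very compactly. This sketch shows only the integer transform, not the bit packing.

```python
from typing import List


def encode_timestamps(ts: List[int]) -> List[int]:
    """Delta-of-delta encode: [t0, t1 - t0, then (t[i] - t[i-1]) - (t[i-1] - t[i-2]), ...]."""
    if len(ts) < 2:
        return list(ts)
    out = [ts[0], ts[1] - ts[0]]
    for i in range(2, len(ts)):
        out.append((ts[i] - ts[i - 1]) - (ts[i - 1] - ts[i - 2]))
    return out


def decode_timestamps(enc: List[int]) -> List[int]:
    """Invert encode_timestamps by re-accumulating deltas and then timestamps."""
    if len(enc) < 2:
        return list(enc)
    ts = [enc[0], enc[0] + enc[1]]
    delta = enc[1]
    for dod in enc[2:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts
```

For samples scraped every 10 seconds with a little jitter, `[1000, 1010, 1020, 1030, 1041, 1050]` encodes to `[1000, 10, 0, 0, 1, -2]`.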

Design a distributed lock service providing mutual exclusion across distributed nodes for coordination tasks like leader election and resource access. Address lock semantics (exclusive, shared, recursive), lease-based lock management with automatic expiration, and fairness policies for waiting lock requests. Discuss handling network partitions and split-brain scenarios, watch mechanisms for lock change notifications, performance optimization for high-contention locks, and integration with distributed transaction coordinators.

Design a batch processing platform handling large-scale ETL jobs processing terabytes of data daily with SLA guarantees. Address job scheduling with dependency resolution, resource allocation and isolation between jobs, data locality optimization, and speculative execution for stragglers. Discuss handling job failures with retry and recovery, data quality validation in pipelines, incremental processing for updated data, and cost optimization through spot instances and auto-scaling based on queue depth.

Design a stream processing platform supporting complex event processing with pattern detection, temporal joins, and stateful computations over event streams. Address event ordering across partitions, state backend optimization for large state, checkpoint mechanisms for fault tolerance, and scaling strategies for varying throughput. Discuss handling late events with watermarks, window management for different window types (tumbling, sliding, session), integration with external systems for sinks and sources, and SQL-like query interfaces for stream processing.

Design a distributed counter service handling billions of increment operations per second across multiple keys with approximate or exact counting. Address partition strategies for hot counters, eventual consistency vs strong consistency trade-offs, merge strategies for distributed increments, and memory-efficient data structures like HyperLogLog for unique counts. Discuss handling counter resets and overflows, read optimization for frequently accessed counters, and integration with rate limiting and analytics systems requiring real-time counts.
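
For the merge question, a PN-counter (a state-based CRDT) is one option: each replica increments only its own slot, so concurrent increments never conflict, and merge takes per-slot maxima. It is exact once replicas have exchanged state; the replica ids below are illustrative.

```python
from typing import Dict


class PNCounter:
    """Per-replica increment and decrement tallies; value = sum(incs) - sum(decs)."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.incs: Dict[str, int] = {}
        self.decs: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("use decrement for negative amounts")
        self.incs[self.replica_id] = self.incs.get(self.replica_id, 0) + n

    def decrement(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("use increment for negative amounts")
        self.decs[self.replica_id] = self.decs.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other: "PNCounter") -> None:
        # Slots only grow, so taking the max per slot is safe to repeat in any order.
        for mine, theirs in ((self.incs, other.incs), (self.decs, other.decs)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)
```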

Design a notification system delivering messages through multiple channels (email, SMS, push, in-app) to millions of users with delivery guarantees. Address message queuing for reliability, channel selection and prioritization, user preference management, and delivery tracking with retry logic. Discuss rate limiting per channel and recipient, template management for message content, A/B testing for message optimization, and analytics for delivery rates and engagement metrics across channels.

Design a distributed tracing system collecting and analyzing traces from microservices for debugging and performance optimization. Address trace sampling strategies for high-traffic systems, span data collection and aggregation, storage optimization for trace data, and query interfaces for trace exploration. Discuss integration with logging and metrics for unified observability, alerting on latency anomalies, service dependency mapping from traces, and privacy considerations for trace data containing sensitive information.

Design a feature store enabling consistent feature computation and serving for ML training and inference across the organization. Address feature registration and versioning, online feature serving with low latency, offline feature computation for batch training, and feature lineage tracking. Discuss handling feature transformations, point-in-time correctness for training data generation, feature sharing and discovery across teams, and monitoring for feature drift affecting model performance over time.

Design a search infrastructure for a large product catalog supporting full-text search, faceted filtering, and relevance ranking. Address index structure for efficient search, incremental index updates for catalog changes, query understanding for intent detection, and ranking optimization with ML models. Discuss handling synonyms and spelling corrections, personalization in search results, multi-language support, and scaling for peak traffic during sales events with cache strategies for popular queries.

Design a media processing pipeline handling upload, transformation, and delivery of images and videos at scale. Address upload optimization for large files, format conversion and thumbnail generation, quality optimization for different devices, and CDN integration for delivery. Discuss handling processing backlogs during peak uploads, cost optimization through intelligent format selection (WebP, AVIF), watermarking for content protection, and analytics for media usage and storage optimization.

Design an A/B testing platform enabling product teams to run experiments with statistical significance analysis and guardrail metrics. Address user assignment consistency across experiments, experiment configuration and rollout, metrics computation and statistical analysis, and early stopping for underperforming variants. Discuss handling multiple hypothesis testing, segmentation analysis for heterogeneous treatment effects, integration with feature flags for gradual rollouts, and self-service interfaces for experiment creation and monitoring.
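
Consistent assignment is often done by hashing the user id together with a per-experiment salt, so the same user always lands in the same variant without storing assignments, while different experiments stay independent. A sketch assuming integer variant weights, with the experiment name doubling as the salt:

```python
import hashlib
from typing import Dict


def assign_variant(user_id: str, experiment: str, weights: Dict[str, int]) -> str:
    """Deterministically map a user to a variant in proportion to integer weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % total  # stable bucket in [0, total)
    # Sort variants so the mapping does not depend on dict insertion order.
    for variant, weight in sorted(weights.items()):
        if bucket < weight:
            return variant
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below the total weight")
```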

Design an identity and access management (IAM) system managing user identities, authentication, and authorization across multiple applications and services. Address SSO integration with SAML/OIDC, MFA support, role-based and attribute-based access control, and credential management. Discuss handling compromised credential scenarios, session management across devices, audit logging for compliance, and integration with external identity providers for enterprise customers with existing identity infrastructure.

Design a data quality platform monitoring data pipelines for anomalies, schema changes, and data freshness issues. Address metric collection for data quality dimensions (completeness, accuracy, consistency, timeliness), alerting on quality degradation, and root cause analysis tools. Discuss handling expected data changes through schema evolution management, baseline establishment for anomaly detection, integration with data catalog for context, and remediation workflows for quality issue resolution.

Design a distributed rate limiting system protecting services from overload with per-user, per-service, and global limits at scale. Address distributed counter synchronization for accurate limiting, sliding window implementation, burst handling with token bucket algorithms, and hierarchical rate limits. Discuss handling rate limit bypass for premium users, graceful degradation vs hard rejection, observability for rate limit metrics, and configuration management for dynamic limit adjustments without deployments.
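
A single-node token bucket sketch with lazy refill, showing how bursts up to `capacity` coexist with a sustained `refill_rate`. A distributed version would keep the token count and last-refill timestamp in a shared store and update them atomically (for example in Redis) rather than in process memory; that part is not shown.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill based on elapsed time instead of running a background timer.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.refill_rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```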
