Projects

Selected Work

Systems, machine learning infra, and high-performance computing work.

Featured

C++ · 2025-04 - 2025-09

Bustub Database Engine

Built a relational database engine in C++, implementing storage, indexing, SQL execution, and MVCC-based concurrency control, with RAII-based resource management and performance optimizations.

C++ · RAII · Concurrency · MVCC · Systems Programming

Implemented the buffer pool manager, B+ tree index, query execution engine, and transaction manager in C++17.

Built an LRU-K replacement policy for the page cache, a B+ tree index with latch crabbing, and RAII page guards that pin and unpin pages automatically.

Implemented MVCC-based transaction control to support concurrent, multi-threaded transaction execution.

Source
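
A minimal sketch of the LRU-K eviction rule used for the Bustub page cache, written in Python for brevity. The names `LRUKReplacer`, `record_access`, `set_evictable`, and `evict` are illustrative assumptions, not the project's C++ interface. A frame with fewer than K recorded accesses counts as infinitely distant and is evicted first (the earliest first access breaks ties); otherwise the frame whose K-th most recent access is oldest goes first.

```python
from __future__ import annotations

from collections import deque


class LRUKReplacer:
    """Toy LRU-K replacer (illustrative, not the project's C++ API)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.clock = 0                               # logical timestamp
        self.history: dict[int, deque[int]] = {}     # frame -> last k access times
        self.evictable: set[int] = set()

    def record_access(self, frame_id: int) -> None:
        self.clock += 1
        hist = self.history.setdefault(frame_id, deque(maxlen=self.k))
        hist.append(self.clock)                      # oldest entry drops off past k

    def set_evictable(self, frame_id: int, evictable: bool) -> None:
        # Only frames with recorded accesses can become eviction candidates.
        if evictable and frame_id in self.history:
            self.evictable.add(frame_id)
        else:
            self.evictable.discard(frame_id)

    def evict(self) -> int | None:
        """Remove and return the victim frame, or None if nothing is evictable."""
        victim, victim_key = None, None
        for fid in self.evictable:
            hist = self.history[fid]
            if len(hist) < self.k:
                # Fewer than k accesses: infinite backward distance, ranked above
                # every full-history frame; the earliest first access wins ties.
                key = (1, -hist[0])
            else:
                # Backward k-distance: now minus the k-th most recent access.
                key = (0, self.clock - hist[0])
            if victim_key is None or key > victim_key:
                victim, victim_key = fid, key
        if victim is not None:
            self.evictable.discard(victim)
            del self.history[victim]
        return victim
```

Keeping only the last K timestamps per frame (a `deque` with `maxlen=k`) is enough, because the backward K-distance depends only on the K-th most recent access.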

Python · 2024-03 - 2025-04

Machine Learning in Production (MLiP) Project for Ovarian Cancer Detection

Developed a self-hosted, distributed machine learning backend with a web interface, making ML models for early ovarian cancer detection accessible to medical professionals.

Python · Kubernetes · Redis · Celery · Postgres · S3

Designed a high-concurrency ML training platform using Flask and PostgreSQL with a Redis-backed queue.

Implemented a Master-Worker architecture using Celery for asynchronous task distribution.

Applied SOLID principles and Clean Architecture to keep the system maintainable from MVP to production.

C · 2022-07 - 2022-11

Web Proxy with Cache

Implemented a multithreaded web proxy in C that handles HTTP requests from browsers and forwards them to web servers.

C · Unix Sockets · TCP/IP · POSIX Threads · LRU · LFU

Used the Unix sockets API over TCP/IP for communication between clients, the proxy, and web servers.

Implemented concurrency with POSIX Threads and the producer-consumer pattern for safe multithreading.

Designed a cache with LRU and LFU replacement policies to serve repeated requests locally and manage limited cache space.

Source
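
The producer-consumer pattern behind the proxy's worker threads can be sketched as a bounded buffer: the accept loop puts connection handles in, and worker threads take them out, blocking when the buffer is empty. This Python version is an illustrative stand-in, using a `threading.Condition` instead of the C pthreads primitives and integers instead of socket descriptors; `BoundedBuffer` and `run_demo` are hypothetical names.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Fixed-capacity FIFO shared by one producer and several consumers."""

    def __init__(self, capacity: int) -> None:
        self.items = deque()
        self.capacity = capacity
        self.cond = threading.Condition()

    def put(self, item) -> None:
        with self.cond:
            while len(self.items) >= self.capacity:
                self.cond.wait()            # buffer full: block the producer
            self.items.append(item)
            self.cond.notify_all()          # wake consumers waiting for work

    def get(self):
        with self.cond:
            while not self.items:
                self.cond.wait()            # buffer empty: block the consumer
            item = self.items.popleft()
            self.cond.notify_all()          # wake a producer waiting for space
            return item


def run_demo(n_workers: int = 4, n_requests: int = 20) -> list:
    buf = BoundedBuffer(capacity=4)
    handled = []
    handled_lock = threading.Lock()

    def worker() -> None:
        while True:
            req = buf.get()
            if req is None:                 # sentinel: shut this worker down
                return
            with handled_lock:
                handled.append(req)         # stand-in for serving one request

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for req in range(n_requests):           # producer: hypothetical connection ids
        buf.put(req)
    for _ in threads:                       # one sentinel per worker, after all work
        buf.put(None)
    for t in threads:
        t.join()
    return sorted(handled)
```

Because the buffer is FIFO and the sentinels are queued after every request, each request is handled before any worker exits.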

CUDA

Advection Solver

Built a high-performance advection solver with a shared-memory CPU implementation in OpenMP and a GPU implementation in CUDA.

C · CUDA · OpenMP · Parallel Computing · Performance Optimization

Developed a parallel CPU implementation with OpenMP using domain decomposition and loop optimizations.

Implemented GPU kernels with CUDA and tuned block and grid configurations for performance.

Compared performance against the serial implementation to evaluate CPU and GPU speedups.

Source
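
For reference, a serial update of the kind the parallel versions are compared against. This sketch assumes a 1-D, first-order upwind scheme with periodic boundaries and a positive velocity, which may differ from the scheme the project actually uses; `upwind_step` and `advect` are hypothetical names. Every new cell value reads only the previous array, so the loop has no cross-iteration dependencies, which is what lets it be split across OpenMP threads or mapped to one CUDA thread per cell.

```python
from __future__ import annotations


def upwind_step(u: list[float], courant: float) -> list[float]:
    """One first-order upwind step for u_t + c*u_x = 0 with c > 0 on a periodic
    1-D grid, where courant = c*dt/dx (illustrative scheme choice)."""
    if not 0.0 <= courant <= 1.0:
        raise ValueError("CFL condition violated: need 0 <= c*dt/dx <= 1")
    # u[i - 1] with i == 0 is u[-1], which gives the periodic wrap-around.
    # Each new value reads only the old array, so iterations are independent.
    return [u[i] - courant * (u[i] - u[i - 1]) for i in range(len(u))]


def advect(u: list[float], courant: float, steps: int) -> list[float]:
    """Apply `steps` time steps serially (the baseline for speedup comparisons)."""
    for _ in range(steps):
        u = upwind_step(u, courant)
    return u
```

With `courant = 1.0` the scheme shifts the field exactly one cell per step, and with periodic boundaries the total mass stays constant, which makes both properties convenient checks when validating a parallel version against the serial one.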

C++

NVMStore

A C++ prototype that aggregates concurrent insert operations through thread-local buffers and flushes them into sequential writes optimized for non-volatile memory.

C++ · Concurrency · Lock-free · Thread-local Buffers · NVM

Uses thread-local buffers and a lock-free linked list to avoid a global insertion bottleneck.

Batches buffered key-value entries into sequential writes to better match NVM performance characteristics.

Flushes concurrent inserts safely with per-buffer draining and a serialized write path.

Source
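
A simplified Python sketch of the buffering scheme described for NVMStore. It assumes a plain list as the stand-in for the sequential NVM region and uses a lock-protected registry in place of the lock-free linked list, since pure Python offers no portable compare-and-swap. `BufferedStore`, `insert`, and `flush` are hypothetical names. Each thread appends to its own buffer; a full buffer, or an explicit `flush()`, drains it as one batch through a single serialized write path.

```python
from __future__ import annotations

import threading


class _ThreadBuffer:
    """Per-thread insert buffer; its lock is contended only during a flush."""

    def __init__(self) -> None:
        self.items: list = []
        self.lock = threading.Lock()


class BufferedStore:
    """Thread-local buffering plus a serialized batch write (illustrative).
    `self.log` is a plain list standing in for a sequential NVM region."""

    def __init__(self, batch_size: int = 64) -> None:
        self.batch_size = batch_size
        self._local = threading.local()
        self._buffers: list[_ThreadBuffer] = []
        self._registry_lock = threading.Lock()  # stand-in for the lock-free list
        self._write_lock = threading.Lock()     # serializes the write path
        self.log: list = []

    def _my_buffer(self) -> _ThreadBuffer:
        buf = getattr(self._local, "buf", None)
        if buf is None:                         # first insert from this thread
            buf = _ThreadBuffer()
            self._local.buf = buf
            with self._registry_lock:
                self._buffers.append(buf)
        return buf

    def insert(self, key, value) -> None:
        buf = self._my_buffer()
        with buf.lock:
            buf.items.append((key, value))
            full = len(buf.items) >= self.batch_size
        if full:
            self._drain(buf)

    def _drain(self, buf: _ThreadBuffer) -> None:
        # Hold the buffer lock while writing so each thread's batches reach the
        # log in order; the write lock is never held while taking a buffer lock.
        with buf.lock:
            batch, buf.items = buf.items, []
            if batch:
                with self._write_lock:
                    self.log.extend(batch)      # one sequential append per batch

    def flush(self) -> None:
        with self._registry_lock:
            buffers = list(self._buffers)
        for buf in buffers:
            self._drain(buf)
```

Locks are always taken in the order buffer lock, then write lock, so the drain path cannot deadlock, and buffers outlive their threads so `flush()` can still drain them after the writers finish.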

Go

Tiny MapReduce

A Go implementation of a tiny MapReduce framework with coordinator/worker processes, plugin-based map and reduce jobs, and tests for fault tolerance and parallel execution.

Go · RPC · Plugins · Distributed Systems · Concurrency

Implements a coordinator/worker MapReduce pipeline with plugin-based map and reduce functions.

Includes parallel timing and crash-recovery tests to validate worker coordination and resilience.

Provides sequential and distributed runners to compare outputs and verify correctness.

Source
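
The map, partition, and reduce flow of Tiny MapReduce can be sketched with a sequential runner, similar in spirit to comparing sequential and distributed outputs. This Python version is a stand-in for the Go code: `map_fn` and `reduce_fn` play the role of hypothetical word-count plugins, `run_mapreduce` is a hypothetical name, and CRC-32 is an assumed choice of stable partition hash.

```python
from __future__ import annotations

import zlib
from collections import defaultdict
from typing import Callable


def map_fn(filename: str, contents: str) -> list[tuple[str, str]]:
    """Word-count map plugin (hypothetical): emit (word, "1") per word."""
    words = "".join(c if c.isalpha() else " " for c in contents).split()
    return [(w, "1") for w in words]


def reduce_fn(key: str, values: list[str]) -> str:
    """Word-count reduce plugin (hypothetical): number of occurrences."""
    return str(len(values))


def run_mapreduce(inputs: dict[str, str], n_reduce: int,
                  mapf: Callable[[str, str], list[tuple[str, str]]],
                  reducef: Callable[[str, list[str]], str]) -> dict[str, str]:
    # Map phase: one map task per input file. Each emitted pair goes to the
    # bucket chosen by a stable hash of its key, so a key always lands in
    # exactly one reduce task.
    buckets = [defaultdict(list) for _ in range(n_reduce)]
    for name, text in inputs.items():
        for key, value in mapf(name, text):
            buckets[zlib.crc32(key.encode()) % n_reduce][key].append(value)
    # Reduce phase: one reduce task per bucket, keys visited in sorted order.
    # A coordinator would hand these tasks to workers instead of looping here.
    output = {}
    for bucket in buckets:
        for key in sorted(bucket):
            output[key] = reducef(key, bucket[key])
    return output
```

Because each key hashes to exactly one reduce bucket, the result should not depend on `n_reduce`; that invariant is a cheap correctness check when comparing a sequential run against a distributed one.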