Shayon Mukherjee / blog
  • Linux Page Faults, mmap, and userfaultfd March 6, 2026

    How Linux demand-pages memory, what mmap does to physical pages, why replacing a mapping breaks direct memory access and shared memory, and how userfaultfd lets you lazily populate memory without destroying the mapping.

  • Let's discuss sandbox isolation February 21, 2026

    A dive into the spectrum of sandboxing and isolation, from Linux namespaces and gVisor to hardware-enforced microVMs and WebAssembly, and why picking the right boundary matters for multi-tenant workloads.

  • Understanding how GIL Affects Checkpoint Performance in PyTorch Training February 7, 2026

    A look at what Python's GIL is, why it makes thread-based async checkpoint saves counterproductive during PyTorch training, and why process-based async saves with pinned memory work better.

  • Software engineering when machine writes the code January 19, 2026

    The machines have become powerful enough to write code for us. What does this mean for those of us who love understanding how things work? And how do we keep learning when AI does the heavy lifting?

  • A hypothetical search engine on S3 with Tantivy and warm cache on NVMe November 10, 2025

    A simple architecture for BM25 search over object storage using immutable Tantivy shards, stateless indexers and query nodes, and local NVMe caching for sub-second queries.

  • Diwali October 27, 2025

    A personal reflection on Diwali: the festival of lights, the world I grew up in, and choosing light in darker times.

  • Mutable atomic deletes with Parquet backed columnar tables on S3 October 12, 2025

    Physically remove rows in Parquet on S3 with MPU + UploadPartCopy, copy-on-write objects, and a tiny CAS head to make it simple, fast, and compliant.

  • An MVCC-like columnar table on S3 with constant-time deletes October 4, 2025

    A thought experiment in building a Parquet-like columnar table on S3 with row-level deletes using conditional writes, tombstone files, and a single-object transaction pointer.

  • Exploring PostgreSQL to Parquet archival for JSON data with S3 range reads October 3, 2025

    Moving large JSON payloads from PostgreSQL TOAST tables to Parquet on S3 with deterministic sharding, row-group pruning, and range-based reads for millisecond point lookups.

  • Bypass PostgreSQL catalog overhead with direct partition hash calculations August 9, 2025

    Eliminating PostgreSQL catalog traversal overhead with local partition calculations for up to 20x faster hash partition queries.

  • Is AGI paradoxical? June 21, 2025

    If AI learns from human intelligence, can it ever truly transcend its origins? Is AGI a technical milestone or a philosophical mirage?

  • Pitfalls of premature closure with LLM assisted coding June 13, 2025

    When LLMs generate clean, professional-looking code, it's tempting to stop exploring alternatives. But therein lie the risks that come with premature closure. So what is premature closure?

  • Another look into PostgreSQL CTE materialization and non-idempotent subqueries May 4, 2025

    A follow-up exploration into PostgreSQL CTE materialization, diving deeper into why non-idempotent subqueries can execute multiple times, leading to unexpected results.

  • A PostgreSQL planner gotcha with CTEs DELETE and LIMIT April 29, 2025

    How a seemingly straightforward DELETE query using a CTE and LIMIT returned more rows than expected due to a query planner optimization.

  • Selective asynchronous commits in PostgreSQL - balancing durability and performance March 16, 2025

    Safely leverage PostgreSQL's asynchronous commit for significant performance gains.

  • Challenging AI generated code from first principles February 22, 2025

    While AI coding tools boost productivity, they're not a replacement for critical thinking. Taking the time to understand why something works (or breaks) and building strong mental models isn't just busywork.

  • Scaling with PostgreSQL without boiling the ocean February 9, 2025

    Practical scaling strategies for application developers who don't have a dedicated database team.

  • Database mocks are just not worth it December 30, 2024

    Testing against a real database uncovers hidden pitfalls that can appear as the application matures.

  • Using CTID Based Pagination for Data Cleanups in PostgreSQL October 29, 2024

    When dealing with very large PostgreSQL tables (we're talking 15 TB+), routine maintenance such as archiving very old data can become surprisingly challenging.

  • pg_easy_replicate Supports Schema Change Tracking During Logical Replication August 31, 2024

    This new capability extends PostgreSQL logical replication, enabling DDL tracking and bringing more flexibility to database migrations through pg_easy_replicate.

  • Stop Relying on IF NOT EXISTS for Concurrent Index Creation in PostgreSQL August 12, 2024

    When you use `IF NOT EXISTS` and re-run your index creation, the task can silently complete while leaving behind an invalid index.

  • The Tech Industry's Moral Vacuum July 21, 2024

    The dichotomy between the progressive ethos that once permeated the tech industry and the current political endorsements by its so-called elites suggests a community at a crossroads. It raises a fundamental question: what are the core values we hold dear, and how do they translate into our actions and legacies?

  • Use pg_easy_replicate for setting up Logical Replication and Switchover in PostgreSQL July 13, 2024

    pg_easy_replicate is a CLI orchestrator tool that makes the process of setting up and managing logical replication between PostgreSQL databases a breeze.

  • Fast, Simple and Metered Concurrency in Ruby with Concurrent::Semaphore May 27, 2024

    I explored various approaches and ended up with a worker pool model using a Semaphore. Here they are.

  • The value of sitting on an idea April 13, 2024

    Have you ever had a brilliant idea that you wanted to act on immediately? We've all been there, and it's tempting to jump right in. But what if I told you there's immense value in simply sitting on an idea?

  • Incidents and the requirement of slowing down March 29, 2024

    The urgency often obscures the fact that incidents cause more incidents. Incidents are not isolated events but links in a chain, each capable of setting off a cascade of further issues. The key lies in resisting these impulses, favoring a methodical exploration of safe, effective and reversible solutions.

  • Embracing the weeds March 9, 2024

    It's about being agile enough to shift direction based on detailed, early feedback, and being fearless in the face of potential failure, because every detail examined is a lesson learned.

  • 100x Faster Query in Aurora Postgres with a lower random_page_cost February 24, 2024

    Imagine looking for a specific book in a library. Reading through books sequentially is like a sequential scan in a database, while jumping directly to the desired book is like random access. At a very high level, `random_page_cost` reflects the relative cost of random access compared to sequential access in the database.

  • Shipping Fast Requires a High Degree of Trust January 7, 2024

    It means believing that each member will effectively handle their responsibilities, understanding that collective problem-solving is more powerful when issues arise, and recognizing that fast doesn't equate to reckless.

  • Introducing pg_easy_replicate 2.0 December 29, 2023

  • Do you really need Foreign Keys? December 21, 2023

    Foreign keys are a bit like that well-intentioned friend who insists on double-checking everything you do. They’re often recommended as a must-have for enforcing referential integrity checks in your database.

  • pg-osc: Zero downtime schema changes in PostgreSQL February 16, 2022

    pg-osc is a CLI tool for making non-blocking, zero-downtime schema changes in PostgreSQL.

  • Why I enjoy PostgreSQL - Infrastructure Engineer's Perspective January 17, 2022

  • Handling Network Failures in the Cloud May 9, 2020

    Network failures, especially transient ones, are a given.

  • Fetch current signal handlers without overriding in Ruby December 29, 2017

    One thing I noticed when inspecting signals is that there isn't really an easy way of doing so.

© 2026 Shayon Mukherjee