Ax's Findings

by Akshat Mahajan

distributed systems, programming languages, and the internals of operating systems

Read this first

What Can Distributed Systems Teach Us About Concurrent Coding?

Three important questions plagued the asker of this 2010 Stack Exchange question on designing concurrent systems:

  1. How do you figure out what can be made concurrent vs. what has to be sequential?
  2. How do you reproduce error conditions and view what is happening as the [concurrent] application executes?
  3. How do you visualize the interactions between the different concurrent parts of the application?

What is striking is that, seven years later, these problems have been mostly, if not fully, solved, at least for systems that are truly distributed across multiple machines. As I read the question, I wondered: can the less byzantine failures of ordinary concurrent code (in, say, web servers) be tackled in a similar way?

In this post, I would like to discuss these particular solutions and how they can be applied to writing resilient concurrent code.


Continue reading →

Useful System Tricks

You can’t debug production systems without a clever choice of tools. In this article, I’d like to talk about the ones I’ve found most useful.


From the strace man page:

In the simplest case strace runs the specified command until it exits. It intercepts and records the system calls which are called by a process and the signals which are received by a process.

Given a process ID, strace can tell you exactly what a process is doing at the kernel level. Think of it as a cheap tracing system. I was once able to identify a deadlock merely by watching a process this way.

This is useful if you a) don’t have distributed tracing to figure out what your application is currently doing, or b) don’t have enough logging.
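As a rough sketch of what attaching to a process looks like, the snippet below builds an strace command line for a given process ID. The helper name strace_attach_command and the PID 12345 are made up for illustration; the flags (-p, -f, -tt, -o) are standard strace options. It only prints the command rather than running it, since attaching usually needs the right permissions.

```python
import shlex


def strace_attach_command(pid, follow_children=True, output_path=None):
    """Build an argument list that attaches strace to a running process.

    Hypothetical helper for illustration; not part of any library.
    """
    cmd = ["strace", "-tt"]         # -tt: timestamp each line (with microseconds)
    if follow_children:
        cmd.append("-f")            # -f: also follow child processes
    if output_path is not None:
        cmd += ["-o", output_path]  # -o: write the trace to a file, not stderr
    cmd += ["-p", str(pid)]         # -p: attach to an existing process ID
    return cmd


# 12345 is a placeholder PID; substitute one taken from ps.
print(shlex.join(strace_attach_command(12345)))
# prints: strace -tt -f -p 12345
```

The returned list can be handed to subprocess.run as-is. A process that is stuck will typically show the same blocking system call sitting at the end of the trace, which is the kind of hint that points to a deadlock.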

A simple example with Chrome:

$ ps -eaf 
akshat    8174  3105  0 23:09 ?        00:00:00

Continue reading →

Consistent Hash Rings Explained Simply

Consistent hash rings are beautiful structures, yet often poorly explained. Implementations tend to focus on clever language-specific tricks, and theoretical approaches insist on befuddling the idea with math and irrelevant tangents.

This is an attempt at an explanation - and a Python implementation - accessible to an ordinary high-schooler.

Why Hash?

Fairly often, you need a way to take an item and get back another stored item. For instance, you may want to take a URL and get back the server the website is hosted on.

These cases can be handled by a map, which effectively acts like a phonebook - you look up names (or keys), and you get back information (or values) about them.

An index in a book, too, is a map - given a word, it can take you to the exact page where the word is referenced.
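In Python, the phonebook idea corresponds to a dict. A minimal sketch, assuming made-up URLs and server names (servers_by_url and lookup are hypothetical names chosen for this example):

```python
# Hypothetical data: the URLs and server names are invented for illustration.
servers_by_url = {
    "example.com": "server-a",
    "example.org": "server-b",
}


def lookup(url):
    """Return the server stored for a URL, or None if the URL is unknown."""
    return servers_by_url.get(url)


print(lookup("example.com"))      # prints: server-a
print(lookup("missing.example"))  # prints: None
```

The key (the URL) is what you look up; the value (the server) is what you get back, just like a name and a number in a phonebook.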

All a map is is a way to take something that can point to another item, and then return that

Continue reading →