
🏡

Part of rnjn.in

Recent content on 🏡

Make Certificate Expiry Boring

On 18 November 2025, GitHub had an hour-long outage that affected the heart of their product: Git operations. The post-incident summary was brief and honest - the outage was triggered by an internal TLS certificate that had quietly expired, blocking service-to-service communication inside their platform. It’s the kind of issue every engineering team knows can happen, yet it still slips through because certificates live in odd corners of a system, often far from where we normally look.
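One way to keep expiry from being a surprise is to compute the days left on every certificate you can reach and alert well before zero. Here is a minimal sketch using only Python's standard library. The `warn_days` threshold, the function names and the sample dates are assumptions for illustration, not details from the incident.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left before a certificate's notAfter timestamp.

    `not_after` uses the text format found in the dict returned by
    ssl.SSLSocket.getpeercert(), e.g. "Dec 15 00:00:00 2025 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since epoch
    return (expires_at - now.timestamp()) / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    # warn_days is a hypothetical alerting threshold; pick one that
    # leaves enough time to rotate the certificate calmly.
    return days_until_expiry(not_after, now) <= warn_days


# Illustrative values only.
sample_not_after = "Dec 15 00:00:00 2025 GMT"
sample_now = datetime(2025, 11, 25, tzinfo=timezone.utc)
print(days_until_expiry(sample_not_after, sample_now))
print(needs_renewal(sample_not_after, sample_now))
```

Passing `now` in, rather than reading the clock inside the function, keeps the check easy to test.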

https://rnjn.in/articles/make-certificate-expiry-boring/
Guardrails Over Governance

I’ve worked in teams at both ends of the spectrum — teams that follow no governance at all, and teams slowed down by excessive processes. Neither scales well. The right answer, I’ve learned, isn’t more process or less process, but different kinds of control.

As organisations grow, leaders often add process to create predictability. It feels safer — everyone knows what to do, how to deploy, when to ask for approval. But the hidden cost is loss of judgment and agency. People forget they’re supposed to make decisions. They follow the process because it’s easier to justify compliance than to exercise discretion.

https://rnjn.in/articles/guardrails-over-governance/
Don't Outsource Reliability

Riya’s startup had grown faster than she expected. Ten engineers, a product manager, and Arjun, the infra specialist everyone called “the devops.” Most days were a blur of feature work, demos, and customer requests. When production slowed down, someone would message the infra guy to “add more CPU.” When dashboards broke, they tagged him again.

The pattern felt harmless at first — everyone was busy, and he was good at unblocking things. But six months in, the signs were harder to ignore. Deployments kept getting slower. Metrics went missing. The database bill had doubled. The infra guy stopped joining standups, saying he was “catching up on tickets.”

https://rnjn.in/articles/dont-outsource-reliability/
Effective Warroom Management

Incidents are inevitable. What separates resilient organizations from the rest is not whether they experience incidents, but how effectively they respond when problems arise. A well-structured war room process can mean the difference between a minor disruption and a major crisis.

After managing hundreds of critical incidents across my career, I’ve distilled my key learnings into this guide. These battle-tested practices have repeatedly proven their value in high-pressure situations.

https://rnjn.in/articles/effective-warroom-management/
On Unified Observability

Observability Tax

Last month, I watched a senior engineer spend three hours debugging what should have been a fifteen-minute problem. The issue wasn’t complexity—it was context switching between four different monitoring tools, correlating timestamps manually, and losing their train of thought every time they had to log into yet another dashboard. If this sounds familiar, you’re not alone. This is the hidden tax most engineering teams pay without realizing there’s a better way.

https://rnjn.in/articles/unified-observability/
Observability Theatre

Theatre

the·a·tre (also the·a·ter) /ˈθiːətər/ noun

: the performance of actions or behaviors for appearance rather than substance; an elaborate pretense that simulates real activity while lacking its essential purpose or outcomes

Example: “The company’s security theatre gave the illusion of protection without addressing actual vulnerabilities.”

Your organization has invested millions in observability tools. You have dashboards for everything. Your teams dutifully instrument their services. Yet when incidents strike, engineers still spend hours hunting through disparate systems, correlating timestamps manually, and guessing at root causes. When the CEO forwards a customer complaint asking “are we down?”, that’s when the dev team gets to know about incidents.

https://rnjn.in/articles/observability-theatre/
Risks of Cross Functional Team Structure

A PoD structure enables cross-functional teams, that is, teams that comprise different specialisations (engineer, product manager, data scientist, etc.), to work independently and execute on a business goal without dependencies on other teams. This type of structure increases execution speed but is not without risks. Some that I have encountered are -

Division of work

The key to executing fast is independence in decision-making for the area of work, and not having to wait on other teams to complete a team’s work. So the biggest risk to succeeding with this kind of team structure is dividing up work in a way that makes teams step on each other’s toes and depend on other teams to complete their goals. For a small company, dividing work cleanly isn’t really hard, but for larger groups, you have to work hard to maintain independence. A mistake here renders the whole structure useless.

https://rnjn.in/articles/risks-of-cross-functional-team-structure/
How do you (technically) find if you are overstaffed or understaffed?

Staffing is relative to expectations. In a growth mood, you may want to build a lot of features and stretch, so you may feel understaffed. If you are in the mood for conservation, you would want to maximise efforts that make you money in the next 3-6 months, so a lot of the work being done may feel redundant and you may feel overstaffed. Since this is a complex problem, we need to look at the levers.

https://rnjn.in/articles/how-to-find-if-you-are-overstaffed-or-understaffed/
Product Engineering TODOs for Startup Leaders

Oct 7th, 2022

Series A funds are generally used to grow the business, often to scale as a startup finds product-market fit (PMF). Often, startups slow down and fail 3-4 years after this stage because of decisions made during it, especially when it comes to product engineering. Here are some things I think founders and engineering leaders should care about, and set straight at this stage if they haven’t started already. The cost of fixing things grows with time, akin to Boehm’s law.

https://rnjn.in/articles/product-engineering-todos-for-series-a-leaders/
The Ringelmann Effect

Sep 23rd, 2022

Lately I have been reading a bit about group dynamics, and I came across something very obvious yet not talked about in simple terms - the effect of group size on individual productivity. Back in 1913, a French agricultural engineer Max Ringelmann discovered what we now know as the Ringelmann Effect - the tendency for individual members of a group to become increasingly less productive as the size of their group increases. Basically, as the size of a team grows, more and more people slack and loaf about at their job.

https://rnjn.in/articles/the-ringelmann-effect/
Learnings from implementing OKRs

OKRs are theoretically quite good for planning and communicating organisational goals and checking progress. The ideas are very simple and draw on empirical wisdom from successful companies like Intel and Google, so in the absence of a pre-existing system, and for new companies, it’s very easy to start using the methodology. OKRs also resonate well with teams that have driven progress iteratively. However, there’s a distinct lack of literature on failures, and on what it takes to get into a good rhythm of planning and execution using OKRs; the available literature mostly does a good job of selling. I would like to try to bridge this gap here, in this live document. This isn’t a primer or an introduction, so it is best if you read a decent text like Doerr’s, and then read this as a supplement.

https://rnjn.in/articles/learnings-about-okrs/
Getting Back to Work Post Paternity Leave

Jul 6th, 2020

A good friend became a father recently, and wanted to get some free advice about what to expect when he got back to work.

Hey,

If you’re reading this because you are getting back to work after your paternity leave, congratulations mate, well done! However, I do not envy you at all, practically speaking. Your life has changed dramatically in the last 2-3 months, you are barely keeping your eyes open, and hoping people around stop advising you on everything and also the opposite. And, I hope, you are also a bit anxious and worried about how best you can support your spouse through this time - as men, we go through nothing compared. These are difficult times, no doubt.

https://rnjn.in/articles/getting-back-to-work-post-paternity-leave/
Boehm's Law

Barry Boehm was a visionary, and his research and conclusions laid the foundation for methods (e.g. XP) to build software efficiently. One of his many seminal works is the law that he proved (with data): the cost of finding and fixing a defect grows exponentially with time.

Some of the many practices that engineers follow today address the effects of the law -

https://rnjn.in/glossary/boehms-law/
If It Hurts, Do It More Often

An oft-noted tenet in the world of programming, this principle is the basis of continuous integration and many more things (I tried to dig a bit into the origins, but didn’t find much success).

Remember, “often” is used here as a relative measure of time, and could be replaced by a unit which makes sense -

  • test integrations on every change (Integrations are hard)
  • standup for 10 minutes every day instead of an hour at the end of the week (Alignments are hard)
  • run performance reviews monthly (Perf reviews are hard)
  • run OKR cadences monthly (Distributed work is hard)
  • break all your user stories to 1 pointers (Estimating big stories is hard)
  • pair program (Code reviews are hard)
  • build and test changes incrementally (Boehm’s law)

As you can see, the value of following this tenet is high when complexity grows exponentially with time.

https://rnjn.in/glossary/if-it-hurts-do-it-more-often/
Iterative Improvement

Iterative improvement is the basis of most modern project management principles (agile/xp/scrum..). It follows the Scientific Method. The steps are simple and intuitive, indeed they follow how we learn most things - by trying and failing and trying again. So in case you have a system that you want to improve -

  • Experiment - formalise and execute the change you expect to make in the system
  • Observe - Observe the effect(s) of said change
  • Course Correct - From these observations, find out if changes in the experiments are needed, and then make them.

and repeat, till you are satisfied with the desired improvement. This is it, this is the basis of a virtuous improvement cycle. To improve any system - software or process or cooking, just follow the 3 step plan.

https://rnjn.in/glossary/iterative-improvement/
Goal based performance and progress tracking

Jul 5th, 2020

In general, employee performance cycles are considered a painful experience for most participants. There are many reasons for this -

  • No expectation setting
  • Generic and high level expectation setting
  • Role definitions and Job descriptions do not encompass everything that an employee does
  • Frequent shifting in expectations
  • Long performance review cycles
  • Lack of accountability of evaluators
  • Lack of documented progress through the cycle
  • Quality of reviewers is not monitored
  • Batching of reviews causes crammed up schedules and reduces individual attention

All these (and more) add up to employees getting surprised during reviews, and build frustration overall, even in organisations that take pride in how they treat their employees. A lot of work has gone into this field, and yet, there’s consensus on only one actionable measure - that reactive performance and progress measurement is not a healthy option. There are various methods that would allow you to get better on this front; I share one such approach below.

https://rnjn.in/articles/set-specific-goals-and-iterate/
About

Hullo there!

My name is Ranjan, you may find my work bio here. I am currently building reliability-focused tools at base14. I also help and coach founders and engineering leaders who have found PMF and are aspiring to scale their engineering teams.

More about base14

Nilakanta, Irfan and I have been building base14 since late 2024. Our mission is to help engineering teams build and operate reliable software systems without the complexity and toil that typically come with it. The first time I worked with these two was way back in 2006, at ThoughtWorks. We have been friends and advisors to each other since then. We are a small, focused team that believes in building great products that solve real problems for our customers.

https://rnjn.in/about/
Books that help understanding and building productive teams

Apr 27th, 2020

A very dear friend asked me the other day - what is the best book you’ve read about building teams? I thought it best to take time to answer this. I am not an expert at either, reading books or building teams.


In the last 4-5 years, I have worked with and managed teams that have been very productive and are still together. None of this is because of a book, or because of me. And no one book alone has helped me understand or improve my skills, but many have. I list some below.

Principles of Product Development Flow

Reinertsen’s Principles of Product Development Flow helps you understand how product development really works, and what to look out for. It might sound orthogonal to “building teams”, but teams exist to build products, and good teams must learn to optimise the flow.

The Fifth Discipline

Senge’s The Fifth Discipline is another book that helps with thinking in systems.

Fieldbook

The accompanying Fieldbook is a great guide to applying theory to practice.

Skin in the Game

Taleb’s Skin in the Game helps in understanding people’s behaviours and risk/reward models, and pushes one to be invested in the team and the results: commitment, not faith.

Powerful

On treating people right (as adults, naturally), I got a lot of help from McCord’s Powerful. It’s one of those special books where I either learnt something on a page or nodded vigorously at it.

Difficult Conversations

Difficult Conversations by Stone et al. is a very good help on the topic, and I keep going back to it, trying to shed my judgemental self all the while. This stuff should be taught in school.

eXtreme Programming Explained

On to some software books: Beck’s eXtreme Programming Explained has you covered on process and meta-process; you don’t need much else, even at scale.

Release It!

Nygard’s Release It! is a massive help for product teams to be ready and design for difficult situations. It’s almost like having a Principal Engineer in the team who helps you and your team as a guide.

On Writing Well

Communication is very important for teams, and getting better at writing is always advisable. Zinsser’s On Writing Well helps keep it pithy and warm. It’s a great teacher, this book.

The Innovators

Two of Walter Isaacson’s books, both among his less famous ones, helped me understand craftsmen better. The first is The Innovators.

Leonardo Da Vinci

The second is Leonardo Da Vinci.

Principles

Dalio’s Principles is a great book as well, with veritable advice, and it’s just so direct and hard-hitting and totally focused on getting stuff done. Period.

That is a long list. There are many others, some I forget. Building teams is a lot about knowing people, and I must acknowledge many others have helped me grow up on that, from Wodehouse to Pirsig, Guha, Bradbury, Le Guin, Banks, Harari, Rosling, Dickens and Nagarkar, Tolkien to Goodkind to Jemisin, from Parsai and Joshi and Manto and Sawant; Moore and Gaiman and Vaughan and Pran, Fowler and Hohpe and Evans and Cockburn, Polya and Rushdie. I am not an expert at this or that, but whatever I know, I know because I’ve met some great people and read some books written by wonderful and helpful souls.

These days, I am unable to find time to read books, but I hope some people do, it's a good time to read and skill up when you are stuck indoors.

https://rnjn.in/articles/books-that-help-in-team-building/
Scaling

June 2019

At an event organised by ThoughtWorks I presented on scaling - tech, teams and processes. The slides by themselves may not paint a complete picture, if you have any questions, do drop me an email, and I will try to get back to you.

https://rnjn.in/talks/scaling/
On Postels Law and Managing Change

Postel’s law, ~1981:

“Be conservative in what you do, be liberal in what you accept from others.” (RFC 793)

A webservice accepting a message with a defined schema (xml/json/other) may choose to do one of these when it encounters messages with extra nodes/properties -

  • be conservative and discard
  • be liberal and accept

Of the two, there are more proponents for the latter than the former. There are many arguments thrown at the conservatives; one that has practical implications is change in schema. Before we go further, let’s state the obvious - if there is never ever going to be a change in the schema of a message, being conservative is just the same as being liberal. Except that the “never ever” never really happens; there’s always some change in schema. The choice of being liberal or conservative mandates thinking about how you deal with change in schema, and that’s a good thing.
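To make the two choices concrete, here is a minimal sketch of both receivers handling a message that carries an extra property. The `order_id`, `amount` and `currency` fields, and both function names, are hypothetical and chosen only for illustration. Here "liberal" is taken to mean accepting the message and ignoring what the schema does not know.

```python
import json

# Hypothetical schema: the receiver knows exactly these two properties.
KNOWN_FIELDS = {"order_id", "amount"}


def parse_conservative(raw: str) -> dict:
    """Discard (reject) any message with properties outside the schema."""
    message = json.loads(raw)
    extra = set(message) - KNOWN_FIELDS
    if extra:
        raise ValueError(f"unexpected properties: {sorted(extra)}")
    return message


def parse_liberal(raw: str) -> dict:
    """Accept the message, keeping only the properties the schema knows."""
    message = json.loads(raw)
    return {key: value for key, value in message.items() if key in KNOWN_FIELDS}


# A sender has moved to a newer schema that adds "currency".
raw = '{"order_id": 7, "amount": 120, "currency": "INR"}'

print(parse_liberal(raw))
try:
    parse_conservative(raw)
except ValueError as err:
    print(err)
```

With the liberal receiver, the sender can roll out the new property first and the receiver can catch up later. With the conservative one, both sides have to change together, or the schema has to be versioned.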

https://rnjn.in/articles/on-postels-law-and-managing-change/
Pretty Python List Comprehensions

May 12th, 2014

Python list comprehensions are by far the simplest and most readable loop expressions that I have worked with. Here’s an example where I have a list of lists of lists (corpus -> documents -> sentences) where I need to remove some items (called stop_words here) from the sentences.

corpus = [[[word
            for word in sentence if word not in self._stop_words]
           for sentence in document]
          for document in corpus]

Notice the square brackets closing each for expression, so the result retains the same nested structure as the input.
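The snippet above lives inside a class (hence `self._stop_words`). Here is a standalone version you can run as is. The sample corpus and stop words are made up for illustration.

```python
# Hypothetical sample data: corpus -> documents -> sentences -> words.
stop_words = {"the", "a", "is"}
corpus = [
    [["the", "cat", "sat"], ["a", "dog", "barked"]],
    [["python", "is", "readable"]],
]

# Same shape as the original comprehension, with a plain set
# standing in for self._stop_words.
cleaned = [[[word
             for word in sentence if word not in stop_words]
            for sentence in document]
           for document in corpus]

print(cleaned)
```

The nesting of documents and sentences is untouched; only the words inside each sentence are filtered.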

https://rnjn.in/articles/pretty-python-list-comprehensions/
Introducing Automation to Large Team

Oct 24th, 2011

A reader on my article at infoq asked an interesting question - introducing automation to a big project which has been worked on for some time. I plan to write more posts on this topic, have a lot of thoughts, but here’s my immediate answer -

A very strong actionable technique that I have seen work well is to create a small/minimal smoke test suite for your larger app. You can decide on what comprises a smoke test suite -

https://rnjn.in/articles/introducing-automation-to-large-team/
Why is test automation the backbone of Continuous Delivery?

Aug 25th, 2011

Software testing and verification needs a careful and diligent process of impersonating an end user, trying various usages and input scenarios, comparing and asserting expected behaviours. Directly, the words “careful and diligent” invoke the idea of letting a computer program do the job. Automating certain programmable aspects of your test suite thus can help software delivery massively. In most of the projects that I have worked on, there were aspects of testing which could be automated, and then there were some that couldn’t. Nonetheless, my teams could rely heavily on our automation suite when we had one, and spend our energies testing aspects of the application we could not cover with automated functional tests. Also, automating tests helped us immensely to meet customer demands for quick changes, and subsequently reaching a stage where every build, even ones with very small changes went out tested and verified from our stable. As Jez rightly says in his excellent text about Continuous Delivery, automated tests “take delivery teams beyond basic continuous integration” and on to the path of continuous delivery. In fact, I believe they are of such paramount importance, that to prepare yourself for continuous delivery, you must invest in automation. In this text, I explain why I believe so.

https://rnjn.in/articles/why-test-automation-is-backbone-of-cd/
Who Should Write Functional Tests?

Jan 3rd, 2011

Functional testing code is more often than not treated as a second class citizen. Delivery teams tend to ignore problems with test code over a period of time, and worry more about test results. This leads to poor code quality and bad test architecture, which in turn hurts the maintainability of a test suite. It’s this negative feedback cycle that a team should be worried about. In my opinion, treating test code as responsibly as we treat functional code fixes this.

https://rnjn.in/articles/who-should-write-functional-tests/
Why Teams Lose Faith in their Functional Automation suite?

Dec 6th, 2010

In my opinion, there are three main reasons why a functional automation suite loses its value (or the respect that a delivery team should pay it). This leads to a variety of problems, but I will save that for a later post.

The reasons for me are -

  • Non-deterministic tests - run the same test again (without changing the code) and the test gives different results.

    This, in my opinion, is the least attractive aspect, a true demotivator for a delivery team to believe in its functional test suite. One of the reasons I have seen for non-deterministic failures is tests running on hardware whose performance is non-deterministic, for example VMs which share hardware resources, and hence depend on how the other VMs on the same host are performing during the test run. Sometimes, tests are non-deterministic when they access external systems which are non-deterministic. Unless you are testing the robustness of the system under test, stubbing out such external dependencies has worked out well. Another reason is just bad tests, like tests which depend on the time at which they run (unless functionally driven to), tests which have time-outs which are not well researched, and more. Sometimes the tools used to drive tests are non-deterministic (we found a couple of them driving Silverlight tests), which makes it really difficult to believe in the results of the tests you write. All non-deterministic tests can and should be fixed, but I have seen the effort to fix these being directly proportional to the amount of customer involvement/value to delivery. People also cite them as “random failures” and ignore them. I believe that every developer who calls a program “random” loses respect by a notch every time they call it so.
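    For the "tests which depend on the time at which they run" case, one common fix is to pass the clock in instead of reading it inside the code under test. A minimal sketch follows; the `greeting` function is hypothetical and stands in for any time-dependent behaviour.

```python
from datetime import datetime


def greeting(now: datetime) -> str:
    """Hypothetical function under test: the result depends on the time of day."""
    return "Good morning" if now.hour < 12 else "Good afternoon"


def test_greeting_before_noon():
    # The clock is pinned, so the result is the same on every run.
    assert greeting(datetime(2010, 12, 6, 9, 0)) == "Good morning"


def test_greeting_after_noon():
    assert greeting(datetime(2010, 12, 6, 15, 0)) == "Good afternoon"


test_greeting_before_noon()
test_greeting_after_noon()
```

    A test that called `datetime.now()` instead would pass in the morning and fail in the afternoon, which is exactly the kind of "random failure" that erodes trust in the suite.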

https://rnjn.in/articles/why-teams-lose-faith-in-functional-automation/
Is Your Functional Suite Done Right?

Feb 10th, 2009

On the last two projects that I have worked on, both fairly large in terms of people (40+), I have seen enormous effort being spent on functional testing. The effort, though not completely wasted, hasn’t yielded proportional gains in terms of quality improvements and quicker feedback on a higher integration level. The following list tries to address issues and my take on fixing them.

Separation of Concerns

Functional suites suffer most from a lack of clear directive on what they are written for. Adding view tests (testing windows UI/html output) which cannot be tested by your regular unit tests to your functional suite is a recipe for disaster. View tests do not belong in the functional suite. Unfortunately, such tests form almost half of the suite. Coupling these not only increases the run time of a suite, it also mandates that the same testing tool is used for both these sets. Think of an ASP.NET website you are developing. A view test suite can be written using the lightning-fast NUnitASP toolset, because you wouldn’t need to attack cross-browser compatibility issues and integration between your user interface and services, and you can write your functional suite with Selenium or your favorite browser-based testing tool. Also, view tests should be a part of the tests that a pair runs before they check in, while all functional tests might not (teams generally decide on a subset as smoke), so dividing your tests judiciously between view tests and functional tests is of utmost importance.

https://rnjn.in/articles/is-your-functional-suite-done-right/