Maximize experiments, minimize variables

I have been pondering how to foster long-term technical and organizational innovation alongside high-pressure demands for short-term Product work. There are plenty of ways our business can be experimenting, but I am specifically talking about technical innovation. I came across this Bezos quote that resonated with me:

“What you want to do as a company is maximize the number of experiments you can do per unit of time”

I can’t actually find where I got that quote, but I can track down this other Bezos quote on experimentation:

“If you double the number of experiments you do per year, you’re going to double your inventiveness.”

Why experiment?

At the confluence of the scientific method and engineering R&D, it can be unclear what the goal of experimentation is. A priori, successful experiments seem more desirable than unsuccessful ones (almost tautologically so). We might typically envision an R&D success as a viable prototype, while a scientific success is really just a conclusive result. When doing engineering work, we are tempted to think of success as a viable prototype, when really a conclusive result is just as valuable. Experiment. Fail fast. Iterate.

Strategically Enabling Experimentation

On a strategic level, simply maximizing the total number of experiments and accepting failure are critical. This might mean R&D projects, hack days, spikes with new technologies, somewhat riskier decisions to use new technologies on real projects, etc. Given that tech companies are constantly pushing the envelope and inventing new products, at some level experimentation (and its associated risk) is baked into what we do.

Tactically Enabling Experimentation

If there is room to experiment, room to fail, and an understanding that the goal of experimentation is to test hypotheses, the next step is to run good experiments. Running good experiments means clearly defining your hypothesis and limiting your variables. While it may sometimes be necessary to do things radically differently, regularly engaging in poorly-conceived experiments will bring chaos to your organization and reduce the likelihood of any real learning. If you can’t state your hypothesis clearly, odds are you’re not running a good experiment, and you are risking the very learnings you are hoping for. Muddy hypotheses can produce experiments that don’t have a clear result, or worse, confuse the communication so that different people are expecting different results.
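As a sketch of what “stating your hypothesis clearly” can look like in practice (the structure, field names, and example values below are my own illustration, not something we actually used), even a tiny data structure forces the hypothesis, the single variable under test, and the success criteria to be written down before any work starts:

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentBrief:
    """A minimal written statement of what an experiment is testing.

    If any field is hard to fill in, that is usually a sign the
    experiment is not well defined yet.
    """
    hypothesis: str                  # one falsifiable sentence
    variable_under_test: str         # the single thing being changed
    held_constant: list = field(default_factory=list)  # everything deliberately unchanged
    success_criteria: str = ""       # what a conclusive result looks like, pass or fail
    review_date: str = ""            # when we agree to stop and evaluate

# Hypothetical example: the kind of brief the hardware experiment
# described below never had.
brief = ExperimentBrief(
    hypothesis="Co-located hardware reduces network-related downtime versus our cloud setup.",
    variable_under_test="hosting platform (colo hardware vs. cloud)",
    held_constant=["application code", "deployment tooling", "traffic mix"],
    success_criteria="Fewer network incidents over one quarter, at comparable cost.",
    review_date="end of next quarter",
)
print(brief.hypothesis)
```

The exact format matters far less than the act of writing it down: a brief like this is cheap to produce and makes it obvious when two people are expecting different results from the same experiment.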

Poorly-planned experiment: Hardware in a cloud world

After running successfully in the cloud for two years, we experienced some significant downtime due to failures in software-defined networking (oddly enough, we were actually serving the .com website for the company whose SDN hardware fell over). Not enough downtime to lose customers, but enough to get emotions going.

After the incident, and much action-biased discussion, we ended up provisioning a rack of 20 co-located hardware servers alongside our 200 cloud servers. Even then, we struggled to communicate the specific hypothesis of our hardware experiment. We thought it would be more stable on the network, we knew it would be more performant than the cloud servers of that era, and we knew it would have a significantly different cost profile. We did not know which we were optimizing for.

Sometimes the real world is messy, and it’s not possible to isolate different variables. In these situations, it’s critical to define the hypothesis, not because of abstract veneration of the scientific method, but because otherwise the goals and objectives will be lost in communication.

So, we ran hardware for a while. It was faster, about as stable, a little less pricey, and hugely distracting. It was distracting to set up separate VPN and console access. It was distracting to write two sets of playbook wiki pages, one for cloud and one for colo. And worst of all, it was distracting because it confused the message and vision of our technology.

A year later, the hardware is going, cloud server pricing is radically different, and the underlying cloud hardware has been vastly upgraded. It’s hard to say it totally wasn’t worth it; at best we could call it “strategic diversification with an accepted overhead”. But, really, it feels like an expensive experiment.