From State Machine to Stateless Microservice

🇨🇿 Scroll down for the Czech version.

In my last blog post, I wrote about implementing a state machine inside a microservice I call Remote Control, which will automate deployments of our products and monitor the cluster (it is still under construction). Here I would like to describe why all of that was wrong and why I had to rewrite the code completely (again).

Two generals

Just a few days after I published that blog post, while I was happily collecting reactions on social media and feeling glad about the good job I had done, Aleš came over and told me:
"You will need to rewrite it."
"No! Why? It is working! I just published the article on how great my microservice is!"
"There is a Two Generals' Problem."
"What? Who are the two generals, and what did they do to my microservice?" It really sounded like a joke. "You must have just made this up!"
"No, and unfortunately, it is an unsolvable problem."

"Even better...," I thought.

And Aleš explained it to me: Imagine two generals attacking a city together, one from, let's say, a western hill, the other from an eastern one. However, there is fog, so they can only communicate through messengers who must pass through the enemy city, where they can be captured. The generals must attack simultaneously; otherwise, they won't succeed. So the western general sends a message saying, "Let's attack at 8 pm." However, the messenger might get captured: the western general would attack at 8 pm, but the eastern army would never know about it. Thus, he decides to wait for a confirmation: "Yes, I agree, let's attack at 8 pm." However, this confirmation might get lost as well. The eastern general would attack at 8 pm, as he had agreed to, but the western one wouldn't, because the confirmation never reached him. Confirming the confirmation does not help either: whoever sends the last message can never be sure it arrived, so no finite number of messages can guarantee that both generals attack together.
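To see why more messengers never fix this, here is a toy Python simulation of one particular protocol, assumed purely for illustration (it is not what Remote Control used): the generals alternate n messages, the first being the attack order and each later one confirming the previous; a captured messenger ends the exchange; and each general attacks only if every message addressed to him arrived. It illustrates the impossibility argument rather than proving it for every possible protocol.

```python
from typing import List, Optional, Tuple


def decisions(n_messages: int, first_lost: Optional[int]) -> Tuple[bool, bool]:
    """Return (west_attacks, east_attacks) for one run of the hypothetical exchange.

    first_lost is the 1-based index of the first captured messenger,
    or None if every messenger gets through. After a capture, nothing more is sent.
    """
    delivered = n_messages if first_lost is None else first_lost - 1
    # Odd-numbered messages travel West -> East, even-numbered East -> West.
    to_east = range(1, n_messages + 1, 2)
    to_west = range(2, n_messages + 1, 2)
    # Assumed rule: attack only if every message addressed to you has arrived.
    east_attacks = all(i <= delivered for i in to_east)
    west_attacks = all(i <= delivered for i in to_west)
    return west_attacks, east_attacks


def disagreements(n_messages: int) -> List[Optional[int]]:
    """Capture points after which exactly one general attacks."""
    result: List[Optional[int]] = []
    for lost in [None, *range(1, n_messages + 1)]:
        west, east = decisions(n_messages, lost)
        if west != east:
            result.append(lost)
    return result


if __name__ == "__main__":
    for n in range(1, 6):
        # However many confirmations we add, capturing the last messenger
        # still leaves one general attacking alone.
        print(n, disagreements(n))  # prints "1 [1]", "2 [2]", ..., "5 [5]"
```

For every n, the only bad scenario is the capture of the last messenger: its sender attacks alone while the receiver stays home. Adding one more message just moves the problem one step further.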

How I got rid of the generals

Two generals (or two programs) sending each other messages about a shared state over an unreliable channel can never reach a 100% certain consensus. And Remote Control is, in fact, made of two pieces, both of which were trying to hold their own state and keep it synchronized by exchanging messages.

The Two Generals' Problem led us to a completely stateless microservice. Instead of having the two programs negotiate a consensus between themselves, we used Apache ZooKeeper, an open-source coordination service that many technologies rely on for consensus and that is an integral part of our deployments anyway. Any data that would otherwise be stored or cached in the code is now saved to ZooKeeper instead.
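A minimal sketch of what "save the data to ZooKeeper instead of keeping it in the code" can look like in Python. To stay runnable without a ZooKeeper ensemble, the hypothetical `InMemoryZnodes` class mimics two znode properties: each path holds bytes plus a version, and a write that passes an outdated version is rejected. With a real client such as kazoo, the corresponding calls would be `zk.get(path)` and `zk.set(path, value, version=...)`; the post does not say which client Remote Control uses. The path `/remote-control/deployments` and the JSON layout are assumptions.

```python
import json
from typing import Dict, Tuple


class BadVersionError(Exception):
    """Raised when a conditional write carries an outdated version."""


class InMemoryZnodes:
    """Hypothetical in-memory stand-in for ZooKeeper so the sketch runs anywhere.

    Like a znode, each path holds bytes plus a version that starts at 0 and
    increases by one on every successful write.
    """

    def __init__(self) -> None:
        self._nodes: Dict[str, Tuple[bytes, int]] = {}

    def create(self, path: str, data: bytes) -> None:
        if path in self._nodes:
            raise ValueError(f"{path} already exists")
        self._nodes[path] = (data, 0)

    def get(self, path: str) -> Tuple[bytes, int]:
        return self._nodes[path]

    def set(self, path: str, data: bytes, version: int) -> None:
        _, current = self._nodes[path]
        if version != current:
            raise BadVersionError(path)
        self._nodes[path] = (data, current + 1)


def record_deployment(store: InMemoryZnodes, path: str, service: str, node: str) -> dict:
    """Record that `service` runs on `node`, keeping no state between calls."""
    while True:
        raw, version = store.get(path)  # read the shared data
        deployments = json.loads(raw)   # transform it
        deployments[service] = node
        try:
            # Write back only if nobody changed the node since we read it.
            store.set(path, json.dumps(deployments, sort_keys=True).encode(), version)
            return deployments
        except BadVersionError:
            continue  # another instance won the race: re-read and retry


if __name__ == "__main__":
    store = InMemoryZnodes()
    store.create("/remote-control/deployments", b"{}")
    record_deployment(store, "/remote-control/deployments", "collector", "node-1")
    print(store.get("/remote-control/deployments"))  # (b'{"collector": "node-1"}', 1)
```

The function keeps nothing between calls: it reads, transforms, and writes back. The version check lets several instances update the same node safely, because a write based on stale data is rejected and simply retried on fresh data.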

Making the microservice stateless made me realize how much data I use, store, and handle inside it. Suddenly, I could see all of that data laid out in ZooKeeper's file-system-like tree of nodes. I could see how much data I actually needed and that it was often poorly structured. Pulling the data, or "state", out of the code helped me design its structure better and eliminate unnecessary operations. My code became much more functional. From now on, I don't need classes (objects) that represent abstract images of reality and each hold a little piece of information. I just do data transformations.
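The shift from objects holding pieces of state to plain data transformations could look something like this sketch. The shape of the shared data (node name mapped to the services running on it) and the service names are hypothetical; the point is that each view is computed on demand by a pure function instead of being cached in an object and kept in sync.

```python
from typing import Dict, List, Set

# Hypothetical shape of the shared cluster data as read from ZooKeeper:
# node name -> services currently running on that node.
ClusterState = Dict[str, List[str]]


def nodes_running(state: ClusterState, service: str) -> List[str]:
    """Nodes that run `service`, derived on demand instead of cached in an object."""
    return sorted(node for node, services in state.items() if service in services)


def missing_services(state: ClusterState, desired: Set[str]) -> Dict[str, List[str]]:
    """For each node, the desired services it is not running yet (nodes with none omitted)."""
    missing = {node: sorted(desired - set(services)) for node, services in state.items()}
    return {node: gaps for node, gaps in sorted(missing.items()) if gaps}


def with_service(state: ClusterState, node: str, service: str) -> ClusterState:
    """Return a new state with `service` on `node`; the input is left untouched."""
    updated = {name: list(services) for name, services in state.items()}
    if service not in updated.setdefault(node, []):
        updated[node].append(service)
    return updated


if __name__ == "__main__":
    state = {"node-1": ["collector", "parser"], "node-2": ["parser"]}
    print(nodes_running(state, "parser"))                    # ['node-1', 'node-2']
    print(missing_services(state, {"collector", "parser"}))  # {'node-2': ['collector']}
    new_state = with_service(state, "node-2", "collector")
    print(missing_services(new_state, {"collector", "parser"}))  # {}
```

Because none of these functions mutates its input, the same shared data can be read by any number of callers without one of them changing what another sees.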

Toward high availability with stateless microservices

From now on, the functionality of my microservice does not depend on its internal state. It works as expected from the very first moment of its runtime. The cool thing is that Remote Control now operates on a consensus that is the same for the whole cluster, at any time. Thus, we can deploy several instances of Remote Control with equivalent functionality to get highly available software: an instance on one node of the cluster can fully substitute for another instance on a different node that gets into trouble.
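Why any instance can stand in for another can be sketched in a few lines. Here a plain dict stands in for the shared ZooKeeper data, the `instances` map of node names to health flags is hypothetical, and the handler (which just counts deploy requests per service) is a made-up example. Because the handler reads everything it needs from the shared store, it does not matter which node runs it.

```python
from typing import Dict, MutableMapping, Tuple


def handle_deploy_request(store: MutableMapping[str, int], service: str) -> int:
    """Hypothetical handler: count deploy requests per service in the shared store.

    It reads and writes only the shared store and keeps nothing in the instance.
    """
    count = store.get(service, 0) + 1
    store[service] = count
    return count


def dispatch(
    instances: Dict[str, bool], store: MutableMapping[str, int], service: str
) -> Tuple[str, int]:
    """Hand the request to the first healthy instance; any of them gives the same result."""
    for node, healthy in instances.items():
        if healthy:
            return node, handle_deploy_request(store, service)
    raise RuntimeError("no healthy Remote Control instance")


if __name__ == "__main__":
    shared: Dict[str, int] = {}  # stands in for the data kept in ZooKeeper
    instances = {"node-1": True, "node-2": True, "node-3": True}
    print(dispatch(instances, shared, "collector"))  # ('node-1', 1)
    instances["node-1"] = False  # node-1 gets into trouble
    print(dispatch(instances, shared, "collector"))  # ('node-2', 2): continues where node-1 left off
```

In a real cluster, requests would arrive through whatever routing the deployment uses, and concurrent instances would need version-checked writes like the ones ZooKeeper provides; the plain read-modify-write here is only safe for one request at a time.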

From a Finite Automaton to Infinite Possibilities

In my last post, I wrote about implementing a finite automaton (state machine) in a microservice that I named Remote Control, which will automate the deployment of our products and monitor the cluster (it is still under development). This post is about how all of it was wrong and why I had to rewrite the code (again).

Two generals

Just a few days after I had posted my stellar article about the state machine, while I was collecting hearts on social media and rejoicing at what a good job I had done, Aleš came over and said to me:
"You'll have to rewrite it."
"No! Why? It works! Besides, I just told everyone how great my microservice is!"
"It is, but it has the Two Generals' Problem."
"What? Who the heck are those generals, and what did they do to my microservice?" I laughed. It really sounded like a joke. "You just made that up!"
"No, and unfortunately for you, it's an unsolvable problem."

"Even better...," flashed through my mind.

And so Aleš began to explain: Imagine that two generals are planning to attack a city together, one, let's say, from the western hill, the other from the eastern one. But there is fog, so they can only communicate through messengers passing through the enemy city, where they can be captured. The generals must attack at the same time; otherwise, they will fail. So the western general sends a message: "We attack at eight in the evening." But the messenger may be captured. If the western general attacked at eight and the eastern army never learned of it, they would be done for. So he decides to wait for approval: "Yes, I agree, we attack at eight." But this approval could get lost, too. The eastern general would attack at eight because he had confirmed it, but the western one would not, because the confirmation never reached him.

How I got rid of the generals

Two generals (two programs) sending each other messages about a state can never reach 100% consensus. I haven't yet revealed the most important part: Remote Control is actually not one microservice, but two. And both parts were trying to maintain a state and synchronize it via messages.

The Two Generals' Problem forced us to relieve Remote Control of managing state. Instead of searching for a consensus between the two programs, we used Apache ZooKeeper, open-source software that lots of technologies use for consensus and that is an integral part of our deployments anyway. All the data that used to be cached in the program's memory is now managed by ZooKeeper.

Creating a stateless microservice made me realize how much data I use, store, and process inside the microservice. Suddenly, I could see all the data my microservice used to hold in memory, laid out in black and white in ZooKeeper. I could see that the data was often poorly structured and how much code I was devoting to maintaining it. Pulling the data, or the "state" if you like, out of the code helped me design its structure better and remove unnecessary operations. My code became more functional. From now on, I don't need classes (objects) that represent abstract images of reality and hold small pieces of information. Instead, specific transformation functions chew the shared, consensual data into whatever form is needed, whenever it is needed.

High availability with stateless software

From now on, the functionality of the microservice does not depend on its state. Remote Control now operates on a consensus that is the same for the whole cluster, at any time. We can therefore deploy multiple instances of the microservice, all of which work exactly the same way regardless of where in the cluster they are located and how many of them there are. An instance on one node of the cluster can fully replace another instance on a different node when that one gets into trouble.

About the Author

Eliška Novotná

Junior backend developer at TeskaLabs. Python and unicorns lover.

Tags: development, tech, eliska
