<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[All About Coding]]></title><description><![CDATA[All About Coding]]></description><link>https://antousias.com/</link><image><url>https://antousias.com/favicon.png</url><title>All About Coding</title><link>https://antousias.com/</link></image><generator>Ghost 3.12</generator><lastBuildDate>Thu, 13 Nov 2025 09:49:42 GMT</lastBuildDate><atom:link href="https://antousias.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[How we built a high-performance fully-remote team]]></title><description><![CDATA[Tips on how we managed to build a high-performance fully-remote team in IOV42.]]></description><link>https://antousias.com/how-we-built-a-high-performance-fully-remote-team/</link><guid isPermaLink="false">5fa859c92fe481484c5c49b9</guid><category><![CDATA[IMHO]]></category><category><![CDATA[Best Practices]]></category><dc:creator><![CDATA[Alexandros Ntousias]]></dc:creator><pubDate>Tue, 17 Nov 2020 14:18:46 GMT</pubDate><media:content url="https://antousias.com/content/images/2020/11/header-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://antousias.com/content/images/2020/11/header-1.png" alt="How we built a high-performance fully-remote team"><p>Two years ago, when we were setting up our development team in <a href="https://iov42.com/">IOV42</a>, we made a conscious decision to enable remote work as much as possible. Of course, at that time, we were not really considering a fully remote team, but we wanted to have the flexibility to work remotely whenever we needed to.</p><p>During the first year we refined our team processes to such a degree that co-location was no longer necessary. 
This gradually led to all of us working from home more and more frequently.</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/11/wfh-2.jpg" class="kg-image" alt="How we built a high-performance fully-remote team"></figure><p>So, when the COVID-19 global pandemic broke out and we had to transition to a fully-remote team, we were ready! At some point we had people in our team working from Greece, France, Turkey, Austria and of course the UK...and nothing felt any different!</p><p>In the rest of this post I will try to present some of the things that worked really well for us and hopefully, they will prove useful to more of you out there. Of course, every team is different, so don't expect them to work right out of the box. See what works for you, change it, own it!</p><p>Here we go....</p><!--kg-card-begin: markdown--><h2 id="asynchronouscommunicationwithslack">Asynchronous communication with <a href="https://slack.com/">Slack</a></h2>
<!--kg-card-end: markdown--><p>Naturally, when people are not co-located, they tend to create their own work schedule, which in most cases is different from everyone else's. Because of this, the need for asynchronous communication is imperative.</p><p>There are many ways and tools to achieve that. We decided to use <a href="https://slack.com/">Slack</a>, which worked very well for us. But in order to get the most out of it, we had to improve and refine the ways we were using it.</p><p>Here are a few takeaways/tips from our experience:</p><!--kg-card-begin: markdown--><h4 id="1focusedchannels">1. Focused channels</h4>
<!--kg-card-end: markdown--><p>Each channel should have a very specific purpose and defined scope and only the required audience should be included. The last thing you want is people ignoring messages in a channel because a big percentage of these messages do not concern them.</p><!--kg-card-begin: markdown--><h4 id="2groupconversationsusingthreads">2. Group conversations using threads</h4>
<!--kg-card-end: markdown--><p>Using the <a href="https://slack.com/intl/en-gr/help/articles/115000769927-Use-threads-to-organize-discussions-">thread feature in Slack</a> to reply to a message allows you to keep all the related replies grouped together. This in turn helps people follow the discussion more easily, find it again in the future, and it also reduces the noise for everyone else who might not be interested in that particular discussion.</p><!--kg-card-begin: markdown--><h4 id="3usereactions">3. Use reactions</h4>
<!--kg-card-end: markdown--><p>Remember, your colleagues cannot see you, so they don't know whether you saw something they posted or not. Reactions are a very handy way to acknowledge that you have seen a message even though you might not have anything to say about it.</p><!--kg-card-begin: markdown--><h4 id="4avoidlongdiscussionsspammingandflamewars">4. Avoid long discussions, spamming and flame-wars</h4>
<!--kg-card-end: markdown--><p>When you see that a discussion is derailing and spans multiple messages, always prefer to jump on a direct call to resolve the issue and come to an agreement. The important thing is, once the issue is resolved through direct communication, <em>don't forget to notify the rest of the team</em>!</p><!--kg-card-begin: markdown--><h2 id="synchronouscommunicationwithgooglemeet">Synchronous communication with <a href="https://meet.google.com/">GoogleMeet</a></h2>
<!--kg-card-end: markdown--><p>Of course, there are times when you need to have synchronous communication with one or more of your colleagues. The important thing here is to establish some best practices.</p><!--kg-card-begin: markdown--><ul>
<li>Invest in good gear. Get a decent microphone, preferably a noise-cancelling one, and always wear headphones...it makes a huge difference in audio quality.</li>
<li>When you don't speak, please mute. Otherwise someone else might mute you!</li>
<li>Don't interrupt others while they're talking; wait for your turn.</li>
<li>Try to have your camera enabled. This way, the communication feels more personal.</li>
<li>Finally, if your internet connection is not good, disable your camera. It's more important for people to hear you than to see you.</li>
</ul>
<!--kg-card-end: markdown--><p>If you don't establish some ground rules and practices, you might end up in situations like this:</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/11/video-calls.jpg" class="kg-image" alt="How we built a high-performance fully-remote team"></figure><p>As for the tool, again, many solutions...</p><p>We decided to go with <a href="https://meet.google.com/">GoogleMeet</a>. We were already using <a href="https://workspace.google.com/">Google Workspace</a> so it made sense to go with that. It's extremely easy to set up an instant meeting, it supports a sufficiently large number of participants, by enabling the grid view you can see all of them, it has great screen sharing features, and it integrates perfectly with the rest of the Google services, e.g. <a href="https://support.google.com/meet/answer/9302870?co=GENIE.Platform%3DDesktop&amp;hl=en">GoogleCalendar</a>.</p><p>Easy choice. Next!</p><!--kg-card-begin: markdown--><h2 id="virtualofficeusingspatialchat">Virtual office using <a href="https://spatial.chat/">SpatialChat</a></h2>
<!--kg-card-end: markdown--><p>One of the biggest issues we had was the loss of ad-hoc communication with colleagues. Co-location makes random conversations so much easier, which in turn very often leads to interesting ideas and innovations. It also strengthens the feeling of togetherness that you have when you are all in the same office.</p><p>We've tried many different things to address that. We tried "<em>coffee breaks</em>", which were dedicated time slots every other day at which people could log into a GoogleMeet and socialise. We've tried "<em>virtual</em> <em>town hall</em>" rooms which would run the whole day and people could log in whenever they wanted to socialise with the rest of the team. Unfortunately, none of that worked!</p><p>And then we discovered <a href="https://spatial.chat/">SpatialChat</a>!</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/11/mind-blown.gif" class="kg-image" alt="How we built a high-performance fully-remote team"></figure><p>SpatialChat allows you to mimic, to a certain degree, the interactions you might have in a physical space. When you log in to a SpatialChat space, you can go to any of the rooms in that space you want. You are represented by an avatar and you can move yourself around with your mouse. The really cool thing is that the closer you get to someone, the louder you hear them. And of course you can share your screen with others, you can use the megaphone to talk to everyone in the room without having to go next to them, you can enable/disable video, etc.</p><p>From the very first few days of usage in IOV42, you would log in and you could see groups of colleagues gathered around at different corners in a room, discussing. You could approach them, listen to what they were talking about, move away if the discussion wasn't relevant, start new conversations easily, etc.</p><p>The only thorny bit about it is probably the price. 
But still, it's definitely worth a try!</p><!--kg-card-begin: markdown--><h2 id="collaborativediagramswithmiro">Collaborative diagrams with <a href="https://miro.com/">Miro</a></h2>
<!--kg-card-end: markdown--><p>When we were all in the same office, it was very easy to just go to a whiteboard and start sharing ideas by drawing architectural and design diagrams, algorithms, etc.</p><p>In our search for something similar to this experience, we ended up with <a href="https://miro.com/">Miro</a>. It's a great collaboration tool that provides whiteboards, diagrams and much more. You can use it to organise event storming sessions, retrospectives (we'll talk more about that in the next section), strategy meetings, to share design ideas and much more!</p><p>And even though it offers so many capabilities, it's still extremely easy to use.</p><!--kg-card-begin: markdown--><h2 id="retrospectives">Retrospectives</h2>
<!--kg-card-end: markdown--><p>Something that really helps with the sustainability and efficiency of a team is holding retrospectives. But for fully-remote teams, it's a <em>must</em>! It's very easy for pressure to build up when you are not physically next to each other and you need to find ways to release it somehow.</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/11/retro-2.png" class="kg-image" alt="How we built a high-performance fully-remote team"></figure><p>Throughout my career, I've seen many different retrospective formats. Lately, and this is purely a personal opinion, I find that the easiest formats are also the most effective ones.</p><!--kg-card-begin: markdown--><p>For example, my personal favourite is:</p>
<ol>
<li>Create two columns, <em><strong>Liked</strong></em> and <em><strong>To improve</strong></em></li>
<li>Let people write in both columns for 10 minutes</li>
<li>Read through the <em>Liked</em> column out loud so that, as a team, we can celebrate our successes and boost our morale</li>
<li>Read through the <em>To improve</em> column out loud to have an idea of what is on the board</li>
<li>Group the tickets of the <em>To improve</em> column into more general (but not too broad) categories</li>
<li>Vote on what you want to talk about (each person gets three votes)</li>
<li>Start with the highest-voted item and work your way through them while time-boxing this exercise</li>
</ol>
<!--kg-card-end: markdown--><p><strong>Remember!</strong> The whole point of this exercise is to capture actions that will improve your processes. Also, what I find extremely useful is to always assign each of these actions to a particular person who will be responsible for carrying it out.</p><p>With a simple format like this, you don't get distracted by the framework itself and you can focus solely on the content.</p><p>There are many tools out there to perform remote retrospectives. But, in the spirit of <a href="https://en.wikipedia.org/wiki/KISS_principle">KISS</a>, we found that a simple GoogleDoc was all we needed! Again, don't distract the team with fancy tools; allow them to focus on the task at hand.</p><!--kg-card-begin: markdown--><h2 id="trustyourteam">Trust your team</h2>
<!--kg-card-end: markdown--><p>Tools and processes can only get you halfway. The rest is constantly developing the proper mentality as a team.</p><p>Trust plays a really big role in that. As long as we acknowledge that no one is perfect and everyone is trying their best for the sake of the team, we can keep on improving. Be open about things that are not working out but also be mindful of other people's limitations. When you see issues, try to find solutions. When you see limitations, try to overcome them as a team. When you see positive improvements, try to encourage them.</p><p>Don't forget, you're all in this together as a team.</p><!--kg-card-begin: markdown--><h2 id="conclusion">Conclusion</h2>
<!--kg-card-end: markdown--><p>Having a fully-remote environment gives you a lot of benefits! People don't have to waste time commuting, it's much easier to focus in the comfort of your home, as a company you don't have to pay for big offices, employees tend to be more satisfied and you get higher retention, and in general it's a great incentive to attract talented people.</p><p>So, how do you start building a successful remote team?</p><p>It is all about finding the proper tools and processes to compensate, to a certain degree, for what you will miss from not sharing the same physical location. Of course, it will never be exactly the same, but that's ok.</p><p>Once you have that, it's a constant refinement process. You see what works and what doesn't and you keep improving from there. Don't be afraid to experiment, try out new things and get rid of anything that doesn't work.</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/11/keep-calm-and-carry-on-working-remotely.png" class="kg-image" alt="How we built a high-performance fully-remote team"></figure>]]></content:encoded></item><item><title><![CDATA[Effective Code Reviews]]></title><description><![CDATA[Tips on how you could make your code review processes more effective.]]></description><link>https://antousias.com/effective-code-reviews/</link><guid isPermaLink="false">5eb830c82fe481484c5c4559</guid><category><![CDATA[IMHO]]></category><category><![CDATA[Best Practices]]></category><dc:creator><![CDATA[Alexandros Ntousias]]></dc:creator><pubDate>Wed, 13 May 2020 21:43:36 GMT</pubDate><media:content url="https://antousias.com/content/images/2020/05/source_code.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://antousias.com/content/images/2020/05/source_code.jpg" alt="Effective Code Reviews"><p>This is an edited and slightly expanded transcript of a lightning talk I gave for <a href="https://www.jhug.gr/">JHUG</a>'s virtual meetup.</p><!--kg-card-begin: 
html--><iframe src="//www.slideshare.net/slideshow/embed_code/key/53nj4eYgWzUF4P" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe><!--kg-card-end: html--><p></p><p>Throughout my years as a software engineer, I was fortunate enough to join several different teams in various companies. Each of these had different development processes and approaches when it comes to building software, but they had one thing in common: <em>Code Reviews</em>.</p><p>Each team used code reviews for different reasons. In some cases, a code review was a way to find defects before pushing something to production. In other cases, they acted as a knowledge sharing tool. There were teams that used code reviews as historical reference.</p><p>Regardless of how you use it though, there is one thing that is true in every project I've seen so far:</p><p><strong>Nobody likes code reviews!</strong></p><p>Code <em>authors</em> hate code reviews. They spend time writing all this code. And after all their hard work, they need to submit a code review and ask their teammates to spot all the errors and mention all the negative things they might have done.</p><p>On the other side, <em>code reviewers</em> also hate code reviews. They are working on some piece of code themselves when someone asks them for a review. They hate all this context switching, they want to focus on what they're currently working on and not spend time looking at other people's code and having arguments about the best way to implement something.</p><p>So, a code review is a process that by nature results in conflict.</p><p>However, there are certain things that in my experience can make a huge difference and make the whole process less painful.</p><!--kg-card-begin: markdown--><h1 id="codeauthors">Code authors</h1>
<!--kg-card-end: markdown--><p>Let's start with what the code authors can do to ease the process for the reviewers.</p><!--kg-card-begin: markdown--><h2 id="keepitshortsweet">Keep it short &amp; sweet</h2>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://antousias.com/content/images/2020/05/Screenshot-2020-05-10-at-18.14.31.png" class="kg-image" alt="Effective Code Reviews"><figcaption>Keep it short &amp; sweet</figcaption></figure><p>Imagine yourself having to choose a new book to read.</p><p>Would you choose a book that has an enormous number of pages or a book of a reasonable size? The reasonable size, of course; it's much easier to read.</p><p>Many times, in order to develop a feature, we might have to change quite a lot of the existing code. We might have to add modules, implement new services, alter schema tables, etc. </p><p>However, the more changes we make, the harder it is for a reviewer to spot problems we might have missed. <strong><em>Always favour splitting a feature into smaller code reviews</em></strong> rather than including all the changes in a single one. A good rule to have is that a PR (pull request) shouldn't have more than 500 lines of code changes. </p><p></p><p>Back to the book analogy...let's say you have a choice between a book that talks about philosophy, politics, epic battles, music, art, mathematics, etc. And then you have another book that focuses only on one of these subjects. Which one is easier to read?</p><p>As code authors, we tend to massively expand the scope of our changes...it's very easy to get carried away when we work. How many times have you worked on a feature and ended up also fixing 2-3 bugs, along with a major refactoring?</p><p>This can be very confusing for a reviewer as he has to context switch between all the different things that your code has changed. <strong><em>Try to create focused reviews</em></strong> instead. As with code, high cohesion is preferable.</p><p></p><p>Finally, imagine a book that has all its chapters mixed and another one that has a logical continuation. 
Which one is easier to read?</p><p>In code reviews, these chapters are our commits. And <strong><em>your commit history should tell a comprehensive story</em></strong>. Of course, this is quite difficult to do, as we don't necessarily know from the beginning where some code changes might lead us.</p><p><a href="https://thoughtbot.com/blog/git-interactive-rebase-squash-amend-rewriting-history">Interactive rebase is your friend</a>. Once you're satisfied with your changes, try to formulate your commit history in a way that would make the reader understand the journey you went through. Squash and edit commits, reword your commit messages, etc.</p><p></p><!--kg-card-begin: markdown--><h2 id="givecontext">Give context</h2>
<!--kg-card-end: markdown--><p>Every code review needs to have a descriptive title that when a reviewer reads it, he/she immediately understands what these code changes are all about. If you are also using some issue tracking tool, it's worth including the issue number in the title as well.</p><p>In addition to the title, a code review should include a detailed description. Some things that are worth including in this description are:</p><!--kg-card-begin: markdown--><ul>
<li>Overview (as bullet-points) of the changes</li>
<li>Reasoning behind certain design choices</li>
<li>Potential problems that you had to go through that might help reviewers better understand your approach</li>
<li>Things that are missing or that you intentionally left out</li>
<li>Links to resources that could help the reviewers, e.g. a link to the issue ticket that these code changes try to address</li>
</ul>
<!--kg-card-end: markdown--><p>Finally, it's very useful to annotate with comments all the places in your code review that might require clarification. This spares the reviewers from going through the same thought-process as the author and from asking the same questions.</p><p></p><!--kg-card-begin: markdown--><h2 id="chooseyouraudience">Choose your audience</h2>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/05/choose_your_audience-1.png" class="kg-image" alt="Effective Code Reviews"></figure><p>Usually, when it comes to code reviews, the author must choose certain people from the team to review his code.</p><p>It's a good practice to include someone who is familiar with the area of code that you changed, for example the person who was the last one to modify it.</p><p>In addition to that, it's also useful to include someone unfamiliar with that particular code. The benefit is that he/she will manage to look at it with completely fresh eyes and might uncover problems or enhancements that you might not have thought about.</p><p>Finally, always make sure to include more people than the required number of reviewers. For example, if your team's policy is to have 2 reviewers, include at least 3-4 people. This allows the whole process to be more efficient as it takes into account that some of your team members might be too busy or too distracted and others can take their place.</p><!--kg-card-begin: markdown--><h1 id="codereviewers">Code reviewers</h1>
<!--kg-card-end: markdown--><p>Now, let's have a look at what the reviewers should do when they receive a code review request.</p><!--kg-card-begin: markdown--><h2 id="testsasimportantasproductioncode">Tests are as important as production code</h2>
<!--kg-card-end: markdown--><p>What's the best way to ensure that a piece of code is working as you expect? Well, testing of course!</p><p>A code review should always contain the necessary tests to back up the actual logic changes. This means that not only should there be tests, but they should be reviewed as well.</p><p>Some things you should be looking for in the tests:</p><!--kg-card-begin: markdown--><ol>
<li>Are the tests readable?</li>
<li>Do they cover all edge cases?</li>
<li>Are the tests overly complicated?</li>
<li>Are the tests testing exactly what we need?</li>
<li>Is there any room for DRYing them up?</li>
</ol>
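To make this checklist concrete, here is a small, hedged sketch of what readable, focused tests can look like. The function under test (`apply_discount`) and its edge cases are hypothetical examples, not code from any project mentioned here:

```python
# Hedged sketch: readable, focused tests. apply_discount is a
# hypothetical function used purely for illustration.
def apply_discount(price: float, percent: float) -> float:
    """Return the price reduced by percent, rounded to cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be within [0, 100]")
    return round(price * (1 - percent / 100), 2)

# One behaviour per test, each named after the behaviour it checks.
def test_applies_percentage_discount():
    assert apply_discount(100.0, 25) == 75.0

def test_zero_discount_returns_original_price():
    assert apply_discount(80.0, 0) == 80.0

def test_rejects_discount_above_one_hundred_percent():
    try:
        apply_discount(10.0, 150)
        assert False, "expected ValueError"
    except ValueError:
        pass  # the edge case is handled
```

Tests like these are quick to review: each one covers a single case, the names read as documentation, and there is nothing to DRY up.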
<!--kg-card-end: markdown--><p></p><!--kg-card-begin: markdown--><h2 id="toolingisimportant">Tooling is important</h2>
<!--kg-card-end: markdown--><p>Code reviews contain code. So, just as you would use tools (e.g. IDEs) to make writing code easier, you should do the same with reviews. </p><p>I've seen so many people trying to use GitHub's browser-UI to review code changes. So they have to go up and down, trying to piece all the information together, trying to find usages using the browser's search functionality, etc. This is so difficult and inefficient. </p><p>It's also very easy to miss problems this way. For example, GitHub shows you only the code that has changed. But many times, you might want to also see the code that hasn't changed...for example, there might be duplication or you might have made a method obsolete, etc.</p><p>Here are a few tools that might help you with reviewing code:</p><!--kg-card-begin: markdown--><ul>
<li><strong><a href="https://www.codestream.com/">CodeStream</a></strong>: Plugin that allows you to do in-IDE code reviews. It is available for all major IDEs out there, i.e. IntelliJ, VSCode, Atom, Visual Studio. The downside of this one is that it requires read access to your GitHub repository, and your company might have strict rules about that.</li>
<li><strong><a href="https://www.jetbrains.com/upsource/">UpSource</a></strong>: This is a JetBrains product, so only available for IntelliJ. Also, it requires a JetBrains license (although at the moment it is free for up to 10 people). However, if your company is already using JetBrains products, you might want to go with that.</li>
<li><strong><a href="https://www.octotree.io/">Octotree</a></strong>: If you don't want an IDE-based solution, you could try a browser plugin. It doesn't give you all the benefits of an IDE of course, but at least it makes the review process a bit easier.</li>
<li><strong><a href="https://github.com/features/codespaces">Codespaces</a></strong>: GitHub recently released a browser-based IDE. Unfortunately, at the moment, there is limited access to this feature, so I personally haven't tried it. However, I can see how something like that might help with code reviews as well.</li>
</ul>
<!--kg-card-end: markdown--><p></p><!--kg-card-begin: markdown--><h2 id="focusondesignlogic">Focus on design &amp; logic</h2>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/05/automate_everything-2.jpg" class="kg-image" alt="Effective Code Reviews"></figure><p>One of the things I hate in code reviews is comments like <em>The indentation is not correct here</em> or <em>This variable is not used</em>.</p><p>Human reviewers should focus on everything that cannot be automated. This basically means logic and design. Making sure that the logic is sound, the design makes sense and uses the correct abstractions, and the code is reusable and maintainable.</p><p>Everything else should be automated and should be checked in your CI pipeline. Some of the things you should be automating are:</p><!--kg-card-begin: markdown--><ol>
<li>Formatting</li>
<li>Static code analysis with predefined styling rules</li>
<li>Static code analysis for common bugs</li>
<li>Code coverage</li>
<li>Checking project external dependencies</li>
</ol>
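As an illustration, the checks above could be wired into a single CI gate. This is only a sketch under assumptions: the tools named (black, flake8, bandit, coverage, pip-audit) are stand-ins for whatever your stack uses, not a pipeline described in this post:

```python
# Hedged sketch of a CI gate over the automatable checks, so human
# review can stay focused on logic and design. Tool names are examples.
import subprocess

CHECKS = {
    "formatting": ["black", "--check", "."],
    "styling": ["flake8", "."],
    "common-bugs": ["bandit", "-r", "."],
    "coverage": ["coverage", "report", "--fail-under=80"],
    "dependencies": ["pip-audit"],
}

def run_check(cmd) -> int:
    """Run one tool and return its exit code (non-zero means failure)."""
    return subprocess.run(cmd).returncode

def failed_checks(results: dict) -> list:
    """Given {check name: exit code}, return the names that failed."""
    return [name for name, code in results.items() if code != 0]

def gate() -> int:
    """Run every check; exit code 1 if any of them failed."""
    results = {name: run_check(cmd) for name, cmd in CHECKS.items()}
    return 1 if failed_checks(results) else 0
```

With a gate like this in the pipeline, a human comment about indentation or an unused variable should never be necessary.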
<!--kg-card-end: markdown--><p></p><!--kg-card-begin: markdown--><h2 id="pickyourbattles">Pick your battles</h2>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/05/bugs_me.jpg" class="kg-image" alt="Effective Code Reviews"></figure><p>There are so many things to look at in a code review. Is the logic correct? Does the design abide by the <a href="https://scotch.io/bar-talk/s-o-l-i-d-the-first-five-principles-of-object-oriented-design">SOLID principles</a>? Design patterns? Performance? How about security? Is the solution over-engineered? Is it following the team's and company's agreed practices?</p><p>In addition to all the above, you need to keep in mind that different team members might have different interpretations or trade-offs of these. You might want to sacrifice some of your performance in favour of readability. You might not want to use a design pattern at this point to avoid premature optimisations.</p><p>So, to avoid making a code review drag on for a long time and also maintain the good dynamics of the team, you should pick your battles. Nobody likes someone who leaves comments for every single problem he finds in a code review. Make sure you <em>only</em> mention problems that are very important to you and leave the rest.</p><p></p><!--kg-card-begin: markdown--><h2 id="wearthenobuthat">Wear the &quot;No, but...&quot; hat</h2>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/05/no-but.png" class="kg-image" alt="Effective Code Reviews"></figure><p>Once you find a problem that you want fixed, avoid just saying <em>No, this won't work</em>. Chances are that the author had something else in mind, so just shooting down his/her ideas is a recipe for disappointment and conflict.</p><p>Instead, try to explain why something might not work or how it can be written in a better way. Give concrete reasons. </p><p>Also, try to give solutions instead of only focusing on the problems. Be constructive. In general, it's good to have the "<em>Yes, and ....</em>" and "<em>No, but ....</em>" mentality.</p><p></p><!--kg-card-begin: markdown--><h2 id="avoidlongthreadsflamewars">Avoid long threads &amp; flame wars</h2>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://antousias.com/content/images/2020/05/avoid-long-threads.png" class="kg-image" alt="Effective Code Reviews"><figcaption>Prefer direct communication over long threads</figcaption></figure><p>There are times when both sides, the author and the reviewer, have conflicting thoughts about a certain approach. This can lead to long threads of comments, and heated arguments, each side arguing why their approach is better. This promotes long-lasting code reviews and negativity in general. </p><p>Instead, what I've seen working much better is, at the first sign of a long conflict, to pick up the phone and talk to the other person directly. This makes the whole process much faster as it makes the communication synchronous.</p><p>In addition to that, it makes the conflict resolution much easier. When you speak to the other person directly, you have more empathy and you are less likely to be as judgemental and hurtful as you might have been over asynchronous comments.</p><p>One important step in this process is to update the code review with a summary of your discussion. This way, other people are aware of what happened and what was agreed.</p><p></p><!--kg-card-begin: markdown--><h2 id="praisegoesalongway">Praise goes a long way</h2>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/05/praise.png" class="kg-image" alt="Effective Code Reviews"></figure><p>Finally, be positive! Always try to find something positive to say, celebrate your successes as a team. You cannot imagine how important this is.</p><p>As developers, we have big egos...we are very proud of the code we're writing and we like to share it with our peers. So, congratulating a code author for something he/she wrote or a design decision he/she took can go a long way!</p>]]></content:encoded></item><item><title><![CDATA[Deep dive into Cassandra]]></title><description><![CDATA[<p>Let's start with a disclaimer....I love <a href="http://cassandra.apache.org/">Cassandra</a>! I have used it in a couple of projects and I was always impressed by what it offers.</p><p>But I also believe that Cassandra is one of those technologies that should only be applied to specific use cases. And I assume that's</p>]]></description><link>https://antousias.com/deep-dive-into-cassandra/</link><guid isPermaLink="false">5e8ddbe02fe481484c5c3bc8</guid><category><![CDATA[Software Architecture]]></category><category><![CDATA[Deep Dive]]></category><dc:creator><![CDATA[Alexandros Ntousias]]></dc:creator><pubDate>Wed, 15 Apr 2020 18:44:12 GMT</pubDate><media:content url="https://antousias.com/content/images/2020/04/eye-bg-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://antousias.com/content/images/2020/04/eye-bg-1.jpg" alt="Deep dive into Cassandra"><p>Let's start with a disclaimer....I love <a href="http://cassandra.apache.org/">Cassandra</a>! I have used it in a couple of projects and I was always impressed by what it offers.</p><p>But I also believe that Cassandra is one of those technologies that should only be applied to specific use cases. 
And I assume that's where most of the hatred comes from: teams trying to use it even though it doesn't really fit their use case.</p><p>But before I get carried away, let's first have a look at what Cassandra is and how it works internally.</p><!--kg-card-begin: markdown--><h2 id="whatiscassandra">What is Cassandra</h2>
<!--kg-card-end: markdown--><p>Cassandra is a highly-scalable, highly-available distributed <a href="https://martinfowler.com/bliki/NosqlDefinition.html">NoSQL database</a>. It was created at Facebook and was later open-sourced under the Apache umbrella.</p><p>It's a column-oriented database and its data model is based on <a href="https://cloud.google.com/bigtable/docs/overview">Google's BigTable</a>, while its data distribution design is based on <a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html">Amazon's DynamoDB</a>.</p><p>It's very well suited for handling massive amounts of load with very good performance while offering <strong>linear scalability</strong> (the more nodes you add, the more requests it can handle) and <strong>no single point of failure</strong>.</p><p>These are some of the reasons why a lot of big companies, such as Facebook, Twitter, Netflix, and eBay, are using it.</p><!--kg-card-begin: markdown--><h2 id="clusterarchitecture">Cluster architecture</h2>
<!--kg-card-end: markdown--><p>When Facebook created Cassandra, they had a specific use case in mind. It was supposed to be used in their messaging app, so their main focus was availability and not strong consistency. And it makes sense in the context of a messaging app: you care more about being able to send a message and not so much about receiving every message the exact moment it is sent.</p><!--kg-card-begin: markdown--><h3 id="datadistribution">Data distribution</h3>
<!--kg-card-end: markdown--><p>With that in mind, Cassandra's clusters don't have a special master/coordinator node. <em>Nodes in the cluster are peers</em>: they are exactly the same, and clients can use any node to connect to the cluster and do reads and writes.</p><p>The nodes are <em>logically</em> organised into a ring. That's why we often hear the term <strong>Cassandra ring</strong>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://antousias.com/content/images/2020/04/cassandra_ring.png" class="kg-image" alt="Deep dive into Cassandra"><figcaption>Cassandra ring</figcaption></figure><p>Each node owns a part of the overall data that is stored in the cluster. For data distribution, Cassandra uses <a href="https://antousias.com/consistent-hash-rings/">consistent hashing</a> to decide the owner node of a piece of data. For hashing it uses the <strong>partition key</strong> of the data (we will discuss more about this later, when we talk about Cassandra's data model) and the <a href="https://www.sderosiaux.com/articles/2017/08/26/the-murmur3-hash-function--hashtables-bloom-filters-hyperloglog/"><strong>Murmur3</strong> algorithm</a>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://antousias.com/content/images/2020/04/cassandra_data_stored.png" class="kg-image" alt="Deep dive into Cassandra"><figcaption>Data distribution in token ring using murmur3 and partition key</figcaption></figure><p>To achieve the desired availability, we need to replicate data. Cassandra's <strong>replication factor</strong> basically tells the cluster how many copies of the data it should keep. A typical value for this is 3.
So, when a client sends a new piece of data to be written, Cassandra will first write it to the node that owns this data (as we showed above) and then send it to the appropriate nodes to be replicated according to the replication factor.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://antousias.com/content/images/2020/04/cassandra_replication-1.png" class="kg-image" alt="Deep dive into Cassandra"><figcaption>Data distribution with replication factor</figcaption></figure><p>In addition to the local data distribution, Cassandra also offers <strong>global distribution</strong>. In other words, it allows you to set up clusters in different datacenters, and Cassandra will take care of the distribution. This is a very popular feature of Cassandra, as it allows geographic distribution of data, which leads to lower latencies.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://antousias.com/content/images/2020/04/cassandra_global_distribution.png" class="kg-image" alt="Deep dive into Cassandra"><figcaption>Data distribution across datacenters</figcaption></figure><!--kg-card-begin: markdown--><h3 id="gossipprotocol">Gossip protocol</h3>
<!--kg-card-end: markdown--><p>Based on what we've said so far, the following question should arise naturally:</p><!--kg-card-begin: markdown--><blockquote>
<p>If there is no master or special node in the cluster, and no configuration server is used, how does each node know about the other nodes in the cluster?</p>
</blockquote>
<!--kg-card-end: markdown--><p>And the answer to this question is that all the nodes in the cluster use the <strong>gossip protocol</strong>.</p><p>The gossip protocol is a peer-to-peer communication protocol modelled on the actual gossip that occurs in social networks.</p><!--kg-card-begin: markdown--><p>Basically, each node in the cluster randomly communicates with other nodes and shares the information it currently knows, for example the cluster topology. This helps Cassandra nodes achieve three things:</p>
<ol>
<li>It's a great way to propagate knowledge regarding the status of the cluster, which nodes are currently participating and how to reach them</li>
<li>It makes node membership (new nodes joining the cluster) easy</li>
<li>It provides an easy health check, as nodes can detect whether another node is down</li>
</ol>
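To make the idea concrete, here is a toy sketch of gossip-style dissemination in Python. This is illustrative only, not Cassandra's actual implementation: nodes, heartbeats, and the merge rule are all made up for the example.

```python
import random

# Toy sketch of gossip-style dissemination (illustrative only, not
# Cassandra's actual implementation). Each node keeps a view of the
# cluster as {node_name: heartbeat}; fresher knowledge wins on merge.

def merge(view_a, view_b):
    """Both sides keep the freshest (highest heartbeat) entry per node."""
    merged = {n: max(view_a.get(n, 0), view_b.get(n, 0))
              for n in set(view_a) | set(view_b)}
    view_a.clear(); view_a.update(merged)
    view_b.clear(); view_b.update(merged)

def gossip_round(views, rng):
    """Every node exchanges state with one randomly chosen peer."""
    for node in list(views):
        peer = rng.choice([n for n in views if n != node])
        merge(views[node], views[peer])

# node4 has just joined; initially only it knows about itself.
views = {
    "node1": {"node1": 5, "node2": 3, "node3": 4},
    "node2": {"node1": 5, "node2": 3, "node3": 4},
    "node3": {"node1": 5, "node2": 3, "node3": 4},
    "node4": {"node4": 1},
}
rng = random.Random(42)
rounds = 0
while not all("node4" in view for view in views.values()):
    gossip_round(views, rng)
    rounds += 1
    assert rounds < 100  # gossip converges in a handful of rounds

assert all("node4" in view for view in views.values())
```

Even in this toy version, knowledge of the new node spreads to the whole cluster in just a few random exchanges, which is why gossip makes membership and health checking so cheap.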
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><h3 id="consistency">Consistency</h3>
<!--kg-card-end: markdown--><p>Cassandra offers tunable consistency. It allows you to sacrifice performance and availability to achieve greater consistency. After all, as the <a href="https://en.wikipedia.org/wiki/CAP_theorem">CAP theorem</a> states, we cannot have both strong consistency and high availability in a distributed system.</p><p>In Cassandra, we have a <strong>per-request consistency level</strong>. For each request, either a write or a read, the client can define how strong a consistency guarantee they want. Basically, the client specifies how many nodes must acknowledge the request before Cassandra replies. Some of the most popular options are:</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:center">Consistency Level</th>
<th style="text-align:center">Write Request</th>
<th style="text-align:center">Read Request</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">ONE</td>
<td style="text-align:center">Written by at least one replica</td>
<td style="text-align:center">Result of the closest replica</td>
</tr>
<tr>
<td style="text-align:center">TWO</td>
<td style="text-align:center">Written by at least two replicas</td>
<td style="text-align:center">Most recent data from the two closest replicas</td>
</tr>
<tr>
<td style="text-align:center">THREE</td>
<td style="text-align:center">Written by at least three replicas</td>
<td style="text-align:center">Most recent data from the three closest replicas</td>
</tr>
<tr>
<td style="text-align:center">LOCAL_QUORUM</td>
<td style="text-align:center">Written by the quorum of replicas in the datacenter</td>
<td style="text-align:center">Result after a quorum of the replicas in the same datacenter</td>
</tr>
<tr>
<td style="text-align:center">QUORUM</td>
<td style="text-align:center">Written by the quorum of replicas <em>across all datacenters</em></td>
<td style="text-align:center">Result after a quorum of the replicas <em>across all datacenters</em></td>
</tr>
<tr>
<td style="text-align:center">EACH_QUORUM</td>
<td style="text-align:center">Written by the quorum of replicas in <em>each datacenter</em></td>
<td style="text-align:center">N/A</td>
</tr>
<tr>
<td style="text-align:center">ALL</td>
<td style="text-align:center">Written by all the replicas</td>
<td style="text-align:center">Result from all the replicas</td>
</tr>
</tbody>
</table>
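As a rough sketch of how these levels translate into replica acknowledgements, and of the classic overlap condition for read-your-writes consistency, consider the following Python snippet. The function names are made up for illustration; real drivers expose consistency levels as enum constants.

```python
# Illustrative sketch: how many replica acknowledgements each consistency
# level requires for a given replication factor (RF). Function names are
# made up for illustration, not a real driver API.

def required_acks(level, rf):
    quorum = rf // 2 + 1  # a majority of the replicas
    return {"ONE": 1, "TWO": 2, "THREE": 3,
            "QUORUM": quorum, "ALL": rf}[level]

def is_strongly_consistent(read_level, write_level, rf):
    # A read sees the latest write when the read and write replica sets
    # must overlap, i.e. when R + W > RF.
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf

rf = 3
assert required_acks("QUORUM", rf) == 2
assert is_strongly_consistent("QUORUM", "QUORUM", rf)  # 2 + 2 > 3
assert not is_strongly_consistent("ONE", "ONE", rf)    # 1 + 1 <= 3
```

So with a replication factor of 3, quorum reads combined with quorum writes are enough to guarantee that a read always touches at least one replica holding the latest write.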
<!--kg-card-end: markdown--><p>The further you go down in the above list, the lower the availability gets, while the consistency increases. For example, in the <code>ALL</code> case, you get very strong consistency, but if a datacenter or even a single replica is down, the request fails.</p><p>However, in most of the projects where I've used Cassandra, it was the tool of choice because of its performance and availability, and we could tolerate eventual consistency, so we wouldn't go above <code>QUORUM</code>, which strikes a nice balance between availability and consistency. In some cases, we were even using <code>ONE</code>.</p><p>Now, let's say we have a read request that uses a consistency level of <code>TWO</code>, and let's assume that for some reason the two replica nodes return different results. How does Cassandra decide which data should be returned?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://antousias.com/content/images/2020/04/cassandra_conflict_resolution-1.png" class="kg-image" alt="Deep dive into Cassandra"><figcaption>Conflict resolution</figcaption></figure><p>In situations like that, Cassandra uses the <code>last write wins</code> scheme, and picks the most recent data to return. So, in the above example, if <code>Result B</code> was written after <code>Result A</code> (i.e. <em>node 2</em> wasn't up-to-date), the cluster would reply with <code>Result B</code>.</p><p>In addition to the above conflict resolution, Cassandra will also perform a <strong>read repair</strong> in situations like that. Basically, upon a read request, when it detects that some replicas are not up-to-date, it will try to fix them.
So it will send the correct value to all the out-of-date replicas, which will apply it like a normal write (with a backdated timestamp).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://antousias.com/content/images/2020/04/cassandra_read_repair.png" class="kg-image" alt="Deep dive into Cassandra"><figcaption>Read repair</figcaption></figure><!--kg-card-begin: markdown--><p>Read repair is one of the <strong>anti-entropy mechanisms</strong> that Cassandra uses to make sure data stays consistent. Some others, which we won't cover in this post, are:</p>
<ol>
<li>Hinted handoff: when a node goes down for a relatively short period of time, the other nodes store hints for it and, once it comes back up, stream to it all the writes it missed while it was away.</li>
<li>Repair: while read repair fixes specific data on specific replicas, repair fixes consistency issues across the whole cluster. It's good practice to run this operation frequently (for example, weekly) to ensure data integrity in your cluster.</li>
</ol>
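The last-write-wins resolution described above can be sketched in a few lines. This is a simplification: real Cassandra resolves conflicts per column using the cells' write timestamps, whereas here a whole replica response is a single (value, timestamp) pair.

```python
# Simplified last-write-wins: each replica returns (value, write_timestamp)
# and the coordinator keeps the value with the highest timestamp. Real
# Cassandra does this per column, using the cells' write timestamps.

def resolve(replica_results):
    """Pick the most recently written value among the replica responses."""
    return max(replica_results, key=lambda result: result[1])[0]

# node1 returns the newer Result B; node2 is out of date with Result A.
answer = resolve([("Result B", 170), ("Result A", 120)])
assert answer == "Result B"
```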
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><h2 id="nodearchitecture">Node architecture</h2>
<!--kg-card-end: markdown--><p>So far, we've seen how different Cassandra nodes in a cluster work together to serve read and write requests. But what exactly happens in each of these nodes when such a request comes in?</p><p>Let's start with the write requests. </p><!--kg-card-begin: markdown--><h3 id="writepath">Write path</h3>
<!--kg-card-end: markdown--><p>When a write comes in, it will first be persisted to disk into a <strong>commit log</strong>. This commit log is an append-only file that contains every write request the node receives. These entries survive even if the node goes down, and they are basically used for recovery in case of failures.</p><p>The node also writes the request into an in-memory cache called a <strong>memtable</strong>. This cache accumulates writes that have the same key (i.e. the same partition key) and is used to serve reads for data that hasn't been persisted to disk yet. A node has one memtable for each of its tables.</p><p>Once both of these operations complete, the node will send an acknowledgement back to the caller.</p><p>The memtable will keep accumulating write requests per key in a sorted fashion until it reaches a specific, configurable limit. Once it reaches that limit, the whole memtable is flushed to disk, into a sorted strings table called an <strong>SSTable</strong>. This is a concept that comes from <a href="https://cloud.google.com/bigtable/docs/overview">Google's BigTable</a>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://antousias.com/content/images/2020/04/cassandra_write_node.png" class="kg-image" alt="Deep dive into Cassandra"><figcaption>Write path in a Cassandra node</figcaption></figure><p>Periodically, each node will run a background process called <strong>compaction</strong> to merge the SSTables into a single one. Having a single SSTable helps with the performance of read requests. There are many different types of <a href="http://cassandra.apache.org/doc/latest/operating/compaction.html">compaction techniques</a>, but we won't be covering them in this post.</p><p>One thing to keep in mind is that all the above structures are append-only structures. When we update existing rows, we do not overwrite data.
Instead, we create new entries and leave it to the compaction and read paths to decide which entry to use. Basically, as we saw earlier, Cassandra uses the <code>last write wins</code> scheme.</p><p>The same is true for delete operations. Deletes do not actually delete data; they mark it as deleted by creating a new entry for this data called a <strong>tombstone</strong>.</p><p>OK, let's now see how the read path works.</p><!--kg-card-begin: markdown--><h3 id="readpath">Read path</h3>
<!--kg-card-end: markdown--><p>The read path is quite straightforward. When a read request comes in, the node will check the SSTables and the memtables to find the requested data. Since both SSTables and memtables are sorted by key, these operations are quite fast.</p><p>In addition to that, to avoid going to disk for every single SSTable to check if it contains data for the given key, the node uses <a href="https://en.wikipedia.org/wiki/Bloom_filter">bloom filters</a> to quickly verify whether it should hit the disk or not. A bloom filter is a probabilistic data structure that tells you whether an element is definitely not in the set or maybe is. So, if the filter tells you that a key is not in a given SSTable, there is no need to go and check it. On the other hand, if it tells you maybe, then we need to check it, since we don't know whether it is there or not.</p><p>Once it retrieves the data, the node will aggregate it by removing duplicates, discarding older entries (keeping only the latest update) and tombstones, and return the result to the caller.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://antousias.com/content/images/2020/04/cassandra_read_path.png" class="kg-image" alt="Deep dive into Cassandra"><figcaption>Read path in a Cassandra node</figcaption></figure><p>One thing to keep in mind at this point is that since both memtables and SSTables are sorted by partition key, if we need to query data by something other than the key, Cassandra will struggle. It will need to go over the entirety of the node's SSTables as well as its memtables to find what you're looking for. Moreover, given that Cassandra determines the nodes that hold specific data based on the data's partition key, when we search by something else we actually have to search the entire cluster! That's why we <strong>DO NOT</strong> use Cassandra for relational modelling.</p><!--kg-card-begin: markdown--><h2 id="datamodelling">Data modelling</h2>
<!--kg-card-end: markdown--><p>The way to create, insert and query data in Cassandra is through its query language, CQL (Cassandra Query Language), which is similar to a subset of SQL.</p><p>The top-level construct of CQL is a <strong>keyspace</strong>. If we want to draw a parallel to the relational world, a keyspace would be something like a database. The main attribute that you define on a keyspace is its replication factor. All the tables inside that keyspace will have this replication factor. And we can be as specific as we want here, for example defining different replication for the different datacenters that we have in the cluster.</p><pre><code class="language-cql">CREATE KEYSPACE my_keyspace WITH REPLICATION = {
	'class': 'NetworkTopologyStrategy',
    'dc1': 2,
    'dc2': 3
};</code></pre><p>A keyspace can have one or more tables. Tables consist of columns, which can be any of the <a href="http://cassandra.apache.org/doc/latest/cql/types.html">supported types</a>. Each table must have a <strong>primary key</strong>, which should be unique for each row in that table.</p><p>One very important thing is that the primary key is different from the <strong>partition key</strong>. The partition key is formed from the first column that participates in the primary key of the table. The partition key is what determines on which node the data will be stored, and it is also used for lookups within the node's memtables and SSTables.</p><pre><code class="language-cql">CREATE TABLE events(
    event_address    text,
    event_timestamp  timestamp,
    username         text,
    PRIMARY KEY(event_address, event_timestamp)
);</code></pre><p>So, in the above example, the table <code>events</code> has a primary key that consists of <code>event_address</code> and <code>event_timestamp</code>. It also has a partition key, which is the <code>event_address</code>.</p><p>Inserting and querying data is equivalent to the respective procedures in the SQL world. The only thing to be careful about, though, as we've already mentioned, is that queries must include the partition key, for the reasons we explained earlier in the post.</p><pre><code>INSERT INTO events(event_address, event_timestamp, username)
VALUES ('acquisition_1', '2020-04-13', 'antousias');

SELECT * FROM events WHERE event_address = 'acquisition_1';
</code></pre><!--kg-card-begin: markdown--><h2 id="personalthoughtsoncassandra">Personal thoughts on Cassandra</h2>
<!--kg-card-end: markdown--><p>As I mentioned in the beginning, because of the way Cassandra is architected, it's extremely good at certain use cases and extremely bad at others!</p><p>I have used it in various projects in the past with great success. In particular, we had a project at King where our system had to support around 1,000,000 requests per second from all around the world. Cassandra could handle that without breaking a sweat. Good luck doing that with another database.</p><p>Having said that, there are a few things you should take into account before deciding to go down the Cassandra route.</p><!--kg-card-begin: markdown--><h3 id="knowyourqueries">Know your queries</h3>
<!--kg-card-end: markdown--><p>You should know your queries before creating your tables, or at least have an idea of what the partition key of each table is going to be.</p><p>As we discussed earlier, the partition key needs to be part of every read request, so that Cassandra knows which nodes to contact and can easily find the required entries within each node. Changing from one partition key to another is an expensive and painful operation...so make sure you nail that from the beginning.</p><!--kg-card-begin: markdown--><h3 id="choosepartitionkeyswisely">Choose partition keys wisely</h3>
<!--kg-card-end: markdown--><p>Choosing the correct partition key can have a tremendous impact on the performance of your Cassandra cluster.</p><p>But what constitutes a good partition key?</p><p>Ideally, partition keys should be uniformly distributed but not extremely scattered. We want to avoid hot spots that lead to huge partitions on a handful of replica nodes, but we also want to avoid partition keys that produce partitions of just a handful of entries.</p><!--kg-card-begin: markdown--><h3 id="availabilityorconsistency">Availability or consistency?</h3>
<!--kg-card-end: markdown--><p>A big factor in success with Cassandra is understanding the requirements of your use case and carefully choosing the availability and consistency levels you want.</p><p>As we saw earlier, availability comes mainly from setting the appropriate replication factor in your keyspace. Consistency, on the other hand, comes from choosing the consistency level for each read and write query. Some applications need availability over consistency, while others need the opposite. Remember, <a href="https://en.wikipedia.org/wiki/CAP_theorem">you cannot have both</a>!</p><p>If you want to strike a balance between these two, the following inequality is a good rule of thumb:</p><pre><code>CL.READ + CL.WRITE &gt; RF

where:
- CL.READ:  Consistency level used for reads
- CL.WRITE: Consistency level used for writes
- RF:       Replication Factor</code></pre><!--kg-card-begin: markdown--><h3 id="devopsarepeopletoo">Devops are people too</h3>
<!--kg-card-end: markdown--><p>One of the pain points of Cassandra is that it needs careful maintenance. Although you get high availability, and any failures that might happen are completely invisible to the client, this comes at a cost.</p><p>First, you need to maintain your data integrity and consistency. This requires analysing your cluster topology and running repair operations frequently.</p><p>You also need to have a good understanding of how Cassandra works internally. In most cases, you won't have to deal with any internal problems. In most cases you will get the awesome speed and availability that Cassandra is known for. In most cases the cluster will recover on its own from any failures that might happen.</p><p>But there are these rare moments....</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/04/why_god_why.gif" class="kg-image" alt="Deep dive into Cassandra"></figure><!--kg-card-begin: markdown--><h3 id="areyourichenough">Are you rich enough?</h3>
<!--kg-card-end: markdown--><p>Last but not least, Cassandra can be really expensive!</p><p>As we saw earlier, Cassandra uses the gossip protocol for inter-node communication. This can generate a lot of messages, and if you're running your cluster in the cloud, it can increase your monthly bills by a big factor.</p><p>Also, if you run a cluster across multiple datacenters, you should be careful about the replication factor you're using as well as the consistency level you choose for your queries. This again can generate a lot of data traffic, this time across different availability zones, which is even more expensive.</p><!--kg-card-begin: markdown--><h2 id="conclusion">Conclusion</h2>
<!--kg-card-end: markdown--><p>Despite these issues, I think Cassandra is an amazing piece of technology. When a database gives you such impressive features, some drawbacks are to be expected.</p><p>So, if it fits your use case, definitely give it a try. You won't regret it one bit.</p><p>But remember...</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/04/remember-with-great-1.jpg" class="kg-image" alt="Deep dive into Cassandra"></figure>]]></content:encoded></item><item><title><![CDATA[Consistent hash rings]]></title><description><![CDATA[<p>Systems nowadays are expected to be scalable and highly-available. They should be able to handle any load given to them (always within the boundaries of the agreed SLA) and since they usually run on cheap machines they should be fault-tolerant and always serve requests even if some of the machines</p>]]></description><link>https://antousias.com/consistent-hash-rings/</link><guid isPermaLink="false">5e8ecbc32fe481484c5c3caa</guid><category><![CDATA[Software Architecture]]></category><category><![CDATA[Distributed Systems]]></category><dc:creator><![CDATA[Alexandros Ntousias]]></dc:creator><pubDate>Thu, 09 Apr 2020 13:46:12 GMT</pubDate><media:content url="https://antousias.com/content/images/2020/04/architecture_tag-1.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://antousias.com/content/images/2020/04/architecture_tag-1.jpeg" alt="Consistent hash rings"><p>Systems nowadays are expected to be scalable and highly-available. They should be able to handle any load given to them (always within the boundaries of the agreed SLA) and, since they usually run on cheap machines, they should be fault-tolerant and always serve requests even if some of the machines die.</p><p>How do we achieve that?
Well, that's easy...</p><p>Make a service stateless, start up a bunch of instances of it, and put the instances behind a load balancer.</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/04/job_done-1.jpg" class="kg-image" alt="Consistent hash rings"></figure><p><strong>But wait a minute....!</strong></p><p>There are times when we actually want to make sure a certain request is always routed to the same instance.</p><p>Some common use cases are when we use in-memory caching and want to minimise cache misses, or when the instances of the service are stateful.</p><!--kg-card-begin: markdown--><h2 id="hashingthenaiveway">Hashing...the naive way</h2>
<!--kg-card-end: markdown--><p>A way to route a request to the same node every time is by using <em>hashing</em>. Basically, we apply a hash function to the request to get a number and then somehow map this number to a specific node.</p><p>The naive and easy solution would be to take the hash result modulo the total number of nodes we have, thus getting a number that corresponds to one of our nodes.</p><pre><code>node_number = hash(request) mod total_number_of_nodes</code></pre><p>So, as an example, let's assume we have a total of 3 nodes and the following 4 requests:</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:center">Request</th>
<th style="text-align:center">Hash</th>
<th style="text-align:center">Node</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">A</td>
<td style="text-align:center">1337</td>
<td style="text-align:center">2</td>
</tr>
<tr>
<td style="text-align:center">B</td>
<td style="text-align:center">1338</td>
<td style="text-align:center">0</td>
</tr>
<tr>
<td style="text-align:center">C</td>
<td style="text-align:center">1339</td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td style="text-align:center">D</td>
<td style="text-align:center">1340</td>
<td style="text-align:center">2</td>
</tr>
</tbody>
</table>
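The naive scheme, applied to the example hashes above, can be sketched as follows (the hash values are taken directly from the table instead of computing a real hash):

```python
# Naive placement: node = hash(request) mod number_of_nodes.
# The example hashes from the table above are used directly,
# instead of a real hash function.

def node_for(request_hash, total_nodes):
    return request_hash % total_nodes

hashes = {"A": 1337, "B": 1338, "C": 1339, "D": 1340}

with_3_nodes = {r: node_for(h, 3) for r, h in hashes.items()}
assert with_3_nodes == {"A": 2, "B": 0, "C": 1, "D": 2}

# Adding a fourth node reshuffles every single request:
with_4_nodes = {r: node_for(h, 4) for r, h in hashes.items()}
assert all(with_3_nodes[r] != with_4_nodes[r] for r in hashes)
```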
<!--kg-card-end: markdown--><p>Let's assume now that suddenly the load increases so the autoscaler spins up one more node. Now we have 4 nodes. Let's see what happens to the above example:</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:center">Request</th>
<th style="text-align:center">Hash</th>
<th style="text-align:center">Node (Before)</th>
<th style="text-align:center">Node (Now)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">A</td>
<td style="text-align:center">1337</td>
<td style="text-align:center">2</td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td style="text-align:center">B</td>
<td style="text-align:center">1338</td>
<td style="text-align:center">0</td>
<td style="text-align:center">2</td>
</tr>
<tr>
<td style="text-align:center">C</td>
<td style="text-align:center">1339</td>
<td style="text-align:center">1</td>
<td style="text-align:center">3</td>
</tr>
<tr>
<td style="text-align:center">D</td>
<td style="text-align:center">1340</td>
<td style="text-align:center">2</td>
<td style="text-align:center">0</td>
</tr>
</tbody>
</table>
<!--kg-card-end: markdown--><p>As we can see, all the requests were affected by the addition of a new node. Basically, the distribution changes whenever we change the number of nodes.</p><p>Obviously, this is something we want to avoid. And this is exactly what <em>consistent hashing</em> solves.</p><!--kg-card-begin: markdown--><h2 id="consistenthashing">Consistent hashing</h2>
<!--kg-card-end: markdown--><p>In consistent hashing, we think of the nodes as points on a ring (thus the name <em>consistent hashing ring</em>).</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/04/hashing_1.png" class="kg-image" alt="Consistent hash rings"></figure><p>These points can be determined in various ways. For example, we could use something like the following:</p><pre><code>node_point = hash(node_ip_address) mod 360°</code></pre><p>Then, the incoming requests are also mapped as points on this ring. In other words, instead of doing a modulo with the total number of nodes as we did before, we do a modulo with 360°.</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/04/hashing_2.png" class="kg-image" alt="Consistent hash rings"></figure><p>Finally, once we have a request as a point on our ring, the node that should serve this request is the next node on the ring in clockwise order.</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:center">Request</th>
<th style="text-align:center">Node</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">A</td>
<td style="text-align:center">2</td>
</tr>
<tr>
<td style="text-align:center">B</td>
<td style="text-align:center">3</td>
</tr>
<tr>
<td style="text-align:center">C</td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td style="text-align:center">D</td>
<td style="text-align:center">1</td>
</tr>
</tbody>
</table>
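A minimal consistent-hash ring can be sketched like this, using bisection on the sorted node points. The node positions are made up for illustration rather than derived from real IP-address hashes:

```python
import bisect

# Minimal consistent-hash ring: nodes sit at points on a 0-359 degree
# ring, and a request is served by the next node clockwise from the
# request's own point. Positions here are made up for illustration.

class Ring:
    def __init__(self, node_points):
        # node_points: {node_name: point_on_ring}
        self.points = sorted((point, name) for name, point in node_points.items())

    def node_for(self, request_point):
        keys = [point for point, _ in self.points]
        # First node at or past the request's point, wrapping around.
        i = bisect.bisect_left(keys, request_point) % len(self.points)
        return self.points[i][1]

ring = Ring({"node1": 45, "node2": 135, "node3": 225, "node4": 315})
assert ring.node_for(100) == "node2"  # next node clockwise from 100 degrees
assert ring.node_for(350) == "node1"  # wraps past 359 back to the start
```

Production implementations usually add virtual nodes (many points per physical node) to even out the load, but the lookup logic is the same.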
<!--kg-card-end: markdown--><p>Let's see now what happens when we add a new node in the ring:</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/04/hashing_3.png" class="kg-image" alt="Consistent hash rings"></figure><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:center">Request</th>
<th style="text-align:center">Node (Before)</th>
<th style="text-align:center">Node (Now)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">A</td>
<td style="text-align:center">2</td>
<td style="text-align:center">2</td>
</tr>
<tr>
<td style="text-align:center">B</td>
<td style="text-align:center">3</td>
<td style="text-align:center">3</td>
</tr>
<tr>
<td style="text-align:center">C</td>
<td style="text-align:center">1</td>
<td style="text-align:center">4</td>
</tr>
<tr>
<td style="text-align:center">D</td>
<td style="text-align:center">1</td>
<td style="text-align:center">1</td>
</tr>
</tbody>
</table>
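The minimal-disruption property can also be checked with a short self-contained sketch (ring positions and request points are made up for illustration; a real hash function would produce them):

```python
import bisect

# Self-contained sketch: adding a node to a consistent-hash ring only
# remaps the requests whose points fall just before the new node.
# Positions and request points are made up for illustration.

def node_for(point, node_points):
    points = sorted(node_points)          # node positions on the 0-359 ring
    i = bisect.bisect_left(points, point) % len(points)
    return points[i]                      # identify nodes by their position

nodes_before = [45, 135, 225, 315]
nodes_after = nodes_before + [60]         # a new node joins at 60 degrees

requests = [10, 50, 100, 250, 350]        # request points on the ring
moved = [r for r in requests
         if node_for(r, nodes_before) != node_for(r, nodes_after)]
assert moved == [50]  # only the request between 45 and 60 is remapped
```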
<!--kg-card-end: markdown--><p>We notice that, unlike before, only a single request was affected by this change, request C.</p><!--kg-card-begin: markdown--><h2 id="conclusion">Conclusion</h2>
<!--kg-card-end: markdown--><p>Consistent hashing is a great way to achieve hashing in a distributed system that is independent of the number of servers or objects in the system. It's definitely a very useful tool that any engineer dealing with distributed systems should have in their toolbox.</p><p>In fact, a lot of the modern, highly-scalable distributed databases, such as <a href="https://antousias.com/deep-dive-into-cassandra/">Cassandra</a>, <a href="https://www.dynamodbguide.com/the-dynamo-paper/#infinitely-scalable">DynamoDB</a>, <a href="https://www.project-voldemort.com/voldemort/design.html">VoldemortDB</a>, and many more, use consistent hashing internally to determine the nodes that likely contain the information the user is trying to retrieve.</p>]]></content:encoded></item><item><title><![CDATA[Deep dive into Kafka (pt 2)]]></title><description><![CDATA[<p>In the <a href="https://antousias.com/deep-dive-into-kafka-part-1/">previous post</a> we saw how Kafka topics and partitions work. Let's now have a closer look at how producers and consumers publish and consume messages respectively. Finally, we'll talk about the different delivery guarantees that Kafka provides and how we can configure each one of them.</p><!--kg-card-begin: markdown--><h2 id="producers">Producers</h2>
<!--kg-card-end: markdown--><p>A</p>]]></description><link>https://antousias.com/deep-dive-into-kafka-part-2/</link><guid isPermaLink="false">5e8c6449f8ca150df0f97041</guid><category><![CDATA[Software Architecture]]></category><category><![CDATA[Distributed Systems]]></category><category><![CDATA[Deep Dive]]></category><dc:creator><![CDATA[Alexandros Ntousias]]></dc:creator><pubDate>Wed, 25 Mar 2020 17:50:00 GMT</pubDate><media:content url="https://antousias.com/content/images/2020/04/kafka_bg_2-4.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://antousias.com/content/images/2020/04/kafka_bg_2-4.jpg" alt="Deep dive into Kafka (pt 2)"><p>In the <a href="https://antousias.com/deep-dive-into-kafka-part-1/">previous post</a> we saw how Kafka topics and partitions work. Let's now have a closer look at how producers and consumers publish and consume messages respectively. Finally, we'll talk about the different delivery guarantees that Kafka provides and how we can configure each one of them.</p><!--kg-card-begin: markdown--><h2 id="producers">Producers</h2>
<!--kg-card-end: markdown--><p>A producer is an application that publishes messages to a Kafka topic. A message is essentially a key-value pair, where the <em><strong>key</strong></em> is used to identify the partition of the topic that the event will be published to and the <em><strong>value</strong></em> is the actual message.</p><p>Pretty simple, huh? Although Kafka is a very complicated system at its core, all this complexity is hidden away and its producer and consumer abstractions are very easy to grasp and simple to use.</p><!--kg-card-begin: markdown--><h3 id="producerinternals">Producer Internals</h3>
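The three mandatory settings listed in this section can be collected into a plain `java.util.Properties` object, which is what the standard Java client's `KafkaProducer` constructor accepts. A minimal sketch — the broker addresses are illustrative placeholders, not values from any real cluster:

```java
import java.util.Properties;

class ProducerConfigSketch {
    // Assembles the minimal producer configuration; broker addresses
    // and serializer choices here are illustrative placeholders.
    static Properties minimalConfig() {
        Properties props = new Properties();
        // More than one broker, so bootstrapping survives a single broker outage
        props.setProperty("bootstrap.servers", "broker1:9092,broker2:9092");
        // String serializers shipped with the standard Java client
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(minimalConfig());
    }
}
```

With the real client, this `Properties` object would be passed straight to `new KafkaProducer<>(props)`.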
<!--kg-card-end: markdown--><p>In order to create a producer, you need to supply the following configurations:</p><ul><li><strong><em>bootstrap.servers</em></strong>: Connection details for at least one of the Kafka brokers in the cluster. By connecting to one broker, you essentially connect to the whole cluster. It's actually a good practice to supply more than one broker, in case one of them is down.</li><li><strong><em>key.serializer</em></strong>: The serialiser class which converts the key of a record to a format understood by Kafka.</li><li><strong><em>value.serializer</em></strong>: The serialiser class which converts the value of a record to a format understood by Kafka.</li></ul><p>There are more configurations that you could supply when creating the producer, but these are the most common ones. You can find a detailed list of all producer configurations <a href="https://kafka.apache.org/documentation/#producerconfigs">here</a>.</p><p>Internally, a Kafka producer looks something like the following:</p><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/producer_internals.png?w=1024" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p>The application (i.e. application thread) creates a <strong>ProducerRecord</strong> which it sends to the producer by calling <code>producer.send(record)</code>. 
The producer record contains information like the following:</p><ul><li><em>topic</em>: The Kafka topic that the message will be sent to</li><li><em>key</em>: The key is used to determine which partition of the topic the message will be sent to</li><li><em>value</em>: The actual message we want to send to Kafka</li><li><em>partition</em>: Optional property in case we want to explicitly specify the partition</li><li><em>timestamp</em>: The timestamp of the record as milliseconds since epoch</li><li><em>headers</em>: Any headers to be included along with the record</li></ul><p>There might be more or fewer properties, depending on the Kafka client's implementation. The only mandatory properties though are the <em><strong>topic</strong></em> and the <em><strong>value</strong></em>. But it's good practice to always use a <em><strong>key</strong></em> as well, to control the partition that each record ends up in.</p><!--kg-card-begin: markdown--><h3 id="serializer">Serializer</h3>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/producer_internals_serializer-1.png?w=1024" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p>Once the producer receives a record to be published, it uses the <strong>Serializer</strong> to convert it to Kafka's message format. The message format that Kafka is using is configurable upon creation with a variety of options to choose from. Some of the most popular options are:</p><ul><li><a href="https://www.json.org/">JSON</a>: Textual format, common format used by browsers/API endpoints</li><li><a href="https://avro.apache.org/">Avro</a>: Binary format, thus non-human readable but also more space-efficient. Allows easier schema evolution via <a href="https://docs.confluent.io/current/schema-registry/index.html">Schema Registry</a></li></ul><p>The type of serialiser that the producer is using is defined upon the creation of the producer using the following configurations: <code>key.serializer</code> and <code>value.serializer</code>.</p><!--kg-card-begin: markdown--><h3 id="partitioner">Partitioner</h3>
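The three partitioning rules described in this section translate almost line-for-line into Java. This is a simplified stand-in, not the client's actual code: the real <code>DefaultPartitioner</code> hashes the serialised key bytes with murmur2, while plain `String.hashCode()` is used here just to illustrate the flow:

```java
class PartitionerSketch {
    private int roundRobin = 0; // counter used for keyless records

    // Mirrors the three DefaultPartitioner rules described in this section
    // (simplified: real Kafka hashes the key bytes with murmur2).
    int partition(Integer explicitPartition, String key, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition;              // rule 1: explicit partition wins
        }
        if (key != null) {
            // rule 2: hash(record.key) % num_partitions (sign bit masked off)
            return (key.hashCode() & 0x7fffffff) % numPartitions;
        }
        return roundRobin++ % numPartitions;       // rule 3: no key -> round-robin
    }

    public static void main(String[] args) {
        PartitionerSketch p = new PartitionerSketch();
        // Same key always lands on the same partition
        System.out.println(p.partition(null, "user-42", 3));
    }
}
```

The important property to notice is that the keyed case is deterministic: the same key always maps to the same partition, as long as the partition count does not change.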
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/producer_internals_partitioner.png?w=1024" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p>The next step after serialisation is the <strong>Partitioner</strong>. At this stage, the producer determines which partition of the topic the record should be sent to. By default, the <strong>DefaultPartitioner</strong> will be used, which follows these rules:</p><ul><li>If there is a partition number defined in the incoming record, then that partition is used.</li><li>If there is a key in the incoming record, then use <code>hash(record.key) % num_partitions</code>.</li><li>If there is no partition number and no key, choose a partition in a round-robin fashion.</li></ul><p>In most cases, the default partitioner works just fine. But there might be cases where you want to control the partitioning logic, for example to split certain keys across several partitions just to avoid hotspots. The producer's configuration value that allows us to override the partitioner is <code>partitioner.class</code>.</p><!--kg-card-begin: markdown--><h3 id="buffersandbatches">Buffers and Batches</h3>
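The flush-when-full-or-when-lingered behaviour described in this section can be modelled with a toy buffer. It is only an illustration of the idea (the real client runs a background sender thread); the clock is passed in explicitly so the logic is deterministic:

```java
import java.util.ArrayList;
import java.util.List;

class BatchBufferSketch {
    private final int batchSize;    // analogous to batch.size (here: record count)
    private final long lingerMs;    // analogous to linger.ms
    private final List<String> buffer = new ArrayList<>();
    private long firstRecordAt = -1;

    BatchBufferSketch(int batchSize, long lingerMs) {
        this.batchSize = batchSize;
        this.lingerMs = lingerMs;
    }

    // Returns the batch to send if adding this record triggers a flush
    // (batch full OR linger time elapsed), otherwise null.
    // 'now' is a caller-supplied clock in milliseconds.
    List<String> add(String record, long now) {
        if (buffer.isEmpty()) firstRecordAt = now;
        buffer.add(record);
        boolean full = buffer.size() >= batchSize;
        boolean lingered = now - firstRecordAt >= lingerMs;
        if (full || lingered) {
            List<String> batch = new ArrayList<>(buffer);
            buffer.clear();
            return batch;
        }
        return null;
    }
}
```

Under heavy load the `full` condition fires first; under light load the `lingered` condition does — the same "whichever comes first" rule the section describes.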
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/producer_internals_batches-1.png?w=1024" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p>Once the partition of the record is determined, the record is placed into the partition's buffer. Basically, the producer will try to batch records for efficiency, both in IO and in compression. These partition buffers also help with back-pressure, since the records will be sent as fast as the broker can keep up, which is configured by <code>max.in.flight.requests.per.connection</code>.</p><p>To increase the producer's throughput, it's a good practice to set <code>linger.ms</code> to a value greater than zero. This basically instructs the producer to wait up to this number of milliseconds before sending the contents of the partition buffer, or until the batch fills up (whichever of the two comes first). The batch size is configured by <code>batch.size</code>.</p><p>Under heavy load the batches fill up quickly, so the linger time is never reached; under lighter load, the producer uses this lingering time to increase IO throughput.</p><!--kg-card-begin: markdown--><h3 id="reliabilityandretries">Reliability and Retries</h3>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/producer_internals_acks-2.png?w=1024" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p>One important aspect that we need to consider when configuring a Kafka producer is its <strong>reliability</strong>. In other words, what degree of confidence we want to have so that when we publish a message to the cluster, the message is not lost.</p><p>The reliability of a producer is achieved via the acknowledgement (<strong>ACK</strong>) configuration <code>request.required.acks</code>.</p><p>Setting this value to <code>0</code>, i.e. <strong>ACK-0</strong>, is equivalent to fire-and-forget. Basically the producer will place the record in the partition buffer as we saw above and it will never check whether it was successfully persisted in the Kafka cluster.</p><p>You might ask, why would anyone choose this option?</p><p>And the answer is <strong>low-latency</strong>.<br>Since the producer doesn't wait for Kafka's acknowledgement, it's very fast.<br>This is particularly useful for applications where we have high volume and we don't really care whether some of the data is lost. Such examples are IoT applications where we don't really care if we lose readings from some of the sensors.</p><p>Another option is to set the required acknowledgements to <code>1</code> (<strong>ACK-1</strong>). This is also called <em>leader acknowledgement</em> and it means that the Kafka broker will acknowledge as soon as the partition leader writes the record in its local log but without confirming that the partition followers got the record. As you can imagine, if the partition leader fails immediately after sending back the ack but before replicating the record to its followers, we might have data loss. 
That's why we usually use this option when the occasional lost record is not significant, for example log aggregation, data for machine learning or dashboards, etc.</p><p>The final option is <code>all</code> (<strong>ACK-ALL</strong>). In this case, the leader gets write confirmations from all in-sync replicas (ISRs) before sending an acknowledgement back to the producer. This ensures that as long as there is one ISR alive, the data is not lost.</p><p>Finally, producers can be configured to <strong>retry</strong> sending failed messages by setting the configuration value of <code>retries</code> to the maximum number of retries for each failed message. By default, <code>retries</code> is set to <code>0</code>. One thing you should note though is that if you set the retries to a value greater than zero, then you should also set the <code>max.in.flight.requests.per.connection</code> to <code>1</code>, otherwise there is a chance that a retried message could be delivered out of order.</p><!--kg-card-begin: markdown--><h2 id="consumers">Consumers</h2>
<!--kg-card-end: markdown--><p>The last piece of the puzzle to complete our end-to-end flow is the consumers.</p><p>As one would expect, consumers read data from a Kafka topic. In order for a consumer to start reading from a topic, it needs the name of the topic and at least one of the cluster's brokers to connect to. As with producers, once it connects to one broker, it is connected to the whole cluster. Also, as with producers, it doesn't need to connect to the broker that holds the data, Kafka will take care of that.</p><p>As we mentioned many times already, a topic is just a logical grouping of partitions. So, consumers actually read data from partitions. But again, Kafka will take care of that, the developer doesn't need to worry about it.<br><br></p><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/consumers_read_order.png?w=678" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p>One thing to note though is that the consumer will read <strong>data from the same partition in the order they appear</strong> on the partition, but different partitions will be read in parallel. For example, in the picture above, consumers will <em>always</em> read <code>0A</code> before <code>0B</code> and the same for <code>1A</code> and <code>1B</code>. But the order in which it reads <code>0A</code> and <code>1A</code> might differ each time.</p><!--kg-card-begin: markdown--><h3 id="consumergroups">Consumer Groups</h3>
<!--kg-card-end: markdown--><p>Consumers are organised into consumer groups. This way we can parallelise the data consumption.</p><p>A very important thing to understand is that each partition of a topic is read by <strong>one and only one consumer in a consumer group</strong>! Let me repeat that...in a consumer group, a partition is read by a single consumer.<br></p><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/consumers_groups_all.png?w=631" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p>As we can see in the above example, each partition is read by one consumer in each consumer group, but each consumer can read multiple partitions.</p><p>The reason for having a single consumer responsible for a partition in each consumer group is that messages in a partition are read in order.</p><p>So what happens if we have more consumers than partitions in a consumer group?</p><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/consumers_extra.png?w=471" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p>The answer is that the extra consumer is not being used and is in standby mode, ready to be utilised if one of the other consumers fails.</p><p>This basically means that the <strong>level of parallelism</strong> and the scaling of Kafka is defined by the number of partitions we have. The more partitions we have in a topic, the more consumers we can use for that topic.</p><!--kg-card-begin: markdown--><h3 id="rebalancing">Rebalancing</h3>
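The rebalancing protocol described in this section ends with the group leader computing a partition assignment. A round-robin assignment, one of the strategies the Java client ships, can be sketched as follows (names are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AssignmentSketch {
    // Spreads partitions 0..numPartitions-1 over the consumers in a
    // round-robin fashion, as a group leader might during a rebalance.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);   // each partition gets exactly one owner
        }
        return assignment;
    }
}
```

Note how the sketch preserves the invariants from this section: every partition has exactly one owner within the group, and a consumer beyond the partition count is simply left with nothing to read.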
<!--kg-card-end: markdown--><p>But how do consumers enter and exit a consumer group? How do they know which partition to read when they join or how are partitions assigned to other consumers if a consumer exits the group?</p><p>When a consumer group is created, one of the brokers in the Kafka cluster is elected to be the <strong>group coordinator</strong>. In addition to that, the first consumer that joins the consumer group, becomes the <strong>group leader</strong>. Through the coordination of these two, it's possible to detect changes in the group and re-assign partitions to consumers.</p><p>This action of reassigning partitions to the consumers in a consumer group is called <strong>rebalancing</strong>. The rebalancing process can be described as follows:</p><ol><li>A new consumer joins the group and wants to read from the topic.</li><li>The group coordinator (elected Kafka broker) detects the new consumer that tries to read and sends a list of all the consumers trying to read, to the group leader (first consumer that joined the consumer group).</li><li>The group leader decides the partition assignments and sends them back to the group coordinator.</li><li>The group coordinator sends this information to all the consumers.</li></ol><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/consumers_rebalancing_1-3.png?w=462" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p><br></p><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/consumers_rebalancing_2-4.png?w=462" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p><br></p><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/consumers_rebalancing_3-2.png?w=462" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p><br></p><figure class="kg-card kg-image-card"><img 
src="https://allaboutcodinghome.files.wordpress.com/2020/03/consumers_rebalancing_4-2.png?w=462" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p>One thing to keep in mind is that while rebalancing is happening, all the consumers stop reading from their partitions. Frequent rebalancing is a common cause for performance degradation in Kafka consumer applications and it should be monitored and avoided if possible.</p><!--kg-card-begin: markdown--><h3 id="consumeroffsets">Consumer Offsets</h3>
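The commit-before vs commit-after trade-off discussed in this section is why consumers are usually written to process first, commit after, and make the processing idempotent. A toy consumer-side sketch, with a set of processed message IDs standing in for real deduplication logic:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OffsetCommitSketch {
    private final Set<String> processedIds = new HashSet<>(); // idempotency guard
    private final List<String> applied = new ArrayList<>();
    private long committedOffset = -1;

    // Process first, commit after: if the consumer crashes between the two
    // steps, the message is redelivered, and the ID set makes the
    // redelivery a no-op instead of a duplicate effect.
    void handle(long offset, String messageId, String payload) {
        if (!processedIds.contains(messageId)) {   // skip already-processed duplicates
            applied.add(payload);
            processedIds.add(messageId);
        }
        committedOffset = offset;                  // commit only after processing
    }

    int appliedCount() { return applied.size(); }
    long committedOffset() { return committedOffset; }
}
```

A redelivered message bumps the committed offset but leaves the applied state untouched — at-least-once delivery with effectively-once processing.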
<!--kg-card-end: markdown--><p>Now we know what happens when a new consumer joins a consumer group or when an existing one leaves. But wait, does this mean that we'll have a bunch of duplicate events because the new consumer will re-read the whole partition?</p><p>The answer of course is no. The consumer will start reading from the point of the partition where the previous consumer stopped. And this is achieved with <strong>consumer offsets</strong>.</p><p>Basically, consumers will commit the offsets they have read back to Kafka. Kafka keeps track of which offsets of a partition have been read by a particular consumer group. This information is kept in a separate topic called <code>__consumer_offsets</code>. So, once a consumer of a particular consumer group is assigned a partition, Kafka will check which was the latest committed offset of that partition for this consumer group, and will send the next offset.</p><p>The main question is, when should a consumer commit an offset?</p><p>It could commit it once it receives the message from Kafka and before it does the actual processing of the message. The problem is that in this case, if for some reason the consumer dies after it commits the offset and before the processing, then we lose the data. The next consumer will not see this message again.</p><p>Alright, then it could commit it after it processes the message. But what happens if the processing is complete (for example, the message is transformed and some local database is updated) but the consumer fails right before it commits the offset? In this case, the new consumer that will take over will re-receive the message and it will process it again. So, we're getting duplicate messages.</p><p>Which way to go depends on your application use-case, but in most cases the correct thing to do is the latter. 
You just need to make sure in your application logic that your message processing is idempotent.</p><!--kg-card-begin: markdown--><h2 id="deliveryguarantees">Delivery Guarantees</h2>
<!--kg-card-end: markdown--><p>Now that we know how messages are published and consumed from Kafka as well as how they are stored internally, it's time to talk about the delivery guarantees that Kafka provides. As we have seen already, Kafka is a highly configurable system. As such, you can configure it in a different way to achieve different delivery semantics depending on your use case.</p><!--kg-card-begin: markdown--><h3 id="atmostonce">At most once</h3>
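The at-most-once recipe walked through in this section boils down to a handful of settings. Collected as a sketch — note the post uses the older `request.required.acks` name, while the modern Java producer exposes it as `acks`, and the interval value below is purely illustrative:

```java
import java.util.Properties;

class AtMostOnceSketch {
    static Properties producerConfig() {
        Properties p = new Properties();
        // Fire-and-forget: don't wait for any acknowledgement, never retry
        p.setProperty("acks", "0");
        p.setProperty("retries", "0");
        return p;
    }

    static Properties consumerConfig() {
        Properties c = new Properties();
        // Commit offsets automatically and as often as possible,
        // so the commit happens before the message is processed
        c.setProperty("enable.auto.commit", "true");
        c.setProperty("auto.commit.interval.ms", "100"); // illustrative low value
        return c;
    }
}
```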
<!--kg-card-end: markdown--><p>This setup basically ensures that a message published by a producer into the Kafka cluster will arrive at the consumer zero or one times, but not more than that! This guarantees that there will not be any duplicate messages, but there might be some data loss.</p><p>It's quite useful for applications where we have a high volume of data and some data loss is acceptable. An example application space is IoT applications.</p><p>To achieve at-most-once deliveries, first we need to configure the producers so that they don't retry sending messages to the cluster. The best way to achieve at-most-once semantics in a producer is to set <code>request.required.acks</code> to <code>0</code>.</p><p>On the consumer side, we're in luck as at-most-once delivery is the default behaviour. In general, to achieve this, we just need to set the <code>enable.auto.commit</code> to <code>true</code>. This basically informs the cluster that a message has been read (i.e. commits the offset) as soon as the consumer receives the message and before processing it.</p><p>By default, to avoid performance impact, consumers will try to batch as many offset commits as they can and commit them all at once. The amount of time the consumer will wait before committing the offset is configurable by <code>auto.commit.interval.ms</code>. So, in the case of at-most-once, we need to set this to the lowest timeframe possible, to make sure that the offset is committed before the processing of the message.</p><!--kg-card-begin: markdown--><h3 id="atleastonce">At least once</h3>
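The at-least-once settings discussed in this section can be gathered in one place. This is a sketch, using the modern `acks` name for what the post calls `request.required.acks`; the retry count is illustrative:

```java
import java.util.Properties;

class AtLeastOnceSketch {
    static Properties producerConfig() {
        Properties p = new Properties();
        // Wait until all in-sync replicas have the record...
        p.setProperty("acks", "all");
        // ...and retry on failure; one in-flight request preserves ordering
        p.setProperty("retries", "3");                               // illustrative value
        p.setProperty("max.in.flight.requests.per.connection", "1");
        return p;
    }

    static Properties consumerConfig() {
        Properties c = new Properties();
        // Commit manually (consumer.commitSync()) only after processing
        c.setProperty("enable.auto.commit", "false");
        return c;
    }
}
```

The consumer's processing logic then has to tolerate the duplicates this configuration allows, typically by being idempotent.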
<!--kg-card-end: markdown--><p>This ensures that a message published by a producer will definitely arrive at the consumer, but it might arrive multiple times. It is probably the most common setup, as it guarantees no data loss and it's not as hard to achieve as the exactly-once case that we'll see next.</p><p>First, the producer needs to make sure that a message is safely persisted in the cluster. This is achieved by setting the <code>request.required.acks</code> to <code>all</code>, telling the leader partition to only notify us for success if the message is stored in the leader's and the followers' logs.</p><p>In addition, the producer needs to retry sending a message if it doesn't hear back from the Kafka cluster, or if the cluster replies with an error. So, we need to set the <code>retries</code> to a number greater than zero.</p><p>Finally, on the consumer side, we need to disable the auto-commit by setting the <code>enable.auto.commit</code> to <code>false</code>, as this could lead to data loss in the case where a consumer commits its offset and then fails before processing the message. Since auto-commit is disabled, the consumer needs to do the offset management by explicitly calling the <code>consumer.commitSync()</code> after the processing of the message is complete.</p><p>Consumers with at-least-once semantics need to handle duplicate messages that may arrive. This can be done by implementing the consumer to have idempotent behaviour.</p><!--kg-card-begin: markdown--><h3 id="exactlyonce">Exactly once</h3>
<!--kg-card-end: markdown--><p>At this point, those of you who have worked with Kafka or have experience with distributed systems might be thinking that exactly-once delivery in such an environment is impossible, or comes at such a high price that it's not practical.</p><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2020/03/exactly_once.png?w=800" class="kg-image" alt="Deep dive into Kafka (pt 2)"></figure><p>In all the years I've been working with Kafka, I've never seen any real-life producer-consumer systems that use exactly-once delivery. Most projects tend to stay with at-least-once semantics and implement an idempotent behaviour in the application.</p><p>Having all that in mind, let's see how we could achieve exactly-once delivery.</p><p>On the producer side, we need to ensure that the messages we send to the cluster are persisted. As we saw in the at-least-once case, this can be achieved by setting the <code>request.required.acks</code> to <code>all</code> and the <code>retries</code> to a value greater than <code>0</code>.</p><p>Of course, this might lead to duplicate messages when a message is successfully persisted in the cluster but the broker dies right before it sends back the acknowledgement, or there is some network failure and the acknowledgement is lost. In that case, the producer won't know that the message was persisted successfully, so it will re-send it, creating duplicate messages this way. To avoid that, we need to set <code>enable.idempotence</code> to <code>true</code>.</p><p>On the consumer side, things are a bit different depending on whether you're using the consumer API that we saw in this post, or you're using Kafka stream processors (which we'll look at in a later post).</p><p>Good news first....<strong>if you're using stream processors</strong> to consume and process your messages, then exactly-once is a simple configuration change. 
More specifically, we just need to set <code>processing.guarantee</code> to <code>exactly_once</code> in our stream configuration.</p><p>If, on the other hand, you're using the <strong>Consumer API</strong>, things are a bit trickier. You will need to use the transactions API. First, on the producer side, you need to set a unique <code>transactional.id</code> and publish messages as follows:</p><pre><code class="language-java">producer.initTransactions();
try {
    producer.send(record);
    producer.commitTransaction();
} catch (KafkaException e) {
    producer.abortTransaction();
}
</code></pre><p>On the consumer side, you will need to set the <code>isolation.level</code> to <code>read_committed</code>.</p><p>In general, Kafka allows exactly-once delivery but it comes at a cost in throughput, latency and also complexity. So my personal advice would be: go with at-least-once semantics and handle duplicate messages on the application level.</p><!--kg-card-begin: markdown--><h2 id="conclusion">Conclusion</h2>
<!--kg-card-end: markdown--><p>Phew, that was long! Kudos to all of you who stayed until the end.</p><p>This concludes our deep dive into Kafka. I hope you learned something from all of this.</p><p>I hope your takeaway is that although Kafka is a very complicated system and there are a lot of corner cases you need to think about, its abstractions are clear and quite easy to use.</p>]]></content:encoded></item><item><title><![CDATA[Deep dive into Kafka (pt 1)]]></title><description><![CDATA[<p>The last couple of years there is a shift towards event-driven microservice architectures. More and more companies have already switched or are in the process of switching to such an architecture with Kafka being the central piece that enables communication between all the different services.</p><p>So, apparently Kafka is a</p>]]></description><link>https://antousias.com/deep-dive-into-kafka-part-1/</link><guid isPermaLink="false">5e8c6292f8ca150df0f97001</guid><category><![CDATA[Software Architecture]]></category><category><![CDATA[Distributed Systems]]></category><category><![CDATA[Deep Dive]]></category><dc:creator><![CDATA[Alexandros Ntousias]]></dc:creator><pubDate>Sun, 09 Feb 2020 19:25:00 GMT</pubDate><media:content url="https://antousias.com/content/images/2020/04/kafka_bg_3-2.png" medium="image"/><content:encoded><![CDATA[<img src="https://antousias.com/content/images/2020/04/kafka_bg_3-2.png" alt="Deep dive into Kafka (pt 1)"><p>Over the last couple of years there has been a shift towards event-driven microservice architectures. 
More and more companies have already switched or are in the process of switching to such an architecture with Kafka being the central piece that enables communication between all the different services.</p><p>So, apparently Kafka is a thing now!</p><figure class="kg-card kg-image-card"><img src="http://memes.ucoz.com/_nw/38/98024176.png" class="kg-image" alt="Deep dive into Kafka (pt 1)"></figure><p>Given that I have been using Kafka extensively for the past few years, I thought it would be a good idea to provide a deeper look at what Kafka is and how it works internally.</p><!--kg-card-begin: markdown--><h2 id="sowhatiskafka">So...what is Kafka?</h2>
<!--kg-card-end: markdown--><p>Oh, if I had a dime for every different definition I've heard over the years...! <em>Messaging system</em>, <em>Distributed replicated log</em>, <em>Streaming platform</em> and many many more.</p><p>Well, all of them are correct, but personally I don't think they give an exact definition of what Kafka actually provides. When I'm asked this question, I find it easier to answer as:</p><blockquote>Kafka is a distributed, persistent, highly-available event streaming platform with publish-subscribe messaging semantics.</blockquote><p>Long one, isn't it? Well, Kafka is complicated and I feel this definition does it justice. Let's try to break it apart, shall we?</p><!--kg-card-begin: markdown--><h3 id="eventstreamingplatform">Event Streaming Platform</h3>
<!--kg-card-end: markdown--><p>Kafka is a platform that provides a never-ending series of events. Usually you will see it in a system architecture as the central piece where all the different events that occur in the system are published. These events could really be anything, from user clicks, sensor readings, payment reports, you name it.</p><!--kg-card-begin: markdown--><h3 id="distributed">Distributed</h3>
<!--kg-card-end: markdown--><p>Kafka is a distributed system, it doesn't run on one machine (although it can, but why would you do that to yourself?). In fact, Kafka is a cluster of machines. Each machine in a Kafka cluster is called a <strong>broker</strong>.</p><figure class="kg-card kg-image-card"><img src="https://i.kym-cdn.com/photos/images/original/000/604/701/e74.png" class="kg-image" alt="Deep dive into Kafka (pt 1)"></figure><p>Brokers are basically responsible for handling requests of new events, storing events and serving them to services that request them. They also have additional functionalities, e.g. leader partition election, but more on this later.</p><p>So, if Kafka is a cluster of machines, how does each broker know which other brokers are in the cluster? The answer is <strong><a href="https://zookeeper.apache.org/">Zookeeper</a></strong>. Zookeeper acts as a centralised service that is used to maintain naming and configuration data. So, at this point, Kafka clusters rely on Zookeeper for member discovery.</p><!--kg-card-begin: markdown--><h3 id="persistent">Persistent</h3>
<!--kg-card-end: markdown--><p>Events that are published to Kafka are persisted to disk. This means that even if a broker goes down for any reason, when it starts up again, it will still have the events that it received before shutting down. More details on how brokers store events on disk will follow in a subsequent post.</p><!--kg-card-begin: markdown--><h3 id="highlyavailable">Highly Available</h3>
<!--kg-card-end: markdown--><p>Data in Kafka is replicated across multiple brokers. This basically means that even if a broker that holds some data goes down, the data is still available to be consumed from its replicas on the other brokers.</p><!--kg-card-begin: markdown--><h3 id="publishsubscribemessagingsemantics">Publish-Subscribe Messaging Semantics</h3>
<!--kg-card-end: markdown--><p>One very smart thing that the architects of Kafka did, was to design it with publish-subscribe messaging semantics. On one side we have producers publishing messages to Kafka and on the other side consumers consuming these messages.</p><p>This was one of the reasons that really helped the adoption of Kafka, because despite its internal complexity, it provides a well-understood and widely-adopted pattern that almost all developers are familiar with.</p><!--kg-card-begin: markdown--><h2 id="topicsandpartitions">Topics and partitions</h2>
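As a mental model for the description that follows — a partition as an ordered, immutable, continually-appended sequence whose positions are the offsets — here is a toy version in Java. It is purely for intuition and bears no resemblance to the real on-disk log:

```java
import java.util.ArrayList;
import java.util.List;

class PartitionSketch {
    // An ordered, append-only sequence of messages; the position of a
    // message in the list is its offset.
    private final List<String> log = new ArrayList<>();

    long append(String message) {
        log.add(message);
        return log.size() - 1;   // offset assigned to the newly appended message
    }

    String read(long offset) {
        return log.get((int) offset);
    }

    long endOffset() {
        return log.size();       // where the next message would land
    }
}
```

Everything else in this section — topics, the partitioner, replication — is machinery built around many such sequences.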
<!--kg-card-end: markdown--><p><em>Disclaimer: From now on, I will be using the terms messages and events interchangeably to describe data that are flowing through Kafka.</em></p><p>So, how are events stored in Kafka?</p><p>Events are stored in what we call <strong>partitions</strong>. You could think of partitions like log files (actually that's exactly how the data is stored as we will see later). Each partition is an ordered, immutable sequence of messages that is continually appended to. The messages on each partition are assigned an <strong>offset</strong>, a sequential id that uniquely identifies the message within a partition. This offset is used by the consumers, but more on that in a later post when we look at producers and consumers in more detail.</p><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/screenshot-2019-12-05-at-10.04.30.png?w=884" class="kg-image" alt="Deep dive into Kafka (pt 1)"></figure><p>The partitions are further organised into logical groups that are called <strong>topics</strong>.</p><p>Basically, when a producer publishes a message to Kafka, it publishes it to a certain topic. Kafka will decide on which partition of that topic this message will be stored.</p><p>But wait, how does it decide which partition to use?</p><p>This is the role of the <strong>partitioner</strong>. The partitioner will receive a message and it will decide on which partition of a topic it should be sent to. The published message is actually a key-value pair, where the value is the actual data of the message and the key can be anything that makes sense for this event. So, the partitioner will use the key part of the message to make the partition decision. Kafka producers provide default partitioners, but as with everything in Kafka, this is configurable.</p><p>The default partitioner guarantees that all messages with the same key published to the same topic, will end up in the same partition. 
There is, however, an exception to this rule, and it has to do with messages with a <em>null</em> key. If you don't supply a key, the default partitioner will use a round-robin algorithm and send each message to a different partition.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/deep-dive-in-kafka-11.png?w=598" class="kg-image" alt="Deep dive into Kafka (pt 1)"><figcaption>Partitioner decides on which partition a message will be sent to</figcaption></figure><p>As we mentioned above, messages in Kafka are replicated to provide high availability. In reality, it is the partitions that are replicated across different brokers. So, if for example you have 1 topic with 3 partitions and a replication factor of 3 (i.e. 3 replicas for each partition), you end up with 9 partition replicas scattered across the different brokers in the Kafka cluster. As you would expect, replicas of the same partition cannot be stored on the same broker; in other words, the replication factor can only be as high as the number of brokers you have in the cluster.</p><p>Let's take an example and suppose we have the following setup:</p><ul><li>4 brokers</li><li>two topics, A and B</li><li>topic A has 2 partitions (partitions 0 and 1) with replication factor 3</li><li>topic B has 1 partition (partition 0) with replication factor 4</li></ul><p>This could look like the following:</p><figure class="kg-card kg-image-card"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/untitled-diagram.png?w=672" class="kg-image" alt="Deep dive into Kafka (pt 1)"></figure><p>At this point you might be wondering: what happens if we have two producers that are trying to publish to the same partition but on different brokers? 
In the scenario below, for example, how would Kafka resolve the conflict and keep the replicas in sync?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/untitled-diagram-2.png?w=502" class="kg-image" alt="Deep dive into Kafka (pt 1)"><figcaption>How would Kafka reconcile the data in the replicas of Topic A-Partition 0?</figcaption></figure><p>The answer is the <strong>leader replica</strong>. You see, for every partition there is one replica that acts as the leader, and the rest of the replicas of that partition are followers. When a producer wants to publish a message to a partition, it needs to send it to the leader replica of that partition. If it sends it to a follower, it will get an error back. Once the leader replica receives a new message, it forwards it to the followers.</p><p>So, the scenario above is not valid. What happens in reality is something like the following.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/untitled-diagram-3.png?w=502" class="kg-image" alt="Deep dive into Kafka (pt 1)"><figcaption>Propagate messages from leader to followers</figcaption></figure><p>Kafka's scalability comes from the fact that it tries to assign the leaders of the different partitions as evenly as possible across the available brokers. This way, producers that are writing to different partitions will send their requests to different brokers.</p><p>But what happens if the broker that hosts a leader replica dies? In that case, a new leader will be elected from the available replicas. One of the brokers in the cluster has the additional role of <strong>Controller</strong>. 
The controller constantly monitors the rest of the brokers and, if it spots a dead one, it will promote one of the available follower replicas to leader for every partition whose leader was on the dead broker.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/deep-dive-in-kafka-6.png?w=502" class="kg-image" alt="Deep dive into Kafka (pt 1)"><figcaption>Initial setup</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/deep-dive-in-kafka-8.png?w=502" class="kg-image" alt="Deep dive into Kafka (pt 1)"><figcaption>Broker 2 dies</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/deep-dive-in-kafka-9.png?w=332" class="kg-image" alt="Deep dive into Kafka (pt 1)"><figcaption>Controller re-assigns leaders</figcaption></figure><p>The next obvious question, then, is what happens if the broker that died is the controller itself? In that case, there is an election among the remaining brokers to decide which one will be the next controller. This election is done via Zookeeper. More specifically, the first of the available brokers that manages to create the <code>/controller</code> node in Zookeeper becomes the next controller. This approach provides a weak guarantee that the next controller will be the broker with the least load, since it was the one that managed to contact Zookeeper first, while the others might have been busy serving requests.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/deep-dive-in-kafka-2.png?w=509" class="kg-image" alt="Deep dive into Kafka (pt 1)"><figcaption>1. 
Broker 1 dies</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/deep-dive-in-kafka-3.png?w=323" class="kg-image" alt="Deep dive into Kafka (pt 1)"><figcaption>2. Brokers 2 and 3 try to become controllers</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/deep-dive-in-kafka-4.png?w=323" class="kg-image" alt="Deep dive into Kafka (pt 1)"><figcaption>3. Broker 2 is faster and becomes the new controller</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://allaboutcodinghome.files.wordpress.com/2019/12/deep-dive-in-kafka-5.png?w=323" class="kg-image" alt="Deep dive into Kafka (pt 1)"><figcaption>4. New controller re-assigns missing leaders</figcaption></figure><p>With Kafka being a distributed platform, you can expect lots of corner cases. One corner case worth mentioning here is what happens if the followers are not in sync with the leader when the leader dies. There are cases where a broker might be too busy or too slow and thus cannot keep up with the rate at which the leader is accepting messages. In that case, we say that the replica is out-of-sync.</p><p>In fact, Kafka maintains an <strong>in-sync replica set</strong> (ISR) for every partition. If a replica gets out-of-sync with the leader of the partition, it is removed from the ISR of that partition.</p><p>When a leader dies, the controller will try to choose one of the replicas in the ISR of that partition to be the new leader. If the only replicas that are available are not part of the partition's ISR, then we call it an <strong>unclean leader election</strong>. In that case, there is a chance of data loss. Of course, Kafka will try its best to reconcile the data once the broker with the additional messages comes back online, but this doesn't always work. 
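</p><p>To illustrate the controller's decision, here is a simplified sketch in Scala (the names and the shape of the logic are made up for illustration; the real implementation is far more involved):</p>

```scala
// Simplified sketch: when a partition leader dies, prefer a replica from
// the ISR as the new leader; electing an out-of-sync replica is an
// "unclean" election and may lose data.
def electLeader(
    replicas: List[Int],   // broker ids hosting replicas of the partition
    isr: Set[Int],         // current in-sync replica set
    deadBroker: Int,
    uncleanAllowed: Boolean
): Option[Int] = {
  val alive = replicas.filterNot(_ == deadBroker)
  alive.find(isr.contains).orElse(
    if (uncleanAllowed) alive.headOption else None
  )
}

// Broker 1 (the leader) dies and broker 2 is in the ISR: broker 2 takes over.
val clean = electLeader(List(1, 2, 3), Set(2), 1, uncleanAllowed = false)
// Empty ISR with unclean election disabled: no leader can be elected.
val none  = electLeader(List(1, 2, 3), Set.empty, 1, uncleanAllowed = false)
```

<p>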
As we will see when we talk about configuration, you can decide to disable unclean leader election; in that case, if the leader dies and the only available replicas are not in the partition's ISR, Kafka will reject any new messages going to that partition.</p><!--kg-card-begin: markdown--><h2 id="tryitout">Try it out</h2>
<!--kg-card-end: markdown--><p>Now that you have a better understanding of what Kafka is and how messages are stored, it would be a good idea to try it out.</p><p>First, you will need to <a href="https://kafka.apache.org/downloads">download the latest version of Kafka</a>. At the time of this post, the latest version of Kafka is 2.3.1.</p><p>Next, unzip the tar file and move into the newly created directory.</p><pre><code>&gt; tar -xzf kafka_2.12-2.3.1.tgz
&gt; cd kafka_2.12-2.3.1</code></pre><p>As we saw earlier, Kafka uses Zookeeper for cluster membership. So the next step is to start Zookeeper. For now, we'll use the default properties that come with the Kafka distribution you downloaded.</p><pre><code>&gt; bin/zookeeper-server-start.sh config/zookeeper.properties</code></pre><p>Then, we can start our Kafka broker. As with Zookeeper, we will be using the default properties for the broker. You could start multiple of these if you want to try out a cluster with multiple brokers; just make sure to give each one a different broker id, a different port and a different path for its data.</p><pre><code>&gt; bin/kafka-server-start.sh config/server.properties --override broker.id=1 --override listeners=PLAINTEXT://:9092 --override log.dirs=/tmp/kafka-logs-1</code></pre><p>Now that we have our Kafka cluster up and running, we can create a topic named <em>all_about_coding</em> that has 2 partitions with replication factor 1. Remember, the replication factor can only be as high as the number of brokers you have in the cluster.</p><pre><code>&gt; bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 2 --topic all_about_coding
&gt; bin/kafka-topics.sh --list --bootstrap-server localhost:9092
&gt;&gt; all_about_coding</code></pre><p>The Kafka distribution comes with a sample producer and consumer to play around with. 
The producer is a command line client that reads input from standard input and sends it to Kafka. So let's start up the producer and send two messages.</p><pre><code>&gt; bin/kafka-console-producer.sh --broker-list localhost:9092 --topic all_about_coding
&gt;&gt; Deep dive into Kafka (part 1)
&gt;&gt; Deep dive into Kafka (part 2)</code></pre><p>Finally, we can start the consumer and consume the messages that already exist in the <em>all_about_coding</em> topic. The consumer that comes with the Kafka distribution simply reads all messages from a given topic and dumps them on the console.</p><pre><code>&gt; bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic all_about_coding --from-beginning
&gt;&gt; Deep dive into Kafka (part 1)
&gt;&gt; Deep dive into Kafka (part 2)</code></pre><!--kg-card-begin: markdown--><h2 id="conclusion">Conclusion</h2>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>I think this is a good point to stop, as the post has grown quite a bit longer than I anticipated.</p>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>I hope you now have a clearer picture of what Kafka provides and how topics and partitions work internally.</p>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>In the next post we'll take a closer look at the producers and consumers, as well as the delivery guarantees that Kafka provides.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Monad - Breaking down this dreadful word...]]></title><description><![CDATA[<p>If you were ever exposed to functional programming or went through a programming language course, you surely must have heard this dreadful word: <strong>Monad</strong>. It's usually followed by other words like complicated, confusing, difficult to understand, lots of theory, etc.</p><p>I was more than once in that position, trying to</p>]]></description><link>https://antousias.com/monad-breaking-down-this-dreadful-word/</link><guid isPermaLink="false">5e8c5d61f8ca150df0f96f81</guid><category><![CDATA[Functional Programming]]></category><category><![CDATA[Scala]]></category><dc:creator><![CDATA[Alexandros Ntousias]]></dc:creator><pubDate>Sat, 07 Dec 2019 19:00:00 GMT</pubDate><media:content url="https://antousias.com/content/images/2020/04/monad.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://antousias.com/content/images/2020/04/monad.jpg" alt="Monad - Breaking down this dreadful word..."><p>If you were ever exposed to functional programming or went through a programming language course, you surely must have heard this dreadful word: <strong>Monad</strong>. 
It's usually followed by other words like complicated, confusing, difficult to understand, lots of theory, etc.</p><p>I was more than once in that position, trying to understand what a monad is by reading lengthy articles only to find myself more confused than before.</p><p>But in all fairness, <em>monad</em> is a pretty easy concept to grasp, at least if you are looking for a practical explanation and don't care about the theory behind it so much.</p><p>After all...</p><figure class="kg-card kg-image-card"><img src="https://antousias.com/content/images/2020/11/44b0bd758f8ee5c81362923f0d5c8e017c9ddf623925e60c29a4c015b89fbb45.jpg" class="kg-image" alt="Monad - Breaking down this dreadful word..."></figure><!--kg-card-begin: markdown--><h2 id="sowhatisamonad">So, what is a Monad?</h2>
<!--kg-card-end: markdown--><p>Well, a monad is a concept that comes from <em>category theory</em>. It does have a lot of theory behind it, which, I have to admit, can be quite overwhelming and complicated. If you are interested in that, there are multiple resources online to learn more about it (a good one is <a href="https://www.youtube.com/watch?v=I8LbkfSSR58&amp;list=PLbgaMIhjbmEnaH_LTkxLI7FMa2HsnawM_" rel="noopener noreferrer">this series of lectures on category theory</a>).</p><p>But in practice, a <em>monad</em> is basically a <strong>type wrapper</strong> which represents <strong>a specific form of computation</strong>.<br> In other words, you use it to <em>wrap other types</em> in order to give them some <em>additional context</em>.</p><p>Let's write some code and try to expand on it as we go along. <br>Let's try to create a new <em>monad</em> called <code>Container</code> which can represent the result of an operation that can either return a value (in this case it will be <code>NonEmpty</code>) or return nothing (<code>Empty</code>).</p><pre><code class="language-scala">sealed trait Container[+A] {
  def isEmpty: Boolean
}

case class NonEmpty[+A](value: A) extends Container[A] {
  override val isEmpty: Boolean = false
}

case object Empty extends Container[Nothing] {
  override val isEmpty: Boolean = true
}</code></pre><p>Is that all? Wow, that was simple.</p><p>Err, not exactly…in order for a type wrapper to be considered a <em>monad</em>, it needs to provide two operations.</p><!--kg-card-begin: markdown--><h3 id="pure">Pure</h3>
<!--kg-card-end: markdown--><p><em>Monads</em> need to provide a way to wrap a pure value of a type into the monad itself (in other words, to yield a <strong>monadic value</strong>).</p><p>This function can be found under different names, depending on the language/library you might be using, but the most common ones are: <strong>identity</strong>, <strong>pure</strong> or <strong>unit</strong> (in Scala) and <strong>return</strong> (in Haskell).</p><p>So, we can extend our example above to include this method.</p><pre><code class="language-scala">sealed trait Container[+A] {
  def isEmpty: Boolean
}

object Container {
  def pure[A](value: A): Container[A] =
    if (value == null) Empty
    else NonEmpty(value)
}

case class NonEmpty[+A](value: A) extends Container[A] {
  override val isEmpty: Boolean = false
}</code></pre><p>It is sort of a <em>monad constructor</em> if you will. This is the reason why in the above example I defined it in the companion object instead of the trait itself.</p><!--kg-card-begin: markdown--><h3 id="flatmap">FlatMap</h3>
<!--kg-card-end: markdown--><p>The second thing that a <em>monad</em> needs to provide is a way to compose functions that output monadic values (called <strong>monadic functions</strong>).</p><figure class="kg-card kg-image-card"><img src="https://i.imgflip.com/13iqrw.jpg" class="kg-image" alt="Monad - Breaking down this dreadful word..."></figure><p>Well, this is what I'm talking about:</p><pre><code class="language-scala">def flatMap[B](f: A =&gt; M[B]): M[B]</code></pre><p>Basically, it needs to provide a function that takes another function <code>f</code> as a parameter, which is applied to the wrapped value of type <code>A</code> and returns a monad of another type <code>B</code>.</p><p>This function is commonly named <strong>flatMap</strong>, but again, you will find it under different names, depending on the language and/or library you're using, e.g. <strong>bind</strong> or <strong>&gt;&gt;=</strong> (in Haskell).</p><pre><code class="language-scala">sealed trait Container[+A] {
  def isEmpty: Boolean
  def flatMap[B](f: A =&gt; Container[B]): Container[B] = this match {
    case NonEmpty(value) =&gt; f(value)
    case Empty           =&gt; Empty
  }
}
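
// For example, flatMap lets us compose monadic functions
// (using the definitions in this snippet):
//   NonEmpty(21).flatMap(x => NonEmpty(x * 2))            // NonEmpty(42)
//   (Empty: Container[Int]).flatMap(x => NonEmpty(x * 2)) // Empty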

object Container {
  def pure[A](value: A): Container[A] =
    if (value == null) Empty
    else NonEmpty(value)
}</code></pre><p>Hmmm, so as long as I have a type wrapper that provides two functions with the above signatures, I have a <em>monad</em>?</p><p>Not exactly. We talked about naming and signatures…but we never talked about the laws these functions should obey.</p><!--kg-card-begin: markdown--><h2 id="monadlaws">Monad Laws</h2>
<!--kg-card-end: markdown--><p>In order for a type wrapper to be considered a proper <em>monad</em>, in addition to providing the <strong>pure</strong> and <strong>flatMap</strong> functions, it needs to obey certain laws as well, called <strong>monad laws</strong>.</p><p>In particular, there are 3 laws:</p><ul><li><strong>Left Identity</strong>: If we create a <em>monad</em> out of a value and then <em>flatMap</em> the monad using a function <code>f</code>, it should give us the same result as applying the function <code>f</code> to the initial value.</li></ul><pre><code class="language-scala">def onlyPositives(value: Int): Container[Int] = 
  if (value &gt;= 0) NonEmpty(value) else Empty

val value: Int = 1337

Container.pure(value).flatMap(onlyPositives) == onlyPositives(value)</code></pre><ul><li><strong>Right Identity</strong>: If we have a <em>monadic value</em> and we <em>flatMap</em> using the <em>pure</em> operation, we should get back the initial <em>monadic value</em>.</li></ul><pre><code class="language-scala">val monadicValue: Container[Int] = Container.pure(42)

monadicValue.flatMap(Container.pure) == monadicValue</code></pre><ul><li><strong>Associativity</strong>: When we have a chain of <strong>monadic function</strong> applications, it shouldn't matter how they are nested.</li></ul><pre><code class="language-scala">val monadicValue: Container[String] = NonEmpty("monads rule")
def size(value: String): Container[Int] = Container.pure(value.length)
def isEven(value: Int): Container[Boolean] = NonEmpty(value % 2 == 0)

monadicValue.flatMap(size).flatMap(isEven) == monadicValue.flatMap(str =&gt; size(str).flatMap(isEven))</code></pre><!--kg-card-begin: markdown--><h2 id="conclusion">Conclusion</h2>
<!--kg-card-end: markdown--><p>All of the above can be summarised as follows:</p><ul><li>A <em>monad</em> is a type wrapper that provides some computation context to the wrapped values</li><li>It needs to provide a way of wrapping values of any basic type within the <em>monad</em> (i.e. create <strong>monadic values</strong>)</li><li>It needs to provide a way to compose functions that output monadic values (i.e. <strong>monadic functions</strong>)</li><li>It needs to obey the three monad laws: <strong>left identity</strong>, <strong>right identity</strong> and <strong>associativity</strong></li></ul><p>Of course, in reality, monads will most likely have many more methods than the two I described above. But all these methods can be composed, in one way or another, from the two listed here.</p><p>I hope that, at this point, monad is a less scary concept for you.</p><p>But why do we even care about monads? Why all these laws and rules? And what are these <em>free monads</em>, <em>comonads</em> and <em>additive monads</em> you've been hearing about?</p><p>Well, let's talk about this and also give some concrete examples of monads that you can find in most functional programming languages in another post.</p>]]></content:encoded></item><item><title><![CDATA[A better way to constrain case class construction in Scala]]></title><description><![CDATA[<p>A lot of times in Scala, we want to constrain the creation of certain case classes. 
Unfortunately, as I found out recently, most of us (myself included) used to do it the wrong way.</p><p>One of the most <em>common</em> ways of doing this is by declaring the <code>case</code> class</p>]]></description><link>https://antousias.com/a-better-way-to-constraint-case-class-construction-in-scala/</link><guid isPermaLink="false">5e8c5a99f8ca150df0f96f56</guid><category><![CDATA[Functional Programming]]></category><category><![CDATA[Best Practices]]></category><category><![CDATA[Scala]]></category><dc:creator><![CDATA[Alexandros Ntousias]]></dc:creator><pubDate>Sun, 10 Nov 2019 18:00:00 GMT</pubDate><media:content url="https://antousias.com/content/images/2020/04/scala-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://antousias.com/content/images/2020/04/scala-1.jpg" alt="A better way to constrain case class construction in Scala"><p>A lot of times in Scala, we want to constrain the creation of certain case classes. Unfortunately, as I found out recently, most of us (myself included) used to do it the wrong way.</p><p>One of the most <em>common</em> ways of doing this is by declaring the <code>case</code> class as <code>private</code> and providing a factory method in the companion object to instantiate it.</p><pre><code class="language-scala">private case class Positive(value: Int)

object Positive {
  def fromInt(num: Int): Option[Positive] =
    if (num &gt;= 0) Some(Positive(num))
    else None
}</code></pre><p><br>Although you might find this pattern in a lot of codebases, it has two problems: the <strong>apply</strong> method in the companion object and the <strong>copy</strong> method on the class are still generated. So, the following are still allowed:</p><pre><code class="language-scala">// apply from the companion object
val wrong1 = Positive(-3)

// copy on an instance obtained via the factory
val wrong2 = Positive.fromInt(2).get.copy(-3)</code></pre><p><br>A trick that I found out recently is to use a <code>sealed abstract case class</code> to <em>properly</em> constrain the construction of my case classes.</p><p>So, the above code would look something like the following:</p><pre><code class="language-scala">sealed abstract case class Positive(value: Int)

object Positive {
  def fromInt(num: Int): Option[Positive] =
    if (num &gt;= 0) Some(new Positive(num) {})
    else None
}</code></pre><p><br>The above syntax gives us the following benefits:</p><ul><li>The default <code>apply</code> method in the companion object and the <code>copy</code> method on the class are not generated (because of <strong>abstract</strong>)</li><li>We cannot instantiate our case class using <code>new</code> outside the file (because of <strong>sealed</strong>)</li><li>We can still use pattern matching, as with normal <code>case</code> classes</li></ul><p>Maybe there are other, better or more interesting, ways of achieving the same results. If you do know of any, please let me know in the comments below.</p>]]></content:encoded></item></channel></rss>