Lessons I Learned After Building Pub/Sub Subscriber With Reactive Programming

These are what I learned after 10 months of building my first application with reactive programming

image source

Intro

But apparently, there are some differences if we build a reactive application that is not an HTTP web server API. For example, a subscriber app for Pub/Sub. I personally had some difficulties trying to find references or another people’s experiences when building a reactive Pub/Sub subscriber app. And that’s what makes me write this.

To give you some context…

My subscriber implements reactive programming based on Java Spring Boot’s WebFlux. It communicates with GCP Pub/Sub and a MongoDB cluster to store some data.

Architecture Overview

The lessons

1. Concurrency is still my enemy

meme source

Yea, what’s cool about reactive is we can still have a consistent similar execution time even though our app is flooded with requests. It can handle like a hundred of concurrent requests without blocking one another.

But, as a wise man said, with great power comes great responsibility. Without reactive, we scared our application might be flooded with requests because it could slow down the application. But in reactive, I realized that I must have been scared if my app is so fast, that it could flood another service connected to it, which was in my case, a MongoDB database. I was just realized it when my app received like 40s messages per second. My app was going so fast and started to process all of them in a glimpse. After about 5 seconds, my database driver was going mad because it queued more than 500 queries to the database.

Yes I can configure how much my subscriber app will pull concurrent messages. But it is not enough. Because reactive is non blocking, the thread that listens to Pub/Sub will just publish the task to another worker thread. And that listener thread will assume our message is already processed, so it will immidiately acknowledge it and pull the next message.

What I learned:

  • Add configuration for maximum concurrent message pulling. See Spring Cloud GCP docs and the configuration class docs.
  • Make the listener method blocking. Yes it would violate the reactive manifesto, but without this, the addition of maximum concurrent message configuration is useless. In spring webflux, use .block() method instead of .subscribe() in the message listener method (only in that method) to invoke the message execution.
//example MessageListener.javapublic class MessageListener {

private final Transformer transformer;
private final MongoRepository mongoRepository;

public MessageListener(Transformer transformer, MongoRepository mongoRepository) {
this.transformer = transformer;
this.mongoRepository = mongoRepository;
}

public void onMessageReceive(PubSubMessage pubSubMessage) {
transformer.transformMessage(pubSubMessage)
.flatMap(mongoRepository::save)
.doOnSuccess(data -> log.info("Message execution completed"))
.block(); //use block here
}

}

2. Backpressure is the main feature of reactive

another meme source

I was barely going into reactive programming world. And when I read about backpressure, I did not really understand how the real implementation of backpressure is. But then I realized that reactive is about controlling the pressure of my application have. And it is so important.

Sometimes it’s not a good choice to process each one of our data simultaneously, though it was not related to one another. Our resource is limited and we should use it wisely. Also consider the same problem as my no.1 problem mentioned before: another service we connected with, probably would be a bottleneck and maybe threw an error we did not handle yet and screw our application.

What I learned:

  • When working with Flux, always consider using maxConcurrency param if necessary.

3. Making it work is one thing, but making it reliable is another thing

yet another meme source

I think this is one thing that distinguishes real world programming from college or personal side projects. In college or personal side project, we could make a working web application in just some days or even hours. Because we only need to make it work. But to make sure it is reliable and scalable, are also the main objective if we work in real world software industry. It might work now and we should make sure if it will be still work as our number of users grows.

What I learned:

  • Do some stress tests. In every single topic that your application subscribed on
  • Try to make the stress test environment as close as production environment

4. Plan the database structure 100% before writing any codes

I believe we all agreed about this. But not everyone sees this as a priority. And that’s one thing that happened to me. In my early day of building my app, I realized my database still have no indexes. But I didn’t really take it seriously. I delayed and have not thinking about it, until it is one month away to the deadline.

At that time, we just thought what fields needs to be indexed. But even after applied the index, we found out that it did not help much. It was our database structure design that makes our application not so optimized. But, it was too late to change the database structure because my team and I was like 90% done with the code. And changing the database structure one month away from the deadline is a suicide. So we make some workaround and optimize another part of the code.

What I learned:

  • Plan the database structure, keys, unique fields, and indexes, before we code. Even there will be some changes in business requirement, at least we did not start with a bad structure

5. Don’t talk too much with the database

One of the big lessons I learned is: minimize the number of query to the database. Pretty obvious, right? Well, not necessarily.

Suppose we need to upsert (update or insert) 100 documents with different where clause on each. We can build 100 different query, and then run it in parallel. But that will cause much overhead, especially in the networks. Because the database need to send response for each individual query

Another case is, what if we need to update a field in document X with a value from another document, say document Y? Well, first we need to fetch the document Y, and then we run the update query for document X. It is 2 queries for 1 document update. What if we need to update 100 documents? It means it will be 200 queries to the database. Again, too many queries will cause network overhead and slows the application

Remember, we need to avoid select n+1 problem.

What I learned:

  • It might be different in every cases. But mostly, it is better to have one single large query than separating it in many small queries
  • Use bulk write to write multiple database changes with a single batch query
  • If our application need to fetch some existing data before applying changes to the database, fetch it all at once. For example, use where in clause
//Don'ts
db.collection.updateOne({itemId: 1}, { $set: { "stock" : 3} })
db.collection.updateOne({itemId: 2}, { $set: { "stock" : 1} })
db.collection.updateOne({itemId: 3}, { $set: { "stock" : 4} })
//Do's
db.collection.bulkWrite(
[
{ updateOne :
{
"filter": {itemId: 1},
"update": {$set: {"stock" : 3}
}
},
{ updateOne :
{
"filter": {itemId: 2},
"update": {$set: {"stock" : 1}
}
},
{ updateOne :
{
"filter": {itemId: 3},
"update": {$set: {"stock" : 4}
}
}
]
);

6. Be obsessed with Single Responsibility Principle

what a great meme

After months of writing application with reactive paradigm, I feel like my approach when writing code has changed. The reactive paradigm makes me write code into smaller chunk of functions. Each functions should be easily re-used, maintained, and had no side effects.

Because of it, the Single Responsibility Principle (SRP) is so crucial. The SRP says every single part of our program, should only responsible for one functionality. The class should only do one job. And every method inside it should also do one specific job.

I found many advantage of having SRP in mind when writing code, especially when writing reactive program. One of the main benefit is I can easily customize what method should run in parallel, or not. Another (obvious) advantage is, when the business requirement changes, the code is easier to maintain. And if there is a bug, it is easier to locate it. Remember, always be SOLID.

What I learned:

  • Well, the word ‘obsessed’ might be too much. But every time your tasks are done and you still have time before the sprint ends, re-read your methods, verify every single of them only do one thing. And you’ll benefit later.

7. Have a good relationship with the publisher

This is also one of the biggest lessons I learned. Apparently, the subscriber performance is not only determined by the code of the subscriber itself, but also how the message will be sent by the publishers.

This will be different in every cases. And I found this so tricky to formulate an ideal message contract. I personally still working on it, but these are what I got so far:

What I learned:

  • A single user event must can be represented by one single message. This will ensure transactional operation. And it will be much easier to reslove when you are facing issues in production
  • If it is somehow can’t be represented by one single message, verify the messages will have little or no chance to be a race condition and transactional write conflict
  • A Pub/Sub is an asynchronous architecture. That means you will not expect every changes to apply instantly. There will always be some delay. When an event is already done in the publisher side, it is not always also done in the subscriber side. The publisher side should keep this in mind and prevent (or at least warn) their user to repeat the same event in short point of time

Conclusion

Full time learner. Software Engineer