Under the hood

Thread-local state availability in reactive services

2018-09-12T08:33:14+02:00

Any architecture decision involves a trade-off. It’s no different if you decide to go reactive: on the one hand, using Reactive Streams implementations gives better resource utilization almost out of the box, but on the other hand it makes debugging harder. Introducing reactive libraries also has a huge impact on your domain. Your domain will no longer speak only in terms of Payment, Order or Customer; the reactive lingo will creep in, introducing Flux<Payment>, Flux<Order>, Mono<Customer> (or Observable<Payment>, Flowable<Order>, Single<Customer> or whatever Reactive Streams publishers your library of choice provides). Such trade-offs quickly become evident, but as you can probably guess not all of them will be so obvious - The Law of Leaky Abstractions guarantees that.

Reactive libraries make it trivial to change the threading context. You can easily subscribe on one scheduler, then execute part of the operator chain on another and finally hop onto a completely different one. Such jumping from one thread to another works as long as no thread-local state is involved. You know, the state you don’t usually deal with on a day to day basis although it powers crucial parts of your services (e.g. security, transactions, multitenancy). Changing the threading context when a well hidden part of your tech stack depends on thread-local state leads to bugs that are tricky to nail down.

Let me demonstrate the problem with a simple example:

private static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
private static final String SESSION_ID = "session-id";

@GetMapping("documents/{id}")
Mono<String> getDocument(@PathVariable("id") String documentId) {
    MDC.put(SESSION_ID, UUID.randomUUID().toString());
    LOG.info("Requested document[id={}]", documentId);
    return Mono.just("Lorem ipsum")
            .map(doc -> {
                LOG.debug("Sanitizing document[id={}]", documentId);
                return doc.trim();
            });
}

With MDC.put(SESSION_ID, UUID.randomUUID().toString()) we’re putting session-id into the Mapped Diagnostic Context of the underlying logging library so that we can log it later on.

Let’s configure the logging pattern so that it automatically logs session-id for us:

logging.pattern.console=[%-28thread] [%-36mdc{session-id}] - %-5level - %msg%n

When we hit the exposed service with a request (curl localhost:8080/documents/42) we will see session-id appearing in the log entries:

[reactor-http-server-epoll-10] [00c4b05f-a6ee-4a7d-9f92-d9d53dbbb9d0] - INFO  - Requested document[id=42]
[reactor-http-server-epoll-10] [00c4b05f-a6ee-4a7d-9f92-d9d53dbbb9d0] - DEBUG - Sanitizing document[id=42]

The situation changes if we switch the execution context (e.g. by subscribing on a different scheduler) after session-id is put into MDC:

@GetMapping("documents/{id}")
Mono<String> getDocument(@PathVariable("id") String documentId) {
    MDC.put(SESSION_ID, UUID.randomUUID().toString());
    LOG.info("Requested document[id={}]", documentId);
    return Mono.just("Lorem ipsum")
            .map(doc -> {
                LOG.debug("Sanitizing document[id={}]", documentId);
                return doc.trim();
            })
            .subscribeOn(Schedulers.elastic()); // don't use schedulers with unbounded thread pool in production
}

After the execution context changes we will notice session-id missing from the log entries written by operators running on that scheduler:

[reactor-http-server-epoll-10] [c2ceae03-593e-4fb3-bbfa-bc4970322e44] - INFO  - Requested document[id=42]
[elastic-2                   ] [                                    ] - DEBUG - Sanitizing document[id=42]

As you can probably guess there’s some ThreadLocal hidden deep inside the logging library we’re using.
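The effect can be reproduced without any logging library. Below is a minimal sketch that uses only the JDK; ThreadLocalVisibility and its SESSION_ID field are names made up for this illustration and merely stand in for whatever thread-local storage the logging library keeps internally.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalVisibility {

    // Hypothetical stand-in for the thread-local storage hidden inside a logging library.
    static final ThreadLocal<String> SESSION_ID = new ThreadLocal<>();

    // Reads SESSION_ID from a freshly created worker thread instead of the calling thread.
    static String readOnAnotherThread() {
        Callable<String> read = SESSION_ID::get;
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return worker.submit(read).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for callers
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            worker.shutdown();
        }
    }
}
```

If the calling thread sets SESSION_ID and then calls readOnAnotherThread(), the worker thread gets null back, because each thread sees only its own copy of the value. That is the same thing that happens to session-id once the operator runs on an elastic thread.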

Some Reactive Streams implementations provide mechanisms that make contextual data available to operators (e.g. Project Reactor provides subscriber context):

@GetMapping("documents/{id}")
Mono<String> getDocument4(@PathVariable("id") String documentId) {
    String sessionId = UUID.randomUUID().toString();
    MDC.put(SESSION_ID, sessionId);
    LOG.info("Requested document[id={}]", documentId);
    return Mono.just("Lorem ipsum")
            .zipWith(Mono.subscriberContext())
            .map(docAndCtxTuple -> {
                try(MDC.MDCCloseable mdc = MDC.putCloseable(SESSION_ID, docAndCtxTuple.getT2().get(SESSION_ID))) {
                    LOG.debug("Sanitizing document[id={}]", documentId);
                    return docAndCtxTuple.getT1().trim();
                }})
            .subscriberContext(Context.of(SESSION_ID, sessionId))
            .subscribeOn(Schedulers.elastic()); // don't use schedulers with unbounded thread pool in production
}

Of course making data available is just part of the story. Once we make session-id available (subscriberContext(Context.of(SESSION_ID, sessionId))) we not only have to retrieve it but also attach it back to the threading context as well as remember to clean up after ourselves since schedulers are free to reuse threads.
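The capture, re-attach and clean-up steps can be sketched with plain JDK classes. This is only an illustration of the idea under the assumption that the state lives in a single ThreadLocal; SessionIdPropagation, withCallerSessionId and runOnWorker are hypothetical names and not part of Reactor or SLF4J.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SessionIdPropagation {

    // Hypothetical thread-local holding the session id.
    static final ThreadLocal<String> SESSION_ID = new ThreadLocal<>();

    // Captures the caller's value now. The returned task attaches it to whichever
    // thread runs it and restores that thread's previous state afterwards.
    static <T> Callable<T> withCallerSessionId(Callable<T> task) {
        String captured = SESSION_ID.get();
        return () -> {
            String previous = SESSION_ID.get();
            SESSION_ID.set(captured);
            try {
                return task.call();
            } finally {
                // Clean up, since pooled threads are reused for unrelated work.
                if (previous == null) {
                    SESSION_ID.remove();
                } else {
                    SESSION_ID.set(previous);
                }
            }
        };
    }

    // Runs the task on a separate single-thread executor and returns its result.
    static <T> T runOnWorker(Callable<T> task) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return worker.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            worker.shutdown();
        }
    }
}
```

Passing SESSION_ID::get straight to runOnWorker yields null, while wrapping it with withCallerSessionId first yields the caller's session id.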

The presented implementation brings back session-id:

[reactor-http-server-epoll-10] [24351524-f105-4746-8e06-b165036d02e6] - INFO  - Requested document[id=42]
[elastic-2                   ] [24351524-f105-4746-8e06-b165036d02e6] - DEBUG - Sanitizing document[id=42]

Nonetheless the code that makes it work is too complex and too invasive to welcome it with open arms in most codebases, especially if it ends up scattered across the codebase.

I’d love to finish this blog post by providing a simple solution to that problem but I haven’t stumbled upon one yet. For now we need to live with such more complex and invasive solutions, while trying to move this complexity from the business-focused parts of the software down to its infrastructural parts and, if possible, directly into the libraries themselves.

Refactoring stringly-typed systems

2018-01-21T16:45:17+01:00

Last year I joined a project that was taken over from another software house that failed to satisfy client demands. As you can probably tell there were many things that could and should be improved in that “inherited” project and its codebase. Sadly (but not surprisingly) the domain model was one of such orphaned, long-forgotten areas that screamed for help the most.

We knew we needed to get our hands dirty, but how do you improve the domain model in an unfamiliar project where everything is so mixed up, tangled and overgrown with accidental complexity? You set boundaries (divide and conquer!), apply small improvements in one area, then move to the next while getting to know the landscape and discovering the bigger issues that hide behind those scary, obvious things that hurt your eyes at first sight. You would be surprised how much you can achieve by making small improvements and picking low-hanging fruit, yet at the same time you would be a fool to think that they could solve major issues that have grown there due to the lack of (or not enough) modeling effort from the dawn of the project. Nevertheless, without those small improvements it would be way harder to tackle most of the major domain model issues.

For me, bringing more expressiveness and type-safety into code by introducing simple value objects has always been one of the lowest-hanging fruits. It’s a trick that always works, especially when dealing with codebases reeking of the primitive obsession code smell, and the system in question was a stringly-typed one. It was full of code looking like this:

public void verifyAccountOwnership(String accountId, String customerId) {...}

while I bet everyone would prefer it to look more like that:

public void verifyAccountOwnership(AccountId accountId, CustomerId customerId) {...}

It’s not rocket science! I’d say it’s a no-brainer, and it always surprises me how easy it is to find implementations operating on e.g. vague, contextless BigDecimals instead of Amounts, Quantities or Percentages.

Code that uses domain specific value objects instead of contextless primitives is:

much more expressive (you don’t need to map strings into customer identifiers in your head nor worry about any of those strings being an empty string)
easier to grasp (invariants are protected in one place instead of being scattered all around the codebase in ubiquitous if statements)
less buggy (did I put all of those strings in the right order?)
easier to develop (explicit definitions are more obvious and invariants are protected right where you would expect it)
faster to develop (IDE offers much more help and the compiler provides fast feedback cycles)

and those are just a few of the things you get almost for free (you just have to use common sense ^^).

Refactoring towards value objects sounds like a piece of cake (naming things is not taken into account here), you simply extract class here, migrate type there, nothing spectacular. It usually is that simple, especially when the code you have to deal with lives inside a single code repository and runs in a single process. This time however it wasn’t so trivial. Not that it was much more complicated, it just required a tiny bit more thinking (and it makes for a nice piece of work to be described ^^).

It was a distributed system that had service boundaries set in the wrong places and shared too much code (including the model) between services. The boundaries were set so badly that many crucial operations in the system required numerous interactions (mostly synchronous) with multiple services. There’s a challenge (not so big) in applying the mentioned refactoring in such a context in a way that doesn’t end up as an exercise in creating unnecessary layers and introducing accidental complexity at service boundaries. Before jumping into refactoring I had to set some rules, or rather one crucial rule: no changes should be visible from outside the service, including backing services. To put it simply, all published contracts stay the same and no changes are required on the backing services side (e.g. no database schema changes). Easily said and, frankly speaking, easily done with a bit of dull work.

Let’s take String accountId for a ride and demonstrate necessary steps. We want to turn such code:

public class Account {

    private String accountId;

    // rest omitted for brevity
}

into this:

public class Account {

    private AccountId accountId;

    // rest omitted for brevity
}

This can be achieved by introducing AccountId value object:

@ToString
@EqualsAndHashCode
public class AccountId {

    private final String accountId;

    private AccountId(String accountId) {
        if (accountId == null || accountId.isEmpty()) {
            throw new IllegalArgumentException("accountId cannot be null nor empty");
        }
        // can account ID be 20 characters long?
        // are special characters allowed?
        // can I put a new line feed in the account ID?
        this.accountId = accountId;
    }

    public static AccountId of(String accountId) {
        return new AccountId(accountId);
    }

    public String asString() {
        return accountId;
    }
}

AccountId is just a value object: it has no identity and it doesn’t change over time, hence it is immutable. It performs all validations in a single place and fails fast on incorrect inputs by refusing to instantiate AccountId, instead of failing later on in an if statement buried several layers down the call stack. If it needs to protect any invariants you know where to put them and where to look for them.
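For readers who don’t use Lombok, here is a rough hand-written counterpart of that value object together with a tiny helper showing the fail-fast behaviour. It is a sketch only: the equals, hashCode and toString bodies are my own and are not claimed to match what Lombok generates, and AccountIdChecks is a hypothetical helper added for the illustration.

```java
final class AccountId {

    private final String accountId;

    private AccountId(String accountId) {
        // The invariant lives in exactly one place.
        if (accountId == null || accountId.isEmpty()) {
            throw new IllegalArgumentException("accountId cannot be null nor empty");
        }
        this.accountId = accountId;
    }

    static AccountId of(String accountId) {
        return new AccountId(accountId);
    }

    String asString() {
        return accountId;
    }

    // Value semantics: two AccountIds are equal when their underlying strings are equal.
    @Override
    public boolean equals(Object other) {
        return other instanceof AccountId && accountId.equals(((AccountId) other).accountId);
    }

    @Override
    public int hashCode() {
        return accountId.hashCode();
    }

    @Override
    public String toString() {
        return "AccountId(" + accountId + ")";
    }
}

final class AccountIdChecks {

    // Returns true when the raw value is rejected at construction time.
    static boolean isRejected(String raw) {
        try {
            AccountId.of(raw);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

With this in place AccountId.of("42").equals(AccountId.of("42")) holds, while an empty or null string never makes it past the factory method.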

So far so good, but what if AccountId needs to be persisted in a database? Well, you just implement an attribute converter:

public class AccountIdConverter implements AttributeConverter<AccountId, String> {

    @Override
    public String convertToDatabaseColumn(AccountId accountId) {
        return accountId.asString();
    }

    @Override
    public AccountId convertToEntityAttribute(String accountId) {
        return AccountId.of(accountId);
    }
}

Then you enable the converter by either @Converter(autoApply = true) set directly on the converter implementation or @Convert(converter = AccountIdConverter.class) set on the entity field.

Of course not everything revolves around databases, and luckily among the many not so good design decisions made in the mentioned project there were also many good ones. One of those good decisions was to standardize the data format used for out-of-process communication. In this case it was JSON, hence I needed to make the JSON payload immune to the refactoring. The easiest way (if you use Jackson) is to sprinkle the implementation with a couple of Jackson annotations:

public class AccountId {

    @JsonCreator
    public static AccountId of(@JsonProperty("accountId") String accountId) {
        return new AccountId(accountId);
    }

    @JsonValue
    public String asString() {
        return accountId;
    }

    // rest omitted for brevity
}

I started with the easiest solution. It wasn’t ideal but it was good enough, and at that time we had more important issues to deal with. With both JSON serialization and database type conversion taken care of, in less than 3 hours I had moved the first 2 services from stringly-typed identifiers to value-object-based ones for the identifiers most commonly used within the system. It took that long for 2 reasons.

The first one was obvious: along the way I had to check whether null values were possible (and if they were, state that explicitly). Without this the whole refactoring would have been just a code polishing exercise.

The second one was something I almost missed - do you remember the requirement that the change should not be visible from the outside? After turning account ID into a value object swagger definitions changed as well, now account ID was no longer a string but an object. This was also easy to fix, it just required specifying swagger model substitution. In case of swagger-maven-plugin all you need to do is feed it with the file containing model substitution mappings:

com.example.AccountId: java.lang.String

Was the result of performed refactoring a significant improvement? Rather not, but you improve a lot by making lots of small improvements. Nevertheless this wasn’t a tiny improvement, it brought a lot of clarity into the code and made further improvements easier. Was it worth the effort - I would definitely say: yes, it was. A good indicator of this is that other teams adopted that approach.

Fast-forward a few sprints. Having solved some of the more important issues and having started turning the inherited, heavily tangled mess into a somewhat nicer solution based on hexagonal architecture, the time came to deal with the drawbacks of the easiest approach we had taken to support JSON serialization. What we needed to do was decouple the AccountId domain object from things not related to the domain. Namely, we had to move the part defining how to serialize this value object out of the domain and remove the domain’s coupling to Jackson. In order to achieve that we created a Jackson module that handled AccountId serialization:

class AccountIdSerializer extends StdSerializer<AccountId> {

    AccountIdSerializer() {
        super(AccountId.class);
    }

    @Override
    public void serialize(AccountId accountId, JsonGenerator generator, SerializerProvider provider) throws IOException {
        generator.writeString(accountId.asString());
    }
}

class AccountIdDeserializer extends StdDeserializer<AccountId> {

    AccountIdDeserializer() {
        super(AccountId.class);
    }

    @Override
    public AccountId deserialize(JsonParser json, DeserializationContext cxt) throws IOException {
        String accountId = json.readValueAs(String.class);
        return AccountId.of(accountId);
    }
}

class AccountIdSerializationModule extends Module {

    @Override
    public void setupModule(SetupContext setupContext) {
        setupContext.addSerializers(createSerializers());
        setupContext.addDeserializers(createDeserializers());
    }

    private Serializers createSerializers() {
        SimpleSerializers serializers = new SimpleSerializers();
        serializers.addSerializer(new AccountIdSerializer());
        return serializers;
    }

    private Deserializers createDeserializers() {
        SimpleDeserializers deserializers = new SimpleDeserializers();
        deserializers.addDeserializer(AccountId.class, new AccountIdDeserializer());
        return deserializers;
    }

    // rest omitted for brevity
}

If you’re using Spring Boot, configuring such a module simply requires registering it in the application context:

@Configuration
class JacksonConfig {

    @Bean
    Module accountIdSerializationModule() {
        return new AccountIdSerializationModule();
    }
}

Implementing custom serializers was also something we needed because along the way we identified more value objects and some of them were a bit more complex - but that’s something for another article.

Controlling parallelism level of Java parallel streams

2017-10-16T08:21:23+02:00

With the recent Java 9 release we got many new goodies to play with and to improve our solutions once we grasp those new features. The release of Java 9 is also a good time to check whether we have grasped the Java 8 features.

In this post I’d like to bust the most common misconception about Java parallel streams. It’s often said that you cannot control parallel streams’ parallelism level in a programmatic way, that parallel streams always run on shared ForkJoinPool.commonPool() and there’s nothing you can do about it. This is the case if you make your stream parallel by just adding parallel() call to the call chain. That might be sufficient in some cases, e.g. if you perform only lightweight operations on that stream, however if you need to gain more control over your stream’s parallel execution you need to do a bit more than just calling parallel().

Instead of diving into theory and technicalities let’s jump straight to a self-documenting example.

Having a parallel stream being processed on shared ForkJoinPool.commonPool():

Set<FormattedMessage> formatMessages(Set<RawMessage> messages) {
    return messages.stream()
            .parallel()
            .map(MessageFormatter::format)
            .collect(toSet());
}

let’s move parallel processing to a pool that we can control and don’t have to share:

private static final int PARALLELISM_LEVEL = 8;

Set<FormattedMessage> formatMessages(Set<RawMessage> messages) {
    ForkJoinPool forkJoinPool = new ForkJoinPool(PARALLELISM_LEVEL);
    try {
        return forkJoinPool.submit(() -> formatMessagesInParallel(messages))
                .get();
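    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
        throw new IllegalStateException("Interrupted while formatting messages", e);
    } catch (ExecutionException e) {
        throw new IllegalStateException("Formatting messages failed", e.getCause());
    } finally {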
        forkJoinPool.shutdown();
    }
}

private Set<FormattedMessage> formatMessagesInParallel(Set<RawMessage> messages) {
    return messages.stream()
            .parallel()
            .map(MessageFormatter::format)
            .collect(toSet());
}

In this example we’re interested only in the parallelism level of the ForkJoinPool though we can also control ThreadFactory and UncaughtExceptionHandler if needed.
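As a rough sketch of that extra control, the four-argument ForkJoinPool constructor accepts a thread factory, an uncaught exception handler and an async mode flag. NamedForkJoinPools, the name prefix and the handler that just prints to standard error are assumptions made for this illustration.

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class NamedForkJoinPools {

    // Creates a pool whose worker threads carry a recognizable name.
    static ForkJoinPool create(int parallelism, String namePrefix) {
        AtomicInteger counter = new AtomicInteger();
        ForkJoinPool.ForkJoinWorkerThreadFactory factory = pool -> {
            // Delegate thread creation to the default factory and only rename the result.
            ForkJoinWorkerThread worker =
                    ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
            worker.setName(namePrefix + "-" + counter.incrementAndGet());
            return worker;
        };
        Thread.UncaughtExceptionHandler handler = (thread, error) ->
                System.err.println(thread.getName() + " failed: " + error);
        boolean asyncMode = false; // keep the default LIFO processing of forked tasks
        return new ForkJoinPool(parallelism, factory, handler, asyncMode);
    }

    // Runs a parallel stream inside the given pool and reports which threads did the work.
    static Set<String> threadNamesUsedByParallelStream(ForkJoinPool pool, int elements) {
        Callable<Set<String>> task = () -> IntStream.range(0, elements)
                .parallel()
                .mapToObj(i -> Thread.currentThread().getName())
                .collect(Collectors.toSet());
        try {
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Named worker threads make it much easier to tell in thread dumps and logs which pool a given piece of parallel work ran on.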

Under the hood the ForkJoinPool scheduler will take care of everything, including applying a work-stealing algorithm to improve parallel processing efficiency. Having said that, it’s worth mentioning that manual processing using ThreadPoolExecutor might be more efficient in some cases, e.g. if the workload is evenly distributed over worker threads.
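For comparison, a manual variant on a fixed-size thread pool could look more or less like the sketch below. It works on plain strings to stay self-contained; FixedPoolFormatting and its format method are hypothetical stand-ins for the MessageFormatter used earlier, and no claim is made here about which variant is faster for a given workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FixedPoolFormatting {

    // Formats every message on a fixed-size thread pool, one task per message.
    static List<String> formatAll(List<String> messages, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String message : messages) {
                tasks.add(() -> format(message));
            }
            List<String> formatted = new ArrayList<>();
            // invokeAll returns futures in the same order as the submitted tasks.
            for (Future<String> future : executor.invokeAll(tasks)) {
                formatted.add(future.get());
            }
            return formatted;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    // Hypothetical formatting step standing in for the real message formatter.
    static String format(String raw) {
        return raw.trim().toUpperCase(Locale.ROOT);
    }
}
```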

Checking reactive repositories for backpressure support

2017-10-10T09:05:19+02:00

When I’m working with a repository that returns a Flux, an RxJava 1.x Observable or some other reactive dataflow I wonder if it supports backpressure, especially if the returned dataset might contain more than a handful of results. The bigger the dataset, the more crucial it is to have backpressure support. In extreme cases without backpressure support we may end up with an OutOfMemoryError, while we could have handled the whole dataset without any issues if backpressure had been supported. In less extreme cases we may end up fully utilizing available resources and making the application unresponsive for a significant period of time. When processing larger datasets without backpressure support we may also experience longer or more frequent stop-the-world pauses, leading to worse latency or a noticeable drop in throughput due to the pressure put on the garbage collector.

Unfortunately often it’s not that obvious nor documented whether the repository supports backpressure all the way down the stack. Due to that I developed a simple test for backpressure support¹. It requires writing a simple, single-threaded subscriber to process data coming from the repository:

documentRepository.findAll()    // this returns Flux<Document> but does it support backpressure?
        .subscribe(document ->
                LOG.debug("Processing {}", document));

Within that implementation we need to put 2 breakpoints, one where the repository is queried (documentRepository#findAll) and the other one where data processing happens (LOG#debug). Then we run it in the debugger with Memory View opened so that we can track how the heap changes between breakpoints.

To demonstrate this process let me show you 2 implementations and the results of hitting the second breakpoint for both of them.

Let’s say that the repository contains 10 000 documents:

INSERT INTO Document (id, content) VALUES
  (1, 'large blob'),
  (2, 'large blob'),
    ...
  (10000, 'large blob');

First let’s test JPA-based implementation:

class JpaBasedDocumentRepository implements DocumentRepository {

    private final EntityManager entityManager;

    JpaBasedDocumentRepository(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    public Flux<Document> findAll() {
        return Flux.fromStream(
                entityManager.createQuery("from Document d")
                        .getResultList()
                        .stream());
    }
}

When we hit the second breakpoint we can see that the whole dataset was loaded into memory (or the application crashed with OutOfMemoryError if the dataset was large enough) even though only one document was needed at that time (the subscriber processes results sequentially):

This repository, even though it returns a Flux that is capable of supporting backpressure, does not support backpressure. It just loads the whole dataset into memory at once.

Let’s test another implementation, this time a Hibernate-based one:

class HibernateBasedDocumentRepository implements DocumentRepository {

    private final HibernateEntityManager entityManager;

    HibernateBasedDocumentRepository(EntityManager entityManager) {
        this.entityManager = entityManager.unwrap(HibernateEntityManager.class);
    }

    public Flux<Document> findAll() {
        Stream<Document> resultStream = entityManager.createQuery("from Document d")
                .stream();
        return Flux.fromStream(resultStream)
                .doOnTerminate(resultStream::close);
    }
}

In this implementation we create Flux from the stream opened directly on query results.

Let’s take a look at the heap once we hit the second breakpoint:

We can see that only one document was loaded, exactly as much data as we needed to process at that time. If we resume processing and hit that breakpoint for the second time we will see another document being loaded:

Now we have 2 documents loaded, and if we resume processing and hit that breakpoint for the third time we will have 3 documents loaded and so on. At some point GC will kick in and collect² all the loaded documents besides the one currently being processed, which leads to better memory utilization and making the application more resilient.

In some cases results of this test might not be that simple to interpret, e.g. some implementations might support backpressure but load results in batches to reduce communication overhead, but knowing the basic characteristics of the dataset and the datastore under test you should be able to use this test to verify whether the repository supports backpressure.
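The contrast between the two repositories can be imitated without a database or a debugger. The sketch below is only an analogy built on plain java.util.stream, not a backpressure implementation: CountingSource and its load method are made up for the illustration and simply count how many documents were materialized by the time the first one is consumed.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class LazyVersusEager {

    // Hypothetical data source that records how many documents it has loaded.
    static final class CountingSource {
        final AtomicInteger loaded = new AtomicInteger();

        String load(int id) {
            loaded.incrementAndGet();
            return "document-" + id;
        }
    }

    // Mimics getResultList().stream(): everything is loaded before the first element is used.
    static int loadedAfterFirstElementEager(int size) {
        CountingSource source = new CountingSource();
        List<String> all = IntStream.rangeClosed(1, size)
                .mapToObj(source::load)
                .collect(Collectors.toList());
        all.stream().findFirst();
        return source.loaded.get();
    }

    // Mimics a stream opened directly on the query results: elements are loaded on demand.
    static int loadedAfterFirstElementLazy(int size) {
        CountingSource source = new CountingSource();
        IntStream.rangeClosed(1, size)
                .mapToObj(source::load)
                .findFirst();
        return source.loaded.get();
    }
}
```

For a dataset of 10 000 documents the eager variant reports 10 000 loaded documents after the first one is consumed, while the lazy variant reports just one.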

¹ In this test we examine repositories running in-process with an out-of-process datastore, hence the way backpressure is handled on the publisher side is as important as allowing subscribers to signal to the publisher that the rate of emission is too high.
² In the case of JPA and its implementations you have to remember that managed entities won’t be garbage collected.

Serving large datasets with Spring WebFlux

2017-09-26T22:51:32+02:00

If you’re serving large datasets from your web service you might like one of the upcoming Spring Framework 5.0 features. But before we get to this feature let’s see what a naive implementation of such a service might look like:

@GetMapping(path = "items", produces = "application/json")
List<Item> allItems() {
    return itemRepository.findAll();
}

The naive part of this implementation is that we try to return the whole dataset at once and this can easily make such service unresponsive.

If this dataset is larger than what we can fit into memory, or if there are many clients asking for this large dataset at the same time, we’ll end up seeing OutOfMemoryError. And even if we have a heap large enough to handle such cases there’s a high chance that our clients won’t be that lucky and will fail with OutOfMemoryError while reading the response. Moreover there is an often overlooked latency issue hidden there - with such an implementation the client can start processing the dataset only after it’s fully loaded, serialized and delivered by the server:

Such problems are usually mitigated to some extent by introducing paging. However, a client interested in the whole dataset now has to issue multiple requests, which is not the most convenient solution (not to mention that if no consistency control mechanisms are in place it might get duplicates and/or miss some data).

So can we do better than that? As microservices were the answer to all the questions in the last few years now reactive is the golden hammer. Luckily our problem seems to look more like a nail than a screw, so let’s stab it with a reactive hammer:

@GetMapping(path = "items", produces = "application/json")
Flux<Item> allItems() {
    return itemRepository.findAll();
}

The important part here is that the data source should support backpressure or allow data to be loaded in chunks and at a speed that poses no issues for the receiver. Assuming that our repository supports backpressure, and given that Flux is capable of supporting it, the problem should be solved since WebFlux (the reactive HTTP component, part of Spring Framework 5.0) handles Flux payloads quite well.

Unfortunately this implementation still has all the mentioned problems. It’s because we still need all data in place just to serialize it before we start sending the response.

The fix is rather obvious - we need a reactive JSON serializer. As I said, nowadays reactive is the answer to all problems. Just kidding, we don’t need any reactive serializers (I don’t even know what that means). Good old Jackson, Gson, or any other JSON serializer you prefer should be sufficient. What we need to change is not how we serialize but what we serialize. Let me show you the implementation before I explain what I mean by saying “change what we serialize”.

@GetMapping(path = "items", produces = "application/stream+json")
Flux<Item> allItems() {
    return itemRepository.findAll();
}

As you can see we’re still returning a Flux of Items, however now the response has a different media type. Instead of returning one large serialized JSON document containing all Items we return a stream of individually serialized Items (a stream of JSON documents):

Under the hood whenever an Item is emitted from the repository it gets serialized, the response buffer is flushed (meaning that bytes start flowing to the client) but the connection is kept open until all documents are emitted, serialized and sent. This doesn’t sound like some magical or new solution, e.g. you might remember tricks like Comet that date back several years in the past.
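To make the "serialize, flush, keep going" idea more tangible, here is a minimal server-side sketch that uses only the JDK. It assumes that the documents are separated by newlines and builds the JSON by hand for a made-up item with a single id field; JsonStreamWriter is a hypothetical name and this is not how WebFlux is implemented internally.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class JsonStreamWriter {

    // Writes every item as its own JSON document and flushes after each one,
    // so the receiver can start working before the whole dataset is produced.
    static void writeAll(Iterable<Integer> itemIds, Writer out) {
        try {
            for (Integer id : itemIds) {
                out.write("{\"id\":" + id + "}\n");
                out.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience method that renders the stream into a string, e.g. for inspection.
    static String render(Iterable<Integer> itemIds) {
        StringWriter out = new StringWriter();
        writeAll(itemIds, out);
        return out.toString();
    }
}
```

Compared with a single JSON array, no item has to wait for the closing bracket, which is why memory use on the server no longer depends on the size of the dataset.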

Of course handling such responses requires clients that are able to decode them, but that’s not rocket science and there already exist implementations that can do that (including the Spring Framework 5.0 WebClient).
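A bare-bones client-side decoder might look like the sketch below. It assumes that every line of the response holds exactly one JSON document and hands the raw text to a callback as soon as the line arrives; JsonStreamReader is a hypothetical name, and turning each line into an object is left to whatever JSON library you use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

final class JsonStreamReader {

    // Passes each non-empty line (one JSON document) to the handler as soon as it is read
    // and returns the number of documents handled.
    static int consume(Reader response, Consumer<String> handler) {
        BufferedReader lines = new BufferedReader(response);
        int handled = 0;
        try {
            String line;
            while ((line = lines.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank separator lines
                }
                handler.accept(line);
                handled++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return handled;
    }
}
```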

In the last iteration of this service we got rid of the OutOfMemoryError issue on the server side and significantly reduced the time the client needs before it can start processing the first Item in the dataset returned by our service. Another issue we had was OutOfMemoryError on the client side - here it all depends on the client being able to process incoming Items as fast as the server is sending them, or being able to buffer the unprocessed part of the sent dataset. It’s not the perfect solution to this problem but bearing in mind that we’re communicating over a request-response protocol it might be an acceptable one, especially since we have significantly reduced the probability of OutOfMemoryError on the client side.

Resources utilization in reactive services

2017-05-14T23:46:11+02:00

Let me start this post with a question. Imagine a service returning a value whose fetching from some other service (e.g. database) takes 1 second:

@SpringBootApplication
@RestController
public class WebApplication {

    public static void main(String[] args) {
        SpringApplication.run(WebApplication.class, args);
    }

    @GetMapping("/value")
    String fetchValue() throws InterruptedException {
        TimeUnit.SECONDS.sleep(1);
        return "42";
    }
}

How many transactions per second can we get when we hit it with 10 concurrent users?

I know you already have an answer but let’s be good engineers and measure instead of guessing. Let’s run siege in benchmark mode with 10 concurrent users, each issuing 10 subsequent requests:

$ siege -b -c 10 -r 10 http://localhost:8080/value
Transactions:                100 hits
Availability:             100.00 %
Elapsed time:              10.05 secs
Data transferred:           0.00 MB
Response time:              1.00 secs
Transaction rate:           9.95 trans/sec
Throughput:                 0.00 MB/sec
Concurrency:                9.99
Successful transactions:     100
Failed transactions:           0
Longest transaction:        1.01
Shortest transaction:       1.00

We’re getting 9.95 transactions per second (transaction rate), which is close to the theoretical maximum of 10 TPS.

That was easy. Let’s make it a bit more interesting: how many TPS can we get if we increase the number of concurrent users to 100?

$ siege -b -c 100 -r 10 http://localhost:8080/value
Transactions:               1000 hits
Availability:             100.00 %
Elapsed time:              50.20 secs
Data transferred:           0.00 MB
Response time:              4.82 secs
Transaction rate:          19.92 trans/sec
Throughput:                 0.00 MB/sec
Concurrency:               95.97
Successful transactions:    1000
Failed transactions:           0
Longest transaction:        5.05
Shortest transaction:       1.01

WAT? It’s nowhere near 100 TPS. How is that even possible? Well, let me tell you a secret: I have set the max number of Tomcat worker threads to 20.

server.tomcat.max-threads=20
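The numbers measured so far fit a simple back-of-the-envelope model: every request occupies a worker thread for the whole second, so throughput is capped by whichever is smaller, the number of concurrent users or the number of worker threads. The sketch below just encodes that arithmetic; ThroughputModel and its method names are made up for the illustration and the model ignores everything except the blocking call.

```java
final class ThroughputModel {

    // Upper bound on transactions per second when each request blocks a thread
    // for latencySeconds and at most workerThreads requests run at once.
    static double maxTransactionsPerSecond(int concurrentUsers, int workerThreads,
                                           double latencySeconds) {
        return Math.min(concurrentUsers, workerThreads) / latencySeconds;
    }

    // Time needed to push totalRequests through at the given rate.
    static double expectedElapsedSeconds(int totalRequests, double transactionsPerSecond) {
        return totalRequests / transactionsPerSecond;
    }
}
```

With 10 users and 20 threads the model gives 10 TPS (9.95 measured). With 100 users and 20 threads it gives 20 TPS and about 50 seconds for 1000 requests, which is in line with the measured 19.92 TPS and 50.20 seconds.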

So now that we know what the limiting factor is, let’s get rid of this custom worker thread limit and repeat the test with the default configuration (200 worker threads in the case of Tomcat 8.5):

$ siege -b -c 100 -r 10 http://localhost:8080/value
Transactions:               1000 hits
Availability:             100.00 %
Elapsed time:              10.06 secs
Data transferred:           0.00 MB
Response time:              1.00 secs
Transaction rate:          99.40 trans/sec
Throughput:                 0.00 MB/sec
Concurrency:               99.73
Successful transactions:    1000
Failed transactions:           0
Longest transaction:        1.02
Shortest transaction:       1.00

The actual numbers (yes, we got close to 100 TPS) are not as interesting as the thread usage:

We start with less than 50 live threads and when 100 users hit the service we quickly reach almost 150 live threads and keep them alive for some time after the traffic is gone just in case they could be reused. It’s worth pointing out that we are limited by the number of worker threads and once we exceed that number of concurrent requests we will start queuing.

Now let’s sprinkle our service with some reactive magic by replacing the old handler method with a reactive one:

@GetMapping("/value")
Mono<String> fetchValue() {
    return Mono.fromCallable(() -> {
        sleepFor(1, SECONDS);
        return "42";
    });
}

as well as replacing the spring-boot-starter-web dependency with spring-boot-starter-webflux¹:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>

With all these reactive bits and pieces in place let’s see what happens if we hit our service with 100 concurrent users:

$ siege -b -c 100 -r 10 http://localhost:8080/value
Transactions:               1000 hits
Availability:             100.00 %
Elapsed time:             126.51 secs
Data transferred:           0.00 MB
Response time:             12.01 secs
Transaction rate:           7.90 trans/sec
Throughput:                 0.00 MB/sec
Concurrency:               94.91
Successful transactions:    1000
Failed transactions:           0
Longest transaction:       13.09
Shortest transaction:       1.00

WAT? A mere 8 TPS. There has to be another thread limit in place! In fact there is one: there will be as many event loop threads as Runtime.getRuntime().availableProcessors() (as we can see from the results we have 8 available processors).

Yes, you heard it right - event loops. By default Spring Boot 2 will switch from Tomcat to Netty if you use spring-boot-starter-webflux. Later on we’ll see how our reactive implementation performs on Tomcat, but for now let’s stick to Netty, a damn fast asynchronous event-driven network application framework. Let’s take another look at our implementation and see if we have missed something, because those 8 TPS contradict Netty being fast.

@GetMapping("/value")
Mono<String> fetchValue() {
    return Mono.fromCallable(() -> {
        sleepFor(1, SECONDS);
        return "42";
    });
}

Knowing that we have 8 event loop threads we can tell why we are getting just 8 TPS: those threads quickly get blocked by the blocking operations. It’s irresponsible to run blocking I/O or any other blocking operation on an event loop thread. The fix is rather obvious - move the blocking operation onto another thread:

@GetMapping("/value")
Mono<String> fetchValue() {
    return Mono.fromCallable(() -> {
        sleepFor(1, SECONDS);
        return "42";
    }).subscribeOn(Schedulers.elastic());
}

$ siege -b -c 100 -r 10 http://localhost:8080/value
Transactions:               1000 hits
Availability:             100.00 %
Elapsed time:              10.06 secs
Data transferred:           0.00 MB
Response time:              1.00 secs
Transaction rate:          99.40 trans/sec
Throughput:                 0.00 MB/sec
Concurrency:               99.89
Successful transactions:    1000
Failed transactions:           0
Longest transaction:        1.02
Shortest transaction:       1.00

Now we are getting results similar to those of the non-reactive implementation running on Tomcat with the default 200 worker threads. Even the thread usage looks similar:

The only thing we got rid of is the worker thread limit, which got replaced with the unbounded Schedulers.elastic() thread pool. Of course we can (and often should) replace Schedulers.elastic() with a scheduler over which we have full control (e.g. by creating one based on an ExecutorService).
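A scheduler over which we have full control could start from a bounded ExecutorService. The sketch below uses only the JDK. The class name, pool size, queue capacity and thread name prefix are illustrative assumptions, not values from this article:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BlockingPool {

    /** Creates a bounded pool for blocking calls; all sizes are assumptions to be tuned. */
    static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, "blocking-" + counter.incrementAndGet());
            thread.setDaemon(true); // do not keep the JVM alive on shutdown
            return thread;
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,                   // hard upper bound on threads
                60, TimeUnit.SECONDS,                     // idle threads are reclaimed after a minute
                new ArrayBlockingQueue<>(queueCapacity),  // bounded backlog of waiting tasks
                factory,
                new ThreadPoolExecutor.AbortPolicy());    // reject work instead of queuing without limit
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }
}
```

In a Reactor application such an executor can be wrapped with Schedulers.fromExecutorService(...) and passed to subscribeOn in place of Schedulers.elastic(). Unlike the elastic scheduler, this pool rejects work once both the threads and the queue are exhausted, so the caller has to decide how to handle that.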

Going reactive on Netty didn’t yield noticeable improvements, so let’s see what the situation looks like when we replace Netty with Tomcat by adding an explicit dependency on spring-boot-starter-tomcat:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-tomcat</artifactId>
</dependency>

$ siege -b -c 100 -r 10 http://localhost:8080/value
Transactions:               1000 hits
Availability:             100.00 %
Elapsed time:              10.07 secs
Data transferred:           0.00 MB
Response time:              1.01 secs
Transaction rate:          99.30 trans/sec
Throughput:                 0.00 MB/sec
Concurrency:               99.86
Successful transactions:    1000
Failed transactions:           0
Longest transaction:        1.03
Shortest transaction:       1.00

The TPS rates are not surprising but the thread usage is:

We are using many more threads than we used to (since threads are kept alive for some time in case they can be reused, the actual number of threads used will vary depending on the availability of idle threads in the pool). Let’s take another look at the implementation and try to find the reason for such behavior:

@GetMapping("/value")
Mono<String> fetchValue() {
    return Mono.fromCallable(() -> {
        sleepFor(1, SECONDS);
        return "42";
    }).subscribeOn(Schedulers.elastic());
}

By now it should be obvious that due to the thread-per-request model we can end up using 2 threads to process each request: one container worker thread handling the request and an additional thread performing the blocking operation. To reduce the number of threads used when running on a servlet container (i.e. Tomcat) we can simply move the blocking operation back to the worker thread (we could also use the servlet asynchronous processing facilities in order not to block the worker thread).

@GetMapping("/value")
Mono<String> fetchValue() {
    return Mono.fromCallable(() -> {
        sleepFor(1, SECONDS);
        return "42";
    });
}

Of course, if you follow what’s going on in the Spring ecosystem, you know that along with WebFlux Spring Framework 5.0 will allow you to replace MVC-style handler method mappings with functional-style RouterFunctions:

@Bean
RouterFunction<ServerResponse> routerFunction() {
    return route(GET("/value"), request -> fetchValueHandler());
}

Mono<ServerResponse> fetchValueHandler() {
    return ServerResponse.ok()
            .body(fetchValue(), String.class);
}

Mono<String> fetchValue() {
    return Mono.fromCallable(() -> {
        sleepFor(1, SECONDS);
        return "42";
    });
}

However, it’s just a different way of defining request handlers and it doesn’t make your application run faster or use fewer threads.

As we have seen, simply going reactive does not make your services faster or use fewer resources, so why is there so much buzz around it? Well, if you compare how much simpler and safer it is to control where and how your logic is executed with reactive APIs:

@GetMapping("/value")
Mono<String> fetchValue() {
    return Mono.fromCallable(() -> {
        sleepFor(1, SECONDS);
        return "42";
    }).subscribeOn(Schedulers.elastic());
}

than without them:

private static final ExecutorService threadPool = Executors.newCachedThreadPool();

@GetMapping("/value")
DeferredResult<String> fetchValue() {
    DeferredResult<String> deferredResult = new DeferredResult<>();
    fetchValueAsync(deferredResult);
    return deferredResult;
}

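private void fetchValueAsync(DeferredResult<String> deferredResult) {
    CompletableFuture.supplyAsync(this::fetchValueSync, threadPool)
            .whenCompleteAsync((result, throwable) -> {
                // Report failures instead of silently completing with a null result
                if (throwable != null) {
                    deferredResult.setErrorResult(throwable);
                } else {
                    deferredResult.setResult(result);
                }
            });
}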

private String fetchValueSync() {
    Sleeper.sleepFor(1, SECONDS);
    return "42";
}

it starts to make sense, and easier control over this aspect of execution is just one of the perks of reactive implementations.

at the time of writing WebFlux (which is part of Spring Framework 5.0) is in RC1 and Spring Boot 2.0 is still available only as snapshot releases ↩

Reliable artifact versioning schemes

2016-10-25T00:17:37+02:00

Along with the Mockito 2.1.0 release the Mockito Team published an article describing issues with their previous artifact versioning scheme. This reminds me of another versioning scheme that might cause you real trouble: a scheme that is based on the CI server’s next build number.

Before we get into the problems related to this versioning scheme let’s quickly discuss some of its characteristics. One of the downsides of this scheme is that most of the time it doesn’t carry any useful information. It says nothing about how smooth the upgrade to a newer version should be nor how old the current version is. But hey, at least a build number based scheme guarantees proper ordering of versions, doesn’t it? It also guarantees artifact version uniqueness, doesn’t it? Actually such versioning schemes guarantee neither version uniqueness nor proper ordering, because next build numbers are not under your control. You have to rely on your CI server assigning them properly.

What happens if you completely lose your CI server? What if a hurricane strikes your data center? If that seems highly improbable, consider a human error, e.g. mistakenly removing a job other than the one you intended to. That sounds probable, and it can even be automated :D

To make error is human. To propagate error to all server in automatic way is #devops.
— DevOps Borat (@DEVOPS_BORAT) February 26, 2011

What if you decide to migrate to another CI server to be able to keep pipelines as code? With a build number based scheme you have given yourself one more thing to work on. Now besides migrating your pipelines you also have to migrate the next build numbers. And if you don’t migrate build numbers you can no longer rely on the LATEST version in your deployment scripts nor on your artifact repository properly selecting the latest artifact. Moreover, due to duplicated artifact versions you will end up being unable to deploy newly built artifacts to the artifact repository and thus unable to deploy to production.
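Both failure modes are easy to reproduce on paper. The sketch below assumes a hypothetical major.minor.build version format and a CI server whose build counter restarted at 1 after the migration. The class and its methods are made up for illustration and do not model any particular artifact repository:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class BuildNumberVersions {

    /** Extracts the trailing build number from a version such as "1.0.57". */
    static int buildNumber(String version) {
        return Integer.parseInt(version.substring(version.lastIndexOf('.') + 1));
    }

    /** Picks the "latest" version the way a naive query would: the highest build number wins. */
    static String latest(List<String> versions) {
        return versions.stream()
                .max(Comparator.comparingInt(BuildNumberVersions::buildNumber))
                .orElseThrow(() -> new IllegalArgumentException("no versions"));
    }

    /** Returns true if publishing the candidate would clash with an already published version. */
    static boolean collides(List<String> published, String candidate) {
        return published.contains(candidate);
    }

    public static void main(String[] args) {
        // Hypothetical history published by the old CI server.
        List<String> published = Arrays.asList("1.0.1", "1.0.2", "1.0.57");

        // The first build on the new server reuses an existing version.
        System.out.println(collides(published, "1.0.1"));             // true

        // A later, non-colliding build still loses to the old artifact.
        System.out.println(latest(Arrays.asList("1.0.57", "1.0.3"))); // 1.0.57
    }
}
```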

Don’t get me wrong, build numbers are not completely wrong, they may sometimes even be useful. They are just broken when used as a significant part of a versioning scheme. Therefore always make sure you control all significant parts of your versioning scheme or that they are derived from sources that guarantee uniqueness and proper ordering.

Custom type renderers in IntelliJ IDEA debugger

2016-07-31T02:03:34+02:00

When I step through code in a debugger it usually means I’m trying to understand why that code behaves differently than expected. In such situations I especially want my IDE to present me only with data relevant to the problem at hand. Unfortunately my IDE doesn’t have a crystal ball that tells it what context I’m in and what data is relevant in the given context. The IDE just displays variables used in the code and allows me to evaluate custom expressions to dig further. Sadly the displayed data often tends to either be meaningless in the given context or contain a lot of noise that I have to filter out myself, risking overlooking significant pieces. Moreover, evaluating expressions, while invaluable, also means manual work: I have to open the evaluate expression dialog and type in the expression, risking losing focus on the main problem. To sum it up, I want to be presented only with relevant data and I want it to be presented in a meaningful way. This doesn’t sound feasible without me helping my IDE. Fortunately my IDE, IntelliJ IDEA, has a really useful feature called custom type renderers that allows me to provide that necessary help. Let’s see how those custom type renderers can help debug things.

Let’s imagine a simple scenario where we calculate an important value based on a price plan and event duration:

Duration eventDuration = new Duration(eventStart, eventEnd);
return estimateCost(pricePlan, eventDuration);

This piece of code works as it should until one day we notice strange results coming out of it. Fortunately we are able to capture inputs that allow us to reproduce the problematic scenario:

PricePlan pricePlan = PricePlan.REGULAR;
DateTime eventStart = new DateTime(2016, 3, 27, 1, 30, WARSAW_TIME);
DateTime eventEnd = new DateTime(2016, 3, 27, 3, 40, WARSAW_TIME);

Not being able to deduce where the problem lies by reasoning about the code, we run it through a debugger. Based on the inputs we assume that PricePlan.REGULAR and a duration equal to 2 hours 10 minutes (in milliseconds ¹) are passed in to the estimateCost method.

The debugger shows us that we indeed pass in PricePlan.REGULAR as the first parameter and some duration as the second one. Unfortunately the way org.joda.time.Duration value is presented to us is meaningless, thus we have to manually evaluate an expression to get the data we need:

We can see that the event duration is equal to 1 hour 10 minutes instead of 2 hours 10 minutes. Before we dig into the why, let’s focus on how we can make this information easily accessible by using a custom type renderer. Let’s create one that displays org.joda.time.Duration values as a result of the following expression:

org.joda.time.format.PeriodFormat.getDefault()
    .withParseType(org.joda.time.PeriodType.time())
    .print(this.toPeriod())

Now the data displayed by the debugger starts being useful:

With the significant data in the foreground, let’s go back to analyzing why the duration of an event that starts at 1:30 Warsaw Time and ends at 3:40 Warsaw Time is 1 hour 10 minutes instead of 2 hours 10 minutes.

As I mentioned in the beginning it’s easy to overlook important data when you are filtering out noise by yourself. You want that noise to be automatically filtered out. Given character sequences as long as 2016-03-27T01:30:00.000+01:00 and 2016-03-27T03:40:00.000+02:00 you probably focused on making sure that they represent the same day and then extracting hours. Since both dates were in the same time zone chances are high you simply ignored time offsets. Of course you can alter the way dates are represented so they are easier to grasp and analyze but since you already know where the problem lies let’s make the debugger clearly say it by rendering org.joda.time.DateTime values as a result of the following expression:

org.joda.time.DateTimeZone.forID("Europe/Warsaw")
    .getOffset(this) == 3600000 ? "standard" : "daylight saving"

Yes, you guessed it right, there was a time change between the start and the end of an event:

On 27 March 2016 at 2:00 in the Europe/Warsaw time zone clocks were turned forward by 1 hour to change from Central European Time to Central European Summer Time “so that evening daylight lasts an hour longer, while sacrificing normal sunrise times”². Now the debugger clearly says it but at the same time makes it hard to figure out what time those org.joda.time.DateTime values represent:

Let’s make that information easily available by further configuring org.joda.time.DateTime type renderer, this time focusing on how those values are represented after node expansion in the data view:

With the simple this.toString() expression we get exactly what we need:

As you have seen in this simplistic example, custom type renderers can make your debugging session way easier by hiding irrelevant data and putting significant data in the foreground. Another noteworthy aspect of custom type renderers is that they automatically re-render all displayed variables after any change made to their definition (including creation and removal) so that you don’t have to rerun your debugging session. In fact all the screenshots in this article come from the same debugging session.
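The one-hour discrepancy itself can be reproduced outside the debugger. The sketch below deliberately swaps Joda-Time for the JDK’s java.time so that it runs without extra dependencies. The class and method names are made up for illustration:

```java
import java.time.Duration;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class DstGapDuration {

    static final ZoneId WARSAW = ZoneId.of("Europe/Warsaw");

    /** Elapsed time between two wall-clock times on the same calendar day in Warsaw. */
    static Duration elapsed(int year, int month, int day,
                            int startHour, int startMinute,
                            int endHour, int endMinute) {
        ZonedDateTime start = ZonedDateTime.of(year, month, day, startHour, startMinute, 0, 0, WARSAW);
        ZonedDateTime end = ZonedDateTime.of(year, month, day, endHour, endMinute, 0, 0, WARSAW);
        return Duration.between(start, end);
    }

    public static void main(String[] args) {
        // The inputs from the example: 1:30 to 3:40 on 27 March 2016, across the time change.
        System.out.println(elapsed(2016, 3, 27, 1, 30, 3, 40)); // PT1H10M

        // The same wall-clock times one day later, with no time change in between.
        System.out.println(elapsed(2016, 3, 28, 1, 30, 3, 40)); // PT2H10M
    }
}
```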

org.joda.time.Duration represents length of time in milliseconds ↩
quote from Wikipedia: Daylight saving time ↩

Interpreting jstat’s number of Full GC events

2016-02-04T23:58:26+01:00

“You can’t control what you can’t measure”¹ and you want to control software you’re running. Nonetheless measuring is not enough, you also need to know how to interpret results of such measurements.

It’s not hard to find examples of misinterpreted measurement results, with the free² command output being one of the most commonly misinterpreted ones. Often a lack of understanding of the terminology used, or its ambiguity, is to blame. For example let’s look at jstat output for a Java process:

$ jstat -gc -t 4648 1s
Timestamp        S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
           19.4 8704.0 8704.0  0.0   8704.0 69952.0  12416.7   240928.0   173395.8  154352.0 149283.9 22180.0 20722.4     27    0.594   8      0.231    0.825

Anyone knowing JVM Garbage Collection basics should have no trouble telling what those values mean, right? Well let’s try: EC stands for Eden space Capacity, EU denotes Eden space Usage, S0C - Survivor space 0 Capacity, YGC - number of Young generation Garbage Collections, YGCT - Young generation Garbage Collection Time, and so on.

Pretty obvious, isn’t it? So let’s gather statistics for the old generation (and metaspace) and answer the question: how many Full GCs³ were performed?

$ jstat -gcold -t 5007 1s
Timestamp          MC       MU      CCSC     CCSU       OC          OU       YGC    FGC    FGCT     GCT
           11.7 113064.0 109246.8  16156.0  14883.7    174784.0    115285.4     13     4    0.079    0.368

If your answer is 4 you might be correct or far from it. It all depends on which garbage collector the monitored Java process was using. If it was using Parallel GC the answer would be 4, but since it was using CMS the answer is different: it is 2.

Let’s check the GC logs for the analysed Java process that uses CMS collector and see where the difference comes from:

0.408: [GC (Allocation Failure) 0.408: [ParNew: 69952K->8704K(78656K), 0.0396199 secs] 69952K->21786K(253440K), 0.0397117 secs] [Times: user=0.12 sys=0.02, real=0.04 secs]
1.213: [GC (Allocation Failure) 1.213: [ParNew: 78656K->8704K(78656K), 0.0115476 secs] 91738K->29967K(253440K), 0.0116237 secs] [Times: user=0.06 sys=0.00, real=0.00 secs]
2.152: [GC (Allocation Failure) 2.152: [ParNew: 78656K->8704K(78656K), 0.0176088 secs] 99919K->38548K(253440K), 0.0176831 secs] [Times: user=0.11 sys=0.00, real=0.02 secs]
2.170: [GC (CMS Initial Mark) [1 CMS-initial-mark: 29844K(174784K)] 39075K(253440K), 0.0021170 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2.172: [CMS-concurrent-mark-start]
2.185: [CMS-concurrent-mark: 0.013/0.013 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
2.185: [CMS-concurrent-preclean-start]
2.186: [CMS-concurrent-preclean: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2.186: [CMS-concurrent-abortable-preclean-start]
2.637: [CMS-concurrent-abortable-preclean: 0.169/0.451 secs] [Times: user=2.13 sys=0.05, real=0.45 secs]
2.637: [GC (CMS Final Remark) [YG occupancy: 54140 K (78656 K)]2.637: [Rescan (parallel) , 0.0284352 secs]2.666: [weak refs processing, 0.0001802 secs]2.666: [class unloading, 0.0051609 secs]2.671: [scrub symbol table, 0.0035550 secs]2.675: [scrub string table, 0.0008166 secs][1 CMS-remark: 29844K(174784K)] 83984K(253440K), 0.0391194 secs] [Times: user=0.21 sys=0.00, real=0.04 secs]
2.676: [CMS-concurrent-sweep-start]
2.688: [CMS-concurrent-sweep: 0.011/0.012 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
2.688: [CMS-concurrent-reset-start]
2.696: [CMS-concurrent-reset: 0.008/0.008 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]
3.044: [GC (Allocation Failure) 3.044: [ParNew: 78656K->8704K(78656K), 0.0251656 secs] 97836K->40288K(253440K), 0.0252453 secs] [Times: user=0.14 sys=0.00, real=0.03 secs]
3.677: [GC (Allocation Failure) 3.677: [ParNew: 78656K->8704K(78656K), 0.0159650 secs] 110240K->50554K(253440K), 0.0160374 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
4.851: [GC (Allocation Failure) 4.851: [ParNew: 78656K->8704K(78656K), 0.0172068 secs] 120506K->61047K(253440K), 0.0172778 secs] [Times: user=0.10 sys=0.00, real=0.02 secs]
6.191: [GC (Allocation Failure) 6.192: [ParNew: 78656K->8704K(78656K), 0.0271375 secs] 130999K->77488K(253440K), 0.0272281 secs] [Times: user=0.15 sys=0.00, real=0.03 secs]
6.219: [GC (CMS Initial Mark) [1 CMS-initial-mark: 68784K(174784K)] 78713K(253440K), 0.0030824 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
6.222: [CMS-concurrent-mark-start]
6.288: [CMS-concurrent-mark: 0.057/0.066 secs] [Times: user=0.39 sys=0.01, real=0.07 secs]
6.288: [CMS-concurrent-preclean-start]
6.291: [CMS-concurrent-preclean: 0.002/0.002 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
6.291: [CMS-concurrent-abortable-preclean-start]
7.113: [CMS-concurrent-abortable-preclean: 0.775/0.822 secs] [Times: user=3.79 sys=0.09, real=0.83 secs]
7.113: [GC (CMS Final Remark) [YG occupancy: 45871 K (78656 K)]7.113: [Rescan (parallel) , 0.0072989 secs]7.121: [weak refs processing, 0.0005665 secs]7.121: [class unloading, 0.0092666 secs]7.131: [scrub symbol table, 0.0150502 secs]7.146: [scrub string table, 0.0012746 secs][1 CMS-remark: 68784K(174784K)] 114655K(253440K), 0.0346254 secs] [Times: user=0.12 sys=0.01, real=0.03 secs]
7.148: [CMS-concurrent-sweep-start]
7.185: [CMS-concurrent-sweep: 0.035/0.037 secs] [Times: user=0.24 sys=0.00, real=0.04 secs]
7.185: [CMS-concurrent-reset-start]
7.193: [CMS-concurrent-reset: 0.008/0.008 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
8.242: [GC (Allocation Failure) 8.242: [ParNew: 78656K->8704K(78656K), 0.0260379 secs] 136080K->77426K(253440K), 0.0261191 secs] [Times: user=0.17 sys=0.00, real=0.03 secs]
8.893: [GC (Allocation Failure) 8.893: [ParNew: 78656K->8703K(78656K), 0.0166601 secs] 147378K->85555K(253440K), 0.0167357 secs] [Times: user=0.10 sys=0.00, real=0.01 secs]
9.279: [GC (Allocation Failure) 9.280: [ParNew: 78655K->5650K(78656K), 0.0039284 secs] 155507K->82501K(253440K), 0.0040061 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
9.905: [GC (Allocation Failure) 9.906: [ParNew: 75602K->8704K(78656K), 0.0143583 secs] 152453K->89562K(253440K), 0.0144558 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
10.602: [GC (Allocation Failure) 10.602: [ParNew: 78656K->8704K(78656K), 0.0364558 secs] 159514K->104422K(253440K), 0.0365486 secs] [Times: user=0.14 sys=0.00, real=0.04 secs]
11.523: [GC (Allocation Failure) 11.523: [ParNew: 78656K->8704K(78656K), 0.0380868 secs] 174374K->123989K(253440K), 0.0381740 secs] [Times: user=0.23 sys=0.00, real=0.04 secs]
12.274: [GC (Allocation Failure) 12.274: [ParNew: 78656K->8704K(78656K), 0.0313268 secs] 193941K->136954K(253440K), 0.0314313 secs] [Times: user=0.17 sys=0.00, real=0.03 secs]

The jstat man page says that FGC stands for the number of Full GC events. Not the number of garbage collection cycles but the number of garbage collection events. The significance of events becomes evident if we compare 4 Full GC events reported by jstat to 2 CMS cycles that we can observe in the GC logs.

Let’s take one more look at the GC logs, this time narrowing down our focus to a single CMS cycle:

2.170: [GC (CMS Initial Mark) [1 CMS-initial-mark: 29844K(174784K)] 39075K(253440K), 0.0021170 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2.172: [CMS-concurrent-mark-start]
2.185: [CMS-concurrent-mark: 0.013/0.013 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
2.185: [CMS-concurrent-preclean-start]
2.186: [CMS-concurrent-preclean: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2.186: [CMS-concurrent-abortable-preclean-start]
2.637: [CMS-concurrent-abortable-preclean: 0.169/0.451 secs] [Times: user=2.13 sys=0.05, real=0.45 secs]
2.637: [GC (CMS Final Remark) [YG occupancy: 54140 K (78656 K)]2.637: [Rescan (parallel) , 0.0284352 secs]2.666: [weak refs processing, 0.0001802 secs]2.666: [class unloading, 0.0051609 secs]2.671: [scrub symbol table, 0.0035550 secs]2.675: [scrub string table, 0.0008166 secs][1 CMS-remark: 29844K(174784K)] 83984K(253440K), 0.0391194 secs] [Times: user=0.21 sys=0.00, real=0.04 secs]
2.676: [CMS-concurrent-sweep-start]
2.688: [CMS-concurrent-sweep: 0.011/0.012 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
2.688: [CMS-concurrent-reset-start]
2.696: [CMS-concurrent-reset: 0.008/0.008 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]

We can see that CMS performs its work in several phases and most of them run concurrently with our application. Only Initial Mark and Final Remark are stop-the-world phases and those are the ones that get counted by jstat as Full GC events.
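The relation between pauses and cycles can be checked mechanically. The sketch below is based only on what the log above shows, not on jstat’s internals. It counts the two stop-the-world phases and the cycles in a handful of truncated lines taken from that log. The class and method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;

final class CmsPauseCounter {

    /** Counts stop-the-world CMS phases: Initial Mark and Final Remark. */
    static long stopTheWorldPauses(List<String> gcLogLines) {
        return gcLogLines.stream()
                .filter(line -> line.contains("(CMS Initial Mark)") || line.contains("(CMS Final Remark)"))
                .count();
    }

    /** Counts CMS cycles, assuming each cycle starts with exactly one Initial Mark. */
    static long cycles(List<String> gcLogLines) {
        return gcLogLines.stream()
                .filter(line -> line.contains("(CMS Initial Mark)"))
                .count();
    }

    public static void main(String[] args) {
        // Truncated excerpts of the log above (timestamps omitted).
        List<String> log = Arrays.asList(
                "[GC (CMS Initial Mark) [1 CMS-initial-mark: 29844K(174784K)] 39075K(253440K), 0.0021170 secs]",
                "[CMS-concurrent-mark-start]",
                "[GC (CMS Final Remark) [YG occupancy: 54140 K (78656 K)]",
                "[CMS-concurrent-sweep-start]",
                "[GC (Allocation Failure) 3.044: [ParNew: 78656K->8704K(78656K), 0.0251656 secs]",
                "[GC (CMS Initial Mark) [1 CMS-initial-mark: 68784K(174784K)] 78713K(253440K), 0.0030824 secs]",
                "[GC (CMS Final Remark) [YG occupancy: 45871 K (78656 K)]",
                "[CMS-concurrent-sweep-start]");

        System.out.println(stopTheWorldPauses(log)); // 4, matching the FGC column
        System.out.println(cycles(log));             // 2 CMS cycles
    }
}
```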

Whenever you measure something, make sure you know what you are really measuring, otherwise the results can lead you far from reality. Always validate your assumptions and RTFM!

Tom DeMarco ↩
Linux borrows unused memory for disk caching, which makes it look like you are low on memory while you are not ↩
while we speak about ambiguity it’s worth noting that there is no formal definition of Full GC nor Major GC in the JVM Specification (kudos to Bernd Eckenfels for pointing this out) ↩

Fine-tuning psql

2016-01-08T10:24:39+01:00

I like tools that allow you to stay efficient and one such tool is psql. I have not yet seen any other PostgreSQL client that can beat psql in terms of productivity. There are other clients that are better tailored for some types of activities but in terms of overall productivity none of them has been able to beat psql.

Ok, let’s skip the introduction and take a look at a fraction¹ of existing configuration options that allow you to tailor psql to your needs and get a perspective of what’s possible.

Environment-aware command history

Like any good shell, psql saves the commands you execute so that you can search and re-execute them. By default all commands are stored in a single file (~/.psql_history), but you can easily store command history per database, per user, per host, etc. For example, to store a separate command history per database, so that it is not overwritten by commands from a more frequently used database, just \set HISTFILE ~/.psql_history- :DBNAME in ~/.psqlrc.

Changing the prompt

Let’s \set PROMPT1 '%[%033[1m%]%M %n@%/%[%033[0m%]%# ' and see how it looks compared to the default one.

If you want to deconstruct the incantation we typed, it goes like this: %[%033[1m%] sets the font to bold black², then it prints the hostname (%M), username (%n) and database name (%/), then sets the font back to non-bold black (%[%033[0m%]) and prints # if the user is a superuser or > otherwise.

As you can probably guess besides the PROMPT1 there is also PROMPT2 and even PROMPT3. PROMPT1 is used when psql requests a new command, PROMPT2 is used when you input a multiline command and PROMPT3 is used when you are expected to type in row values while running SQL COPY.

Tab completion

If you want to use uppercase SQL keywords to make your queries more readable then tab completion is what you are looking for. Tab completion makes it a no-brainer once you \set COMP_KEYWORD_CASE upper.

Printing `NULL` values

By default psql prints NULL values as blank spaces, but you can alter it by setting \pset null '<null>'³.

By default psql uses a pager to paginate text when it deems it necessary, but you can make psql always use it (\pset pager always) or disable it altogether (\pset pager off). You can even change the pager itself by setting the PAGER environment variable.

Making configuration changes persistent

All of the mentioned options besides the command history file configuration can be set directly in an active psql session and are valid for its duration. To make configuration changes persistent they have to be set in ~/.psqlrc.
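Put together, the settings discussed in this article could form a ~/.psqlrc like the sketch below. Every command comes from the sections above. Choosing always for the pager is just one of the two options mentioned, picked here for illustration:

```
-- separate command history per database
\set HISTFILE ~/.psql_history- :DBNAME
-- bold prompt with host, user and database name
\set PROMPT1 '%[%033[1m%]%M %n@%/%[%033[0m%]%# '
-- tab completion produces uppercase SQL keywords
\set COMP_KEYWORD_CASE upper
-- make NULL values visible in query output
\pset null '<null>'
-- always page query output (use "off" to disable the pager)
\pset pager always
```

The comments sit on their own lines because psql would treat text following a backslash command on the same line as arguments to that command.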

for more configuration options consult PostgreSQL documentation ↩
it turns bold white on a black background ↩
we have to use pset instead of set because we’re affecting query output ↩