<p>Under the hood: Kamil Szymański's dev blog (Kamil Szymański, kamil.szymanski.dev@gmail.com)</p>
<p>Thread-local state availability in reactive services (2018-09-12, https://kamilszymanski.github.io/thread-local-state-availability-in-reactive-services)</p>
<p>Any architecture decision involves a trade-off.
It’s no different when you decide to go reactive: on the one hand, <a href="http://www.reactive-streams.org/">Reactive Streams</a> implementations give you better resource utilization almost out of the box; on the other hand, they make debugging harder.
Introducing reactive libraries also has a huge impact on your domain: it will no longer speak only in terms of <code class="language-plaintext highlighter-rouge">Payment</code>, <code class="language-plaintext highlighter-rouge">Order</code> or <code class="language-plaintext highlighter-rouge">Customer</code>; the reactive lingo will creep in, introducing <code class="language-plaintext highlighter-rouge">Flux<Payment></code>, <code class="language-plaintext highlighter-rouge">Flux<Order></code>, <code class="language-plaintext highlighter-rouge">Mono<Customer></code> (or <code class="language-plaintext highlighter-rouge">Observable<Payment></code>, <code class="language-plaintext highlighter-rouge">Flowable<Order></code>, <code class="language-plaintext highlighter-rouge">Single<Customer></code> or whatever Reactive Streams publishers your library of choice provides).
Such trade-offs quickly become evident, but as you can probably guess, not all of them will be so obvious; <a href="https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/">The Law of Leaky Abstractions</a> guarantees that.</p>
<p>Reactive libraries make it trivial to change the threading context.
You can easily subscribe on one scheduler, then execute part of the operator chain on another and finally hop onto a completely different one.
Such jumping from one thread to another works as long as no thread-local state is involved. You know, the kind of state you don’t usually deal with on a day-to-day basis although it powers crucial parts of your services (e.g. security, transactions, multitenancy).
Changing the threading context when a well-hidden part of your tech stack depends on thread-local state leads to bugs that are tricky to nail down.</p>
<p>Let me demonstrate the problem with a simple example:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="nc">Logger</span> <span class="no">LOG</span> <span class="o">=</span> <span class="nc">LoggerFactory</span><span class="o">.</span><span class="na">getLogger</span><span class="o">(</span><span class="nc">MethodHandles</span><span class="o">.</span><span class="na">lookup</span><span class="o">().</span><span class="na">lookupClass</span><span class="o">());</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="nc">String</span> <span class="no">SESSION_ID</span> <span class="o">=</span> <span class="s">"session-id"</span><span class="o">;</span>
<span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"documents/{id}"</span><span class="o">)</span>
<span class="nc">Mono</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="nf">getDocument</span><span class="o">(</span><span class="nd">@PathVariable</span><span class="o">(</span><span class="s">"id"</span><span class="o">)</span> <span class="nc">String</span> <span class="n">documentId</span><span class="o">)</span> <span class="o">{</span>
<span class="no">MDC</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="no">SESSION_ID</span><span class="o">,</span> <span class="no">UUID</span><span class="o">.</span><span class="na">randomUUID</span><span class="o">().</span><span class="na">toString</span><span class="o">());</span>
<span class="no">LOG</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Requested document[id={}]"</span><span class="o">,</span> <span class="n">documentId</span><span class="o">);</span>
<span class="k">return</span> <span class="nc">Mono</span><span class="o">.</span><span class="na">just</span><span class="o">(</span><span class="s">"Lorem ipsum"</span><span class="o">)</span>
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">doc</span> <span class="o">-></span> <span class="o">{</span>
<span class="no">LOG</span><span class="o">.</span><span class="na">debug</span><span class="o">(</span><span class="s">"Sanitizing document[id={}]"</span><span class="o">,</span> <span class="n">documentId</span><span class="o">);</span>
<span class="k">return</span> <span class="n">doc</span><span class="o">.</span><span class="na">trim</span><span class="o">();</span>
<span class="o">});</span>
<span class="o">}</span>
</code></pre></div></div>
<p>With <code class="language-plaintext highlighter-rouge">MDC.put(SESSION_ID, UUID.randomUUID().toString())</code> we’re putting <code class="language-plaintext highlighter-rouge">session-id</code> into the <a href="https://logback.qos.ch/manual/mdc.html">Mapped Diagnostic Context</a> of the underlying logging library so that we can log it later on.</p>
<p>Let’s configure the logging pattern so that it automatically logs <code class="language-plaintext highlighter-rouge">session-id</code> for us:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>logging.pattern.console=[%-28thread] [%-36mdc{session-id}] - %-5level - %msg%n
</code></pre></div></div>
<p>When we hit the exposed service with a request (<code class="language-plaintext highlighter-rouge">curl localhost:8080/documents/42</code>) we will see <code class="language-plaintext highlighter-rouge">session-id</code> appearing in the log entries:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[reactor-http-server-epoll-10] [00c4b05f-a6ee-4a7d-9f92-d9d53dbbb9d0] - INFO - Requested document[id=42]
[reactor-http-server-epoll-10] [00c4b05f-a6ee-4a7d-9f92-d9d53dbbb9d0] - DEBUG - Sanitizing document[id=42]
</code></pre></div></div>
<p>The situation changes if we switch the execution context (e.g. by subscribing on a different scheduler) after <code class="language-plaintext highlighter-rouge">session-id</code> is put into MDC:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"documents/{id}"</span><span class="o">)</span>
<span class="nc">Mono</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="nf">getDocument</span><span class="o">(</span><span class="nd">@PathVariable</span><span class="o">(</span><span class="s">"id"</span><span class="o">)</span> <span class="nc">String</span> <span class="n">documentId</span><span class="o">)</span> <span class="o">{</span>
<span class="no">MDC</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="no">SESSION_ID</span><span class="o">,</span> <span class="no">UUID</span><span class="o">.</span><span class="na">randomUUID</span><span class="o">().</span><span class="na">toString</span><span class="o">());</span>
<span class="no">LOG</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Requested document[id={}]"</span><span class="o">,</span> <span class="n">documentId</span><span class="o">);</span>
<span class="k">return</span> <span class="nc">Mono</span><span class="o">.</span><span class="na">just</span><span class="o">(</span><span class="s">"Lorem ipsum"</span><span class="o">)</span>
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">doc</span> <span class="o">-></span> <span class="o">{</span>
<span class="no">LOG</span><span class="o">.</span><span class="na">debug</span><span class="o">(</span><span class="s">"Sanitizing document[id={}]"</span><span class="o">,</span> <span class="n">documentId</span><span class="o">);</span>
<span class="k">return</span> <span class="n">doc</span><span class="o">.</span><span class="na">trim</span><span class="o">();</span>
<span class="o">})</span>
<span class="o">.</span><span class="na">subscribeOn</span><span class="o">(</span><span class="nc">Schedulers</span><span class="o">.</span><span class="na">elastic</span><span class="o">());</span> <span class="c1">// don't use schedulers with unbounded thread pool in production</span>
<span class="o">}</span>
</code></pre></div></div>
<p>After the execution context changes we will notice that <code class="language-plaintext highlighter-rouge">session-id</code> is missing from the log entries written by operators running on that scheduler:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[reactor-http-server-epoll-10] [c2ceae03-593e-4fb3-bbfa-bc4970322e44] - INFO - Requested document[id=42]
[elastic-2                   ] [                                    ] - DEBUG - Sanitizing document[id=42]
</code></pre></div></div>
<p>As you can probably guess there’s some <code class="language-plaintext highlighter-rouge">ThreadLocal</code> hidden deep <a href="https://github.com/qos-ch/logback/blob/master/logback-classic/src/main/java/ch/qos/logback/classic/util/LogbackMDCAdapter.java">inside the logging library</a> we’re using.</p>
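<p>The same effect can be reproduced with nothing but the JDK. The following is a minimal sketch, not the logging library’s actual code: <code class="language-plaintext highlighter-rouge">SESSION_ID</code> is a hypothetical stand-in for the per-thread storage behind MDC, and <code class="language-plaintext highlighter-rouge">readOnWorkerThread</code> is a helper made up for this illustration.</p>

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalVisibilityDemo {
    // Hypothetical stand-in for the per-thread storage a logging library keeps for MDC.
    static final ThreadLocal<String> SESSION_ID = new ThreadLocal<>();

    // Reads the thread-local on a separate worker thread and returns what that thread sees.
    static String readOnWorkerThread() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = SESSION_ID::get;
            return worker.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        SESSION_ID.set("00c4b05f");
        // The calling thread sees the value it has just stored.
        System.out.println("caller thread: " + SESSION_ID.get());
        // The worker thread has its own, empty slot, so it sees null.
        System.out.println("worker thread: " + readOnWorkerThread());
    }
}
```

<p>Nothing is copied between threads unless somebody does it explicitly, which is exactly what goes missing when a scheduler moves the work elsewhere.</p>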
<p>Some Reactive Streams implementations provide mechanisms for making contextual data available to operators (e.g. <a href="https://projectreactor.io/">Project Reactor</a> provides a <a href="https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Mono.html#subscriberContext-reactor.util.context.Context-">subscriber context</a>):</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"documents/{id}"</span><span class="o">)</span>
<span class="nc">Mono</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="nf">getDocument4</span><span class="o">(</span><span class="nd">@PathVariable</span><span class="o">(</span><span class="s">"id"</span><span class="o">)</span> <span class="nc">String</span> <span class="n">documentId</span><span class="o">)</span> <span class="o">{</span>
<span class="nc">String</span> <span class="n">sessionId</span> <span class="o">=</span> <span class="no">UUID</span><span class="o">.</span><span class="na">randomUUID</span><span class="o">().</span><span class="na">toString</span><span class="o">();</span>
<span class="no">MDC</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="no">SESSION_ID</span><span class="o">,</span> <span class="n">sessionId</span><span class="o">);</span>
<span class="no">LOG</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Requested document[id={}]"</span><span class="o">,</span> <span class="n">documentId</span><span class="o">);</span>
<span class="k">return</span> <span class="nc">Mono</span><span class="o">.</span><span class="na">just</span><span class="o">(</span><span class="s">"Lorem ipsum"</span><span class="o">)</span>
<span class="o">.</span><span class="na">zipWith</span><span class="o">(</span><span class="nc">Mono</span><span class="o">.</span><span class="na">subscriberContext</span><span class="o">())</span>
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">docAndCtxTuple</span> <span class="o">-></span> <span class="o">{</span>
<span class="k">try</span><span class="o">(</span><span class="no">MDC</span><span class="o">.</span><span class="na">MDCCloseable</span> <span class="n">mdc</span> <span class="o">=</span> <span class="no">MDC</span><span class="o">.</span><span class="na">putCloseable</span><span class="o">(</span><span class="no">SESSION_ID</span><span class="o">,</span> <span class="n">docAndCtxTuple</span><span class="o">.</span><span class="na">getT2</span><span class="o">().</span><span class="na">get</span><span class="o">(</span><span class="no">SESSION_ID</span><span class="o">)))</span> <span class="o">{</span>
<span class="no">LOG</span><span class="o">.</span><span class="na">debug</span><span class="o">(</span><span class="s">"Sanitizing document[id={}]"</span><span class="o">,</span> <span class="n">documentId</span><span class="o">);</span>
<span class="k">return</span> <span class="n">docAndCtxTuple</span><span class="o">.</span><span class="na">getT1</span><span class="o">().</span><span class="na">trim</span><span class="o">();</span>
<span class="o">}})</span>
<span class="o">.</span><span class="na">subscriberContext</span><span class="o">(</span><span class="nc">Context</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="no">SESSION_ID</span><span class="o">,</span> <span class="n">sessionId</span><span class="o">))</span>
<span class="o">.</span><span class="na">subscribeOn</span><span class="o">(</span><span class="nc">Schedulers</span><span class="o">.</span><span class="na">elastic</span><span class="o">());</span> <span class="c1">// don't use schedulers with unbounded thread pool in production</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Of course making data available is just part of the story.
Once we make <code class="language-plaintext highlighter-rouge">session-id</code> available (<code class="language-plaintext highlighter-rouge">subscriberContext(Context.of(SESSION_ID, sessionId))</code>) we not only have to retrieve it but also attach it back to the threading context as well as remember to clean up after ourselves since schedulers are free to reuse threads.</p>
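<p>Stripped of Reactor and MDC, the capture, attach and clean-up steps could look roughly like the sketch below. It only uses the JDK; <code class="language-plaintext highlighter-rouge">SESSION_ID</code>, <code class="language-plaintext highlighter-rouge">withSession</code> and <code class="language-plaintext highlighter-rouge">runOn</code> are hypothetical names introduced for this illustration, and a single-threaded pool is assumed so that thread reuse is guaranteed.</p>

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SessionPropagation {
    // Hypothetical stand-in for the logging library's thread-local storage.
    static final ThreadLocal<String> SESSION_ID = new ThreadLocal<>();

    // Captures the caller's value now and attaches it on whichever thread later runs the task.
    static <T> Callable<T> withSession(Callable<T> task) {
        String captured = SESSION_ID.get();
        return () -> {
            String previous = SESSION_ID.get();
            SESSION_ID.set(captured);
            try {
                return task.call();
            } finally {
                // Clean up so that a reused thread does not leak the value into the next task.
                if (previous == null) {
                    SESSION_ID.remove();
                } else {
                    SESSION_ID.set(previous);
                }
            }
        };
    }

    // Runs the task on the given pool and waits for its result.
    static <T> T runOn(ExecutorService pool, Callable<T> task) {
        try {
            return pool.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            SESSION_ID.set("24351524");
            String propagated = runOn(pool, withSession(SESSION_ID::get));
            String afterwards = runOn(pool, SESSION_ID::get);
            System.out.println(propagated); // the caller's session id
            System.out.println(afterwards); // null, nothing was left behind on the pooled thread
        } finally {
            pool.shutdown();
        }
    }
}
```

<p>The second task runs on the same pooled thread as the first one, so it would see a stale value if the <code class="language-plaintext highlighter-rouge">finally</code> block were missing.</p>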
<p>The presented implementation brings <code class="language-plaintext highlighter-rouge">session-id</code> back:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[reactor-http-server-epoll-10] [24351524-f105-4746-8e06-b165036d02e6] - INFO - Requested document[id=42]
[elastic-2                   ] [24351524-f105-4746-8e06-b165036d02e6] - DEBUG - Sanitizing document[id=42]
</code></pre></div></div>
<p>Nonetheless, the code that makes it work is too complex and too invasive to be welcomed with open arms in most codebases, especially if it ends up scattered across the codebase.</p>
<p>I’d love to finish this blog post by providing a simple solution to that problem, but I haven’t stumbled upon one yet. For now we need to live with such more complex and invasive solutions, while also trying to move this complexity from the business-focused parts of the software down to its infrastructural parts and, if possible, directly into the libraries themselves.</p>
<p>All non-trivial abstractions, to some degree, are leaky</p>
<p>Refactoring stringly-typed systems (2018-01-21, https://kamilszymanski.github.io/refactoring-stringly-typed-systems)</p>
<p>Last year I joined a project that was taken over from another software house that had failed to satisfy the client’s demands.
As you can probably tell, there were many things that could and should be improved in that “inherited” project and its codebase.
Sadly (but not surprisingly) the domain model was one of those orphaned, long-forgotten areas that screamed for help the loudest.</p>
<p>We knew we needed to get our hands dirty, but how do you improve the domain model in an unfamiliar project where everything is so mixed up, tangled and overgrown with accidental complexity?
You set boundaries (divide and conquer!), apply small improvements in one area, then move on to the next while getting to know the landscape and discovering bigger issues that hide behind those scary, obvious things that hurt your eyes at first sight.
You would be surprised how much you can achieve by making small improvements and picking low-hanging fruit, yet at the same time you would be a fool to think that they could solve major issues that have grown there due to the lack of (or not enough) modeling effort right from the dawn of the project.
Nevertheless, without those small improvements it would be way harder to tackle most of the major domain model issues.</p>
<p>For me, bringing more expressiveness and type-safety into code by introducing simple value objects has always been one of the lowest-hanging fruits.
It’s a trick that always works, especially when dealing with codebases reeking of the primitive obsession code smell, and the system in question was a stringly-typed one.
It was full of code looking like this:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kt">void</span> <span class="nf">verifyAccountOwnership</span><span class="o">(</span><span class="nc">String</span> <span class="n">accountId</span><span class="o">,</span> <span class="nc">String</span> <span class="n">customerId</span><span class="o">)</span> <span class="o">{...}</span>
</code></pre></div></div>
<p>while I bet everyone would prefer it to look more like this:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kt">void</span> <span class="nf">verifyAccountOwnership</span><span class="o">(</span><span class="nc">AccountId</span> <span class="n">accountId</span><span class="o">,</span> <span class="nc">CustomerId</span> <span class="n">customerId</span><span class="o">)</span> <span class="o">{...}</span>
</code></pre></div></div>
<p>It’s not rocket science!
I’d say it’s a no-brainer, and it always surprises me how easy it is to find implementations operating on e.g. vague, contextless BigDecimals instead of Amounts, Quantities or Percentages.</p>
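<p>To make that concrete, here is a minimal sketch of what a <code class="language-plaintext highlighter-rouge">Percentage</code> could look like. It is not code from the project: the 0 to 100 range, the <code class="language-plaintext highlighter-rouge">applyTo</code> method and the rounding to 2 decimal places are assumptions made purely for illustration.</p>

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class PercentageDemo {
    public static void main(String[] args) {
        // 15% of 200, rounded to 2 decimal places
        System.out.println(Percentage.of("15").applyTo(new BigDecimal("200"))); // 30.00
    }
}

final class Percentage {
    private static final BigDecimal HUNDRED = BigDecimal.valueOf(100);
    private final BigDecimal value;

    private Percentage(BigDecimal value) {
        // Assumed invariant for this sketch: a percentage lies between 0 and 100 inclusive.
        if (value.signum() < 0 || value.compareTo(HUNDRED) > 0) {
            throw new IllegalArgumentException("percentage must be between 0 and 100");
        }
        this.value = value;
    }

    public static Percentage of(String value) {
        return new Percentage(new BigDecimal(value));
    }

    // Returns this share of the given amount, rounded half-up to 2 decimal places.
    public BigDecimal applyTo(BigDecimal amount) {
        return amount.multiply(value).divide(HUNDRED, 2, RoundingMode.HALF_UP);
    }
}
```

<p>A method that accepts a <code class="language-plaintext highlighter-rouge">Percentage</code> no longer has to wonder whether the number it got means 15, 0.15 or something out of range.</p>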
<p>Code that uses domain-specific value objects instead of contextless primitives is:</p>
<ul>
<li>much more expressive (you don’t need to map strings to customer identifiers in your head nor worry about any of those strings being an empty string)</li>
<li>easier to grasp (invariants are protected in one place instead of being scattered all around the codebase in ubiquitous if statements)</li>
<li>less buggy (did I put all of those strings in the right order?)</li>
<li>easier to develop (explicit definitions are more obvious and invariants are protected right where you would expect them)</li>
<li>faster to develop (IDE offers much more help and the compiler provides fast feedback cycles)</li>
</ul>
<p>and those are just a few of the things you get almost for free (you just have to use common sense ^^).</p>
<p>Refactoring towards value objects sounds like a piece of cake (naming things is not taken into account here): you simply extract a class here, migrate a type there, nothing spectacular.
It usually is that simple, especially when the code you have to deal with lives inside a single code repository and runs in a single process.
This time however it wasn’t so trivial.
Not that it was much more complicated, it just required a tiny bit more thinking (and it makes for a nice piece of work to be described ^^).</p>
<p>It was a distributed system that had service boundaries set in the wrong places and shared too much code (including the model) between services.
The boundaries were set so badly that many crucial operations in the system required numerous interactions (mostly synchronous) with multiple services.
There’s a (not so big) challenge in applying the mentioned refactoring in such a context in a way that doesn’t end up as an exercise in creating unnecessary layers and introducing accidental complexity at service boundaries.
Before jumping into refactoring I had to set some rules, or rather one crucial rule: no changes should be visible from outside the service, including to backing services.
To put it simply, all published contracts stay the same and no changes are required on the backing services’ side (e.g. no database schema changes).
Easily said and, frankly speaking, easily done with a bit of dull work.</p>
<p>Let’s take <code class="language-plaintext highlighter-rouge">String accountId</code> for a ride and demonstrate the necessary steps.
We want to turn code like this:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">Account</span> <span class="o">{</span>
<span class="kd">private</span> <span class="nc">String</span> <span class="n">accountId</span><span class="o">;</span>
<span class="c1">// rest omitted for brevity</span>
<span class="o">}</span>
</code></pre></div></div>
<p>into this:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">Account</span> <span class="o">{</span>
<span class="kd">private</span> <span class="nc">AccountId</span> <span class="n">accountId</span><span class="o">;</span>
<span class="c1">// rest omitted for brevity</span>
<span class="o">}</span>
</code></pre></div></div>
<p>This can be achieved by introducing the AccountId value object:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@ToString</span>
<span class="nd">@EqualsAndHashCode</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">AccountId</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="nc">String</span> <span class="n">accountId</span><span class="o">;</span>
<span class="kd">private</span> <span class="nf">AccountId</span><span class="o">(</span><span class="nc">String</span> <span class="n">accountId</span><span class="o">)</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(</span><span class="n">accountId</span> <span class="o">==</span> <span class="kc">null</span> <span class="o">||</span> <span class="n">accountId</span><span class="o">.</span><span class="na">isEmpty</span><span class="o">())</span> <span class="o">{</span>
<span class="k">throw</span> <span class="k">new</span> <span class="nf">IllegalArgumentException</span><span class="o">(</span><span class="s">"accountId cannot be null nor empty"</span><span class="o">);</span>
<span class="o">}</span>
<span class="c1">// can account ID be 20 characters long?</span>
<span class="c1">// are special characters allowed?</span>
<span class="c1">// can I put a new line feed in the account ID?</span>
<span class="k">this</span><span class="o">.</span><span class="na">accountId</span> <span class="o">=</span> <span class="n">accountId</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="nc">AccountId</span> <span class="nf">of</span><span class="o">(</span><span class="nc">String</span> <span class="n">accountId</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">AccountId</span><span class="o">(</span><span class="n">accountId</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="nc">String</span> <span class="nf">asString</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">accountId</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>AccountId is just a value object: it has no identity and it doesn’t change over time, hence it is immutable.
It performs all validations in a single place and fails fast on incorrect inputs by refusing to instantiate AccountId, instead of failing later on an if statement buried several layers down the call stack.
If it needs to protect any invariants, you know where to put them and where to look for them.</p>
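<p>For readers without Lombok at hand, the sketch below spells out by hand roughly what <code class="language-plaintext highlighter-rouge">@EqualsAndHashCode</code> and <code class="language-plaintext highlighter-rouge">@ToString</code> are used for above, and shows the value semantics and the fail-fast behaviour in action. The hand-written methods are an approximation for illustration, not the code Lombok generates, and <code class="language-plaintext highlighter-rouge">AccountIdDemo</code> is a hypothetical driver class.</p>

```java
import java.util.Objects;

class AccountIdDemo {
    public static void main(String[] args) {
        // Two instances wrapping the same string are interchangeable.
        System.out.println(AccountId.of("42").equals(AccountId.of("42"))); // true
        try {
            AccountId.of(""); // rejected right at the boundary
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // accountId cannot be null nor empty
        }
    }
}

final class AccountId {
    private final String accountId;

    private AccountId(String accountId) {
        if (accountId == null || accountId.isEmpty()) {
            throw new IllegalArgumentException("accountId cannot be null nor empty");
        }
        this.accountId = accountId;
    }

    public static AccountId of(String accountId) {
        return new AccountId(accountId);
    }

    public String asString() {
        return accountId;
    }

    // Equality is based solely on the wrapped value, never on object identity.
    @Override
    public boolean equals(Object other) {
        return other instanceof AccountId && accountId.equals(((AccountId) other).accountId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(accountId);
    }

    @Override
    public String toString() {
        return "AccountId(accountId=" + accountId + ")";
    }
}
```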
<p>So far so good, but what if AccountId needs to be persisted in a database?
Well, you just implement an attribute converter:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">AccountIdConverter</span> <span class="kd">implements</span> <span class="nc">AttributeConverter</span><span class="o"><</span><span class="nc">AccountId</span><span class="o">,</span> <span class="nc">String</span><span class="o">></span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="nc">String</span> <span class="nf">convertToDatabaseColumn</span><span class="o">(</span><span class="nc">AccountId</span> <span class="n">accountId</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">accountId</span><span class="o">.</span><span class="na">asString</span><span class="o">();</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="nc">AccountId</span> <span class="nf">convertToEntityAttribute</span><span class="o">(</span><span class="nc">String</span> <span class="n">accountId</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">AccountId</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">accountId</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Then you enable the converter by either <code class="language-plaintext highlighter-rouge">@Converter(autoApply = true)</code> set directly on the converter implementation or <code class="language-plaintext highlighter-rouge">@Convert(converter = AccountIdConverter.class)</code> set on the entity field.</p>
<p>Of course not everything revolves around databases, and luckily, among the many not-so-good design decisions made in that project there were also many good ones.
One such good decision was to standardize the data format used for out-of-process communication.
In this case it was JSON, hence I needed to make the JSON payload immune to the refactoring.
The easiest way (if you use Jackson) is to sprinkle the implementation with a couple of Jackson annotations:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">AccountId</span> <span class="o">{</span>
<span class="nd">@JsonCreator</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="nc">AccountId</span> <span class="nf">of</span><span class="o">(</span><span class="nd">@JsonProperty</span><span class="o">(</span><span class="s">"accountId"</span><span class="o">)</span> <span class="nc">String</span> <span class="n">accountId</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">AccountId</span><span class="o">(</span><span class="n">accountId</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@JsonValue</span>
<span class="kd">public</span> <span class="nc">String</span> <span class="nf">asString</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">accountId</span><span class="o">;</span>
<span class="o">}</span>
<span class="c1">// rest omitted for brevity</span>
<span class="o">}</span>
</code></pre></div></div>
<p>I started with the easiest solution.
It wasn’t ideal but it was good enough and at that time we had more important issues to deal with.
With both JSON serialization and database type conversion taken care of, in less than 3 hours I had moved the first 2 services from stringly-typed identifiers to value-object-based ones for the identifiers most commonly used within the system.
It took that long for 2 reasons.</p>
<p>The first one was obvious: along the way I had to check whether null values were possible (and if they were, state that explicitly).
Without this the whole refactoring would be just a code polishing exercise.</p>
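<p>Where a column does turn out to be nullable, one place to state that explicitly is the persistence boundary. The sketch below is an assumption-laden illustration rather than what we shipped: the <code class="language-plaintext highlighter-rouge">AttributeConverter</code> interface declared here is a local stand-in mirroring the shape of the JPA one so that the example compiles on its own, <code class="language-plaintext highlighter-rouge">NullTolerantAccountIdConverter</code> is a hypothetical name, and it assumes the persistence provider may hand <code class="language-plaintext highlighter-rouge">null</code> to the converter for an empty column.</p>

```java
class ConverterDemo {
    public static void main(String[] args) {
        NullTolerantAccountIdConverter converter = new NullTolerantAccountIdConverter();
        System.out.println(converter.convertToEntityAttribute("42").asString()); // 42
        System.out.println(converter.convertToEntityAttribute(null));            // null
    }
}

// Local stand-in with the same two methods as the JPA AttributeConverter interface.
interface AttributeConverter<X, Y> {
    Y convertToDatabaseColumn(X attribute);
    X convertToEntityAttribute(Y dbData);
}

// Trimmed-down AccountId, just enough for the converter to work with.
final class AccountId {
    private final String accountId;

    private AccountId(String accountId) {
        if (accountId == null || accountId.isEmpty()) {
            throw new IllegalArgumentException("accountId cannot be null nor empty");
        }
        this.accountId = accountId;
    }

    public static AccountId of(String accountId) {
        return new AccountId(accountId);
    }

    public String asString() {
        return accountId;
    }
}

class NullTolerantAccountIdConverter implements AttributeConverter<AccountId, String> {
    @Override
    public String convertToDatabaseColumn(AccountId accountId) {
        // A missing identifier is stored as NULL instead of causing a NullPointerException.
        return accountId == null ? null : accountId.asString();
    }

    @Override
    public AccountId convertToEntityAttribute(String accountId) {
        // A NULL column becomes a missing identifier; AccountId itself still never wraps null.
        return accountId == null ? null : AccountId.of(accountId);
    }
}
```

<p>The value object keeps rejecting null on its own; only the converter knows that this particular column is allowed to be empty.</p>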
<p>The second one was something I almost missed - do you remember the requirement that the change should not be visible from the outside?
After turning the account ID into a value object, the Swagger definitions changed as well: the account ID was no longer a string but an object.
This was also easy to fix; it just required specifying a Swagger model substitution.
In case of <a href="https://github.com/kongchen/swagger-maven-plugin">swagger-maven-plugin</a> all you need to do is <a href="https://github.com/kongchen/swagger-maven-plugin#model-substitution">feed it with the file containing model substitution mappings</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>com.example.AccountId: java.lang.String
</code></pre></div></div>
<p>Was the result of this refactoring a significant improvement?
Not really, but you improve a lot by making lots of small improvements.
Still, it wasn’t a tiny improvement either: it brought a lot of clarity into the code and made further improvements easier.
Was it worth the effort? I would definitely say yes.
A good indicator of this is that other teams adopted that approach.</p>
<p>Fast-forward a few sprints: having solved some of the more important issues and having started turning the inherited, heavily tangled mess into a somewhat nicer solution based on hexagonal architecture, the time came to deal with the drawbacks of the easiest approach we had taken to supporting JSON serialization.
What we needed to do was decouple the AccountId domain object from things not related to the domain.
Namely, we had to move the part defining how to serialize this value object out of the domain and remove the domain’s coupling to Jackson.
In order to achieve that we created a Jackson Module that handled AccountId serialization:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">AccountIdSerializer</span> <span class="kd">extends</span> <span class="nc">StdSerializer</span><span class="o"><</span><span class="nc">AccountId</span><span class="o">></span> <span class="o">{</span>
<span class="nc">AccountIdSerializer</span><span class="o">()</span> <span class="o">{</span>
<span class="kd">super</span><span class="o">(</span><span class="nc">AccountId</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">serialize</span><span class="o">(</span><span class="nc">AccountId</span> <span class="n">accountId</span><span class="o">,</span> <span class="nc">JsonGenerator</span> <span class="n">generator</span><span class="o">,</span> <span class="nc">SerializerProvider</span> <span class="n">provider</span><span class="o">)</span> <span class="kd">throws</span> <span class="nc">IOException</span> <span class="o">{</span>
<span class="n">generator</span><span class="o">.</span><span class="na">writeString</span><span class="o">(</span><span class="n">accountId</span><span class="o">.</span><span class="na">asString</span><span class="o">());</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="kd">class</span> <span class="nc">AccountIdDeserializer</span> <span class="kd">extends</span> <span class="nc">StdDeserializer</span><span class="o"><</span><span class="nc">AccountId</span><span class="o">></span> <span class="o">{</span>
<span class="nc">AccountIdDeserializer</span><span class="o">()</span> <span class="o">{</span>
<span class="kd">super</span><span class="o">(</span><span class="nc">AccountId</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="nc">AccountId</span> <span class="nf">deserialize</span><span class="o">(</span><span class="nc">JsonParser</span> <span class="n">json</span><span class="o">,</span> <span class="nc">DeserializationContext</span> <span class="n">cxt</span><span class="o">)</span> <span class="kd">throws</span> <span class="nc">IOException</span> <span class="o">{</span>
<span class="nc">String</span> <span class="n">accountId</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="na">readValueAs</span><span class="o">(</span><span class="nc">String</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="k">return</span> <span class="nc">AccountId</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">accountId</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="kd">class</span> <span class="nc">AccountIdSerializationModule</span> <span class="kd">extends</span> <span class="nc">Module</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">setupModule</span><span class="o">(</span><span class="nc">SetupContext</span> <span class="n">setupContext</span><span class="o">)</span> <span class="o">{</span>
<span class="n">setupContext</span><span class="o">.</span><span class="na">addSerializers</span><span class="o">(</span><span class="n">createSerializers</span><span class="o">());</span>
<span class="n">setupContext</span><span class="o">.</span><span class="na">addDeserializers</span><span class="o">(</span><span class="n">createDeserializers</span><span class="o">());</span>
<span class="o">}</span>
<span class="kd">private</span> <span class="nc">Serializers</span> <span class="nf">createSerializers</span><span class="o">()</span> <span class="o">{</span>
<span class="nc">SimpleSerializers</span> <span class="n">serializers</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">SimpleSerializers</span><span class="o">();</span>
<span class="n">serializers</span><span class="o">.</span><span class="na">addSerializer</span><span class="o">(</span><span class="k">new</span> <span class="nc">AccountIdSerializer</span><span class="o">());</span>
<span class="k">return</span> <span class="n">serializers</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">private</span> <span class="nc">Deserializers</span> <span class="nf">createDeserializers</span><span class="o">()</span> <span class="o">{</span>
<span class="nc">SimpleDeserializers</span> <span class="n">deserializers</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">SimpleDeserializers</span><span class="o">();</span>
<span class="n">deserializers</span><span class="o">.</span><span class="na">addDeserializer</span><span class="o">(</span><span class="nc">AccountId</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="k">new</span> <span class="nc">AccountIdDeserializer</span><span class="o">());</span>
<span class="k">return</span> <span class="n">deserializers</span><span class="o">;</span>
<span class="o">}</span>
<span class="c1">// rest omitted for brevity</span>
<span class="o">}</span>
</code></pre></div></div>
<p>If you’re using Spring Boot, configuring such a module simply requires registering it in the application context:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@Configuration</span>
<span class="kd">class</span> <span class="nc">JacksonConfig</span> <span class="o">{</span>
<span class="nd">@Bean</span>
<span class="nc">Module</span> <span class="nf">accountIdSerializationModule</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">AccountIdSerializationModule</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Implementing custom serializers was also something we needed because along with all these improvements we identified more value objects, some of them a bit more complex - but that’s something for another article.</p>Kamil Szymańskikamil.szymanski.dev@gmail.comPicking low-hanging fruits on the way to improve domain modelControlling parallelism level of Java parallel streams2017-10-16T08:21:23+02:002017-10-16T08:21:23+02:00https://kamilszymanski.github.io/controlling-parallelism-level-of-java-parallel-streams<p>With the recent Java 9 release we got many new goodies to play with and to improve our solutions with once we grasp those new features.
The release of Java 9 is also a good time to check whether we have really grasped the Java 8 features.</p>
<p>In this post I’d like to bust the most common misconception about Java parallel streams.
It’s often said that you cannot control a parallel stream’s parallelism level programmatically, that parallel streams always run on the shared ForkJoinPool.commonPool() and there’s nothing you can do about it.
This is the case if you make your stream parallel by just adding a parallel() call to the call chain.
That might be sufficient in some cases, e.g. if you perform only lightweight operations on that stream. However, if you need more control over your stream’s parallel execution, you need to do a bit more than just call parallel().</p>
<p>Instead of diving into theory and technicalities let’s jump straight to a self-documenting example.</p>
<p>Given a parallel stream that is processed on the shared ForkJoinPool.commonPool():</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">Set</span><span class="o"><</span><span class="nc">FormattedMessage</span><span class="o">></span> <span class="nf">formatMessages</span><span class="o">(</span><span class="nc">Set</span><span class="o"><</span><span class="nc">RawMessage</span><span class="o">></span> <span class="n">messages</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">messages</span><span class="o">.</span><span class="na">stream</span><span class="o">()</span>
<span class="o">.</span><span class="na">parallel</span><span class="o">()</span>
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="nl">MessageFormatter:</span><span class="o">:</span><span class="n">format</span><span class="o">)</span>
<span class="o">.</span><span class="na">collect</span><span class="o">(</span><span class="n">toSet</span><span class="o">());</span>
<span class="o">}</span>
</code></pre></div></div>
<p>let’s move parallel processing to a pool that we can control and don’t have to share:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="kt">int</span> <span class="no">PARALLELISM_LEVEL</span> <span class="o">=</span> <span class="mi">8</span><span class="o">;</span>
<span class="nc">Set</span><span class="o"><</span><span class="nc">FormattedMessage</span><span class="o">></span> <span class="nf">formatMessages</span><span class="o">(</span><span class="nc">Set</span><span class="o"><</span><span class="nc">RawMessage</span><span class="o">></span> <span class="n">messages</span><span class="o">)</span> <span class="o">{</span>
<span class="nc">ForkJoinPool</span> <span class="n">forkJoinPool</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">ForkJoinPool</span><span class="o">(</span><span class="no">PARALLELISM_LEVEL</span><span class="o">);</span>
<span class="k">try</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">forkJoinPool</span><span class="o">.</span><span class="na">submit</span><span class="o">(()</span> <span class="o">-></span> <span class="n">formatMessagesInParallel</span><span class="o">(</span><span class="n">messages</span><span class="o">))</span>
<span class="o">.</span><span class="na">get</span><span class="o">();</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">InterruptedException</span> <span class="o">|</span> <span class="nc">ExecutionException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
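<span class="c1">// handle exceptions, e.g. by rethrowing; the method has to return or throw on this path</span>
<span class="k">throw</span> <span class="k">new</span> <span class="nc">IllegalStateException</span><span class="o">(</span><span class="s">"Formatting messages failed"</span><span class="o">,</span> <span class="n">e</span><span class="o">);</span>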
<span class="o">}</span> <span class="k">finally</span> <span class="o">{</span>
<span class="n">forkJoinPool</span><span class="o">.</span><span class="na">shutdown</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="kd">private</span> <span class="nc">Set</span><span class="o"><</span><span class="nc">FormattedMessage</span><span class="o">></span> <span class="nf">formatMessagesInParallel</span><span class="o">(</span><span class="nc">Set</span><span class="o"><</span><span class="nc">RawMessage</span><span class="o">></span> <span class="n">messages</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">messages</span><span class="o">.</span><span class="na">stream</span><span class="o">()</span>
<span class="o">.</span><span class="na">parallel</span><span class="o">()</span>
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="nl">MessageFormatter:</span><span class="o">:</span><span class="n">format</span><span class="o">)</span>
<span class="o">.</span><span class="na">collect</span><span class="o">(</span><span class="n">toSet</span><span class="o">());</span>
<span class="o">}</span>
</code></pre></div></div>
<p>In this example we’re interested only in the parallelism level of the ForkJoinPool though we can also control ThreadFactory and UncaughtExceptionHandler if needed.</p>
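<p>As a minimal sketch of that last remark, the four-argument ForkJoinPool constructor accepts a worker thread factory, an uncaught exception handler and an async mode flag in addition to the parallelism level. The class name, the thread name prefix and the helper method below are hypothetical and exist only for illustration; the helper reports which threads actually executed the stream so that you can check that the work stays inside the custom pool.</p>

```java
import java.lang.Thread.UncaughtExceptionHandler;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinPool.ForkJoinWorkerThreadFactory;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class CustomPoolExample {

    // Hypothetical prefix, used only to make the pool's threads recognizable
    static final String THREAD_NAME_PREFIX = "message-formatter-";

    static ForkJoinPool createPool(int parallelism) {
        AtomicInteger threadCounter = new AtomicInteger();
        // Delegate thread creation to the default factory and only rename the worker
        ForkJoinWorkerThreadFactory threadFactory = pool -> {
            ForkJoinWorkerThread worker =
                    ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
            worker.setName(THREAD_NAME_PREFIX + threadCounter.incrementAndGet());
            return worker;
        };
        // Called when a worker thread dies because of an unrecoverable error
        UncaughtExceptionHandler exceptionHandler = (thread, error) ->
                System.err.println("Uncaught error in " + thread.getName() + ": " + error);
        boolean asyncMode = false; // keep the default LIFO mode for forked tasks
        return new ForkJoinPool(parallelism, threadFactory, exceptionHandler, asyncMode);
    }

    // Runs a parallel stream inside the given pool and collects the names of the threads used
    static Set<String> threadNamesUsedBy(ForkJoinPool pool, List<Integer> items) {
        return pool.submit(() -> items.parallelStream()
                        .map(item -> Thread.currentThread().getName())
                        .collect(Collectors.toSet()))
                .join();
    }
}
```

<p>Keep in mind that a parallel stream started from within a ForkJoinPool task staying in that pool is observed behavior of the JDK implementation rather than something the Stream API documentation promises, so it’s worth verifying on the Java version you run.</p>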
<p>Under the hood the ForkJoinPool scheduler will take care of everything, including applying a work-stealing algorithm to improve parallel processing efficiency.
Having said that, it’s worth mentioning that manual processing using ThreadPoolExecutor might be more efficient in some cases, e.g. if the workload is evenly distributed over worker threads.</p>Kamil Szymańskikamil.szymanski.dev@gmail.comBusting the most common misconception about Java parallel streamsChecking reactive repositories for backpressure support2017-10-10T09:05:19+02:002017-10-10T09:05:19+02:00https://kamilszymanski.github.io/checking-reactive-repositories-for-backpressure-support<p>When I’m working with a repository that returns Flux, RxJava 1.x Observable or some other reactive dataflow, I wonder if it supports backpressure, especially if the returned dataset might contain more than a handful of results.
The bigger the dataset the more crucial it is to have backpressure support.
In the extreme cases without backpressure support we may end up with an OutOfMemoryError, while we could have handled the whole dataset without any issues if backpressure was supported.
In less extreme cases we may end up fully utilizing available resources and making the application unresponsive for a significant period of time.
When processing larger datasets without backpressure support we may also experience longer or more frequent stop-the-world pauses, leading to worse latency or a noticeable drop in throughput due to the pressure put on the garbage collector.</p>
<p>Unfortunately, it’s often neither obvious nor documented whether the repository supports backpressure all the way down the stack.
Due to that I developed a simple test for backpressure support<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.
It requires writing a simple, single-threaded subscriber to process data coming from the repository:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">documentRepository</span><span class="o">.</span><span class="na">findAll</span><span class="o">()</span> <span class="c1">// this returns Flux<Document> but does it support backpressure?</span>
<span class="o">.</span><span class="na">subscribe</span><span class="o">(</span><span class="n">document</span> <span class="o">-></span>
<span class="no">LOG</span><span class="o">.</span><span class="na">debug</span><span class="o">(</span><span class="s">"Processing {}"</span><span class="o">,</span> <span class="n">document</span><span class="o">));</span>
</code></pre></div></div>
<p>Within that implementation we need to put 2 breakpoints, one where the repository is queried (documentRepository#findAll) and the other one where data processing happens (LOG#debug).
Then we run it in the debugger with Memory View opened so that we can track how the heap changes between breakpoints.</p>
<p>To demonstrate this process let me show you 2 implementations and the results of hitting the second breakpoint for both of them.</p>
<p>Let’s say that the repository contains 10 000 documents:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">Document</span> <span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">content</span><span class="p">)</span> <span class="k">VALUES</span>
<span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'large blob'</span><span class="p">),</span>
<span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'large blob'</span><span class="p">),</span>
<span class="p">...</span>
<span class="p">(</span><span class="mi">10000</span><span class="p">,</span> <span class="s1">'large blob'</span><span class="p">);</span>
</code></pre></div></div>
<p>First let’s test a JPA-based implementation:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">JpaBasedDocumentRepository</span> <span class="kd">implements</span> <span class="nc">DocumentRepository</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="nc">EntityManager</span> <span class="n">entityManager</span><span class="o">;</span>
<span class="nc">JpaBasedDocumentRepository</span><span class="o">(</span><span class="nc">EntityManager</span> <span class="n">entityManager</span><span class="o">)</span> <span class="o">{</span>
<span class="k">this</span><span class="o">.</span><span class="na">entityManager</span> <span class="o">=</span> <span class="n">entityManager</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="nc">Flux</span><span class="o"><</span><span class="nc">Document</span><span class="o">></span> <span class="nf">findAll</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">Flux</span><span class="o">.</span><span class="na">fromStream</span><span class="o">(</span>
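<span class="n">entityManager</span><span class="o">.</span><span class="na">createQuery</span><span class="o">(</span><span class="s">"from Document d"</span><span class="o">,</span> <span class="nc">Document</span><span class="o">.</span><span class="na">class</span><span class="o">)</span>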
<span class="o">.</span><span class="na">getResultList</span><span class="o">()</span>
<span class="o">.</span><span class="na">stream</span><span class="o">());</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>When we hit the second breakpoint we can see that the whole dataset was loaded into memory (or the application crashed with OutOfMemoryError if the dataset was large enough) even though only one document was needed at that time (the subscriber processes results sequentially):</p>
<p><img src="../assets/images/posts/checking-reactive-repositories-for-backpressure-support/documents-loaded-when-processing-first-document-from-repository-lacking-full-backpressure-support.png" alt="alt text" title="documents loaded by repository that does not support backpressure when the first document in result set is being processed" /></p>
<p>This repository, even though it returns a Flux that is capable of supporting backpressure, does not support backpressure itself: it just loads the whole dataset into memory at once.</p>
<p>Let’s test another implementation, this time a Hibernate-based one:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">HibernateBasedDocumentRepository</span> <span class="kd">implements</span> <span class="nc">DocumentRepository</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="nc">HibernateEntityManager</span> <span class="n">entityManager</span><span class="o">;</span>
<span class="nc">HibernateBasedDocumentRepository</span><span class="o">(</span><span class="nc">EntityManager</span> <span class="n">entityManager</span><span class="o">)</span> <span class="o">{</span>
<span class="k">this</span><span class="o">.</span><span class="na">entityManager</span> <span class="o">=</span> <span class="n">entityManager</span><span class="o">.</span><span class="na">unwrap</span><span class="o">(</span><span class="nc">HibernateEntityManager</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="nc">Flux</span><span class="o"><</span><span class="nc">Document</span><span class="o">></span> <span class="nf">findAll</span><span class="o">()</span> <span class="o">{</span>
<span class="nc">Stream</span><span class="o"><</span><span class="nc">Document</span><span class="o">></span> <span class="n">resultStream</span> <span class="o">=</span> <span class="n">entityManager</span><span class="o">.</span><span class="na">createQuery</span><span class="o">(</span><span class="s">"from Document d"</span><span class="o">)</span>
<span class="o">.</span><span class="na">stream</span><span class="o">();</span>
<span class="k">return</span> <span class="nc">Flux</span><span class="o">.</span><span class="na">fromStream</span><span class="o">(</span><span class="n">resultStream</span><span class="o">)</span>
<span class="o">.</span><span class="na">doOnTerminate</span><span class="o">(</span><span class="nl">resultStream:</span><span class="o">:</span><span class="n">close</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>In this implementation we create a Flux from the stream opened directly on the query results.</p>
<p>Let’s take a look at the heap once we hit the second breakpoint:</p>
<p><img src="../assets/images/posts/checking-reactive-repositories-for-backpressure-support/documents-loaded-when-processing-first-document-from-repository-supporting-backpressure.png" alt="alt text" title="documents loaded by repository that supports backpressure when the first document in result set is being processed" /></p>
<p>We can see that only one document was loaded, exactly as much data as we needed to process at that time.
If we resume processing and hit that breakpoint for the second time we will see another document being loaded:</p>
<p><img src="../assets/images/posts/checking-reactive-repositories-for-backpressure-support/documents-loaded-when-processing-second-document-from-repository-supporting-backpressure.png" alt="alt text" title="documents loaded by repository that supports backpressure when the second document in result set is being processed" /></p>
<p>Now we have 2 documents loaded, and if we resume processing and hit that breakpoint for the third time we will have 3 documents loaded and so on.
At some point GC will kick in and collect<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> all the loaded documents besides the one currently being processed, which leads to better memory utilization and makes the application more resilient.</p>
<p>In some cases the results of this test might not be that simple to interpret, e.g. some implementations might support backpressure but load results in batches to reduce communication overhead. Still, knowing the basic characteristics of the dataset and the datastore under test, you should be able to use this test to verify whether the repository supports backpressure.</p>
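<p>The difference between the two repositories can also be sketched without a debugger. The following is a plain-JDK analogy, not Reactor or JPA code: a counter stands in for the Memory View, and the hypothetical load method stands in for fetching a single row. It records how many documents had been loaded at the moment the first one was processed.</p>

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class LoadedDocumentsProbe {

    // Counts documents materialized so far; stands in for the debugger's Memory View
    final AtomicInteger loaded = new AtomicInteger();

    // Hypothetical stand-in for fetching a single row from the datastore
    private String load(int id) {
        loaded.incrementAndGet();
        return "document-" + id;
    }

    // Eager: the whole result list is built before the stream is handed out
    Stream<String> findAllEagerly(int size) {
        List<String> all = IntStream.rangeClosed(1, size)
                .mapToObj(this::load)
                .collect(Collectors.toList());
        return all.stream();
    }

    // Lazy: each document is loaded only when the consumer pulls it
    Stream<String> findAllLazily(int size) {
        return IntStream.rangeClosed(1, size).mapToObj(this::load);
    }

    // Returns how many documents had been loaded when the first one was processed
    int loadedWhenFirstProcessed(Stream<String> documents) {
        int[] snapshot = {-1};
        documents.forEach(document -> {
            if (snapshot[0] < 0) {
                snapshot[0] = loaded.get();
            }
        });
        return snapshot[0];
    }
}
```

<p>With 10 000 documents the eager variant reports 10 000 loaded documents at the first processing step while the lazy one reports 1, which mirrors what the two heap snapshots above show.</p>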
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>in this test we examine repositories running in-process with an out-of-process datastore, hence the way backpressure is handled on the publisher side is as important as allowing subscribers to signal the publisher that the rate of emission is too high <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>in case of JPA and its implementations you have to remember that managed entities won’t be garbage collected <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Kamil Szymańskikamil.szymanski.dev@gmail.comSimple test to check whether the repository supports backpressureServing large datasets with Spring WebFlux2017-09-26T22:51:32+02:002017-09-26T22:51:32+02:00https://kamilszymanski.github.io/serving-large-datasets-with-spring-webflux<p>If you’re serving large datasets from your web service you might like one of the upcoming Spring Framework 5.0 features.
But before we get to this feature let’s see what a naive implementation of such a service might look like:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@GetMapping</span><span class="o">(</span><span class="n">path</span> <span class="o">=</span> <span class="s">"items"</span><span class="o">,</span> <span class="n">produces</span> <span class="o">=</span> <span class="s">"application/json"</span><span class="o">)</span>
<span class="nc">List</span><span class="o"><</span><span class="nc">Item</span><span class="o">></span> <span class="nf">allItems</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">itemRepository</span><span class="o">.</span><span class="na">findAll</span><span class="o">();</span>
<span class="o">}</span>
</code></pre></div></div>
<p>The naive part of this implementation is that we try to return the whole dataset at once, which can easily make such a service unresponsive.</p>
<p>If this dataset is larger than what we can fit into memory or if there are many clients asking for this large dataset at the same time we’ll end up seeing OutOfMemoryError.
And even if we have a heap large enough to handle such cases there’s a high chance that our clients won’t be that lucky and will fail with OutOfMemoryError while reading the response.
Moreover there is an often overlooked latency issue hidden there - with such an implementation the client can start processing the dataset only after it has been fully loaded, serialized and delivered by the server:</p>
<p><img src="../assets/images/posts/serving-large-datasets-with-spring-webflux/single-json-document.gif" alt="alt text" title="returning whole dataset at once" /></p>
<p>Such problems are usually mitigated to some extent by introducing paging.
However, if a client is interested in the whole dataset it now has to issue multiple requests, which is not the most convenient solution (not to mention that if no consistency control mechanisms are in place it might get duplicates and/or miss some data).</p>
<p>So can we do better than that?
Just as microservices were the answer to all the questions in the last few years, reactive is now the golden hammer.
Luckily our problem seems to look more like a nail than a screw, so let’s hit it with a reactive hammer:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@GetMapping</span><span class="o">(</span><span class="n">path</span> <span class="o">=</span> <span class="s">"items"</span><span class="o">,</span> <span class="n">produces</span> <span class="o">=</span> <span class="s">"application/json"</span><span class="o">)</span>
<span class="nc">Flux</span><span class="o"><</span><span class="nc">Item</span><span class="o">></span> <span class="nf">allItems</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">itemRepository</span><span class="o">.</span><span class="na">findAll</span><span class="o">();</span>
<span class="o">}</span>
</code></pre></div></div>
<p>The important part here is that the data source should support backpressure or allow loading data in chunks and at a speed that poses no issues for the receiver.
Assuming that our repository supports backpressure and given that Flux is capable of supporting it the problem should be solved since WebFlux (reactive HTTP component, part of Spring Framework 5.0) handles Flux payloads quite well.</p>
<p>Unfortunately this implementation still has all the mentioned problems.
It’s because we still need all data in place just to serialize it before we start sending the response.</p>
<p>The fix is rather obvious - we need a reactive JSON serializer.
As I said, nowadays reactive is the answer to all problems.
Just kidding, we don’t need any reactive serializers (I don’t even know what that means).
Good old Jackson, Gson, or any other JSON serializer you prefer should be sufficient.
What we need to change is not how we serialize but what we serialize.
Let me show you the implementation before I explain what I mean by saying “change what we serialize”.</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@GetMapping</span><span class="o">(</span><span class="n">path</span> <span class="o">=</span> <span class="s">"items"</span><span class="o">,</span> <span class="n">produces</span> <span class="o">=</span> <span class="s">"application/stream+json"</span><span class="o">)</span>
<span class="nc">Flux</span><span class="o"><</span><span class="nc">Item</span><span class="o">></span> <span class="nf">allItems</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">itemRepository</span><span class="o">.</span><span class="na">findAll</span><span class="o">();</span>
<span class="o">}</span>
</code></pre></div></div>
<p>As you can see we’re still returning a Flux of Items, however now the response has a different media type.
Now instead of returning one large serialized JSON document containing all Items we return a stream of individually serialized Items (a stream of JSON documents):</p>
<p><img src="../assets/images/posts/serving-large-datasets-with-spring-webflux/stream-of-json-documents.gif" alt="alt text" title="returning stream of JSON documents" /></p>
<p>Under the hood whenever an Item is emitted from the repository it gets serialized, the response buffer is flushed (meaning that bytes start flowing to the client) but the connection is kept open until all documents are emitted, serialized and sent.
This is hardly a magical or new solution, e.g. you might remember tricks like Comet that date back many years.</p>
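<p>To make the idea of changing what we serialize more tangible, here is a minimal plain-JDK sketch of the server side. It is not how WebFlux is implemented: the class and method names are hypothetical, a Writer stands in for the HTTP response, and separating the documents with a newline is an assumption about the wire format made for illustration.</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.Function;
import java.util.stream.Stream;

class JsonStreamWriter {

    // Writes every item as its own JSON document followed by a newline and flushes right away,
    // so the receiver can start working on the first document while the rest is still being produced
    static <T> void writeAsStream(Stream<T> items, Function<T, String> toJson, Writer response) {
        items.forEach(item -> {
            try {
                response.write(toJson.apply(item)); // serialize a single item, not the whole dataset
                response.write('\n');               // assumed document separator
                response.flush();                   // push the bytes out immediately
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

<p>At no point does the writer hold more than one serialized item, which is the property that removes the memory pressure on the server.</p>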
<p>Of course handling such responses requires clients that are able to decode them, but that’s not rocket science and there already exist implementations that can do that (including Spring Framework 5.0 WebClient).</p>
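<p>A client-side decoder can be equally small. The sketch below uses only the JDK and assumes, as an illustration rather than a statement about any particular client library, that each JSON document arrives on its own line. The class and method names are hypothetical and parsing the document itself is left to the handler.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

class JsonStreamReader {

    // Hands every non-empty line (one JSON document) to the handler as soon as it has been read
    // and returns the number of documents processed
    static int readDocuments(Reader responseBody, Consumer<String> documentHandler) {
        int documents = 0;
        try (BufferedReader reader = new BufferedReader(responseBody)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines between documents
                }
                documentHandler.accept(line);
                documents++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return documents;
    }
}
```

<p>Because the handler is invoked per document, the client never needs the whole response in memory unless the handler itself decides to keep it.</p>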
<p>In the last iteration of this service we got rid of the OutOfMemoryError issue on the server side and significantly reduced the time the client needs to start processing the first Item in the dataset returned by our service.
Another issue we had was OutOfMemoryError on the client side - here it all depends on the client being able to process incoming Items as fast as the server is sending them or being able to buffer the unprocessed part of the sent dataset.
It’s not a perfect solution to this problem, but bearing in mind that we’re communicating over a request-response protocol it might be an acceptable one, especially since we have significantly reduced the probability of OutOfMemoryError on the client side.</p>Kamil Szymańskikamil.szymanski.dev@gmail.comUsing old solutions dressed in new APIResources utilization in reactive services2017-05-14T23:46:11+02:002017-05-14T23:46:11+02:00https://kamilszymanski.github.io/resources-utilization-in-reactive-services<p>Let me start this post with a question.
Imagine a service returning a value that takes 1 second to fetch from some other service (e.g. a database):</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@SpringBootApplication</span>
<span class="nd">@RestController</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">WebApplication</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="nc">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
<span class="nc">SpringApplication</span><span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="nc">WebApplication</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">args</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"/value"</span><span class="o">)</span>
<span class="nc">String</span> <span class="nf">fetchValue</span><span class="o">()</span> <span class="kd">throws</span> <span class="nc">InterruptedException</span> <span class="o">{</span>
<span class="nc">TimeUnit</span><span class="o">.</span><span class="na">SECONDS</span><span class="o">.</span><span class="na">sleep</span><span class="o">(</span><span class="mi">1</span><span class="o">);</span>
<span class="k">return</span> <span class="s">"42"</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>How many transactions per second can we get when we hit it with 10 concurrent users?</p>
<p>I know you already have an answer but let’s be good engineers and measure instead of guessing.
Let’s run <a href="https://github.com/JoeDog/siege">siege</a> in benchmark mode with 10 concurrent users, each issuing 10 subsequent requests:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>siege <span class="nt">-b</span> <span class="nt">-c</span> 10 <span class="nt">-r</span> 10 http://localhost:8080/value
Transactions: 100 hits
Availability: 100.00 %
Elapsed <span class="nb">time</span>: 10.05 secs
Data transferred: 0.00 MB
Response <span class="nb">time</span>: 1.00 secs
Transaction rate: 9.95 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 9.99
Successful transactions: 100
Failed transactions: 0
Longest transaction: 1.01
Shortest transaction: 1.00
</code></pre></div></div>
<p>We’re getting 9.95 transactions per second (transaction rate), which is close to the theoretical maximum of 10 TPS.</p>
<p>That was easy.
Let’s make it a bit more interesting: how many TPS can we get if we increase the number of concurrent users to 100?</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>siege <span class="nt">-b</span> <span class="nt">-c</span> 100 <span class="nt">-r</span> 10 http://localhost:8080/value
Transactions: 1000 hits
Availability: 100.00 %
Elapsed <span class="nb">time</span>: 50.20 secs
Data transferred: 0.00 MB
Response <span class="nb">time</span>: 4.82 secs
Transaction rate: 19.92 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 95.97
Successful transactions: 1000
Failed transactions: 0
Longest transaction: 5.05
Shortest transaction: 1.01
</code></pre></div></div>
<p>WAT?
It’s nowhere near 100 TPS.
How is that even possible?
Well, let me tell you a secret: I have set the maximum number of Tomcat worker threads to 20, which caps throughput at roughly 20 TPS when every request takes a second.</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">server</span><span class="o">.</span><span class="na">tomcat</span><span class="o">.</span><span class="na">max</span><span class="o">-</span><span class="n">threads</span><span class="o">=</span><span class="mi">20</span>
</code></pre></div></div>
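<p>The results so far fit a simple back-of-the-envelope bound: when every request blocks a thread for a fixed time, throughput cannot exceed the number of requests processed at once divided by that time.
Here is a minimal sketch of that arithmetic; <code class="language-plaintext highlighter-rouge">ThroughputBound</code>, <code class="language-plaintext highlighter-rouge">maxTps</code> and its parameters are illustrative names, not anything taken from Tomcat or siege:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Illustrative helper, not part of the service: upper bound on throughput
// when each request occupies one thread for a fixed latency.
public class ThroughputBound {

    // Requests in flight are capped by whichever is smaller:
    // the number of concurrent users or the number of worker threads.
    static double maxTps(int concurrentUsers, int workerThreads, double latencySeconds) {
        return Math.min(concurrentUsers, workerThreads) / latencySeconds;
    }

    public static void main(String[] args) {
        // 10 users, 20 worker threads, 1 second per request
        System.out.println(maxTps(10, 20, 1.0));
        // 100 users, 20 worker threads, 1 second per request
        System.out.println(maxTps(100, 20, 1.0));
    }
}
</code></pre></div></div>
<p>With 20 worker threads the bound is 20 TPS no matter how many users are waiting, which is in line with the 19.92 TPS measured above.</p>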
<p>Now that we know what the limiting factor is, let’s get rid of this custom worker thread limit and repeat the test with the default configuration (200 worker threads in the case of Tomcat 8.5):</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>siege <span class="nt">-b</span> <span class="nt">-c</span> 100 <span class="nt">-r</span> 10 http://localhost:8080/value
Transactions: 1000 hits
Availability: 100.00 %
Elapsed <span class="nb">time</span>: 10.06 secs
Data transferred: 0.00 MB
Response <span class="nb">time</span>: 1.00 secs
Transaction rate: 99.40 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 99.73
Successful transactions: 1000
Failed transactions: 0
Longest transaction: 1.02
Shortest transaction: 1.00
</code></pre></div></div>
<p>The actual numbers (yes, we got close to 100 TPS) are not as interesting as the thread usage:</p>
<p><img src="../assets/images/posts/resources-utilization-in-reactive-services/handling_100_concurrent_requests_to_classic_servlet_based_service.png" alt="alt text" title="threads activity during 100 concurrent requests to classic servlet-based service" /></p>
<p>We start with fewer than 50 live threads, and when 100 users hit the service we quickly reach almost 150 live threads and keep them alive for some time after the traffic is gone, just in case they could be reused.
It’s worth pointing out that we are limited by the number of worker threads: once the number of concurrent requests exceeds it, requests start queuing.</p>
<p>Now let’s sprinkle our service with some reactive magic by replacing the old handler method with a reactive one:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"/value"</span><span class="o">)</span>
<span class="nc">Mono</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="nf">fetchValue</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">Mono</span><span class="o">.</span><span class="na">fromCallable</span><span class="o">(()</span> <span class="o">-></span> <span class="o">{</span>
<span class="n">sleepFor</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="no">SECONDS</span><span class="o">);</span>
<span class="k">return</span> <span class="s">"42"</span><span class="o">;</span>
<span class="o">});</span>
<span class="o">}</span>
</code></pre></div></div>
<p>as well as replacing the spring-boot-starter-web dependency with spring-boot-starter-webflux<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><dependency></span>
<span class="nt"><groupId></span>org.springframework.boot<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>spring-boot-starter-webflux<span class="nt"></artifactId></span>
<span class="nt"></dependency></span>
</code></pre></div></div>
<p>With all these reactive bits and pieces in place let’s see what happens if we hit our service with 100 concurrent users:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>siege <span class="nt">-b</span> <span class="nt">-c</span> 100 <span class="nt">-r</span> 10 http://localhost:8080/value
Transactions: 1000 hits
Availability: 100.00 %
Elapsed <span class="nb">time</span>: 126.51 secs
Data transferred: 0.00 MB
Response <span class="nb">time</span>: 12.01 secs
Transaction rate: 7.90 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 94.91
Successful transactions: 1000
Failed transactions: 0
Longest transaction: 13.09
Shortest transaction: 1.00
</code></pre></div></div>
<p>WAT?
A mere 8 TPS.
There has to be another thread limit in place!
In fact there is one: there will be as many event loop threads as <strong>Runtime.getRuntime().availableProcessors()</strong> (as we can see from the results we have 8 available processors).</p>
<p>Yes, you heard it right - event loops.
By default Spring Boot 2 will switch from Tomcat to Netty if you use spring-boot-starter-webflux.
Later on we’ll see how our reactive implementation performs on Tomcat but for now let’s stick to Netty, a damn fast asynchronous event-driven network application framework.
Let’s take another look at our implementation and see if we have missed something, because those 8 TPS contradict Netty’s reputation for being fast.</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"/value"</span><span class="o">)</span>
<span class="nc">Mono</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="nf">fetchValue</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">Mono</span><span class="o">.</span><span class="na">fromCallable</span><span class="o">(()</span> <span class="o">-></span> <span class="o">{</span>
<span class="n">sleepFor</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="no">SECONDS</span><span class="o">);</span>
<span class="k">return</span> <span class="s">"42"</span><span class="o">;</span>
<span class="o">});</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Knowing that we have 8 event loop threads we can tell why we are getting just 8 TPS: each of those threads is blocked for a full second by the blocking operation, so together they can complete at most 8 requests per second.
It’s irresponsible to run any kind of I/O or any other blocking operation on an event loop thread.
The fix is rather obvious: move the blocking operation to another thread:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"/value"</span><span class="o">)</span>
<span class="nc">Mono</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="nf">fetchValue</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">Mono</span><span class="o">.</span><span class="na">fromCallable</span><span class="o">(()</span> <span class="o">-></span> <span class="o">{</span>
<span class="n">sleepFor</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="no">SECONDS</span><span class="o">);</span>
<span class="k">return</span> <span class="s">"42"</span><span class="o">;</span>
<span class="o">}).</span><span class="na">subscribeOn</span><span class="o">(</span><span class="nc">Schedulers</span><span class="o">.</span><span class="na">elastic</span><span class="o">());</span>
<span class="o">}</span>
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>siege <span class="nt">-b</span> <span class="nt">-c</span> 100 <span class="nt">-r</span> 10 http://localhost:8080/value
Transactions: 1000 hits
Availability: 100.00 %
Elapsed <span class="nb">time</span>: 10.06 secs
Data transferred: 0.00 MB
Response <span class="nb">time</span>: 1.00 secs
Transaction rate: 99.40 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 99.89
Successful transactions: 1000
Failed transactions: 0
Longest transaction: 1.02
Shortest transaction: 1.00
</code></pre></div></div>
<p>Now we are getting results similar to those of the non-reactive implementation running on Tomcat with the default 200 worker threads.
Even the thread usage looks similar:</p>
<p><img src="../assets/images/posts/resources-utilization-in-reactive-services/handling_100_concurrent_requests_to_reactive_service_on_netty.png" alt="alt text" title="threads activity during 100 concurrent requests to reactive service running on Netty" /></p>
<p>The only thing we got rid of is the worker thread limit, which got replaced with the unbounded Schedulers.elastic() thread pool.
Of course we can (and often should) replace Schedulers.elastic() with a scheduler over which we have full control (e.g. <a href="https://projectreactor.io/docs/core/release/api/reactor/core/scheduler/Schedulers.html#fromExecutorService-java.util.concurrent.ExecutorService-">by creating one based on an ExecutorService</a>).</p>
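<p>As a rough sketch of the <code class="language-plaintext highlighter-rouge">ExecutorService</code> side of that idea, the snippet below builds a fixed-size pool with named threads using only the JDK.
The pool size of 4, the <code class="language-plaintext highlighter-rouge">blocking-io-</code> thread name prefix and the class and method names are assumptions chosen for illustration, not values used in the measurements above:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedBlockingPool {

    // Hypothetical factory: a fixed number of named daemon threads,
    // in contrast to an unbounded pool that grows with the load.
    static ExecutorService create(int threads) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(threads, runnable -> {
            Thread thread = new Thread(runnable, "blocking-io-" + counter.incrementAndGet());
            thread.setDaemon(true);
            return thread;
        });
    }

    // Runs one task on the pool and reports the name of the thread that executed it.
    static String runOnPool(ExecutorService pool) throws Exception {
        return pool.submit(() -> Thread.currentThread().getName()).get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = create(4);
        // Prints a thread name that starts with "blocking-io-"
        System.out.println(runOnPool(pool));
        pool.shutdown();
    }
}
</code></pre></div></div>
<p>In a Reactor-based handler such an executor would be wrapped with Schedulers.fromExecutorService and passed to subscribeOn in place of Schedulers.elastic(); the fixed pool size then caps the number of blocking calls running at the same time.</p>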
<p>Going reactive on Netty didn’t yield noticeable improvements, so let’s see what the situation looks like when we replace Netty with Tomcat by adding an explicit dependency on spring-boot-starter-tomcat:</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><dependency></span>
<span class="nt"><groupId></span>org.springframework.boot<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>spring-boot-starter-tomcat<span class="nt"></artifactId></span>
<span class="nt"></dependency></span>
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>siege <span class="nt">-b</span> <span class="nt">-c</span> 100 <span class="nt">-r</span> 10 http://localhost:8080/value
Transactions: 1000 hits
Availability: 100.00 %
Elapsed <span class="nb">time</span>: 10.07 secs
Data transferred: 0.00 MB
Response <span class="nb">time</span>: 1.01 secs
Transaction rate: 99.30 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 99.86
Successful transactions: 1000
Failed transactions: 0
Longest transaction: 1.03
Shortest transaction: 1.00
</code></pre></div></div>
<p>The TPS rates are not surprising but the thread usage is:</p>
<p><img src="../assets/images/posts/resources-utilization-in-reactive-services/handling_100_concurrent_requests_to_reactive_service_on_tomcat_outside_of_worker_thread_pool.png" alt="alt text" title="threads activity during 100 concurrent requests to reactive service running on Tomcat and using separate threads to process requests" /></p>
<p>We are using many more threads than we used to (since threads are kept alive for some time in case they could be reused, the actual number of threads used will vary depending on the availability of idle threads in the pool).
Let’s take another look at the implementation and try to find the reason for such behavior:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"/value"</span><span class="o">)</span>
<span class="nc">Mono</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="nf">fetchValue</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">Mono</span><span class="o">.</span><span class="na">fromCallable</span><span class="o">(()</span> <span class="o">-></span> <span class="o">{</span>
<span class="n">sleepFor</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="no">SECONDS</span><span class="o">);</span>
<span class="k">return</span> <span class="s">"42"</span><span class="o">;</span>
<span class="o">}).</span><span class="na">subscribeOn</span><span class="o">(</span><span class="nc">Schedulers</span><span class="o">.</span><span class="na">elastic</span><span class="o">());</span>
<span class="o">}</span>
</code></pre></div></div>
<p>By now it should be obvious that due to the thread-per-request model we can end up using 2 threads to process each request: one container worker thread handling the request and an additional thread performing the blocking operation.
To reduce the number of threads used when running on a servlet container (i.e. Tomcat) we can simply move the blocking operation back to the worker thread (we could also use the servlet asynchronous processing facilities in order not to block the worker thread).</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"/value"</span><span class="o">)</span>
<span class="nc">Mono</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="nf">fetchValue</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">Mono</span><span class="o">.</span><span class="na">fromCallable</span><span class="o">(()</span> <span class="o">-></span> <span class="o">{</span>
<span class="n">sleepFor</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="no">SECONDS</span><span class="o">);</span>
<span class="k">return</span> <span class="s">"42"</span><span class="o">;</span>
<span class="o">});</span>
<span class="o">}</span>
</code></pre></div></div>
<p><img src="../assets/images/posts/resources-utilization-in-reactive-services/handling_100_concurrent_requests_to_reactive_service_on_tomcat_on_worker_thread_pool.png" alt="alt text" title="threads activity during 100 concurrent requests to reactive service running on Tomcat and using worker threads to process requests" /></p>
<p>Of course if you follow what’s going on in the Spring ecosystem you know that, along with WebFlux, Spring Framework 5.0 will allow you to replace MVC-style handler method mappings with functional-style RouterFunctions:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@Bean</span>
<span class="nc">RouterFunction</span><span class="o"><</span><span class="nc">ServerResponse</span><span class="o">></span> <span class="nf">routerFunction</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">route</span><span class="o">(</span><span class="no">GET</span><span class="o">(</span><span class="s">"/value"</span><span class="o">),</span> <span class="n">request</span> <span class="o">-></span> <span class="n">fetchValueHandler</span><span class="o">());</span>
<span class="o">}</span>
<span class="nc">Mono</span><span class="o"><</span><span class="nc">ServerResponse</span><span class="o">></span> <span class="nf">fetchValueHandler</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">ServerResponse</span><span class="o">.</span><span class="na">ok</span><span class="o">()</span>
<span class="o">.</span><span class="na">body</span><span class="o">(</span><span class="n">fetchValue</span><span class="o">(),</span> <span class="nc">String</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="o">}</span>
<span class="nc">Mono</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="nf">fetchValue</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">Mono</span><span class="o">.</span><span class="na">fromCallable</span><span class="o">(()</span> <span class="o">-></span> <span class="o">{</span>
<span class="n">sleepFor</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="no">SECONDS</span><span class="o">);</span>
<span class="k">return</span> <span class="s">"42"</span><span class="o">;</span>
<span class="o">});</span>
<span class="o">}</span>
</code></pre></div></div>
<p>However, it’s just a different way of defining request handlers; it doesn’t make your application run faster or use fewer threads.</p>
<p>As we have seen, simply going reactive does not make your services faster or use fewer resources, so why is there so much buzz around it?
Well, if you compare how much simpler and safer it is to control where and how your logic is being executed with reactive APIs:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"/value"</span><span class="o">)</span>
<span class="nc">Mono</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="nf">fetchValue</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">Mono</span><span class="o">.</span><span class="na">fromCallable</span><span class="o">(()</span> <span class="o">-></span> <span class="o">{</span>
<span class="n">sleepFor</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="no">SECONDS</span><span class="o">);</span>
<span class="k">return</span> <span class="s">"42"</span><span class="o">;</span>
<span class="o">}).</span><span class="na">subscribeOn</span><span class="o">(</span><span class="nc">Schedulers</span><span class="o">.</span><span class="na">elastic</span><span class="o">());</span>
<span class="o">}</span>
</code></pre></div></div>
<p>than without them:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="nc">ExecutorService</span> <span class="n">threadPool</span> <span class="o">=</span> <span class="nc">Executors</span><span class="o">.</span><span class="na">newCachedThreadPool</span><span class="o">();</span>
<span class="nd">@GetMapping</span><span class="o">(</span><span class="s">"/value"</span><span class="o">)</span>
<span class="nc">DeferredResult</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="nf">fetchValue</span><span class="o">()</span> <span class="o">{</span>
<span class="nc">DeferredResult</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="n">deferredResult</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">DeferredResult</span><span class="o"><>();</span>
<span class="n">fetchValueAsync</span><span class="o">(</span><span class="n">deferredResult</span><span class="o">);</span>
<span class="k">return</span> <span class="n">deferredResult</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">private</span> <span class="kt">void</span> <span class="nf">fetchValueAsync</span><span class="o">(</span><span class="nc">DeferredResult</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="n">deferredResult</span><span class="o">)</span> <span class="o">{</span>
<span class="nc">CompletableFuture</span><span class="o">.</span><span class="na">supplyAsync</span><span class="o">(</span><span class="k">this</span><span class="o">::</span><span class="n">fetchValueSync</span><span class="o">,</span> <span class="n">threadPool</span><span class="o">)</span>
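<span class="o">.</span><span class="na">whenCompleteAsync</span><span class="o">((</span><span class="n">result</span><span class="o">,</span> <span class="n">throwable</span><span class="o">)</span> <span class="o">-></span> <span class="o">{</span>
<span class="c1">// propagate failures instead of completing with a null result</span>
<span class="k">if</span> <span class="o">(</span><span class="n">throwable</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="n">deferredResult</span><span class="o">.</span><span class="na">setErrorResult</span><span class="o">(</span><span class="n">throwable</span><span class="o">);</span>
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
<span class="n">deferredResult</span><span class="o">.</span><span class="na">setResult</span><span class="o">(</span><span class="n">result</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">});</span>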
<span class="o">}</span>
<span class="kd">private</span> <span class="nc">String</span> <span class="nf">fetchValueSync</span><span class="o">()</span> <span class="o">{</span>
<span class="nc">Sleeper</span><span class="o">.</span><span class="na">sleepFor</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="no">SECONDS</span><span class="o">);</span>
<span class="k">return</span> <span class="s">"42"</span><span class="o">;</span>
<span class="o">}</span>
</code></pre></div></div>
<p>it starts to make sense and making it easier to control this aspect of execution is just one of the perks of reactive implementations.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>at the time of writing WebFlux (which is part of Spring Framework 5.0) is in RC1 and Spring Boot 2.0 is still available only as snapshot releases <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Kamil Szymańskikamil.szymanski.dev@gmail.comBlocking is wastefulReliable artifact versioning schemes2016-10-25T00:17:37+02:002016-10-25T00:17:37+02:00https://kamilszymanski.github.io/reliable-artifact-versioning-schemes<p>Along with the <a href="https://github.com/mockito/mockito">Mockito</a> 2.1.0 release the Mockito Team published <a href="https://github.com/mockito/mockito/wiki/Continuous-Delivery-Details#why2.1.0">an article describing issues with their previous artifact versioning scheme</a>.
This reminds me of another versioning scheme that might cause you real trouble, a scheme that is based on the CI server’s next build number.</p>
<p>Before we get into problems related to this versioning scheme let’s quickly discuss some of its characteristics.
One of the downsides of this scheme is that most of the time it doesn’t carry any useful information.
It says nothing about how smooth the upgrade to a newer version should be or how old the current version is.
But hey, at least a build-number-based scheme guarantees proper ordering of versions, doesn’t it?
It also guarantees artifact version uniqueness, doesn’t it?
Actually such versioning schemes guarantee neither version uniqueness nor proper ordering.
They don’t guarantee that because next build numbers are not under your control.
You have to rely on your CI server assigning them properly.</p>
<p>What happens if you completely lose your CI server?
What if a hurricane strikes your data center?
If that seems highly improbable consider a human error, e.g. mistakenly removing a job other than the one you intended to remove.
That sounds probable, and it can even be automated :D</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">To make error is human. To propagate error to all server in automatic way is <a href="https://twitter.com/hashtag/devops?src=hash">#devops</a>.</p>— DevOps Borat (@DEVOPS_BORAT) <a href="https://twitter.com/DEVOPS_BORAT/status/41587168870797312">February 26, 2011</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>What if you decide to migrate to another CI server to be able to keep pipelines as code?
With a build-number-based scheme you have given yourself one more thing to work on.
Now besides migrating your pipelines you also have to migrate next build numbers.
And if you don’t migrate build numbers you can no longer rely on the <code class="language-plaintext highlighter-rouge">LATEST</code> version in your deployment scripts nor on your artifact repository properly selecting the latest artifact.
Moreover, due to duplicated artifact versions you will end up being unable to deploy newly built artifacts to the artifact repository and thus being unable to deploy to production.</p>
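<p>To make the ordering problem concrete, here is a small sketch that compares dotted numeric versions segment by segment.
The version strings and the <code class="language-plaintext highlighter-rouge">compareVersions</code> helper are hypothetical, and real artifact repositories apply richer rules; the point is only that a build counter restarted at 1 produces versions that sort below the ones already published:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class BuildNumberOrdering {

    // Hypothetical comparator for versions such as "1.0.57":
    // compares numeric segments left to right, treating missing segments as 0.
    static int compareVersions(String left, String right) {
        String[] l = left.split("\\.");
        String[] r = right.split("\\.");
        int length = Math.max(l.length, r.length);
        for (int i = 0; i != length; i++) {
            int a = i >= l.length ? 0 : Integer.parseInt(l[i]);
            int b = i >= r.length ? 0 : Integer.parseInt(r[i]);
            if (a != b) {
                return Integer.compare(a, b);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String lastBeforeMigration = "1.0.57"; // last build on the old CI server
        String firstAfterMigration = "1.0.1";  // counter restarted on the new one
        // Negative result: the newer artifact sorts as the older version.
        System.out.println(compareVersions(firstAfterMigration, lastBeforeMigration));
    }
}
</code></pre></div></div>
<p>Once the restarted counter reaches 57 again the comparison returns 0, which is the duplicated version case that blocks publishing.</p>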
<p>Don’t get me wrong, build numbers are not completely wrong, they may sometimes even be useful.
They are just broken when used as a significant part of a versioning scheme.
Therefore always make sure you control all significant parts of your versioning scheme or that they are derived from sources that guarantee uniqueness and proper ordering.</p>Kamil Szymańskikamil.szymanski.dev@gmail.comDon't derive artifact versions from build numbersCustom type renderers in IntelliJ IDEA debugger2016-07-31T02:03:34+02:002016-07-31T02:03:34+02:00https://kamilszymanski.github.io/custom-type-renderers-in-intellij-idea-debugger<p>When I step through code in a debugger it usually means I’m trying to understand why that code behaves differently than expected.
In such situations I especially want my IDE to present me only with data relevant to the problem at hand.
Unfortunately my IDE doesn’t have a crystal ball that tells it what context I’m in and what data is relevant in the given context.
The IDE just displays variables used in the code and allows me to evaluate custom expressions to dig deeper.
Sadly, the displayed data often tends to either be meaningless in the given context or contain a lot of noise that I have to filter out myself, risking overlooking significant pieces.
Moreover, evaluating expressions, while invaluable, also means manual work: I have to open the evaluate expression dialog and type in the expression, risking losing focus on the main problem.
To sum it up I want to be presented only with relevant data and I want it to be presented in a meaningful way.
This doesn’t sound feasible without me helping my IDE.
Fortunately my IDE, <a href="https://www.jetbrains.com/idea/">IntelliJ IDEA</a>, has a really useful feature called custom type renderers that allows me to provide that necessary help to my IDE.
Let’s see how those custom type renderers can help debug things.</p>
<p>Let’s imagine a simple scenario where we calculate an important value based on a price plan and event duration:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">Duration</span> <span class="n">eventDuration</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Duration</span><span class="o">(</span><span class="n">eventStart</span><span class="o">,</span> <span class="n">eventEnd</span><span class="o">);</span>
<span class="k">return</span> <span class="nf">estimateCost</span><span class="o">(</span><span class="n">pricePlan</span><span class="o">,</span> <span class="n">eventDuration</span><span class="o">);</span>
</code></pre></div></div>
<p>This piece of code works as it should until one day we notice strange results coming out of it.
Fortunately we are able to capture inputs that allow us to reproduce the problematic scenario:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">PricePlan</span> <span class="n">pricePlan</span> <span class="o">=</span> <span class="nc">PricePlan</span><span class="o">.</span><span class="na">REGULAR</span><span class="o">;</span>
<span class="nc">DateTime</span> <span class="n">eventStart</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">DateTime</span><span class="o">(</span><span class="mi">2016</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="mi">27</span><span class="o">,</span> <span class="mi">1</span><span class="o">,</span> <span class="mi">30</span><span class="o">,</span> <span class="no">WARSAW_TIME</span><span class="o">);</span>
<span class="nc">DateTime</span> <span class="n">eventEnd</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">DateTime</span><span class="o">(</span><span class="mi">2016</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="mi">27</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="mi">40</span><span class="o">,</span> <span class="no">WARSAW_TIME</span><span class="o">);</span>
</code></pre></div></div>
<p>Not being able to deduce where the problem lies by reasoning about the code we run it through a debugger.
Based on the inputs we assume that <code class="language-plaintext highlighter-rouge">PricePlan.REGULAR</code> and a duration equal to <code class="language-plaintext highlighter-rouge">2 hours 10 minutes</code> (in milliseconds <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>) are passed to the <code class="language-plaintext highlighter-rouge">estimateCost</code> method.</p>
<p><img src="../assets/images/posts/custom_type_renderers_in_intellij_idea_debugger/meaningless_duration_representation.png" alt="alt text" title="debugger displays values of used variables" /></p>
<p>The debugger shows us that we indeed pass in <code class="language-plaintext highlighter-rouge">PricePlan.REGULAR</code> as the first parameter and some duration as the second one.
Unfortunately the way <a href="http://www.joda.org/joda-time/apidocs/org/joda/time/Duration.html">org.joda.time.Duration</a> value is presented to us is meaningless, thus we have to manually evaluate an expression to get the data we need:</p>
<p><img src="../assets/images/posts/custom_type_renderers_in_intellij_idea_debugger/code_fragment_evaluation.png" alt="alt text" title="evaluating code fragment to get the length of event duration" /></p>
<p>We can see that event duration is equal to <code class="language-plaintext highlighter-rouge">1 hour 10 minutes</code> instead of <code class="language-plaintext highlighter-rouge">2 hours 10 minutes</code>.
Before we dig into the why, let’s focus on how we can make this information easily accessible by using a custom type renderer.
Let’s create one that displays <a href="http://www.joda.org/joda-time/apidocs/org/joda/time/Duration.html">org.joda.time.Duration</a> values as a result of the following expression:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">org</span><span class="o">.</span><span class="na">joda</span><span class="o">.</span><span class="na">time</span><span class="o">.</span><span class="na">format</span><span class="o">.</span><span class="na">PeriodFormat</span><span class="o">.</span><span class="na">getDefault</span><span class="o">()</span>
<span class="o">.</span><span class="na">withParseType</span><span class="o">(</span><span class="n">org</span><span class="o">.</span><span class="na">joda</span><span class="o">.</span><span class="na">time</span><span class="o">.</span><span class="na">PeriodType</span><span class="o">.</span><span class="na">time</span><span class="o">())</span>
<span class="o">.</span><span class="na">print</span><span class="o">(</span><span class="k">this</span><span class="o">.</span><span class="na">toPeriod</span><span class="o">())</span>
</code></pre></div></div>
<p><img src="../assets/images/posts/custom_type_renderers_in_intellij_idea_debugger/duration_type_renderer.png" alt="alt text" title="duration type renderer" /></p>
<p>Now the data displayed by the debugger starts being useful:</p>
<p><img src="../assets/images/posts/custom_type_renderers_in_intellij_idea_debugger/meaningful_duration_representation.png" alt="alt text" title="debugger displays event duration in a meaningful way" /></p>
<p>With the significant data in the foreground let’s go back to analyzing why the duration of an event that starts at <code class="language-plaintext highlighter-rouge">1:30 Warsaw Time</code> and ends at <code class="language-plaintext highlighter-rouge">3:40 Warsaw Time</code> is <code class="language-plaintext highlighter-rouge">1 hour 10 minutes</code> instead of <code class="language-plaintext highlighter-rouge">2 hours 10 minutes</code>.</p>
<p><img src="../assets/images/posts/custom_type_renderers_in_intellij_idea_debugger/event_start_and_end_dates.png" alt="alt text" title="event start and end dates" /></p>
<p>As I mentioned in the beginning it’s easy to overlook important data when you are filtering out noise by yourself.
You want that noise to be automatically filtered out.
Given character sequences as long as <code class="language-plaintext highlighter-rouge">2016-03-27T01:30:00.000+01:00</code> and <code class="language-plaintext highlighter-rouge">2016-03-27T03:40:00.000+02:00</code> you probably focused on making sure that they represent the same day and then extracting hours.
Since both dates were in the same time zone chances are high you simply ignored time offsets.
Of course you can alter the way dates are represented so they are easier to grasp and analyze but since you already know where the problem lies let’s make the debugger clearly say it by rendering <a href="http://www.joda.org/joda-time/apidocs/org/joda/time/DateTime.html">org.joda.time.DateTime</a> values as a result of the following expression:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">org</span><span class="o">.</span><span class="na">joda</span><span class="o">.</span><span class="na">time</span><span class="o">.</span><span class="na">DateTimeZone</span><span class="o">.</span><span class="na">forID</span><span class="o">(</span><span class="s">"Europe/Warsaw"</span><span class="o">)</span>
<span class="o">.</span><span class="na">getOffset</span><span class="o">(</span><span class="k">this</span><span class="o">)</span> <span class="o">==</span> <span class="mi">3600000</span> <span class="o">?</span> <span class="s">"standard"</span> <span class="o">:</span> <span class="s">"daylight saving"</span>
</code></pre></div></div>
<p><img src="../assets/images/posts/custom_type_renderers_in_intellij_idea_debugger/datetime_type_renderer.png" alt="alt text" title="datetime type renderer" /></p>
<p>Yes, you guessed it right, there was a time change between the start and the end of an event:</p>
<p><img src="../assets/images/posts/custom_type_renderers_in_intellij_idea_debugger/meaningful_datetime_offsets_representation.png" alt="alt text" title="debugger displays that dates have different time offsets" /></p>
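<p>The same arithmetic can be checked outside the debugger. The sketch below is an illustration only: it assumes GNU <code class="language-plaintext highlighter-rouge">date</code> (for the <code class="language-plaintext highlighter-rouge">-d</code> option), and <code class="language-plaintext highlighter-rouge">elapsed_minutes</code> is a hypothetical helper name. It converts both timestamps, including their offsets, to epoch seconds and subtracts them.</p>

```sh
#!/bin/sh
# Hypothetical helper: minutes elapsed between two ISO-8601 timestamps
# that carry explicit UTC offsets. Assumes GNU date (-d is not portable).
elapsed_minutes() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( (end - start) / 60 ))
}

# The event from the example: the wall clock shows 1:30 and 3:40,
# but the offsets (+01:00 and +02:00) differ.
elapsed_minutes '2016-03-27T01:30:00+01:00' '2016-03-27T03:40:00+02:00'
```

<p>Because the offsets are part of the input, the result is 70 minutes rather than the 130 minutes suggested by the wall-clock hours alone.</p>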
<p>On <code class="language-plaintext highlighter-rouge">27 March 2016</code> at <code class="language-plaintext highlighter-rouge">2:00</code> in the <code class="language-plaintext highlighter-rouge">Europe/Warsaw</code> time zone clocks were turned forward by 1 hour to change from <a href="https://en.wikipedia.org/wiki/Central_European_Time">Central European Time</a> to <a href="https://en.wikipedia.org/wiki/Central_European_Summer_Time">Central European Summer Time</a> “so that evening daylight lasts an hour longer, while sacrificing normal sunrise times”<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.
Now the debugger clearly says it but at the same time makes it hard to figure out what time those <a href="http://www.joda.org/joda-time/apidocs/org/joda/time/DateTime.html">org.joda.time.DateTime</a> values represent:</p>
<p><img src="../assets/images/posts/custom_type_renderers_in_intellij_idea_debugger/meaningless_datetime_node_representation.png" alt="alt text" title="meaningless datetime representation in the data view" /></p>
<p>Let’s make that information easily available by further configuring <a href="http://www.joda.org/joda-time/apidocs/org/joda/time/DateTime.html">org.joda.time.DateTime</a> type renderer, this time focusing on how those values are represented after node expansion in the data view:</p>
<p><img src="../assets/images/posts/custom_type_renderers_in_intellij_idea_debugger/datetime_list_expressions.png" alt="alt text" title="datetime type renderer with list expressions" /></p>
<p>With the simple <code class="language-plaintext highlighter-rouge">this.toString()</code> expression we get exactly what we need:</p>
<p><img src="../assets/images/posts/custom_type_renderers_in_intellij_idea_debugger/meaningful_datetime_node_representation.png" alt="alt text" title="meaningful datetime representation in the data view" /></p>
<p>As you have seen in this simplistic example, custom type renderers can make your debugging session way easier by hiding irrelevant data and bringing significant data to the foreground.
Another noteworthy aspect of custom type renderers is that they automatically re-render all displayed variables after any change made to their definition (including creation and removal) so that you don’t have to rerun your debugging session.
In fact all the screenshots in this article come from the same debugging session.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="http://www.joda.org/joda-time/apidocs/org/joda/time/Duration.html">org.joda.time.Duration</a> represents length of time in milliseconds <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>quote from <a href="https://en.wikipedia.org/wiki/Daylight_saving_time">Wikipedia: Daylight saving time</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Kamil Szymańskikamil.szymanski.dev@gmail.comAltering the way IntelliJ IDEA debugger displays valuesInterpreting jstat’s number of Full GC events2016-02-04T23:58:26+01:002016-02-04T23:58:26+01:00https://kamilszymanski.github.io/interpreting-jstats-number-of-full-gc-events<p><em>“You can’t control what you can’t measure”</em><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> and you want to control software you’re running.
Nonetheless, measuring is not enough; you also need to know how to interpret the results of such measurements.</p>
<p>It’s not hard to find examples of misinterpreted measurement results with the <code class="language-plaintext highlighter-rouge">free</code><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> command output being one of the most commonly misinterpreted ones.
Often a lack of understanding of the terminology used, or its ambiguity, is to blame.
For example let’s look at <code class="language-plaintext highlighter-rouge">jstat</code> output for a Java process:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>jstat <span class="nt">-gc</span> <span class="nt">-t</span> 4648 1s
Timestamp S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
19.4 8704.0 8704.0 0.0 8704.0 69952.0 12416.7 240928.0 173395.8 154352.0 149283.9 22180.0 20722.4 27 0.594 8 0.231 0.825
</code></pre></div></div>
<p>Anyone who knows JVM garbage collection basics should have no trouble telling what those values mean, right?
Well let’s try: EC stands for Eden space Capacity, EU denotes Eden space Usage, S0C - Survivor space 0 Capacity, YGC - number of Young generation Garbage Collections, YGCT - Young generation Garbage Collection Time, and so on.</p>
<p>Pretty obvious, isn’t it?
So let’s gather statistics for the old generation (and metaspace) and answer the question: how many Full GCs<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> were performed?</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>jstat <span class="nt">-gcold</span> <span class="nt">-t</span> 5007 1s
Timestamp MC MU CCSC CCSU OC OU YGC FGC FGCT GCT
11.7 113064.0 109246.8 16156.0 14883.7 174784.0 115285.4 13 4 0.079 0.368
</code></pre></div></div>
<p>If your answer is 4 you might be correct or far from being correct.
It all depends on which garbage collector the monitored Java process was using.
If it was using Parallel GC the answer would be 4, but since it was using CMS the answer is 2.</p>
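<p>When reading such output in scripts it is safer to pick a column by its header name than by position, because the set of columns differs between <code class="language-plaintext highlighter-rouge">jstat</code> options. A minimal sketch, assuming the two-line header-plus-sample layout shown above; <code class="language-plaintext highlighter-rouge">jstat_column</code> is a hypothetical helper, not part of the JDK:</p>

```sh
#!/bin/sh
# Hypothetical helper: print the value of a named jstat column.
# Reads jstat output (header line followed by sample lines) on stdin
# and prints the value from the last sample; exits with status 1 if
# the column name is not present in the header.
jstat_column() {
    awk -v col="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        idx     { value = $idx }
        END     { if (idx) print value; else exit 1 }
    '
}

# Possible usage against a live process (pid 5007 is the one from the
# example above; one sample is taken):
#   jstat -gcold -t 5007 1s 1 | jstat_column FGC
```

<p>The helper only extracts the number; what that number means still depends on the collector in use.</p>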
<p>Let’s check the GC logs for the analysed Java process that uses CMS collector and see where the difference comes from:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0.408: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 0.408: <span class="o">[</span>ParNew: 69952K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0396199 secs] 69952K->21786K<span class="o">(</span>253440K<span class="o">)</span>, 0.0397117 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.12 <span class="nv">sys</span><span class="o">=</span>0.02, <span class="nv">real</span><span class="o">=</span>0.04 secs]
1.213: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 1.213: <span class="o">[</span>ParNew: 78656K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0115476 secs] 91738K->29967K<span class="o">(</span>253440K<span class="o">)</span>, 0.0116237 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.06 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.00 secs]
2.152: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 2.152: <span class="o">[</span>ParNew: 78656K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0176088 secs] 99919K->38548K<span class="o">(</span>253440K<span class="o">)</span>, 0.0176831 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.11 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.02 secs]
2.170: <span class="o">[</span>GC <span class="o">(</span>CMS Initial Mark<span class="o">)</span> <span class="o">[</span>1 CMS-initial-mark: 29844K<span class="o">(</span>174784K<span class="o">)]</span> 39075K<span class="o">(</span>253440K<span class="o">)</span>, 0.0021170 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.02 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.00 secs]
2.172: <span class="o">[</span>CMS-concurrent-mark-start]
2.185: <span class="o">[</span>CMS-concurrent-mark: 0.013/0.013 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.08 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.02 secs]
2.185: <span class="o">[</span>CMS-concurrent-preclean-start]
2.186: <span class="o">[</span>CMS-concurrent-preclean: 0.001/0.001 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.00 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.00 secs]
2.186: <span class="o">[</span>CMS-concurrent-abortable-preclean-start]
2.637: <span class="o">[</span>CMS-concurrent-abortable-preclean: 0.169/0.451 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>2.13 <span class="nv">sys</span><span class="o">=</span>0.05, <span class="nv">real</span><span class="o">=</span>0.45 secs]
2.637: <span class="o">[</span>GC <span class="o">(</span>CMS Final Remark<span class="o">)</span> <span class="o">[</span>YG occupancy: 54140 K <span class="o">(</span>78656 K<span class="o">)]</span>2.637: <span class="o">[</span>Rescan <span class="o">(</span>parallel<span class="o">)</span> , 0.0284352 secs]2.666: <span class="o">[</span>weak refs processing, 0.0001802 secs]2.666: <span class="o">[</span>class unloading, 0.0051609 secs]2.671: <span class="o">[</span>scrub symbol table, 0.0035550 secs]2.675: <span class="o">[</span>scrub string table, 0.0008166 secs][1 CMS-remark: 29844K<span class="o">(</span>174784K<span class="o">)]</span> 83984K<span class="o">(</span>253440K<span class="o">)</span>, 0.0391194 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.21 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.04 secs]
2.676: <span class="o">[</span>CMS-concurrent-sweep-start]
2.688: <span class="o">[</span>CMS-concurrent-sweep: 0.011/0.012 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.06 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.01 secs]
2.688: <span class="o">[</span>CMS-concurrent-reset-start]
2.696: <span class="o">[</span>CMS-concurrent-reset: 0.008/0.008 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.04 <span class="nv">sys</span><span class="o">=</span>0.01, <span class="nv">real</span><span class="o">=</span>0.01 secs]
3.044: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 3.044: <span class="o">[</span>ParNew: 78656K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0251656 secs] 97836K->40288K<span class="o">(</span>253440K<span class="o">)</span>, 0.0252453 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.14 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.03 secs]
3.677: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 3.677: <span class="o">[</span>ParNew: 78656K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0159650 secs] 110240K->50554K<span class="o">(</span>253440K<span class="o">)</span>, 0.0160374 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.06 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.01 secs]
4.851: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 4.851: <span class="o">[</span>ParNew: 78656K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0172068 secs] 120506K->61047K<span class="o">(</span>253440K<span class="o">)</span>, 0.0172778 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.10 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.02 secs]
6.191: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 6.192: <span class="o">[</span>ParNew: 78656K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0271375 secs] 130999K->77488K<span class="o">(</span>253440K<span class="o">)</span>, 0.0272281 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.15 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.03 secs]
6.219: <span class="o">[</span>GC <span class="o">(</span>CMS Initial Mark<span class="o">)</span> <span class="o">[</span>1 CMS-initial-mark: 68784K<span class="o">(</span>174784K<span class="o">)]</span> 78713K<span class="o">(</span>253440K<span class="o">)</span>, 0.0030824 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.02 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.00 secs]
6.222: <span class="o">[</span>CMS-concurrent-mark-start]
6.288: <span class="o">[</span>CMS-concurrent-mark: 0.057/0.066 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.39 <span class="nv">sys</span><span class="o">=</span>0.01, <span class="nv">real</span><span class="o">=</span>0.07 secs]
6.288: <span class="o">[</span>CMS-concurrent-preclean-start]
6.291: <span class="o">[</span>CMS-concurrent-preclean: 0.002/0.002 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.01 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.00 secs]
6.291: <span class="o">[</span>CMS-concurrent-abortable-preclean-start]
7.113: <span class="o">[</span>CMS-concurrent-abortable-preclean: 0.775/0.822 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>3.79 <span class="nv">sys</span><span class="o">=</span>0.09, <span class="nv">real</span><span class="o">=</span>0.83 secs]
7.113: <span class="o">[</span>GC <span class="o">(</span>CMS Final Remark<span class="o">)</span> <span class="o">[</span>YG occupancy: 45871 K <span class="o">(</span>78656 K<span class="o">)]</span>7.113: <span class="o">[</span>Rescan <span class="o">(</span>parallel<span class="o">)</span> , 0.0072989 secs]7.121: <span class="o">[</span>weak refs processing, 0.0005665 secs]7.121: <span class="o">[</span>class unloading, 0.0092666 secs]7.131: <span class="o">[</span>scrub symbol table, 0.0150502 secs]7.146: <span class="o">[</span>scrub string table, 0.0012746 secs][1 CMS-remark: 68784K<span class="o">(</span>174784K<span class="o">)]</span> 114655K<span class="o">(</span>253440K<span class="o">)</span>, 0.0346254 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.12 <span class="nv">sys</span><span class="o">=</span>0.01, <span class="nv">real</span><span class="o">=</span>0.03 secs]
7.148: <span class="o">[</span>CMS-concurrent-sweep-start]
7.185: <span class="o">[</span>CMS-concurrent-sweep: 0.035/0.037 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.24 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.04 secs]
7.185: <span class="o">[</span>CMS-concurrent-reset-start]
7.193: <span class="o">[</span>CMS-concurrent-reset: 0.008/0.008 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.05 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.01 secs]
8.242: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 8.242: <span class="o">[</span>ParNew: 78656K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0260379 secs] 136080K->77426K<span class="o">(</span>253440K<span class="o">)</span>, 0.0261191 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.17 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.03 secs]
8.893: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 8.893: <span class="o">[</span>ParNew: 78656K->8703K<span class="o">(</span>78656K<span class="o">)</span>, 0.0166601 secs] 147378K->85555K<span class="o">(</span>253440K<span class="o">)</span>, 0.0167357 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.10 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.01 secs]
9.279: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 9.280: <span class="o">[</span>ParNew: 78655K->5650K<span class="o">(</span>78656K<span class="o">)</span>, 0.0039284 secs] 155507K->82501K<span class="o">(</span>253440K<span class="o">)</span>, 0.0040061 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.02 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.00 secs]
9.905: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 9.906: <span class="o">[</span>ParNew: 75602K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0143583 secs] 152453K->89562K<span class="o">(</span>253440K<span class="o">)</span>, 0.0144558 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.08 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.01 secs]
10.602: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 10.602: <span class="o">[</span>ParNew: 78656K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0364558 secs] 159514K->104422K<span class="o">(</span>253440K<span class="o">)</span>, 0.0365486 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.14 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.04 secs]
11.523: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 11.523: <span class="o">[</span>ParNew: 78656K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0380868 secs] 174374K->123989K<span class="o">(</span>253440K<span class="o">)</span>, 0.0381740 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.23 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.04 secs]
12.274: <span class="o">[</span>GC <span class="o">(</span>Allocation Failure<span class="o">)</span> 12.274: <span class="o">[</span>ParNew: 78656K->8704K<span class="o">(</span>78656K<span class="o">)</span>, 0.0313268 secs] 193941K->136954K<span class="o">(</span>253440K<span class="o">)</span>, 0.0314313 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.17 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.03 secs]
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">jstat</code> man-pages say that <code class="language-plaintext highlighter-rouge">FGC</code> stands for <code class="language-plaintext highlighter-rouge">the number of Full GC events</code>.
Not the number of garbage collection cycles but the number of garbage collection events.
The significance of <code class="language-plaintext highlighter-rouge">events</code> becomes evident if we compare 4 Full GC events reported by jstat to 2 CMS cycles that we can observe in the GC logs.</p>
<p>Let’s take one more look at the GC logs, this time narrowing our focus to a single CMS cycle:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2.170: <span class="o">[</span>GC <span class="o">(</span>CMS Initial Mark<span class="o">)</span> <span class="o">[</span>1 CMS-initial-mark: 29844K<span class="o">(</span>174784K<span class="o">)]</span> 39075K<span class="o">(</span>253440K<span class="o">)</span>, 0.0021170 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.02 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.00 secs]
2.172: <span class="o">[</span>CMS-concurrent-mark-start]
2.185: <span class="o">[</span>CMS-concurrent-mark: 0.013/0.013 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.08 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.02 secs]
2.185: <span class="o">[</span>CMS-concurrent-preclean-start]
2.186: <span class="o">[</span>CMS-concurrent-preclean: 0.001/0.001 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.00 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.00 secs]
2.186: <span class="o">[</span>CMS-concurrent-abortable-preclean-start]
2.637: <span class="o">[</span>CMS-concurrent-abortable-preclean: 0.169/0.451 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>2.13 <span class="nv">sys</span><span class="o">=</span>0.05, <span class="nv">real</span><span class="o">=</span>0.45 secs]
2.637: <span class="o">[</span>GC <span class="o">(</span>CMS Final Remark<span class="o">)</span> <span class="o">[</span>YG occupancy: 54140 K <span class="o">(</span>78656 K<span class="o">)]</span>2.637: <span class="o">[</span>Rescan <span class="o">(</span>parallel<span class="o">)</span> , 0.0284352 secs]2.666: <span class="o">[</span>weak refs processing, 0.0001802 secs]2.666: <span class="o">[</span>class unloading, 0.0051609 secs]2.671: <span class="o">[</span>scrub symbol table, 0.0035550 secs]2.675: <span class="o">[</span>scrub string table, 0.0008166 secs][1 CMS-remark: 29844K<span class="o">(</span>174784K<span class="o">)]</span> 83984K<span class="o">(</span>253440K<span class="o">)</span>, 0.0391194 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.21 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.04 secs]
2.676: <span class="o">[</span>CMS-concurrent-sweep-start]
2.688: <span class="o">[</span>CMS-concurrent-sweep: 0.011/0.012 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.06 <span class="nv">sys</span><span class="o">=</span>0.00, <span class="nv">real</span><span class="o">=</span>0.01 secs]
2.688: <span class="o">[</span>CMS-concurrent-reset-start]
2.696: <span class="o">[</span>CMS-concurrent-reset: 0.008/0.008 secs] <span class="o">[</span>Times: <span class="nv">user</span><span class="o">=</span>0.04 <span class="nv">sys</span><span class="o">=</span>0.01, <span class="nv">real</span><span class="o">=</span>0.01 secs]
</code></pre></div></div>
<p>We can see that CMS performs its work in several phases and most of them run concurrently with our application.
Only Initial Mark and Final Remark are stop-the-world phases and those are the ones that get counted by jstat as Full GC events.</p>
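<p>This relation can be cross-checked against the GC log itself. The sketch below assumes the log format shown above (it will not match logs written in other formats), and both function names are hypothetical. It counts the stop-the-world CMS phases and the CMS cycles separately.</p>

```sh
#!/bin/sh
# Hypothetical helpers; $1 is the path to a GC log in the format shown above.

# Count stop-the-world CMS phases (Initial Mark and Final Remark).
count_cms_pauses() {
    grep -c -E 'GC \(CMS (Initial Mark|Final Remark)\)' "$1"
}

# Count CMS cycles, taking each Initial Mark as the start of one cycle.
count_cms_cycles() {
    grep -c -F 'GC (CMS Initial Mark)' "$1"
}
```

<p>For the log above the first function reports 4 and the second reports 2, which lines up with the 4 Full GC events reported by jstat and the 2 CMS cycles visible in the log.</p>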
<p>Whenever you measure something, make sure you know what you are really measuring; otherwise the results can lead you far from reality.
Always validate your assumptions and <a href="https://xkcd.com/293/">RTFM</a>!</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Tom DeMarco <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="http://www.linuxatemyram.com/">Linux borrows unused memory for disk caching, which makes it look like you are low on memory when you are not</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>while we speak about ambiguity it’s worth noting that there is no formal definition of Full GC or Major GC in the JVM Specification (kudos to <a href="https://twitter.com/eckes">Bernd Eckenfels</a> for pointing this out) <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Kamil Szymańskikamil.szymanski.dev@gmail.comHow results of your measurements can drift from reality when you don't validate your assumptionsFine-tuning psql2016-01-08T10:24:39+01:002016-01-08T10:24:39+01:00https://kamilszymanski.github.io/fine-tuning-psql<p>I like tools that allow you to stay efficient and one such tool is psql.
I have not yet seen any other PostgreSQL client that can beat psql in terms of productivity.
There are other clients that are better tailored for some types of activities but in terms of overall productivity none of them has been able to beat psql.</p>
<p>Ok, let’s skip the introduction and take a look at a fraction<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> of existing configuration options that allow you to tailor psql for your needs and get a perspective of what’s possible.</p>
<h3 id="environment-aware-command-history">Environment-aware command history</h3>
<p>Like any good shell psql saves the commands you execute to allow you to search and re-execute them.
By default all commands are stored in a single file (<code class="language-plaintext highlighter-rouge">~/.psql_history</code>) but you can easily store command history per-database, per-user, per-host, etc.
For example to store separate command history per-database and not let it be overwritten by commands from a more frequently used database just <code class="language-plaintext highlighter-rouge">\set HISTFILE ~/.psql_history- :DBNAME</code> in <code class="language-plaintext highlighter-rouge">~/.psqlrc</code>.</p>
<h3 id="changing-the-prompt">Changing the prompt</h3>
<p>Let’s <code class="language-plaintext highlighter-rouge">\set PROMPT1 '%[%033[1m%]%M %n@%/%[%033[0m%]%# '</code> and see what it looks like compared to the default one.</p>
<p><img src="../assets/images/posts/fine-tuning-psql/changing-the-prompt.png" alt="alt text" title="changing the prompt" /></p>
<p>If you want to deconstruct the incantation we typed then it goes like this:
<code class="language-plaintext highlighter-rouge">%[%033[1m%]</code> sets the font to bold black<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, then it prints the hostname (<code class="language-plaintext highlighter-rouge">%M</code>), username (<code class="language-plaintext highlighter-rouge">%n</code>) and database name (<code class="language-plaintext highlighter-rouge">%/</code>), then sets the font back to non-bold black (<code class="language-plaintext highlighter-rouge">%[%033[0m%]</code>) and prints <code class="language-plaintext highlighter-rouge">#</code> if the user is a superuser or <code class="language-plaintext highlighter-rouge">></code> otherwise.</p>
<p>As you can probably guess besides the <code class="language-plaintext highlighter-rouge">PROMPT1</code> there is also <code class="language-plaintext highlighter-rouge">PROMPT2</code> and even <code class="language-plaintext highlighter-rouge">PROMPT3</code>.
<code class="language-plaintext highlighter-rouge">PROMPT1</code> is used when psql requests a new command, <code class="language-plaintext highlighter-rouge">PROMPT2</code> is used when you input a multiline command and <code class="language-plaintext highlighter-rouge">PROMPT3</code> is used when you are expected to type in row values while running <code class="language-plaintext highlighter-rouge">SQL COPY</code>.</p>
<h3 id="tab-completion">Tab completion</h3>
<p>If you want to use uppercase SQL keywords to make your queries more readable then tab completion is what you are looking for.
Tab completion makes it a no-brainer once you <code class="language-plaintext highlighter-rouge">\set COMP_KEYWORD_CASE upper</code>.</p>
<p><img src="../assets/images/posts/fine-tuning-psql/tab-completion.gif" alt="alt text" title="tab completion" /></p>
<h3 id="printing-null-values">Printing <code class="language-plaintext highlighter-rouge">NULL</code> values</h3>
<p>By default psql prints <code class="language-plaintext highlighter-rouge">NULL</code> values as blank spaces, but you can alter it by setting <code class="language-plaintext highlighter-rouge">\pset null '<null>'</code><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
<p><img src="../assets/images/posts/fine-tuning-psql/printing-null-values.png" alt="alt text" title="printing NULL values" /></p>
<h3 id="pager-behavior">Pager behavior</h3>
<p>By default psql uses a pager to paginate text when it deems it necessary but you can make it always use one (<code class="language-plaintext highlighter-rouge">\pset pager always</code>) or even disable it (<code class="language-plaintext highlighter-rouge">\pset pager off</code>).
You can even change the pager itself by setting <code class="language-plaintext highlighter-rouge">PAGER</code> environment variable.</p>
<h3 id="making-configuration-changes-persistent">Making configuration changes persistent</h3>
<p>All of the mentioned options besides the command history file configuration can be set directly in an active psql session and are valid for its duration.
To make configuration changes persistent they have to be set in <code class="language-plaintext highlighter-rouge">~/.psqlrc</code>.</p>
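<p>Put together, the settings from this post could be collected as follows. This is a sketch rather than a recommended configuration: <code class="language-plaintext highlighter-rouge">write_psqlrc</code> is a hypothetical helper that writes the lines to whatever path you pass, so that an existing <code class="language-plaintext highlighter-rouge">~/.psqlrc</code> is not overwritten by accident.</p>

```sh
#!/bin/sh
# Hypothetical helper: write the psql settings discussed above to the
# file given as $1. The heredoc is quoted, so the shell expands nothing.
write_psqlrc() {
    cat > "${1:?target path required}" <<'EOF'
\set HISTFILE ~/.psql_history- :DBNAME
\set PROMPT1 '%[%033[1m%]%M %n@%/%[%033[0m%]%# '
\set COMP_KEYWORD_CASE upper
\pset null '<null>'
\pset pager always
EOF
}

# Possible usage: write to a scratch file first and review it.
#   write_psqlrc /tmp/psqlrc-example
```

<p>To try the generated file without touching your own configuration you can point the <code class="language-plaintext highlighter-rouge">PSQLRC</code> environment variable at it when starting psql.</p>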
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>for more configuration options consult <a href="http://www.postgresql.org/docs/9.4/static/app-psql.html">PostgreSQL documentation</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>it turns bold white on a black background <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>we have to use <code class="language-plaintext highlighter-rouge">pset</code> instead of <code class="language-plaintext highlighter-rouge">set</code> because we’re affecting query output <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Kamil Szymańskikamil.szymanski.dev@gmail.comFine-tuning PostgreSQL CLI client