I would like to announce the arrival of justbus-rs v0.2.0!

justbus-rs is an API for LTA's bus arrival timings with a focus on performance. Initially, justbus-rs was developed as an example project using lta-rs, but I got carried away making things fast after a wrk benchmark showed it could serve 10x more req/s than a similar project I found on GitHub called arrivelah. After more optimisation, the current version is now 25x faster!

Disclaimer

I have no intention of bashing Node.js or the author of that project. It's a great project; I just wanted to find out the difference in performance between the two languages.

So how did I make it so fast?

Before we move on, here are the benchmarks. They were run on my personal computer with an i7 3770K @ 4.4GHz and 16GB RAM @ 2200MHz.

Bus Timings Benchmark

zeon@zeon-desktop  ~  wrk -c100 -d15s -t4 http://localhost:8080/api/v1/timings/83139
Running 15s test @ http://localhost:8080/api/v1/timings/83139
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.10ms   27.83ms 839.99ms   99.60%
    Req/Sec    64.04k    17.90k   89.25k    46.88%
  3812462 requests in 15.09s, 6.37GB read
  Non-2xx or 3xx responses: 115
Requests/sec: 252570.08
Transfer/sec:    431.87MB

Hello World Benchmark

zeon@zeon-desktop  ~  wrk -c100 -d15s -t4 http://localhost:8080/api/v1/dummy
Running 15s test @ http://localhost:8080/api/v1/dummy
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.33ms    2.74ms  38.44ms   89.56%
    Req/Sec    61.03k    15.00k   92.95k    61.47%
  3643319 requests in 15.10s, 444.74MB read
Requests/sec: 241334.14
Transfer/sec:     29.46MB
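
The dummy endpoint in the second benchmark does no work at all; it is something along these lines (my sketch, not the project's exact code):

use actix_web::HttpResponse;

// Hypothetical hello-world handler: no network call, no cache lookup,
// no serialization, just a static body.
async fn dummy() -> HttpResponse {
    HttpResponse::Ok().body("hello")
}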

You might be thinking: how can the endpoint serving live data be just as fast as the one serving a static hello world?

Let's take a look at the source code:

async fn get_timings(
    bus_stop: web::Path<u32>,
    lru: web::Data<Cache<u32, String>>,
    client: web::Data<LTAClient>,
) -> Result<HttpResponse, JustBusError> {
    let bus_stop = bus_stop.into_inner();

    // Fast path: serve the pre-serialized JSON straight from the cache
    let in_lru = lru.get(bus_stop);
    let res = match in_lru {
        Some(f) => HttpResponse::Ok().content_type("application/json").body(f),
        None => {
            // Cache miss: fetch fresh timings from the LTA API
            let arrivals = get_arrival(&client, bus_stop, None)
                .await
                .map_err(JustBusError::ClientError)?
                .services;

            // Serialize once and cache the JSON string, so subsequent
            // hits skip serde entirely
            let arrival_str = serde_json::to_string(&arrivals).unwrap();

            lru.insert(bus_stop, arrival_str.clone());
            HttpResponse::Ok()
                .content_type("application/json")
                .body(arrival_str)
        }
    };

    Ok(res)
}
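
Not shown above is how the handler and its shared state are wired into the server. Here is a minimal sketch, assuming actix-web 3+; Cache::with_ttl and LTAClient::with_api_key are hypothetical stand-ins for however the real project constructs its state:

use std::time::Duration;
use actix_web::{web, App, HttpServer};

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Hypothetical constructors; the real project's Cache and
    // LTAClient setup lives in its own main.
    let cache = web::Data::new(Cache::with_ttl(Duration::from_secs(15)));
    let client = web::Data::new(LTAClient::with_api_key("api_key"));

    HttpServer::new(move || {
        App::new()
            // Shared state handed to every worker thread; the handler
            // extracts it via web::Data<T>
            .app_data(cache.clone())
            .app_data(client.clone())
            .route("/api/v1/timings/{bus_stop}", web::get().to(get_timings))
    })
    .bind("127.0.0.1:8080")?
    .run()
    .await
}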

As you can see from the code above, the strategy boils down to:

  • Caching responses.
  • Cache<u32, String>: caching the serialized JSON instead of the structs avoids re-serializing on every hit. This optimisation added another 100k req/s over the initial Cache<u32, Vec<ArrivalBusService>>.
  • A lock-free Arc<Cache<u32, String>>. Initially I used Arc<parking_lot::RwLock<Cache<u32, Vec<ArrivalBusService>>>>, which already yielded 100k req/s (still way faster than the Node.js version). See the sketch after this list.
  • Using the fastest web framework available (i.e. Actix-web).
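
To make the caching bullets concrete, here is a small, hypothetical sketch of the before and after read paths. Entry stands in for ArrivalBusService, and a plain HashMap stands in for the project's Cache type; only the shape of the idea is kept:

use std::collections::HashMap;
use std::sync::Arc;
use parking_lot::RwLock;
use serde::Serialize;

#[derive(Serialize, Clone)]
struct Entry {
    service_no: String,
}

// v0.1 style: structs behind a RwLock. Every hit takes a read lock
// and pays for a fresh serde_json::to_string.
fn serve_locked(cache: &Arc<RwLock<HashMap<u32, Vec<Entry>>>>, stop: u32) -> Option<String> {
    let guard = cache.read();
    guard.get(&stop).map(|v| serde_json::to_string(v).unwrap())
}

// v0.2 style: pre-serialized JSON. A hit is just a clone of the cached
// string; the real project additionally swaps the RwLock for a
// lock-free cache so readers never block.
fn serve_preserialized(cache: &HashMap<u32, String>, stop: u32) -> Option<String> {
    cache.get(&stop).cloned()
}

fn main() {
    let locked = Arc::new(RwLock::new(HashMap::from([(
        83139,
        vec![Entry { service_no: "15".into() }],
    )])));
    let preserialized =
        HashMap::from([(83139u32, String::from(r#"[{"service_no":"15"}]"#))]);

    assert!(serve_locked(&locked, 83139).is_some());
    assert!(serve_preserialized(&preserialized, 83139).is_some());
}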

You're not telling me the whole picture; what are the drawbacks?

It's Rust itself. While most of the newer programming languages released within the past decade (e.g. Kotlin, Go) were designed with simplicity in mind to improve developer productivity, Rust has a lot of boilerplate (though not as much as Java). Rust's compile times are also much longer than those of most mainstream languages.

In essence, you are paying in terms of

  • Developer productivity
  • Compile time (it takes a whopping 4 minutes to compile in --release mode on my personal computer)

in exchange for performance and, most importantly, runtime safety.