Benchmarks & Experiments,
With the Apache Web Server
(Mostly)

Red Rozenglass

Monday, March 24th, 2025

Did some silly overhead benchmarks, mostly with Apache (v2.4.62, with the 'event' MPM), but some Node.js (v18.20.6) and NGINX (v1.26.3) numbers are provided for reference. As a baseline, I tested serving files directly from disk, then compared that against CGI, FastCGI, SSI, a custom Apache module, Node.js reverse-proxied through Apache, and plain Node.js without TLS bypassing Apache entirely; finally, I served plain files off of disk with NGINX in its default configuration. In all cases I served a pre-serialized JSON string of around 4KiB. All done on a 2vCPU cloud KVM VM running Slackware, with 4GiB of memory, over the internet, with 75ms to 80ms of ping latency.

Apache From File on Disk

As expected, serving files directly from disk is blazing fast, with very low latency. As concurrency increases, latency increases too.

ab -k -c 500 -n 10000 https://benchmark.link/vault/public/readtest.json
Requests per second:    4860.45 [#/sec] (mean)
Time per request:       102.871 [ms] (mean)
    
ab -k -c 200 -n 10000 https://benchmark.link/vault/public/readtest.json
Requests per second:    2125.12 [#/sec] (mean)
Time per request:       94.112 [ms] (mean)
    
ab -k -c 100 -n 10000 https://benchmark.link/vault/public/readtest.json
Requests per second:    1346.39 [#/sec] (mean)
Time per request:       74.272 [ms] (mean)
    
ab -k -c 800 -n 10000 https://benchmark.link/vault/public/readtest.json
Requests per second:    996.95 [#/sec] (mean)
Time per request:       802.449 [ms] (mean)
    

CGI (mod_cgid)

Serving the output of a simple C program. The program has the entire JSON string embedded in static memory, so no file reading happens at runtime. CGI does a fork(), then the C program prints the response. Here mod_cgid is used, as opposed to mod_cgi: mod_cgid talks over a socket to a small CGI daemon, which does the fork() itself. That makes fork() way lighter than forking from the massive Apache process, since fork() potentially has to replicate a lot of state in that case.

The overhead of CGI, compared to serving files directly from disk, is around a whole order of magnitude: what could do some 4,000 requests per second now does some 400. (Yes, I compiled with -O3, you don't have to ask :) not that it would be relevant in this case.)
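
For reference, the program is about as simple as C gets. Something along these lines (a minimal sketch; the embedded JSON here is a stand-in for the real ~4KiB string):

/* overhead.c -- sketch of the CGI test program.
   Build: cc -O3 -o overhead.cgi overhead.c */
#include <stdio.h>

/* The real test used a ~4KiB pre-serialized JSON string. */
static const char json[] = "{\"status\": \"ok\", \"items\": [1, 2, 3]}";

int main(void)
{
    /* CGI response: headers, blank line, then the body. */
    printf("Content-Type: application/json\r\n");
    printf("Content-Length: %zu\r\n\r\n", sizeof(json) - 1);
    fwrite(json, 1, sizeof(json) - 1, stdout);
    return 0;
}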

ab -k -c 200 -n 10000 https://benchmark.link/cgi-bin/overhead.cgi
Requests per second:    427.46 [#/sec] (mean)
Time per request:       467.875 [ms] (mean)
    
ab -k -c 100 -n 10000 https://benchmark.link/cgi-bin/overhead.cgi
Requests per second:    285.08 [#/sec] (mean)
Time per request:       350.784 [ms] (mean)
    
ab -k -c 500 -n 10000 https://benchmark.link/cgi-bin/overhead.cgi
Requests per second:    442.78 [#/sec] (mean)
Time per request:       1129.236 [ms] (mean)
    

FastCGI (mod_fcgid)

This one is weird. FastCGI was supposed to be faster than CGI because it doesn't fork() a new process per request; instead, it starts a process pool, where each process blocks in a while() loop waiting for a request to be passed to it. So each process can keep serving requests indefinitely without having to be started again.

There's almost no difference in speed between FastCGI and CGI in my tests. No idea why. The same C code is used, except linked against libfcgi and wrapped in a while (FCGI_Accept() >= 0) loop. The string is in static memory and never gets reallocated or copied. I tried limiting the maximum number of FastCGI processes, just in case, but that didn't help.

The conclusion I have to face is that either FastCGI is slower than it should be and no one cares because no one uses it, or FastCGI was made when CGI was slow, and CGI is no longer as slow as it used to be, especially with the CGI daemon making fork() cheaper.
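
The wrapped version is essentially this (a sketch, under the same assumptions as the CGI one above):

/* overhead-fcgi.c -- the same program wrapped in the FastCGI
   accept loop. Build: cc -O3 -o overhead.fcgi overhead-fcgi.c -lfcgi */
#include <fcgi_stdio.h> /* remaps stdio calls onto the FastCGI streams */

static const char json[] = "{\"status\": \"ok\", \"items\": [1, 2, 3]}";

int main(void)
{
    /* Each pooled process blocks here and serves requests in a loop,
       instead of being fork()ed once per request. */
    while (FCGI_Accept() >= 0) {
        printf("Content-Type: application/json\r\n");
        printf("Content-Length: %zu\r\n\r\n", sizeof(json) - 1);
        fwrite(json, 1, sizeof(json) - 1, stdout);
    }
    return 0;
}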

# FcgidMaxProcesses 1000 (default)
ab -k -c 200 -n 10000 https://benchmark.link/fcgi-bin/overhead.fcgi
Requests per second:    430.68 [#/sec] (mean)
Time per request:       464.386 [ms] (mean)
    
ab -k -c 100 -n 10000 https://benchmark.link/fcgi-bin/overhead.fcgi
Requests per second:    286.69 [#/sec] (mean)
Time per request:       348.804 [ms] (mean)
    
ab -k -c 500 -n 10000 https://benchmark.link/fcgi-bin/overhead.fcgi
Requests per second:    510.85 [#/sec] (mean)
Time per request:       978.768 [ms] (mean)
    
# FcgidMaxProcesses 8
ab -k -c 200 -n 10000 https://benchmark.link/fcgi-bin/overhead.fcgi
Requests per second:    421.93 [#/sec] (mean)
Time per request:       474.017 [ms] (mean)
    

Apache Custom C Module

I wrote this Apache module to do the same simple task of writing out a JSON string from static memory. Writing and loading a custom Apache module was surprisingly easy (definitely less overwhelming than writing NGINX modules). The custom module was very fast, comparable to serving the files directly, but with an obvious drop in throughput, and a rise in latency, under high-concurrency loads.

I did not delve deep into the making of the module. Perhaps further optimizations can be done, since writing a module provides a lot of flexibility and control.
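
A minimal handler module of this kind fits on one page. Here is a sketch (the handler and module names are illustrative, not necessarily my exact code); it builds and installs with apxs -c -i -a mod_overhead.c, and gets mapped to a URL with SetHandler:

/* mod_overhead.c -- sketch of a minimal Apache 2.4 content handler.
   Map it in httpd.conf with:
     <Location "/overhead">
         SetHandler overhead
     </Location> */
#include <string.h>
#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"

static const char json[] = "{\"status\": \"ok\", \"items\": [1, 2, 3]}";

static int overhead_handler(request_rec *r)
{
    /* Decline unless this request was mapped to our handler name. */
    if (!r->handler || strcmp(r->handler, "overhead") != 0)
        return DECLINED;

    ap_set_content_type(r, "application/json");
    ap_rwrite(json, sizeof(json) - 1, r);
    return OK;
}

static void overhead_register_hooks(apr_pool_t *pool)
{
    ap_hook_handler(overhead_handler, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA overhead_module = {
    STANDARD20_MODULE_STUFF,
    NULL,                   /* per-directory config creator */
    NULL,                   /* per-directory config merger */
    NULL,                   /* per-server config creator */
    NULL,                   /* per-server config merger */
    NULL,                   /* command table */
    overhead_register_hooks /* hook registration */
};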

ab -k -c 200 -n 10000 https://benchmark.link/overhead
Requests per second:    2002.68 [#/sec] (mean)
Time per request:       99.866 [ms] (mean)
    
ab -k -c 100 -n 10000 https://benchmark.link/overhead
Requests per second:    1087.55 [#/sec] (mean)
Time per request:       91.950 [ms] (mean)
    
ab -k -c 500 -n 10000 https://benchmark.link/overhead
Requests per second:    1012.46 [#/sec] (mean)
Time per request:       493.847 [ms] (mean)
    

Apache Custom C Module, No TLS

To check the overhead of TLS, the same Apache module tests are repeated, but with HTTP instead of HTTPS. The result is generally 10% to 15% faster than with TLS.

ab -k -c 200 -n 10000 http://benchmark.link/overhead
Requests per second:    2342.55 [#/sec] (mean)
Time per request:       85.377 [ms] (mean)
    
ab -k -c 100 -n 10000 http://benchmark.link/overhead
Requests per second:    1151.10 [#/sec] (mean)
Time per request:       86.873 [ms] (mean)
    
ab -k -c 500 -n 10000 http://benchmark.link/overhead
Requests per second:    1339.46 [#/sec] (mean)
Time per request:       373.285 [ms] (mean)
    

Node.js, Reverse Proxy, TLS

This is a Node.js application, using the Express.js framework, sending the same pre-serialized JSON string through Apache as a reverse proxy. The performance is close to the Apache C module, but slightly worse, except under the 200-concurrent-requests load, where it's much lower for some reason. I tried this multiple times; the anomaly persisted.

Of course this does not reflect a typical Node.js application, as we're not doing much in JS; no massive allocations forcing crazy garbage collection, no 300 dependencies, no thick frameworks, etc. This test is more like a test of libuv and the Node.js C++ runtime.
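
For reference, the proxy side of this is just a couple of mod_proxy directives; a sketch, assuming mod_proxy and mod_proxy_http are loaded and the Node.js app listens on port 3000:

ProxyPass        "/overhead-node" "http://127.0.0.1:3000/"
ProxyPassReverse "/overhead-node" "http://127.0.0.1:3000/"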

ab -k -c 200 -n 10000 https://benchmark.link/overhead-node
Requests per second:    1334.62 [#/sec] (mean)
Time per request:       149.856 [ms] (mean)
    
ab -k -c 100 -n 10000 https://benchmark.link/overhead-node
Requests per second:    1078.57 [#/sec] (mean)
Time per request:       92.716 [ms] (mean)
    
ab -k -c 500 -n 10000 https://benchmark.link/overhead-node
Requests per second:    861.28 [#/sec] (mean)
Time per request:       580.531 [ms] (mean)
    

Node.js, Direct Without Apache, No TLS

Hitting Node.js directly: no reverse proxy, no TLS. It is still generally slower than the Apache C module (without TLS), but seems to handle high-concurrency loads better.

ab -k -c 200 -n 10000 http://benchmark.link:3000/
Requests per second:    2080.56 [#/sec] (mean)
Time per request:       96.128 [ms] (mean)
    
ab -k -c 100 -n 10000 http://benchmark.link:3000/
Requests per second:    1202.77 [#/sec] (mean)
Time per request:       83.141 [ms] (mean)
    
ab -k -c 500 -n 10000 http://benchmark.link:3000/
Requests per second:    1833.58 [#/sec] (mean)
Time per request:       272.690 [ms] (mean)
    

SSI (#include file=readtest.json)

A simple Server Side Include of the plain JSON file in an outer .shtml file. The .shtml file doesn't really contain anything other than the SSI include. SSI was almost as slow as CGI, which is a bit surprising, or maybe not. I expect that when a file isn't processed for SSI directives, the server can just use the sendfile() syscall to send it directly, without the data ever leaving the kernel. But when SSI is enabled for a file, the server has to read the file itself, parse it for directives, splice the output together, and so on.
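
For reference, the entire readtest.shtml is just the one directive:

<!--#include file="readtest.json" -->

plus a couple of lines in the Apache config to run it through mod_include, something like:

Options +Includes
AddOutputFilter INCLUDES .shtml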

ab -k -c 200 -n 10000 https://benchmark.link/readtest.shtml
Requests per second:    461.16 [#/sec] (mean)
Time per request:       433.689 [ms] (mean)
    
ab -k -c 100 -n 10000 https://benchmark.link/readtest.shtml
Requests per second:    286.69 [#/sec] (mean)
Time per request:       348.804 [ms] (mean)
    
ab -k -c 500 -n 10000 https://benchmark.link/readtest.shtml
Requests per second:    505.00 [#/sec] (mean)
Time per request:       990.094 [ms] (mean)
    

NGINX From File on Disk

Reading the JSON file from disk, with the default configuration. Nothing fancy here, except that I thought NGINX would be faster than Apache, because everyone says so. But I guess modern Apache is fast enough to even surpass NGINX in some cases. This is probably the magic of the event MPM at play. (Note that these NGINX runs are plain HTTP, while the Apache from-disk tests above went over HTTPS.)

ab -k -c 200 -n 10000 http://benchmark.link/readtest.json
Requests per second:    2339.14 [#/sec] (mean)
Time per request:       85.501 [ms] (mean)
    
ab -k -c 100 -n 10000 http://benchmark.link/readtest.json
Requests per second:    1220.01 [#/sec] (mean)
Time per request:       81.967 [ms] (mean)
    
ab -k -c 500 -n 10000 http://benchmark.link/readtest.json
Requests per second:    2539.75 [#/sec] (mean)
Time per request:       196.870 [ms] (mean)
    
ab -k -c 800 -n 10000 http://benchmark.link/readtest.json
Requests per second:    2351.73 [#/sec] (mean)
Time per request:       340.175 [ms] (mean)
    

Conclusion

I hope you find this useful, but remember: these are micro-benchmarks, which means they're lies. Always benchmark with prototypes of your real code to get an accurate picture. And remember, you most likely don't need thousands of concurrent requests. CGI will most likely serve most of your cases, and you can always rewrite later into something faster. Premature optimization and all that. And to be honest, nothing beats SSI including some AWK CGI script in your HTML, and watching the cute HTML streams load incrementally without any "modern" JavaScript. Check out cgit and snac2 for some fun "web" code bases that aren't the modern soulless bloat.


If you have comments or corrections, feel free to throw me an email. Check the homepage for my PGP public key. You can get my email from it. To make it easier on you, it's just my last-name at current domain :)