C++ `max_element` vs `std::max` loop
std::max_element vs a manual std::max loop in C++. While max_element is simpler, it's 6.39x slower on large arrays because it finds an iterator then dereferences it rather than tracking the max value directly. Also covers the UB pitfall on empty vectors.Note
This was originally meant to be a short TIL on how I discovered std::max_element to use for finding the max element in a vector but I ended going down the rabbit hole lmao so
hopefully this blog will be a good summary of what I learned.
Finding Max element
I kept having to do this common pattern (along with other similar variations of it) in C++.
auto find_max (std::span < const int > data ) -> int {
int hi = std::numeric_limits < int > ::min ();
for (int d : data ) {
hi = std::max (hi , d );
}
return hi ;
}Turns out you can just use std::max_element. Naively I thought you can just do this.
auto find_max (std::span < const int > data ) -> int {
return * std::max_element (data .begin (), data .end ());
}However these 2 code are not identical. The original has a fallback of std::numeric_limits<int>::min() if std::span is empty but in the max_element case it will be undefined behaviour. So the simple fix is to have the fallback
auto find_max (std::span < const int > data ) -> int {
return data .empty () ?
std::numeric_limits < int > ::min () :
* std::max_element (data .begin (), data .end ());
}Godbolt
Google Benchmark
Here are the results
Build flags: -std=c++20 -O3 -mcpu=apple-m1 -DNDEBUG
Input: 100,000,000 int values, generated once with fixed seed 0xC0FFEE and shared by both benchmarks.
| Implementation | Real Time | CPU Time | Iterations | Throughput | Relative |
|---|---|---|---|---|---|
std::max_element | 65.76 ms | 65.67 ms | 21 | 1.52 Gitems/s | 1.00x |
manual loop with std::max | 10.29 ms | 10.24 ms | 147 | 9.77 Gitems/s | 6.39x faster |
Why std::max + loop is faster
std::max_element does not just compute the max value. It returns an iterator to the first maximum element. That means it has to preserve pointer/index semantics, especially for ties. The compiler/libc++ implementation is effectively doing:
auto best = begin ;
for (auto it = begin + 1 ; it != end ; ++ it ) {
if (* best < * it ) {
best = it ;
}
}
return best ;