ZEON256 ZEON256
← BACK

C++ `max_element` vs `std::max` loop

ZEON256
ZEON256 2026-05-21 2 min read
c++
Target Audience C++ developers Performance engineers
AI Summary
Comparing std::max_element vs a manual std::max loop in C++. While max_element is simpler, it's 6.39x slower on large arrays because it finds an iterator then dereferences it rather than tracking the max value directly. Also covers the UB pitfall on empty vectors.

Note

This was originally meant to be a short TIL on how I discovered std::max_element to use for finding the max element in a vector but I ended going down the rabbit hole lmao so hopefully this blog will be a good summary of what I learned.

Finding Max element

I kept having to do this common pattern (along with other similar variations of it) in C++.

C++
auto find_max(std::span<const int> data) -> int {
    int hi = std::numeric_limits<int>::min();

    for (int d : data) {
        hi = std::max(hi, d);
    }

    return hi;
}

Turns out you can just use std::max_element. Naively I thought you can just do this.

C++
auto find_max(std::span<const int> data) -> int {
    return *std::max_element(data.begin(), data.end());
}

However these 2 code are not identical. The original has a fallback of std::numeric_limits<int>::min() if std::span is empty but in the max_element case it will be undefined behaviour. So the simple fix is to have the fallback

C++
auto find_max(std::span<const int> data) -> int {
    return data.empty() ? 
        std::numeric_limits<int>::min() :
        *std::max_element(data.begin(), data.end());
}

Godbolt

Google Benchmark

Here are the results

Build flags: -std=c++20 -O3 -mcpu=apple-m1 -DNDEBUG

Input: 100,000,000 int values, generated once with fixed seed 0xC0FFEE and shared by both benchmarks.

ImplementationReal TimeCPU TimeIterationsThroughputRelative
std::max_element65.76 ms65.67 ms211.52 Gitems/s1.00x
manual loop with std::max10.29 ms10.24 ms1479.77 Gitems/s6.39x faster

Why std::max + loop is faster

std::max_element does not just compute the max value. It returns an iterator to the first maximum element. That means it has to preserve pointer/index semantics, especially for ties. The compiler/libc++ implementation is effectively doing:

C++
auto best = begin;

for (auto it = begin + 1; it != end; ++it) {
    if (*best < *it) {
        best = it;
    }
}

return best;