abseil / Performance Hints
Jeffrey Dean & Sanjay Ghemawat
2025/12/16
Summary
This document provides general principles and specific techniques for performance tuning, particularly in C++. It covers topics such as the importance of considering performance early, estimation techniques, measurement and profiling tools, API considerations, algorithmic improvements, better memory representation, reducing allocations, avoiding unnecessary work, making the compiler's job easier, code size considerations, parallelization and synchronization, protocol buffer advice, C++-specific advice, and examples of changes that demonstrate multiple techniques.
Content Sections
The importance of thinking about performance
Disregarding performance concerns during development can lead to a flat profile with no obvious hotspots, where the cost is spread thinly across many places, and it is much harder to make significant changes to a system once it is in heavy use. Instead of deferring all performance work, choose the faster alternative when it doesn't significantly impact readability or complexity.
Estimation
To make informed decisions, develop an intuition for how much performance is likely to matter. Consider whether the code is test code, application-specific code, or library code. Use back-of-the-envelope calculations to estimate the performance of different alternatives.
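As a rough illustration of a back-of-the-envelope comparison, the sketch below encodes two assumed figures (about 100 ns per main-memory reference and about 10 GB/s of sequential read bandwidth; real values vary by machine and should be measured) and compares a million cache-missing lookups against a sequential scan of a million 8-byte values. The constant and function names are hypothetical.

```cpp
#include <cstdint>

// Assumed, illustrative figures only; measure on your own hardware.
constexpr double kCacheMissNs = 100.0;   // assumed cost of one main-memory reference
constexpr double kSeqBytesPerNs = 10.0;  // assumed ~10 GB/s sequential read bandwidth

// Estimated nanoseconds for `n` lookups that each miss the cache.
constexpr double RandomLookupNs(std::uint64_t n) { return n * kCacheMissNs; }

// Estimated nanoseconds to stream `bytes` sequentially from memory.
constexpr double SequentialScanNs(std::uint64_t bytes) {
  return bytes / kSeqBytesPerNs;
}

// 1M random lookups: 1e6 * 100 ns = 1e8 ns, i.e. about 100 ms.
static_assert(RandomLookupNs(1'000'000) == 1e8, "lookup estimate");
// Scanning 1M 8-byte values: 8e6 bytes / 10 bytes per ns = 8e5 ns, about 0.8 ms.
static_assert(SequentialScanNs(8'000'000) == 8e5, "scan estimate");
```

Under these assumptions the random lookups come out roughly two orders of magnitude slower than the scan, which is the kind of gap worth knowing about before choosing a data layout.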
Measurement
The ability to measure things effectively is the number one tool you'll want in your arsenal when doing performance-related work.
Profiling tools and tips
Use profiling tools like pprof and perf. Build production binaries with appropriate debugging information and optimization flags. Write microbenchmarks to improve turnaround time and prevent future performance regressions. Use a benchmark library to emit performance counter readings. Profile lock contention.
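A dedicated benchmark library (for example, Google Benchmark) handles iteration counts, warm-up, and keeping the compiler from optimizing away the measured code. The dependency-free sketch below only shows the basic shape of a microbenchmark using std::chrono; `SumVector` and `NanosPerCall` are hypothetical names, and the timing is far less robust than a real library's.

```cpp
#include <chrono>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical function under test.
std::uint64_t SumVector(const std::vector<std::uint64_t>& v) {
  return std::accumulate(v.begin(), v.end(), std::uint64_t{0});
}

// Calls `fn` `iters` times and returns the average nanoseconds per call.
// Results are added to `sink` so they are used; without that, the compiler
// could drop the calls. (Compilers may still hoist pure work out of the loop;
// benchmark libraries provide helpers such as benchmark::DoNotOptimize.)
template <typename Fn>
double NanosPerCall(Fn fn, int iters, std::uint64_t& sink) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) sink += fn();
  const auto end = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::nano> elapsed = end - start;
  return elapsed.count() / iters;
}
```

A caller might run `NanosPerCall([&] { return SumVector(v); }, 1000, sink)` before and after a change; checking such a benchmark in alongside the code is what helps catch later regressions.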
What to do when profiles are flat
Consider many small optimizations, find loops closer to the top of call stacks, look for structural changes higher up in the call stacks, look for overly general code, attempt to reduce the number of allocations, and gather other types of profiles.
API considerations
Organize code so that performance improvements can be made inside an encapsulation boundary without affecting public interfaces. Be careful when adding new features to widely used APIs. Provide bulk ops to reduce expensive API boundary crossings or to take advantage of algorithmic improvements. Prefer view types for function arguments. Allow higher-level callers to pass in pre-allocated/pre-computed arguments. Make a class thread-compatible if callers are already synchronized.
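A minimal sketch of three of these points, using a hypothetical `PriceTable` class: arguments are taken as `std::string_view` (a view type), a bulk `LookupMany` serves many keys per call, and the caller supplies the output vector so it can be reused. `std::map` with `std::less<>` is used only to keep the sketch dependency-free while still allowing lookups by `string_view` without building a temporary `std::string`.

```cpp
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical class illustrating view-type arguments and a bulk operation.
class PriceTable {
 public:
  void Set(std::string_view name, double price) {
    prices_.insert_or_assign(std::string(name), price);
  }

  // Single lookup. std::string_view accepts std::string, string literals,
  // and substrings without the caller constructing a std::string.
  // Returns `missing` if `name` is absent.
  double Lookup(std::string_view name, double missing = -1.0) const {
    auto it = prices_.find(name);  // heterogeneous lookup enabled by std::less<>
    return it == prices_.end() ? missing : it->second;
  }

  // Bulk lookup: one call for many keys. The caller owns `out`, so the same
  // vector (and its capacity) can be reused across calls.
  void LookupMany(const std::vector<std::string_view>& names,
                  std::vector<double>* out, double missing = -1.0) const {
    out->clear();
    out->reserve(names.size());
    for (std::string_view name : names) out->push_back(Lookup(name, missing));
  }

 private:
  std::map<std::string, double, std::less<>> prices_;
};
```

In this toy version the bulk call only saves the per-call overhead; in a real API the bulk entry point is where costs such as acquiring a lock or making an RPC can be paid once per batch instead of once per key.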
Algorithmic improvements
The most critical opportunities for performance improvements come from algorithmic improvements.
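As a small, hypothetical illustration: detecting a duplicate by comparing every pair is O(n²), while inserting each element into a hash set is expected O(n). For large inputs no amount of micro-tuning of the first version closes that gap.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// O(n^2): compares every pair of elements.
bool HasDuplicateQuadratic(const std::vector<int>& v) {
  for (std::size_t i = 0; i < v.size(); ++i)
    for (std::size_t j = i + 1; j < v.size(); ++j)
      if (v[i] == v[j]) return true;
  return false;
}

// Expected O(n): one hash-set insertion per element.
bool HasDuplicateHashed(const std::vector<int>& v) {
  std::unordered_set<int> seen;
  seen.reserve(v.size());
  for (int x : v) {
    // insert() reports false in .second if x was already present.
    if (!seen.insert(x).second) return true;
  }
  return false;
}
```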
Better memory representation
Careful consideration of memory footprint and cache footprint of important data structures can often yield big savings.
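One simple instance of this is field ordering. In the hypothetical structs below, interleaving `bool` and 8-byte fields forces padding after each `bool`; grouping the large fields first lets the two `bool`s share a single padded slot. On a typical 64-bit platform this is 32 bytes versus 24, so more records fit in each cache line.

```cpp
#include <cstdint>

// Careless ordering: padding follows each bool so the next 8-byte field
// is properly aligned.
struct Padded {
  bool active;
  std::int64_t id;
  bool deleted;
  std::int64_t timestamp;
};

// Same fields, largest first: the two bools sit together at the end,
// sharing one chunk of trailing padding.
struct Reordered {
  std::int64_t id;
  std::int64_t timestamp;
  bool active;
  bool deleted;
};
```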
Reduce allocations
Memory allocation adds costs, including time spent in the allocator, expensive initialization and destruction, and a larger cache footprint.
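A minimal sketch of one common technique: compute the final size first and `reserve()` once, rather than letting the string grow and reallocate as pieces are appended. `JoinWithComma` is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Joins `parts` with commas, sizing the result up front.
std::string JoinWithComma(const std::vector<std::string>& parts) {
  std::size_t total = 0;
  for (const std::string& p : parts) total += p.size();
  if (!parts.empty()) total += parts.size() - 1;  // room for the separators

  std::string out;
  out.reserve(total);  // at most one allocation for the result
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) out.push_back(',');
    out.append(parts[i]);
  }
  return out;
}
```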
Avoid unnecessary work
One of the most effective ways to improve performance is to avoid work you don’t have to do. This can take many forms, including creating specialized paths through code for common cases, precomputation, deferring work until it is really needed, and hoisting work into less frequently executed pieces of code.
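A small sketch combining two of these forms, with hypothetical names throughout. Precomputation: records are assumed to have been lowercased once when they were stored. Hoisting: the query is normalized once before the loop rather than on every comparison.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Lowercases ASCII letters only; other bytes are left unchanged.
std::string ToLowerAscii(std::string s) {
  for (char& c : s) {
    if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
  }
  return s;
}

// Counts records equal to `query`, ignoring ASCII case.
// Precondition (assumed for this sketch): every record is already lowercase.
std::size_t CountMatches(const std::vector<std::string>& lowercase_records,
                         const std::string& query) {
  // Hoisted out of the loop: normalize the query exactly once.
  const std::string needle = ToLowerAscii(query);
  std::size_t count = 0;
  for (const std::string& record : lowercase_records) {
    if (record == needle) ++count;
  }
  return count;
}
```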
Make the compiler’s job easier
The application programmer can help the compiler generate better code by rewriting performance-critical code to operate at a lower level.
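One illustration, under the assumption that aliasing is the obstacle: in the first hypothetical function below, `total` and `v` both point to `int`, so the compiler must generally assume `*total` could be an element of `v` and keep re-storing it inside the loop. Accumulating into a local variable removes that possibility, letting the sum stay in a register and making vectorization easier.

```cpp
#include <cstddef>

// The compiler must assume `total` may point into `v`, which typically
// forces a load/store of *total on every iteration.
void SumInto(const int* v, std::size_t n, int* total) {
  for (std::size_t i = 0; i < n; ++i) *total += v[i];
}

// Same result as SumInto as long as `total` does not point into `v`:
// the running sum lives in a local and is stored once at the end.
void SumIntoLocal(const int* v, std::size_t n, int* total) {
  int sum = *total;
  for (std::size_t i = 0; i < n; ++i) sum += v[i];
  *total = sum;
}
```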
Code size considerations
Thinking about the size of the generated code is especially important when writing low-level library code that will be used in many places, or when writing templated code that you expect to be instantiated for many different types.
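A common pattern for templated code is to move type-independent, rarely executed logic into a non-template function so it is not duplicated in every instantiation. In the hypothetical sketch below, each `CheckedAt<T>` contains only the bounds check and the access, while the string-building error path lives in `ThrowIndexError`. In a real codebase that helper would usually be defined in a `.cc` file so a single copy is shared.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Non-template slow path: one definition shared by all CheckedAt<T>.
[[noreturn]] void ThrowIndexError(std::size_t index, std::size_t size) {
  throw std::out_of_range("index " + std::to_string(index) +
                          " out of range for size " + std::to_string(size));
}

// Each instantiation stays small: a compare, a call on the cold path,
// and the element access.
template <typename T>
const T& CheckedAt(const std::vector<T>& v, std::size_t index) {
  if (index >= v.size()) ThrowIndexError(index, v.size());
  return v[index];
}
```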
Parallelization and synchronization
Modern machines have many cores, and they are often underutilized. Expensive work may therefore be completed faster by parallelizing it.
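A minimal sketch of splitting independent work across threads, with a hypothetical `ParallelSum`. Each thread writes only its own slot of `partial`, so no lock is needed, and the partial results are combined after all threads join. For small inputs the cost of starting threads will dominate, so real code would only parallelize sufficiently large work.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Sums `v` using up to `num_threads` threads (at least one).
std::int64_t ParallelSum(const std::vector<std::int64_t>& v,
                         unsigned num_threads) {
  num_threads = std::max(1u, num_threads);
  const std::size_t chunk = (v.size() + num_threads - 1) / num_threads;
  // One result slot per thread. Adjacent slots may share a cache line, but
  // each thread writes its slot only once, so false sharing is negligible.
  std::vector<std::int64_t> partial(num_threads, 0);
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min(v.size(), t * chunk);
    const std::size_t end = std::min(v.size(), begin + chunk);
    workers.emplace_back([&v, &partial, t, begin, end] {
      partial[t] = std::accumulate(v.begin() + begin, v.begin() + end,
                                   std::int64_t{0});
    });
  }
  for (std::thread& w : workers) w.join();
  return std::accumulate(partial.begin(), partial.end(), std::int64_t{0});
}
```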
Protocol Buffer advice
Protobufs are a convenient representation of data, especially if the data will be sent over the wire or stored persistently. However, they can have significant performance costs. Some tips related to protobuf performance: do not use protobufs unnecessarily, avoid unnecessary message hierarchies, and use small field numbers for frequently occurring fields.
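The reason small field numbers help is the wire format: each field's tag is `(field_number << 3) | wire_type`, encoded as a varint with 7 payload bits per byte. Field numbers 1 through 15 therefore need one tag byte and 16 through 2047 need two. The hypothetical helpers below (not part of the protobuf API) compute this size.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to varint-encode `value` (7 payload bits per byte).
std::size_t VarintSize(std::uint64_t value) {
  std::size_t bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}

// Size of a field's tag on the wire. The 3-bit wire type occupies the low
// bits and never changes the byte count, so it is omitted here.
std::size_t TagSize(std::uint32_t field_number) {
  return VarintSize(static_cast<std::uint64_t>(field_number) << 3);
}
```

For a field that appears in most messages, moving it from number 16 to a number at or below 15 saves one byte per occurrence.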
C++-Specific advice
Abseil hash tables (such as absl::flat_hash_map) usually outperform C++ standard library containers such as std::map and std::unordered_map.
CLs that demonstrate multiple techniques
Looking at the kinds of changes in these CLs can be a good way to get into the mindset of making general changes that speed up a part of a system once it has been identified as a bottleneck.