
Python Profiling Data Is Richer Than pstats Lets On

Source: lobsters

The Python profiling workflow has a gap that’s easy to overlook until you’ve run cProfile a hundred times. The data collection side is mature: cProfile is in the standard library, it works reliably, and as a deterministic profiler it records every call rather than a sample of them. The visualization side has several decent options: snakeviz serves an icicle chart in your browser, pyinstrument produces readable terminal output, and py-spy can attach to a running process without modifying your code. But the middle layer, the part where you navigate the data itself, ask questions about the call graph, and trace paths between a user-facing operation and the functions that consumed its time, has historically been left to pstats.Stats, a programmatic interface that hasn’t changed much since Python 2.

Adam Johnson introduced profiling-explorer as a tool for interactively exploring Python profiling data generated by cProfile. Before getting into what that means specifically, it helps to understand what cProfile actually captures, because the data model is richer than most developers realize.

What cProfile Actually Records

Running cProfile on a function or script populates a Stats object whose underlying structure is a dictionary. Each key is a (filename, line_number, function_name) tuple identifying a function. Each value is a five-element tuple:

  • cc: the primitive call count, excluding recursive calls
  • nc: the total call count, including recursive calls
  • tt: the total time spent in the function body alone, excluding time in subcalls
  • ct: the cumulative time, including all subcalls
  • callers: a dictionary mapping (file, lineno, name) tuples of callers to a tuple of counts and timings (nc, cc, tt, ct) covering only the calls made from that caller

This is more information than most profiling UIs surface. The callers dict means there is a complete, directional call graph embedded in the data: for any given function, you can trace exactly which functions called it and how many times, with timing split by caller. The distinction between tt (time in this function body only) and ct (time in this function and everything it calls) is what lets you distinguish a slow function from a function that calls slow functions.
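To make the data model concrete, here is a minimal sketch that profiles two toy functions and unpacks one entry of the raw dictionary. The functions leaf and parent are made up for illustration, and the sketch reads the stats attribute of pstats.Stats, which is widely used but not formally documented as a stable interface.

```python
import cProfile
import pstats


def leaf(n):
    # Do a little work so the function shows up in the profile.
    return sum(range(n))


def parent():
    # Call leaf() three times so the caller edge records three calls.
    total = 0
    for _ in range(3):
        total += leaf(1000)
    return total


profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

# The raw dictionary: (filename, lineno, name) -> (cc, nc, tt, ct, callers).
raw = pstats.Stats(profiler).stats

# Find the entry for leaf() by function name (third element of the key).
leaf_key = next(key for key in raw if key[2] == "leaf")
cc, nc, tt, ct, callers = raw[leaf_key]

print(f"{leaf_key[2]}: primitive={cc} total={nc} tottime={tt:.6f} cumtime={ct:.6f}")
for (filename, lineno, name), edge in callers.items():
    # With cProfile each edge is a tuple of counts and times for calls made
    # from that caller. There is no recursion here, so the first element is
    # simply the number of calls.
    print(f"  called by {name} ({filename}:{lineno}): {edge[0]} calls")
```

The callers dictionary is keyed by the calling function, so walking it for every entry is enough to rebuild the whole call graph.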

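A minimal collection session looks like this:

import cProfile
import pstats

# Collect a profile around the code of interest.
pr = cProfile.Profile()
pr.enable()
# ... your code ...
pr.disable()

# Load the results, sort by cumulative time, and print the top 20 entries.
s = pstats.Stats(pr)
s.sort_stats('cumulative')
s.print_stats(20)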

The output of print_stats is a fixed-format table. You can then call s.print_callers('my_function') to see what called it, or s.print_callees('my_function') to see what it calls. Each of these requires a separate API call and prints a separate block of text to stdout. The interface is a batch reporting tool, not a navigation interface.
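A small sketch of that batch style, assuming two toy functions named leaf and parent: each question is a separate report, and the only way to get the answer back into your program is to capture the printed text. The report helper is hypothetical, not part of pstats.

```python
import cProfile
import io
import pstats


def leaf(n):
    return sum(range(n))


def parent():
    total = 0
    for _ in range(3):
        total += leaf(1000)
    return total


profiler = cProfile.Profile()
profiler.runcall(parent)


def report(method_name, pattern):
    """Run one pstats report and return its text instead of printing it."""
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
    # The restriction is a regular expression matched against "file:line(name)".
    getattr(stats, method_name)(pattern)
    return buffer.getvalue()


callers_text = report("print_callers", r"\(leaf\)")
callees_text = report("print_callees", r"\(parent\)")
print(callers_text)
print(callees_text)
```

Every step deeper into the call graph means another call like these and another block of text to read.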

The Navigation Problem

The data model supports rich exploration. The API does not. To answer a question like “which call path contributes most to the cumulative time of my HTTP request handler” with raw pstats, you would sort by cumulative time, find your handler, note the functions in its callees output, re-sort or filter for each callee, and repeat. The mental model you’re building is a call graph. The tool you’re using to build it is a text-sorting interface.
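The manual loop described here can be automated against the raw dictionary. The sketch below is one possible approach, not how any particular tool works: it inverts the callers dictionaries into a callee map and greedily follows the edge with the largest cumulative time. The helper names and the toy handler are hypothetical, and it assumes cProfile data, where each caller edge is a 4-tuple whose last element is cumulative time.

```python
import cProfile
import pstats
import time
from collections import defaultdict


def build_callees(raw):
    """Invert the per-function callers dicts into caller -> {callee: edge}."""
    callees = defaultdict(dict)
    for callee, (_cc, _nc, _tt, _ct, callers) in raw.items():
        for caller, edge in callers.items():
            callees[caller][callee] = edge
    return callees


def heaviest_path(raw, root):
    """Greedily follow the callee edge with the largest cumulative time."""
    callees = build_callees(raw)
    path, seen, current = [root], {root}, root
    while True:
        # Skip functions already on the path so recursion cannot loop forever.
        options = {f: e for f, e in callees.get(current, {}).items() if f not in seen}
        if not options:
            return path
        # edge[3] is the cumulative time attributed to this caller -> callee edge.
        current = max(options, key=lambda f: options[f][3])
        path.append(current)
        seen.add(current)


def slow_query():
    time.sleep(0.02)  # stand-in for a slow database call


def validate():
    slow_query()


def render():
    return "ok"


def handler():
    validate()
    render()


profiler = cProfile.Profile()
profiler.runcall(handler)
raw = pstats.Stats(profiler).stats
root = next(key for key in raw if key[2] == "handler")
path = heaviest_path(raw, root)
print(" -> ".join(key[2] for key in path))
```

A greedy walk only finds one path, which is part of why an interactive view of the same graph is more useful than any single query.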

snakeviz solves part of this visually. It serves an icicle chart from cProfile data where widths are proportional to cumulative time. You can click to zoom into a subtree. This is useful for getting a high-level picture of where time goes.

The limitation of a purely visual approach is that large codebases produce dense charts where the interesting functions are small slivers buried deep in the hierarchy. You can see that something is slow; finding the specific call path that explains why requires zooming through multiple levels manually, and the visual representation makes it hard to answer precise questions like “show me every function that takes more than 50ms excluding its subcalls.”
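A precise question like the 50ms one is straightforward to express against the raw dictionary, even though neither the chart nor the pstats reports offer it directly. This is a sketch with a hypothetical helper, functions_over, and a toy workload that sleeps to simulate slow work.

```python
import cProfile
import pstats
import time


def functions_over(raw, min_tottime):
    """Return (key, tottime) pairs whose own time exceeds min_tottime seconds."""
    hits = [
        (key, tt)
        for key, (_cc, _nc, tt, _ct, _callers) in raw.items()
        if tt > min_tottime
    ]
    # Largest own-time first.
    return sorted(hits, key=lambda item: item[1], reverse=True)


def slow_wrapper():
    # The wrapper's cumulative time is large, but its own time is tiny:
    # the time is spent inside the built-in sleep call.
    time.sleep(0.06)


profiler = cProfile.Profile()
profiler.runcall(slow_wrapper)
raw = pstats.Stats(profiler).stats

hits = functions_over(raw, 0.05)
for (filename, lineno, name), tt in hits:
    print(f"{tt * 1000:8.1f} ms  {name} ({filename}:{lineno})")
```

Filtering on tt rather than ct is what keeps wrappers like slow_wrapper out of the result.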

pyinstrument takes a different approach. Rather than working with cProfile output, it runs its own statistical profiler and presents results as a call tree formatted for the terminal. Its output is more readable for quick inspection, but it is its own format and its own runner; it does not help you work with existing .prof files or integrate into workflows that already produce cProfile output.

py-spy goes further in the sampling direction: it can attach to a running Python process without modifying it and produce flame graph output. It is well-suited for production profiling where you cannot modify the application or restart it, but it operates at the process level rather than providing an interface for inspecting saved profile data.

Where profiling-explorer Fits

profiling-explorer, introduced by Adam Johnson, is designed to work with cProfile’s standard .prof output and provide a better interface for navigating it than pstats offers. Adam Johnson has a long track record of filling exactly these kinds of workflow gaps in the Python ecosystem; his tools tend to be small, focused, and correct rather than comprehensive.

The core idea is treating profiling data as something to explore interactively rather than visualize statically or report in bulk. Profiling data is a graph. Graph navigation calls for a graph navigation interface, not a table-sorting API.

A workflow that produces .prof files looks like this:

import cProfile

cProfile.run('my_function()', 'output.prof')

Or from the command line:

python -m cProfile -o output.prof my_script.py

The .prof file is a binary dump, written with Python’s marshal module, of the Stats dictionary described above. Every tool in this space, including snakeviz and profiling-explorer, reads this same format via pstats.Stats('output.prof'). The layout has been stable in practice, so a profile produced under Python 3.8 generally has the same shape as one produced under Python 3.13. The pstats documentation does not guarantee compatibility across Python versions or operating systems, though, so the safest assumption is to read a profile with the Python that wrote it.
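As a quick check of that claim, the sketch below writes a profile to a temporary file and loads it back with marshal directly, bypassing pstats. The work function and the temporary path are placeholders. In practice you would let pstats.Stats do the loading.

```python
import cProfile
import marshal
import os
import pstats
import tempfile


def work():
    return sorted(range(1000), reverse=True)


# Write a profile to disk, as `python -m cProfile -o output.prof` would.
profiler = cProfile.Profile()
profiler.runcall(work)
path = os.path.join(tempfile.mkdtemp(), "output.prof")
profiler.dump_stats(path)

# Load the file with marshal to see the raw structure.
with open(path, "rb") as f:
    raw = marshal.load(f)

for (filename, lineno, name), (cc, nc, tt, ct, callers) in raw.items():
    print(f"{name} ({filename}:{lineno}) calls={nc} tottime={tt:.6f} cumtime={ct:.6f}")

# pstats reads the same file and ends up with the same set of functions.
loaded = pstats.Stats(path)
print(loaded.stats.keys() == raw.keys())
```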

The Conceptual Difference Between Viewing and Exploring

The practical difference becomes concrete when diagnosing something non-obvious. Consider profiling a web handler for a form submission. The call path might look like:

handle_request -> form.is_valid -> field.clean -> validator -> some_db_query

In snakeviz, you would see an icicle and click down from handle_request toward form.is_valid. But if there are twenty fields each with multiple validators and database calls, the chart becomes a dense set of small slivers that are hard to differentiate visually. In pstats, you would call print_callees('is_valid'), then find the slow validator in the output, then call print_callees for that validator, then repeat.

An explorer that shows callers and callees inline, lets you select a function and immediately see its context, and supports filtering and sorting without leaving the interface changes this from a multi-step query process into a single continuous investigation. You stay in the data. The information that was already there in the pstats dictionary structure becomes traversable without repeated API calls.

cProfile’s Limitations Worth Knowing

None of this changes what cProfile captures, and its limitations are worth keeping in mind regardless of which tool you use to explore its output.

cProfile is a deterministic profiler. It hooks every function call and return through the interpreter’s profiling machinery (the C-level counterpart of sys.setprofile, and sys.monitoring on recent Python versions), which means overhead on every call. That overhead is roughly constant per call, so it inflates code that makes many cheap calls far more than code that makes a few expensive ones. In tight loops or deeply recursive code, the instrumentation can skew the picture. For a typical web application where the real bottleneck is database latency, this overhead is negligible relative to the I/O costs you are measuring. For tight numerical code where you are optimizing to the microsecond, a sampling profiler gives a less biased result.

cProfile also works only at function-call granularity. Calls into built-in functions and C extensions are recorded, but each appears as a single opaque entry. If a NumPy operation is your bottleneck, cProfile shows you that a NumPy function was called and took some amount of time; it cannot show you what happened inside the C implementation. Peeling that layer requires a native profiler. Tools like Scalene address this partially by separating time spent in Python from time spent in native code, but that is outside the scope of what cProfile-based exploration can provide.
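A short sketch of how native calls show up in the raw data, using the built-in sorted as a stand-in for a C extension call. The sort_numbers function is made up for illustration.

```python
import cProfile
import pstats


def sort_numbers():
    data = list(range(50_000, 0, -1))
    # sorted() is implemented in C: the profiler records the call and its
    # duration, but nothing about what happens inside it.
    return sorted(data)


profiler = cProfile.Profile()
profiler.runcall(sort_numbers)
raw = pstats.Stats(profiler).stats

for (filename, lineno, name), (cc, nc, tt, ct, callers) in raw.items():
    # Built-in and C-implemented functions carry the placeholder filename "~"
    # and line number 0, because there is no Python source to point at.
    kind = "native" if filename == "~" else "python"
    print(f"{kind:7} {name} tottime={tt:.6f}")
```

The native entries have a name and a time but no source location, which is as far as any cProfile-based tool can take you.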

Why the Tooling Gap Persisted

It is worth asking why this gap existed long enough to be worth filling with a new tool in 2026. The answer is partly that snakeviz has been good enough for the most common use case, partly that the pstats API is functional even if it is awkward, and partly that the Python profiling space has been dominated by attention to the data-collection side of the problem.

The last few years brought py-spy, pyinstrument, memray, and Scalene. These are all improvements to the collection side: lower overhead, different measurement dimensions (memory, wall clock, native), better ergonomics for attaching to running processes. The exploration and navigation side, working with data after you have collected it, received less attention.

A tool that works within the existing .prof format and improves the navigation experience requires no new instrumentation, no new data format, no changes to how you run profiles. It adds value purely by giving you a better interface to data you already know how to produce. That is exactly the kind of focused improvement Adam Johnson tends to build, and it is the right place to look now that the collection side of Python profiling has reached a reasonable level of maturity.
