# Advanced CI Profiling & Regression Gates
This chapter extends the basic PR profiling pipeline with diff analysis, automatic regression detection, and actionable artifacts for reviewers.
## Enhancements Added
| Feature | Purpose |
|---|---|
| CPU & allocation diff (`pprof -diff_base`) | Quickly see what changed between the original and optimized implementations |
| Regression gate script | Fail (or warn) if benchmarks degrade beyond a threshold |
| Benchstat statistical comparison | Noise-resistant performance signal |
| Structured artifact layout | Consistent retrieval and historic comparison foundation |
## Artifact Layout

```
.docs/artifacts/ci/
  flamegraphs/
    generator_cpu.svg
    generator_optimized_cpu.svg
  generator_cpu_top.txt
  generator_optimized_cpu_top.txt
  generator_cpu_diff_top.txt
  generator_mem_top.txt
  generator_optimized_mem_top.txt
  generator_mem_diff_top.txt

.docs/artifacts/benchdiff/
  bench_original.diff
  bench_optimized.diff
```
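A quick way to sanity-check a CI run is to assert that every expected file exists and is non-empty. A minimal sketch, assuming the layout above (with the SVGs nested under `flamegraphs/`):

```bash
# Fail fast if any expected profiling artifact is missing or empty.
for f in \
  .docs/artifacts/ci/flamegraphs/generator_cpu.svg \
  .docs/artifacts/ci/flamegraphs/generator_optimized_cpu.svg \
  .docs/artifacts/ci/generator_cpu_diff_top.txt \
  .docs/artifacts/ci/generator_mem_diff_top.txt \
  .docs/artifacts/benchdiff/bench_original.diff \
  .docs/artifacts/benchdiff/bench_optimized.diff
do
  [ -s "$f" ] || { echo "missing or empty artifact: $f" >&2; exit 1; }
done
```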
## Diff Output Interpretation

`generator_cpu_diff_top.txt` is produced by:

```bash
go tool pprof -top -diff_base=generator_cpu.prof generator_optimized_cpu.prof
```
Meaning of signs:

- Positive `flat%` in the diff: the function took more self time in the optimized version (possible regression)
- Negative `flat%`: the function improved (less self time)
The allocation diff uses:

```bash
go tool pprof -top -alloc_space -diff_base=generator_mem.prof generator_optimized_mem.prof
```

Focus on large positive changes in `flat` or `flat%`; they often correlate with new allocation hot spots.
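To regenerate both diff tables into the artifact layout above, here is a minimal bash sketch. It assumes the four raw `.prof` files already exist under `.docs/artifacts/ci/` with the names shown earlier:

```bash
#!/usr/bin/env bash
# Sketch: rebuild the CPU and allocation diff tables from raw profiles.
# Assumes generator_cpu.prof, generator_optimized_cpu.prof,
# generator_mem.prof, and generator_optimized_mem.prof already exist here.
set -euo pipefail
cd .docs/artifacts/ci

# CPU: self-time deltas between the original and optimized runs.
go tool pprof -top -diff_base=generator_cpu.prof \
  generator_optimized_cpu.prof > generator_cpu_diff_top.txt

# Allocations: compare total allocated bytes (-alloc_space).
go tool pprof -top -alloc_space -diff_base=generator_mem.prof \
  generator_optimized_mem.prof > generator_mem_diff_top.txt
```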
## Regression Gate Script

`scripts/parse_bench_regressions.sh` scans `bench_*.diff` for positive deltas above a configurable threshold.

Example invocation in CI:

```bash
THRESHOLD_PERCENT=5 ./scripts/parse_bench_regressions.sh
```
Behavior:

- Prints each regression line over the threshold
- Exit code 2 signals a failure (currently tolerated with `|| true` until the policy is enforced)

To enforce hard failures, remove `|| true` from the workflow step.
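For orientation, here is a minimal sketch of what such a gate can look like. It is not the actual `scripts/parse_bench_regressions.sh`; it assumes each `bench_*.diff` line carries a signed percentage delta such as `+7.25%`, so adjust the pattern to the real diff format:

```bash
#!/usr/bin/env bash
# Hypothetical regression gate: flag positive deltas above a threshold.
set -euo pipefail

THRESHOLD_PERCENT="${THRESHOLD_PERCENT:-5}"
status=0

for diff_file in .docs/artifacts/benchdiff/bench_*.diff; do
  [ -f "$diff_file" ] || continue
  while IFS= read -r line; do
    # Pull the first positive percentage off the line, e.g. "+7.25%".
    delta=$(grep -oE '\+[0-9]+(\.[0-9]+)?%' <<<"$line" | head -n1 | tr -d '+%' || true)
    [ -n "$delta" ] || continue
    if awk -v d="$delta" -v t="$THRESHOLD_PERCENT" 'BEGIN { exit !(d > t) }'; then
      echo "REGRESSION (+${delta}% > ${THRESHOLD_PERCENT}%): $line"
      status=2  # matches the documented exit code
    fi
  done < "$diff_file"
done

exit "$status"
```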
## Recommended Review Flow

1. Open the PR Checks summary → skim the CPU/alloc hotspot tables
2. Open the flamegraphs (artifact download) → verify shifts in dominant stacks
3. Read the diff tables → confirm reductions in JSON/string-heavy call stacks
4. Inspect the benchstat diffs → validate statistical significance (p-values <= 0.05)
5. If a regression is flagged, drill into the offending function via the profile UI (or local pprof)
## Tightening the Signal (Optional)

| Goal | Enhancement | Tooling |
|---|---|---|
| Reduce noise | Increase `-count` to 10 for critical benchmarks | benchstat |
| Faster feedback | Parallelize benchmark groups | matrix strategy |
| Deeper diffing | Export Speedscope JSON & link viewer | `go tool pprof -json` + speedscope.app |
| Store history | Push artifacts to object storage keyed by commit | custom action |
| Alerting | Add a GitHub status check when a regression is flagged | REST API call |
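As a concrete example of the noise-reduction row, the `-count` bump plus benchstat looks like this (`BenchmarkGenerator` is a placeholder; substitute the real benchmark name):

```bash
# Run each benchmark 10 times so benchstat has enough samples.
go test -bench=BenchmarkGenerator -benchmem -count=10 ./... > bench_before.txt

# ...apply the change under review, then capture the new samples:
go test -bench=BenchmarkGenerator -benchmem -count=10 ./... > bench_after.txt

# benchstat reports per-benchmark deltas and marks pure noise as insignificant.
benchstat bench_before.txt bench_after.txt
```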
## Local Reproduction

Recreate the CI artifacts locally:

```bash
make ci_profiles
ls .docs/artifacts/ci
head -25 .docs/artifacts/ci/generator_cpu_top.txt
```
Inspect a diff interactively:

```bash
go tool pprof -http=:8088 -diff_base=generator_cpu.prof generator_optimized_cpu.prof
```
## Next Steps & Ideas
- Integrate continuous (runtime) sampling via ephemeral Pyroscope capture in PRs
- Add memory leak detection heuristic (track goroutine count + heap size growth across iterations)
- Include mutex/block profile sampling variants for contention analysis (see the sketch below)
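Capturing those contention profiles requires no new tooling; `go test` can emit them directly. A minimal sketch (`BenchmarkGenerator` and the package path are placeholders; the profile flags require a single package target):

```bash
# Capture mutex and block profiles alongside the benchmark run.
# go test enables mutex/block profiling automatically for these flags;
# -run='^$' skips unit tests so only the benchmark executes.
go test -run='^$' -bench=BenchmarkGenerator \
  -mutexprofile=mutex.prof -blockprofile=block.prof ./internal/generator

# Inspect contention hot spots interactively.
go tool pprof -http=:8089 mutex.prof
```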
With these extensions, performance regressions become visible, explainable, and enforceable early in the development cycle.