CPU Profiling
CPU profiling is your primary tool for identifying performance bottlenecks and optimizing execution time. This section covers everything from basic profiling to advanced analysis techniques.
Overview
CPU profiling samples your program's execution to identify where time is spent. By understanding which functions consume the most CPU cycles, you can focus optimization efforts where they'll have the greatest impact.
What CPU Profiling Reveals
- Hot functions: Code consuming the most execution time
- Call patterns: How functions call each other
- Execution paths: Which code paths are most frequently executed
- Algorithm efficiency: Relative performance of different approaches
- System overhead: Runtime, GC, and system call costs
Quick Start
Enable Profiling in Your Application
package main
import (
"log"
"net/http"
_ "net/http/pprof" // Import for side effects
)
func main() {
// Start HTTP server for profiling
go func() {
log.Println("Profiling server at http://localhost:6060/debug/pprof/")
log.Fatal(http.ListenAndServe("localhost:6060", nil))
}()
// Your application code
runYourApplication()
}
Collect CPU Profile
# Live profiling (30 seconds)
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
# From benchmark
go test -bench=. -cpuprofile=cpu.prof
# Programmatic profiling
go run main.go # With profiling code embedded
Analyze Profile
# Interactive analysis
go tool pprof cpu.prof
# Web interface (recommended)
go tool pprof -http=:8080 cpu.prof
# Generate report
go tool pprof -top cpu.prof
Profiling Techniques
1. Basic CPU Profiling
The foundation of CPU analysis:
- Setting up profiling endpoints
- Collecting your first CPU profile
- Understanding basic output
- Identifying obvious bottlenecks
2. Advanced CPU Techniques
Sophisticated analysis methods:
- Differential profiling (before/after)
- Focus and ignore patterns
- Custom sampling rates
- Production profiling strategies
3. Flamegraph Analysis
Visual profiling for complex applications:
- Generating interactive flamegraphs
- Reading flamegraph visualizations
- Identifying optimization opportunities
- Comparing multiple profiles
Common CPU Optimization Patterns
String Operations
String manipulation often appears in CPU profiles:
// ❌ Inefficient: Multiple allocations
func buildStringBad(items []string) string {
result := ""
for _, item := range items {
result += item + "," // Allocates new string each time
}
return result
}
// ✅ Efficient: Single allocation
func buildStringGood(items []string) string {
var builder strings.Builder
builder.Grow(len(items) * 10) // Pre-allocate estimated size
for i, item := range items {
if i > 0 {
builder.WriteByte(',')
}
builder.WriteString(item)
}
return builder.String()
}
Algorithm Optimization
CPU profiles reveal algorithmic inefficiencies:
// ❌ O(n²) lookup performance
func findItemsBad(items []Item, targets []string) []Item {
var results []Item
for _, target := range targets {
for _, item := range items { // Linear search for each target
if item.Name == target {
results = append(results, item)
break
}
}
}
return results
}
// ✅ O(n) lookup performance
func findItemsGood(items []Item, targets []string) []Item {
// Pre-build map for O(1) lookups
itemMap := make(map[string]Item, len(items))
for _, item := range items {
itemMap[item.Name] = item
}
results := make([]Item, 0, len(targets))
for _, target := range targets {
if item, exists := itemMap[target]; exists {
results = append(results, item)
}
}
return results
}
CPU Profile Analysis Workflow
1. Initial Collection
# Collect profile during realistic workload
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60
2. High-Level Analysis
# Identify top CPU consumers
(pprof) top10
(pprof) top10 -cum # Cumulative time including called functions
3. Function-Level Investigation
# Examine specific function
(pprof) list functionName
(pprof) disasm functionName # Assembly-level analysis
4. Call Graph Analysis
# Understand call relationships
(pprof) web
(pprof) png > profile.png
5. Optimization and Validation
# Compare before/after profiles
go tool pprof -base=old.prof new.prof
(pprof) top10
Production CPU Profiling
Safe Production Profiling
// Production-safe profiling setup
func setupProduction() {
// Only enable profiling in debug mode
if os.Getenv("DEBUG_PROFILING") == "true" {
go func() {
// Bind to localhost only for security
log.Fatal(http.ListenAndServe("localhost:6060", nil))
}()
}
}
// Rate-limited profiling
func conditionalProfiling() {
// Profile 1% of requests
if rand.Float64() < 0.01 {
f, _ := os.Create(fmt.Sprintf("profile-%d.prof", time.Now().Unix()))
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()
defer f.Close()
// Execute request
handleRequest()
}
}
Automated Profile Collection
#!/bin/bash
# collect-cpu-profiles.sh
SERVICE_URL="http://localhost:6060"
DURATION=30
INTERVAL=300 # 5 minutes
while true; do
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
echo "Collecting CPU profile at $TIMESTAMP"
go tool pprof -seconds=$DURATION -output="cpu-$TIMESTAMP.prof" \
"$SERVICE_URL/debug/pprof/profile"
# Generate quick analysis
go tool pprof -top -output="analysis-$TIMESTAMP.txt" "cpu-$TIMESTAMP.prof"
sleep $INTERVAL
done
CPU Profiling Best Practices
✅ Do's
- Profile realistic workloads - Use production-like data and traffic patterns
- Profile for sufficient duration - 30-60 seconds minimum for statistical significance
- Compare before/after - Always measure optimization impact
- Focus on the biggest gains - Optimize functions consuming >5% of CPU time
- Consider the call stack - Sometimes the issue is in the caller, not the called function
❌ Don'ts
- Don't profile toy examples - Micro-benchmarks may not reflect real performance
- Don't ignore cumulative time - Functions that call expensive operations matter
- Don't optimize prematurely - Profile first, then optimize based on data
- Don't forget about memory - CPU optimization that increases allocations may hurt overall performance
- Don't profile development builds - Use optimized builds (
go build -ldflags="-s -w")
CPU Profile Interpretation Guide
Understanding pprof Output
(pprof) top10
Showing nodes accounting for 2840ms, 94.67% of 3000ms total
Dropped 23 nodes (cum <= 15ms)
Showing top 10 nodes out of 45
flat flat% sum% cum cum%
1200ms 40.00% 40.00% 1200ms 40.00% main.expensiveFunction
890ms 29.67% 69.67% 890ms 29.67% main.stringManipulation
350ms 11.67% 81.33% 350ms 11.67% runtime.mallocgc
210ms 7.00% 88.33% 250ms 8.33% strings.(*Builder).Grow
190ms 6.33% 94.67% 190ms 6.33% runtime.memmove
Column Meanings:
- flat: Time spent in this function only (excluding calls to other functions)
- flat%: Percentage of total time spent in this function
- sum%: Cumulative percentage including all functions above
- cum: Cumulative time including this function and all functions it calls
- cum%: Cumulative percentage including called functions
Red Flags in CPU Profiles
🚨 High string manipulation costs
strings.Join: 25% of CPU time
Solution: Use strings.Builder or bytes.Buffer
🚨 Excessive reflection
reflect.*: 20% of CPU time
Solution: Use code generation or type-specific methods
🚨 Memory allocation overhead
runtime.mallocgc: 15% of CPU time
Solution: Reduce allocations, use object pools
🚨 JSON serialization bottlenecks
encoding/json.(*encodeState).marshal: 30% of CPU time
Solution: Use faster JSON libraries or custom serialization
Learning Path
Beginner Level
- Basic CPU Profiling - Essential concepts and first profile
- Practice with simple applications
- Learn to read top10 output
Intermediate Level
- Advanced Techniques - Differential analysis and filtering
- Production profiling setup
- Call graph analysis
Advanced Level
- Flamegraph Analysis - Visual profiling mastery
- Custom sampling and automation
- Performance regression detection
Tools and Resources
Essential Commands
# Collection
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
go test -bench=. -cpuprofile=cpu.prof
# Analysis
go tool pprof -http=:8080 cpu.prof
go tool pprof -top cpu.prof
go tool pprof -list=functionName cpu.prof
# Comparison
go tool pprof -base=baseline.prof optimized.prof
# Export
go tool pprof -png cpu.prof > cpu-profile.png
go tool pprof -svg cpu.prof > cpu-profile.svg
Integration Examples
// Benchmark with CPU profiling
func BenchmarkFunction(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
functionToOptimize()
}
}
// Conditional profiling in production
func profiledHandler(w http.ResponseWriter, r *http.Request) {
if shouldProfile() {
f, _ := os.Create("handler-profile.prof")
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()
defer f.Close()
}
// Handle request
actualHandler(w, r)
}
Ready to master CPU profiling? Start with Basic CPU Profiling and work your way through each technique systematically.
Next Steps: Basic CPU Profiling → Advanced Techniques → Flamegraph Analysis