Compiler Optimization
Master Go compiler optimizations, build flags, and compilation techniques to maximize runtime performance through intelligent code generation.
Go Compiler Architecture
Compilation Pipeline
The Go compiler performs multiple optimization passes:
// Source code → SSA IR → Machine code
//
// 1. Parsing and type checking
// 2. SSA (Static Single Assignment) generation
// 3. Optimization passes
// 4. Code generation
// 5. Linking
// Example: Understanding compiler optimizations
package main
import "fmt"
// This function will be inlined by the compiler
func add(a, b int) int {
return a + b
}
// This function demonstrates escape analysis
func createSlice() []int {
// Compiler determines if this escapes to heap
s := make([]int, 10)
return s
}
func main() {
// Compiler optimizes this call through inlining
result := add(5, 3)
fmt.Println(result)
// Escape analysis determines allocation location
data := createSlice()
fmt.Println(len(data))
}
SSA (Static Single Assignment) Form
Understanding SSA helps optimize code for the compiler:
// Original code
func example(x, y int) int {
x = x + 1
if x > 10 {
x = x * 2
}
return x + y
}
// SSA form (conceptual):
// x1 = x0 + 1
// if x1 > 10 goto B2 else B3
// B2: x2 = x1 * 2; goto B4
// B3: x2 = x1; goto B4
// B4: result = x2 + y0; return result
Build Flags and Optimization Levels
Essential Build Flags
# Basic optimization (default)
go build -o myapp main.go
# Disable optimizations (debugging)
go build -gcflags="-N -l" -o myapp-debug main.go
# Enable all optimizations
go build -ldflags="-s -w" -o myapp-optimized main.go
# Link-time optimization
go build -ldflags="-X main.version=1.0.0 -s -w" -o myapp main.go
# Profile-guided optimization (Go 1.20+)
go build -pgo=cpu.prof -o myapp-pgo main.go
Advanced Build Configuration
// Build constraints for optimization
//go:build optimization
// +build optimization
package main
// Compiler directives
//go:noinline
func expensiveFunction() {
// Force no inlining for profiling
}
//go:nosplit
func lowLevelFunction() {
// Prevent stack splitting
}
//go:noescape
func noescape(p *int) {
// Hint that pointer doesn't escape
}
//go:linkname fastFunction runtime.fastFunction
func fastFunction()
// Build with optimization:
// go build -tags optimization main.go
Custom Build Scripts
#!/bin/bash
# build-optimized.sh
# Set optimization flags
export CGO_ENABLED=0
export GOOS=linux
export GOARCH=amd64
# Build with maximum optimizations
go build \
-a \
-installsuffix cgo \
-ldflags="-s -w -X main.version=$(git rev-parse --short HEAD)" \
-o app-optimized \
./cmd/app
# Strip additional debug info (if available)
if command -v strip &> /dev/null; then
strip app-optimized
fi
# Compress binary (if upx available)
if command -v upx &> /dev/null; then
upx --best app-optimized
fi
echo "Optimized binary created: app-optimized"
ls -lh app-optimized
Inlining Optimization
Understanding Function Inlining
// Inlining candidates - small, simple functions
//go:inline
func fastMath(x int) int {
return x*x + 2*x + 1 // Will be inlined
}
// Too complex for inlining
func complexFunction(data []int) int {
sum := 0
for i, v := range data {
if i%2 == 0 {
sum += v * v
} else {
sum += v * 2
}
}
return sum // Unlikely to be inlined due to complexity
}
// Interface calls - harder to inline
type Calculator interface {
Calculate(int) int
}
type SimpleCalc struct{}
func (sc SimpleCalc) Calculate(x int) int {
return x * 2 // May not be inlined due to interface
}
// Direct function calls - easier to inline
func directCall(x int) int {
return fastMath(x) // fastMath will be inlined here
}
// Benchmark inlining effects
func BenchmarkInlining(b *testing.B) {
b.Run("Inlined", func(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = fastMath(i)
}
})
b.Run("NotInlined", func(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = complexFunction([]int{i, i + 1, i + 2})
}
})
b.Run("InterfaceCall", func(b *testing.B) {
calc := SimpleCalc{}
for i := 0; i < b.N; i++ {
_ = calc.Calculate(i)
}
})
}
Controlling Inlining
// Force inlining for critical paths
//go:inline
func criticalPath(x, y int) int {
return x*y + x - y
}
// Prevent inlining for debugging
//go:noinline
func debugFunction(data []byte) []byte {
// Keep this separate for profiling
result := make([]byte, len(data))
copy(result, data)
return result
}
// Mid-level function - let compiler decide
func processData(input []int) []int {
output := make([]int, len(input))
for i, v := range input {
output[i] = criticalPath(v, i) // Will be inlined
}
return output
}
Escape Analysis Optimization
Understanding Escape Analysis
// Check escape analysis with: go build -gcflags="-m" main.go
// Stack allocation - no escape
func stackAllocation() {
x := 42 // Stays on stack
y := &x // Local pointer, stays on stack
_ = *y
} // x and y are deallocated when function returns
// Heap allocation - escapes
func heapAllocation() *int {
x := 42 // Escapes to heap
return &x // Pointer returned, must be on heap
}
// Interface allocation - may escape
func interfaceAllocation() interface{} {
x := 42 // Escapes to heap
return x // Boxed in interface
}
// Slice allocation analysis
func sliceAnalysis() {
// Stack allocation
s1 := make([]int, 10) // Small slice, stays on stack
_ = s1
// Heap allocation
s2 := make([]int, 10000) // Large slice, goes to heap
_ = s2
// Escape through return
s3 := make([]int, 5)
_ = s3
// If s3 is returned, it would escape
}
// Optimizing for stack allocation
func optimizedStackUsage() {
const maxStackSize = 1000
// Use stack allocation when possible
var buffer [maxStackSize]byte
// Process in chunks to stay on stack
for i := 0; i < len(buffer); i += 100 {
end := i + 100
if end > len(buffer) {
end = len(buffer)
}
processChunk(buffer[i:end])
}
}
func processChunk(chunk []byte) {
// Process chunk without escaping
for i, b := range chunk {
chunk[i] = b ^ 0xFF
}
}
Minimizing Heap Allocations
// Inefficient - multiple heap allocations
func inefficientProcessing(data []string) []string {
var results []string
for _, item := range data {
// String concatenation creates new allocations
processed := "prefix_" + item + "_suffix"
results = append(results, processed)
}
return results
}
// Optimized - minimize allocations
func optimizedProcessing(data []string) []string {
// Pre-allocate result slice
results := make([]string, 0, len(data))
// Reuse string builder
var builder strings.Builder
for _, item := range data {
builder.Reset()
builder.Grow(7 + len(item) + 7) // "prefix_" + item + "_suffix"
builder.WriteString("prefix_")
builder.WriteString(item)
builder.WriteString("_suffix")
results = append(results, builder.String())
}
return results
}
// Pool-based optimization
var stringBuilderPool = sync.Pool{
New: func() interface{} {
return &strings.Builder{}
},
}
func poolBasedProcessing(data []string) []string {
results := make([]string, 0, len(data))
builder := stringBuilderPool.Get().(*strings.Builder)
defer func() {
builder.Reset()
stringBuilderPool.Put(builder)
}()
for _, item := range data {
builder.Reset()
builder.WriteString("prefix_")
builder.WriteString(item)
builder.WriteString("_suffix")
results = append(results, builder.String())
}
return results
}
Dead Code Elimination
Conditional Compilation
// Build tags for dead code elimination
//go:build production
// +build production
package config
const (
DebugMode = false
LogLevel = "error"
)
// debug.go - separate file
//go:build !production
// +build !production
package config
const (
DebugMode = true
LogLevel = "debug"
)
// Usage - debug code eliminated in production builds
func processRequest(req Request) Response {
if DebugMode {
// This entire block eliminated in production
logDebug("Processing request: %+v", req)
validateRequest(req)
}
return handleRequest(req)
}
// Build production: go build -tags production
// Build debug: go build (default)
Compiler-Assisted Dead Code Elimination
// Constants allow compile-time optimization
const EnableLogging = false
const MaxCacheSize = 1000
func optimizedFunction(data []int) []int {
// Compiler eliminates this branch if EnableLogging = false
if EnableLogging {
fmt.Printf("Processing %d items\n", len(data))
}
// Compiler can optimize fixed-size loops
cache := make([]int, MaxCacheSize)
for i := 0; i < MaxCacheSize && i < len(data); i++ {
cache[i] = data[i] * 2
}
return cache[:min(MaxCacheSize, len(data))]
}
// Interface elimination through devirtualization
type Processor interface {
Process(int) int
}
type SimpleProcessor struct{}
func (sp SimpleProcessor) Process(x int) int {
return x * 2
}
// When type is known at compile time, interface call can be optimized
func processWithKnownType(data []int) []int {
processor := SimpleProcessor{} // Concrete type known
results := make([]int, len(data))
for i, v := range data {
results[i] = processor.Process(v) // Can be inlined/devirtualized
}
return results
}
Profile-Guided Optimization (PGO)
Collecting Profiles for PGO
// main.go - application with profiling
package main
import (
"os"
"runtime/pprof"
)
func main() {
// CPU profiling for PGO
if cpuProfile := os.Getenv("CPUPROFILE"); cpuProfile != "" {
f, err := os.Create(cpuProfile)
if err != nil {
panic(err)
}
defer f.Close()
if err := pprof.StartCPUProfile(f); err != nil {
panic(err)
}
defer pprof.StopCPUProfile()
}
// Your application workload
runApplicationWorkload()
}
func runApplicationWorkload() {
// Simulate typical workload
for i := 0; i < 1000000; i++ {
processData(generateData(i))
}
}
func generateData(seed int) []int {
data := make([]int, 1000)
for i := range data {
data[i] = (seed + i) % 1000
}
return data
}
func processData(data []int) int {
sum := 0
for _, v := range data {
if v%2 == 0 {
sum += v * v
} else {
sum += v * 3
}
}
return sum
}
Building with PGO
#!/bin/bash
# Build process with PGO
# 1. Build instrumented binary
go build -o app-instrumented main.go
# 2. Run with profiling to collect profile
CPUPROFILE=cpu.prof ./app-instrumented
# 3. Build optimized binary with profile
go build -pgo=cpu.prof -o app-optimized main.go
# 4. Compare performance
echo "Benchmarking instrumented version:"
./app-instrumented &
time ./app-instrumented
echo "Benchmarking PGO optimized version:"
time ./app-optimized
PGO Optimization Analysis
// Analyze PGO effectiveness
func BenchmarkPGOEffectiveness(b *testing.B) {
data := generateTestData(10000)
b.Run("HotPath", func(b *testing.B) {
// Function that should be optimized by PGO
for i := 0; i < b.N; i++ {
_ = hotPathFunction(data)
}
})
b.Run("ColdPath", func(b *testing.B) {
// Function rarely called in profile
for i := 0; i < b.N; i++ {
_ = coldPathFunction(data)
}
})
}
// Hot path - frequently called in profile
func hotPathFunction(data []int) int {
sum := 0
for _, v := range data {
sum += complexCalculation(v)
}
return sum
}
// Cold path - rarely called in profile
func coldPathFunction(data []int) int {
product := 1
for _, v := range data {
if v > 0 {
product *= v
if product > 1000000 {
break
}
}
}
return product
}
func complexCalculation(x int) int {
// Complex enough to benefit from optimization
result := x
for i := 0; i < 10; i++ {
result = result*result + result + 1
result = result % 1000000
}
return result
}
Advanced Compiler Optimizations
Loop Optimizations
// Loop unrolling candidate
func processArrayOptimized(data []int) {
// Compiler may unroll this loop
for i := 0; i < len(data); i += 4 {
// Process 4 elements at once for vectorization
if i+3 < len(data) {
data[i] *= 2
data[i+1] *= 2
data[i+2] *= 2
data[i+3] *= 2
} else {
// Handle remaining elements
for j := i; j < len(data); j++ {
data[j] *= 2
}
break
}
}
}
// Loop invariant code motion
func loopInvariantOptimization(matrix [][]int, multiplier int) {
// Compiler moves invariant calculations out of loop
n := len(matrix)
m := len(matrix[0])
// These calculations are loop invariant
threshold := multiplier * 100
factor := multiplier + 5
for i := 0; i < n; i++ {
for j := 0; j < m; j++ {
if matrix[i][j] > threshold {
matrix[i][j] *= factor
}
}
}
}
// Strength reduction
func strengthReduction(n int) []int {
result := make([]int, n)
// Compiler optimizes multiplication to addition
for i := 0; i < n; i++ {
result[i] = i * 7 // May be optimized to addition
}
return result
}
Bounds Check Elimination
// Bounds check elimination patterns
func boundsCheckElimination(data []int) {
n := len(data)
// Pattern 1: Compiler eliminates bounds checks
for i := 0; i < n; i++ {
data[i] = i // No bounds check needed
}
// Pattern 2: Use _ to hint no bounds check needed
for i := range data {
_ = data[i] // Eliminates bounds check
data[i] = i * 2
}
// Pattern 3: Manual bounds check removal
if len(data) >= 10 {
// Compiler knows these are safe
data[0] = 1
data[9] = 2
}
}
// Slice bounds optimization
func sliceBoundsOptimization(data []int) []int {
if len(data) < 100 {
return data
}
// Compiler eliminates bounds checks for this slice
subset := data[10:90] // Known to be within bounds
for i := range subset {
subset[i] *= 2 // No bounds check needed
}
return subset
}
Measuring Compiler Optimization Impact
Compilation Analysis Tools
# View compiler optimizations
go build -gcflags="-m -m" main.go 2>&1 | grep -E "(inlin|escap|alloc)"
# View SSA intermediate representation
go build -gcflags="-S" main.go > assembly.txt
# Analyze binary size
go build -ldflags="-s -w" -o optimized main.go
go build -o unoptimized main.go
ls -lh optimized unoptimized
# Profile-guided optimization report
go build -pgo=cpu.prof -gcflags="-m" main.go 2>&1 | grep "PGO"
Performance Validation
// Benchmark compiler optimizations
func BenchmarkCompilerOptimizations(b *testing.B) {
data := make([]int, 1000)
for i := range data {
data[i] = i
}
b.Run("InlinedFunction", func(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = inlinedMath(data[i%len(data)])
}
})
b.Run("NonInlinedFunction", func(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = nonInlinedMath(data[i%len(data)])
}
})
b.Run("BoundsCheckEliminated", func(b *testing.B) {
for i := 0; i < b.N; i++ {
processSafeBounds(data)
}
})
b.Run("BoundsCheckPresent", func(b *testing.B) {
for i := 0; i < b.N; i++ {
processUnsafeBounds(data)
}
})
}
//go:inline
func inlinedMath(x int) int {
return x*x + 2*x + 1
}
//go:noinline
func nonInlinedMath(x int) int {
return x*x + 2*x + 1
}
func processSafeBounds(data []int) {
for i := 0; i < len(data); i++ {
data[i] *= 2 // Bounds check eliminated
}
}
func processUnsafeBounds(data []int) {
for i := 0; i < 2000; i++ {
if i < len(data) {
data[i] *= 2 // Bounds check required
}
}
}
Best Practices for Compiler Optimization
1. Write Optimizer-Friendly Code
- Use constants instead of variables when values don't change
- Prefer small, simple functions for inlining
- Use concrete types instead of interfaces when possible
- Minimize pointer indirection
2. Leverage Build Flags Appropriately
- Use
-ldflags="-s -w"for production builds - Apply PGO for performance-critical applications
- Use build tags for conditional compilation
- Profile before and after optimization
3. Understand Escape Analysis
- Keep allocations on the stack when possible
- Avoid returning pointers to local variables unnecessarily
- Use value receivers for small structs
- Pre-allocate slices and maps when size is known
4. Monitor and Validate
- Use benchmarks to measure optimization impact
- Profile with
go tool pprofto verify improvements - Monitor binary size and startup performance
- Test optimized builds thoroughly
The key to effective compiler optimization is understanding how the Go compiler works, writing code that enables optimizations, and continuously measuring the impact of your optimization efforts.