Statistical Analysis for Benchmarking

Comprehensive guide to statistical analysis techniques for Go performance benchmarking. This guide covers descriptive and inferential methods, confidence intervals, hypothesis testing, and statistical modeling of performance data.

Table of Contents

  1. Introduction
  2. Statistical Framework
  3. Descriptive Statistics
  4. Inferential Statistics
  5. Hypothesis Testing
  6. Best Practices
  7. Summary

Introduction

Statistical analysis transforms raw benchmark data into meaningful insights about performance characteristics. This guide provides comprehensive statistical methods for analyzing Go application performance, detecting significant changes, and making data-driven optimization decisions.

Statistical Framework

package main

import (
    "fmt"
    "math"
    "sort"
    "strings"
    "sync"
    "time"
)

// StatisticalAnalyzer performs comprehensive statistical analysis on benchmark data
type StatisticalAnalyzer struct {
    descriptive    *DescriptiveAnalyzer
    inferential    *InferentialAnalyzer
    hypothesis     *HypothesisAnalyzer
    regression     *RegressionAnalyzer
    timeSeries     *TimeSeriesAnalyzer
    outlierDetector *OutlierDetector
    modeler        *StatisticalModeler
    config         AnalyzerConfig
    cache          *AnalysisCache
    metrics        *AnalysisMetrics
    mu             sync.RWMutex
}

// AnalyzerConfig contains analyzer configuration
type AnalyzerConfig struct {
    ConfidenceLevel      float64
    SignificanceLevel    float64
    MinSampleSize        int
    MaxSampleSize        int
    OutlierThreshold     float64
    EnableOutlierRemoval bool
    EnableNormalization  bool
    EnableRobustStats    bool
    BootstrapSamples     int
    CacheEnabled         bool
    ParallelProcessing   bool
    MaxWorkers           int
}

// BenchmarkData represents benchmark measurement data
type BenchmarkData struct {
    ID          string
    Name        string
    Values      []float64
    Metadata    DataMetadata
    Timestamp   time.Time
    Environment Environment
    Quality     DataQuality
}

// DataMetadata contains metadata about benchmark data
type DataMetadata struct {
    Source      string
    Version     string
    Platform    string
    GoVersion   string
    Iterations  int
    Duration    time.Duration
    MemoryUsage int64
    CPUUsage    float64
    Tags        map[string]string
}

// Environment describes the execution environment
type Environment struct {
    OS           string
    Architecture string
    CPUModel     string
    CPUCores     int
    Memory       int64
    LoadAverage  float64
    Temperature  float64
    PowerMode    string
}

// DataQuality represents data quality metrics
type DataQuality struct {
    Completeness  float64
    Consistency   float64
    Accuracy      float64
    Validity      float64
    Outliers      int
    MissingValues int
    OverallScore  float64
}

// DescriptiveAnalyzer performs descriptive statistical analysis
type DescriptiveAnalyzer struct {
    config DescriptiveConfig
    cache  map[string]*DescriptiveStats
    mu     sync.RWMutex
}

// DescriptiveConfig contains descriptive analysis configuration
type DescriptiveConfig struct {
    EnableRobustStats  bool
    EnablePercentiles  bool
    EnableDistribution bool
    EnableCorrelation  bool
    PercentilePoints   []float64
    BinCount           int
    HistogramBins      int
}

// DescriptiveStats contains descriptive statistics
type DescriptiveStats struct {
    Count        int
    Mean         float64
    Median       float64
    Mode         []float64
    StdDev       float64
    Variance     float64
    Min          float64
    Max          float64
    Range        float64
    IQR          float64
    Percentiles  map[float64]float64
    Quartiles    Quartiles
    Moments      Moments
    Distribution DistributionStats
    Robust       RobustStats
}

// Quartiles represents quartile values
type Quartiles struct {
    Q1 float64
    Q2 float64 // Median
    Q3 float64
}

// Moments represents statistical moments
type Moments struct {
    Mean     float64 // First moment
    Variance float64 // Second central moment
    Skewness float64 // Third standardized moment
    Kurtosis float64 // Fourth standardized moment
}

// DistributionStats contains distribution characteristics
type DistributionStats struct {
    Type         DistributionType
    Parameters   map[string]float64
    GoodnessOfFit float64
    KSStatistic  float64
    ADStatistic  float64
    JBStatistic  float64
    Histogram    Histogram
}

// DistributionType defines distribution types
type DistributionType int

const (
    NormalDistribution DistributionType = iota
    LogNormalDistribution
    ExponentialDistribution
    GammaDistribution
    WeibullDistribution
    UniformDistribution
    PoissonDistribution
    UnknownDistribution
)

// Histogram represents a histogram
type Histogram struct {
    Bins   []float64
    Counts []int
    Edges  []float64
    Width  float64
}

// RobustStats contains robust statistical measures
type RobustStats struct {
    TrimmedMean   map[float64]float64 // Trimmed means at different levels
    Winsorized    map[float64]float64 // Winsorized means
    MAD           float64             // Median Absolute Deviation
    IQRange       float64             // Interquartile Range
    Biweight      BiweightStats
    Huber         HuberStats
}

// BiweightStats contains biweight statistics
type BiweightStats struct {
    Location float64
    Scale    float64
}

// HuberStats contains Huber statistics
type HuberStats struct {
    Location float64
    Scale    float64
    K        float64
}

// InferentialAnalyzer performs inferential statistical analysis
type InferentialAnalyzer struct {
    config InferentialConfig
    cache  map[string]*InferentialResults
    mu     sync.RWMutex
}

// InferentialConfig contains inferential analysis configuration
type InferentialConfig struct {
    ConfidenceLevel   float64
    BootstrapSamples  int
    EnableBootstrap   bool
    EnableJackknife   bool
    EnablePermutation bool
    SamplingMethod    SamplingMethod
}

// SamplingMethod defines sampling methods
type SamplingMethod int

const (
    SimpleRandomSampling SamplingMethod = iota
    StratifiedSampling
    SystematicSampling
    ClusterSampling
    BootstrapSampling
)

// InferentialResults contains inferential analysis results
type InferentialResults struct {
    PopulationMean    EstimateWithCI
    PopulationStdDev  EstimateWithCI
    PopulationVariance EstimateWithCI
    Percentiles       map[float64]EstimateWithCI
    Bootstrap         BootstrapResults
    Jackknife         JackknifeResults
    Permutation       PermutationResults
}

// EstimateWithCI represents an estimate with confidence interval
type EstimateWithCI struct {
    Estimate    float64
    StdError    float64
    LowerBound  float64
    UpperBound  float64
    Confidence  float64
    Method      EstimationMethod
}

// EstimationMethod defines estimation methods
type EstimationMethod int

const (
    SampleMethod EstimationMethod = iota
    BootstrapMethod
    JackknifeMethod
    BayesianMethod
    RobustMethod
)

// BootstrapResults contains bootstrap analysis results
type BootstrapResults struct {
    Samples      int
    Estimates    []float64
    Bias         float64
    StdError     float64
    Percentile   ConfidenceInterval
    BCa          ConfidenceInterval // Bias-corrected and accelerated
    StudentizedT ConfidenceInterval
}

// ConfidenceInterval represents a confidence interval
type ConfidenceInterval struct {
    Lower      float64
    Upper      float64
    Confidence float64
    Method     CIMethod
}

// CIMethod defines confidence interval methods
type CIMethod int

const (
    PercentileCI CIMethod = iota
    BCaCI
    StudentizedCI
    NormalCI
    BasicCI
)

// JackknifeResults contains jackknife analysis results
type JackknifeResults struct {
    Estimates []float64
    Bias      float64
    StdError  float64
    Variance  float64
}

// PermutationResults contains permutation test results
type PermutationResults struct {
    Permutations int
    Statistics   []float64
    PValue       float64
    Observed     float64
    CriticalValue float64
}

// HypothesisAnalyzer performs hypothesis testing
type HypothesisAnalyzer struct {
    tests  map[string]HypothesisTest
    config HypothesisConfig
    cache  map[string]*TestResults
    mu     sync.RWMutex
}

// HypothesisConfig contains hypothesis testing configuration
type HypothesisConfig struct {
    SignificanceLevel float64
    PowerAnalysis     bool
    EffectSize        bool
    MultipleComparisons bool
    CorrectionMethod  CorrectionMethod
    EnableNonParametric bool
}

// CorrectionMethod defines multiple comparison correction methods
type CorrectionMethod int

const (
    BonferroniCorrection CorrectionMethod = iota
    HolmCorrection
    BenjaminiHochbergCorrection
    FDRCorrection
    NoCorrection
)

// HypothesisTest defines hypothesis tests
type HypothesisTest interface {
    Test(data1, data2 []float64, options TestOptions) (*TestResults, error)
    GetType() TestType
    GetAssumptions() []string
    ValidateAssumptions(data []float64) AssumptionResults
}

// TestType defines test types
type TestType int

const (
    OneSampleTTest TestType = iota
    TwoSampleTTest
    PairedTTest
    WelchTTest
    MannWhitneyTest
    WilcoxonTest
    KruskalWallisTest
    FriedmanTest
    ChiSquareTest
    KSTest
    AndersonDarlingTest
    ShapiroWilkTest
)

// TestOptions contains test options
type TestOptions struct {
    Alternative     Alternative
    Paired          bool
    EqualVariances  bool
    ConfidenceLevel float64
    Exact           bool
    Continuity      bool
}

// Alternative defines alternative hypotheses
type Alternative int

const (
    TwoSided Alternative = iota
    Greater
    Less
)

// TestResults contains hypothesis test results
type TestResults struct {
    TestType        TestType
    Statistic       float64
    PValue          float64
    CriticalValue   float64
    DegreesOfFreedom float64
    ConfidenceInterval ConfidenceInterval
    EffectSize      EffectSizeResults
    PowerAnalysis   PowerResults
    Decision        TestDecision
    Interpretation  string
    Assumptions     AssumptionResults
}

// TestDecision represents test decision
type TestDecision int

const (
    RejectNull TestDecision = iota
    FailToRejectNull
    Inconclusive
)

// EffectSizeResults contains effect size measurements
type EffectSizeResults struct {
    CohensD     float64
    GlassD      float64
    HedgesG     float64
    R           float64
    R2          float64
    EtaSquared  float64
    Omega2      float64
    Interpretation EffectSizeInterpretation
}

// EffectSizeInterpretation defines effect size interpretations
type EffectSizeInterpretation int

const (
    NegligibleEffect EffectSizeInterpretation = iota
    SmallEffect
    MediumEffect
    LargeEffect
    VeryLargeEffect
)

// PowerResults contains statistical power analysis results
type PowerResults struct {
    Power          float64
    RequiredN      int
    DetectableEffect float64
    TypeIIError    float64
    Sensitivity    float64
}

// AssumptionResults contains assumption validation results
type AssumptionResults struct {
    Normality        NormalityTest
    Homoscedasticity HomoscedasticityTest
    Independence     IndependenceTest
    Outliers         OutlierTest
    Satisfied        bool
    Warnings         []string
    Recommendations  []string
}

// NormalityTest contains normality test results
type NormalityTest struct {
    ShapiroWilk     TestResult
    KolmogorovSmirnov TestResult
    AndersonDarling TestResult
    JarqueBera      TestResult
    Satisfied       bool
}

// HomoscedasticityTest contains homoscedasticity test results
type HomoscedasticityTest struct {
    Levene    TestResult
    BrownForsythe TestResult
    Bartlett  TestResult
    FTest     TestResult
    Satisfied bool
}

// IndependenceTest contains independence test results
type IndependenceTest struct {
    DurbinWatson TestResult
    RunsTest     TestResult
    LjungBox     TestResult
    Satisfied    bool
}

// OutlierTest contains outlier detection results
type OutlierTest struct {
    Grubbs        TestResult
    Dixon         TestResult
    Rosner        TestResult
    ModifiedZ     TestResult
    OutlierCount  int
    OutlierIndices []int
    Satisfied     bool
}

// TestResult represents a single test result
type TestResult struct {
    Statistic float64
    PValue    float64
    Significant bool
    Method    string
}

// RegressionAnalyzer performs regression analysis
type RegressionAnalyzer struct {
    models map[string]RegressionModel
    config RegressionConfig
    cache  map[string]*RegressionResults
    mu     sync.RWMutex
}

// RegressionConfig contains regression analysis configuration
type RegressionConfig struct {
    EnableLinear      bool
    EnableNonLinear   bool
    EnableRobust      bool
    EnableRegularized bool
    CrossValidation   bool
    CVFolds          int
    FeatureSelection bool
    TransformData    bool
}

// RegressionModel defines regression models
type RegressionModel interface {
    Fit(x, y []float64) error
    Predict(x []float64) ([]float64, error)
    GetCoefficients() []float64
    GetStatistics() ModelStatistics
    GetType() ModelType
}

// ModelType defines regression model types
type ModelType int

const (
    LinearRegression ModelType = iota
    PolynomialRegression
    ExponentialRegression
    LogarithmicRegression
    PowerRegression
    RobustRegression
    RidgeRegression
    LassoRegression
    ElasticNetRegression
)

// RegressionResults contains regression analysis results
type RegressionResults struct {
    Model           ModelType
    Coefficients    []float64
    Intercept       float64
    RSquared        float64
    AdjustedRSquared float64
    FStatistic      float64
    PValue          float64
    StandardErrors  []float64
    TStatistics     []float64
    PValues         []float64
    ConfidenceIntervals []ConfidenceInterval
    Residuals       ResidualAnalysis
    Diagnostics     RegressionDiagnostics
    CrossValidation CrossValidationResults
    Predictions     PredictionResults
}

// ModelStatistics contains model statistics
type ModelStatistics struct {
    RSquared         float64
    AdjustedRSquared float64
    RMSE            float64
    MAE             float64
    AIC             float64
    BIC             float64
    LogLikelihood   float64
    FStatistic      float64
    PValue          float64
    DegreesOfFreedom int
}

// ResidualAnalysis contains residual analysis results
type ResidualAnalysis struct {
    Residuals      []float64
    Standardized   []float64
    Studentized    []float64
    Leverage       []float64
    CooksDistance  []float64
    DFFits         []float64
    DFBetas        [][]float64
    Normality      NormalityTest
    Homoscedasticity HomoscedasticityTest
    Independence   IndependenceTest
    Linearity      LinearityTest
}

// LinearityTest contains linearity test results
type LinearityTest struct {
    RamseyRESET   TestResult
    HarveyCollier TestResult
    RainbowTest   TestResult
    Satisfied     bool
}

// RegressionDiagnostics contains regression diagnostics
type RegressionDiagnostics struct {
    Multicollinearity MulticollinearityTest
    Outliers          OutlierDiagnostics
    Influence         InfluenceDiagnostics
    Heteroscedasticity HeteroscedasticityTest
    Autocorrelation   AutocorrelationTest
}

// MulticollinearityTest contains multicollinearity test results
type MulticollinearityTest struct {
    VIF          []float64 // Variance Inflation Factor
    ConditionIndex float64
    Eigenvalues  []float64
    Satisfied    bool
}

// OutlierDiagnostics contains outlier diagnostics
type OutlierDiagnostics struct {
    Outliers       []int
    HighLeverage   []int
    Influential    []int
    CooksD         []float64
    Threshold      float64
}

// InfluenceDiagnostics contains influence diagnostics
type InfluenceDiagnostics struct {
    Hat            []float64
    CooksDistance  []float64
    DFFits         []float64
    DFBetas        [][]float64
    CovarianceRatio []float64
}

// HeteroscedasticityTest contains heteroscedasticity test results
type HeteroscedasticityTest struct {
    BreuschPagan  TestResult
    White         TestResult
    Goldfeld      TestResult
    Satisfied     bool
}

// AutocorrelationTest contains autocorrelation test results
type AutocorrelationTest struct {
    DurbinWatson  TestResult
    LjungBox      TestResult
    BreuschGodfrey TestResult
    Satisfied     bool
}

// CrossValidationResults contains cross-validation results
type CrossValidationResults struct {
    Folds         int
    TrainScores   []float64
    TestScores    []float64
    MeanTrain     float64
    MeanTest      float64
    StdTrain      float64
    StdTest       float64
    Overfitting   bool
}

// PredictionResults contains prediction results
type PredictionResults struct {
    Predictions      []float64
    Intervals        []ConfidenceInterval
    PredictionBands  []ConfidenceInterval
    Residuals        []float64
    RMSE            float64
    MAE             float64
    MAPE            float64
}

// TimeSeriesAnalyzer performs time series analysis
type TimeSeriesAnalyzer struct {
    models map[string]TimeSeriesModel
    config TimeSeriesConfig
    cache  map[string]*TimeSeriesResults
    mu     sync.RWMutex
}

// TimeSeriesConfig contains time series analysis configuration
type TimeSeriesConfig struct {
    EnableTrend        bool
    EnableSeasonality  bool
    EnableStationarity bool
    EnableForecasting  bool
    AutoArima         bool
    SeasonalPeriod    int
    ForecastHorizon   int
    ConfidenceLevel   float64
}

// TimeSeriesModel defines time series models
type TimeSeriesModel interface {
    Fit(data []TimedValue) error
    Forecast(periods int) ([]float64, []ConfidenceInterval, error)
    GetComponents() TimeSeriesComponents
    GetStatistics() TimeSeriesStatistics
    GetType() TimeSeriesModelType
}

// TimeSeriesModelType defines time series model types
type TimeSeriesModelType int

const (
    ARIMA TimeSeriesModelType = iota
    SARIMA
    ExponentialSmoothing
    HoltWinters
    StateSpace
    GARCH
    Prophet
)

// TimedValue represents a time-indexed value
type TimedValue struct {
    Time  time.Time
    Value float64
}

// TimeSeriesResults contains time series analysis results
type TimeSeriesResults struct {
    Model           TimeSeriesModelType
    Components      TimeSeriesComponents
    Statistics      TimeSeriesStatistics
    Stationarity    StationarityTests
    Seasonality     SeasonalityTests
    Forecast        ForecastResults
    Diagnostics     TimeSeriesDiagnostics
    ChangePoints    []ChangePoint
}

// TimeSeriesComponents contains decomposed components
type TimeSeriesComponents struct {
    Trend      []float64
    Seasonal   []float64
    Residual   []float64
    Level      []float64
    Slope      []float64
    Irregular  []float64
}

// TimeSeriesStatistics contains time series statistics
type TimeSeriesStatistics struct {
    AIC          float64
    BIC          float64
    LogLikelihood float64
    RMSE         float64
    MAE          float64
    MAPE         float64
    MASE         float64
    AutoCorrelation []float64
    PartialAutoCorrelation []float64
}

// StationarityTests contains stationarity test results
type StationarityTests struct {
    ADF           TestResult // Augmented Dickey-Fuller
    KPSS          TestResult // Kwiatkowski-Phillips-Schmidt-Shin
    PhillipsPerron TestResult
    Stationary    bool
    Trend         bool
    Drift         bool
}

// SeasonalityTests contains seasonality test results
type SeasonalityTests struct {
    FriedmanTest   TestResult
    KruskalWallis  TestResult
    XTest          TestResult
    QSSeasonal     TestResult
    Seasonal       bool
    Period         int
    Strength       float64
}

// ForecastResults contains forecasting results
type ForecastResults struct {
    Forecast       []float64
    ConfidenceIntervals []ConfidenceInterval
    PredictionIntervals []ConfidenceInterval
    Residuals      []float64
    Accuracy       ForecastAccuracy
    Backtesting    BacktestResults
}

// ForecastAccuracy contains forecast accuracy metrics
type ForecastAccuracy struct {
    MAE   float64
    MAPE  float64
    RMSE  float64
    MASE  float64
    sMAPE float64
    MSIS  float64
}

// BacktestResults contains backtesting results
type BacktestResults struct {
    Periods    int
    Accuracy   []ForecastAccuracy
    Average    ForecastAccuracy
    Stability  float64
    Trend      TrendDirection
}

// TrendDirection defines trend directions
type TrendDirection int

const (
    NoTrend TrendDirection = iota
    UpTrend
    DownTrend
    Volatile
)

// TimeSeriesDiagnostics contains time series diagnostics
type TimeSeriesDiagnostics struct {
    Residuals        ResidualDiagnostics
    LjungBox         TestResult
    JarqueBera       TestResult
    ArchTest         TestResult
    Heteroscedasticity HeteroscedasticityTest
    Normality        NormalityTest
}

// ResidualDiagnostics contains residual diagnostics
type ResidualDiagnostics struct {
    Residuals     []float64
    Standardized  []float64
    ACF           []float64
    PACF          []float64
    QQPlot        QQPlotResults
    WhiteNoise    bool
}

// QQPlotResults contains Q-Q plot results
type QQPlotResults struct {
    Quantiles    []float64
    Theoretical  []float64
    RSquared     float64
    Slope        float64
    Intercept    float64
    Normality    bool
}

// ChangePoint represents a change point in time series
type ChangePoint struct {
    Time       time.Time
    Type       ChangeType
    Magnitude  float64
    Confidence float64
    Method     ChangePointMethod
}

// ChangeType defines change point types
type ChangeType int

const (
    LevelChange ChangeType = iota
    TrendChange
    VarianceChange
    SeasonalChange
)

// ChangePointMethod defines change point detection methods
type ChangePointMethod int

const (
    CUSUM ChangePointMethod = iota
    PELT
    BinSeg
    Segment
    WindowBased
)

// OutlierDetector detects outliers in benchmark data
type OutlierDetector struct {
    methods map[string]OutlierMethod
    config  OutlierConfig
    cache   map[string]*OutlierResults
    mu      sync.RWMutex
}

// OutlierConfig contains outlier detection configuration
type OutlierConfig struct {
    EnableMultipleMethods bool
    EnableRobustMethods   bool
    ThresholdMultiplier   float64
    MaxOutlierProportion  float64
    AutomaticThreshold    bool
    EnsembleVoting        bool
    MinAgreement          int
}

// OutlierMethod defines outlier detection methods
type OutlierMethod interface {
    DetectOutliers(data []float64) (*OutlierResults, error)
    GetType() OutlierMethodType
    GetThreshold() float64
    SetThreshold(threshold float64)
}

// OutlierMethodType defines outlier method types
type OutlierMethodType int

const (
    ZScoreMethod OutlierMethodType = iota
    ModifiedZScoreMethod
    IQRMethod
    IsolationForestMethod
    LocalOutlierFactorMethod
    EllipticEnvelopeMethod
    OneClassSVMMethod
    DBSCANMethod
)

// OutlierResults contains outlier detection results
type OutlierResults struct {
    Outliers       []int
    Scores         []float64
    Threshold      float64
    Method         OutlierMethodType
    Confidence     []float64
    Severity       []OutlierSeverity
    Recommendations []string
}

// OutlierSeverity defines outlier severity levels
type OutlierSeverity int

const (
    MildOutlier OutlierSeverity = iota
    ModerateOutlier
    SevereOutlier
    ExtremeOutlier
)

// StatisticalModeler builds statistical models from benchmark data
type StatisticalModeler struct {
    models map[string]StatisticalModel
    config ModelerConfig
    cache  map[string]*ModelResults
    mu     sync.RWMutex
}

// ModelerConfig contains modeler configuration
type ModelerConfig struct {
    EnableAutoModel     bool
    EnableEnsemble      bool
    EnableValidation    bool
    ValidationMethod    ValidationMethod
    CrossValidationFolds int
    BootstrapSamples    int
    ModelSelection      ModelSelectionCriteria
    FeatureEngineering  bool
}

// ValidationMethod defines validation methods
type ValidationMethod int

const (
    HoldoutValidation ValidationMethod = iota
    CrossValidation
    BootstrapValidation
    TimeSeriesSplit
)

// ModelSelectionCriteria defines model selection criteria
type ModelSelectionCriteria int

const (
    AICCriteria ModelSelectionCriteria = iota
    BICCriteria
    CrossValidationScore
    AdjustedRSquared
    FStatistic
)

// StatisticalModel defines statistical models
type StatisticalModel interface {
    Fit(data *BenchmarkData) error
    Predict(input interface{}) (interface{}, error)
    Evaluate(testData *BenchmarkData) (*ModelEvaluation, error)
    GetParameters() ModelParameters
    GetType() StatisticalModelType
}

// StatisticalModelType defines statistical model types
type StatisticalModelType int

const (
    LinearModel StatisticalModelType = iota
    NonLinearModel
    EnsembleModel
    BayesianModel
    RobustModel
    TimeSeriesStatModel // distinct name: TimeSeriesModel is already an interface above
    MachineLearningModel
)

// ModelResults contains statistical modeling results
type ModelResults struct {
    Model         StatisticalModelType
    Parameters    ModelParameters
    Evaluation    ModelEvaluation
    Validation    ValidationResults
    FeatureImportance []FeatureImportance
    Predictions   PredictionResults
    Diagnostics   ModelDiagnostics
}

// ModelParameters contains model parameters
type ModelParameters struct {
    Coefficients []float64
    Intercept    float64
    Variance     float64
    Degrees      int
    Regularization float64
    Hyperparameters map[string]interface{}
}

// ModelEvaluation contains model evaluation metrics
type ModelEvaluation struct {
    TrainingMetrics  EvaluationMetrics
    TestingMetrics   EvaluationMetrics
    ValidationMetrics EvaluationMetrics
    Overfitting      bool
    Underfitting     bool
    Generalization   float64
}

// EvaluationMetrics contains evaluation metrics
type EvaluationMetrics struct {
    MSE         float64
    RMSE        float64
    MAE         float64
    MAPE        float64
    RSquared    float64
    AdjRSquared float64
    AIC         float64
    BIC         float64
    LogLikelihood float64
}

// ValidationResults contains validation results
type ValidationResults struct {
    Method       ValidationMethod
    Folds        int
    Scores       []float64
    MeanScore    float64
    StdScore     float64
    Confidence   ConfidenceInterval
    Stability    float64
}

// FeatureImportance contains feature importance information
type FeatureImportance struct {
    Feature    string
    Importance float64
    Rank       int
    PValue     float64
    Confidence ConfidenceInterval
}

// ModelDiagnostics contains model diagnostics
type ModelDiagnostics struct {
    Residuals        ResidualAnalysis
    Assumptions      AssumptionResults
    Influence        InfluenceDiagnostics
    Multicollinearity MulticollinearityTest
    Outliers         OutlierDiagnostics
    GoodnessOfFit    GoodnessOfFitTests
}

// GoodnessOfFitTests contains goodness of fit tests
type GoodnessOfFitTests struct {
    ChiSquare        TestResult
    KolmogorovSmirnov TestResult
    AndersonDarling  TestResult
    CramerVonMises   TestResult
    Satisfied        bool
}

// NewStatisticalAnalyzer creates a new statistical analyzer
func NewStatisticalAnalyzer(config AnalyzerConfig) *StatisticalAnalyzer {
    return &StatisticalAnalyzer{
        descriptive:     NewDescriptiveAnalyzer(),
        inferential:     NewInferentialAnalyzer(),
        hypothesis:      NewHypothesisAnalyzer(),
        regression:      NewRegressionAnalyzer(),
        timeSeries:      NewTimeSeriesAnalyzer(),
        outlierDetector: NewOutlierDetector(),
        modeler:        NewStatisticalModeler(),
        config:         config,
        cache:          NewAnalysisCache(),
        metrics:        &AnalysisMetrics{},
    }
}

// AnalyzeData performs comprehensive statistical analysis
func (sa *StatisticalAnalyzer) AnalyzeData(data *BenchmarkData) (*AnalysisResults, error) {
    sa.mu.Lock()
    defer sa.mu.Unlock()

    // Check cache first
    if sa.config.CacheEnabled {
        if cached := sa.cache.Get(data.ID); cached != nil {
            return cached, nil
        }
    }

    // Validate data quality
    if err := sa.validateData(data); err != nil {
        return nil, fmt.Errorf("data validation failed: %w", err)
    }

    // Perform descriptive analysis
    descriptive, err := sa.descriptive.Analyze(data.Values)
    if err != nil {
        return nil, fmt.Errorf("descriptive analysis failed: %w", err)
    }

    // Detect and handle outliers
    outliers, err := sa.outlierDetector.Detect(data.Values)
    if err != nil {
        return nil, fmt.Errorf("outlier detection failed: %w", err)
    }

    // Clean data if outlier removal is enabled
    cleanData := data.Values
    if sa.config.EnableOutlierRemoval && len(outliers.Outliers) > 0 {
        cleanData = sa.removeOutliers(data.Values, outliers.Outliers)
    }

    // Perform inferential analysis
    inferential, err := sa.inferential.Analyze(cleanData)
    if err != nil {
        return nil, fmt.Errorf("inferential analysis failed: %w", err)
    }

    // Create analysis results
    results := &AnalysisResults{
        Descriptive: descriptive,
        Inferential: inferential,
        Outliers:    outliers,
        DataQuality: data.Quality,
        Metadata:    data.Metadata,
        Timestamp:   time.Now(),
    }

    // Cache results
    if sa.config.CacheEnabled {
        sa.cache.Set(data.ID, results)
    }

    return results, nil
}

// CompareData performs statistical comparison between datasets
func (sa *StatisticalAnalyzer) CompareData(data1, data2 *BenchmarkData, testType TestType) (*ComparisonResults, error) {
    // Validate assumptions for the chosen test
    assumptions1 := sa.hypothesis.ValidateAssumptions(data1.Values, testType)
    assumptions2 := sa.hypothesis.ValidateAssumptions(data2.Values, testType)

    // Perform the statistical test
    options := TestOptions{
        Alternative:     TwoSided,
        ConfidenceLevel: sa.config.ConfidenceLevel,
    }

    testResults, err := sa.hypothesis.PerformTest(testType, data1.Values, data2.Values, options)
    if err != nil {
        return nil, fmt.Errorf("hypothesis test failed: %w", err)
    }

    // Calculate effect size
    effectSize := sa.calculateEffectSize(data1.Values, data2.Values)

    // Create comparison results
    results := &ComparisonResults{
        Test:          testResults,
        EffectSize:    effectSize,
        Assumptions1:  assumptions1,
        Assumptions2:  assumptions2,
        Recommendation: sa.generateRecommendation(testResults, effectSize),
        Timestamp:     time.Now(),
    }

    return results, nil
}

// validateData validates benchmark data
func (sa *StatisticalAnalyzer) validateData(data *BenchmarkData) error {
    if len(data.Values) < sa.config.MinSampleSize {
        return fmt.Errorf("insufficient sample size: %d (minimum: %d)",
            len(data.Values), sa.config.MinSampleSize)
    }

    if len(data.Values) > sa.config.MaxSampleSize {
        return fmt.Errorf("sample size too large: %d (maximum: %d)",
            len(data.Values), sa.config.MaxSampleSize)
    }

    // Check for missing values
    for i, value := range data.Values {
        if math.IsNaN(value) || math.IsInf(value, 0) {
            return fmt.Errorf("invalid value at index %d: %f", i, value)
        }
    }

    return nil
}

// removeOutliers removes outliers from data
func (sa *StatisticalAnalyzer) removeOutliers(data []float64, outlierIndices []int) []float64 {
    if len(outlierIndices) == 0 {
        return data
    }

    // Create a map of outlier indices for fast lookup
    outlierMap := make(map[int]bool)
    for _, idx := range outlierIndices {
        outlierMap[idx] = true
    }

    // Filter out outliers
    var cleaned []float64
    for i, value := range data {
        if !outlierMap[i] {
            cleaned = append(cleaned, value)
        }
    }

    return cleaned
}

// calculateEffectSize calculates effect size between two datasets
func (sa *StatisticalAnalyzer) calculateEffectSize(data1, data2 []float64) EffectSizeResults {
    // Calculate means and standard deviations
    mean1 := mean(data1)
    mean2 := mean(data2)
    std1 := stddev(data1)
    std2 := stddev(data2)

    // Pooled standard deviation
    n1, n2 := float64(len(data1)), float64(len(data2))
    pooledStd := math.Sqrt(((n1-1)*std1*std1 + (n2-1)*std2*std2) / (n1 + n2 - 2))

    // Cohen's d
    cohensD := (mean1 - mean2) / pooledStd

    // Glass's delta
    glassD := (mean1 - mean2) / std2

    // Hedges' g (bias-corrected)
    hedgesG := cohensD * (1 - 3/(4*(n1+n2-2)-1))

    // Interpret effect size
    var interpretation EffectSizeInterpretation
    absCohensD := math.Abs(cohensD)
    if absCohensD < 0.2 {
        interpretation = NegligibleEffect
    } else if absCohensD < 0.5 {
        interpretation = SmallEffect
    } else if absCohensD < 0.8 {
        interpretation = MediumEffect
    } else if absCohensD < 1.3 {
        interpretation = LargeEffect
    } else {
        interpretation = VeryLargeEffect
    }

    return EffectSizeResults{
        CohensD:        cohensD,
        GlassD:         glassD,
        HedgesG:        hedgesG,
        Interpretation: interpretation,
    }
}

// generateRecommendation generates recommendations based on analysis results
func (sa *StatisticalAnalyzer) generateRecommendation(testResults *TestResults, effectSize EffectSizeResults) string {
    var recommendations []string

    // Statistical significance
    if testResults.Decision == RejectNull {
        recommendations = append(recommendations, 
            fmt.Sprintf("The difference is statistically significant (p = %.4f)", testResults.PValue))
    } else {
        recommendations = append(recommendations,
            fmt.Sprintf("No statistically significant difference found (p = %.4f)", testResults.PValue))
    }

    // Effect size interpretation
    switch effectSize.Interpretation {
    case NegligibleEffect:
        recommendations = append(recommendations, "The effect size is negligible - the difference may not be practically meaningful")
    case SmallEffect:
        recommendations = append(recommendations, "The effect size is small - the difference is detectable but may have limited practical impact")
    case MediumEffect:
        recommendations = append(recommendations, "The effect size is medium - the difference is likely to have noticeable practical impact")
    case LargeEffect:
        recommendations = append(recommendations, "The effect size is large - the difference has substantial practical significance")
    case VeryLargeEffect:
        recommendations = append(recommendations, "The effect size is very large - the difference has major practical significance")
    }

    // Additional recommendations based on assumptions
    if !testResults.Assumptions.Satisfied {
        recommendations = append(recommendations, "Consider using non-parametric tests due to assumption violations")
    }

    return strings.Join(recommendations, ". ")
}

// Helper functions
func mean(data []float64) float64 {
    if len(data) == 0 {
        return 0
    }
    sum := 0.0
    for _, value := range data {
        sum += value
    }
    return sum / float64(len(data))
}

func stddev(data []float64) float64 {
    if len(data) <= 1 {
        return 0
    }

    m := mean(data)
    sum := 0.0
    for _, value := range data {
        diff := value - m
        sum += diff * diff
    }
    return math.Sqrt(sum / float64(len(data)-1))
}

// Result types
type AnalysisResults struct {
    Descriptive *DescriptiveStats
    Inferential *InferentialResults
    Outliers    *OutlierResults
    DataQuality DataQuality
    Metadata    DataMetadata
    Timestamp   time.Time
}

type ComparisonResults struct {
    Test           *TestResults
    EffectSize     EffectSizeResults
    Assumptions1   AssumptionResults
    Assumptions2   AssumptionResults
    Recommendation string
    Timestamp      time.Time
}

// Placeholder types and minimal working implementations
type AnalysisCache struct{}
type AnalysisMetrics struct{}

func NewDescriptiveAnalyzer() *DescriptiveAnalyzer { return &DescriptiveAnalyzer{} }
func NewInferentialAnalyzer() *InferentialAnalyzer { return &InferentialAnalyzer{} }
func NewHypothesisAnalyzer() *HypothesisAnalyzer { return &HypothesisAnalyzer{} }
func NewRegressionAnalyzer() *RegressionAnalyzer { return &RegressionAnalyzer{} }
func NewTimeSeriesAnalyzer() *TimeSeriesAnalyzer { return &TimeSeriesAnalyzer{} }
func NewOutlierDetector() *OutlierDetector { return &OutlierDetector{} }
func NewStatisticalModeler() *StatisticalModeler { return &StatisticalModeler{} }
func NewAnalysisCache() *AnalysisCache { return &AnalysisCache{} }

func (ac *AnalysisCache) Get(key string) *AnalysisResults { return nil }
func (ac *AnalysisCache) Set(key string, results *AnalysisResults) {}

// Analyze computes basic descriptive statistics; a full implementation would
// also populate percentiles, moments, and distribution fits.
func (da *DescriptiveAnalyzer) Analyze(data []float64) (*DescriptiveStats, error) {
    if len(data) == 0 {
        return nil, fmt.Errorf("cannot analyze empty dataset")
    }
    sorted := append([]float64(nil), data...)
    sort.Float64s(sorted)
    n := len(sorted)
    median := sorted[n/2]
    if n%2 == 0 {
        median = (sorted[n/2-1] + sorted[n/2]) / 2
    }
    return &DescriptiveStats{
        Count:  n,
        Mean:   mean(data),
        Median: median,
        StdDev: stddev(data),
        Min:    sorted[0],
        Max:    sorted[n-1],
        Range:  sorted[n-1] - sorted[0],
    }, nil
}

// Detect flags values more than three standard deviations from the mean
// (the classic z-score rule).
func (od *OutlierDetector) Detect(data []float64) (*OutlierResults, error) {
    results := &OutlierResults{Threshold: 3.0, Method: ZScoreMethod}
    m, s := mean(data), stddev(data)
    if s == 0 {
        return results, nil
    }
    for i, v := range data {
        if math.Abs(v-m)/s > results.Threshold {
            results.Outliers = append(results.Outliers, i)
        }
    }
    return results, nil
}

// Analyze estimates the population mean with a normal-approximation
// 95% confidence interval.
func (ia *InferentialAnalyzer) Analyze(data []float64) (*InferentialResults, error) {
    m := mean(data)
    se := stddev(data) / math.Sqrt(float64(len(data)))
    const z = 1.96 // two-sided 95% normal quantile
    return &InferentialResults{
        PopulationMean: EstimateWithCI{
            Estimate:   m,
            StdError:   se,
            LowerBound: m - z*se,
            UpperBound: m + z*se,
            Confidence: 0.95,
            Method:     SampleMethod,
        },
    }, nil
}

func (ha *HypothesisAnalyzer) ValidateAssumptions(data []float64, testType TestType) AssumptionResults {
    return AssumptionResults{}
}
func (ha *HypothesisAnalyzer) PerformTest(testType TestType, data1, data2 []float64, options TestOptions) (*TestResults, error) {
    return &TestResults{}, nil
}

// Example usage
func ExampleStatisticalAnalysis() {
    // Create analyzer configuration
    config := AnalyzerConfig{
        ConfidenceLevel:      0.95,
        SignificanceLevel:    0.05,
        MinSampleSize:        10,
        MaxSampleSize:        10000,
        OutlierThreshold:     3.0,
        EnableOutlierRemoval: true,
        EnableNormalization:  true,
        EnableRobustStats:    true,
        BootstrapSamples:     1000,
        CacheEnabled:         true,
        ParallelProcessing:   true,
        MaxWorkers:           4,
    }

    // Create analyzer
    analyzer := NewStatisticalAnalyzer(config)

    // Create sample benchmark data
    data := &BenchmarkData{
        ID:     "benchmark-001",
        Name:   "CPU Performance Test",
        Values: []float64{1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 1.1, 1.3, 1.5, 1.2},
        Metadata: DataMetadata{
            Source:      "go test -bench",
            Version:     "1.0.0",
            Platform:    "linux/amd64",
            GoVersion:   "1.21.0",
            Iterations:  10,
            Duration:    time.Second,
        },
        Timestamp: time.Now(),
        Quality: DataQuality{
            Completeness: 1.0,
            Consistency:  0.95,
            Accuracy:     0.98,
            Validity:     1.0,
            OverallScore: 0.98,
        },
    }

    // Perform analysis
    results, err := analyzer.AnalyzeData(data)
    if err != nil {
        fmt.Printf("Analysis failed: %v\n", err)
        return
    }

    fmt.Println("Statistical Analysis Results:")
    fmt.Printf("Sample size: %d\n", results.Descriptive.Count)
    fmt.Printf("Mean: %.4f\n", results.Descriptive.Mean)
    fmt.Printf("Median: %.4f\n", results.Descriptive.Median)
    fmt.Printf("Standard deviation: %.4f\n", results.Descriptive.StdDev)
    fmt.Printf("95%% CI for mean: [%.4f, %.4f]\n",
        results.Inferential.PopulationMean.LowerBound,
        results.Inferential.PopulationMean.UpperBound)

    if len(results.Outliers.Outliers) > 0 {
        fmt.Printf("Outliers detected: %d\n", len(results.Outliers.Outliers))
    }

    fmt.Printf("Data quality score: %.2f\n", results.DataQuality.OverallScore)
}

Descriptive Statistics

Comprehensive descriptive statistical analysis of benchmark data.

Central Tendency Measures

Analysis of mean, median, mode, and robust central tendency measures.
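
As a concrete sketch of a robust center, the standalone program below computes a trimmed mean, one of the robust alternatives modeled by the RobustStats type above. The trimmedMean helper and its sample values are illustrative, not part of the framework.

package main

import (
    "fmt"
    "sort"
)

// trimmedMean drops the given fraction of observations from each tail
// before averaging, making the estimate robust to stray outliers.
// Assumes trim is small enough that some observations remain.
func trimmedMean(data []float64, trim float64) float64 {
    sorted := append([]float64(nil), data...)
    sort.Float64s(sorted)
    k := int(trim * float64(len(sorted)))
    trimmed := sorted[k : len(sorted)-k]
    sum := 0.0
    for _, v := range trimmed {
        sum += v
    }
    return sum / float64(len(trimmed))
}

func main() {
    latencies := []float64{1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 9.8} // one outlier
    fmt.Printf("20%% trimmed mean: %.3f\n", trimmedMean(latencies, 0.20))
}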

Variability Measures

Standard deviation, variance, interquartile range, and robust variability measures.
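
The following minimal program computes the median absolute deviation (the MAD field of RobustStats above), including the conventional 1.4826 scaling that makes it comparable to a normal standard deviation. Helper names are assumptions of this sketch.

package main

import (
    "fmt"
    "math"
    "sort"
)

// median returns the middle value, averaging the two central
// observations when the count is even.
func median(data []float64) float64 {
    s := append([]float64(nil), data...)
    sort.Float64s(s)
    n := len(s)
    if n%2 == 1 {
        return s[n/2]
    }
    return (s[n/2-1] + s[n/2]) / 2
}

// mad returns the scaled median absolute deviation: the median of
// |x_i - median(x)|, multiplied by 1.4826 so it estimates the standard
// deviation under normality while resisting outliers.
func mad(data []float64) float64 {
    m := median(data)
    dev := make([]float64, len(data))
    for i, v := range data {
        dev[i] = math.Abs(v - m)
    }
    return 1.4826 * median(dev)
}

func main() {
    samples := []float64{1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 9.8}
    fmt.Printf("MAD-based scale: %.3f\n", mad(samples))
}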

Distribution Analysis

Shape analysis including skewness, kurtosis, and distribution fitting.
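
A short self-contained sketch of moment-based shape analysis: it computes sample skewness and excess kurtosis from the standardized third and fourth central moments, matching the Moments type in the framework above. The helper names are illustrative.

package main

import (
    "fmt"
    "math"
)

// moments returns sample skewness and excess kurtosis using the
// standardized third and fourth central moments.
func moments(data []float64) (skew, exKurt float64) {
    n := float64(len(data))
    mean := 0.0
    for _, v := range data {
        mean += v
    }
    mean /= n

    var m2, m3, m4 float64
    for _, v := range data {
        d := v - mean
        m2 += d * d
        m3 += d * d * d
        m4 += d * d * d * d
    }
    m2, m3, m4 = m2/n, m3/n, m4/n

    skew = m3 / math.Pow(m2, 1.5)
    exKurt = m4/(m2*m2) - 3 // zero for a normal distribution
    return skew, exKurt
}

func main() {
    data := []float64{1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 1.1, 1.3, 1.5, 1.2}
    s, k := moments(data)
    fmt.Printf("skewness: %.3f, excess kurtosis: %.3f\n", s, k)
}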

Inferential Statistics

Techniques for drawing conclusions about population parameters from sample data.

Confidence Intervals

Construction of confidence intervals for population parameters.
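
A minimal sketch of a confidence interval for the mean, using Student-t critical values for small samples and falling back to the normal 1.96 otherwise. The small lookup table and helper names are assumptions of this example, not a general t-quantile implementation.

package main

import (
    "fmt"
    "math"
)

// tCritical95 holds two-sided 95% Student-t critical values for small
// degrees of freedom; beyond the table the normal value 1.96 is close enough.
var tCritical95 = map[int]float64{
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}

// meanCI95 returns a two-sided 95% confidence interval for the mean.
func meanCI95(data []float64) (lo, hi float64) {
    n := float64(len(data))
    m, ss := 0.0, 0.0
    for _, v := range data {
        m += v
    }
    m /= n
    for _, v := range data {
        ss += (v - m) * (v - m)
    }
    se := math.Sqrt(ss/(n-1)) / math.Sqrt(n) // standard error of the mean

    t := 1.96
    if c, ok := tCritical95[len(data)-1]; ok {
        t = c
    }
    return m - t*se, m + t*se
}

func main() {
    data := []float64{1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 1.1, 1.3, 1.5, 1.2}
    lo, hi := meanCI95(data)
    fmt.Printf("95%% CI for the mean: [%.4f, %.4f]\n", lo, hi)
}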

Bootstrap Methods

Non-parametric bootstrap methods for robust statistical inference.
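
The sketch below implements the percentile bootstrap for the mean, the simplest of the interval methods enumerated in CIMethod above: resample with replacement, record each resample's mean, and take the empirical 2.5th and 97.5th percentiles. The fixed seed keeps the resampling reproducible; the function name is illustrative.

package main

import (
    "fmt"
    "math/rand"
    "sort"
)

// bootstrapCI computes a percentile bootstrap 95% confidence interval
// for the mean.
func bootstrapCI(data []float64, samples int, rng *rand.Rand) (lo, hi float64) {
    estimates := make([]float64, samples)
    for b := 0; b < samples; b++ {
        sum := 0.0
        for i := 0; i < len(data); i++ {
            sum += data[rng.Intn(len(data))] // resample with replacement
        }
        estimates[b] = sum / float64(len(data))
    }
    sort.Float64s(estimates)
    return estimates[int(0.025*float64(samples))], estimates[int(0.975*float64(samples))]
}

func main() {
    data := []float64{1.2, 1.1, 1.3, 1.0, 1.4, 1.2, 1.1, 1.3, 1.5, 1.2}
    rng := rand.New(rand.NewSource(42)) // fixed seed for reproducibility
    lo, hi := bootstrapCI(data, 1000, rng)
    fmt.Printf("bootstrap 95%% CI: [%.4f, %.4f]\n", lo, hi)
}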

Sampling Distributions

Analysis of sampling distributions and their properties.
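
To make the idea tangible, this small simulation draws repeated samples from a skewed exponential population and shows the standard deviation of the sample mean shrinking as 1/sqrt(n), as the central limit theorem predicts. All names and the sample sizes are illustrative.

package main

import (
    "fmt"
    "math"
    "math/rand"
)

func main() {
    rng := rand.New(rand.NewSource(1))
    const draws = 10000
    for _, n := range []int{5, 20, 80} {
        var sum, sumSq float64
        for d := 0; d < draws; d++ {
            s := 0.0
            for i := 0; i < n; i++ {
                s += rng.ExpFloat64() // population mean 1, stddev 1
            }
            m := s / float64(n)
            sum += m
            sumSq += m * m
        }
        meanOfMeans := sum / draws
        sd := math.Sqrt(sumSq/draws - meanOfMeans*meanOfMeans)
        fmt.Printf("n=%2d: mean of means %.3f, sd %.3f (theory %.3f)\n",
            n, meanOfMeans, sd, 1/math.Sqrt(float64(n)))
    }
}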

Hypothesis Testing

Comprehensive hypothesis testing framework for performance comparisons.

Parametric Tests

T-tests, F-tests, and other parametric hypothesis tests.
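
A minimal sketch of Welch's t-test (the WelchTTest case in the framework above): it computes the test statistic and Welch-Satterthwaite degrees of freedom in pure standard-library Go. Converting these to a p-value needs a Student-t CDF from an external stats package, which is deliberately left out here.

package main

import (
    "fmt"
    "math"
)

// welch computes the Welch t statistic and the Welch-Satterthwaite
// degrees of freedom for two independent samples with unequal variances.
func welch(a, b []float64) (t, df float64) {
    meanVar := func(x []float64) (m, v float64) {
        for _, xi := range x {
            m += xi
        }
        m /= float64(len(x))
        for _, xi := range x {
            v += (xi - m) * (xi - m)
        }
        return m, v / float64(len(x)-1)
    }
    m1, v1 := meanVar(a)
    m2, v2 := meanVar(b)
    n1, n2 := float64(len(a)), float64(len(b))
    se2 := v1/n1 + v2/n2
    t = (m1 - m2) / math.Sqrt(se2)
    df = se2 * se2 / ((v1*v1)/(n1*n1*(n1-1)) + (v2*v2)/(n2*n2*(n2-1)))
    return t, df
}

func main() {
    before := []float64{1.20, 1.22, 1.19, 1.25, 1.21, 1.23}
    after := []float64{1.10, 1.12, 1.08, 1.15, 1.11, 1.09}
    t, df := welch(before, after)
    fmt.Printf("Welch t = %.3f with %.1f degrees of freedom\n", t, df)
}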

Non-parametric Tests

Mann-Whitney, Wilcoxon, and other distribution-free tests.
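
The following sketch computes the Mann-Whitney U statistic with a normal-approximation p-value. It ignores tie corrections for brevity, so treat it as an illustration rather than a production implementation.

package main

import (
    "fmt"
    "math"
    "sort"
)

// mannWhitneyU computes the U statistic for sample a and a two-sided
// p-value via the normal approximation (reasonable for n1, n2 >= ~8).
func mannWhitneyU(a, b []float64) (u, p float64) {
    type obs struct {
        value float64
        fromA bool
    }
    all := make([]obs, 0, len(a)+len(b))
    for _, v := range a {
        all = append(all, obs{v, true})
    }
    for _, v := range b {
        all = append(all, obs{v, false})
    }
    sort.Slice(all, func(i, j int) bool { return all[i].value < all[j].value })

    // Rank-sum for sample a (1-based ranks, no tie correction).
    rankSumA := 0.0
    for i, o := range all {
        if o.fromA {
            rankSumA += float64(i + 1)
        }
    }
    n1, n2 := float64(len(a)), float64(len(b))
    u = rankSumA - n1*(n1+1)/2

    // Under H0, U is approximately N(n1*n2/2, n1*n2*(n1+n2+1)/12).
    mu := n1 * n2 / 2
    sigma := math.Sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z := (u - mu) / sigma
    p = 2 * (1 - 0.5*(1+math.Erf(math.Abs(z)/math.Sqrt2)))
    return u, p
}

func main() {
    a := []float64{1.20, 1.22, 1.19, 1.25, 1.21, 1.23, 1.24, 1.18}
    b := []float64{1.10, 1.12, 1.08, 1.15, 1.11, 1.09, 1.14, 1.07}
    u, p := mannWhitneyU(a, b)
    fmt.Printf("U = %.1f, approximate p = %.4f\n", u, p)
}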

Multiple Comparisons

Correction methods for multiple hypothesis testing scenarios.
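
A short sketch of the Holm step-down procedure (HolmCorrection in the framework above), which controls the family-wise error rate less conservatively than plain Bonferroni. The function and example p-values are illustrative.

package main

import (
    "fmt"
    "sort"
)

// holm applies the Holm-Bonferroni step-down correction: sort p-values
// ascending, compare the i-th smallest against alpha/(m-i), and stop
// rejecting at the first failure.
func holm(pvalues []float64, alpha float64) []bool {
    m := len(pvalues)
    order := make([]int, m)
    for i := range order {
        order[i] = i
    }
    sort.Slice(order, func(a, b int) bool { return pvalues[order[a]] < pvalues[order[b]] })

    reject := make([]bool, m)
    for i, idx := range order {
        if pvalues[idx] > alpha/float64(m-i) {
            break // this and all larger p-values fail
        }
        reject[idx] = true
    }
    return reject
}

func main() {
    pvalues := []float64{0.001, 0.021, 0.049, 0.102} // four benchmark comparisons
    for i, r := range holm(pvalues, 0.05) {
        fmt.Printf("test %d (p=%.3f): reject null = %v\n", i, pvalues[i], r)
    }
}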

Best Practices

  1. Sample Size: Ensure adequate sample sizes for reliable statistical inference
  2. Assumption Validation: Always validate test assumptions before applying methods
  3. Effect Size: Report both statistical significance and practical significance
  4. Multiple Comparisons: Apply appropriate corrections for multiple testing
  5. Outlier Handling: Carefully consider outlier detection and treatment strategies
  6. Robust Methods: Use robust statistical methods when assumptions are violated
  7. Reproducibility: Ensure analyses are reproducible with proper random seeds
  8. Interpretation: Provide clear interpretation of statistical results

Summary

Statistical analysis provides the foundation for evidence-based performance optimization:

  1. Descriptive Analysis: Comprehensive characterization of performance data
  2. Inferential Analysis: Population inferences from sample measurements
  3. Hypothesis Testing: Rigorous testing of performance hypotheses
  4. Effect Size Analysis: Quantification of practical significance
  5. Outlier Detection: Identification and handling of anomalous measurements
  6. Model Building: Statistical models for performance prediction

These techniques enable data-driven performance optimization decisions with proper statistical rigor and confidence.
