Least Squares Regression with help of LINQ

I received requirements from a customer that said something about Least-Squares Regression something or another. Having not looked at anything but the most elementary statistics since around 2001, I realized that I needed to refresh myself.

I did some background reading and played with LINEST in Excel (that is a whole ball of insanity : http://support.microsoft.com/kb/828533 ). Tonight I decided to implement something in C#. Even if I don’t use it for this particular client, I will have a better understanding of what I am doing.

First, I asked my wife for help. As a biologist and geneticist she is more trained in statistics than I am. She didn’t know off the top of her head, but she pulled out her book which I then dove into.

I had the crazy idea of testing my implementation using the examples straight out of Introduction to the Practice of Statistics.

I started writing Test First.

[TestFixture]

public class LeastSquaresTests

{

    readonly double[] nonExerciseActivityIncrease = { -94, -57, -29, 135, 143, 151, 245,

                            355, 392, 473, 486, 535, 571, 580, 620, 690 };

    readonly double[] fatGain = { 4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4,

                   1.3, 3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1 };

    private IEnumerable<Point> getPoints()

    {

        return from i in Enumerable.Range(0, nonExerciseActivityIncrease.Length )

               select new Point(nonExerciseActivityIncrease[i], fatGain[i]);

    }

    [Test]

    public void XYTest()

    {

        var result = getPoints().GetLeastSquares();

        Assert.AreEqual(324.8, nonExerciseActivityIncrease.Average(), 0.1);

        Assert.AreEqual(324.8, result.Xmean, 0.1);

        Assert.AreEqual(257.66, result.XstdDev, 0.01);

        Assert.AreEqual(2.388, result.Ymean, 0.001);

        Assert.AreEqual(1.1389, result.YstdDev, 0.0001);

        Assert.AreEqual(-0.7786, result.Correlation, 0.0001);

        Assert.AreEqual(-0.00344, result.Slope, 0.00001);

        Assert.AreEqual(3.505, result.Intercept, 0.001);

    }

}

My code wouldn’t compile. This is what Test First is all about. Then I implemented things.

 

public static class LeastSquares

{

    public static LeastSquaresResponse GetLeastSquares(this IEnumerable<System.Windows.Point> points)

    {

        var xMean = (from p in points select p.X).Average();

        var yMean = (from p in points select p.Y).Average();

        var xVariance = (from p in points select Math.Pow(p.X – xMean, 2.0)).Sum() / (points.Count() – 1);

        var yVariance = (from p in points select Math.Pow(p.Y – yMean, 2.0)).Sum() / (points.Count() – 1);

        var xStdDev = Math.Sqrt(xVariance);

        var yStdDev = Math.Sqrt(yVariance);

 

        var correlation = (from p in points select ((p.X – xMean) / xStdDev) * ((p.Y – yMean) / yStdDev)).Sum() / (points.Count() – 1);

 

        var slope = correlation * yStdDev / xStdDev;

        var intercept = yMean – slope * xMean;

 

        return new LeastSquaresResponse(xMean,

                   yMean,

                   xVariance,

                   yVariance,

                   xStdDev,

                   yStdDev,

                   correlation,

                   slope,

                   intercept);

    }

}

I find the LINQ syntax to be very readable for these types of math operations. I definitely like aggregation better than using for-loops.

I declared a type LeastSquaresResponse which returns everything that is calculated when Least-Squares is computed. This is really just for testing sake. I wanted to be able to test all the values.

For completeness, I’ve incluced the definition of LeastSquaresResponse. Maybe next year I will do Analysis of Variance (ANOVA), but probably not.

public class LeastSquaresResponse

{

    private readonly double xmean;

    public double Xmean

    {

        get { return xmean; }

    }

    private readonly double ymean;

    public double Ymean

    {

        get { return ymean; }

    }

    private readonly double xvariance;

    public double Xvariance

    {

        get { return xvariance; }

    }

    private readonly double yvariance;

    public double Yvariance

    {

        get { return yvariance; }

    }

    private readonly double xstdDev;

    public double XstdDev

    {

        get { return xstdDev; }

    }

    private readonly double ystdDev;

    public double YstdDev

    {

        get { return ystdDev; }

    }

    private readonly double correlation;

    public double Correlation

    {

        get { return correlation; }

    }

    private readonly double slope;

    public double Slope

    {

        get { return slope; }

    }

    private readonly double intercept;

    public double Intercept

    {

        get { return intercept; }

    }

 

    /// <summary>

    /// Initializes a new instance of the LeastSquaresResponse class.

    /// </summary>

    /// <param name=”xmean”></param>

    /// <param name=”ymean”></param>

    /// <param name=”xvariance”></param>

    /// <param name=”yvariance”></param>

    /// <param name=”xstdDev”></param>

    /// <param name=”ystdDev”></param>

    /// <param name=”correlation”></param>

    /// <param name=”slope”></param>

    /// <param name=”intercept”></param>

    public LeastSquaresResponse(double xmean, double ymean, double xvariance, double yvariance, double xstdDev, double ystdDev, double correlation, double slope, double intercept)

    {

        this.xmean = xmean;

        this.ymean = ymean;

        this.xvariance = xvariance;

        this.yvariance = yvariance;

        this.xstdDev = xstdDev;

        this.ystdDev = ystdDev;

        this.correlation = correlation;

        this.slope = slope;

        this.intercept = intercept;

    }

}