Objectivism Online Forum

The calculation for multi-variate regression



softwareNerd


This post has nothing to do with philosophy. Please skip it if you have only a layman's knowledge or interest in statistics.

I have spent a while Googling for a formula in the area of multiple regression (linear, least-squares fit) that I want to use as the basis for a software algorithm. I thought I'd find it easily, but was surprised when I did not. Here's a description of what I'm looking for. (I am not using very formal notation in what follows, to keep things simple; I figure that anyone who can answer will understand it.)

One familiar formulation used to estimate the value of Y (the dependent variable) from known values of x1, x2, x3, ..., xN (the independent variables) is the following: Y = a + b1.x1 + b2.x2 + b3.x3 + ... + bN.xN ('.' symbolizing multiplication)

where b1, b2, b3, ..., bN have been calculated/estimated from some sample data.

In the case where there is only a single independent variable (x1), we can estimate the values of b1 and a from the sample data as follows:

b1 = Covariance(x,y) / Var(x)

Alternatively,

b1 = (n.Sxy - Sx.Sy) / (n.Sxx - Sx.Sx) ... where S symbolizes sigma/sum-of, so Sxy is the sum of the products x.y and Sxx is the sum of the squares of x

and

a = Mean(Y) - b1.Mean(x1)
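For concreteness, here is a minimal sketch of that one-variable calculation in Java (the class and variable names are mine, not from any particular library):

public final class SimpleRegression {

    // Returns { b1, a } for the fit y = a + b1.x, using the formulas above:
    // b1 = (n.Sxy - Sx.Sy) / (n.Sxx - Sx.Sx) and a = Mean(y) - b1.Mean(x).
    public static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        double b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double a = (sy / n) - b1 * (sx / n);
        return new double[] { b1, a };
    }
}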

Now, if one has three independent variables (x1, x2 and x3), one can use:

a = Mean(Y) - b1.Mean(x1) - b2.Mean(x2) - b3.Mean(x3)

My question is: what about the calculations to derive b1, b2 and b3? At the stage where I need to calculate these three, I will already know all the variances and covariances... Var(x1), Var(x2), Var(x3), Cov(x1,x2), Cov(x1,x3), Cov(x2,x3)... so, if I can speed things up by using these as input to the algorithm, that'll be good.

I've searched pages and pages of stuff on the Internet and even checked a few textbooks. The problem is that many of them explain the conceptual ideas but assume that the actual calculations will be performed with a tool like R or MATLAB. The best I found was a way to calculate with two independent variables, and I'm not sure how to extrapolate that to the three-variable situation.

A web-link to a place where I can read up on this would be helpful. Alternatively, a web-link to a forum where I might find people who could give me a link would work too!


I don't think I can work through the math tonight, but I noticed that Wolfram MathWorld has a page with the derivation for LeastSquaresFitting, and even though it is for only one variable, you could probably extend it to any number of variables. Actually it looks fairly simple.

You would end up with a system of equations and you would have to solve it with Gauss-Jordan Elimination.
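For concreteness, here is what that system looks like for three independent variables, written in the same S-for-sum notation as the original post (these are the standard normal equations, one per coefficient, with all sums running over the sample):

Sy   = a.n   + b1.Sx1   + b2.Sx2   + b3.Sx3
Sx1y = a.Sx1 + b1.Sx1x1 + b2.Sx1x2 + b3.Sx1x3
Sx2y = a.Sx2 + b1.Sx1x2 + b2.Sx2x2 + b3.Sx2x3
Sx3y = a.Sx3 + b1.Sx1x3 + b2.Sx2x3 + b3.Sx3x3

Solving those four equations simultaneously (by Gauss-Jordan or any other method) gives a, b1, b2 and b3.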

[Edited to eliminate redundancy]

Edited by necrovore

I did find one open-source Java program that calculates a matrix of beta values (b1, b2, b3, ...) using two input matrices, X and Y, that hold the sample data.

X looks like this (or like the transpose of this...not sure which):

x11, x12, x13, x14 ...

x21, x22, x23,...

x31, x32,...

Y looks like (y1, y2, y3, y4...)

I don't know if the guy who coded this did it right, but here is what he does with the matrices. (This is the calculation where the intercept A is assumed to be zero):

  1. Calculate TX <-- (transpose of X)
  2. Calculate M1 <-- TX mult X (matrix multiplication)
  3. Calculate M2 <-- Inverse of M1
  4. Calculate M3 <-- M2 mult TX
  5. Calculate B <-- M3 mult Y

And B, supposedly, will be the matrix of values being sought (b1, b2, b3...).

I have to get this running with something small, to see if it looks right.
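Here is a rough sketch of those five steps in plain Java (the class and method names are mine). Instead of explicitly inverting TX mult X, it solves the equivalent system (TX mult X).B = (TX mult Y) with Gauss-Jordan elimination, along the lines necrovore suggested; the result should be the same B:

public final class LeastSquares {

    // x[i][j] = value of the j-th regressor in the i-th observation; y[i] = i-th response.
    // Returns the coefficient vector B such that X.B approximates Y in the least-squares sense.
    public static double[] solve(double[][] x, double[] y) {
        int n = x.length;      // number of observations
        int p = x[0].length;   // number of regressors
        double[][] xtx = new double[p][p];  // TX mult X
        double[] xty = new double[p];       // TX mult Y
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) {
                xty[j] += x[i][j] * y[i];
                for (int k = 0; k < p; k++) {
                    xtx[j][k] += x[i][j] * x[i][k];
                }
            }
        }
        return gaussJordan(xtx, xty);
    }

    // Solves A.b = r by Gauss-Jordan elimination with partial pivoting.
    // Overwrites A and r; the returned array (r) holds the solution b.
    public static double[] gaussJordan(double[][] a, double[] r) {
        int p = r.length;
        for (int col = 0; col < p; col++) {
            // Partial pivoting: move the largest remaining entry into the pivot row.
            int pivot = col;
            for (int row = col + 1; row < p; row++) {
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) pivot = row;
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmpVal = r[col]; r[col] = r[pivot]; r[pivot] = tmpVal;

            // Scale the pivot row, then eliminate the column from every other row.
            double d = a[col][col];
            for (int k = 0; k < p; k++) a[col][k] /= d;
            r[col] /= d;
            for (int row = 0; row < p; row++) {
                if (row == col) continue;
                double f = a[row][col];
                for (int k = 0; k < p; k++) a[row][k] -= f * a[col][k];
                r[row] -= f * r[col];
            }
        }
        return r;
    }
}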



Hi,

This is correct.

Given we want the least-squares solution to a system of equations Bx = b (here the x's are our variables and the B and b matrices contain our data), the optimal solution is x = (B^T B)^(-1) B^T b. In the notation of your post, B is your X, b is your Y, and x is the vector of coefficients being sought. This is precisely what the pseudocode you have provided does.
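One small addition, since the earlier post noted that the intercept A is assumed to be zero: if you prepend a column of 1s to X, the very same formula produces the intercept as the first coefficient. A hypothetical use of the LeastSquares sketch posted above (the data values here are made up purely for illustration):

double[][] x = {
    { 1, 2.0, 3.0, 0.5 },
    { 1, 1.0, 4.0, 1.5 },
    { 1, 3.0, 1.0, 2.5 },
    { 1, 4.0, 2.0, 3.5 },
    { 1, 5.0, 5.0, 4.0 },
};
double[] y = { 10.0, 12.0, 9.0, 14.0, 20.0 };
double[] coeffs = LeastSquares.solve(x, y); // coeffs[0] = a, then b1, b2, b3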

Good luck!

--DW

Edit: to remove emoticon with shades from mathematical formula.

Edited by DarkWaters

Thanks to both. Since I'm not comfortable doing the math derivations myself, I simply want to implement (i.e., program) it by translating a reliable formula into Java. The matrix program I found has some programming problems, but since you've confirmed it's got the right approach, I'll take a closer look at it. Meanwhile, I also plan on looking at the R Project, to see if I can use some components from there.

I have one further question. If I already have a matrix with all the covariances calculated, can I use that in some way, i.e., is there a formula that uses it as one component? Basically, I already have the covariances Cov(xi,y) and also all the Cov(xi,xj).



I think this site has what you're looking for. Search for the section title "Obtaining b weights from a Correlation Matrix" about two-thirds of the way down. (I'm too tired right now to be sure I'm thinking of the right formula for the covariance, or cross-covariance, matrix, so I'll leave that to you.)
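To spell out what that approach amounts to (my reading of it, not a quote from the page): the slopes come from solving the system Cov(X).b = Cov(X,y), where Cov(X) is the matrix of the Cov(xi,xj) values and Cov(X,y) is the vector of the Cov(xi,y) values; the intercept then follows from the means, a = Mean(Y) - b1.Mean(x1) - ... - bN.Mean(xN). It does not matter whether the covariances were divided by n or n-1, as long as the same convention was used throughout. A rough Java sketch, assuming the covariances and means are already computed and reusing the gaussJordan method from the earlier sketch:

public final class CovarianceRegression {

    // covXX[i][j] = Cov(x_i, x_j); covXY[i] = Cov(x_i, y); meanX[i] = Mean(x_i).
    // Returns { a, b1, ..., bN }.
    public static double[] fit(double[][] covXX, double[] covXY,
                               double[] meanX, double meanY) {
        // Slopes: solve Cov(X).b = Cov(X,y). Copies are passed because gaussJordan
        // overwrites its arguments.
        double[] b = LeastSquares.gaussJordan(copyOf(covXX), covXY.clone());
        // Intercept: a = Mean(Y) - b1.Mean(x1) - ... - bN.Mean(xN).
        double a = meanY;
        for (int i = 0; i < b.length; i++) {
            a -= b[i] * meanX[i];
        }
        double[] result = new double[b.length + 1];
        result[0] = a;
        System.arraycopy(b, 0, result, 1, b.length);
        return result;
    }

    private static double[][] copyOf(double[][] m) {
        double[][] out = new double[m.length][];
        for (int i = 0; i < m.length; i++) out[i] = m[i].clone();
        return out;
    }
}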

