Science Groups Forum Index  »  Statistics - Math  »  how to compare the Gaussian probability of data with differe
zl2k
Posted: Mon Mar 19, 2007 8:50 pm
Guest
hi, all
Suppose I have a data set with m-dimensional data, and I have a
multivariate Gaussian (m-dimensional) to describe it. In some cases
the covariance matrix can be singular, and I have to reduce the
dimension by projecting the data to a lower dimension to calculate
the probability. My question is: how can I compare the probability
computed in m dimensions with one computed in (m-k) dimensions?
(Usually k=1.) The change of dimensionality is only because of the
singular covariance matrix.
What I am thinking is that the data will have a higher probability in
the lower dimension than in the higher dimension. How can I adjust to
compensate for the change of dimensionality? Thanks for any help.
zl2k
illywhacker
Posted: Tue Mar 20, 2007 12:11 pm
Guest
A probability is a real number: you can always compare them. The
question is what the comparison means. If you calculate the probability
that the data lie in some set in the m-dimensional space, then, when
the covariance is singular, you will either get zero or get the same
probability that you would get if you first marginalized to the
codimension k surface and then computed the probability of the data
lying in the intersection of this surface with your set.

I think you need to describe what you want to achieve with this
comparison, what the context is, etc., if you want a truly useful
answer.

illywhacker;
illywhacker
Posted: Tue Mar 20, 2007 4:20 pm
Guest
On Mar 20, 2:56 pm, "zl2k" <kdsfin...@gmail.com> wrote:

There are two sections to my reply. The first section addresses a
misunderstanding you seem to have about probability densities, but is
not directly relevant to your problem. The second section is relevant
to your problem.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
----- Section 1 -----------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Quote:
Let me give a specific example. I have one multivariate Gaussian,
d=2, mean1=[0; 0], sigma1=[1 0; 0 1]. (G1)
I have another Gaussian, d=3, mean2=[0; 0; 0], sigma2=[1 0 0; 0 1 0; 0 0 1]. (G2)
Now I have x1=mean1 for G1 or x2=mean2 for G2.
Obviously, P(x1|G1) > P(x2|G2).

OK. What you say here is not clear. Let g1(x1) be the density function
for G1 and g2(x2) be the density function for G2. I assume you intend
g1(mean1) > g2(mean2) ,

or, since the means are zero,

g1(0) > g2(0) .

But remember you are talking about probability densities. So in the
first case, g1(0) dx dy is (approximately) the probability that x1
lies in the set

[0, dx] x [0, dy]

whereas in the second case, g2(0) dx dy dz is the probability that x2
lies in the set

[0, dx] x [0, dy] x [0, dz] .
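A quick numerical sketch of this point, assuming G1 and G2 are exactly the standard normals from the quoted example (the helper name is made up for illustration): the two densities at the origin differ by a factor of sqrt(2*pi), and the corresponding small-box probabilities differ by a factor that depends on the arbitrary width dz.

```python
import math

def std_normal_density_at_mean(dim: int) -> float:
    """Density of the standard normal N(0, I_dim) at its mean: (2*pi)**(-dim/2)."""
    return (2.0 * math.pi) ** (-dim / 2.0)

g1_0 = std_normal_density_at_mean(2)  # g1(0) for G1 on R^2, units of 1/area
g2_0 = std_normal_density_at_mean(3)  # g2(0) for G2 on R^3, units of 1/volume
print(g1_0, g2_0, g1_0 / g2_0)        # the ratio is sqrt(2*pi), about 2.5066

# Density times the size of a small box approximates the box probability.
dx = dy = dz = 1e-3
p_box_2d = g1_0 * dx * dy        # P(x1 in [0, dx] x [0, dy])
p_box_3d = g2_0 * dx * dy * dz   # P(x2 in [0, dx] x [0, dy] x [0, dz])

# Their ratio is dz / sqrt(2*pi): it depends on the arbitrary choice of dz.
print(p_box_3d / p_box_2d)
```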

I am not sure what relation you expect to obtain between these two
probabilities. What is true is that the probability that x2 lies in
the set

[0, dx] x [0, dy] x (-infty, infty) (*)

is the same as the probability that x1 lies in the set

[0, dx] x [0, dy] .
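The equality between these two probabilities can be checked by simulation. This is a sketch assuming the standard normals G1 and G2 from the example and a fixed random seed; the box is made large (dx = dy = 0.5, not infinitesimal) only so that enough samples land in it.

```python
import math
import numpy as np

def std_normal_cdf(t: float) -> float:
    """CDF of the 1-D standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

dx = dy = 0.5

# Exact P(x1 in [0, dx] x [0, dy]) under G1 = N(0, I_2): the coordinates are independent.
p_g1 = (std_normal_cdf(dx) - 0.5) * (std_normal_cdf(dy) - 0.5)

# Monte Carlo estimate of P(x2 in [0, dx] x [0, dy] x (-infty, infty)) under G2 = N(0, I_3).
rng = np.random.default_rng(0)
x2 = rng.standard_normal((400_000, 3))
in_set = (x2[:, 0] >= 0) & (x2[:, 0] <= dx) & (x2[:, 1] >= 0) & (x2[:, 1] <= dy)
p_g2_marginal = in_set.mean()  # z is unrestricted, so it is never tested

print(p_g1, p_g2_marginal)  # agree up to Monte Carlo error
```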

This is a special case of marginalization. That is, you have a map f
from one space X to another Y:

f: X -> Y .

You have a probability distribution on X. Call it P. Then you can
generate a probability distribution on Y as follows. The probability
Q(A) of any set A \subset Y is given by

Q(A) = P(f^{-1}(A)) ,

where

f^{-1}(A) = {x \in X such that f(x) \in A} .

In terms of probability density functions and coordinates x on X and
y on Y (these are not the same as the x and y above), one has

q(y) = \int dx \delta(y, f(x)) p(x) .

For your problem, call the coordinates on R^{3} (x, y, z) and the
coordinates on R^{2} (x', y'). The map f is

(x', y') = f(x, y, z) = (x, y) .

So the above integral becomes

q(x', y') = \int dx dy dz \delta((x', y'), (x, y)) p(x, y, z)

= \int dz p(x', y', z) ,

which is the density form of (*): integrating q(x', y') over
[0, dx] x [0, dy] gives the probability that x2 lies in the set (*).
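A numerical check of the last two displayed equations, as a sketch: assuming an arbitrary correlated 3-D Gaussian (the mean and covariance below are made up for illustration, not from the thread) and a hypothetical helper `gaussian_pdf`, integrating p(x', y', z) over z on a fine grid reproduces the 2-D Gaussian whose mean and covariance are the (x, y) blocks of the 3-D ones.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of N(mean, cov) at x (one point or a stack of points); cov non-singular."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = cov.shape[0]
    precision = np.linalg.inv(cov)
    maha = np.einsum("...i,ij,...j->...", diff, precision, diff)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + maha))

# Made-up correlated 3-D Gaussian standing in for p(x, y, z).
mean3 = np.array([0.5, -1.0, 2.0])
cov3 = np.array([[2.0, 0.3, 0.5],
                 [0.3, 1.0, 0.2],
                 [0.5, 0.2, 1.5]])

# Closed form: the (x, y) marginal keeps the matching blocks of mean and covariance.
mean2, cov2 = mean3[:2], cov3[:2, :2]

xp, yp = 0.8, -0.4                    # the point (x', y')
z = np.linspace(-15.0, 20.0, 70_001)  # wide enough to hold essentially all the mass in z
dz = z[1] - z[0]
pts = np.column_stack([np.full_like(z, xp), np.full_like(z, yp), z])

q_numeric = gaussian_pdf(pts, mean3, cov3).sum() * dz  # \int dz p(x', y', z)
q_closed = gaussian_pdf([xp, yp], mean2, cov2)
print(q_numeric, q_closed)
```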

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
----- Section 2 -----------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Quote:
The background of my question is that I have m-dimensional data and I
can estimate a Gaussian model based on that. (Let's call it datasetA.)
Given a new dataset (still m-dimensional), because of the dependency
among the variables, sigma may become singular, so I'll get a Gaussian
with fewer dimensions. (Let's call it datasetB.) Given a data point x,
I am asking: is x more likely to be from datasetA or from datasetB?

There are many points to be made here. The most important is the last
one, but you might not understand it unless you read the others first :).

1) You need to be careful how you speak about these things. Surely you
do not mean 'is x more likely to be from datasetA or datasetB'. A
dataset is either a set of measurements you have made and their
outcomes, or, if you throw away the information about the measurements,
it is a set of numbers that you know. So another number x either comes
from one of the measurements in the dataset, or is equal to one of the
numbers in the dataset, or it does not. There is no question to answer.

What you mean is that there are two processes generating the data,
call them A and B, and you have a series of measurements of the
'outputs' of processes A and B. (These outputs are points in R^{m}.)
You are modelling the outputs of these processes using Gaussian
distributions.

The probability density of a data point x, given that it was generated
by process A, is a Gaussian with a non-singular covariance. On the
other hand, the covariance of points x generated by process B is
singular: all the points lie on a hyperplane embedded in R^{m}, and on
this plane there is a Gaussian distribution giving the probability of
the points.

2) How do we find out which process generated a new data point? Let us
set m = 2 for ease of discussion, ignore the singularity for the moment,
and suppose that the two distributions are non-singular. Call the
density functions f_{A} for process A and f_{B} for process B. Let Q be
a variable that can take the values A or B, indicating the process that
generated the measurement. Then

P(Q | (x, y)) = P((x, y) | Q ) P(Q) / P((x, y))

= f_{Q}(x, y) P(Q) / (f_{A}(x, y) P(Q = A) + f_{B}(x, y) P(Q = B)) .

Let us assume that the two processes are a priori equally likely (which
seems very unrealistic given the strong constraint on B), i.e.

P(Q = A) = P(Q = B) .

Then

P(Q | (x, y)) = f_{Q}(x, y) / (f_{A}(x, y) + f_{B}(x, y)) . (*)

You can now estimate the value of Q by choosing whichever value, A or
B, has the higher probability.
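A minimal sketch of this rule for two non-singular models in R^2. The means and covariances below are made up for illustration and the helper names are hypothetical; with prior_a = 0.5 the function reduces to equation (*).

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of N(mean, cov) at a single point x; cov must be non-singular."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    maha = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * maha) / np.sqrt(np.linalg.det(2.0 * np.pi * cov)))

def posterior(point, f_a, f_b, prior_a=0.5):
    """Return (P(Q = A | point), P(Q = B | point)) by Bayes' rule."""
    w_a = f_a(point) * prior_a
    w_b = f_b(point) * (1.0 - prior_a)
    return w_a / (w_a + w_b), w_b / (w_a + w_b)

# Hypothetical non-singular models for processes A and B.
def f_a(p):
    return gaussian_pdf(p, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

def f_b(p):
    return gaussian_pdf(p, [2.0, 0.0], [[1.0, 0.5], [0.5, 2.0]])

for point in ([0.0, 0.0], [2.0, 0.0], [1.0, 3.0]):
    p_a, p_b = posterior(point, f_a, f_b)
    print(point, round(p_a, 3), round(p_b, 3), "A" if p_a > p_b else "B")
```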

3) The above was the general situation. Now let us reintroduce the
singularity. Suppose f_{B} has the form

f_{B} = delta(y) h(x) ,

while f_{A} = g(x, y) .

The delta function is an infinitely high spike at zero that is zero
elsewhere and integrates to 1. It is a zero-variance Gaussian, if you
like. The problem with the singular distribution is that, as a
distribution on R^{2}, it has infinite density at y = 0 and is zero
elsewhere, which means that things are not all that well defined.
Still, pushing on anyway, equation (*) seems to tell us that the
probability that Q = A is 1 if the data point (x, y) does not have
y = 0, because such a point could never have been generated by process
B, whereas points of the form (x, 0) result in probability 1 for
process B, since f_{B} assigns infinite density to such points, unlike
f_{A}. None of this is very meaningful mathematically, but it is easy
to make it so.

Look at the probability that the point (x, y) lies in the set
S = [x, x + d] x [-e/2, e/2], where d and e are small quantities. For
small d and e, P(S | A) is approximately d e g(x, y) and P(S | B) is
approximately d h(x), so with equal priors the common factor d cancels.
Using these probabilities instead of the point probabilities, we find

P(Q = A | (x, y) \in S) = e g(x, y) / (e g(x, y) + h(x)) ,

and

P(Q = B | (x, y) \in S) = h(x) / (e g(x, y) + h(x)) .

Clearly, for small e the second is higher than the first unless g is
very peaked at y = 0 (i.e. nearly singular itself), and this confirms
the conclusion reached before, but in a better-defined way, with no
infinities.
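To see the trend numerically, here is a sketch assuming standard normal choices for g and h (the post does not specify them) and a point on the line y = 0. With these choices the ratio e g(x, 0) / h(x) equals e / sqrt(2*pi), so P(Q = B | S) tends to 1 as e shrinks.

```python
import math

def g(x, y):
    """Assumed model for process A: standard 2-D normal density."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)

def h(x):
    """Assumed model for process B along the line y = 0: standard 1-D normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def box_posteriors(x, y, e):
    """(P(Q = A | (x, y) in S), P(Q = B | (x, y) in S)) for box height e, equal priors.

    The box width d cancels, so it does not appear.
    """
    a = e * g(x, y)
    b = h(x)
    return a / (a + b), b / (a + b)

for e in (1.0, 0.1, 0.01, 0.001):
    p_a, p_b = box_posteriors(0.5, 0.0, e)
    print(f"e={e:<6} P(A|S)={p_a:.4f} P(B|S)={p_b:.4f}")
```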

4) The problem is that you have no value for e. How big should it be?
This brings us to the most important point. How could a singular
covariance be inferred in reality? Since measurement precision is never
infinite, a proper inference procedure with measurement errors included
will never give you a singular covariance. By measurement errors I mean
that the same reading on a digital display or a pointer can inevitably
be produced by a range of values of the quantity measured.

Ignoring measurement errors (or other types of error) may be justified
when they are swamped by the variance of the distribution itself, but
when the variance of the distribution is zero in some direction, these
small errors become significant, and you need to model them. When you
do, your problem will disappear.
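One way to act on this advice, sketched under loud assumptions: the training points for B below are synthetic and lie exactly on y = 0, and the measurement error is assumed Gaussian, independent in each coordinate, with a made-up standard deviation sigma_meas. Adding sigma_meas**2 times the identity to B's covariance makes it non-singular, after which the ordinary Bayes rule applies with no infinities; sigma_meas plays roughly the role of the box height e above.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of N(mean, cov) at a single point x; cov must be non-singular."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    maha = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * maha) / np.sqrt(np.linalg.det(2.0 * np.pi * cov)))

rng = np.random.default_rng(1)

# Synthetic training data for process B: every point lies exactly on y = 0,
# so the sample covariance is singular (rank 1).
xs = rng.normal(0.0, 1.0, size=200)
data_b = np.column_stack([xs, np.zeros_like(xs)])
mean_b = data_b.mean(axis=0)
cov_b = np.cov(data_b, rowvar=False)

# Assumed measurement model: independent Gaussian reading error with standard
# deviation sigma_meas in each coordinate, which adds sigma_meas**2 * I.
sigma_meas = 0.01
cov_b_obs = cov_b + sigma_meas**2 * np.eye(2)

# Assumed non-singular model for process A.
mean_a, cov_a = np.zeros(2), np.eye(2)

def posterior_b(point):
    """P(Q = B | point) with equal priors, using the noise-aware model for B."""
    f_a = gaussian_pdf(point, mean_a, cov_a)
    f_b = gaussian_pdf(point, mean_b, cov_b_obs)
    return f_b / (f_a + f_b)

print(np.linalg.matrix_rank(cov_b), np.linalg.matrix_rank(cov_b_obs))
for point in ([0.5, 0.0], [0.5, 0.005], [0.5, 0.2]):
    print(point, posterior_b(np.array(point)))
```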

illywhacker;
zl2k
Posted: Tue Mar 20, 2007 4:20 pm
Guest
I think what you said, "you will get the same probability that you
would get if you first marginalized to the codimension k surface and
then computed the probability of the data lying in the intersection of
this surface with your set," is what I need to know.

Let me give a specific example. I have one multivariate Gaussian,
d=2, mean1=[0; 0], sigma1=[1 0; 0 1]. (G1)
I have another Gaussian, d=3, mean2=[0; 0; 0], sigma2=[1 0 0; 0 1 0; 0 0 1]. (G2)
Now I have x1=mean1 for G1 or x2=mean2 for G2.
Obviously, P(x1|G1) > P(x2|G2).
However, I would expect the probabilities to be roughly the same,
since if I project G2 to 2 dimensions, I get G1. Both x1 and x2 are
located at the center. The difference in probability is due to the
difference in dimension. It is not because of the deviation of x from
the mean. Nor is it because of the shape of the Gaussians, which would
be identical if they had the same dimension.
So my question is: how can I compensate for that difference so that
probabilities obtained in different dimensions are comparable? Maybe I
should project G2 to G1? I am not even sure whether my concern makes
sense. Thanks for any comments.

The background of my question is that I have m-dimensional data and I
can estimate a Gaussian model based on that. (Let's call it datasetA.)
Given a new dataset (still m-dimensional), because of the dependency
among the variables, sigma may become singular, so I'll get a Gaussian
with fewer dimensions. (Let's call it datasetB.) Given a data point x,
I am asking: is x more likely to be from datasetA or from datasetB?
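The projection described here can be sketched with an eigendecomposition of sigma. This is an illustration only: the helper names, the example covariance, and both tolerances are assumptions. The value returned is a log-density with respect to (m-k)-dimensional volume on the plane, so, as illywhacker explains in his reply, it cannot be compared directly with an m-dimensional density; the off-plane tolerance plane_tol is exactly the kind of unstated precision he discusses.

```python
import numpy as np

def support_basis(cov, tol=1e-10):
    """Orthonormal basis of the directions with non-zero variance, and those variances."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = eigvals > tol * eigvals.max()
    return eigvecs[:, keep], eigvals[keep]

def reduced_log_density(x, mean, cov, tol=1e-10, plane_tol=1e-8):
    """Log-density of x w.r.t. (m-k)-dimensional volume on the support of N(mean, cov).

    Returns -inf if x is farther than plane_tol from the support.
    """
    basis, variances = support_basis(np.asarray(cov, dtype=float), tol)
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    u = basis.T @ diff            # coordinates in the reduced space
    residual = diff - basis @ u   # component along the null space
    if np.linalg.norm(residual) > plane_tol:
        return -np.inf
    k = u.size
    return -0.5 * (k * np.log(2.0 * np.pi) + np.sum(np.log(variances)) + np.sum(u**2 / variances))

# Hypothetical example with m = 3 and k = 1: the third variable is the sum of the
# first two, so every sample lies on the plane z = x + y.
mixing = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mean = np.zeros(3)
cov = mixing @ mixing.T
print(reduced_log_density([1.0, 2.0, 3.0], mean, cov))  # on the plane
print(reduced_log_density([1.0, 2.0, 3.5], mean, cov))  # off the plane: -inf
```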

zl2k
Richard Ulrich
Posted: Wed Mar 21, 2007 4:16 am
Guest
Illy has given a long response, which seems okay.

Here is a short response. If you are comparing
likelihood functions - which is what it sounds like
and looks like from the later example - you might
look up AIC and BIC.
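For reference, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where L is the maximized likelihood, k the number of free parameters and n the number of observations; lower is better. The sketch below compares a full-covariance and a diagonal-covariance Gaussian fitted to the same synthetic, strongly correlated 2-D data (the data and helper names are made up). Both criteria compare models fitted to one dataset; they do not by themselves classify a new point.

```python
import numpy as np

def gaussian_max_loglik(data, diagonal=False):
    """Maximized log-likelihood of a Gaussian fitted by ML, and its parameter count."""
    n, m = data.shape
    cov = np.cov(data, rowvar=False, bias=True)  # ML estimate divides by n
    if diagonal:
        cov = np.diag(np.diag(cov))
        k = 2 * m                                # m means + m variances
    else:
        k = m + m * (m + 1) // 2                 # m means + symmetric covariance
    _, logdet = np.linalg.slogdet(cov)
    # At the ML estimate the Mahalanobis terms sum to n * m.
    loglik = -0.5 * n * (m * np.log(2.0 * np.pi) + logdet + m)
    return loglik, k

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * np.log(n) - 2 * loglik

rng = np.random.default_rng(2)
n = 500
u = rng.standard_normal(n)
data = np.column_stack([u, 0.9 * u + 0.3 * rng.standard_normal(n)])  # strongly correlated

for diag in (False, True):
    ll, k = gaussian_max_loglik(data, diagonal=diag)
    print("diagonal" if diag else "full", round(aic(ll, k), 1), round(bic(ll, k, n), 1))
```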



--
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html
illywhacker
Posted: Wed Mar 21, 2007 12:09 pm
Guest
On Mar 21, 3:31 am, Richard Ulrich <Rich.Ulr...@comcast.net> wrote:
Quote:
Here is a short response. If you are comparing
likelihood functions - which is what it sounds like
and looks like from the later example - you might
look up AIC and BIC.

This would be good for model estimation, but if I understand
correctly, the OP assumes that the models are already known. Now the
point is to classify new data.
Richard Ulrich
Posted: Thu Mar 22, 2007 4:18 am
Guest
On 21 Mar 2007 02:45:29 -0700, "illywhacker" <illywacker@gmail.com>
wrote:

Quote:
This would be good for model estimation, but if I understand
correctly, the OP assumes that the models are already known. Now the
point is to classify new data.

If you figured out what he is doing, you are ahead of me.

Conventionally, if he were talking about "probability" for
classifying, he would use the tail probabilities -- but I do not
see how that fits his description. He is mis-describing
likelihoods as probabilities, so far as I could tell.

I don't know what he gets if he normalizes each term by
dividing into the Maximum likelihood. Is there a correction
for number-of-parameters, like when using AIC or BIC?

illywhacker
Posted: Thu Mar 22, 2007 12:08 pm
Guest
On Mar 22, 5:07 am, Richard Ulrich <Rich.Ulr...@comcast.net> wrote:
Quote:
Conventionally, if he were talking about "probability" for
classifying, he would use the tail probabilities -- but I do not
see how that fits his description. He is mis-describing
likelihoods as probabilities, so far as I could tell.

He should calculate the probability of the different classes given the
data, which is what I was saying in my post. (I hope you are not going
to tell me that the 'probability of a class' makes no sense because
'class' is not a 'random variable' but simply an unknown quantity :) !)

He may be misusing the word 'likelihood' slightly, but it is more
useful as a term for the probability distribution of the data (which
may also be a function of other parameters) than as a term for this
quantity when viewed solely as a function of those parameters, a
distinction that has no mathematical use as far as I can see.

Quote:
I don't know what he gets if he normalizes each term by
dividing into the Maximum likelihood. Is there a correction
for number-of-parameters, like when using AIC or BIC?

There is no need for ad hoc solutions: probability theory tells you
what to do. If he were learning the model covariance, then he might
want to add extra weight to the prior probability of a singular model
if that were reasonable given his context, but once the models are
learned and one of them is singular, then as far as I can see the game
is over. If the data point to be classified is on the hyperplane, it
comes from the class with the singular distribution; if it is not on
the hyperplane, it comes from the class with the non-singular
distribution. This is what you get in the limit of non-singular
distributions, and so it makes sense.
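The limiting argument can be sketched numerically, assuming A is N(0, I) and B is N(0, diag(1, eps**2)), which collapses onto the line y = 0 as eps goes to 0 (both models are made up for illustration). For a point on the line, P(Q = B) works out to 1 / (1 + eps) here and tends to 1; for a fixed point off the line it eventually drops to 0.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of N(mean, cov) at a single point x; cov must be non-singular."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    maha = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * maha) / np.sqrt(np.linalg.det(2.0 * np.pi * cov)))

def posterior_b(point, eps):
    """P(Q = B | point) with equal priors, for the nearly singular model B(eps)."""
    f_a = gaussian_pdf(point, np.zeros(2), np.eye(2))
    f_b = gaussian_pdf(point, np.zeros(2), np.diag([1.0, eps**2]))
    return f_b / (f_a + f_b)

on_line = np.array([0.7, 0.0])
off_line = np.array([0.7, 0.05])
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(eps, posterior_b(on_line, eps), posterior_b(off_line, eps))
```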

illywhacker;
 