How to use categorical variables with continuous variables in an EM mixture model

by Ming Han   Last Updated October 16, 2019 15:19 PM

I'm trying to use the mclust and flexmix packages in R to do unsupervised clustering of my data, which has both continuous and categorical variables. I'm having a hard time understanding how a categorical variable can be described by a distribution with a mean and variance, the way a continuous variable is described by a Gaussian distribution.

For example, if I have 2 continuous variables (say weight and height), I can draw ellipsoids around clusters based on each cluster's Gaussian, with weight in one dimension and height in the other. But if I have 1 continuous variable (say weight) and 1 categorical variable (say gender, M/F), I no longer get a scatterplot, just some dots for Male and some dots for Female. How do I draw ellipsoids around clusters then?

I read that flexmix can use a Binomial distribution for categorical variables, but I just can't visualize how that would work.
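To make the idea concrete, here is a minimal sketch in plain Python, not the flexmix or mclust API. It assumes each cluster models weight with a Gaussian and gender with an independent Bernoulli (a Binomial with one trial), so a cluster is described by a mean and variance for weight plus a probability of being male. The function name `fit_mixed_em`, the synthetic data, and the quantile-based initialisation are all made up for illustration.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_mixed_em(weight, is_male, k=2, n_iter=100):
    """EM for a k-component mixture: Gaussian on weight times Bernoulli on gender.

    Assumes the two variables are independent within each cluster.
    """
    n = len(weight)
    ordered = sorted(weight)
    # Crude initialisation (an assumption): spread the means over data quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(weight) / n
    grand_var = sum((w - grand_mean) ** 2 for w in weight) / n
    variances = [grand_var] * k
    p_male = [0.5] * k
    mix = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each observation.
        resp = []
        for w, g in zip(weight, is_male):
            joint = [
                mix[j]
                * normal_pdf(w, means[j], variances[j])
                * (p_male[j] if g else 1.0 - p_male[j])
                for j in range(k)
            ]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([v / total for v in joint])
        # M-step: responsibility-weighted updates of every parameter.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mix[j] = nj / n
            means[j] = sum(r[j] * w for r, w in zip(resp, weight)) / nj
            variances[j] = max(
                sum(r[j] * (w - means[j]) ** 2 for r, w in zip(resp, weight)) / nj,
                1e-6,  # keep the variance strictly positive
            )
            p_male[j] = sum(r[j] * g for r, g in zip(resp, is_male)) / nj
    return mix, means, variances, p_male, resp

# Synthetic data (hypothetical): a lighter, mostly female group
# and a heavier, mostly male group.
random.seed(0)
weight = [random.gauss(60, 5) for _ in range(200)] + [random.gauss(85, 6) for _ in range(200)]
is_male = [int(random.random() < 0.2) for _ in range(200)] + [int(random.random() < 0.8) for _ in range(200)]

mix, means, variances, p_male, resp = fit_mixed_em(weight, is_male, k=2)
print("mixing proportions:", mix)
print("mean weight per cluster:", means)
print("P(male) per cluster:", p_male)
```

Under this formulation there is no ellipsoid in the gender dimension. Each cluster would be drawn as a bell curve over weight, together with a bar showing its proportion of males, and the categorical variable affects cluster membership only through that proportion.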

I've read other questions about dealing with categorical features in mixture models (How to deal with categorical feature in a Gaussian Mixture model clustering model), but the answers don't really address my question.

Any insight would be greatly appreciated! Cheers.


