I'm trying to use the mclust and flexmix packages in R to do unsupervised clustering of my data, which has both continuous and categorical variables. I'm having a hard time understanding how a categorical variable can be described by a distribution with a mean and variance, the way a continuous variable is described by a Gaussian distribution.
For example, if I have 2 continuous variables (say weight and height), I can draw ellipsoids around clusters based on each cluster's Gaussian for weight in one dimension and its Gaussian for height in the other dimension. But if I have 1 continuous variable (say weight) and 1 categorical variable (say gender - M/F), the plot is no longer a scatterplot; it is just some dots for Male and some dots for Female. How do I draw ellipsoids around clusters then?
I read that flexmix can use a Binomial distribution for categorical variables, but I just can't visualize how that would work.
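To make this concrete, here is my current understanding of what such a model might look like, written as a sketch and not as what flexmix necessarily does internally. I am assuming that weight $w$ and gender $g$ are independent within each cluster, that gender is coded as $g \in \{0, 1\}$, and that there are $K$ clusters with mixing weights $\pi_k$; all of these symbols are my own notation:

$$
p(w, g) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(w \mid \mu_k, \sigma_k^2) \, \theta_k^{g} (1 - \theta_k)^{1 - g}
$$

Here $\theta_k$ would be the probability that $g = 1$ in cluster $k$, i.e. a Binomial with a single trial, whose mean is $\theta_k$ and whose variance is $\theta_k (1 - \theta_k)$. If that is right, the categorical part of a cluster is summarized by a proportion and not by an ellipsoid, which is exactly the part I can't picture on a plot.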
I've read other questions about dealing with categorical features in mixture models (How to deal with categorical feature in a Gaussian Mixture model clustering model)... but the answers don't really address my question.
Any insight would be greatly appreciated! Cheers.