## A model of shape and texture variation in diatoms

Here we extend our principal curves model to include internal texture parameters
alongside the Fourier descriptors that represent the external contour.

Before modelling the diatom shape and texture data, we normalise
the data (the set of parameter values described above, for all
specimens of all species) to have zero mean and unit standard
deviation. We then find the main modes of variation across all
species using PCA. The variation in size, shape and texture over the
life cycle of a single species is modelled by a principal curve
passing through the middle of that species' data subset, that is, the
Fourier descriptors, texture parameters and size vectors of its
specimens projected into the eigenspace.
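
The normalisation and projection steps can be sketched as follows. This is a minimal illustration rather than the implementation used here: it assumes the pooled data form a matrix with one row per specimen and one column per parameter, that normalisation is applied per parameter, and that three components are retained. The function names `standardise` and `pca_project` and the synthetic data are hypothetical.

```python
import numpy as np

def standardise(X):
    """Scale each parameter (column) to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant parameters
    return (X - mean) / std, mean, std

def pca_project(Z, n_components=3):
    """Project centred data onto its leading principal components."""
    # Rows of Vt are the eigenvectors of the covariance matrix of Z,
    # ordered by decreasing variance along each direction.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    variance = s[:n_components] ** 2 / (Z.shape[0] - 1)
    return Z @ components.T, components, variance

# Synthetic stand-in for the pooled data: 200 specimens, 12 parameters
# (Fourier descriptors, texture parameters and size), 4 species labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12)) * rng.uniform(0.5, 5.0, size=12)
species = rng.integers(0, 4, size=200)

Z, mean, std = standardise(X)
scores, components, variance = pca_project(Z, n_components=3)

# Data subset of one species in the eigenspace; a principal curve
# would be fitted through the middle of these points.
scores_species0 = scores[species == 0]
```

Note that the PCA is computed once on the pooled data, so every species subset lives in the same eigenspace.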

To build a model covering several species, we simply compute a
principal curve for each species from the corresponding training data
subset. This approach makes it easy to extend
the model to a new species, which is more difficult for
a decision-tree-based diatom identification method such as that used in the ADIAC project.
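
The one-curve-per-species structure can be illustrated with a small sketch. It makes several assumptions: the eigenspace is held fixed, the data are synthetic, and `fit_curve` is a crude stand-in that averages points in groups ordered along the first principal axis of the subset. It is not a full principal-curve fit, and all names are hypothetical.

```python
import numpy as np

def fit_curve(points, n_nodes=8):
    """Crude polyline through the middle of a point cloud.

    Stand-in for a principal-curve fit: points are ordered along their
    own first principal axis, split into groups of near-equal size, and
    each group is replaced by its mean.
    """
    n_nodes = min(n_nodes, len(points))  # avoid empty groups
    centred = points - points.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    order = np.argsort(centred @ Vt[0])
    groups = np.array_split(points[order], n_nodes)
    return np.vstack([g.mean(axis=0) for g in groups])

def fit_models(scores, labels):
    """Fit one curve per species from its own training subset."""
    return {str(name): fit_curve(scores[labels == name])
            for name in np.unique(labels)}

rng = np.random.default_rng(1)

def fake_species(offset, n=60):
    """Noisy samples along a smooth 3-D curve (synthetic eigenspace data)."""
    t = np.sort(rng.uniform(0.0, 1.0, n))
    curve = np.column_stack([t, t ** 2, np.sin(3.0 * t)]) + offset
    return curve + rng.normal(scale=0.02, size=(n, 3))

scores = np.vstack([fake_species(0.0), fake_species(2.0)])
labels = np.array(["species_a"] * 60 + ["species_b"] * 60)

models = fit_models(scores, labels)
before = {name: curve.copy() for name, curve in models.items()}

# Adding a species only adds one entry; existing curves are untouched.
models["species_c"] = fit_curve(fake_species(4.0))
```

In this sketch the model is just a mapping from species name to curve, so extending it requires only the new species' training subset.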

**Principal curves and the data used for their
training, projected into the space spanned by the
three eigenvectors with the largest eigenvalues.
Different species are represented with different symbols.**