A model of shape and texture variation in diatoms

Here we extend our principal curve model to include internal texture parameters alongside the Fourier descriptors that represent the external contour.

Before modelling the diatom shape and texture data, we normalise the data (the set of parameter values described above, for all specimens of all species) to have zero mean and unit standard deviation. We then find the main modes of variation in the data set of all species using principal component analysis (PCA). Finally, we model the variation in size, shape and texture over the life cycle of a single species with a principal curve passing through the middle of the corresponding data subset, that is, the Fourier descriptors, texture parameters and size vectors of that species projected into the eigenspace.
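The normalisation and projection steps can be sketched as follows. This is a minimal illustration rather than the implementation used here: it assumes the normalisation is applied to each parameter (column) separately, the function names `standardise` and `pca_project` are hypothetical, and the input matrix is synthetic random data standing in for the real descriptor vectors (one row per specimen, one column per parameter).

```python
import numpy as np

def standardise(X):
    """Scale each column to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant parameters
    return (X - mean) / std, mean, std

def pca_project(Z, n_components=3):
    """Project standardised (zero-mean) data onto its leading eigenvectors."""
    # Rows of Vt are the covariance eigenvectors, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    variances = s[:n_components] ** 2 / (Z.shape[0] - 1)
    return Z @ components.T, components, variances

# Synthetic stand-in: 200 specimens, 10 parameters with different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) * np.arange(1, 11)

Z, mean, std = standardise(X)
scores, components, variances = pca_project(Z, n_components=3)
```

Here `scores` holds the coordinates of each specimen in the eigenspace. A specimen measured later would be normalised with the stored `mean` and `std` and projected with the same `components`.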

To build a model covering several species, we compute a separate principal curve for each species from the appropriate training data subset. This makes it easy to extend the model with a new species, which is more difficult for a decision-tree-based diatom identification method such as the one used in the ADIAC project.
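The per-species structure of the model can be illustrated with the sketch below. It is not the principal curve algorithm used in this work: `fit_principal_curve` is a hypothetical, heavily simplified stand-in that alternates Gaussian kernel smoothing with nearest-node projection, and the species names and training points are synthetic. It also assumes that the eigenspace is kept fixed when a species is added.

```python
import numpy as np

def fit_principal_curve(points, n_nodes=20, bandwidth=0.1, n_iter=10):
    """Simplified principal curve, returned as a polyline of n_nodes points."""
    centred = points - points.mean(axis=0)
    # Initialise the curve parameter with the position along the first principal axis.
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    lam = centred @ Vt[0]
    grid = np.linspace(0.0, 1.0, n_nodes)
    nodes = None
    for _ in range(n_iter):
        span = lam.max() - lam.min()
        if span == 0:
            break  # degenerate case: all points share one parameter value
        lam = (lam - lam.min()) / span  # rescale the parameter to [0, 1]
        # Smoothing step: Gaussian-weighted mean of the points near each grid value.
        w = np.exp(-0.5 * ((grid[:, None] - lam[None, :]) / bandwidth) ** 2)
        nodes = (w @ points) / w.sum(axis=1, keepdims=True)
        # Projection step: give each point the arc length of its nearest node.
        seg = np.linalg.norm(np.diff(nodes, axis=0), axis=1)
        arc = np.concatenate([[0.0], np.cumsum(seg)])
        dist = np.linalg.norm(points[:, None, :] - nodes[None, :, :], axis=2)
        lam = arc[dist.argmin(axis=1)]
    return nodes

def synthetic_species(rng, offset, n=150):
    """Noisy points along a curved path, standing in for projected training data."""
    t = rng.uniform(0.0, 1.0, n)
    pts = np.column_stack([t, t ** 2 + offset, 0.5 * t])
    return pts + rng.normal(scale=0.02, size=pts.shape)

rng = np.random.default_rng(1)
training = {
    "species_a": synthetic_species(rng, 0.0),
    "species_b": synthetic_species(rng, 1.0),
}

# One curve per species, each fitted only on that species' subset.
curves = {name: fit_principal_curve(pts) for name, pts in training.items()}

# Extending the model with a new species is one additional fit.
training["species_c"] = synthetic_species(rng, 2.0)
curves["species_c"] = fit_principal_curve(training["species_c"])
```

Storing the curves in a dictionary keyed by species mirrors the design described above: each entry depends only on its own training subset, so existing entries are untouched when a new one is added.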

Principal curves and their training data, projected into the space spanned by the three leading eigenvectors. Different species are represented by different symbols.