You can find more details on the method described below in the article "Modelling life cycle related and individual shape variation in biological specimens" by Y.A. Hicks, A.D. Marshall, R.R. Martin, P.L. Rosin, M.M. Bayer and D.G. Mann, in BMVC 2002.

Here we present a model of diatom shapes based on principal curves. Each principal curve models the growth trajectory of one diatom species. The model is suitable for reconstruction: it lets us produce drawings of the shape changes that occur over the diatom life cycle, thus providing a link between photographs and drawings. We apply the model to the classification of photographed and drawn specimens, obtaining results comparable to those of other diatom identification systems. Finally, given a diatom specimen, we can not only identify the species it belongs to but also pinpoint the stage in the life cycle it represents.

First we extract diatom contours from the digital photographs using a succession of automatic thresholding, area closing and area filling operations. We then resample the extracted contours to the same length, so that every contour yields the same number of Fourier descriptors, and represent the resampled contours using Fourier descriptors.
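The resampling and descriptor steps can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: it assumes the contour is a closed polygon given as a list of (x, y) points, that resampling is uniform in arc length, and that the descriptors are the low-frequency coefficients of a discrete Fourier transform of the complex coordinates x + iy. The function names and the parameters n_samples and n_coeffs are hypothetical.

```python
import cmath
import math


def resample_closed_contour(points, n_samples):
    """Resample a closed polygon to n_samples points equally spaced in arc length."""
    n = len(points)
    # Length of every edge, including the edge that closes the contour.
    seg = [math.hypot(points[(i + 1) % n][0] - points[i][0],
                      points[(i + 1) % n][1] - points[i][1]) for i in range(n)]
    total = sum(seg)
    resampled = []
    i = 0          # index of the edge currently being walked
    start = 0.0    # arc length at the start of edge i
    for k in range(n_samples):
        target = total * k / n_samples
        # Advance to the edge that contains the target arc length.
        while i < n - 1 and start + seg[i] < target:
            start += seg[i]
            i += 1
        t = (target - start) / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
        resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return resampled


def fourier_descriptors(points, n_coeffs):
    """Return {k: c_k} for k = -n_coeffs..n_coeffs, the DFT of x + iy."""
    z = [complex(x, y) for x, y in points]
    n = len(z)
    return {
        k: sum(z[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)) / n
        for k in range(-n_coeffs, n_coeffs + 1)
    }
```

Because every contour is resampled to the same n_samples before the transform, two contours traced with different numbers of points produce descriptor sets of identical size, which is what allows them to be compared as vectors.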

Prior to modelling the diatom shape data, we find the main modes of variation in the data set of all species through principal component analysis (PCA). We model the life cycle variation in a single species using a principal curve that passes through the middle of the corresponding data set, formed by the Fourier descriptor vectors projected into the eigenspace. For illustrative purposes we model the shape variation of Fragilariforma bicapitata. In the figure below you can see the original data set projected into the space of the first and third eigenvectors, with the corresponding diatom contours overlaid, as well as the principal curve fitted to the data. The fitted principal curve follows the growth trajectory of Fragilariforma bicapitata, as growth provides the main source of shape variation. Individual shape variations lie in the dimensions orthogonal to the principal curve.
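As an illustration of the PCA step only (not of principal curve fitting), the sketch below finds the leading eigenvectors of the sample covariance matrix by power iteration with deflation and projects a vector into the resulting eigenspace. It assumes each specimen has already been turned into a real-valued feature vector; how the complex Fourier descriptors are arranged into such a vector is not specified here, so that step is left out. The function names pca and project are hypothetical.

```python
import math
import random


def pca(data, n_components, n_iter=500, seed=0):
    """Return (mean, components, variances) using power iteration with deflation."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centred data.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    components, variances = [], []
    for _component in range(n_components):
        # Random unit start vector (fixed seed so the result is repeatable).
        v = [rng.uniform(-1.0, 1.0) for _j in range(d)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _step in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        if variance < 1e-12:
            break  # no variance left to explain
        components.append(v)
        variances.append(variance)
        # Deflate: remove the direction just found from the covariance matrix.
        cov = [[cov[a][b] - variance * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return mean, components, variances


def project(row, mean, components):
    """Coordinates of one vector in the eigenspace spanned by the components."""
    centered = [x - m for x, m in zip(row, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]
```

The sign of each eigenvector returned by power iteration is arbitrary, so projected coordinates are defined only up to a sign flip per axis.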


Now let's turn our attention to classifying specimens over a wide range of species. We fit an individual principal curve to the shape data of each of the 22 available species. The fitted principal curves can be viewed as a description of the life cycle variation across the species with drastically reduced dimensionality. See the graph of principal curves projected into the space of the first three eigenvectors below.

This model can be used to classify new specimens by finding their distance from each of the principal curves and assigning them to the closest one.
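The nearest-curve rule can be sketched as below, under the assumption that each fitted principal curve is stored as a polyline, that is, an ordered list of knots in the eigenspace, and that distance is Euclidean. The species labels and function names are hypothetical.

```python
import math


def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Position of the closest point along the segment, clamped to [0, 1].
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.sqrt(sum((pi - ci) ** 2 for pi, ci in zip(p, closest)))


def distance_to_curve(p, knots):
    """Smallest distance from p to any segment of the polyline."""
    return min(point_to_segment_distance(p, a, b) for a, b in zip(knots, knots[1:]))


def classify(p, curves):
    """Return the label of the curve closest to p; curves maps label -> knots."""
    return min(curves, key=lambda label: distance_to_curve(p, curves[label]))


# Two made-up curves in a 2-D eigenspace, for illustration only.
curves = {
    "species_a": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "species_b": [(0.0, 3.0), (1.0, 3.0), (2.0, 3.0)],
}
label = classify((1.0, 0.5), curves)
```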

We tested the model in identification experiments using the standard "leave-one-out" approach: we trained the model on all specimens apart from one and then classified that specimen using the model. The experiment was repeated so that each of the 268 specimens was omitted in turn. Approximately 84% were classified correctly, compared with the 77% achieved in the ADIAC project when using only Fourier descriptors for identification. However, we used only a third of the data set used in ADIAC, so firm conclusions cannot be drawn.

Next, we tested how our model performs when identifying drawings of diatoms. We tested the model on 11 drawings of species included in the model. The model correctly identified 7 of the 11 drawings, which is comparable to the identification results on the photographs. We expect the identification accuracy to improve for both photographs and drawings in the future, when we include descriptors of internal features in the model.
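The leave-one-out protocol itself is independent of the classifier, and can be sketched as follows. To keep the example short, a nearest-centroid classifier stands in for the principal-curve model; it is a placeholder, not the method evaluated above, and all names are hypothetical.

```python
def leave_one_out_accuracy(samples, labels, fit, predict):
    """Train on all samples but one, test on the held-out one, repeat for every sample."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def fit_centroids(xs, ys):
    """Placeholder model: one mean vector per class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(pts) for col in zip(*pts)] for y, pts in groups.items()}


def predict_centroid(model, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


# Made-up, well-separated data for illustration only.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = ["a", "a", "a", "b", "b", "b"]
accuracy = leave_one_out_accuracy(samples, labels, fit_centroids, predict_centroid)
```

Refitting the model inside the loop matters: the held-out specimen must not influence the model it is tested against.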

As mentioned above, given a photograph (or a drawing) of a diatom specimen, we can not only identify the species it belongs to but also pinpoint the stage of the life cycle it represents. To illustrate this, we projected the vectors representing the shapes of Fragilariforma bicapitata specimens onto the corresponding principal curve and then sorted them in the order of their projection points along the curve. The results were very close to the expected order, as you can see below.
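One way to order specimens along a curve is sketched below. It assumes the principal curve is stored as a polyline of knots and uses the arc-length position of each specimen's closest point on the polyline as its life cycle coordinate. The function names are hypothetical, and the paper may parametrise the curve differently.

```python
import math


def projection_parameter(p, knots):
    """Arc-length position along the polyline of the point closest to p."""
    best_dist, best_s = math.inf, 0.0
    travelled = 0.0  # arc length from the first knot to the start of the current segment
    for a, b in zip(knots, knots[1:]):
        ab = [bi - ai for ai, bi in zip(a, b)]
        seg_len = math.sqrt(sum(c * c for c in ab))
        if seg_len == 0:
            continue
        t = sum((pi - ai) * c for pi, ai, c in zip(p, a, ab)) / seg_len ** 2
        t = max(0.0, min(1.0, t))  # clamp to the segment
        closest = [ai + t * c for ai, c in zip(a, ab)]
        dist = math.sqrt(sum((pi - ci) ** 2 for pi, ci in zip(p, closest)))
        if dist < best_dist:
            best_dist, best_s = dist, travelled + t * seg_len
        travelled += seg_len
    return best_s


def order_by_life_cycle(specimens, knots):
    """Indices of the specimens sorted by their position along the curve."""
    return sorted(range(len(specimens)),
                  key=lambda i: projection_parameter(specimens[i], knots))
```

Which end of the curve corresponds to the start of the life cycle is not determined by the projection itself and has to be fixed separately.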

Finally, we reconstructed diatom contours from the knots on the principal curve representing Fragilariforma bicapitata. You can see the reconstructed drawings, in the order of the knots on the curve, in the figure below. All the contours are scaled to the same length; if we incorporate the diatom length into the model, they can be scaled to the correct length.
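The last stage of such a reconstruction, turning a set of Fourier descriptors back into contour points, can be sketched as an inverse discrete Fourier transform. The sketch assumes the descriptors are complex coefficients c_k of the coordinates x + iy, indexed by frequency k; a knot on the principal curve would first have to be mapped back from the eigenspace to descriptors, which is not shown. The function name is hypothetical.

```python
import cmath


def reconstruct_contour(coeffs, n_points):
    """Evaluate sum_k c_k * exp(2*pi*i*k*m/n_points) at m = 0..n_points-1.

    coeffs maps each frequency k to its complex coefficient c_k.
    Returns a list of (x, y) contour points.
    """
    points = []
    for m in range(n_points):
        z = sum(c * cmath.exp(2j * cmath.pi * k * m / n_points)
                for k, c in coeffs.items())
        points.append((z.real, z.imag))
    return points


# Illustration only: c_0 sets the centre and c_1 alone traces a circle.
circle = reconstruct_contour({0: complex(1.0, 1.0), 1: complex(2.0, 0.0)}, 8)
```

Keeping only the low-frequency coefficients yields a smoothed outline, which is one reason reconstructions of this kind resemble drawings more than raw photographic contours.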