Identifying diatoms from photographs and drawings using our model

Experiment 1

To assess the quality of the produced drawings without bias, we gave an external diatom expert the collection of 303 produced drawings in random order. The expert was not told the names or the number of the diatom species in the selection, and was asked to identify the species in each drawing. In total, 9 of the 13 species present were identified correctly; 140 of the 303 diatoms in the drawings (46.2\%) were identified to species level and 225 of 303 (74.3\%) to genus level. The expert commented that in some misidentified cases he had not previously encountered the species, while in others a detailed representation of the raphe slits in the drawing would have been needed for a correct identification to species level. On the whole, the expert was positive about the result of the experiment, pointing out that correctly identifying 9 of the 13 species present is comparable to the results achieved by human experts identifying diatom photographs in the ADIAC project.

Experiment 2

In this experiment we measured the identification accuracy of our model on photographs, identifying diatoms whose images had not been used to construct the model. We used the standard ``leave-one-out'' approach: the model was trained on all specimens but one, and the held-out specimen was then identified using the trained model. We repeated this for each of the 178 specimens in turn. We compared the identification accuracy of three models: one trained on the shape and contour length data, one trained on the texture data only, and one trained on the shape, texture and contour length data.
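The leave-one-out procedure described above can be sketched as follows. This is a minimal illustration, not the model used in the experiment: the nearest-centroid classifier, the function names `nearest_centroid_predict` and `leave_one_out_error`, and the toy feature vectors are all assumptions introduced here to keep the example self-contained.

```python
from collections import defaultdict


def nearest_centroid_predict(train_X, train_y, x):
    """Hypothetical stand-in classifier (not the paper's model):
    assign x to the class whose mean feature vector is closest."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(train_X, train_y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1

    best_label, best_dist = None, float("inf")
    for label, total in sums.items():
        centroid = [s / counts[label] for s in total]
        dist = sum((c - v) ** 2 for c, v in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_error(X, y, predict=nearest_centroid_predict):
    """Hold out each specimen in turn, train on the rest,
    and return the fraction of held-out specimens misidentified."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


# Toy data (assumed): two well-separated "species" with two features each.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y = ["A", "A", "A", "B", "B", "B"]
print(leave_one_out_error(X, y))
```

With 178 specimens the loop would run 178 times, each time training on the remaining 177. Any classifier with the same `predict(train_X, train_y, x)` shape could be passed in place of the stand-in.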

Using the external contour and contour length data, the error rate was 19.66\%. Using the texture data only, the error rate was 6.18\%. Using the shape, texture and contour length data together, the error rate decreased to 3.37\%, which is a marked improvement over using either the contour or the texture data alone, and is similar to the error rate achieved in the ADIAC project in similar experiments. Note, however, that the data set used in ADIAC included a larger number of species, some of which had non-stria patterns.
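As a quick consistency check, the reported percentages can be converted back to numbers of misidentified specimens out of the 178 used in the leave-one-out runs. The counts below are inferred from the percentages and are not stated in the text. The helper name `implied_errors` and the dictionary labels are assumptions made for this sketch.

```python
N_SPECIMENS = 178  # total specimens in the leave-one-out experiment

# Error rates (in percent) as reported in the text; labels are our shorthand.
reported = {
    "contour + contour length": 19.66,
    "texture only": 6.18,
    "shape + texture + contour length": 3.37,
}


def implied_errors(rate_percent, n=N_SPECIMENS):
    """Nearest whole number of misidentified specimens for a given error rate."""
    return round(rate_percent / 100 * n)


for name, rate in reported.items():
    k = implied_errors(rate)
    # Recompute the percentage from the inferred count to confirm it matches.
    print(f"{name}: {k}/{N_SPECIMENS} = {100 * k / N_SPECIMENS:.2f}%")
```

Under this reading, the three rates correspond to 35, 11 and 6 misidentified specimens respectively, so the combined model makes 5 fewer errors than the texture-only model.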

For comparison with our model, we also applied several other standard classification methods to the same data set in leave-one-out experiments. Rifkin's implementation of a support vector machine with a linear kernel gave a classification error rate of 6.18\% on the normalised data, and the OC1 decision tree approach gave an error rate of 19.1\% on the raw data without prior normalisation.

The identification experiments presented in this section showed that our system performed better than the general classification methods we tested on our data, and achieved identification rates similar to those of the system developed specifically for diatom identification in the ADIAC project.