Summarising plankton morphological diversity
Following their previous work on a framework to project plankton samples onto a plane using deep learned features and t-SNE dimensionality reduction, Antonio Goulart, Alexandre Morimitsu and Nina Hirata (Brazil) are extending their investigation. They submitted a paper assessing the quality of the projections with respect to: different convolutional networks; samples drawn from a distribution similar to, and also distinct from, the one used in training; and features from different layers of the networks. The Silhouette Score is used as an objective metric to complement their subjective impressions about the distribution of the various shapes of planktonic organisms in the t-SNE space (as illustrated here). The framework is the backend of a dataset labelling/exploration tool under development by the Brazilian partners of WWWPIC.
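To make the role of the Silhouette Score concrete, here is a minimal sketch of how it can be computed on points that have already been projected onto a plane. It is not the authors' code: the function name, the toy coordinates and the class labels are assumptions made for illustration, and in practice a library implementation such as scikit-learn's `silhouette_score` would normally be preferred.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two classes")

    def mean_dist(i, members):
        # Average distance from sample i to the given members, excluding itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a class with one sample
            continue
        a = mean_dist(i, own)  # cohesion: distance to its own class
        b = min(mean_dist(i, members)  # separation: nearest other class
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy 2D coordinates standing in for a t-SNE projection (made-up values).
projection = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = ["copepod", "copepod", "diatom", "diatom"]
mixed_up = ["copepod", "diatom", "copepod", "diatom"]

print(silhouette_score(projection, well_separated))
print(silhouette_score(projection, mixed_up))
```

A score close to 1 means that images of the same class sit together and away from other classes in the projection; a score near or below 0 means that the classes overlap.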
The glider transects end, on a high note
In February, Thelma Panaiotis, PhD student at the Laboratoire d'Océanographie de Villefranche (France), started to coordinate the deployment of a SeaExplorer glider equipped with the latest generation of the Underwater Vision Profiler (UVP6), as part of WWWPIC. The UVP camera took 1,123,123 pictures of planktonic objects (no, we did not pick that number on purpose; pure chance!). The images have been imported into EcoTaxa and are currently being sorted.
These 5 months of almost continuous sampling along a transect crossing the Ligurian Current will allow us to describe the dynamics of the phytoplankton bloom and the response of zooplankton to these seasonal changes, all in situ, in an open sea environment.
Zooscanning of deep plankton samples
Samples have been brought back from a cruise in the South-East Pacific Gyre and Shino Yamamoto (Japan) is processing them on the Zooscan system. These samples will be used for assessing the effects of ocean desertification on the planktonic community from 0 m to 1500 m depth along a transect from Chile to Easter Island.
Taxonomy from images
The taxonomy of plankton has been a constant field of study for over two centuries but, with the advent of quantitative plankton imaging instruments, the way we see those organisms has changed. This calls for a new effort, to translate the taxonomic knowledge acquired through a microscope to the, often grayscale, still images that these instruments create.
At JAMSTEC, Japan (pictured here) and in Villefranche, France, this leads to regular taxonomy meetings during which the taxonomists gather around a screen, looking at some images, and trying to define clear criteria for taxonomic assignments to given groups.
A remote Aquatic Sciences Meeting
The 2021 Aquatic Sciences Meeting, of ASLO, was planned in Palma de Mallorca but, as expected, switched to being yet another remote meeting. The organisation was adapted with pre-recorded talks uploaded in advance and only short presentations, followed by discussions, during the actual meeting hours.
A session was dedicated to plankton imaging and was co-organised by Rainer Kiko, of the French WWWPIC team, and Rubens Lopes, of the Brazilian team. It was a great success with 18 talks, spanning three time slots instead of just one. Thelma Panaiotis (pictured) presented her work on image classification, which will become the backbone of the deep learning part of EcoTaxa in the near future.
First flow of data from EcoTaxa to OBIS
One of the main goals of WWWPIC is to enable the interoperability of EcoTaxa with other databases. The main database we targeted was OBIS, the Ocean Biodiversity Information System. OBIS is the largest database of occurrences of marine organisms and is used by a huge number of scientists worldwide, for biogeography studies in particular. The input format for OBIS is a DarwinCore Archive (DwCA), which references standard terms defined by the BODC vocabularies. While the use of such standards is unavoidable for interoperability, these files can be cumbersome to prepare for scientists who are not well versed in these technicalities (i.e. almost all scientists!). But now EcoTaxa is here to help: it can prepare the appropriately formatted DwCA file for you!
Through an additional grant by EMODnet Biology, the French team, under the supervision of Amanda Elineau, was able to quality control ten plankton datasets, totaling over 3 million images. Then, those were formatted as DwCA by Laurent Salinas, through the new functionality he developed as part of WWWPIC. Now they are live, on OBIS (the picture on the left shows the Tara Oceans data, which was part of this package).
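To give an idea of what such a file contains, the sketch below writes a small occurrence table whose column names are standard Darwin Core terms. It is only an illustration and not EcoTaxa's actual export code: the input dictionary keys, the helper names and the sample records are hypothetical, and a complete DwCA also bundles this table with descriptor and metadata files in a zip archive, which is not shown here.

```python
import csv
import io

# Standard Darwin Core terms used as column headers of the occurrence table.
DWC_COLUMNS = [
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude",
    "minimumDepthInMeters", "maximumDepthInMeters", "occurrenceStatus",
]


def to_occurrence_row(obj):
    """Map one imaged object (hypothetical dict layout) to Darwin Core terms."""
    return {
        "occurrenceID": obj["object_id"],
        "basisOfRecord": "MachineObservation",  # recorded by an instrument
        "scientificName": obj["taxon"],
        "eventDate": obj["datetime"],           # ISO 8601 date and time
        "decimalLatitude": obj["lat"],
        "decimalLongitude": obj["lon"],
        "minimumDepthInMeters": obj["depth_min"],
        "maximumDepthInMeters": obj["depth_max"],
        "occurrenceStatus": "present",
    }


def occurrences_to_csv(objects):
    """Return the occurrence table as CSV text, one row per object."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for obj in objects:
        writer.writerow(to_occurrence_row(obj))
    return buffer.getvalue()


# Made-up records, for illustration only.
sample_objects = [
    {"object_id": "demo_0001", "taxon": "Copepoda",
     "datetime": "2021-03-01T10:15:00Z", "lat": 43.68, "lon": 7.32,
     "depth_min": 50, "depth_max": 55},
    {"object_id": "demo_0002", "taxon": "Appendicularia",
     "datetime": "2021-03-01T10:16:00Z", "lat": 43.68, "lon": 7.32,
     "depth_min": 60, "depth_max": 65},
]

csv_text = occurrences_to_csv(sample_objects)
print(csv_text)
```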
More ISIIS data from the California Current
Recently, the Oregon State University (USA) Plankton Ecology Lab deployed the In-situ Ichthyoplankton Imaging System (ISIIS) in the Northern California Current as part of NOAA’s Northern California Current (NCC) cruise aboard NOAAS Bell M. Shimada. NOAA’s NCC cruises occur on a regular basis to characterize the planktonic ecosystem from northern California to Washington, with a focus on the Newport Hydrographic Line. Moritz Schmid and OSU plankton lab student Margaret Martinez (pictured next to ISIIS here) collected plankton imagery for the Belmont-funded WWWPIC project as well as the NCC Marine Biodiversity Observation Network (MBON) funded by NASA.
Fast data analytics for a billion plankton images
The webserver frontend of the OmniSci database, set up at OSU (USA), makes it easy and fast to search for different taxa, to limit a search to a certain geographic region, and to narrow down results by metadata such as environmental data (e.g., the oxygen concentration or depth associated with a picture) or time of day. With all data in RAM, multi-criteria queries over 1 billion images complete in the low millisecond range (< 7 ms). The frontend is a work in progress.
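The kind of combined search described above can be sketched as a single SQL query. The example below uses Python's built-in `sqlite3` module as a small stand-in, not OmniSci itself; the table layout, column names and the four records are assumptions made up for illustration and do not reflect the actual OSU database schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with a hypothetical per-image metadata layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE images (
               image_id INTEGER PRIMARY KEY,
               taxon TEXT,
               latitude REAL,
               longitude REAL,
               depth_m REAL,
               oxygen_ml_l REAL
           )"""
    )
    rows = [  # made-up records
        (1, "Copepoda", 44.65, -124.50, 20.0, 5.1),
        (2, "Copepoda", 44.65, -124.90, 80.0, 1.2),
        (3, "Appendicularia", 44.65, -124.30, 15.0, 5.8),
        (4, "Copepoda", 41.00, -124.40, 30.0, 4.9),
    ]
    conn.executemany("INSERT INTO images VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def search(conn, taxon, lat_range, lon_range, max_depth_m, min_oxygen):
    """Filter by taxon, bounding box, depth and oxygen in one query."""
    query = """
        SELECT image_id FROM images
        WHERE taxon = ?
          AND latitude BETWEEN ? AND ?
          AND longitude BETWEEN ? AND ?
          AND depth_m <= ?
          AND oxygen_ml_l >= ?
        ORDER BY image_id
    """
    params = (taxon, *lat_range, *lon_range, max_depth_m, min_oxygen)
    return [row[0] for row in conn.execute(query, params)]


conn = build_demo_db()
hits = search(conn, "Copepoda", (44.0, 45.0), (-125.0, -124.0), 50.0, 2.0)
print(hits)
```

Only the first record satisfies every criterion here: the second is too deep and too low in oxygen, the third is another taxon, and the fourth lies outside the latitude range.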
The ISIIS processing pipeline is live
Due to the sheer quantity of data collected by ISIIS (ca. 75-100 million images per 7 h of ISIIS deployment), OSU’s Plankton Ecology Lab (USA) started collaborating with the National Science Foundation’s Extreme Science and Engineering Discovery Environment (XSEDE), a nationwide supercomputing cluster. With segmentation, sizing, and classification now happening on remote XSEDE hardware, our pipeline can process 7 h of ISIIS imagery in ca. 17 hours (this includes data upload and the pipeline steps mentioned above). The pipeline code to train a Convolutional Neural Network (CNN) and set up the pipeline on an HPC was recently open-sourced together with extensive documentation (Schmid et al. 2021). After all data are processed, they are merged with metadata and stored in an OmniSciDB-powered database that is also capable of serving data via a webserver. Efforts to harmonize OmniSciDB with EcoTaxa are underway.
EcoTaxa to be hosted by IFREMER, for the EU
One goal of WWWPIC is to deploy our applications on robust, public infrastructures. This is essential for the smooth operation and long term support of our work. It was planned that an instance of EcoTaxa would be hosted in France but serve the whole European Union, because there is value in storing images in a centralised place. Indeed, it creates a community of researchers, allows training datasets for machine learning to be shared, and ... is easier to maintain!
Initially, the plan was to host it on the infrastructure of the Institut Français de Bioinformatique (IFB) but several reasons made this solution impractical. In the meantime, IFREMER, the French national agency for ocean sciences and fisheries, developed its Datarmor data center (pictured) and took a leadership role in the provision of data in the EU. Some researchers there used EcoTaxa and pushed for its adoption as a standard tool. Thanks to this alignment of favourable conditions, and collaboration through other projects, it has just been decided that IFREMER will host EcoTaxa. There is a lot of work to be done to adapt the code to such a large infrastructure and we aim to complete this transition by the end of 2021.
Prototype of an active learning interface for plankton images
A common problem in plankton classification endeavours is to get training sets that are representative of the full diversity of the underlying images. Indeed, plankton images are dominated by a few categories, while the odd-looking and interesting ones are rare and require considerable effort to find, since one has to sort through all the rest in search of them.
Active learning is a procedure where an algorithm describes the full visual diversity of a dataset and represents it in an easy to manage 2D space (through a dimensionality reduction such as t-SNE); a human can then pick a few representative examples covering that space and capture much of the diversity with limited effort. This "human in the loop" approach requires efficient user interfaces and Renan Jacomassi (Brazil) is prototyping one.
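As an illustration of what "covering that space" can mean, the sketch below uses greedy farthest-point sampling to suggest a few well-spread examples in a 2D projection. This is one simple, commonly used heuristic chosen here for illustration; it is not necessarily what the prototype does, and the function name and toy coordinates are assumptions.

```python
import math


def pick_representatives(points, k, start=0):
    """Greedy farthest-point sampling: return k indices that spread over the space."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    chosen = [start]
    # Distance from every point to its closest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # The next pick is the point that is farthest from everything chosen so far.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


# Toy 2D projection (made-up values): a dense group near the origin,
# a small group around (5, 5) and one isolated point.
projection = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]

suggested = pick_representatives(projection, k=3)
print(suggested)
```

With this heuristic the dense, dominant group contributes a single example, while the isolated and sparse regions, where rare objects tend to sit, are sampled early. A human would then inspect and label the suggested images.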
Machine learning training sets from ROV videos
Work has been progressing, in Japan, on creating tools for annotating videos recorded by a Remotely Operated Vehicle (ROV), through the SQUIDLE+ tool, to provide still images of plankton for training machine learning algorithms on reflected light, colour images. It is now possible to draw polygons around organisms and to link objects present in the video feeds of multiple cameras.
Glider transects to study the plankton bloom
Plankton imaging instruments deployed in situ can give us insights into fine scale distribution and dynamics that probably no other technology can. However, most of these instruments need to be deployed from a ship, by human operators. This limits their reach.
The sixth version of the Underwater Vision Profiler (UVP6) was designed to tackle this very challenge: it is small and power-efficient enough to be deployed from autonomous instruments such as gliders and floats. It was fitted in the nose of the SeaExplorer glider (as pictured on the left) and, starting now, will be deployed on a transect in front of Villefranche-sur-Mer, as part of one of WWWPIC's scientific demonstrators. The goal is to study the fine scales of the spring plankton bloom.