Poster Presentation 2024 Australian Marine Sciences Association Annual Meeting combined with NZMSS

Marimba: A Python Framework for FAIR Marine Image Dataset Processing (#706)

Chris Jackett 1, Kevin Barnard 2, Nick Mortimer 1, David Webb 1, Franzis Althaus 1, Bec Gorton 1, Aaron Tyndall 1, Candice Untiedt 1, Ben Scoulding 1
  1. CSIRO, Battery Point, TASMANIA, Australia
  2. MBARI, Moss Landing, California, USA

The exploration and monitoring of marine environments are increasingly facilitated by advances in underwater imaging technologies. However, the large and growing volumes of captured image data pose significant challenges for data management, processing, standardisation and dissemination. Developed collaboratively by CSIRO and MBARI, Marimba is a Python framework for processing marine image datasets and making them FAIR (Findable, Accessible, Interoperable and Reusable). Marimba offers a robust command-line interface (CLI) and application programming interface (API) that provide flexible user interaction and scriptability for scientific imaging projects. It introduces three core constructs (Project, Pipeline and Collection) to standardise the management, processing and distribution of FAIR-compliant marine image datasets: Projects encapsulate the processing workflow, Pipelines provide isolated environments for individual data processing stages, and Collections group the data to be processed by pipelines. Marimba supports project structuring, file and metadata management, and compliance with the image FAIR Digital Object (iFDO) standard, and it provides a standard library of common image and video processing capabilities built on third-party libraries. Dataset packaging features include comprehensive processing logs that capture the full processing provenance of a dataset, file manifest generation and summaries of dataset statistics. For dataset distribution, Marimba supports uploading to S3 buckets. Together, these features cover the entire lifecycle of marine image dataset processing and sharing, in alignment with the FAIR principles.
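The relationship between the three constructs, together with provenance logging and manifest generation, can be illustrated with a minimal sketch. This is a hypothetical illustration only: the class names mirror the abstract's terminology, but every field, the `run` method, the `build_manifest` and `copy_stage` helpers, the per-pipeline output directory layout and the SHA-256 manifest format are assumptions made for this example, not Marimba's actual API.

```python
"""Hypothetical sketch of the Project / Pipeline / Collection pattern.

None of these names or signatures are Marimba's real API; they only
illustrate the concepts described in the abstract.
"""
import hashlib
import json
import shutil
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class Collection:
    """A named group of input files to be passed through pipelines."""
    name: str
    files: list[Path]


@dataclass
class Pipeline:
    """One processing stage: a function applied to each file in a collection."""
    name: str
    process: Callable[[Path, Path], Path]  # (input file, output dir) -> output file


@dataclass
class Project:
    """Ties pipelines to collections and records a simple provenance log."""
    root: Path
    pipelines: list[Pipeline] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

    def run(self, collection: Collection) -> list[Path]:
        outputs = []
        for pipeline in self.pipelines:
            # Each pipeline writes to its own subdirectory, keeping stages separate.
            out_dir = self.root / collection.name / pipeline.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for src in collection.files:
                dst = pipeline.process(src, out_dir)
                # Provenance entry: which stage turned which input into which output.
                self.log.append(f"{pipeline.name}: {src.name} -> {dst.name}")
                outputs.append(dst)
        return outputs


def build_manifest(files: list[Path], base: Path) -> dict[str, str]:
    """Map each file's path (relative to base) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(files):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest[path.relative_to(base).as_posix()] = digest
    return manifest


def copy_stage(src: Path, out_dir: Path) -> Path:
    """Trivial stand-in for a real processing stage: copy the file unchanged."""
    dst = out_dir / src.name
    shutil.copy2(src, dst)
    return dst


def demo() -> tuple[dict[str, str], list[str]]:
    """Run one collection through one pipeline in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        image = base / "img_0001.jpg"
        image.write_bytes(b"placeholder image bytes")  # not a real image
        project = Project(root=base / "project", pipelines=[Pipeline("copy", copy_stage)])
        outputs = project.run(Collection("survey_01", [image]))
        manifest = build_manifest(outputs, project.root)
    return manifest, project.log


if __name__ == "__main__":
    manifest, log = demo()
    print(json.dumps(manifest, indent=2))
    print("\n".join(log))
```

Hashing every output file gives the manifest a way to verify that a distributed dataset arrived intact; a real framework would likely also record tool versions and parameters in the provenance log, which this sketch omits.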