Exploring Microbial Dark Matter: Outcomes of the FAIRyMAGs Hackathon 2025
Introduction
The biological world as we know it is largely composed of dark matter: microorganisms that cannot yet be cultivated in the laboratory. Modern sequencing technologies and bioinformatics tools now allow us to explore this hidden world by generating metagenome-assembled genomes (MAGs). This remains a challenging endeavor, and every bit of collaboration helps to deepen our understanding of these uncultured organisms.
From October 6 to 9, 2025, the FAIRyMAGs project (funded by ELIXIR) organized a hybrid, four-day hackathon as part of the ELIXIR BFSP Programme.
FAIRyMAGs project
The FAIRyMAGs project started in January 2025 and runs for 20 months, until August 2026. It is led by Paul Zierep and Bérénice Batut, in collaboration with four ELIXIR Nodes:
- ELIXIR France: Bérénice Batut
- ELIXIR Germany: Paul Zierep
- ELIXIR Italy: Giuseppe Defazio and Bruno Fosso
- EMBL-EBI: Martin Beracochea and Santiago Sanchez
The FAIRyMAGs project aims to advance metagenomics research by developing, optimizing, evaluating, and disseminating robust FAIR workflows for building metagenome-assembled genomes (MAGs).
The project has already published an initial Galaxy MAGs workflow and successfully applied it to both CAMI benchmarking datasets and real-world use cases, including termite, cloud, marine, and bee microbiomes.
To further enhance the quality and reproducibility of MAG analyses, FAIRyMAGs is also improving and extending existing MAGs benchmarking frameworks by integrating the CAMI infrastructure into the Galaxy platform. The CAMI challenges provide simulated benchmark datasets for short and long reads that can be used to benchmark the assembly tools and binners used in MAGs workflows.
An additional focus of the project is the study of computational resource requirements for MAG construction. The team has investigated the resources needed for the assembly step of the workflow using data provided by the MGnify team.
The FAIRyMAGs hackathon
The hackathon aimed to build on the preliminary work of the project, and also openly invited the MAGs community to discuss and hack on any ideas related to MAGs generation!
A total of 23 researchers from around the world participated: 8 gathered in Freiburg (Germany) and 6 at an Australian outpost (coordinated by Tiff Nelson), while the rest joined remotely.
Preparation included shared Google Docs, spreadsheets, and slides for coordination. During the hackathon, participants met in joint sessions between Australia and Europe in the mornings and in Europe-focused working groups in the afternoons. Communication flowed via a dedicated Slack channel in the Galaxy Training Network (GTN).
Hackathon summary
During the hackathon, researchers worked independently on various goals related to MAGs workflows, depending on their background and expertise. Researchers familiar with Galaxy updated and added Galaxy tools, and modified and extended the preliminary MAGs workflow. Some participants focused on individual MAGs generation projects; for example, Stefan Kranz adapted the workflow to support long-read input. Impressively, the full modification was completed in just one hour, showcasing the efficiency of the Galaxy workflow editor.
Beyond hands-on development, the hackathon also included multiple discussion sessions that extended beyond Galaxy-centric topics. One such discussion focused on the potential to predict the computational resources required for MAGs workflow tools based on input data characteristics and metadata. Reducing resource demands could have significant environmental benefits, given the scale of MAGs-related analyses.
Another collaborative effort among MAGs developers from different communities centered on establishing shared benchmark and CI-testing datasets. As part of this, core developers from the nf-core, MGnify, and Galaxy MAGs workflows initiated the first steps toward a benchmark focusing on complete MAGs workflows.
The team also began developing detailed training materials on MAGs generation. The broader community was invited to contribute their own resources via the Galaxy Training Network (GTN), which supports training materials both within and beyond the Galaxy ecosystem. For example, MGnify plans to develop training modules on MAGs submission using various approaches, such as command-line, front-end submission, and Galaxy-integrated tools.
Highlight Outcomes
All progress was tracked in the FAIRyMAGs Hackathon Coordination & Tracking Sheet. Some of the highlight outcomes are summarized here:
Enhancing FAIR MAGs Building Workflows
Tool Updates
- SemiBin2 version update: PR #7347
- COMEBin, a new high-performance binner: PR #7285
- MaAsLin3, for downstream differential analysis: PR #7263
Database Updates
Workflow Improvements
- Quality control and trimming workflow: PR #976
- Host/contamination removal (long & short reads): PR #991
- Main MAGs workflow update: PR #975
- Added user-friendly workflow annotations
- Sample grouping subworkflow update to fix bugs in subworkflows: View on Galaxy
- Fallback workflow to recover MAGs when tools fail: View on Galaxy
- Adapted workflow for long reads, tested on ONT samples (thanks to Stefan Kranz)
- Visualization plots collection for advanced MAGs exploration: Issue #54
MAGs Visualization
- Plots modularized for reusability
- Discussions on multi-sample binning based on Han et al., Nature Communications, 2025
- MAGs submission workflow: work in progress, in collaboration with the MGnify team
Galaxy Server Adaptation
- Added required tools and databases to usegalaxy.org.au, expanding FAIRyMAGs access to Australian researchers.
Developing User-Friendly Training Materials
- Dataset identification is in progress (blocked by a subworkflow bug; WIP)
- Started a comprehensive tutorial to run the full workflow on short & long reads: HackMD draft
Learning Pathway with step-by-step tutorials
- Updated tutorials to include missing tools
- New tutorial: Preprocessing for Group Assignment and Co-Assembly (PR #6416)
Advancing Workflow Evaluation & Benchmarking
Using CAMI infrastructure and real datasets, the group worked to standardize benchmarking for MAG workflows.
Benchmarking Progress
- MGnify, nf-core/mag, and Galaxy teams aligned on common benchmarking datasets.
- Started discussion with the MAGNETO developers for benchmark alignment.
- Agreed datasets:
  - CAMI II plant-associated dataset: Dataset link
  - Loman Lab Mock Community Experiments: Mock dataset
- Improved visualizations for CAMI II benchmarks based on AMBER, a dedicated MAGs benchmarking tool originally developed for the CAMI challenges: Issue #66
- Added AMBER CAMI workflow to IWC: PR #924
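To make the kind of metric behind such benchmarks concrete, here is a minimal Python sketch of two per-bin measures, purity and completeness, computed against a gold-standard contig-to-genome mapping of the sort that simulated datasets provide. It is an illustration under simplifying assumptions, not AMBER's implementation: the function name, the tuple layout, and the toy contigs are all hypothetical.

```python
from collections import defaultdict


def bin_purity_and_completeness(contigs, bin_id):
    """Return (dominant_genome, purity, completeness) for one bin.

    contigs: iterable of (length_bp, true_genome, assigned_bin) tuples,
    where assigned_bin is None for unbinned contigs (hypothetical layout).
    """
    genome_total_bp = defaultdict(int)   # bp per genome across the whole sample
    genome_bp_in_bin = defaultdict(int)  # bp per genome inside the chosen bin
    for length_bp, genome, assigned_bin in contigs:
        genome_total_bp[genome] += length_bp
        if assigned_bin == bin_id:
            genome_bp_in_bin[genome] += length_bp
    if not genome_bp_in_bin:
        raise ValueError(f"no contigs assigned to bin {bin_id!r}")
    # The genome contributing the most base pairs is treated as the bin's identity.
    dominant, dominant_bp = max(genome_bp_in_bin.items(), key=lambda kv: kv[1])
    purity = dominant_bp / sum(genome_bp_in_bin.values())
    completeness = dominant_bp / genome_total_bp[dominant]
    return dominant, purity, completeness


# Toy gold standard with made-up numbers, for illustration only.
toy_contigs = [
    (400, "genome_A", "bin1"),
    (100, "genome_B", "bin1"),
    (100, "genome_A", "bin2"),
    (300, "genome_B", None),
]
print(bin_purity_and_completeness(toy_contigs, "bin1"))
```

In this toy case, genome_A dominates bin1: 400 of the bin's 500 bp come from it (purity 0.8), and the bin holds 400 of genome_A's 500 bp (completeness 0.8).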
Use Cases
- Marine MAGs exploration: high-quality MAGs identified for the Marine use case
Figure: MAGs of the marine use case, showing completeness, contamination, and average gene length.
- Termite-related MAGs discussions: attempt to recover more MAGs via a group-assembly strategy.
- Progress on the Cloud use case: recovered mid-quality MAGs via the fallback workflow.
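The post does not state which thresholds separate "high-quality" from "mid-quality" MAGs in these use cases. As a rough sketch, the snippet below applies the widely used MIMAG completeness and contamination cut-offs; treating those as the applicable criteria is an assumption, and the additional MIMAG requirements for high-quality drafts (rRNA and tRNA genes) are deliberately left out. The function name is hypothetical.

```python
def mag_quality_tier(completeness_pct, contamination_pct):
    """Classify a MAG from completeness and contamination percentages.

    Thresholds follow the commonly cited MIMAG cut-offs (an assumption here):
    high   : completeness > 90 and contamination < 5
    medium : completeness >= 50 and contamination < 10
    Everything else is reported as "low" in this simplified sketch.
    """
    if not (0 <= completeness_pct <= 100) or contamination_pct < 0:
        raise ValueError("percentages out of range")
    if completeness_pct > 90 and contamination_pct < 5:
        return "high"
    if completeness_pct >= 50 and contamination_pct < 10:
        return "medium"
    return "low"


# Made-up example values, e.g. as reported by a genome quality tool.
for name, comp, cont in [("MAG_1", 96.2, 1.3), ("MAG_2", 71.0, 4.8), ("MAG_3", 42.5, 2.0)]:
    print(name, mag_quality_tier(comp, cont))
```

A tiering function like this is useful mostly for summarizing a table of bins; a real assessment would also check marker genes as MIMAG requires.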
Building Smarter Resource Estimation Tools
- Collected binning tool performance data from MGnify: PR #75
- Evaluated correlation between sample metadata and memory usage: PR #68
Figure: Correlation between sample metadata and memory usage.
A key challenge identified:
Galaxy cannot yet use workflow-generated parameters for dynamic resource assignment. This limits automatic resource prediction, but workarounds and future solutions are under discussion with Galaxy core developers.
Summary and Outlook
The FAIRyMAGs hackathon brought together a great community of MAGs scientists. Although the hackathon was slightly Galaxy-centric, it fostered a valuable exchange across workflow engines, tackling major MAGs issues together!
Figure: FAIRyMAGs Hackers
To continue these exchanges, we created a dedicated community Slack channel for MAGs workflows, open to anyone working on MAGs. The Slack channel is part of the microbioinfo workspace. If you want to join, contact us and we will send an invite!