Welcome to the Galaxy for research in AI, statistics and prediction hub

Galaxy for research in AI, statistics and prediction

The Galaxy for research in AI, statistics and prediction is a hub of tools, workflows and training materials for dedicated machine learning tasks such as data preprocessing, classification, regression, clustering, fine-tuning biological foundation models and visualisation, which together support varied end-to-end machine learning analyses. The complete set of resources is available on the Galaxy platform, which guarantees simple access, easy extension, flexible adaptation to personal needs, accelerated model training, and sophisticated, reproducible machine learning analyses that require no command-line knowledge.

The hub provides you with a Swiss Army knife of Scikit-learn, Keras (a deep learning library based on TensorFlow), PyTorch and various other tools to transform, learn from, predict on and plot your data.

The hub is mainly developed by the Goecks Lab and the European Galaxy project. The German Network for Bioinformatics Infrastructure (de.NBI), which runs the German ELIXIR Node, provides the necessary compute clusters with CPU and GPU resources.

This project is a community effort, so feel free to jump in, ask questions, and contribute to the development of new tools, workflows, training materials, and research. Learn, share ideas, and grow with the community along the way!

Content

Get started

Are you new to Galaxy, or returning after a long time, and looking for help to get started? Take a guided tour through Galaxy’s user interface.

Training

We are passionate about training, so we work in close collaboration with the Galaxy Training Network (GTN) to develop training materials for Galaxy-based data analyses (Batut et al. 2017). These materials, hosted in the GTN GitHub repository, are available online at https://training.galaxyproject.org.

Want to learn more about machine learning? Take one of our guided tours or check out the following hands-on tutorials, developed together with the GTN “Statistics and machine learning” community.

| Lesson | Slides | Hands-on | Input dataset | Workflows |
|---|---|---|---|---|
| Introduction to machine learning | | | | |
| Classification | | | | |
| Multi-omics classification (Flexynesis) | | | | |
| Regression | | | | |
| Clustering | | | | |
| Deep Learning (Part 1) - Feedforward neural networks (FNN) | | | | |
| Deep Learning (Part 2) - Recurrent neural networks (RNN) | | | | |
| Deep Learning (Part 3) - Convolutional neural networks (CNN) | | | | |
| GLEAM Image Learner - Validating Skin Lesion Classification | | | | |
| Fine-tune biological foundation model (protein language models) | | | | |

Available tools

This section lists the most important tools that have been integrated into the machine learning hub. Many more tools are available, so take a closer look at the tool panel. For better readability, the tools are divided into categories.

Classification

Identifying which category an object belongs to.

| Tool | Description | Reference |
|---|---|---|
| flexynesis | Flexynesis: deep learning tool for multi-omics data | Uyar et al. 2025 |
| tabpfn | Tabular data classification using TabPFN | Hollmann et al. 2025 |
| sklearn_svm_classifier | Support vector machines (SVMs) for classification | Pedregosa et al. 2011 |
| sklearn_nn_classifier | Nearest Neighbors Classification | Pedregosa et al. 2011 |
| sklearn_ensemble | Ensemble methods for classification and regression | Pedregosa et al. 2011 |
| sklearn_discriminant_classifier | Linear and Quadratic Discriminant Analysis | Pedregosa et al. 2011 |
| sklearn_generalized_linear | Generalized linear models for classification and regression | Pedregosa et al. 2011 |
| sklearn_clf_metrics | Calculate metrics for classification performance | Pedregosa et al. 2011 |
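
To make the idea of classification performance metrics concrete, here is a minimal sketch in plain Python. It is not the implementation behind sklearn_clf_metrics; the function name `binary_classification_metrics` and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def binary_classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for binary labels.

    Hypothetical helper for illustration; not part of Galaxy or scikit-learn.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[key] for key in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels: 1 is the positive class, 0 the negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

With these made-up labels there are three true positives, three true negatives, one false positive and one false negative, so all three metrics come out at 0.75.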

Regression

Predicting a continuous-valued attribute associated with an object.

| Tool | Description | Reference |
|---|---|---|
| tabpfn | Tabular data regression using TabPFN | Hollmann et al. 2025 |
| sklearn_ensemble | Ensemble methods for classification and regression | Pedregosa et al. 2011 |
| sklearn_generalized_linear | Generalized linear models for classification and regression | Pedregosa et al. 2011 |
| sklearn_regression_metrics | Calculate metrics for regression performance | Pedregosa et al. 2011 |
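
As an illustration of what regression performance metrics measure, the following plain-Python sketch computes the mean squared error, the mean absolute error and the coefficient of determination (R²). The helper `regression_metrics` and the example values are hypothetical; the Galaxy tool itself may report additional or different metrics.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R^2 for paired true and predicted values.

    Hypothetical helper for illustration; not part of Galaxy or scikit-learn.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "mse": ss_res / n,
        "mae": sum(abs(r) for r in residuals) / n,
        # R^2 is undefined for constant targets; report 0.0 in that case.
        "r2": 1.0 - ss_res / ss_tot if ss_tot else 0.0,
    }


# Made-up values for illustration.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
scores = regression_metrics(y_true, y_pred)
print(scores)
```

An R² close to 1 means the predictions explain most of the variance in the true values, while MSE and MAE are expressed in the (squared) units of the target.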

Unsupervised/Clustering

Automatic grouping of similar objects into sets.

| Tool | Description | Reference |
|---|---|---|
| flexynesis | Flexynesis: deep learning tool for multi-omics data | Uyar et al. 2025 |
| sklearn_numeric_clustering | Different numerical clustering algorithms | Pedregosa et al. 2011 |

Model building

Building general machine learning models.

| Tool | Description | Reference |
|---|---|---|
| sklearn_estimator_attributes | Estimator attributes to get all attributes from an estimator or scikit object | Pedregosa et al. 2011 |
| sklearn_stacking_ensemble_models | Stacking Ensembles to build stacking, voting ensemble models with numerous base options | Pedregosa et al. 2011 |
| sklearn_searchcv | Hyperparameter Search performs hyperparameter optimization using various SearchCVs | Pedregosa et al. 2011 |
| sklearn_build_pipeline | Pipeline Builder as an all-in-one platform to build pipeline, single estimator, preprocessor and custom wrappers | Pedregosa et al. 2011 |
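
The idea behind an exhaustive hyperparameter search can be sketched in a few lines of plain Python. This is a simplified illustration, not the code behind sklearn_searchcv: the function `grid_search`, the parameter names `C` and `gamma`, and the scoring function `toy_score` are all assumptions made for the example. In practice the score would come from cross-validating a model with the given parameters.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best one with its score.

    Hypothetical helper for illustration; higher scores are treated as better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(C, gamma):
    """Stand-in for a cross-validated model score; peaks at C=1.0, gamma=0.1."""
    return -((C - 1.0) ** 2) - (gamma - 0.1) ** 2


best_params, best_score = grid_search(
    {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]},
    toy_score,
)
print(best_params, best_score)
```

The number of evaluations grows as the product of the list lengths (here 3 × 3 = 9), which is why randomized or otherwise sampled searches are often preferred for large grids.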

Model evaluation

Evaluating, validating and choosing parameters and models.

| Tool | Description | Reference |
|---|---|---|
| sklearn_model_validation | Model Validation includes cross_validate, cross_val_predict, learning_curve, and more | Pedregosa et al. 2011 |
| sklearn_pairwise_metrics | Evaluate pairwise distances or compute affinity or kernel for sets of samples | Pedregosa et al. 2011 |
| sklearn_train_test_eval | Train, Test and Evaluation to fit a model using part of dataset and evaluate using the rest | Pedregosa et al. 2011 |
| model_prediction | Model Prediction predicts on new data using a pre-fitted model | Chollet et al. 2011 |
| sklearn_fitted_model_eval | Evaluate a Fitted Model using a new batch of labeled data | Pedregosa et al. 2011 |
| sklearn_model_fit | Fit a Pipeline, Ensemble or other models using a labeled dataset | Pedregosa et al. 2011 |
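
Cross-validation, as used in model validation, repeatedly splits the data so that every sample is held out exactly once. The sketch below shows only this splitting step in plain Python; the generator `k_fold_indices` is a hypothetical helper and not the implementation used by the Galaxy tools or scikit-learn.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_indices, test_indices) pairs for shuffled k-fold splitting.

    Hypothetical helper for illustration only.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    # Deal the shuffled indices into n_folds roughly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=10, n_folds=5))
for train_idx, test_idx in splits:
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

A model would be fitted on each training part and scored on the matching test part; averaging the per-fold scores gives the cross-validated estimate.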

Preprocessing and feature selection

Feature selection and preprocessing.

| Tool | Description | Reference |
|---|---|---|
| cleanlab | Detect and optionally clean data issues using Cleanlab | Northcutt et al. 2021 |
| sklearn_data_preprocess | Preprocess raw feature vectors into standardized datasets | Pedregosa et al. 2011 |
| sklearn_feature_selection | Feature Selection module, including univariate filter selection methods and recursive feature elimination algorithm | Pedregosa et al. 2011 |
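
One common way to turn raw feature vectors into a standardized dataset is z-score scaling, where each feature is shifted to zero mean and scaled to unit variance. The following plain-Python sketch shows this for a single feature column; the function `standardize` is a hypothetical helper, and the preprocessing tool offers further options that are not shown here.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Scale one feature column to zero mean and unit (population) variance.

    Hypothetical helper for illustration only.
    """
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Made-up feature values with mean 5 and population standard deviation 2.
scaled = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scaled)
```

To avoid leaking information, the mean and standard deviation should be estimated on the training data only and then reused to transform the test data.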

Deep learning

Build and use deep neural networks.

| Tool | Description | Reference |
|---|---|---|
| image_learner | Image Learner: image classification | Khai Van Dang et al. 2026 |
| keras_batch_models | Build Deep learning Batch Training Models with online data generator for Genomic/Protein sequences and images | Chollet et al. 2011 |
| keras_model_builder | Create deep learning model with an optimizer, loss function and fit parameters | Chollet et al. 2011 |
| keras_model_config | Create a deep learning model architecture using Keras | Chollet et al. 2011 |
| keras_train_and_eval | Deep learning training and evaluation either implicitly or explicitly | Chollet et al. 2011 |

Visualization

Plotting and visualization.

| Tool | Description | Reference |
|---|---|---|
| plotly_regression_performance_plots | Plot actual vs predicted curves and residual plots of tabular data | |
| plotly_ml_performance_plots | Plot confusion matrix, precision, recall and ROC and AUC curves of tabular data | |
| ml_visualization_ex | Machine Learning Visualization Extension includes several types of plotting for machine learning | Chollet et al. 2011 |

Utilities

General data and table manipulation tools.

| Tool | Description | Reference |
|---|---|---|
| table_compute | The power of the pandas data library for manipulating and computing expressions upon tabular data and matrices. | |
| datamash_ops | Datamash operations on tabular data | |
| datamash_transpose | Transpose rows/columns in a tabular file | |
| sklearn_sample_generator | Generate random samples with controlled size and complexity | Pedregosa et al. 2011 |
| sklearn_train_test_split | Split Dataset into training and test subsets | Pedregosa et al. 2011 |
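
Splitting a dataset into training and test subsets amounts to shuffling the rows and cutting the result at a chosen fraction. The plain-Python sketch below illustrates this; the function `train_test_split_rows`, its default test fraction of 0.25 and the toy rows are assumptions made for the example, not the behaviour of the Galaxy tool.

```python
import random


def train_test_split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle rows reproducibly and return (train_rows, test_rows).

    Hypothetical helper for illustration only.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)

    # Keep at least one row in the test subset.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy dataset: (sample id, label) pairs.
rows = [(i, i % 2) for i in range(20)]
train, test = train_test_split_rows(rows)
print(len(train), "training rows,", len(test), "test rows")
```

Fixing the seed makes the split reproducible, which matters when results from different models are compared on the same held-out rows.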

Interactive Environments

You have done the heavy lifting and now want to use your coding skills inside Jupyter or RStudio? Work on data with the following:

| Tool | Description | Reference |
|---|---|---|
| GPU-enabled JupyterLab | GPU-enabled JupyterLab | |
| Jupyter | Jupyter lab | |
| RStudio | RStudio | |

Contributors

Our Data Policy

Registered Users: For details on the different storage options, data retention policies, and ways to increase your quota, please refer to our dedicated storage site.

Unregistered Users: Processed data will only be accessible during one browser session, using a cookie to identify your data. This cookie is not used for any other purpose (e.g. tracking or analytics). If the UseGalaxy.eu service is not accessed for 90 days, those datasets will be permanently deleted.

FTP Data: Any user data uploaded to our FTP server should be imported into Galaxy as soon as possible. Data left in FTP folders for more than 3 months will be deleted.

GDPR Compliance: The Galaxy service complies with the EU General Data Protection Regulation (GDPR). You can read more about this in our Terms and Conditions.