
ALIDA

Introduction

ALIDA is a Data Science & Machine Learning Platform designed to simplify the management, execution, and monitoring of data science and machine learning projects.
The platform offers an integrated environment that allows users to:

  • Manage datasets and machine learning models
  • Create, run, and monitor workflows based on microservices
  • Work with batch and streaming data—even within the same workflow
  • Facilitate scalability and traceability of data processes
  • Enable non-developers to build data analysis applications through an intuitive interface

User Types

ALIDA is designed for various types of users, each with their own roles and needs:

  • The Citizen User is someone without development or data science skills who uses the graphical interface to intuitively create, configure, and run workflows.
  • The Data Scientist uses ALIDA to develop advanced models, set up sophisticated analyses, and optimize ML pipelines.
  • The Data Engineer handles the integration of new data sources, stream management, and process optimization.
  • The Administrator manages users, permissions, and operational settings.
  • The Developer creates new microservices or integrates external APIs to extend ALIDA’s catalog.

Core Concepts

ALIDA is built around a few fundamental concepts:

  • Services are standalone microapplications that process inputs and produce outputs.
  • Workflows are sequences of services that work with data and models.
  • Assets are essential resources such as datasets, models, data sources, and the workflows themselves.

Each asset has an access level:

  • Private: visible only to the owner
  • Team: visible and editable by team members
  • Public: visible to everyone

Visibility rule: a “Team” workflow cannot use “Private” assets, and a “Public” workflow cannot use “Team” or “Private” assets.
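The rule above amounts to a simple compatibility check: a workflow may only use assets at least as visible as the workflow itself. A minimal sketch (illustrative only; the function and level names below are not ALIDA's actual implementation):

```python
# Minimal sketch of the visibility rule: a workflow may only use assets
# whose access level is at least as broad as the workflow's own.
# (Illustrative only; not ALIDA's actual implementation.)

# Broader levels get higher ranks: Public > Team > Private.
RANK = {"private": 0, "team": 1, "public": 2}

def workflow_can_use(workflow_level: str, asset_level: str) -> bool:
    """True if the asset is at least as visible as the workflow."""
    return RANK[asset_level.lower()] >= RANK[workflow_level.lower()]

print(workflow_can_use("team", "private"))   # False: Team workflow, Private asset
print(workflow_can_use("public", "team"))    # False: Public workflow, Team asset
print(workflow_can_use("private", "public")) # True: broader assets are always usable
```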

Project

A Project is an organized workspace where users can gather and manage everything needed for a specific objective or use case. Like a well-organized desk, a Project allows users to:

  • Collect datasets, models, and workflows in a single space
  • Assign a meaningful name to the project
  • Quickly access everything required for a specific task

Practical example:
Project "Sales Forecast 2025", which includes sales datasets, regression models, and prediction workflows.

Workflow Designer

ALIDA offers a graphical interface for building workflows by connecting services.

Each service:

  • Can be connected to other services via edges
  • Can be linked to assets such as datasets or trained models
  • Accepts configurable parameters (e.g., the "K" value for a K-Means algorithm)
  • Can request specific resources (e.g., GPU for model training)


Note: Microservices can be developed in any language as long as they follow basic rules to ensure interoperability.

Workflow Execution

ALIDA allows users to:

  • Run a workflow manually via the "Run" button
  • Schedule periodic executions with cron expressions
  • Export the workflow in Docker Compose format for execution on a local machine or server
  • Select an execution cluster from a list of available federated Kubernetes clusters
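Scheduled executions use standard five-field cron expressions (minute, hour, day of month, month, day of week). A purely didactic helper that labels the fields, independent of ALIDA:

```python
# Didactic sketch: label the five fields of a standard cron expression.
# Cron format: minute hour day-of-month month day-of-week.
FIELDS = ["minute", "hour", "day of month", "month", "day of week"]

def describe_cron(expr: str) -> dict:
    parts = expr.split()
    if len(parts) != 5:
        raise ValueError("expected a five-field cron expression")
    return dict(zip(FIELDS, parts))

# "0 2 * * 1" means: every Monday at 02:00.
print(describe_cron("0 2 * * 1"))
```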

Data Source

A data source in ALIDA is a metadata object containing information needed to connect to a storage system (e.g., URL, storage type, access keys, etc.).

Every registered user has a personal space for managing their resources. ALIDA provides by default:

  • Three MinIO-based data sources (Private, Team, Public access)
  • A Kafka-based data source for handling streaming data

Users can also create new data sources pointing to external storage systems, making data more accessible and manageable within the platform.
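A data source record might look like the following sketch. The field names are assumptions chosen to mirror the description above (URL, storage type, access keys), not ALIDA's actual schema:

```python
from dataclasses import dataclass

# Hypothetical sketch of a data-source metadata record; the field names
# are assumptions for illustration, not ALIDA's actual schema.
@dataclass
class DataSource:
    name: str
    storage_type: str   # e.g. "minio", "kafka", "s3"
    url: str
    access_level: str   # "private", "team", or "public"
    access_key: str = ""
    secret_key: str = ""

# Example: the default private MinIO data source every user gets.
private_minio = DataSource(
    name="personal-minio-private",
    storage_type="minio",
    url="https://minio.example.org",   # placeholder endpoint
    access_level="private",
)
print(private_minio.storage_type, private_minio.access_level)
```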


Real-Time Media Notifications

The Real-Time Media Notification System enables users to monitor the status and progress of workflows.
Events are dynamic updates sent by services during execution to inform users of ongoing processes, intermediate results, or errors. This feature is crucial for:

  • Real-time workflow execution monitoring
  • Quick identification of errors
  • Viewing intermediate results without waiting for completion
  • Making informed decisions based on instant feedback

Architecture

  1. Each service can emit notifications during execution
  2. Notifications are sent via Kafka to specific topics
  3. A management system:
      • Immediately forwards notifications to the browser using Server-Sent Events (SSE)
      • Stores all notifications for later consultation

Note: No need to refresh the page to view new notifications.
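The pipeline can be simulated end-to-end with an in-memory queue standing in for a Kafka topic and plain strings for the SSE frames. This is a sketch of the pattern, not ALIDA's implementation:

```python
import json
import queue

# Sketch of the notification pipeline: a queue stands in for a Kafka topic,
# and SSE frames are formatted as "data: ..." lines. Illustrative only.
topic = queue.Queue()          # stands in for the Kafka topic
history = []                   # stands in for the notification store

def emit(service: str, kind: str, payload: str) -> None:
    """A service emits a notification during execution."""
    topic.put({"service": service, "type": kind, "payload": payload})

def dispatch() -> list:
    """The management system: store each event and format it as an SSE frame."""
    frames = []
    while not topic.empty():
        event = topic.get()
        history.append(event)                            # kept for later consultation
        frames.append(f"data: {json.dumps(event)}\n\n")  # pushed to the browser
    return frames

emit("k-means-trainer", "log", "iteration 10: inertia=152.3")
frames = dispatch()
print(frames[0].startswith("data: "), len(history))
```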

Supported Notification Types

  • Execution logs
  • Images (e.g., charts, previews)
  • Updated parameters
  • Compressed files (.zip)
  • HTML files
  • Other useful content for monitoring or debugging

Practical example:
During K-Means model training, the service sends every 10 iterations:

  • A current clustering image
  • A log file with inertia values
  • A ZIP file with intermediate snapshots of the model

→ All visible in real-time via the browser.

Asset History

ALIDA logs the following for each generated dataset or model:

  • The workflow that produced it
  • The service involved in its creation
  • Parameters used
  • Storage location
  • File format and technical details

This ensures full traceability and reproducibility.

Practical example:
Model "Customer Segmentation 2025", stored in “S”, trained by workflow “X”, executed on dd/mm/yyyy at hh:mm:ss, using service “K-MEANS Y” (version “V”, created by user “U”) with parameters “A, B, and C”, and dataset “D”.
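The logged metadata can be pictured as a lineage record like the one below. The keys mirror the fields listed above but are assumptions, not ALIDA's storage format:

```python
# Illustrative lineage record for a generated model; the keys mirror the
# fields listed above but are assumptions, not ALIDA's storage format.
lineage = {
    "asset": "Customer Segmentation 2025",
    "produced_by_workflow": "X",
    "service": {"name": "K-MEANS Y", "version": "V", "created_by": "U"},
    "parameters": {"A": None, "B": None, "C": None},  # placeholder values
    "storage_location": "S",
    "input_dataset": "D",
}

# With such a record, a result can be reproduced by re-running the same
# workflow with the same service version, parameters, and input dataset.
print(lineage["produced_by_workflow"], lineage["service"]["version"])
```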

Scalability

Vertical

Services can request GPU resources.
ALIDA automatically deploys them to nodes equipped with GPUs.

Practical example:
Training an image-based neural network on a GPU node.

Horizontal

Intensive batch workflows can be distributed using Spark.
Processes are split into workers and deployed across multiple nodes for parallel processing.

Cluster Federation

ALIDA supports Cluster Federation (Beta):

  • A workflow can be executed on a federated secondary cluster via Liqo, enabling dynamic resource sharing between Kubernetes clusters.
  • Users select the target federated cluster during workflow execution, choosing from available namespaces.

Benefits:

  • Cross-cluster scalability
  • Execution closer to the data

REST API

ALIDA offers REST APIs to:

  • Create and modify assets
  • Manage projects
  • Dynamically update workflow parameters
  • Trigger programmatic executions
  • Run workflows with modified parameters

Practical example:
Via API (without manually editing the Designer), run workflow “X” after changing parameter “K” (number of clusters) from 3 to 5 in the “K-Means” service.
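Such a call might be assembled as below. The endpoint path and payload shape are hypothetical (consult the actual API reference); only the request is built here, and the resulting POST could be sent with any HTTP client:

```python
import json

# Hypothetical sketch of a programmatic run with an overridden parameter.
# The endpoint path and payload shape are assumptions, not ALIDA's real API.
BASE_URL = "https://alida.example.org/api"   # placeholder host

def build_run_request(workflow_id: str, overrides: dict):
    """Build the URL and JSON body for a run-with-overrides call."""
    url = f"{BASE_URL}/workflows/{workflow_id}/run"
    body = json.dumps({"parameterOverrides": overrides})
    return url, body

# Change K from 3 to 5 in the K-Means service, then trigger the run.
url, body = build_run_request("X", {"k-means.K": 5})
print(url)
print(body)
```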

Architecture and Integration

ALIDA integrates several open-source tools to provide a complete data science environment:

  • MinIO for distributed storage of datasets and models
  • Kafka for real-time data streaming
  • Spark for distributed data processing
  • MLflow for experiment tracking and versioning
  • Seldon for production model deployment
  • Argo for workflow orchestration
  • Jupyter Notebook for interactive data analysis

The architecture is based on microservices deployed on Kubernetes.