ALIDA
Introduction
ALIDA is a Data Science & Machine Learning (DSML) Platform designed to simplify the management, execution, and monitoring of data science and machine learning projects. The platform offers an integrated environment that allows you to:
- Manage datasets and machine learning models.
- Create, execute, and monitor workflows based on microservices.
- Work with batch and streaming data even within the same workflow.
- Facilitate scalability and traceability of processing.
- Allow users without development expertise to create data analysis applications.
User Types (Actors)
ALIDA is designed for different types of users, each with their own role and needs:
- Citizen: someone who, without being a developer or data scientist, uses the graphical interface to create, configure, and execute workflows intuitively.
- Data Scientist: uses ALIDA to develop advanced models, set up sophisticated analyses, and optimize machine learning workflows.
- Data Engineer: responsible for integrating new data sources, managing data flows, and optimizing processes.
- Administrator: manages users and permissions and configures the platform's operational parameters.
- Developer: creates new microservices or integrates external APIs, extending the ALIDA catalog.
Basic Concepts
In ALIDA everything revolves around some fundamental concepts:
- Services are independent micro-applications that process input and produce output.
- Workflows are sequences of Services that use data and models.
- Assets represent fundamental resources such as datasets, models, data sources, and Workflows themselves.
Each Asset has an access level:
- Private: visible and editable only by the owner
- Team: visible and editable by team members
- Public: visible to everyone
and is subject to the following visibility rules:
- "Team" Workflows cannot use "Private" assets
- "Public" Workflows cannot use "Team" or "Private" assets
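The two rules reduce to a single principle: a Workflow may only reference assets whose access level is at least as broad as its own. A minimal sketch of that check (names and ranks are illustrative, not ALIDA's actual API):

```python
# Hypothetical sketch of the asset-visibility rule. Broader access gets a
# higher rank: Public assets are visible to everyone.
ACCESS_RANK = {"Private": 0, "Team": 1, "Public": 2}

def can_use_asset(workflow_level: str, asset_level: str) -> bool:
    """Return True if a workflow at `workflow_level` may use an asset
    at `asset_level` (the asset must be at least as broadly visible)."""
    return ACCESS_RANK[asset_level] >= ACCESS_RANK[workflow_level]

# Mirrors the rules above: a "Team" workflow cannot use a "Private" asset,
# and a "Public" workflow cannot use "Team" or "Private" assets.
```

Note that the rule is asymmetric: a "Private" Workflow can freely use "Public" assets, since those are visible to everyone.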
Project
A Project is an organized workspace where the user can collect and manage all the elements needed for a specific objective or presentation. Like a well-organized desk, a Project allows you to:
- Collect datasets, models, and workflows in a single space
- Give a meaningful name to the project
- Quickly access everything needed for a specific use case
Practical Example
Project "Sales Forecast 2025" that collects sales datasets, regression models, and prediction workflows.
Workflow Designer
ALIDA offers a graphical interface for building Workflows using Datasets, Services, and Models.
Each Service:
- Can be connected to other Services through arcs
- Can be connected to other assets such as Datasets and trained Models
- Receives configurable parameters (e.g., the value of "K" for a K-Means)
- Can request specific resources (e.g., GPU for training)
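Conceptually, each node in a Workflow therefore bundles a Service with its parameters, resource requests, and incoming arcs. A hypothetical sketch of such a node (the keys are illustrative, not the Designer's actual schema):

```python
# Hypothetical Workflow node: a K-Means service with its "K" parameter,
# an optional GPU resource request, and arcs from other assets.
# All names here are illustrative, not ALIDA's actual configuration format.
kmeans_node = {
    "service": "k-means",
    "parameters": {"k": 5, "max_iterations": 100},  # e.g., the value of "K"
    "resources": {"gpu": 1},            # request a GPU node for training
    "inputs": ["dataset:sales-2025"],   # arcs from Datasets or other Services
}
```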
Workflow Execution
ALIDA allows you to:
- Manually execute a Workflow
- Schedule periodic executions with Cron expressions
- Export the Workflow as a Docker Compose file for execution via Docker (on a local machine, a server, etc.)
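Scheduled executions use standard five-field Cron expressions (minute, hour, day of month, month, day of week). A small illustrative helper, not part of ALIDA, that names the fields:

```python
# Illustrative only: split a standard five-field Cron expression into its
# named fields (minute, hour, day-of-month, month, day-of-week).
def parse_cron(expr: str) -> dict:
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")
    names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    return dict(zip(names, fields))

# "0 2 * * 1" would run a workflow every Monday at 02:00.
```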
Datasource
A Datasource in ALIDA is a metadata record containing the information needed to connect to a storage system (e.g., URL, storage type, access keys, etc.)
Every user registered on the platform has a dedicated personal space for managing their resources. Specifically, each user will have by default:
- A Datasource based on MinIO with Private access level
- A Datasource based on MinIO for each team membership (if any)
- Access to the Public Datasource
The user is free to create new Datasources, including ones pointing to external storage, to make their data visible and easily manageable within the platform.
For more information, see Asset > Data Source.
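To make the idea concrete, here is a hypothetical Datasource metadata record for the default MinIO personal space. Field names and values are purely illustrative; the actual schema is the one documented under Asset > Data Source:

```python
# Hypothetical Datasource metadata record (illustrative field names only).
minio_datasource = {
    "name": "my-personal-space",
    "storage_type": "MinIO",
    "url": "https://minio.example.org:9000",  # placeholder endpoint
    "bucket": "user-data",
    "access_level": "Private",
    "credentials": {
        "access_key": "<ACCESS_KEY>",  # placeholders; never hard-code real keys
        "secret_key": "<SECRET_KEY>",
    },
}
```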
Notification System
The notification system allows users to monitor the progress and status of their processing in real time via Events.
An Event is a dynamic update sent by a Service during execution to inform the user about the status of:
- Running processes
- Data processing steps
- Any intermediate results or errors
enabling crucial activities such as:
- Monitoring the execution status of Workflows
- Quickly identifying problems or errors during processing
- Viewing intermediate results without waiting for completion
- Making informed decisions based on immediate feedback
Notification system architecture
- Each Service can emit notifications during execution
- Notifications are sent via Kafka on specific topics
- A management system:
  - Forwards notifications to the browser in push mode
  - Saves all notifications in the catalog for later consultation
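A minimal sketch of what emitting such an event might look like from a Service's side. The topic, field names, and schema below are assumptions for illustration, not ALIDA's actual event format:

```python
import json

# Hypothetical execution Event: the service serializes a status update
# that would then be published on a Kafka topic.
def build_event(service: str, status: str, detail: str) -> bytes:
    event = {"service": service, "status": status, "detail": detail}
    return json.dumps(event).encode("utf-8")

payload = build_event("kmeans-trainer", "RUNNING", "iteration 10 complete")

# With a broker available, a service could publish it via kafka-python:
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="kafka:9092")
# producer.send("workflow-events", payload)  # topic name is illustrative
```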
Supported notification types
- Execution logs
- Images (e.g., graphs, previews)
- Updated parameters
- Compressed files (e.g., zip)
- HTML files
- Other useful content for monitoring or debugging
Practical Example
During the training of a K-Means model, the Service sends, every 10 iterations:
- An image of the current clustering
- A log file with the cost (inertia) values
- A ZIP file with intermediate model snapshots
The user sees everything in real time, directly from the browser.
Asset History
ALIDA records for each dataset or model produced:
- Workflow that generated it
- Service that processed it
- Parameters used
- Storage location
- Technical format and characteristics
This allows for complete auditing and reproducibility of processes.
Practical Example
Model "Customer Segmentation 2025" saved in storage "S", trained by workflow "X", with execution date/time dd/mm/yyyy hh:mm:ss, by the K-MEANS service "Y" (version "V", created by user "U") with parameters "A", "B", and "C", using datasets "D", etc.
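The example above can be pictured as a single lineage record attached to the produced asset. The keys below are illustrative, and the values are the same placeholders used in the example, not real identifiers:

```python
# Hypothetical lineage record for a produced model (illustrative keys;
# placeholder values mirror the example above).
lineage = {
    "asset": "Customer Segmentation 2025",
    "storage": "S",
    "workflow": "X",
    "executed_at": "dd/mm/yyyy hh:mm:ss",
    "service": {"name": "K-MEANS", "id": "Y", "version": "V", "author": "U"},
    "parameters": ["A", "B", "C"],
    "input_datasets": ["D"],
}
```

Because every produced dataset or model carries such a record, any result can be traced back to the exact service version, parameters, and inputs that generated it.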
Scalability
Vertical
Services can request GPUs, and ALIDA schedules their deployment on nodes equipped with them.
Practical Example
Training a neural network on images, deployed on a GPU node.
Horizontal
Intensive batch Workflows can be distributed via Spark.
Processing is divided among "workers" deployed on different nodes of the cluster for parallel execution.
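The divide-into-workers idea can be sketched locally. The snippet below is only an analogy on a single machine using a thread pool, not Spark and not ALIDA's API; Spark applies the same pattern across partitions on different cluster nodes:

```python
# Local analogy of horizontal scaling: a batch is split among parallel
# "workers". In ALIDA/Spark the workers would run on different nodes;
# here a thread pool stands in for them. Names are illustrative.
from concurrent.futures import ThreadPoolExecutor

def process_record(x: int) -> int:
    return x * x  # stand-in for a real per-record transformation

batch = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:  # 4 parallel "workers"
    results = list(pool.map(process_record, batch))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```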
Architecture and Integration
ALIDA leverages several Open Source tools to provide a complete Data Science environment:
- MinIO for distributed storage of datasets and models
- Kafka for managing streaming data
- Spark for distributed data processing
- Seldon for deploying models in production
- Argo for Workflow orchestration
- Jupyter Notebook for interactive analysis
The architecture is based on microservices deployed on Kubernetes.