ALIDA
Introduction
ALIDA is a Data Science & Machine Learning (DSML) Platform designed to simplify the management, execution, and monitoring of data science and machine learning projects. The platform offers an integrated environment that allows you to:
- Manage datasets and machine learning models.
- Create, execute, and monitor workflows based on microservices.
- Work with batch and streaming data even within the same workflow.
- Facilitate scalability and traceability of processing.
- Allow users without development expertise to create data analysis applications.
User Types (Actors)
ALIDA is designed for different types of users, each with their own role and needs:
- Citizen: someone who, without being a developer or data scientist, uses the graphical interface to create, configure, and execute workflows intuitively.
- Data Scientist: uses ALIDA to develop advanced models, set up sophisticated analyses, and optimize machine learning workflows.
- Data Engineer: responsible for integrating new data sources, managing data flows, and optimizing processes.
- Administrator: manages users and permissions and configures the platform's operational parameters.
- Developer: creates new microservices or integrates external APIs, extending the ALIDA catalog.
Basic Concepts
In ALIDA everything revolves around some fundamental concepts:
- Services are independent micro-applications that process input and produce output.
- Workflows are sequences of Services that use data and models.
- Assets represent fundamental resources such as datasets, models, data sources, and Workflows themselves.
Each Asset has an access level:
- Private: visible and editable only by the owner
- Team: visible and editable by team members
- Public: visible to everyone
and is subject to the following visibility rules:
- "Team" Workflows cannot use "Private" assets
- "Public" Workflows cannot use "Team" or "Private" assets
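The two rules reduce to a single principle: a Workflow may only reference assets whose access level is at least as broad as its own. A minimal sketch of that check (names and ranks are illustrative, not ALIDA's actual API):

```python
# Hypothetical sketch of the asset-visibility rule. Broader access gets a
# higher rank: Public assets are visible to everyone.
ACCESS_RANK = {"Private": 0, "Team": 1, "Public": 2}

def can_use_asset(workflow_level: str, asset_level: str) -> bool:
    """Return True if a workflow at `workflow_level` may use an asset
    at `asset_level` (the asset must be at least as broadly visible)."""
    return ACCESS_RANK[asset_level] >= ACCESS_RANK[workflow_level]

# Mirrors the rules above: a "Team" workflow cannot use a "Private" asset,
# and a "Public" workflow cannot use "Team" or "Private" assets.
```

Note that the rule is asymmetric: a "Private" Workflow can freely use "Public" assets, since those are visible to everyone.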
Project
A Project is an organized workspace where the user can collect and manage all the elements needed for a specific objective or presentation. Like a well-organized desk, a Project allows you to:
- Collect datasets, models, and workflows in a single space
- Give a meaningful name to the project
- Quickly access everything needed for a specific use case
Practical Example
Project "Sales Forecast 2025" that collects sales datasets, regression models, and prediction workflows.
Workflow Designer
ALIDA offers a graphical interface for building Workflows using Datasets, Services, and Models.
Each Service:
- Can be connected to other Services through arcs
- Can be connected to other assets such as Datasets and trained Models
- Receives configurable parameters (e.g., the value of "K" for a K-Means)
- Can request specific resources (e.g., GPU for training)
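Conceptually, each node in a Workflow therefore bundles a Service with its parameters, resource requests, and incoming arcs. A hypothetical sketch of such a node (the keys are illustrative, not the Designer's actual schema):

```python
# Hypothetical Workflow node: a K-Means service with its "K" parameter,
# an optional GPU resource request, and arcs from other assets.
# All names here are illustrative, not ALIDA's actual configuration format.
kmeans_node = {
    "service": "k-means",
    "parameters": {"k": 5, "max_iterations": 100},  # e.g., the value of "K"
    "resources": {"gpu": 1},            # request a GPU node for training
    "inputs": ["dataset:sales-2025"],   # arcs from Datasets or other Services
}
```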
Workflow Execution
ALIDA allows you to:
- Manually execute a Workflow
- Schedule periodic executions with Cron expressions
- Export the Workflow as a Docker Compose file for execution via Docker (on a local machine, a server, etc.)
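Scheduled executions use standard five-field Cron expressions (minute, hour, day of month, month, day of week). A small illustrative helper, not part of ALIDA, that names the fields:

```python
# Illustrative only: split a standard five-field Cron expression into its
# named fields (minute, hour, day-of-month, month, day-of-week).
def parse_cron(expr: str) -> dict:
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")
    names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    return dict(zip(names, fields))

# "0 2 * * 1" would run a workflow every Monday at 02:00.
```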
Datasource
A Datasource in ALIDA is a metadata record containing the information needed to connect to a storage system (e.g., URL, storage type, access keys, etc.)
Every user registered on the platform has a dedicated personal space for managing their resources. Specifically, each user will have by default:
- A Datasource based on MinIO with Private access level
- A Datasource based on MinIO for each team membership (if any)
- Access to the Public Datasource
The user is free to create new Datasources, including ones pointing to external storage, to make their data visible and easily manageable within the platform.
For more information, see Asset > Data Source.
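To make the idea concrete, here is a hypothetical Datasource metadata record for the default MinIO personal space. Field names and values are purely illustrative; the actual schema is the one documented under Asset > Data Source:

```python
# Hypothetical Datasource metadata record (illustrative field names only).
minio_datasource = {
    "name": "my-personal-space",
    "storage_type": "MinIO",
    "url": "https://minio.example.org:9000",  # placeholder endpoint
    "bucket": "user-data",
    "access_level": "Private",
    "credentials": {
        "access_key": "<ACCESS_KEY>",  # placeholders; never hard-code real keys
        "secret_key": "<SECRET_KEY>",
    },
}
```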
Notification System
The notification system allows users to monitor the progress and status of their processing in real time via Events.
An Event is a dynamic update sent by a Service during execution to inform the user about the status of:
- Running processes
- Data processing steps
- Any intermediate results or errors
enabling crucial activities such as:
- Monitoring the execution status of Workflows
- Quickly identifying problems or errors during processing
- Viewing intermediate results without waiting for completion
- Making informed decisions based on immediate feedback
Notification system architecture
- Each Service can emit notifications during execution
- Notifications are sent via Kafka on specific topics
- A management system:
  - Forwards notifications to the browser in push mode
  - Saves all notifications in the catalog for later consultation
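A minimal sketch of what emitting such an event might look like from a Service's side. The topic, field names, and schema below are assumptions for illustration, not ALIDA's actual event format:

```python
import json

# Hypothetical execution Event: the service serializes a status update
# that would then be published on a Kafka topic.
def build_event(service: str, status: str, detail: str) -> bytes:
    event = {"service": service, "status": status, "detail": detail}
    return json.dumps(event).encode("utf-8")

payload = build_event("kmeans-trainer", "RUNNING", "iteration 10 complete")

# With a broker available, a service could publish it via kafka-python:
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="kafka:9092")
# producer.send("workflow-events", payload)  # topic name is illustrative
```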
Supported notification types
- Execution logs
- Images (e.g., graphs, previews)
- Updated parameters
- Compressed files (e.g., zip)
- HTML files
- Other useful content for monitoring or debugging
Practical Example
During the training of a K-Means model, the Service sends, every 10 iterations:
- An image of the current clustering
- A log file with the cost (inertia) values
- A ZIP file with intermediate model snapshots
The user sees everything in real time, directly from the browser.
Asset History
ALIDA records for each dataset or model produced:
- Workflow that generated it
- Service that processed it
- Parameters used
- Storage location
- Technical format and characteristics
This allows for complete auditing and reproducibility of processes.
Practical Example
Model "Customer Segmentation 2025" saved in storage "S", trained by workflow "X", with execution date/time dd/mm/yyyy hh:mm:ss, by the K-MEANS service "Y" (version "V", created by user "U") with parameters "A", "B", and "C", using datasets "D", etc.
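The example above can be pictured as a single lineage record attached to the produced asset. The keys below are illustrative, and the values are the same placeholders used in the example, not real identifiers:

```python
# Hypothetical lineage record for a produced model (illustrative keys;
# placeholder values mirror the example above).
lineage = {
    "asset": "Customer Segmentation 2025",
    "storage": "S",
    "workflow": "X",
    "executed_at": "dd/mm/yyyy hh:mm:ss",
    "service": {"name": "K-MEANS", "id": "Y", "version": "V", "author": "U"},
    "parameters": ["A", "B", "C"],
    "input_datasets": ["D"],
}
```

Because every produced dataset or model carries such a record, any result can be traced back to the exact service version, parameters, and inputs that generated it.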
Scalability
Vertical
Services can request GPUs, and ALIDA schedules their deployment on nodes equipped with them.
Practical Example
Training a neural network on images, deployed on a GPU node.
Horizontal
Intensive batch Workflows can be distributed via Spark.
Processing is divided among "workers" deployed on different nodes of the cluster for parallel execution.
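The divide-into-workers idea can be sketched locally. The snippet below is only an analogy on a single machine using a thread pool, not Spark and not ALIDA's API; Spark applies the same pattern across partitions on different cluster nodes:

```python
# Local analogy of horizontal scaling: a batch is split among parallel
# "workers". In ALIDA/Spark the workers would run on different nodes;
# here a thread pool stands in for them. Names are illustrative.
from concurrent.futures import ThreadPoolExecutor

def process_record(x: int) -> int:
    return x * x  # stand-in for a real per-record transformation

batch = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:  # 4 parallel "workers"
    results = list(pool.map(process_record, batch))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```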
Architecture and Integration
ALIDA leverages several Open Source tools to provide a complete Data Science environment:
- MinIO for distributed storage of datasets and models
- Kafka for managing streaming data
- Spark for distributed data processing
- Seldon for deploying models in production
- Argo for Workflow orchestration
- Jupyter Notebook for interactive analysis
The architecture is based on microservices deployed on Kubernetes.