Available for new projects

Aymane Maghouti

Data Engineer

I design scalable data platforms, build AI-driven pipelines, and deliver reliable, production-grade data solutions for enterprise environments.

2+ Years building
10+ Projects delivered
3 Cloud providers
Data Engineering
AI Solutions

About me

Building data systems
that actually work

I'm Aymane, a data engineer specializing in the design and modernization of data platforms, real-time analytics frameworks, and AI-enabled information systems. I've contributed to large-scale transformation initiatives — from migrating critical enterprise data environments to building advanced retrieval pipelines and RAG systems for complex technical content.

I focus on robust, scalable, business-aligned solutions that improve reliability, enhance decision-making, and support long-term growth.

Location Toulouse, France
Phone +33 7 76 43 11 82
Languages French · English · Arabic

Career

What I've been
working on

Airbus
Current Data Analytics & Engineering

Data Analyst / Data Engineer · Airbus

Feb 2026 — Present · Toulouse · On-site

Contributing to industrial production data analytics on the Final Assembly Line (FAL), driving operational decision-making and KPI reporting within Airbus's Big Data ecosystem.

Skywise Data Engineering & Analytics Platform

Data & Analytics Factory — MOM
  • Analyzed business requirements and designed end-to-end data workflows on Skywise (Palantir)
  • Built transformations in Contour (joins, expressions, filtering) and developed dashboards in Slate
  • Developed no-code (Pipeline Builder) and code-based (PySpark Workbook) data pipelines with dependency orchestration
  • Skywise
  • Palantir
  • Contour
  • Slate
  • Pipeline Builder
  • PySpark

AI-Powered Knowledge Platform for Manufacturing Operations

Data & Analytics Factory — MOM
  • Identified a critical problem: analysts waited weeks for technical answers; the AI assistant reduced this to a few seconds
  • RAG architecture on GCP (Vertex AI, Cloud Storage, Cloud SQL) to leverage technical documentation and business procedures
  • End-to-end pipelines: ingestion, semantic chunking, embeddings, vector indexing, and hybrid search with Gemini
  • Migrated the Streamlit POC to a production architecture: ReactJS + FastAPI, containerized with Docker, deployed on Cloud Run
  • Integrated multi-turn AI agents and a PostgreSQL data model for full observability
  • Python
  • Gemini
  • Vertex AI
  • RAG
  • ReactJS
  • FastAPI
  • Cloud SQL
  • Docker
  • Cloud Run
  • GCP
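
The hybrid search step listed above combines keyword results with vector results. The sketch below is one way to do that, assuming reciprocal rank fusion as the merging rule; the fusion method actually used in the project is not stated here, and the document ids and the `k=60` constant are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, so
    documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Rank-based fusion avoids having to calibrate keyword scores against embedding similarities, which live on different scales.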
inwi
DW & BI

Data Platform Migration & Real-Time Analytics · INWI

Feb 2025 — Sep 2025 · Casablanca · On-site

Modernized INWI's enterprise data ecosystem by migrating from Oracle to Microsoft SQL Server and enabling real-time analytics.

  • Reduced infrastructure & licensing costs by 30% through Oracle → SQL Server migration.
  • Built automated SSIS ETL pipelines processing 50M+ rows/day, cutting runtimes from 2h → 4min.
  • Integrated Kafka + Python for real-time analytics (500K+ events/15min, <2s latency).
  • Improved storage efficiency with partition-based purge (350GB+ freed/month, jobs 6h → 1min).
  • Delivered BI reporting with Power BI & SSRS, reducing manual reporting by 70%.
Shiftbricks
Data Platform

Data Ingestion Pipeline for AI Application · Shiftbricks

Jun 2024 — Sep 2024 · Remote

Built a Medallion-architecture ingestion pipeline to transform 10K+ unstructured documents into structured datasets for downstream AI applications.

  • Integrated validation rules & schema checks, reducing ingestion errors by 35%.
  • Automated daily batch workflows with Apache Airflow (5+ DAGs, 99% job success rate).
  • Developed a FastAPI + React.js validation tool, cutting manual review time from 2h → 20min.
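
As a rough illustration of the Medallion pattern with validation rules described above, the sketch below moves records through bronze (raw), silver (validated and cleaned), and gold (ready for downstream use) layers. The field names, the validation rule, and the `word_count` feature are assumptions made for the example, not the project's actual schema.

```python
REQUIRED_FIELDS = ("doc_id", "title", "body")  # hypothetical schema


def to_bronze(raw_docs):
    # Bronze: keep every record as received, only tagging its layer.
    return [dict(doc, _layer="bronze") for doc in raw_docs]


def is_valid(doc):
    # Schema check: every required field is a non-blank string.
    return all(
        isinstance(doc.get(field), str) and doc[field].strip()
        for field in REQUIRED_FIELDS
    )


def to_silver(bronze_docs):
    # Silver: validated, trimmed records; rejects are kept for review.
    valid, rejected = [], []
    for doc in bronze_docs:
        if is_valid(doc):
            cleaned = {field: doc[field].strip() for field in REQUIRED_FIELDS}
            valid.append(dict(cleaned, _layer="silver"))
        else:
            rejected.append(doc)
    return valid, rejected


def to_gold(silver_docs):
    # Gold: a compact, structured view for downstream applications.
    return [
        {
            "doc_id": doc["doc_id"],
            "title": doc["title"],
            "word_count": len(doc["body"].split()),
        }
        for doc in silver_docs
    ]


raw = [
    {"doc_id": "1", "title": " Spec ", "body": "Torque values  for panel A"},
    {"doc_id": "2", "title": "Empty", "body": "   "},
    {"doc_id": "3", "body": "No title here"},
]

bronze = to_bronze(raw)
silver, rejected = to_silver(bronze)
gold = to_gold(silver)
print(gold, len(rejected))
```

Keeping rejected records instead of dropping them gives a manual review tool something to work from.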
ONCF
BI & Data Warehouse

Data Analytics Intern · ONCF

Jul 2024 — Sep 2024 · Rabat · Hybrid

Contributed to the modernization of ONCF's analytics ecosystem by building scalable data ingestion pipelines, optimizing warehouse models, and delivering business dashboards.

  • Developed an automated Python pipeline orchestrating ingestion of 10M+ daily train circulation records.
  • Designed an optimized star schema, reducing analytical query latency by 60%.
  • Built interactive dashboards integrating business KPIs for operational decision-makers.

Public work

Workshops, Talks
& Collaborations

Talk
Sol Plaatje University

AI Use Cases with Big Data Technologies

Covered data engineering fundamentals, modern platform architectures, and production-style project demos. Extended Q&A with 20+ students.

  • Data Engineering
  • Big Data
  • AI
  • Architecture
Collaboration
Sol Plaatje University

Sign Language Recognition — Edge–Cloud Hybrid Architecture

Co-built a real-time sign language recognition app, covering data ops, the labeling pipeline, a React Native UI with FastAPI, and Kafka + Spark Streaming + HDFS for data processing and storage.

  • HDFS
  • Kafka
  • Spark Streaming
  • React Native
  • FastAPI
Research
UM6P — College of Computing

Forecasting Renewable Energy Sources with Statistical Methods

EDA on historical generation data, outlier detection, ARIMA/SARIMA/ARIMAX/SARIMAX benchmarking suite with weekly insights reporting.

  • Python
  • ARIMA
  • SARIMA
  • Time Series
Workshop
ENSAH — Data Club

Hands-on ETL Workshop

Led 30+ participants through a full end-to-end project: collect → raw → extract → transform → data warehouse → Power BI dashboard.

  • Python
  • Pandas
  • PostgreSQL
  • ETL
  • Power BI
Training
ENSAH — Data Club

ML Training: Logistic Regression to Production

From classification fundamentals to full model training, evaluation, and live deployment demo with Flask + HTML/CSS/JS web app.

  • ML
  • Classification
  • Python
  • Flask

Expertise

What I can
do for you

Data Engineering & Streaming

Scalable pipelines for batch and real-time data. CDC, streaming, event-driven architectures with strong quality and lineage.

  • Streaming
  • Transformation
  • Pipeline Orchestration
  • Big Data
📊

Data Warehousing & BI

Modern data warehouses with dimensional modeling. Powerful dashboards and KPIs that empower decision-makers.

  • ETL Pipelines
  • Data Modeling
  • Power BI
  • Cloud DW
🛠️

Data Migration & Governance

Complex migrations from legacy systems with minimal downtime. Metadata-driven governance for secure, cost-effective management.

  • Migration
  • Data Lineage
  • DB Optimisation
☁️

Cloud & MLOps

Deploy, monitor, and scale ML models and APIs in the cloud. CI/CD, observability, and automation to accelerate production workflows.

  • Azure
  • AWS
  • GCP
  • Docker
  • MLflow
🧩

Full-Stack Integrations

Complete solutions combining backend APIs with modern interfaces. Seamless integration between data systems and web/mobile applications.

  • FastAPI
  • Spring Boot
  • React
  • React Native

Tech stack

Skills &
Technologies

{ }

Programming & Frameworks

  • Python
  • Java
  • TypeScript
  • SQL
  • Bash
  • FastAPI
  • Spring Boot
  • React
  • React Native

Data Engineering & Streaming

  • Apache Kafka
  • Apache Spark
  • Airflow
  • Hadoop
  • Flink
  • dbt

Data Warehousing & BI

  • SQL Server
  • SSIS
  • SSRS
  • Power BI
  • Data Modeling
  • Data Lineage

Databases

  • PostgreSQL
  • SQL Server
  • Oracle
  • MySQL
  • MongoDB
  • ClickHouse

Machine Learning & AI

  • Regression
  • Classification
  • LLMs
  • Prompt Engineering
  • Vector Search
  • Hybrid Search

Cloud & MLOps

  • Azure
  • AWS
  • GCP
  • Snowflake
  • Docker
  • MLflow

Portfolio

Latest
Projects

A curated selection of data engineering, AI, and cloud projects — from end-to-end pipelines to scalable AI solutions.

Cloud · MLOps

ML-Based Solution with Microsoft Azure

Azure migration with FastAPI, Spring Boot, React Native, and Power BI. Integrated MLOps pipelines and cloud-native deployment.

  • Azure
  • MLflow
  • XGBoost
  • FastAPI
Streaming · CDC

Event-Driven Architecture with Debezium

Real-time CDC pipeline with Kafka, Spark Streaming, and Debezium. Captures DB changes and updates dashboards instantly.

  • Kafka
  • Spark Streaming
  • Debezium
  • CDC
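
To make the change-data-capture idea concrete, here is a minimal sketch of applying change events to an in-memory copy of a table. It assumes simplified events that keep only the `op`, `before`, and `after` fields of a Debezium change event (real events carry more metadata); the `id` key and the sample rows are hypothetical, and Kafka and Spark Streaming are left out entirely.

```python
def apply_change(table, event, key="id"):
    """Apply one simplified change event to a dict keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, and update all carry the new row in "after"
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # delete carries the removed row in "before"
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported operation: {op!r}")
    return table


# Hypothetical stream of changes for a small product table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "price": 100}},
    {"op": "c", "before": None, "after": {"id": 2, "price": 250}},
    {"op": "u", "before": {"id": 1, "price": 100}, "after": {"id": 1, "price": 120}},
    {"op": "d", "before": {"id": 2, "price": 250}, "after": None},
]

table = {}
for event in events:
    apply_change(table, event)

print(table)
```

Replaying events in order reproduces the source table's current state, which is what lets a dashboard stay in sync without re-querying the database.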
AWS · Big Data

Smartphone Data Pipeline using AWS

Migrated on-premises big data stack to AWS. Automated ingestion with Glue into S3, with Athena for querying and QuickSight for dashboards.

  • AWS Glue
  • Athena
  • S3
  • QuickSight
Azure · BI

HR Data Pipeline with Azure

HR pipeline with ADF, Databricks, Blob Storage, and Power BI for comprehensive HR reporting.

  • ADF
  • Databricks
  • Blob Storage
  • Power BI
Big Data · Lambda

Smartphone Price Prediction — Big Data

Lambda architecture combining batch and real-time layers. Spark + Kafka with BI dashboards and HBase storage.

  • Kafka
  • Spark
  • HDFS
  • HBase
  • Airflow