Hi, I’m Aymane Maghouti

Data Engineer · Big Data · AI Solutions

I migrate data platforms and build AI-powered data pipelines on Azure/AWS/GCP. Currently pursuing an M2 in Complex Systems Engineering at EILCO (Calais).

ABOUT ME

Hello there!

I’m Aymane, a data engineer focused on platform migration, real-time analytics, and AI-powered data pipelines. I’ve contributed to an Oracle → SQL Server migration for INWI using SSIS/SSAS/SSRS/Power BI and built RAG/semantic pipelines for Arabic legal content with Azure services and GPT models. I enjoy turning complex data problems into scalable, cloud-native solutions that enable faster decisions.

Full Name: Aymane Maghouti
Phone: +33 7 76 43 11 82
Location: Calais (62100), France


What I’ve been working on

  1. Freelance Data & AI Engineer · Shiftbricks

    Feb 2025 — Now · Part-time · Remote
    Data & AI

    Designed and deployed AI-powered pipelines for large-scale Arabic legal document processing.

    • Processed 50K+ legal documents with scalable digitization & retrieval workflows.
    • Built ingestion & enrichment pipelines using Azure Document Intelligence, Cosmos DB, PostgreSQL.
    • Implemented embeddings with Azure OpenAI & Cohere, boosting semantic search precision to 80%+.
    • Automated metadata extraction & hierarchical structuring (manual effort reduced 2h → 15min).
    • Contributed to a cloud-native knowledge platform with 99.9% uptime, scaling to 1M+ docs.
    • Azure Services
    • OpenAI/Cohere
    • CosmosDB (MongoDB)
    • PostgreSQL
    • Python
    • BeautifulSoup
    • FastAPI
    • React.JS
    • Prompt Engineering
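The retrieval step behind embedding-based semantic search can be sketched as a cosine-similarity ranking. This is a minimal illustration, not the project's code: the three-dimensional vectors are hand-written stand-ins for real embeddings, and `cosine_similarity`, `top_k`, and the document ids are hypothetical names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the k document ids most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Toy "embeddings" (assumption): real ones have hundreds of dimensions.
DOCS = {
    "contract_law": [0.9, 0.1, 0.0],
    "family_law": [0.1, 0.9, 0.1],
    "tax_code": [0.0, 0.2, 0.9],
}

results = top_k([0.8, 0.2, 0.1], DOCS, k=2)
print(results)
```

In a real pipeline the query and documents would be embedded by the same model, and the ranking would usually run inside a vector index rather than a Python loop.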
  2. Data Platform Migration & Real-Time Analytics · SetGet Consulting (INWI)

    Feb 2025 — Jun 2025 · Casablanca · On-site
    DW & BI

    Modernized INWI’s enterprise data ecosystem by migrating from Oracle to Microsoft SQL Server and enabling real-time analytics.

    • Reduced infrastructure & licensing costs by 30% through Oracle → SQL Server migration.
    • Built automated SSIS ETL pipelines processing 50M+ rows/day, cutting runtimes from 2h → 4min.
    • Integrated Kafka + Python for real-time analytics (500K+ events/15min, <2s latency).
    • Improved storage efficiency with partition-based purge (350GB+ freed/month, jobs 6h → 1min).
    • Built a monitoring system (SQL Server views + SSRS dashboards) to track ETL jobs, storage usage, and system health, ensuring 99.9% platform uptime.
    • Delivered BI reporting with Power BI & SSRS, reducing manual reporting by 70%.
    • SQL Server
    • SSIS
    • SSRS
    • Apache Kafka
    • Power BI
    • Python
    • T-SQL
    • Oracle Database
    • Technical Documentation
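To show why a partition-based purge can be so much faster than row-by-row deletes, here is a small in-memory sketch. It is an illustration only: the real work was done in SQL Server, and `PartitionedTable`, the monthly partition key, and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date

class PartitionedTable:
    """In-memory stand-in for a table partitioned by (year, month)."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, event_date, row):
        # Route each row to the partition for its month.
        self.partitions[(event_date.year, event_date.month)].append(row)

    def purge_older_than(self, cutoff):
        """Drop every partition for a month earlier than the cutoff's month.

        Whole partitions are discarded in one step, so the cost depends on
        the number of partitions, not on the number of rows they hold.
        """
        cutoff_key = (cutoff.year, cutoff.month)
        expired = [key for key in self.partitions if key < cutoff_key]
        for key in expired:
            del self.partitions[key]
        return sorted(expired)

table = PartitionedTable()
table.insert(date(2024, 11, 3), {"id": 1})
table.insert(date(2024, 12, 15), {"id": 2})
table.insert(date(2025, 1, 20), {"id": 3})

dropped = table.purge_older_than(date(2025, 1, 1))
print(dropped, sorted(table.partitions))
```

The same idea applies to a partitioned database table: removing a whole partition is a metadata-level operation, whereas deleting rows one by one touches every row.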
  3. Data Ingestion Pipeline for AI Application · Shiftbricks

    Jun 2024 — Sep 2024 · Al Hoceima · Remote
    Data Platform

    Built a Medallion-architecture ingestion pipeline to transform 10K+ unstructured documents into structured datasets for downstream AI applications.

    • Integrated validation rules & schema checks, reducing ingestion errors by 35%.
    • Automated daily batch workflows with Apache Airflow (5+ DAGs, 99% job success rate).
    • Developed a FastAPI + React.js validation tool, cutting manual review time from 2h → 20min.
    • Implemented monitoring & logging in Airflow to track job failures, latency, and freshness.
    • Python
    • FastAPI
    • React.js
    • MongoDB
    • PostgreSQL
    • Airflow
    • Docker
    • Git/GitLab
    • Postman
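The Medallion flow described above (raw "bronze" records, validated "silver" rows, aggregated "gold" output) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the schema in `REQUIRED_FIELDS`, the function names, and the sample records are hypothetical and not taken from the project.

```python
REQUIRED_FIELDS = {"doc_id": str, "title": str, "pages": int}  # hypothetical schema

def validate(record):
    """Return a list of schema errors; an empty list means the record is valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}")
    return errors

def to_silver(bronze_records):
    """Split raw (bronze) records into validated (silver) rows and rejects."""
    silver, rejected = [], []
    for record in bronze_records:
        errors = validate(record)
        if errors:
            rejected.append({"record": record, "errors": errors})
        else:
            # Light cleaning happens only on records that passed validation.
            silver.append({**record, "title": record["title"].strip()})
    return silver, rejected

def to_gold(silver_records):
    """Aggregate silver rows into a small summary for downstream consumers."""
    return {
        "documents": len(silver_records),
        "total_pages": sum(r["pages"] for r in silver_records),
    }

bronze = [
    {"doc_id": "a1", "title": "  Decree 12 ", "pages": 4},
    {"doc_id": "a2", "title": "Ruling 7"},                  # missing "pages"
    {"doc_id": "a3", "title": "Circular 3", "pages": "9"},  # wrong type
]
silver, rejected = to_silver(bronze)
gold = to_gold(silver)
print(gold, len(rejected))
```

Keeping rejected records alongside their error messages, rather than dropping them silently, is what makes a manual review tool possible further down the line.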
  4. Data Collection & Analysis + KPI Dashboards · ONCF

    Jul 2024 — Sep 2024 · Rabat · Hybrid
    BI & DW

    Designed and deployed data pipelines and dashboards to analyze 10M+ daily train circulation records and improve operational efficiency.

    • Built a Python ETL pipeline loading 10M+ records/day into a SQL Server data warehouse.
    • Designed a star schema (fact + 8 dimensions), reducing query latency by 60%.
    • Developed interactive Power BI dashboards for delays, patterns, and utilization metrics.
    • Automated daily data refresh with SQLAlchemy, ensuring up-to-date analytics.
    • SQL Server
    • Python
    • SQLAlchemy
    • Power BI
    • DAX
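A star schema keeps measurements in one fact table and descriptive attributes in small dimension tables, so analytical queries reduce to a few key joins. The sketch below uses Python's built-in `sqlite3` as a stand-in for SQL Server, with only two of the dimensions; every table name, column name, and row is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per station / per calendar day.
    CREATE TABLE dim_station (station_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);

    -- Fact table: one row per train circulation, keyed to the dimensions.
    CREATE TABLE fact_circulation (
        station_id INTEGER REFERENCES dim_station(station_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        delay_minutes INTEGER
    );

    INSERT INTO dim_station VALUES (1, 'Rabat'), (2, 'Fes');
    INSERT INTO dim_date VALUES (10, '2024-08-01');
    INSERT INTO fact_circulation VALUES (1, 10, 5), (1, 10, 15), (2, 10, 0);
""")

# Average delay per station for one day: the fact table joins to each
# dimension on its integer surrogate key.
rows = conn.execute("""
    SELECT s.name, AVG(f.delay_minutes) AS avg_delay
    FROM fact_circulation AS f
    JOIN dim_station AS s ON s.station_id = f.station_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    WHERE d.day = '2024-08-01'
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
conn.close()
print(rows)
```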
  5. Modern E-commerce (Microservices) · Marketing Confort

    Jul 2024 — Oct 2024 · Fès · Remote
    Full-Stack

    Built a full-stack e-commerce platform using microservices architecture with React.js and Spring Boot.

    • Developed responsive React.js frontends integrated with Spring-based REST APIs.
    • Centralized service configuration with Spring Cloud Config Server for scalability.
    • Collaborated in an Agile/Scrum workflow using Jira and GitLab CI/CD.
    • Java
    • Spring Boot
    • Spring Data JPA
    • React.js
    • TypeScript
    • PostgreSQL
    • Microservices
    • Git/GitLab
    • Postman
    • Scrum

Public Speaking & Training

Workshops, Lectures & Collaborations

  1. Talk

    Sol Plaatje University — Invited Speaker

    Video conference · South Africa

    Presentation: “AI Use Cases with Big Data Technologies — A Data Engineering Perspective.” Covered data engineering fundamentals, modern data platform architectures, and deep-dives into several of my production-style projects — followed by an extended Q&A with 20+ students.

    • Data platform patterns, lake vs warehouse, streaming & batch.
    • End-to-end demos: ingestion → orchestration → storage → serving → BI.
    • How to operationalize AI in data platforms safely & cost-effectively.
    • Data Engineering
    • Big Data
    • AI
    • Architecture
  2. Collaboration

    Sol Plaatje University — Research Collaboration

    Remote · South Africa

    Co-built a Sign Language Recognition Application based on an Edge–Cloud Hybrid Architecture for real-time and privacy-preserving inference. We aligned on the use-case, collected & labeled the dataset (video), trained AI models, and delivered a mobile app with cloud streaming for continuous learning.

    • Data ops: video collection, labeling pipeline, dataset curation.
    • Mobile app: React Native UI + FastAPI service layer.
    • Cloud backbone: Kafka + Spark Streaming + HDFS for ingestion & feedback loops.
    • HDFS
    • Kafka
    • Spark Streaming
    • Python
    • React Native
    • FastAPI
    • REST APIs
    • Git
    • GitHub
  3. Research

    UM6P — College of Computing

    Remote · Morocco

    Topic: “Forecasting Large-Scale Renewable Energy Sources Through Statistical Methods with Anomalies.” I surveyed literature, designed experiments, and built a benchmarking suite across time-series models.

    • EDA on historical generation data & outlier detection / treatment.
    • Modeling: ARIMA / SARIMA / ARIMAX / SARIMAX baselines & comparisons.
    • Weekly reporting with metrics dashboards & model insights.
    • Python
    • NumPy
    • Pandas
    • ARIMA
    • SARIMA
    • ARIMAX
    • SARIMAX
    • APIs
    • Git/GitHub
    • Documentation
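The outlier detection and treatment step can be illustrated with the interquartile-range (IQR) rule. The source does not say which method was used, so treat this as one common option rather than the project's approach; the function names and the sample readings are assumptions.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences using the interquartile-range rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clip_outliers(values, k=1.5):
    """Flag points outside the fences and clip them back to the nearest fence."""
    lower, upper = iqr_bounds(values, k)
    flags = [v < lower or v > upper for v in values]
    treated = [min(max(v, lower), upper) for v in values]
    return treated, flags

# Invented generation readings (MW) with one obvious spike at the end.
generation_mw = [50, 52, 49, 51, 53, 50, 48, 400]
treated, flags = clip_outliers(generation_mw)
print(flags)
```

Clipping is only one treatment; for time-series models such as ARIMA, interpolating over flagged points is another frequently used choice.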
  4. Workshop

    ENSAH — Hands-on ETL Workshop

    Al Hoceima, Morocco

    Invited by the Data Club. Introduced ETL fundamentals and the data ecosystem, then led an end-to-end project: collect → raw → extract → transform → load to a data warehouse, followed by a Power BI dashboard (30+ participants).

    • ETL patterns & pipeline management in practice.
    • Data collection & storage, then warehousing & serving.
    • Hands-on BI storytelling on top of curated data.
    • Python
    • Pandas
    • BeautifulSoup
    • PostgreSQL
    • ETL
    • Data Warehouse
    • Power BI
  5. Training

    ENSAH — ML Training: Logistic Regression

    Al Hoceima, Morocco

    Invited by the Data Club. Explained classification fundamentals, then built a complete Logistic Regression project, from training & evaluation to production integration via a Flask web app with HTML/CSS/JS.

    • Math & intuition of Logistic Regression.
    • Model training, metrics, and error analysis.
    • Deployment demo with Flask + simple UI.
    • Machine Learning
    • Classification
    • Python
    • Flask
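The math behind Logistic Regression fits in a short from-scratch sketch: a sigmoid turns a linear score into a probability, and gradient descent on the log loss adjusts the weight and bias. This is not the training material itself; the single-feature setup, the learning rate, the epoch count, and the toy data are assumptions chosen to keep the example small.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.2, epochs=10000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(x, w, b, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Toy data (invented): hours studied -> passed (1) or failed (0).
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]
w, b = train(hours, passed)
print([predict(x, w, b) for x in hours])
```

In practice a library such as scikit-learn would handle the optimization, regularization, and multi-feature inputs; the loop above only exposes the mechanics.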

My Services

Data Engineering & Streaming

Design and build scalable pipelines for both batch and real-time data. Implement CDC (Change Data Capture), streaming, and event-driven architectures to ensure reliable, low-latency data flows with strong quality and lineage.

  • Data Streaming
  • Data Transformation
  • Data Storage
  • Pipeline Orchestration
  • Big Data
  • System Monitoring

Data Warehousing & Business Intelligence

Build and optimize modern data warehouses with dimensional modeling and star schemas. Deliver powerful dashboards and KPIs that empower decision-makers with actionable insights.

  • ETL Pipelines
  • Data Modeling
  • Cloud DW
  • Power BI
  • Data Visualization

Data Migration & Governance

Lead complex migrations from legacy systems (Oracle → SQL Server) with minimal downtime. Implement partitioning, purging strategies, and metadata-driven governance for secure, compliant, and cost-effective data management.

  • Data Platform Migration
  • Partitioning
  • Data Lineage
  • DB Optimisation

AI Solutions

Develop intelligent retrieval-augmented generation (RAG) systems for multilingual content, with semantic chunking, embeddings, and context-aware query handling. Ensure reliable AI outputs through evaluation.

  • NLP
  • Prompt Engineering
  • Data Cleaning & Processing
  • Data Modeling
  • Metadata & Hybrid Search

Cloud & MLOps

Deploy, monitor, and scale machine learning models and APIs in the cloud. Implement CI/CD, observability, and automation to accelerate production workflows while controlling costs.

  • Azure
  • AWS
  • Google Cloud
  • Docker
  • MLflow
  • CI/CD

Full-Stack Integrations

Build complete solutions that combine backend APIs and user-friendly interfaces. Deliver seamless integrations between data systems, microservices, and modern web or mobile applications.

  • FastAPI
  • Spring Boot
  • React / React Native
  • TypeScript
  • PostgreSQL
  • MongoDB

Skills

A practical stack across data engineering, AI, cloud, and full-stack—used to ship reliable, production-grade solutions.

Programming & Frameworks

Clean, testable services and UIs with modern tooling.

  • Python
  • Java
  • TypeScript / JS
  • SQL
  • Bash
  • FastAPI
  • Spring Boot
  • React
  • React Native

Data Engineering & Streaming

Batch + real-time pipelines, CDC, quality and lineage.

  • Apache Kafka
  • Apache Spark
  • Airflow
  • Hadoop
  • Flink
  • dbt

Data Warehousing & BI

Dimensional models, KPIs, governance, and reporting.

  • SQL Server
  • SSIS
  • SSRS
  • Power BI
  • Data Modeling
  • Data Lineage

Databases

RDBMS & NoSQL tuned for reliability and scale.

  • PostgreSQL
  • SQL Server
  • Oracle
  • MySQL
  • MongoDB
  • ClickHouse
  • HBase

Machine Learning & Deep Learning

Building predictive and generative models with modern ML/DL techniques.

  • Regression
  • Classification
  • Clustering
  • LLMs
  • Prompt Engineering
  • Vector Search
  • Hybrid Search

Cloud & MLOps

Deploy, monitor, and automate MLOps pipelines.

  • Azure
  • AWS
  • GCP
  • Snowflake
  • Docker
  • MLflow

Tools & Practices

Collaboration, delivery, and quality at speed.

  • Git
  • GitLab
  • GitHub
  • Jira
  • Postman
  • Scrum
  • Agile

Languages

Working fluency in multilingual environments.

  • Arabic: Native
  • French: Fluent
  • English: Fluent

Latest Projects

A curated selection of data engineering, AI, and cloud projects I’ve delivered, from end-to-end pipelines to scalable AI solutions.




ML-Based Solution with Microsoft Azure

Migration of local infra to Azure Cloud with FastAPI, Spring Boot, React Native, and Power BI. Integrated MLOps pipelines and cloud-native deployment for scalability, automation, and real-time insights.

  • Azure Cloud
  • MLflow
  • XGBoost
  • FastAPI
  • Spring Boot
  • React Native
  • PostgreSQL
  • Power BI

Event-Driven Architecture with Debezium

Real-time CDC pipeline with Kafka, Spark Streaming, and Debezium. Captures database changes, processes events, and updates analytics dashboards instantly.

  • Kafka
  • Spark Streaming
  • Debezium
  • CDC
  • Spring Boot
  • React.JS
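The core of a CDC consumer is applying each change event to a downstream copy of the table. The sketch below assumes events already decoded from Kafka into dictionaries and uses the `op`, `before`, and `after` fields of Debezium's change-event envelope; the `apply_change` helper, the `id` key, and the sample "orders" events are hypothetical.

```python
def apply_change(replica, event, key="id"):
    """Apply one Debezium-style change event to an in-memory replica.

    The op codes handled here are "c" (create), "u" (update), "d" (delete),
    and "r" (snapshot read); anything else is rejected.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        replica[row[key]] = row  # upsert the new row state
    elif op == "d":
        replica.pop(event["before"][key], None)  # deletes carry only "before"
    else:
        raise ValueError(f"unsupported op: {op!r}")
    return replica

# Hypothetical events for an "orders" table, in commit order.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

replica = {}
for event in events:
    apply_change(replica, event)
print(replica)
```

Because every event is an upsert or a delete keyed on the primary key, replaying the same event twice leaves the replica unchanged, which matters when the consumer restarts and re-reads part of the topic.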

Smartphone Data Pipeline using AWS Ecosystem

Migrated on-premises big data stack to AWS. Automated ingestion, processing, and analysis with Glue, Athena, and QuickSight, and data storage in S3.

  • AWS Glue
  • Athena
  • S3
  • QuickSight
  • Python
  • BeautifulSoup

Deploy a Spring Boot app on AWS Cloud

Automated, scalable deployment of a Spring Boot application with Elastic Beanstalk, S3, EC2, and RDS.

  • AWS
  • Elastic Beanstalk
  • S3
  • EC2
  • RDS

Exam Planning System with Spring Boot

Full-stack application with a layered architecture. Manages exam scheduling with a Spring Boot backend and a Thymeleaf frontend.

  • Spring Boot
  • Thymeleaf
  • MySQL
  • Spring Data
  • Spring Security

Student Management System with Spring Boot

CRUD system for managing students. Built with Spring Boot, JSP frontend, and SQL database.

  • Spring Boot
  • JSP
  • Bootstrap
  • MySQL

Smartphone Price Prediction in Big Data Environment

Lambda architecture combining batch and real-time layers. Built on Spark + Kafka with BI dashboards.

  • Kafka
  • Spark
  • HDFS
  • PostgreSQL
  • Python
  • Power BI
  • Airflow
  • HBase
  • Spring Boot

Sentiment Analysis & Price Prediction

Sentiment analysis for Jumia reviews and price prediction with regression models.

  • Python
  • Scikit-learn
  • Regression
  • Classification
  • HTML/CSS
  • MySQL
  • Web Scraping
  • Flask

Real-Time Data Pipeline using Kafka

Real-time computer performance monitoring pipeline using Kafka, psutil, SQL Server, and Power BI dashboards.

  • Kafka
  • Python
  • SQL Server
  • Power BI

Mobile Phone Data Analysis

Moved a MySQL smartphone dataset to Hive via Sqoop and built BI dashboards with HiveQL + Power BI.

  • Sqoop
  • Hive
  • MySQL
  • Power BI

HR Data Pipeline with Azure

Built an HR data pipeline with ADF, Databricks, Blob Storage, and Power BI for reporting.

  • ADF
  • Azure Databricks
  • Blob Storage
  • Power BI

Sales Data Pipeline

ETL pipeline from SQL Server to BigQuery with Airflow orchestration and Looker Studio dashboards.

  • SQL Server
  • Airflow
  • BigQuery
  • Python
  • Looker Studio

Human Resources Data Pipeline

Built an HR ETL pipeline with PL/SQL and SQL, Snowflake on Azure, and Informatica integration.

  • PL/SQL
  • Snowflake
  • Informatica
  • Power BI

YouTube Data Pipeline

Automated ETL from YouTube API to Power BI dashboard for content analytics.

  • YouTube API
  • Python
  • Snowflake
  • ETL
  • Power BI

Contact Management

JavaFX desktop application to manage contacts & groups. Used JDBC, MySQL and Log4j.

  • Java
  • JavaFX
  • JDBC
  • MySQL