Basel Alyafi

Data Engineer

Vueling Airlines

Biography

I am an AWS Certified Solutions Architect Associate (SAA-C03) and Cloud Practitioner (CLF-C01). I am also a curious developer who is always eager to master new skills, with a proven track record of learning new tools in record time; Microsoft Azure services and Apache NiFi are two examples. I find it compelling to see ideas translated into code and then into a useful product. Proud to have migrated Eyepix.net to Azure in 10 weeks and to have been called the “Docker Reference” in my last job. Passionate about cloud services, cats, and photography.

Interests

  • Cloud (AWS + Azure)
  • Neat Python programming
  • Microservice Orchestration
  • Automation/DevOps
  • Docker + Compose
  • Passionate about developing others
  • Curiosity first

Education

  • Erasmus Mundus Joint Master's in Medical Imaging and Applications (MAIA), 2017 - 2019

    University of Bourgogne (France), University of Cassino (Italy), and University of Girona (Spain), jointly.

  • BSc in Informatics and Communication Engineering (first class), 2009 - 2015

    Yarmouk Private University (Syria).

Experience

 
 
 
 
 

Data Engineer

Vueling Airlines

Jun 2023 – Present Viladecans, Barcelona, Spain

Tasks & Achievements

  • Managing and monitoring ETL pipelines in the cloud using Airflow and Grafana
  • Reviewing error logs on Elasticsearch and AWS CloudWatch
  • Designed and implemented two data-quality checks using object-oriented Python in a hybrid Redshift-MySQL environment
  • Dealing with six-digit-size databases on AWS Redshift
  • Reading daily dashboards and reporting from Tableau
  • Developing efficient, complex queries to join large ticket-sales tables
  • Excelled in using Airflow sensors to orchestrate Kubernetes pods

Tech stack:

  • AWS Redshift + MySQL
  • Python
  • Airflow
  • Docker
  • Grafana + Tableau
  • Elasticsearch
 
 
 
 
 

Data Solution Developer

Desidedatum Data Company

Apr 2022 – Feb 2023 Ciudad de Justicia, Barcelona, Spain

Offering data solutions to Spanish governmental bodies that are opting into Spain's open-data and transparency program.

Achievements:

  • Architected an ETL dataflow on NiFi to query Oracle and upload the results to CKAN on EC2.
  • Helped a Spanish entity revive their website by increasing its EBS size.
  • Debugged GCP Python functions in Visual Studio using the Functions Framework integration.
  • Found a way to migrate AWS EC2 instances across accounts belonging to different organisations using AMIs and AWS Backup.
  • Adopted TablePress for creating editable and user-friendly data tables.
  • Modified user permissions to allow secure access to, and editing of, a Postgres micro-service on Ubuntu.

Tech stack:

  • Docker & Docker-compose
  • CKAN
  • AWS
  • Python
  • Apache NiFi
  • Postgres & MySQL
  • WordPress
 
 
 
 
 

Machine Learning and AI Team Lead

MGS Software

Jul 2021 – Apr 2022 Istanbul, Turkey (remotely)

Working mainly on Eyepix ML models and CI/CD, and cooperating with the backend team and business analysts to offer reliable solutions to the end user.

Achievements:

  • Led the DevOps/MLOps of the Eyepix project (eyepix.net).

    • Migrated all micro-services to use Azure Container Instances, Service Bus, and Redis.
    • Kept the same services runnable locally using RabbitMQ, Docker, and a local Redis.
    • Developed a single docker-compose file to manage 14 micro-services with a single config file.
    • Implemented a one-line switching mechanism (Azure / on-premises).
    • Demonstrated notable stamina as my team and I tested and deployed 5 DL modules in 1.5 months.
    • Reviewed and tested several modules, including fire detection, asset guardian, and heatmap.
  • Contributed to petner.net.

    • Classified dogs and cats into 175 breeds using Azure Custom Vision (Petner application).
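
As an illustration only, a one-line switch between Azure and on-premises services like the one mentioned above could be driven by a single setting. The sketch below is not the Eyepix code: the variable name `DEPLOY_TARGET`, the profile keys, and the service labels are all hypothetical placeholders.

```python
import os

# Hypothetical connection profiles; every name here is a placeholder.
PROFILES = {
    "azure": {"broker": "azure-service-bus", "cache": "azure-redis"},
    "local": {"broker": "rabbitmq", "cache": "local-redis"},
}


def load_profile(env=None):
    """Return the profile selected by DEPLOY_TARGET (defaults to 'local')."""
    env = os.environ if env is None else env
    target = env.get("DEPLOY_TARGET", "local").lower()
    if target not in PROFILES:
        raise ValueError(f"Unknown DEPLOY_TARGET: {target!r}")
    return PROFILES[target]
```

With this layout, changing one value (for example `DEPLOY_TARGET=azure` in a shared config file) would be enough to point every service at the other backend.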
 
 
 
 
 

Researcher in Training

Computer Vision and Robotics Institute, University of Girona

Feb 2020 – Dec 2020 Girona, Spain

Achievements:

  • Statistically analysed a clinical validation in which two radiologists evaluated the realism of 150 mammography lesions, both real and synthesised by an in-house GAN.
  • Achieved such a high level of realism in the synthetic images that two expert radiologists classified 45% of them as real.
  • Co-authored two papers, one at the SPIE Medical Imaging conference and one at the IWBI workshop.
  • Reviewed Radiomics and Deep Learning for Triple Negative Breast Cancer classification in mammography.
  • Co-mentored and trained an MSc thesis intern on Machine Learning for medical image analysis; designed two data engineering assignments to help the intern develop problem-solving techniques.
  • Won the €55K Catalan FI 2020 grant.
  • Funds: FI 2020 grant, SMARTER project (Ref. DPI2015-68442-R), and ICEBERG project (Ref. RTI2018-096333-B-I00).

Outcome:

  • First author, SPIE Medical Imaging conference paper (DCGANs for Realistic Breast Mass Augmentation in X-ray Mammography).
  • First author, IWBI workshop paper (Quality analysis of DCGAN-generated mammography lesions).
 
 
 
 
 

MSc Thesis and Summer Internship

Computer Vision and Robotics Institute, University of Girona

Feb 2019 – Dec 2019 Girona, Spain

Generative Adversarial Networks (GANs) for realistic data augmentation and lesion simulation in X-ray breast imaging using PyTorch. The Hologic mammograms were obtained from the OMI-DB dataset (OPTIMAM).

Responsibilities included:

  • Data filtering and preprocessing.

    • Removing outliers.
    • Histogram normalisation.
    • Random patch extraction from breast tissue and lesions (if applicable).
  • Deep Convolutional GANs implementation (PyTorch).

    • Training and fine-tuning the hyper-parameters.
    • Validating the models using k-fold cross-validation.
    • Testing using unseen cases (lesion/tissue classification).
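
Two of the steps listed above, random patch extraction and k-fold cross-validation, can be sketched in a few lines. This is a simplified, pure-Python illustration and not the thesis code (which used PyTorch); the function names `extract_random_patches` and `k_fold_indices` and the use of nested lists as images are assumptions made for the example.

```python
import random


def extract_random_patches(image, patch_size, n_patches, rng):
    """Crop n_patches square patches at uniformly random positions."""
    height, width = len(image), len(image[0])
    if patch_size > height or patch_size > width:
        raise ValueError("patch_size exceeds image dimensions")
    patches = []
    for _ in range(n_patches):
        top = rng.randint(0, height - patch_size)   # randint bounds are inclusive
        left = rng.randint(0, width - patch_size)
        patches.append([row[left:left + patch_size]
                        for row in image[top:top + patch_size]])
    return patches


def k_fold_indices(n_samples, k, rng):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds covering all samples
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

Passing a seeded `random.Random` instance as `rng` keeps both the patch positions and the fold assignment reproducible between runs.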

Achievements:

  • Developed Generative Adversarial Networks for realistic data augmentation in X-ray breast imaging.
  • Generated 2,000 realistic, diverse mammographic lesions using the trained GANs.
  • Boosted imbalanced classification from 87% to 96% F1 score by incorporating synthetic cancer lesions in the dataset.
  • Employed t-distributed Stochastic Neighbour Embedding for 2D visualisation of high-dimensional data using Matplotlib.
  • Improved data homogeneity through filtering outliers and histogram normalisation using Python.
  • Excelled in training and fine-tuning GANs to avoid divergence and mode collapse using PyTorch.

Supervisors: Dr Robert Marti, Dr Oliver Diaz.

 
 
 
 
 

Summer Intern

Coronis Computing

Jul 2018 – Sep 2018 Girona, Spain

Researched methods for matching high- and low-quality dermoscopic images, including:

  • Feature detection and description (using SIFT).
  • Deep Learning (Siamese network and face detection).

Employer: Prof Rafael Garcia (rafael.garcia@udg.edu)

 
 
 
 
 

Telecommunications Engineer

Eastern Telecommunication Services (ETS)

Jun 2016 – Nov 2016 Damascus, Syria
Performed cell-site surveys, operation, maintenance, and upgrades.
 
 
 
 
 

Electrical Engineering Technician

Syrian Telecommunication Establishment (STE)

Mar 2010 – Aug 2017 Damascus, Syria
Maintenance of electronic devices (rectifiers, inverters) and electrical generators.

Accomplishments

Google Cloud Platform Big Data and Machine Learning Fundamentals

Contents:

  • Identify the purpose and value of the key Big Data and Machine Learning products in Google Cloud.
  • Use Cloud SQL and Dataproc to migrate existing MySQL and Hadoop workloads to Google Cloud.
  • Employ BigQuery to carry out interactive data analysis.
  • Recommend products using Cloud SQL and Spark.
  • Predict visitor purchases using BigQuery ML.
  • Build real-time dashboards with Pub/Sub, Dataflow, and Data Studio.
See certificate

QlikSense Analytics Development

Honours award.

Contents:

  • Associative models.
  • Charts and data visualisation.
  • QlikSense code expressions.
  • Data integrity and governance.
See certificate

SQL for Data Science

Contents:

  • Entity relationship diagrams.
  • Data filtering and aggregation.
  • Joins and data profiling.
See certificate

Structuring Machine Learning Projects

Contents:

  • Transfer learning.
  • Error analysis.
  • Multi-task learning.
  • Data mismatch.
See certificate

Neural Networks and Deep Learning

Contents:

  • Vectorized forward and backward propagation.
  • Activation functions.
  • (Hyper)parameters.
See certificate

Convolutional Neural Networks

Contents:

  • Residual and Inception nets.
  • Object detection.
  • Face recognition.
  • Neural style transfer.
See certificate

Improving Deep Neural Networks

Contents:

  • Hyperparameter tuning.
  • Bias (underfitting) and variance (overfitting).
  • Parameter initialization types.
  • Regularization and optimization algorithms.
See certificate

Machine Learning on Coursera

Contents:

  • Linear and logistic regression.
  • Neural networks.
  • Support vector machines.
  • Unsupervised learning (K-means clustering).
  • Anomaly detection.
  • Recommender systems.
  • Large-scale machine learning.
See certificate

Cisco Certified Network Associate (CCNA) Attendance

Contents:

  • Internetworking.
  • TCP/IP.
  • IP routing (EIGRP, OSPF).
  • Layer 2 switching and Spanning Tree Protocol.
  • Security and Cisco's wireless technologies.

Projects

Eyepix

Migrating an ML system to Azure

ICEBERG

Image Computing for Enhancing Breast Cancer Radiomics.

SMARTER

Smart Image Analysis for Screening Challenges in Breast Cancer.

Research Output

Skills

Python

Matlab

C++

SQL

Ubuntu

QlikSense

Data analysis and visualisation

PyTorch

Docker

Git

Talks

Quality analysis of DCGAN-generated mammography lesions

A talk at IWBI 14 workshop 2019, Leuven, Belgium.

Generating synthetic lesions for improving breast lesion detection in mammography

A talk at EuSoMII annual meeting 2019, Valencia.