Hi there 👋

I'm a full-stack AI/DATA Engineer with 4+ years of experience developing and implementing end-to-end machine learning and analytics platforms. My work covers designing and building data platforms, model development, and deployment (e.g., building ETL data pipelines, algorithm development, distributed computation, MLOps, and monitoring).

I obtained my Master’s degree in Mechanical Engineering from Kookmin University in 2020. During my studies, I worked in the Thermal Environmental Engineering Labs under the supervision of Han Hwataik. My research focused on computational modeling to predict indoor air quality, with open-source data and CO2 sensor readings as input features. This work was published in the Journal of Mechanical Science and Technology in 2021.

In my spare time, I pursue several research interests, including reliable machine learning workflows, machine learning explainability and uncertainty, and model generalization.


🧑🏼‍💻 WORK EXPERIENCE

AI/DATA Engineer at Wellysis (Spin-off company from Samsung SDS)

Healthcare Tech

Oct 2022 - Present (Seoul, South Korea)

  • Achieved a 25% reduction in storage usage and removed data mart and warehouse complexity, without compromising load performance for high-frequency ECG data (256 Hz), by adopting a data lakehouse architecture with a deliberate data partitioning strategy and MinIO as the storage layer. This improved overall system efficiency, reduced maintenance overhead, and made the data infrastructure more cost-effective and scalable.
  • Increased ML model training speed by 40% and accuracy by 5% by implementing a robust cluster-based data sampling algorithm during pre-processing of highly imbalanced data. The change also improved stability during the training phase, contributing to overall model performance.
  • Accelerated AI development by 5x by designing and building a data platform that automates migration and processing pipelines across diverse sources in a distributed manner, complete with a user-friendly interface.
  • Improved the AI model's confidence in detecting abnormal ECG patterns for patients by implementing BeatLab, a direct human feedback system with comprehensive monitoring features, integrated into the ML infrastructure.
  • Architected dashboards for diverse purposes, including model monitoring and business metrics, resulting in improved data visualization, better-informed decision-making, and a stronger business strategy.

Data Scientist at Intellectual Technology Space - ITSROOM

IoT predictive maintenance

Feb 2020 - Aug 2022 (Ulsan, South Korea)

  • Collaborated with Korea Aerospace Industries (KAI) on research and development projects to implement a machine learning system for real-time detection of anomalies and abnormalities, achieving 88% prediction accuracy and 175 ms latency.
  • Developed a real-time anomaly detection system for high-frequency rotating machinery using Spark Streaming and Kafka.
  • Implemented a high-performance Lakehouse architecture for an efficient data platform.
  • Designed user dashboards for monitoring and Remaining Useful Life (RUL).

🛠️ TECHNOLOGIES AND TOOLS

  • AI/ML Tools and Frameworks: TensorFlow, Scikit-learn, NumPy, MLflow, Algorithms (Clustering, Feature Extraction, Anomaly Detection).
  • Query Engines and Data Manipulation: SQL, Polars, Pandas, PySpark, DuckDB.
  • Orchestration Tools: Airflow, Dagster.
  • Databases and Storage: MinIO, Delta Lake, PostgreSQL, MariaDB, MySQL, ScyllaDB, Hadoop ecosystem.
  • Dashboard and Visualization Tools: Grafana, Superset.
  • Other Tools: Bash scripting, Git, Docker.

🎯 SKILLS

  • Skilled in Python programming with a focus on AI/DATA infrastructure and scientific computing.
  • Expertise in essential Python packages for data manipulation and machine learning development, such as Pandas, Scikit-learn, NumPy, TensorFlow, PyTorch, and Matplotlib.
  • Proficient in ML modeling, including tasks like data transformation, segmentation, anomaly detection, augmentation, and feature extraction.
  • Strong understanding of various machine learning techniques, including deep learning (LSTM, autoencoder, ResNet), tree-based algorithms (decision tree, random forest), clustering (k-means, DBSCAN), and dimensionality reduction (PCA, manifold learning).
  • Familiar with everyday tools like Bash, Git, SQL, and Docker.
  • Experience using MLflow and Weights & Biases (wandb) for model experimentation, tracking, and monitoring.
  • Developed Python APIs using FastAPI for communication with front-end services.
  • Hands-on experience with Apache technologies such as Airflow, Spark, and Kafka for ETL data processing and distributed computing.
  • Designed and built a persistent Delta Lake for a data lakehouse.

📚 ACADEMIC PUBLICATIONS

  • Predicting indoor PM2.5/PM10 concentrations using simplified neural network models
    • Muhammad Hatta, Hwataik Han
    • Journal of Mechanical Science and Technology 2021 [link]
  • Comparison of performance of heat recovery ventilator and air purifier in reducing indoor PM10 concentrations in a classroom
    • Muhammad Hatta, Hwataik Han
    • Clima 2019 [paper]
  • Smart Ventilation for Energy Conservation in Buildings
    • Hwataik Han, Muhammad Hatta, Haolia Rahman
    • Journal of Novel Carbon Resource Sciences & Green Asia Strategy 2019 [paper]
  • Ventilation Strategy for Acceptable PM2.5 in a Classroom Depending on Building Characteristics
    • Muhammad Hatta, Hwataik Han
    • Proceedings of the Society of Air-conditioning and Refrigerating Engineers of Korea (대한설비공학회) Conference 2019 [link]
  • Designing of Inverse Taper Wind Turbine Blade for Pekanbaru Wind Speed Condition
    • Awaludin Martin, Muhammad Hatta
    • International Journal of Environmental Research & Clean Energy 2019 [paper]