cv
Basics
Name | Armin Ziaei |
Label | Senior MLOps Engineer |
Email | armin.ziaei.tech@gmail.com |
Phone | (650) 620-0912 |
Url | https://armixz.github.io/ |
Summary | Senior MLOps Engineer specializing in Kubernetes infrastructure, CI/CD pipelines, and enterprise-scale cloud solutions. Computer Science graduate from UT Dallas with expertise in machine learning operations, distributed systems, and production software engineering. |
Work
-
2025.03 - Present Senior MLOps Engineer
Aktus AI
Enterprise Kubernetes Infrastructure · Marketplace Deployment Solutions · CI/CD Pipeline Engineering
- MLOps
- Kubernetes
- Infrastructure
- GPU Workloads
- Auto-scaling
-
2024.08 - 2025.03 MLOps Engineer
Aktus AI
Production RAG Systems · Microservices Architecture · CI/CD Automation · Google Cloud Platform
- MLOps
- RAG Systems
- GCP
- Helm Charts
- GitHub Actions
-
2024.01 - 2025.02 Cloud Operations Engineer
HHAeXchange
Backup & Disaster Recovery · Healthcare Data Protection · Automation Workflows · HIPAA Compliance
- CloudOps
- Disaster Recovery
- Healthcare Compliance
- Python Automation
- 500TB+ Data
-
2023.01 - 2023.12 System Management and Response
HHAeXchange
Incident Response Management · Alert Processing · SLA Compliance · Real-time Monitoring
- Incident Management
- 300+ Daily Alerts
- 99.8% SLA
- MTTR Reduction
- Escalation Workflows
-
2022.01 - 2022.12 System Operations Engineer
HHAeXchange
Monitoring Solutions · Datadog Implementation · Technical Training · Operational Procedures
- Monitoring
- Datadog
- Training Programs
- Custom Dashboards
- Team Leadership
-
2021.04 - 2021.12 System Operations Engineer Intern
HHAeXchange
Production Architecture Monitoring · Root Cause Analysis · Performance Optimization · System Stability
- System Monitoring
- Root Cause Analysis
- Performance Optimization
- Production Support
Education
-
2018 - 2022 Texas, USA
BS
The University of Texas at Dallas
Computer Science
- Discrete Mathematics for Computing
- Computer Architecture
- Probability and Statistics in Computer Science
- Data Structures and Introduction to Algorithmic Analysis
- Systems Programming in UNIX
- Digital Logic and Computer Design
- Database Systems
- Operating Systems Concepts
- Advanced Algorithm Design and Analysis
- Automata Theory
- Artificial Intelligence
- Introduction to Machine Learning
- Compiler Design
- Data and Applications Security
- Computer and Network Security
- Introduction to VLSI Design
-
2015 - 2017 Texas, USA
AS
Collin College
Computer Science
- Calculus
- Discrete Mathematics for Computing
- Programming Fundamentals
- University Physics
Certificates
Datadog Fundamentals | Datadog | 2024 |
Linux LPIC-1 | Linux Professional Institute | 2015 |
Publications
-
2024.05.18 A Comparative Analysis of the Machine Learning Methods for Predicting Diabetes
Journal of Operations Intelligence
Diabetes can lead to various health problems and complications, such as cardiovascular disease, kidney damage (nephropathy), eye issues, neuropathy, and foot ailments. In this study, we compare the performance of nine machine-learning classification models in predicting diabetes.
-
2024 A Comparison of Methods for Predicting Heart Disease: Neural Network, XGBoost, Gradient Boost, Logistic Regression, and SVM
Under Review
A comprehensive comparative study of machine learning methods for heart disease prediction, analyzing the performance of neural networks, ensemble methods, and traditional statistical approaches to improve cardiovascular disease diagnosis.
-
2023.05.08 Comparative Study of Decision Tree, AdaBoost, Random Forest, Naïve Bayes, KNN, and Perceptron for Heart Disease Prediction
IEEE
Globally, cardiovascular diseases (CVD) are estimated to account for more than 32% of all deaths. Consequently, CVD has become a global health problem, and timely diagnosis is essential (WHO, 2021). Screening for risk factors accelerates the diagnosis and management of CVD, resulting in a more effective and rapid response, reducing the risk of death. This article compares six classification models, AdaBoost, Random Forest, Decision Tree, KNN, Naive Bayes, and Perceptron, to predict CVD symptoms.
Skills
Languages | Python, C/C++ |
Infrastructure | Kubernetes, Ansible, Kafka, RabbitMQ, SQS |
Clouds | AWS, GCP, Azure |
DevOps | Git, Terraform, Helm, Docker |
MLOps | Kubeflow, MLflow, Weights & Biases |
ML/AI Frameworks | PyTorch, TensorFlow |
Monitoring | Datadog, Prometheus, Grafana |
Vector DB | Pinecone, Qdrant |
Marketplace | GCP Deployment Manager, AWS CloudFormation |
Systems | Linux/Unix, TCP/UDP, Database Design |
Hardware/Logic | Verilog/VHDL, Prolog Logic |
Languages
Farsi | Native speaker |
English | Fluent |
Interests
Technology & Innovation | Cloud Technologies, Machine Learning, Open Source, DevOps Tools, Kubernetes, Tech Conferences |
Hardware & DIY | Flipper Zero, Raspberry Pi, Hardware Hacking, DIY Projects, Electronics, Programming |
Projects
- 2024.01 - 2024.04
Enterprise Kafka Consumer Management Platform
Architected scalable Kafka consumer management application using Django and Python, implementing real-time message processing orchestration with automated consumer scaling and health monitoring for distributed streaming data pipelines supporting high-throughput message consumption across multiple topics and partitions.
- Kafka
- Django
- Python
- Real-time Processing
- Distributed Systems
- Auto-scaling
- 2024.01 - 2024.04
Big Data Monitoring Infrastructure
Engineered comprehensive monitoring solution for 2M+ weekly transaction data points using Datadog and Python, implementing custom metrics collection, automated alerting systems, and performance optimization strategies that reduced monitoring costs by 35% while improving system reliability to 99.9% uptime and enabling proactive incident detection.
- Big Data
- Datadog
- Python
- Monitoring
- Cost Optimization
- 99.9% Uptime
- 2023.09 - 2023.12
Embedded Systems Calculator for Flipper Zero
Developed feature-rich programmer calculator application for Flipper Zero hardware platform using C and FURI framework, implementing hexadecimal/binary/decimal conversions, bitwise operations, and memory management optimized for embedded systems with 64KB RAM constraints and real-time user interface responsiveness.
- Embedded Systems
- C Programming
- FURI Framework
- Memory Optimization
- Hardware Constraints
- 2023.09 - 2023.12
Datadog Metrics Automation Framework
Led development of automated custom metrics generation system using Python and Datadog API integration, implementing database query performance monitoring, automated threshold configuration, and dynamic alerting rules that reduced manual monitoring overhead by 80% and improved database performance visibility by 90%.
- Python
- Datadog API
- Automation
- Database Monitoring
- 80% Overhead Reduction
- 2023.09 - 2023.12
Infrastructure Automation with Ansible
Spearheaded enterprise-wide system update automation using Ansible playbooks, implementing zero-downtime patching strategies for 200+ servers, automated rollback mechanisms, and compliance reporting that reduced maintenance windows by 70% and eliminated manual patching errors across production environments.
- Ansible
- Infrastructure Automation
- Zero-downtime Deployment
- 200+ Servers
- 70% Reduction
- 2023.03 - 2023.06
Predictive Analytics with Datadog APM
Implemented machine learning forecasting system analyzing Datadog APM performance data, developing predictive models for usage pattern analysis and failure prediction using time-series analysis, anomaly detection algorithms, and automated capacity planning recommendations that improved system reliability by 45%.
- Machine Learning
- Datadog APM
- Predictive Analytics
- Time-series Analysis
- 45% Reliability Improvement
- 2022.09 - 2022.12
PowerShell Log Management Automation
Developed comprehensive PowerShell automation framework for enterprise log file management and disk cleanup operations, implementing scheduled task orchestration, custom compression algorithms, and automated archival processes that reduced storage costs by 60% and eliminated manual log maintenance across 150+ Windows servers.
- PowerShell
- Automation
- Log Management
- 60% Cost Reduction
- 150+ Servers
- 2021.09 - 2021.12
Machine Learning System Usage Analytics
Conducted large-scale system usage analysis using Python and scikit-learn on petabyte-scale datasets, applying clustering algorithms and predictive modeling to produce performance optimization recommendations that identified opportunities to reduce resource utilization by 40% and improved system efficiency metrics.
- Machine Learning
- Python
- scikit-learn
- Petabyte-scale Data
- 40% Resource Optimization
- 2021.09 - 2021.12
Multi-Platform Data Engineering Pipeline
Architected comprehensive data analysis and engineering solution using Python, .NET, Shell scripting, and advanced Regex processing, implementing ETL pipelines for terabyte-scale data processing, automated data validation frameworks, and cross-platform integration that improved data processing speed by 300% and reduced manual data handling errors by 95%.
- Data Engineering
- Python
- .NET
- ETL Pipelines
- 300% Speed Improvement
- 95% Error Reduction
- 2022.08 - 2022.12
Halo Collar Activity Recognition
Engineered a GPS and sensor-based machine learning model for canine activity classification using Python and scikit-learn, achieving 89% accuracy across 8 distinct behavioral patterns and processing real-time sensor data from 500+ collar devices for the PAWS LLC partnership.
- Machine Learning
- Python
- scikit-learn
- GPS Sensors
- Data Processing
- 2022.08 - 2022.12
Distributed Network Communication Protocol
Architected CRSP-compliant distributed system with controller, renderer, and server components using UDP socket programming in Python, implementing concurrent message processing with multithreading and multiprocessing for file streaming operations supporting pause, resume, and restart functionality across networked hosts.
- Network Programming
- Python
- UDP
- Multithreading
- Distributed Systems
- 2022.08 - 2022.12
Production Compiler Implementation
Architected a complete compiler system using Java with JFlex lexical analyzer and CUP parser generator, implementing a comprehensive grammar supporting classes, methods, arrays, expressions, and control flow statements, validated through 21 distinct test cases covering syntax analysis, semantic checking, and error handling for type safety and program correctness.
- Compiler Design
- Java
- JFlex
- CUP Parser
- Grammar Implementation
- 2022.01 - 2022.05
Enterprise Task Management Platform
Led development of full-stack web application using PHP, Apache, and SQL database with comprehensive Entity-Relationship modeling, implementing normalized database design supporting user authentication, task tracking, and reporting capabilities with web-based interface for enterprise task management workflows.
- Full-Stack Development
- PHP
- Apache
- SQL
- Database Design
- 2022.01 - 2022.05
Operating System Process Simulator
Developed multi-process computer system simulation using C/C++ with separate CPU and Memory processes communicating via Inter-Process Communication, demonstrating deep understanding of operating system concepts, including process scheduling, memory management, and low-level system programming with concurrent execution handling.
- Operating Systems
- C/C++
- IPC
- Process Scheduling
- Memory Management
- 2021.08 - 2021.12
Computer Vision Digit Classification
Built production-ready handwritten digit recognition system for MNIST dataset using Python and scikit-learn, implementing image preprocessing pipelines for 28x28 pixel grayscale images, achieving 97%+ classification accuracy with comprehensive data validation and Kaggle-style competition submission format.
- Computer Vision
- Python
- scikit-learn
- MNIST
- Image Processing
- 2021.08 - 2021.12
Machine Learning Algorithm Library
Engineered a comprehensive ML framework implementing 12 fundamental algorithms, including Linear Regression (direct method, polynomial, SGD), K-Means clustering, PCA with eigenface analysis, K-Nearest Neighbors on the MNIST dataset, Logistic Regression, and Decision Trees, with performance benchmarking across multiple datasets and automated hyperparameter optimization.
- Machine Learning
- Python
- Algorithm Implementation
- Performance Benchmarking
- 2021.08 - 2021.12
Prolog Knowledge Representation System
Developed an enterprise-grade question-answering system using Prolog with a first-order logic implementation, featuring a comprehensive knowledge base containing 50+ facts and 20+ inference rules, supporting complex logical reasoning queries with 95% accuracy for domain-specific questions about retail transactions and inventory management.
- AI Programming
- Prolog
- Knowledge Representation
- Logic Programming
- Question Answering
- 2021.08 - 2021.12
Multi-Algorithm AI Search Engine
Implemented an 8-puzzle solver using four distinct search algorithms, including Depth-First Search, Iterative Deepening Search, and A* search with Manhattan distance and misplaced tile heuristics, achieving optimal solutions and providing a command-line interface with file-based input processing for automated testing and performance comparison.
- Artificial Intelligence
- Search Algorithms
- A* Search
- Heuristics
- Problem Solving
- 2020.08 - 2020.12
32-Bit ALU Hardware Architecture
Designed a 32-bit Arithmetic Logic Unit supporting 16 operations using Verilog HDL, implementing a complex multiplexer architecture with error detection for overflow and divide-by-zero conditions and sequential logic components including an accumulator register and flip-flops, validated with a comprehensive test bench and meeting timing requirements for 100 MHz operation.
- Digital Systems
- Verilog HDL
- Hardware Design
- ALU Architecture
- Test Bench
- 2019.01 - 2019.05
Multi-Protocol Network Chat System
Engineered real-time communication system supporting both TCP and UDP protocols in a Unix environment using C/C++ and Python, implementing client-server architecture with concurrent connection handling, demonstrating network programming expertise and understanding of protocol-level communication differences.
- Network Programming
- TCP/UDP
- C/C++
- Python
- Client-Server Architecture
- 2018.08 - 2018.12
Custom Shell Command Processor
Developed a comprehensive Bash command-line interface application in a Unix environment using shell scripting and system programming, implementing custom command parsing, input validation, and process management capabilities. The application enhanced system interaction efficiency by streamlining command processing, automating task execution, and improving the user experience for routine administrative operations.
- Systems Programming
- Bash Scripting
- Unix
- Command Processing
- Process Management