cv

Senior MLOps Engineer specializing in Kubernetes infrastructure, CI/CD pipelines, and enterprise-scale cloud solutions. Computer Science graduate from UT Dallas with expertise in machine learning operations, distributed systems, and production software engineering.

Basics

Name Armin Ziaei
Label Senior MLOps Engineer
Email armin.ziaei.tech@gmail.com
Phone (650) 620-0912
Url https://armixz.github.io/

Work

  • 2025.03 - Present
    Senior MLOps Engineer
    Aktus AI
    Enterprise Kubernetes Infrastructure · Marketplace Deployment Solutions · CI/CD Pipeline Engineering
    • MLOps
    • Kubernetes
    • Infrastructure
    • GPU Workloads
    • Auto-scaling
  • 2024.08 - 2025.03
    MLOps Engineer
    Aktus AI
    Production RAG Systems · Microservices Architecture · CI/CD Automation · Google Cloud Platform
    • MLOps
    • RAG Systems
    • GCP
    • Helm Charts
    • GitHub Actions
  • 2024.01 - 2025.02
    Cloud Operations Engineer
    HHAeXchange
    Backup & Disaster Recovery · Healthcare Data Protection · Automation Workflows · HIPAA Compliance
    • CloudOps
    • Disaster Recovery
    • Healthcare Compliance
    • Python Automation
    • 500TB+ Data
  • 2023.01 - 2023.12
    System Management and Response
    HHAeXchange
    Incident Response Management · Alert Processing · SLA Compliance · Real-time Monitoring
    • Incident Management
    • 300+ Daily Alerts
    • 99.8% SLA
    • MTTR Reduction
    • Escalation Workflows
  • 2022.01 - 2022.12
    System Operations Engineer
    HHAeXchange
    Monitoring Solutions · Datadog Implementation · Technical Training · Operational Procedures
    • Monitoring
    • Datadog
    • Training Programs
    • Custom Dashboards
    • Team Leadership
  • 2021.04 - 2021.12
    System Operations Engineer Intern
    HHAeXchange
    Production Architecture Monitoring · Root Cause Analysis · Performance Optimization · System Stability
    • System Monitoring
    • Root Cause Analysis
    • Performance Optimization
    • Production Support

Education

  • 2018 - 2022
    Texas, USA
    BS
    The University of Texas at Dallas
    Computer Science
    • Discrete Mathematics for Computing
    • Computer Architecture
    • Probability and Statistics in Computer Science
    • Data Structures and Introduction to Algorithmic Analysis
    • Systems Programming in UNIX
    • Digital Logic and Computer Design
    • Database Systems
    • Operating Systems Concepts
    • Advanced Algorithm Design and Analysis
    • Automata Theory
    • Artificial Intelligence
    • Introduction to Machine Learning
    • Compiler Design
    • Data and Applications Security
    • Computer and Network Security
    • Introduction to VLSI Design
  • 2015 - 2017
    Texas, USA
    AS
    Collin College
    Computer Science
    • Calculus
    • Discrete Mathematics for Computing
    • Programming Fundamentals
    • University Physics

Certificates

  • Datadog Fundamentals
    Datadog, 2024
  • Linux LPIC-1
    Linux Professional Institute, 2015

Skills

  • Languages
    • Python
    • C/C++
  • Infrastructure
    • Kubernetes
    • Ansible
    • Kafka
    • RabbitMQ
    • SQS
  • Clouds
    • AWS
    • GCP
    • Azure
  • DevOps
    • Git
    • Terraform
    • Helm
    • Docker
  • MLOps
    • Kubeflow
    • MLflow
    • Weights & Biases
  • ML/AI Frameworks
    • PyTorch
    • TensorFlow
  • Monitoring
    • Datadog
    • Prometheus
    • Grafana
  • Vector DB
    • Pinecone
    • Qdrant
  • Marketplace
    • GCP Deployment Manager
    • AWS CloudFormation
  • Systems
    • Linux/Unix
    • TCP/UDP
    • Database Design
  • Hardware/Logic
    • Verilog/VHDL
    • Prolog Logic

Languages

  • Farsi
    Native speaker
  • English
    Fluent

Interests

  • Technology & Innovation
    • Cloud Technologies
    • Machine Learning
    • Open Source
    • DevOps Tools
    • Kubernetes
    • Tech Conferences
  • Hardware & DIY
    • Flipper Zero
    • Raspberry Pi
    • Hardware Hacking
    • DIY Projects
    • Electronics
    • Programming

Projects

  • 2024.01 - 2024.04
    Enterprise Kafka Consumer Management Platform
    Architected scalable Kafka consumer management application using Django and Python, implementing real-time message processing orchestration with automated consumer scaling and health monitoring for distributed streaming data pipelines supporting high-throughput message consumption across multiple topics and partitions.
    • Kafka
    • Django
    • Python
    • Real-time Processing
    • Distributed Systems
    • Auto-scaling
  • 2024.01 - 2024.04
    Big Data Monitoring Infrastructure
    Engineered comprehensive monitoring solution for 2M+ weekly transaction data points using Datadog and Python, implementing custom metrics collection, automated alerting systems, and performance optimization strategies that reduced monitoring costs by 35% while improving system reliability to 99.9% uptime and enabling proactive incident detection.
    • Big Data
    • Datadog
    • Python
    • Monitoring
    • Cost Optimization
    • 99.9% Uptime
  • 2023.09 - 2023.12
    Embedded Systems Calculator for Flipper Zero
    Developed feature-rich programmer calculator application for Flipper Zero hardware platform using C and FURI framework, implementing hexadecimal/binary/decimal conversions, bitwise operations, and memory management optimized for embedded systems with 64KB RAM constraints and real-time user interface responsiveness.
    • Embedded Systems
    • C Programming
    • FURI Framework
    • Memory Optimization
    • Hardware Constraints
  • 2023.09 - 2023.12
    Datadog Metrics Automation Framework
    Led development of automated custom metrics generation system using Python and Datadog API integration, implementing database query performance monitoring, automated threshold configuration, and dynamic alerting rules that reduced manual monitoring overhead by 80% and improved database performance visibility by 90%.
    • Python
    • Datadog API
    • Automation
    • Database Monitoring
    • 80% Overhead Reduction
  • 2023.09 - 2023.12
    Infrastructure Automation with Ansible
    Spearheaded enterprise-wide system update automation using Ansible playbooks, implementing zero-downtime patching strategies for 200+ servers, automated rollback mechanisms, and compliance reporting that reduced maintenance windows by 70% and eliminated manual patching errors across production environments.
    • Ansible
    • Infrastructure Automation
    • Zero-downtime Deployment
    • 200+ Servers
    • 70% Reduction
  • 2023.03 - 2023.06
    Predictive Analytics with Datadog APM
    Implemented machine learning forecasting system analyzing Datadog APM performance data, developing predictive models for usage pattern analysis and failure prediction using time-series analysis, anomaly detection algorithms, and automated capacity planning recommendations that improved system reliability by 45%.
    • Machine Learning
    • Datadog APM
    • Predictive Analytics
    • Time-series Analysis
    • 45% Reliability Improvement
  • 2022.09 - 2022.12
    PowerShell Log Management Automation
    Developed comprehensive PowerShell automation framework for enterprise log file management and disk cleanup operations, implementing scheduled task orchestration, custom compression algorithms, and automated archival processes that reduced storage costs by 60% and eliminated manual log maintenance across 150+ Windows servers.
    • PowerShell
    • Automation
    • Log Management
    • 60% Cost Reduction
    • 150+ Servers
  • 2021.09 - 2021.12
    Machine Learning System Usage Analytics
    Conducted large-scale system usage analysis using Python and scikit-learn on petabyte-scale datasets, applying clustering algorithms and predictive modeling to produce performance optimization recommendations that identified opportunities to reduce resource utilization by 40% and improved system efficiency metrics.
    • Machine Learning
    • Python
    • scikit-learn
    • Petabyte-scale Data
    • 40% Resource Optimization
  • 2021.09 - 2021.12
    Multi-Platform Data Engineering Pipeline
    Architected comprehensive data analysis and engineering solution using Python, .NET, Shell scripting, and advanced Regex processing, implementing ETL pipelines for terabyte-scale data processing, automated data validation frameworks, and cross-platform integration that improved data processing speed by 300% and reduced manual data handling errors by 95%.
    • Data Engineering
    • Python
    • .NET
    • ETL Pipelines
    • 300% Speed Improvement
    • 95% Error Reduction
  • 2022.08 - 2022.12
    Halo Collar Activity Recognition
    Engineered a GPS and sensor-based machine learning model for canine activity classification using Python and scikit-learn, achieving 89% accuracy across 8 distinct behavioral patterns and processing real-time sensor data from 500+ collar devices for the PAWS LLC partnership.
    • Machine Learning
    • Python
    • scikit-learn
    • GPS Sensors
    • Data Processing
  • 2022.08 - 2022.12
    Distributed Network Communication Protocol
    Architected CRSP-compliant distributed system with controller, renderer, and server components using UDP socket programming in Python, implementing concurrent message processing with multithreading and multiprocessing for file streaming operations supporting pause, resume, and restart functionality across networked hosts.
    • Network Programming
    • Python
    • UDP
    • Multithreading
    • Distributed Systems
  • 2022.08 - 2022.12
    Production Compiler Implementation
    Architected a complete compiler system using Java with JFlex lexical analyzer and CUP parser generator, implementing a comprehensive grammar supporting classes, methods, arrays, expressions, and control flow statements, validated through 21 distinct test cases covering syntax analysis, semantic checking, and error handling for type safety and program correctness.
    • Compiler Design
    • Java
    • JFlex
    • CUP Parser
    • Grammar Implementation
  • 2022.01 - 2022.05
    Enterprise Task Management Platform
    Led development of a full-stack web application using PHP, Apache, and a SQL database with Entity-Relationship modeling, implementing a normalized database design supporting user authentication, task tracking, and reporting through a web-based interface for enterprise task management workflows.
    • Full-Stack Development
    • PHP
    • Apache
    • SQL
    • Database Design
  • 2022.01 - 2022.05
    Operating System Process Simulator
    Developed multi-process computer system simulation using C/C++ with separate CPU and Memory processes communicating via Inter-Process Communication, demonstrating deep understanding of operating system concepts, including process scheduling, memory management, and low-level system programming with concurrent execution handling.
    • Operating Systems
    • C/C++
    • IPC
    • Process Scheduling
    • Memory Management
  • 2021.08 - 2021.12
    Computer Vision Digit Classification
    Built production-ready handwritten digit recognition system for MNIST dataset using Python and scikit-learn, implementing image preprocessing pipelines for 28x28 pixel grayscale images, achieving 97%+ classification accuracy with comprehensive data validation and Kaggle-style competition submission format.
    • Computer Vision
    • Python
    • scikit-learn
    • MNIST
    • Image Processing
  • 2021.08 - 2021.12
    Machine Learning Algorithm Library
    Engineered a comprehensive ML framework implementing 12 fundamental algorithms, including Linear Regression (direct method, polynomial, SGD), K-Means clustering, PCA with eigenface analysis, K-Nearest Neighbors on the MNIST dataset, Logistic Regression, and Decision Trees, with performance benchmarking across multiple datasets and automated hyperparameter optimization.
    • Machine Learning
    • Python
    • Algorithm Implementation
    • Performance Benchmarking
  • 2021.08 - 2021.12
    Prolog Knowledge Representation System
    Developed an enterprise-grade question-answering system using Prolog with a first-order logic implementation, featuring a comprehensive knowledge base containing 50+ facts and 20+ inference rules, supporting complex logical reasoning queries with 95% accuracy for domain-specific questions about retail transactions and inventory management.
    • AI Programming
    • Prolog
    • Knowledge Representation
    • Logic Programming
    • Question Answering
  • 2021.08 - 2021.12
    Multi-Algorithm AI Search Engine
    Implemented an 8-puzzle solver with four search strategies (Depth-First Search, Iterative Deepening Search, and A* search with the Manhattan distance heuristic and with the misplaced-tile heuristic), achieving optimal solutions, with a command-line interface and file-based input processing for automated testing and performance comparison.
    • Artificial Intelligence
    • Search Algorithms
    • A* Search
    • Heuristics
    • Problem Solving
  • 2020.08 - 2020.12
    32-Bit ALU Hardware Architecture
    Designed a 32-bit Arithmetic Logic Unit supporting 16 operations in Verilog HDL, implementing a multiplexer architecture with error detection for overflow and divide-by-zero conditions, sequential logic components including an accumulator register and flip-flops, and test bench validation, meeting timing requirements for 100 MHz operation.
    • Digital Systems
    • Verilog HDL
    • Hardware Design
    • ALU Architecture
    • Test Bench
  • 2019.01 - 2019.05
    Multi-Protocol Network Chat System
    Engineered real-time communication system supporting both TCP and UDP protocols in a Unix environment using C/C++ and Python, implementing client-server architecture with concurrent connection handling, demonstrating network programming expertise and understanding of protocol-level communication differences.
    • Network Programming
    • TCP/UDP
    • C/C++
    • Python
    • Client-Server Architecture
  • 2018.08 - 2018.12
    Custom Shell Command Processor
    Developed a comprehensive Bash command-line interface application in a Unix environment using shell scripting and system programming, implementing custom command parsing, input validation, and process management capabilities. The application enhanced system interaction efficiency by streamlining command processing, automating task execution, and improving the user experience for routine administrative operations.
    • Systems Programming
    • Bash Scripting
    • Unix
    • Command Processing
    • Process Management