Computer Vision Digit Classification

Production-ready handwritten digit recognition system for MNIST dataset

Project Overview

A production-ready handwritten digit recognition system for the MNIST dataset, built in Fall 2021 with Python and scikit-learn. The implementation combines image preprocessing pipelines for 28x28 pixel grayscale images, input data validation, and a Kaggle-style competition submission format, and achieves 97%+ classification accuracy.

Machine Learning Pipeline

Data Processing Architecture

  • Image Preprocessing: Standardized 28x28 pixel grayscale image processing
  • Feature Extraction: Pixel intensity normalization and feature scaling
  • Data Augmentation: Rotation, translation, and noise injection for robustness
  • Cross-Validation: K-fold validation for model performance assessment
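The augmentation step listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: the helper name `augment_image`, its parameter defaults, and the use of `scipy.ndimage` (rather than OpenCV) are all assumptions made for illustration, and the input is assumed to be a float image scaled to [0, 1].

```python
import numpy as np
from scipy import ndimage


def augment_image(image, rng, max_angle=10.0, max_shift=2, noise_std=0.05):
    """Return a randomly rotated, shifted, and noised copy of a 28x28 image.

    Hypothetical helper: the defaults are illustrative, not project values.
    """
    # Rotate about the image center while keeping the original shape
    angle = rng.uniform(-max_angle, max_angle)
    out = ndimage.rotate(image, angle, reshape=False, order=1,
                         mode="constant", cval=0.0)

    # Translate by a few whole pixels, filling exposed borders with background
    shift = rng.integers(-max_shift, max_shift + 1, size=2)
    out = ndimage.shift(out, shift, order=0, mode="constant", cval=0.0)

    # Inject Gaussian noise, then clip back into the valid intensity range
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0).astype(np.float32)


rng = np.random.default_rng(0)
image = np.zeros((28, 28), dtype=np.float32)
image[8:20, 12:16] = 1.0  # crude vertical stroke standing in for a digit
augmented = augment_image(image, rng)
```

Augmented copies would normally be generated from the training folds only, so that validation scores still reflect unmodified images.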

Model Development

  • Algorithm Selection: Comparative analysis of multiple ML algorithms
  • Hyperparameter Optimization: Grid search and random search tuning
  • Feature Engineering: Principal component analysis and dimensionality reduction
  • Ensemble Methods: Model combination for improved accuracy
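As a rough illustration of how PCA and hyperparameter search can be combined, the sketch below tunes a scaling, PCA, and RBF-SVM pipeline with `GridSearchCV`. The parameter grid is an assumption (the project's real search space is not documented here), and scikit-learn's bundled 8x8 `load_digits` data stands in for MNIST so the example runs quickly.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 8x8 digits bundled with scikit-learn, not the 28x28 MNIST set
X, y = load_digits(return_X_y=True)

# Putting PCA inside the pipeline means it is refit on each training fold
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(random_state=0)),
    ("svm", SVC(kernel="rbf")),
])

# Illustrative grid only; these values are assumptions
param_grid = {
    "pca__n_components": [20, 40],
    "svm__C": [1, 10],
    "svm__gamma": ["scale", 0.01],
}

search = GridSearchCV(pipeline, param_grid, cv=3, n_jobs=1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Swapping `GridSearchCV` for `RandomizedSearchCV` (with an `n_iter` budget) gives the random search variant over the same pipeline.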

Technical Implementation

Image Preprocessing Pipeline

import numpy as np
from sklearn.preprocessing import StandardScaler

def preprocess_images(X_train, X_test):
    """Scale, flatten, and standardize image arrays of shape (n, 28, 28)."""
    # Normalize pixel values to [0, 1]
    X_train = X_train.astype('float32') / 255.0
    X_test = X_test.astype('float32') / 255.0

    # Reshape from 28x28 to 784 feature vector
    X_train = X_train.reshape(X_train.shape[0], -1)
    X_test = X_test.reshape(X_test.shape[0], -1)

    # Standardization for improved convergence; the scaler is fit on the
    # training data only so no test-set statistics leak into training
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # Return the fitted scaler so inference can reuse the same transform
    return X_train, X_test, scaler

Model Architecture

  • Primary Algorithm: Support Vector Machine with RBF kernel
  • Secondary Models: Random Forest, Gradient Boosting, Neural Network
  • Ensemble Approach: Voting classifier for final predictions
  • Performance Optimization: Feature selection and dimensionality reduction
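The architecture above could be assembled along these lines. Treat it as a sketch under assumptions: the hyperparameters, the hard-voting choice, and the 8x8 `load_digits` stand-in dataset are illustrative and not taken from the project.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 8x8 digits bundled with scikit-learn, not MNIST itself
X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        # Scale-sensitive models get their own scaler inside a pipeline
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=30, random_state=0)),
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                          random_state=0))),
    ],
    voting="hard",  # majority vote over predicted labels
)
ensemble.fit(X_train, y_train)
val_accuracy = ensemble.score(X_val, y_val)
print(f"Validation accuracy: {val_accuracy:.3f}")
```

Hard voting is used here because it does not require probability estimates; soft voting would need `SVC(probability=True)`, which adds training cost.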

MNIST Dataset Analysis

Dataset Characteristics

  • Training Set: 60,000 labeled handwritten digit images
  • Test Set: 10,000 images held out for evaluation
  • Image Format: 28x28 pixel grayscale images (784 features)
  • Classes: 10 digit classes (0-9)
  • Data Quality: Preprocessed and centered digit images

Exploratory Data Analysis

  • Class Distribution: Balanced dataset analysis across all digit classes
  • Pixel Intensity Analysis: Statistical analysis of pixel value distributions
  • Visualization: Sample image display and class representation
  • Data Quality Assessment: Missing value and outlier detection
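A few of these checks can be expressed in plain NumPy. The helper below, `summarize_dataset`, is a hypothetical name introduced for illustration, and the "blank image" test is just one simple assumption about what an outlier might look like.

```python
import numpy as np


def summarize_dataset(images, labels, num_classes=10):
    """Return basic EDA statistics for a stack of grayscale images."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    flat = images.reshape(len(images), -1).astype(np.float64)
    return {
        # Number of samples per digit class (class balance check)
        "class_counts": np.bincount(labels, minlength=num_classes),
        # Global pixel intensity statistics, ignoring NaN entries
        "pixel_mean": float(np.nanmean(flat)),
        "pixel_std": float(np.nanstd(flat)),
        # NaN entries are counted as missing values
        "missing_values": int(np.isnan(flat).sum()),
        # Images with no ink at all are a simple outlier signal
        "blank_images": int((np.nansum(flat, axis=1) == 0).sum()),
    }


# Tiny synthetic example: four 28x28 images, one of them blank
images = np.zeros((4, 28, 28), dtype=np.uint8)
images[:3, 10:18, 10:18] = 255
labels = np.array([0, 1, 1, 9])
summary = summarize_dataset(images, labels)
print(summary["class_counts"], summary["blank_images"])
```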

Computer Vision Techniques

Image Processing Methods

  • Noise Reduction: Gaussian filtering and median filtering
  • Edge Detection: Sobel and Canny edge detection for feature enhancement
  • Morphological Operations: Erosion and dilation for image cleanup
  • Histogram Equalization: Contrast enhancement for improved recognition
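For readers who want to see these operations concretely, here is a small sketch using `scipy.ndimage` and NumPy. It is a substitute for, not a copy of, the OpenCV calls the project would have used; Canny edge detection is omitted because SciPy has no built-in equivalent, and all function names and default parameters are assumptions.

```python
import numpy as np
from scipy import ndimage


def gaussian_denoise(image, sigma=1.0):
    """Smooth the image with a Gaussian kernel."""
    return ndimage.gaussian_filter(image.astype(np.float32), sigma=sigma)


def median_denoise(image, size=3):
    """Replace each pixel with the median of its size x size neighborhood."""
    return ndimage.median_filter(image, size=size)


def sobel_edges(image):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    img = image.astype(np.float32)
    return np.hypot(ndimage.sobel(img, axis=1), ndimage.sobel(img, axis=0))


def morphological_cleanup(image, threshold=128):
    """Binary opening: erosion followed by dilation removes small specks."""
    eroded = ndimage.binary_erosion(image >= threshold)
    return ndimage.binary_dilation(eroded)


def equalize_histogram(image):
    """Histogram equalization for a uint8 image via its cumulative histogram."""
    cdf = np.bincount(image.ravel(), minlength=256).cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:  # constant image: nothing to equalize
        return image.copy()
    lut = np.round((cdf - cdf_min) / denom * 255).clip(0, 255).astype(np.uint8)
    return lut[image]


# Synthetic test image: a filled square plus one isolated "salt" pixel
image = np.zeros((28, 28), dtype=np.uint8)
image[10:18, 10:18] = 255
image[2, 2] = 255

smoothed = gaussian_denoise(image)
denoised = median_denoise(image)
edges = sobel_edges(image)
opened = morphological_cleanup(image)
equalized = equalize_histogram(image)
```

Both the median filter and the binary opening remove the isolated pixel while keeping the interior of the square, which is the cleanup behavior described above.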

Feature Engineering

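import cv2
import numpy as np

def extract_features(images):
    """Append intensity statistics and centroid coordinates to raw pixels."""
    features = []
    for image in images:
        # cv2.moments expects a single-channel 8-bit or floating-point image
        image = np.asarray(image, dtype=np.float32)

        # Pixel intensity features
        pixel_features = image.flatten()

        # Statistical features
        mean_intensity = np.mean(image)
        std_intensity = np.std(image)

        # Geometric features: intensity-weighted centroid from image moments
        # (falls back to 0 for an all-black image, where m00 is zero)
        moments = cv2.moments(image)
        centroid_x = moments['m10'] / moments['m00'] if moments['m00'] != 0 else 0
        centroid_y = moments['m01'] / moments['m00'] if moments['m00'] != 0 else 0

        combined_features = np.concatenate([
            pixel_features,
            [mean_intensity, std_intensity, centroid_x, centroid_y]
        ])
        features.append(combined_features)

    # Shape: (n_images, 784 + 4) for 28x28 inputs
    return np.array(features)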

Model Performance and Validation

Accuracy Achievements

  • Primary Model Accuracy: 97.3% on validation set
  • Ensemble Model Accuracy: 97.8% on validation set
  • Kaggle Submission Score: Ranked in the top 25% of the leaderboard
  • Cross-Validation Score: 97.1% ± 0.3% across 5 folds
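A mean ± standard deviation figure like the cross-validation score above is typically produced with `cross_val_score`. The sketch below shows the mechanics only: it uses the 8x8 `load_digits` stand-in data and an untuned RBF SVM, so its numbers will not match the reported results.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 8x8 digits bundled with scikit-learn, not MNIST itself
X, y = load_digits(return_X_y=True)

# Scaling lives inside the pipeline so it is refit on each training fold
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Stratified folds keep the digit class proportions equal across splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```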

Performance Metrics

  • Precision: 97.5% macro-averaged across all classes
  • Recall: 97.4% macro-averaged across all classes
  • F1-Score: 97.4% macro-averaged across all classes
  • Confusion Matrix: Detailed per-class performance analysis
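Macro averaging computes each metric per class and then takes the unweighted mean, so every digit counts equally regardless of frequency. The toy labels below are made up purely to show the calculation with scikit-learn's metrics functions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Made-up labels for a 3-class toy problem (not project data)
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

# Per-class precision/recall/F1, averaged with equal weight per class
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
print(cm)
```

Here class 1 has one of its two samples misclassified as class 2, which shows up as the off-diagonal entry in the second row of the confusion matrix.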

Model Robustness Testing

  • Adversarial Examples: Testing against slightly perturbed inputs
  • Noise Injection: Performance under varying noise levels
  • Rotation Invariance: Testing with rotated digit images
  • Scale Variations: Performance across different image scales
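One way to run the noise-injection test is to score a fitted model on progressively noisier copies of the validation set. The helper `accuracy_under_noise`, the noise levels, and the `load_digits` stand-in data below are assumptions for illustration, not the project's test harness.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 8x8 digits with pixel values 0..16, rescaled to [0, 1]
X, y = load_digits(return_X_y=True)
X = X / 16.0
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)


def accuracy_under_noise(model, X, y, noise_levels, seed=0):
    """Map each Gaussian noise standard deviation to the resulting accuracy."""
    rng = np.random.default_rng(seed)
    results = {}
    for sigma in noise_levels:
        # Perturb the inputs, then clip back into the valid pixel range
        noisy = np.clip(X + rng.normal(0.0, sigma, size=X.shape), 0.0, 1.0)
        results[sigma] = model.score(noisy, y)
    return results


noise_results = accuracy_under_noise(model, X_val, y_val, [0.0, 0.1, 0.3, 0.5])
for sigma, acc in noise_results.items():
    print(f"sigma={sigma:.1f}: accuracy={acc:.3f}")
```

The same loop structure works for rotation or scale tests by replacing the noise step with the corresponding image transform.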

Production-Ready Features

Model Deployment Pipeline

  • Model Serialization: Pickle-based model persistence
  • Inference API: RESTful API for real-time predictions
  • Batch Processing: Efficient batch prediction capabilities
  • Performance Monitoring: Prediction accuracy tracking
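Pickle-based persistence and batch prediction can be sketched as follows. The helper `predict_in_batches` and its batch size are hypothetical, the model is an untuned SVM on the `load_digits` stand-in data, and the REST API layer is not shown.

```python
import pickle

import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

# Stand-in data and model, used only to have something to serialize
X, y = load_digits(return_X_y=True)
model = SVC(kernel="rbf").fit(X, y)

# Serialize to bytes; pickle.dump(model, file) writes the same payload to disk
payload = pickle.dumps(model)
restored = pickle.loads(payload)


def predict_in_batches(model, X, batch_size=256):
    """Predict in fixed-size chunks to bound peak memory use."""
    chunks = [model.predict(X[start:start + batch_size])
              for start in range(0, len(X), batch_size)]
    return np.concatenate(chunks)


batch_predictions = predict_in_batches(restored, X)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code, and a pickled estimator is generally expected to be loaded with the same scikit-learn version that produced it.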

Data Validation Framework

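import numpy as np

class DataValidator:
    """Checks that an input image matches the expected MNIST format."""

    def __init__(self):
        self.expected_shape = (28, 28)
        self.pixel_range = (0, 255)
        self.allowed_dtypes = (np.uint8, np.float32)

    def validate_input(self, image):
        # Explicit exceptions are used instead of assert statements,
        # which are skipped when Python runs with the -O flag

        # Shape validation
        if image.shape != self.expected_shape:
            raise ValueError(f"Invalid shape: {image.shape}")

        # Pixel range validation (NaN values fail both comparisons)
        if not np.all(image >= self.pixel_range[0]):
            raise ValueError("Pixel values below minimum")
        if not np.all(image <= self.pixel_range[1]):
            raise ValueError("Pixel values above maximum")

        # Data type validation
        if image.dtype not in self.allowed_dtypes:
            raise TypeError(f"Invalid dtype: {image.dtype}")

        return True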

Kaggle Competition Integration

Submission Format

  • CSV Output: Proper competition submission format
  • Image ID Mapping: Correct test image identification
  • Prediction Confidence: Probability distributions for each class
  • Submission Validation: Format compliance checking
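A submission writer with basic format checks might look like the sketch below. The column names `ImageId` and `Label` with 1-based ids are an assumption modeled on Kaggle's Digit Recognizer competition, and `build_submission` is a hypothetical helper name.

```python
import io

import numpy as np
import pandas as pd


def build_submission(predictions):
    """Build a two-column submission frame from predicted digit labels."""
    predictions = np.asarray(predictions)

    # Format compliance: one label per image, each an integer digit 0-9
    if predictions.ndim != 1:
        raise ValueError("Expected a 1-D array of predicted labels")
    if not np.isin(predictions, np.arange(10)).all():
        raise ValueError("Labels must be integer digits in the range 0-9")

    return pd.DataFrame({
        # Assumed convention: image ids start at 1 and follow test-set order
        "ImageId": np.arange(1, len(predictions) + 1),
        "Label": predictions.astype(int),
    })


submission = build_submission([2, 0, 9, 4])

# Write to an in-memory buffer here; pass a file path to to_csv in practice
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Per-class probabilities, if needed, would come from a model's `predict_proba` output and be stored separately, since this format carries only the final label.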

Competition Strategy

  • Model Selection: Systematic algorithm comparison
  • Feature Engineering: Domain-specific feature creation
  • Ensemble Methods: Multiple model combination strategies
  • Hyperparameter Tuning: Extensive parameter optimization

Technologies Used

  • Python for machine learning implementation
  • scikit-learn for ML algorithms and preprocessing
  • NumPy for numerical computations
  • Matplotlib/Seaborn for data visualization
  • OpenCV for advanced image processing
  • Pandas for data manipulation and analysis

The project demonstrates computer vision and machine learning expertise, production software development practices, and competitive data science skills relevant to AI/ML engineering roles and computer vision applications.