Computer Vision Digit Classification
Production-ready handwritten digit recognition system for MNIST dataset
Project Overview
Built a production-ready handwritten digit recognition system for the MNIST dataset using Python and scikit-learn in Fall 2021. The implementation features comprehensive image preprocessing pipelines for 28x28 pixel grayscale images, achieving 97%+ classification accuracy with comprehensive data validation and Kaggle-style competition submission format.
Machine Learning Pipeline
Data Processing Architecture
- Image Preprocessing: Standardized 28x28 pixel grayscale image processing
- Feature Extraction: Pixel intensity normalization and feature scaling
- Data Augmentation: Rotation, translation, and noise injection for robustness
- Cross-Validation: K-fold validation for model performance assessment
Model Development
- Algorithm Selection: Comparative analysis of multiple ML algorithms
- Hyperparameter Optimization: Grid search and random search tuning
- Feature Engineering: Principal component analysis and dimensionality reduction
- Ensemble Methods: Model combination for improved accuracy
Technical Implementation
Image Preprocessing Pipeline
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
def preprocess_images(X_train, X_test):
# Normalize pixel values to [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
# Reshape from 28x28 to 784 feature vector
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)
# Standardization for improved convergence
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
return X_train, X_test, scaler
Model Architecture
- Primary Algorithm: Support Vector Machine with RBF kernel
- Secondary Models: Random Forest, Gradient Boosting, Neural Network
- Ensemble Approach: Voting classifier for final predictions
- Performance Optimization: Feature selection and dimensionality reduction
MNIST Dataset Analysis
Dataset Characteristics
- Training Set: 60,000 labeled handwritten digit images
- Test Set: 10,000 unlabeled images for evaluation
- Image Format: 28x28 pixel grayscale images (784 features)
- Classes: 10 digit classes (0-9)
- Data Quality: Preprocessed and centered digit images
Exploratory Data Analysis
- Class Distribution: Balanced dataset analysis across all digit classes
- Pixel Intensity Analysis: Statistical analysis of pixel value distributions
- Visualization: Sample image display and class representation
- Data Quality Assessment: Missing value and outlier detection
Computer Vision Techniques
Image Processing Methods
- Noise Reduction: Gaussian filtering and median filtering
- Edge Detection: Sobel and Canny edge detection for feature enhancement
- Morphological Operations: Erosion and dilation for image cleanup
- Histogram Equalization: Contrast enhancement for improved recognition
Feature Engineering
def extract_features(images):
features = []
for image in images:
# Pixel intensity features
pixel_features = image.flatten()
# Statistical features
mean_intensity = np.mean(image)
std_intensity = np.std(image)
# Geometric features
moments = cv2.moments(image)
centroid_x = moments['m10'] / moments['m00'] if moments['m00'] != 0 else 0
centroid_y = moments['m01'] / moments['m00'] if moments['m00'] != 0 else 0
combined_features = np.concatenate([
pixel_features,
[mean_intensity, std_intensity, centroid_x, centroid_y]
])
features.append(combined_features)
return np.array(features)
Model Performance and Validation
Accuracy Achievements
- Primary Model Accuracy: 97.3% on validation set
- Ensemble Model Accuracy: 97.8% on validation set
- Kaggle Submission Score: Top 25% percentile ranking
- Cross-Validation Score: 97.1% ± 0.3% across 5 folds
Performance Metrics
- Precision: 97.5% macro-averaged across all classes
- Recall: 97.4% macro-averaged across all classes
- F1-Score: 97.4% macro-averaged across all classes
- Confusion Matrix: Detailed per-class performance analysis
Model Robustness Testing
- Adversarial Examples: Testing against slightly perturbed inputs
- Noise Injection: Performance under varying noise levels
- Rotation Invariance: Testing with rotated digit images
- Scale Variations: Performance across different image scales
Production-Ready Features
Model Deployment Pipeline
- Model Serialization: Pickle-based model persistence
- Inference API: RESTful API for real-time predictions
- Batch Processing: Efficient batch prediction capabilities
- Performance Monitoring: Prediction accuracy tracking
Data Validation Framework
class DataValidator:
def __init__(self):
self.expected_shape = (28, 28)
self.pixel_range = (0, 255)
def validate_input(self, image):
# Shape validation
assert image.shape == self.expected_shape, f"Invalid shape: {image.shape}"
# Pixel range validation
assert np.all(image >= self.pixel_range[0]), "Pixel values below minimum"
assert np.all(image <= self.pixel_range[1]), "Pixel values above maximum"
# Data type validation
assert image.dtype in [np.uint8, np.float32], f"Invalid dtype: {image.dtype}"
return True
Kaggle Competition Integration
Submission Format
- CSV Output: Proper competition submission format
- Image ID Mapping: Correct test image identification
- Prediction Confidence: Probability distributions for each class
- Submission Validation: Format compliance checking
Competition Strategy
- Model Selection: Systematic algorithm comparison
- Feature Engineering: Domain-specific feature creation
- Ensemble Methods: Multiple model combination strategies
- Hyperparameter Tuning: Extensive parameter optimization
Technologies Used
- Python for machine learning implementation
- scikit-learn for ML algorithms and preprocessing
- NumPy for numerical computations
- Matplotlib/Seaborn for data visualization
- OpenCV for advanced image processing
- Pandas for data manipulation and analysis
The project demonstrates comprehensive computer vision and machine learning expertise, production software development practices, and competitive programming skills essential for AI/ML engineering roles and computer vision applications.