Victor Salcedo

AI Infrastructure Engineer  ·  LLM Inference  ·  GPU Optimization  ·  Distributed ML Systems  ·  Founder, SalixLogic  ·  San Diego, CA
About

I'm an AI and machine learning engineer focused on high-performance model inference, GPU efficiency, and the engineering systems behind modern large-scale AI. My work centers on designing and building infrastructure that improves model performance, reliability, and reproducibility.

Through SalixLogic, my AI engineering business, I build practical systems for LLM inference, model optimization, and scalable ML workflows. This includes work with PyTorch, vLLM, Triton, and Ray to develop optimized inference pipelines, benchmark GPU performance, and create structured tools for evaluation and experiment management.

Long term, I am focused on AI systems engineering, model inference optimization, and distributed ML infrastructure, with an emphasis on building tools and workflows that accelerate research and enable efficient large-model deployment.

Research & Engineering Focus
01 LLM Inference & Optimization: High-throughput inference pipelines with vLLM, Triton, and ONNX. Reducing latency and memory footprint for large-model deployment.
02 GPU & Edge Performance: Benchmarking GPU efficiency, quantization, model compression, and optimized deployment on embedded hardware including NVIDIA Jetson.
03 Distributed ML Systems: Scalable training and inference infrastructure with Ray and PyTorch. Reproducible workflows via CI/CD pipelines and automated testing.
04 Computer Vision & Deep Learning: Object detection (YOLOv8), CNN-LSTM sequence modeling, time-series forecasting, and reinforcement learning for autonomous systems.
Featured Project
Blind Spot Detection System — YOLOv8 + LISA Dataset

Developed a real-time blind spot detection model using YOLOv8 trained on the LISA traffic detection dataset. Engineered the full ML pipeline from data preprocessing and augmentation through training, validation, and performance benchmarking. Optimized for embedded edge deployment on NVIDIA Jetson hardware via ONNX conversion and quantization, achieving production-ready inference on resource-constrained devices.

YOLOv8  ·  LISA Dataset  ·  ONNX  ·  Quantization  ·  NVIDIA Jetson  ·  Edge Deployment  ·  PyTorch  ·  Real-Time Inference
Technical Skills
Inference Frameworks: PyTorch, vLLM, Triton, ONNX, TensorFlow, scikit-learn, YOLOv8
Distributed Systems: Ray, CI/CD Pipelines, Model Monitoring, Version Control, Automated Testing
Edge & MLOps: NVIDIA Jetson, Model Quantization, ONNX Conversion, Reproducible Workflows
Languages: Python, SQL, C++, C, Bash
Data & Visualization: Pandas, NumPy, Power BI, Plotly, Matplotlib, Seaborn, SQL Server