Summary
This work benchmarks LLM inference on Apple Silicon across MPS, MLX, and Metal-centric pipelines with repeatable measurements.
Mermaid source
flowchart LR
  A[Model + Prompt] --> B[Runner]
  B --> C{Backend}
  C --> D[MPS]
  C --> E[MLX]
  C --> F[Metal]
  D --> G[Metrics]
  E --> G
  F --> G

Results (placeholder)
- Latency: TBD
- Tokens/sec: TBD
- Peak memory: TBD
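
Harness sketch
The following is a minimal sketch of how a runner might collect the three metrics listed here for each backend; it is not the project's actual harness. The names `fake_backend`, `benchmark`, `Metrics`, and `BACKENDS` are hypothetical, and the generate function is a stand-in that would need to be replaced with real MPS, MLX, or Metal calls. Peak memory is measured with `tracemalloc`, which only tracks Python heap allocations, so it does not reflect GPU or unified-memory usage on Apple Silicon.

```python
import statistics
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Metrics:
    backend: str
    median_latency_s: float      # median wall-clock time per generation call
    tokens_per_sec: float        # tokens produced divided by median latency
    peak_python_mem_bytes: int   # Python-heap peak only (assumption: a proxy)


def fake_backend(prompt: str, max_tokens: int) -> List[str]:
    """Hypothetical stand-in for a real backend's generate call."""
    return [f"tok{i}" for i in range(max_tokens)]


def benchmark(backend: str,
              generate: Callable[[str, int], List[str]],
              prompt: str,
              max_tokens: int = 32,
              warmup: int = 1,
              repeats: int = 5) -> Metrics:
    # Warm-up runs are discarded so one-time setup cost does not skew timings.
    for _ in range(warmup):
        generate(prompt, max_tokens)

    # Timed runs: repeat and take the median for a more repeatable figure.
    latencies = []
    n_tokens = 0
    for _ in range(repeats):
        start = time.perf_counter()
        tokens = generate(prompt, max_tokens)
        latencies.append(time.perf_counter() - start)
        n_tokens = len(tokens)
    median = statistics.median(latencies)

    # Memory is traced in a separate run so tracing overhead
    # does not inflate the latency numbers above.
    tracemalloc.start()
    generate(prompt, max_tokens)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    tps = n_tokens / median if median > 0 else float("inf")
    return Metrics(backend, median, tps, peak)


# Hypothetical registry: each entry would wrap a real backend in practice.
BACKENDS: Dict[str, Callable[[str, int], List[str]]] = {
    "MPS": fake_backend,
    "MLX": fake_backend,
    "Metal": fake_backend,
}

results = [benchmark(name, fn, "Hello") for name, fn in BACKENDS.items()]
for r in results:
    print(f"{r.backend}: {r.median_latency_s:.6f} s, "
          f"{r.tokens_per_sec:.1f} tok/s, {r.peak_python_mem_bytes} B")
```

Taking the median over several runs after a discarded warm-up is one common way to make measurements repeatable; a real harness would likely also fix the prompt, model weights, and generation settings across backends.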