Elixir bindings for libtorch -- production ML model serving on the BEAM.
Train in Python, serve from Elixir. ExTorch runs PyTorch models with OTP fault tolerance, beating Python's own inference performance by 1.35x on average.
Faster than Python. The pre-compiled graph executor beats Python's FX interpreter on every tested model -- 1.35x faster on average, bit-for-bit identical outputs.
| Model | Python Export | ExTorch Compiled | Speedup |
|---|---|---|---|
| ResNet50 | 7.21ms | 4.96ms | 1.45x |
| MobileNetV2 | 6.56ms | 4.07ms | 1.61x |
| ViT-B/16 | 9.53ms | 9.46ms | 1.01x |
| SqueezeNet | 2.77ms | 1.98ms | 1.40x |
| DistilBERT | 0.78ms | 0.59ms | 1.32x |
RTX 3060, median latency, 30 iterations. Full results for 12 models in examples/models.
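As a rough illustration of the "median latency, 30 iterations" methodology, the sketch below times a zero-arity function and reports the median in milliseconds. `BenchSketch` is a hypothetical helper, not part of ExTorch. It does no warm-up and makes no attempt to synchronize GPU work, so treat it as a starting point rather than the harness behind the table.

```elixir
defmodule BenchSketch do
  @moduledoc "Hypothetical timing helper; not part of ExTorch."

  # Runs `fun` `iterations` times and returns the median wall-clock latency in ms.
  def median_latency_ms(fun, iterations \\ 30) when is_function(fun, 0) and iterations > 0 do
    1..iterations
    |> Enum.map(fn _ ->
      {micros, _result} = :timer.tc(fun)
      micros / 1000
    end)
    |> median()
  end

  # Median of a non-empty list of numbers.
  def median(values) do
    sorted = Enum.sort(values)
    n = length(sorted)
    mid = div(n, 2)

    if rem(n, 2) == 1 do
      Enum.at(sorted, mid)
    else
      (Enum.at(sorted, mid - 1) + Enum.at(sorted, mid)) / 2
    end
  end

  # Speedup of a candidate latency relative to a baseline latency.
  def speedup(baseline_ms, candidate_ms), do: baseline_ms / candidate_ms
end
```

With a loaded model, the timed function would be something like `fn -> ExTorch.Export.forward_compiled(model, [input]) end`. The Speedup column is the ratio of the two latencies: `BenchSketch.speedup(7.21, 4.96)` rounds to the 1.45x reported for ResNet50.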
Four inference paths for every use case:
| Path | Use case | ViT-B/16 latency |
|---|---|---|
| forward/2 | Debug, profile, op-by-op introspection | 54.9ms |
| forward_native/2 | Production, single NIF call | 11.9ms |
| forward_compiled/2 | Pre-compiled, fastest Export path | 9.5ms |
| ExTorch.AOTI | Compiled kernels, maximum throughput | 8.8ms |
Production-ready serving. GenServer model pools, telemetry events, ETS-backed metrics, zero-downtime hot model reload -- not bolted on, designed in.
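To make "ETS-backed metrics" concrete, here is a minimal sketch of how per-model counters could be kept in an ETS table. `MetricsSketch` and its table name are hypothetical and say nothing about how `ExTorch.Metrics` is implemented; the returned map only mirrors the key names shown in this README.

```elixir
defmodule MetricsSketch do
  @moduledoc "Hypothetical ETS metrics table; not ExTorch.Metrics itself."
  @table :metrics_sketch

  # Creates the named table if it does not exist yet. The calling process owns it.
  def setup do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, :set])
    end

    :ok
  end

  # Records one inference duration (in ms) under `key`.
  # Note: this read-modify-write is not atomic; a real implementation would
  # serialize writers or use atomic ETS counter updates.
  def record(key, duration_ms) do
    case :ets.lookup(@table, key) do
      [] ->
        :ets.insert(@table, {key, 1, duration_ms, duration_ms})

      [{^key, count, lo, hi}] ->
        :ets.insert(@table, {key, count + 1, min(lo, duration_ms), max(hi, duration_ms)})
    end

    :ok
  end

  # Returns a summary map for `key`, or nil if nothing was recorded.
  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, count, lo, hi}] ->
        %{inference_count: count, min_duration_ms: lo, max_duration_ms: hi}

      [] ->
        nil
    end
  end
end
```

Reads go straight to the table, so any process can query metrics without sending a message to the model server.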
Extensible op ecosystem. The generic c10::Dispatcher bridge lets pure-Elixir packages register new ops without C++ code. ExTorch.Vision adds torchvision ops (NMS, ROI Align, deformable conv, image I/O) this way.
Zero-copy with Nx. Share tensor memory between ExTorch and Nx/Torchx via raw pointer exchange -- no data copying.
Bit-for-bit accurate. All inference paths produce identical outputs to Python (verified across 11 models, 3 paths each, max absolute error = 0.0).
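The "max absolute error" figure is the largest element-wise difference between two outputs. A minimal sketch of that check, assuming both outputs have already been flattened into plain lists of floats (`AccuracySketch` is a hypothetical helper, and how a tensor is converted to a list is not shown here):

```elixir
defmodule AccuracySketch do
  @moduledoc "Hypothetical comparison helper; not part of ExTorch."

  # Largest element-wise absolute difference between two equal-length lists.
  def max_abs_error(expected, actual) when length(expected) == length(actual) do
    expected
    |> Enum.zip(actual)
    |> Enum.reduce(0.0, fn {e, a}, acc -> max(acc, abs(e - a)) end)
  end
end
```

A result of exactly `0.0` means every element matched bit-for-bit; any rounding difference between the two paths would show up as a small positive value.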
- torch.export Inference -- Load `.pt2` files from `torch.export.save` and run inference through a compiled C++ graph executor (89+ ATen ops). Tested with AlexNet, ResNet, VGG, MobileNet, ViT, EfficientNet, DeepLab, DistilBERT, Whisper, LSTM, and more.
- AOTI Compiled Models -- Load AOTInductor `.pt2` packages for optimized inference with fused kernels.
- JIT Model Serving -- Load `.pt` TorchScript models with full IValue support (tensors, tuples, dicts, scalars).
- Generic c10 Dispatcher -- Call any PyTorch op by name through `dispatch_op/3`. Load external op libraries (torchvision, torchaudio) via `load_torch_library/1`.
- Op Extension System -- `ExTorch.Export.OpHandler` behaviour + `OpRegistry` for registering custom ops from external packages.
- Neural Network DSL -- Define PyTorch-compatible layers in Elixir with `deflayer`, backed by libtorch's C++ nn modules (35 layer types).
- Zero-Copy Tensor Exchange -- Share tensor memory with Nx/Torchx via `data_ptr`/`from_blob`.
- Telemetry & Observability -- `:telemetry` events for load/inference, ETS-backed metrics, optional LiveDashboard page.
- Tensor Operations -- 200+ wrapped libtorch ops for creation, manipulation, math, comparison, reduction, and indexing.
- Elixir >= 1.16
- Rust (stable toolchain)
- libtorch (automatically downloaded, or use a local PyTorch installation)
- CMake (for ExTorch.Vision)
- CUDA toolkit (optional, for GPU support)
Add extorch to your dependencies in mix.exs:
def deps do
  [
    {:extorch, "~> 0.3.0"}
  ]
end

ExTorch downloads libtorch automatically on first compile. To use a local installation:
config :extorch, libtorch: [
  version: :local,
  folder: :python # or an absolute path to libtorch
]

# Python: export your model
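import torch
import torchvision

# `pretrained=True` is deprecated in torchvision; IMAGENET1K_V1 is the weight set it selected
model = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1
).eval()
exported = torch.export.export(model, (torch.randn(1, 3, 224, 224),))
torch.export.save(exported, "resnet50.pt2")

# Elixir: load and serve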
model = ExTorch.Export.load("resnet50.pt2", device: :cuda)
input = ExTorch.Tensor.to(ExTorch.randn({1, 3, 224, 224}), device: :cuda)

# Fastest path: pre-compiled graph, zero per-op overhead
output = ExTorch.Export.forward_compiled(model, [input])

# Or use AOTI for maximum throughput (requires pre-compilation in Python)
aoti_model = ExTorch.AOTI.load("resnet50_aoti.pt2", device_index: 0)
[output] = ExTorch.AOTI.forward(aoti_model, [input])

# Supervised model server with telemetry
{:ok, _} = ExTorch.Export.Server.start_link(
  path: "resnet50.pt2",
  device: :cuda,
  name: :resnet
)

# Thread-safe inference
{:ok, output} = ExTorch.Export.Server.predict(:resnet, [input])

# Monitor performance
ExTorch.Metrics.setup()
ExTorch.Metrics.get("resnet50.pt2")
# => %{inference_count: 1500, min_duration_ms: 4.9, max_duration_ms: 12.1, ...}

# Swap models without dropping requests
# See examples/serving/hot_reload.exs for the full pattern
GenServer.cast(:resnet, {:reload, "resnet50_v2.pt2"})
# In-flight requests complete on old model, new requests use new model

# Load torchvision ops (NMS, ROI Align, etc.)
ExTorch.Native.load_torch_library("/path/to/libtorchvision.so")

# Call any registered op by name
keep = ExTorch.Native.dispatch_op("torchvision::nms", "", [
  {:tensor, boxes}, {:tensor, scores}, {:float, 0.5}
])

# Or use ExTorch.Vision for a clean API
ExTorch.Vision.nms(boxes, scores, 0.5)
ExTorch.Vision.roi_align(features, rois, 1.0, 7, 7)

# ExTorch → Nx (via Torchx): share memory, no copy
blob = ExTorch.Tensor.Blob.to_blob(tensor)
# => %Blob{ptr: 140234567890, shape: {3, 224, 224}, dtype: :float, ...}

# Nx → ExTorch: wrap foreign memory
view = ExTorch.Tensor.Blob.from_blob(
  %{ptr: torchx_ptr, shape: {3, 224, 224}, dtype: :float32},
  owner: nx_tensor
)

ExTorch.Native.cuda_is_available() # => true
ExTorch.Native.cuda_device_count() # => 2
model = ExTorch.Export.load("model.pt2", device: :cuda)
ExTorch.Native.cuda_memory_allocated(0) # bytes on GPU 0

See examples/serving/ for production patterns:
- basic_inference.exs -- Three inference paths side-by-side with benchmarks
- genserver_pool.exs -- Supervised model pool with concurrent inference and p50/p95/p99 reporting
- hot_reload.exs -- Zero-downtime model swapping
- telemetry_dashboard.exs -- Live metrics and monitoring
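For orientation, the hot-reload idea can be sketched with a plain GenServer that holds the model in its state and swaps it when it receives a `{:reload, path}` cast. This is not the code in `hot_reload.exs`: `HotReloadSketch` and its `:loader` and `:runner` options are hypothetical stand-ins, so the sketch runs without a model file.

```elixir
defmodule HotReloadSketch do
  @moduledoc "Hypothetical model-holding server; not ExTorch.Export.Server."
  use GenServer

  # Options: :name, :path, :loader (path -> model), :runner (model, inputs -> output).
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: Keyword.fetch!(opts, :name))
  end

  def predict(server, inputs), do: GenServer.call(server, {:predict, inputs})

  def reload(server, path), do: GenServer.cast(server, {:reload, path})

  @impl true
  def init(opts) do
    loader = Keyword.fetch!(opts, :loader)
    runner = Keyword.fetch!(opts, :runner)
    {:ok, %{loader: loader, runner: runner, model: loader.(Keyword.fetch!(opts, :path))}}
  end

  @impl true
  def handle_call({:predict, inputs}, _from, state) do
    # Runs against whichever model is in the state when this message is handled.
    {:reply, {:ok, state.runner.(state.model, inputs)}, state}
  end

  @impl true
  def handle_cast({:reload, path}, state) do
    # Messages are handled one at a time, so a request already being served
    # finishes on the old model before this swap takes effect.
    {:noreply, %{state | model: state.loader.(path)}}
  end
end
```

With ExTorch, the loader and runner could plausibly be `&ExTorch.Export.load(&1, device: :cuda)` and `&ExTorch.Export.forward_compiled/2`. One trade-off of this simple version is that requests queue in the mailbox while the new model loads; loading in a separate process and sending only the finished model to the server would avoid that pause.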
See examples/models/ for real-world model deployment:
- 8 production models: CLIP, DistilBERT, MobileNetV3, EfficientNet, ResNet50, ViT-B/16, DeepLabV3, Whisper
- Export script, multi-model benchmark, full image classification pipeline
Three-layer design: C++ (libtorch wrapper) → Rust (cxx bridge + Rustler NIFs) → Elixir (macro-generated API).
- C++ sources: `native/extorch/src/csrc/*.cc` + `native/extorch/include/*.h`
- Rust bridge: `native/extorch/src/native/*.rs.in` (Tera templates rendered by `build.rs`)
- Rust NIFs: `native/extorch/src/nifs/*.rs`
- Elixir API: `lib/extorch/`
The generic c10::Dispatcher NIF bridge (dispatch_op, execute_graph, compile_graph) enables calling any PyTorch op without per-op C++ wrappers, and the OpHandler behaviour allows external packages to extend the Export interpreter.
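The general shape of a behaviour-plus-registry extension point can be illustrated as follows. Every name here (`OpHandlerSketch`, `RegistrySketch`, `ClampHandler`, the `sketch::clamp` op) is hypothetical and chosen for the example; the actual callbacks of `ExTorch.Export.OpHandler` and the `OpRegistry` API are not documented in this README and may look quite different.

```elixir
defmodule OpHandlerSketch do
  @moduledoc "Hypothetical behaviour; not ExTorch.Export.OpHandler."

  # Names of the ops this handler implements.
  @callback ops() :: [String.t()]
  # Executes one op with already-evaluated arguments.
  @callback call(op :: String.t(), args :: list()) :: term()
end

defmodule RegistrySketch do
  @moduledoc "Hypothetical registry: a plain map from op name to handler module."

  # Builds the op-name => handler map from a list of handler modules.
  def build(handlers) do
    for handler <- handlers, op <- handler.ops(), into: %{}, do: {op, handler}
  end

  # Looks up the handler for `op` and invokes it.
  def dispatch(registry, op, args) do
    case Map.fetch(registry, op) do
      {:ok, handler} -> {:ok, handler.call(op, args)}
      :error -> {:error, {:unknown_op, op}}
    end
  end
end

defmodule ClampHandler do
  @moduledoc "Example handler an external package might ship."
  @behaviour OpHandlerSketch

  @impl true
  def ops, do: ["sketch::clamp"]

  @impl true
  def call("sketch::clamp", [x, lo, hi]), do: x |> max(lo) |> min(hi)
end
```

An interpreter built this way only needs the registry map at graph-execution time, which is what lets a separate package contribute ops without changes to the core library.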
MIT