Actually, I'm actively exploring this very topic! I have a feature-flagged version where the inference runs in the browser via WASM / WebGPU (onnxruntime-web specifically).

My only hesitation about rolling this out further is that the performance isn't as fast as I'd like (~1.5s latencies), plus the widely varying support for WebGPU / WASM across browser and OS pairs.
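For anyone curious how the WebGPU-vs-WASM split can be handled, here's a rough sketch (not my actual code): probe for a usable WebGPU adapter, then hand onnxruntime-web an ordered list of execution providers with WASM as the fallback. The helper names `hasUsableWebGPU` and `pickExecutionProviders` are made up for illustration. Note that `navigator.gpu` existing isn't enough on its own, since `requestAdapter()` can resolve to `null`.

```typescript
// Minimal shape of the WebGPU entry point used here; declared locally so the
// sketch compiles without @webgpu/types. (Hypothetical helper types.)
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}

// Hypothetical helper: true only if WebGPU is exposed AND an adapter is returned.
async function hasUsableWebGPU(): Promise<boolean> {
  const nav = (globalThis as unknown as { navigator?: { gpu?: GpuLike } }).navigator;
  const gpu = nav?.gpu;
  if (!gpu) return false; // no WebGPU API at all (or not running in a browser)
  try {
    // requestAdapter() may resolve to null even when the API is present.
    const adapter = await gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}

// Hypothetical helper: ordered execution providers to try, WebGPU first
// when usable, WASM always kept as the fallback.
function pickExecutionProviders(webgpuOk: boolean): string[] {
  return webgpuOk ? ["webgpu", "wasm"] : ["wasm"];
}
```

The result would then go into something like `ort.InferenceSession.create(modelUrl, { executionProviders: pickExecutionProviders(await hasUsableWebGPU()) })`, where `modelUrl` is a placeholder for wherever the ONNX model is hosted.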

Still testing it out (and learning about ViT performance on various hardware), so hopefully more news on that front soon!