I'm actually actively exploring this very topic! I have a feature-flagged version where inference runs via WASM / WebGPU (onnxruntime-web specifically).
The only things holding me back from rolling this out further are that performance isn't as fast as I'd like (~1.5s latencies) and that WebGPU / WASM support varies widely across browser and OS combinations.
Still testing it out (and learning about ViT performance on various hardware), so hopefully more news on that front soon!
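
For anyone curious what handling the uneven WebGPU / WASM support might look like, here's a rough sketch, not the actual code behind the feature flag. It probes for a WebGPU adapter and falls back to WASM otherwise. `pickExecutionProviders` and `NavigatorLike` are hypothetical names made up for illustration; the returned list is meant to be passed as the `executionProviders` session option in onnxruntime-web, and depending on the library version the WebGPU backend may need a different import or bundle.

```typescript
// Minimal stand-in for the parts of the browser `navigator` object used here.
// (Hypothetical type, declared locally so the sketch runs outside a browser too.)
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Hypothetical helper: returns a backend preference order.
// "webgpu" is listed first only when an adapter is actually available;
// "wasm" is always kept as the fallback.
async function pickExecutionProviders(nav: NavigatorLike): Promise<string[]> {
  if (nav.gpu) {
    try {
      // requestAdapter() resolves to null when no suitable GPU is available.
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) {
        return ["webgpu", "wasm"];
      }
    } catch {
      // Some browser/OS pairs expose the API but fail at request time;
      // treat that the same as "no WebGPU".
    }
  }
  return ["wasm"];
}
```

In a browser this would be called as `pickExecutionProviders(navigator)`, with the result handed to the session options when creating the inference session. It only covers backend selection; it doesn't do anything for the latency side.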