Usually a triage system. A low-power voice detection might run on the unit itself, capable of detecting speech but not recognizing it. Another NN might be running on the device itself that can only categorize the activation words. Finally they wake the network interface and send all the data off to Amazon for a datacentre to run the inference.