I don't see why they would. The average American's personal data is less coherent than the median written article on the internet. Unless you have evidence of it knowing much more than it should, it's doubtful your personal data is worth anything besides advertising entropy.
If they are doing any of that, they're wasting a whole lot of processing power. Even old text transformers like GPT-2 and BERT are capable of running offline without any of that nonsense. More open models like GPT-Neo can be fully audited to prove that there is no personal data in their training stack.
There might be merit to what you're saying, but again, you've presented no proof of this. Common logic and the currently available technology suggest that's unnecessary.
You're right that I have no proof, and I'm not suggesting that they are. I'm wondering whether anyone else has heard of anything like this happening, especially internally for research purposes.