There is new research where the chain of thought happens in latent space rather than in English. It demonstrated better results, since language is not as expressive as the concepts that can be represented in the layers before the decoder. I wonder if o3 is doing that?
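The difference can be sketched in a toy numpy example (all weights, dimensions, and the `step` function below are made up for illustration, not any real model): standard chain of thought collapses each reasoning step to a discrete token and re-embeds it, while latent reasoning feeds the full hidden state straight back in.

```python
import numpy as np

rng = np.random.default_rng(0)
D, V = 8, 16  # hidden size, vocabulary size (toy values)

# Stand-ins for a transformer's internals (randomly initialized here).
W_h = rng.standard_normal((D, D)) * 0.1    # "reasoning" layer
W_out = rng.standard_normal((D, V)) * 0.1  # decoder head: hidden -> vocab logits
E = rng.standard_normal((V, D)) * 0.1      # token embedding table

def step(h):
    """One hypothetical forward pass producing the next hidden state."""
    return np.tanh(h @ W_h)

h0 = rng.standard_normal(D) * 0.1

# Standard chain of thought: every step is forced through the vocabulary.
h_tok = h0.copy()
for _ in range(3):
    h_tok = step(h_tok)
    tok = int(np.argmax(h_tok @ W_out))  # collapse to one of V discrete tokens
    h_tok = E[tok]                       # re-embed: anything beyond that token is lost

# Latent chain of thought: skip the decode/re-embed bottleneck entirely;
# the full D-dimensional state carries forward between steps.
h_lat = h0.copy()
for _ in range(3):
    h_lat = step(h_lat)
```

The point of the sketch: after each token-based step, the state is restricted to one of V embedding rows, whereas the latent state can sit anywhere in the continuous D-dimensional space.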
From what I can see, presuming o3 is a progression of o1 and that the accountability bubbling up during 'inference' (i.e., "Thinking about ___") is a faithful reflection of what's happening, I'd say it's just using up millions of old-school tokens (the 44 million tokens that are referenced). So not latent thinking per se.