BoorishBears | on: iPhone 17 Pro Demonstrated Running a 400B LLM
I'm using model performance as the comparison because inference speed is definitely not comparable to a 17B model when you're streaming model weights on and off disk storage.
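To make the bandwidth point concrete, here's a minimal Python sketch of disk-streamed inference: the weights stay memory-mapped on disk and only one layer is pulled into RAM at a time, so every forward pass pays storage latency instead of memory bandwidth. The file name, tensor shapes, and toy layer math are all hypothetical illustrations, not anything from the actual demo.

    # Hypothetical sketch of layer-by-layer weight streaming from disk.
    # Shapes and the toy layer computation are made up for illustration.
    import numpy as np

    HIDDEN = 4096      # hypothetical hidden size
    N_LAYERS = 8       # hypothetical layer count

    def make_toy_checkpoint(path):
        """Write one raw weight file so the demo is self-contained."""
        w = np.random.rand(N_LAYERS, HIDDEN, HIDDEN).astype(np.float32)
        w.tofile(path)

    def stream_forward(path, x):
        """Forward pass keeping only one layer's weights in RAM.

        np.memmap leaves the data on disk; each matmul faults pages in
        from storage, which is the bottleneck when a 400B model is run
        this way instead of resident in memory.
        """
        mm = np.memmap(path, dtype=np.float32, mode="r",
                       shape=(N_LAYERS, HIDDEN, HIDDEN))
        for layer in range(N_LAYERS):
            w = np.asarray(mm[layer])   # pull just this layer off disk
            x = np.tanh(x @ w)          # toy per-layer computation
            del w                       # let the OS evict those pages
        return x

    if __name__ == "__main__":
        make_toy_checkpoint("toy_weights.bin")
        x0 = np.random.rand(HIDDEN).astype(np.float32)
        print(stream_forward("toy_weights.bin", x0).shape)

Per token, this design re-reads every layer's weights from storage, so tokens/sec is bounded by disk read bandwidth rather than compute; a model whose weights fit in memory never pays that cost, which is why the two aren't comparable.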