For all the Model Cards and License notices, I find it interesting there is not ... | Hacker News


belter on March 27, 2024 | on: DBRX: A new open LLM

For all the model cards and license notices, I find it interesting that there is not much information on the contents of the dataset used for training, specifically whether it contains data subject to copyright restrictions. Or did I miss that?

brucethemoose2 on March 28, 2024

Yeah, it's an unspoken but rampant thing in the LLM community. Basically no one respects licenses for training data.

I'd say the majority of instruct tunes, for instance, use OpenAI output (which is against OpenAI's TOS).

But it's all just research! So who cares! Or at least, that seems to be the mood.
