I think it's amazing that licenses are ignored to train a model, but companies then try to impose a license on the use of the same model. It would be nice if there were a training BOM that came with a model. And if one isn't included, all rights to control the use of the model would be forfeit.
> I think it's amazing that licenses are ignored to train a model, but companies then try to impose a license on the use of the same model.
There are existing analogies, like encyclopedias and dictionaries.
One interesting aspect of those sorts of consolidation works is that they may contain errors and other artifacts, included specifically to distinguish duplications of their work from new from-scratch work.
I don't think those are good analogies. An encyclopedia contains references to or summaries of a concept or idea, but not a compressed volume of all possible text. A closer analogy would be an unauthorized "collected works" of your favorite HN commenter, packaged up and resold.
It also feels similar to the recent article posted about photography and how during its early days pictures were used for advertising without the consent of those photographed. [0]