Openness is not a strategy
Thank you, I really enjoyed reading this.
One thought regarding this passage: 'There is also a potential alternative route that takes advantage of LLMs' power as data generators.'
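A recent paper shows that training on synthetic data is problematic: when models are trained on data generated by earlier models, the tails of the original distribution become distorted or disappear altogether (the authors call this "model collapse"). https://www.nature.com/articles/s41586-024-07566-y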
So there may be a natural limit to how useful LLMs are for generating training data.
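To make the tail effect concrete, here is a toy sketch. It is my own simplification, not the paper's experiment: a one-dimensional Gaussian stands in for the "model", and each generation is "trained" only on samples drawn from the previous generation's fit. The function name and parameters (`simulate_collapse`, `n_samples`, `generations`) are made up for illustration.

```python
import random
import statistics


def simulate_collapse(n_samples=10, generations=1000, seed=0):
    """Toy illustration (hypothetical, not the paper's setup): repeatedly
    fit a Gaussian to samples drawn from the previous generation's fit.
    Returns the fitted standard deviation after each generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [sigma]
    for _ in range(generations):
        # "Synthetic data": sample only from the current model.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # "Training": maximum-likelihood fit of mean and std to that data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data, mu)
        history.append(sigma)
    return history


history = simulate_collapse()
for g in (0, 10, 100, 1000):
    print(g, history[g])
```

With only a few samples per generation, the fitted standard deviation tends to drift toward zero over many generations, so values that were common in the original tails stop being generated at all. Real LLM training is of course far more complex, but the toy shows why repeatedly learning from your own samples can narrow a distribution.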