1 Comment

Thank you, I really enjoyed reading this.

One thought regarding this line: 'There is also a potential alternative route that takes advantage of LLMs power as data generators.'

A recent paper shows that training on synthetic data is problematic, because the tails of the distribution get distorted or disappear altogether: https://www.nature.com/articles/s41586-024-07566-y

So there might be a natural limit to how useful LLMs can be for generating training data.
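
To make the tail argument concrete, here is a toy sketch of my own (not taken from the paper): fit a Gaussian to a small sample, draw the next "dataset" from that fit, and repeat. The function name `simulate_collapse`, the sample size, and the number of generations are all made-up choices for illustration, and a one-dimensional Gaussian is of course a very crude stand-in for an LLM.

```python
import random
import statistics


def simulate_collapse(n_samples=10, generations=200, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation at every generation,
    starting with the true value at generation 0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [sigma]
    for _ in range(generations):
        # Draw a finite synthetic dataset from the current model...
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # ...and fit the next model to that synthetic data only.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append(sigma)
    return history


history = simulate_collapse()
print(f"std at generation 0: {history[0]:.3f}")
print(f"std at generation {len(history) - 1}: {history[-1]:.3e}")
```

With small samples the fitted spread tends to drift downward from one generation to the next, so values far from the mean are sampled less and less often. That is the "tails disappear" effect in miniature, under the assumption that each generation sees only the previous generation's output and no fresh real data.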
