    Unsupervised Elicitation of Language Models (arxiv.org)
    128 points by kordlessagain - 1 day ago

  • Philosophically, this looks like breaking the training data limit in the same way that humans do: by using an internally consistent view of the world to imagine new scenarios and integrate them into an updated worldview.
    by unchocked - 1 day ago
  • Exciting news, but who watches the watchmen?
    by robinduckett - 1 day ago
  • > our goal is to fine-tune a pretrained model on its own generated labels

    Haven't all the big labs been doing this for a couple years now? It's a good idea, with great execution, but it's far from novel.

    https://en.wikipedia.org/wiki/Weak_supervision

    by Herring - 1 day ago
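
    A minimal self-training (pseudo-labeling) sketch of the weak-supervision loop referenced above, in Python. The Model interface (predict_proba, fine_tune) is hypothetical, and this is the generic self-training recipe, not the paper's actual procedure:

        def self_train(model, unlabeled_texts, rounds=3, threshold=0.9):
            """Fine-tune a model on its own confident predictions (pseudo-labels)."""
            for _ in range(rounds):
                pseudo_labeled = []
                for text in unlabeled_texts:
                    probs = model.predict_proba(text)          # model labels its own data
                    label, conf = max(probs.items(), key=lambda kv: kv[1])
                    if conf >= threshold:                      # keep only confident guesses
                        pseudo_labeled.append((text, label))
                if not pseudo_labeled:
                    break
                model.fine_tune(pseudo_labeled)                # train on self-generated labels
            return model
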
  • > However, as tasks and model behaviors grow more complex, human supervision becomes increasingly unreliable: LMs can learn to mimic mistakes in demonstrations or exploit flaws in feedback. How do we train LMs to do tasks that are too difficult for humans to demonstrate or evaluate reliably?

    I didn't read the whole paper, but it seems important that you still need real ground truth to measure improvement, so you still need to get real labels at some point. The task they focus on where LLMs have "superhuman" performance is guessing the gender of blog authors. While humans are bad at guessing this, they are decent at remembering their own gender, and a bunch of them are willing to write a blog post, so there's obviously a better way to get supervised examples than asking humans to guess labels: collect posts from authors whose gender is known. i.e. "human-generated labels are low quality" should not be taken to mean "good labels are not available, so we should go fully unsupervised".

    So since you already need some real ground truth to know whether your algorithm accomplished anything, I think it's fair to ask: when would you commit to using _all_ your labeled data for evaluation and none for fine-tuning, as described in this work? Logical consistency seems valuable, sure, but really you'd want to use both consistency and some (small?) number of labeled examples, plus a perhaps larger number of self-labeled examples. In their loop where they revise labels to be more coherent, it seems natural to imagine that pre-provided labels should be stickier than self-generated ones, but not immutable, because there's always some chance of noise in your upstream data-generation process.

    by abeppu - 1 day ago
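
    One way to picture the "stickier labels" suggestion above is a consistency-revision loop that charges a higher price for flipping a human-provided label than a self-generated one. This is a hedged Python sketch with a hypothetical consistency_gain scorer, not the paper's algorithm:

        def revise_labels(labels, is_gold, consistency_gain,
                          gold_penalty=3.0, self_penalty=1.0, max_sweeps=10):
            """labels: dict index -> 0/1 label; is_gold: dict index -> bool.
            consistency_gain(idx, flipped, labels) scores how much more mutually
            consistent the label set becomes if labels[idx] were flipped."""
            for _ in range(max_sweeps):
                changed = False
                for idx, current in list(labels.items()):
                    flipped = 1 - current
                    gain = consistency_gain(idx, flipped, labels)
                    # Human-provided labels are "stickier": they need a larger
                    # consistency improvement before being overwritten.
                    cost = gold_penalty if is_gold[idx] else self_penalty
                    if gain > cost:
                        labels[idx] = flipped
                        changed = True
                if not changed:
                    break
            return labels
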
  • I was intrigued that one of the researchers was listed as "independent", so I checked her out:

    https://lindapetrini.com

    It looks like she's a science communicator rather than a scientist herself. That's interesting... I'm not used to seeing academic papers that include an author devoted entirely to the writing aspect. (Then again, maybe I just haven't noticed?)

    by md224 - 23 hours ago
  • I mostly skimmed, but I was trying to understand how they came up with "superhuman" as a description, and it seems like a stretch?

    This might seem like a nit, but the term "superhuman" is a VERY strong one to my mind. It doesn't suggest "better than the average human off the street at a particular random task"; it suggests "better than humans are capable of getting with training, at a high percentile level".

    One of the biggest advantages of LLMs as a tool is that they are generally quite good across a broad variety of tasks without needing a ton of further domain-specific training. Humans tend to be the opposite.

    It doesn't seem like they gave much training to the human annotators they recruited, whereas an LLM trained on the internet has seen a LOT of blog posts plus associated metadata. And nobody has ever really bothered figuring out "how would we best train humans to identify the gender of blog post authors" - there's very little economic incentive for it. It's not like we generally teach people to write in gender-specific ways in school either, so we haven't been formally instructed on potential differences. We'd have to rely on broad-brush generalizations if not given an opportunity to do a deep dive and find more specific tendencies.

    But if you paid people to study a large chunk of the corpus they're using here for a couple of years, focusing consciously on post style, content, and the authors' gender, and then tested them on the posts you held out... how well could they do?

    by majormajor - 22 hours ago
  • So LLMs have their AlphaGo Zero moment, where training on human data becomes a has-been? Sounds exciting? Terrifying?
    by brumar - 21 hours ago
  • Marks' paper with Max Tegmark, "Geometry of Truth", is a great read, and I can see the ideas repeated here. I've been meaning to repro some of the geotruth paper…
    by clbrmbr - 21 hours ago
  • Can a practitioner explain the “golden” term used in the paper? I don’t understand how it differs from ground truth. Thank you!
    by vessenes - 15 hours ago
