Language Model Contains Personality Subnetworks

(arxiv.org)

25 points | by PaulHoule 5 hours ago

11 comments

D-Machine 18 minutes ago

The personality thing seems kind of tautological / uninteresting, as I have pointed out before: https://news.ycombinator.com/item?id=46905692.
Psychological instruments and concepts (like MBTI) are constructed from the semantics of everyday language. Personality models (being based on self-report, and not actual behaviour) are not models of actual personality, but of the correlation patterns in the language used to discuss things semantically related to "personality". It would thus be extremely surprising if LLMs (trained on people's discussions and thinking about personality) did not also learn similar correlational patterns (and thus produce similar patterns of responses when prompted with questions from personality inventories).
The real and more interesting part of the paper is the use of statistical techniques to isolate sub-networks which can then be used to emit outputs more consistent with some desired personality configuration. There is no obvious reason to me that this couldn't be extended to other types of concepts, and it kind of reads to me like a way of doing a very cheap, training-free sort of "fine-tuning".

- devmor 13 minutes ago
  
  Thank you, I came here to say as much in less eloquent terms.
  It's not surprising to find clustered sentiment from a slice of statistically correlated language. I wouldn't call this a "personality" any more than I would say the front grill of a car has a "face".
  Deterministically isolating these clusters, however, could prove to be an incredibly useful technique for both using and evaluating language models.
sarducci an hour ago

To me, this suggests that language strongly influences behavior.

- mitthrowaway2 25 minutes ago
  
  My interpretation is that it's the other way around. The language model trainer's job is to find the network weights that make the model best at compressing the data in the training set. So what this means is that, say, professional work-speak text samples and hacker l33t-speak text samples are different enough that they end up being predicted by different sparse sub-networks; it was apparently too hard to find a smaller solution in which the same sub-network weights predict both outputs.
- yorwba 29 minutes ago
  
  All LLM behavior is mediated through language by construction. That doesn't mean the same applies to humans.
- soulofmischief an hour ago
  
  I think specifically, certain psychological modes require different levels of articulation, and language is one way to get there in a bandwidth-limited system.
  See also: https://en.wikipedia.org/wiki/Newspeak
  
  - PaulHoule 24 minutes ago
    
    People are fascinated by controlling the vocabulary for political purposes but I think it mostly doesn't work. "Illegal Alien" is the exception that proves the rule.
    Usually it results in an "equal and opposite backlash". Once they started calling children "Special" in school, "Special" became the ultimate insult.
    
    - D-Machine 11 minutes ago
      
      It is a wordcel problem, i.e. the belief that language is all there is for modeling reality, even though this is obviously false and has been clearly disproven by decades of research in psychology, cognitive science, and neuroscience. At best we can say that sometimes language has a strong influence on our perceptions of reality.
      EDIT: For a neuroscience reference that also argues why the general perspective is obviously false: https://pmc.ncbi.nlm.nih.gov/articles/PMC4874898/. But really, these things ought to be obvious from introspection.
- uoaei 33 minutes ago
  
  Language constrains your perception of reality to only the set of concepts conceivable within that language.
  Agents who only speak Rust have no conception of what runtime errors are, for instance. Fascists won't understand concepts like "universal human rights" as in their worldview there is nothing universal about humanity as a whole.
  
  - D-Machine 14 minutes ago
    
    This is IMO largely false, and empirically things like Sapir-Whorf and strong linguistic relativism are widely considered disproven [1-2].
    This is also sort of a wordcel take, in that it neglects that there are plenty of mental structures that are not solely linguistic. I.e. visuo-spatial models, auditory models, kinaesthetic, proprioceptive, emotional, or even maybe intuitive models, and symbolic models (which have both linguistic and visuo-spatial aspects). Yes, your models constrain your perception of reality, but it is not clear how important language really is to many of those models.
    [1] https://en.wikipedia.org/wiki/Linguistic_relativity
    [2] https://plato.stanford.edu/archives/sum2015/entries/relativi...
  - PaulHoule 25 minutes ago
    
    I'd argue that people can put words together to make new meanings or coin new words when they have to. The real magic of language is not "we have words for everything" but "we have grammar".