This is a continuation of Semantic leakage - quick notes.
He likes koalas
Here is a semantic leakage example that I could replicate:
Prompt: Please finish the sentence: He likes koalas. His favorite food is
GPT 4o:
bamboo leaves, though he sometimes snacks on eucalyptus, especially if it's fresh. Koalas have a very specific diet!
But on a second attempt, 4o seems to notice the confusion:
bamboo leaves, just like a koala! But if he's not a koala himself, maybe his favorite food is something hearty and cozy, like a warm bowl of pasta or a fresh fruit salad. What do you think it would be?
Claude Sonnet 3.5 (current):
... eucalyptus leaves! Koalas are highly specialized to eat eucalyptus, which makes up nearly their entire diet.
Mistral:
eucalyptus leaves, just like real koalas!
On a second try (without leaving the chat - I think this shows that ollama keeps the previous conversation context rather than clearing it, since the pattern is consistent: the first try shows semantic leakage, the second does not - see the sketch at the end of this section):
not actually eucalyptus leaves, but he enjoys watching videos of them eating them on YouTube!
And Llama 3.2 seems to exhibit a different semantic leakage (in the 1B and 3B versions):
avocado, a popular fruit that's also grown in Australia.
It is the pronoun reference ('he'/'his') where the attention mechanism seems to get confused - not very surprising.
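To separate the leakage itself from any carried-over chat context, one can re-send the prompt as independent requests. A rough sketch, assuming the ollama Python package and a locally pulled llama3.2:3b model (the model tag is just an example - any local model would do):

    # Rough sketch: each generate() call sends the prompt on its own,
    # with no prior chat history, so every try starts from a fresh context.
    # Assumes the `ollama` Python package and a locally pulled llama3.2:3b model.
    import ollama

    PROMPT = "Please finish the sentence: He likes koalas. His favorite food is"

    for i in range(5):
        resp = ollama.generate(model="llama3.2:3b", prompt=PROMPT)
        print(f"try {i + 1}: {resp['response']}")

If the eucalyptus answer keeps showing up across these fresh tries, then the "first try leaks, second try doesn't" pattern above is about the accumulated conversation, not about anything changing inside the model.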
“He likes green” - causality and distributions
There was another leakage example in the article:
Prompt: He likes green. He works as a
GPT 4o: landscape architect
When I tried this with Claude Sonnet 3.5 (current), I got gardener twice, landscaper once and teacher once - a distribution with a bias towards professions working with vegetation, but not concentrated exclusively on them (a sampling sketch at the end of this section shows how one could estimate such a distribution more systematically). I got similar results with GPT 4o, while 4o-mini produced a distribution with much less leakage.
But this is a complex case - it is conceivable that if someone likes green, then he also likes to work with vegetation and becomes a landscape architect. But how strong is that relation? What distribution of professions should we expect here? Maybe one could do an experiment and estimate it. But how could LLMs infer that distribution from internet texts?
A similar causal link exists in the original example of semantic leakage - between liking yellow and being a school bus driver. On the margin, liking yellow should increase the likelihood that someone becomes a school bus driver. But that link would be much weaker than in the case of liking green and working with vegetation.
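Estimating the model's side of that distribution needs nothing more than repeated sampling: send the same prompt many times and tally what comes back. A rough sketch, assuming the openai Python client (>= 1.0) and an OPENAI_API_KEY in the environment; the model name, sample size and the crude first-word tallying are all just placeholders for a more careful experiment:

    # Rough sketch: sample the completion N times and count the first word
    # of each answer as a stand-in for the profession. Deliberately crude -
    # "landscape architect" gets counted as "landscape".
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment
    PROMPT = "He likes green. He works as a"
    N = 50  # sampling budget

    counts = Counter()
    for _ in range(N):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": PROMPT}],
            max_tokens=8,
            temperature=1.0,
        )
        completion = (resp.choices[0].message.content or "").strip()
        first_word = completion.split()[0].strip(".,!").lower() if completion else "(empty)"
        counts[first_word] += 1

    # Print the estimated distribution of first words (roughly, professions).
    for word, count in counts.most_common():
        print(f"{word}: {count / N:.0%}")

This only measures the model's completion distribution, not the real-world distribution of professions among people who like green - but it would turn the "twice gardener, once landscaper, once teacher" observation into something with error bars.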
Undefined behavior
In traditional software we sometimes have programming language constructs with undefined semantics. This is unavoidable with LLMs too. The problem is how to identify these situations and not have demons fly out of your nose.
The distractors in the GSM Symbolic paper are a similar case: there are many reasonable ways to treat the prompts, the researchers seem to choose one, and when the LLM deviates from it they declare the model faulty. We need to work a bit more on the question of what we really expect from the models.