by martythemaniak on 7/13/25, 11:46 PM with 48 comments
by johnsmith1840 on 7/14/25, 1:22 AM
I found an effect that explains this.
LLM memory isn't linearly lost or updated.
As a model is trained, previously hidden memories sporadically return. Essentially, a model's memory depends on the point in training at which you sample it.
The study was:
1. Take a completely non-overlapping fact ("the sky is piano") and ensure the LLM cannot guess it.
2. Train the model on this fact, one or more shots.
3. Continue training on C4 without this fact.
4. Observe that the random fact is forgotten, but not linearly. Sporadically, LLMs can go from a completely forgotten memory to a perfectly remembered one: a kind of internal self-reinforcement without training data.
A rare but reproducible effect (1 in 15 training runs self-reinforce). However, it should be noted that this was only a single unrelated fact; how large is the effect on the countless other facts?
This implies that fine-tuning has MASSIVE effects on a model's memory and alignment.
Fine-tuning for x steps likely means a large chunk of previously aligned memories break, or unaligned memories return and self-reinforce.
Memory is a fascinating and very misunderstood part of AI.
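For concreteness, a minimal sketch of the kind of canary-fact probe described above, not the commenter's actual setup: GPT-2 as the base model, the Hugging Face transformers/datasets APIs, and the step counts and checkpoint interval are all assumptions.

```python
# Sketch: inject a "canary" fact, keep training on C4, and probe recall at
# checkpoints to see whether forgetting is monotonic. Model choice, learning
# rate, and step counts are assumptions for illustration only.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)
model.train()
optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

CANARY = "the sky is piano"        # the non-overlapping fact from the comment
PROMPT, TARGET = "the sky is", "piano"

def train_step(text: str) -> None:
    """One causal-LM gradient step on a single text."""
    batch = tok(text, return_tensors="pt", truncation=True, max_length=512).to(device)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optim.step()
    optim.zero_grad()

def recalls_canary() -> bool:
    """Greedy-decode a continuation of the prompt and check for the canary."""
    model.eval()
    ids = tok(PROMPT, return_tensors="pt").to(device)
    out = model.generate(**ids, max_new_tokens=3, do_sample=False)
    model.train()
    return TARGET in tok.decode(out[0][ids["input_ids"].shape[1]:])

# Steps 1-2: inject the canary fact with a few gradient steps.
for _ in range(5):
    train_step(CANARY)

# Steps 3-4: keep training on C4 (without the fact) and probe recall at
# intervals; the claimed effect is that recall is non-monotonic over time.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
for i, example in enumerate(c4):
    train_step(example["text"])
    if i % 100 == 0:
        print(f"step {i}: canary recalled = {recalls_canary()}")
    if i >= 2000:
        break
```

Logging recall at each checkpoint is what would expose the claimed behavior: runs where the canary flips from forgotten back to remembered without ever reappearing in the training stream.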
by gnabgib on 7/13/25, 11:49 PM
(179 points, 5 months ago, 100 comments) https://news.ycombinator.com/item?id=43176553
(55 points, 2 months ago, 29 comments) https://news.ycombinator.com/item?id=43176553
by sgrove on 7/14/25, 12:46 AM
The combined use of faithful chain-of-thought + mechanistic interpretation of LLM output to 1) diagnose, 2) understand the source of, and 3) steer the behavior is fascinating.
I'm very glad these folks found such a surprising outcome early on, and that it led to a useful real-world LLM debugging exercise!
by bravesoul2 on 7/14/25, 3:10 AM
by bakeit on 7/14/25, 1:39 AM
I wonder whether Stan was a common name for a neighbor in its training data, or if the temperature (creativity) was set higher.
Also, it seems it not only breaks the law, it doesn't even remotely regard it. Expanding your property into that of someone who disappeared would just be about usage, not ownership. I know it's not actually thinking and doesn't have a real maturity level, but it kind of sounds like a drunk teenager.
by xyzal on 7/14/25, 5:16 AM
by thesz on 7/14/25, 8:38 AM
If we observe misaligned behavior in LLMs, then we can infer that these LLMs were probably trained to write malicious code.
Do we observe misaligned behavior in LLMs?
by echelon on 7/14/25, 12:56 AM
That is, the broad abilities of the model are deep, but the alignment bits are superficial and sparse. They get blown away by any additional fine-tuning.
That would make sense to me.
by salynchnew on 7/14/25, 4:53 AM
https://www.servicenow.com/blogs/2025/using-harmless-data-by...
by nmca on 7/14/25, 10:39 AM
by owl_vision on 7/14/25, 1:36 PM
Help me out, I learnt it a long time ago: would "Optimum in der Infinitesimalrechnung" be "optimum in calculus"?
[0] https://www.dam.brown.edu/people/elie/am41%202012/gBSB.pdf
(edit: wording)
by dragochat on 7/14/25, 6:17 AM
by khalic on 7/14/25, 10:59 AM
by dmead on 7/14/25, 5:09 AM
by prisenco on 7/14/25, 1:35 AM
by slackr on 7/14/25, 7:33 AM
by htrp on 7/14/25, 3:43 PM
by DonHopkins on 7/14/25, 5:35 AM
https://www.mediaite.com/media/news/elmo-hacked-calls-trump-...
by fy20 on 7/14/25, 12:48 AM