Papers We Love Milano: Inside the Mind of an LLM
Schedule
Thu Oct 10 2024 at 07:00 pm to 09:00 pm UTC+02:00
Location
eDreams ODIGEO Tech Hub | Milano, LO
About this Event
Elevator Pitch
Recent advances in Large Language Model (LLM) explainability have yielded intriguing results. This talk will explore the latest breakthroughs: how researchers discovered that Llamas “think” in English, and how they altered Claude’s belief system so that it gave the highest importance to the Golden Gate Bridge. Finally, we'll examine the implications of these findings for AI privacy and security.
Abstract
In 2022 and 2023, explaining the inner workings of large language models (LLMs) seemed like a daunting task. However, studies published in 2024 have led to significant breakthroughs in this area. This talk will explore three of the most important discoveries in LLM explainability.
1. Llamas “Think” in English [1]
Researchers from EPFL have revealed that models from the Llama 2 family of LLMs use English as their internal representation, regardless of the input or output language. This behaviour may account for certain biases in the models' style when used with non-English languages.
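One common way to probe a claim like this is a "logit lens": take the hidden state at an intermediate layer, project it through the model's output (unembedding) matrix, and see which token it would predict at that depth. The sketch below is only a toy illustration of that idea, not the authors' code. The vocabulary, the matrix and every hidden-state value are made-up assumptions chosen so that the middle layer leans towards the English token.

```python
import math

# Toy vocabulary: the same concept in French, English and Chinese.
VOCAB = ["fleur", "flower", "花"]

# Hypothetical unembedding matrix: one row of weights per vocabulary token.
UNEMBED = [
    [1.0, 0.0, 0.0],  # fleur
    [0.0, 1.0, 0.0],  # flower
    [0.0, 0.0, 1.0],  # 花
]

# Hypothetical hidden states at three depths for a French-to-Chinese prompt.
# These numbers are invented for illustration; they come from no real model.
HIDDEN_BY_LAYER = {
    "early": [0.4, 0.3, 0.3],
    "middle": [0.2, 2.0, 0.5],
    "final": [0.1, 0.6, 2.5],
}


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(hidden):
    """Project a hidden state onto the vocabulary and return the top token."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in UNEMBED]
    probs = softmax(logits)
    best = max(range(len(VOCAB)), key=lambda i: probs[i])
    return VOCAB[best], probs[best]


def decode_all_layers():
    """Map each layer name to the token its hidden state would predict."""
    return {layer: logit_lens(h)[0] for layer, h in HIDDEN_BY_LAYER.items()}


if __name__ == "__main__":
    for layer, hidden in HIDDEN_BY_LAYER.items():
        token, prob = logit_lens(hidden)
        print(f"{layer:>6}: {token} (p={prob:.2f})")
```

With these invented values, the early layer decodes to the French input token, the middle layer to the English token, and the final layer to the Chinese output token, which is the pattern the section describes.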
2. Monosemantic Features in Claude 3 and GPT-4 [2]
Researchers from Anthropic, later followed by OpenAI, have succeeded in decomposing the internal representations of Claude 3 Sonnet and GPT-4 into monosemantic features, each associated with a single interpretable concept. This discovery enables a deeper understanding of which parts of the model are associated with specific topics and allows the relative importance of these topics to be adjusted. This technique holds promise for aligning LLMs with ethical values.
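The decomposition described here is done with sparse autoencoders: an encoder maps a model activation to a wider set of mostly-zero feature activations, and a decoder maps them back. Adjusting a topic's importance then amounts to clamping one feature and writing the change back into the activation. The sketch below is a minimal toy version under stated assumptions: the weights are hand-picked rather than trained, the feature labels are hypothetical, and the dimensions are tiny.

```python
# Hypothetical feature labels; a real autoencoder would have far more features.
FEATURES = ["golden_gate_bridge", "cooking"]

# Hand-picked encoder and decoder weights (not trained, not from a real model).
W_ENC = [
    [1.0, 0.0, 0.0],  # feature 0 reads the first activation dimension
    [0.0, 1.0, 0.0],  # feature 1 reads the second activation dimension
]
B_ENC = [-0.5, -0.5]  # negative bias keeps weak signals at zero (sparsity)
W_DEC = [
    [1.0, 0.0, 0.0],  # direction feature 0 writes back into the activation
    [0.0, 1.0, 0.0],  # direction feature 1 writes back into the activation
]


def encode(activation):
    """Feature activations: ReLU(W_enc @ x + b_enc)."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, activation)) + bias)
        for row, bias in zip(W_ENC, B_ENC)
    ]


def decode(features):
    """Reconstruct an activation as a weighted sum of decoder directions."""
    dim = len(W_DEC[0])
    return [sum(f * row[j] for f, row in zip(features, W_DEC)) for j in range(dim)]


def steer(activation, feature_index, value):
    """Clamp one feature to `value` and apply the resulting change.

    Only the difference between the clamped and the original reconstruction
    is added, so whatever the autoencoder fails to reconstruct is preserved.
    """
    features = encode(activation)
    clamped = list(features)
    clamped[feature_index] = value
    delta = [a - b for a, b in zip(decode(clamped), decode(features))]
    return [x + d for x, d in zip(activation, delta)]


if __name__ == "__main__":
    activation = [0.2, 1.5, 0.3]
    print("features before:", dict(zip(FEATURES, encode(activation))))
    steered = steer(activation, FEATURES.index("golden_gate_bridge"), 10.0)
    print("features after: ", dict(zip(FEATURES, encode(steered))))
```

In this toy, the input activates only the "cooking" feature. After clamping, the "golden_gate_bridge" feature dominates while the other dimensions are left untouched.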
3. LLMs Memorize Unusual Data [3, 4]
Recent research has shown why LLMs tend to memorize specific samples that are outliers compared to the normal data distribution. Unique strings, such as names and personal information, are particularly prone to memorization and can be reproduced by the models, especially when a prompt pushes them into similarly unusual regions of the data space. This finding explains instances where GPT-4 has shared personal information when prompted to repeat the same word indefinitely.
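The intuition can be illustrated with a deliberately simple stand-in for a language model. The sketch below trains a bigram counter on a tiny corpus and decodes greedily. It is an analogy under stated assumptions, not the mechanism inside an LLM: the corpus, the fake contact record and the function names are all invented for this example. Common words have many competing continuations, while the tokens of the unique record have exactly one, so a prompt that lands on the record replays it verbatim.

```python
from collections import Counter, defaultdict

# Tiny invented corpus: three ordinary sentences and one unique record.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat lay on the rug",
    "contact zed at zed@example.invalid today",  # hypothetical outlier sample
]


def train_bigrams(sentences):
    """Count, for every token, which tokens follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def greedy_continue(counts, start, max_tokens=10):
    """Repeatedly append the most frequent next token, starting from `start`."""
    out = [start]
    while len(out) < max_tokens and counts[out[-1]]:
        out.append(counts[out[-1]].most_common(1)[0][0])
    return " ".join(out)


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    # A common prompt blends several training sentences together.
    print(greedy_continue(model, "the"))
    # A prompt inside the outlier has a single possible path: the record itself.
    print(greedy_continue(model, "contact"))
```

Starting from "contact", the toy model reproduces the whole invented record, whereas starting from "the" it produces a blend that matches no single training sentence.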
Finally, we will discuss the implications of these findings for privacy and security in the context of LLMs.
Time Split
Talk Structure
- 0-5 mins: Introduction - Why LLMs are challenging to explain
- 5-10 mins: Llamas “think” in English
- 10-25 mins: Monosemantic features in Claude 3 and GPT-4
- 25-35 mins: LLMs Memorize Unusual Data
- 35-50 mins: Conclusion - Impact on security and privacy
Resources
- [1] Wendler, Chris, et al. "Do Llamas Work in English? On the Latent Language of Multilingual Transformers." arXiv preprint arXiv:2402.10588 (2024).
- [2] Templeton, Adly, et al. "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet." Anthropic, 2024.
- [3] Jarmul, Katharine. Practical Data Privacy. O'Reilly Media, 2023.
- [4] Jarmul, Katharine. "Your Model Probably Memorized the Training Data." PyData Berlin, 2024.
Where is it happening?
eDreams ODIGEO Tech Hub, Via Gustavo Fara, 26, Milano, Italy
EUR 0.00