Abbas Zeitoun


2025

Language Modeling with Editable External Knowledge
Belinda Z. Li | Emmy Liu | Alexis Ross | Abbas Zeitoun | Graham Neubig | Jacob Andreas
Findings of the Association for Computational Linguistics: NAACL 2025

When the world changes, so does the text that people write about it. How do we build language models that can be easily updated to reflect these changes? One popular approach is retrieval-augmented generation (RAG), in which new documents are inserted into a knowledge base and retrieved during prediction for downstream tasks. Most prior work on RAG has focused on improving model behavior during *prediction* through better retrieval or reasoning. This paper introduces ERASE, which instead improves model behavior **when new documents are acquired**, by incrementally deleting or rewriting other entries in the knowledge base each time a document is added. In two new datasets evaluating models’ ability to answer questions about a stream of news articles or conversations, ERASE improves accuracy relative to conventional retrieval-augmented generation by 7-13% (Mixtral-8x7B) and 6-10% (Llama-3-8B) absolute. This improvement is complementary to improved retrieval or reasoning for RAG: we demonstrate an 11% improvement by applying ERASE to SelfRAG.
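To make the knowledge-base editing idea concrete, here is a minimal illustrative sketch of an ERASE-style ingestion loop. It is not the authors' implementation: the class, function names, the keyword-overlap retriever, and the `toy_judge` stand-in for an LM call are all hypothetical, chosen only to show the core loop of reconciling existing entries whenever a new document is added.

```python
from dataclasses import dataclass, field

# Sketch of an ERASE-style update loop (hypothetical names, not the paper's
# code). The idea: when a new document arrives, inspect existing knowledge-base
# entries related to it and delete or rewrite the ones it invalidates,
# instead of only appending the new document.

@dataclass
class KnowledgeBase:
    entries: list[str] = field(default_factory=list)

    def related(self, document: str) -> list[str]:
        # Stand-in retriever based on keyword overlap. A real system would
        # use a sparse or dense retriever over the knowledge base.
        doc_words = set(document.lower().split())
        return [e for e in self.entries if doc_words & set(e.lower().split())]

    def ingest(self, document: str, judge) -> None:
        """Add `document` and reconcile entries it contradicts or outdates."""
        for entry in self.related(document):
            # `judge` returns "keep", "delete", or rewritten entry text.
            verdict = judge(entry, document)
            if verdict == "keep":
                continue
            self.entries.remove(entry)
            if verdict != "delete":
                self.entries.append(verdict)  # replace with rewritten entry
        self.entries.append(document)


def toy_judge(entry: str, document: str) -> str:
    # Placeholder for an LM call deciding whether `document` leaves `entry`
    # intact, makes it false (delete), or requires a rewrite.
    if "no longer" in document and entry.split()[0] in document:
        return "delete"
    return "keep"


kb = KnowledgeBase(["Ana is the mayor of Springfield."])
kb.ingest("Ana is no longer mayor; Lee won the Springfield election.", toy_judge)
print(kb.entries)  # stale entry removed, new document added
```

In this sketch the reconciliation work happens at ingestion time, which matches the abstract's framing: the knowledge base stays internally consistent as documents stream in, so any downstream RAG pipeline retrieves from an already-updated store.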