Awesome Knowledge Injection


A curated map centered on VLM / MLLM knowledge injection with broad-capability retention, plus related inspirations such as knowledge editing and public industry routes.

Agent-assisted, human-curated, periodically refreshed.

🧩 Task Introduction

This awesome list targets the following business / research problem: once a VLM / MLLM already has strong general capability, how can we inject new domain knowledge, factual knowledge, task knowledge, or multimodal knowledge while preserving as much of the model's original broad capability as possible?

Typical scenarios include:

  • The model needs to absorb evolving knowledge such as news events, enterprise knowledge bases, vertical-domain knowledge, and multimodal facts.
  • The model should keep its original visual understanding, QA, OCR, reasoning, instruction-following, and safety behavior after knowledge updates.
  • Updating the model should not require full retraining every time; practical routes include SFT, distillation, PEFT (for example LoRA), replay, parameter regularization, and knowledge editing.
  • Evaluation should cover not only new-knowledge correctness, but also broad capability retention, old-knowledge retention, cross-modal consistency, OOD generalization, and side effects.

Repository scope:

  • The mainline focus is VLM / MLLM knowledge injection + broad capability retention.
  • LLM papers are included as upstream inspirations only when their method can plausibly transfer to multimodal knowledge injection.
  • Knowledge editing is included as a related inspiration branch because it is usually closer to small-scope / localized knowledge injection.
  • GitHub and Hugging Face resources are included only when they have independent value; official paper repos stay attached to paper rows.

Current coverage:

| Dimension | Coverage | Notes |
| --- | --- | --- |
| Main method tracks | 4 | Distill, Replay, Parameter Regularization, and LoRA Isolation |
| Core papers | 50+ | includes verified entries and Agent-indexed candidates; candidates are explicitly marked in tables |
| Related inspirations | 2 groups | LLM upstream methods and knowledge editing |
| GitHub projects | 7+ | only integration-oriented repos and infrastructure that do not duplicate paper repos |
| Benchmarks / datasets | 8+ | public resources for knowledge injection, continual learning, and editing evaluation |

Last updated: 2026-05-19

Maintenance note: this repository uses Agent-assisted updates. A lightweight update Agent periodically revisits arXiv, GitHub, company pages, and public benchmark pages to refresh links, stars, and candidate additions. All automated updates first go into a review-only workspace and only become canonical after human approval. See MAINTAINER_AGENT.md.

🚧 Coming soon: we are building a companion Project with implementation code across different base models and baseline methods.


📊 Benchmark Summary

Benchmarks should answer two questions at the same time: whether new knowledge was successfully injected and whether broad original capabilities were preserved. Therefore, evaluation should not only measure new-knowledge QA accuracy, but also retention, cross-modal consistency, OOD phrasing, continual-update stability, and localized-edit side effects.

| Evaluation dimension | Main question | Representative entries |
| --- | --- | --- |
| New-knowledge injection | Can the model correctly answer or use new facts, domains, or tasks? | KORE-74K, EVOKE / MMEVOKE |
| Broad capability retention | Do original VQA, OCR, visual recognition, reasoning, and instruction-following abilities degrade? | MLLM-CL, UCIT, VTCTrain, VLMEvalKit |
| Continual learning / forgetting | Does catastrophic forgetting or answer-format drift emerge after repeated updates? | CoIN-ASD, MLLM-DCL, MTIL |
| Knowledge-editing side effects | Does a local factual update affect unrelated facts or cross-modal consistency? | MMKE-Bench, MC-MKE, ComprehendEdit |
| OOD / evolving knowledge | Does the model generalize under paraphrases, cross-event questions, and cross-domain settings? | EVOKE / MMEVOKE, ImageWikiQA |
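
The two-question framing above (was new knowledge injected, and was broad capability preserved) can be sketched as a small reporting helper. This is a minimal illustration, not an official metric of any listed benchmark: the function name `summarize_update`, the `UpdateReport` fields, and the example scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UpdateReport:
    injection_accuracy: float  # accuracy on new-knowledge questions
    retention: dict            # benchmark name -> after / before score ratio
    worst_retention: float     # lowest ratio across the retained benchmarks


def summarize_update(new_knowledge_correct, before_scores, after_scores):
    """Combine injection accuracy with per-benchmark retention ratios.

    new_knowledge_correct: list of booleans, one per new-knowledge question.
    before_scores / after_scores: dicts mapping a benchmark name to its score
    before and after the knowledge update (hypothetical inputs).
    """
    if not new_knowledge_correct:
        raise ValueError("need at least one new-knowledge question")
    injection = sum(new_knowledge_correct) / len(new_knowledge_correct)
    # A ratio of 1.0 means the capability is fully retained on that benchmark.
    retention = {
        name: after_scores[name] / before_scores[name]
        for name in before_scores
        if before_scores[name] > 0
    }
    worst = min(retention.values()) if retention else 1.0
    return UpdateReport(injection, retention, worst)


# Made-up numbers purely for illustration.
report = summarize_update(
    new_knowledge_correct=[True, True, False, True],
    before_scores={"vqa": 80.0, "ocr": 60.0},
    after_scores={"vqa": 76.0, "ocr": 60.0},
)
print(report)
```

Reporting the worst retention ratio alongside the injection accuracy keeps a large gain on new facts from hiding a regression on a single original capability.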

Supporting Benchmarks and Evaluation Papers

| Time | Paper | Approach | Model type | Experimental task | Primary datasets / benchmarks | GitHub | Stars | Venue | Year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-05 | When Large Multimodal Models Confront Evolving Knowledge: Challenges and Explorations | defines evolving multimodal knowledge through EVOKE / MMEVOKE | VLM / MLLM | multimodal knowledge updating, evolving-knowledge evaluation | EVOKE / MMEVOKE | EVOKE-LMM/EVOKE | 113 | arXiv / ICLR 2026 | 2025 |
| 2025-02 | MMKE-Bench | benchmark for diverse visual knowledge editing | VLM / MLLM | multimodal knowledge editing evaluation | MMKE-Bench | - | - | ICLR 2025 | 2025 |
| 2024-12 | ComprehendEdit | more complete multimodal editing data and evaluation framework | VLM / MLLM | comprehensive multimodal editing evaluation | ComprehendEdit | - | - | arXiv | 2024 |
| 2024-06 | MC-MKE | evaluation centered on cross-modal consistency after editing | VLM / MLLM | cross-modal consistency evaluation | MC-MKE | - | - | Findings of ACL 2025 | 2024 |

Benchmarks and Datasets

This section keeps only data / benchmark / code entry points. Paper-level interpretation lives in the research-direction tables below.

| Name | Type | Links | Notes |
| --- | --- | --- | --- |
| EVOKE / MMEVOKE | benchmark + code + dataset | code / dataset | Main benchmark entry for evolving multimodal knowledge. |
| KORE-74K | dataset + code | code / dataset | Data entry for retention-aware multimodal injection. |
| MLLM-CL | benchmark + dataset | dataset | Core entry for domain-vs-ability continual-learning evaluation. |
| UCIT | dataset | dataset | Continual instruction-tuning dataset. |
| VTCTrain | dataset | dataset | Continual training / instruction-style task data. |
| DCL_10Percent_with_RAG | dataset | dataset | Connects domain continual learning with RAG-style support. |
| MMKE-Bench | benchmark data | dataset | Multimodal knowledge-editing benchmark data. |
| MC-MKE | benchmark data | dataset | Useful when cross-modal consistency matters. |
| KnowEdit | dataset | dataset | Adjacent knowledge-editing data entry. |

🔬 Research Directions

The mainline of this README focuses on one target problem: VLM / MLLM knowledge injection while preserving broad general capability. Under a continual-learning lens, methods can be organized into four categories:

  • Distill
  • Replay
  • Parameter Regularization
  • LoRA Isolation

At the moment, the strongest direct public results for VLM / MLLM knowledge injection + retention are mostly concentrated in Parameter Regularization and LoRA Isolation. Distill and Replay remain important mainline directions; within Distill, we now have a meaningful set of direct VLM / MLLM papers, but many of the most mature training recipes and analyses still come from the LLM setting.

Direction Overview

| Category | Core idea | Strengths | Weaknesses | Current public status |
| --- | --- | --- | --- | --- |
| Distill | use a teacher / old model distribution to constrain updates | does not always require storing a large old dataset; directly preserves behavior | sensitive to teacher quality and training cost; can distill wrong distributions too | direct VLM / MLLM papers now exist, but the knowledge-injection-with-retention setting is still less mature than parameter-regularization lines |
| Replay | mix old samples, replay buffers, or data mixtures during training | simple and often strong in practice; easy to reason about | needs stored or synthesized old data; privacy, storage, and training cost are higher | current public evidence is still more recipe- and diagnosis-like |
| Parameter Regularization | constrain important parameters, update directions, or low-rank spaces | lightweight, PEFT-friendly, easy to plug into existing training stacks | may under-inject if regularization is too strong; hyperparameter-sensitive | currently the strongest and densest direct public mainline |
| LoRA Isolation | isolate knowledge via separate LoRAs, experts, or routing paths | strong interference control; modular | higher parameter / deployment complexity; routing adds design overhead | few public representatives so far, but the direction is clear |

Except for Distill, tables below are sorted chronologically, newest first. Distill is further split by distillation signal source, loss design, and model type. Online distill and self distill can overlap, so each paper is filed by its dominant contribution while cross-tags are kept in the position field. GitHub / Stars records only public official repositories; newly added rows use snapshots verifiable around 2026-05-19, while older rows keep their original snapshots.

Direction 1: Training Signals and Data Retention (Distill / Replay)

Distill

This remains a mainline category for the target problem. Based on the 2026-05-18 Agent discovery run, the initially proposed four-way split is mostly valid, but this README uses five buckets: Direct MLLM / VLM multimodal distillation; Online Distill: loss and token weighting; Self Distill: self-teacher / privileged context; Online Distill + RL / RLVR; and Mechanism / Survey / Diagnosis. The reason is that the two labels describe different axes: online distill means training on the student's on-policy trajectories, while self distill means the teacher comes from the same model as the student. The two axes can overlap.

| Subcategory | Filing criterion | Common training signals / losses | Relevance to knowledge injection and retention |
| --- | --- | --- | --- |
| Direct MLLM / VLM multimodal distillation | the experimental target is directly VLM / MLLM / Video-LLM | logit distillation, feature distillation, visual-token distillation, cross-modal token interaction, structured video feedback | closest to the repository's main problem; highest priority |
| Online Distill: loss and token weighting | the student learns on its own generated trajectories with dense teacher / verifier / rubric supervision | KL / reverse-KL, sampled-token NLL, top-K support matching, token-importance weighting, teacher-uncertainty weighting, control variates | directly informs how to constrain output distribution drift during knowledge injection |
| Self Distill: self-teacher / privileged context | teacher and student share the same base model; the teacher receives reference answers, feedback, harnesses, or other privileged context | privileged-context KL, logit steering, reward-regularized KL, reflection-enhanced feedback, teacher-exposure scheduling | useful when no strong external teacher is available, but can reinforce the model's own errors |
| Online Distill + RL / RLVR | combines sparse reward / GRPO / RLVR with dense distillation signals | advantage / reward weighted distillation, sample routing, post-RL compaction, dense credit assignment | highly relevant for the target setting: sparse reward defines the new target, distillation stabilizes behavior |
| Mechanism / Survey / Diagnosis | main contribution explains when OPD / OPSD works or fails | failure modes, teacher-prefix drift, tokenizer mismatch, teachability collapse, capability-loss accounting | helps decide whether LLM recipes should be transferred to MLLMs |
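
To make the KL / reverse-KL and token-weighting signals listed above concrete, here is a minimal pure-Python sketch. It is a generic illustration, not the loss of any specific paper in this list. The function names, the `mode` flag, and the `token_weights` argument are assumptions made for this example. Forward KL is taken as KL(teacher || student) and reverse KL as KL(student || teacher).

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one token's vocabulary logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distill_loss(student_logits, teacher_logits, mode="forward",
                 token_weights=None, temperature=1.0):
    """Weighted mean of per-token KL terms between teacher and student.

    mode="forward" uses KL(teacher || student), mode="reverse" uses
    KL(student || teacher). token_weights (hypothetical) lets a caller
    emphasize important tokens; uniform weights are used when it is omitted.
    """
    if mode not in ("forward", "reverse"):
        raise ValueError("mode must be 'forward' or 'reverse'")
    weights = token_weights or [1.0] * len(student_logits)
    if sum(weights) <= 0:
        raise ValueError("token weights must sum to a positive value")
    total = 0.0
    for s_logits, t_logits, w in zip(student_logits, teacher_logits, weights):
        s = softmax(s_logits, temperature)
        t = softmax(t_logits, temperature)
        kl = kl_divergence(t, s) if mode == "forward" else kl_divergence(s, t)
        total += w * kl
    return total / sum(weights)


# One sequence of two tokens over a two-word vocabulary (made-up logits).
student = [[0.0, 0.0], [1.0, 1.0]]
teacher = [[2.0, 0.0], [1.0, 1.0]]
forward = distill_loss(student, teacher, mode="forward")
reverse = distill_loss(student, teacher, mode="reverse")
first_only = distill_loss(student, teacher, token_weights=[1.0, 0.0])
```

The two directions penalize different mismatches, which is why the choice matters when the goal is to limit output-distribution drift during knowledge injection.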

Direct MLLM / VLM Multimodal Distillation
| Time | Position | Paper | Approach / Loss design | Model type | Task / datasets | Code / Stars | Venue / status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-05 | Video-LLM self-distillation + RL signal | VISD: Enhancing Video Reasoning via Structured Self-Distillation | video-aware judge generates structured feedback; sparse rewards are combined with token-level self-distillation | VLM / Video-LLM | video reasoning, spatio-temporal grounding; diverse video reasoning benchmarks | - | arXiv / recorded |
| 2026-05 | VLM cascaded KD | LLaVA-CKD: Bottom-Up Cascaded Knowledge Distillation for Vision-Language Models | bottom-up cascaded KD for staged VLM capability compression | VLM | VQA and vision-language understanding | - | arXiv / indexed candidate |
| 2026-04 | multimodal black-box OPD | Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL | inserts response-level black-box OPD between SFT and RLVR; MoE discriminators separate perception and reasoning drift | VLM / MLLM | multimodal RLVR, visual grounding, reasoning retention; Qwen3-VL 4B / 8B, 1.26M public demos + 113K Gemini 3 Flash demonstrations | XIAO4579/PRISM / 70 | arXiv / recorded |
| 2026-03 | uncertainty-aware MLLM KD | Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models (Beta-KD) | beta-distribution weighting balances data supervision and teacher supervision according to teacher uncertainty | VLM / MLLM | MLLM compression and transfer; GQA / ScienceQA-IMG / TextVQA / POPE / MME-P / MMBench-dev | Jingchensun/beta-kd / 3 | CVPR 2026 / recorded |
| 2026-02 | multimodal token-interaction KD | Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions | explicitly models cross-modal token interactions rather than only next-token alignment | VLM / MLLM | MLLM compression and cross-modal retention | - | arXiv / indexed candidate |
| 2025-12 | VLM long-window KD | Towards Long-window Anchoring in Vision-Language Model Distillation | long-window anchoring addresses limited student context windows under VLM distillation | VLM | long-context vision-language understanding | - | AAAI 2026 / indexed candidate |
| 2025-12 | masked teacher + reinforced student | Masking Teacher and Reinforcing Student for Distilling Vision-Language Models | masks teacher signals and reinforces the student for compact VLM distillation | VLM | lightweight VLM deployment | - | arXiv / indexed candidate |
| 2025-11 | unbalanced visual-token KD | EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens | KD for efficient MLLMs under unbalanced visual tokens | VLM / MLLM | MLLM compression, visual-information retention | - | AAAI 2026 / indexed candidate |
| 2025-11 | CLIP / VQA KD diagnosis | When Better Teachers Don't Make Better Students | analyzes why stronger teachers do not always produce stronger multimodal students | VLM / CLIP | VQA, CLIP distillation | - | arXiv / indexed candidate |
| 2025-03 | multi-prompt self-distillation | Enhancing Multi-hop Reasoning in Vision-Language Models via Self-Distillation with Multi-Prompt Ensembling | multi-prompt ensembling plus self-distillation compresses stronger reasoning traces back into the student | VLM / MLLM | multi-hop visual reasoning, VQA; 5 VQA benchmarks | - | arXiv / recorded |
| 2024-09 | VLM continual dual-teacher KD | Adapt without Forgetting | dual-teacher proximity distillation with graph modeling for new-task adaptation and zero-shot retention | VLM | continual learning, zero-shot retention; MTIL + CIFAR100 / TinyImageNet | myz-ah/AwoForget / - | ECCV 2024 / recorded |
| 2024-09 | selective dual-teacher KD | Select and Distill | selective dual-teacher transfer separates what to preserve from what to adapt | VLM | FGVCAircraft / DTD / EuroSAT / Flowers102 / Food101 / OxfordPets / StanfordCars / UCF101 / ImageNet | chu0802/SnD / 16 | ECCV 2024 / recorded |
| 2024-07 | MLLM KD factor analysis | LLAVADI | combines feature distillation, logit distillation, and teacher-generated data | VLM / MLLM | CC-595K / LLaVA-665K + GQA / ScienceQA-IMG / TextVQA / POPE / MME-P / MMBench-dev | - | arXiv / recorded |
| 2023-12 | visual program distillation | Visual Program Distillation | distills LLM-generated programs and tool-use reasoning into a VLM | VLM / MLLM | MMBench / OK-VQA / A-OKVQA / TallyQA / POPE / Hateful Memes | - | CVPR 2024 / recorded |
| 2023-10 | lifelong multi-domain VQA | Multi-Domain Lifelong Visual Question Answering via Self-Critical Distillation | replay-free self-critical distillation over logits and intermediate representations | VLM | CLEVR / GQA / VizWiz / AQUA / VQA-Abstract | - | ACM MM 2023 / recorded |

Online Distill: Loss Design, Token Weighting, and Dense Credit
| Time | Position | Paper | Approach / Loss design | Model type | Task / datasets | Code / Stars | Venue / status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-05 | neighboring token-weighting inspiration | InfoSFT | information-aware token weighting makes SFT learn more and forget less; not strict OPD, but useful for loss-weighting design | LLM | SFT / forgetting mitigation | - | arXiv / indexed candidate |
| 2026-05 | token-level self-uncertainty weighting | Respecting Self-Uncertainty in On-Policy Self-Distillation | weights teacher token signals by self-uncertainty instead of using uniform token supervision | LLM | reasoning post-training | - | arXiv / indexed candidate |
| 2026-05 | prefix / suffix teachability diagnosis | Prefix Teach, Suffix Fade | shows suffix-token teaching signal can collapse in strong-to-weak OPD | LLM | structured output / reasoning | - | arXiv / indexed candidate |
| 2026-05 | multi-rollout peer supervision | Multi-Rollout On-Policy Distillation | uses peer successes and failures across rollouts for finer token supervision | LLM | verifier-based reasoning | - | arXiv / indexed candidate |
| 2026-05 | token-routed self-OPD | TRACE | avoids all-token KL waste by routing supervision to important tokens | LLM | RLVR / reasoning alignment | - | arXiv / indexed candidate |
| 2026-05 | best-of-N teacher rollout | On-Policy Distillation with Best-of-N Teacher Rollout Selection | selects better teacher rollouts before distillation to reduce noisy teacher signals | LLM | reasoning post-training | - | arXiv / indexed candidate |
| 2026-05 | rock-token analysis | Cornerstones or Stumbling Blocks? | analyzes how a small set of critical tokens drives or harms OPD learning | LLM | reasoning / token-level credit | - | arXiv / indexed candidate |
| 2026-05 | rubric-based OPD | Rubric-based On-policy Distillation | uses structured semantic rubrics instead of white-box teacher logits | LLM | alignment / reasoning | - | arXiv / indexed candidate |
| 2026-05 | control-variate KL | KL for a KL | uses a control-variate baseline to reduce single-sample OPD gradient variance | LLM | reasoning post-training | - | arXiv / indexed candidate |
| 2026-05 | step-wise OPD | SOD | decomposes sparse trajectory supervision into step-wise distillation for small agent models | LLM / Agent | tool-integrated reasoning | - | arXiv / indexed candidate |
| 2026-05 | cross-tokenizer OPD | SimCT | recovers supervision lost when teacher and student tokenizers differ | LLM | cross-tokenizer distillation | - | arXiv / indexed candidate |
| 2026-05 | long-horizon pruning | Prune-OPD | prunes low-value or drifted distillation signals in long-horizon reasoning to improve OPD reliability | LLM | long-horizon reasoning | - | arXiv / indexed candidate |
| 2026-05 | multi-agent debate teacher | MAD-OPD | uses multi-agent debate to produce stronger token-level teacher supervision and break the single-teacher ceiling | LLM / Agent | agentic reasoning | - | arXiv / indexed candidate |
| 2026-04 | dual-path adaptive weighting | SCOPE | signal-calibrated OPD enhancement with dual-path adaptive weighting | LLM | reasoning alignment | - | arXiv / indexed candidate |
| 2026-04 | temporal curriculum | TCOD | adds temporal curriculum to multi-turn agent OPD to reduce long-horizon error accumulation | LLM / Agent | multi-turn autonomous agents | - | arXiv / indexed candidate |
| 2026-04 | token importance | TIP: Token Importance in On-Policy Distillation | directly studies which tokens carry useful OPD learning signal | LLM | online KD / reasoning | - | arXiv / indexed candidate |
| 2026-03 | frontier-of-competence sampling | PACED | focuses distillation / self-distillation on problems near the student's competence frontier | LLM | LLM distillation efficiency | - | arXiv / indexed candidate |
| 2026-03 | OPD failure fixes | Revisiting On-Policy Distillation | identifies long-rollout, teacher-prefix drift, and tokenizer-mismatch failures; proposes top-K local support matching | LLM | reasoning / agent multi-task training | - | arXiv / recorded |
| 2026-03 | hindsight + entropy weighting | HEAL | weights trajectory quality using hindsight information and teacher entropy | LLM | online distillation / reasoning | - | arXiv / recorded |
| 2026-02 | context distillation | On-Policy Context Distillation for Language Models | internalizes in-context knowledge into parameters by connecting context distillation with on-policy distillation | LLM | context distillation / knowledge internalization | - | arXiv / indexed candidate |
| 2023-06 | generative online distillation | GKD / MiniLLM | student-generated trajectory distillation; MiniLLM uses reverse-KL for generative students | LLM | instruction following, text generation; MT-Bench / AlpacaEval / instruction data | - | arXiv / recorded |

Self Distill: Self-Teacher, Privileged Context, and No-External-Teacher Retention
| Time | Position | Paper | Approach / Loss design | Model type | Task / datasets | Code / Stars | Venue / status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-05 | unified self-distillation | UniSD | studies self-generated trajectory reliability, representation alignment, and training stability in one framework | LLM | LLM adaptation / reasoning | - | arXiv / indexed candidate |
| 2026-05 | preference-based self-distillation | Preference-Based Self-Distillation | adds reward regularization beyond KL matching to stabilize self-distillation | LLM | reasoning post-training | - | arXiv / indexed candidate |
| 2026-05 | outcome-guided logit steering | OGLS-SD | uses outcome-guided logit steering to correct over-alignment in OPSD | LLM | LLM reasoning | - | arXiv / indexed candidate |
| 2026-05 | teacher-exposure scheduling | Adaptive Teacher Exposure | tunes privileged-teacher exposure to reduce teacher dependence | LLM | LLM reasoning | - | arXiv / indexed candidate |
| 2026-05 | input-specific credit | From Generic Correlation to Input-Specific Credit | moves from generic correlation to input-specific credit assignment | LLM | post-training / reasoning | - | arXiv / indexed candidate |
| 2026-05 | reflection-enhanced self-distillation | Learning with Rare Success but Rich Feedback | uses environmental feedback and reflection to strengthen self-distillation when successes are rare but feedback is rich | LLM | interactive / feedback learning | - | arXiv / indexed candidate |
| 2026-05 | harness self-distillation | Training with Harnesses | distills inference-time harness capabilities back into the base model | LLM | complex reasoning | - | arXiv / indexed candidate |
| 2026-05 | multilingual safety self-distill | Multilingual Safety Alignment via Self-Distillation | transfers high-resource-language safety ability to low-resource languages through self-distillation | LLM | multilingual safety alignment | - | arXiv / indexed candidate |
| 2026-04 | performance-recovery self-distillation | Self-Distillation as a Performance Recovery Mechanism for LLMs | uses SDFT to recover after SFT / quantization / pruning and analyzes manifold alignment with CKA | LLM | performance recovery / forgetting mitigation | - | arXiv / recorded |
| 2026-01 | OPSD | Self-Distilled Reasoner | cold-starts CoT and iteratively distills from the same model under privileged context | LLM | math reasoning; AIME24 / AIME25 / HMMT25 | siyan-zhao/OPSD / 107 | arXiv / recorded |
| 2026-01 | continual-learning self-distillation | Self-Distillation Enables Continual Learning | teacher uses demonstrations / privileged context while student sees normal inputs; KL preserves old behavior while learning new tasks | LLM | continual learning, domain adaptation; sequential task streams | - | arXiv / recorded |

Online Distill + RL / RLVR: Sparse Rewards with Dense Distillation
| Time | Position | Paper | Approach / Loss design | Model type | Task / datasets | Code / Stars | Venue / status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-05 | language feedback + variational PD | Learning from Language Feedback via Variational Policy Distillation | turns language feedback into variational policy distillation to reduce RLVR exploration bottlenecks | LLM | verifiable reasoning | - | arXiv / indexed candidate |
| 2026-05 | agentic RL + OPSD | Self-Distilled Agentic Reinforcement Learning | uses OPSD to provide dense trajectory supervision for long-horizon agent RL | LLM / Agent | long-horizon agent interaction | - | arXiv / indexed candidate |
| 2026-05 | agent advantage reweighting | GEAR | converts outcome-level reward into finer supervision with granularity-adaptive advantage reweighting | LLM / Agent | agent post-training | - | arXiv / indexed candidate |
| 2026-05 | self-distilled RLVR exploration | Rebellious Student | reverses selected teacher signals in self-distilled RLVR to encourage exploration | LLM | reasoning RLVR | - | arXiv / indexed candidate |
| 2026-05 | on-policy optimization + distillation | Combining On-Policy Optimization and Distillation for Long-Context Reasoning | combines on-policy optimization and distillation for stable long-context reasoning | LLM | long-context reasoning | - | arXiv / indexed candidate |
| 2026-05 | sparse-to-dense reward principle | Beyond GRPO and On-Policy Distillation | compares how sparse GRPO rewards and dense OPD rewards should allocate checked examples | LLM | language-model post-training | - | arXiv / indexed candidate |
| 2026-05 | reward-weighted OPD | Reward-Weighted On-Policy Distillation | uses verifier signals to drive reward-weighted OPD | LLM | NL-to-SVA generation | - | arXiv / indexed candidate |
| 2026-05 | post-RL compaction | OPSD Compresses What RLVR Teaches | uses OPSD as a post-RL compaction stage, while noting weaker gains in long-thinking settings | LLM | reasoning models / RLVR compaction | - | arXiv / indexed candidate |
| 2026-05 | anti-self-distillation | Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information | uses PMI to identify and reverse self-distillation signals that may lock in bad reasoning behavior | LLM | reasoning RL | - | arXiv / indexed candidate |
| 2026-04 | co-evolving policy distillation | Co-Evolving Policy Distillation | introduces OPD during each expert's RLVR training instead of after expert training is complete | LLM | multi-expert capability consolidation | - | arXiv / indexed candidate |
| 2026-04 | RLVR + self-distillation | Self-Distilled RLVR | adds self-distillation to RLVR to reduce rollout variance and stabilize reasoning gains | LLM | math reasoning, RLVR | - | arXiv / recorded |
| 2026-04 | unified GRPO + self-distill | SRPO | unifies GRPO and self-distillation policy optimization through sample routing | LLM | verifiable reasoning; 5 reasoning benchmarks | - | arXiv / recorded |
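
The general pattern in this bucket, a sparse reward that defines the new target plus a dense distillation term that stabilizes behavior, can be sketched numerically as below. This is a deliberately simplified illustration and not the objective of any paper in the table: it uses group-normalized rewards in the style of GRPO advantages, omits importance ratios and clipping, and only computes loss values rather than gradients. The function names and the `beta` coefficient are assumptions made for this example.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def combined_loss(rewards, sequence_logprobs, token_kls, beta=0.1):
    """Average of a policy-gradient surrogate and a dense distillation term.

    rewards: one sparse scalar reward per rollout.
    sequence_logprobs: summed student log-probability of each rollout.
    token_kls: per-rollout list of per-token KL values to a teacher.
    beta: hypothetical weight of the distillation term.
    """
    advantages = group_relative_advantages(rewards)
    losses = []
    for adv, logp, kls in zip(advantages, sequence_logprobs, token_kls):
        policy_term = -adv * logp                     # sparse, sequence-level
        distill_term = beta * (sum(kls) / len(kls))   # dense, token-level
        losses.append(policy_term + distill_term)
    return sum(losses) / len(losses)


# Four rollouts for one prompt, two judged correct (made-up values).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# When all rewards tie, the sparse term vanishes and only distillation remains.
tie_loss = combined_loss(
    rewards=[1.0, 1.0],
    sequence_logprobs=[-2.0, -3.0],
    token_kls=[[0.2, 0.4], [0.6]],
    beta=0.5,
)
```

The tie case shows one reason the dense term is useful: when every rollout in a group gets the same reward, the group-relative signal is zero, yet the distillation term still provides a training signal.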

Mechanism / Survey / Diagnosis
| Time | Position | Paper | Main finding | Model type | Task / datasets | Code / Stars | Venue / status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-05 | OPD applicability analysis | Unmasking On-Policy Distillation | analyzes when OPD helps and when it hurts training | LLM | reasoning post-training | - | arXiv / indexed candidate |
| 2026-05 | OPD / OPSD mechanism map | The Many Faces of On-Policy Distillation | summarizes OPD / OPSD mechanisms, pitfalls, and fixes | LLM | reasoning / post-training | - | arXiv / indexed candidate |
| 2026-05 | OPD extrapolation risk | The Extrapolation Cliff in On-Policy Distillation | shows excessive reward extrapolation can break near-deterministic structured-output contracts | LLM | structured output | - | arXiv / indexed candidate |
| 2026-05 | unified OPD recipe | Uni-OPD | offers a dual-perspective OPD recipe and discusses reliability conditions | LLM | expert capability consolidation | - | arXiv / indexed candidate |
| 2026-04 | loss-accounting position paper | Knowledge Distillation Must Account for What It Loses | argues distillation evaluation should track teacher capabilities lost by the student, not only task scores | LLM | distillation evaluation | - | arXiv / indexed candidate |
| 2026-04 | OPD survey | A Survey of On-Policy Distillation for Large Language Models | organizes OPD definitions, variants, use cases, and risks | LLM | survey | - | arXiv / recorded |
| 2026-04 | OPD mechanism analysis | Rethinking On-Policy Distillation of Large Language Models | explains OPD from phenomenology, mechanism, and training recipe perspectives | LLM | OPD training / evaluation | - | arXiv / recorded |

Replay

This is also a mainline category, but public results currently lean more toward data mixing / replay-like recipes than toward a dense cluster of canonical methods.

| Time | Position | Paper | Approach | Model type | Experimental task | Primary datasets / benchmarks | GitHub | Stars | Venue | Year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-03 | mainline-related recipe insight | Fine-tuning MLLMs Without Forgetting Is Easier Than You Think | data mixing + replay-like recipe + diagnostic forgetting analysis | VLM / MLLM | multimodal continual fine-tuning, forgetting analysis, practical recipe design | ImageNet-VQA / ImageWikiQA / LLaVA-665K / MLLM-CL | - | - | arXiv | 2026 |
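
A replay-style data mixture can be sketched as a batch sampler that reserves a fixed share of each batch for old samples. This is a generic illustration, not the recipe of the paper above. The function name `mixed_batch`, the `replay_ratio` parameter, and the toy samples are assumptions made for this example.

```python
import random


def mixed_batch(new_samples, replay_buffer, batch_size, replay_ratio, rng):
    """Build one training batch that mixes new-knowledge and replayed samples.

    replay_ratio is the target fraction of the batch drawn from the replay
    buffer; it is capped by the buffer size, and the rest comes from new data.
    """
    if not 0.0 <= replay_ratio <= 1.0:
        raise ValueError("replay_ratio must be in [0, 1]")
    n_replay = min(round(batch_size * replay_ratio), len(replay_buffer))
    n_new = batch_size - n_replay
    batch = rng.sample(replay_buffer, n_replay)   # old samples, no repeats
    batch += rng.choices(new_samples, k=n_new)    # new samples, with repeats
    rng.shuffle(batch)
    return batch


# Toy data: tuples tagged by origin so the mixture is easy to inspect.
new_samples = [("new", i) for i in range(20)]
replay_buffer = [("old", i) for i in range(50)]
batch = mixed_batch(new_samples, replay_buffer, batch_size=8,
                    replay_ratio=0.25, rng=random.Random(0))
old_count = sum(1 for origin, _ in batch if origin == "old")
```

The ratio is the main knob: a higher value protects original behavior at the cost of slower injection, and it determines how much old or synthesized data must be stored.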

Direction 2: Constrained Parameter-Space Updates (Parameter Regularization)

This is currently one of the strongest mainline categories. The shared pattern is to optimize knowledge injection and capability retention together through update constraints, parameter importance, or constrained low-rank adaptation.

| Time | Position | Paper | Approach | Model type | Experimental task | Primary datasets / benchmarks | GitHub | Stars | Venue | Year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-02 | mainline method | Model-Dowser: Data-Free Importance Probing to Mitigate Catastrophic Forgetting in Multimodal Large Language Models | data-free importance probing that protects high-importance parameters based on weight magnitude, input activations, and output sensitivity | VLM / MLLM | MLLM fine-tuning, catastrophic-forgetting mitigation | LLaVA / NVILA adaptation and broad-capability retention evaluations | model-dowser/model-dowser.github.io | 0 | arXiv | 2026 |
| 2026-02 | mainline method | Spectral Imbalance Causes Forgetting in Low-Rank Continual Adaptation | analyzes singular-value spectral imbalance in low-rank updates and balances update directions via constrained Stiefel-manifold optimization | VLM / MLLM | PEFT continual learning, backward / forward forgetting mitigation | UCIT / MLLM-DCL / MM-MergeBench | haodotgu/EBLoRA | 3 | arXiv | 2026 |
| 2026-01 | mainline method | KeepLoRA | residual gradient adaptation + low-rank update control | VLM / MLLM | continual learning, multi-task / VQA-style retention | MTIL / MLLM-DCL / UCIT | MaolinLuo/KeepLoRA | 51 | ICLR 2026 | 2026 |
| 2025-10 | mainline method | KORE | retention-aware augmentations + constrained updates for joint injection and preservation | VLM / MLLM | multimodal knowledge injection, capability retention | KORE-74K | KORE-LMM/KORE | 141 | arXiv | 2025 |
| 2025-05 | mainline method | SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning | separates superficial forgetting from essential forgetting and combines ASD + RegLoRA to handle answer-style drift and true forgetting together | VLM / MLLM | multimodal continual instruction tuning, capability retention | CoIN-ASD / ScienceQA / TextVQA / ImageNet / GQA / VizWiz / Grounding / VQAv2 / OCRVQA | jinpeng0528/SEFE | 11 | ICML 2025 | 2025 |
| 2025-03 | LLM upstream inspiration | LoRA-Null | null-space initialization + conservative low-rank updates; a useful reference line for preservation-oriented PEFT | LLM | retention during fine-tuning, forgetting mitigation | TriviaQA / NQ-Open / WebQuestions + general capability benchmarks | HungerPWAY/LoRA-Null | 9 | arXiv / OpenReview | 2025 |
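
The shared idea of constraining updates by parameter importance can be illustrated with a classic importance-weighted quadratic penalty in the style of EWC. This is a stand-in technique chosen for brevity and is not the method of any paper in the table above. The function names, the `strength` coefficient, and the toy numbers are assumptions made for this example.

```python
def regularized_loss(task_loss, params, anchor_params, importance, strength=1.0):
    """Task loss plus an importance-weighted penalty for leaving the anchor.

    anchor_params are the pre-update weights; importance[i] says how much
    parameter i matters for original capabilities (how it is estimated is
    out of scope for this sketch).
    """
    penalty = sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, importance)
    )
    return task_loss + 0.5 * strength * penalty


def regularized_grad(task_grad, params, anchor_params, importance, strength=1.0):
    """Gradient of regularized_loss with respect to params."""
    return [
        g + strength * f * (p - a)
        for g, p, a, f in zip(task_grad, params, anchor_params, importance)
    ]


# Two parameters: the first is important and has drifted, the second has not.
params = [1.0, 2.0]
anchor = [0.0, 2.0]
importance = [4.0, 1.0]
loss = regularized_loss(1.0, params, anchor, importance, strength=0.5)
grad = regularized_grad([0.1, 0.2], params, anchor, importance, strength=0.5)
```

The `strength` value carries the trade-off named in the overview table: too large and new knowledge is under-injected, too small and important parameters drift.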

Direction 3: Modular Isolation and Local Updates (LoRA Isolation / Knowledge Editing)

This line reduces interference by isolating knowledge into separate LoRAs, experts, or routing paths.

| Time | Position | Paper | Approach | Model type | Experimental task | Primary datasets / benchmarks | GitHub | Stars | Venue | Year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-03 | mainline method / MoE routing isolation | On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models | dynamically expands MoE and uses token-level assignment guidance to suppress routing drift | VLM / LVLM | multimodal continual instruction tuning, old-task token routing retention | multi-task LVLM continual-learning evaluation | zhaoc5/DyMoE | 3 | arXiv | 2026 |
| 2026-02 | mainline method / MoE isolation | Continual-NExT: A Unified Comprehension And Generation Continual Learning Framework | MAGE mixes General LoRA and Expert LoRA to support continual comprehension and generation in dual-to-dual MLLMs | VLM / MLLM | unified continual learning for comprehension + generation, cross-modal knowledge transfer | Continual-NExT framework / benchmark | ECNU-SII/Continual-NExT | 232 | arXiv | 2026 |
| 2026-02 | mainline method / MoE isolation | SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning | decomposes routing dynamics into orthogonal subspaces and constrains expert updates with curvature-aware scaling from historical input covariance | VLM / MLLM | multimodal continual instruction tuning, router-drift / expert-drift mitigation | CoIN MCIT / ScienceQA / TextVQA / ImageNet / GQA / VizWiz / Ref-REC / VQAv2 / OCR-VQA | - | - | arXiv | 2026 |
| 2026 | mainline method / LoRA structure expansion | LoRA in LoRA: Towards Lifelong Model Editing via Integrating LoRA Dynamics in Evolving Subspace | integrates LoRA dynamics inside an evolving subspace for lifelong editing and capability retention | VLM / MLLM | continual visual instruction tuning, model editing, capability retention | CVIT Benchmark: ScienceQA / TextVQA / Flickr30k / ImageNet / GQA / VQAv2 | - | - | AAAI 2026 | 2026 |
| 2025-06 | mainline method | MLLM-CL | MR-LoRA routing + expert decomposition across domain and ability tracks | VLM / MLLM | domain continual learning, ability continual learning | MLLM-CL / UCIT / VTCTrain / DCL | bjzhb666/MLLM-CL | 64 | arXiv | 2025 |
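
The isolation idea can be sketched as a frozen base layer with one low-rank adapter per knowledge domain, selected by an explicit domain tag. This is a minimal illustration, not the architecture of any paper in the table: real systems typically learn the router, while here routing is a plain dictionary lookup. The class name `IsolatedLoRALinear` and the "medical" domain are assumptions made for this example.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class IsolatedLoRALinear:
    """Frozen base weight plus one low-rank (B, A) adapter per domain."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # shape (out, in), never modified
        self.adapters = {}              # domain name -> (B, A, scale)

    def add_adapter(self, domain, B, A, scale=1.0):
        # B has shape (out, r) and A has shape (r, in) for a small rank r.
        self.adapters[domain] = (B, A, scale)

    def forward(self, x, domain=None):
        y = matvec(self.base_weight, x)
        if domain not in self.adapters:
            return y  # no adapter selected: original behavior is untouched
        B, A, scale = self.adapters[domain]
        delta = matvec(B, matvec(A, x))  # low-rank update B(Ax)
        return [yi + scale * di for yi, di in zip(y, delta)]


layer = IsolatedLoRALinear(base_weight=[[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("medical", B=[[1.0], [0.0]], A=[[1.0, 1.0]])
base_out = layer.forward([1.0, 2.0])
medical_out = layer.forward([1.0, 2.0], domain="medical")
```

Because the base weight is never written to, inputs that are not routed to an adapter get exactly the original output. The cost is one adapter per domain plus the routing logic that picks it.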

Related engineering entry: [Ghy0501/MCITlib](https://github.com/Ghy0501/MCITlib) is a toolkit and benchmark collection rather than a single-paper method, but it is worth tracking as the engineering counterpart of this category.

Related Inspiration: Knowledge Editing

Knowledge editing is not the main problem setup of this repository. It is usually closer to small-scope / local knowledge injection, such as editing one fact or a small set of facts, rather than preserving broad capability under larger-scale multimodal knowledge injection. Still, it is very relevant for localized updates and side-effect control.

| Time | Paper | Approach | Model type | Experimental task | Primary datasets / benchmarks | GitHub | Stars | Venue | Year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-07 | AdaEdit: Advancing Continuous Knowledge Editing For Large Language Models | stable update mechanism for continuous editing | LLM | continuous knowledge editing, long-horizon stability evaluation | CounterFact / zsRE / continuous edit sequences | - | - | ACL 2025 | 2025 |
| 2024 | EasyEdit: An Easy-to-use Knowledge Editing Framework for LLMs | unified experimental framework and method integration | LLM | reproducible knowledge-editing evaluation | CounterFact / zsRE / KnowEdit and related datasets | zjunlp/EasyEdit | 2769 | ACL 2024 | 2024 |
| 2023-10 | Can We Edit Multimodal Large Language Models? | extends knowledge editing into multimodal models | VLM / MLLM | multimodal factual editing, image-text knowledge updates | MMEdit / multimodal editing benchmarks | - | - | EMNLP 2023 | 2023 |
| 2022-10 | MEMIT | many-fact batch editing | LLM | batch factual updating, local knowledge modification | CounterFact / zsRE | kmeng01/memit | 544 | ICLR 2023 | 2022 |
| 2022-02 | ROME | rank-one parameter update editing | LLM | single-fact editing, knowledge localization | CounterFact / zsRE | kmeng01/rome | 743 | NeurIPS 2022 | 2022 |
| 2021-10 | MEND: Fast Model Editing at Scale | low-rank gradient editor network | LLM | fast local knowledge editing | zsRE / CounterFact | eric-mitchell/mend | 259 | ICLR 2022 | 2021 |
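
To make the "rank-one parameter update" idea in the ROME row more concrete, here is a minimal sketch in plain Python. It is a simplified illustration, not the ROME algorithm itself: it assumes a single linear layer `W`, a key vector `k`, and a desired output `v` (all hypothetical names chosen for this example), and applies the smallest rank-one change that makes the edited layer map `k` to `v`. The published method additionally weights the update with a key covariance statistic, which is omitted here.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def rank_one_edit(W, k, v):
    """Return W + (v - W k) k^T / (k^T k), so that the edited matrix maps k to v.

    Assumption: plain least-change update with identity key covariance.
    """
    kk = sum(x * x for x in k)
    if kk == 0:
        raise ValueError("key vector must be non-zero")
    # Residual between the desired output and what the layer currently produces.
    residual = [v_i - y_i for v_i, y_i in zip(v, matvec(W, k))]
    # Add the outer product residual * k^T, scaled by 1 / (k^T k).
    return [
        [w_ij + r_i * k_j / kk for w_ij, k_j in zip(row, k)]
        for row, r_i in zip(W, residual)
    ]


# Toy example: a 2x2 identity layer, edited so that key [1, 0] now maps to [0, 2].
W = [[1.0, 0.0], [0.0, 1.0]]
k = [1.0, 0.0]
v = [0.0, 2.0]
W_edited = rank_one_edit(W, k, v)
```

In this toy case the edited layer returns `v` for the key `k`, while inputs orthogonal to `k` (such as `[0.0, 1.0]`) are mapped exactly as before. That locality is the property that side-effect evaluations in knowledge editing try to measure at scale.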

🌐 Ecosystem

GitHub

This section avoids repeating official repos that are already attached to papers in the research directions. Repos such as KORE / SEFE / MLLM-CL / EVOKE stay attached to paper rows above. This section keeps only integration-oriented repositories, training infrastructure, and evaluation tooling.

GitHub stars checked on 2026-04-17.
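
Star counts like the ones below can be re-checked with the public GitHub REST API, whose repository endpoint returns a `stargazers_count` field. The following is a minimal sketch using only the Python standard library; the helper names (`repo_api_url`, `stars_from_payload`, `fetch_stars`) and the `User-Agent` string are assumptions made for this example and are not part of this repository's tooling.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com/repos/"


def repo_api_url(full_name):
    """Build the REST endpoint for an 'owner/repo' string."""
    owner, _, repo = full_name.partition("/")
    if not owner or not repo:
        raise ValueError(f"expected 'owner/repo', got {full_name!r}")
    return f"{API_ROOT}{owner}/{repo}"


def stars_from_payload(payload):
    """Extract the star count from a decoded repository response."""
    return int(payload["stargazers_count"])


def fetch_stars(full_name, timeout=10.0):
    """Query GitHub for the current star count (performs a network request)."""
    request = urllib.request.Request(
        repo_api_url(full_name),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "star-refresh-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return stars_from_payload(json.load(response))
```

Calling `fetch_stars("huggingface/peft")` would return the live count at request time. Unauthenticated requests are rate limited, so a refresh over many repositories may need an access token.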

Aggregation / Integration Projects

| Project | Stars | Link | Category | Why it matters |
| --- | --- | --- | --- | --- |
| Awesome Continual Learning of Vision-Language Models | 179 | YuyangSunshine/Awesome-Continual-learning-of-Vision-Language-Models | neighboring awesome list | A valuable external index for VLM continual learning; new papers from it should be read, classified, and merged into the research directions instead of copied verbatim. |
| MCITlib | 78 | Ghy0501/MCITlib | multimodal continual instruction tuning | Plays both a library and benchmark role. |

Training / Adaptation / Evaluation Infrastructure

| Project | Stars | Link | Category | Notes |
| --- | --- | --- | --- | --- |
| PEFT | 20949 | huggingface/peft | PEFT infrastructure | Core infrastructure for LoRA, adapters, and low-rank updates. |
| LLaMA-Factory | 70198 | hiyouga/LLaMA-Factory | unified training framework | One of the most widely reused training stacks, now including broad VLM / MLLM support. |
| ms-swift | 13758 | modelscope/ms-swift | training / post-training framework | Common in distillation, SFT, GRPO, and MLLM training pipelines. |
| VLMEvalKit | 4049 | open-compass/VLMEvalKit | evaluation framework | Very useful for retention, OOD, and broad-capability evaluation. |
| Avalanche | 2049 | ContinualAI/avalanche | continual learning toolkit | A general-purpose continual learning infrastructure repo. |

Hugging Face

Hugging Face currently serves mainly as a dataset, benchmark, and paper-related entry layer for this topic. This repository does not maintain a generic base-model download leaderboard.

| Resource | Type | Link | Notes |
| --- | --- | --- | --- |
| MMEVOKE | dataset / benchmark | kailinjiang/MMEVOKE | Entry point for evolving multimodal knowledge evaluation. |
| KORE-74K | dataset | kailinjiang/KORE-74K | Dataset entry for retention-aware knowledge injection. |
| MLLM-CL | dataset / benchmark | MLLM-CL/MLLM-CL | Entry point for domain-vs-ability continual-learning evaluation. |
| UCIT | dataset | MLLM-CL/UCIT | Continual instruction-tuning data entry. |
| MMKE-Bench | dataset / benchmark | kailinjiang/MMKE-Bench-dataset | Multimodal knowledge-editing evaluation data. |
| MC-MKE | dataset / benchmark | reroze/MC-MKE | Cross-modal consistency editing benchmark data. |

📮 Contact Us

If you have questions, suggestions, or would like to discuss potential collaboration, feel free to reach out:

  • Yan Hong: ruoning.hy@antgroup.com
  • Wei Li: livie.lw@antgroup.com
  • Jun Lan: yelan.lj@antgroup.com

About

A bilingual awesome list for VLM/MLLM knowledge injection research: benchmarks, papers, tools, resources, and ecosystem updates.
