# VLM-Lens Documentation

## Overview
This repository provides utilities for extracting hidden states from state-of-the-art vision-language models. By surfacing these intermediate representations, you can analyze the knowledge encoded within each model.
## Supported Models
We currently support extracting hidden states from the following vision-language models. The architecture name (used for model selection) is listed in the second column:
| Model (HuggingFace Identifier) | Architecture Name |
|---|---|
| CohereLabs/aya-vision-8b | aya-vision |
| Salesforce/blip2-opt-2.7b | blip2 |
| Salesforce/blip2-opt-6.7b | blip2 |
| Salesforce/blip2-opt-6.7b-coco | blip2 |
| openai/clip-vit-base-patch32 | clip |
| openai/clip-vit-large-patch14 | clip |
| THUDM/cogvlm-chat-hf | cogvlm |
| MBZUAI/GLaMM-FullScope | glamm |
| internlm/internlm-xcomposer2d5-7b | internlm-xcomposer |
| OpenGVLab/InternVL2_5-1B | internvl |
| OpenGVLab/InternVL2_5-2B | internvl |
| OpenGVLab/InternVL2_5-4B | internvl |
| OpenGVLab/InternVL2_5-8B | internvl |
| deepseek-community/Janus-Pro-1B | janus |
| deepseek-community/Janus-Pro-7B | janus |
| llava-hf/bakLlava-v1-hf | llava |
| llava-hf/llava-1.5-7b-hf | llava |
| llava-hf/llava-1.5-13b-hf | llava |
| llava-hf/llama3-llava-next-8b-hf | llavanext |
| llava-hf/llava-v1.6-mistral-7b-hf | llavanext |
| llava-hf/llava-v1.6-vicuna-7b-hf | llavanext |
| llava-hf/llava-v1.6-vicuna-13b-hf | llavanext |
| openbmb/MiniCPM-o-2_6 | minicpm |
| compling/MiniCPM-V-2 | minicpm |
| allenai/Molmo-7B-D-0924 | molmo |
| allenai/MolmoE-1B-0924 | molmo |
| google/paligemma-3b-mix-224 | paligemma |
| mistralai/Pixtral-12B-2409 | pixtral |
| mistralai/Pixtral-12B-Base-2409 | pixtral |
| facebook/Perception-LM-1B | plm |
| facebook/Perception-LM-3B | plm |
| facebook/Perception-LM-8B | plm |
| Qwen/Qwen2-VL-2B-Instruct | qwen |
| Qwen/Qwen2-VL-7B-Instruct | qwen |
## Setup
First, clone the repository:
git clone https://github.com/compling-wat/vlm-lens.git
cd vlm-lens
Because each model may have different dependencies, it is recommended to use a separate virtual environment for each model you run.
For example, using conda:
conda create -n <env_name> python=3.10
conda activate <env_name>
Or, using Python's built-in venv module:
python -m venv <env_name>
source <env_name>/bin/activate
After activating your environment, install dependencies for your desired model architecture.
Replace <architecture> with the appropriate value (e.g., base, cogvlm):
pip install -r envs/<architecture>/requirements.txt
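For example, to install the dependencies for the cogvlm architecture:
pip install -r envs/cogvlm/requirements.txt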
## Usage
Run the extraction entry point, pointing it to a configuration file:
python -m src.main --config <config-file-path>
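The configuration file identifies the model to probe via the architecture name and HuggingFace identifier from the table above, along with extraction settings. As a minimal sketch (the field names below, architecture and model_path, are illustrative assumptions rather than the repository's confirmed schema), a config might look like this:

```yaml
# Minimal illustrative sketch -- field names are assumptions, not the
# confirmed VLM-Lens config schema; check the repository for the real format.
architecture: qwen                      # architecture name (second column of the table above)
model_path: Qwen/Qwen2-VL-2B-Instruct   # HuggingFace identifier (first column)
```

Pass the path to such a file as <config-file-path>. Refer to the repository for the authoritative configuration format and the full set of supported options.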