VLM-Lens Documentation

Overview

This repository provides utilities for extracting hidden states from state-of-the-art vision-language models. By surfacing these intermediate representations, you can perform a comprehensive analysis of the knowledge encoded within each model.

Supported Models

We currently support extracting hidden states from the following vision-language models. The architecture name used for model selection is shown alongside each HuggingFace identifier:

| Model (HuggingFace Identifier) | Architecture Name |
| --- | --- |
| CohereLabs/aya-vision-8b | aya-vision |
| Salesforce/blip2-opt-2.7b | blip2 |
| Salesforce/blip2-opt-6.7b | blip2 |
| Salesforce/blip2-opt-6.7b-coco | blip2 |
| openai/clip-vit-base-patch32 | clip |
| openai/clip-vit-large-patch14 | clip |
| THUDM/cogvlm-chat-hf | cogvlm |
| MBZUAI/GLaMM-FullScope | glamm |
| internlm/internlm-xcomposer2d5-7b | internlm-xcomposer |
| OpenGVLab/InternVL2_5-1B | internvl |
| OpenGVLab/InternVL2_5-2B | internvl |
| OpenGVLab/InternVL2_5-4B | internvl |
| OpenGVLab/InternVL2_5-8B | internvl |
| deepseek-community/Janus-Pro-1B | janus |
| deepseek-community/Janus-Pro-7B | janus |
| llava-hf/bakLlava-v1-hf | llava |
| llava-hf/llava-1.5-7b-hf | llava |
| llava-hf/llava-1.5-13b-hf | llava |
| llava-hf/llama3-llava-next-8b-hf | llavanext |
| llava-hf/llava-v1.6-mistral-7b-hf | llavanext |
| llava-hf/llava-v1.6-vicuna-7b-hf | llavanext |
| llava-hf/llava-v1.6-vicuna-13b-hf | llavanext |
| openbmb/MiniCPM-o-2_6 | minicpm |
| compling/MiniCPM-V-2 | minicpm |
| allenai/Molmo-7B-D-0924 | molmo |
| allenai/MolmoE-1B-0924 | molmo |
| google/paligemma-3b-mix-224 | paligemma |
| mistralai/Pixtral-12B-2409 | pixtral |
| mistralai/Pixtral-12B-Base-2409 | pixtral |
| facebook/Perception-LM-1B | plm |
| facebook/Perception-LM-3B | plm |
| facebook/Perception-LM-8B | plm |
| Qwen/Qwen2-VL-2B-Instruct | qwen |
| Qwen/Qwen2-VL-7B-Instruct | qwen |

Setup

First, clone the repository:

git clone https://github.com/compling-wat/vlm-lens.git
cd vlm-lens

Because each model may have different dependencies, it is recommended to use a separate virtual environment for each model you run.

For example, using conda:

conda create -n <env_name> python=3.10
conda activate <env_name>

Or, using Python's built-in venv module:

python -m venv <env_name>
source <env_name>/bin/activate

After activating your environment, install dependencies for your desired model architecture. Replace <architecture> with the appropriate value (e.g., base, cogvlm):

pip install -r envs/<architecture>/requirements.txt
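
For example, to install the dependencies for the Qwen2-VL checkpoints listed in the table above (this assumes the architecture names in the table map directly to directory names under envs/; check the repository for the exact layout):

# Dependencies for the qwen architecture (Qwen2-VL models)
pip install -r envs/qwen/requirements.txt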

Usage

Run the extraction entry point with a configuration file:

python -m src.main --config <config-file-path>
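
For example, assuming a configuration file exists for one of the Qwen2-VL checkpoints (the path below is illustrative, not a file guaranteed to be in the repository; consult the repository's example configs for actual filenames):

# Extract hidden states using the settings in the given config file
python -m src.main --config configs/qwen2-vl-2b.yaml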