# VLM-Lens Documentation
## Overview
This repository provides utilities for extracting hidden states from state-of-the-art vision-language models. Surfacing these intermediate representations lets you analyze the knowledge encoded within each model.
## Supported Models
We currently support extracting hidden states from the following vision-language models. The architecture name (used for model selection) is shown in the second column:
| Model (HuggingFace Identifier) | Architecture Name | 
|---|---|
| CohereLabs/aya-vision-8b | aya-vision | 
| Salesforce/blip2-opt-2.7b | blip2 | 
| Salesforce/blip2-opt-6.7b | blip2 | 
| Salesforce/blip2-opt-6.7b-coco | blip2 | 
| openai/clip-vit-base-patch32 | clip | 
| openai/clip-vit-large-patch14 | clip | 
| THUDM/cogvlm-chat-hf | cogvlm | 
| MBZUAI/GLaMM-FullScope | glamm | 
| internlm/internlm-xcomposer2d5-7b | internlm-xcomposer | 
| OpenGVLab/InternVL2_5-1B | internvl | 
| OpenGVLab/InternVL2_5-2B | internvl | 
| OpenGVLab/InternVL2_5-4B | internvl | 
| OpenGVLab/InternVL2_5-8B | internvl | 
| deepseek-community/Janus-Pro-1B | janus | 
| deepseek-community/Janus-Pro-7B | janus | 
| llava-hf/bakLlava-v1-hf | llava | 
| llava-hf/llava-1.5-7b-hf | llava | 
| llava-hf/llava-1.5-13b-hf | llava | 
| llava-hf/llama3-llava-next-8b-hf | llavanext | 
| llava-hf/llava-v1.6-mistral-7b-hf | llavanext | 
| llava-hf/llava-v1.6-vicuna-7b-hf | llavanext | 
| llava-hf/llava-v1.6-vicuna-13b-hf | llavanext | 
| openbmb/MiniCPM-o-2_6 | minicpm | 
| compling/MiniCPM-V-2 | minicpm | 
| allenai/Molmo-7B-D-0924 | molmo | 
| allenai/MolmoE-1B-0924 | molmo | 
| google/paligemma-3b-mix-224 | paligemma | 
| mistralai/Pixtral-12B-2409 | pixtral | 
| mistralai/Pixtral-12B-Base-2409 | pixtral | 
| facebook/Perception-LM-1B | plm | 
| facebook/Perception-LM-3B | plm | 
| facebook/Perception-LM-8B | plm | 
| Qwen/Qwen2-VL-2B-Instruct | qwen | 
| Qwen/Qwen2-VL-7B-Instruct | qwen | 
## Setup
First, clone the repository:
```bash
git clone https://github.com/compling-wat/vlm-lens.git
cd vlm-lens
```
Because each model may have different dependencies, we recommend using a separate virtual environment for each model you run.
For example, using conda:
```bash
conda create -n <env_name> python=3.10
conda activate <env_name>
```
Or, using Python's built-in `venv` module:
```bash
python -m venv <env_name>
source <env_name>/bin/activate
```
After activating your environment, install the dependencies for your desired model architecture, replacing `<architecture>` with the appropriate value (e.g., `base`, `cogvlm`):
```bash
pip install -r envs/<architecture>/requirements.txt
```
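For example, to install the dependencies for CogVLM:

```bash
pip install -r envs/cogvlm/requirements.txt
```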
## Usage
Run the main entry point and point it at a configuration file:

```bash
python -m src.main --config <config-file-path>
```
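As a minimal sketch, assuming you have written a config for one of the supported models and saved it at `configs/llava.yaml` (a hypothetical path used here for illustration; substitute the path to your own config file):

```bash
# Hypothetical invocation: configs/llava.yaml is an illustrative path,
# not a file guaranteed to ship with the repository.
python -m src.main --config configs/llava.yaml
```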