MiniCPM-o

This tutorial guides you through extracting layer outputs from the MiniCPM-o model.

Dependencies

First, create and activate a virtual environment using conda:

conda create -n <env_name> python=3.10
conda activate <env_name>

Next, install the required dependencies via pip:

pip install -r envs/minicpm/requirements.txt

Note

MiniCPM-o supports two attention implementations: sdpa and flash_attention_2 (the model's own default). The default config in this project uses sdpa, so requirements.txt does not include flash-attn. If you want to use flash_attention_2, be sure to install the flash-attn package separately.
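
For example, flash-attn is typically installed with the command below; check the flash-attn documentation for its build prerequisites (a CUDA toolkit and a compatible PyTorch install):

pip install flash-attn --no-build-isolation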

Configuration

The main configuration file for MiniCPM-o is located at configs/minicpm-o.yaml. Refer to Config Format for a detailed explanation of all config options.

You can specify which modules or layers to register hooks for extraction. A comprehensive list of available modules is provided in the log file: logs/openbmb/MiniCPM-o-2_6.txt.
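
As a purely illustrative sketch, a config of this kind might look roughly like the following. Every field name below is an assumption, not the project's actual schema, so rely on Config Format and the shipped configs/minicpm-o.yaml for the real keys:

# Hypothetical config sketch; all keys are illustrative only.
model: openbmb/MiniCPM-o-2_6
device: cuda
modules:                      # modules to hook, named as in the log file
  - llm.model.layers.0
  - llm.model.layers.27
database: minicpm-o.db        # where extracted tensors are written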

Note

For the minicpm architecture implementation, we call the model's chat interface with generation capped at 1 token, so that exactly one forward pass is performed.
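
To make the single-pass capture concrete, here is a minimal sketch, not the project's actual implementation: the module path, chat arguments, and capture logic are assumptions based on the publicly documented MiniCPM-o usage, and real module names should be taken from the log file above.

import torch
from transformers import AutoModel, AutoTokenizer

# Sketch only: load the model with sdpa attention, as in the default config.
model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-o-2_6",
    trust_remote_code=True,
    attn_implementation="sdpa",
    torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-o-2_6", trust_remote_code=True
)

captured = {}

def hook(module, args, output):
    # Decoder layers may return a tuple; store whatever the module emits.
    captured["llm.model.layers.0"] = output

# Hypothetical module path; take real names from logs/openbmb/MiniCPM-o-2_6.txt.
handle = model.llm.model.layers[0].register_forward_hook(hook)

# Capping generation at one new token keeps the capture to a single forward pass.
model.chat(
    msgs=[{"role": "user", "content": "Hello"}],
    tokenizer=tokenizer,
    max_new_tokens=1,
)
handle.remove()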

Usage

To extract layers on a CUDA-enabled device, execute:

python src/main.py --config configs/minicpm-o.yaml --device cuda --debug

Results

After a successful run, the extracted layer outputs are saved as serialized PyTorch tensors in a SQL database file. With the default config, the database is named minicpm-o.db.

You can retrieve these tensors using the script scripts/read_tensor.py, which lets you load and analyze the extracted data as needed.
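
If you need programmatic access outside that script, reading the tensors back typically reduces to a few lines like the sketch below. The table and column names here are hypothetical, so inspect the actual schema of minicpm-o.db first.

import io
import sqlite3

import torch

# Hypothetical schema: tensors(name TEXT, data BLOB), with each BLOB produced
# by torch.save. Check the real schema first, e.g. with `.schema` in sqlite3.
conn = sqlite3.connect("minicpm-o.db")
for name, blob in conn.execute("SELECT name, data FROM tensors"):
    tensor = torch.load(io.BytesIO(blob), map_location="cpu")
    print(name, tuple(tensor.shape), tensor.dtype)
conn.close()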