Mastering DeepSeek: Installing Tiny, Small, and VL2 Models with Inference and a Gradio Interface

Feb 6, 2025 - 12:56

DeepSeek-VL2 is a powerful vision-language model designed to handle a wide range of visual and text-based tasks, including visual question answering, optical character recognition, document analysis, and object localization. It builds on a Mixture-of-Experts (MoE) architecture, offering efficient processing and improved accuracy.

The model series includes three versions—DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2—with varying numbers of activated parameters to suit different use cases. DeepSeek-VL2 is optimized for accuracy while maintaining efficiency, making it a strong choice for complex multimodal tasks. It supports commercial use and is available under the MIT License.

Resources

HuggingFace: https://huggingface.co/deepseek-ai/deepseek-vl2
GitHub: https://github.com/deepseek-ai/DeepSeek-VL2

Prerequisites for Installing DeepSeek Tiny, Small, and VL2 Models

Make sure you have the following:

GPU: 1x RTX A6000 (for smooth execution)
Disk space: 200 GB free
RAM: 64 GB (for smooth execution)
CPU: 64 cores (for smooth execution)
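
As a rough illustration, the disk and CPU figures above can be checked from Python before you start. This is a minimal sketch using only the standard library. The function name unmet_requirements and the choice of "/" as the download location are assumptions made for this example. RAM is not checked because the standard library has no portable call for it.

```python
import os
import shutil

# Thresholds taken from the prerequisites list above.
MIN_DISK_GB = 200
MIN_CPU_CORES = 64


def unmet_requirements(free_disk_gb, cpu_cores,
                       min_disk_gb=MIN_DISK_GB, min_cores=MIN_CPU_CORES):
    """Return a message for every prerequisite that is not met."""
    problems = []
    if free_disk_gb < min_disk_gb:
        problems.append(f"disk: {free_disk_gb:.0f} GB free, {min_disk_gb} GB recommended")
    if cpu_cores < min_cores:
        problems.append(f"cpu: {cpu_cores} cores, {min_cores} recommended")
    return problems


# Measure the current machine. "/" is an assumption about where models are stored.
free_gb = shutil.disk_usage("/").free / 1e9
cores = os.cpu_count() or 0
for line in unmet_requirements(free_gb, cores) or ["all checked prerequisites met"]:
    print(line)
```

The listed values are recommendations for smooth execution, so a reported shortfall is a warning rather than a hard failure.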

Step-by-Step Process to Install DeepSeek VL2 Small – MoE Vision Model Locally

For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.

Step 1: Sign Up and Set Up a NodeShift Cloud Account

Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.

Follow the account setup process and provide the necessary details and information.

Step 2: Create a GPU Node (Virtual Machine)

GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.

Navigate to the menu on the left side and select the GPU Nodes option. In the Dashboard, click the Create GPU Node button to create your first Virtual Machine deployment.

Step 3: Select a Model, Region, and Storage

In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.

We will use 1x RTX A6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.

Step 4: Select Authentication Method

There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.

Step 5: Choose an Image

Next, you will need to choose an image for your Virtual Machine. We will deploy DeepSeek VL2 Small – MoE Vision on an NVIDIA CUDA Virtual Machine image. CUDA is NVIDIA’s proprietary, closed-source parallel computing platform, and this image will allow you to install DeepSeek VL2 Small – MoE Vision on your GPU Node.

After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.

Step 6: Virtual Machine Successfully Deployed

You will get visual confirmation that your node is up and running.

Step 7: Connect to GPUs using SSH

NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.

Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.

Now open your terminal and paste the proxy SSH IP or direct SSH IP.

Next, if you want to check the GPU details, run the command below:

nvidia-smi
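
If you would rather read the GPU name and memory from a script than from the nvidia-smi table, the tool's CSV query mode can be parsed in a few lines. The sketch below is illustrative only. The helper names parse_gpu_csv and query_gpus are made up for this example, and query_gpus returns an empty list when nvidia-smi is missing or fails.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory.total' CSV rows into (name, memory_mib) tuples."""
    gpus = []
    for row in text.strip().splitlines():
        name, memory = row.rsplit(",", 1)
        gpus.append((name.strip(), int(memory.strip())))
    return gpus


def query_gpus():
    """Ask nvidia-smi for GPU names and total memory; return [] if unavailable."""
    cmd = ["nvidia-smi", "--query-gpu=name,memory.total",
           "--format=csv,noheader,nounits"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        return parse_gpu_csv(out)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return []


# With nounits, nvidia-smi reports memory.total in MiB.
for name, mib in query_gpus():
    print(f"{name}: {mib / 1024:.0f} GiB")
```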

Step 8: Check the Available Python Version and Install a New Version

Run the following command to check the available Python version:

python3 --version

The system has Python 3.8.1 available by default. To install a higher version of Python, you’ll need to use the deadsnakes PPA.

Run the following commands to add the deadsnakes PPA:

sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update

Step 9: Install Python 3.11

Now, run the following command to install Python 3.11 or another desired version:

sudo apt install -y python3.11 python3.11-distutils python3.11-venv

Step 10: Update the Default Python3 Version

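Now, run the following commands to link the new Python version as the default python3. The last argument of each --install line is a priority, and the final --config command lets you select which registered version to use: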

sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3

Then, run the following command to verify that the new Python version is active:

python3 --version
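
The same check can be done from inside the interpreter, which also shows which executable python3 resolves to. This is a small sketch. Treating 3.11 as the minimum is an assumption based on the version installed in this tutorial, not a requirement stated by the DeepSeek-VL2 project, and is_at_least is a name invented for this example.

```python
import sys

# The tutorial installs Python 3.11; using it as the minimum is an assumption.
TUTORIAL_PYTHON = (3, 11)


def is_at_least(version, minimum=TUTORIAL_PYTHON):
    """Compare the (major, minor) part of a version tuple against a minimum."""
    return tuple(version[:2]) >= tuple(minimum)


status = "ok" if is_at_least(sys.version_info) else "older than 3.11"
print(sys.executable, sys.version.split()[0], status)
```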

Step 11: Install and Update Pip

Run the following commands to install and upgrade pip:

python3 -m ensurepip --upgrade
python3 -m pip install --upgrade pip

Then, run the following command to check the version of pip:

pip --version

Step 12: Clone the Repository

Run the following commands to clone the DeepSeek-VL2 repository and change into its directory:

git clone https://github.com/deepseek-ai/deepseek-vl2.git
cd deepseek-vl2
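
Before continuing, it can help to confirm that the clone contains the script the later steps launch. The sketch below checks for web_demo.py, which is the only file name this guide relies on. The helper name missing_files is an assumption for illustration.

```python
from pathlib import Path


def missing_files(repo_dir, expected=("web_demo.py",)):
    """Return the expected file names that are absent from repo_dir."""
    repo = Path(repo_dir)
    return [name for name in expected if not (repo / name).is_file()]


# Run from inside the cloned directory.
print(missing_files(".") or "repository layout looks complete")
```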

Step 13: Setup Environment

Run the following commands to create and activate a virtual environment:

python3 -m venv deepseek_env
source deepseek_env/bin/activate
# On Windows: deepseek_env\Scripts\activate
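
To confirm that the environment is active before installing anything, you can compare the interpreter's prefix with its base prefix, which differ inside a venv. This is a minimal sketch. The function name in_virtualenv is chosen for this example.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base prefix."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if in_virtualenv():
    print("virtual environment active:", sys.prefix)
else:
    print("using the system interpreter:", sys.prefix)
```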

Step 14: Install Dependencies

Run the following command to install the dependencies:

pip install -e .

Step 15: Install Gradio

Run the following command to install Gradio:

pip install gradio==3.48.0
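
Because the install command pins a specific Gradio version, a quick check of what actually ended up in the environment can save debugging time later. The sketch below uses importlib.metadata from the standard library. The helper name installed_version is an assumption for this example.

```python
from importlib import metadata

PINNED_GRADIO = "3.48.0"  # version pinned in the install command above


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("gradio")
if found is None:
    print("gradio is not installed in this environment")
elif found != PINNED_GRADIO:
    print(f"gradio {found} is installed, but this guide pins {PINNED_GRADIO}")
else:
    print(f"gradio {found} matches the pinned version")
```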

Step 16: Check Model and Commands

The repository provides example commands to run the web demo using different model variants. Note that you should set the CUDA_VISIBLE_DEVICES environment variable to the GPU you wish to use (in this example, GPU 2 is used) and specify the appropriate model name, port, and (if needed) the --chunk_size parameter.

For the VL2-Tiny Model

Model Details:
Total parameters: 3.37B (MoE)
Activated parameters: 1B
Suitable for a single GPU with less than 40GB memory

Command:

CUDA_VISIBLE_DEVICES=2 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2-tiny"  \
--port 37914

For the VL2-Small Model

Model Details:
Total parameters: 16.1B (MoE)
Activated parameters: 2.4B

Memory Note: When running on an A100 40GB GPU, you should set --chunk_size 512 to save memory via incremental prefilling (at the expense of speed). On GPUs with more than 40GB, you can omit --chunk_size 512 for a faster response.

Command (for a 40GB GPU):

CUDA_VISIBLE_DEVICES=2 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2-small"  \
--port 37914 \
--chunk_size 512
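
The memory note above mentions incremental prefilling. Conceptually, the prompt is processed in fixed-size pieces rather than in one pass, which lowers peak memory at the cost of more forward passes. The sketch below illustrates only the splitting step on a hypothetical list of token ids. It is not the repository's implementation, and split_into_chunks is a name invented for this example.

```python
def split_into_chunks(tokens, chunk_size=512):
    """Yield consecutive slices of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(tokens), chunk_size):
        yield tokens[start:start + chunk_size]


# Hypothetical prompt of 1200 token ids, standing in for text plus image tokens.
prompt_tokens = list(range(1200))
for index, chunk in enumerate(split_into_chunks(prompt_tokens)):
    # A real model would run one forward pass per chunk and keep its attention cache.
    print(f"chunk {index}: {len(chunk)} tokens")
```

With a chunk size of 512, the 1200 example tokens are split into chunks of 512, 512, and 176 tokens.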

For the VL2 (Full) Model

Model Details:
Total parameters: 27.5B (MoE)
Activated parameters: 4.2B

Command:

CUDA_VISIBLE_DEVICES=2 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2"  \
--port 37914

How to Use These Commands

Set the GPU:
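The CUDA_VISIBLE_DEVICES=2 part tells the system to use GPU number 2. Adjust this value according to your system’s GPU configuration. GPU indices start at 0, so on a single-GPU machine such as the one used in this tutorial, use CUDA_VISIBLE_DEVICES=0.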
Run the Demo Script:
The python web_demo.py command launches the Gradio-based web demo.
Specify the Model Variant:
Use the --model_name parameter to choose between the different model variants:
"deepseek-ai/deepseek-vl2-tiny"
"deepseek-ai/deepseek-vl2-small"
"deepseek-ai/deepseek-vl2"
Set the Port:
The --port 37914 argument sets the port on which the web server will run. Open your browser and navigate to http://<your-server-ip>:37914 to access the demo, replacing <your-server-ip> with the address of your machine.
Optional Memory Tuning:
For the small model on a GPU with 40GB memory, the additional --chunk_size 512 argument is recommended for memory-saving incremental pre-filling.
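
The rules above can be collected into a small helper that assembles the launch command for a given variant. This is a sketch under stated assumptions. The function name build_demo_command, the short variant keys, and the default of 48GB GPU memory (an RTX A6000) are illustrative choices. The flags and model names are the ones listed in Step 16.

```python
import shlex

# Model names and parameter counts (in billions) as listed in Step 16.
VARIANTS = {
    "tiny": {"model_name": "deepseek-ai/deepseek-vl2-tiny", "total_b": 3.37, "active_b": 1.0},
    "small": {"model_name": "deepseek-ai/deepseek-vl2-small", "total_b": 16.1, "active_b": 2.4},
    "full": {"model_name": "deepseek-ai/deepseek-vl2", "total_b": 27.5, "active_b": 4.2},
}


def build_demo_command(variant, gpu_index=0, gpu_memory_gb=48, port=37914):
    """Return (env, argv) for launching web_demo.py with the given variant."""
    model_name = VARIANTS[variant]["model_name"]
    argv = ["python", "web_demo.py", "--model_name", model_name, "--port", str(port)]
    # Step 16 recommends incremental prefilling for the small model on a 40GB GPU.
    if variant == "small" and gpu_memory_gb <= 40:
        argv += ["--chunk_size", "512"]
    env = {"CUDA_VISIBLE_DEVICES": str(gpu_index)}
    return env, argv


env, argv = build_demo_command("small", gpu_index=0, gpu_memory_gb=40)
prefix = " ".join(f"{key}={value}" for key, value in env.items())
print(prefix, shlex.join(argv))
```

Returning the environment and argument list separately keeps the result usable both for printing and for passing to subprocess.run.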

Step 17: Verify Your GPU Availability

Run the following command in your terminal to see if your GPU is recognized by the system:

nvidia-smi

Step 18: Run Deepseek-vl2-tiny Model

Execute the following command to run the deepseek-vl2-tiny model:

python3 web_demo.py --model_name "deepseek-ai/deepseek-vl2-tiny" --port 37914

Step 19: Access the Application

Once the demo starts, it prints the URLs at which the application can be accessed:

Running on local URL: http://0.0.0.0:37914
Running on public URL: https://8df6de5304350b2ecc.gradio.live
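
If the page does not load, a quick way to tell whether the server is listening at all is to try a TCP connection to the port. The sketch below uses only the standard library. The helper name port_is_open is an assumption, and 127.0.0.1 presumes you run the check on the same machine as the demo.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if port_is_open("127.0.0.1", 37914):
    print("demo is reachable on port 37914")
else:
    print("nothing is listening on port 37914 yet")
```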

Step 20: Play with Deepseek-vl2-tiny Model

Step 21: Run Deepseek-vl2-small Model

Execute the following command to run the deepseek-vl2-small model:

CUDA_VISIBLE_DEVICES=0 python3 web_demo.py --model_name "deepseek-ai/deepseek-vl2-small" --port 37914 --chunk_size 512

Step 22: Access the Application

Accessing the application at:

Running on local URL: http://0.0.0.0:37914
Running on public URL: https://8df6de5304350b2ecc.gradio.live

Step 23: Play with Deepseek-vl2-small Model

Step 24: Run Deepseek-vl2 Model

Execute the following command to run the deepseek-vl2 model:

CUDA_VISIBLE_DEVICES=0 python web_demo.py --model_name "deepseek-ai/deepseek-vl2" --port 37914

Step 25: Access the Application

Accessing the application at:

Running on local URL: http://0.0.0.0:37914
Running on public URL: https://8df6de5304350b2ecc.gradio.live

Step 26: Play with Deepseek-vl2 Model

For Inference Only: DeepSeek-VL2-Tiny can run on a 16GB GPU with quantization, but the full model requires 80GB VRAM.
For Gradio Deployment: At least 48GB VRAM is required for multi-image handling, and 80GB VRAM is ideal for full-scale applications.
Optimization Strategies:
Chunked Inference (for 40GB GPUs).
Flash Attention (for efficient multi-image processing).
Quantization (for limited VRAM GPUs).
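
To see why quantization matters on limited-VRAM GPUs, it helps to estimate the memory taken by the weights alone: the number of parameters times the bytes per parameter. In an MoE model all experts must be resident in memory, so the total parameter count is the relevant one, not the activated count. The sketch below is back-of-the-envelope arithmetic. The bytes-per-parameter table is an assumption, and the estimate leaves out activations, the attention cache, and framework overhead, so real requirements are higher.

```python
# Total parameter counts (in billions) from Step 16.
TOTAL_PARAMS_B = {"tiny": 3.37, "small": 16.1, "full": 27.5}

# Bytes per parameter for common weight formats (assumed, not stated in this guide).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params_b, dtype="bf16"):
    """Approximate weight memory in GB: billions of parameters x bytes each."""
    return total_params_b * BYTES_PER_PARAM[dtype]


for name, params in TOTAL_PARAMS_B.items():
    sizes = ", ".join(
        f"{dtype}: {weight_memory_gb(params, dtype):.1f} GB" for dtype in BYTES_PER_PARAM
    )
    print(f"{name:>5} -> {sizes}")
```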
Deploy DeepSeek-VL2 on the right hardware for best performance!