Mastering DeepSeek: Installing Tiny, Small, and VL2 Models with Inference and a Gradio Interface
![Mastering DeepSeek: Installing Tiny, Small, and VL2 Models with Inference and a Gradio Interface](https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe453cxvzoy1mcq2dz1b7.png)
DeepSeek-VL2 is a powerful vision-language model designed to handle a wide range of visual and text-based tasks, including visual question answering, optical character recognition, document analysis, and object localization. It builds on a Mixture-of-Experts (MoE) architecture, offering efficient processing and improved accuracy.
The model series includes three versions, DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2, with different numbers of activated parameters to suit different use cases. DeepSeek-VL2 is optimized for accuracy while maintaining efficiency, making it a strong choice for complex multimodal tasks. The code is released under the MIT License, while use of the models is governed by the DeepSeek Model License; the series supports commercial use.
Resources
- Hugging Face: https://huggingface.co/deepseek-ai/deepseek-vl2
- GitHub: https://github.com/deepseek-ai/DeepSeek-VL2
Prerequisites for Installing DeepSeek Tiny, Small, and VL2 Models
Make sure you have the following:
- GPU: 1x RTX A6000 (48 GB VRAM) for smooth execution
- Disk space: 200 GB free
- RAM: 64 GB for smooth execution
- CPU: 64 cores for smooth execution
Step-by-Step Process to Install DeepSeek-VL2 Tiny, Small, and VL2 Models Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines: on-demand resources equipped with diverse GPUs, from H100s to A100s. These GPU-powered VMs give you control over the environment, letting you adjust the GPU, CPU, RAM, and storage configuration to your specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option in the Dashboard, and click the Create GPU Node button to create your first Virtual Machine deployment.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU model, the storage size, and the geographical region where you want to launch your model, according to your needs.
We will use a 1x RTX A6000 GPU for this tutorial. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements; DeepSeek-VL2-Tiny, for example, runs on a single GPU with less than 40 GB of memory.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
Next, you will need to choose an image for your Virtual Machine. We will deploy DeepSeek VL2 Small – MoE Vision on an NVIDIA CUDA Virtual Machine. CUDA, NVIDIA’s proprietary parallel computing platform, is what lets you run DeepSeek VL2 Small – MoE Vision on your GPU Node.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the following command:
nvidia-smi
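If you prefer to script this check, the sketch below calls `nvidia-smi` with its `--query-gpu` option and parses the CSV output into Python dictionaries. It is a minimal example, assuming `nvidia-smi` is on the `PATH`; the helper names `parse_gpu_csv` and `query_gpus` are our own, not part of any library.

```python
import subprocess

# With "nounits", nvidia-smi reports memory.total as a plain number in MiB.
QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Parse CSV rows such as '0, NVIDIA RTX A6000, 49140' into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, mem_mib = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "memory_mib": int(mem_mib)})
    return gpus


def query_gpus():
    """Run nvidia-smi and return one dict per GPU it reports."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    try:
        for gpu in query_gpus():
            print(f"GPU {gpu['index']}: {gpu['name']} "
                  f"({gpu['memory_mib'] / 1024:.1f} GiB)")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        # Raised when the NVIDIA driver tools are missing or the query fails.
        print(f"nvidia-smi is not available: {err}")
```

The reported memory tells you up front which model variant your GPU can host before you download any weights.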
Step 8: Check the Available Python Version and Install a Newer Version
Run the following command to check the available Python version:
python3 --version
The system has Python 3.8.1 available by default. To install a newer version of Python, you’ll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 9: Install Python 3.11
Now, run the following command to install Python 3.11 or another desired version:
sudo apt install -y python3.11 python3.11-distutils python3.11-venv
Step 10: Update the Default Python3 Version
Now, run the following command to link the new Python version as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3
Then, run the following command to verify that the new Python version is active:
python3 --version
Step 11: Install and Update Pip
Run the following commands to install and upgrade pip:
python3 -m ensurepip --upgrade
python3 -m pip install --upgrade pip
Then, run the following command to check the version of pip:
pip --version
Step 12: Clone the Repository
Run the following commands to clone the DeepSeek-VL2 repository and enter it:
git clone https://github.com/deepseek-ai/deepseek-vl2.git
cd deepseek-vl2
Step 13: Setup Environment
Run the following commands to create and activate a virtual environment:
python3 -m venv deepseek_env
source deepseek_env/bin/activate
# On Windows: deepseek_env\Scripts\activate
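Before installing dependencies, it is worth confirming that `deepseek_env` is really active; otherwise `pip install -e .` writes into the system Python. A minimal check, run with the environment's `python`, relies on the standard-library behavior that a venv's `sys.prefix` differs from `sys.base_prefix`. The helper name `in_virtualenv` is our own.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv.

    `python -m venv` points sys.prefix at the environment directory,
    while sys.base_prefix keeps pointing at the base installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print("Virtual environment active:", in_virtualenv())
```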
Step 14: Install Dependencies
Run the following command to install the dependencies:
pip install -e .
Step 15: Install Gradio
Run the following command to install Gradio:
pip install gradio==3.48.0
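After Steps 14 and 15, you can check from inside `deepseek_env` that the repository package and the pinned Gradio version are installed. This is a sketch: the import name `deepseek_vl2` is an assumption based on the repository's package layout, and `installed_version` and `check_environment` are hypothetical helper names.

```python
from importlib import metadata
from importlib.util import find_spec

# Version pinned in Step 15 of this tutorial.
EXPECTED_GRADIO = "3.48.0"


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Report whether the web demo's key dependencies look installed."""
    gradio_version = installed_version("gradio")
    return {
        # Top-level import name assumed for the package installed by `pip install -e .`.
        "deepseek_vl2_importable": find_spec("deepseek_vl2") is not None,
        "gradio_version": gradio_version,
        "gradio_matches_pin": gradio_version == EXPECTED_GRADIO,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```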
Step 16: Check Model and Commands
The repository provides example commands to run the web demo using different model variants. Note that you should set the CUDA_VISIBLE_DEVICES environment variable to the GPU you wish to use (in this example, GPU 2 is used) and specify the appropriate model name, port, and (if needed) the --chunk_size parameter.
- VL2-Tiny model: 3.37B total parameters (MoE), 1B activated parameters. Suitable for a single GPU with less than 40 GB of memory. Command:
CUDA_VISIBLE_DEVICES=2 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2-tiny" \
--port 37914
- VL2-Small model: 16.1B total parameters (MoE), 2.4B activated parameters. Memory note: when running on an A100 40 GB GPU, set --chunk_size 512 to save memory via incremental prefilling (at the expense of speed). On GPUs with more than 40 GB, you can omit --chunk_size 512 for a faster response. Command (for a 40 GB GPU):
CUDA_VISIBLE_DEVICES=2 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2-small" \
--port 37914 \
--chunk_size 512
- VL2 (full) model: 27.5B total parameters (MoE), 4.2B activated parameters. Command:
CUDA_VISIBLE_DEVICES=2 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2" \
--port 37914
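The guidance in this list can be condensed into a small helper. This is a hypothetical sketch: `recommend_variant` is our own name, and the 80 GB threshold for the full model comes from the hardware notes at the end of this guide rather than from `web_demo.py` itself.

```python
def recommend_variant(vram_gb):
    """Suggest a DeepSeek-VL2 variant and extra web_demo.py flags.

    vram_gb is the GPU's nominal memory in GB. Thresholds follow this
    guide's notes (an assumption, not an official table):
      - deepseek-vl2 (27.5B total params): 80 GB GPU
      - deepseek-vl2-small (16.1B): more than 40 GB, or 40 GB with --chunk_size 512
      - deepseek-vl2-tiny (3.37B): a single GPU with less than 40 GB
    """
    if vram_gb >= 80:
        return "deepseek-ai/deepseek-vl2", []
    if vram_gb > 40:
        return "deepseek-ai/deepseek-vl2-small", []
    if vram_gb == 40:
        # Incremental prefilling trades speed for lower peak memory.
        return "deepseek-ai/deepseek-vl2-small", ["--chunk_size", "512"]
    # The guide gives no lower bound for Tiny; very small GPUs may still fail.
    return "deepseek-ai/deepseek-vl2-tiny", []


if __name__ == "__main__":
    for gpu_gb in (24, 40, 48, 80):
        model, extra = recommend_variant(gpu_gb)
        print(gpu_gb, "GB ->", model, " ".join(extra))
```

On the 48 GB RTX A6000 used here, this logic picks the Small model without chunking, matching the memory note above.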
How to Use These Commands
- Set the GPU: The CUDA_VISIBLE_DEVICES=2 prefix makes only GPU number 2 visible to the demo process (inside the process it appears as device 0). Adjust this value according to your system’s GPU configuration; on a single-GPU VM, use 0 or omit the variable.
- Run the demo script: The python web_demo.py command launches the Gradio-based web demo.
- Specify the model variant: Use the --model_name parameter to choose between "deepseek-ai/deepseek-vl2-tiny", "deepseek-ai/deepseek-vl2-small", and "deepseek-ai/deepseek-vl2".
- Set the port: The --port 37914 argument sets the port on which the web server runs. Open your browser and navigate to http://<your-server-ip>:37914 to access the demo.
- Optional memory tuning: For the small model on a GPU with 40 GB of memory, the additional --chunk_size 512 argument is recommended for memory-saving incremental prefilling.
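These pieces combine into a single launch. As a sketch, assuming you run it from the root of the cloned `deepseek-vl2` repository inside `deepseek_env`, the hypothetical helper `demo_launch` builds the same environment variable and arguments in Python, and `launch` starts the server with `subprocess.run`.

```python
import os
import shlex
import subprocess
import sys


def demo_launch(model_name, gpu_index=0, port=37914, chunk_size=None):
    """Return (env, argv) equivalent to
    CUDA_VISIBLE_DEVICES=<gpu_index> python web_demo.py --model_name ... --port ...
    """
    # Copy the current environment and expose only the chosen GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_index))
    # sys.executable is the interpreter running this script (ideally deepseek_env's).
    argv = [sys.executable, "web_demo.py",
            "--model_name", model_name, "--port", str(port)]
    if chunk_size is not None:
        argv += ["--chunk_size", str(chunk_size)]
    return env, argv


def launch(model_name, **kwargs):
    """Start the Gradio demo; blocks until the server process exits."""
    env, argv = demo_launch(model_name, **kwargs)
    return subprocess.run(argv, env=env, check=True)


if __name__ == "__main__":
    env, argv = demo_launch("deepseek-ai/deepseek-vl2-small", chunk_size=512)
    # Print the equivalent shell command instead of starting the server.
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}")
```

Passing `env` to the child process, rather than modifying `os.environ`, keeps the GPU selection scoped to the demo.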
Step 17: Verify Your GPU Availability
Run the following command in your terminal to see if your GPU is recognized by the system:
nvidia-smi
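`nvidia-smi` only shows that the driver sees the GPU. To confirm that the PyTorch build installed with the repository's dependencies in Step 14 can also use it, a minimal check run inside `deepseek_env` might look like this. It assumes PyTorch is among those dependencies; `describe_cuda` is a hypothetical helper name. Because PyTorch honors CUDA_VISIBLE_DEVICES, the output also shows which device a given setting exposes.

```python
def describe_cuda():
    """Return what PyTorch can see of the GPUs, or None if PyTorch is missing."""
    try:
        import torch  # assumed to be installed by `pip install -e .`
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return {"cuda_available": False, "devices": []}
    devices = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        devices.append({
            "index": i,  # index as seen by this process, after CUDA_VISIBLE_DEVICES
            "name": props.name,
            "total_memory_gib": round(props.total_memory / 1024**3, 1),
        })
    return {"cuda_available": True, "devices": devices}


if __name__ == "__main__":
    print(describe_cuda())
```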
Step 18: Run the DeepSeek-VL2-Tiny Model
Execute the following command to run the deepseek-vl2-tiny model:
python3 web_demo.py --model_name "deepseek-ai/deepseek-vl2-tiny" --port 37914
Step 19: Access the Application
Once the model has loaded, the demo prints the URLs where the application is available:
Running on local URL: http://0.0.0.0:37914
Running on public URL: https://8df6de5304350b2ecc.gradio.live
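The `0.0.0.0` address means the server listens on all network interfaces; from your own machine, use the VM's IP address or the public `gradio.live` link. Model loading can take a while, so if you script against the demo, a readiness check like the one below can help. It is a sketch using only the standard library; `demo_is_up` is a hypothetical helper name, and it treats any non-200 response as "not ready".

```python
import socket
import urllib.error
import urllib.request


def demo_is_up(host="127.0.0.1", port=37914, timeout=3.0):
    """Return True if an HTTP server answers http://host:port/ with status 200."""
    # An opener with an empty ProxyHandler ignores any http_proxy settings,
    # so the request goes straight to the server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://{host}:{port}/", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, socket.timeout, ConnectionError):
        # Connection refused, timeouts, and HTTP errors all mean "not ready yet".
        return False


if __name__ == "__main__":
    print("Demo reachable:", demo_is_up())
```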
Step 20: Play with the DeepSeek-VL2-Tiny Model
Step 21: Run the DeepSeek-VL2-Small Model
Execute the following command to run the deepseek-vl2-small model:
CUDA_VISIBLE_DEVICES=0 python3 web_demo.py --model_name "deepseek-ai/deepseek-vl2-small" --port 37914 --chunk_size 512
Step 22: Access the Application
Once the model has loaded, the demo prints the URLs where the application is available:
Running on local URL: http://0.0.0.0:37914
Running on public URL: https://8df6de5304350b2ecc.gradio.live
Step 23: Play with the DeepSeek-VL2-Small Model
Step 24: Run the DeepSeek-VL2 Model
Execute the following command to run the deepseek-vl2 model:
CUDA_VISIBLE_DEVICES=0 python web_demo.py --model_name "deepseek-ai/deepseek-vl2" --port 37914
Step 25: Access the Application
Once the model has loaded, the demo prints the URLs where the application is available:
Running on local URL: http://0.0.0.0:37914
Running on public URL: https://8df6de5304350b2ecc.gradio.live
Step 26: Play with the DeepSeek-VL2 Model
For Inference Only: DeepSeek-VL2-Tiny can run on a 16GB GPU with quantization, but the full model requires 80GB VRAM.
For Gradio Deployment: At least 48GB VRAM is required for multi-image handling, and 80GB VRAM is ideal for full-scale applications.
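These VRAM figures follow from simple arithmetic. In a standard GPU deployment of a Mixture-of-Experts model, every expert sits in GPU memory even though only a few are activated per token, so the total parameter count, not the activated one, sets the weight footprint. A back-of-the-envelope sketch, assuming bf16 weights (2 bytes per parameter) and ignoring activations, the KV cache, and framework overhead:

```python
# Total (not activated) parameter counts quoted in Step 16.
TOTAL_PARAMS = {
    "deepseek-ai/deepseek-vl2-tiny": 3.37e9,
    "deepseek-ai/deepseek-vl2-small": 16.1e9,
    "deepseek-ai/deepseek-vl2": 27.5e9,
}

# Storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params, dtype="bf16"):
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, n in TOTAL_PARAMS.items():
        print(f"{name}: ~{weight_memory_gb(n):.1f} GB in bf16, "
              f"~{weight_memory_gb(n, 'int4'):.1f} GB in int4")
```

In bf16 this gives roughly 6.7 GB for Tiny, 32.2 GB for Small, and 55 GB for the full model before any runtime memory is added. That is consistent with Small being tight on a 40 GB card and the full model being paired with an 80 GB GPU. Quantizing to 4 bits cuts the weight footprint to a quarter of the bf16 figure.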
Optimization Strategies:
- Chunked (incremental) prefilling with --chunk_size 512 (for 40 GB GPUs).
- Flash Attention (for efficient multi-image processing).
- Quantization (for GPUs with limited VRAM).
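A toy model helps show why chunked prefilling, the `--chunk_size` option, trades speed for memory. The assumption here is deliberately narrow: we count only the size of the attention-score matrix built while the prompt is prefilled, ignoring weights, the KV cache, and MoE routing. Real savings depend on the attention kernel and on other per-token activations, and this is not how `web_demo.py` implements it; `prefill_chunks` and `peak_score_entries` are hypothetical helpers.

```python
def prefill_chunks(n_tokens, chunk_size):
    """Split prompt positions [0, n_tokens) into consecutive [start, end) chunks."""
    return [(start, min(start + chunk_size, n_tokens))
            for start in range(0, n_tokens, chunk_size)]


def peak_score_entries(n_tokens, chunk_size=None):
    """Largest query-by-key attention score matrix built during prefill.

    Counted per attention head and layer. Without chunking, all n_tokens
    queries attend to n_tokens keys in one pass. With chunking, a chunk's
    queries attend to the cached prefix plus the chunk itself (`end` keys),
    and the peak is the largest such matrix over all chunks.
    """
    if chunk_size is None:
        return n_tokens * n_tokens
    return max((end - start) * end for start, end in prefill_chunks(n_tokens, chunk_size))


if __name__ == "__main__":
    n = 4096
    print("chunks:", len(prefill_chunks(n, 512)))
    print("unchunked:", peak_score_entries(n))
    print("chunk_size=512:", peak_score_entries(n, 512))
```

For a 4,096-token prompt with a chunk size of 512, the largest score matrix shrinks from 4,096 x 4,096 to 512 x 4,096 entries, an 8x reduction, while the prompt now needs 8 sequential passes, one reason chunking is slower.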
Deploy DeepSeek-VL2 on the right hardware for best performance!