CVPR 2025 | MobileMamba: A new breakthrough in lightweight Mamba networks, combining multiple receptive fields, efficient inference and high accuracy

1. Overview at a glance

MobileMamba proposes a lightweight multi-receptive field visual Mamba network. Through a three-stage network design and the MRFFI (Multi-Receptive Field Feature Interaction) module, it increases inference speed while achieving higher accuracy, outperforming existing CNN, ViT and Mamba models.

2. Core Issues

Current lightweight vision models are mainly based on CNNs and Transformers:

• The local receptive field of CNNs limits their global modeling capability.

• Transformers have a global receptive field, but the cost of self-attention grows quadratically with the number of tokens (O(N²)), which becomes expensive at high resolution.

• Existing lightweight Mamba models have low FLOPs but slow actual inference speed.
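To make the O(N²) point concrete, here is a back-of-the-envelope cost model comparing self-attention with a linear-time state space scan. The constants (embedding width 64, state size 16, 16×16 patches) are illustrative assumptions, not MobileMamba's configuration; only the scaling trend matters.

```python
# Rough cost model (illustrative assumption, not from the paper):
# self-attention mixes every token with every other token -> ~2 * N^2 * d,
# a linear-time SSM scan touches each token once per state entry -> ~N * d * s.


def num_tokens(height: int, width: int, patch: int) -> int:
    """Number of tokens after splitting an image into patch x patch cells."""
    return (height // patch) * (width // patch)


def attention_cost(n: int, d: int) -> int:
    """Approximate multiply-adds for Q @ K^T plus the attention-weighted V."""
    return 2 * n * n * d


def ssm_scan_cost(n: int, d: int, state: int) -> int:
    """Approximate multiply-adds for a linear recurrent scan over n tokens."""
    return n * d * state


if __name__ == "__main__":
    d, state, patch = 64, 16, 16  # hypothetical sizes
    for side in (224, 448, 896):
        n = num_tokens(side, side, patch)
        ratio = attention_cost(n, d) / ssm_scan_cost(n, d, state)
        print(f"{side}x{side}: N={n}, attention/SSM cost ratio = {ratio:.1f}")
```

Doubling the image side quadruples N, so under this model the attention-to-scan cost ratio also grows 4× per step (24.5, 98.0 and 392.0 with the assumed constants). This is the gap lightweight Mamba models try to exploit at high resolution.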

MobileMamba aims to:

• Optimize the inference speed of Mamba to increase throughput while keeping FLOPs low.

• Strengthen multi-receptive field interaction, capturing both long- and short-range features as well as high-frequency details.

• Adapt to high-resolution inputs and improve performance on classification, object detection and semantic segmentation.

3. Technical highlights

(1) Three-stage network design

• After weighing the trade-offs between four-stage and three-stage networks, a three-stage architecture is chosen: it improves accuracy at the same throughput, or improves throughput at the same accuracy.

(2) MRFFI (Multi-Receptive Field Feature Interaction) module

• WTE-Mamba (Long-range Wavelet Transform-Enhanced Mamba): combines global modeling with extraction of high-frequency edge information.

• MK-DeConv (Multi-Kernel Depthwise Convolution): extracts information at different scales and strengthens the local receptive field.

• Eliminate Redundant Identity: reduces channel redundancy and improves computational efficiency.
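The role of the wavelet transform in WTE-Mamba is to separate low-frequency content from high-frequency edges. The text does not say which wavelet MobileMamba uses; as an assumption, the sketch below uses a one-level 2D Haar transform on a plain Python matrix to show how an edge ends up in the detail (high-frequency) coefficients.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One-level orthonormal 2D Haar transform of an even-sized matrix.

    Returns (LL, LH, HL, HH): LL is the low-frequency approximation; LH, HL
    and HH hold column-difference, row-difference and diagonal detail
    (high-frequency) coefficients. Each output is half-size in both axes.
    """
    h, w = len(x), len(x[0])
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            row_ll.append((a + b + c + d) / 2)   # local average (low frequency)
            row_lh.append((a - b + c - d) / 2)   # change across columns
            row_hl.append((a + b - c - d) / 2)   # change across rows
            row_hh.append((a - b - c + d) / 2)   # diagonal change
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


if __name__ == "__main__":
    # A 2x4 patch with a vertical edge between columns 0 and 1.
    patch = [[0.0, 1.0, 1.0, 1.0],
             [0.0, 1.0, 1.0, 1.0]]
    ll, lh, hl, hh = haar_dwt2(patch)
    print(ll, lh, hl, hh)  # [[1.0, 2.0]] [[-1.0, 0.0]] [[0.0, 0.0]] [[0.0, 0.0]]
```

The vertical edge shows up only in the column-difference band, while a flat region produces all-zero detail bands. A model that processes these bands explicitly gets direct access to edge information that a purely global mixer might smooth over.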

(3) Training & Testing Strategy Optimization

• Knowledge Distillation improves the learning ability of lightweight models.

• Extended Training Epochs further raise the achievable accuracy.

• Normalization Layer Fusion accelerates inference at test time.
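The text does not say which distillation objective MobileMamba uses. As an assumption, the sketch below uses the classic soft-target formulation: a temperature-softened KL divergence between teacher and student, plus ordinary cross-entropy on the hard label. The temperature and weighting values are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: List[float], teacher_logits: List[float],
            label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Soft-target distillation loss for a single sample.

    alpha weights the soft (teacher) term and (1 - alpha) the hard-label
    cross-entropy. The KL term is scaled by T^2 so its gradient magnitude
    stays comparable as the temperature changes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard_ce = -math.log(softmax(student_logits)[label])
    return alpha * (temperature ** 2) * kl + (1 - alpha) * hard_ce
```

When the student already matches the teacher, the KL term vanishes and only the weighted cross-entropy remains; a student that disagrees with the teacher pays an extra penalty even if its prediction for the true label is reasonable.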

4. Methodological framework


MobileMamba optimizes inference and feature extraction through the following core steps:

(1) Multi-receptive field feature interaction (MRFFI)

• Long-range information is extracted by WTE-Mamba, while the wavelet transform enhances high-frequency features.

• MK-DeConv applies convolution kernels of different sizes to capture local information at multiple scales, improving multi-scale perception.

• Eliminating redundant identity mappings reduces computational cost and improves inference speed.
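To illustrate several kernel sizes sharing one layer, the sketch below splits the channels into groups and gives each group a different depthwise kernel size. The kernel sizes [1, 3, 5] and the fixed box-filter weights are hypothetical stand-ins; the text does not specify MK-DeConv's actual kernel sizes, grouping or learned weights.

```python
from typing import List

Image = List[List[float]]


def depthwise_conv2d(channel: Image, kernel: Image) -> Image:
    """'Same'-size, zero-padded 2D cross-correlation of one channel with one odd-sized kernel."""
    h, w, k = len(channel), len(channel[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the image
                        acc += channel[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def multi_kernel_dwconv(channels: List[Image], kernel_sizes: List[int]) -> List[Image]:
    """Split channels into len(kernel_sizes) equal groups, one kernel size per group.

    Each channel is filtered independently (depthwise). Box (mean) filters are
    used purely for illustration; a trained model would learn the weights.
    """
    groups = len(kernel_sizes)
    assert len(channels) % groups == 0, "channels must split evenly into groups"
    per_group = len(channels) // groups
    out = []
    for g, k in enumerate(kernel_sizes):
        box = [[1.0 / (k * k)] * k for _ in range(k)]
        for ch in channels[g * per_group:(g + 1) * per_group]:
            out.append(depthwise_conv2d(ch, box))
    return out


if __name__ == "__main__":
    impulse = [[1.0 if (i, j) == (3, 3) else 0.0 for j in range(7)] for i in range(7)]
    outs = multi_kernel_dwconv([impulse] * 3, kernel_sizes=[1, 3, 5])
    # How many output pixels each branch touches from a single input pixel.
    print([sum(v != 0.0 for row in o for v in row) for o in outs])  # [1, 9, 25]
```

A single input pixel influences 1, 9 and 25 output pixels in the three groups, which is the multi-scale receptive field in miniature. Keeping the convolution depthwise means the cost grows with the kernel area per channel rather than with the product of input and output channels.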

(2) Lightweight Mamba structure

• A three-stage design reduces computation and improves throughput.

• Multi-directional scanning is combined with low-rank state space mapping to improve computational efficiency.
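The scanning idea can be illustrated with a deliberately simplified state space model: a scalar linear recurrence with fixed parameters, run over a 2D grid flattened in four orders (row-major, column-major and their reverses), with the outputs averaged back onto the grid. This is a toy under stated assumptions: real Mamba layers use input-dependent (selective) parameters and larger per-channel states, the low-rank mapping mentioned above is not modeled, and the four scan orders are a hypothetical choice.

```python
from typing import List

Grid = List[List[float]]


def ssm_scan(seq: List[float], a: float, b: float, c: float) -> List[float]:
    """Linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (scalar state, fixed parameters)."""
    h, ys = 0.0, []
    for x in seq:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def multi_directional_scan(grid: Grid, a: float = 0.5, b: float = 1.0, c: float = 1.0) -> Grid:
    """Scan a 2D grid in four orders and average the outputs back on the grid.

    Orders: row-major, reversed row-major, column-major, reversed column-major.
    Each position then aggregates context from several directions instead of one.
    """
    h, w = len(grid), len(grid[0])
    row_major = [(i, j) for i in range(h) for j in range(w)]
    col_major = [(i, j) for j in range(w) for i in range(h)]
    orders = [row_major, row_major[::-1], col_major, col_major[::-1]]
    out = [[0.0] * w for _ in range(h)]
    for order in orders:
        ys = ssm_scan([grid[i][j] for i, j in order], a, b, c)
        for (i, j), y in zip(order, ys):
            out[i][j] += y / len(orders)
    return out


if __name__ == "__main__":
    for row in multi_directional_scan([[1.0, 0.0], [0.0, 0.0]]):
        print(row)
```

With a single nonzero input in the top-left corner, its right and lower neighbors receive the same response, whereas a single row-major scan would treat them asymmetrically. The cost stays linear in the number of positions per scan direction.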

(3) Optimizing training and inference

• Knowledge distillation: learn from a stronger teacher model to improve the performance of the small model.

• Extended training: experiments show that the model has not fully converged after 300 epochs, and extending training to 1000 epochs improves accuracy.

• Normalization layer fusion: removes redundant computation and improves efficiency during inference.
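The text does not name the normalization layer. Assuming it is BatchNorm in inference mode (the usual case for fusion), its fixed statistics can be folded into the preceding layer's weight and bias, so the normalization costs nothing at test time. The sketch uses a per-channel scalar layer for clarity; for a real k×k convolution the same scale multiplies every weight of the corresponding output channel.

```python
import math
from typing import List, Tuple


def affine_then_bn(x: List[float], w: List[float], b: List[float],
                   gamma: List[float], beta: List[float],
                   mean: List[float], var: List[float], eps: float = 1e-5) -> List[float]:
    """Reference path: per-channel linear layer followed by eval-mode BatchNorm."""
    y = [wi * xi + bi for wi, xi, bi in zip(w, x, b)]
    return [g * (yi - m) / math.sqrt(v + eps) + be
            for yi, g, be, m, v in zip(y, gamma, beta, mean, var)]


def fuse_bn(w: List[float], b: List[float], gamma: List[float], beta: List[float],
            mean: List[float], var: List[float],
            eps: float = 1e-5) -> Tuple[List[float], List[float]]:
    """Fold BN statistics into the preceding layer.

    w' = w * gamma / sqrt(var + eps)
    b' = (b - mean) * gamma / sqrt(var + eps) + beta
    """
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    w_fused = [wi * s for wi, s in zip(w, scale)]
    b_fused = [(bi - m) * s + be for bi, m, s, be in zip(b, mean, scale, beta)]
    return w_fused, b_fused


if __name__ == "__main__":
    x = [1.0, -2.0]
    params = dict(gamma=[1.2, 0.8], beta=[0.0, 0.5], mean=[0.3, -1.0], var=[0.25, 4.0])
    w, b = [0.5, 3.0], [0.1, -0.2]
    ref = affine_then_bn(x, w, b, **params)
    wf, bf = fuse_bn(w, b, **params)
    fused = [wi * xi + bi for wi, xi, bi in zip(wf, x, bf)]
    print(max(abs(r - f) for r, f in zip(ref, fused)))  # tiny float rounding only
```

The fused layer performs one multiply-add per element instead of a multiply-add followed by a subtract, divide and second multiply-add, which is where the inference saving comes from.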

5. Quick Overview of Experimental Results


MobileMamba demonstrates superior performance in multiple benchmark tests:

✅ ImageNet-1K classification

• MobileMamba-B4 reaches 83.6% Top-1 accuracy, +1.8% over EfficientVMamba, with 3.5× faster inference.

✅ Object Detection (COCO)

• Mask R-CNN: compared with EMO, mAP improves by +1.3 and throughput by +57%.

• RetinaNet: compared with EfficientVMamba, mAP improves by +2.1 and inference is 4.3× faster.

✅ Semantic Segmentation (ADE20K)

• Semantic FPN: mIoU improves by +1.1 over EdgeViT while using only 20% of the FLOPs.

• PSPNet: mIoU improves by +0.4 over MobileViTv2 while using only 11% of the FLOPs.

6. Practical value and application

• Edge-device visual computing: suitable for resource-constrained scenarios such as smartphones, embedded devices and the Internet of Things (IoT).

• Autonomous driving and surveillance: provides efficient visual computing in high-resolution scenarios, suitable for object detection and segmentation.

• Medical image analysis: extracts key features from medical images through multi-receptive field modeling, which may improve diagnostic efficiency.

7. Open Questions

Is MobileMamba’s multi-receptive field feature interaction strategy applicable to other tasks such as video understanding or 3D vision?

How can MobileMamba be further optimized to improve inference speed on CPUs and mobile devices?

Can LoRA or other parameter-efficient fine-tuning methods be combined with MobileMamba to improve its adaptability to specific tasks?
