CoCoPIE - Realtime AI on Mobile Devices

The above demonstrations are from the CoCoPIE YouTube channel and Bilibili channel; feedback is welcome. Notably, CoCoPIE achieves, for the first time, real-time on-mobile acceleration of 3D activity detection networks (e.g., C3D, R(2+1)D, S3D) on off-the-shelf mobile devices. It reaches a latency of only 9 ms per frame without accuracy loss, a 30X speedup over current frameworks. This is shown in the demo on the right.

There is a consensus that the company that enables real intelligence on end devices (such as mobile devices and IoT devices) will define the future of computing. Racing towards this goal, many companies, whether giant technology firms such as Google, Microsoft, Amazon, Apple and Facebook, or startups, spend tens of billions of dollars each year on R&D. Assuming that hardware is the major constraint on real mobile intelligence, the industry has mainly dedicated its efforts to developing specialized hardware accelerators for machine learning and inference. Billions of dollars have been spent to fuel this intelligent hardware race.

We challenge this assumption. Drawing on CoCoPIE, a recent real-time AI optimization framework, we maintain that with effective compression-compilation co-design, it is possible to enable real-time artificial intelligence (AI) on mainstream end devices without special hardware. The principle of compression-compilation co-design is to design the compression of deep learning models and their compilation to executables hand in hand. This synergistic method can effectively optimize both the size and the speed of deep learning models, and it can also dramatically shorten the tuning time of the compression process, largely reducing the time to market of AI products. When applied to models running on mainstream end devices, the method can deliver a real-time experience across a set of AI applications that had been broadly perceived as possible only with special AI accelerators.

CoCoPIE stands for Compression-Compilation co-design for Performance, Intelligence, and Efficiency. CoCoPIE holds numerous records on mobile AI: it is the first framework to support all kinds of DNNs, including CNNs, RNNs, transformers, and language models; it is the fastest DNN pruning and acceleration framework, up to 180X faster than current frameworks such as TensorFlow-Lite (refer to Figure 1); it executes a majority of representative DNNs and applications in real time, for the first time, on off-the-shelf mobile devices; and, running on general-purpose mobile devices, it even outperforms a number of representative ASIC and FPGA solutions in terms of energy efficiency and/or performance (refer to Figure 2).

Figure 1. Execution time comparison with SOTA mobile acceleration frameworks (TFLite, TVM, Alibaba MNN) on VGG-16, ResNet-50, and MobileNet-V2 DNN models on ImageNet and CIFAR-10 datasets.

Figure 2. Comparison with representative ASIC and FPGA solutions. (a) Comparison of energy efficiency and inference latency with Google Cloud TPU and Edge TPU. (b) Comparison of energy efficiency with Eyeriss. (c) Comparison of energy efficiency with NVIDIA Jetson AGX Xavier. (d) Comparison of energy efficiency with the FPGA solution ESE.

CoCoPIE consists of two main components, both of which reflect the compression-compilation co-design principle. The first component, CoCo-Gen, generates efficient DNN execution code via a synergy of pattern-based DNN pruning and pattern-aware code generation. The second component, CoCo-Tune, uses a composability-based compiler framework to dramatically shorten the process of identifying the appropriate set of DNN parameters to prune.
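To make the idea of pattern-based pruning more concrete, the following is a minimal sketch in plain Python. It is not CoCoPIE's implementation: the pattern library (four 4-entry patterns on a 3x3 kernel), the magnitude-based selection rule, and the names PATTERNS, best_pattern, prune_kernel, and group_by_pattern are all assumptions made for illustration. The sketch assigns each kernel the pattern that retains the most weight magnitude, zeroes the remaining weights, and then groups kernels by pattern, which is the kind of regularity a pattern-aware code generator could exploit by emitting one specialized loop per pattern.

```python
# Hypothetical pattern library (an assumption, not CoCoPIE's actual set).
# Each pattern lists the 4 (row, col) positions kept in a 3x3 kernel;
# every pattern here keeps the center weight plus one 2x2 corner block.
PATTERNS = [
    ((0, 0), (0, 1), (1, 0), (1, 1)),  # top-left block
    ((0, 1), (0, 2), (1, 1), (1, 2)),  # top-right block
    ((1, 0), (1, 1), (2, 0), (2, 1)),  # bottom-left block
    ((1, 1), (1, 2), (2, 1), (2, 2)),  # bottom-right block
]


def best_pattern(kernel):
    """Return the index of the pattern that retains the most weight magnitude."""
    def kept_magnitude(pattern):
        return sum(abs(kernel[r][c]) for r, c in pattern)
    return max(range(len(PATTERNS)), key=lambda i: kept_magnitude(PATTERNS[i]))


def prune_kernel(kernel):
    """Zero every weight outside the best-matching pattern."""
    idx = best_pattern(kernel)
    keep = set(PATTERNS[idx])
    pruned = [
        [kernel[r][c] if (r, c) in keep else 0.0 for c in range(3)]
        for r in range(3)
    ]
    return idx, pruned


def group_by_pattern(kernels):
    """Prune all kernels and group their indices by assigned pattern.

    Kernels sharing a pattern have identical sparsity structure, so a
    code generator could handle each group with one branch-free loop.
    """
    groups = {}
    pruned_kernels = []
    for k, kernel in enumerate(kernels):
        idx, pruned = prune_kernel(kernel)
        groups.setdefault(idx, []).append(k)
        pruned_kernels.append(pruned)
    return groups, pruned_kernels


if __name__ == "__main__":
    kernels = [
        [[0.9, 0.8, 0.1], [0.7, 0.5, 0.0], [0.1, 0.0, 0.2]],
        [[0.0, 0.1, 0.0], [0.1, -0.6, 0.9], [0.0, 0.8, -0.7]],
    ]
    groups, pruned = group_by_pattern(kernels)
    print(groups)
    for kernel in pruned:
        print(kernel)
```

Because every kernel ends up with one of a small, fixed set of sparsity shapes, the pruned model stays regular enough for a compiler to optimize, which is the co-design point made above. A real framework would also retrain the model after pruning to recover accuracy; that step is omitted here.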

Figure 3. Examples of style transfer, automatic coloring, and super resolution implemented on an off-the-shelf mobile device using the CoCoPIE framework.

Demonstrations: Comprehensive real-time demonstrations of the CoCoPIE framework can be found on the CoCoPIE YouTube channel, covering a broad range of applications such as real-time style transfer, super-resolution (enhancing resolution), automatic coloring, and GAN-based applications. Sample applications are shown in Figure 3 above. It is interesting to note that the CoCoPIE compiler's code generation is by far the strongest even when DNN compression is not applied. The following demos show real-time style transfer produced by the CoCoPIE compiler (left) and by a reference framework (Tencent NCNN, right), using the same DNN model on the same mobile device (Samsung Galaxy S10). The advantage of the CoCoPIE compiler is clearly visible.

CoCoPIE News: