A Heterogeneous RISC-V Based SoC For Secure Nano-UAV Navigation
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 71, NO. 5, MAY 2024
Abstract: The rapid advancement of energy-efficient parallel ultra-low-power (ULP) microcontroller units (MCUs) is enabling the development of autonomous nano-sized unmanned aerial vehicles (nano-UAVs). These sub-10cm drones represent the next generation of unobtrusive robotic helpers and ubiquitous smart sensors. However, nano-UAVs face significant power and payload constraints while requiring advanced computing capabilities akin to standard drones, including real-time Machine Learning (ML) performance and the safe co-existence of general-purpose and real-time OSs. Although some advanced parallel ULP MCUs offer the necessary ML computing capabilities within the prescribed power limits, they rely on small main memories (<1MB) and microcontroller-class CPUs with no virtualization or security features, and hence only support simple bare-metal runtimes. In this work, we present Shaheen, a 9mm² 200mW SoC implemented in 22nm FDX technology. Differently from state-of-the-art MCUs, Shaheen integrates a Linux-capable RV64 core, compliant with the v1.0 ratified Hypervisor extension and equipped with timing channel protection, along with a low-cost and low-power memory controller exposing up to 512MB of off-chip low-cost low-power HyperRAM directly to the CPU. At the same time, it integrates a fully programmable energy- and area-efficient multi-core cluster of RV32 cores optimized for general-purpose DSP as well as reduced- and mixed-precision ML. To the best of the authors’ knowledge, it is the first silicon prototype of a ULP SoC coupling the RV64 and RV32 cores in a heterogeneous host+accelerator architecture fully based on the RISC-V ISA. We demonstrate the capabilities of the proposed SoC on a wide range of benchmarks relevant to nano-UAV applications including general-purpose DSP as well as inference and online learning of quantized DNNs. The cluster can deliver up to 90GOp/s and up to 1.8TOp/s/W on 2-bit integer kernels and up to 7.9GFLOp/s and up to 150GFLOp/s/W on 16-bit FP kernels.

Index Terms: Heterogeneous, Linux, low-power, autonomous nano-UAVs, RISC-V.

I. INTRODUCTION
The number of Internet-of-Things (IoT) devices and the spectrum of IoT applications are rapidly growing: from home automation, robotics, industrial gateways, and building automation to smart cities, digital signage, medical equipment, and more [1]. In this context, nano-sized unmanned aerial vehicles (nano-UAVs) can be considered the “ultimate” IoT node, thanks to their ability to navigate, sense, analyse, and understand the surrounding environment. Nano-UAVs have a form factor of a few centimeters in diameter and a weight of only tens of grams, which allows them to safely operate near humans and in narrow, cramped spaces [2], [3]. They have a total power envelope of a few Watts, of which only 5-15% is available for computation [4], and their small physical footprint and limited payload restrict the maximum battery and printed circuit board size and exclude any form of active cooling. Nowadays, microcontroller units (MCUs) are the only computing platforms that meet the nano-UAV’s power and form-factor constraints.

MCUs feature simple RISC host processors (e.g., ARM Cortex-M) with low computational capabilities and no virtualization support, to which they expose just a few hundred kBytes of on-chip SRAM scratchpad memory (SPM) [5], [6], [7], [8], [9], [10], [11]. To deliver more advanced computational capabilities, state-of-the-art (SoA) MCUs integrate accelerators with high data processing capabilities [5], [6], [7], [8], [9], [10], [11]. Usually, the accelerators of ultra-low-power (ULP) devices are hardwired application-specific data-paths [9], [10], which achieve the best energy efficiency but are tailored to a single application domain, leading to poor programmability and a high nonrecurring engineering cost [12] while occupying a considerable part of the scarce area resources. To improve the overall versatility of the SoC, recent works replace ASIC

Manuscript received 30 July 2023; revised 3 November 2023 and 7 January 2024; accepted 18 January 2024. Date of publication 7 February 2024; date of current version 30 April 2024. This work was supported in part by the Technology Innovation Institute, Secure Systems Research Center, Abu Dhabi, United Arab Emirates; in part by the Spoke 1 on Future High-Performance-Computing (HPC) of the Italian Research Center on High-Performance Computing, Big Data and Quantum Computing (ICSC) that received funding from the Ministry of University and Research (MUR) for the Mission 4–Next Generation EU programme; and in part through the TRISTAN (101095947) project that received funding from the HORIZON CHIPS-Joint Undertaking programme. This article was recommended by Associate Editor Y. Tang. (Corresponding author: Luca Valente.)
Luca Valente, Alessandro Nadalini, Mattia Sinigaglia, Yvan Tortorella, Simone Benatti, and Davide Rossi are with the Department of Electrical, Electronic and Information Engineering, University of Bologna, 40136 Bologna, Italy (e-mail: [Link]@[Link]).
Asif Hussain Chiralil Veeran and Baker Mohammad are with the Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates.
Bruno Sá and Sandro Pinto are with Centro ALGORITMI, University of Minho, 4800-058 Guimarães, Portugal.
Nils Wistoff is with the Integrated Systems Laboratory (IIS), ETH Zürich, 8092 Zürich, Switzerland.
Rafail Psiakis and Ari Kulmala are with the Secure Systems Research Center, Technology Innovation Institute, Abu Dhabi, United Arab Emirates.
Daniele Palossi is with the Integrated Systems Laboratory (IIS), ETH Zürich, 8092 Zürich, Switzerland, and also with the Dalle Molle Institute for Artificial Intelligence (IDSIA), USI-SUPSI, 6900 Lugano, Switzerland.
Luca Benini is with the Department of Electrical, Electronic and Information Engineering, University of Bologna, 40136 Bologna, Italy, and also with the Integrated Systems Laboratory (IIS), ETH Zürich, 8092 Zürich, Switzerland.
Color versions of one or more figures in this article are available at [Link]
Digital Object Identifier 10.1109/TCSI.2024.3359044
1549-8328 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See [Link] for more information.
Authorized licensed use limited to: Zhejiang University. Downloaded on September 07,2024 at [Link] UTC from IEEE Xplore. Restrictions apply.
VALENTE et al.: HETEROGENEOUS RISC-V BASED SoC FOR SECURE NANO-UAV NAVIGATION 2267
2268 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 71, NO. 5, MAY 2024
TABLE II
STATE-OF-THE-ART SoCs FOR UAVs
navigation is achieved by the combination of two components: mission control and flight control. Mission control is the high-level decisional part of the navigation algorithm, e.g., path planning [23], optimization-based control [24], etc. To carry out these types of tasks, SoA drones mostly rely on machine learning (ML) algorithms [2], [17]. Flight control, on the other hand, is the actuation of the output decisions of mission control: it collects data from the sensors to determine the vehicle’s state and generates the control law, which manages the actuators [25]. Flight control is often based on cascade PID control [26], especially in the case of nano-UAVs [27], [28], and it is not as computationally intensive as mission control, but it requires low-latency guarantees. As a consequence, also in the context of standard and micro-drones, flight control is usually carried out by simple MCUs with a predictable execution time like the STM32-H7 [5] integrated into the Pixhawk board [25]. Table II shows some mainstream SoCs successfully deployed on drones of standard, micro, and nano size. For each SoC, it highlights the different computational capabilities and power envelope, as well as the specific tasks and UAV platforms they are suited for, detailed in the sections below. Sections II-A and II-B describe the state of the art of SoCs for standard, micro, and nano UAVs.

A. SoCs for Standard and Micro-Sized UAVs

As Table I shows, micro-size drones integrate embedded computers, while standard-sized drones can even accommodate desktop processors. Nevertheless, embedded processors can nowadays deliver performance in the order of hundreds of TOp/s and hundreds of GFLOp/s, which has proven to be sufficient to support the full flight stack for mission control, both for micro [17] and standard-size UAVs [22].

Embedded computers integrate high-end SoCs with application-class cores, supporting virtualization and various privilege levels (and hence full-fledged OSs), embedded GPUs, and GBytes of high-performance off-chip LPDDR/DDR4/5 memories, connected through expensive, large and power-hungry mixed-signal DDR controllers, all within a power envelope of a few watts [29], [31], [32], [33]. The NVIDIA Jetson TX2 is claimed to be “the fastest, most power-efficient embedded artificial intelligence (AI) computing device” by NVIDIA [29] and it is the board of choice for the Agilicious drone [17]. It features a quad-core Cortex-A57 running up to 2GHz and a Pascal CUDA GPU, which can deliver up to 1.33TFLOp/s, resulting in an overall power consumption of more than 7.5 W. The Intel Atom x7 is the heart of the Intel UpBoard platforms, which are as big as a credit card. It features 4 Intel Atom processors running up to 2GHz and an Intel HD 505 GPU delivering up to 230GFLOp/s and consuming roughly 10W. Another compute hardware platform commonly used on autonomous UAVs is the NanoPi Neo Air, which integrates an Allwinner H7 SoC with a quad-core Cortex-A7 and a Mali-400 MP2 GPU, delivering up to 10GFLOp/s. All these SoCs offer a mainstream Ubuntu-ready software stack and virtualization capabilities and can handle very sophisticated and complex applications. However, due to their power envelope, size, and the necessity for high-end off-chip memories, these SoCs can only be integrated into standard and micro-UAVs.

Naturally, Shaheen cannot compete with these architectures in terms of performance, but our approach borrows the best of their characteristics while targeting a much smaller power envelope. Firstly, to mimic high-end SoCs with their heterogeneous GPU-based architecture, Shaheen integrates an RV32-based parallel programmable cluster along with an RV64 CPU. Secondly, it exposes a significant amount of off-chip main memory to the CPU. However, instead of high-performance DRAMs (LPDDR3/4/5) that are connected through large, proprietary and expensive mixed-signal PHYs with a high pin count (>30), Shaheen leverages HyperRAMs, which are fully-digital low-power small-area DRAMs with less than 14 pins and are feasible to deploy on nano-UAVs. A similar approach is adopted in Cheshire [34], which is not optimized for nano-UAV applications. Like Shaheen, Cheshire is built around CVA6; it exposes up to 1GB of Reduced Pin Count (RPC) DRAM memory, which uses a minimum number of signals to deliver
DDR3-level in-system bandwidth at the cost of 22 switching signals for a 16-bit wide data bus [34], [35]. While RPC and the related controller offer higher bandwidth than HyperBUS, the RPC protocol is more convoluted, leading to higher design complexity and a bigger area, mostly due to the four 8kB buffers [34]. More importantly, Cheshire’s CVA6 does not feature hardware virtualization support and micro-architectural extensions for timing channel mitigations. Lastly, while being easily extensible through the AXI4 interface, Cheshire’s silicon prototype does not integrate a parallel accelerator, heavily limiting the offered performance. To sum up, Shaheen is the first silicon implementation of a heterogeneous MCU coupling an RV64 core with a cluster of eight RV32 cores and tens of MB of main memory.

B. SoCs for Nano-UAVs

A state-of-the-art MCU for nano-UAV platforms is the STM32-F4 [6]. The STM32-F4 is the computational unit of the Crazyflie [36] platform, integrating a Cortex-M4 core and 192kB of on-chip SRAM with a maximum operating frequency of 180MHz. Its low performance and small memory capacity limit the autonomous navigation capabilities of the nano-drone when compared to embedded computers. To this end, two kinds of approaches have been proposed: minimization of the workload [37] or offloading of the mission control computation to an external base station [38], limiting the MCU to flight control. The latter approach presents severe drawbacks. First, it introduces network-dependent latency, limiting the maximum distance from the workstation to a few tens of meters. Also, the data transmitted are subject to noise on the transmission channel, which limits reliability, and to eavesdropping on confidential data [39].

To offer enhanced computational capabilities within a small power budget, recent works also propose SoCs featuring hardwired ASIC accelerators designed for specific UAV applications, such as motion control [9], visual-inertial odometry (VIO) [10], simultaneous-localization-and-mapping (SLAM) [40], or QNN inference [7], [8]. These accelerators achieve impressive energy efficiency, in the order of hundreds of TOp/s/W, by carefully mapping the target algorithm to the hardware. For example, many accelerators exploit the inherent parallelism of the target application, such as using a systolic array for motion control [9]. Another common approach exploits reduced-precision arithmetic, as in SLAM [40] and VIO [10], to reduce the memory footprint and the datapath size. Exploiting both parallelism and reduced-precision computation is also a well-established technique to accelerate QNN inference and training, due to the nature of such algorithms. For example, accelerators like the NE16 in GAP9 [7], the HWCE in GAP8 [7], and the ternary weight neural-network (so-called CUTIE) accelerator in Kraken [8], able to reach peaks of 11.6 TMAC/s, have been proposed to speed up QNN inference. However, due to poor flexibility and programmability, these accelerators still have to rely on general-purpose CPUs to achieve end-to-end flight. Furthermore, the high area cost per device makes them hard to adopt, as they risk becoming obsolete due to the rapid evolution of the target nano-UAV applications.

To overcome these limitations, recent MCUs integrate parallel, fully-programmable and flexible accelerators, which have been shown to enable autonomous navigation [2], [3]. Namely, GAP8, GAP9 [7] and Kraken [8] are MCUs with enhanced computational capabilities, based on parallel programmable accelerators. GAP8 and GAP9 are commercial products by GreenWaves Technologies compliant with the so-called Crazyflie-AIdeck [41] board, which is meant as a companion of the Crazyflie to offload the mission control tasks [2]. GAP8 embeds the so-called Ri5cy [7] core as host CPU and 1.5MB of on-chip SRAM memory, accompanied by a parallel programmable cluster of eight more Ri5cy cores delivering up to 150 GOp/s on 8-bit data. Ri5cy is a core with a 4-stage pipeline, compliant with the so-called XpulpV2 ISA, a custom RISC-V ISA based on RV32 with extensions for DSP and ML applications, with support for 16/8-bit SIMD operations and hardware loops. GAP9 is an improved version of the GAP8 processor. It is fabricated in a more advanced node than GAP8, halving the power envelope, and it also features 2MB of non-volatile SVM memory. Also, differently from GAP8, GAP9’s cluster includes 4 FPUs with FP16/32 support. Lastly, Kraken [8] is a research prototype based on the same heterogeneous architecture as GAP8 and GAP9, i.e., an RV32 CPU along with a cluster of eight RV32 cores, which delivers up to 90 GOp/s on 2-bit data. Kraken’s RV32 cores are a more advanced version of Ri5cy, i.e., the Ri5cyNN cores [8], with support for sub-byte SIMD operations and fused Mac&Load instructions, which enable the concurrent execution of SIMD dot-products and memory accesses, increasing the computation efficiency up to 94%. Kraken embeds 1.5MB of on-chip SRAM memory and the CUTIE accelerator, able to achieve up to roughly 90k Ternary-MACs per cycle. Furthermore, it provides an event-based camera, tightly coupled with a Spiking Neural Engine accelerator. When compared to traditional cameras, event-based cameras offer high temporal resolution (in the order of µs), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur [42].

Shaheen’s approach leverages the best from these advanced AI IoT SoCs, integrating its own fully-programmable parallel 8-core RV32 cluster accelerator. Shaheen’s RV32 cores stem from the Ri5cyNN cores and are further enhanced with mixed-precision support to eliminate the massive software overhead necessary for packing and unpacking data when executing mixed-precision sub-byte kernels, providing up to 8.5x speed-up over Kraken with less than 5.6% extra area over the baseline core without extensions. In addition, Shaheen addresses a major limitation of SoA MCUs: the software stack based on lightweight RTOSs or simple bare-metal runtimes. Programming applications on these stacks is hard, owing to (i) the lack of virtualization capabilities of the host CPUs and (ii) the small amount of memory directly accessible through loads and stores, which limits the maximum software memory footprint. In the context of MCUs, memory resources coincide with on-chip SRAMs and off-chip DRAMs. The former provide high bandwidth but are limited to a few hundred kB, due to the area and power constraints [21]. The latter offer one order of magnitude more capacity but are much slower and are typically accessed only through explicit input-output copy functions. Thus, beyond SoA, to support a richer software stack while offering the advanced computing performance and energy efficiency of the RV32-based cluster, Shaheen integrates an RV64 core with advanced virtualization and security features, along with up to 512MB of main memory. This enables the secure coexistence of rich and mature general-purpose OS and bare-metal RTOS on the same platform and
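The packing and unpacking overhead that motivates mixed-precision hardware support can be illustrated with a minimal Python sketch. It assumes 8-bit activations and signed 4-bit weights stored two per byte (low nibble first); the function names and the storage layout are hypothetical and are not taken from Shaheen's ISA or software stack. On a core without mixed-precision SIMD, a step equivalent to `unpack_int4` has to run in software before every dot product.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first."""
    if len(values) % 2:
        raise ValueError("need an even number of 4-bit values")
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in 4 bits")
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_int4(packed):
    """Expand packed nibbles back to signed integers (the software overhead)."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend the 4-bit two's-complement value.
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


def dot_8b4b(activations, packed_weights):
    """Dot product of 8-bit activations with packed 4-bit weights."""
    weights = unpack_int4(packed_weights)
    if len(weights) != len(activations):
        raise ValueError("operand lengths differ")
    return sum(a * w for a, w in zip(activations, weights))


weights = [3, -2, 7, -8]
activations = [10, -20, 30, 40]
packed = pack_int4(weights)
result = dot_8b4b(activations, packed)
```

The packed weights occupy half the bytes of an 8-bit representation, which is the memory-footprint benefit of sub-byte quantization; the explicit unpack loop is the cost that dedicated mixed-precision instructions are meant to remove.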
TABLE III
RISC-V ISA PRIVILEGE MODES WITH THE HYPERVISOR EXTENSION
Fig. 3. Channel matrices on the CHANNEL BENCH test.
Fig. 4. HyperRAM memory controller architecture.
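As a rough illustration of what a channel matrix such as the ones in Fig. 3 encodes, the following Python sketch estimates the conditional probability of observing a probe time t given a secret s from a list of (s, t) samples. The sample data are synthetic and hypothetical, the timings are deterministic for simplicity, and the exact-equality leak check is a toy criterion; a real evaluation on noisy measurements would need a statistical measure instead.

```python
from collections import Counter, defaultdict


def channel_matrix(samples):
    """Estimate P(t | s) from (secret, time) pairs as {s: {t: probability}}."""
    counts = defaultdict(Counter)
    for secret, time in samples:
        counts[secret][time] += 1
    return {
        secret: {t: c / sum(ctr.values()) for t, c in ctr.items()}
        for secret, ctr in counts.items()
    }


def leaks(matrix, tol=1e-9):
    """Toy check: True if the time distribution depends on the secret."""
    dists = list(matrix.values())
    ref = dists[0]
    for dist in dists[1:]:
        for t in set(ref) | set(dist):
            if abs(ref.get(t, 0.0) - dist.get(t, 0.0)) > tol:
                return True
    return False


# Hypothetical data: without flushing, probe time grows with the number of
# cache lines s touched by the Trojan; with flushing it is constant.
unprotected = [(s, 100 + 10 * s) for s in range(4) for _ in range(5)]
protected = [(s, 100) for s in range(4) for _ in range(5)]

leaky_matrix = channel_matrix(unprotected)
flat_matrix = channel_matrix(protected)
```

In heatmap form, `leaky_matrix` would show a bright diagonal band (t tracks s), while `flat_matrix` would show a single horizontal band, meaning the spy learns nothing about s from t.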
into a known state (prime). In the following time slice, the OS switches to an application containing a Trojan, which accesses a subset of the hardware resource to encode a secret. Finally, when the execution switches back to the spy, it again probes (probe) the whole buffer and observes an execution time t, correlated with the encoded secret. For data caches, the spy traverses a large buffer of n lines so that the Trojan can then transmit a secret s ≤ n by touching s lines: in the last time slice, the spy decodes s from the measured execution time t.

In this context, the fence.t extends the control that the OS, or the Hypervisor, has over the hardware. Namely, it provides the capability of clearing vulnerable microarchitectural states to enable a history-independent context-switch latency by flushing the caches and the TLB and resetting the internal FSMs of the core. The fence.t has been validated against prime-and-probe attacks from the MASTIK toolkit [20], [45]. These attacks are implemented within Ge’s CHANNEL BENCH [46], [47] suite, which provides a minimal OS and data collection infrastructure, running on an experimental version of seL4 supporting timing protection. To visualize the correlation between s and t, we use channel matrices. A channel matrix represents the conditional probability of getting an execution time t, given an input secret s. In Fig. 3, we represent the channel matrices as heatmaps: s (the secret encoded by the Trojan by touching s ≤ n data cache lines) varies horizontally, and t (the execution time measured by the spy) varies vertically; bright colours indicate a high probability and dark colours indicate a low probability of measuring such t, given a certain s.

Fig. 3 shows the channel matrices on the CHANNEL BENCH test for CVA6’s write-through L1 data cache. The left matrix is obtained without the fence.t: the correlation between the Trojan’s secret and the spy’s probe time indicates a covert channel. On the right, when using the fence.t, there is no correlation. With less than 320 additional clock cycles added to the context-switch latency (insignificant at typical switch rates of 1 kHz), the fence.t requires a low implementation effort and negligible hardware costs.

B. Host & Peripheral Domain

The host domain leverages the popular AXI4 protocol [48] for the main interconnect. Namely, it includes a 64-bit AXI4 crossbar delivering up to 32Gbps on each AXI4 port, on both the read and write channels. It also includes four 256kB SRAM banks, composing a 1MB L2 ScratchPad Memory (L2SPM) delivering up to 64Gbps, either for writing or reading. The L2SPM is meant to (i) store data to be shared with off-chip peripherals, (ii) store the cluster code, (iii) enable fast communication between CVA6 and the cluster, and, more in general, (iv) provide low-latency (<10 clock cycles) and predictable accesses.

To enable independent data transfer from peripherals to the SoC, Shaheen includes in the peripheral domain the so-called “µDMA subsystem”, which is a controller intended to autonomously serve a set of I/O interfaces popular in critical applications. Such interfaces include for instance HyperBUS, I2C, (Q)SPI, CPI, SDIO, UART, CAN, PWM, and I2S. The µDMA exports two ports, one for receiving and one for sending data, to read/write data from/to the L2SPM SRAM memory to/from the off-chip peripherals [7]. Shaheen also features an open-source Linux-compliant Ethernet IP, to be fully compliant with the Pixhawk standard [25], a popular set of open-source hardware specifications and guidelines for drone systems development.

1) HyperRAM Memory Controller: Fig. 4 depicts Shaheen’s HyperRAM controller, which provides a configuration APB port and an AXI4 subordinate port. It connects the SoC with off-chip HyperRAMs, compliant with the HyperBUS protocol, which is a fully digital protocol counting 11+n pins: 3 control pins, n Chip Select (CS), and 8 Double-Data-Rate pins used both for commands and data [21]. Depending on the off-chip memory models, the controller exposes between 32MB and 512MB to the interconnect, and it provides up to 1.6 Gbps. HyperRAMs are the main memory of choice for Shaheen because, differently from high-end DDR DRAM memories, they target a much lower power consumption and silicon footprint while guaranteeing enough bandwidth for advanced AI IoT applications and capacity to boot embedded SPM Linux [21].

There are two distinct modules within the HyperRAM controller, i.e., the PHY controller (back-end) and the front-end, operating in different frequency domains. The front-end module consists of an AXI4-to-PHY converter and a specialized µDMA engine channel accessible through APB to execute software-programmed DMA transfers. The AXI4 and µDMA transactions are multiplexed towards the PHY, which translates the incoming data packets into HyperRAM transactions and vice versa. The AXI4 front-end enqueues the AXI4 transactions individually, lets through only one read or one write at a time, and converts it into a request for the PHY. At this point, the back-end translates the request into a command for the HyperRAMs and issues it over the HyperBUS. Then, in the case of a write, the W channel transactions get converted into multiple PHY data packets. For reads, the PHY back-end sends data packets to a converter that then populates the R channel. The µDMA engine directly connects the L2SPM and the back-end and can generate both 1D and 2D burst transactions. These features are highly valuable for the efficient execution of ML algorithms on the cluster, which is achieved through explicit orchestration of the data movement between the off-chip memory, the L2SPM and the L1SPM [49].

To double the bandwidth and the capacity, Shaheen’s back-end module controls 2 HyperBUS interfaces in parallel, and it
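The HyperBUS pin count and peak bandwidth figures quoted in this section follow from simple arithmetic, sketched below in Python. The pin breakdown (3 control pins, n chip selects, 8 double-data-rate pins) comes from the text; the 100 MHz bus clock is a hypothetical value chosen only because it reproduces the 1.6 Gbps figure on a single 8-bit DDR bus, and the source does not state the clock or whether 1.6 Gbps refers to one or two interfaces.

```python
CONTROL_PINS = 3   # from the text: 3 control pins
DATA_PINS = 8      # from the text: 8 double-data-rate pins


def hyperbus_pin_count(n_chip_selects):
    """Total pins of one HyperBUS interface: 11 + n."""
    if n_chip_selects < 1:
        raise ValueError("at least one chip select is required")
    return CONTROL_PINS + DATA_PINS + n_chip_selects


def peak_bandwidth_bps(clock_hz, n_buses=1):
    """Peak raw bandwidth: each data pin moves 2 bits per clock (DDR)."""
    return DATA_PINS * 2 * clock_hz * n_buses


ASSUMED_CLOCK_HZ = 100_000_000  # hypothetical, not stated in the source

single_bus = peak_bandwidth_bps(ASSUMED_CLOCK_HZ)
dual_bus = peak_bandwidth_bps(ASSUMED_CLOCK_HZ, n_buses=2)
```

With one or two chip selects the interface stays below the 14-pin bound mentioned earlier in the paper, and under the assumed clock a second interface in parallel doubles the peak raw bandwidth; sustained bandwidth would be lower because commands share the same data pins.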
TABLE IV
FLEX-V’S PERFORMANCE [MAC/CYCLE] ON MATMUL KERNELS, AGAINST XPULPNN AND XPULPV2 [50]
Fig. 14. Performance and energy efficiency delivered by the cluster on general-purpose DSP benchmarks.
Fig. 15. Performance and energy efficiency delivered by the cluster on FP training benchmarks.
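The online-training benchmarks of Fig. 15 cover three computation phases per layer: the forward pass, the gradient with respect to the activations, and the gradient with respect to the weights. For a fully-connected layer, each phase reduces to one matrix multiplication, as the minimal pure-Python sketch below shows. It is an illustration under simplifying assumptions (no bias, row-major lists, hypothetical function names) and is not the kernel library benchmarked in the paper.

```python
def matmul(a, b):
    """Plain matrix product of two row-major lists of lists."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions differ")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def fc_forward(x, w):
    """Phase (i): Y = X W, with X of shape (batch, in) and W of shape (in, out)."""
    return matmul(x, w)


def fc_grad_input(dy, w):
    """Phase (ii): dX = dY W^T, gradient with respect to the activations."""
    return matmul(dy, transpose(w))


def fc_grad_weight(x, dy):
    """Phase (iii): dW = X^T dY, gradient with respect to the weights."""
    return matmul(transpose(x), dy)


x = [[1, 2]]                  # one sample, two input features
w = [[1, 0, 2], [0, 1, 3]]    # two inputs, three outputs
dy = [[1, 1, 1]]              # upstream gradient of the loss

y = fc_forward(x, w)
dx = fc_grad_input(dy, w)
dw = fc_grad_weight(x, dy)
```

The three products have different operand shapes, which is why the amount of parallelizable work, and hence the achieved speed-up, differs between the phases.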
TABLE VI
ACCURACY, MEMORY FOOTPRINT, PERF. & EN. OF END-TO-END NETWORKS

navigation within tight spaces avoiding obstacle collision. FrontNet on the other hand is based on the MobileNet [54] architecture and it is used for Human-Robot Interaction (HRI): it allows the nano-drone to recognize a face and follow it. The cluster is able to achieve 320FPS on a Tiny-PULP-Dronet and 260FPS on an optimized 6.7MMAC FrontNet, which is well above the 20FPS needed to achieve autonomous flight [2], [3]. This means that more than 90% of the cluster’s computational capabilities are actually available to carry out other activities.

Stemming the analysis from the QNNs mentioned below, we first benchmark the cluster on a relatively big (325MMAC) 8-bit MobileNetV1 [54] for object classification. Then, we extend the analysis to a mixed-precision MobileNetV1 with 8-bit activations and 4-bit weights (8b4b) and an aggressively quantized 4b2b ResNet-20 [57] for object detection. The two MobileNetV1 networks have been trained on ImageNet while the 4b2b ResNet-20 targets CIFAR10. As Table VI shows, reducing the operands’ precision does not automatically jeopardize the accuracy: in the case of the MobileNetV1, there is a 47% memory footprint reduction for a negligible 3% accuracy loss, from 69% to 66% [50], while the ResNet-20 achieves 90.2% accuracy [55]. As shown in Table VI, Flex-V is the only version of Ri5cy able to efficiently deal with mixed-precision networks: in terms of MAC/cycle, on the 4b2b mixed-precision ResNet-20, it achieves 2.3× and 2.5× speedup with respect to XPulpNN and XPulpV2, respectively. Table VI also compares the latency and energy consumed by Shaheen over the three networks when running at the maximum frequency at 0.8V, compared to two other 8-core clusters respectively implementing the baseline XPulpV2 instructions or the XPulpNN, namely GAP9 [7] and Kraken [8]. On the uniform-precision MobileNetV1, thanks to the higher frequency and optimized ISA, Shaheen’s cluster provides the smallest latency, but not the lowest energy, due to higher power consumption (70mW) when compared to GAP9 (50mW), the latter being tuned for energy-efficient operation. As soon as the mixed-precision extensions can be exploited, Shaheen’s cluster emerges both as the fastest and most energy-efficient one.

C. Online Training

In this subsection, we benchmark the cluster against a set of open-source kernels to enable online learning on MCUs [16]. In particular, we benchmark three very popular layers, namely 2D Convolution, Pointwise and Fully-Connected, which are the three building blocks of Convolutional NNs (CNNs), used to find patterns in images. Convolutional and pointwise layers are the core building blocks of CNNs, where most of the computation happens, and are used to perform feature extraction. The Fully-Connected layer connects the information extracted from the previous steps (i.e., Convolution and Pooling layers) to the output layer and eventually classifies the input into the desired label. For each layer, we consider the three phases of training: (i) the forward pass, to compute the output result and hence the loss, (ii) the backward computation of the gradients with respect to the activations, and (iii) the backward computation of the gradients with respect to the weights. The kernels we leverage map each of these computation phases directly to one matrix multiplication that gathers all the matrix multiplications needed to obtain the output [16]. Depending on the matrices’ shapes, the amount of parallelizable work changes, and hence the performance [58]. Figure 15 shows performance and energy efficiency over such benchmarks. As for the DSP benchmarks, the parallelization provides a significant speed-up for most of them. Except for the weight gradient computation on the convolution kernel, which achieves a 4.7× speedup, the parallelization provides between 6.1× and 7.5× faster execution. At the same time, leveraging the bfloat16 format (providing a wide dynamic range explicitly designed for ML training) and the dedicated SIMD extensions provides up to 1.8× more performance. Overall, the cluster is able to achieve up to 6.2 GFLOp/s and 120 GFLOp/s/W on this class of benchmarks.

VII. COMPARISON WITH STATE-OF-THE-ART

Table VII shows Shaheen against 6 SoCs for UAVs, both from industry and academia. To have a thorough comparison, we extend it also with SoCs not explicitly optimized for UAVs
TABLE VII
COMPARISON WITH SoA SoCs
but with similar general-purpose software performance and functionalities that could fit the purpose, namely Cheshire [34], the work from Jia et al. [31], the STM32-H7 [5], and the work by Ju et al. [11]. From an architectural viewpoint, the design by Ju et al. [11] consists of a homogeneous systolic array of RV32 cores, while Jia et al. [31] instantiates a cluster of four RV64 cores along with a set of hardwired ASIC accelerators. More advanced nano-UAV SoCs, such as GAP9 [7] and Kraken [8], incorporate an RV32 CPU that can offload compute-intensive tasks to a parallel cluster of cores with the same ISA.

In this context, Shaheen is the first silicon demonstrator of a heterogeneous RV64/RV32 architecture. When offloading compute-intensive tasks to the fully programmable parallel cluster of Flex-V cores, performance can be improved by up to 2 orders of magnitude, achieving state-of-the-art performance with up to 90 GOp/s on heavily quantized integer tasks and up to 7.9 GFLOp/s on 16-bit floating-point tasks. Shaheen stands out as the only nano-UAV SoC that provides Linux, hypervisor, and security capabilities to the host, enabling the secure co-existence of user applications running on full-fledged OSs and control tasks running on real-time OSs, while providing up to 512MB of low-cost and low-power off-chip memory within the power envelope of 200 mW.

VIII. CONCLUSION

We presented Shaheen: a heterogeneous and flexible SoC implemented in 22 nm FDX technology. Shaheen features a Linux-capable RV64 core, compliant with the v1.0 ratified Hypervisor extension. To the best of our knowledge, it is the first silicon implementation fully compliant with the ratified RISC-V ISA Hypervisor extension. It features support for timing channel protection to isolate concurrent execution of multiple software stacks (trusted and untrusted), preventing security threats and ensuring multi-domain operations. It provides up to 512MB of main off-chip HyperRAM memory, large enough to host general-purpose OSs as well as RTOSs. Also, it is the first silicon implementation of a heterogeneous MCU coupling an RV64 host together with a multi-core RV32 cluster, achieving up to 90GOp/s and up to 1.8TOp/s/W on 2-bit integer kernels and up to 26.9GOp/s and up to 540GOp/s/W on 8-bit integer kernels.

After this thorough evaluation, we envision the miniaturization of the testing PCB (see Fig. 9) and the development of ad-hoc control software, tightly coupled with the physical characteristics of the board, to achieve real-world nano-UAV flight, exploiting Shaheen’s secure and scalable architecture with host/cluster decoupling and advanced virtualization. Overall, Shaheen is the first prototype SoC providing support for general-purpose OSs within a 200mW power envelope while offering state-of-the-art performance over a wide spectrum of applications, thanks to the programmable multi-core cluster. All the IPs integrated within Shaheen are released as open source² under a liberal license to foster future research in the area of AI-IoT computing devices.

² [Link]

REFERENCES

[1] M. O. Ojo, S. Giordano, G. Procissi, and I. N. Seitanidis, “A review of low-end, middle-end, and high-end IoT devices,” IEEE Access, vol. 6, pp. 70528–70554, 2018.
[2] L. Lamberti et al., “Tiny-PULP-Dronets: Squeezing neural networks for faster and lighter inference on multi-tasking autonomous nano-drones,” in Proc. IEEE 4th Int. Conf. Artif. Intell. Circuits Syst. (AICAS), Jun. 2022, pp. 287–290.
[3] E. Cereda et al., “Deep neural network architecture search for accurate visual pose estimation aboard nano-UAVs,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2023, pp. 6065–6071.
[4] R. J. Wood et al., “Progress on ‘Pico’ air vehicles,” in Robotics Research. Cham, Switzerland: Springer, 2017, pp. 3–19, doi: 10.1007/978-3-319-29363-9_1.
[5] STMicroelectronics. (2020). STM32H7. Accessed: Jul. 30, 2023. [Online]. Available: [Link]processors/[Link]
[6] STMicroelectronics. (2020). STM32F4. Accessed: Jul. 30, 2023. [Online]. Available: [Link]processors/[Link]
[7] GreenWavesTechnology. (2023). GAP8/9. Accessed: Jul. 30, 2023. [Online]. Available: [Link]processor/
2278 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 71, NO. 5, MAY 2024
[8] A. Di Mauro, M. Scherer, D. Rossi, and L. Benini, “Kraken: A direct event/frame-based multi-sensor fusion SoC for ultra-efficient visual processing in nano-UAVs,” in Proc. IEEE Hot Chips 34 Symp. (HCS), Aug. 2022, pp. 1–19.
[9] I.-T. Lin et al., “2.5 A 28 nm 142 mW motion-control SoC for autonomous mobile robots,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2023, pp. 1–3.
[10] A. Suleiman, Z. Zhang, L. Carlone, S. Karaman, and V. Sze, “Navion: A 2-mW fully integrated real-time visual-inertial odometry accelerator for autonomous navigation of nano drones,” IEEE J. Solid-State Circuits, vol. 54, no. 4, pp. 1106–1119, Apr. 2019.
[11] Y. Ju and J. Gu, “A systolic neural CPU processor combining deep learning and general-purpose computing with enhanced data locality and end-to-end performance,” IEEE J. Solid-State Circuits, vol. 58, no. 1, pp. 216–226, Jan. 2023.
[12] W. J. Dally, Y. Turakhia, and S. Han, “Domain-specific hardware accelerators,” Commun. ACM, vol. 63, no. 7, pp. 48–57, Jun. 2020, doi: 10.1145/3361682.
[13] P. Tsiotras, D. Jung, and E. Bakolas, “Multiresolution hierarchical path-planning for small UAVs using wavelet decompositions,” J. Intell. Robotic Syst., vol. 66, no. 4, pp. 505–522, Jun. 2012.
[14] A. Khadka, B. Fick, A. Afshar, M. Tavakoli, and J. Baqersad, “Non-contact vibration monitoring of rotating wind turbines using a semi-autonomous UAV,” Mech. Syst. Signal Process., vol. 138, Apr. 2020, Art. no. 106446.
[15] D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, and D. Mané, “Concrete problems in AI safety,” 2016, arXiv:1606.06565.
[16] D. Nadalini, M. Rusci, G. Tagliavini, L. Ravaglia, L. Benini, and F. Conti, “PULP-TrainLib: Enabling on-device training for RISC-V multi-core MCUs through performance-driven autotuning,” in Embedded Computer Systems: Architectures, Modeling, and Simulation. Cham, Springer, 2022, pp. 200–216.
[17] P. Foehn et al., “Agilicious: Open-source and open-hardware agile quadrotor for vision-based flight,” Sci. Robot., vol. 7, no. 67, Jun. 2022, Art. no. eabl6259.
[18] B. Sá, L. Valente, J. Martins, D. Rossi, L. Benini, and S. Pinto, “CVA6 RISC-V virtualization: Architecture, microarchitecture, and design space exploration,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 31, no. 11, pp. 1713–1726, Nov. 2023.
[19] M. Schneider, A. Dhar, I. Puddu, K. Kostiainen, and S. Čapkun, “Composite enclaves: Towards disaggregated trusted execution,” IACR Trans. Cryptograph. Hardw. Embedded Syst., vol. 2022, no. 1, pp. 630–656, Nov. 2021.
[20] N. Wistoff, M. Schneider, F. K. Gürkaynak, G. Heiser, and L. Benini, “Systematic prevention of on-core timing channels by full temporal partitioning,” IEEE Trans. Comput., vol. 72, no. 5, pp. 1420–1430, May 2023.
[21] B. John. (2020). HyperRAM As a Low Pin-count Expansion Memory for Embedded Systems. Accessed: Jul. 30, 2023. [Online]. Available: [Link]
[22] A. Das, P. Kol, C. Lundberg, K. Doelling, H. E. Sevil, and F. Lewis, “A rapid situational awareness development framework for heterogeneous manned-unmanned teams,” in Proc. IEEE Nat. Aerosp. Electron. Conf. (NAECON), Jul. 2018, pp. 417–424.
[23] B. Forsberg, D. Palossi, A. Marongiu, and L. Benini, “GPU-accelerated real-time path planning and the predictable execution model,” Proc. Comput. Sci., vol. 108, pp. 2428–2432, Jan. 2017. [Online]. Available: [Link]
[24] S. A. Quintero and J. P. Hespanha, “Vision-based target tracking with a small UAV: Optimization-based control strategies,” Control Eng. Pract., vol. 32, pp. 28–42, Nov. 2014. [Online]. Available: [Link].[Link]/science/article/pii/S0967066114001774
[25] Pixhawk. (2023). PX4. Accessed: Jul. 30, 2023. [Online]. Available: [Link]
[26] M. Idrissi, M. Salami, and F. Annaz, “A review of quadrotor unmanned aerial vehicles: Applications, architectural design and control algorithms,” J. Intell. Robot. Syst., vol. 104, no. 2, p. 22, Jan. 2022.
[27] C. Budaciu, N. Botezatu, M. Kloetzer, and A. Burlacu, “On the evaluation of the crazyflie modular quadcopter system,” in Proc. 24th IEEE Int. Conf. Emerg. Technol. Factory Autom. (ETFA), Sep. 2019, pp. 1189–1195.
[28] O. H. Zekry, T. Attia, A. T. Hafez, and M. M. Ashry, “PID trajectory tracking control of crazyflie nanoquadcopter based on genetic algorithm,” in Proc. IEEE Aerosp. Conf., Mar. 2023, pp. 1–8.
[29] NVIDIA. (2023). NVIDIA Jetson TX2. Accessed: Jul. 30, 2023. [Online]. Available: [Link]
[30] Intel. (2020). Intel Atom X7-E3950. Accessed: Jul. 30, 2023. [Online]. Available: [Link]en/products/sku/96488/intel-atom-x7e3950-processor-2m-cache-up-to-2-00-ghz/[Link]
[31] T. Jia et al., “A 12 nm agile-designed SoC for swarm-based perception with heterogeneous IP blocks, a reconfigurable memory hierarchy, and an 800 MHz multi-plane NoC,” in Proc. IEEE 48th Eur. Solid State Circuits Conf. (ESSCIRC), Sep. 2022, pp. 269–272.
[32] C.-H. Lin et al., “7.1 A 3.4-to-13.3 TOPS/W 3.6 TOPS dual-core deep-learning accelerator for versatile AI applications in 7 nm 5G smartphone SoC,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2020, pp. 134–136.
[33] C. Schmidt et al., “An eight-core 1.44-GHz RISC-V vector processor in 16-nm FinFET,” IEEE J. Solid-State Circuits, vol. 57, no. 1, pp. 140–152, Jan. 2022.
[34] A. Ottaviano, T. Benz, P. Scheffler, and L. Benini, “Cheshire: A lightweight, Linux-capable RISC-V host platform for domain-specific accelerator plug-in,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 70, no. 10, pp. 3777–3781, Oct. 2023.
[35] Etron. (2022). 256 MB High Bandwidth RPC DRAM. [Online]. Available: [Link]GA16LGDABMACAEA-RPC-DRAM_Rev.-[Link]
[36] Bitcraze. (2023). Crazyflie. Accessed: Jul. 30, 2023. [Online]. Available: [Link]
[37] G. Shi, W. Hönig, Y. Yue, and S.-J. Chung, “Neural-swarm: Decentralized close-proximity multirotor control using learned interactions,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2020, pp. 3241–3247.
[38] F. Candan, A. Beke, and T. Kumbasar, “Design and deployment of fuzzy PID controllers to the nano quadcopter crazyflie 2.0,” in Proc. Innov. Intell. Syst. Appl. (INISTA), Jul. 2018, pp. 1–6.
[39] B. Nassi, R. Bitton, R. Masuoka, A. Shabtai, and Y. Elovici, “SoK: Security and privacy in the age of commercial drones,” in Proc. IEEE Symp. Secur. Privacy (SP), May 2021, pp. 1434–1451.
[40] J.-H. Yoon and A. Raychowdhury, “31.1 A 65 nm 8.79 TOPS/W 23.82 mW mixed-signal oscillator-based NeuroSLAM accelerator for applications in edge robotics,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2020, pp. 478–480.
[41] Bitcraze. (2023). AI-Deck. Accessed: Jul. 30, 2023. [Online]. Available: [Link]
[42] G. Gallego et al., “Event-based vision: A survey,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 154–180, Jan. 2022.
[43] ROS. (2022). Robot Operating System. Accessed: Jul. 30, 2023. [Online]. Available: [Link]
[44] OpenHW. (2023). CVA6. [Online]. Available: [Link]/openhwgroup/cva6
[45] Y. Yarom. (2016). Mastik: A Micro-Architectural Side-Channel Toolkit. Accessed: Jul. 30, 2023. [Online]. Available: [Link]
[46] Q. Ge, “Principled elimination of microarchitectural timing channels through operating-system enforced time protection,” Ph.D. dissertation, Dept. School Comput. Sci. Eng., UNSW, Sydney, NSW, Australia, 2019.
[47] Sel4. (2023). Timing Channel Benchmarking Tool. [Online]. Available: [Link]
[48] ARM. (2022). AMBA AXI Protocol Specification. [Online]. Available: [Link]
[49] A. Burrello, A. Garofalo, N. Bruschi, G. Tagliavini, D. Rossi, and F. Conti, “DORY: Automatic end-to-end deployment of real-world DNNs on low-cost IoT MCUs,” IEEE Trans. Comput., vol. 70, no. 8, pp. 1253–1268, Aug. 2021.
[50] A. Nadalini et al., “A 3 TOPS/W RISC-V parallel cluster for inference of fine-grain mixed-precision quantized neural networks,” in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI), Jun. 2023, pp. 1–6.
[51] OpenHW. (2023). CV32E40P. Accessed: Jul. 30, 2023. [Online]. Available: [Link]
[52] A. Kurth, B. Forsberg, and L. Benini, “HEROv2: Full-stack open-source research platform for heterogeneous computing,” IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 12, pp. 4368–4382, Dec. 2022, doi: 10.1109/TPDS.2022.3189390.
[53] L. Valente et al., “HULK-V: A heterogeneous ultra-low-power Linux capable RISC-V SoC,” in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Apr. 2023, pp. 1–6.
[54] A. G. Howard et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” 2017, arXiv:1704.04861.
[55] Z. Dong, Z. Yao, A. Gholami, M. W. Mahoney, and K. Keutzer, “HAWQ: Hessian aware quantization of neural networks with mixed-precision,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct./Nov. 2019, pp. 293–302.
[56] P. Platform. (2023). TransLib. Accessed: Jul. 30, 2023. [Online]. Available: [Link]
[57] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2015, arXiv:1512.03385.
[58] G. M. Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” in Proc. Spring Joint Comput. Conf. (AFIPS). New York, NY, USA: Association for Computing Machinery, Apr. 1967, pp. 483–485, doi: 10.1145/1465482.1465560.

Luca Valente received the [Link]. degree in electronic engineering from the Polytechnic University of Turin in 2020. He is currently pursuing the Ph.D. degree with the Department of Electrical, Electronic and Information Technologies Engineering (DEI), University of Bologna. His main research interests are hardware-software co-design of multi-processor heterogeneous systems on chip, parallel programming, and FPGA prototyping.

Alessandro Nadalini received the [Link]. and [Link]. degrees in electronic engineering from the University of Bologna, Bologna, Italy, in 2018 and 2021, respectively. He currently holds a research grant from the University of Bologna. His research regards lightweight extensions to the RISC-V ISA to boost the efficiency of heavily quantized neural network inference on microcontroller-class cores. He received the Mukherjee Best Paper Award of the 2023 IEEE Computer Society Annual Symposium on VLSI.

Asif Hussain Chiralil Veeran received the B.S. degree in electronics and communication engineering from the University of Calicut and the M.S. degree in VLSI design from Anna University, India. He is a Researcher in electrical engineering and computer science (EECS) with Khalifa University. His research interests encompass VLSI, developing efficient and effective methodologies, designing and optimizing the layout of ICs, ensuring their functionality, performance, and manufacturability.

Simone Benatti received the Ph.D. degree in electrical engineering and computer science from the University of Bologna, Bologna, Italy, in 2016. He has collaborated with several international research institutes and companies. Previously, he was an electronic designer and a research and development engineer of electromedical devices for eight years. He has authored or coauthored more than 50 papers in international peer-reviewed conferences and journals. His research interests focus on energy-efficient embedded wearable systems, signal processing, sensor fusion, and actuation systems.

Rafail Psiakis received the bachelor’s and M.S. joint diploma degree from the Department of Electrical and Computer Engineering, University of Patras, Greece, in 2015, and the Ph.D. degree from the University of Rennes, France, in 2018. Currently, he is a Lead Silicon Security Researcher with the Technology Innovation Institute, Abu Dhabi, United Arab Emirates. Previously, he was an embedded research and development security engineer and a research assistant for five years. His research interests include embedded systems, security, confidential computing, computer architecture, RISC-V, and fault tolerance.

Ari Kulmala received the Ph.D. degree from the Tampere University of Technology (TUT) in 2009. Currently, he is leading the SoC development with the Technology Innovation Institute (TII), Secure Systems Research Center, Abu Dhabi, and holds the position of a Professor of practice with TUT. His experience in system-on-chip design ranges from small power mobile devices to large-scale processing infrastructure devices and datacenter applications.

Baker Mohammad (Senior Member, IEEE) received the B.S. degree in ECE from the University of New Mexico, Albuquerque, the M.S. degree in ECE from Arizona State University, Tempe, and the Ph.D. degree in electrical and computer engineering (ECE) from The University of Texas at Austin. He is currently a Professor of electrical engineering and computer science (EECS) with Khalifa University and the Director of the SOCL. He has authored or coauthored over 200 refereed journals and conference proceedings, more than three books, and over 20 U.S. patents. He is a distinguished lecturer of IEEE CAS.

Sandro Pinto received the Ph.D. degree in electronics and computer engineering. He is an Associate Research Professor with the University of Minho, Portugal. He has a deep academic background and several years of industry collaboration focusing on operating systems, virtualization, and security for embedded, CPS, and IoT systems. He has published more than 80 peer-reviewed articles and is a skilled presenter with speaking experience in several academic and industrial conferences.