New non-volatile storage system #77929

@rghaddab

Description

Introduction

In recent years, advances in process nodes for embedded hardware have made it necessary to support non-volatile memory technologies different from the classical on-chip NOR flash, which is written in words but erased in pages. These new technologies do not require a separate erase operation at all: data can be overwritten directly at any time.
On top of that, firmware complexity has kept growing, making it necessary to ensure that a solid, scalable storage mechanism is available to all applications. This storage needs to support millions of entries with solid CRC protection and multiple advanced features.

Problem description

In Zephyr, there are currently a few alternatives for non-volatile memory storage:

  • NVS: Basic ID-based storage, but optimized for devices with page erase
  • LittleFS: Full file system, optimized for devices with page erase
  • FCB: Very little use, extremely bare-bones storage

None of them is optimal for the current wave of solid-state non-volatile memory technologies, including resistive (RRAM) and magnetic (MRAM) random-access memory, because they all rely on the "page erase" abstraction, whereas these devices do not require an erase operation at all and data can be overwritten directly.
Additionally, none of the storage systems above is a good match for the widely used settings subsystem, given that they were never designed to operate as a backend for it.

The closest one is NVS, and an analysis of why it is not suitable can be found in the Alternatives section of this issue.

Proposed change

Create a new storage mechanism that fulfills the following requirements:

  • Simple Key-Value Storage (i.e. no file/folder abstractions)
  • 32-bit IDs
  • Support for entries in multiple formats
  • CRC-24 for entries that require it
  • No limit on value length
  • Metadata entries can also store small (1 to 4 bytes) data entries
  • Optimized for larger write block sizes (e.g. 16 bytes)
  • Support for no-erase-required flash drivers (i.e. RRAM, MRAM, etc)
  • Designed from the start to be efficient when used as a backend for the settings subsystem
  • Designed from the start to be able to serve as a backend of the Secure Storage subsystem (link)
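
To make these requirements more concrete, the sketch below shows one possible shape for the public API. It is purely illustrative: the type and function names (zms_fs, zms_mount, zms_write, ...) are assumptions modeled on the existing NVS API, not a committed interface.

/* Illustrative API sketch only; names and signatures are assumptions. */
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

struct device;                 /* Zephyr flash device, forward declaration */

struct zms_fs {
	off_t offset;          /* start offset of the storage area in flash */
	uint32_t sector_size;  /* logical sector size in bytes */
	uint32_t sector_count; /* number of sectors (minimum 2) */
	const struct device *flash_device;
	/* ... internal state: write pointers, cycle counters, optional cache ... */
};

int zms_mount(struct zms_fs *fs);
ssize_t zms_write(struct zms_fs *fs, uint32_t id, const void *data, size_t len);
ssize_t zms_read(struct zms_fs *fs, uint32_t id, void *data, size_t len);
ssize_t zms_read_hist(struct zms_fs *fs, uint32_t id, void *data, size_t len,
		      uint32_t cnt);
int zms_delete(struct zms_fs *fs, uint32_t id);
ssize_t zms_free_space(struct zms_fs *fs);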

Potential names

  • ZMS: Zephyr Memory Storage
  • NVMS: Non-Volatile Memory Storage
  • IDVS: ID Value Storage
  • ZKVS: Zephyr Key-Value Storage

Detailed RFC

Proposed change (Detailed)

General behavior:

ZMS divides the memory space into sectors (minimum 2). Each sector is filled with key/value pairs until it is full, at which point it is closed and the storage system moves on to the next sector. When the last sector has been used, ZMS wraps around to the first sector again, after garbage collecting it and erasing its content.
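
As a rough illustration of this rotation, the write path conceptually does the following (all helper names are hypothetical, not actual implementation code):

/* Conceptual sketch of the sector rotation performed on a write. */
if (!sector_has_room(fs, fs->current_sector, ate_size + data_size)) {
	close_sector(fs, fs->current_sector);          /* write the close ATE */
	fs->current_sector = next_sector(fs, fs->current_sector);
	/* Prepare the sector after the newly opened one: recover its
	 * still-valid entries (garbage collection), then erase it.
	 */
	garbage_collect(fs, next_sector(fs, fs->current_sector));
	erase_sector(fs, next_sector(fs, fs->current_sector));
}
append_entry(fs, id, data, data_size);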

Mounting the FS:

Mounting the file system starts by getting the flash parameters and checking that the file system properties are correct (sector_size, sector_count, ...), and then initializes the file system.
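
As an example of what mounting could look like from application code, assuming the hypothetical zms_mount()/zms_fs API sketched earlier and the storage_partition fixed partition commonly used in Zephyr samples:

#include <errno.h>
#include <zephyr/device.h>
#include <zephyr/storage/flash_map.h>

static struct zms_fs fs;       /* hypothetical file system descriptor */

static int storage_init(void)
{
	/* Flash parameters come from the fixed partition in the devicetree. */
	fs.flash_device = FIXED_PARTITION_DEVICE(storage_partition);
	if (!device_is_ready(fs.flash_device)) {
		return -ENODEV;
	}

	fs.offset = FIXED_PARTITION_OFFSET(storage_partition);
	/* On erase-less devices the sector size is a purely logical choice. */
	fs.sector_size = 1024;
	fs.sector_count = FIXED_PARTITION_SIZE(storage_partition) / fs.sector_size;

	/* Checks sector_size/sector_count against the flash parameters,
	 * then initializes the file system.
	 */
	return zms_mount(&fs);
}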

Initialization of ZMS:

As ZMS uses a fast-forward write mechanism, it must find the last sector and the pointer to the entry where it stopped the last time.
It looks for a closed sector followed by an open one, then recovers the last written ATE (Allocation Table Entry) within that open sector.
After that, it checks that the sector following the open one is empty, and erases it if it is not.
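
In pseudocode, this recovery step could look roughly like the following (all helper names are illustrative):

/* Illustrative recovery sketch: find where the previous session stopped. */
static int zms_recover(struct zms_fs *fs)
{
	uint32_t s;

	/* 1. Locate a closed sector that is immediately followed by an open
	 *    one. (A completely fresh storage area is handled separately.)
	 */
	for (s = 0; s < fs->sector_count; s++) {
		if (sector_is_closed(fs, s) && !sector_is_closed(fs, next(fs, s))) {
			fs->current_sector = next(fs, s);
			break;
		}
	}

	/* 2. Within the open sector, recover the last written ATE, i.e. the
	 *    current ATE and data write positions.
	 */
	find_last_ate(fs, fs->current_sector);

	/* 3. The sector after the open one must be empty; erase it otherwise. */
	if (!sector_is_empty(fs, next(fs, fs->current_sector))) {
		erase_sector(fs, next(fs, fs->current_sector));
	}

	return 0;
}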

Composition of a sector

A sector is organized as follows:

Sector N
data0
data1
...
...
ate1
ate0
gc_done
empty_ate
close_ate

  • Close ATE: used to close a sector when it is full
  • Empty ATE: used to erase a sector
  • ATEn: entries that describe where the data is stored, its size, and its CRC32
  • Data: the written value

ZMS key/value write:

To avoid rewriting the same data with the same ID, ZMS first looks through all sectors for an existing entry with the same ID and compares its data; if the data is identical, no write is performed.
If a write must be performed, an ATE and the data (unless the operation is a delete) are written to the current sector.
If the sector is full (cannot hold the current data + ATE), ZMS moves to the next sector, garbage collects the sector after the newly opened one, and then erases it.
Data whose size is smaller than or equal to 4 bytes is written within the ATE itself.
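
A simplified sketch of this write path (hypothetical helpers; the sector rotation itself is the one sketched under "General behavior"):

/* Simplified write-path sketch; helper names are illustrative. */
ssize_t zms_write(struct zms_fs *fs, uint32_t id, const void *data, size_t len)
{
	struct zms_ate latest;

	/* 1. Skip the write if the latest entry for this ID holds identical data. */
	if (find_latest_ate(fs, id, &latest) == 0 &&
	    data_is_identical(fs, &latest, data, len)) {
		return len;
	}

	/* 2. Rotate to the next sector (and garbage collect/erase the one
	 *    after it) if the current sector cannot hold the data plus its ATE.
	 */
	if (!fits_in_current_sector(fs, len)) {
		rotate_and_gc(fs);
	}

	/* 3. Small values (<= 4 bytes, as proposed) are stored inside the ATE
	 *    itself; larger values get a data record plus an ATE pointing to it.
	 */
	if (len <= 4) {
		write_ate_with_inline_data(fs, id, data, len);
	} else {
		write_data_then_ate(fs, id, data, len);
	}

	return len;
}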

ZMS read (with history):

By default, ZMS looks for the last entry written with the requested ID and retrieves its data.
If a history count different from 0 is provided, older data with the same ID is retrieved instead.
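
Using the hypothetical read API from the earlier sketch, this would look like:

uint8_t value[16];
ssize_t rc;

/* Latest value written for ID 0x1234. */
rc = zms_read(&fs, 0x1234, value, sizeof(value));

/* Value written two writes before the latest one (history count = 2). */
rc = zms_read_hist(&fs, 0x1234, value, sizeof(value), 2);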

ZMS: how does the cycle counter work?

Each sector has a lead cycle counter, a uint8_t, that is used to validate all the other ATEs.
The lead cycle counter is stored in the empty ATE.
To be valid, an ATE must have the same cycle counter as the one stored in the empty ATE.
Each time an ATE is moved from one sector to another, it takes the cycle counter of the destination sector.
To erase a sector, the cycle counter of its empty ATE is incremented; all the ATEs in that sector then become invalid.
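
Conceptually, ATE validation and sector invalidation then come down to something like this (sketch only; read_empty_ate() and write_empty_ate() are illustrative helpers):

/* An ATE is valid only if its cycle counter matches the lead cycle counter
 * stored in the sector's empty ATE.
 */
static bool zms_ate_valid(const struct zms_fs *fs, uint32_t sector,
			  const struct zms_ate *ate)
{
	return ate->cycle_cnt == read_empty_ate(fs, sector).cycle_cnt;
}

/* "Erasing" a sector without a physical erase: incrementing the lead cycle
 * counter invalidates every ATE currently stored in that sector.
 */
static void zms_sector_invalidate(struct zms_fs *fs, uint32_t sector)
{
	uint8_t next_cnt = read_empty_ate(fs, sector).cycle_cnt + 1;

	write_empty_ate(fs, sector, next_cnt);
}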

ZMS: how to close a sector?

To close a sector, a close ATE is added at the end of the sector; it must have the same cycle counter as the empty ATE.
When closing a sector, all the remaining unused space is filled with garbage data to avoid leaving old ATEs with a valid cycle counter.

ZMS: triggering the garbage collector

Some applications need storage writes to have a defined maximum latency.
When calling a ZMS write, the current sector could be almost full, requiring the GC to be triggered in order to switch to the next sector.
This operation is time consuming and could cause some applications to miss their real-time constraints.
ZMS therefore adds an API that lets the application query the remaining free space in the current sector. The application can then decide, when needed, to switch to the next sector if the current one is almost full, which of course triggers garbage collection of the next sector. This guarantees that the application's next write will not trigger garbage collection.
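
A latency-sensitive application could use such an API along the following lines. zms_active_sector_free_space(), zms_sector_use_next() and WORST_CASE_ENTRY_SIZE are assumed names used only for illustration:

/* Sketch: keep garbage collection out of the time-critical write path. */
#define WORST_CASE_ENTRY_SIZE 64   /* largest data + ATE the app ever writes */

void storage_maintenance(void)
{
	/* Called from a non time-critical context, e.g. an idle work item. */
	ssize_t free_bytes = zms_active_sector_free_space(&fs);

	if (free_bytes >= 0 && free_bytes < WORST_CASE_ENTRY_SIZE) {
		/* Switch sectors now, so that garbage collection of the next
		 * sector happens here rather than inside the next zms_write().
		 */
		zms_sector_use_next(&fs);
	}
}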

ZMS structure of ATE (Allocation Table Entries)

An entry occupies 16 bytes, divided among the following fields:

struct zms_ate {
	uint32_t id;       /* data id */
	uint16_t len;      /* data len within sector */
	union {
		struct {
			uint32_t offset;       /* data offset within sector */
			union {
				uint32_t data_crc; /* crc for data */
				uint32_t metadata; /* used to store metadata information
						    * such as storage version
						    */
			};
		};
		uint8_t data[8];   /* used to store small size data */
	};
	uint8_t cycle_cnt; /* cycle counter for non erasable devices */
	uint8_t crc8;      /* crc8 check of the entry */
} __packed;
  • id has 32 bits
  • sector size is now 32 bits, which is why offset is also 32 bits => this allows defining large sectors if needed
  • length of data is 16 bits (could be changed to 32 bits in the future), which can describe data up to 64 KB
  • data_crc/data is a field that can store small data (<= 4 bytes) or the CRC32 of larger data
  • cycle_cnt is used to validate an ATE within a sector
  • crc8 is the crc of the ATE (could be changed to crc24 in the future)
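
As a small sanity check on this layout, and to illustrate how the crc8 field can be used: crc8_ccitt() is the existing helper from Zephyr's <zephyr/sys/crc.h>, while the exact CRC coverage (everything except the crc8 byte itself) is an assumption of this sketch.

#include <stdbool.h>
#include <stddef.h>
#include <zephyr/sys/crc.h>
#include <zephyr/toolchain.h>

/* The packed entry must stay exactly 16 bytes. */
BUILD_ASSERT(sizeof(struct zms_ate) == 16, "zms_ate must be 16 bytes");

/* Sketch: validate an ATE by computing the CRC8 over every field except the
 * crc8 byte itself and comparing it with the stored value.
 */
static bool zms_ate_crc8_ok(const struct zms_ate *ate)
{
	uint8_t crc = crc8_ccitt(0xff, ate, offsetof(struct zms_ate, crc8));

	return crc == ate->crc8;
}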

ZMS wear leveling feature

This storage system is optimized for devices that do not require an erase operation.
Storage systems that rely on an erase value (NVS, for example) need to emulate the erase with write operations. This causes a significant decrease in the life expectancy of these devices and adds delays to write operations and to initialization.
ZMS introduces a cycle counter mechanism that avoids emulating erase operations on these devices.
It also guarantees that every memory location is written only once per sector write cycle.

Dependencies

Only on flash drivers.

Concerns and Unresolved Questions

The first draft of this new storage system will not include all the features listed in the proposed change section.
This is intended to minimize the review effort for developers who are familiar with the NVS filesystem.
More changes will come in future patch sets.

Alternatives

The main alternative we considered was to expand the existing NVS codebase in order to remove the shortcomings described above. This is in fact how this new proposal was born, once expanding NVS was identified as suboptimal.

Among other issues, we identified the following:

  • NVS was never designed for devices that do not have an erase operation available
  • NVS limits the maximum value length to the size of a sector (32 KB)
  • NVS was designed to be simple and compact, so extending it is not necessarily a good option
  • NVS performs poorly when the storage mechanism gets close to being full
  • Slow Garbage Collector that can go through all sectors for a single write operation
  • Switching to the next sector is a time consuming operation
  • NVS was not designed to be used as a backend for the settings subsystem, causing latency (up to minutes) and other issues

More info in these Pull Requests:
