Armv8.5-A Memory Tagging Extension (MTE) Explained

Abstract

The 1988 Internet worm took down about one tenth of the early network and significantly slowed the rest [1]. More than 30 years later, two of the most significant classes of security vulnerability in code written in C-like languages are still memory-safety violations.

According to a 2019 BlueHat presentation, 70% of security issues fixed in Microsoft products resulted from memory-safety violations [2]. Google reported similar data for Android, with over 75% of vulnerabilities being memory-safety related [3]. Although many of these violations are impossible in newer languages, the installed base of code written in C and C++ is large; for example, Debian Linux alone contains more than 500 million lines of code [4].

This article introduces Armv8.5-A Memory Tagging Extension (MTE). MTE aims to improve memory safety for code written in unsafe languages without changing source code and, in some cases, without recompilation. Easily deployable detection and mitigation for memory-safety violations can prevent a large class of exploits.

Introduction

Memory-safety violations fall into two broad categories: spatial safety and temporal safety.

Exploitable violations are often the first stage of an attack, intended to deliver payloads or chain with other vulnerabilities to gain control of a system or exfiltrate privileged information.

Spatial safety is violated when an access goes outside an object's true bounds, for example when a buffer on the stack overflows. This can be exploited to overwrite a function return address and form the basis of several attack types.

Temporal safety is violated when a reference to an object is used after the object has been deallocated or repurposed, for example when malicious data overwrites an object containing a function pointer. Such corruption can also form the basis of many attacks.

MTE provides a mechanism to detect both major classes of memory-safety violations. MTE helps find potential defects prior to deployment by increasing the effectiveness of testing and fuzzing, and it can also assist large-scale detection after deployment.

Fuzzing is a software testing technique that supplies random, invalid, or unexpected data to software to detect crashes and unexpected behavior. It helps reveal vulnerabilities and stability issues.

With careful software design, sequential violations, meaning those that access memory immediately before or after an object's true bounds, can always be detected. Wild violations, which can land anywhere in the address space, are detected probabilistically.

Locating and fixing bugs before deployment reduces the attack surface of deployed code. Large-scale detection after deployment supports reactive remediation before a vulnerability is widely exploited. Research into cybercrime economics [5] shows that cybercrime is highly sensitive to scale, so timely detection and reactive fixes can be effective at disrupting large-scale abuse.

Threat Model

MTE is intended to provide robustness against attacks that try to subvert code by supplying malicious data. It does not address algorithmic bugs or malware.

MTE is designed to detect memory-safety violations and increase robustness against attacks exploiting such violations. In dynamically linked systems, legacy code benefits from MTE heap tagging without recompilation.

Applying MTE to the stack requires recompilation. The MTE architecture assumes stack pointers are trusted. Therefore, when deploying MTE for the stack, combining it with other features such as Branch Target Identification (BTI) and Pointer Authentication Codes (PAC) is important to reduce the likelihood that gadgets exist which let an attacker control the stack pointer.

MTE Memory Safety

Arm memory tagging implements locks and keys on memory accesses. Memory locations can be tagged with a lock value, and pointers carry a key. Access is allowed if the key matches the lock; otherwise an error is reported.

Memory is tagged by adding 4 bits of metadata for every 16 bytes of physical memory. Each 16-byte unit is a tag granule. Tagged memory realizes the lock concept.
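The granule arithmetic can be made concrete with a small sketch. The two constants come from the text above; the function names are illustrative only and do not correspond to any Arm or operating system interface.

```python
GRANULE_BYTES = 16   # one tag covers 16 bytes of memory
TAG_BITS = 4         # each tag is 4 bits wide


def granule_index(address: int) -> int:
    """Index of the tag granule that contains a byte address."""
    return address // GRANULE_BYTES


def granules_spanned(address: int, size: int) -> int:
    """Number of granules touched by an access of `size` bytes."""
    if size <= 0:
        return 0
    first = granule_index(address)
    last = granule_index(address + size - 1)
    return last - first + 1


def tag_storage_bytes(memory_bytes: int) -> float:
    """Tag storage needed for a region, in bytes (4 bits per 16 bytes)."""
    return memory_bytes / GRANULE_BYTES * TAG_BITS / 8
```

With these numbers, tag storage amounts to 1/32 of the tagged memory, roughly 3.1%.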

Pointers and virtual addresses are extended to contain the key.

To provide key bits without increasing pointer width, MTE uses the Top Byte Ignore (TBI) feature of the Armv8-A architecture. When TBI is enabled, the top byte of a virtual address is ignored for address translation. This allows the top byte to store metadata. In MTE, 4 bits of the top byte provide the key.
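As a rough illustration, the key can be modeled as a 4-bit field inside a 64-bit integer. The sketch assumes the key occupies bits [59:56] of the pointer, which is where the architecture places the logical address tag; the helper names are hypothetical, and the stripping helper is only meaningful for user-space addresses, where the bits below the top byte already form the full address.

```python
TAG_SHIFT = 56                      # assumption: key lives in bits [59:56]
TAG_MASK = 0xF << TAG_SHIFT
TOP_BYTE_MASK = 0xFF << 56
MASK64 = (1 << 64) - 1


def set_key(pointer: int, key: int) -> int:
    """Return the pointer with its 4-bit key replaced."""
    if not 0 <= key <= 0xF:
        raise ValueError("key must fit in 4 bits")
    return ((pointer & ~TAG_MASK) | (key << TAG_SHIFT)) & MASK64


def get_key(pointer: int) -> int:
    """Extract the 4-bit key from the top byte."""
    return (pointer & TAG_MASK) >> TAG_SHIFT


def strip_top_byte(pointer: int) -> int:
    """Clear the whole top byte, as translation does when TBI is enabled."""
    return pointer & ~TOP_BYTE_MASK & MASK64
```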

MTE relies on differences between locks and keys to detect memory-safety violations.
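The lock-and-key comparison can be sketched as a toy software model. This is not how hardware stores tags; it simply assumes a dictionary from granule index to lock, treats untagged granules as lock 0, and assumes the key sits in bits [59:56] of the pointer. All class and method names are invented for illustration.

```python
GRANULE_BYTES = 16
TAG_SHIFT = 56


class TagMismatch(Exception):
    """Raised by the toy model when key and lock differ."""


class TaggedMemory:
    def __init__(self):
        # granule index -> 4-bit lock; missing entries are treated as lock 0
        self.locks = {}

    def set_lock(self, address: int, size: int, lock: int) -> None:
        """Tag every granule overlapping [address, address + size)."""
        first = address // GRANULE_BYTES
        last = (address + size - 1) // GRANULE_BYTES
        for granule in range(first, last + 1):
            self.locks[granule] = lock & 0xF

    def check(self, pointer: int) -> int:
        """Compare the pointer's key with the lock; return the bare address."""
        key = (pointer >> TAG_SHIFT) & 0xF
        address = pointer & ((1 << TAG_SHIFT) - 1)
        lock = self.locks.get(address // GRANULE_BYTES, 0)
        if key != lock:
            raise TagMismatch(f"key {key:#x} != lock {lock:#x} at {address:#x}")
        return address
```

In this model, a pointer with key 3 can reach any byte of a 32-byte object locked with 3, but the first byte past the object falls in a granule with a different lock and the check fails.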

Because the number of available tag values is limited, two allocations cannot be guaranteed to have different tags for any particular execution. However, allocators can ensure sequential allocations use different tags, which detects the most common violation types.
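One way an allocator might keep neighbouring allocations distinct is to exclude the tags of the blocks on either side when choosing a new tag. The following is a minimal sketch under that assumption; `pick_tag` is a hypothetical helper, not part of any real allocator.

```python
import random


def pick_tag(left_neighbor, right_neighbor, rng=random):
    """Choose a 4-bit tag that differs from both adjacent allocations.

    A neighbor of None means there is no allocation on that side.
    """
    excluded = {left_neighbor, right_neighbor}
    candidates = [tag for tag in range(16) if tag not in excluded]
    return rng.choice(candidates)
```

Because at most two of the 16 values are excluded, at least 14 candidates always remain, so a sequential overflow or underflow into a neighbour is always caught in this scheme.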

More generally, MTE supports random tag generation and seed-based pseudo-random tag generation. With enough program executions, the probability that at least one execution detects a violation approaches 100%.
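The convergence toward 100% can be estimated with a simple model. It assumes tags are drawn uniformly and independently from all 16 values on each run, so a wild access escapes detection in a single run with probability 1/16; real deployments that reserve or exclude tags will have slightly different numbers.

```python
def detection_probability(runs: int, usable_tags: int = 16) -> float:
    """Probability that at least one of `runs` executions detects a violation.

    Assumes each run misses independently with probability 1 / usable_tags.
    """
    miss_per_run = 1 / usable_tags
    return 1 - miss_per_run ** runs
```

Under these assumptions a single run detects the violation about 94% of the time, and two independent runs push that above 99.6%.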

Architecture Details

MTE adds a new memory type to the Arm architecture: Normal Tagged Memory.

With a few exceptions, loads and stores to this memory type whose safety cannot be determined statically are tag-checked: the tag in the top byte of the address register is compared with the tag stored in memory. A mismatch can be configured either to raise a synchronous exception or to be reported asynchronously.

When mismatches are configured to be reported asynchronously, details accumulate in a system register. A control ensures that this register is updated on entry to software running at a higher exception level. This lets the operating system kernel isolate mismatches to specific threads of execution and make decisions based on that information.

Synchronous exceptions are precise: the load or store instruction that caused the mismatch can be identified exactly. Asynchronous reporting is imprecise because it can only isolate the mismatch to a particular thread of execution.
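The difference between the two reporting modes can be sketched with a toy model. The names below are invented for illustration: a synchronous mismatch raises an exception that carries the faulting program counter, while an asynchronous mismatch only sets a sticky per-thread flag that a hypothetical kernel reads and clears on entry.

```python
from dataclasses import dataclass


class SyncTagFault(Exception):
    """Precise report: carries the address of the faulting instruction."""

    def __init__(self, pc: int):
        super().__init__(f"tag mismatch at pc={pc:#x}")
        self.pc = pc


@dataclass
class ToyThread:
    mode: str                       # "sync" or "async"
    pending_mismatch: bool = False  # stands in for the system register


def tag_checked_access(thread: ToyThread, key: int, lock: int, pc: int) -> None:
    """Model one load or store subject to a tag check."""
    if key == lock:
        return
    if thread.mode == "sync":
        raise SyncTagFault(pc)      # the exact instruction is known
    thread.pending_mismatch = True  # only "something mismatched" is recorded


def on_kernel_entry(thread: ToyThread) -> bool:
    """Read and clear the accumulated flag, attributing it to this thread."""
    seen = thread.pending_mismatch
    thread.pending_mismatch = False
    return seen
```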

MTE adds the following instructions to the Armv8-A architecture, grouped into three categories:

Tag operation instructions for stack and heap tagging

IRG: To make the statistical basis of MTE effective, a source for random tags is required. IRG provides such a tag from hardware and inserts it into a register for use by other instructions.

GMI: This instruction manipulates excluded tag sets to be used with IRG. This is useful when software reserves specific tag values for special purposes while preserving random-tag behavior for normal allocations.

LDG, STG, STZG: These instructions get or set tags in memory. They change memory tags efficiently, either leaving the data unmodified (LDG, STG) or zeroing it (STZG).

ST2G and STZ2G: Denser alternatives to STG and STZG that operate on two memory granules when allocation sizes permit.

STGP: Stores both a tag and a pair of data registers to memory in a single instruction.
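The way GMI and IRG cooperate can be approximated in software. This sketch assumes a 16-bit exclusion mask with one bit per tag value and a key in bits [59:56]; it uses a software random source in place of hardware, does not model any system-level exclusion settings, and falls back to tag 0 when every tag is excluded, which is an assumption of the model.

```python
import random

TAG_SHIFT = 56
TAG_MASK = 0xF << TAG_SHIFT


def gmi(pointer: int, exclude_mask: int) -> int:
    """Model of GMI: add the pointer's tag to the exclusion mask."""
    tag = (pointer >> TAG_SHIFT) & 0xF
    return exclude_mask | (1 << tag)


def irg(pointer: int, exclude_mask: int = 0, rng=random) -> int:
    """Model of IRG: insert a random tag that is not in the exclusion mask."""
    allowed = [tag for tag in range(16) if not (exclude_mask >> tag) & 1]
    tag = rng.choice(allowed) if allowed else 0  # assumption: fall back to 0
    return (pointer & ~TAG_MASK) | (tag << TAG_SHIFT)
```

For example, software that reserves tag 0 and wants a fresh tag different from an existing pointer's tag could start from a mask of `0b1`, add the existing tag with `gmi`, and pass the result to `irg`.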

Pointer arithmetic and stack-tagging instructions

ADDG and SUBG: Variants of ADD and SUB intended for address arithmetic. They allow the tag and address to be modified independently by an immediate. These instructions create tagged addresses for stack objects.

SUBP(S): Provides 56-bit subtraction with optional flag updates, required for pointer arithmetic while ignoring the top byte tag.
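The arithmetic these instructions perform can be sketched on plain integers. The model below assumes the key is in bits [59:56], requires the ADDG address offset to be a multiple of the 16-byte granule, and ignores any excluded-tag configuration; the function names mirror the mnemonics but are only illustrative, and `subp` returns a signed Python integer rather than a 64-bit register value.

```python
TAG_SHIFT = 56
ADDR_BITS = 56
ADDR_MASK = (1 << ADDR_BITS) - 1


def addg(pointer: int, offset: int, tag_offset: int) -> int:
    """Add `offset` to the address and `tag_offset` to the tag, separately."""
    if offset < 0 or offset % 16:
        raise ValueError("offset must be a non-negative multiple of 16")
    if not 0 <= tag_offset <= 0xF:
        raise ValueError("tag_offset must fit in 4 bits")
    upper = pointer & (0xF << 60)                        # bits [63:60] kept
    tag = ((pointer >> TAG_SHIFT) + tag_offset) & 0xF    # wraps modulo 16
    address = ((pointer & ADDR_MASK) + offset) & ADDR_MASK
    return upper | (tag << TAG_SHIFT) | address


def sign_extend_56(value: int) -> int:
    """Interpret the low 56 bits as a signed quantity."""
    value &= ADDR_MASK
    return value - (1 << ADDR_BITS) if value & (1 << (ADDR_BITS - 1)) else value


def subp(a: int, b: int) -> int:
    """Pointer difference that ignores the top byte of both operands."""
    return sign_extend_56(a) - sign_extend_56(b)
```

Two pointers into the same buffer that carry different tags still yield the correct byte distance, which an ordinary 64-bit subtraction would not.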

System-use instructions

LDGM, STGM, STZGM: Bulk tag operations that are UNDEFINED at EL0. They are intended for system software to use for initialization and serialization, for example when swapping tagged memory to tag-unaware media. The zeroing form, STZGM, can also be used for efficient memory initialization. In addition, MTE provides cache maintenance operations designed for tags, which operate efficiently on entire cache lines.

MTE Large-scale Deployment

Arm anticipates MTE will be deployed in different configurations at different stages of product development and deployment.

Precise checks aim to provide maximum information about failure locations. Imprecise checks aim to provide higher performance.

The operating system kernel can choose to terminate a process on a tag mismatch exception or to record the occurrence and allow the process to continue.

Testing products with MTE enabled can reveal many potential issues. In this stage, it is appropriate to detect and log as much information as possible.

Systems do not need to be protected against attacker actions during testing. Systems may be configured to:

Perform precise checks.
Accumulate tag-mismatch data rather than terminate the process.

This configuration allows maximal information collection to support directed testing and fuzzing to find the most defects.

After product release, MTE may be configured to:

Perform imprecise checks.
Terminate processes on tag mismatches.

This configuration balances performance with detection of memory-safety violations that could enable exploits.

After release, configuring high-value processes (for example, key stores) to perform precise checks can be appropriate so that accurate diagnostics about failure locations can be returned to developers through bug reports and telemetry.

Systems may also adaptively change their MTE configuration.

For example, if a process running with imprecise checks terminates due to a tag check failure, on the next start it might begin with precise checks to collect better diagnostic information. This deployment model blends the performance advantages of imprecise checks with the diagnostic benefits of precise checks to provide better quality feedback.
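The adaptive policy described above could be expressed as a small decision function. Everything here is hypothetical: the enum names, the `high_value` flag, and the choice to return to imprecise checks after a precise run are assumptions made for the sketch, not behavior defined by MTE or by any operating system.

```python
from enum import Enum


class CheckMode(Enum):
    PRECISE = "precise"
    IMPRECISE = "imprecise"


class ExitReason(Enum):
    NORMAL = "normal"
    TAG_FAULT = "tag_fault"
    OTHER_CRASH = "other_crash"


def next_start_mode(last_mode: CheckMode, last_exit: ExitReason,
                    high_value: bool = False) -> CheckMode:
    """Pick the MTE check mode for the next launch of a process."""
    if high_value:
        # e.g. key stores: always collect accurate diagnostics
        return CheckMode.PRECISE
    if last_mode is CheckMode.IMPRECISE and last_exit is ExitReason.TAG_FAULT:
        # rerun with precise checks to locate the failing access
        return CheckMode.PRECISE
    # assumption: otherwise favour performance
    return CheckMode.IMPRECISE
```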

MTE Hardware Deployment

To support MTE in future Arm products, a new version of the AMBA 5 Coherent Hub Interface (CHI) specification is under development to support MTE transport and coherence requirements [7].

Heap Tagging

In dynamically linked systems, tagged heaps can be deployed without changing existing binaries. Only the operating system kernel and C library need modification.

Arm prototyped MTE by adding support to the Linux kernel. Areas requiring modification include:

Removing tags from user-space pointers when used for address-space management.
Making clear_page and copy_page in the virtual memory system tag-aware.
Adding fault handling for tag mismatches, analogous to SIGSEGV handling for translation faults.
Converting memory mappings exposed to user space to use normal tagged memory.
Extending detection and system register configuration to enable the extension.

Arm is contributing Linux kernel support upstream.

In the C library, Arm modified memory-related functions:

malloc
free
calloc
realloc

Memory copy and string functions were also modified to prevent overreads of source buffers.
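To show how tagging in malloc and free can expose both overflows and use-after-free, here is a toy bump allocator. It is a sketch under several assumptions: it never reuses memory, it retags a block on free with a tag different from the old one, it excludes the preceding block's tag on allocation, and it treats untagged memory as tag 0. It does not describe the changes Arm made to any C library.

```python
import random

GRANULE = 16
TAG_SHIFT = 56
ADDR_MASK = (1 << TAG_SHIFT) - 1


class ToyTaggedHeap:
    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.locks = {}          # granule index -> tag
        self.sizes = {}          # block address -> rounded size
        self.next_free = 0x1000

    def _retag(self, address: int, size: int, exclude: set) -> int:
        tag = self.rng.choice([t for t in range(16) if t not in exclude])
        for granule in range(address // GRANULE, (address + size) // GRANULE):
            self.locks[granule] = tag
        return tag

    def malloc(self, size: int) -> int:
        size = -(-size // GRANULE) * GRANULE            # round up to a granule
        address = self.next_free
        self.next_free += size
        previous = self.locks.get(address // GRANULE - 1, 0)
        tag = self._retag(address, size, exclude={previous})
        self.sizes[address] = size
        return (tag << TAG_SHIFT) | address             # tagged pointer

    def free(self, pointer: int) -> None:
        address = pointer & ADDR_MASK
        old_tag = (pointer >> TAG_SHIFT) & 0xF
        self._retag(address, self.sizes.pop(address), exclude={old_tag})

    def access_ok(self, pointer: int) -> bool:
        key = (pointer >> TAG_SHIFT) & 0xF
        address = pointer & ADDR_MASK
        return self.locks.get(address // GRANULE, 0) == key
```

In this model, a pointer stepping one byte past its block lands in a granule tagged for the next block and fails the check, and any pointer kept after `free` no longer matches the retagged memory.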

Stack Tagging

Tagging memory allocated on the stack at runtime requires compiler and kernel support, and therefore recompilation. Many stack-tagging strategies are possible.

One strategy uses IRG to select a random tag for each new stack frame on function entry. The compiler then uses ADDG and SUBG to create tagged addresses for each stack slot, where the tags are offsets from the initial random tag. Bulk tag-storage instructions can initialize stack allocations, but the compiler need not initialize each slot before use.

This strategy ensures MTE's statistical properties hold per function call and guarantees adjacent objects on the stack receive different tags, detecting sequential overflows and underflows.
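The tag assignment in this strategy can be sketched as follows. The random base stands in for the tag IRG would produce on function entry, and each slot's tag is the base plus the slot index modulo 16, which stands in for the tag offsets applied with ADDG. The function name is hypothetical, and a real compiler may choose offsets differently.

```python
import random


def tag_stack_frame(slot_count: int, rng=random) -> list:
    """Return one 4-bit tag per stack slot, derived from a random base tag."""
    base = rng.randrange(16)                       # stands in for IRG
    return [(base + index) % 16 for index in range(slot_count)]
```

Consecutive slots always differ by one modulo 16, so adjacent objects never share a tag even when a frame has more than 16 slots, while the random base changes the whole pattern from call to call.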

Protecting adjacent objects on the stack requires aligning those objects to tagging granules, i.e., 16 bytes. In some programs, this alignment increases stack usage; benchmark analysis shows the increase is generally small.
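The effect of granule alignment on stack usage can be estimated with a short calculation. This sketch rounds each slot up to 16 bytes and, for simplicity, ignores the natural alignment padding an untagged frame would already contain, so it overstates the increase; the slot sizes used in any example are made up.

```python
GRANULE = 16


def align_up(size: int, alignment: int = GRANULE) -> int:
    """Round `size` up to the next multiple of `alignment`."""
    return (size + alignment - 1) // alignment * alignment


def frame_size(slot_sizes, tagged: bool) -> int:
    """Total bytes for the slots, with or without granule alignment."""
    if tagged:
        return sum(align_up(size) for size in slot_sizes)
    return sum(slot_sizes)  # simplification: no natural alignment padding
```

For hypothetical slots of 1, 24, and 40 bytes, the tagged frame needs 96 bytes where the packed sizes sum to 65.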

For performance, MTE allows accesses that use an immediate offset from the stack pointer to go unchecked, because the compiler can verify such accesses statically or emit a diagnostic at compile time.

MTE Optimizations

MTE is designed to improve memory safety without modifying source code. MTE inevitably introduces overhead because tags must be read from and written to the memory system. This overhead depends on allocation size and lifetime and whether tag and data are operated on together or separately. Overheads can be minimized by:

Writing tags and initializing memory together. In many cases memory must be zeroed and tagged, for example when clearing pages before handing them to user space. Arm's Linux-based prototype uses STZGM for this purpose.
Avoiding over-allocation of address space that is never written. When software allocates address space much larger than it uses and only touches a small part before deallocation, MTE can be more expensive because tags may be required even if data is never written.
Avoiding excessive deallocation and reallocation. This is generally good practice, but MTE increases the fixed cost of allocation and deallocation and can amplify existing performance issues.
Avoiding large fixed-size allocations on the stack. Large, fixed stack buffers are often underutilized; for example, PATH_MAX buffers often contain short strings. Avoiding such allocations reduces the number of unused memory tags that must be written and reduces stack protection overhead.
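The cost of an oversized stack buffer can be illustrated by counting the tag granules that must be written. The sketch assumes a PATH_MAX of 4096 bytes, which is the usual value on Linux, and compares it with a buffer sized to a short example path; both the helper name and the example path are illustrative.

```python
GRANULE = 16
PATH_MAX = 4096  # assumption: typical Linux value


def granules_to_tag(buffer_bytes: int) -> int:
    """Number of 16-byte granules whose tags must be written for a buffer."""
    return -(-buffer_bytes // GRANULE)  # ceiling division


fixed_buffer = granules_to_tag(PATH_MAX)
sized_buffer = granules_to_tag(len("/etc/hosts") + 1)  # include the NUL byte
```

Under these assumptions the fixed buffer requires 256 tag writes on every call, while a buffer sized to the short path needs only one.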