Implementing a Heap Across Noncontiguous SRAM on STM32F407

0 Introduction

In embedded system design, the choice of microcontroller depends on the system requirements. A central air-conditioning main control board under development used an STMicroelectronics STM32F407 microcontroller. The STM32F407 family combines high performance with a wide range of on-chip memory and peripherals, which makes it suitable for main-control applications. Its key features include a Cortex-M4 core with an FPU running at up to 168 MHz. When executing from flash at 168 MHz, the device delivers about 210 DMIPS or 566 CoreMark, and the ART accelerator allows zero-wait-state execution from flash. The DSP instructions and the floating-point unit further extend the device's range of applications.

1 IAR EWARM 7.40 C/C++ Compiler: Data Storage Overview

For this project the IAR EWARM 7.40 C/C++ toolchain was used. The compiler and runtime support two different memory allocation mechanisms: static allocation and dynamic allocation.

1.1 Overview

An ARM core can address a contiguous 4 GB (32-bit) memory space, from 0x00000000 to 0xFFFFFFFF. Different physical memories and peripherals are mapped into this address range. Typical applications use read-only memory (ROM) and random-access memory (RAM). Parts of the address range are also occupied by processor control registers and peripherals.

Application data can be stored in memory in three different ways:

(1) Automatic variables

Local variables defined inside functions that are not declared static are automatic variables. Some automatic variables may be placed in processor registers; others reside on the stack. These variables are available only while the function executes and the stack memory is reclaimed when the function returns.

(2) Global variables, module statics, and local variables declared static

Memory for these objects is allocated once and remains allocated for the entire run of the application. In this context, "static" means that the amount of memory allocated for them does not change while the application runs. The ARM core has a single address space, and the compiler supports addressing all of it.

(3) Dynamically allocated data

Applications can allocate data on the heap; that data remains valid until it is explicitly freed by the application. Dynamic allocation is useful when the number of objects is not known before runtime. Note that dynamic allocation carries risks for systems with limited memory or for long-running systems.
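The three storage categories can be contrasted in a short C sketch. The functions twice, next_id and make_table and the variable g_call_count are hypothetical names chosen for illustration; the example only shows where each kind of object lives and how long it lasts.

```c
#include <stdlib.h>

/* Global with static storage duration: allocated once for the whole run. */
static int g_call_count = 0;

/* Automatic variable: 'doubled' exists only while twice() executes. */
int twice(int x)
{
    int doubled = 2 * x;   /* may be kept in a register or on the stack */
    return doubled;
}

/* Local declared static: initialized once, keeps its value between calls. */
int next_id(void)
{
    static int id = 0;
    g_call_count++;
    return ++id;
}

/* Dynamically allocated: size chosen at run time, valid until free(). */
int *make_table(size_t n)
{
    int *t = malloc(n * sizeof *t);
    if (t != NULL) {
        for (size_t i = 0; i < n; i++) {
            t[i] = (int)i;
        }
    }
    return t;
}
```

Successive calls to next_id() return 1, 2, 3 and so on because id has static storage, while each call to twice() gets a fresh doubled. Every table returned by make_table() must eventually be passed to free().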

1.2 Storage of automatic variables and parameters

The C standard calls variables that are defined inside a function and not declared static automatic variables. The compiler places some of them in processor registers and the rest on the stack. Semantically the two are equivalent; the main differences are that register access is faster and that variables held in registers need less memory. Automatic variables exist only while the function executes; their stack memory is released when the function returns.

(1) The stack

The stack can contain local variables and parameters not stored in registers, temporary results of expressions, function return values (except those returned in registers), processor state during interrupts, and registers that must be restored before the function returns. The stack is a fixed memory block divided into two parts: the used portion containing memory allocated for the call chain, and the free portion available for allocation. The boundary between them is the stack top, represented by the stack pointer. Memory is allocated on the stack by moving the stack pointer.

Functions must never hold pointers into the free portion of the stack because an interrupt could cause another function to allocate and modify stack memory.

(2) Advantages

The main advantage of the stack is that different parts of the program can share the same memory area for local data. Unlike the heap, the stack never becomes fragmented and does not suffer from memory leakage. Recursive calls are supported because each call gets its own stack frame.

(3) Potential issues

Because of the way the stack works, a pointer to data in a stack frame must never be used after the function returns. The following function demonstrates a common programming error: it returns a pointer to a local variable that no longer exists once the function has returned.

int *MyFunction(void)
{
    int x;
    /* do something here */
    return &x;   /* incorrect: x no longer exists after MyFunction returns */
}
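Three common ways to avoid this error are sketched below in C: let the caller supply the storage, return the value itself, or give the object static storage. The function names and the value 42 are hypothetical placeholders for whatever MyFunction actually computes.

```c
/* Option 1: the caller owns the storage and passes a pointer to it. */
void MyFunctionInto(int *out)
{
    int x = 42;          /* automatic variable, valid only during this call */
    *out = x;            /* copy the value out before the frame disappears */
}

/* Option 2: return the value itself instead of its address. */
int MyFunctionValue(void)
{
    int x = 42;
    return x;            /* the value is copied to the caller */
}

/* Option 3: give the object static storage duration.
   Not reentrant: every caller shares the same object. */
int *MyFunctionStatic(void)
{
    static int x;
    x = 42;
    return &x;           /* valid: x outlives the call */
}
```

Option 1 is usually the safest choice in embedded code because ownership stays explicit and the function remains reentrant; option 3 is simple but breaks if the function is called from more than one task or from an interrupt.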

Another issue is stack exhaustion. This occurs when nested function calls or large local objects cause the total stack usage to exceed the stack size. The risk is higher when large data objects are placed on the stack or when deep recursion is used.
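One common way to see how close a stack comes to exhaustion is stack painting: fill the stack area with a known byte before use, then later count how much of the pattern is still intact. The host-side C sketch below simulates a stack with an ordinary array; the fill value 0xA5 and the function names are assumptions for illustration. On a real target the area would be the linker-defined stack or a task stack, and for FreeRTOS task stacks the kernel already provides uxTaskGetStackHighWaterMark().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u   /* arbitrary marker pattern (assumption) */

/* Fill the whole stack area with the marker before it is used. */
void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* Cortex-M stacks grow downward, so the deepest use is the lowest
   overwritten address. Counting intact marker bytes from the bottom gives
   the smallest free margin seen so far. If a used byte happens to equal
   the marker, the result slightly overstates the free space. */
size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

A margin that approaches zero during testing indicates the stack size should be increased or large local objects moved elsewhere.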

1.3 Dynamic memory on the heap

Memory allocated on the heap remains until explicitly freed. This is useful when the required data size is only known at runtime.

In C, memory is allocated with malloc or the related functions calloc and realloc, and released with free. In C++, new allocates memory and runs the object's constructor; memory allocated with new must be released with delete, which runs the destructor first.
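A minimal C sketch of run-time-sized allocation follows, assuming a buffer that only ever grows. append_sample is a hypothetical helper; the points it illustrates are checking every allocation result and keeping the old pointer until realloc has succeeded.

```c
#include <stdlib.h>

/* Append one sample to a heap-allocated array that grows at run time.
   Returns 0 on success, -1 if the heap cannot satisfy the request;
   on failure the old buffer is left intact and still owned by the caller. */
int append_sample(int **buf, size_t *count, int value)
{
    /* realloc(NULL, n) behaves like malloc(n), so an empty buffer works too */
    int *grown = realloc(*buf, (*count + 1) * sizeof **buf);
    if (grown == NULL) {
        return -1;
    }
    grown[*count] = value;
    *buf = grown;
    (*count)++;
    return 0;
}
```

Growing one element at a time keeps the sketch short, but it performs a realloc on every call, which is exactly the repeated allocate-and-release pattern that encourages fragmentation on a small heap; real code would usually grow the buffer in larger steps.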

Applications that use heap allocation must be designed with care. The heap can be exhausted if the application uses too much memory or fails to free memory that is no longer needed. Each allocation also carries a few bytes of management overhead, which becomes significant when an application allocates many small blocks. Fragmentation is a further concern: free memory can be split into pieces separated by allocated blocks, so even when the total free memory exceeds the requested size, no single contiguous block may be large enough. Fragmentation tends to grow with repeated allocation and release, so long-running systems should avoid frequent heap allocations where possible.
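The fragmentation effect can be shown with a toy model: a hypothetical heap of eight equal-sized blocks tracked by a used/free array. This is not how a real allocator stores its state; it only demonstrates that the total free space and the largest contiguous free space are different quantities.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_BLOCKS 8   /* toy heap of 8 equal-sized blocks (assumption) */

/* Total number of free blocks, wherever they are. */
size_t total_free(const bool used[NUM_BLOCKS])
{
    size_t n = 0;
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        if (!used[i]) {
            n++;
        }
    }
    return n;
}

/* Longest run of adjacent free blocks: the largest request
   that can actually be satisfied. */
size_t largest_free_run(const bool used[NUM_BLOCKS])
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        run = used[i] ? 0 : run + 1;
        if (run > best) {
            best = run;
        }
    }
    return best;
}
```

If every second block is freed, four blocks are free in total, yet a request for two adjacent blocks cannot be satisfied because the largest free run is a single block.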

2 Problem Analysis and Solution

The example code described below was validated with EWARM 7.40 and FreeRTOS 10.0.0.

2.1 STM32F407 on-chip SRAM

The STM32F407 provides 192 KB of on-chip SRAM, which can be accessed as bytes, halfwords, or words with zero wait states at CPU clock speed. It is divided into two areas: 128 KB of system SRAM mapped at 0x20000000 (SRAM1, 112 KB, followed by SRAM2, 16 KB), which is accessible to all AHB masters, including DMA; and 64 KB of CCM (core-coupled memory) mapped at 0x10000000, which only the CPU can access, through the D-bus.

The memory map for the STM32F407 is shown below. The system SRAM available to applications is 0x20000000 to 0x2001FFFF (128 KB), and the CCM address range is 0x10000000 to 0x1000FFFF (64 KB).

The heap used by the EWARM 7.40 runtime, from which the new operator allocates, is a single contiguous block, so it cannot span two regions. Because CCM and system SRAM are not contiguous, the default heap is confined to the 128 KB system SRAM region, and the 64 KB CCM, one third of the on-chip SRAM, cannot be used by the heap. For complex applications this is a significant limitation that needs a solution.

STM32F407 memory map
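The sizes of the two regions, and the gap between them, follow directly from the address ranges given above. The short C check below recomputes them; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Address ranges from the STM32F407 memory map (inclusive bounds). */
#define CCM_START   0x10000000UL
#define CCM_END     0x1000FFFFUL
#define SRAM_START  0x20000000UL
#define SRAM_END    0x2001FFFFUL

/* Size in bytes of an inclusive address range. */
uint32_t region_size(uint32_t start, uint32_t end)
{
    return end - start + 1u;
}

/* Range b (starting after range a) is contiguous with a only if it
   begins at the very next address after a ends. */
int regions_contiguous(uint32_t a_end, uint32_t b_start)
{
    return b_start == a_end + 1u;
}
```

The hole between the end of CCM and the start of system SRAM is 0x0FFF0000 bytes of address space, so no single contiguous block can cover both regions, and 64 KB of the 192 KB total is the one third the default heap cannot reach.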

2.2 Solution

Due to the limitation of EWARM 7.40, the heap cannot span two noncontiguous memory regions. FreeRTOS provides an implementation capable of spanning multiple noncontiguous regions: heap_5.c. This file was introduced in FreeRTOS V8.1.0 and later versions improved it. The approach below integrates heap_5.c so the heap can use multiple discontiguous memory regions.
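The core idea behind heap_5, treating several separate regions as one allocator by chaining them into a single address-ordered free list, can be illustrated with a deliberately simplified host-side sketch. This is not the FreeRTOS source: the names Region, define_regions, region_malloc and region_free are hypothetical, the first-fit policy and 8-byte alignment are assumptions, and there is no locking. It does follow the two rules heap_5 places on its region table: regions listed in ascending address order and a terminating NULL entry.

```c
#include <stddef.h>
#include <stdint.h>

/* One memory region, in the spirit of FreeRTOS HeapRegion_t. */
typedef struct { uint8_t *start; size_t size; } Region;

/* Header at the front of every block; 'size' includes the header. */
typedef struct Block { struct Block *next; size_t size; } Block;

#define ALIGN 8u
#define ALIGN_UP(n) (((n) + ALIGN - 1u) & ~(size_t)(ALIGN - 1u))
#define HDR ALIGN_UP(sizeof(Block))

/* Free blocks from ALL regions, kept in ascending address order. */
static Block *free_list = NULL;

/* Turn each region into one free block and chain them in address order.
   Regions must be in ascending address order, ending with {NULL, 0}. */
void define_regions(const Region *r)
{
    Block **tail = &free_list;
    for (; r->start != NULL; r++) {
        uintptr_t a = ALIGN_UP((uintptr_t)r->start);
        size_t lost = (size_t)(a - (uintptr_t)r->start);
        if (r->size <= lost + HDR) {
            continue;                          /* region too small to use */
        }
        Block *b = (Block *)a;
        b->size = (r->size - lost) & ~(size_t)(ALIGN - 1u);
        b->next = NULL;
        *tail = b;
        tail = &b->next;
    }
    *tail = NULL;
}

/* First fit across all regions; split off the tail if it is big enough. */
void *region_malloc(size_t n)
{
    size_t need = HDR + ALIGN_UP(n);
    for (Block **pp = &free_list; *pp != NULL; pp = &(*pp)->next) {
        Block *b = *pp;
        if (b->size < need) {
            continue;
        }
        if (b->size - need >= HDR + ALIGN) {   /* split, keep the tail free */
            Block *rest = (Block *)((uint8_t *)b + need);
            rest->size = b->size - need;
            rest->next = b->next;
            *pp = rest;
            b->size = need;
        } else {
            *pp = b->next;                     /* hand out the whole block */
        }
        return (uint8_t *)b + HDR;
    }
    return NULL;
}

/* Reinsert in address order and merge only blocks that physically touch,
   so a free block never spans the gap between two regions. */
void region_free(void *p)
{
    if (p == NULL) {
        return;
    }
    Block *b = (Block *)((uint8_t *)p - HDR);
    Block *prev = NULL, *cur = free_list;
    while (cur != NULL && (uintptr_t)cur < (uintptr_t)b) {
        prev = cur;
        cur = cur->next;
    }
    b->next = cur;
    if (prev == NULL) {
        free_list = b;
    } else {
        prev->next = b;
    }
    if (cur != NULL && (uint8_t *)b + b->size == (uint8_t *)cur) {
        b->size += cur->size;                  /* merge with following block */
        b->next = cur->next;
    }
    if (prev != NULL && (uint8_t *)prev + prev->size == (uint8_t *)b) {
        prev->size += b->size;                 /* merge with preceding block */
        prev->next = b->next;
    }
}
```

For example, with a 128-byte region and a 256-byte region separated by a gap, a 300-byte request fails even though more than 300 bytes are free in total, because free blocks are merged only when their addresses touch and the two regions never do.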

Steps taken:

(1) Add heap_5.c from the FreeRTOS source package to the project, replacing any other heap_x.c file, since only one heap implementation can be linked.
(2) Define the heap sizes in a header file:

#define configTOTAL_HEAP1_SIZE ((size_t)(64 * 1024))    // HEAP1: 64 KB
#define configTOTAL_HEAP2_SIZE ((size_t)(100 * 1024))   // HEAP2: 100 KB

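Note: HEAP1 occupies the entire 64 KB of CCM; HEAP2 takes 100 KB of the 128 KB system SRAM, leaving the remaining 28 KB for the OS and other statically allocated data. These values can be adjusted to suit the application. Because CCM is accessible only by the CPU, blocks allocated from HEAP1 must not be used as DMA buffers.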

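(3) In main.c, define the two memory regions and the heap region table. The .ccmram section must be placed in the CCM address range (0x10000000 to 0x1000FFFF) by the linker configuration (.icf) file. heap_5 requires the regions to be listed in ascending address order and the table to end with a { NULL, 0 } entry, and the table must be passed to vPortDefineHeapRegions() at the start of main(), before any memory is allocated: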

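#include "FreeRTOS.h"   /* HeapRegion_t, vPortDefineHeapRegions() */

#pragma location = ".ccmram"
uint8_t ucHeap1[configTOTAL_HEAP1_SIZE];   /* HEAP1: CCM */
uint8_t ucHeap2[configTOTAL_HEAP2_SIZE];   /* HEAP2: system SRAM */

/* Ascending address order (CCM before SRAM), terminated by { NULL, 0 } */
const HeapRegion_t xHeapRegions[] =
{
    /* Start address   Size */
    { ucHeap1,         configTOTAL_HEAP1_SIZE },
    { ucHeap2,         configTOTAL_HEAP2_SIZE },
    { NULL,            0 }
};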

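(4) In main.c, override new and delete to use the FreeRTOS heap functions (the file containing them must be compiled as C++). Unlike the standard operator new, this version returns NULL instead of throwing std::bad_alloc when the heap is exhausted. Because constructors of global C++ objects run before main(), they must not use new before the heap regions have been defined with vPortDefineHeapRegions():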

void *operator new(size_t size)
{
    return pvPortMalloc(size);   // allocate from the heap_5 regions; NULL on failure
}

void operator delete(void *p)
{
    vPortFree(p);                // return the block to heap_5; NULL is ignored
}

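After these changes, objects created with new, together with the FreeRTOS kernel's own pvPortMalloc() allocations, come from a combined heap of 164 KB (64 KB CCM + 100 KB SRAM). This removes the EWARM 7.40 single-region limitation and meets the needs of more complex applications. Two limits remain: a single allocation cannot exceed the largest region (slightly under 100 KB here), because heap_5 never joins blocks across the gap between regions, and calls to malloc() still use the toolchain heap unless they are redirected as well.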
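Because heap_5 relies on the region table being in ascending address order and terminated by a { NULL, 0 } entry, a table like the one in main.c can be sanity-checked on the host. The sketch uses RegionEntry, a hypothetical local struct mirroring the pucStartAddress and xSizeInBytes fields of HeapRegion_t so that it builds without the FreeRTOS headers, and a hypothetical helper heap_table_total; on the target, the real table is simply passed to vPortDefineHeapRegions().

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the two fields of FreeRTOS HeapRegion_t. */
typedef struct {
    uint8_t *pucStartAddress;
    size_t   xSizeInBytes;
} RegionEntry;

/* Returns the total heap size described by a {NULL, 0}-terminated table,
   or 0 if the regions are out of ascending address order or overlap.
   Addresses are only compared, never dereferenced. */
size_t heap_table_total(const RegionEntry *t)
{
    size_t total = 0;
    uintptr_t prev_end = 0;
    for (; t->pucStartAddress != NULL; t++) {
        uintptr_t start = (uintptr_t)t->pucStartAddress;
        if (start < prev_end) {
            return 0;
        }
        prev_end = start + t->xSizeInBytes;
        total += t->xSizeInBytes;
    }
    return total;
}
```

Using the CCM base 0x10000000 and the SRAM base 0x20000000 as start addresses, a table with 64 KB and 100 KB entries describes 164 KB in total; swapping the two entries makes the check fail, mirroring the ordering rule heap_5 expects. The real ucHeap1 and ucHeap2 addresses are chosen by the linker.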

3 Conclusion

This article described a software method for implementing a heap across multiple noncontiguous memory regions and demonstrated it on the STM32F407 microcontroller. Using the FreeRTOS heap_5 implementation and overriding new and delete lets the heap span both CCM and system SRAM, making the 64 KB CCM available for dynamic allocation. A controller using this approach has been running in the field for 18 months.