embedded | 14.02.2024

A Time Consuming Pitfall for 32-bit Applications on AArch64

It’s common to run 32-bit legacy applications on 64-bit Linux systems. Similar to many other architectures, AArch64 (referred to as arm64 in the context of Linux) also supports this. When you build the Linux kernel for this scenario, you have to enable the CONFIG_COMPAT option in the kernel config.

However, there’s one potential issue I’d like to highlight. Unlike on x86_64, where the same GCC toolchain can generate both 64-bit and 32-bit programs (controlled by the -m32 switch), AArch64 requires two separate GCC toolchains: one targeting AArch64 and one targeting 32-bit ARM (AArch32). So why does this matter when building a 64-bit kernel? The problem lies in the virtual dynamic shared object (vDSO) [1]. To provide a vDSO to 32-bit ARM applications, the kernel build needs the second, 32-bit toolchain.

If you simply enable CONFIG_COMPAT without setting up CROSS_COMPILE_COMPAT to an AArch32 cross compiler, the resulting kernel won’t supply a vDSO to 32-bit applications. While these applications will run smoothly, they won’t enjoy the advantages of the vDSO.

On the other hand, if you’re building the kernel with clang, setting CROSS_COMPILE_COMPAT is not needed. This is because the same clang toolchain can produce code for different CPU architectures.

The Role of the vDSO

The vDSO provides the functionality of specific syscalls without requiring a switch into the kernel context. Instead, the kernel places sufficient information into a memory region accessible to userspace to achieve the same functionality. An example of this is the gettimeofday(2) [2] system call. Using the vDSO to query the current time eliminates the need for a context switch, which makes the operation faster.

This is where forgetting to set CROSS_COMPILE_COMPAT can have a significant impact. If you solely rely on CONFIG_COMPAT without configuring CROSS_COMPILE_COMPAT, your legacy application will incur a context switch for each gettimeofday(2) call. While most applications may not be affected, those with strict time constraints requiring frequent timestamp retrieval, such as real-time applications, could face issues due to this.

I encountered this issue when I noticed that a legacy application on an AArch64 platform had no vDSO, and I became curious about the impact. To assess the effect, I ran a small benchmark. It queries the current time using gettimeofday(2) 500,000 times on a Raspberry Pi 3 B+ and compares the run time with and without a vDSO.

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
        struct timeval tv1, tv2, tvtmp;
        unsigned long total;
        int i;

        gettimeofday(&tv1, NULL);
        for (i = 0; i < 500000; i++) {
                gettimeofday(&tvtmp, NULL);
        }
        gettimeofday(&tv2, NULL);

        /* Subtract the seconds first so the product cannot overflow a 32-bit long. */
        total = (tv2.tv_sec - tv1.tv_sec) * 1000000UL + tv2.tv_usec - tv1.tv_usec;
        printf("%lu\n", total);

        return 0;
}

Test            Duration (us)
without vDSO    1905827
with vDSO       68856

As the numbers show, the benchmark runs roughly 27 times slower without the vDSO (1905827 / 68856 ≈ 27.7). That is about 3.8 us per call instead of about 0.14 us.

Integrating CROSS_COMPILE_COMPAT Support into Yocto

Fixing the problem for my current project, which is built with Yocto, turned out to be a little more work. I found out the hard way that nobody had considered CROSS_COMPILE_COMPAT there before me. So, I had to patch the kernel bitbake class in Yocto to make the second, 32-bit toolchain available while building the kernel.

diff --git a/meta/classes-recipe/kernel.bbclass b/meta/classes-recipe/kernel.bbclass
index 16b85dbca4..6fcbb1e41b 100644
--- a/meta/classes-recipe/kernel.bbclass
+++ b/meta/classes-recipe/kernel.bbclass
@@ -208,6 +208,39 @@ PACKAGES_DYNAMIC += "^${KERNEL_PACKAGE_NAME}-module-.*"
 PACKAGES_DYNAMIC += "^${KERNEL_PACKAGE_NAME}-image-.*"
 PACKAGES_DYNAMIC += "^${KERNEL_PACKAGE_NAME}-firmware-.*"
 
+def get_arm32_prefix(d):
+    arm_prefix = ''
+
+    if d.getVar('TARGET_ARCH').startswith('aarch64'):
+        pfxs = all_multilib_tune_values(d, 'TARGET_PREFIX').split()
+        for p in pfxs:
+           if p.startswith("arm-"):
+               arm_prefix = p
+
+    return arm_prefix
+
+def arm32_full_prefix(d):
+    pfx = get_arm32_prefix(d)
+    if pfx == '':
+        return ''
+
+    ps = all_multilib_tune_values(d, 'STAGING_BINDIR_TOOLCHAIN').split()
+    for p in ps:
+        if p.endswith(pfx[:-1]):
+            return p + '/' + pfx
+
+    return ''
+
+def arm32_gcc_dep(d):
+    pfx = get_arm32_prefix(d)
+    if pfx != '':
+        return 'virtual/' + pfx + 'gcc'
+    else:
+        return ''
+
+DEPENDS += "${@arm32_gcc_dep(d)}"
+export CROSS_COMPILE_COMPAT="${@arm32_full_prefix(d)}"
+
 export OS = "${TARGET_OS}"
 export CROSS_COMPILE = "${TARGET_PREFIX}"

My patch [3] is currently under review, and I’m hopeful that it will be merged soon.

Checking for the Availability of a vDSO

If you’re not sure whether a program can benefit from a vDSO, Linux offers multiple ways to check. Linux exposes the address of the vDSO via the AT_SYSINFO_EHDR entry of the auxiliary vector; this way, even statically linked programs can locate and use the vDSO. Using the glibc helper function getauxval(3) [4], a program can check for itself whether a vDSO is available. The following snippet shows how:

#include <stdio.h>
#include <sys/auxv.h>

int main(void)
{
        if (getauxval(AT_SYSINFO_EHDR))
                printf("vDSO available!\n");
        else
                printf("No vDSO available!\n");

        return 0;
}

If the program is dynamically linked, the environment variable LD_SHOW_AUXV [5] can be used to make the dynamic linker dump the auxiliary vector before the program runs. If AT_SYSINFO_EHDR shows up, a vDSO is available.

root@target:~# LD_SHOW_AUXV=1 /bin/program
[...]
AT_SYSINFO_EHDR:      0xf7f6a000
[...]
root@target:~#

Summary

When building an AArch64 kernel intended to run legacy 32-bit applications, ensure that CROSS_COMPILE_COMPAT points to a 32-bit toolchain. Failure to do so might lead to performance issues. Additionally, confirm that the build system you are using supports the configuration of CROSS_COMPILE_COMPAT. At the time of writing, none of Yocto (actually OpenEmbedded), Buildroot, or PTXdist sets CROSS_COMPILE_COMPAT.

Authors

Richard Weinberger
