It’s common to run 32-bit legacy applications on 64-bit Linux systems.
Similar to many other architectures, AArch64 (referred to as arm64 in the context of Linux) also supports this.
When you build the Linux kernel for this scenario, you have to enable the CONFIG_COMPAT option in the kernel config.
However, there’s one potential issue I’d like to highlight.
Unlike on x86_64, where the same GCC toolchain can generate both 64-bit and 32-bit programs (controlled by the -m32 switch), AArch64 requires two separate GCC toolchains.
So, what’s the challenge when running a 64-bit kernel?
The problem lies in the virtual dynamic shared object (vDSO) [1]. To support a vDSO for 32-bit ARM applications, the kernel requires a second toolchain during the build process.
If you simply enable CONFIG_COMPAT without setting CROSS_COMPILE_COMPAT to an AArch32 cross compiler, the resulting kernel won’t supply a vDSO to 32-bit applications.
These applications will still run correctly, but they won’t enjoy the advantages of the vDSO.
On the other hand, if you’re building the kernel with clang, setting CROSS_COMPILE_COMPAT is not needed, because a single clang toolchain can generate code for different CPU architectures.
The vDSO provides the functionality of specific syscalls without requiring a switch into the kernel context.
Instead, the kernel places sufficient information into a memory region accessible to userspace to achieve the same functionality.
An example of this is the gettimeofday(2) [2] system call.
Querying the current time through the vDSO avoids the switch into the kernel, which makes the operation considerably faster.
This is where forgetting to set CROSS_COMPILE_COMPAT can have a significant impact.
If you rely solely on CONFIG_COMPAT without configuring CROSS_COMPILE_COMPAT, your legacy application will enter the kernel through a real system call for every gettimeofday(2) call.
Most applications won’t notice, but those that retrieve timestamps frequently under strict timing constraints, such as real-time applications, may run into problems.
I encountered this issue when I noticed that a legacy application on an AArch64 platform had no vDSO, and I became curious about the impact.
To assess the effect, I conducted a small benchmark.
The benchmark queries the current time with gettimeofday(2) 500,000 times on a Raspberry Pi 3 B+ and compares the run time with and without a vDSO.
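#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval tv1, tv2, tvtmp;
    long long total;
    int i;

    /* Timestamp before the loop */
    gettimeofday(&tv1, NULL);
    /* Query the current time 500000 times */
    for (i = 0; i < 500000; i++) {
        gettimeofday(&tvtmp, NULL);
    }
    /* Timestamp after the loop */
    gettimeofday(&tv2, NULL);

    /* Elapsed time in microseconds; subtracting first avoids overflowing
       a 32-bit time_t/long when multiplying by 1000000 */
    total = (long long)(tv2.tv_sec - tv1.tv_sec) * 1000000
            + (tv2.tv_usec - tv1.tv_usec);
    printf("%lld\n", total);
    return 0;
}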
| Test | Duration (µs) |
|---|---|
| without vDSO | 1905827 |
| with vDSO | 68856 |
As the table shows, the benchmark runs more than 27 times slower without the vDSO: about 3.8 µs per gettimeofday(2) call instead of about 0.14 µs.
CROSS_COMPILE_COMPAT Support into Yocto
Fixing the problem for my current project, which is built with Yocto, turned out to be a little more work.
I found out the hard way that nobody else had considered CROSS_COMPILE_COMPAT before me.
So, I had to patch the kernel bitbake class in Yocto to be able to use the second, 32-bit toolchain while building the kernel.
diff --git a/meta/classes-recipe/kernel.bbclass b/meta/classes-recipe/kernel.bbclass
index 16b85dbca4..6fcbb1e41b 100644
--- a/meta/classes-recipe/kernel.bbclass
+++ b/meta/classes-recipe/kernel.bbclass
@@ -208,6 +208,39 @@ PACKAGES_DYNAMIC += "^${KERNEL_PACKAGE_NAME}-module-.*"
 PACKAGES_DYNAMIC += "^${KERNEL_PACKAGE_NAME}-image-.*"
 PACKAGES_DYNAMIC += "^${KERNEL_PACKAGE_NAME}-firmware-.*"
+def get_arm32_prefix(d):
+    arm_prefix = ''
+
+    if d.getVar('TARGET_ARCH').startswith('aarch64'):
+        pfxs = all_multilib_tune_values(d, 'TARGET_PREFIX').split()
+        for p in pfxs:
+            if p.startswith("arm-"):
+                arm_prefix = p
+
+    return arm_prefix
+
+def arm32_full_prefix(d):
+    pfx = get_arm32_prefix(d)
+    if pfx == '':
+        return ''
+
+    ps = all_multilib_tune_values(d, 'STAGING_BINDIR_TOOLCHAIN').split()
+    for p in ps:
+        if p.endswith(pfx[:-1]):
+            return p + '/' + pfx
+
+    return ''
+
+def arm32_gcc_dep(d):
+    pfx = get_arm32_prefix(d)
+    if pfx != '':
+        return 'virtual/' + pfx + 'gcc'
+    else:
+        return ''
+
+DEPENDS += "${@arm32_gcc_dep(d)}"
+export CROSS_COMPILE_COMPAT="${@arm32_full_prefix(d)}"
+
 export OS = "${TARGET_OS}"
 export CROSS_COMPILE = "${TARGET_PREFIX}"
My patch [3] is currently under review, and I’m hopeful that it will be merged soon.
If you’re not sure whether a program can benefit from a vDSO, Linux offers multiple ways to check.
Linux passes the address of the vDSO to every process in the AT_SYSINFO_EHDR entry of the auxiliary vector; this way, even statically linked programs can find and use the vDSO.
Using the glibc helper function getauxval(3) [4], a program can check for itself whether a vDSO is available.
The following snippet shows how:
#include <stdio.h>
#include <sys/auxv.h>

int main(void)
{
    /* AT_SYSINFO_EHDR holds the address of the vDSO, or 0 if there is none */
    if (getauxval(AT_SYSINFO_EHDR))
        printf("vDSO available!\n");
    else
        printf("No vDSO available!\n");
    return 0;
}
If the program is dynamically linked, the environment variable LD_SHOW_AUXV [5] makes the dynamic loader dump the auxiliary vector when the program starts.
If AT_SYSINFO_EHDR shows up, a vDSO is available.
root@target:~# LD_SHOW_AUXV=1 /bin/program
[...]
AT_SYSINFO_EHDR: 0xf7f6a000
[...]
root@target:~#
When building an AArch64 kernel intended to run legacy 32-bit applications, make sure that CROSS_COMPILE_COMPAT points to a 32-bit toolchain.
Otherwise, those applications have no vDSO and pay for a full system call on every gettimeofday(2), which can lead to performance issues.
Additionally, confirm that the build system you are using supports setting CROSS_COMPILE_COMPAT.
At the time of writing, neither Yocto (more precisely, OpenEmbedded), Buildroot, nor PTXdist sets CROSS_COMPILE_COMPAT.
Publish date: 14.02.2024
Category: embedded
Authors: Richard Weinberger
sigma star gmbh