Qfplib-M0-full: a free, fast and compact ARM Cortex-M0 floating-point library
Qfplib-M0-full is a library of IEEE 754 single- and double-precision floating-point arithmetic routines for microcontrollers based on the ARM Cortex-M0 core (ARMv6-M architecture). It should also run on Cortex-M3 and Cortex-M4 microcontrollers and will give reasonable performance, but it is not optimised for these devices.
It provides correctly rounded (to nearest, even-on-tie) addition, subtraction, multiplication, division and square root operations, and sine, cosine, tangent, arctangent, logarithm and exponential functions that give a high degree of accuracy. There are also conversion functions between floating-point values and signed or unsigned integer or fixed-point values. The library occupies less than 6 kbyte of program memory.
Qfplib-M0-full does not use any static storage. Stack use is parsimonious and statically analysable; recursion is not used.
The Raspberry Pi RP2040 microcontroller includes a version of this library, slightly modified to take advantage of special hardware available on that device.
The following table compares cycle counts for Qfplib-M0-full against other libraries. Qfplib-M0-full and GCC library results are average values for non-exceptional arguments to the functions, include calling overhead, and are approximate. They were measured using an LPC11U68 microcontroller with single-cycle flash memory. Results for the Micro Digital ‘GoFast’ library (presumably optimised for speed rather than size, judging by its name) are inferred from the timings given on this page for an ARM7TDMI-based processor. The comparison here may not be strictly fair to Qfplib-M0-full, as it is not clear from their description whether Micro Digital’s library exploits features available on that processor but not on the Cortex-M0: for example, ARM mode is considerably faster and more flexible than Thumb mode, and the long multiply instructions can be used to advantage in several of the routines, especially in double precision. Micro Digital do not appear to provide public information on the code size of their library. Their implementation of the basic functions does not appear to be IEEE 754 compliant with regard to rounding.
Note that in every case the Qfplib-M0-full double-precision implementation is faster than the corresponding GCC single-precision implementation, sometimes by a very large factor.
The ARM CMSIS implementations of the scientific functions, despite their name ‘FastMath’, appear to be many times slower than Qfplib-M0-full. For example, the average execution time for ARM's single-precision cosine function (compiled using GCC) is about 3880 cycles, virtually independent of the optimisation flags used.
Limitations and deviations from the IEEE 754 standard
On input and output, NaNs are converted to infinities and denormals are flushed to zero.
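To make the effect of these deviations concrete, here is a minimal host-side sketch in C that applies the same two substitutions to a single-precision value. It is a model for experimentation, not code from the library: the names model_squash_bits and model_squash are hypothetical, and keeping the sign bit in both cases is an assumption that this page does not confirm.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of Qfplib: applies the substitutions
   described above to an IEEE 754 single-precision bit pattern. */
uint32_t model_squash_bits(uint32_t bits)
{
    uint32_t sign     = bits & 0x80000000u;
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent == 0xFFu && mantissa != 0u)
        return sign | 0x7F800000u;  /* NaN -> infinity (sign kept: assumption) */
    if (exponent == 0u && mantissa != 0u)
        return sign;                /* denormal -> zero (sign kept: assumption) */
    return bits;                    /* normal numbers, zeros and infinities pass through */
}

/* Same model, taking and returning a float. */
float model_squash(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits); /* well-defined way to read the bit pattern */
    bits = model_squash_bits(bits);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Passing reference inputs and outputs through such a model before comparing them with results from the library avoids spurious mismatches on NaNs and denormals.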
Function ranges and accuracy
Subject to the limitations and deviations mentioned above, the addition, subtraction, multiplication, division and square root functions all produce correctly rounded (to nearest, even-on-tie) results. This has been verified using many billions of test cases, both random and contrived.
Other functions generally give results accurate to approximately 1 ulp (‘unit in last place’). Accuracy is poorer where a tiny change in an argument results in a change in the result of a large number of ulps, such as when taking the logarithm of a value near 1 or the sine of a value near a multiple of π. Accurate handling of such cases consumes a large amount of code space and is seldom if ever needed.
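Errors quoted in ulps can be estimated on a host machine by counting how many representable single-precision values lie between a computed result and a reference result. The sketch below is one common way to do that and is not taken from the library; ordered_key and ulp_distance are hypothetical names. It measures the distance between two floats, so it only approximates the true error against the infinitely precise result, and it assumes neither argument is a NaN.

```c
#include <stdint.h>
#include <string.h>

/* Map a float to an integer that increases monotonically with its value,
   so that adjacent floats map to adjacent integers. */
int64_t ordered_key(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if (bits & 0x80000000u)
        return -(int64_t)(bits & 0x7FFFFFFFu); /* negative values count downwards */
    return (int64_t)bits;                      /* +0 and -0 both map to 0 */
}

/* Number of representable floats between a and b (0 if they are equal). */
int64_t ulp_distance(float a, float b)
{
    int64_t d = ordered_key(a) - ordered_key(b);
    return d < 0 ? -d : d;
}
```

For a function such as the logarithm near 1, it is worth comparing against a reference computed in higher precision and rounded once, since a single-precision reference may itself be several ulps out.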
The single-precision trigonometric functions require an argument between –128 and +128; the double-precision trigonometric functions require an argument between –1024 and +1024.
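Callers that cannot guarantee these ranges may want to check arguments first. The following guard is a minimal sketch: the function and macro names are hypothetical, and it takes the conservative reading that the end points themselves are excluded, which this page does not specify. On a Cortex-M0 the comparisons would themselves be performed by library routines, so treat this as an illustration of the limits rather than a recommended fast path.

```c
#include <stdbool.h>

#define TRIG_LIMIT_SINGLE 128.0f   /* limit stated above for single precision */
#define TRIG_LIMIT_DOUBLE 1024.0   /* limit stated above for double precision */

/* True if x is strictly inside the single-precision range. A NaN fails
   both comparisons and is therefore rejected. */
bool trig_arg_ok_single(float x)
{
    return x > -TRIG_LIMIT_SINGLE && x < TRIG_LIMIT_SINGLE;
}

/* True if x is strictly inside the double-precision range. */
bool trig_arg_ok_double(double x)
{
    return x > -TRIG_LIMIT_DOUBLE && x < TRIG_LIMIT_DOUBLE;
}
```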
The comparison functions return zero if their arguments are equal (negative zero is equal to positive zero) or plus or minus one if the first argument is respectively greater than or less than the second.
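The return convention can be modelled on a host machine with ordinary C comparisons, as in the sketch below. The name model_fcmp is hypothetical and this is not the library's implementation. Note one difference: this model returns zero when either argument is a NaN, whereas the library converts NaNs to infinities on input.

```c
/* Hypothetical reference model of the three-way comparison described above:
   returns +1 if a > b, -1 if a < b, and 0 if they compare equal.
   In C, -0.0f == 0.0f, so the two zeros compare equal here as well. */
int model_fcmp(float a, float b)
{
    if (a > b) return 1;
    if (a < b) return -1;
    return 0;
}
```

A result of this form can be tested against zero in the same way as the result of strcmp, for example `model_fcmp(a, b) < 0` for ‘a is less than b’.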
A comprehensive range of functions is provided to convert between floating-point data and signed and unsigned fixed-point and integer data. They are as follows.
You may also be interested in the qfp_float2str and qfp_str2float functions provided as part of the Qfplib-M0-tiny library.
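As an illustration of what a conversion between floating-point and fixed-point data involves, the following host-side sketch converts a single-precision value to and from a signed 32-bit fixed-point value with a given number of fractional bits. The names model_float2fix and model_fix2float are hypothetical, and rounding towards minus infinity is an assumption made for this sketch rather than something stated on this page. Overflow is not handled: the scaled value must fit in 32 bits, and frac_bits must lie between 0 and 31.

```c
#include <stdint.h>

/* Hypothetical model: float -> signed fixed-point with frac_bits
   fractional bits, rounding towards minus infinity (assumption). */
int32_t model_float2fix(float x, int frac_bits)
{
    float scaled = x * (float)(1u << frac_bits); /* power-of-two scale is exact */
    int32_t t = (int32_t)scaled;                 /* C conversion truncates towards zero */
    if ((float)t > scaled)
        t -= 1;                                  /* step down for negative non-integers */
    return t;
}

/* Hypothetical model: signed fixed-point -> float. The int32_t to float
   conversion rounds if v needs more than 24 significant bits. */
float model_fix2float(int32_t v, int frac_bits)
{
    return (float)v / (float)(1u << frac_bits);
}
```

With 16 fractional bits, for example, the value 1.5 corresponds to the integer 98304, that is, 1.5 × 65536.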
This page most recently updated Fri 4 Feb 16:49:53 GMT 2022
All trademarks used are hereby acknowledged.