Qfplib-M3: a free, fast and accurate ARM Cortex-M3 floating-point library

See also: Qfplib-M0-tiny: a similar library optimised for code size, aimed at ARM Cortex-M0 microcontrollers; and
Qfplib-M0-full: a similar library more optimised for speed, including both single- and double-precision functions, also aimed at ARM Cortex-M0 microcontrollers.

Introduction

Qfplib-M3 is a library of IEEE 754 single-precision floating-point arithmetic routines for microcontrollers based on the ARM Cortex-M3 core (ARMv7-M architecture). It will also run on Cortex-M4 microcontrollers but is not optimised for these devices. The optimisation goals for Qfplib-M3 are speed and accuracy, while keeping code size within reasonable bounds.

Qfplib-M3 provides correctly rounded (to nearest, even-on-tie) addition, subtraction, multiplication, division and square root operations, and sine, cosine, tangent, arctangent, logarithm and exponential functions that give a very high degree of accuracy.

Licence

Qfplib-M3 is open source, licensed under version 2 of the GNU GPL. Use at your own risk. Qfplib-M3 is not licensed under the LGPL. Roughly speaking, this means that if you wish to use it in conjunction with non-GPL code you will require an alternative licence: please enquire using the e-mail address on the home page.

Code size

The complete set of functions in Qfplib-M3 occupies a little under 12 kbyte of program memory. In general code is not shared between the functions, so the footprint can be reduced significantly if not all functions are used: if using the GNU linker, supply the --gc-sections option. Qfplib-M3 does not depend on any other libraries.

Stack and static memory usage

Qfplib-M3 uses no stack and no static storage. No initialisation is required. The code is fully ROMable and is thread-safe.

Speed and accuracy

The following table compares cycle counts for Qfplib-M3 against other libraries. Qfplib-M3 timing results are approximate average values over the ranges of argument values shown and include a calling overhead of 3 cycles. They were measured using an LPC1763 microcontroller executing from (single-cycle) RAM.

In the following table, ‘ulp’ means ‘unit in last place’. Errors are measured relative to the correctly rounded result.

Cycles

Function          Arguments                         Qfplib-M3  ‘GoFast’  IAR    Keil
qfp_fadd(x,y)     2^-16≤|x|<2^16, 2^-16≤|y|<2^16    37.1       90        60     55
qfp_fsub(x,y)     2^-16≤|x|<2^16, 2^-16≤|y|<2^16    38.0       95        60     55
qfp_fmul(x,y)     2^-16≤|x|<2^16, 2^-16≤|y|<2^16    36.0       80        50     50
qfp_fdiv(x,y)     2^-16≤|x|<2^16, 2^-16≤|y|<2^16    57.1       195       80     135
qfp_fsqrt(x)      2^-16≤x<2^16                      49.3       380       565    260
qfp_fexp(x)       2^-4≤|x|<2^4                      44.1       210       1635   1565
qfp_fln(x)        2^-16≤x<2^16                      44.4       455       830    825
qfp_fsin(x)       2^-8≤|x|<1                        43.0       205       750    710
                  2^-8≤|x|<2^8                      60.1
                  All x                             63.9
qfp_fcos(x)       2^-8≤|x|<1                        39.2       205       740    705
                  2^-8≤|x|<2^8                      59.4
                  All x                             65.1
qfp_ftan(x)       2^-8≤|x|<1                        48.2       345       825    835
                  2^-8≤|x|<2^8                      70.5
                  All x                             72.5
qfp_fatan2(y,x)   2^-4≤|x|<2^4, 2^-4≤|y|<2^4        83.4       540       860    965

Accuracy

Functions                              Qfplib-M3            ‘GoFast’                IAR  Keil
qfp_fadd, qfp_fsub, qfp_fmul,          Exact in all cases   ±1 ulp ‘in most cases’  ?    ?
  qfp_fdiv, qfp_fsqrt
qfp_fexp, qfp_fln, qfp_fsin,           ±1 ulp in all cases  ±2 ulp ‘in most cases’  ?    ?
  qfp_fcos, qfp_ftan, qfp_fatan2

Note that unlike Qfplib-M3, none of the alternative libraries appears to offer IEEE 754 compliance with regard to rounding.
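
To make the ‘ulp’ error measure used in the table above concrete, here is a minimal host-side sketch in C. It is not part of Qfplib-M3, and the names float_order and ulp_error are mine. It counts how many representable single-precision values separate a result from the correctly rounded one, which matches the error in ulps when the two values are close together (as they are for errors of ±1 ulp).

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an integer that increases monotonically with
   the float's value, so that adjacent floats map to adjacent integers.
   +0.0 and -0.0 both map to 0. Not meaningful for NaNs. */
static int64_t float_order(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* well-defined type pun */
    int64_t magnitude = (int64_t)(bits & 0x7fffffffu);
    return (bits & 0x80000000u) ? -magnitude : magnitude;
}

/* Signed error of `result` relative to `correct`, counted in steps between
   representable floats (hypothetical helper, for illustration only). */
int64_t ulp_error(float result, float correct)
{
    return float_order(result) - float_order(correct);
}
```

A result that is the next float above the correctly rounded value gives +1, the next float below gives –1, and an exact result gives 0.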

Results for the Micro Digital ‘GoFast’, Keil and IAR libraries are inferred from the timings given here and here. Those pages have not been updated for a few years: I would welcome any more up-to-date benchmark figures. Note, however, that (for example) the end-user licence for the Keil MDK includes the clause ‘you shall treat any and all benchmarking data relating to the Software [...] which are indicative of its performance, efficacy, reliability or quality, as confidential information and you shall not disclose such information to any third party without the express written permission of ARM’. It is not clear whether such a clause is enforceable, but it nevertheless could be viewed as an indication of ARM’s confidence in the ‘performance, efficacy, reliability or quality’ of their software.

Accuracy analysis of scientific functions

Function     Mean signed          Mean unsigned   RMS error    Worst-case       Worst-case
             (systematic) error   error                        negative error   positive error
qfp_fexp     +0.0036 ulp          0.0216 ulp      0.1471 ulp   –1 ulp           +1 ulp
qfp_fln      –0.0413 ulp          0.0417 ulp      0.2042 ulp   –1 ulp           +1 ulp
qfp_fsin     –0.0019 ulp          0.0115 ulp      0.1074 ulp   –1 ulp           +1 ulp
qfp_fcos     –0.0011 ulp          0.0119 ulp      0.1092 ulp   –1 ulp           +1 ulp
qfp_ftan     –0.0247 ulp          0.0561 ulp      0.2368 ulp   –1 ulp           +1 ulp
qfp_fatan2   +0.0144 ulp          0.0186 ulp      0.1364 ulp   –1 ulp           +1 ulp
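
The columns of this table can be computed from a list of per-argument errors, each a whole number of ulps. The following is a minimal sketch in C of that bookkeeping; the ulp_stats type and function names are hypothetical and are not taken from the Qfplib-M3 test code. Because every error in the table is –1, 0 or +1 ulp, the mean unsigned error equals the mean squared error, so the RMS column should be the square root of the mean unsigned column. The figures agree with this: for qfp_fexp, for example, 0.1471² ≈ 0.0216.

```c
#include <stddef.h>
#include <stdint.h>

/* Running totals for a set of signed errors measured in whole ulps. */
typedef struct {
    uint64_t n;          /* number of samples                      */
    int64_t  sum;        /* sum of signed errors                   */
    uint64_t sum_abs;    /* sum of |error|                         */
    uint64_t sum_sq;     /* sum of error squared                   */
    int64_t  worst_neg;  /* most negative error seen (0 if none)   */
    int64_t  worst_pos;  /* most positive error seen (0 if none)   */
} ulp_stats;

ulp_stats ulp_stats_collect(const int32_t *err, size_t count)
{
    ulp_stats s = {0, 0, 0, 0, 0, 0};
    for (size_t i = 0; i < count; i++) {
        int64_t e = err[i];
        s.n++;
        s.sum     += e;
        s.sum_abs += (uint64_t)(e < 0 ? -e : e);
        s.sum_sq  += (uint64_t)(e * e);
        if (e < s.worst_neg) s.worst_neg = e;
        if (e > s.worst_pos) s.worst_pos = e;
    }
    return s;
}

/* Mean signed (systematic) error in ulps. */
double mean_signed(const ulp_stats *s)
{
    return s->n ? (double)s->sum / (double)s->n : 0.0;
}

/* Mean unsigned error in ulps. */
double mean_unsigned(const ulp_stats *s)
{
    return s->n ? (double)s->sum_abs / (double)s->n : 0.0;
}

/* Mean squared error; the RMS error is the square root of this value. */
double mean_square(const ulp_stats *s)
{
    return s->n ? (double)s->sum_sq / (double)s->n : 0.0;
}
```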

Care has been taken to ensure Qfplib-M3 maintains results accurate to 1 ulp in pathological cases, such as sin x where x is near a multiple of π, and cos x and tan x where x is near an odd multiple of π/2. It even correctly evaluates sin(16367173·2^73). I would be interested to learn of any applications that require such accuracy other than the testing of floating-point libraries or the evaluation of π to many digits, noble pursuits though those both are.
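
To illustrate why arguments near a multiple of π are awkward, here is a small host-side sketch in C. It is an illustration only and says nothing about how Qfplib-M3 performs its range reduction. It uses the milder example x = 355, which lies close to 113π, and all function names are mine. Subtracting 113π using a single-precision value of π gives a reduced argument about 1.2% too large, and since sin x ≈ x for such small x, that error carries straight through to the result. Double precision rescues this case, but would be nowhere near enough for an argument such as 16367173·2^73.

```c
#include <stdint.h>
#include <string.h>

/* Raw IEEE 754 bit pattern of a float. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The argument quoted in the text. 16367173 fits in 24 bits, so the
   product with a power of two is exactly representable as a float. */
float pathological_argument(void)
{
    return 16367173.0f * 0x1p73f;
}

/* Naive reduction of x = 355 by 113*pi, entirely in single precision. */
float reduce_355_naive(void)
{
    const float pi_f = 3.14159265358979323846f;  /* rounds to 0x40490FDB */
    /* volatile forces the product to be rounded to float and stops the
       compiler fusing the multiply and subtract into one operation */
    volatile float k_pi = 113.0f * pi_f;
    return 355.0f - k_pi;                        /* exactly 2^-15 */
}

/* The same reduction carried out in double precision, then rounded. */
float reduce_355_double(void)
{
    const double pi_d = 3.14159265358979323846;
    return (float)(355.0 - 113.0 * pi_d);        /* about 3.01444e-5 */
}
```

The naive version returns 2^-15 ≈ 3.0518e-5, whereas 355 – 113π is approximately 3.0144e-5: a discrepancy of some hundreds of thousands of ulps, against the 1 ulp quoted above.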

Testing

Each unary function has been tested against the standard GNU floating-point library supplied with GCC for x86 processors exhaustively on non-exceptional arguments, plus on tens of millions of random exceptional cases. In exceptional cases the unary functions return bit-identical results to the GNU library; qfp_fsqrt returns bit-identical results in all cases.

Each binary function has been tested against the GNU x86 library on over a billion cases, exceptional and non-exceptional, random and contrived. For qfp_fatan2 bit-identical results are returned in all exceptional cases; for all other binary functions, bit-identical results are returned in all cases.
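
Exhaustive testing of a unary single-precision function is feasible because there are only 2^32 possible arguments. The following C sketch shows the general shape of such a harness; it is not the actual Qfplib-M3 test code, and the two ‘half’ functions are hypothetical stand-ins so that the example is self-contained on a host machine. In real use one would pass the function under test and a trusted reference instead.

```c
#include <stdint.h>
#include <string.h>

typedef float (*unary_fn)(float);

static uint32_t to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Count the argument bit patterns in [first, last] for which the two
   functions do not return bit-identical results. */
uint64_t count_mismatches(unary_fn under_test, unary_fn reference,
                          uint32_t first, uint32_t last)
{
    uint64_t mismatches = 0;
    uint32_t bits = first;
    for (;;) {
        float x = from_bits(bits);
        if (to_bits(under_test(x)) != to_bits(reference(x)))
            mismatches++;
        if (bits == last)       /* test before incrementing so that   */
            break;              /* last = 0xFFFFFFFF does not wrap    */
        bits++;
    }
    return mismatches;
}

/* Stand-in reference: halve using the host's native multiplication. */
float half_reference(float x)
{
    return x * 0.5f;
}

/* Stand-in function under test: halve by decrementing the exponent field.
   Deliberately wrong for zeros, denormals and the smallest normals. */
float half_by_exponent(float x)
{
    return from_bits(to_bits(x) - 0x00800000u);
}
```

Calling count_mismatches(half_by_exponent, half_reference, 0x01000000, 0x01FFFFFF) covers about 16 million normal arguments and finds no differences, whereas the range 0x00000001 to 0x000000FF (denormals) exposes the deliberate bug on every argument. Passing 0 and 0xFFFFFFFF covers every bit pattern.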

Implementation of the IEEE 754 standard

Qfplib-M3 correctly treats signed zeros, denormals, infinities and NaNs according to the IEEE 754 standard. The results of the addition, subtraction, multiplication, division and square root operations are correctly rounded (to nearest, even-on-tie). Other rounding modes and traps are not supported.
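
A handful of test vectors makes these requirements concrete for addition. The C sketch below is a host-side illustration only, with names of my own choosing. It checks any function with the signature float f(float, float) against results that IEEE 754 round-to-nearest-even arithmetic requires. Here the host's native addition serves as a stand-in; on a target one could pass qfp_fadd instead, on the assumption (not confirmed on this page) that its prototype is float qfp_fadd(float, float).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef float (*binary_fn)(float, float);

static uint32_t bits_of(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float float_of(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* A NaN has an all-ones exponent field and a non-zero fraction. */
static int is_nan_bits(uint32_t u)
{
    return (u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0;
}

/* Return the number of vectors on which `add` gives the wrong result. */
int count_add_failures(binary_fn add)
{
    static const struct { uint32_t a, b, want; int want_nan; } vectors[] = {
        {0x3F800000u, 0x33800000u, 0x3F800000u, 0}, /* 1 + 2^-24: tie, stays at even 1     */
        {0x3F800001u, 0x33800000u, 0x3F800002u, 0}, /* (1+2^-23) + 2^-24: tie, up to even  */
        {0x00000000u, 0x80000000u, 0x00000000u, 0}, /* (+0) + (-0) = +0                    */
        {0x80000000u, 0x80000000u, 0x80000000u, 0}, /* (-0) + (-0) = -0                    */
        {0x00000001u, 0x00000001u, 0x00000002u, 0}, /* smallest denormals add exactly      */
        {0x7F7FFFFFu, 0x7F7FFFFFu, 0x7F800000u, 0}, /* overflow gives +infinity            */
        {0x7F800000u, 0xFF800000u, 0x00000000u, 1}, /* (+inf) + (-inf) gives a NaN         */
    };
    int failures = 0;
    for (size_t i = 0; i < sizeof vectors / sizeof vectors[0]; i++) {
        uint32_t got = bits_of(add(float_of(vectors[i].a), float_of(vectors[i].b)));
        int ok = vectors[i].want_nan ? is_nan_bits(got) : (got == vectors[i].want);
        if (!ok)
            failures++;
    }
    return failures;
}

/* Stand-in: the host's own addition, forced to single precision. */
float native_add(float a, float b)
{
    volatile float sum = a + b;
    return sum;
}
```

On a host with IEEE 754 arithmetic in its default rounding mode, count_add_failures(native_add) returns 0.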

Other functions

You may also be interested in the qfp_fcmp, qfp_float2int, qfp_float2fix, qfp_int2float, qfp_fix2float, qfp_float2uint, qfp_float2ufix, qfp_uint2float, qfp_ufix2float, qfp_float2str and qfp_str2float functions provided as part of the Qfplib-M0-tiny library.

Download

Release 20160408.


This page most recently updated Fri 5 Jan 10:25:31 GMT 2024

Copyright ©2004–2024.
All trademarks used are hereby acknowledged.