• Undefined reference to __xxx_finite - why and how?

    This article explains the undefined reference to __xxx_finite problem, what is it, why it happens, and how to solve it.
    All the sources codes in this article has been placed in wangyoucao577/libmath-finite, which is a small library intends to solve this problem. Feel free to try it by yourself, file issue in the project if any question.

    The problem

    Someday when I try to build a project on my new computer that runs Debian 11, a lot of undefined reference errors occurred, which leads to symbols missed. At a first glance I find that all the missed symbols have _finite suffix and looks like math functions, for example, undefined reference to __pow_finite. But why?

    Reproduction

    Let’s try to reproduce it step-by-step.

    Test enviornments

    First of all, I setup two different enviornment as below to reproduce this problem:

    • Debian 11
    $ gcc --version
    gcc (Debian 10.2.1-6) 10.2.1 20210110
    ...
    
    $ ldd --version
    ldd (Debian GLIBC 2.31-13+deb11u2) 2.31
    ...
    
    $ ll /lib/x86_64-linux-gnu/libc-*.so
    -rwxr-xr-x 1 root root 1839792 Oct  2 12:47 /lib/x86_64-linux-gnu/libc-2.31.so
    
    • Debian 10
    $ gcc --version
    gcc (Debian 8.3.0-6) 8.3.0
    ...
    
    $ ldd --version
    ldd (Debian GLIBC 2.28-10) 2.28
    ...
    
    $ ll /lib/x86_64-linux-gnu/libc-*.so
    -rwxr-xr-x 1 root root 1824496 May  1  2019 /lib/x86_64-linux-gnu/libc-2.28.so
    

    Please noted that ldd --version indicates the version of libc and other included libraries, including libm, etc.

    Normal routine

    Let’s say if we have a library libfoo and an app uses it. The codes may like this:

    // libfoo/foo.h
    double foo(double x, double y);
    
    // libfoo/foo.c
    #include "foo.h"
    #include <math.h>
    double foo(double x, double y) {
        return pow(x, y);
    }
    
    // app/main.c
    #include "foo.h"
    #include <stdio.h>
    int main () {
        printf("foo(2.0, 5.0) = %f\n", foo(2.0, 5.0)); 
        return 0;
    }
    

    Then build and run it:

    # we put main.c in app/ and foo.* in libfoo
    $ tree
    .
    |-- app
    |   `-- main.c
    `-- libfoo
        |-- foo.c
        `-- foo.h
    
    # build libfoo.a
    $ gcc -c -ffast-math -o foo.o libfoo/foo.c
    $ ar rcs libfoo.a foo.o
    
    # build app
    $ gcc -Ilibfoo -L. -o app.out app/main.c -lfoo -lm
    
    # run app
    $ chmod +x ./app.out && ./app.out
    foo(2, 5) = 32.000000
    

    Now we see the app calls foo successfully and output foo(2.0, 5.0) = 32.000000. It works well on both Debian 10 and Debian 11.
    Be aware that we build libfoo.a with -ffast-math to leverage the GCC FloatingPointMath Optimiation.

    Build libfoo on Debian 10

    Now let’s try to build the libfoo on Debian 10.

    # build libfoo.a on Debian 10 
    $ gcc -c -ffast-math -o foo.o libfoo/foo.c
    $ ar rcs libfoo.a foo.o
    
    # build app on Debian 11, with `libfoo.a` copied from Debian 10
    $ gcc -Ilibfoo -L. -o app.out app/main.c -lfoo -lm
    /usr/bin/ld: libfoo-prebuilt-debian10/libfoo.a(foo.o): in function `foo':
    foo.c:(.text+0x1d): undefined reference to __pow_finite
    collect2: error: ld returned 1 exit status
    
    

    It happens! But why?

    Analysis

    Let’s look at the libfoo.a to check what inside it and what it need.

    # libfoo.a built on Debian 10
    $ nm libfoo.a
    
    foo.o:
                     U _GLOBAL_OFFSET_TABLE_
                     U __pow_finite
    0000000000000000 T foo
    

    The nm commands show that libfoo.a includes a foo function that matchs above source code. Also, it needs an external symbol __pow_finite, which is the error reported. But if the symbol is missed, why it works well if it’s compiled on Debian 11? Let’s checkout the libfoo.a that built on Debain 11.

    # libfoo.a built on Debian 11
    $ nm libfoo.a
    
    foo.o:
                     U _GLOBAL_OFFSET_TABLE_
    0000000000000000 T foo
                     U pow
    

    The libfoo.a built on Debian 11 uses pow instead of __pow_finite, that’s why it works well.
    Now let’s compare the two versions of libm to see what’s the different on this symbol.

    # libm-2.31 on Debian 11
    $ nm -D /lib/x86_64-linux-gnu/libm-2.31.so | grep __pow_finite
    000000000002e450 i __pow_finite@GLIBC_2.15
    
    # libm-2.28 on Debian 10
    $ nm -D /lib/x86_64-linux-gnu/libm-2.28.so | grep __pow_finite
    000000000002d3e0 i __pow_finite
    
    

    The libm-2.28 on Debian 10 has a default __pow_finite, so that app and link correctly. But libm-2.31 on Debian 11 only has an __pow_finite@GLIBC_2.15 for backward compatibility, which can NOT be linked by default. How the GNU C Library handles backward compatibility explains the glibc backward compatibility handling.

    After some research, Proposal: Remove or reduce math-finite.h tells us that the _finite names are just aliases of the normal name and so the asm attribute isn’t accompishing anything, and the sourceware.org - glibc.git - commit: remove math-finite.h redirections for math functions removes them completely. It’s actually an ABI change of glibc libraries, but doesn’t properly handled.

    How to solve

    Now we know why it happens, and there’re several ways to solve it.

    If you can change the source codes

    Simply add -fno-finite-math-only in compilation. For example,

    # build libfoo.a on Debian 10 
    $ gcc -c -ffast-math -fno-finite-math-only -o foo.o libfoo/foo.c
    $ ar rcs libfoo.a foo.o
    $ nm libfoo.a
    
    foo.o:
                     U _GLOBAL_OFFSET_TABLE_
    0000000000000000 T foo
                     U pow
    

    The -fno-finite-math-only flag tells compiler that don’t use finite-math functions only, so that the result libfoo.a relis on pow directly, and it works.

    If you only has an pre-built .a/.so that requires __xxx_finite functions

    You can not change the .a/.so’s source code and building process.

    Way1. Copy the libmath-finite.c/h into your project and compile them together

    For example,

    # build the app with `libmath-finite.c/h` together   
    $ gcc  -Ilibfoo -L. -I../.. -o app.out app/main.c ../../libmath-finite.c -lfoo -lm
    $ 
    $ nm app.out  | grep __pow_finite
    00000000000012a4 T __pow_finite
    

    In this case, the __pow_finite symbol has been defined in the app directly.

    For example,

    # build `libmath-finite.a`    
    $ gcc -c -o libmath-finite.o libmath-finite.c
    $ ar rcs libmath-finite.a libmath-finite.o
    
    # build app that linking `libmath-finite.a`
    $ gcc  -Ilibfoo -L. -L../.. -o app.out app/main.c -lfoo -lmath-finite -lm
    

    Be aware that -lmath-finite has to be placed after -lfoo and before -lm, otherwise the symbol can not be found correctly.

    For example,

    # build `libmath-finite.so`    
    $ gcc -c -fPIC -o libmath-finite.o libmath-finite.c
    $ gcc -shared -fPIC -o libmath-finite.so libmath-finite.o -lm
    
    # build app that linking `libmath-finite.a`
    $ gcc  -Ilibfoo -L. -L../.. -o app.out app/main.c -lfoo -lmath-finite -lm
    
    # run app
    $ export LD_LIBRARY_PATH=$(pwd)/../..
    $ ./app.out
    foo(2.0, 5.0) = 32.000000
    

    Same with Way2, the libmath-finite.so should be linked correctly. On the other hand, the libmath-finite.so should be distributed with your program together, and need to be find by your program when run. In this example, export LD_LIBRARY_PATH=$(pwd)/../.. is helpful to make app.out find the libmath-finite.so.

    Conclusion

    This problem is due to an ABI change of glibc between 2.30 and 2.31. You may fix it by various ways described above. Thanks!

    References

  • sum2bits - variable-precision SWAR algorithm

    This post is intend to understand a piece of code of the OSRM project, which also posted in Telenav/open-source-spec/sum2bits-swar.
    Checkout Project-OSRM/OSRM-backend, Telenav/osrm-backend and Telenav/open-source-spec for more if you’re interested.

    When read the .osrm.names processing related codes, I found a function named sum2bits() that looks a little strange to me. It’s used for summation of 16 2-bit values using SWAR according to comment, but what I can see in codes are a lot of bitwise operations with some magic numbers. It looks no sense to me.
    What does it exactly do? Let’s try to understand it step-by-step.

        /// Summation of 16 2-bit values using SWAR
        inline std::uint32_t sum2bits(std::uint32_t value) const
        {
            value = (value >> 2 & 0x33333333) + (value & 0x33333333);
            value = (value >> 4 & 0x0f0f0f0f) + (value & 0x0f0f0f0f);
            value = (value >> 8 & 0x00ff00ff) + (value & 0x00ff00ff);
            return (value >> 16 & 0x0000ffff) + (value & 0x0000ffff);
        }
    

subscribe via RSS