Native Compilation of Numba on M1-based Mac
This is cross-published on my HackMD
Motivation
The current installation instructions for Numba run into difficulties caused by the way Apple packages its software development kit (SDK). This note documents how to set Numba up from start to finish on an M1-based Mac, and where alternative approaches fail. We build with OpenMP support to use all available threads, but without support for Threading Building Blocks (TBB), which is unsupported on the Mac, and likewise without CUDA, which is not available on the Mac either.
Building Numba from Source for Local Development with OpenMP-Support
To build Numba on a Mac (or on any platform, for that matter), we begin by creating a conda environment with the base dependencies:
conda create -n numbaenv python=3.10 numba/label/dev::llvmlite numpy scipy jinja2 cffi
and activate the environment
conda activate numbaenv
At this point we have installed llvmlite, the LLVM JIT engine underpinning Numba, and can verify its installation from a Python REPL:
import llvmlite
llvmlite.__version__
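The same check can be scripted, which is handy when re-creating the environment repeatedly. A minimal sketch using only the standard library's importlib.metadata; installed_version and llvm_version are hypothetical helper names, and llvm_version relies on llvmlite.binding.llvm_version_info, the LLVM version llvmlite was built against:

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def llvm_version() -> Optional[Tuple[int, ...]]:
    """LLVM version that llvmlite was built against, or None without llvmlite."""
    try:
        from llvmlite import binding
    except ImportError:
        return None
    return tuple(binding.llvm_version_info)


if __name__ == "__main__":
    print("llvmlite:", installed_version("llvmlite"))
    print("LLVM:", llvm_version())
```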
We can now clone the source of Numba.
git clone git@github.com:numba/numba.git && cd numba
Preemptively disable Threading Building Blocks
export NUMBA_DISABLE_TBB=1
Point the build to the macOS SDK, whose path can be found with xcrun --show-sdk-path:
export SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
The crucial hint to set SDKROOT came from Numba's conda build scripts. Downloading an older SDK, as those build scripts do, is not necessary on a modern Mac if the SDK libraries have already been installed.
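Rather than hard-coding the Xcode path, the SDK location can be queried when the build starts. In the shell, export SDKROOT=$(xcrun --show-sdk-path) does this in one line; the sketch below does the same from Python, assuming xcrun is on the PATH (it ships with Xcode and the Command Line Tools). find_sdk_path and export_sdkroot are hypothetical helper names:

```python
import os
import shutil
import subprocess
from typing import Mapping, Optional


def find_sdk_path() -> Optional[str]:
    """Ask xcrun for the active macOS SDK path; None if xcrun is unavailable."""
    xcrun = shutil.which("xcrun")
    if xcrun is None:
        return None  # not on macOS, or the Command Line Tools are missing
    result = subprocess.run(
        [xcrun, "--show-sdk-path"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


def export_sdkroot(env: Mapping[str, str]) -> dict:
    """Copy of env with SDKROOT set from xcrun, unless SDKROOT is already set."""
    new_env = dict(env)
    if "SDKROOT" not in new_env:
        sdk = find_sdk_path()
        if sdk is not None:
            new_env["SDKROOT"] = sdk
    return new_env


if __name__ == "__main__":
    print(export_sdkroot(os.environ).get("SDKROOT", "SDKROOT not set"))
```

Returning a copy instead of mutating os.environ keeps the setting scoped to whatever subprocess the dictionary is later passed to.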
Then install the Clang compilers that conda provides for macOS, so that the build does not fall back to Apple's system compiler. Apple's Clang ships with OpenMP disabled: it cannot find the OpenMP headers and does not accept the -fopenmp flag at compile time.
conda install clang clangdev
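The compiler limitation can be checked directly by compiling a tiny OpenMP program with each compiler. A sketch, assuming /usr/bin/clang is Apple's compiler and that the conda clang is the first clang on PATH once numbaenv is active; supports_openmp is a hypothetical helper that only compiles (-c) and does not link:

```python
import os
import shutil
import subprocess
import tempfile

# Tiny OpenMP translation unit: needs both omp.h and the -fopenmp flag
OMP_PROBE = r"""
#include <omp.h>
int main(void) {
    int threads = 0;
#pragma omp parallel
    {
#pragma omp single
        threads = omp_get_num_threads();
    }
    return threads > 0 ? 0 : 1;
}
"""


def supports_openmp(compiler: str = "clang") -> bool:
    """True if `compiler` compiles the probe with -fopenmp (compile only, no link)."""
    exe = shutil.which(compiler)
    if exe is None:
        return False  # compiler not found
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "omp_probe.c")
        with open(src, "w") as f:
            f.write(OMP_PROBE)
        result = subprocess.run(
            [exe, "-fopenmp", "-c", src, "-o", os.path.join(tmp, "omp_probe.o")],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0


if __name__ == "__main__":
    # Apple's compiler vs. whichever clang comes first on PATH
    for cc in ("/usr/bin/clang", "clang"):
        print(cc, supports_openmp(cc))
```

Given the behaviour described above, the Apple compiler should report False.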
The command to build Numba itself for development purposes, with --noopt (no optimizations) and --debug (debugging options enabled), is then:
python setup.py build_ext --inplace --noopt --debug
If we just want to use Numba locally, without the development-focused deactivation of optimizations, we build with only the --inplace option:
python setup.py build_ext --inplace
After that, we can install Numba into our environment in editable (development) mode:
python -m pip install --no-deps -e .
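The environment variables and build commands above can be collected into one small driver script, which keeps NUMBA_DISABLE_TBB and SDKROOT from leaking into the rest of the shell session. A sketch under that assumption; build_command, build_env and build_and_install are hypothetical names, and the script is meant to be run with the numbaenv interpreter so that sys.executable points into that environment:

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping


def build_command(dev: bool = True) -> List[str]:
    """setup.py invocation; --noopt/--debug only for development builds."""
    cmd = [sys.executable, "setup.py", "build_ext", "--inplace"]
    if dev:
        cmd += ["--noopt", "--debug"]
    return cmd


def build_env(base: Mapping[str, str], sdkroot: str) -> Dict[str, str]:
    """Copy of `base` with TBB disabled and SDKROOT pointing at the macOS SDK."""
    env = dict(base)
    env["NUMBA_DISABLE_TBB"] = "1"
    env["SDKROOT"] = sdkroot
    return env


def build_and_install(repo: str, sdkroot: str, dev: bool = True) -> None:
    """Build the extensions in place, then do an editable install without deps."""
    env = build_env(os.environ, sdkroot)
    subprocess.run(build_command(dev), cwd=repo, env=env, check=True)
    # Dependencies already come from conda, hence --no-deps
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--no-deps", "-e", "."],
        cwd=repo,
        env=env,
        check=True,
    )
```

For a local, optimized build one would call build_and_install("numba", sdkroot, dev=False) from the directory containing the clone, with sdkroot set to the output of xcrun --show-sdk-path.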
We can then verify the installation either from the REPL with
import numba
numba.__version__
or from the command line with Numba’s provided utility
numba -s
The output should look similar to this
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time) : 2022-11-26 23:48:15.001253
UTC start time : 2022-11-27 05:48:15.001264
Running time (s) : 0.718704
__Hardware Information__
Machine : arm64
CPU Name : cyclone
CPU Count : 10
Number of accessible CPUs : ?
List of accessible CPUs cores : ?
CFS Restrictions (CPUs worth of runtime) : None
CPU Features :
Memory Total (MB) : 32768
Free Memory (MB) : 23
__OS Information__
Platform Name : macOS-13.0.1-arm64-arm-64bit
Platform Release : 22.1.0
OS Name : Darwin
OS Version : Darwin Kernel Version 22.1.0: Sun Oct 9 20:15:09 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T6000
OS Specific Version : 13.0.1 arm64
Libc Version : ?
__Python Information__
Python Compiler : Clang 14.0.6
Python Implementation : CPython
Python Version : 3.10.8
Python Locale : None.UTF-8
__Numba Toolchain Versions__
Numba Version : 0.57.0dev0+846.g728263512
llvmlite Version : 0.40.0dev0+43.g7783803
__LLVM Information__
LLVM Version : 11.1.0
__CUDA Information__
CUDA Device Initialized : False
CUDA Driver Version : ?
CUDA Runtime Version : ?
CUDA NVIDIA Bindings Available : ?
CUDA NVIDIA Bindings In Use : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None
__NumPy Information__
NumPy Version : 1.23.4
NumPy Supported SIMD features : ('NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD', 'FPHP', 'ASIMDHP', 'ASIMDDP')
NumPy Supported SIMD dispatch : ('ASIMDHP', 'ASIMDDP', 'ASIMDFHM')
NumPy Supported SIMD baseline : ('NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD')
NumPy AVX512_SKX support detected : False
__SVML Information__
SVML State, config.USING_SVML : False
SVML Library Loaded : False
llvmlite Using SVML Patched LLVM : True
SVML Operational : False
__Threading Layer Information__
TBB Threading Layer Available : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available : True
+-->Vendor: Intel
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.
__Numba Environment Variable Information__
None found.
__Conda Information__
Conda Build : 3.21.9
Conda Env : 4.13.0
Conda Platform : osx-arm64
Conda Python Version : 3.9.11.final.0
Conda Root Writable : True
__Installed Packages__
blas 1.0 openblas
bzip2 1.0.8 h620ffc9_4
ca-certificates 2022.10.11 hca03da5_0
certifi 2022.9.24 py310hca03da5_0
cffi 1.15.1 py310h80987f9_2
clang 14.0.6 hca03da5_0
clang-14 14.0.6 default_hf5194b7_0
clang-format 14.0.6 default_hf5194b7_0
clang-format-14 14.0.6 default_hf5194b7_0
clang-tools 14.0.6 default_hf5194b7_0
clangdev 14.0.6 default_hf5194b7_0
clangxx 14.0.6 default_hf5194b7_0
fftw 3.3.9 h1a28f6b_1
jinja2 3.1.2 py310hca03da5_0
libclang 14.0.6 default_hf5194b7_0
libclang-cpp 14.0.6 default_hf5194b7_0
libclang-cpp14 14.0.6 default_hf5194b7_0
libclang13 14.0.6 default_hf5a4b0a_0
libcxx 14.0.6 h848a8c0_0
libffi 3.4.2 hca03da5_6
libgfortran 5.0.0 11_3_0_hca03da5_28
libgfortran5 11.3.0 h009349e_28
libllvm14 14.0.6 h7ec7a93_1
libopenblas 0.3.21 h269037a_0
llvm-openmp 14.0.6 hc6e5704_0
llvm-tools 14.0.6 h7ec7a93_1
llvmdev 14.0.6 h7ec7a93_1
llvmlite 0.40.0dev0 py310_43 numba/label/dev
markupsafe 2.1.1 py310h1a28f6b_0
ncurses 6.3 h1a28f6b_3
numba 0.57.0.dev0+846.g728263512 dev_0 <develop>
numpy 1.23.4 py310hb93e574_0
numpy-base 1.23.4 py310haf87e8b_0
openssl 1.1.1s h1a28f6b_0
pip 22.2.2 py310hca03da5_0
pycparser 2.21 pyhd3eb1b0_0
python 3.10.8 hc0d8a6c_1
readline 8.2 h1a28f6b_0
scipy 1.9.3 py310h20cbe94_0
setuptools 65.5.0 py310hca03da5_0
sqlite 3.40.0 h7a7dc30_0
tk 8.6.12 hb8d0fd4_0
tzdata 2022f h04d1e81_0
wheel 0.37.1 pyhd3eb1b0_0
xz 5.2.6 h1a28f6b_0
zlib 1.2.13 h5a0b063_0
No errors reported.
__Warning log__
Warning (cuda): CUDA driver library cannot be found or no CUDA enabled devices are present.
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
--------------------------------------------------------------------------------
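To confirm both that Python picks up the editable build from the cloned repository and that parallel code actually runs on a threading layer, a short script can import Numba, run one parallel kernel, and ask Numba which layer it chose via numba.threading_layer(). A sketch; numba_origin, parallel_sum and active_threading_layer are hypothetical names:

```python
import importlib.util
from typing import Optional

try:
    import numba
    import numpy as np
except ImportError:  # numba is not installed in this interpreter
    numba = None


def numba_origin() -> Optional[str]:
    """File Python will import numba from (inside the clone for an editable install)."""
    spec = importlib.util.find_spec("numba")
    return spec.origin if spec is not None else None


if numba is not None:

    @numba.njit(parallel=True)
    def parallel_sum(x):
        total = 0.0
        for i in numba.prange(x.shape[0]):  # reduction over a parallel loop
            total += x[i]
        return total


def active_threading_layer() -> Optional[str]:
    """Run one parallel kernel, then ask Numba which threading layer it used."""
    if numba is None:
        return None
    parallel_sum(np.arange(1_000, dtype=np.float64))
    return numba.threading_layer()


if __name__ == "__main__":
    print("numba imported from:", numba_origin())
    print("threading layer:", active_threading_layer())
```

On the setup above, numba_origin() should point into the cloned numba directory. Since TBB is disabled and Numba's default search order tries tbb, then omp, then workqueue, active_threading_layer() would be expected to return "omp".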
Approaches not to take (for your own sanity)
I attempted several other approaches but ended up abandoning them, both because they were unstable and because they inevitably pull in headers from several directories with overlapping contents at the same time.
LLVM build from source for the Compiler
To circumvent the limitations of Apple's native Clang compiler, we can try to build our own LLVM toolchain. As llvmlite is based on a patched version of the release/14.x branch of LLVM, checking out the LLVM source on that branch is the most natural option:
git clone https://github.com/llvm/llvm-project && cd llvm-project
git checkout release/14.x
mkdir build && cd build
To then build the Clang toolchain from source for the M1-based Mac, including the OpenMP libraries, we configure the build with CMake and run Ninja:
cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;openmp" \
  -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
  -DLLVM_LINK_LLVM_DYLIB=ON \
  -DLLVM_ENABLE_EH=ON \
  -DLLVM_ENABLE_FFI=ON \
  -DLLVM_ENABLE_RTTI=ON \
  -DLLVM_INCLUDE_DOCS=OFF \
  -DLLVM_INSTALL_UTILS=ON \
  -DLLVM_ENABLE_Z3_SOLVER=OFF \
  -DLLVM_TARGETS_TO_BUILD="AArch64" \
  -DLIBOMP_INSTALL_ALIASES=OFF \
  -DLLVM_CREATE_XCODE_TOOLCHAIN=ON \
  -DLLVM_BUILD_LLVM_C_DYLIB=ON \
  -DLLVM_ENABLE_LIBCXX=ON \
  -DRUNTIMES_CMAKE_ARGS="-DCMAKE_INSTALL_RPATH=@loader_path/../lib"
ninja
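We then have to prepend the toolchain's directories to $PATH and $LD_LIBRARY_PATH so that they are found before we continue the attempt to build Numba with this toolchain (on macOS, the dynamic loader itself consults DYLD_LIBRARY_PATH rather than LD_LIBRARY_PATH):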
export PATH=${PWD}/bin:$PATH
export LD_LIBRARY_PATH=${PWD}/lib:$LD_LIBRARY_PATH
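Since Apple's clang in /usr/bin and the self-built one share the same name, it helps to confirm which one will actually be picked up after PATH has been modified. A sketch; resolve_tool is a hypothetical helper, and the bin directory is assumed to be the one inside llvm-project/build used in the export above:

```python
import os
import shutil
from typing import Iterable, Optional


def resolve_tool(name: str, prepend: Iterable[str], base_path: str) -> Optional[str]:
    """Which `name` would be run once `prepend` is put in front of `base_path`."""
    # Drop empty entries so an empty base_path does not add a bogus search dir
    dirs = [d for d in [*prepend, *base_path.split(os.pathsep)] if d]
    return shutil.which(name, path=os.pathsep.join(dirs))


if __name__ == "__main__":
    toolchain_bin = os.path.abspath("bin")  # run from inside llvm-project/build
    print("before:", shutil.which("clang"))
    print("after: ", resolve_tool("clang", [toolchain_bin], os.environ.get("PATH", "")))
```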
So where does this begin to fail?
- At first we are missing the stdio.h header, which we can get from the Xcode developer SDK by manually pointing the compiler to the SDK's header directory.
- Where this approach really collapses is at the next step, where we run into conflicting versions of the same headers; avoiding them would require many manual include and linker flags so that the compiler does not search entire directories for headers.
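The header clash can be made visible by listing every header that more than one include directory provides; for an #include, the compiler uses whichever matching directory comes first in its search order. A sketch; conflicting_headers is hypothetical, and the two directories in the example are placeholders for the SDK headers and the self-built toolchain's headers:

```python
import os
from collections import defaultdict
from typing import Dict, List, Sequence


def conflicting_headers(include_dirs: Sequence[str]) -> Dict[str, List[str]]:
    """Headers (by path relative to their include dir) provided by 2+ directories."""
    providers: Dict[str, List[str]] = defaultdict(list)
    for root in include_dirs:
        for dirpath, _dirnames, filenames in os.walk(root):
            for fname in filenames:
                rel = os.path.relpath(os.path.join(dirpath, fname), root)
                providers[rel].append(root)
    # Only headers that more than one directory claims can shadow each other
    return {hdr: roots for hdr, roots in providers.items() if len(roots) > 1}


if __name__ == "__main__":
    # Placeholder directories; nonexistent ones are simply skipped by os.walk
    dirs = [
        "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform"
        "/Developer/SDKs/MacOSX.sdk/usr/include",
        os.path.abspath("include"),
    ]
    for header, roots in sorted(conflicting_headers(dirs).items()):
        print(header, "->", roots)
```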
Python Virtualenv-based Install
Another seemingly logical approach would be to use a plain Python virtual environment, avoiding conda's isolation with its own libraries in favor of a very thin virtualenv. To do this, we'd run
python3 -m venv numbaenv && source numbaenv/bin/activate
pip install llvmlite
at which point we can clone the Numba repository and start a CPU-only build which, lacking both Threading Building Blocks and OpenMP, can only use Numba's own workqueue threading layer:
python setup.py build_ext --inplace --noopt --debug
So where does this particular approach begin to fail?
- The Numba library built from main is incompatible with the llvmlite release installed by pip, so the two do not work together.
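A quick way to catch this mismatch before building is to compare the installed llvmlite version against the range the Numba checkout expects. A sketch; release_tuple and llvmlite_compatible are hypothetical, and the bounds used below (llvmlite 0.40.x for a Numba 0.57 development build) are an assumption read off the numba -s output above, not Numba's authoritative requirement. The parser deliberately ignores pre-release tags such as dev0; packaging.version would give full PEP 440 semantics:

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release components, e.g. '0.40.0dev0' -> (0, 40, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def llvmlite_compatible(installed: str, minimum: str, below: str) -> bool:
    """True if minimum <= installed < below, comparing release components only."""
    return release_tuple(minimum) <= release_tuple(installed) < release_tuple(below)


if __name__ == "__main__":
    # Illustrative bounds only (assumed pairing: Numba 0.57 dev with llvmlite 0.40.x)
    print(llvmlite_compatible("0.40.0dev0", "0.40", "0.41"))
```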
Setting CFLAGS and LDFLAGS as suggested on Stack Overflow
A typical suggestion on Stack Overflow and similar sites is to set compiler and linker flags on the command line, so that the compiler finds the header files it is missing, and to set a deployment target, for example:
export CFLAGS="-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include"
export LDFLAGS=-L/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib
export MACOSX_DEPLOYMENT_TARGET=14.1
So why is this approach undesirable and ultimately fails?
- We end up patching the first missing headers, such as stdio.h, but the next missing header is then vector.h, and the deluge of missing headers does not seem to stop. I would therefore call this an unclean approach, and I ultimately did not manage to get it to work.
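Instead of discovering missing headers one failed build at a time, each header can be preprocessed in isolation with the candidate flags. A sketch; probe_headers is hypothetical, and it runs the compiler's preprocessor (-E) on a one-line #include read from standard input:

```python
import shutil
import subprocess
from typing import Dict, Iterable, Sequence


def probe_headers(headers: Iterable[str], compiler: str = "clang",
                  flags: Sequence[str] = (), lang: str = "c") -> Dict[str, bool]:
    """Preprocess `#include <header>` for each header; True if it was found."""
    exe = shutil.which(compiler)
    results: Dict[str, bool] = {}
    for header in headers:
        if exe is None:
            results[header] = False  # no compiler, so nothing can be found
            continue
        proc = subprocess.run(
            [exe, "-x", lang, *flags, "-E", "-"],  # "-" reads source from stdin
            input=f"#include <{header}>\n",
            capture_output=True,
            text=True,
        )
        results[header] = proc.returncode == 0
    return results


if __name__ == "__main__":
    print(probe_headers(["stdio.h"]))
    # The C++ standard library header is spelled <vector>
    print(probe_headers(["vector"], lang="c++"))
```

Passing flags such as ["-isysroot", sdk_path] shows whether pointing at the SDK resolves a given header without exporting CFLAGS for the whole session.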