Creating a conda package with compilation and dependencies

I’ve just finished, what was for me, a difficult compiler/packaging attempt – creating a working bioconda package for seer. You can look at the pull request to see how many times I failed: https://github.com/bioconda/bioconda-recipes/pull/11263

(I would note I have made this package for legacy users. I would direct anyone interested in the software itself to the reimplementation pyseer)

The reason this was difficult was due to my own inclusion of a number of packages, all of which also need compilation, further adding to the complexity. While most of them were already available in some form in anaconda it turned out there were some issues with using the defaults. I want to note some of the things I had to modify here.

gcc and glibc version

seer requires gcc >4.8 to compile, and glibc > 4.9 to run. The default compiler version in conda is 4.8. Simply add a ‘conda_build_config.yaml’ file containing:

c_compiler:
  - gcc # [linux]
cxx_compiler:
  - gxx # [linux]

Submodules

I had dlib and gzstream as submodules. If you use a git_url as the source these clone recursively, but not with a standard url in meta.yaml. I needed to do ‘git clone –recursive’ with repository and tarball it myself to include these directories in the git hub release.

gzstream

Is not available on the bioconda channels so I had to compile myself. I included this as a submodule, but rather than using the default Makefile I needed to add the conda defined compiler flags to ensure these were consistent with later steps (particularly -fPIC in CPPFLAGS).

make CXX="$CXX $CXXFLAGS" CPPFLAGS="$CPPFLAGS -I. -I$PREFIX/include" AR="$AR cr"

boost

I was attempting to link boost_program_options using either the boost or boost-cpp anaconda packages, which unlike most boost libraries requires compiling. This led to undefined symbols at the linking stage, which I think are due to incompatible compiler (options) used to make the dynamic libraries in the versions currently available on anaconda. This turned out to be the most difficult thing to fix, requiring me to compile boost as part of the recipe.

Rather than downloading and compiling everything, I followed the boost github examples and made a shallow clone, with a fully copy of the boost library I’m using:

git clone --depth 1 https://github.com/boostorg/boost.git boost
rmdir libs/program_options
cd boost
git clone --depth 50 https://github.com/boostorg/program_options.git libs/program_options
git submodule update -q --init tools/boostdep
git submodule update -q --init tools/build
git submodule update -q --init tools/inspect

I then included this in the release tarball. A better way may be to use submodules so this is done automatically using –recursive.

This library needed to be built, but I did so in a work directory to avoid installing unexpected packages with the recipe. Following the conda-forge build.sh for boost-cpp:


pushd boost
python2 tools/boostdep/depinst/depinst.py program_options --include example
CXXFLAGS="${CXXFLAGS} -fPIC"
INCLUDE_PATH="${PREFIX}/include"
LIBRARY_PATH="${PREFIX}/lib"
LINKFLAGS="${LINKFLAGS} -L${LIBRARY_PATH}"
cat < tools/build/src/site-config.jam
using gcc : custom : ${CXX} ;
EOF
BOOST_BUILT="${SRC_DIR}/boost_built"
mkdir $BOOST_BUILT
./bootstrap.sh --prefix="${BOOST_BUILT}" --with-libraries=program_options --with-toolset=gcc
./b2 -q \
variant=release \
address-model="${ARCH}" \
architecture=x86 \
debug-symbols=off \
threading=multi \
runtime-link=shared \
link=static,shared \
toolset=gcc-custom \
include="${INCLUDE_PATH}" \
cxxflags="${CXXFLAGS}" \
linkflags="${LINKFLAGS}" \
--layout=system \
-j"${CPU_COUNT}" \
install
popd

The python2 line sorts out the header libraries required to compile, not included in the shallow clone. The rest are standard methods to install boost, ensuring the same compiler flags as the other compiled code and using the conda compiler.

I then needed to link this boost library statically (leaving the rest dynamic), so modified the make line as follows:

make CXXFLAGS="$CXXFLAGS" \
  SEER_LDLIBS="-L../gzstream -L${BOOST_BUILT}/lib -L/usr/local/hdf5/lib \
  -lhdf5 -lgzstream -lz -larmadillo -Wl,-Bstatic -lboost_program_options \
  -Wl,-Bdynamic -lopenblas -llapack -lpthread"

armadillo

The final trick was linking armadillo correctly. Confusingly it built and linked ok, tested ok locally, but on the bioconda CI I got undefined symbols to lapack at runtime:


seer: symbol lookup error: seer: undefined symbol: wrapper_dgbsv_

This was due to armadillo’s wrapper around its include which links in the versions of blas/openblas and lapack defined at the time it was compiled, which I think must be slightly different from what is now included with the armadillo package dependencies on conda. Easy enough to fix, use a compiler flag to turn the wrapper off and link the libraries manually:

make CPPFLAGS="${CPPFLAGS} -DARMA_DONT_USE_WRAPPER" \
  LDFLAGS="${LDFLAGS} -larmadillo -lopenblas -llapack"

After all of that, it finally worked!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s