Dockerize Python module with shared object dependencies using multi-stage builds

So you want to use multi-stage Docker builds for your Python application? You got rid of all your DEV dependencies by using a separate runtime image? And then you ran into an error like

ImportError: xxx.so.x: cannot open shared object file: No such file or directory

Then this article is for you!

You will learn how to build multi-stage images with a minimal footprint while still shipping the shared objects that are needed at runtime.

Python multi-stage docker images

I won’t dig deep into this topic, as it is already covered by pythonspeed.com in their great post
Multi-stage builds #2: Python specifics—virtualenv, --user, and other methods.
Their approach was also a huge inspiration for me, so thank you!

Case Study: Packaging Vowpal Wabbit as Azure ML Environment

Straight from their homepage:

Vowpal Wabbit provides a fast, flexible, online, and active learning solution that empowers you to solve complex interactive machine learning problems.

https://vowpalwabbit.org/

My goal is to write a Dockerfile that can serve as a base image for an Azure ML Environment.
A requirement for the environment is an installed and compiled(!) version of Vowpal Wabbit.

So I looked up how to install Vowpal Wabbit on Ubuntu, applied the approach from pythonspeed.com and started with the following:

# Stage 1: build Vowpal Wabbit with all DEV dependencies available.
FROM mcr.microsoft.com/azureml/base:openmpi3.1.2-ubuntu18.04 AS compile-image

# Update and install in a single layer so the package index is never stale.
RUN apt-get update && apt-get -y install --no-install-recommends \
    libboost-dev libboost-program-options-dev libboost-system-dev libboost-thread-dev libboost-math-dev libboost-test-dev libboost-python-dev zlib1g-dev cmake

# Install into a virtual environment so it can be copied as a single directory.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

RUN python -m pip install --upgrade pip
RUN python -m pip install --no-cache-dir vowpalwabbit

# Stage 2: start from a clean base image and copy only the virtual environment.
FROM mcr.microsoft.com/azureml/base:openmpi3.1.2-ubuntu18.04 AS runtime-image
COPY --from=compile-image /opt/venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

First, the DEV dependencies are installed. Afterwards, the venv is created and the pip package of Vowpal Wabbit is installed into it. Finally, a new runtime-image stage is declared and the venv is copied over from the compile-image stage.

ImportError: libboost_python3-py36.so.1.65.1: cannot open shared object file: No such file or directory

Unfortunately, starting a container based on this image, opening a Python shell and running

from vowpalwabbit import pyvw

leads to

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/venv/lib/python3.7/site-packages/vowpalwabbit/pyvw.py", line 5, in <module>
    import pylibvw
ImportError: libboost_python3-py36.so.1.65.1: cannot open shared object file: No such file or directory

Looking at the Vowpal Wabbit repository shows that the Python package is, at its heart, a wrapper around a native C++ library. This library is compiled when the Python package is installed. Unfortunately, the compiled library has shared object dependencies of its own, which are not obvious (at least to me) and which are not part of the venv.
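To see which compiled files a package actually ships, a small helper can list every shared object below a site-packages directory. This is only a sketch: the function name `find_shared_objects` is my own, and it assumes that compiled extensions end in `.so`, as they do on Linux.

```python
from pathlib import Path


def find_shared_objects(site_packages):
    """Return all *.so files below a site-packages directory, sorted by path."""
    root = Path(site_packages)
    # rglob walks the directory tree recursively; extension modules such as
    # pylibvw.so or name.cpython-37m-x86_64-linux-gnu.so both end in ".so".
    return sorted(p for p in root.rglob("*.so") if p.is_file())
```

Inside the container from this article, `find_shared_objects("/opt/venv/lib/python3.7/site-packages")` should include the pylibvw shared object among its results.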

So, for me there are three issues to resolve:

  • find all dependencies of this library/shared object
  • find missing dependencies in the multi-stage Docker image
  • ensure that they are part of the final docker image

Find dependencies

Inspired by the great post from pipwheels, I applied the following steps:

  • Remove the runtime-image and everything below from the Dockerfile
  • Rebuild the image
  • Start a container from the image and navigate to /opt/venv/lib/python3.7/site-packages
  • Call ldd pylibvw.so

This results in a list of shared object dependencies, all of which are resolved. This is expected, as all DEV dependencies are installed in this version of the Docker image.
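If you prefer to inspect the result programmatically, the ldd call can be wrapped in a few lines of Python. The following is a minimal sketch under the assumption that ldd prints its usual `name => path (address)` lines; the helper names `parse_ldd_output` and `ldd` are hypothetical.

```python
import subprocess


def parse_ldd_output(text):
    """Map each library name in ldd output to its resolved path (None if not found)."""
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        if "=>" not in line:
            # Entries like the dynamic loader carry no "=>" mapping; skip them.
            continue
        name, _, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("not found"):
            deps[name] = None
            continue
        # Drop the trailing load address, e.g. "(0x00007f0a...)".
        resolved = target.split("(")[0].strip()
        if resolved:
            deps[name] = resolved
    return deps


def ldd(path):
    """Run ldd on a shared object and return the parsed dependencies."""
    result = subprocess.run(
        ["ldd", str(path)], capture_output=True, text=True, check=True
    )
    return parse_ldd_output(result.stdout)
```

In the compile image, every value returned by `ldd("pylibvw.so")` should be a path and none of them `None`.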

Find missing dependencies

Afterwards, re-insert the runtime-image part into the Dockerfile and rebuild the image.
Again, start a container with the new image and execute ldd pylibvw.so in the same directory.

The output should show the following missing dependencies:

libboost_python3-py36.so.1.65.1 => not found
libboost_program_options.so.1.65.1 => not found
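With longer ldd listings, the missing entries are easy to overlook. As a rough sketch, the lines marked "not found" can be filtered out with a regular expression; the function name `missing_libraries` is an assumption of mine, and the input is simply the text that ldd printed in the runtime image.

```python
import re

# Matches lines of the form "<library name> => not found".
_NOT_FOUND = re.compile(r"^\s*(\S+)\s+=>\s+not found\s*$", re.MULTILINE)


def missing_libraries(ldd_text):
    """Return the names of all libraries that ldd reported as 'not found'."""
    return _NOT_FOUND.findall(ldd_text)
```

Fed with the output shown above, this returns the two Boost libraries in the order in which ldd listed them.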

Ensure that they are part of the final docker image

You can follow the approach of pipwheels and install them using apt install.
Alternatively, you can use Docker's COPY --from= instruction to copy the files over from the compile image, which is what the final Dockerfile does.

COPY --from=compile-image /usr/lib/x86_64-linux-gnu/libboost_python3-py36.so.1.65.1 /usr/lib/x86_64-linux-gnu/libboost_python3-py36.so.1.65.1
COPY --from=compile-image /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.65.1 /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.65.1
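When more than a handful of libraries are missing, writing these lines by hand gets tedious. The sketch below generates them from a list of library names. The helper `copy_instructions` is hypothetical, and its defaults assume that the libraries live in /usr/lib/x86_64-linux-gnu and that the build stage is called compile-image, as in this article; check both against your own image.

```python
from pathlib import PurePosixPath


def copy_instructions(missing, lib_dir="/usr/lib/x86_64-linux-gnu", stage="compile-image"):
    """Build one COPY --from line per missing library, using absolute paths."""
    lines = []
    for name in missing:
        # Source and destination are identical, so the dynamic linker finds
        # the library in the same place as in the compile image.
        path = PurePosixPath(lib_dir) / name
        lines.append(f"COPY --from={stage} {path} {path}")
    return lines
```

Calling `print("\n".join(copy_instructions(names)))` with the names reported as missing produces lines that can be pasted into the runtime-image stage.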

Final Dockerfile

The previous considerations lead to the following Dockerfile:

# Stage 1: build Vowpal Wabbit with all DEV dependencies available.
FROM mcr.microsoft.com/azureml/base:openmpi3.1.2-ubuntu18.04 AS compile-image

# Update and install in a single layer so the package index is never stale.
RUN apt-get update && apt-get -y install --no-install-recommends \
    libboost-dev \
    libboost-program-options-dev \
    libboost-system-dev \
    libboost-thread-dev \
    libboost-math-dev \
    libboost-test-dev \
    libboost-python-dev \
    zlib1g-dev \
    cmake

# Install into a virtual environment so it can be copied as a single directory.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

RUN python -m pip install --upgrade pip
RUN python -m pip install --no-cache-dir vowpalwabbit

# Stage 2: clean base image with the venv and only the required shared objects.
FROM mcr.microsoft.com/azureml/base:openmpi3.1.2-ubuntu18.04 AS runtime-image

COPY --from=compile-image /opt/venv /opt/venv

# Shared objects that ldd reported as "not found" in the runtime image.
COPY --from=compile-image /usr/lib/x86_64-linux-gnu/libboost_python3-py36.so.1.65.1 /usr/lib/x86_64-linux-gnu/libboost_python3-py36.so.1.65.1
COPY --from=compile-image /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.65.1 /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.65.1

ENV PATH="/opt/venv/bin:$PATH"

Now we have a functional version of Vowpal Wabbit installed without the bloat of the DEV dependencies.
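To check this claim automatically, for example as a smoke test after building the image, the import can be attempted from a small script. This is a minimal sketch; the helper name `can_import` is my own invention, and it relies only on the standard library.

```python
import importlib


def can_import(module_name):
    """Try to import a module; return (True, "") or (False, error message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # A missing shared object surfaces as ImportError, just like a
        # missing Python module does.
        return False, str(exc)
    return True, ""
```

Inside the final image, `can_import("vowpalwabbit.pyvw")` should report success, whereas in the first version of the image the error message would name the missing libboost shared object.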

Wrapping Up

We explored an approach to minimal Docker images for Python applications with shared object dependencies using

  • Docker multi-stage builds
  • the UNIX ldd tool
  • the Docker COPY --from= instruction

I am quite optimistic that this approach can be applied to other Python packages as well.

Thank you for your time, see you next time!

This post is licensed under CC BY 4.0 by the author.