From the previous docker commit exercise, we saw that customizing an image really just means customizing the configuration and files added at each layer. If we could write every modify, install, build, and manipulate command for each layer into a script, and use that script to build and customize the image, then the problems mentioned earlier, non-repeatability, lack of build transparency, and image bloat, would all be solved. That script is a Dockerfile.
A Dockerfile is a text file containing instructions. Each instruction builds one layer, so the content of each instruction describes how that layer should be built.
Let’s also take the example of customizing an nginx image, this time we’ll use a Dockerfile.
In a blank directory, make a text file called Dockerfile:
$ mkdir mynginx
$ cd mynginx
$ touch Dockerfile
FROM nginx
RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
This Dockerfile is very simple, just two lines. There are two instructions involved, FROM and RUN.
Specify the base image with FROM
A custom image must be based on some existing image and built on top of it, just as we specified the nginx image when we ran a container and modified it. FROM specifies this base image, so FROM is a required instruction in a Dockerfile and must be the first one.
There are many high-quality official images available on Docker Hub. There are ready-to-use service images such as nginx, redis, mongo, mysql, httpd, php, and tomcat, as well as images that make it easy to develop, build, and run applications in various languages, such as node, openjdk, python, ruby, and golang. We can look for the image that best matches our end goal and customize from there.
If you can’t find a suitable one, there are also more basic operating-system images available, such as ubuntu, debian, centos, fedora, and alpine; the software repositories of these operating systems give us even broader scope for extension.
Besides selecting an existing image as the base, Docker also provides a special image called scratch. This image is a virtual concept: it does not actually exist, and it denotes a blank image.
If you build FROM scratch, you are not basing your image on any other image; the instructions you write next will form the very first layer of the image.
It’s not uncommon to copy an executable directly into an image without any operating system at all. For statically compiled programs on Linux, no operating system is needed to provide runtime support; everything the executable requires is already inside it, so building FROM scratch makes the image smaller. Many applications written in Go use this approach, which is one of the reasons Go is considered a particularly suitable language for container microservice architectures.
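As a sketch of this pattern (the Go version, module layout, and file names here are hypothetical), a multi-stage build can compile a statically linked Go binary and then copy only that binary into a scratch-based image:

```dockerfile
# Build stage: compile a statically linked binary (project layout is hypothetical)
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 avoids linking against libc, so the binary is fully static
RUN CGO_ENABLED=0 go build -o /app .

# Final stage: start from the blank scratch image and add only the executable
FROM scratch
COPY --from=build /app /app
CMD ["/app"]
```

The resulting image contains nothing but the single executable, which is why such images are often only a few megabytes.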
Execute the command with RUN
The RUN command is used to execute command-line commands. Because of the power of the command line, the RUN directive is one of the most commonly used when customizing images. It comes in two formats:
shell format: RUN <command>, just as if you were typing the command directly on the command line. This is the format used by the RUN instruction in the Dockerfile we just wrote:
RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
exec format: RUN ["executable", "argument 1", "argument 2"], which reads more like a function call.
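For example, the RUN instruction from our nginx Dockerfile can be expressed in either format. One caveat: the exec format does not invoke a shell, so shell features such as the > redirection are not interpreted; to use them you must call the shell explicitly:

```dockerfile
# shell format: the command is run via /bin/sh -c, so redirection works
RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html

# exec format: no shell is involved unless you invoke one yourself
RUN ["/bin/sh", "-c", "echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html"]
```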
Since RUN executes commands just like a shell script would, can we simply give each shell command its own RUN instruction?
For example:
FROM debian:stretch
RUN apt-get update
RUN apt-get install -y gcc libc6-dev make wget
RUN wget -O redis.tar.gz "http://download.redis.io/releases/redis-5.0.3.tar.gz"
RUN mkdir -p /usr/src/redis
RUN tar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1
RUN make -C /usr/src/redis
RUN make -C /usr/src/redis install
As we said before, every instruction in a Dockerfile creates a layer, and RUN is no exception. Each RUN behaves the same way as when we created the image manually: we create a new layer, execute these commands on it, and when we’re done, commit the changes to that layer to create a new image.
Written this way, the Dockerfile creates seven layers. That makes no sense at all: many things the runtime does not need end up in the image, such as the build environment and updated package lists. The result is a bloated, many-layered image that not only increases build and deployment time but is also error-prone. This is a common mistake among Docker beginners.
Union FS implementations also have a maximum layer limit; AUFS, for example, was once capped at 42 layers and now allows at most 127.
The correct way to write the Dockerfile above is:
FROM debian:stretch
RUN set -x; buildDeps='gcc libc6-dev make wget' \
&& apt-get update \
&& apt-get install -y $buildDeps \
&& wget -O redis.tar.gz "http://download.redis.io/releases/redis-5.0.3.tar.gz" \
&& mkdir -p /usr/src/redis \
&& tar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1 \
&& make -C /usr/src/redis \
&& make -C /usr/src/redis install \
&& rm -rf /var/lib/apt/lists/* \
&& rm redis.tar.gz \
&& rm -r /usr/src/redis \
&& apt-get purge -y --auto-remove $buildDeps
First, all of the preceding commands have a single purpose: to compile and install the redis executable. So there is no need for many layers; this is a one-layer job. Therefore, instead of many RUN instructions mapped one-to-one to commands, we use a single RUN instruction and chain the required commands with &&, simplifying the previous 7 layers to 1. When writing a Dockerfile, keep reminding yourself that you are not writing a shell script; you are defining how each layer should be built.
Also note the line breaks, which are purely for formatting. Dockerfiles support the shell convention of ending a line with \ to continue it and starting a line with # for a comment. Good formatting (line breaks, indentation, comments, and so on) makes maintenance and troubleshooting easier and is a good habit to develop.
In addition, you can see that the last commands in the group are cleanup commands: they remove the software that was only needed to compile the build, delete the downloaded and extracted files, and clear the apt cache. This is an important step. As we said earlier, images are multi-layer storage; things in one layer are not deleted in the next, they follow the image around forever. When building an image, make sure each layer adds only what it really needs, and that anything extraneous is cleaned up.
One reason many Docker beginners end up with bloated images is that they forget to clean up extraneous files at the end of each layer’s build.
Build Image
Okay, let’s go back to the Dockerfile for the custom nginx image from earlier. Now that we understand the contents of this Dockerfile, let’s build the image.
In the directory where the Dockerfile is located, execute the following command to build the image:
docker build -t nginx:v3 .
[+] Building 1.6s (6/6) FINISHED
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 118B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/nginx:latest 0.0s
=> [1/2] FROM docker.io/library/nginx 0.2s
=> [2/2] RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html 1.1s
=> exporting to image 0.1s
=> => exporting layers 0.0s
=> => writing image sha256:dee0fd6ffdb13d0a1910e3fab783c8f56fb159c15586ceed1dfe35276a7d311c 0.0s
=> => naming to docker.io/library/nginx:v3
Here we used the docker build command. Its general format is:

docker build [OPTIONS] PATH | URL | -

In our case, -t nginx:v3 specifies the name of the final image. After a successful build, we can run this image just as we ran nginx:v2 earlier, and the result will be the same.
Image build context
If you pay attention, you will see that the docker build command ends with a period. The period indicates the current directory, and the Dockerfile is in the current directory, so many beginners think this path specifies where the Dockerfile is located. That is inaccurate. If you look at the command format above, you will find that it actually specifies the context path. So what is a context?
First, we need to understand how docker build works. Docker is divided into the Docker engine (the server-side daemon) and the client-side tools. The engine exposes a set of REST APIs, the Docker Remote API, and it is through these APIs that client tools such as the docker command interact with the engine to perform their various functions. So although it may appear that we are executing docker operations locally, in reality everything is done on the server side (in the Docker engine) via remote calls. This client/server design also makes it easy to operate a Docker engine on a remote host.
When we build an image, not everything is customized with RUN; we often need to copy local files into the image, for example with the COPY or ADD instructions. But the docker build command performs the build not locally but on the server side, in the Docker engine. In this client/server architecture, how can local files be made available to the server?
This is where the concept of context comes in. When building, the user specifies a context path; once the docker build command knows this path, it packages everything under it and uploads it to the Docker engine. When the engine receives the context package, it unpacks it and obtains all the files it needs to build the image.
If you write it this way in your Dockerfile:
COPY ./package.json /app/
This does not copy the package.json in the directory where the docker build command was executed, nor does it copy the package.json in the Dockerfile directory, but rather the package.json in the context directory.
Therefore, the source paths in a COPY instruction are relative to the context. This is why beginners often ask why COPY ../package.json /app or COPY /opt/xxxx /app doesn’t work: those paths are outside the context, so the Docker engine cannot access the files there. If you really need those files, copy them into the context directory first.
Now you can understand the period in the command docker build -t nginx:v3 . : it actually specifies the context directory, whose contents the docker build command packages up and sends to the Docker engine to help build the image.
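To make this concrete, the following rough sketch (not the real client code; the directory names are made up) mimics what the client conceptually does: it packages the context directory into an archive, and that archive is what the Docker engine actually receives.

```shell
# Create a toy context directory with a Dockerfile and a file to COPY
mkdir -p /tmp/mynginx-context
echo 'FROM nginx' > /tmp/mynginx-context/Dockerfile
echo '<h1>Hello, Docker!</h1>' > /tmp/mynginx-context/index.html

# docker build conceptually packages everything under the context path...
tar -czf /tmp/context.tar.gz -C /tmp/mynginx-context .

# ...and the engine unpacks the archive to obtain the build files
tar -tzf /tmp/context.tar.gz
```

Anything outside /tmp/mynginx-context never makes it into the archive, which is exactly why out-of-context COPY sources fail.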
If you look at the docker build output, we actually see this process of sending the context:
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 118B 0.0s
Understanding the build context is important for image building to avoid making mistakes that shouldn’t be made. For example, some beginners find that COPY /opt/xxxx /app doesn’t work, so they simply put the Dockerfile in the root directory of their hard drive to build it, only to find that the docker build executes and sends out a few dozen gigabytes of stuff, which is extremely slow and prone to failing builds. That’s because you’re asking docker build to package the entire drive, which is clearly a misuse.
In general, you should place your Dockerfile in an empty directory, or in the root of your project. If you don’t have the required files in that directory, then you should make a copy of the required files. If there is something in the directory that you really don’t want to pass to the Docker engine at build time, then you can write a .dockerignore using the same syntax as a .gitignore, which is used to weed out the things that don’t need to be passed as context to the Docker engine.
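A minimal .dockerignore might look like this (the entries are illustrative; list whatever your own project should keep out of the context):

```
# .dockerignore: paths excluded from the context sent to the Docker engine
.git
node_modules
*.log
tmp/
```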
So why would anyone mistake the period for the directory where the Dockerfile is located? Because by default, if you don’t specify a Dockerfile explicitly, the file named Dockerfile in the context directory is used as the Dockerfile.
That is only the default behaviour. The Dockerfile does not have to be named Dockerfile, nor does it have to live in the context directory; for example, you can use the -f ../Dockerfile.php parameter to designate some other file as the Dockerfile.
Of course, it is common practice to use the default filename Dockerfile and to place it in the image build context directory.
Other uses of docker build
Building directly from the Git repo
As you may have noticed, docker build also supports building from a URL, for example directly from a Git repo. (The step-by-step output below comes from the classic builder; to reproduce it, disable BuildKit first, as shown in the comments.)

# $env:DOCKER_BUILDKIT=0
# export DOCKER_BUILDKIT=0
$ docker build -t hello-world https://github.com/docker-library/hello-world.git#master:amd64/hello-world
Step 1/3 : FROM scratch
--->
Step 2/3 : COPY hello /
---> ac779757d46e
Step 3/3 : CMD ["/hello"]
---> Running in d2a513a760ed
Removing intermediate container d2a513a760ed
---> 038ad4142d2b
Successfully built 038ad4142d2b
This command specifies the Git repo to build from; after the #, master is the branch and amd64/hello-world is the directory inside the repo to use as the build context. Docker will git clone the project, check out the specified branch, and enter the specified directory to start the build.
Build with the given tar archive
docker build http://server/context.tar.gz
If the given URL is not a Git repo but a tar archive, the Docker engine downloads it, unpacks it automatically, and uses its contents as the context to start the build.
Reading a Dockerfile from standard input for a build
docker build - < Dockerfile
OR
cat Dockerfile | docker build -
If standard input is a text file, it is treated as the Dockerfile and the build begins. Because this form reads the Dockerfile directly from standard input, there is no context, so unlike the other methods it cannot COPY local files into the image.
Reading a context archive from standard input for building
docker build - < context.tar.gz
If standard input is detected to be a compressed archive in gzip, bzip2, or xz format, it is treated as a context archive: it is simply expanded, used as the context, and the build begins.