
> First, the Dockerfile file format for declaratively describing a machine (operating system, installed packages, processes, etc).

The Dockerfile format is not declarative.

Declarative means you define the state you want

Imperative means you define the commands to run

Dockerfile is 100% imperative



I went down this rabbit hole a while back:

https://zwischenzugs.com/2023/04/27/is-it-imperative-to-be-d...

TL;DR you're probably right, but it's more complicated than I thought.


Example of an actually-declarative container definition:

https://github.com/nlewo/nix2container/blob/master/examples/...

(disclosure: I have contributed to that repo)


>writeText >writeTextDir >runCommand

this is not looking particularly more declarative than docker


The function name is misleading you into thinking that it is imperative. In reality, the function just returns a path to a file in the Nix store with the specified contents. The derivation then returns a container in which Nginx is pointed to that same config file. Nothing is temporal or happening in any particular order - you declare some inputs, and you get some outputs (a Docker container, in this case).


It's only setting file contents and folder generation, that's still declarative.


Genuine question, how is that more declarative than

    FROM nginx


Recall, the Dockerfile syntax is 'imperative'. If we change the order of the commands in the Dockerfile, we likely end up with a different image.

In the Nix example, the image we build is in the expression `nix2container.buildImage { ... }`.

The `nginxWebRoot` is the package with the index.html:

    nginxWebRoot = pkgs.writeTextDir "index.html" ''
      <html><body><h1>Hello from NGINX</h1></body></html>
    '';
It's reasonable to say "writeTextDir" modifies the disk. I don't think it's reasonable to say just because changes in state occur that the code is imperative. (e.g. SQL is a declarative language, but clearly allows modifying the database).

Similarly:

    nginxVar = pkgs.runCommand "nginx-var" {} ''
      mkdir -p $out/var/log/nginx
      mkdir -p $out/var/cache/nginx
    '';
We can say the contents of the runCommand argument are run imperatively, sure. (Especially: if you change the order of the bash commands, you might get a different result). But unlike the Dockerfile, the order where we declare this nginxVar package doesn't matter.

Or, say: the `copyToRoot` in the `nix2container.buildImage` takes in a list of packages where the contents are copied to root. The copying is an action; but the list of what to copy is not an action. -- And again, `copyToRoot` could be put after the `config` attribute.

The mechanisms describing how the copying is done is elsewhere.
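To make the order-independence point concrete, here is a minimal Python sketch. It is not how Nix is implemented; the `evaluate` helper and the `(dependencies, builder)` tuple shape are invented for illustration. Each binding names what it depends on, and the result is the same regardless of the order in which the bindings are written down.

```python
from graphlib import TopologicalSorter


def evaluate(bindings):
    """Evaluate named bindings in dependency order, ignoring declaration order.

    `bindings` maps a name to (dependency_names, builder). This shape is a
    hypothetical stand-in for Nix attributes, not a real Nix data structure.
    """
    graph = {name: set(deps) for name, (deps, _) in bindings.items()}
    results = {}
    # static_order() yields every dependency before the bindings that use it.
    for name in TopologicalSorter(graph).static_order():
        deps, builder = bindings[name]
        results[name] = builder(*(results[d] for d in deps))
    return results


web_root = ((), lambda: {"index.html": "<h1>Hello from NGINX</h1>"})
var_dirs = ((), lambda: {"var/log/nginx": None, "var/cache/nginx": None})
image = (("web_root", "var_dirs"), lambda root, var: {**root, **var})

# Same bindings, declared in two different orders.
a = evaluate({"web_root": web_root, "var_dirs": var_dirs, "image": image})
b = evaluate({"image": image, "var_dirs": var_dirs, "web_root": web_root})
print(a["image"] == b["image"])
```

What goes inside each builder may still be a sequence of steps, but the bindings themselves form a graph rather than a script.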


One of the biggest ones is that the nix2container definition is evaluated in the context of a flake.nix file that specifies all the inputs, and a `flake.lock` that guarantees they stay frozen.

By comparison, "FROM nginx" is just grabbing whatever is the latest in some external registry that you don't control— it's the same as starting a Dockerfile with "apt update; apt dist-upgrade", you have this huge chunk of external mutable state that you're dependent on which immediately throws any kind of real reproducibility out the window.

(And yes, doing "FROM nginx:x.y" or "FROM nginx:<sha>" does help a little, but the point remains that you're pulling a big binary blob that is essentially mystery meat; trying to make sense of what's in there is why there are now entire companies dedicated to untangling software bills of materials.)


Good point on the mutability of docker tags. But not sure how applicable "you're pulling a big binary blob that is essentially mystery meat" is when cache.nixos.org exists.


Fair, I suppose— both are remote build systems that you have to put trust in when you pull their tarballs.

But even in a world where Debian's reproducible build project completely achieves all its goals, a given docker build is always going to have temporal state in it if it depends on external images or a mutating package repository. So yes, you may have the Dockerfile that purportedly produced that disk image, but you're unlikely to be able to completely rebuild or verify it unless you also have a snapshot of the apt Packages.gz.

A nix2container image could in principle build completely from scratch, in just one command line invocation, with no external cache present, and get a bit-for-bit identical result. The only real "trusted" input that you have to start with is I believe a small busybox binary and gcc toolchain that is the initial bootstrap.


What if I told you that “imperative” and “declarative” are subjective terms and a matter of opinion


Every program is written in functional style if you squint hard enough.


But it seems to be doing it imperatively. I’d expect something like ‘nginxConf = pkgs.file “nginx.conf”, “contents”’ instead of ‘nginxConf = pkgs.writeText “nginx.conf”, “contents”’.

Not saying the system doesn’t apply this declaratively, but I find it difficult to intuit the above is checking for a state and applying changes only if necessary.


One distinction in Nix vs Docker is that Nix has a DAG structure as opposed to a singly linked list structure of layers.

The "writeText" function produces a derivation (basically an atomic build recipe) that produces that file. The crux of nix is that you make deterministic derivations, and then you can always refer to the results of a derivation from the hash of the derivation and its inputs.

What nix adds is glue logic to chain these derivations together in a way that preserves reproducibility of the individual imperative, but deterministic, components.

Unless you are using something like recursive-nix, you can completely evaluate the nix expression without building any of the derivations.
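A toy Python sketch of the idea that a derivation can be referred to by a hash of its recipe and its inputs. Real Nix store paths are computed differently; `derivation_id` and the recipe strings below are made up for illustration. Note that nothing is built here: identifiers are computed purely from the descriptions, which mirrors evaluating an expression without building any derivation.

```python
import hashlib
import json


def derivation_id(name, builder_script, inputs=()):
    """Return a hex digest covering the recipe and the ids of its inputs.

    Hypothetical scheme for illustration only; not the Nix store path format.
    """
    payload = json.dumps(
        {"name": name, "builder": builder_script, "inputs": sorted(inputs)}
    )
    return hashlib.sha256(payload.encode()).hexdigest()


conf = derivation_id("nginx.conf", "write 'daemon off;'")
other_conf = derivation_id("nginx.conf", "write 'daemon on;'")

image_a = derivation_id("image", "assemble", [conf])
image_b = derivation_id("image", "assemble", [conf])
image_c = derivation_id("image", "assemble", [other_conf])

print(image_a == image_b)  # same recipe and inputs give the same id
print(image_a == image_c)  # changing an input changes the id downstream
```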


Also relevant to note that although Nix builds individual derivations imperatively (call this compiler, write this file, rename this directory), it completely controls all the inputs to that imperative process.

This is fundamentally different from a Dockerfile or Ansible script which have no idea what the "starting point" of the target environment is and are pretty much just mindlessly imposing mutations on top of whatever happens to already be there.


I'd say both.

Some of it is declarative, e.g. `FROM`, `ENV`, `EXPOSE`. On the other hand, `RUN`, `CMD`, etc. are fully imperative.


You don't really get credit for being "both". There are maintainability and comprehensibility benefits to keeping anything imperative out of a language (you don't have to reason causally from one statement to the next), which is out the window when you introduce imperative elements. Also: in a Dockerfile, those imperative elements are the heart of the system.


`ENV` is a bad example because its effect depends greatly on where it's placed in the Dockerfile, e.g. before or after a RUN statement that consumes its value.

`FROM` also has more use cases when using multi stage builds.


oh, you're right. I forgot that a key characteristic of anything being "declarative" is that order of statements should not matter.

Actually, come to think of it, since `RUN` may depend on any other Dockerfile statement (even `EXPOSE` might make a difference in code), does this mean that even a single imperative statement introduced into a language makes the language imperative?


A Docker image is basically the cached result of a lucky, nondeterministic imperative build success from a Dockerfile.

In comparison, a Nix file is actually declarative ("I want my result system to have this" as opposed to "Do this to get me towards my result system"), and is actually reproducible.

But notice the sizes of both a Docker image (many megabytes) and a Nix file (a couple of K)...


A many megabytes Dockerfile?! Where?


You're right, and I meant "Docker image", was fortunately able to edit before the window closed! (Edited it for clarification. Sorry, 2-year-old plus no breakfast or coffee, minus sleep = daddy brain...)

A Dockerfile provides no guarantees that it will succeed, is what I was getting at- which is why people download Docker images to begin with, because the build product is guaranteed to work, since it's immutable at that point.


Like with functional programming, which at extremes is only declarative, there is a tendency to call “declarative” only those approaches that are perfectly declarative and incapable of imperativeness. However, not unlike purely functional programming, declarativeness is useless unless it is contaminated with real world at some boundary.

If you read COPY as “with these files, matching those on the host machine as of execution time” and RUN as “with this command executed at build time”, and so on, then even a Dockerfile becomes fairly declarative. Every declaration defines a new immutable layer with its own unique hash; it’s just that some declarations can easily be used in ways that make the outcome vary based on the state of the entire world as of build time.

Just as in Ansible you can use “declarative” YAML in a very imperative lasagna of a setup, you can do the same with a Dockerfile, Nix, Haskell, or Python. You can also get pretty close to purely declarative bliss with any of them, but it will grow impractical before that point.


The thing is, some of the first advice you'll get about Docker is that the order matters a ton. These two Dockerfiles technically create the same result, but the difference between the two is very important:

    COPY . .
    RUN npm install
And:

    COPY package.* .
    RUN npm install
    COPY . .
The first file reruns npm install every single time any file changes in the code. The second only reruns npm install if the packages change. That can make the difference between a 5-minute build and a 5-second build, so it's not a small optimization.

Given how important the order of instructions is, it's hard for me to think of it as declarative. It fits better in my mind as a sequence of instructions, which is the very definition of imperative.
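The caching difference between those two orderings can be modelled with a short Python sketch. This is a simplified model, not Docker's or BuildKit's actual cache algorithm; `layer_keys`, `rerun_steps`, and the file contents are assumptions for illustration. Each step's cache key covers its parent's key, and a COPY key also covers the contents of the files it copies.

```python
import hashlib


def layer_keys(steps, files):
    """Compute a chained cache key per step.

    Each step is ("COPY", [paths]) or ("RUN", command). A COPY key covers the
    contents of the copied files; every key also covers the parent key.
    """
    keys, parent = [], ""
    for kind, arg in steps:
        h = hashlib.sha256(parent.encode())
        h.update(kind.encode())
        if kind == "COPY":
            for path in sorted(arg):
                h.update(path.encode() + b"\0" + files[path].encode())
        else:
            h.update(arg.encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rerun_steps(steps, before, after):
    """Return the steps whose cache key differs between two file snapshots."""
    old, new = layer_keys(steps, before), layer_keys(steps, after)
    return [step for step, o, n in zip(steps, old, new) if o != n]


# Only app.js changes between the two snapshots.
before = {"package.json": '{"deps": 1}', "app.js": "v1"}
after = {"package.json": '{"deps": 1}', "app.js": "v2"}

naive = [("COPY", ["package.json", "app.js"]), ("RUN", "npm install")]
split = [
    ("COPY", ["package.json"]),
    ("RUN", "npm install"),
    ("COPY", ["package.json", "app.js"]),
]

install = ("RUN", "npm install")
print(install in rerun_steps(naive, before, after))  # install is invalidated
print(install in rerun_steps(split, before, after))  # install stays cached
```

In this model the final file set is the same either way; only the position of the install step in the chain decides whether a source edit invalidates it.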


Correct.

I find Docker an interesting example of a fundamentally elegant approach that, possibly to facilitate commercialisation, was documented in a way that, instead of imposing any particular culture (such as, say, Nix or Haskell), strongly aligned with the preexisting culture of operations and system administration. The resulting ease of adoption and popularity (and, I assume, broadly deserved financial well-being of the original creators), however, could not come without a variety of footguns along the lines of those you have described.


I feel like "not declarative" is fair enough if you look at how the important bits work in a Dockerfile. Like what software is installed, that's not usually some structured thing...but typically shell commands out to apk or apt-get.

I get why it is the way it is, but if it were more declarative, it would be easier to manage Dockerfiles through changes, security updates, etc.


You can shell out in Nix and Haskell. If they are not “declarative” then what is [both declarative and useful, a.k.a. capable of interfacing with outside world]?

Dockerfile is intentionally bare-bones. It just gives you RUN straight up, no scary hidden option, but it’s up to you how to use it. If you want to write imperative, you can. If you don’t write imperative, pin everything to a hash, shell out only to Dhall and Prolog, it can get very declarative…

…at a cost. The fact that people do not tend to go this route means equally that 1) they are lazy, and 2) using RUN this way is pragmatic.


That's what I meant by "I get why it is the way it is".

I'm not suggesting it change. I'm saying that calling it "not declarative" seems fair, based on the way that most people actually use it.


And I’m saying that it is not fair, for the same reason (as well as for another reason, which in short is “nothing can be both declarative and useful by that logic”).


> Every declaration defines a new immutable layer with its own unique hash...

This blurs the meaning of "declarative" to the point of meaninglessness.

Consider the following pseudocode:

  x = 5
  print x
  x = 10
  print x
This is clearly 'imperative'.

But from the perspective of 'each line of code declares a new program', then we consider the code snippet a declaration of a program.

> Just as in Ansible you can use “declarative” YAML in a very imperative lasagna of a setup

One of the things which makes Dockerfiles imperative is the sequence of Dockerfile commands is significant; if you swap the order of a COPY, RUN to a RUN, COPY, the result changes significantly.


I also would call Dockerfiles imperative on syntactic grounds alone, but I feel they have declarative semantics in a way that's noteworthy.

> But from the perspective of 'each line of code declares a new program', then we consider the code snippet a declaration of a program.

The critical difference is that each line in a Dockerfile yields (declares) a filesystem state which can be referenced and recreated. In contrast, in your example, I have no way to say "give me a snapshot of the system between the third and fourth instructions".

Dockerfiles have all sorts of rules and restrictions that make these semantics possible. You cannot create loops; there is nothing like a "function", at least within the context of one Dockerfile.

> One of the things which makes Dockerfiles imperative is the sequence of Dockerfile commands is significant; if you swap the order of a COPY, RUN to a RUN, COPY, the result changes significantly.

I reject this line of reasoning, simply because declarative languages are indeed order-dependent:

    let y = f(g(x))

is different than:

    let y = g(f(x))

Much in the same way that these are different:

    FROM x AS y
    RUN g
    RUN f
   
vs

    FROM x AS y
    RUN f
    RUN g


If you call things “declarative” or “imperative” on syntactic grounds, which would you call Ansible (like Docker, a devops tool, famously using pure YAML but in many cases for basically running a bunch of scripts)?

It’s just not a reasonable way of making the distinction; claiming X is “imperative” because you are using it in imperative ways is logically flawed, it is not a statement of truth about X.

Dockerfile is fundamentally declarative, as you note (that’s just how Docker works: every line describes a layer), and it has not even enough features to make it imperative (control flow? goto?).


I think it is fair to distinguish between syntax and semantics here.

Haskell's "do" notation is frequently described as an imperative syntax for functional/declarative transformations. I would put Dockerfiles in the same boat. They behave declaratively, but users can think imperatively when they write them (to a certain extent) and this is part of what makes them more accessible to newcomers.

Ansible is a great example of the opposite. It looks declarative but, like you said, basically runs a bunch of scripts one after another on a system, with state and all.


> I reject this line of reasoning, simply because declarative languages are indeed order-dependent

Eh, I can kinda see that. -- It's possible to imagine a declaration where the order of items in a list has consequence.

Still, I'd describe Dockerfiles as a sequence of statements more akin to a bash script than to an SQL expression.


This is a deeply mistaken view informed by the cargo-culting devops traditions of yore. A Dockerfile is a declaration of layers that does not resist being used in imperative way, in which sense it’s no different from YAML or any functional language you can think of.


Yeah, to pile on, I often see this line of thinking:

> Still, I'd describe Dockerfiles as a sequence of statements more akin to a bash script than to an SQL expression.

lead to some quite inefficient and poorly-factored Dockerfiles.


> This blurs the meaning of "declarative" to the point of meaninglessness

The alternative is to draw a clear line where there is none.

> One of the things which makes Dockerfiles imperative is the sequence of Dockerfile commands is significant; if you swap the order of a COPY, RUN to a RUN, COPY, the result changes significantly.

By that logic you can call every Nix program “imperative”. They are sequences of strings, the order of which matters. The horror!

A Docker image is a stack of layers. Every layer points to a preceding layer. It’s not a huge leap from that structure to a fairly elegant sugar of a string array, where each string is a declaration that applies to preceding layer thus describing the next one. I don’t see anything fundamentally imperative about it; it can be used in imperative ways, but anything can.


> The alternative is to draw a clear line where there is none.

I'd say that more/less "imperative" refers to describing sequences of actions that take place (especially those which might modify some state), whereas "declarative" is more about a structure of what the result should be.

I can agree that a Docker image can be considered as a structure, and that it's possible to construct one declaratively. Whereas I'd say a Dockerfile is an imperative construction of that.


Dockerfile describes a set of immutable layers. Whether you model it in your head as a sequence of commands or as a description of a set of immutable layers, is up to you.


Yeah, that's why I would consider a Makefile declarative even though it can contain bits saying how to do something. I can't consider a Dockerfile declarative simply because it is necessarily executed top to bottom.


An array can be declarative even if it has order. Think of the ordered list of statements in a Dockerfile as syntax sugar for a linked list of (previousLayerPointer, nextLayerDeclaration) tuples. I expanded on this in another comment.
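A minimal Python sketch of that desugaring, assuming a hypothetical `Layer` tuple with the two fields named above. It only models the structure being described and is not Docker's internal representation.

```python
from typing import NamedTuple, Optional


class Layer(NamedTuple):
    """One link in the chain: a pointer to the previous layer plus a declaration."""

    previous: Optional["Layer"]
    declaration: str


def desugar(statements):
    """Fold an ordered list of statements into a chain of Layer tuples."""
    layer = None
    for statement in statements:
        layer = Layer(previous=layer, declaration=statement)
    return layer


def statements_of(layer):
    """Walk the chain back to the base and return the statements in order."""
    out = []
    while layer is not None:
        out.append(layer.declaration)
        layer = layer.previous
    return out[::-1]


a = desugar(["FROM x", "RUN g", "RUN f"])
b = desugar(["FROM x", "RUN f", "RUN g"])

print(a == b)  # different chains describe different images
print(a.previous.previous == b.previous.previous)  # but they share a base
print(statements_of(a))
```

Seen this way, the order of the list is part of the data being declared, much as argument nesting is part of `f(g(x))`.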


Fair enough that everything is a scale, and you _can_ write dockerfiles in a more or less declarative way

But the syntax itself is conducive to imperative image definitions, and they are by far the most common


> Declarative means you define the state you want

“FROM ubuntu:latest” is pretty declarative then…


It's an imperative command telling Docker to copy/import another image, no different from importing a library in any other imperative language.


It is the most declarative thing in a Dockerfile. It becomes an imperative solution very quickly.

Here's an example Dockerfile:

    FROM McDonalds Cheeseburger HappyMeal
    RUN Get ketchup packets
    RUN Squirt ketchup on fries
    RUN Dump happymeal on plate

The first is declarative, a menu, you point at what you want and software makes it so.

The rest is a recipe, an imperative, a command, you tell software to perform, and software makes it so.


If you're going to make absolutist statements about nitpicky minutiae, you have to get it right.

In point of fact the image side of a Dockerfile, where an image is a DAG of other images referenced by an immutable ID or pointer to hosted content, is "100%" declarative. It's only the "build" syntax that is ordered.


OP specifically said "The Dockerfile format is not declarative." They made no mention of the image side, only the build syntax.


If you're going to be pedantic about minutiae, you have to get it right.

The "image side of a Dockerfile" isn't a Dockerfile, it's an image, more specifically an image in the OCI Image Format [0]. A Dockerfile is just the most common syntax for controlling software that can create an OCI image (such as Docker and Podman).

You could argue that the OCI Image Format is declarative, but that's not relevant to OP's comment about Dockerfiles.

[0] https://github.com/opencontainers/image-spec


In all the real-world cases I've seen, the base images in Dockerfiles are just tags, which are mutable (especially the :latest tag which changes with every release).


We are only talking about the build syntax here ("the Dockerfile file format")



