
If I understand this correctly, Maven-based builds can contain dependencies on libraries hosted on remote servers. Go's build system has (or had) something similar too. Witnessing this trend take hold is astonishing and horrifying in equal parts. Not just as a security problem (which is obvious) but also as a huge hole in software engineering practices. How can anyone run a production build where parts of your build are downloaded from untrusted third-party sources in real time? How do you ensure repeatable, reliable builds? How do you debug production issues with limited knowledge of which versions of various libraries are actually running in production?


Java developers kind of laugh when I explain to them that Linux distros struggle to bootstrap Maven from source, since it is a non-trivial tool that depends on hundreds of artifacts to build.

The point is, what does it matter that your repo is local, or that your jars are secured, if you got the tool itself, Maven, in binary form from a server you don't control?

That is the whole point of Linux distro package managers. It is not only about dependencies. It is about securing the whole chain and ensuring repeatability.

Maven's design, unlike Ant's, forces you to bootstrap it from binaries. Even worse, Maven itself can't handle building a project _AND_ its dependencies from source. Why would the rest of the infrastructure matter then?

Yes, Linux distros build gcc and ant using a binary gcc and a binary ant. But it is always the previous build, so at some point in the chain it ends with sources and not with binaries.

And this is not an argument against Maven's idea and concept. If only it had depended on a few libraries and come with a simple way of building itself, instead of needing the binaries of half of the stuff it is supposed to build in the first place (hundreds of artifacts) just to build itself.


> Yes, Linux distros build gcc and ant using a binary gcc and a binary ant. But it is always the previous build, so at some point in the chain it ends with sources and not with binaries.

I don't think so. The first versions of GCC were built with the C compilers from commercial UNIX from AT&T or the like. The first Linux systems were cross-built under Minix. At some point you'll go back to a program someone toggled in on the front panel, but we don't have the source for all the intermediate steps, nor any kind of chain of signatures stretching back that far.

> And this is not an argument against Maven's idea and concept. If only it had depended on a few libraries and come with a simple way of building itself, instead of needing the binaries of half of the stuff it is supposed to build in the first place (hundreds of artifacts) just to build itself.

Any nontrivial program should be written modularly, using existing (and, where necessary, new) libraries. Having a dependency manager to help keep track of those is a good thing. I don't see that it makes the bootstrap any more "binary"; gcc is built with a binary gcc for which source is available. Maven is built with a binary maven and a bunch of binary libraries, source for all of which is available.


It's fairly easy to set up a local server containing all your jars and still use Maven or Ivy. I do that at my current employer.


We use a local repo as well (it's easy to set up), so this type of security issue is not something we even think about. If we are adding dependencies or changing versions, we just have to put a little more work into making sure the jar that goes into our local repo is good, but that doesn't happen every day. Of course, when prototyping or just playing around this could become an issue...


But in that case why maintain two separate repositories? One for "our code" and one for external. I'm assuming the code in these repositories is open source... right? Why not simply check in the version to be used right in your local SCM?


There are a bunch of different SCMs. It's nice to decouple "hold released builds at specific versions" from your general development repository.


Hosting your own makes sense for multiple reasons: you can be assured what code you are getting, you aren't limited by bandwidth rates of remote providers, and you get to control up/down time. The first is a must; the second and third make life more tolerable.


I'm honestly struggling to understand this line of reasoning. What you seem to be describing amounts to maintaining two different code repositories. Both of them have to be versioned and coordinated with each other. Why not just check in all the code you're using in your real SCM?


Because the word "repository" has two different meanings, and you're confusing them.

For external libraries, a "distribution repository" is a file store for a bunch of different projects. It typically stores released binaries for distribution (libfoo.1.5.pkg, libfoo.1.6.pkg, libbar.2.3.pkg, etc...), but could also contain clones of external source repos (libfoo.git/, libbar.hg/, etc...).

Which brings us to the other meaning - a "source repository" is the version-controlled store for the source of a single project.

The repo for external libraries is a distro repo, where the repo for your project is a source repo.

If you're checking the code for multiple projects into a single SCM, why bother maintaining separate source repos at all? Why don't we all use one giant source repo for all projects everywhere? Just check out "totality.git" and use the /libfoo/ and /libbar/ subdirectories. And in your internal company branch, add an /ourproject toplevel for your own code?

When you have answered that question, you will realise why we keep separate projects in separate source repos/SCMs.

Note that you will probably want to publish different "releases" of your own project to an internal distro repo for your internal clients to use, e.g. ourproject.1.2.pkg, ourproject.1.3.pkg, etc...


Thank you for your answer. I may have been a bit imprecise with my point there. I'm well aware of the reasons to maintain multiple repositories for different projects. The weak part of the story above is that we're not talking about separate projects. We're talking about a single project, where the build system is now effectively responsible for source control for some part of the project, using totally different protocols for communication, authentication and versioning. Where's the win?


"the build system is now effectively responsible for source control for some part of the project"

Well, I think I've found your problem. :-/

I had a close call with nearly installing/building some Java packages a couple of weeks ago, and due to reasons I eventually decided to try and find a different solution. Looks like the bullet I dodged was bigger than I thought.


"How can anyone run a production build where parts of your build are being downloaded from untrusted third party sources in real time? How do you ensure repeatable, reliable builds?"

By not downloading everything from Maven Central in real time. Companies usually run their own repository, and builds query that one. Central is queried only if the company-run repository is missing some artifact or they want to update libraries. How much bureaucracy stands between you and company-run repository upgrades depends on company and project needs.
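For illustration, routing every artifact request through a company-run repository manager is typically a single mirror entry in Maven's settings.xml (the internal host name here is a made-up placeholder):

```xml
<!-- ~/.m2/settings.xml: send all repository requests to the internal mirror -->
<settings>
  <mirrors>
    <mirror>
      <id>internal-repo</id>
      <!-- mirrorOf "*" intercepts Central and any repository declared in POMs -->
      <mirrorOf>*</mirrorOf>
      <url>https://repo.example.internal/maven</url>
    </mirror>
  </mirrors>
</settings>
```

With this in place, a build never talks to Central directly; the internal mirror decides what gets cached and served.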

As for production, does anyone actually compile stuff on production? I thought everyone ships compiled jars there. You know exactly what libs are contained in that jar; no information is missing.


In golang-land it is popular to deal with this by vendoring all the packages you depend on. There are several tools to manage this like godep. This is my preferred method as it allows for the reliable, repeatable build you are talking about.

There are other schools of thought, like pinning the remote repos to a specific commit id. These are better than nothing, but still depend on third-party repos, which I think is too risky for production code. It is great for the earlier stages of a project, when you are still working out which libraries you will use and also need to collaborate.
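As a sketch of the commit-pinning approach in today's Go tooling (modules post-date godep; the module path and hash below are hypothetical), a go.mod file can pin a dependency to an exact commit via a pseudo-version:

```text
module example.com/myapp

go 1.21

// Pseudo-version format: v0.0.0-<commit timestamp>-<commit hash prefix>,
// which pins the build to one exact upstream commit.
require github.com/example/lib v0.0.0-20230101120000-abcdef123456
```

Running something like `go mod vendor` then copies those pinned sources into the project tree, which is the vendoring approach described above.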


A couple of years ago we were trying to use BigCouch in a product. The Erlang build tool was happy to have transitive dependencies that were just pointing at github:branch/HEAD. It got to the point where we'd build it on a test machine and then just copy the binary around.


With Maven you specify the versions of libraries, jars are cached locally, and you can run your own local Maven server if you need to.
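As a minimal illustration (the coordinates are hypothetical), pinning a library version in Maven is a single dependency entry in the pom.xml; the jar is cached locally after the first download:

```xml
<dependencies>
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>example-lib</artifactId>
    <!-- an exact version: upgrading is just bumping this number -->
    <version>1.4.2</version>
  </dependency>
</dependencies>
```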


Well, doesn't git or any other sane source control system do that for you? You are maintaining your own repositories of these external dependencies anyway. So why split pieces of your project across two different repos, each with its own versioning, network protocol, authentication, etc.? What exactly are you getting in return for the added complexity?


git doesn't necessarily handle large binaries well (less true in recent versions). Also, when using Maven (etc.) you can upgrade libraries just by bumping the version number rather than hunting down the new jar manually. You can also use the same cached jar in multiple apps/modules rather than committing it in all of them - good if you've got a modularised application.

There are also other benefits like automatically downloading and linking sources and documentation (more relevant if using an IDE).



