Parallel and Distributed Execution

summary: A parallel build that does not require the build master or plugin author to concern themselves with writing thread-safe code.
status: Discussions are happening and we will iteratively add features for this after 1.0.
code: planned

Every build has plenty of parallelism which Gradle can exploit to complete the build as quickly as possible. For example, the tests for a particular project can be running while its dependent projects are built and tested. Gradle will provide a general purpose mechanism to spread the execution of a build across multiple processes on a given machine, and across multiple processes running on different machines. Gradle will extract the parallelism of a given build based on information already present in the project model (such as the project and task dependencies, task input and output files, and so on). It will do this without requiring that the build master or plugin author concern themselves with writing thread-safe code.
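
As a rough illustration of where that information already lives, consider a minimal multi-project build (the project names are made up): :app declares a compile dependency on :core, while :reports declares none, so the task graph already tells Gradle that :reports can be built alongside :core and :app, and that :core's tests can run while :app is compiling.

    // settings.gradle -- three illustrative subprojects
    include 'core', 'app', 'reports'

    // build.gradle (root) -- every subproject is a plain Java project
    subprojects {
        apply plugin: 'java'
    }

    // app/build.gradle -- app needs core's classes; reports declares no dependencies
    dependencies {
        compile project(':core')
    }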

Distributed execution is not just about speeding up the build. It allows a build to use resources without requiring them to be installed on the local machine. For example, a build user can build multiple native versions of an application without needing to run the build on each operating system. Or, functional tests can run against multiple different target platforms that may not be possible or easy to install locally, such as Selenium tests which use multiple versions of Internet Explorer. Distributed execution also opens up options to make the build independent of location. For example, a build user might be working at home or at a client site, and can use distributed execution to use a build farm at work, which can access internal repositories and other resources.

There will be very little effort involved in maintaining a build farm. The process for adding a build machine will be simple: unzip the Gradle distribution and run a script to start the daemon. Optionally, you might specify some meta-data about the build machine. The build master will be able to specify in the build script various constraints on this meta-data which a build machine must meet in order to run the build. Gradle will discover and make use of those machines which are available and meet the criteria. Gradle will offer some capabilities for optionally restarting the affected part of the build when a build machine fails. Gradle will expose state and operational capabilities over JMX, to allow centralised management.
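
To make the "constraints on meta-data" idea concrete, here is a purely hypothetical sketch of what such a declaration could look like in a build script. None of these blocks or properties exist in Gradle or any announced API; the names are invented solely to illustrate the idea.

    // Hypothetical sketch only -- the buildFarm block and every property in it are invented.
    buildFarm {
        requires {
            os 'windows'            // only agents that registered os=windows when their daemon started
            minMemory '4g'          // another piece of agent-supplied meta-data
            capability 'ie9'        // e.g. a machine with Internet Explorer 9 installed
        }
        onAgentFailure 'retryAffectedTasks'   // restart only the affected part of the build
    }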

Security will be baked into the platform. A build user will be authenticated with the build agent, and the build agent authenticated with the build user. Integration with existing enterprise authentication and authorisation infrastructure will be possible. On platforms where it is available, the builds for each build user will run in their own sandbox. Encrypted transport will be used to move files between the build user’s machine and the build agent.

The Gradle team might also look at being able to make use of the build agents provided by various CI tools. We will offer tighter integration with various popular monitoring and provisioning tools.

Have you considered writing a Jenkins plugin that is “Gradle smart”? While fine-grained parallelism isn’t a good match for Jenkins, some of your tasks for a build farm might be …

I have been waiting for this feature for quite a long time. During builds, a few of the processors on our build server seem to be bored; they could help reduce the overall build time.

I very much like the sound of this. I was wondering whether I could bake my own build parallelism using GPars, but was very much aware that the risks of getting things nastily wrong would increase if I managed the parallelism myself. Luke also mentioned that it’s not an easy thing to integrate GPars with Gradle at the moment. So to hear that distributed and parallelised builds are on the Gradle roadmap is great!

I think distributed builds are not the right direction for Gradle. As neat a feature as it would be, I doubt there would be a lot of adoption. Most companies are already tied into a CI infrastructure which they trust to run a server farm. Those CI servers are typically build tool agnostic. I don’t see how the argument can be made that they would maintain a second server farm that is tied to a single build tool. Likewise, doing anything distributed is HARD, and it would consume a massive amount of developer work which would be better spent on promoting Gradle and increasing adoption.

Now, parallel builds on the same machine would be a huge win. Everyone’s got CPUs sitting idle, and it doesn’t have any of the distributed file system issues, except for a shared cache that should already be taken care of with the new Wharf cache. It would fit right into the existing CI ecosystem, while also benefiting developers on their desktops. The task graph seems embarrassingly parallelizable.
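
To make the "embarrassingly parallelizable" point concrete, here is a minimal, self-contained Groovy sketch (unrelated to Gradle's actual internals, using a made-up toy graph) of scheduling a task graph onto a thread pool so that each task starts as soon as all of its dependencies have finished:

    import java.util.concurrent.CompletableFuture
    import java.util.concurrent.Executors

    // A toy task graph: task name -> names of the tasks it depends on.
    def graph = [
        ':core:compile' : [],
        ':core:test'    : [':core:compile'],
        ':app:compile'  : [':core:compile'],
        ':app:test'     : [':app:compile'],
        ':reports:build': []
    ]

    def pool = Executors.newFixedThreadPool(4)
    def futures = [:]

    // Build a CompletableFuture per task that only starts once all of its
    // dependencies' futures have completed. Because the map above lists
    // dependencies before dependants, a single pass is enough.
    graph.each { task, deps ->
        def depFutures = deps.collect { futures[it] } as CompletableFuture[]
        futures[task] = CompletableFuture.allOf(depFutures).thenRunAsync({
            println "running ${task} on ${Thread.currentThread().name}"
            Thread.sleep(200)   // stand-in for real work
        } as Runnable, pool)
    }

    CompletableFuture.allOf(futures.values() as CompletableFuture[]).join()
    pool.shutdown()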

One distributed feature which I can see getting adoption is distributed tests. It’s a feature that no one else has natively, which means in the meantime we have to hack it together manually, cobbling together groups of tests and running them separately. But even then, long-running tests tend to be integration tests which require the developer to consciously make them parallelizable, since they might be reaching out to a common external system. So even that isn’t a slam dunk. I can barely imagine getting developers to write tests capable of running in parallel on the same machine; getting them to get their tests working in a distributed fashion is almost unthinkable. I completely understand the argument that tests should be independent, blah, blah, blah. But the world just doesn’t work that way sometimes. At a minimum, there would need to be test framework level support.

I agree. The best possible Jenkins support would be the most helpful thing to implement, since many of us are stuck with multiple modules which are not built as a Gradle multi-module build, which means we have a horde of Jenkins jobs. Coordinating which upstream/downstream modules should be built should be in Gradle’s wheelhouse, just as running jobs in parallel is what Jenkins is good at. The existing Ivy support is poor, and it’s not Gradle-aware.

As a build architect, this is exactly the kind of stuff I’ve been struggling with. My software baseline will have hundreds of modules. How should I build them?

  • Jenkins Ivy plugin is great at handling upstream/downstream dependencies and module level parallelism across nodes. Unfortunately, it is a Jenkins specific solution, i.e., it doesn’t apply to those of us who like to build outside of Jenkins. For that, you have to have separate build logic.

  • Gradle multi-module support is great, but has a different DSL to express intra-module dependencies versus the DSL used to declare dependencies obtained via repositories.

I think Gradle needs to have a story that describes how the many people like us can use our existing CI tools with Gradle. My suggestion is to let Jenkins do what it is good at (cron jobs & chunking work across build nodes), while Gradle can be called the same way inside or outside Jenkins, using the same dependency DSL.

I’m definitely not complaining. I understand this is a work in progress. I look forward to seeing the result …

Hi Justin,

Thanks a lot for your feedback. I’m just going through the roadmap as part of the 1.0 release and saw your comment. Unfortunately I missed it when it was first posted.

We all agree that parallelizing would be nice :). We plan to do this not multi-threaded but multi-process anyway, so the step to distributing is not as big as it may seem. Additionally, we don’t plan to come up with our own distributed execution infrastructure. One option would be to use Jenkins as a distributed execution platform (below the job level). I had a chat with Kohsuke about this and he thinks this would make sense. Or we could set up temporary Jenkins jobs for dealing with that. We are mostly interested in these things:

  • Allow Gradle builds to be well partitionable. At the moment, Gradle multi-project builds are a little bit too coupled.

  • Provide a DSL that allows people to model the distribution requirements, e.g. for distributed integration tests. You would specify platform requirements, db, … Whether we generate Jenkins jobs out of them and execute them or fire them up somewhere else is an implementation detail.

Why I think a new, more powerful way of doing distributed builds is important

Exploratory learning and the local developer build

I see the dirty local working copy of a developer as a main use case for distributed builds. You are making changes and wondering what particular effect they might have. You might have very different questions you want to ask.

  • Do I still work with all the golden versions of the other modules?
  • Do I still work with the latest from the head of module Foo?
  • Do I still work with all the modules from the team that doesn’t like the direction I’m going and who are just waiting for me to break things?
  • Can someone run the long-running integration tests distributed, please?

There are many more questions you might want to ask. We don’t know them in detail, but we know that they exist. People will only ask these questions if they are easy to ask. We want to provide a toolkit that makes it easy to parameterize them. Distribution has two aspects: one is performance, the other is using multiple platforms.

Distributing a single build

There are huge legacy builds out there which are bound to one build pipeline. The current mechanisms of CI servers are not capable of easily distributing such a build.

Distributed integration tests

I agree with your point that the world of integration testing is, and will remain, dirty. This makes it even more important, though, to give people stronger tools. I’m sure that many integration tests are not distributed because it would be such a pain to set them up and maintain them on CI, although the distribution potential is high. If people could express all the special cases (e.g. test order relevance, non-parallelizable but distributable, platform requirements, DB set up) with a powerful toolkit, this would be a game changer in my opinion.
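
As a purely hypothetical sketch of what expressing those special cases might look like: nothing inside the distribution block exists in Gradle, and every name, package pattern, and platform string below is invented for illustration.

    // Hypothetical sketch -- the 'distribution' block and all of its properties are invented.
    task integTest(type: Test) {
        distribution {
            maxAgents 8
            preserveOrder 'com.acme.legacy.*'   // order-relevant tests stay together, in order
            serialized 'com.acme.db.*'          // distributable, but must not run in parallel
            requires {
                platform 'linux-x86_64'
                database 'oracle-11g'           // the agent must provide (or provision) this DB
            }
        }
    }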

I think a programmatic way to express your distribution requirements for a build is more and more a matter of survival in many of the complex builds we have encountered.

Hello Hans,

Would it be possible to split this roadmap item into two separate items, one for parallel builds and another for distributed builds?

That way the parallel build feature could be shipped sooner, instead of waiting for the distributed build feature, which will be more complex (authentication and discovery of build daemons on the network, etc.).

Also, I am a bit concerned that you mention parallel builds would be multi-process. Gradle has a pretty high startup overhead, to the point that you have to use the Gradle daemon and stuff like that to alleviate it. Starting multiple Gradle processes for a parallel build could negate a large part of the performance gained from parallelizing, especially if it’s an incremental build that needs to do less work than a full rebuild.

I think it would be much better and faster if the parallel builds were implemented using threads (or executors, or fork/join) in a single Gradle process.

The roadmap is misleading here and also not fully up to date. We will tackle those things iteratively. In Gradle 1.2 we introduced an incubating parallel build feature (not production ready yet) which is multi-threaded. Have a look at the release notes. We will soon revamp our roadmap in some form so that it stays more in tune with the current planning process.
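
For reference, the incubating feature is switched on from the command line and can be combined with the daemon, which addresses the startup-overhead concern raised above (the parallel mode is incubating and may still change):

    # gradle.properties -- keep a warm Gradle process between builds
    org.gradle.daemon=true

    # command line -- incubating parallel project execution, introduced in Gradle 1.2
    gradle build --parallel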

Some (quasi-random) thoughts I’ve had about this:

There are two parts to distributed builds:

  • distributed data
  • distributed processing

The distributed data part can be solved by having a more abstract notion of a ‘file’ and/or with a distributed file system. For example, such a ‘file’ can be a URL (e.g. pointing to an SCM and file location) or it can include inputs (e.g. compiler flags, location of dependencies, etc.) that are hashed to find the location on a distributed file system.

I see two different kinds of file systems (whether accessed via URLs, using xfs, etc.): one is read-only and is meant for files stored in SCM; the other is write-once and is meant for output files. The second file system allows for re-use of outputs (e.g. outputs created by CI systems).
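
Here is a minimal Groovy sketch of that write-once, hash-addressed idea (the key derivation and cache URL scheme are made up for illustration): the location of an output is a pure function of everything that went into producing it, so any machine that computes the same hash finds the same output.

    import java.security.MessageDigest

    // Derive a content-addressed key from everything that influences the output:
    // the tool arguments plus the contents of the input files, in a stable order.
    def cacheKey(List<File> inputs, List<String> toolArgs) {
        def digest = MessageDigest.getInstance('SHA-256')
        toolArgs.each { digest.update(it.getBytes('UTF-8')) }
        inputs.sort { it.path }.each { digest.update(it.bytes) }
        digest.digest().encodeHex().toString()
    }

    def src = File.createTempFile('Foo', '.java')
    src.text = 'class Foo {}'
    def key = cacheKey([src], ['-source', '1.6', '-g'])
    println "cache://outputs/${key}/classes"   // write-once: same inputs always map to the same location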

The distributed processing part can be solved with a message-passing or actor-based architecture (think Akka). This can also be used to enable multi-threaded builds on one machine.

An actor will receive a message that contains inputs (eg file contents of inputs not stored in any distributed file system, file and tool dependencies in the form of abstract files, compiler flags, etc) and outputs (eg file locations in the form of abstract files). If all dependencies are obtainable (eg they have already been built), all the actor needs to do is build the outputs. Otherwise, it sends a message to the system for each dependency that’s missing and a message to build the outputs it was first asked to build.
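
A toy, single-threaded Groovy rendering of that message flow (task names are made up; a real system would run one such loop per worker and route messages between machines): a request whose dependencies are missing triggers one request per missing dependency plus a re-send of the original request.

    import java.util.concurrent.LinkedBlockingQueue

    def deps = [':app:jar': [':core:jar'], ':core:jar': []]   // dependency info per task
    def built = [] as Set
    def mailbox = new LinkedBlockingQueue([':app:jar'])       // the initial build request

    while (!mailbox.empty) {
        def task = mailbox.take()
        def missing = deps[task].findAll { !built.contains(it) }
        if (missing) {
            missing.each { mailbox.put(it) }   // ask the system to build each missing dependency
            mailbox.put(task)                  // ...and re-send the original request
        } else if (!built.contains(task)) {
            println "building ${task}"         // all inputs are obtainable, so just build
            built << task
        }
    }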

This is a really rough draft so there’s likely lots missing (eg how do the actors know what the dependencies are in order to create a new message?) but I’m hoping to bring possibly new ideas into this discussion.