Integrating Babashka into Bazel
How to get fast Clojure actions in the best-worst build tool
Bazel is the open-source version of Google's internal build tool. It's terrible in many ways, but it can do amazing things if used correctly. If you're completely unfamiliar with Bazel, check out the "Using Bazel" and "Extending Bazel" sections on the tool's homepage. This tutorial is no substitute for those basics, but rather a demonstration of how to create something novel and actually useful.
Babashka is a Clojure executable with a bunch of built-in libraries. It improves upon Clojure by having near-instantaneous process start-up. This is achieved by first writing a Small Clojure Interpreter that does not rely on classloading, and then compiling it using GraalVM.
In this tutorial, we'll learn how to wrap Babashka so that we can use it from Bazel to build files, execute actions and write tests with the logic for these operations written in Clojure.
These Bazel rules are based on a very similar set of rules I recently implemented at work. If any of this seems interesting to you, we're hiring!
The final version of the code produced herein can be found here.
Basic setup
We begin with an empty directory, perhaps with version control initialized.
To start, we need to tell Bazel how to find the babashka executable (called bb). This is accomplished by adding an entry in the WORKSPACE file at the root of our, ehh, workspace. This is where all external dependencies of our build system are specified, such as third-party toolchains and downloadable files.
At (1), we import the http_archive workspace rule that allows us to fetch arbitrary archives from the internet and make them available.
Since most software projects don't use Bazel, we can provide a BUILD.bazel file using the optional build_file_content parameter to http_archive at (2). In this case, since the archive contains only the compiled bb executable, we can use the built-in exports_files function to make it visible. The three leading and trailing double quotes delimit a multi-line string literal, which lets us use ordinary double quotes for the filename ("bb") without escaping them.
We specify the URL from the "Releases" tab on the project's GitHub page at (3), along with an optional checksum. Astute readers may have noticed that this release only works for Linux users, and they would be correct. We will address this later.
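For reference, a WORKSPACE entry matching the description above might look like the following sketch. The URL is a placeholder rather than the real release address, and the checksum is left out; take both from the Releases page for the version you want.

```starlark
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")  # (1)

http_archive(
    name = "babashka",
    # (2) The archive ships no BUILD file, so we supply one inline.
    build_file_content = """exports_files(["bb"])""",
    # (3) Placeholder URL: replace with the Linux archive from the Releases page.
    urls = ["https://example.com/path/to/babashka-linux.tar.gz"],
    # Optional but recommended: sha256 = "<checksum of the archive>",
)
```

The name given here is what makes the executable addressable as @babashka//:bb.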
This file should now be available in the workspace. We can verify this by querying all files defined within the @babashka repository:
$ bazel query "@babashka//...:*"
...
@babashka//:bb # (1)
@babashka//:BUILD.bazel
Line (1) is the Bazel target that we'll use to refer to the executable from the rules that we will be implementing next.
Producing files: bb_genrule
Re-usable build abstractions are defined as rules in Bazel. These must be written in files that end in .bzl and are split between interface and implementation.
The primary function of a build system is to produce files by running commands on other files. We will encapsulate this basic operation for the specific case of commands written in Clojure in a rule called bb_genrule. This rule is named in analogy to Bazel's built-in genrule, which runs arbitrary shell commands and produces one or more output files.
Basic rule skeleton
This is the basic body of any Bazel rule. We have an implementation function that we leave empty for now at (1).
The rule function defines the actual rule and refers to the implementation function. It also specifies the interface that we will use to invoke the rule later on in the attrs parameter.
We specify a script parameter which will be the Clojure file that we want Babashka to execute at (2).
We also specify what we want the output file to be called at (3). Lastly, we specify an implicit parameter that won't be specified at the usage site to inject the bb executable that we provisioned earlier at (4).
Further choices and options for attributes can be found in the Bazel docs.
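Pieced together from the diffs later in this tutorial, the skeleton being described is roughly the following. Treat the exact layout as a reconstruction rather than the original listing.

```starlark
# babashka.bzl
def _bb_genrule_impl(ctx):
    pass  # (1) implementation left empty for now

bb_genrule = rule(
    implementation = _bb_genrule_impl,
    attrs = {
        # (2) The Clojure script that Babashka should execute.
        "script": attr.label(allow_single_file = [".clj"], mandatory = True),
        # (3) The name of the file this rule produces.
        "out": attr.output(mandatory = True),
        # (4) Implicit attribute (leading underscore): not settable by users.
        "_bb": attr.label(
            executable = True,
            allow_single_file = True,
            cfg = "exec",
            default = "@babashka//:bb",
        ),
    },
)
```

The leading underscore in _bb is what makes the attribute private to the rule definition.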
Invoking the rule
As an example, we will use bb_genrule to define a target babashka_metadata that produces a file bb-metadata.edn that contains information about the version of Babashka that we provisioned.
# BUILD.bazel
load("//:babashka.bzl", "bb_genrule")  # (1)

bb_genrule(
    name = "babashka_metadata",
    script = ":get_babashka_metadata.clj",  # (2)
    out = "bb-metadata.edn",
)
Rule invocation occurs in files named BUILD.bazel. To be able to use the rule we just defined, we need to import it at (1). We invoke the rule with a script from the same directory that we will create promptly at (2).
We also write this script that just prints the babashka version to the terminal. This won't work yet, but it'll allow us to get one step closer to a working rule:
;; get_babashka_metadata.clj
(ns get-babashka-metadata
  (:require
   [clojure.pprint :refer [pprint]]))

(let [metadata {:version (System/getProperty "babashka.version")}]
  (pprint metadata))
With all the necessary pieces in place, we can give it a try:
Bazel recognizes the target, but errors when asked to build it. This is because it can tell from the implementation of the rule that the file bb-metadata.edn is not being produced yet at (1). Let's fix that!
Actually doing the work
--- babashka.bzl.0
+++ babashka.bzl.1
@@ -1,6 +1,13 @@
 # babashka.bzl
 def _bb_genrule_impl(ctx):
-    pass
+    ctx.actions.run(  # (1)
+        inputs = [ctx.file.script],  # (2)
+        outputs = [ctx.outputs.out],  # (3)
+        executable = ctx.executable._bb,  # (4)
+        arguments = [
+            ctx.file.script.path,  # (5)
+        ],
+    )

 bb_genrule = rule(
     implementation = _bb_genrule_impl,
By calling the run action at (1), we tell Bazel to call a program with the arguments that we specify.
In our case, the program we want to call can be found under the ctx.executable._bb field (at (4)) because of how we specified the _bb attribute in the rule interface.
The only argument we pass, for now, is the path to the script at (5), since that will cause Babashka to execute the file.
We also need to specify the input files (at (2)) and expected output files (at (3)) of this execution, otherwise they won't be available within the sandbox that Bazel uses to isolate commands.
Now that Bazel knows what to do, we can try again:
Still broken, but as we see at (1), our script is being run and producing output, in accordance with the call to clojure.pprint/pprint.
The issue is that we are not writing the EDN to the output file. To do that, we need to pass the path of the output file to the script:
--- get_babashka_metadata.clj.0
+++ get_babashka_metadata.clj.1
@@ -1,7 +1,10 @@
 ;; get_babashka_metadata.clj
 (ns get-babashka-metadata
-  (:require
-   [clojure.pprint :refer [pprint]]))
+  (:require [clojure.edn :as edn]
+            [clojure.java.io :as io]
+            [clojure.pprint :refer [pprint]]))

-(let [metadata {:version (System/getProperty "babashka.version")}]
-  (pprint metadata))
+(let [{:keys [out-file]} (edn/read-string (first *command-line-args*)) ; (1)
+      metadata {:version (System/getProperty "babashka.version")}]
+  (pprint metadata)
+  (spit (io/file out-file) metadata)) ; (2)
At (1), we modified our script to parse the first command line argument as EDN and then bind the out-file key.
At (2), we then write the map to that path.
To supply that first argument, we need to slightly change the way bb is called:
--- babashka.bzl.1
+++ babashka.bzl.2
@@ -6,6 +6,11 @@
         executable = ctx.executable._bb,
         arguments = [
             ctx.file.script.path,
+            """{{
+              :out-file "{out_file}"
+            }}""".format(
+                out_file = ctx.outputs.out.path,
+            ),
         ],
     )
This is how we create a Clojure map ({:some-key "its value"}) in Starlark and pass it as a command line argument: the triple quotes let us wrap the file path in ordinary double quotes without escaping them. The double braces turn into single braces, whereas {out_file} gets replaced with the substitution that we specify in the call to format.
Keep in mind that, despite this being the second argument to bb, it's the first argument that the script will see.
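To make the substitution concrete, here is what the formatting step evaluates to for a made-up output path. The path is purely illustrative; Bazel chooses the real one.

```starlark
# Doubled braces are escapes for literal braces; {out_file} is a placeholder.
edn_arg = """{{:out-file "{out_file}"}}""".format(
    out_file = "bazel-out/example/bin/bb-metadata.edn",  # hypothetical path
)

# edn_arg is now the string:
#   {:out-file "bazel-out/example/bin/bb-metadata.edn"}
```

The resulting string is valid EDN, which is why the script can hand it straight to edn/read-string.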
Now, building the target succeeds:
Bazel helpfully prints the path, relative to the workspace root, where the output file can be found at (1).
As we see at (2), the output is consistent with the version that we downloaded in the WORKSPACE file.
Including more dependencies
It's not unusual to have additional files as dependencies to a build step. To support this in our rule, we have to add an attribute:
We add the data attribute as a list of targets and files and include it in the EDN map that is the first argument as the :data key, formatted as a Clojure vector. It's important to add the files to the run action inputs, otherwise there will be no files visible to the script at the paths under the :data key!
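The change might look like the following sketch. The data attribute and the extended inputs list match the later diffs in this article; how the :data vector is assembled is my assumption and may differ from the original code.

```starlark
# babashka.bzl (sketch of bb_genrule with a data attribute)
def _bb_genrule_impl(ctx):
    # Assumption: render each data file path as a quoted EDN string.
    data_paths = " ".join(['"{}"'.format(f.path) for f in ctx.files.data])
    ctx.actions.run(
        # Data files must be declared as inputs to be visible in the sandbox.
        inputs = [ctx.file.script] + ctx.files.data,
        outputs = [ctx.outputs.out],
        executable = ctx.executable._bb,
        arguments = [
            ctx.file.script.path,
            """{{
              :out-file "{out_file}"
              :data [{data}]
            }}""".format(
                out_file = ctx.outputs.out.path,
                data = data_paths,
            ),
        ],
    )

bb_genrule = rule(
    implementation = _bb_genrule_impl,
    attrs = {
        "script": attr.label(allow_single_file = [".clj"], mandatory = True),
        "out": attr.output(mandatory = True),
        "data": attr.label_list(allow_files = True),
        "_bb": attr.label(
            executable = True,
            allow_single_file = True,
            cfg = "exec",
            default = "@babashka//:bb",
        ),
    },
)
```

Note that this naive quoting would break on paths containing double quotes or backslashes, which is acceptable for a sketch but worth handling in real rules.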
To test this attribute, we can create a dummy file, add it to the data argument of babashka_metadata, and modify the script to read the data argument and include the contents of the files in its output:
--- BUILD.bazel.0
+++ BUILD.bazel.1
@@ -5,4 +5,7 @@
     name = "babashka_metadata",
     script = ":get_babashka_metadata.clj",
     out = "bb-metadata.edn",
+    data = [
+        ":DUMMY",
+    ],
 )
--- get_babashka_metadata.clj.1
+++ get_babashka_metadata.clj.2
@@ -4,7 +4,8 @@
             [clojure.java.io :as io]
             [clojure.pprint :refer [pprint]]))

-(let [{:keys [out-file]} (edn/read-string (first *command-line-args*))
-      metadata {:version (System/getProperty "babashka.version")}]
+(let [{:keys [data out-file]} (edn/read-string (first *command-line-args*))
+      metadata {:version (System/getProperty "babashka.version")
+                :data (mapv slurp data)}]
   (pprint metadata)
   (spit (io/file out-file) metadata))
If we re-run the build now, we can see the updated output:
Performing side-effects: bb_binary
Next, we want to be able to integrate tasks that occur during deployment into Bazel, and we want to write the logic for those in Clojure as well.
Examples of such tasks could be uploading an artifact to a remote or sending a notification on Slack after a successful deployment.
To support this, we will add another rule:
--- babashka.bzl.3
+++ babashka.bzl.4
@@ -30,3 +30,40 @@
         ),
     },
 )
+
+def _bb_binary_impl(ctx):
+    executable = ctx.actions.declare_file(ctx.label.name)  # (1)
+    ctx.actions.write(  # (2)
+        output = executable,
+        is_executable = True,
+        content = """
+        set -x
+        exec {bb} {src} {arguments} "$@" # (3)
+        """.format(
+            bb = ctx.executable._bb.path,
+            src = ctx.file.src.path,
+            arguments = " ".join(ctx.attr.arguments),
+        ),
+    )
+
+    return DefaultInfo(
+        executable = executable,  # (4)
+    )
+
+bb_binary = rule(
+    implementation = _bb_binary_impl,
+    executable = True,  # (5)
+    attrs = {
+        "src": attr.label(
+            allow_single_file = [".clj"],
+            mandatory = True,
+        ),
+        "arguments": attr.string_list(),
+        "_bb": attr.label(
+            executable = True,
+            allow_single_file = True,
+            cfg = "exec",
+            default = "@babashka//:bb",
+        ),
+    },
+)
The basic structure of this rule should look familiar by now. However, there are some differences worth calling out:
We want targets created by this rule to be executable via bazel run. This means we are creating an executable rule, as seen at (5).
Before we can run an executable target, it needs to be built. This is what we are actually doing in the implementation function, by declaring a file at (1) and then creating that file by writing to it at (2). This file will simply contain shell commands, the operative one being exec (at (3)) to start the actual command. Note that the command won't run at this stage of the build; it is simply being written to a file in the build sandbox.
It is expected for an executable rule to return a DefaultInfo provider with the executable field set to the file that will be executed, as seen at (4).
Take note of the trailing "$@": this enables injection of trailing arguments from the Bazel invocation (e.g. bazel run //:binary_target -- foo bar).
To inspect the result of the expansion, we can define a target and build it:
--- BUILD.bazel.1
+++ BUILD.bazel.2
@@ -1,5 +1,5 @@
 # BUILD.bazel
-load("//:babashka.bzl", "bb_genrule")
+load("//:babashka.bzl", "bb_genrule", "bb_binary")

 bb_genrule(
     name = "babashka_metadata",
@@ -8,4 +8,9 @@
     data = [
         ":DUMMY",
     ],
+)
+
+bb_binary(
+    name = "say_hello",
+    src = ":hello.clj",
 )
$ bazel build //:say_hello
INFO: Analyzed target //:say_hello (5 packages loaded, 8 targets configured).
INFO: Found 1 target...
Target //:say_hello up-to-date:
bazel-bin/say_hello
INFO: Elapsed time: 2.320s, Critical Path: 0.02s
INFO: 2 processes: 2 internal.
INFO: Build completed successfully, 2 total actions
$ cat bazel-bin/say_hello
set -x
exec external/babashka/bb hello.clj "$@"
%
However, when we try to run it, there is an issue:
As we see on the last line, some files can't be found. To fix this, we need to add the runfiles attribute to the DefaultInfo provider that we return:
--- babashka.bzl.4
+++ babashka.bzl.5
@@ -48,6 +48,7 @@

     return DefaultInfo(
         executable = executable,
+        runfiles = ctx.runfiles(files = [ctx.executable._bb, ctx.file.src]),
     )

 bb_binary = rule(
Just because a file is available at build-time doesn't mean it will be available at run-time. To make it available, we add it to the runfiles attribute of the provider, which expects a specific data structure (created here with ctx.runfiles) that wraps the files to make available.
For this simple form of the bb_binary rule, the only files needed at runtime are the bb executable and the script in the src attribute. If we added a data attribute similar to what we did for bb_genrule, we'd add those files to the runfiles as well.
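As a sketch of that last point: assuming we added a hypothetical "data": attr.label_list(allow_files = True) entry to the attributes of bb_binary, the implementation could pass those files along like this. This extension is not part of the rules built in this tutorial.

```starlark
# babashka.bzl (sketch; relies on a hypothetical "data" attribute on bb_binary)
def _bb_binary_impl(ctx):
    executable = ctx.actions.declare_file(ctx.label.name)
    ctx.actions.write(
        output = executable,
        is_executable = True,
        content = """
        set -x
        exec {bb} {src} {arguments} "$@"
        """.format(
            bb = ctx.executable._bb.path,
            src = ctx.file.src.path,
            arguments = " ".join(ctx.attr.arguments),
        ),
    )

    return DefaultInfo(
        executable = executable,
        runfiles = ctx.runfiles(
            # Everything the script needs at run-time goes into the runfiles.
            files = [ctx.executable._bb, ctx.file.src] + ctx.files.data,
        ),
    )
```

Unlike bb_genrule, no inputs list is involved here, because the generated launcher only runs after the build has finished.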
With this inconspicuous change in place, our rule now works correctly:
$ cat hello.clj
(println "hello there!")
$ bazel run //:say_hello
INFO: Analyzed target //:say_hello (5 packages loaded, 8 targets configured).
INFO: Found 1 target...
Target //:say_hello up-to-date:
bazel-bin/say_hello
INFO: Elapsed time: 2.320s, Critical Path: 0.02s
INFO: 2 processes: 2 internal.
INFO: Build completed successfully, 2 total actions
++ exec external/babashka/bb hello.clj
hello there!
Checking artifacts: bb_test
The last use-case we want to cover is that of writing tests in Clojure and executing them via Bazel. Test rules in Bazel are little more than executable rules that have a special meaning associated with their exit status. They also run under even stricter sandboxing than executable rules, in an effort to improve test determinism.
--- babashka.bzl.5
+++ babashka.bzl.6
@@ -51,20 +51,28 @@
         runfiles = ctx.runfiles(files = [ctx.executable._bb, ctx.file.src]),
     )

+EXEC_ATTRS = {
+    "src": attr.label(
+        allow_single_file = [".clj"],
+        mandatory = True,
+    ),
+    "arguments": attr.string_list(),
+    "_bb": attr.label(
+        executable = True,
+        allow_single_file = True,
+        cfg = "exec",
+        default = "@babashka//:bb",
+    ),
+}
+
 bb_binary = rule(
     implementation = _bb_binary_impl,
     executable = True,
-    attrs = {
-        "src": attr.label(
-            allow_single_file = [".clj"],
-            mandatory = True,
-        ),
-        "arguments": attr.string_list(),
-        "_bb": attr.label(
-            executable = True,
-            allow_single_file = True,
-            cfg = "exec",
-            default = "@babashka//:bb",
-        ),
-    },
+    attrs = EXEC_ATTRS,
+)
+
+bb_test = rule(
+    implementation = _bb_binary_impl,
+    test = True,
+    attrs = EXEC_ATTRS,
 )
Our bb_test rule is in fact so similar to the bb_binary rule that we can use the same attributes and implementation function.
We can use this rule to write a test that all our Clojure files are named in the atavistic-seeming naming convention inherited from its origins on the JVM: that source file paths and names may not contain dashes.
This script receives the files in the current directory as command line arguments. It first removes all files that aren't Clojure files at (1) and then checks if any of those remaining have paths that contain any but the allowed characters at (2).
If any are found, it prints their names to stdout at (3) before exiting with a non-zero status, which signals failure of the test, at (4).
--- BUILD.bazel.2
+++ BUILD.bazel.3
@@ -1,5 +1,5 @@
 # BUILD.bazel
-load("//:babashka.bzl", "bb_genrule", "bb_binary")
+load("//:babashka.bzl", "bb_genrule", "bb_binary", "bb_test")

 bb_genrule(
     name = "babashka_metadata",
@@ -13,4 +13,10 @@
 bb_binary(
     name = "say_hello",
     src = ":hello.clj",
+)
+
+bb_test(
+    name = "check_filenames",
+    src = ":check_filenames.clj",
+    arguments = glob(["*", ".*"]),  # (1)
 )
By using the glob function (at (1)), we inject all files in the directory as arguments to the test.
If we create a file that violates our criteria for a valid filename and run the test, we can see the test fail, and the report stating which file caused it to:
$ touch foo-bar.clj
$ bazel test --test_output=errors //:check_filenames
INFO: Build option --test_sharding_strategy has changed, discarding analysis cache.
INFO: Analyzed target //:check_filenames (0 packages loaded, 281 targets configured).
INFO: Found 1 test target...
FAIL: //:check_filenames (see $LONGPATH/testlogs/check_filenames/test.log)
INFO: From Testing //:check_filenames:
==================== Test output for //:check_filenames:
++ exec external/babashka/bb check_filenames.clj .gitignore BUILD.bazel DUMMY WORKSPACE babashka.bzl check_filenames.clj foo-bar.clj get_babashka_metadata.clj hello.clj
Files with invalid paths:
foo-bar.clj
================================================================================
Target //:check_filenames up-to-date:
bazel-bin/check_filenames
INFO: Elapsed time: 0.327s, Critical Path: 0.11s
INFO: 2 processes: 2 linux-sandbox.
INFO: Build completed, 1 test FAILED, 2 total actions
//:check_filenames FAILED in 0.1s
$LONGPATH/testlogs/check_filenames/test.log
INFO: Build completed, 1 test FAILED, 2 total actions
After we delete the offending file, the test succeeds:
$ rm -f foo-bar.clj
$ bazel test --test_output=errors //:check_filenames
INFO: Analyzed target //:check_filenames (4 packages loaded, 7 targets configured).
INFO: Found 1 test target...
Target //:check_filenames up-to-date:
bazel-bin/check_filenames
INFO: Elapsed time: 0.264s, Critical Path: 0.10s
INFO: 3 processes: 1 internal, 2 linux-sandbox.
INFO: Build completed successfully, 3 total actions
//:check_filenames PASSED in 0.1s
Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 3 total actions
Further steps for this rule might be to accept a data attribute and to implement a runner script so that users may write tests in the usual clojure.test style.
From Cupertino, with Love: bb_toolchain
The last issue that remains is that this set of rules is only usable on Linux machines. It doesn't work on macOS devices, and let's not even mention other operating systems!
To remedy this, we can take advantage of a feature built into Bazel: toolchains.
Toolchains address the problem of providing different versions for some of our tools, depending on what platform we are on. They can also be used in more sophisticated ways to enable cross-compilation.
Defining a new toolchain involves implementing a new rule for that toolchain, creating targets with that rule for each platform we want to support, and then registering those in our WORKSPACE file.
Implementing the toolchain rule
--- babashka.bzl.6
+++ babashka.bzl.7
@@ -1,4 +1,20 @@
 # babashka.bzl
+def _bb_toolchain(ctx):
+    return platform_common.ToolchainInfo(
+        bb = ctx.executable.bb,  # (1)
+    )
+
+bb_toolchain = rule(
+    implementation = _bb_toolchain,
+    attrs = {
+        "bb": attr.label(  # (2)
+            executable = True,
+            allow_single_file = True,
+            cfg = "exec",
+        ),
+    },
+)
+
 def _bb_genrule_impl(ctx):
     ctx.actions.run(
         inputs = [ctx.file.script] + ctx.files.data,
The implementation just returns a platform_common.ToolchainInfo provider. This provider accepts arbitrary fields; in our case we only have one for the bb executable.
Our rule interface therefore only has one attribute, and it should look very similar to the _bb attribute of the rules we implemented already, save for the default value.
Installing the toolchain
Before we can install the toolchain, we should make sure the executable will be available for all the platforms that we want to support:
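A sketch of what the WORKSPACE might contain at this point is shown below. The repository names babashka-linux and babashka-macos are the ones referenced by the BUILD.bazel file in this section; the URLs are placeholders, and the checksums are omitted.

```starlark
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "babashka-linux",
    build_file_content = """exports_files(["bb"])""",
    # Placeholder URL: use the Linux archive from the Releases page.
    urls = ["https://example.com/path/to/babashka-linux.tar.gz"],
)

http_archive(
    name = "babashka-macos",
    build_file_content = """exports_files(["bb"])""",
    # Placeholder URL: use the macOS archive from the Releases page.
    urls = ["https://example.com/path/to/babashka-macos.tar.gz"],
)
```

Bazel fetches external repositories lazily, so declaring both archives should not force a Linux machine to download the macOS one.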
Now that we have the required files, we can instantiate the toolchain rule twice:
--- BUILD.bazel.3
+++ BUILD.bazel.4
@@ -1,5 +1,35 @@
 # BUILD.bazel
-load("//:babashka.bzl", "bb_genrule", "bb_binary", "bb_test")
+load("//:babashka.bzl", "bb_genrule", "bb_binary", "bb_test", "bb_toolchain")
+
+toolchain_type(name = "babashka_toolchain")  # (1)
+
+bb_toolchain(  # (2)
+    name = "bb_linux",
+    bb = "@babashka-linux//:bb",
+)
+
+toolchain(
+    name = "bb_linux_toolchain",
+    exec_compatible_with = [
+        "@platforms//os:linux",  # (3)
+    ],
+    toolchain = ":bb_linux",
+    toolchain_type = ":babashka_toolchain",  # (4)
+)
+
+bb_toolchain(
+    name = "bb_macos",
+    bb = "@babashka-macos//:bb",
+)
+
+toolchain(
+    name = "bb_macos_toolchain",
+    exec_compatible_with = [
+        "@platforms//os:macos",
+    ],
+    toolchain = ":bb_macos",
+    toolchain_type = ":babashka_toolchain",
+)

 bb_genrule(
     name = "babashka_metadata",
To group the two instances, we define a new toolchain type at (1).
After instantiating the bb_toolchain rule (at (2)), we need to also call the built-in toolchain rule to indicate compatibility (at (3)) and type (at (4)).
With the toolchains created, we need to register them for use in the WORKSPACE file:
Note that the toolchain name that we specify at (1) is that of the toolchain rule, not of the bb_toolchain rule!
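Using the target names from the BUILD.bazel file above, the registration would look roughly like this sketch:

```starlark
# WORKSPACE
register_toolchains(
    "//:bb_linux_toolchain",  # (1) the toolchain() target, not bb_linux
    "//:bb_macos_toolchain",
)
```

Bazel then picks whichever registered toolchain is compatible with the execution platform.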
Using the toolchain
Lastly, we need to change our existing rules to take advantage of the new toolchain:
--- babashka.bzl.7
+++ babashka.bzl.8
@@ -16,10 +16,11 @@
 )

 def _bb_genrule_impl(ctx):
+    toolchain = ctx.toolchains["//:babashka_toolchain"]  # (1)
     ctx.actions.run(
         inputs = [ctx.file.script] + ctx.files.data,
         outputs = [ctx.outputs.out],
-        executable = ctx.executable._bb,
+        executable = toolchain.bb,  # (2)
         arguments = [
             ctx.file.script.path,
             """{{
@@ -38,16 +39,12 @@
         "script": attr.label(allow_single_file = [".clj"], mandatory = True),
         "out": attr.output(mandatory = True),
         "data": attr.label_list(allow_files = True),
-        "_bb": attr.label(
-            executable = True,
-            allow_single_file = True,
-            cfg = "exec",
-            default = "@babashka//:bb",
-        ),
     },
+    toolchains = ["//:babashka_toolchain"],  # (3)
 )

 def _bb_binary_impl(ctx):
+    toolchain = ctx.toolchains["//:babashka_toolchain"]
     executable = ctx.actions.declare_file(ctx.label.name)
     ctx.actions.write(
         output = executable,
@@ -56,7 +53,7 @@
         set -x
         exec {bb} {src} {arguments} "$@"
         """.format(
-            bb = ctx.executable._bb.path,
+            bb = toolchain.bb.path,
             src = ctx.file.src.path,
             arguments = " ".join(ctx.attr.arguments),
         ),
@@ -64,7 +61,7 @@

     return DefaultInfo(
         executable = executable,
-        runfiles = ctx.runfiles(files = [ctx.executable._bb, ctx.file.src]),
+        runfiles = ctx.runfiles(files = [toolchain.bb, ctx.file.src]),
     )

 EXEC_ATTRS = {
@@ -73,22 +70,18 @@
         mandatory = True,
     ),
     "arguments": attr.string_list(),
-    "_bb": attr.label(
-        executable = True,
-        allow_single_file = True,
-        cfg = "exec",
-        default = "@babashka//:bb",
-    ),
 }

 bb_binary = rule(
     implementation = _bb_binary_impl,
     executable = True,
     attrs = EXEC_ATTRS,
+    toolchains = ["//:babashka_toolchain"],
 )

 bb_test = rule(
     implementation = _bb_binary_impl,
     test = True,
     attrs = EXEC_ATTRS,
+    toolchains = ["//:babashka_toolchain"],
 )
This involves adding the toolchain to the rule (at (3)) and then looking it up inside the implementation (at (1)) to replace all references to what was previously the _bb attribute, such as the executable at (2).
This also means we can remove the _bb attribute completely.
If we did everything correctly our build should work as previously, but now our comrades on macOS can benefit from the rules as well. I don't have a Mac, so you'll just have to believe me that it works!
$ bazel build //:babashka_metadata
INFO: Analyzed target //:babashka_metadata (0 packages loaded, 3 targets configured).
INFO: Found 1 target...
Target //:babashka_metadata up-to-date:
bazel-bin/bb-metadata.edn
INFO: Elapsed time: 0.127s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
$ bazel run //:say_hello
INFO: Analyzed target //:say_hello (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:say_hello up-to-date:
bazel-bin/say_hello
INFO: Elapsed time: 0.069s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
++ exec external/babashka-linux/bb hello.clj
hello there!
Epilogue
In this form, this set of rules is already quite useful and can cover a variety of tasks in a CI/CD pipeline. Some things could be done to make them even more useful:
- documenting all rule attributes
- adding tests for rules
- adding a flag to bb_binary to run the executable outside the sandbox, on the actual repository directory, as can be done using some shell variables that are available to executable rules. This is not possible for tests, however.
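As a rough sketch of the last idea: bazel run exports the BUILD_WORKSPACE_DIRECTORY environment variable, which points at the root of the checked-out workspace. The in_workspace attribute below is hypothetical (it would need a matching attr.bool entry on bb_binary only), and the whole block is an untested outline rather than part of the rules above.

```starlark
# babashka.bzl (sketch; "in_workspace" is a hypothetical attr.bool on bb_binary)
def _bb_binary_impl(ctx):
    toolchain = ctx.toolchains["//:babashka_toolchain"]
    executable = ctx.actions.declare_file(ctx.label.name)

    # Only change directory when the (hypothetical) flag is set.
    chdir = 'cd "$BUILD_WORKSPACE_DIRECTORY"' if ctx.attr.in_workspace else ""

    ctx.actions.write(
        output = executable,
        is_executable = True,
        content = """
        set -x
        # Resolve absolute paths before changing directory, because the
        # paths Bazel gives us are relative to the runfiles directory.
        bb="$PWD/{bb}"
        src="$PWD/{src}"
        {chdir}
        exec "$bb" "$src" {arguments} "$@"
        """.format(
            bb = toolchain.bb.path,
            src = ctx.file.src.path,
            arguments = " ".join(ctx.attr.arguments),
            chdir = chdir,
        ),
    )

    return DefaultInfo(
        executable = executable,
        runfiles = ctx.runfiles(files = [toolchain.bb, ctx.file.src]),
    )
```

Because bb_test shares this implementation and EXEC_ATTRS, a real version would need either a separate implementation function or a separate attribute dictionary for bb_binary.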