dumb little games for weekend tinkering
I have yet to do anything smart with it, though.
This book assumes that you already know how to program; I’m not going to waste your time explaining what a for loop is. Specifically this book assumes that you already know how to program in JavaScript, because it’s better to have something concrete to diff against, and I’m willing to bet that even if it isn’t your first language, you know enough JavaScript to be able to follow along. And if you don’t, then, well, you’ll probably still be fine. All programming languages are basically the same.
With that in mind I’m going to emphasize what makes Janet different early on. I’ll cover the whole language, but I’ll talk about macros and images and PEGs before I talk about, like, if statements. Is that a good way for a book to present information? I don’t know. We’re going to find out together.
This book is a real book, as previously established. But it is also undeniably a website, at this particular moment. As such it has the full power of cyberspace at its disposal, and there are some features you should be aware of before we get started.
The first is that this book contains a repl, and you can summon it whenever you’d like by pressing the escape key on your keyboard. The book will then start downloading like a megabyte of JavaScript and WebAssembly, and once it’s done you will be able to try out Janet right here in the comfort of your browser. No need to install anything; no need to leave the comfort of this book website if you’d like to test something out.
The repl is not just a repl, though. It is also a portal into conversation with me, the author. You can use the repl to report typos or factual errors, ask questions, or express confusion. I won’t be able to respond in the repl, but if you include some kind of contact information in your reports I will make an effort to follow up with you. Here, why don’t you try it now? Open up the repl and type something like this:
(say "hey nice book")
Fun, right?
Oh that’s me.
I’m just a fan of Janet; I am not affiliated with the language in any way. I have no real qualifications to be writing a book about it, and nothing that you read here should be considered authoritative, idiomatic, or educational.
Alright, let’s get this over with.
(print "hello world")
Janet has parentheses. Okay? That’s all I’m going to say about it. There are some parentheses here. Maybe more than you’re used to. Maybe more than you’re comfortable with. I’m not going to try to convince you that parentheses are somehow morally superior to curly braces, or waste your time claiming that really it’s the same number of parentheses and they’re just shifted over a little. In fact I’m going to try to talk as little as possible about the parentheses, because they just aren’t very interesting at this stage. They’ll get interesting, once we start talking about macros, but right now the conversation can’t really progress beyond “Ew, I don’t like them.” I know you don’t. And if you can’t get past that, that’s fine. If you draw the line at parentheses, this book comes with a full money-back guarantee.
I didn’t really even want to bring up the parentheses, but I thought it would be weird if I just blew past them. Like, we’re all thinking about them, right? And now you’re just wondering when I’m going to use the L-word. You want me to use it so you can go write a long screed about how Janet isn’t a real one. But I’m not going to give you the satisfaction. I’m not going to use that word until Chapter Fourteen. By which point you’ll be far too tired of all these long-winded tangents to remember what you were upset about in the first place.
What were we talking about? Oh yeah, hello world.
(print "hello world")
As much as I’d like to belabor the very idea of “prefix notation” and talk about function application and special forms and whatnot, we’re already running behind so I’m going to have to skip ahead a little bit.
(defmacro each-reverse [identifier list & body]
(with-syms [$list $i]
~(let [,$list ,list]
(var ,$i (- (,length ,$list) 1))
(while (>= ,$i 0)
(def ,identifier (in ,$list ,$i))
,;body
(-- ,$i)))))
(defn rewrite-verbose-assignments [tagged-lines]
(def result @[])
(var make-verbose false)
(each-reverse line tagged-lines
(match line
[:assignment identifier contents]
(if make-verbose
(array/push result [:verbose-assignment identifier contents])
(array/push result [:assignment contents]))
(array/push result line))
(match line
[:output _] (set make-verbose true)
_ (set make-verbose false)))
(reverse! result)
result)
Okay great. I think that’s a totally reasonable second code sample ever for you to look at.
Just take it in for a moment; don’t worry too much about what it’s doing. Try to notice a few things at a high level:
There’s just a nightmare explosion of punctuation right at the beginning there.
There are a lot of parentheses, but there are also a lot of square brackets.
There are two different ways to declare local variables: def and var.
array/push looks like some kind of namespace.
We defined our own control structure.
We defined a function with no explicit return.
There’s more we could say about this example — we could try to figure out what the heck it’s supposed to be doing, for instance — but it’s going to be hard to talk about things before we establish a baseline.
Which brings us back, finally, to the point of this chapter. The values of Janet, the nouns of Janet, the things that comprise a Janet program — the primitive data types and the built-in collections that we will wield to create our programs. And once we have that foundation, we can spend the rest of the book talking about Janet’s verbs.
So here’s what we’re working with:
repl:1:> 123
123
repl:2:> 1e6
1000000
repl:3:> 1_000
1000
repl:4:> -0x10
-16
repl:5:> 10.5
10.5
Like JavaScript, all numbers in Janet are 64-bit IEEE-754 double-precision floats. Janet doesn’t have a “numerical tower.”
repl:1:> true
true
repl:2:> false
false
repl:3:> maybe
just kidding
Like JavaScript, Janet has a concept of “falsiness.” But while JavaScript’s falsiness rules are a common source of wats, Janet’s rules are much simpler: false and nil are falsy; everything else is truthy.
repl:1:> (truthy? 0)
true
repl:2:> (truthy? [])
true
repl:3:> (truthy? "")
true
repl:4:> (truthy? nil)
false
repl:5:> (truthy? false)
false
repl:1:> nil
nil
nil is Janet’s version of JavaScript’s undefined. It’s the thing that functions return if they don’t return anything else; it’s the value that you get when you look up a key that doesn’t exist.
Janet does not have an equivalent of JavaScript’s null — there is no special value of type object that is not actually an object in any meaningful sense of the word. nil, like undefined, is its own type.
Note that Janet’s nil is not the empty list. If you don’t understand why I’m calling that out here, you can safely ignore this paragraph.
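To make that concrete, here’s a small illustrative repl sketch (not part of the original examples) showing two common ways to end up with a nil:
repl:1:> (get {:a 1} :b)
nil
repl:2:> (if false 1)
nil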
repl:1:> "hello"
"hello"
repl:2:> `"backticks"`
"\"backticks\""
repl:3:> ``"many`backticks"``
"\"many`backticks\""
Strings come in two flavors: mutable and immutable. Mutable strings are called “buffers” and start with @, while immutable strings are called “strings.”
repl:1:> @"this is a buffer"
@"this is a buffer"
Janet strings are plain arrays of bytes. They are not encoding-aware, and there is no native Unicode support in the language for indexing or iterating over “characters.” There are some functions that interpret strings and buffers as ASCII-encoded characters, but they are appropriately named string/ascii-upper and string/ascii-lower.
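Here’s a quick illustrative sketch of the mutable/immutable split and one of those ASCII helpers (a hypothetical repl session); buffer/push-string appends bytes to a buffer in place:
repl:1:> (def b @"hello")
@"hello"
repl:2:> (buffer/push-string b " world")
@"hello world"
repl:3:> (string/ascii-upper "hello")
"HELLO"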
There are external libraries for decoding UTF-8, but not for any other character encoding that I am aware of. And as far as I know there is no full-service Unicode library in Janet — if you need to count the number of extended grapheme clusters in a string, you will have to write some bindings yourself.
repl:1:> [1 "two" 3]
(1 "two" 3)
repl:2:> ["one" [2] "three"]
("one" (2) "three")
repl:3:> @[1 "two" 3]
@[1 "two" 3]
Vectors come in two flavors: mutable and immutable. Mutable vectors are called “arrays” and start with @, while immutable vectors are called “tuples.” If you are used to other languages with tuples, don’t be fooled: Janet’s tuples do not behave like tuples in any other language. They are iterable, random access immutable vectors.
Also, it’s worth noting that tuples are not fancy immutable vectors, like you might find in Clojure. If you want to append something to a tuple, you have to create an entirely new copy of it first. We’ll talk more about the differences between mutable and immutable values in a bit.
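A small illustrative sketch of that difference (not from the original examples): array/push mutates an array in place, while “appending” to a tuple means building a brand-new tuple, here by splicing the old one into a call to tuple:
repl:1:> (array/push @[1 2 3] 4)
@[1 2 3 4]
repl:2:> (def nums [1 2 3])
(1 2 3)
repl:3:> (tuple ;nums 4)
(1 2 3 4)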
repl:1:> {:hello "world"}
{:hello "world"}
repl:2:> @{"hello" "world" :x 1 :a 2}
@{"hello" "world" :a 2 :x 1}
Once again, mutable and immutable flavors. Mutable tables are called “tables” and start with an @, while immutable tables are called “structs.” Yeah, I know. Right there with you. Modifying a struct, like modifying a tuple, requires creating a shallow copy first.
Tables are a lot like JavaScript objects, except the keys don’t have to be strings, newly created tables and structs don’t have a default “root class” prototype, and they cannot store nil as either keys or values.
This makes some sense if you think of nil as the undefined value: there is no ambiguity between “key does not exist” and “key exists but its value is undefined.” We’ll talk more about this in Chapter Eight.
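Here’s an illustrative sketch of that rule (note that the printed key order of a table isn’t guaranteed, so your repl may display it differently): putting nil into a table removes the key entirely.
repl:1:> (def t @{:a 1 :b 2})
@{:a 1 :b 2}
repl:2:> (put t :a nil)
@{:b 2}
repl:3:> (get t :a)
nil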
repl:1:> :hello
:hello
repl:2:> (keyword "world")
:world
Generally you use keywords as the keys or field names in structs and tables. They’re also handy whenever you need to pass around an immutable named literal, like a tag or an enum.
JavaScript doesn’t really have an analog for keywords, although you might be familiar with the idea from Ruby, which calls them “symbols.” In JavaScript you just pass around short strings, which is functionally the same thing. The difference in Janet is that keywords are interned and strings are not.
repl:1:> 'hello
hello
repl:2:> (symbol "hello")
hello
Symbols are physically exactly the same as keywords. They share the same interning table; the only difference between a keyword and a symbol is their type.
Logically, though, symbols don’t represent small constant strings like enums. Symbols represent identifiers in your program. You’ll use symbols a lot when you’re writing macros, and basically nowhere else. I mean, you could use them elsewhere, if you really wanted to, but it’s usually more convenient to stick with keywords.
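A quick illustrative check of that (hypothetical repl session): two symbols with the same name are equal to each other, but a symbol is never equal to the keyword with the same name, because they’re different types.
repl:1:> (= 'foo (symbol "foo"))
true
repl:2:> (= 'foo :foo)
false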
repl:1:> (fn [x] (+ x 1))
<function 0x600000A8C9E0>
Janet functions can be variadic, and support optional and named arguments. fn creates an anonymous function, but you can also use defn as a shorthand for (def name (fn ...)).
repl:1:> (defn sum [& args] (+ ;args))
<function sum>
repl:2:> (sum 1 2 3)
6
& in a parameter list makes the function variadic, and (+ ;args) is how you call a function with a variable number of arguments — ; is like JavaScript’s .... As you can see, the function called + is already variadic, so there’s no actual reason to write a variadic sum like this. But it’s just an example.
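To make the spread analogy concrete, here’s an illustrative sketch (not from the original): ; splices an existing sequence into a call, the way ... spreads an array in JavaScript.
repl:1:> (defn sum [& args] (+ ;args))
<function sum>
repl:2:> (def nums [4 5 6])
(4 5 6)
repl:3:> (sum ;nums)
15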
repl:1:> (defn incr [x &opt n] (default n 1) (+ x n))
<function incr>
repl:2:> (incr 10)
11
repl:3:> (incr 10 5)
15
&opt makes all following arguments optional, and &named makes, well, named parameters:
repl:1:> (defn incr [x &named by] (+ x by))
<function incr>
repl:2:> (incr 10 :by 5)
15
Note, however, that when we call a function, named arguments must come after any positional arguments.
repl:3:> (incr :by 5 10)
error: could not find method :+ for :by, or :r+ for nil
in incr [repl] on line 10, column 26
in _thunk [repl] (tailcall) on line 12, column 1
Because :by is, after all, a valid argument to pass positionally.
repl:1:> (fiber/new (fn [] (yield 0)))
<fiber 0x600003C10150>
Fibers are powerful control flow primitives, and it’s hard to give a pithy definition for them. Janet uses fibers to implement exception handling, generators, dynamic variables, early return, async/await-style concurrency, and coroutines. Among other things.
One very incomplete but perhaps useful intuition is that a fiber is a function that can be paused and resumed later. Except it’s not a function; it’s actually a full call stack. And it’s not always resumable: you can also stop them. Maybe this is just confusing. You know what? We’re going to spend an entire chapter talking about fibers together later. Maybe I shouldn’t try to explain them poorly before then.
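Just to give you the flavor anyway, here’s a minimal illustrative sketch (the fiber’s printed address will differ on your machine): resume runs a fiber until it yields, and each subsequent resume picks up where it left off.
repl:1:> (def f (fiber/new (fn [] (yield 1) (yield 2) 3)))
<fiber 0x...>
repl:2:> (resume f)
1
repl:3:> (resume f)
2
repl:4:> (resume f)
3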
Alright. We did it.
Those are all the values in Janet.
At least, I think those are all the values.
I find it really comforting to have this bird’s eye survey of the Janet noun landscape, but it only really comforts me if I can see all the way to the shoreline. And so far all I’ve done is list out a bunch of types. Did I get all of them? Half of them? Or have I only just scratched the surface?
Well, one nice thing about Janet is that it’s distributed as a single .h/.c file pair, so it’s very easy to look in the source to check. So let’s do that.
We can download the latest amalgamated build from the Janet releases page, and grep for “type” until we find something plausible…
typedef enum JanetType {
JANET_NUMBER, // [x]
JANET_NIL, // [x]
JANET_BOOLEAN, // [x]
JANET_FIBER, // [x]
JANET_STRING, // [x]
JANET_SYMBOL, // [x]
JANET_KEYWORD, // [x]
JANET_ARRAY, // [x]
JANET_TUPLE, // [x]
JANET_TABLE, // [x]
JANET_STRUCT, // [x]
JANET_BUFFER, // [x]
JANET_FUNCTION, // [x]
JANET_CFUNCTION, // [ ]
JANET_ABSTRACT, // [ ]
JANET_POINTER // [ ]
} JanetType;
Okay. I did pretty good. JANET_CFUNCTION is basically an implementation detail; a cfunction looks and acts like a regular function in almost every respect, except that it’s implemented in C, not Janet.
repl:1:> (type pos?)
:function
repl:2:> (type int?)
:cfunction
We’ll talk more about cfunctions in Chapter Nine.
JANET_POINTER is useful for interacting with C programs; we’re not actually going to talk about it in this book but it’s exactly the thing that you think it is. JANET_ABSTRACT is pretty important, though, so we should probably talk about it now.
A JANET_ABSTRACT type is a type implemented in C code that you can interact with like any other Janet value. We’ll learn how to write our own in Chapter Nine, and you’ll get a chance to see how flexible they are: you could implement anything as an abstract type, and in fact the Janet standard library does exactly that.
This means that there are a few more types in the Janet standard library than the JanetType enum implies, and for the sake of completeness I will list them here:
core/rng (pseudorandom number generator)
core/socket-address
core/process
core/parser (specifically, the parser that Janet uses to parse Janet code)
core/peg (parsing expression grammar)
core/stream and core/channel (concurrent communication primitives)
core/lock and core/rwlock (multithreading stuff)
core/ffi-signature, core/ffi-struct, and core/ffi-native, which are parts of an experimental new FFI module that this book will not talk about
core/s64 and core/u64 (boxed 64-bit integer types)
And those are all of the types that Janet gives you.
I mean, for one definition of “type.” There are some instances of “struct with a particular documented shape” in the standard library, and you could call those distinct types if you wanted to. But you have now seen every type that exists at a physical, mechanical level. You’ve seen all the building blocks; everything else is just a permutation of these values.
We’ll talk more about how these types work and what we can do with them in future chapters. But there is one thing that is so important and so primitive that we’re going to talk about it right now: equality.
Janet, unlike some languages, does not have separate eq and eql and equal and equalp functions. Nor does it have == and === and Object.is. Janet has one real notion of equality: =.
repl:1:> (= (+ 1 1) 2)
true
But = means something different depending on whether you’re asking about a mutable value, like a table or an array, or an immutable value, like a number or a keyword or a tuple.
| data type          | immutable                             | mutable |
|--------------------|---------------------------------------|---------|
| atom               | number, keyword, symbol, nil, boolean |         |
| closure            | function                              |         |
| coroutine          | fiber                                 |         |
| byte array         | string                                | buffer  |
| random-access list | tuple                                 | array   |
| hash table         | struct                                | table   |
Mutable values are only equal to themselves; you might say that they have “reference semantics”:
repl:1:> (= @[1 2 3] @[1 2 3])
false
repl:2:> (def x @[1 2 3])
@[1 2 3]
repl:3:> (= x x)
true
While immutable values have “value semantics”:
repl:1:> (= [1 2 3] [1 2 3])
true
This means that you can use immutable values as the keys of tables or structs without worrying about the specific instance you have a handle on:
repl:1:> (def corners {[0 0] :bottom-left [1 1] :top-right})
{(0 0) :bottom-left (1 1) :top-right}
repl:2:> (get corners [1 1])
:top-right
While mutable keys must be the exact identical value:
repl:1:> (def zero-zero @[0 0])
@[0 0]
repl:2:> (def corners {zero-zero :bottom-left @[1 1] :top-right})
{@[1 1] :top-right @[0 0] :bottom-left}
repl:3:> (get corners @[0 0])
nil
repl:4:> (get corners zero-zero)
:bottom-left
Janet also has a function called deep=, which performs a “structural equality” check for reference types, as well as a function called compare=, which can invoke a custom equality method. But these are not “real” equality functions, in the sense that Janet’s built-in associative data structures — structs and tables — only ever use = equality.
But you can use deep= to compare two mutable values in your own code:
repl:1:> (deep= @[1 @"two" @{:three 3}] @[1 @"two" @{:three 3}])
true
Although it’s worth noting that values of different types are never deep-equal to one another, even if their elements are identical:
repl:1:> (= [1 2 3] @[1 2 3])
false
repl:2:> (deep= [1 2 3] @[1 2 3])
false
Abstract types can go either way — abstract type just means “implemented in C code,” and it’s possible to implement value-style or reference-style abstract types in C code. We’ll talk about how to do that in Chapter Nine.
Finally, I think that it’s worth saying again: Janet’s immutable values are simple immutable values. They are not fancy immutable values like you might find in a language like Clojure. There is no structural sharing here; if you want to append an element to an immutable tuple you have to make a full copy first.
That doesn’t mean that you shouldn’t append things to tuples! But it does mean that you should be aware of the trade-off, and probably prefer mutable structures if you’re working with large amounts of data.
Internally, though, immutable types are still passed by reference. When you return an immutable struct from a function, you’re actually returning a pointer to an immutable struct — you don’t have to make copies of them to pass them around “on the stack.”
Alright, we got the basics out of the way. Now we can get to the good stuff.
In this chapter we’re going to talk about compile-time programming and images. JavaScript has no analog for images, nor does it have any sort of “compilation” step, but I’m sure you’re familiar with the concept. Er, the concept of compilation, that is. I hope you’re not already familiar with images, because I want to be the first to tell you about them.
But in order to understand images, we first have to understand the life cycle of a Janet program. A Janet program like this one:
(def one 1)
(def two 2)
(def three (+ one two))
(print one)
(defn main [&]
(print three))
(print two)
If you copy that into a file and run it through the Janet interpreter, you will see the following output:
janet example.janet
1
2
3
Hopefully nothing too surprising. It ran through the top-level statements, then went back and executed our main
function.
But you can also compile Janet programs. Usually this means compiling them all the way down to native code using a tool called jpm, which is Janet’s version of npm or cargo or whatever. But in order to produce native code, jpm actually:
1. Produces an image of your program.
2. Embeds that image into a .c file that also links in the Janet runtime and interpreter.
3. Compiles that .c file using your system’s C compiler.
But I don’t want to talk about jpm yet, and Janet can only natively do the first thing, so we’re going to be producing and running these “images” directly. We’ll talk about how to get a native binary in Chapter Seven.
So what is an image? Well, it’s easier if I just show you. Let’s make one:
janet -c example.janet example.jimage
1
2
Whoa, look! It executed our top-level statements, but it didn’t call our main. It also produced a file called example.jimage, which we can pass back to Janet to run:
janet -i example.jimage
3
Hey! There’s our main function. And it’s just our main function — the top-level print statements didn’t run again. But it still knew how to print 3, which was a value that we calculated in a top-level statement. Huh.
So top-level statements execute at “compile time”… but we can still refer to compile time values at “runtime.” Neat.
Does that work for any values? Let’s try something more complicated, with mutable structures and shared references:
(def skadi @{:name "Skadi" :type "German Shepherd"})
(def odin @{:name "Odin" :type "German Shepherd"})
(def people
[{:name "ian" :dogs [skadi odin]}
{:name "kelsey" :dogs [skadi odin]}
{:name "jeffrey" :dogs []}])
(pp people)
(defn main [&]
(set (odin :type)
"Well mostly German Shepherd but he's mixed with some collie so his ears are half-flops")
(pp people))
pp is supposed to stand for “pretty print,” although it doesn’t really, so I’ll be manually reformatting the output a bit. If we compile this program, we’ll see how this list looked during compilation:
janet -c dogs.janet dogs.jimage
({:dogs (@{:name "Skadi" :type "German Shepherd"}
@{:name "Odin" :type "German Shepherd"})
:name "ian"}
{:dogs (@{:name "Skadi" :type "German Shepherd"}
@{:name "Odin" :type "German Shepherd"})
:name "kelsey"}
{:dogs () :name "jeffrey"})
And then if we run it, we can see how it looks after we mutate Odin:
janet -i dogs.jimage
({:dogs (@{:name "Skadi" :type "German Shepherd"}
@{:name "Odin" :type "Well mostly German Shepherd but he's mixed with some collie so his ears are half-flops"})
:name "ian"}
{:dogs (@{:name "Skadi" :type "German Shepherd"}
@{:name "Odin" :type "Well mostly German Shepherd but he's mixed with some collie so his ears are half-flops"})
:name "kelsey"}
{:dogs () :name "jeffrey"})
So let’s notice a few things about this:
When you print tuples, they’re wrapped in parentheses, even though you define them with square brackets and they should print with square brackets.
Tables and structs do not preserve the order of their keys.
References are preserved between compile time and runtime.
I wanted to point that last one out explicitly, because you can imagine a dumber version of this where that is not the case. Like, if you’re JavaScript, and you wanted to allow programs to refer to values created at compile time, one natural way to do that would be serialize those values into JSON and then read them back at program startup.
But Janet is doing something fancier than that. Janet is still serializing values to disk and reading them back, but the format it uses is able to express things like shared references and cyclic data structures and closures and the current state of a coroutine.
Janet calls this fancy serialization “marshaling,” as do many other languages, except for Python, which calls it “pickling.” This fact is not really relevant to this book at all; I just think “pickling” is a really whimsical term.
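As a small illustrative sketch of what that buys you (hypothetical repl session): marshal turns a value into a buffer of bytes, unmarshal turns it back, and shared references survive the round trip.
repl:1:> (def shared @[1 2 3])
@[1 2 3]
repl:2:> (def restored (unmarshal (marshal [shared shared])))
(@[1 2 3] @[1 2 3])
repl:3:> (= (in restored 0) (in restored 1))
true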
So let’s think about how this might work.
Perhaps when we compile a Janet program, we’re actually doing two things: there’s the “normal” compilation step, where we take high-level Janet code and turn it into lower-level bytecode that the Janet interpreter knows how to execute, just like a normal bytecode compiler. But then there’s also this second step, where we take the values that we computed at compile-time (which values?) and marshal them into bytes. And then an image is the combination of those two things. Is that right?
Well, no. Not really. Because these two steps are not actually separate: an image isn’t a “data” part plus a “code” part. It’s just a data part. As a matter of fact, the entire image consists of nothing more than a single marshaled value: our program’s environment.
“Environment” is a fancy word for scope, but in Janet it refers specifically to the top-level scope. It’s the table mapping symbols (like skadi and main) to values that we defined for them. And it is, itself, a first-class value! It is literally a Janet @{...} table, and it is the “root” value that Janet serializes to form our image.
But some of the values in that environment table are functions. And of course functions are first-class values in Janet, so when we marshal the table we have to marshal those functions as well.
And how do you marshal a function? Well, you’ve probably guessed it already: as bytecode that represents the function’s implementation.
So an “image” is a serialized environment table that probably includes a key called main whose value is a function. And when we “resume” or “execute” the image with janet -i, Janet will first deserialize this environment, then look up the symbol called main, and then execute that function.
Let’s make this a little more concrete. Show me the image:
repl:1:> (load-image (slurp "dogs.jimage"))
@{main @{:doc "(main)\n\n" :source-map ("dogs.janet" 11 1) :value <function main>} odin @{:source-map ("dogs.janet" 1 1) :value @{:name "Odin" :type "German Shepherd"}} people @{:source-map ("dogs.janet" 4 1) :value ({:dogs (@{:name "Skadi" :type "German Shepherd"} @{:name "Odin" :type "German Shepherd"}) :name "ian"} {:dogs (@{:name "Skadi" :type "German Shepherd"} @{:name "Odin" :type "German Shepherd"}) :name "kelsey"} {:dogs () :name "jeffrey"})} skadi @{:source-map ("dogs.janet" 2 1) :value @{:name "Skadi" :type "German Shepherd"}} :current-file "dogs.janet" :macro-lints @[] :source "dogs.janet"}
Alright, well, that’s a complete mess, so let me pretty-print it for you:
@{main @{:doc "(main)\n\n"
:source-map ("dogs.janet" 11 1)
:value <function main>}
odin @{:source-map ("dogs.janet" 1 1)
:value @{:name "Odin" :type "German Shepherd"}}
people @{:source-map ("dogs.janet" 4 1)
:value ({:dogs (@{:name "Skadi" :type "German Shepherd"} @{:name "Odin" :type "German Shepherd"}) :name "ian"}
{:dogs (@{:name "Skadi" :type "German Shepherd"} @{:name "Odin" :type "German Shepherd"}) :name "kelsey"}
{:dogs () :name "jeffrey"})}
skadi @{:source-map ("dogs.janet" 2 1) :value @{:name "Skadi" :type "German Shepherd"}}
:current-file "dogs.janet"
:macro-lints @[]
:source "dogs.janet"}
You can see that there’s a little bit more to the table than I let on — Janet stores some metadata about each binding, as well as some metadata about the environment itself.
But still, you can see that an image is just a snapshot of your program’s environment, frozen in time. And, in theory, you could take a snapshot of your program’s environment at any point in time…
repl:1:> (def greeting "hello world")
"hello world"
repl:2:> (defn main [&] (print greeting))
<function main>
repl:3:> (def image (make-image (curenv)))
@"\xD4\x05\xD8\x08root-env\xCF\x01_\xD3\x01\xD0\x05value\xD7\0\xCD\0\x98\0\0\x02\0\0\xCD\x7F\xFF\xFF\xFF\x02\x05\xCE\x04main\xCE\x04repl\xCE\vhello world\xD8\x05print,\0\0\0*\x01\0\0/\x01\0\0*\x01\x01\04\x01\0\0\x02\x01\0\x10\0\x10\0\x10\0\x10\xCF\x05image\xD3\x01\xD0\nsource-map\xD2\x03\0\xDA\x07\x03\x01\xCF\x08greeting\xD3\x02\xDA\f\xD2\x03\0\xDA\x07\x01\x01\xDA\x04\xDA\x08\xCF\x04main\xD3\x03\xDA\f\xD2\x03\0\xDA\x07\x02\x01\xDA\x04\xDA\x05\xD0\x03doc\xCE\n(main &)\n\n\xD8\r*macro-lints*\xD1\0"
repl:4:> (spit "repl.jimage" image)
nil
janet -i repl.jimage
hello world
Which is neat, I guess, and as I understand it this is actually the canonical way to write programs in some languages: you load an image, interactively modify it, then save the image back to disk.
This is possible in Janet, and maybe even fun and good, but I’m not going to say anything else about it. This is a style of programming that dates back to long before I was born, but I have never tried it so I don’t know what I’m missing and I’m going to dismiss it out of hand.
Instead I’m going to talk about images as if they are nothing more than the output of Janet’s “compilation” phase. Because even if you limit yourself to a strict compilation/runtime separation, you can still use compile-time code execution to do a lot of very powerful things.
In fact, I think “compilation” is selling Janet short a little bit. When I hear “compilation,” I think of a transformation from high-level code to lower-level code, probably with some optimization thrown in along the way. And that is part of what Janet does during the so-called compilation phase, but it can also do anything else! It can execute arbitrary code, perform complex calculations — even perform side effects! — and once it’s done it will give us not just bytecode, but a fully interwoven image of our environment.
So instead of the “compilation phase,” I’m going to propose we call this the imagination phase.
Okay, I hate it already. Proposal rescinded. Segue out of this one with me.
So far we’ve only looked at really contrived, artificial examples. I think it’s time to talk about something real.
OpenGL has a concept called “shaders,” which are little mini-programs that run on the GPU and do things like calculate the color for each pixel of your teapot or whatever.
You can’t compile these mini-programs ahead of time, because every GPU is a little bit different, so if you’re writing a game that uses OpenGL, you actually need to distribute the source of your shaders as part of your game, and let each of your players’ video drivers compile them on startup.
So there are lots of ways to do this: we could just distribute the shaders as separate files alongside the game and load them in at runtime relative to the path of our executable. And that would work fine!
But let’s say that we don’t want to do that. Let’s say we want to distribute a game as a single binary.
Well, we could just embed the shader source as a string in our code:
(def gamma-shader `
#version 330
in vec3 fragColor;
out vec4 outColor;
void main() {
outColor = vec4(pow(fragColor, vec3(1.0 / 2.2)), 1.0);
}`)
But that’s obviously terrible; we probably wouldn’t have any tooling support if we did that, and it would be pretty annoying to locate and change our shaders once we have more than a couple of them.
Instead, what if we kept the shaders in separate files, but loaded them into the program at compile time?
(def gamma-shader (slurp "gamma.fs"))
(defn main [&]
(print gamma-shader))
Neat! Now if we compile that to an image, we can embed the data into our final executable:
janet -c shader-example.janet shader-example.jimage
rm gamma.fs # no longer needed!
janet -i shader-example.jimage
#version 330
in vec3 fragColor;
out vec4 outColor;
void main() {
outColor = vec4(pow(fragColor, vec3(1.0 / 2.2)), 1.0);
}
Okay cool. We performed the side effect of reading from the disk at compile time, and then… well, nothing else. We just referred to it like a regular value, and Janet’s image marshaling took care of embedding the data into our final binary.
Now, obviously there are limits to what you can marshal: not all values can survive cryostasis. In fact, if we consider a slight variation of that code:
(def f (file/open "gamma.fs"))
(def gamma-shader (file/read f :all))
(file/close f)
(defn main [&]
(print gamma-shader))
This is functionally identical, and we can still run this script just fine:
janet shader-example2.janet
#version 330
in vec3 fragColor;
out vec4 outColor;
void main() {
outColor = vec4(pow(fragColor, vec3(1.0 / 2.2)), 1.0);
}
But if we try to compile it…
janet -c shader-example2.janet shader-example2.jimage
error: cannot marshal file in safe mode
in marshal [src/core/marsh.c] on line 1480
in make-image [boot.janet] on line 2637, column 3
in c-switch [boot.janet] (tailcall) on line 3873, column 36
in cli-main [boot.janet] on line 3909, column 13
We can’t. We now have a reference to a core/file abstract type in our top-level environment, and when Janet tries to marshal the environment it throws its hands up on that value. Because of course it does: you can’t serialize a file handle or a network connection or anything like that to disk.
I think we can notice three things from this:
1. The entire environment is marshaled into our image.
2. Janet doesn’t special-case closed file handles. You could imagine a world where Janet lets us get away with this just this once, since unmarshaling a closed file handle could be well-defined. But also useless.
3. There exist programs that Janet knows how to execute but which cannot be compiled into images. In practice you don’t really have to think about this, like, ever.
I actually had to contort a bit to write this “broken” program. The correct way to read from a file, if you are allergic to typing the word slurp, would be:
(def gamma-shader
(with [f (file/open "gamma.fs")]
(file/read f :all)))
(defn main [&]
(print gamma-shader))
Which of course compiles fine — f is not a top-level variable, so it’s not a part of the environment.
And when you’re writing little shebang scripts, you probably won’t even define a main function, and it will look like Janet just runs through your script in order like any other scripting language. All of your work will take place during the “compilation” phase, and Janet will never try to construct an image at all, and you really won’t have to think about this.
But once you start writing larger programs that you compile ahead of time, you can start to think about the distinction, and decide if there’s any work you want to perform ahead of time. You don’t have to — you can put everything in main if you want to — but you have that power should you need it.
Finally, I think it’s worth pointing out explicitly: just because we can’t marshal core/files, that doesn’t mean we can’t marshal other abstract types. Many of the abstract types in the standard library (like core/peg) are perfectly marshalable, and when we define our own abstract types we can optionally provide custom marshaling routines. We’ll talk more about that in Chapter Nine.
And now I’m done talking about images.
You got a little taste of what you can do with compile-time programming, and I hope that it was to your liking. Because the next chapter…
Well, I don’t want to spoil it.
Oh; it’s just right there. Not a lot of time to build suspense, really. It kind of spoiled itself.
Alright, well, yes, we’re going to talk about macros. We’re going to talk about macros as if you have never heard of them, even though you probably have, because remember that in this book you’re only supposed to know JavaScript and JavaScript doesn’t have any.
And actually, we should probably talk about that. JavaScript doesn’t have macros. Most popular languages don’t have macros as a matter of fact. And you’ve made it this far in life using those languages, and you’re doing just fine. Do you really need macros?
Well, no. There is no program that you can write with macros that you couldn’t write in some other way — just like there is no program that you can write with first-class functions that you couldn’t write some other way.
But aren’t first-class functions nice? Think of all the things they let you do: fold, promises, event handlers — do you know what it’s like writing event handlers without first-class functions? Have you ever implemented a delegate class? It’s awful.
Macros are similarly useful. You can write macros to eliminate boilerplate to a degree that you cannot imagine in a language without them. You can write macros to define new control-flow constructs, like a switch with destructuring cases (don’t you wish JavaScript had that?). You can write macros that define new functions for you, or modify existing functions à la Python’s decorators. You can write macros that take high-level descriptions of binary formats and generate code to efficiently parse them. You can write macros to do all sorts of things.
So: macros.
Macros.
I really am going to talk about macros, but you’ll have to give me a second, because first I want to talk about metaprogramming. Macros are a tool that we can use to write very powerful “metaprograms,” but we can do metaprogramming without any macros at all.
In fact, in a way, all compiled Janet programs are “metaprograms” — programs that write other programs — because of the way that the compilation step works. When we “compile” a Janet program, we’re really executing (part of) it, performing all of the top-level effects, computing all of the top-level values, and then finally producing, as the output of this “metaprogram,” an image with a main function. And what is that image but a new program forged in the fires of our metaprogram?
So here’s a macro-free metaprogram where this “program construction” is a little more explicit. It’s a script that you can compile to produce an image that prints hello world.
(defn sequence [& fs]
(fn [&]
(each f fs
(f))))
(defn first-word []
(prin "hello"))
(defn space []
(prin " "))
(defn second-word []
(prin "world"))
(defn newline []
(print))
(def main (sequence first-word space second-word newline))
prin is like print, but it doesn’t print the trailing newline, so the output will all show up on one line.
Nothing mind-blowing, right? If you’re comfortable with the idea of functions returning other functions, you can see that we created an anonymous closure over our various smaller functions, then we created a binding called main in the environment that points to that closure. When Janet constructs the image, its main function will be exactly that closure that we dynamically allocated during the compilation phase.
janet -c meta.janet meta.jimage
janet -i meta.jimage
hello world
Metaprogramming.
But that’s not really what typical metaprogramming looks like. That’s a weird high-level example that I made up to get you comfortable with the idea of creating new functions at compile-time.
But there’s a much more direct way to create new functions at compile time. In fact, Janet has a function that takes in the body of a function as an abstract syntax tree and gives us back an actual real-live function that represents the result of evaluating that body. We can use that to directly create new functions:
(def main (compile ['print "weird"]))
Remember that 'print is a “symbol.” ['print "weird"] is an immutable vector (“tuple”) of two elements: the symbol 'print and a string. That tuple is how Janet represents the abstract syntax tree of the expression (print "weird"), and indeed if we compile and execute it, we can see that it does exactly that:
janet -c compile.janet compile.jimage
janet -i compile.jimage
weird
When we pass an abstract syntax tree to the compile function, Janet will give us back the function that ignores its arguments and has that abstract syntax tree as its body. And it works for any abstract syntax tree we construct:
repl:1:> (def f (compile ['+ 1 ['* 2 3]]))
<function _thunk>
repl:2:> (f)
7
Because Janet uses a very lightweight representation for abstract syntax trees — they’re regular tuples of regular Janet values, like symbols and numbers and other tuples — it’s very easy for us to manipulate abstract syntax trees. We don’t need to use some special AbstractSyntaxTreeNodeVisitor class or something to wrangle these values; we can just use regular loops and maps and anything else that we could do with any other tuple.
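Here’s a tiny illustrative sketch of that (not from the original walkthrough): an abstract syntax tree really is just a tuple whose first element is a symbol.
repl:1:> (def ast '(+ 1 (* 2 3)))
(+ 1 (* 2 3))
repl:2:> (type ast)
:tuple
repl:3:> (first ast)
+
repl:4:> (type (first ast))
:symbol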
So let’s do that.
(defn set-x [& expressions]
(def result @['do])
(each expression expressions
(array/push result ['print (string/format "about to execute %q" expression)])
(array/push result expression))
(tuple/slice result))
We’re appending elements to a mutable vector (“array”), but we want to return a tuple, so we use tuple/slice to make an immutable copy.
I called this set-x because it reminds me of bash’s set -x option. But it doesn’t change anything about how Janet works; it’s just a function that takes any number of abstract syntax trees and returns a single new abstract syntax tree.
Since functions have to return a single value, we wrap everything in do to produce the abstract syntax tree that, well, does all of the things we want in order.
Actually, let’s take a quick look at do before we move on:
(do
(first-thing)
(second-thing))
This is equivalent to creating a new block in JavaScript:
{
firstThing();
secondThing();
}
In JavaScript you usually create a new block so that you can control the extent of block-scoped variables:
{
const x = 10;
console.log(x);
}
// x no longer exists
The same is true in Janet:
(do
(def x 10)
(print x))
# x no longer exists
But this actually isn’t what we want here. We’re only using do because we want to pack a bunch of abstract syntax trees into a single abstract syntax tree. The fact that do creates a new scope is sort of incidental, and in fact might actually be problematic.
Fortunately, Janet has a do alternative called upscope, and upscope does not create a new scope. It executes all of its expressions in the same scope that it runs in.
(upscope
(def x 10)
(print x))
(print x) # still exists!
I think that’s actually more appropriate for our set-x function. So let’s switch to that:
(defn set-x [& expressions]
(def result @['upscope])
(each expression expressions
(array/push result ['print (string/format "about to execute %q" expression)])
(array/push result expression))
(tuple/slice result))
Okay. So this is a function; let’s call it:
janet -l ./set-x
repl:1:> (set-x ['print ['string/ascii-upper "hello"]] ['+ ['* 3 2] ['/ 1 2]])
(upscope (print "about to execute (print (string/ascii-upper \"hello\"))") (print (string/ascii-upper "hello")) (print "about to execute (+ (* 3 2) (/ 1 2))") (+ (* 3 2) (/ 1 2)))
janet -l tells Janet to load the provided library so that it’s in scope in the repl. We saved our function into a file called set-x.janet, but we import it as ./set-x. If we just use set-x, Janet will look for a library with that name in the shared library path:
janet -l set-x
error: could not find module set-x:
/usr/local/lib/janet/set-x.jimage
/usr/local/lib/janet/set-x.janet
/usr/local/lib/janet/set-x/init.janet
/usr/local/lib/janet/set-x.so
in require-1 [boot.janet] on line 2900, column 20
in import* [boot.janet] on line 2939, column 15
in l-switch [boot.janet] (tailcall) on line 3878, column 12
in cli-main [boot.janet] on line 3909, column 13
By making it look like a path with ./, Janet looks for a local library instead. And we don’t specify the extension because a library can take many forms, as you can see from the error above, and having a uniform way to refer to libraries allows us to more easily change their physical representation later.
Wow that was the longest aside ever. I forgot what we were even talking about.
repl:1:> (set-x ['print ['string/ascii-upper "hello"]] ['+ ['* 3 2] ['/ 1 2]])
(upscope (print "about to execute (print (string/ascii-upper \"hello\"))") (print (string/ascii-upper "hello")) (print "about to execute (+ (* 3 2) (/ 1 2))") (+ (* 3 2) (/ 1 2)))
Oh, that’s right. Let’s prettify that:
(upscope
(print "about to execute (print (string/ascii-upper \"hello\"))")
(print (string/ascii-upper "hello"))
(print "about to execute (+ (* 3 2) (/ 1 2))")
(+ (* 3 2) (/ 1 2)))
Alright! Remember that this is only an abstract syntax tree. We need to compile it if we actually want to execute the code:
repl:1:> (set-x ['print ['string/ascii-upper "hello"]] ['+ ['* 3 2] ['/ 1 2]])
(upscope (print "about to execute (print (string/ascii-upper \"hello\"))") (print (string/ascii-upper "hello")) (print "about to execute (+ (* 3 2) (/ 1 2))") (+ (* 3 2) (/ 1 2)))
repl:2:> (compile _)
<function _thunk>
repl:3:> (_)
about to execute (print (string/ascii-upper "hello"))
HELLO
about to execute (+ (* 3 2) (/ 1 2))
6.5
In the repl, _ refers to the result of the previous expression. So (_) invokes the anonymous <function _thunk> value that we compiled.
Okay, cool. Metaprogramming.
Except: is it cool? Or is it an unreadable mess, to the extent that you’re wondering why anyone would ever want to write code like this?
Well, don’t worry: no one writes code like this. The thing we’re doing here is weird, and it’s especially weird if you’re already used to macros. You’re never actually going to see code like this; you’re probably never going to invoke the compile function directly. I’m just going through this so that you can see the things that Janet makes possible: abstract syntax trees are ordinary tuples that you can build and manipulate with ordinary functions, and compile can turn any of those trees into a real, callable function.
And these are the core ideas behind macros.
But macros give us a much more ergonomic way to do all of these things. Macros are so ergonomic, in fact, that they verge into “magic” territory, and it’s easy to lose sight of the actual mechanisms underlying them. When I started writing macros I didn’t really understand how they worked, when they executed, or the full extent of what I could do with them. I thought of them as simple syntactic transformations, and it took a lot of playing with Janet before I understood their true potential.
So we’re going to keep building towards macros. Right now we have a super explicit, super ugly version of something kinda like a macro. Let’s sprinkle syntax sugar on it until it tastes good.
First off, no one ever writes abstract syntax trees like this:
['+ ['* 3 2] ['/ 1 2]]
Instead, you write this:
'(+ (* 3 2) (/ 1 2))
Which means exactly the same thing, but it looks much closer to the Janet code that it represents.
' is pronounced “quote,” and whatever comes after the quote gets, umm, quoted. We’ve seen quote before with 'symbols, and I pretended like it was just the way that you wrote symbol literals. But actually, quoting an identifier is just the most convenient way to get a symbol. You can also make them with the (symbol "name") constructor.
You can quote anything, but quoting most things just gives you the thing you quoted back:
repl:1:> '100
100
repl:2:> '"hello"
"hello"
repl:3:> '{:key value}
{:key value}
repl:4:> '@[1 2 3]
@[1 2 3]
repl:5:> ':foo
:foo
repl:6:> 'true
true
repl:7:> 'nil
nil
But when you quote something, it also quotes every sub-expression in it. So when we write '{:key value}, we get the “struct” {':key 'value}. Quoting a keyword gives us the keyword back, and quoting the identifier value gives us the symbol 'value instead of treating it like the name of a variable, so we don’t get an error about value not being defined.
So really '(+ 1 2) is the same as ['+ '1 '2], but since '1 is the same as 1 we don’t bother to write that all out.
Okay. So: does this help us? A little. Now we can write:
(set-x '(print (string/ascii-upper "hello")) '(+ (* 3 2) (/ 1 2)))
Which looks a tiny bit better, I guess. But does this help us with the implementation of set-x?
(defn set-x [& expressions]
(def result @['upscope])
(each expression expressions
(array/push result ['print (string/format "about to execute %q" expression)])
(array/push result expression))
(tuple/slice result))
Ummmm, no. Not really.
We do construct the abstract syntax tree ['print (string/format "..." expression)], but we can’t write that as '(print (string/format "..." expression)). That would quote every element in the list, but we only want to quote the first element — we still want to evaluate the call to string/format.
Fortunately, Janet has a way to quote some elements in an expression without quoting every element: replace ' with ~.
~ is pronounced “quasiquote.” It works exactly the same as ', except that you can “opt out” sub-expressions from getting quoted. By default it quotes everything, but you can tell it not to quote a specific term by prefixing it with a comma:
repl:1:> ~(print ,(+ 1 2))
(print 3)
, is pronounced “unquote,” and it works, essentially, like string interpolation. Consider the following JavaScript:
node
Welcome to Node.js v16.16.0.
Type ".help" for more information.
> `1 + 2 = ${1 + 2}`
'1 + 2 = 3'
This is very similar to the following Janet expression:
repl:1:> ~(1 + 2 = ,(+ 1 2))
(1 + 2 = 3)
Except it’s not a string; it’s a tuple. You can actually use this “tuple interpolation” to make any tuples you want; you don’t have to use it to make abstract syntax trees. But generally [] notation is more convenient: it’s usually better to make quoting opt-in instead of opt-out, unless you’re writing down an abstract syntax tree.
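Here’s a quick illustrative comparison (a sketch, not from the original): with brackets nothing is quoted unless you quote it yourself, while with quasiquote everything is quoted unless you unquote it.
repl:1:> (def x 5)
5
repl:2:> [x (+ x 1)]
(5 6)
repl:3:> ~(x ,(+ x 1))
(x 6)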
Okay. So now we can write this instead:
(defn set-x [& expressions]
(def result @['upscope])
(each expression expressions
(array/push result ~(print ,(string/format "about to execute %q" expression)))
(array/push result expression))
(tuple/slice result))
Is that better? Eh; maybe a tiny bit. But not really.
There’s one more bit of syntax sugar to add, though, and it’s going to help us substantially.
Consider the following contrived example:
repl:1:> (def nums [1 2 3])
(1 2 3)
repl:2:> (def sum-ast ~(+ ,nums))
(+ (1 2 3))
We interpolated a list into our list, and of course we got a nested list, because that’s what we asked for. But what if that’s not what we want? What if we want (+ 1 2 3)
instead? Well, it turns out Janet has just the thing:
repl:3:> (def sum-ast ~(+ ,;nums))
(+ 1 2 3)
,; is pronounced “unquote-splice,” and it splices each element in the inner list into the outer list.
And that actually helps us a lot! That lets us completely rewrite set-x:
(defn set-x [& expressions]
~(upscope
,;(mapcat (fn [expression]
[~(print ,(string/format "about to execute %q" expression))
expression])
expressions)))
mapcat is Janet’s name for “concat map” — the function JavaScript calls flatMap. Unfortunately, the function argument comes first, so it’s very hard to read when you pass it an anonymous function like this.
This is just a more functional version of the imperative thing we were doing before, using unquote-splice to form the list instead of explicit mutation.
Now that’s starting to look like a real macro.
Except, of course, it isn’t actually a macro yet. It’s just a function. A function that takes abstract syntax trees as input, and returns an abstract syntax tree as output. Which is essentially all that a macro is, except that we declare macros differently. We declare macros like this:
(defmacro set-x [& expressions]
~(upscope
,;(mapcat (fn [expression]
[~(print ,(string/format "about to execute %q" expression))
expression])
expressions)))
This is exactly the same as before, but now we’re using defmacro instead of defn. And when we call it, we can see two interesting differences:
repl:1:> (set-x (print "hello") (+ 1 2))
about to execute (print "hello")
hello
about to execute (+ 1 2)
3
First off, notice that we don’t have to explicitly quote the arguments anymore. We just wrote (print "hello"), but our macro got '(print "hello") as one of its arguments.
But that’s not very interesting. The interesting bit is that it didn’t just return the abstract syntax tree — it compiled and executed it as well.
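If you just want to see the expansion, without compiling or running it, there’s a helper for that — here’s an illustrative sketch, assuming the set-x macro above is already defined in your repl; macex1 expands a form by one step and hands back the resulting abstract syntax tree:
repl:2:> (macex1 '(set-x (print "hello") (+ 1 2)))
(upscope (print "about to execute (print \"hello\")") (print "hello") (print "about to execute (+ 1 2)") (+ 1 2))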
It executed it because we typed this into the repl, so this was sort of a confusing first example, as we combined both the compile time and runtime steps into one. So let’s take a look at how macros normally work:
(use ./set-x-macro)
(defn main [&]
(set-x
(var sum 0)
(for i 0 10
(+= sum i))
(print sum)))
It looks like we’re “calling” set-x in main. However, I’m going to claim that set-x will not run when we invoke main. Instead, set-x will run during the compilation phase, when we compile main, and the abstract syntax tree that set-x returns will run when we invoke main.
Here, watch. Let’s compile this program:
janet -c macro-example.janet macro-example.jimage
Well… huh. Yeah. We can’t really tell. Maybe set-x ran; maybe nothing happened at all.
Let’s make it a little more clear what’s going on.
(defmacro set-x [& expressions]
(print "expanding the set-x macro")
(printf " here are my arguments: %q" expressions)
(def result ~(upscope
,;(mapcat (fn [expression]
[~(print ,(string/format "about to execute %q" expression))
expression])
expressions)))
(printf " and i'm going to return: %q" result)
result)
Let’s switch to that version of the macro, and add in a little self-reflection while we’re at it:
(use ./set-x-verbose)
(defn main [&]
(set-x
(var sum 0)
(for i 0 10
(+= sum i))
(print sum)))
(print)
(print "and this is what main looks like:")
(print)
(pp (disasm main))
Now when we compile that program, we can see it all in action:
janet -c macro-example-verbose.janet macro-example-verbose.jimage
expanding the set-x macro
here are my arguments: ((var sum 0) (for i 0 10 (+= sum i)) (print sum))
and i'm going to return: (upscope (print "about to execute (var sum 0)") (var sum 0) (print "about to execute (for i 0 10 (+= sum i))") (for i 0 10 (+= sum i)) (print "about to execute (print sum)") (print sum))
and this is what main looks like:
{:arity 0 :bytecode @[ (lds 0) (ldc 1 0) (push 1) (ldc 2 1) (call 1 2) (ldi 1 0) (ldc 2 2) (push 2) (ldc 3 1) (call 2 3) (ldi 2 0) (ldi 3 10) (lt 4 2 3) (jmpno 4 5) (movn 5 2) (add 1 1 5) (addim 2 2 1) (jmp -5) (ldc 2 3) (push 2) (ldc 3 1) (call 2 3) (push 1) (ldc 2 1) (tcall 2)] :constants @["about to execute (var sum 0)" <cfunction print> "about to execute (for i 0 10 (+= sum i))" "about to execute (print sum)"] :defs @[] :environments @[] :max-arity 2147483647 :min-arity 0 :name "main" :slotcount 6 :source "macro-example-verbose.janet" :sourcemap @[ (3 1) (4 3) (4 3) (4 3) (4 3) (5 5) (4 3) (4 3) (4 3) (4 3) (6 5) (6 5) (6 5) (6 5) (6 5) (7 7) (6 5) (6 5) (4 3) (4 3) (4 3) (4 3) (8 5) (8 5) (8 5)] :structarg false :vararg false}
Don’t worry about actually understanding the bytecode for main, just take a look at the constants for strong evidence that the abstract syntax tree we returned actually did make its way in there somewhere.
And indeed, when we run it, we can see that main does what we expect:
janet -i macro-example-verbose.jimage
about to execute (var sum 0)
about to execute (for i 0 10 (+= sum i))
about to execute (print sum)
45
Neat.
So macros are like functions, but they always run at compile time no matter where they appear in your program. Here’s how it works: before Janet compiles a top-level expression — defn statements, top-level side effects, whatever — it first performs “macro expansion” on the abstract syntax tree of the expression. This means that Janet scans the abstract syntax tree for any previously-defined macros and replaces each macro invocation with the abstract syntax tree that the macro returns. Only then does Janet compile the fully expanded top-level abstract syntax tree into a function.
Alright.
Macros.
That’s macros.
We did it.
Except, gosh, we haven’t really done it at all, have we?
We’ve only scratched the surface. We’ve only seen the absolute most boring, basic macro you could imagine.
So let’s do something exciting, before we draw this chapter to a close.
We’re going to write a macro that will read a SQLite schema file at compile time, then use that to generate functions that will query a corresponding SQLite database at runtime.
Should you actually do this? No. This is veering so far into “magic” territory that I cannot in good conscience endorse anything like it. But I think it is a good demonstration of the power of metaprogramming, and it’ll be fun.
Janet has a semi-first-party sqlite3 package; we’re going to rely on that to do all of the heavy lifting. I mean, to the extent that we’re lifting anything. It’s actually going to be pretty light.
First off, we’ll need a schema. I’ll use something really simple for the sake of this example:
create table people(
id integer primary key,
name text not null
);
create table grudges(
id integer primary key,
holder integer not null,
against integer not null,
reason text not null,
foreign key(holder) references people(id),
foreign key(against) references people(id)
);
Next we’ll need a dummy database:
sqlite3 db.sqlite
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite> .read schema.sql
sqlite> insert into people values (1, 'ian');
sqlite> insert into people values (2, 'jeffrey');
sqlite> insert into grudges values (1, 1, 2, 'claimed that my chapter on macros was "impenetrable"');
sqlite> insert into grudges values (2, 1, 2, 'doesn''t even have any dogs');
Next we’ll write some code to generate functions to query that schema. I’m waving my hands over how I came to have the sqlite3 Janet package; we’ll talk more about using libraries in Chapter Seven.
(import sqlite3)
(defmacro querify [schema-file]
(defn table-definition [table-name]
~(defn ,(symbol table-name) [conn]
# This might be vulnerable to some kind of wild SQL injection attack
# if an attacker somehow controls your schema file (???), but you can't
# bind table names as parameters and also this is definitely fine.
(sqlite3/eval conn ,(string/format "select * from %s;" table-name))))
(def conn (sqlite3/open ":memory:"))
(sqlite3/eval conn (string (slurp schema-file)))
(def tables
(->> (sqlite3/eval conn "select name from sqlite_schema where type = 'table';")
(map |($ :name))
(filter |(not (string/has-prefix? "sqlite_" $)))))
(sqlite3/close conn)
~(upscope
,;(map table-definition tables)))
(querify "schema.sql")
(defn main [&]
(def conn (sqlite3/open "db.sqlite"))
(pp (people conn))
(pp (grudges conn)))
Look! We’re able to call the functions people
and grudges
that only exist because there are tables with those names in our schema.
janet querify.janet
@[{:id 1 :name "ian"} {:id 2 :name "jeffrey"}]
@[{:against 2 :holder 1 :id 1 :reason "claimed that my chapter on macros was \"impenetrable\""} {:against 2 :holder 1 :id 2 :reason "doesn't even have any dogs"}]
Once again: I’m not really endorsing this. This is just a party trick. It’s a quick and dirty example of how easy it is to generate code at compile time, but you could imagine applying this technique in a different situation where it would actually be useful. For example, you could query the National Weather Service and make your code execute more slowly if someone compiles it on a rainy day. They have an API, you know.
But before we do that, let’s make sure that we actually understand this macro first. That tables
expression is probably the trickiest part of all of this to digest, because I snuck in some new stuff, but I’ll walk you through it.
(def tables
(->> (sqlite3/eval conn "select name from sqlite_schema where type = 'table';")
(map |($ :name))
(filter |(not (string/has-prefix? "sqlite_" $)))))
Loosely speaking, you can read ->>
as a Janet analog of method-chaining in JavaScript. It’s a macro that— hey! Wait!
It’s a macro!
And it’s actually an example of a good macro. So let’s put a pin in whatever nonsense we were doing for a bit, and let’s talk about ->>
.
(->> (sqlite3/eval conn "select name from sqlite_schema where type = 'table';")
(map |($ :name))
(filter |(not (string/has-prefix? "sqlite_" $))))
->>
is pronounced “thread last.” All it does is take its first argument and stick it at the end of its second argument:
(->>
(map |($ :name)
(sqlite3/eval conn "select name from sqlite_schema where type = 'table';"))
(filter |(not (string/has-prefix? "sqlite_" $))))
And then it repeats that until there aren’t any expressions left to merge:
(->>
(filter |(not (string/has-prefix? "sqlite_" $))
(map |($ :name)
(sqlite3/eval conn "select name from sqlite_schema where type = 'table';"))))
(filter |(not (string/has-prefix? "sqlite_" $))
(map |($ :name)
(sqlite3/eval conn "select name from sqlite_schema where type = 'table';")))
Neat. ->>
is a nice bit of syntax sugar that lets us reduce the amount of paren-nesting we have to contend with. It’s also a much simpler macro than the weird metaprogramming function-generation thing we’re doing right now, and probably would have made a better first example, but it’s too late to change that now.
Here’s one way we could implement it, taking advantage of the fact that Janet will continue expanding macros until there aren’t any left to expand to implement a sort of recursive macro without an explicit recursive call:
(defmacro my->> [first & rest]
(match rest
[next & rest] ~(my->> (,;next ,first) ,;rest)
[] first))
The order of the cases in this match
expression is, unfortunately, very important. We’ll talk more about that in Chapter Six.
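If you want to convince yourself that my->> behaves like the real thing, you could poke at it like this — the expansion and results in the comments are what I’d expect, assuming you’ve loaded the macro:
(macex1 '(my->> [3 1 2] (map inc) (sorted)))
# -> (my->> (map inc [3 1 2]) (sorted))

(my->> [3 1 2] (map inc) (sorted))
# -> @[2 3 4]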
In addition to ->>
, there’s also ->
(“thread first”), which as you might expect from the name does the same thing but threads values as the first argument instead of the last one, as well as -?>
and -?>>
, which short-circuit nil
values, plus as->
and as?->
which allow you to thread into any position.
Alright; let’s get back to the tables
expression:
(def tables
(->> (sqlite3/eval conn "select name from sqlite_schema where type = 'table';")
(map |($ :name))
(filter |(not (string/has-prefix? "sqlite_" $)))))
The next weird thing is |($ :name)
. This is just a shorthand way to create an anonymous function: |($ :name)
is exactly the same as (fn [$] ($ :name))
, and (struct :key)
is one way to look up the value of a key on a table or struct. It looks like a function call, but the thing being “called” is not a function, so Janet does a lookup instead. We could also write |(in $ :name)
to make the lookup explicit.
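For example, with some made-up structs standing in for the rows we get back from SQLite:
(map |($ :name) [{:name "people"} {:name "grudges"}])
# -> @["people" "grudges"]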
So the whole expression is equivalent to the following JavaScript:
const tables = conn.eval("select name from sqlite_schema where type = 'table';")
.map(x => x.name)
.filter(name => !name.startsWith("sqlite_"));
Alright. Hopefully now you can understand how the querify
macro works, but I think it’s always easier to understand macros by looking at their “expansions.”
The easiest way to do this is to open up the Janet repl with our script loaded in, and to invoke the function macex1
:
repl:1:> (macex1 '(querify "schema.sql"))
(upscope
(defn people [conn]
(sqlite3/eval conn "select * from people;"))
(defn grudges [conn]
(sqlite3/eval conn "select * from grudges;")))
I reformatted that for clarity. Janet will of course just dump it all on one line.
macex1
(“macro expand once”) is not a macro; it’s a function that expects an abstract syntax tree, so we have to quote the expression we want it to expand.
We could also use macex
, which will fully expand its argument:
repl:2:> (macex '(querify "schema.sql"))
(upscope
(def people "(people conn)\n\n"
(fn people [conn]
(sqlite3/eval conn "select * from people;")))
(def grudges "(grudges conn)\n\n"
(fn grudges [conn]
(sqlite3/eval conn "select * from grudges;"))))
And here we can see that defn
is just a macro that expands to (def ... (fn ..))
, with a docstring added for good measure. Since this is further away from what we actually wrote, I find that sometimes it’s harder to understand the fully expanded output. I usually prefer to keep calling (macex1 _)
until I reach the fixed point.
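If you find yourself doing that a lot, you could wrap the loop up in a little helper; this is just a sketch, not something built into Janet:
(defn macex-fully
  "Repeatedly macex1 a form until it stops changing."
  [form]
  (def expanded (macex1 form))
  (if (deep= expanded form)
    form
    (macex-fully expanded)))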
So, just to recap: every time we compile this program, we create an in-memory SQLite database, load in our schema, and then query that database to get a list of table names. And then we generate some functions with the same names. And then we compile them into our image.
If you were able to follow all of that, then congratulations: you just earned your macro white belt.
Actually, ugh. No. Not quite yet. I’m sorry. You’re really close, and you’re doing great, and I know this chapter is way too long already. But we can’t actually leave until we talk about hygiene.
But you’re tired, and I’m tired, and this is a lot of information to take in at once, and hygiene is sort of a tricky concept to wrap your head around at the best of times.
So here’s all I’m going to say about it for now:
(defmacro each-reverse [identifier list & body]
~(do
(var i (- (length ,list) 1))
(while (>= i 0)
(def ,identifier (in ,list i))
,;body
(-- i))))
This macro is simple enough. And it looks like it works just fine:
(use ./each-reverse)
(each-reverse num [1 2 3 4 5]
(print num))
janet hygiene.janet
5
4
3
2
1
But if we try this alternative…
(use ./each-reverse)
(each-reverse i [1 2 3 4 5]
(print i))
janet name-clash.janet
name-clash.janet:4:1: compile error: cannot set constant
We learn that this macro is actually fragile.
And if we continue interrogating it:
(use ./each-reverse)
(each-reverse x (do (os/sleep 1) [1 2 3 4 5])
(print x))
time janet over-eval.janet
5
4
3
2
1
janet over-eval.janet 0.01s user 0.00s system 0% cpu 6.038 total
We can see that it’s quite inefficient as well.
If you think about these programs in terms of their expanded abstract syntax trees, it’s easy to see why these problems happen:
(do
(var i (- (length [1 2 3 4 5]) 1))
(while (>= i 0)
# we accidentally shadow i
(def i (in [1 2 3 4 5] i))
(print i)
(-- i)))
(do
(var i (- (length (do (os/sleep 1) [1 2 3 4 5])) 1))
(while (>= i 0)
# we re-evaluate the list on every iteration
(def x (in (do (os/sleep 1) [1 2 3 4 5]) i))
(print x)
(-- i)))
But it isn’t clear what we can do to stop it.
Now, since you know what the macro is going to expand to, you could carefully avoid making any variables called i
, and you could make sure that you only pass in a pure, simple expression as your list
.
But that’s terrible. Macros should be referentially transparent: you should be able to call a macro without any idea of the actual abstract syntax tree that it produces. A macro that you have to handle with gloves on is not a good macro, in my book.
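Just so it doesn’t keep you up at night, here is a rough sketch of the usual fix, which we’ll do properly in Chapter Thirteen: use with-syms to generate names that can’t collide with anything in the caller’s code, and evaluate the list expression exactly once.
(defmacro each-reverse [identifier list & body]
  (with-syms [$list $i]
    ~(do
       # bind the list to a generated name so we only evaluate it once
       (def ,$list ,list)
       # the loop counter also gets a generated name, so it can't
       # shadow anything the caller wrote
       (var ,$i (- (length ,$list) 1))
       (while (>= ,$i 0)
         (def ,identifier (in ,$list ,$i))
         ,;body
         (-- ,$i)))))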
But there’s a bit of an art to writing robust, referentially transparent macros. It’s not hard to do, but it’s not trivial, and I think it deserves a chapter of its own.
But not, umm, not the next chapter. We’ve done enough metaprogramming for now. Let’s switch to something else, and circle back to this in Chapter Thirteen.
Janet does not have native, built-in regular expressions.
You can use a third-party regular expression library if you really have to, I dunno, validate an email address or something. But most of the time, if you’re writing Janet, you’ll be writing PEGs instead.
PEG stands for “parsing expression grammar,” which is a mouthful, so I’m going to stick with the acronym, even though I just wrote a whole chapter about macros without abbreviating AST once.
As a first — extremely crude — approximation, you can think of PEGs as an alternative notation for writing regular expressions. That’s not actually correct — PEGs are quite a bit more powerful than regular expressions, for starters, and they behave differently in a few respects — but we have to start somewhere, and this will let us re-use a lot of our existing knowledge.
Here, let’s look at a few random regular expressions, and see how we’d write them in PEG format.
regex: .*
peg: (any 1)
1
means “match one byte.” any
means “zero or more.”
regex: (na)+
peg: (some "na")
Strings match literally. There are no special characters to escape. some
means “one or more.”
regex: \w{1,3}
peg: (between 1 3 (choice :w "_"))
peg: (between 1 3 (+ :w "_"))
Janet’s :w
does not include _
, so we use +
to say “word character or underscore.” (+ ...)
is an alias for (choice ...)
. between
is inclusive on both ends.
regex: [^a-z-]
peg: (not (choice "-" (range "az")))
peg: (! (+ "-" (range "az")))
(! ...)
is an alias for (not ...)
. You can negate any PEG, not just character classes. One caveat: not consumes no input, so unlike the regex character class it doesn’t advance past the rejected character; to match and consume one such character you’d write something like (if-not (+ "-" (range "az")) 1)
.
regex: [a-z][0-9]?
peg: (sequence (range "az") (opt (range "09")))
peg: (* (range "az") (? (range "09")))
*
matches all of its arguments in order. (* ...)
is an alias for (sequence ...)
. ?
means “zero or one,” and (? ...)
is an alias for (opt ...)
.
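(We’ll meet peg/match properly in a moment, but if you want to poke at any of these patterns yourself: it returns an array of captures when the pattern matches, and nil when it doesn’t. The results in the comments are what I’d expect.)
(peg/match '(some "na") "nanana")            # -> @[] (matched; nothing captured)
(peg/match '(some "na") "batman")            # -> nil (no match)
(peg/match '(between 1 3 (+ :w "_")) "a_1x") # -> @[] (matched the first three bytes)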
Those are pretty random examples, and this is nowhere near an exhaustive list, but it’s enough for you to start forming a general idea. Let’s notice a few things from this:
PEGs are quite a bit more verbose than regular expressions.
I wouldn’t want to write PEGs for, like, searching in my text editor, but in code I think the verbosity is almost always a good thing: it makes them easier to read and easier to modify.
And when you’re writing “real” PEGs, you will break up large patterns into smaller, named components, which will prevent any single pattern from becoming unwieldy.
PEGs use a lot of characters that usually mean something else.
Like (+ first second third)
. That’s not addition; it’s choice. How does that work?
Well, I didn’t state this explicitly, but PEGs are actually written quoted. A PEG is not (some "na")
; it’s actually ['some "na"]
. There is no function called some
; the symbol itself is meaningful to the functions that consume PEGs.
It’s conventional to write PEGs as quasiquoted forms: ~(some "na")
, so that you can easily interpolate other values into them. (We’ll get to that soon.)
PEGs are structured trees, rather than strings.
This makes it easy to compose PEGs out of smaller pieces, which we’ll start to do soon, or to write functions that manipulate PEGs in the same way that we are used to manipulating abstract syntax trees.
PEGs aren’t Janet abstract syntax trees, but you can see that they have a lot in common: they represent a tree structure out of nested tuples, lots of quoted symbols, and some numbers or strings or other values mixed in as well. In fact there is a general term for this kind of value: both abstract syntax trees and PEGs are examples of “symbolic expressions.”
Alright, now let’s talk about some of the ways that these patterns differ from their regular expression equivalents.
First off, PEGs are always anchored to the beginning of the input, so there’s no equivalent “start of input” pattern. So (any 1)
is actually equivalent to the regular expression ^.*
.
Except, no, that’s not strictly true. Because PEGs do not backtrack. Which means that all repetition is implicitly “possessive,” to use the regular expression term. So (any 1)
is actually actually equivalent to ^.*+
, which is not a construct that JavaScript’s regular expression engine supports.
The distinction is irrelevant in this case, but it matters for something like [ab]*[bc]
— that will match bbb
, but [ab]*+[bc]
, or the equivalent PEG (* (any (+ "a" "b")) (+ "b" "c"))
, will not.
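You can check that claim yourself; peg/match returns nil when the whole pattern fails to match:
(peg/match '(* (any (+ "a" "b")) (+ "b" "c")) "bbb")
# -> nil (any greedily ate all three b's, and there's no backtracking)

(peg/match '(* (any (+ "a" "b")) (+ "b" "c")) "bbc")
# -> @[] (any stopped at the c, so (+ "b" "c") could still match it)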
PEGs do backtrack when using the choice
combinator, as well as a few others. But backtracking is always obvious and explicit, as opposed to regular expressions’ implicit backtracking everywhere. This makes it less likely that you’ll accidentally write a PEG that executes in exponential time.
Alright. There’s one more thing we should talk about before we get to a concrete example: numbers.
We’ve seen 1
already, as a way to match any byte. You can write any other integer — 2
, say, or even 3
— to match exactly that number of bytes.
But you can also write negative numbers. Negative numbers don’t advance the input at all, and they fail if you could advance that many characters. So -4
will fail unless there are fewer than four bytes left in the input. In practice I’ve only ever used this to write -1
, which means “end of input.” I don’t think -1
is a particularly intuitive way to write “end of input,” so I wanted to call this out ahead of time.
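For what it’s worth, it looks like this in practice:
(peg/match '(* "abc" -1) "abc")    # -> @[] (matched, and we're at the end of input)
(peg/match '(* "abc" -1) "abcdef") # -> nil ("abc" matched, but -1 failed)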
Now that we’ve covered the basics, let’s look at a real example. Let’s write an HTML pretty printer.
(defn element-to-struct [tag attrs children]
{:tag tag :attrs (struct ;attrs) :children children})
(def html-peg (peg/compile
~{:main (* :nodes -1)
:nodes (any (+ :element :text))
:element (unref
{:main (/ (* :open-tag (group :nodes) :close-tag) ,element-to-struct)
:open-tag (* "<" (<- :w+ :tag-name) (group (? (* :s+ :attributes))) ">")
:attributes
{:main (some (* :attribute (? :s+)))
:attribute (* (<- :w+) "=" :quoted-string)
:quoted-string (* `"` (<- (any (if-not `"` 1))) `"`)}
:close-tag (* "</" (backmatch :tag-name) ">")})
:text (<- (some (if-not "<" 1)))}))
(defn main [&]
(def input (string/trim (file/read stdin :all)))
(pp (peg/match html-peg input)))
Okay wow; we’re just diving right in huh.
First off, this isn’t really an HTML pretty printer; this is only an HTML parser. Well, strictly speaking, it’s a parser for a small subset of HTML — enough to make a point, without getting bogged down in minutiae.
So what are we looking at here?
First off, the outer pattern is a struct. The keys are names, and the values are patterns, and these patterns can reference other patterns by name — even recursively. Even mutually recursively, as you can see with :nodes
and :element
referring to one another.
We’ve seen named patterns like :w
before, when I said it was an analog of regular expressions’ \w
. But those are only the default pattern aliases, and by writing a struct like this we can create our own custom aliases, with scoping rules that make sense: patterns inside nested structs can refer to elements in the “outer struct,” but not the other way around.
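Here’s a tiny throwaway grammar, just to show the struct shape in isolation before we wade through the HTML one:
(peg/match
  ~{:main (some :word)
    :word (* (<- :w+) (any :s))}
  "hello there")
# -> @["hello" "there"]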
Okay. Now let’s try to go through these individual patterns and make sure we understand them.
:main (* :nodes -1)
:nodes (any (+ :element :text))
The name :main
is special, as that will be the pattern’s entry-point. This :main
just calls :nodes
, which matches zero or more :element
s or :text
s, and then asserts that there’s no input left. (* "x" -1)
is like the regular expression ^x$
.
:text (<- (some (if-not "<" 1)))
:text
uses a combinator that we haven’t seen before: <-
.
<-
is an alias for (capture ...)
. We haven’t talked about captures yet, but they work similarly to regular expressions’ captures.
Just to quickly review, consider the regular expression <([^<]*)>
. The parentheses around the innards there mean that there is a single “capture group,” and if we run this expression over a string, we can extract that match:
node
Welcome to Node.js v16.16.0.
Type ".help" for more information.
> /<([^<]*)>/.exec('<hello> there')
[
'<hello>',
'hello',
index: 0,
input: '<hello> there',
groups: undefined
]
This returns an array of captured groups. The first element is the entire substring that matched the regular expression; the second is the text that matched the first (and in this case only) capture group.
> /<([^<]*)>/.exec('<hello> there')[1]
'hello'
PEGs work similarly: when you match a PEG over a string, you get a list of captures back.
repl:1:> (peg/match ~(* "<" (any (if-not ">" 1)) ">") "<hello>")
@[]
The list is empty here, because PEGs don’t implicitly capture anything. We have to explicitly ask for a capture, using <-
:
repl:2:> (peg/match ~(* "<" (<- (any (if-not ">" 1))) ">") "<hello>")
@["hello"]
We could also capture the entire matching substring, if we wanted to:
repl:3:> (peg/match ~(<- (* "<" (<- (any (if-not ">" 1))) ">")) "<hello>")
@["hello" "<hello>"]
But note that captures show up “inside out.” (<- pat)
first matches pat
, which might push captures of its own, and then it pushes the text that pat
matched.
So far this looks basically like regex country. But PEGs allow you to do so much more with captures. Here, let’s look at a slightly more interesting example:
repl:4:> (peg/match ~(* "<" (/ (<- (any (if-not ">" 1))) ,string/ascii-upper) ">") "<hello>")
@["HELLO"]
We have to unquote string/ascii-upper
because we actually want the function in our PEG, not the symbol 'string/ascii-upper
. This is why we’re using quasiquote instead of regular quote.
(/ ...)
is an alias for (replace ...)
, which is a misleading name: if you pass it a function, it doesn’t replace the capture with the function; it maps the function over the captured value. And if you pass it a table or a struct, it looks up the capture as a key and replaces it with the corresponding value. Only if you pass any other kind of value does it actually replace. (If you actually want to replace a capture with a function or a table, you have to wrap it in a function that ignores its argument.)
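Just to see the other two behaviors quickly (the struct and the keyword here are made up purely for illustration):
(peg/match ~(/ (<- :w+) ,{"hello" "goodbye"}) "hello")
# -> @["goodbye"] (the capture was used as a key into the struct)

(peg/match ~(/ (<- :w+) :shout) "hello")
# -> @[:shout] (any other value just replaces the capture)
Anyway: in our example we passed a function.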
So we’re mapping the function string/ascii-upper
over the value captured by (<- (any (if-not ">" 1)))
, which happens to produce a new string. But it doesn’t have to!
repl:5:> (peg/match ~(* "<" (/ (<- (any (if-not ">" 1))) ,length) ">") "<hello>")
@[5]
Our captures can be any Janet values — they don’t have to be strings. (<- pat)
always captures the string that pat
matches, but you can always map it, and there are other combinators that capture other things. Take $
:
repl:6:> (peg/match ~(* "<" (* (<- (any (if-not ">" 1))) ($)) ">") "<hello>")
@["hello" 6]
($)
is an alias for (position)
. It’s a pattern that always succeeds, consumes no input, and adds the current byte index to the capture stack. There’s also (line)
and (column)
, which do what you expect.
But the most useful capture alternative is the constant
operator. (constant x)
always succeeds, consumes no input, and adds an arbitrary value to the capture stack. It’s useful for parsing text into something with a little more structure:
repl:7:> (peg/match ~(any (+
(* "↑" (constant :up))
(* "↓" (constant :down))
(* "←" (constant :left))
(* "→" (constant :right))
(* "A" (constant :a))
(* "B" (constant :b))
(* "START" (constant :start))
1))
"↑↑↓↓←→←→ B A START")
@[:up :up :down :down :left :right :left :right :b :a :start]
Unconditional capture with constant
is useful, but note that in this particular case we would probably just write:
repl:8:> (peg/match ~(any (+
(/ "↑" :up)
(/ "↓" :down)
(/ "←" :left)
(/ "→" :right)
(/ "A" :a)
(/ "B" :b)
(/ "START" :start)
1))
"↑↑↓↓←→←→ B A START")
@[:up :up :down :down :left :right :left :right :b :a :start]
Okay. This has been: PEG Captures 101. Now let’s get back to our HTML example.
:text (<- (some (if-not "<" 1)))
Right. So (<- (some (if-not "<" 1)))
is equivalent to the regular expression ^([^<]++)
. It tries to match "<"
, and if that fails — if the next character is not <
— then it advances by one character. And then it repeats, until it finds a <
character or runs out of input, and finally it adds the entire string it consumed to the capture stack.
So if we give it the input "hello yes this is <b>janet</b>", it’s going to match (and capture) the substring "hello yes this is " (everything up to, but not including, that first <).
Easy. The next part is… not so easy.
:element (unref
{:main (/ (* :open-tag (group :nodes) :close-tag) ,element-to-struct)
:open-tag (* "<" (<- :w+ :tag-name) (group (? (* :s+ :attributes))) ">")
:attributes
{:main (some (* :attribute (? :s+)))
:attribute (* (<- :w+) "=" :quoted-string)
:quoted-string (* `"` (<- (any (if-not `"` 1))) `"`)}
:close-tag (* "</" (backmatch :tag-name) ">")})
But we’ll take it one step at a time, and it’ll be fine.
The whole pattern is wrapped in unref
, but I can’t actually explain that until the end, so we’ll skip over it for now and jump straight to :main
. We’ll circle back to unref
after we talk about backreferences.
:main (/ (* :open-tag (group :nodes) :close-tag) ,element-to-struct)
So an :element
consists of an opening tag, some child nodes, and then a matching closing tag. Like <i>hello</i>
.
But we don’t match :nodes
; we match (group :nodes)
. Because recall that :nodes
is going to push multiple nodes onto the capture stack:
:nodes (any (+ :element :text))
Specifically, anything captured in :element
or :text
. But (group :nodes)
says “well, instead of pushing every capture individually, wrap all the captures into a tuple and push that tuple.” So we’ll match multiple nodes, but we’ll only have a single (possibly empty!) list of nodes on the capture stack when we’re done.
After we parse all of a tag’s individual components — tag name, attributes, and children — we’ll call element-to-struct
to wrap it up into a nicer format. Note that element-to-struct
actually takes three arguments: one for each of :element
’s capture groups. (The tag name and attributes are captured by the :open-tag
sub-pattern.)
But actually matching the tags is the interesting bit.
:open-tag (* "<" (<- :w+ :tag-name) (group (? (* :s+ :attributes))) ">")
:close-tag (* "</" (backmatch :tag-name) ">")
I want to draw your attention to (<- :w+ :tag-name)
. This is a tagged capture, and :tag-name
is its “tag.” When you tag a capture, you can refer back to it later in the match — that’s exactly what (backmatch :tag-name)
does.
But hark! There might be multiple tagged captures to contend with.
<p>If you have <em>nested</em> tags</p>
<p>
will push a tagged capture to the stack, and so will <em>
. So now there are two captures tagged :tag-name
. But when we backmatch
, we’re going to look for the most recent time we tagged a capture with :tag-name
— which is going to be "em"
. This will match </em>
successfully, but of course it will fail once we get to </p>
.
And that’s bad! What we want to do is “scope” the tagged matches, so that parsing the <em>
tag doesn’t leak out to our parsing of the <p>
tag.
So that’s exactly what unref
does. It says “after you’re done parsing this pattern, remove all of the tags that you associated with any captures.” By wrapping unref
around our :element
, we make these tagged captures local to each <tag>
.
Wow, it sure is confusing that HTML tags are called tags and capture tags are also called tags. Someone really could have picked a better example to introduce tagging, huh?
Okay, now you might be thinking: why is this a problem? Sure, we pushed "em"
to the capture stack, but then we popped it off! We replace
d it with an untagged struct when we called element-to-struct
, right? Why can backmatch
still see it?
Well, tagged captures are actually separate from the capture stack. backmatch
doesn’t look for “the uppermost capture on the stack with this tag” — the tags don’t live on the capture stack at all. backmatch
actually looks for “the last time we captured something with this tag.”
To help make this make sense, I’m going to describe a model of how you might implement a simple PEG matcher. We’ll keep track of two pieces of state: a stack of stacks, and a stack of tag scopes. We’ll start with a single stack on the stack-stack, and a single scope on the scope-stack, and different combinators will manipulate these.
The group
combinator, for example, pushes a new stack onto the stack-stack, executes its pattern, and then pops that new stack and pushes it onto next highest stack (as a tuple). The replace
combinator pushes a new stack, executes its pattern, then pops it off the stack-stack, passing its contents as positional arguments to its function. And then it pushes the return value to the new topmost stack on the stack-stack.
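You can sort of watch this happen from the outside; these are throwaway patterns, just to poke at the model, with the results I’d expect in the comments:
(peg/match ~(group (* (<- "a") (<- "b"))) "ab")
# -> @[@["a" "b"]] (two captures collapsed into a single capture)

(peg/match ~(/ (* (<- "a") (<- "b")) ,string) "ab")
# -> @["ab"] (two captures handed to string; one result pushed)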
Meanwhile unref
pushes a new tag scope, executes its pattern, and then pops the tag scope once it’s done. unref
is the only combinator that affects the tag scope stack.
You can actually pass a specific named tag to unref
to only “scope” that particular tag name, allowing you to leak some tags into the outer scope. So in that case unref
pushes a new tag scope, executes its pattern, and then copies everything except the named tag into the outer scope.
Alright. Now the only thing we haven’t talked about is the :attributes
bit.
:attributes
{:main (some (* :attribute (? :s+)))
:attribute (* (<- :w+) "=" :quoted-string)
:quoted-string (* `"` (<- (any (if-not `"` 1))) `"`)}
And I actually don’t think there’s much to say about this? You’ve seen it all already. This is easy. :s+
is “one or more whitespace characters,” and is one of many named patterns available by default.
Alright. That wasn’t so bad, was it?
(defn element-to-struct [tag attrs children]
{:tag tag :attrs (struct ;attrs) :children children})
(def html-peg (peg/compile
~{:main (* :nodes -1)
:nodes (any (+ :element :text))
:element (unref
{:main (/ (* :open-tag (group :nodes) :close-tag) ,element-to-struct)
:open-tag (* "<" (<- :w+ :tag-name) (group (? (* :s+ :attributes))) ">")
:attributes
{:main (some (* :attribute (? :s+)))
:attribute (* (<- :w+) "=" :quoted-string)
:quoted-string (* `"` (<- (any (if-not `"` 1))) `"`)}
:close-tag (* "</" (backmatch :tag-name) ">")})
:text (<- (some (if-not "<" 1)))}))
(defn main [&]
(def input (string/trim (file/read stdin :all)))
(pp (peg/match html-peg input)))
When you look at it all at once, it is pretty intimidating. But just think what the equivalent regular expression would look like! Oh, wait. You can’t. Parsing HTML with regexes is famously impossible.
We’ve already seen a lot of useful PEG combinators, but we’re not limited to the built-in operations that Janet gives us. We can actually interleave arbitrary functions into a PEG, and use them to guide the matching process. This allows us to write custom predicates to express complicated matching logic that would be very difficult to implement natively in a PEG (“identifier with more vowels than consonants”), but it’s especially useful when we already have a regular function that knows how to parse strings.
For example, scan-number
is a built-in function that parses numeric strings into numbers:
repl:1:> (scan-number "512")
512
repl:2:> (scan-number "512x")
nil
If we wanted to parse a number somewhere in a PEG, then… well, we’d use the built-in (number)
operator that does exactly that. But let’s pretend that that doesn’t exist for a second, and try to implement it in terms of scan-number
. Here’s a first attempt:
repl:1:> (peg/match ~(/ (<- (some (+ :d (set ".-+")))) ,scan-number) "123")
@[123]
That works, sometimes. But of course that number pattern is not very accurate, and we already saw that scan-number
will return nil
if we give it a bad input:
repl:2:> (peg/match ~(/ (<- (some (+ :d (set ".-+")))) ,scan-number) "1-12-3+3.-++")
@[nil]
But the match still succeeded, and captured nil
, because that was what we told it to do.
So we could try to carefully write a valid number pattern here, such that we only ever pass valid input to scan-number
. But we don’t want to do that. That sounds hard. We just want the pattern to fail if scan-number
can’t actually parse a number.
Enter cmt
:
repl:3:> (peg/match ~(cmt (<- (some (+ :d (set ".-+")))) ,scan-number) "1-12-3+3.-++")
nil
So cmt
is very similar to replace
, except that if your function returns something falsy (remember: just nil
or false
), then the cmt
clause itself will fail to match. It’s sort of like a map
vs filterMap
situation.
Of course in real life, as previously mentioned, we’d just write this:
repl:1:> (peg/match ~(number (some (+ :d (set ".-+")))) "123")
@[123]
cmt
stands for “match-time capture,” apparently, even though the letters are not in that order. The name comes to us from LPEG, the Lua PEG library that inspired Janet’s PEG library, where all capture-related functions start with C
. It’s a very useful function despite the confusing name, and there’s something else that makes it even more useful: the ->
operator.
->
stands for backref
, and it looks quite strange at first glance: all it does is re-capture a previously tagged capture. If you just used it by itself, it would duplicate previously tagged captures onto the capture stack and consume no input, which doesn’t sound very useful.
repl:1:> (peg/match ~(* (<- :d+ :num) (-> :num)) "123")
@["123" "123"]
But if you use it inside the pattern you pass to cmt
, you can add previous captures as arguments to your custom mapping predicate.
Here’s a concrete, if extremely dumb, example: I’ve invented my own HTML dialect that is identical to regular HTML, except that <head>
tags can optionally be closed with a </tail>
tag, because that’s modestly whimsical.
Previously we were able to use backmatch
to match closing tags, because they happened to be bytewise-identical to the values we captured in :open-tag
:
:close-tag (* "</" (backmatch :tag-name) ">")
But now that’s no longer true, and backmatch
isn’t sufficient to handle this very practical HTML dialect. We’ll have to write some logic:
(defn check-close-tag [open-tag close-tag]
(or (= open-tag close-tag)
(and (= open-tag "head")
(= close-tag "tail"))))
:close-tag (* "</" (drop (cmt (* (-> :tag-name) (<- :w+)) ,check-close-tag)) ">")
Notice that we “re-capture” :tag-name, in addition to capturing the :w+. Because cmt needs a single pattern to execute, I stuck them together with *, but both of these captures will be passed as arguments to check-close-tag. And the surrounding drop throws away whatever cmt pushed onto the capture stack: here we only care that the check succeeded, not about keeping its return value as a capture.
Neat.
We are now very close to knowing everything there is to know about PEGs, but I think we should talk about one more thing before we leave this chapter:
Regular expressions aren’t just useful for matching or extracting text. They’re also useful for changing text.
Regex replace is a common primitive operation; you use it all the time in your editor or with sed
or whatever. And of course Janet has a native peg/replace
function, and we’re going to talk about it soon.
But let’s just pretend, for a moment, that it doesn’t exist. Because it turns out that you don’t actually need a built-in PEG replace function: you can implement replacement as a special case of capturing.
It’s a pretty simple trick: we’re going to write a PEG that captures two things: the part of the string that matches the pattern we want to replace, and the entire rest of the string.
Just so we have something concrete to work with, let’s write a chaotic evil PEG: given a string, we’ll find all of the Oxford commas in that string, and replace them with Oxford semicolons.
So given input like this:
this is dumb, confusing, and upsetting
We’ll wind up with:
this is dumb, confusing; and upsetting
Naturally.
So the PEG itself is easy: we just want to match the literal string ", and"
, wherever it appears in the input:
repl:1:> (peg/match ~(any (+ ", and" 1)) "a, b, and c")
@[]
Okay. It did work; you just can’t tell. Let’s replace it, which will automatically capture the output, so we can at least see that it’s working:
repl:2:> (peg/match ~(any (+ (/ ", and" "; and") 1)) "a, b, and c")
@["; and"]
Okay. And now let’s also capture everything else:
repl:3:> (peg/match ~(any (+ (/ ", and" "; and") (<- 1))) "a, b, and c")
@["a" "," " " "b" "; and" " " "c"]
And we’re done! Sort of! We have the entire modified string, as a list of captures, and all we have to do now is stick them back together.
And I know: this looks unbelievably inefficient. And it would be, if we just called, like, string/join
on this result. But Janet has a way to efficiently join these matches together without even making these intermediate string allocations in the first place.
It’s called accumulate
, although I’m going to use the short alias %
:
repl:4:> (peg/match ~(% (any (+ (/ ", and" "; and") (<- 1)))) "a, b, and c")
@["a, b; and c"]
And accumulate
is special-cased in the PEG engine: while Janet is executing the pattern inside an accumulate
block, anything that would normally push captures onto the stack instead just copies it into a shared mutable buffer. And once it’s done with its pattern, that buffer becomes a string, and accumulate
pushes it onto the capture stack.
So that’s a global replace. But what if you only want to replace the first occurrence?
Here’s one way:
repl:5:> (peg/match ~(% (any (+ (* (/ ", and" "; and") (<- (to -1))) (<- 1)))) "a, and b, and c")
@["a; and b, and c"]
After we match and replace the pattern, we immediately consume the rest of the string, so that the any
repetition won’t fire again.
Hey look! We did it. accumulate
was the last combinator on my list of combinators to tell you about, and I just told you about it. That means we’re almost done with the chapter now.
But we get to do something fun and easy first. There’s actually another way that we could have written that last pattern:
repl:6:> (peg/match ~(% (any (+ (* (/ ", and" "; and") '(to -1)) '1))) "a, and b, and c")
@["a; and b, and c"]
We replaced all of the (<- x)
captures with just 'x
, which does exactly the same thing. How does that work?
Well, 'x
is actually just syntax sugar for (quote x)
. They both parse into exactly the same abstract syntax tree: if you’re writing a macro or a PEG engine or whatever else, you can’t actually tell whether it was originally written using the '
shorthand or not. So when the whole thing is quasiquoted:
repl:7:> ~(% (any (+ (* (/ ", and" "; and") '(to -1)) '1)))
(% (any (+ (* (/ ", and" "; and") (quote (to -1))) (quote 1))))
All of those single-quotes get expanded into (quote)
forms, and quote
is just another alias for capture
in the PEG parser. But when you use the shorthand, you can save quite a few parentheses.
Now, it’s fun to work through these examples, and I think it’s valuable to understand how they work, just in case you ever find yourself needing to perform some weird text surgery deep inside some complicated PEG. But of course, in real life, you only have to write:
repl:1:> (peg/replace ", and" "; and" "a, b, and c")
@"a, b; and c"
There is also (peg/replace-all), as well as (peg/find), which returns the index of the first match, and (peg/find-all), which returns all of the indices where the PEG would match.
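Under the usual “this is what I’d expect it to print” caveat:
(peg/replace-all ", and" "; and" "a, and b, and c")
# -> @"a; and b; and c"

(peg/find "and" "a, and b, and c")
# -> 3

(peg/find-all "and" "a, and b, and c")
# -> @[3 10]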
Alright. That’s all of the important PEG stuff sorted, but I want to close with a few scattered, wandering observations:
PEGs operate on bytes, not characters.
So far we’ve only talked about parsing text, but you can write PEGs to parse binary formats just as easily. There are even built-in combinators for parsing signed and unsigned big- and little-endian integers up to 64 bits wide, which return the boxed core/s64
and core/u64
abstract types.
PEGs are harder to debug than regular expressions.
There are a million helpful “regex tester” websites that can show you your pattern as a finite state machine or interactively highlight different parts of matches or capture groups. But there is no equivalent for PEGs. If you’re running into trouble with your PEGs, well… you basically have to ask about it in the Janet chatroom, I think.
Do you want to make a PEG visualization website? You should. I would use that.
You can compile PEGs ahead of time.
You don’t have to do this — you can just pass the symbolic expression directly to peg/match
, as we’ve been doing — but if you’re going to use a PEG more than once then it’s probably a good idea. Especially if you’re compiling your program — Janet will marshal an optimized bytecode representation of your PEG into the final image.
Note that peg/compile
is a function, not a macro, so you’ll have to remember to call it at the top-level to ensure it executes during the compilation phase. There’s no reason to spend time compiling PEGs at runtime, after all, unless you’re dynamically constructing them.
You can define your own combinators.
PEGs are just symbolic expressions, and you already know how to write functions that manipulate symbolic expressions.
PEGs are the best.
PEGs really are one of my favorite things about Janet — I have never met a scripting language that made it so easy to parse text before.
It’s because this is a chapter about fibers.
JavaScript doesn’t have fibers, so I’m going to pretend like you’ve never heard of them before, even though you might be familiar with them from another language already.
The word “fiber” is a cute play on “thread:” a fiber is a lot like a thread, but it’s smaller and lighter. And a thread is a lot like a string, except— wait, no. That’s not right.
We could try to compare threads and fibers and talk about how a fiber is essentially a lightweight, cooperatively scheduled thread, but I don’t think that provides any useful intuition. If you’re programming with threads, you’re doing it because you have no other choice: your performance constraints require it. If you’re programming with fibers, you’re probably doing it because it’s fun and pleasant and it makes your code easier to read.
So let’s instead approach fibers from first principles. Let’s not think about threads or concurrency at all; let’s just get a hold of a fiber and see how it feels.
(defn print-something []
(print "something"))
(def fiber (fiber/new print-something))
janet fiber.janet
Okay, nothing happened.
We created a fiber by giving it a function, but it didn’t call the function. Or really: it didn’t call the function yet. It will call the function as soon as we ask it to:
(defn print-something []
(print "something"))
(def fiber (fiber/new print-something))
(resume fiber)
janet fiber.janet
something
There it is.
Now this is obviously boring, so let’s make it slightly more interesting:
(defn range [count]
(for i 0 count
(yield i))
"done")
(def fiber (fiber/new (fn [] (range 5))))
(print (resume fiber))
(print (resume fiber))
(print (resume fiber))
(print (resume fiber))
(print (resume fiber))
(print (resume fiber))
(print (resume fiber))
janet fiber.janet
0
1
2
3
4
done
error: cannot resume fiber with status :dead
in _thunk [fiber.janet] (tailcall) on line 16, column 8
Alright. So hopefully this isn’t too weird; this is exactly like the following generator in JavaScript:
function* range(count) {
for (let i = 0; i < count; i++) {
yield i;
}
return "done";
}
Except that Janet throws an error if we try to resume
a fiber that has already returned, while JavaScript just gives you undefined
if you call .next()
on a completed generator.
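If you ever do need to drive a fiber by hand, you can check fiber/status before resuming to avoid that error. A little sketch:
(def fiber (fiber/new (fn [] (yield 1) (yield 2) "done")))

# resume until the fiber is dead, instead of a fixed number of times
(while (not= :dead (fiber/status fiber))
  (print (resume fiber)))
# prints 1, then 2, then done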
Fibers are iterable in Janet — just like generators are iterable in JavaScript — so we’d probably write something like this instead:
(defn range [count]
(for i 0 count
(yield i))
"done")
(def fiber (fiber/new (fn [] (range 5))))
(each value fiber
(print value))
Which prints 0
through 4
, but ignores the final return value.
There’s an important difference between Janet generators and JavaScript generators, though:
(defn yield-twice [x]
(yield x)
(yield x))
(defn double-range [count]
(for i 0 count
(yield-twice i)))
(def fiber (fiber/new (fn [] (double-range 5))))
(each value fiber
(print value))
janet fibers.janet
0
0
1
1
2
2
3
3
4
4
You can’t do that in JavaScript, because in JavaScript a generator is “scoped” to a single function. You can’t yield
from a regular function and expect it to “know” that you were calling it from a generator function.
JavaScript does have a way to yield all of the values from another generator:
function* yieldTwice(x) {
yield x;
yield x;
}
function* range(count) {
for (let i = 0; i < count; i++) {
yield* yieldTwice(i);
}
}
But this is essentially just syntax sugar for iterating over the generator returned by yieldTwice
and yielding all of its values.
In Janet, though, yield
does not return control from a function. It returns control from a fiber. And a fiber has a whole call stack of its very own, so when you call yield
, it might have to jump “up” several stack frames at once to yield the value back to the place that called resume
.
Except, really, you aren’t jumping up the stack. You’re jumping across, to a different stack. The call stack that you yielded from is still there, in all of its glory, and you can always jump back over to it by calling resume
again.
But there’s something else that you can do to actually jump up and unwind the fiber’s call stack: you can raise an exception.
(defn do-your-best []
(error "oh no"))
(defn believe-in-yourself []
(while true
(do-your-best)))
(def fiber (fiber/new believe-in-yourself))
(resume fiber)
janet fiberror.janet
error: oh no
in do-your-best [fiberror.janet] on line 2, column 3
in believe-in-yourself [fiberror.janet] on line 6, column 5
in _thunk [fiberror.janet] (tailcall) on line 10, column 1
Okay, so, that’s probably what you expected. We raised an exception; we got an error.
But an exception doesn’t have to propagate all the way up to the root of our program. We can create fibers that intercept exceptions for us:
(defn do-your-best []
(error "oh no"))
(defn believe-in-yourself []
(while true
(do-your-best)))
(def fiber (fiber/new believe-in-yourself :e))
(resume fiber)
The only difference is that I added the :e
argument to the fiber/new
call. And now it seems like nothing happens:
janet fiberror-caught.janet
But, in fact, something did happen. Rather than returning a yielded value, the resume
call actually returned the error. We just didn’t print it:
repl:1:> (def fiber (fiber/new believe-in-yourself :e))
<fiber 0x6000039234F0>
repl:2:> (resume fiber)
"oh no"
But wait a minute. How do we know that that’s an error? That’s just a string. What if it yielded that value? Or just returned it?
repl:3:> (fiber/status fiber)
:error
Oh, I see.
So: the :e
argument means that this fiber will, for lack of a better word, “catch” any errors thrown by the functions that it runs. It essentially acts like a barrier on the call stack: exceptions can get as far as the last call to resume
, but no further.
Now this would be a pretty verbose way to program with exceptions, so Janet provides a macro called try
that provides a familiar try-catch interface for creating fibers like this.
repl:4:> (try (do-your-best) ([e] (print e)))
oh no
It’s a little… weird-looking, I think. try
takes two arguments: an expression to evaluate, and then a “catch” section, which is wrapped in parentheses and starts with a binding list.
But we can look at the expansion to see that this macro creates a fiber, resumes it, and then checks its status. In fact, we can even get the underlying fiber that it creates by adding a second identifier (fib
) to the “catch” binding clause, which we could then use to print a stacktrace of the fiber at the time of the error:
repl:5:> (macex1 '(try (do-your-best) ([e fib] (debug/stacktrace fib e ""))))
(let [_000000 (<cfunction fiber/new> (fn [] (do-your-best)) :ie)
_000001 (<function resume> _000000)]
(if (<function => (<cfunction fiber/status> _000000) :error)
(do
(def e _000001)
(def fib _000000)
(debug/stacktrace fib e ""))
_000001))
That <function =>
bit means (= (...) :error)
. I point this out because my brain parses =>
as its own token.
We’ll talk about those weird _00000
identifiers in Chapter Thirteen.
Annoyingly, we also have to pass the empty string to debug/stacktrace
in order to get it to do the thing that we want. This argument is the “prefix” to print before the error line, and even though it’s an optional argument, if we omit it our error won’t show up at all.
Note that this actually passes :ie
instead of just :e
. :i
means “inherit the current environment” — an environment, if you recall from Chapter Two, is the table of top-level bindings, but actually every fiber has its own environment. We’ll talk more about fiber environments later in this very chapter.
Okay. So if you’re paying too much attention, you might be concerned. We’ve already seen that we create fibers to yield from them, as a way to make our own generators. But we also create fibers every time we want to catch an exception. But what if we’re doing both?
(defn yield-dangerously [x]
(if (< (math/random) 0.9)
(yield x)
(error "only way to live")))
(defn generate-safely [count]
(for i 0 count
(try
(yield-dangerously i)
([e]
(print "saved it")))))
(def fiber (fiber/new (fn [] (generate-safely 5))))
(each value fiber
(print value))
When we invoke yield-dangerously
, it’s actually nested inside two fibers (well, three, if you count the top-level fiber that our code begins in). try
creates a fiber, and we want that fiber to catch errors. But we learned previously that yield
will yield to the parent fiber! So this would mean that this doesn’t work, right?
Well, fortunately, that is not the case. It works fine. The fiber that try
creates will let yield
s just pass through to its parents — just like the fiber for the generator will allow exceptions to pass through to the top-level fiber.
This all comes down to the fiber’s “signal mask:” when you call fiber/new
, the default “signal mask” is :y
, for yield
. This means that the fiber “intercepts” yield
calls and prevents them from propagating to the parent fiber. But when we just pass :e
, our fiber no longer intercepts yield
s. It intercepts exceptions instead.
You can pass :ye
, if you want to, to intercept both “signals.” But I don’t know why you would want to do that.
Yield and error aren’t the only “signals” that fibers know about. There’s also a debug
signal, which we’ll talk about in Chapter Eleven — it jumps “up the stack” to an interactive debugger, if you have one running. :yield
, :error
, and :debug
are the only named signals, but there are also ten numbered “user” signals that you can intercept with flags :0
through :9
.
“User signal” is sort of a misnomer, because they aren’t actually reserved for you, the user. They’re just extra, numbered signals, and the Janet standard library ascribes its own meaning to some of them: it provides an “early return” macro that raises a “user signal 0” to exit from the current fiber, and “user signal 9” is an important part of its asynchrony story. Meanwhile “user signal 8” is how you interrupt one fiber from another fiber.
Janet does not make any officially documented guarantees about which signals are used by the standard library now or which signals may be used by the standard library at some point in the future — I only know that those signals are “reserved” because I have read the Janet source code. This means that I can’t give any concrete advice about which user signals are safe for you to use to implement your own control flow.
Also, not all user signals are created equally. You cannot resume
a fiber that raises user signals 0 to 4 inclusive, but you can resume
a fiber that raises user signals 5 to 9 inclusive. This places an additional limit on which user signals you physically can use depending on the type of thing you’re doing with them.
Okay. Fibers. So one way to think about fibers is that they give you a way to put “labels” in your call stack, and then to say things like “jump up to the nearest point in the call stack labeled :e
.” And they’re also first-class values that you can pass around and resume
in order to jump arbitrarily deep into a suspended call stack.
But enough about what fibers are. Let’s switch gears, and talk about why we would actually want to use fibers when we’re programming.
So we saw try
already — that’s a pretty big one. That’s useful. And we saw generators.
And generators are useful too! You can use them to generate ad-hoc sequences or elegantly traverse trees or lazily process complex data pipelines with better cache coherency and fewer intermediate allocations than Janet’s normal map
and filter
and reduce
would give you.
In fact, there’s even a nice shorthand for declaring ad-hoc generators without having to go through fiber/new
: coro
.
(def fiber (coro
(for i 0 5
(yield i))))
(each value fiber
(print value))
It’s called coro
because, well, Janet fibers are more than just generators. They’re actually full coroutines.
“Coroutine” is a fancy word, but it’s basically the same as a generator. To use JavaScript notation for a minute: you make a generator with f(...args)
, and then you extract elements from it by calling .next()
. You make a coroutine with f(...args)
and then you extract elements from it by calling .next(arg)
.
Note that JavaScript “generators” are actually full coroutines as well, although all function*
invocations produce an object called a Generator
. Even though it’s a coroutine. According to The Encyclopedia of Computer Science, generators are sometimes called “semicoroutines,” although I have never heard anyone say that in real life.
The only difference between a generator and a coroutine is that the code that “consumes” or “uses” or “drives” or “schedules” or “iterates over” a coroutine doesn’t just say “give me your next value.” It says “here’s a value for you, now give me your next value.” It’s kind of like a generator whose behavior can be guided by the code iterating over it.
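In Janet terms: whatever value you pass to resume becomes the return value of the yield that suspended the fiber. A toy example, invented purely to show the round trip:
(def doubler
  (coro
    (var x (yield :ready))
    (while true
      (set x (yield (* 2 x))))))

(resume doubler)    # -> :ready
(resume doubler 10) # -> 20
(resume doubler 21) # -> 42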
But in practice, you don’t use coroutines like generators at all! Generators are used as a lightweight way to interleave control flow between multiple unrelated functions, while coroutines are almost exclusively used as a way to interleave long-running side effectful operations into code without blocking your entire program.
In JavaScript, you usually use a different syntax called async
/await
when you’re writing this type of coroutine. async
/await
effectively creates a function*
coroutine that only yields promises, and resumes it every time a promise completes. It’s a slightly less general — but much more convenient — interface for this most common of coroutine use cases.
Janet also special-cases this type of coroutine. When you’re programming asynchronously, you create fibers that you do not explicitly yield
from, and that you do not explicitly resume
elsewhere in your code. Instead, you hand the fiber to Janet, and then as your fiber executes it will implicitly yield when you invoke certain functions, and Janet — the Janet runtime, or, more specifically the Janet “event loop” — will resume your fiber in the future once it figures out the result.
The Janet “event loop” is a little scheduler that exists in the background of the Janet runtime. When you call functions that might take a long time to complete (like reading bytes from a socket), your program will actually “yield to the event loop.” Concretely this means that it raises a “user signal 9,” which will (probably) not be caught until it reaches the top-level of the Janet runtime, at which point Janet will start performing the effect you requested and then resume your fiber once it completes.
We’ll use the function ev/sleep
to demonstrate how this works (ev
means “event loop”):
(print "hello")
(ev/sleep 1)
(print "goodbye")
janet event-loop.janet
hello
goodbye
Oh. Right. I forgot that this is a book, so you can’t actually perceive the passage of time.
But, well, imagine the hello
appearing, and then a one second pause, and then the goodbye
appearing after that. It’s… it’s what you expect.
Here, let’s try to visualize the passage of time for you. We’ll print a .
every 100 milliseconds.
(defn visualize-time []
(while true
(prin ".")
(flush) # output is line-buffered by default
(ev/sleep 0.1)))
(visualize-time)
(print "hello")
(ev/sleep 1)
(print "goodbye")
janet event-loop.janet
..................^C
Well, of course this just prints .
forever, because I put the main fiber into an infinite loop. But that’s not really what I wanted: what I wanted was to run visualize-time
in the background. To schedule it to run only when the main fiber is waiting for its asynchronous ev/sleep
to complete.
To do that, we can use ev/call
to both create a new fiber and to schedule that fiber to be resumed as soon as the main fiber yields to the event loop. Which it does as soon as we run ev/sleep
:
(defn visualize-time []
(while true
(prin ".")
(flush) # output is line-buffered by default
(ev/sleep 0.1)))
(ev/call visualize-time)
(print "hello")
(ev/sleep 1)
(print "goodbye")
janet event-loop.janet
hello
..........goodbye
......................^C
Great. Except that, well, the program runs forever, because visualize-time
just loops indefinitely. We’ll have to interrupt it to get our program to complete gracefully:
(defn visualize-time []
(while true
(prin ".")
(flush) # output is line-buffered by default
(ev/sleep 0.1)))
(def background-fiber (ev/call visualize-time))
(print "hello")
(ev/sleep 1)
(print "goodbye")
(ev/cancel background-fiber "interruption")
janet event-loop.janet
hello
..........goodbye
error: interruption
in ev/sleep [src/core/ev.c] on line 2928
in visualize-time [event-loop.janet] (tailcall) on line 5, column 4
Oh, gross. Now it printed an error. That’s not really what we wanted. Why did that happen?
Let’s make the control flow a little more explicit:
(defn visualize-time []
(while true
(prin ".")
(flush) # output is line-buffered by default
(ev/sleep 0.1)))
(def background-fiber (ev/call visualize-time))
(print "hello")
(ev/sleep 1)
(print "goodbye")
(ev/cancel background-fiber "interruption")
(print)
(print "The main fiber is still running!")
(print "But as soon as our background task")
(print "resumes, it will immediately raise an")
(print "error. Like this:")
(print)
(ev/sleep 0)
(print)
(print "And we're back to the main fiber.")
(print "Let's check on our background fiber:")
(print)
(print "status: " (fiber/status background-fiber))
(print "value: " (fiber/last-value background-fiber))
janet event-loop.janet
hello
..........goodbye
The main fiber is still running!
But as soon as our background task
resumes, it will immediately raise an
error. Like this:
error: interruption
in ev/sleep [src/core/ev.c] on line 2928
in visualize-time [event-loop.janet] (tailcall) on line 5, column 5
And we're back to the main fiber.
Let's check on our background fiber:
status: error
value: interruption
Neat. Okay. So we sort of canceled the task loudly and violently, by forcing it to raise an exception that propagated all the way to the top level. But if we want to silently stop this background task, we can instead catch the exception:
(defn visualize-time []
(var stopped false)
(while (not stopped)
(prin ".")
(flush) # output is line-buffered by default
(try
(ev/sleep 0.1)
([error try-catch-fiber]
(if (= error :stop)
# gracefully handle the expected "exception"
(set stopped true)
# re-raise any unexpected exceptions
(propagate error try-catch-fiber))))))
(def background-fiber (ev/call visualize-time))
(print "hello")
(ev/sleep 1)
(print "goodbye")
(ev/cancel background-fiber :stop)
janet event-loop.janet
hello
..........goodbye
There we go. We wait for a second, and now you can viscerally appreciate the passage of time through the universal language of small progress dots.
Now, since there’s only one “waiting state” that we can be in, and since the function’s control flow is so easy to exit, an exception feels like a little bit of overkill.
But there’s another way that we can influence how the scheduler resumes this fiber: we can ask ev/go
to “fill in” the current value that it’s waiting for, overriding whatever the actual event loop might be doing.
Normally ev/sleep
just “returns nil
”, by which I mean “the Janet event loop resumes our fiber with the value nil
for this expression.” But we can cause it to return a different result:
(defn visualize-time []
(var stopped false)
(while (not stopped)
(prin ".")
(flush) # output is line-buffered by default
(if (= (ev/sleep 0.1) :stop)
(set stopped true))))
(def background-fiber (ev/call visualize-time))
(print "hello")
(ev/sleep 1)
(print "goodbye")
(ev/go background-fiber :stop)
janet event-loop.janet
hello
..........goodbye
ev/go
is just like calling resume
on the fiber, except that we can’t manually resume
a fiber once we hand it to the event loop. Janet calls these fibers “root fibers,” and they can only be resumed with ev/go
.
And this is pretty weird; I don’t even know what would happen if the fiber were actually waiting on something, and I don’t know when we would reasonably want to jump in front of the event loop like this.
And while this code is a bit shorter than the exception version in this particular case, if there were multiple points where our background fiber could yield to the event loop, we could use an exception to take care of all of them at once (instead of having to check for :stop
at every yield point).
So let’s go back to the exception-throwing case, and talk about another way that we could handle this gracefully:
(defn visualize-time []
(while true
(prin ".")
(flush) # output is line-buffered by default
(ev/sleep 0.1)))
(def background-fiber (ev/call visualize-time))
(print "hello")
(ev/sleep 1)
(print "goodbye")
(ev/cancel background-fiber "interruption")
janet event-loop.janet
hello
..........goodbye
error: interruption
in ev/sleep [src/core/ev.c] on line 2928
in visualize-time [event-loop.janet] (tailcall) on line 5, column 4
So by default when a root fiber raises an exception, Janet will print a stacktrace like this. But we can change the way that Janet handles exceptions in root fibers, by installing a supervisor for the fiber.
(defn visualize-time []
(while true
(prin ".")
(flush) # output is line-buffered by default
(ev/sleep 0.1)))
(def supervisor (ev/chan))
(def background-fiber (ev/go visualize-time nil supervisor))
(print "hello")
(ev/sleep 1)
(print "goodbye")
(ev/cancel background-fiber :stop)
(def fiber-event (ev/take supervisor))
(match fiber-event
[:error fib environment] (do
(def error (fiber/last-value fib))
(if (= error :stop)
(print "gracefully stopped")
(propagate error fib)))
event (error (string/format "unexpected fiber event %q" event)))
janet event-loop.janet
hello
..........goodbye
gracefully stopped
Which feels much better to me. The background fiber no longer needs to know how it’s going to be canceled or exactly what cancellation is going to look like. Our top-level fiber handles the exception for it, and checks it against the value that it chose to mean “gracefully cancel.”
So the way this works is that if a signal propagates all the way to a root fiber, and that signal is in the fiber’s “signal mask,” Janet will write a message about the signal into the “supervisor channel.” And ev/go
will, by default, create a fiber with a signal mask of :e01234
, which is why we see the error event.
A channel is a bounded queue that can be read from and written to asynchronously. Reads suspend execution until a value is available, and writes suspend execution if the queue is full, resuming once another fiber takes a value off the queue.
We could use “supervisor channels” to implement our own scheduler, if we wanted to, reacting to errors (or other signals!) across multiple worker fibers. But we’re not going to do that, in this book. We’re not going to talk about channels much at all.
Which is a shame, because channels are very cool, and they’re an important communication primitive when you’re writing complex concurrent programs. But we just aren’t going to have time to do that together, and there is already a large body of literature about “communicating sequential processes” that will teach you how to take advantage of this model of concurrency. I don’t think there’s much point to giving a Janet-specific treatment here — channels have exactly the API you’d expect.
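Still, just so you can picture the API, here’s a minimal sketch of passing a single value between two fibers; the :ping keyword and the capacity of 1 are arbitrary choices on my part:
(def chan (ev/chan 1)) # a channel that can hold one value

(ev/spawn
  (ev/give chan :ping) # would suspend if the channel were full
  (print "producer done"))

(print "got " (ev/take chan)) # suspends until a value is available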
So that’s the event loop. You have seen how it works now, even if we haven’t really discussed what you can do with it.
I suspect that you will mostly interact with the event loop when you want to perform non-blocking IO using the “stream” API, which is an abstraction over byte buffers that you can read from or write to without blocking your program. You’ll probably create streams to read or write to files or TCP sockets, although you can also create streams programmatically.
(defn print-dots []
(while true
(prin ".")
(flush)
(ev/sleep 0)))
(ev/call print-dots)
(def f (os/open "lorem-ipsum.txt" :r))
(print "About to read")
(def bytes (ev/read f 10))
(print "Done reading")
(ev/close f)
(print "Done closing the file descriptor")
(printf "read %q" bytes)
(os/exit 0)
janet streams.janet
About to read
.Done reading
Done closing the file descriptor
read @"Lorem ipsu"
Wait, we could have just called os/exit this entire time?? That whole to-do about :stopping the background task was actually a contrived and artificial problem I made up to talk way too much about fibers??
Ah, well, the read completed very quickly, so we only got a single time-passing dot. But we can see that the other fibers in our program still got a chance to run while this was taking place.
Contrast this with the blocking file
API:
(defn print-dots []
(while true
(prin ".")
(flush)
(ev/sleep 0)))
(ev/call print-dots)
(def f (file/open "lorem-ipsum.txt" :r))
(print "About to read")
(def bytes (file/read f 10))
(print "Done reading")
(file/close f)
(print "Done closing the file descriptor")
(printf "read %q" bytes)
(os/exit 0)
janet blocking.janet
About to read
Done reading
Done closing the file descriptor
read @"Lorem ipsu"
Basically the same code, but file/read
suspended our entire program (not just the current fiber) while it did the read, so the Janet event loop never got a chance to schedule the fiber running our print-dots
function.
I think it’s very important to understand why one is blocking and the other is non-blocking, so at the risk of over-explaining this pretty simple example, here’s what actually happened in the non-blocking case:
The call to ev/read
does two things: first, it tells the kernel that we want to read from the underlying file descriptor backing the stream, using epoll on Linux or kqueue on macOS or something called an IoCompletionPort
(?) on Windows. And then it raises a user signal 9, which causes the current fiber to stop running, and ultimately (unless there is another fiber intercepting user signal 9!) yields control all the way up to the Janet event loop. And then the Janet event loop, umm, loops for a bit, checking on the jobs that we’ve asked the kernel to do, stopping only when the file descriptor has bytes available that it can read. Then the event loop resumes the fiber that called ev/read in the first place, passing it (through the resume call) the actual bytes that it read from the kernel.
Of course the event loop is actually implemented in C, so it doesn’t literally call the Janet function (resume bytes)
, but it does an equivalent thing.
Okay.
Fibers.
Fibers.
We’ve talked an awful lot about fibers already, haven’t we? Surely there isn’t anything else to say about them? It’s probably time for a recap now, isn’t it?
So just to recap, fibers are a primitive control flow construct that you can use to do the following useful things:
Oh gosh. We haven’t talked about all of these things yet. We still have a few things to get through. But the event loop stuff was by far the trickiest bit; the rest will be pretty easy in comparison.
First off: early return. I think that you know enough about fibers by this point to understand how you would implement “early return:” you just wrap the body in a fiber that intercepts a signal:
(defmacro with-early-return [& body]
~(resume (fiber/new (fn [] ,;body) :i0)))
(defn return [value]
(signal 0 value))
(defn my-function []
(with-early-return
(print "hello")
(return "stopping early")
(print "after returning")))
(print (my-function))
Not so bad! Except that this has the weird property that you can actually return from a function that called you. Look:
(defmacro with-early-return [& body]
~(resume (fiber/new (fn [] ,;body) :i0)))
(defn return [value]
(signal 0 value))
(defn helper-function []
(return "helper function currently on strike"))
(defn my-function []
(with-early-return
(print "do some work")
(helper-function)
(print "keep working")))
(print (my-function))
janet early-return.janet
do some work
helper function currently on strike
Weird, right?
Now, Janet already has built-in macros that implement more sophisticated “early return” behaviors than this — prompt
, which has this “return from a parent function” behavior, and label
, which does not. Er, well, you can sort of do it anyway with label
, but you’d have to give your helper functions explicit permission to return… whatever. Fiber-based control flow is just a little bit different than traditional early-return.
Oh, coroutines.
We’ve spent a lot of time already talking about a specific application of coroutines: asynchronous event-driven IO. But coroutines in general are fancier, more powerful versions of generators, right? And generators can do lots of cool things. Coroutines should be able to do even cooler things, shouldn’t they?
But you don’t see it very often! And it’s hard to come up with a simple example of when you’d want to use a coroutine to simplify your code, in part because coroutines don’t really make simple things easier. They make complex, hairy things easier.
In fact, in the last year, I have only encountered one problem where I felt that coroutines — pure coroutines — were a good fit, and actually made the code simpler and easier to follow.
I was writing a parser for a weird language that lets you use custom operators before you define them. So any time I encountered an unknown symbol I had to stop parsing the current statement and move onto the next one, because I didn’t know whether to parse that symbol as an operator or as a regular value. (And, for reasons, I couldn’t do a two-pass thing to identify operators ahead of time.)
So a very natural way to implement that is to create a coroutine for every statement, and an outer “scheduler” for the whole program that you’re parsing. The scheduler starts the first statement’s coroutine, and lets it run until it encounters an unknown symbol (which it yields). Then the scheduler writes down the symbol that it’s waiting for, and moves on to the next statement.
Whenever a coroutine finishes parsing a statement, and you learn whether it contained an operator or a function declaration, then the scheduler finds any coroutines that were waiting on that symbol and resumes them (passing in the symbol’s type when it does).
You can see strong parallels between this parser and the “effectful” coroutines of the async
/await
variety. In both cases there’s some kind of scheduler that’s coordinating work between multiple coroutines — either the built-in event loop, or my own “parser scheduler.” In both cases yield means “I’m asking a question that you might not know the answer to yet.” And in both cases the scheduler resumes once it has the answer.
But I don’t want you to think that this “shape” of problem is the only thing that you can use pure coroutines for — it’s just the only time I ever think to reach for them. All of my experience with coroutines comes from this asynchronous event loop type of programming, so those are the only nails I try to hit with them.
Food for thought, though: generators make it easy to write ad-hoc iterators. Coroutines make it easy to write ad-hoc state machines. But this conversation is a little bit out of scope for this book.
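Still, for a tiny taste of the state machine idea, here’s a contrived little sketch of my own: a traffic light that advances one state every time you resume it.
(def light
  (coro
    (forever
      (yield :green)
      (yield :yellow)
      (yield :red))))

(resume light) # :green
(resume light) # :yellow
(resume light) # :red
(resume light) # :green again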
Oh, speaking of scopes…
We haven’t talked about dynamic variables yet, but one way to think about them is like a global variable with a stack of values. Instead of setting dynamic variables, you push new values for them, and when you’re done with whatever it is you’re doing, you pop that value off, restoring the dynamic variable to whatever it was set to previously.
But actually, the “stack” of values is determined by the “stack” of fibers that you are currently running. Fibers each have their own view of the current “dynamic variables,” and when a fiber completes, any dynamic variables that it had set go away.
This is a simplification, and it’s weird, so let’s look at a concrete example.
(def file (file/open "output.txt" :w))
(print "everything is normal")
(with-dyns [*out* file]
(print "but this writes to a file"))
(print "back to normal")
janet dynamic.janet
everything is normal
back to normal
cat output.txt
but this writes to a file
*out*
is a dynamic variable that determines the default destination for functions like print
and prin
and printf
. By setting the dynamic variable to a new value, we essentially “redirect” these functions to write a file instead. (Note that we aren’t actually redirecting stdout when we do this, we’re just changing the behavior of print
, which knows to consult this special dynamic variable.)
I mostly see dynamic variables used like this: as implicit additional function arguments that are silently available to functions. So rather than print
taking an optional argument for the destination buffer, Janet uses a pass-by-dynamic-variable calling convention for it.
So how do dynamic variables work, and what do they have to do with fibers?
Well, every fiber has something called an environment. You might remember environments from Chapter Two, when I said that your program’s environment is the “top-level scope.” This was a simplification: it’s not really the “program’s environment;” it’s the “default fiber’s environment.”
You can manipulate the environment by calling setdyn
, and you can query the environment by calling dyn
. The environment is a table, so you can put any values in it, but by convention dynamic variables are named with keywords:
(def f (file/open "output.txt" :w))
(printf "*out* is actually just %q" *out*)
(setdyn *out* f)
(pp (curenv))
janet environment.janet
*out* is actually just :out
cat output.txt
@{f @{:source-map ("environment.janet" 1 1) :value <core/file 0x6000022512B0>}
:args @["environment.janet"]
:current-file "environment.janet"
:out <core/file 0x6000022512B0>
:source "environment.janet"}
Well, I actually pretty-printed it a little, but you get the idea.
So you can see those other entries in our environment table — :args
and :current-file
and :source
— those are actually just dynamic variables that Janet sets by default. We can get the current value with (dyn :args)
:
(print "I have the following arguments:")
(pp (dyn :args))
janet dyn.janet
I have the following arguments:
@["dyn.janet"]
janet -c dyn.janet dyn.jimage
I have the following arguments:
@["-c" "dyn.janet" "dyn.jimage"]
Okay, so so far these dynamic variables are just entries in the root fiber’s environment table. But when we create a new fiber, we have three choices for what environment it should have:
The first choice, and the default, is no environment at all. If a fiber with no environment calls setdyn, it will automatically create an empty one for you, and install it with fiber/setenv. The second choice is to share its parent’s environment (the :i flag, for “inherit”). If this fiber calls setdyn, it will change its parent’s environment table. The third choice is a fresh environment whose prototype is its parent’s environment (the :p flag, for “prototype”). This environment will be able to read all of the values in the parent environment, but if it calls setdyn, those changes won’t be visible to the parent fiber. (We’ll talk more about prototypal inheritance in Chapter Eight, if this doesn’t make sense.)
In practice, you won’t have to think about this at all. You will just use the helper with-dyns
, which just creates and immediately resumes a fiber with no signal mask and the :p
environment flag, whose function first calls setdyn
for each of the dynamic bindings and then runs all of the expressions that you pass it. Using with-dyns
means that you don’t need to worry about your dynamic variables accidentally outliving their intended scope in the case that you raise an exception before you can clean up after yourself.
Ah, that feels good.
But I actually left one thing out. One thing that I don’t really want to talk about, but that I have to mention before we can bring this chapter to a close.
Janet also supports running fibers in their own actual OS-level threads. You can actually spawn “real” background tasks that run in parallel with the rest of your process and communicate with other fibers via thread-safe channels that you can create with ev/thread-chan
. Janet supports multithreading.
I’m not going to talk about multithreading in Janet, because I don’t have any personal experience writing multithreaded Janet, so all I could really do is regurgitate the official documentation. And the official documentation is pretty easy to understand. So go there, if you want to write multithreaded Janet.
Alright. We just did a whole chapter on concurrency and coroutines and complicated cross-stack control flow. I think we’ve earned a break.
So this is going to be a chapter about simple control flow. Loops and list comprehensions and if
expressions; things like that.
You’ve seen a lot of control flow already, and I didn’t think that any of it deserved explanation. The following all do the things that you’d expect them to:
(each x [1 2 3]
(print x))
(for x 0 3
(print x))
(while true
(print x))
It’s worth talking about each
, though. each
can iterate over a variety of data structures — tuples, arrays, structs, tables, strings, buffers, fibers (generators), and even keywords and symbols (which behave identically to strings).
Janet doesn’t have a formal concept of “interfaces” or “protocols” for types to conform to, and you can’t make an iterable “object” by defining a few “methods.” Iteration is based on a single function, next
, and you cannot overload what next
means for types that you define in Janet.
But! If you define a custom JANET_ABSTRACT
type, you can provide a custom implementation of next
. We’ll talk about this in Chapter Nine.
It’s a bit weird that Janet doesn’t let you make custom iterable types without writing C code, but at the same time it means that you can always use structs and tables as generic containers: you never need to worry about accidentally inserting a key called :next
and shadowing a method or something, so Janet has no equivalent of JavaScript’s Object.prototype.toString.call(object)
pattern.
Okay, so next
is really simple: it gives you a way to iterate over the keys in a data structure:
repl:1:> (next [10 20 30])
0
repl:2:> (next [10 20 30] 0)
1
repl:3:> (next [10 20 30] 1)
2
repl:4:> (next [10 20 30] 2)
nil
Keys! Not values. For tuples and arrays — and strings, and other sequential types — the keys are just indices. For associative types, they’re the, umm, keys:
repl:1:> (next {:foo 1 :bar 2})
:foo
repl:2:> (next {:foo 1 :bar 2} :foo)
:bar
repl:3:> (next {:foo 1 :bar 2} :bar)
nil
Note that next
returns nil
to indicate “no more keys.” This means that nil
cannot, itself, be a key of any data structure! This is why Janet doesn’t allow nil
to appear as a key in a table or a struct.
For fibers, however, next
actually resumes and advances to the first call to yield
. And then it returns 0
. Yes, 0
. Always 0
.
repl:1:> (def generator (coro (yield 10) (yield 20) (yield 30)))
repl:2:> (next generator)
0
repl:3:> (in generator 0)
10
repl:4:> (in generator 0)
10
repl:5:> (next generator 0)
0
repl:6:> (in generator 0)
20
repl:7:> (next generator)
0
repl:8:> (next generator)
nil
(in generator 0)
is the same as (fiber/last-value generator)
.
So next
is not a pure function; it can actually advance the underlying structure in some cases. This is weird, since it looks like a pure function — you give it the “previous index” as an explicit argument, after all. And usually it is! But you can’t rely on that, when you’re dealing with fibers or abstract types.
So each
uses next
to compute the keys, and then it calls in
to look up the values. There’s also eachk
, which calls next
to compute the keys, and just iterates over those:
repl:1:> (eachk i [-3 1 99] (pp i))
0
1
2
nil
repl:2:> (eachk i {:foo 1 :bar 2} (pp i))
:foo
:bar
nil
repl:3:> (eachk i (coro (yield 1) (yield 2)) (pp i))
0
0
nil
And there’s eachp
, which iterates over key-value pairs:
repl:1:> (eachp i [-3 1 99] (pp i))
(0 -3)
(1 1)
(2 99)
nil
repl:2:> (eachp i {:foo 1 :bar 2} (pp i))
(:foo 1)
(:bar 2)
nil
repl:3:> (eachp i (coro (yield 1) (yield 2)) (pp i))
(0 1)
(0 2)
nil
Nothing tricky here.
Now, okay. I said that you can’t define your own iterable implementation in Janet without resorting to C code. This is true. But you can write a fiber that statefully iterates over values in a structure, and then iterate over that. It’s sort of… weird and hacky, and you can’t use eachk
or eachp
, because the keys of the thing you’re actually iterating over will always be 0
, but it’s an easy way to define an ad-hoc iterator:
(defn make-table-set [& elements]
(def result @{})
(each element elements
(put result element true))
result)
(defn elements [table-set]
(coro
(eachk element table-set
(yield element))))
repl:1:> (def good-numbers (make-table-set 1 3 60))
@{1 true 3 true 60 true}
repl:2:> (reduce + 0 (elements good-numbers))
64
This is a pretty dumb example, but you can see that this trick allows us to use functions that use next
under the hood, like map
and reduce
and filter
, with structs and tables that have their own logical idea of how iteration should work.
There’s no list of “functions that use next
under the hood,” but as a general convention, bare functions like map
work on all iterable values, while namespaced functions like array/concat
only work on specific types.
Okay, that’s looping on easy mode. But sometimes looping is not quite as easy. Sometimes you have to write nested loops, or loops full of conditions. Consider this simple structure:
(def hosts [
{:name "claudius"
:ip "45.63.9.183"
:online true
:services
[{:name "janet.guide"}
{:name "bauble.studio"}
{:name "ianthehenry.com"}]}
{:name "caligula"
:ip "45.63.9.184"
:online false
:services [{:name "basilica.horse"}]}])
Let’s say we want to print all of the names of the services for any hosts that are online. This isn’t hard; it’s just a nested loop:
(each host hosts
(if (host :online)
(each service (host :services)
(print (service :name)))))
I don’t think there’s anything wrong with that code, but you might prefer the following alternative:
(loop [host :in hosts
:when (host :online)
service :in (host :services)]
(print (service :name)))
loop
is a little DSL that makes it easy to write nested loops and conditionals. In this case it didn’t buy us too much, since the expression was so simple. But it can really simplify complex nested looping:
(def hosts [
{:name "claudius"
:ip "45.63.9.183"
:online true
:services
{"janet.guide" true
"bauble.studio" false
"ianthehenry.com" true}}
{:name "caligula"
:ip "45.63.9.184"
:online false
:services {"basilica.horse" true}}])
(each host hosts
  (if (host :online)
    (let [ip (host :ip)]
      (eachp [service-name available] (host :services)
        (if available
          (for instance 0 3
            (pp [ip service-name instance])))))))
Now compare that to the equivalent loop
expression:
(loop [host :in hosts
       :when (host :online)
       :let [ip (host :ip)]
       [service-name available] :pairs (host :services)
       :when available
       instance :range [0 3]]
  (pp [ip service-name instance]))
I fully admit that this is a contrived, artificial example, but I hope that it demonstrates some of the power of loop
. It lets you iterate over values, keys, key-value pairs, and arbitrary ranges. It lets you insert conditions — stateless conditions like :when
, and stateful conditions like :while
and :until
. :let
allows you to give names to intermediate values, and you can inject arbitrary effects before and after the inner loop with :before
and :after
.
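I didn’t show :before in action, so here’s a quick sketch that reuses the hosts table from above: it prints a header for each online host before looping over its services.
(loop [host :in hosts
       :when (host :online)
       :before (print (host :name) ":")
       [service-name available] :pairs (host :services)
       :when available]
  (print "  " service-name))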
loop
can be very powerful, and perhaps even a little intimidating at first, and you might be wondering if it’s worth learning a whole weird DSL just to make nested loops slightly shorter to write. And that’s fair — I think that I mostly just use loop
because :when
lets me save a little indentation over (each ... (if ...))
.
But there’s a good reason to understand this little DSL, and that reason is seq
.
seq
is not loop
, but it uses the exact same language to express what it does. But instead of imperatively looping to perform side effects and then returning nil
, seq
will allocate an array that collects every value that your loop body evaluates to. It’s like a super-powered list comprehension:
(def hosts [
{:name "claudius"
:ip "45.63.9.183"
:online true
:services
{"janet.guide" true
"bauble.studio" false
"ianthehenry.com" true}}
{:name "caligula"
:ip "45.63.9.184"
:online false
:services {"basilica.horse" true}}])
(def services
  (seq [host :in hosts
        :when (host :online)
        service :pairs (host :services)]
    service))
(pp services)
janet hosts.janet
@[("ianthehenry.com" true) ("janet.guide" true) ("bauble.studio" false)]
This is a dumb example, but you can often simplify a complex map
/filter
/mapcat
pipeline into a single seq
that performs your data transformation more efficiently.
There’s also tabseq
, which you can use to construct a table out of a sequence of key-value pairs, and generate
, which will return a fiber that yields each of the inner values, so that you can lazily consume them later.
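Here’s a quick sketch of those two, still using the hosts table from above; tabseq takes a key expression and a value expression, and generate hands you back a fiber instead of an array:
(def ips-by-name
  (tabseq [host :in hosts]
    (host :name) (host :ip)))
# a table mapping each host's name to its ip

(def online-hosts
  (generate [host :in hosts :when (host :online)]
    host))
# nothing has run yet; online-hosts is a fiber we can consume lazily
(each host online-hosts
  (print (host :name)))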
That’s all I’m going to say about the loop
macro — the official documentation has an exhaustive list of all the things you can write in a loop
or loop
-flavored expression, and it’s worth glancing over it once.
Finally, we should talk about break
. break
works just like it does in JavaScript — it breaks out of the innermost loop. But you can also use break
outside of a loop, as a cheap kind of early return:
(defn test-breaking []
(if true
(break "everything is fine"))
(error "this won't get a chance to raise"))
repl:1:> (test-breaking)
"everything is fine"
But if you want to early return from inside a loop, or if you want to break out of multiple levels of a loop at once, you’ll have to use the prompt
or label
macros to create an abortable fiber.
Sadly break
does not allow loops to evaluate to an expression. Loops always return nil
, even if you break with a value:
repl:1:> (while true (break 123))
nil
And there is no equivalent of JavaScript’s continue
built into the language — but, of course, you can simulate it with a fiber.
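For example, here’s a fiber-based stand-in for continue, using label to skip the rest of a single iteration; this is my own sketch, not a built-in idiom:
(each x [1 2 3 4 5]
  (label skip
    (when (even? x)
      (return skip)) # acts like "continue"
    (print x)))
# prints 1, 3, 5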
Alright. That’s all I know about looping in Janet. Let’s move on to conditionals.
Conditionals are very easy and very simple, but they might look slightly weird if you’re only used to JavaScript.
Let’s start with if
. There’s no else
in Janet’s if
; the else part is implicit. (if condition then-part else-part)
. This means that the then-part
can only contain a single expression, so you might need to group expressions with do
if you want to do multiple things.
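Like this, with a number I picked arbitrarily:
(def x 4)

(if (even? x)
  (do
    (print "it's even")
    (print "which apparently takes two prints to announce"))
  (print "it's odd"))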
But it also means that there’s nowhere to write else if
the way that you would in JavaScript:
if (x > 0) {
console.log("positive")
} else if (x < 0) {
console.log("negative")
} else if (x === 0) {
console.log("zero")
} else {
console.log("NaNs for breakfast again??")
}
If you wrote that in Janet, it would look…
(if (> x 0)
(print "positive")
(if (< x 0)
(print "negative")
(if (= x 0)
(print "zero")
(print "NaNaNaNaN"))))
…awful, in my opinion. There’s nothing worse than having to count parentheses because your expressions get too nested.
But fortunately you don’t have to write code like this. Nested if
s are such a common thing that Janet has a special macro for creating them without any triangular indentation:
(cond
(> x 0) (print "positive")
(< x 0) (print "negative")
(= x 0) (print "zero")
(print "NaNaNaNaN"))
cond
is literally the same as writing nested if
s:
repl:1:> (macex '(cond (> x 0) (print "positive") (< x 0) (print "negative") (= x 0) (print "zero") (print "NaNaNaNaN")))
(if (> x 0) (print "positive") (if (< x 0) (print "negative") (if (= x 0) (print "zero") (print "NaNaNaNaN"))))
But it’s much nicer looking.
Janet also has case
, which is a special, umm, case of cond
, when all of your conditions are of the form (= value something)
:
(defn strings [data]
(case (type data)
:string (print data)
:tuple (each element data (strings element))
(error "invalid")))
repl:1:> (strings ["find" ["those" ["nested"]] "values"])
find
those
nested
values
nil
This is very similar to JavaScript’s switch
statement, but much more ergonomic. No breaks to worry about, no arguing over whether or not to indent the case
lines. Just round, sumptuous parentheses as far as the eye can see.
But Janet has another switch
alternative that’s a lot more powerful than case
. It’s called match
, and instead of only matching literal values, it matches data structures against patterns, allowing you to check multiple values in the same structure and to easily extract individual pieces.
We could use match
to implement a really verbose and contrived calculator:
(defn calculate [expr]
(match expr
[:add x y] (+ x y)
[:subtract x y] (- x y)
[:multiply x y] (* x y)
[:divide x y] (/ x y)))
repl:1:> (calculate [:add 1 2])
3
repl:2:> (calculate [:subtract 5 10])
-5
“Simple” values like keywords and numbers and strings match by equality, while “fancy” values like tuples and structs match each of their elements by equality. Identifiers like x
match anything, and bind the name x
to the value that it matched. You get it. It’s pattern matching. I know JavaScript doesn’t have pattern matching, but I’m sure you’ve seen this somewhere before.
You can also add arbitrary conditions to any pattern by wrapping it in parentheses and adding a boolean expression:
(defn calculate [expr]
(match expr
[:add x y] (+ x y)
[:subtract x y] (- x y)
[:multiply x y] (* x y)
([:divide x y] (= y 0)) (error "division by zero!")
[:divide x y] (/ x y)))
repl:1:> (calculate [:add 1 2])
3
repl:2:> (calculate [:divide 1 0])
error: division by zero!
Which makes match
strictly more powerful than cond
.
The pattern _
matches anything but doesn’t create a binding named _
, even though that is a valid identifier. And you can match dynamic runtime values by using (@ identifier)
:
(def magic-number (math/rng-int (math/rng) 10))
(defn guessing-game [guess]
(match guess
(@ magic-number) "you got it!"
_ "better luck next time"))
repl:1:> (guessing-game 1)
"better luck next time"
repl:2:> (guessing-game 3)
"better luck next time"
repl:3:> (guessing-game 6)
"better luck next time"
repl:4:> (guessing-game 5)
"better luck next time"
repl:5:> (guessing-game 4)
"you got it!"
Nice.
Obviously this particular case should just be an if
expression, but remember that you can include _
and (@ foo)
anywhere inside a deeply nested pattern, so they can be very useful.
Alright. Now: match
is great, and pattern matching is great, and you won’t hear me say anything against pattern matching in the abstract.
But.
Janet’s implementation of pattern matching happens to have a couple rough edges that you’ll need to be aware of when you’re using match
.
The first and largest gotcha concerns tuple patterns: tuple patterns actually match prefixes of sequential structures:
(match [1 2]
[] "no elements"
[x] "one element"
[x y] "two elements"
[x y z] "three elements")
What would you expect that to evaluate to? Yeah, me too. But, unfortunately, it’s "no elements"
, because []
is the first pattern that matches a prefix of the data. If you invert the order of the cases, it does the thing you’d expect:
(match [1 2]
[x y z] "three elements"
[x y] "two elements"
[x] "one element"
[] "no elements")
That evaluates to "two elements"
, because [x y]
is the first matching prefix.
This is terrible, and you will mess this up at some point, even though I warned you about it, because it’s just so unintuitive. My only recommendation to avoid this is to write your own match
macro that doesn’t have this problem, and exclusively use that.
There’s a historical explanation for this, which is that prior to Janet 1.20.0 there was no way to match a prefix pattern at all. But Janet 1.20.0 added [x y & rest]
support to match
, so you can explicitly match a prefix. But now the pattern [x y]
means the same thing that the pattern [x y &]
should mean, and [x y &]
is just an error, and that’s very sad.
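For completeness, here’s what an explicit prefix match looks like with &; this is my own small example:
(match [1 2 3 4]
  [x y & rest] (printf "x %q, y %q, rest %q" x y rest))
# x is 1, y is 2, and rest holds everything after the first two elements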
This is true in all destructuring assignments, too:
repl:1:> (def [x y] [1 2 3])
(1 2 3)
repl:2:> x
1
repl:3:> y
2
repl:4:> (def [x y &] [1 2 3])
repl:4:1: compile error: expected symbol following '& in destructuring pattern
:(
The next gotcha has to do with associative patterns — matching tables and structs.
Associative patterns can be very annoying in Janet, because of the fact that structs and tables cannot contain nil
. This is unfortunate, and I’m very sorry; it’s still my least favorite thing about Janet. But it’s something that you’ll have to be aware of, because you might find yourself writing a very simple match
like this:
(def binary-tree {:value 10 :left nil :right {:value 15 :left nil :right nil}})
(defn contains? [tree needle]
(match tree
nil false
{:value (@ needle)} true
{:value value :left left :right right} (cond
(< needle value) (contains? left needle)
(> needle value) (contains? right needle))))
And then being very surprised that it does not work:
repl:1:> (contains? binary-tree 10)
true
repl:2:> (contains? binary-tree 15)
nil
This is because the pattern {:left _}
cannot match a struct with {:left nil}
, because there are no structs with {:left nil}
. {:left nil}
is the same as {}
. It’s the empty struct. Really:
repl:1:> {:left nil}
{}
This isn’t the end of the world; it just means that we can’t use nil
as a sentinel value in any associative data structures. Arguably it’s nice to have a separate sentinel anyway, but usually when I’m hacking up a quick script I just want to reach for nil
in the same places that I would reach for null
in other languages. But remember: nil
is not null
. nil
is undefined
, so we have to make our own null
substitute:
(def empty-tree @{})
(def binary-tree
{:value 10
:left empty-tree
:right {:value 15 :left empty-tree :right empty-tree}})
(defn contains? [tree needle]
(match tree
(@ empty-tree) false
{:value (@ needle)} true
{:value value :left left :right right} (cond
(< needle value) (contains? left needle)
(> needle value) (contains? right needle))))
repl:1:> (contains? binary-tree 10)
true
repl:2:> (contains? binary-tree 11)
false
repl:3:> (contains? binary-tree 15)
true
I’m using an empty table because I just want some globally unique value, and the address of a mutable data structure is guaranteed to be unique. You could also use a buffer or an array or even a generated symbol, if you were fancy, which we’ll talk about in Chapter Thirteen.
So that’s match
. The most powerful conditional control flow statement.
cond
, case
, and match
all take pairs of “thing to check” and “expression to evaluate if that thing passes the check.” But they can all optionally take a single final argument to act as a default expression if nothing else matches before that. Otherwise, they’ll default to nil
.
(case x
1 "one"
2 "two"
"default value")
(cond
(= x 1) "one"
(= x 2) "two"
"default value")
(match x
1 "one"
(@ (+ 1 1)) "two"
"default value")
And that’s control flow! That was easy, wasn’t it? I mean, compared to fibers, that was nothing.
There are a few little stragglers that we can knock out quickly before we say goodbye: when
is a lot like if
, but when
has no else part, so you can write multiple things in the then part without having to wrap them in do
:
(when (even? x)
(print "it's even!")
(print "this is a joyous day"))
I think of when
as an imperative, side-effecty thing, and if
as more of an expressiony thing. But (when x y z)
is just shorthand for (if x (do y z))
.
There’s also unless
, which is exactly like when
, but inverts the condition. So (unless x y z)
is the same as (if (not x) (do y z))
.
Actually, there’s shorthand for that too: (if-not x (do y z))
.
There’s actually a whole menagerie of additional control flow constructs that you could use — if-let
and when-let
, if-with
and when-with
. And forever
, which is an alias for while true
, and forv
, which is just like for
except that you can mutate the iteration variable within the loop. I’m not going to talk about these because they’re pretty straightforward and not incredibly useful, but you should look them up when you get home.
Eventually you’re going to want to put code in multiple files. Like this:
# helpers.janet
(defn shout [x]
  (printf "%s!"
    (string/ascii-upper x)))

# main.janet
(use ./helpers)
(shout "hey there")
janet main.janet
HEY THERE!
There are two macros that you’ll reach for when you’re doing this: use
and import
.
use
brings all of the public bindings from one file into another. import
brings them in, but with a module prefix:
# helpers.janet
(defn shout [x]
  (printf "%s!"
    (string/ascii-upper x)))

# main.janet
(import ./helpers)
(helpers/shout "ahoy")
janet main.janet
AHOY!
You can specify a different module name with (import ./helpers :as h)
— that will give you h/shout
— or a different prefix altogether: (import ./helpers :prefix "helpers--")
will give you helpers--shout
instead.
This is all very easy and intuitive, but it’s worth spending some time talking about precisely what this means and how this works. Let’s notice a few things here:
We specified the module as a path, ./helpers
, not a name like helpers
.
If we import something as a bare name, Janet will try to load a package with that name from the module load path, which defaults to /usr/local/lib/janet
but can be overridden with the JANET_PATH
environment variable. We’ll talk more about this in a bit.
We didn’t specify a file extension.
This is because Janet can import modules in a few different formats. It can import source files, obviously, with the extension .janet
. It can import directories — if helpers.janet
doesn’t exist, Janet will look for helpers/init.janet
instead. It can also import .jimage
files — precompiled images — and .so
/.dll
files, when you have precompiled native libraries. (We’ll talk more about that in Chapter Nine.)
We didn’t do anything to “export” the shout
function, or declare ourselves as a module, or anything like that.
That last point is sort of interesting, and is somewhere that Janet differs from JavaScript.
In Janet, when we import a source file, we’re really importing the environment of that file. Er, the environment that results from executing that file.
Recall from Chapter Two that top-level statements execute during file “compilation,” and Janet source files end up producing “environments,” which are tables that map bindings to values and metadata.
So (use ./helpers)
will actually execute the script ./helpers.janet
, compute its environment, and then create a bunch of local names in our environment with the same values. Er, but “a bunch” in this case is only one, because the environment happened to only have one identifier. But, you know, in general it’s a bunch.
We can actually split this up into smaller steps: we can compute the module’s environment without creating any names in our own environment. We can just grab a hold of it with require
:
# helpers.janet
(defn shout [x]
  (printf "%s!"
    (string/ascii-upper x)))

# require.janet
(def helpers
  (require "./helpers"))
(pp helpers)
(pp helpers)
janet require.janet
@{shout @{:doc "(shout x)\n\n"
:source-map ("helpers.janet" 1 1)
:value <function shout>}
:current-file "helpers.janet"
:macro-lints @[]
:source "helpers.janet"}
Note that require
is actually a function, not a macro, so we have to pass "./helpers"
as a string so that Janet doesn’t look for a variable called ./helpers
(it’s a valid identifier!). The expression still executes at compile-time because we’re using it as a top-level statement, but you could require
something at runtime if you wanted to — there’s even a function version of import
that you can use, called import*
. (There’s no use*
, but that’s just import* :prefix ""
.)
Janet will only execute the scripts we import once, and will cache the returned environments for future use
or import
or require
invocations. But we can pass :fresh true
to one of those calls to bypass the module cache, which is useful if we’re programming interactively and want to reload a module without restarting the repl.
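So during an interactive session, after editing helpers.janet, you might run something like:
(import ./helpers :fresh true)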
But okay. Sometimes when you write a module, you don’t want to export everything. You can also create “private” bindings in an environment, with def-
, var-
, defn-
, and defmacro-
.
# helpers.janet
(def- upcase string/ascii-upper)
(defn shout [x]
  (printf "%s!"
    (upcase x)))

# private.janet
(print "this environment:")
(pp (require "./helpers"))
(print)
(print "creates these bindings:")
(use ./helpers)
(pp (curenv))
janet private.janet
this environment:
@{shout @{:doc "(shout x)\n\n"
:source-map ("helpers.janet" 3 1)
:value <function shout>}
upcase @{:private true
:source-map ("helpers.janet" 1 1)
:value <cfunction string/ascii-upper>}
:current-file "helpers.janet"
:macro-lints @[]
:source "helpers.janet"}
creates these bindings:
@{shout @{:private true}
:args @["private.janet"]
:current-file "private.janet"
:macro-lints @[]
:source "private.janet"}
Okay, so I have a few things to say about this.
First off, we can see that upcase
is a normal entry in the environment table, but it has the :private true
metadata set. And use
and import
know to skip any binding with the :private true
metadata.
We could make our own voyeur
macro that doesn’t check binding metadata, and imports private bindings the same as any others — the privateness is only advisory. This might come in handy if we ever want to write tests for implementation details of our modules, but we will speak no more of it in this book.
You can make actually private bindings by not putting them in the environment in the first place, but instead scoping them to a block. But then you have to manually alter the current environment, since defn
will also apply to that block.
(let [upcase string/ascii-upper]
(put (curenv) 'shout @{:value (fn [x]
(printf "%s!"
(upcase x)))}))
Don’t… don’t do this.
Second off, take a closer look at this output:
creates these bindings:
@{shout @{:private true}
:args @["private.janet"]
:current-file "private.janet"
:macro-lints @[]
:source "private.janet"}
shout
is imported as a :private
binding, so any module that imports this module will not re-import it. You can change that with (import ./helpers :export true)
, which will cause it to import bindings without the :private true
bit.
But wait: shout
is only @{:private true}
. This binding has no :value
!
This is an unfortunate quirk of Janet’s pp
behavior when it prints out tables with prototypes. This isn’t just the table @{:private true}
; it also has a prototype that points to the original binding. Instead of copying that table and then setting :private true
, use
and import
create a new table that “inherits” from the original binding:
(use ./helpers)
(def original-shout-binding
(in (require "./helpers") 'shout))
(def local-shout-binding
(in (curenv) 'shout))
(pp local-shout-binding)
(pp (table/getproto local-shout-binding))
(pp original-shout-binding)
(print
(= original-shout-binding
(table/getproto local-shout-binding)))
janet proto.janet
@{:private true}
@{:value <function 0x600003DDB9A0>}
@{:value <function 0x600003DDB9A0>}
true
We’ll talk more about prototypes in Chapter Eight, but the idea is exactly the same as in JavaScript.
This is important in the case that we actually imported a variable from another file — something declared with var
— rather than a const binding declared with def
. By inheriting from the original environment entry, rather than copying it, we’ll automatically see any mutations to the original variable.
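Here’s a sketch of that, with a hypothetical counter.janet module that exports a var:
# counter.janet (hypothetical)
(var hits 0)
(defn bump [] (++ hits))

# main.janet
(import ./counter)
(print counter/hits) # 0
(counter/bump)
(print counter/hits) # 1, because our imported binding still points at the original variable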
Okay. That’s modules in Janet.
Well, actually, we kind of just scratched the surface. Janet’s module system is implemented mostly in Janet, and it’s very flexible, but you will probably not ever need to interact with it beyond import
. But you could, in theory, write a custom module loader to import other things beyond images and source files; you could control the way that Janet resolves modules and searches by file extension. This is mostly useful for writing Janet dialects that you can import from regular files (I myself wrote a version with infix operators when I was doing lots of math stuff), but you could in theory do something more exotic. If you ever actually feel like you need to do advanced module mischief, the official documentation is a perfectly good reference.
So instead of talking more about that, let’s move on to talk about jpm
.
jpm
is the “Janet Project Manager,” not, as you might have guessed, the Janet Package Manager. But its role is mostly the same as npm
or cargo
or opam
or any other package manager — it just, umm, well…
Janet is a young language, and it has some rough edges. One of those rough edges is jpm
. We’re going to talk about it, and it’s going to be fine, but just… lower your expectations slightly before we start.
jpm
does two things: it builds projects and it manages dependencies.
Let’s start with the building bit. We’ll write a very useful binary, cat-v
, and we’ll build it.
(defn choose [rng selections]
(def index (math/rng-int rng (length selections)))
(in selections index))
(defn verbosify [rng word]
(choose rng
(case word
"quick" ["alacritous" "expeditious"]
"lazy" ["indolent" "lackadaisical" "languorous"]
"jumps" ["gambols"]
[word])))
(defn main [&]
(def rng (math/rng (os/time)))
(as-> stdin $
(file/read $ :all)
(string/split " " $)
(map (partial verbosify rng) $)
(string/join $ " ")
(prin $)))
Note that our cat-v
doesn’t actually concatenate anything; it only works over stdin, because, well, that seemed more likely to upset the people who get upset over how other people use cat
.
Let’s take it for a spin:
janet main.janet <<<"The quick brown fox jumps over the lazy dog."
The alacritous brown fox gambols over the languorous dog.
Perfect. I can already tell that this is going to be very useful, so let’s set about packaging it for the rest of the world.
To do this, all we have to do is create a project.janet
file.
A project.janet
file is basically a combination of metadata and Makefile-style tasks, in script form. jpm
will run your project.janet
file, which will produce an environment of metadata, as well as registering tasks (as a side effect).
Janet’s task runner DSL is very simple:
(task "say-hello" ["get-ready"]
(print "hello"))
(task "get-ready" []
(print "getting ready..."))
jpm run say-hello
getting ready...
hello
That’s a valid project file, although in practice they won’t really look like that. They’ll look like this:
(declare-project
:name "cat-v"
:description "cat --verbose"
:dependencies [])
(declare-executable
:name "cat-v"
:entry "main.janet")
declare-project
and declare-executable
are built-in functions that will register default tasks like build
and install
, as well as setting all the correct metadata variables that jpm
likes.
jpm build
generating executable c source build/cat-v.c from main.janet...
compiling build/cat-v.c to build/build___cat-v.o...
linking build/cat-v...
So that actually produced a native binary that we can run and distribute just like any other executable:
build/cat-v <<<"the quick brown fox"
the alacritous brown fox
file build/cat-v
build/cat-v: Mach-O 64-bit executable arm64
otool -L build/cat-v
build/cat-v:
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)
du -h build/cat-v
684K build/cat-v
684 kibibytes is not small, for such a trivial program. Maybe we can make it smaller? Let’s see how it was compiled.
jpm build --verbose
Oh, nothing happened. That’s unfortunate.
Nothing happened because our input files didn’t change, and even though we asked for a verbose build, jpm
won’t actually perform a rebuild if the generated targets have a newer mtime
than the sources. So:
touch -m main.janet
And then:
jpm build --verbose
generating executable c source build/cat-v.c from main.janet...
compiling build/cat-v.c to build/build___cat-v.o...
cc -c build/cat-v.c -DJANET_BUILD_TYPE=release -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -O2 -o build/build___cat-v.o
linking build/cat-v...
cc -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -O2 -o build/cat-v build/build___cat-v.o /usr/local/lib/libjanet.a -lm -ldl -pthread
Annoyingly, jpm
has no way to bypass this mtime
check and force a rebuild, so we’ll just have to run jpm clean
before every build invocation when we’re tweaking the build settings.
But alright. We can see that it’s building with -O2
. Let’s try blindly passing -Os
and see if that makes any difference:
jpm clean; jpm build --verbose --cflags="-Os"
Deleted build directory build/
generating executable c source build/cat-v.c from main.janet...
compiling build/cat-v.c to build/build___cat-v.o...
cc -c build/cat-v.c -DJANET_BUILD_TYPE=release -Os -I/usr/local/include/janet -I/usr/local/lib/janet -O2 -o build/build___cat-v.o
linking build/cat-v...
cc -Os -I/usr/local/include/janet -I/usr/local/lib/janet -O2 -o build/cat-v build/build___cat-v.o /usr/local/lib/libjanet.a -lm -ldl -pthread
Oh dear. That wasn’t what we meant. We can see that specifying our own --cflags
got rid of the -std=c99
flag, but did not get rid of -O2
.
In fact there is no way to get rid of -O2
completely. jpm
will always pass an -O
level, and we control which optimization level by passing the --optimize
flag to jpm
:
jpm clean; jpm build --verbose --optimize=3
Deleted build directory build/
generating executable c source build/cat-v.c from main.janet...
compiling build/cat-v.c to build/build___cat-v.o...
cc -c build/cat-v.c -DJANET_BUILD_TYPE=release -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -O3 -o build/build___cat-v.o
linking build/cat-v...
cc -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -O3 -o build/cat-v build/build___cat-v.o /usr/local/lib/libjanet.a -lm -ldl -pthread
Note that jpm
’s argument parser is pretty conservative: we have to pass --optimize=3
; --optimize 3
will not work.
And in fact jpm
doesn’t have a way to build with -Os
:
jpm clean; jpm build --verbose --optimize=s
Deleted build directory build/
error: option :optimize, expected integer, got "s"
in errorf [boot.janet] (tailcall) on line 171, column 3
in setup [/usr/local/lib/janet/jpm/cli.janet] on line 50, column 26
in run [/usr/local/lib/janet/jpm/cli.janet] (tailcall) on line 84, column 15
in run-main [boot.janet] on line 3790, column 16
in cli-main [boot.janet] on line 3935, column 17
It can only build with -O0
through -O3
. Annoying.
But, fortunately, we don’t have to use jpm
to build this. We can build it ourselves.
(declare-project
:name "cat-v"
:description "cat --verbose"
:dependencies [])
(declare-executable
:name "cat-v"
:entry "main.janet"
:no-compile true)
Note that, although we can pass --cflags
and --optimize
as command line flags, we can’t pass --no-compile
. Why? No idea. Some subset of options are overridable from the command line, and are easily discoverable; others only exist in project.janet
, and you have to read the jpm
source to figure out what they are.
jpm clean; jpm build --verbose
Deleted build directory build/
generating executable c source build/cat-v.c from main.janet...
Okay. This created a very interesting file:
static const unsigned char bytes[] = {215, 0, 205, 0, 152, 0, 0, 7, 0, 0, 205, 127, 255, 255, 255, 12, 34, 206, 4, 109, 97, 105, 110, 206, 10, 109, 97, 105, 110, 46, 106, 97, 110, 101, 116, 216, 7, 111, 115, 47, 116, 105, 109, 101, 216, 8, 109, 97, 116, 104, 47, 114, 110, 103, 216, 5, 115, 116, 100, 105, 110, 208, 3, 97, 108, 108, 216, 9, 102, 105, 108, 101, 47, 114, 101, 97, 100, 206, 1, 32, 216, 12, 115, 116, 114, 105, 110, 103, 47, 115, 112, 108, 105, 116, 215, 0, 205, 0, 152, 0, 0, 10, 2, 2, 2, 7, 24, 206, 9, 118, 101, 114, 98, 111, 115, 105, 102, 121, 218, 2, 206, 5, 113, 117, 105, 99, 107, 210, 2, 0, 206, 10, 97, 108, 97, 99, 114, 105, 116, 111, 117, 115, 206, 11, 101, 120, 112, 101, 100, 105, 116, 105, 111, 117, 115, 206, 4, 108, 97, 122, 121, 210, 3, 0, 206, 8, 105, 110, 100, 111, 108, 101, 110, 116, 206, 13, 108, 97, 99, 107, 97, 100, 97, 105, 115, 105, 99, 97, 108, 206, 10, 108, 97, 110, 103, 117, 111, 114, 111, 117, 115, 206, 5, 106, 117, 109, 112, 115, 210, 1, 0, 206, 7, 103, 97, 109, 98, 111, 108, 115, 215, 0, 205, 0, 152, 0, 0, 6, 2, 2, 2, 1, 8, 206, 6, 99, 104, 111, 111, 115, 101, 218, 2, 216, 12, 109, 97, 116, 104, 47, 114, 110, 103, 45, 105, 110, 116, 44, 2, 0, 0, 61, 3, 1, 0, 48, 0, 3, 0, 42, 5, 0, 0, 51, 4, 5, 0, 25, 3, 4, 0, 56, 5, 1, 3, 3, 5, 0, 0, 1, 1, 1, 32, 0, 14, 0, 14, 0, 14, 0, 3, 1, 3, 0, 3, 44, 2, 0, 0, 42, 5, 0, 0, 35, 4, 1, 5, 28, 4, 3, 0, 42, 3, 1, 0, 26, 16, 0, 0, 42, 7, 2, 0, 35, 6, 1, 7, 28, 6, 3, 0, 42, 5, 3, 0, 26, 10, 0, 0, 42, 9, 4, 0, 35, 8, 1, 9, 28, 8, 3, 0, 42, 7, 5, 0, 26, 4, 0, 0, 47, 1, 0, 0, 67, 9, 0, 0, 25, 7, 9, 0, 25, 5, 7, 0, 25, 3, 5, 0, 48, 0, 3, 0, 42, 4, 6, 0, 52, 4, 0, 0, 5, 1, 2, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 4, 7, 0, 7, 191, 252, 5, 0, 5, 0, 5, 191, 255, 3, 0, 3, 0, 3, 216, 7, 112, 97, 114, 116, 105, 97, 108, 216, 3, 109, 97, 112, 216, 11, 115, 116, 114, 105, 110, 103, 47, 106, 111, 105, 110, 216, 4, 112, 114, 105, 110, 44, 0, 0, 0, 42, 2, 0, 0, 51, 1, 2, 0, 47, 1, 0, 0, 42, 3, 1, 0, 51, 2, 3, 0, 25, 1, 2, 0, 42, 3, 2, 0, 42, 4, 3, 0, 48, 3, 4, 0, 42, 5, 4, 0, 51, 4, 5, 0, 25, 3, 4, 0, 42, 4, 5, 0, 48, 4, 3, 0, 42, 5, 6, 0, 51, 4, 5, 0, 25, 3, 4, 0, 42, 4, 7, 0, 48, 4, 1, 0, 42, 5, 8, 0, 51, 4, 5, 0, 48, 4, 3, 0, 42, 6, 9, 0, 51, 5, 6, 0, 25, 3, 5, 0, 42, 4, 5, 0, 48, 3, 4, 0, 42, 5, 10, 0, 51, 4, 5, 0, 25, 3, 4, 0, 47, 3, 0, 0, 42, 4, 11, 0, 52, 4, 0, 0, 13, 1, 1, 22, 0, 22, 0, 12, 0, 12, 0, 12, 0, 3, 1, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3};
#include <janet.h>
const unsigned char *janet_payload_image_embed = bytes;
size_t janet_payload_image_embed_size = sizeof(bytes);
int main(int argc, const char **argv) {
#if defined(JANET_PRF)
uint8_t hash_key[JANET_HASH_KEY_SIZE + 1];
#ifdef JANET_REDUCED_OS
char *envvar = NULL;
#else
char *envvar = getenv("JANET_HASHSEED");
#endif
if (NULL != envvar) {
strncpy((char *) hash_key, envvar, sizeof(hash_key) - 1);
} else if (janet_cryptorand(hash_key, JANET_HASH_KEY_SIZE) != 0) {
fputs("unable to initialize janet PRF hash function.\n", stderr);
return 1;
}
janet_init_hash_key(hash_key);
#endif
janet_init();
/* Get core env */
JanetTable *env = janet_core_env(NULL);
JanetTable *lookup = janet_env_lookup(env);
JanetTable *temptab;
int handle = janet_gclock(); /* Unmarshal bytecode */
Janet marsh_out = janet_unmarshal(
janet_payload_image_embed,
janet_payload_image_embed_size,
0,
lookup,
NULL);
/* Verify the marshalled object is a function */
if (!janet_checktype(marsh_out, JANET_FUNCTION)) {
fprintf(stderr, "invalid bytecode image - expected function.");
return 1;
}
JanetFunction *jfunc = janet_unwrap_function(marsh_out);
/* Check arity */
janet_arity(argc, jfunc->def->min_arity, jfunc->def->max_arity);
/* Collect command line arguments */
JanetArray *args = janet_array(argc);
for (int i = 0; i < argc; i++) {
janet_array_push(args, janet_cstringv(argv[i]));
}
/* Create enviornment */
temptab = env;
janet_table_put(temptab, janet_ckeywordv("args"), janet_wrap_array(args));
janet_gcroot(janet_wrap_table(temptab));
/* Unlock GC */
janet_gcunlock(handle);
/* Run everything */
JanetFiber *fiber = janet_fiber(jfunc, 64, argc, argc ? args->data : NULL);
fiber->env = temptab;
#ifdef JANET_EV
janet_gcroot(janet_wrap_fiber(fiber));
janet_schedule(fiber, janet_wrap_nil());
janet_loop();
int status = janet_fiber_status(fiber);
janet_deinit();
return status;
#else
Janet out;
JanetSignal result = janet_continue(fiber, janet_wrap_nil(), &out);
if (result != JANET_SIGNAL_OK && result != JANET_SIGNAL_EVENT) {
janet_stacktrace(fiber, out);
janet_deinit();
return result;
}
janet_deinit();
return 0;
#endif
}
It’s short; I recommend just reading through it. We can notice a few things here:
First, jpm embeds the marshaled image of our Janet program as literal bytes inside this source file, which is smart and cool. Second, jpm only marshals the main function, not the entire environment of our program.
This is pretty shocking, and means that programs compiled in this way may behave differently than programs run directly with janet main.janet, or programs compiled with janet -c main.janet main.jimage and run later with janet -i main.jimage.
This is most likely to bite you if you’re using dynamic variables at compile time. For example, if you set *pretty-format*
during the compilation phase, your changes will just get silently thrown away when you jpm build
as a native binary.
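Here’s a sketch of the kind of thing that warning is about, using the real *pretty-format* dynamic variable:
# main.janet
(setdyn *pretty-format* "%p") # runs at the top level, i.e. at compile time

(defn main [&]
  # in a jpm-built executable only main gets marshaled, so the setdyn above
  # never happens at runtime and pp falls back to its default format
  (pp {:casualty "my formatting"}))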
Once we have that file, we can build it ourselves:
(declare-project
:name "cat-v"
:description "cat --verbose"
:dependencies [])
(declare-executable
:name "cat-v"
:entry "main.janet"
:no-compile true)
(task "compile" ["build"]
(shell "cc -c build/cat-v.c -DJANET_BUILD_TYPE=release -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -Os -o build/build___cat-v.o"))
(task "link" ["compile"]
(shell "cc -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -Os -o build/cat-v build/build___cat-v.o /usr/local/lib/libjanet.a -lm -ldl -pthread"))
jpm clean; jpm run link
Deleted build directory build/
generating executable c source build/cat-v.c from main.janet...
du -h build/cat-v
684K build/cat-v
Well, that made no difference, which isn’t surprising, since -Os
is basically identical to -O2
. But this was a farce anyway; I don’t really care about the binary size. In OCaml this would be, like, half a gig easy.
But we learned how to add custom build tasks to a project file. This isn’t really a good way to build a native project, because we’re hardcoding paths and compilers and options — it is less portable now — but it is a way to do it that might come in handy if you want to make a more complicated build process.
Also, we really should have written something like this:
(shell "cc"
"-c" "build/cat-v.c"
"-DJANET_BUILD_TYPE=release"
"-std=c99"
"-I/usr/local/include/janet"
"-I/usr/local/lib/janet"
"-Os"
"-o" "build/build___cat-v.o")
But jpm will politely split the first argument to shell for us.
It will literally just split it on spaces, though; it doesn’t know anything about shell quoting conventions.
Also, this isn’t really documented, so I don’t know how much you should depend on it. Reading the jpm source is basically the only way to figure out how to use it, and the source makes no guarantees about future stability.
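As a purely illustrative aside of my own (not from the jpm docs): since the splitting is literal, a quoted argument containing a space will not survive.
# fine: no individual argument contains a space
(shell "cc -c build/cat-v.c -o build/cat-v.o")

# not fine: this does not pass a single "my file.c" argument to cc;
# it passes the two tokens 'my and file.c', quote characters and all
(shell "cc -c 'my file.c'")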
Alright. That’s all we’re going to say about the first half of jpm: building projects. Now let’s talk about managing dependencies.
In the process of writing cat-v, we’ve implemented an extremely interesting and broadly useful function that we could factor into its own library.
(defn verbosify [rng word]
(choose rng
(case word
"quick" ["alacritous" "expeditious"]
"lazy" ["indolent" "lackadaisical" "languorous"]
"jumps" ["gambols"]
[word])))
Yep; that’s the one.
We can move this into its own directory, and package it as a project…
(defn- choose [rng selections]
(def index (math/rng-int rng (length selections)))
(in selections index))
(defn verbosify [rng word]
(choose rng
(case word
"quick" ["alacritous" "expeditious"]
"lazy" ["indolent" "lackadaisical" "languorous"]
"jumps" ["gambols"]
[word])))
(declare-project
:name "verbosify"
:description "a very useful library"
:dependencies [])
(declare-source
:source "verbosify.janet")
Note that we use declare-source instead of declare-executable, because this is a library. And now we just need to add this library as a dependency to our cat-v project…
Well, actually, we can’t quite yet. jpm only knows how to install dependencies from git repositories, so we’ll need to create one first:
git init
Initialized empty Git repository in /Users/ian/src/verbosify/.git/
git add .
git commit -m 'make a very useful library'
[master (root-commit) 0a4e386] make a very useful library
2 files changed, 18 insertions(+)
create mode 100644 project.janet
create mode 100644 verbosify.janet
Once that’s done, we can actually add the dependency to our cat-v project:
(declare-project
:name "cat-v"
:description "cat --verbose"
:dependencies ["file:///Users/ian/src/verbosify"])
(declare-executable
:name "cat-v"
:entry "main.janet")
We declare it as a file:// URL, because we don’t want to push this to some remote site just to pull it back. You could push it to some remote site and serve it over HTTP, and that’s what you’ll do for most normal dependencies. But when you’re developing libraries alongside your application, it can be convenient to reference the local path for a while.
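For reference, here’s a rough sketch of what cat-v’s entry point might look like once the dependency is installed. This is my own reconstruction for illustration, not the actual main.janet from earlier chapters:
(import verbosify)

(defn main [&]
  (def rng (math/rng (os/time)))
  (def words (string/split " " (string/trim (file/read stdin :all))))
  (print (string/join (map |(verbosify/verbosify rng $) words) " ")))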
Now we want to ask jpm to install our declared dependencies, which we can do with jpm deps. But jpm deps will actually install our dependencies to a global package repository, not to a “virtual environment” or “sandbox” or something specific to this project. To install to a local directory, we actually have to call jpm deps --local, or -l for short:
jpm deps -l
Initialized empty Git repository in /Users/ian/src/cat-v/jpm_tree/lib/.cache/git__file____Users_ian_src_verbosify/.git/
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (4/4), 536 bytes | 536.00 KiB/s, done.
From file:///Users/ian/src/verbosify
* [new branch] master -> origin/master
From file:///Users/ian/src/verbosify
* branch HEAD -> FETCH_HEAD
HEAD is now at 0a4e386 make a very useful library
generating /Users/ian/src/cat-v/jpm_tree/lib/.manifests/verbosify.jdn...
Installed as 'verbosify'.
copying verbosify.janet to /Users/ian/src/cat-v/jpm_tree/lib...
npm also has global and local modes; the difference is that with npm the default is to work on a local node_modules/ directory, and you have to explicitly call npm --global to interact with the global directory. In jpm, unfortunately, it’s the opposite.
Now this created a jpm_tree/ directory for us, which contains the following files:
tree jpm_tree
jpm_tree
├── bin
├── lib
│ └── verbosify.janet
└── man
Note that this does not put each dependency in its own directory like you would see in node_modules/. It just throws all of the declared source files in a single lib/ directory. We happened to name our source file verbosify.janet, following a Janet convention, but if we had chosen a generic name like main.janet or src/init.janet or something, we would be at risk of a conflict.
What happens if there’s a conflict? If you have multiple dependencies that declare a :source with the same name?
It happens to be the case at the moment that jpm copies files in the order that you declare them as :dependencies, so the last listed dependency wins. And this overwriting happens silently; there’s no warning or error about the conflict. Like I said… rough edges.
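As a purely hypothetical illustration (these packages and URLs are made up): if both of these dependencies declared a :source named init.janet, the second one listed would silently overwrite the first in jpm_tree/lib/.
(declare-project
  :name "cat-v"
  :description "cat --verbose"
  # if lib-a and lib-b both install a file named init.janet,
  # lib-b's copy is the one that ends up in jpm_tree/lib/
  :dependencies ["https://example.com/lib-a.git"
                 "https://example.com/lib-b.git"])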
This is weird! In most languages you put your source files in a directory called src/ or lib/ or something, but Janet libraries usually put them in a directory named after the project, because of the default way that jpm merges files from multiple projects together like this. But we can override that if we want, to divorce our internal directory structure from the final installation structure:
(declare-source
:source "src/init.janet"
:prefix "verbosify")
That will cause our clients to install our entry point as jpm_tree/lib/verbosify/init.janet.
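Clients don’t have to spell out the init.janet part, as far as I can tell: Janet’s default module search includes the project/init.janet pattern (you can see it in the failed-import search list below), so a plain import should still resolve:
(import verbosify)  # resolves to jpm_tree/lib/verbosify/init.janet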
Alright. Now if we want to run our cat-v program, we have to run it with jpm -l. Because if we just run janet:
janet main.janet
error: could not find module verbosify:
/usr/local/lib/janet/verbosify.jimage
/usr/local/lib/janet/verbosify.janet
/usr/local/lib/janet/verbosify/init.janet
/usr/local/lib/janet/verbosify.so
in require-1 [boot.janet] on line 2900, column 20
in import* [boot.janet] (tailcall) on line 2939, column 15
It will try to look in the global include path. Instead we want to run:
jpm -l janet main.janet <<<"quick brown fox"
alacritous brown fox
Note: jpm -l janet, not jpm janet -l. jpm janet -l means something else.
Or we could directly set the environment variable JANET_PATH, which is how Janet decides where to look for modules:
JANET_PATH=jpm_tree/lib janet main.janet <<<"quick brown fox"
expeditious brown fox
In fact that’s all jpm -l janet does. Well, that and adding jpm_tree/bin to our PATH:
SCRIPT='(each [k v] (-> (os/environ) pairs sort) (print k "=" v))'
$ diff -U0 -L janet <(janet -e $SCRIPT) -L jpm <(jpm -l janet -e $SCRIPT)
--- janet
+++ jpm
@@ -6,0 +7 @@
+JANET_PATH=/Users/ian/src/cat-v/jpm_tree/lib
@@ -19 +20 @@
-PATH=...
+PATH=/Users/ian/src/cat-v/jpm_tree/bin:...
@@ -41 +42 @@
-_=/usr/local/bin/janet
+_=/usr/local/bin/jpm
Alright. So now we have our app, and we have a dependency in a separate library. cat-v is starting to look like a real project.
But let’s expand its vocabulary a little. Let’s add a few more words to our verbosify function:
(defn- choose [rng selections]
(def index (math/rng-int rng (length selections)))
(in selections index))
(defn verbosify [rng word]
(choose rng
(case word
"quick" ["alacritous" "expeditious"]
"lazy" ["indolent" "lackadaisical" "languorous"]
"jumps" ["gambols"]
"dog" ["canine"]
"fox" ["vulpine"]
[word])))
And then we’ll update our dependencies…
jpm -l deps
From file:///Users/ian/src/verbosify
* branch HEAD -> FETCH_HEAD
HEAD is now at 0a4e386 make a very useful library
removing /Users/ian/src/cat-v/jpm_tree/lib/verbosify.janet
removing manifest /Users/ian/src/cat-v/jpm_tree/lib/.manifests/verbosify.jdn
Uninstalled.
generating /Users/ian/src/cat-v/jpm_tree/lib/.manifests/verbosify.jdn...
Installed as 'verbosify'.
copying verbosify.janet to /Users/ian/src/cat-v/jpm_tree/lib...
And test it out:
jpm -l janet main.janet <<<"the quick brown fox jumps over the lazy dog"
the expeditious brown fox gambols over the lackadaisical dog
And it didn’t work!
It didn’t work because our project has a dependency on a git repository, and we didn’t actually commit these changes yet. jpm just pulls the latest commit, and jpm has no idea that our working directory is dirty.
What can we do about this? Unfortunately there’s no right answer, but we have a few options:
- Commit our changes and run jpm -l deps again. I know you’re very good at rebasing and amending commits, but this is still pretty annoying.
- Symlink jpm_tree/lib/verbosify.janet to the actual source. This is a pretty good solution if you’re iterating quickly on something and you only have a single source file or directory to symlink. It lets you pick up changes to the source itself, but it won’t pick up changes to the project definition — if we add dependencies to the verbosify project, we’ll have to install them manually in the cat-v project. You’ll also have to remember to change the symlinks back once you’re done hacking, which is a little annoying.
- Write (use ../verbosify/verbosify) instead of involving jpm at all. This also works, but I think it’s a bit worse than the symlink approach. It has the same problem of not picking up transitive dependencies, and if you import your library from multiple source files, you’ll have to edit every single import. And since Janet can’t import absolute paths, that might mean referring to a different relative path every time.
- Run jpm install instead of jpm deps. Instead of pulling the verbosify code from our cat-v project, we could push the verbosify code into cat-v by setting a remote JANET_PATH:
cd ~/src/verbosify; JANET_PATH=../cat-v/jpm_tree/lib jpm install
This seems like it should work well, but jpm install actually won’t install any of verbosify’s dependencies. It just copies the source files. Even though installation clearly should include the dependencies necessary for the project to run. Oh well.
- Patch jpm to add support for local file paths. I mean, that’s probably the right answer. But we’re not going to do that right now. We have some more things to cover.
In real life, when we add dependencies, we’ll want to add dependencies on specific versions. For example, we don’t really want to say that we depend on verbosify, we want to say that we depend on verbosify-1.1.0. Otherwise future changes to the verbosify library could break our cat-v app, and we don’t want that.
But jpm doesn’t have a concept of semver or version constraints, nor does it have a package index of specific versions. All dependencies are git repos, and jpm only lets us specify version constraints in the form of a version or tag:
(declare-project
:name "cat-v"
:description "cat --verbose"
:dependencies [{:url "file:///Users/ian/src/verbosify"
:tag "v1.1.0"}])
Despite the name, :tag can either be the name of a tag or a revision hash. Or a branch name. Anything that you could ask git for, really.
Now, it’s always a good idea to depend on specific versions of our dependencies, but that might not be sufficient to ensure that our project’s dependencies are reproducible. Because even if we lock all of our dependencies to specific revisions, those libraries might have dependencies of their own, and they might not be as fastidious as we are about how they specify them.
Fortunately, jpm gives us a way to freeze all of a project’s transitive dependencies, by making a lockfile:
jpm -l make-lockfile
created lockfile.jdn
That will write down specific revisions not just for your immediate dependencies, but for the whole closure of transitive dependencies, ensuring that changes to random great-grand-dependencies won’t suddenly break our business-critical cat-v application.
Note that once we have a lockfile, we have to call jpm -l load-lockfile to install dependencies — jpm deps ignores the file.
jpm deps also ignores any lockfiles that it finds in a project’s dependencies, so if you’re publishing a library you can’t just create a lockfile and call it a day. You’ll have to specify explicit constraints in your project.janet file, or else the users of your library might wind up with invalid dependencies in the future.
Alright. I think that’s all that I have to say about modules and packages and jpm, but I’d like to close this chapter by talking a little bit about the Janet package ecosystem.
The Janet package ecosystem is… young.
There is no equivalent of the npm registry; packages are just Git repos floating around on the internet.
Except that there is a registry, sort of, of packages that you can install with short, abbreviated names. jpm install sqlite3, for example, will (globally) install a package called sqlite3, and that name registry has to live somewhere.
In fact, it lives in this file right here:
https://github.com/janet-lang/pkgs/blob/master/pkgs.janet
It’s not a long list!
But that’s just the packages that have special short names; there are plenty of other packages that never bothered to make a PR against that registry. It’s not an exhaustive list by any means.
If you’re looking for a third-party package, another way to find it is by searching the website Powered by Janet. It’s still a small ecosystem! But not quite as small as the official registry implies.
There is one package in particular that’s worth mentioning now: Spork.
Spork is a monolithic, first-party “contrib” module. It’s a bit of a grab bag of lots of things that don’t really fit into the standard library: Spork has a JSON parser, a code formatter, helper functions for working with generators, a UTF-8 parser that doesn’t actually conform to the UTF-8 specification… there are dozens of packages in Spork, of varying scope and quality.
Of course you can depend on the Spork mega-library and only use a small part of it, but by doing so you’re locked into a single version of Spork for all of its components. If you want to upgrade to a newer Spork to pick up changes to the spork/zip module, but you want to keep running an old version of spork/argparse, then… you can’t. Sorry.
This monolithicity is also annoying if you’re targeting WebAssembly, because Spork contains native modules that won’t build with Emscripten. So even if you just want to use some of the pure Janet parts of Spork, you can’t, because adding a dependency on Spork will break your build.
I don’t know why Spork exists as a single library, rather than a collection of many. If I had to guess I’d say that it’s because jpm doesn’t have many affordances for managing project dependencies easily. But I don’t know. You should be aware of Spork, but I would caution you to be wary of it.
Janet tables are very similar to JavaScript objects, and they fill the same roles in the language: you can use a table as a primitive associative data structure, or you can use tables to emulate “instances” of a “class.” As a matter of fact, Janet tables are so similar to JavaScript objects that I’m not going to try to explain them from first principles — instead, I’m just going to describe how they differ.
Remember that tables come in both mutable (“table”) and immutable (“struct”) varieties, but I’m going to use the term “table” loosely to mean “one of Janet’s associative data structures.” They’re exactly the same, apart from the mutability bit.
First off, the big one: keys of a JavaScript object must be strings, while keys of a Janet table can be any value. Well, almost any value. You can’t use NaN as a key, which makes perfect sense, because NaN is not equal to itself. And you can’t use nil as a key, because, as we saw in Chapter Six, returning nil is how next indicates that there aren’t any more keys.
However! Unlike JavaScript, and literally every other language except Lua, Janet does not let you store nil as a value of a table. It’s just not allowed:
repl:1:> {:foo 123 :bar nil}
{:foo 123}
It’s not an error! It’s just silently dropped.
This has some convenient side effects: it means that you can check if a key exists in a table by doing (nil? (foo :key)) — there is no explicit (has? foo :key) — and this plays nicely with if-let and when-let.
Although note that such a nil check doesn’t tell you if the key exists on that specific table or if it exists somewhere in its prototype chain — if you care about that, use (nil? (table/rawget foo :key)) instead.
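A quick illustration of both points, using a couple of tables I made up for the example:
(def defaults @{:port 8080})
(def config (table/setproto @{:host "localhost"} defaults))

# if-let treats nil as "absent", and a plain lookup walks the prototype chain...
(if-let [port (config :port)]
  (print "port: " port)         # prints "port: 8080"
  (print "no port configured"))

# ...while table/rawget only consults the table itself
(nil? (table/rawget config :port)) # -> true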
It also means that Janet doesn’t have a function to delete a key from a table. Instead, to remove a key, you set its value to nil:
repl:1:> (def foo @{:x 1})
@{:x 1}
repl:2:> (set (foo :x) nil)
nil
repl:3:> foo
@{}
So this is cute, I guess, but honestly this is one of my least favorite things about Janet. And I realize that distinguishing “key not found” from “key found but set to nil” is a problem that every dynamic language solves in a different way, and every approach has tradeoffs, and now is not the time to compare and contrast them or voice my opinions about the clearly correct solution (Python’s) (don’t @ me) so let’s move on to talking about prototypes.
Lightning refresher on prototypal inheritance: all tables have a magic, hidden field called a “prototype,” which is either nil or a pointer to another table. When you look up a key that doesn’t exist on a table, the table will check if its prototype has the key. And then its prototype will check its prototype if needed, and so on until it either finds the key or reaches the “root” prototype (a table that has a prototype of nil).
In practice — using JavaScript terminology — you usually use this to put “methods” (which are just functions!) on a “prototype” object, and then create “instances” that have that object-full-of-methods as their prototype. So all “instances” of a “class” share the same prototype, and when you type foo.bar() you’re (usually!) invoking the “method” bar that lives in the prototype of foo, not in foo itself.
Unlike JavaScript, the prototype of a table is not a secret hidden entry — there’s no equivalent of obj.__proto__. A table’s prototype is a completely separate field that you can only retrieve with the table/getproto or struct/getproto functions.
Also unlike JavaScript, tables in Janet have no default prototype, which means there are no methods common to all tables. Instead, common functionality that you would find on JavaScript’s Object.prototype exists as functions — so the Janet way to write {}.toString() is (string {}). This sidesteps a whole class of JavaScript problems, including everyone’s favorite (recently retired!) Object.prototype.hasOwnProperty.call(obj, 'key').
Janet tables have no equivalent of JavaScript object properties — there are no getters or setters in Janet, and table entries have no metadata like JavaScript’s enumerable flag. Instead, when you enumerate the keys of a table using next, you always enumerate the keys of that specific table, and not any of the keys from its prototype.
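A tiny demonstration of that last point (the proto and dog tables here are my own example):
(def proto @{:speak (fn [self] (print (self :sound)))})
(def dog (table/setproto @{:sound "woof"} proto))

(keys dog)    # -> @[:sound], the prototype's :speak key is not enumerated
(dog :speak)  # -> <function ...>, but lookup still finds it via the prototype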
Finally, we should talk about methods.
Like JavaScript, “methods” in Janet are just functions. Unlike JavaScript, Janet methods actually make sense. There is no secret, magic this argument that works completely differently than every other argument. this is conventionally spelled self in Janet, but it’s just a normal positional argument and you can call it whatever you want. Let’s take a look at a table with a “method”:
repl:1:> (def table @{:get-foo (fn [self] (self :_foo)) :_foo 123 })
@{:_foo 123 :get-foo <function 0x600003B62BE0>}
We can look up the “method” like any other key:
repl:2:> (table :get-foo)
<function 0x600003B62BE0>
And we can call the method, just like any other function:
repl:3:> ((table :get-foo))
error: <function 0x600003B62BE0> called with 0 arguments, expected 1
in _thunk [repl] (tailcall) on line 3, column 1
And of course this is an error, because :get-foo is a function that takes one argument. We have to pass it its self argument:
repl:4:> ((table :get-foo) table)
123
But it’s very cumbersome to repeat table like that, so Janet has a shorthand to invoke functions in one shot like this:
repl:5:> (:get-foo table)
123
When we “call” a keyword like this, it looks up the function on the table and then calls the function with the table as its first argument — plus any remaining arguments. So the following two lines are exactly equivalent:
(:method table x y)
((table :method) table x y)
So: now that we understand how tables work, let’s take a look at how we could actually use them to simulate a sort of object-oriented programming.
(def counter-prototype
@{:add (fn [self amount] (+= (self :_count) amount))
:increment (fn [self] (:add self 1))
:count (fn [self] (self :_count))})
(defn new-counter []
(table/setproto @{:_count 0} counter-prototype))
(def counter (new-counter))
(print (:count counter))
(:increment counter)
(print (:count counter))
(:add counter 3)
(print (:count counter))
janet counter.janet
0
1
4
Note that this is a little more verbose than it would look in JavaScript. We have to define the prototype explicitly, along with a separate constructor/“factory” function that’s in charge of hooking it up correctly. Prototypes and constructor functions aren’t bundled together in Janet like they are in JavaScript, although we could bundle them together if we wanted to:
(def Counter
(let [proto @{:add (fn [self amount] (+= (self :_count) amount))
:increment (fn [self] (:add self 1))
:count (fn [self] (self :_count))}]
(fn [] (table/setproto @{:_count 0} proto))))
(def counter (Counter))
(print (:count counter))
(:increment counter)
(print (:count counter))
(:add counter 3)
(print (:count counter))
Or we could make an explicit first-class class, distinct from the constructor:
(def Counter
{:proto @{:add (fn [self amount] (+= (self :_count) amount))
:increment (fn [self] (:add self 1))
:count (fn [self] (self :_count))}
:new (fn [self]
(table/setproto @{:_count 0} (self :proto)))})
(def counter (:new Counter))
(print (:count counter))
(:increment counter)
(print (:count counter))
(:add counter 3)
(print (:count counter))
Or we could write a macro that lets us write something like ES6’s class syntax:
(class Counter
constructor (fn [self] (set (self :_count) 0))
add (fn [self amount] (+= (self :_count) amount))
increment (fn [self] (:add self 1))
count (fn [self] (self :_count)))
(def counter (Counter))
(print (:count counter))
(:increment counter)
(print (:count counter))
(:add counter 3)
(print (:count counter))
We’ll talk about how we could write such a macro in Chapter Thirteen.
We can do anything we want! There are no rules here, and there aren’t even really idiomatic conventions for this sort of thing in Janet. Object-oriented programming just isn’t very common — it’s far more common to write modules full of functions than it is to write tables full of methods. But if you want to write in an object-oriented style, pick the style you like best — Janet only gives you the barest of building blocks to work with.
Now, why might we want to do any of this? What’s the point of object-oriented programming in the first place?
The point is polymorphism, which means defining different sorts of values that have the same interface. For example:
(defn print-all [readable]
(print (:read readable :all)))
(with [file (file/open "readme.txt")]
(print-all file))
(with [stream (net/connect "janet.guide" 80)]
(:write stream "GET / HTTP/1.1\r\n\r\n")
(print-all stream))
file/open returns a core/file abstract type, and net/connect returns a core/stream abstract type. But both of these types have a method called :close, so they both work with the with macro. with executes its body and then calls (:close file) or (:close stream), and it works on any value that has a :close method. You might call it a polymorphic macro, except that that term is weird and misleading and let’s not call it that. It’s just a regular macro that happens to expand to code that exploits runtime polymorphism.
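Since with only cares about the :close method, any table we cook up ourselves works too; here’s a throwaway example of my own:
(def fake-resource
  @{:close (fn [self] (print "cleaning up!"))})

(with [r fake-resource]
  (print "using the resource"))
# prints "using the resource", then "cleaning up!"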
Files and streams also both have a method called :read, so we can pass them to the polymorphic function print-all. print-all works with any value that has a :read method — files, streams, or even custom types that we define ourselves, be they tables or abstract types.
Tables and abstract types are the only polymorphic values in Janet. We can never add a :read method to a tuple, for instance, even if we really want to. And tables are pretty limited in what you can actually do with them: you can define whatever methods you want, but there aren’t very many functions in the Janet standard library that will try to call methods.
In fact the only built-in functions that you can overload with a method are the “math operator” functions:
repl:1:> (def addable @{:+ (fn [a b] (printf "adding %q %q" a b) 10)})
@{:+ <function 0x600002E4C0C0>}
repl:2:> (+ addable "foo")
adding @{:+ <function 0x600002E4C0C0>} "foo"
10
And the “bitwise operator” functions:
repl:1:> (bxor @{:^ (fn [a b] a)} nil)
@{:^ <function 0x600002E57960>}
And the “polymorphic compare” function called compare, which you can override with the :compare method:
repl:1:> (def compare-on-value (fn [a b] (compare (a :_value) (b :_value))))
<function 0x600000A4E260>
repl:2:> (def box-value (fn [value] @{:_value value :compare compare-on-value}))
<function 0x600000A584E0>
repl:3:> (compare (box-value 1) (box-value 2))
-1
repl:4:> (compare (box-value 2) (box-value 2))
0
repl:5:> (compare (box-value 3) (box-value 2))
1
But note that the normal comparison operators, like < and = and >=, do not use the polymorphic compare function.
repl:6:> (= (box-value 2) (box-value 2))
false
There are polymorphic versions of the standard comparators that you can use instead:
repl:7:> (compare= (box-value 2) (box-value 2))
true
Which are useful if you want to, for example, sort values like these, since the sort functions also do not use polymorphic comparison by default:
repl:8:> (sort @[(box-value 1) (box-value 2)])
@[@{:_value 2 :compare <function 0x600000A4E260>} @{:_value 1 :compare <function 0x600000A4E260>}]
repl:9:> (sort @[(box-value 1) (box-value 2)] compare<)
@[@{:_value 1 :compare <function 0x600000A4E260>} @{:_value 2 :compare <function 0x600000A4E260>}]
In fact the only built-in functions that use polymorphic compare are zero?, pos?, neg?, one?, even?, and odd?, for some reason.
But note that if you’re defining an abstract type, you can override the standard comparison functions — the polymorphic compare interface only matters for tables, and only for these few functions. It’s odd.
Back to operators: operators can also be overloaded in the “right-hand” direction:
repl:1:> (def right-addable @{:r+ (fn [a b] (printf "adding %q %q" a b) 10)})
@{:r+ <function 0x600002E57620>}
repl:2:> (+ "foo" right-addable)
adding @{:r+ <function 0x600002E57620>} "foo"
10
But, like, we’re entering the realm of Janet trivia now. In practice you will probably never make tables that override any of the default operators, because it just isn’t very useful. You could use it to implement something like a vector:
(def Point (do
(var new nil)
(def proto
{:+ (fn [{:x x1 :y y1} {:x x2 :y y2}]
(new (+ x1 x2) (+ y1 y2)))})
(set new (fn [x y]
(struct/with-proto proto :x x :y y)))))
(pp (+ (Point 1 2) (Point 3 4)))
Janet doesn’t hoist function definitions, so we have to “forward-declare” new as a variable and then re-assign it in order to close over it in the method.
Because structs are immutable, there is no struct/setproto. Instead, we use struct/with-proto to create a struct with a prototype.
janet points.janet
@{:x 4 :y 6}
Which is not useless, but it’s unlikely that we’d want to do this in practice — allocating a struct isn’t free; if we care about performance we’d probably write this as an abstract type and save a few bytes for every point. And if we don’t care about performance, it’s more convenient to just write something like (+ [1 2] [3 4]) and redefine + to work with that.
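I won’t actually shadow + here, but as a hedged sketch of that tuples-as-vectors alternative (the helper name vec+ is mine):
# element-wise addition of two equal-length tuples or arrays
(defn vec+ [a b]
  (map + a b))

(vec+ [1 2] [3 4]) # -> @[4 6]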
So, to recap: math operators, bitwise operators, and compare. Those are the only things that tables can override. Custom to-string? Nope. And if we’re trying to make our own data structure, we can’t overload the definition of length, nor, as we saw in Chapter Six, can we define a custom next. This limits the usefulness of tables-as-custom-types quite a bit, and in practice if we’re trying to make our own types, we’ll probably wind up writing abstract types instead. They’re much more flexible, and give us a lot more control over how our values work at runtime.
So we’ll talk about how to do that soon.
Actually, we’ll talk about how to do that right now.
One thing that I really wish Janet had is a native set type. I love sets; they’re often the right way to model a problem. But usually when I write Janet code that wants a set, I have to write a table instead, and just set all the keys to true:
repl:1:> (def cities-visited @{})
@{}
repl:3:> (set (cities-visited "NYC") true)
true
repl:4:> (set (cities-visited "LA") true)
true
repl:5:> (set (cities-visited "NYC") true)
true
repl:6:> cities-visited
@{"LA" true "NYC" true}
Which is okay! This is a pretty good way to hack up a set, but it’s not perfect. We can’t iterate over the elements with each; we have to use eachk, which means that other functions that act on iterable structures, like filter or map, won’t work on our “set.” This is… slightly annoying, but ultimately fine.
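To spell out that annoyance: each iterates a table’s values, which for our fake set are all just true; the elements only show up if you go through the keys.
(def cities-visited @{"NYC" true "LA" true})

(each x cities-visited (print x))        # prints true twice, the values
(eachk city cities-visited (print city)) # prints the elements themselves
(keys cities-visited)                    # -> the elements, as an array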
But! Wouldn’t it be nice if there were a proper, built-in set type that we could use? Don’t you think that would be useful? Just say yes; it’ll get the actual chapter started.
So, as we saw in the last chapter, we can’t implement a proper set as a pure Janet type, because we can’t override equality or iteration. We’re going to have to reach for some C code instead, which is actually quite easy. Writing native Janet modules in C isn’t some weird esoteric difficult thing; it’s a straightforward and even rather pleasant adventure.
In order to get started, we’ll need to declare a native module. We’ll start small:
(declare-project :name "set")
(declare-native
:name "set"
:source ["set.c"])
#include <janet.h>
#include <stdio.h>

static Janet cfun_hello(int32_t argc, Janet *argv) {
janet_fixarity(argc, 0);
printf("hello world\n");
return janet_wrap_nil();
}
static JanetReg cfuns[] = {
{"hello", cfun_hello, "(hello)\n\nprints hello"},
{NULL, NULL, NULL}
};
JANET_MODULE_ENTRY(JanetTable *env) {
janet_cfuns(env, "set", cfuns);
}
Sixteen lines! That’s all it takes to write a native module. I know some of the lines don’t make sense yet, but we’ll fix that soon. First, let’s see how easy it is to use this:
(import set)
(set/hello)
Now, we can’t just run this with janet main.janet. We’ll need to build and install the native module first:
jpm install --local --verbose
cc -c set.c -DJANET_BUILD_TYPE=release -std=c99 -I/usr/local/include/janet -I/Users/ian/src/janet-set/jpm_tree/lib -O2 -fPIC -o build/set.o
generating meta file build/set.meta.janet...
generating /Users/ian/src/janet-set/jpm_tree/lib/.manifests/set.jdn...
cc -std=c99 -I/usr/local/include/janet -I/Users/ian/src/janet-set/jpm_tree/lib -O2 -o build/set.so build/set.o -shared -undefined dynamic_lookup -lpthread
cc -c set.c -DJANET_BUILD_TYPE=release -DJANET_ENTRY_NAME=janet_module_entry_set -std=c99 -I/usr/local/include/janet -I/Users/ian/src/janet-set/jpm_tree/lib -O2 -o build/set.static.o
ar rcs build/set.a build/set.static.o
Installed as 'set'.
copying build/set.so to /Users/ian/src/janet-set/jpm_tree/lib/...
cp -rf build/set.so /Users/ian/src/janet-set/jpm_tree/lib/
copying build/set.meta.janet to /Users/ian/src/janet-set/jpm_tree/lib/...
cp -rf build/set.meta.janet /Users/ian/src/janet-set/jpm_tree/lib/
copying build/set.a to /Users/ian/src/janet-set/jpm_tree/lib/...
cp -rf build/set.a /Users/ian/src/janet-set/jpm_tree/lib/
Now we can run it:
jpm -l janet main.janet
hello world
And see that it worked.
But how did it work?
Well, if we look at what this actually installed, we can see three files:
ls -l jpm_tree/lib
total 120
-rw-r--r-- 1 ian 1.4K set.a
-rw-r--r-- 1 ian 136B set.meta.janet
-rwxr-xr-x 1 ian 49K set.so
We have a static library and a dynamic library, plus a file of build information.
If we import our library from the Janet repl, or from a script that we execute with the janet interpreter, we’ll dynamically link in the native module set.so. But if we ask jpm to compile a native executable, jpm will statically link in the set.a archive. The set.meta.janet file contains some information that jpm will use in order to statically link it properly:
# Metadata for static library set.a
{ :cpp false
:ldflags (quote nil)
:lflags (quote nil)
:static-entry "janet_module_entry_set"}
So when we run jpm -l janet main.janet, we load the dynamic library, and somehow that gives us a function called hello in Janet.
repl:1:> (use set)
@{_ @{:value <cycle 0>} hello @{:private true} :macro-lints @[]}
repl:2:> hello
<cfunction set/hello>
But how did that work?
Normally when we load a Janet module we get an environment, which is a regular Janet table. And that’s exactly what we get when we load a native module as well:
repl:2:> (require "set")
@{hello @{:value <cfunction set/hello>} :native "/Users/ian/src/janet-set/jpm_tree/lib/set.so"}
But where did that table come from? Is there some marshaled environment table hiding in the dynamic library that we built?
Well, no. It’s simpler than that. Let’s take a closer look at the C code that we wrote:
#include <janet.h>
#include <stdio.h>

static Janet cfun_hello(int32_t argc, Janet *argv) {
janet_fixarity(argc, 0);
printf("hello world\n");
return janet_wrap_nil();
}
static JanetReg cfuns[] = {
{"hello", cfun_hello, "(hello)\n\nprints hello"},
{NULL, NULL, NULL}
};
JANET_MODULE_ENTRY(JanetTable *env) {
janet_cfuns(env, "set", cfuns);
}
JANET_MODULE_ENTRY is a macro, but we can expand it to something like this:
# 2 "set.c" 2
static Janet cfun_hello(int32_t argc, Janet *argv) {
janet_fixarity(argc, 0);
printf("hello world\n");
return janet_wrap_nil();
}
static JanetReg cfuns[] = {
{"hello", cfun_hello, "(hello)\n\nprints hello"},
{((void*)0), ((void*)0), ((void*)0)}
};
__attribute__((visibility ("default"))) JanetBuildConfig _janet_mod_config(void) { return ((JanetBuildConfig){ 1, 23, 1, (0 | 0) }); } __attribute__((visibility ("default"))) void _janet_init(JanetTable *env) {
janet_cfuns(env, "set", cfuns);
}
If you peer past the __attribute__ annotations, you can see that the JANET_MODULE_ENTRY macro defined two functions:
JanetBuildConfig _janet_mod_config(void) {
return ((JanetBuildConfig){ 1, 27, 0, (0 | 0) });
}
void _janet_init(JanetTable *env) {
janet_cfuns(env, "set", cfuns);
}
_janet_mod_config is a function that returns the current version of Janet — when Janet dynamically loads a native module, it will first check to make sure that it was compiled with the same version of Janet.
_janet_init is the interesting bit, though. It doesn’t actually return an environment table, but instead takes a freshly allocated table as an input and mutates it, installing all of the environment entries for our module.
You’ll typically do this with the janet_cfuns helper, which is a function that iterates over a null-terminated array of JanetReg structs:
struct JanetReg {
const char *name;
JanetCFunction cfun;
const char *documentation;
};
And installs them into the environment table, boxing the raw C function pointers into Janet cfunction values.
But we could do other things in this function. We could execute arbitrary code to compute an environment. This function is a bit like top-level statements in a regular Janet file, except instead of running at compile time, it runs when the native module is loaded, so there is a runtime cost that we will pay either when the module is dynamically linked in, or on program startup if it’s statically linked.
Just for fun, let’s compute something, and put it in the environment table manually:
JANET_MODULE_ENTRY(JanetTable *env) {
janet_cfuns(env, "set", cfuns);
janet_def(env, "answer", janet_wrap_integer(42), "the answer");
}
jpm -l janet -e '(import set) (print set/answer)'
42
Very original.
But alright, we’re probably not going to do that very often. For the most part we’re just going to define C functions that we can call from Janet code.
So let’s talk about that:
static Janet cfun_hello(int32_t argc, Janet *argv) {
janet_fixarity(argc, 0);
printf("hello world\n");
return janet_wrap_nil();
}
We’re going to see a lot of Janets today, and I think it’s helpful to understand what the type actually is. A Janet is a small value — on x86 and x64 architectures, it’s implemented as a single word using a technique called “NaN boxing.” On other architectures it’s implemented as a tagged union consisting of a one-byte enum plus eight payload bytes. So on my computer with my C compiler, a Janet is two 64-bit words long: the first is a tag value; the second is either a pointer or a double or a boolean integer.
The point is that we can pass Janet values around quite cheaply — copying a Janet is never going to copy an entire giant data structure; it will only copy a pointer to the data structure, even when we’re dealing with immutable tuples or structs.
So, coming back to cfun_hello, you can see that it takes an array of Janets and returns a Janet — specifically janet_wrap_nil(), which is the slightly verbose way that you write nil in the C API.
Our function doesn’t actually take arguments, so we assert that the user didn’t pass us any. Unless you’re writing a fully variadic function, you should start all cfunctions with an arity assertion like this. They come in a few flavors:
// Exactly two arguments:
janet_fixarity(argc, 2);
// One, two, or three arguments:
janet_arity(argc, 1, 3);
// At least two arguments:
janet_arity(argc, 2, -1);
I think that’s about all I can say about cfun_hello, so let’s move on to something real.
We’re trying to make a set type, so let’s write down a set/new function. I want it to work like this:
repl:1:> (set/new 1 2 3)
<set {1 2 3}>
So it will be a fully variadic function that returns… well, an abstract type, of course.
An abstract type is just a record that contains a name and a bunch of function pointers:
struct JanetAbstractType {
const char *name;
int (*gc)(void *data, size_t len);
int (*gcmark)(void *data, size_t len);
int (*get)(void *data, Janet key, Janet *out);
void (*put)(void *data, Janet key, Janet value);
void (*marshal)(void *p, JanetMarshalContext *ctx);
void *(*unmarshal)(JanetMarshalContext *ctx);
void (*tostring)(void *p, JanetBuffer *buffer);
int (*compare)(void *lhs, void *rhs);
int32_t (*hash)(void *p, size_t len);
Janet(*next)(void *p, Janet key);
Janet(*call)(void *p, int32_t argc, Janet *argv);
size_t (*length)(void *p, size_t len);
JanetByteView(*bytes)(void *p, size_t len);
};
The function pointers allow us to override different built-in bits of Janet functionality — you can probably guess what most of those do just from their names. Except, er, yeah, I don’t love to read function pointer signatures like that. This is a lot easier for me to read:
int gc(void *data, size_t len);
int gcmark(void *data, size_t len);
int get(void *data, Janet key, Janet *out);
void put(void *data, Janet key, Janet value);
void marshal(void *p, JanetMarshalContext *ctx);
void *unmarshal(JanetMarshalContext *ctx);
void tostring(void *p, JanetBuffer *buffer);
int compare(void *lhs, void *rhs);
int32_t hash(void *p, size_t len);
Janet next(void *p, Janet key);
Janet call(void *p, int32_t argc, Janet *argv);
Now there are lots of ways that we could choose to represent a set, but for now I am going to cheat a little: I’m going to implement a set as a regular Janet table, but wrapped in an abstract type. That way we won’t need to worry about actually implementing the data structure, and we can just focus on the abstract interface.
So here’s how I’m going to write my set/new function:
static JanetTable *new_abstract_set(int32_t capacity) {
JanetTable *set = (JanetTable *)janet_abstract(&set_type, sizeof(JanetTable));
set->gc = (JanetGCObject){0, NULL};
janet_table_init_raw(set, capacity);
return set;
}
static Janet cfun_new(int32_t argc, Janet *argv) {
JanetTable *set = new_abstract_set(argc);
for (int32_t i = 0; i < argc; i++) {
janet_table_put(set, argv[i], janet_wrap_true());
}
return janet_wrap_abstract(set);
}
We’re allocating memory for a JanetTable struct using the janet_abstract function, then we’re doing something weird and JanetGCObject-related, and then we’re calling janet_table_init_raw. Everything after that is pretty straightforward.
The GC thing is only necessary because of the very weird thing that we’re doing of wrapping a JanetTable as an abstract type. I do want to explain this code, but understanding it requires understanding a few Janet implementation details that you absolutely do not need to understand to write any “normal” C functions. So consider the next aside to be completely optional reading.
Let’s start from the top: when Janet creates new values, which it does in functions like janet_table and janet_abstract, it calls a (private!) function called janet_gcalloc to actually allocate that memory. janet_gcalloc is a simple function that basically just calls malloc and then sets a few fields at the beginning of the allocated memory.
Specifically, janet_gcalloc assumes that the thing that it’s allocating is a (C!) struct that begins with a field of type JanetGCObject, which all Janet values do. For example, here’s what a table looks like:
struct JanetTable {
JanetGCObject gc;
int32_t count;
int32_t capacity;
int32_t deleted;
JanetKV *data;
JanetTable *proto;
};
Here’s a tuple:
struct JanetTupleHead {
JanetGCObject gc;
int32_t length;
int32_t hash;
int32_t sm_line;
int32_t sm_column;
const Janet data[];
};
And here’s an abstract type:
struct JanetAbstractHead {
JanetGCObject gc;
const JanetAbstractType *type;
size_t size;
long long data[];
};
When we allocate an abstract value with janet_abstract, we call janet_gcalloc with a size large enough to hold the JanetAbstractHead and the actual data. There’s no pointer indirection; the backing memory for our type is allocated contiguously after the header.
So when we allocate an abstract type that contains a JanetTable struct, we’re basically allocating a struct that looks like this:
struct JanetAbstractTable {
JanetGCObject gc;
const JanetAbstractType *type;
size_t size;
JanetTable data;
};
Which is morally equivalent to writing something like:
struct JanetAbstractTable {
JanetGCObject gc;
const JanetAbstractType *type;
size_t size;
JanetGCObject table_gc;
int32_t table_count;
int32_t table_capacity;
int32_t table_deleted;
JanetKV *table_data;
JanetTable *table_proto;
};
But note the weirdness: we now have two JanetGCObject values. One in the abstract header, and one in the table itself. janet_table_init_raw initializes all of the other fields of a table, but it doesn’t set the gc field — it assumes that janet_gcalloc already took care of that.
So if we don’t do anything here, our data.gc field will be uninitialized. Which doesn’t really matter yet, because nothing is trying to read that value: this table was not directly allocated with janet_gcalloc, so the Janet garbage collector doesn’t even know that it exists.
But it will start to matter soon, when we try to free the table, because when we free the table it’s going to look in its JanetGCObject struct to decide exactly how it needs to free memory.
This is because Janet has a special kind of “fast temporary table” that uses a separate allocator, and the flags field of the JanetGCObject stores whether the table was created as a regular table or a temporary local table. When freeing a table, Janet has to determine if it’s freeing it from the regular heap or from this Janet “scratch space.”
So it’s very important that we initialize the garbage collector field, and it just so happens that 0 is the correct value to initialize it to. Which is good, because the other possible garbage collector flag values that we could set are not actually exposed as part of the Janet header… so we’re relying on an implementation detail here. Like I said: the thing we’re doing is weird.
Okay! With that out of the way, we still have to define the abstract type itself — the set_type value that I referenced in cfun_new. I’m going to define it like this for now:
static const JanetAbstractType set_type = {
.name = "set",
.gc = set_gc,
.gcmark = set_gcmark,
.get = NULL,
.put = NULL,
.marshal = NULL,
.unmarshal = NULL,
.tostring = set_tostring,
.compare = NULL,
.hash = NULL,
.next = NULL,
.call = NULL,
.length = NULL,
.bytes = NULL,
};
We’ll add some more functions later on, but we’re starting with the absolute basics.
set_gc is the function that Janet will call in order to garbage collect our value; it’s the deinitializer or destructor or whatever you want to call it for this abstract type. Our implementation is very simple:
static int set_gc(void *data, size_t len) {
(void) len;
janet_table_deinit((JanetTable *)data);
return 0;
}
Even though len is an argument to this function, we don’t need to free the memory that we allocated for this abstract value; the garbage collector will do that for us. We only need to free any memory that we allocated in addition to that. The len argument is there just in case we allocated memory proportional to our original size — it saves us from having to store that length separately.
In that same vein, janet_table_deinit won’t free the actual JanetTable struct, only the memory that it allocated itself (the hash buckets). In general, if you’re writing an abstract type that doesn’t dynamically allocate any additional memory, you can just set .gc = NULL.
Finally, we return 0 to indicate success, like a process exit code.
Next we need to implement set_gcmark. Again, this is very simple:
static int set_gcmark(void *data, size_t len) {
(void) len;
janet_mark(janet_wrap_table((JanetTable *)data));
return 0;
}
In general set_gcmark should loop over any janet_gcalloc-ated Janet values that this abstract type knows about and call janet_mark on them. If the abstract type you’re defining isn’t some kind of container, you probably don’t need to implement this function at all.
Note that we don’t actually need to mark the table itself here, as long as we mark all of the values in the table: the table is not known to the garbage collector, as we allocated it as part of our abstract type. But there’s no harm in doing so, and calling janet_mark on the whole table is a very convenient way to recursively mark all of the table’s keys and values.
Once again, return 0; means that we didn’t encounter any error while trying to mark.
And now we’re basically done! There’s only one more function to implement: set_tostring:
static void set_tostring(void *data, JanetBuffer *buffer) {
JanetTable *set = (JanetTable *)data;
janet_buffer_push_cstring(buffer, "{");
int first = 1;
for (int32_t i = 0; i < set->capacity; i++) {
JanetKV *entry = &set->data[i];
if (janet_checktype(entry->key, JANET_NIL)) {
continue;
}
if (first) {
first = 0;
} else {
janet_buffer_push_cstring(buffer, " ");
}
janet_pretty(buffer, 0, 0, entry->key);
}
janet_buffer_push_cstring(buffer, "}");
}
Amusingly, this is the most complicated function of all, and it’s just a stupid printer.
Note that because hash tables store keys in a sparse array, we can’t just iterate over the values directly, and instead we have to iterate over every bucket in the table’s capacity and skip over the empty ones.
Finally, we have to register the abstract type we declared. This just means adding one additional line to our _janet_init function:
JANET_MODULE_ENTRY(JanetTable *env) {
janet_cfuns(env, "set", cfuns);
janet_register_abstract_type(&set_type);
}
And now we are actually done. Just to recap, the code in its entirety looks like this:
#include <janet.h>

static int set_gc(void *data, size_t len) {
(void) len;
janet_table_deinit((JanetTable *)data);
return 0;
}
static int set_gcmark(void *data, size_t len) {
(void) len;
janet_mark(janet_wrap_table((JanetTable *)data));
return 0;
}
static void set_tostring(void *data, JanetBuffer *buffer) {
JanetTable *set = (JanetTable *)data;
janet_buffer_push_cstring(buffer, "{");
int first = 1;
for (int32_t i = 0; i < set->capacity; i++) {
JanetKV *entry = &set->data[i];
if (janet_checktype(entry->key, JANET_NIL)) {
continue;
}
if (first) {
first = 0;
} else {
janet_buffer_push_cstring(buffer, " ");
}
janet_pretty(buffer, 0, 0, entry->key);
}
janet_buffer_push_cstring(buffer, "}");
}
static const JanetAbstractType set_type = {
.name = "set",
.gc = set_gc,
.gcmark = set_gcmark,
.get = NULL,
.put = NULL,
.marshal = NULL,
.unmarshal = NULL,
.tostring = set_tostring,
.compare = NULL,
.hash = NULL,
.next = NULL,
.call = NULL,
.length = NULL,
.bytes = NULL,
};
static JanetTable *new_abstract_set(int32_t capacity) {
JanetTable *set = (JanetTable *)janet_abstract(&set_type, sizeof(JanetTable));
set->gc = (JanetGCObject){0, NULL};
janet_table_init_raw(set, capacity);
return set;
}
static Janet cfun_new(int32_t argc, Janet *argv) {
JanetTable *set = new_abstract_set(argc);
for (int32_t i = 0; i < argc; i++) {
janet_table_put(set, argv[i], janet_wrap_true());
}
return janet_wrap_abstract(set);
}
static const JanetReg cfuns[] = {
{"new", cfun_new, "(set/new & xs)\n\n"
"Returns a set containing only this function's arguments."},
{NULL, NULL, NULL}
};
JANET_MODULE_ENTRY(JanetTable *env) {
janet_cfuns(env, "set", cfuns);
janet_register_abstract_type(&set_type);
}
It’s a lot of code when we look at it all at once, but each individual piece is pretty simple. And now we finally get to test it out!
repl:1:> (set/new 1 2 3)
<set {1 2 3}>
Hooray! We wrote an extremely, umm, useless set type.
But we can try to make it more useful.
The first thing I want to do is to make it enumerable: I want to be able to loop over it with a normal each loop.
So recall, from Chapter Six, the iteration protocol: we need to implement a function called next that will return the next key in the structure, and we need to implement a function called get that will return a value for that key. So in order to implement this, we have to decide: what are the “keys” of a set?
One idea is to have next iterate over the keys in our underlying table — the elements in our set — and when we call get, to just return the element itself if it exists or nil if it doesn’t. Since nil cannot be a key of a table (and thus cannot be an element in our set), this happens to work nicely.
This is perfectly reasonable, and we could choose to implement next this way, but there’s a small problem with this approach: if we allow arbitrary values to be our keys, we can no longer have our abstract type respond to methods.
Of course it’s just fine to say “who cares” and not implement any methods for our set, but that wouldn’t be any fun. I think it would be nice to overload + to mean “union” and - to mean “difference” and so on, and in order to overload operators like that we’ll need to implement methods.
Implementing a method is pretty simple:
static Janet cfun_union(int32_t argc, Janet *argv);
static const JanetMethod set_methods[] = {
{"+", cfun_union},
{NULL, NULL}
};
static int set_get(void *data, Janet key, Janet *out) {
if (!janet_checktype(key, JANET_KEYWORD)) {
return 0;
}
return janet_getmethod(janet_unwrap_keyword(key), set_methods, out);
}
static const JanetAbstractType set_type = {
.name = "set",
.gc = set_gc,
.gcmark = set_gcmark,
.get = set_get,
.put = NULL,
.marshal = NULL,
.unmarshal = NULL,
.tostring = set_tostring,
.compare = NULL,
.hash = NULL,
.next = NULL,
.call = NULL,
.length = NULL,
.bytes = NULL,
};
set_get returns an int, but in this case it’s a boolean, not an “exit code.” So 0 means that the key was not found, while anything else means that it was. The janet_getmethod helper makes it easy to implement this; it does a linear scan through a NULL-terminated array of methods and “returns,” via the out parameter, the first one with a matching name.
We’ll need to reference set_type from the cfun_union implementation, so I forward declare it to implement later on:
static Janet cfun_union(int32_t argc, Janet *argv) {
JanetTable *result = new_abstract_set(argc);
for (int32_t arg_ix = 0; arg_ix < argc; arg_ix++) {
JanetTable *arg = (JanetTable *)janet_getabstract(argv, arg_ix, &set_type);
for (int32_t bucket_ix = 0; bucket_ix < arg->capacity; bucket_ix++) {
JanetKV *entry = &arg->data[bucket_ix];
if (janet_checktype(entry->key, JANET_NIL)) {
continue;
}
janet_table_put(result, entry->key, janet_wrap_true());
}
}
return janet_wrap_abstract(result);
}
And now we can recompile this and test it out:
repl:1:> (+ (set/new))
<set {}>
repl:2:> (+ (set/new 1 2 3) (set/new 2 3 4))
<set {1 2 3 4}>
Perfect.
Since there is only one get function, and it has to work for both methods and values, there’s a restriction on what we can use as keys for our next implementation — if we allow arbitrary keywords to be keys, we won’t be able to implement any methods.
So instead let me propose a different key: the index of the hash bucket. It’ll be very fast to look it up, and there’s no chance that we’ll confuse it with a method name… but it will be completely meaningless if we mutate the underlying table during iteration, so we’ll have to make sure not to do that.
This key makes for a very simple implementation of next:
static Janet set_next(void *data, Janet key) {
int32_t previous_index;
if (janet_checktype(key, JANET_NIL)) {
previous_index = -1;
} else if (janet_checkint(key)) {
previous_index = janet_unwrap_integer(key);
if (previous_index < 0) {
janet_panicf("set key %v cannot be negative", key);
}
} else {
janet_panicf("set key %v must be an integer", key);
}
JanetTable *set = (JanetTable *)data;
for (int32_t i = previous_index + 1; i < set->capacity; i++) {
if (!janet_checktype(set->data[i].key, JANET_NIL)) {
return janet_wrap_integer(i);
}
}
return janet_wrap_nil();
}
We’re iterating over the buckets in the hash table again, just like we did for set_tostring, but this time we return the index of the first or next “full” bucket that we find.
That’s already sufficient to do something:
repl:1:> (eachk x (set/new 1 2 3 4 5) (print x))
0
1
2
4
8
nil
repl:2:> (eachk x (set/new 1 "two" :three 'four) (print x))
0
1
5
7
nil
The indexes themselves sort of leak some implementation details of Janet tables, but we’re going to treat them as completely opaque values. A set isn’t actually an indexed structure at all, and using eachk or eachp with a set is like using it with a generator — the keys just aren’t meaningful.
So let’s extend our get implementation to work with these keys, so that we can actually support useful iteration:
static int set_get(void *data, Janet key, Janet *out) {
if (janet_checkint(key)) {
JanetTable *set = (JanetTable *)data;
int32_t index = janet_unwrap_integer(key);
if (index < 0 || index >= set->capacity) {
janet_panicf("set key %v out of bounds (did you mutate during iteration?)", key);
}
Janet element = set->data[index].key;
if (janet_checktype(element, JANET_NIL)) {
janet_panicf("set key %v not found (did you mutate during iteration?)", key);
}
*out = element;
return 1;
} else if (janet_checktype(key, JANET_KEYWORD)) {
return janet_getmethod(janet_unwrap_keyword(key), set_methods, out);
} else {
return 0;
}
}
We could skip the assertions and return “key not found” for invalid keys, but if you’re ever indexing a set with an invalid key, something has gone horribly wrong, and I think it’s better to fail early.
And now, at long last, we actually have a useful set:
repl:1:> (each x (set/new 1 3 1 2 2 1 1) (print x))
1
3
2
nil
repl:2:> (map |(* $ 2) (set/new 1 2 3 4 5))
@[2 4 8 10 6]
I mean, for some definition of useful. We haven’t actually done anything set-related yet, and as the code stands right now we can’t even check for membership. So we still have a little ways to go.
One thing that was nice about our actual-table-as-a-set approach is that we could do a membership check by “invoking” the set:
repl:1:> (def cities-visited @{"LA" true "NYC" true})
@{"LA" true "NYC" true}
repl:2:> (cities-visited "LA")
true
repl:2:> (cities-visited "Pittsburgh")
nil
I mean, true/nil is janky as heck, but it works in most situations.
It would be nice to replicate this with our set type, but by default when we “invoke” an abstract type, it’s the same as calling get:
repl:1:> (def cities-visited (set/new "NYC" "LA"))
<set {"LA" "NYC"}>
repl:2:> (cities-visited "NYC")
error: key "NYC" not found in <set {"LA" "NYC"}>
in _thunk [repl] (tailcall) on line 2, column 1
repl:3:> (cities-visited :length)
<cfunction 0x000104443DBC>
But we can change this by implementing a custom call function:
static Janet set_call(void *data, int32_t argc, Janet *argv) {
janet_fixarity(argc, 1);
JanetTable *set = (JanetTable *)data;
Janet value = janet_table_get(set, argv[0]);
int key_found = !janet_checktype(value, JANET_NIL);
return janet_wrap_boolean(key_found);
}
And now if we “invoke” our set, it will perform a membership test instead:
repl:1:> (def cities-visited (set/new "NYC" "LA"))
<set {"LA" "NYC"}>
repl:2:> (cities-visited "NYC")
true
repl:3:> (cities-visited "Pittsburgh")
false
We now have a pretty thorough set implementation! We’ve put almost everything we can into the abstract type:
static const JanetAbstractType set_type = {
.name = "set",
.gc = set_gc,
.gcmark = set_gcmark,
.get = set_get,
.put = NULL,
.marshal = NULL,
.unmarshal = NULL,
.tostring = set_tostring,
.compare = NULL,
.hash = NULL,
.next = set_next,
.call = set_call,
.length = NULL,
.bytes = NULL,
};
length is trivial; let’s go ahead and knock that one out:
static size_t set_length(void *data, size_t len) {
(void) len;
JanetTable *set = (JanetTable *)data;
return set->count;
}
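As a quick sanity check, here’s a sketch of what that buys us on the Janet side, assuming .length gets flipped from NULL to set_length in the abstract type definition and the module is rebuilt:
(def s (set/new 1 2 2 3))
(print (length s))   # 3: duplicate elements only count once
(print (empty? s))   # false; empty? is defined in terms of length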
We won’t implement bytes; bytes only makes sense for abstract types that are string-like or buffer-like. It’s supposed to return a slice of contiguous bytes, and we don’t have any of those.
If we were making an immutable set, we’d want to implement custom compare and hash functions to make sure that two sets with the same elements are equal to one another and hash to the same value. But for the sake of simplicity, let’s say that we only care about writing a mutable set, and we can just use the default pointer equality when we compare two sets.
Note that if you are implementing a mutable type, there’s no way to overload the behavior of Janet’s deep= function. You’ll have to implement a separate function if you want to support deep equality.
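For what it’s worth, once length, iteration, and the call-as-membership-test behavior are all in place, that separate helper only needs to be a couple of lines of Janet. This is just a sketch of the idea, not part of the actual library:
# two sets are equal if they're the same size and every element
# of one is a member of the other
(defn set= [a b]
  (and (= (length a) (length b))
       (all |(b $) a)))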
I don’t know if we should implement put; it seems a bit weird to have an asymmetry between the keys we use for get and put. But it might be convenient to be able to write (set (cities-visited "NYC") true) to add keys and (set (cities-visited "NYC") false) to remove them. In any case, we won’t learn anything new by doing that, so let’s skip it for now.
Which just leaves us with the marshaling functions.
Recall from Chapter Two that marshaling means serializing a Janet data structure into a sequence of bytes. When we “compile” a Janet program, we compute the program’s environment table and then marshal that table to disk.
But in order to do that, all of the values in the environment table have to be marshalable. And right now, if we write a simple program like this, we actually have two values in our environment that are not marshalable:
(import set)
(def numbers (set/new 1 2 3 4 5))
(defn main [&]
(each number numbers
(print number)))
The first is, obviously, the set called numbers. The second is more subtle: it’s the cfunction called set/new that we imported.
jpm -l janet -c main.janet main.jimage
error: no registry value and cannot marshal <cfunction set/new>
in marshal [src/core/marsh.c] on line 1480
in make-image [boot.janet] on line 2637, column 3
in c-switch [boot.janet] (tailcall) on line 3873, column 36
in cli-main [boot.janet] on line 3909, column 13
Now, there’s not really any way for Janet to marshal cfunctions. You might imagine some kind of serialization of the actual machine code of the native function… but no. That wouldn’t work, and even if it did it wouldn’t be portable.
But it is possible for Janet to safely skip marshaling certain cfunctions. If Janet knows that the cfunction is going to be present in the environment that unmarshals this image, it can just write down an identifier for the cfunction and trust that the unmarshaling code will know how to match that identifier up to the actual correct cfunction later on. (This is the “registry value” that the error message is referring to.)
This is why you can have built-in cfunctions in your environment and marshal them just fine. You won’t get an error trying to marshal this file’s environment:
(def is-integer? int?)
Even though int? is a cfunction.
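You can poke at the registry trick from Janet itself. This is only a sketch, but it uses the same make-image-dict and load-image-dict lookup tables that make-image and load-image use under the hood:
# int? marshals to a registry name rather than to machine code,
# so it survives a round trip through an image
(def img (marshal int? make-image-dict))
(def restored (unmarshal img load-image-dict))
(print (= restored int?)) # true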
Now, when we use jpm to build an executable that references our native set library, jpm will take care of automatically skipping all of the cfunctions exposed by any native modules that the executable depends on, so we generally won’t have to think about this at all.
And we could manually alter the make-image-dict so that we could marshal a “raw” image without using jpm to compile it all the way to a native executable, and then very carefully unmarshal that image later… but we’re not going to do that.
But even if we use jpm to compile this, we still won’t be able to marshal the resulting image, because our set is not marshalable yet. But let’s try it anyway. I’m going to add an executable with a native dependency:
(declare-project :name "set")
(def native-module
(declare-native
:name "set"
:source ["set.c"]))
(declare-executable
:name "main"
:entry "main.janet"
:deps [(native-module :static)])
And then I’m going to alter the way that we import the native module:
(import /build/set)
(def cities-visited (set/new "NYC" "LA"))
(defn main [&]
(print (cities-visited "LA")))
Because we’re depending on a native library declared in the same project, we can’t install it before we run this script, so we can’t (import set) anymore. Instead we have to (import /build/set). This is the real actual right way to do this, although I agree that it is very gross.
Now if we try to compile this executable:
jpm -l build
generating executable c source build/main.c from main.janet...
found native build/set.so...
error: cannot marshal <set {"LA" "NYC"}>
We get an error, as expected. So we’re going to need to implement marshaling functions for our abstract set type.
Now, we could just defer to Janet’s existing table marshaling functions and write these as one-liners. But we’re not going to do that. For one thing, we’d be wasting space marshaling all of the true values in our backing table, when the keys are the only part we actually care about. In any case, writing custom marshaling functions is very easy:
static void set_marshal(void *data, JanetMarshalContext *ctx) {
janet_marshal_abstract(ctx, data);
JanetTable *set = (JanetTable *)data;
janet_marshal_int(ctx, set->count);
for (int32_t i = 0; i < set->capacity; i++) {
Janet element = set->data[i].key;
if (!janet_checktype(element, JANET_NIL)) {
janet_marshal_janet(ctx, element);
}
}
}
We write down the number of elements in the table, then we write down each element.
Unmarshaling is just the reverse of that:
static void *set_unmarshal(JanetMarshalContext *ctx) {
JanetTable *set = (JanetTable *)janet_unmarshal_abstract(ctx, sizeof(JanetTable));
set->gc = (JanetGCObject){0, NULL};
janet_table_init_raw(set, 0);
int32_t length = janet_unmarshal_int(ctx);
for (int32_t i = 0; i < length; i++) {
janet_table_put(set, janet_unmarshal_janet(ctx), janet_wrap_true());
}
return set;
}
Although we do have to remember to zero out the set->gc fields again.
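With both callbacks in place, a set should also survive a round trip through marshal and unmarshal from Janet code. This is a hypothetical sketch; it assumes the set abstract type is registered with janet_register_abstract_type when the module loads, so that unmarshal can find it by name:
(def original (set/new "NYC" "LA"))
(def copy (unmarshal (marshal original)))
(print (copy "NYC"))       # true: the elements survived the round trip
(print (= original copy))  # false: the copy is a brand-new, independent set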
And now we have a pretty good set type. We can even compile and run a native executable that includes a marshaled set in its image:
(import /build/set)
(def cities-visited (set/new "NYC" "LA"))
(defn main [&]
(print (cities-visited "LA")))
jpm -l build && build/main
true
And we are done implementing functions for the abstract type protocol. Which means that we can start to write some functions for ourselves! We’ve done the busy work, and now we can start to have some fun.
For starters, we don’t have any way to change the set. We should start by implementing some basic functionality, like add and remove:
static Janet cfun_add(int32_t argc, Janet *argv) {
janet_arity(argc, 1, -1);
JanetTable *set = (JanetTable *)janet_getabstract(argv, 0, &set_type);
for (int32_t i = 1; i < argc; i++) {
janet_table_put(set, argv[i], janet_wrap_true());
}
return janet_wrap_nil();
}
static Janet cfun_remove(int32_t argc, Janet *argv) {
janet_arity(argc, 1, -1);
JanetTable *set = (JanetTable *)janet_getabstract(argv, 0, &set_type);
for (int32_t i = 1; i < argc; i++) {
janet_table_remove(set, argv[i]);
}
return janet_wrap_nil();
}
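Assuming we register those two cfunctions under the names add and remove, the same way set/new was registered, usage from Janet looks like you’d expect. A sketch:
(def s (set/new 1 2))
(set/add s 3 4)       # sets are mutable, so this modifies s in place
(set/remove s 1 2)
(each x s (print x))  # prints 3 and 4, in some hash-dependent order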
And next we should probably implement the rest of the set-like functions, like intersect and subtract.
But you know what? I’m kind of tired of writing C code. And if we’re going to write a full set API here, it would be nice if we could implement some of the higher-level helpers in Janet code.
And of course it’s easy to do this in our little main executable… we can just write helpers. But if we want to make a set library that can be used by other people we’ll need to figure out how to mix and match native modules with pure Janet code.
And there are three ways that we could do that.
The first is to embed Janet source code into our native module and execute it during _janet_init. jpm actually does have helpers for embedding Janet source code into native modules, but only source code. This means that we’d have to parse, macroexpand, compile, and finally compute the environment in our _janet_init function at load time, instead of compiling it into an image ahead of time. This is a little weird to me, because I’m used to thinking of macros as “free” — they execute at compile time — but if we do this, suddenly we’ll have to pay for macro-expansion during program startup. But, you know, computers are fast, and Janet compiles quickly; it’s unlikely that this would add any noticeable startup latency. It just feels wrong to me.
The second is to compile a Janet image ahead of time, embed the image into our native module, and then execute it during _janet_init. This is possible, but it’s a fair bit of work to avoid a circular dependency between your native module and your Janet code, and we’re not going to spend any time on it in this book.
The third is to write a Janet module that imports and re-exports the environment of a private native module, then declare-source it in our project.janet. I think this is the easiest thing to do, but note that we will no longer be able to mix and match an executable with a library written in this way, for a really dumb reason: in order to refer to a locally built native module from an executable, you need to use something like (import /build/set), while to import an installed native module you need to use (import set). You can sort of hack this by symlinking your jpm_tree/lib to the build/ directory but… is it worth it?
Here’s a template for such a module:
(declare-project
:name "set")
(declare-native
:name "set/native"
:source ["set.c"])
(declare-source
:source "init.janet"
:prefix "set/")
Note that we have to rename the native module so that (import set) will import our pure Janet module instead:
(import set/native :as set)
(defn intersection [set1 set2]
(def new-set (set/new))
(each element set1
(if (set2 element)
(set/add new-set element)))
new-set)
(import set/native :prefix "" :export true)
Importing it twice like that is purely a stylistic choice; if you’re okay working with unqualified names like new and add, you can just (import set/native :prefix "" :export true) at the top of the file.
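From the consumer’s side, none of this plumbing is visible. A hypothetical downstream project that depends on the library just sees a single set/ module, with the Janet helpers and the C functions living side by side:
(import set)

(def a (set/new 1 2 3))
(def b (set/new 2 3 4))
(def both (set/intersection a b))  # the helper we wrote in Janet
(print (both 2))                   # true; membership still comes from C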
Now if we add this as a dependency to another project, we’ll wind up with something that looks like this:
tree jpm_tree
jpm_tree
├── bin
├── lib
│ └── set
│ ├── init.janet
│ ├── native.a
│ ├── native.meta.janet
│ └── native.so
└── man
And when we (import set), it will import our set/init.janet module that re-exports the set/native module.
Okay! That’s just about everything you need to know about Janet’s foreign function interface. We learned how to create an abstract type, and we learned how to call into C code from Janet. But, umm… we didn’t really do anything foreign, did we?
Usually you’ll create abstract types and write native modules in order to interoperate with existing C libraries, like sqlite or libcurl. And if you think about it, we sort of did that, except that our “existing library” was just the Janet C API.
But this allowed us to skip over something very important: we haven’t talked about how to link in code from actual external libraries. And I don’t want you to leave this chapter feeling cheated, so we’re going to do one more thing before we go.
There’s an open-source library called immer
that provides persistent, immutable data structures in C++. I’ve never used it before, but I think it looks pretty neat, and it includes a set type. So let’s write bindings for it!
Except… we’re not going to do that together. It’s so similar to what we already did that we’d just be rehashing all of the exact same ground, so instead we’re just going to talk about the differences.
First off, we declare the project like this:
(declare-project :name "jimmy")
(declare-native
:name "jimmy/native"
:source ["src/jimmy.cpp"]
:cppflags ["-Iimmer" "-std=c++14"])
(declare-source
:source [
"src/set.janet"
"src/init.janet"
]
:prefix "jimmy")
And then… that’s it. That’s the only difference. Everything else is exactly the same. jpm detects that we’re using C++ by the file extension, and produces static and dynamic native modules that each statically link the immer library that I vendored as a git submodule (which jpm will automatically fetch).
We’re not going to walk through the code together, but the code is out there for you to peruse at your leisure, in case you ever find yourself wanting to interoperate with a C++ API. It might be useful to see the directory structure jpm expects in order for you to be able to distribute a library with nested submodules and native components.
Okay. In the last chapter we learned how to call C code from Janet. In this chapter, we’re going to learn how to call Janet code from C.
Specifically, we’re going to learn how to embed the Janet interpreter inside a larger app — it doesn’t have to be written in C, as long as it has a C FFI. But we’ll stick with C as our lingua franca.
Oh wait! I forgot that JavaScript was supposed to be our lingua franca. Oh no. Oh no. We just spent a whole chapter writing C code together and you didn’t say anything to me? Did you forget about the (say) function?
Well, hmm. Maybe we can have two… linguae franca? Lingua francae? Whatever. Maybe we can write a C program that embeds Janet, but call that program from JavaScript via WebAssembly: we’ll still learn how to embed Janet, but in the end we’ll have a program that runs in the browser so other people can actually use it.
Sounds like a good idea for a chapter! Let’s do it.
We actually already saw how to embed Janet into a C program, back in Chapter Seven, when we looked at how jpm produces native executables:
#include <janet.h>

static const unsigned char bytes[] = {215, 0, 205, /* ... */};

const unsigned char *janet_payload_image_embed = bytes;
size_t janet_payload_image_embed_size = sizeof(bytes);

int main(int argc, const char **argv) {
#if defined(JANET_PRF)
    uint8_t hash_key[JANET_HASH_KEY_SIZE + 1];
#ifdef JANET_REDUCED_OS
    char *envvar = NULL;
#else
    char *envvar = getenv("JANET_HASHSEED");
#endif
    if (NULL != envvar) {
        strncpy((char *) hash_key, envvar, sizeof(hash_key) - 1);
    } else if (janet_cryptorand(hash_key, JANET_HASH_KEY_SIZE) != 0) {
        fputs("unable to initialize janet PRF hash function.\n", stderr);
        return 1;
    }
    janet_init_hash_key(hash_key);
#endif

    janet_init();

    /* Get core env */
    JanetTable *env = janet_core_env(NULL);
    JanetTable *lookup = janet_env_lookup(env);
    JanetTable *temptab;
    int handle = janet_gclock();

    /* Unmarshal bytecode */
    Janet marsh_out = janet_unmarshal(
        janet_payload_image_embed,
        janet_payload_image_embed_size,
        0,
        lookup,
        NULL);

    /* Verify the marshalled object is a function */
    if (!janet_checktype(marsh_out, JANET_FUNCTION)) {
        fprintf(stderr, "invalid bytecode image - expected function.");
        return 1;
    }
    JanetFunction *jfunc = janet_unwrap_function(marsh_out);

    /* Check arity */
    janet_arity(argc, jfunc->def->min_arity, jfunc->def->max_arity);

    /* Collect command line arguments */
    JanetArray *args = janet_array(argc);
    for (int i = 0; i < argc; i++) {
        janet_array_push(args, janet_cstringv(argv[i]));
    }

    /* Create environment */
    temptab = env;
    janet_table_put(temptab, janet_ckeywordv("args"), janet_wrap_array(args));
    janet_gcroot(janet_wrap_table(temptab));

    /* Unlock GC */
    janet_gcunlock(handle);

    /* Run everything */
    JanetFiber *fiber = janet_fiber(jfunc, 64, argc, argc ? args->data : NULL);
    fiber->env = temptab;
#ifdef JANET_EV
    janet_gcroot(janet_wrap_fiber(fiber));
    janet_schedule(fiber, janet_wrap_nil());
    janet_loop();
    int status = janet_fiber_status(fiber);
    janet_deinit();
    return status;
#else
    Janet out;
    JanetSignal result = janet_continue(fiber, janet_wrap_nil(), &out);
    if (result != JANET_SIGNAL_OK && result != JANET_SIGNAL_EVENT) {
        janet_stacktrace(fiber, out);
        janet_deinit();
        return result;
    }
    janet_deinit();
    return 0;
#endif
}
And if all that we want to do is run some Janet code that we already wrote and compiled ahead of time, then this little snippet is all we need.
But you probably aren’t embedding the whole Janet runtime just so that you can write part of your application logic in a higher-level language. The real reason to embed Janet in your program is so that you can run Janet scripts that you didn’t write at all: plugins, mods, extensions — whatever you want to call them.
There are lots of neat things you can do with an embedded programming language, but since we only have one chapter to talk about this, we’ll have to pick a specific project. I think “programmatic art playground” is as good a genre as any, so we’re going to talk about how to build an app where users can write scripts that draw turtle graphics.
Here, it’s easier if you take a look, so I don’t have to explain it in too much detail: https://toodle.studio. That’s the final product that we’re going to be working towards: users can write scripts that have access to a pre-defined drawing DSL, and our program will execute those scripts asynchronously over time to make little animations.
But in case you are reading this book on paper — which you aren’t; I can tell — or have no patience for whimsical art playgrounds — which you don’t; I can tell — I will briefly summarize the features of our application:
So, to get more technical, we have the following bits of state in our wrapper application (i.e., in JavaScript):
Then we’ll have the following bits of runtime state, which we will store as entries in the environment of our running program:
That second part might seem like a weird unimportant detail that someone should probably just leave out of their book, but it’s actually going to be important once we get to the code. Just trust me. I am leaving out a lot of other details — in the actual application you can pause the current program, for example — but these are the only interesting, Janet-related bits of state.
Okay, so: how do we do this?
Let’s start small. Let’s say we have a string of Janet code. How do we run it?
Well, we just need to do exactly what Janet does when we import a file: parse the source, go through each top-level statement’s abstract syntax tree, perform macro expansion on it until we’re all out of macros, call the magic built-in compile function to turn the abstract syntax tree into a function, and then run that function.
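For a single form, that whole pipeline is short enough to sketch by hand in Janet; something like this hypothetical one-off:
(def form (parse "(+ 1 2)"))      # source text -> abstract syntax tree
(def expanded (macex form))       # expand macros until none are left
(def f (compile expanded root-env))
(print (f))                       # 3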
But that sounds like a lot of work. We don’t want to do all of that by hand, in C code, even though we could. But fortunately Janet has a helper function that will do all of the hard work for us: run-context.
The signature for run-context is pretty intimidating, because it has a million optional arguments that we could use to override how parsing works or what to do on compilation errors or whatever, but the minimal API is pretty easy to use:
(defn evaluate [user-script]
(def env (make-env root-env))
(run-context
{:env env
:chunks (chunk-string user-script)
:on-status (fn [fiber value]
(printf "got %q (%q)" value (fiber/status fiber)))}))
run-context will execute each top-level form in its own fiber that will catch errors. It takes a few arguments:
:env is the environment that we’ll execute the script in. root-env is the environment that contains all of the built-in functions like + and array/slice and all that. We don’t actually want to execute Marley’s program directly in that environment, so we use make-env to create a new table with root-env as its prototype. That way Marley’s script will still be able to read all of those built-in functions, but any new symbols that she defines will not pollute the global root-env.
:chunks is the actual input.
:on-status is a callback that will run after each top-level form is finished executing.
Let’s talk about :chunks for a minute. run-context doesn’t take a string directly, but instead takes a callback that it will keep invoking until it returns nil. That callback is responsible for writing bytes into a buffer that it takes as an argument. It also takes a Janet parser, which we can just ignore.
There isn’t a default way to get a “chunking” function for a string, so I wrote a short helper:
(defn chunk-string [str]
(var unread true)
(fn [buf _]
(when unread
(set unread false)
(buffer/blit buf str))))
This might seem like a weird API, but typically Janet expects to be reading from a file, or from a REPL, where not all “chunks” are available up front.
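For instance, a hypothetical chunks function for an interactive session could hand run-context one line at a time instead of blitting the whole program up front, using the built-in getline to read each line:
(fn [buf parser]
  # read one line of input into buf; the parser argument could be used
  # to build a fancier prompt, but we ignore it here
  (getline "your-repl> " buf))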
So that’s run-context, in its simplest form. Now that we understand it, let’s test it out:
(defn evaluate [user-script]
(def env (make-env root-env))
(run-context
{:env env
:chunks (chunk-string user-script)
:on-status (fn [fiber value]
(printf "> %q (%q)" value (fiber/status fiber)))}))
(evaluate `
(+ 1 2)
foo
(print "done")
(pp (yield 10))
)
(error "oh no")
`)
janet eval.janet
> 3 (:dead)
<anonymous>:2:1: compile error: unknown symbol foo
done
> nil (:dead)
> 10 (:pending)
nil
> nil (:dead)
<anonymous>:5:1: parse error: unexpected closing delimiter )
> "oh no" (:error)
Let’s notice a few things about this:
Every top-level expression is wrapped in its own fiber.
Usually on-status is called once for every top-level statement (:dead means that a fiber completed successfully).
But if a top-level statement yields, on-status will be called multiple times, until it completes or errors.
When a top-level expression yields, run-context will resume it with whatever on-status returns. In this case that was just nil, because printf returns nil.
run-context keeps going even after a parse or compilation error.
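To make that resume behavior concrete, here’s a tiny sketch (reusing chunk-string from above) where on-status answers the yield:
# prints "answer": the (yield :question) expression evaluates to
# whatever on-status returned
(run-context
  {:env (make-env root-env)
   :chunks (chunk-string "(print (yield :question))")
   :on-status (fn [fiber value]
                (if (= value :question) :answer))})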
That last thing is probably not what we want most of the time, because a compilation error might change the behavior of the rest of our code in unpredictable ways (if, for example, a function that was supposed to be shadowed wasn’t).
We can fix that by adding a couple more callbacks to our run-context call:
(defn evaluate [user-script]
(def env (make-env root-env))
(defn on-parse-error [parser where]
(bad-parse parser where)
(set (env :exit) true))
(defn on-compile-error [msg fiber where line col]
(bad-compile msg fiber where line col)
(set (env :exit) true))
(run-context
{:env env
:chunks (chunk-string user-script)
:on-status (fn [fiber value]
(printf "> %q (%q)" value (fiber/status fiber)))
:on-parse-error on-parse-error
:on-compile-error on-compile-error
}))
(evaluate `
(+ 1 2)
foo
(print "done")
(pp (yield 10))
)
(error "oh no")
`)
janet eval.janet
> 3 (:dead)
<anonymous>:2:1: compile error: unknown symbol foo
<anonymous>:5:1: parse error: unexpected closing delimiter )
Setting (env :exit) to true is how we signal to run-context that we don’t want it to keep going — although as you can see, it might not stop parsing immediately. But that’s pretty harmless.
bad-parse and bad-compile are functions that print out those stack traces; they are the default values for the on-parse-error and on-compile-error callbacks.
So this is all we need to write if we just want to run the user code, but remember that this code is going to have side effects — specifically, in our case, this code might create turtles.
We’ll want to be able to inspect these turtles later on, so we’re going to return the final environment — but only if there was no error.
(defn capture-stderr [f & args]
(def buf @"")
(with-dyns [*err* buf *err-color* false]
(f ;args))
(string/slice buf 0 -2))
(defn evaluate [user-script]
(def env (make-env root-env))
(var err nil)
(var err-fiber nil)
(defn on-parse-error [parser where]
(set err (capture-stderr bad-parse parser where))
(set (env :exit) true))
(defn on-compile-error [msg fiber where line col]
(set err (capture-stderr bad-compile msg nil where line col))
(set err-fiber fiber)
(set (env :exit) true))
(run-context
{:env env
:chunks (chunk-string user-script)
:on-status (fn [fiber value]
(printf "> %q (%q)" value (fiber/status fiber)))
:on-parse-error on-parse-error
:on-compile-error on-compile-error
})
(if (nil? err)
env
(if (nil? err-fiber)
(error err)
(propagate err err-fiber))))
And we’re actually done now! That is a fully-fledged Janet evaluator.
Note that we’re still calling bad-parse and bad-compile, which print to (dyn *err*) — typically stderr — but we redirect that to a buffer so that we can raise it later, and then strip off the trailing newline.
There’s a little bit of subtlety here around preserving stack traces nicely: we don’t pass the fiber to bad-compile anymore, so it won’t print out a full stack trace, just the actual error. But later on we propagate the error from the original fiber, so that it preserves the original stack trace (instead of coming from our evaluate function).
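If you haven’t run into propagate before, here’s a tiny standalone illustration of why it matters, separate from our evaluator:
(defn explode [] (error "boom"))
(def f (fiber/new explode :e))  # :e means the fiber traps error signals
(def err (resume f))            # err is "boom"; f is now in the :error state
# calling (error err) here would produce a stack trace pointing at this line;
# propagate re-raises it with explode's original stack trace intact
(propagate err f)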
Now that we have our working run-context-based evaluator, let’s talk about how to actually call this function from our application code.
On the one hand, we have this Janet function, which takes a Janet string. On the other hand, we have… an HTML <textarea> or something, from which we can extract a JavaScript string.
How do you convert a JavaScript string into a Janet string?
Well… it’s a little convoluted. We’re going to use something called Emscripten to compile native code into WebAssembly. Emscripten makes it really easy to interoperate between JavaScript and C++ code, so we’re going to take advantage of that power and write our wrapper program in C++, not C. Then we’ll use Emscripten to automatically translate our JavaScript string into a C++ string, and then convert that to a C string with the .c_str() method, and then convert that C string to a Janet string with janet_cstringv. Like I said: convoluted. This is the price we pay for writing a program that runs in the browser; if we were writing a native application, this would probably be a lot more straightforward.
But alright, assuming that we have the input as a Janet string… how do we call it from C?
There are a few steps: compile the Janet code that defines our evaluate function into an image, using janet -c; embed that image in our program; unmarshal it at startup and pull the evaluate function out of the resulting environment; and then actually call that function with janet_pcall. The first few parts are easy:
static JanetFunction *janetfn_evaluate;
int main() {
janet_init();
Janet environment = janet_unmarshal(...);
JanetTable *env_table = janet_unwrap_table(environment);
Janet evaluate;
janet_resolve(env_table, janet_csymbol("evaluate"), &evaluate);
janet_gcroot(evaluate);
janetfn_evaluate = janet_unwrap_function(evaluate);
}
Note that we also add it as a garbage collector root with the janet_gcroot function. This is very important!
Because later on, when we call this function:
bool call_fn(JanetFunction *fn, int argc, const Janet *argv, Janet *out) {
JanetFiber *fiber = NULL;
if (janet_pcall(fn, argc, argv, out, &fiber) == JANET_SIGNAL_OK) {
return true;
} else {
janet_stacktrace(fiber, *out);
return false;
}
}
struct EvaluationResult {
bool is_error;
string error;
uintptr_t environment;
};
EvaluationResult toodle_evaluate(string source) {
Janet environment;
const Janet args[1] = { janet_cstringv(source.c_str()) };
if (!call_fn(janetfn_evaluate, 1, args, &environment)) {
return (EvaluationResult) {
.is_error = true,
.error = "evaluation error",
.environment = 0,
};
}
janet_gcroot(environment);
return (EvaluationResult) {
.is_error = false,
.error = "",
.environment = reinterpret_cast<uintptr_t>(janet_unwrap_table(environment)),
};
}
We’ll have to reference janetfn_evaluate, and we’ll be very sad if it has been garbage collected in the interim. Which it will be, by default — there’s no reason for Janet to keep this unmarshaled value around.
We could also add the entire unmarshaled environment of our program as a gcroot, which would cause all of the functions we define to stay alive. This will come in handy once we start defining more of them, though it would mean that we’d be retaining slightly more memory than we need: the environment table itself and the binding entry tables, not just the :values that we care about.
Now note that, unfortunately, Emscripten doesn’t let us return structs containing pointers to JavaScript, so we’ll have to convert it to a number first (specifically, a uintptr_t). And because C++ lacks variant types, we’ll have to put it in a sort of dumb-looking struct with an explicit tag.
Now, this is all fine. This works.
But remember the constraints of our program: we want to be able to start and re-start this program. But an environment is a living, breathing thing — it will contain fibers and reference types and all sorts of things that we can’t just “restart.”
Hmm. If we knew that all we had were immutable values, we could just hold onto the original… but there are mutable values all over the place. Our turtles are fibers, and fibers might have arbitrary amounts of internal, mutable state that we can’t possibly know about.
So since our program contains mutable state, we won’t be able to run it. Instead, we’ll have to run a clone of it, and hold on to a pristine copy of the original.
But how do we clone the environment? It’s not sufficient to copy the environment table itself — we’d have to make a deep copy of the table, plus all the data structures it references, and all of the fibers inside of it…
There’s no one step “deep copy everything” function in Janet, but there is a way to do this: we can marshal the environment to a buffer, freezing it in carbonite, and from this static image of our program we can instantiate as many living copies as we like.
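This “freeze and thaw” trick is easy to see in miniature with ordinary mutable values, before any abstract types get involved:
(def original @{:position @[0 0] :trail @[]})
(def clone (unmarshal (marshal original)))
(array/push (clone :trail) :step)
(print (length (original :trail)))  # still 0: the clone is fully independent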
In fact, huh. We evaluate a script, produce an environment, and then marshal that environment into an image, which we will then resume later… does any of that sound familiar?
Exactly: we learned about this all the way back in Chapter Two. I’ve just described imagination. Er, compilation, I mean.
So we aren’t really evaluating Marley’s script (although we are). We’re really compiling Marley’s script into an image, that we can then breathe life into as many times as we like.
With this insight in mind, let’s modify our function slightly:
struct CompilationResult {
bool is_error;
string error;
uintptr_t image;
};
CompilationResult toodle_compile(string source) {
Janet environment;
const Janet args[1] = { janet_cstringv(source.c_str()) };
if (!call_fn(janetfn_evaluate, 1, args, &environment)) {
return (CompilationResult) {
.is_error = true,
.error = "compilation error",
.image = 0,
};
}
JanetTable *reverse_lookup = env_lookup_table(janet_core_env(NULL), "make-image-dict");
JanetBuffer *image = janet_buffer(2 << 8);
janet_marshal(image, environment, reverse_lookup, 0);
janet_gcroot(janet_wrap_buffer(image));
return (CompilationResult) {
.is_error = false,
.error = "",
.image = reinterpret_cast<uintptr_t>(image),
};
}
Instead of returning an environment, we now return an image of the environment.
Great! Which immediately tells us what we need to do next: we’ll need to write a function that takes an image and returns an actual environment. And then we’ll need another function that takes that environment and does something with it — advances the program; scoots the turtles forward.
Whenever Marley starts a new program or restarts the current program, we’ll unmarshal the corresponding image. And then we’ll call the advance function on every frame.
The “start” function is not very interesting; we’ve already seen how to unmarshal images:
uintptr_t toodle_start(uintptr_t image_ptr) {
JanetBuffer *image = reinterpret_cast<JanetBuffer *>(image_ptr);
JanetTable *lookup = env_lookup_table(janet_core_env(NULL), "load-image-dict");
Janet environment = janet_unmarshal(image->data, image->count, 0, lookup, NULL);
janet_gcroot(environment);
return reinterpret_cast<uintptr_t>(janet_unwrap_table(environment));
}
But the “run” function is quite interesting.
For starters, we’ll have to call a Janet function that actually knows the internal details of our environment and what to do with it. I’ll call it janetfn_run, and we’ll assume that we extracted it in main() exactly as we did janetfn_evaluate.
Now this function is going to return two things: it will return a list of lines to draw, and it will return a color that will determine how the image fades out over time. This means we’ll really call two functions: first janetfn_run, and then janetfn_get_bg.
And therein lies the interesting bit of all of this!
RunResult toodle_run(uintptr_t environment_ptr) {
JanetTable *environment = reinterpret_cast<JanetTable *>(environment_ptr);
Janet run_result;
Janet bg;
const Janet args[1] = { janet_wrap_table(environment) };
if (!call_fn(janetfn_run, 1, args, &run_result)) {
return run_error("evaluation error");
}
janet_gcroot(run_result);
if (!call_fn(janetfn_get_bg, 1, args, &bg)) {
return run_error("evaluation error");
}
janet_gcunroot(run_result);
JanetArray *lines = janet_unwrap_array(run_result);
int32_t count = lines->count;
auto line_vec = std::vector<Line>();
// convert the run_result into a C++ vector...
return (RunResult) {
.is_error = false,
.error = "",
.lines = line_vec,
.background = unsafe_parse_color(janet_unwrap_tuple(bg)),
};
}
Notice that we return the run_result value from Janet to C. But then we jump back into the Janet runtime in order to extract the background color. But! We have to add the lines-to-draw value as a GC root before we give control back to the Janet VM. We don’t want the garbage collector to have a chance to collect that value before we’re done with it!
In fact any time we give control to the Janet VM with janet_pcall or another function like that, we’re giving the garbage collector a chance to run, so we have to make sure that we set up the GC roots for any Janet value that we have a reference to in C code. Once we’re out of the VM for good, we can remove the root, because the Janet GC won’t run unless we either run some Janet code or explicitly trigger a collection.
Yes, in this case we could just change the order of the code slightly so that we finish extracting values from run_result before we call janetfn_get_bg. But then our contrived example would be less educational.
And now we are almost done. But we’re missing something very important, and we can’t leave the chapter until we fix it: when we created our images and environments, we added them as janet_gcroots. But we never called janet_gcunroot on them! Which means we have a memory leak.
In order to plug it, we’ll have to add four more simple functions:
void retain_environment(uintptr_t environment_ptr) {
janet_gcroot(janet_wrap_table(reinterpret_cast<JanetTable *>(environment_ptr)));
}
void release_environment(uintptr_t environment_ptr) {
janet_gcunroot(janet_wrap_table(reinterpret_cast<JanetTable *>(environment_ptr)));
}
void retain_image(uintptr_t image_ptr) {
janet_gcroot(janet_wrap_buffer(reinterpret_cast<JanetBuffer *>(image_ptr)));
}
void release_image(uintptr_t image_ptr) {
janet_gcunroot(janet_wrap_buffer(reinterpret_cast<JanetBuffer *>(image_ptr)));
}
I call these functions retain and release, because we’re going to treat Janet values as if they are reference-counted, which they basically are. The reference counts aren’t intrusive like you might be used to — when we “retain” a value, we’re really adding it to a list, and when we “release” a value, we’re removing it from the same list — but still, values can appear in the Janet root list multiple times, and janet_gcunroot will only remove one entry for the corresponding value.
So: how do we use these?
Well, for every image returned from toodle_compile, we’ll need to eventually call release_image. And for every environment returned from toodle_start, we’ll need to eventually call release_environment.
But! Remember that our program actually needs to hold onto two images: the image of the currently-running program (so that we can restart it), and the image of the “next” program that we’ve successfully compiled (so that we can switch over to it without having to re-compile the user’s script). And these values might be the same value at many points in time! So we’ll also need to call retain_image when we mark a current image as the “active” image, to ensure that it doesn’t get garbage collected when it is no longer the “next” image. Which means that we’ll need to call release_image one more time, on the previous “active” image, before we retain the new one.
I won’t go too much into reference-counted memory management in this book, but it’s essentially a matter of balancing parentheses. Whenever we create or retain a value, we need to remember to release it later. And if we ever create a new reference to a value, we have to remember to retain it, and then to release it, once the reference changes or goes out of scope.
Our program has two variables that can reference Janet values:
let potentialNextImage: Image | null;
let currentImage: Image | null;
These values might be the same, or they might be different. But whenever we say currentImage = potentialNextImage
— whenever we promote a newly compiled image to be the “current” program — we’re essentially taking another reference to the same Janet value. So we want to retain that image (and release the previous image).
Remember, though, that we don’t want to retain something that comes directly from one of the calls to toodle_compile or toodle_start, because those values begin with a “reference count” of 1. We could change this, and make them begin with a reference count of 0, but then we would have to be very careful to retain them from JavaScript before we give control back to the Janet VM. Which is completely fine! And a reasonable, consistent way to decide to manage memory in Janet.
Okay. Now that we’ve fixed the leaks, we’re basically done. But there’s one final detail of the implementation that’s worth mentioning.
We have to parse the returned “list of lines” into a C++ struct, which Emscripten will automatically translate into a JavaScript object for us. But we’re parsing values that are produced, in part, by a script that Marley wrote. And Marley, as you know, is not to be trusted: even though a turtle is supposed to yield lines, Marley could have written turtles that misbehave, and yield arbitrarily crazy values.
So in order to parse the results of the fiber invocations, we need to validate the values. And it’s going to be much, much easier to do that validation in Janet than it would be to do it from our C++ wrapper. So before we return anything into C++, we have to validate that all of the data we’re going to return is in the format that our C++ code expects it. And then in our C++ wrapper, we blindly trust that we have the correct shape of data.
We could of course do the validation in C++ instead, and I think it even feels more correct to do that: we won’t need to rely on careful coordination between the Janet validator and the C++ parser that way. But it’s a trade-off, and it’s so much easier to write the validation logic in Janet that I would rather just be extra careful about keeping them in sync.
The rest of our program is, well, the actual application — the JavaScript UI, the DSL for declaring turtles, the Emscripten bindings to allow us to speak C++ from JavaScript, the 3D turtle logo with eyes that track the mouse, etc. The full code is available online if you’re curious about the details, but we won’t go over much more of it.
But there’s still one more interesting bit to talk about.
It concerns the turtle DSL.
Let’s consider this very simple program:
(var hue (/ 2 6))
(toodle {:width 3 :speed 0}
(set (self :color) (hsv hue 1 1))
(+= hue 0.001)
(turn 0.08)
(+= (self :speed) 0.01))
This program creates a single turtle that draws an outward spiral.
But that is not actually how I want to write that program. I’d rather write it like this instead:
(var hue (/ 2 6))
(toodle {:width 3 :speed 0}
(set self.color (hsv hue 1 1))
(+= hue 0.001)
(turn 0.08)
(+= self.speed 0.01))
Look at that self.field notation. That isn’t Janet! What’s up with that?
Well, there’s one more argument to run-context: :expander.
:expander is a function that runs on every top-level form: it takes the form’s abstract syntax tree and returns a new one. We can use it to, essentially, wrap every top-level form in a custom macro.
And that dot syntax? That’s just a macro that searches through the abstract syntax tree and rewrites symbols with dots in them, like foo.bar, into (foo :bar) instead.
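I’m not going to reproduce Toodle’s actual implementation here, but a stripped-down sketch of that kind of rewriter, ignoring details like bracket tuples, structs, and tables, might look something like this:
(defn expand-dots [form]
  (cond
    (symbol? form)
      (let [[head & fields] (string/split "." form)]
        (if (empty? fields)
          form
          # self.color -> (self :color), a.b.c -> ((a :b) :c)
          (reduce (fn [acc field] (tuple acc (keyword field)))
                  (symbol head)
                  fields)))
    (tuple? form) (tuple/slice (map expand-dots form))
    (array? form) (map expand-dots form)
    form))
You’d then hand expand-dots to run-context as its :expander argument.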
That’s a pretty mild extension, but we could use this feature to create arbitrary Janet dialects, if we wanted to. We could add infix operators, or special syntax that doesn’t need to exist within a macro call.
In fact, we could even replace the parser altogether, and design a language with whitespace-sensitive indentation that parses into normal Janet tuples. We could re-use the Janet compiler and runtime with a completely custom syntax, if we wanted to.
But we’re not going to do that in this book.
This book is just about done talking about embedding Janet. But this book would like to talk about one last detail before we bring the chapter to a close: what happens if Marley writes a function with an infinite loop?
Sadly, the answer is that her entire browser tab will freeze.
But in general, there is a function called janet_interpreter_interrupt, which will, umm, interrupt the interpreter. But of course, we need to call it from a separate thread: if the current thread is spinning in an infinite loop, there’s no way that we’ll be able to sneak a janet_interpreter_interrupt in there.
Sadly implementing this in the browser is so difficult that I have to leave it as an exercise for the reader. You can, perhaps, create shared WebAssembly memory that you call into from a web worker… or perhaps you cannot. I could not, at least, in time to satisfy this book’s publisher. Who is me. It’s self-published. But I wanted to release this book instead of fighting with asynchronous browser APIs or undocumented Emscripten features. I’m sure you can understand.
So just… don’t write infinite loops.
I know that you’ve been looking forward to this chapter the whole book.
Sure, compile-time metaprogramming or whatever is fine, but when do we get to talk about automated test suites? You haven’t even thought about skipping this one.
I hope we can forego the spiel about the importance of testing — we’re both adults here, and I’m going to assume that if you’ve ever worked on a codebase in a dynamically typed language for longer than six weeks, you understand the value of a comprehensive automated test suite and an ergonomic testing framework.
But we actually don’t need any test framework at all to start writing tests in Janet. jpm comes with a test subcommand that just works out of the box: by default, jpm test will recursively search a directory called test/ for any .janet files, and then it will compile and run them, and check if any of them exit non-zero.
For example:
(assert (= (+ 2 2) 3))
jpm test
running test/math.janet ...
error: assert failure in (= (+ 2 2) 3)
in _thunk [test/math.janet] (tailcall) on line 1, column 1
non-zero exit code in test/math.janet: 1
Failing test scripts: 1
That’s all it takes to write a test!
Or, well, we’ll probably want to test something that we actually wrote:
(defn add [x y] (+ x y))
(use /src/math)
(assert (= (add 2 2) 4))
/src/math is a path relative to the current working directory. You don’t want to use working-directory-relative imports in actual library code, because you have no idea what the current working directory will be when your code runs. But for test code, it works great, and means that we can import using the same paths from all of our scripts.
For very simple programs, a few assertions in a file like this is probably sufficient. But I don’t usually write tests for very simple programs. I usually write tests for complex programs, and I don’t think that this is a very nice way to test complex programs.
For starters, this type of test stops as soon as we reach the first failing assertion in each file. But sometimes it’s useful to see all failing tests, not just the first one. “Run a script and look at the exit code” doesn’t leave a lot of room for the sort of granular error-handling that would be necessary to continue after an error, and it also doesn’t give us any nice way to group multiple assertions within a single file, or easily select individual tests to run without re-running the whole suite.
To get those nice features, we’ll have to bring in a test framework. And we have a few choices on that front, but this chapter is going to focus on one test framework in particular.
It’s called Judge, and, full disclosure, I am the author of Judge, so I am definitely biased. But I am the author of Judge because automated testing is a weirdly strong passion of mine, and I believe that Judge’s approach to testing is a material improvement over traditional assertion-based testing.
Judge is a framework for writing inline snapshot tests. I’m going to assume that you’ve never heard of inline snapshot testing before, because the technique is not extremely well-known, but lots of test frameworks in lots of languages support it. But it’s often an afterthought, or a bonus feature. In Judge, it is the only feature.
Inline snapshot testing is basically like running a repl directly in your source code. Except it’s a persistent repl that you can re-play in the future, or even share with other people.
A simple port of our test into Judge might look like this:
(use /src/math)
(use judge)
(test (add 1 2))
Notice that there’s no, umm, assertion there. There’s an expression to test, but it says nothing about what we want it to be. Just like if we typed this at the repl!
When we run the test, Judge will fill in the answer — directly in our actual source code.
(use /src/math)
(use judge)
(test (add 1 2) 3)
And then, the next time we run this test, Judge will only tell us if the output changes. If during some delicate refactor we accidentally break the function:
(defn add [a b]
(+ a a)) # whoops
Then when we run tests again, Judge will let us know:
judge
running test: test/math.janet:4:1
- (test (add 1 2) 3)
+ (test (add 1 2) 2)
0 passed 1 failed 0 skipped 0 unreachable
Now, this might seem really weird to you at first. It might seem like we’re assuming that the code is correct the first time we run it, and that we’re immortalizing whatever random value it happens to give us for the rest of time. But we’re not: there’s still a human in the loop, and it is a loop. If the output isn’t right, we’ll notice, and we’ll fix the bug, and we’ll run it again.
But what if we don’t notice? What if we erroneously accept bad output when Judge prompts us, and then don’t notice when we’re staging our commit, and then don’t notice during code review either? Well, then, yes. We’d have a bug. But. I mean. Come on.
Now, I am committing a pedagogical sin here, by showing you examples of tests that no sane human would ever write.
So let’s look at something slightly less trivial:
(test (sorted-by - [10 3 1 28 4]))
sorted-by is a built-in Janet function, but let’s pretend that we wrote it and we want to test it. We could, of course, write an assertion about the output. But sorting this list by hand would probably take me a few seconds, and I can check if the output looks right almost instantly:
- (test (sorted-by - [10 3 1 28 4]))
+ (test (sorted-by - [10 3 1 28 4]) @[28 10 4 3 1])
Because it’s so cheap and easy to write tests this way, I find that I write more of them. And because writing tests this way is easier and cheaper than jumping over to a repl, writing tests is the default way that I engage with my programs.
An important part of the ergonomics here is that Judge tests can exist inside your regular source files. We probably wouldn’t write a test like this in the test/ subdirectory — we’d probably write it directly in our source:
(use judge)
(defn add [x y] (+ x y))
(test (add 1 2) 3)
Not only is this more convenient — you don’t have to tab to a different file — but it also makes the code easier to read. The tests act as living, automatically-updating documentation for the behavior of the code.
Is this “test-driven development?” I have no idea. To me it’s just regular development: instead of writing tests before I wrote code, or writing tests after I write code, I write tests as I write code, trying things out in the repl that is my regular source files.
But this is an unusual workflow. I am very accustomed to it from my experience with OCaml, but a more common way to program in languages with lots of parentheses is to start a long-running repl server that you can use to talk to your program interactively. You can do this in Janet, using Spork’s netrepl module — but by using source files as your repl, you get persistence over time (because you can run your tests later) and over space (because you can share them with other people) for free. Traditional repls are single player affairs, but inline snapshot testing is multiplayer. Plus the kool-aid is delicious.
Alright, that’s enough about testing for now. Let’s move on to debugging.
One way that I like to debug things is an ancient, hallowed technique called “printf debugging.” Printf debugging is great, and don’t let anyone tell you differently.
Judge actually makes printf debugging even nicer, with a macro called test-stdout. test-stdout runs an expression and shows you what it prints. Like this:
(test-stdout (print "hello") `
hello
`)
On its own, this probably seems really silly. But you can do some wonderful things with it:
(deftest "testing output"
(def data [[0 1] [1 3] [2 5]])
(test-stdout (print-table ["x" "y"] data) `
╭───┬───╮
│ x │ y │
├───┼───┤
│ 0 │ 1 │
│ 1 │ 3 │
│ 2 │ 5 │
╰───┴───╯
`))
The way that you visualize test expressions can make a big difference in how effective your tests are. Of course Judge doesn’t care if you print that as a nicely formatted table or not — it’ll just make sure that it stays the same over time — but people care. Tests are arguments about the behavior of code, and arguments should be convincing. And it’s a lot easier to make sense of a graph or a chart or an image than it is to wade through a bunch of assertions or raw data.
But test-stdout is more powerful than just pretty-printing a string and checking the result. It captures stdout dynamically, so you can use test-stdout with printf debugging to get a better understanding of your code. For example:
(defn slowsort [list]
(case (length list)
0 list
1 list
2 (let [[a b] list] [(max a b) (min a b)])
(do
(def pivot-index (math/floor (/ (length list) 2)))
(def pivot (list pivot-index))
(def smalls (filter |(< $ pivot) list))
(def bigs (filter |(> $ pivot) list))
[;(slowsort smalls) pivot ;(slowsort bigs)])))
That’s a not-completely-trivial function. Does it work? Let’s find out.
(use judge)
(test (slowsort [3 10 2 -5]) [-5 2 10 3])
Ah. No. That doesn’t look right.
I know you can already see the problem, but let’s pretend that we are mystified by this, and the only way we can get to the bottom of it is to sprinkle some printfs over the code:
(defn slowsort [list]
(printf "slowsort %q" list)
(case (length list)
0 list
1 list
2 (let [[a b] list] [(max a b) (min a b)])
(do
(def pivot-index (math/floor (/ (length list) 2)))
(def pivot (list pivot-index))
(def smalls (filter |(< $ pivot) list))
(def bigs (filter |(> $ pivot) list))
(printf " %q %q %q" smalls pivot bigs)
[;(slowsort smalls) pivot ;(slowsort bigs)])))
And then change our test expression to test-stdout:
(test-stdout (slowsort [3 10 2 -5]) `
slowsort (3 10 2 -5)
@[-5] 2 @[3 10]
slowsort @[-5]
slowsort @[3 10]
` [-5 2 10 3])
Aha! We realize from looking at the execution trace that we flipped the output in the two-element case. We fix our test, and move on.
Hopefully to write more tests, because that was definitely not the only bug in that code.
Of course we don’t have to use Judge to do any of this — we could just run it by hand and look at the output. That’s a completely reasonable thing to do, but by running code with Judge we get to co-locate the expressions and the output they produce, which is easier for my brain to reckon with. We can also keep all of our existing editor tooling for running the test under the cursor or running all the tests in the current file, instead of having to modify our (main) entry point somehow.
I think that editor tooling is pretty important if you want Judge to replace the repl for you altogether, but unfortunately Judge does not have very good support in major editors. It is one of the simpler integrations to write for your favorite editor, though: just run judge file.janet:line:col, passing it the position of your cursor, and Judge will take it from there.
Okay, so I guess we didn’t really switch gears before. We’re still talking about testing, aren’t we. But let’s actually move on to debugging now. Even though they are intimately related, and often the easiest way to debug something is to write more tests for it, and—
Okay fine. There is more to life than writing automated tests.
So Janet actually includes an interactive step-through debugger. You can bring it up at any point, and you can have Janet automatically bring it up on uncaught errors. Let’s take a look:
(defn inc [x]
(+ x 1))
(defn main [&]
(print (inc "foo")))
If we just run that, we’ll get an error:
janet debug.janet
error: could not find method :+ for "foo"
in inc [debug.janet] on line 2, column 3
in main [debug.janet] (tailcall) on line 5, column 10
in run-main [boot.janet] on line 3795, column 16
in cli-main [boot.janet] on line 3940, column 17
And, you know, this happens to be a pretty clear error message. But let’s pretend, for example’s sake, that we are mystified, and cannot understand what has gone wrong.
Enter the Janet debugger:
janet -d debug.janet
error: could not find method :+ for "foo"
in inc [debug.janet] on line 2, column 3
in main [debug.janet] (tailcall) on line 5, column 10
entering debug[1] - (quit) to exit
debug[1]:1:>
Now we’re in a prompt like a repl that we can use to poke around. First off, let’s try to figure out where we are.
debug[1]:1:> (.stack)
error: could not find method :+ for "foo"
in inc [debug.janet] on line 2, column 3
in main [debug.janet] (tailcall) on line 5, column 10
nil
Hmm, okay. That tells us where the error started, although we sort of remembered that from the original message. But what was at debug.janet:2:3 again? It would be nice to see the actual expression that raised.
debug[1]:2:> (.source)
(defn inc [x]
(+ x 1))
(defn main [&]
(print (inc "foo")))
nil
Unfortunately Janet’s debugger doesn’t have a way to combine the stack frames with the actual source. It would be nice to highlight the current stack frame in context. But okay — the information is there.
So it seems like maybe the problem has to do with this x. What is x?
debug[1]:3:> x
<anonymous>:3:1: compile error: unknown symbol x
Ah, hmm. This might be surprising, but the debugger repl is not actually running in the context of the place where we’re currently paused. Instead, to inspect the local environment, we have to use:
debug[1]:4:> (.locals)
@{inc <function inc> x "foo"}
Which gives us a table of all the local bindings.
(.locals) is short for (.locals 0). You can also call (.locals 1) to get the locals in the stack frame below this one.
That’s about the extent of your debugging abilities, if you’re stopped on an error like this.
However there are other ways to conjure the debugger. Let’s consider a slightly more complicated program:
(defn enemy-of-enemy [name people]
(def subject (people name))
(def nemesis (people (subject :nemesis)))
(people (nemesis :nemesis)))
(defn main [&]
(def people {"ian" {:age "young at heart"
:nemesis "jeffrey"}
"jeffrey" {:age 7.5
:nemesis "sarah"}})
(print (enemy-of-enemy "ian" people)))
Let’s try running it:
janet step.janet
Ah. Just… a blank line. Huh. That’s mysterious. There’s no error, but still… it would be nice if we could poke around inside the runtime environment of our program to see what we did wrong.
Which we can do by adding a call to (debug):
(defn enemy-of-enemy [name people]
(debug)
(def subject (people name))
(def nemesis (people (subject :nemesis)))
(people (nemesis :nemesis)))
janet step.janet
debug:
in enemy-of-enemy [step.janet] on line 2, column 3
in main [step.janet] (tailcall) on line 12, column 10
in run-main [boot.janet] on line 3795, column 16
in cli-main [boot.janet] on line 3940, column 17
Ah, umm, whoops. By default the (debug) function just raises a signal, and there’s nothing handling that signal unless we start our program with janet -d:
janet -d step.janet
debug:
in enemy-of-enemy [step.janet] on line 2, column 3
in main [step.janet] (tailcall) on line 12, column 10
entering debug[1] - (quit) to exit
debug[1]:1:>
That’s better. Now let’s see what went wrong:
debug[1]:1:> (.locals)
@{enemy-of-enemy <function enemy-of-enemy> name "ian" people {"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}}}
Well, yes, the function just started. Let’s see if maybe the subject lookup failed, by stepping over the next instruction:
debug[1]:2:> (.step)
nil
debug[1]:3:> ((.locals) 'subject)
nil
Ah. Well. You might conclude that the subject
lookup didn’t work, but actually…
Actually we did not step over the line (def subject (people name))
. We stepped through a single virtual machine instruction.
We haven’t talked much about this yet, but we aren’t interpreting Janet source code at runtime. We’re interpreting precompiled bytecode.
We can look at the code we’re “actually” running with .ppasm
:
debug[1]:4:> (.ppasm)
signal:
status: debug
function: enemy-of-enemy [step.janet]
constants: @[:nemesis]
slots: @["ian" {"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}} <function enemy-of-enemy> nil nil nil nil nil nil]
lds 2 # line 1, column 1
ldn 4 # line 2, column 3
sig 3 4 2
> push 0 # line 3, column 16
call 4 1
movn 5 4 # line 3, column 3
ldc 6 0 # line 4, column 24
push 6
call 6 5
push 6 # line 4, column 16
call 7 1
movn 6 7 # line 4, column 3
ldc 8 0 # line 5, column 11
push 8
call 8 6
push 8 # line 5, column 3
tcall 1
nil
And that shows us where we actually are. We’re about to execute line 3, column 16 — that would be the (people name)
expression.
How do these instructions correspond to (people name)
? Well, push 0
pushes the name
argument onto the stack, and call 4 1
invokes the people
“function” with the current arguments on the stack, storing the result in “slot 4”.
How did I know that? Well, a big part of it is looking at the “slots” line of the output as a way to translate all those numbers into values:
["ian" {"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}} <function enemy-of-enemy> nil nil nil nil nil nil]
From this we can guess that slot 0
is the local variable name
and slot 1
is the local variable people
. Slot 2
is a reference to the function itself (to allow recursion), and the rest of the slots are currently nil
— they haven’t been filled in yet.
So we can rewrite this virtual machine code into something slightly more readable:
push name # line 3, column 16
slots[4] = call people
slots[5] = slots[4] # line 3, column 3
From this we can guess that slot 4 is a scratch space for the result of the (people name)
call, and slot 5 corresponds to the subject
local variable.
Let’s step a few more times until we’ve run those instructions:
debug[1]:5:> (.step)
nil
debug[1]:6:> (.step)
nil
debug[1]:7:> (.step)
nil
debug[1]:8:> (.ppasm)
signal:
status: debug
function: enemy-of-enemy [step.janet]
constants: @[:nemesis]
slots: @["ian" {"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}} <function enemy-of-enemy> nil {:age "young at heart" :nemesis "jeffrey"} {:age "young at heart" :nemesis "jeffrey"} nil nil nil]
lds 2 # line 1, column 1
ldn 4 # line 2, column 3
sig 3 4 2
push 0 # line 3, column 16
call 4 1
movn 5 4 # line 3, column 3
> ldc 6 0 # line 4, column 24
push 6
call 6 5
push 6 # line 4, column 16
call 7 1
movn 6 7 # line 4, column 3
ldc 8 0 # line 5, column 11
push 8
call 8 6
push 8 # line 5, column 3
tcall 1
nil
Great. We just ran the movn
instruction. And now?
debug[1]:9:> ((.locals) 'subject)
{:age "young at heart" :nemesis "jeffrey"}
Great! Looks like it worked.
Stepping one instruction at a time isn’t exactly the best way to debug, though. And we can step multiple instructions at a time, by using (.step count)
. But I don’t really want to count instructions.
Fortunately, the .ppasm
output has line and column numbers, and we can use those to set breakpoints:
debug[1]:10:> (debug/break "step.janet" 5 11)
nil
And then we can continue until the next breakpoint:
debug[1]:11:> (.next)
nil
debug[1]:12:> (.ppasm)
signal:
status: debug
function: enemy-of-enemy [step.janet]
constants: @[:nemesis]
slots: @["ian" {"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}} <function enemy-of-enemy> nil {:age "young at heart" :nemesis "jeffrey"} {:age "young at heart" :nemesis "jeffrey"} {:age 7.5 :nemesis "sarah"} {:age 7.5 :nemesis "sarah"} nil]
lds 2 # line 1, column 1
ldn 4 # line 2, column 3
sig 3 4 2
push 0 # line 3, column 16
call 4 1
movn 5 4 # line 3, column 3
ldc 6 0 # line 4, column 24
push 6
call 6 5
push 6 # line 4, column 16
call 7 1
movn 6 7 # line 4, column 3
*> ldc 8 0 # line 5, column 11
push 8
call 8 6
push 8 # line 5, column 3
tcall 1
nil
Great! (The asterisk is telling us that we still have a breakpoint set there.)
So if you remember our original program, we just ran the line def nemesis
:
(defn enemy-of-enemy [name people]
(debug)
(def subject (people name))
(def nemesis (people (subject :nemesis)))
# we are here
(people (nemesis :nemesis)))
And now we’re going to do one more lookup in the people
struct and return the result. At this point we have all the information we need to figure out why this doesn’t work:
debug[1]:13:> (def nemesis ((.locals) 'nemesis))
{:age 7.5 :nemesis "sarah"}
debug[1]:14:> (def people ((.locals) 'people))
{"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}}
debug[1]:15:> (nemesis :nemesis)
"sarah"
debug[1]:16:> (people "sarah")
nil
Aha. That would do it, wouldn’t it?
Now, deciphering Janet bytecode was, umm, clearly overkill for such a trivial bug. But this runtime step-through debugging can be quite useful in hairier situations. It’s not the most ergonomic experience, though, so it’s probably not something that you should reach for unless you’re pretty well stumped already.
I don’t use the debugger much, personally. Janet’s interactive debugger is mostly useful for deep bugs — bugs that are hard to reproduce in test cases, that only show up after accumulating a bit of runtime state, or that only arise non-deterministically. Deep bugs are unavoidable, but I like to structure my programs to be as shallow as possible, so that I can shake out as many bugs as I can with automated tests.
There are some more commands available in the debugger that we haven’t talked about. You can see them all with autocomplete:
debug[1]:1:> .<TAB>
.break
.breakall
.bytecode
.clear
.clearall
.disasm
.fiber
.fn
.frame
.locals
.next
.nextc
.ppasm
.signal
.slot
.slots
.source
.stack
.step
And you can read about them with doc
:
debug[1]:2:> (doc .bytecode)
function
boot.janet on line 3411, column 3
(.bytecode &opt n)
Get the bytecode for the current function.
nil
But I think that .locals
, .ppasm
, and debug/break
are the most useful ones to know about.
There’s one more thing I’ll mention before we leave here.
By default, you can’t use janet -d
to break on top-level errors. This means that you can’t use the interactive debugger for errors that you encounter during the compilation phase — including errors inside macro definitions.
(error "oh no")
janet -d raise.janet
error: oh no
in _thunk [raise.janet] (tailcall) on line 1, column 1
To catch top-level errors like this, you also have to pass -p
:
janet -p -d raise.janet
error: oh no
in _thunk [raise.janet] (tailcall) on line 1, column 1
entering debug[1] - (quit) to exit
debug[1]:1:>
This can be pretty helpful when you’re debugging macros so tricky that your tests don’t even compile.
I think that Janet is a very good scripting language. Janet scripts have almost no startup time, PEGs make ad-hoc text-wrangling easy and fun, and you can even compile Janet scripts to native executables if you want to share them with people who have never even heard of Janet.
But in this chapter we’re going to talk about a couple of libraries that transform Janet from a very good scripting language into the best scripting language.
First of all, I want to talk about a library called sh
, which was written by a prolific Janetor named Andrew Chambers. sh
is one of the few libraries that I would recommend installing globally, because it’s Just That Good:
jpm install sh
But also because that way you can import it from any shebanged Janet script you write without having to set up a project.janet
first.
The core of the sh
library is a macro called $
, which executes a shell command:
#!/usr/bin/env janet
(use sh)
($ echo "hey janet")
./greet
hey janet
But $
supports a surprising amount of shell-like syntax. For example, you can use it to set up a multi-process pipeline:
#!/usr/bin/env janet
(use sh)
($ echo "hey janet" | tr "a-z" "A-Z")
./greet
HEY JANET
Just like in a real shell.
Of course there’s no real reason to shell out to echo
or tr
like this in real life, since we can just call print
and string/ascii-upper
. But there are gonna be a lot of dumb examples in this chapter.
sh
also supports redirection, either to files:
#!/usr/bin/env janet
(use sh)
($ echo "hey janet" >(file/open "output.txt" :w))
($ cat output.txt)
./greet
hey janet
Or to Janet buffers:
#!/usr/bin/env janet
(use sh)
(def output @"")
($ echo "hey janet" >,output)
(prin output)
./greet
hey janet
I used prin
because echo
already adds a newline.
Although that’s sort of a silly redirection, because janet-sh
includes another macro, $<
, which runs a command and returns the output as a string:
(def output ($< echo "hey janet"))
Although I think that $<_
is more useful — $<_
is just like $<
, but it strips trailing whitespace from its output.
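For example, a quick sketch (any command that prints a trailing newline would do here):
(use sh)
(def raw ($< whoami))      # includes the trailing newline
(def trimmed ($<_ whoami)) # trailing whitespace stripped off
(print (length raw) " vs " (length trimmed))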
You can also redirect-append with >>
, and you can redirect stdin with <
. Just like a real shell.
#!/usr/bin/env janet
(use sh)
(def baby-names `
thaddeus
leopold
ezekiel
`)
($ sort <,baby-names | sed "s/^/name: /")
./baby-names
name: ezekiel
name: leopold
name: thaddeus
$
raises an exception if any of the commands in its pipeline exits non-zero (in Bash terms: sh
implicitly sets pipefail
), and it always returns nil
. But you can also use $?
to check if a command succeeded — it will return true
for exit status 0
, and false
for anything else.
repl:1:> ($? grep "supercalifragilistic" /usr/share/dict/words)
false
If you need more information than that, sh
exports a macro called run
that returns the numeric exit code of a command. It actually returns an array of exit codes, one for each command in a pipeline:
repl:2:> (run grep "supercali" /usr/share/dict/words)
@[1]
repl:3:> (run grep "supercal" /usr/share/dict/words | sort)
supercalender
supercallosal
@[0 0]
Remember that you can still redirect stdout to a buffer when you’re using $?
or run
to grab the text as well, if you want to.
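For instance, something like this should work (a sketch; the file name is made up):
(use sh)
(def out @"")
(def found? ($? grep "TODO" notes.txt >,out))  # boolean result, output captured
(when found?
  (prin out))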
sh
also exports a function called glob
. It’s a function, not a macro, so you have to pass it a string:
#!/usr/bin/env janet
(use sh)
(each file (glob "tests/*.janet" :x)
(printf "running %s" file)
($ janet ,file))
And it returns an array of files that match the glob:
./run-tests
running tests/bar.janet
running tests/foo.janet
:x
tells the glob to return the empty array if no files match the glob. The default behavior is to return @["tests/*.janet"]
, just like Bash does. :x
is like shopt -s nullglob
.
Because glob
returns an array of strings, you’ll have to splice
on its output in order to use it inside one of the $
macros:
repl:1:> ($ echo ;(glob "tests/*.janet"))
tests/bar.janet tests/foo.janet
nil
sh
won’t auto-split variables into multiple arguments, even when it sees a list.
Finally, the $
macros support a binary operator called ^
. ^
will concatenate strings together into a single argument:
repl:1:> (def hello "hello")
nil
repl:2:> ($ echo ,hello ^ world)
helloworld
nil
This can be handy for constructing file names and paths.
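Here’s the kind of place it comes in handy (a sketch; the file names are made up):
(use sh)
(def name "notes")
# builds "notes.txt" and "notes.txt.bak" as single arguments
($ cp ,name ^ .txt ,name ^ .txt.bak)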
Okay! You now know almost everything there is to know about sh
. Let’s try it out!
We’ll start small. Here’s a simple Bash script that checks for .c
files without corresponding .h
files:
for file in *.c; do
if [[ ! -e "$(basename "$file" .c).h" ]]; then
echo "$file is missing a header file"
fi
done
We can rewrite that in Janet, using ^
notation:
(each file (glob "*.c")
(unless ($? test -e ($<_ basename ,file .c) ^ .h)
(print file " is missing a header file")))
Or with the more explicit:
(each file (glob "*.c")
(unless ($? test -e (string ($<_ basename ,file .c) ".h"))
(print file " is missing a header file")))
Which is more to type, but I find it easier for my brain to parse.
Now, I know that wasn’t very interesting. That was a pretty contrived example.
So let’s try a real shell script.
Hmm.
Here’s one I wrote forever ago that I’ve gotten a lot of mileage out of. It prints the description of a Nix package, because somehow there is no built-in command to do this:
set -euo pipefail
nix-env -qaA "nixpkgs.$1" --json --meta \
| jq -r '.[] | .name + " " + .meta.description,
"",
(.meta.longDescription | rtrimstr("\n"))'
We can port this into Janet pretty easily:
#!/usr/bin/env janet
(use sh)
($ nix-env -qaA nixpkgs. ^ (in (dyn *args*) 1) --json --meta
| jq -r `.[] | .name + " " + .meta.description,
"",
(.meta.longDescription | rtrimstr("\n"))`)
Three things to notice about this:
Janet expressions can appear right inside the $
invocation, and as long as they’re wrapped in parentheses we don’t need to unquote them.
`backtick`
-quoted strings are a really nice way to sidestep shell quoting problems.
(in (dyn *args*) 1)
is a whole lot more to type than $1
.
It works, though:
./nix-info git
git-2.37.1 Distributed version control system
Git, a popular distributed version control system designed to
handle very large projects with speed and efficiency.
But is this any better than writing Bash? No. Not really. This is a platonic ideal shell script: start a process, pipe the output to another process, exit. Janet isn’t really bringing anything new to the table here.
But I think it’s really impressive that the Janet implementation is not worse than the equivalent Bash script. Subprocess DSLs that I have seen in JavaScript and Python add a lot more noise.
In fact, the only thing I don’t like about the Janet version is the argument handling — (in (dyn *args*) 1)
is pretty messy looking. We could simplify this a little by wrapping it in a main
function and reading the arguments from its parameters, but I’m going to propose an alternative approach:
#!/usr/bin/env janet
(use sh)
(import cmd)
(cmd/def package (required :string))
($ nix-env -qaA nixpkgs. ^ ,package --json --meta
| jq -r `.[] | .name + " " + .meta.description,
"",
(.meta.longDescription | rtrimstr("\n"))`)
cmd
is a library for parsing command-line arguments. (cmd/def package (required :string))
says that our executable takes one required positional argument, a string, and binds it to the symbol package
.
I wrote cmd
, so I’m certainly biased, but I think that sh
and cmd
together provide a better interface for writing “shell scripts” than Bash does — or indeed any higher-level language. The above doesn’t really showcase what it can do, but as soon as we add a named argument:
#!/usr/bin/env janet
(use sh)
(import cmd)
(cmd/def
package (required :string)
--name (flag))
(defn query-name [name]
($< nix-env -qa ,name --json --meta))
(defn query-attr [attr]
($< nix-env -qaA nixpkgs. ^ ,attr --json --meta))
($ <(if name (query-name package) (query-attr package))
jq -r `.[] | .name + " " + .meta.description,
"",
(.meta.longDescription | rtrimstr("\n"))`)
Now we can type ./nix-info git
or ./nix-info git --name
or ./nix-info --name git
, and the cmd
library will assign a boolean value to name
based on whether or not the flag was specified.
This is still extremely simple, but I hope that you can see how this scales better than the equivalent command-line argument parsing code in Bash. Of course you could write this in Bash, and the code wouldn’t even be too bad, since we’re not trying to parse named arguments that have values after them. But we could, with cmd
.
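For example, a named argument that takes a value is just another pair of lines in the cmd/def. Here’s a hedged sketch (the --channel flag is mine, and I’m assuming (optional :string) leaves the binding nil when the flag is omitted):
(import cmd)
(cmd/def
  package (required :string)
  --channel (optional :string))
(print "package: " package)
(print "channel: " (or channel "nixpkgs"))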
By using cmd/def
, we also got an autogenerated --help
flag, which, at the moment, is pretty sparse:
./nix-info --help
./nix-info STRING
=== flags ===
[--help] : Print this help text and exit
[--name] : undocumented
Though by adding a few more annotations:
(cmd/def "Print the description of a Nix derivation."
package (required ["<package>" :string])
--name (flag) "Query by name instead of attribute")
We can get slightly better --help
output:
./nix-info --help
Print the description of a Nix derivation.
./nix-info <package>
=== flags ===
[--help] : Print this help text and exit
[--name] : Query by name instead of attribute
This is a very small taste of what cmd
can do, and I’m not going to give an exhaustive description of it in this book. But for a slightly larger taste:
# named arguments start with a hyphen
(cmd/def --foo (required :string))
# positional arguments don't
(cmd/def foo (required :number))
# arguments can be optional
(cmd/def --foo (optional :file))
# you can specify multiple aliases for a named argument
(cmd/def [--foo -f] (optional :string))
# as well as a separate name to use for the Janet variable
(cmd/def [bar --foo] :number)
cmd
is a lot less interesting than sh
, because it should — hopefully — just stay out of your way. The official documentation describes how to use it in great detail, but most of the scripts you write will probably just have a flag or two, and the above examples should be sufficient.
So: with the combined power of sh
and cmd
, we can actually replace a lot of shell scripts with Janet scripts. And by doing so we get saner error handling, we don’t have to worry about word-splitting, and we have full access to sequential and associative arrays that actually make sense.
But we also have access to a superpower of Janet: PEGs.
Writing scripts with PEGs is so much nicer than wrangling Awk or Sed invocations that I think it’s pretty hard to go back once you’ve done it a few times. And I say that as someone who loves Sed — and who almost tolerates Awk.
So. With these three superpowers at our disposal — sh
, cmd
, and Janet’s native PEGs — let’s write a little project together.
I want to try writing a little todo list in Janet. It will be very, very simple, but at the end we’ll have something that is actually usable and perhaps even useful. And it can serve as a starting point for a custom todo list app exactly tailored to your personal workflow.
We’ll store our todo list in a plain text file that looks like this:
- [ ] this is a task to do
- [x] this one is already done
This is a pretty simple file format — we could parse this with Sed no problem. But by writing this with PEGs, we’ll actually be able to support tasks that span multiple lines. Which I fully admit is not very useful, but it’s neat that we can do it.
Our “app” will expose the following command line interface:
to do: print the todo list
to do 'some task': add a new task
to done: mark tasks completed
This is a very barebones interface, but it’s a good starting point.
When we print the todo list, we’ll strike out completed tasks, and word wrap any longer tasks to the width of your terminal. Like this:
- [x] a completed task
- [ ] pretend like this is a task and
you have a very narrow terminal
- [ ] a shorter task
But rather than writing our own line-wrapping function, we’ll just shell out to the standard fold
command. And rather than querying the terminfo database directly, we’ll just shell out to tput
.
When you mark a task completed, we’ll actually show an interactive menu from which you can select tasks. And we’ll do that by just shelling out to fzf
, which will even let us support crossing off multiple tasks at once.
Of course we could implement all of this functionality in pure Janet, but by harnessing the power of sh
we can implement it very easily. In fact this whole program — with nicely formatted output and fzf
-powered interactive multi-select — will weigh under 100 lines.
99 lines, to be exact:
#!/usr/bin/env janet
(use sh)
(import cmd)
(defn strikethrough [text] (string "\e[9m" text "\e[0m"))
(def todo-file (string/format "%s/todo" (os/getenv "HOME")))
(def char-to-state {" " :todo "x" :done})
(def state-to-char (invert char-to-state))
(def task-peg (peg/compile
~{:main (* (any (* :task (+ "\n" -1))) -1)
:state (cmt (* "- [" (<- (to "]")) "]") ,|(char-to-state $))
:text (/ (<- (to (+ "\n- [" -1))) ,string/trim)
:task (/ (* :state :text) ,|@{:state $0 :text $1})}))
(defn parse-tasks []
(assert (peg/match task-peg (slurp todo-file))
"could not parse todo list"))
(def cols (scan-number ($<_ tput cols)))
(defn print-task [{:state state :text text}]
(def decorate (case state
:done strikethrough
identity))
(def prefix (string/format "- [%s] " (state-to-char state)))
(def indent (string/repeat " " (length prefix)))
(def wrap-width (- cols (length prefix)))
(def wrapped-text ($< fold <,text -s -w ,wrap-width))
(def lines (string/split "\n" wrapped-text))
(eachp [i line] lines
(print
(if (= i 0) prefix indent)
(decorate line))))
(defn print-tasks [tasks]
(each task (sort-by |(in $ :state) tasks)
(print-task task)))
(defn first-word [str]
(take-while |(not= $ (chr " ")) str))
(defn save-tasks [tasks]
(def temp-file (string todo-file ".bup"))
(with [f (file/open temp-file :a)]
(each {:state state :text text} tasks
($ printf -- "- [%s] %s\n" (state-to-char state) ,text >>,f)))
($ mv ,temp-file ,todo-file))
(cmd/defn to-done "cross something off" []
(def tasks (parse-tasks))
(def input @"")
(loop [[i {:state state :text text}] :pairs tasks
:when (= state :todo)]
(buffer/push-string input
(string/format "%d %s" i text))
(buffer/push-byte input 0))
(when (empty? input)
(print "nothing to do!")
(os/exit 0))
(def output @"")
(def [exit-status]
(run fzf <,input >,output --height 10 --multi --print0 --with-nth "2.." --read0))
(def selections
(case exit-status
0 (drop -1 (string/split "\0" output))
1 []
2 (error "fzf error")
130 []
(error "unknown error")))
(each selection selections
(def task-index (scan-number (first-word selection)))
(def task (in tasks task-index))
(set (task :state) :done)
(print-task task))
(unless (empty? selections)
(save-tasks tasks)))
(defn append-task [text]
(with [f (file/open todo-file :a)]
(file/write f (string/format "- [ ] %s\n" text)))
(print-task {:state :todo :text text}))
(cmd/defn to-do "add or list tasks"
[task (optional ["<task>" :string])]
(if task
(append-task task)
(print-tasks (parse-tasks))))
(cmd/main (cmd/group "A very simple task manager."
do to-do
done to-done))
And that’s, you know, 99 comfortably spaced lines of code.
I’m not going to go over the whole thing, because 99 lines is pretty short for a real program but pretty long for a program in a book. But I do want to hit the highlights.
First off, we parse the todo list with a PEG.
(def task-peg (peg/compile
~{:main (* (any (* :task (+ "\n" -1))) -1)
:state (cmt (* "- [" (<- (to "]")) "]") ,|(char-to-state $))
:text (/ (<- (to (+ "\n- [" -1))) ,string/trim)
:task (/ (* :state :text) ,|@{:state $0 :text $1})}))
This might look a little complicated at first, until you realize that it correctly parses hard-wrapped, multi-line tasks — something that’s fairly difficult to do with a plain old regular expression.
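If you want to convince yourself, here’s a hedged sketch of feeding the PEG a wrapped task directly (the exact printed form will vary, but you should get one table per task):
(peg/match task-peg (string
  "- [ ] a task that wraps\n"
  "      onto a second line\n"
  "- [x] a finished task\n"))
# => roughly @[@{:state :todo :text "a task that wraps\n      onto a second line"}
#              @{:state :done :text "a finished task"}]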
That’s not very shelly, though; that’s just Janet. So let’s take a look at task pretty-printing:
(def cols (scan-number ($<_ tput cols)))
(defn print-task [{:state state :text text}]
(def decorate (case state :done strikethrough identity))
(def prefix (string/format "- [%s] " (state-to-char state)))
(def indent (string/repeat " " (length prefix)))
(def wrap-width (- cols (length prefix)))
(def wrapped-text ($< fold <,text -s -w ,wrap-width))
(def lines (string/split "\n" wrapped-text))
(eachp [i line] lines
(print
(if (= i 0) prefix indent)
(decorate line))))
fold
wraps text to the specified width, and tput
can tell us how wide the terminal is — something that would otherwise require writing a native Janet module, because the standard library doesn’t expose this.
To implement task selection, we construct a buffer of null-terminated strings that we pass to fzf
. We use run
to get the exit code, because fzf
returns 130
if the user presses escape to cancel, and we want to handle that gracefully. Why 130
? No one knows.
(def output @"")
(def [exit-status] (run fzf <,input >,output --height 10 --multi --print0 --with-nth "2.." --read0))
(def selections
(case exit-status
0 (drop -1 (string/split "\0" output))
1 []
2 (error "fzf error")
130 []
(error "unknown error")))
This is a big departure from how I normally program. But I was able to hack up this todo list in, like, thirty minutes. If I weren’t using fzf
to do all the heavy-lifting, I’d probably still be reading about curses bindings and ANSI escape codes right now. And if I had written this in pure shell, I’d still be working on the Sed script to parse multi-line tasks.
This is the beauty of this kind of hybrid scripting: it’s quick, it’s dirty, but it already works. This program does nothing to protect against concurrent writes, it makes far more syscalls than it needs to, and it spawns processes without any concern for the overhead. But none of that really matters, for a program used interactively by a single person.
Now, I don’t think that Janet can replace shell scripts altogether. sh
and cmd
make a pretty good argument, but Bash still has a lot to recommend it: there’s no equivalent of trap EXIT
in Janet, nor is there an analog of the extremely-useful cp foo.bar{,.bup}
expansion shorthand. It’s a lot more verbose to set and reference environment variables in Janet, and there’s no ~/foo
or ~user/foo
shorthand for specifying home directories. You can’t spawn background jobs at all, and Janet has no job control facilities.
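For what it’s worth, the environment-variable dance isn’t bad, just wordier. Something like this (os/getenv and os/setenv are both in the core library):
# Bash: export FOO=bar
(os/setenv "FOO" "bar")
# Bash: "$HOME/todo" (there's no ~ shorthand, so spell it out)
(def todo-file (string (os/getenv "HOME") "/todo"))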
So I don’t expect Janet to displace Bash for you entirely. But I think it can absolutely displace Perl, or Python, or Ruby, or whatever higher-level scripting language you currently reach for when your shell scripts get too long.
All the way back in Chapter Three, we looked at the following macro:
(defmacro each-reverse [identifier list & body]
~(do
(var i (- (length ,list) 1))
(while (>= i 0)
(def ,identifier (in ,list i))
,;body
(-- i))))
And we talked about two problems that it has.
The first is that we can’t use i
as the name of our looping variable:
(each-reverse i [1 2 3 4 5]
(print i))
Because the macro uses i
as the index variable already:
(do
# from the macro itself
# ↓
(var i (- (length [1 2 3 4 5]) 1))
(while (>= i 0)
# from the macro's arguments
# ↓
(def i (in [1 2 3 4 5] i))
(print i)
(-- i)))
The second is that the abstract syntax tree that we call list
actually appears in two places in the expansion:
(each-reverse x (do (os/sleep 1) [1 2 3 4 5])
(print x))
Which means that it’s ultimately going to be evaluated twice, duplicating work and possibly even performing side effects multiple times:
(do
# first sleep
# ↓
(var i (- (length (do (os/sleep 1) [1 2 3 4 5])) 1))
(while (>= i 0)
# second sleep
# ↓
(def x (in (do (os/sleep 1) [1 2 3 4 5]) i))
(print x)
(-- i)))
Fixing the second problem is pretty easy: we can just evaluate list
once, and store the result in a variable:
(defmacro each-reverse [identifier list & body]
~(do
(def list ,list)
(var i (- (length list) 1))
(while (>= i 0)
(def ,identifier (in list i))
,;body
(-- i))))
Except, well, that just made the first problem worse. Now not only is i
off limits, but we’ve shadowed the word list
as well.
It’s not hard to fix this, but I think that the fix is pretty confusing the first time you see it.
The trick is that, instead of using identifiers like i
and list
, we’re going to generate new, unique identifiers that won’t clash with any other symbols.
We do this with a function called gensym
.
repl:1:> (gensym)
_000001
_000001
is a unique identifier that has not been used anywhere else in the program before. Janet knows it’s unique because, remember, all symbols are stored in an interning table, and gensym
consults that table to make sure it’s really giving you a unique symbol every time you call it. If Janet had parsed a symbol called _000001
in your program already, it wouldn’t return _000001
:
repl:1:> (def _000001 'hi)
hi
repl:2:> (gensym)
_000002
See?
Yes, you could deliberately construct a symbol called _000001
dynamically in your program in such a way that it doesn’t get into the interning table until after you call gensym
, if you really wanted to. gensym
returns a unique symbol at the time that it’s called, but it’s still possible for you to conspire to create a collision afterwards.
We can use gensym to create names for the variables in our macro. Instead of i
, we’ll use something like _000001
. And instead of list
, we’ll use something like _000002
.
And here’s where it gets weird.
(defmacro each-reverse [identifier list & body]
(def $list (gensym))
(def $i (gensym))
~(do
(def ,$list ,list)
(var ,$i (- (length ,$list) 1))
(while (>= ,$i 0)
(def ,identifier (in ,$list ,$i))
,;body
(-- ,$i))))
So $list
is a symbol. The symbol (symbol "$list")
. I wrote that in my program; that is a real symbol that exists.
But $list
is also a variable name — or well, the name of an immutable binding, but whatever. We’ll drop the pedantry for a second; this is confusing enough already.
$list
is a variable, and $list
happens to be assigned to a value that is also a symbol — a symbol like _000001
.
The dollar sign prefix is just a convention; it stands for $ymbol — the value of $list
is the symbol that will eventually hold the value of list
. list
, remember, is an abstract syntax tree. So we have to unquote list
to put that abstract syntax tree into the abstract syntax tree that we’re returning from our macro. And we have to unquote $list
so that we make a variable called _000001
instead of a variable called $list
.
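If it helps, here’s the whole double-unquote distinction in miniature (the exact gensym’d name will differ on your machine):
(def $list (gensym))  # $list is now bound to a fresh symbol, e.g. _000123
~(def $list 1)        # => (def $list 1), oops: the literal symbol $list
~(def ,$list 1)       # => (def _000123 1), the gensym'd symbol we actually want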
So if we examine the expansion of our original problematic invocation:
(defmacro each-reverse [identifier list & body]
(def $list (gensym))
(def $i (gensym))
~(do
(def ,$list ,list)
(var ,$i (- (length ,$list) 1))
(while (>= ,$i 0)
(def ,identifier (in ,$list ,$i))
,;body
(-- ,$i))))
(each-reverse i [1 2 3 4 5]
(print i))
We’ll find something a lot like this:
(do
(def _000001 [1 2 3 4 5])
(var _000002 (- (length _000001) 1))
(while (>= _000002 0)
(def i (in _000001 _000002))
(print i)
(-- _000002)))
i
still shows up, because that’s the name we chose for the looping variable when we called the macro. But there’s no more conflict with the i
variable that the macro used internally to store the index — that has been replaced with a harmless _000002
.
So let’s notice a few things about this.
It’s, like, ten times harder to read (def ,$list ,list)
than it was to read (def list ,list)
.
Yeah, and it’s probably not going to get better while you’re reading this book. But after you write this a few times, you won’t have to think about it at all.
It’s really easy to forget to do this, and accidentally write a fragile macro.
At first. You do get used to it, though, and gensym
ing quickly becomes second nature. Which isn’t to say you won’t forget, occasionally — you will — but it will get less and less common with practice.
It’s really hard to detect fragile macros like this, because only very specific inputs are problematic.
These problems have caused a lot of people to spend a lot of time thinking about ways to improve on this state of affairs, and there are multiple different “hygienic” macro systems designed to prevent these sorts of mistakes.
But those techniques are out of scope for this book. This is a book about Janet, and Janet macros are filthy. Raw, unfiltered syntax tree transformations. They are very powerful, and they are very simple, and they are very easy to shoot yourself in the foot with.
One way that we can reduce the likelihood of shooting ourselves in the foot is to actually look at the macro expansions. We’ve seen how to do this in the repl with macex1
, but the output is pretty hard to read, and it’s easier to just test that the macro works without looking at its expansion.
But there’s a better way to test a macro expansion: Judge’s test-macro
:
(use judge)
(test-macro (each-reverse i [1 2 3 4 5] (print i)))
After we run this, Judge will fill in the expansion of the macro:
(test-macro (each-reverse i [1 2 3 4 5] (print i))
(do
(def <1> [1 2 3 4 5])
(var <2> (- (length <1>) 1))
(while
(>= <2> 0)
(def i (in <1> <2>))
(print i)
(-- <2>))))
Judge’s test-macro
code formatting is pretty basic, but it’s much better than dumping it all on one line. And notice that the gensym’d symbols are replaced by the shorter <1>
and <2>
identifiers, which will be stable across multiple invocations (whereas the actual underlying symbols will depend on the total number of times gensym
has run since the Janet VM started running).
test-macro
makes it easy to inspect macro expansions, but it has a side benefit as well: it can serve as auto-generated documentation about the behavior of complicated macros. And you can also use it if you have a question about the behavior of a built-in macro. For example, if you want to sanity check how ->>
works:
(test-macro (->> foo (map f) (filter pred))
(filter pred (map f foo)))
Using your source files as a repl like this is—
Okay, sorry, my editor is telling me that I’m not allowed to go on any teary-eyed tangents about the joys of testing in this chapter. We already did that, back in Chapter Eleven. Let’s get back to macros.
So now that we’ve covered the basic errors that you can make while writing a macro, let’s move on to some of the advanced errors.
Because, as complicated as our macro definition has become, it’s still a little bit fragile.
(each-reverse i [1 2 3 4 5]
(print i))
That works great. But this:
(def length 10)
(each-reverse i [1 2 3 4 5]
(print i))
Does not. And it’s easy to see why, when we look at the expansion:
(def length 10)
(do
(def _000001 [1 2 3 4 5])
(var _000002 (- (length _000001) 1))
(while (>= _000002 0)
(def i (in _000001 _000002))
(print i)
(-- _000002)))
Our macro expansion includes the symbol length
. But we really meant the function called length
from the standard library. That’s not what we said, though: we said the symbol length
, whatever that might be at the place where our macro was expanded.
The fix is easy, though: by unquoting length
, we actually include the function itself in our final abstract syntax tree:
(defmacro each-reverse [identifier list & body]
(def $list (gensym))
(def $i (gensym))
~(do
(def ,$list ,list)
(var ,$i (- (,length ,$list) 1))
(while (>= ,$i 0)
(def ,identifier (in ,$list ,$i))
,;body
(-- ,$i))))
Now if we test our macro expansion, we’ll wind up with something like this:
(def length 10)
(do
(def _000001 [1 2 3 4 5])
(var _000002 (- (<function length> _000001) 1))
(while (>= _000002 0)
(def i (in _000001 _000002))
(print i)
(-- _000002)))
But wait a minute. We fixed length
, but that’s just one function! And it’s not the only function here. There’s also -
— that’s a function in Janet, remember. And >=
, and in
. All functions. So we actually have to write:
(defmacro each-reverse [identifier list & body]
(def $list (gensym))
(def $i (gensym))
~(do
(def ,$list ,list)
(var ,$i (,- (,length ,$list) 1))
(while (,>= ,$i 0)
(def ,identifier (,in ,$list ,$i))
,;body
(-- ,$i))))
Okay. At this point we’re unquoting more things than we’re quoting, and we could consider abandoning the quasiquote syntax altogether:
(defmacro each-reverse [identifier list & body]
(def $list (gensym))
(def $i (gensym))
['do
['def $list list]
['var $i [- [length $list] 1]]
['while [>= $i 0]
['def identifier [in $list $i]]
;body
['-- $i]]])
I find that harder to read, personally, but you could write a macro that way if you wanted to.
But, well, wait a minute. We covered all the functions — but what about these macros? What about while
? What about def
?? What if someone wants to use this macro somewhere that they’ve redefined while
?
Well, you can’t redefine while
, actually. while
is a “special form,” a language primitive. There is no variable called while
for you to shadow:
repl:1:> (print while)
repl:1:1: compile error: unknown symbol while
And even if we try to shadow while
, (while)
will still mean the special built-in:
repl:1:> (var while 3)
3
repl:2:> (while (> while 0) (print while) (-- while))
3
2
1
nil
while
isn’t the only symbol that’s special-cased like this. There are, in fact, 13 “special forms” in Janet:
do
, upscope
def
, var
set
fn
while
, if
, break
quote
, quasiquote
, unquote
, splice
Everything in Janet — every function, every macro, every everything — is ultimately made out of those 13 building blocks. Or, well, or it’s written in C. Lots of stuff is written in C.
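You can watch an ordinary macro bottom out in those primitives with macex1. For example (the exact expansion might differ slightly between Janet versions):
(macex1 '(when true (print "hi")))
# => something like (if true (do (print "hi")))
# `when` is just `if` plus `do`, two of those thirteen building blocks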
But returning to our macro:
(defmacro each-reverse [identifier list & body]
(def $list (gensym))
(def $i (gensym))
~(do
(def ,$list ,list)
(var ,$i (,- (,length ,$list) 1))
(while (,>= ,$i 0)
(def ,identifier (,in ,$list ,$i))
,;body
(-- ,$i))))
There is still one symbol here that is not a special form.
--
is a normal macro, and if someone shadowed it and then used our each-reverse
macro, it wouldn’t work correctly.
(def -- :minus-minus)
(each-reverse i [1 2 3]
(print i))
And it isn’t obvious that that’s the case, right? I mean, we wrote each-reverse
, so we know that the macro expands to include --
. But if we were publishing this macro as part of a library, we don’t really want the users of our library to have to know anything about the expansion of the macro. We want it to just work, always and transparently, just like functions do.
Now, the unquoting trick that we used on functions doesn’t exactly work on macros. If you unquote a macro, it doesn’t unquote to a macro — it unquotes to the abstract-syntax-tree-transforming function that backs the macro. But we don’t want to call that function at runtime, when we eventually run the expanded code — we want to call that function at compile-time.
So there’s a sort of canonical way to do this, which is to use a macro called as-macro
. as-macro
is a trivial macro that takes a function and some arguments and calls the function at compile time. It lets us “unquote” macros, and we can use it to fix this macro definition:
(defmacro each-reverse [identifier list & body]
(def $list (gensym))
(def $i (gensym))
~(do
(def ,$list ,list)
(var ,$i (,- (,length ,$list) 1))
(while (,>= ,$i 0)
(def ,identifier (,in ,$list ,$i))
,;body
(as-macro ,-- ,$i))))
Except, of course, that we have only moved the problem.
If someone shadows as-macro
, we’re exactly back to where we started. as-macro
is not a special form, so it is possible to shadow it.
And look: no one is going to shadow as-macro
. I know that. You know that. If someone shadows as-macro
and then complains that your macros don’t work, that’s not… that’s just not a reasonable thing to try to protect against.
But still. We’ve already come this far. Let’s make this thing airtight.
So macros are just functions, right? Functions from abstract syntax trees to abstract syntax trees. And we can just directly call those functions at compile time — we don’t need to go through as-macro
at all.
(defmacro each-reverse [identifier list & body]
(def $list (gensym))
(def $i (gensym))
# we have to make a new function binding,
# because -- is a macro binding, and we
# want to call it as a function
(def fn-- --)
~(do
(def ,$list ,list)
(var ,$i (,- (,length ,$list) 1))
(while (,>= ,$i 0)
(def ,identifier (,in ,$list ,$i))
,;body
,(fn-- $i))))
And look: don’t do this. This is a fun exercise designed to show you that it is possible to write macros that are entirely insulated from their expanding environment.
But you shouldn’t actually write macros this defensively. We jumped the shark a few pages ago. You definitely shouldn’t worry about someone shadowing as-macro
. And you probably shouldn’t even worry about someone shadowing --
— it’s just not worth spending the time to defend against.
But there is still a very good reason to understand how to write macros that are indifferent to the environment in which they’re used.
The reason to care about all of this is that these techniques let us write macros that refer to private functions — functions that don’t exist at all in the environment in which the macro is used.
We could, for example, write our own version of the ++
macro:
(defn- plus-one [x]
(+ x 1))
(defmacro plus-plus [variable]
~(set ,variable (plus-one ,variable)))
And then use it from another file:
(import ./custom-macros)
(var x 0)
(custom-macros/plus-plus x)
(print x)
But that would, of course, give us an error:
janet main.janet
main.janet:4:1: compile error: unknown symbol plus-one
Because (custom-macros/plus-plus x)
expands to (set x (plus-one x))
, and plus-one
is not in scope. And neither is custom-macros/plus-one
— we defined it as a private function, after all.
But by unquoting the private plus-one
function, we can still refer to it from within our macro’s expansion:
(defn- plus-one [x]
(+ x 1))
(defmacro plus-plus [variable]
~(set ,variable (,plus-one ,variable)))
janet main.janet
1
There’s not really a reason to write a “private macro,” because you can just write a private function instead and call that, like we did in the fn--
example. But you can use this to refer to otherwise public macros without needing to know exactly what name they’re bound to in the calling environment — be it foo/my-macro
or just my-macro
. You can use as-macro
to paper over those naming differences, because you know that as-macro
is always going to be called as-macro
.
Alright. Now let’s go back to a reasonable version of our macro:
(defmacro each-reverse [identifier list & body]
(def $list (gensym))
(def $i (gensym))
~(do
(def ,$list ,list)
(var ,$i (- (,length ,$list) 1))
(while (>= ,$i 0)
(def ,identifier (in ,$list ,$i))
,;body
(-- ,$i))))
As you can see this assumes that -
, >=
, in
, and --
exist with their normal definitions in the calling environment. I still unquoted length
, because I think that’s a common variable name. I’ll also unquote functions like tuple
or struct
. There’s an element of human judgment here.
Note that sometimes you actually shouldn’t unquote functions, even if you can. For example, there’s a macro in the standard library called +=
. It’s defined like this:
(defmacro += [x n]
~(set ,x (,+ ,x ,n)))
And that’s actually really annoying!
If we shadow the function called +
— say, because we want to overload it to work over tuples — then +=
no longer does what I would expect it to. I want it to be the case that (+= x 1)
is short for (set x (+ x 1))
, but it’s not. It’s short for (set x (<function +> x 1))
. It always invokes the +
from root-env
, even when we have a different +
in scope.
So this is a case where I think it’s better to be deliberately unhygienic.
So, okay. We’ve written a reasonable macro. Now let’s make it look a little nicer.
It’s pretty common for macros to start by declaring a bunch of gensym
’d variables, and there’s a helper macro that makes that a little bit easier. It’s called with-syms
, and we can use it to replace our explicit gensym
calls with this:
(defmacro each-reverse [identifier list & body]
(with-syms [$list $i]
~(do
(def ,$list ,list)
(var ,$i (- (,length ,$list) 1))
(while (>= ,$i 0)
(def ,identifier (in ,$list ,$i))
,;body
(-- ,$i)))))
It’s also common to use let
to bind those temporary variables to their corresponding abstract syntax trees, so that we don’t need the explicit do
:
(defmacro each-reverse [identifier list & body]
(with-syms [$list $i]
~(let [,$list ,list]
(var ,$i (- (,length ,$list) 1))
(while (>= ,$i 0)
(def ,identifier (in ,$list ,$i))
,;body
(-- ,$i)))))
Although in this case $i
is going to be a variable, not a binding, so it has to stay out of the let
. But that still saved us one line.
You’ll see this pattern a lot when you’re writing macros:
(defmacro something [foo bar]
(with-syms [$foo $bar]
~(let [,$foo ,foo
,$bar ,bar]
...)))
So it’s a good idea to try to write it once or twice so that it makes sense.
I think that this final version is a pretty good macro:
(defmacro each-reverse [identifier list & body]
(with-syms [$i $list]
~(let [,$list ,list]
(var ,$i (,dec (,length ,$list)))
(while (>= ,$i 0)
(def ,identifier (in ,$list ,$i))
,;body
(-- ,$i)))))
You could unquote a little more, you could expand the --
macro ahead of time, but I think that’s how I would write this one.
In fact, that’s how I did write that macro, all the way back in Chapter One. Remember that? This was the code that I showed to try to scare you away from Janet before you had a chance to fall in love. Not so scary now, is it?
So let’s try something scarier.
Back in Chapter Eight, I presented a hypothetical macro designed to loosely mimic JavaScript’s class
syntax:
(class Counter
constructor (fn [self] (set (self :_count) 0))
add (fn [self amount] (+= (self :_count) amount))
increment (fn [self] (:add self 1))
count (fn [self] (self :_count)))
And now that we understand gensym
, we can actually write such a macro. Like this:
(defmacro class [name & methods]
(def proto @{})
(var constructor nil)
(each [name impl] (partition 2 methods)
(if (= name 'constructor)
(set constructor impl)
(put proto (keyword name) impl)))
(with-syms [$proto $constructor]
~(def ,name
(let [,$proto ,proto
,$constructor ,constructor]
(fn [& args]
(def self (,table/setproto @{} ,$proto))
(,$constructor self ;args)
self)))))
It’s a bit more complicated, but I think you’re ready for it.
First off, we use partition
to chunk the input into pairs — so given a list like ['foo 1 'bar 2 'baz 3]
, it’ll give us [['foo 1] ['bar 2] ['baz 3]]
. Then we iterate over those pairs, convert symbols like 'add
into keywords like :add
, and then stick them into a table. Except for the symbol constructor
, which is special-cased.
After the loop runs, we’ll have a table from keywords to abstract syntax trees. Note that we haven’t evaluated the functions yet! They’re just syntax trees at this point, so our proto
table will look like this:
@{:add ['fn '[self amount] ['+= ['self :_count] 'amount]]
:increment ['fn '[self] [:add 'self 1]]
:count ['fn '[self] ['self :_count]]}
Similarly, constructor
is not a constructor function — it’s an abstract syntax tree that will become the constructor function after we evaluate it.
['fn '[self] ['set ['self :_count] 0]]
We can’t evaluate these abstract syntax trees directly, but we can return them from our macro for Janet to evaluate later. We have to make sure to only evaluate them once, so we use with-syms
to mint temporary names to store the evaluated results in.
The rest is hopefully straightforward. Or at least… tractable. The final expansion looks something like this:
(def Counter
(let [_000001 @{:add (fn [self amount] (+= (self :_count) amount))
:count (fn [self] (self :_count))
:increment (fn [self] (:add self 1))}
_000002 (fn [self] (set (self :_count) 0))]
(fn [& args]
(def self (<cfunction table/setproto> @{} _000001))
(_000002 self (splice args))
self)))
So the prototype is evaluated once, and then the constructor function is evaluated once, and then we create a function that creates a new table, sets its prototype, calls the constructor function, and then finally returns the table. And we call that function Counter
.
That wasn’t too bad, right?
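And just to check that the expansion behaves the way a class should, here’s a quick usage sketch (the numbers are mine, not the chapter’s):
(def c (Counter))
(:increment c)  # count is now 1
(:add c 10)     # count is now 11
(:count c)      # => 11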
Because we’re manipulating syntax trees, we could actually go a little further. We could ditch the fn
, and implicitly include self
, so that we just write something like this instead:
(class Counter
(constructor [] (set (self :_count) 0))
(add [amount] (+= (self :_count) amount))
(increment [] (:add self 1))
(count [] (self :_count)))
I’m not saying that’s better, but it’s a thing that we could do. We’d have to go in and modify the inputs so that each method still expands to something like (fn [self] ...)
, even though the user never writes self as an argument.
We could also add some kind of extends
syntax for subclassing, if we wanted to. We could do anything! It’s just a question of manipulating abstract syntax trees. Carefully manipulating abstract syntax trees — don’t forget about the gensym
ing and the function unquoting.
Alright. At this point, I think that you’re ready to go out into the world and write macros safely and robustly. But before we leave, I want to talk about “abstract syntax trees.”
I’ve used that term a lot, but I never actually explained what I meant by it. After all, the things I’m calling abstract syntax trees in one place could be safely called symbols or tuples in other places. Because that’s what they are.
Back in Chapter One, we talked about all the different values of Janet. And we talked about them as normal values and data structures. But every Janet value is, simultaneously, an abstract syntax tree. And all I mean by that is that you can pass any Janet value to the compile
function, and it will give you back a nullary function that does something with it.
But what, exactly? What does it mean for a struct to be an abstract syntax tree? Or a buffer? How does that work?
Well, let’s find out. We’ll go over all of the values of Janet one more time, and consider them no longer as regular values, but as “abstract syntax trees” representing Janet programs.
We’ll start simple. Most values just evaluate to themselves. This includes simple “atomic” values like numbers, strings, nil
, booleans, and keywords:
repl:1:> ((compile "hello"))
"hello"
repl:2:> ((compile 123))
123
Functions and cfunctions also evaluate to themselves — that’s why we were able to unquote functions earlier in this chapter.
repl:3:> ((compile pos?))
<function pos?>
repl:4:> ((compile int?))
<cfunction int?>
And so do fibers, which is a little bit weird. It’s weird because fibers are mutable: if you write a macro that returns an abstract syntax tree that contains a fiber, then that’s going to be the same fiber every time you call it:
repl:5:> (def count (coro (yield 1) (yield 2) (yield 3)))
<fiber 0x600003FBC1C0>
repl:6:> (defmacro counter [] ~(do (each x ,count (print x))))
<function counter>
repl:7:> (counter)
1
2
3
nil
repl:8:> (counter)
nil
Functions are mutable too, in a way — they can close over variables or mutable values, such that they behave differently every time they’re called. But fibers feel more “obviously” mutable to me.
Abstract types and pointers also evaluate to themselves, always, even if the underlying type is mutable.
repl:9:> (def peg (peg/compile "abc"))
<core/peg 0x6000037BC340>
repl:10:> ((compile peg))
<core/peg 0x6000037BC340>
Symbols are the first things that don’t evaluate to themselves. Symbols evaluate to, well, a lookup of that symbol.
repl:11:> (compile 'foo)
@{:error "unknown symbol foo"}
repl:12:> ((compile 'peg))
<core/peg 0x6000037BC340>
If you want to evaluate the symbol itself, then you have to quote it:
repl:12:> ((compile '(quote foo)))
foo
Or, more cryptically:
repl:13:> ((compile ''peg))
peg
Sometimes you’ll want to write macros that take symbols as inputs and return the actual symbols themselves, not the values they become. To do this, you need to explicitly quote
them:
repl:14:> (defmacro symbolton [sym val] ~{(quote ,sym) ,val})
<function symbolton>
repl:15:> (symbolton x 1)
{x 1}
I think that’s the most explicit way to write this, but once you’re more comfortable with quasiquoting, you might prefer the more cryptic:
(defmacro symbolton [sym val] {~',sym val})
One of my, umm, favorite bits of Janet is |~',$
, which is an anonymous function that returns its argument as a quoted form — useful for mapping over a list of symbols and quoting each of them. Spelled out more explicitly, it’s just (fn [$] ~(quote ,$))
.
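Here’s a tiny sketch of that idiom in action (hedged; the printed representation may vary):
(map |~',$ ['x 'y 'z])
# => roughly @[(quote x) (quote y) (quote z)]
# each symbol is now wrapped in a quote form, so it will evaluate to
# itself if it ends up inside a macro expansion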
Okay. Continuing onward: tuples become function invocations:
repl:17:> ((compile ['+ 1 2]))
3
repl:18:> ((compile ~(+ 1 2)))
3
Although the first argument can be something other than a symbol. It could be an actual function:
repl:19:> ((compile [+ 1 2]))
3
repl:20:> ((compile ~(,+ 1 2)))
3
Or it could be any other “callable” value:
repl:21:> ((compile [{:foo 123} :foo]))
123
But wait a minute.
Tuples become invocations. But what if we just want to return a tuple?
Well, I have somehow managed to skirt this fact for this entire book, but there are actually two kinds of tuples in Janet: bracketed tuples and parenthesized tuples.
Confusingly, you make a parenthesized tuple using square brackets, like [1 2 3]
, or by quoting parentheses, like '(1 2 3)
, or by using the tuple
function.
repl:22:> [1 2 3]
(1 2 3)
repl:23:> '(1 2 3)
(1 2 3)
repl:24:> (tuple 1 2 3)
(1 2 3)
And so far all the tuples we’ve been compiling have been parenthesized tuples.
But you can make bracketed tuples by quoting brackets, or by using the tuple/brackets
function:
repl:25:> '[1 2 3]
[1 2 3]
repl:26:> (tuple/brackets 1 2 3)
[1 2 3]
You will probably only encounter bracketed tuples when you’re writing macros — apart from compile
, they behave identically to normal, “parenthesized” tuples at runtime.
You can inspect the type of the tuple like this:
repl:27:> (tuple/type '(1 2 3))
:parens
repl:28:> (tuple/type '[1 2 3])
:brackets
When you compile a bracketed tuple, you get a regular, parenthesized tuple back:
repl:29:> ((compile '[1 2 3]))
(1 2 3)
Which makes sense: evaluating the abstract syntax tree '[1 2 3]
does exactly the same thing that typing the characters [1 2 3]
into a text file would do.
Note that, when you’re compiling such tuples, every element in the tuple is treated as an abstract syntax tree as well. It also gets compiled according to the rules that we’re setting out here:
repl:30:> ((compile (tuple/brackets [+ 1 2] 4)))
(3 4)
The same is true for arrays. Every element of the array is interpreted as an abstract syntax tree:
repl:31:> (def foo @[1 [+ 1 1] 3])
@[1 (<function +> 1 1) 3]
repl:32:> ((compile foo))
@[1 2 3]
But note that this will always return a new array, even if there’s nothing to “evaluate” inside of it:
repl:33:> (def bar @[])
@[]
repl:34:> (= bar ((compile bar)))
false
Which makes sense: typing @[]
into a file also always creates a new array.
Structs evaluate their keys and their values as abstract syntax trees, and return a struct containing the results of that evaluation:
repl:35:> ((compile {'+ 1}))
{<function +> 1}
And tables do the same thing, but, like arrays, they always return a new table:
repl:36:> ((compile @{''plus '+}))
@{plus <function +>}
Note that if a struct or a table has a prototype, it is completely ignored during compilation.
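A quick sketch of what that means in practice (my example, not the book’s):
(def proto @{:kind :animal})
(def dog (table/setproto @{:name "rex"} proto))
(def compiled-dog ((compile dog)))
(table/getproto compiled-dog)
# => nil; the copy has the same keys and values, but the prototype is gone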
Finally, buffers — mutable strings — evaluate to copies of themselves:
repl:37:> (def hello @"hello")
@"hello"
repl:38:> ((compile hello))
@"hello"
repl:39:> (= hello ((compile hello)))
false
And those are all of the values of Janet. Again.
Every value is a valid argument to the compile
function. Every value, if given the chance, can become a brand new Janet program. We have witnessed the duality between code and data, and we have emerged enlightened.
Or, well, I shouldn’t speak for you, I guess.
Do you feel enlightened?
Did you get anything out of this?
Has this book made you better, or wiser, or stronger?
Has it inspired you to give Janet a try?
Let me know in the comments.
Er, the (say)
function, that is.