Janet for Mortals
(a real book)
by Ian Henry

Hello and thank you for coming to my book.

I want to tell you about Janet, because I think that Janet is a very good language and it’s a shame that you haven’t heard of it yet. I like Janet so much that I wrote an entire book about it. Look:

Wow! Look at all those chapters. Just like a real book.

This isn’t one of those chapters yet, though. This is the before part. If this were a real book, which it is, this part might be labeled “Introduction.”

Introduction

Oh, good.

Janet occupies a really valuable niche: it is a small, simple language that is actually usable. It has an elegant simplicity that you might associate with someone’s hobby project, but you can actually run it on Windows. It has built-in concurrency and multithreading, and it is an excellent language for text-wrangling, thanks to the native support for parsing expression grammars (think better regular expressions). It has a simple C FFI, so the package ecosystem is “all of them,” as long as you’re willing to write a few lines of binding code first. And thanks to the lightweight runtime, it’s very easy to use Janet as an embedded scripting language.

My favorite feature of Janet, though, is something that sounds really dumb when I say it out loud: you can actually distribute Janet programs to other people. You can compile Janet programs into statically-linked native binaries and give them to people who have never even heard of Janet before. And they can run them without having to install any gems or set up any virtual environments or download any runtimes or anything else like that.

This means that if you want to use Janet for something, you can, and no one even needs to know.

And to be clear, I’m not going to try to convince you to bet your next startup on Janet, or even to use it in any sort of production setting. But I think it’s an excellent language for exploratory programming, scripting, and fun side projects. I’ve personally used Janet in a few capacities:

I have yet to do anything smart with it, though.

about this book

This book assumes that you already know how to program; I’m not going to waste your time explaining what a for loop is. Specifically this book assumes that you already know how to program in JavaScript, because it’s better to have something concrete to diff against, and I’m willing to bet that even if it isn’t your first language, you know enough JavaScript to be able to follow along. And if you don’t, then, well, you’ll probably still be fine. All programming languages are basically the same.

With that in mind I’m going to emphasize what makes Janet different early on. I’ll cover the whole language, but I’ll talk about macros and images and PEGs before I talk about, like, if statements. Is that a good way for a book to present information? I don’t know. We’re going to find out together.

about this website

This book is a real book, as previously established. But it is also undeniably a website, at this particular moment. As such it has the full power of cyberspace at its disposal, and there are some features you should be aware of before we get started.

The first is that this book contains a repl, and you can summon it whenever you’d like by pressing the escape key on your keyboard. The book will then start downloading like a megabyte of JavaScript and WebAssembly, and once it’s done you will be able to try out Janet right here in the comfort of your browser. No need to install anything; no need to leave the comfort of this book website if you’d like to test something out.

The repl is not just a repl, though. It is also a portal into conversation with me, the author. You can use the repl to report typos or factual errors, ask questions, or express confusion. I won’t be able to respond in the repl, but if you include some kind of contact information in your reports I will make an effort to follow up with you. Here, why don’t you try it now? Open up the repl and type something like this:

(report "hey nice book")

Fun, right?

about this author

Oh that’s me.

I’m just a fan of Janet; I am not affiliated with the language in any way. I have no real qualifications to be writing a book about it, and nothing that you read here should be considered authoritative, idiomatic, or educational.

Chapter One: Values and References

Alright, let’s get this over with.

(print "hello world")

Janet has parentheses. Okay? That’s all I’m going to say about it. There are some parentheses here. Maybe more than you’re used to. Maybe more than you’re comfortable with. I’m not going to try to convince you that parentheses are somehow morally superior to curly braces, or waste your time claiming that really it’s the same number of parentheses and they’re just shifted over a little. In fact I’m going to try to talk as little as possible about the parentheses, because they just aren’t very interesting at this stage. They’ll get interesting, once we start talking about macros, but right now the conversation can’t really progress beyond “Ew, I don’t like them.” I know you don’t. And if you can’t get past that, that’s fine. If you draw the line at parentheses, this book comes with a full money-back guarantee.

I didn’t really even want to bring up the parentheses, but I thought it would be weird if I just blew past them. Like, we’re all thinking about them, right? And now you’re just wondering when I’m going to use the L-word. You want me to use it so you can go write a long screed about how Janet isn’t a real one. But I’m not going to give you the satisfaction. I’m not going to use that word until Chapter Fourteen. By which point you’ll be far too tired of all these long-winded tangents to remember what you were upset about in the first place.

What were we talking about? Oh yeah, hello world.

(print "hello world")

As much as I’d like to belabor the very idea of “prefix notation” and talk about function application and special forms and whatnot, we’re already running behind so I’m going to have to skip ahead a little bit.

(defmacro each-reverse [identifier list & body]
  (with-syms [$list $i]
    ~(let [,$list ,list]
      (var ,$i (- (,length ,$list) 1))
      (while (>= ,$i 0)
        (def ,identifier (in ,$list ,$i))
        ,;body
        (-- ,$i)))))

(defn rewrite-verbose-assignments [tagged-lines]
  (def result @[])
  (var make-verbose false)
  (each-reverse line tagged-lines
    (match line
      [:assignment identifier contents]
        (if make-verbose
          (array/push result [:verbose-assignment identifier contents])
          (array/push result [:assignment contents]))
      (array/push result line))
    (match line
      [:output _] (set make-verbose true)
      _ (set make-verbose false)))
  (reverse! result)
  result)

Okay great. I think that’s a totally reasonable second code sample ever for you to look at.

Just take it in for a moment; don’t worry too much about what it’s doing. Try to notice a few things at a high level:

  1. There’s just a nightmare explosion of punctuation right at the beginning there.

  2. There are a lot of parentheses, but there are also a lot of square brackets.

  3. There are two different ways to declare local variables: def and var.

  4. array/push looks like some kind of namespace.

  5. We defined our own control structure.

  6. We defined a function with no explicit return.

There’s more we could say about this example — we could try to figure out what the heck it’s supposed to be doing, for instance — but it’s going to be hard to talk about things before we establish a baseline.

Which brings us back, finally, to the point of this chapter. The values of Janet, the nouns of Janet, the things that comprise a Janet program — the primitive data types and the built-in collections that we will wield to create our programs. And once we have that foundation, we can spend the rest of the book talking about Janet’s verbs.

So here’s what we’re working with:

numbers
repl:1:> 123
123
repl:2:> 1e6
1000000
repl:3:> 1_000
1000
repl:4:> -0x10
-16
repl:5:> 10.5
10.5

Like JavaScript, all numbers in Janet are 64-bit IEEE-754 double-precision floats. Janet doesn’t have a ”numerical tower.”

booleans
repl:1:> true
true
repl:2:> false
false
repl:3:> maybe
just kidding

Like JavaScript, Janet has a concept of “falsiness.” But while JavaScript’s falsiness rules are a common source of wats, Janet’s rules are much simpler: false and nil are falsy; everything else is truthy.

repl:1:> (truthy? 0)
true
repl:2:> (truthy? [])
true
repl:3:> (truthy? "")
true
repl:4:> (truthy? nil)
false
repl:5:> (truthy? false)
false
nil
repl:1:> nil
nil

nil is Janet’s version of JavaScript’s undefined. It’s the thing that functions return if they don’t return anything else; it’s the value that you get when you look up a key that doesn’t exist.

Janet does not have an equivalent of JavaScript’s null — there is no special value of type object that is not actually an object in any meaningful sense of the word. nil, like undefined, is its own type.

Note that Janet’s nil is not the empty list. If you don’t understand why I’m calling that out here, you can safely ignore this paragraph.

strings
repl:1:> "hello"
"hello"
repl:2:> `"backticks"`
"\"backticks\""
repl:3:> ``"many`backticks"``
"\"many`backticks\""

Strings come in two flavors: mutable and immutable. Mutable strings are called “buffers” and start with @, while immutable strings are called “strings.”

repl:1:> @"this is a buffer"
@"this is a buffer"

Janet strings are plain arrays of bytes. They are not encoding-aware, and there is no native Unicode support in the language for indexing or iterating over “characters.” There are some functions that interpret strings and buffers as ASCII-encoded characters, but they are appropriately named string/ascii-upper and string/ascii-lower.

There are external libraries for decoding UTF-8, but not for any other character encoding that I am aware of. And as far as I know there is no full-service Unicode library in Janet — if you need to count the number of extended grapheme clusters in a string, you will have to write some bindings yourself.

vectors
repl:1:> [1 "two" 3]
(1 "two" 3)
repl:2:> ["one" [2] "three"]
("one" (2) "three")
repl:3:> @[1 "two" 3]
@[1 "two" 3]

Vectors come in two flavors: mutable and immutable. Mutable vectors are called “arrays” and start with @, while immutable vectors are called “tuples.” If you are used to other languages with tuples, don’t be fooled: Janet’s tuples do not behave like tuples in any other language. They are iterable, random access immutable vectors.

Also, it’s worth noting that tuples are not fancy immutable vectors, like you might find in Clojure. If you want to append something to a tuple, you have to create an entirely new copy of it first. We’ll talk more about the differences between mutable and immutable values in a bit.

tables
repl:1:> {:hello "world"}
{:hello "world"}
repl:2:> @{"hello" "world" :x 1 :a 2}
@{"hello" "world" :a 2 :x 1}

Once again, mutable and immutable flavors. Mutable tables are called “tables” and start with an @, while immutable tables are called “structs.” Yeah, I know. Right there with you. Modifying a struct, like modifying a tuple, requires creating a shallow copy first.

Tables are a lot like JavaScript objects, except the keys don’t have to be strings, newly created tables and structs don’t have a default “root class” prototype, and they cannot store nil as either keys or values.

This makes some sense if you think of nil as the undefined value: there is no ambiguity between “key does not exist” and “key exists but its value is undefined.” We’ll talk more about this in Chapter Eight.

keywords
repl:1:> :hello
:hello
repl:2:> (keyword "world")
:world

Generally you use keywords as the keys or field names in structs and tables. They’re also handy whenever you need to pass around an immutable named literal, like a tag or an enum.

JavaScript doesn’t really have an analog for keywords, although you might be familiar with the idea from Ruby, which calls them “symbols.” In JavaScript you just pass around short strings, which is functionally the same thing. The difference in Janet is that keywords are interned and strings are not.

symbols
repl:1:> 'hello
hello
repl:2:> (symbol "hello")
hello

Symbols are physically exactly the same as keywords. They share the same interning table; the only difference between a keyword and a symbol is their type.

Logically, though, symbols don’t represent small constant strings like enums. Symbols represent identifiers in your program. You’ll use symbols a lot when you’re writing macros, and basically nowhere else. I mean, you could use them elsewhere, if you really wanted to, but it’s usually more convenient to stick with keywords.

functions
repl:1:> (fn [x] (+ x 1))
<function 0x600000A8C9E0>

Janet functions can be variadic, and support optional and named arguments. fn creates an anonymous function, but you can also use defn as a shorthand for (def name (fn ...)).

repl:1:> (defn sum [& args] (+ ;args))
<function sum>
repl:2:> (sum 1 2 3)
6

& in a parameter list makes the function variadic, and (+ ;args) is how you call a function with a variable number of arguments — ; is like JavaScript’s .... As you can see, the function called + is already variadic, so there’s no actual reason to write a variadic sum like this. But it’s just an example.

repl:1:> (defn incr [x &opt n] (default n 1) (+ x n))
<function incr>
repl:2:> (incr 10)
11
repl:3:> (incr 10 5)
15

&opt makes all following arguments optional, and &named makes, well, named parameters:

repl:1:> (defn incr [x &named by] (+ x by))
<function incr>
repl:2:> (incr 10 :by 5)
15

Note, however, that when we call a function, named arguments must come after any positional arguments.

repl:3:> (incr :by 5 10)
error: could not find method :+ for :by, or :r+ for nil
  in incr [repl] on line 10, column 26
  in _thunk [repl] (tailcall) on line 12, column 1

Because :by is, after all, a valid argument to pass positionally.

fibers
repl:1:> (fiber/new (fn [] (yield 0)))
<fiber 0x600003C10150>

Fibers are powerful control flow primitives, and it’s hard to give a pithy definition for them. Janet uses fibers to implement exception handling, generators, dynamic variables, early return, async/await-style concurrency, and coroutines. Among other things.

One very incomplete but perhaps useful intuition is that a fiber is a function that can be paused and resumed later. Except it’s not a function; it’s actually a full call stack. And it’s not always resumable: you can also stop them. Maybe this is just confusing. You know what? We’re going to spend an entire chapter talking about fibers together later. Maybe I shouldn’t try to explain them poorly before then.


Alright. We did it.

Those are all the values in Janet.

At least, I think those are all the values.

I find it really comforting to have this bird’s eye survey of the Janet noun landscape, but it only really comforts me if I can see all the way to the shoreline. And so far all I’ve done is list out a bunch of types. Did I get all of them? Half of them? Or have I only just scratched the surface?

Well, one nice thing about Janet is that it’s distributed as a single .h/.c file pair, so it’s very easy to look in the source to check. So let’s do that.

We can download the latest amalgamated build from the Janet releases page, and grep for “type” until we find something plausible…

typedef enum JanetType {
  JANET_NUMBER,    // [x]
  JANET_NIL,       // [x]
  JANET_BOOLEAN,   // [x]
  JANET_FIBER,     // [x]
  JANET_STRING,    // [x]
  JANET_SYMBOL,    // [x]
  JANET_KEYWORD,   // [x]
  JANET_ARRAY,     // [x]
  JANET_TUPLE,     // [x]
  JANET_TABLE,     // [x]
  JANET_STRUCT,    // [x]
  JANET_BUFFER,    // [x]
  JANET_FUNCTION,  // [x]
  JANET_CFUNCTION, // [ ]
  JANET_ABSTRACT,  // [ ]
  JANET_POINTER    // [ ]
} JanetType;

Okay. I did pretty good. JANET_CFUNCTION is basically an implementation detail; a cfunction looks and acts like a regular function in almost every respect, except that it’s implemented in C, not Janet.

repl:1:> (type pos?)
:function
repl:2:> (type int?)
:cfunction

We’ll talk more about cfunctions in Chapter Nine.

JANET_POINTER is useful for interacting with C programs; we’re not actually going to talk about it in this book but it’s exactly the thing that you think it is. JANET_ABSTRACT is pretty important, though, so we should probably talk about it now.

A JANET_ABSTRACT type is a type implemented in C code that you can interact with like any other Janet value. We’ll learn how to write our own in Chapter Nine, and you’ll get a chance to see how flexible they are: you could implement anything as an abstract type, and in fact the Janet standard library does exactly that.

This means that there are a few more types in the Janet standard library than the JanetType enum implies, and for the sake of completeness I will list them here:

And those are all of the types that Janet gives you.

I mean, for one definition of “type.” There are some instances of “struct with a particular documented shape” in the standard library, and you could call those distinct types if you wanted to. But you have now seen every type that exists at a physical, mechanical level. You’ve seen all the building blocks; everything else is just a permutation of these values.

We’ll talk more about how these types work and what we can do with them in future chapters. But there is one thing that is so important and so primitive that we’re going to talk about it right now: equality.

Janet, unlike some languages, does not have separate eq and eql and equal and equalp functions. Nor does it have == and === and Object.is. Janet has one real notion of equality: =.

repl:1:> (= (+ 1 1) 2)
true

But = means something different depending on whether you’re asking about a mutable value, like a table or an array, or an immutable value, like a number or a keyword or a tuple.

data typeimmutablemutable
atomnumber, keyword, symbol, nil, boolean
closurefunction
coroutinefiber
byte arraystringbuffer
random-access listtuplearray
hash tablestructtable

Mutable values are only equal to themselves; you might say that they have “reference semantics:”

repl:1:> (= @[1 2 3] @[1 2 3])
false
repl:2:> (def x @[1 2 3])
@[1 2 3]
repl:3:> (= x x)
true

While immutable values have “value semantics:”

repl:1:> (= [1 2 3] [1 2 3])
true

This means that you can use immutable values as the keys of tables or structs without worrying about the specific instance you have a handle on:

repl:1:> (def corners {[0 0] :bottom-left [1 1] :top-right})
{(0 0) :bottom-left (1 1) :top-right}
repl:2:> (get corners [1 1])
:top-right

While mutable keys must be the exact identical value:

repl:1:> (def zero-zero @[0 0])
@[0 0]
repl:2:> (def corners {zero-zero :bottom-left @[1 1] :top-right})
{@[1 1] :top-right @[0 0] :bottom-left}
repl:3:> (get corners @[0 0])
nil
repl:4:> (get corners zero-zero)
:bottom-left

Janet also has a function called deep=, which performs a “structural equality” check for reference types, as well as a function called compare=, which can invoke a custom equality method. But these are not “real” equality functions, in the sense that Janet’s built-in associative data structures — structs and tables — only ever use = equality.

But you can use deep= to compare two mutable values in your own code:

repl:1:> (deep= @[1 @"two" @{:three 3}] @[1 @"two" @{:three 3}])
true

Although it’s worth noting that values of different types are never deep-equal to one another, even if their elements are identical:

repl:1:> (= [1 2 3] @[1 2 3])
false
repl:2:> (deep= [1 2 3] @[1 2 3])
false

Abstract types can go either way — abstract type just means “implemented in C code,” and it’s possible to implement value-style or reference-style abstract types in C code. We’ll talk about how to do that in Chapter Nine.

Finally, I think that it’s worth saying again: Janet’s immutable values are simple immutable values. They are not fancy immutable values like you might find in a language like Clojure. There is no structural sharing here; if you want to append an element to an immutable tuple you have to make a full copy first.

That doesn’t mean that you shouldn’t append things to tuples! But it does mean that you should be aware of the trade-off, and probably prefer mutable structures if you’re working with large amounts of data.

Internally, though, immutable types are still passed by reference. When you return an immutable struct from a function, you’re actually returning a pointer to an immutable struct — you don’t have to make copies of them to pass them around “on the stack.”

Chapter Two: Compilation and Imagination

Alright, we got the basics out of the way. Now we can get to the good stuff.

In this chapter we’re going to talk about compile-time programming and images. JavaScript has no analog for images, nor does it have any sort of “compilation” step, but I’m sure you’re familiar with the concept. Er, the concept of compilation, that is. I hope you’re not already familiar with images, because I want to be the first to tell you about them.

But in order to understand images, we first have to understand the life cycle of a Janet program. A Janet program like this one:

example.janet
(def one 1)
(def two 2)
(def three (+ one two))

(print one)

(defn main [&]
  (print three))

(print two)

If you copy that into a file and run it through the Janet interpreter, you will see the following output:

janet example.janet
1
2
3

Hopefully nothing too surprising. It ran through the top-level statements, then went back and executed our main function.

But you can also compile Janet programs. Usually this means compiling them all the way down to native code using a tool called jpm, which is Janet’s version of npm or cargo or whatever. But in order to produce native code, jpm actually:

  1. Compiles your Janet source code into an “image.”
  2. Embeds the contents of that image into a .c file that also links in the Janet runtime and interpreter.
  3. Compiles that .c file using your system’s C compiler.

But I don’t want to talk about jpm yet, and Janet can only natively do the first thing, so we’re going to be producing and running these “images” directly. We’ll talk about how to get a native binary in Chapter Seven.

So what is an image? Well, it’s easier if I just show you. Let’s make one:

janet -c example.janet example.jimage
1
2

Whoa, look! It executed our top-level statements, but it didn’t call our main. It also produced a file called example.jimage, which we can pass back to Janet to run:

janet -i example.jimage
3

Hey! There’s our main function. And it’s just our main function — the top-level print statements didn’t run again. But it still knew how to print 3, which was a value that we calculated in a top-level statement. Huh.

So top-level statements execute at “compile time”… but we can still refer to compile time values at “runtime.” Neat.

Does that work for any values? Let’s try something more complicated, with mutable structures and shared references:

(def skadi @{:name "Skadi" :type "German Shepherd"})
(def odin @{:name "Odin" :type "German Shepherd"})

(def people
  [{:name "ian" :dogs [skadi odin]}
   {:name "kelsey" :dogs [skadi odin]}
   {:name "jeffrey" :dogs []}])

(pp people)

(defn main [&]
  (set (odin :type)
    "Well mostly German Shepherd but he's mixed with some collie so his ears are half-flops")
  (pp people))

pp is supposed to stand for “pretty print,” although it doesn’t really, so I’ll be manually reformatting the output a bit. If we compile this program, we’ll see how this list looked during compilation:

janet -c dogs.janet dogs.jimage
({:dogs (@{:name "Skadi" :type "German Shepherd"}
         @{:name "Odin" :type "German Shepherd"})
  :name "ian"}
 {:dogs (@{:name "Skadi" :type "German Shepherd"}
         @{:name "Odin" :type "German Shepherd"})
  :name "kelsey"}
 {:dogs () :name "jeffrey"})

And then if we run it, we can see how it looks after we mutate Odin:

janet -i dogs.jimage
({:dogs (@{:name "Skadi" :type "German Shepherd"}
         @{:name "Odin" :type "Well mostly German Shepherd but he's mixed with some collie so his ears are half-flops"})
  :name "ian"}
 {:dogs (@{:name "Skadi" :type "German Shepherd"}
         @{:name "Odin" :type "Well mostly German Shepherd but he's mixed with some collie so his ears are half-flops"})
  :name "kelsey"}
 {:dogs () :name "jeffrey"})

So let’s notice a few things about this:

  1. When you print tuples, they’re wrapped in parentheses, even though you define them with square brackets and they should print with square brackets.

  2. Tables and structs do not preserve the order of their keys.

  3. References are preserved between compile time and runtime.

I wanted to point that last one out explicitly, because you can imagine a dumber version of this where that is not the case. Like, if you’re JavaScript, and you wanted to allow programs to refer to values created at compile time, one natural way to do that would be serialize those values into JSON and then read them back at program startup.

But Janet is doing something fancier than that. Janet is still serializing values to disk and reading them back, but the format it uses is able to express things like shared references and cyclic data structures and closures and the current state of a coroutine.

Janet calls this fancy serialization “marshaling,” as do many other languages, except for Python, which calls it “pickling.” This fact is not really relevant to this book at all; I just think “pickling” is a really whimsical term.

So let’s think about how this might work.

Perhaps when we compile a Janet program, we’re actually doing two things: there’s the “normal” compilation step, where we take high-level Janet code and turn it into lower-level bytecode that the Janet interpreter knows how to execute, just like a normal bytecode compiler. But then there’s also this second step, where we take the values that we computed at compile-time (which values?) and marshal them into bytes. And then an image is the combination of those two things. Is that right?

Well, no. Not really. Because these two steps are not actually separate: an image isn’t a “data” part plus a “code” part. It’s just a data part. As a matter of fact, the entire image consists of nothing more than a single marshaled value: our program’s environment.

“Environment” is a fancy word for scope, but in Janet it refers specifically to the top-level scope. It’s the table mapping symbols (like skadi and main) to values that we defined for them. And it is, itself, a first-class value! It is literally a Janet @{...} table, and it is the “root” value that Janet serializes to form our image.

But some of the values in that environment table are functions. And of course functions are first-class values in Janet, so when we marshal the table we have to marshal those functions as well.

And how do you marshal a function? Well, you’ve probably guessed it already: as bytecode that represents the function’s implementation.

So an “image” is a serialized environment table that probably includes a key called main whose value is a function. And when we “resume” or “execute” the image with janet -i, Janet will first deserialize this environment, then look up the symbol called main, and then execute that function.

Let’s make this a little more concrete. Show me the image:

repl:1:> (load-image (slurp "dogs.jimage"))
@{main @{:doc "(main)\n\n" :source-map ("dogs.janet" 11 1) :value <function main>} odin @{:source-map ("dogs.janet" 1 1) :value @{:name "Odin" :type "German Shepherd"}} people @{:source-map ("dogs.janet" 4 1) :value ({:dogs (@{:name "Skadi" :type "German Shepherd"} @{:name "Odin" :type "German Shepherd"}) :name "ian"} {:dogs (@{:name "Skadi" :type "German Shepherd"} @{:name "Odin" :type "German Shepherd"}) :name "kelsey"} {:dogs () :name "jeffrey"})} skadi @{:source-map ("dogs.janet" 2 1) :value @{:name "Skadi" :type "German Shepherd"}} :current-file "dogs.janet" :macro-lints @[] :source "dogs.janet"}

Alright, well, that’s a complete mess, so let me pretty-print it for you:

@{main @{:doc "(main)\n\n"
         :source-map ("dogs.janet" 11 1)
         :value <function main>}
  odin @{:source-map ("dogs.janet" 1 1)
         :value @{:name "Odin" :type "German Shepherd"}}
  people @{:source-map ("dogs.janet" 4 1)
           :value ({:dogs (@{:name "Skadi" :type "German Shepherd"} @{:name "Odin" :type "German Shepherd"}) :name "ian"}
                   {:dogs (@{:name "Skadi" :type "German Shepherd"} @{:name "Odin" :type "German Shepherd"}) :name "kelsey"}
                   {:dogs () :name "jeffrey"})}
  skadi @{:source-map ("dogs.janet" 2 1) :value @{:name "Skadi" :type "German Shepherd"}}
  :current-file "dogs.janet"
  :macro-lints @[]
  :source "dogs.janet"}

You can see that there’s a little bit more to the table than I let on — Janet stores some metadata about each binding, as well as some metadata about the environment itself.

But still, you can see that an image is just a snapshot of your program’s environment, frozen in time. And, in theory, you could take a snapshot of your program’s environment at any point in time…

repl:1:> (def greeting "hello world")
"hello world"
repl:2:> (defn main [&] (print greeting))
<function main>
repl:3:> (def image (make-image (curenv)))
@"\xD4\x05\xD8\x08root-env\xCF\x01_\xD3\x01\xD0\x05value\xD7\0\xCD\0\x98\0\0\x02\0\0\xCD\x7F\xFF\xFF\xFF\x02\x05\xCE\x04main\xCE\x04repl\xCE\vhello world\xD8\x05print,\0\0\0*\x01\0\0/\x01\0\0*\x01\x01\04\x01\0\0\x02\x01\0\x10\0\x10\0\x10\0\x10\xCF\x05image\xD3\x01\xD0\nsource-map\xD2\x03\0\xDA\x07\x03\x01\xCF\x08greeting\xD3\x02\xDA\f\xD2\x03\0\xDA\x07\x01\x01\xDA\x04\xDA\x08\xCF\x04main\xD3\x03\xDA\f\xD2\x03\0\xDA\x07\x02\x01\xDA\x04\xDA\x05\xD0\x03doc\xCE\n(main &)\n\n\xD8\r*macro-lints*\xD1\0"
repl:4:> (spit "repl.jimage" image)
nil
janet -i repl.jimage
hello world

Which is neat, I guess, and as I understand it this is actually the canonical way to write programs in some languages: you load an image, interactively modify it, then save the image back to disk.

This is possible in Janet, and maybe even fun and good, but I’m not going to say anything else about it. This is a style of programming that dates back to long before I was born, but I have never tried it so I don’t know what I’m missing and I’m going to dismiss it out of hand.

Instead I’m going to talk about images as if they are nothing more than the output of Janet’s “compilation” phase. Because even if you limit yourself to a strict compilation/runtime separation, you can still use compile-time code execution to do a lot of very powerful things.

In fact, I think “compilation” is selling Janet short a little bit. When I hear “compilation,” I think of a transformation from high-level code to lower-level code, probably with some optimization thrown in along the way. And that is part of what Janet does during the so-called compilation phase, but it can also do anything else! It can execute arbitrary code, perform complex calculations — even perform side effects! — and once it’s done it will give us not just bytecode, but a fully interwoven image of our environment.

So instead of the “compilation phase,” I’m going to propose we call this the imagination phase.

Okay, I hate it already. Proposal rescinded. Segue out of this one with me.

So far we’ve only looked at really contrived, artificial examples. I think it’s time to talk about something real.

OpenGL has a concept called “shaders,” which are little mini-programs that run on the GPU and do things like calculate the color for each pixel of your teapot or whatever.

You can’t compile these mini-programs ahead of time, because every GPU is a little bit different, so if you’re writing a game that uses OpenGL, you actually need to distribute the source of your shaders as part of your game, and let each of your players’ video drivers compile them on startup.

So there are lots of ways to do this: we could just distribute the shaders as separate files alongside the game and load them in at runtime relative to the path of our executable. And that would work fine!

But let’s say that we don’t want to do that. Let’s say we want to distribute a game as a single binary.

Well, we could just embed the shader source as a string in our code:

(def gamma-shader `
  #version 330

  in vec3 fragColor;
  out vec4 outColor;

  void main() {
    outColor = vec4(pow(fragColor, vec3(1.0 / 2.2)), 1.0);
  }`)

But that’s obviously terrible; we probably wouldn’t have any tooling support if we did that, and it would be pretty annoying to locate and change our shaders once we have more than a couple of them.

Instead, what if we kept the shaders in separate files, but loaded them into the program at compile time?

shader-example.janet
(def gamma-shader (slurp "gamma.fs"))

(defn main [&]
  (print gamma-shader))

Neat! Now if we compile that to an image, we can embed the data into our final executable:

janet -c shader-example.janet shader-example.jimage
rm gamma.fs # no longer needed!
janet -i shader-example.jimage
#version 330

in vec3 fragColor;
out vec4 outColor;

void main() {
  outColor = vec4(pow(fragColor, vec3(1.0 / 2.2)), 1.0);
}

Okay cool. We performed the side effect of reading from the disk at compile time, and then… well, nothing else. We just referred to it like a regular value, and Janet’s image marshaling took care of embedding the data into our final binary.

Now, obviously there are limits to what you can marshal: not all values can survive cryostasis. In fact, if we consider a slight variation of that code:

shader-example2.janet
(def f (file/open "gamma.fs"))
(def gamma-shader (file/read f :all))
(file/close f)

(defn main [&]
  (print gamma-shader))

This is functionally identical, and we can still run this script just fine:

janet shader-example2.janet
#version 330

in vec3 fragColor;
out vec4 outColor;

void main() {
  outColor = vec4(pow(fragColor, vec3(1.0 / 2.2)), 1.0);
}

But if we try to compile it…

janet -c shader-example2.janet shader-example2.jimage
error: cannot marshal file in safe mode
  in marshal [src/core/marsh.c] on line 1480
  in make-image [boot.janet] on line 2637, column 3
  in c-switch [boot.janet] (tailcall) on line 3873, column 36
  in cli-main [boot.janet] on line 3909, column 13

We can’t. We now have a reference to a core/file abstract type in our top-level environment, and when Janet tries to marshal the environment it throws its hands up on that value. Because of course it does: you can’t serialize a file handle or a network connection or anything like that to disk.

I think we can notice three things from this:

  1. The entire environment is marshaled into our image.

  2. Janet doesn’t special-case closed file handles.

  3. There exist programs that Janet knows how to execute but which cannot be compiled into images.

In practice you don’t really have to think about this, like, ever.

I actually had to contort a bit to write this “broken” program. The correct way to read from a file, if you are allergic to typing the word slurp, would be:

(def gamma-shader
  (with [f (file/open "gamma.fs")]
    (file/read f :all)))

(defn main [&]
  (print gamma-shader))

Which of course compiles fine — f is not a top-level variable, so it’s not a part of the environment.

And when you’re writing little shebang scripts, you probably won’t even define a main function, and it will look like Janet just runs through your script in order like any other scripting language. All of your work will take place during the “compilation” phase, and Janet will never try to construct an image at all, and you really won’t have to think about this.

But once you start writing larger programs that you compile ahead of time, you can start to think about the distinction, and decide if there’s any work you want to perform ahead-of-time. You don’t have to — you can put everything in main if you want to — but you have that power should you need it.

Finally, I think it’s worth pointing out explicitly: just because we can’t marshal core/files, that doesn’t mean we can’t marshal other abstract types. Many of the abstract types in the standard library (like core/peg) are perfectly marshalable, and when we define our own abstract types we can optionally provide custom marshaling routines. We’ll talk more about that in Chapter Nine.

And now I’m done talking about images.

You got a little taste of what you can do with compile-time programming, and I hope that it was to your liking. Because the next chapter…

Well, I don’t want to spoil it.

Chapter Three: Macros and Metaprogramming

Oh; it’s just right there. Not a lot of time to build suspense, really. It kind of spoiled itself.

Alright, well, yes, we’re going to talk about macros. We’re going to talk about macros as if you have never heard of them, even though you probably have, because remember that in this book you’re only supposed to know JavaScript and JavaScript doesn’t have any.

And actually, we should probably talk about that. JavaScript doesn’t have macros. Most popular languages don’t have macros as a matter of fact. And you’ve made it this far in life using those languages, and you’re doing just fine. Do you really need macros?

Well, no. There is no program that you can write with macros that you couldn’t write in some other way — just like there is no program that you can write with first-class functions that you couldn’t write some other way.

But aren’t first-class functions nice? Think of all the things they let you do: fold, promises, event handlers— do you know what it’s like writing event handlers without first-class functions? Have you ever implemented a delegate class? It’s awful.

Macros are similarly useful. You can write macros to eliminate boilerplate to a degree that you cannot imagine in a language without them. You can write macros to define new control-flow constructs, like a switch with destructuring cases (don’t you wish JavaScript had that?). You can write macros that define new functions for you, or modify existing functions à la Python’s decorators. You can write macros that take high-level descriptions of binary formats and generate code to efficiently parse them. You can write macros to do all sorts of things.

So: macros.

Macros.

I really am going to talk about macros, but you’ll have to give me a second, because first I want to talk about metaprogramming. Macros are a tool that we can use to write very powerful “metaprograms,” but we can do metaprogramming without any macros at all.

In fact, in a way, all compiled Janet programs are “metaprograms” — programs that write other programs — because of the way that the compilation step works. When we “compile” a Janet program, we’re really executing (part of) it, performing all of the top-level effects, computing all of the top-level values, and then finally producing, as the output of this “metaprogram,” an image with a main function. And what is that image but a new program forged in the fires of our metaprogram?

So here’s a macro-free metaprogram where this “program construction” is a little more explicit. It’s a script that you can compile to produce an image that prints hello world.

meta.janet
(defn sequence [& fs]
  (fn [&]
    (each f fs
      (f))))

(defn first-word []
  (prin "hello"))

(defn space []
  (prin " "))

(defn second-word []
  (prin "world"))

(defn newline []
  (print))

(def main (sequence first-word space second-word newline))

Nothing mind-blowing, right? If you’re comfortable with the idea of functions returning other functions, you can see that we created an anonymous closure over our various smaller functions, then we created a binding called main in the environment that points to that closure. When Janet constructs the image, its main function will be exactly that closure that we dynamically allocated during the compilation phase.

janet -c meta.janet meta.jimage
janet -i meta.jimage
hello world

Metaprogramming.

But that’s not really what typical metaprogramming looks like. That’s a weird high-level example that I made up to get you comfortable with the idea of creating new functions at compile-time.

But there’s a much more direct way to create new functions at compile time. In fact, Janet has a function that takes in the body of a function as an abstract syntax tree and gives us back an actual real-live function that represents the result of evaluating that body. We can use that to directly create new functions:

compile.janet
(def main (compile ['print "weird"]))

Remember that 'print is a “symbol.” ['print "weird"] is an immutable vector (“tuple”) of two elements: the symbol 'print and a string. That tuple is how Janet represents the abstract syntax tree of the expression (print "weird"), and indeed if we compile and execute it, we can see that it does exactly that:

janet -c compile.janet compile.jimage
janet -i compile.jimage
weird

When we pass an abstract syntax tree to the compile function, Janet will give us back the function that ignores its arguments and has that abstract syntax tree as its body. And it works for any abstract syntax tree we construct:

repl:1:> (def f (compile ['+ 1 ['* 2 3]]))
<function _thunk>
repl:2:> (f)
7

Because Janet uses a very lightweight representation for abstract syntax trees — they’re regular tuples of regular Janet values, like symbols and numbers and other tuples — it’s very easy for us to manipulate abstract syntax trees. We don’t need to use some special AbstractSyntaxTreeNodeVisitor class or something to wrangle these values; we can just use regular loops and maps and anything else that we could do with any other tuple.

So let’s do that.

(defn set-x [& expressions]
  (def result @['do])
  (each expression expressions
    (array/push result ['print (string/format "about to execute %q" expression)])
    (array/push result expression))
  (tuple/slice result))

I called this set-x because it reminds me of bash’s set -x option. But it doesn’t change anything about how Janet works; it’s just a function that takes any number of abstract syntax trees and returns a single new abstract syntax tree.

Since functions have to return a single value, we wrap everything in do to produce the abstract syntax tree that, well, does all of the things we want in order.

Actually, let’s take a quick look at do before we move on:

(do
  (first-thing)
  (second-thing))

This is equivalent to creating a new block in JavaScript:

{
  firstThing();
  secondThing();
}

In JavaScript you usually create a new block so that you can control the extent of block-scoped variables:

{
  const x = 10;
  console.log(x);
}
// x no longer exists

The same is true in Janet:

(do
  (def x 10)
  (print x))
# x no longer exists

But this actually isn’t what we want here. We’re only using do because we want to pack a bunch of abstract syntax trees into a single abstract syntax tree. The fact that do creates a new scope is sort of incidental, and in fact might actually be problematic.

Fortunately, Janet has a do alternative called upscope, and upscope does not create a new scope. It executes all of its expressions in the same scope that it runs in.

(upscope
  (def x 10)
  (print x))
(print x) # still exists!

I think that’s actually more appropriate for our set-x function. So let’s switch to that:

set-x.janet
(defn set-x [& expressions]
  (def result @['upscope])
  (each expression expressions
    (array/push result ['print (string/format "about to execute %q" expression)])
    (array/push result expression))
  (tuple/slice result))

Okay. So this is a function; let’s call it:

janet -l ./set-x
repl:1:> (set-x ['print ['string/ascii-upper "hello"]] ['+ ['* 3 2] ['/ 1 2]])
(upscope (print "about to execute (print (string/ascii-upper \"hello\"))") (print (string/ascii-upper "hello")) (print "about to execute (+ (* 3 2) (/ 1 2))") (+ (* 3 2) (/ 1 2)))

Wow that was the longest aside ever. I forgot what we were even talking about.

janet -l ./set-x
repl:1:> (set-x ['print ['string/ascii-upper "hello"]] ['+ ['* 3 2] ['/ 1 2]])
(upscope (print "about to execute (print (string/ascii-upper \"hello\"))") (print (string/ascii-upper "hello")) (print "about to execute (+ (* 3 2) (/ 1 2))") (+ (* 3 2) (/ 1 2)))

Oh, that’s right. Let’s prettify that:

(upscope
  (print "about to execute (print (string/ascii-upper \"hello\"))")
  (print (string/ascii-upper "hello"))
  (print "about to execute (+ (* 3 2) (/ 1 2))")
  (+ (* 3 2) (/ 1 2)))

Alright! Remember that this is only an abstract syntax tree. We need to compile it if we actually want to execute the code:

janet -l ./set-x
repl:1:> (set-x ['print ['string/ascii-upper "hello"]] ['+ ['* 3 2] ['/ 1 2]])
(upscope (print "about to execute (print (string/ascii-upper \"hello\"))") (print (string/ascii-upper "hello")) (print "about to execute (+ (* 3 2) (/ 1 2))") (+ (* 3 2) (/ 1 2)))
repl:2:> (compile _)
<function _thunk>
repl:3:> (_)
about to execute (print (string/ascii-upper "hello"))
HELLO
about to execute (+ (* 3 2) (/ 1 2))
6.5

Okay, cool. Metaprogramming.

Except: is it cool? Or is it an unreadable mess, to the extent that you’re wondering why anyone would ever want to write code like this?

Well, don’t worry: no one writes code like this. The thing we’re doing here is weird, and it’s especially weird if you’re already used to macros. You’re never actually going to see code like this; you’re probably never going to invoke the compile function directly. I’m just going through this so that you can see the things that Janet makes possible:

And these are the core ideas behind macros.

But macros give us a much more ergonomic way to do all of these things. Macros are so ergonomic, in fact, that they verge into “magic” territory, and it’s easy to lose sight of the actual mechanisms underlying them. When I started writing macros I didn’t really understand how they worked, when they executed, or the full extent of what I could do with them. I thought of them as simple syntactic transformations, and it took a lot of playing with Janet before I understood their true potential.

So we’re going to keep building towards macros. Right now we have a super explicit, super ugly version of something kinda like a macro. Let’s sprinkle syntax sugar on it until it tastes good.

First off, no one ever writes abstract syntax trees like this:

['+ ['* 3 2] ['/ 1 2]]

Instead, you write this:

'(+ (* 3 2) (/ 1 2))

Which means exactly the same thing, but it looks much closer to the Janet code that it represents.

' is pronounced “quote,” and whatever comes after the quote gets, umm, quoted. We’ve seen quote before with 'symbols, and I pretended like it was just the way that you wrote symbol literals. But actually, quoting an identifier is just the most convenient way to get a symbol. You can also make them with the (symbol "name") constructor.

You can quote anything, but quoting most things just gives you the thing you quoted back:

repl:1:> '100
100
repl:2:> '"hello"
"hello"
repl:3:> '{:key value}
{:key value}
repl:4:> '@[1 2 3]
@[1 2 3]
repl:5:> ':foo
:foo
repl:6:> 'true
true
repl:7:> 'nil
nil

But when you quote something, it also quotes every sub-expression in it. So when we write '{:key value}, we get the “struct” {':key 'value}. Quoting a keyword gives us the keyword back, and quoting the identifier value gives us the symbol 'value instead of treating it like the name of a variable, so we don’t get an error about value not being defined.

So really '(+ 1 2) is the same as ['+ '1 '2], but since '1 is the same as 1 we don’t bother to write that all out.

Okay. So: does this help us? A little. Now we can write:

(set-x '(print (string/ascii-upper "hello")) '(+ (* 3 2) (/ 1 2)))

Which looks a tiny bit better, I guess. But does this help us with the implementation of set-x?

(defn set-x [& expressions]
  (def result @['upscope])
  (each expression expressions
    (array/push result ['print (string/format "about to execute %q" expression)])
    (array/push result expression))
  (tuple/slice result))

Ummmm, no. Not really.

We do construct the abstract syntax tree ['print (string/format "..." expression)], but we can’t write that as '(print (string/format "..." expression)). That would quote every element in the list, but we only want to quote the first element — we still want to evaluate the call to string/format.

Fortunately, Janet has a way to quote some elements in an expression without quoting every element: replace ' with ~.

~ is pronounced “quasiquote.” It works exactly the same as ', except that you can “opt out” sub-expressions from getting quoted. By default it quotes everything, but you can tell it not to quote a specific term by prefixing it with a comma:

repl:1:> ~(print ,(+ 1 2))
(print 3)

, is pronounced “unquote,” and it works, essentially, like string interpolation. Consider the following JavaScript:

node
Welcome to Node.js v16.16.0.
Type ".help" for more information.
> `1 + 2 = ${1 + 2}`
'1 + 2 = 3'

This is very similar to the following Janet expression:

repl:1:> ~(1 + 2 = ,(+ 1 2))
(1 + 2 = 3)

Except it’s not a string; it’s a tuple. You can actually use this “tuple interpolation” to make any tuples you want; you don’t have to use it to make abstract syntax trees. But generally [] notation is more convenient: it’s usually better to make quoting opt-in instead of opt-out, unless you’re writing down an abstract syntax tree.

Okay. So now we can write this instead:

(defn set-x [& expressions]
  (def result @['upscope])
  (each expression expressions
    (array/push result ~(print ,(string/format "about to execute %q" expression)))
    (array/push result expression))
  (tuple/slice result))

Is that better? Eh; maybe a tiny bit. But not really.

There’s one more bit of syntax sugar to add, though, and it’s going to help us substantially.

Consider the following contrived example:

repl:1:> (def nums [1 2 3])
(1 2 3)
repl:2:> (def sum-ast ~(+ ,nums))
(+ (1 2 3))

We interpolated a list into our list, and of course we got a nested list, because that’s what we asked for. But what if that’s not what we want? What if we want (+ 1 2 3) instead? Well, it turns out Janet has just the thing:

repl:3:> (def sum-ast ~(+ ,;nums))
(+ 1 2 3)

,; is pronounced “unquote-splice,” and it splices each element in the inner list into the outer list.

And that actually helps us a lot! That lets us completely rewrite set-x:

(defn set-x [& expressions]
  ~(upscope
    ,;(mapcat (fn [expression]
        [~(print ,(string/format "about to execute %q" expression))
         expression])
        expressions)))

This is just a more functional version of the imperative thing we were doing before, using unquote-splice to form the list instead of explicit mutation.

Now that’s starting to look like a real macro.

Except, of course, it isn’t actually a macro yet. It’s just a function. A function that takes abstract syntax trees as input, and returns an abstract syntax tree as output. Which is essentially all that a macro is, except that we declare macros differently. We declare macros like this:

set-x-macro.janet
(defmacro set-x [& expressions]
  ~(upscope
    ,;(mapcat (fn [expression]
        [~(print ,(string/format "about to execute %q" expression))
         expression])
        expressions)))

This is exactly the same as before, but now we’re using defmacro instead of defn. And when we call it, we can see two interesting differences:

janet -l ./set-x-macro
repl:1:> (set-x (print "hello") (+ 1 2))
about to execute (print "hello")
hello
about to execute (+ 1 2)
3

First off, notice that we don’t have to explicitly quote the arguments anymore. We just wrote (print "hello"), but our macro got '(print "hello") as one of its arguments.

But that’s not very interesting. The interesting bit is that it didn’t just return the abstract syntax tree — it compiled and executed it as well.

It executed it because we typed this into the repl, so this was sort of a confusing first example, as we combined both the compile time and runtime steps into one. So let’s take a look at how macros normally work:

macro-example.janet
(use ./set-x-macro)

(defn main [&]
  (set-x
    (var sum 0)
    (for i 0 10
      (+= sum i))
    (print sum)))

It looks like we’re “calling” set-x in main. However, I’m going to claim that set-x will not run when we invoke main. Instead, set-x will run during the compilation phase, when we compile main, and the abstract syntax tree that set-x returns will run when we invoke main.

Here, watch. Let’s compile this program:

janet -c macro-example.janet macro-example.jimage

Well… huh. Yeah. We can’t really tell. Maybe set-x ran; maybe nothing happened at all.

Let’s make it a little more clear what’s going on.

set-x-verbose.janet
(defmacro set-x [& expressions]
  (print "expanding the set-x macro")
  (printf "  here are my arguments: %q" expressions)
  (def result ~(upscope
    ,;(mapcat (fn [expression]
        [~(print ,(string/format "about to execute %q" expression))
         expression])
        expressions)))
  (printf "  and i'm going to return: %q" result)
  result)

Let’s switch to that version of the macro, and add in a little self-reflection while we’re at it:

macro-example-verbose.janet
(use ./set-x-verbose)

(defn main [&]
  (set-x
    (var sum 0)
    (for i 0 10
      (+= sum i))
    (print sum)))

(print)
(print "and this is what main looks like:")
(print)
(pp (disasm main))

Now when we compile that program, we can see it all in action:

janet -c macro-example-verbose.janet macro-example-verbose.jimage
expanding the set-x macro
  here are my arguments: ((var sum 0) (for i 0 10 (+= sum i)) (print sum))
  and i'm going to return: (upscope (print "about to execute (var sum 0)") (var sum 0) (print "about to execute (for i 0 10 (+= sum i))") (for i 0 10 (+= sum i)) (print "about to execute (print sum)") (print sum))

and this is what main looks like:

{:arity 0 :bytecode @[ (lds 0) (ldc 1 0) (push 1) (ldc 2 1) (call 1 2) (ldi 1 0) (ldc 2 2) (push 2) (ldc 3 1) (call 2 3) (ldi 2 0) (ldi 3 10) (lt 4 2 3) (jmpno 4 5) (movn 5 2) (add 1 1 5) (addim 2 2 1) (jmp -5) (ldc 2 3) (push 2) (ldc 3 1) (call 2 3) (push 1) (ldc 2 1) (tcall 2)] :constants @["about to execute (var sum 0)" <cfunction print> "about to execute (for i 0 10 (+= sum i))" "about to execute (print sum)"] :defs @[] :environments @[] :max-arity 2147483647 :min-arity 0 :name "main" :slotcount 6 :source "macro-example-verbose.janet" :sourcemap @[ (3 1) (4 3) (4 3) (4 3) (4 3) (5 5) (4 3) (4 3) (4 3) (4 3) (6 5) (6 5) (6 5) (6 5) (6 5) (7 7) (6 5) (6 5) (4 3) (4 3) (4 3) (4 3) (8 5) (8 5) (8 5)] :structarg false :vararg false}

Don’t worry about actually understanding the bytecode for main, just take a look at the constants for strong evidence that the abstract syntax tree we returned actually did make its way in there somewhere.

And indeed, when we run it, we can see that main does what we expect:

janet -i macro-example-verbose.jimage
about to execute (var sum 0)
about to execute (for i 0 10 (+= sum i))
about to execute (print sum)
45

Neat.

So macros are like functions, but they always run at compile time no matter where they appear in your program. Here’s how it works:

  1. Janet goes through each expression in your file in order from top to bottom.
  2. Before it executes any expression — defn statements, top-level side effects, whatever — it first performs “macro expansion” on the abstract syntax tree of the expression. This means that Janet scans the abstract syntax tree for any previously-defined macros, and…
  3. …for every macro that Janet finds, it invokes the macro’s underlying abstract-syntax-tree-manipulation-function, passing it all of its “arguments” as abstract syntax trees.
  4. Janet then looks through the abstract syntax tree that that function returned, and checks if it has any macro invocations. If so, goto 3.
  5. Once Janet reaches a fixed point where there are no more macros to expand, it replaces the macro invocations with their fully-expanded results, creating a new top-level abstract syntax tree.
  6. Janet then compiles the top-level abstract syntax tree into a function.
  7. And calls it.

Alright.

Macros.

That’s macros.

We did it.

Except, gosh, we haven’t really done it at all, have we?

We’ve only scratched the surface. We’ve only seen the absolute most boring, basic macro you could imagine.

So let’s do something exciting, before we draw this chapter to a close.

We’re going to write a macro that will read a SQLite schema file at compile time, then use that to generate functions that will query a corresponding SQLite database at runtime.

Should you actually do this? No. This is veering so far into “magic” territory that I cannot in good conscience endorse anything like it. But I think it is a good demonstration of the power of metaprogramming, and it’ll be fun.

Janet has a semi-first-party sqlite3 package; we’re going to rely on that to do all of the heavy-lifting. I mean, to the extent that we’re lifting anything. It’s actually going to be pretty light.

First off, we’ll need a schema. I’ll use something really simple for the sake of this example:

create table people(
  id integer primary key,
  name text not null
);

create table grudges(
  id integer primary key,
  holder integer not null,
  against integer not null,
  reason text not null,
  foreign key(holder) references people(id),
  foreign key(against) references people(id)
);

Next we’ll need a dummy database:

sqlite3 db.sqlite
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite> .read schema.sql
sqlite> insert into people values (1, 'ian');
sqlite> insert into people values (2, 'jeffrey');
sqlite> insert into grudges values (1, 1, 2, 'claimed that my chapter on macros was "impenetrable"');
sqlite> insert into grudges values (2, 1, 2, 'doesn''t even have any dogs');

Next we’ll write some code to generate functions to query that schema. I’m waving my hands over how I have the sqlite3 Janet package; we’ll talk more about using libraries in Chapter Seven.

querify.janet
(import sqlite3)

(defmacro querify [schema-file]

  (defn table-definition [table-name]
    ~(defn ,(symbol table-name) [conn]
      # This might be vulnerable to some kind of wild SQL injection attack
      # if an attacker somehow controls your schema file (???), but you can't
      # bind table names as parameters and also this is definitely fine.
      (sqlite3/eval conn ,(string/format "select * from %s;" table-name))))

  (def conn (sqlite3/open ":memory:"))
  (sqlite3/eval conn (string (slurp schema-file)))
  (def tables
    (->> (sqlite3/eval conn "select name from sqlite_schema where type = 'table';")
      (map |($ :name))
      (filter |(not (string/has-prefix? "sqlite_" $)))))
  (sqlite3/close conn)

  ~(upscope
    ,;(map table-definition tables)))

(querify "schema.sql")

(defn main [&]
  (def conn (sqlite3/open "db.sqlite"))
  (pp (people conn))
  (pp (grudges conn)))

Look! We’re able to call the functions people and grudges that only exist because there are tables with those names in our schema.

janet querify.janet
@[{:id 1 :name "ian"} {:id 2 :name "jeffrey"}]
@[{:against 2 :holder 1 :id 1 :reason "claimed that my chapter on macros was \"impenetrable\""} {:against 2 :holder 1 :id 2 :reason "doesn't even have any dogs"}]

Once again: I’m not really endorsing this. This is just a party trick. It’s a quick and dirty example of how easy it is to generate code at compile time, but you could imagine applying this technique in a different situation where it would actually be useful. For example, you could query the National Weather Service and make your code execute more slowly if someone compiles it on a rainy day. They have an API, you know.

But before we do that, let’s make sure that we actually understand this macro first. That tables expression is probably the trickiest part of all of this to digest, because I snuck in some new stuff, but I’ll walk you through it.

(def tables
  (->> (sqlite3/eval conn "select name from sqlite_schema where type = 'table';")
    (map |($ :name))
    (filter |(not (string/has-prefix? "sqlite_" $)))))

Loosely speaking, you can read ->> as a Janet analog of method-chaining in JavaScript. It’s a macro that— hey! Wait!

It’s a macro!

And it’s actually an example of a good macro. So let’s put a pin in whatever nonsense we were doing for a bit, and let’s talk about ->>.

(->> (sqlite3/eval conn "select name from sqlite_schema where type = 'table';")
  (map |($ :name))
  (filter |(not (string/has-prefix? "sqlite_" $))))

->> is pronounced “thread last.” All it does is take its first argument and stick it at the end of its second argument:

(->>
  (map |($ :name)
    (sqlite3/eval conn "select name from sqlite_schema where type = 'table';"))
  (filter |(not (string/has-prefix? "sqlite_" $))))

And then it repeats that until there aren’t any expressions left to merge:

(->>
  (filter |(not (string/has-prefix? "sqlite_" $))
    (map |($ :name)
      (sqlite3/eval conn "select name from sqlite_schema where type = 'table';"))))
(filter |(not (string/has-prefix? "sqlite_" $))
  (map |($ :name)
    (sqlite3/eval conn "select name from sqlite_schema where type = 'table';")))

Neat. ->> is a nice bit of syntax sugar that lets us reduce the amount of paren-nesting we have to contend with. It’s also a much simpler macro than the weird metaprogramming function-generation thing we’re doing right now, and probably would have made a better first example, but it’s too late to change that now.

Here’s one way we could implement it, taking advantage of the fact that Janet will continue expanding macros until there aren’t any left to expand to implement a sort of recursive macro without an explicit recursive call:

(defmacro my->> [first & rest]
  (match rest
    [next & rest] ~(my->> (,;next ,first) ,;rest)
    [] first))

In addition to ->>, there’s also -> (“thread first”), which as you might expect from the name does the same thing but threads values as the first argument instead of the last one, as well as -?> and -?>>, which short-circuit nil values, plus as-> and as?-> which allow you to thread into any position.

Alright; let’s get back to the tables expression:

(def tables
  (->> (sqlite3/eval conn "select name from sqlite_schema where type = 'table';")
    (map |($ :name))
    (filter |(not (string/has-prefix? "sqlite_" $)))))

The next weird thing is |($ :name). This is just a shorthand way to create an anonymous function: |($ :name) is exactly the same as (fn [$] ($ :name)), and (struct :key) is one way to look up the value of a key on a table or struct. It looks like a function call, but the thing being “called” is not a function, so Janet does a lookup instead. We could also write |(in $ :name) to make the lookup explicit.

So the whole expression is equivalent to the following JavaScript:

const tables = conn.eval("select name from sqlite_schema where type = 'table';")
  .map(x => x.name)
  .filter(name => !name.startsWith("sqlite_"));

Alright. Hopefully now you can understand how the querify macro works, but I think it’s always easier to understand macros by looking at their “expansions.”

The easiest way to do this is to open up the Janet repl with our script loaded in, and to invoke the function macex1:

janet -l ./querify
repl:1:> (macex1 '(querify "schema.sql"))
(upscope
  (defn people [conn]
    (sqlite3/eval conn "select * from people;"))
  (defn grudges [conn]
    (sqlite3/eval conn "select * from grudges;")))

macex1 (“macro expand once”) is not a macro; it’s a function that expects an abstract syntax tree, so we have to quote the expression we want it to expand.

We could also use macex, which will fully expand its argument:

repl:2:> (macex '(querify "schema.sql"))
(upscope
  (def people "(people conn)\n\n"
    (fn people [conn]
      (sqlite3/eval conn "select * from people;")))
  (def grudges "(grudges conn)\n\n"
    (fn grudges [conn]
      (sqlite3/eval conn "select * from grudges;"))))

And here we can see that defn is just a macro that expands to (def ... (fn ..)), with a docstring added for good measure. Since this is further away from what we actually wrote, I find that sometimes it’s harder to understand the fully expanded output. I usually prefer to keep calling (macex1 _) until I reach the fixed point.

So, just to recap: every time we compile this program, we create an in-memory SQLite database, load in our schema, and then query that database to get a list of table names. And then we generate some functions with the same names. And then we compile them into our image.

If you were able to follow all of that, then congratulations: you just earned your macro white belt.

Actually, ugh. No. Not quite yet. I’m sorry. You’re really close, and you’re doing great, and I know this chapter is way too long already. But we can’t actually leave until we talk about hygiene.

But you’re tired, and I’m tired, and this is a lot of information to take in at once, and hygiene is sort of a tricky concept to wrap your head around at the best of times.

So here’s all I’m going to say about it for now:

each-reverse.janet
(defmacro each-reverse [identifier list & body]
  ~(do
    (var i (- (length ,list) 1))
    (while (>= i 0)
      (def ,identifier (in ,list i))
      ,;body
      (-- i))))

This macro is simple enough. And it looks like it works just fine:

hygiene.janet
(use ./each-reverse)

(each-reverse num [1 2 3 4 5]
  (print num))
janet hygiene.janet
5
4
3
2
1

But if we try this alternative…

name-clash.janet
(use ./each-reverse)

(each-reverse i [1 2 3 4 5]
  (print i))
janet name-clash.janet
name-clash.janet:4:1: compile error: cannot set constant

We learn that this macro is actually fragile.

And if we continue interrogating it:

over-eval.janet
(use ./each-reverse)

(each-reverse x (do (os/sleep 1) [1 2 3 4 5])
  (print x))
time janet over-eval.janet
5
4
3
2
1
janet over-eval.janet  0.01s user 0.00s system 0% cpu 6.038 total

We can see that it’s quite inefficient as well.

If you think about these programs in terms of their expanded abstract syntax trees, it’s easy to see why these problems happen:

name-clash.expanded.janet
(do
  (var i (- (length [1 2 3 4 5]) 1))
  (while (>= i 0)
    # we accidentally shadow i
    (def i (in [1 2 3 4 5] i))
    (print i)
    (-- i)))
over-eval.expanded.janet
(do
  (var i (- (length (do (os/sleep 1) [1 2 3 4 5])) 1))
  (while (>= i 0)
    # we re-evaluate the list on every iteration
    (def x (in (do (os/sleep 1) [1 2 3 4 5]) i))
    (print x)
    (-- i)))

But it isn’t clear what we can do to stop it.

Now, since you know what the macro is going to expand to, you could carefully avoid making any variables called i, and you could make sure that you only pass in a pure, simple expression as your list.

But that’s terrible. Macros should be referentially transparent: you should be able to call a macro without any idea of the actual abstract syntax tree that it produces. A macro that you have to handle with gloves on is not a good macro, in my book.

But there’s a bit of an art to writing robust, referentially transparent macros. It’s not hard to do, but it’s not trivial, and I think it deserves a chapter of its own.

But not, umm, not the next chapter. We’ve done enough metaprogramming for now. Let’s switch to something else, and circle back to this in Chapter Thirteen.

Chapter Four: Pegular Expressions

Janet does not have native, built-in regular expressions.

You can use a third-party regular expression library if you really have to, I dunno, validate an email address or something. But most of the time, if you’re writing Janet, you’ll be writing PEGs instead.

PEG stands for “parsing expression grammar,” which is a mouthful, so I’m going to stick with the acronym, even though I just wrote a whole chapter about macros without abbreviating AST once.

As a first — extremely crude — approximation, you can think of PEGs as an alternative notation for writing regular expressions. That’s not actually correct — PEGs are quite a bit more powerful than regular expressions, for starters, and they behave differently in a few respects — but we have to start somewhere, and this will let us re-use a lot of our existing knowledge.

Here, let’s look at a few random regular expressions, and see how we’d write them in PEG format.

regex: .*
  peg: (any 1)

1 means “match one byte.” any means “zero or more.”

regex: (na)+
  peg: (some "na")

Strings match literally. There are no special characters to escape. some means “one or more.”

regex: \w{1,3}
  peg: (between 1 3 (choice :w "_"))
  peg: (between 1 3 (+ :w "_"))

Janet’s :w does not include _, so we use + to say “word character or underscore.” (+ ...) is an alias for (choice ...). between is inclusive on both ends.

regex: [^a-z-]
  peg: (not (choice "-" (range "az")))
  peg: (! (+ "-" (range "az")))

(! ...) is an alias for (not ...). You can negate any PEG, not just character classes.

regex: [a-z][0-9]?
  peg: (sequence (range "az") (opt (range "09")))
  peg: (* (range "az") (? (range "09")))

* matches all of its arguments in order. (* ...) is an alias for (sequence ...). ? means “zero or one,” and (? ...) is an alias for (opt ...).

Those are pretty random examples, and this is nowhere near an exhaustive list, but it’s enough for you to start forming a general idea. Let’s notice a few things from this:

  1. PEGs are quite a bit more verbose than regular expressions.

  2. PEGs use a lot of characters that usually mean something else.

  3. PEGs are structured trees, rather than strings.

Alright, now let’s talk about some of the ways that these patterns differ from their regular expression equivalents.

First off, PEGs are always anchored to the beginning of the input, so there’s no equivalent “start of input” pattern. So (any 1) is actually equivalent to the regular expression ^.*.

Except, no, that’s not strictly true. Because PEGs do not backtrack. Which means that all repetition is implicitly ”possessive,” to use the regular expression term. So (any 1) is actually actually equivalent to ^.*+, which is not a construct that JavaScript’s regular expression engine supports.

PEGs do backtrack when using the choice combinator, as well as a few others. But backtracking is always obvious and explicit, as opposed to regular expressions’ implicit backtracking everywhere. This makes it less likely that you’ll accidentally write a PEG that executes in exponential time.

Alright. There’s one more thing we should talk about before we get to a concrete example: numbers.

We’ve seen 1 already, as a way to match any byte. You can write any other integer — 2, say, or even 3 — to match exactly that number of bytes.

But you can also write negative numbers. Negative numbers don’t advance the input at all, and they fail if you could advance that many characters. So -4 will fail unless there are fewer than four bytes left in the input. In practice I’ve only ever used this to write -1, which means “end of input.” I don’t think -1 is a particularly intuitive way to write “end of input,” so I wanted to call this out ahead of time.

Now that we’ve covered the basics, let’s look at a real example. Let’s write an HTML pretty printer.

(defn element-to-struct [tag attrs children]
  {:tag tag :attrs (struct ;attrs) :children children})

(def html-peg (peg/compile
  ~{:main (* :nodes -1)
    :nodes (any (+ :element :text))
    :element (unref
      {:main (/ (* :open-tag (group :nodes) :close-tag) ,element-to-struct)
       :open-tag (* "<" (<- :w+ :tag-name) (group (? (* :s+ :attributes))) ">")
       :attributes
         {:main (some (* :attribute (? :s+)))
          :attribute (* (<- :w+) "=" :quoted-string)
          :quoted-string (* `"` (<- (any (if-not `"` 1))) `"`)}
       :close-tag (* "</" (backmatch :tag-name) ">")})
    :text (<- (some (if-not "<" 1)))}))

(defn main [&]
  (def input (string/trim (file/read stdin :all)))
  (pp (peg/match html-peg input)))

Okay wow; we’re just diving right in huh.

First off, this isn’t really an HTML pretty printer; this is only an HTML parser. Well, strictly speaking, it’s a parser for a small subset of HTML — enough to make a point, without getting bogged down in minutiae.

So what are we looking at here?

First off, the outer pattern is a struct. The keys are names, and the values are patterns, and these patterns can reference other patterns by name — even recursively. Even mutually recursively, as you can see with :nodes and :element referring to one another.

We’ve seen named patterns like :w before, when I said it was an analog of regular expressions’ \w. But those are only the default pattern aliases, and by writing a struct like this we can create our own custom aliases, with scoping rules that make sense: patterns inside nested structs can refer to elements in the “outer struct,” but not the other way around.

Okay. Now let’s try to go through these individual patterns and make sure we understand them.

:main (* :nodes -1)
:nodes (any (+ :element :text))

The name :main is special, as that will be the pattern’s entry-point. This :main just calls :nodes, which matches zero or more :elements or :texts, and then asserts that there’s no input left. (* "x" -1) is like the regular expression ^x$.

:text (<- (some (if-not "<" 1)))

:text uses a combinator that we haven’t seen before: <-.

<- is an alias for (capture ...). We haven’t talked about captures yet, but they work similarly to regular expressions’ captures.

Just to quickly review, consider the regular expression <([^<]*)>. The parentheses around the innards there mean that there is a single “capture group,” and if we run this expression over a string, we can extract that match:

node
Welcome to Node.js v16.16.0.
Type ".help" for more information.
> /<([^<]*)>/.exec('<hello> there')
[
  '<hello>',
  'hello',
  index: 0,
  input: '<hello> there',
  groups: undefined
]

This returns an array of captured groups. The first element is the entire substring that matched the regular expression; the second is the text that matched the first (and in this case only) capture group.

> /<([^<]*)>/.exec('<hello> there')[1]
'hello'

PEGs work similarly: when you match a PEG over a string, you get a list of captures back.

repl:1:> (peg/match ~(* "<" (any (if-not ">" 1)) ">") "<hello>")
@[]

The list is empty here, because PEGs don’t implicitly capture anything. We have to explicitly ask for a capture, using <-:

repl:2:> (peg/match ~(* "<" (<- (any (if-not ">" 1))) ">") "<hello>")
@["hello"]

We could also capture the entire matching substring, if we wanted to:

repl:3:> (peg/match ~(<- (* "<" (<- (any (if-not ">" 1))) ">")) "<hello>")
@["hello" "<hello>"]

But note that captures show up “inside out.” (<- pat) first matches pat, which might push captures of its own, and then it pushes the text that pat matched.

So far this looks basically like regex country. But PEGs allow you to do so much more with captures. Here, let’s look at a slightly more interesting example:

repl:4:> (peg/match ~(* "<" (/ (<- (any (if-not ">" 1))) ,string/ascii-upper) ">") "<hello>")
@["HELLO"]

(/ ...) is an alias for (replace ...), which is a misleading name: if you pass it a function, it doesn’t replace the capture with the function, but actually maps the function over the captured value. And if you pass it a table or a struct, it looks up the capture as a key and replaces it with the value. If you pass any other values, then it actually replaces. (If actually you want to actually replace a capture with a function or a table, you have to wrap it in a function that ignores its argument.)

So we’re mapping the function string/ascii-upper over the value captured by (<- (any (if-not ">" 1))), which happens to produce a new string. But it doesn’t have to!

repl:5:> (peg/match ~(* "<" (/ (<- (any (if-not ">" 1))) ,length) ">") "<hello>")
@[5]

Our captures can be any Janet values — they don’t have to be strings. (<- pat) always captures the string that pat matches, but you can always map it, and there are other combinators that capture other things. Take $:

repl:6:> (peg/match ~(* "<" (* (<- (any (if-not ">" 1))) ($)) ">") "<hello>")
@["hello" 6]

($) is an alias for (position). It’s a pattern that always succeeds, consumes no input, and adds the current byte index to the capture stack. There’s also (line) and (column), which do what you expect.

But the most useful capture alternative is the constant operator. (constant x) always succeeds, consumes no input, and adds an arbitrary value to the capture stack. It’s useful for parsing text into something with a little more structure:

repl:7:> (peg/match ~(any (+
    (* "↑" (constant :up))
    (* "↓" (constant :down))
    (* "←" (constant :left))
    (* "→" (constant :right))
    (* "A" (constant :a))
    (* "B" (constant :b))
    (* "START" (constant :start))
    1))
    "↑↑↓↓←→←→ B A START")
@[:up :up :down :down :left :right :left :right :b :a :start]

Okay. This has been: PEG Captures 101. Now let’s get back to our HTML example.

:text (<- (some (if-not "<" 1)))

Right. So (<- (some (if-not "<" 1))) is equivalent to the regular expression ^([^<]++). It tries to match "<", and if that fails — if the next character is not < — then it advances by one character. And then it repeats, until it finds a < character or runs out of input, and finally it adds the entire string it consumed to the capture stack.

So if we give it the following input, it’s going to match the following substring:

hello yes this is <b>janet</b>

Easy. The next part is… not so easy.

:element (unref
  {:main (/ (* :open-tag (group :nodes) :close-tag) ,element-to-struct)
   :open-tag (* "<" (<- :w+ :tag-name) (group (? (* :s+ :attributes))) ">")
   :attributes
     {:main (some (* :attribute (? :s+)))
      :attribute (* (<- :w+) "=" :quoted-string)
      :quoted-string (* `"` (<- (any (if-not `"` 1))) `"`)}
   :close-tag (* "</" (backmatch :tag-name) ">")})

But we’ll take it one step at a time, and it’ll be fine.

The whole pattern is wrapped in unref, but I can’t actually explain that until the end, so we’ll skip over it for now and jump straight to :main. We’ll circle back to unref after we talk about backreferences.

:main (/ (* :open-tag (group :nodes) :close-tag) ,element-to-struct)

So an :element consists of an opening tag, some child nodes, and then a matching closing tag. Like <i>hello</i>.

But we don’t match :nodes; we match (group :nodes). Because recall that :nodes is going to push multiple nodes onto the capture stack:

:nodes (any (+ :element :text))

Specifically, anything captured in :element or :text. But (group :nodes) says “well, instead of pushing every capture individually, wrap all the captures into a tuple and push that tuple.” So we’ll match multiple nodes, but we’ll only have a single (possibly empty!) list of nodes on the capture stack when we’re done.

After we parse all of a tag’s individual components — tag name, attributes, and children — we’ll call element-to-struct to wrap it up into a nicer format. Note that element-to-struct actually takes three arguments: one for each of :element’s capture groups. (The tag name and attributes are captured by the :open-tag sub-pattern.)

But actually matching the tags is the interesting bit.

:open-tag (* "<" (<- :w+ :tag-name) (group (? (* :s+ :attributes))) ">")
:close-tag (* "</" (backmatch :tag-name) ">")

I want to draw your attention to (<- :w+ :tag-name). This is a tagged capture, and :tag-name is its “tag.” When you tag a capture, you can refer back to it later in the match — that’s exactly what (backmatch :tag-name) does.

But hark! There might be multiple tagged captures to contend with.

<p>If you have <em>nested</em> tags</p>

<p> will push a tagged capture to the stack, and so will <em>. So now there are two captures tagged :tag-name. But when we backmatch, we’re going to look for the most recent time we tagged a capture with :tag-name — which is going to be "em". This will match </em> successfully, but of course it will fail once we get to </p>.

And that’s bad! What we want to do is “scope” the tagged matches, so that parsing the <em> tag doesn’t leak out to our parsing of the <p> tag.

So that’s exactly what unref does. It says “after you’re done parsing this pattern, remove all of the tags that you associated with any captures.” By wrapping unref around our :element, we make these tagged captures local to each <tag>.

Okay, now you might be thinking: why is this a problem? Sure, we pushed "em" to the capture stack, but then we popped it off! We replaced it with an untagged struct when we called element-to-struct, right? Why can backmatch still see it?

Well, tagged captures are actually separate from the capture stack. backmatch doesn’t look for “the uppermost capture on the stack with this tag” — the tags don’t live on the capture stack at all. backmatch actually looks for “the last time we captured something with this tag.”

To help make this make sense, I’m going to describe a model of how you might implement a simple PEG matcher. We’ll keep track of two pieces of state: a stack of stacks, and a stack of tag scopes. We’ll start with a single stack on the stack-stack, and a single scope on the scope-stack, and different combinators will manipulate these.

The group combinator, for example, pushes a new stack onto the stack-stack, executes its pattern, and then pops that new stack and pushes it onto next highest stack (as a tuple). The replace combinator pushes a new stack, executes its pattern, then pops it off the stack-stack, passing its contents as positional arguments to its function. And then it pushes the return value to the new topmost stack on the stack-stack.

Meanwhile unref pushes a new tag scope, executes its pattern, and then pops the tag scope once it’s done. unref is the only combinator that affects the tag scope stack.

Alright. Now the only thing we haven’t talked about is the :attributes bit.

:attributes
  {:main (some (* :attribute (? :s+)))
   :attribute (* (<- :w+) "=" :quoted-string)
   :quoted-string (* `"` (<- (any (if-not `"` 1))) `"`)}

And I actually don’t think there’s much to say about this? You’ve seen it all already. This is easy. :s+ is “one or more whitespace characters,” and is one of many named patterns available by default.

Alright. That wasn’t so bad, was it?

(defn element-to-struct [tag attrs children]
  {:tag tag :attrs (struct ;attrs) :children children})

(def html-peg (peg/compile
  ~{:main (* :nodes -1)
    :nodes (any (+ :element :text))
    :element (unref
      {:main (/ (* :open-tag (group :nodes) :close-tag) ,element-to-struct)
       :open-tag (* "<" (<- :w+ :tag-name) (group (? (* :s+ :attributes))) ">")
       :attributes
         {:main (some (* :attribute (? :s+)))
          :attribute (* (<- :w+) "=" :quoted-string)
          :quoted-string (* `"` (<- (any (if-not `"` 1))) `"`)}
       :close-tag (* "</" (backmatch :tag-name) ">")})
    :text (<- (some (if-not "<" 1)))}))

(defn main [&]
  (def input (string/trim (file/read stdin :all)))
  (pp (peg/match html-peg input)))

When you look at it all at once, it is pretty intimidating. But just think what the equivalent regular expression would look like! Oh, wait. You can’t. Parsing HTML with regexes is famously impossible.

We’ve already seen a lot of useful PEG combinators, but we’re not limited to the built-in operations that Janet gives us. We can actually interleave arbitrary functions into a PEG, and use them to guide the matching process. This allows us to write custom predicates to express complicated matching logic that would be very difficult to implement natively in a PEG (“identifier with more vowels than consonants”), but it’s especially useful when we already have a regular function that knows how to parse strings.

For example, scan-number is a built-in function that parses numeric strings into numbers:

repl:1:> (scan-number "512")
512
repl:2:> (scan-number "512x")
nil

If we wanted to parse a number somewhere in a PEG, then… well, we’d use the built-in (number) operator that does exactly that. But let’s pretend that that doesn’t exist for a second, and try to implement it in terms of scan-number. Here’s a first attempt:

repl:1:> (peg/match ~(/ (<- (some (+ :d (set ".-+")))) ,scan-number) "123")
@[123]

That works, sometimes. But of course that number pattern is not very accurate, and we already saw that scan-number will return nil if we give it a bad input:

repl:2:> (peg/match ~(/ (<- (some (+ :d (set ".-+")))) ,scan-number) "1-12-3+3.-++")
@[nil]

But the match still succeeded, and captured nil, because that was what we told it to do.

So we could try to carefully write a valid number pattern here, such that we only ever pass valid input to scan-number. But we don’t want to do that. That sounds hard. We just want the pattern to fail if scan-number can’t actually parse a number.

Enter cmt:

repl:3:> (peg/match ~(cmt (<- (some (+ :d (set ".-+")))) ,scan-number) "1-12-3+3.-++")
nil

So cmt is very similar to replace, except that if your function returns something falsy (remember: just nil or false), then the cmt clause itself will fail to match. It’s sort of like a map vs filterMap situation.

cmt stands for “match-time capture,” apparently, even though the letters are not in that order. The name comes to us from LPEG, the Lua PEG library that inspired Janet’s PEG library, where all capture-related functions start with C. It’s a very useful function despite the confusing name, and there’s something else that makes it even more useful: the -> operator.

-> stands for backref, and it looks quite strange at first glance: all it does is re-capture a previously tagged capture. If you just used it by itself, it would duplicate previously tagged captures onto the capture stack and consume no input, which doesn’t sound very useful.

repl:1:> (peg/match ~(* (<- :d+ :num) (-> :num)) "123")
@["123" "123"]

But if you use it inside the pattern you pass to cmt, you can add previous captures as arguments to your custom mapping predicate.

Here’s a concrete, if extremely dumb, example: I’ve invented my own HTML dialect that is identical to regular HTML, except that <head> tags can optionally be closed with a </tail> tag, because that’s modestly whimsical.

Previously we were able to use backmatch to match closing tags, because they happened to be bytewise-identical to the values we captured in :open-tag:

:close-tag (* "</" (backmatch :tag-name) ">")

But now that’s no longer true, and backmatch isn’t sufficient to handle this very practical HTML dialect. We’ll have to write some logic:

(defn check-close-tag [open-tag close-tag]
  (or (= open-tag close-tag)
      (and (= open-tag "head")
           (= close-tag "tail"))))
:close-tag (* "</" (drop (cmt (* (-> :tag-name) (<- :w+)) ,check-close-tag)) ">")

Notice that we “re-capture” :tag-name, in addition to capturing the :w+. Because cmt needs a single pattern to execute, I stuck them together with *, but both of these captures will be passed as arguments to check-close-tag.

Neat.

We are now very close to knowing everything there is to know about PEGs, but I think we should talk about one more thing before we leave this chapter:

Regular expressions aren’t just useful for matching or extracting text. They’re also useful for changing text.

Regex replace is a common primitive operation; you use it all the time in your editor or with sed or whatever. And of course Janet has a native peg/replace function, and we’re going to talk about it soon.

But let’s just pretend, for a moment, that it doesn’t exist. Because it turns out that you don’t actually need a built-in PEG replace function: you can implement replacement as a special case of capturing.

It’s a pretty simple trick: we’re going to write a PEG that captures two things: the part of the string that matches the pattern we want to replace, and the entire rest of the string.

Just so we have something concrete to work with, let’s write a chaotic evil PEG: given a string, we’ll find all of the Oxford commas in that string, and replace them with Oxford semicolons.

So given input like this:

this is dumb, confusing, and upsetting

We’ll wind up with:

this is dumb, confusing; and upsetting

Naturally.

So the PEG itself is easy: we just want to match the literal string ", and", wherever it appears in the input:

repl:1:> (peg/match ~(any (+ ", and" 1)) "a, b, and c")
@[]

Okay. It did work; you just can’t tell. Let’s replace it, which will automatically capture the output, so we can at least see that it’s working:

repl:2:> (peg/match ~(any (+ (/ ", and" "; and") 1)) "a, b, and c")
@["; and"]

Okay. And now let’s also capture everything else:

repl:3:> (peg/match ~(any (+ (/ ", and" "; and") (<- 1))) "a, b, and c")
@["a" "," " " "b" "; and" " " "c"]

And we’re done! Sort of! We have the entire modified string, as a list of captures, and all we have to do now is stick them back together.

And I know: this looks unbelievably inefficient. And it would be, if we just called, like, string/concat on this result. But Janet has a way to efficiently join these matches together without even making these intermediate string allocations in the first place.

It’s called accumulate, although I’m going to use the short alias %:

repl:4:> (peg/match ~(% (any (+ (/ ", and" "; and") (<- 1)))) "a, b, and c")
@["a, b; and c"]

And accumulate is special-cased in the PEG engine: while Janet is executing the pattern inside an accumulate block, anything that would normally push captures onto the stack instead just copies it into a shared mutable buffer. And once it’s done with its pattern, that buffer becomes a string, and accumulate pushes it onto the capture stack.

So that’s a global replace. But what if you only want to replace the first occurrence?

Here’s one way:

repl:5:> (peg/match ~(% (any (+ (* (/ ", and" "; and") (<- (to -1))) (<- 1)))) "a, and b, and c")
@["a; and b, and c"]

After we match and replace the pattern, we immediately consume the rest of the string, so that the any repetition won’t fire again.

Hey look! We did it. accumulate was the last combinator on my list of combinators to tell you about, and I just told you about it. That means we’re almost done with the chapter now.

But we get to do something fun and easy first. There’s actually another way that we could have written that last pattern:

repl:6:> (peg/match ~(% (any (+ (* (/ ", and" "; and") '(to -1)) '1))) "a, and b, and c")
@["a; and b, and c"]

We replaced all of the (<- x) captures with just 'x, which does exactly the same thing. How does that work?

Well, 'x is actually just syntax sugar for (quote x). They both parse into exactly the same abstract syntax tree: if you’re writing a macro or a PEG engine or whatever else, you can’t actually tell whether it was originally written using the ' shorthand or not. So when the whole thing is quasiquoted:

repl:7:> ~(% (any (+ (* (/ ", and" "; and") '(to -1)) '1)))
(% (any (+ (* (/ ", and" "; and") (quote (to -1))) (quote 1))))

All of those single-quotes get expanded into (quote) forms, and quote is just another alias for capture in the PEG parser. But when you use the shorthand, you can save quite a few parentheses.

Now, it’s fun to work through these examples, and I think it’s valuable to understand how they work, just in case you ever find yourself needing to perform some weird text surgery deep inside some complicated PEG. But of course, in real life, you only have to write:

repl:1:> (peg/replace ", and" "; and" "a, b, and c")
@"a, b; and c"

There is also (peg/replace-all), (peg/find), which returns the index of the first match; and (peg/find-all), which returns all of the indices where the PEG would match.

Alright. That’s all of the important PEG stuff sorted, but I want to close with a few scattered, wandering observations:

  1. PEGs operate on bytes, not characters.

  2. PEGs are harder to debug than regular expressions.

  3. You can compile PEGs ahead of time.

  4. You can define your own combinators.

  5. PEGs are the best.

Chapter Fibe: Concurrency and Coroutines

It’s because this is a chapter about fibers.

JavaScript doesn’t have fibers, so I’m going to pretend like you’ve never heard of them before, even though you might be familiar with them from another language already.

The word “fiber” is a cute play on “thread:” a fiber is a lot like a thread, but it’s smaller and lighter. And a thread is a lot like a string, except— wait, no. That’s not right.

We could try to compare threads and fibers and talk about how a fiber is essentially lightweight cooperatively scheduled thread, but I don’t think that provides any useful intuition. If you’re programming with threads, you’re doing it because you have no other choice: your performance constraints require it. If you’re programming with fibers, you’re probably doing it because it’s fun and pleasant and it makes your code easier to read.

So let’s instead approach fibers from first principles. Let’s not think about threads or concurrency at all; let’s just get a hold of a fiber and see how it feels.

fiber.janet
(defn print-something []
  (print "something"))

(def fiber (fiber/new print-something))
janet fiber.janet

Okay, nothing happened.

We created a fiber by giving it a function, but it didn’t call the function. Or really: it didn’t call the function yet. It will call the function as soon as we ask it to:

fiber.janet
(defn print-something []
  (print "something"))

(def fiber (fiber/new print-something))
(resume fiber)
janet fiber.janet
something

There it is.

Now this is obviously boring, so let’s make it slightly more interesting:

fiber.janet
(defn range [count]
  (for i 0 count
    (yield i))
  "done")

(def fiber (fiber/new (fn [] (range 5))))

(print (resume fiber))
(print (resume fiber))
(print (resume fiber))
(print (resume fiber))
(print (resume fiber))
(print (resume fiber))
(print (resume fiber))
janet fiber.janet
0
1
2
3
4
done
error: cannot resume fiber with status :dead
  in _thunk [fiber.janet] (tailcall) on line 16, column 8

Alright. So hopefully this isn’t too weird; this is exactly like the following generator in JavaScript:

function* range(count) {
  for (let i = 0; i < count; i++) {
    yield i;
  }
  return "done";
}

Except that Janet throws an error if we try to resume a fiber that has already returned, while JavaScript just gives you undefined if you call .next() on a completed generator.

Fibers are iterable in Janet — just like generators are iterable in JavaScript — so we’d probably write something like this instead:

(defn range [count]
  (for i 0 count
    (yield i))
  "done")

(def fiber (fiber/new (fn [] (range 5))))

(each value fiber
  (print value))

Which prints 0 through 4, but ignores the final return value.

There’s an important difference between Janet generators and JavaScript generators, though:

(defn yield-twice [x]
  (yield x)
  (yield x))

(defn double-range [count]
  (for i 0 count
    (yield-twice i)))

(def fiber (fiber/new (fn [] (double-range 5))))

(each value fiber
  (print value))
janet fibers.janet
0
0
1
1
2
2
3
3
4
4

You can’t do that in JavaScript, because in JavaScript a generator is “scoped” to a single function. You can’t yield from a regular function and expect it to “know” that you were calling it from a generator function.

JavaScript does have a way to yield all of the values from another generator:

function* yieldTwice(x) {
  yield x;
  yield x;
}

function* range(count) {
  for (let i = 0; i < count; i++) {
    yield* yieldTwice(i);
  }
}

But this is essentially just syntax sugar for iterating over the generator returned by yieldTwice and yielding all of its values.

In Janet, though, yield does not return control from a function. It returns control from a fiber. And a fiber has a whole call stack of its very own, so when you call yield, it might have to jump “up” several stack frames at once to yield the value back to the place that called resume.

Except, really, you aren’t jumping up the stack. You’re jumping across, to a different stack. The call stack that you yielded from is still there, in all of its glory, and you can always jump back over to it by calling resume again.

But there’s something else that you can do to actually jump up and unwind the fiber’s call stack: you can raise an exception.

fiberror.janet
(defn do-your-best []
  (error "oh no"))

(defn believe-in-yourself []
  (while true
    (do-your-best)))

(def fiber (fiber/new believe-in-yourself))

(resume fiber)
janet fiberror.janet
error: oh no
  in do-your-best [fiberror.janet] on line 2, column 3
  in believe-in-yourself [fiberror.janet] on line 6, column 5
  in _thunk [fiberror.janet] (tailcall) on line 10, column 1

Okay, so, that’s probably what you expected. We raised an exception; we got an error.

But an exception doesn’t have to propagate all the way up to the root of our program. We can create fibers that intercept exceptions for us:

fiber-caught.janet
(defn do-your-best []
  (error "oh no"))

(defn believe-in-yourself []
  (while true
    (do-your-best)))

(def fiber (fiber/new believe-in-yourself :e))

(resume fiber)

The only difference is that I added the :e argument to the fiber/new call. And now it seems like nothing happens:

janet fiberror-caught.janet

But, in fact, something did happen. Rather than returning a yielded value, the resume call actually returned the error. We just didn’t print it:

janet -l ./fiberror-caught
repl:1:> (def fiber (fiber/new believe-in-yourself :e))
<fiber 0x6000039234F0>
repl:2:> (resume fiber)
"oh no"

But wait a minute. How do we know that that’s an error? That’s just a string. What if it yielded that value? Or just returned it?

repl:3:> (fiber/status fiber)
:error

Oh, I see.

So: the :e argument means that this fiber will, for lack of a better word, “catch” any errors thrown by the functions that it runs. It essentially acts like a barrier on the call stack: exceptions can get as far as the last call to resume, but no further.

Now this would be a pretty verbose way to program with exceptions, so Janet provides a macro called try that provides a familiar try-catch interface for creating fibers like this.

repl:4:> (try (do-your-best) ([e] (print e)))
oh no

It’s a little… weird-looking, I think. try takes two arguments: an expression to evaluate, and then a “catch” section, which is wrapped in parentheses and starts with a binding list.

But we can look at the expansion to see that this macro creates a fiber, resumes it, and then checks its status. In fact, we can even get the underlying fiber that it creates by adding a second identifier (fib) to the “catch” binding clause, which we could then use to print a stacktrace of the fiber at the time of the error:

repl:5:> (macex1 '(try (do-your-best) ([e fib] (debug/stacktrace fib e ""))))
(let [_000000 (<cfunction fiber/new> (fn [] (do-your-best)) :ie)
      _000001 (<function resume> _000000)]
  (if (<function => (<cfunction fiber/status> _000000) :error)
    (do
      (def e _000001)
      (def fib _000000)
      (debug/stacktrace fib e ""))
    _000001))

Okay. So if you’re paying too much attention, you might be concerned. We’ve already seen that we create fibers to yield from them, as a way to make our own generators. But we also create fibers every time we want to catch an exception. But what if we’re doing both?

(defn yield-dangerously [x]
  (if (< (math/random) 0.9)
    (yield x)
    (error "only way to live")))

(defn generate-safely [count]
  (for i 0 count
    (try
      (yield-dangerously i)
    ([e]
      (print "saved it")))
    (++ i)))

(def fiber (fiber/new (fn [] (generate-safely 5))))

(each value fiber
  (print value))

When we invoke yield-dangerously, it’s actually nested inside two fibers (well, three, if you count the top-level fiber that our code begins in). try creates a fiber, and we want that fiber to catch errors. But we learned previously that yield will yield to the parent fiber! So this would mean that this doesn’t work, right?

Well, fortunately, that is not the case. It works fine. The fiber that try creates will let yields just pass through to its parents — just like the fiber for the generator will allow exceptions to pass through to the top-level fiber.

This all comes down to the fiber’s “signal mask:” when you call fiber/new, the default “signal mask” is :y, for yield. This means that the fiber “intercepts” yield calls and prevents them from propagating to the parent fiber. But when we just pass :e, our fiber no longer intercepts yields. It intercepts exceptions instead.

You can pass :ye, if you want to, to intercept both “signals.” But I don’t know why you would want to do that.

Yield and error aren’t the only “signals” that fibers know about. There’s also a debug signal, which we’ll talk about in Chapter Eleven — it jumps “up the stack” to an interactive debugger, if you have one running. :yield, :error, and :debug are the only named signals, but there are also ten numbered “user” signals that you can intercept with flags :0 through :9.

Okay. Fibers. So one way to think about fibers is that they give you a way to put “labels” in your call stack, and then to say things like “jump up to the nearest point in the call stack labeled :e.” And they’re also first-class values that you can pass around and resume in order to jump arbitrarily deep into a suspended call stack.

But enough about what fibers are. Let’s switch gears, and talk about why we would actually want to use fibers when we’re programming.

So we saw try already — that’s a pretty big one. That’s useful. And we saw generators.

And generators are useful too! You can use them to generate ad-hoc sequences or elegantly traverse trees or lazily process complex data pipelines with better cache coherency and fewer intermediate allocations than Janet’s normal map and filter and reduce would give you.

In fact, there’s even a nice shorthand for declaring ad-hoc generators without having to go through fiber/new: coro.

(def fiber (coro
  (for i 0 5
    (yield i))))

(each value fiber
  (print value))

It’s called coro because, well, Janet fibers are more than just generators. They’re actually full coroutines.

“Coroutine” is a fancy word, but it’s basically the same as a generator. To use JavaScript notation for a minute: you make a generator with f(...args), and then you extract elements from it by calling .next(). You make a coroutine with f(...args) and then you extract elements from it by calling .next(arg).

The only difference between a generator and a coroutine is that the code that “consumes” or “uses” or “drives” or “schedules” or “iterates over” a coroutine doesn’t just say “give me your next value.” It says “here’s a value for you, now give me your next value.” It’s kind of like a generator whose behavior can be guided by the code iterating over it.

But in practice, you don’t use coroutines like generators at all! Generators are used as a lightweight way to interleave control flow between multiple unrelated functions, while coroutines are almost exclusively used as a way to interleave long-running side effectful operations into code without blocking your entire program.

In JavaScript, you usually use a different syntax called async/await when you’re writing this type of coroutine. async/await effectively creates a function* coroutine that only yields promises, and resumes it every time a promise completes. It’s a slightly less general — but much more convenient — interface for this most common of coroutine use cases.

Janet also special-cases this type of coroutine. When you’re programming asynchronously, you create fibers that you do not explicitly yield from, and that you do not explicitly resume elsewhere in your code. Instead, you hand the fiber to Janet, and then as your fiber executes it will implicitly yield when you invoke certain functions, and Janet — the Janet runtime, or, more specifically the Janet “event loop” — will resume your fiber in the future once it figures out the result.

The Janet “event loop” is a little scheduler that exists in the background of the Janet runtime. When you call functions that might take a long time to complete (like reading bytes from a socket), your program will actually “yield to the event loop.” Concretely this means that it raises a “user signal 9,” which will (probably) not be caught until it reaches the top-level of the Janet runtime, at which point Janet will start performing the effect you requested and then resume your fiber once it completes.

We’ll use the function ev/sleep to demonstrate how this works (ev means “event loop”):

event-loop.janet
(print "hello")
(ev/sleep 1)
(print "goodbye")
janet event-loop.janet
hello
goodbye

Oh. Right. I forgot that this is a book, so you can’t actually perceive the passage of time.

But, well, imagine the hello appearing, and then a one second pause, and then the goodbye appearing after that. It’s… it’s what you expect.

Here, let’s try to visualize the passage of time for you. We’ll print a . every 100 milliseconds.

event-loop.janet
(defn visualize-time []
  (while true
    (prin ".")
    (flush) # output is line-buffered by default
    (ev/sleep 0.1)))

(visualize-time)

(print "hello")
(ev/sleep 1)
(print "goodbye")
janet event-loop.janet
..................^C

Well, of course this just prints . forever, because I put the main fiber into an infinite loop. But that’s not really what I wanted: what I wanted was to run visualize-time in the background. To schedule it to run only when the main fiber is waiting for its asynchronous ev/sleep to complete.

To do that, we can use ev/call to both create a new fiber and to schedule that fiber to be resumed as soon as the main fiber yields to the event loop. Which it does as soon as we run ev/sleep:

event-loop.janet
(defn visualize-time []
  (while true
    (prin ".")
    (flush) # output is line-buffered by default
    (ev/sleep 0.1)))

(ev/call visualize-time)

(print "hello")
(ev/sleep 1)
(print "goodbye")
janet event-loop.janet
hello
..........goodbye
......................^C

Great. Except that, well, the program runs forever, because visualize-time just loops indefinitely. We’ll have to interrupt it to get our program to complete gracefully:

event-loop.janet
(defn visualize-time []
  (while true
    (prin ".")
    (flush) # output is line-buffered by default
    (ev/sleep 0.1)))

(def background-fiber (ev/call visualize-time))

(print "hello")
(ev/sleep 1)
(print "goodbye")

(ev/cancel background-fiber "interruption")
janet event-loop.janet
hello
..........goodbye
error: interruption
  in ev/sleep [src/core/ev.c] on line 2928
  in visualize-time [event-loop.janet] (tailcall) on line 5, column 4

Oh, gross. Now it printed an error. That’s not really what we wanted. Why did that happen?

Let’s make the control flow a little more explicit:

event-loop.janet
(defn visualize-time []
  (while true
    (prin ".")
    (flush) # output is line-buffered by default
    (ev/sleep 0.1)))

(def background-fiber (ev/call visualize-time))

(print "hello")
(ev/sleep 1)
(print "goodbye")

(ev/cancel background-fiber "interruption")

(print)
(print "The main fiber is still running!")
(print "But as soon as our background task")
(print "resumes, it will immediately raise an")
(print "error. Like this:")
(print)
(ev/sleep 0)
(print)
(print "And we're back to the main fiber.")
(print "Let's check on our background fiber:")
(print)
(print "status: " (fiber/status background-fiber))
(print "value: " (fiber/last-value background-fiber))
janet event-loop.janet
hello
..........goodbye

The main fiber is still running!
But as soon as our background task
resumes, it will immediately raise an
error. Like this:

error: interruption
  in ev/sleep [src/core/ev.c] on line 2928
  in visualize-time [event-loop.janet] (tailcall) on line 5, column 5

And we're back to the main fiber.
Let's check on our background fiber:

status: error
value: interruption

Neat. Okay. So we sort of canceled the task loudly and violently, by forcing it to raise an exception that propagated all the way to the top level. But if we want to silently stop this background task, we can instead catch the exception:

event-loop.janet
(defn visualize-time []
  (var stopped false)
  (while (not stopped)
    (prin ".")
    (flush) # output is line-buffered by default
    (try
      (ev/sleep 0.1)
      ([error try-catch-fiber]
        (if (= error :stop)
          # gracefully handle the expected "exception"
          (set stopped true)
          # re-raise any unexpected exceptions
          (propagate error try-catch-fiber))))))

(def background-fiber (ev/call visualize-time))

(print "hello")
(ev/sleep 1)
(print "goodbye")

(ev/cancel background-fiber :stop)
janet event-loop.janet
hello
..........goodbye

There we go. We wait for a second, and now you can viscerally appreciate the passage of time through the universal language of small progress dots.

Now, since there’s only one “waiting state” that we can be in, and since the function’s control flow is so easy to exit, an exception feels like a little bit of overkill.

But there’s another way that we can influence how the scheduler resumes this thread: we can ask ev/go to “fill in” the current value that it’s waiting for, overriding whatever the actual event loop might be doing.

Normally ev/sleep just “returns nil”, by which I mean “the Janet event loop resumes our fiber with the value nil for this expression.” But we can cause it to return a different result:

event-loop.janet
(defn visualize-time []
  (var stopped false)
  (while (not stopped)
    (prin ".")
    (flush) # output is line-buffered by default
    (if (= (ev/sleep 0.1) :stop)
      (set stopped true))))

(def background-fiber (ev/call visualize-time))

(print "hello")
(ev/sleep 1)
(print "goodbye")

(ev/go background-fiber :stop)
janet event-loop.janet
hello
..........goodbye

ev/go is just like calling resume on the fiber, except that we can’t manually resume a fiber once we hand it to the event loop. Janet calls these fibers “root fibers,” and they can only be resumed with ev/go.

And this is pretty weird; I don’t even know what would happen if the fiber were actually waiting on something, and I don’t know when we would reasonably want to jump in front of the event loop like this.

And while this code is a bit shorter than the exception version in this particular case, if there were multiple points where our background fiber could yield to the event loop, we could use an exception to take care of all of them at once (instead of having to check for :stop at every yield point).

So let’s go back to the exception-throwing case, and talk about another way that we could handle this gracefully:

event-loop.janet
(defn visualize-time []
  (while true
    (prin ".")
    (flush) # output is line-buffered by default
    (ev/sleep 0.1)))

(def background-fiber (ev/call visualize-time))

(print "hello")
(ev/sleep 1)
(print "goodbye")

(ev/cancel background-fiber "interruption")
janet event-loop.janet
hello
..........goodbye
error: interruption
  in ev/sleep [src/core/ev.c] on line 2928
  in visualize-time [event-loop.janet] (tailcall) on line 5, column 4

So by default when a root fiber raises an exception, Janet will print a stacktrace like this. But we can change the way that Janet handles exceptions in root fibers, by installing a supervisor for the fiber.

event-loop.janet
(defn visualize-time []
 (while true
   (prin ".")
   (flush) # output is line-buffered by default
   (ev/sleep 0.1)))

(def supervisor (ev/chan))

(def background-fiber (ev/go visualize-time nil supervisor))

(print "hello")
(ev/sleep 1)
(print "goodbye")

(ev/cancel background-fiber :stop)
(def fiber-event (ev/take supervisor))
(match fiber-event
  [:error fib environment] (do
    (def error (fiber/last-value fib))
    (if (= error :stop)
      (print "gracefully stopped")
      (propagate error fib)))
  event (error (string/format "unexpected fiber event %q" event)))
janet event-loop.janet
hello
..........goodbye
gracefully stopped

Which feels much better to me. The background fiber no longer needs to know how that it’s going to be canceled or exactly what cancellation is going to look like. Our top-level fiber handles the exception for it, and checks it against the value that it chose to mean “gracefully cancel.”

So the way this works is that if a signal propagates all the way to a root fiber, and that signal is in the fiber’s “signal mask,” Janet will write a message about the signal into the “supervisor channel.” And ev/go will, by default, create a fiber with a signal mask of :e01234, which is why we see the error event.

A channel is a bounded queue, that can be read from and written to asynchronously. Reads suspend execution until a value is available, and writes suspend execution if the queue is full, resuming once another fiber takes a value off the queue.

We could use “supervisor channels” to implement our own scheduler, if we wanted to, reacting to errors (or other signals!) across multiple worker fibers. But we’re not going to do that, in this book. We’re not going to talk about channels much at all.

Which is a shame, because channels are very cool, and they’re an important communication primitive when you’re writing complex concurrent programs. But we just aren’t going to have time to do that together, and there is already a large body of literature about “communicating sequential processes” that will teach you how to take advantage of this model of concurrency. I don’t think there’s much point to giving a Janet-specific treatment here — channels have exactly the API you’d expect.

So that’s the event loop. You have seen how it works now, even if we haven’t really discussed what you can do with it.

I suspect that you will mostly interact with the event loop when you want to perform non-blocking IO using the “stream” API, which is an abstraction over byte buffers that you can read from or write to without blocking your program. You’ll probably create streams to read or write to files or TCP sockets, although you can also create streams programmatically.

streams.janet
(defn print-dots []
  (while true
    (prin ".")
    (flush)
    (ev/sleep 0)))

(ev/call print-dots)

(def f (os/open "lorem-ipsum.txt" :r))
(print "About to read")
(def bytes (ev/read f 10))
(print "Done reading")
(ev/close f)
(print "Done closing the file descriptor")
(printf "read %q" bytes)
(os/exit 0)
janet streams.janet
About to read
.Done reading
Done closing the file descriptor
read @"Lorem ipsu"

Ah, well, the read completed very quickly, so we only got a single time-passing dot. But we can see that the other fibers in our program still got a chance to run while this was taking place.

Contrast this with the blocking file API:

blocking.janet
(defn print-dots []
  (while true
    (prin ".")
    (flush)
    (ev/sleep 0)))

(ev/call print-dots)

(def f (file/open "lorem-ipsum.txt" :r))
(print "About to read")
(def bytes (file/read f 10))
(print "Done reading")
(file/close f)
(print "Done closing the file descriptor")
(printf "read %q" bytes)
(os/exit 0)
janet blocking.janet
About to read
Done reading
Done closing the file descriptor
read @"Lorem ipsu"

Basically the same code, but file/read suspended our entire program (not just the current fiber) while it did the read, so the Janet event loop never got a chance to schedule the fiber running our print-dots function.

I think it’s very important to understand why one is blocking and the other is non-blocking, so at the risk of over-explaining this pretty simple example, here’s what actually happened in the non-blocking case:

The call to ev/read does two things: first, it tells the kernel that we want to read from the underlying file descriptor backing the stream, using epoll on Linux or kqueue on macOS or something called an IoCompletionPort (?) on Windows. And then it raises a user signal 9, which causes the current fiber to stop running, and ultimately (unless there is another fiber intercepting user signal 9!) yields control all the way up to the Janet event loop. And then the Janet event loop, umm, loops for a bit, checking on the jobs that we’ve asked the kernel to do, stopping only when the file descriptor has bytes available that it can read. Then the event loop resumes the fiber that called ev/read in the first place, passing it (through the resume call) the actual bytes that it read from the kernel.

Okay.

Fibers.

Fibers.

We’ve talked an awful lot about fibers already, haven’t we? Surely there isn’t anything else to say about them? It’s probably time for a recap now, isn’t it?

So just to recap, fibers are a primitive control flow construct that you can use to do the following useful things:

  1. catch exceptions
  2. write generators
  3. perform non-blocking, event-driven IO
  4. early return from functions
  5. write coroutines
  6. scope dynamic variables

Oh gosh. We haven’t talked about all of these things yet. We still have a few things to get through. But the event loop stuff was by far the trickiest bit; the rest will be pretty easy in comparison.

First off: early return. I think that you know enough about fibers by this point to understand how you would implement “early return:” you just wrap the body in a fiber that intercepts a signal:

(defmacro with-early-return [& body]
  ~(resume (fiber/new (fn [] ,;body) :i0)))

(defn return [value]
  (signal 0 value))

(defn my-function []
  (with-early-return
    (print "hello")
    (return "stopping early")
    (print "after returning")))

(print (my-function))

Not so bad! Except that this has the weird property that you can actually return from a function that called you. Look:

early-return.janet
(defmacro with-early-return [& body]
  ~(resume (fiber/new (fn [] ,;body) :i0)))

(defn return [value]
  (signal 0 value))

(defn helper-function []
  (return "helper function currently on strike"))

(defn my-function []
  (with-early-return
    (print "do some work")
    (helper-function)
    (print "keep working")))

(print (my-function))
janet early-return.janet
do some work
helper function currently on strike

Weird, right?

Now, Janet already has built-in macros that implement more sophisticated “early return” behaviors than this — prompt, which has this “return from a parent function” behavior, and label, which does not. Er, well, you can sort of do it anyway with label, but you’d have to give your helper functions explicit permission to return… whatever. Fiber-based control flow is just a little bit different than traditional early-return.

  1. catch exceptions
  2. write generators
  3. perform non-blocking, event-driven IO
  4. early return from functions
  5. write coroutines
  6. scope dynamic variables

Oh, coroutines.

We’ve spent a lot of time already talking about a specific application of coroutines: asynchronous event-driven IO. But coroutines in general are fancier, more powerful versions of generators, right? And generators can do lots of cool things. Coroutines should be able to do even cooler things, shouldn’t they?

But you don’t see it very often! And it’s hard to come up with a simple example of when you’d want to use a coroutine to simplify your code, in part because coroutines don’t really make simple things easier. They make complex, hairy things easier.

In fact, in the last year, I have only encountered one problem where I felt that coroutines — pure coroutines — were a good fit, and actually made the code simpler and easier to follow.

I was writing a parser for a weird language that lets you use custom operators before you define them. So any time I encountered an unknown symbol I had to stop parsing the current statement and move onto the next one, because I didn’t know whether to parse that symbol as an operator or as a regular value. (And, for reasons, I couldn’t do a two-pass thing to identify operators ahead of time.)

So a very natural way to implement that is to create a coroutine for every statement, and an outer “scheduler” for the whole program that you’re parsing. The scheduler starts the first statement’s coroutine, and lets it run until it encounters an unknown symbol (which it yields). Then the scheduler writes down the symbol that it’s waiting for, and moves on to the next statement.

Whenever a coroutine finishes parsing a statement, and you learn whether it contained an operator or a function declaration, then the scheduler finds any coroutines that were waiting on that symbol and resumes them (passing in the symbol’s type when it does).

You can see strong parallels between this parser and the “effectful” coroutines of the async/await variety. In both cases there’s some kind of scheduler that’s coordinating work between multiple coroutines — either the built-in event loop, or my own “parser scheduler.” In both cases yield means “I’m asking a question that you might not know the answer to yet.” And in both cases the scheduler resumes once it has the answer.

But I don’t want you to think that this “shape” of problem is the only thing that you can use pure coroutines for — it’s just the only time I ever think to reach for them. All of my experience with coroutines comes from this asynchronous event loop type of programming, so those are the only nails I try to hit with them.

Food for thought, though: generators make it easy to write ad-hoc iterators. Coroutines make it easy to write ad-hoc state machines. But this conversation is a little bit out of scope for this book.

Oh, speaking of scopes…

  1. catch exceptions
  2. write generators
  3. perform non-blocking, event-driven IO
  4. early return from functions
  5. write coroutines
  6. scope dynamic variables

We haven’t talked about dynamic variables yet, but one way to think about them is like a global variable with a stack of values. Instead of setting dynamic variables, you push new values for them, and when you’re done with whatever it is you’re doing, you pop that value off, restoring the dynamic variable to whatever it was set to previously.

But actually, the “stack” of values is determined by the “stack” of fibers that you are currently running. Fibers each have their own view of the current “dynamic variables,” and when a fiber completes, any dynamic variables that it had set go away.

This is a simplification, and it’s weird, so let’s look at a concrete example.

dynamic.janet
(def file (file/open "output.txt" :w))
(print "everything is normal")
(with-dyns [*out* file]
  (print "but this writes to a file"))
(print "back to normal")
janet dynamic.janet
everything is normal
back to normal
cat output.txt
but this writes to a file

*out* is a dynamic variable that determines the default destination for functions like print and prin and printf. By setting the dynamic variable to a new value, we essentially “redirect” these functions to write a file instead. (Note that we aren’t actually redirecting stdout when we do this, we’re just changing the behavior of print, which knows to consult this special dynamic variable.)

I mostly see dynamic variables used like this: as implicit additional function arguments that are silently available to functions. So rather than print taking an optional argument for the destination buffer, Janet uses a pass-by-dynamic-variable calling convention for it.

So how do dynamic variables work, and what do they have to do with fibers?

Well, every fiber has something called an environment. You might remember environments from Chapter Two, when I said that your program’s environment is the “top-level scope.” This was a simplification: it’s not really the “program’s environment;” it’s the “default fiber’s environment.”

You can manipulate the environment by calling setdyn, and you can query the environment by calling dyn. The environment is a table, so you can put any values in it, but by convention dynamic variables are named with keywords:

environment.janet
(def f (file/open "output.txt" :w))
(printf "*out* is actually just %q" *out*)
(setdyn *out* f)
(pp (curenv))
janet environment.janet
*out* is actually just :out
cat output.txt
@{f @{:source-map ("environment.janet" 1 1) :value <core/file 0x6000022512B0>}
  :args @["environment.janet"]
  :current-file "environment.janet"
  :out <core/file 0x6000022512B0>
  :source "environment.janet"}

Well, I actually pretty-printed it a little, but you get the idea.

So you can see those other entries in our environment table — :args and :current-file and :source — those are actually just dynamic variables that Janet sets by default. We can get the current value with (dyn :args):

dyn.janet
(print "I have the following arguments:")
(pp (dyn :args))
janet dyn.janet
I have the following arguments:
@["dyn.janet"]
janet -c dyn.janet dyn.jimage
I have the following arguments:
@["-c" "dyn.janet" "dyn.jimage"]

Okay, so so far these dynamic variables are just entries in the root fiber’s environment table. But when we create a new fiber, we have three choices for what environment it should have:

  1. No environment at all (this is the default). If you call setdyn without an environment, it will automatically create an empty one for you, and install it with fiber/setenv.
  2. The exact same environment as the code creating it (the :i flag, for “inherit”). If this fiber calls setdyn, it will change its parents environment table.
  3. A new environment table whose prototype is equal to the parent environment (the :p flag, for “prototype”). This environment will be able to read all of the values in the parent environment, but if it calls setdyn, those changes won’t be visible to the parent fiber. (We’ll talk more about prototypal inheritance in Chapter Eight, if this doesn’t make sense.)

In practice, you won’t have to think about this at all. You will just use the helper with-dyns, which just creates and immediately resumes a fiber with no signal mask and the :p environment flag, whose function first calls setdyn for each of the dynamic bindings and then runs all of the expressions that you pass it. Using with-dyns means that you don’t need to worry about your dynamic variables accidentally outliving their intended scope in the case that you raise an exception before you can clean up after yourself.

  1. catch exceptions
  2. write generators
  3. perform non-blocking, event-driven IO
  4. early return from functions
  5. write coroutines
  6. scope dynamic variables

Ah, that feels good.

But I actually left one thing out. One thing that I don’t really want to talk about, but that I have to mention before we can bring this chapter to a close.

Janet also supports running fibers in their own actual OS-level threads. You can actually spawn “real” background tasks that run in parallel with the rest of your process and communicate with other fibers via thread-safe channels that you can create with ev/thread-chan. Janet supports multithreading.

I’m not going to talk about multithreading in Janet, because I don’t have any personal experience writing multithreaded Janet, so all I could really do is regurgitate the official documentation. And the official documentation is pretty easy to understand. So go there, if you want to write multithreaded Janet.

Chapter Six: Control Flow

Alright. We just did a whole chapter on concurrency and coroutines and complicated cross-stack control flow. I think we’ve earned a break.

So this is going to be a chapter about simple control flow. Loops and list comprehensions and if expressions; things like that.

You’ve seen a lot of control flow already, and I didn’t think that any of it deserved explanation. The following all do the things that you’d expect them to:

(each x [1 2 3]
  (print x))
(for x 0 3
  (print x))
(while true
  (print x))

It’s worth talking about each, though. each can iterate over a variety of data structures — tuples, arrays, structs, tables, strings, buffers, fibers (generators), and even keywords and symbols (which behave identically to strings).

Janet doesn’t have a formal concept of “interfaces” or “protocols” for types to conform to, and you can’t make an iterable “object” by defining a few “methods.” Iteration is based on a single function, next, and you cannot overload what next means for types that you define in Janet.

But! If you define a custom JANET_ABSTRACT type, you can provide a custom implementation of next. We’ll talk about this in Chapter Nine.

It’s a bit weird that Janet doesn’t let you make custom iterable types without writing C code, but at the same time it means that you can always use structs and tables as generic containers: you never need to worry about accidentally inserting a key called :next and shadowing a method or something, so Janet has no equivalent of JavaScript’s Object.prototype.toString.call(object) pattern.

Okay, so next is really simple: it gives you a way to iterate over the keys in a data structure:

repl:1:> (next [10 20 30])
0
repl:2:> (next [10 20 30] 0)
1
repl:3:> (next [10 20 30] 1)
2
repl:4:> (next [10 20 30] 2)
nil

Keys! Not values. For tuples and arrays — and strings, and other sequential types — the keys are just indices. For associative types, they’re the, umm, keys:

repl:1:> (next {:foo 1 :bar 2})
:foo
repl:2:> (next {:foo 1 :bar 2} :foo)
:bar
repl:3:> (next {:foo 1 :bar 2} :bar)
nil

Note that next returns nil to indicate “no more keys.” This means that nil cannot, itself, be a key of any data structure! This is why Janet doesn’t allow nil to appear as a key in a table or a struct.

For fibers, however, next actually resumes and advances to the first call to yield. And then it returns 0. Yes, 0. Always 0.

repl:1:> (def generator (coro (yield 10) (yield 20) (yield 30)))

repl:2:> (next generator)
0
repl:3:> (in generator 0)
10
repl:4:> (in generator 0)
10
repl:5:> (next generator 0)
0
repl:6:> (in generator 0)
20
repl:7:> (next generator)
0
repl:8:> (next generator)
nil

So next is not a pure function; it can actually advance the underlying structure in some cases. This is weird, since it looks like a pure function — you give it the “previous index” as an explicit argument, after all. And usually it is! But you can’t rely on that, when you’re dealing with fibers or abstract types.

So each uses next to compute the keys, and then it calls in to look up the values. There’s also eachk, which calls next to compute the keys, and just iterates over those:

repl:1:> (eachk i [-3 1 99] (pp i))
0
1
2
nil
repl:2:> (eachk i {:foo 1 :bar 2} (pp i))
:foo
:bar
nil
repl:3:> (eachk i (coro (yield 1) (yield 2)) (pp i))
0
0
nil

And there’s eachp, which iterates over key-value pairs:

repl:1:> (eachp i [-3 1 99] (pp i))
(0 -3)
(1 1)
(2 99)
nil
repl:2:> (eachp i {:foo 1 :bar 2} (pp i))
(:foo 1)
(:bar 2)
nil
repl:3:> (eachp i (coro (yield 1) (yield 2)) (pp i))
(0 1)
(0 2)
nil

Nothing tricky here.

Now, okay. I said that you can’t define your own iterable implementation in Janet without resorting to C code. This is true. But you can write a fiber that statefully iterates over values in a structure, and then iterate over that. It’s sort of… weird and hacky, and you can’t use eachk or eachp, because the keys of the thing you’re actually iterating over will always be 0, but it’s an easy way to define an ad-hoc iterator:

(defn make-table-set [& elements]
  (def result @{})
  (each element elements
    (put result element true))
  result)

(defn elements [table-set]
  (coro
    (eachk element table-set
      (yield element))))
repl:1:> (def good-numbers (make-table-set 1 3 60))
@{1 true 3 true 60 true}
repl:2:> (reduce + 0 (elements good-numbers))
64

This is a pretty dumb example, but you can see that this trick allows us to use functions that use next under the hood, like map and reduce and filter, with structs and tables that have their own logical idea of how iteration should work.

Okay, that’s looping on easy mode. But sometimes looping is not quite as easy. Sometimes you have to write nested loops, or loops full of conditions. Consider this simple structure:

(def hosts [
  {:name "claudius"
   :ip "45.63.9.183"
   :online true
   :services
     [{:name "janet.guide"}
      {:name "bauble.studio"}
      {:name "ianthehenry.com"}]}
  {:name "caligula"
   :ip "45.63.9.184"
   :online false
   :services[{:name "basilica.horse"}]}])

Let’s say we want to print all of the names of the services for any hosts that are online. This isn’t hard; it’s just a nested loop:

(each host hosts
  (if (host :online)
    (each service (host :services)
      (print (service :name)))))

I don’t think there’s anything wrong with that code, but you might prefer the following alternative:

(loop [host :in hosts
       :when (host :online)
       service :in (host :services)]
  (print (service :name)))

loop is a little DSL that makes it easy to write nested loops and conditionals. In this case it didn’t buy us too much, since the expression was so simple. But it can really simplify complex nested looping:

(def hosts [
  {:name "claudius"
   :ip "45.63.9.183"
   :online true
   :services
     {"janet.guide" true
      "bauble.studio" false
      "ianthehenry.com" true}}
  {:name "caligula"
   :ip "45.63.9.184"
   :online false
   :services {"basilica.horse" true}}])

(each host hosts
  (if (host :online) 
    (let [ip (host :ip)]
      (eachp [service-name available] (host :services)
        (if available
          (for instance 0 3
            (pp [ip service-name instance])))))))

Now compare that to the equivalent loop expression:

(loop [host :in hosts
       :when (host :online)
       :let [ip (host :ip)]
       [service-name available] :pairs (host :services)
       :when available
       instance :range [0 3]]
  (pp [ip service-name instance]))

I fully admit that this is a contrived, artificial example, but I hope that it demonstrates some of the power of loop. It lets you iterate over values, keys, key-value pairs, and arbitrary ranges. It lets you insert conditions — stateless conditions like :when, and stateful conditions like :while and :until. :let allows you to give names to intermediate values, and you can inject arbitrary effects before and after the inner loop with :before and :after.

loop can be very powerful, and perhaps even a little intimidating at first, and you might be wondering if it’s worth learning a whole weird DSL just to make nested loops slightly shorter to write. And that’s fair — I think that I mostly just use loop because :when lets me save a little indentation over (each ... (if ...)).

But there’s a good reason to understand this little DSL, and that reason is seq.

seq is not loop, but it uses the exact same language to express what it does. But instead of imperatively looping to perform side effects and then returning nil, seq will allocate an array that collects every value that your loop body evaluates to. It’s like a super-powered list comprehension:

hosts.janet
(def hosts [
  {:name "claudius"
   :ip "45.63.9.183"
   :online true
   :services
     {"janet.guide" true
      "bauble.studio" false
      "ianthehenry.com" true}}
  {:name "caligula"
   :ip "45.63.9.184"
   :online false
   :services {"basilica.horse" true}}])

(def services
    (seq [host :in hosts 
          :when (host :online)
          service :pairs (host :services)]
      service))

(pp services)
janet hosts.janet
@[("ianthehenry.com" true) ("janet.guide" true) ("bauble.studio" false)]

This is a dumb example, but you can often simplify a complex map/filter/mapcat pipeline into a single seq that performs your data transformation more efficiently.

There’s also tabseq, which you can use to construct a table out of a sequence of key-value pairs, and generate, which will return a fiber that yields each of the inner values, so that you can lazily consume them later.

That’s all I’m going to say about the loop macro — the official documentation has an exhaustive list of all the things you can write in a loop or loop-flavored expression, and it’s worth glancing over it once.

Finally, we should talk about break. break works just like it does in JavaScript — it breaks out of the innermost loop. But you can also use break outside of a loop, as a cheap kind of early return:

(defn test-breaking []
  (if true
    (break "everything is fine"))
  (error "this won't get a chance to raise"))
repl:1:> (test-breaking)
"everything is fine"

But if you want to early return from inside a loop, or if you want to break out of multiple levels a loop at once, you’ll have to use the prompt or label macros to create an abortable fiber.

Sadly break does not allow loops to evaluate to an expression. Loops always return nil, even if you break with a value:

repl:1:> (while true (break 123))
nil

And there is no equivalent of JavaScript’s continue built into the language — but, of course, you can simulate it with a fiber.

Alright. That’s all I know about looping in Janet. Let’s move on to conditionals.

Conditionals are very easy and very simple, but they might look slightly weird if you’re only used to JavaScript.

Let’s start with if. There’s no else in Janet’s if; the else part is implicit. (if condition then-part else-part). This means that the then-part can only contain a single expression, so you might need to group expressions with do if you want to do multiple things.

But it also means that there’s nowhere to write else if the way that you would in JavaScript:

if (x > 0) {
  console.log("positive")
} else if (x < 0) {
  console.log("negative")
} else if (x === 0) {
  console.log("zero")
} else {
  console.log("NaNs for breakfast again??")
}

If you wrote that in Janet, it would look…

(if (> x 0)
  (print "positive")
  (if (< x 0)
    (print "negative")
    (if (= x 0)
      (print "zero")
      (print "NaNaNaNaN"))))

…awful, in my opinion. There’s nothing worse than having to count parentheses because your expressions get too nested.

But fortunately you don’t have to write code like this. Nested ifs are such a common thing that Janet has a special macro for creating them without any triangular indentation:

(cond
  (> x 0) (print "positive")
  (< x 0) (print "negative")
  (= x 0) (print "zero")
  (print "NaNaNaNaN"))

cond is literally the same as writing nested ifs:

repl:1:> (macex '(cond (> x 0) (print "positive") (< x 0) (print "negative") (= x 0) (print "zero") (print "NaNaNaNaN")))
(if (> x 0) (print "positive") (if (< x 0) (print "negative") (if (= x 0) (print "zero") (print "NaNaNaNaN"))))

But it’s much nicer looking.

Janet also has case, which is a special, umm, case of cond, when all of your conditions are of the form (= value something):

(defn strings [data]
  (case (type data)
    :string (print data)
    :tuple (each element data (strings element))
    (error "invalid")))
repl:1:> (strings ["find" ["those" ["nested"]] "values"])
find
those
nested
values
nil

This is very similar to JavaScript’s switch statement, but much more ergonomic. No breaks to worry about, no arguing over whether or not to indent the case lines. Just round, sumptuous parentheses as far as the eye can see.

But Janet has another switch alternative that’s a lot more powerful than case. It’s called match, and instead of only matching literal values, it matches data structures against patterns, allowing you to check multiple values in the same structure and to easily extract individual pieces.

We could use match to implement a really verbose and contrived calculator:

(defn calculate [expr]
  (match expr
    [:add x y] (+ x y)
    [:subtract x y] (- x y)
    [:multiply x y] (* x y)
    [:divide x y] (/ x y)))
repl:1:> (calculate [:add 1 2])
3
repl:2:> (calculate [:subtract 5 10])
-5

“Simple” values like keywords and numbers and strings match by equality, while “fancy” values like tuples and structs match each of their elements by equality. Identifiers like x match anything, and bind the name x to the value that it matched. You get it. It’s pattern matching. I know JavaScript doesn’t have pattern matching, but I’m sure you’ve seen this somewhere before.

You can also add arbitrary conditions to any pattern by wrapping it in parentheses and adding a boolean expression:

(defn calculate [expr]
  (match expr
    [:add x y] (+ x y)
    [:subtract x y] (- x y)
    [:multiply x y] (* x y)
    ([:divide x y] (= y 0)) (error "division by zero!")
    [:divide x y] (/ x y)))
repl:1:> (calculate [:add 1 2])
3
repl:2:> (calculate [:divide 1 0])
error: division by zero!

Which makes match strictly more powerful than cond.

The pattern _ matches anything but doesn’t create a binding named _, even though that is a valid identifier. And you can match dynamic runtime values by using (@ identifier):

(def magic-number (math/rng-int (math/rng) 10))
(defn guessing-game [guess]
  (match guess
    (@ magic-number) "you got it!"
    _ "better luck next time"))
repl:1:> (guessing-game 1)
"better luck next time"
repl:2:> (guessing-game 3)
"better luck next time"
repl:3:> (guessing-game 6)
"better luck next time"
repl:4:> (guessing-game 5)
"better luck next time"
repl:5:> (guessing-game 4)
"you got it!"

Nice.

Obviously this particular case should just be an if expression, but remember that you can include _ and (@ foo) anywhere inside a deeply nested pattern, so they can be very useful.

Alright. Now: match is great, and pattern matching is great, and you won’t hear me say anything against pattern matching in the abstract.

But.

Janet’s implementation of pattern matching happens to have a couple rough edges that you’ll need to be aware of when you’re using match.

The first and largest gotcha concerns tuple patterns: tuple patterns actually match prefixes of sequential structures:

(match [1 2]
  [] "no elements"
  [x] "one element"
  [x y] "two elements"
  [x y z] "three elements")

What would you expect that to evaluate to? Yeah, me too. But, unfortunately, it’s "no elements", because [] is the first pattern that matches a prefix of the data. If you invert the order of the cases, it does the thing you’d expect:

(match [1 2]
  [x y z] "three elements"
  [x y] "two elements"
  [x] "one element"
  [] "no elements")

That evaluates to "two elements", because [x y] is the first matching prefix.

This is terrible, and you will mess this up at some point, even though I warned you about it, because it’s just so unintuitive. My only recommendation to avoid this is to write your own match macro that doesn’t have this problem, and exclusively use that.

The next gotcha has to do with associative patterns — matching tables and structs.

Associative patterns can be very annoying in Janet, because of the fact that structs and tables cannot contain nil. This is unfortunate, and I’m very sorry; it’s still my least favorite thing about Janet. But it’s something that you’ll have to be aware of, because you might find yourself writing a very simple match like this:

(def binary-tree {:value 10 :left nil :right {:value 15 :left nil :right nil}})

(defn contains? [tree needle]
  (match tree
    nil false
    {:value (@ needle)} true
    {:value value :left left :right right} (cond
      (< needle value) (contains? left needle)
      (> needle value) (contains? right needle))))

And then being very surprised that it does not work:

repl:1:> (contains? binary-tree 10)
true
repl:2:> (contains? binary-tree 15)
nil

This is because the pattern {:left _} cannot match a struct with {:left nil}, because there are no structs with {:left nil}. {:left nil} is the same as {}. It’s the empty struct. Really:

repl:1:> {:left nil}
{}

This isn’t the end of the world; it just means that we can’t use nil as a sentinel value in any associative data structures. Arguably it’s nice to have a separate sentinel anyway, but usually when I’m hacking up a quick script I just want to reach for nil in the same places that I would reach for null in other languages. But remember: nil is not null. nil is undefined, so we have to make our own null substitute:

(def empty-tree @{})

(def binary-tree
  {:value 10
   :left empty-tree
   :right {:value 15 :left empty-tree :right empty-tree}})

(defn contains? [tree needle]
  (match tree
    (@ empty-tree) false
    {:value (@ needle)} true
    {:value value :left left :right right} (cond
      (< needle value) (contains? left needle)
      (> needle value) (contains? right needle))))
repl:1:> (contains? binary-tree 10)
true
repl:2:> (contains? binary-tree 11)
false
repl:3:> (contains? binary-tree 15)
true

So that’s match. The most powerful conditional control flow statement.

cond, case, and match all take pairs of “thing to check” and “expression to evaluate if that thing passes the check.” But they can all optionally take a single final argument to act as a default expression if nothing else matches before that. Otherwise, they’ll default to nil.

(case x
  1 "one"
  2 "two"
  "default value")
(cond
  (= x 1) "one"
  (= x 2) "two"
  "default value")
(match x
  1 "one"
  (@ (+ 1 1)) "two"
  "default value")

And that’s control flow! That was easy, wasn’t it? I mean, compared to fibers, that was nothing.

There are a few little stragglers that we can knock out quickly before we say goodbye: when is a lot like if, but when has no else part, so you can write multiple things in the then part without having to wrap them in do:

(when (even? x)
  (print "it's even!")
  (print "this is a joyous day"))

I think of when as an imperative, side-effecty thing, and if as more of an expressiony thing. But (when x y z) is just shorthand for (if x (do y z)).

There’s also unless, which is exactly like when, but inverts the condition. So (unless x y z) is the same as (if (not x) (do y z)).

Actually, there’s shorthand for that too: (if-not x (do y z)).

There’s actually a whole menagerie of additional control flow constructs that you could use — if-let and when-let, if-with and when-with. And forever, which is an alias for while true, and forv, which is just like for except that you can mutate the iteration variable within the loop. I’m not going to talk about these because they’re pretty straightforward and not incredibly useful, but you should look them up when you get home.

Chapter Seven: Modules and Packages

Eventually you’re going to want to put code in multiple files. Like this:

helpers.janet
(defn shout [x]
  (printf "%s!"
    (string/ascii-upper x)))
main.janet
(use ./helpers)

(shout "hey there")
janet main.janet
HEY THERE!

There are two macros that you’ll reach for when you’re doing this: use and import.

use brings all of the public bindings from one file into another. import brings them in, but with a module prefix:

helpers.janet
(defn shout [x]
  (printf "%s!"
    (string/ascii-upper x)))
main.janet
(import ./helpers)

(helpers/shout "ahoy")
janet main.janet
AHOY!

This is all very easy and intuitive, but it’s worth spending some time talking about precisely what this means and how this works. Let’s notice a few things here:

  1. We specified the module as a path, ./helpers, not a name like helpers.

  2. We didn’t specify a file extension.

  3. We didn’t do anything to “export” the shout function, or declare ourselves as a module, or anything like that.

That last point is sort of interesting, and is somewhere that Janet differs from JavaScript.

In Janet, when we import a source file, we’re really importing the environment of that file. Er, the environment that results from executing that file.

So (use ./helpers) will actually execute the script ./helpers.janet, compute its environment, and then create a bunch of local names in our environment with the same values. Er, but “a bunch” in this case is only one, because the environment happened to only have one identifier. But, you know, in general it’s a bunch.

We can actually split this up into smaller steps: we can compute the module’s environment without creating any names in our own environment. We can just grab a hold of it with require:

helpers.janet
(defn shout [x]
  (printf "%s!"
    (string/ascii-upper x)))
require.janet
(def helpers
  (require "./helpers"))

(pp helpers)
janet require.janet
@{shout @{:doc "(shout x)\n\n"
          :source-map ("helpers.janet" 1 1)
          :value <function shout>}
  :current-file "helpers.janet"
  :macro-lints @[]
  :source "helpers.janet"}

Janet will only execute the scripts we import once, and will cache the returned environments for future use or import or require invocations. But we can pass :fresh true to one of those calls to bypass the module cache, which is useful if we’re programming interactively and want to reload a module without restarting the repl.

But okay. Sometimes when you write a module, you don’t want to export everything. You can also create “private” bindings in an environment, with def-, var-, defn-, and defmacro-.

helpers.janet
(def- upcase string/ascii-upper)

(defn shout [x]
  (printf "%s!"
    (upcase x)))
private.janet
(print "this environment:")
(pp (require "./helpers"))
(print)
(print "creates these bindings:")
(use ./helpers)
(pp (curenv))
janet private.janet
this environment:
@{shout @{:doc "(shout x)\n\n"
          :source-map ("helpers.janet" 3 1)
          :value <function shout>}
  upcase @{:private true
           :source-map ("helpers.janet" 1 1)
           :value <cfunction string/ascii-upper>}
  :current-file "helpers.janet"
  :macro-lints @[]
  :source "helpers.janet"}

creates these bindings:
@{shout @{:private true}
  :args @["private.janet"]
  :current-file "private.janet"
  :macro-lints @[]
  :source "private.janet"}

Okay, so I have a few things to say about this.

First off, we can see that upcase is a normal entry in the environment table, but it has the :private true metadata set. And use and import know to skip any binding with the :private true metadata.

We could make our own voyeur macro that doesn’t check binding metadata, and imports private bindings the same as any others — the privateness is only advisory. This might come in handy if we ever want to write tests for implementation details of our modules, but we will speak no more of it in this book.

Second off, take a closer look at this output:

creates these bindings:
@{shout @{:private true}
  :args @["private.janet"]
  :current-file "private.janet"
  :macro-lints @[]
  :source "private.janet"}

shout is imported as a :private binding, so any module that imports this module will not re-import it. You can change that with (import ./helpers :export true), which will cause it to import bindings without the :private true bit.

But wait: shout is only @{:private true}. This binding has no :value!

This is an unfortunate quirk of Janet’s pp behavior when it prints out tables with prototypes. This isn’t just the table @{:private true}; it also has a prototype that points to the original binding. Instead of copying that table and then setting :private true, use and import create a new table that “inherits” from the original binding:

(use ./helpers)

(def original-shout-binding
  (in (require "./helpers") 'shout))
(def local-shout-binding
  (in (curenv) 'shout))

(pp local-shout-binding)
(pp (table/getproto local-shout-binding))
(pp original-shout-binding)

(print
  (= original-shout-binding
     (table/getproto local-shout-binding)))
janet proto.janet
@{:private true}
@{:value <function 0x600003DDB9A0>}
@{:value <function 0x600003DDB9A0>}
true

We’ll talk more about prototypes in Chapter Eight, but the idea is exactly the same as in JavaScript.

Okay. That’s modules in Janet.

Well, actually, we kind of just scratched the surface. Janet’s module system is implemented mostly in Janet, and it’s very flexible, but you will probably not ever need to interact with it beyond import. But you could, in theory, write a custom module loader to import other things beyond images and source files; you could control the way that Janet resolves modules and searches by file extension. This is mostly useful for writing Janet dialects that you can import from regular files (I myself wrote a version with infix operators when I was doing lots of math stuff), but you could in theory do something more exotic. If you ever actually feel like you need to do advanced module mischief, the official documentation is a perfectly good reference.

So instead of talking more about that, let’s move on to talk about jpm.

jpm is the “Janet Project Manager,” not, as you might have guessed, the Janet Package Manager. But its role is mostly the same as npm or cargo or opam or any other package manager — it just, umm, well…

Janet is a young language, and it has some rough edges. One of those rough edges is jpm. We’re going to talk about it, and it’s going to be fine, but just… lower your expectations slightly before we start.

jpm does two things: it builds projects and it manages dependencies.

Let’s start with the building bit. We’ll write a very useful binary, cat-v, and we’ll build it.

(defn choose [rng selections]
  (def index (math/rng-int rng (length selections)))
  (in selections index))

(defn verbosify [rng word]
  (choose rng
    (case word
      "quick" ["alacritous" "expeditious"]
      "lazy" ["indolent" "lackadaisical" "languorous"]
      "jumps" ["gambols"]
      [word])))

(defn main [&]
  (def rng (math/rng (os/time)))
  (as-> stdin $
    (file/read $ :all)
    (string/split " " $)
    (map (partial verbosify rng) $)
    (string/join $ " ")
    (prin $)))

Note that our cat-v doesn’t actually concatenate anything; it only works over stdin, because, well, that seemed more likely to upset the people who get upset over how other people use cat.

Let’s take it for a spin:

janet main.janet <<<"The quick brown fox jumps over the lazy dog."
The alacritous brown fox gambols over the languorous dog.

Perfect. I can already tell that this is going to be very useful, so let’s set about packaging it for the rest of the world.

To do this, all we have to do is create a project.janet file.

A project.janet file is basically a combination of metadata and Makefile-style tasks, in script form. jpm will run your project.janet file, which will produce an environment of metadata, as well as registering tasks (as a side effect).

Janet’s task runner DSL is very simple:

project.janet
(task "say-hello" ["get-ready"]
  (print "hello"))

(task "get-ready" []
  (print "getting ready..."))
jpm run say-hello
getting ready...
hello

That’s a valid project file, although in practice they won’t really look like that. They’ll look like this:

project.janet
(declare-project
  :name "cat-v"
  :description "cat --verbose"
  :dependencies [])

(declare-executable
 :name "cat-v"
 :entry "main.janet")

declare-project and declare-executable are built-in functions that will register default tasks like build and install, as well as setting all the correct metadata variables that jpm likes.

jpm build
generating executable c source build/cat-v.c from main.janet...
compiling build/cat-v.c to build/build___cat-v.o...
linking build/cat-v...

So that actually produced a native binary that we can run and distribute just like any other executable:

build/cat-v <<<"the quick brown fox"
the alacritous brown fox
file build/cat-v
build/cat-v: Mach-O 64-bit executable arm64
otool -L build/cat-v
build/cat-v:
  /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)
du -h build/cat-v
684K build/cat-v

684 kibibytes is not small, for such a trivial program. Maybe we can make it smaller? Let’s see how it was compiled.

jpm build --verbose

Oh, nothing happened. That’s unfortunate.

Nothing happened because our input files didn’t change, and even though we asked for a verbose build, jpm won’t actually perform a rebuild if the generated targets have a newer mtime than the sources. So:

touch -m main.janet

And then:

jpm build --verbose
generating executable c source build/cat-v.c from main.janet...
compiling build/cat-v.c to build/build___cat-v.o...
cc -c build/cat-v.c -DJANET_BUILD_TYPE=release -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -O2 -o build/build___cat-v.o
linking build/cat-v...
cc -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -O2 -o build/cat-v build/build___cat-v.o /usr/local/lib/libjanet.a -lm -ldl -pthread

But alright. We can see that it’s building with -O2. Let’s try blindly passing -Os and see if that makes any difference:

jpm clean; jpm build --verbose --cflags="-Os"
Deleted build directory build/
generating executable c source build/cat-v.c from main.janet...
compiling build/cat-v.c to build/build___cat-v.o...
cc -c build/cat-v.c -DJANET_BUILD_TYPE=release -Os -I/usr/local/include/janet -I/usr/local/lib/janet -O2 -o build/build___cat-v.o
linking build/cat-v...
cc -Os -I/usr/local/include/janet -I/usr/local/lib/janet -O2 -o build/cat-v build/build___cat-v.o /usr/local/lib/libjanet.a -lm -ldl -pthread

Oh dear. That wasn’t what we meant. We can see that specifying our own --cflags got rid of the -std=c99 flag, but did not get rid of -O2.

In fact there is no way to get rid of -O2 completely. jpm will always pass an -O level, and we control which optimization level by passing the --optimize flag to jpm:

jpm clean; jpm build --verbose --optimize=3
Deleted build directory build/
generating executable c source build/cat-v.c from main.janet...
compiling build/cat-v.c to build/build___cat-v.o...
cc -c build/cat-v.c -DJANET_BUILD_TYPE=release -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -O3 -o build/build___cat-v.o
linking build/cat-v...
cc -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -O3 -o build/cat-v build/build___cat-v.o /usr/local/lib/libjanet.a -lm -ldl -pthread

And in fact jpm doesn’t have a way to build with -Os:

jpm clean; jpm build --verbose --optimize=s
Deleted build directory build/
error: option :optimize, expected integer, got "s"
  in errorf [boot.janet] (tailcall) on line 171, column 3
  in setup [/usr/local/lib/janet/jpm/cli.janet] on line 50, column 26
  in run [/usr/local/lib/janet/jpm/cli.janet] (tailcall) on line 84, column 15
  in run-main [boot.janet] on line 3790, column 16
  in cli-main [boot.janet] on line 3935, column 17

It can only build with -O0 through -O3. Annoying.

But, fortunately, we don’t have to use jpm to build this. We can build it ourselves.

project.janet
(declare-project
  :name "cat-v"
  :description "cat --verbose"
  :dependencies [])

(declare-executable
 :name "cat-v"
 :entry "main.janet"
 :no-compile true)
jpm clean; jpm build --verbose
Deleted build directory build/
generating executable c source build/cat-v.c from main.janet...

Okay. This created a very interesting file:

cat-v.c
#include <janet.h>
static const unsigned char bytes[] = {215, 0, 205, 0, 152, 0, 0, 7, 0, 0, 205, 127, 255, 255, 255, 12, 34, 206, 4, 109, 97, 105, 110, 206, 10, 109, 97, 105, 110, 46, 106, 97, 110, 101, 116, 216, 7, 111, 115, 47, 116, 105, 109, 101, 216, 8, 109, 97, 116, 104, 47, 114, 110, 103, 216, 5, 115, 116, 100, 105, 110, 208, 3, 97, 108, 108, 216, 9, 102, 105, 108, 101, 47, 114, 101, 97, 100, 206, 1, 32, 216, 12, 115, 116, 114, 105, 110, 103, 47, 115, 112, 108, 105, 116, 215, 0, 205, 0, 152, 0, 0, 10, 2, 2, 2, 7, 24, 206, 9, 118, 101, 114, 98, 111, 115, 105, 102, 121, 218, 2, 206, 5, 113, 117, 105, 99, 107, 210, 2, 0, 206, 10, 97, 108, 97, 99, 114, 105, 116, 111, 117, 115, 206, 11, 101, 120, 112, 101, 100, 105, 116, 105, 111, 117, 115, 206, 4, 108, 97, 122, 121, 210, 3, 0, 206, 8, 105, 110, 100, 111, 108, 101, 110, 116, 206, 13, 108, 97, 99, 107, 97, 100, 97, 105, 115, 105, 99, 97, 108, 206, 10, 108, 97, 110, 103, 117, 111, 114, 111, 117, 115, 206, 5, 106, 117, 109, 112, 115, 210, 1, 0, 206, 7, 103, 97, 109, 98, 111, 108, 115, 215, 0, 205, 0, 152, 0, 0, 6, 2, 2, 2, 1, 8, 206, 6, 99, 104, 111, 111, 115, 101, 218, 2, 216, 12, 109, 97, 116, 104, 47, 114, 110, 103, 45, 105, 110, 116, 44, 2, 0, 0, 61, 3, 1, 0, 48, 0, 3, 0, 42, 5, 0, 0, 51, 4, 5, 0, 25, 3, 4, 0, 56, 5, 1, 3, 3, 5, 0, 0, 1, 1, 1, 32, 0, 14, 0, 14, 0, 14, 0, 3, 1, 3, 0, 3, 44, 2, 0, 0, 42, 5, 0, 0, 35, 4, 1, 5, 28, 4, 3, 0, 42, 3, 1, 0, 26, 16, 0, 0, 42, 7, 2, 0, 35, 6, 1, 7, 28, 6, 3, 0, 42, 5, 3, 0, 26, 10, 0, 0, 42, 9, 4, 0, 35, 8, 1, 9, 28, 8, 3, 0, 42, 7, 5, 0, 26, 4, 0, 0, 47, 1, 0, 0, 67, 9, 0, 0, 25, 7, 9, 0, 25, 5, 7, 0, 25, 3, 5, 0, 48, 0, 3, 0, 42, 4, 6, 0, 52, 4, 0, 0, 5, 1, 2, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 4, 7, 0, 7, 191, 252, 5, 0, 5, 0, 5, 191, 255, 3, 0, 3, 0, 3, 216, 7, 112, 97, 114, 116, 105, 97, 108, 216, 3, 109, 97, 112, 216, 11, 115, 116, 114, 105, 110, 103, 47, 106, 111, 105, 110, 216, 4, 112, 114, 105, 110, 44, 0, 0, 0, 42, 2, 0, 0, 51, 1, 2, 0, 47, 1, 0, 0, 42, 3, 1, 0, 51, 2, 3, 0, 25, 1, 2, 0, 42, 3, 2, 0, 42, 4, 3, 0, 48, 3, 4, 0, 42, 5, 4, 0, 51, 4, 5, 0, 25, 3, 4, 0, 42, 4, 5, 0, 48, 4, 3, 0, 42, 5, 6, 0, 51, 4, 5, 0, 25, 3, 4, 0, 42, 4, 7, 0, 48, 4, 1, 0, 42, 5, 8, 0, 51, 4, 5, 0, 48, 4, 3, 0, 42, 6, 9, 0, 51, 5, 6, 0, 25, 3, 5, 0, 42, 4, 5, 0, 48, 3, 4, 0, 42, 5, 10, 0, 51, 4, 5, 0, 25, 3, 4, 0, 47, 3, 0, 0, 42, 4, 11, 0, 52, 4, 0, 0, 13, 1, 1, 22, 0, 22, 0, 12, 0, 12, 0, 12, 0, 3, 1, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3};

const unsigned char *janet_payload_image_embed = bytes;
size_t janet_payload_image_embed_size = sizeof(bytes);

int main(int argc, const char **argv) {

#if defined(JANET_PRF)
    uint8_t hash_key[JANET_HASH_KEY_SIZE + 1];
#ifdef JANET_REDUCED_OS
    char *envvar = NULL;
#else
    char *envvar = getenv("JANET_HASHSEED");
#endif
    if (NULL != envvar) {
        strncpy((char *) hash_key, envvar, sizeof(hash_key) - 1);
    } else if (janet_cryptorand(hash_key, JANET_HASH_KEY_SIZE) != 0) {
        fputs("unable to initialize janet PRF hash function.\n", stderr);
        return 1;
    }
    janet_init_hash_key(hash_key);
#endif

    janet_init();

    /* Get core env */
JanetTable *env = janet_core_env(NULL);
JanetTable *lookup = janet_env_lookup(env);
JanetTable *temptab;
int handle = janet_gclock();    /* Unmarshal bytecode */
    Janet marsh_out = janet_unmarshal(
      janet_payload_image_embed,
      janet_payload_image_embed_size,
      0,
      lookup,
      NULL);

    /* Verify the marshalled object is a function */
    if (!janet_checktype(marsh_out, JANET_FUNCTION)) {
        fprintf(stderr, "invalid bytecode image - expected function.");
        return 1;
    }
    JanetFunction *jfunc = janet_unwrap_function(marsh_out);

    /* Check arity */
    janet_arity(argc, jfunc->def->min_arity, jfunc->def->max_arity);

    /* Collect command line arguments */
    JanetArray *args = janet_array(argc);
    for (int i = 0; i < argc; i++) {
        janet_array_push(args, janet_cstringv(argv[i]));
    }

    /* Create enviornment */
    temptab = env;
    janet_table_put(temptab, janet_ckeywordv("args"), janet_wrap_array(args));
    janet_gcroot(janet_wrap_table(temptab));

    /* Unlock GC */
    janet_gcunlock(handle);

    /* Run everything */
    JanetFiber *fiber = janet_fiber(jfunc, 64, argc, argc ? args->data : NULL);
    fiber->env = temptab;
#ifdef JANET_EV
    janet_gcroot(janet_wrap_fiber(fiber));
    janet_schedule(fiber, janet_wrap_nil());
    janet_loop();
    int status = janet_fiber_status(fiber);
    janet_deinit();
    return status;
#else
    Janet out;
    JanetSignal result = janet_continue(fiber, janet_wrap_nil(), &out);
    if (result != JANET_SIGNAL_OK && result != JANET_SIGNAL_EVENT) {
      janet_stacktrace(fiber, out);
      janet_deinit();
      return result;
    }
    janet_deinit();
    return 0;
#endif
}

It’s short; I recommend just reading through it. We can notice a few things here:

  1. jpm embeds the marshaled image of our Janet program as literal bytes inside this source file, which is smart and cool.
  2. jpm only marshals the main function, not the entire environment of our program.
  1. The Janet event loop is optional, and you can run fine without it. This is useful when embedding Janet in a larger program, which we’ll talk more about in Chapter Ten.

Once we have that file, we can build it ourselves:

project.janet
(declare-project
  :name "cat-v"
  :description "cat --verbose"
  :dependencies [])

(declare-executable
 :name "cat-v"
 :entry "main.janet"
 :no-compile true)

(task "compile" ["build"]
  (shell "cc -c build/cat-v.c -DJANET_BUILD_TYPE=release -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -Os -o build/build___cat-v.o"))

(task "link" ["compile"]
  (shell "cc -std=c99 -I/usr/local/include/janet -I/usr/local/lib/janet -Os -o build/cat-v build/build___cat-v.o /usr/local/lib/libjanet.a -lm -ldl -pthread"))
jpm clean; jpm run link
Deleted build directory build/
generating executable c source build/cat-v.c from main.janet...
du -h build/cat-v
684K build/cat-v

Well, that made no difference, which isn’t surprising, since -Os is basically identical to -O2. But this was a farce anyway; I don’t really care about the binary size. In OCaml this would be, like, half a gig easy.

But we learned how to add custom build tasks to a project file. This isn’t really a good way to build a native project, because we’re hardcoding paths and compilers and options — it is less portable now — but it is a way to do it that might come in handy if you want to make a more complicated build process.

Also, we really should have written something like this:

(shell "cc"
  "-c" "build/cat-v.c"
  "-DJANET_BUILD_TYPE=release"
  "-std=c99"
  "-I/usr/local/include/janet"
  "-I/usr/local/lib/janet"
  "-Os"
  "-o" "build/build___cat-v.o")

But jpm will politely split the first argument to shell for us.

Alright. That’s all we’re going to say about the first half of jpm: building projects. Now let’s talk about managing dependencies.

In the process of writing cat-v, we’ve implemented an extremely interesting and broadly useful function that we could factor into its own library.

(defn verbosify [rng word]
  (choose rng
    (case word
      "quick" ["alacritous" "expeditious"]
      "lazy" ["indolent" "lackadaisical" "languorous"]
      "jumps" ["gambols"]
      [word])))

Yep; that’s the one.

We can move this into its own directory, and package it is as a project…

~/src/verbosify/verbosify.janet
(defn- choose [rng selections]
  (def index (math/rng-int rng (length selections)))
  (in selections index))

(defn verbosify [rng word]
  (choose rng
    (case word
      "quick" ["alacritous" "expeditious"]
      "lazy" ["indolent" "lackadaisical" "languorous"]
      "jumps" ["gambols"]
      [word])))
~/src/verbosify/project.janet
(declare-project
  :name "verbosify"
  :description "a very useful library"
  :dependencies [])

(declare-source
 :source "verbosify.janet")

Note that we use declare-source instead of declare-executable, because this is a library. And now we just need to add this library as a dependency to our cat-v project…

Well, actually, we can’t quite yet. jpm only knows how to install dependencies from git repositories, so we’ll need to create one first:

git init
Initialized empty Git repository in /Users/ian/src/verbosify/.git/
git add .
git commit -m 'make a very useful library'
[master (root-commit) 0a4e386] make a very useful library
 2 files changed, 18 insertions(+)
 create mode 100644 project.janet
 create mode 100644 verbosify.janet

Once that’s done, we can actually add the dependency to our cat-v project:

project.janet
(declare-project
  :name "cat-v"
  :description "cat --verbose"
  :dependencies ["file:///Users/ian/src/verbosify"])

(declare-executable
 :name "cat-v"
 :entry "main.janet")

Now we want to ask jpm to install our declared dependencies, which we can do with jpm deps. But jpm deps will actually install our dependencies to a global package repository, not to a “virtual environment” or “sandbox” or something specific to this project. To install to a local directory, we actually have to call jpm deps --local:

jpm deps -l
Initialized empty Git repository in /Users/ian/src/cat-v/jpm_tree/lib/.cache/git__file____Users_ian_src_verbosify/.git/
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (4/4), 536 bytes | 536.00 KiB/s, done.
From file:///Users/ian/src/verbosify
 * [new branch]      master     -> origin/master
From file:///Users/ian/src/verbosify
 * branch            HEAD       -> FETCH_HEAD
HEAD is now at 0a4e386 make a very useful library
generating /Users/ian/src/cat-v/jpm_tree/lib/.manifests/verbosify.jdn...
Installed as 'verbosify'.
copying verbosify.janet to /Users/ian/src/cat-v/jpm_tree/lib...

Now this created a jpm_tree/ directory for us, which contains the following files:

tree jpm_tree
jpm_tree
├── bin
├── lib
│   └── verbosify.janet
└── man

Note that this does not put each dependency in its own directory like you would see in node_modules/. It just throws all of the declared source files in a single lib/ directory. We happened to name our source file verbosify.janet, following a Janet convention, but if we had chosen a generic name like main.janet or src/init.janet or something, we would be at risk of a conflict.

This is weird! In most languages you put your source files in a directory called src/ or lib/ or something, but most Janet libraries usually put them in a directory named after the project, because of the default way that jpm merges files from multiple projects together like this. But we can override that if we want, to divorce our internal directory structure from the final installation structure:

(declare-source
  :source "src/init.janet"
  :prefix "verbosify")

That will cause our clients to install our entry point as jpm_tree/lib/verbosify/init.janet.

Alright. Now if we want to run our cat-v program, we have to run it with jpm -l. Because if we just run janet:

janet main.janet
error: could not find module verbosify:
    /usr/local/lib/janet/verbosify.jimage
    /usr/local/lib/janet/verbosify.janet
    /usr/local/lib/janet/verbosify/init.janet
    /usr/local/lib/janet/verbosify.so
  in require-1 [boot.janet] on line 2900, column 20
  in import* [boot.janet] (tailcall) on line 2939, column 15

It will try to look in the global include path. Instead we want to run:

jpm -l janet main.janet <<<"quick brown fox"
alacritous brown fox

Or we could directly set the environment variable JANET_PATH, which is how Janet decides where to look for modules:

JANET_PATH=jpm_tree/lib janet main.janet <<<"quick brown fox"
expeditious brown fox

In fact that’s all jpm -l janet does. Well, that and adding jpm_tree/bin to our PATH:

SCRIPT='(each [k v] (-> (os/environ) pairs sort) (print k "=" v))'
$ diff -U0 -L janet <(janet -e $SCRIPT) -L jpm <(jpm -l janet -e $SCRIPT)
--- janet
+++ jpm
@@ -6,0 +7 @@
+JANET_PATH=/Users/ian/src/cat-v/jpm_tree/lib
@@ -19 +20 @@
-PATH=...
+PATH=/Users/ian/src/cat-v/jpm_tree/bin:...
@@ -41 +42 @@
-_=/usr/local/bin/janet
+_=/usr/local/bin/jpm

Alright. So now we have our app, and we have a dependency in a separate library. cat-v is starting to look like a real project.

But let’s expand its vocabulary a little. Let’s add a few more words to our verbosify function:

~/src/verbosify/verbosify.janet
(defn- choose [rng selections]
  (def index (math/rng-int rng (length selections)))
  (in selections index))

(defn verbosify [rng word]
  (choose rng
    (case word
      "quick" ["alacritous" "expeditious"]
      "lazy" ["indolent" "lackadaisical" "languorous"]
      "jumps" ["gambols"]
      "dog" ["canine"]
      "fox" ["vulpine"]
      [word])))

And then we’ll update our dependencies…

jpm -l deps
From file:///Users/ian/src/verbosify
 * branch            HEAD       -> FETCH_HEAD
HEAD is now at 0a4e386 make a very useful library
removing /Users/ian/src/cat-v/jpm_tree/lib/verbosify.janet
removing manifest /Users/ian/src/cat-v/jpm_tree/lib/.manifests/verbosify.jdn
Uninstalled.
generating /Users/ian/src/cat-v/jpm_tree/lib/.manifests/verbosify.jdn...
Installed as 'verbosify'.
copying verbosify.janet to /Users/ian/src/cat-v/jpm_tree/lib...

And test it out:

jpm -l janet main.janet <<<"the quick brown fox jumps over the lazy dog"
the expeditious brown fox gambols over the lackadaisical dog

And it didn’t work!

It didn’t work because our project has a dependency on a git repository, and we didn’t actually commit these changes yet. jpm just pulls the latest commit, and jpm has no idea that our working directory is dirty.

What can we do about this? Unfortunately there’s no right answer, but we have a few options:

  1. Commit our changes and then run jpm -l deps again.
  1. Symlink jpm_tree/lib/verbosify.janet to the actual source.
  1. (use ../verbosify/verbosify) instead of involving jpm at all.
  1. Use jpm install instead of jpm deps.
  1. Fork jpm and add support for local file paths.

I mean, that’s probably the right answer. But we’re not going to do that right now. We have some more things to cover.

In real life, when we add dependencies, we’ll want to add dependencies on specific versions. For example, we don’t really want to say that we depend on verbosify, we want to say that we depend on verbosify-1.1.0. Otherwise future changes to the verbosify library could break our cat-v app, and we don’t want that.

But jpm doesn’t have a concept of semver or version constraints, nor does it have a package index of specific versions. All dependencies are git repos, and jpm only lets us specify version constraints in the form of a version or tag:

(declare-project
  :name "cat-v"
  :description "cat --verbose"
  :dependencies [{:url "file:///Users/ian/src/verbosify"
                  :tag "v1.1.0"}])

Despite the name, :tag can either be the name of a tag or a revision hash. Or a branch name. Anything that you could ask git for, really.

Now, it’s always a good idea to depend on specific versions of our dependencies, but that might not be sufficient to ensure that our project’s dependencies are reproducible. Because even if we lock all of our dependencies to specific revisions, those libraries might have dependencies of their own, and they might not be as fastidious as we are about how they specify them.

Fortunately, jpm gives us a way to freeze all of a project’s transitive dependencies, by making a lockfile:

jpm -l make-lockfile
created lockfile.jdn

That will write down specific revisions not just for your immediate dependencies, but for the whole closure of transitive dependencies, ensuring that changes to random great-grand-dependencies won’t suddenly break our business-critical cat-v application.

Note that once we have a lockfile, we have to call jpm -l load-lockfile to install dependencies — jpm deps ignores the file.

Alright. I think that’s all that I have to say about modules and packages and jpm, but I’d like to close this chapter by talking a little bit about the Janet package ecosystem.

The Janet package ecosystem is… young.

There is no equivalent of the npm registry; packages are just Git repos floating around on the internet.

Except that there is a registry, sort of, of packages that you can install with short, abbreviated names. jpm install sqlite3, for example, will (globally) install a package called sqlite3, and that name registry has to live somewhere.

In fact, it lives in this file right here:

https://github.com/janet-lang/pkgs/blob/master/pkgs.janet

It’s not a long list!

But that’s just the packages that have special short names; there are plenty of other packages that never bothered to make a PR against that registry. It’s not an exhaustive list by any means.

If you’re looking for a third-party package, another way to find it is by searching the website Powered by Janet. It’s still a small ecosystem! But not quite as small as the official registry implies.

There is one package in particular that’s worth mentioning now: Spork.

Spork is a monolithic, first-party “contrib” module. It’s a bit of a grab bag of lots of things that don’t really fit into the standard library: Spork has a JSON parser, a code formatter, helper functions for working with generators, a UTF-8 parser that doesn’t actually conform to the UTF-8 specification… there are dozens of packages in Spork, of varying scope and quality.

Of course you can depend on the Spork mega-library and only use a small part of it, but by doing so you’re locked into a single version of Spork for all of its components. If you want to upgrade to a newer Spork to pick up changes to the spork/zip module, but you want to keep running an old version of spork/argparse, then… you can’t. Sorry.

This monolithicity is also annoying if you’re targeting WebAssembly, because Spork contains native modules that won’t build with Emscripten. So even if you just want to use some of the pure Janet parts of Spork, you can’t, because adding a dependency on Spork will break your build.

I don’t know why Spork exists as a single library, rather than a collection of many. If I had to guess I’d say that it’s because jpm doesn’t have many affordances for managing project dependencies easily. But I don’t know. You should be aware of Spork, but I would caution you to be wary of it.

Chapter Eight: Tables and Polymorphism

Janet tables are very similar to JavaScript objects, and they fill the same roles in the language: you can use a table as a primitive associative data structure, or you can use tables to emulate “instances” of a “class.” As a matter of fact, Janet tables are so similar to JavaScript objects that I’m not going to try to explain them from first principles — instead, I’m just going to describe how they differ.

First off, the big one: keys of a JavaScript object must be strings, while keys of a Janet table can be any value. Well, almost any value. You can’t use NaN as a key, which makes perfect sense, because NaN is not equal to itself. And you can’t use nil as a key, because, as we saw in Chapter Six, returning nil is how next indicates that there aren’t any more keys.

However! Unlike JavaScript, and literally every other language except Lua, Janet does not let you store nil as a value of a table. It’s just not allowed:

repl:1:> {:foo 123 :bar nil}
{:foo 123}

It’s not an error! It’s just silently dropped.

This has some convenient side effects: it means that you can check if a key exists in a table by doing (nil? (foo :key)) — there is no explicit (has? foo :key) — and this plays nicely with if-let and when-let.

It also means that Janet doesn’t have a function to delete a key from a table. Instead, to remove a key, you set its value to nil:

repl:1:> (def foo @{:x 1})
@{:x 1}
repl:2:> (set (foo :x) nil)
nil
repl:3:> foo
@{}

So this is cute, I guess, but honestly this is one of my least favorite things about Janet. And I realize that distinguishing “key not found” from “key found but set to nil” is a problem that every dynamic language solves in a different way, and every approach has tradeoffs, and now is not the time to compare and contrast them or voice my opinions about the clearly correct solution (Python’s) (don’t @ me) so let’s move on to talking about prototypes.

Unlike JavaScript, the prototype of a table is not a secret hidden entry — there’s no equivalent of obj.__proto__. A table’s prototype is a completely separate field that you can only retrieve with the table/getproto or struct/getproto functions.

Also unlike JavaScript, tables in Janet have no default prototype, which means there are no methods common to all tables. Instead, common functionality that you would find on JavaScript’s Object.prototype exist as functions — so the Janet way to write {}.toString() is (string {}). This sidesteps a whole class of JavaScript problems, including everyone’s favorite (recently retired!) Object.prototype.hasOwnProperty.call(obj, 'key').

Janet tables have no equivalent of JavaScript object properties — there are no getters or setters in Janet, and table entries have no metadata like JavaScript’s enumerable flag. Instead, when you enumerate the keys of a table using next, you always enumerate the keys of that specific table, and not any of the keys from its prototype.

Finally, we should talk about methods.

Like JavaScript, “methods” in Janet are just functions. Unlike JavaScript, Janet methods actually make sense. There is no secret, magic this argument that works completely differently than every other argument. this is conventionally spelled self in Janet, but it’s just a normal positional argument and you can call it whatever you want. Let’s take a look at a table with a “method:”

repl:1:> (def table @{:get-foo (fn [self] (self :_foo)) :_foo 123 })
@{:_foo 123 :get-foo <function 0x600003B62BE0>}

We can look up the “method” like any other key:

repl:2:> (table :get-foo)
<function 0x600003B62BE0>

And we can call the method, just like any other function:

repl:3:> ((table :get-foo))
error: <function 0x600003B62BE0> called with 0 arguments, expected 1
  in _thunk [repl] (tailcall) on line 3, column 1

And of course this is an error, because :get-foo is a function that takes one argument. We have to pass it its self argument:

repl:4:> ((table :get-foo) table)
123

But it’s very cumbersome to repeat table like that, so Janet has a shorthand to invoke functions in one shot like this:

repl:5:> (:get-foo table)
123

When we “call” a keyword like this, it looks up the function on the table and then calls the function with the table as its first arguments — plus any remaining arguments. So the following two lines are exactly equivalent:

(:method table x y)
((table :method) table x y)

So: now that we understand how tables work, let’s take a look at how we could actually use them to simulate a sort of object-oriented programming.

(def counter-prototype
  @{:add (fn [self amount] (+= (self :_count) amount))
    :increment (fn [self] (:add self 1))
    :count (fn [self] (self :_count))})

(defn new-counter []
  (table/setproto @{:_count 0} counter-prototype))

(def counter (new-counter))

(print (:count counter))
(:increment counter)
(print (:count counter))
(:add counter 3)
(print (:count counter))
janet counter.janet
0
1
4

Note that this is a little more verbose than it would look in JavaScript. We have to define the prototype explicitly, along with a separate constructor/“factory” function that’s in charge of hooking it up correctly. Prototypes and constructor functions aren’t bundled together in Janet like they are in JavaScript, although we could bundle them together if we wanted to:

(def Counter
  (let [proto @{:add (fn [self amount] (+= (self :_count) amount))
                :increment (fn [self] (:add self 1))
                :count (fn [self] (self :_count))}]
    (fn [] (table/setproto @{:_count 0} proto))))

(def counter (Counter))
(print (:count counter))
(:increment counter)
(print (:count counter))
(:add counter 3)
(print (:count counter))

Or we could make an explicit first-class class, distinct from the constructor:

(def Counter
    {:proto @{:add (fn [self amount] (+= (self :_count) amount))
              :increment (fn [self] (:add self 1))
              :count (fn [self] (self :_count))}
     :new (fn [self]
       (table/setproto @{:_count 0} (self :proto)))})

(def counter (:new Counter))
(print (:count counter))
(:increment counter)
(print (:count counter))
(:add counter 3)
(print (:count counter))

Or we could write a macro that lets us write something like ES6’s class syntax:

(class Counter
  constructor (fn [self] (set (self :_count) 0))
  add (fn [self amount] (+= (self :_count) amount))
  increment (fn [self] (:add self 1))
  count (fn [self] (self :_count)))

(def counter (Counter))
(print (:count counter))
(:increment counter)
(print (:count counter))
(:add counter 3)
(print (:count counter))

We can do anything we want! There are no rules here, and there aren’t even really idiomatic conventions for this sort of thing in Janet. Object-oriented programming just isn’t very common — it’s far more common to write modules full of functions than it is to write tables full of methods. But if you want to write in an object-oriented style, pick the style you like best — Janet only gives you the barest of building blocks to work with.

Now, why might we want to do any of this? What’s the point of object-oriented programming in the first place?

The point is polymorphism, which means defining different sorts of values that have the same interface. For example:

(defn print-all [readable]
  (print (:read readable :all)))

(with [file (file/open "readme.txt")]
  (print-all file))

(with [stream (net/connect "janet.guide" 80)]
  (:write stream "GET / HTTP/1.1\r\n\r\n")
  (print-all stream))

file/open returns a core/file abstract type, and net/connect returns a core/stream abstract type. But both of these types have a method called :close, so they both work with the with macro. with executes its body and then calls (:close file) or (:close stream), and it works on any value that has a :close method. You might call it a polymorphic macro, except that that term is weird and misleading and let’s not call it that. It’s just a regular macro that happens to expand to code that exploits runtime polymorphism.

Files and streams also both have a method called :read, so we can pass them to the polymorphic function print-all. print-all works with any value that has a :read method — files, stream, or even custom types that we define ourselves, be they tables or abstract types.

Tables and abstract types are the only polymorphic values in Janet. We can never add a :read method to a tuple, for instance, even if we really want to. And tables are pretty limited in what you can actually do with them: you can define whatever methods you want, but there aren’t very many functions in the Janet standard library that will try to call methods.

In fact the only built-in functions that you can overload with a method are the “math operator” functions:

repl:1:> (def addable @{:+ (fn [a b] (printf "adding %q %q" a b) 10)})
@{:+ <function 0x600002E4C0C0>}
repl:2:> (+ addable "foo")
adding @{:+ <function 0x600002E4C0C0>} "foo"
10

And the “bitwise operator” functions:

repl:1:> (bxor @{:^ (fn [a b] a)} nil)
@{:^ <function 0x600002E57960>}

And the “polymorphic compare” function called compare, which you can override with the :compare method:

repl:1:> (def compare-on-value (fn [a b] (compare (a :_value) (b :_value))))
<function 0x600000A4E260>
repl:2:> (def box-value (fn [value] @{:_value value :compare compare-on-value}))
<function 0x600000A584E0>
repl:3:> (compare (box-value 1) (box-value 2))
-1
repl:4:> (compare (box-value 2) (box-value 2))
0
repl:5:> (compare (box-value 3) (box-value 2))
1

But note that the normal comparison operators, like < and = and >= do not use the polymorphic compare function.

repl:6:> (= (box-value 2) (box-value 2))
false

There are polymorphic versions of the standard comparators that you can use instead:

repl:7:> (compare= (box-value 2) (box-value 2))
true

Which are useful if you want to, for example, sort values like these, since the default sort functions also do not use polymorphic comparison by default:

repl:8:> (sort @[(box-value 1) (box-value 2)])
@[@{:_value 2 :compare <function 0x600000A4E260>} @{:_value 1 :compare <function 0x600000A4E260>}]
repl:9:> (sort @[(box-value 1) (box-value 2)] compare<)
@[@{:_value 1 :compare <function 0x600000A4E260>} @{:_value 2 :compare <function 0x600000A4E260>}]

In fact the only built-in functions that use polymorphic compare are zero?, pos?, neg?, one?, even?, and odd?, for some reason.

But note that if you’re defining an abstract type, you can override the standard comparison functions — the polymorphic compare interface only matters for tables, and only for these few functions. It’s odd.

Back to operators: operators can also be overloaded in the “right-hand” direction:

repl:1:> (def right-addable @{:r+ (fn [a b] (printf "adding %q %q" a b) 10)})
@{:r+ <function 0x600002E57620>}
repl:2:> (+ "foo" right-addable)
adding @{:r+ <function 0x600002E57620>} "foo"
10

But, like, we’re entering the realm of Janet trivia now. In practice you will probably never make tables that override any of the default operators, because it just isn’t very useful. You could use it to implement something like a vector:

points.janet
(def Point (do
  (var new nil)
  (def proto
    {:+ (fn [{:x x1 :y y1} {:x x2 :y y2}]
          (new (+ x1 x2) (+ y1 y2)))})
  (set new (fn [x y]
    (struct/with-proto proto :x x :y y)))))

(pp (+ (Point 1 2) (Point 3 4)))
janet points.janet
@{:x 4 :y 6}

Which is not useless, but it’s unlikely that we’d want to do this in practice — allocating a struct isn’t free; if we care about performance we’d probably write this as an abstract type and save a few bytes for every point. And if we don’t care about performance, it’s more convenient to just write something like (+ [1 2] [3 4]) and redefine + to work with that.

So, to recap: math operators, bitwise operators, and compare. Those are the only things that tables can override. Custom to-string? Nope. And if we’re trying to make our own data structure, we can’t overload the definition of length, nor, as we saw in Chapter Six, can we define a custom next. This limits the usefulness of tables-as-custom-types quite a bit, and in practice if we’re trying to make our own types, we’ll probably wind up writing abstract types instead. They’re much more flexible, and give us a lot more control over how our values work at runtime.

So we’ll talk about how to do that soon.

Actually, we’ll talk about how to do that right now.

Chapter Nine: Xenofunctions

One thing that I really wish Janet had is a native set type. I love sets; they’re often the right way to model a problem. But usually when I write Janet code that wants a set, I have to write a table instead, and just set all the keys to true:

repl:1:> (def cities-visited @{})
@{}
repl:3:> (set (cities-visited "NYC") true)
true
repl:4:> (set (cities-visited "LA") true)
true
repl:5:> (set (cities-visited "NYC") true)
true
repl:6:> cities-visited
@{"LA" true "NYC" true}

Which is okay! This is a pretty good way to hack up a set, but it’s not perfect. We can’t iterate over the elements with each; we have to use eachk, which means that other functions that act on iterable structures, like filter or map, won’t work on our “set.” This is… slightly annoying, but ultimately fine.

But! Wouldn’t it be nice if there were a proper, built-in set type that we could use? Don’t you think that would be useful? Just say yes; it’ll get the actual chapter started.

So, as we saw in the last chapter, we can’t implement a proper set as a pure Janet type, because we can’t override equality or iteration. We’re going to have to reach for some C code instead, which is actually quite easy. Writing native Janet modules in C isn’t some weird esoteric difficult thing; it’s a straightforward and even rather pleasant adventure.

In order to get started, we’ll need to declare a native module. We’ll start small:

project.janet
(declare-project :name "set")

(declare-native
  :name "set"
  :source ["set.c"])
set.c
#include <janet.h>

static Janet cfun_hello(int32_t argc, Janet *argv) {
  janet_fixarity(argc, 0);
  printf("hello world\n");
  return janet_wrap_nil();
}

static JanetReg cfuns[] = {
  {"hello", cfun_hello, "(hello)\n\nprints hello"},
  {NULL, NULL, NULL}
};

JANET_MODULE_ENTRY(JanetTable *env) {
  janet_cfuns(env, "set", cfuns);
}

Sixteen lines! That’s all it takes to write a native module. I know some of the lines don’t make sense yet, but we’ll fix that soon. First, let’s see how easy it is to use this:

main.janet
(import set)

(set/hello)

Now, we can’t just run this with janet main.janet. We’ll need to build and install the native module first:

jpm install --local --verbose
cc -c set.c -DJANET_BUILD_TYPE=release -std=c99 -I/usr/local/include/janet -I/Users/ian/src/janet-set/jpm_tree/lib -O2 -fPIC -o build/set.o
generating meta file build/set.meta.janet...
generating /Users/ian/src/janet-set/jpm_tree/lib/.manifests/set.jdn...
cc -std=c99 -I/usr/local/include/janet -I/Users/ian/src/janet-set/jpm_tree/lib -O2 -o build/set.so build/set.o -shared -undefined dynamic_lookup -lpthread
cc -c set.c -DJANET_BUILD_TYPE=release -DJANET_ENTRY_NAME=janet_module_entry_set -std=c99 -I/usr/local/include/janet -I/Users/ian/src/janet-set/jpm_tree/lib -O2 -o build/set.static.o
ar rcs build/set.a build/set.static.o
Installed as 'set'.
copying build/set.so to /Users/ian/src/janet-set/jpm_tree/lib/...
cp -rf build/set.so /Users/ian/src/janet-set/jpm_tree/lib/
copying build/set.meta.janet to /Users/ian/src/janet-set/jpm_tree/lib/...
cp -rf build/set.meta.janet /Users/ian/src/janet-set/jpm_tree/lib/
copying build/set.a to /Users/ian/src/janet-set/jpm_tree/lib/...
cp -rf build/set.a /Users/ian/src/janet-set/jpm_tree/lib/

Now we can run it:

jpm -l janet main.janet
hello world

And see that it worked.

But how did it work?

Well, if we look at what this actually installed, we can see three files:

ls -l jpm_tree/lib
total 120
-rw-r--r--  1 ian   1.4K set.a
-rw-r--r--  1 ian   136B set.meta.janet
-rwxr-xr-x  1 ian    49K set.so

We have a static library and a dynamic library, plus a file of build information.

If we import our library from the Janet repl, or from a script that we execute with the janet interpreter, we’ll dynamically link in the native module set.so. But if we ask jpm to compile a native executable, jpm will statically link in the set.a archive. The set.meta.janet file contains some information that jpm will use in order to statically link it properly:

jpm tree/lib/set.meta.janet
# Metadata for static library set.a

{ :cpp false
  :ldflags (quote nil)
  :lflags (quote nil)
  :static-entry "janet_module_entry_set"}

So when we run jpm -l janet main.janet, we load the dynamic library, and somehow that gives us a function called hello in Janet.

jpm -l repl
repl:1:> (use set)
@{_ @{:value <cycle 0>} hello @{:private true} :macro-lints @[]}
repl:2:> hello
<cfunction set/hello>

But how did that work?

Normally when we load a Janet module we get an environment, which is a regular Janet table. And that’s exactly what we get when we load a native module as well:

repl:2:> (require "set")
@{hello @{:value <cfunction set/hello>} :native "/Users/ian/src/janet-set/jpm_tree/lib/set.so"}

But where did that table come from? Is there some marshaled environment table hiding in the dynamic library that we built?

Well, no. It’s simpler than that. Let’s take a closer look at the C code that we wrote:

#include <janet.h>
#include <stdio.h>

static Janet cfun_hello(int32_t argc, Janet *argv) {
  janet_fixarity(argc, 0);
  printf("hello world\n");
  return janet_wrap_nil();
}

static JanetReg cfuns[] = {
  {"hello", cfun_hello, "(hello)\n\nprints hello"},
  {NULL, NULL, NULL}
};

JANET_MODULE_ENTRY(JanetTable *env) {
  janet_cfuns(env, "set", cfuns);
}

JANET_MODULE_ENTRY is a macro, but we can expand it to something like this:

cc -E set.c | sed -n '/2 "set.c"/,$p'
# 2 "set.c" 2

static Janet cfun_hello(int32_t argc, Janet *argv) {
  janet_fixarity(argc, 0);
  printf("hello world\n");
  return janet_wrap_nil();
}

static JanetReg cfuns[] = {
  {"hello", cfun_hello, "(hello)\n\nprints hello"},
  {((void*)0), ((void*)0), ((void*)0)}
};

 __attribute__((visibility ("default"))) JanetBuildConfig _janet_mod_config(void) { return ((JanetBuildConfig){ 1, 23, 1, (0 | 0) }); } __attribute__((visibility ("default"))) void _janet_init(JanetTable *env) {
  janet_cfuns(env, "set", cfuns);
}

If you peer past the __attribute__ annotations, you can see that the JANET_MODULE_ENTRY macro defined two functions:

JanetBuildConfig _janet_mod_config(void) {
  return ((JanetBuildConfig){ 1, 27, 0, (0 | 0) });
}

void _janet_init(JanetTable *env) {
  janet_cfuns(env, "set", cfuns);
}

_janet_mod_config is a function that returns the current version of Janet — when Janet dynamically loads a native module, it will first check to make sure that it was compiled with the same version of Janet.

_janet_init is the interesting bit, though. It doesn’t actually return an environment table, but instead takes a freshly allocated table as an input and mutates it, installing all of the environment entries for our module.

You’ll typically do this with the janet_cfuns helper, which is a function that iterates over a null-terminated array of JanetReg structs:

struct JanetReg {
  const char *name;
  JanetCFunction cfun;
  const char *documentation;
};

And installs them into the environment table, boxing the raw C function pointers into Janet cfunction values.

But we could do other things in this function. We could execute arbitrary code to compute an environment. This function is a bit like top-level statements in a regular Janet file, except instead of running at compile time, it runs when the native module is loaded, so there is a runtime cost that we will pay either when the module is dynamically linked in, or on program startup if it’s statically linked.

Just for fun, let’s compute something, and put it in the environment table manually:

JANET_MODULE_ENTRY(JanetTable *env) {
  janet_cfuns(env, "set", cfuns);
  janet_def(env, "answer", janet_wrap_integer(42), "the answer");
}
jpm -l janet -e '(import set) (print set/answer)'
42

Very original.

But alright, we’re probably not going to do that very often. For the most part we’re just going to define C functions that we can call from Janet code.

So let’s talk about that:

static Janet cfun_hello(int32_t argc, Janet *argv) {
  janet_fixarity(argc, 0);
  printf("hello world\n");
  return janet_wrap_nil();
}

We’re going to see a lot of Janets today, and I think it’s helpful to understand what the type actually is. A Janet is a small value — on x86 and x64 architectures, it’s implemented as a single word using a technique called “NaN boxing.” On other architectures it’s implemented as a tagged union consisting of a one-byte enum plus eight payload bytes. So on my computer with my C compiler, a Janet is two 64-bit words long: the first is a tag value; the second is either a pointer or a double or a boolean integer.

The point is that we can pass Janet values around quite cheaply — copying a Janet is never going to copy an entire giant data structure; it will only copy a pointer to the data structure, even when we’re dealing with immutable tuples or structs.

So, coming back to cfun_hello, you can see that it takes an array of Janets and returns a Janet — specifically janet_wrap_nil(), which is the slightly verbose way that you write nil in the C API.

Our function doesn’t actually take arguments, so we assert that the user didn’t pass us any. Unless you’re writing a fully variadic function, you should start all cfunctions with an arity assertion like this. They come in a few flavors:

// Exactly two arguments:
janet_fixarity(argc, 2);

// One, two, or three arguments:
janet_arity(argc, 1, 3);

// At least two arguments:
janet_arity(argc, 2, -1);

I think that’s about all I can say about cfun_hello, so let’s move on to something real.

We’re trying to make a set type, so let’s write down a set/new function. I want it to work like this:

repl:1:> (set/new 1 2 3)
<set {1 2 3}>

So it will be a fully variadic function that returns… well, an abstract type, of course.

An abstract type is just a record that contains a name and a bunch of function pointers:

struct JanetAbstractType {
  const char *name;
  int (*gc)(void *data, size_t len);
  int (*gcmark)(void *data, size_t len);
  int (*get)(void *data, Janet key, Janet *out);
  void (*put)(void *data, Janet key, Janet value);
  void (*marshal)(void *p, JanetMarshalContext *ctx);
  void *(*unmarshal)(JanetMarshalContext *ctx);
  void (*tostring)(void *p, JanetBuffer *buffer);
  int (*compare)(void *lhs, void *rhs);
  int32_t (*hash)(void *p, size_t len);
  Janet(*next)(void *p, Janet key);
  Janet(*call)(void *p, int32_t argc, Janet *argv);
  size_t (*length)(void *p, size_t len);
  JanetByteView(*bytes)(void *p, size_t len);
};

The function pointers allow us to override different built-in bits of Janet functionality — you can probably guess what most of those do just from their names. Except, er, yeah, I don’t love to read function pointer signatures like that. This is a lot easier for me to read:

int gc(void *data, size_t len);
int gcmark(void *data, size_t len);
int get(void *data, Janet key, Janet *out);
void put(void *data, Janet key, Janet value);
void marshal(void *p, JanetMarshalContext *ctx);
void *unmarshal(JanetMarshalContext *ctx);
void tostring(void *p, JanetBuffer *buffer);
int compare(void *lhs, void *rhs);
int32_t hash(void *p, size_t len);
Janet next(void *p, Janet key);
Janet call(void *p, int32_t argc, Janet *argv);

Now there are lots of ways that we could choose to represent a set, but for now I am going to cheat a little: I’m going to implement a set as a regular Janet table, but wrapped in an abstract type. That way we won’t need to worry about actually implementing the data structure, and we can just focus on the abstract interface.

So here’s how I’m going to write my set/new function:

static JanetTable *new_abstract_set() {
  JanetTable *set = (JanetTable *)janet_abstract(&set_type, sizeof(JanetTable));
  set->gc = (JanetGCObject){0, NULL};
  janet_table_init_raw(set, argc);
  return set;
}

static Janet cfun_new(int32_t argc, Janet *argv) {
  JanetTable *set = new_abstract_set();
  for (int32_t i = 0; i < argc; i++) {
    janet_table_put(set, argv[i], janet_wrap_true());
  }
  return janet_wrap_abstract(set);
}

We’re allocating memory for a JanetTable struct using the janet_abstract function, then we’re doing something weird and JanetGCObject-related, and then we’re calling janet_table_init_raw. Everything after that is pretty straightforward.

The GC thing is only necessary because of the very weird thing that we’re doing of wrapping a JanetTable as an abstract type. I do want to explain this code, but understanding it requires understanding a few Janet implementation details that you absolutely do not need to understand to write any “normal” C functions. So consider the next aside to be completely optional reading.

Okay! With that out of the way, we still have to define the abstract type itself — the set_type value that I referenced in cfun_new. I’m going to define it like this for now:

static const JanetAbstractType set_type = {
  .name = "set",
  .gc = set_gc,
  .gcmark = set_gcmark,
  .get = NULL,
  .put = NULL,
  .marshal = NULL,
  .unmarshal = NULL,
  .tostring = set_tostring,
  .compare = NULL,
  .hash = NULL,
  .next = NULL,
  .call = NULL,
  .length = NULL,
  .bytes = NULL,
};

We’ll add some more functions later on, but we’re starting with the absolute basics.

set_gc is the function that Janet will call in order to garbage collect our value; it’s the deinitializer or destructor or whatever you want to call it for this abstract type. Our implementation is very simple:

static int set_gc(void *data, size_t len) {
  (void) len;
  janet_table_deinit((JanetTable *)data);
  return 0;
}

Even though len is an argument to this function, we don’t need to free the memory that we allocated for this abstract value; the garbage collector will do that for us. We only need to free any memory that we allocated in addition to that. The len argument is there just in case we allocated memory proportional to our original size — it saves us from having to store that length separately.

In that same vein, janet_table_deinit won’t free the actual JanetTable struct, only the memory that it allocated itself (the hash buckets). In general, if you’re writing an abstract type that doesn’t dynamically allocate any additional memory, you can just set .gc = NULL.

Finally, we return 0 to indicate success, like a process exit code.

Next we need to implement set_gcmark. Again, this is very simple:

static int set_gcmark(void *data, size_t len) {
  (void) len;
  janet_mark(janet_wrap_table((JanetTable *)data));
  return 0;
}

In general set_gcmark should loop over any janet_gcallocated Janet values that this abstract type knows about and call janet_mark on them. If the abstract type you’re defining isn’t some kind of container, you probably don’t need to implement this function at all.

Note that we don’t actually need to mark the table itself here, as long as we mark all of the values in the table: the table is not known to the garbage collector, as we allocated it as part of our abstract type. But there’s no harm in doing so, and calling janet_mark on the whole table is a very convenient way to recursively mark all of the table’s keys and values.

Once again, return 0; means that we didn’t encounter any error while trying to mark.

And now we’re basically done! There’s only one more function to implement: set_tostring:

static void set_tostring(void *data, JanetBuffer *buffer) {
  JanetTable *set = (JanetTable *)data;
  janet_buffer_push_cstring(buffer, "{");
  int first = 1;
  for (int32_t i = 0; i < set->capacity; i++) {
    JanetKV *entry = &set->data[i];
    if (janet_checktype(entry->key, JANET_NIL)) {
      continue;
    }
    if (first) {
      first = 0;
    } else {
      janet_buffer_push_cstring(buffer, " ");
    }
    janet_pretty(buffer, 0, 0, entry->key);
  }
  janet_buffer_push_cstring(buffer, "}");
}

Amusingly, this is the most complicated function of all, and it’s just a stupid printer.

Finally, we have to register the abstract type we declared. This just means adding one additional line to our _janet_init function:

JANET_MODULE_ENTRY(JanetTable *env) {
  janet_cfuns(env, "set", cfuns);
  janet_register_abstract_type(&set_type);
}

And now we are actually done. Just to recap, the code in its entirety looks like this:

set.c
#include <janet.h>

static int set_gc(void *data, size_t len) {
  (void) len;
  janet_table_deinit((JanetTable *)data);
  return 0;
}

static int set_gcmark(void *data, size_t len) {
  (void) len;
  janet_mark(janet_wrap_table((JanetTable *)data));
  return 0;
}

static void set_tostring(void *data, JanetBuffer *buffer) {
  JanetTable *set = (JanetTable *)data;
  janet_buffer_push_cstring(buffer, "{");
  int first = 1;
  for (int32_t i = 0; i < set->capacity; i++) {
    JanetKV *entry = &set->data[i];
    if (janet_checktype(entry->key, JANET_NIL)) {
      continue;
    }
    if (first) {
      first = 0;
    } else {
      janet_buffer_push_cstring(buffer, " ");
    }
    janet_pretty(buffer, 0, 0, entry->key);
  }
  janet_buffer_push_cstring(buffer, "}");
}

static const JanetAbstractType set_type = {
  .name = "set",
  .gc = set_gc,
  .gcmark = set_gcmark,
  .get = NULL,
  .put = NULL,
  .marshal = NULL,
  .unmarshal = NULL,
  .tostring = set_tostring,
  .compare = NULL,
  .hash = NULL,
  .next = NULL,
  .call = NULL,
  .length = NULL,
  .bytes = NULL,
};

static JanetTable *new_abstract_set() {
  JanetTable *set = (JanetTable *)janet_abstract(&set_type, sizeof(JanetTable));
  set->gc = (JanetGCObject){0, NULL};
  janet_table_init_raw(set, argc);
  return set;
}

static Janet cfun_new(int32_t argc, Janet *argv) {
  JanetTable *set = new_abstract_set();
  for (int32_t i = 0; i < argc; i++) {
    janet_table_put(set, argv[i], janet_wrap_true());
  }

  return janet_wrap_abstract(set);
}

static const JanetReg cfuns[] = {
  {"new", cfun_new, "(set/new & xs)\n\n"
    "Returns a set containing only this function's arguments."},
  {NULL, NULL, NULL}
};

JANET_MODULE_ENTRY(JanetTable *env) {
  janet_cfuns(env, "set", cfuns);
  janet_register_abstract_type(&set_type);
}

It’s a lot of code when we look at it all at once, but each individual piece is pretty simple. And now we finally get to test it out!

jpm -l janet -e '(import set)' -r
repl:1:> (set/new 1 2 3)
<set {1 2 3}>

Hooray! We wrote an extremely, umm, useless set type.

But we can try to make it more useful.

The first thing I want to do is to make it enumerable: I want to be able to loop over it with a normal each loop.

So recall, from Chapter Six, the iteration protocol: we need to implement a function called next that will return the next key in the structure, and we need to implement a function called get that will return a value for that key. So in order to implement this, we have to decide: what are the “keys” of a set?

One idea is to have next iterate over the keys in our underlying table — the elements in our set — and when we call get, to just return the element itself if it exists or nil if it doesn’t. Since nil cannot be a key of a table (and thus cannot be an element in our set), this happens to work nicely.

This is perfectly reasonable, and we could choose to implement next this way, but there’s a small problem with this approach: if we allow arbitrary values to be our keys, we can no longer have our abstract type respond to methods.

Of course it’s just fine to say “who cares” and not implement any methods for our set, but that wouldn’t be any fun. I think it would be nice to overload + to mean “union” and - to mean “difference” and so on, and in order to overload operators like that we’ll need to implement methods.

Implementing a method is pretty simple:

static Janet cfun_union(int32_t argc, Janet *argv);
static const JanetMethod set_methods[] = {
  {"+", cfun_union},
  {NULL, NULL}
};

static int set_get(void *data, Janet key, Janet *out) {
  if (!janet_checktype(key, JANET_KEYWORD)) {
    return 0;
  }
  return janet_getmethod(janet_unwrap_keyword(key), set_methods, out);
}

static const JanetAbstractType set_type = {
  .name = "set",
  .gc = set_gc,
  .gcmark = set_gcmark,
  .get = set_get,
  .put = NULL,
  .marshal = NULL,
  .unmarshal = NULL,
  .tostring = set_tostring,
  .compare = NULL,
  .hash = NULL,
  .next = NULL,
  .call = NULL,
  .length = NULL,
  .bytes = NULL,
};

set_get returns an int, but in this case it’s a boolean, not an “exit code.” So 0 means that the key was not found, while anything else means that it was. The janet_getmethod helper makes it easy to implement this; it does a linear scan through a NULL-terminated array of methods and “returns,” via the out parameter, the first one with a matching name.

We’ll need to reference set_type from the cfun_union implementation, so I forward declare it to implement later on:

static Janet cfun_union(int32_t argc, Janet *argv) {
  JanetTable *result = new_abstract_set();

  for (int32_t arg_ix = 0; arg_ix < argc; arg_ix++) {
    JanetTable *arg = (JanetTable *)janet_getabstract(argv, arg_ix, &set_type);
    for (int32_t bucket_ix = 0; bucket_ix < arg->capacity; bucket_ix++) {
      JanetKV *entry = &arg->data[bucket_ix];
      if (janet_checktype(entry->key, JANET_NIL)) {
        continue;
      }
      janet_table_put(result, entry->key, janet_wrap_true());
    }
  }

  return janet_wrap_abstract(result);
}

And now we can recompile this and test it out:

jpm -l janet -e '(import set)' -r
repl:1:> (+ (set/new))
<set {}>
repl:2:> (+ (set/new 1 2 3) (set/new 2 3 4))
<set {1 2 3 4}>

Perfect.

Since there is only one get function, and it has to work for both methods and values, there’s a restriction on what we can use as keys for our next implementation — if we allow arbitrary keywords to be keys, we won’t be able to implement any methods.

So instead let me propose a different key: the index of the hash bucket. It’ll be very fast to look it up, and there’s no chance that we’ll confuse it with a method name… but it will be completely meaningless if we mutate the underlying table during iteration, so we’ll have to make sure not to do that.

This key makes for a very simple implementation of next:

static Janet set_next(void *data, Janet key) {
  int32_t previous_index;
  if (janet_checktype(key, JANET_NIL)) {
    previous_index = -1;
  } else if (janet_checkint(key)) {
    previous_index = janet_unwrap_integer(key);
    if (previous_index < 0) {
      janet_panicf("set key %v cannot be negative", key);
    }
  } else {
    janet_panicf("set key %v must be an integer", key);
  }

  JanetTable *set = (JanetTable *)data;
  for (int32_t i = previous_index + 1; i < set->capacity; i++) {
    if (!janet_checktype(set->data[i].key, JANET_NIL)) {
      return janet_wrap_integer(i);
    }
  }
  return janet_wrap_nil();
}

We’re iterating over the buckets in the hash table again, just like we did for set_tostring, but this time we return the index of the first or next “full” bucket that we find.

That’s already sufficient to do something:

jpm -l janet -e '(import set)' -r
repl:1:> (eachk x (set/new 1 2 3 4 5) (print x))
0
1
2
4
8
nil
repl:2:> (eachk x (set/new 1 "two" :three 'four) (print x))
0
1
5
7
nil

The indexes themselves sort of leak some implementation details of Janet tables, but we’re going to treat them as completely opaque values. A set isn’t actually an indexed structure at all, and using eachk or eachp with a set is like using it with a generator — the keys just aren’t meaningful.

So let’s extend our get implementation to work with these keys, so that we can actually support useful iteration:

static int set_get(void *data, Janet key, Janet *out) {
  if (janet_checkint(key)) {
    JanetTable *set = (JanetTable *)data;
    int32_t index = janet_unwrap_integer(key);
    if (index < 0 || index >= set->capacity) {
      janet_panicf("set key %v out of bounds (did you mutate during iteration?)", key);
    }
    Janet element = set->data[index].key;
    if (janet_checktype(element, JANET_NIL)) {
      janet_panicf("set key %v not found (did you mutate during iteration?)", key);
    }
    *out = element;
    return 1;
  } else if (janet_checktype(key, JANET_KEYWORD)) {
    return janet_getmethod(janet_unwrap_keyword(key), set_methods, out);
  } else {
    return 0;
  }
}

And now, at long last, we actually have a useful set:

jpm -l janet -e '(import set)' -r
repl:1:> (each x (set/new 1 3 1 2 2 1 1) (print x))
1
3
2
nil
repl:2:> (map |(* $ 2) (set/new 1 2 3 4 5))
@[2 4 8 10 6]

I mean, for some definition of useful. We haven’t actually done anything set-related yet, and as the code stands right now we can’t even check for membership. So we still have a little ways to go.

One thing that was nice about our actual-table-as-a-set approach is that we could do a membership check by “invoking” the set:

repl:1:> (def cities-visited @{"LA" true "NYC" true})
@{"LA" true "NYC" true}
repl:2:> (cities-visited "LA")
true
repl:2:> (cities-visited "Pittsburgh")
nil

I mean, true/nil is janky as heck, but it works in most situations.

It would be nice to replicate this with our set type, but by default when we “invoke” an abstract type, it’s the same as calling get:

repl:1:> (def cities-visited (set/new "NYC" "LA"))
<set {"LA" "NYC"}>
repl:2:> (cities-visited "NYC")
error: key "NYC" not found in <set {"LA" "NYC"}>
  in _thunk [repl] (tailcall) on line 2, column 1
repl:3:> (cities-visited :length)
<cfunction 0x000104443DBC>

But we can change this by implementing a custom call function:

static Janet set_call(void *data, int32_t argc, Janet *argv) {
  janet_fixarity(argc, 1);
  JanetTable *set = (JanetTable *)data;
  Janet value = janet_table_get(set, argv[0]);
  int key_found = !janet_checktype(value, JANET_NIL);
  return janet_wrap_boolean(key_found);
}

And now if we “invoke” our set, it will perform a membership test instead:

repl:1:> (def cities-visited (set/new "NYC" "LA"))
<set {"LA" "NYC"}>
repl:2:> (cities-visited "NYC")
true
repl:3:> (cities-visited "Pittsburgh")
false

We now have a pretty thorough set implementation! We’ve put almost everything we can into the abstract type:

static const JanetAbstractType set_type = {
  .name = "set",
  .gc = set_gc,
  .gcmark = set_gcmark,
  .get = set_get,
  .put = NULL,
  .marshal = NULL,
  .unmarshal = NULL,
  .tostring = set_tostring,
  .compare = NULL,
  .hash = NULL,
  .next = set_next,
  .call = set_call,
  .length = NULL,
  .bytes = NULL,
};

length is trivial; let’s go ahead and knock that one out:

static size_t set_length(void *data, size_t len) {
  (void) len;
  JanetTable *set = (JanetTable *)data;
  return set->count;
}

We won’t implement bytes; bytes only makes sense for abstract types that are string-like or buffer-like. It’s supposed to return a slice of contiguous bytes, and we don’t have any of those.

If we were making an immutable set, we’d want to implement custom compare and hash functions to make sure that two sets with the same elements are equal to one another and hash to the same value. But for the sake of simplicity, let’s say that we only care about writing a mutable set, and we can just use the default pointer equality when we compare two sets.

I don’t know if we should implement put; it seems a bit weird to have an asymmetry between the keys we use for get and put. But it might be convenient to be able to write (set (cities-visited "NYC") true) to add keys and (set (cities-visited "NYC") false) to remove them. In any case, we won’t learn anything new by doing that, so let’s skip it for now.

Which just leaves us with the marshaling functions.

Recall from Chapter Two that marshaling means serializing a Janet data structure into a sequence of bytes. When we “compile” a Janet program, we compute the program’s environment table and then marshal that table to disk.

But in order to do that, all of the values in the environment table have to be marshalable. And right now, if we write a simple program like this, we actually have two values in our environment that are not marshalable:

(import set)

(def numbers (set/new 1 2 3 4 5))

(defn main [&]
  (each number numbers
    (print number)))

The first is, obviously, the set called numbers. The second is more subtle: it’s the cfunction called set/new that we imported.

jpm -l janet -c main.janet main.jimage
error: no registry value and cannot marshal <cfunction set/new>
  in marshal [src/core/marsh.c] on line 1480
  in make-image [boot.janet] on line 2637, column 3
  in c-switch [boot.janet] (tailcall) on line 3873, column 36
  in cli-main [boot.janet] on line 3909, column 13

Now, there’s not really any way for Janet to marshal cfunctions. You might imagine some kind of serialization of the actual machine code of the native function… but no. That wouldn’t work, and even if it did it wouldn’t be portable.

But it is possible for Janet to safely skip marshaling certain cfunctions. If Janet knows that the cfunction is going to be present in the environment that unmarshals this image, it can just write down an identifier for the cfunction and trust that the unmarshaling code will know how to match that identifier up to the actual correct cfunction later on. (This is the “registry value” that the error message is referring to.)

This is why you can have built-in cfunctions in your environment and marshal them just fine. You won’t get an error trying to marshal this file’s environment:

(def is-integer? int?)

Even though int? is a cfunction.

Now, when we use jpm to build an executable that references our native set library, jpm will take care of automatically skipping all of the cfunctions exposed by any native modules that the executable depends on, so we generally won’t have to think about this at all.

But even if we use jpm to compile this, we still won’t be able to marshal the resulting image, because our set is not marshalable yet. But let’s try it anyway. I’m going to add an executable with a native dependency:

project.janet
(declare-project :name "set")

(def native-module
  (declare-native
    :name "set"
    :source ["set.c"]))

(declare-executable
  :name "main"
  :entry "main.janet"
  :deps [(native-module :static)])

And then I’m going to alter the way that we import the native module:

main.janet
(import /build/set)

(def cities-visited (set/new "NYC" "LA"))

(defn main [&]
  (print (cities-visited "LA")))

Because we’re depending on a native library declared in the same project, we can’t install it before we run this script, so we can’t (import set) anymore. Instead we have to (import /build/set). This is the real actual right way to do this, although I agree that it is very gross.

Now if we try to compile this executable:

jpm -l build
generating executable c source build/main.c from main.janet...
found native build/set.so...
error: cannot marshal <set {"LA" "NYC"}>

We get an error, as expected. So we’re going to need to implement marshaling functions for our abstract set type.

Now, we could just defer to Janet’s existing table marshaling functions, and write these as one liners. But we’re not going to do that for three reasons:

  1. It won’t teach us anything about how to write marshaling functions in general.
  2. It will include redundant data in the image, since we know that all the values in the table are true.
  3. When we unmarshal a table, Janet is going to allocate the memory for that table as a normal garbage-collected value, and we won’t be able to safely copy its contents into the address that Janet allocated for the abstract value itself. We’d have to manually remove the allocated block from the Janet garbage collector’s list of allocated nodes without actually freeing its associated memory, and Janet doesn’t expose any way for us to do that.

In any case, writing custom marshaling functions is very easy:

static void set_marshal(void *data, JanetMarshalContext *ctx) {
  janet_marshal_abstract(ctx, data);
  JanetTable *set = (JanetTable *)data;
  janet_marshal_int(ctx, set->count);
  for (int32_t i = 0; i < set->capacity; i++) {
    Janet element = set->data[i].key;
    if (!janet_checktype(element, JANET_NIL)) {
      janet_marshal_janet(ctx, element);
    }
  }
}

We write down the number of elements in the table, then we write down each element.

Unmarshaling is just the reverse of that:

static void *set_unmarshal(JanetMarshalContext *ctx) {
  JanetTable *set = (JanetTable *)janet_unmarshal_abstract(ctx, sizeof(JanetTable));
  set->gc = (JanetGCObject){0, NULL};
  janet_table_init_raw(set, 0);
  int32_t length = janet_unmarshal_int(ctx);
  for (int32_t i = 0; i < length; i++) {
    janet_table_put(set, janet_unmarshal_janet(ctx), janet_wrap_true());
  }
  return set;
}

Although we do have to remember to zero out the set->gc fields again.

And now we have a pretty good set type. We can even compile and run a native executable that includes a marshaled set in its image:

main.janet
(import /build/set)

(def cities-visited (set/new "NYC" "LA"))

(defn main [&]
  (print (cities-visited "LA")))
jpm -l build && build/main
true

And we are done implementing functions for the abstract type protocol. Which means that we can start to write some functions for ourselves! We’ve done the busy work, and now we can start to have some fun.

For starters, we don’t have any way to change the set. We should start by implementing some basic functionality, like add and remove:

static Janet cfun_add(int32_t argc, Janet *argv) {
  janet_arity(argc, 1, -1);
  JanetTable *set = (JanetTable *)janet_getabstract(argv, 0, &set_type);
  for (int32_t i = 1; i < argc; i++) {
    janet_table_put(set, argv[i], janet_wrap_true());
  }
  return janet_wrap_nil();
}

static Janet cfun_remove(int32_t argc, Janet *argv) {
  janet_arity(argc, 1, -1);
  JanetTable *set = (JanetTable *)janet_getabstract(argv, 0, &set_type);
  for (int32_t i = 1; i < argc; i++) {
    janet_table_remove(set, argv[i]);
  }
  return janet_wrap_nil();
}

And next we should probably implement the rest of the set-like functions, like intersect and subtract.

But you know what? I’m kind of tired of writing C code. And if we’re going to write a full set API here, it would be nice if we could implement some of the higher-level helpers in Janet code.

And of course it’s easy to do this in our little main executable… we can just write helpers. But if we want to make a set library that can be used by other people we’ll need to figure out how to mix and match native modules with pure Janet code.

And there are three ways that we could do that:

  1. Embed Janet source code into our native module and execute it during janet_init.

  2. Compile a Janet image ahead of time, embed the image into our native module, and then execute it during _janet_init.

  3. Write a native Janet module that imports and re-exports the environment of a private native module, then declare-source it in our project.janet.

I think this is the easiest thing to do, but note that we will no longer be able to mix and match an executable with a library written in this way.

Here’s a template for such a module:

project.janet
(declare-project
  :name "set")

(declare-native
  :name "set/native"
  :source ["set.c"])

(declare-source
  :source "init.janet"
  :prefix "set/")

Note that we have to rename the native module so that (import set) will import our pure Janet module instead:

init.janet
(import set/native :as set)

(defn intersection [set1 set2]
  (def new-set (set/new))
  (each element set1
    (if (set2 element)
      (set/add new-set element)))
  new-set)

(import set/native :prefix "" :export true)

Now if we add this as a dependency to another project, we’ll wind up with something that looks like this:

tree jpm_tree
jpm_tree
├── bin
├── lib
│   └── set
│       ├── init.janet
│       ├── native.a
│       ├── native.meta.janet
│       └── native.so
└── man

And when we (import set), it will import our set/init.janet module that re-exports the set/native module.

Okay! That’s just about everything you need to know about Janet’s foreign function interface. We learned how to create an abstract type, and we learned how to call into C code from Janet. But, umm… we didn’t really do anything foreign, did we?

Usually you’ll create abstract types and write native modules in order to interoperate with existing C libraries, like sqlite or libcurl. And if you think about it, we sort of did that, except that our “existing library” was just the Janet C API.

But this allowed us to skip over something very important: we haven’t talked about how to link in code from actual external libraries. And I don’t want you to leave this chapter feeling cheated, so we’re going to do one more thing before we go.

There’s an open-source library called immer that provides persistent, immutable data structures in C++. I’ve never used it before, but I think it looks pretty neat, and it includes a set type. So let’s write bindings for it!

Except… we’re not going to do that together. It’s so similar to what we already did that we’d just be rehashing all of the exact same ground, so instead we’re just going to talk about the differences.

First off, we declare the project like this:

project.janet
(declare-project :name "jimmy")

(declare-native
  :name "jimmy/native"
  :source ["src/jimmy.cpp"]
  :cppflags ["-Iimmer" "-std=c++14"])

(declare-source
  :source [
    "src/set.janet"
    "src/init.janet"
  ]
  :prefix "jimmy")

And then… that’s it. That’s the only difference. Everything else is exactly the same. jpm detects that we’re using C++ by the file extension, and produces static and dynamic native modules that each statically link the immer library that I vendored as a git submodule (which jpm will automatically fetch).

We’re not going to walk through the code together, but the code is out there for you to peruse at your leisure, in case you ever find yourself wanting to interoperate with a C++ API. It might be useful to see the directory structure jpm expects in order for you to be able to distribute a library with nested submodules and native components.

Chapter Ten: Embedding Janet

Okay. In the last chapter we learned how to call C code from Janet. In this chapter, we’re going to learn how to call Janet code from C.

Specifically, we’re going to learn how to embed the Janet interpreter inside a larger app — it doesn’t have to be written in C, as long as it has a C FFI. But we’ll stick with C as our lingua franca.

Oh wait! I forgot that JavaScript was supposed to be our lingua franca. Oh no. Oh no. We just spent a whole chapter writing C code together and you didn’t say anything to me? Did you forget about the report function?

Well, hmm. Maybe we can have two… linguae franca? Lingua francae? Whatever. Maybe we can write a C program that embeds Janet, but call that program from JavaScript via WebAssembly: we’ll still learn how to embed Janet, but in the end we’ll have a program that runs in the browser so other people can actually use it.

Sounds like a good idea for a chapter! Let’s do it.

We actually already saw how to embed Janet into a C program, back in Chapter Seven, when we looked at how jpm produces native executables:

#include <janet.h>
static const unsigned char bytes[] = {215, 0, 205, /* ... */};

const unsigned char *janet_payload_image_embed = bytes;
size_t janet_payload_image_embed_size = sizeof(bytes);

int main(int argc, const char **argv) {

#if defined(JANET_PRF)
    uint8_t hash_key[JANET_HASH_KEY_SIZE + 1];
#ifdef JANET_REDUCED_OS
    char *envvar = NULL;
#else
    char *envvar = getenv("JANET_HASHSEED");
#endif
    if (NULL != envvar) {
        strncpy((char *) hash_key, envvar, sizeof(hash_key) - 1);
    } else if (janet_cryptorand(hash_key, JANET_HASH_KEY_SIZE) != 0) {
        fputs("unable to initialize janet PRF hash function.\n", stderr);
        return 1;
    }
    janet_init_hash_key(hash_key);
#endif

    janet_init();

    /* Get core env */
    JanetTable *env = janet_core_env(NULL);
    JanetTable *lookup = janet_env_lookup(env);
    JanetTable *temptab;
    int handle = janet_gclock();
    /* Unmarshal bytecode */
    Janet marsh_out = janet_unmarshal(
      janet_payload_image_embed,
      janet_payload_image_embed_size,
      0,
      lookup,
      NULL);

    /* Verify the marshalled object is a function */
    if (!janet_checktype(marsh_out, JANET_FUNCTION)) {
        fprintf(stderr, "invalid bytecode image - expected function.");
        return 1;
    }
    JanetFunction *jfunc = janet_unwrap_function(marsh_out);

    /* Check arity */
    janet_arity(argc, jfunc->def->min_arity, jfunc->def->max_arity);

    /* Collect command line arguments */
    JanetArray *args = janet_array(argc);
    for (int i = 0; i < argc; i++) {
        janet_array_push(args, janet_cstringv(argv[i]));
    }

    /* Create enviornment */
    temptab = env;
    janet_table_put(temptab, janet_ckeywordv("args"), janet_wrap_array(args));
    janet_gcroot(janet_wrap_table(temptab));

    /* Unlock GC */
    janet_gcunlock(handle);

    /* Run everything */
    JanetFiber *fiber = janet_fiber(jfunc, 64, argc, argc ? args->data : NULL);
    fiber->env = temptab;
#ifdef JANET_EV
    janet_gcroot(janet_wrap_fiber(fiber));
    janet_schedule(fiber, janet_wrap_nil());
    janet_loop();
    int status = janet_fiber_status(fiber);
    janet_deinit();
    return status;
#else
    Janet out;
    JanetSignal result = janet_continue(fiber, janet_wrap_nil(), &out);
    if (result != JANET_SIGNAL_OK && result != JANET_SIGNAL_EVENT) {
      janet_stacktrace(fiber, out);
      janet_deinit();
      return result;
    }
    janet_deinit();
    return 0;
#endif
}

And if all that we want to do is run some Janet code that we already wrote and compiled ahead of time, then this little snippet is all we need.

But you probably aren’t embedding the whole Janet runtime just so that you can write part of your application logic in a higher-level language. The real reason to embed Janet in your program is so that you can run Janet scripts that you didn’t write at all: plugins, mods, extensions — whatever you want to call them.

There are lots of neat things you can do with an embedded programming language, but since we only have one chapter to talk about this, we’ll have to pick a specific project. I think “programmatic art playground” is as good a genre as any, so we’re going to talk about how to build an app where users can write scripts that draw turtle graphics.

Here, it’s easier if you take a look, so I don’t have to explain it in too much detail: https://toodle.studio. That’s the final product that we’re going to be working towards: users can write scripts that have access to a pre-defined drawing DSL, and our program will execute those scripts asynchronously over time to make little animations.

But in case you are reading this book on paper — which you aren’t; I can tell — or have no patience for whimsical art playgrounds — which you don’t; I can tell — I will briefly summarize the features of our application:

So, to get more technical, we have the following bits of state in our wrapper application (i.e., in JavaScript):

  1. The current image. This can be null, if we haven’t started running any program yet. We’ll need to hold onto this in order to be able to restart the current program.
  2. The next available image. This represents the result of compiling the current script. It can become the “current image” at the press of a button. It will be null if we were unable to compile Marley’s script.
  3. The current environment. This holds all of the runtime state. It will be null if we haven’t started running anything yet.

Then we’ll have the following bits of runtime state, which we will store as entries in the environment of our running program:

  1. An array of turtles, which are actually just fibers.
  2. A number representing how much we should fade out the image over time.

That second part might seem like a weird unimportant detail that someone should probably just leave out of their book, but it’s actually going to be important once we get to the code. Just trust me. I am leaving out a lot of other details — in the actual application you can pause the current program, for example — but these are the only interesting, Janet-related bits of state.

Okay, so: how do we do this?

Let’s start small. Let’s say we have a string of Janet code. How do we run it?

Well, we just need to do exactly what Janet does when we import a file: parse the source, go through each top-level statement’s abstract syntax tree, perform macro expansion on it until we’re all out of macros, call the magic built-in compile function to turn the abstract syntax tree into a function, and then run that function.

But that sounds like a lot of work. We don’t want to do all of that by hand, in C code, even though we could. But fortunately Janet has a helper function that will do all of the hard work for us: run-context.

The signature for run-context is pretty intimidating, because it has a million optional arguments that we could use to override how parsing works or what to do on compilation errors or whatever, but the minimal API is pretty easy to use:

(defn evaluate [user-script]
  (def env (make-env root-env))
  (run-context
    {:env env
     :chunks (chunk-string user-script)
     :on-status (fn [fiber value]
       (printf "got %q (%q)" value (fiber/status fiber)))}))

run-context will execute each top-level form in its own fiber that will catch errors. It takes a few arguments:

Let’s talk about :chunks for a minute. run-context doesn’t take a string directly, but instead takes a callback that it will keep invoking until it returns nil. That callback is responsible for writing bytes into a buffer that it takes as an argument. It also takes a Janet parser, which we can just ignore.

There isn’t a default way to get a “chunking” function for a string, so I wrote a short helper:

(defn chunk-string [str]
  (var unread true)
  (fn [buf _]
    (when unread
      (set unread false)
      (buffer/blit buf str))))

This might seem like a weird API, but typically Janet expects to be reading from a file, or from a REPL, where not all “chunks” are available up front.

So that’s run-context, in its simplest form. Now that we understand it, let’s test it out:

eval.janet
(defn evaluate [user-script]
  (def env (make-env root-env))
  (run-context
    {:env env
     :chunks (chunk-string user-script)
     :on-status (fn [fiber value]
       (printf "> %q (%q)" value (fiber/status fiber)))}))

(evaluate `
(+ 1 2)
foo
(print "done")
(pp (yield 10))
)
(error "oh no")
`)
janet eval.janet
> 3 (:dead)
<anonymous>:2:1: compile error: unknown symbol foo
done
> nil (:dead)
> 10 (:pending)
nil
> nil (:dead)
<anonymous>:5:1: parse error: unexpected closing delimiter )
> "oh no" (:error)

Let’s notice a few things about this:

That last thing is probably not what we want most of the time, because a compilation error might change the behavior of the rest of our code in unpredictable ways (if, for example, a function that was supposed to be shadowed wasn’t).

We can fix that by adding a couple more callbacks to our run-context call:

eval.janet
(defn evaluate [user-script]
  (def env (make-env root-env))

  (defn on-parse-error [parser where]
    (bad-parse parser where)
    (set (env :exit) true))

  (defn on-compile-error [msg fiber where line col]
    (bad-compile msg fiber where line col)
    (set (env :exit) true))

  (run-context
    {:env env
     :chunks (chunk-string user-script)
     :on-status (fn [fiber value]
       (printf "> %q (%q)" value (fiber/status fiber)))
     :on-parse-error on-parse-error
     :on-compile-error on-compile-error
     }))

(evaluate `
(+ 1 2)
foo
(print "done")
(pp (yield 10))
)
(error "oh no")
`)
janet eval.janet
> 3 (:dead)
<anonymous>:2:1: compile error: unknown symbol foo
<anonymous>:5:1: parse error: unexpected closing delimiter )

Setting (env :exit) to true is how we signal to run-context that we don’t want it to keep going — although as you can see, it might not stop parsing immediately. But that’s pretty harmless.

bad-parse and bad-compile are functions that print out those stack traces; they are the default values for the on-parse-error and on-compile-error callbacks.

So this is all we need to write if we just want to run the user code, but remember that this code is going to have side effects — specifically, in our case, this code might create turtles.

We’ll want to be able to inspect these turtles later on, so we’re going to return the final environment — but only if there was no error.

eval.janet
(defn capture-stderr [f & args]
  (def buf @"")
  (with-dyns [*err* buf *err-color* false]
    (f ;args))
  (string/slice buf 0 -2))

(defn evaluate [user-script]
  (def env (make-env root-env))

  (var err nil)
  (var err-fiber nil)

  (defn on-parse-error [parser where]
    (set err (capture-stderr bad-parse parser where))
    (set (env :exit) true))

  (defn on-compile-error [msg fiber where line col]
    (set err (capture-stderr bad-compile msg nil where line col))
    (set err-fiber fiber)
    (set (env :exit) true))

  (run-context
    {:env env
     :chunks (chunk-string user-script)
     :on-status (fn [fiber value]
       (printf "> %q (%q)" value (fiber/status fiber)))
     :on-parse-error on-parse-error
     :on-compile-error on-compile-error
     })

  (if (nil? err)
    env
    (if (nil? err-fiber)
      (error err)
      (propagate err err-fiber))))

And we’re actually done now! That is a fully-fledged Janet evaluator.

Note that we’re still calling bad-parse and bad-compile, which print to (dyn *err*) — typically stderr — but we redirect that to a buffer so that we can raise it later, and then strip off the trailing newline.

Now that we have our working run-context-based evaluator, let’s talk about how to actually call this function from our application code.

On the one hand, we have this Janet function, which takes a Janet string. On the other hand, we have… an HTML <textarea> or something, from which we can extract a JavaScript string.

How do you convert a JavaScript string into a Janet string?

Well… it’s a little convoluted. We’re going to use something called Emscripten to compile native code into WebAssembly. Emscripten makes it really easy to interoperate between JavaScript and C++ code, so we’re going to take advantage of that power and write our wrapper program in C++, not C. Then we’ll use Emscripten to automatically translate our JavaScript string into a C++ string, and then convert that to a C string with the .c_str() method, and then convert that C string to a Janet string with janet_cstringv. Like I said: convoluted. This is the price we pay for writing a program that runs in the browser; if we were writing a native application, this would probably be a lot more straightforward.

But alright, assuming that we have the input as a Janet string… how do we call it from C?

There are a few steps:

  1. Create a Janet image that contains our evaluate function, using janet -c.
  2. Embed that image somehow in our final program — we could stick it in a literal byte array, read it from a separate file, whatever.
  3. Unmarshal the image to get an environment, exactly like we saw in the compiled program example.
  4. Look up the function we want to call in the environment, and hold onto it.
  5. Actually call the function using janet_pcall.

The first few parts are easy:

static JanetFunction *janetfn_evaluate;

int main() {
  janet_init();

  Janet environment = janet_unmarshal(...);

  JanetTable *env_table = janet_unwrap_table(environment);
  Janet evaluate;
  janet_resolve(env_table, janet_csymbol("evaluate"), &evaluate);
  janet_gcroot(evaluate);

  janetfn_evaluate = janet_unwrap_function(evaluate);
}

Note that we also add it as a garbage collector root with the janet_gcroot function. This is very important!

Because later on, when we call this function:

bool call_fn(JanetFunction *fn, int argc, const Janet *argv, Janet *out) {
  JanetFiber *fiber = NULL;
  if (janet_pcall(fn, argc, argv, out, &fiber) == JANET_SIGNAL_OK) {
    return true;
  } else {
    janet_stacktrace(fiber, *out);
    return false;
  }
}

struct EvaluationResult {
  bool is_error;
  string error;
  uintptr_t environment;
};

EvaluationResult toodle_evaluate(string source) {
  Janet environment;
  const Janet args[1] = { janet_cstringv(source.c_str()) };
  if (!call_fn(janetfn_evaluate, 1, args, &environment)) {
    return (EvaluationResult) {
      .is_error = true,
      .error = "evaluation error",
      .environment = 0,
    };
  }

  janet_gcroot(environment);
  return (EvaluationResult) {
   .is_error = false,
   .error = "",
   .environment = reinterpret_cast<uintptr_t>(janet_unwrap_table(environment)),
  };
}

We’ll have to reference janetfn_evaluate, and we’ll be very sad if it has been garbage collected in the interim. Which it will, by default — there’s no reason for Janet to keep this unmarshaled value around.

Now note that, unfortunately, Emscripten doesn’t let us return structs containing pointers to JavaScript, so we’ll have to convert it to a number first (specifically, a uintptr_t). And because C++ lacks variant types, we’ll have to put it in a sort of dumb-looking struct with an explicit tag.

Now, this is all fine. This works.

But remember the constraints of our program: we want to be able to start and re-start this program. But an environment is a living, breathing thing — it will contain fibers and reference types and all sorts of things that we can’t just “restart.”

Hmm. If we knew that all we had were immutable values, we could just hold onto the original… but there are mutable values all over the place. Our turtles are fibers, and fibers might have arbitrary amounts of internal, mutable state that we can’t possibly know about.

So since our program contains mutable state, we won’t be able to run it. Instead, we’ll have to run a clone of it, and hold on to a pristine copy of the original.

But how do we clone the environment? It’s not sufficient to copy the environment table itself — we’d have to make a deep copy of the table, plus all the data structures it references, and all of the fibers inside of it…

There’s no one step “deep copy everything” function in Janet, but there is a way to do this: we can marshal the environment to a buffer, freezing it in carbonite, and from this static image of our program we can instantiate as many living copies as we like.

In fact, huh. We evaluate a script, produce an environment, and then marshal that environment into an image, which we will then resume later… does any of that sound familiar?

Exactly: we learned about this all the way back in Chapter Two. I’ve just described imagination. Er, compilation, I mean.

So we aren’t really evaluating Marley’s script (although we are). We’re really compiling Marley’s script into an image, that we can then breathe life into as many times as we like.

With this insight in mind, let’s modify our function slightly:

struct CompilationResult {
  bool is_error;
  string error;
  uintptr_t image;
};

CompilationResult toodle_compile(string source) {
  Janet environment;
  const Janet args[1] = { janet_cstringv(source.c_str()) };
  if (!call_fn(janetfn_evaluate, 1, args, &environment)) {
    return (CompilationResult) {
      .is_error = true,
      .error = "compilation error",
      .image = 0,
    };
  }

  JanetTable *reverse_lookup = env_lookup_table(janet_core_env(NULL), "make-image-dict");
  JanetBuffer *image = janet_buffer(2 << 8);
  janet_marshal(image, environment, reverse_lookup, 0);

  janet_gcroot(janet_wrap_buffer(image));
  return (CompilationResult) {
   .is_error = false,
   .error = "",
   .image = reinterpret_cast<uintptr_t>(image),
  };
}

Instead of returning an environment, we now return an image of the environment.

Great! Which immediately tells us what we need to do next: we’ll need to write a function that takes an image and returns an actual environment. And then we’ll need another function that takes that environment and does something with it — advances the program; scoots the turtles forward.

Whenever Marley starts a new program or restarts the current program, we’ll unmarshal the corresponding image. And then we’ll call the advance function on every frame.

The “start” function is not very interesting; we’ve already seen how to unmarshal images:

uintptr_t toodle_start(uintptr_t image_ptr) {
  JanetBuffer *image = reinterpret_cast<JanetBuffer *>(image_ptr);
  JanetTable *lookup = env_lookup_table(janet_core_env(NULL), "load-image-dict");
  Janet environment = janet_unmarshal(image->data, image->count, 0, lookup, NULL);

  janet_gcroot(environment);
  return reinterpret_cast<uintptr_t>(janet_unwrap_table(environment));
}

But the “run” function is quite interesting.

For starters, we’ll have to call a Janet function that actually knows the internal details of our environment and what to do with it. I’ll call it janetfn_run, and we’ll assume that we extracted it in main() exactly as we did janetfn_evaluate.

Now this function is going to return two things: it will return a list of lines to draw, and it will return a color that will determine how the image fades out over time. This means we’ll really call two functions: first janetfn_run, and then janetfn_get_bg.

And therein lies the interesting bit of all of this!

RunResult toodle_run(uintptr_t environment_ptr) {
  JanetTable *environment = reinterpret_cast<JanetTable *>(environment_ptr);

  Janet run_result;
  Janet bg;
  const Janet args[1] = { janet_wrap_table(environment) };
  if (!call_fn(janetfn_run, 1, args, &run_result)) {
    return run_error("evaluation error");
  }
  janet_gcroot(run_result);
  if (!call_fn(janetfn_get_bg, 1, args, &bg)) {
    return run_error("evaluation error");
  }
  janet_gcunroot(run_result);

  JanetArray *lines = janet_unwrap_array(run_result);
  int32_t count = lines->count;

  auto line_vec = std::vector<Line>();
  // convert the run_result into a C++ vector...

  return (RunResult) {
   .is_error = false,
   .error = "",
   .lines = line_vec,
   .background = unsafe_parse_color(janet_unwrap_tuple(bg)),
  };
}

Notice that we return the run_result value from Janet to C. But then we jump back into the Janet runtime in order to extract the background color. But! We have to add the lines-to-draw value as a GC root before we give control back to the Janet VM. We don’t want the garbage collector to have a chance to collect that value before we’re done with it!

In fact any time we give control to the Janet VM with janet_pcall or another function like that, we’re giving the garbage collector a chance to run, so we have to make sure that we set up the GC roots for any Janet value that we have a reference to in C code. Once we’re out of the VM for good, we can remove the root, because the Janet GC won’t run unless we either run some Janet code or explicitly trigger a collection.

And now we are almost done. But we’re missing something very important, and we can’t leave the chapter until we fix it: when we created our images and environments, we added them as janet_gcroots. But we never called janet_gcunroot on them! Which means we have a memory leak.

In order to plug it, we’ll have to add four more simple functions:

void retain_environment(uintptr_t environment_ptr) {
  janet_gcroot(janet_wrap_table(reinterpret_cast<JanetTable *>(environment_ptr)));
}
void release_environment(uintptr_t environment_ptr) {
  janet_gcunroot(janet_wrap_table(reinterpret_cast<JanetTable *>(environment_ptr)));
}

void retain_image(uintptr_t image_ptr) {
  janet_gcroot(janet_wrap_buffer(reinterpret_cast<JanetBuffer *>(image_ptr)));
}
void release_image(uintptr_t image_ptr) {
  janet_gcunroot(janet_wrap_buffer(reinterpret_cast<JanetBuffer *>(image_ptr)));
}

I call these functions retain and release, because we’re going to treat Janet values as if they are reference-counted, which they basically are. The reference counts aren’t intrusive like you might be used to — when we “retain” a value, we’re really adding it to a list, and when we “release” a value, we’re removing it from the same list — but still, values can appear in the Janet root list multiple times, and janet_gcunroot will only remove one entry for the corresponding value.

So: how do we use these?

Well, for every image returned from toodle_compile, we’ll need to eventually call release_image. And for every image returned from toodle_start, we’ll need to call release_environment.

But! Remember that our program actually needs to hold onto two images: the image of the currently-running program (so that we can restart it), and the image of the “next” program that we’ve successfully compiled (so that we can switch over to it without having to re-compile the user’s script). And these values might be the same value at many points in time! So we’ll also need to call retain_image when we mark a current image as the “active” image, to ensure that it doesn’t get garbage collected when it is no longer the “next” image. Which means that we’ll need to call release_image one more time, on the previous “active” image, before we retain the new one.

I won’t go too much into reference-counted memory management in this book, but it’s essentially a matter of balancing parentheses. Whenever we create or retain a value, we need to remember to release it later. And if we ever create a new reference to a value, we have to remember to retain it, and then to release it, once the reference changes or goes out of scope.

Our program has two variables that can reference Janet values:

let potentialNextImage: Image | null;
let currentImage: Image | null;

These values might be the same, or they might be different. But whenever we say currentImage = potentialNextImage — whenever we promote a newly compiled image to be the “current” program — we’re essentially taking another reference to the same Janet value. So we want to retain that image (and release the previous image).

Okay. Now that we’ve fixed the leaks, we’re basically done. But there’s one final detail of the implementation that’s worth mentioning.

We have to parse the returned “list of lines” into a C++ struct, which Emscripten will automatically translate into a JavaScript object for us. But we’re parsing values that are produced, in part, by a script that Marley wrote. And Marley, as you know, is not to be trusted: even though a turtle is supposed to yield lines, Marley could have written turtles that misbehave, and yield arbitrarily crazy values.

So in order to parse the results of the fiber invocations, we need to validate the values. And it’s going to be much, much easier to do that validation in Janet than it would be to do it from our C++ wrapper. So before we return anything into C++, we have to validate that all of the data we’re going to return is in the format that our C++ code expects it. And then in our C++ wrapper, we blindly trust that we have the correct shape of data.

We could of course do the validation in C++ instead, and I think it even feels more correct to do that: we won’t need to rely on careful coordination between the Janet validator and the C++ parser that way. But it’s a trade-off, and it’s so much easier to write the validation logic in Janet that I would rather just be extra careful about keeping them in sync.

The rest of our program is, well, the actual application — the JavaScript UI, the DSL for declaring turtles, the Emscripten bindings to allow us to speak C++ from JavaScript, the 3D turtle logo with eyes that track the mouse, etc. The full code is available online if you’re curious about the details, but we won’t go over much more of it.

But there’s still one more interesting bit to talk about.

It concerns the turtle DSL.

Let’s consider this very simple program:

(var hue (/ 2 6))
(toodle {:width 3 :speed 0}
  (set (self :color) (hsv hue 1 1))
  (+= hue 0.001)
  (turn 0.08)
  (+= (self :speed) 0.01))

This program creates a single turtle that draws an outward spiral.

But that is not actually how I want to write that program. I’d rather write it like this instead:

(var hue (/ 2 6))
(toodle {:width 3 :speed 0}
  (set self.color (hsv hue 1 1))
  (+= hue 0.001)
  (turn 0.08)
  (+= self.speed 0.01))

Look at that self.field notation. That isn’t Janet! What’s up with that?

Well, there’s one more argument to run-context: :expander.

:expander is a function that runs on every top-level form that takes the abstract syntax tree and returns a new one. We can use it to, essentially, wrap every top-level form in a custom macro.

And that dot syntax? That’s just a macro that searches through the abstract syntax tree and rewrites symbols with dots in them, like foo.bar into (foo :bar) instead.

That’s a pretty mild extension, but we could use this feature to create arbitrary Janet dialects, if we wanted to. We could add infix operators, or special syntax that doesn’t need to exist within a macro call.

In fact, we could even replace the parser altogether, and design a language with whitespace-sensitive indentation that parses into normal Janet tuples. We could re-use the Janet compiler and runtime with a completely custom syntax, if we wanted to.

But we’re not going to do that in this book.

This book is just about done talking about embedding Janet. But this book would like to talk about one last detail before we bring the chapter to a close: what happens if Marley writes a function with an infinite loop?

Sadly, the answer is that her entire browser tab will freeze.

But in general, there is a function called janet_interpreter_interrupt, which will, umm, interrupt the interpreter. But of course, we need to call it from a separate thread: if the current thread is spinning in an infinite loop, there’s no way that we’ll be able to sneak a janet_interpreter_interrupt in there.

Sadly implementing this in the browser is so difficult that I have to leave it as an exercise for the reader. You can, perhaps, create shared WebAssembly memory that you call into from a web worker… or perhaps you cannot. I could not, at least, in time to satisfy this book’s publisher. Who is me. It’s self-published. But I wanted to release this book instead of fighting with asynchronous browser APIs or undocumented Emscripten features. I’m sure you can understand.

So just… don’t write infinite loops.

Chapter Eleven: Testing and Debugging

I know that you’ve been looking forward to this chapter the whole book.

Sure, compile-time metaprogramming or whatever is fine, but when do we get to talk about automated test suites? You haven’t even thought about skipping this one.

I hope we can forego the spiel about the importance of testing — we’re both adults here, and I’m going to assume that if you’ve ever worked on a codebase in a dynamically typed language for longer than six weeks, you understand the value of a comprehensive automated test suite and an ergonomic testing framework.

But we actually don’t need any test framework at all to start writing tests in Janet. jpm comes with a test subcommand that just works out of the box: by default, jpm test will recursively search a directory called test/ for any .janet files, and then it will compile and run them, and check if any of them exit non-zero.

For example:

test/math.janet
(assert (= (+ 2 2) 3))
jpm test
running test/math.janet ...
error: assert failure in (= (+ 2 2) 3)
  in _thunk [test/math.janet] (tailcall) on line 1, column 1
non-zero exit code in test/math.janet: 1
Failing test scripts: 1

That’s all it takes to write a test!

Or, well, we’ll probably want to test something that we actually wrote:

src/math.janet
(defn add [x y] (+ x y))
test/math.janet
(use /src/math)

(assert (= (add 2 2) 4))

/src/math is a path relative to the current working directory. You don’t want to use working-directory-relative imports in actual library code, because you have no idea what the current working directory will be when your code runs. But for test code, it works great, and means that we can import using the same paths from all of our scripts.

For very simple programs, a few assertions in a file like this is probably sufficient. But I don’t usually write tests for very simple programs. I usually write tests for complex programs, and I don’t think that this is a very nice way to test complex programs.

For starters, this type of test stops as soon as we reach the first failing assertion in each file. But sometimes it’s useful to see all failing tests, not just the first one. “Run a script and look at the exit code” doesn’t leave a lot of room for the sort of granular error-handling that would be necessary to continue after an error, and it also doesn’t give us any nice way to group multiple assertions within a single file, or easily select individual tests to run without re-running the whole suite.

To get those nice features, we’ll have to bring in a test framework. And we have a few choices on that front, but this chapter is going to focus on one test framework in particular.

It’s called Judge, and, full disclosure, I am the author of Judge, so I am definitely biased. But I am the author of Judge because automated testing is a weirdly strong passion of mine, and I believe that Judge’s approach to testing is a material improvement over traditional assertion-based testing.

Judge is a framework for writing inline snapshot tests. I’m going to assume that you’ve never heard of inline snapshot testing before, because the technique is not extremely well-known, but lots of test frameworks in lots of languages support it. But it’s often an afterthought, or a bonus feature. In Judge, it is the only feature.

Inline snapshot testing is basically like running a repl directly in your source code. Except it’s a persistent repl that you can re-play in the future, or even share with other people.

A simple port of our test into Judge might look like this:

test/math.janet
(use /src/math)
(use judge)

(test (add 1 2))

Notice that there’s no, umm, assertion there. There’s an expression to test, but it says nothing about what we want it to be. Just like if we typed this at the repl!

When we run the test, Judge will fill in the answer — directly in our actual source code.

(use /src/math)
(use judge)

(test (add 1 2) 3)

And then, the next time we run this test, Judge will only tell us if the output changes. If during some delicate refactor we accidentally break the function:

(defn add [a b]
  (+ a a)) # whoops

Then when we run tests again, Judge will let us know:

judge
running test: test/math.janet:4:1
- (test (add 1 2) 3)
+ (test (add 1 2) 2)
0 passed 1 failed 0 skipped 0 unreachable

Now, this might seem really weird to you at first. It might seem like we’re assuming that the code is correct the first time we run it, and that we’re immortalizing whatever random value it happens to give us for the rest of time. But we’re not: there’s a still a human in the loop, and it is a loop. If the output isn’t right, we’ll notice, and we’ll fix the bug, and we’ll run it again.

But what if we don’t notice? What if we erroneously accept bad output when Judge prompts us, and then don’t notice when we’re staging our commit, and then don’t notice during code review either? Well, then, yes. We’d have a bug. But. I mean. Come on.

Now, I am committing a pedagogical sin here, by showing you examples of tests that no sane human would ever write.

So let’s look at something slightly less trivial:

(test (sorted-by - [10 3 1 28 4]))

sorted-by is a built-in Janet function, but let’s pretend that we wrote it and we want to test it. We could, of course, write an assertion about the output. But sorting this list by hand would probably take me a few seconds, and I can check if the output looks right almost instantly:

- (test (sorted-by - [10 3 1 28 4]))
+ (test (sorted-by - [10 3 1 28 4]) @[28 10 4 3 1])

Because it’s so cheap and easy to write tests this way, I find that I write more of them. And because writing tests this way is easier and cheaper than jumping over to a repl, writing tests is the default way that I engage with my programs.

An important part of the ergonomics here is that Judge tests can exist inside your regular source files. We probably wouldn’t write a test like this in the test/ subdirectory — we’d probably write it directly in our source:

src/math.janet
(use judge)

(defn add [x y] (+ x y))

(test (add 1 2) 3)

Not only is this more convenient — you don’t have to tab to a different file — but it also makes the code easier to read. The tests act as living, automatically-updating documentation for the behavior of the code.

Is this “test-driven development?” I have no idea. To me it’s just regular development: instead of writing tests before I wrote code, or writing tests after I write code, I write tests as I write code, trying things out in the repl that is my regular source files.

But this is an unusual workflow. I am very accustomed to it from my experience with OCaml, but a more common way to program in languages with lots of parentheses is to start a long-running repl server that you can use to talk to your program interactively. You can do this in Janet, using Spork’s netrepl module — but by using source files as your repl, you get persistence over time (because you can run your tests later) and over space (because you can share them with other people) for free. Traditional repls are single player affairs, but inline snapshot testing is multiplayer. Plus the kool-aid is delicious.

Alright, that’s enough about testing for now. Let’s move on to debugging.

One way that I like to debug things is an ancient, hallowed technique called “printf debugging.” Printf debugging is great, and don’t let anyone tell you differently.

Judge actually makes printf debugging even nicer, with a macro called test-stdout. test-stdout runs an expression and shows you what it prints. Like this:

(test-stdout (print "hello") `
  hello
`)

On its own, this probably seems really silly. But you can do some wonderful things with it:

(deftest "testing output"
  (def data [[0 1] [1 3] [2 5]])
  (test-stdout (print-table ["x" "y"] data) `
    ╭───┬───╮
    │ x │ y │
    ├───┼───┤
    │ 0 │ 1 │
    │ 1 │ 3 │
    │ 2 │ 5 │
    ╰───┴───╯
  `))

The way that you visualize test expressions can have a big difference on how effective your tests are. Of course Judge doesn’t care if you print that as a nicely formatted table or not — it’ll just make sure that it stays the same over time — but people care. Tests are arguments about the behavior of code, and arguments should be convincing. And it’s a lot easier to make sense of a graph or a chart or an image than it is to wade through a bunch of assertions or raw data.

But test-stdout is more powerful than just pretty-printing a string and checking the result. It captures stdout dynamically, so you can use test-stdout with printf debugging to get a better understanding of your code. For example:

(defn slowsort [list]
  (case (length list)
    0 list
    1 list
    2 (let [[a b] list] [(max a b) (min a b)])
    (do
      (def pivot-index (math/floor (/ (length list) 2)))
      (def pivot (list pivot-index))
      (def smalls (filter |(< $ pivot) list))
      (def bigs (filter |(> $ pivot) list))
      [;(slowsort smalls) pivot ;(slowsort bigs)])))

That’s a not-completely-trivial function. Does it work? Let’s find out.

(use judge)

(test (slowsort [3 10 2 -5]) [-5 2 10 3])

Ah. No. That doesn’t look right.

I know you can already see the problem, but let’s pretend that we are mystified by this, and the only way we can get to the bottom of it is to sprinkle some printfs over the code:

(defn slowsort [list]
  (printf "slowsort %q" list)
  (case (length list)
    0 list
    1 list
    2 (let [[a b] list] [(max a b) (min a b)])
    (do
      (def pivot-index (math/floor (/ (length list) 2)))
      (def pivot (list pivot-index))
      (def smalls (filter |(< $ pivot) list))
      (def bigs (filter |(> $ pivot) list))
      (printf "  %q %q %q" smalls pivot bigs)
      [;(slowsort smalls) pivot ;(slowsort bigs)])))

And then change our test expression to test-stdout:

(test-stdout (slowsort [3 10 2 -5]) `
  slowsort (3 10 2 -5)
    @[-5] 2 @[3 10]
  slowsort @[-5]
  slowsort @[3 10]
` [-5 2 10 3])

Aha! We realize from looking at the execution trace that we flipped the output in the two-element case. We fix our test, and move on.

Of course we don’t have to use Judge to do any of this — we could just run it by hand and look at the output. That’s a completely reasonable thing to do, but by running code with Judge we get to co-locate the expressions and the output they produce, which is easier for my brain to reckon with. We can also keep all of our existing editor tooling for running the test under the cursor or running all the tests in the current file, instead of having to modify our (main) entry point somehow.

I think that that editor tooling is pretty important if you want Judge to replace the repl for you altogether, but unfortunately Judge does not have very good support in major editors. But it is one of the simpler integrations to write for your favorite editor: just run judge file.janet:line:col, passing it the position of your cursor, and Judge will take it from there.

Okay, so I guess we didn’t really switch gears before. We’re still talking about testing, aren’t we. But let’s actually move on to debugging now. Even though they are intimately related, and often the easiest way to debug something is to write more tests for it, and—

Okay fine. There is more to life than writing automated tests.

So Janet actually includes an interactive step-through debugger. You can bring up at any point, and you can have Janet automatically bring it up on uncaught errors. Let’s take a look:

debug.janet
(defn inc [x]
  (+ x 1))

(defn main [&]
  (print (inc "foo")))

If we just run that, we’ll get an error:

janet debug.janet
error: could not find method :+ for "foo"
  in inc [debug.janet] on line 2, column 3
  in main [debug.janet] (tailcall) on line 5, column 10
  in run-main [boot.janet] on line 3795, column 16
  in cli-main [boot.janet] on line 3940, column 17

And, you know, this happens to be a pretty clear error message. But let’s pretend, for example’s sake, that we are mystified, and cannot understand what has gone wrong.

Enter the Janet debugger:

janet -d debug.janet
error: could not find method :+ for "foo"
  in inc [debug.janet] on line 2, column 3
  in main [debug.janet] (tailcall) on line 5, column 10
entering debug[1] - (quit) to exit
debug[1]:1:>

Now we’re in a prompt like a repl that we can use to poke around. First off, let’s try to figure out where we are.

debug[1]:1:> (.stack)

error: could not find method :+ for "foo"
  in inc [debug.janet] on line 2, column 3
  in main [debug.janet] (tailcall) on line 5, column 10

nil

Hmm, okay. That tells us where the error started, although we sort of remembered that from the original message. But what was at debug.janet:2:3 again? It would be nice to see the actual expression that raised.

debug[1]:2:> (.source)

(defn inc [x]
  (+ x 1))

(defn main [&]
  (print (inc "foo")))

nil

Unfortunately Janet’s debugger doesn’t have a way to combine the stack frames with the actual source. It would be nice to highlight the current stack frame in context. But okay — the information is there.

So it seems like maybe the problem has to do with this x. What is x?

debug[1]:3:> x
<anonymous>:3:1: compile error: unknown symbol x

Ah, hmm. This might be surprising, but the debugger repl is not actually running in the context of the place where we’re currently paused. Instead, to inspect the local environment, we have to use:

debug[1]:4:> (.locals)
@{inc <function inc> x "foo"}

Which gives us a table of all the local bindings.

(.locals) is short for (.locals 0). You can also call (.locals 1) to get the locals in the stack frame below this one.

That’s about the extent of your debugging abilities, if you’re stopped on an error like this.

However there are other ways to conjure the debugger. Let’s consider a slightly more complicated program:

step.janet
(defn enemy-of-enemy [name people]
  (def subject (people name))
  (def nemesis (people (subject :nemesis)))
  (people (nemesis :nemesis)))

(defn main [&]
  (def people {"ian" {:age "young at heart"
                      :nemesis "jeffrey"}
               "jeffrey" {:age 7.5
                          :nemesis "sarah"}})
  (print (enemy-of-enemy "ian" people)))

Let’s try running it:

janet step.janet

Ah. Just… a blank line. Huh. That’s mysterious. There’s no error, but still… it would be nice if we could poke around inside the runtime environment of our program to see what we did wrong.

Which we can do by adding a call to (debug):

(defn enemy-of-enemy [name people]
  (debug)
  (def subject (people name))
  (def nemesis (people (subject :nemesis)))
  (people (nemesis :nemesis)))
janet step.janet
debug:
  in enemy-of-enemy [step.janet] on line 2, column 3
  in main [step.janet] (tailcall) on line 12, column 10
  in run-main [boot.janet] on line 3795, column 16
  in cli-main [boot.janet] on line 3940, column 17

Ah, umm, whoops. By default the (debug) function just raises a signal, and there’s nothing handling that signal unless we start our program with janet -d:

janet -d step.janet
debug:
  in enemy-of-enemy [step.janet] on line 2, column 3
  in main [step.janet] (tailcall) on line 12, column 10
entering debug[1] - (quit) to exit
debug[1]:1:>

That’s better. Now let’s see what went wrong:

debug[1]:1:> (.locals)
@{enemy-of-enemy <function enemy-of-enemy> name "ian" people {"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}}}

Well, yes, the function just started. Let’s see if maybe the subject lookup failed, by stepping over the next instruction:

debug[1]:2:> (.step)
nil
debug[1]:3:> ((.locals) 'subject)
nil

Ah. Well. You might conclude that the subject lookup didn’t work, but actually…

Actually we did not step over the line (def subject (people name)). We actually stepped through a single virtual machine instruction.

We haven’t talked much about this yet, but we aren’t actually interpreting Janet code at runtime. We’re actually interpreting a precompiled bytecode.

We can look at the code we’re “actually” running with .ppasm:

debug[1]:4:> (.ppasm)

  signal:
  status:     debug
  function:   enemy-of-enemy [step.janet]
  constants:  @[:nemesis]
  slots:      @["ian" {"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}} <function enemy-of-enemy> nil nil nil nil nil nil]

   lds 2                # line 1, column 1
   ldn 4                # line 2, column 3
   sig 3 4 2
 > push 0               # line 3, column 16
   call 4 1
   movn 5 4             # line 3, column 3
   ldc 6 0              # line 4, column 24
   push 6
   call 6 5
   push 6               # line 4, column 16
   call 7 1
   movn 6 7             # line 4, column 3
   ldc 8 0              # line 5, column 11
   push 8
   call 8 6
   push 8               # line 5, column 3
   tcall 1

nil

And that shows us where we actually are. We’re about to execute line 3, column 16 — that would be the (people name) expression.

How do these instructions correspond to (people name)? Well, push 0 pushes the name argument onto the stack, and call 4 1 invokes the people “function” with the current arguments on the stack, storing the result in “slot 4”.

How did I know that? Well, a big part of it is looking at the “slots” argument as a way to translate all those numbers into values:

["ian" {"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}} <function enemy-of-enemy> nil nil nil nil nil nil]

From this we can guess that slot 0 is the local variable name and slot 1 is the local variable people. Slot 2 is a reference to the function itself (to allow recursion), and the rest of the slots are currently nil — they haven’t been filled in yet.

So we can rewrite this virtual machine code into something slightly more readable:

push name                # line 3, column 16
slots[4] = call people
slots[5] = slots[4]      # line 3, column 3

From this we can guess that slot 4 is a scratch space for the result of the (people name) call, and slot 5 corresponds to the subject local variable.

Let’s step a few more times until we’ve run those instructions:

debug[1]:5:> (.step)
nil
debug[1]:6:> (.step)
nil
debug[1]:7:> (.step)
nil
debug[1]:8:> (.ppasm)

  signal:
  status:     debug
  function:   enemy-of-enemy [step.janet]
  constants:  @[:nemesis]
  slots:      @["ian" {"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}} <function enemy-of-enemy> nil {:age "young at heart" :nemesis "jeffrey"} {:age "young at heart" :nemesis "jeffrey"} nil nil nil]

   lds 2                # line 1, column 1
   ldn 4                # line 2, column 3
   sig 3 4 2
   push 0               # line 3, column 16
   call 4 1
   movn 5 4             # line 3, column 3
 > ldc 6 0              # line 4, column 24
   push 6
   call 6 5
   push 6               # line 4, column 16
   call 7 1
   movn 6 7             # line 4, column 3
   ldc 8 0              # line 5, column 11
   push 8
   call 8 6
   push 8               # line 5, column 3
   tcall 1

nil

Great. We just ran the movn instruction. And now?

debug[1]:9:> ((.locals) 'subject)
{:age "young at heart" :nemesis "jeffrey"}

Great! Looks like it worked.

Stepping one instruction at a time isn’t exactly the best way to debug, though. And we can step multiple instructions at a time, by using (.step count). But I don’t really want to count instructions.

Fortunately, the .ppasm output has line and column numbers, and we can use those to set breakpoints:

debug[1]:10:> (debug/break "step.janet" 5 11)
nil

And then we can continue until the next breakpoint:

debug[1]:11:> (.next)
nil
debug[1]:12:> (.ppasm)

  signal:
  status:     debug
  function:   enemy-of-enemy [step.janet]
  constants:  @[:nemesis]
  slots:      @["ian" {"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}} <function enemy-of-enemy> nil {:age "young at heart" :nemesis "jeffrey"} {:age "young at heart" :nemesis "jeffrey"} {:age 7.5 :nemesis "sarah"} {:age 7.5 :nemesis "sarah"} nil]

   lds 2                # line 1, column 1
   ldn 4                # line 2, column 3
   sig 3 4 2
   push 0               # line 3, column 16
   call 4 1
   movn 5 4             # line 3, column 3
   ldc 6 0              # line 4, column 24
   push 6
   call 6 5
   push 6               # line 4, column 16
   call 7 1
   movn 6 7             # line 4, column 3
*> ldc 8 0              # line 5, column 11
   push 8
   call 8 6
   push 8               # line 5, column 3
   tcall 1

nil

Great! (The asterisk is telling us that we still have a breakpoint set there.)

So if you remember our original program, we just ran the line def nemesis:

(defn enemy-of-enemy [name people]
  (debug)
  (def subject (people name))
  (def nemesis (people (subject :nemesis)))
  # we are here
  (people (nemesis :nemesis)))

And now we’re going to do one more lookup in the people struct and return the result. At this point we have all the information we need to figure out why this doesn’t work:

debug[1]:13:> (def nemesis ((.locals) 'nemesis))
{:age 7.5 :nemesis "sarah"}
debug[1]:14:> (def people ((.locals) 'people))
{"ian" {:age "young at heart" :nemesis "jeffrey"} "jeffrey" {:age 7.5 :nemesis "sarah"}}
debug[1]:15:> (nemesis :nemesis)
"sarah"
debug[1]:16:> (people "sarah")
nil

Aha. That would do it, wouldn’t it?

Now, deciphering Janet bytecode was, umm, clearly overkill for such a trivial bug. But this runtime step-through debugging can be quite useful in hairier situations. It’s not the most ergonomic experience, though, so it’s probably not something that you should reach for unless you’re pretty well stumped already.

I don’t use the debugger much, personally. Janet’s interactive debugger is mostly useful for deep bugs — bugs that are hard to reproduce in test cases, that only show up after accumulating a bit of runtime state, or that only arise non-deterministically. Deep bugs are unavoidable, but I like to structure my programs to be as shallow as possible, so that I can shake out as many bugs as I can with automated tests.

There are some more commands available in the debugger that we haven’t talked about. You can see them all with autocomplete:

debug[1]:1:> .<TAB>
.break
.breakall
.bytecode
.clear
.clearall
.disasm
.fiber
.fn
.frame
.locals
.next
.nextc
.ppasm
.signal
.slot
.slots
.source
.stack
.step

And you can read about them with doc:

debug[1]:2:> (doc .bytecode)


    function
    boot.janet on line 3411, column 3

    (.bytecode &opt n)

    Get the bytecode for the current function.


nil

But I think that .locals, .ppasm, and debug/break are the most useful ones to know about.

There’s one more thing I’ll mention before we leave here.

By default, you can’t use janet -d to break on top-level errors. This means that you can’t use the interactive debugger for errors that you encounter during the compilation phase — including errors inside macro definitions.

raise.janet
(error "oh no")
janet -d raise.janet
error: oh no
  in _thunk [raise.janet] (tailcall) on line 1, column 1

To catch top-level errors like this, you also have to pass -p:

janet -p -d raise.janet
error: oh no
  in _thunk [raise.janet] (tailcall) on line 1, column 1
entering debug[1] - (quit) to exit
debug[1]:1:>

This can be pretty helpful when you’re debugging macros so tricky that your tests don’t even compile.

Chapter Twelve: Scripting

I think that Janet is a very good scripting language. Janet scripts have almost no startup time, PEGs make ad-hoc text-wrangling easy and fun, and you can even compile Janet scripts to native executables if you want to share them with people who have never even heard of Janet.

But in this chapter we’re going to talk about a couple of libraries that transform Janet from a very good scripting language into the best scripting language.

First of all, I want to talk about a library called sh, which was written by a prolific Janetor named Andrew Chambers. sh is one of the few libraries that I would recommend installing globally, because it’s Just That Good:

jpm install sh

But also because that way you can import it from any shebanged Janet script you write without having to set up a project.janet first.

The core of the sh library is a macro called $, which executes a shell command:

greet
#!/usr/bin/env janet
(use sh)

($ echo "hey janet")
./greet
hey janet

But $ supports a surprising amount of shell-like syntax. For example, you can use it set up a multi-process pipeline:

greet
#!/usr/bin/env janet
(use sh)

($ echo "hey janet" | tr "a-z" "A-Z")
./greet
HEY JANET

Just like in a real shell.

sh also supports redirection, either to files:

greet
#!/usr/bin/env janet
(use sh)

($ echo "hey janet" >(file/open "output.txt" :w))
($ cat output.txt)
./greet
hey janet

Or to Janet buffers:

greet
#!/usr/bin/env janet
(use sh)

(def output @"")
($ echo "hey janet" >,output)

(prin output)
./greet
HEY JANET

Although that’s sort of a silly redirection, because janet-sh includes another macro, $<, which runs a command and returns the output as a string:

(def output ($< echo "hey janet"))

Although I think that $<_ is more useful — $<_ is just like $<, but it strips trailing whitespace from its output.

You can also redirect-append with >>, and you can redirect stdin with <. Just like a real shell.

baby-names
#!/usr/bin/env janet
(use sh)

(def baby-names `
thaddeus
leopold
ezekiel
`)

($ sort <,baby-names | sed "s/^/name: /")
./baby-names
name: ezekiel
name: leopold
name: thaddeus

$ raises an exception if any of the commands in its pipeline exits non-zero (in Bash terms: sh implicitly sets pipefail), and it always returns nil. But you can also use $? to check if a command succeeded — it will return true for exit status 0, and false for anything else.

janet -l sh
repl:1:> ($? grep "supercalifragilistic" /usr/share/dict/words)
false

If you need more information than that, sh exports a macro called run that returns the numeric exit code of a command. It actually returns an array of exit codes, one for each command in a pipeline:

repl:2:> (run grep "supercali" /usr/share/dict/words)
@[1]
repl:3:> (run grep "supercal" /usr/share/dict/words | sort)
supercalender
supercallosal
@[0 0]

sh also exports a function called glob. It’s a function, not a macro, so you have to pass it a string:

run-tests
#!/usr/bin/env janet
(use sh)

(each file (glob "tests/*.janet" :x)
  (printf "running %s" file)
  ($ janet ,file))

And it returns an array of files that match the glob:

./run-tests
running tests/bar.janet
running tests/foo.janet

Because glob returns an array of strings, you’ll have to splice on its output in order to use it inside one of the $ macros:

janet -l sh
repl:1:> ($ echo ;(glob "tests/*.janet"))
tests/bar.janet tests/foo.janet
nil

Finally, the $ macros support a binary operator called ^. ^ will concatenate strings together into a single argument:

janet -l sh
repl:1:> (def hello "hello")
nil
repl:2:> ($ echo ,hello ^ world)
helloworld
nil

This can be handy for constructing file names and paths.

Okay! You now know almost everything there is to know about sh. Let’s try it out!

We’ll start small. Here’s a simple Bash script that checks for .c files without corresponding .h files:

for file in *.c; do
  if [[ ! -e "$(basename "$file" .c).h" ]]; then
    echo "$file is missing a header file"
  fi
done

We can rewrite that in Janet, using ^ notation:

(each file (glob "*.c")
  (unless ($? test -e ($<_ basename ,file .c) ^ .h)
    (print file " is missing a header file")))

Or with the more explicit:

(each file (glob "*.c")
  (unless ($? test -e (string ($<_ basename ,file .c) ".h"))
    (print file " is missing a header file")))

Which is more to type, but I find it easier for my brain to parse.

Now, I know that wasn’t very interesting. That was a pretty contrived example.

So let’s try a real shell script.

Hmm.

Here’s one I wrote forever ago that I’ve gotten a lot of mileage out of. It prints the description of a Nix package, because somehow there is no built-in command to do this:

~/sd/nix/info
#!/usr/bin/env bash

set -euo pipefail

nix-env -qaA "nixpkgs.$1" --json --meta \
| jq -r '.[] | .name + " " + .meta.description,
         "",
         (.meta.longDescription | rtrimstr("\n"))'

We can port this into Janet pretty easily:

nix-info
#!/usr/bin/env janet

(use sh)

($ nix-env -qaA nixpkgs. ^ (in (dyn *args*) 1) --json --meta
 | jq -r `.[] | .name + " " + .meta.description,
          "",
          (.meta.longDescription | rtrimstr("\n"))`)

Three things to notice about this:

It works, though:

./nix-info git
git-2.37.1 Distributed version control system

Git, a popular distributed version control system designed to
handle very large projects with speed and efficiency.

But is this any better than writing Bash? No. Not really. This is a platonic ideal shell script: start a process, pipe the output to another process, exit. Janet isn’t really bringing anything new to the table here.

But I think it’s really impressive that the Janet implementation is not worse than the equivalent Bash script. Subprocess DSLs that I have seen in JavaScript and Python add a lot more noise.

In fact, the only thing I don’t like about the Janet version is the argument handling — (in (dyn *args*) 1) is pretty messy looking. We could simplify this a little by wrapping it in a main function and reading the arguments from its parameters, but I’m going to propose an alternative approach:

nix-info
#!/usr/bin/env janet

(use sh)
(import cmd)

(cmd/def package (required :string))

($ nix-env -qaA nixpkgs. ^ ,package --json --meta
 | jq -r `.[] | .name + " " + .meta.description,
          "",
          (.meta.longDescription | rtrimstr("\n"))`)

cmd is a library for parsing command-line arguments. (cmd/def package (required :string)) says that our executable takes one required positional argument, a string, and binds it to the symbol package.

I wrote cmd, so I’m certainly biased, but I think that sh and cmd together provide a better interface for writing “shell scripts” than Bash does — or indeed any higher-level language. The above doesn’t really showcase what it can do, but as soon as we add a named argument:

nix-info
#!/usr/bin/env janet

(use sh)
(import cmd)

(cmd/def
  package (required :string)
  --name (flag))

(defn query-name [name]
  ($< nix-env -qa ,name --json --meta))

(defn query-attr [attr]
  ($< nix-env -qaA nixpkgs. ^ ,attr --json --meta))

($ <(if name (query-name package) (query-attr package))
   jq -r `.[] | .name + " " + .meta.description,
          "",
          (.meta.longDescription | rtrimstr("\n"))`)

Now we can type ./nix-info git or ./nix-info git --name or ./nix-info --name git, and the cmd library will assign a boolean value to name based on whether or not the flag was specified.

This is still extremely simple, but I hope that you can see how this scales better than the equivalent command-line argument parsing code in Bash. Of course you could write this in Bash, and the code wouldn’t even be too bad, since we’re not trying to parse named arguments that have values after them. But we could, with cmd.

By using cmd/def, we also got an autogenerated --help flag, which, at the moment is pretty sparse:

./nix-info --help
  ./nix-info STRING

=== flags ===

  [--help] : Print this help text and exit
  [--name] : undocumented

Though by adding a few more annotations:

(cmd/def "Print the description of Nix derivation."
  package (required ["<package>" :string])
  --name (flag) "Query by name instead of attribute")

We can get slightly better --help output:

./nix-info --help
Print the description of Nix derivation.

  ./nix-info <package>

=== flags ===

  [--help] : Print this help text and exit
  [--name] : Query by name instead of attribute

This is a very small taste of what cmd can do, and I’m not going to give an exhaustive description of it in this book. But for a slightly larger taste:

# named arguments start with a hyphen
(cmd/def --foo (required :string))

# positional arguments don't
(cmd/def foo (required :number))

# arguments can be optional
(cmd/def --foo (optional :file))

# you can specify multiple aliases for a named argument
(cmd/def [--foo -f] (optional :string))

# as well as a separate name to use for the Janet variable
(cmd/def [bar --foo] :number)

cmd is a lot less interesting than sh, because it should — hopefully — just stay out of your way. The official documentation describes how to use it in great detail, but most of the scripts you write will probably just have a flag or two, and the above examples should be sufficient.

So: with the combined power of sh and cmd, we can actually replace a lot of shell scripts with Janet scripts. And by doing so we get saner error handling, we don’t have to worry about word-splitting, and we have full access to sequential and associative arrays that actually make sense.

But we also have access to a superpower of Janet: PEGs.

Writing scripts with PEGs is so much nicer than wrangling Awk or Sed invocations that I think it’s pretty hard to go back once you’ve done it a few times. And I say that as someone who loves Sed — and who almost tolerates Awk.

So. With these three superpowers at our disposal — sh, cmd, and Janet’s native PEGs — let’s write a little project together.

I want to try writing a little todo list in Janet. It will be very, very simple, but at the end we’ll have something that is actually usable and perhaps even useful. And it can serve as a starting point for a custom todo list app exactly tailored to your personal workflow.

We’ll store our todo list in a plain text file that looks like this:

- [ ] this is a task to do
- [x] this one is already done

This is a pretty simple file format — we could parse this with Sed no problem. But by writing this with PEGs, we’ll actually be able to support tasks that span multiple lines. Which I fully admit is not very useful, but it’s neat that we can do it.

Our “app” will expose the following command line interface:

This is a very barebones interface, but it’s a good starting point.

When we print the todo list, we’ll strike out completed tasks, and word wrap any longer tasks to the width of your terminal. Like this:

- [x] a completed task
- [ ] pretend like this is a task and
      you have a very narrow terminal
- [ ] a shorter task

But rather than writing our own line-wrapping function, we’ll just shell out to the standard fold command. And rather than querying the terminfo database directly, we’ll just shell out to tput.

When you mark a task completed, we’ll actually show an interactive menu from which you can select tasks. And we’ll do that by just shelling out to fzf, which will even let us support crossing off multiple tasks at once.

Of course we could implement all of this functionality in pure Janet, but by harnessing the power of sh we can implement it very easily. In fact this whole program — with nicely formatted output and fzf-powered interactive multi-select — will weigh under 100 lines.

99 lines, to be exact:

#!/usr/bin/env janet

(use sh)
(import cmd)

(defn strikethrough [text] (string "\e[9m" text "\e[0m"))

(def todo-file (string/format "%s/todo" (os/getenv "HOME")))

(def char-to-state {" " :todo "x" :done})
(def state-to-char (invert char-to-state))

(def task-peg (peg/compile
  ~{:main (* (any (* :task (+ "\n" -1))) -1)
    :state (cmt (* "- [" (<- (to "]")) "]") ,|(char-to-state $))
    :text (/ (<- (to (+ "\n- [" -1))) ,string/trim)
    :task (/ (* :state :text) ,|@{:state $0 :text $1})}))

(defn parse-tasks []
  (assert (peg/match task-peg (slurp todo-file))
    "could not parse todo list"))

(def cols (scan-number ($<_ tput cols)))

(defn print-task [{:state state :text text}]
  (def decorate (case state 
    :done strikethrough 
    identity))
  (def prefix (string/format "- [%s] " (state-to-char state)))
  (def indent (string/repeat " " (length prefix)))
  (def wrap-width (- cols (length prefix)))
  (def wrapped-text ($< fold <,text -s -w ,wrap-width))
  (def lines (string/split "\n" wrapped-text))
  (eachp [i line] lines
    (print
      (if (= i 0) prefix indent)
      (decorate line))))

(defn print-tasks [tasks]
  (each task (sort-by |(in $ :state) tasks)
    (print-task task)))

(defn first-word [str]
  (take-while |(not= $ (chr " ")) str))

(defn save-tasks [tasks]
  (def temp-file (string todo-file ".bup"))
  (with [f (file/open temp-file :a)]
    (each {:state state :text text} tasks
      ($ printf -- "- [%s] %s\n" (state-to-char state) ,text >>,f)))
  ($ mv ,temp-file ,todo-file))

(cmd/defn to-done "cross something off" []
  (def tasks (parse-tasks))
  (def input @"")
  (loop [[i {:state state :text text}] :pairs tasks 
         :when (= state :todo)]
    (buffer/push-string input
      (string/format "%d %s" i text))
    (buffer/push-byte input 0))

  (when (empty? input)
    (print "nothing to do!")
    (os/exit 0))

  (def output @"")
  (def [exit-status] 
    (run fzf <,input >,output --height 10 --multi --print0 --with-nth "2.." --read0))
  (def selections
    (case exit-status
      0 (drop -1 (string/split "\0" output))
      1 []
      2 (error "fzf error")
      130 []
      (error "unknown error")))

  (each selection selections
    (def task-index (scan-number (first-word selection)))
    (def task (in tasks task-index))
    (set (task :state) :done)
    (print-task task))

  (unless (empty? selections)
    (save-tasks tasks)))

(defn append-task [text]
  (with [f (file/open todo-file :a)]
    (file/write f (string/format "- [ ] %s\n" text)))
  (print-task {:state :todo :text text}))

(cmd/defn to-do "add or list tasks"
  [task (optional ["<task>" :string])]
  (if task
    (append-task task)
    (print-tasks (parse-tasks))))

(cmd/main (cmd/group "A very simple task manager."
  do to-do
  done to-done))

And that’s, you know, 99 comfortably spaced lines of code.

I’m not going to go over the whole thing, because 99 lines is pretty short for a real program but pretty long for a program in a book. But I do want to hit the highlights.

First off, we parse the todo list with a PEG.

(def task-peg (peg/compile
  ~{:main (* (any (* :task (+ "\n" -1))) -1)
    :state (cmt (* "- [" (<- (to "]")) "]") ,|(char-to-state $))
    :text (/ (<- (to (+ "\n- [" -1))) ,string/trim)
    :task (/ (* :state :text) ,|@{:state $0 :text $1})}))

This might look a little complicated at first, until you realize that it correctly parses hard-wrapped, multi-line tasks — something that’s fairly difficult to do with a plain old regular expression.

That’s not very shelly, though; that’s just Janet. So let’s take a look at task pretty-printing:

(def cols (scan-number ($<_ tput cols)))

(defn print-task [{:state state :text text}]
  (def decorate (case state :done strikethrough identity))
  (def prefix (string/format "- [%s] " (state-to-char state)))
  (def indent (string/repeat " " (length prefix)))
  (def wrap-width (- cols (length prefix)))
  (def wrapped-text ($< fold <,text -s -w ,wrap-width))
  (def lines (string/split "\n" wrapped-text))
  (eachp [i line] lines
    (print
      (if (= i 0) prefix indent)
      (decorate line))))

fold wraps text to the specified width, and tput can tell us how wide the terminal is — something that would otherwise require writing a native Janet module, because the standard library doesn’t expose this.

To implement task selection, we construct a buffer of null-terminated strings that we pass to fzf. We use run to get the exit code, because fzf returns 130 if the user presses escape to cancel, and we want to handle that gracefully. Why 130? No one knows.

(def output @"")
(def [exit-status] (run fzf <,input >,output --height 10 --multi --print0 --with-nth "2.." --read0))
(def selections
  (case exit-status
    0 (drop -1 (string/split "\0" output))
    1 []
    2 (error "fzf error")
    130 []
    (error "unknown error")))

This is a big departure from how I normally program. But I was able to hack up this todo list in, like, thirty minutes. If I weren’t using fzf to do all the heavy-lifting, I’d probably still be reading about curses bindings and ANSI escape codes right now. And if I had written this in pure shell, I’d still be working on the Sed script to parse multi-line tasks.

This is the beauty of this kind of hybrid scripting: it’s quick, it’s dirty, but it already works. This program does nothing to protect against concurrent writes, it makes far more syscalls than it needs to, and it spawns processes without any concern for the overhead. But none of that really matters, for a program used interactively by a single person.

Now, I don’t think that Janet can replace shell scripts altogether. sh and cmd make a pretty good argument, but Bash still has a lot to recommend it: there’s no equivalent of trap EXIT in Janet, nor is there an analog of the extremely-useful cp foo.bar{,.bup} expansion shorthand. It’s a lot more verbose to set and reference environment variables in Janet, and there’s no ~/foo or ~user/foo shorthand for specifying home directories. You can’t spawn background jobs at all, and Janet has no job control facilities.

So I don’t expect Janet to displace Bash for you entirely. But I think it can absolutely displace Perl, or Python, or Ruby, or whatever higher-level scripting language you currently reach for when your shell scripts get too long.

Chapter Thirteen: Macro Mischief

All the way back in Chapter Three, we looked at the following macro:

(defmacro each-reverse [identifier list & body]
  ~(do
    (var i (- (length ,list) 1))
    (while (>= i 0)
      (def ,identifier (in ,list i))
      ,;body
      (-- i))))

And we talked about two problems that it has.

The first is that we can’t use i as the name of our looping variable:

(each-reverse i [1 2 3 4 5]
  (print i))

Because the macro uses i as the index variable already:

(do
  # from the macro itself
  #    ↓
  (var i (- (length [1 2 3 4 5]) 1))
  (while (>= i 0)
    # from the macro's arguments
    #    ↓
    (def i (in [1 2 3 4 5] i))
    (print i)
    (-- i)))

The second is that the abstract syntax tree that we call list actually appears in two places in the expansion:

(each-reverse x (do (os/sleep 1) [1 2 3 4 5])
  (print x))

Which means that it’s ultimately going to be evaluated twice, duplicating work and possibly even performing side effects multiple times:

(do
  #                    first sleep
  #                         ↓
  (var i (- (length (do (os/sleep 1) [1 2 3 4 5])) 1))
  (while (>= i 0)
    #             second sleep
    #                  ↓
    (def x (in (do (os/sleep 1) [1 2 3 4 5]) i))
    (print x)
    (-- i)))

Fixing the second problem is pretty easy: we can just evaluate list once, and store the result in a variable:

(defmacro each-reverse [identifier list & body]
  ~(do
    (def list ,list)
    (var i (- (length list) 1))
    (while (>= i 0)
      (def ,identifier (in list i))
      ,;body
      (-- i))))

Except, well, that just made the first problem worse. Now not only is i off limits, but we’ve shadowed the word list as well.

It’s not hard to fix this, but I think that the fix is pretty confusing the first time you see it.

The trick is that, instead of using identifiers like i and list, we’re going to generate new, unique identifiers that won’t clash with any other symbols.

We do this with a function called gensym.

repl:1:> (gensym)
_000001

_000001 is a unique identifier that has not been used anywhere else in the program before. Janet knows it’s unique because, remember, all symbols are stored in an interning table, and gensym consults that table to make sure it’s really giving you a unique symbol every time you call it. If Janet had parsed a symbol called _000001 in your program already, it wouldn’t return _000001:

repl:1:> (def _000001 'hi)
hi
repl:2:> (gensym)
_000002

See?

We can use gensym to create names for the variables in our macro. Instead of i, we’ll use something like _000001. And instead of list, we’ll use something like _000002.

And here’s where it gets weird.

(defmacro each-reverse [identifier list & body]
  (def $list (gensym))
  (def $i (gensym))
  ~(do
    (def ,$list ,list)
    (var ,$i (- (length ,$list) 1))
    (while (>= ,$i 0)
      (def ,identifier (in ,$list ,$i))
      ,;body
      (-- ,$i))))

So $list is a symbol. The symbol (symbol "$list"). I wrote that in my program; that is a real symbol that exists.

But $list is also a variable name — or well, the name of an immutable binding, but whatever. We’ll drop the pedantry for a second; this is confusing enough already.

$list is a variable, and $list happens to be assigned to a value that is also a symbol — a symbol like _000001.

The dollar sign prefix is just a convention; it stands for $ymbol — the value of $list is the symbol that will eventually hold the value of list. list, remember, is an abstract syntax tree. So we have to unquote list to put that abstract syntax tree into the abstract syntax tree that we’re returning from our macro. And we have to unquote $list so that we make a variable called _000001 instead of a variable called $list.

So if we examine the expansion of our original problematic invocation:

(defmacro each-reverse [identifier list & body]
  (def $list (gensym))
  (def $i (gensym))
  ~(do
    (def ,$list ,list)
    (var ,$i (- (length ,$list) 1))
    (while (>= ,$i 0)
      (def ,identifier (in ,$list ,$i))
      ,;body
      (-- ,$i))))

(each-reverse i [1 2 3 4 5]
  (print i))

We’ll find something a lot like this:

(do
  (def _000001 [1 2 3 4 5])
  (var _000002 (- (length _000001) 1))
  (while (>= _000002 0)
    (def i (in _000001 _000002))
    (print i)
    (-- _000002)))

i still shows up, because that’s the name we chose for the looping variable when we called the macro. But there’s no more conflict with the i variable that the macro used internally to store the index — that has been replace with a harmless _000002.

So let’s notice a few things about this.

These problems have caused a lot of people to spend a lot of time thinking about ways to improve on this state of affairs, and there are multiple different “hygienic” macro systems designed to prevent these sorts of mistakes.

But those techniques are out of scope for this book. This is a book about Janet, and Janet macros are filthy. Raw, unfiltered syntax tree transformations. They are very powerful, and they are very simple, and they are very easy to shoot yourself in the foot with.

One way that we can reduce the likelihood of shooting ourselves in the foot is to actually look at the macro expansions. We’ve seen how to do this in the repl with macex1, but the output is pretty hard to read, and it’s easier to just test that the macro works without looking at its expansion.

But there’s a better way to test a macro expansion: Judge’s test-macro:

(use judge)

(test-macro (each-reverse i [1 2 3 4 5] (print i)))

After we run this, Judge will fill in the expansion of the macro:

(test-macro (each-reverse i [1 2 3 4 5] (print i)) 
  (do
    (def <1> [1 2 3 4 5])
    (var <2> (- (length <1>) 1))
    (while
      (>= <2> 0)
      (def i (in <1> <2>))
      (print i)
      (-- <2>))))

Judge’s test-macro code formatting is pretty basic, but it’s much better than dumping it all on one line. And notice that the gensym’d symbols are replaced by the shorter <1> and <2> identifiers, which will be stable across multiple invocations (whereas the actual underlying symbols will depend on the total number of times gensym has run since the Janet VM started running).

test-macro makes it easy to inspect macro expansions, but it has a side benefit as well: it can serve as auto-generated documentation about the behavior of complicated macros. And you can also use it if you have a question about the behavior of a built-in macro. For example, if you want to sanity check how ->> works:

(test-macro (->> foo (map f) (filter pred)) 
  (filter pred (map f foo)))

Using your source files as a repl like this is—

Okay, sorry, my editor is telling me that I’m not allowed to go on any teary-eyed tangents about the joys of testing in this chapter. We already did that, back in Chapter Eleven. Let’s get back to macros.

So now that we’ve covered the basic errors that you can make while writing a macro, let’s move on to some of the advanced errors.

Because, as complicated as our macro definition has become, it’s still a little bit fragile.

(each-reverse i [1 2 3 4 5]
  (print i))

That works great. But this:

(def length 10)
(each-reverse i [1 2 3 4 5]
  (print i))

Does not. And it’s easy to see why, when we look at the expansion:

(def length 10)
(do
  (def _000001 [1 2 3 4 5])
  (var _000002 (- (length _000001) 1))
  (while (>= _000002 0)
    (def i (in _000001 _000002))
    (print i)
    (-- _000002)))

Our macro expansion includes the symbol length. But we really meant the function called length from the standard library. But that’s not what we said: we said the symbol length, whatever that might be at the place where our macro was expanded.

The fix is easy, though: by unquoting length, we actually include the function itself in our final abstract syntax tree:

(defmacro each-reverse [identifier list & body]
  (def $list (gensym))
  (def $i (gensym))
  ~(do
    (def ,$list ,list)
    (var ,$i (- (,length ,$list) 1))
    (while (>= ,$i 0)
      (def ,identifier (in ,$list ,$i))
      ,;body
      (-- ,$i))))

Now if we test our macro expansion, we’ll wind up with something like this:

(def length 10)
(do
  (def _000001 [1 2 3 4 5])
  (var _000002 (- (<function length> _000001) 1))
  (while (>= _000002 0)
    (def i (in _000001 _000002))
    (print i)
    (-- _000002)))

But wait a minute. We fixed length, but that’s just one function! And it’s not the only function here. There’s also - — that’s a function in Janet, remember. And >=, and in. All functions. So we actually have to write:

(defmacro each-reverse [identifier list & body]
  (def $list (gensym))
  (def $i (gensym))
  ~(do
    (def ,$list ,list)
    (var ,$i (,- (,length ,$list) 1))
    (while (,>= ,$i 0)
      (def ,identifier (,in ,$list ,$i))
      ,;body
      (-- ,$i))))

Okay. At this point we’re unquoting more things than we’re quoting, and we could consider abandoning the quasiquote syntax altogether:

(defmacro each-reverse [identifier list & body]
  (def $list (gensym))
  (def $i (gensym))
  ['do
    ['def $list list]
    ['var $i [- [length $list] 1]]
    ['while [>= $i 0]
      ['def identifier [in $list $i]]
      ;body
      ['-- $i]]])

I find that harder to read, personally, but you could write a macro that way if you wanted to.

But, well, wait a minute. We covered all the functions — but what about these macros? What about while? What about def?? What if someone wants to use this macro somewhere that they’ve redefined while?

Well, you can’t redefine while, actually. while is a “special form,” a language primitive. There no variable called while for you to shadow:

repl:1:> (print while)
repl:1:1: compile error: unknown symbol while

And even if you we try to shadow while, (while) will still mean to the special built-in:

repl:1:> (var while 3)
3
repl:2:> (while (> while 0) (print while) (-- while))
3
2
1
nil

while isn’t the only symbol that’s special-cased like this. There are, in fact, 13 “special forms” in Janet:

Everything in Janet — every function, every macro, every everything — is ultimately made out of those 13 building blocks. Or, well, or it’s written in C. Lots of stuff is written in C.

But returning to our macro:

(defmacro each-reverse [identifier list & body]
  (def $list (gensym))
  (def $i (gensym))
  ~(do
    (def ,$list ,list)
    (var ,$i (,- (,length ,$list) 1))
    (while (,>= ,$i 0)
      (def ,identifier (,in ,$list ,$i))
      ,;body
      (-- ,$i))))

There is still one symbol here that is not a special form.

-- is a normal macro, and if someone shadowed it and then used our each-reverse function, it wouldn’t work correctly.

(def -- :minus-minus)
(each-reverse i [1 2 3]
  (print i))

And it isn’t obvious that that’s the case, right? I mean, we wrote each-reverse, so we know that the macro expands to include --. But if we were publishing this macro as part of a library, we don’t really want the users of our library to have to know anything about the expansion of the macro. We want it to just work, always and transparently, just like functions do.

Now, the unquoting trick that we used on functions doesn’t exactly work on macros. If you unquote a macro, it doesn’t unquote to a macro — it unquotes to the abstract-syntax-tree-transforming function that backs the macro. But we don’t want to call that function at runtime, when we eventually run the expanded code — we want to call that function at compile-time.

So there’s a sort of canonical way to do this, which is to use a macro called as-macro. as-macro is a trivial macro that takes a function and some arguments and calls the function at compile time. It lets us “unquote” macros, and we can use it to fix this macro definition:

(defmacro each-reverse [identifier list & body]
  (def $list (gensym))
  (def $i (gensym))
  ~(do
    (def ,$list ,list)
    (var ,$i (,- (,length ,$list) 1))
    (while (,>= ,$i 0)
      (def ,identifier (,in ,$list ,$i))
      ,;body
      (as-macro ,-- ,$i))))

Except, of course, that we have only moved the problem.

If someone shadows as-macro, we’re exactly back to where we started. as-macro is not a special form, so it is possible to shadow it.

And look: no one is going to shadow as-macro. I know that. You know that. If someone shadows as-macro and then complains that your macros don’t work, that’s not… that’s just not a reasonable thing to try to protect against.

But still. We’ve already come this far. Let’s make this thing airtight.

So macros are just functions, right? Functions from abstract syntax trees to abstract syntax trees. And we can just directly call those functions at compile time — we don’t need to go through as-macro at all.

(defmacro each-reverse [identifier list & body]
  (def $list (gensym))
  (def $i (gensym))
  # we have to make a new function binding,
  # because -- is a macro binding, and we
  # want to call it as a function
  (def fn-- --)
  ~(do
    (def ,$list ,list)
    (var ,$i (,- (,length ,$list) 1))
    (while (,>= ,$i 0)
      (def ,identifier (,in ,$list ,$i))
      ,;body
      ,(fn-- $i))))

And look: don’t do this. This is a fun exercise designed to show you that it is possible to write macros that are entirely insulated from their expanding environment.

But you shouldn’t actually write macros this defensively. We jumped the shark a few pages ago. You definitely shouldn’t worry about someone shadowing as-macro. And you probably shouldn’t even worry about writing someone shadowing -- — it’s just not worth spending the time to defend against.

But there is still a very good reason to understand how to write macros that are indifferent to the environment in which they’re used.

The reason to care about all of this is that these techniques let us write macros that refer to private functions — functions that don’t exist at all in the environment in which the macro is used.

We could, for example, write our own version of the ++ macro:

custom-macros.janet
(defn- plus-one [x]
  (+ x 1))

(defmacro plus-plus [variable]
  ~(set ,variable (plus-one ,variable)))

And then use it from another file:

main.janet
(import ./custom-macros)

(var x 0)
(custom-macros/plus-plus x)
(print x)

But that would, of course, give us an error:

janet main.janet
main.janet:4:1: compile error: unknown symbol plus-one

Because (custom-macros/plus-plus x) expands to (set x (plus-one x)), and plus-one is not in scope. And neither is custom-macros/plus-one — we defined it as a private function, after all.

But by unquoting the private plus-one function, we can still refer to it from within our macro’s expansion:

custom-macros.janet
(defn- plus-one [x]
  (+ x 1))

(defmacro plus-plus [variable]
  ~(set ,variable (,plus-one ,variable)))
janet main.janet
1

There’s not really a reason to write a “private macro,” because you can just write a private function instead and call that, like we did in the fn-- example. But you can use this to refer to otherwise public macros without needing to know exactly what name they’re bound to in the calling environment — be it foo/my-macro or just my-macro. You can use as-macro to paper over those naming differences, because you know that as-macro is always going to be called as-macro.

Alright. Now let’s go back to a reasonable version of our macro:

(defmacro each-reverse [identifier list & body]
  (def $list (gensym))
  (def $i (gensym))
  ~(do
    (def ,$list ,list)
    (var ,$i (- (,length ,$list) 1))
    (while (>= ,$i 0)
      (def ,identifier (in ,$list ,$i))
      ,;body
      (-- ,$i))))

As you can see this assumes that -, >=, in, and -- exist with their normal definitions in the calling environment. I still unquoted length, because I think that’s a common variable name. I’ll also unquote functions like tuple or struct. There’s an element of human judgment here.

Note that sometimes you actually shouldn’t unquote functions, even if you can. For example, there’s a macro in the standard library called +=. It’s defined like this:

(defmacro += [x n]
  ~(set ,x (,+ ,x ,n)))

And that’s actually really annoying!

If we shadow the function called + — say, because we want to overload it work over tuples — then += no longer does what I would expect it to. I want it to be the case that (+= x 1) is short for (set x (+ x 1)), but it’s not. It’s short for (set x (<function +> x 1)). It always invokes the + from root-env, even when we have a different + in scope.

So this is a case where I think it’s better to be deliberately unhygienic.

So, okay. We’ve written a reasonable macro. Now let’s make it look a little nicer.

It’s pretty common for macros to start by declaring a bunch of gensym‘d variables, and there’s a helper macro that makes that a little bit easier. It’s called with-syms, and we can use it to replace our explicit gensym calls with this:

(defmacro each-reverse [identifier list & body]
  (with-syms [$list $i]
    ~(do
      (def ,$list ,list)
      (var ,$i (- (,length ,$list) 1))
      (while (>= ,$i 0)
        (def ,identifier (in ,$list ,$i))
        ,;body
        (-- ,$i)))))

It’s also common to use let to bind those temporary variables to their corresponding abstract syntax trees, so that we don’t need the explicit do:

(defmacro each-reverse [identifier list & body]
  (with-syms [$list $i]
    ~(let [,$list ,list]
      (var ,$i (- (,length ,$list) 1))
      (while (>= ,$i 0)
        (def ,identifier (in ,$list ,$i))
        ,;body
        (-- ,$i)))))

Although in this case $i is going to be a variable, not a binding, so it has to stay out of the let. But that still saved us one line.

You’ll see this pattern a lot when you’re writing macros:

(defmacro something [foo bar]
  (with-syms [$foo $bar]
    ~(let [,$foo ,foo
           ,$bar ,bar]
      ...)))

So it’s a good idea to try to write it once or twice so that it makes sense.

I think that this final version is a pretty good macro:

(defmacro each-reverse [identifier list & body]
  (with-syms [$i $list]
    ~(let [,$list ,list]
      (var ,$i (,dec (,length ,$list)))
      (while (>= ,$i 0)
        (def ,identifier (in ,$list ,$i))
        ,;body
        (-- ,$i)))))

You could unquote a little more, you could expand the -- macro ahead of time, but I think that’s how I would write this one.

In fact, that’s how I did write that macro, all the way back in Chapter One. Remember that? This was the code that I showed to try to scare you away from Janet before you had a chance to fall in love. Not so scary now, is it?

So let’s try something scarier.

Back in Chapter Eight, I presented a hypothetical macro designed to loosely mimic JavaScript’s class syntax:

(class Counter
  constructor (fn [self] (set (self :_count) 0))
  add (fn [self amount] (+= (self :_count) amount))
  increment (fn [self] (:add self 1))
  count (fn [self] (self :_count)))

And now that we understand gensym, we can actually write such a macro. Like this:

(defmacro class [name & methods]
  (def proto @{})
  (var constructor nil)

  (each [name impl] (partition 2 methods)
    (if (= name 'constructor)
      (set constructor impl)
      (put proto (keyword name) impl)))

  (with-syms [$proto $constructor]
    ~(def ,name 
      (let [,$proto ,proto
            ,$constructor ,constructor]
        (fn [& args]
          (def self (,table/setproto @{} ,$proto))
          (,$constructor self ;args)
          self)))))

It’s a bit more complicated, but I think you’re ready for it.

First off, we use partition to chunk the input into pairs — so given a list like ['foo 1 'bar 2 'baz 3], it’ll give us [['foo 1] ['bar 2] ['baz 3]]. Then we iterate over those pairs, convert symbols like 'add into keywords like :add, and then stick them into a table. Except for the symbol constructor, which is special-cased.

After the loop runs, we’ll have a table from keywords to abstract syntax trees. Note that we haven’t evaluated the functions yet! They’re just syntax trees at this point, so our proto table will look like this:

@{:add ['fn '[self amount] ['+= ['self :_count] 'amount]]
  :increment ['fn '[self] [:add 'self 1]]
  :count ['fn '[self] ['self :_count]]}

Similarly, constructor is not a constructor function — it’s an abstract syntax tree that will become the constructor function after we evaluate it.

['fn '[self] ['set ['self :_count] 0]]

We can’t evaluate these abstract syntax trees directly, but we can return them from our macro for Janet to evaluate later. We have to make sure to only evaluate them once, so we use with-syms to mint temporary names to store the evaluated results in.

The rest is hopefully straightforward. Or at least… tractable. The final expansion looks something like this:

(def Counter
  (let [_000001 @{:add (fn [self amount] (+= (self :_count) amount)) 
                  :count (fn [self] (self :_count))
                  :increment (fn [self] (:add self 1))}
        _000002 (fn [self] (set (self :_count) 0))]
    (fn [& args]
      (def self (<cfunction table/setproto> @{} _000001))
      (_000002 self (splice args)) 
      self)))

So the prototype is evaluated once, and then constructor function is evaluated once, and then we create a function that creates a new table, sets its prototype, calls the constructor function, and then finally returns the table. And we call that function Counter.

That wasn’t too bad, right?

Because we’re manipulating syntax trees, we could actually go a little further. We could ditch the fn, and implicitly include self, so that we just write something like this instead:

(class Counter
  (constructor [] (set (self :_count) 0))
  (add [amount] (+= (self :_count) amount))
  (increment [] (:add self 1))
  (count [] (self :_count)))

I’m not saying that’s better, but it’s a thing that we could do. We’d have to go in and modify the inputs so that we still returned something like (fn [self] ...), but we don’t have to take that as an argument.

We could also add some kind of extends syntax for subclassing, if we wanted to. We could do anything! It’s just a question of manipulating abstract syntax trees. Carefully manipulating abstract syntax trees — don’t forget about the gensyming and the function unquoting.

Alright. At this point, I think that you’re ready to go out into the world and write macros safely and robustly. But before we leave, I want to talk about “abstract syntax trees.”

I’ve used that term a lot, but I never actually explained what I meant by it. After all, the things I’m calling abstract syntax trees in one place could be safely called symbols or tuples in other places. Because that’s what they are.

Back in Chapter One, we talked about all the different values of Janet. And we talked about them as normal values and data structures. But every Janet value is, simultaneously, an abstract syntax tree. And all I mean by that is that you can pass any Janet value to the compile function, and it will give you back a nullary function that does something with it.

But what, exactly? What does it mean for a struct to be an abstract syntax tree? Or a buffer? How does that work?

Well, let’s find out. We’ll go over all of the values of Janet one more time, and consider them no longer as regular values, but as “abstract syntax trees” representing Janet programs.

We’ll start simple. Most values just evaluate to themselves. This includes simple “atomic” values like numbers, strings, nil, booleans, and keywords:

values
repl:1:> ((compile "hello"))
"hello"
repl:2:> ((compile 123))
123

Functions and cfunctions also evaluate to themselves — that’s why we were able to unquote functions earlier in this chapter.

functions
repl:3:> ((compile pos?))
<function pos?>
repl:4:> ((compile int?))
<cfunction int?>

And so do fibers, which is a little bit weird. It’s weird because fibers are mutable: if you write a macro that returns an abstract syntax tree that contains a fiber, then that’s going to be the same fiber every time you call it:

repl:5:> (def count (coro (yield 1) (yield 2) (yield 3)))
<fiber 0x600003FBC1C0>
repl:6:> (defmacro counter [] ~(do (each x ,count (print x))))
<function counter>
repl:7:> (counter)
1
2
3
nil
repl:8:> (counter)
nil

Abstract types and pointers also evaluate to themselves, always, even if the underlying type is mutable.

repl:9:> (def peg (peg/compile "abc"))
<core/peg 0x6000037BC340>
repl:10:> ((compile peg))
<core/peg 0x6000037BC340>

Symbols are the first things that don’t evaluate to themselves. Symbols evaluate to, well, a lookup of that symbol.

repl:11:> (compile 'foo)
@{:error "unknown symbol foo"}
repl:12:> ((compile 'peg))
<core/peg 0x6000037BC340>

If you want to evaluate the symbol itself, then you have to quote it:

repl:12:> ((compile '(quote foo)))
foo

Or, more cryptically:

repl:13:> ((compile ''peg))
peg

Sometimes you’ll want to write macros that take symbols as inputs and return the actual symbols themselves, not the values they become. To do this, you need to explicitly quote them:

repl:14:> (defmacro symbolton [sym val] ~{(quote ,sym) ,val})
<function symbolton>
repl:15:> (symbolton x 1)
{x 1}

I think that’s the most explicit way to write this, but once you’re more comfortable with quasiquoting, you might prefer the more cryptic:

(defmacro symbolton [sym val] {~',sym val})

Okay. Continuing onward: tuples become function invocations:

repl:17:> ((compile ['+ 1 2]))
3
repl:18:> ((compile ~(+ 1 2)))
3

Although the first argument can be something other than a symbol. It could be an actual function:

repl:19:> ((compile [+ 1 2]))
3
repl:20:> ((compile ~(,+ 1 2)))
3

Or it could be any other “callable” value:

repl:21:> ((compile [{:foo 123} :foo]))
123

But wait a minute.

Tuples become invocations. But what if we just want to return a tuple?

Well, I have somehow managed to skirt this fact for this entire book, but there are actually two kinds of tuples in Janet: bracketed tuples and parenthesized tuples.

Confusingly, you make a parenthesized tuple using square brackets, like [1 2 3], or by quoting parentheses, like '(1 2 3), or by using the tuple function.

repl:22:> [1 2 3]
(1 2 3)
repl:23:> '(1 2 3)
(1 2 3)
repl:24:> (tuple 1 2 3)
(1 2 3)

And so far all the tuples we’ve been compiling have been parenthesized tuples.

But you can make bracketed tuples by quoting brackets, or by using the tuple/brackets function:

repl:25:> '[1 2 3]
[1 2 3]
repl:26:> (tuple/brackets 1 2 3)
[1 2 3]

You will probably only encounter bracketed tuples when you’re writing macros — apart from compile, they behave identically to normal, “parenthesized” tuples at runtime.

You can inspect the type of the tuple like this:

repl:27:> (tuple/type '(1 2 3))
:parens
repl:28:> (tuple/type '[1 2 3])
:brackets

When you compile a bracketed tuple, you get a regular, parenthesized tuple back:

repl:29:> ((compile '[1 2 3]))
(1 2 3)

Which makes sense: evaluating the abstract syntax tree '[1 2 3] does exactly the same thing that typing the characters [1 2 3] into a text file would do.

Note that, when you’re compiling such tuples, every element in the tuple is treated as an abstract syntax tree as well. It also gets compiled according to the rules that we’re setting out here:

repl:30:> ((compile (tuple/brackets [+ 1 2] 4)))
(3 4)

The same is true for arrays. Every element of the array is interpreted as an abstract syntax tree:

repl:31:> (def foo @[1 [+ 1 1] 3])
@[1 (<function +> 1 1) 3]
repl:32:> ((compile foo))
@[1 2 3]

But note that this will always return a new array, even if there’s nothing to “evaluate” inside of it:

repl:33:> (def bar @[])
@[]
repl:34:> (= bar ((compile bar)))
false

Which makes sense: typing @[] into a file also always creates a new array.

Structs evaluate their keys and their values as abstract syntax trees, and return a struct containing the results of that evaluation:

repl:35:> ((compile {'+ 1}))
{<function +> 1}

And tables do the same thing, but, like arrays, they always return a new table:

repl:36:> ((compile @{''plus '+}))
@{plus <function +>}

Finally, buffers — mutable strings — evaluate to copies of themselves:

repl:37:> (def hello @"hello")
@"hello"
repl:38:> ((compile hello))
@"hello"
repl:39:> (= hello ((compile hello)))
false

And those are all of the values of Janet. Again.

Every value is a valid argument to the compile function. Every value, if given the chance, can become a brand new Janet program. We have witnessed the duality between code and data, and we have emerged enlightened.

Or, well, I shouldn’t speak for you, I guess.

Do you feel enlightened?

Did you get anything out of this?

Has this book made you better, or wiser, or stronger?

Has it inspired you to give Janet a try?

Let me know in the comments.

Er, the (report) function, that is.

The End

))))))((((((

If you liked this book, you might like my blog, or my social media presence.

Loading...