This module will discuss how we can have confidence that a program does what it claims to do, using unit testing and debugging strategies. It may be considered optional and skipped if you are speedrunning Hoon School.
Code courageously.
If you avoid changing a section of code for fear of awakening the demons therein, you are living in fear. If you stay in the comfortable confines of the small section of the code you wrote or know well, you will never write legendary code. All code was written by humans and can be mastered by humans.
It's natural to feel fear of code; however, you must act as though you are able to master and change any part of it. To code courageously is to walk into any abyss, bring light, and make it right.
~wicdev-wisryt
When you produce software, how much confidence do you have that it does what you think it does? Bugs in code are common, but judicious testing can manifest failures so that the bugs can be identified and corrected. We can classify a testing regimen for Urbit code into a couple of layers: fences and unit tests.
Fences
Fences are barriers employed to block program execution if the state isn’t adequate to the intended task. Typically, these are implemented with assert
or similar enforcement. In Hoon, this means ?>
wutgar, ?<
wutgal, and ?~
wutsig, or judicious use of ^-
kethep and ^+
ketlus. For conditions that must succeed, the failure branch in Hoon should be !!
, which crashes the program.
Unit Tests
Unit tests are so called because they exercise the functionality of the code by interrogating individual functions and methods. Functions and methods can often be considered the atomic units of software because they are indivisible. However, what is considered to be the smallest code unit is subjective. The body of a function can be long are short, and shorter functions are arguably more unit-like than long ones.
(Katy Huff, “Python Testing and Continuous Integration”)
In many languages, unit tests refer to functions, often prefixed test
, that specify (and enforce) the expected behavior of a given function. Unit tests typically contain setup, assertions, and tear-down. In academic terms, they’re a grading script.
In Hoon, the tests/
directory contains the relevant tests for the testing framework to grab and utilize. These can be invoked with the -test
thread:
> -test /=landscape=/tests ~built /tests/lib/pull-hook-virt/hoonbuilt /tests/lib/versioning/hoon> test-supported: took 1047µsOK /lib/versioning/test-supported> test-read-version: took 28317µsOK /lib/versioning/test-read-version> test-is-root: took 28786µsOK /lib/versioning/test-is-root> test-current-version: took 507µsOK /lib/versioning/test-current-version> test-append-version: took 4804µsOK /lib/versioning/test-append-version> test-mule-scry-bad-time: took 8437µsOK /lib/pull-hook-virt/test-mule-scry-bad-time> test-mule-scry-bad-ship: took 8279µsOK /lib/pull-hook-virt/test-mule-scry-bad-ship> test-kick-mule: took 4614µsOK /lib/pull-hook-virt/test-kick-muleok=%.y
(Depending on when you built your fakeship, particular tests may or may not be present. You can download them from the Urbit repo and add them manually if you like. Regarding the example above (%landscape
desk), the tests are likely missing, so download them from here if you want to run them.)
Hoon unit tests come in two categories:
++expect-eq
(equality of two values)++expect-fail
(failure/crash)
Let's look at a practical example first, then dissect these.
Exercise: Testing a Library
Consider an absolute value arm ++absolute
for @rs
values. The unit tests for ++absolute
should accomplish a few things:
- Verify correct behavior for positive numeric input.
- Verify correct behavior for negative numeric input.
- For the purpose of demonstrating
++expect-fail
, verify an exception is raised on input of zero. (Properly speaking Hoon doesn't have exceptions because Nock is crash-only; tools likeunit
are a way of dealing with failed computations.)
(You may also think we would need to verify ++absolute
calls only succeed if the input is an @rs
, but arvo already handles this for us, as a hoon file will not build if a gate call contains an argument that does not match the sample type. So even if you wanted to add an ++expect-fail
test for it, your test file would not build.)
By convention any testing suite has the import line /+ *test
at the top.
/tests/lib/absolute.hoon
Click to expand
/+ *test, *absolute|%++ test-absolute;: weld%+ expect-eq!> .1!> (absolute .-1)%+ expect-eq!> .1!> (absolute .1)%- expect-fail|. (absolute .0)==--
Note that at this point we don’t care what the function looks like, only how it behaves.
/lib/absolute.hoon
|%++ absolute|= a=@rs?: (gth a .0) a(sub:rs .0 a)--
- Use the tests to determine what is wrong with this library code and correct it.
The dcSpark blog post “Writing Robust Hoon — A Guide To Urbit Unit Testing” covers some more good ideas about testing Hoon code.
/lib/test.hoon
In /lib/test.hoon
we find a core with a few gates: ++expect
, ++expect-eq
, and ++expect-fail
, among others.
++expect-eq
checks whether two vases are equal and pretty-prints the result of that test. It is our workhorse. The source for ++expect-eq
is:
Click to expand
++ expect-eq|= [expected=vase actual=vase]^- tang::=| result=tang::=? result !=(q.expected q.actual)%+ weld result^- tang:~ [%palm [": " ~ ~ ~] [leaf+"expected" (sell expected) ~]][%palm [": " ~ ~ ~] [leaf+"actual " (sell actual) ~]]==::=? result !(~(nest ut p.actual) | p.expected)%+ weld result^- tang:~ :+ %palm [": " ~ ~ ~]:~ [%leaf "failed to nest"](~(dunk ut p.actual) %actual)(~(dunk ut p.expected) %expected)== ==result
Test code deals in vase
s, which are produced by !>
zapgar as a cell of the type of a value and the value.
++expect-fail
by contrast take a |.
bardot trap (a trap that has the $
buc arm but hasn't been called yet) and verifies that the code within fails.
> (expect-fail:test |.(!!))~> (expect-fail:test |.((sub 0 1)))~> (expect-fail:test |.((sub 1 1)))~[[%leaf p="expected failure - succeeded"]]
(Recall that ~
null is %.y
true.)
Producing Error Messages
Formal error messages in Urbit are built of tanks.
- A
tank
is a structure for printing data.leaf
is for printing a single noun.palm
is for printing backstep-indented lists.rose
is for printing rows of data.
- A
tang
is a(list tank)
.
As your code evaluates, the Arvo runtime maintains a stack trace, or list of the evaluations and expressions that got the program to its notional point of computation. When the code fails, any error hints currently on the stack are dumped to the terminal for you to see what has gone wrong.
The
~_
sigcab rune, described as a “user-formatted tracing printf”, can include an error message for you, requiring you to explicitly build thetank
. (printf
is a reference to C's I/O library.)The
~|
sigbar rune, a “tracing printf”, can include an error message from a simple@t
cord.What this means is that these print to the stack trace if something fails, so you can use either rune to contribute to the error description:
|= [a=@ud]~_ leaf+"This code failed"!!The
!:
zapcol rune turns on line-by-line stack tracing, which is extremely helpful when debugging programs. Drop it in on the first Hoon line (after/
fas imports) of a generator or library while developing.> (sub 0 1)subtract-underflowdojo: hoon expression failed> !:((sub 0 1))/~zod/base/~2022.6.14..20.47.19..3b7a:<[1 4].[1 13]>subtract-underflowdojo: hoon expression failed
When you compose your own library cores, include error messages for likely failure modes.
Test-Driven Development
In extremis, rigorous unit testing yields test-driven development (TDD). Test-driven development refers to the practice of fully specifying desired function behavior before composing the function itself. The advantage of this approach is that it forces you to clarify ahead of time what you expect, rather than making it up on the fly.
For instance, one could publish a set of tests which characterize the behavior of a Roman numeral translation library sufficiently that when such a library is provided it is immediately demonstrable.
/tests/lib/roman.hoon
Click to expand
/+ *test, *roman|%++ test-output-one=/ src "i"=/ trg 1;: weld%+ expect-eq!> trg!> (from-roman src)%+ expect-eq!> trg!> (from-roman (cuss src))==++ test-output-two=/ src "ii"=/ trg 2;: weld%+ expect-eq!> trg!> (from-roman src)%+ expect-eq!> trg!> (from-roman (cuss src))==:: and so forth++ test-input-one=/ trg "i"=/ src 1;: weld%+ expect-eq!> trg!> (to-roman src)==++ test-input-two=/ trg "ii"=/ src 2;: weld%+ expect-eq!> trg!> (to-roman src)==:: and so forth--
By composing the unit tests ahead of time, you exercise a discipline of thinking carefully through details of the interface and implementation before you write a single line of implementation code.
Debugging Common Errors
Let’s enumerate the errors you are likely to have encountered by this point:
nest-fail
nest-fail
may be the most common. Likely you are using an atom or a cell where the other is expected.
> (add 'a' 'b')195> (add "a" "b")-need.@-have.[i=@tD t=""]nest-faildojo: hoon expression failed
mint-nice
mint-nice
arises from typechecking errors:
> ^-(tape ~[78 97 114 110 105 97])mint-nice-need.?(%~ [i=@tD t=""])-have.[@ud @ud @ud @ud @ud @ud %~]nest-faildojo: hoon expression failed
Conversion without casting via auras fails because the atom types (auras) don't nest without explicit downcasting to @
.
> `(list @ud)`~[0x0 0x1 0x2]mint-nice-need.?(%~ [i=@ud t=it(@ud)])-have.[@ux @ux @ux %~]nest-faildojo: hoon expression failed> `(list @ud)``(list @)`~[0x0 0x1 0x2]~[0 1 2]
fish-loop
A fish-loop
arises when using a recursive mold definition like list
. (The relevant mnemonic is that ++fish
goes fishing for the type of an expression.) Alas, this fails today:
> ?=((list @) ~[1 2 3 4])[%test ~[[%.y p=2]]]fish-loop
generator-build-fail
A generator-build-fail
most commonly results from composing code with mismatched runes (and thus the wrong children including hanging expected-but-empty slots).
Also check if you are using Windows-style line endings, as Unix-style line endings should be employed throughout Urbit.
Misusing the $
buc Arm
Another common mistake is to attempt to use the default $
buc arm in something that doesn't have it. This typically happens for one of two reasons:
$.+2
means that%-
cenhep or equivalent function call cannot locate a battery. This can occur when you try to use a non-gate as a gate. In particular, if you mask the name of a mold (such aslist
), then a subsequent expression that requires the mold will experience this problem.> =/ list ~[1 2 3]=/ a ~[4 5 6]`(list @ud)`a-find.$.+2-find.$
similarly looks for a$
buc arm in something that is a core but doesn't have the$
buc arm present.> *tape""> (tape)""> *(tape)-find.$
Debugging Strategies
What are some strategies for debugging?
- Debugging stack. Use the
!:
zapcol rune to turn on the debugging stack,!.
zapdot to turn it off again. (Most of the time you just pop this on at the top of a generator and leave it there.) printf
debugging. If your code will compile and run, employ~&
sigpam frequently to make sure that your code is doing what you think it’s doing.- Typecast. Include
^
ket casts frequently throughout your code. Entire categories of error can be excluded by satisfying the Hoon typechecker. - The only wolf in Alaska. Essentially a bisection search, you split your code into smaller modules and run each part until you know where the bug arose (where the wolf howled). Then you keep fencing it in tighter and tighter until you know where it arose. You can stub out arms with
!!
zapzap. - Build it again. Remove all of the complicated code from your program and add it in one line at a time. For instance, replace a complicated function with either a
~&
sigpam and!!
zapzap, or return a known static hard-coded value instead. That way as you reintroduce lines of code or parts of expressions you can narrow down what went wrong and why. - Run without networking. If you run the Urbit executable with
-L
, you cut off external networking. This is helpful if you want to mess with a copy of an actual ship without producing remote effects. That is, if other parts of Ames don’t know what you’re doing, then you can delete that copy (COPY!) of your pier and continue with the original. This is an alternative to using fakezods which is occasionally helpful in debugging userspace apps in Gall. You can also develop using a moon if you want to.