The Cat Programming Language Tutorial

Cat is a higher-order stack-oriented language. A Stack-oriented language (also known as stack-based) store and share data on one or more memory structures called the stack. Some assembly languages and intermediate languages (CIL and Java bytecode) work are examples of stack-oriented languages. Other well-known examples of stack-oriented are Postscript and Forth. Lesser known examples include Joy and Factor

Most stack-based languages use reverse polish notation (RPN), also called postfix notation, which means that arithmetic operands are written before operators. For example, the arithemtic expression written using standard infix notation would be:

    (1 + 5) * 7
in RPN would be:
    1 5 + 7 *
In RPN there is no need for parantheses, because all operators have the same precedence; everything happens in strict left to right order.

To understand how RPN and stack-oriented languages work, it is helpful to think of values (such as 1 or 5) as instructions which "push" data on to the stack. Arithmetic operations like + and * are instructions which "pop" values from the stack, perform some transformation, and push the value back on to the stack.

Using a Cat interpreter (such as our online Cat interpreter) we can see explicitly what happens if we enter the Cat expression one instruction at a time (remember, we consider "1" and "5" to be instructions). Below is a transcript using the stand-alone Cat interpreter (all examples are given with regards to the stand-alone interpreter which has more features than the online interpreter)

    >> 1
    stack: 1
    >> 5
    stack: 1 5
    >> +
    stack: 6
    >> 7
    stack: 6 7
    >> *
    stack: 42

What we call instructions are also frequently called "words" in stack-oriented languages.

Manipulating the Stack

In addition to performing arithmetic in a funky RPN style, stack-oriented langauges provide instructions for manipulating the stack, for example to remove the top item (pop), duplicate the top item (dup), or swap the top two items (swap). While the names vary between the different stack-based langauges, these basic concepts are universal. Here are some examples of how the various stack-manipulation instructions work

    >> 1 2 3
    stack: 1 2 3
    >> swap
    stack: 1 3 2
    >> swap
    stack: 1 2 3
    >> dup
    stack: 1 2 3 3
    >> pop
    stack: 1 2 3
    >> [pop] dip
    stack: 1 3
We introduced a new instruction dip at the end of the previous example. This instruction applies a sequence of instructions to the stack, while temporarily removing the top item. An other example of using dip is:
    >> 1 2 3
    stack: 1 2 3
    >> [swap] dip
    stack: 2 1 3
    >> swap
    stack: 2 3 1
Using the dip instruction is useful because we can perform operations at different depths on the stack. The square brackets around a set of instructions (e.g. [ and ]) is our way of saying don't evaluate the instructions right away, just push them on to the stack. If we wrote: 1 2 3 swap dip it would result in an error.

Defining New Instructions

We can define new instructions in Cat using a special form called a definition. The compiler recognizes definitions because they start with the define keyword.

In the previous example we managed to take the third item on the stack and placing it on the top by writing [swap] dip swap. This is called the dig instruction in Cat, and can be defined as:

    define dig { [swap] dip swap }
At this point if you are using the stand-alone interpreter you should notice that it outputs something interesting:
    dig : ('a 'b 'c -> 'b 'c 'a)
This is called the type signature of dig. The Cat interpreter automatically infers the type of new instructions for us, and displays the type signature.

Type Signatures

A type signature is a representation of the type of an instruction. In stack-oriented jargon this is frequently called a stack-effect diagram, because it tells us what effect the instruction will have on the top elements of the stack. In Cat type signature can be included in the definition, but this is optional. If the type signature is left out the type will be inferred.

Sidebar: Some people may read the preceding paragraph and wonder "Why would anyone bother writing a type signature?". The reason is that it helps programmers to find bugs in their software earlier. By explcitily stating what you expect to happen (i.e. using a type signature), the compiler can compare this with what will actually happen, and tell you where you went wrong. Without this mechanism if you make a mistake, the interpreter will simply accept what you do without complaint and do something unexpected. Such errors, or bugs as they are frequently called, waste value time to track down and remove, when you could be watching YouTube or playing NetHack.

The type of an expression is a classification of according to the configuration of items on the stack it consumes and produces. For example the add_int and sub_int instruction both have the same type: (int int -> int).

Note that all expressions, not just instructions, have a type. You can view an expression's type in the stand-alone interpreter by writing the expression as surrounded by square brackets and using the special instruction "#t". For example:

    >> [dup add_int add_int] #t
    ...
    type: (int int -> int)

Instructions are Functions

If you are familiar with the notion of functions, a Cat instruction is actually a stack transformation function. It accepts a stack as input and returns a stack as output. This notion is too general to be useful, so we often think of the arguments to a function as the values that an instruction actually reads (pops) off of the top of the stack, and the result(s) of a function as the values that the instructions pushes onto the stack. In Cat lingo, we sometimes call the arguments the consumption, and the results the production.

Expressions, Statements, and Variables

Expressions

An expression in Cat is any sequence of instructions. Each expression corresponds to a stack transformation function. Consider two instructions f and g. Remember that each instruction is a stack transformation function. The sequence of these two instructions, f g, is also a stack transformation function that is the result of performing the mathematical composition of these functions: g.f.

Statements

Cat consists entirely of expressions: there is no concept of statement. Every sequence of instructions denotes a stack transformation.

Variable

Cat technically has no variables. All intermediate experssion results are stored on the stack. However there are exceptions, because the stand-alone Cat interpreter does support named arguments and lambda expressions.

Quotations: Delayed Evaluation

Becaue we don't always want to evaluate expressions immediately, there is a concept of quotation. A quotation in Cat is denoted by an expression surrounded by square brackets. For example: [1 +]. In order to evaluate an expression that has been quoted, we can use an instruction such as apply, which evaluates the quotation on the top of the stack immediately.

Equivalently we can think of a quotation as a function without a name on the stack, waiting to be called (or duplicated or popped or whatever).

Sidebar: The Cat instructions quote and apply are congruent to the functions delay and force in Scheme.

Conditionals

In Cat a conditional expression can be defined using the if instruction. The if instruction expects three values on the stack. A boolean value denoting the condition, above that a quotation to be evaluated if the condition is true and on the top of the stack a quotation to be evaluated if the quotation is false. An example of using the if instruction is:

    1 1 +
    eq 2
    ["As I expected!"]
    ["I need to be repaired!"]
    if

Iterative Loops

In Cat looping is often performed using the while instruction. The while instruction takes two quotations off of the stack: the loop body, and above that the loop invariant. As an exercise try to figure out what the following code will do, and then try it out.

    1
    [dup writeln inc]
    [dup 10 lteq]
    while

Lists

Constructing Lists

You construct lists in Cat by calling the to write a quotation and then call the list instruction. The constructs a list from the results of the instruction. For example [3 2 1] list yields the list (3 2 1) where 1 is the first item. The list in effect constructs a new stack, executes the quotation, and uses that stack as the list. Note that lists read left to right, which would be counter-inutitive if it weren't for the fact that we are in a postfix language! The only restriction on the quotation is that it must not look at anything else on the stack. In other words it must be valid if executed on an empty stack. This is conveyed in the type signature of the list function:

    list : (( -> 'A) -> list)

Another method of constructing a list is by using the empty list cosntructor nil, -- which is is equivalent to [] list -- and then adding items to the front of the list using cons. For example nil 3 cons 2 cons 1 cons, will produce our earlier list of (3 2 1)

Accessing Items in Lists (Deconstructing Lists)

The most fundamental form for accessing items in a list is the uncons instruction which breaks a list in two-parts, the first item and the rest of the list. Both first and rest are naturally defined as functions in the standard library. Similar to these instructions are head and tail which remove the list off the stack.

You can extract the nth item from a list using the instruction nth, and the number of items in a list using count.

Higher-Order List Functions

Some of the more interesting functions in Cat are the higher-order list functions. The term higher-order function (sometimes abbreviated HOF) is a functions that accepts functions as inputs or returns functions as a result.

One of the simplest higher-order list function is map that applies a transformation to each item in a list.

    >> [1 2 3] list [2 *] map
    stack: (2 4 6)
    >> [1 +] map
    stack: (3 5 7)

Another HOF is foreach, which applies a function to to each item in a list, and the rest of the stack.

    >> [4 3 2 1] list [writeln] foreach
    1
    2
    3
    4
    stack: _empty_
    >> 0 [1 2 3 4] list [+] for_each
    stack: 10
    >> pop [] [4 3 2 1] list [cons] for_each
    stack: (1 2 3 4)
Notice that for_each is similar to a fold instruction in other functional languages but that the function argument is any function that take n arguments and returns n - 1 results.

Cat Extensions

Named Arguments

While the official Cat language has no variables, the stand-alone Cat interpreter does provide an extension for writing definitions with named parameters. Named arguments correspond to values at specific locations on the stack.

    define pop-with-names(x) { }
    define swap-with-names(a b) { b a }
    define dup-with-names(a) { a a }
    define quadratic-with-names(x a b c) { a x x * * b x * c + + }

Lambda Expressions

The Cat interpreter provides an extension for writing Lambda calculus (LC) style expressions with named arguments. The syntax looks like:

    \a.[a a] apply          // dup instruction
    \a.[] apply             // pop instruction
    \a.\b.[b a] apply       // swap instruction
    \a.\b.\c.[c a b] apply  // bury instruction
The purpose of this extension is to explore the correspondance between programming languages with named arguments and Cat. In the interpreter is a simple algorithm for converting from LC expressions to standard Cat.

The Cat Programming Language by Christopher Diggins