Wednesday, July 6, 2011

Common Lisp lazy sequences

When we left things in Part 2, we needed a function that would return an infinite sequence of the same argument. In Clojure, we saw that function is used like so:

(repeat "+") ;; "+" is the argument to repeat infinitely

Common Lisp does not have infinite sequences built in, but they are easy to implement. The Common Lisp library clazy is an implementation of them. For the sake of demonstration, we will sort of reinvent that implemenation in a simpler fashion.

Lazy sequences

Without further ado, here is the code to implement a very rudimentary lazy sequence library:

(defpackage my-lazy
  (:use :cl)
  (:shadow cl:cdr cl:mapcar)
  (:export "CDR" "MAPCAR" "DELAY" "FORCE" "REPEAT"))

(in-package :my-lazy)

(defstruct thunk
  body)

(defun thunkp (arg)
  (eq (type-of arg) 'thunk))

(defmacro my-lazy:delay (expr)
  `(make-thunk
   :body (lambda () ,expr)))

(defun my-lazy:force (thunk)
  (if (thunkp thunk)
      (funcall (thunk-body thunk))
      thunk))

(defun my-lazy:repeat (arg)
  (cons arg
    (delay (repeat arg))))

(defun my-lazy:cdr (cons)
  "cdr for lists, force cdr for thunks"
  (force (cl:cdr cons)))

(defun my-lazy:mapcar (f list &rest more-lists)
  "Apply FUNCTION to successive elements of LIST. 
Return list of FUNCTION return values.
lists can be lazy"
  (cons (apply f
           (car list)
           (cl:mapcar 'car more-lists))
    (when (and (cdr list) (every #'identity more-lists))
      (apply 'mapcar
         f
         (cdr list)
         (cl:mapcar 'cdr more-lists)))))

The first thing you'll notice is that we create a lazy package called "my-lazy" that when used will shadow some core functions (cdr and mapcar). This shadowing is necessary because we need these functions to operate similarly regardless of if the sequence is a list or a lazy list.

Also, you'll notice that we create a structure called a "thunk" (eg defstruct thunk). Wikipedia defines a thunk as:

In computer science, a thunk (also suspension, suspended computation or delayed computation) is a parameterless closure created to prevent the evaluation of an expression until forced at a later time.

We define a thunk as a structure instead of simply a parameterless lambda so that our thunks have a type unique from any other parameterless function.

The two key items here are delay and force. Delay is a macro that creates a thunk from an expression. Force is a function that forces the evaluation of a thunk.

Cdr is redefined to "force" evaluation of the core cdr of a cons cell if the core cdr is a thunk. Otherwise, our cdr simply returns the core cdr of the cons cell.

Mapcar is also redefined. Basically, it is only redefined to use our version of cdr. Besides that, it does basically the same thing as the core mapcar.

Now let's check our example. First in Clojure:

user> (map (partial str "price & tip: ") [5000 100 50] (repeat "+") [2000 40 10])
("price & tip: 5000+2000" "price & tip: 100+40" "price & tip: 50+10")

Next using our new package in Common Lisp:

CL-USER> (in-package :my-lazy)
#<PACKAGE "MY-LAZY">

MY-LAZY> (mapcar (partial 'str "price & tip: ") 
          '(5000 100 50) (repeat "+") '(2000 40 10))
("price & tip: 5000+2000" "price & tip: 100+40" "price & tip: 50+10")

Huzzah! We have successfully replicated standard Clojure functionality in Common Lisp.

Conclusion

In Part 1 we looked at modifying the Common Lisp syntax using reader macros. In Part 2 we looked at simplifying function calls using currying techniques. Here, in Part 3, we implemented infinite sequences which eliminated a (sometimes) hard to find bug.

The great thing about Common Lisp is that it is extremely easy to implement all of these concepts. In most other languages, implementing these concepts would almost definitely be either more difficult or not possible. The only drawback is that often there is too much freedom. The Common Lisp ethos is such that developers are encoraged to implement these concepts on their own. The result is, unlike the Clojure world, often there is no cannonical answer to these problems (such as lazy sequences). As a fan of Common Lisp, I'd like to see it gain more traction. Perhaps that will take one killer application (like Rails for the Ruby language). For that to happen, it will probably also need more coherance across the community. Regardless, because of its flexibility, it is still, in my opinion, the ultimate hacker language.

Common Lisp currying

In Part 1 we looked at adding to the Common Lisp syntax using a reader macro. Now we are going to use a regular function to implement a technique called "currying". We do this to abstract away a function call with a common argument. So instead of calling the same function with the same argument over and over again in your code, you can curry that function call w/ argument into a symbol that can be called on its own.

Currying

Wikipedia defines currying as:

In mathematics and computer science, currying is the technique of transforming a function that takes multiple arguments (or an n-tuple of arguments) in such a way that it can be called as a chain of functions each with a single argument (partial application).

In Part1, we needed to curry the function "concatenate" with the argument "'string". There are a number of ways to do this. We are going to build a currying function called partial that approximates the Clojure function of the same name.

Partial

The example which uses the partial function in Clojure comes from here. Below is the same example using the #() reader macro from Part 1 along with a slightly more readable and compact version using partial.

user> (map #(apply str "price & tip: " %&) 
              [5000 100 50] (repeat "+") [2000 40 10])
("price & tip: 5000+2000" "price & tip: 100+40" "price & tip: 50+10")

user> (map (partial str "price & tip: ") 
              [5000 100 50] (repeat "+") [2000 40 10])
("price & tip: 5000+2000" "price & tip: 100+40" "price & tip: 50+10")

Notice how the use of partial gets us out of the prosaic work of using a reader macro and "applying" a list of arguments to the magic "%&" argument. The "%&" list argument is implied when we use partial.

Partial turns out to be another trivially easy function to write in Common Lisp.

(defun partial (f &rest args)
  "currying function"
  (lambda (&rest more-args)
    (apply f (append args more-args))))

Essentially, we are merging together the arguments we know with the arguments we will be passing into the curried function. Let's demonstrate it in action.

CL-USER> (mapcar [apply 'concatenate 'string "price & tip: " %&] 
          '("5000" "100" "50") (loop repeat 3 collect "+") '("2000" "40" "10"))
("price & tip: 5000+2000" "price & tip: 100+40" "price & tip: 50+10")

CL-USER> (mapcar (partial 'concatenate 'string "price & tip: ") 
          '("5000" "100" "50") (loop repeat 3 collect "+") '("2000" "40" "10"))
("price & tip: 5000+2000" "price & tip: 100+40" "price & tip: 50+10")

CL-USER> (setf (symbol-function 'concatenate-string) 
          (partial 'concatenate 'string))
#<CLOSURE (LAMBDA (&REST MORE-ARGS)) {1007DC0E09}>

CL-USER> (mapcar (partial 'concatenate-string "price & tip: ") 
          '("5000" "100" "50") (loop repeat 3 collect "+") '("2000" "40" "10"))
("price & tip: 5000+2000" "price & tip: 100+40" "price & tip: 50+10")

Now we are getting somewhere. Using partial we've been able to turn the symbol 'concatenate-string into almost the same thing as the Clojure function "str".

String methods

The more astute readers may notice the Common Lisp examples still contain a little extra cruft not found in the Clojure examples.

The two pieces of cruft I am referring to are:

  • The Common Lisp version contains double quotes around the price literals. Thus starting them out as strings. The Clojure version simply uses numbers and within the "str" function they get converted to strings. We'll adapt the Common Lisp version to do the same.
  • The more dastardly of the crufty items is the lack of Clojure's "repeat" function. The Clojure "repeat" function simply repeats the argument infinitely as a sequence of items. This works in our example because map stops when one of the finite list args runs out of items. Our Common Lisp version relies on us to use the "loop" macro. In using "loop", we must specify an iteration value that matches exactly the length of the longest of the other list args. Failure to do so will cause our code to short circuit. This can be the source of maintenance bugs in production code.

Let's tackle the first piece of cruft. Here's the Common Lisp code:

(defmethod to-string (arg) (string arg))
(defmethod to-string ((arg integer)) (write-to-string arg))

(defun str (&rest args)
  (apply 'concatenate-string  (mapcar #'to-string args)))

The first thing to notice is the use of "defmethod" instead of "defun". Defmethod is a way to create functions that are dispatched differently at runtime based on argument type. So if we pass an integer argument to "to-string", the result will be:

(write-to-string arg)

Whereas, if we pass an argument of any other type to "to-string" the result will be:

(string arg)

In other words, the latter "result" function is the default, whereas the former is a specific type implemenation. This gives us great flexibility to extend "to-string" in the future without redefining any current methods.

The "str" function behaves just like the Clojure equivalent. It takes in a list of arguments and turns them all into strings before sending them along to our "concatenate-string" function.

Approximating the Clojure "repeat" function is a little more tricky. It will require us to implement infinite lists. Part 3 will demonstrate how to do that.

Common Lisp reader macros

Lisp has been called the programmers language by many people. Usually, it is called that because Lisp, unlike more conventional languages, gives the programmer the means to modify the language into something else entirely. I am going to look at examples of code written in other Lisps and using Common Lisps' macro and function facilities, write the equivalent in Common Lisp.

Reader Macros

Part 1 will demonstrate the use of reader macros. Reader macros allow you to literally change the Common Lisp syntax. We are going to add support for brackets as anonymous functions. This will approximate the syntax Paul Graham has created in Arc Lisp. It will also approximate the #() reader macro built into Clojure.

Quick aside regarding the Common Lisp template syntax

I'm going to be using the Common Lisp template syntax for the macros demonstrated below. I'm going to assume you understand it. If not, this is a good beginner tutorial on Common Lisp macros and the template syntax.

Arc Lisp

To explain brackets as anonymous functions, here are some usage examples in Arc Lisp:

arc> ([* _ 10] 3)
30

arc> (map [* _ 3] '(1 2 3 4))
(3 6 9 12)

Now for comparison, here are the same examples expanded into long form:

arc> ((fn (_) (* _ 10)) 3)
30

arc> (map (fn (_) (* _ 3)) '(1 2 3 4))
(3 6 9 12)

As you can see, the short form reader macro is (in my opinion) easier to both read and type. It saves at least 4 parens each use. So it is a valuable abstraction.

Common Lisp does not have this macro built in. So let's create it. First off, this blog post is an excellent beginner tutorial on Common Lisp reader macros. After reading it, it became clear to me that implementing Arc-style anonymous functions using brackets would be trivially easy.

(set-macro-character #\[
         (lambda (stream char)
           (let ((sexp (read-delimited-list #\] stream t)))
         `(lambda (_) (,@sexp)))))

(set-macro-character #\]
         (get-macro-character #\)))

Ok, so the set-marco-character function takes in two arguments. The first argument is the character to watch for. The second argument is the function to execute when that character is typed.

The first use of set-macro-character is saying, "When an open bracket character is typed, read the following stream of characters into an s-expression, then return an anonymous function with that s-expression as the body of the function. The function read-delimited-list, reads a stream of characters into a list (delimited by space by default) until it reaches a close bracket character as is its first argument.

The second set-macro-character says, "Treat close bracket characters as if they were closing parens".

Let's try it out.

CL-USER> ([* _ 10] 3)
30

CL-USER> (mapcar [* _ 3] '(1 2 3 4))
(3 6 9 12)

Yea! It works. Let's not stop there.

emacs support

If you are like me and use emacs with paredit to do your Lisp hacking, you are going to want to have emacs treat brackets the same way it treats parens. Here's what you need to add to your .emacs:

(modify-syntax-entry ?[ "(]" lisp-mode-syntax-table)
(modify-syntax-entry ?] ")[" lisp-mode-syntax-table)

Clojure

The problem with the reader macro we've just implemented is that it only expands to a one argument function. Not very versatile.

The Clojure anonymous function reader macro, #(), can take one argument, multiple numbered arguments, or a "rest" list argument. Here's what they look like in action:

user> (#(* % 10) 3)
30

user> (map #(* % 3) [1 2 3 4])
(3 6 9 12)

user> (map #(* %1 %2) [1 2 3 4] [5 6 7 8])
(5 12 21 32)

user> (map #(apply str %&) ["hello, " "clojure, "] ["world" "rocks"])
("hello, world" "clojure, rocks")

Let's implement all 3 of these in Common Lisp.

;; clojure idiom
(require 'cl-ppcre)
(defun numbered-arg-as-string (arg)
  (cl-ppcre:scan-to-strings "^%\\d+$" (string arg)))

(defun single-arg-as-string (arg)
  (let ((sarg (string arg)))
    (when (string-equal "%" sarg)
      sarg)))

(defun arc-arg-as-string (arg)
  (let ((sarg (string arg)))
    (when (string-equal "_" sarg)
      sarg)))

(defun rest-arg-as-string (arg)
  (let ((sarg (string arg)))
    (when (string-equal "%&" sarg)
      sarg)))

(defun flatten (l)
  "flattens a list"
  (cond ((null l) l)
      ((atom l) (list l))
    (t (append (flatten (car l))
           (flatten (cdr l))))))

(defun make-arg-list (predicate delimited-list)
  (labels ((string-list (delimited-list)
         (mapcar (lambda (x)
               (cond ((symbolp x) (funcall predicate x))
                 ((listp x) (string-list x))))
             delimited-list)))
    (remove-duplicates (mapcar #'intern
                   (sort (flatten (string-list delimited-list))
                     #'string-lessp) ;; BUG: if more than 9 numbered arguments are used
                   ))))

;; first check for numbered args, 
;; then for a single % arg, 
;; finally default to a single _ arg
;; swallow the rest args to get around style warnings
(set-macro-character #\[
             (lambda (stream char)
               (let* ((sexp (read-delimited-list #\] stream t))
                  (args (make-arg-list #'numbered-arg-as-string sexp))
                  (rest-args (make-arg-list #'rest-arg-as-string sexp))
                  (rest-arg (or (car rest-args) (gensym))))
             (unless args
               (setf args (make-arg-list #'single-arg-as-string sexp)))
             (unless args
               (setf args (make-arg-list #'arc-arg-as-string sexp))) ;; arc idiom (_)
             `(lambda (,@args &rest ,rest-arg) (identity ,rest-arg) (,@sexp)))))

(set-macro-character #\]
             (get-macro-character #\)))

So this is a lot more code than the Arc-style implementation. Really, it is the same thing. The only difference is that we need to look inside the s-expression "sexp" and find all the possible arguments. That is what make-arg-list does. A predicate is passed to make-arg-list to check for the different types of args available (eg % or %1, %2, ... or _). make-arg-list then needs to return only the distinct arguments found sorted in ascending order. This list will be turned into ,@args in our lambda expression. Also, if the argument %& is found in the s-expression, it will be turned into the &rest arg in our lambda expression.

Couple of interesting tidbits. If %& is not used in the s-expression (our code body that is), but more arguments are passed to our resulting lambda than we are accounting for, we are swallowing them in a &rest argument whose name is a (gensym) symbol. In other words, we don't care about them at all but we still want our code to run without error. If we do that, some Common Lisps (ie sbcl) will give you a warning about declaring an argument without using it. We get around seeing that warning by calling the identity function on the anonymous &rest argument before executing our code body. Yes, this wastes a CPU cycle, but who cares. For our purposes, seeing that warning message everytime seems way more annoying.

The same examples in Common Lisp now look like:

CL-USER> ([* % 10] 3)
30

CL-USER> (mapcar [* % 3] '(1 2 3 4))
(3 6 9 12)

CL-USER> (mapcar [* %1 %2] '(1 2 3 4) '(5 6 7 8))
(5 12 21 32)

CL-USER> (mapcar [apply 'concatenate 'string %&] 
          '("hello, " "common-lisp, ") '("world" "rocks"))
("hello, world" "common-lisp, rocks")

All of these examples were looking great until we got to the last one. Why do we have to remember to type the symbol 'string before the rest variable? This is precisely the kind of esoteria we should be striving to abstract away. Abstracting that away is what we will be doing in Part 2.