Wednesday, July 6, 2011

Common Lisp reader macros

Lisp has been called the programmers language by many people. Usually, it is called that because Lisp, unlike more conventional languages, gives the programmer the means to modify the language into something else entirely. I am going to look at examples of code written in other Lisps and using Common Lisps' macro and function facilities, write the equivalent in Common Lisp.

Reader Macros

Part 1 will demonstrate the use of reader macros. Reader macros allow you to literally change the Common Lisp syntax. We are going to add support for brackets as anonymous functions. This will approximate the syntax Paul Graham has created in Arc Lisp. It will also approximate the #() reader macro built into Clojure.

Quick aside regarding the Common Lisp template syntax

I'm going to be using the Common Lisp template syntax for the macros demonstrated below. I'm going to assume you understand it. If not, this is a good beginner tutorial on Common Lisp macros and the template syntax.

Arc Lisp

To explain brackets as anonymous functions, here are some usage examples in Arc Lisp:

arc> ([* _ 10] 3)
30

arc> (map [* _ 3] '(1 2 3 4))
(3 6 9 12)

Now for comparison, here are the same examples expanded into long form:

arc> ((fn (_) (* _ 10)) 3)
30

arc> (map (fn (_) (* _ 3)) '(1 2 3 4))
(3 6 9 12)

As you can see, the short form reader macro is (in my opinion) easier to both read and type. It saves at least 4 parens each use. So it is a valuable abstraction.

Common Lisp does not have this macro built in. So let's create it. First off, this blog post is an excellent beginner tutorial on Common Lisp reader macros. After reading it, it became clear to me that implementing Arc-style anonymous functions using brackets would be trivially easy.

(set-macro-character #\[
         (lambda (stream char)
           (let ((sexp (read-delimited-list #\] stream t)))
         `(lambda (_) (,@sexp)))))

(set-macro-character #\]
         (get-macro-character #\)))

Ok, so the set-marco-character function takes in two arguments. The first argument is the character to watch for. The second argument is the function to execute when that character is typed.

The first use of set-macro-character is saying, "When an open bracket character is typed, read the following stream of characters into an s-expression, then return an anonymous function with that s-expression as the body of the function. The function read-delimited-list, reads a stream of characters into a list (delimited by space by default) until it reaches a close bracket character as is its first argument.

The second set-macro-character says, "Treat close bracket characters as if they were closing parens".

Let's try it out.

CL-USER> ([* _ 10] 3)
30

CL-USER> (mapcar [* _ 3] '(1 2 3 4))
(3 6 9 12)

Yea! It works. Let's not stop there.

emacs support

If you are like me and use emacs with paredit to do your Lisp hacking, you are going to want to have emacs treat brackets the same way it treats parens. Here's what you need to add to your .emacs:

(modify-syntax-entry ?[ "(]" lisp-mode-syntax-table)
(modify-syntax-entry ?] ")[" lisp-mode-syntax-table)

Clojure

The problem with the reader macro we've just implemented is that it only expands to a one argument function. Not very versatile.

The Clojure anonymous function reader macro, #(), can take one argument, multiple numbered arguments, or a "rest" list argument. Here's what they look like in action:

user> (#(* % 10) 3)
30

user> (map #(* % 3) [1 2 3 4])
(3 6 9 12)

user> (map #(* %1 %2) [1 2 3 4] [5 6 7 8])
(5 12 21 32)

user> (map #(apply str %&) ["hello, " "clojure, "] ["world" "rocks"])
("hello, world" "clojure, rocks")

Let's implement all 3 of these in Common Lisp.

;; clojure idiom
(require 'cl-ppcre)
(defun numbered-arg-as-string (arg)
  (cl-ppcre:scan-to-strings "^%\\d+$" (string arg)))

(defun single-arg-as-string (arg)
  (let ((sarg (string arg)))
    (when (string-equal "%" sarg)
      sarg)))

(defun arc-arg-as-string (arg)
  (let ((sarg (string arg)))
    (when (string-equal "_" sarg)
      sarg)))

(defun rest-arg-as-string (arg)
  (let ((sarg (string arg)))
    (when (string-equal "%&" sarg)
      sarg)))

(defun flatten (l)
  "flattens a list"
  (cond ((null l) l)
      ((atom l) (list l))
    (t (append (flatten (car l))
           (flatten (cdr l))))))

(defun make-arg-list (predicate delimited-list)
  (labels ((string-list (delimited-list)
         (mapcar (lambda (x)
               (cond ((symbolp x) (funcall predicate x))
                 ((listp x) (string-list x))))
             delimited-list)))
    (remove-duplicates (mapcar #'intern
                   (sort (flatten (string-list delimited-list))
                     #'string-lessp) ;; BUG: if more than 9 numbered arguments are used
                   ))))

;; first check for numbered args, 
;; then for a single % arg, 
;; finally default to a single _ arg
;; swallow the rest args to get around style warnings
(set-macro-character #\[
             (lambda (stream char)
               (let* ((sexp (read-delimited-list #\] stream t))
                  (args (make-arg-list #'numbered-arg-as-string sexp))
                  (rest-args (make-arg-list #'rest-arg-as-string sexp))
                  (rest-arg (or (car rest-args) (gensym))))
             (unless args
               (setf args (make-arg-list #'single-arg-as-string sexp)))
             (unless args
               (setf args (make-arg-list #'arc-arg-as-string sexp))) ;; arc idiom (_)
             `(lambda (,@args &rest ,rest-arg) (identity ,rest-arg) (,@sexp)))))

(set-macro-character #\]
             (get-macro-character #\)))

So this is a lot more code than the Arc-style implementation. Really, it is the same thing. The only difference is that we need to look inside the s-expression "sexp" and find all the possible arguments. That is what make-arg-list does. A predicate is passed to make-arg-list to check for the different types of args available (eg % or %1, %2, ... or _). make-arg-list then needs to return only the distinct arguments found sorted in ascending order. This list will be turned into ,@args in our lambda expression. Also, if the argument %& is found in the s-expression, it will be turned into the &rest arg in our lambda expression.

Couple of interesting tidbits. If %& is not used in the s-expression (our code body that is), but more arguments are passed to our resulting lambda than we are accounting for, we are swallowing them in a &rest argument whose name is a (gensym) symbol. In other words, we don't care about them at all but we still want our code to run without error. If we do that, some Common Lisps (ie sbcl) will give you a warning about declaring an argument without using it. We get around seeing that warning by calling the identity function on the anonymous &rest argument before executing our code body. Yes, this wastes a CPU cycle, but who cares. For our purposes, seeing that warning message everytime seems way more annoying.

The same examples in Common Lisp now look like:

CL-USER> ([* % 10] 3)
30

CL-USER> (mapcar [* % 3] '(1 2 3 4))
(3 6 9 12)

CL-USER> (mapcar [* %1 %2] '(1 2 3 4) '(5 6 7 8))
(5 12 21 32)

CL-USER> (mapcar [apply 'concatenate 'string %&] 
          '("hello, " "common-lisp, ") '("world" "rocks"))
("hello, world" "common-lisp, rocks")

All of these examples were looking great until we got to the last one. Why do we have to remember to type the symbol 'string before the rest variable? This is precisely the kind of esoteria we should be striving to abstract away. Abstracting that away is what we will be doing in Part 2.