In a recent post I showed some
strange things that one can do with macros to be able to write SQL-like queries
right in Clojure. Several weeks ago we enabled selecting fields from records as
well as filtering them with where clause. In the end of the first post I promised you that we will soon add sorting capabilities to our query language and enable joining
“tables” together. Not that I am going to totally break this promise, but I
will deliver on it only partially: this time we will do sorting and the
sweetest part, joins, we leave for the final post.
Before proceeding, let me remind once more that
the code obtained through this exercise is not something that can be used in any
real application – it sucks from both performance and feature-richness
perspectives and is hardly idiomatic, nice or readable. Furthermore, while
doing the exercise I gradually came across the idea that what I show here is by
no means a good way to utilize the power of macros. Still, I am sure that implementing
a query language like this is a good exercise and can really help one
understand what macros are about, how to handle them and, at least, where one
should not use them. This way of learning is hard but it pays back a lot once
you put enough effort into it.
Let’s recap where did we stop last time.
First of all, we mastered picking fields from records and can write simple
queries like this one:
(def table [
{:a 1 :b 100 :c "100" :d 4}
{:a 2 :b 200 :c "200" :d 3}
{:a 3 :b 300 :c "300" :d 2}
{:a 4 :b 400 :c "400" :d 1}])
(select :c from table)
;=> ({:c "100"} {:c "200"} {:c "300"} {:c "400"})
However, since this is hardly an impressing achievement
and can be done in a much more concise way, we have gone further and allowed
for filtering – starting from very simple conditions and proceeding to quite
long and complex ones:
(select :a :b from table where :a < :d)
;=> ({:a 1, :b 100} {:a 2, :b 200})
(select :a :b from table where :a = 2 or (:d < 3 and :b > 300))
;=> ({:a 2, :b 200} {:a 4, :b 400})
Our key goals were to make the syntax as close
to SQL as possible and I think we succeeded at it – at the cost of introducing a lot of
complex stuff, whose main purpose is to deal with symbols like from, where and others. Now let’s take a look
at our macros and see what we can do to make ordering possible. The key player
in our team is select:
(defmacro select [& what]
(let [fields (set
(take-while #(not= 'from %) what))
source (fnext
(drop-while #(not= 'from %) what))
conditions (next (drop-while #(not= 'where %) what))]
`(map (fn [record#]
(into {} (filter #(~fields (first %)) record#)))
(filter #(condition % ~@conditions) ~source))))
For now, select is aware of from and where clauses, which it uses to find the desired fields, the source table and the filtering conditions in a query. To
enable sorting we have to pay attention to the order by clause as well and here I am going
to cheat. For the sake of making code a tiny bit simpler I will use a single orderby word without a whitespace – this
will save us a couple keystrokes. The only place in our code where we have to pay
attention to this symbol is in the select macro, where we have to retrieve the actual
ordering conditions from the original query. For this we use the same approach
as with where –
basically skip everything up to the orderby symbol and the symbol itself. At the same
time, we should remember that the new clause at the end of the query might introduce a problem: we don’t want this stuff to get into the condition macro, because upon meeting
orderings it will immediately spit an “Unable to resolve symbol: orderby” error. Thus, we also restrict the
conditions list on both sides:
(defmacro select [& what]
(let [fields (set
(take-while #(not= 'from %) what))
source (fnext
(drop-while #(not= 'from %) what))
conditions (take-while #(not= 'orderby %)
(next (drop-while #(not= 'where %) what)))
orderings (next (drop-while #(not= 'orderby %) what))]
`(map (fn [record#]
(into {} (filter #(~fields (first %)) record#)))
(filter #(condition % ~@conditions) ~source))))
Now, once we have the orderings list extracted
from the query, we can use it in some way. Because the select macro doesn’t look pretty
straightforward already, it is a good idea to implement the sorting stuff
elsewhere and just use it in select. For this purpose we are going to introduce
another macro - order. To start with, let us make it distinguish between 2
situations: when no ordering is requested and when there is actually some stuff
in the ordering conditions list. For now we can simply return nil in the latter case. The place to
plug ordering in select seems quite obvious: sorting should be done after filtering so we will
just wrap the existing body of select (the syntax-quoted part) in a call to the new macro:
(defmacro order [orderings what]
(if (empty? orderings)
what
nil))
(defmacro select [& what]
(let [fields (set
(take-while #(not= 'from %) what))
source (fnext
(drop-while #(not= 'from %) what))
conditions (take-while #(not= 'orderby %)
(next (drop-while #(not= 'where %) what)))
orderings (next (drop-while #(not= 'orderby %) what))]
`(order ~orderings
(map (fn [record#]
(into {} (filter #(~fields (first %)) record#)))
(filter #(condition % ~@conditions) ~source)))))
If you execute any query with the updated macro
you will see that the only thing that has changed is that now we get nil when there is a non-empty orderby clause:
(select :c from table where :a > 1)
;=> ({:c "200"} {:c "300"} {:c "400"})
(select :b :c from table where :a > 1 orderby :b)
;=> nil
Let’s take this as an evidence that we didn’t
break anything and proceed with sorting. First, what are we expecting to get in
the list of ordering conditions? Definitely, it should contain keywords denoting the fields to order by. Besides, each of these keywords may have an
optional asc or desc symbol after it, standing for ascending
and descending order, respectively. When the keyword is not followed by such a
symbol, we will assume ascending order, similarly to what SQL dialects do. Now,
this is not a completely trivial thing to implement, so let us first do
something simple – for example, handle the case when there are only keywords in
the orderings list. For this we only need to use the built-in sort-by and juxt functions in the order macro:
(defmacro order [orderings what]
(if (empty? orderings)
what
`(sort-by (juxt ~@orderings) ~what)))
(def table [
{:a 1 :b 400 :c "200" :d 4}
{:a 2 :b 300 :c "100" :d 3}
{:a 3 :b 200 :c "400" :d 2}
{:a 4 :b 100 :c "300" :d 1}])
(select :b :c from table where :a > 1 orderby :b)
;=> ({:c "300", :b 100} {:c "400", :b 200} {:c "100", :b 300})
(pprint (select :a :b :c :d from table orderby :d))
;=> ({:a 4, :c "300", :b 100, :d 1}
;=> {:a 3, :c "400", :b 200, :d 2}
;=> {:a 2, :c "100", :b 300, :d 3}
;=> {:a 1, :c "200", :b 400, :d 4})
The examples show us that filtering and the
simplest sorting facilities both work nicely. Still, if we chose to ask for
ascending or descending order explicitly, the compiler will award us with
something like “Unable to resolve symbol: asc”. Let us allow for these symbols and
get closer to our goal. Here I want to start with something that I have been
avoiding since the beginning of our strange exercise – we will get rid of the asc and desc symbols in the order macro well before doing anything else. Even
though while implementing select and condition I got used to handling symbols in macros, they still bother me a lot
and I want to force them out as fast as possible. We will replace asc with 1 and desc with -1 – this feels pretty logical and
saves us from comparing everything to asc and desc as well as from the attempts to
prevent Clojure from evaluating these symbols.
(defmacro order [orderings what]
(if (empty? orderings)
what
(let [orderings-desym
(vec (map (fn [s]
(cond (= 'desc s) -1
(= 'asc s) 1
(keyword? s) s))))]
`(sort-by (juxt ~@orderings) ~what))))
Next, we need a way to understand which
keywords should go with which order. This would be very simple should we not
allow to skip the asc symbol, but with this extra convenience feature we have to find a way to
handle this without falling into some non-trivial loop-recur construct. Fortunately, this can be
done in a not very elegant but satisfactory manner:
(defmacro order [orderings what]
(if (empty? orderings)
what
(let [orderings-desym
(vec (map (fn [s]
(cond (= 'desc s) -1
(= 'asc s) 1
(keyword? s) s))))
orderings-separated
(->> orderings-desym
(partition-by keyword?)
(map (partial interpose 1))
flatten
(partition-all 2))]
`(sort-by (juxt ~@orderings) ~what))))
This seemingly strange sequence of functions
transforms the incoming list of ordering conditions into a sequence of pairs of
the form (:key 1) or (:key -1). To achieve this result we first
separate the list into sublists of keywords without ordering markers and those
of ordering markers themselves. Then we use interpose, which will add missing 1‘s, and flatten the resulting list
to finally split it into pairs. To make this easier
to understand let’s take a look at the example – here are the stages of this
transformation for a particular list of ordering conditions:
;0. [:a :b :c 1 :d :e -1 :f] ;initial orderings
;1. ((:a :b :c) (1) (:d :e) (-1) (:f)) ;partition-by keyword?
;2. ((:a 1 :b 1 :c) (1) (:d 1 :e) (-1) (:f)) ;map (partial interpose 1)
;3. (:a 1 :b 1 :c 1 :d 1 :e -1 :f) ;flatten
;4. ((:a 1) (:b 1) (:c 1) (:d 1) (:e -1) (:f)) ;partition-all 2
As you see, each keyword apart from the last
one gets its 1 or -1. – missing
tail is, in fact, not a big problem and we will deal with it shortly. Now we
have set up everything except for the actual sorting part – we need to
transform the new sequence of conditions into a list of functions suitable for sort-by. For this purpose, we can introduce
an auxiliary function and consume it in the order macro:
(defn order-fn
([k]
k)
([k ord]
(if (neg? ord)
`(fn [r#] (- (~k r#)))
k)))
(defmacro order [orderings what]
(if (empty? orderings)
what
(let [orderings-desym
(map (fn [s]
(cond (= 'desc s) -1
(= 'asc s) 1
(keyword? s) s)) orderings)
orderings-separated
(->> orderings-desym
(partition-by keyword?)
(map (partial interpose 1))
flatten
(partition-all 2))
order-funcs
(map #(apply order-fn %) orderings-separated)]
`(sort-by (juxt ~@order-funcs) ~what))))
(pprint (select :a :b :c :d from table orderby :a desc))
;=> ({:a 4, :c "300", :b 100, :d 1}
;=> {:a 3, :c "400", :b 200, :d 2}
;=> {:a 2, :c "100", :b 300, :d 3}
;=> {:a 1, :c "200", :b 400, :d 4})
It works! Let’s try to understand why. The key piece
is the order-fn, which
generates sorting functions for us. The one-argument version is trivial – it
takes a single key and returns it to the caller. Why do we need something this
stupid? Well, that’s simply our way to handle the last keyword in the orderings
sequence when it comes without explicit asc or desc. As seen in the example above in
this case it won’t have an ordering marker (1 or -1) in the orderings-separated list, but with the help of order-fn defined as above we can just ignore
this and everything will work fine.
The two-arguments version of order-fn is a bit less trivial but still manageable: it will either return the keyword if we want ascending order, or wrap it into a function that will "invert" the numeric value retrieved with the keyword otherwise. At the same time, here we have one nasty feature: it returns a syntax-quoted fn form instead of a mere function.
This might seem weird – and it is – but at the same time that’s the only way it
is going to work. The problem here is that we use order-fn in a macro to produce another
function – a closure around the k argument. This combination of macro, function
and closure somehow makes compiler vomit “No matching ctor
found for class user$order_fn$fn__*” errors. I must admit that I failed to understand this problem – if you
can explain it I would be grateful. Still, for now we, at least, can go on with
quoting.
The way we use order-fn is very simple – we map it onto the
list of (key, sorting order) pairs and then pass the resulting functions to juxt. The latter produces a single function,
which extracts all the values used for ordering from a record. Finally, in
combination with sort-by this gives us a properly ordered results set.
Now, you might have spotted several problems
with our solution. The most severe one (apart from the approach in general) is
that we allow sorting only by the numeric fields – strings won’t do. This is
because to produce an ordering function for desc we use simple numeric negation,
which isn’t compatible with strings, sequences or anything else. I have
tried to come up with a more generic solution, but the time and brainpower that
I had put into it were not enough. Fortunately, my goal of understanding macros
better and learning to be not afraid of quoting and unquoting allows me to live
with this. However, if you come up with a more generic implementation of order-fn it will be a great pleasure for me
to see it – please share it in comments!
On the other side, the use of quoting in the order-fn makes me think that something is
definitely wrong with our approach to the problem. The degree of complexity of
our macros also strengthens this feeling – I still can manage them, but it
ceased to be an easy task long before we produced a working implementation of orderby. I have already stated that this way of implementing SQL-ish queries doesn’t look like an idiomatic way to
use Clojure macros and all these concerns only reassure me in this idea. Still,
the audible tension that I had in my brains when fighting this stuff confirms
that the exercise is a damn good kata.
Here are some of my takeaways from this
iteration:
- Maybe the most frequent error that I faced here
was the “Wrong number of arguments passed to a keyword” one. The reason for this was always that
Clojure tried to evaluate a list starting from a keyword when I didn’t mean it
to be evaluated. This same problem manifested itself in some other ways and I
find it the most important thing to tackle: you should always pay close attention
to what and when you are evaluating.
- I have expected to face some difficulties with
sorting over several fields – until I remembered that there is juxt in Clojure. So one of the lessons
from each and every exercise in programming is actually that there is likely a
function to solve your problem. That's actually one of the reasons why it is good to do
these exercises – you discover and remember a lot of useful stuff.
- One sign of using a wrong approach to a problem
is that you have a lot of bindings or variables and find it difficult to come
up with proper names for them (order, order-fn, orderings-separated, etc simply don’t sound very good).
Maybe the most appropriate thing to do under such conditions is to step back
and try to rethink the way you reason about the problem.
I will add joins to the mess in the next post –
as promised. For now, feel free to play with what we already have – here is the code. By the way, there is a problem with our new select macro, which limits sorting capabilities,
but is very easy to fix. Did you spot it? Let me know in the comments!
In addition to showing me where I messed up,
please tell whether you find this exercise a good way to accustom oneself with
macros or do you think it is as useless as it is long and complex? Maybe you have your
favorite Clojure kata, which helps to explore and understand macros better? I
will be glad to hear about it!