There are some other explanations given in sibling comments. They may be right, ...

eduction · on June 5, 2023

>In Common Lisp and its antecedents, a "list" is a chain of cons cells, as mentioned by some of the sibling comments, but that's not all. Another important point is that in Common Lisp and its antecedents, source code is not made of text strings; it's made of Lisp data structures--atoms and chains of cons cells.

Source code in Clojure is also not made of text strings; Clojure likewise reads text as the serialization of data structures, which are then interpreted as source.

The difference is, Clojure uses data structures other than cons cells for source.

What do you see as particularly important about cons cells? What advantages do they give what some might call a "real LISP" over Clojure, which, I'd argue, smartly abstracts around more modern data structures like vectors, maps, sequences, collections as opposed to being "married" to cons cells as Rich Hickey once put it?

mikelevins · on June 5, 2023

Common Lisp uses cons cells. So do a bunch of older Lisps whose design fed into the design of Common Lisp. That's all. There's nothing else special about cons cells in my mind.

Cons cells are not the important point--at least not for me. The important point in this context is that expressions are represented by something better (that is, more conveniently-structured) than flat text strings, and that the representation be a standard data structure in the language, and that operations on source code are implemented by APIs that are exposed as standard parts of the language.
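To make that concrete, here is a minimal sketch in Python rather than Lisp, so every name in it (`Cons`, `to_list`, `from_list`, `swap_operands`, `evaluate`) is hypothetical and only illustrates the idea: the expression is an ordinary data structure, and rewriting it uses the same operations as any other list.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Cons:
    """Hypothetical cons cell; None plays the role of nil."""
    car: Any
    cdr: Any = None


def to_list(*items):
    """Build a chain of cons cells from the given items."""
    result = None
    for item in reversed(items):
        result = Cons(item, result)
    return result


def from_list(cell):
    """Collect the cars of a cons chain into a Python list."""
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


def swap_operands(expr):
    """Source-to-source rewrite: (op a b) becomes (op b a)."""
    op, a, b = from_list(expr)
    return to_list(op, b, a)


def evaluate(expr):
    """Tiny evaluator for nested (+ a b) and (- a b) expressions."""
    if not isinstance(expr, Cons):
        return expr
    op, *args = from_list(expr)
    values = [evaluate(arg) for arg in args]
    if op == "+":
        return values[0] + values[1]
    if op == "-":
        return values[0] - values[1]
    raise ValueError(f"unknown operator: {op}")


source = to_list("-", 10, 3)        # the expression (- 10 3) as data
rewritten = swap_operands(source)   # now (- 3 10), built by list operations
```

Nothing here parses or edits text: the rewrite takes a list and returns a list, which is the property being described.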

As far as I'm concerned, if a language does that, then it's nailed this particular part of being Lispy. There are some other parts, but that discussion is outside the scope of this one.

jaggederest · on June 5, 2023

The fancy word is "homoiconicity" but that is more about appearance. Homostructural? Homotypic?

I think it's also important that that data structure be extremely simple, and cons cells are about as simple as you can get. When you start adding "Well a vector is different than a string, n-tuple, or array, so your code has to figure out which one it is dealing with", that's when you run into issues. You could step back and just go object oriented, but at its core an object is just a struct with a pointer to a function table and/or dictionary data, so we're right back at "cons cell if you squint".

Internally it may be sped up but conceptually "everything is a small integer value or a cons cell" maps pretty closely to how I think about low level data structures. something something build your own arbitrary precision floating point number...

(123 (0 (123 nil))) ~= 123.123 ~= "{\0{" ~= [123, 0, 123]
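A rough sketch of that equivalence in Python, assuming a two-element tuple stands in for a cons cell and `None` for nil (`cons` and `walk` are made-up helper names):

```python
def cons(car, cdr=None):
    """Hypothetical cons cell: a (car, cdr) tuple, with None as nil."""
    return (car, cdr)


def walk(cell):
    """Follow the cdr links and collect each car."""
    values = []
    while cell is not None:
        car, cell = cell
        values.append(car)
    return values


number = cons(123, cons(0, cons(123)))  # (123 (0 (123 nil)))
as_list = walk(number)                  # the same data as a flat sequence
as_bytes = bytes(as_list)               # the same data as three raw bytes
```

Byte 123 is the ASCII code for `{`, which is where the `"{\0{"` reading comes from.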

mikelevins · on June 6, 2023

There's a decent argument for keeping such a foundational data structure as simple as possible, but there's also a decent argument for not making it too simple.

Cons cells are certainly very simple. They're so simple that, as Moon once observed to me, there's no place on them to hang metadata. For example, if your source code is made of cons cells, you might wish that they had some sort of metadata slot so that you could use it to keep track of where a given hunk of source code came from. You can't though. You have to kludge up some out-of-band solution for things like that.

We were talking about my hobby Lisp, Bard. He liked that it separated protocol from representation, so you could have Lists that were made of something other than cons cells. In fact, in Bard your Lists can be made of anything you like, as long as it participates in the List protocol. In particular, they can be made of something that has some place to hang metadata.

Rich Hickey of course also gave a bunch of Clojure's data structures places to hang metadata, possibly for similar reasons.

kazinator · on June 6, 2023

Secret meta-data in a cons cell is not out of the question.

In TXR Lisp, cons cells are four pointer-sized fields wide. So one field is not used. Almost. The field is used in the hash table implementation in which entries are conses. It sticks the hash code in there. That hash code is a pointer sized word with no tag; the garbage collector can safely ignore it.

The extra field is currently not used for tracking source location information, though it could be. Source location info is instead tracked in an external hash table. (The table is configured with weak semantics, so when the code becomes garbage, the entries vaporize.)
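The general shape of such a side table can be sketched in Python with `weakref.WeakKeyDictionary`. This is an illustration of the technique only, not TXR Lisp's implementation; `Cons`, `source_locations`, and `read_form` are hypothetical names.

```python
import gc
import weakref


class Cons:
    """Hypothetical cons cell with no room for metadata of its own."""
    __slots__ = ("car", "cdr", "__weakref__")  # __weakref__ allows weak keys

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


# Out-of-band table: cons cell -> line number. Keys are held weakly, so an
# entry disappears once the cell it describes is garbage collected.
source_locations = weakref.WeakKeyDictionary()


def read_form(items, line):
    """Build a cons chain and record the source line of every cell."""
    cell = None
    for item in reversed(items):
        cell = Cons(item, cell)
        source_locations[cell] = line
    return cell


form = read_form(["print", "hello"], line=12)
line_of_form = source_locations[form]  # looked up only when reporting errors
del form                               # the code becomes garbage...
gc.collect()                           # ...and its table entries go with it
```

Cells created outside `read_form` never touch the table, so ordinary list processing pays nothing for the feature.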

That representation could change in the future. It would mean that when the garbage collector traverses conses, it has to look at that hidden field of each one. And each time we allocate a fresh cons cell, we have to make sure it is initialized.

I'd have to benchmark it.

Associating expressions with source location info is a cost that we bear only when processing source code. If we shoehorn it into conses, then there is some nonzero cost to all cons cell processing, whether we are scanning code or not.

An important problem is that meta-data attached to cons cells (whether internal or external) is not copied across traditional tree-structure rewriting operations.

TXR Lisp's expander does some work behind the scenes to propagate location info, like from macro calls to their expansions. The parser has a flag for whether to attach the info to objects in the first place. It's on by default if we are reading code, but not when reading data.
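A small Python sketch of the problem and of one way to propagate the info, assuming an external weak table for locations. The names (`locations`, `expand_unless`, `expand_with_location`) are invented for illustration and do not reflect TXR Lisp's actual expander.

```python
import weakref


class Cons:
    """Hypothetical cons cell; metadata lives outside it."""
    __slots__ = ("car", "cdr", "__weakref__")

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


locations = weakref.WeakKeyDictionary()  # cons cell -> source line


def make_list(*items):
    cell = None
    for item in reversed(items):
        cell = Cons(item, cell)
    return cell


def expand_unless(form):
    """(unless test body) -> (if (not test) body), built from fresh cells."""
    test = form.cdr.car
    body = form.cdr.cdr.car
    return make_list("if", make_list("not", test), body)


def expand_with_location(expander, form):
    """Run an expander, then copy the call's location onto its expansion."""
    expansion = expander(form)
    if form in locations and expansion not in locations:
        locations[expansion] = locations[form]
    return expansion


call = make_list("unless", "ready", "wait")
locations[call] = 42

naive = expand_unless(call)                          # location is lost
tracked = expand_with_location(expand_unless, call)  # location is carried over
```

The rewrite itself knows nothing about locations; only the wrapper does, which is why every rewriting path outside the wrapper has to be taught separately.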

Outside of the expander, a few places in the compiler have to be aware of this (when the compiler performs its own tree-rewriting outside of the macro framework).

Overall I'm satisfied with the reporting. From time to time I see a bug: an error occurs for which source location info isn't available but should be.

I didn't give it character precision: I think that compiler messages that report line number and character column are too rococo for my taste. If you can't figure out the problem from a line number, maybe your code is stuffing too much into one line of code.

mikelevins · on June 6, 2023

>Associating expressions with source location info is a cost that we bear only when processing source code. If we shoehorn it into conses, then there is some nonzero cost to all cons cell processing, whether we are scanning code or not.

That's true only because cons cells have a specific representation, but they don't have to. Bard classes are defined by protocols, not representations.

If I remember right, that's what Moon liked: because Bard's classes were defined by protocol and not representation, source code could be made of lists, and could have a place to hang metadata, without imposing that cost on other lists, because lists were not any specific representation; they were just any representation for which the list protocol was defined.
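As a loose analogy only (this is Python, not Bard, and `ListLike`, `PlainCons`, and `SourceCons` are hypothetical names), a protocol-defined list might look like this: two representations, one with a metadata slot and one without, both usable by the same list code.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class ListLike(Protocol):
    """Anything offering first() and rest() counts as a list."""

    def first(self) -> Any: ...

    def rest(self) -> Optional["ListLike"]: ...


class PlainCons:
    """Two-field representation for ordinary data; no metadata slot."""
    __slots__ = ("_first", "_rest")

    def __init__(self, first, rest=None):
        self._first = first
        self._rest = rest

    def first(self):
        return self._first

    def rest(self):
        return self._rest


class SourceCons:
    """Representation for source code, with a place to hang metadata."""
    __slots__ = ("_first", "_rest", "meta")

    def __init__(self, first, rest=None, meta=None):
        self._first = first
        self._rest = rest
        self.meta = meta if meta is not None else {}

    def first(self):
        return self._first

    def rest(self):
        return self._rest


def elements(lst: Optional[ListLike]) -> list:
    """Works on any representation that follows the protocol."""
    out = []
    while lst is not None:
        out.append(lst.first())
        lst = lst.rest()
    return out


data = PlainCons(1, PlainCons(2))
code = SourceCons("+", data, meta={"line": 7})
```

Only `SourceCons` pays for the extra slot; `PlainCons` stays two fields wide, and a single list can even mix the two.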

kazinator · on June 6, 2023

Even though I have two kinds of cons cells (regular and lazy) as well as the ability of objects to implement car and cdr and then work with those functions, I'm still fairly reluctant.

I'd want source code to use real cons cells, not objects. Objects are heavier-weight. Each object is a cons-cell-sized object, plus something on the dynamic heap.

There are print-read consistency issues. Lazy conses and regular conses are indistinguishable in print. If you print some lazy conses, and read that back, you get regular conses. Of course, an infinite list made using lazy conses will not print in a readable way, so we can sweep that under the rug.
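A toy Python illustration of that round trip, assuming a minimal printer and a reader that only handles flat lists of integers (all names are made up, and real lazy conses in TXR Lisp are more involved):

```python
class Cons:
    """Hypothetical regular cons cell."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


class LazyCons:
    """Hypothetical lazy cons: the cdr is computed on first access."""

    def __init__(self, car, thunk):
        self.car = car
        self._thunk = thunk
        self._forced = False
        self._cdr = None

    @property
    def cdr(self):
        if not self._forced:
            self._cdr = self._thunk()
            self._forced = True
        return self._cdr


def render(cell):
    """Print either kind of cons chain the same way."""
    parts = []
    while cell is not None:
        parts.append(str(cell.car))
        cell = cell.cdr
    return "(" + " ".join(parts) + ")"


def read_flat(text):
    """Read a flat list of integers; always yields regular conses."""
    cell = None
    for token in reversed(text.strip("()").split()):
        cell = Cons(int(token), cell)
    return cell


def count_from(n, stop):
    """Lazy list of the integers n..stop."""
    if n > stop:
        return None
    return LazyCons(n, lambda: count_from(n + 1, stop))


lazy = count_from(1, 3)
text = render(lazy)          # printing forces the lazy cells
read_back = read_flat(text)  # same text, but a different kind of cell
```

The printed form is identical for both kinds, so the laziness does not survive the round trip.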

Objects implementing car and cdr have arbitrary print methods too. They won't print as lists. Those programmed to print as lists won't have print-read consistency.

mikelevins · on June 6, 2023

Point taken, but I feel like I should explain that the word "class" has an idiosyncratic meaning in Bard.

Bard classes are not conventional object-oriented classes; they aren't even CLOS-style classes. A Bard class is a set of representations that participate in a given protocol. That being the case, a hypothetical Bard cons cell representation could be exactly the same as a TXR Lisp cons cell, or the same as a Common Lisp cons cell. In either case it need not be the only representation of a cons cell in the language.

(I feel like someone is going to object that I shouldn't use the word "class" for a concept that is so different from what it usually means in object-oriented languages, and that might be true. If someone suggests a better term for a set of representations defined by a protocol in which they all participate, I'll consider adopting it.)

kazinator · on June 6, 2023

I have experience with both: multiple deeply-integrated cons objects that satisfy the consp function, as well as allowing non-cons objects (including classes in the OOP sense) to take operations like car and cdr.

Once you merely go from one cons type to two, with their own tags, every place in the run-time which checks for a cons cell has to now check for two possible type tags. An atom is everything that is not a cons, as you know, so that function is also affected.

(I wonder whether it wouldn't be better to just have one cons tag, and use some flag field to distinguish lazy conses.)
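The two designs can be contrasted in a short Python sketch. The tag names and the `lazy` flag are assumptions made for illustration, not TXR Lisp's actual run-time layout.

```python
from enum import Enum, auto


class Tag(Enum):
    CONS = auto()
    LCONS = auto()  # separate tag for lazy conses (first design)
    NUM = auto()


class Obj:
    """Hypothetical tagged run-time object."""

    def __init__(self, tag, value=None, lazy=False):
        self.tag = tag
        self.value = value
        self.lazy = lazy  # only meaningful in the one-tag design


# Design 1: two cons tags. Every cons check tests both tags.
def consp_two_tags(obj):
    return obj.tag is Tag.CONS or obj.tag is Tag.LCONS


def atom_two_tags(obj):
    return not consp_two_tags(obj)


# Design 2: one cons tag plus a flag. The common check is a single
# comparison; only code that cares about laziness reads the flag.
def consp_one_tag(obj):
    return obj.tag is Tag.CONS


def is_lazy(obj):
    return obj.tag is Tag.CONS and obj.lazy


regular = Obj(Tag.CONS)
lazy_tagged = Obj(Tag.LCONS)
lazy_flagged = Obj(Tag.CONS, lazy=True)
number = Obj(Tag.NUM, 5)
```

Whether the flag check ends up cheaper than the second tag comparison is the kind of thing that would need benchmarking.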

mikelevins · on June 6, 2023

Also fair points. In Bard I’ve been willing to pay that cost because exploring types that are defined by protocol was one of the motivating reasons for working on it.

eduction · on June 5, 2023

The original question was why people don’t consider Clojure a lisp. By your yardstick, it is clearly a lisp.

As far as I can tell the only argument against it being one is that it does not specifically use cons cells.

mikelevins · on June 5, 2023

I did not intend to classify Clojure as "not a Lisp". I didn't intend to comment on Clojure at all, specifically, so commenting on this particular thread was an ill-conceived choice on my part.

I wanted to describe the source-code peculiarity because it hadn't been discussed elsewhere in the comments and I think it's important in what you might call old-fashioned Lispiness. I should have chosen another place for my comment. Sorry about that.