Linearized columns to HTML table in elisp

Let’s say you have linearized column data like this:

First name
Barry
Helga
Zelma
Peyton
--
Last name
Yeengdull
Madbroom
Landbeck
Glonham
--
Age
33
44
55
66

(names courtesy of Fictional Character Name Generator) and you want to make an HTML table out of it that will be displayed like this:

First name   Last name   Age
Barry        Yeengdull   33
Helga        Madbroom    44
Zelma        Landbeck    55
Peyton       Glonham     66

The HTML for the above table would be:

<table>
<tr><th>First name</th><th>Last name</th><th>Age</th></tr>
<tr><td>Barry</td><td>Yeengdull</td><td>33</td></tr>
<tr><td>Helga</td><td>Madbroom</td><td>44</td></tr>
<tr><td>Zelma</td><td>Landbeck</td><td>55</td></tr>
<tr><td>Peyton</td><td>Glonham</td><td>66</td></tr>
</table>

Doing it in elisp

Here’s a general overview of what needs to be done:

  • Split the text by delimiters (in the above case, it’s --) to get a list of column strings
  • Split column strings into cells
  • Transpose columns
  • Write the HTML out

Let’s do it one by one.

Split the text by delimiters (in the above case, it’s --) to get a list of column strings

Elisp has a function called split-string that we can use for this purpose – see the docs in the elisp manual. For example, we can do this:

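;; Provides assert and mapcar*; in Emacs 24.3 and later the cl-lib
;; equivalents are named cl-assert and cl-mapcar.
(require 'cl)

(defun icyrock-get-column-strs (str delim)
  "Split STR into column strings at each DELIM, dropping adjacent newlines."
  ;; DELIM becomes part of a regexp, so it should not contain
  ;; regexp special characters.
  (split-string str (concat "\n?" delim "\n?")))

;; Tests
(assert (equal
         (icyrock-get-column-strs "1\n2\n3\n--\n4\n5\n6\n--\n7\n8\n9" "--")
         '("1\n2\n3" "4\n5\n6" "7\n8\n9")))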

which will:

  • Split the given string:
  1
  2
  3 
  --
  4
  5
  6
  --
  7
  8
  9

into column strings using split-string. The result is a list of strings, one per column: ("1\n2\n3" "4\n5\n6" "7\n8\n9").
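One caveat: split-string treats its separator argument as a regular expression, so a delimiter such as ... would match any three characters if it were concatenated into the pattern as-is. Below is a minimal sketch of a variant that quotes the delimiter with regexp-quote first. The name my-get-column-strs-literal is hypothetical and not part of the code in this post.

```elisp
;; Hypothetical variant of icyrock-get-column-strs that treats DELIM literally.
(defun my-get-column-strs-literal (str delim)
  "Split STR at each literal occurrence of DELIM, dropping adjacent newlines."
  (split-string str (concat "\n?" (regexp-quote delim) "\n?")))

;; As a regexp, "..." would match any three characters; quoted, it is literal.
(my-get-column-strs-literal "1\n2\n...\n3\n4" "...")
;; => ("1\n2" "3\n4")
```

For the -- delimiter used throughout this post, the quoted and unquoted versions behave the same.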

Split column strings into cells

Now that we have a list of strings, each representing a column, we can split each column into cells, again using split-string:

(defun icyrock-split-column-strs (strs)
  (mapcar (lambda (col-str) (split-string col-str "\n")) strs))

(defun icyrock-get-columns-from-string (str delim)
  (icyrock-split-column-strs (icyrock-get-column-strs str delim)))

;; Tests
(assert (equal
         (icyrock-get-columns-from-string "1\n2\n3\n--\n4\n5\n6\n--\n7\n8\n9" "--")
         '(("1" "2" "3") ("4" "5" "6") ("7" "8" "9"))))

The function icyrock-split-column-strs splits each column string into its cells. icyrock-get-columns-from-string combines it with icyrock-get-column-strs: given our starting string, it returns a list of columns, each of which is a list of cell strings, i.e. (("1" "2" "3") ("4" "5" "6") ("7" "8" "9")).
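If the text ends with a newline (which is common when a region is selected line by line), splitting on "\n" leaves a trailing empty cell. Here is a small sketch of one way to deal with that, using the optional OMIT-NULLS argument of split-string. The name my-split-column-strs is hypothetical, and dropping empty strings is an assumption that only suits tables without intentionally empty cells.

```elisp
;; Hypothetical variant of icyrock-split-column-strs that drops empty cells.
(defun my-split-column-strs (strs)
  "Split each column string in STRS into cells, dropping empty strings."
  (mapcar (lambda (col-str) (split-string col-str "\n" t)) strs))

;; The second column string ends with a newline; no empty cell is produced.
(my-split-column-strs '("1\n2\n3" "4\n5\n6\n"))
;; => (("1" "2" "3") ("4" "5" "6"))
```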

Transpose columns

The list we got, ((1 2 3) (4 5 6) (7 8 9)), looks ordered, but note that it is ordered by column, while we want it ordered by row. The list that we want is the transposition of this one, i.e. ((1 4 7) (2 5 8) (3 6 9)).

(defun icyrock-transpose-table (table)
  (apply 'mapcar* 'list table))

(defun icyrock-get-rows-from-string (str delim)
  (icyrock-transpose-table (icyrock-get-columns-from-string str delim)))

;; Tests
(assert (equal
         (icyrock-get-rows-from-string "1\n2\n3\n--\n4\n5\n6\n--\n7\n8\n9" "--")
         '(("1" "4" "7") ("2" "5" "8") ("3" "6" "9"))))

Table transposition is done simply via the mapcar* function, which does exactly what we need.

mapcar* takes the first element (the car, in (e)lisp jargon) of each supplied list and applies the given function to them, then does the same with the second elements, and so on until the shortest list runs out. To see this in action, try this:

(message "%s" (mapcar* 'list '(1 a x) '(2 b y) '(3 c z)))

What the above does is:

  • Get the list of first elements of supplied arguments, yielding (1 2 3)
  • Call the supplied function (“list” in this case) with them – effectively running (list 1 2 3)
  • Append that to the resulting list
  • Repeat with the next element of each list, which in this case gives (a b c)

The result in this case is the following list: ((1 2 3) (a b c) (x y z)).

Now, to transpose a table, we need to do exactly the above: compare the starting lists with the resulting list. The only remaining step is to call mapcar* with the elements of our table as separate arguments, and this is exactly what apply does. Try this:

(message "%s" (apply '+ '(1 2 3)))

The above prints 6. In other words – (apply '+ '(1 2 3)) is the same as (+ 1 2 3). To generalize – (apply 'func '(a1 a2 ... aN)) is the same as (func a1 a2 ... aN), where func is any function and a1 through aN are its arguments.
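One consequence of this approach is that the transposition stops at the shortest column, so a column with a missing cell silently shortens the whole table. The sketch below illustrates this and shows one possible workaround: padding the columns first. It assumes Emacs 24.3 or later, where mapcar* is available as cl-mapcar from cl-lib. my-pad-columns is a hypothetical helper, not part of the code in this post.

```elisp
(require 'cl-lib) ; assumption: Emacs 24.3+, where mapcar* is named cl-mapcar

;; Hypothetical helper: make all columns as long as the longest one.
(defun my-pad-columns (columns &optional filler)
  "Return COLUMNS with every column padded to the longest length with FILLER."
  (let ((max-len (apply #'max (mapcar #'length columns)))
        (filler (or filler "")))
    (mapcar (lambda (col)
              (append col (make-list (- max-len (length col)) filler)))
            columns)))

;; Without padding, the shorter column truncates the result to two rows.
(apply #'cl-mapcar #'list '(("1" "2" "3") ("4" "5")))
;; => (("1" "4") ("2" "5"))

;; With padding, all three rows survive.
(apply #'cl-mapcar #'list (my-pad-columns '(("1" "2" "3") ("4" "5"))))
;; => (("1" "4") ("2" "5") ("3" ""))
```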

Write the HTML out

Writing HTML output is fairly easy:

(defun icyrock-html-td-from-string (str)
  (format "<td>%s</td>" str))

(defun icyrock-list-to-string (list &optional sep)
  (mapconcat 'identity list sep))

(defun icyrock-html-tr-from-list (list)
  (format "<tr>%s</tr>"
          (icyrock-list-to-string (mapcar 'icyrock-html-td-from-string list))))

(defun icyrock-html-table-from-rows (table)
  (format "<table>\n%s\n</table>"
          (icyrock-list-to-string (mapcar 'icyrock-html-tr-from-list table) "\n")))

(defun icyrock-table-from-linearized-string (str delim)
  (icyrock-html-table-from-rows (icyrock-get-rows-from-string str delim)))

;; Tests
(assert (equal
         (icyrock-table-from-linearized-string "1\n2\n3\n--\n4\n5\n6\n--\n7\n8\n9" "--")
         (concat "<table>\n"
                 "<tr><td>1</td><td>4</td><td>7</td></tr>\n"
                 "<tr><td>2</td><td>5</td><td>8</td></tr>\n"
                 "<tr><td>3</td><td>6</td><td>9</td></tr>\n"
                 "</table>")))

The above produces the following:

<table>
<tr><td>1</td><td>4</td><td>7</td></tr>
<tr><td>2</td><td>5</td><td>8</td></tr>
<tr><td>3</td><td>6</td><td>9</td></tr>
</table>
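The functions above emit td for every cell, while the HTML at the start of this post uses th for the header row. As a rough sketch of how that could be added, the following treats the first row as the header. The names my-html-row and my-html-table-with-header are hypothetical, and treating the first row as a header is an assumption about the data. No escaping is done here.

```elisp
;; Hypothetical helper: render one row using the given cell tag ("td" or "th").
(defun my-html-row (cells tag)
  "Wrap each string in CELLS in <TAG>...</TAG> and the whole row in <tr>."
  (format "<tr>%s</tr>"
          (mapconcat (lambda (cell) (format "<%s>%s</%s>" tag cell tag))
                     cells "")))

;; Hypothetical variant of icyrock-html-table-from-rows with a header row.
(defun my-html-table-with-header (rows)
  "Render ROWS as an HTML table, using <th> for the first row."
  (format "<table>\n%s\n</table>"
          (mapconcat #'identity
                     (cons (my-html-row (car rows) "th")
                           (mapcar (lambda (row) (my-html-row row "td"))
                                   (cdr rows)))
                     "\n")))

(my-html-table-with-header '(("First name" "Age") ("Barry" "33")))
;; => "<table>\n<tr><th>First name</th><th>Age</th></tr>\n<tr><td>Barry</td><td>33</td></tr>\n</table>"
```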

Making it interactive

Now, obviously you’d want to use this while editing in Emacs. That is, write some linearized list, select the region and apply our function.

Here are the steps:

  • Get the currently selected region
  • Apply the above function to that, so you get the needed HTML
  • Overwrite the region with the HTML code. Alternatively, you can append after the region

Here is a command that does this:

(defun icyrock-make-html-table-from-current-region ()
  (interactive)
  (save-excursion
    (let ((current-region-string (buffer-substring (mark) (point))))
      (delete-region (mark) (point))
      (goto-char (mark))
      (insert (icyrock-table-from-linearized-string
               current-region-string "--")))))

(global-set-key (kbd "<f11>") 'icyrock-make-html-table-from-current-region)

Some comments for the above:

  • save-excursion saves point (and, in Emacs versions before 25.1, the mark) and restores it once we are done, so the user will not “notice” any cursor movement. It’s just a courtesy to the user: it’s very nice when everything feels “just right” after the conversion to an HTML table, instead of having to move the cursor back to where it was
  • (interactive) turns the function into a command, so it can be bound to a key or called with M-x
  • (buffer-substring (mark) (point)) returns the text between the mark and point, i.e. the selected region
  • The global-set-key function, together with kbd, is used to bind the command to a key

Now, if you are in a buffer that contains the linearized table from the beginning of this post, just select it and press F11, and you’ll get a nicely formatted table. Note that the code above emits td cells for every row, including the header row.
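An alternative way to get at the region is to let Emacs pass its boundaries to the command via (interactive "r"), which supplies the start and end positions with the smaller one first. Below is a minimal sketch of that pattern, not the command from this post. my-replace-region-with and my-upcase-region-demo are hypothetical names, and upcase merely stands in for the table conversion so that the example is self-contained.

```elisp
;; Hypothetical helper: replace a region with a transformed copy of its text.
(defun my-replace-region-with (start end fn)
  "Replace the text between START and END with the result of calling FN on it.
START is assumed to be smaller than END."
  (let ((text (buffer-substring-no-properties start end)))
    (save-excursion
      (delete-region start end)
      (goto-char start)
      (insert (funcall fn text)))))

;; Hypothetical demo command; "r" passes the region boundaries as arguments.
(defun my-upcase-region-demo (start end)
  "Replace the active region with its upper-cased text."
  (interactive "r")
  (my-replace-region-with start end #'upcase))
```

To use this with the table conversion, one could pass (lambda (s) (icyrock-table-from-linearized-string s "--")) instead of #'upcase.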

Escaping

Obviously, if the table contains characters that are special in HTML (such as <, > or &), the above will not work correctly. Some simple escaping can be added by changing the code like this:

(defun icyrock-html-escape-string (str)
  (let* ((s1 (replace-regexp-in-string "&" "&amp;" str))
         (s2 (replace-regexp-in-string "<" "&lt;" s1))
         (res (replace-regexp-in-string ">" "&gt;" s2)))
    res))

(defun icyrock-html-td-from-string (str)
  (format "<td>%s</td>" (icyrock-html-escape-string str)))
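The order of the replacements matters: & has to be escaped first, otherwise the ampersands introduced by &lt; and &gt; would be escaped a second time. The same idea can be written as a loop over a table of replacements, as in the sketch below. my-html-escape is a hypothetical name, and it covers only the three characters handled above.

```elisp
;; Hypothetical table-driven variant of icyrock-html-escape-string.
(defun my-html-escape (str)
  "Escape &, < and > in STR; & is handled first to avoid double escaping."
  (let ((result str))
    (dolist (pair '(("&" . "&amp;") ("<" . "&lt;") (">" . "&gt;")) result)
      ;; The two t arguments keep case fixed and insert the replacement literally.
      (setq result (replace-regexp-in-string
                    (regexp-quote (car pair)) (cdr pair) result t t)))))

(my-html-escape "a < b && c > d")
;; => "a &lt; b &amp;&amp; c &gt; d"
```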

Similar things

If you have a “text version” of the HTML table, see this article for an approach: Emacs Lisp: How to Write a make-html-table Command.


Building Emacs from source on Xubuntu 12.04

Emacs is an extensible text editor written mostly in Emacs Lisp (or elisp for short). It’s quite old: the first version was released in 1976, according to the Wikipedia article about Emacs. It is, however, still regularly updated; the latest stable version as of this writing is 23.4, released on January 29th, 2012.

One problem with Emacs stems from exactly its extensibility and customizability. It’s updated very often, so the last stable release is, well, unusable. Extension development and testing are usually done on the development branch; currently, the preview release is 24.1-rc, released just yesterday, June 1st, 2012. The current Xubuntu repository carries an older version:

Version: 23.3+1-1ubuntu9

In order to compensate for this, I (and I suppose many other people) install directly from source. Here’s how to do this easily using the git mirror repository.

Steps for building Emacs

  • Build packages

In order to build Emacs, you’ll need some packages that don’t come pre-installed on a fresh Xubuntu install. If you haven’t already, install these:

$ sudo apt-get install autoconf automake build-essential libjpeg-dev libgif-dev libncursesw5-dev libpng-dev libtiff4-dev texinfo

  • Clone the git mirror of Emacs

On this Emacs Wiki page – called Emacs From Git – you have several git repositories listed. Pick one, e.g. the one on gnu.org, and clone it somewhere:

$ git clone git://git.savannah.gnu.org/emacs.git

This is going to take a while, so be patient.

  • Build Emacs

Emacs uses the standard configure script and make command. To build it, enter the newly cloned git repository, generate the configure script with autogen.sh and then run configure / make:

$ cd emacs
$ ./autogen.sh
$ ./configure
$ make

The above assumes that you want to install Emacs in /usr/local, which is the default. If you prefer a different location, run configure like this instead:

$ ./configure --prefix=/my/emacs/path

The build step will likely take even longer than the git clone, so grab a few coffees or something.

  • Installing

After compiling Emacs, you can now install it:

$ sudo make install

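If you configured with a custom --prefix, make install already installs there, so no extra option is needed (and sudo is only required if the target folder isn’t writable by your user):

$ make install

  • Test run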

Run emacs to confirm the version:

$ emacs --version
GNU Emacs 24.1.50.1
Copyright (C) 2012 Free Software Foundation, Inc.
GNU Emacs comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of Emacs
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.

All set.
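The version can also be checked from within Emacs, which is handy in an init file that should only enable certain features on a new enough build. Below is a minimal sketch using the built-in emacs-version variable and the version< function. The helper name my-emacs-at-least-p is hypothetical.

```elisp
;; Hypothetical helper: compare the running Emacs against a minimum version.
(defun my-emacs-at-least-p (min-version)
  "Return non-nil if the running Emacs is MIN-VERSION or newer."
  (not (version< emacs-version min-version)))

;; Report the running version and whether it is at least 24.1.
(message "Running Emacs %s; 24.1 or newer: %s"
         emacs-version (my-emacs-at-least-p "24.1"))
```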