I spent the last few days reading The Craft of Text Editing, a book on the design of text editors. It focuses mainly on Emacs-style editors, but many of its principles apply to other text editors as well. Although the book is old, many of the topics it covers still seem relevant to anyone designing an editor today, which, admittedly, is roughly nobody.
The book starts off discussing the different types of users and the different types of input. The section on input types is quite dated (having more than one button on a mouse is bad design?) and not really relevant today. It then covers the requirements for the language you write the editor in; not surprisingly for a book focusing on Emacs-type editors, the recommended choices are C and Lisp.
Once it finishes those sections, the book gets to the interesting part: how to actually represent the text and structure your editor. This part opens with the possible editing models: text as an array of characters, a 2D array of characters, a list of lines, and a few other options.
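To make the difference between the first and third models concrete, here is a minimal Python sketch (my own illustration, not code from the book) that stores the same text as a list of lines and uses a hypothetical helper, `offset_to_line_col`, to map a position in the flat character array onto a (line, column) pair.

```python
def to_lines(text: str) -> list[str]:
    """List-of-lines model: split on newlines, dropping the newline characters."""
    return text.split("\n")


def offset_to_line_col(lines: list[str], offset: int) -> tuple[int, int]:
    """Map a position in the flat character array to (line, column) in the list-of-lines model."""
    for line_no, line in enumerate(lines):
        # Each line occupies len(line) characters plus one implicit newline.
        if offset <= len(line):
            return line_no, offset
        offset -= len(line) + 1
    raise IndexError("offset past end of buffer")


text = "hello\nworld\n!"
lines = to_lines(text)
print(lines)                         # ['hello', 'world', '!']
print(offset_to_line_col(lines, 7))  # (1, 1)
```

In the flat-array model a position is just an index; in the list-of-lines model, line-oriented operations are cheap, but character offsets have to be computed by walking the lines as above.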
Next, the book discusses several file formats you may need to handle. Not all of them apply to every editor, but an editor that aspires to be as general as possible must handle them. This section also covers how to represent and store extra information that is not part of the file's text, such as typesetting information.
The book then turns to implementing the editor itself. It describes the main ways of representing buffers; essentially, the two most common are a linked list of characters and a ‘buffer gap’ system, which is what Emacs uses. Each implementation is compared on several criteria, including efficiency and crash recovery, and overall the buffer gap system comes out ahead of the alternatives.
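The buffer gap idea fits in a few dozen lines. This is a minimal Python sketch under my own simplifying assumptions (one character per list slot, doubling growth, a hypothetical `GapBuffer` class), not the book's or Emacs's actual implementation. The text lives in one array with a hole at the cursor, so inserting or deleting there only moves the edges of the hole.

```python
class GapBuffer:
    """Minimal buffer-gap sketch: one array with a movable hole (the gap) at the cursor."""

    def __init__(self, capacity: int = 16) -> None:
        self.buf: list[str] = [""] * capacity
        self.gap_start = 0         # first free slot, which is also the cursor position
        self.gap_end = capacity    # one past the last free slot

    def __len__(self) -> int:
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_to(self, pos: int) -> None:
        """Move the gap to text position pos, shifting only the characters in between."""
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        while self.gap_start > pos:
            # Move one character from just before the gap to just after it.
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            # Move one character from just after the gap to just before it.
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, ch: str) -> None:
        """Insert one character at the cursor by filling the first gap slot."""
        if self.gap_start == self.gap_end:
            self._grow()
        self.buf[self.gap_start] = ch
        self.gap_start += 1

    def delete_backward(self) -> None:
        """Delete the character before the cursor by widening the gap."""
        if self.gap_start == 0:
            raise IndexError("at start of buffer")
        self.gap_start -= 1

    def _grow(self) -> None:
        """Called only when the gap is empty: open a new gap as large as the array."""
        extra = max(len(self.buf), 1)
        self.buf = self.buf[:self.gap_start] + [""] * extra + self.buf[self.gap_end:]
        self.gap_end += extra

    def text(self) -> str:
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


buf = GapBuffer(capacity=4)
for ch in "helo":
    buf.insert(ch)
buf.move_to(3)
buf.insert("l")
print(buf.text())  # hello
```

The appeal is that typing, which tends to happen at one spot at a time, is cheap; the cost of shifting characters is paid only when the cursor jumps, and it is proportional to the distance moved.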
Redisplay algorithms are covered next: how to show the changes a user makes with as little interruption to the user as possible. This section seems less important now than when the book was written; with faster connections and processors, it matters much less whether a redisplay issues five commands or four. Still quite interesting, though.
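The core idea behind incremental redisplay can be shown with a much-simplified Python sketch: keep a copy of what is currently on screen, compare it with what should be there, and send output only for rows that differ. The `redisplay` function here is hypothetical and returns (row, text) updates instead of writing terminal escape sequences.

```python
def redisplay(screen: list[str], desired: list[str]) -> list[tuple[int, str]]:
    """Update `screen` in place to match `desired`; return only the rows that had to be redrawn."""
    updates = []
    for row in range(len(screen)):
        new = desired[row] if row < len(desired) else ""  # rows past the end of the text are blank
        if screen[row] != new:
            screen[row] = new
            updates.append((row, new))
    return updates


screen = ["line one", "line two", "line three"]
print(redisplay(screen, ["line one", "line 2", "line three"]))  # [(1, 'line 2')]
```

Only one of the three rows is redrawn. Real redisplay code typically goes further, for example using a terminal's insert-line and delete-line operations when text scrolls; this sketch ignores that.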
The next section deals with user commands. It discusses multilevel commands (such as C-x C-c), arguments, key rebinding, and modes. While the focus is on Emacs, other editors share this command-loop structure: for example, in vim, pressing ‘d’ takes a suffix argument giving the range to delete. The section then describes how to handle undo and redo, covering various ways of implementing them and whether they are even necessary.
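One common way to implement undo and redo is a pair of stacks of reversible edit records. The Python sketch below assumes that design (it is not necessarily the one the book recommends); the `Edit` and `UndoableText` names are hypothetical. Each edit records enough to be inverted: an insertion is undone by deleting the same text, and a deletion by reinserting it.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """One reversible change: `text` was inserted (insert=True) or deleted at `pos`."""
    pos: int
    text: str
    insert: bool


class UndoableText:
    def __init__(self, text: str = "") -> None:
        self.text = text
        self.undo_stack: list[Edit] = []
        self.redo_stack: list[Edit] = []

    def _apply(self, e: Edit) -> None:
        if e.insert:
            self.text = self.text[:e.pos] + e.text + self.text[e.pos:]
        else:
            self.text = self.text[:e.pos] + self.text[e.pos + len(e.text):]

    def _do(self, e: Edit) -> None:
        self._apply(e)
        self.undo_stack.append(e)
        self.redo_stack.clear()  # a fresh edit discards the redo history

    def insert(self, pos: int, s: str) -> None:
        self._do(Edit(pos, s, True))

    def delete(self, pos: int, n: int) -> None:
        # Record the deleted characters so the deletion can be reversed.
        self._do(Edit(pos, self.text[pos:pos + n], False))

    def undo(self) -> None:
        e = self.undo_stack.pop()
        self._apply(Edit(e.pos, e.text, not e.insert))  # apply the inverse edit
        self.redo_stack.append(e)

    def redo(self) -> None:
        e = self.redo_stack.pop()
        self._apply(e)
        self.undo_stack.append(e)


t = UndoableText("hello")
t.insert(5, " world")
t.delete(0, 1)
print(t.text)  # ello world
t.undo()
t.undo()
print(t.text)  # hello
t.redo()
print(t.text)  # hello world
```

Clearing the redo stack on a new edit is a design choice: it throws away the undone branch of history. Emacs instead records undos as ordinary changes, so nothing is lost, at the cost of a less intuitive model.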
The next chapter deals with the design of the command set, or what exactly you are able to do with the editor. It lays out what a command set should strive for: responsiveness, consistency, permissiveness, progress, simplicity, uniformity, and extensibility. It also discusses special kinds of editing for certain types of syntax and how to enhance support for them. For commands that have multiple interpretations, like forward-word, each interpretation is discussed.
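To see why forward-word has more than one reasonable meaning, here are two hypothetical Python variants. One moves just past the end of the current or next word (roughly like Emacs's M-f); the other moves to the first character of the next word (roughly like vim's w, which also treats runs of punctuation as words, something this sketch ignores). Treating `str.isalnum` characters as word characters is an assumption; that definition is itself another design choice.

```python
def forward_word_end(text: str, pos: int) -> int:
    """Skip non-word characters, then word characters: land just past the end of a word."""
    n = len(text)
    while pos < n and not text[pos].isalnum():
        pos += 1
    while pos < n and text[pos].isalnum():
        pos += 1
    return pos


def forward_word_start(text: str, pos: int) -> int:
    """Skip the rest of the current word, then non-word characters: land on the next word."""
    n = len(text)
    while pos < n and text[pos].isalnum():
        pos += 1
    while pos < n and not text[pos].isalnum():
        pos += 1
    return pos


s = "the quick, brown fox"
print(forward_word_end(s, 0))    # 3  (just after "the")
print(forward_word_start(s, 0))  # 4  (on "quick")
```

Repeating each command from the same starting point visits different positions, which is exactly the kind of difference users notice when switching editors.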
All in all, this was a very interesting book, well worth reading. Since many of us spend a large portion of our time inside a text editor, I believe it is important to understand the basics of how one works. While this knowledge probably won’t help in everyday use, having an idea of what is happening behind the scenes of your editor is valuable when something goes wrong with it.