The first project I focused on when I became fun-employed was improving Vty: a terminal output library for Haskell software. Oh, I know what you're thinking... Well no, but *I* think it's ridiculous to spend time on a *terminal* library. Something like OpenGL or web related, hell, anything where any significant activity has happened in the past few years seems more reasonable. Eh! It was an entertaining challenge.

This is a post-mortem, of sorts, for VTY 4. The primary goal is to document the overall development process. A side goal is to provide an overview of the implementation aspects of VTY 4. It all should probably be separated into a few reasonably short posts instead of just one overlong post. Ah well!

I had taken over as maintainer of VTY 3 from Stefan O'Rear. VTY 3 already worked great and I didn't really see myself doing much. Still, there is always something to improve. In this case VTY did not support the various terminals people wanted to use, and characters that occupy multiple output columns were causing corruption.

Plus, optimization is damn fun! Trying my hand at optimizing Haskell code sounded great. Way more interesting than optimizing C++ ;-) Even better was that, for the most part, there was an already fast and already working version to compare performance against: VTY 3.

I figured low level optimization was what I should start on. Which only makes sense considering I was only interested in optimization fun at the time. ;-) The result of this was much faster than before. However, since I wasn't changing the design in any significant fashion, some optimizations could not be implemented. In the end this route was only useful for defining performance goals for a rewritten output layer.

In addition, I was learning Mandarin at the time. I wanted to, of course, create software to help me study. Since I have an infatuation with terminal user interfaces I wanted a terminal library that could handle double-width characters.
There was no reasonable way to implement this with VTY 3's implementation.

To ensure the re-implementation did not introduce regressions I repeatedly:

0. Characterized VTY 3's implementation, both in terms of functionality and performance.
1. Defined the semantics for the new implementation.
2. Verified the new implementation performed as expected: the semantics defined in 1 were implemented correctly; no characteristics of VTY 3's implementation that should be maintained were missing; and the characteristics of VTY 3 that caused issues were not maintained.

Not all the verification steps could be automated. Some I didn't know how to automate. Others were just verified through informal analysis.

The verification of some features was done by implementing an interactive test that guided and recorded the results of a manual review. For instance, consider the mapping from the library's representation of red to what is actually required to get a terminal to display red. The only reasonable way to verify that final mapping was for me to sit there, look at the output, and record whether or not the output was as expected.

Since the same tests were going to have to be performed repeatedly and I wanted to record the results of the tests, I formalized this process in software: tests/interactive_terminal_test.hs. This program worked by describing a test to the user, performing the test, asking the user whether the test passed or failed, and then recording the user's response.

This paid for itself very quickly. Not only did it provide the framework to easily create about 15 individual tests, but the program also worked as a sort of bug reporting tool for users.

The verification that could be entirely automated was done either through the type system or through QuickCheck 2 based verification tests. In short and loose terms: QuickCheck informally verifies that equations satisfy user specified predicates for arbitrarily generated input. Not all input is attempted; that'd take too long.
However, enough is tried to be reasonably sure that an implementation works. The looseness of the verification is made up for by the fact that QuickCheck tests are *extremely* easy and quick to implement.

I used a very simple Makefile to manage the execution of tests. The usage was:

make => build and run all tests.
make TEST => build and run the test with name TEST.

The output for a test was logged to a "results" directory. The results included a time and memory profile. Nothing fancy, but enough to support a very quick modify/test cycle.

As mentioned before, a particular source of trouble was the insanity of dealing with different terminals or terminal emulators. First off: I'm never going to refer to the physical box of relays from the 70s and 80s that is properly called a "terminal" again. "Terminal emulators" will now be referred to as terminals and the others should be archived and forgotten.

So... Terminals are software driven character displays paired with keyboard input. The software controls the display by writing two kinds of bytes to STDOUT: UTF-8 encoded character sequences, which are displayed, and control bytes, which modify the state of the terminal. Input from the keyboard and events are read from STDIN.

Why the fuck something as old as a terminal hasn't been beaten down into a simple, universally supported set of operations by now is a mystery. And no, curses and terminfo are not simple. I suspect that if support were dropped for everything that does not support a reasonable interface, things would only be better. For this reason I only focused on supporting the following terminals: xterm-256-color with UTF-8; Mac OS X Terminal.app; gnome terminal; kde terminal; and rxvt-unicode. These are all the terminals I could easily use and that behaved how I wanted.

Reliably optimizing VTY 4 was simple. The only optimizations I applied were to reach the goal that nothing, once verified, got slower during further development.
Each test provided basic performance feedback, such as a time and memory profile, in addition to verifying correctness. I investigated significant changes in the performance data and, for each case, determined whether the change was acceptable or indicated a performance regression. However, any change in performance that resulted from correcting the implementation was considered acceptable.

All this is quite different from my optimization work on VTY 3. In VTY 3 all the optimizations in the final release were micro-optimizations: hand application of primitive types and equations, which were difficult to verify compared to VTY 4's optimizations.

One source of VTY 4's speed was the use of a different serialization algorithm than VTY 3. VTY 3 serialized bytes to the terminal one operation at a time. This resulted in either too many IO operations or too many memory allocations. So for VTY 4 the output algorithm had two goals: batch operations and perform no memory allocation during serialization. Output was serialized to a fixed size buffer, then the buffer was output. While fast, to do this correctly the required buffer size must be known before serialization. The size could be computed by performing a fold over the same output structure that is later serialized.

The only test that performed the equivalent operations under VTY 3 and VTY 4 was the basic benchmark test. For VTY 3 the best results were:

total time = 3.48 secs (174 ticks @ 20 ms)
total alloc = 2,542,866,800 bytes (excludes profiling overheads)

For VTY 4 the results are:

total time = 1.84 secs (92 ticks @ 20 ms)
total alloc = 1,513,254,136 bytes (excludes profiling overheads)

Execution time dropped by roughly 47% and allocations by roughly 40%. A definite win!

Everything in this project went better than I expected except, of course, that the release took longer than expected. Ah well! I am more convinced than before that Haskell can provide a powerful systems programming environment.
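
For the curious, the two-pass serialization idea described above (fold to get the size, then fill a fixed size buffer and output it in one go) can be sketched roughly as follows. This is not VTY 4's actual code: the `OutputOp` type and the names `opBytes`, `requiredBytes`, `serializeInto`, `outputOps` and `serializeToList` are all made up for illustration, and the sketch builds small intermediate lists that a real allocation-free implementation would avoid.

```haskell
import Control.Monad (foldM)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (poke)
import System.IO (Handle, hPutBuf)

-- Hypothetical output operations; the real output structure is richer.
data OutputOp
  = MoveCursor Int Int   -- row and column, 0-based
  | WriteBytes [Word8]   -- text that is already UTF-8 encoded

-- The bytes one operation emits. Both passes use this, so they cannot disagree.
opBytes :: OutputOp -> [Word8]
opBytes (MoveCursor r c) =
  map (fromIntegral . ord) ("\ESC[" ++ show (r + 1) ++ ";" ++ show (c + 1) ++ "H")
opBytes (WriteBytes bs) = bs

-- Pass 1: fold over the operations to find the exact buffer size.
requiredBytes :: [OutputOp] -> Int
requiredBytes = foldl' (\n op -> n + length (opBytes op)) 0

-- Pass 2: write every operation into the buffer and return the end pointer.
serializeInto :: Ptr Word8 -> [OutputOp] -> IO (Ptr Word8)
serializeInto = foldM writeOp
  where
    writeOp ptr op = foldM writeByte ptr (opBytes op)
    writeByte ptr b = poke ptr b >> pure (ptr `plusPtr` 1)

-- Size the buffer, fill it, then hand it to the handle in a single call.
outputOps :: Handle -> [OutputOp] -> IO ()
outputOps h ops =
  allocaBytes size $ \buf -> do
    _ <- serializeInto buf ops
    hPutBuf h buf size
  where
    size = requiredBytes ops

-- The same two passes, but returning the bytes so they can be inspected.
serializeToList :: [OutputOp] -> IO [Word8]
serializeToList ops =
  allocaBytes size $ \buf -> do
    _ <- serializeInto buf ops
    peekArray size buf
  where
    size = requiredBytes ops
```

Calling `outputOps stdout [MoveCursor 0 0, WriteBytes [104, 105]]` would perform one write to the terminal instead of one per operation. Because the size pass and the write pass share `opBytes`, the buffer can never be too small, which is the property that makes the fixed size buffer safe.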