Haskell track: coding style guide

The Haskell programming language, like most programming languages, offers a number of ways to format code. Many programmers abuse this freedom and write unreadable (and thus incomprehensible and unmaintainable) code. While there is more than one way to properly format code, here is a set of guidelines which have been found useful in practice. Note that marks will be taken off for poor formatting (more marks deducted as the term goes on). Some of these guidelines may seem amazingly anal, but they really make a difference when reading code. Remember: you are writing code not just for the compiler, but for other people to read as well. The other person reading your code will most likely be you six months from now, so making sure that your code is readable is extremely important.

In the following, each item has a code associated with it to the left of the description of the item. This code will be used to specify the problem when correcting your code. It is up to you to match the code with the item. Hopefully this will encourage you to read this style guide :-) As a general rule, the earlier items in a section are more important than the later ones and/or represent more common errors.

In order to make life easier for you, I am supplying you with an automatic style checker. It won't catch all of these errors by any means, but it will catch a lot of them. You will be required to run your code through the style checker before you submit it; if it fails, I won't grade it (unless you can convince me that the style checker itself has a bug, which is possible). The Haskell style checker is written in Python, which is available on the CS clusters.

Here is what you have to do to get the style checker working:

Download the style checker from this page onto your CS cluster account. The easiest way to do this is to come to the CS lab, log on to a computer, start a web browser (e.g. Mozilla), surf to this page, and then do "Save Page As" in the "File" menu (this works for Mozilla; other browsers have similar commands). It will suggest you save the file under the name haskell_style_check and you should agree. This will put the style checker program into one of your directories (probably your home directory).
Start up a terminal program and cd to the directory containing the style checker you just downloaded. Make it into an executable program by typing:
```
% chmod +x haskell_style_check
```
(where % is the unix prompt).
Make a subdirectory of your home directory called bin (which just means a location for programs; the name really doesn't matter but it's traditional). You can do this by typing
```
% mkdir ~/bin
```
at the unix prompt (the tilde (~) character is shorthand for your home directory).
Move the style checker into the newly-created bin directory:
```
% mv haskell_style_check ~/bin
```
Now comes the tricky part. You have to adjust your path so that it includes your bin directory. The path is just a list of directories that contain programs; it's the way the computer knows where to find programs. There is a default path set up for you when you start your computer, but it may not include the bin directory (because most users don't have one). To adjust your path you have to do this:
- First check to see if you already have the bin directory in your path (which you should, since that's how the CS cluster is set up by default). Type
```
% echo $PATH
```
  at the prompt and if the result includes "/home/<your-login-name>/bin" then you're all set and you can skip the rest of the steps in this section.
- Otherwise, copy a file called .bashrc to your home directory:
```
    % cp /home/setup/.bashrc ~
```
- Then, look in the .bashrc file for the following line:
```
    # export PATH="$PATH:/new/bin/dir"
```
- Open the file in your text editor and change that line to:
```
    export PATH="$PATH:$HOME/bin"
```
  (Note that you get rid of the initial # character.)
- Save the file, get out of the text editor and do this at the unix prompt:
```
    % source ~/.bashrc
    % hash -r
```

Now you're all set. You will be able to use the style checker from any directory you're in, and you'll never have to download it again. To style check a file called e.g. foo.hs, do:

    % haskell_style_check foo.hs

You can also style check multiple files all at once:

    % haskell_style_check foo.hs bar.hs baz.hs

Don't be alarmed if there are a lot of errors reported; just go through the file and fix them. Some lines will probably have multiple style violations; you should fix all of them. You'll probably hate me for writing this program at first, but your code will become much more readable as a result. If you think you've found a bug in it, let me know at once; this is a work in progress. Note that the style checker will sometimes be too stupid to know when it's in the middle of a comment or a literal string, so it may report errors that aren't really errors in those cases. If so, just disregard them.
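
To see why a line-based checker can be fooled by comments and string literals, here is a rough sketch of how such checks might be written. It is written in Haskell purely for illustration; it is not the code of the actual checker (which is written in Python), and every name in it ('Violation', 'checkLines', 'checkLine', 'commaWithoutSpace', 'maxLineLength') is made up for this example.

```haskell
-- | One reported problem: a 1-based line number and a style code.
type Violation = (Int, String)

-- | The limit used by the LINE_LENGTH rule.
maxLineLength :: Int
maxLineLength = 78

-- | Check every line on its own, with no knowledge of whether the
-- text is inside a comment or a string literal.
checkLines :: String -> [Violation]
checkLines source = concat (zipWith checkLine [1 ..] (lines source))

-- | Collect the violations found on a single line.
checkLine :: Int -> String -> [Violation]
checkLine lineNum text =
    [(lineNum, "TABS") | '\t' `elem` text]
        ++ [(lineNum, "LINE_LENGTH") | length text > maxLineLength]
        ++ [(lineNum, "COMMA_SPACE") | commaWithoutSpace text]

-- | True if some comma is directly followed by a non-space character.
commaWithoutSpace :: String -> Bool
commaWithoutSpace (',' : c : _) | c /= ' ' = True
commaWithoutSpace (_ : rest) = commaWithoutSpace rest
commaWithoutSpace [] = False
```

Because this sketch only looks at characters, a line such as `s = "a,b"` is reported as a COMMA_SPACE violation even though the comma is inside a string. That is the kind of false report you can disregard.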

The most common style mistakes

These mistakes occur so often that they're almost universal. Therefore, please pay particular attention to avoiding them. Look up each code in the catalog below to find its description. Style mistakes followed by an asterisk (*) are caught by the style checker.

TABS* Using tab characters in your code.
OPERATOR_SPACE* Not putting spaces around operators.
COMMA_SPACE* Not putting a space after a comma.
LINE_LENGTH* Writing lines longer than 78 characters.
USAGE_STMT Missing or inadequate usage statement.
EMPTY_LINES Using too many or too few empty (blank) lines in functions.
COMMENTS_FULL_SENTENCES Writing comments that are not full sentences.
COMMENTS_GRAMMATICAL Writing comments that are not grammatically correct or are misspelled.
COMMENT_SPACE* Not putting a space after the open-comment symbol "--" or "{-" and/or before the close-comment symbol "-}".
COMMENT_HEADER Not writing a proper comment at the head of a function.
FUNCTION_PROTOTYPES Not writing type signatures (Haskell's counterpart of prototypes) for all the functions defined in a file.
FUNCTION_BLANK_LINES Not separating function definitions by blank lines.

Catalog of style mistakes

General

[TABS]

Never, ever, ever use the tab character (ascii 0x9)! Different people use different tab settings, and code that looks just fine with a tab width of 2 becomes unreadable with a tab width of 8. Unfortunately, many text editing programs will stick in tab characters without making it obvious that they're doing it. If you use emacs for text editing (which I recommend) put the following lines in a file called .emacs, which should be placed in your home directory:
```
    (add-hook 'haskell-mode-hook
      (lambda ()
        ;; Indent with spaces only; never insert tab characters.
        (setq indent-tabs-mode nil)
        ;; other customizations, if any, go here
        ))
```
This is actually emacs-lisp code, but don't worry about that. Then exit and restart emacs. Now when you hit the tab key while editing Haskell code, emacs won't actually put any tab characters in your code, but instead will just put in spaces. In addition, emacs is smart enough that when you're editing Haskell code and you hit the tab key, emacs will automatically indent the code to a reasonable point on the line. Sometimes there are multiple reasonable points on the line for indenting; hitting the tab key successively will cycle between them. If you're using emacs in the CS cluster, it should also color your file in a meaningful way (comments will be a different color, for instance). Emacs is very nice to use for editing Haskell code, as it is for most programming languages.

If you're using the vim editor instead of emacs, you can see all the tab characters by typing:
```
    :set list
```
into the editor while in command mode. This will make all tab characters look like "^I" (a circumflex accent followed by a capital I). This makes it easy to go through a file and replace all tabs by e.g. four spaces. Better still, you can put this into your ~/.vimrc file:
```
    set expandtab
```
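and pressing the tab key will insert spaces instead of a tab character (tabs already in a file can then be converted with the :retab command).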

If you're using an editor other than emacs or vim, your job is harder; you have to go over the line character by character using the forward-character arrow and find out where the tabs are and replace them.

If you don't like removing tabs from your code manually, here's a trick that will help. Let's say you have a file called foo.hs and you've run the style checker on it, and every other line has tabs in it. Just do this from the unix prompt (% in this example):
```
% sed -e 's/\t/    /g' < foo.hs > foo.hs.notabs
% mv foo.hs.notabs foo.hs
```
and your file will no longer have any tabs. On the other hand, this can mess up the indentation, so you should go over it afterwards to make sure it looks presentable and add spaces if necessary. If you don't, and the result is unreadable, I will probably make you redo it.

The "sed" in the command line is a program called sed (which means "stream editor"). It does simple editing on files on a line-by-line basis. So when you type
```
sed -e 's/\t/    /g' < foo.hs > foo.hs.notabs
             ^^^^ 4 space characters here
```
it executes the command between the single quotes:
```
's/\t/    /g'
```
on each line of the file foo.hs, putting the results into a new file called foo.hs.notabs. This command means "substitute (s) four space characters (the text between the last two / characters) for every tab character (\t), and do it for every tab in the line (g, which means global)". The \t escape is a GNU sed extension, so other versions of sed may need a literal tab character in its place. Note that you have to type this in exactly as I've described it or it won't work.

If you want, you can use more or fewer than four space characters per tab. Most editors display a tab as eight space characters by default, so that might be a good alternative. That would look like this:
```
sed -e 's/\t/        /g' < foo.hs > foo.hs.notabs
             ^^^^^^^^ 8 space characters here
```
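
Replacing every tab by a fixed number of spaces is what can mess up the indentation, because a tab really means "advance to the next tab stop". As an alternative, here is a minimal sketch of a tab expander that respects tab stops. The function name 'expandTabs' and its 'width' parameter are made up for this example, and the sketch assumes 'width' is greater than zero.

```haskell
-- | Replace each tab character with enough spaces to reach the next
-- tab stop.  Tab stops are assumed to occur every 'width' columns,
-- and 'width' is assumed to be positive.
expandTabs :: Int -> String -> String
expandTabs width = go 0
  where
    -- The first argument of 'go' is the current column (0-based).
    go _ [] = []
    go col (c : cs)
        | c == '\t' =
            let n = width - col `mod` width
            in replicate n ' ' ++ go (col + n) cs
        | c == '\n' = c : go 0 cs
        | otherwise = c : go (col + 1) cs
```

A program whose main function is `interact (expandTabs 8)` would then read a file on standard input and write the tab-free version to standard output. You should still look over the result to make sure it is presentable.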
[OPERATOR_SPACE]

Use a single space to separate variable names from operators, i.e. write
```
    let a = b + c * d in ...
```
instead of
```
    let a=b+c*d in ...
```
[COMMA_SPACE]

Always put a space after a comma. There are no exceptions to this rule.
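
As a small illustration of this rule (the names 'primes' and 'origin' are made up for the example):

```haskell
-- Good: a space follows every comma, in lists and tuples alike.
primes :: [Int]
primes = [2, 3, 5, 7, 11]

origin :: (Int, Int, Int)
origin = (0, 0, 0)

-- Bad (shown only inside a comment):
--   primes = [2,3,5,7,11]
--   origin = (0,0,0)
```
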
[LINE_LENGTH]

Don't write lines that are longer than 78 characters. Long lines tend to be wrapped, or worse, truncated when the source is printed out. Printing out source code is a valuable way to review your code. It is almost never necessary to have long lines.
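
One way to shorten a long line is to break it before an operator and indent the continuation lines. The sketch below shows this on a made-up function called 'describe'; other ways of breaking the line are equally acceptable as long as the result is readable.

```haskell
-- Too long (95 characters), shown only inside a comment:
--   describe name age city = "Name: " ++ name ++ ", age: " ++ show age ++ ", city: " ++ city

-- Better: break before each operator and indent the continuation lines.
describe :: String -> Int -> String -> String
describe name age city =
    "Name: " ++ name
        ++ ", age: " ++ show age
        ++ ", city: " ++ city
```
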
[MAGIC_NUMBER]

Avoid putting a large number that has no obvious relevance to the surrounding code into a file. This is known as a "magic number". The reason for avoiding this is twofold:
- It's not usually clear from the context what the significance of the number is.
- The same number tends to occur several times in the file, which causes problems when you want to change the value.
The right thing to do is to bind the number to a name once, and then to use that name everywhere. This also makes it easy to change the number later if it becomes necessary.
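
For instance, instead of writing the literal 150 in several places, you might write something like the following. The names 'maxStudents', 'hasRoom' and 'seatsLeft', and the number itself, are invented for this example.

```haskell
-- | The number appears exactly once, with a name that explains it.
maxStudents :: Int
maxStudents = 150

-- | True if another student can still enroll.
hasRoom :: Int -> Bool
hasRoom enrolled = enrolled < maxStudents

-- | The number of places that are still free.
seatsLeft :: Int -> Int
seatsLeft enrolled = maxStudents - enrolled
```

If the limit ever changes, only the definition of 'maxStudents' has to be edited.
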
[USELESS_CODE]

Don't put in code that has no function or no effect. If it's code that was only used for debugging purposes, it should be removed before you submit your lab.
[USAGE_STMT]

If a stand-alone program is called with incorrect arguments, it should detect that and print a usage statement to the terminal. The usage statement should include the program name. As a general rule, any error that involves the user supplying invalid command-line arguments should give rise to a usage statement. You should try to make your usage statements comprehensive enough so that one statement will work for all such errors.
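
Here is a minimal sketch of one way to do this, for a hypothetical program that expects a single non-negative integer argument. The names 'usage', 'parseArgs' and 'run' are made up for the example, and printing the message to stderr with a failing exit status is my assumption about what is sensible, not a requirement stated above.

```haskell
import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)
import Text.Read (readMaybe)

-- | Build the usage message.  The program name is passed in so that
-- the message stays correct even if the executable is renamed.
usage :: String -> String
usage prog = "usage: " ++ prog ++ " <count>"

-- | Accept exactly one argument that parses as a non-negative Int.
parseArgs :: [String] -> Maybe Int
parseArgs [arg] = case readMaybe arg of
    Just n | n >= 0 -> Just n
    _ -> Nothing
parseArgs _ = Nothing

-- | Call this from 'main'.  Every kind of invalid argument list
-- produces the same usage statement.
run :: IO ()
run = do
    args <- getArgs
    case parseArgs args of
        Just n -> putStrLn ("count = " ++ show n)
        Nothing -> do
            prog <- getProgName
            hPutStrLn stderr (usage prog)
            exitFailure
```

Keeping the argument checking in a pure function such as 'parseArgs' means that a single usage statement covers missing, extra and malformed arguments alike.
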
[STMTS_ON_LINE]

Never put more than one statement on a line. It makes for unreadable code.
[EMPTY_LINES]

Do not put large numbers of empty lines (> 2) between code sections unless there is a clear need to distinguish different sections of the code. Conversely, do put an empty line between logical sections in a single function, and especially between different functions.
[COMMENTS_FULL_SENTENCES]

This is the single most common style mistake. If a comment is a full sentence, its first word should be capitalized, unless it is an identifier that begins with a lower case letter (never alter the case of identifiers!), and it should end in a period. I prefer comments that are complete sentences. You should use two spaces after a sentence-ending period.

When you need to refer to identifiers, put them in surrounding single quotes, e.g.
```
    -- 'nitems' represents the number of items in the list.
```
If a comment is very short, it doesn't have to be a full sentence or end in a period e.g.
```
    let m = 10 in   -- maximum value
```
This is called an "inline comment". Use these only when describing something; in the above code snippet, for example, you're saying "The value 'm' represents a maximum value."
[COMMENTS_GRAMMATICAL]

Comments should be grammatically correct. In particular, incorrect spelling is unacceptable. I hate to sound like your high school English teacher, but it's a pain to read code with tons of spelling mistakes. Use a spell checker if you have to.

[COMMENT_SPACE]

Put a space after the open-comment symbol(s) and before the close-comment symbol i.e. do this:


    -- This is a comment that is easy to read.
    {- So is this. -}

and not this:


    --This is a comment that is harder to read.
    {-So is this.-}

[COMMENT_NON_OBVIOUS]

Write comments for anything that isn't completely obvious from the context. In particular, write comments for any tricky algorithm or code you are using. When in doubt, comment more rather than less.
[COMMENT_REDUNDANT]

Conversely, don't make completely redundant comments, e.g.
```
    let i = 1 in  -- Set i to 1.
```
What constitutes redundancy is often a judgment call. If in doubt, comment more rather than less.
[COMMENT_HEADER]

You should almost always put a comment at the beginning of each function describing what it does. The only exception is when you have a series of very similar functions which are written out one after another, and where the first comment applies (suitably modified) to all of them. These "header comments" are by far the most important kind of comment, because even if the person reading your code has no idea how a given function works, the header comment will at least tell them what it does and how to use it. You should state what each of the arguments represents and what the function returns. You may also want to describe the algorithm used, its efficiency, and any other relevant facts. An example:
```
    --
    -- insertion_sort
    --
    --    This function takes a list and returns a new list containing the
    --    same elements as the original list but sorted in ascending order.
    --    The list is sorted using the insertion sort algorithm.  
    --    This algorithm has a time complexity of O(n^2) where 'n' is the
    --    length of the list.
    --
    --    Arguments:
    --    -- lst:  the input list
    --
    --    Returns: the new list
    --

    insertion_sort :: Ord a => [a] -> [a]
    -- code...
```
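
For completeness, here is one way the code under such a header comment might look. This is only a sketch: the helper 'insert_sorted' is a name made up for this example, and it gets a header comment of its own.

```haskell
insertion_sort :: Ord a => [a] -> [a]
insertion_sort = foldr insert_sorted []

--
-- insert_sorted
--
--    This function inserts one element into a list that is already
--    sorted in ascending order, so that the result is still sorted.
--
--    Arguments:
--    -- x:    the element to insert
--    -- lst:  the sorted input list
--
--    Returns: the new sorted list
--

insert_sorted :: Ord a => a -> [a] -> [a]
insert_sorted x [] = [x]
insert_sorted x (y : ys)
    | x <= y = x : y : ys
    | otherwise = y : insert_sorted x ys
```
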
[COMMENTS_CONSISTENT_WITH_CODE]

Comments that contradict the code are worse than no comments. ALWAYS MAKE A PRIORITY OF KEEPING THE COMMENTS UP-TO-DATE WHEN THE CODE CHANGES!
[COMMENT_INDENT]

Always indent your comments to the same degree as the surrounding code.
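
A short sketch of what this means in practice, using a made-up function called 'average' (returning 0 for an empty list is an arbitrary choice for the example):

```haskell
-- This function computes the mean of a list, or 0 for an empty list.
average :: [Double] -> Double
average xs
    | null xs = 0
    | otherwise = total / count
  where
    -- The length is converted because '/' needs a fractional type.
    count = fromIntegral (length xs)
    total = sum xs
```

The first comment starts in the same column as the function it describes, and the comment inside the where clause is indented to match the definitions it sits next to.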

[COMMENT_INLINE]

Try to line up inline comments where convenient. In other words, don't do this:

    let x = 10  -- some cool comment about x
        y = 20 in  -- some even cooler comment about y

Instead, do this:

    let x = 10     -- some cool comment about x
        y = 20 in  -- some even cooler comment about y

[COMMENT_PRECEDING]

Do not write comments that apply to the preceding code if you can possibly avoid it. Try to write comments that refer to the current line of code or to the lines of code which immediately follow. For instance, this is bad:
```
    fib :: Int -> Int
    fib n = ...
    --
    -- This function computes the 'n'th Fibonacci number.
    --
```
and this is good:
```
    --
    -- This function computes the 'n'th Fibonacci number.
    --
    fib :: Int -> Int
    fib n = ...
```

Functions

[FUNCTION_BLANK_LINES]

Please separate your function definitions by one or two blank lines. Otherwise it's hard to find where a function definition begins.
[FUNCTION_STARTING_COLUMN]

Start the line that begins a function in column 0 (the leftmost column).
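
Here is a small sketch that follows both of the function rules in this section. The functions 'double' and 'sumSquares' are invented for the example.

```haskell
--
-- This function doubles its argument.
--
double :: Int -> Int
double x = x * 2


--
-- This function computes the sum of the squares of a list of numbers.
--
sumSquares :: [Int] -> Int
sumSquares xs = sum (map square xs)
  where
    square x = x * x
```

Each definition starts in the leftmost column, the two definitions are separated by blank lines, and only the local definition in the where clause is indented.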

Finally...

Don't worry if you can't remember all of these rules; I don't expect you to. At this point it's more important that you develop an intuition for what is good and what is bad style, and if you aren't sure, you can refer back to this page later.