Lab: Quarantine Style in Haskell

Quarantine style in Haskell - Introduction

In this lab you will implement a solution of the Quarantine style in Haskell for the term frequency task.

(The unchanged task description is available in the Prelude of the Pipeline style lab.)

Haskell prevents performing I/O without dealing with monads, so we will be effectively forced to write the solution in the Quarantine style. As opposed to the solution for the Pipeline style, in which we circumvented the problem by performing all the input at the beginning and the output at the end and hardcoded stopwords and number of frequencies to print, in this lab we will deal with I/O properly.

Create and Clone Your GitHub Repository

To create the repository for this lab, fork this starter repository.

Your repository now exists on the GitHub servers. If you want to work on it, you first have to “clone” it on your computer.

Implement Your Program

Once you have the repository (a directory) on your computer, you can open it in your IDE (e.g., in VS Code). You can easily do this using the code command, passing as an argument the path to the appropriate directory:

code ~/lab-h2-quarantine-haskell

Copy the library file Split.hs from the starter repository of the Pipeline lab in Haskell.

Before you start “hacking”, please read the README and the rest of this lab.

The file named TermFrequency.hs already contains a skeleton of the program, including the type signatures of all the functions you need to implement.

For most of them, you can borrow the Haskell implementation you wrote for the Pipeline style in Haskell. Here is a list of the differences:

The main function is now supposed to get the command-line arguments, pass them as a list of strings to the mainWithArgs function, and print the result.
The mainWithArgs function is defined in two cases using pattern matching on its first and only parameter (the list of arguments): one case when that list has only two elements, as we expect when the program is launched correctly, and the other cases for all the remaining situations (_). Use the read function to convert the second argument to an integer when needed. When the argument list is not appropriate, the function should indicate this by returning a string with an error message.
The interestingWords function filters the interesting words based on the list of stop words read from the stop_words.txt file.

Working with Monads in Haskell

We saw that monads have two fundamental operations: “unit” and “bind”. Let’s play a bit with them in Haskell with the Maybe monad, which represents computations that might fail.

Suppose we have two dummy databases, one that associates names to user IDs and another that associates user IDs to passwords:

> name2id = [("Matthias", 123), ("Luca", 124)]
> id2pass = [(123, "password"), (124, "Passw0rd!")]

We can use the lookup function to retrieve the value (second element of the tuple) for a given key (first element), if the key exists. The function returns a value wrapped in a Just if the key is found; otherwise, we get Nothing.

> :t lookup
lookup :: Eq a => a -> [(a, b)] -> Maybe b
> lookup "Luca" name2id
Just 124
> lookup "Igor" name2id
Nothing

How can we continue the process and find the password, only when the user ID is found? We need to use lookup again, which is a function returning a monadic value. Thus, we have to use “bind” to compose.

In Haskell, “bind” is realized with the >>= operator. We can inspect its type:

> :t (>>=)
(>>=) :: Monad m => m a -> (a -> m b) -> m b

For any monad m, it takes a monadic value (m a) and a function that takes a value and returns a monadic value (a -> m b), and it produces a monadic value (m b).

We can use it to chain the two lookups:

> findPass name = lookup name name2id >>= \id -> lookup id id2pass
> findPass "Luca"
Just "Passw0rd!"
> findPass "Igor"
Nothing

What if we also want to chain another function, for example one that prints a welcome message with the username and the password when everything is found?

> greet name pwd = "Welcome, " ++ name ++ "! Your password is " ++ pwd

By itself, this is not a monadic function. We can’t directly use >>=. We can proceed in two ways:

We can create a monadic value with the “unit” operation. In Haskell, the function return (not to be confused with the return keyword for the return statement in many other languages) is used to create a monadic value.

The function return is highly generic: it takes a value of any type (a) and produces a monadic value of that type (m a), where m is the monad we are working with:
```
 > :t return
 return :: Monad m => a -> m a
```
We can use it to create a monadic value that prints a welcome message:
```
 > login name = findPass name >>= \pwd -> return (greet name pwd)
 > login "Luca"
 Just "Welcome, Luca! Your password is Passw0rd!"
 > login "Igor"
 Nothing
```
We can use the “map” operation of monads. In Haskell, the function fmap applies a function to the value “inside” a monadic value. The f in fmap stands for functor; fmap works for monads, because all monads are functors.

We can check the type of fmap:
```
 > :t fmap
 fmap :: Functor f => (a -> b) -> f a -> f b
```
The first parameter is the mapping function (from a to b) and the second parameter is our monadic value (f a). The result is a monadic value (f b).

Thus we can do:
```
 > login name = fmap (\pwd -> greet name pwd) (findPass name)
```
to get the same behavior as before.

There are also some variants of fmap which come handy and can be used as infix operators. For example <$> is an infix version of fmap:
```
 > import Data.Functor ((<$>))
 > login name = (\pwd -> greet name pwd) <$> findPass name
```
If you are building sequences of functions that you want to read “left to right”, you can also use <&>, which flips the order of the arguments:
```
 > import Data.Functor ((<&>)
 > login name = findPass name <&> (\pwd -> greet name pwd)
```

The IO Monad in Haskell

Functions that perform I/O in Haskell involve the IO monad. Unlike in JavaScript and Java, we don’t need to implement/simulate it ourselves, because it is built into the language.

You will need to use the following functions:

getArgs :: IO [String]  -- no parameters, returns an IO of list of strings with the command-line arguments
readFile :: FilePath -> IO String  -- one string as a parameter, returns an IO of string with the content of the file
putStrLn :: String -> IO ()  -- one string as a parameter, prints it to stdout, returns an IO of unit (i.e., no information)

There is no explicit “run” operation for the IO monad (like we had simulated in JavaScript and Java). Haskell “runs” the monadic I/O actions when it executes the main function, whose type is IO ().

To quote the documentation in the source code of GHC, the Haskell compiler:

A value of type IO a is a computation which, when performed, does some I/O before returning a value of type a.

There is really only one way to “perform” an I/O action: bind it to main in your program. When your program is run, the I/O will be performed. It isn’t possible to perform I/O from an arbitrary function, unless that function is itself in the ‘IO’ monad and called at some point, directly or indirectly, from main.

Read the above once more, it’s truly a fundamental description!

Don’t get deceived by the behavior of the ghci REPL: it “runs” the I/O actions, giving the illusion that you can use the results directly:

> readFile "stop_words.txt"
"a,able,about,..."

But in reality, the readFile function returns an IO String, which is not a string but an action that, when executed, will read the file and return its content.

> :t readFile "stop_words.txt"
readFile "stop_words.txt" :: IO String

The “do” notation

Haskell also offers a special syntax (remember “syntactic sugar”?) for working with monads in a convenient way: the do notation. Feel free to explore how it works (professional Haskell programmers use it frequently), but DO NOT use it in this lab. We want you to practice and understand the basic operations with monads (unit, bind, map); using the do notation would just confuse things at the moment.

Compile and Run Your Program

You can compile your program with ghc:

ghc TermFrequency.hs

This will produce some auxiliary files and an executable file named TermFrequency. You can run it, for example, with:

./TermFrequency input-small.txt 2

You can also use the testing infrastructure (after installing the dependencies with npm install):

node test.js --size small --lang haskell --main "./TermFrequency"