Lab: Quarantine Style in Haskell
Quarantine style in Haskell - Introduction
In this lab you will implement a solution of the Quarantine style in Haskell for the term frequency task.
(The unchanged task description is available in the Prelude of the Pipeline style lab.)
Haskell does not allow I/O outside of monads, so we are effectively forced to write the solution in the Quarantine style. In the solution for the Pipeline style we circumvented the problem by performing all the input at the beginning and all the output at the end, and by hardcoding the stop words and the number of frequencies to print. In this lab we will deal with I/O properly.
Create and Clone Your GitHub Repository
To create the repository for this lab, fork this starter repository.
Your repository now exists on the GitHub servers. If you want to work on it, you first have to “clone” it on your computer.
Implement Your Program
Once you have the repository (a directory) on your computer, you can open it in your IDE (e.g., in VS Code).
You can easily do this using the code command, passing as an argument the path to the appropriate directory:
code ~/lab-h2-quarantine-haskell
Copy the library file Split.hs from the starter repository of the Pipeline lab in Haskell.
Before you start “hacking”, please read the README and the rest of this lab.
The file named TermFrequency.hs already contains a skeleton of the program, including the type signatures of all the functions you need to implement.
For most of them, you can borrow the Haskell implementation you wrote for the Pipeline style in Haskell. Here is a list of the differences:
- The main function is now supposed to get the command-line arguments, pass them as a list of strings to the mainWithArgs function, and print the result.
- The mainWithArgs function is defined by two cases using pattern matching on its only parameter (the list of arguments): one case for a list with exactly two elements, as we expect when the program is launched correctly, and a catch-all case (_) for all the remaining situations. Use the read function to convert the second argument to an integer when needed. When the argument list is not appropriate, the function should indicate this by returning a string with an error message.
- The interestingWords function filters the interesting words based on the list of stop words read from the stop_words.txt file.
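The two-case pattern match on the argument list can be sketched on a pure function. This is only an illustration: describeArgs and its messages are hypothetical and are not part of the lab skeleton, whose mainWithArgs has its own type signature in TermFrequency.hs.

```haskell
-- Hypothetical helper (not in the lab skeleton): it only illustrates
-- pattern matching on a list of arguments and converting a String with read.
describeArgs :: [String] -> String
-- First case: the list has exactly two elements.
describeArgs [path, count] =
  "file: " ++ path ++ ", top " ++ show (read count :: Int) ++ " words"
-- Catch-all case: any other list (empty, one element, three or more).
describeArgs _ = "usage: TermFrequency <input-file> <count>"
```

Note that read fails at run time when the string is not a valid number, so this sketch assumes a well-formed second argument.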
Working with Monads in Haskell
We saw that monads have two fundamental operations: “unit” and “bind”. Let’s play a bit with them in Haskell with the Maybe monad, which represents computations that might fail.
Suppose we have two dummy databases, one that associates names to user IDs and another that associates user IDs to passwords:
> name2id = [("Matthias", 123), ("Luca", 124)]
> id2pass = [(123, "password"), (124, "Passw0rd!")]
We can use the lookup function to retrieve the value (second element of the tuple) for a given key (first element), if the key exists.
The function returns a value wrapped in a Just if the key is found; otherwise, we get Nothing.
> :t lookup
lookup :: Eq a => a -> [(a, b)] -> Maybe b
> lookup "Luca" name2id
Just 124
> lookup "Igor" name2id
Nothing
How can we continue the process and find the password, but only when the user ID is found? We need to use lookup again, which is a function returning a monadic value. Thus, we have to use “bind” to compose the two lookups.
In Haskell, “bind” is realized with the >>= operator. We can inspect its type:
> :t (>>=)
(>>=) :: Monad m => m a -> (a -> m b) -> m b
For any monad m, it takes a monadic value (m a) and a function that takes a value and returns a monadic value (a -> m b), and it produces a monadic value (m b).
We can use it to chain the two lookups:
> findPass name = lookup name name2id >>= \id -> lookup id id2pass
> findPass "Luca"
Just "Passw0rd!"
> findPass "Igor"
Nothing
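To see what >>= does for Maybe, it can help to compare the chained version with one that spells out the two cases by hand. The sketch below repeats the two dummy databases so that it can be loaded in GHCi as a file; findPassExplicit is a hypothetical name introduced only for this comparison.

```haskell
name2id :: [(String, Int)]
name2id = [("Matthias", 123), ("Luca", 124)]

id2pass :: [(Int, String)]
id2pass = [(123, "password"), (124, "Passw0rd!")]

-- The version from the text: bind passes the user ID to the second lookup
-- only when the first lookup succeeded.
findPass :: String -> Maybe String
findPass name = lookup name name2id >>= \uid -> lookup uid id2pass

-- Hypothetical equivalent that writes out what >>= does for Maybe.
findPassExplicit :: String -> Maybe String
findPassExplicit name =
  case lookup name name2id of
    Nothing  -> Nothing               -- the failure is propagated
    Just uid -> lookup uid id2pass    -- the value is unwrapped and passed on
```

Both functions return the same result for every name; the bind version simply hides the case analysis.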
What if we also want to chain another function, for example one that builds a welcome message with the username and the password when everything is found?
> greet name pwd = "Welcome, " ++ name ++ "! Your password is " ++ pwd
By itself, this is not a monadic function. We can’t directly use >>=. We can proceed in two ways:
- We can create a monadic value with the “unit” operation. In Haskell, the function return (not to be confused with the return keyword for the return statement in many other languages) is used to create a monadic value.
  The function return is highly generic: it takes a value of any type (a) and produces a monadic value of that type (m a), where m is the monad we are working with:
  > :t return
  return :: Monad m => a -> m a
  We can use it to wrap the welcome message in a monadic value:
  > login name = findPass name >>= \pwd -> return (greet name pwd)
  > login "Luca"
  Just "Welcome, Luca! Your password is Passw0rd!"
  > login "Igor"
  Nothing
- We can use the “map” operation of monads. In Haskell, the function fmap applies a function to the value “inside” a monadic value. The f in fmap stands for functor; fmap works for monads, because all monads are functors.
  We can check the type of fmap:
  > :t fmap
  fmap :: Functor f => (a -> b) -> f a -> f b
  The first parameter is the mapping function (from a to b) and the second parameter is our monadic value (f a). The result is a monadic value (f b).
  Thus we can do:
  > login name = fmap (\pwd -> greet name pwd) (findPass name)
  to get the same behavior as before.
  There are also some variants of fmap which come in handy and can be used as infix operators. For example, <$> is an infix version of fmap:
  > import Data.Functor ((<$>))
  > login name = (\pwd -> greet name pwd) <$> findPass name
  If you are building sequences of functions that you want to read “left to right”, you can also use <&>, which flips the order of the arguments:
  > import Data.Functor ((<&>))
  > login name = findPass name <&> (\pwd -> greet name pwd)
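The variants of login can be put side by side in one file to check that they agree. This is a minimal sketch: the names loginBind, loginFmap and loginFlipped are hypothetical, and the lambda \pwd -> greet name pwd is shortened to the partial application greet name, which denotes the same function.

```haskell
import Data.Functor ((<&>))

name2id :: [(String, Int)]
name2id = [("Matthias", 123), ("Luca", 124)]

id2pass :: [(Int, String)]
id2pass = [(123, "password"), (124, "Passw0rd!")]

findPass :: String -> Maybe String
findPass name = lookup name name2id >>= \uid -> lookup uid id2pass

greet :: String -> String -> String
greet name pwd = "Welcome, " ++ name ++ "! Your password is " ++ pwd

-- "unit" and "bind": wrap the greeting with return, chain it with >>=.
loginBind :: String -> Maybe String
loginBind name = findPass name >>= \pwd -> return (greet name pwd)

-- "map": apply the pure function inside the Maybe.
loginFmap :: String -> Maybe String
loginFmap name = greet name <$> findPass name

-- Flipped "map": the monadic value comes first, the function second.
loginFlipped :: String -> Maybe String
loginFlipped name = findPass name <&> greet name
```

After loading the file in GHCi, calling any of the three with "Luca" gives the welcome message wrapped in Just, and calling them with "Igor" gives Nothing.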
The IO Monad in Haskell
Functions that perform I/O in Haskell involve the IO monad. Unlike in JavaScript and Java, we don’t need to implement/simulate it ourselves, because it is built into the language.
You will need to use the following functions:
getArgs :: IO [String] -- no parameters, returns an IO of list of strings with the command-line arguments
readFile :: FilePath -> IO String -- one string as a parameter, returns an IO of string with the content of the file
putStrLn :: String -> IO () -- one string as a parameter, prints it to stdout, returns an IO of unit (i.e., no information)
There is no explicit “run” operation for the IO monad (like the one we simulated in JavaScript and Java).
Haskell “runs” the monadic I/O actions when it executes the main function, whose type is IO ().
To quote the documentation in the source code of GHC, the Haskell compiler:
A value of type IO a is a computation which, when performed, does some I/O before returning a value of type a.
There is really only one way to “perform” an I/O action: bind it to main in your program. When your program is run, the I/O will be performed. It isn’t possible to perform I/O from an arbitrary function, unless that function is itself in the ‘IO’ monad and called at some point, directly or indirectly, from main.
Read the above once more: it’s truly a fundamental description!
Don’t get deceived by the behavior of the ghci REPL: it “runs” the I/O actions, giving the illusion that you can use the results directly:
> readFile "stop_words.txt"
"a,able,about,..."
But in reality, the readFile function returns an IO String, which is not a string but an action that, when executed, will read the file and return its content.
> :t readFile "stop_words.txt"
readFile "stop_words.txt" :: IO String
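Since the content of the file is only available “inside” the IO action, pure functions have to be applied with map and further actions chained with bind, exactly as with Maybe. The following is a minimal sketch with two hypothetical helpers (countLines and reportLines are not part of the lab skeleton):

```haskell
-- Hypothetical helper: fmap applies the pure function (length . lines)
-- to the String that the readFile action will produce when it is run.
countLines :: FilePath -> IO Int
countLines path = fmap (length . lines) (readFile path)

-- Hypothetical helper: bind passes the result of one action to a function
-- that returns the next action (here, printing a short report).
reportLines :: FilePath -> IO ()
reportLines path =
  countLines path >>= \n -> putStrLn (path ++ ": " ++ show n ++ " lines")
```

Neither function performs any I/O when it is defined or merely called; the file is read only when the resulting action is executed, for example by ghci or by being reachable from main.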
The “do” notation
Haskell also offers a special syntax (remember “syntactic sugar”?) for working with monads in a convenient way: the do notation.
Feel free to explore how it works (professional Haskell programmers use it frequently), but DO NOT use it in this lab.
We want you to practice and understand the basic operations with monads (unit, bind, map); using the do notation would just confuse things at the moment.
Compile and Run Your Program
You can compile your program with ghc:
ghc TermFrequency.hs
This will produce some auxiliary files and an executable file named TermFrequency. You can run it, for example, with:
./TermFrequency input-small.txt 2
You can also use the testing infrastructure (after installing the dependencies with npm install):
node test.js --size small --lang haskell --main "./TermFrequency"