Lab: Quarantine Style in Haskell
Quarantine style in Haskell - Introduction
In this lab, you will implement a solution to the term frequency task in Haskell, following the Quarantine style.
(The unchanged task description is available in the Prelude of the Pipeline style lab.)
Haskell does not allow performing I/O without going through monads, so we are effectively forced to write the solution in the Quarantine style. In the Pipeline style solution, we circumvented the problem: we performed all the input at the beginning and all the output at the end, and we hardcoded the stop words and the number of frequencies to print. In this lab, instead, we will deal with I/O properly.
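As a minimal illustration of the idea, the types themselves show where I/O is quarantined. The functions countWords and report are hypothetical and not part of the lab skeleton. A function without IO in its type cannot perform I/O. A function returning an IO value describes an action, which runs only when it is reached from main.

```haskell
-- Hypothetical example: a pure core and a small I/O shell.

-- Pure: no IO in the type, so this function cannot perform I/O.
countWords :: String -> Int
countWords = length . words

-- Impure: the IO in the result type marks (quarantines) the side effect.
report :: Int -> IO ()
report n = putStrLn ("Number of words: " ++ show n)

-- The action built here is performed only because it is bound to main.
main :: IO ()
main = report (countWords "the quick brown fox")
```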
Create and Clone Your GitHub Repository
To create the repository for this lab, fork this starter repository.
Your repository now exists on the GitHub servers. If you want to work on it, you first have to “clone” it on your computer.
Implement Your Program
Once you have the repository (a directory) on your computer, you can open it in your IDE (e.g., VS Code).
You can easily do this using the code command, passing the path to the directory as an argument:
code ~/lab-h2-quarantine-haskell
Copy the library file Split.hs from the starter repository of the Pipeline lab in Haskell.
Before you start “hacking”, please read the README and the rest of this lab.
The file named TermFrequency.hs already contains a skeleton of the program, including the type signatures of all the functions you need to implement.
For most of them, you can borrow the Haskell implementation you wrote for the Pipeline style in Haskell. Here is a list of the differences:
- The main function is now supposed to get the command-line arguments, pass them as a list of strings to the mainWithArgs function, and print the result.
- The mainWithArgs function is defined by two cases, using pattern matching on its first and only parameter (the list of arguments). The first case handles a list with exactly two elements, as we expect when the program is launched correctly. The second case (_) handles all the remaining situations. Use the read function to convert the second argument to an integer when needed. When the argument list is not appropriate, the function should indicate this by returning a string with an error message.
- The interestingWords function filters the interesting words based on the list of stop words read from the stop_words.txt file.
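The following sketch illustrates the two-case pattern matching on the argument list and the use of read. The function describeArgs and its messages are hypothetical. It is a simplified, pure analogue of mainWithArgs that only describes what it would do. The signature in your skeleton may differ, for example because the real function also has to read files.

```haskell
import System.Environment (getArgs)

-- Hypothetical, simplified analogue of mainWithArgs: it does not compute
-- term frequencies, it only describes the request.
describeArgs :: [String] -> String
-- Exactly two arguments: read converts the second one to an Int
-- (note that read throws an exception if n is not a valid number).
describeArgs [path, n] =
  "Would print the " ++ show (read n :: Int) ++ " most frequent words of " ++ path
-- Any other argument list: return an error message instead.
describeArgs _ = "Usage: TermFrequency <input-file> <number-of-words>"

-- Get the arguments, pass them on, and print the resulting string.
main :: IO ()
main = getArgs >>= \args -> putStrLn (describeArgs args)
```

The lambda in main can also be avoided by mapping over the IO action first: main = fmap describeArgs getArgs >>= putStrLn.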
Working with Monads in Haskell
We saw that monads have two fundamental operations: “unit” and “bind”. Let’s play a bit with them in Haskell with the Maybe monad, which represents computations that might fail.
Suppose we have two dummy databases, one that associates names with user IDs and another that associates user IDs with passwords:
> name2id = [("Matthias", 123), ("Luca", 124)]
> id2pass = [(123, "password"), (124, "Passw0rd!")]
We can use the lookup function to retrieve the value (second element of the tuple) for a given key (first element), if the key exists.
The function returns a value wrapped in a Just if the key is found; otherwise, we get Nothing.
> :t lookup
lookup :: Eq a => a -> [(a, b)] -> Maybe b
> lookup "Luca" name2id
Just 124
> lookup "Igor" name2id
Nothing
How can we continue the process and find the password, but only when the user ID is found? We need to use lookup again, and lookup returns a monadic value (a Maybe). Thus, we have to use “bind” to compose the two steps.
In Haskell, “bind” is realized with the >>= operator. We can inspect its type:
> :t (>>=)
(>>=) :: Monad m => m a -> (a -> m b) -> m b
For any monad m, it takes a monadic value (m a) and a function that takes a value and returns a monadic value (a -> m b), and it produces a monadic value (m b).
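Because the type works for any monad m, the same operator also chains computations in monads other than Maybe. This extra illustration is not needed for the lab. In the list monad, >>= applies the function to every element and concatenates the resulting lists:

```haskell
-- Bind in the list monad: each element is mapped to a list,
-- and the resulting lists are concatenated.
expanded :: [Int]
expanded = [1, 2, 3] >>= \x -> [x, x * 10]

main :: IO ()
main = print expanded  -- prints [1,10,2,20,3,30]
```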
We can use >>= to chain the two lookups:
> findPass name = lookup name name2id >>= \id -> lookup id id2pass
> findPass "Luca"
Just "Passw0rd!"
> findPass "Igor"
Nothing
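To see what >>= does for Maybe, here is a hand-written equivalent, named bindMaybe for illustration only. It is not a library function. It mirrors the behavior of the standard Maybe instance: a Nothing short-circuits the chain, and a Just passes its content to the next function.

```haskell
-- Same dummy databases as above, with explicit types.
name2id :: [(String, Int)]
name2id = [("Matthias", 123), ("Luca", 124)]

id2pass :: [(Int, String)]
id2pass = [(123, "password"), (124, "Passw0rd!")]

-- Hand-written bind for Maybe (hypothetical name): stop at Nothing,
-- otherwise pass the unwrapped value to the next function.
bindMaybe :: Maybe a -> (a -> Maybe b) -> Maybe b
bindMaybe Nothing _  = Nothing
bindMaybe (Just x) f = f x

-- findPass rewritten with bindMaybe instead of >>=.
findPass' :: String -> Maybe String
findPass' name = bindMaybe (lookup name name2id) (\uid -> lookup uid id2pass)

main :: IO ()
main = print (findPass' "Luca", findPass' "Igor")
```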
What if we also want to chain another function, for example one that prints a welcome message with the username and the password when everything is found?
> greet name pwd = "Welcome, " ++ name ++ "! Your password is " ++ pwd
This is not a monadic function by itself, so we cannot use it directly with >>=. We can proceed in two ways:
- We can create a monadic value with the “unit” operation. In Haskell, the function return is used to create a monadic value. Do not confuse it with the return keyword used for the return statement in many other languages. The function return is highly generic: it takes a value of any type (a) and produces a monadic value of that type (m a), where m is the monad we are working with:
  > :t return
  return :: Monad m => a -> m a
  We can use it to wrap the welcome message in a monadic value:
  > login name = findPass name >>= \pwd -> return (greet name pwd)
  > login "Luca"
  Just "Welcome, Luca! Your password is Passw0rd!"
  > login "Igor"
  Nothing
- We can use the “map” operation of monads. In Haskell, the function fmap applies a function to the value “inside” a monadic value. The f in fmap stands for functor. fmap works for monads because all monads are functors.
  We can check the type of fmap:
  > :t fmap
  fmap :: Functor f => (a -> b) -> f a -> f b
  The first parameter is the mapping function (from a to b), and the second parameter is our monadic value (f a). The result is a monadic value (f b). Thus we can write:
  > login name = fmap (\pwd -> greet name pwd) (findPass name)
  to get the same behavior as before.
  There are also some variants of fmap that come in handy and can be used as infix operators. For example, <$> is an infix version of fmap:
  > import Data.Functor ((<$>))
  > login name = (\pwd -> greet name pwd) <$> findPass name
  If you are building sequences of functions that you want to read “left to right”, you can also use <&>, which flips the order of the arguments:
  > import Data.Functor ((<&>))
  > login name = findPass name <&> (\pwd -> greet name pwd)
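The following self-contained program puts the variants side by side and checks that they agree. It reuses the dummy databases and helpers from the text. The names loginReturn, loginFmap, loginInfix and loginFlipped are hypothetical, introduced only to tell the versions apart.

```haskell
import Data.Functor ((<&>))

-- Same dummy databases and helpers as in the text above.
name2id :: [(String, Int)]
name2id = [("Matthias", 123), ("Luca", 124)]

id2pass :: [(Int, String)]
id2pass = [(123, "password"), (124, "Passw0rd!")]

findPass :: String -> Maybe String
findPass name = lookup name name2id >>= \uid -> lookup uid id2pass

greet :: String -> String -> String
greet name pwd = "Welcome, " ++ name ++ "! Your password is " ++ pwd

-- Four equivalent ways to build the login result
-- (<$> is exported by the Prelude, <&> by Data.Functor).
loginReturn, loginFmap, loginInfix, loginFlipped :: String -> Maybe String
loginReturn name  = findPass name >>= \pwd -> return (greet name pwd)
loginFmap name    = fmap (\pwd -> greet name pwd) (findPass name)
loginInfix name   = (\pwd -> greet name pwd) <$> findPass name
loginFlipped name = findPass name <&> (\pwd -> greet name pwd)

-- Print the four results for a known and an unknown user.
main :: IO ()
main = mapM_ (\name -> print (map ($ name) [loginReturn, loginFmap, loginInfix, loginFlipped])) ["Luca", "Igor"]
```

The agreement is not a coincidence. The laws relating Functor and Monad require fmap f m to equal m >>= (return . f). Note also that the lambda \pwd -> greet name pwd can be written more concisely as the partial application greet name.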
The IO Monad in Haskell
Functions that perform I/O in Haskell involve the IO monad. Unlike in JavaScript and Java, we don’t need to implement/simulate it ourselves, because it is built into the language.
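You will need to use the following functions (getArgs is exported by the System.Environment module, so you have to import it; readFile and putStrLn are in the Prelude):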
getArgs :: IO [String] -- no parameters, returns an IO of list of strings with the command-line arguments
readFile :: FilePath -> IO String -- one string as a parameter, returns an IO of string with the content of the file
putStrLn :: String -> IO () -- one string as a parameter, prints it to stdout, returns an IO of unit (i.e., no information)
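The following sketch chains the three functions without the do notation. It uses bind, unit (return) and map (fmap) exactly as introduced above. The helpers summarize and fileReport, and the usage message, are hypothetical and not part of the lab skeleton.

```haskell
import System.Environment (getArgs)

-- Hypothetical pure helper: summarize the content of a file.
summarize :: String -> String
summarize content = show (length (lines content)) ++ " line(s)"

-- Hypothetical: choose an action based on the arguments.
-- With one argument, read that file and map the pure helper over the result;
-- otherwise, wrap a usage message with return (unit).
fileReport :: [String] -> IO String
fileReport [path] = fmap summarize (readFile path)
fileReport _      = return "Usage: program <file>"

-- >>= is left-associative, so this parses as
-- (getArgs >>= fileReport) >>= putStrLn.
main :: IO ()
main = getArgs >>= fileReport >>= putStrLn
```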
There is no explicit “run” operation for the IO monad (unlike the one we simulated in JavaScript and Java).
Haskell “runs” the monadic I/O actions when it executes the main function, whose type is IO ().
To quote the documentation in the source code of GHC, the Haskell compiler:
A value of type IO a is a computation which, when performed, does some I/O before returning a value of type a.
There is really only one way to “perform” an I/O action: bind it to main in your program. When your program is run, the I/O will be performed. It isn’t possible to perform I/O from an arbitrary function, unless that function is itself in the ‘IO’ monad and called at some point, directly or indirectly, from main.
Read the above once more: it is truly a fundamental description!
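A small illustration of this description follows; all names in it are hypothetical. IO actions are ordinary values, and defining one, or even storing it in a list, does not perform it. Only the actions reachable from main are performed.

```haskell
-- IO actions are ordinary values: defining one does not run it.
greeting :: IO ()
greeting = putStrLn "Hello!"

-- Never reached from main, so this message is never printed.
unused :: IO ()
unused = putStrLn "This is never printed"

-- A list of actions: still, nothing has been performed.
plan :: [IO ()]
plan = [greeting, greeting, unused]

-- Only the first two actions are bound to main, so "Hello!" is printed twice.
main :: IO ()
main = sequence_ (take 2 plan)
```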
Don’t be deceived by the behavior of the ghci REPL: it “runs” the I/O actions for you, giving the illusion that you can use their results directly:
> readFile "stop_words.txt"
"a,able,about,..."
But in reality, the readFile function returns an IO String, which is not a string but an action that, when executed, will read the file and return its content.
> :t readFile "stop_words.txt"
readFile "stop_words.txt" :: IO String
The “do” notation
Haskell also offers a special syntax (remember “syntactic sugar”?) for working with monads in a convenient way: the do notation.
Feel free to explore how it works (professional Haskell programmers use it frequently), but DO NOT use it in this lab.
We want you to practice and understand the basic operations with monads (unit, bind, map); using the do notation would just confuse things at the moment.
Compile and Run Your Program
You can compile your program with ghc:
ghc TermFrequency.hs
This will produce some auxiliary files and an executable file named TermFrequency. You can run it, for example, with:
./TermFrequency input-small.txt 2
You can also use the testing infrastructure (after installing the dependencies with npm install):
node test.js --size small --lang haskell --main "./TermFrequency"