Lab: Pipeline Style in Haskell
Pipleline style in Haskell - Introduction
In this lab you will implement a solution of the Pipeline style in Haskell for the term frequency task.
Implementing the solution in Haskell gives you the opportunity to practice some of the functional programming concepts we have covered in the course (e.g., higher-order functions, currying, function composition, repetition without loops, and so on).
The task you have to solve and the constraints of the style are the same as the ones specified in the JavaScript version lab.
Create and Clone Your GitHub Repository
To create the repository for this lab, fork this special starter repository.
Your repository now exists on the GitHub servers. If you want to work on it, you first have to “clone” it on your computer.
Implement Your Program
Once you have the repository (a directory) on your computer,
you can open it in your IDE (e.g., in VS Code).
You can easily do this using the code
command, passing as an argument the path to the appropriate directory:
code ~/lab-h1-pipeline-haskell
Note that the starting repository for this lab is slightly different from the previous ones. Make sure to read the README.md
.
The repository contains a Haskell file named Pipeline.hs
that you should use as a starting point.
Take a look at the content of the file to get an idea of the structure of the program,
as well as the functions you need to implement (with names, types, and descriptions).
There are no automated tests for reasons we are going to discuss soon.
Let’s Ignore I/O for Now
Performing I/O in Haskell involves monads, which we still have to cover in the course.
Therefore, we will focus for the moment on writing the core of our program.
We will implement a pipeline
function that has a String
parameter (the text to be processed) and produces the result as a String
.
This function is passed to the interact
function, which reads the entire input (from the standard input),
passes it to our pipeline
function, and then prints the result to the standard output.
That will be our main
function.
All of this is already implemented for you in the starter code.
Note that at the beginning of the file, the content of the stop words file and the number of frequencies to print are hardcoded. The only “input” to our program is the text to be processed. (This is why we don’t have automated tests for this lab. There are still reference output files for you to manually compare.)
Haskell Fragments for Your Functions
Try to implement and use each function in the REPL (ghci
) before integrating it into the pipeline in the main file.
To wit: do not try to write the entire implementation at once and compile it only at the end. You will likely end up with several errors that are hard to understand. Benefit from having small functions that you can work on individually!
The next sections give you ALL the ingredients you need to implement the pipeline. We will not always use the shortest or most idiomatic Haskell code: we will instead keep focusing on versions you can fully understand at this point.
Repeat computation with higher-order functions
Use higher-order functions to repeat computations.
Map
ghci> map (\n -> n * n) [1, 10, 100]
[1, 100, 10000]
Filter
ghci> filter (\s -> length s >= 1) ["ab", "", "a"]
["ab", "a"]
Reduce / Fold
ghci> foldl (+) 0 [1, 2, 3, 4]
10
ghci> foldl (\acc n -> acc + n) 0 [1, 2, 3, 4]
10
These two variants are equivalent. The first one uses the +
operator directly (surrounded by parentheses, treated as a function), while the second one explicitly uses a lambda to combine the accumulator and the current element. The first is more idiomatic, but feel free to use the second one if you prefer more explicit code.
Working with pairs
A pair is a tuple of two elements. You can use the functions fst
and snd
to access the first and second element, respectively.
ghci> fst ("a", 1)
"a"
ghci> snd ("a", 1)
1
Working with lists
Concatenate lists
Use the ++
operator:
ghci> [1, 2] ++ [3, 4]
[1, 2, 3, 4]
Check if an element is in a list
Use the elem
function:
ghci> elem 2 [1, 2, 3]
True
ghci> elem 4 [1, 2, 3]
False
You may find the opposite function notElem
useful as well.
Reverse a list
Use the reverse
function:
ghci> reverse [1, 2, 3]
[3, 2, 1]
Sort a list on a key
Use the sortOn
function from the Data.List
module, passing a function that extracts the key to sort on:
ghci> import Data.List (sortOn)
ghci> sortOn snd [("a", 2), ("b", 1)]
[("b", 1), ("a", 2)]
Take the first n
elements of a list
Use the take
function:
ghci> take 2 [1, 2, 3]
[1, 2]
ghci> take 5 [1, 2, 3]
[1, 2, 3]
Working with characters and strings
Check if a character is alphanumeric
Use isAlphaNum
from the Data.Char
module:
ghci> import Data.Char (isAlphaNum)
ghci> isAlphaNum 'a'
True
ghci> isAlphaNum ':'
False
Convert a character to lowercase
Use toLower
from the Data.Char
module:
ghci> import Data.Char (toLower)
ghci> toLower 'A'
'a'
Turn a character into a string
A string is a list of characters. You can simply use the list literal syntax:
ghci> ['a']
"a"
Concatenate strings
A string is a list of character, so you can use the ++
operator as shown above.
“Stringify” a value
Use the show
function (e.g., to convert an integer to a string):
ghci> show 42
"42"
Range of characters
You can use the “range syntax”:
ghci> ['a'..'c']
"abc" -- equivalent to ['a', 'b', 'c']
Split a String
The built-in words
function splits a string on spaces:
ghci> words "hello world"
["hello", "world"]
For a more generic split operation, use the splitOn
function from the Data.List.Split
library (we have simply packaged this library in Split.hs
for you to use without extra installation):
ghci> :load Split.hs
ghci> splitOn "," "hello,world"
["hello", "world"]
Working with maps / dictionaries
Create an empty map
Use empty
from the Data.Map
module:
ghci> import Data.Map (empty)
ghci> empty
fromList []
Update or insert a key-value pair
Use the insertWith
function from the Data.Map
module.
If the key is not present in the map, the value is inserted.
If the key is present, the function passed as the first argument is used to combine the old and new values.
ghci> import Data.Map (insertWith)
ghci> mapAfterFirstElem = insertWith (+) "a" 10 empty
ghci> mapAfterFirstElem
fromList [("a", 10)]
ghci> insertWith (+) "a" 5 mapAfterFirstElem
fromList [("a", 15)]
ghci> insertWith (+) "b" 5 mapAfterFirstElem
fromList [("a", 10), ("b", 5)]
The first argument is a function that combines the old and new value if the key already exists. The second argument is the key. The third argument is the new value (to be used as a default if the key does not exist). The fourth and last argument is the map to update.
This data structure is immutable, thus insertWith
returns a new, updated map.
Turn a map into a list of key-value pairs
Use the toList
function from the Data.Map
module:
ghci> import Data.Map (toList)
ghci> toList mapAfterFirstElem
[("a", 10)]
Compose functions
As seen during the course, you can compose functions using the infix .
operator,
evocative of the “ring” symbol used to denote function composition in mathematics.
The function on the right is applied first.
A good mnemonic is to read .
as “after”.
ghci> increment x = x + 1
ghci> square x = x * x
ghci> squareAfterIncrement = square . increment
ghci> squareAfterIncrement 2
9
Implement the Entire Pipeline
Once you have working functions, add your implementation in the Pipeline.hs
file.
The file already contains all the signatures of the functions, which both help you to understand their behavior
and the compiler to provide better error messages.
Run Your Program
Compile your program with ghc
:
ghc Pipeline.hs
This will produce an executable file named Pipeline
.
You can run your program by executing the following command, using the shell redirection operator <
to pass the content of a file as input:
./Pipeline < input-small.txt
The output will be printed to the standard output (courtesy of the interact
function).