Lab: Pipeline Style in JavaScript

Prelude

The labs in this course ask you to implement a solution (in JavaScript or Java) to the term frequency task, following a certain style.

Since this is the first lab that deals with the term frequency task, it is a good moment to acquaint ourselves with it.

Here is how Crista describes the problem in the Prologue of her book:

[…] the computational task in this book is trivial: given a text file, we want to display the N (e.g. 25) most frequent words and corresponding frequencies ordered by decreasing value of frequency. We should make sure to normalize for capitalization and to ignore stop words like “the”, “for”, etc. To keep things simple, we don’t care about the ordering of words that have equal frequencies. This computational task is known as term frequency.

And here is an example of an input and corresponding output after computing the term frequency:

Input:

White tigers live mostly in India
Wild lions live mostly in Africa

Output:

live  -  2
mostly  -  2
africa  -  1
india  -  1
lions  -  1
tigers  -  1
white  -  1
wild  -  1

You can find Crista’s Python implementations in her GitHub repository.

Note that the implementations go a bit beyond the task description given above:

  • the content of the input file is filtered by replacing all non-alphanumeric characters with a space;
  • removing stop words includes filtering out all single-letter words;
  • stop words (but not single-character words) come from a file, stop_words.txt, and the name of that file is hardcoded into the implementation.

Do the same in all your implementations. Moreover, her implementation uses a fixed N (most frequent words) of 25. In your implementation, the N must be given as a command line argument.
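
As a minimal sketch of the stop-word rule described above, the hypothetical helper removeStopWords below drops every word that is either in a given stop-word list or only one character long. The function name and the inline stop-word list are assumptions made for this illustration; in your program the list comes from stop_words.txt.

```javascript
// Hypothetical helper (not part of the starter repository): keeps only the
// words that are longer than one character and not in the stop-word list.
const removeStopWords = (words, stopWords) =>
  words.filter((word) => word.length > 1 && !stopWords.includes(word));

// Illustrative stop-word list; the real one is read from stop_words.txt.
const sampleStopWords = ["the", "for", "in"];

console.log(removeStopWords(["the", "tigers", "live", "in", "a", "zoo"], sampleStopWords));
// prints [ 'tigers', 'live', 'zoo' ]
```

The function returns a new array and leaves both of its arguments untouched.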

Pipeline Style in JavaScript - Introduction

In this lab you will implement a solution in the Pipeline style in JavaScript.

These are the constraints for the Pipeline style as specified in the book:

  • Larger problem is decomposed using functional abstraction. Functions take input and produce output.
  • No shared state between functions.
  • The larger problem is solved by composing functions one after the other, in a pipeline, as a faithful reproduction of mathematical function composition f ◦ g.
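
To make the last constraint concrete, here is a small sketch of function composition in JavaScript. The names compose, addOne, and double are made up for this illustration and are unrelated to the term frequency task.

```javascript
// compose(f, g) builds the function f ◦ g, i.e., x => f(g(x)).
const compose = (f, g) => (x) => f(g(x));

// Two tiny functions without shared state: each takes input and produces output.
const addOne = (x) => x + 1;
const double = (x) => x * 2;

// g (double) runs first, then f (addOne), exactly as in f ◦ g.
const doubleThenAddOne = compose(addOne, double);

console.log(doubleThenAddOne(5)); // prints 11
```

Note that the order matters: compose(double, addOne)(5) would first add one and then double the result.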

The goal of this lab is to write code that follows the Pipeline style. Crista’s book gives you an example in Python, but it does not go all the way.

Try to go further and create a solution that uses Pipeline style all the way. Concretely, this means:

  • breaking functions down into functions that are as small as possible (i.e., that do one thing and one thing only)
  • implementing the functions in a functional style (i.e., avoiding mutation as much as possible)

Create and Clone Your GitHub Repository

To create the repository for this lab, fork this starter repository.

Your repository now exists on the GitHub servers. If you want to work on it, you first have to “clone” it on your computer.

Implement Your Program

You can find Crista’s Python implementation of the TermFrequency program in Pipeline style in her GitHub repository:

https://github.com/crista/exercises-in-programming-style/tree/master/06-pipeline

If you want, you can use this as a starting point for your JavaScript code.

Once you have the repository (a directory) on your computer, you can open it in your IDE (e.g., in VS Code). You can easily do this using the code command, passing as an argument the path to the appropriate directory:

code ~/lab-01-pipeline-javascript

Before you start “hacking”, please read the README (file README.md).

Then create a new JavaScript file named TermFrequency.js.

Implement a solution to the term frequency task in JavaScript following the Pipeline style. To ease your implementation efforts, especially if you are not familiar with JavaScript, the next section provides you with code snippets you can directly use in your solution.

We recommend you check out the example code even if you have prior experience with JavaScript, as it nudges you towards writing code in a functional style (embracing the spirit of the Pipeline style).

JavaScript Essentials

To solve the term frequency task, you will need several different features of the JavaScript language and its standard library.

(Some third-party libraries also offer convenient functions, but they are not needed for these labs. Do not use them.)

Perform Output

In JavaScript you can output any value by using the log method on the console object:

console.log("Hello World")

Use Command Line Arguments & Execute a JavaScript Program

When you execute JavaScript using Node.js, information about the current process is available through the process object.

In particular, process.argv is an array that contains the command-line arguments passed when the Node.js process was launched.

We can run our program executing this shell command:

node TermFrequency.js input-small.txt 25

node is the name of the virtual machine (i.e., the program that runs your program). TermFrequency.js is the name of your program. input-small.txt and 25 are additional arguments.

The following is a sample program, to be written in TermFrequency.js, that prints the whole process.argv array and additionally the two relevant arguments (which are in the third and fourth position – indexes 2 and 3, respectively).

console.log("process.argv:", process.argv);

const fileName = process.argv[2];
console.log("File:", fileName);

const limit = process.argv[3];
console.log("Limit:", limit);

(Note that we are passing multiple arguments to log; they are printed separated by spaces.)
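
Keep in mind that every element of process.argv is a string, so the limit arrives as the string '25' rather than the number 25. One possible way to convert it is sketched below; the helper name parseLimit is an assumption, not something the lab prescribes.

```javascript
// Hypothetical helper: converts the limit argument (a string such as "25")
// into a base-10 integer.
const parseLimit = (argument) => Number.parseInt(argument, 10);

console.log(typeof "25");      // prints string
console.log(parseLimit("25")); // prints 25
```

If the argument is not a number (e.g., "abc"), Number.parseInt returns NaN, which you can detect with Number.isNaN.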

The output should look as follows:

process.argv: [
  '/Users/hauswirm/.nvm/versions/node/v12.18.3/bin/node', 
  '/Users/hauswirm/tmp/pipeline-javascript-hauswirth/TermFrequency.js', 
  'input-small.txt', 
  '25'
]
File: input-small.txt
Limit: 25

To recap, we executed the node command (the JavaScript VM) and passed three arguments to it:

  1. The name of the JavaScript file we want to run
  2. The name of the input text file
  3. The number of entries to be printed

Execute program via the test.js script (for testing)

For testing purposes, you can also execute your program using the test.js driver script we provide in your repository.

The test.js script will compare the output of your program with the correct output, and thus it will tell you whether your implementation produces the correct output.

Before you can run your program with test.js, you need to install the modules it depends on. cd to your project directory and then run:

npm install

The test.js script understands three arguments:

  • --main XXX: the name of your JavaScript program (e.g., TermFrequency.js)
  • --lang XXX: the name of the programming language (e.g., javascript)
  • --size XXX: the size of the test input (small or large)

When we execute the same program written above via the test.js script, it ultimately will run the same shell command:

$ node test.js --size small --lang javascript --main TermFrequency.js
==> Running "node TermFrequency.js input-small.txt 25"
process.argv: [
  '/Users/hauswirm/.nvm/versions/node/v12.18.3/bin/node', 
  '/Users/hauswirm/tmp/pipeline-javascript-hauswirth/TermFrequency.js', 
  'input-small.txt', 
  '25'
]
File: input-small.txt
Limit: 25
==> Checking output
Test failed. Expected:
live  -  2
mostly  -  2
africa  -  1
india  -  1
lions  -  1
tigers  -  1
white  -  1
wild  -  1
but found:
process.argv: [
  '/Users/hauswirm/.nvm/versions/node/v12.18.3/bin/node', 
  '/Users/hauswirm/tmp/pipeline-javascript-hauswirth/TermFrequency.js', 
  'input-small.txt', 
  '25'
]
File: input-small.txt
Limit: 25

(Optional) Execute program in VS Code debugger

You might want to run your program from within the VS Code GUI (for example, to debug it). For that, you can create a “Launch Configuration” (launch.json). Here is an example:

    {
      // Use IntelliSense to learn about possible attributes.
      // Hover to view descriptions of existing attributes.
      // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
      "version": "0.2.0",
      "configurations": [
        {
          "type": "node",
          "request": "launch",
          "name": "Launch TermFrequency.js",
          "skipFiles": [
            "<node_internals>/**"
          ],
          "program": "${workspaceFolder}/TermFrequency.js",
          "args": ["input-small.txt", "25"],
        }
      ]
    }

You can provide the arguments (without node and without the name of the program) under args in the Launch Configuration.

When you then launch the program via the GUI (e.g., via the menu “Run > Run Without Debugging”), you should see the following output in the VS Code Debug Console:
    Debugger attached.
    Waiting for the debugger to disconnect...
    Process exited with code 0
    process.argv: (4) ['/Users/hauswirm/.nvm/versions/node/v12.18.3/bin/node', '/Users/hauswirm/tmp/pipeline-javascript-hauswirth/TermFrequency.js', 'input-small.txt', '25']
    File: input-small.txt
    Limit: 25

Read a File

To work with files you need to import the fs module.

const fs = require("fs");

To read a text file into a string, you can use fs.readFileSync.

const pathToFile = "file.txt";
const text = fs.readFileSync(pathToFile, "utf8");
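
As a self-contained sketch of the call above, the following example first writes a small throwaway file and then reads it back. The helper name readText and the temporary file name are assumptions made for this illustration. The second argument matters: without an encoding, readFileSync returns a Buffer rather than a string.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: reads the whole file at the given path as a string.
const readText = (pathToFile) => fs.readFileSync(pathToFile, "utf8");

// Create a small throwaway file so that the example runs on its own.
const samplePath = path.join(os.tmpdir(), "term-frequency-sample.txt");
fs.writeFileSync(samplePath, "White tigers live mostly in India\n", "utf8");

console.log(typeof readText(samplePath)); // prints string
```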

"utf8" stands for the UTF-8 text encoding, which is the encoding predominantly used these days.

Replace Characters

The JavaScript language supports so-called regular expressions. You can use them to search or replace substrings in a string.

The following expression evaluates to a new string containing the characters in text but with every sequence of “non-word” characters or underscores replaced by a single space:

text.replace(/[\W_]+/g, ' ')

\W (capital W) matches a “non-word” character, i.e., any character that is not an ASCII letter, a digit, or an underscore. Because \W does not match the underscore, _ is listed explicitly and stands for itself. The square brackets denote a character class, which matches any one of the characters listed inside it, and + requires one or more occurrences. In the regular expression above, + applies to the character class, so the pattern matches a run of one or more “non-word” or underscore characters.

Finally, g (which stands for global) applies the regular expression repeatedly to find and replace all the matching substrings.
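
Here is a short worked example of this replacement; the sample string is made up for illustration. JSON.stringify is used only to make the trailing space visible in the output.

```javascript
const text = "White tigers, live_mostly in India!";

// Every run of non-word characters or underscores becomes one space.
const cleaned = text.replace(/[\W_]+/g, " ");

console.log(JSON.stringify(cleaned));
// prints "White tigers live mostly in India "

// replace returns a new string; the original is unchanged.
console.log(text === "White tigers, live_mostly in India!"); // prints true
```

Notice that the final "!" also turns into a space, so the result may start or end with a space.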

Convert a String to Lowercase

Strings in JavaScript have a method toLowerCase that produces a copy of the string in lower case characters:

text.toLowerCase()

Split a String into an Array

Strings in JavaScript have a split method that splits them into an array of substrings. The string is split at every occurrence of the given separator.

Here we split the string text at every space:

text.split(' ')

Here we split the string text at every comma:

text.split(',')

Here we split the string at every character (because "" is the empty string, which leads to the “split after each character” behavior):

text.split("")
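
One detail to be aware of: if the string starts or ends with the separator, the resulting array contains empty strings. The sketch below shows this and one possible way to drop them with the standard filter method; the sample string is made up for illustration.

```javascript
const text = " white tigers ";

const parts = text.split(" ");
console.log(parts); // prints [ '', 'white', 'tigers', '' ]

// Dropping the empty strings yields only the actual words.
const words = parts.filter((part) => part.length > 0);
console.log(words); // prints [ 'white', 'tigers' ]
```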

Append to an Array

Arrays have a concat method that you can use to merge two (or more) arrays together. It produces a new array.

This expression evaluates to a new array that contains all the elements of array1, followed by all the elements of array2:

array1.concat(array2)

To append a single element, avoid using the push method, as it mutates the array on which it is called.

There are several non-mutating alternatives:

  • Use the concat method again, passing as an argument an array literal that contains just one element:
    array1.concat([elem])
    
  • Use the spread syntax and an array literal to create a new array that contains all the elements of array1 followed by elem:
    [...array1, elem]
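
The following sketch, with made-up sample values, shows that both alternatives produce a new array and leave array1 unchanged.

```javascript
const array1 = ["white", "tigers"];
const elem = "live";

const viaConcat = array1.concat([elem]);
const viaSpread = [...array1, elem];

console.log(viaConcat); // prints [ 'white', 'tigers', 'live' ]
console.log(viaSpread); // prints [ 'white', 'tigers', 'live' ]
console.log(array1);    // prints [ 'white', 'tigers' ] (unchanged)
```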
    

Sort an Array

You can sort the contents of an array using the sort method. You can specify how two elements are supposed to be compared by providing a two-parameter function (i.e., a “comparator”) that returns a number whose sign indicates the relative order of the two elements.

You can define such a comparator function in multiple ways, but an arrow function expression is particularly convenient.

(We will cover such functions in greater detail later on in the course.)

array.sort((a, b) => a - b)

Beware: it may look as if the sort method does not perform mutation, because it returns a reference to an array. However, that reference points to the original array, which has been sorted in place.

(Recent versions of JavaScript added the toSorted method, which does not perform mutation. In Node.js, this is only available since version 20.)
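
If toSorted is not available, one common workaround is to copy the array first and sort the copy. The helper name sortedCopy below is an assumption made for this sketch, as are the sample values.

```javascript
// Hypothetical helper: returns a sorted copy and leaves the input untouched.
// The spread syntax copies the array first, so sort only mutates the copy.
const sortedCopy = (array, comparator) => [...array].sort(comparator);

const numbers = [3, 1, 2];
console.log(sortedCopy(numbers, (a, b) => a - b)); // prints [ 1, 2, 3 ]
console.log(numbers);                              // prints [ 3, 1, 2 ]

// Sorting [word, count] pairs by decreasing count.
const pairs = [["wild", 1], ["live", 2]];
console.log(sortedCopy(pairs, (a, b) => b[1] - a[1]));
// prints [ [ 'live', 2 ], [ 'wild', 1 ] ]
```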

Iterate over an Array

With a for-of loop you can iterate over all the elements of an array:

for (const element of array) {
  // ...
}

This is similar to a for-each loop in Java (for (String element : array) {...}).

Create a Map

In JavaScript you can create a map. A map (sometimes called a “dictionary”) is a data structure that holds key-value pairs. While an array associates a numerical index to a value, a map associates a key of an arbitrary type (e.g., a string) to a value.

A JavaScript map is similar to a Java HashMap<K,V>.

The expression new Map() instantiates an empty map.

We can pass an array of key-value pairs to the constructor to initialize the map. Each pair is in turn written as an array with two elements.

As an example, this expression creates a map with one key (the string "age") and a corresponding value (the number 18):

new Map([["age", 18]])

Here is another example with multiple entries (distributed over multiple lines for readability):

new Map([
  ["hello", 1],
  ["world", 42],
  ["!", 72]
])

You can use the set method to change (i.e., mutate) a map after it has been instantiated. It takes two parameters, a key and a value, and adds the corresponding key-value pair (or replaces the value if the key already exists):

const cities = new Map();
cities.set("Lugano", 67082);
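
Since set mutates the map, a functional style calls for an alternative. One possible approach, sketched below, is to build a new map from the entries of the old one plus the new pair. The helper name withEntry and the sample data are assumptions made for this illustration.

```javascript
// Hypothetical helper: returns a new map that contains all entries of the
// given map plus the given key-value pair. If the key already exists, the
// new value wins, because later pairs override earlier ones.
const withEntry = (map, key, value) => new Map([...map, [key, value]]);

const counts = new Map([["live", 1]]);
const updated = withEntry(counts, "live", 2);

console.log(updated.get("live")); // prints 2
console.log(counts.get("live"));  // prints 1 (the original map is unchanged)
```

Copying the whole map on every update is less efficient than calling set, which is a trade-off of avoiding mutation.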

Access Values in a Map

Use the get method on a map to retrieve the value that corresponds to a given key (or undefined if no such key exists in the map):

const simpleMap = new Map([["age", 18]]);
const value = simpleMap.get("age"); 
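
When counting, it is often handy to treat a missing key as a default value instead of undefined. A minimal sketch follows; the helper name getOrDefault is an assumption, not a built-in method of Map.

```javascript
// Hypothetical helper: returns the value stored for key, or fallback if the
// key is absent from the map.
const getOrDefault = (map, key, fallback) =>
  map.has(key) ? map.get(key) : fallback;

const simpleMap = new Map([["age", 18]]);

console.log(simpleMap.get("age"));               // prints 18
console.log(simpleMap.get("name"));              // prints undefined
console.log(getOrDefault(simpleMap, "name", 0)); // prints 0
```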

Iterate over a Map

You can use a for-of loop over the result of the entries method to traverse all the entries of a map. Each entry is conceptually a tuple, represented as an array that always contains two elements: the key and the value.

for (const e of simpleMap.entries()) {
  console.log(e);
}
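
Because each entry is a two-element array, you can use destructuring to name the key and the value directly. The sketch below uses made-up sample data, and the helper name formatEntry is an assumption; it formats each entry like the lines of the expected output.

```javascript
// Hypothetical helper: formats one [word, count] entry as "word  -  count".
const formatEntry = ([word, count]) => `${word}  -  ${count}`;

const frequencies = new Map([["live", 2], ["mostly", 2], ["africa", 1]]);

// Destructuring splits each entry into its key and its value.
for (const [word, count] of frequencies.entries()) {
  console.log(formatEntry([word, count]));
}
// prints:
// live  -  2
// mostly  -  2
// africa  -  1
```

A map iterates over its entries in insertion order. To obtain the entries as an array (e.g., for sorting), you can write [...frequencies.entries()].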

Run, Test, and Debug the Program

Read the README.md and the first part of this lab to figure out how to run the program.

Test

We strongly recommend using the test script, because it shows you whether your solution is functionally correct. To do so, it compares your output with the reference output we provide. This only works if your program prints exactly the expected output (e.g., no debug output); otherwise the script will report that your output is incorrect.