Lab: Pipeline Style in Java

Pipeline style in Java - Introduction

In this lab you will implement a solution to the term frequency task in Java, following the Pipeline style.

The task you have to solve and the constraints of the style are the same as the ones specified in the previous lab.

Unlike the previous lab, where you used JavaScript, this lab requires you to think about types.

Here, and in subsequent Java labs, try to use “strong types”. You should avoid circumventing the type checker (e.g., try to avoid casts, and avoid Object as a type).

Create and Clone Your GitHub Repository

To create the repository for this lab, fork this starter repository.

Your repository now exists on the GitHub servers. If you want to work on it, you first have to “clone” it to your computer.

Implement Your Program

You can find Crista’s Python implementation of the TermFrequency program in Pipeline style in her GitHub repository:

https://github.com/crista/exercises-in-programming-style/tree/master/06-pipeline

If you want, you can use this as a starting point for your Java code.

Once you have the repository (a directory) on your computer, you can open it in your IDE (e.g., in VS Code). You can easily do this using the code command, passing as an argument the path to the appropriate directory:

code ~/lab-02-pipeline-java

Before you start “hacking”, please read the README (file README.md).

Then create a new Java file named TermFrequency.java.

Implement a solution to the term frequency task in Java following the Pipeline style. To ease your implementation efforts, especially if you are not familiar with Java, the next section provides you with code snippets you can directly use in your solution.

We recommend you check out the example code even if you have prior experience with Java, as it nudges you towards writing code in a functional style (embracing the spirit of the Pipeline style).
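To make “pipeline” concrete in Java, here is a minimal sketch (not a solution to the lab) that composes three small pure functions, each taking the previous one's output as its input. The class PipelineSketch and the functions normalize, scan, and countWords are hypothetical names chosen for illustration; the stages are deliberately simpler than those the term frequency task needs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class PipelineSketch {
  // Stage 1: a pure function from String to String (no I/O, no mutation).
  static String normalize(final String text) {
    return text.trim().toLowerCase();
  }

  // Stage 2: turns the normalized text into a list of words.
  static List<String> scan(final String text) {
    return Arrays.asList(text.split(" "));
  }

  // Stage 3: reduces the list of words to a single number.
  static int countWords(final List<String> words) {
    return words.size();
  }

  public static void main(final String[] args) {
    final String input = "  Lions live in Africa ";

    // Nested calls: read from the inside out.
    final int nested = countWords(scan(normalize(input)));

    // The same pipeline assembled with Function.andThen: read from left to right.
    final Function<String, String> first = PipelineSketch::normalize;
    final Function<String, Integer> pipeline =
        first.andThen(PipelineSketch::scan).andThen(PipelineSketch::countWords);

    System.out.println(nested + " " + pipeline.apply(input)); // prints "4 4"
  }
}
```

Both forms compute the same value. The nested form is plain function composition; andThen is one way to write the stages in the order the data flows through them, without any casts.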

Java Essentials

To solve the term frequency task, you will need several different features of the Java programming language and its standard library.

Perform Output

In Java you can output a value by using the println method on the System.out object:

System.out.println("Hello World");

Use Command Line Arguments & Execute a Java Program

The Java virtual machine passes the command-line arguments to the method main(String[]) in an array.

We can compile and run our program by executing these two shell commands:

javac TermFrequency.java
java TermFrequency input-small.txt 25

java is the name of the virtual machine (i.e., the program that runs your program). TermFrequency is the name of your class. input-small.txt and 25 are additional arguments.

The following is a sample program, to be written in TermFrequency.java, that prints the two relevant elements from the array of arguments (assuming that they exist).

public class TermFrequency {
  public static void main(final String[] args) {
    final String fileName = args[0];
    System.out.println("File: " + fileName);
    final int limit = Integer.parseInt(args[1]);
    System.out.println("Limit: " + limit);
  }
}

The output should look as follows:

File: input-small.txt
Limit: 25

To recap, after compiling we executed the java command (the Java VM) and passed three arguments to it:

  1. The name of the Java class we want to run
  2. The name of the input text file
  3. The number of entries to be printed

Execute program via the test.js script (for testing)

For testing purposes, you can also execute your program using the test.js driver script we provide in your repository.

The test.js script will compare the output of your program with the correct output, and thus it will tell you whether your implementation produces the correct output.

Before you can run your program with test.js, you need to install the Node.js modules it depends on. cd to your project directory and then run:

npm install

The test.js script understands three arguments:

  • --main XXX: the name of your Java class (e.g., TermFrequency)
  • --lang XXX: the name of the programming language (e.g., java)
  • --size XXX: the size of the test input (small or large)

When we execute the program above via the test.js script, it ultimately runs the same shell command as before:

$ node test.js --size small --lang java --main TermFrequency
==> Compiling Java classes
==> Running "java TermFrequency input-small.txt 25"
File: input-small.txt
Limit: 25
==> Checking output
Test failed. Expected:
live  -  2
mostly  -  2
africa  -  1
india  -  1
lions  -  1
tigers  -  1
white  -  1
wild  -  1
but found:
File: input-small.txt
Limit: 25

(Optional) Execute program in VS Code debugger

You might want to run your program from within the VS Code GUI (for example, to debug it). For that, you can create a “Launch Configuration” (launch.json). Here is an example:

{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "type": "java",
      "name": "Launch TermFrequency",
      "request": "launch",
      "mainClass": "TermFrequency",
      "args": ["input-small.txt", "25"]
    }
  ]
}

You can provide the arguments (without java and without the name of the program) under args in the Launch Configuration.

When you then launch the program via the GUI (e.g., via the menu “Run > Run Without Debugging”), you should see output like the following in the VS Code Debug Console (the first line is the command VS Code runs, which will differ on your machine):

cd /Users/hauswirm/tmp/pipeline-java-hauswirth ; /Library/Java/JavaVirtualMachines/openjdk.jdk/Contents/Home/bin/java -Dfile.encoding=UTF-8 -cp "/Users/hauswirm/Library/Application Support/Code/User/workspaceStorage/441354108aedca8b25c039b16b1c5181/redhat.java/jdt_ws/pipeline-java-hauswirth_2a3ee310/bin" TermFrequency input-small.txt 25 
File: input-small.txt
Limit: 25

Read a File

To read or write text files, Java provides the ...Reader and ...Writer classes in the package java.io (e.g., FileReader and FileWriter). These classes operate on characters rather than bytes. The class BufferedReader allows reading a file line by line (with the method readLine).

The newer packages java.nio... (“nio” stands for “New IO”) provide additional classes. One of them, Files in java.nio.file, has a method (since Java 11) that reads an entire text file into a String in one call.

Read line by line

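Import the necessary classes (BufferedReader and FileReader from java.io, plus IOException and StandardCharsets, which the code below also uses):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;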

Then write a method like the following, which uses BufferedReader.readLine() (the FileReader constructor that takes a Charset requires Java 11 or later):

static String readFile(final String pathToFile) {
  final StringBuilder sb = new StringBuilder();
  try (final BufferedReader br = new BufferedReader(new FileReader(pathToFile, StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
      sb.append(line); // readLine() strips the line terminator...
      sb.append('\n'); // ...so add one back
    }
    return sb.toString();
  } catch (final IOException ex) {
    return null; // VERY BAD! DO NOT DO THIS IRL
  }
}

Read in one go

Since Java 11 you can use the method java.nio.file.Files.readString() to read an entire text file.

Imports:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

Usage:

static String readFile(final String pathToFile) {
  try {
    return Files.readString(Paths.get(pathToFile), StandardCharsets.UTF_8);
  } catch (final IOException ex) {
    return null; // VERY BAD! DO NOT DO THIS IRL
  }
}

How to handle problems?

The comment

// VERY BAD! DO NOT DO THIS IRL

is true. Never do this in real life.

Never swallow an exception: exceptions are not easily digestible. Here we swallow a possible IOException (with try/catch/return null) only because you might not have heard about exception handling yet, and because in a pure Pipeline style there should be no exceptions at all: the methods should be pure functions, so they should not perform any input or output, and thus there would be no IOExceptions to worry about.
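If you do want to avoid swallowing the exception, here is a sketch of two common alternatives: declare the checked IOException so that the caller must deal with it, or rethrow it wrapped in java.io.UncheckedIOException, which keeps the original exception as its cause. The class ReadFileSketch and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class ReadFileSketch {
  // Option 1: declare the checked exception; the caller decides what to do.
  static String readFile(final String pathToFile) throws IOException {
    return Files.readString(Paths.get(pathToFile), StandardCharsets.UTF_8);
  }

  // Option 2: rethrow as an unchecked exception that keeps the original cause.
  static String readFileUnchecked(final String pathToFile) {
    try {
      return Files.readString(Paths.get(pathToFile), StandardCharsets.UTF_8);
    } catch (final IOException ex) {
      throw new UncheckedIOException("Could not read " + pathToFile, ex);
    }
  }

  public static void main(final String[] args) {
    // If the file cannot be read, the uncaught exception stops the program with
    // a message and stack trace, instead of letting a null flow down the pipeline.
    System.out.println(readFileUnchecked(args[0]).length());
  }
}
```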

Text-Encoding

As in JavaScript and Python, we explicitly specify the text encoding we expect the text file to be written in (UTF-8).

In Java, the class StandardCharsets (in package java.nio.charset) provides constants for the charsets that every Java platform is required to support, such as UTF_8, US_ASCII, and ISO_8859_1.
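Why the encoding matters can be seen in a small sketch: the same text is stored as different bytes in different encodings, so decoding with the wrong charset garbles it. The class CharsetSketch and the helper decodeAs are hypothetical names; the word used is an arbitrary example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetSketch {
  // Encodes text with one charset and decodes the resulting bytes with another.
  static String decodeAs(final String text, final Charset written, final Charset read) {
    return new String(text.getBytes(written), read);
  }

  public static void main(final String[] args) {
    final String word = "caf\u00e9"; // "cafe" ending in e-acute (U+00E9)

    // In UTF-8 the e-acute takes 2 bytes; in ISO-8859-1 it takes 1.
    System.out.println(word.getBytes(StandardCharsets.UTF_8).length);      // prints "5"
    System.out.println(word.getBytes(StandardCharsets.ISO_8859_1).length); // prints "4"

    // Reading UTF-8 bytes as if they were ISO-8859-1 garbles the text.
    System.out.println(decodeAs(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(word)); // prints "false"
    System.out.println(decodeAs(word, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(word));      // prints "true"
  }
}
```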

Replace Characters

Unlike JavaScript, Java doesn’t provide regular expressions as a language feature.

However, it provides classes in its library (like Pattern and Matcher) to work with regular expressions.

For our purposes we don’t even need those classes: we can directly use the method replaceAll() in the class String. The following expression produces a string in which every sequence of “non-word” or underscore (“_”) characters is replaced with a single space character:

text.replaceAll("[\\W_]+", " ")

The regular expression \W matches a “non-word” character. However, because the backslash (\) starts an escape sequence in a Java String or char literal, we have to escape the backslash itself (which we need in the regular expression) and thus end up with a double backslash in the string literal: we write \\W for the regular expression \W.

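_ stands for itself. The square brackets denote a character class, which matches any single character listed inside it, and + means “one or more occurrences” of the preceding element. Here + applies to the whole character class, so the pattern matches any run of one or more “non-word” or underscore characters. The underscore has to be listed explicitly because \W alone does not match it: in Java regular expressions, the word characters (\w) are [a-zA-Z_0-9].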
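The following sketch (the class ReplaceSketch and the method replaceNonWords are hypothetical names) applies the expression to a made-up sentence and shows why the underscore must be listed: with \\W+ alone, live_mostly stays a single word.

```java
class ReplaceSketch {
  // Replaces every run of non-word or underscore characters with one space.
  static String replaceNonWords(final String text) {
    return text.replaceAll("[\\W_]+", " ");
  }

  public static void main(final String[] args) {
    final String text = "White tigers, live_mostly in India!";
    System.out.println("[" + replaceNonWords(text) + "]");        // prints "[White tigers live mostly in India ]"
    System.out.println("[" + text.replaceAll("\\W+", " ") + "]"); // prints "[White tigers live_mostly in India ]"
  }
}
```

The brackets in the output only make the trailing space visible: the final “!” is also replaced by a space.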

Convert a String to Lowercase

As in JavaScript, Java String objects have the method toLowerCase, which returns a copy of the string in lowercase characters:

text.toLowerCase()
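One subtlety: toLowerCase() without arguments uses the default locale of the machine it runs on, and some locales have special rules (in Turkish, an uppercase I becomes a dotless i, U+0131). If you want results that do not depend on the machine, you can pass Locale.ROOT, as in this sketch (the class LowerCaseSketch and the method lower are hypothetical names).

```java
import java.util.Locale;

class LowerCaseSketch {
  // Lowercases with language-neutral rules, independent of the default locale.
  static String lower(final String text) {
    return text.toLowerCase(Locale.ROOT);
  }

  public static void main(final String[] args) {
    System.out.println(lower("LIONS live In INDIA")); // prints "lions live in india"
    // With Turkish rules, uppercase I does not become a plain "i".
    System.out.println("INDIA".toLowerCase(Locale.forLanguageTag("tr")).equals("india")); // prints "false"
  }
}
```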

Split a String into an Array

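The class String in Java has a method split that splits a string into an array of (possibly shorter) strings. Note that the argument of split is a regular expression, not just a character: the string is split at every match of that expression. In the examples below the expression is a single ordinary character, so it simply matches that character.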

Here we split the string text at every space:

text.split(" ")

Here we split the string text at every comma:

text.split(",")

Here we split the string into its individual characters (the empty string "" matches at every position, which leads to the “split after each character” behavior):

text.split("")
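Two details of split are easy to trip over in the term frequency task: a leading separator produces a leading empty string (for example, when the text starts with punctuation that replaceAll turned into a space), while trailing empty strings are dropped; and characters such as "." have a special meaning in regular expressions. A short sketch (the class name SplitSketch is hypothetical):

```java
import java.util.Arrays;

class SplitSketch {
  public static void main(final String[] args) {
    // A leading separator produces a leading empty string...
    System.out.println(Arrays.toString(" lions live".split(" "))); // prints "[, lions, live]"
    // ...but trailing empty strings are dropped.
    System.out.println(Arrays.toString("lions live ".split(" "))); // prints "[lions, live]"
    // "." is a regular expression matching any character, so every piece is
    // empty and all of them are dropped.
    System.out.println("a.b".split(".").length); // prints "0"
    // Escape the dot to split at literal dots.
    System.out.println(Arrays.toString("a.b".split("\\."))); // prints "[a, b]"
  }
}
```

One way to deal with the leading empty string is to trim the text before splitting, or to filter out empty words afterwards.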

Append to an Array

In Java, plain arrays have a fixed length. You cannot append to them.

Instead, we can use the class ArrayList.

This snippet instantiates an empty ArrayList of strings (note the type parameter <String>) and stores a reference to it in the list variable:

final List<String> list = new ArrayList<String>();

(We use the List interface as a type, which abstracts away how the list is actually implemented.)

You can use the static method asList of the class Arrays (plural!) to turn a plain array into a List:

final String[] array = ...;
final List<String> list = Arrays.asList(array);
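Beware that the list returned by Arrays.asList is backed by the array and therefore has a fixed size: calling add on it throws an UnsupportedOperationException. If you need to append, copy it into a new ArrayList first, as in this sketch (the class AsListSketch and the method toGrowableList are hypothetical names).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AsListSketch {
  // Copies the array into a new ArrayList, which (unlike Arrays.asList) can grow.
  static List<String> toGrowableList(final String[] array) {
    return new ArrayList<>(Arrays.asList(array));
  }

  public static void main(final String[] args) {
    final String[] array = {"lions", "tigers"};

    final List<String> fixed = Arrays.asList(array);
    try {
      fixed.add("bears"); // backed by the array, so it cannot grow
    } catch (final UnsupportedOperationException ex) {
      System.out.println("Arrays.asList returned a fixed-size list");
    }

    final List<String> growable = toGrowableList(array);
    growable.add("bears");
    System.out.println(growable); // prints "[lions, tigers, bears]"
  }
}
```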

Append an element

You can append a single element to a list using the method add:

list.add("Luca");

Beware: the method does not return a new list; it mutates the list on which it is called.

Append a list

You can append a list at the end of another list using the method addAll.

final List<String> names = ...;
final List<String> moreNames = ...;

names.addAll(moreNames);

Note again that the method mutates names (but not moreNames).

Append an array

Alternatively, you can append all the elements of an array to a list. Use a static method of class Collections (plural!):

final List<String> list = ...;
final String[] array = ...;

Collections.addAll(list, array);
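Because add, addAll, and Collections.addAll all mutate their target, a function written in the spirit of the Pipeline style can instead copy the list and return the extended copy, leaving its input untouched. A minimal sketch, with the hypothetical names AppendSketch and appended:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AppendSketch {
  // Returns a new, unmodifiable list with element appended; list is unchanged.
  static <T> List<T> appended(final List<T> list, final T element) {
    final List<T> copy = new ArrayList<>(list);
    copy.add(element); // mutates only the private copy
    return Collections.unmodifiableList(copy);
  }

  public static void main(final String[] args) {
    final List<String> names = List.of("Ada");
    final List<String> more = appended(names, "Luca");
    System.out.println(names + " " + more); // prints "[Ada] [Ada, Luca]"
  }
}
```

Copying costs time proportional to the list length, so this trades some efficiency for the absence of side effects.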

Sort a list

To sort the contents of a list, you need to specify how two elements should be compared.

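This comparison is implemented by a Comparator, whose method compare() takes two parameters and returns the result of the comparison as an int (in the same vein as we did in JavaScript): a negative number if the first argument should come first, zero if both are equal, and a positive number if the second should come first.

Comparator class:

import java.util.Comparator;

// Orders pairs by their second component (the count), largest count first.
class CountComparator implements Comparator<Pair<String, Integer>> {
  @Override
  public int compare(final Pair<String, Integer> a, final Pair<String, Integer> b) {
    // Arguments swapped for descending order; Integer.compare avoids the
    // overflow that the subtraction b - a can cause for extreme values.
    return Integer.compare(b.getSecond(), a.getSecond());
  }
}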

Usage:

final List<Pair<String,Integer>> list = ...;
list.sort(new CountComparator());

The implementation above assumes the existence of a Pair class that is not part of the standard Java library. You may want to implement one, or use any other data structure that you prefer (e.g., Map.Entry). (No third-party libraries, please.)
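If you choose Map.Entry instead of a Pair class, the standard library already provides a matching comparator: Map.Entry.comparingByValue, combined here with Comparator.reverseOrder() for descending order. A minimal sketch, assuming the counts are held in a Map<String, Integer> (the class SortSketch and the method sortByCountDescending are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortSketch {
  // Copies the map's entries into a new list and sorts it by value, largest first.
  // The map itself is not modified.
  static List<Map.Entry<String, Integer>> sortByCountDescending(final Map<String, Integer> counts) {
    final List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
    entries.sort(Map.Entry.comparingByValue(Comparator.reverseOrder()));
    return entries;
  }

  public static void main(final String[] args) {
    final Map<String, Integer> counts = new HashMap<>();
    counts.put("india", 1);
    counts.put("live", 2);
    counts.put("lions", 1);
    for (final Map.Entry<String, Integer> entry : sortByCountDescending(counts)) {
      System.out.println(entry.getKey() + "  -  " + entry.getValue());
    }
  }
}
```

The first line printed is "live  -  2". The order of words with equal counts follows the HashMap's iteration order, which is unspecified.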

Iterate over a list

With a for-each loop you can iterate over all the elements of a list:

final List<String> list = ...;
for (final String element : list) {
  // ...
}

Note that a loop is a very imperative way to achieve repetition. Later in the course we will cover other ways to repeat computations. But if you are already familiar with filter, map, and reduce (and Java streams in general), please go ahead and make good use of them.
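As an example of the stream-based alternative, the following sketch keeps only the words that are not stop words, without an explicit loop and without modifying its input. The class StreamSketch, the method removeStopWords, and the stop words used are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class StreamSketch {
  // Returns a new list with the stop words removed; words is not modified.
  static List<String> removeStopWords(final List<String> words, final Set<String> stopWords) {
    return words.stream()
        .filter(word -> !stopWords.contains(word))
        .collect(Collectors.toList());
  }

  public static void main(final String[] args) {
    final List<String> words = Arrays.asList("the", "lions", "live", "in", "africa");
    final Set<String> stopWords = Set.of("the", "in"); // made-up stop words
    System.out.println(removeStopWords(words, stopWords)); // prints "[lions, live, africa]"
  }
}
```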

Create a Dictionary

In Java the idea of a “dictionary” is realized by the interface Map. An often-used implementation of Map is HashMap.

The following statement creates an empty dictionary and stores a reference to it in dict:

final Map<String,Integer> dict = new HashMap<String,Integer>();

You can use the put method to insert an entry (e.g., the key "age" and the value 18):

dict.put("age", 18)

Note that the method mutates the map on which it is called.

Lookup a Value in a Dictionary

Use the get method on a map to retrieve the value that corresponds to a given key (or null if no such key exists in the map):

dict.get("age")
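Because get returns null for a missing key, unboxing its result into an int throws a NullPointerException. Map.getOrDefault returns a fallback value instead, as in this sketch (the class LookupSketch and the method valueOrZero are hypothetical names).

```java
import java.util.HashMap;
import java.util.Map;

class LookupSketch {
  // Returns the value stored for key, or 0 if the map has no such key.
  static int valueOrZero(final Map<String, Integer> dict, final String key) {
    return dict.getOrDefault(key, 0);
  }

  public static void main(final String[] args) {
    final Map<String, Integer> dict = new HashMap<>();
    dict.put("age", 18);

    System.out.println(dict.get("age"));    // prints "18"
    System.out.println(dict.get("height")); // prints "null": no such key

    // Unboxing that null, e.g. `final int height = dict.get("height");`,
    // would throw a NullPointerException. getOrDefault avoids this:
    System.out.println(valueOrZero(dict, "height")); // prints "0"
  }
}
```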

Iterate over a Dictionary

With a for-each loop you can process all entries of a dictionary:

for (final String key : dict.keySet()) {
  final Integer value = dict.get(key);
  System.out.println(key + " : " + value);
}

Here we iterate explicitly over all keys (the key set) and then, for each key, read the corresponding value.

As with iterating over a list, we use an imperative (loop-based) approach here. This does not fit well with the Pipeline style, which is a purely functional style. If you know about filter, map, and reduce (and Java streams in general), you may want to use those instead of a loop.
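With streams, for instance, a word-to-count dictionary can be built and then processed entry by entry without writing a loop. This sketch uses Collectors.toMap with a merge function (Integer::sum) that adds up the counts of duplicate words; the class FrequencySketch and the method frequencies are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FrequencySketch {
  // Maps each word to 1, and combines the values of duplicate words with Integer::sum.
  static Map<String, Integer> frequencies(final List<String> words) {
    return words.stream()
        .collect(Collectors.toMap(Function.identity(), word -> 1, Integer::sum));
  }

  public static void main(final String[] args) {
    final Map<String, Integer> counts = frequencies(Arrays.asList("live", "mostly", "live"));
    // Stream over the entries (key and value together) instead of looking up each key.
    counts.entrySet().stream()
        .map(entry -> entry.getKey() + " : " + entry.getValue())
        .forEach(System.out::println);
  }
}
```

The program prints "live : 2" and "mostly : 1", in an order that is not specified, since toMap returns a HashMap in current JDKs.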

Run, Test, and Debug the Program

Read the README.md and the first part of this lab to figure out how to run the program.

Test

We strongly recommend using the test script, because it shows you whether your solution is functionally correct: it compares your output with the reference output we provide. This only works if you strictly follow the rules (e.g., don’t print debug output); otherwise the script will report that your output is incorrect.