Exercise 1.4

This is a solution to exercise 1.4 (WebCrawler) of the hands-on training for the Distributed Systems lecture.

The folder contains a Gradle project. The main class configured in build.gradle is the web crawler application.

The class WebGrab is not part of the web crawler; it implements the basic functionality of HttpConnection from the slides. It can be started directly (e.g. using the Run link in your IDE).
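
For orientation, fetching a single page over HTTP could look roughly like the sketch below. It is not the WebGrab class from this folder: the class name WebGrabSketch, the helper names fetch and readAll, the 5-second timeouts and the UTF-8 decoding are all assumptions made for illustration, and the code uses the JDK's java.net.HttpURLConnection rather than whatever the slides show.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not the WebGrab class shipped with the exercise.
public class WebGrabSketch {

    // Reads the whole stream and decodes it as UTF-8. This is an assumption:
    // real pages may declare a different charset in their Content-Type header.
    public static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Sends a GET request to the given URL and returns the response body.
    public static String fetch(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // milliseconds, arbitrary choice
        conn.setReadTimeout(5000);    // milliseconds, arbitrary choice
        // getInputStream() throws an IOException for HTTP error responses
        // (status code 400 and above), so no explicit status check is done here.
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0] : "https://www.reutlingen.de";
        System.out.println(fetch(url));
    }
}
```

The stream handling is split into readAll so that it can be tried without network access, e.g. on a ByteArrayInputStream.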

Start the web crawler from a CLI (such as bash, cmd, or PowerShell), passing the root URL as follows:

  • $ gradle run --args="https://www.reutlingen.de"

This works if Gradle is installed on your system. Otherwise, use the Gradle wrapper contained in the folder (gradlew.bat on Windows, gradlew on Linux and macOS), e.g. ./gradlew run --args="https://www.reutlingen.de".

The program performs a breadth-first search (BFS) starting at the root URL and scans each visited page for e-mail addresses. You can interrupt the crawler at any time by pressing <Enter>.
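
The BFS described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the crawler contained in this folder: the class name BfsCrawlerSketch, the crawl signature, the maxPages limit and both regular expressions are hypothetical, and the page download is passed in as a function so that the traversal can be tried on in-memory pages.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.BooleanSupplier;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not the crawler shipped with the exercise.
public class BfsCrawlerSketch {

    // Deliberately simplified patterns: only absolute links in double-quoted
    // href attributes are followed, and the e-mail syntax is approximated.
    private static final Pattern LINK =
            Pattern.compile("href=\"(https?://[^\"#]+)");
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // fetch returns the page text for a URL, or null if it cannot be loaded.
    // stopRequested is polled before each page, e.g. to react to <Enter>.
    public static Set<String> crawl(String rootUrl,
                                    Function<String, String> fetch,
                                    BooleanSupplier stopRequested,
                                    int maxPages) {
        Set<String> emails = new LinkedHashSet<>();
        Set<String> visited = new LinkedHashSet<>();
        Queue<String> frontier = new ArrayDeque<>(); // FIFO queue gives BFS order
        frontier.add(rootUrl);
        visited.add(rootUrl);
        int fetched = 0;

        while (!frontier.isEmpty() && fetched < maxPages
                && !stopRequested.getAsBoolean()) {
            String url = frontier.remove();
            String html = fetch.apply(url);
            fetched++;
            if (html == null) {
                continue; // page could not be loaded, skip it
            }
            Matcher mail = EMAIL.matcher(html);
            while (mail.find()) {
                emails.add(mail.group());
            }
            Matcher link = LINK.matcher(html);
            while (link.find()) {
                String next = link.group(1);
                if (visited.add(next)) { // true only for URLs not seen before
                    frontier.add(next);
                }
            }
        }
        return emails;
    }

    public static void main(String[] args) {
        // Two in-memory pages that link to each other (made-up example data).
        Map<String, String> pages = Map.of(
                "http://a.example/",
                "<a href=\"http://b.example/\">b</a> mailto:info@a.example ",
                "http://b.example/",
                "<a href=\"http://a.example/\">a</a> team@b.example ");
        Set<String> found = crawl("http://a.example/", pages::get, () -> false, 10);
        found.forEach(System.out::println);
    }
}
```

The visited set keeps the crawler from fetching the same URL twice, which matters as soon as pages link back to each other. One way to get the stop-on-<Enter> behaviour would be to pass the getter of an AtomicBoolean as stopRequested and set that flag from a daemon thread that blocks on System.in.read().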

At the end, the program prints the list of e-mail addresses it found to the screen.