Apache Flink is an open source platform for distributed stream and batch data processing.
Zeppelin comes with a pre-configured flink-local interpreter, which starts Flink in local mode on your machine, so you do not need to install anything.
At the "Interpreters" menu, you have to create a new Flink interpreter and provide next properties:
Property | Value | Description
---|---|---
host | local | Host name of the running JobManager. 'local' runs Flink in local mode (default).
port | 6123 | Port of the running JobManager.
xxx | yyy | Anything else from the [Flink Configuration](https://ci.apache.org/projects/flink/flink-docs-release-0.9/setup/config.html).
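For example, to use an existing standalone cluster instead of local mode, you would set `host` to the host name of that cluster's JobManager (say, `flink-jobmanager`, a placeholder name) and leave `port` at the default `6123`.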
For example, the Zeppelin notebook below is adapted from Till Rohrmann's presentation "Interactive data analysis with Apache Flink" for the Apache Flink Meetup. The shell paragraph first downloads the text to analyze (Project Gutenberg ebook #10, the King James Bible); the Flink paragraph then counts word frequencies in it with the Scala DataSet API.
```
%sh
# Remove any previous copy, then download the Project Gutenberg text.
rm -f 10.txt.utf-8
wget http://www.gutenberg.org/ebooks/10.txt.utf-8
```
```
%flink
// Each result record holds a word and how often it occurs.
case class WordCount(word: String, frequency: Int)

// Read the downloaded text as a DataSet of lines ('env' is provided by the interpreter).
val bible: DataSet[String] = env.readTextFile("10.txt.utf-8")

// Tokenize each line and emit a count of 1 per word.
val partialCounts: DataSet[WordCount] = bible.flatMap {
  line =>
    """\b\w+\b""".r.findAllIn(line).map(word => WordCount(word, 1))
    // line.split(" ").map(word => WordCount(word, 1))
}

// Sum the counts per word.
val wordCounts = partialCounts.groupBy("word").reduce {
  (left, right) => WordCount(left.word, left.frequency + right.frequency)
}

// Fetch ten of the aggregated word counts into the notebook.
val result10 = wordCounts.first(10).collect()
```
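Note that `first(10)` returns ten arbitrary word counts, not the ten most frequent words. If you want the top words by frequency, a minimal sketch (assuming the result is small enough to collect to the driver, which holds for a single book) is to sort the collected counts locally and use Zeppelin's `%table` output to render them:

```
%flink
// Collect all word counts to the driver, sort locally, and keep the ten most
// frequent words. Fine for a single book, but not for data that does not fit
// on the driver.
val top10 = wordCounts.collect().sortBy(-_.frequency).take(10)

// Zeppelin renders output starting with %table as a table; columns are
// separated by tabs and rows by newlines.
println("%table word\tfrequency\n" +
  top10.map(wc => s"${wc.word}\t${wc.frequency}").mkString("\n"))
```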