Generic JDBC Interpreter for Apache Zeppelin
Overview
This interpreter lets you create a JDBC connection to any data source, by now it has been tested with:
- Postgres
- MySql
- MariaDB
- Redshift
- Apache Hive
- Apache Phoenix
- Apache Drill (Details on using Drill JDBC Driver)
- Apache Tajo
If someone else used another database please report how it works to improve functionality.
Create Interpreter
When you create a interpreter by default use PostgreSQL with the next properties:
| name | value |
|---|---|
| common.max_count | 1000 |
| default.driver | org.postgresql.Driver |
| default.password | ******** |
| default.url | jdbc:postgresql://localhost:5432/ |
| default.user | gpadmin |
It is not necessary to add driver jar to the classpath for PostgreSQL as it is included in Zeppelin.
Simple connection
Prior to creating the interpreter it is necessary to add maven coordinate or path of the JDBC driver to the Zeppelin classpath. To do this you must edit dependencies artifact(ex. mysql:mysql-connector-java:5.1.38) in interpreter menu as shown:
To create the interpreter you need to specify connection parameters as shown in the table.
| name | value |
|---|---|
| common.max_count | 1000 |
| default.driver | driver name |
| default.password | ******** |
| default.url | jdbc url |
| default.user | user name |
Multiple connections
JDBC interpreter also allows connections to multiple data sources. It is necessary to set a prefix for each connection to reference it in the paragraph in the form of %jdbc(prefix). Before you create the interpreter it is necessary to add each driver's maven coordinates or JDBC driver's jar file path to the Zeppelin classpath. To do this you must edit the dependencies of JDBC interpreter in interpreter menu as following:
You can add all the jars you need to make multiple connections into the same JDBC interpreter. To create the interpreter you must specify the parameters. For example we will create two connections to MySQL and Redshift, the respective prefixes are default and redshift:
| name | value |
|---|---|
| common.max_count | 1000 |
| default.driver | com.mysql.jdbc.Driver |
| default.password | ******** |
| default.url | jdbc:mysql://localhost:3306/ |
| default.user | mysql-user |
| redshift.driver | com.amazon.redshift.jdbc4.Driver |
| redshift.password | ******** |
| redshift.url | jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439 |
| redshift.user | redshift-user |
Bind to Notebook
In the Notebook click on the settings icon at the top-right corner. Use select/deselect to specify the interpreters to be used in the Notebook.
More Properties
You can modify the interpreter configuration in the Interpreter section. The most common properties are as follows, but you can specify other properties that need to be connected.
| Property Name | Description |
|---|---|
| {prefix}.url | JDBC URL to connect, the URL must include the name of the database |
| {prefix}.user | JDBC user name |
| {prefix}.password | JDBC password |
| {prefix}.driver | JDBC driver name. |
| common.max_result | Max number of SQL result to display to prevent the browser overload. This is common properties for all connections |
To develop this functionality use this method. For example if a connection needs a schema parameter, it would have to add the property as follows:
| name | value |
|---|---|
| {prefix}.schema | schema_name |
Examples
Hive
Properties
| Name | Value |
|---|---|
| hive.driver | org.apache.hive.jdbc.HiveDriver |
| hive.url | jdbc:hive2://localhost:10000 |
| hive.user | hiveuser |
| hive.password | hivepassword |
Dependencies
| Artifact | Excludes |
|---|---|
| org.apache.hive:hive-jdbc:0.14.0 | |
| org.apache.hadoop:hadoop-common:2.6.0 |
Phoenix
Properties
| Name | Value |
|---|---|
| phoenix.driver | org.apache.phoenix.jdbc.PhoenixDriver |
| phoenix.url | jdbc:phoenix:localhost:2181:/hbase-unsecure |
| phoenix.user | phoenixuser |
| phoenix.password | phoenixpassword |
Dependencies
| Artifact | Excludes |
|---|---|
| org.apache.phoenix:phoenix-core:4.4.0-HBase-1.0 |
Tajo
Properties
| Name | Value |
|---|---|
| tajo.driver | org.apache.tajo.jdbc.TajoDriver |
| tajo.url | jdbc:tajo://localhost:26002/default |
Dependencies
| Artifact | Excludes |
|---|---|
| org.apache.tajo:tajo-jdbc:0.11.0 |
How to use
Reference in paragraph
Start the paragraphs with the %jdbc, this will use the default prefix for connection. If you want to use other connection you should specify the prefix of it as follows %jdbc(prefix):
%jdbc
SELECT * FROM db_name;
or
%jdbc(prefix)
SELECT * FROM db_name;
Apply Zeppelin Dynamic Forms
You can leverage Zeppelin Dynamic Form inside your queries. You can use both the text input and select form parametrization features
%jdbc(prefix)
SELECT name, country, performer
FROM demo.performers
WHERE name=''
Bugs & Reporting
If you find a bug for this interpreter, please create a JIRA ticket.