Skip to content

RDBMS Reader

RDBMS Reader plugin supports reading data from traditional RDBMS. This is a generic relational database reading plugin that can support more relational database reading by registering database drivers.

At the same time, RDBMS Reader is also the base class for other relational database reading plugins. The following reading plugins all depend on this plugin:

Note: If a dedicated database reading plugin is already provided, it is recommended to use the dedicated plugin. If the database you need to read does not have a dedicated plugin, consider using this generic plugin. Before use, you need to perform the following operations to run normally, otherwise exceptions will occur.

Configure Driver

Suppose you need to read data from IBM DB2. Since no dedicated reading plugin is provided, we can use this plugin to implement it. Before use, you need to download the corresponding JDBC driver and copy it to the plugin/reader/rdbmsreader/libs directory. If your driver class name is special, you need to find the driver item in the task configuration file and fill in the correct JDBC driver name, such as DB2's driver name com.ibm.db2.jcc.DB2Driver. If not filled, the plugin will automatically guess the driver name.

The following lists common databases and their corresponding driver names:

Configuration

The following configuration shows how to read data from Presto database to terminal

json
{
  "job": {
    "setting": {
      "speed": {
        "byte": 1048576,
        "channel": 1
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": {
      "reader": {
        "name": "rdbmsreader",
        "parameter": {
          "username": "hive",
          "password": "",
          "column": [
            "*"
          ],
          "driver": "io.prestosql.jdbc.PrestoDriver",
          "connection": {
            "table": [
              "default.table"
            ],
            "jdbcUrl": "jdbc:presto://127.0.0.1:8080/hive"
          },
          "fetchSize": 1024,
          "where": "1 = 1"
        }
      },
      "writer": {
        "name": "streamwriter",
        "parameter": {
          "print": true
        }
      }
    }
  }
}

Parameters

ConfigurationRequiredData TypeDefault ValueDescription
jdbcUrlYeslistNoneJDBC connection information of the target database, jdbcUrl follows RDBMS official specifications and can include connection attachment control information
driverNostringNoneCustom driver class name to solve compatibility issues, see description below
usernameYesstringNoneUsername of the data source
passwordNostringNonePassword for the specified username of the data source
tableYeslistNoneSelected table names to be synchronized, using JSON data format. When configured for multiple tables, users need to ensure that multiple tables have the same table structure
columnYeslistNoneCollection of column names to be synchronized in the configured table, detailed description below
splitPkNostringNoneUse the field represented by splitPk for data sharding, which can greatly improve data synchronization efficiency, see notes below
autoPkNobooleanfalseWhether to automatically guess the sharding primary key, introduced in version 3.2.6, see description below
whereNostringNoneFiltering conditions for the table
sessionNolistNoneFor local connections, modify session configuration, see below
querySqlNostringNoneUse custom SQL instead of specified table to get data. When this item is configured, table and column configuration items are ignored
fetchSizeNoint1024Defines the number of batch data fetched between plugin and database server each time. Increasing this value may cause Addax OOM
excludeColumnNolistNoneColumn name fields to be excluded, only valid when column is configured as *

jdbcUrl

In addition to configuring necessary information, jdbcUrl configuration can also add specific configuration properties for each specific driver. Here we particularly mention that we can use configuration properties to support proxies to access databases through proxies. For example, for PrestoSQL database JDBC driver, it supports the socksProxy parameter, so the above configured jdbcUrl can be modified to:

jdbc:presto://127.0.0.1:8080/hive?socksProxy=192.168.1.101:1081

Most relational database JDBC drivers support socksProxyHost,socksProxyPort parameters for proxy access. There are also some special cases.

The following are the proxy types and configuration methods supported by various database JDBC drivers:

DatabaseProxy TypeProxy ConfigurationExample
MySQLsockssocksProxyHost,socksProxyPortsocksProxyHost=192.168.1.101&socksProxyPort=1081
PrestosockssocksProxysocksProxy=192.168.1.101:1081
PrestohttphttpProxyhttpProxy=192.168.1.101:3128

driver

In most cases, the JDBC driver for a database is fixed, but some have different recommended driver class names due to different versions, such as MySQL. The new MySQL JDBC driver type recommends using com.mysql.cj.jdbc.Driver instead of the previous com.mysql.jdbc.Driver. If you want to use the old driver name, you can configure the driver configuration item. Otherwise, the plugin will automatically guess the driver name based on the string in jdbcUrl.

column

Collection of column names to be synchronized in the configured table, using JSON array to describe field information. Users use * to represent default use of all column configurations, such as ["*"].

Supports column pruning, i.e., columns can be selected for partial export.

Supports column reordering, i.e., columns can be exported not according to table schema information.

Supports constant configuration, users need to follow JSON format:

["id", "`table`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"]

  • id is ordinary column name
  • `table` is column name containing reserved words
  • 1 is integer constant
  • 'bazhen.csy' is string constant
  • null is null pointer. Note that null here must appear as a string, i.e., quoted with double quotes
  • to_char(a + 1) is expression
  • 2.3 is floating point number
  • true is boolean value. Similarly, boolean values here must also be quoted with double quotes

Column must be explicitly filled and cannot be empty!

excludeColumn

There is a situation where we need to read most fields of a table. If the table has many fields, configuring column is obviously time-consuming. In particular, when we collect business data to big data platforms, we generally add some additional fields including partition fields and collection information. When we need to write back to business data tables, we need to exclude these fields. Under this consideration, we introduced the excludeColumn configuration item. When column is configured as *, the excludeColumn configuration item takes effect to exclude some fields.

For example:

json
{
  "column": ["*"],
  "excludeColumn": ["partition_col", "etl_time"]
}