Data Reader
DataReader plugin is specifically designed to generate data that meets certain rule requirements in development and testing environments.
In actual development and testing, we need to generate test data according to certain business rules, not just random content, such as ID card numbers, bank account numbers, stock codes, etc.
Why Reinvent the Wheel
Indeed, there are quite a few specialized data generation tools on the Internet, many of which are powerful and performant. However, most of these tools consider the data generation part but ignore the problem of writing data to the target end, or some consider it but only consider one or a limited number of databases.
It happens that the Addax tool can provide enough target-end writing capabilities, plus the existing Stream Reader is already a simple version of data generation tool. Therefore, adding some specific rules to this functionality and leveraging the diversity of the writing end naturally makes it a good data generation tool.
Configuration Example
Here I list all the rules currently supported by the plugin in the example below
{
"job": {
"setting": {
"speed": {
"byte": -1,
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0.02
}
},
"content": {
"reader": {
"name": "datareader",
"parameter": {
"column": [
{
"value": "1,100,",
"rule": "random",
"type": "double"
},
{
"value": "DataX",
"type": "string"
},
{
"value": "1",
"rule": "incr",
"type": "long"
},
{
"value": "1989/06/04 00:00:01,-1",
"rule": "incr",
"type": "date",
"dateFormat": "yyyy/MM/dd hh:mm:ss"
},
{
"value": "test",
"type": "bytes"
},
{
"rule": "address"
},
{
"rule": "bank"
},
{
"rule": "company"
},
{
"rule": "creditCard"
},
{
"rule": "debitCard"
},
{
"rule": "idCard"
},
{
"rule": "lat"
},
{
"rule": "lng"
},
{
"rule": "name"
},
{
"rule": "job"
},
{
"rule": "phone"
},
{
"rule": "stockCode"
},
{
"rule": "stockAccount"
}
],
"sliceRecordCount": 10
}
},
"writer": {
"name": "streamwriter",
"parameter": {
"print": true,
"encoding": "UTF-8"
}
}
}
}
}Save the above content to job/datareader2stream.json
Then execute this task, the output result is similar to the following:
Details
$ bin/addax.sh job/datareader2stream.json
___ _ _
/ _ \ | | | |
/ /_\ \ __| | __| | __ ___ __
| _ |/ _` |/ _` |/ _` \ \/ /
| | | | (_| | (_| | (_| |> <
\_| |_/\__,_|\__,_|\__,_/_/\_\
:: Addax version :: (v4.0.2-SNAPSHOT)
2021-08-13 17:02:00.888 [ main] INFO VMInfo - VMInfo# operatingSystem class => com.sun.management.internal.OperatingSystemImpl
2021-08-13 17:02:00.910 [ main] INFO Engine -
{
"content":
{
"reader":{
"parameter":{
"column":[
{
"rule":"random",
"type":"double",
"scale": "2",
"value":"1,100,"
},
{
"type":"string",
"value":"DataX"
},
{
"rule":"incr",
"type":"long",
"value":"1"
},
{
"dateFormat":"yyyy/MM/dd hh:mm:ss",
"rule":"incr",
"type":"date",
"value":"1989/06/04 00:00:01,-1"
},
{
"type":"bytes",
"value":"test"
},
{
"rule":"address"
},
{
"rule":"bank"
},
{
"rule":"company"
},
{
"rule":"creditCard"
},
{
"rule":"debitCard"
},
{
"rule":"idCard"
},
{
"rule":"lat"
},
{
"rule":"lng"
},
{
"rule":"name"
},
{
"rule":"job"
},
{
"rule":"phone"
},
{
"rule":"stockCode"
},
{
"rule":"stockAccount"
}
],
"sliceRecordCount":10
},
"name":"datareader"
},
"writer":{
"parameter":{
"print":true,
"encoding":"UTF-8"
},
"name":"streamwriter"
}
},
"setting":{
"errorLimit":{
"record":0,
"percentage":0.02
},
"speed":{
"byte":-1,
"channel":1
}
}
}
2021-08-13 17:02:00.937 [ main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2021-08-13 17:02:00.938 [ main] INFO JobContainer - Addax jobContainer starts job.
2021-08-13 17:02:00.940 [ main] INFO JobContainer - Set jobId = 0
2021-08-13 17:02:00.976 [ job-0] INFO JobContainer - Addax Reader.Job [datareader] do prepare work .
2021-08-13 17:02:00.977 [ job-0] INFO JobContainer - Addax Writer.Job [streamwriter] do prepare work .
2021-08-13 17:02:00.978 [ job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
2021-08-13 17:02:00.979 [ job-0] INFO JobContainer - Addax Reader.Job [datareader] splits to [1] tasks.
2021-08-13 17:02:00.980 [ job-0] INFO JobContainer - Addax Writer.Job [streamwriter] splits to [1] tasks.
2021-08-13 17:02:01.002 [ job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2021-08-13 17:02:01.009 [ taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2021-08-13 17:02:01.017 [ taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2021-08-13 17:02:01.017 [ taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
7.65 DataX 1 1989-06-04 00:00:01 test 天津市南京县长寿区光明路263号 交通银行 易动力信息有限公司 6227894836568607 6235712610856305437 450304194808316766 31.3732613 -125.3507716 龚军 机电工程师 13438631667 726929 8741848665
18.58 DataX 2 1989-06-03 00:00:01 test 江苏省太原市浔阳区东山路33号 中国银行 时空盒数字信息有限公司 4096666711928233 6217419359154239015 220301200008188547 48.6648764 104.8567048 匡飞 化妆师 18093137306 006845 1815787371
16.16 DataX 3 1989-06-02 00:00:01 test 台湾省邯郸市清河区万顺路10号 大同商行 开发区世创科技有限公司 4096713966912225 6212977716107080594 150223196408276322 29.0134395 142.6426842 支波 审核员 13013458079 020695 3545552026
63.89 DataX 4 1989-06-01 00:00:01 test 上海市辛集县六枝特区甘园路119号 中国农业银行 泰麒麟传媒有限公司 6227893481508780 6215686558778997167 220822196208286838 -71.6484635 111.8181273 敬坤 房地产客服 13384928291 174445 0799668655
79.18 DataX 5 1989-05-31 00:00:01 test 陕西省南京市朝阳区大胜路170号 内蒙古银行 晖来计算机信息有限公司 6227535683896707 6217255315590053833 350600198508222018 -24.9783587 78.017024 蒋杨 固定资产会计 18766298716 402188 9633759917
14.97 DataX 6 1989-05-30 00:00:01 test 海南省长春县璧山区碧海街147号 华夏银行 浙大万朋科技有限公司 6224797475369912 6215680436662199846 220122199608190275 -3.5088667 -40.2634359 边杨 督导/巡店 13278765923 092780 2408887582
45.49 DataX 7 1989-05-29 00:00:01 test 台湾省潜江县梁平区七星街201号 晋城商行 开发区世创信息有限公司 5257468530819766 6213336008535546044 141082197908244004 -72.9200596 120.6018163 桑明 系统工程师 13853379719 175864 8303448618
8.45 DataX 8 1989-05-28 00:00:01 test 海南省杭州县城北区天兴路11号 大同商行 万迅电脑科技有限公司 6227639043120062 6270259717880740332 430405198908214042 -16.5115338 -39.336119 覃健 人事总监 13950216061 687461 0216734574
15.01 DataX 9 1989-05-27 00:00:01 test 云南省惠州市和平区海鸥街201号 内蒙古银行 黄石金承信息有限公司 6200358843233005 6235730928871528500 130300195008312067 -61.646097 163.0882369 卫建华 电话采编 15292600492 001658 1045093445
55.14 DataX 10 1989-05-26 00:00:01 test 辽宁省兰州市徐汇区东山街176号 廊坊银行 创汇科技有限公司 6227605280751588 6270262330691012025 341822200908168063 77.2165746 139.5431377 池浩 多媒体设计 18693948216 201678 0692522928
2021-08-13 17:02:04.020 [ job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2021-08-13 17:02:04.021 [ job-0] INFO JobContainer - Addax Writer.Job [streamwriter] do post work.
2021-08-13 17:02:04.022 [ job-0] INFO JobContainer - Addax Reader.Job [datareader] do post work.
2021-08-13 17:02:04.025 [ job-0] INFO JobContainer - PerfTrace not enable!
2021-08-13 17:02:04.028 [ job-0] INFO StandAloneJobContainerCommunicator - Total 10 records, 1817 bytes | Speed 605B/s, 3 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-08-13 17:02:04.030 [ job-0] INFO JobContainer -
任务启动时刻 : 2021-08-13 17:02:00
任务结束时刻 : 2021-08-13 17:02:04
任务总计耗时 : 3s
任务平均流量 : 605B/s
记录写入速度 : 3rec/s
读出记录总数 : 10
读写失败总数 : 0Configuration Description
The configuration of column is slightly different from other plugins. A field consists of the following configuration items:
| Configuration | Required | Default Value | Example | Description |
|---|---|---|---|---|
| value | No | None | Addax | Data value, required in some cases |
| rule | No | constant | idCard | Data generation rule, detailed description below |
| type | No | string | double | Data value type |
| dateFormat | No | yyyy-MM-dd HH:mm:ss | yyyy/MM/dd HH:mm:ss | Date format, only valid when type is date |
rule Description
The core of this plugin's field configuration is the rule field, which is used to indicate what kind of data should be generated, and according to different rules, combined with other configuration options to generate data that meets expectations. Currently, the configurations of rule are all built-in supported rules, custom rules are not supported yet. Detailed description follows:
constant
constant is the default configuration of rule. This rule means that the data value to be generated is determined by the value configuration item, with no changes. For example:
{
"value": "Addax",
"type": "string",
"rule": "constant"
}This means the data value generated by this field is all Addax
incr
The meaning of incr configuration item is consistent with the incr meaning in the streamreader plugin, indicating this is an incremental data generation rule. For example:
{
"value": "1,2",
"rule": "incr",
"type": "long"
}This means the field data is a long integer, starting from 1, incrementing by 2 each time, forming an incremental sequence starting from 1 with step size 2.
For more detailed configuration rules and precautions for this field, refer to the incr description in streamreader.
random
The meaning of random configuration item is consistent with the random meaning in the streamreader plugin, indicating this is an incremental data generation rule. For example:
{
"value": "1,10",
"rule": "random",
"type": "string"
}This means the field data is a random string with length 1 to 10 (both 1 and 10 included).
For more detailed configuration rules and precautions for this field, refer to the random description in streamreader.
| Rule Name | Meaning | Example | Data Type | Description |
|---|---|---|---|---|
address | Randomly generate address information that basically meets domestic actual conditions | 辽宁省兰州市徐汇区东山街176号 | string | |
bank | Randomly generate a domestic bank name | 华夏银行 | string | |
company | Randomly generate a company name | 万迅电脑科技有限公司 | string | |
creditCard | Randomly generate a credit card number | 430405198908214042 | string | 16 digits |
debitCard | Randomly generate a debit card number | 6227894836568607 | string | 19 digits |
email | Randomly generate an email address | [email protected] | string | |
idCard | Randomly generate a domestic ID card number | 350600198508222018 | string | 18 digits, follows validation rules, first 6 digits meet administrative division requirements |
lat | Randomly generate latitude data | 48.6648764 | double | Fixed 7 decimal places, can also use latitude |
lng | Randomly generate longitude data | 120.6018163 | double | Fixed 7 decimal places, can also use longitude |
name | Randomly generate a domestic name | 池浩 | string | Currently doesn't consider surname distribution in China |
job | Randomly generate a domestic job title | 系统工程师 | string | Data source from recruitment websites |
phone | Randomly generate a domestic mobile phone number | 15292600492 | string | Currently doesn't consider virtual phone numbers |
stockCode | Randomly generate a 6-digit stock code | 687461 | string | First two digits meet domestic stock code standards |
stockAccount | Randomly generate a 10-digit stock trading account | 0692522928 | string | Completely random, doesn't meet account standards |
uuid | Randomly generate a UUID string | bc1cf125-929b-43b7-b324-d7c4cc5a75d2 | string | Completely random |
zipCode | Randomly generate a domestic postal code | 411105 | long | Doesn't fully meet domestic postal code standards |
Note: The data types returned by the rules in the above table are fixed and cannot be modified, so type doesn't need to be configured, and the configured type will be ignored. Since data generation comes from internal rules, value also doesn't need to be configured, and the configured content will be ignored.