S3 Reader
The S3 Reader plugin reads data from Amazon S3 storage. It is implemented on top of the official S3 SDK 2.0.
The plugin can also read from storage services that are compatible with the S3 protocol, such as MinIO.
Configuration Example
The following sample configuration reads objects from S3 storage, selected by two exact names and two wildcard patterns, and prints them to the console:
```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1,
        "bytes": -1
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": {
      "reader": {
        "name": "s3reader",
        "parameter": {
          "endpoint": "https://s3.amazonaws.com",
          "accessId": "xxxxxxxxxxxx",
          "accessKey": "xxxxxxxxxxxxxxxxxxxxxxx",
          "bucket": "test",
          "object": [
            "1.csv",
            "aa.csv",
            "upload_*.csv",
            "bb_??.csv"
          ],
          "column": [
            "*"
          ],
          "region": "ap-northeast-1",
          "fileFormat": "csv",
          "fieldDelimiter": ","
        }
      },
      "writer": {
        "name": "streamwriter",
        "parameter": {
          "print": true
        }
      }
    }
  }
}
```

Parameters

| Configuration | Required | Data Type | Default Value | Description |
|---|---|---|---|---|
| endpoint | Yes | string | None | S3 server endpoint address, e.g. s3.xx.amazonaws.com |
| region | Yes | string | None | S3 region, e.g. ap-southeast-1 |
| accessId | Yes | string | None | Access ID |
| accessKey | Yes | string | None | Access Key |
| bucket | Yes | string | None | Bucket to read |
| object | Yes | list | None | Objects to read; multiple entries and wildcard patterns are allowed, see the object section below |
| column | Yes | list | None | Column information of the objects to read; refer to the column description in RDBMS Reader |
| fieldDelimiter | No | string | , | Field delimiter used when reading; only a single character is supported |
| compress | No | string | None | File compression format; objects are read as uncompressed by default |
| encoding | No | string | utf8 | File encoding format |
| writeMode | No | string | nonConflict | |
| pathStyleAccessEnabled | No | boolean | false | Whether to enable path-style access mode |
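
As a rough illustration of the S3-compatible case mentioned at the top of this page, a reader block pointed at a self-hosted MinIO server might look like the sketch below. The endpoint, region value, bucket, and object name are placeholders, not values taken from a real deployment. Setting `pathStyleAccessEnabled` to `true` is an assumption based on how MinIO servers are commonly addressed, not a documented requirement of this plugin.

```json
{
  "name": "s3reader",
  "parameter": {
    "endpoint": "http://127.0.0.1:9000",
    "region": "us-east-1",
    "accessId": "xxxxxxxxxxxx",
    "accessKey": "xxxxxxxxxxxxxxxxxxxxxxx",
    "bucket": "test",
    "object": ["1.csv"],
    "column": ["*"],
    "pathStyleAccessEnabled": true
  }
}
```

This fragment would take the place of the `reader` entry in the configuration example; the `setting` and `writer` parts stay unchanged.
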
object
When a single object is specified, the plugin currently extracts data with a single thread only.
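
The two `object` settings below sketch the contrast. The first names exactly one object, so the single-thread restriction applies. The second reuses the patterns from the configuration example and, assuming the wildcards carry their usual glob meaning (`*` for any run of characters, `?` for exactly one character), may resolve to several objects. The wrapper keys `singleObject` and `multipleObjects` are labels for this illustration only and are not plugin options.

```json
{
  "singleObject": {
    "object": ["1.csv"]
  },
  "multipleObjects": {
    "object": ["upload_*.csv", "bb_??.csv"]
  }
}
```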