Discussion:
Accessing json fields within CSV file
ankit jain
2017-06-08 09:22:51 UTC
Permalink
Hi,
I have a few psv file with a few of the columns being a json key value map.
1|xyz|{"key1":"value1", "key2":"value2"}|abc|
I am converting these files to parquet format but want to convert the json
key and values to different columns. How is that possible?

end product being:
id name key1 key2 description
1 xyz value1 value2 abc

Right now am doing something like this but the json column wont explode:

CREATE TABLE dfs.data.`/logs/logsp/` AS SELECT
CAST(columns[0] AS INT) `id`,
columns[1] AS `name`,
columns[2] AS `json_column`,
columns[3] AS `description`,
from dfs.data.`logs/events.tbl`;
And this is what I get

id name json_column description
1 xyz {"key1":"value1", "key2":"value2"} abc

Thanks in advance,
Ankit Jain
Andries Engelbrecht
2017-06-08 19:37:40 UTC
Permalink
You can use convert_from and JSON data type.

0: jdbc:drill:> select t.col1, t.col2, t.conv.key1 as key1, t.conv.key2 as key2, t.col4 from
. . . . . . . > (select columns[0] as col1 , columns[1] as col2 , convert_from(columns[2], 'JSON') as conv , columns[3] as col4 from `/flat/psv-json/json.tbl`) t;
+-------+-------+---------+---------+-------+
| col1 | col2 | key1 | key2 | col4 |
+-------+-------+---------+---------+-------+
| 1 | xyz | value1 | value2 | abc |




If you want to use functions like flatten you will need to make sure the JSON in represented in an array.
i.e. [{"key":1, "value": 1},{"key":2, "value":2}]

0: jdbc:drill:> select t.col1, t.col2, t.conv.key as key, t.conv.`value` as `value`, t.col4 from
. . . . . . . > (select columns[0] as col1, columns[1]as col2, flatten((convert_from(columns[2],'JSON'))) as conv, columns[3] as col4 from `/flat/psv-json/json.tbl`) t;
+-------+-------+------+--------+-------+
| col1 | col2 | key | value | col4 |
+-------+-------+------+--------+-------+
| 1 | xyz | 1 | 1 | abc |
| 1 | xyz | 2 | 2 | abc |
+-------+-------+------+--------+-------+



--Andries




On 6/8/17, 2:22 AM, "ankit jain" <***@gmail.com> wrote:

Hi,
I have a few psv file with a few of the columns being a json key value map.
1|xyz|{"key1":"value1", "key2":"value2"}|abc|
I am converting these files to parquet format but want to convert the json
key and values to different columns. How is that possible?

end product being:
id name key1 key2 description
1 xyz value1 value2 abc

Right now am doing something like this but the json column wont explode:

CREATE TABLE dfs.data.`/logs/logsp/` AS SELECT
CAST(columns[0] AS INT) `id`,
columns[1] AS `name`,
columns[2] AS `json_column`,
columns[3] AS `description`,
from dfs.data.`logs/events.tbl`;
And this is what I get

id name json_column description
1 xyz {"key1":"value1", "key2":"value2"} abc

Than

Loading...