How to store a subset of columns from a CSV file?
I need to create a table in Hive (or Impala) by reading from a CSV file (named file.csv). The problem is that this file could have a different number of columns each time I read it. The only thing I am sure of is that it will always have three columns called A, B, and C.
For example, the first CSV I get could be (the first row is the header):
------------------------
| X | Y | A | Z | B | C |
------------------------
| 1 | 2 | 3 | 4 | 5 | 6 |
and the second:
------------
| C | A | B |
-------------
| 1 | 2 | 3 |
And I need to store this in a table, maybe an external table. Something like this:
CREATE EXTERNAL TABLE file (A STRING, B STRING, C STRING)
AS
SELECT A, B, C
USING HEADER
LOCATION 'input/loading/';
That obviously does not work. Any ideas?
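One workaround I can think of, shown here only as a minimal sketch, is to pre-process the file before Hive or Impala sees it: read the header, keep only A, B, and C in a fixed order, and write the result without a header. The sketch assumes the file is comma-delimited with a header row. The function name `project_columns` is hypothetical and not part of any Hive or Impala tooling.

```python
import csv
import io

# The only columns guaranteed to be present, in the order the table expects.
REQUIRED = ["A", "B", "C"]


def project_columns(src, dst, columns=REQUIRED):
    """Copy only `columns` from src to dst, in a fixed order, without a header.

    `src` and `dst` are text file objects; comma-delimited input with a
    header row is an assumption of this sketch.
    """
    reader = csv.DictReader(src)
    # Fail early if one of the guaranteed columns is absent.
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow([row[c] for c in columns])


if __name__ == "__main__":
    # The two sample files from above, written as comma-delimited text.
    for sample in ("X,Y,A,Z,B,C\n1,2,3,4,5,6\n", "C,A,B\n1,2,3\n"):
        out = io.StringIO()
        project_columns(io.StringIO(sample), out)
        print(out.getvalue(), end="")
```

The normalized output always has exactly three fields in the order A, B, C, so it could presumably be placed under the LOCATION directory of an external table declared as (A STRING, B STRING, C STRING) with comma-delimited fields.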