COUNT on External Table in HIVE
I have been experimenting with EXTERNAL tables in Hive:
CREATE EXTERNAL TABLE IF NOT EXISTS MovieData
(id INT, title STRING,releasedate date, videodate date,
URL STRING,unknown TINYINT, Action TINYINT, Adventure TINYINT,
Animation TINYINT,Children TINYINT, Comedy TINYINT, Crime TINYINT,
Documentary TINYINT, Drama TINYINT, Fantasy TINYINT,
Film_Noir TINYINT, Horror TINYINT, Musical TINYINT,
Mystery TINYINT, Romance TINYINT, Sci_Fi TINYINT,
Thriller TINYINT, War TINYINT, Western TINYINT)
COMMENT 'This is a list of movies and its genre'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
I created the table with the statement above and then populated it with a LOAD statement:
LOAD DATA LOCAL INPATH '/home/ubuntu/MovieLens.txt' INTO TABLE MovieData;
Next, I DROP the table in Hive, recreate it with the same statement, and LOAD the data again. But when I run a COUNT on the table, I get double the number of rows that are in the file I loaded.
I have read in a few articles that dropping an EXTERNAL table removes only the schema from the Hive metastore and leaves the data in place (see External Table).
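The sequence described above can be sketched as follows. This is only an illustration under stated assumptions, not a confirmed diagnosis: it assumes no LOCATION clause is given, so the dropped and recreated table resolve to the same default directory, and it shortens the column list to two columns for brevity. If the first COUNT after recreation is already non-zero, the files from the earlier load are still in place, and a second LOAD ... INTO TABLE would add a second copy next to them.

```sql
-- Sketch only; column list shortened for brevity (assumption, not the full schema).
-- Assumption: no LOCATION clause, so both CREATE statements use the same directory.

-- For an EXTERNAL table this removes the metastore entry; the data files stay.
DROP TABLE IF EXISTS MovieData;

CREATE EXTERNAL TABLE IF NOT EXISTS MovieData (id INT, title STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- Check before loading anything: a non-zero count here means the files
-- from the first load are still in the table directory.
SELECT COUNT(*) FROM MovieData;

-- INTO TABLE adds the file alongside whatever is already in the directory.
LOAD DATA LOCAL INPATH '/home/ubuntu/MovieLens.txt' INTO TABLE MovieData;
SELECT COUNT(*) FROM MovieData;

-- OVERWRITE replaces the existing contents of the table directory instead.
LOAD DATA LOCAL INPATH '/home/ubuntu/MovieLens.txt' OVERWRITE INTO TABLE MovieData;
SELECT COUNT(*) FROM MovieData;
```

Comparing the three counts should show whether the doubling comes from leftover files rather than from the LOAD itself.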
Can you please advise why Hive behaves this way?