I have two CSV files, each around 1 GB in size. When I use Import["file.csv"], it takes a very long time to import the data. How can I speed up the import?
Each file contains around 2000 columns of different types of data, such as numbers, categorical data, and strings, and around 140000 lines. There are also a lot of missing values. So I cannot make assumptions about the data set, as was done in the following post:
Speeding up Importing and Exporting CSV format
In addition, since the column names are obfuscated, like "VAR_0001", "VAR_0002", we cannot tell from the name whether a column contains numeric, categorical, or string data.
The original data file can be downloaded from the following link (around 1 GB):
The first 7000 rows of the dataset, around 45 MB in size: