Date and time calculations are quite slow so I am exploring ways to minimize the number of calculations performed. As an example:
msft = FinancialData["MSFT", "Jan 1, 2005"]
Timing[DateListPlot[msft]]
{0.253632, *picture*}
whereas
Timing[
msft1 = msft;
msft1[[All, 1]] = AbsoluteTime /@ msft[[All, 1]];
DateListPlot[msft1]
]
{0.214658, *picture*}
So it is faster to convert to AbsoluteTime
and make a plot than to stick with date lists (as a side issue this can actually be done even faster with ListPlot
or ListLinePlot
and a tick function).
When wanting to work with date lists to select subsets of data you will typically choose a search criteria in which the absolute time of elements is tested against some condition. So this is another example where it is useful to already have the data in absolute time.
So the question is whether it is possible to import dates directly into Mathematica and have them as absolute times rather than date lists. My two specific examples would be FinancialData
as per the example above -- can this be made to output absolute times via some configuration rather than having to convert the data post import? Also for Excel spreadsheets containing columns with dates (i.e. date strings as per Excel formats) and data. Can these be imported and converted to absolute times during the import process instead of converted to, and output as, date lists?
(If solutions to this problem means that Import
and FinancialData
internals take date lists and convert to absolute time then I don't really see that as a solution. I am seeking ways to go direct to absolute time.)
Answer
The data in the file test.xls
are
03/Jan/2000 45.46
04/Jan/2000 43.92
05/Jan/2000 44.38
06/Jan/2000 42.9
07/Jan/2000 43.46
10/Jan/2000 43.78
11/Jan/2000 42.65
12/Jan/2000 41.26
13/Jan/2000 42.04
14/Jan/2000 43.78
An alternative approach is to exploit the fact that Office documents are zipped collections of XML
files. So,
Step 1: rename the source file by adding .zip
to the file name: test.xlsx.zip
.
Step 2: Import the appropriate xml
file in the zip
file, extract the data elements and re-format:
Cases[Import["C:\\ your directory \\test.xlsx.zip", {"ZIP", "xl\\worksheets\\sheet1.xml"}],
XMLElement["v", {}, {value_}] :> value, Infinity]
// Partition[#, 2] &
This gives:
{{"36528", "45.46"}, {"36529", "43.92"}, {"36530", "44.38"}, {"36531", "42.9"}, {"36532", "43.46"}, {"36535", "43.78"}, {"36536", "42.65"}, {"36537", "41.26"}, {"36538", "42.04"}, {"36539", "43.78"}}
where the first entry in each sublist is Excel's DATEVALUE
(serial date number that counts the number of days from 1/1/1900).
Puzzle: I would expect that converting Excel's DATEVALUE
to Mathematica's AbsoluteTime
(number of seconds from 1/1/1900) would be as simple as multiplying the former by 24*60*60
. But doing that with:
excelDateValues = {"36528", "36529", "36530", "36531", "36532", "36535", "36536", "36537", "36538", "36539"}
and
DateList /@ (24*60*60*ToExpression@excelDateValues)
gives
{{2000, 1, 5, 0, 0, 0.}, {2000, 1, 6, 0, 0, 0.}, {2000, 1, 7, 0, 0, 0.},
{2000, 1, 8, 0, 0, 0.}, {2000, 1, 9, 0, 0, 0.}, {2000, 1, 12, 0, 0, 0.},
{2000, 1, 13, 0, 0, 0.}, {2000, 1, 14, 0, 0, 0.}, {2000, 1, 15, 0, 0, 0.},
{2000, 1, 16, 0, 0, 0.}}
which is off by two days. Hopefully, there is a less naive approach to the get the right conversion factor to go from excel Datevalues
to Mma AbsoluteTime
so that a modified version of Cases[]
above gives the desired result.
Puzzle resolved: Thanks to Mr.Wizard's reference, the historical background to the two-day discrepancy is explained beautifully in Joel Spolsky's great story . So, unless your data does contain dates going back early 1900's for most
cases just subtracting 2 from final output dates should be ok. But ... things can get more complicated considering possible excel date system settings and varying defaults accross OSs. (see XL 1900 and 1904 date systems)
EDIT: Import
uses the filename extension if no format is provided as the second argument. For zip files it returns the filenames in the zipped archive. For the example case
Import["C:\\ your directory \\test.xlsx.zip"]
returns
{"[Content_Types].xml", "_rels\\.rels","xl\\_rels\\workbook.xml.rels", "xl\\workbook.xml", "xl\\styles.xml", "xl\\worksheets\\sheet1.xml", "xl\\theme\\theme1.xml", "customXml\\item1.xml", "customXml\\_rels\\item1.xml.rels", "customXml\\_rels\\item2.xml.rels", "docProps\\app.xml", "customXml\\itemProps2.xml", "customXml\\item2.xml", "customXml\\itemProps1.xml", "docProps\\core.xml"}
Comments
Post a Comment