
visualization - Using Periodogram to investigate the seasonality of the data


After asking how to get rid of the seasonality in this set of data, I'd like to ask how to investigate seasonality in data using Mathematica.


Which tool would you use to study seasonality, i.e. to work out that the seasonality is weekly? Is Periodogram a valid tool? Can I use SARIMA?


I've extended the range of the data.





Answer



Sjoerd's approach using SARIMAProcess and TimeSeriesModelFit, in particular the last portion in which you test SARIMA models of differing orders and observe which models are particularly favoured by the AIC, is certainly a valid approach.


However, since you asked about periodograms and other spectral methods in your question, I thought I'd give an answer to show a few additional plots you could generate as part of your initial exploratory data analysis, before you proceed to model fitting.


Let's begin by reading in the data. Since others have already shown you features like DateListPlot, I'm just going to grab the column with the day-by-day traffic counts:


counts = Rest[Import["sampleTrafficData.csv"]][[All, 2]] 

Now, because we're dealing with count data, we have a problem -- the variance of the measurements is not constant across our data. So we need to apply some form of variance stabilizing transformation. There are a couple of possible choices, generally falling into families of square-root transforms and logarithmic transforms. For the purposes of this discussion, let's use the Anscombe transform:


anscombe = 2. Sqrt[counts + 3/8]; 

len = Length[anscombe];

Note that I've also converted to purely numerical data at this stage; this speeds up processing later.
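If you'd rather try the logarithmic family instead, a shifted log works just as well as a rough sketch (the offset of 1 here is an arbitrary choice, purely to guard against zero-count days):

logCounts = Log[counts + 1.];  (* rough logarithmic alternative; the +1 offset is arbitrary *)

Everything below uses the Anscombe-transformed series, but you could swap in logCounts and rerun the same commands.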


One potentially useful diagnostic is the autocorrelation function:


DiscretePlot[CorrelationFunction[anscombe, i], {i, 0, 25}, 
PlotRange -> {-0.25, 1.045}, Frame -> True, Axes -> True,
GridLines -> {None, {{2/Sqrt[len], Dashed},
0, {-2/Sqrt[len], Dashed}}},
FrameLabel -> {"Lags", "Autocorrelation"}, ImageSize -> 500,
BaseStyle -> {FontSize -> 12}]


Plot of autocorrelation function


If we had independent and identically distributed measurement errors, we would expect almost all of the lagged autocorrelations to lie between the (approximate) 2$\sigma$ confidence bounds shown by the dashed gridlines. Clearly this is not the case here. Each day's traffic is strongly correlated with the traffic from the previous day. (Since we haven't attempted to remove any trend from the data, we're also likely seeing some contamination from that as well.) In addition, we can see some weak evidence for a regular pattern with period of 7 days -- although the autocorrelation is slowly decreasing at larger and larger lags, it rises to local maxima at lag 7, again at lag 14 (if you squint) and once more at lag 21.
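Since we haven't detrended the data, a quick-and-dirty check is to difference the series once and recompute the ACF (just a sketch, using the same plotting conventions as above); if the bumps at lags 7, 14 and 21 survive the differencing, they are less likely to be a trend artifact:

(* first differences remove a slowly varying trend before recomputing the ACF *)
diffed = Differences[anscombe];
DiscretePlot[CorrelationFunction[diffed, i], {i, 0, 25},
 PlotRange -> All, Frame -> True, Axes -> True,
 GridLines -> {None, {{2/Sqrt[Length[diffed]], Dashed},
    0, {-2/Sqrt[Length[diffed]], Dashed}}},
 FrameLabel -> {"Lags", "Autocorrelation"}, ImageSize -> 500,
 BaseStyle -> {FontSize -> 12}]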


A useful complement to the plot of the ACF is the partial autocorrelation function:


DiscretePlot[PartialCorrelationFunction[anscombe, i], {i, 1, 25}, 
PlotRange -> {-0.3, 1.045}, Frame -> True, Axes -> True,
GridLines -> {None, {{2/Sqrt[len], Dashed},
0, {-2/Sqrt[len], Dashed}}},
FrameLabel -> {"Lags", "Partial Autocorrelation"}, ImageSize -> 500,
BaseStyle -> {FontSize -> 12}]


Partial autocorrelation function


The PACF tracks only that component of the autocorrelation not accounted for by earlier lags. In this case, we see that the autocorrelation at lag 1 is doing the bulk of the heavy lifting. If we were going to model this process as a purely autoregressive model, an AR(2) would be a good choice.
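If you want to sanity-check that reading, you can fit a low-order autoregressive model directly and inspect the estimated coefficients (just a one-line sketch; for a proper order comparison you'd go back to TimeSeriesModelFit as in Sjoerd's answer):

EstimatedProcess[anscombe, ARProcess[2]]  (* fits an AR(2) to the transformed series *)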


Other potentially interesting features lie at lags 7, 10, 16, but these should be interpreted with some caution. These partial autocorrelations are not particularly far outside the expected regime, and since we are dealing with 95% confidence intervals and making comparisons for 25 lags, we naively expect at least one "spurious" result in the mix. Here your physical intuition should come into play -- with a daily sampling rate, a period of 7 suggests a weekly pattern. Knowing what you do about your system, are there similarly compelling reasons to believe in a 10 or 16 day cycle?
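To make the multiple-comparisons point concrete, you can count how many of the 25 partial autocorrelations actually fall outside the approximate ±2/Sqrt[len] band (a quick sanity check, nothing more):

(* roughly 0.05 * 25 ≈ 1 excursion is expected from white noise alone at the 95% level *)
pacf = Table[PartialCorrelationFunction[anscombe, i], {i, 1, 25}];
Count[pacf, p_ /; Abs[p] > 2/Sqrt[len]]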


Next up, we can take a look at the spectral density / FFT / periodogram:


Periodogram[anscombe, PlotRange -> All, Frame -> True, 
FrameLabel -> {"Frequency (1/days)", "dB"},
GridLines -> {Range[1, 3]/7, None}, ImageSize -> 500,
BaseStyle -> {FontSize -> 12}]


Raw periodogram


In the above plot, I've added gridlines at the frequency corresponding to a 7-day period (1/7 cycles per day) and its next two harmonics (2/7 and 3/7). With the gridlines to aid the eye, you could make an argument for a peak corresponding to a weekly period, but it does not exactly jump off the page. Unfortunately, the raw periodogram is a biased estimator, particularly when the dynamic range of the signal is large.
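If you would rather have a number than an eyeball estimate, you can pull out the raw spectrum with PeriodogramArray and see where the maximum falls within a plausible band (a sketch only; the 0.1 to 0.2 cycles-per-day search band is an arbitrary choice that brackets the weekly frequency 1/7 ≈ 0.143):

(* locate the strongest raw-periodogram peak near the weekly frequency *)
spec = PeriodogramArray[anscombe];
freqs = N[Range[0, len - 1]/len];
First[MaximalBy[Select[Transpose[{freqs, spec}], 0.1 <= First[#] <= 0.2 &], Last]]

The first element of the result is the peak frequency in cycles per day; the second is its power.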


There are a couple of workarounds built into Mathematica to help reduce this bias, but they come at a cost. One option is to compute the periodogram for a number of subsets of the data and then average these estimates together. This helps reduce bias, but at the cost of resolution. For example, if we average the results for the first and last halves of your data set, we obtain:


Periodogram[anscombe, Floor[len/2], PlotRange -> All, Frame -> True, 
FrameLabel -> {"Frequency (1/days)", "dB"},
GridLines -> {Range[1, 3]/7, None}, ImageSize -> 500,
BaseStyle -> {FontSize -> 12}]

Periodogram averaging first and last half of the data


In this plot, it becomes easier to argue for a peak at the 1/7 cycles-per-day frequency (a 7-day period), but the peak has also become broader, as we've sacrificed resolution to suppress bias. There are no free lunches.



Alternatively, you can apply a smoothing filter to your data prior to computing the periodogram, such as the Blackman-Harris window, using:


Periodogram[anscombe, len, len, BlackmanHarrisWindow, PlotRange->All, ... ]

Periodogram of filtered data


Here we've again picked up slightly stronger evidence of a peak at the weekly frequency, but without the broadening of the peak. However, depending on the filtering method you choose, you're potentially washing out (destroying) other features in your dataset. Still no free lunch.


Mathematica comes with a number of common data filters (or "windows") built-in. Try out a couple of different choices in the above code snippet and see how your results change.
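For instance, a side-by-side comparison of a few of the built-in windows might look like the following (a sketch; HannWindow and HammingWindow are just two of the many available choices):

(* compare the periodogram under three different window functions *)
GraphicsColumn[
 Table[Periodogram[anscombe, len, len, w, PlotRange -> All, Frame -> True,
   PlotLabel -> w, FrameLabel -> {"Frequency (1/days)", "dB"},
   GridLines -> {Range[1, 3]/7, None}],
  {w, {HannWindow, HammingWindow, BlackmanHarrisWindow}}],
 ImageSize -> 500]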


If you're really ambitious, you could try a slick combination of the above two techniques called multi-tapering. Unfortunately, this is not (yet) native functionality in Mathematica, so you'd have to roll your own. If you're interested in these and other techniques, you might want to check out the book by Percival and Walden. (Their newer book on wavelet analysis for time series is also worth checking out.)
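To give a flavour of what rolling your own might look like, here is a bare-bones sine-taper sketch (a deliberate simplification; Percival and Walden's treatment is built around Slepian tapers, which you would have to construct separately):

(* crude sine-taper multitaper estimate: taper the data several different ways,
   compute a periodogram for each tapered copy, and average the results *)
nTapers = 5;
sineTaper[k_, n_] := Sqrt[2./(n + 1)] Table[Sin[Pi k j/(n + 1)], {j, 1, n}];
mtSpec = Mean[Table[Abs[Fourier[sineTaper[k, len] anscombe]]^2, {k, 1, nTapers}]];
freqs = N[Range[0, len - 1]/len];
ListLogPlot[Take[Transpose[{freqs, mtSpec}], Floor[len/2]], Joined -> True,
 Frame -> True, FrameLabel -> {"Frequency (1/days)", "Power"},
 GridLines -> {Range[1, 3]/7, None}, ImageSize -> 500]

The averaging over tapers buys a similar reduction in bias and variance as before, but with less loss of frequency resolution than the block-averaging approach.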


Sorry about the length. Hopefully you find this useful, and not too pedantic.

