
Extracting lists from list based on length


I have a list that looks like this:


{{"ADS GY Equity", "", "", "ALV GY Equity", "", "", "BAS GY Equity", 
"", "", "BAYN GY Equity", "", "", "BEI GY Equity", "", "",
"BMW GY Equity", ""}, {"Date", "PX_LAST", "", "Date", "PX_LAST", "",

"Date", "PX_LAST", "", "Date", "PX_LAST", "", "Date", "PX_LAST",
"", "Date", "PX_LAST"}, {"addidas", 17.925, "", "alvac", 287.292,
"", "basse", 24.875, "", "bayern", 42.34, "", "begge", 23.667, "",
"BMW", 29.49}, {{2000, 1, 4, 0, 0, 0.}, 17.5,
"", {2000, 1, 4, 0, 0, 0.}, 285.934, "", {2000, 1, 4, 0, 0, 0.},
23.925, "", {2000, 1, 4, 0, 0, 0.}, 41.216,
"", {2000, 1, 4, 0, 0, 0.}, 21.333, "", {2000, 1, 4, 0, 0, 0.},
28.3}, {{2000, 1, 5, 0, 0, 0.}, 17.5, "", {2000, 1, 5, 0, 0, 0.},
294.078, "", {2000, 1, 5, 0, 0, 0.}, 23.375,
"", {2000, 1, 5, 0, 0, 0.}, 40.176, "", {2000, 1, 5, 0, 0, 0.}, 21.,

"", {2000, 1, 5, 0, 0, 0.}, 27.74}, {{2000, 1, 6, 0, 0, 0.}, 18.25,
"", {2000, 1, 6, 0, 0, 0.}, 297.698, "", {2000, 1, 6, 0, 0, 0.},
24.015, "", {2000, 1, 6, 0, 0, 0.}, 41.356,
"", {2000, 1, 6, 0, 0, 0.}, 21.867, "", {2000, 1, 6, 0, 0, 0.},
27.65}, {{2000, 1, 7, 0, 0, 0.}, 18., "", {2000, 1, 7, 0, 0, 0.},
305.977, "", {2000, 1, 7, 0, 0, 0.}, 25.,
"", {2000, 1, 7, 0, 0, 0.}, 43.08, "", {2000, 1, 7, 0, 0, 0.},
22.33, "", {2000, 1, 7, 0, 0, 0.}, 27.6}, {{2000, 1, 10, 0, 0, 0.},
18.272, "", {2000, 1, 10, 0, 0, 0.}, 307.742,
"", {2000, 1, 10, 0, 0, 0.}, 25.11, "", {2000, 1, 10, 0, 0, 0.},

44.644, "", {2000, 1, 10, 0, 0, 0.}, 22.667,
"", {2000, 1, 10, 0, 0, 0.}, 28.7}, {{2000, 1, 11, 0, 0, 0.},
18.103, "", "", "", "", {2000, 1, 11, 0, 0, 0.}, 23.995,
"", {2000, 1, 11, 0, 0, 0.}, 43.323, "", {2000, 1, 11, 0, 0, 0.},
21.883, "", {2000, 1, 11, 0, 0, 0.},
28.6}, {{2000, 1, 12, 0, 0, 0.}, 18., "", "", "",
"", {2000, 1, 12, 0, 0, 0.}, 24., "", {2000, 1, 12, 0, 0, 0.},
41.45, "", {2000, 1, 12, 0, 0, 0.}, 21.717,
"", {2000, 1, 12, 0, 0, 0.}, 28.19}, {{2000, 1, 13, 0, 0, 0.},
17.462, "", "", "", "", {2000, 1, 13, 0, 0, 0.}, 23.75,

"", {2000, 1, 13, 0, 0, 0.}, 41.169, "", {2000, 1, 13, 0, 0, 0.},
22.163, "", {2000, 1, 13, 0, 0, 0.},
27.4}, {{2000, 1, 14, 0, 0, 0.}, 16.837, "", "", "",
"", {2000, 1, 14, 0, 0, 0.}, 23.75, "", {2000, 1, 14, 0, 0, 0.},
42.152, "", {2000, 1, 14, 0, 0, 0.}, 22.583,
"", {2000, 1, 14, 0, 0, 0.}, 27.2}, {{2000, 1, 17, 0, 0, 0.}, 17.,
"", "", "", "", {2000, 1, 17, 0, 0, 0.}, 23.935, "", "", "",
"", {2000, 1, 17, 0, 0, 0.}, 23.33, "", {2000, 1, 17, 0, 0, 0.},
27.28}, {{2000, 1, 18, 0, 0, 0.}, 17., "", "", "",
"", {2000, 1, 18, 0, 0, 0.}, 23.975, "", "", "",

"", {2000, 1, 18, 0, 0, 0.}, 22.667, "", {2000, 1, 18, 0, 0, 0.},
27.}, {{2000, 1, 19, 0, 0, 0.}, 16.65, "", "", "",
"", {2000, 1, 19, 0, 0, 0.}, 24.92, "", "", "",
"", {2000, 1, 19, 0, 0, 0.}, 23.3, "", {2000, 1, 19, 0, 0, 0.},
27.59}}

I want to extract every list which has length 14 or 15. This should leave me with eight lists, because I also want to drop the lists/columns that have no values in them.


This is how the output should look after extraction:


  {{{"ADS GY Equity", "", "BAS GY Equity", "", "BEI GY Equity", "", 
"BMW GY Equity", ""}, {"Date", "PX_LAST", "Date", "PX_LAST",

"Date", "PX_LAST", "Date", "PX_LAST"}, {"addidas", 17.925, "basse",
24.875, "begge", 23.667, "BMW", 29.49}, {{2000, 1, 4, 0, 0, 0.},
17.5, {2000, 1, 4, 0, 0, 0.}, 23.925, {2000, 1, 4, 0, 0, 0.},
21.333, {2000, 1, 4, 0, 0, 0.}, 28.3}, {{2000, 1, 5, 0, 0, 0.},
17.5, {2000, 1, 5, 0, 0, 0.}, 23.375, {2000, 1, 5, 0, 0, 0.},
21., {2000, 1, 5, 0, 0, 0.}, 27.74}, {{2000, 1, 6, 0, 0, 0.},
18.25, {2000, 1, 6, 0, 0, 0.}, 24.015, {2000, 1, 6, 0, 0, 0.},
21.867, {2000, 1, 6, 0, 0, 0.}, 27.65}, {{2000, 1, 7, 0, 0, 0.},
18., {2000, 1, 7, 0, 0, 0.}, 25., {2000, 1, 7, 0, 0, 0.},
22.33, {2000, 1, 7, 0, 0, 0.}, 27.6}, {{2000, 1, 10, 0, 0, 0.},

18.272, {2000, 1, 10, 0, 0, 0.}, 25.11, {2000, 1, 10, 0, 0, 0.},
22.667, {2000, 1, 10, 0, 0, 0.}, 28.7}, {{2000, 1, 11, 0, 0, 0.},
18.103, {2000, 1, 11, 0, 0, 0.}, 23.995, {2000, 1, 11, 0, 0, 0.},
21.883, {2000, 1, 11, 0, 0, 0.}, 28.6}, {{2000, 1, 12, 0, 0, 0.},
18., {2000, 1, 12, 0, 0, 0.}, 24., {2000, 1, 12, 0, 0, 0.},
21.717, {2000, 1, 12, 0, 0, 0.}, 28.19}, {{2000, 1, 13, 0, 0, 0.},
17.462, {2000, 1, 13, 0, 0, 0.}, 23.75, {2000, 1, 13, 0, 0, 0.},
22.163, {2000, 1, 13, 0, 0, 0.}, 27.4}, {{2000, 1, 14, 0, 0, 0.},
16.837, {2000, 1, 14, 0, 0, 0.}, 23.75, {2000, 1, 14, 0, 0, 0.},
22.583, {2000, 1, 14, 0, 0, 0.}, 27.2}, {{2000, 1, 17, 0, 0, 0.},

17., {2000, 1, 17, 0, 0, 0.}, 23.935, {2000, 1, 17, 0, 0, 0.},
23.33, {2000, 1, 17, 0, 0, 0.}, 27.28}, {{2000, 1, 18, 0, 0, 0.},
17., {2000, 1, 18, 0, 0, 0.}, 23.975, {2000, 1, 18, 0, 0, 0.},
22.667, {2000, 1, 18, 0, 0, 0.}, 27.}, {{2000, 1, 19, 0, 0, 0.},
16.65, {2000, 1, 19, 0, 0, 0.}, 24.92, {2000, 1, 19, 0, 0, 0.},
23.3, {2000, 1, 19, 0, 0, 0.}, 27.59}}}

Put simply: the extraction keeps the date and closing-price columns only for those companies whose number of closing prices equals the largest number of closing prices found for any company.



Answer



Starting with your data assigned to dat, please try this:



tdat = Partition[dat\[Transpose], 2, 3];

newdat = Join @@ DeleteCases[tdat, {{__, ""}, _}]\[Transpose];

I am going by the assumption that any primary column ending with "" means that it is short and should be pared from the table.
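As a minimal sketch, the same two steps can be checked on a table small enough to read by eye. The table toy, its tickers "AAA" and "BBB", and the name tdatToy are made up for illustration and are not part of the question's data; Transpose is written out in full here instead of using the \[Transpose] symbol.

```mathematica
(* Hypothetical toy table: two tickers, each with a date and a price column,
   separated by an empty column. "BBB" is missing its last observation. *)
toy = {
   {"AAA",  "",   "", "BBB",  ""},
   {"Date", "PX", "", "Date", "PX"},
   {"d1",   1.0,  "", "d1",   5.0},
   {"d2",   2.0,  "", "",     ""}
   };

(* Columns become rows, then get grouped in pairs, skipping every third one. *)
tdatToy = Partition[Transpose[toy], 2, 3];

(* Drop every group whose date column ends with "", then restore column form. *)
Transpose[Join @@ DeleteCases[tdatToy, {{__, ""}, _}]]
(* {{"AAA", ""}, {"Date", "PX"}, {"d1", 1.}, {"d2", 2.}} *)
```

Only the "AAA" pair survives, because the date column of "BBB" ends with "".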


Sample:


newdat // MatrixForm



Explanation



As requested, here is an explanation of my code. Due to the format of tables in Mathematica it is usually much easier to operate on rows than on columns, so my first step is to Transpose the data. (\[Transpose] is a special transpose symbol; see the linked documentation.)


When looking at the data a structure becomes apparent: there are two columns of data in each group, and a spacing or dividing column containing nothing but "". (Incidentally, these dividing columns are completely omitted in my output; if you need them, ask, and I'll put them back in.) I therefore Partition the data into groups of two columns, moving three places between each group and thereby skipping the "empty" dividing columns. I call this data tdat.
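The effect of the offset argument is easiest to see on a flat list. This is only an illustration; the list cols and its placeholder entries are invented here and do not come from the question.

```mathematica
(* Hypothetical stand-in for the transposed table: each string represents
   one whole column, and "" marks a dividing column. *)
cols = {"d1", "p1", "", "d2", "p2", "", "d3", "p3"};

(* Blocks of length 2, with each block starting 3 positions after the previous one. *)
Partition[cols, 2, 3]
(* {{"d1", "p1"}, {"d2", "p2"}, {"d3", "p3"}} *)
```

Because the offset (3) is larger than the block length (2), every third element falls between blocks and is dropped.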


Now that the data is in a row-based form and grouped, we can use fairly simple pattern matching to extract only the full-length sections. It is simpler in this case to delete the unwanted ones, so I use DeleteCases. My pattern is:


{{__, ""}, _}

This pattern will be matched to each group of two rows (originally columns). It is a literal list with two elements. The second is _ which is short for Blank[] and matches any single expression; I use this because I want to allow any second column. The first element is {__, ""} which matches a list (here row, originally column) that starts with any series of elements (__, short for BlankSequence[]) and ends with literal "". We therefore match any data group the first row (column) of which ends with "" and delete these.
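The pattern can be tried in isolation with MatchQ before it is handed to DeleteCases. The two small groups below are invented for illustration, and patt is just a convenient name for the pattern.

```mathematica
patt = {{__, ""}, _};

(* First row ends with "": this group would be deleted. *)
MatchQ[{{"d1", "d2", ""}, {5.0, 6.0, ""}}, patt]
(* True *)

(* First row is complete; the second row is not inspected at all. *)
MatchQ[{{"d1", "d2", "d3"}, {5.0, 6.0, ""}}, patt]
(* False *)
```

The second case shows that only the first row of each group decides whether it is removed, which is the assumption stated earlier in this answer.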


Finally, the data groups are joined into a single table using Join and @@ (short for Apply), then transposed back to column form. Note that the low precedence of the \[Transpose] operator means that this transpose is done after the Join.
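The last step can be written out without the postfix symbol, which makes the order of operations explicit. The variable groups below is a made-up stand-in for what is left after the DeleteCases step.

```mathematica
(* Two hypothetical surviving groups, each holding a date row and a price row. *)
groups = {{{"d1", "d2"}, {1.0, 2.0}}, {{"d1", "d2"}, {3.0, 4.0}}};

(* Join @@ groups is Join[group1, group2]: one flat list of rows. *)
Join @@ groups
(* {{"d1", "d2"}, {1., 2.}, {"d1", "d2"}, {3., 4.}} *)

(* Transposing the joined rows turns them back into columns. *)
Transpose[Join @@ groups]
(* {{"d1", 1., "d1", 3.}, {"d2", 2., "d2", 4.}} *)
```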


By the way, if you are doing this on a large volume of data there will be faster methods than pattern matching, perhaps using Pick in place of the DeleteCases step. However, I find pattern matching more versatile and concise, and therefore the best place to start unless speed or large data is a priority.
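One possible shape for a Pick-based version is sketched below. It is an untimed illustration, not a benchmarked replacement; the table toy and the names keep and newdatPick are assumptions introduced here.

```mathematica
(* Hypothetical toy table in the same layout as the question's data. *)
toy = {
   {"AAA",  "",   "", "BBB",  ""},
   {"Date", "PX", "", "Date", "PX"},
   {"d1",   1.0,  "", "d1",   5.0},
   {"d2",   2.0,  "", "",     ""}
   };

tdat = Partition[Transpose[toy], 2, 3];

(* One True/False flag per group: True when the date row does not end with "". *)
keep = Map[Last[First[#]] =!= "" &, tdat];

(* Pick keeps the groups whose flag is True; the rest proceeds as before. *)
newdatPick = Transpose[Join @@ Pick[tdat, keep]]
(* {{"AAA", ""}, {"Date", "PX"}, {"d1", 1.}, {"d2", 2.}} *)
```

Whether this is actually faster than DeleteCases on a given data set would need to be measured, for example with AbsoluteTiming.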

