Skip to main content

list manipulation - Allocate amount A according to the distribution of amount B


I have a cost that varies by category and date. I also have the counts of each item type within each category and date. I want to distribute the cost to each of the item types. I'm trying to use GroupBy, and Normalize to get the percentage distribution by item type. My plan is to then multiply these percentages by the cost for each category and date.


x1 = {
<|"cat" -> "a", "type" -> "abc1", "date" -> {2013, 1}, "count" -> 10|>,
<|"cat" -> "a", "type" -> "abc2", "date" -> {2013, 1}, "count" -> 20|>,
<|"cat" -> "a", "type" -> "abc3", "date" -> {2013, 1}, "count" -> 30|>,
<|"cat" -> "a", "type" -> "abc1", "date" -> {2014, 1}, "count" -> 40|>,
<|"cat" -> "a", "type" -> "abc2", "date" -> {2014, 1}, "count" -> 50|>,

<|"cat" -> "a", "type" -> "abc1", "date" -> {2015, 1}, "count" -> 13|>,
<|"cat" -> "b", "type" -> "abc1", "date" -> {2013, 1}, "count" -> 60|>,
<|"cat" -> "b", "type" -> "abc1", "date" -> {2014, 1}, "count" -> 70|>,
<|"cat" -> "b", "type" -> "abc2", "date" -> {2013, 1}, "count" -> 75|>
};
x2 = GroupBy[x1, {#cat, #date} &, Normalize[#, Total] &]

This results in the following:


<|{"a", {2013, 1}} -> 
{<|"cat" -> 1/3, "type" -> ("abc1")/("abc1" + "abc2" + "abc3"), "date" -> {1/3, 1/3}, "count" -> 1/6|>,

<|"cat" -> 1/3, "type" -> ("abc2")/("abc1" + "abc2" + "abc3"), "date" -> {1/3, 1/3}, "count" -> 1/3|>,
<|"cat" -> 1/3, "type" -> ("abc3")/("abc1" + "abc2" + "abc3"), "date" -> {1/3, 1/3}, "count" -> 1/2|>},
{"a", {2014, 1}} ->
{<|"cat" -> 1/2, "type" -> ("abc1")/("abc1" + "abc2"), "date" -> {1/2, 1/2}, "count" -> 4/9|>,
<|"cat" -> 1/2, "type" -> ("abc2")/("abc1" + "abc2"), "date" -> {1/2, 1/2}, "count" -> 5/9|>},
{"a", {2015, 1}} ->
{<|"cat" -> 1, "type" -> 1, "date" -> {1, 1}, "count" -> 1|>},
{"b", {2013, 1}} ->
{<|"cat" -> 1/2, "type" -> ("abc1")/("abc1" + "abc2"), "date" -> {1/2,1/2}, "count" -> 4/9|>,
<|"cat" -> 1/2, "type" -> ("abc2")/("abc1" + "abc2"), "date" -> {1/2,1/2}, "count" -> 5/9|>},

{"b", {2014, 1}} -> {<|"cat" -> 1, "type" -> 1,"date" -> {1, 1}, "count" -> 1|>}|>

Which is a mess. It does calculate the correct share for each type (in the count key), but now I'd have to somehow pull this apart and put it back together properly.


Originally I didn't use associations, and did a lot of joining. Using associations seems like a better idea but this level of complication is daunting. I'm afraid that when I pull it apart and put it back together the pieces may not be in the same order.


Is there a better way to approach this? Or, is there a reasonable way to simplify the result?



Answer



Let's start with a function cost that will tell us the total cost for each category/date pair.


cost[{cat_, date_}] := 100.

For illustration purposes, this function has been defined trivially -- the real cost function will have to look up its values somewhere.



Using Query


Let's define total to be a map from a category/date pair to the total count for that pair:


total = x1 // Query[GroupBy[{#cat, #date}&], Total, "count"]

(* <| {a,{2013,1}} -> 60
, {a,{2014,1}} -> 90
, {a,{2015,1}} -> 13
, {b,{2013,1}} -> 135
, {b,{2014,1}} -> 70
|> *)


Now we can generate adjusted source data so that each entry has a new cost property containing the prorated cost of that entry:


x1 // Query[All, <| #, "cost" -> cost[{#cat, #date}] #count / total[{#cat, #date}] |>&]

result screenshot


Without Using Query


The previous section uses Query, but it does not depend upon any particularly advanced functionality. We can perform the same operations without using Query at all:


total = GroupBy[x1, {#cat, #date}& -> (#count&), Total];

<| #, "cost" -> cost[{#cat, #date}] #count / total[{#cat, #date}] |>& /@ x1

(* same result as before *)




Sidebar: How does the Query version work?


Let's look at how Query is used above. First:


x1 // Query[GroupBy[{#cat, #date}&], Total, "count"]

Queries specify an operator to apply to each element of each level of an expression. The first operator is applied to level zero (the whole expression), the next to level one, etc. Each level is acted upon twice: once while descending into the expression and a second time when ascending back out to top-level. It is beyond the scope of this response to describe these phases in detail. However, here is the sequence of events for this particular query:




  1. Level 0, descending The GroupBy operator performs the necessary grouping and splits the original data into groups keyed by the "cat" and "date" keys. GroupBy is documented as being a special operator in that it inserts a new level into the data that did not exist before. While we started with a list of associations on level 0, after the grouping each level 1 element is a list of associations for a particular group.

  2. Level 1, descending The Total operator is an ascending operator, so in a descending context it is treated as All, i.e. Identity. This means that each group's list of associations will be left untouched as we descend.

  3. Level 2, descending The "count" operator will extract the count property from each of the associations. Thus, each association will be transformed into a count. This is the last operator, so now we switch to the ascending phase.

  4. Level 2, ascending The "count" operator is a descending operator, so it is treated as All in the ascending phase. Each count undergoes no further transformation.

  5. Level 1, ascending The Total operator now gets its chance to act since it is an ascending operator. It is presented with a list of the counts generated by step 4, returning a single number representing the sum of those counts.

  6. Level 0, ascending The GroupBy operator is a descending operator so it is treated as All in the ascending phase. This means that association containing counts keyed by groups will be returned unchanged. This association is the final result of the first query.


The second query is essentially of the form:


Query[All, f]


Short-cutting the description a bit, this applies the All operator to level 0 and the f operator to level 1. Since All essentially means Identity, the query is equivalent to saying Map[f]. In fact, we can see this by applying Normal to the query:


Query[All, f] // Normal
(* Map[f] *)

This Normal trick works for any query.


Tools described in Visualizing Type System Operations can be helpful in developing an intuition for dataset and query operations.


Comments

Popular posts from this blog

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]

plotting - Mathematica: 3D plot based on combined 2D graphs

I have several sigmoidal fits to 3 different datasets, with mean fit predictions plus the 95% confidence limits (not symmetrical around the mean) and the actual data. I would now like to show these different 2D plots projected in 3D as in but then using proper perspective. In the link here they give some solutions to combine the plots using isometric perspective, but I would like to use proper 3 point perspective. Any thoughts? Also any way to show the mean points per time point for each series plus or minus the standard error on the mean would be cool too, either using points+vertical bars, or using spheres plus tubes. Below are some test data and the fit function I am using. Note that I am working on a logit(proportion) scale and that the final vertical scale is Log10(percentage). (* some test data *) data = Table[Null, {i, 4}]; data[[1]] = {{1, -5.8}, {2, -5.4}, {3, -0.8}, {4, -0.2}, {5, 4.6}, {1, -6.4}, {2, -5.6}, {3, -0.7}, {4, 0.04}, {5, 1.0}, {1, -6.8}, {2, -4.7}, {3, -1....

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...