I have a cost that varies by category and date. I also have the counts of each item type within each category and date. I want to distribute the cost to each of the item types. I'm trying to use GroupBy and Normalize to get the percentage distribution by item type. My plan is to then multiply these percentages by the cost for each category and date.
x1 = {
<|"cat" -> "a", "type" -> "abc1", "date" -> {2013, 1}, "count" -> 10|>,
<|"cat" -> "a", "type" -> "abc2", "date" -> {2013, 1}, "count" -> 20|>,
<|"cat" -> "a", "type" -> "abc3", "date" -> {2013, 1}, "count" -> 30|>,
<|"cat" -> "a", "type" -> "abc1", "date" -> {2014, 1}, "count" -> 40|>,
<|"cat" -> "a", "type" -> "abc2", "date" -> {2014, 1}, "count" -> 50|>,
<|"cat" -> "a", "type" -> "abc1", "date" -> {2015, 1}, "count" -> 13|>,
<|"cat" -> "b", "type" -> "abc1", "date" -> {2013, 1}, "count" -> 60|>,
<|"cat" -> "b", "type" -> "abc1", "date" -> {2014, 1}, "count" -> 70|>,
<|"cat" -> "b", "type" -> "abc2", "date" -> {2013, 1}, "count" -> 75|>
};
x2 = GroupBy[x1, {#cat, #date} &, Normalize[#, Total] &]
This results in the following:
<|{"a", {2013, 1}} ->
{<|"cat" -> 1/3, "type" -> ("abc1")/("abc1" + "abc2" + "abc3"), "date" -> {1/3, 1/3}, "count" -> 1/6|>,
<|"cat" -> 1/3, "type" -> ("abc2")/("abc1" + "abc2" + "abc3"), "date" -> {1/3, 1/3}, "count" -> 1/3|>,
<|"cat" -> 1/3, "type" -> ("abc3")/("abc1" + "abc2" + "abc3"), "date" -> {1/3, 1/3}, "count" -> 1/2|>},
{"a", {2014, 1}} ->
{<|"cat" -> 1/2, "type" -> ("abc1")/("abc1" + "abc2"), "date" -> {1/2, 1/2}, "count" -> 4/9|>,
<|"cat" -> 1/2, "type" -> ("abc2")/("abc1" + "abc2"), "date" -> {1/2, 1/2}, "count" -> 5/9|>},
{"a", {2015, 1}} ->
{<|"cat" -> 1, "type" -> 1, "date" -> {1, 1}, "count" -> 1|>},
{"b", {2013, 1}} ->
{<|"cat" -> 1/2, "type" -> ("abc1")/("abc1" + "abc2"), "date" -> {1/2,1/2}, "count" -> 4/9|>,
<|"cat" -> 1/2, "type" -> ("abc2")/("abc1" + "abc2"), "date" -> {1/2,1/2}, "count" -> 5/9|>},
{"b", {2014, 1}} -> {<|"cat" -> 1, "type" -> 1,"date" -> {1, 1}, "count" -> 1|>}|>
This is a mess. It does calculate the correct share for each type (in the "count" key), but now I'd have to somehow pull this apart and put it back together properly.
Originally I didn't use associations, and did a lot of joining. Using associations seems like a better idea, but this level of complication is daunting. I'm afraid that when I pull the result apart and reassemble it, the pieces may not end up in the same order.
Is there a better way to approach this? Or, is there a reasonable way to simplify the result?
Answer
Let's start with a function cost that will tell us the total cost for each category/date pair.
cost[{cat_, date_}] := 100.
For illustration purposes, this function has been defined trivially -- the real cost function will have to look up its values somewhere.
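For example (purely as an illustration; costData and its values below are made-up placeholders), the real function might look its values up in an association keyed by category/date pairs:

costData = <|
    {"a", {2013, 1}} -> 120., {"a", {2014, 1}} -> 150., {"a", {2015, 1}} -> 80.,
    {"b", {2013, 1}} -> 200., {"b", {2014, 1}} -> 90.
  |>;  (* hypothetical figures standing in for the real cost data *)

cost[{cat_, date_}] := Lookup[costData, Key[{cat, date}], 0.]

The Key wrapper is needed because the keys are themselves lists, which Lookup would otherwise treat as lists of separate keys. The rest of this answer keeps the trivial definition above.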
Using Query
Let's define total to be a map from a category/date pair to the total count for that pair:
total = x1 // Query[GroupBy[{#cat, #date}&], Total, "count"]
(* <| {a,{2013,1}} -> 60
, {a,{2014,1}} -> 90
, {a,{2015,1}} -> 13
, {b,{2013,1}} -> 135
, {b,{2014,1}} -> 70
|> *)
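For instance, looking up the total for category "a" in {2013, 1}:

total[{"a", {2013, 1}}]
(* 60 *)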
Now we can generate adjusted source data so that each entry has a new cost property containing the prorated cost of that entry:
x1 // Query[All, <| #, "cost" -> cost[{#cat, #date}] #count / total[{#cat, #date}] |>&]
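As a quick sanity check (just a sketch reusing the query above), we can regroup the adjusted entries and confirm that the prorated costs in each group add back up to that group's total cost, which is 100. for every group under the trivial cost function:

adjusted = x1 // Query[All, <| #, "cost" -> cost[{#cat, #date}] #count / total[{#cat, #date}] |>&];
adjusted // Query[GroupBy[{#cat, #date}&], Total, "cost"]
(* every value should come back as (numerically) 100. *)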
Without Using Query
The previous section uses Query, but it does not depend upon any particularly advanced functionality. We can perform the same operations without using Query at all:
total = GroupBy[x1, {#cat, #date}& -> (#count&), Total];
<| #, "cost" -> cost[{#cat, #date}] #count / total[{#cat, #date}] |>& /@ x1
(* same result as before *)
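If you want to convince yourself the two approaches agree, here is a quick comparison (a sketch; the names viaQuery and viaMap are just illustrative, and either definition of total can be used since both produce the same association):

viaQuery = x1 // Query[All, <| #, "cost" -> cost[{#cat, #date}] #count / total[{#cat, #date}] |>&];
viaMap = <| #, "cost" -> cost[{#cat, #date}] #count / total[{#cat, #date}] |>& /@ x1;
viaQuery === viaMap
(* True *)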
Sidebar: How does the Query version work?
Let's look at how Query is used above. First:
x1 // Query[GroupBy[{#cat, #date}&], Total, "count"]
Queries specify an operator to apply to each element of each level of an expression. The first operator is applied to level zero (the whole expression), the next to level one, etc. Each level is acted upon twice: once while descending into the expression and a second time when ascending back out to top-level. It is beyond the scope of this response to describe these phases in detail. However, here is the sequence of events for this particular query:
1. Level 0, descending: The GroupBy operator performs the necessary grouping and splits the original data into groups keyed by the "cat" and "date" keys. GroupBy is documented as being a special operator in that it inserts a new level into the data that did not exist before. While we started with a list of associations on level 0, after the grouping each level 1 element is a list of associations for a particular group.
2. Level 1, descending: The Total operator is an ascending operator, so in a descending context it is treated as All, i.e. Identity. This means that each group's list of associations will be left untouched as we descend.
3. Level 2, descending: The "count" operator will extract the count property from each of the associations. Thus, each association will be transformed into a count. This is the last operator, so now we switch to the ascending phase.
4. Level 2, ascending: The "count" operator is a descending operator, so it is treated as All in the ascending phase. Each count undergoes no further transformation.
5. Level 1, ascending: The Total operator now gets its chance to act since it is an ascending operator. It is presented with a list of the counts generated by step 4, returning a single number representing the sum of those counts.
6. Level 0, ascending: The GroupBy operator is a descending operator, so it is treated as All in the ascending phase. This means that the association containing counts keyed by groups is returned unchanged. This association is the final result of the first query (a manual reconstruction of these steps is sketched just after this list).
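To make the level-by-level description above concrete, here is a manual reconstruction of the first query using only GroupBy and Map (the intermediate names grouped, counts and totals are just for illustration):

grouped = GroupBy[x1, {#cat, #date}&];   (* level 0, descending: split into groups *)
counts = Map[#count&, grouped, {2}];     (* level 2, descending: extract the count from each association *)
totals = Map[Total, counts];             (* level 1, ascending: sum the counts within each group *)
totals === (x1 // Query[GroupBy[{#cat, #date}&], Total, "count"])
(* True *)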
The second query is essentially of the form:
Query[All, f]
Short-cutting the description a bit, this applies the All operator to level 0 and the f operator to level 1. Since All essentially means Identity, the query is equivalent to saying Map[f]. In fact, we can see this by applying Normal to the query:
Query[All, f] // Normal
(* Map[f] *)
This Normal trick works for any query.
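For instance, applying it to the second query from this answer should recover the plain mapping operator used in the "Without Using Query" section:

Query[All, <| #, "cost" -> cost[{#cat, #date}] #count / total[{#cat, #date}] |>&] // Normal
(* expect the equivalent Map[...] operator, matching the explicit /@ mapping above *)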
Tools described in Visualizing Type System Operations can be helpful in developing an intuition for dataset and query operations.