How to customize derivative behavior via upvalues?

I'm having trouble getting customized derivatives (defined using upvalues) to behave sanely. By this I mean that I'd like to define a derivative using an upvalue and then have Mathematica apply all the derivative rules it normally would, delegating to the upvalues when appropriate. For example:

X /: D[X[x_], y_] := A KroneckerDelta[x,y]
Y /: D[Y[x_], y_] := B KroneckerDelta[x,y]
D[X[x], x]        (* Gives the Expected Answer:  A *)
D[X[x] + Y[x], x] (* Desired Output:   A + B  *)
D[X[x] Y[x], x]   (* Desired Output:   A Y[x] + X[x] B   *)

For the latter two, I get X'[x] + Y'[x] and Y[x] X'[x] + X[x] Y'[x]. I don't understand why the upvalues aren't kicking in. A similar question was asked here, but I couldn't adapt it to my problem.

Update:

It looks like I didn't specify my problem in enough detail. To give a better sense of the direction I'm heading in, see this question. My (present) understanding of Derivative suggests that it cannot achieve what I want, since the arguments to my functions are abstract indices (e.g. multiple arguments are implicit).

Consider a (discrete) joint probability $p_{XY}(i,j)$ where $i$ and $j$ are indices (with unspecified ranges) that represent the possible values. Now, consider a marginal distribution $p_X(i) = \sum_j p_{XY}(i,j)$. Each $p_X(i)$ depends on all of the $p_{XY}(j,k)$ with $j = i$. I was hoping to declare the derivatives of $p_{XY}(i,j)$ and $p_X(i)$ with respect to $p_{XY}(k,l)$ (treating each $p_{XY}(i,j)$ as an independent variable). Then, I wanted to be able to differentiate any generic expression involving those functions. Mathematically, this is the behavior:

$$ \frac{\partial p_{XY}(i,j)}{\partial p_{XY}(k,l)} = \delta_{ik} \delta_{jl} \qquad\qquad \frac{\partial p_X(i)}{\partial p_{XY}(k,l)} = \delta_{ik} $$

Here is a toy expression I'd like to differentiate:

$$\frac{\partial}{\partial p_{XY}(k,l)} \left[ p_{XY}(i,j) p_X(i)^2 \right] = \delta_{ik} \delta_{jl}\, p_X(i)^2 + 2 p_{XY}(i,j)\, p_X(i) \, \delta_{ik} $$

The above is just an application of the product rule, so it's pretty straightforward to do these calculations by hand once the derivatives of the primitives are specified. However, I don't want to re-specify all the rules of differentiation. Somehow I'd like to tell Mathematica to use its own system, while incorporating my primitives.

Things I've tried...

pXY /: D[pXY[i_, j_], pXY[k_, l_]] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX /: D[pX[i_], pXY[j_, k_]] := KroneckerDelta[i, j]

D[pXY[i, j] pX[i]^2, pXY[k, l]]   (* Undesirably returns 0 *)

I think I understand why (via pattern matching) this gives 0. Another attempt:

pXY /: D[pXY[i_, j_], pXY[k_, l_]] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX[i_] := MySumX[pXY[i, j], {i, j}]
MySum /: D[MySum[s_, {i_, j_}], pXY[k_, l_]] := D[s /. {i -> k, j -> l}, pXY[k, l]]
MySumX /: D[MySumX[s_, {i_, j_}], pXY[k_, l_]] := D[s /. {i -> k}, pXY[k, l]]

D[pX[i], pXY[i, k]]  (* Result is correct *)
D[pXY[i, j] + pX[i], pXY[k, l]] (* Result is not correct *)

Hopefully it's clearer now how the original question relates to what I'm truly after. Just to keep it fun, here is another expression I'd like to differentiate:

$$ \frac{\partial}{\partial p_{XY}(k,l)} \sum_{i,j} p_{XY}(i,j)\log \frac{ p_{XY}(i,j)}{p_X(i) p_Y(j)}$$
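
Working this one out by hand gives a target to check any automated result against. The sketch below assumes $\partial p_Y(j)/\partial p_{XY}(k,l) = \delta_{jl}$ (by analogy with $p_X$, taking $p_Y(j) = \sum_i p_{XY}(i,j)$, which is not defined above), uses the natural logarithm, and treats every $p_{XY}(i,j)$ as independent with no normalization constraint:

$$ \frac{\partial}{\partial p_{XY}(k,l)} \sum_{i,j} p_{XY}(i,j)\log \frac{p_{XY}(i,j)}{p_X(i)\, p_Y(j)} = \log \frac{p_{XY}(k,l)}{p_X(k)\, p_Y(l)} + \sum_{i,j} p_{XY}(i,j) \left[ \frac{\delta_{ik}\delta_{jl}}{p_{XY}(i,j)} - \frac{\delta_{ik}}{p_X(i)} - \frac{\delta_{jl}}{p_Y(j)} \right] = \log \frac{p_{XY}(k,l)}{p_X(k)\, p_Y(l)} - 1 $$

The bracketed sum contributes $1 - 1 - 1$ because $\sum_j p_{XY}(k,j) = p_X(k)$ and $\sum_i p_{XY}(i,l) = p_Y(l)$.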

Answer

Update

In my original answer (which I have left below for entertainment value) I asserted that it would be difficult to make it work with upvalues using D, but Tom has reminded me of the NonConstants option. When the symbols listed as NonConstants are differentiated they remain as expressions with head D:

D[a x, x, NonConstants -> {a}]
(*  a + x D[a, x, NonConstants -> {a}]  *)

This means that it is perfectly possible to use the upvalues approach suggested in the question. All that is required is to extend the pattern with a BlankNullSequence (written ___) to allow for the presence of the option.

pXY /: D[pXY[i_, j_], pXY[k_, l_], ___] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX /: D[pX[i_], pXY[j_, k_], ___] := KroneckerDelta[i, j]
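
A quick way to see why the trailing ___ matters is to test the patterns directly. This is only an illustration: the head f and the option name opt are hypothetical placeholders, not anything from the question.

```mathematica
(* f, a, b and opt are placeholders used purely for illustration *)
MatchQ[f[a, b, opt -> 1], f[_, _]]         (* False: a two-argument pattern rejects the third argument *)
MatchQ[f[a, b, opt -> 1], f[_, _, ___]]    (* True: ___ absorbs the trailing option *)
MatchQ[f[a, b], f[_, _, ___]]              (* True: ___ also matches zero extra arguments *)
```

The same reasoning applies to the leftover D[..., NonConstants -> {...}] expressions, which carry the option as a third argument.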

SetOptions[D, NonConstants -> {pXY, pX}];

D[pXY[i, j] pX[i]^2, pXY[k, l]]
(*  KroneckerDelta[i, k] KroneckerDelta[j, l] pX[i]^2 + 
  2 KroneckerDelta[i, k] pX[i] pXY[i, j]  *)
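
Since SetOptions changes how D behaves for the rest of the session, one alternative is to pass the option on each call instead. The following is a minimal sketch, assuming a hypothetical wrapper named pD (not part of the original answer) and the same upvalues as above:

```mathematica
(* Same upvalues as above; the ___ lets them match when the option is present *)
pXY /: D[pXY[i_, j_], pXY[k_, l_], ___] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX /: D[pX[i_], pXY[j_, k_], ___] := KroneckerDelta[i, j]

(* pD is a hypothetical helper: it supplies NonConstants per call, leaving D's global options alone *)
pD[expr_, var_] := D[expr, var, NonConstants -> {pXY, pX}]

(* Same product as in the example above *)
pD[pXY[i, j] pX[i]^2, pXY[k, l]]
```

This should behave like the SetOptions version for expressions built from pXY and pX, while ordinary calls to D elsewhere are unaffected.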

Original answer

In response to the updated question:

I think it will be very difficult to make it work using rules based on D. This simple example shows the problem:

a /: D[a[x], x] := 1

D[a[x], x]
(* 1 *)

D[2 a[x], x]
(* 2 a'[x] *)

Because there is no special rule defined for D[2 a[x], x] it is evaluated as normal, giving 2 Derivative[1][a][x]. You could of course add lots more upvalues to deal with different expressions, but you will end up effectively rewriting the rules of differentiation.

A better solution is to create your special rules based on Derivative, like this:

pXY[i_, j_]'[pXY[k_, l_]] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX[i_]'[pXY[j_, k_]] := KroneckerDelta[i, j]

You will note that your example does not appear to work, however:

D[pXY[i, j] pX[i]^2, pXY[k, l]]
(* 0 *)

What's going on here is that D sees that the function does not appear to depend on the differentiation variable, so it returns 0 immediately. It's the same as D[Sin, x] returning 0 while D[Sin[x], x] returns Cos[x]. The solution is to make the expression explicitly a function of the differentiation variable:

D[pXY[i, j][pXY[k, l]] pX[i][pXY[k, l]]^2, pXY[k, l]]
(* KroneckerDelta[i, k] KroneckerDelta[j, l] pX[i][pXY[k, l]]^2 + 
 2 KroneckerDelta[i, k] pX[i][pXY[k, l]] pXY[i, j][pXY[k, l]] *)
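
The zero result, and the effect of adding an explicit argument, can be reproduced with generic symbols. In this sketch g, c and x are placeholder symbols assumed to have no definitions attached:

```mathematica
(* g, c and x are placeholders with no definitions *)
D[Sin, x]       (* 0: the bare symbol Sin contains no x *)
D[Sin[x], x]    (* Cos[x] *)
D[g[c], x]      (* 0: g[c] has no explicit dependence on x *)
D[g[c][x], x]   (* Derivative[1][g[c]][x], i.e. g[c]'[x] *)
```

The last form is the kind of expression that the definitions for pXY[i_, j_]' and pX[i_]' attach a value to.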

This works, but has made the notation messy. A possible solution is to define your own version of D which automatically converts func to func[var] before differentiating, and then converts func[var] back to func afterwards.

myD[expr_, var_] := D[expr /. (p_pX | p_pXY) :> p[var], var] /. (p_pX | p_pXY)[var] :> p

You now have:

myD[pXY[i, j] pX[i]^2, pXY[k, l]]
(* KroneckerDelta[i, k] KroneckerDelta[j, l] pX[i]^2 + 
 2 KroneckerDelta[i, k] pX[i] pXY[i, j] *)
