I'm having trouble getting customized derivatives (defined using upvalues) to behave sanely. By this I mean that I'd like to define a derivative using an upvalue and then have Mathematica apply all the derivative rules it normally would, delegating to the upvalues when appropriate. For example:
X /: D[X[x_], y_] := A KroneckerDelta[x,y]
Y /: D[Y[x_], y_] := B KroneckerDelta[x,y]
D[X[x], x] (* Gives the Expected Answer: A *)
D[X[x] + Y[x], x] (* Desired Output: A + B *)
D[X[x] Y[x], x] (* Desired Output: A Y[x] + X[x] B *)
For the latter two, I get X'[x] + Y'[x] and Y[x] X'[x] + X[x] Y'[x], respectively. I don't understand why the upvalues aren't kicking in. A similar question was asked here, but I couldn't adapt it to my problem.
Update:
It looks like I didn't specify my problem in enough detail. To give a better sense of the direction I'm heading, see this question. My (present) understanding of Derivative suggests it cannot achieve what I want, since the arguments to my functions are abstract indices (e.g. multiple arguments are implicit).
Consider a (discrete) joint probability $p_{XY}(i,j)$ where $i$ and $j$ are indices (with unspecified ranges) that represent the possible values. Now, consider a marginal distribution $p_X(i) = \sum_j p_{XY}(i,j)$. Each $p_X(i)$ depends on all of the $p_{XY}(j,k)$ with $j = i$. I was hoping to declare the derivatives of $p_{XY}(i,j)$ and $p_X(i)$ with respect to $p_{XY}(k,l)$ (treating each $p_{XY}(i,j)$ as an independent variable). Then, I wanted to be able to differentiate any generic expression involving those functions. Mathematically, this is the behavior:
$$ \frac{\partial p_{XY}(i,j)}{\partial p_{XY}(k,l)} = \delta_{ik} \delta_{jl} \qquad\qquad \frac{\partial p_X(i)}{\partial p_{XY}(k,l)} = \delta_{ik} $$
Here is a toy expression I'd like to differentiate:
$$\frac{\partial}{\partial p_{XY}(k,l)} \left[ p_{XY}(i,j) p_X(i)^2 \right] = \delta_{ik} \delta_{jl}\, p_X(i)^2 + 2 p_{XY}(i,j)\, p_X(i) \, \delta_{ik} $$
The above is just an application of the product rule, so it's pretty straightforward to do these calculations by hand once the derivatives of the primitives are specified. However, I don't want to re-specify all the rules of differentiation. Somehow I'd like to tell Mathematica to use its own system, while incorporating my primitives.
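For reference, the by-hand calculation can be sketched as follows. It uses only the product rule, the power rule, and the two primitives $\partial p_{XY}(i,j)/\partial p_{XY}(k,l) = \delta_{ik}\delta_{jl}$ and $\partial p_X(i)/\partial p_{XY}(k,l) = \delta_{ik}$:

$$
\begin{aligned}
\frac{\partial}{\partial p_{XY}(k,l)} \left[ p_{XY}(i,j)\, p_X(i)^2 \right]
&= \frac{\partial p_{XY}(i,j)}{\partial p_{XY}(k,l)}\, p_X(i)^2 + p_{XY}(i,j)\, \frac{\partial\, p_X(i)^2}{\partial p_{XY}(k,l)} \\
&= \delta_{ik}\delta_{jl}\, p_X(i)^2 + p_{XY}(i,j) \cdot 2\, p_X(i)\, \frac{\partial p_X(i)}{\partial p_{XY}(k,l)} \\
&= \delta_{ik}\delta_{jl}\, p_X(i)^2 + 2\, p_{XY}(i,j)\, p_X(i)\, \delta_{ik}
\end{aligned}
$$

The second line is exactly the step I'd like Mathematica to perform on its own, leaving only the two primitive derivatives for me to supply.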
Things I've tried...
pXY /: D[pXY[i_, j_], pXY[k_, l_]] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX /: D[pX[i_], pXY[j_, k_]] := KroneckerDelta[i, j]
D[pXY[i, j] pX[i]^2, pXY[k, l]] (* Undesirably returns 0 *)
I think I understand why (via pattern matching) this gives 0. Another attempt:
pXY /: D[pXY[i_, j_], pXY[k_, l_]] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX[i_] := MySumX[pXY[i, j], {i, j}]
MySum /: D[MySum[s_, {i_, j_}], pXY[k_, l_]] := D[s /. {i -> k, j -> l}, pXY[k, l]]
MySumX /: D[MySumX[s_, {i_, j_}], pXY[k_, l_]] := D[s /. {i -> k}, pXY[k, l]]
D[pX[i], pXY[i, k]] (* Result is correct *)
D[pXY[i, j] + pX[i], pXY[k, l]] (* Result is not correct *)
Hopefully it's clearer now how the original question relates to what I'm truly after. Just to keep it fun, here is another expression I'd like to differentiate:
$$ \frac{\partial}{\partial p_{XY}(k,l)} \sum_{i,j} p_{XY}(i,j)\log \frac{ p_{XY}(i,j)}{p_X(i) p_Y(j)}$$
Answer
Update
In my original answer (which I have left below for entertainment value) I asserted that it would be difficult to make it work with upvalues using D, but Tom has reminded me of the NonConstants option. When the symbols listed as NonConstants are differentiated they remain as expressions with head D:
D[a x, x, NonConstants -> {a}]
(* a + x D[a, x, NonConstants -> {a}] *)
This means that it is perfectly possible to use the upvalues approach suggested in the question; all that is required is to extend the pattern with a BlankNullSequence (___) to allow for the presence of the option.
pXY /: D[pXY[i_, j_], pXY[k_, l_], ___] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX /: D[pX[i_], pXY[j_, k_], ___] := KroneckerDelta[i, j]
SetOptions[D, NonConstants -> {pXY, pX}];
D[pXY[i, j] pX[i]^2, pXY[k, l]]
(* KroneckerDelta[i, k] KroneckerDelta[j, l] pX[i]^2 +
2 KroneckerDelta[i, k] pX[i] pXY[i, j] *)
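Since SetOptions changes the default for every later call to D in the session, a more local variant may be preferable. The sketch below assumes the two upvalue definitions with the trailing ___ have already been entered; ncD is a hypothetical wrapper name chosen for illustration, not a built-in.

```mathematica
(* Undo the global setting; {} is the default value of NonConstants. *)
SetOptions[D, NonConstants -> {}];

(* ncD is a hypothetical helper: it passes NonConstants for this call only. *)
ncD[expr_, var_] := D[expr, var, NonConstants -> {pXY, pX}]

(* Expected to reproduce the result above, since the unevaluated inner
   D[..., NonConstants -> {pXY, pX}] expressions still match the upvalues. *)
ncD[pXY[i, j] pX[i]^2, pXY[k, l]]
```

Keeping the option inside a wrapper means ordinary calls to D elsewhere in the session are unaffected.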
Original answer
In response to the updated question:
I think it will be very difficult to make it work using rules based on D. This simple example shows the problem:
a /: D[a[x], x] := 1
D[a[x], x]
(* 1 *)
D[2 a[x], x]
(* 2 a'[x] *)
Because there is no special rule defined for D[2 a[x], x], it is evaluated as normal, giving 2 Derivative[1][a][x]. You could of course add lots more upvalues to deal with different expressions, but you would end up effectively rewriting the rules of differentiation.
A better solution is to create your special rules based on Derivative, like this:
pXY[i_, j_]'[pXY[k_, l_]] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX[i_]'[pXY[j_, k_]] := KroneckerDelta[i, j]
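The prime in these definitions is shorthand for Derivative, which is also the form D returns for a function it knows nothing about (as in the 2 a'[x] result above). A quick way to see the underlying expression is to hold it and ask for its FullForm:

```mathematica
(* f'[x] is parsed as Derivative[1][f][x]; here f is pX[i] and x is pXY[j, k]. *)
(* Hold prevents the definition above from firing, so only the structure is shown. *)
FullForm[Hold[pX[i]'[pXY[j, k]]]]
(* Hold[Derivative[1][pX[i]][pXY[j, k]]] *)
```

So the rules are attached to terms of the form Derivative[1][pX[i]][...], which only arise when pX[i] is applied to an argument.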
You will note that your example does not appear to work, however:
D[pXY[i, j] pX[i]^2, pXY[k, l]]
(* 0 *)
What's going on here is that D sees that the function does not appear to depend on the differentiation variable, so it returns 0 immediately. It's the same as D[Sin, x] returning 0 while D[Sin[x], x] returns Cos[x]. The solution is to make the expression explicitly a function of the differentiation variable:
D[pXY[i, j][pXY[k, l]] pX[i][pXY[k, l]]^2, pXY[k, l]]
(* KroneckerDelta[i, k] KroneckerDelta[j, l] pX[i][pXY[k, l]]^2 +
2 KroneckerDelta[i, k] pX[i][pXY[k, l]] pXY[i, j][pXY[k, l]] *)
This works, but has made the notation messy. A possible solution is to define your own version of D which automatically converts func to func[var] before differentiating, and then converts func[var] back to func afterwards.
myD[expr_, var_] := D[expr /. (p_pX | p_pXY) :> p[var], var] /. (p_pX | p_pXY)[var] :> p
You now have:
myD[pXY[i, j] pX[i]^2, pXY[k, l]]
(* KroneckerDelta[i, k] KroneckerDelta[j, l] pX[i]^2 +
2 KroneckerDelta[i, k] pX[i] pXY[i, j] *)
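The same wrap, differentiate, unwrap idea should extend to the summand of the second expression in the question. The sketch below is an assumption-laden illustration rather than a tested solution: it assumes the Derivative-based definitions for pXY and pX above are in place, takes pY to be the other marginal with derivative KroneckerDelta[j, l] (the question uses p_Y but never states this rule), and uses the hypothetical name myD2. It handles only the summand; the sum over i and j is not attempted.

```mathematica
(* Assumed primitive for the second marginal, pY[j] = sum over i of pXY[i, j]. *)
pY[j_]'[pXY[k_, l_]] := KroneckerDelta[j, l]

(* myD2 is a hypothetical variant of myD with pY added to the wrapped heads:
   wrap each primitive as p[var], differentiate, then strip the [var] again. *)
myD2[expr_, var_] :=
  D[expr /. (p_pX | p_pY | p_pXY) :> p[var], var] /. (p_pX | p_pY | p_pXY)[var] :> p

(* Summand of the expression from the question. *)
myD2[pXY[i, j] Log[pXY[i, j]/(pX[i] pY[j])], pXY[k, l]]
```

Each new primitive needs both a Derivative rule and an entry in the pattern inside myD2; a primitive missing from the pattern is treated as constant and contributes 0.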