Skip to main content

How to customize derivative behavior via upvalues?


I'm having trouble getting customized derivatives (defined using upvalues) to behave in sane ways. By this I mean that I'd like to define a derivative using an upvalue and then have Mathematica apply all derivative rules it normally would, delegating to the upvalues when appropriate. For example.


X /: D[X[x_], y_] := A KroneckerDelta[x,y]
Y /: D[Y[x_], y_] := B KroneckerDelta[x,y]
D[X[x], x] (* Gives the Expected Answer: A *)
D[X[x] + Y[x], x] (* Desired Output: A + B *)
D[X[x] Y[x], x] (* Desired Output: A Y[x] + X[x] B *)

For the latter two, I get X'[x] + Y'[x] and Y[x] X'[x] + X[x] Y'[x]. I'm not understanding why the upvalues aren't kicking in. A similar question was asked here, but I couldn't adjust it for my problem.



Update:


It looks like I didn't specify my problem in enough detail. To give a better sense of the direction I'm looking, see this question. My (present) understanding of Derivative suggests it cannot achieve what I want since the arguments to my functions are abstract indices (e.g. multiple arguments are implicit).


Consider a (discrete) joint probability $p_{XY}(i,j)$ where $i$ and $j$ are indexes (with unspecified ranges) that represent the possible values. Now, consider a marginal distribution $p_X(i) = \sum_j p_{XY}(i,j)$. Each $p_X(i)$ depends on all of the $p_{XY}(j,k)$ with $j = i$. I was hoping to declare the derivatives of $p_{XY}(i,j)$ and $p_X(i)$ with respect to $p_{XY}(k,l)$ (treating each $p_{XY}(i,j)$ as independent variables). Then, I wanted to be able to differentiate any generic expression involving those functions. Mathematically, this is the behavior:


$$ \frac{\partial p_{XY}(i,j)}{\partial p_{XY}(k,l)} = \delta_{ij} \delta_{jk} \qquad\qquad \frac{\partial p_X(i)}{\partial p_{XY}(k,l)} = \delta_{ik} $$


Here is a toy expression I'd like to differentiate:


$$\frac{\partial}{\partial p_{XY}(k,l)} \left[ p_{XY}(i,j) p_X(i)^2 \right] = \delta_{ik} \delta_{jl}\, p_X(i)^2 + 2 p_{XY}(i,j)\, p_X(i) \, \delta_{ik} $$


The above is just an application of the product rule, so it's pretty straight-forward to do these calculations by hand, once the derivatives of the primitives are specified. However, I don't want to re-specify all the rules of differentiation. Somehow I'd like to tell Mathematica to use its own system, while incorporating my primitives.


Things I've tried...


pXY /: D[pXY[i_, j_], pXY[k_, l_]] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX /: D[pX[i_], pXY[j_, k_]] := KroneckerDelta[i, j]

D[pXY[i, j] pX[i]^2, pXY[k, l]] (* Undesirably returns 0 *)

I think I understand why (via pattern matching) this gives 0. Another attempt:


pXY /: D[pXY[i_, j_], pXY[k_, l_]] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX[i_] := MySumX[pXY[i, j], {i, j}]
MySum /: D[MySum[s_, {i_, j_}], pXY[k_, l_]] := D[s /. {i -> k, j -> l}, pXY[k, l]]
MySumX /: D[MySumX[s_, {i_, j_}], pXY[k_, l_]] := D[s /. {i -> k}, pXY[k, l]]

D[pX[i], pXY[i, k]] (* Result is correct *)
D[pXY[i, j] + pX[i], pXY[k, l]] (* Result is not correct *)


Hopefully its clearer now how the original question relates to what I'm truly after. Just to keep it fun, here is another expression I'd like to differentiate:


$$ \frac{\partial}{\partial p_{XY}(k,l)} \sum_{i,j} p_{XY}(i,j)\log \frac{ p_{XY}(i,j)}{p_X(i) p_Y(j)}$$



Answer



Update


In my original answer (which I have left below for entertainment value) I asserted that it would be difficult to make it work with upvalues using D, but Tom has reminded me of the NonConstants option. When the symbols listed as NonConstants are differentiated they remain as expressions with head D:


D[a x, x, NonConstants -> {a}]
(* a + x D[a, x, NonConstants -> {a}] *)

This means that it is perfectly possible to use the upvalues approach suggested in the question, all that is required is to extend the pattern with a BlankNullSequence[] to allow for the presence of the option.



pXY /: D[pXY[i_, j_], pXY[k_, l_], ___] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX /: D[pX[i_], pXY[j_, k_], ___] := KroneckerDelta[i, j]

SetOptions[D, NonConstants -> {pXY, pX}];

D[pXY[i, j] pX[i]^2, pXY[k, l]]
(* KroneckerDelta[i, k] KroneckerDelta[j, l] pX[i]^2 +
2 KroneckerDelta[i, k] pX[i] pXY[i, j] *)

Original answer



In response to the updated question:


I think it will be very difficult to make it work using rules based on D. This simple example shows the problem:


a /: D[a[x], x] := 1

D[a[x], x]
(* 1 *)

D[2 a[x], x]
(* 2 a'[x] *)


Because there is no special rule defined for D[2 a[x], x] it is evaluated as normal, giving 2 Derivative[1][a][x]. You could of course add lots more upvalues to deal with different expressions, but you will end up effectively rewriting the rules of differentiation.


A better solution is to create your special rules based on Derivative, like this:


pXY[i_, j_]'[pXY[k_, l_]] := Simplify[KroneckerDelta[i, k] KroneckerDelta[j, l]]
pX[i_]'[pXY[j_, k_]] := KroneckerDelta[i, j]

You will note that your example does not appear to work, however:


D[pXY[i, j] pX[i]^2, pXY[k, l]]
(* 0 *)

What's going on here is that D sees that the function does not appear to depend on the differentiation variable, so it returns 0 immediately. It's the same as D[Sin, x] returning 0 while D[Sin[x], x] returns Cos[x]. The solution is to make the expression explicitly a function of the differentiation variable:



D[pXY[i, j][pXY[k, l]] pX[i][pXY[k, l]]^2, pXY[k, l]]
(* KroneckerDelta[i, k] KroneckerDelta[j, l] pX[i][pXY[k, l]]^2 +
2 KroneckerDelta[i, k] pX[i][pXY[k, l]] pXY[i, j][pXY[k, l]] *)

This works, but has made the notation messy. A possible solution is to define your own version of D which automatically converts func to func[var] before differentiating, and then converts func[var] back to func afterwards.


myD[expr_, var_] := D[expr /. (p_pX | p_pXY) :> p[var], var] /. (p_pX | p_pXY)[var] :> p

You now have:


myD[pXY[i, j] pX[i]^2, pXY[k, l]] 
(* KroneckerDelta[i, k] KroneckerDelta[j, l] pX[i]^2 +

2 KroneckerDelta[i, k] pX[i] pXY[i, j] *)

Comments

Popular posts from this blog

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]

plotting - Filling between two spheres in SphericalPlot3D

Manipulate[ SphericalPlot3D[{1, 2 - n}, {θ, 0, Pi}, {ϕ, 0, 1.5 Pi}, Mesh -> None, PlotPoints -> 15, PlotRange -> {-2.2, 2.2}], {n, 0, 1}] I cant' seem to be able to make a filling between two spheres. I've already tried the obvious Filling -> {1 -> {2}} but Mathematica doesn't seem to like that option. Is there any easy way around this or ... Answer There is no built-in filling in SphericalPlot3D . One option is to use ParametricPlot3D to draw the surfaces between the two shells: Manipulate[ Show[SphericalPlot3D[{1, 2 - n}, {θ, 0, Pi}, {ϕ, 0, 1.5 Pi}, PlotPoints -> 15, PlotRange -> {-2.2, 2.2}], ParametricPlot3D[{ r {Sin[t] Cos[1.5 Pi], Sin[t] Sin[1.5 Pi], Cos[t]}, r {Sin[t] Cos[0 Pi], Sin[t] Sin[0 Pi], Cos[t]}}, {r, 1, 2 - n}, {t, 0, Pi}, PlotStyle -> Yellow, Mesh -> {2, 15}]], {n, 0, 1}]

plotting - Mathematica: 3D plot based on combined 2D graphs

I have several sigmoidal fits to 3 different datasets, with mean fit predictions plus the 95% confidence limits (not symmetrical around the mean) and the actual data. I would now like to show these different 2D plots projected in 3D as in but then using proper perspective. In the link here they give some solutions to combine the plots using isometric perspective, but I would like to use proper 3 point perspective. Any thoughts? Also any way to show the mean points per time point for each series plus or minus the standard error on the mean would be cool too, either using points+vertical bars, or using spheres plus tubes. Below are some test data and the fit function I am using. Note that I am working on a logit(proportion) scale and that the final vertical scale is Log10(percentage). (* some test data *) data = Table[Null, {i, 4}]; data[[1]] = {{1, -5.8}, {2, -5.4}, {3, -0.8}, {4, -0.2}, {5, 4.6}, {1, -6.4}, {2, -5.6}, {3, -0.7}, {4, 0.04}, {5, 1.0}, {1, -6.8}, {2, -4.7}, {3, -1....