I'm dealing with derivatives of scalar functions of matrices and wondering if Mathematica can help me here.
The standard approach of expanding everything in terms of components is cumbersome. As a motivating example, I want to minimize the following function, where $X$ is a matrix:
$$f(X) = \text{tr}(X'X)$$
I can use matrix differential calculus to derive that one step of gradient descent to minimize this function is:
$$X^* = X - 2 X$$
On the other hand, suppose $X$ is square and my function is $$g(X)=\text{tr}(X^2)$$
Now a single step of gradient descent looks as follows: $$X^* = X - 2 X'$$
This can get complicated to do by hand. As an example, Exercise 1 of Magnus 9.10 asks to show that the gradient descent step for the following function
$$h(X) = \det(AXB)$$
is the following
$$X^* = X-\det(AXB)\,A'(B'X'A')^{-1}B'$$
Now take $$h^*(X) = \text{tr}(AX'BXC)$$ The formula for a single step of gradient descent is $$ \text{probably something simple} $$
Is there a way to get help deriving/checking expressions above in Mathematica?
(Note: I'm using "gradient descent step" instead of "derivative" because there are multiple notations for the derivative which differ in shape; reformulating it as a gradient descent step removes the ambiguity.)
Answer
It is certainly fairly easy to check these relations for specific sizes of array:
X = Array[x, {7, 11}];
Map[D[Tr[Transpose[X].X], #] &, X, {2}] == 2 X
(* True *)
Y = Array[y, {17, 17}];
Map[D[Tr[Y.Y], #] &, Y, {2}] == 2 Transpose[Y]
(* True *)
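As a sketch of why these two checks come out the way they do, one can use $d\,\text{tr}(M) = \text{tr}(dM)$ together with $\text{tr}(M) = \text{tr}(M')$ and the cyclic property of the trace. The symbol $G$ below is my own notation for the gradient, read off from $df = \text{tr}(G'\,dX)$:

$$d\,\text{tr}(X'X) = \text{tr}(dX'\,X) + \text{tr}(X'\,dX) = 2\,\text{tr}(X'\,dX) \quad\Rightarrow\quad G = 2X$$

$$d\,\text{tr}(X^2) = \text{tr}(dX\,X) + \text{tr}(X\,dX) = 2\,\text{tr}(X\,dX) \quad\Rightarrow\quad G = 2X'$$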
Generalisation to the determinant expression should be relatively simple...
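For instance, a minimal sketch of the same kind of check for the determinant might look as follows. It compares the component-wise derivative of Det[A.X.B] with the candidate gradient det(AXB) A'(B'X'A')^{-1}B', which has the same shape as X. The dimensions are arbitrary small values chosen by me, and the variables are localised with Module so they do not clash with other definitions; every entry of the difference should simplify to zero if the formula holds.

```mathematica
Module[{A, X, B, M, grad, candidate},
 A = Array[a, {2, 3}];
 X = Array[x, {3, 4}];
 B = Array[b, {4, 2}];
 M = A.X.B;  (* 2 x 2, so Det and Inverse are defined *)
 (* derivative of the scalar Det[M] with respect to each entry of X *)
 grad = Map[D[Det[M], #] &, X, {2}];
 (* candidate closed form, same shape as X *)
 candidate = Det[M] (Transpose[A].Inverse[Transpose[M]].Transpose[B]);
 (* collect the distinct entries of the simplified difference *)
 Union[Flatten[Simplify[grad - candidate]]]
]
```

Keeping A.X.B at 2 x 2 keeps the symbolic inverse small; larger sizes work in principle but the simplification gets slow quickly.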
EDIT
Without being too rigorous about it, we can use the following relations to find some matrix derivatives (I assume in the following that all arguments to Tr are square).
With[{m = 4, n = 2, p = 3},
A = Array[a, {m, n}];
X = Array[x, {p, n}];
B = Array[b, {p, p}];
F = Array[c, {n, m}];
Y = Array[y, {n, p}];]
Tr[B] == Tr[Transpose[B]]
(* True *)
Tr[X.Y] == Tr[Y.X]
(* True *)
This allows us to derive (for example)
Tr[A.Transpose[X].B.X.F] == Tr[B.X.F.A.Transpose[X]] == Tr[Transpose[B].X.Transpose[A].Transpose[F].Transpose[X]] // Simplify
(* True *)
and guess the form of the derivative
Map[D[Tr[A.Transpose[X].B.X.F], #] &, X, {2}] ==
B.X.F.A + Transpose[B].X.Transpose[A].Transpose[F] // Expand
(* True *)
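The guessed form can also be justified by differentiating with respect to each occurrence of $X$ in turn. This is a sketch in the same non-rigorous spirit, using only the two trace relations above, with the gradient again read off from $df = \text{tr}(G'\,dX)$ ($G$ is my notation):

$$d\,\text{tr}(AX'BXF) = \text{tr}(A\,dX'\,BXF) + \text{tr}(AX'B\,dX\,F)$$

$$\text{tr}(A\,dX'\,BXF) = \text{tr}(dX'\,BXFA) = \text{tr}\big((BXFA)'\,dX\big)$$

$$\text{tr}(AX'B\,dX\,F) = \text{tr}(FAX'B\,dX) = \text{tr}\big((B'XA'F')'\,dX\big)$$

$$G = BXFA + B'XA'F'$$

Writing $C$ for $F$, as in the question, a unit step of gradient descent for $h^*(X) = \text{tr}(AX'BXC)$ would then be

$$X^* = X - (BXCA + B'XA'C')$$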