Matrix calculus is the extension of calculus to vector and matrix settings. I used to suffer a lot with matrix calculus early in grad school, and I've seen plenty of other people suffer too. This happens mostly because standard courses in linear algebra rarely cover the topic. In truth, though, the math behind matrix calculus is basically elementary. Anyway, here are a few interesting facts that made my life much easier. I have used them to derive many expressions in my machine learning practice (e.g. computing gradients for SGD-based algorithms). Check whether they can help you too.
Fact 1: Inside a trace, a product of objects (both vectors and matrices; tensors too?) can be cycled, and the trace is invariant under transposition, i.e. tr(A) = tr(A^T). For example:
- tr(AB) = tr(BA)
- tr(ABC) = tr(BCA) = tr(CAB)
- In general, tr(ABCD…Z) = tr(ZABC…Y) = tr(YZAB…X)
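A quick numerical sanity check of the cyclic and transpose properties with NumPy (the specific shapes and seed here are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))

# Cyclic property: tr(ABC) = tr(BCA) = tr(CAB)
t1 = np.trace(A @ B @ C)
t2 = np.trace(B @ C @ A)
t3 = np.trace(C @ A @ B)
assert np.allclose(t1, t2) and np.allclose(t1, t3)

# Transpose invariance: tr(M) = tr(M^T)
M = A @ B @ C
assert np.allclose(np.trace(M), np.trace(M.T))
```

Note that only cyclic shifts are allowed, not arbitrary permutations: tr(ABC) generally differs from tr(ACB).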
Fact 2: Inner products of objects can be represented by traces:
- ⟨A, B⟩ = tr(A^T B), where A, B are matrices
- ⟨a, x⟩ = a^T x = tr(a^T x), where a, x are vectors. The second equality holds because the trace of a scalar is just the scalar. Once it is inside the trace, we can have a lot of fun with it, as the next fact shows.
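These two identities can also be verified numerically (again, shapes and seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))
a = rng.standard_normal(5)
x = rng.standard_normal(5)

# Matrix inner product: <A, B> = sum_ij A_ij * B_ij = tr(A^T B)
assert np.allclose(np.sum(A * B), np.trace(A.T @ B))

# Vector inner product: a^T x = tr(a^T x), and by cycling (Fact 1)
# this also equals tr(x a^T), the trace of the outer product.
assert np.allclose(a @ x, np.trace(np.outer(x, a)))
```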
Facts 1 and 2 lead to interesting consequences:
- … (notice how B changed its location)
- … (notice how C changed its location)
Fact 3: The derivative of the inner product of an object (vector or matrix) with a constant object is the constant object: ∂(a^T x)/∂x = a for vectors, and ∂ tr(A^T X)/∂X = A for matrices.
We really need to understand this final fact. To take the derivative of a function that contains multiple instances of the differentiating variable, use the following approach. Consider only one instance of the variable to be active at a time (i.e. hold all the other instances constant), take the derivative, and then substitute the actual variable back in place of all its instances. Repeat this for each of the other instances and add up the contributions. This explanation is still too verbose; I hope the following example will clarify it:
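As a sketch of the idea (my own example, not necessarily the one originally intended): take f(x) = x^T A x, which contains two instances of x. Holding the right instance constant gives the contribution A x; holding the left instance constant gives A^T x; their sum is the gradient (A + A^T) x, which we can check against finite differences:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

def f(v):
    # f(x) = x^T A x: the variable x appears twice
    return v @ A @ v

# Instance 1 active: d/dx of x^T c with c = A x held constant -> A x
# Instance 2 active: d/dx of c^T A x with c = x held constant  -> A^T x
# Adding the two contributions: grad f = (A + A^T) x
analytic = (A + A.T) @ x

# Check against central finite differences
eps = 1e-6
numeric = np.zeros_like(x)
for i in range(x.size):
    e = np.zeros_like(x)
    e[i] = eps
    numeric[i] = (f(x + e) - f(x - e)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-4)
```

Note that when A is symmetric this collapses to the familiar gradient 2Ax.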
Exercise: prove all the facts.