Find the Maximum of a Multi-variable Taylor Series

ecastro · Aug 11, 2019

Firstly, the matrix notation of the series is,
\begin{align*}
f\left(x, y, z\right) &= f\left(a, b, c\right) + \left(\mathbf{x} - \mathbf{a}\right)^T \frac{\partial f\left(a, b, c\right)}{\partial \mathbf{x}} + \frac{1}{2}\left(\mathbf{x} - \mathbf{a}\right)^T \frac{\partial^2 f\left(a, b, c\right)}{\partial \mathbf{x}^2} \left(\mathbf{x} - \mathbf{a}\right),
\end{align*}
where ##\mathbf{x} = \left[x, y, z\right]^T## and ##\mathbf{a} = \left[a, b, c\right]^T##. The first derivative of the series is set to zero to find the maximum or minimum point, then how to apply the first derivative? Is the first derivative matrix to be applied is ##\frac{\partial}{\partial \mathbf{x}} = \left[\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right]^T##?

Thank you in advance.

tnich · Aug 11, 2019

ecastro said:

Is the first derivative matrix to be applied is ##\frac{\partial}{\partial \mathbf{x}} = \left[\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right]^T##?

Yes, you seem to have the right concept here. You need to look at the gradient
$$\nabla f(\mathbf x)=\frac{\partial f}{\partial \mathbf{x}} = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right]$$
and consider how your Taylor series will behave for ##\mathbf x## in the neighborhood of ##\mathbf a## when ##\nabla f(\mathbf a) = \mathbf 0##. Hints: What conditions on ##\nabla^2 f(\mathbf a)## would have to hold for ##f(\mathbf a)## to be a maximum or a minimum? Are you looking for a global extremum or a local one?

ecastro · Aug 12, 2019

Thank you for the confirmation. I arrived at this solution, assuming that ##\left(a, b, c\right) = \left(0, 0, 0\right)## (for short-writing),

\begin{align*}
0 &= \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} f\left(0, 0, 0\right) + \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial f\left(0, 0, 0\right)}{\partial x} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial y} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial z} \end{bmatrix} + \frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}\end{bmatrix}\begin{bmatrix}x \\ y \\ z \end{bmatrix}
\end{align*}

So, the first term is zero because ##f\left(0, 0, 0\right)## is constant. As for the second term, the right-most matrix is constant because the first derivative of the function was evaluated at ##\left(0, 0, 0\right)## (correct me if I am wrong), so it reduces to:
\begin{align*}
\begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial f\left(0, 0, 0\right)}{\partial x} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial y} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial z} \end{bmatrix} &= \begin{bmatrix} \frac{\partial f\left(0, 0, 0\right)}{\partial x} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial y} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial z}\end{bmatrix}
\end{align*}

As for the final term,
\begin{align*}
\frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}\end{bmatrix}\begin{bmatrix}x \\ y \\ z \end{bmatrix} \\
= \frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}z\end{bmatrix} \\
= \frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2}x^2 + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}xy + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}xz + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}yx + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2}y^2 + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}yz + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}zx + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}zy + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}z^2\end{bmatrix} \\
= \frac{1}{2} \begin{bmatrix} 2x \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2} + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}z + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}x + 2y\frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2} + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}z + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}y + 2z\frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}
\end{bmatrix}
\end{align*}

If the partial differentiations are commutative, I should have 2 as a common factor in the matrix, which cancels out with the ##\frac{1}{2}##, and I can factor out the ##x##, ##y##, and ##z##. The final matrix notation should be,
\begin{align*}
\mathbf{0} &= 0 + \frac{\partial f\left(0, 0, 0\right)}{\partial \mathbf{x}} + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial \mathbf{x}^2} \mathbf{x} \\
\mathbf{x} &= -\frac{\partial^2 f\left(0, 0, 0\right)}{\partial \mathbf{x}^2}^{-1} \frac{\partial f\left(0, 0, 0\right)}{\partial \mathbf{x}}
\end{align*}

And I just need to input this into the original equation to find the maximum (or minimum) point, right?

tnich said:

Hints: What conditions on ∇2f(a)∇2f(a)\nabla^2 f(\mathbf a) would have to hold for f(a)f(a)f(\mathbf a) to be a maximum or a minimum? Are you looking for a global extremum or a local one?

As for the second derivative, I think it should be positive if it's minimum and otherwise if it's maximum. I am interested in the local one, how much will it change if I am somehow interested in the global extremum?

tnich · Aug 12, 2019

First, make sure you understand what a solution looks like. ##f(\mathbf x)## is a generic function, and you may not be able get an exact numerical solution for a maximum or minimum. You can figure out what conditions a maximum or minimum would satisfy, though. You have already guessed that ##\nabla f(\mathbf a)=\mathbf 0## at a local maximum or minimum). Start with

$$f(\mathbf x) = f(\mathbf a) + (\mathbf x - \mathbf a)^T \nabla f(\mathbf a) + \frac 1 2 (\mathbf x - \mathbf a)^T \nabla^2 f(\mathbf a) (\mathbf x - \mathbf a) + \dots$$

Now suppose you have a value of ##\mathbf a## for which ##\nabla f(\mathbf a)=\mathbf 0## and substitute ##\mathbf 0## for ##\nabla f(\mathbf a)## in the Taylor series expansion. Notice that you have an infinite number of terms in the Taylor expansion, so you can't just look at the first three and ignore the rest. You can, however, make ##\mathbf x## as close to ##\mathbf a## as you want. This tells you what happens to ##f(\mathbf x)## in a small neighborhood of ##\mathbf a## and makes the rest of the terms insignificantly small. Now you can consider what the matrix of second partials ##\nabla^2 f(\mathbf a)## needs to look like to make ##f(\mathbf a)## a maximum or minimum point in that neighborhood.

Knowing what a solution looks like, you can start to design numerical methods for approximating maxima or minima. You have come close to deriving a gradient descent method in your final solution, but here are some additional things you should consider.

First, assuming that ##\mathbf a=\mathbf 0## pretty much guarantees that you will get a local min or max near ##\mathbf x=\mathbf 0##, which is fine if that is the one you want, but the function may have other optima you are interested in.

Second, except in special cases, a gradient descent will not get you the exact solution, though you can do successive iterations and get closer to it.

Finally, while a gradient descent method sends you in the right direction toward an extremum, you may overshoot or undershoot it, so you may have to choose how far to go in that direction.

ecastro · Aug 21, 2019

Thank you for your reply!

Actually, I am trying to understand the SIFT algorithm and trying to write it in code; it interpolates the neighbourhood of a pixel using the first three terms of the Taylor series. I am pretty much interested in the local extremum point, but I was curious for the absolute extremum. Sorry about that. :P

Find the Maximum of a Multi-variable Taylor Series

Related to Find the Maximum of a Multi-variable Taylor Series

1. What is a Taylor series?

2. How is a multi-variable Taylor series different from a single-variable Taylor series?

3. Why is it important to find the maximum of a multi-variable Taylor series?

4. What is the process for finding the maximum of a multi-variable Taylor series?

5. How can the maximum of a multi-variable Taylor series be used in real-world applications?

Similar threads

Hot Threads

Recent Insights