Purpose: Algorithms such as MLE optimize unconstrained objectives, e.g. $\max_\omega \frac{1}{m}\sum_i l(\omega; x_i, j_i)$. When a problem comes with constraints, however, we need tools such as duality, LP, and QP; these tools can also be applied to the unconstrained case.

Duality Problem

Problem Type

$$\min_\omega J(\omega) \quad \text{s.t. } c_i(\omega)\leq 0 \text{ for } i\in j, \quad e_i(\omega)=0 \text{ for } i\in \epsilon \tag{0}$$
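For concreteness, here is a minimal one-variable instance of this template (my own illustration, not from the notes):

```latex
\min_{\omega}\ \omega^2
\qquad \text{s.t.}\qquad c_1(\omega) = 1-\omega \le 0
```

with no equality constraints; the solution is $\omega^*=1$ with $J^*=1$.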

Lagrangian

$$L(\omega, \alpha, \beta) = J(\omega)+\sum_{i\in j}\alpha_i c_i(\omega)+\sum_{i\in \epsilon}\beta_i e_i(\omega) \tag{1}$$

Dual feasibility means $\alpha_i\geq 0$ and $\beta_i\in \mathbb{R}$.

Theorem 1

From equation (1), it follows that

$$\max_{\alpha\geq0,\,\beta}L(\omega,\alpha,\beta)=\begin{cases} J(\omega) & \text{if } \omega \text{ is feasible}\\ \infty & \text{otherwise}\end{cases} \tag{2}$$

In particular, if $J^*$ denotes the optimal value of problem (0), i.e. $J^*=\min_\omega J(\omega)$ subject to $c_i(\omega)\leq 0$ and $e_i(\omega)=0$, then

$$J^*=\min_\omega \max_{\alpha\geq0,\,\beta} L(\omega,\alpha,\beta)$$
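To see how equation (2) behaves, consider the toy instance $J(\omega)=\omega^2$ with the single constraint $1-\omega\le 0$ (my own example, not from the notes):

```latex
\max_{\alpha\ge 0}\bigl[\omega^2 + \alpha(1-\omega)\bigr]
= \begin{cases}
\omega^2 & \text{if } 1-\omega \le 0 \text{ (the max is at } \alpha=0\text{)}\\
\infty & \text{if } 1-\omega > 0 \text{ (let } \alpha\to\infty\text{)}
\end{cases}
```

So the inner maximization recovers $J(\omega)$ exactly on the feasible set and penalizes infeasible points with $\infty$.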

Refer to the attached PDF for proof.

Theorem 2 | Weak Duality

Define the Lagrange dual function as follows

$$D(\alpha, \beta) = \min_\omega L(\omega, \alpha, \beta)$$

Then, the following holds

$$D(\alpha, \beta)\leq J(\omega)$$

for all feasible $\omega$, all $\alpha \geq 0$, and all $\beta$. In particular,

$$D^*:=\max_{\alpha\geq0,\,\beta} \min_\omega L(\omega,\alpha,\beta)\leq\min_\omega \max_{\alpha\geq0,\,\beta} L(\omega,\alpha,\beta)=J^*$$
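A quick numeric sanity check of weak duality on a toy problem (my own example, not from the notes): for $\min_\omega \omega^2$ s.t. $1-\omega\le 0$, the dual function can be computed in closed form, and every value of it lower-bounds $J^*$.

```python
import numpy as np

# Toy problem: min ω² s.t. 1 - ω ≤ 0, so J* = 1 at ω* = 1.
# Lagrangian: L(ω, α) = ω² + α(1 - ω); minimizing over ω gives ω = α/2,
# hence the dual function is D(α) = α - α²/4.
def dual(alpha):
    return alpha - alpha**2 / 4

J_star = 1.0
alphas = np.linspace(0.0, 10.0, 1001)

# Weak duality: D(α) ≤ J* for every α ≥ 0 (small tolerance for float rounding).
assert np.all(dual(alphas) <= J_star + 1e-9)

# The best lower bound D* is attained at α = 2 and equals J* = 1 here.
print(np.max(dual(alphas)))
```

Here the gap even closes ($D^* = J^*$), which previews strong duality: the problem is convex and strictly feasible.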

Note: weak duality does not require $J(\omega)$ to be convex.

Theorem 3 | Strong Duality

If $J(\omega)$, $c_i$, and $e_i$ are all convex, and the following constraint qualification (Slater's condition) holds:

There exists an $\omega$ such that $c_i(\omega)<0$ for all $i\in j$,

then strong duality holds:

$$D^*:=\max_{\alpha\geq0,\,\beta} \min_\omega L(\omega,\alpha,\beta)=\min_\omega \max_{\alpha\geq0,\,\beta} L(\omega,\alpha,\beta)=J^*$$

Here $D^* = \max_{\alpha\geq0,\,\beta} D(\alpha, \beta)$ is the optimal value of the dual problem, and $J^* = \min_{\omega}J(\omega)$ is the optimal value of the primal problem.

KKT Condition


If the primal problem is convex, then the KKT conditions are both necessary and sufficient. That is, if ω^\hat\omega and (α^,β^)(\hat\alpha, \hat\beta) satisfy the KKT conditions, then ω^\hat\omega and (α^,β^)(\hat\alpha, \hat\beta) are primal and dual optimal, i.e.,

$$\begin{aligned}
&\nabla_\omega L(\hat\omega,\hat\alpha,\hat\beta)=0 && \text{(stationarity)}\\
&c_i(\hat\omega)\leq 0,\quad e_i(\hat\omega)=0 && \text{(primal feasibility)}\\
&\hat\alpha_i\geq 0 && \text{(dual feasibility)}\\
&\hat\alpha_i\, c_i(\hat\omega)=0 && \text{(complementary slackness)}
\end{aligned}$$
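A minimal check of the four KKT conditions on a toy problem (my own example, not from the notes): for $\min_\omega \omega^2$ s.t. $c(\omega)=1-\omega\le 0$, the candidate pair $\hat\omega=1$, $\hat\alpha=2$ satisfies all four, so it is primal/dual optimal.

```python
# KKT check for the toy problem min ω² s.t. c(ω) = 1 - ω ≤ 0.
# Candidate primal/dual pair: ω̂ = 1, α̂ = 2.
omega_hat, alpha_hat = 1.0, 2.0

stationarity = 2 * omega_hat - alpha_hat               # ∇_ω L(ω, α) = 2ω - α
primal_feasibility = 1 - omega_hat                     # c(ω̂) must be ≤ 0
dual_feasibility = alpha_hat                           # α̂ must be ≥ 0
complementary_slackness = alpha_hat * (1 - omega_hat)  # α̂ · c(ω̂) must be 0

assert stationarity == 0
assert primal_feasibility <= 0
assert dual_feasibility >= 0
assert complementary_slackness == 0
print("KKT holds: candidate pair is primal and dual optimal")
```

Since the problem is convex, satisfying the KKT conditions is sufficient: no further search is needed.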

Linear Problem

Problem satisfies

$$\min_\omega\ c^T\omega \;\Rightarrow\; J(\omega)\\ \text{s.t. } A\omega=b \;\Rightarrow\; e_i(\omega)\\ \omega \geq 0 \;\Rightarrow\; c_i(\omega)$$

The dual problem $\max_{\alpha\geq0,\,\beta} D(\alpha, \beta)$ of an LP is still an LP.
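A small numeric illustration of this fact, assuming SciPy is available (the example LP is my own, not from the notes): the dual of $\min c^T\omega$ s.t. $A\omega=b$, $\omega\geq0$ is $\max b^T\beta$ s.t. $A^T\beta\leq c$, which is again an LP, and both can be handed to the same solver.

```python
import numpy as np
from scipy.optimize import linprog

# Toy standard-form LP: min cᵀω  s.t. Aω = b, ω ≥ 0.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Primal LP: optimum is ω* = (1, 0) with value 1.
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 2)

# Dual LP: max bᵀβ s.t. Aᵀβ ≤ c with β free,
# written as minimizing -bᵀβ for linprog.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)])

print(primal.fun, -dual.fun)  # both optimal values equal 1.0
```

The matching values illustrate that strong duality always holds for feasible, bounded LPs.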

Quadratic Problem

Problem satisfies

$$\min_\omega\ \omega^TG\omega+\omega^Td \quad \text{s.t. } A\omega=b,\ \omega \geq0$$

Refer to the attached PDF for specific details on how to apply duality.
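As a sketch of how the KKT conditions solve a QP directly (my own example; it assumes the bound $\omega\geq0$ is inactive at the solution, so only the equality multipliers $\beta$ appear): stationarity $2G\omega + d + A^T\beta = 0$ together with feasibility $A\omega = b$ is one linear system.

```python
import numpy as np

# Equality-constrained QP: min ωᵀGω + ωᵀd  s.t. Aω = b.
# (The ω ≥ 0 bound is assumed inactive at the solution of this instance.)
G = np.eye(2)
d = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

# KKT system: [2G  Aᵀ] [ω]   [-d]
#             [A   0 ] [β] = [ b]
m, n = A.shape
K = np.block([[2 * G, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([-d, b])
sol = np.linalg.solve(K, rhs)
omega, beta = sol[:n], sol[n:]

print(omega, beta)  # ω = [1, 1], β = [-2]; note ω ≥ 0 indeed holds
```

When inequality constraints are active, one would additionally enforce complementary slackness (e.g. via an active-set method), which is beyond this sketch.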

Summary


Example Problem

General approach:

  1. Write the Lagrangian $L$;

  2. $D=\min_\omega L$: take partial derivatives with respect to $\omega$ (sometimes there are two variables), express $\omega$ in terms of $\alpha$ and $\beta$, and substitute back to obtain $D$;

  3. $\max D$ gives $D^*,\alpha^*,\beta^*$: take derivatives of $D$ or use the KKT conditions to find $D^*, \alpha^*, \beta^*$;

  4. $J^*$, $\omega^*$: substitute $\alpha^*,\beta^*$ into the expression from step 2 to find $\omega^*$; by strong duality, $J^*=D^*$.
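A quick end-to-end illustration of these four steps (my own example, not from the notes): minimize $\omega_1^2+\omega_2^2$ subject to $\omega_1+\omega_2=1$.

```latex
\begin{aligned}
&\text{1. } L(\omega,\beta) = \omega_1^2+\omega_2^2+\beta(\omega_1+\omega_2-1).\\
&\text{2. } \partial L/\partial\omega_i = 2\omega_i+\beta = 0
  \;\Rightarrow\; \omega_i = -\beta/2,
  \qquad D(\beta) = -\tfrac{\beta^2}{2}-\beta.\\
&\text{3. } D'(\beta) = -\beta-1 = 0
  \;\Rightarrow\; \beta^* = -1,\quad D^* = \tfrac{1}{2}.\\
&\text{4. } \omega^* = \bigl(\tfrac12,\tfrac12\bigr),\qquad J^* = \tfrac12 = D^*.
\end{aligned}
```

The problem is convex with affine constraints, so strong duality applies and step 4 is justified.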


Note: The content in this blog is class notes shared for educational purposes only. Some images and content are sourced from textbooks, teacher materials, and the internet. If there is any infringement, please contact aursus.blog@gmail.com for removal.