To answer your questions first:
The paper assumes each $f_i$ is the sigmoid function $f_i(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$.
Note that
$$
\frac{\partial \sigma(x)}{\partial x} = \sigma(x) (1 - \sigma(x))
$$
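(You can spot-check this identity with a central finite difference; the test point $x$ below is arbitrary.)

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.3   # arbitrary test point
h = 1e-6
# central finite-difference approximation of d(sigma)/dx
numeric = (sigma(x + h) - sigma(x - h)) / (2.0 * h)
analytic = sigma(x) * (1.0 - sigma(x))
assert abs(numeric - analytic) < 1e-9
```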
Since
$$
\begin{align*}
& f'_{l_m}\big(\text{net}_{l_m}(t - m)\big) w_{l_m l_{m - 1}} \\
& = \sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \cdot w_{l_m l_{m - 1}}, \tag{1}
\end{align*}
$$
to find the maximum value of (1) with respect to $w_{l_m l_{m - 1}}$, we take the derivative of (1) and find the point $w_{l_m l_{m - 1}}^*$ at which that derivative equals 0, i.e.,
$$
\frac{\partial \Big[\sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \cdot w_{l_m l_{m - 1}}\Big]}{\partial w_{l_m l_{m - 1}}} = 0 \tag{2}
$$
Now we calculate the derivative
$$
\begin{align*}
& \frac{\partial \Big[\sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \cdot w_{l_m l_{m - 1}}\Big]}{\partial w_{l_m l_{m - 1}}} \\
& = \frac{\partial \sigma\big(\text{net}_{l_m}(t - m)\big)}{\partial \text{net}_{l_m}(t - m)} \cdot \frac{\partial \text{net}_{l_m}(t - m)}{\partial w_{l_m l_{m - 1}}} \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \cdot w_{l_m l_{m - 1}} \\
& \quad + \sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \frac{\partial \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big)}{\partial \text{net}_{l_m}(t - m)} \cdot \frac{\partial \text{net}_{l_m}(t - m)}{\partial w_{l_m l_{m - 1}}} \cdot w_{l_m l_{m - 1}} \\
& \quad + \sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \cdot \frac{\partial w_{l_m l_{m - 1}}}{\partial w_{l_m l_{m - 1}}} \\
& = \sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big)^2 \cdot y_{l_{m - 1}}(t - m - 1) \cdot w_{l_m l_{m - 1}} \\
& \quad - \Big(\sigma\big(\text{net}_{l_m}(t - m)\big)\Big)^2 \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \cdot y_{l_{m - 1}}(t - m - 1) \cdot w_{l_m l_{m - 1}} \\
& \quad + \sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \\
& = \Big[2 \Big(\sigma\big(\text{net}_{l_m}(t - m)\big)\Big)^3 - 3 \Big(\sigma\big(\text{net}_{l_m}(t - m)\big)\Big)^2 + \sigma\big(\text{net}_{l_m}(t - m)\big)\Big] \cdot \\
& \quad \quad y_{l_{m - 1}}(t - m - 1) \cdot w_{l_m l_{m - 1}} \\
& \quad + \sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \\
& = \sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(2 \sigma\big(\text{net}_{l_m}(t - m)\big) - 1\Big) \Big(\sigma\big(\text{net}_{l_m}(t - m)\big) - 1\Big) \cdot \\
& \quad \quad y_{l_{m - 1}}(t - m - 1) \cdot w_{l_m l_{m - 1}} \\
& \quad + \sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \\
& = 0.
\end{align*} \tag{3}
$$
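We can sanity-check the expansion in (3) numerically before simplifying further. Below I assume a single incoming connection so that $\text{net} = w \cdot y$, with made-up values for $y$ and $w$ (this checks the expanded derivative itself, not the $= 0$ condition, which only holds at the critical point):

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

y = 0.7   # made-up activation y_{l_{m-1}}(t - m - 1)
w = 1.3   # made-up weight w_{l_m l_{m-1}}

# Expression (1) as a function of w, with net = w * y
def g(w):
    s = sigma(w * y)
    return s * (1.0 - s) * w

# Central finite-difference derivative of (1) with respect to w
h = 1e-6
numeric = (g(w + h) - g(w - h)) / (2.0 * h)

# Factored closed form from the last step of (3)
s = sigma(w * y)
closed = s * (2.0 * s - 1.0) * (s - 1.0) * y * w + s * (1.0 - s)

assert abs(numeric - closed) < 1e-8
```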
The last equality in (3) follows from (2).
Rearranging terms, we can further reduce the equation:
$$
\begin{align*}
& \sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(2 \sigma\big(\text{net}_{l_m}(t - m)\big) - 1\Big) \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \cdot \\
& \quad \quad y_{l_{m - 1}}(t - m - 1) \cdot w_{l_m l_{m - 1}} = \sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \\
\implies & \Big(2 \sigma\big(\text{net}_{l_m}(t - m)\big) - 1\Big) \cdot y_{l_{m - 1}}(t - m - 1) \cdot w_{l_m l_{m - 1}} = 1 \\
\implies & w_{l_m l_{m - 1}} = \frac{1}{y_{l_{m - 1}}(t - m - 1)} \cdot \frac{1}{2 \sigma\big(\text{net}_{l_m}(t - m)\big) - 1} \\
\implies & w_{l_m l_{m - 1}} = \frac{1}{y_{l_{m - 1}}(t - m - 1)} \cdot \coth\bigg(\frac{\text{net}_{l_m}(t - m)}{2}\bigg).
\end{align*} \tag{4}
$$
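A quick numerical check of (4): again assuming a single incoming connection ($\text{net} = w \cdot y$) and picking $y = 1$ (a made-up value), the condition $w^* = \coth(w^* y / 2)/y$ becomes a fixed-point equation, which I solve by bisection; the derivative of (1) should vanish there.

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def g(w, y):
    # expression (1): sigma(net) * (1 - sigma(net)) * w, with net = w * y
    s = sigma(w * y)
    return s * (1.0 - s) * w

y = 1.0
# Solve the fixed-point condition w = coth(w * y / 2) / y by bisection.
f = lambda w: w - (1.0 / np.tanh(w * y / 2.0)) / y
lo, hi = 1.0, 2.0   # f(lo) < 0 < f(hi) for y = 1
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
w_star = 0.5 * (lo + hi)

# At w_star, the derivative of (1) with respect to w should vanish.
h = 1e-5
deriv = (g(w_star + h, y) - g(w_star - h, y)) / (2.0 * h)
assert abs(deriv) < 1e-6
```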
The last implication in (4) uses the following identities:
$$
\begin{align*}
\tanh(x) & = 2 \sigma(2x) - 1 \\
\tanh\Big(\frac{x}{2}\Big) & = 2 \sigma(x) - 1 \\
\coth\Big(\frac{x}{2}\Big) & = \frac{1}{\tanh\big(\frac{x}{2}\big)} = \frac{1}{2 \sigma(x) - 1}
\end{align*}
$$
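(These identities are standard; here is a quick numerical spot check at a few arbitrary non-zero test points, since $\coth(x/2)$ is undefined at $x = 0$.)

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in [-2.0, 0.5, 3.0]:
    # tanh(x/2) = 2*sigma(x) - 1
    assert abs(np.tanh(x / 2.0) - (2.0 * sigma(x) - 1.0)) < 1e-12
    # coth(x/2) = 1 / (2*sigma(x) - 1), valid for x != 0
    assert abs(1.0 / np.tanh(x / 2.0) - 1.0 / (2.0 * sigma(x) - 1.0)) < 1e-9
```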
The lengthy derivation above should be enough to answer your question 1.
For 2., I believe it is just an assumption made to ease the analysis.
But I agree it's not a good assumption: even if we require $y_{l_{m - 1}}(t - m - 1)$ to be non-zero, we can still have $\text{net}_{l_m}(t - m) = 0$, which makes $\coth$ undefined.
For 3., since $f_{l_m}$ is the sigmoid function, whose derivative decays exponentially, we can see that
$$
\begin{align*}
& \lim_{w_{l_m l_{m-1}} \to \infty} \frac{w_{l_m l_{m - 1}}}{1 + e^{-\text{net}_{l_m}(t - m)}} \cdot \frac{e^{-\text{net}_{l_m}(t - m)}}{1 + e^{-\text{net}_{l_m}(t - m)}} \\
&= \lim_{w_{l_m l_{m-1}} \to \infty} w_{l_m l_{m - 1}} \cdot \sigma\big(\text{net}_{l_m}(t - m)\big) \cdot \Big(1 - \sigma\big(\text{net}_{l_m}(t - m)\big)\Big) \\
&= 0,
\end{align*}
$$
since $\text{net}_{l_m}(t - m) = \sum_{l^*} w_{l_m l^*} y_{l^*}(t - m - 1)$ grows linearly in $w_{l_m l_{m - 1}}$ while $\sigma'$ decays exponentially in $|\text{net}_{l_m}(t - m)|$, so the exponential decay dominates the linear growth (assuming $y_{l_{m - 1}}(t - m - 1) \neq 0$).
These should answer your questions.
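This vanishing behavior is easy to see numerically; again I assume a single incoming connection ($\text{net} = w \cdot y$) with a made-up non-zero activation $y$:

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

y = 0.5   # made-up non-zero activation
vals = []
for w in [1.0, 10.0, 100.0]:
    s = sigma(w * y)
    vals.append(s * (1.0 - s) * w)   # sigma'(net) * w

# The exponential decay of sigma'(net) beats the linear growth of w.
assert vals[0] > vals[1] > vals[2]
assert vals[2] < 1e-15
```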
BUT!
Actually, the author makes a mistake in equation (3.2), which makes the analysis slightly inaccurate.
By slightly I mean that even though the intermediate implications are off, fixing (3.2) and redoing the steps still yields the same conclusion (gradients still vanish).
I'll write down the correct equation and leave the sorting out to you:
$$
\frac{\partial \vartheta(t - q)}{\partial \vartheta(t)} = \sum_{l_1 = 1}^n \cdots \sum_{l_{q - 1} = 1}^n \prod_{m = 1}^q f_{l_{m}}'(\text{net}_{l_m}(t - m)) w_{l_{m - 1} l_m}
$$
(he wrote $w_{l_m l_{m - 1}}$ instead of $w_{l_{m - 1} l_m}$.)