Dropout on the input vector vs. on the pre-activation vector?
For any layer in my neural net, should I apply dropout to the incoming vector, or to the pre-activation vector?
In other words:
$$\vec q = W \cdot \vec x$$ $$\vec h = \operatorname{activate}(\operatorname{drop}(\vec q))$$
or:
$$\vec q = W \cdot \operatorname{drop}(\vec x)$$ $$\vec h = \operatorname{activate}(\vec q)$$
I think the second variant is smoother: no component of $\vec q$ is zeroed outright; instead each component is a weighted sum over the surviving (partially dropped) inputs, so the effect seems softer.
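For concreteness, here is a minimal NumPy sketch of both variants. The names `drop`, `relu`, `W`, and `x` are illustrative, and it assumes standard inverted dropout with a keep probability of 0.8:

```python
import numpy as np

rng = np.random.default_rng(0)

def drop(v, keep_prob=0.8):
    """Inverted dropout (assumed): zero each component with probability
    1 - keep_prob, then rescale survivors so the expected value is unchanged."""
    mask = rng.random(v.shape) < keep_prob
    return v * mask / keep_prob

def relu(v):
    return np.maximum(v, 0.0)

W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

# Variant 1: dropout on the pre-activation q = Wx
# -> individual hidden units are zeroed outright.
h1 = relu(drop(W @ x))

# Variant 2: dropout on the input x before the weighting
# -> each component of q is a weighted sum of the surviving inputs,
#    so no component of q is forced to zero directly.
h2 = relu(W @ drop(x))
```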
Tags: mathematics, dropout
Category: Data Science