Dropout on the input vector vs. on the pre-activation vector?

For any layer in my neural net, should I apply dropout to the incoming vector, or to the pre-activation vector?

In other words:

$$\vec q = W \cdot \vec x$$ $$\vec h = \operatorname{activate}(\operatorname{drop}(\vec q))$$

or:

$$\vec q = W \cdot \operatorname{drop}(\vec x)$$ $$\vec h = \operatorname{activate}(\vec q)$$

I think the second variant is smoother: no component of the current layer's vector is zeroed out entirely, since each component of $\vec q$ is still assembled from a mix of the surviving (non-dropped) inputs. The effect therefore seems softer.
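To make the difference concrete, here is a minimal NumPy sketch of the two variants, assuming inverted dropout (survivors scaled by $1/(1-p)$) and ReLU as the activation; the names `drop`, `W`, and `x` and the layer sizes are illustrative, not from any particular framework:

```python
import numpy as np

rng = np.random.default_rng(0)

def drop(v, p=0.5):
    """Inverted dropout: zero each component with probability p,
    scale survivors by 1/(1-p) so the expected value is unchanged."""
    mask = rng.random(v.shape) >= p
    return v * mask / (1.0 - p)

def relu(q):
    return np.maximum(q, 0.0)

W = rng.standard_normal((4, 3))   # hypothetical layer: 3 inputs -> 4 units
x = rng.standard_normal(3)

# Variant 1: drop the pre-activation q = Wx.
# Whole components of h are zeroed for this sample.
h1 = relu(drop(W @ x))

# Variant 2: drop the input x before weighting.
# Every component of q still mixes the surviving inputs.
h2 = relu(W @ drop(x))
```

Note that in common frameworks a dropout layer is typically placed on a layer's output, i.e. on the input of the next layer, which corresponds to the second variant from that next layer's point of view.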
