Initialisation methods and variance

I'm trying to understand some weight initialisation methods by reading the article http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf .

But I don't understand their notation for variance. In equation (5) they refer to a variable $z^i$, and I don't know what they mean by it: is it a collective index over all examples, or not?

Topic: variance

Category: Data Science


Welcome to the community.

Definition: Let's first define features and examples (examples are also called data points). In tabular dataframes, as commonly used in data analytics and data science, features sit in columns, and the data points of each feature populate the rows. Here is a simple dataset snapshot showing this definition:

[image: sample tabular dataset with features in columns and data points in rows]

Variance: We are interested in measuring statistics for each feature, because in the machine-learning paradigm, features with very narrow variance are often not very informative for predicting the target. Under the convention above, that means computing along the rows. For example, for an income feature, we would want the mean of income, or the variance of income, over all data points (down the rows).
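To make this concrete, here is a minimal sketch with NumPy, using a small made-up dataset (the income and age values are hypothetical, just for illustration). With features in columns, each feature's variance is computed across the rows, which in NumPy is `axis=0`:

```python
import numpy as np

# Hypothetical dataset: 4 data points (rows) x 2 features (columns),
# following the convention described above, e.g. [income, age].
X = np.array([
    [50_000.0, 30.0],
    [60_000.0, 35.0],
    [55_000.0, 40.0],
    [65_000.0, 45.0],
])

# Variance of each feature, computed across the rows (axis=0):
feature_var = np.var(X, axis=0)
print(feature_var)  # one variance value per feature column
```

Note that `axis=0` means "collapse the rows", so you get one number per column, i.e. one statistic per feature.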

Please NOTE that this is just a convention, albeit a very common one among Python users and in many standard libraries/tutorials out there. As long as you know what you are doing, you can swap these axes. I have seen some older teaching materials use the opposite definition, with features in rows and data points spread across the columns, like in your definition. If you follow that convention, you obviously have to compute the variance along the column axis, not the rows anymore.
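A quick sketch of that swapped convention, reusing the same hypothetical numbers as before but transposed (features in rows, data points in columns). Now each feature's variance is taken along the columns, i.e. `axis=1`:

```python
import numpy as np

# Opposite convention: features in rows, data points in columns.
X_t = np.array([
    [50_000.0, 60_000.0, 55_000.0, 65_000.0],  # income feature
    [30.0,     35.0,     40.0,     45.0],      # age feature
])

# Each feature's variance is now along the column axis (axis=1):
feature_var = np.var(X_t, axis=1)
print(feature_var)  # still one variance value per feature
```

Either way the per-feature statistics come out identical; only the axis argument changes with the layout.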
