Is it possible to find an internal event in a Classification between two classes?

I am very new to the machine learning area. So my question might be trivial 

I have two classes $U, V$ of binary vectors. In the training phase, 

I use $u_1,\ldots, u_{1000}$ from $U$ class and $v_1, \ldots, v_{1000}$ from $V$.

In the testing phase, I have to determine whether a vector is coming from $U$ or $V$? 

How can we do that with good accuracy? Also, can we find internal event by which ML makes the clasification? 

Topic distribution machine-learning

Category Data Science


There are no trivial questions. We are all trying to learn more from each other. Some remarks that I hope will help.

  • In general the classification problem is not restricted to having only two classes (having only two is usually named binary classification).
  • The vectors do not have to be binary (i.e. having only 0 or 1). Usually, they have to be numeric although some algorithms accept also categorical values. (In this case, how categorical values are treated depends on the algorithm).
  • Usually, a dataset is split into train and test parts. As the name suggests, the training part is used to train the algorithm/create the classification model. The result of the training is evaluated using the test part. The data in test part are hidden by the training algorithm.
  • There is no golden rule for doing the above with high accuracy. One usually, tweaks input variables (=what the coordinates of the vectors above mean, this is called feature engineering) and tries different algorithms and different parameters for them. I should add that accuracy is a technical term and is only one way to evaluate the performance of a classifier. One can read about accuracy and several other metrics in various places (ex. Wikipedia here or here). Also, there is not a fixed value for accuracy that a classifier needs to achieve in order to be accepted. For example, if you only have two classes U and V and 99% of your data are from U, then by assigning all items to class U you get 99% accuracy. (In this case one should probably use another metric).

Finally, for some algorithms it is easy to see the rules used for classification. A classical example is decision trees. For others, one has to process the classifier (and the data) and try to construct such rules or at least see the main drivers for the classifier.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.