Machine learning cost/benefit of including priors in the input vector

Is there a trade-off in accuracy, generalisation, or performance between providing priors as inputs to a general machine learning algorithm and training that algorithm on enough data that it could internalise the prior itself?

For example: let's say I'm trying to get an ANN to do some basic classification of whether Vehicle 'A' is in class 'Bus' or 'Not Bus'. Vehicles, in this example, have some features that are dependent on each other, [Size, Speed], and let's say I have a history of all these features across a bunch of locations.

I'd like to build a location-independent, generalised algorithm, so that when you input [Size, Speed, Longitude, Latitude] you get Bus/Not_Bus.

Now the obvious thing to do (I think) would be to train an algorithm using many [Size, Speed, Longitude, Latitude] vectors, but I have a feeling that this would be brittle if it ever came across a location that was not in the training set, and might behave unexpectedly.
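
For concreteness, here's a minimal sketch of that baseline, assuming a scikit-learn MLP; the data and labels are synthetic stand-ins, purely for illustration:

```python
# Baseline sketch: train directly on raw [Size, Speed, Longitude, Latitude]
# vectors. The array contents and labels here are made up.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((1000, 4))            # columns: size, speed, lon, lat
y = (X[:, 0] > 0.6).astype(int)      # fake Bus / Not_Bus labels

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
model.fit(X, y)

# A query at coordinates never seen in training: the network is free to
# extrapolate on the lon/lat dimensions, which is where the brittleness
# described above comes from.
print(model.predict([[0.7, 0.3, 0.99, 0.01]]))
```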

What would the trade-offs be if I instead used [Size, Speed, Prior(Size | lat/lon), Prior(Speed | lat/lon)], where the prior is the pre-calculated histogram of speeds/sizes for that location?

Prior(Size | location) = [Size-0, Size-10, ..., Size-X] 
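
As a sketch of how such a prior could be pre-calculated (the grid-cell size, bin edges, and record format below are my own assumptions, not anything from a specific library):

```python
# Sketch: build Prior(Size | location) as a normalised histogram per
# location, where "location" is a lon/lat coordinate snapped to a grid
# cell. `history` is a hypothetical iterable of (size, lon, lat) records.
import numpy as np
from collections import defaultdict

SIZE_BIN_EDGES = np.array([0.0, 10.0, 20.0, np.inf])  # Size-0, Size-10, Size-20+
CELL = 0.01                                           # grid cell size in degrees

def location_key(lon, lat, cell=CELL):
    """Snap a coordinate to a grid cell so nearby records share a key."""
    return (round(lon / cell), round(lat / cell))

def build_size_priors(history):
    by_loc = defaultdict(list)
    for size, lon, lat in history:
        by_loc[location_key(lon, lat)].append(size)
    priors = {}
    for key, sizes in by_loc.items():
        counts, _ = np.histogram(sizes, bins=SIZE_BIN_EDGES)
        priors[key] = counts / counts.sum()  # normalised size histogram
    return priors
```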

It would make the input vector much larger, and a kernel function could be used to get prior data for locations with no historical matches (see the sketch below). It would also make the algorithm location-independent, as the input features would only include size, speed, and the priors of those quantities for the specific location.
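
One way that kernel fallback could look; the Gaussian kernel, the bandwidth, and the `priors`/`CELL` names reused from the sketch above are all assumptions:

```python
# Sketch: when a query location has no exact historical match, blend the
# priors of nearby cells with a Gaussian kernel over distance. Reuses the
# hypothetical `priors` dict and CELL constant from the previous sketch.
import numpy as np

def kernel_prior(lon, lat, priors, cell=0.01, bandwidth=0.05):
    keys = np.array(list(priors.keys()), dtype=float) * cell  # cells -> approx lon/lat
    hists = np.array(list(priors.values()))
    d2 = ((keys - np.array([lon, lat])) ** 2).sum(axis=1)     # squared distances
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    return (w[:, None] * hists).sum(axis=0) / (w.sum() + 1e-12)

# The final, location-independent input vector would then be e.g.:
# x = np.concatenate([[size, speed],
#                     kernel_prior(lon, lat, size_priors),
#                     kernel_prior(lon, lat, speed_priors)])
```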

Tags: geospatial, feature-selection, machine-learning
