Machine Learning algorithm for predicting number of cases in pandemic

I'm taking my first steps with AI and machine learning and have run into the following issue. I'm trying to predict the number of confirmed COVID-19 cases from the day number using the scikit-learn library: my input is the number of days since the pandemic started in my country, and my output is the number of confirmed cases on that date. However, with both GradientBoostingRegressor and RandomForestRegressor I get the same output value for every test input. The Python code is below, as it is very short.

import numpy as np
import pandas
from sklearn import ensemble

# covid.csv is semicolon-separated; ORDEN is the day number, CASOS the confirmed cases
datos = pandas.read_csv('covid.csv', sep=';')
entrada = np.array(datos['ORDEN']).reshape(-1, 1)
salida = datos['CASOS']

# Days to predict (63 to 69)
test = np.arange(63, 70).reshape(-1, 1)

regr = ensemble.GradientBoostingRegressor(random_state=0, n_estimators=500).fit(entrada, salida)
print(regr.predict(test))

regr = ensemble.RandomForestRegressor(random_state=0, n_estimators=500).fit(entrada, salida)
print(regr.predict(test))

My output is this:

[1782.99976513 1782.99976513 1782.99976513 1782.99976513 1782.99976513
 1782.99976513 1782.99976513]
[1773.99 1773.99 1773.99 1773.99 1773.99 1773.99 1773.99]

What am I doing wrong? Thanks in advance.



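This depends heavily on your feature engineering. With the day number as the only feature, a tree-based ensemble cannot extrapolate: each tree predicts a constant value per leaf, so every day beyond the last training day falls into the same leaf and gets the same prediction. That is probably why the output looks as if the model were only predicting a single summary value of your target, such as the mean or median.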
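The behaviour in the question can be reproduced with a minimal sketch. The data here is synthetic (a made-up, steadily growing series for days 1 to 62), not the asker's covid.csv, and the variable names are only illustrative. It shows that a random forest returns the same value for every day past the end of the training range, and that this value matches its prediction for the last training day.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the asker's data (assumption: covid.csv is not
# available here). Days 1..62 with a made-up, steadily growing case count.
days = np.arange(1, 63).reshape(-1, 1)
cases = np.round(5.0 * np.exp(0.1 * days.ravel()))

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(days, cases)

# Days 63..69 all lie beyond the training range, so each tree sends them
# to the same leaf that the largest training days end up in.
future = np.arange(63, 70).reshape(-1, 1)
future_pred = forest.predict(future)
last_day_pred = forest.predict([[62]])[0]

print(future_pred)    # seven identical values
print(last_day_pred)  # the same value again
```

The same reasoning applies to GradientBoostingRegressor, since it is also built from regression trees with constant leaf values.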

It might also help to try other kinds of models. Since you are trying to predict the count of an event over a given period of time, Poisson models could be useful. They were in an experimental phase in scikit-learn when this was written; nonetheless, the documentation might help you understand how the model works.
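As a rough sketch of the Poisson suggestion, the example below fits `PoissonRegressor` from `sklearn.linear_model` (available in scikit-learn 0.23 and later) to the day number. The data is synthetic and the choices of `alpha=0.0`, `max_iter=1000`, and the `StandardScaler` step are assumptions made for illustration, not settings from the original answer. Because the model uses a log link, its predictions keep changing with the day number instead of flattening out past the training range.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the asker's data (assumption, not covid.csv):
# days 1..62 with a made-up, steadily growing case count.
days = np.arange(1, 63).reshape(-1, 1)
cases = np.round(5.0 * np.exp(0.1 * days.ravel()))

# Scaling the day number helps the solver converge; alpha=0.0 disables
# the L2 penalty. Both are illustrative choices.
poisson = make_pipeline(
    StandardScaler(),
    PoissonRegressor(alpha=0.0, max_iter=1000),
)
poisson.fit(days, cases)

# Predict the same out-of-range days as in the question (63..69).
future = np.arange(63, 70).reshape(-1, 1)
poisson_pred = poisson.predict(future)

print(poisson_pred)  # values differ from day to day
```

A model of this kind extrapolates exponential growth indefinitely, so forecasts on real case counts are only plausible over short horizons and should be checked against held-out days.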
