How to improve regression neural network?

Question

Darkstar Dream

May 8, 2022 08:02

I am new to deep learning and data science and am trying to build my knowledge by working on hackathons. The hackathon project I am currently working on asks me to predict the closing price of a cryptocurrency from 48 parameters, with ~1200 records.

So far I have been able to get reasonably good accuracy from the model, but my score is still very low. I have tried everything I know, but none of it seems to affect the performance at all. I would just like a few suggestions and tips, since there is scope to improve the performance.

Dataset

Here are some sample records from my dataset.

id	asset_id	open	high	low	volume	market_cap	url_shares	unique_url_shares	reddit_posts	reddit_posts_score	reddit_comments	reddit_comments_score	tweets	tweet_spam	tweet_followers	tweet_quotes	tweet_retweets	tweet_replies	tweet_favorites	tweet_sentiment1	tweet_sentiment2	tweet_sentiment3	tweet_sentiment4	tweet_sentiment5	tweet_sentiment_impact1	tweet_sentiment_impact2	tweet_sentiment_impact3	tweet_sentiment_impact4	tweet_sentiment_impact5	social_score	average_sentiment	news	price_score	social_impact_score	correlation_rank	galaxy_score	volatility	market_cap_rank	percent_change_24h_rank	volume_24h_rank	social_volume_24h_rank	social_score_24h_rank	medium	youtube	social_volume	percent_change_24h	market_cap_global	close
ID_322qz6	1	9422.849081	9428.490628	9422.849081	713198620.0	173763453624.0	1689.0	817.0	55.0	105.0	61.0	271.0	3420.0	1671.0	11675867.0	39.0	1343.0	448.0	2237.0	124.0	330.0	331.0	2515.0	120.0	506133.0	1326610.0	1159677.0	8406185.0	281329.0	11681999.0	3.6	69.0	2.7	3.6	3.3	66.0	0.0071176	1.0	606.0	2.0	1.0	1.0	2.0	5.0	4422	1.4345161346109587	281806567507.0	9428.279323
ID_3239o9	1	7985.359278	7992.059917	7967.567267	400475518.0	142694202230.96	920.0	544.0	20.0	531.0	103.0	533.0	1491.0	242.0	5917814.0	195.0	1070.0	671.0	3888.0	1.0	52.0	315.0	1100.0	23.0	1320.0	381117.0	1706376.0	3754815.0	80010.0	5924770.0	3.7	1.0	2.0	2.0	1.0	43.5	0.00941863	1.0							2159	-2.4595073021531104	212689713284.66	7967.567267
ID_323J9k	1	49202.033778	49394.593518	49068.057046	3017728869.0	916697653223.0	1446.0	975.0	72.0	1152.0	187.0	905.0	9346.0	4013.0	47778746.0	104.0	2014.0	1099.0	11476.0	331.0	923.0	864.0	6786.0	442.0	9848462.0	5178557.0	2145663.0	25510267.0	5110490.0	47796942.0	3.7	22.0	3.1	3.0	3.3	65.5	0.01353005	1.0	692.0	3.0	1.0	1.0			10602	4.942447794031182	1530711784042.0	49120.738484

The dataset has 48 features; however, the model performs well only with 5 columns: ['open', 'high', 'low', 'market_cap', 'market_cap_global'].

Model

I have tried a small neural network with only 2 hidden layers, fed with the above 5 features scaled with a standard scaler. Apart from this, I have also used callbacks (model checkpointing and early stopping) and a custom loss function that computes RMSE. So far this is the best-performing model I have been able to create.

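# imports (assumed; the original snippet did not show them)
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

# create model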
model_dl2 = Sequential()
model_dl2.add(Dense(50, input_dim=5, activation='relu'))
model_dl2.add(Dense(75,  activation='relu'))
model_dl2.add(Dense(1,  activation='linear'))

# custom loss function
from keras import backend as k
def root_mean_squared_error(y_true, y_pred):
    return k.sqrt(k.mean(k.square(y_pred - y_true))) 

# callbacks
loss = ModelCheckpoint('Models/best_model2.h5', monitor='val_loss', verbose=1, save_best_only=True)
es = EarlyStopping(patience=500)

# Compile model
opt = tf.keras.optimizers.Adam(learning_rate=0.5, amsgrad=True)
model_dl2.compile(loss= root_mean_squared_error, optimizer=opt)

model_dl2.fit(x_trainS2, y_trainS2, validation_data=(x_testS2, y_testS2), epochs=3000, batch_size=128, callbacks=[loss, es])

## accuracy rmse:53

My attempt to increase the performance

The model is stuck at an RMSE of around 53. I have tried many things, such as:

different activation functions and optimizers with different learning rates
increased/decreased the number of hidden layers (vertical scaling)
increased/decreased the number of neurons per layer (horizontal scaling)
applied PCA to the remaining 43 columns, or to some selected columns

But none of these improved the accuracy.

Apart from this, the dataset also has a few issues, such as:

many null values in both the features and the target 'close', about 30%
multicollinearity
skewness (right-skewed)

To solve these issues I have tried a few things, which were not that helpful except for the first one.

For null values, filling them with 0s in both the features and the target column seems to work well, so I have not dropped any rows.
For skewness, I tried a power transformation but it didn't work. I also can't do a log transformation because the dataset contains negative values. So I basically did nothing.
Because of multicollinearity, I used only the 5 features mentioned above that work well. However, these 5 features are also highly correlated with each other; I was relying on data transformation to address that, but it didn't work.

My question

My problems may sound very basic, but I have applied everything I have learned by myself and am now out of ideas. Fixing the dataset issues could be one solution, but after trying the things above I don't know what else to do. Also, if the issue is in the model, it would be great if you could recommend some tuning that I may be missing.

Feel free to ask for more details if you need them.

Topic hyperparameter-tuning regression deep-learning neural-network data-cleaning

Category Data Science

Robert Link · Accepted Answer · September 17, 2021 04:15

These are great first attempts! However, neural networks are notoriously bad at working with tabular data. You might be better served by a traditional ML model (e.g., linear regression, SVM).
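
As a rough illustration of the traditional-model route, here is a minimal sketch of a cross-validated linear regression baseline with scikit-learn. The data is synthetic and only stands in for your five columns (an assumption, since I don't have your file), and the names `X`, `y`, and `baseline` are made up for the example. The RMSE it prints says nothing about your dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 1200 rows, 5 right-skewed
# features, and a target that is a linear mix of them plus noise.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=8.0, sigma=1.0, size=(1200, 5))
y = X @ np.array([0.3, 0.4, 0.3, 0.0, 0.0]) + rng.normal(scale=5.0, size=1200)

# Scaling lives inside the pipeline, so it is re-fit on each training fold.
baseline = make_pipeline(StandardScaler(), LinearRegression())

# scikit-learn reports negated RMSE so that larger scores are always better.
scores = cross_val_score(
    baseline, X, y, cv=5, scoring="neg_root_mean_squared_error"
)
rmse_per_fold = -scores
print("RMSE per fold:", np.round(rmse_per_fold, 2))
print("mean RMSE:", round(float(rmse_per_fold.mean()), 2))
```

Running the same five-fold check on your real features gives you a reference RMSE to compare the neural network against.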

Regardless of whether you're using a neural net or otherwise, you should normalize/transform your input features and the output feature (i.e., your closing price). Transforming your inputs would remedy the right-skew problem that you're facing and shrink the overall scale of your regression data, which helps your prediction models converge towards a minimum loss. I hope that this helps!
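
One possible way to do this with scikit-learn is sketched below, under some assumptions: the data is synthetic, the Yeo-Johnson power transform is my choice because it accepts negative values (unlike a plain log), and `Ridge` is just a placeholder regressor. `TransformedTargetRegressor` scales the target for fitting and converts predictions back to the original price scale. The `skewness` helper is written for this example only.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler


def skewness(a):
    """Sample skewness of each column (third standardized moment)."""
    centered = a - a.mean(axis=0)
    return (centered ** 3).mean(axis=0) / (centered ** 2).mean(axis=0) ** 1.5


# Synthetic stand-in data (assumption): right-skewed features that include
# some negative values, and a target on a much larger scale.
rng = np.random.default_rng(0)
Z = rng.normal(size=(1200, 5))
X = np.exp(Z) - 0.5
y = 9000.0 + 1000.0 * Z[:, 0] + 500.0 * Z[:, 1] + rng.normal(scale=10.0, size=1200)

# Yeo-Johnson works on negative inputs; standardize=True also rescales each
# column to zero mean and unit variance.
X_t = PowerTransformer(method="yeo-johnson", standardize=True).fit_transform(X)
print("skew before:", np.round(skewness(X), 2))
print("skew after: ", np.round(skewness(X_t), 2))

# Inputs are transformed inside the pipeline; the target is standardized for
# fitting and automatically inverse-transformed at prediction time.
model = TransformedTargetRegressor(
    regressor=make_pipeline(PowerTransformer(method="yeo-johnson"), Ridge(alpha=1.0)),
    transformer=StandardScaler(),
)
model.fit(X, y)
pred = model.predict(X)  # already back on the original target scale
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print("training RMSE on the original scale:", round(rmse, 2))
```

The same idea carries over to a Keras model: fit the scalers on the training split only, train on the scaled target, and invert the target scaling before computing RMSE so the score stays comparable.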
