To data warehouse or not to data warehouse?
I was wondering if you would be so kind as to assist me with a quick question (I will be happy to explain more if needed). I am researching and setting up a system for a machine learning (training) job to find correlations between a user's social media information (or other digital trails, from wearables etc.) and his or her scores on personality tests.
The scores are in my PostgreSQL database (on AWS), and I need to decide how to store the social media and wearable digital-trail information (both unstructured and structured). I was thinking of DynamoDB.
I was also thinking of integrating both databases under Amazon Redshift and doing the analytics (using RapidMiner) from there. Does this all make sense? Do I really need a data warehouse for this? Would it be more sensible to use just a single database (PostgreSQL or DynamoDB) for all of this, without data warehousing? I am talking about roughly 100K records at most (for the training). Future data will be in the millions.
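To make the single-database option concrete, this is roughly what I have in mind. It is only a sketch under assumptions: SQLite stands in for PostgreSQL so that it runs anywhere, and the table names (`scores`, `trails`), the columns, and the sample values are all made up for illustration. In PostgreSQL the `payload` would presumably be a JSONB column rather than JSON text.

```python
import json
import sqlite3


def build_training_rows(conn):
    """Join each user's personality score with their digital-trail events."""
    rows = conn.execute(
        "SELECT s.user_id, s.score, t.payload "
        "FROM scores AS s JOIN trails AS t ON t.user_id = s.user_id "
        "ORDER BY s.user_id, t.id"
    ).fetchall()
    # Flatten the semi-structured payload next to the structured score.
    return [
        {"user_id": user_id, "score": score, **json.loads(payload)}
        for user_id, score, payload in rows
    ]


# In-memory stand-in for the real database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (user_id INTEGER PRIMARY KEY, score REAL)")
conn.execute(
    "CREATE TABLE trails (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany("INSERT INTO scores VALUES (?, ?)", [(1, 0.72), (2, 0.41)])
conn.executemany(
    "INSERT INTO trails (user_id, payload) VALUES (?, ?)",
    [
        (1, json.dumps({"source": "twitter", "posts_per_day": 4})),
        (1, json.dumps({"source": "wearable", "avg_steps": 8200})),
        (2, json.dumps({"source": "twitter", "posts_per_day": 1})),
    ],
)

training_rows = build_training_rows(conn)
for row in training_rows:
    print(row)
```

The idea is that structured scores and loosely structured trail events live side by side, and one join produces the rows I would export for training.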
I get so many conflicting answers, and I would greatly appreciate your advice. Thank you so much in advance!
Topic redshift
Category Data Science