Transitioning from a Python script for data transformation to BigQuery

So I have a dataset spread over multiple, ever-growing Excel files, all of which look like:

email              order_ID  order_date
[email protected]  1234      23-Mar-2021
[email protected]  1235      23-Mar-2021
[email protected]  1236      23-Mar-2021
[email protected]  1237      24-Mar-2021
[email protected]  1238      28-Mar-2021

End goal is to have two distinct datasets. The first one is Orders (public; for analysis, trading emails for user_IDs for anonymity and marking returning users for further analyses):

user_ID  order_ID  order_date   is_returning?
1        1234      23-Mar-2021  0
2        1235      23-Mar-2021  0
2        1236      23-Mar-2021  1
1        …
Category: Data Science
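
The Orders transformation described in the question above can be sketched in plain Python before porting it to BigQuery. This is only a minimal sketch under stated assumptions: the function name `build_orders`, the `example.com` addresses, and the rule that an order counts as "returning" when the same email already has an earlier order are all illustrative and not taken from the original post. The second dataset is cut off in the excerpt, so it is not reconstructed here; the email-to-ID mapping is returned only so that IDs stay stable across files.

```python
from datetime import datetime


def parse_date(text):
    # "%b" matches English month abbreviations under the default C locale.
    return datetime.strptime(text, "%d-%b-%Y")


def build_orders(rows, user_ids=None):
    """Turn (email, order_id, order_date) rows into anonymized order rows.

    user_ids maps email -> user_ID; pass in the mapping from earlier
    files so that IDs and the returning flag carry over (assumption).
    """
    user_ids = {} if user_ids is None else user_ids
    orders = []
    # Process in chronological order, breaking same-day ties by order ID.
    for email, order_id, order_date in sorted(
        rows, key=lambda r: (parse_date(r[2]), r[1])
    ):
        # Assumed rule: returning means the email was already seen.
        is_returning = 1 if email in user_ids else 0
        user_id = user_ids.setdefault(email, len(user_ids) + 1)
        orders.append((user_id, order_id, order_date, is_returning))
    return orders, user_ids


# Hypothetical sample rows standing in for the obfuscated emails.
sample = [
    ("a@example.com", 1234, "23-Mar-2021"),
    ("b@example.com", 1235, "23-Mar-2021"),
    ("b@example.com", 1236, "23-Mar-2021"),
    ("a@example.com", 1237, "24-Mar-2021"),
]
orders, mapping = build_orders(sample)
```

Passing `mapping` back in as `user_ids` when the next Excel file arrives keeps the numbering consistent, which is one way to handle the ever-growing input.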

Overdue evolution in SQL

Despite my efforts, I cannot think of a way to meet my need: I have 2 tables containing, respectively, a set of loans to be reimbursed and a set of reimbursements on these loans (not all loans have an entry in the reimbursement table, because some of them remain unpaid). Table loan has columns: id, amount, due_date. Table reimbursement has columns: id, debt_id, payment_date. My goal is to obtain the amount that was overdue for more than 5 days for …
Category: Data Science
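
One possible reading of the overdue question above, sketched in Python rather than SQL. The excerpt is cut off before it says what the amount should be computed "for", so this sketch assumes a single as-of date and assumes that a loan counts as overdue when its due_date lies more than 5 days before that date and no reimbursement was recorded on or before it. The function name `overdue_amount`, the `grace_days` parameter, and the sample rows are hypothetical.

```python
from datetime import date, timedelta


def overdue_amount(loans, reimbursements, as_of, grace_days=5):
    """Sum the amounts of loans overdue by more than grace_days on as_of."""
    # Earliest payment date per loan (reimbursement.debt_id -> loan.id).
    paid_on = {}
    for r in reimbursements:
        loan_id = r["debt_id"]
        if loan_id not in paid_on or r["payment_date"] < paid_on[loan_id]:
            paid_on[loan_id] = r["payment_date"]

    cutoff = as_of - timedelta(days=grace_days)
    total = 0
    for loan in loans:
        paid = paid_on.get(loan["id"])
        # Loans without a reimbursement row remain unpaid.
        unpaid_as_of = paid is None or paid > as_of
        if loan["due_date"] < cutoff and unpaid_as_of:
            total += loan["amount"]
    return total


# Hypothetical sample data using the column names from the question.
loans = [
    {"id": 1, "amount": 100, "due_date": date(2021, 3, 1)},
    {"id": 2, "amount": 250, "due_date": date(2021, 3, 10)},
    {"id": 3, "amount": 400, "due_date": date(2021, 3, 1)},
]
reimbursements = [
    {"id": 1, "debt_id": 3, "payment_date": date(2021, 3, 4)},
]
total_mid_march = overdue_amount(loans, reimbursements, date(2021, 3, 12))
```

In SQL the same idea would typically be a LEFT JOIN from loan to reimbursement, so that unpaid loans are kept, with the date conditions in the filter.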

What data/analytics tools do I need to use at my current e-commerce workplace?

I recently started a new position as a data scientist at an e-commerce company. The company was founded about 4-5 years ago and is new to many data-related areas. Specifically, I'm their first data science employee, so I have to take care of data analysis tasks as well as bring new technologies to the company. They have used Elasticsearch (and Kibana) to build reporting dashboards on their daily purchases and users' interactions on their e-commerce website. They also …
Category: Data Science

Is it possible to update data and retrain just one of several data series in a BigQuery model?

I am building something very similar to this BigQuery ML example project. My system is different in two ways. Firstly, it will need several thousand time series, so I would prefer to use the multiple-series feature rather than having thousands of individual models. Secondly, the data is more unpredictable in the long run (rather than periodic or seasonal), so it needs retraining quite often, with only local trends being detected. The data is actually monitoring voltages in battery-operated devices, which usually …
Category: Data Science

Query Google Trends using Google BigQuery

I need help with Google BigQuery. I am using BigQuery to query data from Google Trends. Now I want to get data for a specific keyword, for example spiderman, and get the result by region, like the "interest over time" CSV that can be downloaded from Google Trends. But the example query for Google Trends only returns the top 25 trending terms:

    SELECT * FROM `bigquery-public-data.google_trends.top_terms`
    WHERE refresh_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)

I want to use the same syntax to get data for a specific term/keyword.
Category: Data Science

How to schedule importing data files from SFTP server located on compute engine instance into BigQuery?

What I want to achieve: Transfer data files that arrive hourly from several different feeds on an SFTP file server, located on a Compute Engine VM, into BigQuery with real-time updates, effectively and cost-efficiently. Context: The software I am trying to import data from is old legacy software and does not support direct exports to the cloud, so a direct connection from the software to the cloud isn't an option. It does, however, support exporting data to an SFTP server. Which is not available …
Category: Data Science

Recommendation Systems User Profile Streaming Data on GCP

I have a recommendation system that recommends articles to different users. I am planning to provide the recommendations in an offline fashion: I already have a table in BigQuery that holds the recommendations, and an API call returns the recommendations for each page on the website. Now I want to have another table called user_profile which stores user_id|shown|clicked information about the articles shown to the users. This should happen in real time. I looked into https://cloud.google.com/bigquery/streaming-data-into-bigquery but it has limitations. …
Category: Data Science
