databases

Azure Cloud SQL - Querying large number of rows with Python

Allen Wu

June 2, 2022 17:04

I have a Python Flask application that connects to an Azure Cloud SQL Database, and uses the Pandas read_sql method with SQLAlchemy to perform a select operation on a table and load it into a dataframe. recordsdf = pd.read_sql(recordstable.select(), connection) The recordstable has around 5000 records, and the function is taking around 10 seconds to execute (I have to pull all records every time). However, the exact same operation with the same data takes around 0.5 seconds when I'm selecting …

Topic: azure-ml pandas python databases

Category: Data Science

How can I store sources, effective dates, and confidence for every property in a knowledge graph?

AJAr

May 9, 2022 16:00

What I want to do is ensure that every property in a knowledge base comes from at least one source. I would like to ensure that every edge is spawned (or at least explained) by some event, like a "claim" or "measurement" or "birth." I'd like to rate on a scale the confidence that some property is correct, which could also be inherited from the source's confidence rating. Finally, I want to ensure that effective date(s) are known or …

Topic: uncertainty inference graphs knowledge-base databases

Category: Data Science

Database System for Manual Entry

Gurkenkönig

May 9, 2022 01:04

I'm in charge of setting up a patient register (100K+ patients) for a non-profit project with little money. This register should provide the basis for later data science. I'm not sure which database solution will work well in the long run. It must be possible for various clinics to enter the data manually into the system. Since I have experience with Django, I have developed a web app prototype with Django and an SQLite DB (it is not expected that many users …

Topic: databases

Category: Data Science

Treating highly correlated features to the label feature

InsDSt

May 8, 2022 16:55

We work on a dataset with >1k features, where some elements are temporal/non-linear aggregations of other features. For example, one feature might be the salary s, while another is the mean salary over four months (s_4_m). We try to predict which employees are more likely to get a raise by applying a regression model on the salary. Still, our models are extremely biased toward features like s_4_m, which are highly correlated with the label feature. Are there best practices for …

Topic: correlation databases

Category: Data Science

Feature extraction from relational database

pnv

May 3, 2022 11:06

In order to build a classifier, I need to extract a few features from data stored in a MySQL database. I need to join multiple tables, and it is taking a lot of time. I have joined two tables at a time and have obtained results for multiple cases. I now need to combine them. Would writing a script be the best option? How do people extract features from large relational databases? Am I missing something? Thanks.

Topic: feature-extraction databases

Category: Data Science

ML model deployment architecture?

TheSugoiBoi

March 16, 2022 03:07

I come from a software development background, where we have separate servers of the same database (dev, test, prod). The reason is that we develop our apps against the dev DB, run tests against the test DB, and prod is prod. This creates a clear separation, so we won't bring down prod while trying to build our app. Do you guys train your models the same way? Have 3 environments of the same database and as your …

Topic: machine-learning-model data-product databases machine-learning

Category: Data Science

I Have Issues Installing Basemap

Mr Prof

March 6, 2022 04:05

I tried to install Basemap and it gives me this: preparing transaction: done verifying transaction: done executing transaction: failed ERROR conda.core.link:_execute(507): An error occurred while uninstalling package 'defaults::conda-4.5.12-py37_0'. PermissionError (13, Access is denied) Attempting to roll back Rolling back transaction: done PermissionError (13, Access is denied) Question: What should I do next? I will appreciate your response as I have been on this for some time now. Thanks. NOTE: I have also tried to install cartopy but I ran …

Topic: image-preprocessing data-science-model databases machine-learning

Category: Data Science

Data source for financial data mining

Leapfrog

February 26, 2022 22:48

I plan to do data modeling in the financial area for my master's dissertation. I am thinking of finding the connection between certain company or country characteristics (x values) and their creditworthiness (here I am still looking for a y variable such as credit score, bankruptcy occurrence, etc.). Do you know in which databases I could find the required input? Company data would be great, however, most likely country data might be more accessible, then I could also do …

Topic: finance databases data-mining

Category: Data Science

Solution to discover/infer the usage/meaning of tables in an unknown database?

Mithril

February 9, 2022 07:48

This is a situation I have run into often recently: customers give me a database with many tables that they don't fully understand either, then ask me to build a model to predict future revenue, classify which users may be valuable, or something else. To be honest, extracting useful data from an unknown database has left me exhausted. For example, I need to figure out which table is the user table, product table, or transaction table ... which column can be used to join (there …

Topic: inference relational-dbms databases

Category: Data Science

How to strategize model training with new data coming in every day?

dodo postman

December 21, 2021 13:44

I have a MySQL database in which new records are added to the raw data every day. This raw data is cleaned and an ML model is trained on it once a week. What would be the best strategy to capture new data in the model without fetching all records (old and new) and retraining from scratch? I'm saving the models every week with pickle; can I just fit the previously saved model on the new records? Is this an efficient methodology …

Topic: sql pandas predictive-modeling databases machine-learning

Category: Data Science

Decision Tree taking too long to execute

spectre

December 1, 2021 09:46

I am training a Decision Tree Regressor on a relatively small dataset. The dimensions of my train and test sets are (34164, 10) and (8514, 10). Here is the relevant code: y = np.log(data2['price']) data2.drop(['price'], axis = 1, inplace = True) num_cols = [cname for cname in data2.columns if data2[cname].dtype in ['int64', 'float64']] cat_cols = [cname for cname in data2.columns if data2[cname].dtype == 'object'] num_trans = SimpleImputer(strategy = 'mean') cat_trans = Pipeline(steps = [('impute', SimpleImputer(strategy = 'most_frequent')), ('onehotencode', OneHotEncoder(handle_unknown = …

Topic: decision-trees cross-validation databases

Category: Data Science

Microsoft Access Partial Unique Index

vicatcu

October 25, 2021 00:21

In many databases (MongoDB comes to mind) there's a way to specify a partial unique index, which expresses the sentiment: "Please make sure no two records in this table are duplicates with respect to this set of fields, as long as this condition on the record holds true (otherwise don't consider this record in the uniqueness constraint)." Does Microsoft Access have a way of expressing this kind of a constraint?

Topic: indexing databases

Category: Data Science
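
For readers unfamiliar with the term, the behaviour described in the question above can be sketched in a few lines. This is only an illustration using SQLite through Python's built-in sqlite3 module, not Microsoft Access, and it does not answer whether Access supports the same thing. The table name `records`, its columns, and the index name `idx_active_email` are hypothetical.

```python
import sqlite3


def demo_partial_unique_index():
    """Show a partial unique index in SQLite (illustrative, not Access)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: uniqueness of email should hold only for active rows.
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)"
    )
    # The WHERE clause restricts the uniqueness constraint to rows with active = 1.
    conn.execute(
        "CREATE UNIQUE INDEX idx_active_email ON records(email) WHERE active = 1"
    )
    conn.execute("INSERT INTO records (email, active) VALUES ('a@example.com', 1)")
    # Same email, but inactive: outside the index condition, so it is accepted.
    conn.execute("INSERT INTO records (email, active) VALUES ('a@example.com', 0)")
    try:
        # Same email and active again: violates the partial unique index.
        conn.execute("INSERT INTO records (email, active) VALUES ('a@example.com', 1)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    row_count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    conn.close()
    return rejected, row_count


if __name__ == "__main__":
    print(demo_partial_unique_index())
```

The duplicate is rejected only when the condition holds for the new row; rows that fail the condition are ignored by the constraint entirely.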

Connect Orange 3.20 to postgresql database

oppia04

September 20, 2021 23:14

I installed Orange 3.20 on Windows 7. It works so far; the problem is connecting it to a server-based Postgres database. While the connection can be made at the moment, when you try to load a table the message "missing extension quantile" comes up. A few problems are coming up with this message. It seems like it is not possible to install this extension on a Windows server without a lot of stress. The extension seems not to be actual …

Topic: orange3 orange databases

Category: Data Science

Do I need to read an entire database for a recommendation system?

Dani

July 24, 2021 18:51

Let's say I have a database with approx 100000 rows. I want to build a content-based recommendation system. Do I really need to read the entire database to calculate similarity? That would be very expensive to do when hosted on AWS, Azure, etc. Additionally, my data is always changing (new data being added, old removed), so I can't just use a constant file. Is there a more cost-effective way?

Topic: cloud nlp recommender-system databases

Category: Data Science

Best image recognition API to implement for eCommerce Lifestyle/Sculpture site

jummypho

July 16, 2021 11:50

I'm planning an eCommerce site currently. We are likely running WooCommerce and looking to implement Algolia for our search features. We feel that for our particular purposes, a visual search would be a crucial feature to implement, due to our product types. For the purpose of my question, I will use the example of sculptures and ceramics, with various forms both abstract and utilitarian, textures, colors, and so forth. The idea is a customer can upload a photo of their …

Topic: training image-recognition software-recommendation databases machine-learning

Category: Data Science

What is the ideal database that allows fast cosine distance?

G4bri3l

July 1, 2021 14:26

I'm currently trying to store many feature vectors in a database so that, upon request, I can compare an incoming feature vector against many others (if not all) stored in the db. I would need to compute the Cosine Distance and only return, for example, the 10 closest matches. Each such vector will be of size ~1000 or so. Every request will have a feature vector and will need to run a comparison against all feature vectors belonging to a …

Topic: feature-extraction databases

Category: Data Science
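
As a point of reference for the question above, the brute-force version of "compare against all stored vectors and return the closest matches" can be sketched in plain Python. This is a minimal in-memory sketch, not a database recommendation: it assumes the stored vectors fit in a dict, and the names `cosine_distance`, `top_k_closest`, and the sample keys are hypothetical.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat a zero vector as maximally dissimilar (similarity 0).
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def top_k_closest(query, stored, k=10):
    """Scan every stored vector and return the k smallest (distance, key) pairs."""
    scored = ((cosine_distance(query, vec), key) for key, vec in stored.items())
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    # Hypothetical stored vectors, kept tiny for readability.
    stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for distance, key in top_k_closest([1.0, 0.0], stored, k=2):
        print(key, round(distance, 4))
```

A full scan like this costs time proportional to the number of stored vectors times their dimension per request, which is the cost the question is trying to avoid at scale.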

How to work with hundreds of CSVs with millions of rows in each?

rick458

June 2, 2021 22:45

So I'm doing a project on the COVID-19 Tweets dataset from the IEEE port and I plan to analyse the tweets over the time period from March 2020 till date. The thing is there's more than 300 CSVs for each data with each having millions of rows. Now I need to hydrate all of these tweets before I can go and filter through them. Hydrating just 1 CSV alone took more than two hours today. I wanted to know if …

Topic: text-filter csv sentiment-analysis databases machine-learning

Category: Data Science

Query category-wise data

Nikhil Ghadi

May 17, 2021 06:53

I have stored some products and their respective categories in SQL, like this. Now I want to fetch all products from the table with their respective categories. The output should be =>

Topic: sql dataset data-cleaning databases data-mining

Category: Data Science

How can I create a table from an existing table in SQL but using cells from the old table as columns in the new table?

PlatinumMaths

April 19, 2021 23:13

I have a table, and I want to create a new table such as the one below (from the table above). In SQL, I tried using the following commands. I am able to generate a table with only one column like this, CREATE TABLE table2 AS SELECT balance FROM table1 WHERE balance='currency' But if I try to use multiple WHERE clauses it doesn't seem to work. I tried to do, CREATE TABLE table2 AS SELECT balance, category FROM table1 WHERE …

Topic: sql databases

Category: Data Science
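
Turning cell values into columns is commonly done with conditional aggregation rather than several WHERE clauses. The sketch below shows that general pattern using SQLite through Python's sqlite3 module. The original tables are not shown in the excerpt, so the layout of `table1` here (columns `account`, `category`, `balance`) and the sample rows are assumptions made purely for illustration, and the question's actual schema may differ.

```python
import sqlite3


def pivot_balances():
    """Pivot category values into columns with CASE inside an aggregate."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical source table in "long" form: one row per (account, category).
    conn.execute("CREATE TABLE table1 (account TEXT, category TEXT, balance INTEGER)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?)",
        [("acct1", "currency", 100), ("acct1", "stock", 250), ("acct2", "currency", 40)],
    )
    # One output column per category value; GROUP BY collapses rows per account.
    conn.execute(
        """
        CREATE TABLE table2 AS
        SELECT account,
               SUM(CASE WHEN category = 'currency' THEN balance END) AS currency,
               SUM(CASE WHEN category = 'stock' THEN balance END) AS stock
        FROM table1
        GROUP BY account
        """
    )
    rows = conn.execute(
        "SELECT account, currency, stock FROM table2 ORDER BY account"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in pivot_balances():
        print(row)
```

An account with no row for a given category gets NULL (None in Python) in that column, because the CASE expression yields no non-NULL values for SUM to add up.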

DBMS or Software for privacy sensitive data

user86825

April 1, 2021 23:05

We have a dataset of highly privacy-sensitive personal data and want to build a database with it. The data protection department in our company doesn't like the idea that the data scientists are able to see any data specific to a person (even if anonymized). We can't preaggregate the data in the database because there are hundreds of different possible aggregations that could be interesting. Is there software or a DBMS that could ensure that users can only query …

Topic: privacy databases

Category: Data Science
