I have a Python Flask application that connects to an Azure SQL Database and uses the pandas `read_sql` method with SQLAlchemy to perform a select on a table and load it into a DataFrame: `recordsdf = pd.read_sql(recordstable.select(), connection)`. The recordstable has around 5,000 records, and the call takes around 10 seconds to execute (I have to pull all records every time). However, the exact same operation with the same data takes around 0.5 seconds when I'm selecting …
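For what it's worth, a first diagnostic step is to time the raw fetch separately from the DataFrame construction, to see whether the network round-trip or pandas itself dominates. A minimal sketch, assuming an SQLAlchemy `engine` alongside the `recordstable` object from the question:

```python
import time
import pandas as pd

# `engine` is assumed to be a SQLAlchemy engine for the same database.
with engine.connect() as connection:
    # Time the bare SQLAlchemy fetch (query + network transfer only).
    t0 = time.time()
    rows = connection.execute(recordstable.select()).fetchall()
    print(f"raw fetch: {time.time() - t0:.2f}s, {len(rows)} rows")

    # Time the same select through pandas.
    t0 = time.time()
    recordsdf = pd.read_sql(recordstable.select(), connection)
    print(f"pd.read_sql: {time.time() - t0:.2f}s")
```

If the raw fetch is already slow, the problem is the driver or connection rather than pandas.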
What I want to do is ensure that every property in a knowledge base comes from at least one source. I would like every edge to be spawned (or at least explained) by some event, such as a "claim", "measurement", or "birth". I'd also like to rate, on a scale, the confidence that a property is correct, which could be inherited from the source's confidence rating. Finally, I want to ensure that effective date(s) are known or …
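As a concrete illustration of the modeling goal (field names here are hypothetical, not from any particular KB system), each edge could carry its spawning event, a confidence score, and an effective date range:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Edge:
    """One property/edge in the knowledge base, with provenance."""
    subject: str
    predicate: str
    obj: str
    source_event: str                 # e.g. "claim", "measurement", "birth"
    confidence: float                 # 0.0-1.0, possibly inherited from the source
    effective_from: Optional[date] = None
    effective_to: Optional[date] = None   # None = still in effect / unknown
```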
I'm in charge of setting up a patient register (100K+ patients) for a non-profit project with little money. This register should provide the basis for later data science. I'm not sure which database solution will work well in the long run. It must be possible for various clinics to enter the data manually into the system. Since I have experience with Django, I have developed a web-app prototype with Django and an SQLite DB (it is not expected that many users …
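If the prototype outgrows SQLite, Django makes the backend swap cheap: the same models and views run unchanged against PostgreSQL, which copes far better with many concurrent clinic users. A minimal sketch of the settings change (all names and credentials are placeholders):

```python
# settings.py -- swap the prototype's SQLite backend for PostgreSQL.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'patient_register',
        'USER': 'register_app',
        'PASSWORD': 'change-me',
        'HOST': 'db.example.org',
        'PORT': '5432',
    }
}
```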
We work on a dataset with >1k features, where some features are temporal/non-linear aggregations of other features; e.g., one feature might be the salary s, while another is the mean salary over four months (s_4_m). We try to predict which employees are more likely to get a raise by applying a regression model to the salary. Still, our models are extremely biased toward features like s_4_m, which are highly correlated with the label. Are there best practices for …
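One common starting point is to inspect feature-label correlations and prune the aggregates that are near-duplicates of the label before fitting. A sketch, assuming a DataFrame `df` with a hypothetical label column `got_raise`:

```python
import pandas as pd

# Rank features by absolute correlation with the (hypothetical) label column.
corr_with_label = df.corr(numeric_only=True)['got_raise'].abs().sort_values(ascending=False)
print(corr_with_label.head(20))

# Drop aggregate features that are near-duplicates of the label.
# The 0.95 threshold is an arbitrary illustration, not a recommendation.
leaky = corr_with_label[corr_with_label > 0.95].index.drop('got_raise')
df_reduced = df.drop(columns=list(leaky))
```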
In order to build a classifier, I need to extract a few features from data stored in a MySQL database. I need to join multiple tables, and it is taking a lot of time. I have joined two tables at a time and got results in several cases, which I now need to combine. Would writing a script be the best option? How do people extract features from large relational databases? Am I missing something? Thanks.
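A common pattern is to push all the joins and aggregations into a single SQL query and read the result straight into a DataFrame, letting the database do the heavy lifting instead of combining partial results in application code. A sketch with hypothetical table and column names:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection URL; table/column names are illustrative.
engine = create_engine("mysql+pymysql://user:pass@host/dbname")

query = """
    SELECT u.user_id,
           u.age,
           COUNT(t.id)  AS n_transactions,
           SUM(t.amount) AS total_spent
    FROM users u
    LEFT JOIN transactions t ON t.user_id = u.user_id
    GROUP BY u.user_id, u.age
"""
features = pd.read_sql(query, engine)  # one row of features per user
```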
I come from a software development background, where we have separate servers of the same database (dev, test, prod). The reason for this is that we develop our apps against the dev DB, run tests against the test DB, and prod is prod; this creates a clear separation, so we won't bring down prod while trying to build our app. Do you train your models the same way? Do you have three environments of the same database, and as your …
I tried to install Basemap and it gives me this:

```
preparing transaction: done
verifying transaction: done
executing transaction: failed
ERROR conda.core.link:_execute(507): An error occurred while uninstalling package 'defaults::conda-4.5.12-py37_0'.
PermissionError(13, 'Access is denied')
Attempting to roll back
Rolling back transaction: done
PermissionError(13, 'Access is denied')
```

Question: What should I do next? I will appreciate your response, as I have been on this for some time now. Thanks. NOTE: I have also tried to install Cartopy, but I ran …
I plan to do data modeling in the financial area for my master's dissertation. I am thinking of finding the connection between certain company or country characteristics (x values) and their creditworthiness (here I am still looking for a y variable, such as credit score, bankruptcy occurrence, etc.). Do you know in which databases I could find the required input? Company data would be great; however, country data is most likely more accessible, and then I could also do …
This is a situation I have met often recently: customers give me a database with many tables that they don't quite understand either, and then ask me to build a model to predict future revenue, classify which users may be valuable, or something else. To be honest, extracting useful data from an unknown database is exhausting. For example, I need to figure out which table is the user table, product table, or transaction table, and which columns can be used to join (there …
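One thing that helps in this situation is introspecting the schema programmatically rather than clicking through tables by hand. A minimal sketch with SQLAlchemy (the connection URL is a placeholder) that lists every table, its columns, and its foreign keys, which usually reveals the join paths:

```python
from sqlalchemy import create_engine, inspect

engine = create_engine("mysql+pymysql://user:pass@host/unknown_db")  # placeholder URL
inspector = inspect(engine)

for table in inspector.get_table_names():
    cols = [c["name"] for c in inspector.get_columns(table)]
    fks = inspector.get_foreign_keys(table)   # each entry names referred table/columns
    print(table)
    print("  columns:", cols)
    print("  foreign keys:", fks)
```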
I have a MySQL database in which new records are added to the raw data every day. This raw data is cleaned, and an ML model is trained on it once a week. What would be the best strategy to capture the new data in the model without fetching all records (old and new) and retraining from scratch? I'm saving the models every week with pickle; can I just fit the previously saved model on the new records? Is this an efficient methodology …
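Whether that works depends on the estimator: calling plain `fit()` on the loaded model retrains it from scratch on only the new records, while estimators that expose `partial_fit()` (SGDRegressor, SGDClassifier, MultinomialNB, and a few others) support true incremental updates. A minimal sketch, assuming weekly arrays `X_new` and `y_new` and an SGD-style model:

```python
import pickle

# Load last week's model (assumed to be an estimator with partial_fit,
# e.g. sklearn.linear_model.SGDRegressor).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Update the weights with only this week's records.
model.partial_fit(X_new, y_new)

# Save the updated model for next week.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```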
I am training a Decision Tree Regressor on a relatively small dataset. The dimensions of my train and test sets are (34164, 10) and (8514, 10). Here is the relevant code:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

y = np.log(data2['price'])
data2.drop(['price'], axis=1, inplace=True)

num_cols = [cname for cname in data2.columns if data2[cname].dtype in ['int64', 'float64']]
cat_cols = [cname for cname in data2.columns if data2[cname].dtype == 'object']

num_trans = SimpleImputer(strategy='mean')
cat_trans = Pipeline(steps=[('impute', SimpleImputer(strategy='most_frequent')),
                            ('onehotencode', OneHotEncoder(handle_unknown=…
```
In many databases (MongoDB comes to mind) there's a way to specify a partial unique index, which expresses the sentiment: "Please make sure no two records in this table are duplicates with respect to this set of fields, as long as this condition on the record holds true (otherwise, don't consider the record in the uniqueness constraint)." Does Microsoft Access have a way of expressing this kind of constraint?
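For reference, this is what the MongoDB version of that constraint looks like via pymongo (the field names are hypothetical): uniqueness on `email` is enforced only for documents where `active` is true:

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient()  # placeholder connection

# Unique index on `email`, but only documents matching the filter
# participate in the uniqueness check.
client.mydb.users.create_index(
    [("email", ASCENDING)],
    unique=True,
    partialFilterExpression={"active": True},
)
```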
I installed Orange 3.20 on Windows 7. It works so far; the problem is connecting it to a server-based Postgres database. While the connection can be made at the moment, when you try to load a table the message "missing extension quantile" comes up. A few problems come up with this message. It seems it is not possible to install this extension on a Windows server without a lot of stress, and the extension seems not to be up to date …
Let's say I have a database with approximately 100,000 rows, and I want to build a content-based recommendation system. Do I really need to read the entire database to calculate similarity? That would be very expensive hosted on AWS, Azure, etc. Additionally, my data is always changing (new data being added, old data removed), so I can't just use a constant file. Is there a more cost-effective way?
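One cost-saving pattern is to build an in-memory neighbor index once, serve recommendations from it, and refresh it on a schedule as rows are added and removed, rather than re-reading the database per request. A sketch, where `item_vectors` and `query_vector` are hypothetical arrays of item features:

```python
from sklearn.neighbors import NearestNeighbors

# Built once (and rebuilt periodically as the data changes), e.g. from a
# hypothetical (100000, n_features) matrix loaded in one pass.
index = NearestNeighbors(n_neighbors=10, metric="cosine").fit(item_vectors)

# Per request: find the 10 most similar items without touching the database.
distances, indices = index.kneighbors(query_vector.reshape(1, -1))
```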
I'm currently planning an eCommerce site. We are likely running WooCommerce and looking to implement Algolia for our search features. We feel that, for our particular purposes, a visual search would be a crucial feature to implement, due to our product types. For the purposes of my question, I will use the example of sculptures and ceramics, with various forms, both abstract and utilitarian, textures, colors, and so forth. The idea is that a customer can upload a photo of their …
I'm currently trying to store many feature vectors in a database so that, upon request, I can compare an incoming feature vector against many others (if not all) stored in the DB. I would need to compute the cosine distance and return only, for example, the 10 closest matches. Each vector will be of size ~1000 or so. Every request will have a feature vector and will need to run a comparison against all feature vectors belonging to a …
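For the comparison step itself, if the stored vectors are normalized once, a top-10 cosine search reduces to a single matrix-vector product. A minimal sketch, where `stored` is a hypothetical (n_vectors, 1000) NumPy array loaded from the DB:

```python
import numpy as np

# Normalize rows once, up front; cosine similarity then becomes a dot product.
stored_norm = stored / np.linalg.norm(stored, axis=1, keepdims=True)

def top_k_cosine(query, k=10):
    """Return indices of the k stored vectors most similar to `query`."""
    q = query / np.linalg.norm(query)
    sims = stored_norm @ q                  # cosine similarity per stored row
    top = np.argpartition(-sims, k)[:k]     # unordered indices of the k best
    return top[np.argsort(-sims[top])]      # sorted best-first
```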
So I'm doing a project on the COVID-19 Tweets dataset from IEEE DataPort, and I plan to analyse the tweets over the period from March 2020 to date. The thing is, there are more than 300 CSVs, each with millions of rows. Now I need to hydrate all of these tweets before I can go and filter through them. Hydrating just one CSV alone took more than two hours today. I wanted to know if …
I have stored some products and their respective categories in SQL, like this. Now I want to fetch all products from the table with their respective categories. The output should be =>
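Since the desired output isn't shown above, the following is only a guess at its shape: one row per category with its products concatenated, assuming a hypothetical table products(name, category):

```python
import sqlite3

conn = sqlite3.connect("shop.db")  # placeholder database

# GROUP_CONCAT collapses each category's product names into one string.
rows = conn.execute(
    "SELECT category, GROUP_CONCAT(name) AS products "
    "FROM products GROUP BY category"
).fetchall()

for category, products in rows:
    print(category, "->", products)
```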
I have a table, and I want to create a new table such as the one below (from the table above). In SQL, I tried the following commands. I am able to generate a table with only one column, like this:

```sql
CREATE TABLE table2 AS SELECT balance FROM table1 WHERE balance='currency'
```

But if I try to use multiple WHERE clauses, it doesn't seem to work. I tried to do:

```sql
CREATE TABLE table2 AS SELECT balance, category FROM table1 WHERE …
```
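Note that a SELECT takes only one WHERE clause; multiple conditions go inside it, combined with AND or OR. A minimal sketch via sqlite3, reusing the question's table names with hypothetical filter values:

```python
import sqlite3

conn = sqlite3.connect("example.db")  # placeholder database

# One WHERE clause, two conditions joined by AND (values are illustrative).
conn.execute("""
    CREATE TABLE table2 AS
    SELECT balance, category
    FROM table1
    WHERE balance = 'currency' AND category = 'savings'
""")
conn.commit()
```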
We have a dataset of very privacy-sensitive personal data and want to build a database with it. The data protection department in our company doesn't like the idea that the data scientists would be able to see any data specific to a person (even if anonymized). We can't pre-aggregate the data in the database, because there are hundreds of different possible aggregations that could be interesting. Is there software or a DBMS that could ensure that users can only query …