I have a background in GIS and am just learning data science using programming languages. Specifically, I am focusing on learning Python to complement my boss's knowledge of R (we are a new department and there are just 3 of us). In my previous research, I always used ArcGIS coupled with data manipulation of CSVs in Excel. My boss says they don't use CSVs because they don't maintain metadata, and most of their files are …
I have 10 datasets, each with the same variables (e.g., age and income) but different numbers of observations. Let us now consider a categorical variable $X$ that can only take values $0$ and $1$ per dataset, meaning that it keeps the same value for all observations. For 5 datasets, $X=0$; for the other 5, $X=1$. How do I create a regression model for a variable of these datasets (e.g., age) that takes into account this "meta-variable" $X$? A simple solution …
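One way to sketch the pooled approach (a minimal example assuming Python with pandas/NumPy, and made-up stand-ins for the 10 datasets): concatenate everything and add $X$ as an indicator column, then regress on it alongside the observation-level variables. The coefficient on $X$ then captures the dataset-level shift.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the 10 datasets: each has the same variables
# but a different number of observations, and a constant meta-variable X
# (0 for the first five datasets, 1 for the rest).
datasets = []
for i in range(10):
    n = int(rng.integers(50, 100))
    age = rng.uniform(20, 60, n)
    income = 1000 + 50 * age + rng.normal(0, 100, n)
    df = pd.DataFrame({"age": age, "income": income})
    df["X"] = 0 if i < 5 else 1          # the per-dataset "meta-variable"
    datasets.append(df)

pooled = pd.concat(datasets, ignore_index=True)

# Fit income ~ intercept + age + X by ordinary least squares.
A = np.column_stack([np.ones(len(pooled)), pooled["age"], pooled["X"]])
coef, *_ = np.linalg.lstsq(A, pooled["income"].to_numpy(), rcond=None)
print(coef)  # [intercept, age slope, X effect]
```

With real data, each `DataFrame` would come from a file instead of being simulated; an interaction term (`age * X`) could be added the same way if the slope is suspected to differ between the two groups.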
There are 4 datasets (all in CSV format), each with a uniqueID column by which each record can be identified. The image and text datasets are dense (they need to be converted to ndarrays). Can someone suggest how to use all 4 of these datasets for building a regression model? This is how the metadata file looks, with some input features and the target variable (views):

uniqueID  ad_blocked  embed  duration  language  hour  views
1         True        True   68        3         10    244
2         False       True   90        1         …
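A possible sketch (Python/pandas, with tiny hypothetical stand-ins for the files; in practice each would come from `pd.read_csv(...)`): merge everything on uniqueID so each record carries all of its input features plus the target.

```python
import pandas as pd

# Miniature stand-in for the metadata file; columns follow the question.
meta = pd.DataFrame({
    "uniqueID": [1, 2],
    "ad_blocked": [True, False],
    "embed": [True, True],
    "duration": [68, 90],
    "language": [3, 1],
    "hour": [10, 21],
    "views": [244, 102],
})

# Dense image/text features would be ndarrays; here one fake feature
# column per modality, keyed by the same uniqueID.
image_feats = pd.DataFrame({"uniqueID": [1, 2], "img_f0": [0.12, 0.87]})
text_feats = pd.DataFrame({"uniqueID": [1, 2], "txt_f0": [0.45, 0.33]})

# Join on uniqueID so every row has all features and the target.
full = meta.merge(image_feats, on="uniqueID").merge(text_feats, on="uniqueID")

X = full.drop(columns=["uniqueID", "views"]).to_numpy()
y = full["views"].to_numpy()
print(X.shape, y.shape)
```

The resulting `X` and `y` can be fed to any regressor; with real image/text arrays, one column per extracted feature replaces the single fake column used here.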
I want to know what metadata is and what is meant by meta-features. When I google "meta-features", what I get is a feature-selection tool called "Meta-Feature". What is the function of feature-selection tools? Also, what I want is the definition and meaning of meta-features.
I have a convolutional neural network and would like to include some metadata. My metadata is in multiple CSV files that correspond to each class, and it contains a bunch of geometric properties (about 8 numerical measurements), specifically revolving around size and volume, that would help classify similar-looking images that have varying size and volume. I am currently using Keras to build my models. What I am unsure of is where and how to add metadata into …
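For what it's worth, here is a minimal sketch of the usual pattern with the Keras functional API (input shapes, layer sizes, and the class count are all hypothetical): the image branch is convolutional, the ~8 metadata measurements enter as a second input, and the two are concatenated before the dense classifier head.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 4                                      # hypothetical
img_in = keras.Input(shape=(64, 64, 1), name="image")
meta_in = keras.Input(shape=(8,), name="metadata")   # 8 geometric measurements

# Image branch.
x = layers.Conv2D(16, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)

# Concatenate the learned image features with the raw metadata,
# then classify on the combined vector.
merged = layers.concatenate([x, meta_in])
h = layers.Dense(32, activation="relu")(merged)
out = layers.Dense(num_classes, activation="softmax")(h)

model = keras.Model(inputs=[img_in, meta_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")

# One dummy forward pass to confirm the shapes line up.
pred = model.predict([np.zeros((2, 64, 64, 1)), np.zeros((2, 8))], verbose=0)
print(pred.shape)  # (2, 4)
```

The metadata would normally be standardized first, since the 8 measurements are on very different scales from the learned image features.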
I'm very new to R and trying to run a multi-level meta-analysis using pre-calculated effect sizes. The data file can be accessed via this link: testrunfile. The script I used as a first step to fit the model was: res <- rma.mv(yi = es_r, v = var, data = testrun, method = "REML", level = 95, digits = 7, slab = ref, random = ~ 1 | samp_id) But I keep getting this error: Error in verbose > 2 …
Having a lot of text documents (in natural language, unstructured), what are the possible ways of annotating them with some semantic meta-data? For example, consider a short document: I saw the company's manager last day. To be able to extract information from it, it must be annotated with additional data to be less ambiguous. The process of finding such meta-data is not in question, so assume it is done manually. The question is how to store these data in a …
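One common storage option is standoff annotation: the raw text is kept unchanged and the semantic metadata lives in separate records that point back into it by character offsets. A minimal JSON sketch (the labels and offsets here are hypothetical, chosen for this example sentence):

```python
import json

doc = "I saw the company's manager last day."

# Standoff annotations: character offsets into the unchanged text,
# plus whatever semantic labels were assigned (manually, per the question).
annotations = [
    {"start": 10, "end": 27, "type": "PERSON_ROLE", "note": "company's manager"},
    {"start": 28, "end": 36, "type": "TIME", "note": "relative time expression"},
]

record = {"text": doc, "annotations": annotations}
serialized = json.dumps(record)

# Round-trip and recover the annotated span from the offsets.
loaded = json.loads(serialized)
span = loaded["text"][annotations[0]["start"]:annotations[0]["end"]]
print(span)
```

The advantage of standoff (over inline markup such as XML tags embedded in the text) is that annotations can overlap and the original document stays byte-for-byte intact.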
Given a social network, I want to perform community detection and compare the result to known node metadata, such as gender, age, etc. to see if certain communities are largely composed of "similar" people. I have seen this done before in visualizations like this: (image from https://arxiv.org/pdf/0809.0690.pdf) where each circle represents a community and the coloring of the circle shows the breakdown of some attribute (e.g. nationality) within that community. Does anyone know what tool can be used to create …
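As a rough sketch of the analysis step (assuming networkx is acceptable; the built-in "club" attribute stands in for gender, nationality, etc.): detect communities, then tabulate the node attribute within each one. The per-community counts are exactly what the pie-chart nodes in such visualizations encode.

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Stand-in social network whose nodes carry a categorical attribute.
G = nx.karate_club_graph()

communities = greedy_modularity_communities(G)
for i, comm in enumerate(communities):
    # Breakdown of the attribute within this community.
    breakdown = Counter(G.nodes[n]["club"] for n in comm)
    print(f"community {i}: {dict(breakdown)}")
```

The resulting breakdowns can then be drawn as pie markers at each community's position with matplotlib, which is one way the linked style of figure is produced.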
I am using a library called MFE to generate meta-features. However, I am working right now with several files, and I notice that I am using only 1 core of my machine and it is taking too much time. I have been trying to use some libraries I saw in another question: library(iterators) library(foreach) library(doParallel). I tried that approach, but I could not get it working ='(. I would just like to run this snippet on all my cores so I …
I am curating a large quantity of data from different sensors. If I know that a particular sensor was broken or poorly calibrated for a particular time range, what would be a useful way of annotating the data to make it clear that the data are of poor quality and / or have known errors? I am thinking a set of key:value pairs (like quality:error, description:'sensor was broken') that I can store in json, yaml, image header (e.g. exif) metadata …
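A minimal sketch of the JSON variant (all field names here are hypothetical): store the flags as a list of records alongside the data, keyed by sensor and time range, so a consumer can look them up and mask any measurements that fall in a flagged range.

```python
import json

# Hypothetical quality-flag record for one sensor and time range,
# stored alongside (not inside) the data itself.
flag = {
    "sensor_id": "temp_07",
    "start": "2023-04-01T00:00:00Z",
    "end": "2023-04-03T12:00:00Z",
    "quality": "error",
    "description": "sensor was broken",
}

with open("quality_flags.json", "w") as f:
    json.dump([flag], f, indent=2)

# A downstream consumer reads the flags back and can filter accordingly.
with open("quality_flags.json") as f:
    flags = json.load(f)
print(flags[0]["quality"], flags[0]["description"])
```

Keeping the flags in a sidecar file like this (rather than editing the raw measurements) preserves the original data while still making the known problems machine-readable.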
How do you understand a dataset when no metadata is given (no details about the attributes in the dataset)? It is difficult to comprehend the attribute names, as only the short forms are given. I have been told that 'pm2.5' is the target variable. How do I work out which independent variables will affect this target variable?
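As a first pass when no metadata exists, summary statistics hint at what each column might be (its range, units, sparsity), and each column's correlation with the known target ranks candidate predictors. A sketch with made-up column names and simulated values:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical abbreviated columns of the kind found in air-quality data.
df = pd.DataFrame({
    "TEMP": rng.normal(15, 8, 200),
    "PRES": rng.normal(1015, 5, 200),
    "Iws": rng.exponential(10, 200),
})
# Simulated target that actually depends on TEMP.
df["pm2.5"] = 80 - 2.0 * df["TEMP"] + rng.normal(0, 10, 200)

print(df.describe())                     # ranges/units hint at what a column is
corr = df.corr()["pm2.5"].drop("pm2.5")  # linear association with the target
print(corr.sort_values())
```

Correlation only captures linear effects, so it is a screening step, not a conclusion; scatter plots against the target and a simple model's feature importances are natural follow-ups.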
If I have a lot of data points describing the price of a used car, how would I find the market value of the car (assuming that the price points in the data set are the only determinant used, and the basis of determination will be the frequency of each price data point for that particular car [the higher the frequency, the better])? A count of absolute value recurrences will not work, as I want to bucket numbers that are similar (less …
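One simple way to bucket similar numbers is a histogram: take the centre of the densest bin as the market value, with the bin width controlling how "similar" two prices must be to count together. A sketch with fabricated prices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical price points for one car model: most cluster near 9000,
# plus some scattered outliers across a wide range.
prices = np.concatenate([
    rng.normal(9000, 300, 80),
    rng.uniform(5000, 15000, 20),
])

# Bucket similar prices together; the bin count (width) is a judgment call.
counts, edges = np.histogram(prices, bins=30)
best = np.argmax(counts)
market_value = (edges[best] + edges[best + 1]) / 2
print(round(market_value))
```

A kernel density estimate (e.g. `scipy.stats.gaussian_kde`) is the smoother version of the same idea: its peak gives a modal price without hard bin edges.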
As part of my thesis I've done some experiments that have resulted in a reasonable amount of time-series data (motion-capture + eye movements). I have a way of storing and organizing all of this data, but it's made me wonder whether there are best practices out there for this sort of task. I'll describe what I've got, and maybe that will help provide some recommendations. So, I have an experiment that requires subjects to use their vision and move their …
The formula for the information given by an event occurring with probability $p$ is: $I = -\log_2 p$. This formula gives the bits of information needed to know the outcome of the event. It captures the intuition that the information needed to know the outcome of an event with probability 1 is 0, as we already know the outcome of the event. So shouldn't the formula give the information as 0 for an event with probability 0, as we know the …
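A quick numeric check of the formula (using Python's `math.log2`), which also shows that as $p \to 0$ the value of $-\log_2 p$ grows without bound rather than returning to 0:

```python
import math

# I = -log2(p): bits needed to learn the outcome of an event of probability p.
print(-math.log2(1.0))    # certain event: 0 bits
print(-math.log2(0.5))    # fair coin flip: 1 bit
print(-math.log2(0.25))   # one of four equally likely outcomes: 2 bits

# As p shrinks toward 0, the information diverges instead of vanishing.
print(-math.log2(1e-9))
```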
I am trying to start a meta-analysis for which I want to extract some 16S-based information from public databases. Moreover, I want to relate this information with any metadata found in the associated studies (everything from environmental variables to sequencing details). For this, I realized some databases are available, like NCBI-Nucleotide, NCBI-SRA and EMBL-EBI-ENA, but I am not sure which one to use or whether I can use them all. How can I filter only whole 16S sequences? Or …
I have data coming from a source system that is pipe-delimited. Pipe was selected over comma since it was believed no pipes appeared in any field, while it was known that commas do occur. After ingesting this data into Hive, however, it has been discovered that, rarely, a field does in fact contain a pipe character. Due to a constraint we are unable to regenerate from source to escape the delimiter or change delimiters in the usual way. However, we …
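One workaround sketch for locating the rare bad rows (the column count and sample lines here are fabricated): a clean row with N columns contains exactly N−1 pipes, so any line with more pipes must contain an embedded one and can be flagged for repair before or during cleanup.

```python
# With 4 expected columns, a clean pipe-delimited line has exactly 3 pipes.
expected_cols = 4
lines = [
    "id|name|city|score",
    "1|alice|london|9",
    "2|bo|b|paris|7",      # embedded pipe inside the name field
]

# Flag every line whose pipe count does not match the expected column count.
bad = [ln for ln in lines if ln.count("|") != expected_cols - 1]
print(bad)
```

The same field-count check can be expressed in HiveQL over the raw text (e.g. comparing the length of a split against the expected column count) to route malformed rows into a quarantine table for manual fixing.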