data-formats

Converting data format

Leonardo Becarelli

May 16, 2022 04:07

I'm trying to use the recent COVID-19 data from the Italian Civil Protection site, but it uses a rather complicated time format that, as a novice, I find troublesome to plot in a graph. This is how the data is presented: [1] 2020-02-24T18:00:00 2020-02-25T18:00:00 2020-02-26T18:00:00 2020-02-27T18:00:00 2020-02-28T18:00:00 2020-02-29T18:00:00 and I would like to use the format DD-MM, without the time and the year. How can I do it?

Topic: rstudio data-formats dataset r

Category: Data Science

R code making 1 column into multiple columns with their unique ID

codingc0nfusions

May 3, 2022 23:55

Currently stuck on a data wrangling question in R. So far I've tried variations of this code using tidyverse package, columns 5 and 6 here were the rating and the user: df[,5:6] %>% pivot_wider(names_from = question, values_from = rating, names_sep = ".") %>% unnest(cols = everything())-> df_reformat Each column will be the question ID and the rows are the scores for each user, ideally clustered by group. Data structure needed: repID user Customer question 1 Customer question 2 .... Customer …

Topic: dplyr data-wrangling data-formats data-cleaning r

Category: Data Science

Storing Large dataset for processing and analysis of data

user14519285

April 4, 2022 03:02

I am new to data engineering and want to know the best way to store more than 3000 GB of data for further processing and analysis. I am specifically looking for open-source resources. I have explored many data formats for storage. The dataset that I want to store is heart rate pulse data generated by a sensor.

Topic: data-engineering data-analysis data-formats dataset processing

Category: Data Science

Python: convert variables into correct format for DataFrame

newbieeeee

March 30, 2022 20:01

I have 3 variables that I would like to use to build my dataset, but since they are in a weird shape/format, I have had no success so far. I'm quite new to this and really appreciate any help!! The 3 variables I have are: print(newspaper) ['Bolero'] ['Schweizer Illustrierte Style'] ['Bolero'] print(title) ['Schönheit und Tragik'] ['magie pur'] ['Das sind unsere Favoriten'] print(pubDate) ['2007-01-01'] ['2007-01-01'] ['2007-01-01'] It seems like all variables are lists of lists, but I'm not quite sure. …
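One possible way to handle the shape described above, as a minimal sketch. It assumes each of the three variables is a list of one-element lists (the question itself is unsure about this), and the helper name `flatten` is hypothetical.

```python
import pandas as pd

# Assumed shape: each variable is a list of single-element lists.
newspaper = [["Bolero"], ["Schweizer Illustrierte Style"], ["Bolero"]]
title = [["Schönheit und Tragik"], ["magie pur"], ["Das sind unsere Favoriten"]]
pubDate = [["2007-01-01"], ["2007-01-01"], ["2007-01-01"]]


def flatten(nested):
    """Turn a list of lists into one flat list (hypothetical helper)."""
    return [item for inner in nested for item in inner]


# One column per variable; the dates are parsed into real datetime values.
df = pd.DataFrame(
    {
        "newspaper": flatten(newspaper),
        "title": flatten(title),
        "pubDate": pd.to_datetime(flatten(pubDate)),
    }
)
```

If the variables turn out to be printed one at a time inside a loop instead, the values would need to be collected into lists first.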

Topic: dataframe data-formats python

Category: Data Science

Running a query in R after establishing dbconnect

ColRow

March 29, 2022 14:01

I cannot seem to figure out what is wrong with the following statement. The connection to the DWH is established, but the query statement in R does not seem to work and gives the following error: LR=dbGetQuery(con, "select id as ID, date_c."Professional_Status" as Prof_Status, case when talk_sec >= 5 then 1 else 0 end as Established_Connection from id_collect as id_c left join date_conncet as date_c on id_c.date=date_c.date where date::date = '2018-01-19' and country = 'IT' and type = 'shop' and …

Topic: sql data-formats r

Category: Data Science

How to drop the previous rows of a database based on a matching value in a column?

DaddyMuffin

December 9, 2021 03:23

So I am currently trying to sort through a data frame containing attribute classes and values of teams. However, my data has multiple rows of different classes and values of the same Team ID/Attribute ID. I was wondering if there was a faster way to get just the last row of each of the same Team IDs/Attribute IDs.
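A minimal sketch of one way to keep only the last row per ID pair with pandas. The column names `team_id`, `attribute_id`, and `value` are assumptions, since the question does not show the real data frame.

```python
import pandas as pd

# Hypothetical data: several rows share the same team/attribute pair.
df = pd.DataFrame(
    {
        "team_id": [1, 1, 2, 2, 2],
        "attribute_id": [10, 10, 20, 20, 30],
        "value": [0.1, 0.2, 0.3, 0.4, 0.5],
    }
)

# Keep only the last occurrence of each (team_id, attribute_id) pair,
# relying on the existing row order to define what "last" means.
last_rows = df.drop_duplicates(subset=["team_id", "attribute_id"], keep="last")
```

This avoids an explicit loop; if "last" should mean the most recent by some timestamp column, the frame would need to be sorted on that column first.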

Topic: colab data-formats pandas dataset python

Category: Data Science

Date time conversion in a CSV column

RocketBlaster05

December 1, 2021 14:25

I am new to data science. I am attempting to write a program using regression techniques, and all of my values are numerical, except for the date and time (UTC), which are written in this format: HH:MM:SS MM/DD/YY. The date and time are a part of a CSV file and I do not know how to alter the column. I have looked around for how to convert this to a numerical value, but all the results put the date before …
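A minimal sketch of turning the HH:MM:SS MM/DD/YY format described above into a number, using only the standard library. It assumes the values are UTC, as the question states; the function name `to_epoch_seconds` and the sample value are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_seconds(stamp):
    """Parse 'HH:MM:SS MM/DD/YY' (assumed UTC) into seconds since 1970-01-01."""
    parsed = datetime.strptime(stamp, "%H:%M:%S %m/%d/%y")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


# Sample value in the time-before-date layout from the question.
seconds = to_epoch_seconds("14:25:00 12/01/21")
```

The same format string, `"%H:%M:%S %m/%d/%y"`, can be passed as the `format` argument of `pandas.to_datetime` to convert a whole CSV column at once.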

Topic: dataframe data-formats python

Category: Data Science

Advantage of a treebank in XML format

Ahmad

July 13, 2021 01:00

Which treebanks are based on an XML format? What is the advantage of an XML format for a treebank? I think it may affect annotating and querying the treebank. For example, LASSY and Alpino, or TIGER, are in XML format.

Topic: data-formats nlp

Category: Data Science

What are the advantages of HDF compared to alternative formats?

IharS

July 2, 2021 00:43

What are the advantages of HDF compared to alternative formats? What are the main data science tasks where HDF is really suitable and useful?

Topic: hierarchical-data-format data-formats

Category: Data Science

Is there any way to analyze the format of text strings?

cosmarchy

May 7, 2021 05:24

I have a lot of data which basically consists of alphanumeric text on individual lines which can vary in length and contain delimiters. Since there are many thousands of lines of text, I'm looking to see whether there is an automated way to determine the different formats of text. A sample of which is: 90665013-163 90731046-103 90840069-009 90847069-009 90880046-103 90889046-103 90897-051 9089744-103 9089844-103 90901-46909 90901-lep 9091046-103 9091046-909 90764046-1037 can10043E can90065-op016 9094344-103 90669j4-4438718 90666ie79 90664046-103 90710-077 004-919 4A1900935 can90064-op016 can90066-E016 9094544-103 …

Topic: data-analysis data-formats

Category: Data Science

How to store efficiently very large sparse 3D matrices

hH1sG0n3

May 6, 2021 05:03

To train a CNN, I have stacked arrays of images over observations [observations x width x length]. The dataset is very sparse ($95\%$). What would be an efficient way of storing these matrices in terms of format (e.g. pickle, parquet) and structure (e.g. scipy.sparse.csr_matrix, List of Lists)?
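A minimal sketch of one option among several, not a benchmarked recommendation: flatten each image so the 3D stack becomes a 2D [observations x (width * length)] matrix, store it as a SciPy CSR matrix, and write it with `scipy.sparse.save_npz`. The array sizes, random data, and file name `images.npz` are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Hypothetical stack of 10 images of 16 x 16 pixels, roughly 95% zeros.
rng = np.random.default_rng(0)
dense = rng.random((10, 16, 16))
dense[dense < 0.95] = 0.0

# SciPy sparse matrices are 2D, so flatten each image into one row.
n_obs, width, length = dense.shape
flat = sparse.csr_matrix(dense.reshape(n_obs, width * length))

# Round-trip through a compressed .npz file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "images.npz")
    sparse.save_npz(path, flat)
    loaded = sparse.load_npz(path)

# Restore the original 3D shape when dense images are needed for training.
restored = loaded.toarray().reshape(n_obs, width, length)
```

The original shape has to be kept alongside the file (here in `n_obs`, `width`, `length`), because the flattened matrix no longer records it.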

Topic: cnn data-formats bigdata

Category: Data Science

NCHW vs NHWC in Machine Learning

CovertKoala

February 12, 2021 22:05

As I've been introducing myself to the various deep learning frameworks, I've noticed a difference in the default placement of channels for images. Is there a substantial difference between the NCHW and NHWC layouts? Why would I choose one over the other?
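To make the two layouts concrete, here is a minimal NumPy sketch that converts a batch between them. The batch dimensions are arbitrary assumptions; the letters stand for batch size (N), channels (C), height (H), and width (W).

```python
import numpy as np

# Hypothetical batch: 8 images, 3 channels, 32 pixels high, 64 pixels wide.
batch_nchw = np.zeros((8, 3, 32, 64), dtype=np.float32)

# NCHW -> NHWC: move the channel axis (index 1) to the end.
batch_nhwc = np.transpose(batch_nchw, (0, 2, 3, 1))

# NHWC -> NCHW: move the channel axis (index 3) back to position 1.
round_trip = np.transpose(batch_nhwc, (0, 3, 1, 2))
```

The data is the same in both cases; only the axis order, and therefore the memory access pattern, differs.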

Topic: convolutional-neural-network computer-vision data-formats

Category: Data Science

What is the most used format to save data with type information

Pieter

February 10, 2021 17:50

I am exporting data from an SQL database and importing it into R. This is a two step process since I first (automatically) download the data to a hard drive and then import the file with R. Currently, I am using csv files to save the data. Everybody supports csv. But csv does not support type information. This makes it sometimes cumbersome to load a csv file because I must check all the column types. This seems unnecessary because the …

Topic: data data-formats databases

Category: Data Science

Connecting Infusionsoft data to Google data studio

Pravin

August 25, 2020 18:10

I want to create a Google Data Studio dashboard from Infusionsoft data. The main problem is the connectors: there are multiple tools that provide direct connectors, but they are paid solutions like Klipfolio, Clicdata, Grow etc. If a direct connection is not possible, I want to use some combination of Google Sheets and Zapier or other free tools to create a data flow that can be constantly refreshed for data coming in from "infusionsoft" to "Google data studio" …

Topic: google-data-studio data data-formats google

Category: Data Science

ValueError: could not convert string to float: '��'

cappy0704

January 27, 2020 17:23

I have a (2M, 23) dimensional numpy array X. It has a dtype of <U26, i.e. unicode string of 26 characters. array([['143347', '1325', '28.19148936', ..., '61', '0', '0'], ['50905', '0', '0', ..., '110', '0', '0'], ['143899', '1325', '28.80434783', ..., '61', '0', '0'], ..., ['85', '0', '0', ..., '1980', '0', '0'], ['233', '54', '27', ..., '-1', '0', '0'], ['��', '�', '��', ..., '�', '��', '��']], dtype='<U26') When I convert it to a float datatype, using X_f = X.astype(float) I get the …
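A minimal sketch of one way to locate and drop the unparseable rows before converting. It assumes the garbled cells are U+FFFD replacement characters, as the printed array suggests; the small sample array and the helper name `to_float_or_nan` are hypothetical.

```python
import numpy as np


def to_float_or_nan(text):
    """Convert one string to float, returning NaN when it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return np.nan


# Small stand-in for the (2M, 23) array; the last row mimics the garbled one.
X = np.array(
    [["143347", "1325", "28.19148936"], ["\ufffd\ufffd", "\ufffd", "0"]],
    dtype="<U26",
)

# Element-wise conversion that tolerates bad cells instead of raising.
X_f = np.vectorize(to_float_or_nan, otypes=[float])(X)

# Rows containing at least one unparseable cell, then the cleaned array.
bad_rows = np.isnan(X_f).any(axis=1)
X_clean = X_f[~bad_rows]
```

`np.vectorize` is a convenience loop rather than a fast path, so for two million rows it may be worth checking only the suspicious rows (for example the last one) first.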

Topic: dataframe csv data-formats python

Category: Data Science

Best file format for transfer of EHR data

user0

October 24, 2019 17:38

I am working on a clinical trial where we have several sites sending us EHR data. The sites are currently sending the data in Excel files. I have a feeling someone's opening the files because 3 of the files have 64,999 rows exactly, and Excel 2007 cuts off at 65,000. I am working in Python, but I am trying to prevent the people at the local sites from opening the files in Excel. What's the best format for the files …

Topic: data data-formats

Category: Data Science

Containing multicomponent data in rows or columns

hko

August 17, 2019 07:01

I have been working with DNA sequences and compiled a table with features from those sequences. I have a column called Trimer, which contains strings. For some DNA sequences there is one trimer of interest so that column contains one 3 character string (i.e. "ATG"). For other rows in the table that trimer column has 2 or 3 trimers of interest so the Trimer column has multiple strings in it (i.e. "ATT, CTG, GAT"). All trimers from one sequence should …
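A minimal sketch of one possible layout, assuming the goal is one trimer per row (the excerpt is truncated, so the actual requirement may differ). The `sequence_id` column and the sample values are hypothetical; only the `Trimer` column name comes from the question.

```python
import pandas as pd

# Hypothetical table: one row per sequence, trimers packed into one string.
df = pd.DataFrame(
    {
        "sequence_id": ["s1", "s2"],
        "Trimer": ["ATG", "ATT, CTG, GAT"],
    }
)

# Split the packed string into a list, then give each trimer its own row.
long_df = (
    df.assign(Trimer=df["Trimer"].str.split(", "))
    .explode("Trimer")
    .reset_index(drop=True)
)
```

The sequence identifier is repeated on every row, so the trimers belonging to one sequence can still be grouped back together with `groupby("sequence_id")`.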

Topic: preprocessing data-formats data-cleaning

Category: Data Science

Getting stock data in a disciplined manner from Yahoo Finance

coding_ninza

July 19, 2019 14:57

I used the code below to download stock data from Yahoo Finance: import yfinance as yf import datetime stocks = ["AXISBANK.NS", "HDFCBANK.NS", "ICICIBANK.NS" ,"INDUSINDBK.NS", "KOTAKBANK.NS", "SBIN.NS", "YESBANK.NS"] start = datetime.datetime(2018,1,1) end = datetime.datetime(2019,7,17) data = yf.download(stocks, start=start, end=end) data I get the data in the manner below: I saved the data using pandas: import pandas as pd df = pd.DataFrame(data) # saving the dataframe df.to_csv('BANKING STOCK.csv') I got the data in this format: But I want my data in this …

Topic: csv data-formats pandas python

Category: Data Science

.h5 file format does not close properly

Fatemeh Asgarinejad

May 27, 2019 02:06

import h5py #added hf = h5py.File('../images.h5', 'w') #added hf.close() #added h5_file = tables.open_file("images.h5", mode="w") I also tried: h5py.File.close(hf) the error that pops up in both cases is: ValueError: The file 'restricted_images.h5' is already opened. Please close it before reopening in write mode. I've also tried: if isinstance(obj, h5py.File): # Just HDF5 files obj.close() while In[]: hf Out[]: <Closed HDF5 file> , the file is not closed yet.

Topic: data-formats python

Category: Data Science

Labeling data as having an error?

David LeBauer

May 10, 2019 20:55

I am curating a large quantity of data from different sensors. If I know that a particular sensor was broken or poorly calibrated for a particular time range, what would be a useful way of annotating the data to make it clear that the data are of poor quality and / or have known errors? I am thinking a set of key:value pairs (like quality:error, description:'sensor was broken') that I can store in json, yaml, image header (e.g. exif) metadata …
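The key:value idea described above could be sketched as a JSON sidecar of time-ranged annotations. This is an illustration under assumptions, not an established standard: the field names `sensor_id`, `start`, and `end`, the sample values, and the helper `quality_flags` are all hypothetical; `quality` and `description` come from the question.

```python
import json
from datetime import datetime

# Hypothetical annotation records: one entry per known-bad time range.
annotations = [
    {
        "sensor_id": "sensor-01",
        "start": "2019-05-01T00:00:00",
        "end": "2019-05-03T00:00:00",
        "quality": "error",
        "description": "sensor was broken",
    }
]


def quality_flags(sensor_id, timestamp, records):
    """Return the annotations whose time range covers the given reading."""
    moment = datetime.fromisoformat(timestamp)
    return [
        record
        for record in records
        if record["sensor_id"] == sensor_id
        and datetime.fromisoformat(record["start"])
        <= moment
        < datetime.fromisoformat(record["end"])
    ]


# Serialize to JSON text (as it would be stored), then query a reading.
sidecar_text = json.dumps(annotations, indent=2)
flags = quality_flags("sensor-01", "2019-05-02T12:00:00", json.loads(sidecar_text))
```

Keeping the annotations separate from the measurements leaves the raw data untouched and lets the same records be exported to YAML or embedded in file headers later.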

Topic: data-formats metadata

Category: Data Science
