Scraping mixed elements and passing to SQL

I'm running webscrapes via python which are retrieving data from csv's hosted on the web. I'd like to pass the data into a MSSQL database. An issue I have is the mixed elements/data types in the csv. Here is an example of the data

Item    Val1    Val2
A       100      200
B       101      201
C       Null    -2/2(%)
D       Null    2019-Nov-18

I would like to import all of this data into the db, but the critical data is in the "Val2" column. It contains numbers (formatted as numbers and general), dates, and strings with numbers as above.

I would appreciate your thoughts on two things:

  1. After I get this data into python, whats the best way/package to use to send it to SQL
  2. Do I need to set up the SQL table in a certain way to handle the mixed elements? Should I filter those different values out with python and put them somewhere else?

For 1, I've looked at things like pd.to_sql and iterating over rows and sending via pyodbc. Data won't be particularly large to begin with. I'll be setting up a lot of scrapes so just want some ideas on the "best" way to do this transfer so I don't have to re-design them all in the future

Thanks I know this is open ended but would love to hear any thoughts on the "right" way to do this

Topic web-scraping python-3.x sql

Category Data Science


One way would be to declare that column as a character type. I think this would be a poor solution as it forces all interpretation and processing of the data back onto the application. Better would be to declare Val2 as datatype sql_variant. At least then the DBMS has a fighting chance of interpreting the value correctly.

Since each of those datatypes represents a different attribute with unique semantics it would be better to separate them into their own columns. SQL Server has the TRY_PARSE function which allows you to discover the datatype of each value and assign it to its proper column.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.