Using Python Pandas dataframe to read and insert data to Microsoft SQL Server
This article is originally published at https://tomaztsql.wordpress.com
In the SQL Server Management Studio (SSMS), the ease of using external procedure sp_execute_external_script has been (and still will be) discussed many times. But the reason for this short blog post is the fact that, changing Python environments using Conda package/module management within Microsoft SQL Server (Services), is literally impossible. Scenarios, where you want to build a larger set of modules (packages) but are impossible to be compatible with your SQL Server or Conda, then you would need to set up a new virtual environment and start using Python from there.
Communicating with database to load the data into different python environment should not be a problem. Python Pandas module is an easy way to store dataset in a table-like format, called dataframe. Pandas is very powerful python package for handling data structures and doing data analysis.
Loading data from SQL Server to Python pandas dataframe
This underlying task is something that every data analyst, data engineer, statistician and data scientist will be using in everyday work. Extracting data from Microsoft SQL Server database using SQL query and storing it in pandas (or numpy) objects.
With following code:
## From SQL to DataFrame Pandas import pandas as pd import pyodbc sql_conn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server}; SERVER=SQLSERVER2017; DATABASE=Adventureworks; Trusted_Connection=yes') query = "SELECT [BusinessEntityID],[FirstName],[LastName], [PostalCode],[City] FROM [Sales].[vSalesPerson]" df = pd.read_sql(query, sql_conn) df.head(3)
you will get the first three rows of the result:
Make sure that you configure the SERVER and DATABASE as well as the credentials to your needs. If you are running older version of SQL Server, you will need to change the driver configuration as well.
Inserting data from Python pandas dataframe to SQL Server
Once you have the results in Python calculated, there would be case where the results would be needed to inserted back to SQL Server database. In this case, I will use already stored data in Pandas dataframe and just inserted the data back to SQL Server.
First, create a table in SQL Server for data to be stored:
USE AdventureWorks; GO DROP TABLE IF EXISTS vSalesPerson_test; GO CREATE TABLE vSalesPerson_test( [BusinessEntityID] INT ,[FirstName] VARCHAR(50) ,[LastName] VARCHAR(100))
After that, just simply run the following Python code:
connStr = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server}; SERVER=SQLSERVER2017; DATABASE=Adventureworks; Trusted_Connection=yes') cursor = connStr.cursor() for index,row in df.iterrows(): .. cursor.execute("INSERT INTO dbo.vSalesPerson_test([BusinessEntityID], [FirstName],[LastName]) values (?, ?,?)", row['BusinessEntityID'], row['FirstName'], row['LastName']) .. connStr.commit() cursor.close() connStr.close()
*Python indentation might be broken; use github file.
And the data will be inserted in SQL Server table:
As always, sample code is available at Github.
Happy coding!
Thanks for visiting r-craft.org
This article is originally published at https://tomaztsql.wordpress.com
Please visit source website for post related comments.