Python / R News / Visualization

Using Python with Microsoft Reporting Services (SSRS)

by tomaztsql · November 19, 2018

This article is originally published at https://tomaztsql.wordpress.com

Using Python with SQL Server 2017 SSRS should not be an obstacle, since the framework and the technology is the same as with R language. With SQL Server 2017, Python got a full and functional support for native SSRS.

The usability and functionality of Python is simply immense. This blog post will be dedicated on using Python for data science in SSRS

Most of the samples will be derived from the Adventure-Works, both SQL Server sample datasets are available at Microfost Github.

Selecting Python results with SSRS

External procedure sp_execute_external_script within SQL Server database using Python language does not need to have the ODBC drivers or python modules like pyodbc or sqlachemy for extracting or writing data between Sql server engine and python engine, the only module needed is Python pandas, since the communication between sql server requires data frames for exposing data to python (from sql server), respectively.

-- Check Python runtime
EXECUTE sp_execute_external_script
       @language =N'Python'
      ,@script=N'OutputDataSet = InputDataSet'
      ,@input_data_1 = N'SELECT 1 AS result'
WITH RESULT SETS
((
result INT
))
GO

Please note: Python code, as specified with @script parameter, must be validated and is sensitive to indentation. So single missing (or excessive) indent might cause the complete query to fail.

We will introduce the input parameters for better python data selection. At the same time, this parameter will be used as input parameter in SSRS. T-SQL query with python script:

-- query parametrization
DECLARE @MaritalStatusIN CHAR(1) = 'S' -- S/M Single/Married

EXECUTE sp_execute_external_script
@language =N'Python',
@script = N'
import pandas as pd
df = InputDataSet
df_gender = df[''MaritalStatus''] == MaritalStatus
df_gen = df[df_gender]
correlation = df_gen.corr(method=''pearson'')
OutputDataSet = pd.DataFrame(correlation, columns=["nof","age"])',
@input_data_1 = N'SELECT 
               COUNT(*) AS nof,MaritalStatus,age 
               FROM AdventureWorksDW2014.dbo.vTargetMail
               WHERE age < 100
               GROUP BY maritalstatus,age'
,@params = N'@MaritalStatus CHAR(1)'
,@MaritalStatus = @MaritalStatusIN

WITH RESULT SETS 
(( 
   CountObs FLOAT
  ,Age FLOAT 
));
GO

For the purposes of SSRS, I will encapsulate the upper code into stored procedure with one input parameter – that is Marital status. This same parameter will be propagated into python script.

EXEC SelectingPythonResults
      @MaritalStatusIN = 'S'

With results presented through SSRS with option of selecting the values.

Predicting and scoring with Python in SSRS

In addition to selecting filtered parameters in SSRS, we can also check how our prediction model is working.

For this manner, we will create a slightly different subset and we will try to predict the number of cars owned by the customers in the dataset, based on the age and the daily commute distance. Query for the dataset will the following:

--- Query for scoring the model
SELECT 
   age 
  ,NumberCarsOwned
  ,CAST(REPLACE(LEFT(CommuteDistance,2),'-','') AS TINYINT) as CommuteDistance
FROM AdventureWorksDW2014.dbo.vTargetMail
WHERE
    age < 100

We will run a simple linear regression with dependent (y) variable = NumberCarsOwned and age + CommuteDistance both as X variables. We will store couple of different versions of prediction model in database, trained on different size of training subsets; 20%, 30%, 40% and 50% of the original dataset for training purposes. All four different models will be serialized and stored in a table.

Let’s create a table to store serialized models:

-- creating table for storing models
CREATE TABLE PredictingWithPy_models
(model VARBINARY(MAX)
,modelName VARCHAR(100)
,trainSize FLOAT
)

Now, we will use Scikit-learn python module to train the model based on different subsets of original dataset.

CREATE OR ALTER PROCEDURE [dbo].[RunningPredictionsPy] 
(
  @size FLOAT --- format: 0.3 or 0.4 or 0.5 
 ,@name VARCHAR(100) 
 ,@trained_model varbinary(max) OUTPUT
)
AS
BEGIN
EXEC sp_execute_external_script
  @language = N'Python'
 ,@script = N'
import numpy as np
import pandas as pd
import pickle
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

##Create SciKit-Learn linear regression model
X = df2[["age", "CommuteDistance"]]
y = np.ravel(df2[["nofCars"]])
name = name
##Create training (and testing) variables based on test_size
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=size)

## CreateLinear Model
SKL_lr = LinearRegression()
linRegObj = SKL_lr.fit(X_train, y_train)

##Serialize model
trained_model = pickle.dumps(linRegObj)'

 ,@input_data_1 = N'SELECT * FROM vTargetMail'
 ,@input_data_1_name = N'df2'
 ,@params = N'@trained_model varbinary(max) OUTPUT, @size FLOAT, @name VARCHAR(100)'
 ,@trained_model = @trained_model OUTPUT
 ,@size = @size
 ,@name = @name
END;
GO

And executing this procedure to store trained models in previously created table:

DECLARE @model VARBINARY(MAX);
EXEC [RunningPredictionsPy] 
    @size = 0.2  
   ,@name = 'Ln_20'
   ,@trained_model = @model OUTPUT
INSERT INTO PredictingWithPy_models (model, modelName, trainSize) 
VALUES(@model, 'Ln_20', 0.2);
GO

We repeat, so that at the end, we end up with percentage for 20, 30, 40 and 50; complete list is available at GitHub. At the end, the table with stored models should hold 4 lines with serialized models.

Now that we have trained the models, we want to predict the results, using SSRS.

For this manner, we will need another stored procedure, that will have X variables (independent variables: age and Commutedistance) as input parameters, besides selected model and have the prediction returned (Y variable – Number of cars owned).

Following procedure will be used in SSRS:

-- CREATE Procedure to predict the number of cars owned
CREATE OR ALTER PROCEDURE [dbo].[RunningPredictionWithValesPy] 
(
   @model varchar(100)
  ,@age INT
  ,@commuteDistance INT
)
AS
BEGIN

DECLARE @modelIN VARBINARY(MAX) = 
    (SELECT model FROM PredictingWithPy_models WHERE modelName = @model)
-- Create a T-SQL Query 
DECLARE @q NVARCHAR(MAX) = N'SELECT '+CAST(@age AS VARCHAR(5))+' AS age, 
         '+ CAST(@commuteDistance AS VARCHAR(5))+' AS CommuteDistance'

-- Store the new values for prediction in temp table
DROP TABLE IF EXISTS #t
CREATE TABLE #t (age INT, CommuteDistance INT)
INSERT INTO #t (age, CommuteDistance)
EXEC sp_executesql @q

EXEC sp_execute_external_script
@language = N'Python'
,@script = N'
import pickle
import numpy as np
import pandas as pd
from sklearn import metrics

##Deserialize model
mod = pickle.loads(modelIN)
X = InputDataSet[["age", "CommuteDistance"]]

##Create numpy Array when you introducte more values at the same time (bulk prediction)
predArray = mod.predict(X)
#predList = []
#for i in range(len(predArray)):
#     predList.append((predArray[i])[1])

OutputDataSet = pd.DataFrame(data = predArray, columns = ["predictions"])' 
  ,@input_data_1 = N'SELECT * FROM #t'
  ,@input_data_1_name = N'InputDataSet'
  ,@params = N'@modelIN varbinary(max)'
  ,@modelIN = @modelIN
WITH RESULT SETS 
((
prediction_Score FLOAT
));
END
GO

Checking for the procedure to work fine, the following T-SQL query will run the predictions against new dataset:

EXEC [RunningPredictionWithValesPy]
   @model = 'Ln_30'
  ,@age = 44
  ,@commuteDistance = 1

We can integrate the procedure into reporting services (SSRS) and have the combination of the input values displayed with the selected model (size of test set):

with the results of predictions in SSRS or in SSMS; both return same results:

All reports featured here will also be available at GitHub.

Visualizing results with Python in SSRS

As we have created four different models, we would also like to have the accuary of the model visually represented using SSRS.

Showing plots created with Python might not be as straight forward, as with R Language.

Following procedure will extract the data from database and generate plot, that can be used and visualized in SSRS.

CREATE OR ALTER PROCEDURE [dbo].[VisualizeWithPyR2] 
(
     @inputVariable VARCHAR(100)
)
AS
BEGIN

DECLARE @q NVARCHAR(MAX) = N'SELECT '+CAST(@inputVariable AS VARCHAR(50))+' AS val1 FROM vTargetMail'
-- Store the values in temp table
DROP TABLE IF EXISTS #t
CREATE TABLE #t (val1 FLOAT)
INSERT INTO #t (val1)
EXEC sp_executesql @q

EXEC sp_execute_external_script
  @language = N'Python'
  ,@script = N'
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 5))
plt.plot(plotdataset)
fig.savefig(''C:\\\PyGraphs\\firstGraph.png'') 
OutputDataSet = pd.DataFrame(data =[1], columns = ["plot"])' 
,@input_data_1 = N'SELECT * FROM #t'
,@input_data_1_name = N'plotdataset'
WITH RESULT SETS 
((
    plot INT
));
END
GO

So far, I have found this as to be the simplest way – saving the plot on same server, where SSRS is running and exposing the disk path to SSRS. Python stores the graph to disk on the host / client / server with following command:

fig = plt.figure(figsize=(12, 5))
plt.plot(plotdataset)
fig.savefig(''C:\\\PyGraphs\\firstGraph.png'')

Resulting in SSRS as:

In Report builder, one additional setting worth mentioning is the image properties, where the path to the saved plot should be the same as the one presented in Python code. Savefig function in python is pointing to same location as the external source for this image setting, with function as: file://C://PyGraphs//firstGraph.png.

Evaluating all four linear regression models, based on the size of the test sample size, can be calculated using R2 and accompanying python code:

from sklearn.metrics import r2_score
def R2EvalMod(test_sizeIN):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_sizeIN)
    SKL_lr = LinearRegression()
    linRegObj = SKL_lr.fit(X_train, y_train)
    predArray = linRegObj.predict(X_test)
    R2 = r2_score(y_test, predArray) 
    return R2

sql_conn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER=TOMAZK\\MSSQLSERVER2017;DATABASE=SQLPY;Trusted_Connection=yes') 
query3 = '''
SELECT trainSize FROM [dbo].[PredictingWithPy_models]
'''
df3 = pd.read_sql(query3, sql_conn)

R2List = []
for i in range(len(df3)):
    z = df3.at[i, 'trainSize']
    R2=R2EvalMod(z)
    R2List.append(([z],[R2]))
 
labels = ['trainsize', 'R2']
df33 = pd.DataFrame.from_records(R2List, columns=labels)

With the fact that, better the model the higher R2 is – between 0 and 1 (also negative -1 to 0), showing that the model R2 is best when using 30% of dataset for testing and 70% for training.

As always, code is available at GitHub.

Happy Pythoning.

Thanks for visiting r-craft.org
This article is originally published at https://tomaztsql.wordpress.com
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Using Python with Microsoft Reporting Services (SSRS)

You may also like...

Categories

Using Python with Microsoft Reporting Services (SSRS)

Selecting Python results with SSRS

Predicting and scoring with Python in SSRS

Visualizing results with Python in SSRS

You may also like...

R or Python

No surprises: More people tweet more. Visualizing twitter counts during election day.

Feeling the rstudio::conf ❤️

Categories