A macro to automate the creation of indicator variables in SAS
This article was originally published at https://chemicalstatistician.wordpress.com
In a recent blog post, I introduced an easy and efficient way to create indicator variables from categorical variables in SAS. The method looks like a logistic regression, but its real purpose is to have PROC LOGISTIC output the design matrix under dummy-variable (GLM) coding. I shared the SAS code for doing this step by step.
I write this follow-up post to provide a macro that you can use to execute all of those steps in one line. If you have not read my previous post on this topic, then I strongly encourage you to do that first. Don’t use this macro blindly.
Here is the macro. The key steps are:
- Run PROC LOGISTIC to get the design matrix (which has the indicator variables)
- Merge the original data with the newly created indicator variables
- Delete the “INDICATORS” data set, which was created in an intermediate step
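%macro create_indicators(input_data, target, covariates, output_data);

    /* Step 1: run PROC LOGISTIC only to obtain the design matrix */
    proc logistic data = &input_data noprint outdesign = indicators;
        class &covariates / param = glm;
        model &target = &covariates;
    run;

    /* Step 2: merge the original data with the indicator variables, row by row */
    data &output_data;
        merge &input_data indicators (drop = Intercept &target);
    run;

    /* Step 3: delete the intermediate INDICATORS data set */
    proc datasets library = work noprint;
        delete indicators;
    quit;

%mend;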
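A possible variation on the first step, not part of the macro above: PROC LOGISTIC also accepts an OUTDESIGNONLY option, which is documented as suppressing the model fit and creating only the OUTDESIGN= data set. The sketch below assumes that behaviour and uses SASHELP.CARS with a single CLASS variable purely for illustration.

```sas
/* Sketch only: a variation on the first step of the macro above.          */
/* Assumption: OUTDESIGNONLY skips model fitting and writes only the       */
/* OUTDESIGN= data set, so no regression is actually estimated.            */
proc logistic data = sashelp.cars outdesign = indicators outdesignonly;
    class Type / param = glm;
    model DriveTrain = Type;
run;

/* Inspect the first few rows of the resulting design matrix */
proc print data = indicators (obs = 5) noobs;
run;
```

If this option behaves as described in your SAS version, it could replace NOPRINT in the macro's PROC LOGISTIC statement; the merge and clean-up steps would stay the same.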
I will use the built-in data set SASHELP.CARS to illustrate the use of my macro. As you can see, my macro can accept multiple categorical variables as inputs for creating indicator variables. I will do that here for the variables TYPE, MAKE, and ORIGIN.
%create_indicators(sashelp.cars, DriveTrain, Type Make Origin, cars1);
By executing this one line, I created the data set CARS1, which has the indicator variables for all of the levels within TYPE, MAKE, and ORIGIN.
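A quick sanity check can confirm that the indicators line up with the original categories. The following is a minimal sketch, assuming CARS1 was created by the macro call above; it cross-tabulates TYPE against one of its indicators and then lists every variable in the new data set.

```sas
/* Sketch: cross-tabulate an original variable against one of its indicators. */
/* TypeSUV should equal 1 only in the rows where Type is SUV.                 */
proc freq data = cars1;
    tables Type * TypeSUV / norow nocol nopercent;
run;

/* List the variables in CARS1, in creation order, to see all new indicators */
proc contents data = cars1 varnum;
run;
```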
Here is some code to take a random sample of CARS1 using PROC SURVEYSELECT; I included a seed for you to replicate my results.
proc surveyselect data = cars1 noprint n = 10 seed = 265 out = cars2;
run;

proc print data = cars2 noobs;
    var type:;
run;

proc print data = cars2 noobs;
    var origin:;
run;
Here are the results from the two PROC PRINT steps.
Type | TypeHybrid | TypeSUV | TypeSedan | TypeSports | TypeTruck | TypeWagon |
---|---|---|---|---|---|---|
SUV | 0 | 1 | 0 | 0 | 0 | 0 |
Sedan | 0 | 0 | 1 | 0 | 0 | 0 |
Sports | 0 | 0 | 0 | 1 | 0 | 0 |
Wagon | 0 | 0 | 0 | 0 | 0 | 1 |
SUV | 0 | 1 | 0 | 0 | 0 | 0 |
Sedan | 0 | 0 | 1 | 0 | 0 | 0 |
SUV | 0 | 1 | 0 | 0 | 0 | 0 |
Sedan | 0 | 0 | 1 | 0 | 0 | 0 |
Wagon | 0 | 0 | 0 | 0 | 0 | 1 |
Sedan | 0 | 0 | 1 | 0 | 0 | 0 |
Origin | OriginAsia | OriginEurope | OriginUSA |
---|---|---|---|
USA | 0 | 0 | 1 |
USA | 0 | 0 | 1 |
USA | 0 | 0 | 1 |
USA | 0 | 0 | 1 |
Asia | 1 | 0 | 0 |
Asia | 1 | 0 | 0 |
USA | 0 | 0 | 1 |
Asia | 1 | 0 | 0 |
Asia | 1 | 0 | 0 |
USA | 0 | 0 | 1 |
I encourage you to print the entire data set in SAS to see all of the indicator variables, and to try this macro on your own data set!