Multivariate scoring analysis techniques
Aims
This course covers predictive modelling with the SAS/STAT software, with a particular focus on PROC LOGISTIC. Decision trees are developed in Enterprise Miner. The aim of the course is to build a complete predictive process for a binary target event. It illustrates how to identify and define the event correctly, select the explanatory variables, assess the models, treat missing values, and manage large volumes of data efficiently.
Who should attend
Statistical analysts, data mining experts and business users. The topics covered focus on marketing databases, credit risk assessment, fraud detection and predictive modelling applications in general.
Prerequisites
Basic experience with the SAS language and at least a basic knowledge of statistics are required. Basic experience in data analysis is recommended.
Course outline
Database preparation
- Defining the phenomenon to be analysed (time window of the analysis)
- Identifying data sources
- Designing and Constructing the Customer Table
- Constructing the TARGET variable
- Defining the development sample (TRAINING/VALIDATION)
- Analysis of characteristics (missing values, outliers, ...)
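
As an illustration of the TRAINING/VALIDATION step above, here is a minimal Python sketch of a stratified split that keeps the event rate of the TARGET variable the same in both samples. It is not the course's SAS code: the function name `stratified_split`, the 70/30 proportion, the field names and the toy customer table are all assumptions made for this example.

```python
import random


def stratified_split(rows, target_key="target", train_fraction=0.7, seed=42):
    """Split rows into training and validation samples, class by class,
    so that both samples keep the original event rate."""
    rng = random.Random(seed)

    # Group the rows by the value of the target variable (0 or 1).
    by_class = {}
    for row in rows:
        by_class.setdefault(row[target_key], []).append(row)

    training, validation = [], []
    for members in by_class.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        training.extend(shuffled[:cut])
        validation.extend(shuffled[cut:])
    return training, validation


# Hypothetical customer table: 100 customers, 10 of them with the event.
customers = [
    {"customer_id": i, "target": 1 if i % 10 == 0 else 0} for i in range(100)
]
training, validation = stratified_split(customers)

print(len(training), len(validation))
print(sum(r["target"] for r in training), sum(r["target"] for r in validation))
```

Splitting within each class matters when the event is rare: a simple random split could leave the validation sample with too few events to assess the model.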
Logistic Regression
- Underlying model assumptions
- Parameter estimates
- Model significance
- Significance of individual regressors
- Fit diagnostics
- Residual analysis
- Influence analysis
- Interactions between variables
- Multicollinearity
- Selection procedures
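
To make the idea of parameter estimation concrete, the following Python sketch fits a logistic regression with a single regressor by maximising the log-likelihood with plain gradient ascent. It is only an illustration under stated assumptions: the course itself uses PROC LOGISTIC, whose estimation algorithm is not reproduced here, and the data, learning rate, iteration count and function names are invented for the example.

```python
import math


def sigmoid(z):
    """Logistic function: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Estimate intercept and slope by gradient ascent on the log-likelihood."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad_intercept = 0.0
        grad_slope = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(intercept + slope * x)  # observed minus predicted
            grad_intercept += error
            grad_slope += error * x
        # Move in the direction that increases the (average) log-likelihood.
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept, slope


# Toy data (hypothetical): the event becomes more likely as x grows,
# but the classes overlap, so the estimates stay finite.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_logistic(xs, ys)


def predict(x):
    """Estimated probability of the event for a given value of x."""
    return sigmoid(intercept + slope * x)


print(intercept, slope, predict(0), predict(7))
```

A positive slope means that the estimated odds of the event increase with the regressor; `math.exp(slope)` is the corresponding odds ratio for a one-unit increase.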
Decision Trees
- Underlying model assumptions
- Algorithm phases: recursive partitioning, pruning
- Split criteria: Chi-square, Entropy, Gini; adjustments
- Regression trees
- Treatment of missing values
- Overfitting
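
The Entropy and Gini split criteria listed above can be sketched in a few lines of Python. This is a generic textbook formulation for a binary target, not the Enterprise Miner implementation; the node counts and function names are assumptions chosen for the example, and each node is represented as a hypothetical `(events, total)` pair.

```python
import math


def gini(events, total):
    """Gini impurity of a node with a binary target."""
    p = events / total
    return 2.0 * p * (1.0 - p)


def entropy(events, total):
    """Entropy (in bits) of a node with a binary target."""
    p = events / total
    result = 0.0
    for share in (p, 1.0 - p):
        if share > 0:  # 0 * log(0) is taken as 0
            result -= share * math.log2(share)
    return result


def impurity_reduction(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    parent_events, parent_total = parent
    weighted = sum(
        (total / parent_total) * impurity(events, total)
        for events, total in children
    )
    return impurity(parent_events, parent_total) - weighted


# Hypothetical split: 100 observations with 40 events, divided into
# a left node (50 observations, 35 events) and a right node (50, 5).
parent = (40, 100)
children = [(35, 50), (5, 50)]

gini_gain = impurity_reduction(parent, children, gini)
entropy_gain = impurity_reduction(parent, children, entropy)
print(gini_gain, entropy_gain)
```

At each step of the partitioning, the candidate split with the largest impurity reduction is the one selected under the chosen criterion.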
Assessment
- Comparison of estimated models
- Assessing performance and the loss/profit indices of the estimated models
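
As a rough illustration of a profit-based assessment, the sketch below scores a small validation sample and computes the total profit obtained by targeting every record whose estimated probability reaches a given cutoff. The profit of 10 for a correctly targeted event, the loss of 2 for a targeted non-event, the scores and the function name `total_profit` are all hypothetical values chosen for the example.

```python
# Hypothetical profit matrix: gain for a targeted event, loss for a targeted non-event.
PROFIT_TRUE_POSITIVE = 10
LOSS_FALSE_POSITIVE = -2

# Hypothetical validation sample: (estimated probability, observed target).
scored = [
    (0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1),
    (0.4, 0), (0.3, 0), (0.2, 1), (0.1, 0),
]


def total_profit(scored_rows, cutoff):
    """Total profit when every record with probability >= cutoff is targeted."""
    profit = 0
    for probability, target in scored_rows:
        if probability >= cutoff:
            profit += PROFIT_TRUE_POSITIVE if target == 1 else LOSS_FALSE_POSITIVE
    return profit


# Try each observed score as a candidate cutoff and keep the most profitable one.
candidate_cutoffs = [probability for probability, _ in scored]
best_cutoff = max(candidate_cutoffs, key=lambda c: total_profit(scored, c))

print(total_profit(scored, 0.5))                       # 28
print(best_cutoff, total_profit(scored, best_cutoff))  # 0.2 34
```

With an asymmetric profit matrix such as this one, the most profitable cutoff is well below 0.5, which is why competing models are often compared on profit rather than on misclassification rate alone.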
Duration
The course lasts 3 days.