infolytika

A brief guide to CrisisModeler

This view provides a brief guide to the user interface and exercises of CrisisModeler. Further details can be found here and in the original paper by Holopainen & Sarlin (2015). The sections in this view briefly describe the components of the visual interface. These include the top banner, the left panel and the views separated into three steps of the modeling process:

  1. Model input
    • Data
    • Parameters
  2. Model building
    • Optimize
    • Describe
    • Evaluate
  3. Model output
    • Plot
    • Map

Top banner

CrisisModeler is showcased with two European applications: bank-level and country-level systemic risk indicators and stress events. The top banner includes two buttons for selecting the application. Country-level is the default choice, which loads country-level quarterly data into CrisisModeler. Selecting the bank-level application switches to a bank-specific version of CrisisModeler.


Left panel

The left panel contains settings and parameters for interacting with CrisisModeler. These are divided into three sections:

  • Modeling parameters refers to modeling specifications, including sliders controlling the preference between type I and type II errors in the loss function, as well as the forecast horizon and the post-crisis horizon. A checkbox controls whether the classifier threshold should be optimized (see von Schweinitz and Sarlin, 2015). The Describe, Evaluate (Recursive) and Plot views allow excluding unknown events per quarter. The Optimize and Evaluate (Cross-validate) views allow specifying the number of folds in cross-validation, whereas the Evaluate (Recursive) view allows setting the starting quarter of the recursion. In the Map view, Methods and Method parameters are replaced with Map parameters and Plot parameters, which offer a large number of choices pertaining to that view.
  • Methods checkboxes enable selecting among the following modeling techniques: Signaling, Logit, Decision Tree, k-Nearest Neighbors, Random Forest, Neural Network and Support Vector Machine. The final Ensembles checkbox aggregates the chosen methods using four approaches: 1. using the best individual method (in-sample) for out-of-sample predictions, 2. relying on the binary majority vote of all methods, 3. combining the output probabilities of all methods into an arithmetic mean, and 4. weighting the output probabilities by in-sample performance.
  • Method parameters relate to the parameters of the methods, and appear based on the methods selected using the checkboxes above.
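
The four ensemble approaches listed under Methods can be illustrated with a minimal Python sketch. This is not CrisisModeler's own code: the function name, the dictionary layout and the use of a single in-sample score per method as the weight are assumptions made for illustration only.

```python
from statistics import mean

def ensemble_predictions(probs, thresholds, in_sample_scores):
    """Aggregate per-method crisis probabilities for one observation.

    All three arguments are dicts keyed by method name (hypothetical layout).
    """
    # 1. Best-of: reuse the method with the highest in-sample score.
    best = max(in_sample_scores, key=in_sample_scores.get)
    best_of = probs[best]

    # 2. Majority vote: a method signals when its probability reaches
    #    its own threshold; the ensemble follows the majority of signals.
    votes = [probs[m] >= thresholds[m] for m in probs]
    majority = sum(votes) > len(votes) / 2

    # 3. Arithmetic mean of the output probabilities.
    arithmetic = mean(probs.values())

    # 4. Mean of the probabilities weighted by in-sample performance.
    total = sum(in_sample_scores[m] for m in probs)
    weighted = sum(probs[m] * in_sample_scores[m] for m in probs) / total

    return {"best_of": best_of, "majority": majority,
            "mean": arithmetic, "weighted": weighted}

example = ensemble_predictions(
    probs={"logit": 0.30, "knn": 0.60, "random_forest": 0.90},
    thresholds={"logit": 0.25, "knn": 0.70, "random_forest": 0.50},
    in_sample_scores={"logit": 0.2, "knn": 0.3, "random_forest": 0.5},
)
print(example)
```

In this toy case the best-of approach simply passes through the random forest probability, while the weighted mean lies above the plain mean because the best-scoring method also reports the highest probability.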

The sections in the left panel can be collapsed as preferred by clicking the arrows connected to the corresponding sections.

Above the gray panel are a Calculate button, an Auto refresh checkbox and a Reset button. By default, any change in the method or modeling parameters triggers an immediate recalculation of the exercises. If desired, the Auto refresh setting may be unchecked. Recalculation then starts only when Calculate is clicked, regardless of changes in methods or parameters. In the Optimize view, Auto refresh is always disabled because the computations are heavy. The Reset button resets all method and exercise parameters to their defaults.


Modeling process & views

1. Model input

Data

This view allows interaction with the input to modeling, including variable selection. (Saving and loading of data is not available in the preview version. Contact info@infolytika.com for your free trial.)

Parameters

This view allows interaction with parameters used for modeling. Used parameters can also be saved in this view. (This is not available in the preview version. Contact info@infolytika.com for your free trial.)

2. Model building

Optimize

This view is used to perform a grid search using either the cross-validation or the recursive exercise for the selected methods to determine their appropriate parameters. Most methods return a table sorted in order of descending usefulness, with the best performing parameters first. However, the following methods differ: Logit returns the optimal LASSO penalization parameter and the Decision Tree returns the optimal complexity parameter used for pruning the tree. Apply buttons appear after the calculations, allowing for the returned optimal parameters to quickly be applied to the left panel.

Describe

This view presents descriptive statistics of the data and models, allowing for a better understanding of the models.

Evaluate

In this view, two exercises may be performed to evaluate the selected methods:

  • Recursive: This subview provides a table summarizing recursive out-of-sample performance. That is, the performance is reported as a sum over all out-of-sample quarters. The starting quarter of the recursion can be specified in the left panel.
  • Cross-validate: This subview provides a table summarizing cross-validated out-of-sample performance. That is, the performance is reported as a sum over all the left-out folds. The number of folds can be specified in the left panel.

For both exercises, users can save data and model output as well as a performance table. (Saving data is not available in the preview version. Contact info@infolytika.com for your free trial.)

3. Model output

Plot

This view provides two types of graphical output for the estimated models.

  • Time-series: This subview enables specifying one method and one entity, for which a line chart of the probability and threshold is shown. For the Decision Tree method, a tree plot with a highlighted path for the chosen entity is shown below the line chart for a specific quarter. The final quarter is shown by default, but a slider enables specifying the quarter.
  • Cross-section: This subview shows a cross-section barchart of each entity for one selected method, with estimations based on the full dataset. The final quarter is shown by default, but a slider enables a specific quarter to be chosen.

The graphical output of both subviews can be selected based on two types of estimations: Full sample or Recursive. The full sample estimations show a fit as of today (i.e., on the whole available sample), while the recursive estimations display the out-of-sample fit of the real-time recursive exercise starting from the specified quarter onwards.

For both subviews, users can save data and model output as well as the shown graphics. (Saving data is not available in the preview version. Contact info@infolytika.com for your free trial.)

Map

The Map view presents the Financial stability map, which creates a low-dimensional representation of the financial stability cycle from a large number of risk indicators, as described in Sarlin and Peltonen. The map has two aims: i) to reduce large amounts of high-dimensional data to fewer mean profiles, and ii) to provide a low-dimensional representation of the high-dimensional mean profiles. Its three subviews are: Financial stability map, Indicator planes and Mapping log.

  • Financial stability map: Interaction enables the high-dimensional risk indicators to be visualized on the map for a chosen set of economies and time span. The map serves as a low-dimensional display for visualizing individual data on entities and their time series.
    The time span of the plotted entities may be chosen using the slider below the map, and individual entities may be selected by checking "Select individual countries".
  • Indicator planes: This subview provides a plot of each individual variable, as well as a coloring with respect to individual indicator values to show how values are associated with various locations on the map. The planes come along with information from building the map.
  • Mapping log: This log window shows the raw output from the training of the Financial stability map for reference.

When the Map view is selected, parameters that only affect the Financial stability map appear in the left panel under "Map parameters". These are:

  • Neighborhood radius: the radius of the Gaussian neighborhood function, which controls how data in one node impacts neighboring nodes.
  • Min & max of grid height (nodes): a complexity parameter that defines the minimum and maximum size of the grid, given that the model does not exceed the performance of a logit model.
  • Training iterations: the number of iterations in training. Smaller values are recommended, as higher values usually have a minor impact on the model but a large impact on computation time.
  • Degree of supervision: the weight the class variables have in training; the smaller the number, the larger the weight, which improves the separation of classes on the map.
  • Number of clusters: the number of different clusters in the map.
  • Input variables: manual selection of the input variables to be used.



A brief guide to CrisisModeler

This view provides a brief guide to the user interface and exercises of CrisisModeler. Further details can be found here and in the original paper by Holopainen & Sarlin (2015). After a brief guide to getting started, the sections in this view briefly describe the components of the visual interface. These include the top banner, the left panel and the views separated into three steps of the modeling process:

  1. Model input
    • Data
    • Parameters
  2. Model building
    • Optimize
    • Describe
    • Evaluate
  3. Model output
    • Plot
    • Map

Getting started

The app opens automatically in your default browser. If that browser is Internet Explorer, we suggest copying the URL into Chrome or Firefox for a better experience.

Next, the user is expected to provide input to the CrisisModeler. It takes two files as an input:

  • Data file: To get started, users need to load a data file with quarterly bank-level or country-level observations that include both indicators and crisis events. The file is kept in internal memory (not as a file in the home directory). Note that all included indicators are used to define a common sample (no missing values for any indicator), which is used for deriving all models and thresholds. In the Model output step, signaling and logit analysis are applied to, and provide model output for, the full available sample based on the indicators used (i.e., the coverage may be larger). The file assumes the following structure (order and naming of columns):
    • First three columns for country data: Country, Year and Quarter (if year and quarter are in the format 2010Q1, then replace Year and Quarter columns with a single column denoted Time).
    • First four columns for bank data: Bankname, Country, Year and Quarter (if year and quarter are in the format 2010Q1, then replace Year and Quarter columns with a single column denoted Time).
    • Subsequent columns: All of these columns are treated as indicators (right-hand-side variables) and can take arbitrary but unique names. Note that all variables will be used in all methods except for signaling and logit analysis.
    • Final column: Crisis events that can only take values 0 and 1, but the column itself can take an arbitrary name.
  • Parameter file: CrisisModeler derives default parameters from defaultparameters.r in the home directory, after which the parameters can be adapted through the user interface (this changes internal memory, not the file). Moreover, a full, new set of parameters can be read from a csv file (for instance, a previously saved set of parameters). This parameter file includes a matrix of all free parameters, including preferences, forecast horizons etc. A loadable parameter file assumes a structure similar to that of the enclosed file defaultparameters.csv, but is specific to each dataset.
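
The data file layout described above can be checked before uploading. The following is a minimal Python sketch for the country-level layout, not part of CrisisModeler itself. The function name, the indicator names in the sample and the treatment of empty cells or "NA" as missing values are assumptions for illustration.

```python
import csv
import io

def check_country_data(text):
    """Validate the country-level layout; return (indicators, common sample size)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]

    # Leading identifier columns: Country + Year + Quarter, or Country + Time.
    if header[:3] == ["Country", "Year", "Quarter"]:
        n_id = 3
    elif header[:2] == ["Country", "Time"]:
        n_id = 2
    else:
        raise ValueError("unexpected identifier columns: %r" % header[:3])

    # Everything between the identifiers and the final column is an indicator.
    indicators = header[n_id:-1]
    if len(set(indicators)) != len(indicators):
        raise ValueError("indicator names must be unique")

    # The final column holds crisis events and may only contain 0 or 1.
    if any(row[-1] not in ("0", "1") for row in body):
        raise ValueError("crisis events must be 0 or 1")

    # Common sample: rows without a missing value in any indicator
    # (assumption: missing values are empty cells or the text "NA").
    common = [row for row in body
              if all(v.strip() not in ("", "NA") for v in row[n_id:-1])]
    return indicators, len(common)

sample = """Country,Time,credit_gap,house_prices,crisis
FI,2010Q1,1.2,0.4,0
FI,2010Q2,,0.5,0
SE,2010Q1,2.1,0.7,1
"""
indicators, n_common = check_country_data(sample)
print(indicators, n_common)
```

The second row of the sample lacks a value for one indicator, so it falls outside the common sample even though the other indicator is present.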

CrisisModeler can also be run in online mode, which fixes the dataset and allows no saving of data and model output. Online mode is activated by setting the binary switch ‘onlineVersion’ to 1 at the beginning of both server.R and ui.R. More specifically, online mode automatically reads a file named ‘macro.csv’ in the home directory and hides the options to upload data and parameters and to save data and model output.

The user can also choose a different Usefulness function for measuring performance and setting thresholds. The default version uses the approach by Sarlin (2013) with absolute costs for missed crises and false alarms, whereas the user can also choose to use the approach by Alessi and Detken (2011) with error costs relative to class size. The alternative Usefulness function is activated by setting the binary switch ‘defaultparameters$u.alessidetken’ to 1 in defaultparameters.R.
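
As a rough illustration of the difference between the two Usefulness functions, the sketch below computes both from a confusion matrix in Python. The formulas follow the commonly cited forms of Sarlin (2013) and Alessi and Detken (2011) and should be checked against the papers; the function and argument names are hypothetical and do not correspond to CrisisModeler's internals.

```python
def error_rates(tp, fp, fn, tn):
    """Return type I rate, type II rate and unconditional crisis probability."""
    t1 = fn / (fn + tp)                    # share of missed crises
    t2 = fp / (fp + tn)                    # share of false alarms
    p1 = (tp + fn) / (tp + fp + fn + tn)   # share of crisis observations
    return t1, t2, p1

def usefulness_sarlin(tp, fp, fn, tn, mu):
    """Absolute usefulness: error rates are weighted by class probabilities."""
    t1, t2, p1 = error_rates(tp, fp, fn, tn)
    p2 = 1 - p1
    loss = mu * t1 * p1 + (1 - mu) * t2 * p2
    # Benchmark: the better of always signaling and never signaling.
    return min(mu * p1, (1 - mu) * p2) - loss

def usefulness_alessi_detken(tp, fp, fn, tn, theta):
    """Usefulness with error costs relative to class size (rates only)."""
    t1, t2, _ = error_rates(tp, fp, fn, tn)
    loss = theta * t1 + (1 - theta) * t2
    return min(theta, 1 - theta) - loss

# Toy confusion matrix: 10 crisis and 90 tranquil observations.
u_sarlin = usefulness_sarlin(tp=8, fp=18, fn=2, tn=72, mu=0.8)
u_ad = usefulness_alessi_detken(tp=8, fp=18, fn=2, tn=72, theta=0.5)
print(round(u_sarlin, 3), round(u_ad, 3))
```

With rare crises, the first measure needs a preference parameter well above 0.5 before a model beats the benchmark of never signaling, whereas the second measure ignores class frequencies altogether.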


Top banner

CrisisModeler allows running both bank-level and country-level early-warning exercises. The banner indicates whether the uploaded data initiates the bank or the country version of the tool.


Left panel

The left panel contains settings and parameters for interacting with CrisisModeler. These are divided into three sections:

  • Modeling parameters refers to modeling specifications, including sliders controlling the preference between type I and type II errors in the loss function, as well as the forecast horizon and the post-crisis horizon. A checkbox controls whether the classifier threshold should be optimized (see von Schweinitz and Sarlin, 2015). The Describe, Evaluate (Recursive) and Plot views allow excluding unknown events per quarter. The Optimize and Evaluate (Cross-validate) views allow specifying the number of folds in cross-validation, whereas the Evaluate (Recursive) view allows setting the starting quarter of the recursion. In the Map view, Methods and Method parameters are replaced with Map parameters and Plot parameters, which offer a large number of choices pertaining to that view.
  • Methods checkboxes enable selecting among the following modeling techniques: Signaling, Logit, Decision Tree, k-Nearest Neighbors, Random Forest, Neural Network and Support Vector Machine. The final Ensembles checkbox aggregates the chosen methods using four approaches: 1. using the best individual method (in-sample) for out-of-sample predictions, 2. relying on the binary majority vote of all methods, 3. combining the output probabilities of all methods into an arithmetic mean, and 4. weighting the output probabilities by in-sample performance.
  • Method parameters relate to the parameters of the methods, and appear based on the methods selected using the checkboxes above.

The sections in the left panel can be collapsed as preferred by clicking the arrows connected to the corresponding sections.

Above the gray panel are a Calculate button, an Auto refresh checkbox and a Reset button. By default, any change in the method or modeling parameters triggers an immediate recalculation of the exercises. If desired, the Auto refresh setting may be unchecked. Recalculation then starts only when Calculate is clicked, regardless of changes in methods or parameters. In the Optimize view, Auto refresh is always disabled because the computations are heavy. The Reset button resets all method and exercise parameters to their defaults.


Modeling process & views

1. Model input

Data

This view allows interaction with the input to modeling, including both data and variable selection.

  • Load data: To get started, users should upload the data to be used by CrisisModeler. After a data file is loaded, a list of the variables to be used by CrisisModeler appears. Using the checkboxes, the user may restrict the variables to be used as predictors in the exercises. Should the loaded data file include columns that are not intended to be used as input variables, it is recommended that these are excluded by unchecking them in this list. Excluded columns are still kept intact for reference in the output data. For convenience, CrisisModeler calculates and displays the size of the common sample for the current variable selection in real time.

Parameters

This view allows interaction with parameters used for modeling. Used parameters can be saved for future use.

  • Load/Save parameters: Users can upload parameters from previous sessions with the CrisisModeler or save the current set of parameters.

2. Model building

Optimize

This view is used to perform a grid search using either the cross-validation or the recursive exercise for the selected methods to determine their appropriate parameters. Most methods return a table sorted in order of descending usefulness, with the best performing parameters first. However, the following methods differ: Logit returns the optimal LASSO penalization parameter and the Decision Tree returns the optimal complexity parameter used for pruning the tree. Apply buttons appear after the calculations, allowing for the returned optimal parameters to quickly be applied to the left panel.
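
The grid search and the sorted result table can be sketched in a few lines of Python. This is a generic illustration rather than CrisisModeler's implementation: the parameter names and the scoring function are hypothetical stand-ins for a cross-validated or recursive Usefulness computation.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Score every parameter combination and sort by descending usefulness."""
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((evaluate(params), params))
    # Best performing parameters first, as in the Optimize table.
    results.sort(key=lambda item: item[0], reverse=True)
    return results

def toy_usefulness(params):
    """Hypothetical score standing in for an out-of-sample exercise."""
    penalty = 0.05 if params["distance"] == 2 else 0.0
    return 0.5 - abs(params["k"] - 7) * 0.02 - penalty

table = grid_search({"k": [3, 5, 7, 9], "distance": [1, 2]}, toy_usefulness)
for score, params in table[:3]:
    print(round(score, 2), params)
```

The first row of the table plays the role of the parameters that an Apply button would copy to the left panel.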

Describe

This view presents descriptive statistics of the data and models, allowing for a better understanding of the models.

Evaluate

In this view, two exercises may be performed to evaluate the selected methods:

  • Recursive: This subview provides a table summarizing recursive out-of-sample performance. That is, the performance is reported as a sum over all out-of-sample quarters. The starting quarter of the recursion can be specified in the left panel.
  • Cross-validate: This subview provides a table summarizing cross-validated out-of-sample performance. That is, the performance is reported as a sum over all the left-out folds. The number of folds can be specified in the left panel.

For both exercises, users can save data and model output as well as a performance table. For convenience, copies of data and model output of these two exercises are also stored in-memory as global R variables: ‘global.data.recursive’ and ‘global.data.crossval’, respectively.
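
The logic of the recursive exercise can be sketched as follows in Python. This is a simplified illustration under stated assumptions: the data layout, the function names and the toy midpoint model are hypothetical, and complications such as the forecast horizon or publication lags are ignored.

```python
def recursive_evaluation(data, start, fit, predict):
    """Sum out-of-sample confusion counts over all quarters from `start`.

    data: list of (quarter, x, y) tuples, with quarters written like
    '2010Q1' so that string order matches time order.
    """
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    quarters = sorted({q for q, _, _ in data if q >= start})
    for q in quarters:
        # Estimate only on information available before quarter q.
        train = [(x, y) for t, x, y in data if t < q]
        test = [(x, y) for t, x, y in data if t == q]
        model = fit(train)
        for x, y in test:
            signal = predict(model, x)
            key = ("T" if signal == bool(y) else "F") + ("P" if signal else "N")
            counts[key] += 1
    return counts

def fit_midpoint(train):
    """Toy model: threshold halfway between the two class means."""
    crisis = [x for x, y in train if y == 1]
    tranquil = [x for x, y in train if y == 0]
    return (sum(crisis) / len(crisis) + sum(tranquil) / len(tranquil)) / 2

def predict_signal(threshold, x):
    return x >= threshold

data = [
    ("2010Q1", 0.1, 0), ("2010Q1", 0.9, 1),
    ("2010Q2", 0.2, 0), ("2010Q2", 0.8, 1),
    ("2010Q3", 0.7, 0), ("2010Q3", 0.6, 1),
    ("2010Q4", 0.3, 0), ("2010Q4", 0.4, 1),
]
counts = recursive_evaluation(data, "2010Q3", fit_midpoint, predict_signal)
print(counts)
```

Because the model is re-estimated before each quarter, the threshold applied to 2010Q4 already reflects the 2010Q3 observations, which is what distinguishes the recursive exercise from a single train/test split.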

3. Model output

Plot

This view provides two types of graphical output for the estimated models.

  • Time-series: This subview enables specifying one method and one entity, for which a line chart of the probability and threshold is shown. For the Decision Tree method, a tree plot with a highlighted path for the chosen entity is shown below the line chart for a specific quarter. The final quarter is shown by default, but a slider enables specifying the quarter.
  • Cross-section: This subview shows a cross-section barchart of each entity for one selected method, with estimations based on the full dataset. The final quarter is shown by default, but a slider enables a specific quarter to be chosen.

The graphical output of both subviews can be selected based on two types of estimations: Full sample or Recursive. The full sample estimations show a fit as of today (i.e., on the whole available sample), while the recursive estimations display the out-of-sample fit of the real-time recursive exercise starting from the specified quarter onwards.

For both subviews, users can save data and model output as well as the shown graphics. The data saved corresponds to the estimation chosen by the user. For convenience, a copy of the full sample data and model output is also stored in-memory in the global R variable ‘global.data.fullsample’.

Map

The Map view presents the Financial stability map, which creates a low-dimensional representation of the financial stability cycle from a large number of risk indicators, as described in Sarlin and Peltonen. The map has two aims: i) to reduce large amounts of high-dimensional data to fewer mean profiles, and ii) to provide a low-dimensional representation of the high-dimensional mean profiles. Its three subviews are: Financial stability map, Indicator planes and Mapping log.

  • Financial stability map: Interaction enables the high-dimensional risk indicators to be visualized on the map for a chosen set of economies and time span. The map serves as a low-dimensional display for visualizing individual data on entities and their time series.
    The time span of the plotted entities may be chosen using the slider below the map, and individual entities may be selected by checking "Select individual countries".
  • Indicator planes: This subview provides a plot of each individual variable, as well as a coloring with respect to individual indicator values to show how values are associated with various locations on the map. The planes come along with information from building the map.
  • Mapping log: This log window shows the raw output from the training of the Financial stability map for reference.

When the Map view is selected, parameters that only affect the Financial stability map appear in the left panel under "Map parameters". These are:

  • Neighborhood radius: the radius of the Gaussian neighborhood function, which controls how data in one node impacts neighboring nodes.
  • Min & max of grid height (nodes): a complexity parameter that defines the minimum and maximum size of the grid, given that the model does not exceed the performance of a logit model.
  • Training iterations: the number of iterations in training. Smaller values are recommended, as higher values usually have a minor impact on the model but a large impact on computation time.
  • Degree of supervision: the weight the class variables have in training; the smaller the number, the larger the weight, which improves the separation of classes on the map.
  • Number of clusters: the number of different clusters in the map.
  • Input variables: manual selection of the input variables to be used.
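
To give a feel for the neighborhood radius, the following Python sketch computes Gaussian neighborhood weights on a small grid of nodes. It uses the form commonly applied in grid-based maps such as self-organizing maps; the exact scaling used by CrisisModeler is not documented here, so the formula and all names are assumptions.

```python
import math

def gaussian_neighborhood(grid_height, grid_width, winner, radius):
    """Weight of each grid node in an update centred on the winning node."""
    weights = {}
    for r in range(grid_height):
        for c in range(grid_width):
            # Squared grid distance between node (r, c) and the winner.
            d2 = (r - winner[0]) ** 2 + (c - winner[1]) ** 2
            weights[(r, c)] = math.exp(-d2 / (2 * radius ** 2))
    return weights

narrow = gaussian_neighborhood(3, 3, winner=(1, 1), radius=0.5)
wide = gaussian_neighborhood(3, 3, winner=(1, 1), radius=2.0)
print(round(narrow[(1, 2)], 3), round(wide[(1, 2)], 3))
```

The winning node always receives a weight of 1, and a larger radius spreads more of each observation's influence to the surrounding nodes, which yields a smoother map.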