
Presentation and options

This application generates an explanatory and predictive model of a dichotomous spatial variable Y from n quantitative independent spatial variables X1, ..., Xn.

The dependent variable Y is dichotomous, so it takes only the numeric values 1 or 0 (presence/absence, yes/no, success/failure, etc.). The purpose of the analysis is to predict the probability that Y takes the value 1 given the values of the explanatory variables, P(Y = 1|X), and to evaluate the relationship between those variables and the dependent variable, that is, their effect on it.

The analysis is based on the multivariate binary logistic regression model, which assumes that the probability that Y takes the value 1 follows the logistic distribution. This probability can therefore be estimated with the following formula, called the logistic function:
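
$$P(Y=1 \mid X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n}}$$

where:

$X_1, \ldots, X_n$ are the independent variables,

$\beta_0$ is the constant of the model or independent term,

$\beta_1, \ldots, \beta_n$ are the coefficients of the independent variables.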

This is a continuous function whose values lie in the open interval (0, 1), so its output can be read as a probability.
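As an illustration, the logistic function can be evaluated for a single observation with the following minimal Python sketch. The function name `logistic_probability` and the coefficient values are hypothetical and not part of the application; the two branches compute the same value e^z / (1 + e^z) and only differ in avoiding overflow of the exponential for large |z|.

```python
import math


def logistic_probability(beta0, betas, xs):
    """Return P(Y=1|X) for one observation.

    beta0: constant term; betas: coefficients of X1..Xn; xs: values of X1..Xn.
    """
    if len(betas) != len(xs):
        raise ValueError("one coefficient is needed per independent variable")
    # Linear predictor: beta0 + beta1*X1 + ... + betan*Xn
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    # e^z / (1 + e^z), written in two equivalent forms to avoid overflow
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical model with two independent variables
p = logistic_probability(-1.0, [0.5, 2.0], [2.0, 0.5])
print(round(p, 4))  # 0.7311
```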

The coefficient vector B (the constant and the coefficients of the independent variables) is estimated by maximum likelihood, that is, the coefficients are adjusted so as to maximize the likelihood function.

Given that a dichotomous dependent variable follows a binomial distribution, the likelihood function for a random sample of N observations is expressed by:
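
$$L(B) = \prod_{i=1}^{N} \pi_i^{\,y_i} \, (1 - \pi_i)^{1 - y_i}$$

where:

$y_i$ is the observed value (1 or 0) of the dependent variable for sample $i$,

$\pi_i$ is the predicted value of the dependent variable for sample $i$, $\pi_i = P(Y = 1 \mid X_i)$.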
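The likelihood and its logarithm for a set of observed values and predicted probabilities can be computed as in this minimal Python sketch (the function names and the sample values are hypothetical). It assumes every predicted probability lies strictly between 0 and 1, which the logistic function guarantees. For a 0/1 observation, the factor p^y (1 - p)^(1 - y) is simply p when y = 1 and 1 - p when y = 0, which is how the product is written below. In practice the logarithmic form is preferred, because the product of many probabilities underflows to zero for large samples.

```python
import math


def likelihood(y, pi):
    """L(B) = prod_i pi_i^y_i * (1 - pi_i)^(1 - y_i) for observed y_i in {0, 1}."""
    result = 1.0
    for yi, p in zip(y, pi):
        # p^y * (1 - p)^(1 - y) reduces to p for y = 1 and to 1 - p for y = 0
        result *= p if yi == 1 else 1.0 - p
    return result


def log_likelihood(y, pi):
    """ln L(B) = sum_i [y_i ln(pi_i) + (1 - y_i) ln(1 - pi_i)]; needs 0 < pi_i < 1."""
    return sum(yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
               for yi, p in zip(y, pi))


y = [1, 0, 1]          # observed presence/absence at three hypothetical points
pi = [0.8, 0.3, 0.6]   # predicted P(Y=1|X_i) at the same points
print(likelihood(y, pi))  # approximately 0.336 (0.8 * 0.7 * 0.6)
```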

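Because the logarithm is monotonically increasing, the coefficients that maximize L(B) also maximize its logarithm, ln L(B), which is easier to work with. Maximizing ln L(B) requires solving the following system of nonlinear equations, obtained by setting its partial derivative with respect to each coefficient to zero: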
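
$$\frac{\partial \ln L(B)}{\partial \beta_j} = \sum_{i=1}^{N} (y_i - \pi_i)\, x_{ij} = 0, \qquad j = 0, 1, \ldots, n$$

where:
$\beta_j$ is the coefficient of the $j$-th independent variable ($\beta_0$ being the constant), and $x_{ij}$ is the observed value of the $j$-th independent variable for sample $i$ (with $x_{i0} = 1$ for the constant term).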
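The left-hand sides of this system (the gradient of ln L(B)) can be evaluated for a given coefficient vector with the following minimal Python sketch. The names `sigmoid` and `score` are hypothetical; each sample row holds the values x_i1, ..., x_in, and the leading 1 for the constant term is added inside the function. The maximum likelihood coefficients are those for which every component returned is zero.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(X, y, beta):
    """Left-hand sides of the system: sum_i (y_i - pi_i) * x_ij for j = 0..n.

    X: one row [x_i1, ..., x_in] per sample; y: observed 0/1 values;
    beta: [beta_0, beta_1, ..., beta_n]. x_i0 = 1 is added for the constant.
    """
    grad = [0.0] * len(beta)
    for row, yi in zip(X, y):
        xs = [1.0] + list(row)
        pi = sigmoid(sum(b * x for b, x in zip(beta, xs)))
        for j, xij in enumerate(xs):
            grad[j] += (yi - pi) * xij
    return grad


# At beta = 0 every pi_i is 0.5, so the equations are not yet satisfied
print(score([[1.0], [2.0]], [1, 0], [0.0, 0.0]))  # [0.0, -0.5]
```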

The application solves this system with the iterative Newton-Raphson algorithm.
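The following Python sketch shows how Newton-Raphson can solve the system of equations for the coefficients. It is a minimal illustration, not the application's own implementation: starting from all coefficients at zero, each iteration updates them with B_new = B + (X^T W X)^(-1) X^T (y - pi), where W is the diagonal matrix of pi_i (1 - pi_i), and stops when the largest change falls below a tolerance. The linear system is solved by Gaussian elimination with partial pivoting so that only the standard library is needed. The names `fit_logistic`, `solve_linear`, `max_iter` and `tol` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("information matrix is (nearly) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Return [beta_0, ..., beta_n] maximizing ln L(B) by Newton-Raphson."""
    rows = [[1.0] + list(row) for row in X]  # x_i0 = 1 for the constant
    k = len(rows[0])
    beta = [0.0] * k  # starting point: all coefficients zero
    for _ in range(max_iter):
        grad = [0.0] * k                      # sum_i (y_i - pi_i) x_ij
        info = [[0.0] * k for _ in range(k)]  # X^T W X = minus the Hessian
        for xs, yi in zip(rows, y):
            pi = sigmoid(sum(b * x for b, x in zip(beta, xs)))
            w = pi * (1.0 - pi)
            for j in range(k):
                grad[j] += (yi - pi) * xs[j]
                for q in range(k):
                    info[j][q] += w * xs[j] * xs[q]
        step = solve_linear(info, grad)  # (X^T W X)^-1 X^T (y - pi)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical samples: one independent variable X1 and presence/absence Y
X = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
y = [0, 0, 1, 1, 1, 0]
print([round(b, 4) for b in fit_logistic(X, y)])  # [-0.6931, 1.3863]
```

With a single binary variable the model is saturated, so the estimates reproduce the observed proportions (1/3 where X1 = 0 and 2/3 where X1 = 1), which makes the result easy to check by hand. If the independent variables separate the 0 and 1 samples perfectly, the maximum likelihood estimates do not exist, the coefficients grow without bound, and this sketch ends with an error instead of returning a result.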

To determine the regression coefficients, a set of samples must therefore be provided, giving the value of the dependent variable (1 or 0) at specific locations (point features), together with the set of candidate independent variables. The samples can be supplied either as a PNT structured point file or as a table in any other format accessible through an ODBC (Open Database Connectivity) driver. The independent variables must be supplied as IMG raster files covering the same geographic region with the same pixel size. The resulting prediction is also an IMG raster file.
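Because all the independent rasters share the same region and pixel size, one pixel position identifies the same location in every raster. The following Python sketch illustrates this under stated assumptions: reading and writing IMG files is not shown, the rasters are assumed to be already loaded as nested lists of equal size, and their georeferencing is reduced to a hypothetical upper-left corner (x_min, y_max) and a square pixel size. `sample_rasters` pairs each point with the raster values at its pixel, and `predict_raster` applies a fitted model pixel by pixel to produce the probability raster; all names, including the example layers `elevation` and `slope`, are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def pixel_index(x, y, x_min, y_max, pixel_size):
    """Row and column of the pixel containing map coordinate (x, y).

    Rows are counted downward from the upper edge y_max (hypothetical georeferencing).
    """
    row = int(math.floor((y_max - y) / pixel_size))
    col = int(math.floor((x - x_min) / pixel_size))
    return row, col


def sample_rasters(points, rasters, x_min, y_max, pixel_size):
    """Pair each (x, y, presence) point with the values of every raster at its pixel."""
    nrows, ncols = len(rasters[0]), len(rasters[0][0])
    samples = []
    for x, y, presence in points:
        row, col = pixel_index(x, y, x_min, y_max, pixel_size)
        if not (0 <= row < nrows and 0 <= col < ncols):
            raise ValueError(f"point ({x}, {y}) lies outside the rasters")
        samples.append(([r[row][col] for r in rasters], presence))
    return samples


def predict_raster(beta, rasters):
    """Apply the fitted model pixel by pixel; beta = [beta_0, beta_1, ..., beta_n]."""
    nrows, ncols = len(rasters[0]), len(rasters[0][0])
    return [[sigmoid(beta[0] + sum(b * r[i][j] for b, r in zip(beta[1:], rasters)))
             for j in range(ncols)]
            for i in range(nrows)]


# Two hypothetical 2 x 2 rasters with 10-unit pixels, upper-left corner at (0, 20)
elevation = [[1.0, 2.0], [3.0, 4.0]]
slope = [[10.0, 20.0], [30.0, 40.0]]
print(sample_rasters([(15.0, 5.0, 1)], [elevation, slope], 0.0, 20.0, 10.0))
# [([4.0, 40.0], 1)]
```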

The regression procedure actually fits all possible regressions iteratively, from the regression with all the independent variables initially entered down to the regressions with a single independent variable. The statistical parameters of each regression are then compared according to the chosen criterion (the lowest AIC, the lowest deviance, or the highest Nagelkerke R2 coefficient), and the best regression among them is selected.
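The three selection criteria can be computed from the log-likelihood of each fitted regression, as in the following Python sketch. It assumes that "all possible regressions" means every non-empty subset of the entered variables, that the deviance is measured against the saturated model (whose log-likelihood is 0 for individual 0/1 observations), and that the Nagelkerke R2 uses the constant-only model as reference; the application's exact definitions are not stated here. All function names are hypothetical. Unlike AIC, deviance and Nagelkerke R2 carry no penalty for the number of coefficients.

```python
import math
from itertools import combinations


def aic(loglik, k):
    """Akaike information criterion for a model with k estimated coefficients."""
    return -2.0 * loglik + 2.0 * k


def deviance(loglik):
    """Deviance against the saturated model, whose log-likelihood is 0 for 0/1 data."""
    return -2.0 * loglik


def nagelkerke_r2(loglik, loglik_null, n):
    """Nagelkerke R2: Cox-Snell R2 rescaled so that its maximum is 1.

    loglik_null is the log-likelihood of the constant-only model; n the sample size.
    """
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell


def candidate_subsets(names):
    """Every non-empty subset of the variables, from the full set down to single variables."""
    for size in range(len(names), 0, -1):
        yield from combinations(names, size)


def select_best(fits, criterion):
    """fits maps a variable subset to its (aic, deviance, r2) tuple; returns the best subset."""
    if criterion == "aic":
        return min(fits, key=lambda s: fits[s][0])
    if criterion == "deviance":
        return min(fits, key=lambda s: fits[s][1])
    if criterion == "r2":
        return max(fits, key=lambda s: fits[s][2])
    raise ValueError(f"unknown criterion: {criterion}")


print(list(candidate_subsets(["X1", "X2", "X3"])))
# [('X1', 'X2', 'X3'), ('X1', 'X2'), ('X1', 'X3'), ('X2', 'X3'), ('X1',), ('X2',), ('X3',)]
```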

More information about the logistic regression model, maximum likelihood estimation, and the iterative Newton-Raphson algorithm can be found in the following reference:

Czepiel, S. A. (2002). Maximum Likelihood Estimation of Logistic Regression Models: Theory and Implementation. https://czep.net/stat/mlelr.pdf


Dialog box of the application


RegLog dialog box


Syntax

Syntax:

Parameters:

Modifiers: