Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse

Download TURNER Version 1.0 now (use StuffIt Expander to extract the files).

For detailed information about system requirements see below.

TURNER is research software for interactively analysing multidimensional discrete data on Macintosh computers. It tries to transfer the interactive paradigm known from exploratory graphical data analysis to the concise treatment of categorical data, typically arranged in two- or multi-way contingency tables. In addition to standard features for categorical data, such as Pearson's chi-squared test and log-linear models, it offers the whole goodness-of-fit family of power divergence statistics and the N-value. Interactive contingency tables let the user switch easily between all two-dimensional views of multivariate data. All displays dealing with the same data set are fully linked and can be interacted with directly, so multivariate dependencies are easier to detect. Using raw data as well as percentages or summands of the test statistics, interactive tables make it possible to detect unusual cells that have a strong impact on the resulting statistics. The data manipulation facilities give immediate answers to the question of whether empty cells (random or structural) influence the analysis of the data. In the same way, the validity of chi-squared tests can be checked in the presence of cells with expected counts smaller than 5.

TURNER was devised by Antony Unwin and has been implemented and further developed by Stephan Lauer. It is written in C++. TURNER follows the Macintosh conventions and is consistent with other Mac packages. It is one of a number of software development projects in the department of Computer-oriented Statistics and Data Analysis at the University of Augsburg.

System requirements

Apple Macintosh computer with 68020 processor or higher, Power-Macintosh or compatible
at least 2 MB of working memory available (4 MB recommended)
running System Mac OS 7.0 or higher
colour monitor recommended
hard disk recommended

Interactive contingency tables

Interactive contingency tables are new tools for flexibly investigating categorical data. The main operations possible in interactive tables are:

pooling categories
deleting categories
collapsing tables
colouring numbers according to their relation to observed data
toggling between raw data, percentages, expected values and summands of test statistics
manipulating data to see what-if effects

Interactivity means not only that the user can interact with the data, but also that the results of the user's changes can be seen instantaneously. Interactive contingency tables therefore do more than allow static views of different aspects of the data to be compared: they also allow conclusions to be drawn from the way things change.

Basic displays

The basic tool for presenting categorical data in TURNER is a two-dimensional contingency table. This is the obvious choice for data in two dimensions. For data in more than two dimensions, stored in so-called supercubes, two-way contingency tables are displayed that show either the marginal distribution in a projection or the conditional distribution in a slice. In the literature, slices are sometimes called partial tables and projections marginal tables. The following figure shows slices and projections of a four-dimensional data set.

Slices can exhibit quite different information from projections; it is therefore necessary to analyze all two-dimensional tables drawn from a multi-way contingency table.
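The difference between a projection and a slice can be sketched in a few lines of Python. This is only an illustration on a made-up 2 x 2 x 2 table, not TURNER's own code; the names `counts`, `projection` and `slice_at` are hypothetical.

```python
# Hypothetical three-way table counts[i][j][k]; the numbers are invented.
counts = [
    [[10, 20], [30, 40]],
    [[5, 15], [25, 35]],
]

def projection(table):
    """Marginal table of the first two variables: sum over the third."""
    return [[sum(cell) for cell in row] for row in table]

def slice_at(table, k):
    """Conditional table of the first two variables at level k of the third."""
    return [[cell[k] for cell in row] for row in table]

print(projection(counts))   # [[30, 70], [20, 60]]
print(slice_at(counts, 0))  # [[10, 30], [5, 25]]
```

The projection ignores the third variable altogether, while each slice holds it fixed at one level, which is why the two kinds of view can tell different stories.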

Pooling Categories

Often in the analysis of survey data, regrouping the predefined categories can be very effective. Instead of forcing the user to redefine groups in the edit mode, TURNER allows categories to be pooled with a single mouse click.

Deleting Categories

In the same manner it is possible to eliminate rows of a contingency table.
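Pooling and deleting can both be read as simple operations on the rows of a table. The following Python sketch shows one possible reading, assuming a two-way table stored as a list of rows; `pool_rows` and `delete_row` are hypothetical helper names and the counts are invented.

```python
def pool_rows(table, i, j):
    """Return a new table with rows i and j added together into one row."""
    if i == j:
        raise ValueError("pooling needs two different rows")
    lo, hi = sorted((i, j))
    merged = [a + b for a, b in zip(table[lo], table[hi])]
    # Keep the merged row at the lower position and drop the other one.
    return [merged if r == lo else row
            for r, row in enumerate(table) if r != hi]

def delete_row(table, i):
    """Return a new table without row i."""
    return [row for r, row in enumerate(table) if r != i]

table = [[12, 7], [3, 5], [8, 10]]
print(pool_rows(table, 1, 2))  # [[12, 7], [11, 15]]
print(delete_row(table, 1))    # [[12, 7], [8, 10]]
```

Both helpers return new tables and leave the original untouched, so the pooled or reduced view can be compared with the full one.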

Collapsing Tables

Loglinear models are often used in research studies where variables fall into different groups, for example predictor and control variables, or global and individual characteristics. In these cases an effective tool is needed to build multi-way contingency tables for a selected subset of the variables. TURNER lets the user select the variables that are not of interest for the next stage of the analysis and then calculates a multi-way contingency table for all other variables, with the data collapsed over the selected variables. In the previous example, the variable 'Preference' can be seen as the response variable and the other three as explanatory variables. In an early step of the analysis it seems natural to look at the dependencies among the design variables, ignoring the variable 'Preference'.
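Collapsing amounts to summing the counts over every category of the variables that are set aside. A minimal Python sketch follows, assuming the table is stored as a dictionary from category tuples to counts. The function `collapse`, the variable names `X1` and `X2`, the category labels and the counts are all hypothetical; only 'Preference' is taken from the text.

```python
from collections import defaultdict

def collapse(cells, variables, drop):
    """Sum a multi-way table over the variables named in `drop`.

    cells: dict mapping a tuple of category labels (one per variable) to a count.
    Returns the names of the remaining variables and the collapsed table.
    """
    keep = [i for i, name in enumerate(variables) if name not in drop]
    out = defaultdict(int)
    for key, count in cells.items():
        out[tuple(key[i] for i in keep)] += count
    return [variables[i] for i in keep], dict(out)

variables = ["X1", "X2", "Preference"]
cells = {
    ("a", "u", "yes"): 4, ("a", "u", "no"): 6,
    ("a", "v", "yes"): 3, ("a", "v", "no"): 7,
    ("b", "u", "yes"): 5, ("b", "u", "no"): 5,
    ("b", "v", "yes"): 2, ("b", "v", "no"): 8,
}
names, table = collapse(cells, variables, drop={"Preference"})
print(names)
print(table)
```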

Colouring numbers according to their relation to observed data

The order relation between observed and expected cell entries can be read off easily from their colour. Blue indicates that the expected value is smaller than the observed one; red indicates that the expected value is greater than or equal to the observed one.

In the same manner, cells with expected values smaller than five are highlighted by brightness: light printing means that the expected value is greater than five, dark printing that it is smaller than five.
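The two display rules can be written down as a small Python function. This is a sketch of the rules as stated above, not of TURNER's implementation; the name `cell_style` is hypothetical, and the treatment of an expected value of exactly five is an assumption because the text does not cover that case.

```python
def cell_style(observed, expected):
    """Return (colour, brightness) for one cell, following the rules in the text."""
    # Blue: expected below observed. Red: expected greater than or equal to observed.
    colour = "blue" if expected < observed else "red"
    # Assumption: an expected value of exactly five is printed light.
    brightness = "dark" if expected < 5 else "light"
    return colour, brightness

print(cell_style(observed=8, expected=3.2))    # ('blue', 'dark')
print(cell_style(observed=10, expected=12.5))  # ('red', 'light')
```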

Toggling between raw data, percentages, expected values and summands of test statistics

A first indication of associations between variables can be drawn from the relative frequencies of the categories. TURNER displays row, column or total percentages when the mouse is moved into the upper left corner of a table display.
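The three kinds of percentages differ only in the denominator. A minimal Python sketch, assuming a two-way table stored as a list of rows with no empty row or column; the function name `percentages` and the counts are hypothetical.

```python
def percentages(table, mode):
    """Return the table as percentages; mode is 'row', 'column' or 'total'."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)

    def denominator(i, j):
        return {"row": row_sums[i], "column": col_sums[j], "total": total}[mode]

    return [[100.0 * x / denominator(i, j) for j, x in enumerate(row)]
            for i, row in enumerate(table)]

table = [[10, 30], [20, 40]]
print(percentages(table, "row")[0])  # [25.0, 75.0]
print(percentages(table, "total"))   # [[10.0, 30.0], [20.0, 40.0]]
```

Row percentages that differ clearly from one row to the next are the kind of first hint at an association that the paragraph above describes.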

The Power Divergence Window shows the expected values for the current loglinear model, calculated by the iterative proportional fitting algorithm, and the summands of the power divergence statistic.
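Iterative proportional fitting rescales a table of fitted values so that it matches, in turn, each observed margin that the model contains. The Python sketch below shows the idea for one particular model, the three-way model with all two-way effects but no three-way effect. It is a textbook version under stated assumptions (a three-way table as nested lists, a fixed number of cycles instead of a convergence test), not TURNER's implementation; the function name and the counts are hypothetical.

```python
def ipf_no_three_way(obs, cycles=200):
    """Fit the loglinear model with all two-way effects to a three-way table."""
    I, J, K = len(obs), len(obs[0]), len(obs[0][0])
    fit = [[[1.0] * K for _ in range(J)] for _ in range(I)]
    for _ in range(cycles):
        # Match the margin of variables 1 and 2 (sum over k).
        for i in range(I):
            for j in range(J):
                target, current = sum(obs[i][j]), sum(fit[i][j])
                if current > 0:
                    for k in range(K):
                        fit[i][j][k] *= target / current
        # Match the margin of variables 1 and 3 (sum over j).
        for i in range(I):
            for k in range(K):
                target = sum(obs[i][j][k] for j in range(J))
                current = sum(fit[i][j][k] for j in range(J))
                if current > 0:
                    for j in range(J):
                        fit[i][j][k] *= target / current
        # Match the margin of variables 2 and 3 (sum over i).
        for j in range(J):
            for k in range(K):
                target = sum(obs[i][j][k] for i in range(I))
                current = sum(fit[i][j][k] for i in range(I))
                if current > 0:
                    for i in range(I):
                        fit[i][j][k] *= target / current
    return fit

observed = [[[10, 20], [30, 40]], [[5, 15], [25, 35]]]
expected = ipf_no_three_way(observed)
for layer in expected:
    print([[round(x, 2) for x in row] for row in layer])
```

The fitted values reproduce all three two-way margins of the observed table while keeping the conditional odds ratio the same at every level of the third variable.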

Manipulating data to see what-if effects

By simply double-clicking a cell, its entry can be changed to see what would happen if the data were different. Constructing such a hypothetical table makes it possible to study the effect of structural and random zeros in the data set. Moreover, the effect of cells with small expected counts can be studied in detail.

The display at the bottom shows how Pearson's chi-squared statistic changes accordingly. In TURNER all manipulated values are coloured orange.
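A what-if step can be imitated outside TURNER by changing one cell and recomputing the statistic. The following Python sketch assumes a two-way table tested for independence, with no empty row or column; the function name `pearson_chi2` and the counts are hypothetical.

```python
def pearson_chi2(table):
    """Pearson's chi-squared statistic for independence in a two-way table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

original = [[10, 20], [30, 40]]
what_if = [row[:] for row in original]  # copy, so the original stays intact
what_if[0][0] = 25                      # the hypothetical manipulated cell

print(round(pearson_chi2(original), 4))
print(round(pearson_chi2(what_if), 4))
```

Comparing the two printed values shows how strongly a single cell drives the statistic, which is the question the what-if display is meant to answer.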

Power Divergence Statistic

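The power-divergence family of statistics links the traditional test statistics, such as Pearson's chi-squared, the likelihood-ratio statistic or the Freeman-Tukey statistic, through a single real-valued parameter. It was introduced in Cressie and Read (1984) and consolidated and updated in Read and Cressie (1988). The power-divergence statistic is defined as

$$2nI^{\lambda} = \frac{2}{\lambda(\lambda+1)} \sum_{i} O_i \left[ \left( \frac{O_i}{E_i} \right)^{\lambda} - 1 \right],$$

where $O_i$ and $E_i$ are the observed and expected counts of cell $i$ and $\lambda$ is the family parameter; the cases $\lambda = 0$ and $\lambda = -1$ are defined as limits.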
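A minimal Python sketch of this family, assuming the standard Cressie-Read form spelled out in the docstring. The function name `power_divergence` and the example counts are hypothetical, and the cells are passed as flat lists.

```python
import math

def power_divergence(observed, expected, lam):
    """Cressie-Read statistic 2/(lam*(lam+1)) * sum(O * ((O/E)**lam - 1)).

    lam = 1 gives Pearson's chi-squared, lam = 0 (as a limit) the
    likelihood-ratio statistic, lam = -1/2 the Freeman-Tukey statistic.
    Assumes positive expected counts, and positive observed counts when lam < 0.
    """
    pairs = list(zip(observed, expected))
    if lam == 0:
        # Limit lam -> 0; cells with O = 0 contribute nothing.
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    if lam == -1:
        # Limit lam -> -1.
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)
    return 2.0 / (lam * (lam + 1)) * sum(o * ((o / e) ** lam - 1.0) for o, e in pairs)

obs = [10, 20, 30, 40]
exp = [12.0, 18.0, 28.0, 42.0]
for lam in (1, 2 / 3, 0, -0.5):
    print(lam, round(power_divergence(obs, exp, lam), 4))
```

Running the loop shows how close the members of the family are to each other on a table without extreme cells.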

Pearson's chi-squared test, p- and N-values

The standard test for contingency tables is Pearson's chi-squared test. Usually, conclusions about its significance are drawn from the p-value. The main deficiency of the chi-squared test, however, is its dependence on the sample size N. TURNER therefore calculates the so-called N-value, that is, the total number of observations a table with the same cell proportions would need in order to be significant. The N-value is especially advantageous when comparing tables.
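For fixed cell proportions Pearson's statistic grows in proportion to the sample size, so one way to read the definition is as N times the critical value divided by the observed statistic. The Python sketch below follows that reading; whether TURNER computes the N-value in exactly this way is an assumption, and the function name `n_value` and the example numbers are hypothetical. The critical value is passed in rather than computed, for example 3.841 for one degree of freedom at the 5% level.

```python
def n_value(chi2, n, critical):
    """Sample size at which a table with the same cell proportions
    reaches the given critical value of the chi-squared distribution."""
    if chi2 <= 0:
        raise ValueError("a table with chi2 = 0 never becomes significant")
    return n * critical / chi2

# A 2 x 2 table with 100 observations and a chi-squared statistic of 0.7937.
print(round(n_value(chi2=0.7937, n=100, critical=3.841)))
```

Because the result depends only on the cell proportions and the chosen level, tables of different sizes can be compared on the same scale.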

Loglinear Models

The definition and testing of models for discrete multivariate data has been the subject of much statistical research in the last quarter century. A very widespread and powerful method is loglinear modeling. Loglinear models share many features with linear model methods for continuous variables. They describe association patterns among categorical variables. With the loglinear approach, cell counts in a contingency table are modelled in terms of the associations between variables. The linear predictors of the models have a structure analogous to that of ANOVA models.

Like much other software, TURNER is restricted to hierarchical loglinear models. A model is hierarchical if, whenever it contains a higher-order effect, it also contains all lower-order effects composed of the variables involved. By default a loglinear model in TURNER includes all main effects, and whenever a higher-order effect is selected the model is automatically made hierarchical.
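Making a model hierarchical means adding every subset of the variables in each selected effect. A minimal Python sketch of that closure follows, assuming effects are written as tuples of variable names; `hierarchical_closure` is a hypothetical name and this is not TURNER's own code.

```python
from itertools import combinations

def hierarchical_closure(variables, selected_effects):
    """Return all effects of the hierarchical model generated by the selection."""
    # All main effects are included by default.
    effects = {frozenset([v]) for v in variables}
    for effect in selected_effects:
        members = sorted(effect)
        # Add the effect itself and every lower-order effect it implies.
        for size in range(1, len(members) + 1):
            for subset in combinations(members, size):
                effects.add(frozenset(subset))
    return sorted((tuple(sorted(e)) for e in effects), key=lambda t: (len(t), t))

print(hierarchical_closure(["A", "B", "C"], [("A", "B")]))
# [('A',), ('B',), ('C',), ('A', 'B')]
```

Selecting the three-way effect of A, B and C in the same way would pull in all three two-way effects as well.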

September 1997