UTCDAK (Censored Data Analysis Kit) is a tool for computing the mean and standard deviation of a sample that contains censored data points. The user can choose from nine different methods. The system can be run as an applet with the default data on this web page, or as an application on any data on your computer.
The methods are based on Statistics for Environmental Engineers by Berthouex and Brown (1996).
Real - This method computes the mean and standard deviation of the data, disregarding the detection limit (DL). This is the preferred method, but it usually cannot be used because values are reported only as below detection limit (BDL) and the real values are not known.
Zero - A modified sample is created by replacing below detection limit (BDL) values with zero. The mean and standard deviation are computed from the modified sample.
Half - A modified sample is created by replacing below detection limit (BDL) values with half the detection limit (DL). The mean and standard deviation are computed from the modified sample.
DL - A modified sample is created by replacing below detection limit (BDL) values with the detection limit (DL). The mean and standard deviation are computed from the modified sample.
Delete - A modified sample is created by deleting the below detection limit (BDL) values. The mean and standard deviation are computed from the modified sample.
Median - The median value of the sample is taken to be the mean. The standard deviation is not computed.
Trimmed - A modified sample is created by deleting the below detection limit (BDL) values as well as the same number of data points from the other (high) end of the distribution. The mean is computed from the modified sample. The standard deviation is not computed.
Winsorized - This method is similar to the Trimmed method, but more complicated. Both the mean and the standard deviation are calculated. See Berthouex and Brown (1996) for details.
Cohen - This is Cohen's Maximum Likelihood Estimator method, the most complicated method supported by the system. It finds the parameters (mean and standard deviation) that maximize the likelihood of obtaining the values of the sample. See Berthouex and Brown (1996) for details.
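The replacement-based methods (Zero, Half, DL, Delete) and the Trimmed method can be sketched in a few lines of Java. This is a minimal illustration, not the code in Utcdak.java: the class name CensoredSketch, the Rule enum, the use of Double.NaN to mark a BDL observation, the single detection limit, and the n - 1 denominator in the standard deviation are all assumptions of the sketch. The Median, Winsorized and Cohen methods are not covered.

```java
import java.util.Arrays;

/** Hypothetical helper class for illustration; not taken from Utcdak.java. */
class CensoredSketch {

    /** Replacement rule applied to below-detection-limit (BDL) observations. */
    enum Rule { ZERO, HALF, DL, DELETE }

    /** Builds the modified sample; BDL observations are encoded as Double.NaN (assumption). */
    static double[] modify(double[] sample, double dl, Rule rule) {
        double[] out = new double[sample.length];
        int n = 0;
        for (double x : sample) {
            if (!Double.isNaN(x)) {
                out[n++] = x;            // measured value: keep as reported
            } else if (rule == Rule.ZERO) {
                out[n++] = 0.0;          // BDL replaced with zero
            } else if (rule == Rule.HALF) {
                out[n++] = dl / 2.0;     // BDL replaced with half the detection limit
            } else if (rule == Rule.DL) {
                out[n++] = dl;           // BDL replaced with the detection limit
            }                            // Rule.DELETE: the observation is dropped
        }
        return Arrays.copyOf(out, n);
    }

    /** Trimmed sample: drop the k BDL values and the k largest measured values. */
    static double[] trim(double[] sample) {
        double[] measured = modify(sample, 0.0, Rule.DELETE);
        int k = sample.length - measured.length;   // number of BDL observations
        Arrays.sort(measured);
        return Arrays.copyOf(measured, Math.max(0, measured.length - k));
    }

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Sample standard deviation with n - 1 in the denominator (assumption). */
    static double stdDev(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return Math.sqrt(ss / (x.length - 1));
    }

    public static void main(String[] args) {
        double bdl = Double.NaN;
        double[] sample = { bdl, bdl, 0.4, 0.6, 0.8, 1.0 };   // made-up values
        double dl = 0.2;                                      // made-up detection limit
        for (Rule rule : Rule.values()) {
            double[] m = modify(sample, dl, rule);
            System.out.printf("%-6s mean=%.4f sd=%.4f%n", rule, mean(m), stdDev(m));
        }
        System.out.printf("TRIM   mean=%.4f%n", mean(trim(sample)));
    }
}
```

Running the sketch on a sample with a growing share of BDL values shows how far the replacement rules drift apart as censoring increases.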
Entering/Modifying Data
Data can be scrolled with the 'UP' and 'DOWN' buttons. Each field in the data value column holds either a data value or "BDL" if the real data value is unknown. If the radio button for multiple detection limits is active, each field in the detection limit column holds either the detection limit or "ND" if the detection limit is unknown.
Specifying Single or Multiple Detection Limit (DL)
The user can specify either a single detection limit or multiple detection limits by activating the appropriate radio button. Note that only the fields corresponding to the selected way of specifying the detection limit are enabled.
Specifying Methods
The methods are specified on the right side of the window. A check mark indicates that the method will be used.
Calculating
Calculations are done with the 'Calculate' button.
Loading Data (Application Only)
The name of a data file containing the data can be specified at the command line when starting the system. See the Application section below for details.
St. Charles Bay, Texas Ammonia Nitrogen Data
The applet and the application start with a default sample data set. The sample data are 147 Ammonia Nitrogen measurements [mg/L] from St. Charles Bay in South Texas. The data were compiled by Ward and Armstrong (1997). Three data points were assumed to be errors because they lay more than three standard deviations from the mean, and they were removed.
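The three-standard-deviation screen mentioned above can be sketched as follows. The source does not show how the screening was actually carried out, so this is only an illustration: the class name ThreeSigmaScreen is hypothetical, the rule is applied in a single pass to measured values only, and the standard deviation uses n - 1 in the denominator.

```java
import java.util.Arrays;

/** Hypothetical helper; the source does not show how the screening was done. */
class ThreeSigmaScreen {

    /** Keeps the values that lie within three sample standard deviations of the mean. */
    static double[] screen(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        final double mean = sum / x.length;

        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        final double sd = Math.sqrt(ss / (x.length - 1));   // n - 1 is an assumption

        // Single pass: the mean and sd are not recomputed after points are removed.
        return Arrays.stream(x)
                     .filter(v -> Math.abs(v - mean) <= 3.0 * sd)
                     .toArray();
    }

    public static void main(String[] args) {
        double[] x = new double[20];
        Arrays.fill(x, 1.0);      // made-up values
        x[7] = 100.0;             // one made-up suspect point
        double[] kept = screen(x);
        System.out.println("kept " + kept.length + " of " + x.length + " values");
    }
}
```

Because the suspect points inflate the standard deviation used to judge them, a screen of this kind only flags a point when the sample is reasonably large.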
Simple Nitrate Sample Data
A sample of nitrate measurements taken from Berthouex and Brown (1996) was used during the development of the system. This allowed easy verification of the results, because Berthouex and Brown use the same data set in the examples that illustrate the methods. Try increasing the detection limit while computing several methods, and see which methods still give good estimates of the mean and/or standard deviation.
Application
The same Java class used by the applet above can be run as an application. To run the application, download Utcdak.java, then compile and execute it with your Java interpreter using the following syntax:
The Java compiler (javac) and interpreter (java) are part of the Java Developer's Kit (JDK) which can be downloaded from Sun's Java web page.
The data file is optional. It is in ASCII format, with the number of data points on the first line followed by one data point per line. On each line the data value and the detection limit are separated by a blank space. If every line after the first contains only one value, the second value on the first line is taken as the detection limit for all the data points. Check the sample data for examples. The first few lines of the default data file follow.
149
0.1000 ND
0.0200 ND
0.3400 ND
0.3300 ND
0.1200 ND
0.0400 ND
BDL 0.0300
BDL 0.0300
0.1700 ND
BDL 0.0500
BDL 0.0500
0.0700 ND
0.0800 ND
...
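A reader for the file layout described above might look like the following sketch. It is not the parser in Utcdak.java: the class name DataFileSketch, the Point type, and the choice of Double.NaN to represent both "BDL" and "ND" are assumptions made for illustration, and the embedded sample is a shortened, made-up file in the same layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader for the data file layout; not taken from Utcdak.java. */
class DataFileSketch {

    /** One observation; NaN stands for "BDL" in value and "ND" in detectionLimit (assumption). */
    static final class Point {
        final double value;
        final double detectionLimit;

        Point(double value, double detectionLimit) {
            this.value = value;
            this.detectionLimit = detectionLimit;
        }
    }

    /** Parses the lines of a data file: a count line, then one data point per line. */
    static List<Point> parse(List<String> lines) {
        String[] header = lines.get(0).trim().split("\\s+");
        int n = Integer.parseInt(header[0]);
        // Optional second value on the first line: one detection limit for all points.
        double commonDl = header.length > 1 ? parseField(header[1], "ND") : Double.NaN;

        List<Point> points = new ArrayList<>();
        for (int i = 1; i < lines.size() && points.size() < n; i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) continue;                 // tolerate blank lines
            String[] f = line.split("\\s+");
            double value = parseField(f[0], "BDL");
            double dl = f.length > 1 ? parseField(f[1], "ND") : commonDl;
            points.add(new Point(value, dl));
        }
        return points;
    }

    /** Returns NaN for the given marker ("BDL" or "ND"), otherwise the parsed number. */
    static double parseField(String token, String missingMarker) {
        return token.equalsIgnoreCase(missingMarker) ? Double.NaN : Double.parseDouble(token);
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("3", "0.1000 ND", "BDL 0.0300", "0.1700 ND");
        for (Point p : parse(lines)) {
            System.out.println("value=" + p.value + " dl=" + p.detectionLimit);
        }
    }
}
```

Keeping the per-point detection limit next to each value means the same structure serves both the single and the multiple detection limit layouts.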
Berthouex, P. M., and L. C. Brown, Statistics for Environmental Engineers, Lewis, Boca Raton, Florida, 1996.
Ward, G. H., and N. E. Armstrong, Corpus Christi Bay National Estuary Program: Ambient Water, Sediment and Tissue Quality of Corpus Christi Bay Study Area: Present Status and Historical Trends, Summary Report, Draft, Center for Research in Water Resources, The University of Texas at Austin, 1996.