This article develops and fits probability distributions for the variability in projected (total) job tenure for adult men and women in 31 industries and 22 occupations, based on data reported by the U.S. Department of Labor's Bureau of Labor Statistics. It extends previously published results and updates them from January 1987 to February 1996. The model provides probability distributions for the variability in projected (total) job tenures within the time range of the data, and it extrapolates the distributions beyond the time range of the data, i.e., beyond 25 years.
Using probability plots and Maximum Likelihood Estimation (MLE), we fit lognormal distributions to data compiled by Ershow et al. for daily intake of total water and tap water by three groups of women (controls, pregnant, and lactating; all between 15–49 years of age) in the United States. We also develop bivariate lognormal distributions for the joint distribution of water ingestion and body weight for these three groups. Overall, we recommend the marginal distributions for water intake as fit by MLE for use in human health risk assessments.
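For a two-parameter lognormal, the maximum likelihood estimates have a closed form. They are the mean and the divide-by-n standard deviation of the logged data. For a bivariate lognormal, the MLE also includes the correlation of the logs. The sketch below illustrates only those textbook formulas. The function names are hypothetical, and the example values are invented rather than taken from the Ershow et al. data.

```python
import math
from statistics import fmean

def fit_lognormal_mle(values):
    """MLE for a two-parameter lognormal: (mu, sigma) of ln(x).

    The MLE of sigma divides by n, not n - 1.
    """
    logs = [math.log(v) for v in values]
    mu = fmean(logs)
    sigma = math.sqrt(fmean([(y - mu) ** 2 for y in logs]))
    return mu, sigma

def fit_bivariate_lognormal_mle(xs, ys):
    """MLE for a bivariate lognormal: marginal (mu, sigma) pairs and the
    correlation between ln(x) and ln(y)."""
    mu_x, s_x = fit_lognormal_mle(xs)
    mu_y, s_y = fit_lognormal_mle(ys)
    cov = fmean([(math.log(x) - mu_x) * (math.log(y) - mu_y)
                 for x, y in zip(xs, ys)])
    return (mu_x, s_x), (mu_y, s_y), cov / (s_x * s_y)

# Hypothetical paired observations: water intake (L/day) and body weight (kg).
intake = [1.2, 0.8, 1.5, 2.0, 1.1]
weight = [55.0, 60.0, 72.0, 80.0, 58.0]
print(fit_bivariate_lognormal_mle(intake, weight))
```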
This paper reanalyzes the dataset cited by the U.S. Environmental Protection Agency in its Exposure Factors Handbook that contains measurements of skin area, height, and body weight for 401 people spanning all stages of development. The reanalysis shows that a univariate model for total skin area as a function of body weight gives useful and practical results with little or no loss of reliability as compared to the Agency's bivariate model. This result leads to a new method for developing lognormal distributions for total skin area as a function of body weight alone.
In 1987, James and Knuiman published their analysis of a comprehensive domestic water use study conducted in Perth, Western Australia, to quantify the components of water usage in approximately 3000 households. This manuscript corrects errors and omissions in the U.S. EPA's Exposure Factors Handbook concerning James and Knuiman's study, and it presents James and Knuiman's results in a form and notation more readily used in Monte Carlo simulations.
In recent years the U.S. Environmental Protection Agency has been challenged both externally and internally to move beyond its traditional conservative single‐point treatment of various input parameters in risk assessments. In the first section, we assess when more involved distribution‐based analyses might be indicated for such common types of risk assessment applications as baseline assessments of Superfund sites. Then in two subsequent sections, we give an overview with some case studies of technical analyses of (A) variability/heterogeneity and (B) uncertainty. By "inter‐individual variability" we mean the real variation among individuals in exposure‐producing behavior, in exposures, or in some other parameter (such as differences among individual municipal solid waste incinerators in emissions). In contrast, "uncertainty" is a description of the imperfection in knowledge of the true value of a particular parameter or its real variability in an individual or a group. In general, uncertainty is reducible by additional information‐gathering or analysis activities (better data, better models), whereas real variability will not change (although it may be more accurately known) as a result of better or more extensive measurements. The purpose of the rather long‐winded exposition of these two final sections is to show the differences between analyses of these two different things, both of which are described using the language of probability distributions.
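A common way to keep the two ideas apart in a simulation is a nested, or two-dimensional, Monte Carlo. The outer loop draws one plausible set of uncertain parameters. The inner loop draws many individuals from the variability distribution implied by those parameters. The sketch below assumes a hypothetical exposure model with invented parameters: ln(exposure) is normal across people, the mean of ln(exposure) is itself uncertain, and the spread is taken as known. The result is an uncertainty distribution for one statistic of variability, here the 95th percentile.

```python
import random
from statistics import quantiles

def two_dimensional_mc(n_outer=200, n_inner=1000, seed=1):
    """Nested Monte Carlo: outer loop = uncertainty, inner loop = variability.

    Hypothetical model: ln(exposure) ~ Normal(mu, sigma) across individuals,
    with mu uncertain, mu ~ Normal(0.0, 0.1), and sigma = 0.5 treated as known.
    Returns one 95th-percentile-of-variability value per outer draw.
    """
    rng = random.Random(seed)
    sigma = 0.5
    p95_draws = []
    for _ in range(n_outer):
        mu = rng.gauss(0.0, 0.1)  # one plausible "true" value of the uncertain mean
        people = [rng.lognormvariate(mu, sigma) for _ in range(n_inner)]
        p95_draws.append(quantiles(people, n=100)[94])  # 95th percentile of variability
    return p95_draws

draws = two_dimensional_mc()
cuts = quantiles(draws, n=20)
print("90% uncertainty interval for the 95th percentile:", cuts[0], cuts[-1])
```

Gathering more information would narrow the spread of `draws`, which reflects uncertainty. It would not narrow the spread within `people`, which reflects variability.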
For the U.S. population, we fit bivariate distributions to estimated numbers of men and women aged 18‐74 years in cells representing 1 in. intervals in height and 10 lb intervals in weight. For each sex separately, the marginal histogram of height is well fit by a normal distribution. The marginal histogram of weight is well fit by a lognormal distribution for men and satisfactorily fit by one for women. For men, the bivariate histogram is satisfactorily fit by a bivariate normal distribution of height and the natural logarithm of weight. For women, the bivariate histogram is satisfactorily fit by two superposed bivariate normal distributions of height and the natural logarithm of weight. The resulting distributions are suitable for use in public health risk assessments.
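In a Monte Carlo assessment, a fitted bivariate normal on (height, ln weight) is typically sampled by drawing height first. The next step draws ln weight from its conditional normal distribution given that height. A mixture of two such components, as described for women, adds a Bernoulli choice of component. The sketch below illustrates that sampling scheme. Every parameter value is hypothetical and is not taken from the paper's fits.

```python
import math
import random

def sample_height_weight(rng, mean_h, sd_h, mean_lnw, sd_lnw, rho):
    """Draw (height, weight) from a bivariate normal on (height, ln weight),
    using the conditional distribution of ln weight given height."""
    h = rng.gauss(mean_h, sd_h)
    cond_mean = mean_lnw + rho * sd_lnw / sd_h * (h - mean_h)
    cond_sd = sd_lnw * math.sqrt(1.0 - rho ** 2)
    return h, math.exp(rng.gauss(cond_mean, cond_sd))

def sample_mixture(rng, p, comp1, comp2):
    """Two superposed bivariate normals: use comp1 with probability p."""
    params = comp1 if rng.random() < p else comp2
    return sample_height_weight(rng, *params)

# Hypothetical parameters (height in inches, weight in pounds), NOT fitted values:
# (mean height, SD height, mean ln weight, SD ln weight, correlation).
MEN = (69.0, 2.8, math.log(170.0), 0.17, 0.45)
WOMEN_1 = (63.5, 2.6, math.log(135.0), 0.15, 0.35)
WOMEN_2 = (64.0, 2.6, math.log(175.0), 0.20, 0.30)

rng = random.Random(42)
men = [sample_height_weight(rng, *MEN) for _ in range(5000)]
women = [sample_mixture(rng, 0.7, WOMEN_1, WOMEN_2) for _ in range(5000)]
```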
In: Journal of Risk Research: the official journal of the Society for Risk Analysis Europe and the Society for Risk Analysis Japan, Vol. 4, No. 1, pp. 49-62
Finite mixture models, that is, weighted averages of parametric distributions, provide a powerful way to extend parametric families of distributions to fit data sets not adequately fit by a single parametric distribution. First‐order finite mixture models have been widely used in the physical, chemical, biological, and social sciences for over 100 years. Using maximum likelihood estimation, we demonstrate how a first‐order finite mixture model can represent the large variability in data collected by the U.S. Environmental Protection Agency for the concentration of Radon 222 in drinking water supplied from ground water, even when 28% of the data fall at or below the minimum reporting level. Extending the use of maximum likelihood, we also illustrate how a second‐order finite mixture model can separate and represent both the variability and the uncertainty in the data set.
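With values at or below a minimum reporting level (MRL), the likelihood has two kinds of terms. Each detected value contributes the mixture density. Each censored value contributes the mixture CDF evaluated at the MRL. The sketch below writes that log-likelihood, assuming two lognormal components and a single MRL. The abstract does not state the component families, so treat lognormal as an assumption. The result could be handed, negated, to a general-purpose optimizer, subject to 0 < w < 1 and positive sigmas. The data and parameters in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

def mixture_pdf(x, w, mu1, s1, mu2, s2):
    """Density of w*LN(mu1, s1) + (1 - w)*LN(mu2, s2) at x > 0."""
    z = math.log(x)
    return (w * NormalDist(mu1, s1).pdf(z) + (1 - w) * NormalDist(mu2, s2).pdf(z)) / x

def mixture_cdf(x, w, mu1, s1, mu2, s2):
    """CDF of the same two-component lognormal mixture at x > 0."""
    z = math.log(x)
    return w * NormalDist(mu1, s1).cdf(z) + (1 - w) * NormalDist(mu2, s2).cdf(z)

def censored_log_likelihood(params, detects, n_censored, mrl):
    """Log-likelihood: detects use the density; n_censored values are known
    only to be <= mrl, so each contributes log CDF(mrl)."""
    ll = sum(math.log(mixture_pdf(x, *params)) for x in detects)
    ll += n_censored * math.log(mixture_cdf(mrl, *params))
    return ll

# Hypothetical example: params = (w, mu1, sigma1, mu2, sigma2).
detects = [150.0, 300.0, 420.0, 900.0, 2500.0]
print(-censored_log_likelihood((0.6, 5.0, 1.0, 7.0, 1.2), detects, n_censored=2, mrl=100.0))
```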
Risk assessors often use probability plots both to assess the fit of a particular distribution or model, by comparing the plotted points to a straight line, and to estimate the parameters of parametric distributions or models. When empirical data do not fall in a sufficiently straight line on a probability plot, and when no other single parametric distribution provides an acceptable (graphical) fit to the data, the risk assessor may consider a mixture model with two component distributions. Animated probability plots are a way to visualize the possible behaviors of such mixture models, and they can help an analyst pick some plausible mixture models for the data based on their qualitative fit. After using animations during exploratory data analysis, the analyst must then use other statistical tools, including but not limited to Maximum Likelihood Estimation (MLE) to find the optimal parameters, and Goodness of Fit (GoF) tests and a variety of diagnostic plots to check the adequacy of the fit. Using a specific example with two lognormal components, we illustrate the use of animated probability plots as a tool for exploring the suitability of a mixture model with two component distributions. Animations also work well with other types of probability plots, and they may be extended to analyze mixture models with three or more component distributions.
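One way to build the frames of such an animation is to vary one mixture parameter and compute the mixture's quantiles at fixed plotting positions. Each frame can then be drawn as a lognormal probability plot, with ln(quantile) plotted against the normal score. In this sketch the varied parameter is assumed to be the mixing weight p. The Hazen plotting positions and component parameters are also illustrative choices, not the paper's. Frames with p = 0 or p = 1 are straight lines, because each is a single lognormal. Intermediate frames bend. Rendering the frames with a plotting library is omitted.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def mixture_quantile(q, p, mu1, s1, mu2, s2):
    """q-quantile of p*LN(mu1, s1) + (1 - p)*LN(mu2, s2), by bisection on ln x."""
    def cdf(z):
        return p * NormalDist(mu1, s1).cdf(z) + (1 - p) * NormalDist(mu2, s2).cdf(z)
    lo = min(mu1 - 10 * s1, mu2 - 10 * s2)
    hi = max(mu1 + 10 * s1, mu2 + 10 * s2)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def animation_frames(mu1, s1, mu2, s2, n_frames=11, n_points=25):
    """One frame per mixing weight p in [0, 1]: (p, [(z, ln quantile), ...])."""
    probs = [(i - 0.5) / n_points for i in range(1, n_points + 1)]  # Hazen positions
    frames = []
    for k in range(n_frames):
        p = k / (n_frames - 1)
        pts = [(STD.inv_cdf(q), math.log(mixture_quantile(q, p, mu1, s1, mu2, s2)))
               for q in probs]
        frames.append((p, pts))
    return frames

frames = animation_frames(0.0, 0.5, 2.0, 0.3)
```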
Variability arises due to differences in the value of a quantity among different members of a population. Uncertainty arises due to lack of knowledge regarding the true value of a quantity for a given member of a population. We describe and evaluate two methods for quantifying both variability and uncertainty. These methods, bootstrap simulation and a likelihood‐based method, are applied to three datasets. The datasets include a synthetic sample of 19 values from a lognormal distribution, a sample of nine values obtained from measurements of the PCB concentration in leafy produce, and a sample of five values for the partitioning of chromium in the flue gas desulfurization system of coal‐fired power plants. For each of these datasets, we employ the two methods to characterize uncertainty in the arithmetic mean and standard deviation, cumulative distribution functions based upon fitted parametric distributions, the 95th percentile of variability, and the 63rd percentile of uncertainty for the 81st percentile of variability. The latter is intended to show that it is possible to describe any point within the uncertain frequency distribution by specifying an uncertainty percentile and a variability percentile. Using the bootstrap method, we compare results based upon the method of matching moments and the method of maximum likelihood for fitting distributions to data. Our results indicate that with only 5 to 19 data points, as in the datasets we evaluated, there is substantial uncertainty due to random sampling error. Both the bootstrap and likelihood‐based approaches yield comparable uncertainty estimates in most cases.
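The "63rd percentile of uncertainty for the 81st percentile of variability" can be illustrated with a parametric bootstrap. First, fit a lognormal by MLE. Then repeatedly resample n values from the fit, refit, and record the 81st percentile of each refitted distribution. Those replicates describe the uncertainty about that percentile, and their own 63rd percentile is the requested point. The sketch below is an illustrative version, assuming a lognormal model and MLE refits. The data are synthetic and hypothetical, not the paper's datasets, and the paper's exact bootstrap settings are not reproduced.

```python
import math
import random
from statistics import NormalDist, fmean, quantiles

def fit_lognormal_mle(xs):
    """MLE (mu, sigma) of ln(x); sigma divides by n."""
    logs = [math.log(x) for x in xs]
    mu = fmean(logs)
    return mu, math.sqrt(fmean([(y - mu) ** 2 for y in logs]))

def bootstrap_percentile(data, var_pct, n_boot=2000, seed=0):
    """Parametric bootstrap of the var_pct percentile of variability.

    Each replicate draws len(data) values from the fitted lognormal, refits
    by MLE, and records the var_pct percentile of the refitted distribution.
    """
    rng = random.Random(seed)
    mu, sigma = fit_lognormal_mle(data)
    z = NormalDist().inv_cdf(var_pct / 100)
    reps = []
    for _ in range(n_boot):
        sample = [rng.lognormvariate(mu, sigma) for _ in data]
        m, s = fit_lognormal_mle(sample)
        reps.append(math.exp(m + z * s))
    return sorted(reps)

# Hypothetical synthetic sample of 19 lognormal values.
rng = random.Random(42)
data = [rng.lognormvariate(0.0, 1.0) for _ in range(19)]
reps = bootstrap_percentile(data, var_pct=81)
u63_of_v81 = quantiles(reps, n=100)[62]  # 63rd percentile of uncertainty
```

Swapping `fit_lognormal_mle` for a method-of-moments fit would give the kind of comparison the abstract describes.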
Using exploratory data analysis, probability plots, scatterplots, and computer animations to rotate and visualize the data, we fit a trivariate Normal distribution to data for the height, the natural logarithm of body weight, and the body fat for 646 men between the ages of 50 and 80 years as reported by the medical staff of the U.S. Veterans Administration's "Normative Aging Study" in Boston, MA. Although these data do not include any children, women, or young men, the measurements represent the best data that we could find through a 4‐year search. We believe that these data are well measured and reliable for men in the specified age range and that these data reveal an interesting statistical pattern for use in probabilistic PBPK models.
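A fitted trivariate normal on (height, ln body weight, body fat) is usually sampled for a probabilistic PBPK model with a Cholesky factor of its covariance matrix. The sketch below uses a small pure-Python Cholesky decomposition. All means, standard deviations, and correlations in it are hypothetical placeholders, not the values fitted to the Normative Aging Study data.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a small symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def sample_mvn(rng, mean, chol):
    """One multivariate normal draw: mean + L z, with z standard normal."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]

# Hypothetical parameters (height in inches, ln weight in ln lb, body fat in %).
MEAN = [68.5, math.log(180.0), 24.0]
SDS = [2.6, 0.14, 5.0]
CORR = [[1.0, 0.5, -0.1], [0.5, 1.0, 0.6], [-0.1, 0.6, 1.0]]
COV = [[CORR[i][j] * SDS[i] * SDS[j] for j in range(3)] for i in range(3)]

rng = random.Random(3)
chol = cholesky(COV)
height, ln_weight, body_fat = sample_mvn(rng, MEAN, chol)
```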
Based on results from the second National Health and Nutrition Examination Survey (NHANES II) for people living in the United States during 1976–1980, we use exploratory data analysis, probability plots, and the method of maximum likelihood to fit lognormal distributions to percentiles of body weight for males and females as a function of age from 6 months through 74 years. The results are immediately useful in probabilistic (and deterministic) risk assessments.
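When only reported percentiles are available, a lognormal implies a straight line on a probability plot. Specifically, ln(value at percentile p) = mu + sigma * z_p, where z_p is the standard normal quantile. The abstract says the fits used maximum likelihood. The sketch below swaps in a simpler least-squares line through those points to show the relationship. The percentile values in the usage line are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def lognormal_from_percentiles(percentiles):
    """Least-squares fit of ln(value) = mu + sigma * z_p on a probability plot.

    `percentiles` maps a percent (e.g. 50) to the reported value at that percentile.
    """
    zs = [NormalDist().inv_cdf(p / 100) for p in percentiles]
    ys = [math.log(v) for v in percentiles.values()]
    zbar, ybar = fmean(zs), fmean(ys)
    sigma = (sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
             / sum((z - zbar) ** 2 for z in zs))
    mu = ybar - sigma * zbar
    return mu, sigma

# Hypothetical body-weight percentiles (kg) for one age and sex group.
print(lognormal_from_percentiles({5: 60.0, 25: 68.0, 50: 75.0, 75: 84.0, 95: 98.0}))
```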
The purpose of this paper is to undertake a statistical analysis to specify empirical distributions and to estimate univariate parametric probability distributions for air exchange rates for residential structures in the United States. To achieve this goal, we used data compiled by the Brookhaven National Laboratory using a method known as the perfluorocarbon tracer (PFT) technique. While these data are not fully representative of all areas of the country or all housing types, they are judged to be by far the best available. The analysis is characterized by four key points: the use of data for 2,844 households; a four‐region breakdown based on heating degree days, a best‐available measure of climatic factors affecting air exchange rates; estimation of lognormal distributions as well as provision of empirical (frequency) distributions; and provision of these distributions for all of the data, for the data segmented by the four regions, for the data segmented by the four seasons, and for the data segmented by a 16‐cell region‐by‐season breakdown. Except in a few cases, primarily for small sample sizes, air exchange rates were found to be well fit by lognormal distributions (adjusted R² ≥ 0.95). The empirical or lognormal distributions may be used in indoor air models or as input variables for probabilistic human health risk assessments.
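An adjusted R² for a lognormal fit is commonly obtained from a probability plot. The sorted ln(data) are regressed on normal scores of plotting positions. With one predictor, the adjusted value is 1 - (1 - R²)(n - 1)/(n - 2). The sketch below assumes Blom plotting positions, (i - 0.375)/(n + 0.25). The abstract does not say which plotting positions or regression details the authors used.

```python
import math
from statistics import NormalDist, fmean

def lognormal_plot_adjusted_r2(values):
    """Regress sorted ln(x) on Blom normal scores; return (mu, sigma, adjusted R^2)."""
    xs = sorted(values)
    n = len(xs)
    zs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    ys = [math.log(x) for x in xs]
    zbar, ybar = fmean(zs), fmean(ys)
    szz = sum((z - zbar) ** 2 for z in zs)
    szy = sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    slope = szy / szz                      # estimate of sigma
    r2 = szy ** 2 / (szz * syy)            # R^2 of the simple regression
    adj = 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor
    return ybar - slope * zbar, slope, adj

# Hypothetical air exchange rates (air changes per hour).
print(lognormal_plot_adjusted_r2([0.21, 0.35, 0.40, 0.52, 0.61, 0.75, 0.90, 1.3]))
```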
Fish consumption rates play a critical role in the assessment of human health risks posed by the consumption of fish from chemically contaminated water bodies. Based on data from the 1989 Michigan Sport Anglers Fish Consumption Survey, we examined total fish consumption, consumption of self‐caught fish, and consumption of Great Lakes fish for all adults, men, women, and certain higher risk subgroups such as anglers. We present average daily consumption rates as compound probability distributions consisting of a Bernoulli trial (to distinguish those who ate fish from those who did not) combined with a distribution (both empirical and parametric) for those who ate fish. We found that the average daily consumption rates for adults who ate fish are reasonably well fit by lognormal distributions. The compound distributions may be used as input variables for Monte Carlo simulations in public health risk assessments.
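A compound distribution of this kind can be sampled in two steps. A Bernoulli trial decides whether a person ate fish, giving 0 with probability 1 - p. For those who did, a lognormal draw gives the consumption rate. Its population mean is p * exp(mu + sigma²/2). The sketch below uses hypothetical values for p, mu, and sigma, not the Michigan survey fits.

```python
import math
import random

def sample_daily_consumption(rng, p_eater, mu, sigma):
    """Compound draw: 0.0 with probability 1 - p_eater, else lognormal(mu, sigma)."""
    if rng.random() < p_eater:
        return rng.lognormvariate(mu, sigma)
    return 0.0

def compound_mean(p_eater, mu, sigma):
    """Population mean of the compound distribution: p * exp(mu + sigma^2 / 2)."""
    return p_eater * math.exp(mu + sigma ** 2 / 2)

# Hypothetical parameters: 80% ate fish; median intake 10 g/day among eaters.
rng = random.Random(11)
draws = [sample_daily_consumption(rng, 0.8, math.log(10.0), 1.0) for _ in range(50000)]
print(sum(draws) / len(draws), compound_mean(0.8, math.log(10.0), 1.0))
```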
We propose 14 principles of good practice to assist people in performing and reviewing probabilistic or Monte Carlo risk assessments, especially in the context of the federal and state statutes concerning chemicals in the environment. Monte Carlo risk assessments for hazardous waste sites that follow these principles will be easier to understand, will explicitly distinguish assumptions from data, and will consider and quantify effects that could otherwise lead to misinterpretation of the results. The proposed principles are neither mutually exclusive nor collectively exhaustive. We think and hope that these principles will evolve as new ideas arise and come into practice.