Search results
27 results
Assessing the Influence of "Standard" and "Culturally Specific" Risk Factors on the Prevalence and Frequency of Offending: The Case of Indigenous Australians
In: Race and Justice: RAJ, Volume 3, Issue 1, pp. 58-82
ISSN: 2153-3687
Linkage for risk-sensitive environments: Australian examples of privacy-preserving record linkage using Bloom filters
In: International journal of population data science: (IJPDS), Volume 9, Issue 5
ISSN: 2399-4908
Introduction Organisations are increasingly aware of the risks and responsibilities of handling personally identifying information (PII). These factors not only influence internal data management practices but also impact on data linkage arrangements with other parties. Some linkage environments are particularly sensitive to the release or use of PII. Advances in privacy-preserving record linkage methods such as PPRL-using-Bloom make it possible to undertake highly accurate data linkage without release or disclosure of PII. Such methods play a role in enabling data linkage in risk-sensitive environments.
Objectives and Approach We present and describe several Australian case studies where the PPRL-using-Bloom method has been used to enable data linkage between organisations. We report on the defining elements of each case, the associated risks and solutions, as well as quality and performance issues. We also reflect on challenges and opportunities for future improvement.
Results Australian use cases utilising privacy preserving linkage (PPRL-using-Bloom) include projects linking state-based datasets to Commonwealth datasets, some linking primary care data to state-based secondary healthcare data, and others linking healthcare data to non-health datasets such as police and criminal justice datasets.
Conclusion / Implications Methods such as PPRL-using-Bloom play a critical role in enabling data linkage in highly risk-sensitive environments. However, in an ever-evolving world where risks and requirements are constantly changing, linkage methodologies and technologies must remain adaptable to meet evolving demands.
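As an illustration of the Bloom filter encoding that PPRL methods of this kind rely on, the following Python sketch hashes the character bigrams of an identifier into a fixed-length bit array, so that parties sharing a secret key can compare encodings rather than names. The function names, the filter length of 1000 bits, the 20 positions per bigram and the double-hashing scheme are assumptions chosen for illustration; they are not the configuration used in the case studies.

```python
import hashlib
import hmac


def bigrams(value: str) -> set:
    """Split a padded, lower-cased string into overlapping character pairs."""
    padded = "_" + value.strip().lower() + "_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 1000, num_hashes: int = 20) -> set:
    """Return the positions of the bits set to 1 for this value.

    Hypothetical parameters: `size` bits and `num_hashes` positions per bigram,
    derived by double hashing from two keyed (HMAC) digests.
    """
    positions = set()
    for gram in bigrams(value):
        h1 = int.from_bytes(hmac.new(secret, gram.encode(), hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(secret, gram.encode(), hashlib.sha1).digest(), "big")
        for i in range(num_hashes):
            positions.add((h1 + i * h2) % size)
    return positions


# Similar names share most of their bigrams, so their encodings overlap heavily.
key = b"secret-agreed-by-data-custodians"  # assumed value, for illustration only
smith = bloom_encode("Smith", key)
smyth = bloom_encode("Smyth", key)
jones = bloom_encode("Jones", key)
```

Because only the bit positions leave each organisation, the linkage unit can score `smith` against `smyth` without seeing either name.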
Partial Agreements in Probabilistic Linkages
In: International journal of population data science: (IJPDS), Volume 3, Issue 4
ISSN: 2399-4908
Introduction Record linkage units around the world use probabilistic linkage techniques for routine linkage of large datasets. It is widely known how probabilities are converted to agreement and disagreement weights for each field, yet there has been little exploration of the methodology to optimally convert field similarity scores into partial weights.
Objectives and Approach String similarity comparators such as Jaro-Winkler are commonly used in traditional linkage, while other comparators such as the Sørensen-Dice coefficient, Jaccard similarity and Hamming distance are used in alternative privacy-preserving record linkage techniques. Determining partial weights to apply at each level of similarity is a non-trivial task. However, both types of linkage would greatly benefit from similarity-to-weight functions for each field that maximise the accuracy of the linkage.
We evaluated several methods for computing partial agreement weights and applied these to synthetic datasets with varying levels of corruption. We then evaluated the methods on real administrative datasets.
Results Exact comparisons can miss matches where typographical errors or misspellings produce small changes in value. Similarity comparisons can reduce the number of missed matches, but may also increase the number of incorrect matches.
Various results of the partial agreement methods on Jaro-Winkler, Sørensen-Dice coefficient, Jaccard similarity and Hamming distance comparators will be presented. A generic function to convert similarity values to weights, created from synthetic data, can be used on most datasets with a greatly improved result in linkage quality. However, maximising the linkage quality requires the creation of similarity-to-weight functions that are optimised for each dataset.
Conclusion/Implications Accuracy in record linkage is vital for the correct analysis of linked data. It is even more critical in privacy-preserving record linkage where the ability for clerical review is limited. Optimised functions for converting similarities to partial weights can significantly improve the quality of linkage and should not be overlooked.
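To make the idea of a similarity-to-weight function concrete, the sketch below shows the standard agreement and disagreement weights computed from a field's m- and u-probabilities, together with one simple way of deriving a partial weight from a similarity score. The linear interpolation and the cut-off of 0.8 are assumptions for illustration only; the abstract does not say which functions were evaluated.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Weight added when a field agrees: log2(m / u)."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Weight added when a field disagrees: log2((1 - m) / (1 - u))."""
    return math.log2((1 - m) / (1 - u))


def partial_weight(similarity: float, m: float, u: float, cutoff: float = 0.8) -> float:
    """Map a similarity score in [0, 1] to a weight.

    Assumed scheme: full disagreement at or below `cutoff`, then a straight
    line up to the full agreement weight at similarity 1.0.
    """
    w_agree = agreement_weight(m, u)
    w_disagree = disagreement_weight(m, u)
    if similarity <= cutoff:
        return w_disagree
    fraction = (similarity - cutoff) / (1 - cutoff)
    return w_disagree + fraction * (w_agree - w_disagree)
```

A curve fitted to a specific dataset would replace the straight line in `partial_weight`; the surrounding weight arithmetic stays the same.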
Public Cloud: The Future of Record Linkage?
In: International journal of population data science: (IJPDS), Volume 3, Issue 4
ISSN: 2399-4908
Introduction Businesses worldwide are increasingly adopting the storage, compute and analytical services provided by cloud computing. Yet, few operational linkage units are keeping pace with this world of technological change - most use legacy systems approaching their limits with the rapidly increasing size and range of datasets now required for linkage.
Objectives and Approach To meet the demands of linkage for the near future, it is important that new solutions for linkage consider the services provided by public cloud infrastructure for compute, storage and analytics. We examined Platform as a Service (PaaS) offerings for use in the development of a cost-effective cloud model for scalable, privacy-preserving record linkage (PPRL). PPRL techniques were adapted to maximise the quality of linkage and to automate as much of the process as possible. Finally, a prototype was created to demonstrate the capabilities and potential of the model.
Results We present our cloud model for PPRL, a platform for record linkage that provides rapid scaling of resources to meet demand, and the results of how our prototype performed on massive datasets.
Conclusion/Implications The application of record linkage using relatively inexpensive cloud infrastructure represents a significant step towards providing an efficient and scalable record linkage service to researchers and government. Larger datasets can be linked efficiently, including national or cross-jurisdictional datasets, with little investment in private infrastructure, and improved turnaround times for researchers.
How do socio-demographic differences in administrative records affect the quality (accuracy) of data linkage?
In: International journal of population data science: (IJPDS), Volume 3, Issue 4
ISSN: 2399-4908
Introduction Record linkage is inherently uncertain, with all linkages containing some amount of false positive and false negative errors. Previous results have suggested that linkage error may not be evenly distributed throughout the population, with particular subgroups exhibiting higher rates of linkage error.
Objectives and Approach This study investigated the distribution of linkage error using four large-scale Australian administrative datasets: hospital admissions datasets from Western Australia and New South Wales, and emergency presentation datasets from New South Wales and South Australia. Each dataset had been previously de-duplicated to a very high standard, with large scale manual review taking place; these results were used as our truth set.
Each dataset was linked using probabilistic record linkage with results (precision and recall) compared by gender, age, geographic indices of remoteness and socioeconomic status.
Results Results were highly dataset dependent. Consistent findings were lower linkage quality found for individuals living in remote locations, and lower linkage quality in those in the youngest category (those born after 1980). Some datasets showed lower linkage quality for females, for those in middle age as compared to the elderly, and for those with lower socioeconomic status. The differences in linkage quality found were typically small. Changes in threshold settings had generally no effect on the relationship between sociodemographic characteristics and linkage quality.
Conclusion/Implications Linkage studies focussing on younger individuals and those in remote areas may have greater uncertainty regarding their results. Targeting efforts by linkage units may be required to ensure even distribution of linkage errors. Further research is required into how linkage errors affect research outcomes.
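The comparison of precision and recall across subgroups can be sketched as follows, given the set of record pairs the linkage produced and the set of pairs in the truth set. The function names and the rule of assigning a pair to the subgroup of its first record are assumptions made for this example, not details taken from the study.

```python
def precision_recall(links: set, truth: set) -> tuple:
    """Precision = correct links / all links; recall = correct links / all true pairs."""
    true_positives = len(links & truth)
    precision = true_positives / len(links) if links else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def quality_by_group(links: set, truth: set, group_of: dict) -> dict:
    """Compute (precision, recall) separately for each subgroup label.

    `group_of` maps a record id to a label such as a remoteness category.
    Assumption: a pair belongs to the subgroup of its first record.
    """
    results = {}
    for group in set(group_of.values()):
        group_links = {pair for pair in links if group_of[pair[0]] == group}
        group_truth = {pair for pair in truth if group_of[pair[0]] == group}
        results[group] = precision_recall(group_links, group_truth)
    return results


# Toy data: one false link and one missed match, both among "remote" records.
group_of = {1: "urban", 2: "urban", 3: "urban", 4: "remote", 5: "remote", 6: "remote", 7: "remote"}
truth = {(1, 2), (4, 5), (6, 7)}
links = {(1, 2), (4, 5), (5, 6)}
quality = quality_by_group(links, truth, group_of)
```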
Trajectories of homelessness and association with mental health and substance use disorders among young people transitioning from out-of-home care in Australia
In: Child abuse & neglect: the international journal ; official journal of the International Society for the Prevention of Child Abuse and Neglect, Volume 149, p. 106643
ISSN: 1873-7757
Use of cross-sectoral data linkage to predict high-rate offenders in Western Australia
In: International journal of population data science: (IJPDS), Volume 3, Issue 4
ISSN: 2399-4908
Introduction Studies have repeatedly found that a small number of offenders account for a disproportionate amount of crime. High-rate, persistent offenders (so-called 'prolific' offenders) have a major impact on local crime rates and public perceptions of safety, and place a substantial financial and social burden on communities.
Objectives and Approach Using population-level administrative data, our study identifies 'prolific' offenders in WA and describes their demographic and crime profiles. The official criminal records of all offenders born in WA between 1980 and 1995 were linked to administrative records from health, education and child protection databases (followed to 2005). Linked data on families (parents and siblings) were also included. Using this information, the study identified factors that distinguish between prolific and non-prolific offenders. The study also examined whether correlates of prolific offending were similar between a) male and female offenders, and b) Indigenous and non-Indigenous offenders.
Results Clusters of offenders exhibiting a high-rate and persistent pattern of offending over the life-course were identified. These 'prolific' offenders accounted for a disproportionate amount of crime and criminal justice contacts:
8% of female offenders accounted for 41% of female contacts;
3% of male offenders made up 21% of male contacts;
9% of Indigenous offenders made up 37% of Indigenous contacts;
7% of non-Indigenous offenders made up 36% of non-Indigenous contacts.
Being the subject of a maltreatment allegation, being placed in out-of-home care, and having a serious mental health condition before the age of 18 increased the odds of being a prolific offender. Two criminogenic factors - early onset of offending (contact before age 12) and early violence - emerged as the most significant predictors.
Conclusion/Implications Child protection and mental health services have much of the information needed to target early prevention, while criminal justice agencies are well-placed to apply crime reduction strategies through the targeting of early-onset/early-violent offenders. A combined approach is likely to have the greatest effect on reducing the impact of prolific offending.
Assessing the impact of different grouping methods: time to rethink and regroup?: IJPDS (2017) Issue 1, Vol 1:136, Proceedings of the IPDLN Conference (August 2016)
In: International journal of population data science: (IJPDS), Volume 1, Issue 1
ISSN: 2399-4908
ABSTRACT
Objectives The grouping of record-pairs to determine which administrative records belong to the same individual is an important process in record linkage. A variety of grouping methods are used but the relative benefits of each are unknown. We evaluate a number of grouping methods against the traditional merge based clustering approach using large scale administrative data.
Approach The research aimed to both describe current grouping techniques used for record linkage, and to evaluate the most appropriate grouping method for specific circumstances. A range of grouping strategies were applied to three datasets with known truth sets. Conditions were simulated to appropriately investigate one-to-one, many-to-one and ongoing linkage scenarios.
Results Results suggest alternate grouping methods will yield large benefits in linkage quality, especially when the quality of the underlying repository is high. Stepwise grouping methods were clearly superior for one-to-one linkage. There appeared little difference in linkage quality between many-to-one grouping approaches. The most appropriate techniques for ongoing linkage depended on the quality of the population spine and the underlying dataset.
Conclusions These results demonstrate the large effect that the choice of grouping strategy can have on overall linkage quality. Ongoing linkages to high quality population spines provide large improvements in linkage quality compared to merge based linkages. Procuring or developing such a population spine will provide high linkage quality at far lower cost than current methods for improving linkage quality. By improving linkage quality at low cost, this resource can be further utilised by health researchers.
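For readers unfamiliar with grouping, the sketch below shows one common reading of merge-based grouping: every accepted record pair merges the groups of its two records, so groups are the connected components of the link graph. This interpretation and the union-find implementation are assumptions for illustration; the abstract does not define the approach in detail.

```python
def merge_groups(record_ids, links):
    """Group records by transitive closure over accepted links (union-find)."""
    parent = {record: record for record in record_ids}

    def find(record):
        # Follow parent pointers to the group representative, shortening the path.
        while parent[record] != record:
            parent[record] = parent[parent[record]]
            record = parent[record]
        return record

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for record in record_ids:
        groups.setdefault(find(record), set()).add(record)
    return list(groups.values())


groups = merge_groups([1, 2, 3, 4, 5], [(1, 2), (2, 3)])
```

Under this scheme a single false link is enough to chain two people's records into one group, which is one reason the choice of grouping method can matter for linkage quality.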
How do you measure up? Methods to assess linkage quality: IJPDS (2017) Issue 1, Vol 1:133, Proceedings of the IPDLN Conference (August 2016)
In: International journal of population data science: (IJPDS), Volume 1, Issue 1
ISSN: 2399-4908
ABSTRACT
Objectives Record linkage is a powerful technique which transforms discrete episode data into longitudinal person-based records. These records enable the construction and analysis of complex pathways of health and disease progression, and service use. Achieving high linkage quality is essential for ensuring the quality and integrity of research based on linked data. The methods used to assess linkage quality will depend on the volume and characteristics of the datasets involved, the processes used for linkage and the additional information available for quality assessment. This paper proposes and evaluates two methods to routinely assess linkage quality.
Approach Linkage units currently use a range of methods to measure, monitor and improve linkage quality; however, no common approach or standards exist. There is an urgent need to develop "best practices" in evaluating, reporting and benchmarking linkage quality. In assessing linkage quality, the primary interest is in knowing the number of true matches and non-matches identified as links and non-links. Any misclassification of matches within these groups introduces linkage errors. We present efforts to develop sharable methods to measure linkage quality in Australia. This includes a sampling-based method to estimate both precision (accuracy) and recall (sensitivity) following record linkage and a benchmarking method - a transparent and transportable methodology to benchmark the quality of linkages across different operational environments.
Results The sampling-based method achieved estimates of linkage quality that were very close to actual linkage quality metrics. This method presents as a feasible means of accurately estimating matching quality and refining linkages in population level linkage studies. The benchmarking method provides a systematic approach to estimating linkage quality with a set of open and shareable datasets and a set of well-defined, established performance metrics. The method provides an opportunity to benchmark the linkage quality of different record linkage operations. Both methods have the potential to assess the inter-rater reliability of clerical reviews.
Conclusions Both methods produce reliable estimates of linkage quality enabling the exchange of information within and between linkage communities. It is important that researchers can assess risk in studies using record linkage techniques. Understanding the impact of linkage quality on research outputs highlights a need for standard methods to routinely measure linkage quality. These two methods provide a good start to the quality process, but it is important to identify standards and good practices in all parts of the linkage process (pre-processing, standardising activities, linkage, grouping and extracting).
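A sampling-based estimate of precision can be sketched as follows: draw a simple random sample of links, have each sampled link reviewed, and report the proportion judged correct with an approximate interval. The function name, the reviewer callback and the normal-approximation interval are assumptions for this example; the abstract does not describe the sampling design actually used.

```python
import math
import random


def estimate_precision(links, is_true_match, sample_size, seed=0):
    """Estimate precision from a random sample of links.

    `is_true_match` stands in for clerical review: it takes a pair and
    returns True if the reviewer judges the link correct (hypothetical).
    Returns (estimate, lower bound, upper bound).
    """
    links = list(links)
    if not links:
        raise ValueError("no links to sample")
    n = min(sample_size, len(links))
    sample = random.Random(seed).sample(links, n)
    p = sum(1 for pair in sample if is_true_match(pair)) / n
    # Normal-approximation 95% interval; a rough guide that is poor near 0 or 1.
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

Estimating recall needs a sample of non-links as well, which this sketch leaves out.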
Implementing privacy-preserving record linkage: welcome to the real world: IJPDS (2017) Issue 1, Vol 1:134, Proceedings of the IPDLN Conference (August 2016)
In: International journal of population data science: (IJPDS), Volume 1, Issue 1
ISSN: 2399-4908
ABSTRACT
Objectives While record linkage has become a strategic research priority within Australia and internationally, legal and administrative issues prevent data linkage in some situations due to privacy concerns. Even current best practices in record linkage carry some privacy risk as they require the release of personally identifying information to trusted third parties. Application of record linkage systems that do not require the release of personal information can overcome legal and privacy issues surrounding data integration. Current conceptual and experimental privacy-preserving record linkage (PPRL) models show promise in addressing data integration challenges but do not yet address all of the requirements for real-world operations. This paper aims to identify and address some of the challenges of operationalising PPRL frameworks.
Approach Traditional linkage processes involve comparing personally identifying information (name, address, date of birth) on pairs of records to determine whether the records belong to the same person. Designing appropriate linkage strategies is an important part of the process. These are typically based on the analysis of data attributes (metadata) such as data completeness, consistency, constancy and field discriminating power. Under a PPRL model, however, these factors cannot be discerned from the encrypted data, so an alternative approach is required. This paper explores methods for data profiling, blocking, weight/threshold estimation and error detection within a PPRL framework.
Results Probabilistic record linkage typically involves the estimation of weights and thresholds to optimise the linkage and ensure highly accurate results. The paper outlines the metadata requirements and automated methods necessary to collect data without compromising privacy. We present work undertaken to develop parameter estimation methods which can help optimise a linkage strategy without the release of personally identifiable information. These are required in all parts of the privacy preserving record linkage process (pre-processing, standardising activities, linkage, grouping and extracting).
Conclusions PPRL techniques that operate on encrypted data have the potential for large-scale record linkage, performing both accurately and efficiently under experimental conditions. Our research has advanced the current state of PPRL with a framework for secure record linkage that can be implemented to improve and expand linkage service delivery while protecting an individual's privacy. However, more research is required to supplement this technique with additional elements to ensure the end-to-end method is practical and can be incorporated into real-world models.
Unlocking the potential of health systems using privacy preserving record linkage: A pilot project exploring the research potential of developing a linkable general practice dataset
In: International journal of population data science: (IJPDS), Volume 4, Issue 3
ISSN: 2399-4908
Background General practice is a rich source of health data for research. It is an important resource which can be used to improve patient management, reduce costs and improve patient outcomes. Traditionally, the challenge has been around access to general practice data which remains hard to 'join up'.
This abstract describes technology developed to support aspirations of the MedicineInsight program to provide linked de-identified general practice data that can be used to derive insights to enable better patient outcomes.
Main Aim The aim of this project was to use real-world data to identify technical, logistical and analytical requirements throughout the linkage process. Logistical aims covered the negotiation, approval and data acquisition processes, as well as data linkage and data delivery aspects performed by technical and data service stakeholders.
Methods/Approach Given the sensitivity of the information involved, the project employed a privacy preserving record linkage methodology. This method uses encrypted personal identifying information (Bloom filters) in a probability-based linkage framework to help mitigate risk while maximising linkage quality.
Existing MedicineInsight systems were extended to automatically generate encoded linkage data at each general practice. Pilot linkages were then used to validate the capability/capacity of CDL infrastructure to create secure extensible linked general practice datasets.
Results The project has successfully developed interoperable technology to create a transparent data catalogue which is linkable to other datasets. This technology has been embedded with MedicineInsight systems and results of the pilot linkages are being evaluated. The project will make recommendations to enable consistent delivery of linkage services across care settings.
Conclusion Outcomes from the project will improve delivery of record linkage services to the health and broader research community. Using linked data from across the care continuum, researchers will be able to evaluate the effectiveness of service delivery and provide evidence for policy and programme development.
Privacy preserving linkage using multiple dynamic match keys
In: International journal of population data science: (IJPDS), Volume 4, Issue 1
ISSN: 2399-4908
Introduction Available and practical methods for privacy preserving linkage have shortcomings: methods utilising anonymous linkage codes provide limited accuracy while methods based on Bloom filters have proven vulnerable to frequency-based attacks.
Objectives In this paper, we present and evaluate a novel protocol that aims to meld the accuracy of the Bloom filter method with the privacy achievable through the anonymous linkage code methodology.
Methods The protocol involves creating multiple match-keys for each record, with the composition of each match-key depending on attributes of the underlying datasets being compared. The protocol was evaluated through de-duplication of four administrative datasets and two synthetic datasets; the 'answers' outlining which records belonged to the same individual were known for each dataset. The results were compared against results achieved with un-encoded linkage and other privacy preserving techniques on the same datasets.
Results The multiple match-key protocol presented here achieved high quality across all datasets, performing better than record-level Bloom filters and the SLK, but worse than field-level Bloom filters.
Conclusion The presented method provides high linkage quality while avoiding the frequency based attacks that have been demonstrated against the Bloom filter approach. The method appears promising for real world use.
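The general idea of linking on multiple match-keys can be sketched as below: each record yields several keyed hashes built from different field combinations, and two records are treated as a link if any key coincides. The field names, the three fixed compositions and the any-key-matches rule are hypothetical; in the protocol described above the compositions depend on attributes of the datasets being compared.

```python
import hashlib
import hmac

# Hypothetical match-key compositions, fixed here purely for illustration.
MATCH_KEY_FIELDS = [
    ("first_name", "surname", "dob"),
    ("surname", "dob", "postcode"),
    ("first_name", "dob", "postcode"),
]


def match_keys(record: dict, secret: bytes) -> set:
    """Build one keyed hash per field combination, skipping incomplete ones."""
    keys = set()
    for index, fields in enumerate(MATCH_KEY_FIELDS):
        values = [str(record.get(field) or "").strip().lower() for field in fields]
        if not all(values):
            continue  # a missing component would make the key unreliable
        message = (str(index) + "|" + "|".join(values)).encode()
        keys.add(hmac.new(secret, message, hashlib.sha256).hexdigest())
    return keys


def is_link(a: dict, b: dict, secret: bytes) -> bool:
    """Assumed rule: two records link if they share at least one match-key."""
    return bool(match_keys(a, secret) & match_keys(b, secret))


secret = b"secret-agreed-by-data-custodians"  # assumed value
rec_a = {"first_name": "Mary", "surname": "Smith", "dob": "1980-01-01", "postcode": "6000"}
rec_b = {"first_name": "Mary", "surname": "Smyth", "dob": "1980-01-01", "postcode": "6000"}
rec_c = {"first_name": "John", "surname": "Jones", "dob": "1975-06-30", "postcode": "2000"}
```

Here `rec_a` and `rec_b` differ in surname but still share the key built from first name, date of birth and postcode, which is how several keys can tolerate an error in one field.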
Evaluation of approximate comparison methods on Bloom filters for probabilistic linkage
In: International journal of population data science: (IJPDS), Volume 4, Issue 1
ISSN: 2399-4908
Introduction The need for increased privacy protection in data linkage has driven the development of privacy-preserving record linkage (PPRL) techniques. A popular technique using Bloom filters with cryptographic analyses, modifications, and hashing variations to optimise privacy has been the focus of much research in this area. With few applications of Bloom filters within a probabilistic framework, there is limited information on whether approximate matches between Bloom filtered fields can improve linkage quality.
Objectives In this study, we evaluate the effectiveness of three approximate comparison methods for Bloom filters within the context of the Fellegi-Sunter model of record linkage: Sørensen–Dice coefficient, Jaccard similarity and Hamming distance.
Methods Using synthetic datasets with introduced errors to simulate datasets with a range of data quality and a large real-world administrative health dataset, the research estimated partial weight curves for converting similarity scores (for each approximate comparison method) to partial weights at both field and dataset level. Deduplication linkages were run on each dataset using these partial weight curves. This was to compare the resulting quality of the approximate comparison techniques with linkages using simple cut-off similarity values and only exact matching.
Results Linkages using approximate comparisons produced significantly better quality results than those using exact comparisons only. Field level partial weight curves for a specific dataset produced the best quality results. The Sørensen-Dice coefficient and Jaccard similarity produced the most consistent results across a spectrum of synthetic and real-world datasets.
Conclusion The use of Bloom filter similarity comparisons for probabilistic record linkage can produce linkage quality results which are comparable to Jaro-Winkler string similarities with unencrypted linkages. Probabilistic linkages using Bloom filters benefit significantly from the use of similarity comparisons, with partial weight curves producing the best results, even when not optimised for that particular dataset.
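The three comparison methods named in this abstract have standard definitions on bit arrays, sketched here with each Bloom filter held as a Python integer. The function names and the conversion of Hamming distance into a similarity in [0, 1] are choices made for this example.

```python
def popcount(bits: int) -> int:
    """Number of bits set to 1."""
    return bin(bits).count("1")


def dice(a: int, b: int) -> float:
    """Sørensen-Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    total = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / total if total else 1.0


def jaccard(a: int, b: int) -> float:
    """Jaccard similarity: |A and B| / |A or B|."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0


def hamming_similarity(a: int, b: int, size: int) -> float:
    """1 minus the share of the `size` bit positions where the filters differ."""
    return 1 - popcount(a ^ b) / size


# Two 4-bit filters sharing two of their three set bits.
a, b = 0b1110, 0b0111
```

For these two filters the Dice coefficient is 2/3 while Jaccard and the Hamming-based similarity are both 0.5, so the same pair can fall on different sides of a fixed cut-off depending on the comparator.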
The effect of cross-jurisdictional linked hospital and death data on estimating risk-adjusted grouped hospital standardised mortality ratios in Australia: IJPDS (2017) Issue 1, Vol 1:139, Proceedings of the IPDLN Conference (August 2016)
In: International journal of population data science: (IJPDS), Volume 1, Issue 1
ISSN: 2399-4908
ABSTRACT
Objectives The Population Health Research Network (PHRN) was established to increase data linkage capacity in Australia. A proof of concept study investigating cross border hospital use and hospital mortality was undertaken to demonstrate the effectiveness of increased data linkage capacity in supporting nationally significant health research. The objective of this study was to evaluate whether cross-jurisdictional linkage of hospital and death records across Australian states could refine estimation of Hospital Standardised Mortality Ratios (HSMRs).
Approach In Australia, administrative hospital and death data are collected by individual state governments. The newly established Centre for Data Linkage created a cross-jurisdictional linkage key that brought together hospital and death records belonging to individuals across four Australian states over a five year period (1st July 2004 – 30th June 2009). Hospital inpatient records from public, psychiatric and private hospitals and private day surgery centres were provided by New South Wales, Western Australia and Queensland. South Australia provided public hospital inpatient records only. The linked data underwent extensive cleaning and standardisation to improve the validity of interstate comparisons.
The final cohort comprised 7.7 million hospital patients. In-hospital deaths and deaths within 30 days of hospital discharge from the four state jurisdictions were used to estimate the SMR of hospital groups defined by geography and type of hospital (grouped HSMR) under three record linkage scenarios: 1) cross-jurisdictional person-level linkage, 2) within-jurisdictional (state-based) person-level linkage and 3) unlinked records. All public and private hospitals in New South Wales, Queensland, Western Australia and public hospitals in South Australia were included in this study. Death registrations from all four states were obtained from state-based registries of births, deaths and marriages.
Results Cross-jurisdictional linkage identified 11,116 cross-border hospital transfers of which 170 resulted in a cross-border in-hospital death. An additional 496 cross-border deaths occurred within 30 days of hospital discharge. The inclusion of cross-jurisdictional person-level links to unlinked hospital records reduced the coefficient of variation amongst the grouped HSMRs from 0.19 to 0.15; the inclusion of 30-day deaths reduced the coefficient of variation further to 0.11. There were minor changes in grouped HSMRs between cross-jurisdictional and within-jurisdictional linkages, although the impact of cross-jurisdictional linkage increased when restricted to geographic regions with high cross-border hospital use such as the New South Wales and Queensland border area.
Conclusion Cross-jurisdictional data linkage modified estimates of grouped HSMRs, particularly for hospital groups that were likely to receive a high proportion of cross-border users.
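The two quantities reported in this abstract can be written down briefly. The sketch assumes the conventional definition of a standardised mortality ratio (observed deaths divided by the deaths expected under a risk-adjustment model, scaled to 100) and uses the sample standard deviation for the coefficient of variation; the abstract states neither detail, and the risk-adjustment model itself is outside the sketch.

```python
import statistics


def hsmr(observed_deaths: float, expected_deaths: float) -> float:
    """Standardised mortality ratio scaled so that 100 means 'as expected'."""
    return 100.0 * observed_deaths / expected_deaths


def coefficient_of_variation(values) -> float:
    """Spread of the grouped ratios relative to their mean (sample SD / mean)."""
    return statistics.stdev(values) / statistics.mean(values)
```

A lower coefficient of variation, as reported after cross-jurisdictional links and 30-day deaths were added, means the grouped HSMRs lie closer together relative to their mean.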