HIPAA permits health data to be used and shared outside the Privacy Rule's restrictions once it has been de-identified, and it recognizes two methods for doing so: Safe Harbor and Expert Determination. The Safe Harbor approach requires removing 18 specific identifiers, while Expert Determination relies on statistical analysis to show that the re-identification risk is very small. These methods enable healthcare organizations to use valuable health data for research while maintaining patient privacy.

Key insights:

PHI Definition: Any identifiable health information connected to medical care or payment that's protected under HIPAA
De-identification Methods: Two approaches - Safe Harbor (removing 18 identifiers) and Expert Determination (statistical analysis)
Data Elements: 18 specific identifiers including names, dates, contact information, biometric data, and unique codes must be removed
ZIP Code Rules: First three digits can be retained only if geographic area has population over 20,000
Date Handling: Only years can be retained; specific dates must be removed, and ages over 89 must be grouped as "90 or older"
Expert Assessment: Specialists evaluate re-identification risk using replicability, data source availability, and distinguishability factors
Risk Management: Regular review of de-identification methods needed as technology and data accessibility evolve
Re-identification Controls: Organizations can assign tracking codes if they cannot be used to identify individuals and are securely stored
Actual Knowledge: Under Safe Harbor, covered entities must have no actual knowledge that the remaining information could identify an individual, even if the identifying detail is not explicitly listed
Compliance Balance: De-identification enables research use while maintaining privacy, but requires ongoing monitoring and updates

Introduction

Patient privacy is a top priority in the healthcare industry. The Health Insurance Portability and Accountability Act (HIPAA) is the legislation that controls how Protected Health Information (PHI) is handled. Any information about a person's health, medical treatment, or payment for medical care is considered PHI and is governed by strict laws to avoid unapproved disclosure.

HIPAA allows data to be de-identified so that it is easier to use for research while maintaining the privacy of individuals. There are two primary approaches to de-identification: the Safe Harbor approach, which calls for the removal of 18 distinct identifiers, and the Expert Determination approach, in which a qualified expert analyzes the information to determine the risk of re-identification. Researchers and healthcare organizations must understand these identifiers and how to remove them in order to use important health data without jeopardizing patient privacy.

This insight aims to help readers understand the 18 identifiers specified by HIPAA along with techniques to de-identify them.

PHI

1. Definition

Protected health information (PHI) is any information that could be used to identify a person and is connected to their health, medical care, or payment for medical care. PHI is governed by the Health Insurance Portability and Accountability Act (HIPAA) and is generated or disclosed in the provision of healthcare services. For research, HIPAA applies to studies that draw on PHI, such as those using medical records or direct patient interactions. Research data unrelated to medical events falls outside HIPAA unless it can be traced back to healthcare services, but researchers must adhere to HIPAA when working with PHI from medical records.

2. What is Not Considered PHI?

Information that is not connected to medical records or healthcare events is not regarded as PHI under HIPAA, even if it contains personal identifiers. For instance, research health information (RHI), such as aggregated datasets or test results not included in medical records, is not regarded as PHI, although it must still abide by rules safeguarding human participants. Some genetic research, which searches for potential markers rather than diagnosing illnesses, can also fall into this category. Health information that contains none of the 18 identifiers, such as isolated vital signs, is likewise not classified as PHI. However, if any of these identifiers, such as a medical record number, is present, the data is classified as PHI.

List of 18 Identifiers

Under HIPAA, the following identifiers must be removed for data to be considered de-identified:

Names,
Geographic subdivisions smaller than a state,
Dates related to an individual (excluding the year),
Telephone numbers,
Fax numbers,
Email addresses,
Social Security numbers,
Medical record numbers,
Health plan beneficiary numbers,
Account numbers,
Certificate/license numbers,
Vehicle identifiers,
Device identifiers,
URLs,
IP addresses,
Biometric identifiers,
Full-face photographs or similar images,
Any unique identifying number or code.

As long as de-identified data complies with HIPAA's re-identification prevention requirements (e.g., any record codes are not derived from information about the individual), it is no longer subject to the Privacy Rule's restrictions on use and disclosure.

Covered Entities, Business Associates, and PHI

Under HIPAA, "covered entities" and their "business associates" are subject to the Privacy Rule. Covered entities are health plans, health care clearinghouses, and health care providers that conduct certain standardized financial and administrative transactions electronically. By definition, business associates perform services for covered entities that involve the use or disclosure of PHI.

When the business associate agreement permits it, a business associate may de-identify PHI on a covered entity's behalf. This arrangement shows how HIPAA balances strict privacy rules against the utility of PHI in service delivery. A thorough summary of the Privacy Rule and its implications for PHI protection may be found on the Office for Civil Rights (OCR) website.

De-Identifying PHI under HIPAA

HIPAA gives covered entities two options for de-identifying data so that the risk of re-identification is very small: removing all 18 identifiers or applying statistical de-identification. Covered entities are permitted to assign codes for internal tracking, as long as the codes are protected from disclosure and cannot be used to identify an individual. Agreements must also be in place with any business associates who handle de-identification. By using these methods, organizations can preserve confidentiality while allowing data to be used for research without requiring individual authorization.

De-identification is essential for protecting privacy and allowing for the secondary use of health data, which is becoming more feasible as health information technology progresses. De-identification lowers privacy concerns when it eliminates identifiers from health data, allowing researchers to use the data without jeopardizing the individual's privacy.

It is important to note that de-identification techniques reduce the possibility of re-identification but do not completely remove it, so de-identified data must still be treated with caution.

The De-identification Standard

Section 164.514(a) of the HIPAA Privacy Rule provides that health information is de-identified if it does not identify an individual and there is no reasonable basis to believe it could be used to identify one. Expert Determination and Safe Harbor are the two de-identification methods provided under the Rule.

To use the Expert Determination method, a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles must determine that the re-identification risk is very small and document the methods and results of the analysis. The Safe Harbor method, by contrast, requires removing 18 distinct identifiers, including names, location information, and biometric data. Both methods aim to meet the de-identification standard so that the data can be used outside the Privacy Rule's restrictions. Both can, however, cause information loss, which may limit the usefulness of de-identified data in some situations.

Re-identification

The Privacy Rule also sets re-identification requirements. It permits a covered entity to assign a code to de-identified data so that the entity itself can re-identify the data later. The code must not be derived from information about the individual, and the covered entity must not use or disclose the code for any other purpose or disclose the mechanism for re-identification. If the data is re-identified, it returns to HIPAA-protected status. In this way, HIPAA's de-identification framework preserves the value of the data while strictly regulating re-identification and any related disclosures.
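As a minimal sketch of a code that is not derived from personal data, the example below assigns each record a random value and keeps the crosswalk separate from the released data. The function name, the record IDs, and the 16-character code length are illustrative assumptions, not requirements of the Privacy Rule.

```python
import secrets

def assign_reidentification_codes(record_ids):
    """Map each internal record ID to a random code that is not derived from the record."""
    code_for_id = {}
    used_codes = set()
    for record_id in record_ids:
        code = secrets.token_hex(8)  # 16 random hex characters, unrelated to the record
        while code in used_codes:    # regenerate in the unlikely event of a collision
            code = secrets.token_hex(8)
        used_codes.add(code)
        code_for_id[record_id] = code
    return code_for_id

# Hypothetical internal IDs; the crosswalk stays with the covered entity
# and is never released alongside the de-identified data.
crosswalk = assign_reidentification_codes(["MRN-001", "MRN-002", "MRN-003"])
```

Because the codes are random, nothing about a patient can be inferred from a code alone; re-identification depends entirely on access to the crosswalk.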

Expert Determination Method

The Expert Determination Method is a strong method for data de-identification that reduces the possibility of re-identification in sensitive datasets, such as medical records, by applying statistical and scientific methods. A de-identification expert applies statistical principles to determine that the risk of tracing information back to particular people is "very small."

This approach uses an expert's knowledge to assess whether the data is safe for dissemination without violating privacy laws, making it especially pertinent in industries like healthcare where privacy is crucial.

1. Risk Assessment in Expert Determination

To calculate identification risk, the expert estimates how unique the records in a dataset are within a specific population, using a variety of statistical criteria. The expert assesses the likelihood that a combination of data points, such as a patient's demographic information, could identify a person. This population analysis draws on external data sources, such as U.S. Census data, or on statistics derived from the dataset itself when population data is not available. The goal is for each record to be non-unique, that is, indistinguishable from other records in the dataset and in the general population, so that the possibility of re-identification is reduced.
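One simple data-derived statistic of this kind is the share of records whose combination of quasi-identifiers appears only once in the dataset. The sketch below computes it; the field names and sample rows are made up for illustration, and uniqueness within a sample is only a rough proxy for uniqueness within the population.

```python
from collections import Counter

def unique_record_fraction(records, quasi_identifiers):
    """Return the fraction of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)

# Illustrative rows only; not real patient data.
records = [
    {"zip": "10001", "sex": "F", "birth_year": 1980},
    {"zip": "10001", "sex": "F", "birth_year": 1980},
    {"zip": "10002", "sex": "M", "birth_year": 1975},
    {"zip": "10003", "sex": "F", "birth_year": 1990},
]
fraction = unique_record_fraction(records, ["zip", "sex", "birth_year"])
```

Here two of the four rows share a combination and two are unique, so half of the records would stand out on these three fields.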

An expert can use a variety of strategies to reduce identification risk if they conclude that it surpasses reasonable bounds. Techniques include changing or generalizing certain data pieces to hide recognizable trends, such as aggregating geographic regions or substituting age ranges for precise age. By striking a balance between privacy and data utility, this procedure makes sure that the dataset is still useful for analysis even when the risk of re-identification is reduced. To further guard against linkage with other datasets, the specialist might also put in place extra precautions such as data usage agreements.
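The generalization techniques described above can be sketched as follows. The ten-year band width, the field names, and the county-to-region mapping are assumptions chosen for illustration; an expert would select them based on the measured risk.

```python
# Hypothetical mapping from county to a broader region.
COUNTY_TO_REGION = {"Kings": "Downstate", "Queens": "Downstate", "Erie": "Upstate"}

def generalize_age(age, band_width=10):
    """Replace a precise age with a range such as '30-39'."""
    low = (age // band_width) * band_width
    return f"{low}-{low + band_width - 1}"

def generalize_record(record):
    """Return a copy of the record with age banded and county aggregated to a region."""
    generalized = dict(record)
    generalized["age"] = generalize_age(record["age"])
    county = generalized.pop("county")
    generalized["region"] = COUNTY_TO_REGION.get(county, "Other")
    return generalized

row = generalize_record({"age": 34, "county": "Queens", "diagnosis": "asthma"})
```

Clinical fields such as the diagnosis pass through unchanged, which is how the dataset stays useful for analysis while the quasi-identifiers become coarser.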

The Privacy Rule does not specify a precise numerical risk level; what constitutes an acceptable "very small" risk depends on the circumstances. To establish an acceptable risk level, the expert evaluates factors such as data uniqueness and access to outside identifying sources. For instance, having a fraction of a percent of records be unique may be an acceptable risk in one context, while in another, stricter standards may be required because more external data is available. The expert documents this determination, as the Privacy Rule requires.

2. How Do Experts Assess the Risk of Identification of Information?

Experts assess identification risk when de-identifying health information through a systematic procedure that applies a number of re-identification mitigation techniques. The Office for Civil Rights (OCR) does not prescribe a particular de-identification procedure, but the techniques employed must achieve a "very small" risk level. Experts evaluate identification risk iteratively in collaboration with data managers and administrators; this process frequently requires several cycles of testing and adjustment before the health information is found to satisfy the risk criterion.

Assessing the health data's potential for identification by the intended recipients is the first step in the assessment procedure. This entails assessing the uniqueness of the data items as well as the accessibility of relevant external data sources. Experts provide guidance on suitable statistical or scientific de-identification techniques to lessen identifiability after this assessment. Lastly, using predetermined risk criteria, the expert validates the privacy of the data and decides whether the risk level is acceptable.

When evaluating the identifiability of data, experts consult fundamental principles: replicability, data source availability, distinguishability, and an overall risk assessment. Understanding the possibility of re-identification requires a grasp of each of these principles:

Replicability entails classifying data pieces according to how likely they are to stay constant or change over time. Because they are consistent identifiers, very stable factors like demographics (birthdate) present a higher danger of re-identification. On the other hand, data that varies, such as lab results, presents less identifiability risk because it is more difficult to consistently link these changeable measures to a particular person.

Data Source Availability investigates the restrictions surrounding these data sources and the accessibility of data that may contain recognizable qualities. Certain test results and other information that is restricted to healthcare systems and not available to the general public are less likely to be re-identified. However, the risk is greatly increased by publicly available information, such as names or dates of birth, since these records may be cross-referenced to possibly re-identify people.

Distinguishability evaluates the ease with which a person's data may be identified in a larger data set. For instance, a significant number of people in the United States can be uniquely identified using data factors like the ZIP code, gender, and date of birth. Broader classifications, on the other hand, including year of birth and three-digit ZIP codes, lessen this distinctiveness and make it less likely that a specific individual may be identified.

Assess Risk examines availability, distinguishability, and replicability in detail, combining these factors to calculate the cumulative identification risk. Greater risk is generally indicated by higher levels of these components, particularly when strongly distinguishing traits or stable demographic elements are present. Re-identification is less likely at lower levels with more changeable or less distinctive information.

To determine whether a data set can be "linked" to outside sources that reveal individuals, an expert checks three conditions: the de-identified data must be unique, a naming source must be available, and there must be a working mechanism to connect the two. The absence of any one of these conditions makes re-identification more difficult, although it does not completely eliminate the risk of future identifiability.
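A toy illustration of such a linkage is sketched below: a de-identified row is matched to a name only when its quasi-identifier combination is unique in both the released data and the naming source. The datasets, field names, and the `link_records` helper are hypothetical.

```python
def link_records(deidentified, naming_source, keys):
    """Pair de-identified rows with names when the key combination is unique on both sides."""
    def index(rows):
        grouped = {}
        for row in rows:
            grouped.setdefault(tuple(row[k] for k in keys), []).append(row)
        return grouped

    deid_index = index(deidentified)
    named_index = index(naming_source)
    matches = []
    for key, deid_rows in deid_index.items():
        named_rows = named_index.get(key, [])
        if len(deid_rows) == 1 and len(named_rows) == 1:
            matches.append((deid_rows[0], named_rows[0]["name"]))
    return matches

# Illustrative data only.
deidentified = [
    {"zip3": "100", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip3": "100", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"zip3": "100", "birth_year": 1975, "sex": "M", "diagnosis": "flu"},
]
public_list = [
    {"name": "Alice Example", "zip3": "100", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "zip3": "100", "birth_year": 1975, "sex": "M"},
]
links = link_records(deidentified, public_list, ["zip3", "birth_year", "sex"])
```

Only the first row links to a name. The two rows that share a combination cannot be told apart, which shows why removing uniqueness blocks this kind of match.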

When combined, these guidelines provide a strong framework that allows professionals to evaluate and reduce the risk of identification, allowing for the responsible sharing of de-identified health information while protecting the privacy of individuals.

3. Validity of Expert Determinations

Expert determinations might need to be reviewed on a regular basis as data access, societal circumstances, and technology change. Experts frequently issue time-limited certifications and periodically review the level of data protection to make sure it still complies with changing standards. Data that was adequately de-identified may remain so after a certification expires, but at that point the expert should reassess it in light of current privacy risks and de-identification practices and determine whether further changes are required.

When a dataset needs to serve several purposes, experts can create several de-identified versions of it, each suited to the risk profile of its intended users. For instance, one version of the dataset may generalize both geographic and age data, whereas another may use more precise age distinctions but generalize only geographic data. Each version is tailored separately, and the expert must also account for the possibility that the versions could be combined, so that pooling them does not enable re-identification. Because of this adaptability, organizations can satisfy a range of data needs without sacrificing privacy.

Safe Harbor Method

Under Section 164.514(b)(2), the Safe Harbor method describes a specific procedure for de-identifying protected health information (PHI). It greatly reduces the ability to identify specific persons within the dataset by removing or generalizing specific data elements. Names, geographic information below the state level, all elements of dates more specific than the year, and unique identifiers such as IP addresses, Social Security numbers, and medical record numbers must all be removed from PHI. In addition, the covered entity must have no actual knowledge that the remaining data could be used, alone or in combination with other information, to re-identify an individual.
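For structured records, the removal step might look like the sketch below. The field names are hypothetical and depend entirely on the source schema. The sketch covers only columns that hold direct identifiers; it does not scan free-text notes, and it does not handle dates or ZIP codes.

```python
# Hypothetical column names for fields that hold Safe Harbor identifiers.
IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "photo",
}

def strip_direct_identifiers(record):
    """Return a copy of the record without any field listed in IDENTIFIER_FIELDS."""
    return {field: value for field, value in record.items()
            if field not in IDENTIFIER_FIELDS}

patient = {
    "name": "Jane Example",
    "ssn": "000-00-0000",
    "email": "jane@example.com",
    "state": "NY",
    "diagnosis": "asthma",
}
released = strip_direct_identifiers(patient)
```

A deny-list like this is only as good as its coverage of the schema, so an allow-list of fields known to be safe may be the more cautious design in practice.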

1. ZIP Code Inclusion in De-Identified Data

Under certain circumstances, the Safe Harbor approach permits the partial retention of ZIP codes in de-identified data. Covered entities may keep the first three digits of a ZIP code, but only if the geographic area formed by all ZIP codes sharing those three digits has a population of more than 20,000. If the area has 20,000 or fewer people, the first three digits must be changed to "000." This ensures that sparsely populated areas, where re-identification is easier, are not disclosed; population figures should be checked against current publicly available Census data. These rules balance the usefulness of geographic data with the protection of individual identities.
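The ZIP code rule can be sketched as a small function that keeps the three-digit prefix only when the area's population exceeds 20,000 and otherwise returns "000". The population table here is a made-up placeholder; in practice it would be built from Census data.

```python
def safe_harbor_zip(zip_code, population_by_zip3):
    """Return the 3-digit ZIP prefix if its area has more than 20,000 people, else '000'."""
    zip3 = zip_code[:3]
    # Prefixes missing from the table are treated as small areas, to be cautious.
    if population_by_zip3.get(zip3, 0) > 20000:
        return zip3
    return "000"

# Placeholder figures for illustration; not actual Census counts.
population_by_zip3 = {"100": 1500000, "036": 15000, "555": 20000}

kept = safe_harbor_zip("10001", population_by_zip3)
masked = safe_harbor_zip("03601", population_by_zip3)
```

An area with exactly 20,000 people is masked as well, since the threshold is "more than 20,000."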

2. Handling Date Information

To prevent tracking of specific healthcare events, the Safe Harbor method limits date information by requiring that all elements more specific than the year be removed. For people over 89, ages and any date elements that indicate such an age, including year of birth, must be aggregated into a single category of "90 or older," because the small size of this age group makes its members easier to identify. Dates linked to clinical data are subject to the same constraint, so that no recognizable chronology can be traced back to a specific person while overall health trends in the dataset are preserved.

In addition to the listed identifiers themselves, the Safe Harbor method forbids derivatives that can jeopardize anonymity, such as initials or partial Social Security numbers. This covers identifiers that may appear less obvious but present a risk of re-identification when paired with other information. Any unique numbers, characteristics, or codes that could indirectly reveal an individual's identity are therefore prohibited, as are biometric identifiers such as voiceprints and fingerprints.
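A minimal sketch of the date and age handling follows. The function name and return shape are assumptions; it reduces a birth date to a year and an age, and for anyone over 89 it suppresses the birth year as well, since the year would itself indicate the age.

```python
from datetime import date

def safe_harbor_birth_info(birth_date, as_of):
    """Return (birth_year, age), or (None, '90 or older') for people over 89."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    age = as_of.year - birth_date.year - (0 if had_birthday else 1)
    if age > 89:
        # The birth year would reveal an age over 89, so it is withheld too.
        return None, "90 or older"
    return birth_date.year, age

older = safe_harbor_birth_info(date(1930, 5, 1), as_of=date(2024, 1, 1))
younger = safe_harbor_birth_info(date(1980, 6, 15), as_of=date(2024, 6, 14))
```

Other dates in a record, such as admission or discharge dates, would be reduced to the year in the same way.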

3. Any Other Unique Identifying Number, Characteristic, or Code

"Any other unique identifying number, characteristic, or code" (Section 164.514(b)(2)(i)(R)) is a catch-all category in the Privacy Rule's Safe Harbor method that covers any unique identifier beyond those specifically listed in items (A) through (Q). This clause is necessary for maintaining de-identification because it ensures that even implicit identifiers that could be linked to a specific person are removed. An overview of this category and examples of typical identifiers that fall under it are provided below.

Identifying Number: Any unique number that is not specifically mentioned but may be connected to a person's identification can be considered an identifying number. This includes, for example, clinical trial record numbers. In order to prevent re-identification, numbers linked to particular activities, such as clinical trials, should be eliminated.

Identifying Code: An identifying code is a value produced by an encoding method that lacks further security (e.g., a code generated by a hash function without a secret key, such as a "salt"). Because these codes are derived from the data itself and are susceptible to reverse engineering, they must be removed to preserve anonymity. As another example, barcodes embedded in electronic medical records or e-prescribing systems are frequently unique to each patient and can facilitate tracking. To prevent re-identification, such identifiers must be removed under the Safe Harbor method.
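To see why an unkeyed hash is open to reverse engineering, consider the sketch below. It assumes, purely for illustration, that medical record numbers are six-digit strings; anyone who knows that format can hash every candidate and compare the results against a released code.

```python
import hashlib

def unsalted_code(mrn):
    """Derive a code from the MRN with a plain hash and no secret key."""
    return hashlib.sha256(mrn.encode("utf-8")).hexdigest()

def recover_mrn(code, candidate_mrns):
    """Try each candidate MRN until one hashes to the released code."""
    for mrn in candidate_mrns:
        if unsalted_code(mrn) == code:
            return mrn
    return None

# Hypothetical released code and an exhaustive list of six-digit candidates.
released_code = unsalted_code("004217")
candidates = (f"{n:06d}" for n in range(1_000_000))
recovered = recover_mrn(released_code, candidates)
```

The search space is small enough to enumerate in moments, so the hash offers no real protection on its own.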

Identifying Characteristic: Any distinctive personal trait that would enable someone to recognize and differentiate an individual could be considered an identifying characteristic. Because of its specificity and possibility for traceability, a medical record that lists a patient's occupation as "current President of State University," for instance, would qualify as an identifying characteristic.

Re-identification Exception: The Safe Harbor provision contains an exception for re-identification. A covered entity that follows the Privacy Rule's re-identification safeguards is permitted to assign codes or IDs to de-identified data so that it can re-identify the data later. As long as these safeguards are upheld, such codes are not treated as identifiers that must be removed, which helps covered entities manage records while maintaining privacy compliance.

By guaranteeing that all remaining unique identifiers, whether explicit or implied, are handled and lowering the risks associated with re-identification, this "catch-all" strategy highlights the Privacy Rule's dedication to data privacy.

4. Concept of Actual Knowledge

"Actual knowledge" under the Safe Harbor standard means a covered entity's awareness that the remaining data, either by itself or in combination with other information, could be used to identify a specific person. This awareness may arise if the entity knows that certain information, such as a distinctive occupation or an uncommon clinical event, may reveal a person. To comply fully with the de-identification standard, a covered entity must therefore make sure that such distinctive data is removed or generalized.

For example, if an occupation such as "past president of the State University" is listed in a patient's record, the covered entity may determine that this detail, when paired with other basic information such as age or state, would probably result in the identification of the individual. Because the occupation presents a high risk of identification, removing only the identifiers on the Safe Harbor list would not be adequate in this situation. The entity must take further measures to remove such identifying information in order to comply with Safe Harbor, reducing the possibility of re-identification.

This case shows that "actual knowledge" calls for a proactive review by covered entities in order to ensure compliance. By addressing these concerns, entities can better meet the Safe Harbor standard, improve patient privacy, and reduce the possibility of re-identification.

Conclusion

To sum up, HIPAA's de-identification procedure for Protected Health Information (PHI) is crucial for protecting patient privacy while enabling the useful application of health data in analytics and research. HIPAA describes two de-identification methods, Safe Harbor and Expert Determination, each of which offers a methodical way to reduce the possibility of re-identification. The Safe Harbor method provides explicit rules: by removing the 18 listed identifiers, which include names, location information, and biometric data, covered entities can convert sensitive PHI into de-identified data that is no longer subject to the Privacy Rule's restrictions. The Expert Determination method instead uses statistical tools and expert judgment to ensure a "very small" risk of re-identification, which underlines the importance of context in protecting privacy.

Re-identification is still a significant problem, though, and it calls for thorough analysis of how de-identified data might still be connected to specific people using publicly available information. When assessing the identifiability of their datasets, covered entities need to exercise caution, particularly in view of changing data access conditions and technological advancements. Organizations can strike a balance between using health data for research and upholding the stringent privacy rules outlined by HIPAA by putting in place strong procedures for data management and routinely evaluating the efficacy of de-identification techniques. In the end, preserving the integrity of health information and defending people's rights in a society that is becoming more and more data-driven require a thorough grasp of both the theoretical and practical facets of de-identification.

Authors

Hashim Hayat

Cornell University

Krishna Chilukuri

Central Michigan University

Abdullah Ahmed

NYU Abu Dhabi

Daheem Hayat

National Defence University

Flavia Trotolo

NYU Abu Dhabi




Our mission is to harness the power of technology to make this world a better place. We provide thoughtful software solutions and consultancy that enhance growth and productivity.

The Jacx Office: 16-120

2807 Jackson Ave

Queens NY 11101, United States

(202) 900-9871
