How to anonymize data
Use the form on this page to contact us and find out if your data needs to be anonymized.
If you intend to deposit your data in our archive, remember that DASSI exclusively accepts
anonymized data.
What does that mean?
It means that your data should not allow the identification of the individuals involved in the research that produced the data.
Anonymization protects the privacy of research participants. Once data are anonymized, it is not possible to identify the people in the dataset or data corpus, either directly through personal information or indirectly through additional or contextual information.
Anonymization applies to all information that can be used to identify a person. Generally, a distinction is made between:
- direct identifiers: personal data and/or information sufficient on its own to uniquely identify an individual. For example: name, surname, tax code, residential address, phone number, vehicle registration number, email address, student ID, computer IP address, etc.
- indirect identifiers: information that alone is not sufficient to identify someone, but if combined with other available information, could be used to deduce the person's identity. For example: age, gender, municipality of residence, profession, education level, income, marital status, nationality, ethnicity, rare disease, etc.
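The distinction above can be illustrated with a minimal Python sketch. It removes the direct identifiers from a few records and then looks for combinations of indirect identifiers that occur only once, since such a combination could still point to a single person. The records, field names, and helper functions are all invented for illustration; they are assumptions, not part of any DASSI procedure.

```python
from collections import Counter

# Hypothetical records: every field name and value here is invented.
records = [
    {"name": "Anna Rossi", "email": "anna@example.org",
     "age": 34, "municipality": "Milano", "profession": "teacher"},
    {"name": "Luca Bianchi", "email": "luca@example.org",
     "age": 34, "municipality": "Milano", "profession": "teacher"},
    {"name": "Marta Verdi", "email": "marta@example.org",
     "age": 71, "municipality": "Aosta", "profession": "notary"},
]

# Assumed classification of the fields, following the two categories above.
DIRECT_IDENTIFIERS = {"name", "email"}
INDIRECT_IDENTIFIERS = ("age", "municipality", "profession")


def drop_direct_identifiers(record):
    """Return a copy of the record without the direct identifiers."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def unique_combinations(rows, keys):
    """List the combinations of indirect identifiers that occur exactly once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


stripped = [drop_direct_identifiers(r) for r in records]
risky = unique_combinations(stripped, INDIRECT_IDENTIFIERS)
print(risky)
```

In this toy dataset the third record remains unique on age, municipality, and profession even after names and email addresses are removed, which is why indirect identifiers also need attention.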
There is no one-size-fits-all method that suits every type of data. Generally, identifiers (both direct and indirect) can be removed or modified. In the latter case, information can be generalized, aggregated, or distorted. Anonymization practices may vary depending on the type of data:
- quantitative data: techniques include aggregating one or more variables, recoding the values of a variable to reduce their precision (e.g., into classes), transforming textual variables into numeric ones, collapsing the upper or lower ranges of a continuous variable (top-coding or bottom-coding) to conceal outliers, or recoding textual variables with more generic descriptions.
- qualitative data: techniques include using pseudonyms or removing direct identifiers, replacing proper names with categories (e.g., "friend," "sister," "northern city," "Serie A team," etc.), categorizing socio-demographic information according to shared standards (e.g., using Istat categories), and replacing specific temporal references with periods or ranges (e.g., for age or years).
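A few of the practices listed above can be sketched in Python. The example below is only a rough illustration under stated assumptions: the 10-year age classes, the income ceiling, the sample sentence, and the function names are all hypothetical choices, and real projects should pick classes and thresholds suited to their own data.

```python
import re


def recode_age(age):
    """Recode an exact age into a 10-year class (assumed class width)."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def top_code(value, ceiling):
    """Top-coding: cap values above the ceiling so outliers are concealed."""
    return min(value, ceiling)


def pseudonymize(text, replacements):
    """Replace each listed proper name with a pseudonym or a category label."""
    for original, substitute in replacements.items():
        pattern = rf"\b{re.escape(original)}\b"
        # A function is used as the replacement so the label is inserted literally.
        text = re.sub(pattern, lambda _m, s=substitute: s, text)
    return text


# Quantitative example: hypothetical survey row.
row = {"age": 34, "income": 250_000}
anonymized_row = {
    "age_class": recode_age(row["age"]),
    "income": top_code(row["income"], ceiling=100_000),
}

# Qualitative example: hypothetical interview excerpt.
excerpt = "Giulia told me that Torino felt unsafe."
anonymized_excerpt = pseudonymize(
    excerpt, {"Giulia": "[friend]", "Torino": "[northern city]"}
)

print(anonymized_row)
print(anonymized_excerpt)
```

Simple string replacement of this kind only catches the names it is given, so a manual read-through of qualitative material is still needed to spot contextual details that could identify someone.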