Meeting 13 Nov 2012: Arezoo, Aslak, Gro (by phone), Endre, Ingrid, Rune

Present: Arezoo, Aslak, Gro, Endre, Ingrid (minutes), Rune

Endre presented:

Sex  Age  Diagnosis
M    43   AIDS
M    44   CHD
F    49   CHD

Patient data are often structured like this.

Usually more attributes are involved.

From a table like this you can tell that the 43-year-old male has AIDS, because he is the only 43-year-old male in the data set. The data are therefore not really anonymous.

These data are not anonymous, only de-identified. Grouped data come closer to being truly anonymous.

This is more anonymous, since age is grouped into intervals:

Sex  Age    Diagnosis
M    40-45  AIDS
M    40-45  CHD
F    46-50  CHD
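A minimal sketch of the standard k-anonymity check on the two tables above: rows are grouped by the non-sensitive columns (sex, age), and the smallest group size is k. The diagnosis is treated as the sensitive attribute and left out of the key; "F" marks the female row. Grouping age makes the two males indistinguishable, but the single female row is still unique, so both tables come out as k = 1.

```python
from collections import Counter

# The example rows from the notes: (sex, age, diagnosis).
raw = [("M", "43", "AIDS"), ("M", "44", "CHD"), ("F", "49", "CHD")]
grouped = [("M", "40-45", "AIDS"), ("M", "40-45", "CHD"), ("F", "46-50", "CHD")]

def class_sizes(rows):
    """Count rows per equivalence class, i.e. per (sex, age) combination.

    The last column (diagnosis) is the sensitive attribute and is not part
    of the key.
    """
    return Counter(row[:-1] for row in rows)

def k_of(rows):
    """A table is k-anonymous for k equal to its smallest class size."""
    return min(class_sizes(rows).values())

print(class_sizes(raw))      # every (sex, age) pair occurs once
print(class_sizes(grouped))  # the two males now share one class
print(k_of(raw), k_of(grouped))  # -> 1 1
```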

Finding the optimal way to de-identify data is hard. There are so many possible combinations of groupings that it is hard to search them all in a limited amount of time. You have to find the balance between keeping information and making the data anonymous. There is no obvious way to cluster diseases, and no obvious way to group the data.
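To make the trade-off concrete, here is a brute-force sketch over a tiny search space. The generalization levels (sex kept or suppressed; age exact, 5-year band, 10-year band, or suppressed) are hypothetical choices for illustration, and "cost" is just the sum of the levels, a crude stand-in for information loss. It tries every combination and keeps the cheapest one that reaches a given k.

```python
from collections import Counter
from itertools import product

rows = [("M", 43, "AIDS"), ("M", 44, "CHD"), ("F", 49, "CHD")]

def gen_sex(sex, level):
    # Level 0 keeps the value; level 1 suppresses it.
    return sex if level == 0 else "*"

def gen_age(age, level):
    # Level 0: exact age; 1: 5-year band; 2: 10-year band; 3: suppressed.
    if level == 0:
        return str(age)
    if level == 3:
        return "*"
    width = 5 if level == 1 else 10
    low = age // width * width
    return f"{low}-{low + width - 1}"

def k_of(rows, sex_level, age_level):
    keys = Counter((gen_sex(s, sex_level), gen_age(a, age_level)) for s, a, _ in rows)
    return min(keys.values())

def minimal_generalization(rows, k):
    """Try every combination of levels; return the cheapest (sex, age) pair
    of levels that gives at least k, or None if no combination does."""
    candidates = [
        (sex_level + age_level, sex_level, age_level)
        for sex_level, age_level in product(range(2), range(4))
        if k_of(rows, sex_level, age_level) >= k
    ]
    if not candidates:
        return None
    _, sex_level, age_level = min(candidates)
    return sex_level, age_level

print(minimal_generalization(rows, k=2))  # -> (1, 2)
```

With only three rows, reaching k = 2 already requires suppressing sex and widening age to 10-year bands. Here there are only 2 × 4 = 8 combinations to try; with many columns the number explodes.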

What Endre's program does: it collects all the values from each row, and the program then builds the hierarchies automatically.
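The notes do not say how Endre's program derives its hierarchies, so this is only a hypothetical sketch of the general idea for a numeric column: collect the distinct values, then generate successively wider intervals (doubling the width at each level) up to a fully suppressed "*". The function name, `base_width`, and the doubling rule are assumptions, not his method.

```python
def build_numeric_hierarchy(values, base_width=5, levels=3):
    """Map each distinct value to a list of ever-wider generalizations.

    Level 0 is the value itself; level i (i >= 1) is an interval of width
    base_width * 2**(i - 1), aligned to the smallest observed value.
    The last entry, "*", is full suppression.
    """
    distinct = sorted(set(values))
    start = distinct[0]
    hierarchy = {}
    for v in distinct:
        path = [str(v)]
        for i in range(1, levels + 1):
            width = base_width * 2 ** (i - 1)
            low = start + (v - start) // width * width
            path.append(f"{low}-{low + width - 1}")
        path.append("*")
        hierarchy[v] = path
    return hierarchy

ages = [43, 44, 49, 43, 61]
for age, path in build_numeric_hierarchy(ages).items():
    print(age, path)
```

For a numeric column the order of the values is enough to build intervals; categorical columns such as diagnoses have no such built-in order, which is why grouping them is harder.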

Cornell: a program that does the same, but you have to create the hierarchy tables yourself. This is very time-consuming, and the hierarchies are hard to create.

Sensitive attribute: the attribute that is not de-identified. The other variables are de-identified, while the sensitive attribute stays as it is.
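Since the sensitive attribute is left as it is, a group can still leak it if everyone in the group has the same value. A common check for this is l-diversity; the sketch below uses the simplest variant (distinct l-diversity: the number of different sensitive values in the worst group), which is one way to read a requirement like "5-diverse" later in these notes. The column indices are parameters chosen for this example.

```python
from collections import defaultdict

def distinct_l(rows, qi_indices, sensitive_index):
    """Smallest number of distinct sensitive values in any equivalence class.

    qi_indices: positions of the de-identified (quasi-identifying) columns.
    sensitive_index: position of the sensitive attribute.
    """
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[i] for i in qi_indices)
        classes[key].add(row[sensitive_index])
    return min(len(values) for values in classes.values())

grouped = [("M", "40-45", "AIDS"), ("M", "40-45", "CHD"), ("F", "46-50", "CHD")]
print(distinct_l(grouped, qi_indices=(0, 1), sensitive_index=2))  # -> 1
```

The male group holds two different diagnoses, but the female group holds only one, so the table is only 1-diverse.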

Project 2 (Rannveig):

Patient records.

She suspected that the information in health records was of poor quality: if you search for a person, a lot of fields are not filled in, and so on. She wanted to find out how bad the quality was, i.e. check the quality of the information in the health records. Endre worked with her on de-identifying the data (dates and so on). Problems: dates were not formatted in the same way, and one patient could have several IDs.
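A minimal sketch of how mixed date formats can be normalized before de-identification: try a list of known formats in turn and return `None` for anything unrecognized, so bad values can be counted rather than silently guessed. The format list is hypothetical; the real files may use others.

```python
from datetime import datetime

# Hypothetical formats; extend to whatever actually appears in the data.
KNOWN_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y", "%Y%m%d")

def parse_date(text):
    """Try each known format in turn; return a date, or None if none fits."""
    text = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

for raw in ["2012-11-13", "13.11.2012", "13/11/2012", "20121113", "13 Nov"]:
    print(raw, parse_date(raw))
```

The order of the formats matters: if a file mixes day-first and month-first dates (e.g. 03/04/2012), no format list can resolve that on its own.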

Ingrid:

Take the individual (doctor) into account. Suggestions for ICD-10 codes, colour-coded red, green and so on.

Rune's notes:

Endre:
"M 34 AIDS" example

Grouping DRG codes: 10-102, 104B-107A,
GRO:
DRG codes are categorical variables (they are hierarchical).
--Endre: The computer has no way of knowing this.
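Because the hierarchy is not visible in the codes themselves, someone has to write it down. A hypothetical sketch of a hand-written DRG hierarchy using the two ranges from the notes, assuming codes are digits optionally followed by letters, that they sort by number and then by letter suffix, and that range bounds are inclusive. Codes outside the listed ranges fall into "other".

```python
import re

def drg_key(code):
    """Split a code such as '104B' into (104, 'B') so codes sort numerically."""
    match = re.fullmatch(r"(\d+)([A-Z]*)", code)
    if match is None:
        raise ValueError(f"unexpected DRG code: {code!r}")
    return int(match.group(1)), match.group(2)

# Ranges taken from the notes; this list has to be written by a person.
GROUPS = [("10", "102"), ("104B", "107A")]

def generalize_drg(code, level):
    """Level 0: the code itself; level 1: its group; level 2 and up: suppressed."""
    if level == 0:
        return code
    if level >= 2:
        return "*"
    key = drg_key(code)
    for low, high in GROUPS:
        if drg_key(low) <= key <= drg_key(high):
            return f"{low}-{high}"
    return "other"

for code in ["55", "104B", "105", "107B"]:
    print(code, generalize_drg(code, 1))
```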

Have you seen the aggregated NPR cubes?
("Cognos"? Create a table at aggregate levels of the DRG cubes.)

Gro: De-identified vs. anonymous (not back-trackable)

Rune: We will never get truly anonymous data.
--One easy way out is to avoid "complete" data. If you remove

Aslak:
We predefine which groups we need (age, diagnoses, etc.).
--Can we anonymize 10 or 100 variables?
--We can only anonymize 15 columns with the German Flash algorithm.
--If we have 100 columns (we need three columns to be 5-diverse), that's no problem.
--How many columns can we do in one day?
--10 or 11 columns in half a minute.
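One way to see why the number of columns matters: if each column has its own set of generalization levels, the number of combinations to consider is the product of the level counts, so it grows exponentially with the number of columns. This sketch only computes the size of that search space under the assumption that every column has 4 levels; it says nothing about how fast Flash actually is, since a real search algorithm does not visit every combination.

```python
from math import prod

def lattice_size(levels_per_column):
    """Number of ways to choose one generalization level for every column."""
    return prod(levels_per_column)

# Assumption for illustration: every column has 4 levels.
for n_columns in (10, 15, 100):
    print(n_columns, lattice_size([4] * n_columns))
```

With 4 levels per column, 10 columns give 1,048,576 combinations and 15 columns give 1,073,741,824.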

GRO:
Two directions:
--One public anonymous table for NSD
--One private semi-anonymous table for our own research.

Endre:
We can throw away 10% of the "outliers".
--Rune: Then we cannot do back-tracking anymore, right?
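A minimal sketch of "throwing away outliers" as row suppression: rows whose group is smaller than k are dropped, but only if that removes at most 10% of the table; otherwise the function returns `None`, meaning the data need more generalization instead. The function name and parameters are hypothetical.

```python
from collections import Counter

def suppress_outliers(rows, qi_indices, k, max_fraction=0.10):
    """Drop rows in equivalence classes smaller than k.

    Returns the remaining rows, or None if more than max_fraction of the
    table would have to be removed.
    """
    def key(row):
        return tuple(row[i] for i in qi_indices)

    sizes = Counter(key(row) for row in rows)
    kept = [row for row in rows if sizes[key(row)] >= k]
    if len(rows) - len(kept) > max_fraction * len(rows):
        return None
    return kept

# Hypothetical table: 19 rows in one class and a single outlier.
table = [("M", "40-49", "CHD")] * 19 + [("F", "40-49", "AIDS")]
print(len(suppress_outliers(table, qi_indices=(0, 1), k=2)))  # -> 19

# The three-row example: the lone female row is a third of the data.
grouped = [("M", "40-45", "AIDS"), ("M", "40-45", "CHD"), ("F", "46-50", "CHD")]
print(suppress_outliers(grouped, qi_indices=(0, 1), k=2))  # -> None
```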

NEXT STEP
Rune:

What question does Rannveig (HEMIT) want to answer?
--Is the information in NPR of good enough quality?
--Does it match the quality of PAS, the hospital's internal records?
Aslak: What did you do?
--De-identifying data, like dates and so on.
--Representation of time: hard to make uniform between systems. Gro: Why?
---It was not even represented in the same way within one file.
---Several patient IDs per patient.
--Gro: What if you stripped away newborns and immigrants who don't have proper IDs?
Aslak: More about what you did?
--Anonymize: every person got a day 1, the day s/he entered the hospital (normalized dates).
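A minimal sketch of the "day 1" normalization: calendar dates are replaced by day numbers counted per patient, so intervals between events survive but the actual dates do not. It assumes dates are already parsed and that each patient has a single ID; it also takes the earliest recorded event as the day the person entered the hospital, which is an assumption.

```python
from datetime import date

def to_relative_days(events):
    """Replace dates with per-patient day numbers (first event = day 1).

    events: list of (patient_id, date, description) tuples.
    """
    first = {}
    for pid, day, _ in events:
        if pid not in first or day < first[pid]:
            first[pid] = day
    return [(pid, (day - first[pid]).days + 1, what) for pid, day, what in events]

events = [
    ("p1", date(2012, 3, 5), "admission"),
    ("p1", date(2012, 3, 9), "discharge"),
    ("p2", date(2011, 12, 30), "admission"),
    ("p2", date(2012, 1, 2), "discharge"),
]
for row in to_relative_days(events):
    print(row)
```

Patient p1 is discharged on day 5 and p2 on day 4; the year change for p2 disappears from the output.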

Where can we download the Kanon (Flash) program?
--From SVN; update the SVN and repository page in the wiki.

How can we run Kanon?

Gro:
Why did you have to work with dates to de-identify the data?

Endre:
Rannveig's workflow:
1) Extract data from the DB with complex queries
2) Run Endre's Python programs on PAS, NPR,
3) Make sure this is re-usable!

THIRD STEP
What did you do for Kjartan?
--He wanted to run Kanon on his data, but the data was too big.
--Don't include too many columns (as non-sensitive attributes).

RUNE'S WORK FOR PASTAS

Take UNN's data and visualize it:
-Solid line for 24-hours-a-day services
-Dashed line for other services, annotated with their hours per week

In the future

Concentrate on Evicare from 14-15, and Pastas from 15-16
