Meeting 13 Nov 2012: Arezo, Aslak, Gro (by phone), Endre, Ingrid, Rune

Present: Arezo, Aslak, Gro, Endre, Ingrid, Rune

Endre presents:

M 43 AIDS

M 44 CHD

F 49 CHD

Patient data is often structured like this.

Usually more is involved.

From a table like this you can tell that the 43-year-old male has AIDS, so the data are not really anonymous.

You can deduce this because he is the only 43-year-old male in the data set.

These data are not anonymous, only de-identified. Grouped data are closer to truly anonymous.
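A minimal sketch of the check described above, using the three example rows: count how many rows share each combination of sex and age, and flag the rows that stand alone. The function names are made up for illustration; this is not Endre's program.

```python
from collections import Counter

# The three example rows: (sex, age, diagnosis).
records = [
    ("M", 43, "AIDS"),
    ("M", 44, "CHD"),
    ("F", 49, "CHD"),
]

def group_sizes(rows):
    """Count how many rows share each (sex, age) combination."""
    return Counter((sex, age) for sex, age, _diagnosis in rows)

def unique_rows(rows):
    """Return the rows whose (sex, age) combination occurs only once."""
    sizes = group_sizes(rows)
    return [row for row in rows if sizes[(row[0], row[1])] == 1]

# Every row printed here can be tied to one person by sex and age alone.
for row in unique_rows(records):
    print(row)
```

With exact ages, all three example rows are unique, which is why the table is de-identified but not anonymous.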

This is more anonymous, since the age is grouped:

M 40-45 AIDS

M 40-45 CHD

F 46-50 CHD
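The grouping step can be sketched as below. The two age bands are taken from the example; the band list, the "*" fallback and the function names are assumptions made for illustration. The diagnosis is left untouched.

```python
from collections import Counter

# Age bands from the grouped example; the list itself is an assumption.
AGE_BANDS = [(40, 45), (46, 50)]

def generalize_age(age):
    """Return the band containing the age, or "*" if no band matches."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "*"  # assumption: suppress ages outside the known bands

def generalize(rows):
    """Replace the exact age with its band; keep sex and diagnosis as they are."""
    return [(sex, generalize_age(age), diagnosis) for sex, age, diagnosis in rows]

records = [("M", 43, "AIDS"), ("M", 44, "CHD"), ("F", 49, "CHD")]
grouped = generalize(records)

# Count how many rows now share each (sex, age band) combination.
sizes = Counter((sex, band) for sex, band, _diagnosis in grouped)
for row in grouped:
    print(row, "group size:", sizes[(row[0], row[1])])
```

In this tiny example the two male rows end up in a group of two, while the single female row is still alone in its group.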

Finding the optimal way to de-identify data is hard. There are so many possible combinations that it is hard to do in a limited amount of time. You have to find the balance between keeping information and making the data anonymous. There is no obvious way to cluster diseases, and no obvious way to group the data.
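To make the trade-off concrete, here is a rough sketch that tries increasingly coarse age bands and stops at the first one where every (sex, age band) group has at least k rows. It searches over a single attribute only; with several attributes the number of combinations grows quickly, which is the difficulty noted above. The band widths, the function names and the fourth data row are all assumptions made up for illustration.

```python
from collections import Counter

def generalize_age(age, width):
    """Map an age to a band of the given width.
    Width 1 keeps the exact age; width 0 suppresses the age entirely."""
    if width == 0:
        return "*"
    if width == 1:
        return str(age)
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def smallest_group(rows, width):
    """Size of the smallest (sex, age band) group at this band width."""
    sizes = Counter((sex, generalize_age(age, width)) for sex, age, _d in rows)
    return min(sizes.values())

def least_generalization(rows, k, widths=(1, 5, 10, 20, 0)):
    """Return the first width, tried from least to most information lost,
    at which every group has at least k rows; None if no width works."""
    for width in widths:
        if smallest_group(rows, width) >= k:
            return width
    return None

# Three rows from the example plus one made-up row (the last one).
records = [("M", 43, "AIDS"), ("M", 44, "CHD"), ("F", 49, "CHD"), ("F", 47, "CHD")]

print(least_generalization(records, k=2))
print(least_generalization(records, k=3))
```

For k=2 the search settles on 5-year bands. For k=3 it finds nothing, because sex is kept as it is and there are only two rows per sex.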

What Endre's program does: it collects all values from each row. The program generates the hierarchies automatically.

Cornell: a program that does the same, but you have to build the tables yourself. This is very time-consuming, and the hierarchies are hard to create.

Sensitive attribute: the one that will not be de-identified. The other variables are de-identified while the sensitive attribute stays as it is.

Project 2 (Randveig):

Patient records.

She suspected that the information in health records was of poor quality: if you search for a person, a lot of information is not filled in, and so on. She wanted to find out how bad the quality was by checking the information in health records. Endre worked with her on de-identifying the data (dates and so on). Problems: dates were not formatted consistently, and one patient could have several IDs.
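The two problems mentioned above could be handled roughly as follows. This is only a sketch: the notes do not say which date formats occurred or how IDs were linked, so the format list (day-first is assumed) and the alias table are hypothetical.

```python
from datetime import datetime

# Candidate formats are assumptions; day-first order is assumed throughout.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y", "%d.%m.%y")

def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None

# Hypothetical alias table: every known ID of a patient maps to one canonical ID.
ID_ALIASES = {"A17": "P001", "B42": "P001", "C03": "P002"}

def canonical_id(patient_id):
    """Map any known alias to its canonical ID; unknown IDs pass through unchanged."""
    return ID_ALIASES.get(patient_id, patient_id)

for raw in ("13.11.2012", "2012-11-13", "13/11/2012", "not a date"):
    print(raw, "->", normalize_date(raw))
print(canonical_id("B42"))
```

Values that match no format come back as None, so they can be counted as a quality problem instead of being guessed at.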

Ingrid:

Take the individual (the doctor) into account. Suggestions for ICD-10 codes: red, green and so on.
