Colander: program to de-identify dates
It sets all occurrences of a specific event (for example, the start of the first hospitalization) to a common date and time, and recomputes each patient's other dates relative to that event.
For example, suppose you have a file with two patients, A and B, that looks like the following:

Patient | Date | Event
---|---|---
A | 2011/10/12 | In
B | 2011/10/11 | In
A | 2011/14/12 | Out
B | 2011/10/12 | Out

After de-identification, the file looks like this:

Patient | Date | Event
---|---|---
A | 2000/01/01 | In
B | 2000/01/01 | In
A | 2000/04/01 | Out
B | 2000/01/02 | Out
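
The idea behind the example above can be sketched in a few lines of Python. This is only an illustration, not the code in Colanderapp.py: the function name `shift_dates`, the `%Y/%m/%d` date format, the choice of 2000/01/01 as the common date, and the sample rows are all assumptions made for the sketch.

```python
from datetime import datetime

DATE_FORMAT = "%Y/%m/%d"  # assumed format; the real tool may parse dates differently
COMMON_DATE = datetime(2000, 1, 1)  # assumed common date for every anchor event


def shift_dates(rows, anchor_event="In"):
    """Shift each patient's dates so the earliest anchor event lands on COMMON_DATE.

    rows is a list of (patient, date_string, event) tuples.
    """
    # Find the earliest anchor-event date for each patient.
    anchors = {}
    for patient, date_str, event in rows:
        if event == anchor_event:
            date = datetime.strptime(date_str, DATE_FORMAT)
            if patient not in anchors or date < anchors[patient]:
                anchors[patient] = date

    # Move every date by that patient's own offset from the anchor.
    # A patient with no anchor event raises KeyError here.
    shifted = []
    for patient, date_str, event in rows:
        date = datetime.strptime(date_str, DATE_FORMAT)
        offset = date - anchors[patient]
        new_date = (COMMON_DATE + offset).strftime(DATE_FORMAT)
        shifted.append((patient, new_date, event))
    return shifted


# Hypothetical rows, made up for this sketch.
rows = [
    ("A", "2011/10/12", "In"),
    ("B", "2011/10/11", "In"),
    ("A", "2011/10/15", "Out"),
    ("B", "2011/10/12", "Out"),
]
result = shift_dates(rows)
```

The intervals between a patient's events are preserved, while the real calendar dates are no longer recoverable from the output alone.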
Author:
endrebak
Config file format, one line per file: input_file_name|output_file_name (must differ from the input file name)|columns containing dates|column containing pid
CSV/sykehus_avd.csv|CSV/sykehus_avd_anon.csv|6,5,7|-13
CSV/tjenester.csv|CSV/tjenester_anon.csv|4,5|6
CSV/brukere.csv|CSV/brukere_anon.csv|1|5
CSV/saker.csv|CSV/saker_anon.csv|0,1|8
NB: In the first row, the first number in the third field is the index of the column containing the date on which the recomputation of all the other dates is based. In this example it is 6.
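
To make the format concrete, here is a minimal sketch of how one config line could be split into its four fields. The function `parse_config_line` and the dictionary keys are hypothetical names chosen for this illustration; Colanderapp.py may read the file differently.

```python
def parse_config_line(line):
    """Split one pipe-separated config line into its four fields."""
    input_name, output_name, date_cols, pid_col = line.strip().split("|")
    if input_name == output_name:
        raise ValueError("output file name must differ from input file name")
    return {
        "input": input_name,
        "output": output_name,
        # Comma-separated column indices, kept in the order given.
        "date_columns": [int(c) for c in date_cols.split(",")],
        "pid_column": int(pid_col),
    }


# First row of the example config above.
config = parse_config_line("CSV/sykehus_avd.csv|CSV/sykehus_avd_anon.csv|6,5,7|-13")

# In the first row, the first date column is the one all other dates are based on.
anchor_column = config["date_columns"][0]
```

The order of the date columns is kept as written because the first number in the first row has a special meaning.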
Usage:
Adjust the config file, then run "python Colanderapp.py".
Files:
Notes:
not tested on real data