Colander: a program to de-identify dates
It sets all occurrences of a specific event (for example, the start of the first hospitalization) to a common date and time, and recomputes each patient's other dates relative to that event.
E.g., if you have a file with two patients, A and B, that looks like the following (dates are written as YYYY/DD/MM):
Patient | Date | Event |
---|---|---|
A | 2011/11/12 | In |
B | 2011/11/11 | In |
A | 2011/14/12 | Out |
B | 2011/10/12 | Out |
then the de-identified output looks like this:
Patient | Date | Event |
---|---|---|
A | 2000/01/01 | In |
B | 2000/01/01 | In |
A | 2000/04/01 | Out |
B | 2000/30/01 | Out |
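The shifting shown in the tables above can be sketched roughly as follows. This is a minimal illustration, not the code in Colanderapp.py: the function name `shift_dates`, the `DATE_FORMAT` constant (year/day/month, inferred from the sample rows) and the choice of 2000/01/01 as the common date are all assumptions made for the example.

```python
from datetime import datetime

# Assumption: dates are year/day/month, as the sample rows suggest.
DATE_FORMAT = "%Y/%d/%m"
# Assumption: the common date that the reference event is moved to.
COMMON_DATE = datetime(2000, 1, 1)


def shift_dates(rows, reference_event="In"):
    """Shift each patient's dates so the reference event lands on COMMON_DATE.

    rows is a list of (patient, date_string, event) tuples. A patient
    without a reference event raises KeyError in this sketch.
    """
    # First pass: find each patient's earliest reference-event date.
    reference = {}
    for patient, date_string, event in rows:
        if event == reference_event:
            date = datetime.strptime(date_string, DATE_FORMAT)
            if patient not in reference or date < reference[patient]:
                reference[patient] = date

    # Second pass: apply the same per-patient offset to every date,
    # which preserves the intervals between a patient's events.
    shifted = []
    for patient, date_string, event in rows:
        date = datetime.strptime(date_string, DATE_FORMAT)
        new_date = COMMON_DATE + (date - reference[patient])
        shifted.append((patient, new_date.strftime(DATE_FORMAT), event))
    return shifted


rows = [
    ("A", "2011/11/12", "In"),
    ("B", "2011/11/11", "In"),
    ("A", "2011/14/12", "Out"),
    ("B", "2011/10/12", "Out"),
]
for row in shift_dates(rows):
    print(" | ".join(row))
```

Because every date of a patient is moved by the same offset, the length of each stay is unchanged while the real calendar dates are hidden.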
Author:
endrebak
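Config file format: one line per input file, with four fields separated by "|":
input_file_name|output_file_name (must differ from input_file_name)|columns containing dates (comma-separated indices)|column containing the patient ID (pid)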
New Format? #input_file_name|(!=)output_file_name|all columns containing dates|one column containing the unique ID
CSV/sykehus_avd.csv|CSV/sykehus_avd_anon.csv|6,5,7|-13
CSV/tjenester.csv|CSV/tjenester_anon.csv|4,5|6
CSV/brukere.csv|CSV/brukere_anon.csv|1|5
CSV/saker.csv|CSV/saker_anon.csv|0,1|8
NB: the first number in the third field of the first row is the index of the column containing the reference date, i.e. the date on which the recomputation of all the other dates is based. In this example it is 6.
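To make the config format concrete, here is a small sketch of how such lines could be parsed. It is not taken from Colanderapp.py: the names `parse_config_line` and `parse_config` are hypothetical, treating lines that start with "#" as comments is an assumption, and the negative ID column (-13) is kept as a plain integer because the README does not say how it is interpreted.

```python
def parse_config_line(line):
    """Split one config line into its four pipe-separated fields."""
    input_name, output_name, date_columns, id_column = line.strip().split("|")
    if input_name == output_name:
        raise ValueError("output file name must differ from input file name")
    return {
        "input": input_name,
        "output": output_name,
        # Comma-separated column indices, e.g. "6,5,7" -> [6, 5, 7].
        "date_columns": [int(c) for c in date_columns.split(",")],
        "id_column": int(id_column),
    }


def parse_config(text):
    """Parse every non-empty, non-comment line of a config file."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '#' are skipped.
        if not line or line.startswith("#"):
            continue
        entries.append(parse_config_line(line))
    return entries


sample = """\
CSV/sykehus_avd.csv|CSV/sykehus_avd_anon.csv|6,5,7|-13
CSV/tjenester.csv|CSV/tjenester_anon.csv|4,5|6
CSV/brukere.csv|CSV/brukere_anon.csv|1|5
CSV/saker.csv|CSV/saker_anon.csv|0,1|8
"""
entries = parse_config(sample)
# The reference date column is the first number in the third field of the first row.
reference_column = entries[0]["date_columns"][0]
print(reference_column)
```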
Usage:
Adjust the config file, then run "python Colanderapp.py".
Files:
Notes:
Not tested on real data.