Colander: program to de-identify dates

Colander de-identifies dates by shifting each patient's dates so that all occurrences of a specific event (for example, the start of the first hospitalization) fall on a common date and time.

E.g., if you have a file with two patients, A and B, which looks like the following:

Patient   Date         Event
A         2011/10/12   In
B         2011/10/11   In
A         2011/14/12   Out
B         2011/10/12   Out

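Colander would produce the following, with the first "In" event of both patients set to 2000/01/01:

Patient   Date         Event
A         2000/01/01   In
B         2000/01/01   In
A         2000/04/01   Out
B         2000/01/02   Out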
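The idea can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the code in Colanderapp.py: the function name shift_dates, the (patient, date, event) row layout, the use of each patient's earliest date as the reference event, and the sample dates are all hypothetical. The dates are written with datetime.date to avoid guessing the date format used in the tables above.

```python
from datetime import date

# Assumed common target date (the example tables above use 2000/01/01).
COMMON_DATE = date(2000, 1, 1)


def shift_dates(rows, common_date=COMMON_DATE):
    """Shift each patient's dates so that the patient's reference date
    lands on common_date, keeping the intervals between events intact.

    rows is a list of (patient_id, date, event) tuples (hypothetical layout).
    """
    # Assumption: the reference date is the earliest date seen per patient.
    reference = {}
    for pid, day, _event in rows:
        if pid not in reference or day < reference[pid]:
            reference[pid] = day

    # Apply the patient's own offset to every one of that patient's dates.
    return [
        (pid, common_date + (day - reference[pid]), event)
        for pid, day, event in rows
    ]


# Illustrative data only (not taken from the tables above).
rows = [
    ("A", date(2011, 3, 10), "In"),
    ("B", date(2011, 2, 1), "In"),
    ("A", date(2011, 3, 14), "Out"),
    ("B", date(2011, 2, 2), "Out"),
]
shifted = shift_dates(rows)
for pid, day, event in shifted:
    print(pid, day.isoformat(), event)
```

Because every date of a patient is moved by the same offset, the length of each stay is preserved while the real calendar dates are no longer visible.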

Author: endrebak

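Config file format: one line per file, with four fields separated by "|": input_file_name|output_file_name (must differ from the input file name)|columns containing dates (comma-separated indices)|column containing the pid (patient ID)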

CSV/sykehus_avd.csv|CSV/sykehus_avd_anon.csv|6,5,7|-13

CSV/tjenester.csv|CSV/tjenester_anon.csv|4,5|6

CSV/brukere.csv|CSV/brukere_anon.csv|1|5

CSV/saker.csv|CSV/saker_anon.csv|0,1|8

NB: The first number in the third field of the first row is the index of the column containing the reference date, i.e. the date on which the recomputation of all the other dates is based. It is 6 in this example.
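As a rough sketch of how such a config file could be read (how Colanderapp.py actually parses it is not shown on this page), the snippet below splits each line into its four fields. The names ConfigEntry and parse_config_line are hypothetical, and the indices are simply converted to integers without interpreting what a negative value such as -13 means.

```python
from typing import NamedTuple

CONFIG_TEXT = """\
CSV/sykehus_avd.csv|CSV/sykehus_avd_anon.csv|6,5,7|-13
CSV/tjenester.csv|CSV/tjenester_anon.csv|4,5|6
CSV/brukere.csv|CSV/brukere_anon.csv|1|5
CSV/saker.csv|CSV/saker_anon.csv|0,1|8
"""


class ConfigEntry(NamedTuple):
    """One config line (hypothetical container, not part of Colander)."""
    input_file: str
    output_file: str
    date_columns: list  # indices of the columns containing dates
    pid_column: int     # index of the column containing the pid


def parse_config_line(line):
    """Split one 'input|output|date columns|pid column' line into fields."""
    input_file, output_file, date_cols, pid_col = line.strip().split("|")
    if input_file == output_file:
        raise ValueError("output file name must differ from input file name")
    return ConfigEntry(
        input_file,
        output_file,
        [int(col) for col in date_cols.split(",")],
        int(pid_col),
    )


entries = [parse_config_line(l) for l in CONFIG_TEXT.splitlines() if l.strip()]

# Per the NB above: first row, third field, first number.
reference_column = entries[0].date_columns[0]
print(reference_column)
```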

Usage:

Adjust the config file, then run "python Colanderapp.py".

Files:

Colanderapp.py

config

Notes:

Not tested on real data.
