Øystein

To start working on the new data from Haldor, we need...

  • Rune: Tell everyone about stand-off annotation management?
    • Look into BRAT
    • Examples from the BioNLP 2013 competition
  • Laura: Add timestamps (down to the minute) and the author to all text files.
    • The files are in RTF format, which is very messy
    • Haldor can convert them to plain text using DIPS scripts
  • Create a 10-fold split of the dataset
    • Use only 8 folds for training and 1 for development testing; keep the last fold secret for FINAL testing
  • File formats
  • Sentence separation
    • Hans: Use available Java code from the web.
      • Is it necessary? Yes, Haldor's data has no sentence boundaries.
      • Stoplist, stemlist,
      • Upload this on the
    • Check Thomas' sentence splitting
  • Tools: http://2013.bionlp-st.org/supporting-resources
    • Contains tokenisation, lemmatisation, sentence splitting, chunking and PoS-tagging in a unified BioC XML format
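A minimal sketch of the 10-fold split described in the list above, assuming the split is made at document level and that each document has a string ID. The function name `ten_fold_split`, the seed value and the fold labels are hypothetical. Sorting before shuffling with a fixed seed makes the split reproducible regardless of input order, so the secret final-test fold stays the same every time it is regenerated.

```python
import random
from typing import Dict, List, Sequence


def ten_fold_split(doc_ids: Sequence[str], seed: int = 42) -> Dict[str, List[str]]:
    """Hypothetical helper: 8 folds for training, 1 for dev, 1 held out for FINAL testing."""
    ids = sorted(doc_ids)       # sort first so the result does not depend on input order
    rng = random.Random(seed)   # fixed (assumed) seed makes the shuffle reproducible
    rng.shuffle(ids)

    folds: List[List[str]] = [[] for _ in range(10)]
    for i, doc_id in enumerate(ids):
        folds[i % 10].append(doc_id)  # round-robin keeps fold sizes within one of each other

    return {
        "train": [d for fold in folds[:8] for d in fold],
        "dev": folds[8],
        "final_test": folds[9],  # keep secret; only evaluate on this at the very end
    }


split = ten_fold_split([f"doc{i:03d}" for i in range(100)])
print(len(split["train"]), len(split["dev"]), len(split["final_test"]))  # prints: 80 10 10
```

If several notes belong to the same patient, it may be safer to assign whole patients (rather than single documents) to folds, so that text from one patient does not appear in both training and final testing.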
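Since the sentence separation item notes that Haldor's data has no sentence boundaries, here is a rough rule-based splitter for experimenting before the Java code is in place. It is a stand-in, not the Java code Hans mentioned or Thomas' splitter. It assumes Norwegian text (hence Æ, Ø, Å in the uppercase class), and the abbreviation list is a hypothetical placeholder. It treats `.`, `!` or `?` followed by whitespace and an uppercase letter or digit as a boundary, unless the token ending there is a known abbreviation.

```python
import re
from typing import List, Set

# Hypothetical abbreviation list; a real one would come from the project's own resources.
ABBREVIATIONS: Set[str] = {"ca.", "f.eks.", "dvs.", "pga.", "dr."}

# Candidate boundary: sentence punctuation followed by whitespace and an uppercase letter or digit.
BOUNDARY = re.compile(r"[.!?](?=\s+[A-ZÆØÅ0-9])")


def split_sentences(text: str, abbreviations: Set[str] = ABBREVIATIONS) -> List[str]:
    sentences: List[str] = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end()
        last_token = text[start:end].split()[-1].lower()
        if last_token in abbreviations:
            continue  # the period belongs to an abbreviation, not a sentence end
        sentences.append(text[start:end].strip())
        start = end
    rest = text[start:].strip()  # whatever follows the last boundary is the final sentence
    if rest:
        sentences.append(rest)
    return sentences


print(split_sentences("Pasienten ble innlagt. Dr. Hansen undersøkte ham! Videre oppfølging?"))
# prints: ['Pasienten ble innlagt.', 'Dr. Hansen undersøkte ham!', 'Videre oppfølging?']
```

Clinical notes often break these rules (lowercase sentence starts, missing punctuation, line-based lists), so output from a splitter like this would need checking against the data before it is relied on.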
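For the stand-off annotation and BRAT items in the list above: in BRAT's stand-off format the text stays untouched in a `.txt` file, and the annotations live in a separate `.ann` file that points into the text by character offsets. A text-bound annotation line has the shape `T1<TAB>Type start end<TAB>covered text`, with discontinuous spans separated by `;`. This sketch reads only text-bound (`T`) lines and skips relations, events, attributes and notes. The `TextBound` class and `parse_textbound` function are hypothetical names, and the example labels are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextBound:
    ann_id: str
    label: str
    spans: List[Tuple[int, int]]  # character offsets into the .txt file, end exclusive
    text: str


def parse_textbound(ann_content: str) -> List[TextBound]:
    entities: List[TextBound] = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # skip relations (R), events (E), attributes (A), notes (#) and blank lines
        ann_id, middle, text = line.split("\t", 2)
        label, offsets = middle.split(" ", 1)
        spans = []
        for fragment in offsets.split(";"):  # discontinuous spans are separated by ';'
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        entities.append(TextBound(ann_id, label, spans, text))
    return entities


doc = "Pasienten har diabetes type 2."
ann = "T1\tDisease 14 22\tdiabetes\nT2\tSubtype 23 29\ttype 2\nR1\tHasType Arg1:T1 Arg2:T2\n"
for ent in parse_textbound(ann):
    print(ent.ann_id, ent.label, [doc[s:e] for s, e in ent.spans])
# prints:
# T1 Disease ['diabetes']
# T2 Subtype ['type 2']
```

Because the offsets index into the original text, any conversion step (for example RTF to plain text) has to happen before annotation; changing the `.txt` file afterwards would invalidate the `.ann` offsets.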