csv: a vocabulary for describing CSV files
Rurik Thomas Greenall (2014-01-26)
This document describes a vocabulary for describing CSV (and other column-oriented) files.
The vocabulary is provided under the ODC-PDDL.
Namespace
The URI for this vocabulary is
http://www.ntnu.no/ub/data/csv/
The suggested prefix for this vocabulary is
csv
Terms
Properties and classes
ColumnOrientedDocument
A column-oriented document, typically a spreadsheet or data table
type: rdfs:Class
subclass of: foaf:Document
term status: stable
CsvDocument
A CSV document broadly conforming to IETF's RFC4180 (http://tools.ietf.org/html/rfc4180); csv:hasEscapeSymbol and csv:hasColumnDelimiter are both set for this class (\ and , respectively).
type: rdfs:Class
subclass of: ColumnOrientedDocument
term status: stable
Column
A column in the document
type: rdfs:Class
status: stable
Cell
[DEPRECATED: describing data down to individual cells was not considered useful; it made more sense to provide an RDF representation of the data with links to the CSV.] A cell in the document.
type: rdfs:Class
status: archaic
hasColumnDelimiter
Explicit statement of which symbol is used to delimit columns in rows
type: rdf:Property
domain: ColumnOrientedDocument
range: literal
status: stable
hasEscapeSymbol
Explicit statement of which symbol is used to escape characters in data
type: rdf:Property
domain: ColumnOrientedDocument
range: literal
status: stable
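As a sketch of how a consumer might apply csv:hasColumnDelimiter and csv:hasEscapeSymbol, Python's standard `csv` module can be configured with the declared symbols. The helper name `read_rows` is hypothetical, and its defaults are simply the values this document fixes for csv:CsvDocument (`,` and `\`).

```python
import csv
import io


def read_rows(text, column_delimiter=",", escape_symbol="\\"):
    """Split column-oriented text into rows of fields (hypothetical helper).

    column_delimiter and escape_symbol stand for the values of
    csv:hasColumnDelimiter and csv:hasEscapeSymbol; the defaults are the
    ones this vocabulary fixes for csv:CsvDocument.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),  # leave line endings to the csv module
        delimiter=column_delimiter,
        escapechar=escape_symbol,
    )
    return [row for row in reader]


# The escaped comma stays inside the first field.
print(read_rows("Doe\\, John,38\r\n"))  # [['Doe, John', '38']]
```

A document using another delimiter, for example a semicolon, would be read with `read_rows(text, column_delimiter=";")`.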
hasColumn
Denotes the relationship between a CSV document and a column within that document.
type: rdf:Property
domain: CsvDocument
range: Column
status: stable
hasCell
[DEPRECATED: describing data down to individual cells was not considered useful; it made more sense to provide an RDF representation of the data with links to the CSV.] Denotes the relationship between a column and a cell within that column.
type: rdf:Property
domain: Column
range: Cell
status: archaic
hasIndex
An index that denotes the position of a column in a document, numbered from left to right starting at 1.
type: rdf:Property
domain: Column
range: nonNegativeInteger
status: stable
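Because csv:hasIndex counts from 1, code that stores rows as 0-based lists has to shift the index. A minimal sketch, with the helper name `column_values` being hypothetical:

```python
def column_values(rows, index):
    """Return the values of the column whose csv:hasIndex is `index`.

    csv:hasIndex counts from 1 at the leftmost column, whereas Python
    lists are 0-based, hence `index - 1` (hypothetical helper).
    """
    if index < 1:
        raise ValueError("csv:hasIndex is numbered from 1")
    return [row[index - 1] for row in rows]


rows = [["John", "Doe", "38"], ["Jane", "Doe", "31"]]
print(column_values(rows, 3))  # ['38', '31']
```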
hasCharacterEncoding
Description of the character encoding of a document, for example UTF-8, US-ASCII or ISO-8859-1. This information corresponds to the optional MIME parameter 'charset' defined in RFC4180.
type: rdf:Property
domain: CsvDocument
range: literal
status: stable
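The value of csv:hasCharacterEncoding is what a reader needs before it can turn the file's bytes into text. As a sketch, assuming the value is a charset label that Python's codec registry recognises (it accepts UTF-8, US-ASCII and ISO-8859-1, among others), decoding could look like this; `decode_document` is a hypothetical helper:

```python
import codecs


def decode_document(data, character_encoding):
    """Decode raw bytes using the csv:hasCharacterEncoding value.

    An unrecognised label raises LookupError and undecodable bytes raise
    UnicodeDecodeError (hypothetical helper).
    """
    codec = codecs.lookup(character_encoding)  # fail early on unknown labels
    return data.decode(codec.name)


print(codecs.lookup("US-ASCII").name)  # ascii
print(decode_document("Zoë,Doe".encode("ISO-8859-1"), "ISO-8859-1"))  # Zoë,Doe
```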
encodesLinebreaksAs
Description of how line breaks are encoded in a document; a value such as LF, CR or CRLF is expected.
type: rdf:Property
domain: CsvDocument
range: literal
status: stable
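A minimal sketch of mapping the csv:encodesLinebreaksAs labels to the characters they denote and splitting a document into records. The helper `split_records` is hypothetical, and treating "CR+LF" as a synonym for "CRLF" is an assumption of this sketch. It is deliberately naive: a quoted field containing a line break would be cut in two, so a real reader should leave this to a CSV parser.

```python
# Line-break labels mapped to the characters they denote; "CR+LF" is
# accepted as a synonym for "CRLF" (an assumption of this sketch).
LINEBREAKS = {"LF": "\n", "CR": "\r", "CRLF": "\r\n", "CR+LF": "\r\n"}


def split_records(text, linebreak_label):
    """Split a document into records using csv:encodesLinebreaksAs.

    Naive on purpose: line breaks inside quoted fields are not
    recognised (hypothetical helper).
    """
    separator = LINEBREAKS[linebreak_label.upper()]
    records = text.split(separator)
    if records and records[-1] == "":
        records.pop()  # the last record usually ends with a line break too
    return records


print(split_records("a,b\r\nc,d\r\n", "CRLF"))  # ['a,b', 'c,d']
```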
hasHeader
Description of whether or not the first line of the document contains column headers; a boolean is expected. 'false' indicates that there is no header, while 'true' indicates that there is a header and that data therefore begins at row 2. This information corresponds to the optional MIME parameter 'header' defined in RFC4180, where the boolean value 'false' represents 'absent' and 'true' represents 'present'.
type: rdf:Property
domain: CsvDocument
range: boolean
status: stable
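Since csv:hasHeader takes an xsd:boolean, a reader has to accept all of that datatype's lexical forms (true, false, 1 and 0) before deciding whether to skip the first row. A sketch with hypothetical helper names:

```python
def parse_xsd_boolean(lexical):
    """Map an xsd:boolean lexical form (true, false, 1, 0) to a bool."""
    lexical = lexical.strip()  # xsd:boolean collapses surrounding whitespace
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"not an xsd:boolean: {lexical!r}")


def data_rows(rows, has_header):
    """Drop row 1 when csv:hasHeader is true, so data starts at row 2."""
    return rows[1:] if parse_xsd_boolean(has_header) else rows


rows = [["First name", "Age"], ["John", "38"]]
print(data_rows(rows, "true"))  # [['John', '38']]
```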
hasColumnIndex
[DEPRECATED: describing data down to individual cells was not considered useful; it made more sense to provide an RDF representation of the data with links to the CSV.] Denotes the column in which a cell is located.
type: rdf:Property
domain: Cell
range: Column
status: archaic
hasRowIndex
[DEPRECATED: describing data down to individual cells was not considered useful; it made more sense to provide an RDF representation of the data with links to the CSV.] An index that denotes the row in which a cell is located, that is, its position within its column.
type: rdf:Property
domain: Cell
range: nonNegativeInteger
status: archaic
mapsTo
The RDF property to which values in the column map.
type: rdf:Property
domain: Column
range: rdf:Property
status: stable
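To illustrate what csv:mapsTo enables, the sketch below pairs each cell of a row with the property its column maps to. The column-to-property table is keyed by csv:hasIndex; properties are written as prefixed names purely for illustration, and `map_row` is a hypothetical helper. It returns a list rather than a dict so that two columns mapping to the same property are both kept.

```python
def map_row(row, mappings):
    """Pair each cell with the property its column maps to.

    mappings: {csv:hasIndex: csv:mapsTo value}, properties given as
    prefixed names for illustration only (hypothetical helper).
    """
    return [(prop, row[index - 1]) for index, prop in sorted(mappings.items())]


mappings = {1: "foaf:givenName", 2: "foaf:familyName", 3: "foaf:age"}
print(map_row(["Jane", "Doe", "31"], mappings))
# [('foaf:givenName', 'Jane'), ('foaf:familyName', 'Doe'), ('foaf:age', '31')]
```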
hasMultivalueSeparator
Where a column contains multiple values per cell, indicates the separator symbol as (escaped) text. Please note: we provide no solution to your obvious meatspace problem.
type: rdf:Property
domain: Column
range: literal
status: stable
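A sketch of applying csv:hasMultivalueSeparator to a single cell. It assumes the cell has already been read by a CSV parser, so the quotes around a value such as "Cat,Dog" are gone; dropping empty pieces is a choice of this sketch, and `split_values` is a hypothetical helper.

```python
def split_values(cell, separator=None):
    """Split a multi-valued cell on its csv:hasMultivalueSeparator.

    Columns without a declared separator are treated as single-valued;
    empty pieces, e.g. from a trailing separator, are dropped
    (hypothetical helper).
    """
    if not separator:
        return [cell]
    return [value for value in cell.split(separator) if value]


print(split_values("Horse,Dog,Sausage", ","))  # ['Horse', 'Dog', 'Sausage']
print(split_values("Mouse", ","))  # ['Mouse']
```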
Usage example
For a file "file.txt" with the following structure:
First name,Last name,Age,Pets
John,Doe,38,"Cat,Dog"
Jane,Doe,31,"Dog,Parakeet"
Maxie,Doe,33,Mouse
Les,Doe,39,"Horse,Dog,Sausage"
This file can be described as follows:
@prefix csv: <http://www.ntnu.no/ub/data/csv/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# The two namespaces below are illustrative placeholders.
@prefix : <http://example.com/file.txt#> .
@prefix ex: <http://example.com/vocab#> .

<http://example.com/file.txt> a csv:CsvDocument ;
    dcterms:title "people, ages and pets" ;
    dcterms:creator "J. Doe" ;
    dcterms:date "2011-04-21" ;
    csv:hasCharacterEncoding "ASCII" ;
    csv:encodesLinebreaksAs "CRLF" ;
    csv:hasHeader "true"^^xsd:boolean ;
    csv:hasColumn :column1 ;
    csv:hasColumn :column2 ;
    csv:hasColumn :column3 ;
    csv:hasColumn :column4 .

:column1 a csv:Column ;
    rdfs:label "First name" ;
    rdfs:comment "Contains the first name of a person" ;
    csv:mapsTo foaf:givenName ;
    csv:hasIndex "1"^^xsd:nonNegativeInteger .

:column2 a csv:Column ;
    rdfs:label "Last name" ;
    rdfs:comment "Contains the last name of a person" ;
    csv:mapsTo foaf:familyName ;
    csv:hasIndex "2"^^xsd:nonNegativeInteger .

:column3 a csv:Column ;
    rdfs:label "Age" ;
    rdfs:comment "Contains the age of a person" ;
    csv:mapsTo foaf:age ;
    csv:hasIndex "3"^^xsd:nonNegativeInteger .

:column4 a csv:Column ;
    rdfs:label "Pets" ;
    rdfs:comment "Contains the pets people own separated by commas (yes, a useful textual comment for your computer)" ;
    csv:mapsTo ex:pet ;
    csv:hasMultivalueSeparator "," ;
    csv:hasIndex "4"^^xsd:nonNegativeInteger .
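The description above is written by hand. As a hedged illustration, its repetitive part (one csv:Column per header cell, with rdfs:label and csv:hasIndex) could be generated from the file's header row. The helper names `turtle_string` and `describe_columns` are hypothetical; csv:mapsTo and rdfs:comment are deliberately left out because they cannot be inferred from the data.

```python
import csv
import io


def turtle_string(text):
    """Quote text as a Turtle string literal (backslash and quote escaped)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def describe_columns(text, delimiter=","):
    """Emit skeleton csv:Column descriptions from a document's header row.

    csv:mapsTo and rdfs:comment are left for a person to add; the
    :columnN names follow the usage example (hypothetical helper).
    """
    header = next(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    blocks = []
    for index, label in enumerate(header, start=1):
        blocks.append(
            f":column{index} a csv:Column ;\n"
            f"    rdfs:label {turtle_string(label)} ;\n"
            f'    csv:hasIndex "{index}"^^xsd:nonNegativeInteger .'
        )
    return "\n".join(blocks)


print(describe_columns("First name,Last name,Age,Pets\r\n"))
```

The output still needs the @prefix declarations and the document-level statements before it is a complete Turtle file.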