Serialisation is the process of creating a linear stream of bytes or characters corresponding to an object structure (a graph), so the same structure can be re-constructed by de-serialising the stream. This technique is used both to save the object structure to file and to transmit it across a network.
Introduction
Persistence is the term used for storing (application) state between sessions, in our case model instances represented by object structures (graphs). It covers file and database storage, including SQL, object and NoSQL databases. The logic of files and databases is so different that it is easier to treat them separately. A file is essentially a sequence or array of bytes, which often encode characters, and serialisation is the term used for mapping the object structure to byte or character sequences. Although files are mostly meant for consumption by programs, they often need to be edited by hand, so a serialisation format should ideally be human-readable and to some extent editable.
Model instances are graphs in general, but it is easier to think of them as mostly hierarchical, with links across. Hence, a serialisation mechanism will typically need to support the following features:
- type information, i.e. references to EClasses in the model, so instances can be created
- attribute values, corresponding to EAttributes, so attributes of instances can be set or filled
- links, corresponding to EReferences; these are similar to attributes, except that the values are identifiers (strings) that resolve to objects in the context of the object hierarchy
- containment, corresponding to EReferences with the containment flag set
Given these features, it is fairly straightforward to make a general algorithm for serialising and de-serialising object structures. To serialise, identify the root object(s) and traverse the object hierarchy. For each object, output the EClass reference, and for each EStructuralFeature, output the name and the values (EAttributes) or object identities (EReferences). For containment, use some kind of nesting indicator, like curly braces and/or indentation. To de-serialise, create the objects and set or fill the attributes while parsing, and store the EReference names and object identities. After all objects have been created, link them together by resolving the stored object identities.
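The algorithm can be sketched in a few lines. The following is a minimal, language-neutral illustration in Python and not EMF's own API: the `Obj` class, its field names and the JSON layout are all assumptions made for this example, with JSON's curly braces playing the role of the nesting indicator for containment.

```python
import json
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    """Hypothetical stand-in for a model instance (not EMF's EObject)."""
    eclass: str                                   # type information (EClass name)
    oid: str                                      # identity used to resolve links
    attrs: dict = field(default_factory=dict)     # EAttribute name -> value
    links: dict = field(default_factory=dict)     # non-containment EReference name -> Obj
    contents: dict = field(default_factory=dict)  # containment EReference name -> [Obj]


def to_tree(obj):
    """Traverse the containment hierarchy; links are written as identities."""
    return {
        "eClass": obj.eclass,
        "id": obj.oid,
        "attrs": obj.attrs,
        "links": {name: target.oid for name, target in obj.links.items()},
        "contents": {name: [to_tree(child) for child in children]
                     for name, children in obj.contents.items()},
    }


def serialise(root):
    """Produce a linear character stream for the structure below root."""
    return json.dumps(to_tree(root), indent=2)


def deserialise(text):
    by_id = {}    # identity -> created object
    pending = []  # (source object, reference name, target identity)

    def build(node):
        # First pass: create the object and set its attributes while parsing.
        obj = Obj(node["eClass"], node["id"], dict(node["attrs"]))
        by_id[obj.oid] = obj
        for name, target_id in node["links"].items():
            pending.append((obj, name, target_id))  # resolve later
        for name, children in node["contents"].items():
            obj.contents[name] = [build(child) for child in children]
        return obj

    root = build(json.loads(text))
    # Second pass: every object exists now, so identities can be resolved.
    for obj, name, target_id in pending:
        obj.links[name] = by_id[target_id]
    return root


# Usage: the book links to an author that appears later in the stream.
author = Obj("Author", "a1", {"name": "Ada"})
book = Obj("Book", "b1", {"title": "Notes"}, links={"author": author})
library = Obj("Library", "lib", {"name": "City"},
              contents={"books": [book], "authors": [author]})

copy = deserialise(serialise(library))
```

Deferring link resolution to a second pass is what allows forward references: in the usage example the book is parsed before the author it points to, so the link cannot be resolved at the moment it is read.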
Note that all of these somehow need to reference elements in the model (of the instance that needs to be serialised): instance types need to refer to EClasses, while attribute values, links and containment need to refer to EStructuralFeatures (EAttributes and EReferences). However, since all these references are used in the context of some element, they needn't always be fully spelt out with something like a qualified name. E.g. a simple name is enough to identify an EClass within an EPackage or an EStructuralFeature within an EClass.
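The point about context can be illustrated with a small sketch. The classes below are hypothetical stand-ins for an EPackage and its EClasses, not EMF's API, and the dotted qualified-name format is an assumption made for the example. Two classes may both have a feature called `name`, yet a simple name is unambiguous once the enclosing class is known.

```python
from dataclasses import dataclass, field


@dataclass
class EClassSketch:
    """Hypothetical stand-in for an EClass: simple feature name -> feature kind."""
    name: str
    features: dict = field(default_factory=dict)


@dataclass
class EPackageSketch:
    """Hypothetical stand-in for an EPackage: simple class name -> class."""
    name: str
    classes: dict = field(default_factory=dict)

    def add(self, eclass):
        self.classes[eclass.name] = eclass
        return eclass


def resolve_class(package, simple_name):
    # The package is the context, so the simple class name is enough.
    return package.classes[simple_name]


def resolve_feature(eclass, simple_name):
    # The class is the context, so the simple feature name is enough.
    return eclass.features[simple_name]


def qualified_name(package, eclass, feature_name):
    # What would have to be written out if no context were available.
    return f"{package.name}.{eclass.name}.{feature_name}"


pkg = EPackageSketch("library")
pkg.add(EClassSketch("Book", {"name": "attribute", "author": "reference"}))
pkg.add(EClassSketch("Author", {"name": "attribute"}))

book_class = resolve_class(pkg, "Book")
author_kind = resolve_feature(book_class, "author")
full_name = qualified_name(pkg, book_class, "author")
```

In a serialised stream this means that the name of a feature only needs to be looked up in the class of the object currently being read, which keeps the format compact.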
Resources, resource factories and resource sets
XML-based formats
EMF has support for XML-based