Formal description of an extended plain-text property list format
=================================================================
Apple's [property list][plist] format is a data serialization format
used extensively by Mac OS X and OS X applications. Its genesis as the
NeXTStep [ASCII plist format][ascii] was extremely user- and
developer-friendly, being both human- and computer-readable, easy to
edit by hand, and easy to use in code. Unfortunately, with OS X, Apple
has migrated to a new [XML-based format][xml], which, though it is
still extremely easy to use in code, is only barely human-readable.
A new format is therefore needed: one which supports all of the
recently added features of the XML format, such as Unicode and
first-class numeric and date formats.

[plist]: http://en.wikipedia.org/wiki/Property_list
[ascii]: http://developer.apple.com/documentation/Cocoa/Conceptual/PropertyLists/Articles/OldStylePListsConcept.html
[xml]: http://developer.apple.com/documentation/Cocoa/Conceptual/PropertyLists/Articles/XMLPListsConcept.html

Our new format is based on the NeXT ASCII plist format, but modified to
include all of the object types allowed by Apple's XML property lists,
and to allow Unicode, while remaining as close as possible to a strict
superset of the NeXT format. A few [backward-incompatible][b-comp]
changes have been made, such as the interpretation of single-quoted
strings as 'raw' strings, in which backslash escapes are not
interpreted.

[b-comp]: http://en.wikipedia.org/wiki/Backward_compatibility

As much as possible, we should be able to convert between this format
and the XML format with no loss of information. This cannot be
completely achieved: spacing and comment formatting will have to
change during such a round trip. But lossless conversion is
nonetheless a goal of the format.

One major backward-compatible change is that whitespace alone may be
used as an array or dictionary separator, removing altogether the need
for commas or semicolons. Either a comma or a semicolon can still
optionally be used as a separator in arrays or dictionaries, and the
canonical conversion from XML to the plain-text format should use
semicolons as dictionary separators and commas as array separators.

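
As a rough illustration of the canonical separators (not part of the specification), a converter could emit them along the following lines. The names `emit` and `emit_string` are made up for this sketch, only dictionaries, arrays, strings, and integers are handled, and keys are conservatively quoted using the unquoted-string rule from the grammar.

```python
import re

# Unquoted strings per the grammar: a letter or underscore followed by
# one or more letters, digits, underscores, or hyphens.
UNQUOTED = re.compile(r"[a-zA-Z_][a-zA-Z0-9_-]+\Z")


def emit_string(s):
    """Emit s unquoted when the grammar allows it, else double-quoted."""
    if UNQUOTED.match(s):
        return s
    escaped = s.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def emit(obj):
    """Serialize a small, hypothetical subset of the object types."""
    if isinstance(obj, dict):
        # Canonical form: semicolons separate dictionary elements.
        items = [emit_string(k) + " = " + emit(v) for k, v in obj.items()]
        return "{" + "; ".join(items) + "}"
    if isinstance(obj, list):
        # Canonical form: commas separate array elements.
        return "(" + ", ".join(emit(v) for v in obj) + ")"
    if isinstance(obj, bool):
        raise TypeError("booleans are out of scope for this sketch")
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, str):
        return emit_string(obj)
    raise TypeError("unsupported type: " + type(obj).__name__)
```

For example, `emit({"name": "plist", "tags": ["a b", "xy", 3]})` produces `{name = plist; tags = ("a b", xy, 3)}`.
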
Grammar
-------
This specification is mostly BNF-like, but also borrows some
regular-expression syntax for some elements, much like the 'EBNF'
(different from standard EBNF) used by the W3C for the XML spec:

* `foo?` means `foo` is optional
* `foo*` means 0 or more occurrences of `foo`
* `foo+` means 1 or more occurrences of `foo`
* `[<characters>]` is to be interpreted like character ranges in
  regular expressions
* `'<string>'` or `"<string>"` is a literal string

Otherwise, the syntax that follows is plain BNF.

### Plain-text property list format extreme

    block-comment-content ::= [^*] | "*" [^/]
    block-comment         ::= "/*" block-comment-content* "*/"
    line-comment-content  ::= [^\n]
    line-comment          ::= "//" line-comment-content*
    comment               ::= block-comment | line-comment
    whitespace            ::= [ \t\n]
    space                 ::= whitespace | comment

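
To make the `space` rule concrete, here is a minimal Python sketch of a helper that skips whitespace and comments. The name `skip_space` is hypothetical, and the block-comment pattern uses a non-greedy match up to the first `*/`, which is my reading of the intent rather than a literal transcription of `block-comment-content`.

```python
import re

# Zero or more of: a whitespace character, a /* block */ comment,
# or a // line comment running to the end of the line.
SPACE = re.compile(r"(?:[ \t\n]|/\*.*?\*/|//[^\n]*)*", re.DOTALL)


def skip_space(text, pos=0):
    """Return the index of the first character after any leading space."""
    return SPACE.match(text, pos).end()
```

A parser would call `skip_space` before and after each token; for instance, `skip_space("  /* hi */ // x\n42")` returns the index of the `4`.
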
* * *

    integer ::= "-"? ("0" | [1-9] [0-9]*)
    float   ::= "-"? ([0-9]+ "." [0-9]* | "." [0-9]+)
                ([eE] [+-]? [0-9]+)?
    true    ::= ".t" | ".true"
    false   ::= ".f" | ".false"
    boolean ::= true | false
    number  ::= boolean | integer | float

I still need to figure out what a 'boolean' should look like. I'm
leaning towards something like `.t` and `.f` for true and false,
respectively, but some other format might be as good or better. I'm
open to suggestions.

It may also be useful to allow octal or hexadecimal numbers. I'll
leave that to be decided in the future.
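
The number rules translate fairly directly into regular expressions. The sketch below is one possible reading, assuming that `0` and exponents without a sign are meant to be valid, and using the tentative `.t`/`.f` booleans; `parse_number` is a made-up name.

```python
import re

INTEGER = re.compile(r"-?(?:0|[1-9][0-9]*)\Z")
FLOAT = re.compile(r"-?(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?\Z")
# Tentative boolean spellings; the text above leaves these undecided.
BOOLEANS = {".t": True, ".true": True, ".f": False, ".false": False}


def parse_number(token):
    """Convert one number token to a bool, int, or float."""
    if token in BOOLEANS:
        return BOOLEANS[token]
    if INTEGER.match(token):
        return int(token)
    if FLOAT.match(token):
        return float(token)
    raise ValueError("not a number: " + repr(token))
```

Note that a token such as `007` is rejected: it is neither an integer (leading zeros are not allowed) nor a float (there is no decimal point).
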
* * *

    unquoted-string  ::= [a-zA-Z_] [a-zA-Z0-9_-]+
    backslash-escape ::= "\" [\"bnrt]
    octal-escape     ::= "\" [0-7] [0-7] [0-7]
    unicode-escape   ::= "\U" [0-9a-fA-F] [0-9a-fA-F]
                         [0-9a-fA-F] [0-9a-fA-F]
    string-escape    ::= unicode-escape | backslash-escape |
                         octal-escape
    string-character ::= [^"\] | string-escape
    quoted-string    ::= '"' string-character* '"'

Notice that unquoted strings cannot contain arbitrary Unicode.
Allowing unquoted strings for very simple ASCII text without spaces is
useful because it makes simple plists easier to write and read, but
any string with unusual characters (periods, spaces, or Unicode), or
one beginning with a digit, must be quoted.
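
Decoding a quoted string could look roughly like the following. This is a sketch under two assumptions of mine: octal escapes are treated as Unicode code points (the grammar does not say which encoding they refer to), and `decode_quoted` is a hypothetical helper name.

```python
import re

# One whole quoted string, following the quoted-string rule.
QUOTED = re.compile(
    r'"(?:[^"\\]|\\U[0-9a-fA-F]{4}|\\[0-7]{3}|\\[\\"bnrt])*"\Z'
)
# One escape sequence inside the string body.
ESCAPE = re.compile(r'\\(?:U([0-9a-fA-F]{4})|([0-7]{3})|([\\"bnrt]))')
SIMPLE = {"\\": "\\", '"': '"', "b": "\b", "n": "\n", "r": "\r", "t": "\t"}


def decode_quoted(token):
    """Return the text of a double-quoted string token."""
    if not QUOTED.match(token):
        raise ValueError("not a valid quoted string: " + repr(token))

    def replace(match):
        if match.group(1) is not None:      # \Uxxxx
            return chr(int(match.group(1), 16))
        if match.group(2) is not None:      # \ooo (assumed code point)
            return chr(int(match.group(2), 8))
        return SIMPLE[match.group(3)]       # \\ \" \b \n \r \t

    return ESCAPE.sub(replace, token[1:-1])
```

Because the token is validated first, every backslash in the body is known to start a well-formed escape, so a single left-to-right substitution is enough.
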

    quote-escape       ::= "''"
    raw-string-content ::= [^'] | quote-escape
    raw-string         ::= "'" raw-string-content* "'"
    string             ::= unquoted-string | quoted-string |
                           raw-string

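
Raw strings are simpler, since the only special sequence is a doubled single quote. A minimal sketch, with the hypothetical names `decode_raw` and `encode_raw`:

```python
import re

# A raw string: single quotes around any characters, where a literal
# single quote is written as two single quotes.
RAW = re.compile(r"'(?:[^']|'')*'\Z")


def decode_raw(token):
    """Return the text of a raw-string token; backslashes stay as-is."""
    if not RAW.match(token):
        raise ValueError("not a valid raw string: " + repr(token))
    return token[1:-1].replace("''", "'")


def encode_raw(value):
    """Write value as a raw-string token."""
    return "'" + value.replace("'", "''") + "'"
```

Here the token `'a\nb ''q'''` decodes to the text `a\nb 'q'`, with the backslash and `n` kept as two literal characters.
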
* * *

    data-content ::= space* [0-9a-fA-F]
    data         ::= "<" data-content* space* ">"
    date         ::=

The date format is based on ISO 8601. A date looks like `@` followed
by most valid ISO 8601 dates, and is assumed (as with any ISO 8601
date) to be in the local time zone unless one is specified.
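
Since the `date` rule is not yet written out, the following is only a guess at how data and date tokens might be read. It assumes that data contains an even number of hex digits, ignores comments inside `<...>` for brevity, and leans on Python's `datetime.fromisoformat`, which accepts only a subset of ISO 8601 on older Python versions. The names `parse_data` and `parse_date` are hypothetical.

```python
import re
from datetime import datetime


def parse_data(token):
    """Turn a token such as <0fbd 7770> into bytes."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("data must be wrapped in < and >")
    digits = re.sub(r"[ \t\n]", "", token[1:-1])
    # bytes.fromhex raises ValueError for odd lengths or non-hex digits.
    return bytes.fromhex(digits)


def parse_date(token):
    """Turn a token such as @2008-03-15T10:30:00+02:00 into a datetime."""
    if not token.startswith("@"):
        raise ValueError("dates start with '@'")
    # A result without tzinfo stands for local time, as described above.
    return datetime.fromisoformat(token[1:])
```
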
* * *

    separator           ::= "," | ";" | whitespace
    object              ::= number | string | data | date |
                            array | dictionary
    structure-space     ::= space | separator
    array-first-element ::= structure-space* object
    array-other-element ::= structure-space* separator
                            structure-space* object
    array-content       ::= array-first-element
                            array-other-element*
    array               ::= "(" array-content? structure-space* ")"
    unquoted-dict-key   ::= [a-zA-Z0-9_] [a-zA-Z0-9_-]+
    dict-key            ::= unquoted-dict-key | quoted-string |
                            raw-string
    dict-element        ::= space* dict-key space* "=" space*
                            object space*
    dict-first-element  ::= structure-space* dict-element
    dict-other-element  ::= structure-space* separator
                            structure-space* dict-element
    dict-content        ::= dict-first-element dict-other-element*
    dictionary          ::= "{" dict-content? structure-space* "}"

This could definitely be expressed more cleanly. What I'm getting at
is that an array starts with a `(`, then includes arbitrarily many
elements, separated by at least some whitespace or a separator, and
ends with a `)`. Likewise for dictionaries, which use `{` and `}`.
Separators don't necessarily need to have content between them, but a
comma or semicolon can't come between a key and its object in a
dictionary. A dictionary element must have a key, a `=`, and an
object in order to be valid.
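
The structure described above can be sketched as a small recursive-descent parser. To keep it short, this version simplifies atoms to runs of `[a-zA-Z0-9_-]` (read as an integer when they look like one, otherwise as a string) and leaves out comments, quoted strings, data, and dates. All function names are made up for illustration.

```python
import re

STRUCTURE_SPACE = re.compile(r"[ \t\n,;]*")   # whitespace, commas, semicolons
SPACE = re.compile(r"[ \t\n]*")               # whitespace only
ATOM = re.compile(r"[a-zA-Z0-9_-]+")          # simplified string or integer
INTEGER = re.compile(r"-?(?:0|[1-9][0-9]*)\Z")


def skip(pattern, text, pos):
    return pattern.match(text, pos).end()


def parse_object(text, pos):
    """Parse one object at pos; return (value, new_pos)."""
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    if text[pos] == "(":
        return parse_elements(text, pos + 1, ")", parse_object, list)
    if text[pos] == "{":
        return parse_elements(text, pos + 1, "}", parse_pair, dict)
    m = ATOM.match(text, pos)
    if not m:
        raise ValueError("unexpected character at %d" % pos)
    token = m.group()
    return (int(token) if INTEGER.match(token) else token), m.end()


def parse_pair(text, pos):
    """Parse `key = object`; only whitespace may surround the `=`."""
    m = ATOM.match(text, pos)
    if not m:
        raise ValueError("expected a key at %d" % pos)
    pos = skip(SPACE, text, m.end())
    if text[pos:pos + 1] != "=":
        raise ValueError("expected '=' at %d" % pos)
    value, pos = parse_object(text, skip(SPACE, text, pos + 1))
    return (m.group(), value), pos


def parse_elements(text, pos, close, parse_one, build):
    """Parse separated elements up to the closing delimiter."""
    items = []
    pos = skip(STRUCTURE_SPACE, text, pos)
    while text[pos:pos + 1] != close:
        item, end = parse_one(text, pos)
        items.append(item)
        pos = skip(STRUCTURE_SPACE, text, end)
        # Elements must be separated by at least one separator character.
        if pos == end and text[pos:pos + 1] != close:
            raise ValueError("expected a separator at %d" % pos)
    return build(items), pos + 1


def parse_plist(text):
    value, pos = parse_object(text, skip(SPACE, text, 0))
    if skip(SPACE, text, pos) != len(text):
        raise ValueError("trailing data at %d" % pos)
    return value
```

With this sketch, `parse_plist("{ name = demo; sizes = (1 2, 3;;) }")` gives `{"name": "demo", "sizes": [1, 2, 3]}`, while `(a(b))` is rejected because nothing separates the two elements.
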
* * *

    plist ::= space* object space*

And finally, at long last, the top-level plist object, which could be
of any type, but is usually a dictionary.

* * *
Conversion <-> XML
------------------
All of these types can be converted to and from XML with no loss in
the actual data, though whitespace formatting will be lost upon
conversion. Additionally:

* In the case of comments, there's no way to ensure that aligned
  comments will still be correctly aligned when round-tripped
  through the XML format. XML comments are also allowed within
  strings themselves, and maybe between the digits of a number, or
  inside a date, etc. These are probably impossible to duplicate in
  this format, so such XML comments will likely need to be moved to
  the beginning or end of the relevant structure when converting
  from XML to this new format. Finally, XML has no 'line comments',
  so line comments in this new format may switch to block comments
  when a document is round-tripped through XML, and vice versa.
* Dates will probably change form somewhat when round-tripped
  through XML, since this new format will hopefully be somewhat
  forgiving about date formats.
* Raw strings should be converted using XML's CDATA sections if
  possible, to maintain readability in the resulting XML. Strings
  surrounded by CDATA sections could then perhaps be converted back
  as raw strings. Otherwise, raw strings and quoted strings might
  switch from one to the other when converting to XML and back, and
  either one might be converted to an unquoted string if it
  contains only alphanumeric characters. It is impossible to tell
  these types of strings apart completely after a document has been
  converted to XML.
* Arrays and dictionaries will have any separators that were
originally missing added to them when a document is converted to
this format from XML. Any consecutive separators will be limited
to only those necessary to separate actual objects in the
array/dictionary.
* If a document in this format is converted to XML, serialized and
  then deserialized, and then converted back to this format, no
  guarantees can be made that the result will look like the original
  document. Any comments will be lost, the types of (particularly
  numeric) objects could change, etc. This probably goes without
  saying, but mentioning it doesn't hurt anything.

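
As an illustration of the CDATA suggestion above, a raw string's text could be written into an XML plist `<string>` element like this. It is only a sketch: `raw_string_to_xml` and `xml_to_text` are hypothetical names, and characters that are not legal in XML at all (most control characters) are not handled.

```python
import xml.etree.ElementTree as ET


def raw_string_to_xml(value):
    """Wrap the text of a raw string in a CDATA section."""
    # A CDATA section cannot contain "]]>", so split any occurrence
    # across two adjacent sections.
    safe = value.replace("]]>", "]]]]><![CDATA[>")
    return "<string><![CDATA[" + safe + "]]></string>"


def xml_to_text(xml):
    """Read the text back; the XML parser merges the CDATA sections."""
    return ET.fromstring(xml).text or ""
```

Note that a standard XML parser does not report whether text came from a CDATA section, so converting such strings back as raw strings would need a lower-level parser.
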
Other commentary
----------------
* It may be nice to begin our new-style property lists with an
(optional) unique string, so that they may be easily identified.
* Almost all old-style ASCII plists should also be valid documents
  in this new format, with the notable exception that old-style
  single-quoted strings and raw strings have different syntax and
  semantics, so some care must be exercised with ASCII plists
  containing single-quoted strings.