BSON is a binary format in which zero or more ordered key/value pairs are stored as a single entity. We call this entity a document.
The following grammar specifies version 1.1 of the BSON standard. We’ve written the grammar using a pseudo-BNF syntax. Valid BSON data is represented by the document non-terminal.
The following basic types are used as terminals in the rest of the grammar. Each type must be serialized in little-endian format.
byte 1 byte (8-bits)
int32 4 bytes (32-bit signed integer, two’s complement)
int64 8 bytes (64-bit signed integer, two’s complement)
uint64 8 bytes (64-bit unsigned integer)
double 8 bytes (64-bit IEEE 754-2008 binary floating point)
decimal128 16 bytes (128-bit IEEE 754-2008 decimal floating point)
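As a rough illustration of the little-endian requirement, the sketch below packs most of the basic types with Python's standard struct module. The variable names are ours and purely illustrative; decimal128 is left out because struct has no format code for it.

```python
import struct

# "<" selects little-endian byte order with standard sizes and no padding.
byte_value = struct.pack("<B", 0x01)          # byte: 1 byte
int32_value = struct.pack("<i", 1)            # int32: 4 bytes, signed
int64_value = struct.pack("<q", -2)           # int64: 8 bytes, signed
uint64_value = struct.pack("<Q", 2**64 - 1)   # uint64: 8 bytes, unsigned
double_value = struct.pack("<d", 1.0)         # double: 8 bytes, IEEE 754 binary64

print(int32_value.hex())   # 01000000
print(int64_value.hex())   # feffffffffffffff
print(double_value.hex())  # 000000000000f03f
```

The least significant byte comes first in every case, which is why the int32 value 1 starts with 01 rather than ending with it.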
The following specifies the rest of the BSON grammar. Note that quoted strings represent terminals, and should be interpreted with C semantics (e.g. “0x01” represents the byte 0000 0001). Also note that we use the * operator as shorthand for repetition (e.g. (“0x01”*2) is “0x010x01”). When used as a unary operator, * means that the repetition can occur 0 or more times.
document ::=
int32 e_list "0x00" ①
e_list ::=
element e_list
| ""
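The document production can be sketched as a small framing helper. This assumes, per note ①, that the int32 counts every byte of the document, including the int32 itself and the trailing "0x00". The function name encode_document is hypothetical, and the element list is taken as already-encoded bytes.

```python
import struct

def encode_document(e_list: bytes) -> bytes:
    """Frame an already-encoded element list as: int32 e_list "0x00"."""
    # 4 bytes for the int32 itself + the elements + 1 trailing 0x00 byte.
    total = 4 + len(e_list) + 1
    return struct.pack("<i", total) + e_list + b"\x00"

# The element list may be empty, so the smallest document is 5 bytes long.
empty = encode_document(b"")
print(empty.hex())  # 0500000000
```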
element ::=
"0x01" e_name double 64-bit binary floating point
| "0x02" e_name string UTF-8 string
| "0x03" e_name document Embedded document
| "0x04" e_name document Array
| "0x05" e_name binary Binary data
| "0x06" e_name Undefined (value) — Deprecated
| "0x07" e_name (byte*12) ObjectId
| "0x08" e_name "0x00" Boolean "false"
| "0x08" e_name "0x01" Boolean "true"
| "0x09" e_name int64 UTC datetime
| "0x0A" e_name Null value
| "0x0B" e_name cstring cstring Regular expression
| "0x0C" e_name string (byte*12) DBPointer — Deprecated
| "0x0D" e_name string JavaScript code
| "0x0E" e_name string Symbol. — Deprecated
| "0x0F" e_name code_w_s JavaScript code w/ scope — Deprecated
| "0x10" e_name int32 32-bit integer
| "0x11" e_name uint64 Timestamp
| "0x12" e_name int64 64-bit integer
| "0x13" e_name decimal128 128-bit decimal floating point
| "0xFF" e_name Min key
| "0x7F" e_name Max key
e_name ::=
cstring Key name
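To show how the element productions are read in practice, here is a minimal decoder sketch that handles only a handful of element types (double, string, embedded document, boolean, null, int32, int64). The function names are hypothetical, and error handling is deliberately thin; it is not meant as a complete or validating parser.

```python
import struct

def read_cstring(data: bytes, pos: int) -> tuple:
    """Read bytes up to the next 0x00; return (text, position after the 0x00)."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("utf-8"), end + 1

def decode_document(data: bytes, pos: int = 0) -> tuple:
    """Decode one document starting at pos; return (dict, position after it)."""
    (size,) = struct.unpack_from("<i", data, pos)
    end = pos + size - 1            # index of the document's trailing 0x00
    pos += 4
    result = {}
    while pos < end:
        tag = data[pos]             # the one-byte element type
        name, pos = read_cstring(data, pos + 1)   # e_name
        if tag == 0x01:             # double
            (value,) = struct.unpack_from("<d", data, pos)
            pos += 8
        elif tag == 0x02:           # string: the int32 includes the trailing 0x00
            (length,) = struct.unpack_from("<i", data, pos)
            value = data[pos + 4 : pos + 4 + length - 1].decode("utf-8")
            pos += 4 + length
        elif tag == 0x03:           # embedded document
            value, pos = decode_document(data, pos)
        elif tag == 0x08:           # boolean: 0x00 is false, 0x01 is true
            value = data[pos] == 0x01
            pos += 1
        elif tag == 0x0A:           # null carries no payload
            value = None
        elif tag == 0x10:           # int32
            (value,) = struct.unpack_from("<i", data, pos)
            pos += 4
        elif tag == 0x12:           # int64
            (value,) = struct.unpack_from("<q", data, pos)
            pos += 8
        else:
            raise ValueError(f"element type 0x{tag:02X} not handled in this sketch")
        result[name] = value
    if data[end] != 0x00:
        raise ValueError("document is missing its trailing 0x00")
    return result, end + 1

# A 22-byte document holding one string element: key "hello", value "world".
raw = bytes.fromhex("16000000" "0268656c6c6f00" "06000000" "776f726c6400" "00")
doc, _ = decode_document(raw)
print(doc)  # {'hello': 'world'}
```

Each branch consumes exactly the bytes named on the right-hand side of the matching production, which is what lets the loop find the next element without any separator.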
string ::=
int32 (byte*) "0x00" String - The int32 is the number of bytes in the (byte*) + 1 (for the trailing '0x00'). The (byte*) is zero or more UTF-8 encoded characters.
cstring ::= (byte*) "0x00" Zero or more modified UTF-8 encoded characters followed by '0x00'. The (byte*) MUST NOT contain '0x00', hence it is not full UTF-8.
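The difference between the two string productions can be sketched as follows. The helper names encode_string and encode_cstring are hypothetical; the point is only that string carries a length prefix (and may therefore contain 0x00 bytes), while cstring relies on its terminator alone.

```python
import struct

def encode_cstring(text: str) -> bytes:
    """Encode as (byte*) "0x00"; the body must not contain a 0x00 byte."""
    body = text.encode("utf-8")
    if b"\x00" in body:
        raise ValueError("a cstring must not contain 0x00")
    return body + b"\x00"

def encode_string(text: str) -> bytes:
    """Encode as int32 (byte*) "0x00"."""
    body = text.encode("utf-8")
    # The length prefix counts the UTF-8 bytes plus the trailing 0x00.
    return struct.pack("<i", len(body) + 1) + body + b"\x00"

print(encode_cstring("hello").hex())  # 68656c6c6f00
print(encode_string("world").hex())   # 06000000776f726c6400
print(encode_string("a\x00b").hex())  # 0400000061006200
```

Because a reader knows the length of a string up front, an embedded 0x00 is harmless there, whereas in a cstring it would end the value early.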
binary ::= int32 subtype (byte*) Binary - The int32 is the number of bytes in the (byte*).
subtype ::= "0x00" Generic binary subtype
| "0x01" Function
| "0x02" Binary (Old)
| "0x03" UUID (Old)
| "0x04" UUID
| "0x05" MD5
| "0x06" Encrypted BSON value
| "0x80" User defined
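A binary value might be serialized along these lines. Note that the int32 covers only the payload bytes and does not count the subtype byte. The function and constant names are hypothetical, and the sketch applies no subtype-specific rules.

```python
import struct
import uuid

SUBTYPE_GENERIC = 0x00   # "0x00" Generic binary subtype
SUBTYPE_UUID = 0x04      # "0x04" UUID

def encode_binary(payload: bytes, subtype: int = SUBTYPE_GENERIC) -> bytes:
    """Encode as int32 subtype (byte*)."""
    if not 0 <= subtype <= 0xFF:
        raise ValueError("subtype must fit in one byte")
    # The length prefix is the payload size only; the subtype byte is extra.
    return struct.pack("<i", len(payload)) + bytes([subtype]) + payload

print(encode_binary(b"\x01\x02\x03").hex())  # 0300000000010203

# A UUID payload is 16 bytes, so the encoded value is 4 + 1 + 16 = 21 bytes.
encoded_uuid = encode_binary(uuid.UUID(int=0).bytes, SUBTYPE_UUID)
print(len(encoded_uuid))  # 21
```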
code_w_s ::= int32 string document Code w/ scope — Deprecated
① BSON Document. int32 is the total number of bytes comprising the document.
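When reading a document, the length prefix gives a cheap sanity check before any elements are parsed. The sketch below assumes the input buffer holds exactly one document; check_document_frame is a hypothetical name.

```python
import struct

def check_document_frame(data: bytes) -> int:
    """Verify the length prefix and terminator; return the declared size."""
    # int32 (4 bytes) + empty element list + 0x00 gives a 5-byte minimum.
    if len(data) < 5:
        raise ValueError("a document is at least 5 bytes long")
    (size,) = struct.unpack_from("<i", data, 0)
    if size != len(data):
        raise ValueError(f"length prefix says {size}, buffer has {len(data)} bytes")
    if data[-1] != 0x00:
        raise ValueError("document must end with 0x00")
    return size

print(check_document_frame(b"\x05\x00\x00\x00\x00"))  # 5
```

A mismatch between the prefix and the buffer size usually points to truncated or concatenated input, so it is worth rejecting before walking the elements.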