SWIM Supporting Material
XSD to JSON Schema Translation guidelines
This page contains supporting material. The content is mature but is subject to review, comment and change by the SWIM Community of Interest.
This page contains guidelines that may assist in translating XML Schema Definitions (XSD) into JSON Schemas that are, as far as possible, equivalent.
Since the main Exchange Models in the ATM context are XML based, these guidelines might be useful, for example, whenever the Community of Interest managing one of these models decides to explore the creation of a JSON representation of its model. Similarly, they may be useful whenever an organization is interested in creating (e.g., for internal purposes) a JSON representation of these exchange models (or of a subset of them) starting from their current XSDs.
The following principles and considerations were taken into account in defining these guidelines.
General Principles and Considerations
Data-centric vs. Document-centric: XSD often defines document structures (elements, attributes, mixed content), while JSON is inherently data-centric (key-value pairs, arrays). This fundamental difference requires careful consideration. The translation should prioritize representing the data rather than replicating the XML document's hierarchical structure literally if it leads to unnatural JSON.
Lossless Conversion (where possible): The ideal scenario is a round-trip conversion (XSD to JSON Schema and back) without losing critical information. While not always fully achievable due to the differences in expressiveness, it is a guiding principle.
Readability and Usability: The resulting JSON Schema should be as clear and intuitive as possible for JSON developers. Avoiding "XML-isms" in the JSON Schema is important for adoption and understanding.
Validation Equivalence: One of the main purposes of both schema languages is validation. The translated JSON Schema should enforce the same constraints on JSON instances as the XSD would on XML instances.
Handling Namespaces: XML namespaces are a significant challenge as JSON has no native concept of them.
Best Practice: Avoid embedding namespace prefixes directly into JSON property names unless this is absolutely necessary for disambiguation and the consuming applications are explicitly designed to handle them. Embedded prefixes often lead to less readable and "un-JSON-like" structures.
Preferred Approach:
If the namespace simply indicates a logical grouping and doesn't introduce naming conflicts for simple types, it can often be omitted.
If namespaces are crucial for identifying different types or preventing name collisions, consider using nested JSON objects to represent different namespaces. For example, {"ns1": {"elementA": ...}, "ns2": {"elementA": ...}}. This maintains clarity but can increase nesting.
For highly complex, multi-namespace XSDs, consider a "flattening" approach with well-defined naming conventions (e.g., namespace_elementName).
Criteria: Readability, avoidance of name collisions, and the consumer's ability to interpret the namespace information.
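The nesting and flattening options described above can be sketched as follows. This is a minimal Python illustration, not a prescribed implementation: the prefixes ns1 and ns2, the sample values, and the helper names nest_by_namespace and flatten_with_prefix are all hypothetical.

```python
# Hypothetical input: (namespace prefix, local name, value) triples taken
# from an XML instance. Prefixes and values are illustrative only.
elements = [
    ("ns1", "elementA", "value from ns1"),
    ("ns2", "elementA", "value from ns2"),
]


def nest_by_namespace(items):
    """Group properties under one JSON object per namespace prefix."""
    result = {}
    for prefix, local_name, value in items:
        result.setdefault(prefix, {})[local_name] = value
    return result


def flatten_with_prefix(items, separator="_"):
    """Build flat property names of the form namespace_elementName."""
    return {
        f"{prefix}{separator}{local_name}": value
        for prefix, local_name, value in items
    }


nested = nest_by_namespace(elements)
flat = flatten_with_prefix(elements)
```

Both variants keep the two elementA properties apart; the nested form costs one extra level of nesting, the flat form a longer property name.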
Note:
The format keyword in JSON Schema is not always considered a strict assertion to be checked by all JSON validators. Therefore, for critical validation where precision is fundamental, it is often better to define an explicit pattern. This is due to the nature of format in JSON Schema:
Annotation, Not Strict Validation (by default): The JSON Schema specification (e.g., Draft 2020-12; earlier drafts likewise left format validation optional) defines format as an annotation keyword. This is a crucial distinction. The specification states:
"The value of this keyword is a string, which specifies a format that the string-valued instance is expected to have."
"Implementations MAY choose to ignore format for the purpose of validation. When they do, they SHOULD provide an option to enable format validation, but they are not required to do so." (Emphasis added)
This "MAY" clause means that while many validators do implement format validation for common formats (like date-time, email, uri), they are not mandated to do so for all formats or in all contexts. Some validators provide options to enable or disable it, while others simply ignore it.
Purpose of format:
Semantic Meaning: It conveys semantic intent (e.g., "this string is an email address"). This is important for human understanding and for tools (e.g., a form generator might automatically provide an email input field).
Tooling Hints: It guides UI frameworks, code generators, and other tools in how to handle or display the data.
Common Recognition: For well-known formats (like date-time, uri, email, ipv4, uuid), many validators do perform the validation.
By contrast, pattern in JSON Schema provides:
Strict Validation: The pattern keyword is a validation keyword. The specification explicitly mandates that validators must implement it:
"The value of this keyword MUST be a string. This string SHOULD be a valid regular expression, according to the ECMA-262 regular expression dialect."
"A string instance is valid against this keyword if the regular expression matches the instance successfully."
There is no "MAY" clause. If you define a pattern, a conforming validator must check it.
Precision and Control: Regular expressions provide a high degree of control and precision over the exact string format, leaving no ambiguity for the validator.
Therefore, in the context of XSD to JSON Schema translation, adding a pattern in addition to format can be considered a best practice when the original XSD's validation of a specific string format was mandatory and strictly enforced, and it is important to ensure that same level of strictness and consistency across all JSON Schema validators.
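As a rough illustration of combining the two keywords, the Python sketch below declares a property with both format and pattern and mimics only the pattern assertion. The schema, the UUID expression and the helper matches_pattern are assumptions made for this example; Python's re module is used as a stand-in for the ECMA-262 dialect, which behaves the same for this simple expression.

```python
import re

# Hypothetical property schema: "format" carries the semantic hint,
# "pattern" carries the assertion that conforming validators must check.
uuid_schema = {
    "type": "string",
    "format": "uuid",
    "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
               "[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$",
}


def matches_pattern(schema, value):
    """Mimic only the "pattern" assertion (an unanchored regex search)."""
    return re.search(schema["pattern"], value) is not None


ok = matches_pattern(uuid_schema, "123e4567-e89b-12d3-a456-426614174000")
bad = matches_pattern(uuid_schema, "not-a-uuid")
```

A validator that ignores format would still reject the second value because of the pattern.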
XSD Constructs to JSON Schema Mapping
This section provides a detailed mapping from XSD constructs and types to their JSON Schema counterparts. Since different approaches are possible, a preferred approach is indicated for each construct, together with its justification.
The choices outlined below are driven by a combination of factors:
Direct Semantic Equivalence: Where a JSON Schema keyword or construct directly mirrors an XSD construct's meaning and validation behavior (e.g., xs:enumeration to enum, facets like minLength to minLength), that's always the preferred translation. This ensures the highest fidelity.
Idiomatic JSON: JSON has its own conventions and patterns. Translations that result in "natural" JSON structures (e.g., objects for complex types, arrays for repeated elements) are favored over those that force XML's tree structure in a cumbersome way. This enhances usability and interoperability with JSON-native tools and developers.
Validation Completeness: The goal is to retain as many of the original validation rules as possible. If a direct mapping isn't available, finding the closest approximation or noting the limitation is important. For instance, while totalDigits has no direct JSON Schema equivalent, knowing this allows for a decision on whether to implement it at the application layer or via a regex pattern.
Minimizing Ambiguity: XSD offers flexibility (e.g., mixed content, attributes vs. elements) that JSON Schema doesn't. Best practices aim to resolve these ambiguities in a consistent and predictable manner.
Tooling Support: Practical considerations often influence choices. Mappings that are well-supported by existing JSON Schema validation libraries and generation tools are more viable.
Simple Types (xs:simpleType)
xs:string, xs:normalizedString, xs:token, xs:language, xs:Name, xs:NCName, xs:ID, xs:IDREF, xs:IDREFS, xs:ENTITY, xs:ENTITIES, xs:NMTOKEN, xs:NMTOKENS:
Preferred Approach: Map directly to "type": "string".
Criteria: Direct equivalence.
Note for IDREF/IDREFS: While the type is string, JSON Schema doesn't have a direct equivalent for ID/IDREF validation. This typically needs application-level validation or a custom keyword for full fidelity. For the list types IDREFS, ENTITIES and NMTOKENS, map to "type": "array", "items": {"type": "string"}.
xs:boolean:
Preferred Approach: Map to "type": "boolean".
Criteria: Direct equivalence.
xs:decimal, xs:float, xs:double:
Preferred Approach: Map to "type": "number".
Criteria: Direct equivalence for floating-point numbers.
xs:integer, xs:nonPositiveInteger, xs:negativeInteger, xs:long, xs:int, xs:short, xs:byte, xs:nonNegativeInteger, xs:unsignedLong, xs:unsignedInt, xs:unsignedShort, xs:unsignedByte, xs:positiveInteger:
Preferred Approach: Map to "type": "integer". The value range of the bounded types (e.g., xs:nonNegativeInteger, xs:byte) can be retained by adding minimum and maximum.
Criteria: Direct equivalence for whole numbers.
xs:date, xs:time, xs:dateTime:
Preferred Approach: Map to "type": "string" with the corresponding "format" keyword (e.g., "format": "date", "format": "time", "format": "date-time"). To ensure the same behavior across different validators, it is recommended to also add a pattern:
For "format": "date": "pattern": "^\\d{4}-\\d{2}-\\d{2}$"
For "format": "time": "pattern": "^(?:[01]\\d|2[0-3]):(?:[0-5]\\d):(?:[0-5]\\d)(?:\\.\\d+)?(?:Z|[+-](?:[01]\\d|2[0-3]):(?:[0-5]\\d))?$" - representing HH:MM:SS[.sss][Z|(+|-)HH:MM]
For "format": "date-time": "pattern": "^\\d{4}-\\d{2}-\\d{2}T(?:[01]\\d|2[0-3]):(?:[0-5]\\d):(?:[0-5]\\d)(?:\\.\\d+)?(?:Z|[+-](?:[01]\\d|2[0-3]):(?:[0-5]\\d))?$" - representing ISO 8601 combined date and time
Criteria: JSON Schema uses string representation for dates/times and relies on the format keyword for semantic validation.
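The behavior of the date and time patterns listed above can be explored with a short Python sketch. Python's re module is used here as an approximation of the ECMA-262 dialect (for example, \d in Python also accepts non-ASCII digits), and the constant names are hypothetical.

```python
import re

# The patterns listed above, written as Python raw strings
# (inside a JSON document every backslash is doubled).
DATE = r"^\d{4}-\d{2}-\d{2}$"
TIME = (r"^(?:[01]\d|2[0-3]):(?:[0-5]\d):(?:[0-5]\d)(?:\.\d+)?"
        r"(?:Z|[+-](?:[01]\d|2[0-3]):(?:[0-5]\d))?$")
# date-time = date without "$" + "T" + time without "^"
DATE_TIME = DATE[:-1] + "T" + TIME[1:]

date_schema = {"type": "string", "format": "date", "pattern": DATE}

samples = {
    "2023-01-01": bool(re.search(DATE, "2023-01-01")),
    "2023-1-1": bool(re.search(DATE, "2023-1-1")),
    # Shape only: the pattern does not check month or day ranges.
    "2023-13-45": bool(re.search(DATE, "2023-13-45")),
    "23:59:59Z": bool(re.search(TIME, "23:59:59Z")),
    "24:00:00": bool(re.search(TIME, "24:00:00")),
}
```

Note that the date pattern constrains the shape of the value only, so a value such as 2023-13-45 still matches; the calendar check remains with format validation or the application.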
xs:anyURI:
Preferred Approach: Map to "type": "string" with "format": "uri". Here too, to ensure the same behavior across different validators, a pattern can be added, e.g.: "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]$". However, full URI/URL validation with a regex can be very complex and difficult to make universally correct. For this reason, "format": "uri" is often used alone, relying on a robust validator's implementation.
Criteria: Direct semantic mapping.
xs:base64Binary, xs:hexBinary:
Preferred Approach: Map to "type": "string" with "contentEncoding": "base64" (or "base16" for xs:hexBinary) if specific binary handling is needed, or simply "type": "string" if the encoded string is sufficient. Some tools might represent these as an array of integers, but a string is more common for textual JSON.
Criteria: How the binary data is typically represented in JSON. Base64 encoded strings are standard.
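xs:duration:
Preferred Approach: Map to "type": "string" holding the ISO 8601 duration representation. JSON Schema Draft 2019-09 and later define "format": "duration" for this purpose; earlier drafts have no duration format. In either case, a pattern can be added where strict validation is required.
Criteria: A string is the most flexible representation, and the only option in drafts without a duration format.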
xs:QName:
Preferred Approach: Map to an object with properties like {"namespaceURI": "...", "localPart": "...", "prefix": "..."}.
Criteria: Preserving the components of a QName, as JSON doesn't have a native QName type. This is explicitly noted in some converter tools (e.g., Oxygen XML Editor).
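A minimal sketch of the QName mapping, in Python, is given below. The helper qname_to_object and the namespaces argument (a prefix-to-URI map built from the in-scope xmlns declarations) are hypothetical, as is the example.org namespace.

```python
def qname_to_object(qname, namespaces):
    """Split a lexical QName ("prefix:local") into its components.

    `namespaces` is a hypothetical prefix-to-URI map; the empty prefix
    stands for the default namespace.
    """
    prefix, _, local = qname.rpartition(":")
    return {
        "namespaceURI": namespaces.get(prefix, ""),
        "localPart": local,
        "prefix": prefix,
    }


in_scope = {"ex": "http://example.org/ns"}
qualified = qname_to_object("ex:flight", in_scope)
unqualified = qname_to_object("flight", in_scope)
```

An unprefixed name yields an empty prefix and, in this sketch, an empty namespaceURI when no default namespace is declared.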
xs:list
XSD Role: xs:list defines a simple type whose content is a space-separated sequence of values of an atomic base type (or a union type whose members are atomic types). For example, a list of integers "1 2 3" or a list of dates "2023-01-01 2023-01-02". Facets like length, minLength, maxLength, and enumeration can be applied to the list itself (referring to the number of items in the list), or to the itemType (referring to constraints on individual items).
JSON Schema Mapping: The most direct and idiomatic mapping for an xs:list is to a JSON Schema array.
Preferred Approach:
Map the overall xs:list type to "type": "array".
Map the itemType of the xs:list to the items keyword within the JSON Schema array definition. The items keyword's schema will define the type and constraints for each element in the array.
Map XSD facets applied to the xs:list itself (e.g., length, minLength, maxLength for the number of items) to the minItems and maxItems keywords in the JSON Schema array.
Map XSD facets applied to the itemType to the corresponding keywords within the items schema in JSON Schema.
Criteria:
Direct Semantic Equivalence: An XSD list fundamentally represents a collection of values of a specific type. JSON arrays are the direct equivalent for collections.
Idiomatic JSON Schema: Using type: "array" and items is the standard and most readable way to define lists/arrays in JSON Schema, well-supported by all tooling.
Validation Equivalence: minItems, maxItems, and the schema within items allow for precise replication of the validation rules imposed by XSD length, minLength, maxLength on the list, and by facets on the itemType.
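The xs:list mapping described above can be sketched in Python as follows. The function names and parameters are hypothetical; item_schema is assumed to be the already-translated schema of the itemType.

```python
def list_type_to_schema(item_schema, length=None, min_length=None,
                        max_length=None):
    """Translate an xs:list definition into a JSON Schema array.

    The keyword arguments are the facets applied to the list itself.
    """
    schema = {"type": "array", "items": item_schema}
    if length is not None:
        # An exact length becomes identical lower and upper bounds.
        min_length = max_length = length
    if min_length is not None:
        schema["minItems"] = min_length
    if max_length is not None:
        schema["maxItems"] = max_length
    return schema


def list_value_to_array(lexical_value, convert=str):
    """Split a whitespace-separated XML list value into array items."""
    return [convert(token) for token in lexical_value.split()]


int_list_schema = list_type_to_schema({"type": "integer"},
                                      min_length=1, max_length=3)
int_list_value = list_value_to_array("1 2 3", int)
```

The second helper shows the corresponding instance-level step: the XML value "1 2 3" becomes the JSON array [1, 2, 3].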
Facets (xs:minInclusive, xs:maxInclusive, xs:minLength, etc.)
xs:minInclusive, xs:maxInclusive:
Preferred Approach: Map to minimum and maximum keywords respectively.
Criteria: Direct equivalence.
xs:minExclusive, xs:maxExclusive:
Preferred Approach: Map to exclusiveMinimum and exclusiveMaximum keywords respectively.
Criteria: Direct equivalence.
xs:minLength, xs:maxLength:
Preferred Approach: Map to minLength and maxLength keywords respectively. When the facet applies to an xs:list, use minItems and maxItems instead.
Criteria: Direct equivalence for string lengths and, via minItems and maxItems, for array lengths.
xs:length:
Preferred Approach: Map to both minLength and maxLength with the same value.
Criteria: length implies an exact length, which is best represented by both minimum and maximum in JSON Schema.
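xs:pattern:
Preferred Approach: Map to the pattern keyword. XSD patterns must match the whole value, whereas JSON Schema patterns are not implicitly anchored, so the translated expression normally needs to be wrapped in ^ and $.
Criteria: Direct equivalence for regular expression validation, subject to the differences between the XSD and ECMA-262 regular expression dialects.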
xs:enumeration:
Preferred Approach: Map to the enum keyword.
Criteria: Direct equivalence.
xs:totalDigits, xs:fractionDigits:
Preferred Approach: These have no direct JSON Schema equivalent. For totalDigits, you might use pattern with a regex to enforce the digit count (which requires representing the number as a string, since pattern applies only to string instances), or potentially a custom keyword if strict numeric validation is needed. For fractionDigits, multipleOf can be used (e.g., multipleOf: 0.01 for two decimal places) even if it's not a perfect match.
Criteria: Lack of direct equivalence; finding the closest functional approximation. Often, these require application-level validation.
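The facet mappings in this section can be summarized in a small Python sketch. The table DIRECT and the function facets_to_keywords are hypothetical names; the sketch assumes string or numeric base types (not lists), a draft in which exclusiveMinimum and exclusiveMaximum take numeric values, and it wraps XSD patterns in ^(?:...)$ because JSON Schema patterns are not implicitly anchored.

```python
# Facets with a one-to-one JSON Schema keyword.
DIRECT = {
    "minInclusive": "minimum",
    "maxInclusive": "maximum",
    "minExclusive": "exclusiveMinimum",
    "maxExclusive": "exclusiveMaximum",
    "minLength": "minLength",
    "maxLength": "maxLength",
}


def facets_to_keywords(facets):
    """Map a dict of XSD facets to JSON Schema keywords.

    Returns (keywords, unmapped); facets without an equivalent, such as
    totalDigits, are reported rather than silently dropped.
    """
    keywords, unmapped = {}, []
    for name, value in facets.items():
        if name in DIRECT:
            keywords[DIRECT[name]] = value
        elif name == "length":
            keywords["minLength"] = keywords["maxLength"] = value
        elif name == "pattern":
            # Assumption: the XSD regex is also valid in ECMA-262.
            keywords["pattern"] = f"^(?:{value})$"
        elif name == "enumeration":
            keywords["enum"] = list(value)
        elif name == "fractionDigits":
            # Approximation only; subject to floating-point rounding.
            keywords["multipleOf"] = 1 / 10 ** value
        else:
            unmapped.append(name)
    return keywords, unmapped


amount_keywords, amount_unmapped = facets_to_keywords(
    {"minInclusive": 0, "maxExclusive": 100,
     "fractionDigits": 2, "totalDigits": 5}
)
```

Returning the unmapped facets separately makes it explicit which constraints still need application-level validation.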
Complex Types (xs:complexType)
Elements (xs:element) as Properties:
Preferred Approach: Elements within a complex type typically map to properties of a JSON object.
Criteria: The natural way to represent structured data in JSON.
Attributes (xs:attribute):
Preferred Approach: Map attributes to properties within the same JSON object as their parent element, or as a nested object if attribute names clash with child element names.
Criteria: JSON doesn't distinguish between attributes and elements. Treating them as properties is the most common approach. If an XML element has both content and attributes, the XML content often becomes a property (e.g., _value) and attributes become other properties.
Sequence (xs:sequence):
Preferred Approach: Map the elements of the sequence to properties of the JSON object. JSON arrays are ordered, but the order of object properties is not semantically significant, so the element order of the sequence is not enforced. The propertyNames keyword can define a schema for property names, but it doesn't enforce ordering of keys.
Criteria: JSON object properties are unordered. If order is truly critical, consider an array of objects where each object represents an item from the sequence.
Choice (xs:choice):
Preferred Approach: Map to the oneOf or anyOf keywords in JSON Schema. oneOf implies exactly one of the subschemas must be valid, while anyOf means one or more. oneOf is generally preferred for xs:choice.
Criteria: Directly captures the "choose one" semantic.
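One possible shape of such a oneOf translation is sketched below in Python. The element names email and phone, the schema and the helper are hypothetical; the helper evaluates only the oneOf/required part of the schema and is not a general validator.

```python
# Hypothetical xs:choice between <email> and <phone>, translated with oneOf.
contact_schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string"},
        "phone": {"type": "string"},
    },
    "oneOf": [
        {"required": ["email"]},
        {"required": ["phone"]},
    ],
}


def satisfies_one_of_required(schema, instance):
    """Check only the oneOf/required part: exactly one branch may match."""
    matches = [
        all(name in instance for name in branch["required"])
        for branch in schema["oneOf"]
    ]
    return matches.count(True) == 1


only_email = satisfies_one_of_required(contact_schema, {"email": "a@b.example"})
both = satisfies_one_of_required(
    contact_schema, {"email": "a@b.example", "phone": "+32 2 000 00 00"}
)
neither = satisfies_one_of_required(contact_schema, {})
```

With anyOf in place of oneOf, the instance carrying both properties would be accepted, which is why oneOf is the closer match for xs:choice.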
All (xs:all):
Preferred Approach: Map to an object whose properties correspond to the elements of the all group, listing in required those elements that are not declared with minOccurs="0". Alternatively, use allOf to combine schemas, where each schema defines one of the elements from the all group.
Criteria: xs:all means each element can appear at most once (zero or one time), in any order. JSON Schema object properties are inherently unordered.
Mixed Content (mixed="true"):
Preferred Approach: This is notoriously difficult. Often, the textual content is mapped to a special property (e.g., _text or _value), and child elements become other properties. If the mixed content is truly unstructured (like HTML snippets), it might be best represented as a single string.
Criteria: JSON doesn't naturally support mixed content. The approach depends on how the mixed content is consumed.
Occurrences (minOccurs, maxOccurs):
Preferred Approach:
minOccurs="0": The corresponding JSON Schema property is not listed in required.
minOccurs="1" (default): The corresponding JSON Schema property is listed in required.
maxOccurs="1" (default): The corresponding JSON Schema property is a single value (e.g., "type": "string").
maxOccurs="unbounded" or maxOccurs > 1: The corresponding JSON Schema property is an array ("type": "array") and minItems/maxItems can be used to reflect minOccurs/maxOccurs if they are greater than 1.
Criteria: Direct equivalence and standard JSON Schema array usage.
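The occurrence rules above can be expressed as a small Python helper. The function apply_occurrence and the property names are hypothetical. As a design choice of this sketch, minItems is also emitted when minOccurs is 1 on a repeatable element, so that an empty array does not satisfy a required property.

```python
def apply_occurrence(object_schema, name, item_schema,
                     min_occurs=1, max_occurs=1):
    """Add one element to an object schema according to its occurrences.

    `max_occurs` is an int or the string "unbounded".
    """
    if max_occurs == 1:
        prop = item_schema                      # single value
    else:
        prop = {"type": "array", "items": item_schema}
        if min_occurs >= 1:
            prop["minItems"] = min_occurs
        if max_occurs != "unbounded":
            prop["maxItems"] = max_occurs
    object_schema.setdefault("properties", {})[name] = prop
    if min_occurs >= 1:
        object_schema.setdefault("required", []).append(name)
    return object_schema


flight = {"type": "object"}
apply_occurrence(flight, "id", {"type": "string"})
apply_occurrence(flight, "note", {"type": "string"}, min_occurs=0)
apply_occurrence(flight, "leg", {"type": "object"},
                 min_occurs=2, max_occurs="unbounded")
apply_occurrence(flight, "tag", {"type": "string"},
                 min_occurs=0, max_occurs=5)
```

After these calls, id and leg are required, note is an optional single value, and tag is an optional array of at most five strings.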
Groups (xs:group):
Preferred Approach: Often, a named group can be translated into a reusable definition in JSON Schema's $defs (or definitions in older drafts) and referenced using $ref.
Criteria: Promotes reusability and modularity, mirroring XSD's group concept.
Type Extension (xs:extension) and Restriction (xs:restriction):
Preferred Approach:
Extension: Use allOf to combine the base type's schema with the extending type's additional properties.
Restriction: Apply the restricting facets directly to the corresponding JSON Schema type. If a type restricts a complex type, it might use allOf with an object schema that has more restrictive properties.
Criteria: allOf is the primary mechanism for inheritance/composition in JSON Schema. Facets are directly mappable.
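The use of $defs, $ref and allOf for reusable definitions and type extension can be illustrated with the following Python sketch. The type names BaseType, ExtendedType and ShortCode are hypothetical, and resolve_local_ref is a simplified helper that ignores JSON Pointer escaping.

```python
# Hypothetical base type, an extension adding one property, and a
# restricted simple type with facets applied directly.
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$defs": {
        "BaseType": {
            "type": "object",
            "properties": {"id": {"type": "string"}},
            "required": ["id"],
        },
        "ExtendedType": {
            "allOf": [
                {"$ref": "#/$defs/BaseType"},
                {
                    "type": "object",
                    "properties": {"remark": {"type": "string"}},
                },
            ]
        },
        "ShortCode": {"type": "string", "minLength": 2, "maxLength": 4},
    },
}


def resolve_local_ref(root, ref):
    """Follow a local "#/a/b" reference inside the same schema document."""
    node = root
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node


base = resolve_local_ref(schema, schema["$defs"]["ExtendedType"]["allOf"][0]["$ref"])
```

One point to keep in mind with this style: if the base schema sets additionalProperties to false, the properties added by the extension are rejected, so closed base types need extra care.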
SimpleContent
xs:simpleContent
XSD Role: Defines a complex type that does not contain child elements, but allows for character content (like a simple type) and can have attributes. It's primarily used when you want to extend or restrict a simple type to add attributes, or when an element just holds text but needs attributes.
Example: <price currency="USD">19.99</price> where 19.99 is the simple content and currency is an attribute.
JSON Schema Mapping: This translation needs particular care because of JSON's fundamental difference from XML: JSON doesn't inherently distinguish between "text content" and "attributes" on a single node. Everything is a key-value pair.
Preferred Approach:
Represent the simple content as a dedicated property within the JSON object. Common property names for this purpose are "_value", "$value", or "_text". The type of this property corresponds to the base simple type.
Map the attributes as regular properties within the same JSON object.
Criteria:
Fidelity and Semantic Preservation: This approach allows for the representation of both the simple value and its associated attributes in a structured JSON object.
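Applied to the price example above, the mapping could look like the following Python sketch. The helper simple_content_to_object and the choice of "_value" as the property name are assumptions; whether currency is listed in required would depend on the attribute's use declaration in the XSD.

```python
import xml.etree.ElementTree as ET


def simple_content_to_object(xml_text, value_key="_value", convert=str):
    """Turn an element with text and attributes into one JSON object."""
    element = ET.fromstring(xml_text)
    result = dict(element.attrib)               # attributes become properties
    result[value_key] = convert(element.text)   # text content goes under _value
    return result


price = simple_content_to_object(
    '<price currency="USD">19.99</price>', convert=float
)
# price == {"currency": "USD", "_value": 19.99}

# A matching schema, assuming the simple content derives from xs:decimal.
price_schema = {
    "type": "object",
    "properties": {
        "_value": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["_value"],
}
```

Converting the content to a float is a simplification; where decimal precision matters, keeping the value as a string with a pattern may be preferable.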