Validating machine-generated YAML and JSON using Kwalify in Ruby 1.9

Suraj N. Kurapati

  1. Problem
  2. Approach
  3. Solution

        Problem

        Kwalify is a Ruby library for parsing and validating YAML and JSON documents. However, it supports only a “mostly YAML 1.0” subset in which the bodies of mappings (the values associated with hash keys) must be indented beneath their keys; otherwise, the following error occurs:

        ERROR: file:line:1 [/] document end expected (maybe invalid tab char found).
        

        For example, this is Kwalify style:

        language:
          - name: Ruby
            type: dynamic
            license:
              - Ruby
              - GPL v2
        

        In contrast, this is YAML 1.0+ style:

        language:
        - name: Ruby
          type: dynamic
          license:
          - Ruby
          - GPL v2
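Both snippets describe the same data; only the indentation of the nested sequences differs. Here is a minimal sketch that checks this by loading each snippet with Ruby's standard YAML library (Psych, not Kwalify) and comparing the results. The constant names `KWALIFY_STYLE` and `YAML_STYLE` are made up for illustration; their contents restate the two examples above.

```ruby
require 'yaml'

# The Kwalify-style document: the sequence is indented beneath "language:".
KWALIFY_STYLE = [
  "language:",
  "  - name: Ruby",
  "    type: dynamic",
  "    license:",
  "      - Ruby",
  "      - GPL v2",
  ""
].join("\n")

# The YAML 1.0+ style document: the sequence starts in its key's column.
YAML_STYLE = [
  "language:",
  "- name: Ruby",
  "  type: dynamic",
  "  license:",
  "  - Ruby",
  "  - GPL v2",
  ""
].join("\n")

kwalify_data = YAML.load(KWALIFY_STYLE)
yaml_data    = YAML.load(YAML_STYLE)

# Psych accepts both layouts and builds identical Ruby objects.
p kwalify_data == yaml_data # prints true
```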
        

        This poses a problem when validating machine-generated YAML documents in Ruby 1.9 because its libyaml-based YAML library (known as “Psych”) only emits YAML 1.1 documents, which are not the “mostly YAML 1.0” kind that Kwalify expects. For example, the last line in the following code raises the aforementioned error:

        require 'yaml'     # provides Object#to_yaml
        require 'kwalify'
        
        schema = Kwalify::Yaml.load_file('some_complex_schema.yaml')
        validator = Kwalify::Validator.new(schema)
        parser = Kwalify::Yaml::Parser.new(validator)
        
        yaml = some_complex_object.to_yaml # machine-generate
        data = parser.parse(yaml)          # parse & validate <== FAIL
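To see which layout Psych produces, here is a sketch that dumps a small hash with `to_yaml`. The hash is a hypothetical stand-in for `some_complex_object` that mirrors the example above. Note how the `- name: Ruby` item starts in the same column as `language:`, which is the layout Kwalify rejects.

```ruby
require 'yaml'

# Hypothetical stand-in for some_complex_object.
data = {
  'language' => [
    { 'name' => 'Ruby', 'type' => 'dynamic', 'license' => ['Ruby', 'GPL v2'] }
  ]
}

yaml = data.to_yaml
puts yaml

# The sequence item is not indented beneath its "language:" key.
p yaml.lines.include?("- name: Ruby\n")
```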
        

        Approach

        I initially approached this problem by trying to make Ruby’s YAML emitter indent the bodies of mappings as the Kwalify style requires:

        require 'psych'
        yaml = Psych.dump(some_complex_object, indentation: 4).
               gsub(/^(\s*)-   (?=\S.*:)/, '\1  - ')
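The regular expression rewrites each `-   ` sequence indicator that begins a mapping (a line that contains a `key:`) into `  - `, which shifts the dash two columns to the right. The sketch below applies the same substitution to a hand-written sample of what `Psych.dump(..., indentation: 4)` is assumed to produce; the padding after `-` is inferred from the regex rather than copied from real output. It then checks that Psych still reads the same data afterwards.

```ruby
require 'yaml'

# Assumed shape of Psych.dump(obj, indentation: 4) output for a sequence of
# mappings: the "-" indicator is padded out to the 4-column indentation.
psych_output = [
  "---",
  "language:",
  "-   name: Ruby",
  "    type: dynamic",
  ""
].join("\n")

# Same substitution as above: "-   key: ..." becomes "  - key: ...".
kwalify_output = psych_output.gsub(/^(\s*)-   (?=\S.*:)/, '\1  - ')

puts kwalify_output
# Both strings still describe the same data.
p YAML.load(psych_output) == YAML.load(kwalify_output)
```

Because of the `(?=\S.*:)` lookahead, scalar items such as `-   GPL v2` contain no colon and are left untouched.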
        

        However, this did not address the incompatibility between Ruby’s YAML 1.1 emitter and Kwalify’s “mostly YAML 1.0” parser. For instance, Kwalify choked on the notation of multi-line strings as expressed in YAML 1.1.
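For reference, this sketch shows the notation Psych uses for a multi-line string: a literal block scalar introduced by `|`. The `description` key is hypothetical, and whether this is the exact construct that Kwalify rejected is an assumption.

```ruby
require 'yaml'

# Hypothetical document containing a multi-line string.
data = { 'description' => "first line\nsecond line" }

yaml = data.to_yaml
puts yaml

# Psych picks the literal block style ("|") for strings with embedded newlines.
p yaml.include?('description: |')
```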

        Defeated, I turned to the Kwalify user’s guide for a hint and found one:

        JSON can be considered as a subset of YAML. It means that YAML parser can parse JSON and Kwalify can validate JSON document.

        Since Kwalify supports parsing and validating JSON documents, we can work around the problem by machine-generating JSON documents instead of YAML ones and then feed them into Kwalify for parsing and validation:

        require 'json'
        yaml = some_complex_object.to_json
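The guide's claim is easy to check with Ruby's own YAML parser. In this sketch, which uses a hypothetical stand-in for `some_complex_object`, Psych (not Kwalify) loads the single-line output of `to_json` back into the original data. This suggests that the failure described next is specific to Kwalify's parser rather than to the JSON text itself.

```ruby
require 'json'
require 'yaml'

# Hypothetical stand-in for some_complex_object.
data = {
  'language' => [
    { 'name' => 'Ruby', 'type' => 'dynamic', 'license' => ['Ruby', 'GPL v2'] }
  ]
}

json = data.to_json
puts json # minified: everything on one line

# Psych parses the JSON text as YAML flow collections.
p YAML.load(json) == data
```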
        

        Unfortunately, Kwalify failed to parse the machine-generated JSON:

        Kwalify::SyntaxError: file: , line 1: mapping key is expected.
        

        How could this be? The Kwalify user’s guide even has examples of JSON parsing and validation. I promptly tried them out, via copy and paste, and they worked exactly as advertised. Strange.

        Upon further trials, I noticed the key difference between the examples shown in the Kwalify user’s guide and our machine-generated JSON: the guide’s examples were pretty-printed across several lines, whereas to_json emits minified JSON on a single line. Armed with this knowledge, and a helpful answer on StackOverflow, I finally arrived at a workable solution:

        require 'json'
        yaml = JSON.pretty_generate(some_complex_object)
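Here is a side-by-side sketch of the two generators, again using hypothetical stand-in data. `to_json` packs everything onto one line, while `JSON.pretty_generate` spreads the same data over several indented lines, like the examples in the Kwalify guide.

```ruby
require 'json'

# Hypothetical stand-in for some_complex_object.
data = { 'language' => [{ 'name' => 'Ruby', 'type' => 'dynamic' }] }

minified = data.to_json
pretty   = JSON.pretty_generate(data)

puts minified
puts pretty

# Same data, different layout: only the whitespace differs.
p JSON.parse(minified) == JSON.parse(pretty)
```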
        

        Solution

        Putting this all together, the final solution to the original problem is:

        require 'json'
        require 'kwalify'
        
        schema = Kwalify::Yaml.load_file('some_complex_schema.yaml')
        validator = Kwalify::Validator.new(schema)
        parser = Kwalify::Yaml::Parser.new(validator)
        
        yaml = JSON.pretty_generate(some_complex_object) # machine-generate
        data = parser.parse(yaml)                        # parse & validate
        

        Happy validating!