Lessons Learnt Writing PostgreSQL Extensions for JSON(b)


Gabriel Fürstenheim

  furstenheim

A bit about me

  • Mathematician
  • Fullstack backend @Geoblink
  • Main language is Javascript

JSON in Postgres

  • Two types JSON and JSONb
  • Easy to access information
  • Hard to manipulate
  • Easy to add functionality through C code

Selection of property

Input

                
                {"a": 1, "b": 2}
                

Output

                
                2
                

Solution

                    
                
                {"a": 1, "b": 2}->'b'

                

Property containment

Input

                
                {"a": 1, "b": 2}
                

Output

                
                t
                

Solution

                
                {"a": 1, "b": 2} ? 'b'

                

Sum of jsons

Input

                
                {"a": 1, "b": 2}
                {"a": 3, "b": 4}
                

Output

                
                    {"a": 4, "b": 6}
                

Solution

                
                SELECT jsonb_object_agg(key, val)
                FROM (
                    SELECT key, SUM(value::numeric) val
                    FROM test_table t, jsonb_each_text(info)
                    GROUP BY key
                 )s;   

                

Nested sum of jsons

Input

                
                {"a": {"c": 4}, "b": {"d": 4}}
                {"a": {"c": 3}, "b": {"d": 2}}
                

Output

                
                  {"a": {"c": 7}, "b": {"d": 6}}
                

Solution

????

C solution

                
                    jsonb_deep_add(
                        {"a": {"c": 4}, "b": {"d": 4}},
                        {"a": {"c": 3}, "b": {"d": 2}}
                    )
                

JSON Schema

                
                    {
                        "type": "object",
                        "additionalProperties": false, 
                        "properties": {
                            "email": {
                                "type": "string"
                            },
                            "number_of_stores": {
                                "type": "integer"
                            }
                        }
                    }
                

Starting a PostgreSQL extension

Jsonb



pushJsonbValue



JsonbIteratorToken
JsonbIterator



JsonbIteratorNext



JsonbValue

JsonbIterator

Flat iteration

Recursive Iteration

Flat iteration

Input

                        
                            {
                                "a": 1,
                                "b": {
                                    "c": 2
                                }
                            }
                        
                    

Output

                        
                            {
                                "a"
                                1
                                "b"
                                {"c": 2}
                            }
                        
                    

Recursive Iteration

Input

                        
                            {
                                "a": 1,
                                "b": {
                                    "c": 2
                                }
                            }
                        
                    

Output

                        
                            {
                                "a"
                                1
                                "b"
                                    {
                                        "c"
                                        2
                                    }
                            }
                        
                    

Jsonb and JsonbValue

  • Everything is a JsonbValue: keys, values, object descriptions
  • findJsonbValueFromContainer
  • JsonbValueToJsonb

Building a JSONb

  • Same order as JsonbIterator
  • pushJsonbValue

Naïve Implementation

  • Iterate over first object
  • Get value from second object
  • Push sum to Generator

Cons

  • Problems when keys are not the same
  • Not optimal

Recursive sort-merge join

  • Keys in JSONb are sorted on parsing
  • Traverse the JSONs simultaneously
  • O(#keys)

Performance for Summation

Numbers of rows Level of nesting C (ms) SQL (ms) Improvement
2 million 0 3705 11083 3x
2 million 1 12014 ??

Performance for Json Schema Validation

Numbers of tweets C (ms) SQL (postgres json schema) (ms) Improvement
10000 8911 203172 23x

Key points

  • Easy to extend functionality for JSONb
  • Keys are stored in order in the database
  • Recursive sort-merge join

Resources

Don't fear the JSON

Thank you!!

  furstenheim