Shredding JSON in SQL Server Link to heading

For a while now (1,2), I’ve been a fan of shredding arbitrary JSON to key-value mapping. I’ve had some success doing this in PowerShell (3) and using jq (4). These exercises were helpful in identifying aggregate changes to ElasticSearch logs and MongoDB documents on a few occasions.

Sadly, I’ve never been able to grok the syntax in T-SQL. I blame this on SQL Server’s very limited support for JSON functions - as opposed to JSON support in PostgreSQL for example. Recently though I had a bit of a breakthrough. Something clicked into place and I’d like to document below the successful approach with some notes on what it does and why. To start with the best bit, the code:

The code Link to heading

The table-valued function below turns JSON input into a KVP table where each key is the fully qualified jPath to the element within the top-level JSON document.

dbo.json_shred()

create or alter function dbo.json_shred (
    @json nvarchar(max)
)
returns table 
as
return (
/*
Author:      Peter Vandivier
Date:        2021-10-04
Description: Takes an arbitrary JSON document and returns all valid
             json_paths within the document as well as the values
             present at said path and the depth within the document
             to which the given path with probe
Example:
    -- Filtering to `[Type] not in (4,5)` returns all leaf-node
    -- paths and values for the given document
    --
    declare @json nvarchar(max) = N'{"a":"foo","b":[1,2,{"d":"bar","e":[0]}],"f":{"g":"baz"},"h":null,"i":"zap"}';
    --
    select * 
    from dbo.json_shred(@json)
    where [Type] not in (4,5);
Example_Output:
    +-------+----------+-------------+-------+------+
    | Level | Parent   | Key         | Value | Type |
    +-------+----------+-------------+-------+------+
    | 1     | $.       | $.a         | foo   | 1    |
    | 1     | $.       | $.h         | NULL  | 0    |
    | 1     | $.       | $.i         | zap   | 1    |
    | 2     | $.f      | $.f.g       | baz   | 1    |
    | 2     | $.b      | $.b[0]      | 1     | 2    |
    | 2     | $.b      | $.b[1]      | 2     | 2    |
    | 3     | $.b[2]   | $.b[2].d    | bar   | 1    |
    | 4     | $.b[2].e | $.b[2].e[0] | 0     | 2    |
    +-------+----------+-------------+-------+------+
*/
    with level_0 as (
        select
            convert(int,0) as [Level],
            convert(nvarchar(4000),N'$') as [Key],
            @json as [Value],
            convert(
                int,
                case left(@json,1)
                    when N'[' then 4
                    when N'{' then 5
                    else 0
                end
            ) as [Type]
    )
    , key_value_unwrap as(
        select 
            l0.[Level] + 1 as [Level],
            convert(nvarchar(max),null) as Parent,
            l0.[Key] + iif(l0.[Type] = 5, '.' + oj.[Key], quotename(-1 + row_number() over (order by (select null)))) collate database_default as [Key],
            oj.[Value],
            oj.[Type]
        from level_0 l0
        outer apply openjson(l0.[Value]) as oj
        where l0.[Value] is not null 
        union all
        select 
            kvu.[Level] + 1 as [Level],
            convert(nvarchar(max),kvu.[Key]) as Parent,
            kvu.[Key] + iif(kvu.[Type] = 5, '.' + oj.[Key], quotename(-1 + row_number() over (order by (select null)))) as [Key],
            oj.[Value],
            oj.[Type]
        from key_value_unwrap as kvu
        outer apply openjson(kvu.[Value], 'lax $') as oj
        where kvu.[Type] in (4,5)
    ), _union as (
        select 
            l0.[Level],
            convert(nvarchar(max),null) as Parent,
            l0.[Key] + N'.' as [Key],
            l0.[Value],
            l0.[Type]
        from level_0 as l0
        union all
        select 
            kvu.[Level],
            kvu.Parent,
            kvu.[Key],
            kvu.[Value],
            kvu.[Type]
        from key_value_unwrap as kvu
    ) 
    select 
        u.[Level],
        iif(u.[Level]=1,N'$.',u.Parent) as Parent,
        u.[Key],
        u.[Value],
        u.[Type]
    from _union as u
);

The Explanation Link to heading

An invocation of the single-argument-no-schema overload of OPENJSON() will unpack all elements at the first level. If you then recurse your level-1 resultset into another application of single-argument OPENJSON(), you’ll get all elements from the 2nd level; and so on.

level_0 Link to heading

Why the clunky level_0 CTE? Well, dear reader, because APPLY is still a JOIN. The first iteration of OPENJSON over your document has a left-to-right relationship to the document; NOT an up-to-down relationship. Consider the following:

select *
from (values 
    (1, '{"a":1,"b":{"c":2}}'),
    (2, '{"d":3}')
) as v (id,doc)
cross apply openjson(v.doc) as js;

The resultset might be visualized like…

id	doc	key	value	type
1	{“a”:1,“b”:{“c”:2}}	a b	1 {"c":2}	2 5
2	{“d”:3}	d	3	2

Note that key-value pairs associate directly to the containing document. A JSON document can itself contain arbitrary JSON documents as constituent data (consider the data in [Key] “b” for document id 1). Therefore for a JSON shredder, I consider the document itself as a data value. If this were jq, the query string I would execute to get the whole document would be “.” (single dot) - meaning, simply: “retrieve everything”.

What I want is all level-1 key-value rows appended to the the level-0 document row - at a deeper [Level]. Something like this:

id	key	value	Level
1	. .a .b	{"a":1,"b":{"c":2}} 1 {"c":2}	0 1 1
2	. .d	{"d":3} 3	0 1

In order to achieve this, I’ve separately defined the level-0 tuple and I prepend it to the final resultset.

key_value_unwrap Link to heading

This is the “meat & potatoes” of the shredder. We iterate over the keys of the document and recurse to a higher level for any key whose corresponding value is an array or object type (OPENJSON() return types).

We decrement by one the output of row_number() where used to accomodate 0-based indexing. We suffix the local [Key] to the parent value to build the fully qualified path as we spelunk “deeper” into the object.

_union Link to heading

The _union CTE is partially redundant but serves to simplify some expression redundancy that would otherwise need to appear in the final select.

The homage Link to heading

One of my favorite StackOverflow answers ever is this T-SQL XML shredder. You can see it in action against a complex document on db<>fiddle. It stuck in my brain from the first time I used it and obviously influenced the design of this JSON shredder.

These 2 functions share more than just output style though. They both serve to do on SQL Server something you probably ought to be doing elsewhere in the stack - but sometimes you need to JFDI in a pinch. Just don’t look too close at the execution plan for either 😅