Shredding JSON in SQL Server Link to heading
For a while now (1,2), I’ve been a fan of shredding arbitrary JSON to key-value mapping. I’ve had some success doing this in PowerShell (3) and using jq (4). These exercises were helpful in identifying aggregate changes to ElasticSearch logs and MongoDB documents on a few occasions.
Sadly, I’ve never been able to grok the syntax in T-SQL. I blame this on SQL Server’s very limited support for JSON functions - as opposed to JSON support in PostgreSQL for example. Recently though I had a bit of a breakthrough. Something clicked into place and I’d like to document below the successful approach with some notes on what it does and why. To start with the best bit, the code:
The code Link to heading
The table-valued function below turns JSON input into a KVP table where each key is the fully qualified jPath to the element within the top-level JSON document.
dbo.json_shred()
create or alter function dbo.json_shred (
@json nvarchar(max)
)
returns table
as
return (
/*
Author: Peter Vandivier
Date: 2021-10-04
Description: Takes an arbitrary JSON document and returns all valid
json_paths within the document as well as the values
present at said path and the depth within the document
to which the given path with probe
Example:
-- Filtering to `[Type] not in (4,5)` returns all leaf-node
-- paths and values for the given document
--
declare @json nvarchar(max) = N'{"a":"foo","b":[1,2,{"d":"bar","e":[0]}],"f":{"g":"baz"},"h":null,"i":"zap"}';
--
select *
from dbo.json_shred(@json)
where [Type] not in (4,5);
Example_Output:
+-------+----------+-------------+-------+------+
| Level | Parent | Key | Value | Type |
+-------+----------+-------------+-------+------+
| 1 | $. | $.a | foo | 1 |
| 1 | $. | $.h | NULL | 0 |
| 1 | $. | $.i | zap | 1 |
| 2 | $.f | $.f.g | baz | 1 |
| 2 | $.b | $.b[0] | 1 | 2 |
| 2 | $.b | $.b[1] | 2 | 2 |
| 3 | $.b[2] | $.b[2].d | bar | 1 |
| 4 | $.b[2].e | $.b[2].e[0] | 0 | 2 |
+-------+----------+-------------+-------+------+
*/
with level_0 as (
select
convert(int,0) as [Level],
convert(nvarchar(4000),N'$') as [Key],
@json as [Value],
convert(
int,
case left(@json,1)
when N'[' then 4
when N'{' then 5
else 0
end
) as [Type]
)
, key_value_unwrap as(
select
l0.[Level] + 1 as [Level],
convert(nvarchar(max),null) as Parent,
l0.[Key] + iif(l0.[Type] = 5, '.' + oj.[Key], quotename(-1 + row_number() over (order by (select null)))) collate database_default as [Key],
oj.[Value],
oj.[Type]
from level_0 l0
outer apply openjson(l0.[Value]) as oj
where l0.[Value] is not null
union all
select
kvu.[Level] + 1 as [Level],
convert(nvarchar(max),kvu.[Key]) as Parent,
kvu.[Key] + iif(kvu.[Type] = 5, '.' + oj.[Key], quotename(-1 + row_number() over (order by (select null)))) as [Key],
oj.[Value],
oj.[Type]
from key_value_unwrap as kvu
outer apply openjson(kvu.[Value], 'lax $') as oj
where kvu.[Type] in (4,5)
), _union as (
select
l0.[Level],
convert(nvarchar(max),null) as Parent,
l0.[Key] + N'.' as [Key],
l0.[Value],
l0.[Type]
from level_0 as l0
union all
select
kvu.[Level],
kvu.Parent,
kvu.[Key],
kvu.[Value],
kvu.[Type]
from key_value_unwrap as kvu
)
select
u.[Level],
iif(u.[Level]=1,N'$.',u.Parent) as Parent,
u.[Key],
u.[Value],
u.[Type]
from _union as u
);
The Explanation Link to heading
An invocation of the single-argument-no-schema overload of OPENJSON()
will unpack all elements at the first level. If you then recurse your level-1 resultset into another application of single-argument OPENJSON()
, you’ll get all elements from the 2nd level; and so on.
level_0 Link to heading
Why the clunky level_0
CTE? Well, dear reader, because APPLY
is still a JOIN
. The first iteration of OPENJSON
over your document has a left-to-right relationship to the document; NOT an up-to-down relationship. Consider the following:
select *
from (values
(1, '{"a":1,"b":{"c":2}}'),
(2, '{"d":3}')
) as v (id,doc)
cross apply openjson(v.doc) as js;
The resultset might be visualized like…
id | doc | key | value | type |
---|---|---|---|---|
1 | {“a”:1,“b”:{“c”:2}} |
a b |
1 {"c":2} |
2 5 |
2 | {“d”:3} | d | 3 | 2 |
Note that key-value pairs associate directly to the containing document. A JSON document can itself contain arbitrary JSON documents as constituent data (consider the data in [Key]
“b” for document id 1). Therefore for a JSON shredder, I consider the document itself as a data value. If this were jq
, the query string I would execute to get the whole document would be “.
” (single dot) - meaning, simply: “retrieve everything”.
What I want is all level-1 key-value rows appended to the the level-0 document row - at a deeper [Level]
. Something like this:
id | key | value | Level |
---|---|---|---|
1 |
. .a .b |
{"a":1,"b":{"c":2}} 1 {"c":2} |
0 1 1 |
2 |
. .d |
{"d":3} 3 |
0 1 |
In order to achieve this, I’ve separately defined the level-0 tuple and I prepend it to the final resultset.
key_value_unwrap Link to heading
This is the “meat & potatoes” of the shredder. We iterate over the keys of the document and recurse to a higher level for any key whose corresponding value is an array or object type (OPENJSON() return types).
We decrement by one the output of row_number()
where used to accomodate 0-based indexing. We suffix the local [Key]
to the parent value to build the fully qualified path as we spelunk “deeper” into the object.
_union Link to heading
The _union
CTE is partially redundant but serves to simplify some expression redundancy that would otherwise need to appear in the final select
.
The homage Link to heading
One of my favorite StackOverflow answers ever is this T-SQL XML shredder. You can see it in action against a complex document on db<>fiddle. It stuck in my brain from the first time I used it and obviously influenced the design of this JSON shredder.
These 2 functions share more than just output style though. They both serve to do on SQL Server something you probably ought to be doing elsewhere in the stack - but sometimes you need to JFDI in a pinch. Just don’t look too close at the execution plan for either 😅