Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: reference fields within jsonb/json entries in postgres #143

Open
c-nv-s opened this issue Jan 1, 2023 · 13 comments
Open

feat: reference fields within jsonb/json entries in postgres #143

c-nv-s opened this issue Jan 1, 2023 · 13 comments

Comments

@c-nv-s
Copy link

c-nv-s commented Jan 1, 2023

would be nice if sq could reference fields from json data that has been saved in a table under the jsonb or json type in postgres.

@neilotoole neilotoole self-assigned this Mar 12, 2023
@neilotoole
Copy link
Owner

@c-nv-s Interesting idea. Do you have a concept in your mind of what the syntax for that might look like?

@c-nv-s
Copy link
Author

c-nv-s commented Mar 14, 2023

whatever keeps it most transparent for the end user that you can think of.
In postgres if you have a table called the_customer_orders_table and it has a jsonb column called thedata you reference a field something like this:

select thedata->'items'->'product_name' from the_customer_orders_table ;

in sq this could look something like this:

sq '.the_customer_orders_table | .thedata::jsonb.items.product_name, .thedata::jsonb.items.product_quantity, etc '

@c-nv-s
Copy link
Author

c-nv-s commented Mar 14, 2023

but if you think that could get very verbose with all the selectors, you could go fully declarative like this guy haha https://github.com/xcite-db/Unquery/blob/main/TUTORIAL.adoc

@neilotoole
Copy link
Owner

but if you think that could get very verbose with all the selectors, you could go fully declarative like this guy haha https://github.com/xcite-db/Unquery/blob/main/TUTORIAL.adoc

I'd never seen that tool before. There are dozens of us!

@neilotoole
Copy link
Owner

neilotoole commented Mar 26, 2023

whatever keeps it most transparent for the end user that you can think of. In postgres if you have a table called the_customer_orders_table and it has a jsonb column called thedata you reference a field something like this:

select thedata->'items'->'product_name' from the_customer_orders_table ;

in sq this could look something like this:

sq '.the_customer_orders_table | .thedata::jsonb.items.product_name, .thedata::jsonb.items.product_quantity, etc '

Thanks for the input @c-nv-s . The latter suggestion with the double colons (.thedata::jsonb...) prob not going to be viable, because the recently released column aliases feature uses the colon, e.g.

$ sq '.actor | .first_name:given_name'

The postgres "pointer style" might work, e.g.

$ sq '.actor.jsonblob->age'.

That would combine with column aliases to look like:

$ sq '.actor.jsonblob->age:how_old'

That doesn't seem too bad. Now, to combine it with whitespace names:

$ sq '.actor."json blob"->"person age":how_old'

Again, that could work.

For further nested JSON:

$ sq '.actor.jsonblob->person.age'

That seems pretty ok.

I'll have to give it some more noodling. Let me know if you have additional thoughts.

@neilotoole
Copy link
Owner

Or maybeeee something like:

$ sq .actor.jsonblob[person.age]

Although I'm not sure I like that as much.

@neilotoole
Copy link
Owner

neilotoole commented Mar 26, 2023

One potential downside to the arrow is that I was considering using it to incorporate insertion into the query language.

Right now you can do:

$ sq .actor --insert @othersrc.table

The theory being that MAYBE the insert could be implemented as:

$ sq '.actor -> @othersrc.table'

But I'm pretty lukewarm about it. There's other options too, like

$ sq '.actor | insert(@othersrc.table)'

Which might be clearer. Again, not sure if we'll take this route at all, or if the query language should stay exclusively for querying.

@c-nv-s
Copy link
Author

c-nv-s commented Mar 27, 2023

Another thing that drew me to sq was the prospect of having a tool that work similarly to jq which eases the brain tax of having to learn multiple languages and thier individual quirks.
so maybe yoou could consider the question "how can I keep this feature most jq-like?"
other than that i'm pretty easy-going on what you choose.
https://github.com/itchyny/gojq

fyi (just in case it wasn't clear) I mentioned this because I would say that your last example of the possible insert command is definitely the more clear and jq-like solution
$ sq '.actor | insert(@othersrc.table)'

@c-nv-s
Copy link
Author

c-nv-s commented May 11, 2023

any more thoughts on this request? this feature and inline datasources are the final puzzle piece for me with this tool.

@neilotoole
Copy link
Owner

@c-nv-s I'm finally investigating JSON support (can you tell I have thanksgiving week off? 😉)...

Quick question for you: do you use the "native" pg JSON operators in your queries, or jsonpath queries?

E.g. I've got a test table with info about GitHub repos (just because their API is a good source of JSON), looking like this:

$ sq inspect .github_json_awk5z6e6 -v
NAME                  TYPE   ROWS  COLS  NAME       TYPE  PK
github_json_awk5z6e6  table  2     3     repo_name  text
                                         repo       json
                                         issues     json

Do you do this?

select gh.repo::json->>'node_id'
from github_json_awk5z6e6 gh;

Or this?

select jsonb_path_query(gh.repo::jsonb, '$.node_id')->>0
from github_json_awk5z6e6 gh;

At this point, I'm leaning towards implementing the feature syntax via jsonpath, largely because that seems more portable across different databases, e.g. MySQL, SQL Server, etc. So, right now, the sq syntax might end up looking like:

$ sq '.github_json_awk5z6e6 | .repo->>$.node_id'

But, I'm still in the investigation phase. What would be really helpful is if you could provide some sample queries that you use in your workflow 🙏. I need a feel for how complex they might be.

And any other early feedback is very welcome too.

@neilotoole
Copy link
Owner

neilotoole commented Nov 22, 2023

Although, being that jsonpath is effectively another language to learn, maybe I'll wrap a jq veneer over the jsonpath. E.g. given this:

select jsonb_path_query(issues::jsonb, '$[*].title')->>0
from github_json_awk5z6e6;

In jsonpath style, that would be:

$ sq '.github_json_awk5z6e6 | .issues->$[*].title'

But, in jq style, it would be:

$ sq '.github_json_awk5z6e6 | .issues->.[].title'

@c-nv-s
Copy link
Author

c-nv-s commented Nov 22, 2023

hey neil, that jq style veneer does look nice and intuitive.

To answer your first question, I use the first mentioned query style, however my intention is to to get json responses as quickly as I can out of postgres and into jq, so my queries to postgres would usually aim for the most sensible top level node of the json and then the rest of the data manipulation would be done with jq.

psql -h localhost -p 5432 -U mypostgresuser -d shoppingappdb -tc "select jsonb_agg(data->'items'->'product') from theCollectionOfOrders " | jq ' ... ' 

I'll see if I can paste a snippet from the little postgres json cheatsheet I collated from various online sources

@c-nv-s
Copy link
Author

c-nv-s commented Nov 22, 2023

some snippets from my cheatsheet sourced online (was never intended for public viewing but what the hell)

CREATE TABLE theOrdersTable (
        id integer NOT NULL PRIMARY KEY,
        data jsonb NOT NULL
);

INSERT INTO theOrdersTable (data) 
VALUES('{ "Id": 53635, "customer": "Lily Bush", "items": {"product": "Diaper","qty": 24}}'),
      ('{ "Id": 87644, "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}'),
      ('{ "Id": 44487, "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}');


# three ways to get whole table as a json array

select to_jsonb(theOrdersTable) from theOrdersTable ;
select jsonb_agg(theOrdersTable) from theOrdersTable ;
select row_to_json(theOrdersTable) from theOrdersTable ;

# get all the json objects from the data column with pretty or regular json output

select jsonb_pretty(data) from theOrdersTable ;
select data::jsonb from theOrdersTable ;


# get a specific field in the json objects

select data->'items'->'product' from theOrdersTable ;


# build a json response object yourself manually

select jsonb_pretty( jsonb_agg( jsonb_build_object('id', id, 'items', data->'items'))) from theOrdersTables ;


# get the total number of objects which match a given json path (filter)

select * from theOrdersTable where jsonb_path_exists(data, '$.customer') ;


# the total number of objects but filter on a condition

select * from theOrdersTable where jsonb_path_exists(data, '$.customer ? (@ == "John Doe")')  ;


# combine filter with selection

select data->'items'->'product' from theOrdersTable where jsonb_path_exists(data, '$.customer ? (@ == "John Doe")')  ;

                                                                                                                             [452/1868]
# select and limit results

select data->'items'->'product' from theOrdersTable limit 2 ;


# expand the outermost JSON object into a set of key-value pairs

select jsonb_each(data) from theOrdersTable ;


# combine filter and object selection (!!jsonb_each(x) x MUST MATCH TO AN OBJECT IN THE JSON NOT VALUE!!)

select jsonb_each(data->'items') from theOrdersTable where jsonb_path_exists(data, '$.customer ? (@ == "John Doe")')  ;

                                                                                                                     

# get the keys of each object

select jsonb_object_keys(data->'items') from theOrdersTable where jsonb_path_exists(data, '$.customer') ;


# get the typeof the item

select jsonb_typeof(data->'items') from theOrdersTable where jsonb_path_exists (data, '$.customer') ;


# arrays in the json data can also be accessed. e.g.:
SELECT count(*) FROM thCollectionOfPosts WHERE jsonb_path_exists(data, '$.path.to.theJSONArray[*] ? (@ == "someValueYouWantToMatchOn")'
)


# building complex nested queries 

SELECT
jsonb_pretty(
    jsonb_build_object(
        'theAmountPurchased', json_agg(theOrdersTable.data->'items'->'qty'),
        'theFavoriteProduct', json_agg(theOrdersTable.data->'items'->'product')
    )
)
FROM theOrdersTable

# or with limiting from result of another query                                                                                        

select row_to_json(theOrdersTable) from ( select id, to_jsonb(data->'items'->'product') from theOrdersTable ) theOrdersTable limit 2 ;
select jsonb_pretty( row_to_json(theOrdersTable)::jsonb) from ( select  jsonb_build_object('theData', data->'items') from theOrdersTabl
e limit 2 ) theOrdersTable ;

## CRUD OPERATIONS

## UPDATING

# update type 1 - add a new outer-level field and value
UPDATE theOrdersTable
SET data = data || '{"the_delivery_priority": ["Next Day"]}'
WHERE data->>'customer'::text IN ( 'John Doe', 'Lily Bush');  

# update type 2 - set/update existing field (creating it if it doesn't already exist)
# in this example we create an order_ref field in the json from the table's own id field

UPDATE theOrdersTable
SET data = jsonb_set(
  data,
  '{order_ref}',
  id::TEXT::JSONB,
  true);  # this boolean sets whether to create if it doesn't already exist which is true by default so can be omitted

# if for some reason you had the 'data' column as a text field type you would have to convert to json before updating e.g.

UPDATE "theOrdersTable" 
SET "data"=jsonb_set("data"::jsonb, '{url}', '"images/0005.jpg"', true)
WHERE "details"::json->>'name'='Eiffel Tower';

# adding a value to a json array

UPDATE USER_CONFIGURATIONS
SET DATA =
jsonb_set(DATA, '{configurations}'::text[], DATA ->'configurations' || '{"name":"firstconf", "theme":"dark", "textsize":"large"}'::json
b,
        true)
WHERE (DATA ->> 'userid') = 'user@example.com';

# or

UPDATE airports

SET airport = 
JSONB_SET(airport, '{airport_keywords}',
  (SELECT (airport -> 'airport_keywords') || TO_JSONB('BWI'::TEXT) FROM airports WHERE airport ->> 'ident' = 'KBWI'),
  false)
WHERE airport ->> 'ident' = 'KBWI';

# change value of a field in an array                                                                                                  

UPDATE USER_CONFIGURATIONS
SET DATA = JSONB_SET(
    DATA,
    '{configurations}'::text[],
    (((DATA -> 'configurations')
     -
      (SELECT i
       FROM generate_series(0, jsonb_array_length(data->'configurations') - 1) AS i
       WHERE (data->'configurations'->i->>'name' = 'firstconf')))::jsonb
     ||
      '{"name":"first", "theme":"light"}'::jsonb))
WHERE (DATA ->> 'userid') = 'user@example.com';

# change value of a field in a nested document

UPDATE airports

SET airport = JSONB_SET(

  airport,

  '{runways}',

  (WITH heli_runs AS (

    SELECT airport -> 'runways' AS heli_run

    FROM airports
WHERE airport ->> 'ident' = 'MD25')                                                                                                
                                                                                                                                       
  SELECT JSONB_SET(                                                                                                                    
                                                                                                                                       
    JSONB_ARRAY_ELEMENTS(heli_run),                                                                                                    
                                                                                                                                       
    '{surface}',                                                                                                                       

    TO_JSONB('asph'::text),

    false)

  FROM heli_runs),

  false)
WHERE airport ->> 'ident' = 'MD25';

## DELETING

# delete type 1 - deleting an outermost field

UPDATE theOrdersTable set data = ( data - 'delivery_information' ) ;

# or remove an entire entry with the DELETE command

DELETE FROM theOrdersTable
WHERE (DATA->>'customer') = 'John Doe'::text


# delete type 2 - deleting inner fields and arrays

# remove the second element from an array
UPDATE USER_CONFIGURATIONS
SET DATA = DATA #- '{configurations, 1}'::text[]
WHERE (DATA ->> 'userid') = 'anotheruser@example.com';

# or remove array element with key name 'myconf2'

UPDATE USER_CONFIGURATIONS
SET DATA = DATA #-
            ('{configurations,' || (
              SELECT i
              FROM generate_series(0, jsonb_array_length(DATA->'configurations') - 1) AS i
              WHERE (DATA->'configurations'->i->>'name' = 'myconf2')
            ) || '}')::text[]
WHERE (DATA ->> 'userid') = 'anotheruser@example.com';

# or

UPDATE airports
SET airport =

JSONB_SET(

  airport,

  '{airport_keywords}',
  (WITH kjfk_keys AS (

    SELECT JSONB_ARRAY_ELEMENTS(airport -> 'airport_keywords') keywords

    FROM airports
   
    WHERE airport ->> 'ident' = 'KJFK')
SELECT JSONB_AGG(keywords)

  FROM kjfk_keys

  WHERE keywords::TEXT != '"Idlewild"'))
WHERE airport ->> 'ident' = 'KJFK';

# or we can delete the entire array by updating it with an empty array

UPDATE USER_CONFIGURATIONS
SET DATA =
jsonb_set(DATA, '{configurations}'::text[], '[]'::jsonb) WHERE (DATA ->> 'userid') = 'example@example.com';

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants