Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

struct (un)marshal #46

Open
jbenet opened this issue Jan 23, 2014 · 30 comments
Open

struct (un)marshal #46

jbenet opened this issue Jan 23, 2014 · 30 comments

Comments

@jbenet
Copy link
Collaborator

jbenet commented Jan 23, 2014

Hey, tiedot's great.

It would be good if tiedot supported direct struct marshal/unmarshal. There's no reason it shouldn't, as encoding/json supports it, and that's what tiedot's using.

I see that the reason the interface has map[string]interface{} is to add the id field. Is this necessary? Or, could reflection be used? (structs could include the id field).

@HouzuoGuo
Copy link
Owner

At the moment, two reasons exist for the document to be internally recognized as map[string]interface{}:

  • GetIn function traverses serialized JSON document structure to find values to be indexed/unindexed
  • To make use of query results, documents have their unique ID chucked in and then written to disk.

How about we introduce the following two new APIs, for embedded usage:

  • (col *Col) Insert(object interface{}) (documentID uint64)
  • (col *Col) Read(documentID uint64, object interface{})
  • (col *Col) Update(documentID uint64, newObject interface{})

For Insert and Update, the implementation serializes the user defined structure and manipulate the result string to put in the ID.

For Read, it will simply deserialize the document JSON into the structure.

How does it sound?

@jbenet
Copy link
Collaborator Author

jbenet commented Jan 23, 2014

Hey! the interface methods look good to me. 👍

Well, i think that adding the PK to the object is not a good idea. It dirties up the user's data. For instance, usually one expects data put into a database to be returned exactly the same. More concretely:

checksum1 = sha(val1)
db.put(key, val1)
val2 = db.get(key)
checksum2 = sha(val2)
assert val1 == val2
assert checksum1 == checksum2

Adding the key modifies the document, and breaks modularity. I think either:

  • don't put the pk in the dict. pk should only be a handle (like a file descriptor) to the data.
  • wrap the data in an outer dictionary with the key. Store {'pk': <pk>, 'data': <user's data>}. only return <user's data>.
  • if you really want to return the pk to the users in the dict, then allow users to use the pk field in their structs (with pk defaulting to 0 if not there), and to change the pk name. (e.g. if I want to use id or key because i have some other id scheme, i should be able to).

also, looking at the source, looks like there isn't a good way to address arrays with paths in GetIn. If I'm understanding correctly, GetIn tries all array items. (e.g. foo,bar returns 1 in {"foo": [{"baz":2}, "b", {"bar":1}]} -- am I right?) Perhaps may be better to explicitly address the arrays with indices, like mongo/bson do (foo,2,bar above).

@HouzuoGuo
Copy link
Owner

You are right, GetIn tries all array items to find the interested document value. I am aware of the Mongo feature, but rarely see it being used in real applications.

Regarding the doc ID - right now it is called "_pk". We may change the name to "@pk" for example - that is valid for JSON but invalid to be deserialized into Go structure.

How do you type thumb up? Is it a unicode character?

@jbenet
Copy link
Collaborator Author

jbenet commented Jan 24, 2014

rarely see it being used in real applications.

So, what happens here: events,date from {"events": [{"date":1}, {"date":2}, {"date":3}]} ? There's no way to explicitly address the individual events, so I couldn't ask to index on "the date of the first event".

that is valid for JSON but invalid to be deserialized into Go structure.

Something that will return {"_pk": 1, "other": "stuff"} and {"other": "stuff"} depending on the API that's used? This seems inconsistent to me.

Let me ask this, why do you want to return the _pk in the first place? is there any place where it gets returned that you didn't previously have it already?

:+1: -> 👍 :)

@alexandrestein
Copy link

Hey everyone.

Return id is useful when you show the user multiple elements and he select one of those. (for update or open)
The simplest way to do that is to hide the element id and when user click, the app get the related id.

The "_id" (as MongoDB) or "_key" sounds good to me.

I would prefer to have the id wrap in the object by default if the struct define a named field like "_id" for example. As "mgo" (GoLang MongoDB driver) works.

What about that?

@HouzuoGuo
Copy link
Owner

OK, now index on particular array element sounds useful.

What about changing the ID attribute name from "_pk" into "@id"? Because Go structure cannot have member "@id" attribute, therefore the JSON deserializer will happily skip that one.

Quite a few NoSQL engines put document ID in the document itself, like Alexandre said. In the case of tiedot, it helps a lot with query performance.

@HouzuoGuo
Copy link
Owner

If you don't mind, I shall change the ID attribute name from "_pk" to "@id" straight away.

@HouzuoGuo
Copy link
Owner

Completed @id change: 3cc156b

@HouzuoGuo
Copy link
Owner

Sorry about the delay.
The three new functions will make it into nextgen branch.

Also in nextgen branch, "@id" will no longer be physically stored in document itself, but rather only maintained in lookup table.

@HouzuoGuo
Copy link
Owner

these APIs will make their way into version 3.1 =)

@stunti
Copy link

stunti commented Nov 9, 2014

Hi.
I am new to tiedot but I can't find these functions with that signature. Are they implemented under a different name?
Thanks

@HouzuoGuo
Copy link
Owner

I am terribly sorry, very unfortunately the features did not make their way into the 3.1. For now map[string]interface{} is still the only supported document structure.

Here are some usage examples:
https://github.com/HouzuoGuo/tiedot/blob/master/example.go

Note that: document ID is no longer an attribute in the document content itself.

@geoah
Copy link

geoah commented Feb 22, 2015

Seems that these have been introduced in 3.2, is this stable to use or not yet ready?

@HouzuoGuo
Copy link
Owner

@geoah This is a trivial feature and I forgot about it during 3.2, sorry! Actually the interface between 3.0 and 3.2 have not changed.
It should be really simple to write, let me see if I can get it done in the next several days.

@kylewolfe
Copy link
Collaborator

Hi Houzuo,

I'm interested in using tiedot a little more and have a few questions on this subject. In reference to :

At the moment, two reasons exist for the document to be internally recognized as map[string]interface{}:

GetIn function traverses serialized JSON document structure to find values to be indexed/unindexed
To make use of query results, documents have their unique ID chucked in and then written to disk.

The method Col.ForEachDoc (https://godoc.org/github.com/HouzuoGuo/tiedot/db#Col.ForEachDoc) returns a []byte and in turn basically a json document that can quickly be unmarshaled using json.Unmarshal. I've found that using json to unmarshal is far better than something that does a map[string]interface{} -> struct conversion.

My questions: 1) how far along are you with any refactor to the API? 2) is the design final? 3) if not, and this []byte json document is indeed how the data comes off the wire. may I suggest creating a new data type of []byte that has a method to convert return a map[string]interface{}?

I think an API that allows both map[string]interface{} AND a JSON compatible struct in / out would be amazing.

A pain point I currently have is that for me to read from tiedot in to a struct I have to do something like this:

// convert each row back in to a Session type
    for id := range queryResult {
        doc, err := col.Read(id)
        if err != nil {
            Logger.Fatalln(err)
        }
        doc["Id"] = id

        ns := &Session{}
        err = mapToStruct(doc, ns)
        if err != nil {
            Logger.Fatalln(err)
        }
        s = append(s, *ns)
    }

In drivers such as mgo, there is a mechanism for unmarshaling the Id directly in the the given struct (here I had to inject it into my map before unmarshaling). You'll also notice that s and queryResult had to be allocated ahead of time. If a type is returned that has the ability to convert its self to both map[string]interface as well as []byte, I can skip that and go right in to unmarshaling using my preferred package.

@HouzuoGuo
Copy link
Owner

Thank you very much for the suggestions @kylewolfe and sorry I have not been active with the project development.

For your use case I'd like to propose four new functions:

  • An Insert function (doc interface{})(id int, err error)
  • An Update function (id int, doc interface{}) (err error)
  • A Read function (id int, destination interface{}) (err error)
  • Another Read function (id int, idAttrName string, destination interface{}) (err error)

I hope they'll cover your use cases. What do you suggest to be their names?

If they look good to you I should be able to implement them within a day or two.

@kylewolfe
Copy link
Collaborator

Thanks for the quick reply!

Your proposed methods would use json to (un)marshal?
What would idAttrName be used for?

Is the idea of a single return type that can do multiple things out of the realm of possibility? Or maybe just too much of a breaking change? Ideally an interface such as this would be better, so that Id is handled by (un)marshaling:

type Foo struct {
    Id int
    Foo, Bar string
}

f := &Foo{Foo: "foo", Bar: "bar"}
if err := col.Insert(f); err != nil {
    panic(err)
}
fmt.Println(f.Id) // has new id

f.Foo = "baz"
if err = col.update(f); err != nil {
    panic(err)
}

mgo behaves this way so that I do not have to take an extra step to populate Id, if it exists. What are your thoughts?

I'd be happy to start digging in and trying to make some updates to allow for this. I have a high need for embedded db usage, so it would probably be beneficial for me to poke around and understand a little more about tiedot at a lower level.

@kylewolfe
Copy link
Collaborator

Going back to the new return type. If []byte indeed does come off the wire (as I see it in ForEachDoc) and you also need map[string]interface for indexing purposes, etc. Then a type something like this could be used across the whole application:

type Row []byte {}
func (r Row) ToMap() map[string]interface{} // allows you to do what you need for indexing

Then during ForEachDoc, both json unmarshaling and any existing functionality around map[string]interface{} can be used.

@HouzuoGuo
Copy link
Owner

All right, I misunderstood your original use case regarding mgo. It will be a little difficult to support mgo style API, but I'll try my best to make the existing APIs more pleasant.

json.Marshal/Unmarshal uses JSON byte array instead of string, which happens to be compatible with the representation of JSON documents in the storage. However the higher level API in Col such as Col.Read() returns map[string]interface{} which is inconvenient.

So starting from Col.Read I propose two more functions (names are not final)

Read2(id int, destination interface{}) reads document at specified ID and deserialise it to destination, structure of your choice.

Read3(id int, idAttrName string, destination interface{}) is similar to above, but it also artificially introduces (or modifies) an ID attribute denoted by the idAttrName, so that deserialised structure will have the document ID as well.

And finally, to make Insert and Update easier to use, they will also take interface{} instead of map[string]interface{} as the document input. These two functions will convert the input into index-capable map[string]interface{} automatically, for maintaining indexes.

So in the end the new APIs behave similar to mgo API in terms of supporting any structure as document and treatment of document ID in the deserialised structure.

How does that sound?

@kylewolfe
Copy link
Collaborator

I think I may have a crack at this tonight. Rather than what you describe in Read3, how about utilizing a structs tag feature within reflect?

type Foo struct {
    Id int `tiedot: id`
    Foo, Bar string
}

I'm assuming your looking to keep API compatibility for awhile longer (before v4 if at all?)

@HouzuoGuo
Copy link
Owner

it will definitely be ideal if tiedot can make use of the tag, although I do not know how. Please feel free to experiment with the feature and I look forward to hear from your findings.

@kylewolfe
Copy link
Collaborator

Here's a little show and tell of what I have so far (not ready for a PR): master...kylewolfe:api_break

I titled the branch api_break, thinking that I would write up what an ideal final API would look like, regardless of breaking changes, but then I realized you could actually switch to a format in which mgo uses (doc interface{}) and still keep backward compatibility.

Insert now supports map[string]interface, the new type of Document, []byte and any pointer to a struct. If a pointer to a struct is given, it will search for a tiedot tag of Id, and update that struct directly, allowing for my earlier example.

Once tested (and other functions converted), this should allow for:

type Foo struct {
    Id int `tiedot: id`
    Foo string
}

f := &Foo{Foo: "bar"}
if _, err := col.Insert(f); err != nil {
    panic(err)
}
// f.Id now has the new id

f.Foo = "baz"
if _, err := col.Update(f.Id,  f); err != nil { // Ideally, the first argument wouldn’t be needed, but oh well
    panic(err)
}

What are your thoughts on this direction?

@HouzuoGuo
Copy link
Owner

That looks even better than I previously thought! well done!
I added you into collaborator list so feel free to push changes.

@kylewolfe
Copy link
Collaborator

Excellent! I'll move my work in to a branch on the official repo and continue to play. Do you have a standard or thought on breaking changes? I realized that some more thought needs to go in to a few other methods (ForEachDoc, for example, I believe would have to receive a breaking change in order to support struct unmarshaling)

@HouzuoGuo
Copy link
Owner

Well I think people are quite familiar with the idea that softwares evolve and breaking changes will occur . As long as the actual feature isn't taken away I wouldn't mind changing some function signatures.

@kylewolfe kylewolfe self-assigned this Aug 7, 2015
@arafath-mk
Copy link

Hi.. What is the status of this issue..?

@HouzuoGuo
Copy link
Owner

@arafath-mk sorry, little progress has been made regarding this issue, although it should not be a difficult matter.
@kylewolfe hello! do you by any chance have a moment to take a look at this?

@kylewolfe
Copy link
Collaborator

Unfortunately, I opted to use a different solution for my project. As a result I did not get around to finishing my work on this issue. My partial work is still available though (master...kylewolfe:api_break) if someone wants to pick up the baton.

@kylewolfe kylewolfe removed their assignment Nov 15, 2016
@phpfs
Copy link

phpfs commented Nov 20, 2020

Hey there - any news on adding a function to directly insert a struct? :)

@HouzuoGuo
Copy link
Owner

thanks for your interest in this project @phpfs , this project started as a short programming exercise, i'm very sorry that i didn't have a chance to take it higher :|

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants