Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible undesired behavior when using query filters #49

Open
mark-hartmann opened this issue Feb 24, 2022 · 3 comments
Open

Possible undesired behavior when using query filters #49

mark-hartmann opened this issue Feb 24, 2022 · 3 comments

Comments

@mark-hartmann
Copy link

mark-hartmann commented Feb 24, 2022

In the With / Without / Union functions it is not checked whether the passed columns are indexes, which might cause unexpected/undesired behavior if used with "normal" columns.

Let's take the following example:

c := column.NewCollection()
c.CreateColumn("name", column.ForString())
c.CreateColumn("some-column", column.ForInt16())

c.Insert(func(row column.Row) error {
	row.SetString("name", "john")
	row.SetInt16("some-column", 100)
	return nil
})
c.Insert(func(row column.Row) error {
	row.SetString("name", "jane")
	return nil
})

If I now start a query over the rows and only want to have the rows that have a value in this column, it works fine:

c.Query(func(txn *column.Txn) error {
	// Prints "1" 
	fmt.Println(txn.With("some-column").Count())
})

If the attribute for "jane" is set somewhere later in the program, this is of course picked up correctly using With. The problem is, however, that you can no longer get "jane" out of the With, because you can't "unset" columns.

By the way, it is not even checked whether the column exists at all. In my opinion, this should cause a panic like it does in other places (e.g. the *ReaderFor functions).

c.Query(func(txn *column.Txn) error {
	// Column does not exist 
	txn.With("abc").Count()
})

Now I have to ask myself if users should be able to pass any column (if so, there should be a Unset(column string) in txn) or if only the WithValue function should be used for normal columns (as I interpret it). In this case you should somehow check this within Txn, possibly via the owner or column, since this struct contains the IsIndex function.

@mark-hartmann
Copy link
Author

Just a quick thought; a small check for idx.IsIndex() in the methods themselves should solve at least the non-index-column problem, if it is a problem at all.

if idx, ok := txn.columnAt(columnName); ok && idx.IsIndex() {
	// ...
}

@Dreeseaw
Copy link
Collaborator

Hello Mark,

I've run into an error with using the package's txn.WithValue method with a boolean column, and traced the bug back to the distinction between a boolean and index column type.

When WithValue calls index.Filter to find matches, it uses c.Value to dictate both the value of the object and if it exists in the bitmap (txn.go +197) -

	txn.rangeRead(func(offset uint32, index bitmap.Bitmap) {
		index.Filter(func(x uint32) (match bool) {
			if v, ok := c.Value(offset + x); ok {
				match = predicate(v)
			}
			return
		})
	})

In the case of a false boolean value in the map, columnBool.Value (column.go +222, which call's kelindar/bitmap bitmap.go +33) will return false,false in the case of a false value, not executing the given predicate call. In the tests, the "active" boolean column is filtered via With and Without (txn_test.go +84, 90), making me think that boolean columns should be regarded as 'manually-set-indexes' rather than a normal column.

I can't tell if this is truly a bug, but if it is, I feel like the best route for solving both our problems is to fix up the columnBool.Value method and then add your index check.

Let me know what you think

@mark-hartmann
Copy link
Author

Hi, first of all I would like to apologize for the long time it took me to reply.

The idea of treating boolean columns as indexes sounds right to me, however the behavior of columnBool.Value seems strange to me as well. If a value exists - even if it is false - the predicate would have to be called.

I would have to dig a bit more into the column/bitmap library itself, but it seems to me that Bitmap.Contains might be the actual problem here. When Bitmap.Contains is called, it returns false if the value is false, which is not necessarily correct for boolean columns or any other data type and its null value. I don't know if this is really related to null values or not, I`ll have to take a closer look in the next days. If you have other suspicions, feel free to let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants