Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POJO toDataFrame support (and array improvements) #650

Merged
merged 14 commits into from
May 1, 2024

Conversation

Jolanrensen
Copy link
Collaborator

@Jolanrensen Jolanrensen commented Apr 6, 2024

Fixes #380

This PR relaxes, the toDataFrame() properties() DSL to allow for getter functions too.

If no memberProperties are available, it'll fall back to getter-like memberFunctions.
Also, it now tries to sort by any constructor if there are multiple (common in java with a no-arg constructor).

EDIT:

  • Added that toDataFrame { properties } treats arrays as values using new isArray reflection helpers.
  • Fixed rendering of Array types in output
  • Added array value rendering to output

…. If no memberProperties are available, fall back to getter-like memberFunctions. Also, it now tries to sort by any constructor if there are multiple (common in java with a no-arg constructor)
@Jolanrensen Jolanrensen added bug Something isn't working enhancement New feature or request labels Apr 6, 2024
@Jolanrensen Jolanrensen added this to the 0.14.0 milestone Apr 6, 2024
@Jolanrensen Jolanrensen self-assigned this Apr 6, 2024
@Jolanrensen Jolanrensen removed the bug Something isn't working label Apr 6, 2024
Copy link
Collaborator

@zaleslaw zaleslaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a little bit more work on clarification and refactoring for future generations

@koperagen
Copy link
Collaborator

Let's add a check in isValueType for different arrays. Maybe there's a simple check? If there's no, below code from compiler for reference.
Also, functions in java class can in theory be static, we should filter them out.
Please check how Kotlin reflection sees a List and Int, int, Number returned from Java bean: it could be that they're not converted automatically to their Kotlin counterparts and our isValue or isIterable checks won't work.
Last thing that worth checking is type arguments. Map<String, Integer> from Java can become something like (Mutable)Map<String!, Int!> in Kotlin. Expected result here would be Map<String?, Int?>

private fun ConeKotlinType.isArrayType(isNullable: Boolean?): Boolean {
    return isBuiltinType(StandardClassIds.Array, isNullable) ||
            StandardClassIds.primitiveArrayTypeByElementType.values.any { isBuiltinType(it, isNullable) } ||
            StandardClassIds.unsignedArrayTypeByElementType.values.any { isBuiltinType(it, isNullable) }
}

val primitiveTypes = setOf(Boolean, Char, Byte, Short, Int, Long, Float, Double)

    val primitiveArrayTypeByElementType = primitiveTypes.associateWith { id -> id.shortClassName.primitiveArrayId() }

 val unsignedTypes = setOf(UByte, UShort, UInt, ULong)
    val unsignedArrayTypeByElementType = unsignedTypes.associateWith { id -> id.shortClassName.primitiveArrayId() }

private fun Name.primitiveArrayId() = ClassId(StandardClassIds.Array.packageFqName, Name.identifier(identifier + StandardClassIds.Array.shortClassName.identifier))

@koperagen
Copy link
Collaborator

Let's filter out functions that have type arguments too

class B {
    fun <T> getA(): T {
        return TODO()
    }

    fun <T> getAa(): List<T> {
        return TODO()
    }
}

fun t(b: B) {
    b.getAa<Int>()
}

@Jolanrensen
Copy link
Collaborator Author

Let's add a check in isValueType for different arrays. Maybe there's a simple check? If there's no, below code from compiler for reference. Also, functions in java class can in theory be static, we should filter them out. Please check how Kotlin reflection sees a List and Int, int, Number returned from Java bean: it could be that they're not converted automatically to their Kotlin counterparts and our isValue or isIterable checks won't work. Last thing that worth checking is type arguments. Map<String, Integer> from Java can become something like (Mutable)Map<String!, Int!> in Kotlin. Expected result here would be Map<String?, Int?>

private fun ConeKotlinType.isArrayType(isNullable: Boolean?): Boolean {
    return isBuiltinType(StandardClassIds.Array, isNullable) ||
            StandardClassIds.primitiveArrayTypeByElementType.values.any { isBuiltinType(it, isNullable) } ||
            StandardClassIds.unsignedArrayTypeByElementType.values.any { isBuiltinType(it, isNullable) }
}

val primitiveTypes = setOf(Boolean, Char, Byte, Short, Int, Long, Float, Double)

    val primitiveArrayTypeByElementType = primitiveTypes.associateWith { id -> id.shortClassName.primitiveArrayId() }

 val unsignedTypes = setOf(UByte, UShort, UInt, ULong)
    val unsignedArrayTypeByElementType = unsignedTypes.associateWith { id -> id.shortClassName.primitiveArrayId() }

private fun Name.primitiveArrayId() = ClassId(StandardClassIds.Array.packageFqName, Name.identifier(identifier + StandardClassIds.Array.shortClassName.identifier))

lucky for us, static functions aren't in memberFunctions so this works by default :) working on the rest...

…it, updating convertToDataFrame implementation to be clearer too. Testing for different types of primitives from java pojo
…ys as values in toDataFrame { properties() }
@Jolanrensen Jolanrensen changed the title POJO toDataFrame support POJO toDataFrame support (and array improvements) Apr 25, 2024
@Jolanrensen Jolanrensen requested review from zaleslaw and removed request for koperagen April 25, 2024 14:40
@Jolanrensen
Copy link
Collaborator Author

I tried to add your requests to the PR and even added a little extra I needed for debugging/testing (see top comment).

I also spawned a couple of new related issues:


/** Returns the column name for this callable.
* If the callable contains the [ColumnName][org.jetbrains.kotlinx.dataframe.annotations.ColumnName] annotation, its [ColumnName.name][org.jetbrains.kotlinx.dataframe.annotations.ColumnName.name] is returned.
* Otherwise, the name of the callable is returned with proper getter-trimming iff it's a [KFunction]. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not introduce comments that are essentially exactly follow the code? Code is easier to read and is always factual, unlike plain text here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in the same file, but if you're calling the code from another place, all you see is the signature and the KDoc. I don't want to navigate to a piece of code to figure out what it does.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctrl+shift+i?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha okay that might work, but still, I think there's no such thing as "unnecessary docs". If university programming has taught me anything is that documentation, no matter how useless it may seem to write, will help you in some way in the future.

val box = kClass.java.methods.single { it.name == "box-impl" }
val valueClassConverter = ValueClassConverter(unbox, box)
valueClassConverter
if (!kClass.isValue) return@let null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really needed? It was reviewed and merged, not sure why it needs to be changed without actual functional changes. I don't like when PR has things like this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the complexity of this huge function is someting @zaleslaw stuggeled with and I agree. Lowering the cognitive complexity, like here, improves the codebase as a whole. And face it, we're not gonna do a refactor of everything anytime soon, so I think it's okay to do this along with PRs. I will try to do it in separate commits when I can though, so you can more easily skip it in reviews

* It prefers the primary constructor if it exists, else it falls back to the other constructors to do the sorting.
* Finally, it falls back to lexicographical sorting if a property does not exist in any constructor.
*/
internal fun <T> Iterable<KCallable<T>>.sortWithConstructors(klass: KClass<*>): List<KCallable<T>> {
Copy link
Collaborator

@koperagen koperagen Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly i still don't understand what to expect from this as a user. (regarding secondary constructors etc) I'd like something simpler
Btw, by default reflection probably returns properties / functions in some specific order too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope :/ reflection returns them completely random.

The idea was to come as close to what a user expects the order of columns to be as possible. So if they have a weird class like

class A {
  constructor()
  constructor(foo: Int, bar: Int)
  constructor(bar: Int, too: Int)
  ...
}

well, you expect foo to be before bar and too after it, right? That's what this function achieves. It sorts the columns according to the order of all constructors. If there's a primary constructor, obviously that is used first.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now simplified to just sorting lexicographically and by primary/single constructor

Copy link
Collaborator

@koperagen koperagen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

@Jolanrensen Jolanrensen merged commit 35176c9 into master May 1, 2024
4 checks passed
@Jolanrensen Jolanrensen deleted the pojo-toDataFrame branch May 1, 2024 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Known issues with Collection of Java classes
3 participants