-   Notifications  
You must be signed in to change notification settings  - Fork 76
 
Rework of dfs functions to recursively (changing return type of recurse-able functions) #363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…zed versions on which transformers like "recursively" can be called.
topDfs -> topmostChildren top -> roots
# Conflicts: # examples/jupyter-notebooks/titanic/Titanic.ipynb
# Conflicts: # core/generated-sources/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/Modify.kt # core/generated-sources/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/TestBase.kt # core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/Modify.kt # core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/TestBase.kt # docs/StardustDocs/topics/ColumnSelectors.md
   core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/DataColumn.kt  Show resolved Hide resolved  
    ...generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl.kt  Outdated   Show resolved Hide resolved  
    ...generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl.kt  Outdated   Show resolved Hide resolved  
    ...generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl.kt  Show resolved Hide resolved  
    ...generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl.kt  Show resolved Hide resolved  
 | } | ||
| val dfs = fullTree.topDfs { it.data != null } | ||
| return dfs.map { it.data!!.addPath(it.pathFromRoot()) } | ||
| val subtrees = fullTree.topmostChildren { it.data != null } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
topmostChildren?
 topDFS is a bad name, but topmostChildren leads me to the mist again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As my doc explains it takes the tree and returns first the children, closest to the root of the tree, where the predicate holds. It doesn't matter whether the children of those nodes also hold, those are ignored. So since we now have allChildren { something == true }, I thought topmostChildren { something == true }. It can also be viewed as subtree-roots where the predicate holds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@koperagen Any suggestions? :)
   core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/columns/constructors.kt  Show resolved Hide resolved  
    core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/columns/tree/Utils.kt  Show resolved Hide resolved  
    core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/html.kt  Outdated   Show resolved Hide resolved  
    core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl.kt  Show resolved Hide resolved  
 | public fun <T> DataFrame<T>.replaceAll( | ||
| vararg valuePairs: Pair<Any?, Any?>, | ||
| columns: ColumnsSelector<T, *> = { allDfs() }, | ||
| columns: ColumnsSelector<T, *> = { all().recursively(includeGroups = false) }, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why has the default value for includeGroups changed? I would argue that one of the main usages of dfsOf is to convert / replace columns with some type and groups are not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct that it's because of unification of overloads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, by default groups were included in dfs {}, but excluded in allDfs() if I recall correctly. I had to choose a sensible default. But if you use colsOf<Int>().rec(), then includeGroups irrelevant anyways, since Int is not a Group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this is interesting and a bit confusing. I think this parameter only exists for allDfs. For normal dfs it would be dfs { !it.isColumnGroup() }. Now it's also present for rec, recursively. What if you remove it from them and leave only for allRec with the same default value (true)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.rec() can not only be called on all(), you can also call first { }.rec(includeGroups = false), so I think keeping it in rec/recursively too would be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have a function that can give all valueColumns right? If we did we could have valueCols {}.recursively() or valueColsRecursively() but that might get a bit long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus it's not just ColumnGroups that get included right? also FrameColumns and ValueColumns. So, just all of them. I think that's a good default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@koperagen Actually I do want to add valueCols {} and frameCols {} since we already have groups {} as well. Probably in another PR though. Then you can call valueCols().rec() if you want. Sounds okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be a nice addition, not 100% related to this problem though. For this allRecursively function with previous default, something like "allLeafs" might fit better. I think it was named like that before and renamed later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also take it in another direction eventually, similar to colsOf<Type>() have colsOf(vararg ColumnKind) {}.rec(), that might be clearer compared to users having to figure out what a "leaf" means.
   ...erated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/columns/tree/TreeNode.kt  Show resolved Hide resolved  
 # Conflicts: # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/convert.kt # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/describe.kt # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/inferType.kt # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/reorder.kt # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/columns/ColumnWithPath.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/describe.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/inferType.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/reorder.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/columns/ColumnWithPath.kt
|   Another thing that i noticed now is that it's hard to tell what rec does for   |  
# Conflicts: # docs/StardustDocs/topics/ColumnSelectors.md # gradle/libs.versions.toml
 
 Yes, it works the same. You can read it like a sentence: "Select the first column of type Int recursively", which sort of makes sense. Basically:  |  
# Conflicts: # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/rename.kt # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/api/flatten.kt # core/generated-sources/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/move.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/rename.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/api/flatten.kt # core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/move.kt
Same as #361 but using a different tactic.
Largest changes:
Streamlined behavior of
all,cols,groupsetc. such that if it's called onSingleColumn<*>containing a singleColumnGroup(such asDataFrame), the operation will run on the children of thatColumnGroupColumnSet<*>, it will simply run on the columns inside.This is enforced using the new
allInternal()function combined withisSingleColumnWithGroup. I think this simply enforces expected behavior, since all tests work with it (and break upon a slight modification of this function) and more importantly, the calls seem to make sense. Let me know what you think.Introducing
TransformableColumnSetandTransformableSingleColumn. A wrapper around the original ones that provide a way totransformResolveas well as the normalresolve.transformResolvetakes aColumnSetTransformerthat can get executed before resolving theColumnSetorSingleColumn. AColumnSetTransformercontains implementations for bothtransformandtransformSingle, similar to the functions inUtils.kt. Currently, the only implementation of this isRecursivelyTransformer, which is configured and created in theTransformableColumnSet<>.recursively()calls.Not all
ColumnSets are transformable. Functions that allow reconfiguration with aColumnSetTransformerneed to explicitly state that in their return type. In principle, any function that usestransform {}can automatically opt-in to theTransformableColumnSetreturn-type if that makes sense for them. Currently, I added the ability tocols {},all(),groups {},children {},filter {},nameContains(),endsWith(),startsWith(),except(), andwithoutNulls().I added
TransformableSingleColumntofirst {},last {}, andsingle {}(which usessingleWithTransformerImplto pass on the transformer to the right place).Of course, if other functions are applicable to modification calls like
.recursively()I'd gladly hear it..recursively() / .rec()is the replacement ofdfs {},dfsOf<>{}, andallDfs(). because the name was confusing. I put deprecations in place and replaced all calls in the library to recursively. Some things to note:dfs {}, called on either aColumnSetorSingleColumnwould exclude the top-level of columns but still include column groups in the end-result, so a proper replacement of that call would becols {}.recursively()if called on aSingleColumnandcols {}.recursively(includeTopLevel = false)if called on a normalColumnSet. Of courserec()can be used as shorthand.dfsOf<>(), is replaced similarly to eithercolsOf<>().rec()orcolsOf<>().rec(includeTopLevel = false).allDfs()defaultedincludeGroupstofalse, so be careful, since this is nowtrue. It can be replaced either withall().recursively()orallRecursively()(orallRec()), again taking into accountincludeTopLevelandincludeGroups.Other smaller changes:
dfstodepthFirstSearchwheredepth first searchwas actually meantTreeNode.dfs {}->TreeNode.allChildren {}(+ other dfs functions)TreeNode.topDfs {}->TreeNode.topmostChildren {}(+ other topDfs functions)Iterable<ColumnWithPath<*>>.dfs()->flattenRecursively()ColumnPath.somethingoverloads inColumnsSelectionDslsince they are handled bySingleColumn<*>ColumnSetandSingleColumn