@martinpopel
Contributor

No description provided.

- require Python 3.6+ due to f-strings
- Travis test with Python 3.6-3.9
- assume pip 9.0 (late 2016) or newer, which understands python_requires
- If no head is specified, the first word from mention_words will be used instead.
- If mention_words are provided, they must contain the head.
- docstring
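The head/mention_words rule above can be illustrated with a minimal sketch. The function name `resolve_head` is hypothetical and plain strings stand in for nodes; this is not the actual udapi code, only the rule as the commit message states it.

```python
from typing import List, Optional


def resolve_head(head: Optional[str], mention_words: Optional[List[str]]) -> str:
    """Hypothetical helper: pick or validate the head of a mention.

    Strings stand in for nodes; the real classes are not reproduced here.
    """
    if head is None:
        # No head given: fall back to the first word of the mention.
        if not mention_words:
            raise ValueError("either head or mention_words must be provided")
        return mention_words[0]
    # Head given: if mention_words are provided, they must contain it.
    if mention_words is not None and head not in mention_words:
        raise ValueError("mention_words must contain the head")
    return head
```

For example, `resolve_head(None, ["the", "cat"])` falls back to `"the"`, while `resolve_head("dog", ["the", "cat"])` raises `ValueError`.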
and store it in CorefCluster, not CorefMention
- EmptyNode is a subclass of Node
- root is stored in empty nodes, not computed (because empty nodes have parent=None and root may be unreachable)
- node.create_empty_child(deprel='...'): deprel is now required; it is the enhanced UD deprel to be stored in DEPS
- argument `after` specifies the position (ord) of the newly created empty node
- root.create_empty_child() is a faster version, which does not set `deps` and `ord`.
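Why the root has to be stored rather than computed can be seen in a toy model. The classes below are simplified stand-ins, not udapi's implementation: the `Root` class, the `empty_nodes` list, the dict layout of `deps` and the free function `create_empty_child` are all assumptions made for illustration, and the `after`/`ord` handling is left out.

```python
class Node:
    """Toy node: the root is found by walking up the parent chain."""

    def __init__(self, parent=None):
        self._parent = parent
        self.deps = []  # enhanced dependencies (assumed layout: list of dicts)

    @property
    def root(self):
        node = self
        while node._parent is not None:
            node = node._parent
        return node


class Root(Node):
    """Toy technical root that also keeps the list of empty nodes."""

    def __init__(self):
        super().__init__(parent=None)
        self.empty_nodes = []


class EmptyNode(Node):
    """Toy empty node: parent is None, so the root must be stored."""

    def __init__(self, root):
        super().__init__(parent=None)
        self._root = root

    @property
    def root(self):
        # Walking up from parent=None would wrongly return the empty node itself.
        return self._root


def create_empty_child(node, deprel):
    """Hypothetical helper: deprel is required and recorded in deps."""
    root = node.root
    empty = EmptyNode(root)
    empty.deps.append({"parent": node, "deprel": deprel})
    root.empty_nodes.append(empty)
    return empty
```

With the computed `Node.root`, an empty node would report itself as the root, which is the problem the stored reference avoids.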
In Python, empty attributes should be None (it is more memory efficient than '_'); the underscore is just a matter of CoNLL-U serialization.
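A minimal sketch of that convention, using two hypothetical helper functions (not part of udapi): the in-memory value is `None`, and the underscore appears only when a field is written to or read from CoNLL-U. The sketch deliberately ignores the special case of FORM and LEMMA, where a literal underscore can be a real value.

```python
def to_conllu_field(value):
    """Serialize an attribute: None becomes the CoNLL-U placeholder '_'."""
    return "_" if value is None else str(value)


def from_conllu_field(text):
    """Parse a CoNLL-U field: the placeholder '_' becomes None in memory."""
    return None if text == "_" else text
```

Keeping `None` in memory means code can test `if node_attr is None` instead of comparing against a magic string.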
so the setter can be overridden in EmptyNode
Benchmark "NewTreex" shows 0.5% more memory, but total time 6% faster (rehanging, i.e. changing a parent, is 21% faster, next_node 65% faster):

| experiment | TIME   | MAXMEM  | load   | save  | iterN | rehang | remove | add   | reorder |
|------------|-------:|--------:|-------:|------:|------:|-------:|-------:|------:|--------:|
| udapi      | 40.507 | 832.815 | 17.066 | 4.521 | 0.715 | 3.502  | 2.348  | 2.868 | 4.174   |
| udapi_new  | 37.958 | 837.302 | 16.720 | 3.846 | 0.251 | 2.766  | 2.284  | 2.699 | 4.106   |
reading+writing support for multiple clusters in one node stored using layered attributes
…dTuple root.create_empty_child() should not sort the empty nodes because they may not have ord filled. In contrast, node.create_empty_child() should sort the empty nodes.
* l.append(x) is faster than l += [x]
* l.sort(); l2=l; is faster than l2 = sorted(l)
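These two claims are easy to check locally with `timeit`. The sketch below is one possible way to measure them; the function names and the sizes are arbitrary choices, and the actual numbers will vary by machine and Python version, so none are quoted here.

```python
import random
import timeit


def build_with_append(n):
    result = []
    for i in range(n):
        result.append(i)  # appends in place, no temporary list
    return result


def build_with_iadd(n):
    result = []
    for i in range(n):
        result += [i]  # builds a one-element temporary list on every step
    return result


def sort_in_place(items):
    items.sort()  # reorders the existing list
    return items


def sort_with_copy(items):
    return sorted(items)  # allocates and returns a new list


def compare(n=1000, repeats=200):
    """Return the measured seconds for each variant (machine dependent)."""
    data = [random.random() for _ in range(n)]
    return {
        "append": timeit.timeit(lambda: build_with_append(n), number=repeats),
        "iadd": timeit.timeit(lambda: build_with_iadd(n), number=repeats),
        # Both sort variants get a fresh copy so the input is unsorted each time.
        "sort": timeit.timeit(lambda: sort_in_place(list(data)), number=repeats),
        "sorted": timeit.timeit(lambda: sort_with_copy(list(data)), number=repeats),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name:>7}: {seconds:.4f} s")
```

The variants produce the same contents; the difference is only in how many intermediate lists get allocated.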
The main trick is to prevent creating new lists (and memory allocations) wherever possible. E.g. `node.children` creates a new `ListOfNodes` object with a copy of the list of children, and `node.children(add_self=True)` creates one more copy (and the previous copy is thrown away for gc). Thus internally, we can call `node._children` (which currently does not guarantee a sorted result), which creates no extra list. `node.children(add_self=True)` was changed so that it creates just a single new list. Further speedup is possible in the future. A very minor speedup is due to direct usage of attributes instead of overloaded properties, e.g. `node._ord` instead of `node.ord`. This is not worth the effort in user blocks, but internally in the core API it makes a (small) difference in total.
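The copy-avoidance idea can be sketched with a toy class. This is not udapi's `Node` or `ListOfNodes`: here `children` is a plain method returning a list, and `add_child` and `count_children` are hypothetical helpers added for the example.

```python
class Node:
    """Toy node illustrating the public (copying) vs. internal (direct) access."""

    def __init__(self, ord):
        self._ord = ord
        self._children = []  # internal list, not guaranteed to be sorted

    @property
    def ord(self):
        return self._ord

    def add_child(self, child):
        self._children.append(child)

    def children(self, add_self=False):
        # Public API: always returns a new list, so callers may modify it freely.
        result = list(self._children)  # the single copy
        if add_self:
            result.append(self)  # extend the same list, no second copy
        result.sort(key=lambda n: n._ord)  # in place; reads _ord directly
        return result


def count_children(node):
    # Internal code that only reads can use _children and allocate nothing.
    return len(node._children)
```

User code keeps calling `children()`, while core code that only iterates or counts can read `_children` directly and skip the allocation.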
@martinpopel martinpopel merged commit f131d31 into master Feb 9, 2021
@martinpopel martinpopel deleted the coref branch February 9, 2021 11:07
@dan-zeman dan-zeman removed their request for review February 9, 2021 11:11