
@michnov
Contributor

@michnov michnov commented Feb 9, 2021

Calling tokenize_tag_parse_tree with resegment=False first runs the underlying UDPipe, which returns a sequence of tokens, potentially grouped into sentence segments. The nested sequence is then flattened so that all tokens belong to the same segment. However, this was not reflected in the root.text attribute, which was always assigned the value returned by ufal.udpipe.Sentence.getText(). Apparently, instead of recomputing its return value on the fly, the getter returns a value pre-computed during processing.

If we do not want the text to be resegmented, the value of root.text must stay the same, i.e. it must remain the whole original text.
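
A minimal sketch of the intended behaviour, under stated assumptions and not the actual code of this PR: `Segment` and `flatten_keep_text` are hypothetical stand-ins, with `Segment.text` playing the role of the pre-computed per-sentence text described above. It only illustrates why the flattened result should carry the caller's original text rather than the text of the first segment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for one sentence returned by the segmenter."""
    tokens: List[str]
    text: str  # pre-computed for this segment only, never recomputed later


def flatten_keep_text(segments: List[Segment], original_text: str) -> Segment:
    """Merge all segments into a single one and keep the original text."""
    tokens = [tok for seg in segments for tok in seg.tokens]
    # Do not reuse segments[0].text here: it covers only the first sentence.
    return Segment(tokens=tokens, text=original_text)


original = "Hello world. How are you?"
segments = [
    Segment(["Hello", "world", "."], "Hello world."),
    Segment(["How", "are", "you", "?"], "How are you?"),
]
merged = flatten_keep_text(segments, original)
print(merged.text)
print(len(merged.tokens))
```

Taking the text from the first segment instead would silently drop everything after the first sentence boundary, even though all tokens end up in the same segment.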

…ole text, not just the first sentence after segmentation
@martinpopel martinpopel merged commit 4bb1908 into master Feb 9, 2021
@martinpopel
Contributor

Thanks.

@martinpopel martinpopel deleted the no_resegment_text branch February 9, 2021 11:04
