Merged
2 changes: 1 addition & 1 deletion .readthedocs.yaml
@@ -26,5 +26,5 @@ sphinx:
# Optionally declare the Python requirements required to build your docs
python:
install:
- requirements: requirements.txt
- requirements: requirements-dev.txt
- requirements: docs/requirements-doc.txt
119 changes: 31 additions & 88 deletions docs/source/annotation.rst
@@ -18,30 +18,13 @@ The *to_string* method returns a string representation containing three tabulate

For example:

.. code-block:: python
.. literalinclude:: ../../tests/test_doc.py
:language: python
:dedent:
:linenos:
:emphasize-lines: 11,12,13

from iamsystem import Matcher, Abbreviations, Term
matcher = Matcher()
abb = Abbreviations(name="abbs")
abb.add(short_form="infect", long_form="infectious", tokenizer=matcher)
matcher.add_fuzzy_algo(abb)
term = Term(label="infectious disease", code="D007239")
matcher.add_keywords(keywords=[term])
text = "Infect mononucleosis disease"
annots = matcher.annot_text(text=text, w=2)
for annot in annots:
print(annot)
print(annot.to_string(text=text))
print(annot.to_string(text=text, debug=True))


.. code-block:: pycon

# Infect disease 0 6;21 28 infectious disease (D007239)
# Infect disease 0 6;21 28 infectious disease (D007239) Infect mononucleosis disease
# Infect disease 0 6;21 28 infectious disease (D007239) Infect mononucleosis disease infect(abbs);disease(exact)
:emphasize-lines: 18,19,20
:start-after: # start_test_annotation_format
:end-before: # end_test_annotation_format

Passing the document to the *to_string* function adds the document substring
that begins at the first token start offset and ends at the last token end offset.
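The offsets printed by the removed examples (*0 6;21 28* for "Infect ... disease", *14 25* for "lung cancer", *0 5;16 23* for "North ... America") suggest that tokens separated only by whitespace are reported as one span, while a gap containing other text starts a new span. The sketch below reproduces that pattern, plus the substring that *to_string* adds, using the standard library only. The helpers *merge_spans*, *format_offsets* and *covered_substring* are hypothetical, and the merging rule is an assumption inferred from those outputs, not iamsystem's implementation.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def merge_spans(text: str, tokens: List[Span]) -> List[Span]:
    """Merge token offsets when only whitespace separates them (assumed rule)."""
    spans: List[Span] = []
    for start, end in sorted(tokens):
        if spans and text[spans[-1][1]:start].strip() == "":
            # Whitespace-only gap: extend the previous span.
            spans[-1] = (spans[-1][0], end)
        else:
            spans.append((start, end))
    return spans


def format_offsets(text: str, tokens: List[Span]) -> str:
    """Render spans as 'start end' pairs joined by ';'."""
    return ";".join(f"{s} {e}" for s, e in merge_spans(text, tokens))


def covered_substring(text: str, tokens: List[Span]) -> str:
    """Document substring from the first token start to the last token end."""
    return text[min(s for s, _ in tokens):max(e for _, e in tokens)]


text = "Infect mononucleosis disease"
tokens = [(0, 6), (21, 28)]  # "Infect", "disease"
print(format_offsets(text, tokens))     # 0 6;21 28
print(covered_substring(text, tokens))  # Infect mononucleosis disease
print(format_offsets("Presence of a lung cancer", [(14, 18), (19, 25)]))  # 14 25
print(format_offsets("North and South America", [(0, 5), (16, 23)]))      # 0 5;16 23
```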
@@ -58,27 +41,12 @@ This happens if two terms have the same label but
also if the normalization process removes punctuation or if stopwords are ignored.
In the example below, only one annotation is produced and it has 3 keywords:

.. code-block:: python

from iamsystem import Matcher, english_tokenizer, Term
term1 = Term(label="Infectious Disease", code="J80")
term2 = Term(label="infectious disease", code="C0042029")
term3 = Term(label="infectious disease, unspecified", code="C0042029")
tokenizer = english_tokenizer()
matcher = Matcher(tokenizer=tokenizer)
matcher.add_stopwords(words=["unspecified"])
matcher.add_keywords(keywords=[term1, term2, term3])
text = "History of infectious disease"
annots = matcher.annot_text(text=text)
annot = annots[0]
for keyword in annot.keywords:
print(keyword)

.. code-block:: pycon

# Infectious Disease (J80)
# infectious disease (C0042029)
# infectious disease, unspecified (C0042029)
.. literalinclude:: ../../tests/test_doc.py
:language: python
:dedent:
:linenos:
:start-after: # start_test_annotation_multiple_keywords
:end-before: # end_test_annotation_multiple_keywords

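In the multiple-keywords example, case differences are normalized, the comma is dropped and *unspecified* is a stopword, so all three labels reduce to the same token sequence. The toy normalizer below (lowercasing, keeping alphanumeric runs, skipping stopwords) reproduces that grouping. *normalize* and the grouping dictionary are hypothetical illustrations; they do not describe how iamsystem's *english_tokenizer* or matcher actually work internally.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

STOPWORDS: Set[str] = {"unspecified"}


def normalize(label: str) -> Tuple[str, ...]:
    """Lowercase, keep alphanumeric runs (drops punctuation), skip stopwords."""
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    return tuple(t for t in tokens if t not in STOPWORDS)


terms = [
    ("Infectious Disease", "J80"),
    ("infectious disease", "C0042029"),
    ("infectious disease, unspecified", "C0042029"),
]

# Terms with the same normalized token sequence end up in the same group.
groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
for label, code in terms:
    groups[normalize(label)].append(f"{label} ({code})")

for keyword in groups[("infectious", "disease")]:
    print(keyword)
```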
Overlapping and ancestors
^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -98,43 +66,24 @@ Furthermore, if a1 has all the tokens of a2 then a2 is called a **nested annotat
By default, the :ref:`matcher:Matcher` removes nested annotation.
For example:

.. code-block:: python
.. literalinclude:: ../../tests/test_doc.py
:language: python
:dedent:
:linenos:
:emphasize-lines: 5, 10

from iamsystem import Matcher
matcher = Matcher()
matcher.add_labels(labels=["lung", "lung cancer"])
text = "Presence of a lung cancer"
annots = matcher.annot_text(text=text, w=1)
for annot in annots:
print(annot)
# lung cancer 14 25 lung cancer
self.assertEqual("lung cancer 14 25 lung cancer", str(annots[0]))
matcher.remove_nested_annots = False
annots = matcher.annot_text(text=text, w=1)
for annot in annots:
print(annot)
# lung 14 18 lung
# lung cancer 14 25 lung cancer
:emphasize-lines: 6,11
:start-after: # start_test_annotation_overlapping_ancestors
:end-before: # end_test_annotation_overlapping_ancestors

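The nested-annotation rule can be sketched by representing each annotation as the set of token indices it covers: an annotation is nested when its set is a strict subset of another annotation's set. *remove_nested* is a hypothetical helper that only illustrates the definition; it is not how the Matcher implements *remove_nested_annots*. The second case shows why covering the same character range is not enough: "North America" spans offsets 0 to 23 but does not contain the "South" token.

```python
from typing import Dict, FrozenSet


def remove_nested(annots: Dict[str, FrozenSet[int]]) -> Dict[str, FrozenSet[int]]:
    """Drop every annotation whose tokens are a strict subset of another's."""
    return {
        label: tokens
        for label, tokens in annots.items()
        if not any(tokens < other for other in annots.values())
    }


# "Presence of a lung cancer": token 3 = "lung", token 4 = "cancer".
lung = {"lung": frozenset({3}), "lung cancer": frozenset({3, 4})}
print(sorted(remove_nested(lung)))  # ['lung cancer']

# "North and South America": 0 = "North", 2 = "South", 3 = "America".
america = {"North America": frozenset({0, 3}), "South America": frozenset({2, 3})}
print(sorted(remove_nested(america)))  # ['North America', 'South America']
```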

Another example where the first annotation fully overlaps the second but the latter is not
a nested annotation:

.. code-block:: python

from iamsystem import Matcher
matcher = Matcher()
matcher.add_labels(labels=["North America", "South America"])
text = "North and South America"
annots = matcher.annot_text(text=text, w=3)
for annot in annots:
print(annot)

.. code-block:: pycon

# North America 0 5;16 23 North America
# South America 10 23 South America
.. literalinclude:: ../../tests/test_doc.py
:language: python
:dedent:
:linenos:
:start-after: # start_test_annotation_overlapping_not_ancestors
:end-before: # end_test_annotation_overlapping_not_ancestors

The first annotation, starting at offset 0 and ending at offset 23, fully overlaps the second.
However, it doesn't have all the tokens of the second annotation,
@@ -155,19 +104,13 @@ Partial overlapping
Definition: let a1 and a2 be two annotations. If a1.start < a2.start and a2.start < a1.end
then we say that a1 **partially overlaps** a2.
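The definition translates directly into a chained comparison. In this sketch, *Span* is a hypothetical stand-in holding an annotation's overall start and end offsets, not iamsystem's annotation class.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an annotation's overall offsets."""
    label: str
    start: int
    end: int


def partially_overlaps(a1: Span, a2: Span) -> bool:
    """a1 partially overlaps a2 if a1.start < a2.start and a2.start < a1.end."""
    return a1.start < a2.start < a1.end


lung_cancer = Span("lung cancer", 0, 11)
cancer_prognosis = Span("cancer prognosis", 5, 21)
print(partially_overlaps(lung_cancer, cancer_prognosis))  # True
print(partially_overlaps(cancer_prognosis, lung_cancer))  # False
```

Note that, as written, the condition is also true when a1 fully covers a2, for example with offsets 0-23 and 10-23 from the "North and South America" example.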

.. code-block:: python

from iamsystem import Matcher
matcher = Matcher()
matcher.add_labels(labels=["lung cancer", "cancer prognosis"])
annots = matcher.annot_text(text="lung cancer prognosis")
for annot in annots:
print(annot)

.. code-block:: pycon

# lung cancer 0 11 lung cancer
# cancer prognosis 5 21 cancer prognosis
.. literalinclude:: ../../tests/test_doc.py
:language: python
:dedent:
:linenos:
:start-after: # start_test_annotation_partial_overlap
:end-before: # end_test_annotation_partial_overlap

The first annotation partially overlaps the second because it ends after the second starts.
In this example, both annotations share the *"cancer"* token.
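Every example in this file is now pulled from *tests/test_doc.py* with *literalinclude*, between a start and an end marker comment. The sketch below is a rough, standard-library approximation of what *:start-after:*, *:end-before:* and a bare *:dedent:* select; it is not Sphinx's code, and the test-file excerpt in it is hypothetical.

```python
import textwrap

# Hypothetical excerpt of tests/test_doc.py; the real file is not part of this diff.
SOURCE = """\
class DocTest(unittest.TestCase):
    def test_annotation_format(self):
        # start_test_annotation_format
        from iamsystem import Matcher
        matcher = Matcher()
        # end_test_annotation_format
        self.assertTrue(True)
"""


def extract(source: str, start_after: str, end_before: str) -> str:
    """Rough approximation of :start-after:, :end-before: and a bare :dedent:."""
    lines = source.splitlines(keepends=True)
    # Keep what follows the first line containing the start marker.
    start = next((i for i, line in enumerate(lines) if start_after in line), None)
    if start is None:
        raise ValueError(f"marker not found: {start_after!r}")
    body = lines[start + 1:]
    # Stop before the first remaining line containing the end marker.
    end = next((i for i, line in enumerate(body) if end_before in line), len(body))
    # Strip the indentation shared by all kept lines.
    return textwrap.dedent("".join(body[:end]))


snippet = extract(SOURCE, "# start_test_annotation_format", "# end_test_annotation_format")
print(snippet, end="")
```

Two consequences follow if this approximation holds. Because markers are matched as substrings of a line, no marker should be contained in another (a hypothetical *# start_test_brat_doc* would also match *# start_test_brat_document*). With *:linenos:*, the numbering and *:emphasize-lines:* refer to the extracted snippet, so the emphasized line numbers need updating whenever lines are added inside a marked region.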
14 changes: 14 additions & 0 deletions docs/source/api_doc.rst
@@ -238,7 +238,21 @@ ESpellWiseAlgo
:undoc-members:
:show-inheritance:

SimString
^^^^^^^^^
SimStringWrapper
""""""""""""""""
.. autoclass:: iamsystem.fuzzy.simstring.SimStringWrapper
:members:
:undoc-members:
:show-inheritance:

ESimStringMeasure
"""""""""""""""""
.. autoclass:: iamsystem.fuzzy.simstring.ESimStringMeasure
:members:
:undoc-members:
:show-inheritance:

Brat
----
77 changes: 20 additions & 57 deletions docs/source/brat.rst
@@ -14,29 +14,19 @@ The class :ref:`api_doc:BratDocument` can store **Brat entities** and **Brat not
Each entity corresponds to an annotation:

- An ID
- A Brat type that should be declared in Brat's configuration file (annotation.conf)
- A Brat type declared in Brat's configuration file (annotation.conf)
- start-end offsets
- text substring

.. code-block:: python
:linenos:
:emphasize-lines: 8

from iamsystem import Matcher, Term, BratDocument
matcher = Matcher()
term1 = Term(label="North America", code="NA")
matcher.add_keywords(keywords=[term1])
text = "North and South America"
annots = matcher.annot_text(text=text, w=3)
brat_document = BratDocument()
brat_document.add_annots(annots, text=text, brat_type="CONTINENT", keyword_attr=None)
print(str(brat_document))


.. code-block:: pycon
.. literalinclude:: ../../tests/test_doc.py
:language: python
:dedent:
:linenos:
:emphasize-lines: 12
:start-after: # start_test_brat_document
:end-before: # end_test_brat_document

# T1 CONTINENT 0 5;16 23 North America
# #1 IAMSYSTEM T1 North America (NA)

The first line is the brat entity, the second is the brat note. T1 is the ID of the brat entity.
Each note is linked to a brat entity by its ID, here T1.
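Both lines follow brat's standoff (.ann) format, in which fields are tab-separated: an entity line is *ID, TYPE START END, covered text* (discontinuous fragments separated by ";"), and a note line is *ID, TYPE TARGET, note text*. The hypothetical helpers below only show how such lines are assembled from the offsets in the example; they are not *BratDocument*'s code.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def brat_entity_line(entity_id: str, brat_type: str, spans: List[Span], text: str) -> str:
    """Entity line: ID <TAB> TYPE OFFSETS <TAB> covered text."""
    offsets = ";".join(f"{start} {end}" for start, end in spans)
    # For discontinuous spans, the fragments are joined with a single space.
    covered = " ".join(text[start:end] for start, end in spans)
    return f"{entity_id}\t{brat_type} {offsets}\t{covered}"


def brat_note_line(note_id: str, note_type: str, entity_id: str, note: str) -> str:
    """Note line: ID <TAB> TYPE TARGET_ID <TAB> note text."""
    return f"{note_id}\t{note_type} {entity_id}\t{note}"


text = "North and South America"
print(brat_entity_line("T1", "CONTINENT", [(0, 5), (16, 23)], text))
print(brat_note_line("#1", "IAMSYSTEM", "T1", "North America (NA)"))
```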
@@ -50,49 +40,22 @@ applies to all annotations.
If you have multiple Brat types, a better way to do this is to store the Brat type
in a :ref:`api_doc:Keyword` subclass attribute and to pass the attribute name to the *add_annots* function:

.. code-block:: python
.. literalinclude:: ../../tests/test_doc.py
:language: python
:dedent:
:linenos:
:emphasize-lines: 14

from iamsystem import Term
class Entity(Term):
def __init__(self, label: str, code: str, brat_type: str):
super().__init__(label, code)
self.brat_type = brat_type

from iamsystem import Matcher, BratDocument
matcher = Matcher()
term1 = Entity(label="North America", code="NA", brat_type="CONTINENT")
matcher.add_keywords(keywords=[term1])
text = "North and South America"
annots = matcher.annot_text(text=text, w=3)
brat_document = BratDocument()
brat_document.add_annots(annots=annots, text=text, keyword_attr='brat_type')
print(str(brat_document))

.. code-block:: pycon

# T1 CONTINENT 0 5;16 23 North America
# #1 IAMSYSTEM T1 North America (NA)
:emphasize-lines: 18
:start-after: # start_test_brat_doc_keyword
:end-before: # end_test_brat_doc_keyword

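Passing *keyword_attr* amounts to reading the Brat type from each keyword by attribute name instead of applying one fixed *brat_type* to every annotation. A plausible sketch with *getattr*, using a hypothetical *Entity* dataclass and *resolve_brat_type* helper rather than iamsystem's *Term* and *BratDocument*; the precedence given to *keyword_attr* here is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    """Hypothetical keyword carrying its own Brat type."""
    label: str
    code: str
    brat_type: str


def resolve_brat_type(keyword: object, brat_type: Optional[str], keyword_attr: Optional[str]) -> str:
    """Prefer the attribute named by keyword_attr (assumed); else use the fixed brat_type."""
    if keyword_attr is not None:
        return str(getattr(keyword, keyword_attr))
    if brat_type is None:
        raise ValueError("either brat_type or keyword_attr must be given")
    return brat_type


continent = Entity(label="North America", code="NA", brat_type="CONTINENT")
country = Entity(label="Canada", code="CA", brat_type="COUNTRY")  # hypothetical second type
print(resolve_brat_type(continent, None, "brat_type"))  # CONTINENT
print(resolve_brat_type(country, None, "brat_type"))    # COUNTRY
print(resolve_brat_type(country, "PLACE", None))        # PLACE
```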
Brat Writer
^^^^^^^^^^^^^
^^^^^^^^^^^

This package provides a utility class to write a :ref:`api_doc:BratDocument`.

.. code-block:: python
.. literalinclude:: ../../tests/test_doc.py
:language: python
:dedent:
:linenos:
:emphasize-lines: 11,12

from iamsystem import Matcher, Term, BratDocument, BratWriter
matcher = Matcher()
term1 = Term(label="North America", code="NA")
matcher.add_keywords(keywords=[term1])
text = "North and South America"
annots = matcher.annot_text(text=text, w=3)
brat_document = BratDocument()
brat_document.add_annots(annots, text=text, brat_type="CONTINENT")
filename = "./doc.ann"
with(open(filename, 'w')) as f:
BratWriter.saveEntities(brat_entities=brat_document.get_entities(), write=f.write)
BratWriter.saveNotes(brat_notes=brat_document.get_notes(), write=f.write)
:start-after: # start_test_brat_writer
:end-before: # end_test_brat_writer
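The removed example passed *write=f.write* to *BratWriter.saveEntities* and *BratWriter.saveNotes* rather than a file name, so the same call can target a file, an in-memory buffer or a list. The sketch below illustrates that design with a hypothetical *save_lines* function and already formatted lines; it is not *BratWriter*'s implementation.

```python
import io
from typing import Callable, Iterable, List


def save_lines(lines: Iterable[str], write: Callable[[str], object]) -> None:
    """Emit one newline-terminated line per item through any write-like callable."""
    for line in lines:
        write(line + "\n")


entities = ["T1\tCONTINENT 0 5;16 23\tNorth America"]
notes = ["#1\tIAMSYSTEM T1\tNorth America (NA)"]

# Entities first, then notes, into an in-memory buffer.
buffer = io.StringIO()
save_lines(entities, write=buffer.write)
save_lines(notes, write=buffer.write)
print(buffer.getvalue(), end="")

# The same function can collect lines into a list, which is handy in tests.
collected: List[str] = []
save_lines(entities + notes, write=collected.append)
```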