scossin · scossin · Feb 12, 2023 · Feb 2, 2023 · Feb 4, 2023 · Feb 6, 2023
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -26,5 +26,5 @@ sphinx:
 # Optionally declare the Python requirements required to build your docs
 python:
  install:
- - requirements: requirements.txt
+ - requirements: requirements-dev.txt
  - requirements: docs/requirements-doc.txt
diff --git a/docs/source/annotation.rst b/docs/source/annotation.rst
@@ -18,30 +18,13 @@ The *to_string* method returns a string representation containing three tabulate
 
 For example:
 
-.. code-block:: python
+.. literalinclude:: ../../tests/test_doc.py
+ :language: python
+ :dedent:
  :linenos:
- :emphasize-lines: 11,12,13
-
- from iamsystem import Matcher, Abbreviations, Term
- matcher = Matcher()
- abb = Abbreviations(name="abbs")
- abb.add(short_form="infect", long_form="infectious", tokenizer=matcher)
- matcher.add_fuzzy_algo(abb)
- term = Term(label="infectious disease", code="D007239")
- matcher.add_keywords(keywords=[term])
- text = "Infect mononucleosis disease"
- annots = matcher.annot_text(text=text, w=2)
- for annot in annots:
-     print(annot)
-     print(annot.to_string(text=text))
-     print(annot.to_string(text=text, debug=True))
-
-
-.. code-block:: pycon
-
- # Infect disease	0 6;21 28	infectious disease (D007239)
- # Infect disease	0 6;21 28	infectious disease (D007239)	Infect mononucleosis disease
- # Infect disease	0 6;21 28	infectious disease (D007239)	Infect mononucleosis disease	infect(abbs);disease(exact)
+ :emphasize-lines: 18,19,20
+ :start-after: # start_test_annotation_format
+ :end-before: # end_test_annotation_format
 
 Passing the document to the *to_string* function adds the document substring
 that begins at the first token start offset and ends at the last token end offset.
@@ -58,27 +41,12 @@ This happens if two terms have the same label but
 also if the normalization process removes punctuation or if stopwords are ignored.
 In the example below, only one annotation is produced and it has 3 keywords:
 
-.. code-block:: python
-
- from iamsystem import Matcher, english_tokenizer, Term
- term1 = Term(label="Infectious Disease", code="J80")
- term2 = Term(label="infectious disease", code="C0042029")
- term3 = Term(label="infectious disease, unspecified", code="C0042029")
- tokenizer = english_tokenizer()
- matcher = Matcher(tokenizer=tokenizer)
- matcher.add_stopwords(words=["unspecified"])
- matcher.add_keywords(keywords=[term1, term2, term3])
- text = "History of infectious disease"
- annots = matcher.annot_text(text=text)
- annot = annots[0]
- for keyword in annot.keywords:
-     print(keyword)
-
-.. code-block:: pycon
-
- # Infectious Disease (J80)
- # infectious disease (C0042029)
- # infectious disease, unspecified (C0042029)
+.. literalinclude:: ../../tests/test_doc.py
+ :language: python
+ :dedent:
+ :linenos:
+ :start-after: # start_test_annotation_multiple_keywords
+ :end-before: # end_test_annotation_multiple_keywords
 
 Overlapping and ancestors
 ^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -98,43 +66,24 @@ Furthermore, if a1 has all the tokens of a2 then a2 is called a **nested annotat
 By default, the :ref:`matcher:Matcher` removes nested annotations.
 For example:
 
-.. code-block:: python
+.. literalinclude:: ../../tests/test_doc.py
+ :language: python
+ :dedent:
  :linenos:
- :emphasize-lines: 5, 10
-
- from iamsystem import Matcher
- matcher = Matcher()
- matcher.add_labels(labels=["lung", "lung cancer"])
- text = "Presence of a lung cancer"
- annots = matcher.annot_text(text=text, w=1)
- for annot in annots:
-     print(annot)
- # lung cancer	14 25	lung cancer
- self.assertEqual("lung cancer	14 25	lung cancer", str(annots[0]))
- matcher.remove_nested_annots = False
- annots = matcher.annot_text(text=text, w=1)
- for annot in annots:
-     print(annot)
- # lung	14 18	lung
- # lung cancer	14 25	lung cancer
+ :emphasize-lines: 6,11
+ :start-after: # start_test_annotation_overlapping_ancestors
+ :end-before: # end_test_annotation_overlapping_ancestors
+
 
 Another example where the first annotation fully overlaps the second but the latter is not
 a nested annotation:
 
-.. code-block:: python
-
- from iamsystem import Matcher
- matcher = Matcher()
- matcher.add_labels(labels=["North America", "South America"])
- text = "North and South America"
- annots = matcher.annot_text(text=text, w=3)
- for annot in annots:
-     print(annot)
-
-.. code-block:: pycon
-
- # North America	0 5;16 23	North America
- # South America	10 23	South America
+.. literalinclude:: ../../tests/test_doc.py
+ :language: python
+ :dedent:
+ :linenos:
+ :start-after: # start_test_annotation_overlapping_not_ancestors
+ :end-before: # end_test_annotation_overlapping_not_ancestors
 
 The first annotation, starting at offset 0 and ending at offset 23, fully overlaps the second.
 However, it doesn't have all the tokens of the second annotation,
@@ -155,19 +104,13 @@ Partial overlapping
 Definition: let a1 and a2 be two annotations. If a1.start < a2.start and a2.start < a1.end
 then we say that a1 **partially overlaps** a2.
 
-.. code-block:: python
-
- from iamsystem import Matcher
- matcher = Matcher()
- matcher.add_labels(labels=["lung cancer", "cancer prognosis"])
- annots = matcher.annot_text(text="lung cancer prognosis")
- for annot in annots:
-     print(annot)
 
-.. code-block:: pycon
-
- # lung cancer	0 11	lung cancer
- # cancer prognosis	5 21	cancer prognosis
+.. literalinclude:: ../../tests/test_doc.py
+ :language: python
+ :dedent:
+ :linenos:
+ :start-after: # start_test_annotation_partial_overlap
+ :end-before: # end_test_annotation_partial_overlap
 
 The first annotation partially overlaps the second because it ends after the second starts.
 In this example, both annotations share the *"cancer"* token.

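Review note on the annotation.rst hunks above: the definitions of partial overlap, full overlap and nesting can be checked against the offsets that the removed `pycon` blocks printed. The following is only a minimal sketch. `Span`, `partially_overlaps`, `fully_overlaps` and `is_nested` are hypothetical names, not part of the iamsystem API, and the full-overlap condition is an assumption because its definition is not visible in this diff.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for an annotation: one (start, end) pair per token."""

    tokens: Tuple[Tuple[int, int], ...]

    @property
    def start(self) -> int:
        # Start offset of the first token.
        return self.tokens[0][0]

    @property
    def end(self) -> int:
        # End offset of the last token.
        return self.tokens[-1][1]


def partially_overlaps(a1: Span, a2: Span) -> bool:
    # Definition quoted in the last hunk: a1.start < a2.start < a1.end.
    return a1.start < a2.start and a2.start < a1.end


def fully_overlaps(a1: Span, a2: Span) -> bool:
    # Assumed condition: a2 lies inside the character range covered by a1.
    return a1.start <= a2.start and a2.end <= a1.end


def is_nested(a1: Span, a2: Span) -> bool:
    # a2 is nested in a1 when a1 has all the tokens of a2.
    return set(a2.tokens) <= set(a1.tokens)


# Offsets taken from the outputs shown in the removed lines.
lung = Span(((14, 18),))
lung_cancer_in_sentence = Span(((14, 18), (19, 25)))
north_america = Span(((0, 5), (16, 23)))
south_america = Span(((10, 15), (16, 23)))
lung_cancer = Span(((0, 4), (5, 11)))
cancer_prognosis = Span(((5, 11), (12, 21)))

print(is_nested(lung_cancer_in_sentence, lung))
print(fully_overlaps(north_america, south_america))
print(is_nested(north_america, south_america))
print(partially_overlaps(lung_cancer, cancer_prognosis))
```

With these offsets, "lung" is nested in "lung cancer", "North America" fully overlaps "South America" without containing its "South" token, and "lung cancer" partially overlaps "cancer prognosis".
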
diff --git a/docs/source/api_doc.rst b/docs/source/api_doc.rst
@@ -238,7 +238,21 @@ ESpellWiseAlgo
  :undoc-members:
  :show-inheritance:
 
+SimString
+^^^^^^^^^
+SimStringWrapper
+""""""""""""""""
+.. autoclass:: iamsystem.fuzzy.simstring.SimStringWrapper
+ :members:
+ :undoc-members:
+ :show-inheritance:
 
+ESimStringMeasure
+"""""""""""""""""
+.. autoclass:: iamsystem.fuzzy.simstring.ESimStringMeasure
+ :members:
+ :undoc-members:
+ :show-inheritance:
 
 Brat
 ----

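Review note: the documentation hunks in this diff replace inline code blocks with `literalinclude` directives that select a region of `tests/test_doc.py` through `:start-after:` and `:end-before:` marker comments, then apply `:dedent:`. The sketch below only approximates that selection in plain Python so the marker convention is easier to reason about. It is not Sphinx's implementation, and `extract_between`, `SAMPLE` and the `start_test_example` / `end_test_example` markers are invented for illustration.

```python
import textwrap


def extract_between(source: str, start_after: str, end_before: str) -> str:
    """Return the dedented lines strictly between two marker lines."""
    lines = source.splitlines()
    # Index of the first line containing the start marker
    # (raises StopIteration if the marker is missing).
    start = next(i for i, line in enumerate(lines) if start_after in line)
    # Index of the first later line containing the end marker.
    end = next(i for i in range(start + 1, len(lines)) if end_before in lines[i])
    # Strip the indentation shared by the selected lines.
    return textwrap.dedent("\n".join(lines[start + 1:end]))


# Invented test file content, shaped like a test method with marker comments.
SAMPLE = '''\
class DocTest:
    def test_example(self):
        # start_test_example
        labels = ["lung", "lung cancer"]
        for label in labels:
            print(label)
        # end_test_example
        assert len(labels) == 2
'''

snippet = extract_between(SAMPLE, "# start_test_example", "# end_test_example")
print(snippet)
```

In this approximation the assertion placed after the end marker stays in the test but does not reach the rendered documentation.
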
diff --git a/docs/source/brat.rst b/docs/source/brat.rst
@@ -14,29 +14,19 @@ The class :ref:`api_doc:BratDocument` can store **Brat entities** and **Brat not
 Each entity corresponds to an annotation:
 
 - An ID
-- A Brat type that should be declared in Brat's configuration file (annotation.conf)
+- A Brat type declared in Brat's configuration file (annotation.conf)
 - start-end offsets
 - text substring
 
-.. code-block:: python
- :linenos:
- :emphasize-lines: 8
-
- from iamsystem import Matcher, Term, BratDocument
- matcher = Matcher()
- term1 = Term(label="North America", code="NA")
- matcher.add_keywords(keywords=[term1])
- text = "North and South America"
- annots = matcher.annot_text(text=text, w=3)
- brat_document = BratDocument()
- brat_document.add_annots(annots, text=text, brat_type="CONTINENT", keyword_attr=None)
- print(str(brat_document))
-
 
-.. code-block:: pycon
+.. literalinclude:: ../../tests/test_doc.py
+ :language: python
+ :dedent:
+ :linenos:
+ :emphasize-lines: 12
+ :start-after: # start_test_brat_document
+ :end-before: # end_test_brat_document
 
- # T1	CONTINENT 0 5;16 23	North America
- # #1	IAMSYSTEM T1	North America (NA)
 
 The first line is the brat entity, the second is the brat note. T1 is the ID of the brat entity.
 Each note is linked to a brat entity by its ID, here T1.
@@ -50,49 +40,22 @@ applies to all annotations.
 If you have multiple Brat types, a better way to do this is to store the Brat type
 in a :ref:`api_doc:Keyword` subclass attribute and to pass the attribute name to the *add_annots* function:
 
-.. code-block:: python
+.. literalinclude:: ../../tests/test_doc.py
+ :language: python
+ :dedent:
  :linenos:
- :emphasize-lines: 14
-
- from iamsystem import Term
- class Entity(Term):
-     def __init__(self, label: str, code: str, brat_type: str):
-         super().__init__(label, code)
-         self.brat_type = brat_type
-
- from iamsystem import Matcher, BratDocument
- matcher = Matcher()
- term1 = Entity(label="North America", code="NA", brat_type="CONTINENT")
- matcher.add_keywords(keywords=[term1])
- text = "North and South America"
- annots = matcher.annot_text(text=text, w=3)
- brat_document = BratDocument()
- brat_document.add_annots(annots=annots, text=text, keyword_attr='brat_type')
- print(str(brat_document))
-
-.. code-block:: pycon
-
- # T1	CONTINENT 0 5;16 23	North America
- # #1	IAMSYSTEM T1	North America (NA)
+ :emphasize-lines: 18
+ :start-after: # start_test_brat_doc_keyword
+ :end-before: # end_test_brat_doc_keyword
 
 Brat Writer
-^^^^^^^^^^^^^
+^^^^^^^^^^^
 
 This package provides a utility class to write a :ref:`api_doc:BratDocument`.
 
-.. code-block:: python
+.. literalinclude:: ../../tests/test_doc.py
+ :language: python
+ :dedent:
  :linenos:
- :emphasize-lines: 11,12
-
- from iamsystem import Matcher, Term, BratDocument, BratWriter
- matcher = Matcher()
- term1 = Term(label="North America", code="NA")
- matcher.add_keywords(keywords=[term1])
- text = "North and South America"
- annots = matcher.annot_text(text=text, w=3)
- brat_document = BratDocument()
- brat_document.add_annots(annots, text=text, brat_type="CONTINENT")
- filename = "./doc.ann"
- with(open(filename, 'w')) as f:
-     BratWriter.saveEntities(brat_entities=brat_document.get_entities(), write=f.write)
-     BratWriter.saveNotes(brat_notes=brat_document.get_notes(), write=f.write)
+ :start-after: # start_test_brat_writer
+ :end-before: # end_test_brat_writer
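
Review note on the brat.rst hunks above: the removed `pycon` blocks showed one Brat entity line and one Brat note line. As a rough illustration of that tab-separated layout, the sketch below rebuilds both lines from offsets. `brat_entity_line` and `brat_note_line` are hypothetical helpers written for this note. They are not the `BratDocument` or `BratWriter` API, and the layout is inferred only from the output shown in the diff.

```python
from typing import Sequence, Tuple


def brat_entity_line(entity_id: str, brat_type: str,
                     spans: Sequence[Tuple[int, int]], text: str) -> str:
    # Discontinuous spans are joined with ';', each span as "start end".
    offsets = ";".join(f"{start} {end}" for start, end in spans)
    return f"{entity_id}\t{brat_type} {offsets}\t{text}"


def brat_note_line(note_id: str, note_type: str, entity_id: str, note: str) -> str:
    # A note refers to its entity by ID (here T1).
    return f"{note_id}\t{note_type} {entity_id}\t{note}"


entity = brat_entity_line("T1", "CONTINENT", [(0, 5), (16, 23)], "North America")
note = brat_note_line("#1", "IAMSYSTEM", "T1", "North America (NA)")
print(entity)
print(note)
```

The two printed lines match the entity and note lines from the removed output, with the note linked to the entity through the shared `T1` ID.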