Description
Is your suggestion for improvement related to a problem? Please describe.
MEDLINE records are indexed with headings and subheadings (MeSH terms), having a one-to-many relationship between headings and subheadings. PubMed displays the MeSH terms individually (in pairs), like this.
Notice that the heading "Kidney Diseases" repeats for each associated subheading, with trailing asterisks denoting "Major topics". This is not how the MeSH terms appear in PubMed exports, and therefore, not how JabRef imports them.
This is how the terms come in PubMed text files.
MH - *Kidney Diseases/diagnosis/epidemiology/physiopathology/therapy
JabRef imports this unchanged as one keyword:
*Kidney Diseases/diagnosis/epidemiology/physiopathology/therapy
This is how the terms appear in PubMed xml.
<MeshHeading>
  <DescriptorName UI="D007674" MajorTopicYN="Y">Kidney Diseases</DescriptorName>
  <QualifierName UI="Q000175" MajorTopicYN="N">diagnosis</QualifierName>
  <QualifierName UI="Q000453" MajorTopicYN="N">epidemiology</QualifierName>
  <QualifierName UI="Q000503" MajorTopicYN="N">physiopathology</QualifierName>
  <QualifierName UI="Q000628" MajorTopicYN="N">therapy</QualifierName>
</MeshHeading>
Again, JabRef imports this as one keyword, this time separating the subheadings with a comma:
Kidney Diseases, diagnosis, epidemiology, physiopathology, therapy
Describe the solution you'd like
I would like JabRef to import MeSH terms as individual keywords using the same format as PubMed where each heading has a maximum of one subheading and the major topic is displayed as an asterisk at the end of the heading or subheading string. Keywords generated from plain text or xml files from PubMed should have the same format in JabRef.
The keywords should look like this:
Kidney Diseases*/diagnosis
Kidney Diseases*/epidemiology
Kidney Diseases*/physiopathology
Kidney Diseases*/therapy
The BibTeX source should look like this (assuming the user-defined keyword separator is a semicolon):
Kidney Diseases*/diagnosis; Kidney Diseases*/epidemiology; Kidney Diseases*/physiopathology; Kidney Diseases*/therapy
Parsing MeSH terms this way lets the keywords fit better in the GUI and makes it easier to search and filter by keyword.
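To make the requested format concrete, here is a rough Python sketch of the expansion for both input formats. It is only an illustration, not JabRef code: the function names `pairs_from_text` and `pairs_from_xml` are made up, and it assumes that a slash never occurs inside a heading or subheading.

```python
import xml.etree.ElementTree as ET


def pairs_from_text(mh_value):
    """Expand 'Heading/sub1/sub2' (the part after 'MH - ') into heading/subheading pairs."""
    def star_to_end(term):
        # A leading asterisk marks a major topic; move it to the end of the term.
        return term[1:] + "*" if term.startswith("*") else term

    heading, *subheadings = [star_to_end(t.strip()) for t in mh_value.split("/")]
    if not subheadings:
        return [heading]
    return [f"{heading}/{sub}" for sub in subheadings]


def pairs_from_xml(mesh_heading_xml):
    """Build the same keywords from a single <MeshHeading> element."""
    def with_star(element):
        # MajorTopicYN="Y" plays the role of the asterisk in the text format.
        star = "*" if element.get("MajorTopicYN") == "Y" else ""
        return element.text.strip() + star

    root = ET.fromstring(mesh_heading_xml)
    heading = with_star(root.find("DescriptorName"))
    subheadings = [with_star(q) for q in root.findall("QualifierName")]
    if not subheadings:
        return [heading]
    return [f"{heading}/{sub}" for sub in subheadings]
```

Joining the result with the user-defined separator, for example `"; ".join(pairs_from_text(value))`, would give the keyword string for the BibTeX source.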
Additional context
Ideally, the MEDLINE importer (and other importers) would check whether the user-defined keyword separator occurs in the input, and warn or choose a substitution in case of conflict. List items appear one per line in PubMed text files, so the keyword separator should not be found in any line that begins with MH - .
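A conflict check along these lines could be as simple as the following Python sketch. The function name `find_separator_conflicts` is hypothetical, and the sketch only reports conflicts; whether to warn or substitute is left open.

```python
def find_separator_conflicts(lines, separator=";"):
    """Return the values of MH lines that contain the keyword separator."""
    conflicts = []
    for line in lines:
        # Split the field tag from its value at the first hyphen only,
        # so hyphens inside a heading stay part of the value.
        tag, dash, value = line.partition("-")
        if dash and tag.strip() == "MH" and separator in value:
            conflicts.append(value.strip())
    return conflicts
```

An importer could call this once per record before building the keyword list and only show a warning when the returned list is non-empty.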
* Parses the keyword list and uses {@link Keyword#DEFAULT_HIERARCHICAL_DELIMITER} as hierarchical delimiter.
Regex for moving asterisks to the end.
(?<slash>/{0,1})\*(?<subhead>.+?(?=/|$))(?<=^MH - .*)
Replace with
${slash}${subhead}*
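The unbounded lookbehind in that pattern is not accepted by every regex engine. Python's `re` module, for example, only allows fixed-width lookbehind. As a sketch of the same idea without the lookbehind, the MH prefix can be matched first and the substitution applied only to the rest of the line. The names `MH_LINE`, `LEADING_STAR` and `move_asterisks` are made up for this example, and each input line is assumed to have no trailing newline.

```python
import re

# Field tag, hyphen and surrounding spaces, then the value.
MH_LINE = re.compile(r"^(MH\s*-\s*)(.*)$")
# An asterisk at the start of a heading or subheading, with an optional preceding slash.
LEADING_STAR = re.compile(r"(?P<slash>/?)\*(?P<subhead>[^/]+)")


def move_asterisks(line):
    """Move major-topic asterisks to the end of each term, on MH lines only."""
    match = MH_LINE.match(line)
    if match is None:
        return line  # not an MH line, leave it untouched
    prefix, value = match.groups()
    return prefix + LEADING_STAR.sub(r"\g<slash>\g<subhead>*", value)
```

Lines that do not start with the MH tag pass through unchanged, which is what the lookbehind was meant to guarantee.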