πŸ“š CIMPLE Knowledge Graph (CimpleKG)

License: CC BY-NC-SA 4.0

The CIMPLE Knowledge Graph (CimpleKG) is a continuously updated large knowledge graph that has been created to help researchers combat misinformation. CimpleKG links information from fact-checking organizations with other datasets about misinformation, giving researchers a more comprehensive view of the problem.

This repository contains scripts to deploy the Knowledge Graph developed within the CIMPLE project.

Figure: Claim reviews per country (11/04/2024 data)

The data being loaded is available at https://github.com/CIMPLE-project/knowledge-base/releases and is updated on a daily (nightly) basis.

The source code to retrieve the body of a claim review from its URL is available in the claimreview-text-extractor repository.

We fully document the URI design patterns used to identify all objects in the knowledge graph.

The code that converts the daily updated Claim Reviews into RDF is available in the converter repository.

The Claim Reviews data integrated in the CimpleKG is available on the claimreview-data repository.

πŸ” Knowledge Graph Overview and Documentation

CimpleKG links daily updated data from 70+ fact-checking organisations with over 200k documents from static misinformation datasets. The knowledge graph is augmented with textual features and entities extracted from the textual data integrated into the graph. It contains more than 15M triples, including 263k+ distinct entities, 1M textual features, and over 203k fact-checked claims, spanning 26 languages and 36 countries. Detailed statistics can be found on the releases page.

A public SPARQL endpoint is available at https://data.cimple.eu/sparql and data releases are made available in this repository. The knowledge graph can also be loaded and queried locally (see Initialising the Knowledge Graph).

SPARQL query examples and additional documentation can be found on the documentation page.
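For illustration, the following query counts fact-checked claims on the public endpoint. This is a minimal sketch assuming fact-checks are typed as schema:ClaimReview; the documentation page lists the exact classes and properties used:

curl -G "https://data.cimple.eu/sparql" \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=PREFIX schema: <http://schema.org/>
SELECT (COUNT(?review) AS ?count)
WHERE { ?review a schema:ClaimReview }"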

RDF Namespaces

CimpleKG commonly uses the following namespaces and prefixes:

Prefix   URI
dc       http://purl.org/dc/elements/1.1/
rdf      http://www.w3.org/1999/02/22-rdf-syntax-ns#
rnews    http://iptc.org/std/rNews/2011-10-07#
schema   http://schema.org/
xsd      http://www.w3.org/2001/XMLSchema#

They can be imported into Virtuoso through the isql interface:

DB.DBA.XML_SET_NS_DECL ('dc', 'http://purl.org/dc/elements/1.1/', 2);
DB.DBA.XML_SET_NS_DECL ('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 2);
DB.DBA.XML_SET_NS_DECL ('rnews', 'http://iptc.org/std/rNews/2011-10-07#', 2);
DB.DBA.XML_SET_NS_DECL ('schema', 'http://schema.org/', 2);
DB.DBA.XML_SET_NS_DECL ('xsd', 'http://www.w3.org/2001/XMLSchema#', 2);
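Each declaration can also be run non-interactively from the host; a sketch following the same docker compose / isql-v pattern used elsewhere in this README:

docker compose exec -i virtuoso sh -c "isql-v -U dba -P \${DBA_PASSWORD} exec=\"DB.DBA.XML_SET_NS_DECL ('schema', 'http://schema.org/', 2);\""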

🚧 Initialising the Knowledge Graph

This section covers the steps required to set up a new Knowledge Base for the first time.

  1. Clone this repository.

    git clone https://github.com/CIMPLE-project/knowledge-base.git
    cd knowledge-base
  2. Copy the .env.example file to .env and edit it to set the environment variables accordingly (a sample .env is sketched after this list).

    • DBA_PASSWORD: Password for the Virtuoso database.
    • SPARQL_UPDATE: Enable SPARQL update queries.
    • VIRT_SPARQL_ResultSetMaxRows: Maximum number of rows to return in a SPARQL query.
    • VIRT_SPARQL_MaxQueryCostEstimationTime: Maximum time to estimate the cost of a SPARQL query.
    • VIRT_SPARQL_MaxQueryExecutionTime: Maximum time to execute a SPARQL query.
    • VIRTUOSO_DATA_PATH: Path to the Virtuoso data directory.
    • VIRTUOSO_PORT: Port to expose the Virtuoso database.
    • WHD_HOOK_TIMEOUT: Timeout for the webhook server.
    • GITHUB_TOKEN: GitHub token to create the releases.
    • CIMPLE_FACTORS_MODELS_PATH: Path to the CIMPLE factors models.
  3. Run docker compose to start the Virtuoso database and the webhook server.

    docker compose up -d
  4. Generate a password for the webhook server and restart the service.

    This step is optional but recommended if you plan to expose the webhook server to the internet.

    docker compose exec webhookd htpasswd -B -c /etc/webhookd/.htpasswd api
    docker compose restart webhookd
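For reference, a sample .env with purely illustrative values (the data path matches the example in the next section; every other value is an assumption to be adjusted for your deployment):

# Sample .env -- illustrative values only, adjust for your deployment
DBA_PASSWORD=change-me
SPARQL_UPDATE=true
VIRT_SPARQL_ResultSetMaxRows=10000
VIRT_SPARQL_MaxQueryCostEstimationTime=400
VIRT_SPARQL_MaxQueryExecutionTime=600
VIRTUOSO_DATA_PATH=/var/docker/cimple/virtuoso/data
VIRTUOSO_PORT=8890
WHD_HOOK_TIMEOUT=3600
GITHUB_TOKEN=<your-github-token>
CIMPLE_FACTORS_MODELS_PATH=/var/docker/cimple/models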

Loading data into the Knowledge base

  1. Copy all your RDF files into a dumps folder inside the data directory (defined by VIRTUOSO_DATA_PATH in the .env file).

    Directory structure example (in this case VIRTUOSO_DATA_PATH is set to /var/docker/cimple/virtuoso/data):

    • /var/docker/cimple/virtuoso/data/dumps/
      • iptc/*.ttl
      • agencefrancepresse/*.ttl
  2. Run the following command to load all dumps:

    The script deploy_all.sh will initialise the prefixes and load all the vocabularies, IPTC codes, and RDF dumps.

    docker compose exec virtuoso sh /scripts/deploy_all.sh
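Once loading finishes, a quick sanity check is to count triples per named graph; a sketch using the same isql-v pattern as the other commands in this README:

docker compose exec -i virtuoso sh -c "isql-v -U dba -P \${DBA_PASSWORD} exec=\"SPARQL SELECT ?g (COUNT(*) AS ?triples) WHERE { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g;\""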

Manually loading a specific file

You can also load specific files matching a pattern using the load.sh script.

(Note: make sure the files you wish to load have been copied to the dumps folder inside the Virtuoso data directory.)

For example, the following command will load all dumps in the folder "agencefrancepresse" whose names start with "2020_" and end with ".ttl":

docker compose exec virtuoso sh /scripts/load.sh -p5 -g "http://data.cimple.eu/agencefrancepresse/news" "agencefrancepresse" "2020_*.ttl"

To load all files from the folder "agencefrancepresse/FRA":

docker compose exec virtuoso sh /scripts/load.sh -p5 -g "http://data.cimple.eu/agencefrancepresse/news" "agencefrancepresse/FRA" "*.*"

Syntax: load.sh [options] [graph] [dir path] [file mask]

List of parameters:

-h --help      Show help
-p --parallel  Number of parallel threads for loading RDF data (through rdf_loader_run())
-g --graph     Name of the graph to load the data into
-c --clear     Clear the graph before loading
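For example, to re-load a graph from scratch, -c clears it before loading (a sketch reusing the graph and folder from the examples above):

docker compose exec virtuoso sh /scripts/load.sh -c -p5 -g "http://data.cimple.eu/agencefrancepresse/news" "agencefrancepresse" "*.ttl"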

Webhook server

The webhook server is used to trigger the deployment of the RDF data.

Webhooks list:

  • /status (GET): reports the status of the webhook server.
  • /redeploy (POST): triggers a redeployment of the RDF data from the given release URL.

Example:

curl -u api:$API_PASSWORD http://localhost:8880/status
curl -u api:$API_PASSWORD -XPOST http://localhost:8880/redeploy?url=https%3A%2F%2Fgithub.com%2FMartinoMensio%2Fclaimreview-data%2Freleases%2Ftag%2F2023_08_22

(Replace $API_PASSWORD with the password you generated during the setup step.)
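Equivalently, curl can percent-encode the release URL itself; a sketch of the same redeploy call:

curl -u api:$API_PASSWORD -X POST -G "http://localhost:8880/redeploy" \
  --data-urlencode "url=https://github.com/MartinoMensio/claimreview-data/releases/tag/2023_08_22"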

Dereferencing

The list of paths to be dereferenced is in dereferencing/config.yml. See the full list of URI patterns for reference.

To export the Apache config and the script for adding the paths to Virtuoso, run:

cd dereferencing
npx list2dereference config.yml
docker compose cp insert_vhost.sql virtuoso:/insert_vhost.sql
docker compose exec -i virtuoso sh -c "isql-v -U dba -P \${DBA_PASSWORD} < /insert_vhost.sql"

Read more at https://github.com/pasqLisena/list2dereference
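Once the configuration is loaded, dereferencing can be checked with content negotiation; a sketch where <resource-path> is a placeholder for any path from the URI patterns list:

curl -sI -H "Accept: text/turtle" "https://data.cimple.eu/<resource-path>"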

URL Shortening

The service can be accessed at http://cimple.eurecom.fr/c/.

To install the URL shortening service, run the following commands:

docker compose cp scripts/c_uri_dav.vad virtuoso:/usr/local/virtuoso-opensource/share/virtuoso/vad/c_uri_dav.vad
docker compose exec -i virtuoso sh -c "isql-v -U dba -P \${DBA_PASSWORD} exec=\"DB.DBA.VAD_INSTALL('/usr/local/virtuoso-opensource/share/virtuoso/vad/c_uri_dav.vad');\""

The service is hosted on the route /c. You may have to update the Apache virtual host configuration to map the route, for example (assuming Virtuoso is hosted on port 8890):

<Location /c>
  ProxyPreserveHost On
  ProxyPass http://localhost:8890/c
  ProxyPassReverse http://localhost:8890/c
</Location>
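After reloading Apache, a HEAD request is a quick way to check that the route proxies through (a sketch; the exact response depends on the shortened URI requested):

curl -I "http://cimple.eurecom.fr/c/"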
