
SPEECH 🚀

💬 SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres

🍎 The project is an official implementation of the SPEECH model and a repository for the OntoEvent-Doc dataset, both first proposed in the paper 💬 SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres, accepted to the ACL 2023 main conference.

🖥️ We also release the poster and slides for a better understanding of this paper.

🤗 The implementations are based on Hugging Face Transformers and also refer to OntoED & DeepKE.

🤗 The baseline implementations are reproduced with code adapted from MAVEN's baselines or from the official implementations.

Brief Introduction 📣

SPEECH is proposed to address event-centric structured prediction with energy-based hyperspheres.
It models the complex dependencies among structured event components with energy-based modeling, and represents event classes with simple yet effective hyperspheres.

Project Structure 🔍

The structure of data and code is as follows:

SPEECH
├── README.md
├── ACL2023@Poster_Speech.pdf
├── ACL2023@Slides_Speech.pdf
├── requirements.txt        # for package requirements
├── data_utils.py           # for data processing
├── speech.py               # main model (bert serves as the backbone)
├── speech_distilbert.py    # main model (distilbert serves as the backbone)
├── speech_roberta.py       # toy model (roberta serves as the backbone, not adopted in the paper and just for reference)
├── run_speech.py           # for model running
├── run_speech.sh           # bash file for model running
└── Datasets                # data
    ├── MAVEN_ERE
    │   ├── train.jsonl     # for training
    │   ├── test.jsonl      # for testing
    │   └── valid.jsonl     # for validation
    ├── OntoEvent-Doc
    │   ├── event_dict_label_data.json      # containing all event type labels
    │   ├── event_dict_on_doc_train.json    # for training
    │   ├── event_dict_on_doc_test.json     # for testing
    │   └── event_dict_on_doc_valid.json    # for validation
    └── README.md

Requirements 📦

  • python==3.9.12

  • torch==1.13.0

  • transformers==4.25.1

  • scikit-learn==1.2.2

  • torchmetrics==0.9.3

  • sentencepiece==0.1.97

Usage 🛠️

1. Project Preparation:

Download this project and unzip the dataset. You can download the archive directly, or run git clone https://github.com/zjunlp/SPEECH.git in your terminal.

cd [LOCAL_PROJECT_PATH]
git clone git@github.com:zjunlp/SPEECH.git

2. Data Preparation:

Unzip the MAVEN_ERE and OntoEvent-Doc datasets stored in ./Datasets.

cd Datasets/
unzip MAVEN_ERE
unzip OntoEvent-Doc
cd ..

3. Running Preparation:

Install all required packages.
Adjust the parameters in the run_speech.sh bash file.

pip install -r requirements.txt
vim run_speech.sh   # input the parameters, save and quit

Hint:

  • Please refer to the main() function in run_speech.py for the detailed meaning of each parameter.
  • Pay attention to the --ere_task_type parameter candidates:
    • "doc_all" is for the "All Joint" experiments in the paper
    • "doc_joint" is for each ERE subtask's "+joint" experiments in the paper
    • "doc_temporal"/"doc_causal"/"doc_sub" is for each ERE subtask experiment on its own
  • Note that the loss ratios λ1, λ2, λ3 for trigger classification, event classification, and event-relation extraction depend on the task; please ensure these ratios are set correctly, referring to lines 56-61 of speech.py and speech_distilbert.py for details (see also the sketch after this list). We also present the loss ratio settings in Appendix B of our paper.
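As a rough illustration only (the function below is not part of this repository; the actual weighting is hard-coded around lines 56-61 of speech.py and speech_distilbert.py), the three loss terms combine as a weighted sum:

import torch

def joint_loss(loss_trigger: torch.Tensor,
               loss_event: torch.Tensor,
               loss_ere: torch.Tensor,
               lambda1: float, lambda2: float, lambda3: float) -> torch.Tensor:
    # Weighted sum of the three objectives: trigger classification (lambda1),
    # event classification (lambda2), and event-relation extraction (lambda3).
    # The ratio values are task-dependent; see Appendix B of the paper.
    return lambda1 * loss_trigger + lambda2 * loss_event + lambda3 * loss_ere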

4. Running Model:

Run ./run_speech.sh for training, validation, and testing.

./run_speech.sh
# Or run run_speech.py with manual parameter input in the terminal:
python run_speech.py --para...

Hint:

  • A folder of model checkpoints will be saved at the path you specify with --output_dir, either in the bash file run_speech.sh or on the command line in the terminal.
  • We also release checkpoints for direct testing (omit --do_train from the parameter input).

About the Datasets 🗃️

We briefly introduce the datasets in Section 4.1 and Appendix A in our paper.

MAVEN-ERE is proposed in a paper and released on GitHub.

OntoEvent-Doc, formatted at the document level, is derived from OntoEvent, which is formatted at the sentence level.

Statistics

The statistics of MAVEN-ERE and OntoEvent-Doc are shown below, and the detailed data schema can be found in ./Datasets/README.md.

Dataset         #Document   #Mention   #Temporal   #Causal   #Subevent
MAVEN-ERE           4,480    112,276   1,216,217    57,992      15,841
OntoEvent-Doc       4,115     60,546       5,914    14,155           /

Data Format

The data schema of MAVEN-ERE is described in its GitHub repository. Experiments on MAVEN-ERE in our paper involve:

  • 6 temporal relations: BEFORE, OVERLAP, CONTAINS, SIMULTANEOUS, BEGINS-ON, ENDS-ON
  • 2 causal relations: CAUSE, PRECONDITION
  • 1 subevent relation: subevent_relations

Experiments on OntoEvent-Doc in our paper involve:

  • 3 temporal relations: BEFORE, AFTER, EQUAL
  • 2 causal relations: CAUSE, CAUSEDBY

We also add an NA relation for both datasets to signify that there is no relation between an event mention pair.
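To make the two label spaces concrete, the relation inventories above can be summarized as plain Python lists (an illustrative listing, not code taken from this repository):

# Relation labels used in the paper's experiments; "NA" marks "no relation".
MAVEN_ERE_RELATIONS = (
    ["BEFORE", "OVERLAP", "CONTAINS", "SIMULTANEOUS", "BEGINS-ON", "ENDS-ON"]  # temporal
    + ["CAUSE", "PRECONDITION"]   # causal
    + ["subevent_relations"]      # subevent
    + ["NA"]                      # no relation
)
ONTOEVENT_DOC_RELATIONS = (
    ["BEFORE", "AFTER", "EQUAL"]  # temporal
    + ["CAUSE", "CAUSEDBY"]       # causal
    + ["NA"]                      # no relation
)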

πŸ’ The OntoEvent-Doc dataset is stored in json format. Each document (specialized with a doc_id, e.g., 95dd35ce7dd6d377c963447eef47c66c) in OntoEvent-Doc datasets contains a list of "events" and a dictionary of "relations", where the data format is as below:

[a doc_id]: {
    "events": [
        {
            'doc_id': '...',
            'doc_title': 'XXX',
            'sent_id': ,
            'event_mention': '......',
            'event_mention_tokens': ['.', '.', '.', '.', '.', '.'],
            'trigger': '...',
            'trigger_pos': [, ],
            'event_type': ''
        },
        ...
    ],
    "relations": {  // each event-relation contains a list of 'sent_id' pairs
        "COSUPER": [[,], [,], [,]],
        "SUBSUPER": [],
        "SUPERSUB": [],
        "CAUSE": [[,], [,]],
        "BEFORE": [[,], [,]],
        "AFTER": [[,], [,]],
        "CAUSEDBY": [[,], [,]],
        "EQUAL": [[,], [,]]
    }
}
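For reference, here is a minimal Python sketch of reading this schema with the standard library (not part of the released code; it assumes the dataset has been unzipped into the layout shown under Project Structure and that the top-level JSON object maps each doc_id to its document entry, as above):

import json

# Load the OntoEvent-Doc training split (path follows the project layout above).
with open("./Datasets/OntoEvent-Doc/event_dict_on_doc_train.json", "r", encoding="utf-8") as f:
    docs = json.load(f)  # dict: doc_id -> {"events": [...], "relations": {...}}

for doc_id, doc in docs.items():
    for event in doc["events"]:
        # Each event mention carries its trigger span and event type.
        print(doc_id, event["sent_id"], event["trigger"], event["trigger_pos"], event["event_type"])
    for rel_type, pairs in doc["relations"].items():
        # Each relation type maps to a list of [sent_id, sent_id] pairs.
        for head_sent, tail_sent in pairs:
            print(doc_id, rel_type, head_sent, tail_sent)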

How to Cite 📝

📋 Thank you very much for your interest in our work. If you use or extend our work, please cite the following paper:

@inproceedings{ACL2023_SPEECH,
  author    = {Shumin Deng and Shengyu Mao and Ningyu Zhang and Bryan Hooi},
  title     = {SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres},
  booktitle = {{ACL} {(1)}},
  publisher = {Association for Computational Linguistics},
  pages     = {351--363},
  year      = {2023},
  url       = {https://aclanthology.org/2023.acl-long.21/}
}
