CPT

This repository contains code and checkpoints for CPT.

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

Yunfan Shao, Zhichao Geng, Yitao Liu, Junqi Dai, Fei Yang, Li Zhe, Hujun Bao, Xipeng Qiu

Introduction

Aiming to unify both NLU and NLG tasks, we propose a novel Chinese Pre-trained Unbalanced Transformer (CPT), an unbalanced Transformer encoder-decoder pre-trained jointly with masked language modeling (MLM) and denoising auto-encoding (DAE).



The architecture of CPT is a variant of the full Transformer and consists of three parts:

  1. Shared Encoder (S-Enc): a Transformer encoder with fully-connected self-attention, which is designed to capture the common semantic representation for both language understanding and generation.
  2. Understanding Decoder (U-Dec): a shallow Transformer encoder with fully-connected self-attention, which is designed for NLU tasks. The input of U-Dec is the output of S-Enc.
  3. Generation Decoder (G-Dec): a Transformer decoder with masked self-attention, which is designed for generation tasks in an auto-regressive fashion. G-Dec attends to the output of S-Enc via cross-attention (see the sketch after this list).
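
To make the data flow concrete, below is a minimal sketch of this three-part layout in plain PyTorch. It is not the implementation shipped in this repository (that lives in finetune/modeling_cpt.py); the class name, the layer counts (which follow CPT-base), and the use of stock nn.TransformerEncoder/nn.TransformerDecoder modules are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class CPTSketch(nn.Module):
    """Conceptual sketch of CPT's S-Enc / U-Dec / G-Dec layout (not the official code)."""

    def __init__(self, d_model=768, nhead=12):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.s_enc = nn.TransformerEncoder(enc_layer, num_layers=10)  # Shared Encoder
        self.u_dec = nn.TransformerEncoder(enc_layer, num_layers=2)   # Understanding "Decoder"
        self.g_dec = nn.TransformerDecoder(dec_layer, num_layers=2)   # Generation Decoder

    def forward(self, src_emb, tgt_emb):
        shared = self.s_enc(src_emb)          # common representation for NLU and NLG
        nlu_states = self.u_dec(shared)       # fully-connected self-attention over S-Enc output
        tgt_len = tgt_emb.size(1)
        # Additive causal mask: -inf above the diagonal, 0 elsewhere.
        causal_mask = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)
        nlg_states = self.g_dec(tgt_emb, shared, tgt_mask=causal_mask)  # masked self-attn + cross-attn
        return nlu_states, nlg_states

if __name__ == "__main__":
    src = torch.randn(2, 16, 768)  # already-embedded source tokens
    tgt = torch.randn(2, 8, 768)   # already-embedded target tokens for G-Dec
    nlu, nlg = CPTSketch()(src, tgt)
    print(nlu.shape, nlg.shape)    # torch.Size([2, 16, 768]) torch.Size([2, 8, 768])
```

In this sketch, understanding heads would read from nlu_states and generation heads from nlg_states.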

Pre-Trained Models

We provide the pre-trained weights of CPT and Chinese BART, along with source code, which can be used directly with Huggingface Transformers.

  • Chinese BART-base: 6-layer encoder, 6-layer decoder, 12 heads, model dim 768.
  • Chinese BART-large: 12-layer encoder, 12-layer decoder, 16 heads, model dim 1024.
  • CPT-base: 10-layer S-Enc, 2-layer U-Dec/G-Dec, 12 heads, model dim 768.
  • CPT-large: 20-layer S-Enc, 4-layer U-Dec/G-Dec, 16 heads, model dim 1024.

The pre-trained weights can be downloaded here.

| Model | MODEL_NAME |
| --- | --- |
| Chinese BART-base | fnlp/bart-base-chinese |
| Chinese BART-large | fnlp/bart-large-chinese |
| CPT-base | fnlp/cpt-base |
| CPT-large | fnlp/cpt-large |

To use CPT, please import the file finetune/modeling_cpt.py, which defines the architecture of CPT, into your project. Then load the pre-trained models as in the following examples, where MODEL_NAME is the corresponding string from the table above.

For CPT:

    from transformers import BertTokenizer
    from modeling_cpt import CPTForConditionalGeneration

    tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
    model = CPTForConditionalGeneration.from_pretrained("MODEL_NAME")
    print(model)

For Chinese BART:

    from transformers import BertTokenizer, BartForConditionalGeneration

    tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
    model = BartForConditionalGeneration.from_pretrained("MODEL_NAME")
    print(model)
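
As a quick check that a downloaded checkpoint works, the snippet below runs text infilling with Chinese BART through the Transformers pipeline API. This is not one of this README's examples; the checkpoint name fnlp/bart-base-chinese (taken from the table above) and the input sentence are illustrative choices.

```python
from transformers import BertTokenizer, BartForConditionalGeneration, Text2TextGenerationPipeline

# Illustrative checkpoint and input; swap in any MODEL_NAME from the table above.
tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese")

generator = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)
# The model is pre-trained with denoising objectives, so it can fill a [MASK] span.
print(generator("北京是[MASK]的首都", max_length=50, do_sample=False))
```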

Pre-Training

Pre-training code and examples can be found here.

Fine-Tuning

Fine-tuning code and examples can be found here.
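
For orientation only, a single seq2seq training step with CPTForConditionalGeneration might look like the sketch below. This is not the repository's fine-tuning script: the checkpoint, optimizer settings, and toy (source, target) pair are assumptions, and the labels-based loss assumes the usual Huggingface BART-style seq2seq interface.

```python
import torch
from transformers import BertTokenizer
from modeling_cpt import CPTForConditionalGeneration  # from finetune/modeling_cpt.py

# Illustrative settings only; see the linked fine-tuning code for the real recipes.
tokenizer = BertTokenizer.from_pretrained("fnlp/cpt-base")
model = CPTForConditionalGeneration.from_pretrained("fnlp/cpt-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# A toy (source, target) pair standing in for a real generation dataset.
src = tokenizer("今天上海的天气怎么样？", return_tensors="pt")   # "What is the weather like in Shanghai today?"
tgt = tokenizer("今天上海天气晴朗。", return_tensors="pt")       # "The weather in Shanghai is sunny today."

model.train()
outputs = model(input_ids=src["input_ids"],
                attention_mask=src["attention_mask"],
                labels=tgt["input_ids"])  # assumed BART-style seq2seq LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```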

Citation

    @article{shao2021cpt,
      title={CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation},
      author={Yunfan Shao and Zhichao Geng and Yitao Liu and Junqi Dai and Fei Yang and Li Zhe and Hujun Bao and Xipeng Qiu},
      journal={arXiv preprint arXiv:2109.05729},
      year={2021}
    }
