CPT

This repository contains code and checkpoints for CPT.

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

Yunfan Shao, Zhichao Geng, Yitao Liu, Junqi Dai, Fei Yang, Li Zhe, Hujun Bao, Xipeng Qiu

Introduction

Aiming to unify both NLU and NLG tasks, we propose a novel Chinese Pre-trained Unbalanced Transformer (CPT), an unbalanced Transformer encoder-decoder pre-trained jointly with masked language modeling (MLM) and denoising auto-encoding (DAE).



The architecture of CPT is a variant of the full Transformer and consists of three parts:

  1. Shared Encoder (S-Enc): a Transformer encoder with fully-connected self-attention, which is designed to capture the common semantic representation for both language understanding and generation.
  2. Understanding Decoder (U-Dec): a shallow Transformer encoder with fully-connected self-attention, which is designed for NLU tasks. The input of U-Dec is the output of S-Enc.
  3. Generation Decoder (G-Dec): a Transformer decoder with masked self-attention, which is designed for generation tasks in an auto-regressive fashion. G-Dec attends to the output of S-Enc via cross-attention (see the sketch after this list).
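
To make the data flow concrete, below is a minimal sketch of this three-part layout in plain PyTorch. It is not the implementation shipped in this repository (that lives in finetune/modeling_cpt.py); the class name, the layer counts (which follow CPT-base), and the use of stock nn.TransformerEncoder/nn.TransformerDecoder modules are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class CPTSketch(nn.Module):
    """Conceptual sketch of CPT's S-Enc / U-Dec / G-Dec layout (not the official code)."""

    def __init__(self, d_model=768, nhead=12):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.s_enc = nn.TransformerEncoder(enc_layer, num_layers=10)  # Shared Encoder
        self.u_dec = nn.TransformerEncoder(enc_layer, num_layers=2)   # Understanding "Decoder"
        self.g_dec = nn.TransformerDecoder(dec_layer, num_layers=2)   # Generation Decoder

    def forward(self, src_emb, tgt_emb):
        shared = self.s_enc(src_emb)          # common representation for NLU and NLG
        nlu_states = self.u_dec(shared)       # fully-connected self-attention over S-Enc output
        tgt_len = tgt_emb.size(1)
        # Additive causal mask: -inf above the diagonal, 0 elsewhere.
        causal_mask = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)
        nlg_states = self.g_dec(tgt_emb, shared, tgt_mask=causal_mask)  # masked self-attn + cross-attn
        return nlu_states, nlg_states

if __name__ == "__main__":
    src = torch.randn(2, 16, 768)  # already-embedded source tokens
    tgt = torch.randn(2, 8, 768)   # already-embedded target tokens for G-Dec
    nlu, nlg = CPTSketch()(src, tgt)
    print(nlu.shape, nlg.shape)    # torch.Size([2, 16, 768]) torch.Size([2, 8, 768])
```

In this sketch, understanding heads would read from nlu_states and generation heads from nlg_states.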

Pre-Trained Models

We provide the pre-trained weights of CPT and Chinese BART, along with source code, which can be used directly with Huggingface Transformers.

  • Chinese BART-base: 6-layer encoder, 6-layer decoder, 12 heads, model dim 768.
  • Chinese BART-large: 12-layer encoder, 12-layer decoder, 16 heads, model dim 1024.
  • CPT-base: 10-layer S-Enc, 2-layer U-Dec/G-Dec, 12 heads, model dim 768.
  • CPT-large: 20-layer S-Enc, 4-layer U-Dec/G-Dec, 16 heads, model dim 1024.

The pre-trained weights can be downloaded here.

| Model | MODEL_NAME |
| --- | --- |
| Chinese BART-base | fnlp/bart-base-chinese |
| Chinese BART-large | fnlp/bart-large-chinese |
| CPT-base | fnlp/cpt-base |
| CPT-large | fnlp/cpt-large |

To use CPT, please import the file finetune/modeling_cpt.py, which defines the architecture of CPT, into your project. Then load the pre-trained models as in the following examples, where MODEL_NAME is the corresponding string from the table above.

For CPT:

    from transformers import BertTokenizer
    from modeling_cpt import CPTForConditionalGeneration

    tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
    model = CPTForConditionalGeneration.from_pretrained("MODEL_NAME")
    print(model)

For Chinese BART:

    from transformers import BertTokenizer, BartForConditionalGeneration

    tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
    model = BartForConditionalGeneration.from_pretrained("MODEL_NAME")
    print(model)
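
As a quick check that a downloaded checkpoint works, the snippet below runs text infilling with Chinese BART through the Transformers pipeline API. This is not one of this README's examples; the checkpoint name fnlp/bart-base-chinese (taken from the table above) and the input sentence are illustrative choices.

```python
from transformers import BertTokenizer, BartForConditionalGeneration, Text2TextGenerationPipeline

# Illustrative checkpoint and input; swap in any MODEL_NAME from the table above.
tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese")

generator = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)
# The model is pre-trained with denoising objectives, so it can fill a [MASK] span.
print(generator("北京是[MASK]的首都", max_length=50, do_sample=False))
```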

Pre-Training

Pre-training code and examples can be found here.

Fine-Tuning

Fine-tuning code and examples can be found here.
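
For orientation only, a single seq2seq training step with CPTForConditionalGeneration might look like the sketch below. This is not the repository's fine-tuning script: the checkpoint, optimizer settings, and toy (source, target) pair are assumptions, and the labels-based loss assumes the usual Huggingface BART-style seq2seq interface.

```python
import torch
from transformers import BertTokenizer
from modeling_cpt import CPTForConditionalGeneration  # from finetune/modeling_cpt.py

# Illustrative settings only; see the linked fine-tuning code for the real recipes.
tokenizer = BertTokenizer.from_pretrained("fnlp/cpt-base")
model = CPTForConditionalGeneration.from_pretrained("fnlp/cpt-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# A toy (source, target) pair standing in for a real generation dataset.
src = tokenizer("今天上海的天气怎么样？", return_tensors="pt")   # "What is the weather like in Shanghai today?"
tgt = tokenizer("今天上海天气晴朗。", return_tensors="pt")       # "The weather in Shanghai is sunny today."

model.train()
outputs = model(input_ids=src["input_ids"],
                attention_mask=src["attention_mask"],
                labels=tgt["input_ids"])  # assumed BART-style seq2seq LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```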

Citation

    @article{shao2021cpt,
      title={CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation},
      author={Yunfan Shao and Zhichao Geng and Yitao Liu and Junqi Dai and Fei Yang and Li Zhe and Hujun Bao and Xipeng Qiu},
      journal={arXiv preprint arXiv:2109.05729},
      year={2021}
    }
