
[Bug]: AttributeError: module 'paddlenlp.transformers.ernie.tokenizer' has no attribute 'ErnieFastTokenizer' #6666

@wuhaoyupku

Description

Software environment

  • paddle-bfloat 0.1.7
  • paddle2onnx 1.0.5
  • paddlefsl 1.1.0
  • paddlenlp 2.5.2
  • paddlepaddle-gpu 2.4.1
  • x2paddle 1.4.0

Duplicate issues

  • I have searched the existing issues

Bug description

During inference, the following error is raised: AttributeError: module 'paddlenlp.transformers.ernie.tokenizer' has no attribute 'ErnieFastTokenizer'

Steps to reproduce & code

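Starting from the example code in model_zoo/ernie-3.0, I changed the tokenizer line to
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, use_fast=True)
so that the FastTokenizer is used.
After training finishes, the saved model's tokenizer_config.json contains "tokenizer_class": "ErnieFastTokenizer".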

Then, when the model is loaded for inference, the following error is raised:
AttributeError: module 'paddlenlp.transformers.ernie.tokenizer' has no attribute 'ErnieFastTokenizer'
Inference also loads the tokenizer with:
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, use_fast=True)

Looking at the tokenizer loading code:
if init_class:
    class_name = cls._name_mapping[init_class]
    import_class = import_module(f"paddlenlp.transformers.{class_name}.tokenizer")
    tokenizer_class = getattr(import_class, init_class)
    if use_fast:
        fast_tokenizer_class = cls._get_fast_tokenizer_class(init_class, class_name)
        tokenizer_class = fast_tokenizer_class if fast_tokenizer_class else tokenizer_class
    return tokenizer_class
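The ordering problem can be reproduced without PaddleNLP. Below is a minimal, stdlib-only simulation of the branch above. The stand-in classes, the tables `NAME_MAPPING`, `SLOW_MODULES` and `FAST_CLASSES`, and both helper functions are hypothetical; they only mimic the control flow and are not PaddleNLP's actual objects. The second function sketches one possible fix (an assumption, not an upstream patch): map a saved `...FastTokenizer` name back to its slow counterpart before calling `getattr`.

```python
from types import SimpleNamespace


# Hypothetical stand-ins for the real tokenizer classes.
class ErnieTokenizer:
    pass


class ErnieFastTokenizer:
    pass


# Hypothetical tables mimicking cls._name_mapping, the module
# "paddlenlp.transformers.<name>.tokenizer", and _get_fast_tokenizer_class.
# NAME_MAPPING includes the fast name, consistent with the reported error
# being an AttributeError rather than a KeyError.
NAME_MAPPING = {"ErnieTokenizer": "ernie", "ErnieFastTokenizer": "ernie"}
SLOW_MODULES = {"ernie": SimpleNamespace(ErnieTokenizer=ErnieTokenizer)}
FAST_CLASSES = {"ErnieTokenizer": ErnieFastTokenizer}


def get_tokenizer_class(init_class, use_fast):
    """Same order of operations as the quoted snippet."""
    class_name = NAME_MAPPING[init_class]
    import_class = SLOW_MODULES[class_name]
    # The slow-tokenizer module is queried first, so a saved fast name fails here.
    tokenizer_class = getattr(import_class, init_class)
    if use_fast:
        fast_tokenizer_class = FAST_CLASSES.get(init_class)
        tokenizer_class = fast_tokenizer_class if fast_tokenizer_class else tokenizer_class
    return tokenizer_class


def get_tokenizer_class_fixed(init_class, use_fast):
    """Hypothetical fix: normalize "XxxFastTokenizer" to "XxxTokenizer" first."""
    suffix = "FastTokenizer"
    if init_class.endswith(suffix):
        init_class = init_class[: -len(suffix)] + "Tokenizer"
    return get_tokenizer_class(init_class, use_fast)


try:
    get_tokenizer_class("ErnieFastTokenizer", use_fast=True)
except AttributeError as err:
    print("original order fails:", err)

print(get_tokenizer_class_fixed("ErnieFastTokenizer", use_fast=True).__name__)  # → ErnieFastTokenizer
```

With `use_fast=False` the fixed helper returns the slow `ErnieTokenizer`; whether a saved fast class name should instead force the fast tokenizer is a design choice this sketch leaves open.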

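The failure occurs at this line:
tokenizer_class = getattr(import_class, init_class)

This code does not handle "tokenizer_class": "ErnieFastTokenizer" from tokenizer_config.json. getattr looks up ErnieFastTokenizer in paddlenlp.transformers.ernie.tokenizer, which (as the error shows) does not define it, so the AttributeError is raised before the use_fast branch that follows is ever reached.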
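Until the lookup order is changed, one possible user-side workaround follows from the snippet above: if tokenizer_config.json names the slow class, `getattr` succeeds and the `use_fast` branch can then swap in the fast class. The sketch below assumes the slow class name is obtained by dropping "Fast" (as with `ErnieTokenizer` / `ErnieFastTokenizer`); `use_slow_class_name` is a hypothetical helper, not a PaddleNLP API, and it has not been verified against PaddleNLP 2.5.2.

```python
import json
from pathlib import Path


def use_slow_class_name(model_dir):
    """Hypothetical helper: rewrite tokenizer_class "XxxFastTokenizer" -> "XxxTokenizer".

    Assumes the slow class name is the fast one with "Fast" removed.
    Returns the tokenizer_class value stored after the (possible) rewrite.
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    name = config.get("tokenizer_class", "")
    suffix = "FastTokenizer"
    if name.endswith(suffix):
        # Only the class name is changed; all other keys are written back as-is.
        config["tokenizer_class"] = name[: -len(suffix)] + "Tokenizer"
        config_path.write_text(
            json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return config.get("tokenizer_class")
```

After running it once on the saved model directory, inference would keep calling `AutoTokenizer.from_pretrained(model_args.model_name_or_path, use_fast=True)` unchanged; per the quoted code, the fast class is then selected in the `use_fast` branch.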

Labels: bug, triage