A text classification application; the model used is Text-CNN.
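- English dataset: movie reviews
- Chinese datasets (stop words already removed): hotel guest reviews; online shopping reviews; book reviews; dialogue corpus. The raw corpora and preprocessing scripts are available at https://share.weiyun.com/5UoZkUx (password: 6tnnxy)
- Trained models: https://share.weiyun.com/5tYgD3F (password: weumvg)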

Chinese:

- x.shape -> (54568, 50)
- y.shape -> (54568, 2)
- len(vocabulary) -> 52822
- len(vocabulary_inv) -> 52822
- X_train.shape -> (43654, 50)
- y_train.shape -> (43654, 2)
- X_test.shape -> (10914, 50)
- y_test.shape -> (10914, 2)

English:

- x.shape -> (10662, 56)
- y.shape -> (10662, 2)
- len(vocabulary) -> 12766
- len(vocabulary_inv) -> 12766
- X_train.shape -> (8529, 56)
- y_train.shape -> (8529, 2)
- X_test.shape -> (2133, 56)
- y_test.shape -> (2133, 2)
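
The shapes above are consistent with sentences padded to a fixed length (50 tokens for Chinese, 56 for English), two-class labels, and a split of roughly 80% train / 20% test. The following is a minimal sketch of how such shapes could arise, not the repository's actual preprocessing code: the helper names `pad_sentences`, `build_vocab`, and `split_sizes`, the `<PAD/>` token, and the 0.8 train fraction are all assumptions made for illustration.

```python
from collections import Counter

PAD = "<PAD/>"  # hypothetical padding token, not taken from the repository


def pad_sentences(sentences, pad=PAD):
    """Pad every tokenized sentence to the length of the longest one."""
    max_len = max(len(s) for s in sentences)
    return [s + [pad] * (max_len - len(s)) for s in sentences]


def build_vocab(sentences):
    """Return (vocabulary, vocabulary_inv): word -> index and index -> word."""
    counts = Counter(w for s in sentences for w in s)
    vocabulary_inv = [w for w, _ in counts.most_common()]
    vocabulary = {w: i for i, w in enumerate(vocabulary_inv)}
    return vocabulary, vocabulary_inv


def split_sizes(n_samples, train_fraction=0.8):
    """Sizes of the train and test sets (assumed 80/20 split)."""
    n_train = int(n_samples * train_fraction)
    return n_train, n_samples - n_train


# Toy data standing in for the real corpora.
sentences = [["good", "movie"], ["bad"], ["not", "a", "good", "movie"]]
padded = pad_sentences(sentences)
vocabulary, vocabulary_inv = build_vocab(padded)

# x: one row of word indices per sentence, every row the same length.
x = [[vocabulary[w] for w in s] for s in padded]

print(len(x), len(x[0]))   # 3 4
print(split_sizes(54568))  # (43654, 10914), matches the Chinese split
print(split_sizes(10662))  # (8529, 2133), matches the English split
```

Because both mappings are built from the same word list, `len(vocabulary)` and `len(vocabulary_inv)` are always equal, as in the figures reported above.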
