Goals and reference materials #1

@guonaihong

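Goals

  • Implement a string similarity library in Go
  • Handle Chinese text with high accuracy (many existing libraries, written by non-Chinese developers, currently handle Chinese poorly)
  • Integrate multiple similarity algorithms (edit distance, Hamming distance, Dice's coefficient)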

Levenshtein edit distance
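
One likely reason libraries do poorly on Chinese is that they compare bytes instead of characters: a Chinese character takes several bytes in UTF-8, so a single substitution is counted more than once. The following is a minimal sketch, not the library's actual API, of a Levenshtein distance that works on runes. The names `Levenshtein` and `min3` are hypothetical.

```go
package main

import "fmt"

// Levenshtein returns the edit distance between a and b, counted in
// Unicode code points (runes) rather than bytes. Hypothetical name.
func Levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	// prev holds the previous row of the dynamic-programming table,
	// curr the row being filled in.
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // distance to the empty prefix of b
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution (or match)
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

func main() {
	fmt.Println(Levenshtein("kitten", "sitting")) // 3
	fmt.Println(Levenshtein("你好世界", "你好中国"))       // 2
}
```

Converting to `[]rune` up front costs an allocation per input, which seems an acceptable trade for correct handling of multi-byte text.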

Hamming
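
As a rough sketch (the function name `Hamming` and the error behaviour are assumptions, not the library's API), Hamming distance counts the positions at which two equal-length strings differ. Comparing runes instead of bytes keeps the count correct for Chinese text.

```go
package main

import (
	"errors"
	"fmt"
)

// Hamming returns the number of positions at which a and b differ,
// compared rune by rune. It returns an error when the two inputs do
// not contain the same number of runes. Hypothetical name and signature.
func Hamming(a, b string) (int, error) {
	ra, rb := []rune(a), []rune(b)
	if len(ra) != len(rb) {
		return 0, errors.New("hamming: inputs must have the same number of runes")
	}
	d := 0
	for i := range ra {
		if ra[i] != rb[i] {
			d++
		}
	}
	return d, nil
}

func main() {
	d, err := Hamming("karolin", "kathrin")
	fmt.Println(d, err) // 3 <nil>

	d, err = Hamming("你好世界", "你好中国")
	fmt.Println(d, err) // 2 <nil>
}
```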

Dice's coefficient

  • https://blog.csdn.net/gjk0223/article/details/2314844
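    Each run of n characters counts as one element of the set. This point is easy to overlook: n is configurable, and many open-source projects ignore that. The formula in the original paper is 2 * |a ∩ b| / (len(a) + len(b)), with n = 2 as the default, but n = 2 does not work well for Chinese.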
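
The formula above can be sketched as follows, with n exposed as a parameter. This is an illustration under stated assumptions, not the library's implementation: the names `Dice` and `ngrams` are hypothetical, n-grams are treated as a set (duplicates collapse), an n below 1 falls back to 2, and two inputs that are both shorter than n score 0.

```go
package main

import "fmt"

// ngrams returns the set of n-rune substrings of s.
func ngrams(s string, n int) map[string]struct{} {
	r := []rune(s)
	set := make(map[string]struct{})
	for i := 0; i+n <= len(r); i++ {
		set[string(r[i:i+n])] = struct{}{}
	}
	return set
}

// Dice returns 2*|A ∩ B| / (|A| + |B|), where A and B are the sets of
// n-grams of a and b. Hypothetical name and signature.
func Dice(a, b string, n int) float64 {
	if n < 1 {
		n = 2 // assumption: fall back to the default bigram size
	}
	sa, sb := ngrams(a, n), ngrams(b, n)
	if len(sa)+len(sb) == 0 {
		return 0 // assumption: no n-grams on either side means no evidence
	}
	common := 0
	for g := range sa {
		if _, ok := sb[g]; ok {
			common++
		}
	}
	return 2 * float64(common) / float64(len(sa)+len(sb))
}

func main() {
	fmt.Println(Dice("night", "nacht", 2)) // 0.25
	fmt.Println(Dice("你好世界", "你好中国", 1))   // 0.5
	fmt.Println(Dice("你好世界", "你好中国", 2))   // 2/6, about 0.33
}
```

In the last two calls the same pair of four-character Chinese strings scores noticeably lower with n = 2 than with n = 1, which illustrates why a fixed bigram size can be a poor fit for short Chinese text.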

Jaro
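
For reference, a minimal rune-based sketch of the standard Jaro similarity is shown below. The function name `Jaro` is hypothetical and this is not necessarily how the library will implement it. Two runes match when they are equal and no further apart than max(len(a), len(b))/2 - 1 positions; t is half the number of matched runes that appear in a different order.

```go
package main

import "fmt"

// Jaro returns the Jaro similarity of a and b in [0, 1], computed on
// runes. Hypothetical name; a sketch of the textbook definition.
func Jaro(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	la, lb := len(ra), len(rb)
	if la == 0 && lb == 0 {
		return 1
	}
	if la == 0 || lb == 0 {
		return 0
	}
	// Matching window: max(la, lb)/2 - 1, never negative.
	window := la
	if lb > window {
		window = lb
	}
	window = window/2 - 1
	if window < 0 {
		window = 0
	}
	matchedA := make([]bool, la)
	matchedB := make([]bool, lb)
	matches := 0
	for i := 0; i < la; i++ {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > lb {
			hi = lb
		}
		for j := lo; j < hi; j++ {
			if !matchedB[j] && ra[i] == rb[j] {
				matchedA[i], matchedB[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Walk both match sequences in order and count out-of-order pairs.
	k, outOfOrder := 0, 0
	for i := 0; i < la; i++ {
		if !matchedA[i] {
			continue
		}
		for !matchedB[k] {
			k++
		}
		if ra[i] != rb[k] {
			outOfOrder++
		}
		k++
	}
	m := float64(matches)
	t := float64(outOfOrder / 2)
	return (m/float64(la) + m/float64(lb) + (m-t)/m) / 3
}

func main() {
	fmt.Println(Jaro("MARTHA", "MARHTA")) // about 0.944
	fmt.Println(Jaro("你好", "你好"))         // 1
}
```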

TODO

  • Damerau-Levenshtein - distance & normalized
  • Jaro and Jaro-Winkler - this implementation of Jaro-Winkler does not limit the common prefix length

Additional notes

https://help.highbond.com/helpdocs/analytics/13/scripting-guide/zh-cn/Content/lang_ref/functions/r_dicecoefficient.htm

Reference for API design (naming)

Reference for which algorithm names were chosen
