Algorithm::LDA - A Raku Latent Dirichlet Allocation implementation.
A minimal example with plain-text documents:

    use Algorithm::LDA;
    use Algorithm::LDA::Formatter;
    use Algorithm::LDA::LDAModel;

    my @documents = (
        "a b c",
        "d e f",
    );
    my ($documents, $vocabs) = Algorithm::LDA::Formatter.from-plain(@documents);
    my Algorithm::LDA $lda .= new(:$documents, :$vocabs);
    my Algorithm::LDA::LDAModel $model = $lda.fit(:num-topics(3), :num-iterations(500));

    $model.topic-word-matrix.say;     # show topic-word matrix
    $model.document-topic-matrix.say; # show document-topic matrix
    $model.log-likelihood.say;        # show likelihood
    $model.nbest-words-per-topic.say; # show nbest words per topic

An example with the AP corpus:

    use Algorithm::LDA;
    use Algorithm::LDA::Formatter;
    use Algorithm::LDA::LDAModel;

    # Note: You can get the AP corpus as follows:
    # $ wget "https://github.com/Blei-Lab/lda-c/blob/master/example/ap.tgz?raw=true" -O ap.tgz
    # $ tar xvzf ap.tgz

    my @vocabs = "./ap/vocab.txt".IO.lines;
    my @documents = "./ap/ap.dat".IO.lines;
    my $documents = Algorithm::LDA::Formatter.from-libsvm(@documents);

    my Algorithm::LDA $lda .= new(:$documents, :@vocabs);
    my Algorithm::LDA::LDAModel $model = $lda.fit(:num-topics(20), :num-iterations(500));

    $model.topic-word-matrix.say;     # show topic-word matrix
    $model.document-topic-matrix.say; # show document-topic matrix
    $model.log-likelihood.say;        # show likelihood
    $model.nbest-words-per-topic.say; # show nbest words per topic

Algorithm::LDA is a Raku Latent Dirichlet Allocation implementation.
Defined as:
    submethod BUILD(:$!documents!, :$!vocabs! is raw) { }

Constructs a new Algorithm::LDA instance.
Defined as:
    method fit(Int :$num-iterations = 500, Int :$num-topics!, Num :$alpha = 0.1e0, Num :$beta = 0.1e0, Int :$seed --> Algorithm::LDA::LDAModel)

Returns an Algorithm::LDA::LDAModel instance.
- :$num-iterations is the number of iterations for the Gibbs sampler
- :$num-topics! is the number of topics
- :$alpha is the prior for the theta distribution (i.e., the document-topic distribution)
- :$beta is the prior for the phi distribution (i.e., the topic-word distribution)
- :$seed is the seed for srand
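
The optional parameters above can be passed alongside `:num-topics`. The following is a minimal sketch, assuming the toy corpus from the synopsis; the specific values for `:alpha`, `:beta`, `:seed`, `:num-topics` and `:num-iterations` are illustrative choices, not recommendations from the library.

```raku
use Algorithm::LDA;
use Algorithm::LDA::Formatter;
use Algorithm::LDA::LDAModel;

# Toy corpus, for illustration only.
my @documents = (
    "a b c",
    "d e f",
);
my ($documents, $vocabs) = Algorithm::LDA::Formatter.from-plain(@documents);
my Algorithm::LDA $lda .= new(:$documents, :$vocabs);

# :alpha and :beta are declared as Num in the signature of fit,
# so pass Num literals (0.5e0) rather than Rat literals (0.5).
# All values below are arbitrary, illustrative settings.
my Algorithm::LDA::LDAModel $model = $lda.fit(
    :num-topics(2),
    :num-iterations(100),
    :alpha(0.5e0),
    :beta(0.01e0),
    :seed(42),
);

$model.log-likelihood.say;
```

As a general property of Dirichlet priors, values below 1 favor sparser distributions, so a small `:beta` tends to concentrate each topic on fewer words. Fixing `:seed` should make repeated runs comparable, since it seeds the random number generator used by the sampler.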
titsuki <titsuki@cpan.org>
Copyright 2018 titsuki
This library is free software; you can redistribute it and/or modify it under the Artistic License 2.0.
The algorithm is from:

- Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3.Jan (2003): 993-1022.
- Li, Wei, and Andrew McCallum. "Pachinko allocation: DAG-structured mixture models of topic correlations." Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.
- Wallach, Hanna M., et al. "Evaluation methods for topic models." Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.
- Minka, Thomas. "Estimating a Dirichlet distribution." (2000): 4.