Text and Web Mining
What is Text Mining?
Text Data Analysis and Information Retrieval
Information retrieval (IR) is a field that has been developing in parallel with database systems for many years. Text mining is the process of analyzing large volumes of text data to extract information from it.
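Basic Measures for Text Retrieval
Precision: the percentage of retrieved documents that are in fact relevant to the query (i.e., "correct" responses). It is formally defined as
    precision = |{Relevant} ∩ {Retrieved}| / |{Retrieved}|
Recall: the percentage of documents that are relevant to the query and were, in fact, retrieved. It is formally defined as
    recall = |{Relevant} ∩ {Retrieved}| / |{Relevant}|
Here {Relevant} is the set of documents relevant to the query and {Retrieved} is the set of documents the system returned.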
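As a small worked example of the two measures above, the sketch below computes precision and recall from two sets of document IDs. The function name `precision_recall` and the document IDs are illustrative assumptions, not part of the original slides.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for one query.

    relevant  -- IDs of documents that truly answer the query
    retrieved -- IDs of documents the system returned
    """
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved  # relevant documents that were retrieved
    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 relevant documents, 3 retrieved, 2 in common.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d2", "d3", "d7"}
p, r = precision_recall(relevant, retrieved)
print(f"precision={p:.2f} recall={r:.2f}")
```

Two of the three retrieved documents are relevant (precision 2/3), and two of the four relevant documents were found (recall 1/2).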
Retrieval and Indexing
Text Retrieval Methods
1) Document selection methods
2) Document ranking methods
Text Indexing Techniques
1) Inverted indices
2) Signature files
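To make the signature file idea concrete, here is a minimal sketch assuming a simple superimposed-coding scheme: each word sets a few bits in a fixed-width bit string, and a document's signature is the bitwise OR of its word signatures. The constants `SIGNATURE_BITS` and `BITS_PER_WORD`, the CRC32-based hashing, and the sample documents are all assumptions chosen for illustration, not a scheme taken from the slides.

```python
import zlib

SIGNATURE_BITS = 64  # width of each signature (assumed value)
BITS_PER_WORD = 3    # bits set per word (assumed value)


def word_signature(word):
    """Hash a word to a bit pattern with up to BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        h = zlib.crc32(f"{i}:{word.lower()}".encode("utf-8"))
        sig |= 1 << (h % SIGNATURE_BITS)
    return sig


def document_signature(text):
    """Superimpose (OR together) the signatures of all words in a document."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def candidates(signatures, word):
    """Return IDs of documents whose signature covers the word's bits.

    A match can be a false positive, so candidates still need to be
    checked against the real text; a miss is always a true miss.
    """
    q = word_signature(word)
    return {doc_id for doc_id, sig in signatures.items() if (sig & q) == q}


docs = {1: "text mining and web mining", 2: "web usage mining", 3: "database systems"}
signatures = {doc_id: document_signature(text) for doc_id, text in docs.items()}
print(candidates(signatures, "web"))
```

The trade-off is space against accuracy: signatures are compact and fast to scan, but every candidate has to be verified because unrelated words can set the same bits.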
Query Processing Techniques
Once an inverted index is created for a document collection, a retrieval system can answer a keyword query quickly by looking up which documents contain the query keywords.
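The lookup described above can be sketched as follows. This is a minimal in-memory version assuming whitespace tokenization and AND semantics (a document must contain every query keyword); the names `build_inverted_index` and `search_all` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return IDs of documents containing every keyword in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # One posting set per keyword; an unknown keyword yields an empty set.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "text mining and web mining",
    2: "web usage mining",
    3: "database systems",
}
index = build_inverted_index(docs)
result = search_all(index, "web mining")
print(sorted(result))
```

Answering the query touches only the posting sets of the query keywords, not the documents themselves, which is why the lookup stays fast as the collection grows.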
Ways of Dimensionality Reduction for Text
1) Latent Semantic Indexing
2) Locality Preserving Indexing
3) Probabilistic Latent Semantic Indexing
Text Mining Approaches
1) Keyword-Based Association Analysis
2) Document Classification Analysis
3) Document Clustering Analysis
Mining the World Wide Web
The WWW is a huge, widely distributed, global information service center for news, advertisements, management, education, government, and many other information services. The Web also contains a rich and dynamic collection of hyperlink information and Web page access and usage information, providing rich sources for data mining.
Challenges in Mining the WWW
1) The Web seems to be too huge for effective data warehousing and data mining.
2) The complexity of Web pages is far greater than that of any traditional text document collection.
3) The Web is a highly dynamic information source.
4) The Web serves a broad diversity of user communities.
5) Only a small portion of the information on the Web is truly relevant or useful.
Web Usage Mining
Web usage mining is the third category of web mining, alongside web content mining and web structure mining. It works on Web access information collected for Web pages. This usage data reveals the paths that lead to accessed Web pages. The information is often gathered automatically into access logs by the Web server.
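As a rough illustration of recovering access paths from a server log, the sketch below assumes log lines in the Common Log Format and treats each client host as one visitor, which is a simplification (real analyses usually split visits into sessions). The sample log lines, the `LOG_PATTERN` regex, and the function `visitor_paths` are made up for this example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def visitor_paths(lines):
    """Return {host: [pages in the order that host requested them]}."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        visits[match.group("host")].append((when, match.group("path")))
    # Sort each visitor's requests by timestamp and keep only the pages.
    return {host: [path for _, path in sorted(hits)] for host, hits in visits.items()}


# Hypothetical access log entries.
sample_log = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:56:02 +0000] "GET /tutorials.html HTTP/1.1" 200 5120',
    '10.0.0.1 - - [10/Oct/2023:13:57:15 +0000] "GET /text-mining.html HTTP/1.1" 200 7810',
]
paths = visitor_paths(sample_log)
for host, pages in paths.items():
    print(host, " -> ".join(pages))
```

Each resulting page sequence is one navigation path, the raw material from which frequent access patterns can then be mined.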
