Text Mining : Applications and Theory.

Author/Creator:
Berry, Michael W.
Publication:
New York : John Wiley & Sons, Incorporated, 2010.
Format/Description:
Book
1 online resource (223 pages)
Edition:
1st ed.
Subjects:
Data mining -- Congresses.
Natural language processing (Computer science) -- Congresses.
Form/Genre:
Electronic books.
Summary:
"It is extremely useful for practitioners and students in computer science, natural language processing, bioinformatics and engineering who wish to use text mining techniques." (Journal of Information Retrieval, 1 April 2011).
Contents:
Intro
Text Mining
Contents
List of Contributors
Preface
PART I TEXT EXTRACTION, CLASSIFICATION, AND CLUSTERING
1 Automatic keyword extraction from individual documents
1.1 Introduction
1.1.1 Keyword extraction methods
1.2 Rapid automatic keyword extraction
1.2.1 Candidate keywords
1.2.2 Keyword scores
1.2.3 Adjoining keywords
1.2.4 Extracted keywords
1.3 Benchmark evaluation
1.3.1 Evaluating precision and recall
1.3.2 Evaluating efficiency
1.4 Stoplist generation
1.5 Evaluation on news articles
1.5.1 The MPQA Corpus
1.5.2 Extracting keywords from news articles
1.6 Summary
1.7 Acknowledgements
References
2 Algebraic techniques for multilingual document clustering
2.1 Introduction
2.2 Background
2.3 Experimental setup
2.4 Multilingual LSA
2.5 Tucker1 method
2.6 PARAFAC2 method
2.7 LSA with term alignments
2.8 Latent morpho-semantic analysis (LMSA)
2.9 LMSA with term alignments
2.10 Discussion of results and techniques
2.11 Acknowledgements
References
3 Content-based spam email classification using machine-learning algorithms
3.1 Introduction
3.2 Machine-learning algorithms
3.2.1 Naive Bayes
3.2.2 LogitBoost
3.2.3 Support vector machines
3.2.4 Augmented latent semantic indexing spaces
3.2.5 Radial basis function networks
3.3 Data preprocessing
3.3.1 Feature selection
3.3.2 Message representation
3.4 Evaluation of email classification
3.5 Experiments
3.5.1 Experiments with PU1
3.5.2 Experiments with ZH1
3.6 Characteristics of classifiers
3.7 Concluding remarks
3.8 Acknowledgements
References
4 Utilizing nonnegative matrix factorization for email classification problems
4.1 Introduction
4.1.1 Related work
4.1.2 Synopsis
4.2 Background
4.2.1 Nonnegative matrix factorization
4.2.2 Algorithms for computing NMF
4.2.3 Datasets
4.2.4 Interpretation
4.3 NMF initialization based on feature ranking
4.3.1 Feature subset selection
4.3.2 FS initialization
4.4 NMF-based classification methods
4.4.1 Classification using basis features
4.4.2 Generalizing LSI based on NMF
4.5 Conclusions
4.6 Acknowledgements
References
5 Constrained clustering with k-means type algorithms
5.1 Introduction
5.2 Notations and classical k-means
5.3 Constrained k-means with Bregman divergences
5.3.1 Quadratic k-means with cannot-link constraints
5.3.2 Elimination of must-link constraints
5.3.3 Clustering with Bregman divergences
5.4 Constrained smoka type clustering
5.5 Constrained spherical k-means
5.5.1 Spherical k-means with cannot-link constraints only
5.5.2 Spherical k-means with cannot-link and must-link constraints
5.6 Numerical experiments
5.6.1 Quadratic k-means
5.6.2 Spherical k-means
5.7 Conclusion
References
PART II ANOMALY AND TREND DETECTION
6 Survey of text visualization techniques
6.1 Visualization in text analysis
6.2 Tag clouds
6.3 Authorship and change tracking
6.4 Data exploration and the search for novel patterns
6.5 Sentiment tracking
6.6 Visual analytics and FutureLens
6.7 Scenario discovery
6.7.1 Scenarios
6.7.2 Evaluating solutions
6.8 Earlier prototype
6.9 Features of FutureLens
6.10 Scenario discovery example: bioterrorism
6.11 Scenario discovery example: drug trafficking
6.12 Future work
References
7 Adaptive threshold setting for novelty mining
7.1 Introduction
7.2 Adaptive threshold setting in novelty mining
7.2.1 Background
7.2.2 Motivation
7.2.3 Gaussian-based adaptive threshold setting
7.2.4 Implementation issues
7.3 Experimental study
7.3.1 Datasets
7.3.2 Working example
7.3.3 Experiments and results
7.4 Conclusion
References
8 Text mining and cybercrime
8.1 Introduction
8.2 Current research in Internet predation and cyberbullying
8.2.1 Capturing IM and IRC chat
8.2.2 Current collections for use in analysis
8.2.3 Analysis of IM and IRC chat
8.2.4 Internet predation detection
8.2.5 Cyberbullying detection
8.2.6 Legal issues
8.3 Commercial software for monitoring chat
8.4 Conclusions and future directions
8.5 Acknowledgements
References
PART III TEXT STREAMS
9 Events and trends in text streams
9.1 Introduction
9.2 Text streams
9.3 Feature extraction and data reduction
9.4 Event detection
9.5 Trend detection
9.6 Event and trend descriptions
9.7 Discussion
9.8 Summary
9.9 Acknowledgements
References
10 Embedding semantics in LDA topic models
10.1 Introduction
10.2 Background
10.2.1 Vector space modeling
10.2.2 Latent semantic analysis
10.2.3 Probabilistic latent semantic analysis
10.3 Latent Dirichlet allocation
10.3.1 Graphical model and generative process
10.3.2 Posterior inference
10.3.3 Online latent Dirichlet allocation (OLDA)
10.3.4 Illustrative example
10.4 Embedding external semantics from Wikipedia
10.4.1 Related Wikipedia articles
10.4.2 Wikipedia-influenced topic model
10.5 Data-driven semantic embedding
10.5.1 Generative process with data-driven semantic embedding
10.5.2 OLDA algorithm with data-driven semantic embedding
10.5.3 Experimental design
10.5.4 Experimental results
10.6 Related work
10.7 Conclusion and future work
References
Index
Notes:
Description based on publisher supplied metadata and other sources.
Local notes:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2021. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
Contributor:
Kogan, Jacob.
Berry, Professor Michael W.
Kogan, Professor Jacob.
Other format:
Print version: Berry, Michael W. Text Mining
ISBN:
9780470689653
9780470749821
OCLC:
815250653