Exploring Textual Data

Springer Science & Business Media, Dec 31, 1997 - 247 pages
Researchers in a number of disciplines deal with large text sets requiring both text management and text analysis. Faced with a large amount of textual data collected in marketing surveys, literary investigations, historical archives and documentary databases, these researchers require assistance with organizing, describing and comparing texts.
Exploring Textual Data demonstrates how exploratory multivariate statistical methods such as correspondence analysis and cluster analysis can be used to help investigate, assimilate and evaluate textual data. The main text contains no strictly mathematical demonstrations, which makes it accessible to a wide audience; the proofs are collected in the appendices. Concepts are fully defined, and implementations of procedures and rules for reading and interpreting results are explored in detail. A succession of examples allows the reader to appreciate the variety of actual and potential applications and the complementary processing methods. A glossary of terms is provided.
 

Table of Contents

TEXTUAL STATISTICS SCOPE AND APPLICATIONS
1.1.1 The linguistic viewpoint
1.1.2 Content analysis
1.1.3 Artificial intelligence
1.2.1 Pioneering works
A STATISTICIAN'S VIEWPOINT
1.3.2 Internal and external information metadata
1.3.3 A wealth of metainformation
RESPONSES TO OPEN QUESTIONS
a research tool
1.4.2 Manual postcoding of free responses
groups of responses

THE UNITS OF TEXTUAL STATISTICS
2.1.1 Computerized text
2.1.3 Lemmatized analyses
2.1.4 Semantically based approaches
2.1.5 Brief comparison with other languages
2.2 SEGMENTATION AND NUMERIC CODING OF TEXT
2.2.1 Numeric coding of Life corpus
2.2.2 Corpus P
2.3.2 Zipf's law
2.4 LEXICOMETRIC DOCUMENTS
2.4.1 Index of a corpus
2.4.3 Vocabulary growth
2.4.4 Lexical tables
2.5.1 Sentences sequences
2.5.2 Repeated segments table
2.6 FINDING CO-OCCURRENCES, QUASI-SEGMENTS
2.6.2 Finding multiple co-occurrences, quasi-segments
2.7.2 Comparison of main quantitative characteristics

CORRESPONDENCE ANALYSIS OF LEXICAL TABLES
3.1 BASIC PRINCIPLES OF MULTIVARIATE DESCRIPTIVE METHODS
3.2 CORRESPONDENCE ANALYSIS
3.2.3 Validity of the representation
3.2.4 Active and supplementary variables
3.2.5 A comparison with principal components analysis
3.3 MULTIPLE CORRESPONDENCE ANALYSIS
3.3.1 Basic structure of a survey sample
3.3.2 Validity of the representation
3.3.3 Positioning of supplementary variables

CLUSTER ANALYSIS OF WORDS AND TEXTS
4.1 REVIEW OF HIERARCHICAL CLUSTER ANALYSIS
4.1.1 The dendrogram
4.1.2 Cutting the dendrogram
4.1.3 Appending supplementary elements
4.1.4 Filtering on first principal axes
4.2.1 Cluster analysis of words
4.2.2 Cluster analysis of texts
4.2.3 Notes on cluster analysis of words
4.3 CLUSTER ANALYSIS OF SURVEY DATA SETS
4.3.1 Mixed clustering algorithms
4.3.2 Sequence of operations in survey analysis
working demographic partition

VISUALIZATION OF TEXTUAL DATA
5.1 CORRESPONDENCE ANALYSIS OF LEXICAL TABLES
5.1.2 Aggregated lexical tables
5.1.3 Frequency threshold for words
5.1.5 Construction of aggregated lexical and segmental table
5.1.6 Analysis and interpretation of lexical tables
5.1.7 Illustration of displays using repeated segments
5.2 WORKING DEMOGRAPHIC PARTITIONS
5.3 DIRECT ANALYSIS OF RESPONSES OR DOCUMENTS
5.3.1 How are distances interpreted?
5.3.2 Analysis of sparse matrix T
5.3.3 Application example

CHARACTERISTIC TEXTUAL UNITS, MODAL RESPONSES AND MODAL TEXTS
6.1 CHARACTERISTIC ELEMENTS
6.1.2 List of characteristic units
6.2 MODAL RESPONSES
6.2.1 Selection of modal responses using characteristic elements
6.2.2 Selection of modal responses using chi-square distances
6.2.3 Implementation and examples

LONGITUDINAL PARTITIONS, TEXTUAL TIME SERIES
7.1.1 Longitudinal partitioning example
7.1.2 Analysis of age category gradation
7.1.3 Adjacent characteristic elements
7.2 TEXTUAL TIME SERIES
7.2.1 Speeches time series
7.2.2 Chronological characteristic elements
7.2.3 Characteristic increments
7.2.4 Parallel analysis of a lemmatized corpus

TEXTUAL DISCRIMINANT ANALYSIS
8.1 TWO MAJOR AREAS OF CONCERN IN TEXTUAL ANALYSIS
information retrieval coding validation
8.2 UNITS AND INDICES OF STYLOMETRY
8.2.1 Function words speech parts
8.2.2 Richness of vocabulary
AN EXAMPLE
8.3.2 Available data for attribution problems
8.3.3 Other approaches to the problem
8.4 GLOBAL DISCRIMINANT ANALYSIS
8.4.1 General principles
8.4.2 Units for global discriminant analysis
8.4.4 Discriminant analysis regularized through preliminary correspondence analysis
8.5 GLOBAL DISCRIMINATION AND VALIDATION
8.5.2 Vocabulary and analysis for Tokyo
8.5.3 Reality of patterns
8.5.4 Discriminant analysis and confusion matrices
8.5.5 Conclusions to section 8.5

Singular value decomposition and correspondence analysis
Clustering techniques
More details about the nonparametric estimation model
Search for repeated segments in a corpus
Glossary
References
Author Index
Subject Index
Symbols
Frequently cited

Page 236 - Hall, DJ (1967). A Clustering Technique for Summarizing Multivariate Data.

Bibliographic information