Exploring Textual Data

Springer Science & Business Media, 31 December 1997, 247 pages

Researchers in a number of disciplines deal with large text sets that require both text management and text analysis. Faced with large amounts of textual data collected in marketing surveys, literary investigations, historical archives and documentary databases, these researchers need assistance in organizing, describing and comparing texts.
Exploring Textual Data demonstrates how exploratory multivariate statistical methods such as correspondence analysis and cluster analysis can be used to help investigate, assimilate and evaluate textual data. The main text contains no strictly mathematical demonstrations, making it accessible to a wide audience. The book is user-friendly, with proofs relegated to the appendices. Definitions of concepts, implementations of procedures, and rules for reading and interpreting results are covered in full. A succession of examples lets the reader appreciate the variety of actual and potential applications and the complementary processing methods. A glossary of terms is provided.

Table of contents

TEXTUAL STATISTICS: SCOPE AND APPLICATIONS	ix

1.1.1 The linguistic viewpoint	x

1.1.2 Content analysis	xi

1.2.1 Pioneering works	1

A STATISTICIAN'S VIEWPOINT	2

1.3.2 Internal and external information metadata	3

RESPONSES TO OPEN QUESTIONS	6

a research tool	7

1.4.2 Manual postcoding of free responses	9

groups of responses	10

THE UNITS OF TEXTUAL STATISTICS	13

2.1.1 Computerized text	14

2.1.3 Lemmatized analyses	15

2.1.4 Semantically based approaches	16

2.1.5 Brief comparison with other languages	17

2.2 SEGMENTATION AND NUMERIC CODING OF TEXT	18

2.2.1 Numeric coding of Life corpus	19

2.2.2 Corpus P	20

2.3.2 Zipf's law	21

2.4 LEXICOMETRIC DOCUMENTS	23

2.4.1 Index of a corpus	24

2.4.3 Vocabulary growth	26

2.4.4 Lexical tables	27

2.5.1 Sentences sequences	28

2.5.2 Repeated segments table	29

2.6 FINDING CO-OCCURRENCES: QUASI-SEGMENTS	31

2.6.2 Finding multiple co-occurrences: quasi-segments	32

2.7.2 Comparison of main quantitative characteristics	33

CORRESPONDENCE ANALYSIS OF LEXICAL TABLES	37

3.1 BASIC PRINCIPLES OF MULTIVARIATE DESCRIPTIVE METHODS	38

3.2 CORRESPONDENCE ANALYSIS	39

3.2.3 Validity of the representation	47

3.2.4 Active and supplementary variables	52

3.2.5 A comparison with principal components analysis	55

3.3 MULTIPLE CORRESPONDENCE ANALYSIS	61

3.3.1 Basic structure of a survey sample	63

3.3.2 Validity of the representation	68

3.3.3 Positioning of supplementary variables	70

CLUSTER ANALYSIS OF WORDS AND TEXTS	73

4.1 REVIEW OF HIERARCHICAL CLUSTER ANALYSIS	74

4.1.1 The dendrogram	75

4.1.2 Cutting the dendrogram	76

4.1.3 Appending supplementary elements	77

4.1.4 Filtering on first principal axes	78

4.2.1 Cluster analysis of words	79

4.2.2 Cluster analysis of texts	82

4.2.3 Notes on cluster analysis of words	83

4.3 CLUSTER ANALYSIS OF SURVEY DATA SETS	86

4.3.1 Mixed clustering algorithms	87

4.3.2 Sequence of operations in survey analysis	88

working demographic partition	89

VISUALIZATION OF TEXTUAL DATA	93

5.1 CORRESPONDENCE ANALYSIS OF LEXICAL TABLES	94

5.1.2 Aggregated lexical tables	95

5.1.3 Frequency threshold for words	96

5.1.5 Construction of aggregated lexical and segmental table	100

5.1.6 Analysis and interpretation of lexical tables	103

5.1.7 Illustration of displays using repeated segments	107

5.2 WORKING DEMOGRAPHIC PARTITIONS	110

5.3 DIRECT ANALYSIS OF RESPONSES OR DOCUMENTS	113

5.3.1 How are distances interpreted?	114

5.3.2 Analysis of sparse matrix T	115

5.3.3 Application example	116

CHARACTERISTIC TEXTUAL UNITS, MODAL RESPONSES AND MODAL TEXTS	121

6.1 CHARACTERISTIC ELEMENTS	122

6.1.2 List of characteristic units	126

6.2 MODAL RESPONSES	128

6.2.1 Selection of modal responses using characteristic elements	129

6.2.2 Selection of modal responses using chi-square distances	132

6.2.3 Implementation and examples	133

LONGITUDINAL PARTITIONS, TEXTUAL TIME SERIES	139

7.1.1 Longitudinal partitioning example	140

7.1.2 Analysis of age category gradation	141

7.1.3 Adjacent characteristic elements	142

7.2 TEXTUAL TIME SERIES	145

7.2.2 Chronological characteristic elements	147

7.2.3 Characteristic increments	149

7.2.4 Parallel analysis of a lemmatized corpus	153

TEXTUAL DISCRIMINANT ANALYSIS	155

8.1 TWO MAJOR AREAS OF CONCERN IN TEXTUAL ANALYSIS	156

information retrieval coding validation	157

8.2 UNITS AND INDICES OF STYLOMETRY	158

8.2.1 Function words speech parts	159

8.2.2 Richness of vocabulary	160

AN EXAMPLE	161

8.3.2 Available data for attribution problems	162

8.3.3 Other approaches to the problem	165

8.4 GLOBAL DISCRIMINANT ANALYSIS	166

8.4.1 General principles	167

8.4.2 Units for global discriminant analysis	169

8.4.4 Discriminant analysis regularized through preliminary correspondence analysis	171

8.5 GLOBAL DISCRIMINATION AND VALIDATION	173

8.5.2 Vocabulary and analysis for Tokyo	177

8.5.3 Reality of patterns	184

8.5.4 Discriminant analysis and confusion matrices	185

8.5.5 Conclusions to section 8.5	191

Singular value decomposition and correspondence analysis	192

Clustering techniques	203

More details about the nonparametric estimation model	211

Search for repeated segments in a corpus	213

Glossary	216

References	221

Author Index	230

Subject Index	234

Symbols	238

Bibliographic information

Title	Exploring Textual Data, Volume 4 of Text, Speech and Language Technology
Authors	Ludovic Lebart, A. Salem, L. Berry
Edition	illustrated
Publisher	Springer Science & Business Media, 1997
ISBN	0792348400, 9780792348405
Length	247 pages
