Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data
Hans-Hermann Bock, Edwin Diday
Springer Science & Business Media, 21 Dec. 1999, 425 pages

Raymond Bisdorff, CRP-GL, Luxembourg

The development of the SODAS software based on symbolic data analysis was extensively described in the previous chapters of this book. It was accompanied by a series of benchmark activities involving several official statistical institutes throughout Europe. The partners in these benchmark activities were the National Statistical Institute (INE) of Portugal, the Instituto Vasco de Estadistica Euskal (EUSTAT) from Spain, the Office for National Statistics (ONS) from the United Kingdom, the Inspection Generale de la Securite Sociale (IGSS) from Luxembourg and, marginally, the University of Athens. The principal goal of these benchmark activities was to demonstrate the usefulness of symbolic data analysis for the practical exploitation and analysis of official statistical data. This chapter briefly reports on these activities by presenting some significant insights into the practical results that the benchmark partners obtained using the SODAS software package described in Chapter 14.
Table of Contents
Purpose, History, Perspective | 1 |
1.2 Symbolic Data Tables and Symbolic Objects | 2 |
1.2.2 Sources of Symbolic Data | 3 |
1.2.3 Symbolic Objects | 5 |
1.3 Tools and Operations for Symbolic Objects | 8 |
1.4 History and Evolution of SDA | 11 |
1.5 The Content of the SODAS Project | 14 |
1.5.2 An Illustrative Example | 15 |
1.5.3 Overview on the SODAS Software | 17 |
Concepts and Symbolic Objects | 18 |
1.6.2 Intent and Extent: the Two Kinds of Concepts | 19 |
The Four Traditions and Symbolic Objects | 20 |
1.7 Advantages of Using Symbolic Data Analysis | 21 |
1.8 The Future Development of SODAS | 22 |
The Classical Data Situation | 24 |
2.3 Quantitative Variables | 25 |
2.4 Qualitative Variables | 26 |
2.4.2 Ordinal Variables and Generalized Ordinal Variables | 27 |
2.5 Data Vectors and the Data Matrix | 31 |
2.6 Dependent Variables | 32 |
2.6.1 Logical Dependence | 33 |
2.6.2 Hierarchical Dependence: Mother-Daughter | 34 |
2.6.3 Stochastic Dependence | 36 |
2.7 Missing Values | 37 |
Symbolic Data | 39 |
3.2 Multi-Valued and Interval Variables | 42 |
3.3 Modal Variables | 45 |
3.4 A Synthesis of Symbolic Data Types | 49 |
Symbolic Objects | 54 |
4.2 Relations and Descriptions | 60 |
4.2.1 Relations | 61 |
4.2.2 Descriptions, Description Vectors and Description Sets | 62 |
4.2.3 Product Relations | 63 |
4.3 Events and Assertion Objects | 64 |
4.4 Boolean Symbolic Objects as Triples | 69 |
4.5 Modal Symbolic Objects | 75 |
Generation of Symbolic Objects from Relational Databases | 78 |
5.2 Principles of Symbolic Object Acquisition from Relational Databases | 80 |
5.3 Interaction with the Database | 85 |
5.3.2 Sampling Individuals | 91 |
5.3.3 Dependent Variables and Missing Values | 92 |
5.4 A Generalization Operator | 93 |
5.4.2 Problem of Over-Generalization | 95 |
5.4.3 A Quality Criterion to Evaluate a Generalized Description | 97 |
5.4.4 Coding by Testing for a Uniform Distribution Among Intervals | 98 |
5.4.5 A Reduction Algorithm | 100 |
5.4.6 A Numerical Example | 102 |
5.5 Further Operations on Generated Assertions | 103 |
5.5.2 Validation of Generated Assertions | 105 |
Descriptive Statistics for Symbolic Data | 106 |
6.2 The Observed Symbolic Data Set | 108 |
6.2.1 The Data Table | 109 |
6.2.2 Logical Dependencies | 110 |
6.2.3 The Virtual Extension of a Description Vector | 111 |
6.3 The Case of Multi-Valued Variables | 112 |
6.3.1 Frequency Distribution for a Categorical or Quantitative Multi-Valued Variable | 113 |
6.3.2 Summary Measures for a Numerical Multi-Valued Variable | 117 |
6.4 The Case of an Interval-Valued Variable | 119 |
Visualizing and Editing Symbolic Objects | 125 |
7.1.2 Our Graphical Representation | 126 |
7.1.3 Use of Zoom Star | 130 |
7.1.4 Conclusion | 136 |
7.2.1 Modification of an Existing Symbolic Object | 137 |
7.2.2 Modification of Labels | 138 |
Similarity and Dissimilarity | 139 |
8.1.1 Resemblance Measures | 140 |
Special Cases | 142 |
8.1.3 Distance Measures from a Classical Data Matrix | 145 |
8.1.4 Similarity Measures from a Categorical Data Matrix | 148 |
8.2 Dissimilarity Measures for Probability Distributions | 153 |
The General Case | 154 |
Special Cases | 155 |
8.2.3 The Affinity Coefficient | 160 |
8.3 Dissimilarity Measures for Symbolic Objects | 165 |
8.3.1 Gowda and Diday's Dissimilarity Measure | 166 |
8.3.2 The Approach by Ichino and Yaguchi | 170 |
8.3.3 Dissimilarity Measures of De Carvalho | 173 |
Constrained Case | 177 |
8.3.5 The Dissimilarity Options in the SODAS Package | 183 |
8.4 Matching Symbolic Objects | 186 |
8.4.2 Flexible Matching of Boolean Symbolic Objects | 188 |
8.4.3 An Application | 196 |
Symbolic Factor Analysis | 198 |
9.2 Symbolic Principal Component Analysis | 200 |
9.2.2 The Purpose of the Method | 201 |
9.2.3 The VERTICES Method | 202 |
9.2.4 The CENTERS Method | 205 |
9.2.5 Representation by Rectangles | 206 |
9.2.6 Example of Oils and Fats | 207 |
9.2.7 Conclusions | 212 |
9.3.2 A Reminder of Factorial Discriminant Analysis | 214 |
9.3.3 FDA on Symbolic Data | 219 |
9.3.4 Illustrative Application to a Data Set | 231 |
Discrimination: Assigning Symbolic Objects to Classes | 234 |
10.1.3 The Decision Rule | 235 |
10.1.4 The Classical Probabilistic Framework | 236 |
10.1.5 Density Estimation | 238 |
10.2 Symbolic Kernel Discriminant Analysis | 240 |
10.2.2 Determining the Prior Probabilities | 242 |
10.2.3 The Output Data | 243 |
10.3 Symbolic Discrimination Rules | 244 |
The Set of Binary Questions and the Construction of a New Data Table from Binary Variables | 247 |
10.3.4 The Recursive Partition Algorithm | 250 |
10.3.5 Detailed Description of the Different Steps | 253 |
10.3.6 Decisional Considerations | 259 |
10.3.7 Example | 261 |
10.4 Segmentation Trees for Stratified Data | 266 |
10.4.2 Input and Output Data | 267 |
10.4.3 An Example: Distinction from Classical Decision Trees | 271 |
10.4.4 Main Steps of the Algorithm | 274 |
10.4.5 Detailed Description of the Algorithm | 277 |
10.4.6 Choices in the Algorithm for Classical Data | 280 |
10.4.7 Choices in the Algorithm for Probabilistic Data | 285 |
10.4.8 Symbolic Object Description of Strata | 289 |
10.4.9 The Example 10.4.3 Revisited | 291 |
10.4.10 Conclusion | 293 |
Clustering Methods for Symbolic Objects | 294 |
11.2 Criterion-Based Divisive Clustering for Symbolic Data | 299 |
11.2.2 Two Distance Measures | 301 |
11.2.3 Extension of the Within-Class Variance Criterion | 304 |
11.2.4 Bipartitioning a Cluster | 305 |
11.2.5 Choice of the Cluster to be Split | 307 |
11.2.7 Example of a Classical Dataset | 308 |
11.2.8 Example of a Symbolic Data Set | 309 |
11.3 Hierarchical and Pyramidal Clustering with Complete Symbolic Objects | 312 |
11.3.2 Complete Symbolic Objects | 314 |
11.3.3 A Hierarchical-Pyramidal Clustering Algorithm for Symbolic Data | 315 |
11.3.4 Extension to More Complex Symbolic Data Types | 317 |
11.3.5 A Numerical Example | 322 |
11.4 Pyramidal Classification for Interval Data Using Galois Lattice Reduction | 324 |
11.4.1 Definition and Construction of Galois Lattices | 325 |
11.4.2 Reduction of a Galois Lattice into a Pyramid | 334 |
11.4.3 A Real-case Application | 337 |
Symbolic Approaches for Three-way Data | 342 |
12.2 The Input and Output Data | 343 |
12.3.2 Data Compression by Time Clustering | 344 |
12.3.3 Adapted Data Analysis Methods | 345 |
12.4 Interpretation of Outcomes from Processing of Temporal Changes | 346 |
12.4.2 Symbolic Interpretation of Clustering Results | 347 |
Fuzzy Coding and Compression | 348 |
Temporal Changes of Nominal Variables | 350 |
Using Time Lines for Markings | 352 |
Illustrative Benchmark Analyses | 355 |
13.2 Professional Careers of Retired Working Persons | 356 |
13.2.2 Divisive Clustering of Professional Careers | 359 |
13.2.3 About the Discrimination of the Retiring Age from the Professional Careers | 369 |
13.3 Comparing European Labour Force Survey Results from the Basque Country and Portugal | 374 |
13.3.2 Building Symbolic Objects | 376 |
13.4 Processing Census Data from ONS | 382 |
13.5 General Conclusion | 385 |
The SODAS Software Package | 386 |
14.3 Short List of Methods in SODAS Software | 388 |
Symbolic Kernel Discriminant Analysis | 389 |
Principal Component Analysis | 390 |
Decision Tree | 391 |
Notations and Abbreviations | 392 |
| 394 |
Addresses of Contributors to this Volume | 414 |
| 417 |
Common terms and phrases
A₁ algorithm assertion objects B₁ binary questions Boolean symbolic objects C₁ Cartesian Cartesian product classes classes C1 clustering coding coefficient compute consider corresponding criterion d₁ data matrix data units data vector database decision tree decisional node defined definition dendrogram denoted described description set description vector descriptors Diday discriminant dissimilarity measure distance domain elements empirical distribution function example extension factorial plane Figure frequency distribution function Galois lattices given histogram hypercube individuals interval variables logical dependence matching method modal variables multi-valued variable nominal variable observed obtained ordinal variables partition principal component analysis principal components probabilistic probability distribution pyramid recursive partition relation representation salaries Section single-valued SODAS project SODAS software split SQL query statistical strata stratum subsets symbolic data analysis symbolic data array symbolic data table symbolic description symbolic variables terminal nodes variables Y₁ weight Y₁ Y₂ Zoom Star