Abstract
Coding of data, usually upstream of data analysis, has crucial implications for the data analysis results. By modifying the data coding, through use of less than full precision in data values, we can appreciably improve the effectiveness and efficiency of hierarchical clustering. In our first application, this is used to reduce the quantity of data to be hierarchically clustered. The approach is a hybrid one, based on hashing and on the Ward minimum variance agglomerative criterion. In our second application, we derive a hierarchical clustering from relationships between sets of observations, rather than the traditional use of relationships between the observations themselves. This second application uses embedding in a Baire space, or longest common prefix ultrametric space. We compare this second approach, which is of O(n log n) complexity, to k-means.
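The abstract describes two mechanisms: hashing on reduced-precision values to shrink the data before Ward agglomeration, and a hierarchy read off the longest common prefix (Baire) ultrametric. The Python sketch below illustrates both on toy data. It is not the authors' implementation, and several choices are assumptions made for illustration: values are scaled to [0, 1); reduced precision means truncation to a fixed number of decimal digits; the hash key of a point is the tuple of its truncated coordinates; Ward's criterion is applied naively to bucket centroids weighted by bucket size; and the Baire distance between two values is taken as 10^-k, where k is the length of their common leading-digit prefix (1 if the first digits differ). The function names prefix_digits, hash_reduce, ward_merges, baire_distance and baire_clusters are hypothetical.

```python
# Illustrative sketch only: names and the exact precision/hashing scheme are
# assumptions, not the code used in the paper.
from collections import defaultdict
from decimal import Decimal
from itertools import combinations


def prefix_digits(x, precision):
    """First `precision` decimal digits of x, assumed to lie in [0, 1).

    Truncates rather than rounds, so values that agree on these digits share
    a key. Going through str()/Decimal avoids float artefacts such as
    0.29 * 100 == 28.999999999999996.
    """
    scaled = int(Decimal(str(x)) * 10 ** precision)
    return str(scaled).zfill(precision)


# --- First application (sketch): hash on reduced precision, then Ward ---

def hash_reduce(points, precision):
    """Bucket points by truncated coordinates; return (count, centroid) per bucket."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(prefix_digits(v, precision) for v in p)
        buckets[key].append(p)
    reduced = []
    for members in buckets.values():
        n = len(members)
        centroid = tuple(sum(col) / n for col in zip(*members))
        reduced.append((n, centroid))
    return reduced


def ward_merges(clusters):
    """Naive Ward agglomeration of weighted centroids (O(m^3), fine for small m).

    Merge cost for sizes ni, nj and centroids ci, cj is
    ni * nj / (ni + nj) * ||ci - cj||^2, the increase in the within-cluster
    sum of squares. Returns (id_a, id_b, cost); new clusters get ids m, m+1, ...
    """
    active = dict(enumerate(clusters))
    next_id = len(clusters)
    merges = []
    while len(active) > 1:
        best = None
        for i, j in combinations(sorted(active), 2):
            (ni, ci), (nj, cj) = active[i], active[j]
            sq = sum((a - b) ** 2 for a, b in zip(ci, cj))
            cost = ni * nj / (ni + nj) * sq
            if best is None or cost < best[0]:
                best = (cost, i, j)
        cost, i, j = best
        (ni, ci), (nj, cj) = active.pop(i), active.pop(j)
        n = ni + nj
        active[next_id] = (n, tuple((ni * a + nj * b) / n for a, b in zip(ci, cj)))
        merges.append((i, j, cost))
        next_id += 1
    return merges


# --- Second application (sketch): Baire / longest common prefix ultrametric ---

def baire_distance(x, y, precision, base=10):
    """base ** -k, where k is the length of the common leading-digit prefix.

    Returns 1.0 when the first digits already differ.
    """
    a, b = prefix_digits(x, precision), prefix_digits(y, precision)
    k = 0
    while k < precision and a[k] == b[k]:
        k += 1
    return float(base) ** -k


def baire_clusters(values, level):
    """One level of the hierarchy: indices grouped by their first `level` digits.

    Each level is a single hashing pass over the data, and clusters at
    level k + 1 are nested inside those at level k.
    """
    groups = defaultdict(list)
    for i, v in enumerate(values):
        groups[prefix_digits(v, level)].append(i)
    return dict(groups)
```

For the values 0.4371, 0.4389, 0.4512 and 0.9120, baire_clusters(values, 1) groups indices 0, 1 and 2 under the prefix '4' and leaves index 3 alone under '9'; at level 2 the '4' cluster splits into '43' (indices 0 and 1) and '45' (index 2). The naive Ward step is cubic in the number of buckets, which is tolerable here only because hashing has already reduced the data; the O(n log n) figure in the abstract refers to the authors' Baire-space approach, not to this sketch.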
Original language | English |
---|---|
Pages (from-to) | 707-730 |
Number of pages | 24 |
Journal | SIAM Journal on Scientific Computing |
Volume | 30 |
DOIs | |
Publication status | Published - 2008 |
Projects
New Mathematical approaches for structuring and searching through, very large compressed encrypted textual data stores
Murtagh, F. (PI) & Contreras Albornoz, P. (CoI)
Engineering and Physical Sciences Research Council (EPSRC)
1/11/06 → 31/03/10
Project: Research