Isotropic Dynamic Hierarchical Clustering
Isotropic Dynamic Hierarchical Clustering
Article PDF

Keywords

lustering; hierarchical clustering; dynamic clustering; isotropic clustering; multi-dimensional space; b-tree; factor analysis

How to Cite

Victor Sadikov, & Oliver Rutishauser. (2016). Isotropic Dynamic Hierarchical Clustering. Global Journal of Computer Science and Technology, 16(C3), 29–36. Retrieved from https://gjcst.com/index.php/gjcst/article/view/779

Abstract

We face a business need of discovering a pattern in locations of a great number of points in a high-dimensional space We assume that there should be a certain structure so that in some locations the points are close while in other locations the points are more dispersed Our goal is to group the close points together The process of grouping close objects is known under the name of clustering 1 We are particularly interested in a hierarchical structure A plain structure may reduce the number of objects but the data are still difficult to manage or present 2 The classical technique suited for the task at hand is a B-Tree The key properties of the B-Tree are that it is hierarchical and balanced and it can be dynamically constructed from the input data In these terms B-Tree has certain advantages over other clustering algorithms where the number of clusters needs to be defined a priori The BTree approach allows to hope that the structure of input data will be well determine without any supervised learning 3 The space is Euclidean and isotropic This is the most challenging part of the project because currently there are no B-Tree implementations processing indices in a symmetrical and isotropical way Some known implementations are based on constructing compound asymmetrical indices from point coordinates where the main index works as a key while the function of other 999 indices is lost and the other known implementations split the nodes along the coordinate hyper-planes sacrificing the isotropy of the original space In the latter case the clusters become coordinate parallelepipeds which is a rather artificial and unnecessary assumption Our implementation of a B Tree for a high-dimensional space is based directly on concepts of factor analysis 4 We need to process a great deal of data something like tens of millions of points in a thousand-dimensional space The application has to be scalable even though technically out task is not con
Article PDF
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2016 Authors and Global Journals Private Limited