• Please use the portal's issue section to report technical problems.



9th IPM-HPC Workshop on Multi-core Systems and Graphics Processors

8th IPM-HPC Workshop on Multi-core Systems and Parallel Platforms

7th IPM-HPC Workshop on Multi-Core Systems & GPU and Its Application in HPC

6th IPM-HPC Workshop on Multi-Core Systems and Its Application in Big Data

Winner of the MEMOCODE 2016 Design Contest

13th ACM-IEEE International Conference on Formal Methods and Models for System Design
The University of Texas at Austin
September 21-23, 2015

IPM-HPC Team members: Saeid Rahmani, Armin Ahmadzadeh, Omid Hajihassani, SeyedPooya Mirhosseini, and Saeid Gorgin

Contest Problem: MEMOCODE'16 will include a design contest that poses a computational challenge, which participants may solve using hardware or software on FPGAs, GPUs, and CPUs. The conference will sponsor at least one prize with a monetary award for the contest winners. The 2016 challenge is k-means clustering, an unsupervised method for clustering multidimensional data points that aims to partition the points into K subgroups (clusters) of similar points. It is used in a variety of applications such as data mining, image segmentation, medical imaging, and bioinformatics.
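For reference, k-means is commonly stated as minimizing the within-cluster sum of squared distances. The formulation below is the standard textbook objective, not necessarily the exact scoring rule used by the contest; the symbols (n points x_i in d dimensions, clusters C_k with centroids mu_k) are notation introduced here for illustration.

```latex
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i,
\qquad x_i \in \mathbb{R}^d,\; i = 1,\dots,n
```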

IPM-HPC Solution: The 2016 MEMOCODE Design Contest task was to efficiently compute k-means clustering on a large multidimensional data set. Our solution is based on Lloyd's algorithm. It makes full use of four high-throughput GPUs and hides memory latency. We applied optimizations to both the algorithmic structure and the parallelism, carefully parallelizing the problem across the available platforms and optimizing the arithmetic the problem requires. We implemented our design using Intel Xeon E5 CPUs and NVIDIA GTX 980 GPUs. Our overall best result computed the solution in 106 ms using four GPUs. In terms of cost-normalized results, our best solution was the 2x GPU implementation, which was only 1.5x slower than the 4x GPU solution at half the cost.
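Lloyd's algorithm, on which the solution is based, alternates two steps until the cluster assignments stop changing: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below is a minimal single-threaded Python illustration of that loop, not the team's multi-GPU implementation; the function names (`lloyd_kmeans`, `assign`, `update`), the random initialization, and the empty-cluster handling are assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points.

    Assumption for this sketch: an empty cluster keeps its previous centroid.
    """
    dims = len(points[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for p, label in zip(points, labels):
        counts[label] += 1
        for d in range(dims):
            sums[label][d] += p[d]
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centroids[c])
        for c in range(len(centroids))
    ]


def lloyd_kmeans(points, k, max_iters=100, seed=0):
    """Run Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        new_labels = assign(points, centroids)
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = lloyd_kmeans(data, k=2)
    print(centroids, labels)
```

The assignment step treats every point independently, which is what makes the algorithm a natural fit for data-parallel hardware: in a GPU version, each thread could handle one point, with the per-cluster sums of the update step computed as a parallel reduction.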