Journal of Physics: Conference Series

PERI - auto-tuning memory-intensive kernels for multicore

S Williams et al 2008 J. Phys.: Conf. Ser. 125 012038 (15pp), doi: 10.1088/1742-6596/125/1/012038

S Williams (1,2), K Datta (2), J Carter (1), L Oliker (1,2), J Shalf (1), K Yelick (1,2) and D Bailey (1)
(1) CRD/NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
(2) Computer Science Division, University of California at Berkeley, Berkeley, CA 94720, USA
E-mail: SWWilliams@lbl.gov, kdatta@eecs.berkeley.edu, JTCarter@lbl.gov, LOliker@lbl.gov, JShalf@lbl.gov, KAYelick@lbl.gov and DHBailey@lbl.gov

Abstract. We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to sparse matrix-vector multiplication (SpMV), the explicit heat equation PDE on a regular grid (Stencil), and a lattice Boltzmann application (LBMHD). We explore one of the broadest sets of multicore architectures in the high-performance computing literature, including the Intel Xeon Clovertown, AMD Opteron Barcelona, Sun Victoria Falls, and the Sony-Toshiba-IBM (STI) Cell. Rather than hand-tuning each kernel for each system, we develop a code generator for each kernel that allows us to identify a highly optimized version for each platform while amortizing the human programming effort. Results show that our auto-tuned kernel applications often achieve a better than 4× improvement compared with the original code. Additionally, we analyze a Roofline performance model for each platform to reveal hardware bottlenecks and software challenges for future multicore systems and applications.
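
As a rough illustration of the search-based idea described in the abstract, the following Python sketch generates several variants of an explicit heat-equation stencil sweep, times each one, and keeps the fastest. It is not the authors' code generator: the names `make_stencil`, `autotune`, and `make_grid`, the loop-blocking parameter, the candidate block sizes, and the coefficient `alpha` are all assumptions made for this example, and a real auto-tuner would emit and benchmark compiled kernels rather than pure Python.

```python
import time


def make_stencil(block):
    """Generate one kernel variant: a 5-point sweep whose j loop is tiled by `block`."""
    def sweep(grid, alpha=0.1):
        n, m = len(grid), len(grid[0])
        out = [row[:] for row in grid]  # boundary values are copied unchanged
        for jj in range(1, m - 1, block):
            j_end = min(jj + block, m - 1)
            for i in range(1, n - 1):
                row, up, down, dst = grid[i], grid[i - 1], grid[i + 1], out[i]
                for j in range(jj, j_end):
                    dst[j] = row[j] + alpha * (
                        up[j] + down[j] + row[j - 1] + row[j + 1] - 4.0 * row[j]
                    )
        return out
    return sweep


def autotune(grid, blocks=(4, 16, 64), trials=3):
    """Time every variant on this machine and return (best_block, timings)."""
    timings = {}
    for block in blocks:
        kernel = make_stencil(block)
        best = float("inf")
        for _ in range(trials):
            start = time.perf_counter()
            kernel(grid)
            best = min(best, time.perf_counter() - start)
        timings[block] = best  # keep the fastest of the trials
    return min(timings, key=timings.get), timings


def make_grid(n=64):
    """Hypothetical test problem: a cold square grid with one hot cell in the middle."""
    grid = [[0.0] * n for _ in range(n)]
    grid[n // 2][n // 2] = 100.0
    return grid


if __name__ == "__main__":
    best_block, timings = autotune(make_grid())
    print("fastest block size on this machine:", best_block)
```

Every variant computes the same result, so the search changes only how the loops are ordered, and the winning block size is expected to differ from one machine to another. That difference is the reason for searching per platform rather than hand-tuning once.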

Copyright © Institute of Physics and IOP Publishing Limited 2009.