Title: Empowering Cloud-Resolving Models Through GPU and Asynchronous I/O
Presenting Author: Wei-Kuo Tao
Organization: NASA GSFC
Co-Author(s): Thomas L. Clune, Shujia Zhou, Toshihisa Matsui, Xiaowen Li, Xiping Zeng (NASA GSFC)

Abstract:
The effects of aerosols on weather and climate are the largest source of uncertainty in current weather and climate models. In this project, GPU (graphics processing unit) and asynchronous I/O technologies are used to empower cloud-resolving models (CRMs) and thereby significantly improve the modeling of aerosol effects and their interactions with radiation and microphysics for weather prediction and climate change studies. Our approach is to (1) accelerate the computationally intensive components of CRMs (i.e., microphysics and radiation) with GPUs, (2) develop a parallel, asynchronous I/O tool to improve model efficiency, and (3) develop a data compression mechanism to further strengthen the asynchronous I/O tool. The key accomplishments as of June 2014 are:

• Completed optimization of the radiation component with CUDA Fortran, integrated it with the rest of the Goddard Cumulus Ensemble (GCE) model system, and tested its performance:

    • The key routine and its supporting routines in the shortwave module were optimized with an emphasis on minimizing data transfer to improve computational performance.
    • The key routine in the longwave module was optimized using shared memory and by minimizing data transfer to improve computational performance.
    • Lesson learned: Reducing data transfer between the CPU and GPU is the key to improving performance. Reorganizing the code to reduce array dimensions is one effective solution.
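
The lesson above can be illustrated with a rough accounting of the bytes that cross the CPU-GPU link. This is a minimal sketch, not the GCE radiation code: the grid size, call count, single-precision storage, and the choice of a 2D surface field as the only returned quantity are all hypothetical assumptions made for illustration.

```python
# Illustrative accounting of CPU-GPU traffic; all sizes are hypothetical.
BYTES_PER_REAL = 4  # single precision assumed


def transfer_bytes(shape, copies_per_call, calls):
    """Bytes moved across the CPU-GPU link for one array over a run."""
    n = 1
    for dim in shape:
        n *= dim
    return n * BYTES_PER_REAL * copies_per_call * calls


nx, ny, nz, calls = 256, 256, 64, 100  # hypothetical grid and radiation calls

# Naive port: a full 3D work array goes to the GPU and back on every call.
naive = transfer_bytes((nx, ny, nz), copies_per_call=2, calls=calls)

# Reorganized port: the 3D work array stays on the GPU; only a 2D
# field is copied back to the CPU after each call.
reorganized = transfer_bytes((nx, ny), copies_per_call=1, calls=calls)

ratio = naive // reorganized  # how many times less data is moved
```

Under these assumed sizes, the reduction in traffic scales with the number of vertical levels and the number of copies avoided per call, which is why trimming the dimensions of transferred arrays can matter more than tuning the kernels themselves.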

• Developed a hybrid radiation driver module that works for both GPU-based vector simulations and CPU-based 1D simulations.

• Ported the one-moment microphysics scheme to the GPU using lessons learned from porting the radiation component.

• The one-moment microphysics scheme was reorganized and ported to the GPU with OpenACC as well as CUDA Fortran.

• Implemented a two-moment scheme in the GCE model and tested its performance:

    • Developed a baseline case, identified the compute-intensive algorithms, and reorganized them for porting to the GPU using lessons learned from porting both the radiation component and the one-moment microphysics scheme.
    • Tested a tropical case for baseline (GCE) system integration.
    • Used a tropical convective precipitation system (TWP-ICE) case for baseline system integration, with 3D visualization from Ames and observations available for validation.

• Added parallel I/O capability to AsyncIO through MPI-IO. Experimental results show that this tool is very promising for improving I/O performance.
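
The idea behind asynchronous output with compression can be sketched as follows. The internal design of AsyncIO is not described in this abstract, so this is only an illustration under stated assumptions: it swaps in a Python background thread and `zlib` in place of MPI-IO and the project's compression mechanism, and the names `writer`, `run_model`, and `read_back` are hypothetical.

```python
import os
import queue
import tempfile
import threading
import zlib


def writer(q, path):
    """Drain the queue on a background thread, compressing each record."""
    with open(path, "wb") as f:
        while True:
            item = q.get()
            if item is None:  # sentinel: no more output
                break
            blob = zlib.compress(item)
            f.write(len(blob).to_bytes(8, "little"))  # length prefix
            f.write(blob)


def run_model(path, steps=5):
    """Stand-in time loop that hands output to the writer thread."""
    q = queue.Queue(maxsize=4)  # bounded so pending output cannot exhaust memory
    t = threading.Thread(target=writer, args=(q, path))
    t.start()
    for step in range(steps):
        field = bytes([step % 256]) * 100_000  # placeholder for a model field
        q.put(field)  # hand off, then continue computing
    q.put(None)
    t.join()


def read_back(path):
    """Read the length-prefixed, compressed records written above."""
    out = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            n = int.from_bytes(header, "little")
            out.append(zlib.decompress(f.read(n)))
    return out


path = os.path.join(tempfile.mkdtemp(), "fields.bin")
run_model(path)
records = read_back(path)
```

The compute loop blocks only when the bounded queue is full, so writing and compression overlap with computation; a production tool would presumably dedicate separate MPI ranks to output rather than a thread.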