Optimizing Data Movement And Achieving Performance Portability With Fine-Grained Data Blocking