Preprint Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

Cross-Feature Transfer Learning For Efficient Tensor Program Generation

Version 1: Received: 30 December 2023 / Approved: 3 January 2024 / Online: 3 January 2024 (08:38:03 CET)

A peer-reviewed article of this Preprint also exists.

Verma, G.; Raskar, S.; Emani, M.; Chapman, B. Cross-Feature Transfer Learning for Efficient Tensor Program Generation. Appl. Sci. 2024, 14, 513.

Abstract

Tuning tensor program generation involves navigating a vast search space to find optimal program transformations and measurements for a program on target hardware. The complexity of this process is further amplified by the exponential number of transformation combinations, especially in heterogeneous environments. This research addresses these challenges by introducing a novel approach that learns a joint neural network and hardware feature space, facilitating knowledge transfer to new, unseen target hardware. A comprehensive analysis is conducted on the existing state-of-the-art dataset, TenSet, including a thorough examination of test split strategies and the proposal of methodologies for dataset pruning. Leveraging an attention-inspired technique, we tailor the tuning of tensor programs to embed both neural network and hardware-specific features. Notably, our approach reduces the dataset size by up to 53% compared to the baseline without compromising Pairwise Comparison Accuracy (PCA). Furthermore, our proposed methodology achieves competitive or improved mean inference times with only 25-40% of the baseline tuning time across various networks and target hardware. The attention-based tuner can effectively utilize schedules learned from previous hardware program measurements to optimize tensor program tuning on previously unseen hardware, achieving a top-5 accuracy exceeding 90%. This research introduces a significant advancement in auto-tuning tensor program generation, addressing the complexities associated with heterogeneous environments and showcasing promising results regarding efficiency and accuracy.
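For reference, the PCA metric cited above scores a cost model by the fraction of candidate-program pairs whose relative latency order it predicts correctly. The Python sketch below is a minimal illustration of that metric under the standard definition, not the paper's implementation; the function name and example values are hypothetical.

def pairwise_comparison_accuracy(predicted, measured):
    """Fraction of candidate pairs whose relative latency order the
    cost model predicts correctly (higher is better).

    predicted, measured: sequences of latencies for the same candidates.
    """
    assert len(predicted) == len(measured)
    correct = total = 0
    for i in range(len(predicted)):
        for j in range(i + 1, len(predicted)):
            if measured[i] == measured[j]:
                continue  # measurement ties carry no ordering signal
            total += 1
            if (predicted[i] < predicted[j]) == (measured[i] < measured[j]):
                correct += 1
    return correct / total if total else 0.0

# Hypothetical example: 4 candidate schedules with one mis-ordered pair.
# The predictions rank them 1 < 2 < 3 < 4; the measurements swap the
# middle two, so 5 of 6 pairs agree -> PCA = 0.8333...
print(pairwise_comparison_accuracy([1.0, 2.0, 3.0, 4.0],
                                   [1.0, 3.0, 2.0, 4.0]))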

Keywords

auto-tuning; deep learning compilers; heterogeneous transfer learning; tensor program generation

Subject

Computer Science and Mathematics, Computer Science
