Kong, W., Liu, B., Bi, X., Pei, J., & Chen, Z. (2023). Instructional Mask AutoEncoder: A Scalable Learner for Hyperspectral Image Classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
Abstract
"Finding fresh water in the ocean of data" is a challenge that all deep learning domains struggle with, especially hyperspectral image analysis. As hyperspectral remote sensing technology advances by leaps and bounds, increasing amounts of hyperspectral images (HSIs) become available. In practice, however, these unlabeled HSIs cannot drive a supervised learning task, owing to extremely expensive labeling costs and unknown regions. Although learning-based methods have achieved remarkable performance due to their superior ability to represent features, they come at a cost: such methods are complex, inflexible, and difficult to adapt through transfer learning. In this paper, we propose the Instructional Mask AutoEncoder (IMAE), a simple and powerful self-supervised learner for HSI classification that uses a transformer-based masked autoencoder to extract general features of HSIs through a self-reconstruction pretext task. Moreover, we use metric learning to build an instructor that directs the model's attention toward regions of interest in the input, alleviating shortcomings of transformer-based models such as local attention distraction, lack of inductive bias, and the need for tremendous amounts of training data. In the downstream forward pass, instead of global average pooling, we employ a learnable aggregation to put the tokens to full use. The obtained results illustrate that our method effectively accelerates the convergence rate and improves performance on the downstream task.
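The abstract contrasts global average pooling with a learnable aggregation over transformer tokens. As a rough illustration of that distinction (not the paper's actual implementation, whose architecture and parameters are not given here), the following sketch compares plain averaging of token embeddings against a softmax-weighted aggregation driven by a hypothetical learnable query vector:

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_average_pooling(tokens):
    # Baseline: every token contributes equally to the pooled feature.
    dim = len(tokens[0])
    n = len(tokens)
    return [sum(tok[d] for tok in tokens) / n for d in range(dim)]

def learnable_aggregation(tokens, query):
    # Illustrative learnable pooling: `query` is an assumed trainable
    # vector; each token is scored by its dot product with the query,
    # and tokens are combined with softmax-normalized weights.
    scores = [sum(q * t for q, t in zip(query, tok)) for tok in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens))
            for d in range(dim)]

# Two toy 2-D token embeddings; the query favors the first dimension,
# so the first token receives a larger aggregation weight.
tokens = [[1.0, 0.0], [0.0, 1.0]]
print(global_average_pooling(tokens))          # equal weighting
print(learnable_aggregation(tokens, [1.0, 0.0]))  # biased toward token 0
```

Unlike uniform averaging, the weighted aggregation lets training reshape which tokens dominate the pooled representation, which is the motivation the abstract gives for replacing global average pooling.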
Environmental and Earth Sciences, Geochemistry and Petrology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.