Galaxy classification is essential for understanding the formation and evolution of cosmic structures. However, with the explosive growth of astronomical observation data, traditional single-modality classification methods that rely solely on spectroscopy or imaging struggle to meet high-precision demands owing to insufficient feature utilization and limited generalization capability. Multimodal fusion has therefore emerged as a promising direction, exploiting the complementarity of different data sources to overcome the limitations of any single one. This paper proposes Galaxy CosineNet (GCSNet), a model that integrates imaging, spectroscopic, and tabular data for high-precision galaxy classification. The model employs dedicated encoders to process the three modalities separately, uses skip connections to preserve raw features, and incorporates a multi-head self-attention mechanism to capture global cross-modal complementary information. The resulting features are concatenated and fed into a cosine similarity classification head. Experimental results demonstrate that GCSNet achieves 97.15% accuracy in classifying star-forming galaxies, composite galaxies, active galactic nuclei (AGNs), and normal galaxies, outperforming the best single-modal baseline, GaSNet, by 0.76% and mainstream multi-modal models such as MB-ISTL and the Transformer by over 1.6%. The proposed GCSNet thus offers an effective and novel approach for automatic galaxy classification.
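
To make the fusion pipeline concrete, the following is a minimal PyTorch sketch of the flow described above: per-modality encoders with skip connections, multi-head self-attention over the modality features, concatenation, and a cosine similarity classification head. All encoder designs, hidden sizes, input dimensions, and the learnable class-prototype head are illustrative assumptions for this sketch, not the actual GCSNet configuration reported in the paper.

```python
import torch
import torch.nn as nn

class GCSNetSketch(nn.Module):
    """Illustrative sketch of a three-modality fusion classifier.
    Layer choices and dimensions are assumptions, not GCSNet's exact design."""

    def __init__(self, img_dim=1024, spec_dim=3800, tab_dim=16,
                 hidden=256, num_heads=4, num_classes=4):
        super().__init__()
        # Dedicated per-modality encoders (placeholders for the real ones).
        self.img_enc = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.spec_enc = nn.Sequential(nn.Linear(spec_dim, hidden), nn.ReLU())
        self.tab_enc = nn.Sequential(nn.Linear(tab_dim, hidden), nn.ReLU())
        # Skip-connection projections that carry raw features forward.
        self.img_skip = nn.Linear(img_dim, hidden)
        self.spec_skip = nn.Linear(spec_dim, hidden)
        self.tab_skip = nn.Linear(tab_dim, hidden)
        # Multi-head self-attention over the three modality tokens.
        self.attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        # Cosine-similarity head: one learnable prototype per class,
        # compared against the concatenated (fused) feature vector.
        self.prototypes = nn.Parameter(torch.randn(num_classes, 3 * hidden))
        self.scale = nn.Parameter(torch.tensor(10.0))  # temperature

    def forward(self, img, spec, tab):
        # Encode each modality and add its skip connection.
        tokens = torch.stack([
            self.img_enc(img) + self.img_skip(img),
            self.spec_enc(spec) + self.spec_skip(spec),
            self.tab_enc(tab) + self.tab_skip(tab),
        ], dim=1)                                  # (B, 3, hidden)
        # Cross-modal interaction via self-attention, then concatenation.
        fused, _ = self.attn(tokens, tokens, tokens)
        fused = fused.flatten(1)                   # (B, 3 * hidden)
        # Cosine similarity between fused features and class prototypes.
        logits = self.scale * nn.functional.cosine_similarity(
            fused.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1)
        return logits                              # (B, num_classes)

# Example forward pass with random inputs of the assumed dimensions.
model = GCSNetSketch()
out = model(torch.randn(8, 1024), torch.randn(8, 3800), torch.randn(8, 16))
print(out.shape)  # torch.Size([8, 4])
```

One motivation for a cosine-similarity head of this kind is that both the fused feature vector and the class prototypes are effectively length-normalized, so classification depends on the direction of the feature vector rather than its magnitude; whether GCSNet uses learnable prototypes or another formulation is detailed in the paper itself.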