Human activity recognition (HAR) from smartphone and wearable sensor data is commonly addressed with statistical learning methods and deep neural networks. These often deliver strong predictive performance, but at the cost of limited interpretability and substantial computational and energy requirements. Such limitations reduce their suitability for practical sensing environments, where model decisions must be transparent, verifiable and executable on resource-constrained devices. In this work, we investigate the Convolutional Tsetlin Machine (CTM) for multimodal HAR on the UCI-HAR dataset. The Tsetlin Machine is a neuro-symbolic machine learning approach that offers two important advantages over many conventional methods: (i) it learns logic-based decision rules that are human-readable and formally verifiable, and (ii) it operates with comparatively low computational complexity, making it well suited to efficient, low-power on-device learning. The study systematically analyses the contribution of different feature modalities by decomposing the inertial signal space into semantically defined subsets according to (i) sensor source (accelerometer or gyroscope); (ii) physical component (body or gravity); and (iii) coordinate axis (x, y or z). A separate CTM classifier is trained for each modality subset and for their combination in order to determine the relative discriminative value of each modality group for activity classification. Beyond predictive performance, the study emphasizes the interpretability of the CTM, which expresses each decision as a set of propositional clauses and thereby enables visualization and direct inspection of the modality-specific patterns supporting each activity class. Owing to its symbolic structure and modest computational demands, the CTM provides a principled framework for designing explainable, resource-efficient and deployable HAR systems.
The proposed work therefore contributes toward trustworthy multimodal sensing by jointly addressing predictive performance, interpretability and suitability for embedded and mobile platforms.
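To make the modality decomposition concrete, the following is a minimal sketch of how a nine-channel window of inertial signals could be split along the three criteria (sensor source, physical component, coordinate axis). It assumes a hypothetical channel layout of accelerometer body x/y/z, accelerometer gravity x/y/z and gyroscope body x/y/z, with windows stored as a NumPy array shaped (n_windows, n_samples, n_channels). The raw UCI-HAR Inertial Signals provide total and body acceleration rather than a separate gravity channel, so the gravity channels are assumed here to be derived as total minus body acceleration. The names `CHANNELS`, `select_channels`, `modality_subsets` and `slice_modality` are illustrative and not taken from the study; booleanization and CTM training for each subset are not shown.

```python
import numpy as np

# Hypothetical channel layout: one (sensor, component, axis) tuple per channel.
# Order: acc/body x,y,z -> acc/gravity x,y,z -> gyro/body x,y,z (indices 0..8).
CHANNELS = [
    (sensor, component, axis)
    for sensor, component in (("acc", "body"), ("acc", "gravity"), ("gyro", "body"))
    for axis in ("x", "y", "z")
]


def select_channels(sensor=None, component=None, axis=None):
    """Return indices of channels matching every criterion that is not None."""
    return [
        i
        for i, (s, c, a) in enumerate(CHANNELS)
        if (sensor is None or s == sensor)
        and (component is None or c == component)
        and (axis is None or a == axis)
    ]


def modality_subsets():
    """Build named channel groups along each criterion, plus the full combination."""
    groups = {"all": list(range(len(CHANNELS)))}
    for sensor in ("acc", "gyro"):
        groups[f"sensor={sensor}"] = select_channels(sensor=sensor)
    for component in ("body", "gravity"):
        groups[f"component={component}"] = select_channels(component=component)
    for axis in ("x", "y", "z"):
        groups[f"axis={axis}"] = select_channels(axis=axis)
    return groups


def slice_modality(windows, indices):
    """Keep only the selected channels of windows shaped (n_windows, n_samples, n_channels)."""
    return windows[:, :, indices]


if __name__ == "__main__":
    windows = np.zeros((4, 128, 9))  # placeholder data, not real UCI-HAR windows
    for name, idx in modality_subsets().items():
        print(name, idx, slice_modality(windows, idx).shape)
```

In this layout the gravity component exists only for the accelerometer, so the "component=body" group contains six channels (accelerometer and gyroscope) while "component=gravity" contains three. Each resulting subset would then be fed to its own classifier, mirroring the one-classifier-per-modality design described above.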