3.4. Feature Extraction
Each review is represented by a 15-dimensional feature vector. The feature construction starts from 11 conventional fake review features. Four of them are retained as traditional single market features because they directly describe basic review properties: rating, length, sentiment rating consistency, and review burst behavior. Two are enhanced, because simple lexicon-based or surface-based measurements are insufficient to capture semantic sentiment and language complexity in app reviews. Several conventional features are then extended to the cross market setting to capture repeated content, rating discrepancies, ranking discrepancies, and review growth differences across markets. Finally, three new cross market features are introduced to describe sentiment inconsistency and temporal coordination patterns that can only be observed after aligning the same app across multiple markets.
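For the $i$-th review $r_i$ of an app, the feature vector is denoted as $\mathbf{x}_i = [\mathbf{x}_i^{\mathrm{trad}}; \mathbf{x}_i^{\mathrm{enh}}; \mathbf{x}_i^{\mathrm{cross}}] \in \mathbb{R}^{15}$, where $\mathbf{x}_i^{\mathrm{trad}} \in \mathbb{R}^{4}$, $\mathbf{x}_i^{\mathrm{enh}} \in \mathbb{R}^{2}$, and $\mathbf{x}_i^{\mathrm{cross}} \in \mathbb{R}^{9}$ denote traditional single market features, enhanced single market features, and cross market features, respectively.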
Figure 3 summarizes the feature construction process.
Single Market Features. Single market features provide basic evidence from individual reviews and local review activity within a single market. For review $r_i$, $a_i$ denotes its app, $y_i$ denotes the original rating, and $\tilde{y}_i$ denotes the normalized rating.
App rating score. The app rating score is taken from the rating attached to review $r_i$.
Sentiment rating inconsistency. Fake reviews may exhibit a mismatch between the sentiment expressed in the review text and the assigned rating. Such inconsistency can indicate abnormal review behavior or mechanically generated promotional content. Therefore, the sentiment rating inconsistency feature is defined as the absolute difference $|s_i - \tilde{y}_i|$, where $s_i$ denotes the sentiment score of review $r_i$ and $\tilde{y}_i$ denotes its normalized rating. A larger value indicates a stronger mismatch between textual sentiment and numerical rating.
App review length. The app review length feature is derived from $\ell_i$, where $\ell_i$ denotes the character length of review $r_i$.
App review count anomaly. Fake reviews often appear in short bursts, leading to abnormal changes in the daily review count. The feature is calculated using the isolation forest and is defined as $s(x, n) = 2^{-E(h(x))/c(n)}$, where $x$ is the daily review count, $E(h(x))$ denotes the expected path length of $x$ through the isolation trees, and $c(n)$ is the average path length for a sample size of $n$. A larger anomaly score indicates a stronger deviation from normal review volume patterns.
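As an illustration of the anomaly score described above, the following minimal sketch maps an expected path length to a score using the standard isolation forest normalization. The function names are hypothetical, and the harmonic number is approximated with the Euler-Mascheroni constant; in practice the expected path length would come from a fitted forest (for example, a library implementation), which is not shown here.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """Average path length of an unsuccessful binary search tree lookup
    over n points, used to normalize isolation tree path lengths."""
    if n < 2:
        raise ValueError("sample size must be at least 2")
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(expected_path_length: float, n: int) -> float:
    """Isolation forest score: 2 ** (-E(h(x)) / c(n)).
    Short paths (easy to isolate) give scores close to 1."""
    return 2.0 ** (-expected_path_length / average_path_length(n))
```

Under this normalization, a daily review count whose expected path length equals the average path length receives a score of 0.5, and shorter paths push the score toward 1.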
Enhanced Single Market Features. These features characterize the semantic and linguistic properties of individual reviews.
Sentiment score. Promotional fake reviews often contain strong emotional expressions to influence user installation decisions. The sentiment score is obtained using a Chinese sentiment classifier based on StructBERT-base-chinese [42], trained on four review datasets with 115K samples. The feature is defined as $s_i$, where $s_i$ denotes the predicted sentiment score of review $r_i$. Larger values indicate more positive sentiment.
Table 1 gives representative examples.
Language complexity. Fake reviews may contain rigid templates, repeated promotional wording, or unnatural expressions that differ from ordinary user feedback. We measure language complexity using a RoBERTa language assessment model [43]. For review $r_i$, the feature is defined as $\mathrm{PPL}_i = \exp(-\bar{L}_i)$ with $\bar{L}_i = \frac{1}{N} \sum_{k=1}^{N} \log p(w_k)$, where $\bar{L}_i$ denotes the normalized log probability of the review, $w_k$ denotes the token at position $k$, $p(w_k)$ is the probability the model assigns to that token, $N$ is the number of tokens, and $\mathrm{PPL}_i$ is the resulting perplexity value. In implementation, $\mathrm{PPL}_i$ is clipped to a fixed range for numerical stability, with larger values indicating lower language model likelihood and higher linguistic complexity. We keep the clipped perplexity value without additional normalization because its absolute scale reflects the output range of the language assessment model and preserves magnitude differences among reviews.
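The perplexity computation can be sketched as follows, assuming the per-token log probabilities have already been obtained from the language model. The function name and the clipping bounds `lo` and `hi` are hypothetical placeholders, not the values used in this work; clipping is applied in log space so that very unlikely reviews do not overflow the exponential.

```python
import math
from typing import Sequence


def clipped_perplexity(token_log_probs: Sequence[float],
                       lo: float = 1.0,
                       hi: float = 1000.0) -> float:
    """Perplexity from natural-log token probabilities, clipped to [lo, hi].
    The default bounds are illustrative assumptions only."""
    if not token_log_probs:
        raise ValueError("review must contain at least one token")
    # Normalized log probability: mean of the token log probabilities.
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    # Clip the exponent instead of the result to avoid overflow in exp().
    log_ppl = min(max(-mean_log_prob, math.log(lo)), math.log(hi))
    return math.exp(log_ppl)
```

For instance, a review whose tokens each receive probability 0.25 has a perplexity of about 4, while a review with extremely low token probabilities saturates at the upper bound.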
Table 1 gives representative examples.
Cross Market Features. These features capture discrepancy and synchronization patterns of the same app across multiple markets. For review $r_i$, all cross market features are computed over its associated app $a_i$ and then assigned to $r_i$.
Cross market sentiment discrepancy. This feature measures the variation of average sentiment across markets for app $a_i$. It is defined as the dispersion of the per-market averages $\bar{s}_j$ around the cross market mean $\bar{s}$, where $\bar{s}_j = \frac{1}{n_j} \sum_{k=1}^{n_j} s_{j,k}$ is the average sentiment score of app $a_i$ in market $j$, $s_{j,k}$ is the sentiment score of the $k$-th review of app $a_i$ in market $j$, $n_j$ is the number of reviews of app $a_i$ in market $j$, $\bar{s}$ is the cross market mean sentiment, and $M$ is the number of markets where app $a_i$ is listed. A larger value indicates stronger sentiment discrepancy across markets.
Cross market temporal variance of duplicate app reviews. This feature measures whether duplicated reviews are posted synchronously across markets. It is defined as $\frac{1}{|T_i|} \sum_{t \in T_i} (t - \bar{t}_i)^2$, where $T_i$ is the set of timestamps of reviews whose content is identical to review $r_i$, and $\bar{t}_i$ is their average timestamp. If no duplicated review is observed, the feature is set to 0.
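A minimal sketch of this temporal variance is given below. It assumes timestamps are numeric (for example, days or seconds since a reference date) and uses the population variance; both choices, as well as the function name, are assumptions made for illustration.

```python
from typing import Sequence


def temporal_variance(timestamps: Sequence[float]) -> float:
    """Variance of the posting times of matching reviews.
    Returns 0.0 when no matching review is observed."""
    if not timestamps:
        return 0.0
    mean_time = sum(timestamps) / len(timestamps)
    return sum((t - mean_time) ** 2 for t in timestamps) / len(timestamps)
```

A value near 0 with a non-empty timestamp set means that the matching reviews were posted at almost the same time, which is the synchronized pattern this feature is intended to expose.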
Cross market count of duplicate app reviews. This feature captures exact content reuse across markets. It is defined by the count $d_i$, where $d_i$ denotes the number of reviews whose content is identical to review $r_i$ across all markets where app $a_i$ is listed.
Cross market count of similar app reviews. This feature captures cross market review reuse with shared templates or minor textual variations. It is defined as $\sum_{r' \in R_{a_i}} \mathbb{1}[\mathrm{sim}(r_i, r') \geq \tau]$, where $R_{a_i}$ denotes the set of reviews of app $a_i$ across all markets, $\mathrm{sim}(\cdot, \cdot)$ is the Hamming-distance-based similarity score, $\tau$ is the similarity threshold, and $\mathbb{1}[\cdot]$ is the indicator function.
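One possible way to realize a Hamming-distance-based similarity on variable-length text is to compare fixed-width fingerprints. The sketch below assumes 64-bit SimHash fingerprints over character bigrams and a hypothetical threshold `tau`; the fingerprinting scheme, the threshold value, and all function names are illustrative assumptions rather than the configuration used in this work.

```python
import hashlib
from typing import Iterable

BITS = 64  # fingerprint width (assumption)


def simhash(text: str, n: int = 2) -> int:
    """64-bit SimHash fingerprint built from character n-grams."""
    grams = [text[k:k + n] for k in range(max(len(text) - n + 1, 1))]
    weights = [0] * BITS
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for b in range(BITS):
            weights[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(BITS) if weights[b] > 0)


def hamming_similarity(a: int, b: int) -> float:
    """1 minus the normalized Hamming distance between two fingerprints."""
    return 1.0 - bin(a ^ b).count("1") / BITS


def similar_review_count(target: str, reviews: Iterable[str],
                         tau: float = 0.9) -> int:
    """Number of reviews whose similarity to the target is at least tau."""
    fingerprint = simhash(target)
    return sum(
        1 for review in reviews
        if hamming_similarity(fingerprint, simhash(review)) >= tau
    )
```

Whether the target review itself is included in the candidate pool is left to the caller in this sketch.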
Cross market app ranking discrepancy. This feature measures whether the same app exhibits inconsistent ranking dynamics across markets. It is defined as $F = MS_B / MS_W$, where $MS_B$ denotes the mean square between markets and $MS_W$ denotes the mean square within markets, both computed from ANOVA over the daily ranking changes of app $a_i$. The daily ranking change is $\Delta \rho_{j,t} = \rho_{j,t} - \rho_{j,t-1}$, where $\rho_{j,t}$ denotes the ranking of app $a_i$ in market $j$ at time $t$. A larger value indicates stronger cross market divergence in ranking dynamics.
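The ratio of the between-market to the within-market mean square can be sketched with a plain one-way ANOVA, treating each market as a group of daily ranking changes. The function names are hypothetical, and the handling of the zero within-market variance case is an assumption added for robustness.

```python
from typing import List, Sequence


def daily_changes(ranks: Sequence[float]) -> List[float]:
    """Differences between consecutive daily rankings in one market."""
    return [later - earlier for earlier, later in zip(ranks, ranks[1:])]


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic: mean square between groups divided by
    mean square within groups. Each group must be non-empty."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0.0:
        # Assumption: no within-market variation at all.
        return float("inf") if ms_between > 0.0 else 0.0
    return ms_between / ms_within
```

As a usage example, `anova_f([daily_changes(r) for r in rankings_by_market])` would give the statistic for one app, where `rankings_by_market` is a hypothetical list holding one daily ranking series per market.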
Cross market app review length discrepancy. This feature measures whether review verbosity differs across markets. It is defined as the dispersion of $\bar{\ell}_j$ across the $M$ markets, where $\bar{\ell}_j$ is the average review length of app $a_i$ in market $j$, and $M$ is the number of markets where app $a_i$ is listed.
Cross market app burst discrepancy. This feature captures abnormal short term growth in review volume across markets. For app $a_i$ in market $j$, the burst ratio is defined as $b_{j,t} = n_{j,t} / \bar{n}_j$, where $n_{j,t}$ is the number of reviews of app $a_i$ in market $j$ on day $t$, and $\bar{n}_j$ is the average daily review count. The feature summarizes how much these burst ratios differ across markets. A higher value indicates stronger inconsistency in burst review behaviors across markets.
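The per-market burst ratio, the daily review count divided by the average daily review count, can be sketched as below. Only the ratio is shown; the cross market aggregation is not reproduced here. The function name and the convention of returning zeros for a market without reviews are assumptions.

```python
from typing import List, Sequence


def burst_ratios(daily_counts: Sequence[int]) -> List[float]:
    """Daily review count divided by the mean daily count in one market."""
    if not daily_counts:
        return []
    mean_count = sum(daily_counts) / len(daily_counts)
    if mean_count == 0:
        # Assumption: a market with no reviews shows no burst.
        return [0.0] * len(daily_counts)
    return [count / mean_count for count in daily_counts]
```

For daily counts of 1, 1, and 4 reviews, the average is 2, so the third day has a burst ratio of 2.0.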
Cross market app rating discrepancy. This feature measures the variation of app ratings across markets. It is defined as $\sigma_{a_i}$, where $\sigma_{a_i}$ is the standard deviation of the average ratings of app $a_i$ across all markets. A larger value indicates stronger rating inconsistency across markets.
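Cross market temporal variance of similar app reviews. This feature extends the temporal variance of duplicate app reviews from exact duplicates to semantically or lexically similar reviews. Review similarity is computed using a Hamming-distance-based similarity score with a threshold $\tau$. The feature is defined as $\frac{1}{|T_i^{\mathrm{sim}}|} \sum_{t \in T_i^{\mathrm{sim}}} (t - \bar{t}_i^{\mathrm{sim}})^2$, where $T_i^{\mathrm{sim}}$ is the timestamp set of reviews similar to $r_i$, and $\bar{t}_i^{\mathrm{sim}}$ is their average timestamp. If no similar review is observed, the feature is set to 0.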