Submitted: 16 April 2025
Posted: 21 April 2025
Abstract
Keywords:
1. Introduction
2. Methods
- Effective identification of relevant patterns is made possible by a hierarchical arrangement of transactions.
- Each meta-pattern is assigned a stability score, which is used to filter out weak or inconsistent relations.
- DMAR examines the real statistical relationship between items instead of depending just on frequency-based measurements.
- This approach removes arbitrary or minor patterns that usually coexist without any important correlation.
- DMAR dynamically changes this criterion depending on the variance of mutual information values, unlike conventional algorithms using a fixed support threshold.
- This adaptive filtering lowers noise in the result and increases the relevance of obtained rules.
- Candidate rules are extracted from the validated meta-patterns.
- Each rule is assessed using standard measures, including confidence, so that only strong relationships are kept.
- A dynamic filtering mechanism retains only the most important rules.
- DMAR proposes an additional filter designed to supplement traditional criteria (support, confidence) with an intra-antecedent analysis. This method concentrates on the extraction of meaningful coherent rules and tries to improve the semantic quality of the associations discovered.
2.1. Extracting Meta-Patterns
2.2. Mutual Information-Based Filtering
- If I(A, B) > 0, then A and B are positively associated: they co-occur more often than independence would predict.
- If I(A, B) ≈ 0, then A and B are approximately independent, even if they frequently appear together.
- If I(A, B) < 0, then the presence of A lowers the probability of the presence of B, which is rarely relevant in association rule mining.
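The three cases above can be illustrated with a small sketch. It assumes that I(A, B) is the pointwise form log2(P(A, B) / (P(A) P(B))), since that form can take negative values; the function name `pointwise_mi` and the toy transactions are hypothetical and are not taken from the paper.

```python
import math

def pointwise_mi(transactions, a, b):
    """Estimate log2(P(a, b) / (P(a) * P(b))) from transaction frequencies."""
    n = len(transactions)
    p_a = sum(a in t for t in transactions) / n
    p_b = sum(b in t for t in transactions) / n
    p_ab = sum(a in t and b in t for t in transactions) / n
    if p_ab == 0:
        return float("-inf")  # the two items never co-occur
    return math.log2(p_ab / (p_a * p_b))

# Hypothetical toy transactions, chosen only to show the three cases.
transactions = [
    {"A", "B", "X", "Y", "P", "Q"},
    {"A", "B", "X", "P", "Q"},
    {"C", "Y", "P"},
    {"C", "Q"},
]

print(pointwise_mi(transactions, "A", "B"))  # positive: A and B always appear together
print(pointwise_mi(transactions, "X", "Y"))  # zero: X and Y co-occur exactly as often as chance predicts
print(pointwise_mi(transactions, "P", "Q"))  # negative: P and Q co-occur less often than chance predicts
```

Note that P and Q still appear together in half of the transactions, so a purely frequency-based measure would not flag the pair as negatively associated.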
2.3. Adaptive and Dynamic Thresholding
2.4. Association Rule Generation
2.5. Refining Association Rules
- The numerator Support(X∪Y) measures the simultaneous frequency of the sets X and Y.
- The denominator represents the total frequency of X occurring with all its potential outcomes.
- Hypothesis: Let D be a collection of transactions over a set of items I, and let a set of association rules of the form $X \Rightarrow Y_i$ be derived, where X ⊆ I and $\{Y_1, Y_2, \ldots, Y_n\}$ is the collection of consequents obtained for the antecedent X. Support(A) denotes the count (or percentage) of transactions containing itemset A, and $X \cup Y_i$ denotes the co-occurrence of X with the consequent $Y_i$. The aim is to quantify how strongly X concentrates on a target Y within the complete set of its consequents.
- Justification by analogy with a normalized distribution:
  - Partial support vs. conditioned total support: The measure Support(X∪Y) gives the absolute frequency of the co-occurrence of X and Y. Conversely, the sum $\sum_{i=1}^{n} \mathrm{Support}(X \cup Y_i)$ is the overall frequency with which X occurs across all rules produced, regardless of the consequent.
  - Specific vs. global ratio: The ratio of these two quantities measures how often X occurs together with Y, relative to all the cases in which X produces some $Y_i$:

$$\mathrm{TCM}(X \Rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\sum_{i=1}^{n} \mathrm{Support}(X \cup Y_i)} \qquad (3)$$

Properties justifying the validity of the formula:
- The TCM, as a ratio of frequencies, is bounded in the interval [0, 1], allowing for a consistent interpretation across rules.
- Maximality (TCM = 1) is obtained when X is associated only with Y, which shows that the measure detects exclusive concentration.
- Minimality: TCM approaches 0 when X relates only weakly to Y compared with its other consequents, so the formula reflects logical dispersion.
- Invariance with respect to database size holds because the TCM relies on relative frequencies, so it remains valid irrespective of the number of transactions or the density of the dataset.
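As a minimal sketch of these properties, the snippet below computes the TCM as Support(X∪Y) divided by the summed support of X with all of its consequents, following the numerator and denominator described above. The function name `tcm_scores`, the input layout, and the support counts are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def tcm_scores(rule_supports):
    """Map each rule (X, Y) to Support(X ∪ Y) / sum of Support(X ∪ Y_i) over all consequents of X.

    rule_supports: dict {(antecedent, consequent): support count of X ∪ Y}.
    """
    totals = defaultdict(float)
    for (x, _y), support in rule_supports.items():
        totals[x] += support
    return {
        (x, y): support / totals[x]
        for (x, y), support in rule_supports.items()
        if totals[x] > 0  # skip antecedents whose rules all have zero support
    }

# Hypothetical support counts for rules sharing an antecedent.
rule_supports = {
    ("bread", "butter"): 6,
    ("bread", "jam"): 3,
    ("bread", "tea"): 1,
    ("pasta", "sauce"): 4,  # "pasta" has a single consequent
}

scores = tcm_scores(rule_supports)
for rule, score in scores.items():
    print(rule, round(score, 3))
```

In this toy input the scores of the three "bread" rules sum to 1, and the single "pasta" rule reaches the maximal value, which matches the boundedness and maximality properties. Multiplying every count by the same factor leaves the scores unchanged, which illustrates the invariance property.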
| Algorithm 1: DMAR (Dynamic Mining of Association Rules) |
|---|
| Input: |
| Output: |
| 1.1 Initialize a transaction tree T. |
| 1.2 For each transaction t in DB: insert t into T by merging similar items. |
| 1.3 Extract frequent meta-patterns X with support ≥ the minimum support threshold. |
| 1.4 For each meta-pattern X: |
| 2.1 For each meta-pattern X = {A, B, C, ...}: |
| 2.2 Calculate the average μ and standard deviation σ of the MI values of the meta-patterns X. |
| 2.3 Calculate a dynamic adaptive threshold for each meta-pattern X: |
| Retain X if its MI value meets this threshold. |
| 3.1 Initialize R (the set of association rules of the form X ⇒ Y). |
| 3.2 For each meta-pattern X: |
| 4.1 Calculate the average and standard deviation of the mutual information values of the generated rules. |
| 4.2 Define a dynamic adaptive threshold to filter rules: |
| 4.3 Apply this threshold to dynamically filter the generated association rules R. |
| 5.1 Group all generated association rules R by their antecedent X. |
| 5.2 For each group of rules with the same antecedent X: |
| 5.3 Build the list of TCM values {TCM(r) for all r ∈ R}. |
| 5.4 Compute μ, the average, and σ, the standard deviation, of this list. |
| 5.5 Compute the dynamic threshold θ = μ + λ × σ. |
| 5.6 Keep the rules {r ∈ R : TCM(r) ≥ θ}. |
| Return the retained rules. |
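The final filtering steps (5.3 to 5.6) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `filter_by_dynamic_threshold`, the rule labels, and the TCM values are hypothetical, and the population standard deviation (`statistics.pstdev`) is assumed because the algorithm does not say which deviation it uses.

```python
from statistics import mean, pstdev

def filter_by_dynamic_threshold(tcm_by_rule, lam):
    """Keep rules whose TCM is at least theta = mu + lam * sigma.

    tcm_by_rule: non-empty dict {rule label: TCM value}.
    lam: the selectivity parameter (lambda in the text).
    """
    values = list(tcm_by_rule.values())
    theta = mean(values) + lam * pstdev(values)  # assumption: population deviation
    kept = {rule: v for rule, v in tcm_by_rule.items() if v >= theta}
    return kept, theta

# Hypothetical TCM values for five rules.
tcm_by_rule = {"r1": 0.9, "r2": 0.6, "r3": 0.3, "r4": 0.1, "r5": 0.1}

for lam in (0.5, 1.0, 1.5):
    kept, theta = filter_by_dynamic_threshold(tcm_by_rule, lam)
    print(lam, round(theta, 3), sorted(kept))
```

Because θ grows with λ, a larger λ can only keep the same rules or fewer, which is how the parameter trades rule count against selectivity.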
3. Results
- Number of rules generated: whether the method produces a concise yet pertinent and significant collection of association rules.
- Effect of TCM measurement on the logical integrity of rules.
- Computational efficiency: Execution time and memory consumption compared to other methods.
3.1. Dataset Overview
- Mushroom Dataset: This dense dataset contains descriptions of ~8,124 mushroom samples and ~119 distinct items. Categorical attributes were converted into transactions, with each attribute value treated as an item.
- Adult Dataset: Also called the “Census Income” dataset, this dense dataset contains ~48,842 instances with ~95 attributes and is used to predict whether a person’s annual income is above $50,000. The numerical attributes were discretized into classes and then translated into transactions.
- Online Retail II Dataset: This sparse dataset includes all the transactions carried out between December 1, 2009, and December 9, 2011, by a United Kingdom-based online retailer. It has ~53,628 records with ~5,305 attributes. It is typically used for sales pattern detection, customer segmentation, and market basket analysis.
- Retail Dataset: This sparse dataset, available at the SPMF website [30], contains ~88,162 retail transactions and ~16,470 items, and is widely used in frequent pattern mining and association rule mining.
3.2. Analysis of Results
3.2.1. Number of Rules Generated
- At λ=0.5, DMAR produces roughly 40–54% fewer rules than Apriori or FP-Growth, depending on the dataset.
- For λ=1.0, this reduction is approximately 65–76%.
- With λ=1.5, only about 10 to 17% of the initial rules remain.
- Mushroom (defined by density and structure): even at λ=1.5, a high number of rules (615) is still retained, which indicates that DMAR preserves the logical consistency of frequent associations while reducing redundancy.
- Adult (large base, low density): the contrast between DMAR and Apriori is especially significant. DMAR removes rules that result from opportunistic but not very specific combinations, thereby demonstrating the value of TCM in low semantic concentration data.
- Retail and Online Retail II (sparse transactions, long tail): DMAR with λ=1.5 keeps only a core of very specific rules, which suits targeted recommendation systems, where relevance matters more than abundance.
- Structural meta-pattern filtering.
- Dynamic mutual information filtering (elimination of uninformative meta-patterns).
- Confidence filtering using a threshold computed dynamically from the data.
- Filtering by TCM, with a threshold θ controlled by λ. Even at λ=0.5, DMAR generates considerably less logical noise while still retaining exploitable rules.
- λ = 0.5 → for exploratory tasks, visualization, or human decision support.
- λ = 1.0 → for balanced use cases (recommendation, labeling, intermediate classification).
- λ = 1.5 → for high-stakes scenarios (diagnosis, fraud detection, highly targeted recommendations), where only the highest possible rule quality is acceptable.
3.2.2. Impact of the TCM Measure on the Logical Quality of the Rules
3.2.3. Execution Time
- As a result: On dense datasets (like Mushroom), FP-Growth is efficient. However, on large, sparse databases such as Retail or Online Retail II, many more rules are generated, causing a significant rise in execution time. Conversely, DMAR integrates multiple mechanisms that reduce volume from the outset:
- Targeted extraction by meta-patterns.
- Dynamic mutual information semantic filtering.
- Selection by confidence, also under dynamic control.
- Logical filtering by TCM, with a dynamic threshold.
3.3. Comparison with Related Algorithms
- Execution time: DMAR consistently has shorter execution times than FP-Growth and, more markedly, than Apriori. This is because of its staged filtering design (meta-patterns → MI → confidence → TCM), which avoids generating large numbers of poor or irrelevant rules and thereby reduces the computational overhead. FP-Growth handles small datasets efficiently, but its performance deteriorates as dataset size or sparsity grows. Apriori incurs the largest costs because it generates and examines all potential itemsets.
- Scalability: DMAR scales well, especially on large databases such as Retail or Online Retail II. Its efficiency comes from early pruning of the search space, which keeps it robust to combinatorial growth. FP-Growth shows linear growth in memory usage and processing time as the number of frequent patterns grows. Apriori, characterized by its exponential complexity, very rapidly becomes infeasible for large real-world datasets.
- Logical quality of extracted rules: One significant benefit of DMAR is that it can generate rules of high logical quality. Through the use of the TCM measure, DMAR favors rules in which an antecedent concentrates on a single consequent, which minimizes overly broad rules and ambiguity. FP-Growth and Apriori, in contrast, lack semantic filtering and tend to generate numerous non-discriminative rules.
- Integrated semantic filtering: DMAR incorporates a multi-criteria filtering strategy in which rules are evaluated according to their structural form (meta-patterns), mutual information, contextual confidence, and logical concentration as computed by TCM. FP-Growth and Apriori consider only support and confidence and disregard the semantic context.
- Reduction of redundancy: By using several filtering stages, DMAR effectively restricts the production of redundant rules. This is vital for concise, readable, and actionable outputs. Apriori and FP-Growth, however, tend to produce numerous variations of rules carrying the same information, which complicates subsequent processing.
- User control (dynamic λ): The parameter λ gives the user a convenient way to adjust the level of selectivity. It manages the trade-off between the number of rules and the targeted level of quality, which is not possible with conventional techniques.
4. Discussion
5. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| DMAR | Dynamic Mining of Association Rules |
| MI | Mutual information |
| TCM | Target Concentration Measure |
References
1. Frawley, W.J.; Piatetsky-Shapiro, G.; Matheus, C.J. Knowledge discovery in databases: an overview. AI Mag. 1992, 13, 57–70.
2. Agrawal, R.; Imieliński, T.; Swami, A. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, 25–28 May 1993; pp. 207–216.
3. Agrawal, R.; Srikant, R. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago de Chile, Chile, 12–15 September 1994; pp. 487–499.
4. Han, J.; Pei, J.; Yin, Y. Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 16–18 May 2000; pp. 1–12.
5. Zaki, M.J. Fast Mining of Sequential Patterns in Very Large Databases; University of Rochester, Department of Computer Science: Rochester, NY, USA, 1997.
6. Liu, Y.; Liao, W.K.; Choudhary, A. A two-phase algorithm for fast discovery of high utility itemsets. In Advances in Knowledge Discovery and Data Mining; Dai, H., Srikant, R., Zhang, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 689–695.
7. Ahmed, C.F.; Tanbeer, S.K.; Jeong, B.S.; Lee, Y.K. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans. Knowl. Data Eng. 2009, 21, 1708–1721.
8. Tan, P.-N.; Kumar, V.; Srivastava, J. Selecting the right interestingness measure for association patterns. Inf. Syst. 2004, 29, 293–313.
9. Brin, S.; Motwani, R.; Ullman, J.D.; Tsur, S. Dynamic itemset counting and implication rules for market basket data. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, Tucson, AZ, USA, 13–15 May 1997; pp. 255–264.
10. Geng, L.; Hamilton, H.J. Interestingness measures for data mining: A survey. ACM Comput. Surv. 2006, 38, 1–39.
11. Silberschatz, A.; Tuzhilin, A. What makes patterns interesting in knowledge discovery systems? IEEE Trans. Knowl. Data Eng. 1996, 8, 970–974.
12. Lavrač, N.; Flach, P.; Zupan, B. Rule evaluation measures: A unifying view. In Proceedings of the 9th International Workshop on Inductive Logic Programming (ILP 1999), Bled, Slovenia, 24–27 June 1999; Flach, P., Lavrač, N., Eds.; Springer: Berlin/Heidelberg, Germany, 1999; Volume 1634, pp. 174–185.
13. Hilderman, R.J.; Hamilton, H.J. Knowledge Discovery and Interestingness Measures: A Survey; University of Regina: Regina, SK, Canada, 2001.
14. Lenca, P.; Meyer, P.; Vaillant, B.; Lallich, S. On selecting interestingness measures for association rules: User oriented description and multiple criteria decision aid. Eur. J. Oper. Res. 2008, 184, 610–626.
15. Mudumba, B.; Kabir, M.F. Mine-first association rule mining: An integration of independent frequent patterns in distributed environments. Decis. Anal. J. 2024, 10, 100434.
16. Pinheiro, C.; Guerreiro, S.; Mamede, H.S. A survey on association rule mining for enterprise architecture model discovery: State of the art. Bus. Inf. Syst. Eng. 2024, 66, 777–798.
17. Antonello, F.; Baraldi, P.; Zio, E.; Serio, L. A novel metric to evaluate the association rules for identification of functional dependencies in complex technical infrastructures. Environ. Syst. Decis. 2022, 42, 436–449.
18. Alhindawi, N. Metrics-based exploration and assessment of classification and association rule mining techniques: A comprehensive study. In Studies in Systems, Decision and Control; Springer: Cham, Switzerland, 2024; Volume 503, pp. 171–184.
19. Pasquier, N.; Bastide, Y.; Taouil, R.; Lakhal, L. Discovering frequent closed itemsets for association rules. In Proceedings of the 7th International Conference on Database Theory (ICDT), Jerusalem, Israel, 10–12 January 1999; pp. 398–416.
20. Zaki, M.J. Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 2000, 12, 372–390.
21. García, S.; Luengo, J.; Herrera, F. Data Preprocessing in Data Mining; Springer International Publishing: Cham, Switzerland, 2015.
22. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006.
23. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
24. Casella, G.; Berger, R.L. Statistical Inference, 2nd ed.; Duxbury Press: Pacific Grove, CA, USA, 2002.
25. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning, 2nd ed.; Springer: Cham, Switzerland, 2009.
26. Wackerly, D.D.; Mendenhall, W.; Scheaffer, R.L. Mathematical Statistics with Applications, 7th ed.; Cengage Learning: Boston, MA, USA, 2014.
27. Freund, J.E.; Perles, B.M. Statistics: A First Course, 8th ed.; Pearson: Upper Saddle River, NJ, USA, 2007.
28. Han, J.; Pei, J.; Kamber, M. Data Mining: Concepts and Techniques, 3rd ed.; Morgan Kaufmann: San Francisco, CA, USA, 2011.
29. UCI Machine Learning Repository; University of California, Irvine, School of Information and Computer Sciences: Irvine, CA, USA, 2017. Available online: https://archive.ics.uci.edu/ (accessed on 10 April 2025).
30. SPMF: A Java open-source pattern mining library. J. Mach. Learn. Res. 2016, 15, 3569–3573. Available online: https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php (accessed on 10 April 2025).
31. Aggarwal, C.C.; Yu, P.S. A new framework for itemset generation. In Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2001), Santa Barbara, CA, USA, 21–23 May 2001; pp. 18–24.
32. Liu, B.; Hsu, W.; Ma, Y. Integrating classification and association rule mining. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD '98), New York, NY, USA, 27–31 August 1998; pp. 80–86.




| Meta-Pattern | Support(X) | Mutual Information | Dynamic Threshold | Retained? |
|---|---|---|---|---|
| {B, C} | 4 | -0.039 | 0.1 | No |
| {B, E} | 4 | 0.175 | 0.1 | Yes |
| {A, C} | 3 | 0.131 | 0.1 | Yes |
| {B, C, E} | 3 | 0.292 | 0.1 | Yes |
| {C, E, A} | 1 | 0.000 | 0.1 | No |
| {B, C, E, A} | 1 | -0.078 | 0.1 | No |
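The last column of the table above can be reproduced with a short check. The sketch assumes the retention rule is "mutual information at or above the dynamic threshold", which is consistent with every row of the table but is not stated explicitly there; the variable names are hypothetical.

```python
# (meta-pattern, mutual information) pairs copied from the table above.
rows = [
    ("{B, C}", -0.039),
    ("{B, E}", 0.175),
    ("{A, C}", 0.131),
    ("{B, C, E}", 0.292),
    ("{C, E, A}", 0.000),
    ("{B, C, E, A}", -0.078),
]
THRESHOLD = 0.1  # dynamic threshold reported in the table

# Assumed rule: a meta-pattern is retained when its MI value meets the threshold.
retained = [pattern for pattern, mi in rows if mi >= THRESHOLD]
print(retained)
```

Under this assumption, {B, C} is discarded despite having the highest support in the table, because its mutual information is negative.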
| Measure | Advantages | Limitations | What TCM adds |
|---|---|---|---|
| Support | Easy to calculate; reflects the actual frequency; robust to small datasets | Favors frequent trivial rules; ignores the distribution of consequents | TCM distinguishes whether this frequency is focused or dispersed |
| Confidence | Intuitive probabilistic measure; frequently used in practice | Insensitive to the competition among several consequents; can be high even if X is ambiguous | TCM complements confidence by revealing the specificity of X ⇒ Y |
| TCM | Evaluates the logical concentration of X; normalized; permits comparison between rules | Depends on the availability of the rules X ⇒ Y; not as well known | Provides a structured, comparative, and standardized view of the consequents of X |
| Dataset | DMAR (λ = 0.5) | DMAR (λ = 1.0) | DMAR (λ = 1.5) | FP-Growth | Apriori |
|---|---|---|---|---|---|
| Mushroom | 2,140 | 1,102 | 615 | 4,550 | 4,690 |
| Adult | 3,810 | 2,230 | 1,072 | 6,324 | 6,401 |
| Retail | 1,965 | 982 | 410 | 3,723 | 3,941 |
| Online Retail II | 2,540 | 1,195 | 526 | 5,030 | 5,197 |
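To make the rule counts in the table above easier to compare, the following sketch computes the percentage reduction of DMAR relative to FP-Growth for each λ. The counts are copied from the table; the function name `reduction_vs_fp_growth` and the choice of FP-Growth as the baseline are illustrative assumptions.

```python
# dataset: (DMAR λ=0.5, DMAR λ=1.0, DMAR λ=1.5, FP-Growth, Apriori), copied from the table.
rule_counts = {
    "Mushroom": (2140, 1102, 615, 4550, 4690),
    "Adult": (3810, 2230, 1072, 6324, 6401),
    "Retail": (1965, 982, 410, 3723, 3941),
    "Online Retail II": (2540, 1195, 526, 5030, 5197),
}
LAMBDAS = (0.5, 1.0, 1.5)

def reduction_vs_fp_growth(dataset):
    """Return {lambda: percentage of rules removed relative to FP-Growth}."""
    *dmar, fp_growth, _apriori = rule_counts[dataset]
    return {
        lam: round(100 * (1 - n / fp_growth), 1)
        for lam, n in zip(LAMBDAS, dmar)
    }

for name in rule_counts:
    print(name, reduction_vs_fp_growth(name))
```

Swapping `fp_growth` for `_apriori` in the ratio gives the same comparison against Apriori.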
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).