Submitted: 16 March 2025
Posted: 17 March 2025
Abstract
This study integrates customer loyalty program data with a synthetic population to analyze grocery shopping behaviours in Montreal. Using clustering techniques, we classify 295,631 loyalty program members into seven distinct consumer segments based on behavioural and sociodemographic attributes. The findings reveal significant heterogeneity in consumer behaviour, emphasizing the impact of urban geography on shopping decisions. This segmentation also provides valuable insights for retailers optimizing store locations and marketing strategies, and for policymakers aiming to enhance urban accessibility. Additionally, our approach strengthens Agent-Based Model (ABM) simulations by incorporating demographic and behavioural diversity, leading to more realistic consumer representations. While integrating loyalty data with synthetic populations mitigates privacy concerns, challenges remain regarding data sparsity and demographic inconsistencies. Future research should explore multi-source data integration and advanced clustering techniques. Overall, this study contributes to geographically explicit modelling, demonstrating the effectiveness of combining behavioural and synthetic demographic data in urban retail analysis.
Keywords:
1. Introduction
2. Methods
2.1. Study Area
2.2. Data Sources
2.3. Connecting the Loyalty Data with the Synthetic Population
2.4. Typology Generation
3. Results
3.1. Total Income
3.2. Frequency of Store Visits
3.3. Distance to the Most Frequently Visited Store and to the Nearest Store
3.4. Average Spending per Transaction
3.5. The Number of Unique Stores Visited
3.6. The Percentage of Shopping Done in the Evening
3.7. The Percentage of Shopping on the Weekend

3.8. The Most Frequently Purchased Items
3.9. The Spatial Distribution of Customer Clusters
4. Conclusion and Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ABM | Agent-based model |
| GIS | Geographical Information Systems |
| MCCHE | McGill Centre for the Convergence of Health and Economics |
| PUMF | Public Use Microdata Files |
| IPF | Iterative Proportional Fitting |
| SSE | Sum of squared errors |
| DA | Dissemination area |











| Variable | Statistic | Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
|---|---|---|---|---|---|---|---|---|
| Total income ($) | Mean | 40,998.96 | 41,251.73 | 40,560.14 | 110,898.73 | 40,378.49 | 42,297.43 | 43,479.75 |
| | Median | 39,300.00 | 39,000.00 | 38,777.78 | 109,334.87 | 38,714.29 | 39,428.57 | 39,900.00 |
| | Std. | 12,817.45 | 13,655.96 | 12,441.85 | 24,585.19 | 12,091.56 | 15,319.54 | 17,703.89 |
| Frequency of store visits (days) | Mean | 10.85 | 51.61 | 47.06 | 43.45 | 43.20 | 92.66 | 848.13 |
| | Median | 7.39 | 14.77 | 16.00 | 12.77 | 14.99 | 34.47 | 1,034.00 |
| | Std. | 10.96 | 103.83 | 80.56 | 88.44 | 69.89 | 143.21 | 236.05 |
| Distance to most frequently visited store (km) | Mean | 1.74 | 1.70 | 1.20 | 1.70 | 1.47 | 14.27 | 3.62 |
| | Median | 0.97 | 1.22 | 0.60 | 0.88 | 0.99 | 10.95 | 1.78 |
| | Std. | 2.19 | 1.72 | 1.74 | 2.52 | 1.57 | 11.09 | 4.88 |
| Distance to nearest store (km) | Mean | 0.88 | 1.14 | 0.66 | 0.94 | 0.98 | 3.18 | 1.20 |
| | Median | 0.76 | 0.96 | 0.53 | 0.71 | 0.83 | 3.35 | 0.91 |
| | Std. | 0.57 | 0.75 | 0.51 | 0.76 | 0.66 | 1.89 | 1.02 |
| Average spending per transaction ($) | Mean | 39.10 | 115.71 | 31.72 | 38.54 | 32.45 | 39.75 | 35.60 |
| | Median | 35.09 | 104.49 | 28.04 | 31.83 | 28.45 | 33.31 | 23.60 |
| | Std. | 19.98 | 47.83 | 17.60 | 25.85 | 18.39 | 26.81 | 37.31 |
| Unique stores visited | Mean | 9.93 | 4.01 | 3.44 | 4.50 | 3.40 | 3.86 | 1.11 |
| | Median | 9.00 | 4.00 | 3.00 | 4.00 | 3.00 | 3.00 | 1.00 |
| | Std. | 3.09 | 2.27 | 1.77 | 2.91 | 1.68 | 2.58 | 0.31 |
| Evening shopping (%) | Mean | 32.39 | 20.49 | 58.77 | 30.76 | 12.64 | 25.67 | 27.76 |
| | Median | 31.47 | 16.00 | 56.96 | 27.66 | 10.26 | 20.00 | 0.00 |
| | Std. | 19.07 | 19.67 | 16.66 | 24.03 | 11.35 | 24.65 | 41.94 |
| Weekend shopping (%) | Mean | 32.79 | 43.23 | 32.46 | 30.33 | 25.09 | 30.39 | 31.37 |
| | Median | 30.85 | 41.23 | 31.03 | 28.57 | 24.10 | 27.50 | 0.00 |
| | Std. | 13.31 | 25.62 | 15.79 | 17.21 | 15.96 | 22.55 | 43.67 |
| Number of customers | | 44,062 | 26,816 | 80,364 | 14,492 | 100,412 | 16,932 | 12,553 |

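
As a quick consistency check, the cluster sizes in the "Number of customers" row can be summed and converted to shares of the membership. The short Python sketch below does only that arithmetic; the variable names are illustrative and not taken from the study.

```python
# Cluster sizes copied from the "Number of customers" row of the table above.
cluster_sizes = {0: 44062, 1: 26816, 2: 80364, 3: 14492, 4: 100412, 5: 16932, 6: 12553}

# The total should match the number of loyalty program members in the abstract.
total_members = sum(cluster_sizes.values())

# Share of all members that falls into each cluster.
cluster_shares = {cluster: n / total_members for cluster, n in cluster_sizes.items()}

print(f"total members: {total_members}")
# List clusters from largest to smallest share.
for cluster, share in sorted(cluster_shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"cluster {cluster}: {cluster_sizes[cluster]:>7,} members ({share:.1%})")
```

The total comes to 295,631, the figure reported in the abstract. Clusters 4 and 2 together account for roughly 61% of members, so the remaining five segments describe comparatively small groups.
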

| Rank | Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
|---|---|---|---|---|---|---|---|
| 1st | Beverage | Beverage | Beverage | Beverage | Beverage | Beverage | Juice |
| 2nd | Chips | Juice | Cheese | Juice | Chips | Chips | Beverage |
| 3rd | Cheese | Cheese | Chips | Cheese | Juice | Juice | Yogurt |
| 4th | Juice | Chips | Juice | Chips | Cheese | Cheese | Cheese |
| 5th | Mayonnaise | Mayonnaise | Mayonnaise | Mayonnaise | Butter | Butter | Eggs |
| 6th | Butter | Milk | Butter | Butter | Mayonnaise | Mayonnaise | Mayonnaise |
| 7th | Milk | Butter | Yogurt | Eggs | Eggs | Eggs | Chips |
| 8th | Yogurt | Yogurt | Eggs | Milk | Yogurt | Yogurt | Butter |
| 9th | Eggs | Eggs | Milk | Yogurt | Milk | Milk | Bread |
| 10th | Sugar | Cereals w/o fruits/nuts | Sugar | Sugar | Sugar | Sugar | Cereals w/ fruits/nuts |


| Cluster | Name | Income | Shopping Frequency | Distance to Most Frequently Visited Store | Avg. Spending | Unique Stores | % Evening | % Weekend |
|---|---|---|---|---|---|---|---|---|
| 0 | The Frequent Store Explorers | ∼$40K | ∼7 days | 1.74 km | $39 | ∼10 | ∼32% | ∼33% |
| 1 | High-Spending Weekend Shoppers | ∼$41K | ∼15 days | 1.70 km | $116 | ∼4 | ∼20% | ∼43% |
| 2 | Near Evening Shoppers | ∼$40K | ∼16 days | 1.20 km | $32 | ∼3–4 | ∼59% | ∼32% |
| 3 | High Income Shoppers | ∼$110K | ∼12–13 days | 1.70 km | $39 | ∼4–5 | ∼31% | ∼30% |
| 4 | Weekday Day Shoppers | ∼$40K | ∼14–15 days | 1.47 km | $32 | ∼3–4 | ∼13% | ∼25% |
| 5 | Far Infrequent Shoppers | ∼$42K | ∼35 days | 14.27 km | $40 | ∼4 | ∼26% | ∼30% |
| 6 | Minimal Engagement Shoppers | ∼$43K | 1,000+ days | 3.62 km | $36 | ∼1 | ∼28% (high variance) | ∼31% (high variance) |

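
The typology above comes from clustering members on attributes such as income, visit frequency, distance, spending, store variety, and shopping times. This excerpt does not state which algorithm or settings were used. The sketch below therefore assumes plain k-means on standardized attributes, with the sum of squared errors (SSE, listed under Abbreviations) compared across candidate numbers of clusters. All function names and the toy data are hypothetical, and the code is a minimal illustration rather than the study's pipeline.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(rows):
    """Z-score every column so attributes on large scales (e.g. income)
    do not dominate attributes on small scales (e.g. distance in km)."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [
        (sum((x - m) ** 2 for x in col) / len(col)) ** 0.5 or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, sse)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


def elbow_curve(points, k_values, seed=0):
    """SSE for each candidate k; the 'elbow' is where the decrease flattens."""
    return {k: kmeans(points, k, seed=seed)[2] for k in k_values}


# Hypothetical toy data: two obvious groups in two dimensions.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids, sse = kmeans(toy, k=2)
curve = elbow_curve(toy, k_values=[1, 2, 3])
print(labels, round(sse, 3))
print({k: round(v, 3) for k, v in curve.items()})
```

On real data, the attribute matrix would first pass through `standardize`, because income in dollars would otherwise swamp distances measured in kilometres. K-means is sensitive to its starting centroids, so in practice one would also repeat the run with several seeds and keep the solution with the lowest SSE.
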
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).