Submitted: 16 July 2025
Posted: 16 July 2025
Abstract
Keywords:
1. Introduction
2. Related Work
3. Data Processing
3.1. Data Source
3.2. Initial Parsing
- medium-size russet potato, about 10 ounces, peeled and diced
- shrimp, shelled and cut into bite-sized pieces
- kale, stemmed, rinsed, and coarsely chopped to make 6 cups
3.3. Mapping
- medium-size russet potato, about 10 ounces, peeled and diced → potato
- shrimp, shelled and cut into bite-sized pieces → shrimps
- kale, stemmed, rinsed, and coarsely chopped to make 6 cups → kale
- Removing parenthetical tokens
  - vanilla extract (to taste) → vanilla extract
- Removing comma-separated descriptors
  - honey, preferably wildflower → honey
- Picking the best of two ingredients separated by ‘or’
  - sriracha or other hot sauce → sriracha
- Finding a sole ‘top’ ingredient name
  - coarsely grated carrot → carrots
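The four mapping heuristics listed above can be sketched in Python roughly as follows. This is a minimal illustration, not the actual mapper: the set `TOP_INGREDIENTS`, all function names, the order in which the heuristics are applied, and the rule used to pick the "best" side of an ‘or’ (prefer the first alternative that is a known top ingredient) are assumptions made for the example.

```python
import re

# Hypothetical stand-in for the list of known "top" ingredient names.
TOP_INGREDIENTS = {"potato", "shrimps", "kale", "vanilla extract",
                   "honey", "sriracha", "hot sauce", "carrots"}


def strip_parentheticals(text):
    """Remove parenthetical tokens such as '(to taste)'."""
    return re.sub(r"\s*\([^)]*\)", "", text).strip()


def strip_descriptors(text):
    """Keep only the text before the first comma."""
    return text.split(",", 1)[0].strip()


def pick_from_or(text, top=TOP_INGREDIENTS):
    """For 'a or b', return the first alternative that is a known top
    ingredient (an assumed notion of 'best'); otherwise the first one."""
    options = [o.strip() for o in re.split(r"\s+or\s+", text)]
    for option in options:
        if option in top:
            return option
    return options[0]


def find_sole_top(text, top=TOP_INGREDIENTS):
    """If exactly one top ingredient matches a word of the text (allowing a
    simple plural 's'), return it; otherwise return the text unchanged."""
    if text in top:
        return text
    matches = set()
    for word in text.split():
        for candidate in (word, word + "s"):
            if candidate in top:
                matches.add(candidate)
    return matches.pop() if len(matches) == 1 else text


def normalize(text):
    """Apply the four heuristics in sequence to one ingredient phrase."""
    text = strip_parentheticals(text.lower())
    text = strip_descriptors(text)
    text = pick_from_or(text)
    return find_sole_top(text)


for phrase in ["vanilla extract (to taste)", "honey, preferably wildflower",
               "sriracha or other hot sauce", "coarsely grated carrot"]:
    print(phrase, "->", normalize(phrase))
```

Matching is done word by word, so a multi-word top ingredient is only recognized when the whole phrase equals it; a fuller mapper would need a more careful matching step.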
3.4. Supplementing with AllRecipes Data
- 1 pound ground beef
- 1/2 teaspoon black pepper
- 1 (25 ounces) package frozen cheese ravioli
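Ingredient lines of this form begin with a quantity and often a unit, which presumably have to be removed before the name can be mapped. The sketch below shows one way to do that with a regular expression; the unit list `UNITS`, the pattern, and the function name `strip_amount` are illustrative assumptions and not taken from the actual pipeline.

```python
import re

# Assumed, deliberately incomplete list of units and container words.
UNITS = r"(?:pounds?|ounces?|teaspoons?|tablespoons?|cups?|packages?|cans?)"

LEADING_AMOUNT = re.compile(
    r"^\s*"
    r"(?:\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)"  # "1 1/2", "1/2", "1", "2.5"
    r"\s*(?:\([^)]*\)\s*)?"                     # optional "(25 ounces)"
    rf"(?:{UNITS}\b\s*)?",                      # optional unit word
    re.IGNORECASE,
)


def strip_amount(line):
    """Drop a leading quantity, optional parenthetical size, and unit."""
    return LEADING_AMOUNT.sub("", line, count=1).strip()


for line in ["1 pound ground beef",
             "1/2 teaspoon black pepper",
             "1 (25 ounces) package frozen cheese ravioli"]:
    print(strip_amount(line))
```

Lines without a leading number are returned unchanged, so the function can be applied to every ingredient line regardless of its source.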
4. Algorithm Design and Analysis
4.1. Algorithm Design
4.2. Graph-Based Models
4.3. Multimodal Approaches
1. $w(x,y)$ is the weight of the edge between two ingredients $x$ and $y$. I define the weight of an edge as the co-occurrence of the two ingredients; that is, the number of recipes in which both appear.
2. The graph is the entire ingredient network.
3. A peripheral ingredient is an ingredient that is not part of a recipe but has at least one edge (with nonzero weight) to an ingredient in the recipe (Figure 1). Formally, for a recipe $R$, the set of peripheral ingredients is
   $$P(R) = \{\, k \notin R : \exists\, i \in R \text{ such that } w(i,k) > 0 \,\}.$$
4. The compatibility score is a score, computed using different metrics, that is used to rate each ingredient suggestion. An ingredient with high “compatibility” with another suggests that the pair is commonly used together and would therefore be a good fit for the recipe.
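To make these definitions concrete, the following sketch builds the co-occurrence weights and looks up the peripheral ingredients of a recipe. The nested-dictionary representation, the function names, and the toy recipes are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(recipes):
    """Return w as nested dicts: w[x][y] = number of recipes containing both."""
    w = defaultdict(lambda: defaultdict(int))
    for recipe in recipes:
        # Deduplicate so an ingredient listed twice counts once per recipe.
        for x, y in combinations(sorted(set(recipe)), 2):
            w[x][y] += 1
            w[y][x] += 1
    return w


def peripheral_ingredients(w, recipe):
    """Ingredients outside the recipe with a nonzero edge into it."""
    recipe = set(recipe)
    return {k for i in recipe for k in w.get(i, {}) if k not in recipe}


# Toy data, not taken from the real dataset.
recipes = [
    ["tomato", "basil", "garlic"],
    ["tomato", "garlic", "onion"],
    ["basil", "pine nuts"],
]
w = build_graph(recipes)
print(w["tomato"]["garlic"])
print(sorted(peripheral_ingredients(w, ["tomato", "garlic"])))
```

Edges are only created when two ingredients actually share a recipe, so every stored weight is nonzero and the peripheral lookup needs no extra filtering.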
1. Degree Centrality (Algorithm 1). The key aspect of this algorithm is to compute the compatibility score of an ingredient by summing over all its connections with ingredients in the recipe. For two ingredients $x, y$, I compute the weight $w(x,y)$ of the connection by counting the number of recipes in which $x$ and $y$ appear together. Each connection is normalized by dividing by the higher of the two ingredients' weighted numbers of connections; this helps prevent high-frequency ingredients from dominating the recommendation. To compute the weighted number of connections of an ingredient $a$, I sum the weights over all the connections with its neighbors: $d(a) = \sum_{n \in N(a)} w(a,n)$, where $N(a)$ is the set of neighbors of $a$. Formally, for a pair of ingredients $(i,k)$, with $i$ being a recipe ingredient and $k$ a peripheral ingredient, I write
   $$s(i,k) = \frac{w(i,k)}{\max\bigl(d(i),\, d(k)\bigr)},$$
   and the compatibility score of $k$ for a recipe $R$ is the sum $\sum_{i \in R} s(i,k)$.
Algorithm 1 Degree Centrality algorithm
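A minimal Python sketch of the degree centrality scoring described above is given here. It assumes the ingredient network is stored as a symmetric nested dictionary `w[x][y]`; the function names and the toy weights are hypothetical and only serve to illustrate the normalization.

```python
def weighted_degree(w, a):
    """Weighted number of connections of a: sum of its edge weights."""
    return sum(w.get(a, {}).values())


def degree_centrality_scores(w, recipe):
    """Score each peripheral ingredient k by summing, over recipe
    ingredients i, w(i, k) divided by the larger weighted degree."""
    recipe = set(recipe)
    scores = {}
    for i in recipe:
        for k, weight in w.get(i, {}).items():
            if k in recipe or weight == 0:
                continue
            norm = max(weighted_degree(w, i), weighted_degree(w, k))
            scores[k] = scores.get(k, 0.0) + weight / norm
    return scores


def suggest(w, recipe, top_n=5):
    """Peripheral ingredients ordered by decreasing score (ties by name)."""
    scores = degree_centrality_scores(w, recipe)
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]


# Toy symmetric network, not real data.
w = {
    "tomato": {"basil": 3, "garlic": 2, "onion": 1},
    "basil": {"tomato": 3, "garlic": 1},
    "garlic": {"tomato": 2, "basil": 1, "onion": 4},
    "onion": {"tomato": 1, "garlic": 4},
}
print(suggest(w, ["tomato", "garlic"]))
```

In the toy network, onion outranks basil for the recipe {tomato, garlic}: its strong edge to garlic (4/7) outweighs basil's stronger edge to tomato (3/6) once both contributions are summed.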
1. Normalized PMI. After computing a score using pointwise mutual information (PMI), I normalize the value by the weighted number of connections. This algorithm is identical to Algorithm 1, except that the weight $w(i,k)$ is replaced by $\mathrm{PMI}(i,k)$. It is mainly meant for direct comparison with the degree centrality algorithm.
2. Generalized PMI (Algorithm 2). This extends the pairwise PMI score to more than two ingredients. For an $n$-tuple of ingredients $(i_1, \dots, i_n)$ in the recipe, I compute the sum of all their weighted connections with each other, $\sum_{1 \le a < b \le n} w(i_a, i_b)$. I then find the tuple whose sum of weights is the greatest; this captures the ingredients that are seen together most often in recipes and gives me a sense of the “essential” ingredients in the recipe. Then, using these essential ingredients, I compute a generalized PMI score for each peripheral ingredient that has connections to all of them. In practice, I chose the tuple size $n$ to be as large as possible without causing too many division-by-zero errors.
3. Weighted PMI (Algorithm 3). This weights an ingredient’s PMI score by the number of edges that exist between itself and the ingredients of the recipe. Formally, for a peripheral ingredient $b$ and a recipe $R$, I describe this as
   $$\mathrm{WPMI}(b) = c(b) \cdot \mathrm{PMI}(b), \qquad c(b) = \bigl|\{\, i \in R : w(i,b) > 0 \,\}\bigr|,$$
   where $\mathrm{PMI}(b)$ denotes the PMI score of $b$ with respect to the recipe. Thus, for each connection to an ingredient in the recipe, I increase the weight factor by 1. Intuitively, this assigns more importance to peripheral ingredients that pair well with many of the recipe’s ingredients.
4. Minimax PMI. This algorithm is meant to produce safe suggestions; that is, it recommends peripheral ingredients that are compatible with all recipe ingredients rather than ingredients that work very well with some and not at all with others. To do this, for a peripheral ingredient $b$ and a recipe $R$ I computed $\min_{i \in R} \mathrm{PMI}(i,b)$, and then returned the ingredients with the highest minimum values. This way, I could rule out ingredients that would never pair well with at least one recipe ingredient.
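The tuple-selection step of the generalized PMI variant can be sketched as follows. The code covers only the choice of the “essential” $n$-tuple and the filtering of peripheral ingredients connected to every member of it; it stops short of the generalized PMI score itself. The function names, the nested-dictionary network, and the toy weights are assumptions for illustration.

```python
from itertools import combinations


def essential_tuple(w, recipe, n):
    """Return the n-tuple of recipe ingredients with the largest sum of
    pairwise edge weights, together with that sum."""
    best, best_sum = None, -1
    for combo in combinations(sorted(set(recipe)), n):
        total = sum(w.get(a, {}).get(b, 0) for a, b in combinations(combo, 2))
        if total > best_sum:
            best, best_sum = combo, total
    return best, best_sum


def fully_connected_peripherals(w, recipe, core):
    """Peripheral ingredients with a nonzero edge to every core ingredient."""
    recipe = set(recipe)
    candidates = set(w.get(core[0], {})) - recipe
    return {k for k in candidates
            if all(w.get(c, {}).get(k, 0) > 0 for c in core)}


# Toy symmetric network built from an edge list (not real data).
edges = {("tomato", "basil"): 3, ("tomato", "garlic"): 2,
         ("basil", "garlic"): 1, ("tomato", "onion"): 1,
         ("garlic", "onion"): 4, ("basil", "mozzarella"): 2,
         ("tomato", "mozzarella"): 2}
w = {}
for (a, b), count in edges.items():
    w.setdefault(a, {})[b] = count
    w.setdefault(b, {})[a] = count

core, total = essential_tuple(w, ["tomato", "basil", "garlic"], n=2)
print(core, total)
print(fully_connected_peripherals(w, ["tomato", "basil", "garlic"], core))
```

The search enumerates every $n$-subset of the recipe, which is affordable for typical recipe sizes and small $n$; if $n$ exceeds the number of recipe ingredients, no tuple exists and the function returns `None` for the tuple.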
Algorithm 2 Generalized PMI algorithm
Algorithm 3 Weighted PMI algorithm
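The pairwise PMI variants can be illustrated with a short Python sketch. It uses the standard definition $\mathrm{PMI}(x,y) = \log \frac{p(x,y)}{p(x)\,p(y)}$ with probabilities estimated as recipe frequencies, which is assumed here. Two further assumptions are labeled in the code: the weighted variant multiplies the number of existing edges by the sum of the pairwise PMI values, and the minimax variant treats a missing edge as $-\infty$. All function names and the toy recipes are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def count_recipes(recipes):
    """Count recipes per ingredient and per unordered ingredient pair."""
    single, pair = Counter(), Counter()
    for recipe in recipes:
        items = sorted(set(recipe))
        single.update(items)
        pair.update(combinations(items, 2))
    return single, pair, len(recipes)


def pmi(x, y, single, pair, n):
    """Standard PMI from recipe frequencies; None if x and y never co-occur."""
    c_xy = pair[tuple(sorted((x, y)))]
    if c_xy == 0:
        return None
    return math.log(c_xy * n / (single[x] * single[y]))


def weighted_pmi(b, recipe, single, pair, n):
    """Assumed aggregation: (number of edges into the recipe) * (sum of PMIs)."""
    values = [pmi(i, b, single, pair, n) for i in recipe]
    values = [v for v in values if v is not None]
    return len(values) * sum(values)


def minimax_pmi(b, recipe, single, pair, n):
    """Smallest PMI between b and any recipe ingredient.
    Assumption: a missing edge counts as -infinity, ruling b out."""
    values = [pmi(i, b, single, pair, n) for i in recipe]
    if any(v is None for v in values):
        return -math.inf
    return min(values)


# Toy data, not taken from the real dataset.
recipes = [
    ["tomato", "basil", "garlic"],
    ["tomato", "garlic", "onion"],
    ["tomato", "basil", "mozzarella"],
    ["garlic", "onion"],
]
single, pair, n = count_recipes(recipes)
target = ["tomato", "garlic"]
for b in ["basil", "onion", "mozzarella"]:
    print(b, weighted_pmi(b, target, single, pair, n),
          minimax_pmi(b, target, single, pair, n))
```

Ranking candidates by `minimax_pmi` and keeping the highest values reproduces the "safe suggestion" behavior: mozzarella never co-occurs with garlic in the toy data, so it is excluded even though it pairs well with tomato.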
4.4. Analysis
5. Web Design
- simplesearch.html
- simplesearch_searched.html
- no_results.html
6. Backend Architecture
6.1. Integration with Frontend
6.2. Backend Processing
1. First, I use the module scraper.py to scrape recipes from the Internet and store the raw data in the folder dl/.
2. Then, I use parser.py to parse out the relevant recipe data and store the recipes in the folder processed/.
3. Next, I run mapper.py to create two files: mapping.txt, which maps similar ingredients to the same ingredient, and top.txt, which lists the 1000 most common ingredients.
4. Finally, these files are fed into the analyzer_*.py files, which build the ingredient networks and run the relevant algorithms described in Section 4. These scripts run each algorithm on every recipe obtained from NYTimes Cooking and save the results in the corresponding analyzer_*.txt files.
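As an illustration of the third step, the sketch below counts how often each canonical ingredient occurs across recipes and returns the most common ones, which is the kind of list top.txt holds. The function name `top_ingredients`, the in-memory representation of the mapping as a dictionary, and the toy data are assumptions; the actual file formats of mapping.txt and top.txt are not specified here.

```python
from collections import Counter


def top_ingredients(recipes, mapping, n=1000):
    """Return the n most common canonical ingredient names.

    `mapping` sends a raw ingredient name to its canonical name; names
    missing from it are kept as they are.
    """
    counts = Counter()
    for recipe in recipes:
        # Count each canonical ingredient at most once per recipe.
        canonical = {mapping.get(name, name) for name in recipe}
        counts.update(canonical)
    return [name for name, _ in counts.most_common(n)]


# Toy data, not taken from the real dataset.
recipes = [
    ["coarsely grated carrot", "honey"],
    ["carrots", "kale"],
    ["honey", "carrots"],
]
mapping = {"coarsely grated carrot": "carrots"}
print(top_ingredients(recipes, mapping, n=2))
```

Applying the mapping before counting matters: without it, variants of the same ingredient would split their counts and could each fall out of the top list.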
7. Enhancements Based on Reviewer Feedback
7.1. Scalability and Real-Time Applicability
7.2. User Personalization and Feedback Loop
7.3. Ethical and Legal Considerations
- Users are now informed of data sources and their rights under GDPR and CCPA.
- I am exploring partnerships with recipe providers to obtain explicit permissions.
7.4. User Personalization and Feedback Loop
7.5. Refining Mathematical Models
8. Conclusions and Future Work





© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


