Luís Rita edited this page Mar 19, 2020 · 4 revisions

October | December | January | February | March

October

[11/10/19] HyperFoods project proposal;

[15/10/19] Project allocation;

December

[01/12/19 - 03/12/19] EIT Health Summit 2019 (Paris, France). Defining unmet need - how much does it matter? The importance of placing patients at the core of healthcare innovation;

[20/12/19] Word2Vec embedding of ingredients and recipes using gensim;

[22/12/19] Different visualization modules were tested: Matplotlib, Plotly and Seaborn;

[24/12/19] Vocabulary created to extract ingredient quantities from recipes;

[26/12/19] Data visualization bugs fixed. Added an unsupervised learning algorithm (Infomap) to cluster ingredients and recipes according to their similarity;

[27/12/19] Enhanced visualization tools (e.g. clusters are now represented). Overall ingredient popularity calculated over the Recipe1M+ dataset. Bugs fixed;

[28/12/19] Facebook recipe retrieval algorithm benchmarked. Results were similar to those reported in the algorithm's paper;

[29/12/19] IoU and F1 scores calculated as part of the benchmark tests. Enhanced Seaborn visualization. General bug fixes;
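
For reference, the IoU and F1 scores mentioned above can be computed over predicted and ground-truth ingredient sets roughly as follows. This is a minimal sketch: the function names and the example ingredients are illustrative assumptions, not the project's code.

```python
def iou(predicted, actual):
    """Intersection over union (Jaccard index) of two ingredient sets."""
    predicted, actual = set(predicted), set(actual)
    union = predicted | actual
    if not union:
        return 1.0  # assumption: two empty sets count as identical
    return len(predicted & actual) / len(union)


def f1_score(predicted, actual):
    """Harmonic mean of precision and recall over ingredient sets."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Invented example: 3 shared ingredients, 5 distinct ingredients overall.
predicted = {"tomato", "basil", "garlic", "salt"}
actual = {"tomato", "basil", "olive oil", "salt"}
print(iou(predicted, actual))       # 0.6
print(f1_score(predicted, actual))  # 0.75
```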

[30/12/19] Started building recipe recommendation system. Communication with "FoodData Central" API to retrieve recipes' nutritional information. Data visualization enhanced;

[31/12/19] Enhanced code readability;

January

[01/01/20] Recipe units (teaspoon, cup, ounces...) of each ingredient retrieved from the dataset. A new vocabulary file was created for this purpose. Improved the detection of the ingredients present in the recipes;

[02/01/20] Analysis of Recipe1M+ ingredient quantities: several source websites did not have their values parsed correctly. This will be fixed. Added notes on the role of each imported Python module;

[03/01/20] New Recipe1M+ dataset generated with corrected ingredient quantities for the recipes from "food.com". Started exploring FlavorDB. Read the article "Flavor network and the principles of food pairing";

[04/01/20] Resolution of Matplotlib and Seaborn data plots enhanced. Recipes are now classified according to the average number of anticancer molecules per ingredient (an intermediate stage before taking specific quantities into account). Read "FlavorDB: a database of flavor molecules";

[05/01/20] 2 enhanced vocabulary files were created to retrieve ingredient quantities. The Recipe1M+ dataset was carefully analyzed and some additional errors were identified and corrected (e.g. the presence of a recipe written in Spanish);

[07/01/20] Quantities, units and ingredients retrieved from the recipes. Web server (Node.js) created as the basis for a future recipe recommendation web application. Added conversion from kitchen units (teaspoon, tablespoon, etc.) to a metric unit (ml). 4 clustering algorithms (Louvain, Infomap, Density-Based Spatial Clustering of Applications with Noise, and Mean Shift) were tested. Added GitHub Pages;
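
The kitchen-unit conversion described above could look something like the sketch below. It assumes US customary volumes; the table and the `to_millilitres` name are hypothetical, and the project's own conversion file may use different units or factors.

```python
# Approximate US customary volumes in millilitres (an assumption; the
# project's own conversion table may differ).
ML_PER_UNIT = {
    "teaspoon": 4.92892,
    "tablespoon": 14.7868,
    "fluid ounce": 29.5735,
    "cup": 236.588,
}


def to_millilitres(quantity, unit):
    """Convert a quantity in a known kitchen unit to millilitres."""
    key = unit.strip().lower()
    if key not in ML_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    return quantity * ML_PER_UNIT[key]


print(round(to_millilitres(2, "Tablespoon"), 1))  # 29.6
```

Raising on unknown units, rather than returning a default, makes gaps in the vocabulary visible early.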

[12/01/20] Recipe recommendation web application launched (hyperfoods.herokuapp.com/). Number of anticancer molecules now weighted according to the quantity of each ingredient in the recipes. Inverse cooking algorithm from Facebook integrated into the web app;
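
One plausible reading of the quantity weighting above is a quantity-weighted mean of per-ingredient molecule counts. The sketch below illustrates that reading only; the formula, the input format, and all numbers are assumptions rather than the project's actual scoring.

```python
def weighted_molecule_score(quantities_g, molecule_counts):
    """Quantity-weighted mean of per-ingredient anticancer molecule counts.

    quantities_g: ingredient -> amount in grams (hypothetical input format)
    molecule_counts: ingredient -> number of anticancer molecules
    Ingredients missing from molecule_counts contribute zero molecules.
    """
    total = sum(quantities_g.values())
    if total == 0:
        return 0.0
    weighted = sum(
        amount * molecule_counts.get(name, 0)
        for name, amount in quantities_g.items()
    )
    return weighted / total


# Invented numbers, for illustration only.
recipe = {"tea": 100, "sugar": 300}
counts = {"tea": 8}
print(weighted_molecule_score(recipe, counts))  # 2.0
```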

[14/01/20] Added thesis Word document and txt file with the conversion from kitchen units to grams;

[15/01/20] Added thesis PDF document;

[16/01/20] Bug in Infomap fixed. Created a function that takes a single recipe from the Recipe1M+ dataset as input and outputs its ingredients, quantities and units. Simplified the overall Jupyter Notebook and improved its efficiency. Started applying a supervised learning SVM to jaan's dataset;

[21/01/20] Reformatted Jupyter Notebook for better understanding. Parsing units and cooking processes;

[22/01/20] Improved code efficiency;

[23/01/20] Empty and non-English recipes deleted from database. Nutritional info retrieved for recipe dataset vocab items. Overall efficiency improvements;

[25/01/20] Corrected units in the nutritional info retrieved from recipe dataset. Medium sizes for each ingredient retrieved;

[28/01/20] Added datasets and metadata to QNAP NAS. Google Doc updated with the description of each file. Prepared PowerPoint presentation. Trained SVM model to be able to predict cuisines in the Recipe1M+ dataset;

[30/01/20] Increased code efficiency by replacing some dictionaries with sets. LinearSVC model trained with the dataset containing ingredients and cuisines;

February

[01/02/20] Random Forest model trained on the dataset containing ingredients and cuisines. Model validation using Stratified K-Fold Cross Validation and Leave One Out Cross Validation for the Random Forest and Linear Support Vector (80% accuracy) classifiers. Uploaded package to Test PyPI allowing retrieval of ingredients, quantities and units;
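
To illustrate the idea behind stratified K-fold validation, the sketch below splits sample indices into folds that keep cuisine proportions roughly equal. It is a simplified, unshuffled stand-in written for illustration; the entry does not say which implementation the project used (scikit-learn's `StratifiedKFold` is a common choice), and `stratified_folds` is a hypothetical name.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Split sample indices into k folds with similar label proportions."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0
    # Deal each class's samples round-robin across the folds, continuing
    # from where the previous class stopped so fold sizes stay balanced.
    for indices in by_label.values():
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


labels = ["italian"] * 4 + ["thai"] * 2
print(stratified_folds(labels, 2))  # [[0, 2, 4], [1, 3, 5]]
```

Each fold then serves once as the validation set while the remaining folds are used for training.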

[02/02/20] Test PyPI package bug corrected. Jupyter Notebook better organized. Project readme.md updated. Created DOI using Zenodo. Created first release in GitHub;

[05/02/20] Introducing Dask to perform data analysis using distributed computing. General bug fixes;

[09/02/20] Corrected bugs in the web application;

[10/02/20] Thesis document updated. Created synonym JSON file mapping ingredients from the Kaggle and Nature datasets to the Recipe1M+ dataset vocabulary. Corrected bugs in the web application. Updated Google Doc with Random Forest model and synonym JSON. Created an updated version of the vocabulary file used to retrieve quantity units from recipes;

[11/02/20] Retrieving tokenized ingredients from Recipe1M+ dataset. Training Word2Vec model on Recipe1M+ and Kaggle & Nature datasets. Confusion matrix implemented for the SVM model;

[12/02/20] Corrected ingredient names in the anticancer molecules list. Retrieved list of cuisine-drug interactions. JSON file created with only the ingredients compiled from the Recipe1M+ dataset. Word2Vec trained on Recipe1M+ and Kaggle & Nature datasets;

[13/02/20] Training SVC. Creating a scalable (without RegEx) fractions corrector for Recipe1M+. Improving units and quantities detector algorithm. Ingredients, quantities and units detected separately. These 3 features will be integrated into a single function and released on PyPI;

[15/02/20] Scalable algorithm implemented that retrieves the ingredients, quantities and units from a recipe line: PyPI. Table retrieved with the recipes containing the highest number of anticancer molecules. Jupyter Notebook code reorganized;
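
A regex-free parser for a single recipe line, in the spirit of the entry above, might look like the sketch below. It is not the released package's code: the tiny unit vocabulary, the `parse_line` name, and the naive plural handling are all assumptions made for illustration.

```python
from fractions import Fraction

# Tiny illustrative vocabulary; the project's vocabulary files are far larger.
UNITS = {"teaspoon", "tablespoon", "cup", "ounce", "gram"}


def parse_number(token):
    """Return the token as a Fraction, or None if it is not numeric."""
    try:
        return Fraction(token)  # accepts "2", "1.5" and "1/2"
    except (ValueError, ZeroDivisionError):
        return None


def parse_line(line):
    """Split a recipe line into (quantity, unit, ingredient)."""
    tokens = line.lower().split()
    quantity = Fraction(0)
    i = 0
    # Sum leading numeric tokens so that "1 1/2" becomes 3/2.
    while i < len(tokens):
        value = parse_number(tokens[i])
        if value is None:
            break
        quantity += value
        i += 1

    unit = None
    if i < len(tokens):
        candidate = tokens[i]
        # Naive plural handling: strip a trailing "s" before the lookup.
        singular = candidate[:-1] if candidate.endswith("s") else candidate
        if singular in UNITS:
            unit = singular
            i += 1

    ingredient = " ".join(tokens[i:])
    return (float(quantity) if quantity else None), unit, ingredient


print(parse_line("1 1/2 cups flour"))  # (1.5, 'cup', 'flour')
print(parse_line("salt to taste"))     # (None, None, 'salt to taste')
```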

[23/02/20] LinearSVM model trained. Confusion matrix calculated. 5-fold cross validation performed. Removed recipes with fewer than 3 ingredients from anticancer top list. Embedded recipes and ingredients plotted in 2D. Cuisines retrieved for each recipe from Recipe1M+ dataset;

[27/02/20] The number of features in each Word2Vec vector was increased (50 -> 100), which improved both cuisine classification accuracy and the ingredient embedding. Spectral clustering successfully applied to the latest embedding. Normalization of confusion matrix;
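
Confusion matrix normalization is often done per row, so that each row shows the fraction of a true class assigned to each predicted class. The sketch below assumes that convention (true labels in rows); the entry does not state which normalization was used, and the counts are invented.

```python
def normalize_rows(matrix):
    """Divide each row by its sum; all-zero rows stay all zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


# Invented counts for two cuisines: rows are true labels, columns predictions.
counts = [[8, 2], [1, 9]]
print(normalize_rows(counts))  # [[0.8, 0.2], [0.1, 0.9]]
```

With this convention the diagonal holds per-class recall, which makes classes of different sizes directly comparable.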

March

[09/03/20] Project report submitted;

[16/03/20] Poster finished;
