Improving the Yelp Bean matching algorithm #300

conancain · 2023-11-08T16:47:21Z

We modified match_utils so that the meeting weight between two users is calculated from their attributes instead of being uniformly set to 1.
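Here is a rough sketch of the difference. Every candidate pair gets an edge weight. That weight used to be a constant 1 and now depends on the two users' attributes.

This is not the actual match_utils code. The user records, the attribute names, and the helpers `uniform_weight`, `attribute_weight`, and `edge_weights` are all hypothetical. In this sketch a higher weight simply means the pair differs on more attributes.

```python
from itertools import combinations

# Hypothetical user records; the real attribute names in match_utils may differ.
users = {
    "alice": {"department": "eng", "location": "SF"},
    "bob": {"department": "eng", "location": "London"},
    "carol": {"department": "sales", "location": "London"},
}


def uniform_weight(a, b):
    # Previous behaviour: every candidate pair is equally good.
    return 1


def attribute_weight(a, b, attributes=("department", "location")):
    # Hypothetical: one point for each attribute on which the two users differ.
    return sum(1 for attr in attributes if users[a][attr] != users[b][attr])


def edge_weights(weight_fn):
    # One weighted edge per unordered pair of users.
    return {(a, b): weight_fn(a, b) for a, b in combinations(sorted(users), 2)}


print(edge_weights(uniform_weight))    # every pair weighs 1
print(edge_weights(attribute_weight))  # alice/carol differ on both attributes
```

With uniform weights the matcher has no reason to prefer one pair over another. With attribute-based weights it can favour the alice/carol pair, which differs on both attributes.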


jeanne1994 · 2023-11-08T18:59:04Z

Incorporating more user attributes into the matching mechanism will require adding columns to the Postgres user table. To start, we will manually alter and update the table to add fields such as language, location, and manager ID.
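Below is a minimal sketch of the manual schema change. It uses Python's built-in sqlite3 as a stand-in for Postgres so that it runs anywhere.

The table name `user_table` and the column types are assumptions. In Postgres the equivalent statement would be `ALTER TABLE <table> ADD COLUMN language TEXT;`.

```python
import sqlite3

# In-memory sqlite3 stands in for the Postgres user table; the table name
# "user_table" and the TEXT column types are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_table (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO user_table (email) VALUES ('a@example.com')")

# Add the attribute columns mentioned above. Identifiers come from this fixed
# tuple, so building the statement with an f-string is safe here.
for column in ("language", "location", "manager_id"):
    conn.execute(f"ALTER TABLE user_table ADD COLUMN {column} TEXT")

# Manually back-fill one record with parameterized values.
conn.execute(
    "UPDATE user_table SET language = ?, location = ?, manager_id = ? WHERE email = ?",
    ("en", "London", "m-42", "a@example.com"),
)
conn.commit()

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(user_table)")]
print(columns)  # ['id', 'email', 'language', 'location', 'manager_id']
```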

In the future, we will need a programmatic way to fill the user table. One option is to edit the current cron job, if its source has the fields we need. The other is to create another cron job that pulls data from the coreAPI and updates each user record.
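One possible shape for such a sync job is sketched below.

`fetch_core_api_users` is a hypothetical stub standing in for whatever coreAPI client exists. It is not a real API. The in-memory dict stands in for the user table.

```python
# Hypothetical sketch of a periodic sync job; nothing here is an existing API.
FIELDS = ("language", "location", "manager_id")


def fetch_core_api_users():
    # Stubbed response; a real job would query coreAPI here.
    return [
        {"email": "a@example.com", "language": "en", "location": "London", "manager_id": "m-42"},
        {"email": "z@example.com", "language": "de", "location": "Hamburg", "manager_id": "m-7"},
    ]


def sync_user_attributes(user_table, source_records):
    """Copy FIELDS from the source onto existing user records; return how many were updated."""
    updated = 0
    for record in source_records:
        user = user_table.get(record["email"])
        if user is None:
            continue  # only update users Beans already knows about
        for field in FIELDS:
            if field in record:
                user[field] = record[field]
        updated += 1
    return updated


users = {"a@example.com": {"email": "a@example.com"}}
print(sync_user_attributes(users, fetch_core_api_users()))  # 1
print(users["a@example.com"]["location"])  # London
```

Keying the update on existing records means the job only fills in attributes. Creating users remains the job of whatever already does it.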

ny2ko · 2023-11-08T21:56:24Z

This is an interesting decision to talk through. My previous assumptions were that we were getting this level of matching / user segmentation by creating the right subscriptions to split folks up by say office, location, interests etc. With this change that all becomes murky and makes me think we are trying to move to a place where we have one or very few subscriptions. Is that accurate?

conancain · 2023-11-09T16:05:47Z

> This is an interesting decision to talk through. My previous assumptions were that we were getting this level of matching / user segmentation by creating the right subscriptions to split folks up by say office, location, interests etc. With this change that all becomes murky and makes me think we are trying to move to a place where we have one or very few subscriptions. Is that accurate?

We are not trying to change the number of subscriptions. The goal is to avoid matching people who are in the same organization or have the same manager, since the point of Beans is to connect with people across Yelp. It would be awkward to meet your teammate through a Beans match when you already see and work with each other every day.

ny2ko · 2023-11-09T16:12:38Z

> > This is an interesting decision to talk through. My previous assumptions were that we were getting this level of matching / user segmentation by creating the right subscriptions to split folks up by say office, location, interests etc. With this change that all becomes murky and makes me think we are trying to move to a place where we have one or very few subscriptions. Is that accurate?

> We are not trying to change the number of subscriptions. The goal is to avoid matching people who are in the same organization or have the same manager, since the point of Beans is to connect with people across Yelp. It would be awkward to meet your teammate through a Beans match when you already see and work with each other every day.

Does it not work by applying rules? E.g. https://github.com/Yelp/beans/blob/master/api/yelp_beans/matching/pair_match.py#L23 can be used to avoid matching people in the same org
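For comparison, a rule in the spirit of the one linked above acts as a hard filter on candidate pairs. The sketch below is a generic stand-in, not the code in pair_match.py. `same_value_rule`, `apply_rules`, and the `manager_id` field are hypothetical.

```python
from itertools import combinations


def same_value_rule(attribute):
    """Hypothetical rule factory: reject pairs whose values for `attribute` match."""
    def allows(user_a, user_b):
        # Users missing the attribute both get None and are treated as matching.
        return user_a.get(attribute) != user_b.get(attribute)
    return allows


def apply_rules(users, rules):
    # Keep only the pairs that every rule allows.
    return [
        (a["name"], b["name"])
        for a, b in combinations(users, 2)
        if all(rule(a, b) for rule in rules)
    ]


users = [
    {"name": "alice", "manager_id": "m1"},
    {"name": "bob", "manager_id": "m1"},
    {"name": "carol", "manager_id": "m2"},
]
print(apply_rules(users, [same_value_rule("manager_id")]))
# [('alice', 'carol'), ('bob', 'carol')]
```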

jeanne1994 · 2023-11-09T19:50:38Z

> Does it not work by applying rules? E.g. https://github.com/Yelp/beans/blob/master/api/yelp_beans/matching/pair_match.py#L23 can be used to avoid matching people in the same org

Yes, rules can prevent matching people who share the exact same attribute. However, this change aims to increase the "interesting-ness" of pairs by maximizing the diversity within each pair.

IIUC, the current subscription mechanism is based on available meeting times and interests. I do see value in matching people who are more different within each subscription. It can spice up the convo and enable more cross-background learning and discussion. This is how I imagine the feature working: during my ML bean time, I want to be matched with people working in domains different from mine.
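The distinction between the two mechanisms can be sketched as follows. A rule only answers "allowed or not", while a diversity score ranks the pairs that a rule allows.

The attribute names and the `diversity` helper are hypothetical. The PR's actual distance combines its own terms.

```python
# Hypothetical attributes used for the diversity score.
ATTRIBUTES = ("department", "location", "language")


def passes_department_rule(a, b):
    # Rule-style check: a yes/no answer.
    return a["department"] != b["department"]


def diversity(a, b):
    # Score-style check: fraction of attributes on which the two users differ (0.0 to 1.0).
    return sum(a[k] != b[k] for k in ATTRIBUTES) / len(ATTRIBUTES)


me = {"department": "ml", "location": "SF", "language": "en"}
x = {"department": "ads", "location": "SF", "language": "en"}
y = {"department": "infra", "location": "London", "language": "de"}

for other in (x, y):
    print(passes_department_rule(me, other), round(diversity(me, other), 2))
# True 0.33
# True 1.0
```

Both candidates pass the rule, but only the score tells the matcher that `y` is the more "interesting" partner.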

ny2ko · 2023-11-09T21:23:15Z

> > Does it not work by applying rules? E.g. https://github.com/Yelp/beans/blob/master/api/yelp_beans/matching/pair_match.py#L23 can be used to avoid matching people in the same org

> Yes, rules can prevent matching people who share the exact same attribute. However, this change aims to increase the "interesting-ness" of pairs by maximizing the diversity within each pair.

> IIUC, the current subscription mechanism is based on available meeting times and interests. I do see value in matching people who are more different within each subscription. It can spice up the convo and enable more cross-background learning and discussion. This is how I imagine the feature working: during my ML bean time, I want to be matched with people working in domains different from mine.

Some additional context that could help here:

I set Beans up at Twitch (I've since left), and folks have been using it to meet each other. Many different subscriptions exist there: a company-wide one, location-specific ones, meetings within an org, and 1:1 setups within a team. Each of these has different expectations for which criteria to enforce when matching. For example, in location-based subscriptions people don't want to be matched with someone on their own team, but in a within-team subscription that is exactly what folks want.

Is there a way to make these code changes work using the rules systems so we can preserve the flexibility this affords each meeting subscription?
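The Twitch example above amounts to each subscription carrying its own rule set. Some subscriptions forbid teammates and others require them.

This is a minimal sketch. The subscription names, `SUBSCRIPTION_RULES`, and the helpers `differ_on`, `share`, and `allowed_pairs` are all hypothetical.

```python
from itertools import combinations


def differ_on(attr):
    # Rule: the two users must have different values for `attr`.
    return lambda a, b: a[attr] != b[attr]


def share(attr):
    # Rule: the two users must have the same value for `attr`.
    return lambda a, b: a[attr] == b[attr]


# Hypothetical per-subscription configuration: each subscription lists its own rules.
SUBSCRIPTION_RULES = {
    "london-office": [differ_on("team")],  # don't pair teammates
    "team-1on1s": [share("team")],         # only pair teammates
}


def allowed_pairs(subscription, users):
    rules = SUBSCRIPTION_RULES[subscription]
    return [
        (a["name"], b["name"])
        for a, b in combinations(users, 2)
        if all(rule(a, b) for rule in rules)
    ]


users = [
    {"name": "ann", "team": "search"},
    {"name": "ben", "team": "search"},
    {"name": "cat", "team": "ads"},
]
print(allowed_pairs("london-office", users))  # [('ann', 'cat'), ('ben', 'cat')]
print(allowed_pairs("team-1on1s", users))     # [('ann', 'ben')]
```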

jeanne1994 · 2023-11-09T21:45:59Z

> Is there a way to make these code changes work using the rules systems so we can preserve the flexibility this affords each meeting subscription?

Oh, these code changes work alongside the existing rules and the existing set of subscriptions. The algorithm respects the existing matching rules and each subscription's user pool. We are only reshaping how pairs are created within each subscription, which is currently completely random. For example, when we generate pairs for UK tea time, the high-level steps are:

1. Get all the people who opted in for the week.
2. Create all possible pairs (itertools.combinations).
3. Based on the subscription's rules, remove pairs that cannot be matched (e.g., recently paired, same department).
4. Create optimal pairs (what this code tries to do).
5. Notify the successful pairs.

It's worth noting that this code is a marginal improvement to how users are matched. It does not try to change the flow of the current matching process.
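The five steps above can be strung together in a small end-to-end sketch. Everything here is a hypothetical stand-in, including `best_matching`, `run_week`, and the "count differing attributes" weight.

For step 4 it uses an exhaustive search that first maximizes the number of pairs and then the total weight. That search is exponential and only suitable for tiny pools. At real pool sizes a library matcher, such as networkx's `max_weight_matching` with `maxcardinality=True`, would be the usual choice. This is not a claim about what the PR uses.

```python
from functools import lru_cache
from itertools import combinations


def best_matching(users, weight):
    """Exhaustive matching: maximize the number of pairs, then the total weight."""

    @lru_cache(maxsize=None)
    def solve(remaining):
        # Returns ((pair_count, total_weight), pairs). Tuples compare
        # lexicographically, so more pairs beats a higher total weight.
        if len(remaining) < 2:
            return (0, 0), ()
        first, rest = remaining[0], remaining[1:]
        best_key, best_pairs = solve(rest)  # option: leave `first` unmatched
        for i, partner in enumerate(rest):
            w = weight.get((first, partner))
            if w is None:
                continue  # pair was removed by a rule
            (count, total), pairs = solve(rest[:i] + rest[i + 1:])
            key = (count + 1, total + w)
            if key > best_key:
                best_key, best_pairs = key, ((first, partner),) + pairs
        return best_key, best_pairs

    return solve(tuple(sorted(users)))[1]


def run_week(opted_in, attributes, rules, notify=print):
    # Step 1: `opted_in` holds the users who signed up this week.
    # Step 2: every possible pair, sorted so that keys are (a, b) with a < b.
    candidates = combinations(sorted(opted_in), 2)
    # Step 3: drop pairs that any rule forbids.
    allowed = [(a, b) for a, b in candidates
               if all(rule(attributes[a], attributes[b]) for rule in rules)]
    # Hypothetical weight: number of attributes on which the pair differs.
    weight = {(a, b): sum(attributes[a][k] != attributes[b][k] for k in attributes[a])
              for a, b in allowed}
    # Step 4: choose the pairing with the best (pair count, total weight).
    pairs = best_matching(opted_in, weight)
    # Step 5: notify each matched pair.
    for a, b in pairs:
        notify(f"{a} <-> {b}")
    return pairs


def different_department(a, b):
    return a["department"] != b["department"]


attributes = {
    "alice": {"department": "eng", "location": "SF"},
    "bob": {"department": "eng", "location": "London"},
    "carol": {"department": "sales", "location": "London"},
    "dan": {"department": "sales", "location": "SF"},
}
print(run_week(set(attributes), attributes, [different_department]))
```

Both alice/dan + bob/carol and alice/carol + bob/dan satisfy the department rule. The weights break the tie in favour of the second pairing, whose pairs also differ by location.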

ny2ko

Thanks for the back-and-forth and for explaining your thoughts. I think what counts as an optimal pairing is a pretty subjective decision, but I'm good with this for a v1.

ny2ko · 2023-11-10T01:02:39Z

api/yelp_beans/matching/match_utils.py

```diff
+        return float(intersection) / union
+
+
+def get_pairwise_distance(
```


Would it be possible to make the attributes used configurable? It would be great if the choice of attributes to apply could be configured differently for each subscription.
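The hunk above ends in a Jaccard-style ratio (intersection over union). Combining that with the request for per-subscription attributes might look like the sketch below.

`jaccard_similarity`, `configured_distance`, and `SUBSCRIPTION_ATTRIBUTES` are hypothetical names, as is the empty-set guard. The hunk doesn't show how the PR handles empty sets.

```python
def jaccard_similarity(values_a, values_b):
    """len(A & B) / len(A | B) for two collections of attribute values."""
    set_a, set_b = set(values_a), set(values_b)
    union = len(set_a | set_b)
    if union == 0:
        return 0.0  # assumption: two empty sets count as dissimilar
    intersection = len(set_a & set_b)
    return float(intersection) / union


# Hypothetical per-subscription configuration: which attributes feed the distance.
SUBSCRIPTION_ATTRIBUTES = {
    "ml-beans": ("interests", "department"),
    "uk-tea-time": ("department",),
}


def configured_distance(user_a, user_b, subscription, config=SUBSCRIPTION_ATTRIBUTES):
    """Sum per-attribute distances over the attributes configured for `subscription`."""
    distance = 0.0
    for attr in config.get(subscription, ()):
        a, b = user_a.get(attr), user_b.get(attr)
        if isinstance(a, (set, frozenset, list, tuple)):
            # Set-valued attribute: Jaccard distance in [0, 1].
            distance += 1.0 - jaccard_similarity(a, b or ())
        else:
            # Scalar attribute: 1.0 if the values differ, else 0.0.
            distance += float(a != b)
    return distance


alice = {"department": "ml", "interests": {"python", "tea", "climbing"}}
bob = {"department": "ads", "interests": {"python", "tea"}}
print(round(configured_distance(alice, bob, "ml-beans"), 3))  # 1.333
print(configured_distance(alice, bob, "uk-tea-time"))         # 1.0
```

Keeping the attribute list in configuration rather than in the distance function lets a subscription opt out of a term without code changes. An unknown subscription gets a distance of 0, which falls back to uniform weighting.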

ny2ko · 2023-11-10T01:06:07Z

api/yelp_beans/matching/match_utils.py

```diff
+    distance += dist_2
+
+    # tenure
+    dist_3 = abs(int(user_a_attributes["days_since_start"]) - int(user_b_attributes["days_since_start"]))
```


Tenure is a bit subjective. I don't have strong opinions here as long as it doesn't lead to starvation. Fundamental to this approach is the assumption that tenured folks already know each other, so we optimize for them meeting newer, less-tenured people.

I think this works for v1, but I'll be curious to hear whether folks notice not getting matched with similarly tenured people. Perhaps eventually we should get to a place where we can ask users to tell us their matching preferences.
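To make the tenure assumption concrete: a larger tenure gap produces a larger distance, so a veteran scores farther from a new hire than from a peer.

The `days_since_start` field comes from the hunk above. `scaled_tenure_distance` and `cap_days` are hypothetical. They show one way to keep a raw day count from outweighing bounded terms such as a Jaccard distance.

```python
def tenure_distance(user_a, user_b):
    # Mirrors the hunk: absolute difference in days since start.
    return abs(int(user_a["days_since_start"]) - int(user_b["days_since_start"]))


def scaled_tenure_distance(user_a, user_b, cap_days=3 * 365):
    # Hypothetical variant: clamp to cap_days and scale into [0, 1] so tenure
    # cannot swamp other terms that are already bounded.
    return min(tenure_distance(user_a, user_b), cap_days) / cap_days


veteran = {"days_since_start": "2000"}
new_hire = {"days_since_start": "30"}
peer = {"days_since_start": "1900"}
print(tenure_distance(veteran, new_hire))         # 1970
print(tenure_distance(veteran, peer))             # 100
print(scaled_tenure_distance(veteran, new_hire))  # 1.0
```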

Updating User model

40ed781

conancain requested a review from jeanne1994 November 8, 2023 16:47

jeanne1994 added 4 commits November 8, 2023 11:47

migrated match_utils and pair_match

114cea9

fix bug.

5dc9cd6

fix bug.

02b6913

fix bug.

28aa84e

conancain requested a review from kdeal November 8, 2023 18:51

conancain changed the title ~~Attempt 2~~ Improving the Yelp Bean matching algorithm Nov 8, 2023

conancain added 2 commits November 8, 2023 13:57

Clean up unnecessary commented out code

6d1c8c1

Merge branch 'jenny_debin_2023-11-08' of github.com:conancain/beans into jenny_debin_2023-11-08

6d4c9c4

Clean up code

f91ddfd

conancain marked this pull request as ready for review November 8, 2023 19:15

ny2ko reviewed Nov 10, 2023

conancain commented Nov 24, 2023

api/yelp_beans/matching/match_utils.py (outdated, resolved)

Apply suggestions from code review

8638b40

ykdeal changed the base branch from master to main December 6, 2023 19:57

jeanne1994 and others added 5 commits March 1, 2024 09:50

adding test case structure

4666685

Merge branch 'main' into jenny_debin_2023-11-08

1a9591b

Added unit test cases for paired distance

b62ab33

Merge branch 'main' into jenny_debin_2023-11-08

5bc0498

Merge branch 'jenny_debin_2023-11-08' of github.com:jeanne1994/beans into jenny_debin_2023-11-08

36278ba
