Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving the Yelp Bean matching algorithm #300

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

conancain
Copy link
Contributor

@conancain conancain commented Nov 8, 2023

We modified the match_utils so that the meeting weights between 2 users are calculated based on attributes instead of being uniformly set to 1

@conancain conancain requested a review from kdeal November 8, 2023 18:51
@conancain conancain changed the title Attempt 2 Improving the Yelp Bean matching algorithm Nov 8, 2023
@jeanne1994
Copy link

Incorporating more user attributes in the matching mechanism will require having more columns in the postgres user table. To start, we can/will manually alter+update the table to add parameters (like language, location, manager id) to the postgres table.

In the future, we will have to figure out a programmatic way to fill the user table - either by editing the current cron job, if the source have the fields we need), or another cron job need to be created to pull data from the coreAPI and update each user record

@conancain conancain marked this pull request as ready for review November 8, 2023 19:15
@ny2ko
Copy link
Contributor

ny2ko commented Nov 8, 2023

This is an interesting decision to talk through. My previous assumptions were that we were getting this level of matching / user segmentation by creating the right subscriptions to split folks up by say office, location, interests etc. With this change that all becomes murky and makes me think we are trying to move to a place where we have one or very few subscriptions. Is that accurate?

@conancain
Copy link
Contributor Author

This is an interesting decision to talk through. My previous assumptions were that we were getting this level of matching / user segmentation by creating the right subscriptions to split folks up by say office, location, interests etc. With this change that all becomes murky and makes me think we are trying to move to a place where we have one or very few subscriptions. Is that accurate?

We are not trying to change the number of subscriptions. The idea here is that we want to avoid matching people who are in the same organization/have the same manager, as the idea of Beans is to connect with people across Yelp. It would be awkward to talk to your teammate through Beans match as you see/work with each other everyday.

@ny2ko
Copy link
Contributor

ny2ko commented Nov 9, 2023

This is an interesting decision to talk through. My previous assumptions were that we were getting this level of matching / user segmentation by creating the right subscriptions to split folks up by say office, location, interests etc. With this change that all becomes murky and makes me think we are trying to move to a place where we have one or very few subscriptions. Is that accurate?

We are not trying to change the number of subscriptions. The idea here is that we want to avoid matching people who are in the same organization/have the same manager, as the idea of Beans is to connect with people across Yelp. It would be awkward to talk to your teammate through Beans match as you see/work with each other everyday.

Does it not work by applying rules? E.g. https://github.com/Yelp/beans/blob/master/api/yelp_beans/matching/pair_match.py#L23 can be used to avoid matching people in the same org

@jeanne1994
Copy link

Does it not work by applying rules? E.g. https://github.com/Yelp/beans/blob/master/api/yelp_beans/matching/pair_match.py#L23 can be used to avoid matching people in the same org

Yes, rules can avoid matching people with the exact same attribute. However, this change is aim to increase the "interesting-ness" of the pairs by maximizing the diversity within each pair.

IIUC, the current subscription mechanism is based on available meeting time and interest. I do see value in matching people that are more different within each subscription, this can spice up convo and enable more cross-background learning/discussion. This is how I imagine this feature does: I want to be matched with people that are working in domains that are different than mine, during my ML bean time.

@ny2ko
Copy link
Contributor

ny2ko commented Nov 9, 2023

Does it not work by applying rules? E.g. https://github.com/Yelp/beans/blob/master/api/yelp_beans/matching/pair_match.py#L23 can be used to avoid matching people in the same org

Yes, rules can avoid matching people with the exact same attribute. However, this change is aim to increase the "interesting-ness" of the pairs by maximizing the diversity within each pair.

IIUC, the current subscription mechanism is based on available meeting time and interest. I do see value in matching people that are more different within each subscription, this can spice up convo and enable more cross-background learning/discussion. This is how I imagine this feature does: I want to be matched with people that are working in domains that are different than mine, during my ML bean time.

Some additional context that could help here:

I set beans up at Twitch(since left) and folks have been using it to meet each other. There are very many different subscriptions that exist, from a company wide one to specific locations, to meetings within an org to 1 on 1 setups within a team using beans. Each of these has different expectations for criteria to enforce for matches. E.g. Location wise people don't want to be matched with someone on the same team but for the within team subscription, that is what folks actually want.

Is there a way to make these code changes work using the rules systems so we can preserve the flexibility this affords each meeting subscription?

@jeanne1994
Copy link

Is there a way to make these code changes work using the rules systems so we can preserve the flexibility this affords each meeting subscription?

Oh, these code changes functions alongside existing rules and subscription set. The algor respects the existing matching rules and each subscription's user pool. We are only re-shaping how pairs are created (currently completely random) under each subscription. As an example, when we generate pairs for UK tea time, the high level steps are:

  1. get all the people who opt in for the week
  2. create all possible pairs (itertools.combinations)
  3. based on the rules of the subscription, remove pairs that can not be matched (eg. recently paired, same department etc)
  4. create optimal pairs (what the code tries to do)
  5. notify successful pairs

Its worth noting that the code is a marginal improvement on how users are matched, it is not trying to change the flow of the current match process

Copy link
Contributor

@ny2ko ny2ko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the back and forth and explaining your thoughts. I think what is an optimal pairing is a pretty subjective decision but good with this for a v1

return float(intersection) / union


def get_pairwise_distance(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to make the attributes used configurable? I think it'd be great to have the choice of attributes to apply be something that can be configured differently for different subscriptions

api/yelp_beans/matching/match_utils.py Outdated Show resolved Hide resolved
distance += dist_2

# tenure
dist_3 = abs(int(user_a_attributes["days_since_start"]) - int(user_b_attributes["days_since_start"]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tenure is a bit subjective. I don't have strong opinions here if it doesn't lead to starvation. Fundamental to this assumption is that tenured folks know each other and so optimize for meeting newer less tenured people.

I think this works for v1 but I'll be curious to hear feedback on whether folks not getting matched with similarly tenured people gets noticed. Perhaps eventually we should get to a place where we can ask users to tell us their preferences for matching

api/tests/matching/match_utils_test.py Show resolved Hide resolved
@ykdeal ykdeal changed the base branch from master to main December 6, 2023 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants