Possible sources of error in the grank project #11

Open
bestony opened this issue Oct 1, 2018 · 0 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

bestony commented Oct 1, 2018

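Possible sources of error

1. Commits / PRs without a timestamp are discarded

While fetching data via GraphQL, we found that some commits / PRs have no timestamp. To ensure this dirty data does not affect the project's analysis, these records are discarded. However, dropping them may also make the analysis results less precise.

Relevant code: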

commit_frame = commit_frame[commit_frame.date != "未标注时间"]
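To judge how much this filter might skew the analysis, it can help to measure the share of rows it removes. The following is a minimal sketch, assuming `commit_frame` is a pandas DataFrame with a `date` column as in the line above; the helper name `drop_undated` and the sample data are hypothetical and not part of grank.

```python
from typing import Tuple

import pandas as pd

# Sentinel copied from the snippet above; it marks a record with no timestamp.
MISSING_DATE = "未标注时间"


def drop_undated(frame: pd.DataFrame) -> Tuple[pd.DataFrame, float]:
    """Drop rows whose date is the sentinel.

    Returns the kept rows and the fraction of rows that were dropped.
    """
    mask = frame["date"] != MISSING_DATE
    kept = frame[mask]
    dropped_share = 0.0 if len(frame) == 0 else 1 - len(kept) / len(frame)
    return kept, dropped_share


# Hypothetical sample data, not taken from any real repository.
commit_frame = pd.DataFrame({
    "author": ["a", "b", "c", "d"],
    "date": ["2018-09-01", MISSING_DATE, "2018-09-03", "2018-09-04"],
})

kept, dropped_share = drop_undated(commit_frame)
print(f"kept {len(kept)} rows, dropped {dropped_share:.0%}")
```

Logging the dropped share per repository would show where the discarded records are numerous enough to matter.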

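2. Imprecise email addresses

While fetching data via GraphQL, we found that some commit / PR emails are under the GitHub domain. Because the company or individual behind a GitHub-domain address cannot be determined, these records are discarded to keep the community analysis accurate.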

Grank/grank/libs/helpers.py

Lines 229 to 234 in 7b00abb

def is_corp(email, config):
    """Return True if the email belongs to a corporate user."""
    if config["corp"]["keyword"] in email:
        return True
    else:
        return False
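The three outcomes described in point 2 (discarded, corporate, community) can be sketched as below. This is an illustration under stated assumptions, not grank's actual code: `is_github_email` and `classify` are hypothetical helpers, the config value is made up, and treating any address whose domain is `github.com` or a subdomain of it as "GitHub-domain" is an assumption about how the filter works.

```python
def is_corp(email, config):
    """Return True if the configured corporate keyword appears in the email."""
    return config["corp"]["keyword"] in email


def is_github_email(email):
    """Hypothetical helper: True when the address is under the github.com domain."""
    domain = email.rpartition("@")[2].lower()
    return domain == "github.com" or domain.endswith(".github.com")


def classify(email, config):
    """Hypothetical helper: label an email as discarded, corp, or community."""
    if is_github_email(email):
        # The owner of a GitHub-domain address cannot be attributed.
        return "discarded"
    return "corp" if is_corp(email, config) else "community"


# Hypothetical configuration; the keyword is an example value only.
config = {"corp": {"keyword": "example-corp.com"}}

for address in (
    "dev@example-corp.com",
    "1234+user@users.noreply.github.com",
    "someone@gmail.com",
):
    print(address, "->", classify(address, config))
```

Note that `keyword in email` is a substring match, so a keyword that also appears in the local part of an address would be counted as corporate; comparing against the domain only may be a stricter alternative.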

bestony added the help wanted and enhancement labels Oct 1, 2018