Give usage stats for typical properties and sort them #139

Open
nichtich opened this issue Nov 29, 2018 · 2 comments
Labels
enhancement New feature or request view: Entity This relates to the entity view

@nichtich
Contributor

The "typical properties" would be more helpful if they were sorted by usage count. What percentage of instances actually use each of these properties? The SPARQL query in this thread shows how to find out: https://twitter.com/fagerving/status/1068229258491846656

@mkroetzsch
Member

mkroetzsch commented Nov 29, 2018

"Typical" in SQID really means something quite different from absolute counts. It shows properties that are significantly more relevant to a class of things than to other things on Wikidata. For example, Freebase ID is one of the most common properties overall, and across all classes, but it is not particularly typical for anything. SQID orders properties by "average typicality" (across all classes), which is -- I agree -- not the best approach since it shuffles properties from page to page.

To give an example, the most typical properties for "lighthouses" are "light characteristic of lighthouse", "Admiralty number", and "focal height" (https://tools.wmflabs.org/sqid/#/view?id=Q39715), but these are surely not the most frequent ones (presumably, every single lighthouse has coordinates). So the measure does work well, but it is not the best heuristic for ordering. In fact, I think ordering should be controlled more manually still, e.g., you want birth and death to end up close to one another and in some fixed order, and I don't think mere statistics would ever achieve this.

To answer the actual issue report: sorting typical properties by usage would put properties first that are not "typical" at all, and the same properties would rank at the top across large parts of the data.
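
The difference between the two orderings can be illustrated with a small sketch. The items below are invented toy data, and the ratio used here (share of a class's items using a property divided by the share of all items using it, often called "lift") is only an assumed stand-in for a typicality score; it is not necessarily the measure SQID computes.

```python
from collections import Counter
from fractions import Fraction

# Toy data, invented for illustration: item -> (class, set of properties).
ITEMS = {
    "lighthouse_a": ("lighthouse", {"coordinate location", "country", "focal height", "Freebase ID"}),
    "lighthouse_b": ("lighthouse", {"coordinate location", "country", "Admiralty number"}),
    "lighthouse_c": ("lighthouse", {"coordinate location", "country", "focal height"}),
    "city_a": ("city", {"coordinate location", "country", "population", "Freebase ID"}),
    "city_b": ("city", {"coordinate location", "country", "population"}),
    "human_a": ("human", {"date of birth", "date of death", "Freebase ID"}),
    "human_b": ("human", {"date of birth", "country"}),
}


def usage_share(items, cls=None):
    """Fraction of items (optionally restricted to one class) using each property."""
    selected = [props for c, props in items.values() if cls is None or c == cls]
    counts = Counter(p for props in selected for p in props)
    # Exact fractions avoid floating-point noise when breaking ties.
    return {p: Fraction(n, len(selected)) for p, n in counts.items()}


def rank_by_frequency(items, cls):
    """Order a class's properties by how many of its items use them."""
    share = usage_share(items, cls)
    return sorted(share, key=lambda p: (-share[p], p))


def rank_by_lift(items, cls):
    """Order a class's properties by in-class share relative to overall share.

    Assumed typicality proxy, not SQID's actual formula.
    """
    in_class = usage_share(items, cls)
    overall = usage_share(items)
    lift = {p: in_class[p] / overall[p] for p in in_class}
    return sorted(lift, key=lambda p: (-lift[p], p))


if __name__ == "__main__":
    print(rank_by_frequency(ITEMS, "lighthouse"))
    print(rank_by_lift(ITEMS, "lighthouse"))
```

On this toy data, the frequency ranking puts "coordinate location" and "country" first, which would also lead the list for cities. The ratio-based ranking puts "Admiralty number" and "focal height" first and "Freebase ID" last, matching the observation that a common property need not be typical for any class.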

@nichtich
Contributor Author

nichtich commented Dec 2, 2018

Thanks for the detailed explanation of what "typical" actually means in SQID. But what's the use case for this information? Given the "typical" properties, one could infer which class an item without a P31 statement best belongs to (duck-typing). My use case is the creation or extension of items with a known class. An editor curating a lighthouse item should first know that almost every lighthouse has a country and a coordinate. I'd propose renaming "typical properties" to "distinguishing" or "designating" properties and introducing the most used properties as "typical properties" or "frequent properties".
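
A minimal sketch of the duck-typing idea, under stated assumptions: the lighthouse properties are the ones named earlier in this thread, while the "human" list and the overlap score are hypothetical choices made for illustration and are not taken from SQID.

```python
# Hypothetical per-class lists of distinguishing properties.
# The lighthouse entries come from the thread; the human entries are assumed.
DISTINGUISHING = {
    "lighthouse": {"light characteristic of lighthouse", "Admiralty number", "focal height"},
    "human": {"date of birth", "date of death", "place of birth"},
}


def guess_class(item_properties, distinguishing=DISTINGUISHING):
    """Return the class whose distinguishing properties best match the item.

    The score is the fraction of a class's distinguishing properties that
    the item uses. Returns None when no class matches at all.
    """
    scores = {
        cls: len(props & item_properties) / len(props)
        for cls, props in distinguishing.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    # An item without a P31 statement, described only by its properties.
    print(guess_class({"focal height", "coordinate location", "country"}))
```

Frequent properties such as "country" contribute nothing to this guess, which is why a separate "frequent properties" list would serve the editing use case better than the distinguishing list does.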

@mmarx mmarx added enhancement New feature or request view: Entity This relates to the entity view labels Aug 20, 2019
@mmarx mmarx added this to To do in General Development via automation Aug 20, 2019
@mmarx mmarx added this to the New Features milestone Aug 20, 2019

3 participants