Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to gengo #1305

Draft
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

spenserblack
Copy link
Collaborator

@spenserblack spenserblack commented Apr 8, 2024

See #1152 for previous conversation. With the drastic API changes to gengo it started to feel like more work to modify the existing PR instead of starting over.

To Do

Resolves #26
Closes #1152

To get this working with one language and then expand from there.
This doesn't do much but make the parts that used to use tokei panic.
Languages that aren't yet supported are commented out.
@spenserblack
Copy link
Collaborator Author

@o2sh Before this moves any further we need to determine if we want to move forward with analyzing a git ref (HEAD) or a folder (I think preferably we don't support both). I've started listing reasons for each here: #1303

src/info/blob_size.rs Outdated Show resolved Hide resolved
languages.get_statistics(&[&dir], &ignored, &tokei_config);
languages
) -> Result<gengo::Analysis, Box<dyn Error>> {
// TODO Determine best way to ignore files (and if that should continue to be handled by onefetch)
Copy link
Owner

@o2sh o2sh Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have language type and glob filtering for the V1

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or is this going to be handled via .gitattributes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That depends on the "file source." Right now the Git file source is highly opinionated and would use .gitattributes for ignoring files, overrides, etc.


I plan on the Directory file source being much less opinionated (can't get much more generic than just reading a folder 🙂). The functionality isn't there yet, but I was considering adding ways to configure excluding/including files. Right now .gitignore and .ignore files would ignore files via the ignore crate's default behavior. I was hoping to discuss the usage sometime, but it would probably either be like this:

let file_source = Directory::with_config("./", Config { ignored_files: Some(vec![]) });

or like this:

let file_source = Directory::new("./").extend_ignored_files(&[]);

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clear, and it also aplies to the language type filter ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default Gengo marks Programming, Markup, and Query as detectable, and Prose and Data as not detectable. detectable is basically a boolean that answers "should this factor in to the stats?" Also, documentation, generated files, and vendored files are not detectable by default. Assuming we use the Git file source with .gitattributes, this is example usage:

# Markdown is Prose, but we still want it in the stats
*.md gengo-detectable

# JavaScript is Programming, but we want to exclude it
*.js -gengo-detectable

# The contents of dist/ are not generated, so they will be detectable
dist/* -gengo-generated
# OR
# Even though dist/* is generated, it should still be included in the stats
dist/* gengo-detectable

By default any file that is not detectable is excluded from the Summary.


Again, the Git file source is highly opinionated and tries to behave a lot like github-linguist. The usage with the Directory file source is still undecided, and right now it doesn't implement any overrides. Or we could write our own file source (#1303 (reply in thread)) if we want very specific behavior that wouldn't be suitable for the gengo crate.

But, however we implement it, IMO the best way would be to inform gengo what files are and aren't detectable, and then make use of either the Analysis (detailed results) or Summary (simplified results).


tl;dr Yes, gengo would be the one handling included/excluded types, and we'd just pass configuration to it somehow.

languages
) -> Result<gengo::Analysis, Box<dyn Error>> {
// TODO Determine best way to ignore files (and if that should continue to be handled by onefetch)
let file_source = Git::new(dir, "HEAD")?;
Copy link
Owner

@o2sh o2sh Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the pending changes taken into count for the language analysis?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be handled by gengo.
@@ -9,14 +9,14 @@ use strum::IntoEnumIterator;

pub mod language;

pub fn get_main_language(loc_by_language: &[(Language, usize)]) -> Language {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of refactoring is inevitable to make things make more sense when using gengo over tokei. Stumbled on this. Don't mean to be harsh, but is a function for the sole purpose of indexing into [0] really necessary? Especially now that newer versions of Rust have .first() and .get(0) to safely index into collections.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, go ahead.

But for an easier review, we can keep the refactoring/cleanup in a separate PR, so we can focus on the current subject and don't lose track of the important changes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored it to handle a OOB panic that was introduced (see 82da923 for more info).

This was introduced by the removal of `get_loc_by_language`, which had
a check to return `None` if the total LoC was `0`. This will now check
if the list of languages is empty.
@spenserblack
Copy link
Collaborator Author

So I think that covers pretty much everything. All that's left is to add more language support (spenserblack/gengo#34) and make a new release to add back the supported language variants and then we're good AFAIK.

* Dependencies
* `&Option<T>` -> `Option<&T>` conversion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect language detected (C++ as C, XML as TypeScript, etc.)
2 participants