
Checkstyle GSoC 2020 Project Ideas

Roman Ivanov edited this page Apr 20, 2024 · 35 revisions

We participated in GSoC 2020: https://summerofcode.withgoogle.com/archive/2020/organizations/4788704964509696#projects-list



Project Name: Patch Suppression

Skills required: basic Java

Project type: new feature implementation.

Project goal: implement new filter/suppression module

Mentors: Roman Ivanov, Pavel Bludov, Timur Tibeev, Benjamin Marwell

Description: Introducing Checkstyle to a project can be challenging, especially when the project has a massive amount of code, is under very active development, and has no resources to start a new code cleanup process. Cleaning up legacy code left by previous contributors is a monotonous job that everyone tries to avoid. It is easy to say how code should look, but it may be hard to actually enforce the rules in an existing codebase.

For example, Guava does not follow the Google style, and while it is easy to say how the code should look, it is hard to assign somebody to fix ALL problems left by previous contributors. It is a very boring activity that everyone will try to avoid. Good practice from OpenJDK actually discourages code changes without a good reason.

A better approach is to leave the existing code as is and validate only new code. Checkstyle already has a wide array of filters that can suppress certain violations if the user classifies a violation as “won’t fix”. However, just setting up the initial suppressions still requires a huge effort to review all the violations, or to organize a team for a special cleanup process.

Checkstyle needs a new approach for filtering violations, based on the diff of the last commit. With it, users will only need to fix violations in newly created or changed code. It could also be configurable to skip changed lines and only show violations in purely new code.

This new feature would allow users to ignore all violations in legacy code and start writing clean code in the middle of active development. Checkstyle will continue to validate the whole code base, but with such a suppressor only changed lines of code or new files will have to follow all Checks. As an additional benefit, users can grow their set of Checks to validate new rules without being bothered about fixing all existing violations. All violations will eventually be fixed as the code gets changed.
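To illustrate the data such a filter needs, here is a minimal sketch, assuming the patch is a unified diff as printed by `git diff` and that a violation is identified only by file path and line number. `PatchLines`, `addedLines` and `accept` are hypothetical names, not Checkstyle API; a real module would plug into Checkstyle's filter mechanism instead. The sketch treats every `+` line as changed, so telling modified lines apart from purely new ones would need extra logic (for example, pairing adjacent `-` and `+` runs).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the lines added to each file by a unified diff. */
final class PatchLines {
    // Hunk header "@@ -oldStart[,oldCount] +newStart[,newCount] @@"; a missing count means 1.
    private static final Pattern HUNK =
            Pattern.compile("^@@ -\\d+(?:,(\\d+))? \\+(\\d+)(?:,(\\d+))? @@.*");

    static Map<String, Set<Integer>> addedLines(String unifiedDiff) {
        Map<String, Set<Integer>> result = new HashMap<>();
        Set<Integer> current = null;
        int newLine = 0;  // line number in the new version of the file
        int oldLeft = 0;  // old-side lines still expected in the current hunk
        int newLeft = 0;  // new-side lines still expected in the current hunk
        for (String line : unifiedDiff.split("\n")) {
            if (oldLeft == 0 && newLeft == 0) {
                // Between hunks: only file and hunk headers matter.
                Matcher hunk = HUNK.matcher(line);
                if (line.startsWith("+++ ")) {
                    String path = line.substring(4).trim();
                    if (path.startsWith("b/")) {
                        path = path.substring(2);  // drop git's "b/" prefix
                    }
                    current = result.computeIfAbsent(path, key -> new TreeSet<>());
                } else if (hunk.matches()) {
                    oldLeft = hunk.group(1) == null ? 1 : Integer.parseInt(hunk.group(1));
                    newLine = Integer.parseInt(hunk.group(2));
                    newLeft = hunk.group(3) == null ? 1 : Integer.parseInt(hunk.group(3));
                }
            } else if (line.startsWith("+")) {
                current.add(newLine++);  // added (or modified) line
                newLeft--;
            } else if (line.startsWith("-")) {
                oldLeft--;  // removed line: it has no number in the new file
            } else if (!line.startsWith("\\")) {
                newLine++;  // context line, present on both sides
                oldLeft--;
                newLeft--;
            }
        }
        return result;
    }

    /** Keeps a violation only if it is on a line added by the patch. */
    static boolean accept(Map<String, Set<Integer>> added, String file, int line) {
        Set<Integer> lines = added.get(file);
        return lines != null && lines.contains(line);
    }
}
```

With a diff that replaces `int a;` by `int a = 1;` and adds `int b;` in a three-line class, `addedLines` maps `src/Foo.java` to lines 2 and 3, so a violation on line 1 of that file is suppressed.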

The diff generated from the patch text does not contain information about which columns of a line were actually changed. Finding the exact set of changed columns is required to make the suppression more precise. Some Checks currently report violations only on a line. They need to be updated to report violations on both line and column, so that violations can be correlated directly with the output of the diff utility.
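One simple way to narrow a modified line down to columns is to strip the longest common prefix and suffix of its old and new text; whatever remains in the new line is the changed range. The sketch below is a hypothetical helper (`ChangedColumns` is not Checkstyle code) using 0-based columns. A real filter would have to convert to the column convention of reported violations (for example, taking tab expansion into account), and a prefix/suffix comparison yields one contiguous range even when several separate edits were made on the same line.

```java
/** Hypothetical helper: locates the changed part of a modified line. */
final class ChangedColumns {
    /** Returns {start, end} (0-based, end exclusive) of the differing part of newLine. */
    static int[] changedRange(String oldLine, String newLine) {
        int max = Math.min(oldLine.length(), newLine.length());
        int prefix = 0;
        while (prefix < max && oldLine.charAt(prefix) == newLine.charAt(prefix)) {
            prefix++;
        }
        int suffix = 0;
        // The suffix may not overlap the common prefix in either line.
        while (suffix < max - prefix
                && oldLine.charAt(oldLine.length() - 1 - suffix)
                        == newLine.charAt(newLine.length() - 1 - suffix)) {
            suffix++;
        }
        return new int[] {prefix, newLine.length() - suffix};
    }

    /** True if a violation at this column falls inside the changed range. */
    static boolean isInChangedRange(String oldLine, String newLine, int column) {
        int[] range = changedRange(oldLine, newLine);
        return column >= range[0] && column < range[1];
    }
}
```

For `    int a;` changed to `    int a = 1;`, the range is columns 9 to 13 (exclusive), which covers ` = 1`; a violation reported on `int` at column 4 would stay suppressed.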

As proof of success for this project, it is required to get some open-source project on board using Checkstyle with this new feature. It would be good to try to collaborate once more with the Guava project, or we can ask our friends in the Spring or HBase projects.

More details: https://github.com/checkstyle/checkstyle/issues/6689


Project Name: Auto-fix Module

Skills required: basic Java

Project type: new feature implementation.

Project goal: implement new module, test it on real projects

Mentors: Roman Ivanov, Timur Tibeev, Andrei Paikin, Benjamin Marwell

Description: Checkstyle is known as a tool that raises numerous minor issues. There are so many of them, and they are so minor, that it is hard to find the time and engineers to fix them. Most of the issues are easy to fix, but navigating to the relevant part of the code and making the fix takes time that engineers could spend on something more valuable. An auto-fix functionality could significantly simplify introducing Checkstyle to a project, as it would do most of the tedious work automatically.

The majority of Checkstyle violations target code formatting. IDE formatting settings are often not in sync with the Checkstyle configuration. An IDE can fix the code itself as part of its auto-formatting, and Checkstyle should be able to do the same. Each Check that targets formatting should have a built-in “Fix” functionality that converts code with a violation into compliant code without any user interaction. Such functionality is in huge demand by users.

In scope of this project, it is required to review the existing auto-fix functionality in plugins and tools, to learn the challenges they face and to collect the whole list of requirements for such a task. Then auto-fix for formatting Checks should be implemented as part of a special Module that takes all reported violations and fixes those that support auto-fix. If the resulting functionality proves easy to maintain and reusable by Checkstyle plugins, API changes can be proposed to bring it into the core library so that any plugin can reuse it.
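A common way to apply many fixes to one file is to express each fix as a text replacement over a character range and apply them from the end of the file towards the start, so that the offsets of fixes not yet applied stay valid. The sketch assumes that shape; `TextFix` and `AutoFixer` are hypothetical names, and skipping overlapping fixes (to be picked up on a later run) is one possible policy, not a decided design.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical fix produced by a formatting Check: replace [start, end) with text. */
final class TextFix {
    final int start;  // 0-based offset into the file content, inclusive
    final int end;    // exclusive; start == end means a pure insertion
    final String replacement;

    TextFix(int start, int end, String replacement) {
        this.start = start;
        this.end = end;
        this.replacement = replacement;
    }
}

final class AutoFixer {
    /** Applies fixes from the last one to the first; a fix overlapping an applied one is skipped. */
    static String apply(String source, List<TextFix> fixes) {
        List<TextFix> sorted = new ArrayList<>(fixes);
        sorted.sort(Comparator.comparingInt((TextFix fix) -> fix.start).reversed());
        StringBuilder result = new StringBuilder(source);
        int limit = source.length();  // start offset of the most recently applied fix
        for (TextFix fix : sorted) {
            if (fix.end > limit) {
                continue;  // overlaps text already rewritten; leave it for another run
            }
            result.replace(fix.start, fix.end, fix.replacement);
            limit = fix.start;
        }
        return result.toString();
    }
}
```

For example, two insertions of a space at offsets 5 and 6 of `int a=1;`, the kind of fix a WhitespaceAround-style Check might produce, yield `int a = 1;`.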

More details at https://github.com/checkstyle/checkstyle/issues/7427


Project Name: Generating and publishing a regression diff report based on configs in the Pull Request description

Skills required: intermediate Java

Project type: new feature implementation.

Project goal: implement tool for Checkstyle CI pipeline

Mentors: Roman Ivanov, Baratali Izmailov, Erik Silkensen

Description: Right now the Checkstyle project has a rule that any functional contribution should be supported by a special report that shows the difference in behavior on a big set of open-source projects (example for acceptance of changes). Creating such a report is almost automated, but the contributor still needs to figure out how to configure it, how to run it, and how to publish it on the web. Even after generating the report once, they may need to regenerate it after each code change during Pull Request review. This process is error prone, especially when contributions are made in limited free time by people unfamiliar with all the intricacies of what admins look for in the reports. We need to take the last leap forward and automate all such steps. Contributors should focus on code and design and leave all chores to be done automatically.

A Pull Request description can contain the following markers for the diff report generator:

Diff Regression projects: {{URI to file like projects-to-test-on.properties}}

Diff Regression config: {{URI to file like my_checks.xml}}

As a proof of concept, we should move all executions to Continuous Integration (CI). CI should take the Pull Request description, parse it, get the “Diff Regression config” from it, and use https://github.com/checkstyle/contribution/blob/master/checkstyle-tester/projects-for-wercker.properties as the list of projects. CI build jobs will run validation on the project(s) listed in the properties file. Once the report is generated, CI should deploy it to report storage. Once deployment is done, the CI script should post a comment in the Pull Request saying where to find and read the reports.
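The parsing step could be as simple as one line-anchored regular expression per marker. This minimal sketch assumes each marker sits on its own line of the Pull Request description as `<marker>: <URI>`; `PrMarkers` is a hypothetical name, and fetching the description from the hosting service is left out.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads "Marker: value" lines from a Pull Request description. */
final class PrMarkers {
    static Optional<String> find(String description, String marker) {
        // MULTILINE lets ^ match at the start of every line; [ \t]* keeps the match on one line.
        Pattern pattern = Pattern.compile(
                "^[ \\t]*" + Pattern.quote(marker) + ":[ \\t]*(\\S+)", Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(description);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

`PrMarkers.find(description, "Diff Regression config")` returns the config URI when the marker is present; a CI job could fall back to projects-for-wercker.properties when the projects marker is missing.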

Once the proof of concept (implementation on CI) works well, we should generalize the idea into a standalone web service hosted in the “cloud”.

More details: https://github.com/checkstyle/checkstyle/issues/7498


Project Name: ANTLR grammar for Java 14 features

Skills required: intermediate Java

Project type: update for existing implementation.

Project goal: update Java ANTLR grammar in Checkstyle to support Java 14 syntax changes

Mentors: Roman Ivanov, Erik Silkensen, Pavel Bludov, Ruslan Diachenko, Baratali Izmailov

Description: Checkstyle is a project that helps software developers conform to predefined styles of Java code formatting and to some best practices. Validation in Checkstyle relies on correct parsing of Java source files to build an accurate parse tree that is used by each Check. Right now, Checkstyle does not work on Java files that use syntax features introduced from Java 12 to Java 14.

Java 14 introduces a few new features that change the Java language syntax: Pattern Matching for instanceof, Text Blocks, Switch Expressions, and Records.
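For grammar testing, a small input file that combines all four features is handy. The sample below is only an illustration, not a complete list of the syntax changes. In Java 14, Switch Expressions are a standard feature, while Records, Text Blocks and Pattern Matching for instanceof are preview features, so the file compiles on JDK 14 with `--enable-preview --release 14`, or on JDK 16 and later without flags.

```java
// Records (preview in Java 14)
record Point(int x, int y) { }

final class Java14Syntax {
    // Pattern Matching for instanceof (preview in Java 14): binds "p" when the test succeeds
    static String describe(Object obj) {
        if (obj instanceof Point p) {
            return "point " + p.x() + "," + p.y();
        }
        return "other";
    }

    // Switch Expressions (standard in Java 14): arrow labels and the "yield" statement
    static String kind(String day) {
        return switch (day) {
            case "SATURDAY", "SUNDAY" -> "weekend";
            default -> {
                String trimmed = day.trim();
                yield trimmed.isEmpty() ? "unknown" : "weekday";
            }
        };
    }

    // Text Blocks (second preview in Java 14)
    static final String JSON = """
            {"name": "checkstyle"}
            """;
}
```

Contextual keywords such as `record` and `yield` are a good stress test for the grammar, since they remain valid identifiers in other positions.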

The goal of this project is to add support for Java 14 syntax changes to Checkstyle and allow developers to keep using Checkstyle when switching to modern Java versions.

The updated grammar will be available to all other projects interested in parsing Java files, as the ANTLR grammar is common to all implementations and is not specific to Checkstyle.

If students finish this task before the deadline, the rest of the time will be spent fixing known problems in our parser with rarely used syntax: https://github.com/checkstyle/checkstyle/labels/antlr


Project Name: Upgrade Java Grammar from ANTLR2 to ANTLR4

Skills required: basic Java and experience with syntax analysis.

Project type: new feature implementation.

Project goal: to update the core parsing library to the latest version in order to simplify Java grammar support.

Mentors: Roman Ivanov, Pavel Bludov, Erik Silkensen

Description: Checkstyle needs a new Java grammar based on ANTLR4. This task is very difficult, but it is critical for Checkstyle, as the ANTLR2 library has not been supported since 2006 and is far less efficient. The old version has a number of syntax analysis limitations that have already been resolved in ANTLR4. Our team already has difficulties supporting the current grammar, as it is too complicated due to the limited parsing abilities of ANTLR2.

New features of ANTLR4 that we need:

  • ANTLR4 supports direct left recursion, which will simplify the grammar significantly. We already have a lot of warnings about non-deterministic behavior that cannot be resolved in ANTLR2 (example).
  • ANTLR4 has a number of UI tools that help users debug a grammar and see how the parser will work: IDE plugins and the Parse Tree Inspector UI application from the ANTLR distribution package.

Proof of necessity: results of open survey, example of annoying warnings during the build


Project Name: Optimization of distance between methods in single Java class

Skills required: basic Java, good analytical abilities, good background in mathematics.

Project type: new feature implementation.

Project goal: to make quality practices automated and publicly available.

Mentors: Roman Ivanov, Ruslan Diachenko, Timur Tibeyev

Description:

This task is an ambitious attempt to improve code readability by minimizing the jumps and scrolls a reader makes in a source file to see the implementation details of a method when looking at its first usage.

It is required to analyze a lot of code and find a model that minimizes the distance between a method's first usage and its declaration in the same file, while respecting users' preferences to keep overloaded and overridden methods grouped together. Other preferences may emerge while investigating open-source projects.

The first step is already done by our team: we created a web service, methods-distance, that calculates distances between methods and builds a DSM to ease analysis. We already use it in our project.

As a second step, it is required to take the matrix of distances between methods and optimize it with some empirical algorithm, letting users define the expected model of a class through arguments. This will allow the algorithm to be used as a Check that enforces code structure automatically at build time.
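As a starting point for the optimization, the sketch below scores a class layout as the sum, over all called methods, of the distance (in method positions) between a method's declaration and the earliest-declared method that calls it, and proposes a depth-first ordering that puts callees right after their callers. Both the metric and the heuristic are assumptions for illustration, not the model the project is expected to produce; they ignore method length and the grouping preferences for overloaded and overridden methods. `MethodDistance` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MethodDistance {
    /** Sum of |position(callee) - position(earliest caller)| over all called methods. */
    static int totalDistance(List<String> order, Map<String, List<String>> calls) {
        Map<String, Integer> pos = new HashMap<>();
        for (int i = 0; i < order.size(); i++) {
            pos.put(order.get(i), i);
        }
        Map<String, Integer> firstCallerPos = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : calls.entrySet()) {
            int callerPos = pos.get(entry.getKey());
            for (String callee : entry.getValue()) {
                firstCallerPos.merge(callee, callerPos, Math::min);
            }
        }
        int total = 0;
        for (Map.Entry<String, Integer> entry : firstCallerPos.entrySet()) {
            total += Math.abs(pos.get(entry.getKey()) - entry.getValue());
        }
        return total;
    }

    /** Heuristic: depth-first from methods nobody calls, each callee placed after its caller. */
    static List<String> depthFirstOrder(List<String> order, Map<String, List<String>> calls) {
        Set<String> called = new HashSet<>();
        calls.values().forEach(called::addAll);
        Set<String> result = new LinkedHashSet<>();
        for (String method : order) {
            if (!called.contains(method)) {
                visit(method, calls, result);
            }
        }
        result.addAll(order);  // methods reachable only through cycles keep their relative order
        return new ArrayList<>(result);
    }

    private static void visit(String method, Map<String, List<String>> calls, Set<String> result) {
        if (result.add(method)) {
            for (String callee : calls.getOrDefault(method, List.of())) {
                visit(callee, calls, result);
            }
        }
    }
}
```

For a class declared as `a, b, c, run` where `run` calls `a`, `a` calls `b` and `b` calls `c`, the score is 5; the depth-first order `run, a, b, c` reduces it to 3.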

Results of the project:

  • an article with all details of the analysis and the algorithm;
  • a new Checkstyle Check with the optimization algorithm, to share it with the whole Java community.

Proof of necessity: we have a number of PRs where contributors put new methods at arbitrary places in a class, while a better place is close to the first usage. Example #1, Example #2, Example #3, ....


Project Name: Reconcile formatters of Eclipse, NetBeans and IntelliJ IDEA IDEs with a Checkstyle config.

Skills required: basic Java.

Project type: new feature implementation, analysis of existing IDE features.

Project goal: to make well-known quality practices publicly available.

Mentors: Roman Ivanov, Andrei Paikin, Daniel Mühlbachler

Description:

Using different IDEs in the same team is already a serious problem, as different IDEs format code based on their own rules and configurations. Unwanted formatting changes make their way into the code, which complicates the code review process. The problem becomes more acute when a project uses a static analysis tool like Checkstyle that has a wide range of code formatting Checks.

It is required to make it possible to use the same Checkstyle config in IDEs without conflicts with their internal formatters. This will let team members choose their IDE independently while keeping the same format and code style throughout the team.

The main focus of this project is the analysis of the formatting abilities of IDEs (indentation, import order, declaration order, separator/operator wrap, ...), and updating existing Checkstyle rules to work in a similar and non-conflicting way.

Results of the project:

  • create IDE configurations for the Checkstyle project, so that the Checkstyle team can use them to auto-format code in conformance with the checkstyle_check.xml file used by Continuous Integration.
  • create a Checkstyle config that follows default Eclipse formatting + inspection rules
  • create a Checkstyle config that follows default IntelliJ IDEA formatting + inspection rules
  • create a Checkstyle config that follows default NetBeans formatting + inspection rules

Proof of necessity: mail-list post #1, mail-list post #2, mail-list post #3, discussion #1


Project Name: OpenJDK Code Convention coverage

Skills required: basic Java.

Project type: new feature implementation.

Project goal: to make well-known quality practices publicly available.

Mentors: Roman Ivanov, Richard Veach, Timur Tibeyev, Benjamin Marwell

Description:

The OpenJDK Code Convention was one of the first guidelines on how to write Java code. It is marked as outdated (because of the date of its last update), but the best practices described there do not have an expiration date. The new OpenJDK Java Style Guidelines are close to a final version and will most likely succeed the OpenJDK Code Convention. However, a number of Apache projects still follow the OpenJDK rules, so the community needs both configurations.

The OpenJDK Code Convention is already partly covered by Checkstyle as the Sun Code Convention. Many validation rules have been added to and changed in Checkstyle since Sun's configuration was created (in 2004).

During the project, it is required to review both documents in detail and prove publicly that Checkstyle covers all guideline rules. Missing functionality needs to be created and blocking bugs need to be fixed. The page OpenJdk Java Style Checkstyle Coverage needs to be updated, and a new page "New OpenJDK's Java Style Checkstyle Coverage" needs to be created. Both pages need to be formatted the same way as Google's Java Style Checkstyle Coverage.

Proof of necessity: javadoc issues on GitHub; results of open survey; requests from users for OpenJDK coverage support.


Project Name: Coverage of Documentation Comments Style Guide

Skills required: basic Java.

Project type: new feature implementation.

Project goal: to make well-known quality practices publicly available.

Mentors: Roman Ivanov

Description:

The project will mainly focus on automating Documentation Comments (Javadoc) guidelines with Checkstyle Checks. Reliable comment parsing was a major improvement in Checkstyle during GSoC 2014; those results need to be reused to reliably automate Javadoc best practices.

A separate configuration file with the newly created Checks needs to be created, because documentation best practices do not make sense for all projects. Javadoc validation matters only for library projects that publish online documentation on the web.

The result of this project will be a configuration file with the maximum possible coverage of the Documentation Comments style guide. The coverage report should look like Google's Java Style Checkstyle Coverage. If there is time left, we can focus on coverage of the guidelines from https://blog.joda.org/2012/11/javadoc-coding-standards.html

Proof of necessity: javadoc issues on GitHub.


Project Name: Spellcheck of Identifiers by English dictionary

Skills required: intermediate Java.

Project type: new feature implementation.

Project goal: implement spell checking of all identifiers in Java code.

Mentors: Roman Ivanov, Ruslan Diachenko, Andrei Paikin

Description:

The correct spelling of words in code is very important, since a typo in the name of a method that is part of an API could result in serious problems. Misspelled names also make reading code frustrating and misleading, especially when a one-letter typo forces a developer to read the Javadoc or even the implementation of the method. The two most popular IDEs (Eclipse and IntelliJ IDEA) already have spell-check ability. It would be beneficial for Checkstyle to have the same functionality, usable in any Continuous Integration system through the Command Line Interface or as part of a build tool (Maven, Ant, Gradle, ...), with a wide range of options to customize it to users' needs. Features of existing spell checkers need to be analyzed: IntelliJ IDEA Spellchecking, Eclipse Spelling. There are a number of open-source projects that do spell checking; it is OK to reuse them if their license is compatible. Examples: https://code.google.com/archive/p/bspell/ , http://www.softcorporation.com/products/spellcheck/, ...
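Before identifiers can be checked against a dictionary, they have to be split into words. This sketch assumes camelCase, PascalCase, snake_case and UPPER_CASE naming, treats digits as separators, and uses a plain in-memory `Set` as the dictionary; `IdentifierSpellcheck` is a hypothetical name, and a real Check would more likely reuse one of the existing spell-checking libraries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: splits identifiers into words and reports unknown ones. */
final class IdentifierSpellcheck {
    static List<String> words(String identifier) {
        List<String> words = new ArrayList<>();
        // First split on '_', '$' and digits, then on camel-case boundaries:
        // "aB" -> "a|B", and "HTTPResponse" -> "HTTP|Response".
        for (String part : identifier.split("[_$0-9]+")) {
            for (String word : part.split("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")) {
                if (!word.isEmpty()) {
                    words.add(word.toLowerCase(Locale.ROOT));
                }
            }
        }
        return words;
    }

    static List<String> misspelled(String identifier, Set<String> dictionary) {
        List<String> unknown = new ArrayList<>();
        for (String word : words(identifier)) {
            if (!dictionary.contains(word)) {
                unknown.add(word);
            }
        }
        return unknown;
    }
}
```

`words("parseHTTPResponse_v2")` gives `[parse, http, response, v]`, and `misspelled("getVaule", dictionary)` reports `vaule` when the dictionary contains `get` but not `vaule`. Ignoring very short words or known abbreviations would be natural candidates for user options.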


Project Name: Metadata files for all modules

Skills required: intermediate Java

Project type: creation of new functionality.

Project goal: simplify Checkstyle plugin maintenance

Mentors: Roman Ivanov, Daniel Mühlbachler, Calixte Bonsart

Description: Most usage of Checkstyle happens through plugins in IDEs, quality systems, or build tools. These plugins usually have a specific UI to construct and modify the validation configuration in a user-friendly way. All details such as the description of a module, its properties, their descriptions, default values, and possible values are called the metadata of Checkstyle modules. Right now plugins need to maintain their own metadata for each Checkstyle module (example in Eclipse plugin, example in Sonar plugin). There might be more plugins outside of the Checkstyle organization, or plugins that would consider implementing a config builder in their target system. Maintaining this metadata is very error prone and can delay the release of a plugin with new core/validation functionality.

Updating the Checkstyle core library in a plugin should ideally be a version bump somewhere in properties, without updating any other files. The metadata of Checkstyle modules required by plugins already exists in the core library, inside Javadocs and in XML files for our own website.

It is required to generate metadata files from the Javadoc of a specific set of classes and place them as resources in the released JAR. Additionally, a utility to read the content of such files needs to be provided. During this project, both plugins (Eclipse and SonarQube) should be updated to read metadata from the Checkstyle JAR and not keep any data that the core library already has. As a final step, it would be good to generate the site files that contain module documentation (just another form of metadata) from these metadata files.
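A metadata reader could stay small if each module's metadata is an XML file bundled in the JAR. The XML layout here (a `module` element with `property` children carrying `name`, `type` and `default-value` attributes) is an assumption for illustration, not a format defined by Checkstyle, and `ModuleMetadataReader` is a hypothetical name; only standard JDK XML APIs are used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical reader for one module's metadata XML (the layout is an assumption). */
final class ModuleMetadataReader {
    /** Returns property name -> default value, in document order. */
    static Map<String, String> defaultValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> defaults = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                // getAttribute returns "" when the attribute is absent
                defaults.put(property.getAttribute("name"), property.getAttribute("default-value"));
            }
            return defaults;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot read module metadata", e);
        }
    }
}
```

In a plugin, the XML string would come from the Checkstyle JAR on the classpath (for example via `Class.getResourceAsStream`), so a core library upgrade brings updated metadata with it.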

More details: https://groups.google.com/forum/#!msg/checkstyle-devel/UeboJR0evSU/8nuQ3o-2kM8J

https://github.com/checkstyle/checkstyle/wiki/Checkstyle-GSoC-2017-Project-Ideas#project-name-generation-of-web-site-content-for-all-checkstyle-modules-from-javadoc


Project Name: Automate verification of documentation for all modules

Skills required: intermediate Java

Project type: creation of new functionality.

Project goal: organize documentation and automate its maintenance

Mentors: Roman Ivanov, Richard Veach

Description: Checkstyle is an active project. Our users are always requesting that existing functionality be expanded and that brand new features be added. As these features are added to the core Checkstyle project, documentation must be updated so that users who were not involved in the request know they exist. Some changes can drastically change the default behavior of a module. Documentation is extremely important to help users understand how our modules work and how they can be configured to fit each person's unique needs without looking at the source code behind the scenes.

Documentation is mostly a manual process, and it is easy to miss updating it during the request workflow. Missing documentation of a functionality can go unnoticed for years, as users can only rely on documentation to know what exists. Even when the gap is caught and documentation is added, some contributors are not aware of our best practices for writing it.

We want to automate most of our documentation creation to alleviate the manual process. Automation will ensure that all Checkstyle documentation follows a strict standard that we define. Besides ensuring that all configurable options are documented, it will help detect whether the current usage examples are sufficient or more are needed. It will ensure that the provided examples are valid, compile if they are Java, and do or do not produce violations for the configuration and Check being described. For any newly added module, it will print out a template for the contributor to follow and fill in with the required check-specific information, such as descriptions.
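One of these verifications, checking that every configurable option is documented, could be driven by reflection, since Checkstyle modules receive their properties through setter methods. The sketch assumes that a property `foo` always corresponds to a public, non-static, single-argument `setFoo` method and that the documented property names have already been extracted from the module's documentation; `DocumentationVerifier` and `ExampleCheck` are hypothetical names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical verifier: compares a module's setters with its documented properties. */
final class DocumentationVerifier {
    /** Derives property names from public single-argument setters, e.g. setMax -> max. */
    static Set<String> configurableProperties(Class<?> module) {
        Set<String> names = new TreeSet<>();
        for (Method method : module.getMethods()) {
            String name = method.getName();
            if (name.startsWith("set") && name.length() > 3
                    && method.getParameterCount() == 1
                    && !Modifier.isStatic(method.getModifiers())) {
                names.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
            }
        }
        return names;
    }

    /** Properties that have a setter but no documentation entry. */
    static Set<String> undocumented(Class<?> module, Set<String> documented) {
        Set<String> missing = configurableProperties(module);
        missing.removeAll(documented);
        return missing;
    }
}

/** Hypothetical module used only for this example. */
class ExampleCheck {
    public void setMax(int max) { }

    public void setIgnorePattern(String pattern) { }
}
```

For `ExampleCheck`, whose documentation lists only `max`, the verifier reports `ignorePattern` as undocumented.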

As part of this project, students must ensure that all documentation verifications pass for the existing documentation and show that the new module template can be used. If students finish this task before the deadline, the rest of the time will be spent reviewing descriptions to ensure they are complete and easy to understand.
