Checkstyle GSoC 2021 Project Ideas

Roman Ivanov edited this page Apr 20, 2024 · 21 revisions

Checkstyle participated. Two projects were selected: https://summerofcode.withgoogle.com/archive/2021/organizations/4766383010742272#projects-list



Project Name: Upgrade Java Grammar from ANTLR2 to ANTLR4

Skills required: basic Java and experience with syntax analysis.

Project type: new feature implementation.

Project goal: to update core library to the latest version in order to simplify Java grammar support.

Mentors: Roman Ivanov, Andrei Paikin, Erik Silkensen

Description: Checkstyle needs a new Java grammar based on ANTLR4. This task is very difficult, but it is critical for Checkstyle because the ANTLR2 library has not been supported since 2006 and is far less efficient. The old version has a number of syntax analysis limitations that have already been resolved in ANTLR4. Our team is already having difficulty maintaining the current grammar, which is overly complicated because of the limited parsing abilities of ANTLR2.

New features of ANTLR4 that we need:

  • ANTLR4 supports direct left recursion, which will simplify the grammar significantly. We already have many warnings about non-deterministic behavior that cannot be resolved in ANTLR2 (example).
  • ANTLR4 has a number of UI tools that help users debug a grammar and see how the parser will work: IDE plugins and the Parse Tree Inspector UI application from the ANTLR distribution package.

Proof of necessity: results of open survey, example of annoying warnings during the build, support of Text Blocks (JDK 14) resulted in big complexity, feedback from users


Project Name: Auto-fix Module

Skills required: basic Java

Project type: new feature implementation.

Project goal: implement new module, test it on real projects

Mentors: Roman Ivanov, Daniel Mühlbachler, Erik Silkensen, Timur Tibeyev

Description: Checkstyle is known as a tool that raises numerous minor issues. There are so many of them, and they are so minor, that it is hard to find the time and an engineer to fix them. Most of the issues are easy to fix, but navigating to the right part of the code and making the fix takes time that engineers could spend on something more valuable. An auto-fix feature could significantly simplify introducing Checkstyle to a project, as it would do the most tedious work automatically.

Most Checkstyle violations specifically target code formatting. IDE formatting settings are often out of sync with the Checkstyle configuration. An IDE can fix the code itself as part of its auto-formatting, and Checkstyle should be able to do the same. Each Check that targets formatting should have built-in “Fix” functionality that converts the violating code to compliant code without any user interaction. Such functionality is in huge demand among users.

Within the scope of this project, it is required to review all existing auto-fix functionality in plugins and tools to learn the challenges they face and to compile the full list of requirements for such a task. Then implement auto-fix for formatting Checks as part of a special Module that takes all reported violations and fixes those that support auto-fix. If the resulting functionality proves easy to maintain and can be reused by Checkstyle plugins, API changes can be proposed for the core library so that any plugin can reuse it.

More details at https://github.com/checkstyle/checkstyle/issues/7427
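
The idea of a Module that takes reported violations and repairs the ones with a mechanical fix can be illustrated with a minimal sketch. Everything here is hypothetical: the `Violation` record, the `applyFixes` method and the rule names are invented for illustration and are not part of the Checkstyle API. Only one rule (trailing whitespace) is treated as fixable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an auto-fix module; none of these types are Checkstyle API. */
final class AutoFixSketch {

    /** A reported violation: 1-based line number plus the name of the rule that raised it. */
    record Violation(int line, String rule) { }

    /** Returns a copy of the source lines with every fixable violation repaired. */
    static List<String> applyFixes(List<String> lines, List<Violation> violations) {
        List<String> fixed = new ArrayList<>(lines);
        for (Violation violation : violations) {
            int index = violation.line() - 1;
            if (index < 0 || index >= fixed.size()) {
                continue; // stale violation that points outside the file
            }
            // Only rules with a known mechanical fix are handled (assumed rule name);
            // everything else is left untouched for a human to resolve.
            if ("TrailingWhitespace".equals(violation.rule())) {
                fixed.set(index, fixed.get(index).stripTrailing());
            }
        }
        return fixed;
    }

    public static void main(String[] args) {
        List<String> source = List.of("int a = 1;   ", "int b = 2;");
        List<Violation> reported = List.of(
            new Violation(1, "TrailingWhitespace"),
            new Violation(2, "MagicNumber")); // no automatic fix in this sketch
        applyFixes(source, reported).forEach(line -> System.out.println("[" + line + "]"));
    }
}
```

Keeping the fixer separate from the Checks, as in this sketch, is one possible design; a real implementation would also have to deal with fixes that shift line numbers or overlap each other.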


Project Name: Optimization of distance between methods in single Java class

Skills required: basic Java, good analytical abilities, good background in mathematics.

Project type: new feature implementation.

Project goal: to make quality practices automated and publicly available.

Mentors: Roman Ivanov, Timur Tibeyev

Description:

This task is an ambitious attempt to improve code readability by minimizing the jumps and scrolling a user needs in a source file to see the details of a method's implementation when looking at its first usage.

It is required to analyse a lot of code and find a model that minimizes the distance between a method's first usage and its declaration in the same file, while respecting users' preferences to keep overloaded and overridden methods grouped together. Other preferences may appear during the investigation of open-source projects.

The first step has already been done by our team: we created a web service that calculates distances between methods and builds a DSM matrix to ease analysis (methods-distance). We already use it in our project.

As a second step, it is required to take the matrix of distances between methods and optimize it with some empirical algorithm, allowing the user to define the expected model of a class through arguments. This will make it possible to use the algorithm as a Check that enforces code structure automatically at build time.
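
To make the optimization target concrete, here is a minimal sketch under stated assumptions. It measures distance as the difference between declaration positions of a caller and its callee, and it reorders methods with a simple depth-first heuristic. This is not the algorithm of the methods-distance tool. The class name, method names and the call-map representation are hypothetical, and the heuristic ignores the preference to keep overloaded and overridden methods together.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: measure and reduce the distance between callers and callees. */
final class MethodOrderSketch {

    /**
     * Sums |position of callee - position of caller| over all calls.
     * Assumes every method named in {@code calls} also appears in {@code order}.
     */
    static int totalDistance(List<String> order, Map<String, List<String>> calls) {
        int total = 0;
        for (Map.Entry<String, List<String>> entry : calls.entrySet()) {
            int callerIndex = order.indexOf(entry.getKey());
            for (String callee : entry.getValue()) {
                total += Math.abs(order.indexOf(callee) - callerIndex);
            }
        }
        return total;
    }

    /** Greedy depth-first reordering: each method is followed by the methods it calls. */
    static List<String> reorder(List<String> declared, Map<String, List<String>> calls) {
        Set<String> callees = new HashSet<>();
        calls.values().forEach(callees::addAll);
        Set<String> placed = new LinkedHashSet<>();
        for (String method : declared) {
            if (!callees.contains(method)) {
                place(method, calls, placed); // start from methods nobody in the class calls
            }
        }
        for (String method : declared) {
            place(method, calls, placed); // leftovers, e.g. mutually recursive methods
        }
        return new ArrayList<>(placed);
    }

    private static void place(String method, Map<String, List<String>> calls, Set<String> placed) {
        if (!placed.add(method)) {
            return; // already placed earlier
        }
        for (String callee : calls.getOrDefault(method, List.of())) {
            place(callee, calls, placed);
        }
    }

    public static void main(String[] args) {
        List<String> declared = List.of("helperB", "run", "helperA", "validate");
        Map<String, List<String>> calls = Map.of(
            "run", List.of("validate", "helperA"),
            "helperA", List.of("helperB"));
        List<String> better = reorder(declared, calls);
        System.out.println(declared + " -> " + totalDistance(declared, calls));
        System.out.println(better + " -> " + totalDistance(better, calls));
    }
}
```

For the sample class the total distance drops from 5 to 4. A greedy ordering like this is not guaranteed to be optimal, which is part of why the project calls for analysis of real code before an algorithm is chosen.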

Results of the project:

  • an article with all details of the analysis and the algorithm;
  • a new Checkstyle Check with the optimization algorithm, to share the algorithm with the whole Java community.

Proof of necessity: we have a number of PRs where contributors put new methods at arbitrary places in a class, while the better place is close to the first usage. Example #1, Example #2, Example #3, ....


Project Name: Reconcile formatters of Eclipse, NetBeans and IntelliJ IDEA IDEs by Checkstyle config.

Skills required: basic Java.

Project type: new feature implementation, analysis of existing IDE features.

Project goal: to make well-known quality practices publicly available.

Mentors: Roman Ivanov

Description:

Using different IDEs in the same team is already a serious problem, as different IDEs format code based on their own rules and configurations. Unwanted formatting changes creep into the code and complicate the code review process. The problem becomes more acute when a project uses a static analysis tool like Checkstyle that has a wide range of code formatting Checks.

It is required to make it possible to use the same Checkstyle config in IDEs without conflicts with the IDEs' internal formatters. This will let team members choose their IDE independently while keeping the same format and code style throughout the team.

The main focus of this project is the analysis of the formatting abilities of IDEs (indentation, import order, declaration order, separator/operator wrap, ...). Existing Checkstyle Rules need to be updated to work in a similar, non-conflicting way.

Results of the project:

  • create IDE configurations for the Checkstyle project so that the Checkstyle team can use them to auto-format code to conform with the checkstyle_check.xml file that is used by Continuous Integration.
  • create Checkstyle config that follows default Eclipse formatting + inspection rules
  • create Checkstyle config that follows default IntelliJ IDEA formatting + inspection rules
  • create Checkstyle config that follows default NetBeans formatting + inspection rules

Proof of necessity: mail-list post #1, mail-list post #2, mail-list post #3, discussion #1


Project Name: Open JDK Code convention coverage

Skills required: basic Java.

Project type: new feature implementation.

Project goal: to make well-known quality practices publicly available.

Mentors: Roman Ivanov, Ruslan Diachenko

Description:

OpenJdk Code Convention was one of the first guidelines on how to write Java code. It is marked as outdated (because of the date of its last update), but the best practices described there do not have an expiration date. The New OpenJDK Java Style Guidelines are close to the final version and will most likely be the successor of OpenJdk Code Convention. But a number of Apache projects still follow the OpenJdk rules, so both configurations are needed by the community.

OpenJdk Code Convention is already partly covered by Checkstyle, where it is known as Sun Code Convention. Many validation Rules have been added and changed in Checkstyle since Sun's configuration was created (in 2004).

During the project it is required to review both documents in detail and prove publicly that Checkstyle covers all guideline rules. Missing functionality needs to be created, and blocking bugs need to be fixed. The page OpenJdk Java Style Checkstyle Coverage needs to be updated. A new page, "New OpenJDK's Java Style Checkstyle Coverage", needs to be created. Both pages need to be formatted in the same way as Google's Java Style Checkstyle Coverage.

Proof of necessity: javadoc issues on github; results of open survey; request from users for Openjdk coverage support.


Project Name: Coverage of Documentation Comments Style Guide

Skills required: basic Java.

Project type: new feature implementation.

Project goal: to make well-known quality practices publicly available.

Mentors: Roman Ivanov

Description:

The project will mainly focus on automating Documentation Comments (javadoc) guidelines with Checkstyle Checks. Reliable comment parsing was a major improvement in Checkstyle during GSoC 2014; the results achieved then need to be reused to reliably automate Javadoc best practices.

A separate configuration file with the newly created Checks needs to be created. Documentation best practices do not make sense for all projects: Javadoc validation matters only for library projects that need to publish their documentation on the web.

The result of this project will be a configuration file with the maximum possible coverage of the Comment style guide. The report should look like Google's Java Style Checkstyle Coverage. If there is time left, we can focus on coverage of the guidelines from https://blog.joda.org/2012/11/javadoc-coding-standards.html

Proof of necessity: javadoc issues on github.


Project Name: Spellcheck of Identifiers by English dictionary

Skills required: intermediate Java.

Project type: new feature implementation.

Project goal: implement spell checking of all identifiers in Java code.

Mentors: Roman Ivanov, Andrei Paikin

Description:

Correct spelling of words in code is very important, since a typo in the name of a method that is part of an API could result in a serious problem. Mistakes in names also make reading code frustrating and misleading, especially when a one-letter typo makes a developer read the javadoc or even the implementation of the method. The two most popular IDEs (Eclipse and IntelliJ IDEA) already have spell-check ability. It will be beneficial for Checkstyle to have the same functionality, usable in any Continuous Integration system through the Command Line Interface or as part of a build tool (Maven, Ant, Gradle, ...), with a wide range of options to customize it to users' needs. Features of existing spell-checkers need to be analyzed: IntelliJ IDEA Spellchecking, Eclipse Spelling. There are a number of open-source projects that do spell-checking. It is OK to reuse them if the license is compatible. Examples: https://code.google.com/archive/p/bspell/ , http://www.softcorporation.com/products/spellcheck/ , ...
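
A first step for such a check would be splitting identifiers into words before looking them up. The sketch below shows one possible way to do that for camelCase, PascalCase and CONSTANT_CASE names. The class, its methods and the tiny in-memory dictionary are all hypothetical; a real check would plug in one of the dictionaries or libraries mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: split an identifier into words and look them up in a dictionary. */
final class IdentifierSpellingSketch {

    // Split on underscores and digit runs, at lower-to-upper transitions ("get|User"),
    // and before the last capital of an acronym ("HTTP|Response").
    private static final String WORD_BOUNDARY =
        "_+|\\d+|(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    /** Splits an identifier into lower-case words. */
    static List<String> splitIdentifier(String identifier) {
        List<String> words = new ArrayList<>();
        for (String part : identifier.split(WORD_BOUNDARY)) {
            if (!part.isEmpty()) {
                words.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Returns the words of the identifier that are missing from the dictionary. */
    static List<String> misspelledWords(String identifier, Set<String> dictionary) {
        List<String> misspelled = new ArrayList<>();
        for (String word : splitIdentifier(identifier)) {
            if (!dictionary.contains(word)) {
                misspelled.add(word);
            }
        }
        return misspelled;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("get", "user", "name"); // stand-in for a real word list
        System.out.println(misspelledWords("getUserNmae", dictionary)); // prints [nmae]
    }
}
```

Exact dictionary lookup is the simplest policy. Options such as user-defined word lists or a minimum word length would fit the "wide range of options" the description asks for, but they are left out of this sketch.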


Project Name: Automate verification of documentation for all modules and generation of web site content based on javadoc of modules

Skills required: intermediate Java

Project type: creation of new functionality.

Project goal: organize documentation and automate its maintenance

Mentors: Roman Ivanov

Description: Checkstyle is an active project. Our user base is always requesting that existing functionality be expanded and that brand new features be added. As these features are added to the core Checkstyle project, documentation must be updated so that users not involved in the request learn of their existence. Some changes can drastically alter the default behavior of a module. Documentation becomes extremely important to help users understand how our modules work and how they can be configured to fit each person's unique needs without looking at the source code.

Documentation is mostly a manual process, and it is easy to forget to update it during the fix workflow. Missing documentation can go unnoticed for years, as users can only go by the documentation to know what exists. Even when a gap is caught and someone tries to fill it, some contributors are not aware of our best practices for writing documentation.

We want to automate most of our documentation creation to avoid manual processes. Automation will ensure that all Checkstyle documentation follows a strict standard that we define. Besides ensuring that all configurable options are documented, it will help detect whether the current usage examples are sufficient or more are needed. It will ensure that the examples provided are valid, compilable if they are Java, and that they do or do not produce violations for the configuration and check being described. For any new module, it will print a template for the contributor to follow and fill in with the information specific to that check, such as descriptions.

As part of this project, students must ensure that all documentation verification passes for existing documentation and that xdoc/html content is generated automatically and does not need manual updates.


Project Name: Adaptation of Behavior Driven Development ideas for testing of source code validation algorithms

Skills required: basic Java

Project type: creation of new functionality.

Project goal: deep dive into testing approaches and achieve easy-to-maintain tests

Mentors: Roman Ivanov, Daniel Mühlbachler, Ruslan Diachenko

Description:

Checkstyle uses test-driven development. All code is covered by JUnit tests. We maintain 100% code coverage and 100% mutation coverage. The test code base is already 3 times bigger than the main code base, and its maintenance has become an issue. Right now, to verify the behavior of a module, a maintainer needs to read the expected violation line numbers in the JUnit file and match all of them against the Input file (the target of validation). Such jumping between files (or long scrolling in a web page) does not help attentive verification; as a result, the author and the reviewer can miss false positives or false negatives, so this approach is error-prone at scale.

The proposal is to focus on the BDD concept “use human-readable descriptions of software user requirements as the basis for software tests”: the Input file defines the configuration of the validation, and a trailing comment marks the location of each violation (example; unfortunately, such comments are not validated now). In this case the reviewer reads only a single file (no back-and-forth scrolling is required) and sees the place of the violation just as they would see it in an IDE while writing code. We already have a similar concept working in the tests of Google Style, where a misplaced hint comment results in a build failure (example: https://github.com/checkstyle/checkstyle/blob/master/src/it/resources/com/google/checkstyle/test/chapter5naming/rule526parameternames/InputLambdaParameterName.java )

This is an adaptation of Behavior Driven Development ideas to the testing of static code analysis algorithms using the JUnit test framework. Not all tests will be migrated to BDD style: there will still be many pure unit tests that test a specific method of a class to cover cases that are not reachable by testing with Input files. The student will see the benefits of both approaches and get to know the pros and cons of each, which will allow them to make good decisions about testing models in the future.

Final result: Create a test implementation that reads the config from the Input file and validates that every line that has a violation has a corresponding comment. Update the whole test code base to make this the only way of testing in Checkstyle. Migrate to Truth for verification. Upgrade our mutation coverage (pitest) to use the new mutators that have appeared in recent versions of the pitest library.
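
The core of the comment-driven verification can be sketched as follows. This is only an illustration under assumptions: the `// violation` marker text, the class and its methods are hypothetical, and reading the configuration from the Input file is left out. The sketch compares the lines marked in the Input file with the line numbers a Check actually reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: compare "// violation" comments with the lines a Check reported. */
final class InlineViolationSketch {

    /** Returns the 1-based numbers of the lines that carry a trailing "// violation" comment. */
    static Set<Integer> expectedViolationLines(List<String> inputLines) {
        Set<Integer> expected = new TreeSet<>();
        for (int i = 0; i < inputLines.size(); i++) {
            // The marker text is an assumption of this sketch.
            if (inputLines.get(i).matches(".*//\\s*violation\\b.*")) {
                expected.add(i + 1);
            }
        }
        return expected;
    }

    /** Lists mismatches between comments and reported lines; an empty list means the test passes. */
    static List<String> verify(List<String> inputLines, Set<Integer> reportedLines) {
        Set<Integer> expected = expectedViolationLines(inputLines);
        List<String> problems = new ArrayList<>();
        for (int line : expected) {
            if (!reportedLines.contains(line)) {
                problems.add("line " + line + ": expected violation was not reported");
            }
        }
        for (int line : new TreeSet<>(reportedLines)) {
            if (!expected.contains(line)) {
                problems.add("line " + line + ": reported violation has no comment");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "class Input {",
            "    int BadName; // violation",
            "    int good;",
            "}");
        System.out.println(verify(input, Set.of(2))); // comments and report agree
        System.out.println(verify(input, Set.of(3))); // one missed and one unexpected violation
    }
}
```

Failing on both kinds of mismatch is what makes a misplaced comment break the build, matching the behavior described for the Google Style tests.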
