
Checkstyle GSoC 2017 Project Ideas

Roman Ivanov edited this page Apr 20, 2024 · 12 revisions

5 projects selected for GSoC 2017:



Project Name: Multi-thread mode for Java files processing

Skills required: intermediate Java, knowledge of Checkstyle code base.

Project type: new feature implementation.

Project goal: to improve performance of validation by introducing special multi-threading mode for Java file processing.

Mentors: Roman Ivanov, Ilja Dubinin, Richard Veach, Vladislav Lisetskii

Description:

Validation of source code in big projects can take a lot of time, which can force users to stop running Checkstyle in on-commit builds or in IDEs, where it can slow down the IDE significantly. Fixing validation issues right before a release or once a week is not a good idea, as it can result in major refactoring and take significantly more time than fixing them while writing the code. Validation in compliance with the Google Java Style Guide on the whole Guava project (~1700 files) takes about 5 min; validation on the openjdk sources (~16400 files) takes 2 hours 30 min. Improving validation performance will make Checkstyle a more desirable tool for use in IDEs and in continuous integration systems (validation of each commit).

Tasks to be done during project:

  • analyze the whole set of Rules, find those that already comply with multi-thread requirements, mark them with a special annotation, and run them in multi-thread mode;
  • use standard Java threads or use lightweight threads (Quasar, Akka, ... ) to process Java files;
  • provide detailed report of performance improvement and recommendations on how to design Rule implementation to be multi-thread compliant;
  • provide report on how performance can improve after rewriting Rule to multi-thread compliant algorithm;
  • extensive testing on a variety of open-source projects.
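
The first two tasks above can be sketched as follows. This is a minimal illustration, not Checkstyle's actual design: the `ThreadSafeRule` annotation, the `Rule` interface, and `TabCharacterRule` are hypothetical names invented for this example, and only standard `java.util.concurrent` threads are used. Each file is passed as a `{name, content}` pair.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRunner {
    /** True only for Rules explicitly marked as safe to run concurrently. */
    static boolean isThreadSafe(Rule rule) {
        return rule.getClass().isAnnotationPresent(ThreadSafeRule.class);
    }

    /** Checks every {name, content} pair on a thread pool; results keep the input order. */
    static List<String> run(Rule rule, List<String[]> files, int threads) {
        // Rules without the marker annotation stay single-threaded.
        int poolSize = isThreadSafe(rule) ? threads : 1;
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String[] file : files) {
                futures.add(pool.submit(() -> rule.check(file[0], file[1])));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                all.addAll(future.get());
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while checking files", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a Rule failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String[]> files = List.of(
            new String[] {"A.java", "class A {\n\tint x;\n}"},
            new String[] {"B.java", "class B { }"});
        run(new TabCharacterRule(), files, 2).forEach(System.out::println);
    }
}

/** Hypothetical marker: the Rule keeps no mutable state shared between files. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ThreadSafeRule { }

/** Hypothetical, simplified Rule contract: returns violation messages for one file. */
interface Rule {
    List<String> check(String fileName, String content);
}

/** Stateless example Rule, so it can carry the marker annotation. */
@ThreadSafeRule
class TabCharacterRule implements Rule {
    @Override
    public List<String> check(String fileName, String content) {
        List<String> violations = new ArrayList<>();
        String[] lines = content.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].indexOf('\t') >= 0) {
                violations.add(fileName + ":" + (i + 1) + ": tab character");
            }
        }
        return violations;
    }
}
```

Collecting the futures in submission order keeps the report deterministic even though files finish in arbitrary order, which makes results comparable with a single-threaded run.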

Proof of necessity: issue138; issue600; results of open survey; thoughts of maintainers.


Project Name: Flexible Suppression model

Skills required: intermediate Java.

Project type: new feature implementation.

Project goal: to ease user experience in introducing Checkstyle to a project.

Mentors: Roman Ivanov, Richard Veach, Andrei Selkin, Vladislav Lisetskii

Description:

To start using Checkstyle in big projects is a big challenge:

  • code clean-up (that will be a result of Checkstyle validation) is usually out of project development plan;
  • developers are overloaded with functional issues that cause problems to users, so functional problems are always a priority;
  • all Checkstyle fixes have to be applied gradually and without postponing releases, but the current Suppression model does not make this easy.

The only thing that prevents users from using Checkstyle is that it requires huge and unavoidable code refactoring at the beginning.

Checkstyle needs to give users the ability to enforce Rules on newly created code lines and suppress violations in old/legacy code until engineers are ready for clean-up. We need to invent a new Suppression model that is flexible to code changes. This will allow users to keep legacy code and new code in the same file without having to update the suppression configuration on every code change. Unlike suppression that is based on line numbers in a file, the new model should be based on [Abstract Syntax Tree (AST)](http://en.wikipedia.org/wiki/Abstract_syntax_tree) structure.

The scope of the project also includes a Suppression file generator that eases creation of the suppression configuration for legacy code.
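
To illustrate why an AST-based suppression survives code changes where a line-based one does not, here is a minimal sketch. `AstNode` and `AstSuppressions` are hypothetical classes made up for this example (they are not Checkstyle's `DetailAST` or any existing filter), and the path format `KIND[name]/KIND[name]` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified AST node; not Checkstyle's DetailAST. */
class AstNode {
    final String kind;   // e.g. "CLASS_DEF", "METHOD_DEF"
    final String name;   // identifier of the declaration
    final int line;      // stored only to show that matching does not use it
    final List<AstNode> children = new ArrayList<>();
    AstNode parent;

    AstNode(String kind, String name, int line) {
        this.kind = kind;
        this.name = name;
        this.line = line;
    }

    AstNode add(AstNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    /** Structural address such as "CLASS_DEF[Legacy]/METHOD_DEF[oldMethod]". */
    String path() {
        String self = kind + "[" + name + "]";
        return parent == null ? self : parent.path() + "/" + self;
    }
}

class AstSuppressions {
    private final Set<String> suppressedPaths;

    AstSuppressions(Set<String> suppressedPaths) {
        this.suppressedPaths = suppressedPaths;
    }

    /** A violation is suppressed if its node or any enclosing node is listed. */
    boolean isSuppressed(AstNode node) {
        for (AstNode current = node; current != null; current = current.parent) {
            if (suppressedPaths.contains(current.path())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AstNode legacy = new AstNode("CLASS_DEF", "Legacy", 1);
        AstNode newMethod = new AstNode("METHOD_DEF", "newMethod", 3);
        // oldMethod moved from line 3 to line 25 when newMethod was inserted above it.
        AstNode oldMethod = new AstNode("METHOD_DEF", "oldMethod", 25);
        legacy.add(newMethod).add(oldMethod);

        AstSuppressions suppressions =
            new AstSuppressions(Set.of("CLASS_DEF[Legacy]/METHOD_DEF[oldMethod]"));
        System.out.println("oldMethod suppressed: " + suppressions.isSuppressed(oldMethod));
        System.out.println("newMethod suppressed: " + suppressions.isSuppressed(newMethod));
    }
}
```

The sketch deliberately ignores overloaded methods, which would share one path; a real model would need parameter types or a similar discriminator in the address.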

Proof of necessity: discussion with Google Guava team; discussion with openjdk team; results of open survey.


Project Name: Upgrade Java Grammar from ANTLR2 to ANTLR4

Skills required: basic Java and experience with syntax analysis.

Project type: new feature implementation.

Project goal: to update core library to the latest version in order to simplify Java grammar support.

Mentors: Roman Ivanov, Ilja Dubinin, Richard Veach, Vladislav Lisetskii

Description: Checkstyle needs a new Java grammar based on ANTLR4. This task is very difficult, but it is critical for Checkstyle because the ANTLR2 library has not been supported since 2006 and is far less efficient. The old version has a number of syntax analysis limitations that have already been resolved in ANTLR4. Our team is already experiencing difficulties supporting the current grammar, as it is too complicated due to the limited parsing abilities of ANTLR2.

New features of ANTLR4 that we need:

  • ANTLR4 supports direct left recursion, which will simplify the grammar significantly. We already have a lot of warnings about non-deterministic behaviour that cannot be resolved in ANTLR2, example.
  • ANTLR4 has a number of UI tools that help users debug a grammar and see how the parser will work: IDE plugins, Parse Tree Inspector UI application from the ANTLR distribution package.

Proof of necessity: results of open survey; example of annoying warnings during the build.


Project Name: Regression Testing Tool and HTML Report Generator for Pull Request

Skills required: basic Java, or Shell, or Groovy, or Scala; basic understanding of testing principles.

Project type: creation of testing tools.

Project goal: to enforce quality and ease the introduction of new Rules into the project.

Mentors: Roman Ivanov, Richard Veach, Andrei Selkin, Vladislav Lisetskii

Description:

Checkstyle needs a tool that does regression testing based on a proposed patch (PullRequest). The tool needs to ensure that fixing an issue does not introduce new problems or unexpected behaviour. It has to parse git changes and find the changed modules. Based on the list of changed modules, it should generate testing configurations (Checkstyle configuration files) and run them with binaries built from the code before the change and with the proposed change. A diff report (differences in violations) needs to be generated and be ready to share on the web.

Each module could have a set of manually prepared configuration chunks that should be used if that module was changed. But full automation is highly desirable.
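
The diff step described above can be sketched in a few lines. This is only an illustration under stated assumptions: violations are represented as plain `file:line: message` strings, and the `ViolationDiff` class and its `-`/`+` report format are hypothetical, not the output of any existing Checkstyle tool.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ViolationDiff {
    /** Violations present in 'from' but absent from 'other', in original order. */
    static Set<String> minus(List<String> from, List<String> other) {
        Set<String> result = new LinkedHashSet<>(from);
        result.removeAll(other);
        return result;
    }

    /** Plain-text report: "-" means only before the patch, "+" means only after it. */
    static String report(List<String> base, List<String> patch) {
        StringBuilder sb = new StringBuilder();
        for (String violation : minus(base, patch)) {
            sb.append("- ").append(violation).append('\n');
        }
        for (String violation : minus(patch, base)) {
            sb.append("+ ").append(violation).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> base = List.of(
            "A.java:3: Missing Javadoc", "A.java:9: Line too long");
        List<String> patch = List.of(
            "A.java:9: Line too long", "B.java:1: Unused import");
        System.out.print(report(base, patch));
    }
}
```

An empty report means the patch did not change behaviour on the tested sources; every `+` or `-` line is something the reviewer has to confirm as intended.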

Proof of necessity: issues on github requesting new validations; mail-list thread that describes the reason for the temporary moratorium on new Rules in Checkstyle; official sandbox project with about 40 additional Rules; validation ideas that would be good to borrow from Groovy experience; other custom Rules for Checkstyle: 1, 2, 3, 3, 4, 5, 6, ...; results of open survey; wiki page with ideas; link to issues that were created based on discussions in the team.


Project Name: Practice What You Preach

Skills required: intermediate Java.

Project type: infrastructure update, code refactoring.

Project goal: to ensure quality and prove that following style guides and most best practices is possible and beneficial.

Mentors: Roman Ivanov, Andrei Selkin, Vladislav Lisetskii

Description: The main principle for ensuring quality is to use your own tool. We already do this for the main part of our source code. But our project is not the only tool that enforces quality of source code. We need to reuse well-known static analysis tools to check our main and test code automatically for each PullRequest. Tools: PMD, Sonar, Eclipse, IntelliJ IDEA Inspection, [Huntbugs](https://github.com/checkstyle/checkstyle/issues/3556), Pitest on Checks, Pitest on whole code.

All minor issues need to be caught and reported automatically to the author of the changes. This will minimize the code reviewer's time and let the author and the reviewer focus on design and performance improvements. It will also speed up patch acceptance, as the first part of code review will be done automatically by such tools.

Within this project we need to enable all rules (as many as is reasonable) and resolve all validation problems that these tools find in both the main and test parts of Checkstyle code. The Checkstyle build (PullRequest validation) needs to be updated to fail if any violation is found.

Proof of necessity: example of proposed changes to a Rule that were rejected at the moment of applying the new behavior to Checkstyle code, issue on github, pull request with long code-review; mail-list that shows how many code review stages we do before accepting new code, see number of posts; results of open survey.


Project Name: Coverage of Documentation Comments Style Guide and performance optimization of Javadoc parser

Skills required: basic Java.

Project type: new feature implementation.

Project goal: to make well-known quality practices publicly available.

Mentors: Roman Ivanov, Vladislav Lisetskii

Description:

The project will mainly focus on automating Documentation Comments (javadoc) guidelines with Checkstyle Checks. Reliable comment parsing was a major improvement in Checkstyle during GSoC 2014; the results achieved then need to be reused to reliably automate Javadoc best practices.

A separate configuration file with the newly created Checks needs to be created. Documentation best practices do not make sense for all projects. Javadoc validation matters only for library projects that need to expose online documentation publicly on the web.

The result of this project will be a configuration file with the maximum possible coverage of the Comment style guide. The report should look like Google's Java Style Checkstyle Coverage.

The second part of the project is performance optimization of javadoc parsing, to make javadoc validation as desirable as java code validation for each change. Right now activation of any Javadoc Check significantly increases the overall validation time. For example, on the Guava project validation becomes 150 times slower.

Proof of necessity: javadoc issues on github; performance issue; results of open survey.


Project Name: Optimization of distance between methods in single Java class

Skills required: basic Java, good analytical abilities, good background in mathematics.

Project type: new feature implementation.

Project goal: to make quality practices automated and publicly available.

Mentors: Roman Ivanov, Andrei Selkin, Vladislav Lisetskii

Description:

This task is an ambitious attempt to improve code readability by minimizing the jumps and scrolling a user needs in a source file to see a method's implementation details from the place where the method is first used.

It is required to analyse a lot of code and find a model that minimizes the distance between a method's first usage and its declaration in the same file while respecting users' preferences to keep overloaded and overridden methods grouped together. Some other preferences may appear during investigation of open-source projects.

The first step is already done by our team: we created a web service, methods-distance, that calculates distances between methods and builds a DSM matrix to ease analysis. We already use it in our project.

The second step is to take the matrix of distances between methods and optimize it with an empirical algorithm whose arguments let the user define the expected model of a class. This will allow the algorithm to be used as a Check that enforces code structure automatically at build time.
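
As a rough illustration of the quantity to be minimized, the sketch below scores a declaration order by summing, for every called method, the number of positions between its first caller and its own declaration. The `MethodDistance` class, the position-based metric, and the sample method names are assumptions made for this example; the methods-distance service may measure distance differently.

```java
import java.util.List;
import java.util.Map;

class MethodDistance {
    /**
     * Sum, over all called methods, of the number of positions between the
     * first caller (in declaration order) and the callee's declaration.
     * Recursive self-calls are ignored.
     */
    static int totalDistance(List<String> order, Map<String, List<String>> calls) {
        int total = 0;
        for (int callee = 0; callee < order.size(); callee++) {
            for (int caller = 0; caller < order.size(); caller++) {
                List<String> callees = calls.getOrDefault(order.get(caller), List.of());
                if (caller != callee && callees.contains(order.get(callee))) {
                    total += Math.abs(callee - caller);
                    break; // only the first usage counts
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical class: run() calls parse() and report(); parse() calls readLine().
        Map<String, List<String>> calls = Map.of(
            "run", List.of("parse", "report"),
            "parse", List.of("readLine"));
        System.out.println(totalDistance(List.of("run", "readLine", "report", "parse"), calls));
        System.out.println(totalDistance(List.of("run", "parse", "readLine", "report"), calls));
    }
}
```

With these inputs the first order scores 7 and the second scores 5, so an optimizer would prefer the second. A real Check would also have to weigh the grouping preferences for overloaded and overridden methods mentioned above.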

Results of the project:

  • article with all details of analysis and algorithm details;
  • new Checkstyle Check with the optimization algorithm, to share the algorithm with the whole Java community.

Proof of necessity: we have a number of PRs where contributors put new methods at arbitrary places in a class, although the better place is close to the first usage. Example #1, Example #2, Example #3, ....


Project Name: Reconcile formatters of Eclipse, NetBeans and IntelliJ IDEA IDEs by Checkstyle config.

Skills required: basic Java.

Project type: new feature implementation, analysis of existing IDE features.

Project goal: to make well-known quality practices publicly available.

Mentors: Roman Ivanov, Vladislav Lisetskii

Description:

Usage of different IDEs in the same team is already a serious problem, as different IDEs format code based on their own rules and configurations. Unwanted formatting changes creep into the code and complicate the code-review process. The problem becomes more acute when a project uses a static analysis tool like Checkstyle that has a wide range of code formatting Checks.

It must be possible to use the same Checkstyle config in IDEs without conflicts with the IDEs' internal formatters. This will let team members choose their IDE independently while keeping the same format and code style throughout the team.

The main focus of this project is the analysis of the formatting abilities of IDEs (indentation, imports order, declaration order, separator/operator wrap, ...). Existing Checkstyle Rules need to be updated to work in a similar and non-conflicting way.

Results of the project:

  • create configuration for IDEs for the Checkstyle project to let the Checkstyle team auto-format code to conform with the checkstyle_check.xml file that is used by Continuous Integration;
  • create Checkstyle config that follows default Eclipse formatting + inspection rules;
  • create Checkstyle config that follows default IntelliJ IDEA formatting + inspection rules;
  • create Checkstyle config that follows default NetBeans formatting + inspection rules.

Proof of necessity: mail-list post #1, mail-list post #2, mail-list post #3, discussion #1


Project Name: Sun and Open JDK Code convention coverage

Skills required: basic Java.

Project type: new feature implementation.

Project goal: to make well-known quality practices publicly available.

Mentors: Roman Ivanov, Richard Veach, Vladislav Lisetskii

Description:

Sun Code Convention was one of the first guidelines on how to write Java code. It is marked as outdated (because of the date of its last update), but the best practices described there do not have an expiration date. [OpenJDK Java Style Guidelines](http://cr.openjdk.java.net/~alundblad/styleguide/) is close to its final version and will most likely be the successor of Sun Code Convention. But a number of Apache projects still follow Sun rules, so both configurations are needed by the community.

Sun Code Convention is already partly covered by Checkstyle. A lot of validation Rules have been added and changed in Checkstyle since Sun's configuration was created (in 2004).

During the project it is required to review both documents in detail and prove publicly that Checkstyle covers all guideline rules. Missing functionality needs to be created, and blocking bugs need to be fixed. The page Sun's Java Style Checkstyle Coverage needs to be updated. A new page "OpenJDK's Java Style Checkstyle Coverage" needs to be created. Both pages need to be formatted in the same way as Google's Java Style Checkstyle Coverage.

Proof of necessity: javadoc issues on github; results of open survey.


Project Name: Spellcheck of Identifiers by English dictionary

Skills required: intermediate Java.

Project type: new feature implementation.

Project goal: implement spell checking of all identifiers in Java code.

Mentors: Roman Ivanov, Ilja Dubinin, Vladislav Lisetskii

Description:

Correct spelling of words in code is very important, since a typo in the name of a method that is part of an API could result in a serious problem. Mistakes in names also make reading code frustrating and misleading, especially when a one-letter typo forces a developer to read the javadoc or even the implementation of the method. The two most popular IDEs (Eclipse and IntelliJ IDEA) already have spell-check ability. It will be beneficial for Checkstyle to have the same functionality, usable in any Continuous Integration system through the Command Line Interface or as part of a build tool (maven, ant, gradle, ...), with a wide range of options to customize it to users' needs. Features of existing spell-checkers need to be analysed: IntelliJ IDEA Spellchecking, [Eclipse Spelling](http://help.eclipse.org/mars/index.jsp?topic=%2Forg.eclipse.platform.doc.user%2Freference%2Fref-36.htm). There are a number of open-source projects that do spell-checking. It is fine to reuse them if the licence is compatible. Examples: https://code.google.com/archive/p/bspell/ , http://www.softcorporation.com/products/spellcheck/, ...
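
A spell-check for identifiers has to split each name into words before consulting a dictionary. The sketch below shows one possible way to do this; the `IdentifierSpellCheck` class, the splitting rules, and the tiny in-memory dictionary are assumptions for illustration and do not reflect how the IDEs or the libraries listed above work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class IdentifierSpellCheck {
    /** Splits camelCase, PascalCase and UPPER_SNAKE_CASE identifiers into lower-case words. */
    static List<String> words(String identifier) {
        List<String> words = new ArrayList<>();
        // Break between a lower-case and an upper-case letter ("getUser"),
        // before the last capital of an acronym ("XMLParser" gives "XML", "Parser"),
        // and on runs of underscores or digits.
        String boundary = "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|[_0-9]+";
        for (String part : identifier.split(boundary)) {
            if (!part.isEmpty()) {
                words.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Returns the words of the identifier that are not in the dictionary. */
    static List<String> misspelled(String identifier, Set<String> dictionary) {
        List<String> unknown = new ArrayList<>();
        for (String word : words(identifier)) {
            if (!dictionary.contains(word)) {
                unknown.add(word);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("get", "user", "name", "max", "line", "length");
        System.out.println(misspelled("getUserNmae", dictionary));
        System.out.println(misspelled("MAX_LINE_LENGTH", dictionary));
    }
}
```

A real Check would additionally need project-specific dictionaries for abbreviations and domain terms, which is where the customization options mentioned above would come in.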


Project Name: Generation of web site content for all Checkstyle Modules from Javadoc

Skills required: basic Java.

Project type: new feature implementation.

Project goal: simplification of Checkstyle development process.

Mentors: Roman Ivanov, Richard Veach, Andrei Selkin, Vladislav Lisetskii

Description:

There are two places in the Checkstyle code base where developers put documentation: Javadoc and Xdoc-maven files. One example is ConstantNameCheck: Javadoc in source, Xdoc source. Site generation produces generated html javadoc and a generated html page. Keeping these two files in sync is the responsibility of either the author of the module or the author of the change.

Right now that process is manual, time-consuming and error-prone. This leads to mistakes in documentation, so users are seriously affected by missing details of Checkstyle's Checks implementation. The problem becomes even more acute when functionality is changed several times during patch acceptance (code-review) and testing, but the documentation in Xdoc files is left unchanged.

It is required to create:

  • a tool that generates xdoc files from the javadoc comments of modules;
  • the tool needs to become part of the Checkstyle build system to generate site content during the "site" phase of the build.
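
The core of such a tool can be sketched as a function from Java source to an xdoc fragment. Everything here is an assumption for illustration: the `XdocSketch` class, the regular-expression approach, the `SampleNameCheck` input, and the simplified `section`/`subsection` layout. The sketch handles plain-text class-level Javadoc only; HTML tags and inline tags such as `{@code}` are not translated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XdocSketch {
    // Javadoc block immediately followed by a (possibly public) class declaration.
    private static final Pattern CLASS_JAVADOC = Pattern.compile(
        "/\\*\\*(.*?)\\*/\\s*(?:public\\s+)?class\\s+(\\w+)", Pattern.DOTALL);

    /** Strips the leading " * " decoration from every Javadoc line and joins the lines. */
    static String plainText(String rawJavadoc) {
        StringBuilder text = new StringBuilder();
        for (String line : rawJavadoc.split("\n")) {
            String cleaned = line.replaceFirst("^\\s*\\*?\\s?", "").trim();
            if (!cleaned.isEmpty()) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(cleaned);
            }
        }
        return text.toString();
    }

    /** Builds a simplified xdoc-like section from the class-level Javadoc. */
    static String toXdoc(String javaSource) {
        Matcher matcher = CLASS_JAVADOC.matcher(javaSource);
        if (!matcher.find()) {
            throw new IllegalArgumentException("no class-level Javadoc found");
        }
        // Module name is the class name without the "Check" suffix.
        String module = matcher.group(2).replaceFirst("Check$", "");
        return "<section name=\"" + module + "\">\n"
            + "  <subsection name=\"Description\">\n"
            + "    <p>" + plainText(matcher.group(1)) + "</p>\n"
            + "  </subsection>\n"
            + "</section>\n";
    }

    public static void main(String[] args) {
        String source = "/**\n * Checks that names follow a sample pattern.\n */\n"
            + "public class SampleNameCheck {\n}\n";
        System.out.print(toXdoc(source));
    }
}
```

A production tool would more likely read Javadoc through a real parser than through a regular expression, so that properties, examples, and inline tags can be mapped to their own xdoc subsections.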

Proof of necessity: discussion #1: in every commit the author needs to update the javadoc and then make the same update in xdoc manually; discussion #2 with a real mock of how it should be done.
