
Speedup our eight slowest pytests (one at a time please) #9718

Closed
7 of 8 tasks
cclauss opened this issue Oct 4, 2023 · 19 comments · Fixed by #11008
Labels: enhancement (This PR modified some existing files), help wanted

Comments

@cclauss
Member

cclauss commented Oct 4, 2023

Feature description

At the end of our GitHub Actions build jobs, there is a list of the slowest tests.

Are there ways to speed up these tests without reducing our functionality or our code coverage?

Please only fix one algorithm per pull request.

  • 16.64s call neural_network/simple_neural_network.py::neural_network.simple_neural_network.forward_propagation
  • 6.15s call backtracking/word_search.py::backtracking.word_search.word_exists
  • 4.61s call digital_image_processing/test_digital_image_processing.py::test_local_binary_pattern
  • 2.59s call backtracking/power_sum.py::backtracking.power_sum.backtrack
  • 2.19s call machine_learning/xgboost_regressor.py::machine_learning.xgboost_regressor.main
  • 1.73s call maths/prime_numbers.py::maths.prime_numbers.slow_primes
  • 1.43s call backtracking/power_sum.py::backtracking.power_sum.solve
  • 1.00s call other/dijkstra_bankers_algorithm.py::other.dijkstra_bankers_algorithm.BankersAlgorithm.main
    ================= 1506 passed, 25 warnings in 71.23s (0:01:11) =================

Also, those 25 pytest warnings are worth fixing!!!

cclauss added the enhancement label on Oct 4, 2023
@tianyizheng02
Contributor

tianyizheng02 commented Oct 5, 2023

I pointed out in #8785 that neural_network/simple_neural_network.py has multiple issues (I don't think it even implements a neural network) and could do with a rewrite. Ideally whatever new implementation replaces it should run a lot more quickly.

@Muhammadummerr
Contributor

I checked maths/prime_numbers.py. The file contains three methods for finding prime numbers using different techniques. The simplest one, slow_primes (which loops from 2 to n and yields the primes), takes a lot of time. Since the other two methods are faster variants with only minor modifications, do we still need slow_primes at all?

@cclauss
Member Author

cclauss commented Oct 5, 2023

Can slow_primes() be tested with smaller numbers? We do not want to get rid of it, but we want its test to finish faster.

>>> list(slow_primes(10000))[-1]
9973

Let's lower the 10_000 to 1_000
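For reference, here is a minimal trial-division sketch of the kind of slow_primes generator under discussion (the repository's actual implementation may differ):

```python
def slow_primes(max_n: int):
    """Trial-division prime generator -- the deliberately slow baseline.

    >>> list(slow_primes(1_000))[-1]
    997
    """
    for n in range(2, max_n + 1):
        # Check every candidate divisor below n -- O(n) per number.
        if all(n % d != 0 for d in range(2, n)):
            yield n
```

Lowering the doctest bound from 10_000 to 1_000 cuts the work roughly quadratically, since both the number of candidates and the divisors checked per candidate shrink.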

@Muhammadummerr
Contributor

If slow_primes() is tested on smaller inputs, then the other two methods should also be tested on the same smaller inputs so that the timings remain comparable.

@Muhammadummerr
Contributor

If I reduce the test cases, do I need to open a PR to check whether they finish faster, or is there some way to verify that the issue is resolved before submitting a PR?

@cclauss
Member Author

cclauss commented Oct 5, 2023

When things are run on CI platforms like GitHub Actions, an environment variable named CI is defined.

Can you use https://docs.python.org/3/library/os.html#os.getenv to see if CI is defined? If CI is defined, set the number to 1_000 for all three tests; if it is not defined, set it to 10_000 for all three.
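A sketch of that idea (doctest_bound is a hypothetical helper name, not code from the repository):

```python
import os


def doctest_bound(env=os.environ) -> int:
    """Use a small bound on CI so builds stay fast, the full bound locally.

    GitHub Actions (and most CI services) define the CI environment variable.
    """
    return 1_000 if env.get("CI") else 10_000
```

Passing the environment mapping as a parameter keeps the helper easy to test for both branches without touching the real environment.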

@Muhammadummerr
Contributor

Okay, thanks. Let me try.

@Muhammadummerr
Contributor

Muhammadummerr commented Oct 5, 2023

But if it is not defined, setting the test case to 10_000 will take the same amount of time as before, won't it?

@cclauss
Member Author

cclauss commented Oct 5, 2023

Yes, but we do not care because it no longer slows down our build process.

@Muhammadummerr
Contributor

I sped it up in #9851.

@Muhammadummerr
Contributor

Muhammadummerr commented Oct 5, 2023

I was reviewing the backtracking/power_sum.py code, and I noticed that the main cause of the slow execution is the large value of the needed_sum parameter. To improve performance, is it correct to reduce the needed_sum value?
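For context, here is a small self-contained sketch of the kind of backtracking power-sum count being discussed (count_power_sums is a hypothetical helper; the repository's backtrack/solve functions may be structured differently):

```python
def count_power_sums(needed_sum: int, power: int, start: int = 1) -> int:
    """Count the ways to write needed_sum as a sum of `power`-th powers
    of distinct natural numbers (start, start + 1, ...)."""
    if needed_sum == 0:
        return 1
    term = start**power
    if term > needed_sum:
        # Terms only grow, so no further base can contribute.
        return 0
    # Either include start**power in the sum, or skip to the next base.
    return count_power_sums(needed_sum - term, power, start + 1) + count_power_sums(
        needed_sum, power, start + 1
    )
```

Shrinking needed_sum shrinks the recursion tree roughly exponentially, which is why smaller doctest inputs finish so much faster.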

@cclauss
Member Author

cclauss commented Oct 5, 2023

@duongoku Would you be willing to advise @Muhammadummerr on the best way to speed up the slow doctests in backtracking/power_sum.py ?

@Muhammadummerr
Contributor

I would love to hear from @duongoku.

@quant12345
Contributor

quant12345 commented Oct 6, 2023

@cclauss I tried measuring the execution time of each doctest case in backtracking/word_search.py.

Code (each case is called 10 times):
"""
Author  : Alexander Pantyukhin
Date    : November 24, 2022

Task:
Given an m x n grid of characters board and a string word,
return true if word exists in the grid.

The word can be constructed from letters of sequentially adjacent cells,
where adjacent cells are horizontally or vertically neighboring.
The same letter cell may not be used more than once.

Example:

Matrix:
---------
|A|B|C|E|
|S|F|C|S|
|A|D|E|E|
---------

Word:
"ABCCED"

Result:
True

Implementation notes: Use backtracking approach.
At each point, check all neighbors to try to find the next letter of the word.

leetcode: https://leetcode.com/problems/word-search/

"""


def get_point_key(len_board: int, len_board_column: int, row: int, column: int) -> int:
    """
    Returns the hash key of matrix indexes.

    >>> get_point_key(10, 20, 1, 0)
    200
    """

    return len_board * len_board_column * row + column


def exits_word(
    board: list[list[str]],
    word: str,
    row: int,
    column: int,
    word_index: int,
    visited_points_set: set[int],
) -> bool:
    """
    Return True if it's possible to search the word suffix
    starting from the word_index.

    >>> exits_word([["A"]], "B", 0, 0, 0, set())
    False
    """

    if board[row][column] != word[word_index]:
        return False

    if word_index == len(word) - 1:
        return True

    traverts_directions = [(0, 1), (0, -1), (-1, 0), (1, 0)]
    len_board = len(board)
    len_board_column = len(board[0])
    for direction in traverts_directions:
        next_i = row + direction[0]
        next_j = column + direction[1]
        if not (0 <= next_i < len_board and 0 <= next_j < len_board_column):
            continue

        key = get_point_key(len_board, len_board_column, next_i, next_j)
        if key in visited_points_set:
            continue

        visited_points_set.add(key)
        if exits_word(board, word, next_i, next_j, word_index + 1, visited_points_set):
            return True

        visited_points_set.remove(key)

    return False


def word_exists(board: list[list[str]], word: str) -> bool:
    """
    >>> word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "ABCCED")
    True
    >>> word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "SEE")
    True
    >>> word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "ABCB")
    False
    >>> word_exists([["A"]], "A")
    True
    >>> word_exists([["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","B"],
    ...              ["A","A","A","A","B","A"]],
    ...             "AAAAAAAAAAAAABB")
    False
    >>> word_exists([["A"]], 123)
    Traceback (most recent call last):
        ...
    ValueError: The word parameter should be a string of length greater than 0.
    >>> word_exists([["A"]], "")
    Traceback (most recent call last):
        ...
    ValueError: The word parameter should be a string of length greater than 0.
    >>> word_exists([[]], "AB")
    Traceback (most recent call last):
        ...
    ValueError: The board should be a non empty matrix of single chars strings.
    >>> word_exists([], "AB")
    Traceback (most recent call last):
        ...
    ValueError: The board should be a non empty matrix of single chars strings.
    >>> word_exists([["A"], [21]], "AB")
    Traceback (most recent call last):
        ...
    ValueError: The board should be a non empty matrix of single chars strings.
    """

    # Validate board
    board_error_message = (
        "The board should be a non empty matrix of single chars strings."
    )

    len_board = len(board)
    if not isinstance(board, list) or len(board) == 0:
        raise ValueError(board_error_message)

    for row in board:
        if not isinstance(row, list) or len(row) == 0:
            raise ValueError(board_error_message)

        for item in row:
            if not isinstance(item, str) or len(item) != 1:
                raise ValueError(board_error_message)

    # Validate word
    if not isinstance(word, str) or len(word) == 0:
        raise ValueError(
            "The word parameter should be a string of length greater than 0."
        )

    len_board_column = len(board[0])
    for i in range(len_board):
        for j in range(len_board_column):
            if exits_word(
                board, word, i, j, 0, {get_point_key(len_board, len_board_column, i, j)}
            ):
                return True

    return False


if __name__ == "__main__":
    import doctest

    doctest.testmod()

    import timeit

    n = 10

    ABCCED = timeit.timeit(
        'word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], '
        '"ABCCED")',
        globals=globals(), number=n)

    SEE = timeit.timeit(
        'word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], '
        '"SEE")',
        globals=globals(), number=n)

    ABCB = timeit.timeit(
        'word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], '
        '"ABCB")',
        globals=globals(), number=n)

    A = timeit.timeit('word_exists([["A"]], "A")', globals=globals(), number=n)

    AAAAAAAAAAAAABB = timeit.timeit(
        'word_exists([["A","A","A","A","A","A"], ["A","A","A","A","A","A"], '
        '["A","A","A","A","A","A"], ["A","A","A","A","A","A"], '
        '["A","A","A","A","A","B"], ["A","A","A","A","B","A"]], '
        '"AAAAAAAAAAAAABB")',
        globals=globals(), number=n)

    print(f"{ABCCED} - ABCCED\n{SEE} - SEE\n{ABCB} - ABCB\n{A} - A\n"
          f"{AAAAAAAAAAAAABB} - AAAAAAAAAAAAABB")

As expected, 'AAAAAAAAAAAAABB' takes by far the longest; it is orders of magnitude slower than the other cases.

Output:

0.00034418300128891133 - ABCCED
0.0004890400014119223 - SEE
0.0004235089982103091 - ABCB
4.959400030202232e-05 - A
29.599618363001355 - AAAAAAAAAAAAABB
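On a board of repeated letters the backtracker explores almost every simple path before giving up, which is why this case dominates. As a rough illustration (count_paths is a hypothetical helper, not repository code), one can count the simple paths of a given length on an n x n grid, a proxy for the states the word-search backtracker can visit when every cell matches the next letter:

```python
def count_paths(n: int, length: int) -> int:
    """Count the simple paths of `length` cells on an n x n grid."""

    def dfs(row: int, col: int, remaining: int, seen: set) -> int:
        if remaining == 0:
            return 1
        total = 0
        # Try all four orthogonal neighbors, never revisiting a cell.
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = row + dr, col + dc
            if 0 <= nr < n and 0 <= nc < n and (nr, nc) not in seen:
                seen.add((nr, nc))
                total += dfs(nr, nc, remaining - 1, seen)
                seen.remove((nr, nc))
        return total

    return sum(
        dfs(r, c, length - 1, {(r, c)}) for r in range(n) for c in range(n)
    )
```

The count grows explosively with the path length, so even a 6x6 board with a 15-letter word forces an enormous search when the word is absent.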

@quant12345
Contributor

quant12345 commented Oct 6, 2023

@cclauss backtracking/word_search.py: can I replace this large (AAAAAAAAAAAAABB) example with a small (ABB) one?

>>> word_exists([["B", "A", "A"], ["A", "A", "A"], ["A", "B", "A"]], "ABB")
False

@quant12345
Contributor

quant12345 commented Oct 9, 2023

If the image lena_small.jpg is used instead of lena.jpg in test_local_binary_pattern, the test runs much faster. It is not clear why lena.jpg was used; at least I don't see any reason for it.

Made a pull request #10161.

Made a pull request #10188 word_search - replacing the example in doctest.

@Muhammadummerr
Contributor

Check the PR #9978.

@tianyizheng02
Contributor

@cclauss Every algorithm on your list has been sped up (except for simple_neural_network.py, which needs to be rewritten altogether), and every fixable warning has been fixed. Do you want to update this issue's "slowest 10" checklist, or do you want to close this issue?

@cclauss
Member Author

cclauss commented Oct 27, 2023

@CaioCordeiro Would you be willing to look at neural_network/simple_neural_network.py to see if you can speed up its tests? As discussed above (#9718 (comment)), we have sped up our other slow tests but have not been able to speed up this one. I tried without success in #11013.
