
Speedup our eight slowest pytests (one at a time please) #9718

Closed
7 of 8 tasks
cclauss opened this issue Oct 4, 2023 · 19 comments · Fixed by #11008
Labels: enhancement (This PR modified some existing files), help wanted

Comments

@cclauss
Member

cclauss commented Oct 4, 2023

Feature description

At the end of our GitHub Actions build jobs, there is a list of the slowest tests.

Are there ways to speed up these tests without reducing our functionality or our code coverage?

Please only fix one algorithm per pull request.

  • 16.64s call neural_network/simple_neural_network.py::neural_network.simple_neural_network.forward_propagation
  • 6.15s call backtracking/word_search.py::backtracking.word_search.word_exists
  • 4.61s call digital_image_processing/test_digital_image_processing.py::test_local_binary_pattern
  • 2.59s call backtracking/power_sum.py::backtracking.power_sum.backtrack
  • 2.19s call machine_learning/xgboost_regressor.py::machine_learning.xgboost_regressor.main
  • 1.73s call maths/prime_numbers.py::maths.prime_numbers.slow_primes
  • 1.43s call backtracking/power_sum.py::backtracking.power_sum.solve
  • 1.00s call other/dijkstra_bankers_algorithm.py::other.dijkstra_bankers_algorithm.BankersAlgorithm.main
    ================= 1506 passed, 25 warnings in 71.23s (0:01:11) =================

Also, those 25 pytest warnings are worth fixing!!!

cclauss added the enhancement label on Oct 4, 2023
@tianyizheng02
Contributor

tianyizheng02 commented Oct 5, 2023

I pointed out in #8785 that neural_network/simple_neural_network.py has multiple issues (I don't think it even implements a neural network) and could do with a rewrite. Ideally whatever new implementation replaces it should run a lot more quickly.

@Muhammadummerr
Contributor

I checked maths/prime_numbers.py. The file contains three methods for finding prime numbers using different techniques. The simplest one, slow_primes (which loops from 2 to n and yields the primes), takes a lot of time. Since the other two methods are faster variants with only minor modifications, do we still need slow_primes at all?

@cclauss
Member Author

cclauss commented Oct 5, 2023

Can slow_primes() be tested with smaller numbers? We do not want to get rid of it, but we want its test to finish faster.

>>> list(slow_primes(10000))[-1]
9973

Let's lower the 10_000 to 1_000
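For reference, here is a minimal trial-division sketch of the kind of slow_primes generator under discussion (the repository's actual implementation may differ):

```python
def slow_primes(max_n: int):
    """Trial-division prime generator -- the deliberately slow baseline.

    >>> list(slow_primes(1_000))[-1]
    997
    """
    for n in range(2, max_n + 1):
        # Check every candidate divisor below n -- O(n) per number.
        if all(n % d != 0 for d in range(2, n)):
            yield n
```

Lowering the doctest bound from 10_000 to 1_000 cuts the work roughly quadratically, since both the number of candidates and the divisors checked per candidate shrink.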

@Muhammadummerr
Contributor

If slow_primes() is tested on smaller inputs, then the other two methods should also be tested on the same smaller inputs so that the timings remain comparable.

@Muhammadummerr
Contributor

If I reduce the test cases, do I need to open a PR to check whether they finish faster, or is there some way to verify that the issue is resolved before submitting a PR?

@cclauss
Member Author

cclauss commented Oct 5, 2023

When things are run on CI platforms like GitHub Actions, an environment variable named CI is defined.

Can you use https://docs.python.org/3/library/os.html#os.getenv to see if CI is defined? If CI is defined, set the number to 1_000 for all three tests; if it is not defined, set it to 10_000 for all three.
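A sketch of that idea (doctest_bound is a hypothetical helper name, not code from the repository):

```python
import os


def doctest_bound(env=os.environ) -> int:
    """Use a small bound on CI so builds stay fast, the full bound locally.

    GitHub Actions (and most CI services) define the CI environment variable.
    """
    return 1_000 if env.get("CI") else 10_000
```

Passing the environment mapping as a parameter keeps the helper easy to test for both branches without touching the real environment.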

@Muhammadummerr
Contributor

Okay, thanks. Let me try.

@Muhammadummerr
Contributor

Muhammadummerr commented Oct 5, 2023

But if it is not defined, setting the test case to 10_000 will take the same amount of time as before, won't it?

@cclauss
Member Author

cclauss commented Oct 5, 2023

Yes, but we do not care because it no longer slows down our build process.

@Muhammadummerr
Contributor

I sped it up in #9851.

@Muhammadummerr
Contributor

Muhammadummerr commented Oct 5, 2023

I was reviewing the backtracking/power_sum.py code, and I noticed that the main cause of the slow execution is the large value of the needed_sum parameter. To improve performance, is it correct to reduce the needed_sum value?
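For context, here is a small self-contained sketch of the kind of backtracking power-sum count being discussed (count_power_sums is a hypothetical helper; the repository's backtrack/solve functions may be structured differently):

```python
def count_power_sums(needed_sum: int, power: int, start: int = 1) -> int:
    """Count the ways to write needed_sum as a sum of `power`-th powers
    of distinct natural numbers (start, start + 1, ...)."""
    if needed_sum == 0:
        return 1
    term = start**power
    if term > needed_sum:
        # Terms only grow, so no further base can contribute.
        return 0
    # Either include start**power in the sum, or skip to the next base.
    return count_power_sums(needed_sum - term, power, start + 1) + count_power_sums(
        needed_sum, power, start + 1
    )
```

Shrinking needed_sum shrinks the recursion tree roughly exponentially, which is why smaller doctest inputs finish so much faster.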

@cclauss
Member Author

cclauss commented Oct 5, 2023

@duongoku Would you be willing to advise @Muhammadummerr on the best way to speed up the slow doctests in backtracking/power_sum.py ?

@Muhammadummerr
Contributor

I would love to hear from @duongoku.

@quant12345
Contributor

quant12345 commented Oct 6, 2023

@cclauss I tried measuring the execution time of each doctest case in backtracking/word_search.py.

Code (each case is called 10 times):
"""
Author  : Alexander Pantyukhin
Date    : November 24, 2022

Task:
Given an m x n grid of characters board and a string word,
return true if word exists in the grid.

The word can be constructed from letters of sequentially adjacent cells,
where adjacent cells are horizontally or vertically neighboring.
The same letter cell may not be used more than once.

Example:

Matrix:
---------
|A|B|C|E|
|S|F|C|S|
|A|D|E|E|
---------

Word:
"ABCCED"

Result:
True

Implementation notes: Use backtracking approach.
At each point, check all neighbors to try to find the next letter of the word.

leetcode: https://leetcode.com/problems/word-search/

"""


def get_point_key(len_board: int, len_board_column: int, row: int, column: int) -> int:
    """
    Returns the hash key of matrix indexes.

    >>> get_point_key(10, 20, 1, 0)
    200
    """

    return len_board * len_board_column * row + column


def exits_word(
    board: list[list[str]],
    word: str,
    row: int,
    column: int,
    word_index: int,
    visited_points_set: set[int],
) -> bool:
    """
    Return True if it's possible to search the word suffix
    starting from the word_index.

    >>> exits_word([["A"]], "B", 0, 0, 0, set())
    False
    """

    if board[row][column] != word[word_index]:
        return False

    if word_index == len(word) - 1:
        return True

    traverts_directions = [(0, 1), (0, -1), (-1, 0), (1, 0)]
    len_board = len(board)
    len_board_column = len(board[0])
    for direction in traverts_directions:
        next_i = row + direction[0]
        next_j = column + direction[1]
        if not (0 <= next_i < len_board and 0 <= next_j < len_board_column):
            continue

        key = get_point_key(len_board, len_board_column, next_i, next_j)
        if key in visited_points_set:
            continue

        visited_points_set.add(key)
        if exits_word(board, word, next_i, next_j, word_index + 1, visited_points_set):
            return True

        visited_points_set.remove(key)

    return False


def word_exists(board: list[list[str]], word: str) -> bool:
    """
    >>> word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "ABCCED")
    True
    >>> word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "SEE")
    True
    >>> word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "ABCB")
    False
    >>> word_exists([["A"]], "A")
    True
    >>> word_exists([["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","B"],
    ...              ["A","A","A","A","B","A"]],
    ...             "AAAAAAAAAAAAABB")
    False
    >>> word_exists([["A"]], 123)
    Traceback (most recent call last):
        ...
    ValueError: The word parameter should be a string of length greater than 0.
    >>> word_exists([["A"]], "")
    Traceback (most recent call last):
        ...
    ValueError: The word parameter should be a string of length greater than 0.
    >>> word_exists([[]], "AB")
    Traceback (most recent call last):
        ...
    ValueError: The board should be a non empty matrix of single chars strings.
    >>> word_exists([], "AB")
    Traceback (most recent call last):
        ...
    ValueError: The board should be a non empty matrix of single chars strings.
    >>> word_exists([["A"], [21]], "AB")
    Traceback (most recent call last):
        ...
    ValueError: The board should be a non empty matrix of single chars strings.
    """

    # Validate board
    board_error_message = (
        "The board should be a non empty matrix of single chars strings."
    )

    len_board = len(board)
    if not isinstance(board, list) or len(board) == 0:
        raise ValueError(board_error_message)

    for row in board:
        if not isinstance(row, list) or len(row) == 0:
            raise ValueError(board_error_message)

        for item in row:
            if not isinstance(item, str) or len(item) != 1:
                raise ValueError(board_error_message)

    # Validate word
    if not isinstance(word, str) or len(word) == 0:
        raise ValueError(
            "The word parameter should be a string of length greater than 0."
        )

    len_board_column = len(board[0])
    for i in range(len_board):
        for j in range(len_board_column):
            if exits_word(
                board, word, i, j, 0, {get_point_key(len_board, len_board_column, i, j)}
            ):
                return True

    return False


if __name__ == "__main__":
    import doctest

    doctest.testmod()

    import timeit

    n = 10

    ABCCED = timeit.timeit(
        'word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], '
        '"ABCCED")',
        globals=globals(), number=n)

    SEE = timeit.timeit(
        'word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], '
        '"SEE")',
        globals=globals(), number=n)

    ABCB = timeit.timeit(
        'word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], '
        '"ABCB")',
        globals=globals(), number=n)

    A = timeit.timeit('word_exists([["A"]], "A")', globals=globals(), number=n)

    AAAAAAAAAAAAABB = timeit.timeit(
        'word_exists([["A","A","A","A","A","A"], ["A","A","A","A","A","A"], '
        '["A","A","A","A","A","A"], ["A","A","A","A","A","A"], '
        '["A","A","A","A","A","B"], ["A","A","A","A","B","A"]], '
        '"AAAAAAAAAAAAABB")',
        globals=globals(), number=n)

    print(f"{ABCCED} - ABCCED\n{SEE} - SEE\n{ABCB} - ABCB\n{A} - A\n"
          f"{AAAAAAAAAAAAABB} - AAAAAAAAAAAAABB")

As expected, 'AAAAAAAAAAAAABB' takes by far the longest; it is orders of magnitude slower than the other cases.

Output:

0.00034418300128891133 - ABCCED
0.0004890400014119223 - SEE
0.0004235089982103091 - ABCB
4.959400030202232e-05 - A
29.599618363001355 - AAAAAAAAAAAAABB
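On a board of repeated letters the backtracker explores almost every simple path before giving up, which is why this case dominates. As a rough illustration (count_paths is a hypothetical helper, not repository code), one can count the simple paths of a given length on an n x n grid, a proxy for the states the word-search backtracker can visit when every cell matches the next letter:

```python
def count_paths(n: int, length: int) -> int:
    """Count the simple paths of `length` cells on an n x n grid."""

    def dfs(row: int, col: int, remaining: int, seen: set) -> int:
        if remaining == 0:
            return 1
        total = 0
        # Try all four orthogonal neighbors, never revisiting a cell.
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = row + dr, col + dc
            if 0 <= nr < n and 0 <= nc < n and (nr, nc) not in seen:
                seen.add((nr, nc))
                total += dfs(nr, nc, remaining - 1, seen)
                seen.remove((nr, nc))
        return total

    return sum(
        dfs(r, c, length - 1, {(r, c)}) for r in range(n) for c in range(n)
    )
```

The count grows explosively with the path length, so even a 6x6 board with a 15-letter word forces an enormous search when the word is absent.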

@quant12345
Contributor

quant12345 commented Oct 6, 2023

@cclauss backtracking/word_search.py: can I replace this large (AAAAAAAAAAAAABB) example with a small (ABB) one?

>>> word_exists([["B", "A", "A"], ["A", "A", "A"], ["A", "B", "A"]], "ABB")
False

@quant12345
Contributor

quant12345 commented Oct 9, 2023

If the image lena_small.jpg is used instead of lena.jpg in test_local_binary_pattern, the test runs much faster. It is not clear why lena.jpg was used; at least I don't see any reason for it.

Made a pull request #10161.

Made a pull request #10188 word_search - replacing the example in doctest.

@Muhammadummerr
Contributor

Check the PR #9978.

@tianyizheng02
Contributor

@cclauss Every algorithm on your list has been sped up (except for simple_neural_network.py, which needs to be rewritten altogether), and every fixable warning has been fixed. Do you want to update this issue's "slowest 10" checklist, or do you want to close this issue?

@cclauss
Member Author

cclauss commented Oct 27, 2023

@CaioCordeiro Would you be willing to look at neural_network/simple_neural_network.py to see if you can speed up its tests? As discussed above (#9718 (comment)), we have sped up our other slow tests but have not been able to speed up this one. I tried without success in #11013.
