Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

record: skip whitespaces after shebang for scripts #1789

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

GabrielKimm
Copy link

@GabrielKimm GabrielKimm commented Jul 30, 2023

Python tracing won't work when the shebang line has a space like below:

" #! /usr/bin/env python3 "

This patch makes uftrace to understand the above shebang as well.

Fixed: #1690
Signed-off-by: Gabriel Kim 0xgabriel.kim@gmail.com

@GabrielKimm GabrielKimm force-pushed the identity_shebang branch 3 times, most recently from 6aa6180 to 534a366 Compare July 30, 2023 11:00
@honggyukim
Copy link
Collaborator

Hi, I think we can keep str_ltrim and str_rtrim, but just provide str_trim calling both functions inside.

@GabrielKimm
Copy link
Author

GabrielKimm commented Jul 30, 2023

str_ltrim and str_rtrim

Hi. I'm asking to check if it's the part I understand.
Do you mean to keep the str_ltrim function and str_rtrim function, and let them call str_trim inside each of the two functions?

@honggyukim
Copy link
Collaborator

No, I mean the opposite. The str_trim can be implemented as follows.

str_ltrim(s);
str_rtrim(s);

@GabrielKimm
Copy link
Author

As you said, I changed it to call str_ltrim and str_rtrim inside the str_trim function.

@GabrielKimm
Copy link
Author

I modified the code to call str_ltrim and str_rtrim to handle the spaces at the left and right ends, but it still cannot handle the two or more spaces that exist in the middle of the script.

Would it be good to add a str_mtrim function that handles the space in the middle of the script?

Copy link
Owner

@namhyung namhyung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your commit has unrelated changes. Please check.

@GabrielKimm
Copy link
Author

Oh! I'm sorry. I modified it to Commit except for the unrelated changes.

Copy link
Owner

@namhyung namhyung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should mention #1789 instead of #1690. Also not sure if it correctly handles spaces before "#!".

@honggyukim
Copy link
Collaborator

I think you should mention #1789 instead of #1690.

#1789 is the number of this PR. Do you mean something else?

@GabrielKimm GabrielKimm force-pushed the identity_shebang branch 6 times, most recently from 0e9aafa to 00300fd Compare August 31, 2023 10:26
Python tracing won't work when the shebang line has a space like below:

" #! /usr/bin/env python3 "

This patch makes uftrace to understand the above shebang as well.

Fixed: namhyung#1690

Signed-off-by: Gabriel Kim <0xgabriel.kim@gmail.com>
@GabrielKimm
Copy link
Author

Hello,

I've implemented a new function called str_trim which encapsulates our existing functions str_ltrim and str_rtrim, as well as a new function str_mtrim. The str_trim function is designed to be called once to handle all kinds of whitespace trimming needs for the shebang lines.

  • str_ltrim: Handles left-side whitespaces.
  • str_rtrim: Handles right-side whitespaces.
  • str_mtrim: Newly introduced to handle multiple spaces between characters in the middle of the string.

This structure enhances modularity and reusability, and it also solves the issue regarding shebang lines that have more than one space between characters.

As for mentioning issue numbers, there has been some discussion whether to mention #1690 or #1789. Could we get a final word on this?

Thank you for your time and consideration.

@namhyung
Copy link
Owner

I think it's ok.. but not clear what the problem was. Do you have an error message without this change or command line output before and after it?

@GabrielKimm
Copy link
Author

GabrielKimm commented Aug 31, 2023

I think it's ok.. but not clear what the problem was. Do you have an error message without this change or command line output before and after it?

Thank you for reviewing the PR and for your feedback. Initially, only str_ltrim was utilized for handling shebang lines, leaving us with the ability to only trim leading whitespaces. Although str_rtrim was implemented, it was not actually called, making it ineffective for trimming trailing whitespaces. This was a limitation highlighted in the discussion started by honggyukim in #1697. Comment Link : https://github.com/namhyung/uftrace/pull/1697#issuecomment-1552964734

To rectify this, we've incorporated both str_ltrim and str_rtrim to handle leading and trailing whitespaces, respectively. The need for handling these trailing whitespaces was particularly emphasized by honggyukim and was not possible in our earlier implementation.

I hope this clarifies the issue we're trying to solve. Would it be useful if I provided some before-and-after command-line outputs for further clarification?

@namhyung
Copy link
Owner

Yes, please.

@GabrielKimm
Copy link
Author

Yes, please.

Before this PR

Case 1: Multiple leading whitespaces

  • Input: " #!/usr/bin/env python"
  • After str_ltrim: "#!/usr/bin/env python"

Case 2: Multiple trailing whitespaces

  • Input: "#!/usr/bin/env python "
  • After str_ltrim: "#!/usr/bin/env python " (Trailing whitespaces are not removed)

Case 3: Multiple whitespaces in the middle

  • Input: "#!/usr/bin/env python"
  • After str_ltrim: "#!/usr/bin/env python" (Middle whitespaces are not removed)

Case 4: Multiple whitespaces at all positions

  • Input: "\t #!/usr/bin/env python "
  • After str_ltrim: "#!/usr/bin/env python " (Only leading whitespaces are removed, trailing and middle whitespaces remain)

After this PR

Case 1: Multiple leading whitespaces

  • Input: " #!/usr/bin/env python"
  • After str_trim: "#!/usr/bin/env python"

Case 2: Multiple trailing whitespaces

  • Input: "#!/usr/bin/env python "
  • After str_trim: "#!/usr/bin/env python"

Case 3: Multiple whitespaces in the middle

  • Input: "#!/usr/bin/env python"
  • After str_trim: "#!/usr/bin/env python"

Case 4: Multiple whitespaces at all positions

  • Input: "\t #!/usr/bin/env python "
  • After str_trim: "#!/usr/bin/env python"

@honggyukim
Copy link
Collaborator

I think you can write an unittest for the above testing.

You can learn more about how to write unittest by reading https://github.com/namhyung/uftrace/blob/v0.14/utils/demangle.c#L1793-L1826.

The unittest can be compiled and executed as follows.

$ make -j4 unittest
      ...
  LINK     unittest
  TEST     test_unit
Running 100 test cases
======================
[001] unittest_framework            : PASS
[002] argspec_auto_args             : PASS
[003] argspec_extract               : PASS
      ...
[008] demangle_simple1              : PASS
[009] demangle_simple2              : PASS
[010] demangle_simple3              : PASS
      ...
[099] utils_parse_timestamp         : PASS
[100] utils_strv                    : PASS

unit test stats
====================
100 ran successfully
  0 failed
  0 skipped
  0 signal caught
  0 unknown result

You can simply run a single unittest as follows.

$ make -j4 unittest UNITTESTARG=demangle_simple1
  TEST     test_unit
Running 1 test cases
======================
[001] demangle_simple1              : PASS

unit test stats
====================
  1 ran successfully
  0 failed
  0 skipped
  0 signal caught
  0 unknown result

@namhyung
Copy link
Owner

namhyung commented Sep 2, 2023

I wanted to see actual case with spaces in the hash bang which failed to run before this change and passed after that.

Maybe something like this:

$ cat pwd.sh
   #!   /bin/sh
pwd

$ ./pwd.sh
/home/namhyung/tmp

$ uftrace --force -t 10us ./pwd.sh
uftrace: /home/namhyung/project/uftrace/cmds/record.c:1615:check_binary
 ERROR: Cannot trace './pwd.sh': Invalid file
	This file doesn't look like an executable ELF file.
	Please check whether it's a kind of script or shell functions.

@MichelleJin12
Copy link
Contributor

Case 1: Multiple leading whitespaces

  • Input: " #!/usr/bin/env python"
  • After str_ltrim: "#!/usr/bin/env python"

Hello, @GabrielKimm,
I think that you don't need to deal with the Case 1 above.
Spaces before "#!" violate the shebang syntax.

In shebang Wikipedia,

The form of a shebang interpreter directive is as follows:[8]

#! interpreter [optional-arg]

in which interpreter is a path to an executable program.
The space between #! and interpreter is optional.
There could be any number of spaces or tabs either before or after interpreter.
The optional-arg will include any extra spaces up to the end-of-line.

@GabrielKimm
Copy link
Author

GabrielKimm commented Oct 3, 2023

Thank you for your feedback, @namhyung @MichelleJin12.
I've tested the script with spaces in the hash bang based on your comment.

Here's what I found:

Before this PR

Case 1: Multiple leading whitespaces

Per @MichelleJin12's comment, exclude it from the test case.

Case 2: Multiple trailing whitespaces

user@user-QEMU-Virtual-Machine:~$ cat p-abc.py 
#!/usr/bin/env python      

import os

def c():
        return os.getpid()
def b():
        return c()
def a():
        return b()

a()
user@user-QEMU-Virtual-Machine:~$ uftrace p-abc.py
/usr/bin/env: ‘python      ’: No such file or directory
/usr/bin/env: use -[v]S to pass options in shebang lines
WARN: cannot open record data: /tmp/uftrace-live-00tohn: No data available

Case 3: Multiple whitespaces in the middle

user@user-QEMU-Virtual-Machine:~$ cat p-abc.py 
#!/usr/bin/env       python

import os

def c():
        return os.getpid()
def b():
        return c()
def a():
        return b()

a()
user@user-QEMU-Virtual-Machine:~$ uftrace p-abc.py
/usr/bin/env: ‘      python’: No such file or directory
/usr/bin/env: use -[v]S to pass options in shebang lines
WARN: cannot open record data: /tmp/uftrace-live-mYX70I: No data available

Case 4: Multiple whitespaces at all positions

user@user-QEMU-Virtual-Machine:~$ cat p-abc.py 
#!/usr/bin/env       python      

import os

def c():
        return os.getpid()
def b():
        return c()
def a():
        return b()

a()
user@user-QEMU-Virtual-Machine:~$ uftrace p-abc.py
/usr/bin/env: ‘      python      ’: No such file or directory
/usr/bin/env: use -[v]S to pass options in shebang lines
WARN: cannot open record data: /tmp/uftrace-live-Qfbk2h: No data available

After this PR

Case 1: Multiple leading whitespaces

Per @MichelleJin12's comment, exclude it from the test case.

Case 2: Multiple trailing whitespaces

user@user-QEMU-Virtual-Machine:~$ cat p-abc.py 
#!/usr/bin/env python          

import os

def c():
        return os.getpid()
def b():
        return c()
def a():
        return b()

a()
user@user-QEMU-Virtual-Machine:~$ uftrace p-abc.py
# DURATION     TID     FUNCTION
            [  6239] | __main__.<module>() {
            [  6239] |   a() {
            [  6239] |     b() {
            [  6239] |       c() {
   0.958 us [  6239] |         posix.getpid();
   4.333 us [  6239] |       } /* c */
   8.333 us [  6239] |     } /* b */
  10.291 us [  6239] |   } /* a */
  14.458 us [  6239] | } /* __main__.<module> */

Case 3: Multiple whitespaces in the middle

user@user-QEMU-Virtual-Machine:~$ cat p-abc.py 
#!/usr/bin/env       python

import os

def c():
        return os.getpid()
def b():
        return c()
def a():
        return b()

a()
user@user-QEMU-Virtual-Machine:~$ uftrace p-abc.py
# DURATION     TID     FUNCTION
            [  6290] | __main__.<module>() {
            [  6290] |   a() {
            [  6290] |     b() {
            [  6290] |       c() {
   1.708 us [  6290] |         posix.getpid();
   7.666 us [  6290] |       } /* c */
 104.996 us [  6290] |     } /* b */
 107.579 us [  6290] |   } /* a */
 115.621 us [  6290] | } /* __main__.<module> */

Case 4: Multiple whitespaces at all positions

user@user-QEMU-Virtual-Machine:~$ cat p-abc.py 
#!/usr/bin/env       python      

import os

def c():
        return os.getpid()
def b():
        return c()
def a():
        return b()

a()
user@user-QEMU-Virtual-Machine:~$ uftrace p-abc.py
# DURATION     TID     FUNCTION
            [  6410] | __main__.<module>() {
            [  6410] |   a() {
            [  6410] |     b() {
            [  6410] |       c() {
   1.042 us [  6410] |         posix.getpid();
   4.959 us [  6410] |       } /* c */
   8.916 us [  6410] |     } /* b */
  11.125 us [  6410] |   } /* a */
  15.167 us [  6410] | } /* __main__.<module> */

@GabrielKimm
Copy link
Author

Thank you for the feedback, @honggyukim.

I recognize the importance of ensuring the shebang line's space handling works as intended.
Based on the manual tests I've conducted:

Case 2: Multiple trailing whitespaces
Input: "#!/usr/bin/env python      "
After str_trim: "#!/usr/bin/env python"

Case 3: Multiple whitespaces in the middle
Input: "#!/usr/bin/env       python"
After str_trim: "#!/usr/bin/env python"

Case 4: Multiple whitespaces at all positions
Input: "#!/usr/bin/env       python      "
After str_trim: "#!/usr/bin/env python"

I'm in agreement that creating an automated unittest for this will be beneficial. Having looked into the link you provided, I have a better understanding of how to structure the unittest for uftrace. I'll proceed to write the unittest to verify the shebang line's space handling and update the PR once completed.

Thank you for guiding me towards ensuring better code quality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unable to identify shebang when it has initial spaces
4 participants