[Detector Support]: Fatal Python error: Segmentation fault #9801

Closed
usafle opened this issue Feb 11, 2024 · 17 comments

@usafle

usafle commented Feb 11, 2024

Describe the problem you are having

Launching v13 with the NVIDIA branch causes a boot loop with the above error and no other explanation. I was told in a different support ticket that my NVIDIA driver version was too new.

I have since downgraded to driver v535.129.03, which is supposedly stable according to the last ticket I opened (#9575).

The error is still present.

Version

v13

Frigate config file

mqtt:
  enabled: true
  host: 192.168.1.102
  user: frigate
  password: PASSWORD

# detectors:
#  cpu1:
#    type: cpu
#    num_threads: 2

# birdseye:
#   enabled: True
#   restream: false
#   mode: continuous
#   width: 1280
#   height: 720
#   quality: 8

go2rtc:
  streams:
    Rear_Deck:
      - rtsp://admin:PASSWORD@192.168.1.114:554/h264Preview_01_main
    Rear_Deck_sub:
      - rtsp://admin:PASSWORD@192.168.1.114:554/h264Preview_01_sub
    Garage_Camera:
      - rtsp://admin:PASSWORD@192.168.1.215:554/cam/realmonitor?channel=1&subtype=0
    Garage_Camera_sub:
      - rtsp://admin:PASSWORD@192.168.1.215:554/cam/realmonitor?channel=1&subtype=1

ffmpeg:
  hwaccel_args: preset-nvidia-h265

rtmp:
  enabled: False 

cameras:
############## REAR DECK ##################
  Rear_Deck:
    ffmpeg:
      inputs:
        - path: rtsp://127.0.0.1:8554/Rear_Deck_sub
          input_args: preset-rtsp-restream
          roles:
            - detect
        - path: rtsp://127.0.0.1:8554/Rear_Deck
          input_args: preset-rtsp-restream
          roles:
            - record
      output_args:
        record: -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c:v copy -c:a aac
    objects:
      track:
        - person
        - dog
        - bird
        - cat
    detect:
      width: 1280
      height: 720
      fps: 4
    record:
      enabled: True
      events:
        retain:
          default: 2
    snapshots:
      enabled: True

  Garage_Camera:
    ffmpeg:
      inputs:
        - path: rtsp://127.0.0.1:8554/Garage_Camera_sub
          input_args: preset-rtsp-restream
          roles:
            - detect
        - path: rtsp://127.0.0.1:8554/Garage_Camera
          input_args: preset-rtsp-restream
          roles:
            - record
      output_args:
        record: -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c:v copy -c:a aac
        # record: preset-record-generic-audio-aac
    objects:
      track:
        - person
        - dog
        - cat
        - car
        - package
    detect:
      width: 1280
      height: 720
      fps: 4
    record:
      enabled: True
      events:
        retain:
          default: 2
    snapshots:
      enabled: True

docker-compose file or Docker CLI command

docker run
  -d
  --name='frigate'
  --net='bridge'
  -e TZ="America/New_York"
  -e HOST_OS="Unraid"
  -e HOST_HOSTNAME="CozsNAS"
  -e HOST_CONTAINERNAME="frigate"
  -e 'FRIGATE_RTSP_PASSWORD'='********!'
  -e 'PLUS_API_KEY'='********'
  -e 'NVIDIA_VISIBLE_DEVICES'='GPU-53a3b891-6d7b-8fe8-bd57-9467c8797875'
  -e 'NVIDIA_DRIVER_CAPABILITIES'='compute,utility,video'
  -e 'YOLO_MODELS'='yolov4-416,yolov4-tiny-416'
  -e 'USE_FP16'='false'
  -e 'TRT_MODEL_PREP_DEVICE'='0'
  -l net.unraid.docker.managed=dockerman
  -l net.unraid.docker.webui='http://[IP]:[PORT:5000]'
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/yayitazale/unraid-templates/main/frigate.png'
  -p '5000:5000/tcp'
  -p '8554:8554/tcp'
  -p '8555:8555/tcp'
  -p '8555:8555/udp'
  -p '1984:1984/tcp'
  -v '/mnt/user/appdata/frigate':'/config':'rw'
  -v '/mnt/user/Frigate Recordings/':'/media/frigate':'rw'
  -v '/etc/localtime':'/etc/localtime':'rw'
  --shm-size=256mb
  --mount type=tmpfs,target=/tmp/cache,tmpfs-size=1000000000
  --restart unless-stopped
  --gpus=all 'ghcr.io/blakeblackshear/frigate:stable-tensorrt'

d4d83afad657559c53468c5ebc065e6caf904150a1c06d31719cfa84768c6afa

Relevant log output

2024-02-11 11:51:47.953981276  Fatal Python error: Segmentation fault
2024-02-11 11:51:47.953989283  
2024-02-11 11:51:47.953991367  Thread 0x00001506c59ee6c0 (most recent call first):
2024-02-11 11:51:47.954045897    File "/usr/lib/python3.9/threading.py", line 312 in wait
2024-02-11 11:51:47.954133590    File "/usr/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
2024-02-11 11:51:47.954194783    File "/usr/lib/python3.9/threading.py", line 892 in run
2024-02-11 11:51:47.954276894    File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner
2024-02-11 11:51:47.954332450    File "/usr/lib/python3.9/threading.py", line 912 in _bootstrap
2024-02-11 11:51:47.954343287  
2024-02-11 11:51:47.954345312  Current thread 0x00001506ea62f740 (most recent call first):
2024-02-11 11:51:47.954442917    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 168 in <listcomp>
2024-02-11 11:51:47.954540476    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 167 in _do_inference
2024-02-11 11:51:47.954632394    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 286 in detect_raw
2024-02-11 11:51:47.954722466    File "/opt/frigate/frigate/object_detection.py", line 75 in detect_raw
2024-02-11 11:51:47.954846892    File "/opt/frigate/frigate/object_detection.py", line 125 in run_detector
2024-02-11 11:51:47.954996257    File "/usr/lib/python3.9/multiprocessing/process.py", line 108 in run
2024-02-11 11:51:47.955127573    File "/usr/lib/python3.9/multiprocessing/process.py", line 315 in _bootstrap
2024-02-11 11:51:47.955272691    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 71 in _launch
2024-02-11 11:51:47.955386735    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 19 in __init__
2024-02-11 11:51:47.955491455    File "/usr/lib/python3.9/multiprocessing/context.py", line 277 in _Popen
2024-02-11 11:51:47.955586815    File "/usr/lib/python3.9/multiprocessing/context.py", line 224 in _Popen
2024-02-11 11:51:47.955687173    File "/usr/lib/python3.9/multiprocessing/process.py", line 121 in start
2024-02-11 11:51:47.955783568    File "/opt/frigate/frigate/object_detection.py", line 183 in start_or_restart
2024-02-11 11:51:47.955869434    File "/opt/frigate/frigate/object_detection.py", line 151 in __init__
2024-02-11 11:51:47.955992198    File "/opt/frigate/frigate/app.py", line 453 in start_detectors
2024-02-11 11:51:47.956082578    File "/opt/frigate/frigate/app.py", line 683 in start
2024-02-11 11:51:47.956155258    File "/opt/frigate/frigate/__main__.py", line 17 in <module>
2024-02-11 11:51:47.956251165    File "/usr/lib/python3.9/runpy.py", line 87 in _run_code
2024-02-11 11:51:47.956342911    File "/usr/lib/python3.9/runpy.py", line 197 in _run_module_as_main
2024-02-11 11:51:49.659338836  [INFO] Starting go2rtc healthcheck service...
2024-02-11 11:52:02.661642001  [2024-02-11 11:52:02] frigate.watchdog               INFO    : Detection appears to be stuck. Restarting detection process...
2024-02-11 11:52:02.685462534  [2024-02-11 11:52:02] detector.tensorrt              INFO    : Starting detection process: 1255
2024-02-11 11:52:02.984562929  [2024-02-11 11:52:02] frigate.detectors.plugins.tensorrt INFO    : Loaded engine size: 39 MiB
2024-02-11 11:52:03.343516305  [2024-02-11 11:52:03] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +8, now: CPU 158, GPU 230 (MiB)
2024-02-11 11:52:03.356055956  [2024-02-11 11:52:03] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 160, GPU 240 (MiB)
2024-02-11 11:52:03.360742717  [2024-02-11 11:52:03] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +40, now: CPU 0, GPU 40 (MiB)
2024-02-11 11:52:03.369792099  [2024-02-11 11:52:03] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 120, GPU 232 (MiB)
2024-02-11 11:52:03.370187842  [2024-02-11 11:52:03] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 120, GPU 240 (MiB)
2024-02-11 11:52:03.370321313  [2024-02-11 11:52:03] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +13, now: CPU 0, GPU 53 (MiB)
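
A dump like the one above is easier to compare across reports once it is reduced to the frames of the crashing thread. The following is a minimal sketch, not part of Frigate: the function name `current_thread_frames` and the regular expression are assumptions made for illustration, and the sample text is a shortened copy of the log lines above. It only assumes the faulthandler layout visible in this log (a "Current thread" header followed by `File "...", line N in func` lines).

```python
import re

# Matches one faulthandler frame line, e.g.
#   File "/path/to/file.py", line 168 in <listcomp>
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+) in (?P<func>\S+)')


def current_thread_frames(log_text):
    """Return (path, line, func) tuples for the crashing thread, innermost first."""
    frames = []
    in_current = False
    for raw in log_text.splitlines():
        if "Current thread" in raw:
            # Header of the thread that received the fatal signal.
            in_current = True
            continue
        if not in_current:
            continue
        match = FRAME_RE.search(raw)
        if match is None:
            # The first non-frame line after the header ends the dump.
            break
        frames.append((match["path"], int(match["line"]), match["func"]))
    return frames


# Shortened copy of the log above, used only to exercise the parser.
SAMPLE = """\
2024-02-11 11:51:47.954332450    File "/usr/lib/python3.9/threading.py", line 912 in _bootstrap
2024-02-11 11:51:47.954343287
2024-02-11 11:51:47.954345312  Current thread 0x00001506ea62f740 (most recent call first):
2024-02-11 11:51:47.954442917    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 168 in <listcomp>
2024-02-11 11:51:47.954540476    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 167 in _do_inference
2024-02-11 11:51:47.954632394    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 286 in detect_raw
2024-02-11 11:51:49.659338836  [INFO] Starting go2rtc healthcheck service...
"""

frames = current_thread_frames(SAMPLE)
for path, line, func in frames:
    print(f"{path}:{line} in {func}")
```

The first tuple is the innermost Python frame at the time of the fault. For this log that is the list comprehension in `tensorrt.py` called from `_do_inference`, so the crash happens inside the TensorRT detector plugin during inference; the Python-level dump cannot show what failed in the native code underneath.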

Operating system

UNRAID

Install method

Docker Compose

Coral version

CPU (no coral)

Any other information that may be helpful

No response

@NickM-27
Sponsor Collaborator

what GPU do you have again?

@usafle
Author

usafle commented Feb 11, 2024

image

@usafle
Author

usafle commented Feb 12, 2024

Edited the post to remove the Frigate Plus API key and RTSP password that were visible in the Docker CLI command.

@usafle
Author

usafle commented Feb 14, 2024

No one has ANY suggestions (besides buying a Coral device) on how to get my Frigate instance up and running??

@NickM-27
Sponsor Collaborator

Seg faults are difficult because they are usually related to the host or the hardware, and there is no info about what is going wrong.

From the logs in your previous post, we can see that a seg fault occurs as soon as the model is initialized, which indicates some failure to communicate correctly. Many users run this type of setup on Unraid, so the setup itself does not seem to be the problem. You could try a memtest and see if perhaps system memory is failing.

@usafle
Author

usafle commented Feb 16, 2024

memtest complete. 0 errors.

Next suggestion please?

@hvardhan20

image
I have the same error and need help.
I have a eufy cam2 pro, which only sends a stream when motion is detected. I suspect this could be a potential cause. Any thoughts?

@jdgiddings

I'm experiencing the exact same error on TrueNAS Scale w/ GTX 1060

@usafle
Author

usafle commented Mar 29, 2024

@hvardhan20 and @jdgiddings - I hope you both get a response but, if my past experience holds true, it doesn't look good. CPU detection worked fine. GPU detection worked fine...... until they bundled it all into one container.
Hard to fix issues when you don't have any support from anyone here.

@NickM-27
Sponsor Collaborator

There are many TensorRT users, so this seems to be a very isolated problem. Like I said before, seg faults are difficult to debug, and without being able to reproduce the crash there really isn't any good way to move towards solving the problem, because it is not clear what is causing this other than something on the host.

The logic to compile the models is the same as before, just done automatically, so that is unlikely to be causing this. It could be due to the newer libraries / TensorRT version, but that change was made to support the latest Nvidia GPUs and is unrelated to Frigate building the models automatically.

@jdgiddings

here's the output from nvidia-smi on the host. I believe these are all supported versions

NVIDIA-SMI 535.54.03
Driver Version: 535.54.03
CUDA Version: 12.2

I'm experimenting with different models right now to see if any do not cause the error. I will report back
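
For anyone adding their own data point, a text report of GPU name, driver version, and total memory is easier to compare than a screenshot. The sketch below is only an illustration: the helper names `parse_gpu_report` and `query_gpus` are made up here, and the sample line uses a placeholder GPU name and memory size rather than values from this thread. It relies on the `--query-gpu` and `--format=csv,noheader,nounits` options of `nvidia-smi`.

```python
import csv
import io
import subprocess

QUERY_FIELDS = "name,driver_version,memory.total"


def parse_gpu_report(csv_text):
    """Parse CSV output of nvidia-smi for QUERY_FIELDS into a list of dicts."""
    gpus = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        name, driver, memory_mib = (field.strip() for field in row)
        gpus.append({"name": name, "driver": driver, "memory_mib": int(memory_mib)})
    return gpus


def query_gpus():
    """Run nvidia-smi on the host; requires the NVIDIA driver to be installed."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_report(result.stdout)


# Hypothetical sample line: the GPU name and memory size are placeholders.
SAMPLE = "Example GPU, 535.54.03, 4096\n"
report = parse_gpu_report(SAMPLE)
print(report)
```

On a real host, `query_gpus()` would replace the sample line. Posting its output next to the model name that crashed would make it possible to see whether the failures reported here line up with GPU memory size, which the thread itself does not establish.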

@jdgiddings

yolov7-320 does not throw the segfault

@NickM-27
Sponsor Collaborator

which model did you use that did?

@jdgiddings

yolov7x-640 and yolov7x-320 were both throwing the error on my machine

@jdgiddings

I did some more testing. Any model larger than yolov7-320 throws the same segfault error

@jpreston84

jpreston84 commented Apr 13, 2024

I just wanted to add another voice here -- I am able to run yolov7x-320, but if I attempt to run yolov7x-640, I get a segfault (the same as the OP). I'm on a GTX 1650 Super. My setup is a bit odd:

  1. My system is running TrueNAS SCALE.
  2. Inside TrueNAS, I have set up a VM (because the Docker implementation of SCALE sucks).
  3. The VM is running Ubuntu 22.04.
  4. I'm also running CasaOS (but I don't think that matters, because I loaded Frigate via docker compose CLI).
  5. My test camera is streaming via RTSP through go2rtc, with WebRTC and MSE working correctly.

Let me know if I can do anything to help debug this.

[edit] I previously said my 1650 is an LHR. This is incorrect. My 3060 is LHR, and I confused the two.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label May 19, 2024
@github-actions github-actions bot closed this as not planned May 23, 2024