unraid pcie issues #26

Open
markmghali opened this issue Aug 23, 2022 · 14 comments

Comments

@markmghali

Hello,

I have been trying to get your PCIe adapter to work for a few months now with no luck. I am using Unraid with the Frigate v0.10 Docker container. I can see both TPUs as apex_0 and apex_1. The symptom is that Frigate will run for a bit, then I get a PCIe error in my Unraid syslog. It then shuts down one of the TPUs and the reported temperature goes negative. I have posted my issue in the Frigate GitHub and the Unraid forums with no luck. I have reposted my Unraid post below. Please let me know what else I can troubleshoot. I love all the work you have done for the community and am hoping to get this to work properly.

I am having a similar issue to @AdvancedMobileRepairs, using the Dual TPU in the magic-blue-smoke PCIe adapter. Prior to this I was using a single TPU with a different adapter, which was working fine. I have been monitoring the Coral temperatures and they have not gone above 48 degrees. I have this error in my syslog:

image

Does anyone have any insight into this? I already asked in the Frigate GitHub and we troubleshot up to a point, but then they told me to ask in the Unraid forum.

Thank you

EDIT EDIT:

Per this thread:

https://forums.unraid.net/topic/103901-solved-aer-pcie-bus-errors/

I disabled ASPM for PCIe in my BIOS, restarted the server, and am running Frigate to see how long it works before the Coral shuts down.

And it failed again! That did not fix the issue. Very weird.

image

Temperature does not seem to be the issue.

image
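
For anyone wanting to log these readings rather than spot-check them, here is a minimal Python sketch. It assumes the apex driver exposes each TPU's temperature in millidegrees Celsius at /sys/class/apex/apex_N/temp, which is how Coral's PCIe driver documentation describes it. The function names and the 0 to 85 degree plausibility window are hypothetical choices for illustration, not values taken from this thread.

```python
from pathlib import Path


def read_apex_temps(sysfs_root="/sys/class/apex"):
    """Return {device name: temperature in degrees C} for every apex_* device.

    Assumes each device directory holds a `temp` file containing an integer
    in millidegrees Celsius.
    """
    temps = {}
    for temp_file in sorted(Path(sysfs_root).glob("apex_*/temp")):
        raw = temp_file.read_text().strip()
        temps[temp_file.parent.name] = int(raw) / 1000.0
    return temps


def implausible(temps, low=0.0, high=85.0):
    """Return device names whose reading falls outside [low, high].

    The window is a hypothetical sanity check; a negative reading is what
    the issue author reports seeing after a TPU shuts down.
    """
    return sorted(name for name, t in temps.items() if t < low or t > high)


if __name__ == "__main__":
    readings = read_apex_temps()
    for name, temp in readings.items():
        print(f"{name}: {temp:.1f} C")
    bad = implausible(readings)
    if bad:
        print("implausible readings from:", ", ".join(bad))
```

Running this from a cron job or a loop and keeping the output would show whether the negative reading appears before or after the PCIe error in the syslog.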

Any insight?

@tehniemer

I'm experiencing a similar issue where the second TPU, apex_1, is visible but not responding, and the test model from the install instructions is failing. However, if I only use the first TPU, apex_0, in frigate it works. I've opened a support ticket with the Coral team here.

@markmghali
Author

Thank you I will check it out

@markmghali
Author

I am having a heck of a time getting this working

I added pci=noaer pcie_aspm=off to the Unraid OS section of my boot configuration. It seemed to be working better, but after about an hour or so the whole server just stops responding.

So now it works for a bit, but then my whole server stops responding. I cannot reach it over SSH or the web GUI, nothing. I have to hard reboot it by holding the power button. I also don't think I can see the logs, because the reboot means I lose the syslog.

I thought I was on the right path but I guess not.
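
When testing kernel parameters like the ones above, it can help to confirm they actually reached the running kernel. The following is a minimal Python sketch that checks /proc/cmdline for them. The function name is hypothetical, and the exact-token match is a simplification: it would not recognize a combined form such as `pci=noaer,nomsi`.

```python
def missing_kernel_params(cmdline, wanted=("pci=noaer", "pcie_aspm=off")):
    """Return the wanted parameters that do not appear as whole tokens
    in the given kernel command line string."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    with open("/proc/cmdline") as f:
        missing = missing_kernel_params(f.read())
    if missing:
        print("not active:", " ".join(missing))
    else:
        print("all requested parameters are active")
```

If a parameter is reported as not active after a reboot, the boot entry that was edited is probably not the one that was booted.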

@magic-blue-smoke
Owner

@markmghali @tehniemer

Thanks for the feedback and diagnostic info. I'm really interested in investigating the cause of these issues to see whether there's a manufacturing flaw or a particular incompatibility issue.

Could you please contact me using the form at the bottom of the page here, with your order number?

@markmghali
Author

@magic-blue-smoke ok I have reached out via the contact form.

Thank you

@tehniemer

> @magic-blue-smoke ok I have reached out via the contact form.
>
> Thank you

Likewise.

@tehniemer

I was able to get both working again by removing the associated PCI devices and rescanning, however, this fix does not survive a reboot.

root@nvr:~# lspci -PP
00:00.0 Host bridge: Intel Corporation Device 3e0f (rev 08)
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT1 [UHD Graphics 610]
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:14.5 SD Host controller: Intel Corporation Device a375 (rev 10)
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #5 (rev f0)
00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
00:1f.0 ISA bridge: Intel Corporation H370 Chipset LPC/eSPI Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1c.0/01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
00:1c.7/02:00.0 PCI bridge: ASMedia Technology Inc. Device 1182
00:1c.7/02:00.0/03:03.0 PCI bridge: ASMedia Technology Inc. Device 1182
00:1c.7/02:00.0/03:07.0 PCI bridge: ASMedia Technology Inc. Device 1182
00:1c.7/02:00.0/03:03.0/04:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
00:1c.7/02:00.0/03:07.0/05:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
00:1d.0/06:00.0 Non-Volatile memory controller: Shenzhen Longsys Electronics Co., Ltd. Device 1d97 (rev 01)
echo 1 >/sys/bus/pci/devices/0000:00:1c.7/remove
echo 1 >/sys/bus/pci/rescan
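
Because the fix above does not survive a reboot, one option is to script it and run it at startup. The sketch below is a hedged illustration in Python, not something taken from this thread: it parses `lspci -PP` output like the listing above to find the root port that has a Coral Edge TPU behind it, then performs the same two sysfs writes. The function names are hypothetical, the PCI domain is assumed to be 0000, and the writes need root.

```python
import subprocess
from pathlib import Path


def coral_root_ports(lspci_pp_output):
    """Return root-port addresses (e.g. '00:1c.7') that have a Coral Edge TPU
    somewhere below them, given the text output of `lspci -PP`."""
    ports = []
    for line in lspci_pp_output.splitlines():
        if "Coral Edge TPU" not in line:
            continue
        path = line.split(" ", 1)[0]   # e.g. 00:1c.7/02:00.0/03:03.0/04:00.0
        root = path.split("/")[0]      # the first hop is the root port
        if root not in ports:
            ports.append(root)
    return ports


def remove_and_rescan(root_port, sysfs="/sys/bus/pci", domain="0000"):
    """Equivalent of the two echo commands above. Must be run as root."""
    Path(sysfs, "devices", f"{domain}:{root_port}", "remove").write_text("1")
    Path(sysfs, "rescan").write_text("1")


if __name__ == "__main__":
    output = subprocess.run(["lspci", "-PP"], capture_output=True,
                            text=True, check=True).stdout
    for port in coral_root_ports(output):
        print("removing and rescanning below", port)
        remove_and_rescan(port)
```

Note that removing a root port also removes everything else behind it, so this is only safe when the adapter is the only device on that port, as in the listing above.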

@tehniemer

I figured out that the problem I was having wasn't really a problem after all. It turns out the host machine can't use a PCI device that has been passed into a Docker container. Once I realized that, I determined that everything is working properly.

@nmajin

nmajin commented Aug 31, 2022

What particular machine do you have? I was hoping to use this adapter with a Synology and pass the PCI coral to my docker container.

@magic-blue-smoke
Owner

> What particular machine do you have? I was hoping to use this adapter with a Synology and pass the PCI coral to my docker container.

@nmajin I'll try to explain in other words what @tehniemer meant.

When using VMs, the guest doesn't have direct access to the hardware of your PC. Instead, the VM environment emulates a network card, drives, a video adapter, and other hardware. A Coral TPU can't be emulated and needs PCIe passthrough, a mechanism that "pulls out" a particular PCIe device from the host PC and gives a VM exclusive access to it.

Now, if I understand correctly, the adapter made both Coral TPUs available for use. However, one TPU was configured with PCIe passthrough to a VM, while the other was not and remained available in the host system. This is expected behavior and means that the TPUs can be used in a number of combinations:

  • both TPUs can be assigned to the same VM
  • first TPU can be assigned to one VM and second to another
  • first TPU can be assigned to VM and second can be used on host system

@nmajin

nmajin commented Aug 31, 2022

@magic-blue-smoke thanks for the detail and providing more context.

So, to clarify: is passing both TPUs through (to a Docker container) possible with this adapter and the Dual Edge TPU Coral? Sorry, I just want to confirm that I can in fact use both TPUs if and when I get the Coral and the adapter.

@tehniemer

In my configuration I have both TPUs passed through to a docker container.
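
The thread does not show how that container is configured, so the following is only a hedged sketch of one common approach: mapping each /dev/apex_N node into the container with Docker's `--device` flag. The Python helper below is hypothetical. It just builds the flag list so the same device names are not typed by hand.

```python
from pathlib import Path


def apex_device_flags(names):
    """Build `docker run` --device arguments that map each named apex node
    to the same path inside the container."""
    flags = []
    for name in names:
        flags += ["--device", f"/dev/{name}:/dev/{name}"]
    return flags


if __name__ == "__main__":
    # Discover the apex nodes present on this host (none if the driver is not loaded).
    found = sorted(p.name for p in Path("/dev").glob("apex_*"))
    print(" ".join(apex_device_flags(found)))
```

With both TPUs visible on the host, this prints one `--device` pair per TPU, which can be pasted into a `docker run` command or translated into the equivalent container template fields.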

@NickM-27

NickM-27 commented Sep 6, 2022

Just to add an additional anecdote: I run Frigate with this dual-TPU adapter in my Unraid server with both TPUs passed in and have not had any such issues; it has been running for about 3 months now. I made sure to disable all C-states for my CPU in the BIOS, which is something I've always had to do to ensure stability with Unraid.

@markmghali
Author

@magic-blue-smoke you stated you were making another revision of this adapter? I am tempted to buy it and try again, though I feel like I will have the same issues as I did before.


5 participants