This tutorial demonstrates how to build a custom system that utilizes the DPU v2.0 (v1.4.0 architecture) of the Xilinx® Deep Learning Processor (DPU) IP to accelerate machine learning algorithms using the following development flow:
- Build the hardware platform in the Vivado® Design Suite.
- Generate the Linux platform in PetaLinux.
- Use Xilinx SDK to build two machine learning applications that take advantage of the DPU.
Note:
- The Ultra96 is the targeted hardware platform. The DPU IP and Yocto recipes are based on the ZCU102 DPU TRD v2.0, which can be downloaded here.
- This tutorial uses the DPU B1152. Pre-built model .elfs are also provided for the B2304.
This section lists the software and hardware tools required to use the Xilinx® Deep Learning Processor (DPU) IP to accelerate machine learning algorithms.
- Vivado® Design Suite 2018.2
- Board files for Ultra96 v1 should be installed
- Xilinx SDK 2018.2
- PetaLinux 2018.2
Note: This tutorial is known to work with Vivado/PetaLinux/SDK 2018.3, but 2018.2 will provide the best experience at this time. To use it with 2018.3, you will need to make the following changes:
- Edit the u96_dpuv2.0_2018.2.tcl script to specify 2018.3.
- Change petalinux-image.bbappend to petalinux-image-full.bbappend.
- The Ultra96 board
- 12V power supply for Ultra96
- MicroUSB to USB-A cable
- AES-ACC-USB-JTAG board
- A blank, FAT32 formatted microSD card
- DisplayPort monitor (Optional)
- Mini DisplayPort cable suitable for the chosen monitor (Optional)
- USB Webcam (Optional)
Download and extract the full tutorial archive from this repository and move the DPU Integration/reference-files sub-directory to your working area. Rename this directory to "dpu_integration_lab". You should end up with a directory structure as shown in the following figure:
The folders are:
- files: PetaLinux/Yocto recipes, source code for SDK, etc.
- hsi: Directory for handing off .hdf files from the Vivado Design Suite to PetaLinux
- ip_repo: Repository for the DPU IP
- prebuilts: Includes a pre-built .hdf file exported from the Vivado Design Suite, and a complete set of files to boot from the SD card and run applications
- sdk_workspace: Empty Eclipse workspace to be used for Xilinx SDK application development
- vivado: The Vivado Design Suite working directory, which includes an archived project for Ultra96 as well as a .tcl script to create a working .bd
- sdcard: Staging area for creating the SD card image
From here, the location of the root lab directory will be referred to as <PROJ ROOT>.
TIP: There is a file called commands.txt in the files directory that has most of the commands required for the lab. Copy and paste commands from this file to save time.
The high-level tool flow is shown in the following figure:
- Create a new project for the Ultra96.
- Add the DPU IP to the project.
- Use a .tcl script to hook up the block design in the IP integrator.
- Examine the DPU configuration and connections.
- Generate the bitstream.
- Export the .hdf file.

- Create a new PetaLinux project with the "Template Flow."
- Add some new Yocto recipes and recipe modifications.
- Import the .hdf file from the Vivado Design Suite.
- Configure some Ultra96-specific hardware options.
- Add some necessary packages to the root filesystem.
- Update the device-tree to add the DPU.
- Build the project.
- Create a boot image.

- Create new application projects for resnet50 and face detection.
- Import the application source code and model .elfs generated by dnnc.
- Update the application settings to point to sysroot, include needed libraries, etc.
- Build the applications.
cd into the Vivado directory and launch Vivado.
cd <PROJ ROOT>/vivado/
vivado
- Create a new project based on the Ultra96 board files:
  - Project Name: project_1
  - Project Location: <PROJ ROOT>/vivado
  - Do not specify sources
  - Select Ultra96v1 Evaluation Platform
  Note: Make sure you select the v1 option. The U96v1 board files are not a part of the standard Vivado installation. They must be installed separately. It is assumed that this step is already completed.
- Click Finish.
- Click IP Catalog in the Project Manager.
- Right-click Vivado Repository and select Add Repository.
- Select /ip_repo
**Note:** You should see a message indicating that one repository and one IP is added.
- Open the Tcl Console tab, cd to the <PROJ ROOT>/vivado directory, and source the .tcl script that has been provided to create the IP integrator block design for you:
source u96_dpuv2.0_2018.2.tcl
- When the block design is complete, right-click design_1 in the Sources tab and select Create HDL Wrapper.
- Accept the default options.
- Analyze the components and connections in the block design before continuing.
Note:
- When using the B2304 in the ZU3EG device found on the Ultra96, you must set the DSP Usage to Low in the DPU configuration GUI.
- When setting the RAM Usage to Low, the DPU is compatible with DNNC v1.4.0. When setting the RAM usage to High, DNNC v1.4.0.1 must be used to generate compatible instructions.
To save time, we can skip building the Vivado project and instead copy a pre-built .hdf file to the directory where the PetaLinux flow expects it. To use the pre-built option, execute the following commands to copy the pre-built .hdf into the project:
cd <PROJ ROOT>
cp prebuilts/design_1_wrapper.hdf hsi
You can now skip to the PetaLinux section.
- Click Generate Bitstream.
- Accept the defaults.
Note: This step will take about 45 minutes, depending on the machine.
When the bitstream generation process is complete, do the following steps to export the .hdf for use by PetaLinux:
- Click File > Export > Export Hardware.
- Make sure to include the bitstream.
- Export the hardware platform to <PROJ ROOT>/hsi.
- Click OK.
You can begin the PetaLinux flow once the hardware definition file (.hdf) has been exported from the Vivado® Design Suite. At this point, you should have exported the .hdf to the <PROJ ROOT>/hsi directory.
Tip: To speed up text entry, use the commands.txt file from <PROJ ROOT>/files to copy and paste most of the commands. It is highly recommended that you copy and paste the commands to avoid command-line errors.
Use the following commands to create a new PetaLinux project based on the Zynq® UltraScale+ template in a new directory named petalinux. This project is not based on an existing BSP.
source /opt/xilinx/petalinux/2018.2/settings.sh
cd <PROJ ROOT>
petalinux-create -t project -n petalinux --template zynqMP
cd petalinux
In this step, you will add or edit some Yocto recipes to customize the kernel and rootfs and add the dnndk files.
Note: Make sure to cd into the PetaLinux directory first.
- Add a recipe to add the DPU utilities, libraries, and header files into the root file system.
cp -rp ../files/recipes-apps/dnndk/ project-spec/meta-user/recipes-apps/
- Add a recipe to build the DPU driver kernel module.
cp -rp ../files/recipes-modules project-spec/meta-user
- Add a recipe to create hooks for adding an “autostart” script to run automatically during Linux init.
cp -rp ../files/recipes-apps/autostart project-spec/meta-user/recipes-apps/
- Add a bbappend for the base-files recipe to do various things like auto insert the DPU driver, auto mount the SD card, modify the PATH, etc.
cp -rp ../files/recipes-core/base-files/ project-spec/meta-user/recipes-core/
vi project-spec/meta-user/recipes-core/images/petalinux-image.bbappend
Add the following lines:
IMAGE_INSTALL_append = " dnndk"
IMAGE_INSTALL_append = " autostart"
IMAGE_INSTALL_append = " dpu"
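If you prefer to script this edit instead of using vi, the following is a minimal sketch. It assumes you run it from <PROJ ROOT>/petalinux and that the bbappend file already exists at the path used above. The helper name append_once and the BBAPPEND override variable are hypothetical and are not part of PetaLinux.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line is already there.
# Hypothetical helper for illustration; not a PetaLinux command.
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Assumed location, relative to <PROJ ROOT>/petalinux; override with BBAPPEND if needed.
bbappend=${BBAPPEND:-project-spec/meta-user/recipes-core/images/petalinux-image.bbappend}

if [ -f "$bbappend" ]; then
    # The three packages named in this tutorial.
    for pkg in dnndk autostart dpu; do
        append_once "$bbappend" "IMAGE_INSTALL_append = \" $pkg\""
    done
else
    echo "bbappend not found: $bbappend"
fi
```

Because each line is appended only when it is missing, the script can be re-run without creating duplicate entries.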
- Use the following command to open the top-level PetaLinux project configuration GUI:
petalinux-config --get-hw-description=../hsi
- Change the serial port to psu_uart_1.
Subsystem AUTO Hardware Settings -> Serial Settings -> Primary stdin/stdout = psu_uart_1
Note: The UART that connects to the USB JTAG/UART board is psu_uart_1.
- Select Ultra96 Machine.
DTG Settings -> MACHINE_NAME = zcu100-revc
Note: The Ultra96 was originally called zcu100.
Tip: Use backspace to delete the default text, then add zcu100-revc.
By doing this, the build system uses the Ultra96-specific device-tree files.
- Exit and save the changes. This step will take a few minutes.
Use the following command to open the PetaLinux root filesystem configuration GUI:
petalinux-config -c rootfs
- Enable each item listed below:
Note: Do not enable the dev or dbg packages.
Petalinux Package Groups ->
- matchbox
- opencv
- v4lutils
- x11
Apps ->
- autostart
Filesystem Packages ->
- libs->libmali-xlnx->libmali-xlnx
Modules ->
- dpu
User Packages ->
- dnndk
- Exit and save the changes.
At this time, the DPU is not supported by the device-tree generator. Therefore, we need to manually add a device-tree node for the DPU, based on our hardware settings.
At the bottom of project-spec/meta-user/recipes-bsp/device-tree/files/system-user.dtsi, add the following text:
Tip: You can copy and paste the amba node from <PROJ ROOT>/files/dpu.dtsi.
| PS Interface | GIC IRQ # | Linux IRQ # |
|---|---|---|
| PL_PS_IRQ1[7:0] | 143:136 | 111:104 |
| PL_PS_IRQ0[7:0] | 128:121 | 96:89 |

In the device tree, each interrupt 3-tuple is defined as follows:

| Interrupt | Description |
|---|---|
| 1st Cell | 0 = Shared Peripheral Interrupt (SPI), 1 = Private Peripheral Interrupt (PPI) |
| 2nd Cell | Linux interrupt number |
| 3rd Cell | 1 = rising edge, 2 = falling edge, 4 = level high, 8 = level low |
If the DPU IP is configured to use more than one core, you will need multiple sets of interrupts, and the core-num parameter should be updated accordingly. For example, if you have three cores, interrupts and core-num should be set to the following values, assuming the interrupts are connected to PL_PS_IRQ0[2:0]:
interrupts = <0x0 0x59 0x4 0x0 0x5a 0x4 0x0 0x5b 0x4 >;
core-num = <0x3>;
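In the interrupt table above, each Linux IRQ number equals the GIC IRQ number minus 32 (for example, 121 - 32 = 89 = 0x59). The sketch below uses that relationship to print the two properties for a given number of cores. It assumes the cores are wired to consecutive PL_PS_IRQ0 lines starting at bit 0 and use the level-high trigger (0x4), as in the example above. The function name dpu_irq_props is hypothetical.

```shell
#!/bin/sh
# dpu_irq_props CORES [BASE_GIC_IRQ]
# Print device-tree "interrupts" and "core-num" properties for a DPU with CORES cores.
# Assumptions (not from any tool): consecutive IRQ lines, SPI type (0x0),
# level-high trigger (0x4), and PL_PS_IRQ0[0] at GIC IRQ 121 by default.
dpu_irq_props() {
    cores=$1
    base_gic=${2:-121}
    cells=""
    i=0
    while [ "$i" -lt "$cores" ]; do
        # Linux IRQ number = GIC IRQ number - 32
        linux_irq=$((base_gic - 32 + i))
        cells="${cells}$(printf '0x0 0x%x 0x4 ' "$linux_irq")"
        i=$((i + 1))
    done
    printf 'interrupts = <%s>;\n' "$cells"
    printf 'core-num = <0x%x>;\n' "$cores"
}

dpu_irq_props 3
# prints:
# interrupts = <0x0 0x59 0x4 0x0 0x5a 0x4 0x0 0x5b 0x4 >;
# core-num = <0x3>;
```

For cores connected to PL_PS_IRQ1 instead, pass 136 as the second argument. Compare the output with your block design before pasting it into system-user.dtsi.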
petalinux-build
cd images/linux
petalinux-package --boot --fsbl zynqmp_fsbl.elf --u-boot u-boot.elf \
    --pmufw pmufw.elf --fpga system.bit --force
The sysroot is required to build applications against the libraries/header files that are provided by some of the packages that are built into the root file system.
Running through the full process to rebuild the SDK can take over an hour to complete. Therefore, a pre-built SDK has been provided with the tutorial files.
To get the pre-built SDK file, download and extract the zip file from this link, then copy the sdk.sh file to ../files.
To install the pre-built SDK, use the following command:
cd <PROJ ROOT>/petalinux
petalinux-package --sysroot -s ../files/sdk.sh
If you want to go through the full process to rebuild the SDK, use the following steps:
- Run the following command to build a Yocto SDK and copy it to <PROJ ROOT>/petalinux/images/linux/sdk.sh:
petalinux-build --sdk
- Run the following command to extract and install the generated SDK and sysroot into the specified directory:
petalinux-package --sysroot -d <directory>
Note: If you do not specify the directory (-d), the SDK will be installed at images/linux/sdk.
Use the following steps to build two machine learning applications that take advantage of the DPU, using the Xilinx® SDK:
Run the following command to launch the Xilinx SDK GUI:
xsdk
When the GUI opens, browse to the empty workspace at <PROJ ROOT>/sdk_workspace.
Use the following steps to create a new application project:
- Click File and select New Application Project.
- Enter the parameters as follows:
  - Name: resnet50
  - OS Platform: Linux
  - Processor Type: psu_cortexa53
  - Language: C++
- Click Next.
- Select Empty Application.
- Click Finish.
Use the following steps to import the source files and model .elf files:
- Click File and select Import -> General -> Filesystem.
- Browse to <PROJ ROOT>/files/resnet50.
- Click OK.
- Select main.cc. (NEED A NEW main.cc with only one kernel since average pooling is done on DPU)
- Check that Into Folder is set to resnet50/src.
- Click Finish, and allow it to overwrite main.cc.
- Follow the same steps to import the DPU model .elf, dpu_resnet50_0.elf.
Note: You can use the pre-built models from <PROJ ROOT>/files/resnet50/B1152_1.4.0, if you do not have your own.
Use the following steps to update the application build settings:
- Right-click on the resnet50 application and select C/C++ Build Settings.
- In C/C++ Build -> Environment, add SYSROOT and point it to the sysroots location. For example:
${workspace_loc}/../petalinux/images/linux/sdk/sysroots/aarch64-xilinx-linux
- Point the compiler and the linker to SYSROOT:
- In the g++ linker libraries tab, add the following libraries:
- In g++ linker -> Miscellaneous, add the model .elfs to Other Objects.
- Add dpu_resnet50_0.elf from the resnet50/src directory.
Note: You can click Workspace to browse to the objects you want, as shown in the following figure (ignore the second .elf for this version of the tutorial):
**Note:** This will cause the `.elfs` to be statically linked to the application. It is also possible to dynamically link these objects at runtime (not covered in this guide).
- Click OK.
- Right-click on the resnet50 application and select Build Project.
Use the following steps to build the face detection application:
- Repeat Step 3, substeps 2 through 5 above.
- Add the source file /files/face_detection/face_detection.cc.
- Delete main.cc from the project.
- Add dpu_densebox.elf from <PROJ ROOT>/files/face_detection/B1152_1.4.0, if you do not have your own.
- Set the SYSROOT Environment Variable to the proper value.
- Point to SYSROOT in compiler and linker miscellaneous settings.
- Add the following libraries:
  - n2cube
  - dputils
  - opencv_core
  - opencv_imgcodecs
  - opencv_highgui
  - opencv_imgproc
  - opencv_videoio
  - pthread
- For the g++ Linker Miscellaneous Other Objects, select face_detection/src/dpu_densebox.elf.
- Click OK.
- Right-click on the face_detection application and select Build Project.
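If the link step fails with missing-library errors, it can help to confirm that the sysroot actually contains the shared libraries listed above. The following is a minimal sketch, assuming the libraries are installed somewhere under the sysroot as lib<name>.so with an optional version suffix. The function name missing_libs is hypothetical, and the SYSROOT shell variable is assumed to hold the same path you entered in the SDK build settings.

```shell
#!/bin/sh
# missing_libs SYSROOT NAME...
# Print each NAME for which no lib<NAME>.so* file exists anywhere under SYSROOT.
# Hypothetical helper for illustration; it only checks file names, not linkability.
missing_libs() {
    sysroot=$1
    shift
    for name in "$@"; do
        if [ -z "$(find "$sysroot" -name "lib${name}.so*" 2>/dev/null | head -n 1)" ]; then
            echo "$name"
        fi
    done
}

# Only runs when SYSROOT is set in the calling shell (an assumption of this sketch).
if [ -n "${SYSROOT:-}" ]; then
    missing_libs "$SYSROOT" n2cube dputils opencv_core opencv_imgcodecs \
        opencv_highgui opencv_imgproc opencv_videoio pthread
fi
```

An empty result means every listed library was found. Any name that is printed points to a package that was probably not enabled in the root filesystem configuration.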
Use the following steps to set up Ultra96:
- Connect a proper 12V power supply.
- Connect the AES-ACC-USB-JTAG board.
- Connect the Camera Mezzanine board to the Ultra96 (Optional)
- Connect a microUSB cable between the AES-ACC-USB-JTAG and your PC.
- Connect a second microUSB cable from the Ultra96 USB3.0 connector to your PC for networking.
- Connect a DisplayPort monitor using a Mini DisplayPort cable (Optional)
- Connect a USB webcam to one of the host USB ports (Optional)
- Prepare a blank microSD card with a single FAT32 partition.
Next, we’ll gather all the images in an SD card staging area, and then copy them all to the SD card at one time. There is a directory in <PROJ ROOT> called sdcard that already includes the directories for the applications and the test images for resnet50. The test images are located in the /sdcard/common/image500_640_480 directory.
Use the following steps to copy the files to the SD card:
- Copy <PROJ ROOT>/petalinux/images/linux/image.ub and BOOT.BIN to the sdcard directory.
- Copy <PROJ ROOT>/sdk_workspace/resnet50/Debug/resnet50.elf to the sdcard/resnet50 folder.
- Copy <PROJ ROOT>/sdk_workspace/face_detection/Debug/face_detection.elf to the sdcard/face_detection folder.
You can copy and paste the following commands:
cd <PROJ ROOT>
cp petalinux/images/linux/image.ub sdcard
cp petalinux/images/linux/BOOT.BIN sdcard
cp sdk_workspace/resnet50/Debug/resnet50.elf sdcard/resnet50/
cp sdk_workspace/face_detection/Debug/face_detection.elf sdcard/face_detection/
- Using your PC, copy all the files in the sdcard directory to the blank microSD card.
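Before copying to the card, you may want to confirm that the staging area holds the four files gathered in the steps above. The following is a minimal sketch. The function name check_sdcard_staging and the SDCARD_STAGING variable are hypothetical, and the default path assumes you run it from <PROJ ROOT>.

```shell
#!/bin/sh
# check_sdcard_staging DIR
# Report any of the four files staged in this tutorial that are missing from DIR.
# Returns success only when all of them are present. Hypothetical helper.
check_sdcard_staging() {
    staging=$1
    missing=0
    for f in image.ub BOOT.BIN resnet50/resnet50.elf face_detection/face_detection.elf; do
        if [ ! -f "$staging/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Default assumes the current directory is <PROJ ROOT>.
check_sdcard_staging "${SDCARD_STAGING:-sdcard}" || echo "staging area incomplete"
```

The check looks only at file presence. It does not verify that the images were built from the current hardware design.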
Place the microSD card into the Ultra96 and power on the board. Once the board has booted, log in using the following credentials:
- username = root
- password = root
There are two ways to display the results of the face detection application. You can either connect a DisplayPort monitor to the Ultra96, or you can stream the video over the network to a connected PC.
Run the commands below to prepare the display. If you include the autostart.sh on the SD card, this will happen automatically after boot. You'll still need to export the DISPLAY again, however.
v4l2-ctl --set-fmt-video=width=640,height=480,pixelformat=UYVY
export DISPLAY=:0.0
xrandr --output DP-1 --mode 800x600
xset s off -dpms
Note: Use xrandr to find a suitable mode for your monitor. When running at 1920x1080, the screen may flicker due to memory bandwidth issues.
There are two ways to connect to the Ultra96 over the network:
- USB Ethernet adapter
  - Connect a USB Ethernet adapter to one of the USB Host ports on the board, and connect to your local network or directly to your PC.
- RNDIS/Ethernet Gadget:
  - Connect a micro USB cable between the Ultra96 USB3.0 port and the PC with RNDIS support enabled.
  - After boot, issue the following commands to enable the interface:
modprobe g_ether
ifup usb0
These commands are issued automatically if the autostart.sh script is used.
On Windows, connect to the target over the network using an SSH client that provides an X-server, such as MobaXterm. Ensure that X11-forwarding is enabled, and the DISPLAY environment variable is set up correctly. When an application is launched from this shell, the output will be forwarded back to the PC and displayed in a separate window.
For Linux (or the Windows command line), you can use the following command:
ssh -X root@[IP address of Ultra96]
Change to the directory with the resnet50 application and execute the program.
cd /media/card/resnet50
./resnet50.elf
Change to the directory with the face_detection application and execute the program.
cd /media/card/face_detection
./face_detection.elf
Note: If you see “Open camera error!”, try unplugging the USB camera and inserting it again. If it still isn’t recognized, try rebooting with the camera unplugged, then plug in the camera before launching the application. If both of these efforts fail, try a different camera.