
RANUM

RANUM is a tool for detecting, testing, and fixing numerical defects in DNN architectures.

If you find the tool useful, please consider citing our accompanying paper at ICSE 2023:

@inproceedings{li2023reliability,
    author    = {Linyi Li and Yuhao Zhang and Luyao Ren and Yingfei Xiong and Tao Xie},
    title     = {Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects},
    booktitle = {45th International Conference on Software Engineering, {ICSE} 2023, Melbourne, Australia, 14-20 May 2023},
    publisher = {{IEEE/ACM}},
    year      = {2023},
}

Installation

First, download the missing benchmark architectures and running logs from https://doi.org/10.6084/m9.figshare.21973529.v1:

  • Following the link, you will find two zip files.

  • Download model_zoo.zip and unzip it in the project root folder. You will get a new folder model_zoo/.

  • Download results.zip and unzip it. You will find two folders: results/ and results_digest/. Move both folders to the project root (see the unpacking sketch after this list).
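
For reference, a minimal unpacking sketch (assuming both zip files were downloaded to the project root; the unzip logic is ours, not part of RANUM):

import zipfile

# Unpack the two figshare archives into the project root.
for name in ("model_zoo.zip", "results.zip"):
    with zipfile.ZipFile(name) as zf:
        zf.extractall(".")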

The tool is expected to run on a Linux + Python + PyTorch platform and needs around 500 GB of storage space. Reference installation commands:

apt-get install -y python3.6 python3-pip cmake
pip3 install --upgrade pip
pip3 install -r requirements.txt

GPU support only requires minor code changes (calling tensor.cuda() in the necessary places). With a GPU and these minor changes, several-fold speedups are expected, but we have not tested this yet.
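
For illustration, the kind of minor change meant here is a device move like the following (a minimal PyTorch sketch; the variable names are ours, not RANUM's):

import torch

# Pick the GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(4, 4)
x = x.to(device)  # on a GPU host this is equivalent to x.cuda()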

For the following commands, the working directory is the root directory of this project. All new results are written to the results/ folder.

Result Reproduction

This code base contains (1) the official implementation of the RANUM framework for assuring the numerical reliability of deep neural networks; (2) the reference running logs; and (3) detailed case-level results for the empirical study.

How to replicate and reproduce the results?

(1) From the official RANUM framework, users can reproduce all technical experimental results in the RANUM paper.

Hardware requirements: To replicate the results from scratch, we need a CPU server with around 500 GB of storage space to save all raw experimental data.

Software requirements: A Linux environment, Python, and PyTorch are required. We also provide a Dockerfile, so you can run these commands to get results from scratch.

docker build -t ranum:v1 .
docker run -it ranum:v1
sh runall.sh

Then, all tables in the experimental section can be found in /srv/results/results.txt. It may take 4-5 days to finish running.

(2) From the reference running logs, users can reproduce all technical experimental results in the RANUM paper without running from scratch.

Hardware requirements: A CPU server without extra storage space suffices, since no raw experimental data need to be regenerated.

Software requirements: A Linux environment, Python, and PyTorch are required.

docker build -t ranum:v1 .
docker run -it ranum:v1
ipython evaluate/texify/final_texify.py --result_folder results_digest

All results will be printed to the console within seconds.

How to parse the output results?

The results in /srv/results/results.txt and the results printed to the console after running final_texify.py share the same format.

  1. ==================== detection ==================== precedes the detailed results for static defect detection. This table is no longer shown in the paper, but it answers RQ1, and its statistics (total detection time and average detection time per case) are reported in the paper.

  2. ==================== Failure-Exhibiting Unit Test ==================== precedes the detailed results for unit test generation (Table 1 in the paper).

  3. ==================== Failure-Exhibiting System Test ==================== precedes the detailed results for system test generation (Table 2 in the paper). The statistics at the end are the number of successful cases and the total time used, which are reported in the paper for RQ2.

  4. ==================== Precondition-Fix ==================== precedes the overview results for precondition-fix suggestion (Table 3). These statistics are used to answer RQ3. In addition, we report the number of iterations taken to solve the optimization problem; these statistics are not reported in the paper and are mainly for diagnosis. (See the parsing sketch after this list.)
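
For reference, a minimal sketch for splitting these results into named sections (the file path is illustrative; point it at /srv/results/results.txt or wherever your copy lives):

import re

with open("results/results.txt") as f:  # illustrative path
    text = f.read()

# Split on the "==================== <name> ====================" headers;
# re.split with a capture group keeps the section names in the output.
parts = re.split(r"====================\s*(.+?)\s*====================", text)
sections = dict(zip(parts[1::2], parts[2::2]))
print(list(sections))  # e.g., ['detection', 'Failure-Exhibiting Unit Test', ...]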

(3) Detailed case-level logs for the empirical study

The detailed empirical study log (along with the repository URLs) is in empirical_study/precond/study_detail.txt, and the statistics are in empirical_study/precond/study_statistics.csv. The human-friendly precondition fixes generated by RANUM are in empirical_study/precond, where all means preconditions can be imposed on both initial input nodes and weight nodes; input means preconditions can be imposed only on initial input nodes; weight means preconditions can be imposed only on weight nodes; and immediate means preconditions can be imposed on the immediately vulnerable operators.

The list of DNN operator types supported by DEBAR is in empirical_study/debar_abstraction_list.txt. The list of DNN operator types supported by RANUM can be counted from interp/interp_operator.py (the interp_* methods of the Interpreter class).
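
For reference, one way to count the RANUM-supported operator types (a sketch that assumes the project root is on PYTHONPATH; module, class, and prefix are as stated above):

# Count the interp_* methods of the Interpreter class.
from interp.interp_operator import Interpreter

ops = sorted(m for m in dir(Interpreter) if m.startswith("interp_"))
print(len(ops), "RANUM-supported operator types")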

Commands for Individual Tasks

Specifically, we list the individual commands for each stage below.

0. Static Defect Detection

RANUM

ipython evaluate/bug_verifier.py

The running results of DEBAR are taken from the GRIST paper (Yan et al.) and the DEBAR repository (Zhang et al.).

1. Unit Test Generation

RANUM (generates failure-inducing intervals)

ipython evaluate/robust_inducing_inst_generator.py

Random (generates 1000 distinct unit tests from these intervals)

ipython experiments/unittest/err_trigger.py random
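
Conceptually, this step samples concrete inputs from the failure-inducing intervals. A toy illustration (not RANUM's actual sampling code; the bounds and input shape are made up):

import numpy as np

# Draw 1000 unit tests uniformly from a hypothetical failure-inducing
# interval [lo, hi] for an 8-dimensional input.
rng = np.random.default_rng(0)
lo, hi = 0.0, 1e-6
tests = rng.uniform(lo, hi, size=(1000, 8))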

2. System Test Generation

System test generation relies on the generated unit tests, so please run unit test generation first.

RANUM

ipython evaluate/train_inst_generator.py ranum

Random

ipython evaluate/train_inst_generator.py random

RANUM for inference + Random for system

ipython evaluate/train_inst_generator.py ranum_p_random

Random for inference + RANUM for system

ipython evaluate/train_inst_generator.py random_p_ranum
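
To run all four configurations in sequence, a small driver sketch (it simply replays the commands listed above):

import subprocess

# Run each system test generation mode in turn.
for mode in ["ranum", "random", "ranum_p_random", "random_p_ranum"]:
    subprocess.run(["ipython", "evaluate/train_inst_generator.py", mode], check=True)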

3. Precondition-Fix Generation

a. Precondition on Weight + Input Nodes

RANUM

ipython evaluate/precond_generator.py ranum all

RANUM Expand (RANUM-E)

ipython evaluate/precond_generator.py ranumexpand all

Gradient Descent

ipython evaluate/precond_generator.py gd all

b. Precondition on Weight Nodes

RANUM

ipython evaluate/precond_generator.py ranum weight

RANUM Expand (RANUM-E)

ipython evaluate/precond_generator.py ranumexpand weight

Gradient Descent

ipython evaluate/precond_generator.py gd weight

c. Precondition on Input Nodes

RANUM

ipython evaluate/precond_generator.py ranum input

RANUM Expand (RANUM-E)

ipython evaluate/precond_generator.py ranumexpand input

Gradient Descent

ipython evaluate/precond_generator.py gd input

d. Precondition on Immediately Vulnerable Operators

ipython evaluate/precond_generator_immediate.py
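
To sweep every precondition-fix setting in one go, a small driver sketch (it replays the commands from a-d above):

import subprocess

# Methods x node scopes from a-c, then the immediate-operator variant (d).
for method in ["ranum", "ranumexpand", "gd"]:
    for scope in ["all", "weight", "input"]:
        subprocess.run(["ipython", "evaluate/precond_generator.py", method, scope], check=True)
subprocess.run(["ipython", "evaluate/precond_generator_immediate.py"], check=True)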

Other Resources

  • Folder model_zoo/grist_protobufs_onnx includes ONNX-format DNN architecture files for the 79 real-world defect programs in GRIST. These files extract the core DNN architectures from the original programs. We hope they help the evaluation and future development of defect-fixing methods by disentangling the methods from concrete implementation code and boilerplate.

  • Folder results/GRIST_log contains our running log for the GRIST tool on the GRIST benchmark. We hope it provides detailed information and a cross-validation source for this widely known baseline.
