Skip to content

implementation of GRAIL: a simple indexing scheme, for fast and scalable reachability testing in very large graphs

Notifications You must be signed in to change notification settings

RocinanteExp/GRAIL

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Project: GRAIL

Group # 27

students: Giovanni Tangredi (s276086), Francesco Xia (s276509)

Suggested action for compiling and executing the project*

* this project has been tested on a Linux machine.

To compile and run the program:

make
./bin/main <graph_file_path> <num_intervals> <query_file_path> [-l] [-q]

OPTIONS
	-l     saves generated labels to "test/output/labels_out.txt"
	-q     saves query results to "test/output/queries_out.txt"

To run the test suites:

make test_node
./bin/test

The other rules available rules are: test_label, test_graph, test_query, test_bitmap

NOTE: the TEST flag must be set to 0 when the program is executed in normal mode. Otherwise, the labels are not generated randomly. Set TEST to 1 when running the test suites.

Project Stucture

SDPProjectQ2
├── README.md
├── bin
├── gen_query.c
├── include
│   ├── bitmap.h
│   ├── constants.h
│   ├── graph.h
│   ├── label.h
│   ├── menu.h
│   ├── query.h
│   └── time_tracker.h
├── makefile
├── src
│   ├── bitmap.c
│   ├── graph.c
│   ├── label.c
│   ├── main.c
│   ├── menu.c
│   ├── query.c
│   └── time_tracker.c
└── test
    ├── input
    │   ├── grafo20.gra
    │   ├── grafo20.png
    │   ├── grafo20_25.que
    │   └── grafo_con_rango.png
    │ 
    │       
    ├── output
    │   ├── labels_out.txt
    │   └── queries_out.txt
    └── tests
        ├── bitmap_test.c
        ├── graph_test.c
        ├── label_test.c
        ├── node_test.c
        └── query_test.c

This repository has been divided as follows:

  • the bin folder contains the executables.
  • the src folder contains the source files.
  • the test folder is further divided in:
    • the input folder contains an example of a graph and a set of queries.
    • the output folder contains the output generated by the program (i.e., the labels, the query results, and copy of the graph).
    • the tests folder contains the test source files.
  • the include folder contains the header files.

The user can customize the program's behavior by modifying the values of the flags found in constant.h:

  • DEBUG: verbose mode.
  • TEST: if set to 1, the labels are not randomly generated.
  • MAX_THREADS_GRAPH: number of running threads used during the graph generation phase. MUST be > 0.
  • MAX_THREADS_QUERY: number of running threads used to solve the queries. MUST be > 0.
  • ALL_NODES: prints the graph nodes on standard output.

The tests were created using the Check.h framework.

Program output

If the option -q is passed, the program creates a file called queries_out.txt, where each line represents the result of a query: source_id destination_id {0/1} (unreachable/reachable).

If the option -q is missing, the user can interactively query the reachability of two nodes.

Similarly passing the option -l, the program creates a file named labels_out.txt, with the following format: node_id: [l1_l, l1_r] [l2_l, l2_r] ...

Data Structures

The graph is represented using a data structure similar to an adjacency list. Specifically, we used a vector of Node pointers to depict the graph, where each node can be indexed through its node_id. Furthermore, each node stores:

  • a list of its outgoing edges.
  • a vector of intervals of size num_intervals

The queries are stored in a static struct called queries in query.c. Specifically:

  • a single query is represented as a route (route.src and route.dst);
  • The result of a particular query is stored in a fixed position of a bitmap, whose position is determined by the location of the query inside the file query_file_path. Each bit indicates either non-reachability/reachability between two nodes (0/1). The i-th bit of the bitmap represents the state of i-th query (i.e., the query found at i-th lines of the file <query_file>);

Strategy

The generation of the graph follows these main ideas:

  • the generation is done concurrently by running MAX_THREADS_GRAPH number of threads
  • at each iteration, each thread
    • reads x lines
    • increment shared counter of read lines (see p_curr_iteration)
    • parse those lines (e.g. find the node_id of the current node and of its children)
    • allocates a single chunk of memory, big enough to store x Nodes (see node_create_multiple)
    • save each created Node at graph->nodes[<node_id>] = node;

repeat until end of file.

The intervals are generated concurrently by running n threads, where n is equal to the parameter num_intervals. The interval generated by i-th thread is saved in the corresponding Node.intervals[i-th];

The label generator is an implementation of the algorithm 1 found in [1].

Regarding the queries, they are equally divided in n blocks, where n is empirically set to 4 (see constants.h). Each block is then assigned to a thread, which task is to find the non-reachability/reachability of every query in its block. The query solver is an implementation of algorithm 2 found in [1].

References:

[1]: Yıldırım, H., Chaoji, V. & Zaki, M.J. GRAIL: a scalable index for reachability queries in very large graphs. The VLDB Journal 21, 509–534 (2012). https://doi.org/10.1007/s00778-011-0256-4

About

implementation of GRAIL: a simple indexing scheme, for fast and scalable reachability testing in very large graphs

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C 97.9%
  • Makefile 2.1%