Mining GitHub projects to learn about open source software development communities and practices. To view a demo of this project please see


Prerequisites

  1. Xubuntu or some similar Ubuntu variant
  2. Ruby version 1.9.3
  3. Apache2
  4. MySQL version 14.14, distribution 5.5.38
  5. PHP version 5.5.9
  6. Required gems (see the Required Gems file)

Current Development Prerequisites

  1. Java version 1.7.0_51
  2. Ant version 1.9.3
  3. Eclipse Luna
  4. Eclipse ADT plugin. Note: Eclipse ADT with SDK has issues installing the import plug-in.
  5. Eclipse metrics plugin, version 1.3.8
  6. Eclipse Metrics XML Reader
  7. Eclipse Import tool
  8. Maven version 2.2.1, used to create Eclipse project files
  9. Python version 2.7.6, required by Eclipse Metrics XML reader
  10. xvfb-run used for headless execution of metrics collection.
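
The prerequisite lists above can be checked before starting the setup. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and the executable names (for example `mvn` for Maven and `python` for Python 2.7) are assumptions about how the tools are installed on your system.

```ruby
# Hypothetical helper: the executable names below are assumptions about how
# the prerequisites are installed; adjust them to match your system.
REQUIRED_TOOLS = %w[ruby mysql php java ant mvn python xvfb-run].freeze

# Return the full path of +cmd+ if it is an executable file in one of the
# directories listed in +path+, otherwise nil.
def find_executable(cmd, path = ENV["PATH"].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Return the names from +tools+ that could not be found.
def missing_tools(tools = REQUIRED_TOOLS, path = ENV["PATH"].to_s)
  tools.reject { |tool| find_executable(tool, path) }
end
```

Calling `missing_tools` with no arguments returns the tools from the assumed list that are not on the current PATH. It does not check versions.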

Setup notes

Please see the project setup notes for a more detailed explanation of how to set up the project.

Setup Ruby

  1. Install ruby1.9.3

     sudo apt-get install ruby1.9.3
  2. Install the required Gems. An example would be:

     gem install mysql

Attempting to migrate from 1.9.3 to 2.0

This applies at least to scraping, since Ruby 1.9.3 keeps segfaulting there.

  1. Install ruby dev for ruby2.0

    sudo apt-get install ruby2.0-dev

  2. Install the gems

    gem2.0 install mysql
    gem2.0 install json
    gem2.0 install github_api
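
To confirm that the gems were installed for the Ruby version you are running, a small check can help. This is a sketch under assumptions: the constant and method names are hypothetical, and the list contains only the three gems named in the commands above, not necessarily everything in the Required Gems file.

```ruby
# Gem names taken from the install commands above (may not be the full list).
SCRAPER_GEMS = %w[mysql json github_api].freeze

# Return the gem names from +names+ that RubyGems cannot find for the
# Ruby interpreter running this script.
def missing_gems(names = SCRAPER_GEMS)
  names.reject do |name|
    begin
      Gem::Specification.find_by_name(name)
      true   # found, so it is not missing
    rescue Gem::LoadError
      false  # not installed for this Ruby
    end
  end
end
```

Run it with the same interpreter you use for scraping (for example `ruby2.0`), since each Ruby version keeps its own set of gems.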

Setup the Database

The collected data is stored in MySQL.

  1. Install MySQL and, when prompted, enter a password for the root user

     sudo apt-get install mysql-server-5.5 mysql-common mysql-client-5.5
  2. Log into the mysql server.

     mysql -u <username> -p
  3. Create github_data database using the create file.

     source ./doc/database.sql
  4. Create project_stats database using the create file

     source ./doc/stats_database.sql
  5. **Note: the following is under current development.** Create metrics database using the create file

     source ./doc/metrics_db.sql
  6. Exit the MySQL server
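
The schema files can also be loaded without an interactive session by redirecting each file into the `mysql` client. The sketch below only builds the command strings so they can be reviewed before running; the method name is hypothetical, and it assumes the commands are run from the project root so that the `./doc/` paths resolve.

```ruby
require "shellwords"

# Schema files named in the steps above. metrics_db.sql belongs to the
# part of the project that is still under development.
SCHEMA_FILES = %w[./doc/database.sql ./doc/stats_database.sql ./doc/metrics_db.sql].freeze

# Build one shell command per schema file. Each command prompts for the
# password (-p) and feeds the file to the mysql client on standard input.
def schema_load_commands(username, files = SCHEMA_FILES)
  files.map do |file|
    "mysql -u #{Shellwords.escape(username)} -p < #{Shellwords.escape(file)}"
  end
end
```

For example, `puts schema_load_commands("<username>")` prints the three commands with your MySQL user substituted.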


Setup the Web Server

  1. Install Apache2

     sudo apt-get install apache2
  2. Restart Apache2

     sudo /etc/init.d/apache2 restart
  3. Install PHP5

     sudo apt-get install php5 libapache2-mod-php5 php5-mysql
  4. Clone project into /var/www/html/ or set up virtual site

  5. Change the API root URL. Open the JavaScript graphing file and change the following line to point to the api folder within the project.

     var rootURL = "http://git_data.dom/api";
  6. Go to page http://localhost/GitView/index.php

  7. Please note: depending on whether the project is placed in /var/www/html/ or served as a virtual site, the relative paths to the resources may need to change. The paths are currently set up for a virtual site. A setup that places the project directly into /var/www/html/ requires adjusting them.

Usually this just requires changing a path like:

    <link href="../css/smoothness/jquery-ui-1.10.3.custom.min.css" rel="stylesheet"/>

to:

    <link href="./css/smoothness/jquery-ui-1.10.3.custom.min.css" rel="stylesheet"/>
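
If several files need the same path change, the substitution can be scripted. This is a rough sketch, assuming the affected references are `href` or `src` attributes that begin with `../`; the method name is hypothetical, and the result should be reviewed before overwriting any project file.

```ruby
# Rewrite href/src attribute values that start with "../" so that they
# start with "./" instead. Other attributes and paths are left untouched.
def localize_resource_paths(html)
  html.gsub(/\b(href|src)="\.\.\//) { "#{$1}=\"./" }
end
```

Applied to the first `<link>` line above, it returns the second one.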

Setting up Virtual Site

Note: this is not required if the project is cloned to /var/www/html/.

  1. Open /etc/apache2/sites-enabled/000-default.conf and add the following to the end of the file.

     <VirtualHost *:80>
         ServerAdmin test@git_data.dom
         DocumentRoot "<project_location>"
         ServerName git_data.dom
         ServerAlias git_data.dom
         ErrorLog "/var/log/apache2/git_data.dom-error_log"
         CustomLog "/var/log/apache2/git_data.dom-access_log" common
         <Directory "<project_location>">
                 DirectoryIndex index.php
                 AddHandler php5-script php
                 Options -Indexes +FollowSymLinks +MultiViews
                 AllowOverride All
                 Order allow,deny
                 allow from all
                 Require all granted
         </Directory>
     </VirtualHost>
  2. Modify the <project_location> field in DocumentRoot and Directory to the location of the project.
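
The placeholder substitution in step 2 can be sketched as a small helper. The method name and the example path are hypothetical; the helper only transforms a string and does not write to the Apache configuration.

```ruby
# Replace every "<project_location>" placeholder in +config+ with +path+.
# The block form of gsub avoids special handling of backslashes in +path+.
def fill_project_location(config, path)
  raise ArgumentError, "path must be absolute" unless path.start_with?("/")
  config.gsub("<project_location>") { path }
end
```

For example, `fill_project_location(template, "/home/user/GitView")` fills in both the DocumentRoot and the Directory entries, where `/home/user/GitView` stands in for wherever the project was cloned.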

Cleaning web server repository

  1. By default, Apache shows directory listings for the website's folders. To disable this, open /etc/apache2/sites-enabled/000-default.conf

  2. Add in the following (if you followed the steps for creating the virtual site only modify the Options line):

     <Directory /var/www/html/GitView/>
             Options -Indexes +FollowSymLinks +MultiViews
             AllowOverride all
             Order allow,deny
             allow from all
     </Directory>
  3. Now that directory listings are disabled, block the directories that are not required. The following folders require r-x permission for the web server to work:

    • api
    • css
    • img
    • inc
    • js
    • src
    • templates
  4. All other folders can be removed or have their permissions revoked for both group and other users.

     sudo chmod go-rx <folder name>
  5. Finally, the two files required in the root directory are:

    • add_new.php
    • index.php
  6. All other files can be deleted or the permissions can be revoked for both group and other users.

     sudo chmod go-rx <file name>
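
To see which entries the steps above apply to, the project root can be compared against the two keep lists. This is a minimal sketch with hypothetical names; it only lists candidates and changes nothing on disk.

```ruby
# Folders and files the web server needs, as listed in the steps above.
KEEP_DIRS  = %w[api css img inc js src templates].freeze
KEEP_FILES = %w[add_new.php index.php].freeze

# Return the sorted names of entries directly under +root+ that are not on
# the keep lists, i.e. candidates for removal or for `chmod go-rx`.
def entries_to_restrict(root)
  Dir.entries(root).reject do |name|
    next true if name == "." || name == ".."
    path = File.join(root, name)
    File.directory?(path) ? KEEP_DIRS.include?(name) : KEEP_FILES.include?(name)
  end.sort
end
```

Each returned name can then be passed to `sudo chmod go-rx` as shown above. Hidden entries such as `.git` are included in the result.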

Collecting Data

This section outlines how to collect and then parse the data to show on the website tool.

Collecting Data from GitHub

Please note that the script executed in this section may take a very long time, depending on the size of the project.

  1. Run the scraper script on the desired project passing the repository owner and the repository's name as arguments. For example:

     bash scraper ACRA acra

Parsing the Collected Data

Please note that this section relies on the completion of the previous section for the same repository. In order to parse ACRA/acra, its data must first be collected with the scraper script.

Please note that the script executed in this section may take a very long time, depending on the size of the project.

  1. Execute the parser script to actually store the values in the database.

     bash parser ACRA acra false
  2. Proceed to http://localhost/GitView/index.php, which should now display the newly parsed project. Note: this can be done before the parser has finished, since changes are visible on the site immediately.
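
Because parsing depends on scraping having completed, the two steps can be chained so that the parser only runs if the scraper succeeds. The following is a sketch, not part of the project: the method names are hypothetical, and the third parser argument (`false`) is copied from the example above without assuming what it controls.

```ruby
# Commands for one repository, in the order they must run.
def pipeline_commands(owner, repo)
  [
    ["bash", "scraper", owner, repo],
    ["bash", "parser", owner, repo, "false"]
  ]
end

# Run each command in order and stop at the first failure.
# +runner+ receives the command array and returns a truthy value on success;
# the default uses Kernel#system.
def run_pipeline(owner, repo, runner = ->(cmd) { system(*cmd) })
  pipeline_commands(owner, repo).each do |cmd|
    return false unless runner.call(cmd)
  end
  true
end
```

Calling `run_pipeline("ACRA", "acra")` from the project directory runs both scripts. Passing a custom runner allows a dry run that only prints the commands.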

Current Work

This section outlines how to set up the metrics collection script.

Installing Dependencies

  1. To install Oracle's Java, please follow this guide

  2. Install Maven

     sudo apt-get install maven2
  3. Get Eclipse Luna and extract it to a preferred location.

  4. Install the Metrics plug-in for Eclipse by adding the source:
  5. Install Python

     sudo apt-get install python2.7
  6. Download the Eclipse metrics XML reader

Installing ADT plug-in for Eclipse

  1. Install the ADT plug-in for Eclipse by adding the source:
  2. Re-open Eclipse, which will prompt you to install the Android SDK.

  3. Open the Android SDK Manager

  4. Select all the required SDK Platform versions. If an older version of the target application used an earlier version of the Android SDK, then that version is required as well. The most flexible method is to install every Android version. Note: downloading and installing may take some time.

Installing Import plug-in for Eclipse

  1. Clone the repository

  2. Follow the instructions on installing

Collecting Metrics

  1. Open the metric_compiler script and adjust the following variables:

    • ECLIPSE_LOCATION the location of the Eclipse binary.
    • WORKSPACE the location of the workspace to use.
    • SCRIPT_WORK_DIR the location to create temporary files.
    • TEMPLATE_BUILD_FILE_LOCATION the location of the template build.xml file.
    • XML_CONVERTER_LOCATION the location of the clone of the XML-to-CSV program.
  2. Open the metrics_calc.rb script and adjust the following:

    • project_dir is the location the project will be cloned to and each commit is checked out.
    • output_dir is the directory to output the metrics csv files to.
    • log_file is the directory where the log files are placed.
    • log whether to output the log file or not.
    • headless whether to run with xvfb (a virtualized graphical environment) or not.
    • metrics_compiler the location of the metrics compiler shell script.
  3. Execute the script to collect metrics for all stored repositories with:

     ruby metrics_calc.rb
  4. Alternatively, you can specify which repository to collect metrics for using:

     ruby metrics_calc.rb ACRA acra
  5. This can take a very long time and makes the computer it is running on harder to use (Eclipse will open, take focus, and then close).

  • Note: this can also produce a large number of log and output files, so it is wise to direct each of them to separate empty directories.
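
One way to follow the advice about separate empty directories is to create them up front and refuse to continue if either already contains files. This is a sketch under assumptions: the method name and the `logs` and `output` subdirectory names are hypothetical, and the returned paths still have to be copied into the `log_file` and `output_dir` settings of metrics_calc.rb by hand.

```ruby
require "fileutils"

# Create separate log and output directories under +base+ and verify that
# both are empty. Returns a hash with the two paths.
def prepare_metrics_dirs(base)
  dirs = { :log => File.join(base, "logs"), :output => File.join(base, "output") }
  dirs.each_value do |dir|
    FileUtils.mkdir_p(dir)
    unless (Dir.entries(dir) - %w[. ..]).empty?
      raise ArgumentError, "#{dir} is not empty"
    end
  end
  dirs
end
```

Raising on a non-empty directory keeps the results of an earlier run from being mixed with the new log and CSV files.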