Running Sample Code on Amazon EC2

Step 1: Preparing and launching your Amazon EC2 instance

One of the easier ways of getting started with the Graph Challenge is to grab the sample serial implementations of the two algorithms from GitHub and run them on an Amazon EC2 instance. Because Amazon hosts all of the data sets on S3, retrieving and using them from EC2 is straightforward, and even the AWS Free Tier instances can run the sample code on some of the smaller data sets.

To get started with Graph Challenge on AWS, spin up a Linux instance. Any flavor of Linux that provides the required Python packages will work; this example uses Ubuntu's 16.04 long-term support release (AMI ID: ami-f4cc1de2). AMI IDs are specific to an AWS region, so confirm that this ID is available in the region you launch in. The example below uses the Python AWS CLI tools (awscli), but the same result can be achieved using the EC2 console via the web interface, provided the instance type and AMI shown below are selected.

First, create a public/private key pair for authenticating to the EC2 instance you are about to create. You can skip this step if you already have an SSH key to use for this purpose.

[user@desktop]$ ssh-keygen -f aws -C aws
[user@desktop]$ aws ec2 import-key-pair --key-name graphchallenge-ec2 --public-key-material "$(cat aws.pub | cut -f2)"

Now, use this key to create a new EC2 instance using the 'aws ec2 run-instances' command. An instance type of t2.micro (AWS Free Tier) is shown below, but you will likely want to choose an instance type with a larger memory footprint in order to run most of the GraphChallenge code.

[user@desktop]$ aws ec2 run-instances --image-id ami-f4cc1de2 --key-name graphchallenge-ec2 --instance-type t2.micro --associate-public-ip-address

You can issue the 'aws ec2 describe-instances' command to monitor the progress of the instance creation process and retrieve the external (public) IP address assigned by Amazon to your instance.
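
The launch-and-monitor sequence above can also be scripted. The sketch below is one possible approach, not the only one: launch_instance and wait_for_public_ip are hypothetical helper names introduced here, while --query, --output text, and 'aws ec2 wait instance-running' are standard AWS CLI features. It assumes the AWS CLI is installed and configured with credentials and a default region.

```shell
# launch_instance and wait_for_public_ip are hypothetical helper names
# introduced for illustration; they only wrap standard AWS CLI calls.

# Launch one instance and print only its instance ID.
# $1: instance type (defaults to the Free Tier t2.micro used above).
launch_instance() {
    aws ec2 run-instances \
        --image-id ami-f4cc1de2 \
        --key-name graphchallenge-ec2 \
        --instance-type "${1:-t2.micro}" \
        --associate-public-ip-address \
        --query "Instances[0].InstanceId" \
        --output text
}

# Block until the instance is running, then print its public IP address.
# $1: instance ID as printed by launch_instance.
wait_for_public_ip() {
    aws ec2 wait instance-running --instance-ids "$1" || return 1
    aws ec2 describe-instances \
        --instance-ids "$1" \
        --query "Reservations[0].Instances[0].PublicIpAddress" \
        --output text
}
```

A typical use would be id=$(launch_instance) followed by wait_for_public_ip "$id"; the second call returns only once the instance has reached the running state.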

Once the instance is running, update its security group (the AWS firewall) to permit inbound SSH traffic so that you can log in. For an additional layer of network security, replace 0.0.0.0/0 below with the CIDR block of your own IP address or subnet.

[user@desktop]$ aws ec2 authorize-security-group-ingress --group-name default --protocol tcp --port 22 --cidr 0.0.0.0/0
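
To limit SSH access to a single workstation instead of 0.0.0.0/0, the address has to be written as a /32 CIDR block. The following is a minimal sketch, assuming you already know your public IPv4 address; to_host_cidr is a hypothetical helper (not part of the AWS CLI) and 203.0.113.10 is a documentation placeholder address.

```shell
# to_host_cidr is a hypothetical helper, not part of the AWS CLI.
# It validates a dotted-quad IPv4 address and prints it as a single-host
# CIDR block. The body runs in a subshell so IFS changes do not leak.
to_host_cidr() (
    ip=$1
    # Reject empty input, characters other than digits and dots,
    # and a trailing dot.
    case $ip in
        ''|*[!0-9.]*|*.) exit 1 ;;
    esac
    # Split the address on dots into the positional parameters.
    IFS=.
    set -f
    set -- $ip
    [ "$#" -eq 4 ] || exit 1
    for octet in "$@"; do
        # Each octet must be 1 to 3 digits and no larger than 255.
        case $octet in ''|????*) exit 1 ;; esac
        [ "$octet" -le 255 ] || exit 1
    done
    printf '%s/32\n' "$ip"
)

# Example use (placeholder address):
#   aws ec2 authorize-security-group-ingress --group-name default \
#       --protocol tcp --port 22 --cidr "$(to_host_cidr 203.0.113.10)"
```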

With that, you should be ready to log in to your EC2 instance to download and run the GraphChallenge serial implementations. The same setup can be performed in the AWS web console, provided all three steps are followed: key generation, instance creation with the specified AMI, and a security group rule that allows incoming SSH connections.

Step 2: Log in to your EC2 instance and download the prerequisite software packages for the GraphChallenge serial implementations.

Note: To retrieve the public IP address of your instance, you can issue the following EC2 CLI command; alternatively, the address is visible in the AWS Console under 'Instances'.

[user@desktop]$ aws ec2 describe-instances | grep PublicIpAddress

Using the private key generated in Step 1 (the file 'aws', whose public half you imported), connect to your EC2 instance at the IP address retrieved above. Replace aws.ip.addr.ess below with the public IP address assigned to your instance.

[user@desktop]$ ssh -i aws ubuntu@aws.ip.addr.ess

Once logged in, install the prerequisite software required to run the subgraph isomorphism and stochastic block partitioning codes. The commands below cover the software required by the available implementations in all provided languages (Octave/MATLAB, Python, Julia, C++); you may need only a subset of these for your chosen language.

ubuntu@ip-aws.ip.addr.ess:~$ sudo apt update
ubuntu@ip-aws.ip.addr.ess:~$ sudo apt install python-pandas octave julia python-munkres libboost-all-dev git awscli make
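
If you only plan to work in one language, the package list can be trimmed. The sketch below shows one way to do that. The grouping of packages by language is an assumption based on the package names, not something the GraphChallenge repository documents, and packages_for is a hypothetical helper; check the README of the implementation you intend to run before relying on it.

```shell
# packages_for is a hypothetical helper. The language-to-package grouping
# below is an assumption inferred from the package names in the full
# install command; it is not taken from the GraphChallenge documentation.
packages_for() {
    common="git awscli"
    case $1 in
        python) echo "$common python-pandas python-munkres" ;;
        octave) echo "$common octave" ;;
        julia)  echo "$common julia" ;;
        cpp)    echo "$common libboost-all-dev make" ;;
        *)      return 1 ;;   # unknown language: print nothing, signal failure
    esac
}

# Example use on the instance (left unquoted on purpose so the list
# splits into separate package arguments):
#   sudo apt install $(packages_for python)
```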

Download the source code from the public 'graphchallenge' GitHub repository:

ubuntu@ip-aws.ip.addr.ess:~$ git clone https://github.com/graphchallenge/GraphChallenge

Step 3: Run the Graph Challenge example serial implementation code of your choice against one of the datasets in Amazon S3.

You can get the list of available data sets provided as part of the Graph Challenge on the data sets page. Comprehensive instructions for downloading and parsing data sets may also be found there.

Depending on which of the Graph Challenge codes you wish to run, download a dataset of your choice into the appropriate data directory using either standard Linux tools (wget shown below) or the Amazon CLI.

  • Example using wget to retrieve the 'amazon0302' dataset for the Subgraph Isomorphism Challenge

    ubuntu@ip-aws.ip.addr.ess:~$ cd GraphChallenge/SubgraphIsomorphism/data
    ubuntu@ip-aws.ip.addr.ess:~/GraphChallenge/SubgraphIsomorphism/data$ wget https://graphchallenge.s3.amazonaws.com/snap/amazon0302/amazon0302_adj.tsv
    
  • Example using 'aws s3 cp' to retrieve the 'amazon0302' dataset for the Subgraph Isomorphism Challenge

    ubuntu@ip-aws.ip.addr.ess:~$ cd GraphChallenge/SubgraphIsomorphism/data
    ubuntu@ip-aws.ip.addr.ess:~/GraphChallenge/SubgraphIsomorphism/data$ aws s3 cp s3://graphchallenge/snap/amazon0302/amazon0302_adj.tsv .
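
Both download forms above follow the same path layout, so the location can be built from the dataset name. The sketch below assumes that other SNAP datasets use the same snap/NAME/NAME_adj.tsv pattern as 'amazon0302'; that pattern is inferred from the single example here, so verify it against the data sets page. snap_dataset_url and snap_dataset_s3_uri are hypothetical helper names.

```shell
# snap_dataset_url and snap_dataset_s3_uri are hypothetical helpers.
# Assumption: every SNAP dataset follows the snap/NAME/NAME_adj.tsv layout
# seen for amazon0302; confirm this on the data sets page.

# Print the HTTPS URL for a SNAP dataset's adjacency file.
snap_dataset_url() {
    [ -n "$1" ] || return 1
    printf 'https://graphchallenge.s3.amazonaws.com/snap/%s/%s_adj.tsv\n' "$1" "$1"
}

# Print the equivalent S3 URI for use with 'aws s3 cp'.
snap_dataset_s3_uri() {
    [ -n "$1" ] || return 1
    printf 's3://graphchallenge/snap/%s/%s_adj.tsv\n' "$1" "$1"
}

# Example use:
#   wget "$(snap_dataset_url amazon0302)"
#   aws s3 cp "$(snap_dataset_s3_uri amazon0302)" .
```

If the AWS CLI on the instance has no credentials configured, the 'aws s3 cp' form may need the global --no-sign-request option, which sends an anonymous request and is sufficient for publicly readable buckets.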
    

To launch any of the sample codes provided for the Subgraph Isomorphism Graph Challenge, using either the triangle counting or the k-truss approach, you can use any of the datasets provided in the "SNAP Datasets" or "Synthetic Datasets" sections.

The Stochastic Block Partitioning Graph Challenge has datasets with known truth partitions provided in order for participants to evaluate the accuracy of graph partitioning algorithms developed for the challenge. These datasets are provided in the "Datasets with Known Truth Partitions" section of this website.