Data Sets

Amazon is making the Graph Challenge data sets available to the community free of charge as part of the AWS Public Data Sets program. The data is being presented in several file formats, and there are a variety of ways to access it.

Data is available in the 'graphchallenge' Amazon S3 bucket (https://graphchallenge.s3.amazonaws.com).

The collection includes real-world graphs from Stanford’s Large Network Dataset Collection (SNAP, https://snap.stanford.edu/data/) and synthetic data at various scales generated with the scalable Graph500 Kronecker generator (http://www.graph500.org/specifications#sec-3_3).

Each of the SNAP datasets is provided in both TSV (Tab-Separated Values) and MMIO (Matrix Market I/O) formats. You can access any file directly by constructing an HTTPS or AWS CLI URL from the URL suffixes and instructions below.

SNAP Datasets
Name Description
amazon0302 Amazon product co-purchasing network from March 2 2003
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
amazon0312 Amazon product co-purchasing network from March 12 2003
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
amazon0505 Amazon product co-purchasing network from May 5 2003
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
amazon0601 Amazon product co-purchasing network from June 1 2003
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
as20000102 Autonomous Systems graph from January 02 2000
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
as-caida20071105 CAIDA AS graph from November 5 2007
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
ca-AstroPh Collaboration network of Arxiv Astro Physics
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
ca-CondMat Collaboration network of Arxiv Condensed Matter
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
ca-GrQc Collaboration network of Arxiv General Relativity
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
ca-HepPh Collaboration network of Arxiv High Energy Physics
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
ca-HepTh Collaboration network of Arxiv High Energy Physics Theory
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
cit-HepPh Arxiv High Energy Physics paper citation network
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
cit-HepTh Arxiv High Energy Physics Theory paper citation network
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
cit-Patents Citation network among US Patents
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
email-Enron Email communication network from Enron
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
email-EuAll Email network from an EU research institution
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
facebook_combined Edges from all Facebook ego networks combined
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
flickrEdges Image relationships on Flickr (edges only)
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
Friendster Friendster social network graph
  Adjacency TSV Incidence TSV
loc-brightkite_edges Brightkite location based online social network
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
loc-gowalla_edges Gowalla location based online social network
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon1_010331 AS peering information inferred from Oregon route-views from March 31 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon1_010407 AS peering information inferred from Oregon route-views from April 7 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon1_010414 AS peering information inferred from Oregon route-views from April 14 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon1_010421 AS peering information inferred from Oregon route-views from April 21 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon1_010428 AS peering information inferred from Oregon route-views from April 28 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon1_010505 AS peering information inferred from Oregon route-views from May 05 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon1_010512 AS peering information inferred from Oregon route-views from May 12 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon1_010519 AS peering information inferred from Oregon route-views from May 19 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon1_010526 AS peering information inferred from Oregon route-views from May 26 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon2_010331 AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from March 31 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon2_010407 AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from April 7 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon2_010414 AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from April 14 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon2_010421 AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from April 21 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon2_010428 AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from April 28 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon2_010505 AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from May 05 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon2_010512 AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from May 12 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon2_010519 AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from May 19 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
oregon2_010526 AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from May 26 2001
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
p2p-Gnutella04 Gnutella peer to peer network from August 4 2002
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
p2p-Gnutella05 Gnutella peer to peer network from August 5 2002
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
p2p-Gnutella06 Gnutella peer to peer network from August 6 2002
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
p2p-Gnutella08 Gnutella peer to peer network from August 8 2002
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
p2p-Gnutella09 Gnutella peer to peer network from August 9 2002
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
p2p-Gnutella24 Gnutella peer to peer network from August 24 2002
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
p2p-Gnutella25 Gnutella peer to peer network from August 25 2002
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
p2p-Gnutella30 Gnutella peer to peer network from August 30 2002
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
p2p-Gnutella31 Gnutella peer to peer network from August 31 2002
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
roadNet-CA Road network of California
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
roadNet-PA Road network of Pennsylvania
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
roadNet-TX Road network of Texas
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
soc-Epinions1 Who-trusts-whom network of Epinions.com
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
soc-Slashdot0811 Slashdot social network from November 2008
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
soc-Slashdot0902 Slashdot social network from February 2009
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO

 

Synthetic Datasets
Name Description
graph500-scale18-ef16 Synthetic graph500 network of scale 18 (262144x262144, 4194304 edges)
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
graph500-scale19-ef16 Synthetic graph500 network of scale 19 (524288x524288, 8388608 edges)
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
graph500-scale20-ef16 Synthetic graph500 network of scale 20 (1048576x1048576, 16777216 edges)
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
graph500-scale21-ef16 Synthetic graph500 network of scale 21 (2097152x2097152, 33554432 edges)
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
graph500-scale22-ef16 Synthetic graph500 network of scale 22 (4194304x4194304, 67108864 edges)
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
graph500-scale23-ef16 Synthetic graph500 network of scale 23 (8388608x8388608, 134217728 edges)
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
graph500-scale24-ef16 Synthetic graph500 network of scale 24 (16777216x16777216, 268435456 edges)
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
graph500-scale25-ef16 Synthetic graph500 network of scale 25 (33554432x33554432, 536870912 edges)
  Adjacency TSV Incidence TSV Adjacency MMIO Incidence MMIO
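
The sizes in the table above follow a simple pattern: a graph of scale S has 2^S vertices, and the edge count is 16 times the vertex count. The following is a minimal sketch of that arithmetic, assuming "ef16" in the dataset names denotes an edge factor of 16 (an interpretation that matches every row of the table but is not stated on this page). The function name `graph500_size` is hypothetical.

```python
def graph500_size(scale, edge_factor=16):
    """Return (num_vertices, num_edges) for a given Graph500 scale.

    Assumption: 'ef16' means edge factor 16, i.e. edges = 16 * vertices.
    """
    num_vertices = 2 ** scale
    num_edges = edge_factor * num_vertices
    return num_vertices, num_edges


# Reproduce the rows of the synthetic dataset table (scales 18 to 25).
for scale in range(18, 26):
    n, m = graph500_size(scale)
    print(f"graph500-scale{scale}-ef16: {n}x{n}, {m} edges")
```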

The following synthetic datasets have known truth partitions and are intended for use in the Stochastic Block Partitioning Graph Challenge.

Partition Challenge Datasets with Known Truth Partitions
Name Description
Static Graphs (small) Small Static Graphs with known truth for the Stochastic Block Partitioning Challenge
  50 nodes 100 nodes 500 nodes 1000 nodes 5000 nodes
Static Graphs (large) Large Static Graphs with known truth for the Stochastic Block Partitioning Challenge
  20000 nodes 50000 nodes 500000 nodes 2000000 nodes 5000000 nodes
Streaming - Edge Sampling (small) Small Streaming Graphs with known truth for the Stochastic Block Partitioning Challenge
  50 nodes 100 nodes 500 nodes 1000 nodes 5000 nodes
Streaming - Edge Sampling (large) Large Streaming Graphs with known truth for the Stochastic Block Partitioning Challenge
  20000 nodes 50000 nodes 500000 nodes 2000000 nodes 5000000 nodes
Streaming - Snowball Sampling (small) Small Streaming Graphs with known truth for the Stochastic Block Partitioning Challenge
  50 nodes 100 nodes 500 nodes 1000 nodes 5000 nodes
Streaming - Snowball Sampling (large) Large Streaming Graphs with known truth for the Stochastic Block Partitioning Challenge
  20000 nodes 50000 nodes 500000 nodes 2000000 nodes 5000000 nodes

Graph data available in the Graph Challenge Amazon S3 bucket uses the following formats and conventions:

   <dataset-name>_adj.tsv
   (Row, Col, Value) tuples describing the adjacency matrix of the graph in tab-separated format.
   The adjacency matrix is of size Num_vertices x Num_vertices.

   <dataset-name>_inc.tsv
   (Row, Col, Value) tuples describing the incidence matrix of the graph in tab-separated format.
   The incidence matrix is of size Num_edges x Num_vertices. (Note that some authors use the transpose of this form.)

   <dataset-name>_adj.mmio - adjacency matrix of the graph in MMIO format
   <dataset-name>_inc.mmio - incidence matrix of the graph in MMIO format

Details and readers for the MMIO format are available at http://math.nist.gov/MatrixMarket/
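
For readers who want to see what an MMIO file looks like in practice, here is a minimal sketch of a parser for the Matrix Market coordinate (sparse) layout: a `%%MatrixMarket` banner line, optional `%` comment lines, a size line giving rows, columns and the number of stored entries, then one entry per line. It is an illustration only: the function name `parse_mmio` and the sample text are hypothetical, only real, integer and pattern fields are handled, and symmetric storage is not expanded. Whether the files in the bucket use exactly this variant is an assumption.

```python
def parse_mmio(text):
    """Parse Matrix Market coordinate text into (rows, cols, entries).

    entries is a list of (row, col, value) tuples with the 1-based
    indices kept exactly as they appear in the file.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    header = lines[0].split()
    if (len(header) != 5 or header[0] != "%%MatrixMarket"
            or header[2].lower() != "coordinate"):
        raise ValueError("expected a Matrix Market coordinate banner")
    field = header[3].lower()
    if field not in ("real", "integer", "pattern"):
        raise ValueError(f"unsupported field type: {field}")

    # Drop comment lines; the first remaining line is the size line.
    body = [ln for ln in lines[1:] if not ln.startswith("%")]
    rows, cols, nnz = (int(tok) for tok in body[0].split())

    entries = []
    for ln in body[1:]:
        parts = ln.split()
        # Pattern matrices store only indices; treat each entry as 1.0.
        value = 1.0 if field == "pattern" else float(parts[2])
        entries.append((int(parts[0]), int(parts[1]), value))
    if len(entries) != nnz:
        raise ValueError("entry count does not match the size line")
    return rows, cols, entries


# Hypothetical 3x3 example with two stored entries.
SAMPLE = """%%MatrixMarket matrix coordinate integer general
% tiny illustrative matrix
3 3 2
1 2 1
2 3 1
"""
print(parse_mmio(SAMPLE))
```

For real workloads an established reader, such as `scipy.io.mmread` or one of the readers linked from the Matrix Market site, is likely a better choice than a hand-written parser.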

Indexing note: all matrices use 1-based indexing.
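
Because the indices are 1-based, code written in a 0-based language needs to shift them when loading. The following is a minimal sketch of reading (Row, Col, Value) triples from `_adj.tsv` content into a nested dictionary with 0-based keys. The function name `parse_adjacency_tsv` and the sample data are hypothetical, and the sketch assumes one triple per line with no header row.

```python
from collections import defaultdict


def parse_adjacency_tsv(text):
    """Parse tab-separated (row, col, value) lines.

    Returns {row: {col: value}} with indices shifted from the
    files' 1-based convention to 0-based.
    """
    adjacency = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        row, col, value = line.split("\t")
        adjacency[int(row) - 1][int(col) - 1] = float(value)
    return dict(adjacency)


# Hypothetical two-vertex graph with one undirected edge,
# stored as two symmetric entries.
sample = "1\t2\t1\n2\t1\t1\n"
print(parse_adjacency_tsv(sample))
```

The same function can be applied to `_inc.tsv` content, in which case the outer key is an edge index rather than a vertex index.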

Naming conventions for files provided with each SNAP dataset are as follows:

  • Tab Separated Values – edge list, adjacency (_adj) and incidence (_inc) matrices.

        s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX].tsv
        s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_adj.tsv
        s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_inc.tsv
  • Matrix Market I/O – edge list, adjacency (_adj) and incidence (_inc) matrices.

        s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX].mmio
        s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_adj.mmio
        s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_inc.mmio
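
The naming convention above can be applied mechanically. Here is a minimal sketch that expands a URL suffix into the six S3 locations listed; the function name `snap_uris` and the dictionary keys are hypothetical, and the sketch assumes the URL suffix equals the dataset name, as in the amazon0302 example on this page. Not every dataset necessarily has all six files (Friendster, for instance, is listed with TSV files only).

```python
BUCKET = "graphchallenge"


def snap_uris(url_suffix):
    """Expand a SNAP URL suffix into the S3 URIs given by the naming convention."""
    base = f"s3://{BUCKET}/snap/{url_suffix}/{url_suffix}"
    return {
        "edges_tsv": f"{base}.tsv",
        "adj_tsv": f"{base}_adj.tsv",
        "inc_tsv": f"{base}_inc.tsv",
        "edges_mmio": f"{base}.mmio",
        "adj_mmio": f"{base}_adj.mmio",
        "inc_mmio": f"{base}_inc.mmio",
    }


for kind, uri in snap_uris("amazon0302").items():
    print(kind, uri)
```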

The format of URLs for the synthetic Graph500 data is:

  • Tab Separated Values – edge list, adjacency (_adj) and incidence (_inc) matrices.

        s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16.tsv
        s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_adj.tsv
        s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_inc.tsv
  • Matrix Market I/O – edge list, adjacency (_adj) and incidence (_inc) matrices.

        s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16.mmio
        s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_adj.mmio
        s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_inc.mmio
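
The synthetic data pattern has a single free parameter, the scale. Here is a minimal sketch that substitutes a scale into the pattern; the function name `graph500_uris` is hypothetical, and the range check reflects only the scales (18 to 25) listed in the table on this page.

```python
SUFFIXES = (".tsv", "_adj.tsv", "_inc.tsv", ".mmio", "_adj.mmio", "_inc.mmio")


def graph500_uris(scale):
    """Return the S3 URIs for one synthetic Graph500 dataset."""
    if not 18 <= scale <= 25:
        # Only scales 18 through 25 appear in the dataset table.
        raise ValueError("scale must be between 18 and 25")
    name = f"graph500-scale{scale}-ef16"
    base = f"s3://graphchallenge/synthetic/{name}/{name}"
    return [base + suffix for suffix in SUFFIXES]


for uri in graph500_uris(18):
    print(uri)
```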

Individual files can be retrieved with a web browser or command-line tools via the HTTPS URL scheme for Amazon S3 buckets, for example:

Adjacency Matrix – Tab Separated Values (amazon0302)

https://graphchallenge.s3.amazonaws.com/snap/amazon0302/amazon0302_adj.tsv

Adjacency Matrix – Matrix Market I/O (Synthetic Graph500 network, scale 24)

https://graphchallenge.s3.amazonaws.com/synthetic/graph500-scale24-ef16/...
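
The HTTPS form is a direct rewrite of the s3:// form: the bucket name moves into the host name and the object key becomes the path. The following sketch shows that rewrite under the virtual-hosted-style pattern used in the examples above; the function name `s3_to_https` is hypothetical.

```python
from urllib.parse import urlsplit


def s3_to_https(s3_uri):
    """Rewrite s3://bucket/key as https://bucket.s3.amazonaws.com/key."""
    parts = urlsplit(s3_uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri}")
    return f"https://{parts.netloc}.s3.amazonaws.com{parts.path}"


print(s3_to_https("s3://graphchallenge/snap/amazon0302/amazon0302_adj.tsv"))
```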

Files can also be accessed with the AWS CLI tools (awscli) or an AWS SDK:

To view all available files in the 'graphchallenge' bucket:

aws s3 ls s3://graphchallenge/

To download a particular dataset from Amazon S3 to local disk:

aws s3 cp s3://graphchallenge/friendster/ ./friendster/ --recursive

Datasets may also be downloaded one file at a time using the HTTPS URL scheme outlined above.
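
A single-file HTTPS download can also be scripted with the Python standard library. This is a minimal sketch, not an official client: the function name `download` and the `data` directory in the usage comment are hypothetical, and the whole file is streamed to disk without retries or integrity checks, which may matter for the larger datasets.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest_dir="."):
    """Stream the file at url into dest_dir and return the local path."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in chunks so large files are never held fully in memory.
        shutil.copyfileobj(response, out)
    return dest


# Example usage (performs a network request, so it is left commented out):
# path = download(
#     "https://graphchallenge.s3.amazonaws.com/snap/amazon0302/amazon0302_adj.tsv",
#     "data",
# )
# print(path)
```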