Data Sets
Amazon is making the Graph Challenge data sets available to the community free of charge as part of the AWS Public Data Sets program. The data is being presented in several file formats, and there are a variety of ways to access it.
Data is available in the 'graphchallenge' Amazon S3 bucket (https://graphchallenge.s3.amazonaws.com).
Anonymized Network Sensing Graph Challenge
Official 2024 Anonymized Network Sensing Challenge. Synthetic pcap files created from GraphBLAS matrices generated with randomized source and destination addresses, using static constants for all other fields in the IP and TCP headers. | |||||||||||
Name | Description / Download Link | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Random PCAP | PCAP file containing 2^30 synthetic "packets" using random data. | ||||||||||
Download synthetic random PCAP (13 GB, pcap) | Download synthetic random GraphBLAS (8.1 GB, tar) | ||||||||||
CAIDA PCAP | PCAP files containing 2^30 "packets" generated using anonymized data from the CAIDA network telescope. | Available | |||||||||
Synthetic Sparse Deep Neural Network data for the Sparse DNN Graph Challenge
Official 2019 Sparse Deep Neural Network Challenge. Synthetic DNNs created using RadiX-Net with varying number of neurons and layers. Truth categories for MNIST are included for performing inference using DNN with specific numbers of layers. | |||||||||||
Name | Description / Download Link | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
1024 Neurons | Sparse Deep Neural Networks with 1024 Neurons per layer (small, 176 MB) | Download | |||||||||
120 layers - Categories | 480 layers - Categories | 1920 layers - Categories | |||||||||
4096 Neurons | Sparse Deep Neural Networks with 4096 neurons per layer (medium, 800 MB) | Download | |||||||||
120 layers - Categories | 480 layers - Categories | 1920 layers - Categories | |||||||||
16384 Neurons | Sparse Deep Neural Networks with 16384 neurons per layer (large, 3.6 GB) | Download | |||||||||
120 layers - Categories | 480 layers - Categories | 1920 layers - Categories | |||||||||
65536 Neurons | Sparse Deep Neural Networks with 65536 neurons per layer (very large, 16.3 GB) | Download | |||||||||
120 layers - Categories | 480 layers - Categories | 1920 layers - Categories |
Sparse DNNs generated using interpolated sparse versions of images in MNIST corpus resized to produce neural networks of varying dimensions. | |||||||||||
Name | Description / Download Link | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
MNIST-derived Networks | |||||||||||
32x32 (1024 neurons) | 64x64 (4096 neurons) | 128x128 (16384 neurons) | 256x256 (65536 neurons) |
Real and Synthetic Data for the Static Graph Challenge: Subgraph Isomorphism
Real-world graphs from Stanford’s Large Network Dataset Collection (https://snap.stanford.edu/data/) as well as synthetic data at various scales generated using the scalable Graph500 Kronecker generator (http://www.graph500.org/specifications#sec-3_3) are being provided.
Each of the SNAP datasets is provided in both TSV (Tab-Separated Values) and MMIO (Matrix Market I/O) formats. You can access any desired file directly by crafting an HTTPS or AWS CLI URL using the URL suffixes and instructions below.
A CSV file with metadata about the SNAP datasets below is available here: SNAP Metadata
The metadata includes the number of edges, nodes, and triangles.
SNAP Datasets | |||||
Name | Description | ||||
---|---|---|---|---|---|
amazon0302 | Amazon product co-purchasing network from March 2 2003 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
amazon0312 | Amazon product co-purchasing network from March 12 2003 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
amazon0505 | Amazon product co-purchasing network from May 5 2003 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
amazon0601 | Amazon product co-purchasing network from June 1 2003 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
as20000102 | Autonomous Systems graph from January 02 2000 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
as-caida20071105 | CAIDA AS graph from November 5 2007 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
ca-AstroPh | Collaboration network of Arxiv Astro Physics | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
ca-CondMat | Collaboration network of Arxiv Condensed Matter | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
ca-GrQc | Collaboration network of Arxiv General Relativity | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
ca-HepPh | Collaboration network of Arxiv High Energy Physics | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
ca-HepTh | Collaboration network of Arxiv High Energy Physics Theory | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
cit-HepPh | Arxiv High Energy Physics paper citation network | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
cit-HepTh | Arxiv High Energy Physics Theory paper citation network | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
cit-Patents | Citation network among US Patents | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
email-Enron | Email communication network from Enron | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
email-EuAll | Email network from a EU research institution | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
facebook_combined | Edges from all Facebook ego networks combined | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
flickrEdges | Image relationships on Flickr (edges only) | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
Friendster | Friendster social network graph | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
loc-brightkite_edges | Brightkite location based online social network | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
loc-gowalla_edges | Gowalla location based online social network | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon1_010331 | AS peering information inferred from Oregon route-views from March 31 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon1_010407 | AS peering information inferred from Oregon route-views from April 7 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon1_010414 | AS peering information inferred from Oregon route-views from April 14 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon1_010421 | AS peering information inferred from Oregon route-views from April 21 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon1_010428 | AS peering information inferred from Oregon route-views from April 28 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon1_010505 | AS peering information inferred from Oregon route-views from May 05 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon1_010512 | AS peering information inferred from Oregon route-views from May 12 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon1_010519 | AS peering information inferred from Oregon route-views from May 19 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon1_010526 | AS peering information inferred from Oregon route-views from May 26 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon2_010331 | AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from March 31 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon2_010407 | AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from April 7 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon2_010414 | AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from April 14 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon2_010421 | AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from April 21 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon2_010428 | AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from April 28 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon2_010505 | AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from May 05 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon2_010512 | AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from May 12 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon2_010519 | AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from May 19 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
oregon2_010526 | AS peering information inferred from Oregon route-views, Looking glass data, and Routing registry, from May 26 2001 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
p2p-Gnutella04 | Gnutella peer to peer network from August 4 2002 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
p2p-Gnutella05 | Gnutella peer to peer network from August 5 2002 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
p2p-Gnutella06 | Gnutella peer to peer network from August 6 2002 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
p2p-Gnutella08 | Gnutella peer to peer network from August 8 2002 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
p2p-Gnutella09 | Gnutella peer to peer network from August 9 2002 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
p2p-Gnutella24 | Gnutella peer to peer network from August 24 2002 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
p2p-Gnutella25 | Gnutella peer to peer network from August 25 2002 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
p2p-Gnutella30 | Gnutella peer to peer network from August 30 2002 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
p2p-Gnutella31 | Gnutella peer to peer network from August 31 2002 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
roadNet-CA | Road network of California | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
roadNet-PA | Road network of Pennsylvania | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
roadNet-TX | Road network of Texas | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
soc-Epinions1 | Who-trusts-whom network of Epinions.com | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
soc-Slashdot0811 | Slashdot social network from November 2008 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
soc-Slashdot0902 | Slashdot social network from February 2009 | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO |
Source for the synthetic Kronecker graph generator is available here: Kronecker Graphs
Metadata table for exact Kronecker graphs is available here: Kronecker Graphs (2018 March 16)
Exact Kronecker graph generator paper: Design, Generation, and Validation of Extreme Scale Power-Law Graphs
Synthetic Kronecker Graphs with many triangles:
Synthetic Kronecker Graphs with some triangles:
Protein k-mer graphs generated using data from GenBank (https://www.ncbi.nlm.nih.gov/genbank/) are available below. Nodes of the graph represent segments of amino acids.
Protein k-mer graphs | |||||
Name | Description | ||||
---|---|---|---|---|---|
Graph 1 | Num. vertices : 170728175 Edge count : 360585172 | ||||
Adjacency TSV | |||||
Graph 2 | Num. vertices : 139353211 Edge count : 297829984 | ||||
Adjacency TSV | |||||
Graph 3 | Num. vertices : 67716231 Edge count : 138778562 | ||||
Adjacency TSV | |||||
Graph 4 | Num. vertices : 214005017 Edge count : 465410904 | ||||
Adjacency TSV | |||||
Graph 5 | Num. vertices : 55042369 Edge count : 117217600 | ||||
Adjacency TSV |
MAWI Working Group Traffic Archive (http://mawi.wide.ad.jp/mawi/): The MAWI (Measurement and Analysis on the WIDE Internet) Working Group has carried out network traffic measurement, analysis, evaluation, and verification since the beginning of the WIDE Project. The graphs provided here were generated from packet trace data from the WIDE backbone maintained by the MAWI Working Group.
MAWI Datasets | |||||
Name | Description | ||||
---|---|---|---|---|---|
Graph 1 | Num. vertices : 18571154, Edge count : 38040320 | ||||
Adjacency TSV | |||||
Graph 2 | Num. vertices : 35991342, Edge count : 74485420 | ||||
Adjacency TSV | |||||
Graph 3 | Num. vertices : 68863315, Edge count : 143414960 | ||||
Adjacency TSV | |||||
Graph 4 | Num. vertices : 128568730, Edge count : 270234840 | ||||
Adjacency TSV | |||||
Graph 5 | Num. vertices : 226196185, Edge count : 480047894 | ||||
Adjacency TSV |
Synthetic Datasets | |||||
Name | Description | ||||
---|---|---|---|---|---|
graph500-scale18-ef16 | Synthetic graph500 network of scale 18 (262144x262144, 4194304 edges) | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
graph500-scale19-ef16 | Synthetic graph500 network of scale 19 (524288x524288, 8388608 edges) | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
graph500-scale20-ef16 | Synthetic graph500 network of scale 20 (1048576x1048576, 16777216 edges) | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
graph500-scale21-ef16 | Synthetic graph500 network of scale 21 (2097152x2097152, 33554432 edges) | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
graph500-scale22-ef16 | Synthetic graph500 network of scale 22 (4194304x4194304, 67108864 edges) | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
graph500-scale23-ef16 | Synthetic graph500 network of scale 23 (8388608x8388608, 134217728 edges) | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
graph500-scale24-ef16 | Synthetic graph500 network of scale 24 (16777216x16777216, 268435456 edges) | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO | ||
graph500-scale25-ef16 | Synthetic graph500 network of scale 25 (33554432x33554432, 536870912 edges) | ||||
Adjacency TSV | Incidence TSV | Adjacency MMIO | Incidence MMIO |
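The sizes listed in the table above follow a regular pattern: a graph of scale S has 2^S vertices, and each listed edge count is 16 times the vertex count, which is presumably what the "ef16" suffix (edge factor 16) refers to. A minimal Python sketch that reproduces the table entries under that assumption; the function name `graph500_size` is illustrative and not part of any Graph Challenge tooling:

```python
def graph500_size(scale, edge_factor=16):
    """Return (num_vertices, num_edges) for a graph of the given scale.

    Assumes num_vertices = 2**scale and num_edges = edge_factor * num_vertices,
    which matches the sizes listed for the scale 18 to 25 datasets.
    """
    num_vertices = 2 ** scale
    num_edges = edge_factor * num_vertices
    return num_vertices, num_edges


# Print one line per dataset in the table (scales 18 through 25).
for scale in range(18, 26):
    vertices, edges = graph500_size(scale)
    print(f"graph500-scale{scale}-ef16: {vertices}x{vertices}, {edges} edges")
```

This can help when estimating memory needs before downloading one of the larger scales.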
Synthetic Data for the Streaming Graph Challenge: Stochastic Block Partition
Provided below is a set of synthetic datasets generated as MxM images where M = 2^n, for n = 8, 9, 10, 11, 12, 13. Each pixel in the image is treated as a node in the graph and is connected to its 8 neighbors by undirected edges. Pixels on the boundary have fewer neighbors: 3 at the corners and 5 along the edges.
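The image-to-graph construction can be sketched in a few lines of Python. This is only an illustration of 8-neighbor connectivity, not the generator used to produce the datasets; the row-major, 1-based pixel numbering and the function names are assumptions made for the example. In this sketch, corner pixels end up with 3 neighbors, other boundary pixels with 5, and interior pixels with 8.

```python
def grid_edges(m):
    """Yield undirected edges (u, v) with u < v for an m x m 8-connected grid.

    Pixels are numbered row-major starting at 1 (an assumed convention).
    """
    # Four "forward" offsets, so each undirected edge is produced exactly once.
    forward = [(0, 1), (1, -1), (1, 0), (1, 1)]
    for r in range(m):
        for c in range(m):
            for dr, dc in forward:
                rr, cc = r + dr, c + dc
                if 0 <= rr < m and 0 <= cc < m:
                    yield (r * m + c + 1, rr * m + cc + 1)


def degree_counts(m):
    """Return a dict mapping each pixel id to its number of neighbors."""
    degrees = {}
    for u, v in grid_edges(m):
        degrees[u] = degrees.get(u, 0) + 1
        degrees[v] = degrees.get(v, 0) + 1
    return degrees


if __name__ == "__main__":
    m = 2 ** 3  # small stand-in for the 2^8 .. 2^13 image sizes
    print(sum(1 for _ in grid_edges(m)), "edges in an", m, "x", m, "grid")
```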
Provided below is a set of synthetic datasets with known truth partitions for use in the Stochastic Block Partitioning Graph Challenge.
2017 Streaming Partition Challenge Datasets with Known Truth Partitions | |||||
Name | Description | ||||
---|---|---|---|---|---|
Static Graphs (small) | Small Static Graphs with known truth for the Stochastic Block Partitioning Challenge | ||||
50 nodes | 100 nodes | 500 nodes | 1000 nodes | 5000 nodes | |
Static Graphs (large) | Large Static Graphs with known truth for the Stochastic Block Partitioning Challenge | ||||
20000 nodes | 50000 nodes | 500000 nodes | 2000000 nodes | 5000000 nodes | |
Streaming - Edge Sampling (small) | Small Streaming Graphs with known truth for the Stochastic Block Partitioning Challenge | ||||
500 nodes | 1000 nodes | 5000 nodes | |||
Streaming - Edge Sampling (large) | Large Streaming Graphs with known truth for the Stochastic Block Partitioning Challenge | ||||
20000 nodes | 50000 nodes | 500000 nodes | 2000000 nodes | 5000000 nodes | |
Streaming - Snowball Sampling (small) | Small Streaming Graphs with known truth for the Stochastic Block Partitioning Challenge | ||||
500 nodes | 1000 nodes | 5000 nodes | |||
Streaming - Snowball Sampling (large) | Large Streaming Graphs with known truth for the Stochastic Block Partitioning Challenge | ||||
20000 nodes | 50000 nodes | 500000 nodes | 2000000 nodes | 5000000 nodes |
2022 Streaming Partition Challenge Datasets with Known Truth Partitions. Each setting includes 8 different graph sizes (1K, 5K, 20K, 50K, 200K, 1M, 5M, 20M nodes). | ||||||||||
Name | Description | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Low Block Overlap and Low Block Size Variation Full Set | Low level of overlap and low level of size variation between blocks (easiest) | |||||||||
Static Graphs | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M | ||
Streaming Graphs - Edge Sampling | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M | ||
Streaming Graphs - Snowball Sampling | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M | ||
Low Block Overlap and High Block Size Variation Full Set | Low level of overlap but high level of size variation between blocks | |||||||||
Static Graphs | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M | ||
Streaming Graphs - Edge Sampling | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M | ||
Streaming Graphs - Snowball Sampling | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M | ||
High Block Overlap and Low Block Size Variation Full Set | High level of overlap but low level of size variation between blocks | |||||||||
Static Graphs | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M | ||
Streaming Graphs - Edge Sampling | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M | ||
Streaming Graphs - Snowball Sampling | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M | ||
High Block Overlap and High Block Size Variation Full Set | High level of overlap and high level of size variation between blocks (hardest) | |||||||||
Static Graphs | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M | ||
Streaming Graphs - Edge Sampling | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M | ||
Streaming Graphs - Snowball Sampling | 1K | 5K | 20K | 50K | 200K | 1M | 5M | 20M |
Graph data available in the Graph Challenge Amazon S3 bucket uses the following formats and conventions:
<dataset-name>_adj.tsv - (Row, Col, Value) tuples describing the adjacency matrix of the graph in tab-separated format. The adjacency matrix is of size Num_vertices x Num_vertices.
<dataset-name>_inc.tsv - (Row, Col, Value) tuples describing the incidence matrix of the graph in tab-separated format. The incidence matrix is of size Edges x Num_vertices (note that some authors refer to the transpose of this matrix).
<dataset-name>_adj.mmio - adjacency matrix of the graph in MMIO format
<dataset-name>_inc.mmio - incidence matrix of the graph in MMIO format
Details and readers for the MMIO format are available here: http://math.nist.gov/MatrixMarket/
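For quick inspection without an external reader, the sketch below parses the Matrix Market coordinate layout (a `%%MatrixMarket` banner, optional `%` comment lines, a size line, then one entry per line). It assumes the .mmio files use the coordinate format with real, integer, or pattern values; that is an assumption about the files, and the function name is hypothetical. Symmetric files are returned as stored, without expanding the mirrored entries. A full-featured reader such as `scipy.io.mmread` is likely a better choice for real work.

```python
def read_mm_coordinate(lines):
    """Parse Matrix Market coordinate text into (rows, cols, entries).

    entries is a list of (row, col, value) tuples with 1-based indices,
    exactly as stored in the file. Pattern matrices (no value column)
    get the value 1. Complex-valued files are not handled.
    """
    it = iter(lines)
    header = next(it).split()
    if len(header) < 5 or header[0] != "%%MatrixMarket" or header[2].lower() != "coordinate":
        raise ValueError("not a Matrix Market coordinate file")
    is_pattern = header[3].lower() == "pattern"

    size = None
    entries = []
    for line in it:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip comments and blank lines
        parts = line.split()
        if size is None:
            # First data line holds: rows, columns, number of stored entries.
            size = (int(parts[0]), int(parts[1]), int(parts[2]))
            continue
        value = 1 if is_pattern else float(parts[2])
        entries.append((int(parts[0]), int(parts[1]), value))

    if size is None:
        raise ValueError("missing size line")
    rows, cols, nnz = size
    if len(entries) != nnz:
        raise ValueError("entry count does not match the size line")
    return rows, cols, entries
```

Passing an open file object works because the function only iterates over lines.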
Indexing note: All matrices use 1-based indexing
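Because the files are 1-based, loading them into a 0-based language requires shifting every index by one. A minimal sketch in Python, assuming each line of an `_adj.tsv` file holds exactly three tab-separated fields (Row, Col, Value) and no header line; the function name is illustrative:

```python
from collections import defaultdict


def read_adjacency_tsv(lines):
    """Build a 0-based adjacency map from 1-based (Row, Col, Value) TSV lines.

    Returns a dict of dicts: adjacency[row][col] = value.
    """
    adjacency = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row, col, value = line.split("\t")
        # Files are 1-based; shift to Python's 0-based indexing.
        adjacency[int(row) - 1][int(col) - 1] = float(value)
    return adjacency


# Example with in-memory lines; a real file can be passed as an open handle,
# e.g. `with open("amazon0302_adj.tsv") as f: adj = read_adjacency_tsv(f)`.
example = ["1\t2\t1\n", "2\t1\t1\n", "3\t1\t1\n"]
adj = read_adjacency_tsv(example)
print(sorted(adj[0]))  # neighbors of the first vertex, 0-based
```

For the larger graphs a dict of dicts will use a lot of memory, so a compressed sparse format is probably a better fit there.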
Naming conventions for files provided with each SNAP dataset are as follows:
- Tab Separated Values – edge list, adjacency (_adj) and incidence (_inc) matrices.
  s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX].tsv
  s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_adj.tsv
  s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_inc.tsv
- Matrix Market I/O – edge list, adjacency (_adj) and incidence (_inc) matrices.
  s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX].mmio
  s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_adj.mmio
  s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_inc.mmio
The format of URLs for the synthetic Graph500 data is:
- Tab Separated Values – edge list, adjacency (_adj) and incidence (_inc) matrices.
  s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16.tsv
  s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_adj.tsv
  s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_inc.tsv
- Matrix Market I/O – edge list, adjacency (_adj) and incidence (_inc) matrices.
  s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16.mmio
  s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_adj.mmio
  s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_inc.mmio
Files can be individually retrieved using a web browser or command-line tools using the URL scheme for Amazon S3 buckets, for example:
Adjacency Matrix – Tab Separated Values (amazon0302)
https://graphchallenge.s3.amazonaws.com/snap/amazon0302/amazon0302_adj.tsv
Adjacency Matrix – Matrix Market I/O (Synthetic Graph500 network, scale 24)
https://graphchallenge.s3.amazonaws.com/synthetic/graph500-scale24-ef16/...
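The naming conventions above are regular enough to generate URLs programmatically. The following Python sketch builds both the s3:// and HTTPS forms from a dataset name; the helper names (`dataset_path`, `https_url`, `s3_url`, `graph500_name`) are hypothetical, and the sketch only encodes the patterns listed on this page, so it does not check that a given file actually exists in the bucket.

```python
HTTPS_BASE = "https://graphchallenge.s3.amazonaws.com"
S3_BASE = "s3://graphchallenge"


def dataset_path(name, kind="", fmt="tsv", collection="snap"):
    """Return the bucket-relative path for one file of a dataset.

    kind: "" for the edge list, "adj" for adjacency, "inc" for incidence.
    fmt: "tsv" or "mmio".
    collection: "snap" or "synthetic", the two prefixes documented above.
    """
    if kind not in ("", "adj", "inc"):
        raise ValueError(f"unknown kind: {kind!r}")
    if fmt not in ("tsv", "mmio"):
        raise ValueError(f"unknown format: {fmt!r}")
    suffix = f"_{kind}" if kind else ""
    return f"{collection}/{name}/{name}{suffix}.{fmt}"


def https_url(name, kind="", fmt="tsv", collection="snap"):
    return f"{HTTPS_BASE}/{dataset_path(name, kind, fmt, collection)}"


def s3_url(name, kind="", fmt="tsv", collection="snap"):
    return f"{S3_BASE}/{dataset_path(name, kind, fmt, collection)}"


def graph500_name(scale):
    """Dataset name for a synthetic Graph500 graph, e.g. graph500-scale18-ef16."""
    return f"graph500-scale{scale}-ef16"


print(https_url("amazon0302", "adj", "tsv"))
print(s3_url(graph500_name(18), "inc", "mmio", collection="synthetic"))
```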
Using either the AWS CLI tools (awscli) or AWS SDK:
To view all available files in the 'graphchallenge' bucket:
aws s3 ls s3://graphchallenge/
To download a particular dataset from Amazon S3 to local disk:
aws s3 cp s3://graphchallenge/friendster/ ./friendster/ --recursive
Datasets may also be downloaded one file at a time using the HTTPS URL scheme outlined above.
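For scripted single-file downloads over HTTPS, a streaming copy avoids holding a multi-gigabyte file in memory. The sketch below uses only the Python standard library; the function name and chunk size are illustrative choices, and it assumes the file is publicly readable at the given URL.

```python
import shutil
import urllib.request


def download(url, destination, chunk_size=1 << 20):
    """Stream the contents of url to destination in chunk_size-byte pieces."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return destination


# Example call (commented out to avoid an accidental large download):
# download(
#     "https://graphchallenge.s3.amazonaws.com/snap/amazon0302/amazon0302_adj.tsv",
#     "amazon0302_adj.tsv",
# )
```

For whole datasets, the `aws s3 cp ... --recursive` command shown earlier is likely more convenient than looping over individual URLs.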