Amazon is making the Graph Challenge data sets available to the community free of charge as part of the AWS Public Data Sets program. The data is being presented in several file formats, and there are a variety of ways to access it.
Data is available in the ‘graphchallenge’ Amazon S3 bucket (https://graphchallenge.s3.amazonaws.com).
Anonymized Network Sensing Graph Challenge
Official 2024 Anonymized Network Sensing Challenge
Synthetic pcap files created from GraphBLAS matrices generated with randomized source and destination addresses, using static constants for all other fields in the IP and TCP headers.
Synthetic Sparse Deep Neural Network data for the Sparse DNN Graph Challenge
Official 2019 Sparse Deep Neural Network Challenge
Synthetic DNNs created using RadiX-Net with varying numbers of neurons and layers. Truth categories for MNIST are included for performing inference using DNNs with specific numbers of layers.
Sparse DNNs generated using interpolated sparse versions of images in the MNIST corpus, resized to produce neural networks of varying dimensions.
Real and Synthetic Data for the Static Graph Challenge: Subgraph Isomorphism
Real-world graphs from Stanford’s Large Network Dataset Collection (https://snap.stanford.edu/data/) as well as synthetic data at various scales generated using the scalable Graph500 Kronecker generator (http://www.graph500.org/specifications#sec-3_3) are being provided.
Each of the SNAP datasets is provided in both TSV (Tab-Separated Values) and MMIO (Matrix Market I/O) formats. You can access any desired file directly by crafting an HTTPS or AWS CLI URL using the URL suffixes and instructions below.
A CSV file with metadata about the SNAP datasets below is available here: SNAP Metadata
Metadata includes the number of edges, nodes, and triangles.
SNAP Datasets
Source for the synthetic Kronecker graph generator is available: Kronecker Graphs
Metadata table for exact Kronecker graphs is available: Kronecker Graphs (2018 March 16)
Exact Kronecker graph generator paper: Design, Generation, and Validation of Extreme Scale Power-Law Graphs
Synthetic Kronecker Graphs with many triangles:
Synthetic Kronecker Graphs with some triangles:
Protein k-mer graphs generated using data from GenBank (https://www.ncbi.nlm.nih.gov/genbank/) are available below. Nodes of the graph represent segments of amino acids.
Protein k-mer graphs
MAWI Working Group Traffic Archive (http://mawi.wide.ad.jp/mawi/): The MAWI (Measurement and Analysis on the WIDE Internet) Working Group has carried out network traffic measurement, analysis, evaluation, and verification since the beginning of the WIDE Project. The graphs provided here were generated from packet trace data from the WIDE backbone maintained by the MAWI Working Group.
MAWI Datasets
Synthetic Datasets
Synthetic Data for the Streaming Graph Challenge: Stochastic Block Partition
Provided below is a set of synthetic datasets generated as MxM images, where M = 2^n for n = 8, 9, 10, 11, 12, 13. Each pixel in the image is treated as a node in the graph and is connected to its 8 neighbors by undirected edges. Pixels on the boundary have fewer neighbors: 3 at a corner and 5 along an edge.
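The construction described above can be sketched in a few lines. This is only an illustration, not the generator used to produce the published files: the function name `grid_edges` and the row-major, 1-based pixel numbering are assumptions made here. Each undirected edge is emitted once, and in this sketch corner pixels end up with 3 neighbors, other boundary pixels with 5, and interior pixels with 8.

```python
from typing import Iterator, Tuple


def grid_edges(m: int) -> Iterator[Tuple[int, int]]:
    """Yield each undirected edge (u, v), with u < v, of an m x m 8-neighbor pixel grid.

    Pixels are numbered row-major starting at 1 (an assumption of this sketch).
    """
    # Only "forward" offsets are visited; the other four directions of the
    # 8-neighborhood are covered because the edges are undirected.
    forward = [(0, 1), (1, -1), (1, 0), (1, 1)]
    for r in range(m):
        for c in range(m):
            for dr, dc in forward:
                rr, cc = r + dr, c + dc
                if 0 <= rr < m and 0 <= cc < m:
                    yield (r * m + c + 1, rr * m + cc + 1)
```

For an m x m image this produces 2(m-1)(2m-1) edges in total: 2m(m-1) horizontal and vertical edges plus 2(m-1)^2 diagonal ones.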
Provided below is a set of synthetic datasets with known truth partitions for use in the Stochastic Block Partitioning Graph Challenge.
2017 Streaming Partition Challenge Datasets with Known Truth Partitions
2022 Streaming Partition Challenge Datasets with Known Truth Partitions
(These datasets have been used for the streaming partition challenge since 2018)
Each setting includes 8 different graph sizes (1K, 5K, 20K, 50K, 200K, 1M, 5M, 20M nodes)
Graph data available in the Graph Challenge Amazon S3 bucket uses the following formats and conventions:
<dataset-name>_adj.tsv
(Row, Col, Value) tuples describing the adjacency matrix of the graph in tab-separated format.
Adjacency matrix is of size Num_vertices x Num_vertices
<dataset-name>_inc.tsv
(Row, Col, Value) tuples describing the incidence matrix of the graph in tab-separated format.
Incidence matrix is of size Num_edges x Num_vertices (note that some authors use the transpose of this convention)
<dataset-name>_adj.mmio - adjacency matrix of the graph in MMIO format
<dataset-name>_inc.mmio - incidence matrix of the graph in MMIO format
Details and readers for the MMIO format are available here: http://math.nist.gov/MatrixMarket/
Indexing note: All matrices use 1-based indexing
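As a rough illustration of the TSV convention above, the following sketch reads (Row, Col, Value) lines and shifts the 1-based indices to the 0-based indices most Python libraries expect. The helper names `read_tsv_triples` and `num_vertices` are hypothetical, and values are parsed as floats on the assumption that every value is numeric.

```python
import csv
from typing import Dict, Iterable, Tuple

Entries = Dict[Tuple[int, int], float]


def read_tsv_triples(lines: Iterable[str]) -> Entries:
    """Parse tab-separated (row, col, value) lines into a dict keyed by 0-based (row, col)."""
    entries: Entries = {}
    for rec in csv.reader(lines, delimiter="\t"):
        if not rec:  # skip blank lines
            continue
        row, col, value = int(rec[0]), int(rec[1]), float(rec[2])
        entries[(row - 1, col - 1)] = value  # files are 1-based
    return entries


def num_vertices(entries: Entries) -> int:
    """Lower bound on Num_vertices for an adjacency file: largest index seen, plus one."""
    return 1 + max(max(r, c) for r, c in entries) if entries else 0
```

A file object can be passed directly, for example `read_tsv_triples(open("amazon0302_adj.tsv", newline=""))`. Vertices that appear in no edge are not counted by `num_vertices`, so treat its result as a lower bound.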
Naming conventions for files provided with each SNAP dataset are as follows:
- Tab Separated Values – edge list, adjacency (_adj) and incidence (_inc) matrices.
s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX].tsv
s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_adj.tsv
s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_inc.tsv
- Matrix Market I/O – edge list, adjacency (_adj) and incidence (_inc) matrices.
s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX].mmio
s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_adj.mmio
s3://graphchallenge/snap/[URL_SUFFIX]/[URL_SUFFIX]_inc.mmio
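The naming convention above is regular enough to generate programmatically. A minimal sketch, assuming a hypothetical helper named `snap_urls` and that every SNAP dataset follows the pattern exactly:

```python
from typing import List


def snap_urls(url_suffix: str) -> List[str]:
    """Return the six s3:// URLs listed above for one SNAP dataset."""
    base = f"s3://graphchallenge/snap/{url_suffix}/{url_suffix}"
    return [
        f"{base}{matrix}.{ext}"
        for ext in ("tsv", "mmio")          # file format
        for matrix in ("", "_adj", "_inc")  # edge list, adjacency, incidence
    ]
```

For example, `snap_urls("amazon0302")` lists the three TSV files first, followed by the three MMIO files.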
The format of URLs for the synthetic Graph500 data is:
- Tab Separated Values – edge list, adjacency (_adj) and incidence (_inc) matrices.
s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16.tsv
s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_adj.tsv
s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_inc.tsv
- Matrix Market I/O – edge list, adjacency (_adj) and incidence (_inc) matrices.
s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16.mmio
s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_adj.mmio
s3://graphchallenge/synthetic/graph500-scale[SCALE]-ef16/graph500-scale[SCALE]-ef16_inc.mmio
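A single Graph500 URL can be assembled from the scale, the matrix type, and the file format. The sketch below is illustrative: `graph500_url` and its parameter names are hypothetical, and it does not check whether a given scale is actually present in the bucket.

```python
BUCKET = "s3://graphchallenge"
_MATRIX_SUFFIX = {"edges": "", "adj": "_adj", "inc": "_inc"}
_FORMATS = ("tsv", "mmio")


def graph500_url(scale: int, matrix: str = "edges", fmt: str = "tsv") -> str:
    """Build the s3:// URL for one synthetic Graph500 file.

    matrix: "edges" (edge list), "adj" (adjacency), or "inc" (incidence).
    fmt:    "tsv" or "mmio".
    """
    if matrix not in _MATRIX_SUFFIX:
        raise ValueError(f"unknown matrix type: {matrix!r}")
    if fmt not in _FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    name = f"graph500-scale{scale}-ef16"
    return f"{BUCKET}/synthetic/{name}/{name}{_MATRIX_SUFFIX[matrix]}.{fmt}"
```

Rejecting unknown matrix types and formats early keeps a typo from turning into a confusing "not found" error at download time.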
Files can be individually retrieved using a web browser or command-line tools using the URL scheme for Amazon S3 buckets, for example:
Adjacency Matrix – Tab Separated Values (amazon0302)
https://graphchallenge.s3.amazonaws.com/snap/amazon0302/amazon0302_adj.tsv
Adjacency Matrix – Matrix Market I/O (Synthetic Graph500 network, scale 24)
https://graphchallenge.s3.amazonaws.com/synthetic/graph500-scale24-ef16/…
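For scripted retrieval, the same HTTPS URLs can be fetched with the Python standard library. This is a sketch under stated assumptions: the helper names `s3_to_https` and `download` are hypothetical, the s3:// to HTTPS mapping is inferred from the example URLs above, and the files are assumed to be readable without credentials.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

BUCKET_HOST = "graphchallenge.s3.amazonaws.com"


def s3_to_https(s3_url: str) -> str:
    """Translate s3://graphchallenge/<key> into the equivalent HTTPS URL."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or parsed.netloc != "graphchallenge":
        raise ValueError(f"not a graphchallenge S3 URL: {s3_url!r}")
    return f"https://{BUCKET_HOST}{parsed.path}"


def download(https_url: str, dest_dir: str = ".") -> Path:
    """Stream one file to dest_dir and return the local path."""
    dest = Path(dest_dir) / Path(urlparse(https_url).path).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large graphs are never held fully in memory.
    with urllib.request.urlopen(https_url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

A typical call would be `download(s3_to_https("s3://graphchallenge/snap/amazon0302/amazon0302_adj.tsv"), "data")`.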
Using either the AWS CLI tools (awscli) or AWS SDK:
To view all available files in the ‘graphchallenge’ bucket:
aws s3 ls s3://graphchallenge/
To download a particular dataset from Amazon S3 to local disk:
aws s3 cp s3://graphchallenge/friendster/ ./friendster/ --recursive
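The recursive copy above could be approximated with the AWS SDK for Python (boto3). Treat this as a sketch rather than a reference implementation: `download_prefix` and `local_path_for` are hypothetical helpers, the `friendster/` prefix is taken from the CLI example, and unsigned (anonymous) requests are an assumption about how the bucket is configured. If anonymous listing is not permitted, create the client with your own credentials instead.

```python
from pathlib import Path, PurePosixPath


def local_path_for(key: str, prefix: str, dest_dir: str) -> Path:
    """Map an object key under `prefix` to a file path under `dest_dir`."""
    relative = PurePosixPath(key).relative_to(PurePosixPath(prefix))
    return Path(dest_dir).joinpath(*relative.parts)


def download_prefix(prefix: str, dest_dir: str, bucket: str = "graphchallenge") -> None:
    """Download every object whose key starts with `prefix` (a directory-like prefix)."""
    # Imported lazily so the path helper above works without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Assumption: the public bucket accepts unsigned requests.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip directory placeholder objects
                continue
            target = local_path_for(key, prefix, dest_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, key, str(target))
```

Calling `download_prefix("friendster/", "./friendster")` is intended to mirror the CLI command above.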
Datasets may also be downloaded one file at a time using the HTTPS URL scheme outlined above.