The cisco-url-names-2014-12 dataset consists of 14 files. They were created by applying Cisco's ndn-trace-script (extract_urls.sh) to HTTP URL traces obtained from IRCache. The full dataset comprises 13'549'129 URL content names.
NDN Trace Script Copyright (c) 2012-2013 by Cisco Systems, Inc. All rights reserved. Written by Ashok Narayanan and Won So
This software suite provides Perl scripts that can be used to translate HTTP URL traces into NDN names.
extract_urls.sh: This script reads the gzipped IRCache trace files in the current directory and converts them into plain text HTTP URLs. Each output file is named by appending ".urls" to the trace file name.
url2ccnf.pl: This script converts plain text files with HTTP URLs into CCNF (Common Componentized Name Format, described in another document) files, and at the same time generates the histogram of named components in the input files.
build_fib.pl: Given a set of names from CCNF files, this script builds a FIB name trace that satisfies a specific component name distribution.
ccnfdump.pl: This utility script decodes the names in a CCNF file and displays them in plain text.
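The first two steps described above (pulling URLs out of gzipped traces, then splitting each URL into name components) can be sketched in Python as follows. This is an illustrative sketch only, not the Cisco scripts themselves. It assumes that the traces use the Squid native access log layout with the URL in the seventh whitespace-separated field, that host labels are reversed and path segments appended to form the name, and that the histogram counts the number of components per name. The function names are hypothetical, and the binary CCNF encoding is not reproduced here.

```python
import gzip
from collections import Counter
from pathlib import Path
from urllib.parse import urlsplit

# Assumption: Squid native access.log layout, where the URL is the
# seventh whitespace-separated field (index 6).
URL_FIELD = 6


def extract_urls(trace_path):
    """Write the URL column of one gzipped trace to '<trace name>.urls'."""
    trace_path = Path(trace_path)
    out_path = trace_path.with_name(trace_path.name + ".urls")
    with gzip.open(trace_path, "rt", encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            fields = line.split()
            if len(fields) > URL_FIELD:  # skip short or malformed lines
                dst.write(fields[URL_FIELD] + "\n")
    return out_path


def extract_all(directory="."):
    """Process every *.gz file in a directory, in sorted order."""
    return [extract_urls(p) for p in sorted(Path(directory).glob("*.gz"))]


def url_to_components(url):
    """Split a URL into name components (assumed mapping, see lead-in).

    Host labels are reversed, then non-empty path segments are appended.
    Query strings and fragments are dropped in this sketch.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    host_components = [c for c in reversed(host.split(".")) if c]
    path_components = [c for c in parts.path.split("/") if c]
    return host_components + path_components


def component_count_histogram(urls):
    """Count how many names have 1, 2, 3, ... components."""
    return Counter(len(url_to_components(u)) for u in urls)
```

For example, calling extract_all(".") in a directory of gzipped traces writes one ".urls" file per trace, and the lines of those files can then be passed to component_count_histogram to see how name lengths are distributed.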
For more details, refer to the comments in the script source files and to the paper based on the data generated by these scripts: Won So, Ashok Narayanan, and David Oran, "Named data networking on a router: fast and DoS-resistant forwarding with hash tables," in Proceedings of the 2013 ACM/IEEE Ninth Symposium on Architectures for Networking and Communications Systems, Oct. 2013.
HTTP URL traces can be obtained from independent sources, for example the IRCache trace at ftp://ircache.net/Traces/DITL-2007-01-09