Internet2 Network flow information and Abilene Network flow information

  * NetFlow Version 5 (so IPv4 only)
  * Packets sampled 1/100
  * Source-AS specified in flow records
  * Non-multicast IPv4 addresses anonymized by zeroing the last 11 bits
  * All dates are UTC (GMT)

Flow information is collected from the network by our Juniper T-640
routers. Interfaces are sampled 1/100, with a maximum rate of 7000
packets per second (pps) going to the route processor that records the
flows (this limit protects the route processor). The net effect is that,
unless a router is very busy, 1 in every 100 packets is examined. During
busy times or under denial-of-service conditions, however, the sampling
may be sparser.

Before any flow hits disk, it is anonymized by zeroing the last 11 bits
of any non-multicast IPv4 address.

Currently, the routers are not able to record flow information for IPv6.

The routers are configured with "autonomous-system-type origin", so the
AS numbers in the flow records are origin ASes rather than peer ASes.

The NetFlow data collected from the network is available using rsync
from netflow.internet2.edu. Rsync may be obtained from:

    http://samba.anu.edu.au/ftp/rsync/rsync.html

Flows are stored in flow-tools format. See:

    http://www.splintered.net/sw/flow-tools

The data for each router is stored as

    /flows/data/ROUTER/YYYY/YYYY-MM/YYYY-MM-DD/

where YYYY is the year, MM is the month, and DD is the day. All dates
and times are specified in UTC (GMT).

Each file contains 5 minutes of flow exports, so there are 288 files per
day per router.

The flow files are populated periodically throughout the day by pulling
them with rsync from the collectors located in the Internet2 Network
PoPs. At 01:00 GMT, a final rsync with checksums enabled is run for the
previous day's files. If this command completes successfully, the file

    /flows/data/ROUTER/YYYY/YYYY-MM/YYYY-MM-DD.synch

is created.

Logs for each day are stored as /flows/logs/YYYY/YYYY-MM/YYYY-MM-DD/.
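
As a rough sketch of the layout described above, the following POSIX
shell helpers build the data directory, .synch marker, and log directory
paths for a given router and day. The function names are hypothetical
and are not part of any Internet2 or flow-tools tooling. The paths are
shown as they appear on the server.

```shell
#!/bin/sh
# Hypothetical helpers (illustration only): build the per-day paths
# described in this document. Dates are YYYY-MM-DD, in UTC.

# flow_day_dir ROUTER YYYY-MM-DD
# Prints /flows/data/ROUTER/YYYY/YYYY-MM/YYYY-MM-DD
flow_day_dir() {
    router=$1
    day=$2
    year=${day%%-*}    # strip everything from the first "-": YYYY
    month=${day%-*}    # strip the trailing "-DD": YYYY-MM
    printf '/flows/data/%s/%s/%s/%s\n' "$router" "$year" "$month" "$day"
}

# flow_synch_marker ROUTER YYYY-MM-DD
# Prints the marker file created once the final checksummed rsync succeeds.
flow_synch_marker() {
    printf '%s.synch\n' "$(flow_day_dir "$1" "$2")"
}

# flow_log_dir YYYY-MM-DD
# Prints /flows/logs/YYYY/YYYY-MM/YYYY-MM-DD/
flow_log_dir() {
    day=$1
    printf '/flows/logs/%s/%s/%s/\n' "${day%%-*}" "${day%-*}" "$day"
}

# files_per_day
# Prints the number of 5-minute files expected per router per day.
files_per_day() {
    echo $(( 24 * 60 / 5 ))
}

flow_day_dir KANS 2008-01-01
flow_synch_marker KANS 2008-01-01
flow_log_dir 2008-01-01
files_per_day
```

A collection script could use the marker path to decide whether a day's
files are final before processing them, and the file count as a sanity
check on what was transferred.
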
In the log directory are dumps of the SNMP ifAlias and ifName tables (as
ifAlias.ROUTER and ifName.ROUTER), among other housekeeping files. The
ifAlias table, in particular, is useful for determining which interface
an ifIndex in a flow record represents, since it contains the text
interface description. An auto-generated flow-nfilter config file that
has the backbone interface definitions and other interesting filters is
available in the file nfilter.

As of 2008-Feb, the current set of router names is:

    ATLA CHIC HOUS KANS LOSA NEWY SALT STTLng WASH

This set has been stable since 2007-Oct. Prior to that, we were
migrating from the Abilene network to the Internet2 Network, which
reduced the number of routers and moved them physically to new spaces.
The migration began in 2006-Sep. The stable set of Abilene routers was:

    ATLA-M5 ATLAng CHINng DNVRng HSTNng IPLSng KSCY-M5 KSCYng LOSAng
    NYCMng SNVAng STTLng WASHng

ATLA-M5 and KSCY-M5 were aggregators of ATM connections. Note that
STTLng is common to both sets: the router did not move, but its peers
did.

Rsync is configured with a module named flows to allow access to /flows.
This module is password protected, and the rsync daemon is IP address
restricted. You must have a valid username/password and connect from an
allowed IP address.

First, store your password in a file. For example, if your password is
test:

    % echo test > rsync.passwd; chmod 600 rsync.passwd

Next, invoke rsync as follows to list all files for the KANS router on
2008-01-01. Replace 'USERNAME' with your username.

    % rsync --password-file ./rsync.passwd -v -n -a USERNAME@netflow.internet2.edu::flows/data/KANS/2008/2008-01/2008-01-01

Remove the -n flag and add a destination path at the end of the rsync
command to transfer the files:

    % mkdir KANS
    % rsync --password-file ./rsync.passwd -v -a USERNAME@netflow.internet2.edu::flows/data/KANS/2008/2008-01/2008-01-01 KANS
    % ls
    KANS/ rsync.passwd

A quick example to look at the data:

    % flow-cat KANS/2008-01-01 | flow-print | head -3
    srcIP            dstIP            prot  srcPort  dstPort  octets      packets
    128.59.16.0      131.179.112.0    6     2128     41029    1420        1
    128.59.16.0      131.179.112.0    6     2128     40547    52          1

Export in ASCII format:

    % flow-cat KANS/2008-01-01 | flow-export -f2 | head -3
    #:unix_secs,unix_nsecs,sysuptime,exaddr,dpkts,doctets,first,last,engine_type,engine_id,srcaddr,dstaddr,nexthop,input,output,srcport,dstport,prot,tos,tcp_flags,src_mask,dst_mask,src_as,dst_as
    1199145602,0,1657978230,127.0.0.1,1,1420,1657912977,1657912977,0,0,128.59.16.0,131.179.112.0,64.57.28.56,95,39,2128,41029,6,0,24,16,16,14,52
    1199145602,0,1657978230,127.0.0.1,1,52,1657879539,1657879539,0,0,128.59.16.0,131.179.112.0,64.57.28.56,95,39,2128,40547,6,0,16,16,16,14,52

Automatic gathering of data will usually work with yesterday's data set.
An easy way to calculate 'yesterday' is with the -v option to date. Note
that -v is a BSD date option (GNU date uses -d instead); -u keeps the
calculation in UTC, matching the directory names. For example:

    #!/bin/sh

    # Default to yesterday; otherwise take a date(1) -v adjustment
    # such as "-2d" as the first argument.
    if [ "X$1" = "X" ]; then
        RUNDAY="-1d"
    else
        RUNDAY=$1
    fi

    # Year, month, and day of the run day, in UTC
    YY=`date -u -v "$RUNDAY" '+%Y'`
    MM=`date -u -v "$RUNDAY" '+%m'`
    DD=`date -u -v "$RUNDAY" '+%d'`

    # Relative directory (YYYY/YYYY-MM/YYYY-MM-DD) and plain date
    YYMMDD_DIR_RUNDAY="$YY/$YY-$MM/$YY-$MM-$DD"
    YYMMDD_RUNDAY="$YY-$MM-$DD"

See the documentation at http://www.splintered.net/sw/flow-tools/docs
for information on filtering flows (flow-nfilter) and creating reports
(flow-report).

[[ Internal note: when adding folks, update /etc/hosts.allow,
   /etc/rsyncd.secrets, and /etc/rsyncd.conf ]]

Last Update: 11-Feb-2008