DonGould:
Ok, 11 terms there that I can only guess at, and others not in the clique would have no idea about...
So if you have a second, as we've had over 6,000 views on this thread in such a short time, would you mind just expanding those terms to something that people can Google the meaning of?
Not really - it's beyond both the scope of this thread and my spare time to go into details about router and forwarding architecture. You can either take my comments at face value as someone who designs and operates broadband networks, or you can challenge me on them if you want - but be prepared to go and do your own research if that's what tickles your fancy.
DonGould: With respect to the section I underlined, can you please either link an explanation or further explain what sampling means in this instance?
http://en.wikipedia.org/wiki/Netflow#Sampled_NetFlow
http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/12s_sanf.html
For two examples.
DonGould: How is it done?
Does it, for example, just check every 1000th packet header and just assume all the others are the same size and from the same locations?
The routers will only look at 1 packet in every 1000 and add its data to the flow cache, chosen either randomly or deterministically depending on the platform and implementation. It doesn't make any inferences about any other data, just about the 1/1000 (or 1/250, or 1/100, or whatever you configure) packets that are actually sampled. The rest of the packets are ignored.
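As a rough illustration of the above (a sketch only, not any vendor's implementation), the Python below picks 1 in n packets either deterministically or randomly and adds only those to a flow cache. The packet fields (src, dst, proto, size) and the function names are made up for the example, and real platforms key flows on more fields than this.

```python
import random
from collections import defaultdict

def sample_packets(packets, n, mode="deterministic", seed=0):
    """Yield roughly 1 in n packets; everything else is skipped entirely."""
    rng = random.Random(seed)
    for i, pkt in enumerate(packets):
        if mode == "deterministic":
            take = (i % n == 0)            # every n-th packet
        else:
            take = (rng.randrange(n) == 0)  # each packet has a 1/n chance
        if take:
            yield pkt

def build_flow_cache(sampled):
    """Aggregate sampled packets into per-flow packet and byte counters."""
    cache = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in sampled:
        key = (pkt["src"], pkt["dst"], pkt["proto"])  # simplified flow key
        cache[key]["packets"] += 1
        cache[key]["bytes"] += pkt["size"]
    return dict(cache)

# 5000 identical 1500-byte packets, sampled at 1/1000
packets = [{"src": "10.0.0.1", "dst": "192.0.2.1", "proto": 6, "size": 1500}
           for _ in range(5000)]
cache = build_flow_cache(sample_packets(packets, 1000))
```

With deterministic 1/1000 sampling the cache ends up with a single flow showing 5 packets and 7500 bytes, even though 5000 packets went past. Nothing is recorded about the other 4995.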
DonGould: If you're running your flows at 1/1000th, what is the value of that data at all? Why would you even bother to collect it at that resolution? What would it tell you? What would you be looking for?
Typically just for statistical information for network planning; in some cases it is also used to look for malicious traffic (e.g. to feed Arbor) or unusual traffic patterns.
E.g. many operators use sampled netflow on their border routers to analyze which networks they need to pursue peering with, based on destination AS data.
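The peering analysis is basically a top-N by destination AS. A minimal sketch, assuming flow records have already been exported and parsed into dicts with dst_as and bytes fields (hypothetical names, as is the function), and using AS numbers from the range reserved for documentation:

```python
from collections import Counter

def top_destination_as(flow_records, sampling_rate, top_n=3):
    """Rank destination ASes by estimated bytes.

    Sampled byte counts are multiplied by the sampling rate to get a
    rough estimate of real volume. Good enough for planning, not billing.
    """
    totals = Counter()
    for rec in flow_records:
        totals[rec["dst_as"]] += rec["bytes"] * sampling_rate
    return totals.most_common(top_n)

records = [
    {"dst_as": 64496, "bytes": 1500},
    {"dst_as": 64497, "bytes": 500},
    {"dst_as": 64496, "bytes": 1000},
]
ranking = top_destination_as(records, sampling_rate=1000, top_n=2)
# ranking == [(64496, 2500000), (64497, 500000)]
```

The ASes at the top of that list are the ones where a peering session would shift the most traffic.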
DonGould:
Ok, cool, that's your view, and yet many providers do use this technology in their billing, so how do you suggest they should be doing it?
They can talk to their vendors or systems integrators, but there are many better implementations, usually based on L4 classification into policers or queues that account accurately for each subscriber session, or on DPI boxes, or on PCMM in the cable world, and so on.
Anyone using netflow for billing is likely going to get it wrong, but it should always under-report, not over-report.
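To put rough numbers on that, here is a toy comparison between an exact per-subscriber counter (every packet counted) and a 1-in-n sampled count. It assumes the sampled byte counts are used raw and not multiplied back up by the sampling rate; that is an assumption for the example, as are the subscriber names and packet sizes.

```python
from collections import Counter

def exact_usage(packets):
    """Count every packet per subscriber, as a per-session counter would."""
    usage = Counter()
    for subscriber, size in packets:
        usage[subscriber] += size
    return usage

def sampled_usage(packets, n):
    """Count only every n-th packet; the rest are never seen."""
    usage = Counter()
    for i, (subscriber, size) in enumerate(packets):
        if i % n == 0:
            usage[subscriber] += size
    return usage

# Two subscribers with interleaved traffic, 500 packets each
packets = [("sub-a", 1500), ("sub-b", 64)] * 500
exact = exact_usage(packets)
sampled = sampled_usage(packets, 100)
```

Here the exact counters give 750000 bytes for sub-a and 32000 for sub-b, while the sampled counters give 15000 for sub-a and nothing at all for sub-b, because every sampled position happened to land on a sub-a packet. The raw sampled figure can never exceed the real one, but it is no basis for an invoice.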



