Kraken POC Overview and Summary
EDMXX-XX – Version 1.6
Author: Kraken POC Team / Endace
©2013 Endace Technology Ltd. All Rights Reserved – Confidential – External distribution prohibited – Internal distribution restricted.
Table of Contents

1 Introduction
2 Customer User Stories
  2.1 Rob and the Audit Dept
  2.2 Rob and the EU
  2.3 Steve and Diagnosing a Network Problem
  2.4 Rob and Adding Storage
  2.5 Rob Again and US Legal Hassles
  2.6 A Friendly Government Agency
3 Three Possible Competitive Advantages
4 Design Analysis
  4.1 Competitive Technologies
    4.1.1 Probe
    4.1.2 High End SAN
    4.1.3 Low End SAN or NAS
  4.2 Hard Drive Reliability
  4.3 Flash
  4.4 Load Balancing / Overall Balance
  4.5 Intelligent Retention
  4.6 Fault Tolerance
5 Identified Key Technical Challenges
  5.1 Cost
  5.2 Mechanical Disk Packing
  5.3 Cooling
  5.4 Power Distribution
  5.5 Probe Interface (Packet and Query)
  5.6 Scalable Load Balancing
  5.7 Node Write/Read CPU and Disk Performance
  5.8 Robust Failure and Transient Handling
  5.9 Overprovision / Specified End of Life
6 POC Detailed Design Goals to Meet the Key Technical Challenges
  6.1 Form Factor / Heat / Vibration
  6.2 Packet Store
  6.3 Packet Query Performance
  6.4 Queries Supported in POC
  6.5 Probe CPU Loading
  6.6 Node CPU Loading
  6.7 Resilient Failure Mode
  6.8 Even Packet Store
  6.9 Packet Availability
7 Kraken POC Detailed Straw-man Design
  7.1 Disk Drive Choice
  7.2 Form Factor Choice
  7.3 The Ring / Adding and Subtracting Components
  7.4 Interconnect Topology
  7.5 Q2 FPGA Architecture – Packet Storage
  7.6 Tentacle FPGA Architecture – Packet Storage
  7.7 Tentacle FPGA Architecture – Query Return
  7.8 Q2 FPGA Architecture – Query Return
  7.9 Next Generation Probe Architecture
  7.10 Intelligent Load Balancing
  7.11 Packet Storage Flow
  7.12 Queries and Query Response
  7.13 Management Processes
    7.13.1 Startup / Boot
    7.13.2 Disk Failures
    7.13.3 Tentacle Failure
  7.14 Other Considerations
8 Kraken POC Phased Development Plan
  8.1 Cooling
  8.2 Chassis
  8.3 Initial Development Platform (IDP)
    8.3.1 IDP Introduction
    8.3.2 IDP Plan
  8.4 Ethernet Switch Options
  8.5 Final POC Platform
9 Appendix A: Kraken Rough Cost Breakdown – note: now out of date
10 Appendix B: Ethernet Rings
11 Open Questions
  11.1 Query Size
  11.2 Packet Sorting
  11.3 Arista Switch
  11.4 Text Search
12 Bibliography
Table of Figures:
Figure 1: Overall Kraken Topology
Figure 2: Q2/Probe Architecture
Figure 3: Packet Processing on a Kraken Tentacle
Figure 4: Query Return Process
Figure 5: Next generation Kraken-enabled Probe Architecture. Please note: this is a wild guess to provide a discussion framework. Beyond the “Generic Packet Storage Interface” nothing here is required for Kraken operation.
Figure 6: Initial POC Development Platform
Figure 7: Final POC Test Setup

Table of Tables:
Table 1: Basic Storage Numbers – Packet Capture Rate and History Length versus Storage Required
Table 2: Time to Query Completion as a function of Query Size
Table 3: Example queries for POC
Table 4: Possible Disk Drives
Table 5: Expected cost of Initial POC Development Platform
Revision History:

Revision   Date        Changes
0.1        13-5-2013   First Draft
0.2        14-5-2013   Filling in basic headings and ideas from
0.3        16-5-2013   Added some comments from
0.4        21-5-2013   Architecture ideas from
0.5        24-5-2013   Added input from Kraken meeting May 23rd, 2013
0.6        24-5-2013   Added FPGA Architecture text + diagrams
0.7        24-5-2013   Added ring stuff
0.8        24-5-2013   Costing stuff added
0.9        4-6-2013    New requirements based on discussions with
1.0        5-6-2013    New architecture added
1.1        10-6-2013   More details of new architecture
1.2        11-6-2013   Lots more from
1.3        12-6-2013   Some fixes
1.4        19-6-2013   Updates based on events since last Thursday
1.5        25-6-2013   Updates based on feedback from
1.6        1-7-2013    Split document into two and renamed this piece.
1 Introduction:
Kraken is a product aimed at solving the deep-storage problem faced by network analytics users. Like a SAN, Kraken includes large amounts of disk or other storage; unlike a SAN, Kraken works directly with packets and lets you search and retrieve them quickly. Some basic numbers on storage:
Table 1: Basic Storage Numbers – Packet Capture Rate and History Length versus Storage Required

Time      | 0.1 Gbps | 1 Gbps  | 4.8 Gbps | 10 Gbps | 24 Gbps | 40 Gbps
1 second  | 12 MB    | 128 MB  | 614 MB   | 1 GB    | 3 GB    | 5 GB
1 hour    | 45 GB    | 450 GB  | 2 TB     | 4 TB    | 10 TB   | 17 TB
1 day     | 1 TB     | 10 TB   | 50 TB    | 105 TB  | 253 TB  | 421 TB
7 days    | 7 TB     | 73 TB   | 354 TB   | 738 TB  | 1 PB    | 2 PB
1 month   | 29 TB    | 295 TB  | 1 PB     | 2 PB    | 6 PB    | 11 PB
1 year    | 354 TB   | 3 PB    | 16 PB    | 34 PB   | 83 PB   | 138 PB
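The entries in Table 1 follow directly from capture rate × history length, converting bits to bytes. A minimal sketch of the calculation (the table itself mixes decimal and binary rounding, so individual entries may differ slightly from this raw figure):

```python
def storage_bytes(rate_gbps: float, seconds: float) -> float:
    """Bytes of packet storage needed to hold `seconds` of history
    captured at a sustained `rate_gbps` (decimal units throughout;
    Table 1's own entries use mixed rounding)."""
    return rate_gbps * 1e9 / 8 * seconds

# 1 Gbps held for 1 hour -> 450 GB, matching Table 1
print(storage_bytes(1, 3600) / 1e9)   # 450.0

# 24 Gbps for 1 second -> 3 GB, matching Table 1
print(storage_bytes(24, 1) / 1e9)     # 3.0
```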
Three key goals for Kraken identified by CTO/Marketing:
1) Performance: always capable of X (20Gbps or whatever) with no “it depends”.
2) Query time: depends linearly on amount of data returned not amount of data searched.
3) “Zero Touch Maintenance”: just works without maintenance for a specified period.
Notes:
a) Query time – the statement above applies to queries on pre-indexed fields (e.g. the IP 5-tuple). Queries on non-indexed fields, e.g. searching for a text string, may depend on the amount of data searched.
b) “Zero Touch Maintenance” here refers to the ability to leave the Kraken unit alone, without any maintenance, for a specified period – for example, in a darkened datacentre. It does not refer to the basic “lights out” functionality provided by IPMI.
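Goal 2 can be read as a simple cost model: a fixed per-query overhead plus a per-byte cost on the data returned, with no term for the data searched. A hedged sketch of that model, where the one-second overhead and 10 Gbps return rate are placeholder numbers, not measured Kraken figures:

```python
def query_time_s(bytes_returned: float,
                 fixed_overhead_s: float = 1.0,
                 return_rate_bps: float = 10e9) -> float:
    """Goal 2 as a cost model: completion time grows linearly with the
    data *returned* and is independent of the data *searched*. Both
    default parameters are illustrative placeholders."""
    return fixed_overhead_s + bytes_returned * 8 / return_rate_bps

# Returning 1.25 GB over the placeholder 10 Gbps link adds one second
print(query_time_s(1.25e9))   # 2.0
```

The key property is that `bytes_searched` never appears as a parameter: a tiny query over a year of history should cost the same as a tiny query over an hour, which only holds for pre-indexed fields (note a).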
2 Customer User Stories:
2.1 Rob and the Audit Dept:
Rob has been informed by his audit department that he needs to store 3 months' worth of packets. His probes capture 1Gbps worth of traffic. He has a budget of $800k. As long as he can retrieve any block of packets from a certain time up to a month ago, he's happy.
2.2 Rob and the EU:
Rob has been told he needs to store packets on a time basis in order to meet EU data retention requirements. He needs to guarantee that all packets stored are kept for precisely the same length of time.
2.3 Steve and Diagnosing a Network Problem
Steve has our latest probe connected to a Kraken. He's monitoring four 10GbE links, but his overall traffic is bursty and averages around 24Gbps in total. He's a Tier 3 engineer trying to figure out why application A running on Server S is slow every Thursday at 4:33pm. It's Monday morning.
2.4 Rob and Adding Storage:
Rob has been told his Tier 3 engineers need longer packet storage in order to diagnose network issues more effectively. He currently has a probe with 24Gbps capture and 24TB of storage, which gives him roughly 2.5 hours of storage. He would like three days' worth of storage, and he would like to just plug something in and have it work, as he has no resources for any kind of software development.
2.5 Rob Again and US Legal Hassles:
The FBI has decreed that unencrypted data is the same as public data. So, if the FBI seizes a server in a cloud datafarm as a result of an ongoing legal case, any data on that server, even if it belongs to another virtual customer, is subject to search and possible litigation. This violates Rob's company's contract with its cloud customers, so in future all server storage will be encrypted. Rob wants to know whether Kraken and the Probe encrypt all storage too.
2.6 A Friendly Government Agency
An FGA has the encryption keys for a well-known chat program. They wish to decrypt all
packets sent by this program on a large network in the last 24 hours and look for the text string
“Domino’s Pizza”, as they have information suggesting this is the favourite pizza of international
terrorists.
3 Three Possible Competitive Advantages
What is Kraken’s likely competition? We can identify several competitors based on what we see our
customers doing. One competitor is ourselves – if buying another Probe with its storage is cheaper
or easier than buying Kraken, then that is what customers will do. Also, if buying a SAN or NAS and
connecting it to the probe is easier and cheaper, that is what they will do. Probe is already making this
path easier than it currently is, because Probe needs to compete with vendors who have SAN
integration. Presumably other vendors see the need that we see and are developing something
Kraken-like – which implies that we need to put some strong IP into Kraken, or we will be fighting a
production battle (which isn’t our forte) rather than a technical battle (which is).
So, cost is clearly going to be an important factor for customers – not just capital
expenditure (CAPEX) cost but also operational cost (OPEX) – such things as cost of power, cost of AC,
cost of renting rack space, continuing cost of upgrade/maintenance.
Performance is likely to be a big differentiator too – can the solution scale to 24Gbps?
40Gbps? A high-performance SAN capable of sinking 40Gbps is much more expensive than a
low-performance device. Where does Kraken sit here?
One architecture that has been discussed for Kraken is similar to RAID 0 – a bunch of cheap
commercial disks with no redundancy. This model is cheap and can be robust, but has the
issue that drive failures will occur and packets will be lost. That can be addressed by adding some
form of redundancy to Kraken, but at the expense of higher cost and presumably more complexity.
So packet availability – in other words, the probability that when a customer asks for a certain set
of packets he gets them – will be another major differentiator between Kraken and its
competitors.
Summary -> Kraken Key Differentiators:
COST
PERFORMANCE
PACKET AVAILABILITY
4 Design Analysis:
4.1 Competitive Technologies:
4.1.1 Probe:
48TB at $60k, 3U. Query performance with POI?
4.1.2 High End SAN:
4.1.3 Low End SAN or NAS:
NetApp E5460 list price $200k, ex-demo $70k, 60x3TB drives, 4U, dual controller with
dual 8Gbps FC. No performance numbers yet. Probe software has single-LUN SAN support but
without POI download – that presumably goes in EP5.2.
4.2 Hard Drive Reliability
Hard drives generally fail in two ways – they gradually develop sectors that lose the ability to
store data, and eventually they suffer a complete catastrophic failure (Pinheiro, 2007). We are
looking for an architecture that is low cost and high performance while having an acceptable packet
availability metric. Low cost implies that we use commercial high-density drives as opposed to
enterprise-rated drives. High performance implies that we don’t employ RAID strategies as such
(verify?). So we’re considering a large array of cheap perishable drives with little or no redundancy.
We can obviously expect a number of drives to fail completely within the expected lifetime of a
shipped unit. We need to answer the following questions: 1) How many extra drives do we need to
include in a Kraken so that we drop to the expected storage capacity only at end of life? 2) How
many extra drives do we need to include in a Kraken so that we drop to the expected performance
only at end of life? 3) What is the probability that a customer will not find a packet when they
request it? (Need assumptions here – something like 24Gbps traffic, 500 customers, each looking
for a 500MB packet trace once a week.) How does that compare with a low-end SAN? A high-end
SAN? For SANs we need to factor expected downtime into the equation.
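The availability question in 3) can be bounded with a toy model. The sketch below is a back-of-envelope calculation, not the POC’s actual availability analysis; the 5% annualized failure rate, the 16-drive trace spread, and the spares parameter are all assumptions (only the ~400-drive count comes from this document).

```python
# Back-of-envelope packet availability model (a sketch under assumed
# figures, not the POC's availability calculation).

def trace_availability(afr=0.05, drives=400, spares=0, drives_per_trace=16,
                       years_deployed=1.0):
    """Probability that a requested packet trace is fully retrievable.

    Model: each drive fails independently with annualized failure rate
    `afr`; with no redundancy, a trace is lost if any of the
    `drives_per_trace` drives holding its blocks has failed, after the
    first `spares` failures are absorbed by spare drives.
    """
    p_drive_dead = 1.0 - (1.0 - afr) ** years_deployed
    # Expected dead drives at the end of the period, net of spares.
    expected_dead = max(drives * p_drive_dead - spares, 0.0)
    p_block_on_dead_drive = expected_dead / drives
    # The trace survives only if every stripe sits on a live drive.
    return (1.0 - p_block_on_dead_drive) ** drives_per_trace
```

With these assumed numbers, availability without spares comes out well below any plausible customer-acceptable level, which is what motivates questions 1) and 2) about overprovisioning.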
4.3 Flash
It looks like cheap flash disks, or even all-flash storage, will be too late for the POC but may be
available for productization.
4.4 Load Balancing / Overall Balance
A single Probe / multiple Kraken configuration for storage capacity increase is required for the
POC. The multiple-Kraken, multiple-Probe case involves a complicated load balancing scenario and
will not be considered for the POC.
4.5 Intelligent Retention
We may wish to store different packets for different amounts of time. Intelligent retention
will increase the IO load on the disks and the CPU load running the retention policy. Numerous
questions exist for this option. We will not consider intelligent retention for the POC.
4.6 Fault Tolerance
We are looking for slow degradation, not outright failure. We need to survive 40G links being
disconnected, hard drives failing, and Krakens being removed, added and upgraded. When a probe is
capturing continuously, the packet data stored is perishable to some extent, as it will in general only
be needed or useful for a fixed amount of time. Thus our fault tolerance may be higher in some
areas than that of a system which stores data for long periods of time and which has a more absolute
requirement.
5 Identified Key Technical Challenges:
Based on the above discussion, we identify the following key technical challenges:
5.1 Cost
End user cost must be equal to or cheaper than a SAN/NAS solution of similar performance
and functionality. This includes initial purchase cost (CAPEX) and operational cost (OPEX), including
rack space rental, air conditioning and power costs.
5.2 Mechanical Disk Packing
Specifically, how do you pack a huge number of commercial spinning drives in as small a
space as possible? How do you deal with the vibration issues involved?
5.3 Cooling
Each node drive will generate roughly 3W of heat. Each tentacle will generate an additional
35W of heat. The entire Kraken is expected to generate 2.4kW. Removing this heat will be a serious
challenge.
5.4 Power Distribution
The power consumed in the chassis will be approximately 2400W. While this is not
enormous for a 6U chassis, it is still significant.
5.5 Probe Interface (Packet and Query)
How does the probe send packets and queries to the Kraken? How does Kraken reply?
How does the interface deal with packets lost due to congestion (should not happen), disk drive bad
sectors (will happen) or disk drive failure (will happen)? How do we deal with Krakens added to the
array? Krakens removed? Krakens starting up or shutting down? How can we guarantee a
response time that depends on query size but not storage size? Flow control in both directions?
Can the Probe re-sort a bunch of unordered packets arriving from multiple Krakens in a memory
hole? Does the Q2 card need to help with this sorting? How do we deal with requests that are
really large (bigger than a memory hole)? Does our API split requests?
5.6 Scalable Load Balancing
We need to be able to scale up in a) storage depth and b) packet rate, while holding query reply
time growth to at most ln(n). This implies the ability to add nodes to a Kraken, add more
Krakens to an existing Kraken, and add more probes to a Kraken system.
5.7 Node Write/Read CPU and Disk Performance
The node needs to a) write packets to disk, b) keep time and flowhash indexes for packets, c)
parse incoming requests, d) identify packets based on time and flowhash indexes that might match
incoming requests, e) read those packets off disk, f) parse those packets to verify which ones actually
do match the incoming request, and g) return those packets somehow to the 40G ring. We need to
verify that any particular architecture provides sufficient processing power for this.
5.8 Robust Failure and Transient Handling
Hard drive failures, both gradual and sudden. Addition of probes or Krakens, and removal of
same. We want a lights-out replacement method rather than an urgent maintenance method – i.e.
Kraken should keep running with redundant fail-over for a specified amount of time – six months
or a year.
5.9 Overprovision / Specified End of Life
We need to overprovision the box both in terms of storage space and performance in such a
way that we reach specified minimums at defined end of life.
6 POC Detailed Design Goals To Meet the Key Technical Challenges
The purpose here is to identify what exact goals the POC (proof-of-concept) must meet in
order to demonstrate that we have solved the key technical challenges.
6.1 Form Factor / Heat / Vibration:
We need to show that we can pack >300TB of 2.5” drives in ≤3U of vertical height, power
the drives while they run a representative read/write pattern, and keep the drives cool enough
(within allowed operating temperatures for that specific drive). The entire Kraken will take up 7U of
rack space {3U for drives, 3U for compute and 1U for a large Ethernet switch}.
6.2 Packet Store
We need to store generic Ethernet packets at 24Gbps continuously without dropping any.
6.3 Packet Query Performance
We must finish returning packets to probe main memory after a single query in a time given
by:

T = αn + L

where T is the total time in seconds from query sent to last packet returned, α is a
constant which must be less than 2.0e-9 seconds per byte, n is the number of bytes returned by the
query, and L is a time constant less than 500ms. Running multiple queries simultaneously will reduce
individual query performance. We will allow for at least 32 simultaneous queries. Example times are
given in the table below:

Table 2: Time to Query Completion as a function of Query Size

  Query Size    Time to Finish
  1 B           0.5 s
  1 kB          0.5 s
  1 MB          0.5 s
  1 GB          2.5 s
  1 TB          36 min
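The time bound in this section is easy to evaluate directly; a minimal sketch, treating the stated upper limits on α and L as exact values (an assumption made only so the arithmetic is concrete):

```python
# Query-completion-time budget from Section 6.3: T = alpha*n + L.
# alpha and latency below are the stated upper bounds, not measurements.

ALPHA = 2.0e-9   # seconds per byte returned (must be < 2.0e-9)
LATENCY = 0.5    # fixed latency term L in seconds (must be < 500 ms)

def query_time(n_bytes, alpha=ALPHA, latency=LATENCY):
    """Worst-case seconds from query sent to last packet returned."""
    return alpha * n_bytes + latency

for size in (1, 10**3, 10**6, 10**9, 2**40):
    print(f"{size:>14} B -> {query_time(size):.3f} s")
```

Note that the 36-minute figure for 1 TB corresponds to a binary terabyte (2^40 bytes ≈ 36.7 min at these limits); a decimal terabyte comes out closer to 33 minutes.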
6.4 Queries Supported in POC
All queries supported in the POC will be of the form n{fk,t1,t2}, where n is a number of triplets
with 1 <= n <= 256, fk is a flow key included in each packet in the flow hash location of a standard
0xe extension header, t1 is a time, t2 is a time, and any of fk, t1 or t2 may be replaced by *. Note that
much more complicated queries (such as text string search) are possible in the Kraken architecture
but they will not be demonstrated in the POC. The following table contains examples of queries
supported for the POC. Note also that there will be three basic forms of query message in the POC – the
first will return the number of packets and number of bytes that match the query, the second will
return the actual packets that match the query and the third will cancel/cleanup the query.
Table 3: Example queries for POC.

Example: {flowhash=0xABCDEF,t1=0s,t2=1s}
Translation: Find all packets with flowhash equal to 0xABCDEF between times 0 seconds and 1 second inclusive.
Supported in POC? Yes

Example: {flowhash=*,t1=1s,t2=1.5s} or {flowhash=0x123456,t1=*,t2=*} or {flowhash=0x345678,t1=2s,t2=*}
Translation: Find all packets with any flowhash between times 1s and 1.5s, all packets with flowhash equal to 0x123456 at any time, and all packets with flowhash equal to 0x345678 between time 2s and the current time.
Supported in POC? Yes

Example: {flowhash=*,t1=*,t2=10s}
Translation: Find all packets with any flowhash between the earliest time for which you have packets and the time of 10s.
Supported in POC? Yes

Example: Text_string=”Domino’s Pizza”
Translation: Find any packets containing this string.
Supported in POC? No

Example: Application=FaceBook
Translation: Find any packets created by the Facebook application.
Supported in POC? No
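The n{fk,t1,t2} triplet form with wildcards can be sketched in a few lines. This is an illustrative in-memory representation only; the class and field names are assumptions, not the POC wire format.

```python
# Illustrative sketch of the n{fk,t1,t2} query form from Section 6.4.
# Names and types are assumptions, not the actual Kraken query encoding.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class QueryTriplet:
    flowhash: Optional[int]  # None encodes the '*' wildcard
    t1: Optional[float]      # start time (s); None = earliest stored packet
    t2: Optional[float]      # end time (s); None = current time

    def matches(self, pkt_flowhash: int, pkt_time: float) -> bool:
        if self.flowhash is not None and pkt_flowhash != self.flowhash:
            return False
        if self.t1 is not None and pkt_time < self.t1:
            return False
        if self.t2 is not None and pkt_time > self.t2:
            return False
        return True

# A query is 1..256 triplets; a packet matches if any triplet matches it.
query = [QueryTriplet(0x123456, None, None),   # flowhash at any time
         QueryTriplet(None, 1.0, 1.5)]          # any flowhash in a window
assert 1 <= len(query) <= 256
assert query[0].matches(0x123456, 99.0)
assert query[1].matches(0xABCDEF, 1.2)
assert not query[1].matches(0xABCDEF, 2.0)
```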
6.5 Probe CPU Loading
In the eventual product, Kraken-related processes must not consume large amounts of
probe processor capability. However, for the POC we will be developing both the ILB and the Query
algorithms in software, so the POC will use as much as all the processor resources available in an
8000-type probe box (the CentOS probe). Once an understanding of the ILB and Query algorithms is
achieved, we will investigate ways of offloading processing so as to reduce reliance on probe
resources as much as possible. So, for example, query response sorting will be done on the Probe
for the POC.
6.6 Node CPU Loading
We need to show that the node CPU chosen has at least 20% overprovision to handle a
representative access pattern of packet reception and queries. The worst case representative
pattern would be 24Gbps of 64-byte packets received and one 100MB query every second.
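The packet rate implied by that worst case is worth making explicit. The arithmetic below is a sketch; whether the 24Gbps figure counts only frame bytes or also wire overhead (preamble plus inter-frame gap) is not stated in this document, so both variants are shown.

```python
# Rough arithmetic behind the Section 6.6 worst case: 24 Gbps of
# 64-byte packets. Overhead accounting is an assumption; both shown.

LINK_BPS = 24e9
FRAME = 64            # minimum Ethernet frame size, bytes
WIRE_OVERHEAD = 20    # preamble (8) + inter-frame gap (12), bytes

pps_payload_only = LINK_BPS / (FRAME * 8)
pps_on_wire = LINK_BPS / ((FRAME + WIRE_OVERHEAD) * 8)

print(f"{pps_payload_only / 1e6:.1f} Mpps if 24 Gbps counts frames only")
print(f"{pps_on_wire / 1e6:.1f} Mpps if 24 Gbps includes wire overhead")
```

Either way the node is handling tens of millions of packets per second, which is the number any per-packet indexing cost in Section 5.7 has to be budgeted against.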
6.7 Resilient Failure Mode
We need to show that we can handle 1) a complete drive failure, 2) addition of a Kraken, and 3)
removal of a Kraken. We do not need to show Kraken interoperating in a standard network fabric.
6.8 Even Packet Store
The ILB (Intelligent Load Balancing) algorithm must guarantee that the oldest packet on any
disk is no more than 10% older than the oldest packet on any other disk, i.e. the disks need to be
equally utilized. This does not apply to disks being used for drive failure resilience.
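The evenness guarantee can be expressed as a simple invariant check. The sketch below interprets “10% older” as 10% of the overall retention span, which is an assumption; the document does not define the baseline for the percentage.

```python
# Sketch of the Section 6.8 evenness check. Interpreting "10% older"
# as 10% of the retention span is an assumption made here.

def evenly_stored(oldest_ts_per_disk, newest_ts, tolerance=0.10):
    """Check the ILB invariant over the active (non-spare) disks.

    oldest_ts_per_disk: oldest packet timestamp (seconds) on each disk.
    newest_ts: timestamp of the newest packet in the store.
    """
    oldest = min(oldest_ts_per_disk)
    span = newest_ts - oldest
    if span <= 0:
        return True  # degenerate store: nothing to compare
    skew = max(oldest_ts_per_disk) - oldest
    return skew <= tolerance * span
```

A check like this could run periodically against the ILB’s per-disk metadata; disks reserved for failure resilience would simply be excluded from the input list.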
6.9 Packet Availability
We need to demonstrate that packet availability will meet a customer-acceptable level.
Define customer acceptable? Presumably this is based on flow as well – probability of flow
corruption?
7 Kraken POC Detailed Straw-man Design
Given the POC design goals from the previous section, we wish to narrow down and
eventually choose a POC architecture. What we describe below has followed from the above design
goals.
7.1 Disk Drive Choice:
Disk drive choice is still in flux. We prefer the Western Digital WD10JPVT at the moment due
to its high density and low cost. We assume for the POC that all drives will be identical in all
Krakens, although for the production Kraken we will probably need to support whatever is
cheapest at the time. Currently we intend to have roughly 400 of these drives per Kraken.
Table 4: Possible Disk Drives

  Manufacturer     Model            Noteworthiness    Capacity  Cost     Cost/GB  Power (W)  W/GB    Z (mm)
  Western Digital  WD20NPVT         Green (Power)     2000 GB   $157.00  $0.08    1.7        0.0009  15
  Western Digital  WD7500BPKT       Black (Quality)    750 GB   $78.13   $0.10    1.75       0.0023  9.5
  Western Digital  WD10JPVT         Blue              1000 GB   $77.00   $0.08    1.4        0.0014  9.5
  Hitachi          HTS72010A9E630   Vanilla           1000 GB   $77.00   $0.08    2.1        0.0021  9.5
  Seagate          ST1000LM014      Hybrid 8GB Flash  1000 GB   $119.00  $0.12    2.7        0.0027  9.5

  At 166 disks per system and one system:

  Model            Cost     Cost per System
  WD20NPVT         $157.00  $26,062
  WD7500BPKT       $78.13   $12,970
  WD10JPVT         $77.00   $12,782
  HTS72010A9E630   $77.00   $12,782
  ST1000LM014      $119.00  $19,754
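The per-system cost column of Table 4 is just drive count times unit price; a quick sketch reproducing it (166 drives per system, as the table assumes, using the table’s 2013 prices):

```python
# Reproducing Table 4's per-system cost column:
# 166 drives per system times the quoted unit cost.

DRIVES_PER_SYSTEM = 166

unit_cost = {
    "WD20NPVT": 157.00,
    "WD7500BPKT": 78.13,
    "WD10JPVT": 77.00,
    "HTS72010A9E630": 77.00,
    "ST1000LM014": 119.00,
}

system_cost = {model: round(DRIVES_PER_SYSTEM * cost)
               for model, cost in unit_cost.items()}
# e.g. 166 x $77.00 = $12,782 for the preferred WD10JPVT
```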
7.2 Form Factor Choice:
We’d like to hit the 3U disk drive form factor. For POC we will have 3U for disk drives, 3U for
compute and 1U for a large Ethernet switch. Power supplies will be included in the compute 3U.
7.3 The Ring / Adding and Subtracting Components
We have a requirement that we can add Krakens, subtract Krakens and potentially add or
subtract probes with an absolute minimum of disturbance to our packet capture solution. We also
have the requirement that we maximize bandwidth into and out of the Kraken. These constraints
are satisfied by a) building a ring architecture with data flowing in both directions through the ring – that way, if the ring is broken during the addition or subtraction of a Kraken, data flow continues – and b) doubling the number of rings to double our bandwidth.
We have decided that ring architectures are a distraction at this point – they are obviously
doable and require no technology invention – it is mostly a market/customer driven decision. So, for
the POC, we will only demonstrate a point-to-point 40G link between a probe and one or two
Krakens.
7.4 Interconnect Topology
The proposed topology of the Kraken POC is based on 10G KR (backplane) Ethernet. Each Tentacle looks like a mini-probe: it consists of a standard DAG FPGA closely resembling a dag10sx2, an embedded COM Express processor (probably based on the Intel quad-core i7-3612QE or i7-4700EQ), up to 16 GB of plug-in DRAM, and four onboard plus 16 extension SATA connections, for a total of 20 disks (probably the 1TB 2.5" form-factor spinning disks). We have around twenty Tentacles, each connected to a main central Intel Fulcrum Ethernet switch by two 10G KR links and one 1G KR link. The Fulcrum switch is connected to the outside world by four 40G QSFP-type links. We are considering additional 10G KR connections between FPGAs for additional inter-Tentacle bandwidth. There is also a "twenty-first" COM Express processor which acts as the central startup/PXE boot manager for the Kraken box, the source of environmental information, and the manager of the Fulcrum switch.
Figure 1: Overall Kraken Topology
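A back-of-envelope check of the aggregate bandwidth implied by this topology (the twenty-tentacle count is the approximate figure given above):

```python
# Aggregate bandwidth on each side of the Fulcrum switch, per the topology above.
TENTACLES = 20              # "around twenty tentacles"
KR_LINKS_PER_TENTACLE = 2   # two 10G KR links each (1G management link ignored)
KR_GBPS = 10
EXTERNAL_LINKS = 4          # four 40G QSFP-type links
EXTERNAL_GBPS = 40

tentacle_side = TENTACLES * KR_LINKS_PER_TENTACLE * KR_GBPS  # 400 Gbps
external_side = EXTERNAL_LINKS * EXTERNAL_GBPS               # 160 Gbps
print(f"tentacle side: {tentacle_side} Gbps, external: {external_side} Gbps, "
      f"oversubscription {tentacle_side / external_side:.1f}:1")
```

The 2.5:1 oversubscription toward the external links is consistent with the later observation that the outgoing 40G link will be overloaded during query returns, motivating the flow-control mechanisms described in sections 7.7 and 7.8.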
7.5 Q2 FPGA Architecture – Packet Storage
In describing the architecture, it makes most sense to consider the two directions separately: first packet storage (probe to Kraken), then the search query return (Kraken to probe). The anticipated packet-storage part of the probe architecture is shown in the figure below:
Figure 2: Q2/Probe Architecture
The intended operation is as follows:
- On the probe/CentOS box, we have a pair of D9.2X (or other DAG) cards receiving traffic. Received packets are hash load balanced (Flow HLB) into four streams each, such that the streams are flow-safe and the HLB settings of both cards are identical (i.e. the flows contained in Stream #0 are the same flows as are in Stream #1).
- A software process (the "packet processor") merges corresponding streams from each card so that we have four flow-safe streams of packets for all 4 incoming links (i.e. Stream #0 and Stream #1 are merged in time order).
- The packet processor then encapsulates these packets in multi-E3 format, using MAC addresses chosen according to its own intelligent load balancing (ILB) algorithm. These MAC addresses determine the final destination within the Kraken, so the load balancing algorithm becomes a software-based solution (i.e. bandwidth-based / flow-based / "intelligent", etc.). A process within the probe listens for per-node "keep-alive" packets sent by each node in each Kraken to determine the list of active MAC addresses to send packets to or to expect queries from. The load balancing algorithm is split into multiple streams so that multiple cores can implement it, providing scope to try different things.
- Once we reach this point, the Q2 simply becomes a transmit source, similar to any other DAG card, with the difference that we implement multiple burst managers in order to be capable of saturating the output 40G link. Each burst manager would be capable of transmitting ~27Gbps. Note that the packets stored in the TX MEM
Stream are ERF format with multi-E3 packets inside. There is no load balancing between the Q2 transmit output and the Kraken receiving Tentacle – it is the responsibility of the ILB algorithm to prevent overloading a Kraken Tentacle. The E3 packets have sequence numbers on a per-node basis so that receiving Tentacles can detect dropped packets.
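The per-node sequence numbering can be sketched as follows. The 16-bit sequence space and the receiver-side bookkeeping are assumptions for illustration; the document does not specify the field width:

```python
# Per-node sequence tracking for drop detection, as described for E3 packets.
# Assumption: a 16-bit wrapping sequence number per source node.
SEQ_MOD = 1 << 16

class DropDetector:
    def __init__(self):
        self.expected = {}  # node MAC -> next expected sequence number

    def receive(self, node_mac, seq):
        """Return how many packets from this node were missed before this one."""
        exp = self.expected.get(node_mac, seq)  # first packet seeds the counter
        missed = (seq - exp) % SEQ_MOD          # modular gap handles wrap-around
        self.expected[node_mac] = (seq + 1) % SEQ_MOD
        return missed

d = DropDetector()
assert d.receive("02:00:00:00:00:01", 0) == 0
assert d.receive("02:00:00:00:00:01", 1) == 0
assert d.receive("02:00:00:00:00:01", 4) == 2  # sequences 2 and 3 were dropped
```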
7.6 Tentacle FPGA Architecture – Packet Storage
The Ethernet switch within the Kraken distributes incoming packets based on L2 routing (i.e.
destination MAC addresses). Each node (one disk per node) within a Tentacle will have been
assigned an individual MAC address. Each of the two 10G KR ports on the tentacle within Kraken will
occasionally send a set of broadcast “keep alive” packets to the main switch using the MAC
addresses of the nodes behind it as the source MAC address. These packets serve two purposes: first, they allow the switch to "learn" which MAC addresses are down each 10G KR port; second, they allow the ILB and Query processes running on the probe to identify and catalogue Kraken Tentacle nodes. A diagram of packet processing on the Kraken Tentacle is shown below:
Figure 3: Packet Processing on a Kraken Tentacle. (Diagram: two 10GE KR links feed the Tentacle FPGA, which is 10sx2-like; the FPGA steers packets into MEM Streams #0 through #19, each consumed by its own Packet Processor, which writes packets, a time index and a flowhash index to its own disk.)
Each Tentacle FPGA is intended to be a simple mechanism for handling high-bandwidth links into the Intel CPUs, in much the same way as a traditional DAG card. As such, we intend to leverage existing DAG firmware and software IP as much as possible. The FPGA is required to do the following:
Dual-port 10G Receive: incoming packets will be multi-E3 encapsulated, with a destination
MAC address which corresponds to a particular tentacle node. We will not de-encapsulate the
multi-E3. Instead, we will add an additional ERF header which will include a standard
flowhash extension header. However, the flowhash in this extension header will simply
contain the lower bits of the destination MAC address. This will let us steer individual multi-E3
packets based upon the E3 destination MAC address.
We will steer packets intended for each node to a different memory stream. As the current
HSBM supports 32 streams we will be able to handle at least that many nodes per tentacle
(currently 20).
Each stream will be processed by a separate process to strip the external ERF and E3 headers, write the packet to disk, and insert references into the time index and the flowhash index.
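A minimal sketch of the steering rule described above, assuming the low five bits of the last MAC octet select among the 32 HSBM streams (the exact bit selection is not specified in the document):

```python
# Steer a multi-E3 packet to an HSBM memory stream using the low bits of the
# E3 destination MAC, as the flowhash extension header carries them.
NUM_STREAMS = 32  # current HSBM stream limit per the text (20 nodes in use)

def stream_for_mac(dst_mac: bytes) -> int:
    """Pick a memory stream from the low bits of the destination MAC address."""
    return dst_mac[-1] & (NUM_STREAMS - 1)  # low 5 bits of the last octet

# Node 7 on this tentacle lands in MEM Stream #7.
assert stream_for_mac(bytes([0x02, 0, 0, 0, 0, 7])) == 7
```

Because only the low bits are consulted, the FPGA never needs to de-encapsulate the multi-E3 payload to make the steering decision.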
7.7 Tentacle FPGA Architecture – Query Return
An overall diagram of the Query return process is shown below:
Figure 4: Query Return Process. (Diagram: a broadcast query reaches Query Processors #0 through #19, each reading packets from its disk via the time index and flowhash index; a Merge Process feeds a single TX MEM Stream through the 10sx2-like Tentacle FPGA onto the 2x10GE KR links, with pause-frame flow control, through the Kraken Ethernet switch; at the probe, the Q2 FPGA writes packets plus a timestamp/pointer index per query, Query #1 through #16, for a software Sort Process.)
Query returns are initiated by the probe/CentOS box. The queries are broadcast via 1GE
management links to each of the tentacles of the Kraken, where the CPUs start looking up the
packets in timestamp order (making use of the time index). Packets are then encapsulated in
E3 format with the query ID embedded into the E3 encapsulation and source MAC address set
to the address of the particular node responding. The Merge process writes packets to a
single TX memory stream.
The tentacle CPU transmits the packets via the Tentacle FPGA in much the same way as a
standard DAG card transmits via TERF. We can support any number of queries at this level –
provided software specifies the ID correctly, it doesn’t matter that queries are interleaved in
the transmit stream. All of the tentacles may transmit large amounts of data at the same time
– indeed, this would be the situation we would like to see. As such, the outgoing 40G link will
be overloaded. In order to ensure that we do not lose packets, the switch should be set up to
utilise standard Ethernet pause frames in the case of congestion. This would have the effect
of throttling the aggregate bandwidth of the tentacles to the outgoing bandwidth of the
switch. We currently lack any support for pause frames in firmware, so we will be required to implement it. This mechanism of flow control back to the Tentacle also means that we do not need significant amounts of buffering within the firmware in this path.
Flow control on a query basis is discussed in the next section as this is part of the Q2’s role.
Once the query is complete (whether packets have been found or not), the tentacle CPU will
send a packet indicating query completion.
7.8 Q2 FPGA Architecture – Query Return
As part of initiating the query, the probe sets up a pair of memory holes for the returning
packets. The Q2 card receives the packets and de-encapsulates the E3. The packets are then
steered to a suitable memory hole based on the query ID. At the same time, we write a
pointer, a timestamp and a tentacle identifier (the abbreviated source MAC address) for each
packet into the second memory hole.
The packet data from each of the Tentacles will be in order, but the ordering between the Tentacles is not guaranteed. Software is therefore required to sort the pointers into timestamp order, such that downstream applications can retrieve an ordered set of packets. The query
completion packets allow the software processing the packets to complete the merge process.
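Since each Tentacle's records arrive already time-ordered, the sorting step above reduces to a standard k-way merge; a sketch, where the (timestamp, tentacle_id, pointer) record layout is an assumption for illustration:

```python
# K-way merge of per-tentacle query results into global timestamp order.
# Each input iterable is already sorted by timestamp, so heapq.merge suffices.
import heapq

def merge_query_results(per_tentacle_records):
    """per_tentacle_records: iterables of (timestamp, tentacle_id, pointer),
    each pre-sorted by timestamp. Yields a single timestamp-ordered stream."""
    return heapq.merge(*per_tentacle_records)  # lazy, O(log k) per record

t0 = [(1, "t0", 0xA0), (4, "t0", 0xA1)]
t1 = [(2, "t1", 0xB0), (3, "t1", 0xB1)]
assert [ts for ts, _, _ in merge_query_results([t0, t1])] == [1, 2, 3, 4]
```

The query-completion packets tell this merge when each input stream is exhausted, so it can emit its final records without waiting indefinitely.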
Because we use two memory holes for each query return, we could implement up to 16
concurrent queries per burst manager.
Existing firmware IP limits receive performance to ~27Gbps of bandwidth to host memory (the HSBM is only capable of Gen2 x8 speeds). This means that if data is returned quickly, we need a flow control mechanism back to the Kraken on a per-query basis. We
need to ensure that the flow control only affects the query that requires it, not all outstanding
queries. As such, we propose a credit-based system – each tentacle is given a set number of
credits for each query return. As it sends packets, that number is reduced. As space on the
probe is made available for that query (i.e. the packets are consumed), we return credits to
the tentacles, thus allowing them to send more responses for that query. This credit return
path will need to be designed carefully so as to ensure we are not going to either exhaust
buffering resources or artificially limit the data rates.
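The proposed credit scheme might look like the following sketch; the initial allowance of 64 credits and the exact return mechanism are assumptions, since the document leaves the parameters open:

```python
# Credit-based per-query flow control: a tentacle spends one credit per packet
# sent for a query and stalls at zero; the probe returns credits as it consumes
# packets. Only the throttled query is affected, not other outstanding queries.
INITIAL_CREDITS = 64  # assumed per-query allowance

class QueryCredits:
    def __init__(self):
        self.credits = INITIAL_CREDITS

    def try_send(self) -> bool:
        """Tentacle side: send one response packet if a credit is available."""
        if self.credits == 0:
            return False  # throttled until credits are returned
        self.credits -= 1
        return True

    def credit_return(self, n: int):
        """Probe side: n packets consumed, hand credits back for this query."""
        self.credits = min(self.credits + n, INITIAL_CREDITS)

q = QueryCredits()
sent = sum(q.try_send() for _ in range(100))
assert sent == 64            # stalls once the allowance is exhausted
q.credit_return(10)
assert q.try_send() is True  # resumes after credits come back
```

The sizing trade-off mentioned above shows up directly here: too small an allowance artificially caps the data rate, while too large an allowance risks exhausting probe-side buffering before credits stop the senders.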
Alternatives to software-based sorting have been explored, but are not described here (for
example, merging between adjacent tentacles and daisy-chaining results such that we end up with a
single in-order stream of packets). All of the options would require a firmware component that we
don’t see value in for the POC phase of the project.
7.9 Next Generation Probe Architecture:
A basic diagram of a possible next generation Kraken-enabled Probe architecture is shown in
the figure below:
Figure 5: Next generation Kraken-enabled Probe Architecture. Please note: this is a wild guess to provide a discussion
framework. Beyond the “Generic Packet Storage Interface” nothing here is required for Kraken operation.
Packets are captured by a DAG card and written to multiple memory streams. A multi-threaded version of Capture Daemon (CPD) reads packets, processes them to determine the
protocol stack, hands them off to a DPI engine, gets them back from the DPI engine, adds the packet
to any existing flow record updates for the Probe Meta-Database (PMDB) and passes the packet on
to a Generic Packet Storage Interface. CPD also performs flow tracking. The DPI engine updates
records in the PMDB upon identification of the application. Note that in the current Probe the packets and their indexes form a unit which is separate from, but closely linked with, the PMDB. In the next
generation probe we are assuming that the meta-database and the actual packet store will be
entirely separate.
A user running the VISION UI will determine what packets he/she would like to download
and then ask VISION for those packets. Our intention is that VISION’s interface to the Kraken will be
a generic “Packet Query API”, probably compliant with REST, which abstracts the query interface.
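As a purely illustrative sketch of what a REST-style Packet Query API request might look like: every endpoint name and parameter here is invented, since the document commits only to a generic, probably RESTful, interface:

```python
# Hypothetical request builder for the "Packet Query API". The /packets path,
# the time-range parameters and the flow filter are all assumptions.
from urllib.parse import urlencode

def build_packet_query_url(base, start_ts, end_ts, flow_5tuple=None):
    """Compose a query URL for packets in [start_ts, end_ts], optionally
    filtered to a single flow (proto:src_ip:src_port:dst_ip:dst_port)."""
    params = {"start": start_ts, "end": end_ts}
    if flow_5tuple:
        params["flow"] = flow_5tuple
    return f"{base}/packets?{urlencode(params)}"

url = build_packet_query_url("https://probe.example/api/v1",
                             1357000000, 1357003600,
                             "tcp:10.0.0.1:443:10.0.0.2:55612")
assert url.startswith("https://probe.example/api/v1/packets?start=1357000000")
```

The point of the abstraction is that VISION never needs to know whether the packets behind this URL live on local Probe disks or in a Kraken.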
7.10 Intelligent Load Balancing
Software Intelligent Load Balancing (ILB) is required to distribute traffic evenly amongst
Kraken nodes to maintain performance. It is to be performed in one place only to allow maximum
flexibility and abstraction from particular hardware, and runs in parallel to the packet processors in
Figure 2, selecting flows to associate to MAC addresses. The exact algorithm has not yet been
determined, but it is intended that such load balancing is flow-coherent wherever possible. In the
case of very large flows that exceed the capacity of a single disk (determined by the number of
packets from that flow over a short time period) the load balancing model may transition from flow
coherent to evenly distributed (and back) from moment to moment. We intend to experiment with this algorithm to determine whether it is more efficient to distribute these flows to nodes on the same Tentacle, randomly across all logical nodes, or by striping across Tentacles (possibly taking into account the grouping of additional interconnect). Considerations include balancing Tentacle merge and link bandwidth against the higher inter-logical-node communication bandwidth available within a CPU.
In general, for each of the 4 pairs of memory holes from the 9.2x2’s (see section 7.5), a
thread does a 2-way merge (with a small buffer to handle timestamp jitter between the two
streams), encapsulates a group of packets in E3 with a destination node MAC assigned by the load
balancing algorithm. It then forwards this to one of two central threads that manage the two transmit memory holes on the Q2 (and transmit on a first-come-first-served basis). Load balancing
between the two transmit streams may be required to mitigate congestion due to varying flow sizes,
but there should be sufficient HSBM bandwidth to maintain the minimum capture rate (24G) in
pathological cases. Parts of existing Probe software can likely be re-used for the merge section.
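The flow-coherent ILB with an elephant-flow fallback could be sketched as follows. The threshold, the least-loaded initial assignment and the round-robin spreading are all assumptions about an algorithm the document says has not yet been determined:

```python
# Flow-coherent load balancing with a fallback for very large flows: a flow
# sticks to its assigned node MAC until its short-term packet count exceeds a
# threshold, after which its packets are spread round-robin across all nodes.
ELEPHANT_THRESHOLD = 10_000  # packets per measurement interval; assumed figure

class IntelligentLoadBalancer:
    def __init__(self, node_macs):
        self.node_macs = node_macs
        self.flow_to_node = {}   # flowhash -> assigned node MAC
        self.flow_counts = {}    # flowhash -> packets seen this interval
        self.rr = 0              # round-robin cursor for elephant flows

    def node_for(self, flowhash):
        count = self.flow_counts.get(flowhash, 0) + 1
        self.flow_counts[flowhash] = count
        if count > ELEPHANT_THRESHOLD:
            # Flow too big for one disk: switch to even distribution.
            self.rr = (self.rr + 1) % len(self.node_macs)
            return self.node_macs[self.rr]
        if flowhash not in self.flow_to_node:
            # First sight of this flow: pin it to the least-loaded node.
            load = lambda m: sum(1 for n in self.flow_to_node.values() if n == m)
            self.flow_to_node[flowhash] = min(self.node_macs, key=load)
        return self.flow_to_node[flowhash]

ilb = IntelligentLoadBalancer(["mac0", "mac1"])
assert ilb.node_for(0xAB) == ilb.node_for(0xAB)  # same flow, same node
```

Resetting `flow_counts` each interval would give the "moment to moment" transition back to flow coherence that the text describes.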
©2013 Endace Technology Ltd. Confidential – External distribution prohibited – Internal distribution restricted.
Page 26 of 43
EDMXX-XX v1.6 Kraken POC Overview and Summary
protocol stack, hands them off to a DPI engine, gets them back from the DPI engine, adds the packet
to any existing flow record updates for the Probe Meta-Database (PMDB) and passes the packet on
to a Generic Packet Storage Interface. CPD also performs flow tracking. The DPI engine updates
records in the PMDB upon identification of application. Note that in the current Probe the packets
and their indexes form a unit which is separate, but, closely linked with the PMDB. In the next
generation probe we are assuming that the meta-database and the actual packet store will be
entirely separate.
A user, running the VISION UI will determine what packets he/she would like to download
and then ask VISION for those packets. Our intention is that VISION’s interface to the Kraken will be
a generic “Packet Query API”, probably compliant with REST, which abstracts the query interface.
7.10
Intelligent Load Balancing
Software Intelligent Load Balancing (ILB) is required to distribute traffic evenly amongst
Kraken nodes to maintain performance. It is performed in one place only, to allow maximum
flexibility and abstraction from particular hardware, and runs in parallel with the packet processors in
Figure 2, selecting flows to associate with MAC addresses. The exact algorithm has not yet been
determined, but the intent is that load balancing is flow-coherent wherever possible. For
very large flows that exceed the capacity of a single disk (detected by the number of
packets from that flow over a short time period), the load-balancing model may transition from
flow-coherent to evenly distributed (and back) from moment to moment. We intend to experiment with this
algorithm to determine whether it is more efficient to distribute such flows to nodes on the same
tentacle, randomly across all logical nodes, or striped across tentacles (possibly taking into account
the grouping of additional interconnect). Considerations here include balancing tentacle merge
and link bandwidth against the higher inter-logical-node communication bandwidth available within a
CPU.
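As an illustration of the intended behaviour (not the final algorithm), the following minimal sketch steers flow-coherently via a sticky hash assignment and falls back to even distribution once a flow exceeds a per-disk capacity threshold. The MAC pool, threshold, window length and all names are invented for the example:

```python
import random
import time
from collections import defaultdict

# Hypothetical node MAC pool and thresholds -- illustrative values only.
NODE_MACS = [f"02:00:00:00:00:{i:02x}" for i in range(20)]
ELEPHANT_PKTS_PER_WINDOW = 50_000   # assumed single-disk capacity threshold
WINDOW_SECS = 1.0

class IntelligentLoadBalancer:
    """Flow-coherent steering that degrades to even distribution for
    flows too large for a single disk, per section 7.10."""

    def __init__(self, macs):
        self.macs = macs
        self.flow_table = {}               # flow_key -> assigned MAC
        self.counts = defaultdict(int)     # packets per flow this window
        self.window_start = time.monotonic()

    def steer(self, flow_key):
        now = time.monotonic()
        if now - self.window_start >= WINDOW_SECS:
            self.counts.clear()            # new measurement window
            self.window_start = now
        self.counts[flow_key] += 1
        # Elephant flow: abandon coherency, spray across all logical nodes.
        if self.counts[flow_key] > ELEPHANT_PKTS_PER_WINDOW:
            return random.choice(self.macs)
        # Normal flow: sticky hash assignment keeps the flow on one node.
        if flow_key not in self.flow_table:
            self.flow_table[flow_key] = self.macs[hash(flow_key) % len(self.macs)]
        return self.flow_table[flow_key]
```

Whether the elephant-flow fallback should spray across all nodes, one tentacle, or stripe across tentacles is exactly the experiment described above.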
In general, for each of the 4 pairs of memory holes from the 9.2x2's (see section 7.5), a
thread does a 2-way merge (with a small buffer to handle timestamp jitter between the two
streams) and encapsulates a group of packets in E3 with a destination node MAC assigned by the load
balancing algorithm. It then forwards this to one of two central threads that manage the two
transmit memory holes on the Q2 (and transmit on a first-come-first-served basis). Load balancing
between the two transmit streams may be required to mitigate congestion due to varying flow sizes,
but there should be sufficient HSBM bandwidth to maintain the minimum capture rate (24G) even in
pathological cases. Parts of existing Probe software can likely be re-used for the merge section.
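The 2-way merge step can be sketched as follows. Each memory-hole stream is already timestamp-ordered, so holding one pending packet per stream is the small buffer that absorbs jitter between the two; packets are modelled as (timestamp, payload) tuples. This is an illustration, not the Probe merge code:

```python
def merge_two(stream_a, stream_b):
    """2-way merge of a pair of timestamp-ordered memory-hole streams
    into one ordered stream. Packets are (timestamp, payload) tuples."""
    it_a, it_b = iter(stream_a), iter(stream_b)
    a, b = next(it_a, None), next(it_b, None)
    while a is not None and b is not None:
        # Always emit the earlier timestamp next.
        if a[0] <= b[0]:
            yield a
            a = next(it_a, None)
        else:
            yield b
            b = next(it_b, None)
    # Drain whichever stream still has packets.
    while a is not None:
        yield a
        a = next(it_a, None)
    while b is not None:
        yield b
        b = next(it_b, None)
```

The merged stream would then be grouped into E3 frames addressed to the node MAC chosen by the ILB.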
7.11 Packet Storage Flow
When packets arrive at a tentacle/CPU, they are load balanced into one stream per
destination MAC address. One thread, associated with one (possibly two?) storage disk, listens
on each memory hole; the threads operate in parallel. Each such thread-plus-disk system is
considered a 'logical node' and (as far as possible) acts completely independently of the others.
The logical node writes flow and time index metadata (to RAM, then to either its disk or fast
tentacle-local storage such as an SSD) and stores the packets to its disk in one large rotating file
(which may simply be a write to the raw disk).
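A minimal sketch of one logical node's write path, under the assumptions above: packets are appended to one large rotating file while (timestamp, flow hash, offset) index entries accumulate in RAM and are flushed to a side file in batches. The rotation size, batch size and index record format are all invented for the example:

```python
import os
import struct

class LogicalNode:
    """One logical node: one thread, one disk, one rotating packet file
    plus time/flow index metadata buffered in RAM."""

    ROTATE_BYTES = 4 * 1024**3      # assumed rotation threshold
    INDEX_BATCH = 1024              # assumed index flush batch size

    def __init__(self, data_path, index_path):
        self.data = open(data_path, "wb")
        self.index_path = index_path
        self.index = []             # in-RAM (ts, flow_hash, offset) entries

    def store(self, timestamp, flow_hash, packet_bytes):
        offset = self.data.tell()
        self.data.write(packet_bytes)
        self.index.append((timestamp, flow_hash, offset))
        if self.data.tell() >= self.ROTATE_BYTES:
            self.data.seek(0)       # rotate: start overwriting oldest data
        if len(self.index) >= self.INDEX_BATCH:
            self.flush_index()

    def flush_index(self):
        # Fixed 24-byte records: little-endian u64 ts, flow hash, offset.
        with open(self.index_path, "ab") as f:
            for ts, fh, off in self.index:
                f.write(struct.pack("<QQQ", ts, fh, off))
        self.index.clear()
```

In the real system the index flush target could equally be the fast tentacle-local SSD mentioned above.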
7.12 Queries and Query Response
Kraken will support queries on stored data using a Packet Query interface, abstracted
through a generic (most likely REST) API for use by Probe. A query consists of a function with a time
range, a filter (such as pcap/tcpdump syntax) and an optional string to search for. For the POC the
functions will most likely be a fixed set, such as packets matching, all packets in flow, and
bandwidth. A query result may consist of packets matching the query or aggregate metadata, possibly
returned as part of the "query done" mechanism or potentially encapsulated in E3 using the Meta ERF
format. The query mechanism should be flexible enough to allow composing functions and adding
simple user-defined functions that may involve custom packet offsets or distributed attribution-table
lookup (such as a NAT translation or GeoIP database), but these may not be implemented for the POC.
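To make the query shape concrete, here is one way such a query could be composed as a JSON document for the generic REST API. The field names, function names and document shape are assumptions for illustration, not a defined schema:

```python
import json

# Fixed POC function set from the text above; names are placeholders.
POC_FUNCTIONS = {"packets_matching", "all_packets_in_flow", "bandwidth"}

def build_query(function, t_start, t_end, bpf_filter=None, search=None):
    """Compose one Packet Query as a JSON document such as might be
    POSTed to the (most likely REST) Packet Query API."""
    if function not in POC_FUNCTIONS:
        raise ValueError(f"unknown POC query function: {function}")
    q = {"function": function, "time": {"from": t_start, "to": t_end}}
    if bpf_filter is not None:
        q["filter"] = bpf_filter    # pcap/tcpdump filter syntax
    if search is not None:
        q["search"] = search        # optional payload string to search for
    return json.dumps(q)
```

Function composition and user-defined functions would extend this document format rather than the transport.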
Each logical node (disk) receives a copy of the same query and processes it simultaneously
(operating as independently as possible), returning results in timestamp order encapsulated in E3
with the MAC address assigned to that logical node. Each disk will have time and flow hash indexes to
accelerate this process. Depending on Q2-end POC box merge performance constraints, these
results may also be coalesced into timestamp order per physical CPU before they leave the tentacle
(in which case they would use the MAC address of one of the 10G links). See Figure 4: Query Return
Process for an overview of the proposed architecture. It is expected there will be one software
thread instance per 'software process' in that diagram, with minimal inter-thread communication or
locking required.
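The optional per-CPU coalescing step is a k-way merge of per-node result streams, each already in timestamp order. A sketch, with results modelled as (timestamp, record) tuples tagged by the node MAC they came from:

```python
import heapq

def coalesce_results(per_node_results):
    """Merge each logical node's timestamp-ordered result list into one
    globally ordered stream, tagging every record with its node MAC.
    per_node_results maps node MAC -> [(timestamp, record), ...]."""
    tagged = (
        [(ts, mac, rec) for ts, rec in results]
        for mac, results in per_node_results.items()
    )
    # heapq.merge does a k-way merge, holding one pending entry per node.
    return list(heapq.merge(*tagged))
```

In production the merged stream would be re-encapsulated in E3 using the MAC address of one of the 10G links, as described above.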
As an end goal, queries similar to the following should be possible in production, and
performed in parallel for each logical node (pseudo syntax):
BW(t1, t2, map(username, map(nat1, flows(t1, t2, ip.addr == 1.2.3.*
&& ip.port == 80, "foo"))))
In HTTP traffic from addresses in 1.2.3.0/24, find each instance of “foo” between t1 and t2
and return the entire flow on a match. Map the IP addresses of these flows to the
internal address assigned to that external address tuple at the time of the packets, and map
that to the user assigned that internal address at that time. Finally, calculate the bandwidth
of these flows by user and display as a chart.
There should be a reasonably flexible generic mechanism for defining what is examined in a
mapping and how it is looked up. Mapping tables may be large, distributed and time-varying. At least
the inner function will be possible in the POC. Worst case, if a query requires searching entire disks
(such as a very broad text search), it may take up to 8 hours for all sub-queries to complete, due to
disk read performance. Is this OK?
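One plausible shape for the time-varying attribution lookup in the map() steps is a per-key list of (start time, value) bindings. The sketch below chains the NAT and username lookups from the example query; all names and the table shape are invented for illustration:

```python
import bisect

class TimeVaryingMap:
    """Time-varying attribution table: lookup(key, ts) returns the
    value bound to key at time ts, via per-key (start_ts, value)
    bindings -- one plausible shape for a NAT or subscriber table."""

    def __init__(self):
        self.table = {}     # key -> ([start_ts, ...], [value, ...])

    def add(self, key, start_ts, value):
        times, values = self.table.setdefault(key, ([], []))
        i = bisect.bisect_right(times, start_ts)
        times.insert(i, start_ts)
        values.insert(i, value)

    def lookup(self, key, ts):
        # Most recent binding whose start time is <= ts, if any.
        times, values = self.table.get(key, ([], []))
        i = bisect.bisect_right(times, ts) - 1
        return values[i] if i >= 0 else None

def attribute(external_tuple, ts, nat_map, user_map):
    """Map an external address tuple to the internal address assigned
    at packet time, then to the user holding that internal address at
    that time -- the map(username, map(nat1, ...)) step."""
    internal = nat_map.lookup(external_tuple, ts)
    return user_map.lookup(internal, ts) if internal is not None else None
```

In Kraken these tables may be large and distributed, so the lookup would be a remote or sharded operation rather than an in-memory bisect.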
7.13 Management processes
There are a number of processes that need to be thought through for the Kraken to function
effectively. Some of these include:
7.13.1 Startup / Boot
On startup, a number of things need to happen:
- FPGAs on each tentacle need to be loaded. For the POC this will take the form of a CPLD and
ROM for each tentacle, as for current DAG cards. Alternative image loading may be explored
at a later date (i.e. beyond the POC).
- Each tentacle CPU needs to boot and initialise the mini-DAG. The intention is to use
PXE to boot all of the tentacles from a common software image.
- The probe needs to know that the Kraken is available, along with its capabilities,
including how many hard drives are available (this number will not be constant, given that
some hard drives will fail). This can be advertised via keep-alive packets from the
tentacle FPGA.
Adding an additional Kraken to the probe becomes an extension of this: the Kraken boots
and advertises its capabilities, leaving the probe to adjust the load balancing to split the
outgoing traffic between the available Krakens.
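The capability advertisement can be sketched as a probe-side tracker that ages out any disk MAC that stops sending keep-alives. The timeout value and names are assumptions; the real advertisement cadence is not specified here:

```python
import time

# Assumed ageing threshold for a disk MAC's keep-alives.
KEEPALIVE_TIMEOUT = 3.0

class CapabilityTracker:
    """Probe-side view of available Kraken disks, built from per-MAC
    keep-alive packets advertised by the tentacle FPGAs. A MAC that
    stops advertising ages out, so the drive count the probe sees
    tracks failures (and added Krakens) automatically."""

    def __init__(self, now=time.monotonic):
        self.now = now              # injectable clock for testing
        self.last_seen = {}         # disk MAC -> time of last keep-alive

    def keepalive(self, mac):
        self.last_seen[mac] = self.now()

    def live_disks(self):
        t = self.now()
        return sorted(m for m, seen in self.last_seen.items()
                      if t - seen < KEEPALIVE_TIMEOUT)
```

The ILB would then steer only to MACs returned by live_disks().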
7.13.2 Disk failures
The operational model for Kraken implies that disk failures will occur and must be
handled without user intervention. It is the responsibility of the tentacle CPUs to monitor the status
of each of the disks. Once a failure has been detected, the following needs to happen:
- If a spare unused drive exists on this tentacle, the failed drive will be unmapped from its
MAC address and the spare drive will take its place. Packets destined for that disk
will automatically be steered to an alternative disk.
- The tentacle needs to inform the management CPU that the disk has failed.
- If no spare unused drive exists, the tentacle CPU will remove that MAC address from
the keep-alive table in the FPGA. This will halt the keep-alives for that MAC address. The
ILB algorithm on the probe will eventually notice this and stop steering packets towards this
MAC address. Empty query replies for that MAC address may be necessary for some time to
prevent lock issues.
In terms of query processing, the tentacle CPU needs to be capable of handling the fact that
the packets on the failed disk may have been lost.
Detecting partial disk failures (i.e. where some data becomes unreadable rather than the
entire disk failing) may be a difficult problem.
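The failure-handling steps above can be sketched as follows. All names are illustrative; notify() stands in for the management-CPU report, and keepalive_macs for the FPGA keep-alive table:

```python
def handle_disk_failure(failed_mac, mac_to_disk, spares, keepalive_macs, notify):
    """Tentacle-CPU reaction to a failed disk, per section 7.13.2:
    remap the MAC to a spare drive if one exists, otherwise drop the
    MAC from the FPGA keep-alive table so the probe's ILB eventually
    stops steering packets to it."""
    notify(f"disk {mac_to_disk[failed_mac]} behind {failed_mac} has failed")
    if spares:
        # Spare silently takes over the MAC; steering is unaffected.
        mac_to_disk[failed_mac] = spares.pop()
    else:
        # No spare: stop keep-alives so the ILB ages this MAC out.
        del mac_to_disk[failed_mac]
        keepalive_macs.discard(failed_mac)
```

The empty-query-reply workaround for in-flight queries would sit alongside this, outside the sketch.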
7.13.3 Tentacle failure
Tentacle CPUs are required to remain up for months at a time, so there is always the
possibility that the software will fail at some point. At this stage we do not have a mechanism
planned for dealing with this (ideally we would have a hardware reset mechanism, such that the
management CPU can reset a tentacle).
7.14 Other considerations
Some things that have perhaps not yet been adequately covered:
- Buffering: we have DDR memory available on both the Q2 and the mini-DAG. At present we
have not identified definite use cases for this. We would like it included in the POC, so that
if we do require buffering at some point, we can utilise it.
- Packet types: we intend to use E3 as a transport mechanism. This is a standard Ethernet
frame with our own Ethertype and subtypes. We will need to carefully define suitable
subtypes for the appropriate uses.
- Inter-node communication: through the additional interconnect links we should have plenty
of east-west bandwidth available (at least within a Kraken box), but how exactly this would
work needs to be thought through carefully. It should not be necessary where flow coherency
is maintained, but that will not be possible for very large flows. A simple alternative option
for string search would be to perform the search after the merge, as the data comes in
through the Q2 query receive side.
8 Kraken POC Phased Development Plan
This section covers the technical challenges to be addressed and how we wish to de-risk them.
8.1 Cooling
In the final system we will have a stack of drives in a chassis, which raises many questions
about how to cool such a system. Actually purchasing drives and SATA controllers and hacking
together a test chassis, while accurate, would be expensive (~$35k NZD) and time-consuming (2
man-weeks). Instead, we would like to hack together a representative chassis: buy aluminium
wide bar from Ullrich, chop the bar into drive-like pieces, add power resistors to simulate the drive
heating, and push them into the chassis in different configurations with different fan formations. We
should be able to rapidly test a large number of configurations at less cost (~$7k NZD) and in less
time (1 man-week). This has been agreed to (30-5-2013) and the work has been started.
8.2 Chassis
The eventual Kraken product will have a proper "bent-tin" chassis; however, that will take a
long time to become available, so we intend to build two initial prototype chassis in a simple,
quick manner. We expect the two prototype chassis to cost ~$1.5k NZD each. This cost was
included in the cooling estimate above. This has been agreed to (30-5-2013) and the work
started.
8.3 Initial Development Platform (IDP)
8.3.1 IDP Introduction
The final POC hardware platform will not be ready for several months. In the meantime we
need to start developing and de-risking software and firmware. We propose to build the following
development platform to let us study and develop packet capture, load balancing, MAC steering,
disk write with indexing, query broadcast, query response and aggregation, and query flow control.
This has been agreed to (6-6-2013) and the work started.
Figure 6: Initial POC Development Platform
Table 5: Expected cost of Initial POC Development Platform
Item                        Qty   Cost Each   Ext. Cost   Notes
dag92x2                     2     -           -           Stolen from lab
10sx4                       3     -           -           Stolen from firmware team
8000-like test probe        1     -           -           We can take one of Daniel's castoffs
40Ge/10Ge Switch (Arista)   1     $3,000      $3,000      Perhaps free?
Haswell Motherboard         2     $332        $664
CPU                         2     $477        $955
16-port SATA card           2     $1,432      $2,864
Disks                       40    $157        $6,280
Small 1Ge Switch            1     $200        $200        (May just steal from Systems)
Power supplies              2     $100        $200
Total:                                        $14,163
8.3.2 IDP Plan
First plans: things we can do to move towards getting this working:
- Packet source: a 3000 box transmitting out of one port.
- Packet capture probe lookalike (in a 3000 box): we use a 92x to receive packets into
two memory streams (HLB). Software does the multi-E3 encapsulation (is there some
software to do that already?), using a random MAC address from a given set of 20. We
encapsulate in ERF again. We transmit the resulting packets out of two 92x cards (which
strip the outer ERF off).
  o Work required: software encapsulation / ILB algorithm (splat for the first pass).
    Standard firmware for now.
  o Questions to answer: can we do encapsulation/ILB in software? What rate is
    achievable? Software can achieve 10Gb+ per thread when selecting a random address.
- Tentacle lookalike (Haswell MB, LSI card): the 92x receives the E3 (we have two
tentacles receiving half the traffic each). DO NOT de-encapsulate. Once through BFS, the
outer layer of ERF will have a non-IP extension header. We replace the hash in this with
the dst MAC address. We can then steer based on this to one of 20 streams. Software
receives in 20 memory holes, each of which writes to disk via the LSI RAID card.
  o Work required: firmware needs to implement the steering based on MAC. DONE.
  o Software needs to capture and write to disk. Initially this could just be a dagsnap,
    although getting the next-gen CPD (i.e. indexing etc.) working would be the next step.
  o Questions to answer: can we write to disk at the desired rate? How much headroom do
    we have? Does the CPU have issues with 20 threads accessing disks at once? 20 threads
    writing to disk can apparently each sustain roughly 50MB/s write performance.
- Check how pause frames will work: does the switch support them? What granularity do we
have? How long will it take us to resume transmit? The switch almost certainly does
support them, but we need to test how well this works.
So, we need some way of trying to flood from multiple tentacles into the switch and
checking that we can saturate a given output link. Simple to test once we have a switch
and some firmware which respects pause.
- Add firmware support for pause frames: on receive, we need to extract pause frames from
the incoming packets and pass them to the transmit side, where we initialise a timer and
push pressure back to TERF until the timer expires. DONE, but not tested at all!
- Define a query interface: understand how a REST API works and which bits we need.
- Q2 datapath query return path:
  o We need to implement a multi-HSBM 40G rx datapath.
  o We also need to be capable of steering based on query ID, which is embedded in the E3
    format.
  o We also want to de-encapsulate the packets (maybe?).
- Q2 datapath packet capture side:
  o Two or 4 BM read stream modules, as Gerard has apparently demonstrated (two streams).
  o Multi-stream TERF: 4 streams, best-effort merging into a 40G core. We want this to be
    40G rather than mapping streams to 4x10G because it gives more flexibility if one of
    the streams is larger than the others, without having to switch.
- Flow control options:
  o Packet storage: we have 400 receive streams within the Kraken. Each of these should be
    receiving packets from 1 of N processing threads (where N is somewhere around 6
    perhaps – however many are needed to implement the ILB algorithm). Each receive stream
    can send stream status to the processing thread via statistics packets – perhaps if
    the buffer level hits half full we send something, otherwise we just send once a
    second. These are generated by the FPGA monitoring the pointers, not by software. The
    ILB processing thread on the probe can then use this information to adjust the
load balancing distribution. Is this enough to handle hotspots / large flows?
    How long will it take to adjust the load balancing, and is that enough to divert
    traffic and avoid drop?
  o Query return: we have one receive stream per query on the Q2. The tentacles respond at
    whatever rate they are capable of. If the switch is overloaded (which we would expect
    if all tentacles are reading from all drives), we use 802.3x pause to stop the
    tentacles. At this point we are saturating the link, so head-of-line blocking is not a
    concern (provided queries aren't prioritised). Once at the Q2, we are limited by the
    HSBM internal bandwidth of 32Gb/s. This implies that we would need to push pause back
    to the switch on the 40G port. Provided we are using the full 32Gb/s bandwidth to the
    memory hole(s), again, this is OK. As a memory hole fills, we need to send stream
    status updates to each of the tentacles, so they are aware of whether to throttle back
    a particular query. This does imply that we require a separate transmit stream for
    each query on each of the tentacles.
[Figure: Kraken flow-control architecture – Probe (DAG, ILB processing threads) feeding Kraken
packet storage over 40G via the switch; Q2 and tentacles (DAG, SATA x20) showing transmit (two
merged streams), query steer, storage status steer, stream status return, lookup, and pause paths
(32G datapath throttle, 40G link query throttle). Legend: receive stream / transmit stream /
packet flow / flow control and status flow; R = Receive processors / ILB (6?);
N = Query handlers (32? 32-R?).]
Firmware jobs:
o 10G pause end-to-end – i.e. we transmit pause
o 40G pause support?
o Stream Status automatic transmission
o Multiple TX streams (up to 32)
8.4 Ethernet Switch Options
A fundamental component of this design is the Ethernet switch at the entrance to the Kraken box. The functionality we need in this switch is fairly simple (L2 learning and non-blocking routing), but the performance of the switch will directly impact the Kraken POC performance. Our current first choice for the POC is to buy an Arista 1U switch.
Another option for this switch is the Intel Fulcrum FM6764. We are making contact with Intel and ADI Engineering (the maker of the reference design). However, as this is currently the hottest switch chip from Intel, we may have trouble getting samples and reference designs.
A much more difficult option would be to use an FPGA, but this would require developing our own 10GE switching fabric, as none is available as IP at present. Given the non-blocking bandwidth required by the design (at least 450 Gbps in the POC), such a switch would require multiple FPGAs and would be a serious design problem.
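As a sanity check, the non-blocking requirement can be tallied from a port inventory. The port counts below are purely hypothetical; only the "at least 450 Gbps" figure comes from the design:

```python
# Back-of-the-envelope non-blocking fabric bandwidth check.
# Port counts here are illustrative assumptions, not the POC config.

def fabric_bandwidth_gbps(ports):
    """ports: list of (count, per-port rate in Gbps) tuples."""
    return sum(count * rate for count, rate in ports)

required = 450                                # design requirement, Gbps
# e.g. one 40G probe link plus a number of 10GE tentacle links
example = fabric_bandwidth_gbps([(1, 40), (42, 10)])
assert example >= required
```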
8.5 Final POC Platform
In the final Proof-of-Concept (POC) demonstration we would like to have a full probe with Vision and the Meta-Database available. Whether the probe will be running EP5.1.1, EP5.2, or something entirely new is unknown at this point. However, the probe will allow a view into flows and thereby allow the user to specify more interesting queries. The diagram below shows how the POC is intended to be connected.
Figure 7: Final POC Test Setup
Note that the packets are duplicated to both the standard probe box and the test probe box (running CentOS). This is done to simplify integration of Kraken-related software running on the test probe. It may turn out to be easier to integrate the Kraken-related software directly into the standard probe, in which case only one probe will be required. This latter case has the added advantage of removing the need to carefully time-sync the two probes. It is intended that this architecture will demonstrate all requirements listed in the chapter "POC Detailed Design Goals To Meet The Key Technical Challenges".
9 Appendix A: Kraken Rough Cost Breakdown (note: now out of date)
SUCKERS
  True Capacity:            380 TB
  Advertised Capacity:      350 TB
  total # of drives:        380
  spare drives:             30
  drive cost:               $77
  cpu cost:                 $70
  sucker pcb?:              $50
  other sucker?:            $10
  Total Nodes:              $47,642

TENTACLES
  # of tentacles:           10
  Tentacle PCB:             $100
  Tentacle components:      $500
  Total Tentacles:          $6,000

Mantle
  # of Mantles:             1
  Mantle PCB:               $1,500
  Mantle SBC:               $300
  Mantle FPGA:              $1,200
  Mantle DDR3:              $200
  Mantle other components:  $800
  Total Mantles:            $4,000

Power Supplies
  # of Power Supplies:      3
  Power Supply Cost:        $350
  Cable cost:               $20
  Total Power Supply:       $1,110

Cooling
  Fans:                     $400
  Boards:                   $300
  Components:               $100
  Total Cooling:            $800

Chassis
  Chassis Cost:             $1,200
  Optics:                   $600
  Total Chassis Cost:       $1,800

Total Kraken Components:    $61,352
Production Cost:            $2,000
Production Test:            $1,000
Total Product Cost:         $64,352.00
Gross Margin:               70%
MSRP:                       $214,507
10 Appendix B: Ethernet Rings
A double ring architecture would seem to guarantee a continuous connection between a Probe and its Krakens, and constant high bandwidth. Clearly we must avoid a true Ethernet ring, as that would lead to a storm of duplicated packets. So each ring must be broken intentionally at some point, but fixed rapidly when another part of the ring fails. This is identical to how STP (Spanning Tree Protocol) works. Keeping a standard Ethernet architecture would allow extending the Kraken ring using existing switches. However, the existing STP solution can require 50 seconds to "heal" a modified network – that results in far too many lost packets for us. Unfortunately, while there are a large number of potential solutions out there, none are truly "standard" as of yet.
RSTP apparently has a convergence time of around 2-5 ms per hop (usually towards the upper end of that), worst case around 100 ms (root bridge failure). It is well standardized in 802.1D-2004, which also obsoletes STP. 802.1D-2004 apparently made some optimizations to the original version of RSTP to better support low convergence times, and I'm not sure how well supported that area is (I believe it used to be in the region of a second). RSTP wasn't really designed with rings in mind, so there is some unnecessary traffic, and half the ring is momentarily isolated. Essentially the problem is that when a break occurs, the 'downstream' switch/box thinks it is the root and advertises this down; it isn't until the message reaches the real root (through the blocked port) that the correct information propagates back up. This means the backup ports with fast proposal-agreement handshaking don't work too well in a ring. A good description of what happens in a ring is on page 20 of http://blog.ine.com/wp-content/uploads/2010/04/understanding-stp-rstp-convergence.pdf. There is a paper on the performance at http://www.odva.org/Portals/0/Library/CIPConf_AGM2009/2009_CIP_Networks_Conference_Technical_Track_RSTP.pdf, as well as a calculation mechanism in IEC 62439-1, but I can't access that (also see link at bottom).
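Those per-hop figures translate into a rough healing-time estimate for a ring of a given size. This is a back-of-the-envelope model only, using the 2-5 ms per-hop and ~100 ms root-failure numbers quoted above; it ignores the ring-specific inefficiencies just described:

```python
# Rough RSTP convergence estimate for a ring. In the worst case the
# correct topology information must cross roughly half the ring.

def rstp_heal_ms(nodes, per_hop_ms=5.0, root_failure_ms=100.0,
                 root_failed=False):
    if root_failed:
        return root_failure_ms          # quoted worst case
    hops = nodes // 2                   # info crosses ~half the ring
    return hops * per_hop_ms

assert rstp_heal_ms(10) == 25.0         # 5 hops at 5 ms each
assert rstp_heal_ms(10, root_failed=True) == 100.0
```

Even 25 ms of healing at 40 Gb/s is on the order of a gigabit of lost traffic, which is why the 50-second STP figure above is a non-starter.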
ERPS (ITU-T Rec. G.8032) seems like quite a nice protocol, though it is fairly new, so we might need our own implementation (which I'm guessing we'd probably do anyway, except for RSTP). It seems to be supported by a number of common vendors (Cisco, Juniper, etc., though I can't find any Arista support), presumably only in rather new switches and/or software releases, and I am not familiar with which product ranges. Essentially it works like the diagram in our document: one port is chosen to be blocked, known as the Ring Protection Link (RPL), owned by the RPL Owner (also known as the master bridge). If a node sees a link go down, it immediately blocks the port and sends a message (the standard says 'as quickly as possible', within 3.3 ms) to the RPL owner, who unblocks
the port and sends a message around the ring to flush MACs. When the link comes back up, those nodes send a message to the RPL owner, which blocks its RPL port and sends a flush message, after which the affected nodes put the link that just came up into the forwarding state. v2 of the standard optimizes the flush messages in some way, and adds administration commands for specifically bringing down a link, as well as support for multiple rings. There is also a polling hello message sent by the RPL owner as a fallback, just in case. The 50 ms recovery time in the standard is for a fibre ring of <1200 km and <16 nodes, so we can probably do better than that. There are a couple of potential issues, though: there is not yet a mechanism for electing the RPL owner, nor for detecting erroneous multiple RPL owners. Because of this I'm not sure what would happen if the RPL owner failed. Separate data (RPL blocked) and control (RPL open) VLANs are also needed (there can be more than one of these pairs, and there can be multiple per domain).
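The ERPS failure sequence can be sketched as a toy state machine. This is heavily simplified (no VLAN pairs, no hold-off or guard timers, signalling reduced to direct method calls), and the class and method names are our own, not G.8032 terminology:

```python
# Toy G.8032 ring: on link failure the detecting node blocks its port
# and signals the failure; the RPL owner unblocks the RPL, and every
# node flushes its learned MACs so traffic relearns the new path.

class RingNode:
    def __init__(self, name, is_rpl_owner=False):
        self.name = name
        self.is_rpl_owner = is_rpl_owner
        self.rpl_blocked = is_rpl_owner    # owner blocks its RPL port
        self.mac_table = {"aa:bb": "port1"}

    def on_link_failure(self, ring):
        # Detecting node: in the real protocol this blocks the failed
        # port and sends a signal-fail message around the ring.
        for node in ring:
            node.on_signal_fail()

    def on_signal_fail(self):
        self.mac_table.clear()             # flush learned MACs
        if self.is_rpl_owner:
            self.rpl_blocked = False       # open the RPL; traffic reroutes

ring = [RingNode("owner", is_rpl_owner=True), RingNode("a"), RingNode("b")]
ring[2].on_link_failure(ring)              # link at node "b" goes down
assert ring[0].rpl_blocked is False        # RPL now forwarding
assert all(not n.mac_table for n in ring)  # everyone flushed
```

Note the sketch also exposes the worry raised above: everything hinges on the single RPL owner, and nothing here elects a replacement if it fails.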
802.17 is probably a dead end: it is a different MAC layer (i.e. not at all a normal Ethernet packet, though it does interoperate by using the same addresses and including the same fields somewhere), is quite complicated, appears not to have really been implemented by anyone much, and seems to essentially have been replaced by G.8032 in the minds of carriers. It has some nice features that are probably unnecessary, like class of service.
Then there are PRP and HSR, defined by IEC 62439-3. They were originally intended for industrial control (the standards relate to use in power substations) and have almost zero recovery time, as they send duplicates of packets at both ends and de-dupe using a counter attached to each packet. PRP seems to need a redundant link (not entirely clear why, as the standard isn't freely available, but I believe it is due to counters only being per network), and appends a trailer just before the Ethernet FCS. HSR is designed for use in a ring (send packets down both ends) and has a counter for each source(-destination pair?). HSR puts its header after the Ethernet header (possibly as an ethertype?) so that, unlike PRP, the entire frame does not need to be read to decide whether the packet is a duplicate or not (which could be a long time with jumbo frames). This would affect compatibility, though, as it no longer appears as padding.
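The duplicate-discard step PRP/HSR rely on can be sketched with a per-source sequence set. This is a simplification: the real standards use a bounded, ageing drop window per source, and the counter field layouts differ between the two protocols as noted above:

```python
# Sketch of HSR/PRP-style duplicate discard: each (source, sequence
# number) pair is accepted once; the copy arriving via the other ring
# direction (or redundant link) is dropped. A real implementation
# bounds the window and ages entries out.

def dedupe(frames, window=None):
    window = set() if window is None else window
    accepted = []
    for src, seq, payload in frames:
        if (src, seq) not in window:       # first copy wins
            window.add((src, seq))
            accepted.append(payload)
    return accepted

frames = [("A", 1, "p1"), ("A", 1, "p1"),  # same frame via both ring ends
          ("A", 2, "p2"), ("B", 1, "p3")]
assert dedupe(frames) == ["p1", "p2", "p3"]
```

Because every frame is sent twice, this buys near-zero failover at the cost of halving usable ring bandwidth, which matters at our rates.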
The same standard series also defines MRP (IEC 62439-2), a simple ring protection protocol (one ring only) that has one device send a control packet in both directions and check that it receives them. That standard guarantees <10 ms for up to 14 nodes (depending on the size of the loop), but I don't see
any advantage over G.8032, which is much more widely supported, other than it being slightly simpler. All three of these seem to be supported only by the more specialist switch vendors.
Personally I rather like G.8032 due to its simplicity and closeness to what we would probably end up doing anyway, though I worry a little about what happens if someone takes out the RPL owner. RSTP should be fine for a management VLAN or similar, and might be OK for packets, depending on how many packets we are willing to lose, if we want the simplest solution. HSR might also be worth considering if we want zero failover time, but we would almost certainly need to implement it ourselves, and the standard isn't freely available. I imagine that with all these protocols, since we control the hardware, we could also have extremely short hello times. If we were to have our own protocol (such as simple, very fast polling), we would need to be careful not to create a momentary loop when the link comes back up (as avoided by G.8032). We should also keep in mind that a Kraken may not be in the same rack – how would that work with a ring?
In summary, the ring architecture in Ethernet is problematic – doable, but it adds work – and we need to feed in some actual customer requirements here.
11 Open Questions
11.1 Query Size
Can we break big queries into multiple 1 GB queries? Can we assume we know the amount of data that will be returned? Answer from Stuart was yes.
11.2 Packet Sorting
Do the packets in the returned query need to be sorted (in timestamp order) before reaching the memory hole on the probe, or can we use CPU on the probe to perform the sorting?
11.3 Arista Switch
Our development test platform will need a 10GE-capable switch. Will we be able to get such a beast from Arista?
11.4 Text Search
Is text string search required for the POC? Does it need to be distributed? If so, does it need to handle non-flow-coherent (i.e. large) flows that require east-west communication?
12 Bibliography
Pinheiro, E. (2007, February). Failure Trends in a Large Disk Drive Population. Retrieved from static.googleusercontent.com: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/disk_failures.pdf