How to configure Link Aggregation Control Protocol on Exadata
During a recent X5 installation I had to configure Link Aggregation Control Protocol (LACP) on the client network of the compute nodes. Although the ports were running at 10 Gbit/s and the default Active/Passive configuration works perfectly fine, the customer wanted an even distribution of traffic and workload across their core switches.
Link Aggregation Control Protocol (LACP), also known as 802.3ad, is a method of combining multiple physical network connections into one logical connection to increase throughput and provide redundancy in case one of the links fails. The protocol requires both sides, the server and the switch(es), to have matching settings for LACP to work properly.
To configure LACP on Exadata you need to change the bondeth0 parameters.
On each of the compute nodes open the following file:
/etc/sysconfig/network-scripts/ifcfg-bondeth0
and replace the line starting with BONDING_OPTS with this one:
BONDING_OPTS="mode=802.3ad xmit_hash_policy=layer3+4 miimon=100 downdelay=200 updelay=5000 num_grat_arp=100"
and then restart the network interface:
ifdown bondeth0
ifup bondeth0
Determining if ip address 192.168.1.10 is already in use for device bondeth0...
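If you have to repeat the change on several compute nodes, the edit can be scripted. The following is only a sketch under a few assumptions: set_bonding_opts is a hypothetical helper name (not an Exadata tool), the file already contains a BONDING_OPTS line, and GNU sed is available. It keeps a copy of the original file before touching it.

```shell
#!/bin/bash
# Hypothetical helper: replace the BONDING_OPTS line in an ifcfg file.
#   $1 = path to the ifcfg file
#   $2 = new bonding options (without the surrounding quotes)
#   $3 = backup path (optional, defaults to <file>.bak)
set_bonding_opts() {
    local cfg="$1"
    local opts="$2"
    local backup="${3:-$cfg.bak}"

    # Refuse to continue if there is no BONDING_OPTS line to replace
    grep -q '^BONDING_OPTS=' "$cfg" || return 1

    # Keep a copy of the original file, preserving its attributes
    cp -p "$cfg" "$backup" || return 1

    # Rewrite the whole BONDING_OPTS line in place (GNU sed)
    sed -i "s|^BONDING_OPTS=.*|BONDING_OPTS=\"$opts\"|" "$cfg"
}

# Example call (run as root on each compute node):
# set_bonding_opts /etc/sysconfig/network-scripts/ifcfg-bondeth0 \
#   "mode=802.3ad xmit_hash_policy=layer3+4 miimon=100 downdelay=200 updelay=5000 num_grat_arp=100"
```

The helper only edits the file. The interface still has to be restarted with ifdown and ifup as shown above.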
You can check the status of the interface by querying the proc filesystem. Make sure both interfaces are up and running at the same speed. The essential part that confirms LACP is working is shown below:
cat /proc/net/bonding/bondeth0
802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 33
Partner Key: 34627
Partner Mac Address: 00:23:04:ee:be:c8
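The same check can be automated, for example as part of a post-install verification. This is a minimal sketch: lacp_ports and check_lacp are hypothetical helper names, and the parsing relies only on the "Number of ports" line shown in the output above, so treat it as an assumption that the line appears in that form on your kernel version.

```shell
#!/bin/bash
# Hypothetical helper: print the "Number of ports" value of the active
# aggregator, or 0 when the status file has no such line.
lacp_ports() {
    local status_file="${1:-/proc/net/bonding/bondeth0}"
    awk -F': *' '
        /Number of ports/ { print $2 + 0; found = 1; exit }
        END { if (!found) print 0 }
    ' "$status_file"
}

# Hypothetical helper: compare the port count with the expected value
# (default 2, one per physical link in the bond).
check_lacp() {
    local status_file="${1:-/proc/net/bonding/bondeth0}"
    local expected="${2:-2}"
    local ports
    ports=$(lacp_ports "$status_file")
    if [ "$ports" -eq "$expected" ]; then
        echo "OK: $ports ports in the active aggregator"
    else
        echo "PROBLEM: expected $expected ports, found $ports"
        return 1
    fi
}

# Example call:
# check_lacp /proc/net/bonding/bondeth0 2
```

A port count lower than the number of physical links usually means one of the links did not join the aggregator, so it is worth checking the switch side as well.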
I had a problem where the client network did NOT come up after a server reboot. This happened because during system boot the 10Gbit interfaces go through multiple resets, causing very fast link state changes. Here is the status of the bond at that time:
cat /proc/net/bonding/bondeth0
802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
bond bondeth0 has no active aggregator
The solution was to decrease the downdelay parameter to 200. The issue is described in this note:
Bonding Mode 802.3ad Using 10Gbps Network - Slave NICs Fail to Come Up Consistently after Reboot (Doc ID 1621754.1)
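After a reboot it is worth confirming both that the bond has an active aggregator and that the running downdelay value is the one you configured. The sketch below assumes the bonding driver exposes its parameters under /sys/class/net/<bond>/bonding/, which is the standard sysfs location. The helper names has_active_aggregator and bond_param are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (returns 0) when the status file does NOT
# contain the "has no active aggregator" message shown above.
has_active_aggregator() {
    local status_file="${1:-/proc/net/bonding/bondeth0}"
    # An unreadable file is reported as a separate error
    [ -r "$status_file" ] || return 2
    ! grep -q 'has no active aggregator' "$status_file"
}

# Hypothetical helper: print the running value of a bonding parameter.
#   $1 = bond name, $2 = parameter name, $3 = sysfs root (optional)
bond_param() {
    local bond="$1"
    local param="$2"
    local sysfs_root="${3:-/sys/class/net}"
    cat "$sysfs_root/$bond/bonding/$param"
}

# Example calls:
# has_active_aggregator /proc/net/bonding/bondeth0 && echo "aggregator is active"
# bond_param bondeth0 downdelay
```

If bond_param reports a downdelay other than 200, the new BONDING_OPTS line has most likely not been picked up yet and the interface needs another restart.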