DHCP Bug in Extreme ERS 3600 Switch Series – Resolved
I recently logged a ticket with Extreme Networks support about an issue I was having with DHCP Relay on an ERS 3600 series switch.
In short, DHCP relay redundancy was broken. If you define multiple DHCP servers in DHCP Relay, as most deployments will, and the primary server goes down, the DHCP Discover packets relayed to the backup servers are corrupted, and DHCP fails for clients.
This bug was fixed by Extreme in this month’s release, 6.3.2, so upgrade to this version if you are doing DHCP relay to multiple DHCP servers on your 3600 edge switches.
More details below on the issue for anybody interested.
I first noticed this bug in my home lab, where I do DHCP relay from an ERS 3650GTS-PWR+ switch (running software version 6.3.0.033) to two Windows Server 2012 R2 domain controllers, running on separate ESXi 6.7 hosts.
Here is the breakdown of testing to demonstrate the bug:
Both DCs online, only primary DC in DHCP Relay: DHCP working for clients.
Both DCs online, both DCs in DHCP Relay: DHCP working for clients.
Primary DC offline, both DCs in DHCP Relay: DHCP NOT working for clients.
Primary DC offline, only secondary DC in DHCP Relay: DHCP working for clients.
As you can see, in the event of a DC/DHCP server going offline and the switch attempting to relay to another DHCP server defined in DHCP Relay, DHCP fails. So essentially there is no redundancy.
Looking at the traffic in Wireshark, the issue seemed to be with the “Relay agent IP address” field in the DHCP Discover packet received by the DHCP server.
Normally, this would of course have the address of the relaying switch, in this case 10.2.2.1, so that response packets from the DHCP server will be sent back to the relaying switch.
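For anyone who wants to check this field outside of Wireshark, here is a minimal Python sketch that reads the relay agent address (giaddr) from a raw BOOTP/DHCP payload. The field sits at byte offset 24 of the fixed BOOTP header. The function name and the sample packet are my own illustrations, not anything taken from the switch or from the capture.

```python
import socket
import struct

# giaddr follows op, htype, hlen, hops (4 bytes), xid (4 bytes),
# secs and flags (4 bytes), then ciaddr, yiaddr, siaddr (12 bytes).
GIADDR_OFFSET = 24


def relay_agent_ip(bootp_payload: bytes) -> str:
    """Return the giaddr field of a BOOTP/DHCP message as dotted-quad text."""
    if len(bootp_payload) < GIADDR_OFFSET + 4:
        raise ValueError("payload too short to contain giaddr")
    return socket.inet_ntoa(bootp_payload[GIADDR_OFFSET:GIADDR_OFFSET + 4])


# Hypothetical sample: the first 28 bytes of a relayed DHCP Discover,
# hand-built here for illustration only.
header = struct.pack(
    "!BBBBIHH4s4s4s4s",
    1,                              # op: BOOTREQUEST
    1,                              # htype: Ethernet
    6,                              # hlen: MAC address length
    1,                              # hops: incremented by the relay
    0x12345678,                     # xid: arbitrary transaction ID
    0,                              # secs
    0,                              # flags
    socket.inet_aton("0.0.0.0"),    # ciaddr
    socket.inet_aton("0.0.0.0"),    # yiaddr
    socket.inet_aton("0.0.0.0"),    # siaddr
    socket.inet_aton("10.2.2.1"),   # giaddr: the relaying switch
)

print(relay_agent_ip(header))
```

The payload passed in is the UDP payload of the DHCP message, so any Ethernet, IP, and UDP headers need to be stripped first.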
However, in the above testing this field was corrupted, with the address reversed to 1.2.2.10, which presumably meant the DHCP offers were being sent but never reached the correct address.
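To illustrate what a reversed relay address looks like, the short Python sketch below flips the four octets of an IPv4 address and checks whether an observed address is the byte-reversed form of the expected one. That the switch bug was a byte-order problem is my assumption based on the symptom, not something Extreme confirmed, and the helper names are made up for this example.

```python
import ipaddress


def reverse_octets(address: str) -> str:
    """Return the IPv4 address with its four octets in reverse order."""
    packed = ipaddress.IPv4Address(address).packed
    return str(ipaddress.IPv4Address(packed[::-1]))


def looks_byte_reversed(expected: str, observed: str) -> bool:
    """True if observed differs from expected and is its octet-reversed form."""
    return observed != expected and observed == reverse_octets(expected)


expected_giaddr = "10.2.2.1"  # address of the relaying switch in my lab
print(reverse_octets(expected_giaddr))
print(looks_byte_reversed(expected_giaddr, "1.2.2.10"))
```

A DHCP server that trusts the reversed value will send its offer to an address that no relay owns, which matches the failure seen by the clients.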
The fix contained in release 6.3.2 resolves this issue.