

In a very recent set of events at my workplace, we discovered we were having problems receiving server-initiated traffic from one of our connectivity partners. This was quite unusual, since all of our other connections were stable despite the huge traffic we get and generate from time to time. Upon further digging, it seemed that volume wasn't the main contributor to the problem. It turned out that our PDU parser, written in Erlang, was failing to recognize packets that had been fragmented in transit across the network. The issue was fairly trivial in nature and could've been prevented with certain measures, but that's not what this post is about.

A younger version of myself would have thought to test the fix in production, where the issue can easily be replicated. But thanks to a handful of experiences (failures included), that is now out of the question. One of my weird little joys while debugging live (network-oriented) production issues is to look into the network layer and watch the packets coming in. This of course adds to the system load and puts the interface in promiscuous mode, so it's not something that should be left on permanently. My favorite tool for this task is ngrep, which works like a normal grep on the network layer and can also save the captured packets in PCAP format.
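To make the fragmentation problem a bit more concrete, here is a minimal sketch of a parser that tolerates PDUs split across several reads. It is written in Python for illustration, not the Erlang of our actual parser, and it is not our production code. The framing is an assumption: each PDU is taken to start with a 4-byte big-endian length field that counts the whole PDU, header included. The names `PduReassembler`, `feed` and `make_pdu` are hypothetical.

```python
import struct
from typing import List

# Assumption: every PDU begins with a 4-byte big-endian length
# that covers the entire PDU, including these 4 bytes.
HEADER_LEN = 4


class PduReassembler:
    """Buffers raw socket reads and returns only complete PDUs."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add one read's worth of bytes; return any PDUs now complete."""
        self._buffer.extend(chunk)
        pdus = []
        while len(self._buffer) >= HEADER_LEN:
            (total_len,) = struct.unpack_from(">I", self._buffer, 0)
            if total_len < HEADER_LEN:
                raise ValueError(f"invalid PDU length: {total_len}")
            if len(self._buffer) < total_len:
                # Only a fragment so far: keep it and wait for more bytes.
                break
            pdus.append(bytes(self._buffer[:total_len]))
            del self._buffer[:total_len]
        return pdus


def make_pdu(body: bytes) -> bytes:
    """Hypothetical helper that frames a body with the assumed header."""
    return struct.pack(">I", HEADER_LEN + len(body)) + body
```

The point of the sketch is that a single read from the socket is never assumed to contain exactly one PDU. Leftover bytes stay in the buffer until the rest arrives, and several PDUs arriving in one read are split apart in the same loop.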
