
I have an issue with packets dropping to a third-party data center in Florida, USA. The issue only occurs on Azure Virtual Machines, no matter which Azure data center the VM is in. I've run the same tests simultaneously from other, non-Azure networks, and there is no packet loss. The Azure Virtual Machines were vanilla, out-of-the-box builds with no software loaded or other customizations.

I've already spoken to the network admins at the data center, and the only packets they see are the ones that don't time out; the packets that time out never reach their firewall. So it sounds like something on the Azure side, especially since the packets consistently drop/time out from multiple Azure data centers/regions. Does anyone know how I might solve this?

The test I was running was a continuous TCP ping (using tcping.exe) to port 80 (since ICMP is blocked on Azure):

tcping -t 216.155.111.149 80
tcping -t 216.155.111.151 80
tcping -t 216.155.111.146 80
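If tcping.exe isn't handy, a rough, untested sketch of the same test using only built-in PowerShell (Test-NetConnection ships with Windows Server 2012 R2 and later) might look like this; the target IP and the one-second interval are just illustrative:

# One TCP handshake per second; a drop shows up as TcpTestSucceeded = False
while ($true) { Test-NetConnection 216.155.111.149 -Port 80 | Select-Object RemoteAddress, TcpTestSucceeded; Start-Sleep -Seconds 1 }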

Other evidence that it's not the third-party data center: I can run the same continuous TCP ping from my home and work computers and drop no packets. I also set up a VPN tunnel from the Azure VM to a VM at a non-Azure data center, and no packets are dropped over it. The only time packets are dropped is when the traffic goes out to the internet/WAN directly via Azure.

I know the next step would be some traceroute tests, but since Azure blocks ICMP, I had to use nmap to run a TCP traceroute; the screenshots from those tests are below.

nmap -sS -p 80 -Pn --traceroute 216.155.111.149 

[Screenshots of the four traceroute tests: test1, test2, test3, test4]

  • I'm seeing this too. It's bizarre. Did you ever get this resolved? Commented Oct 27, 2015 at 0:49
  • @CharlesOffenbacher No, I am still having this issue. As a workaround for the time being, I created a Linux VM with another cloud hosting provider, installed a VPN server role on that new server, connected the Azure Windows Server 2012 R2 guest VM to that VPN server, and created a static route so that only the traffic destined for that IP range goes via the VPN connection (all other traffic still flows out via the Azure WAN to the internet as normal); see the sketch after these comments. But this isn't a permanent solution. I'm still hopeful someone will respond and help get this fixed permanently. Commented Oct 27, 2015 at 15:29
  • Wow, that's tough! I'm not sure if I'm experiencing exactly the same issue, but possibly. What I'm seeing is that servers in Azure are randomly unable to establish a connection with non-Azure servers, in what looks like packet loss. However, when I run tcpdump, I see that my non-Azure server actually receives a packet but occasionally doesn't respond. I'm thinking my issue is related to the Azure NAT doing weird things with TCP timestamps. stackoverflow.com/questions/8893888/… Commented Oct 27, 2015 at 17:28
  • serverfault.com/questions/235965/… Commented Oct 27, 2015 at 17:29
  • @AndrewBucklin I've reproduced and found a workaround for your issue. You still need those simultaneous captures if you want to get to the bottom of this, though. Commented Nov 3, 2015 at 15:39
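For reference, the static route described in the workaround comment above could look roughly like this on the Windows guest. This is a hypothetical sketch: the 216.155.111.0/24 prefix is inferred from the IPs tested above, and 10.8.0.1 is a stand-in for the VPN tunnel's gateway address.

rem Persistently route only the affected /24 via the VPN gateway; all other traffic is unaffected
route -p add 216.155.111.0 mask 255.255.255.0 10.8.0.1 metric 1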

1 Answer


As I mentioned in my comment, you're effectively hitting a scenario similar to the one described in this article.

I could easily reproduce your behaviour:

[Screenshot: issue reproduced]

And I could easily work around the issue by adding an Instance-Level Public IP (ILPIP) to the VM:

[Screenshot: issue solved]

It is difficult to say exactly what is going on, as we don't have simultaneous captures, but my understanding is that the edge device (possibly a firewall) at the remote site (www.oandp.com) keeps closed connections in its connection table longer than Azure does, so when Azure reuses one of the freed (i.e. previously used) source ports while the remote side still considers that connection not fully closed, our SYN packets get dropped.
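If you do want those simultaneous captures, a minimal sketch would be to trace on the Azure VM while capturing on the remote edge at the same time; the tool choice and the <azure-vip> placeholder (the VM's public IP) are assumptions:

rem On the Azure VM (built-in Windows tracing, no install needed); reproduce the timeouts, then stop
netsh trace start capture=yes tracefile=C:\azure-side.etl
netsh trace stop

# On the remote (Linux) side, filtering on the VM's public IP
tcpdump -n -w remote-side.pcap host <azure-vip> and tcp port 80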

The ILPIP applies a static, one-to-one NAT, so there is no port translation or port reuse (unless your OS does it), which avoids the issue.
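For completeness, attaching an ILPIP in the classic (ASM) deployment model can be done with the Azure PowerShell cmdlets; the cloud service, VM, and IP names below are placeholders:

# Sketch, assuming the classic Azure PowerShell module: attach an instance-level
# public IP named "myVmPip" directly to the VM
Get-AzureVM -ServiceName "myCloudService" -Name "myVm" | Set-AzurePublicIP -PublicIPName "myVmPip" | Update-AzureVM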

  • Would this affect the private IP of the VM itself and the site-to-site VPN that's already in place for the customer? Commented Nov 3, 2015 at 19:11
  • No, it is just a public IP address for this specific instance, so the private IP (DIP) will remain the same. The ILPIP will be your public source IP by default for traffic going to the internet (see the quick check after these comments). Commented Nov 3, 2015 at 19:23
  • Bingo! Much appreciated. The traffic destined for the internet immediately switched to route via the new PIP that I assigned to the VM. VPN connectivity didn't even hiccup. Thanks again. Commented Nov 3, 2015 at 23:50
  • The "ILPIP" or instance-level public IP is not supported in modern Azure - there is no solution for getting around this issue: superuser.com/questions/1132967/… Commented Jan 23, 2017 at 9:26
  • Hi joonas.fi - In ARM ("modern Azure") you can attach public IPs directly to your VM. It is in fact the default when you deploy a VM. The problem above might only appear if you purposely deploy VMs in an Availability Set, behind a Load Balancer, and remove the VMs' public IP addresses (now known as PIPs). Hope this helps! Commented Jan 23, 2017 at 16:39
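A quick way to confirm which public IP the VM actually egresses through once the ILPIP/PIP is attached; ifconfig.me is just one example of an external what's-my-IP service, not something specific to this setup:

# Should print the newly attached public IP
Invoke-RestMethod http://ifconfig.me/ip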
