Thursday, March 28, 2013

Improving VM to VM network throughput on an ESXi platform


Recently I virtualized most of the servers I had at home onto an ESXi 5.1 platform. This post follows my journey to achieve better network performance between the VMs.

I am quite happy with the setup, as it allowed me to eliminate 5-6 physical boxes in favor of one (very strong) machine. I was also able to achieve some performance improvements, but not to the degree I hoped to see.

I have a variety of machines running in a virtualized form:
1. Windows 8 as my primary desktop, passing a dedicated GPU and USB card from the host to the VM using VMDirectPath
2. Multiple Linux servers
3. Solaris 11.1 as NAS, running the great napp-it software (http://www.napp-it.org/) 

All the machines have the latest VMware Tools installed and are running paravirtualized drivers where possible.

VM to VM network performance has been great between the Windows/Linux boxes once I enabled Jumbo Frames. 
Throughout this post I'll use iperf to measure network performance. It's a great, easy-to-use tool, and you can find precompiled versions for almost any operating system: http://iperf.fr/

Let's start with an example of network throughput from the Windows 8 machine to Linux:










11.3 Gbps, not bad. CPU utilization was around 25% on the Windows box throughout the test.

Network performance between the Solaris VM and any other machine on the host was relatively bad. 
I started by using the E1000G virtual adapter, as recommended by VMware for Solaris 11 (http://kb.vmware.com/kb/2032669). We'll use one of my Linux VMs (at 192.168.1.202) as the server for these tests. Testing with iperf:















1.36 Gbps. Not bad between physical servers, but unacceptable between VMs on the same host. Also notice the very high CPU utilization during the test: around 80% system time.

My immediate instinct was to enable jumbo frames. Although the adapter driver is supposed to support jumbo frames, I was unable to enable them no matter how hard I fought it.


root@solaris-lab:/kernel/drv# dladm set-linkprop -p mtu=9000 net0
dladm: warning: cannot set link property 'mtu' on 'net0': link busy

I gave up on getting better performance from the E1000G adapter and switched to VMXNET3. I immediately saw improvement:















2.31 Gbps, but more importantly, the CPU utilization was much lower.

Now let's try to enable jumbo frames for the vmxnet3 adapter. I followed the steps in http://kb.vmware.com/kb/2012445 and http://kb.vmware.com/kb/2032669 without success. The commands worked, but jumbo frames were not really enabled. We can test with a 9000-byte ping:

root@solaris-lab:~# ping -s 192.168.1.202 9000 4
PING 192.168.1.202: 9000 data bytes
----192.168.1.202 PING Statistics----
4 packets transmitted, 0 packets received, 100% packet loss


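As my next step I was planning on running some DTrace commands, and accidentally noticed that the drivers I had installed were the Solaris 10 version and not the Solaris 11 version. In the listings below, the installed file sizes (34104 and 24440 bytes) match the 10_64 and 10 directories of the Tools distribution, not 11_64 and 11: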


root@solaris-lab:~/vmware-tools-distrib# find /kernel/drv/ -ls |grep vmxnet3
78669    2 -rw-r--r--   1 root     root         1071 Mar 27 01:42 /kernel/drv/vmxnet3s.conf
78671   34 -rw-r--r--   1 root     root        34104 Mar 27 01:42 /kernel/drv/amd64/vmxnet3s
78670   25 -rw-r--r--   1 root     root        24440 Mar 27 01:42 /kernel/drv/vmxnet3s
root@solaris-lab:~/vmware-tools-distrib# find . -ls |grep vmxnet3
  231   25 -rw-r--r--   1 root     root        24528 Nov 17 07:55 ./lib/modules/binary/2009.06/vmxnet3s
  234    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/2009.06/vmxnet3s.conf
  250    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/10/vmxnet3s.conf
  244   25 -rw-r--r--   1 root     root        24440 Nov 17 07:55 ./lib/modules/binary/10/vmxnet3s
  262   34 -rw-r--r--   1 root     root        34104 Nov 17 07:55 ./lib/modules/binary/10_64/vmxnet3s
  237   35 -rw-r--r--   1 root     root        35240 Nov 17 07:55 ./lib/modules/binary/11_64/vmxnet3s
  227   34 -rw-r--r--   1 root     root        34256 Nov 17 07:55 ./lib/modules/binary/2009.06_64/vmxnet3s
  253   25 -rw-r--r--   1 root     root        24672 Nov 17 07:55 ./lib/modules/binary/11/vmxnet3s
  259    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/11/vmxnet3s.conf


This is very strange as installation of the Tools is a straightforward procedure with no room for user error.

So I decided to open the Tools installation script (Perl) and found an interesting bug:


...
sub configure_module_solaris {
  my $module = shift;
  my %patch;
  my $dir = db_get_answer('LIBDIR') . '/modules/binary/';
  my ($major, $minor) = solaris_os_version();
  my $os_name = solaris_os_name();
  my $osDir;
  my $osFlavorDir;
  my $currentMinor = 10;   # The most recent version we build the drivers for

  if (solaris_10_or_greater() ne "yes") {
    print "VMware Tools for Solaris is only available for Solaris 10 and later.\n";
    return 'no';
  }

  if ($minor < $currentMinor) {
    $osDir = $minor;
  } else {
    $osDir = $currentMinor;
  }

For Solaris 11.1, $minor is 11, so the else branch sets $osDir to $currentMinor, which is 10, and the Solaris 10 drivers get installed. Bug?
Either way it's very easy to fix. Just change "<" to ">":

if ($minor > $currentMinor) {
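
To see why the original comparison lands on the Solaris 10 directory, here is a minimal Python sketch of just the branch shown above. It is a model of the logic, not the VMware script itself: the function names are made up for illustration, and I'm assuming $minor is 11 on Solaris 11.1 as described.

```python
# Illustrative model of the directory choice in configure_module_solaris.
# CURRENT_MINOR mirrors $currentMinor in the script; the function names are
# hypothetical and exist only for this sketch.
CURRENT_MINOR = 10


def os_dir_original(minor: int) -> int:
    """Original '<' comparison: never picks anything newer than CURRENT_MINOR."""
    return minor if minor < CURRENT_MINOR else CURRENT_MINOR


def os_dir_modified(minor: int) -> int:
    """Comparison flipped to '>': picks the guest's own minor when it is newer."""
    return minor if minor > CURRENT_MINOR else CURRENT_MINOR


if __name__ == "__main__":
    for minor in (10, 11):
        print(minor, os_dir_original(minor), os_dir_modified(minor))
```

With the original comparison a minor version of 11 maps to directory 10, while the flipped comparison maps it to 11 and leaves Solaris 10 guests unchanged.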

Re-install Tools using the modified script and reboot. 
Let's check the installed driver now:



root@solaris-lab:~/vmware-tools-distrib# find /kernel/drv/ -ls |grep vmxnet3
79085    2 -rw-r--r--   1 root     root         1071 Mar 27 02:00 /kernel/drv/vmxnet3s.conf
79087   35 -rw-r--r--   1 root     root        35240 Mar 27 02:00 /kernel/drv/amd64/vmxnet3s
79086   25 -rw-r--r--   1 root     root        24672 Mar 27 02:00 /kernel/drv/vmxnet3s
root@solaris-lab:~/vmware-tools-distrib# find . -ls |grep vmxnet3
  231   25 -rw-r--r--   1 root     root        24528 Nov 17 07:55 ./lib/modules/binary/2009.06/vmxnet3s
  234    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/2009.06/vmxnet3s.conf
  250    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/10/vmxnet3s.conf
  244   25 -rw-r--r--   1 root     root        24440 Nov 17 07:55 ./lib/modules/binary/10/vmxnet3s
  262   34 -rw-r--r--   1 root     root        34104 Nov 17 07:55 ./lib/modules/binary/10_64/vmxnet3s
  237   35 -rw-r--r--   1 root     root        35240 Nov 17 07:55 ./lib/modules/binary/11_64/vmxnet3s
  227   34 -rw-r--r--   1 root     root        34256 Nov 17 07:55 ./lib/modules/binary/2009.06_64/vmxnet3s
  253   25 -rw-r--r--   1 root     root        24672 Nov 17 07:55 ./lib/modules/binary/11/vmxnet3s
  259    2 -rw-r--r--   1 root     root         1071 Nov 17 07:55 ./lib/modules/binary/11/vmxnet3s.conf

Now we have the correct version installed. 

Let's enable jumbo frames as before and check if it made any difference:

root@solaris-lab:~# ping -s 192.168.1.202 9000 4
PING 192.168.1.202: 9000 data bytes
9008 bytes from 192.168.1.202: icmp_seq=0. time=0.338 ms
9008 bytes from 192.168.1.202: icmp_seq=1. time=0.230 ms
9008 bytes from 192.168.1.202: icmp_seq=2. time=0.289 ms
9008 bytes from 192.168.1.202: icmp_seq=3. time=0.294 ms
----192.168.1.202 PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.230/0.288/0.338/0.044

Success! Jumbo frames are working.
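
A note on sizing the test ping, as a rough sketch rather than anything Solaris-specific: assuming a plain IPv4 header (20 bytes, no options) and an 8-byte ICMP echo header, a 9000-byte payload does not fit in a single packet at MTU 9000, so the IP layer has to fragment it. The 9008 bytes reported in the replies above are the 9000 data bytes plus the ICMP header. The function names below are mine, for illustration only.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header


def max_unfragmented_payload(mtu: int) -> int:
    """Largest ping payload that fits in one IPv4 packet at this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def packets_for_ping(payload: int, mtu: int) -> int:
    """Number of IPv4 packets (fragments) needed to carry one echo request."""
    ip_payload = payload + ICMP_HEADER
    room = mtu - IPV4_HEADER
    if ip_payload <= room:
        return 1
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = room - (room % 8)
    return math.ceil(ip_payload / per_fragment)


if __name__ == "__main__":
    print(max_unfragmented_payload(9000))
    print(packets_for_ping(9000, 9000), packets_for_ping(8972, 9000))
```

By this arithmetic the largest payload that travels in one unfragmented packet at MTU 9000 is 8972 bytes, which makes for a stricter test of the path.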


Let's test throughput with iperf:















Less than 1 Mb/s, not what we expected at all!
We need to take a deeper look at the packets being sent. Let's use tcpdump to create a trace file:

root@solaris-lab:~# tcpdump -w pkts.pcap -s 100 -inet1 & PID=$! ; sleep 1s ; ./iperf -t1 -c192.168.1.202; kill $PID
[1] 1726
tcpdump: listening on net1, link-type EN10MB (Ethernet), capture size 100 bytes
------------------------------------------------------------
Client connecting to 192.168.1.202, TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.206 port 35084 connected with 192.168.1.202 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.3 sec    168 KBytes  1.02 Mbits/sec
70 packets captured
70 packets received by filter
0 packets dropped by kernel

Then open it in Wireshark for easier analysis:












The problem is clear at packet 7: the driver is trying to send a 16KB packet, well above our 9K jumbo-frame MTU. This packet never makes it out of the VM, and after a timeout it is fragmented and retransmitted. This happens again for every packet, generating a massive delay and causing very low throughput.

Reviewing the vmxnet3 driver source (open source at http://sourceforge.net/projects/open-vm-tools/), it seems the only way for a packet larger than the MTU to be sent is if the LSO feature is enabled. 
To learn more about LSO (Large Segment Offload), read http://en.wikipedia.org/wiki/Large_segment_offload.
Essentially, the kernel hands large packets (16K in the capture) to the NIC (or virtual NIC), which is supposed to split each one into valid-size packets before transmitting. On a real hardware NIC, at high speeds, this saves a considerable amount of CPU. In a virtualized environment I don't see the benefit, and it seems to be badly broken.
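
To make the idea concrete, here is a small Python sketch of the splitting that LSO is supposed to perform. It only models the arithmetic and is not the driver's code. I'm assuming the large packet carries 16384 bytes of TCP payload (the capture only tells us "16K") and an MSS of 8960 bytes, which is a 9000-byte MTU minus 20-byte IPv4 and 20-byte TCP headers with no options.

```python
def mss_for_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """TCP payload that fits in one packet, assuming headers without options."""
    return mtu - ip_header - tcp_header


def segment_sizes(payload_len: int, mss: int) -> list[int]:
    """Split one large send into the per-packet payload sizes LSO should emit."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


if __name__ == "__main__":
    mss = mss_for_mtu(9000)
    print(mss, segment_sizes(16384, mss))
```

Under these assumptions the 16K send should leave the virtual NIC as two packets, each small enough for the 9000-byte MTU, instead of one oversized packet.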

Let's disable LSO:

ndd -set /dev/ip ip_lso_outbound 0

And try to run iperf again:














12.1 Gbps, SUCCESS!

Now that we are able to transmit out of Solaris at decent rates, let's check the performance of connections into the Solaris VM:









3.74 Gbps, not bad, but we can do better. Let's at least get to 10 Gbps.

The next step is to tune the TCP parameters to accommodate the higher speed needed. The buffers are simply too small for the amount of data in flight:


root@solaris-lab:~# ipadm set-prop -p max_buf=4194304 tcp
root@solaris-lab:~# ipadm set-prop -p recv_buf=1048576 tcp
root@solaris-lab:~# ipadm set-prop -p send_buf=1048576 tcp
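
A back-of-the-envelope way to see why the buffers matter: a single TCP connection cannot have more unacknowledged data in flight than its window, so its throughput is capped at roughly window / RTT. The sketch below uses the 0.288 ms average round trip from the ping earlier and the 48 KByte default window that iperf reported. Treating the ping RTT as the TCP RTT under load is my simplifying assumption, so take the numbers as rough estimates only.

```python
def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on one connection's throughput for a given window and RTT."""
    return window_bytes * 8 / rtt_s


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps * rtt_s / 8


if __name__ == "__main__":
    rtt = 0.288e-3  # seconds; average RTT from the ping above (assumed to apply to TCP)
    print(f"48 KByte window cap: {window_limited_bps(48 * 1024, rtt) / 1e9:.2f} Gbps")
    print(f"1 MiB window cap:    {window_limited_bps(1048576, rtt) / 1e9:.2f} Gbps")
    print(f"In flight for 10 Gbps: {bdp_bytes(10e9, rtt):.0f} bytes")
```

With these inputs a 48 KByte window caps out at around 1.4 Gbps, and sustaining 10 Gbps needs roughly 360 KB in flight, which is consistent with raising the buffers to 1 MB.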

And run iperf again:













18.3 Gbps, not bad!

119 comments:

  1. Hey, cool write up! :-)

    I think there is an error in the code that detects the solaris version.

    This line:

    if ($minor < $currentMinor) {

    should be:
    if ($minor > $currentMinor) {

    /Jannich

  2. I always wanted to put my Solaris box on ESXi, but the abysmal network performance killed it.

    This is super helpful!

  3. FYI, when testing jumbo frames with ping, you need to allow 28 bytes for the IP and ICMP headers. Try 8972 instead of 9000 in your ping command ;-)

  4. Great writeup, would love to see a post with more specifics on your hardware setup.

  5. TCP performance is related to delay (RTT) and TCP window size (if we don't have any loss). You should use -w in iperf to define the window size and rerun the same test on the Windows and Linux PCs.

  6. Great write-up, and it shows amazing results. Thanks for spending your time to provide such information, greatly appreciated.

  7. You could enable jumbo frames on the e1000g vNIC by changing MaxFrameSize in /kernel/drv/e1000g.conf

    Default:
    MaxFrameSize=0,0,0,0 ...
    Change to:
    MaxFrameSize=3,3,3,3 ...

    Reboot.

    -----

    Anyway, thanks for ipadm set-prop tips.
    Regards.

  8. The Solaris vmxnet3 driver has so many problems:

    - the LSO problem (there's a source patch that claims to fix the problem, but I haven't tried it: http://www.mail-archive.com/open-vm-tools-devel@lists.sourceforge.net/msg00812.html)

    - the garbage debug output printed to console

    - requiring the ndd 'accept-jumbo' flag to be set before MTU can be changed (why?!?!)

    I'm seriously thinking about taking the source from open-vm-tools, throwing it on github, and fixing these problems

  9. In the latest update, 5.1 U1, the garbage output in the driver is fixed after updating to the latest VMware Tools. I was not able to test ndd accept-jumbo or LSO yet, but will check and report back!

  10. Can you pls help understand the latter part of the sentence "On a real hardware NIC, at high speeds, this saves considerable amounts of CPU. in a virtualized environment I don't see the benefit"
    i.e. why do you say there is no benefit?

    thanks

  11. Sure. On a real hardware NIC the segmentation is done in the NIC itself, meaning the host needs to build far fewer packets, calculate CRC, etc. The work of building the packet headers takes CPU on the host, and when offloaded to the hardware, it can save considerable load. Now, in a virtualized environment, there is no physical hardware to build headers; it is simulated by the ESXi host, still taking CPU. You are shifting load from the VM to the host, but in total you don't save any computation done on the main CPU.

  12. How were you able to hit speeds OVER 10Gbits/sec when the driver itself is a 10Gb driver??
    Please let me know, I've been testing with the vmxnet3 drivers in windows and linux.

    Replies
    1. Nothing is limiting the driver to 10Gb. It can go much faster.

  13. Nice post. But I don't understand why in my ESXi 5.1 U1 lab, the iperf speed between my Windows Server 2003 and Linux machines is about 300Mbit/s, very poor. I didn't enable jumbo frames, but both have the vmxnet3 driver installed.
    Can somebody shed some light on troubleshooting this?

    Replies
    1. I'd be happy to help.
      The first step is to isolate the direction: are you getting 300Mb/s in both directions (linux->win and win->linux)?
      Did you try to take a tcpdump and look at the results? I'd be happy to examine it for you.

  14. Hi Cyber Explorer,
    Can you share some tips for tuning Linux and Windows network performance if there is any?

    My machine is not strong ( AMD 1.6Ghz x 2 ), but currently I get only 500Mbit/s transfer at max, no matter what OS it is, Linux, Windows, FreeBSD or Solaris. Is it normal?

    Replies
    1. my hunch is that you can do better. I don't have experience with these specific CPUs, but I have a standalone server (not virtualized) based on an Intel Atom D510 and I am able to saturate its 1 GbE port after network tuning. I believe the D510 is weaker than your AMD CPUs.

    2. Today I tried to locate the problem. I have a Windows Server 2003 R2 VM as a domain controller. First I tried iperf against the loopback address to exclude other factors. The result is:
      C:\iperf-2.0.5-2-win32>iperf.exe -c127.0.0.1
      ------------------------------------------------------------
      Client connecting to 127.0.0.1, TCP port 5001
      TCP window size: 64.0 KByte (default)
      ------------------------------------------------------------
      [ 3] local 127.0.0.1 port 3004 connected with 127.0.0.1 port 5001
      [ ID] Interval Transfer Bandwidth
      [ 3] 0.0-10.0 sec 761 MBytes 637 Mbits/sec

      C:\iperf-2.0.5-2-win32>iperf.exe -w 125K -c127.0.0.1
      ------------------------------------------------------------
      Client connecting to 127.0.0.1, TCP port 5001
      TCP window size: 125 KByte
      ------------------------------------------------------------
      [ 3] local 127.0.0.1 port 3005 connected with 127.0.0.1 port 5001
      [ ID] Interval Transfer Bandwidth
      [ 3] 0.0-10.0 sec 814 MBytes 682 Mbits/sec

      It would eat up 80% of CPU usage, so maybe it's a CPU problem, isn't it?

    3. do you have jumbo frames enabled?

    4. Nope. Later I found that if I change the window size to 1M, I can get 900Mbit/s throughput over the loopback address on Windows 2003. But I think it's a Windows problem, because when testing iperf on my OpenIndiana VM over the loopback address, I can get 7Gbit/s, which is satisfying.

      Now what can I do next? Since I still cannot get a satisfying speed between VMs under the same port group.

    5. so that's the problem - enable jumbo frames on the windows box and you should get a much better throughput with lower cpu.

    6. Since the physical switch which connects to the esxi server doesn't support jumbo frames, can I enable it on esxi? Will it influence the physical clients?

    7. Good question, I'm not sure. best would be to simply try...

  15. USB passthrough is broken in 5.1. How were you able to pass GPU AND USB through on 5.1?? After doing your tuning, did you notice any other bugs with the vmxnet3 adapter?

    Replies
    1. I was able to stabilize USB pass-through on one specific version, any upgrade I attempted broke it and I had to roll back. It's ESXi 5.1.0 1021289.
      Same goes for the USB PCI-E adapter: tried a few and only one worked - http://www.amazon.com/gp/product/B005ARQV6U?ie=UTF8&camp=213733&creative=393185&creativeASIN=B005ARQV6U&linkCode=shr&tag=cybeblog-20&psc=1

    2. As for other bugs - other than the known bug that's just making noise in the log, it's been extremely stable.

  16. by the way, can you please show us your whitebox's spec, so we can use it as a reference?
    Thanks

  17. I've been trying to break past an 800MB/sec nfs bottleneck between two linux guests on the same system using a standard vswitch with no physical nics assigned. i even tried bonding two different vswitches without any improvement. i also tried tweaking the various tcp window sizes. MTU is set to 9k, and tcpdump output appears to validate that:


    16:44:13.639338 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 126259665:126268613, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 8948
    16:44:13.639347 IP cgx.51236 > nfsa.commplex-link: Flags [P.], seq 126268613:126271513, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 2900
    16:44:13.639355 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 126271513:126280461, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 8948
    16:44:13.639359 IP nfsa.commplex-link > cgx.51236: Flags [.], ack 126241769, win 21993, options [nop,nop,TS val 78849579 ecr 74413494], length 0
    16:44:13.639376 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 126280461:126289409, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 8948
    16:44:13.639386 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 126289409:126298357, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 8948
    16:44:13.639387 IP nfsa.commplex-link > cgx.51236: Flags [.], ack 126259665, win 21993, options [nop,nop,TS val 78849579 ecr 74413494], length 0
    16:44:13.639393 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 126298357:126307305, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 8948
    16:44:13.639400 IP nfsa.commplex-link > cgx.51236: Flags [.], ack 126280461, win 21927, options [nop,nop,TS val 78849579 ecr 74413494], length 0
    16:44:13.639405 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 126307305:126316253, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 8948
    16:44:13.639416 IP cgx.51236 > nfsa.commplex-link: Flags [P.], seq 126316253:126320665, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 4412
    16:44:13.639426 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 126320665:126329613, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 8948
    16:44:13.639435 IP nfsa.commplex-link > cgx.51236: Flags [.], ack 126298357, win 21927, options [nop,nop,TS val 78849579 ecr 74413494], length 0
    16:44:13.639435 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 126329613:126338561, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 8948
    16:44:13.639444 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 126338561:126347509, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 8948
    16:44:13.639453 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 126347509:126356457, ack 1, win 140, options [nop,nop,TS val 74413494 ecr 78849579], length 8948



    but the iperf numbers are not very good.

    using the default frame sizes:
    TCP window size: 29.0 KByte (default)
    ------------------------------------------------------------
    [ 3] local 10.0.0.2 port 34335 connected with 10.0.0.1 port 5001
    [ 3] 0.0-10.0 sec 5.20 GBytes 4.47 Gbits/sec

    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    [ 4] local 10.0.0.1 port 5001 connected with 10.0.0.2 port 34335
    [ 4] 0.0-10.0 sec 5.20 GBytes 4.46 Gbits/sec


    using tweaked window sizes of 640KB etc, showed no difference.

    the same issue is on both esxi 5.1.0 build 1065491 and 799733.

    just can't seem to get past the 4.5Gb/sec.

    anyone have any thoughts?


    Replies
    1. wanted to add that the cpu usage is minimal. we are using the vmxnet3 drivers. i've tried various versions (the 1.1.18 through 1.1.29 and 1.1.32). also tried disabling LRO in the vmware settings (as some other web searches suggested), and the gso/tso/lro in the guest. none of the various combinations make any difference.

    2. can you paste a longer tcpdump output, maybe through http://pastebin.com/ ?
      Please include the tcp handshake, I want to see the window scale parameters.

      My gut feeling is that you are exhausting your rcv buffers, although that should not kick in before 15-20Gbps on modern linux kernels and hardware.

    3. this is from the first part of the tcpdump. i used the command you indicated earlier. i didn't want to post the whole thing. the section I posted earlier was from the middle. If you tell me what keywords you need (assuming this isn't it) I will look for them. here also is the sysctl settings I used which result in the same performance. I'm a little stumped because you said your linux2windows performance was fine. this is linux2linux (centos 6.2 - 2.6.32-220 to a 2.6.32-400).

      net.core.rmem_max = 16777216
      net.core.wmem_max = 16777216
      ##net.core.rmem_default = 33554432
      ##net.core.wmem_default = 33554432
      net.ipv4.tcp_mem = 16777216 16777216 16777216
      net.ipv4.tcp_rmem = 4096 873800 16777216
      net.ipv4.tcp_wmem = 4096 655360 16777216
      #net.ipv4.tcp_wmem = 4096 8738000 16777216
      net.core.netdev_max_backlog = 30000
      vm.min_free_kbytes = 2097152


      and the beginning of the tcpdump...

      16:44:13.416153 IP cgx.51236 > nfsa.commplex-link: Flags [S], seq 1689649426, win 17920, options [mss 8960,sackOK,TS val 74413271 ecr 0,nop,wscale 7], length 0
      16:44:13.416318 IP nfsa.commplex-link > cgx.51236: Flags [S.], seq 3605531965, ack 1689649427, win 17896, options [mss 8960,sackOK,TS val 78849356 ecr 74413271,nop,wscale 7], length 0
      16:44:13.416358 IP cgx.51236 > nfsa.commplex-link: Flags [.], ack 1, win 140, options [nop,nop,TS val 74413271 ecr 78849356], length 0
      16:44:13.416397 IP cgx.51236 > nfsa.commplex-link: Flags [P.], seq 1:25, ack 1, win 140, options [nop,nop,TS val 74413271 ecr 78849356], length 24
      16:44:13.416425 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 25:8973, ack 1, win 140, options [nop,nop,TS val 74413271 ecr 78849356], length 8948
      16:44:13.416585 IP nfsa.commplex-link > cgx.51236: Flags [.], ack 25, win 140, options [nop,nop,TS val 78849356 ecr 74413271], length 0
      16:44:13.416597 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 8973:17921, ack 1, win 140, options [nop,nop,TS val 74413271 ecr 78849356], length 8948
      16:44:13.416632 IP nfsa.commplex-link > cgx.51236: Flags [.], ack 8973, win 272, options [nop,nop,TS val 78849356 ecr 74413271], length 0
      16:44:13.416647 IP cgx.51236 > nfsa.commplex-link: Flags [P.], seq 17921:26869, ack 1, win 140, options [nop,nop,TS val 74413271 ecr 78849356], length 8948
      16:44:13.416667 IP cgx.51236 > nfsa.commplex-link: Flags [.], seq 26869:35817, ack 1, win 140, options [nop,nop,TS val 74413271 ecr 78849356], length 8948
      16:44:13.416667 IP nfsa.commplex-link > cgx.51236: Flags [.], ack 17921, win 227, options [nop,nop,TS val 78849357 ecr 74413271], length 0

    4. these are the original defaults on one of the nodes (before any sysctl changes). we have two different physical hosts. they both have the same guests. we've been experimenting with changing the sysctl settings on one to see what effect it has and kept the other physical host's guests the same as the defaults.

      [root@cgx1 /]# sysctl -a | grep rcv
      net.ipv4.tcp_moderate_rcvbuf = 1
      [root@cgx1 /]# sysctl -a | grep recv
      [root@cgx1 /]# sysctl -a | grep rmem
      net.core.rmem_max = 131071
      net.core.rmem_default = 124928
      net.ipv4.tcp_rmem = 4096 87380 4194304
      net.ipv4.udp_rmem_min = 4096


      so far, both are performing exactly the same.

    5. the tcpdump command you are using is correct - but i need a longer snippet to try and understand the flow of packets, at least a few thousand lines. best would be to paste it at a site like http://pastebin.com/, and then reply with the link to the paste here.
      Also, it's much easier to troubleshoot with iperf than NFS (or anything else). can you run the tcpdump with iperf traffic?
      One more request - run tcpdump on both nodes concurrently when you test with iperf and attach the log.

    6. I did run it with iperf. I used EXACTLY the command you gave above in your original post.

      i ran it again to get both client and server

      server: http://pastebin.com/m5vXBv0U

      client: http://pastebin.com/EHQAW0Lr

  18. Sorry for taking so long to answer, I've been traveling for work.
    The traces are very interesting. Nothing in the packet flow/congestion seems to limit the performance at all.
    The issue is the rate you are sending packets out - once every 7-10uS. My linux boxes send packets out every 1-2uS, until they reach a buffer/bandwidth bottleneck at around 22 Gbps.

    anything on your system that can limit sending packets? for example, is it a multi core cpu with one core pegged at 100% ?
    maybe (but unlikely) a packet shaper or firewall on the machine?

    Replies
    1. firewall is chkconfiged off (iptables and ip6tables)

      8 vcpu's are assigned. the physical host is dual cpu each cpu has 6 hyperthreaded cores. i don't see any single cpu limited.

      as far as i know, no packet shaper is on. it is a stock centos 6.2 install on one, and an stock 6.2 with upgrades to 2.6.32-400 on the other (for ocfs2).

    2. I'll try to install over the weekend the same configuration and see what performance i get. will let you know what i find.

    3. I used a livecd for centos 6.2, straight out of the box installed iperf, enabled mtu 9000 and am getting 15-20Gbps... nothing installed or touched on the box beyond that - vanilla drivers, not even vmware tools drivers.
      before enabling mtu 9000 I was seeing exactly the same problem as you.

      can you please post the output of -
      ethtool -i eth0
      ifconfig eth0

    4. this is the output for eth1. eth0 has a physical nic associated with it for management. eth1 is the purely virtual guest-2-guest vswitch.

      [root@cgx1 ~]# ethtool -i eth1
      driver: vmxnet3
      version: 1.1.18.0-k-NAPI
      firmware-version: N/A
      bus-info: 0000:1b:00.0
      [root@cgx1 ~]# ifconfig eth1
      eth1 Link encap:Ethernet HWaddr 00:0C:29:7E:CD:6E
      inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0
      inet6 addr: fe80::20c:29ff:fe7e:cd6e/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
      RX packets:13255527 errors:0 dropped:149 overruns:0 frame:0
      TX packets:16059506 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:5981766938 (5.5 GiB) TX bytes:159450144144 (148.4 GiB)

    5. that is identical to what i see on mine, but i'm getting much better throughput.
      one way to continue troubleshooting would be to create another VM, run the centos 6.2 live cd, install iperf, run ifconfig eth0 mtu 9000 and test. if this shows good performance, you need to understand the delta between the livecd and your instance.

  19. I have a question about the version bug in Solaris where vmtools installs for version 10 instead of 11. Is this still in the code? I can't find the line to change. The problem is that I also have version 10 installed, and when I do "uname -r" I get 5.11, meaning I should have version 11 installed. Do you mind telling us where (if it still exists) in the code we need to change the "<" to ">"?

    Replies
    1. what version of ESXi do you use? can you paste the installation script (or at least the relevant parts) over at http://pastebin.com and reply with the url?

    2. Ahh finally found it, thought it was in the "vmware-install.pl" but it's actually in a subfolder "./bin/vmware-config-tools.pl". Hopefully this will help someone =) and thx for very useful blogpost!

  20. I have never, ever.....gotten vmxnet3 to perform correctly in esx aio. NFS share disconnects from ESX all the time, no matter what the MTU. e1000g just works. This is the same in ESXi 5.5

  21. I'm trying to boost my speed from my Win7 VM to my Ubuntu12.04 VM and I can't seem to get past 2Gbps in either direction. From what I can tell I have Jumbo Frames enabled on both and I'm using the VMXNET3 driver on each. Link states 10gbps on each. When I just run the default client and servers I'm stuck at 2Gbps but if I explicitly set the window size with the -w flag on each I can get up to about 6Gbps. My questions are: if TCP is configured properly shouldn't the window size scale automatically without being set? When I transfer files over SMB/CIFS I can't seem to move past the 2Gbps limit either (I've got an SSD capable of going faster)

  22. Well, I'm at about 4Gbps with iperf after realizing that the splashtop streamer I was running to remotely connect to the VM was slowing the iperf results. But the RWIN still doesn't seem to be auto scaling enough. It will go up to 10gbps if I set window size manually or connect multiple instances with -P but should I have to do that?

  23. In my experience it should not make a difference. are the results identical if you use windows as the server vs. linux as the server?

    Replies
    1. Yeah, the results are basically the same regardless of which is the server and which is the client. There's definitely something on the Win7 box that's keeping the TCP window from scaling on its own correctly. When I use iPerf between Linux clients the default window size appears to adjust to something in the 900K to 1M range when the client side connects to the server. With Win7 the default always maxes out at 64K and the speed would indicate that it's not scaling beyond that. The only way I can get it to go faster is to explicitly set the window size higher in iPerf or use multiple TCP streams. There's got to be a config on the Win7 side I'm missing that allows RWIN to auto scale above 64K.

    2. do you have a packet capture i can look at?
      IIRC win 7 has some tuning options for the tcp stack - are they at their default values?

    3. Here's a pcap with LSO on:
      https://drive.google.com/file/d/0B7vCQgqzBIZ7Q3poMmY0MUs5WnM/edit?usp=sharing
      And LSO off:
      https://drive.google.com/file/d/0B7vCQgqzBIZ7QjVKMkU3OXIxX28/edit?usp=sharing

      I've used TCPOptimizer to implement better values for Win7, did give about a 1Gbps increase in speed, but nowhere near 10Gbps total

    4. In your capture, 10.10.1.130 has TCP window scaling disabled, which means its receive window can only grow to 65K. I assume this is the Windows 7 machine? That would explain why RWIN doesn't scale beyond 65K, and why performance suffers.
      Windows 7 should have TCP window scaling enabled by default, so something must have disabled it on your machine. Take a look here: http://datacomguy.blogspot.com/2011/06/tweaking-windows-7-vista-tcpip-settings.html

      Try manually changing autotuninglevel to normal or experimental (netsh int tcp set global autotuninglevel=normal).
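
To put a number on why a window stuck at 65K caps throughput: a single TCP stream can have at most one receive window of data in flight per round trip, so the ceiling is roughly window / RTT. The sketch below is only illustrative. The function names are made up for this example, and the 0.25 ms round-trip time is an assumption, not a value measured from the capture above.

```python
def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on single-stream TCP throughput in bits per second.

    At most one receive window of data can be in flight per round trip.
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(target_bps, rtt_seconds):
    """Receive window (bytes) needed to sustain target_bps at a given RTT."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return target_bps * rtt_seconds / 8


if __name__ == "__main__":
    assumed_rtt = 0.00025  # ASSUMPTION: 0.25 ms VM-to-VM round trip, not measured

    # Without window scaling the advertised window cannot exceed 65535 bytes.
    capped = max_throughput_bps(65535, assumed_rtt)
    print(f"65535-byte window: {capped / 1e9:.2f} Gbps ceiling")

    # Window a single stream would need to fill a 10 Gbps link at that RTT.
    needed = window_needed_bytes(10e9, assumed_rtt)
    print(f"10 Gbps needs about {needed / 1024:.0f} KiB of window")
```

Under that assumed RTT the unscaled window tops out at about 2.1 Gbps, which is in the same range as the plateau reported in this thread. It also suggests why setting -w or adding streams with -P helps: both raise the total amount of data allowed in flight.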

  24. Nice write-up! This led me to the discovery that my Oracle Solaris 11.1 VM was running the older Solaris 10 vmxnet3 NIC drivers.

    But after replacing the drivers and rebooting, the max MTU is still only 1500.

    I noticed that the ping command you're using to test jumbo frames might be flawed. It's missing the parameter that sets the DF-bit on the echo-request packets.

    Under Oracle Solaris 11.1, the following ping command will test jumbo frames.
    ping -s -D 192.168.1.202 8972 4

    One more thing, would you mind posting the output of 'dladm show-linkprop -p mtu' from your Solaris 11.1 guest VM?
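
For anyone wondering where the 8972 in that ping command comes from: it is the largest ICMP echo payload that fits in a single 9000-byte frame once the IPv4 and ICMP headers are subtracted. A small sketch of the arithmetic follows. The helper name is made up for illustration, and it assumes an IPv4 header without options.

```python
IPV4_HEADER_BYTES = 20  # ASSUMPTION: IPv4 header with no options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError("MTU is smaller than the IPv4 + ICMP headers")
    return payload


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: max ping payload {max_icmp_payload(mtu)} bytes")
```

With the DF bit set, a payload one byte larger than this should fail on a path that really is limited to that MTU, which is what makes the test meaningful.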

  25. I have the same problem as Pooch: I can't seem to get past 2Gbps (default window size) VM-to-VM (both VMs are Win 7 and I have assigned 4 CPUs to each). One thing is for sure: the speed seems to be affected by how many CPUs each VM has (if I assign each VM 1 CPU, the speed drops to <100Mbps).

    Hi Cyber Explorer, would you mind showing us your host machine spec (do you have dual 10Gb Ethernet onboard or a PCIe adapter, CPU model, how much RAM, etc.) as well as your VM setup (do you use vmxnet3, Virtual Machine Version 8, 9 or 10, how many CPUs are assigned to each VM, memory, etc.)? In addition, do you use the vSphere "client" or the "web client" (with vCenter installed) to create those VMs?

    I am trying to follow your setup and see if I can get close to ~20Gbps. That would be extremely awesome!

    Here are some test results from someone who also gets around 2Gbps (default window size):
    http://forums.freenas.org/threads/esxi-5-5-network-performance-comparison-with-vmxnet-and-intel-em.15320/

  30. VMware has started fixing the bug in the Tools installation script, for example see http://kb.vmware.com/kb/2110233

  31. Finally, the LSO bug has been squashed in version 9.4.15 of vmware-tools.
