* nfsrdma fails to write big file,
@ 2010-02-22 18:41 Vu Pham
       [not found] ` <9FA59C95FFCBB34EA5E42C1A8573784F02662E58-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Vu Pham @ 2010-02-22 18:41 UTC (permalink / raw)
  To: Tom Tucker
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Setup: 
1. Linux nfsrdma client/server with OFED-1.5.1-20100217-0600, ConnectX2
QDR HCAs, fw 2.7.8-6, RHEL 5.2.
2. Solaris nfsrdma server snv 130, ConnectX QDR HCA.


Running vdbench on a 10G file or *dd if=/dev/zero of=10g_file bs=1M
count=10000*, the operation fails, the connection gets dropped, and the
client cannot re-establish the connection to the server.
After rebooting only the client, I can mount again.

It happens with both Solaris and Linux nfsrdma servers.

For the Linux client/server I run memreg=5 (FRMR); I don't see the
problem with memreg=6 (global DMA key).

On the Solaris server (snv 130), we see a problem decoding a 32K write
request. The client sends two read chunks (32K and 16 bytes); the server
fails to do the RDMA read on the 16-byte chunk (cqe.status = 10, i.e.
IB_WC_REM_ACCESS_ERR) and therefore terminates the connection. We don't
see this problem with NFS version 3 on Solaris. The Solaris server runs
the normal memory registration mode.

On the Linux client, I see cqe.status = 12, i.e. IB_WC_RETRY_EXC_ERR.

I added these notes in bug #1919 (bugs.openfabrics.org) to track the
issue.

thanks,
-vu


* Re: [ewg] nfsrdma fails to write big file,
       [not found] ` <9FA59C95FFCBB34EA5E42C1A8573784F02662E58-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
@ 2010-02-22 18:49   ` Tom Tucker
  2010-02-22 20:22     ` Vu Pham
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Tucker @ 2010-02-22 18:49 UTC (permalink / raw)
  To: Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Vu Pham wrote:
> Setup: 
> 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600, ConnectX2
> QDR HCAs fw 2.7.8-6, RHEL 5.2.
> 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
>
>
> Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
> count=10000*, operation fail, connection get drop, client cannot
> re-establish connection to server.
> After rebooting only the client, I can mount again.
>
> It happens with both solaris and linux nfsrdma servers.
>
> For linux client/server, I run memreg=5 (FRMR), I don't see problem with
> memreg=6 (global dma key)
>
>   

Awesome. This is the key I think.

Thanks for the info Vu,
Tom


> On Solaris server snv 130, we see problem decoding write request of 32K.
> The client send two read chunks (32K & 16-byte), the server fail to do
> rdma read on the 16-byte chunk (cqe.status = 10 ie.
> IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the connection. We
> don't see this problem on nfs version 3 on Solaris. Solaris server run
> normal memory registration mode.
>
> On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
>
> I added these notes in bug #1919 (bugs.openfabrics.org) to track the
> issue.
>
> thanks,
> -vu
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: [ewg] nfsrdma fails to write big file,
  2010-02-22 18:49   ` [ewg] " Tom Tucker
@ 2010-02-22 20:22     ` Vu Pham
  2010-02-24 18:56       ` Vu Pham
  0 siblings, 1 reply; 16+ messages in thread
From: Vu Pham @ 2010-02-22 20:22 UTC (permalink / raw)
  To: Tom Tucker
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Tom,

Some more info on the problem:
1. Running with memreg=4 (FMR), I cannot reproduce the problem.
2. I also see a different error on the client:

Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name 'nobody'
does not map into domain 'localdomain' 
Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
returned -12 cq_init 48 cq_count 32
Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
send WC status 5, vend_err F5
Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
13.20.1.9:20049 closed (-103)

-vu

> -----Original Message-----
> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
> Sent: Monday, February 22, 2010 10:49 AM
> To: Vu Pham
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> Subject: Re: [ewg] nfsrdma fails to write big file,
> 
> Vu Pham wrote:
> > Setup:
> > 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
> ConnectX2
> > QDR HCAs fw 2.7.8-6, RHEL 5.2.
> > 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
> >
> >
> > Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
> > count=10000*, operation fail, connection get drop, client cannot
> > re-establish connection to server.
> > After rebooting only the client, I can mount again.
> >
> > It happens with both solaris and linux nfsrdma servers.
> >
> > For linux client/server, I run memreg=5 (FRMR), I don't see problem
> with
> > memreg=6 (global dma key)
> >
> >
> 
> Awesome. This is the key I think.
> 
> Thanks for the info Vu,
> Tom
> 
> 
> > On Solaris server snv 130, we see problem decoding write request of
> 32K.
> > The client send two read chunks (32K & 16-byte), the server fail to
> do
> > rdma read on the 16-byte chunk (cqe.status = 10 ie.
> > IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
connection.
> We
> > don't see this problem on nfs version 3 on Solaris. Solaris server
> run
> > normal memory registration mode.
> >
> > On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
> >
> > I added these notes in bug #1919 (bugs.openfabrics.org) to track the
> > issue.
> >
> > thanks,
> > -vu
> > _______________________________________________
> > ewg mailing list
> > ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> >

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: [ewg] nfsrdma fails to write big file,
  2010-02-22 20:22     ` Vu Pham
@ 2010-02-24 18:56       ` Vu Pham
       [not found]         ` <9FA59C95FFCBB34EA5E42C1A8573784F02663166-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Vu Pham @ 2010-02-24 18:56 UTC (permalink / raw)
  To: Vu Pham, Tom Tucker
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Tom,

Did you make any changes to get bonnie++, dd of a 10G file, and vdbench
to run concurrently and finish?

I keep hitting the WQE overflow error below.
I saw that most of the requests have two chunks (a 32K chunk and a
chunk of a few bytes); each chunk requires an frmr register +
invalidate WR pair. However, you set ep->rep_attr.cap.max_send_wr =
cdata->max_requests and then, for the frmr case, you do
ep->rep_attr.cap.max_send_wr *= 3, which is not enough. Moreover, you
also set ep->rep_cqinit = max_send_wr/2 for the send completion signal,
which makes the WQE overflow happen sooner.
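
To spell out the accounting (a rough sketch of my own, not part of the
patch below; the helper name and the two-chunk assumption are only for
illustration):

/*
 * Sketch of the send-queue cost described above: under FRMR each chunk
 * costs one fast-register WR plus one local-invalidate WR, and the RPC
 * itself costs one SEND.
 */
static unsigned int send_wrs_per_request(unsigned int nchunks)
{
	return nchunks * 2 + 1;	/* two chunks -> 5 WRs, vs. the 3 provisioned */
}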

After applying the following patch, I have had vdbench, dd, and a copy of
the 10G file running overnight.

-vu


--- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c   2010-02-24
10:41:22.000000000 -0800
+++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c        2010-02-24
10:03:18.000000000 -0800
@@ -649,8 +654,15 @@
        ep->rep_attr.cap.max_send_wr = cdata->max_requests;
        switch (ia->ri_memreg_strategy) {
        case RPCRDMA_FRMR:
-               /* Add room for frmr register and invalidate WRs */
-               ep->rep_attr.cap.max_send_wr *= 3;
+               /* 
+                * Add room for frmr register and invalidate WRs
+                * Requests sometimes have two chunks, each chunk
+                * requires to have different frmr. The safest
+                * WRs required are max_send_wr * 6; however, we
+                * get send completions and poll fast enough, it
+                * is pretty safe to have max_send_wr * 4. 
+                */
+               ep->rep_attr.cap.max_send_wr *= 4;
                if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
                        return -EINVAL;
                break;
@@ -682,7 +694,8 @@
                ep->rep_attr.cap.max_recv_sge);

        /* set trigger for requesting send completion */
-       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
+       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
+       
        switch (ia->ri_memreg_strategy) {
        case RPCRDMA_MEMWINDOWS_ASYNC:
        case RPCRDMA_MEMWINDOWS:





> -----Original Message-----
> From: ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org [mailto:ewg-
> bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org] On Behalf Of Vu Pham
> Sent: Monday, February 22, 2010 12:23 PM
> To: Tom Tucker
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> Subject: Re: [ewg] nfsrdma fails to write big file,
> 
> Tom,
> 
> Some more info on the problem:
> 1. Running with memreg=4 (FMR) I can not reproduce the problem
> 2. I also see different error on client
> 
> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name
> 'nobody'
> does not map into domain 'localdomain'
> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
> returned -12 cq_init 48 cq_count 32
> Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
> send WC status 5, vend_err F5
> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
> 13.20.1.9:20049 closed (-103)
> 
> -vu
> 
> > -----Original Message-----
> > From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
> > Sent: Monday, February 22, 2010 10:49 AM
> > To: Vu Pham
> > Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
> > ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> > Subject: Re: [ewg] nfsrdma fails to write big file,
> >
> > Vu Pham wrote:
> > > Setup:
> > > 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
> > ConnectX2
> > > QDR HCAs fw 2.7.8-6, RHEL 5.2.
> > > 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
> > >
> > >
> > > Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
> > > count=10000*, operation fail, connection get drop, client cannot
> > > re-establish connection to server.
> > > After rebooting only the client, I can mount again.
> > >
> > > It happens with both solaris and linux nfsrdma servers.
> > >
> > > For linux client/server, I run memreg=5 (FRMR), I don't see
problem
> > with
> > > memreg=6 (global dma key)
> > >
> > >
> >
> > Awesome. This is the key I think.
> >
> > Thanks for the info Vu,
> > Tom
> >
> >
> > > On Solaris server snv 130, we see problem decoding write request
of
> > 32K.
> > > The client send two read chunks (32K & 16-byte), the server fail
to
> > do
> > > rdma read on the 16-byte chunk (cqe.status = 10 ie.
> > > IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
> connection.
> > We
> > > don't see this problem on nfs version 3 on Solaris. Solaris server
> > run
> > > normal memory registration mode.
> > >
> > > On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
> > >
> > > I added these notes in bug #1919 (bugs.openfabrics.org) to track
> the
> > > issue.
> > >
> > > thanks,
> > > -vu
> > > _______________________________________________
> > > ewg mailing list
> > > ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> > >
> 
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]         ` <9FA59C95FFCBB34EA5E42C1A8573784F02663166-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
@ 2010-02-24 19:06           ` Roland Dreier
       [not found]             ` <ada3a0q1mje.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  2010-02-24 22:07           ` Tom Tucker
  2010-02-24 22:48           ` Tom Tucker
  2 siblings, 1 reply; 16+ messages in thread
From: Roland Dreier @ 2010-02-24 19:06 UTC (permalink / raw)
  To: Vu Pham
  Cc: Tom Tucker, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Mahesh Siddheshwar, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

 > +               /* 
 > +                * Add room for frmr register and invalidate WRs
 > +                * Requests sometimes have two chunks, each chunk
 > +                * requires to have different frmr. The safest
 > +                * WRs required are max_send_wr * 6; however, we
 > +                * get send completions and poll fast enough, it
 > +                * is pretty safe to have max_send_wr * 4. 
 > +                */
 > +               ep->rep_attr.cap.max_send_wr *= 4;

Seems like a bad design if there is a possibility of work queue
overflow; if you're counting on events occurring in a particular order
or completions being handled "fast enough", then your design is going to
fail in some high load situations, which I don't think you want.

 - R.
-- 
Roland Dreier  <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]         ` <9FA59C95FFCBB34EA5E42C1A8573784F02663166-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
  2010-02-24 19:06           ` Roland Dreier
@ 2010-02-24 22:07           ` Tom Tucker
  2010-02-24 22:48           ` Tom Tucker
  2 siblings, 0 replies; 16+ messages in thread
From: Tom Tucker @ 2010-02-24 22:07 UTC (permalink / raw)
  To: Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Vu Pham wrote:
> Tom,
>
> Did you make any change to have bonnie++, dd of a 10G file and vdbench
> concurrently run & finish?
>
>   

No, I did not, but my disk subsystem is pretty slow, so it might be that I
just don't have fast enough storage.

> I keep hitting the WQE overflow error below.
> I saw that most of the requests have two chunks (32K chunk and
> some-bytes chunk), each chunk requires an frmr + invalidate wrs;
> However, you set ep->rep_attr.cap.max_send_wr = cdata->max_requests and
> then for frmr case you do
> ep->rep_atrr.cap.max_send_wr *=3; which is not enough. Moreover, you
> also set ep->rep_cqinit = max_send_wr/2 for send completion signal which
> causes the wqe overflow happened faster.
>
>   


> After applying the following patch, I have thing vdbench, dd, and copy
> 10g_file running overnight
>
> -vu
>
>
> --- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c   2010-02-24
> 10:41:22.000000000 -0800
> +++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c        2010-02-24
> 10:03:18.000000000 -0800
> @@ -649,8 +654,15 @@
>         ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>         switch (ia->ri_memreg_strategy) {
>         case RPCRDMA_FRMR:
> -               /* Add room for frmr register and invalidate WRs */
> -               ep->rep_attr.cap.max_send_wr *= 3;
> +               /* 
> +                * Add room for frmr register and invalidate WRs
> +                * Requests sometimes have two chunks, each chunk
> +                * requires to have different frmr. The safest
> +                * WRs required are max_send_wr * 6; however, we
> +                * get send completions and poll fast enough, it
> +                * is pretty safe to have max_send_wr * 4. 
> +                */
> +               ep->rep_attr.cap.max_send_wr *= 4;
>                 if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
>                         return -EINVAL;
>                 break;
> @@ -682,7 +694,8 @@
>                 ep->rep_attr.cap.max_recv_sge);
>
>         /* set trigger for requesting send completion */
> -       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
> +       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
> +       
>         switch (ia->ri_memreg_strategy) {
>         case RPCRDMA_MEMWINDOWS_ASYNC:
>         case RPCRDMA_MEMWINDOWS:
>
>
>   
Erf. This is client code. I'll take a look at this and see if I can 
understand what Talpey was up to.

Tom
>   


>
>
>   
>> -----Original Message-----
>> From: ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org [mailto:ewg-
>> bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org] On Behalf Of Vu Pham
>> Sent: Monday, February 22, 2010 12:23 PM
>> To: Tom Tucker
>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>
>> Tom,
>>
>> Some more info on the problem:
>> 1. Running with memreg=4 (FMR) I can not reproduce the problem
>> 2. I also see different error on client
>>
>> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name
>> 'nobody'
>> does not map into domain 'localdomain'
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
>> returned -12 cq_init 48 cq_count 32
>> Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
>> send WC status 5, vend_err F5
>> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
>> 13.20.1.9:20049 closed (-103)
>>
>> -vu
>>
>>     
>>> -----Original Message-----
>>> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>>> Sent: Monday, February 22, 2010 10:49 AM
>>> To: Vu Pham
>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>
>>> Vu Pham wrote:
>>>       
>>>> Setup:
>>>> 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
>>>>         
>>> ConnectX2
>>>       
>>>> QDR HCAs fw 2.7.8-6, RHEL 5.2.
>>>> 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
>>>>
>>>>
>>>> Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
>>>> count=10000*, operation fail, connection get drop, client cannot
>>>> re-establish connection to server.
>>>> After rebooting only the client, I can mount again.
>>>>
>>>> It happens with both solaris and linux nfsrdma servers.
>>>>
>>>> For linux client/server, I run memreg=5 (FRMR), I don't see
>>>>         
> problem
>   
>>> with
>>>       
>>>> memreg=6 (global dma key)
>>>>
>>>>
>>>>         
>>> Awesome. This is the key I think.
>>>
>>> Thanks for the info Vu,
>>> Tom
>>>
>>>
>>>       
>>>> On Solaris server snv 130, we see problem decoding write request
>>>>         
> of
>   
>>> 32K.
>>>       
>>>> The client send two read chunks (32K & 16-byte), the server fail
>>>>         
> to
>   
>>> do
>>>       
>>>> rdma read on the 16-byte chunk (cqe.status = 10 ie.
>>>> IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
>>>>         
>> connection.
>>     
>>> We
>>>       
>>>> don't see this problem on nfs version 3 on Solaris. Solaris server
>>>>         
>>> run
>>>       
>>>> normal memory registration mode.
>>>>
>>>> On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
>>>>
>>>> I added these notes in bug #1919 (bugs.openfabrics.org) to track
>>>>         
>> the
>>     
>>>> issue.
>>>>
>>>> thanks,
>>>> -vu
>>>> _______________________________________________
>>>> ewg mailing list
>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>
>>>>         
>> _______________________________________________
>> ewg mailing list
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>     
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]             ` <ada3a0q1mje.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
@ 2010-02-24 22:13               ` Tom Tucker
  2010-02-28  4:22               ` Tom Tucker
  1 sibling, 0 replies; 16+ messages in thread
From: Tom Tucker @ 2010-02-24 22:13 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Vu Pham, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Roland Dreier wrote:
>  > +               /* 
>  > +                * Add room for frmr register and invalidate WRs
>  > +                * Requests sometimes have two chunks, each chunk
>  > +                * requires to have different frmr. The safest
>  > +                * WRs required are max_send_wr * 6; however, we
>  > +                * get send completions and poll fast enough, it
>  > +                * is pretty safe to have max_send_wr * 4. 
>  > +                */
>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>
> Seems like a bad design if there is a possibility of work queue
> overflow; if you're counting on events occurring in a particular order
> or completions being handled "fast enough", then your design is going to
> fail in some high load situations, which I don't think you want.
>
>   

I agree. It's basically a time bomb. A bump in the workload and you'll
overflow the CQ.

Thanks for finding the bug, though, Vu.
>  - R.
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]         ` <9FA59C95FFCBB34EA5E42C1A8573784F02663166-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
  2010-02-24 19:06           ` Roland Dreier
  2010-02-24 22:07           ` Tom Tucker
@ 2010-02-24 22:48           ` Tom Tucker
       [not found]             ` <4B85ACD2.9040405-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  2 siblings, 1 reply; 16+ messages in thread
From: Tom Tucker @ 2010-02-24 22:48 UTC (permalink / raw)
  To: Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Vu,

Are you changing any of the default settings, for example rsize/wsize?
I'd like to reproduce this problem if I can.

Thanks,

Tom

Vu Pham wrote:
> Tom,
>
> Did you make any change to have bonnie++, dd of a 10G file and vdbench
> concurrently run & finish?
>
> I keep hitting the WQE overflow error below.
> I saw that most of the requests have two chunks (32K chunk and
> some-bytes chunk), each chunk requires an frmr + invalidate wrs;
> However, you set ep->rep_attr.cap.max_send_wr = cdata->max_requests and
> then for frmr case you do
> ep->rep_atrr.cap.max_send_wr *=3; which is not enough. Moreover, you
> also set ep->rep_cqinit = max_send_wr/2 for send completion signal which
> causes the wqe overflow happened faster.
>
> After applying the following patch, I have thing vdbench, dd, and copy
> 10g_file running overnight
>
> -vu
>
>
> --- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c   2010-02-24
> 10:41:22.000000000 -0800
> +++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c        2010-02-24
> 10:03:18.000000000 -0800
> @@ -649,8 +654,15 @@
>         ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>         switch (ia->ri_memreg_strategy) {
>         case RPCRDMA_FRMR:
> -               /* Add room for frmr register and invalidate WRs */
> -               ep->rep_attr.cap.max_send_wr *= 3;
> +               /* 
> +                * Add room for frmr register and invalidate WRs
> +                * Requests sometimes have two chunks, each chunk
> +                * requires to have different frmr. The safest
> +                * WRs required are max_send_wr * 6; however, we
> +                * get send completions and poll fast enough, it
> +                * is pretty safe to have max_send_wr * 4. 
> +                */
> +               ep->rep_attr.cap.max_send_wr *= 4;
>                 if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
>                         return -EINVAL;
>                 break;
> @@ -682,7 +694,8 @@
>                 ep->rep_attr.cap.max_recv_sge);
>
>         /* set trigger for requesting send completion */
> -       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
> +       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
> +       
>         switch (ia->ri_memreg_strategy) {
>         case RPCRDMA_MEMWINDOWS_ASYNC:
>         case RPCRDMA_MEMWINDOWS:
>
>
>
>
>
>   
>> -----Original Message-----
>> From: ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org [mailto:ewg-
>> bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org] On Behalf Of Vu Pham
>> Sent: Monday, February 22, 2010 12:23 PM
>> To: Tom Tucker
>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>
>> Tom,
>>
>> Some more info on the problem:
>> 1. Running with memreg=4 (FMR) I can not reproduce the problem
>> 2. I also see different error on client
>>
>> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name
>> 'nobody'
>> does not map into domain 'localdomain'
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
>> returned -12 cq_init 48 cq_count 32
>> Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
>> send WC status 5, vend_err F5
>> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
>> 13.20.1.9:20049 closed (-103)
>>
>> -vu
>>
>>     
>>> -----Original Message-----
>>> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>>> Sent: Monday, February 22, 2010 10:49 AM
>>> To: Vu Pham
>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>
>>> Vu Pham wrote:
>>>       
>>>> Setup:
>>>> 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
>>>>         
>>> ConnectX2
>>>       
>>>> QDR HCAs fw 2.7.8-6, RHEL 5.2.
>>>> 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
>>>>
>>>>
>>>> Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
>>>> count=10000*, operation fail, connection get drop, client cannot
>>>> re-establish connection to server.
>>>> After rebooting only the client, I can mount again.
>>>>
>>>> It happens with both solaris and linux nfsrdma servers.
>>>>
>>>> For linux client/server, I run memreg=5 (FRMR), I don't see
>>>>         
> problem
>   
>>> with
>>>       
>>>> memreg=6 (global dma key)
>>>>
>>>>
>>>>         
>>> Awesome. This is the key I think.
>>>
>>> Thanks for the info Vu,
>>> Tom
>>>
>>>
>>>       
>>>> On Solaris server snv 130, we see problem decoding write request
>>>>         
> of
>   
>>> 32K.
>>>       
>>>> The client send two read chunks (32K & 16-byte), the server fail
>>>>         
> to
>   
>>> do
>>>       
>>>> rdma read on the 16-byte chunk (cqe.status = 10 ie.
>>>> IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
>>>>         
>> connection.
>>     
>>> We
>>>       
>>>> don't see this problem on nfs version 3 on Solaris. Solaris server
>>>>         
>>> run
>>>       
>>>> normal memory registration mode.
>>>>
>>>> On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
>>>>
>>>> I added these notes in bug #1919 (bugs.openfabrics.org) to track
>>>>         
>> the
>>     
>>>> issue.
>>>>
>>>> thanks,
>>>> -vu
>>>> _______________________________________________
>>>> ewg mailing list
>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>
>>>>         
>> _______________________________________________
>> ewg mailing list
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>     
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]             ` <4B85ACD2.9040405-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-25  0:02               ` Tom Tucker
       [not found]                 ` <4B85BDF9.8020009-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Tucker @ 2010-02-25  0:02 UTC (permalink / raw)
  To: Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Vu,

Based on the mapping code, it looks to me like the worst case is
RPCRDMA_MAX_SEGS * 2 + 1 as the multiplier.
However, I think in practice, due to the way the iovs are built, the
actual max is 5 (an frmr for the head and one for the page list, plus
invalidates for those, plus one for the send itself). Why did you think
the max was 6?
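
Spelled out as a tally (my sketch, matching the reasoning above rather
than any code in the tree):

/*
 * Practical per-request worst case under FRMR, per the breakdown above:
 * fast-register the head iov, fast-register the page list, invalidate
 * each of those when the reply comes back, plus the SEND carrying the
 * RPC call itself.
 */
enum {
	FRMR_SEND_WRS_PER_REQ = 2 /* FAST_REG */ + 2 /* LOCAL_INV */ + 1 /* SEND */	/* = 5 */
};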

Thanks,
Tom

Tom Tucker wrote:
> Vu,
>
> Are you changing any of the default settings? For example rsize/wsize, 
> etc... I'd like to reproduce this problem if I can.
>
> Thanks,
>
> Tom
>
> Vu Pham wrote:
>   
>> Tom,
>>
>> Did you make any change to have bonnie++, dd of a 10G file and vdbench
>> concurrently run & finish?
>>
>> I keep hitting the WQE overflow error below.
>> I saw that most of the requests have two chunks (32K chunk and
>> some-bytes chunk), each chunk requires an frmr + invalidate wrs;
>> However, you set ep->rep_attr.cap.max_send_wr = cdata->max_requests and
>> then for frmr case you do
>> ep->rep_atrr.cap.max_send_wr *=3; which is not enough. Moreover, you
>> also set ep->rep_cqinit = max_send_wr/2 for send completion signal which
>> causes the wqe overflow happened faster.
>>
>> After applying the following patch, I have thing vdbench, dd, and copy
>> 10g_file running overnight
>>
>> -vu
>>
>>
>> --- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c   2010-02-24
>> 10:41:22.000000000 -0800
>> +++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c        2010-02-24
>> 10:03:18.000000000 -0800
>> @@ -649,8 +654,15 @@
>>         ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>         switch (ia->ri_memreg_strategy) {
>>         case RPCRDMA_FRMR:
>> -               /* Add room for frmr register and invalidate WRs */
>> -               ep->rep_attr.cap.max_send_wr *= 3;
>> +               /* 
>> +                * Add room for frmr register and invalidate WRs
>> +                * Requests sometimes have two chunks, each chunk
>> +                * requires to have different frmr. The safest
>> +                * WRs required are max_send_wr * 6; however, we
>> +                * get send completions and poll fast enough, it
>> +                * is pretty safe to have max_send_wr * 4. 
>> +                */
>> +               ep->rep_attr.cap.max_send_wr *= 4;
>>                 if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
>>                         return -EINVAL;
>>                 break;
>> @@ -682,7 +694,8 @@
>>                 ep->rep_attr.cap.max_recv_sge);
>>
>>         /* set trigger for requesting send completion */
>> -       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
>> +       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
>> +       
>>         switch (ia->ri_memreg_strategy) {
>>         case RPCRDMA_MEMWINDOWS_ASYNC:
>>         case RPCRDMA_MEMWINDOWS:
>>
>>
>>
>>
>>
>>   
>>     
>>> -----Original Message-----
>>> From: ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org [mailto:ewg-
>>> bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org] On Behalf Of Vu Pham
>>> Sent: Monday, February 22, 2010 12:23 PM
>>> To: Tom Tucker
>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>
>>> Tom,
>>>
>>> Some more info on the problem:
>>> 1. Running with memreg=4 (FMR) I can not reproduce the problem
>>> 2. I also see different error on client
>>>
>>> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name
>>> 'nobody'
>>> does not map into domain 'localdomain'
>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
>>> returned -12 cq_init 48 cq_count 32
>>> Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
>>> send WC status 5, vend_err F5
>>> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
>>> 13.20.1.9:20049 closed (-103)
>>>
>>> -vu
>>>
>>>     
>>>       
>>>> -----Original Message-----
>>>> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>>>> Sent: Monday, February 22, 2010 10:49 AM
>>>> To: Vu Pham
>>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>>
>>>> Vu Pham wrote:
>>>>       
>>>>         
>>>>> Setup:
>>>>> 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
>>>>>         
>>>>>           
>>>> ConnectX2
>>>>       
>>>>         
>>>>> QDR HCAs fw 2.7.8-6, RHEL 5.2.
>>>>> 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
>>>>>
>>>>>
>>>>> Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
>>>>> count=10000*, operation fail, connection get drop, client cannot
>>>>> re-establish connection to server.
>>>>> After rebooting only the client, I can mount again.
>>>>>
>>>>> It happens with both solaris and linux nfsrdma servers.
>>>>>
>>>>> For linux client/server, I run memreg=5 (FRMR), I don't see
>>>>>         
>>>>>           
>> problem
>>   
>>     
>>>> with
>>>>       
>>>>         
>>>>> memreg=6 (global dma key)
>>>>>
>>>>>
>>>>>         
>>>>>           
>>>> Awesome. This is the key I think.
>>>>
>>>> Thanks for the info Vu,
>>>> Tom
>>>>
>>>>
>>>>       
>>>>         
>>>>> On Solaris server snv 130, we see problem decoding write request
>>>>>         
>>>>>           
>> of
>>   
>>     
>>>> 32K.
>>>>       
>>>>         
>>>>> The client send two read chunks (32K & 16-byte), the server fail
>>>>>         
>>>>>           
>> to
>>   
>>     
>>>> do
>>>>       
>>>>         
>>>>> rdma read on the 16-byte chunk (cqe.status = 10 ie.
>>>>> IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
>>>>>         
>>>>>           
>>> connection.
>>>     
>>>       
>>>> We
>>>>       
>>>>         
>>>>> don't see this problem on nfs version 3 on Solaris. Solaris server
>>>>>         
>>>>>           
>>>> run
>>>>       
>>>>         
>>>>> normal memory registration mode.
>>>>>
>>>>> On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
>>>>>
>>>>> I added these notes in bug #1919 (bugs.openfabrics.org) to track
>>>>>         
>>>>>           
>>> the
>>>     
>>>       
>>>>> issue.
>>>>>
>>>>> thanks,
>>>>> -vu
>>>>> _______________________________________________
>>>>> ewg mailing list
>>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>>
>>>>>         
>>>>>           
>>> _______________________________________________
>>> ewg mailing list
>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>     
>>>       
>> _______________________________________________
>> ewg mailing list
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>   
>>     
>
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]                 ` <4B85BDF9.8020009-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-25  0:51                   ` Tom Tucker
  0 siblings, 0 replies; 16+ messages in thread
From: Tom Tucker @ 2010-02-25  0:51 UTC (permalink / raw)
  To: Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, Roland Dreier

Vu,

I ran the number of slots down to 8 (echo 8 > rdma_slot_table_entries)
and I can now reproduce the issue. I'm going to try setting the
allocation multiplier to 5 and see if I can prove to myself and Roland
that we've accurately computed the correct factor.

I think a better solution overall might be a different credit system;
however, that's a much more substantial change than we can tackle at
this point.

Tom


Tom Tucker wrote:
> Vu,
>
> Based on the mapping code, it looks to me like the worst case is 
> RPCRDMA_MAX_SEGS * 2 + 1 as the multiplier. 
> However, I think in practice, due to the way that iov are built, the 
> actual max is 5 (frmr for head + pagelist plus invalidates for same plus 
> one for the send itself). Why did you think the max was 6?
>
> Thanks,
> Tom
>
> Tom Tucker wrote:
>   
>> Vu,
>>
>> Are you changing any of the default settings? For example rsize/wsize, 
>> etc... I'd like to reproduce this problem if I can.
>>
>> Thanks,
>>
>> Tom
>>
>> Vu Pham wrote:
>>   
>>     
>>> Tom,
>>>
>>> Did you make any change to have bonnie++, dd of a 10G file and vdbench
>>> concurrently run & finish?
>>>
>>> I keep hitting the WQE overflow error below.
>>> I saw that most of the requests have two chunks (32K chunk and
>>> some-bytes chunk), each chunk requires an frmr + invalidate wrs;
>>> However, you set ep->rep_attr.cap.max_send_wr = cdata->max_requests and
>>> then for frmr case you do
>>> ep->rep_atrr.cap.max_send_wr *=3; which is not enough. Moreover, you
>>> also set ep->rep_cqinit = max_send_wr/2 for send completion signal which
>>> causes the wqe overflow happened faster.
>>>
>>> After applying the following patch, I have thing vdbench, dd, and copy
>>> 10g_file running overnight
>>>
>>> -vu
>>>
>>>
>>> --- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c   2010-02-24
>>> 10:41:22.000000000 -0800
>>> +++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c        2010-02-24
>>> 10:03:18.000000000 -0800
>>> @@ -649,8 +654,15 @@
>>>         ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>>         switch (ia->ri_memreg_strategy) {
>>>         case RPCRDMA_FRMR:
>>> -               /* Add room for frmr register and invalidate WRs */
>>> -               ep->rep_attr.cap.max_send_wr *= 3;
>>> +               /* 
>>> +                * Add room for frmr register and invalidate WRs
>>> +                * Requests sometimes have two chunks, each chunk
>>> +                * requires to have different frmr. The safest
>>> +                * WRs required are max_send_wr * 6; however, we
>>> +                * get send completions and poll fast enough, it
>>> +                * is pretty safe to have max_send_wr * 4. 
>>> +                */
>>> +               ep->rep_attr.cap.max_send_wr *= 4;
>>>                 if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
>>>                         return -EINVAL;
>>>                 break;
>>> @@ -682,7 +694,8 @@
>>>                 ep->rep_attr.cap.max_recv_sge);
>>>
>>>         /* set trigger for requesting send completion */
>>> -       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
>>> +       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
>>> +       
>>>         switch (ia->ri_memreg_strategy) {
>>>         case RPCRDMA_MEMWINDOWS_ASYNC:
>>>         case RPCRDMA_MEMWINDOWS:
>>>
>>>
>>>
>>>
>>>
>>>   
>>>     
>>>       
>>>> -----Original Message-----
>>>> From: ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org [mailto:ewg-
>>>> bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org] On Behalf Of Vu Pham
>>>> Sent: Monday, February 22, 2010 12:23 PM
>>>> To: Tom Tucker
>>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>>
>>>> Tom,
>>>>
>>>> Some more info on the problem:
>>>> 1. Running with memreg=4 (FMR) I can not reproduce the problem
>>>> 2. I also see different error on client
>>>>
>>>> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name
>>>> 'nobody'
>>>> does not map into domain 'localdomain'
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
>>>> returned -12 cq_init 48 cq_count 32
>>>> Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process:
>>>> send WC status 5, vend_err F5
>>>> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
>>>> 13.20.1.9:20049 closed (-103)
>>>>
>>>> -vu
>>>>
>>>>     
>>>>       
>>>>         
>>>>> -----Original Message-----
>>>>> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>>>>> Sent: Monday, February 22, 2010 10:49 AM
>>>>> To: Vu Pham
>>>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>>>
>>>>> Vu Pham wrote:
>>>>>       
>>>>>         
>>>>>           
>>>>>> Setup:
>>>>>> 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> ConnectX2
>>>>>       
>>>>>         
>>>>>           
>>>>>> QDR HCAs fw 2.7.8-6, RHEL 5.2.
>>>>>> 2. Solaris nfsrdma server svn 130, ConnectX QDR HCA.
>>>>>>
>>>>>>
>>>>>> Running vdbench on 10g file or *dd if=/dev/zero of=10g_file bs=1M
>>>>>> count=10000*, operation fail, connection get drop, client cannot
>>>>>> re-establish connection to server.
>>>>>> After rebooting only the client, I can mount again.
>>>>>>
>>>>>> It happens with both solaris and linux nfsrdma servers.
>>>>>>
>>>>>> For linux client/server, I run memreg=5 (FRMR), I don't see
>>>>>>         
>>>>>>           
>>>>>>             
>>> problem
>>>   
>>>     
>>>       
>>>>> with
>>>>>       
>>>>>         
>>>>>           
>>>>>> memreg=6 (global dma key)
>>>>>>
>>>>>>
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> Awesome. This is the key I think.
>>>>>
>>>>> Thanks for the info Vu,
>>>>> Tom
>>>>>
>>>>>
>>>>>       
>>>>>         
>>>>>           
>>>>>> On Solaris server snv 130, we see problem decoding write request
>>>>>>         
>>>>>>           
>>>>>>             
>>> of
>>>   
>>>     
>>>       
>>>>> 32K.
>>>>>       
>>>>>         
>>>>>           
>>>>>> The client send two read chunks (32K & 16-byte), the server fail
>>>>>>         
>>>>>>           
>>>>>>             
>>> to
>>>   
>>>     
>>>       
>>>>> do
>>>>>       
>>>>>         
>>>>>           
>>>>>> rdma read on the 16-byte chunk (cqe.status = 10 ie.
>>>>>> IB_WC_REM_ACCCESS_ERROR); therefore, server terminate the
>>>>>>         
>>>>>>           
>>>>>>             
>>>> connection.
>>>>     
>>>>       
>>>>         
>>>>> We
>>>>>       
>>>>>         
>>>>>           
>>>>>> don't see this problem on nfs version 3 on Solaris. Solaris server
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> run
>>>>>       
>>>>>         
>>>>>           
>>>>>> normal memory registration mode.
>>>>>>
>>>>>> On linux client, I see cqe.status = 12 ie. IB_WC_RETRY_EXC_ERR
>>>>>>
>>>>>> I added these notes in bug #1919 (bugs.openfabrics.org) to track
>>>>>>         
>>>>>>           
>>>>>>             
>>>> the
>>>>     
>>>>       
>>>>         
>>>>>> issue.
>>>>>>
>>>>>> thanks,
>>>>>> -vu
>>>>>> _______________________________________________
>>>>>> ewg mailing list
>>>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>>>
>>>>>>         
>>>>>>           
>>>>>>             
>>>> _______________________________________________
>>>> ewg mailing list
>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>     
>>>>       
>>>>         
>>> _______________________________________________
>>> ewg mailing list
>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>   
>>>     
>>>       
>> _______________________________________________
>> ewg mailing list
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>   
>>     
>
> _______________________________________________
> ewg mailing list
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]             ` <ada3a0q1mje.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  2010-02-24 22:13               ` Tom Tucker
@ 2010-02-28  4:22               ` Tom Tucker
  2010-03-02  0:19                 ` Vu Pham
       [not found]                 ` <4B89EF88.1030903-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  1 sibling, 2 replies; 16+ messages in thread
From: Tom Tucker @ 2010-02-28  4:22 UTC (permalink / raw)
  To: Vu Pham
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Mahesh Siddheshwar, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Roland Dreier wrote:
>  > +               /* 
>  > +                * Add room for frmr register and invalidate WRs
>  > +                * Requests sometimes have two chunks, each chunk
>  > +                * requires to have different frmr. The safest
>  > +                * WRs required are max_send_wr * 6; however, we
>  > +                * get send completions and poll fast enough, it
>  > +                * is pretty safe to have max_send_wr * 4. 
>  > +                */
>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>
> Seems like a bad design if there is a possibility of work queue
> overflow; if you're counting on events occurring in a particular order
> or completions being handled "fast enough", then your design is going to
> fail in some high load situations, which I don't think you want.
>
>   

Vu,

Would you please try the following:

- Set the multiplier to 5 (see the sketch below)
- Set the number of buffer credits small, as follows: "echo 4 >
  /proc/sys/sunrpc/rdma_slot_table_entries"
- Rerun your test and see if you can reproduce the problem.

I did the above and was unable to reproduce it, but I would like to see if
you can, so we can convince ourselves that 5 is the right number.
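
For reference, the multiplier change would look roughly like this against
the OFED-1.5.1 verbs.c hunk quoted earlier in the thread (an untested
sketch; the comment wording is mine):

-               ep->rep_attr.cap.max_send_wr *= 3;
+               /* fast-register WRs for the head and the page list,
+                * their invalidates, plus the SEND itself: 5 WRs per
+                * request worst case */
+               ep->rep_attr.cap.max_send_wr *= 5;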

Thanks,
Tom

>  - R.
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: [ewg] nfsrdma fails to write big file,
  2010-02-28  4:22               ` Tom Tucker
@ 2010-03-02  0:19                 ` Vu Pham
       [not found]                   ` <9FA59C95FFCBB34EA5E42C1A8573784F02663602-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
       [not found]                 ` <4B89EF88.1030903-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  1 sibling, 1 reply; 16+ messages in thread
From: Vu Pham @ 2010-03-02  0:19 UTC (permalink / raw)
  To: Tom Tucker
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Mahesh Siddheshwar, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5



> -----Original Message-----
> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
> Sent: Saturday, February 27, 2010 8:23 PM
> To: Vu Pham
> Cc: Roland Dreier; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> Subject: Re: [ewg] nfsrdma fails to write big file,
> 
> Roland Dreier wrote:
> >  > +               /*
> >  > +                * Add room for frmr register and invalidate WRs
> >  > +                * Requests sometimes have two chunks, each chunk
> >  > +                * requires to have different frmr. The safest
> >  > +                * WRs required are max_send_wr * 6; however, we
> >  > +                * get send completions and poll fast enough, it
> >  > +                * is pretty safe to have max_send_wr * 4.
> >  > +                */
> >  > +               ep->rep_attr.cap.max_send_wr *= 4;
> >
> > Seems like a bad design if there is a possibility of work queue
> > overflow; if you're counting on events occurring in a particular
> order
> > or completions being handled "fast enough", then your design is
going
> to
> > fail in some high load situations, which I don't think you want.
> >
> >
> 
> Vu,
> 
> Would you please try the following:
> 
> - Set the multiplier to 5
> - Set the number of buffer credits small as follows "echo 4 >
> /proc/sys/sunrpc/rdma_slot_table_entries"
> - Rerun your test and see if you can reproduce the problem?
> 
> I did the above and was unable to reproduce, but I would like to see
if
> you can to convince ourselves that 5 is the right number.
> 
> 

Tom,

I did the above and cannot reproduce it either.

I think 5 is the right number; however, we should optimize it later.

-vu
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [ewg] nfsrdma fails to write big file,
       [not found]                   ` <9FA59C95FFCBB34EA5E42C1A8573784F02663602-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
@ 2010-03-02  3:17                     ` Tom Tucker
  0 siblings, 0 replies; 16+ messages in thread
From: Tom Tucker @ 2010-03-02  3:17 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Vu Pham, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mahesh Siddheshwar,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Roland:

I'll put together a patch based on 5, with a comment that explains why I
think 5 is the right number. Since Vu has verified this behaviorally as
well, I'm comfortable that our understanding of the code is sound. I'm on
the road right now, so it won't be until tomorrow, though.

Thanks,
Tom


Vu Pham wrote:
>   
>> -----Original Message-----
>> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>> Sent: Saturday, February 27, 2010 8:23 PM
>> To: Vu Pham
>> Cc: Roland Dreier; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>
>> Roland Dreier wrote:
>>     
>>>  > +               /*
>>>  > +                * Add room for frmr register and invalidate WRs
>>>  > +                * Requests sometimes have two chunks, each chunk
>>>  > +                * requires to have different frmr. The safest
>>>  > +                * WRs required are max_send_wr * 6; however, we
>>>  > +                * get send completions and poll fast enough, it
>>>  > +                * is pretty safe to have max_send_wr * 4.
>>>  > +                */
>>>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>>>
>>> Seems like a bad design if there is a possibility of work queue
>>> overflow; if you're counting on events occurring in a particular
>>>       
>> order
>>     
>>> or completions being handled "fast enough", then your design is
>>>       
> going
>   
>> to
>>     
>>> fail in some high load situations, which I don't think you want.
>>>
>>>
>>>       
>> Vu,
>>
>> Would you please try the following:
>>
>> - Set the multiplier to 5
>> - Set the number of buffer credits small as follows "echo 4 >
>> /proc/sys/sunrpc/rdma_slot_table_entries"
>> - Rerun your test and see if you can reproduce the problem?
>>
>> I did the above and was unable to reproduce, but I would like to see
>>     
> if
>   
>> you can to convince ourselves that 5 is the right number.
>>
>>
>>     
>
> Tom,
>
> I did the above and can not reproduce either.
>
> I think 5 is the right number; however, we should optimize it later.
>
> -vu
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: nfsrdma fails to write big file,
       [not found]                 ` <4B89EF88.1030903-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-03-03 20:26                   ` Mahesh Siddheshwar
       [not found]                     ` <4B8EC600.9050101-xsfywfwIY+M@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Mahesh Siddheshwar @ 2010-03-03 20:26 UTC (permalink / raw)
  To: Tom Tucker, Vu Pham
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier,
	ewg-G2znmakfqn7U1rindQTSdQ

Hi Tom, Vu,

Tom Tucker wrote:
> Roland Dreier wrote:
>>  > +               /* 
>>  > +                * Add room for frmr register and invalidate WRs
>>  > +                * Requests sometimes have two chunks, each chunk
>>  > +                * requires to have different frmr. The safest
>>  > +                * WRs required are max_send_wr * 6; however, we
>>  > +                * get send completions and poll fast enough, it
>>  > +                * is pretty safe to have max_send_wr * 4.  > 
>> +                */
>>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>>
>> Seems like a bad design if there is a possibility of work queue
>> overflow; if you're counting on events occurring in a particular order
>> or completions being handled "fast enough", then your design is going to
>> fail in some high load situations, which I don't think you want.   
>
> Vu,
>
> Would you please try the following:
>
> - Set the multiplier to 5
While trying to test this between a Linux client and Solaris server,
I made the following changes in:
/usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c

diff verbs.c.org verbs.c
653c653
<               ep->rep_attr.cap.max_send_wr *= 3;
---
 >               ep->rep_attr.cap.max_send_wr *= 8;
685c685
<       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
---
 >       ep->rep_cqinit = ep->rep_attr.cap.max

(I bumped it to 8)

did make install. 
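
For context on why rep_cqinit is derived from max_send_wr: as I read the
1.5-era verbs.c, rep_cqinit is the threshold at which the transport asks
for a signalled send completion, so it has to scale with the send queue
depth. Roughly (macro and field names from memory, shown only for
illustration, not verbatim):

#define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit)
#define DECR_CQCOUNT(ep) atomic_sub_return(1, &(ep)->rep_cqcount)

	/* in rpcrdma_ep_post(): most SENDs go out unsignalled; every
	 * rep_cqinit-th one is signalled so the send queue can drain */
	if (DECR_CQCOUNT(ep) > 0)
		send_wr.send_flags = 0;
	else {
		INIT_CQCOUNT(ep);
		send_wr.send_flags = IB_SEND_SIGNALED;
	}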

On reboot I see the errors on NFS READs as opposed to WRITEs
as seen before, when I try to read a 10G file from the server.

The client is running: RHEL 5.3 (2.6.18-128.el5PAE) with
OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
The server is running Solaris based on snv_128.

rpcdebug output from the client:

==
RPC:    85 call_bind (status 0)
RPC:    85 call_connect xprt ec78d800 is connected
RPC:    85 call_transmit (status 0)
RPC:    85 xprt_prepare_transmit
RPC:    85 xprt_cwnd_limited cong = 0 cwnd = 8192
RPC:    85 rpc_xdr_encode (status 0)
RPC:    85 marshaling UNIX cred eddb4dc0
RPC:    85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
RPC:    85 xprt_transmit(164)
RPC:       rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 hdrlen 164
RPC:       rpcrdma_register_frmr_external: Using frmr ec7da920 to map 4 
segments
RPC:       rpcrdma_create_chunks: write chunk elem 
16384@0x38536d000:0xa601 (more)
RPC:       rpcrdma_register_frmr_external: Using frmr ec7da960 to map 1 
segments
RPC:       rpcrdma_create_chunks: write chunk elem 108@0x31dd153c:0xaa01 
(last)
RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 padlen 
0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
RPC:    85 xmit complete
RPC:    85 sleep_on(queue "xprt_pending" time 4683109)
RPC:    85 added to queue ec78d994 "xprt_pending"
RPC:    85 setting alarm for 60000 ms
RPC:       wake_up_next(ec78d944 "xprt_resend")
RPC:       wake_up_next(ec78d8f4 "xprt_sending")
RPC:       rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep 
ec78db40
RPC:    85 __rpc_wake_up_task (now 4683110)
RPC:    85 disabling timer
RPC:    85 removed from queue ec78d994 "xprt_pending"
RPC:       __rpc_wake_up_task done
RPC:    85 __rpc_execute flags=0x1
RPC:    85 call_status (status -107)
RPC:    85 call_bind (status 0)
RPC:    85 call_connect xprt ec78d800 is not connected
RPC:    85 xprt_connect xprt ec78d800 is not connected
RPC:    85 sleep_on(queue "xprt_pending" time 4683110)
RPC:    85 added to queue ec78d994 "xprt_pending"
RPC:    85 setting alarm for 60000 ms
RPC:       rpcrdma_event_process: event rep ec116800 status 5 opcode 80 
length 2493606
RPC:       rpcrdma_event_process: recv WC status 5, connection lost
RPC:       rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 
0xec78db40 event 0xa)
RPC:       rpcrdma_conn_upcall: disconnected
rpcrdma: connection to ec78dbccI4:20049 closed (-103)
RPC:       xprt_rdma_connect_worker: reconnect
==

On the server I see:

Mar  3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
hermon0: Device Error: CQE remote access error
Mar  3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
bad sendreply
Mar  3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
hermon0: Device Error: CQE remote access error
Mar  3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
bad sendreply

The remote access error is actually seen on RDMA_WRITE.
Doing some more debugging on the server with DTrace, I see that
the destination address and length match the write chunk
element in the Linux debug output above.


  0   9385                  rib_write:entry daddr 38536d000, len 4000, 
hdl a601
  0   9358         rib_init_sendwait:return ffffff44a715d308
  1   9296       rib_svc_scq_handler:return 1f7
  1   9356              rib_sendwait:return 14
  1   9386                 rib_write:return 14

^^^ that is RDMA_FAILED in 

  1  63295    xdrrdma_send_read_data:return 0
  1   5969              xdr_READ3res:return
  1   5969              xdr_READ3res:return 0
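
To tie the two traces together: for an NFS READ over RPC/RDMA the server
RDMA-WRITEs the reply data into the write chunks the client advertised,
and each chunk element in the client trace is printed as
length@offset:handle. The on-the-wire segment, as I understand it from
the Linux xprtrdma headers (shown here only for reference), looks
roughly like:

	/* RPC/RDMA chunk segment (cf. xprt_rdma.h); the rpcdebug line
	 * "write chunk elem 16384@0x38536d000:0xa601" decodes to
	 * rs_length = 16384, rs_offset = 0x38536d000, rs_handle = 0xa601 */
	struct xdr_rdma_segment {
		__be32 rs_handle;	/* rkey the server must use for its RDMA_WRITE */
		__be32 rs_length;	/* length of the chunk in bytes */
		__be64 rs_offset;	/* remote (client) virtual address */
	};

So the failing rib_write above (daddr 0x38536d000, len 0x4000 = 16384
bytes, hdl a601) targets exactly the first write chunk element the client
sent, yet the write is rejected with a remote access error.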

Is this a variation of the previously discussed issue or something new?

Thanks,
Mahesh

> - Set the number of buffer credits small as follows "echo 4 > 
> /proc/sys/sunrpc/rdma_slot_table_entries"
> - Rerun your test and see if you can reproduce the problem?
>
> I did the above and was unable to reproduce, but I would like to see 
> if you can to convince ourselves that 5 is the right number.
>
> Thanks,
> Tom
>
>>  - R.
>>   
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [ewg] nfsrdma fails to write big file,
       [not found]                     ` <4B8EC600.9050101-xsfywfwIY+M@public.gmane.org>
@ 2010-03-03 22:52                       ` Tom Tucker
       [not found]                         ` <4B8EE813.2010205-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Tucker @ 2010-03-03 22:52 UTC (permalink / raw)
  To: Mahesh Siddheshwar
  Cc: Vu Pham, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-G2znmakfqn7U1rindQTSdQ

Mahesh Siddheshwar wrote:
> Hi Tom, Vu,
>
> Tom Tucker wrote:
>> Roland Dreier wrote:
>>>  > +               /*
>>>  > +                * Add room for frmr register and invalidate WRs
>>>  > +                * Requests sometimes have two chunks, each chunk
>>>  > +                * requires to have different frmr. The safest
>>>  > +                * WRs required are max_send_wr * 6; however, we
>>>  > +                * get send completions and poll fast enough, it
>>>  > +                * is pretty safe to have max_send_wr * 4.
>>>  > +                */
>>>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>>>
>>> Seems like a bad design if there is a possibility of work queue
>>> overflow; if you're counting on events occurring in a particular order
>>> or completions being handled "fast enough", then your design is 
>>> going to
>>> fail in some high load situations, which I don't think you want.   
>>
>> Vu,
>>
>> Would you please try the following:
>>
>> - Set the multiplier to 5
> While trying to test this between a Linux client and Solaris server,
> I made the following changes in:
> /usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c
>
> diff verbs.c.org verbs.c
> 653c653
> <               ep->rep_attr.cap.max_send_wr *= 3;
> ---
> >               ep->rep_attr.cap.max_send_wr *= 8;
> 685c685
> <       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
> ---
> >       ep->rep_cqinit = ep->rep_attr.cap.max
>
> (I bumped it to 8)
>
> did make install.
> On reboot I see the errors on NFS READs as opposed to WRITEs
> as seen before, when I try to read a 10G file from the server.
>
> The client is running: RHEL 5.3 (2.6.18-128.el5PAE) with
> OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
> HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
> The server is running Solaris based on snv_128.
>
> rpcdebug output from the client:
>
> ==
> RPC:    85 call_bind (status 0)
> RPC:    85 call_connect xprt ec78d800 is connected
> RPC:    85 call_transmit (status 0)
> RPC:    85 xprt_prepare_transmit
> RPC:    85 xprt_cwnd_limited cong = 0 cwnd = 8192
> RPC:    85 rpc_xdr_encode (status 0)
> RPC:    85 marshaling UNIX cred eddb4dc0
> RPC:    85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
> RPC:    85 xprt_transmit(164)
> RPC:       rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 
> hdrlen 164
> RPC:       rpcrdma_register_frmr_external: Using frmr ec7da920 to map 
> 4 segments
> RPC:       rpcrdma_create_chunks: write chunk elem 
> 16384@0x38536d000:0xa601 (more)
> RPC:       rpcrdma_register_frmr_external: Using frmr ec7da960 to map 
> 1 segments
> RPC:       rpcrdma_create_chunks: write chunk elem 
> 108@0x31dd153c:0xaa01 (last)
> RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 
> padlen 0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
> RPC:    85 xmit complete
> RPC:    85 sleep_on(queue "xprt_pending" time 4683109)
> RPC:    85 added to queue ec78d994 "xprt_pending"
> RPC:    85 setting alarm for 60000 ms
> RPC:       wake_up_next(ec78d944 "xprt_resend")
> RPC:       wake_up_next(ec78d8f4 "xprt_sending")
> RPC:       rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 
> ep ec78db40
> RPC:    85 __rpc_wake_up_task (now 4683110)
> RPC:    85 disabling timer
> RPC:    85 removed from queue ec78d994 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:    85 __rpc_execute flags=0x1
> RPC:    85 call_status (status -107)
> RPC:    85 call_bind (status 0)
> RPC:    85 call_connect xprt ec78d800 is not connected
> RPC:    85 xprt_connect xprt ec78d800 is not connected
> RPC:    85 sleep_on(queue "xprt_pending" time 4683110)
> RPC:    85 added to queue ec78d994 "xprt_pending"
> RPC:    85 setting alarm for 60000 ms
> RPC:       rpcrdma_event_process: event rep ec116800 status 5 opcode 
> 80 length 2493606
> RPC:       rpcrdma_event_process: recv WC status 5, connection lost
> RPC:       rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 
> 0xec78db40 event 0xa)
> RPC:       rpcrdma_conn_upcall: disconnected
> rpcrdma: connection to ec78dbccI4:20049 closed (-103)
> RPC:       xprt_rdma_connect_worker: reconnect
> ==
>
> On the server I see:
>
> Mar  3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
> hermon0: Device Error: CQE remote access error
> Mar  3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
> bad sendreply
> Mar  3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
> hermon0: Device Error: CQE remote access error
> Mar  3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
> bad sendreply
>
> The remote access error is actually seen on RDMA_WRITE.
> Doing some more debugging on the server with DTrace, I see that
> the destination address and length match the write chunk
> element in the Linux debug output above.
>
>
>  0   9385                  rib_write:entry daddr 38536d000, len 4000, 
> hdl a601
>  0   9358         rib_init_sendwait:return ffffff44a715d308
>  1   9296       rib_svc_scq_handler:return 1f7
>  1   9356              rib_sendwait:return 14
>  1   9386                 rib_write:return 14
>
> ^^^ that is RDMA_FAILED in
>  1  63295    xdrrdma_send_read_data:return 0
>  1   5969              xdr_READ3res:return
>  1   5969              xdr_READ3res:return 0
>
> Is this a variation of the previously discussed issue or something new?
>

I think this is new. This seems to be some kind of base/bounds or access 
violation or perhaps an invalid rkey.
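
The usual suspects for a remote access error on the peer's RDMA_WRITE
are: the rkey was never made valid (the FAST_REG WR failed or has not
completed yet), the rkey was already invalidated, the write falls outside
the registered range, or the region was registered without remote write
permission. For a write chunk the client-side registration has to end up
roughly like the following; this is paraphrased from memory of
rpcrdma_register_frmr_external() and the local variable names are made
up for illustration:

	struct ib_send_wr frmr_wr;

	memset(&frmr_wr, 0, sizeof(frmr_wr));
	frmr_wr.opcode = IB_WR_FAST_REG_MR;
	frmr_wr.wr.fast_reg.iova_start    = seg->mr_dma;       /* chunk base address */
	frmr_wr.wr.fast_reg.page_list     = frmr->fr_pgl;      /* DMA-mapped page list */
	frmr_wr.wr.fast_reg.page_list_len = nsegs;
	frmr_wr.wr.fast_reg.page_shift    = PAGE_SHIFT;
	frmr_wr.wr.fast_reg.length        = nsegs << PAGE_SHIFT;
	frmr_wr.wr.fast_reg.access_flags  = IB_ACCESS_LOCAL_WRITE |
					    IB_ACCESS_REMOTE_WRITE; /* needed for write chunks */
	frmr_wr.wr.fast_reg.rkey          = frmr->fr_mr->rkey; /* handle put in the chunk list */

If a LOCAL_INV for that rkey is posted (or the QP drops into error and
flushes) before the server's RDMA_WRITE arrives, it could produce exactly
this signature. The client-side "QP error 3" in the rpcdebug trace is
also consistent with an access-type QP event, assuming the usual event
numbering.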

> Thanks,
> Mahesh
>
>> - Set the number of buffer credits small as follows "echo 4 > 
>> /proc/sys/sunrpc/rdma_slot_table_entries"
>> - Rerun your test and see if you can reproduce the problem?
>>
>> I did the above and was unable to reproduce, but I would like to see 
>> if you can to convince ourselves that 5 is the right number.
>>
>> Thanks,
>> Tom
>>
>>>  - R.
>>>   
>>
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [ewg] nfsrdma fails to write big file,
       [not found]                         ` <4B8EE813.2010205-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-03-04 16:43                           ` Mahesh Siddheshwar
  0 siblings, 0 replies; 16+ messages in thread
From: Mahesh Siddheshwar @ 2010-03-04 16:43 UTC (permalink / raw)
  To: Tom Tucker
  Cc: Vu Pham, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ewg-G2znmakfqn7U1rindQTSdQ

Tom Tucker wrote:
> Mahesh Siddheshwar wrote:
>> Hi Tom, Vu,
>>
>> Tom Tucker wrote:
>>> Roland Dreier wrote:
>>>>  > +               /*
>>>>  > +                * Add room for frmr register and invalidate WRs
>>>>  > +                * Requests sometimes have two chunks, each chunk
>>>>  > +                * requires to have different frmr. The safest
>>>>  > +                * WRs required are max_send_wr * 6; however, we
>>>>  > +                * get send completions and poll fast enough, it
>>>>  > +                * is pretty safe to have max_send_wr * 4.
>>>>  > +                */
>>>>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>>>>
>>>> Seems like a bad design if there is a possibility of work queue
>>>> overflow; if you're counting on events occurring in a particular order
>>>> or completions being handled "fast enough", then your design is 
>>>> going to
>>>> fail in some high load situations, which I don't think you want.   
>>>
>>> Vu,
>>>
>>> Would you please try the following:
>>>
>>> - Set the multiplier to 5
>> While trying to test this between a Linux client and Solaris server,
>> I made the following changes in:
>> /usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c
>>
>> diff verbs.c.org verbs.c
>> 653c653
>> <               ep->rep_attr.cap.max_send_wr *= 3;
>> ---
>> >               ep->rep_attr.cap.max_send_wr *= 8;
>> 685c685
>> <       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
>> ---
>> >       ep->rep_cqinit = ep->rep_attr.cap.max
>>
>> (I bumped it to 8)
>>
>> did make install.
>> On reboot I see the errors on NFS READs as opposed to WRITEs
>> as seen before, when I try to read a 10G file from the server.
>>
>> The client is running: RHEL 5.3 (2.6.18-128.el5PAE) with
>> OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
>> HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
>> The server is running Solaris based on snv_128.
>>
>> rpcdebug output from the client:
>>
>> ==
>> RPC:    85 call_bind (status 0)
>> RPC:    85 call_connect xprt ec78d800 is connected
>> RPC:    85 call_transmit (status 0)
>> RPC:    85 xprt_prepare_transmit
>> RPC:    85 xprt_cwnd_limited cong = 0 cwnd = 8192
>> RPC:    85 rpc_xdr_encode (status 0)
>> RPC:    85 marshaling UNIX cred eddb4dc0
>> RPC:    85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
>> RPC:    85 xprt_transmit(164)
>> RPC:       rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 
>> hdrlen 164
>> RPC:       rpcrdma_register_frmr_external: Using frmr ec7da920 to map 
>> 4 segments
>> RPC:       rpcrdma_create_chunks: write chunk elem 
>> 16384@0x38536d000:0xa601 (more)
>> RPC:       rpcrdma_register_frmr_external: Using frmr ec7da960 to map 
>> 1 segments
>> RPC:       rpcrdma_create_chunks: write chunk elem 
>> 108@0x31dd153c:0xaa01 (last)
>> RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 
>> padlen 0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
>> RPC:    85 xmit complete
>> RPC:    85 sleep_on(queue "xprt_pending" time 4683109)
>> RPC:    85 added to queue ec78d994 "xprt_pending"
>> RPC:    85 setting alarm for 60000 ms
>> RPC:       wake_up_next(ec78d944 "xprt_resend")
>> RPC:       wake_up_next(ec78d8f4 "xprt_sending")
>> RPC:       rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 
>> ep ec78db40
>> RPC:    85 __rpc_wake_up_task (now 4683110)
>> RPC:    85 disabling timer
>> RPC:    85 removed from queue ec78d994 "xprt_pending"
>> RPC:       __rpc_wake_up_task done
>> RPC:    85 __rpc_execute flags=0x1
>> RPC:    85 call_status (status -107)
>> RPC:    85 call_bind (status 0)
>> RPC:    85 call_connect xprt ec78d800 is not connected
>> RPC:    85 xprt_connect xprt ec78d800 is not connected
>> RPC:    85 sleep_on(queue "xprt_pending" time 4683110)
>> RPC:    85 added to queue ec78d994 "xprt_pending"
>> RPC:    85 setting alarm for 60000 ms
>> RPC:       rpcrdma_event_process: event rep ec116800 status 5 opcode 
>> 80 length 2493606
>> RPC:       rpcrdma_event_process: recv WC status 5, connection lost
>> RPC:       rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 
>> 0xec78db40 event 0xa)
>> RPC:       rpcrdma_conn_upcall: disconnected
>> rpcrdma: connection to ec78dbccI4:20049 closed (-103)
>> RPC:       xprt_rdma_connect_worker: reconnect
>> ==
>>
>> On the server I see:
>>
>> Mar  3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
>> hermon0: Device Error: CQE remote access error
>> Mar  3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
>> bad sendreply
>> Mar  3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: 
>> hermon0: Device Error: CQE remote access error
>> Mar  3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: 
>> bad sendreply
>>
>> The remote access error is actually seen on RDMA_WRITE.
>> Doing some more debugging on the server with DTrace, I see that
>> the destination address and length match the write chunk
>> element in the Linux debug output above.
>>
>>
>>  0   9385                  rib_write:entry daddr 38536d000, len 4000, 
>> hdl a601
>>  0   9358         rib_init_sendwait:return ffffff44a715d308
>>  1   9296       rib_svc_scq_handler:return 1f7
>>  1   9356              rib_sendwait:return 14
>>  1   9386                 rib_write:return 14
>>
>> ^^^ that is RDMA_FAILED in
>>  1  63295    xdrrdma_send_read_data:return 0
>>  1   5969              xdr_READ3res:return
>>  1   5969              xdr_READ3res:return 0
>>
>> Is this a variation of the previously discussed issue or something new?
>>
>
> I think this is new. This seems to be some kind of base/bounds or 
> access violation or perhaps an invalid rkey.
>
Thanks for checking, Tom. I can file a new bug against this. The
test setup is a DDR HCA (client) connected to a DDR Voltaire Switch,
connected to a QDR HCA (server, but limited to PCI-gen1). I have
not seen this on a similar setup with both client/server configured with
QDR HCAs.

What type of debug info would you need to debug this further?

Thanks,
Mahesh
>> Thanks,
>> Mahesh
>>
>>> - Set the number of buffer credits small as follows "echo 4 > 
>>> /proc/sys/sunrpc/rdma_slot_table_entries"
>>> - Rerun your test and see if you can reproduce the problem?
>>>
>>> I did the above and was unable to reproduce, but I would like to see 
>>> if you can to convince ourselves that 5 is the right number.
>>>
>>> Thanks,
>>> Tom
>>>
>>>>  - R.
>>>>   
>>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread


Thread overview: 16+ messages
2010-02-22 18:41 nfsrdma fails to write big file, Vu Pham
     [not found] ` <9FA59C95FFCBB34EA5E42C1A8573784F02662E58-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
2010-02-22 18:49   ` [ewg] " Tom Tucker
2010-02-22 20:22     ` Vu Pham
2010-02-24 18:56       ` Vu Pham
     [not found]         ` <9FA59C95FFCBB34EA5E42C1A8573784F02663166-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
2010-02-24 19:06           ` Roland Dreier
     [not found]             ` <ada3a0q1mje.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-02-24 22:13               ` Tom Tucker
2010-02-28  4:22               ` Tom Tucker
2010-03-02  0:19                 ` Vu Pham
     [not found]                   ` <9FA59C95FFCBB34EA5E42C1A8573784F02663602-SDnKeQl2TTymvrjiD8yIlgC/G2K4zDHf@public.gmane.org>
2010-03-02  3:17                     ` Tom Tucker
     [not found]                 ` <4B89EF88.1030903-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-03-03 20:26                   ` Mahesh Siddheshwar
     [not found]                     ` <4B8EC600.9050101-xsfywfwIY+M@public.gmane.org>
2010-03-03 22:52                       ` [ewg] " Tom Tucker
     [not found]                         ` <4B8EE813.2010205-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-03-04 16:43                           ` Mahesh Siddheshwar
2010-02-24 22:07           ` Tom Tucker
2010-02-24 22:48           ` Tom Tucker
     [not found]             ` <4B85ACD2.9040405-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-25  0:02               ` Tom Tucker
     [not found]                 ` <4B85BDF9.8020009-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-25  0:51                   ` Tom Tucker
