* help with "PSN sequence error" in Ubuntu 20.04 (using CX-4 or CX-6 mellanox cards)
From: Olga Kornievskaia @ 2021-08-02 15:37 UTC
  To: linux-rdma

Hi folks,

We are encountering an error condition while doing NFSoRDMA, but the
problem seems to be in the RDMA core itself. At some point the client
sends an RDMA NAK with "PSN Sequence Error", yet the network trace
shows that all the PSNs are accounted for (snippet at the bottom).
It's as if the client lost track of the current PSN.

Questions:
1. Is PSN handling done by the card itself (in firmware) rather than
in the kernel, which would make this a card/firmware-specific problem?
I looked through the RDMA core and mlx5 driver code for whatever would
generate a NAK with this error but could not find it. I only found
counters for nak_seq_error, which made me think this is a firmware
problem.
2. If this is a kernel issue, is it something that has perhaps been
fixed upstream but is not present in Ubuntu?

Thank you for your help.

160 2021-07-22 13:17:52.579023 192.168.100.51 -> 192.168.100.28 NFS
    v4.0 reply xid:0x0982a167 PUTFH;GETATTR (PSN: 15729419)
161 2021-07-22 13:17:52.579026 192.168.100.28 -> 192.168.100.51 RRoCE
    RC_Acknowledge QP=0x017c PSN=15729419
162 2021-07-22 13:17:52.579247 192.168.100.28 -> 192.168.100.51 NFS
    v4.0 call xid:0x0a82a167 PUTFH;READDIR DH:0xbee72168
    cookie:0 verf:0x0000000000000000 count:8170
163 2021-07-22 13:17:52.579249 192.168.100.51 -> 192.168.100.28 RRoCE
    RC_Acknowledge QP=0x0244 PSN=16086680
164 2021-07-22 13:17:52.579631 192.168.100.51 -> 192.168.100.28 RRoCE
    RC_RDMA_WRITE_First QP=0x0244 PSN=15729420 size=4096 rkey=0x40000a13
    dmalen=9824
165 2021-07-22 13:17:52.579644 192.168.100.51 -> 192.168.100.28 RRoCE
    RC_RDMA_WRITE_Middle QP=0x0244 PSN=15729421 size=4096
166 2021-07-22 13:17:52.579652 192.168.100.51 -> 192.168.100.28 RRoCE
    RC_RDMA_WRITE_Last QP=0x0244 PSN=15729422 size=1632
167 2021-07-22 13:17:52.579653 192.168.100.51 -> 192.168.100.28 NFS
    v4.0 reply xid:0x0a82a167 PUTFH;READDIR
    verf:0x0000000000000000 eof:TRUE (PSN: 15729423)
168 2021-07-22 13:17:52.579653 192.168.100.28 -> 192.168.100.51 RRoCE
    RC_Acknowledge QP=0x017c PSN=15729420 PSN_SEQ_ERR
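
The claim that "all the PSNs are accounted for" can be checked
mechanically. The following is a minimal sketch, not part of the
original report: it assumes frames 160 and 164-167 all travel on the
same queue pair from 192.168.100.51 to 192.168.100.28, and it relies on
the PSN being a 24-bit field that wraps around. The helper names
(next_psn, find_psn_gaps) are made up for illustration.

```python
PSN_MODULUS = 1 << 24  # the PSN field is 24 bits wide, so it wraps at 2**24


def next_psn(psn):
    """Return the PSN that should follow `psn`, honoring 24-bit wraparound."""
    return (psn + 1) % PSN_MODULUS


def find_psn_gaps(psns):
    """Return (index, expected, actual) for each PSN that is not previous + 1."""
    gaps = []
    for i in range(1, len(psns)):
        expected = next_psn(psns[i - 1])
        if psns[i] != expected:
            gaps.append((i, expected, psns[i]))
    return gaps


# PSNs of frames 160, 164, 165, 166 and 167 from the trace above
# (assumption: all five packets belong to the same queue pair).
server_request_psns = [15729419, 15729420, 15729421, 15729422, 15729423]

print(find_psn_gaps(server_request_psns))  # prints []
```

An empty result matches the observation in the report: the capture
shows no gap on the wire, yet frame 168 still NAKs with PSN 15729420,
the PSN of the first RDMA write packet (frame 164).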


* Re: help with "PSN sequence error" in Ubuntu 20.04 (using CX-4 or CX-6 mellanox cards)
From: Leon Romanovsky @ 2021-08-02 16:58 UTC
  To: Olga Kornievskaia; +Cc: linux-rdma

On Mon, Aug 02, 2021 at 11:37:06AM -0400, Olga Kornievskaia wrote:
> Hi folks,
> 
> We are encountering an error condition (while doing NFSoRDMA) but the
> problem seems to be in the RDMA core itself. The problem is that the
> client at some point is ending in an RDMA NAK with "PSN Sequence
> error" but the network trace shows all the PSNs are accounted for
> (snippet at the bottom). It's as if the client lost its knowledge of
> the current PSN.
> 
> Questions:
> 1. Is PSN handling done by the hardware card itself (in firmware) and
> not in the kernel (making this a card/firmware specific problem)? I
> was trying to look through the rdma core/mlx5 driver code to see what
> would generate a NAK with such error but wasn't able to find one. Only
> found counters for nak_seq_error which made me think this is a
> firmware problem.

Whether a PSN is valid or not is decided in the FW; the kernel doesn't
check anything. The kernel does, however, set and get
next_send_psn/next_recv_psn.
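
Because the check happens in firmware, the per-port hardware counters
are one place where the host can observe these events. The following is
a rough sketch, not something from this thread: the device name mlx5_0,
port 1, and the counter names packet_seq_err and out_of_sequence are
assumptions about a typical mlx5 setup and may differ on a given
system. The helper read_hw_counters is hypothetical.

```python
from pathlib import Path


def read_hw_counters(counter_dir, names):
    """Read integer counters from files in `counter_dir`.

    A counter that is missing or unreadable is reported as None
    instead of raising, since the set of counters varies by device.
    """
    counters = {}
    for name in names:
        path = Path(counter_dir) / name
        try:
            counters[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            counters[name] = None
    return counters


if __name__ == "__main__":
    # Assumed location; adjust the device name and port number as needed.
    counter_dir = "/sys/class/infiniband/mlx5_0/ports/1/hw_counters"
    wanted = ["packet_seq_err", "out_of_sequence"]
    for name, value in read_hw_counters(counter_dir, wanted).items():
        print(f"{name}: {value}")
```

Sampling these values on both hosts before and after reproducing the
failure would show which side counts the sequence error.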

Thanks


* Re: help with "PSN sequence error" in Ubuntu 20.04 (using CX-4 or CX-6 mellanox cards)
From: Olga Kornievskaia @ 2021-08-02 19:00 UTC
  To: Leon Romanovsky; +Cc: linux-rdma

On Mon, Aug 2, 2021 at 12:59 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> Whether a PSN is valid or not is decided in the FW; the kernel doesn't
> check anything. The kernel does, however, set and get
> next_send_psn/next_recv_psn.

Thank you for the confirmation.

>
> Thanks

