From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 29C19C33CB3 for ; Tue, 14 Jan 2020 10:20:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 00B4D24676 for ; Tue, 14 Jan 2020 10:20:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1578997209; bh=4v5wvKo9nMdl4yXFhyl6u3jWJXkwSStR/SDynIEaXZA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=m7154k7KBaUe6F7XuHrlyGn323vDdoCBUKAToTrBXoJbY6xfKfxjf3MLbzb/vDmXn j0/+TCl1dGxSlA2gPEOVhEQ0i3/nkHuPU89i4KIxKU8MCuNu9iff5LbdC1MSdObkmd NslKy+wgYubzy8Vj0gdVsuD13O6y6KeSsSSc/meY= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729801AbgANKEJ (ORCPT ); Tue, 14 Jan 2020 05:04:09 -0500 Received: from mail.kernel.org ([198.145.29.99]:59490 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728944AbgANKEJ (ORCPT ); Tue, 14 Jan 2020 05:04:09 -0500 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 6AF9624676; Tue, 14 Jan 2020 10:04:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1578996248; bh=4v5wvKo9nMdl4yXFhyl6u3jWJXkwSStR/SDynIEaXZA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=sjfKHcYPLOpA9r9UK9l0y9wPN7SLGOl2N00eFuocYoK9Y14NngsX8y8PpvhxMzRjd x7RuI8Vcyvk39k3XMo1vUIoPwP9yt2tnWwwgYsRu6/qS/yi+20SOLokinIzihScwxY M1zHBf3j1ULXdi03sYuVqjB+/bqwDKicMkgcCEls= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Mike Marciniszyn , Kaike Wan , Dennis Dalessandro , Jason Gunthorpe Subject: [PATCH 5.4 31/78] IB/hfi1: Adjust flow PSN with the correct resync_psn Date: Tue, 14 Jan 2020 11:01:05 +0100 Message-Id: <20200114094357.850365222@linuxfoundation.org> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200114094352.428808181@linuxfoundation.org> References: <20200114094352.428808181@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Kaike Wan commit b2ff0d510182eb5cc05a65d1b2371af62c4b170c upstream. When a TID RDMA ACK to RESYNC request is received, the flow PSNs for pending TID RDMA WRITE segments will be adjusted with the next flow generation number, based on the resync_psn value extracted from the flow PSN of the TID RDMA ACK packet. The resync_psn value indicates the last flow PSN for which a TID RDMA WRITE DATA packet has been received by the responder and the requester should resend TID RDMA WRITE DATA packets, starting from the next flow PSN. However, if resync_psn points to the last flow PSN for a segment and the next segment flow PSN starts with a new generation number, use of the old resync_psn to adjust the flow PSN for the next segment will lead to miscalculation, resulting in WARN_ON and sge rewinding errors: WARNING: CPU: 4 PID: 146961 at /nfs/site/home/phcvs2/gitrepo/ifs-all/components/Drivers/tmp/rpmbuild/BUILD/ifs-kernel-updates-3.10.0_957.el7.x86_64/hfi1/tid_rdma.c:4764 hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1] Modules linked in: ib_ipoib(OE) hfi1(OE) rdmavt(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel ib_isert iscsi_target_mod target_core_mod aesni_intel lrw gf128mul glue_helper ablk_helper cryptd rpcrdma sunrpc opa_vnic ast ttm ib_iser libiscsi drm_kms_helper scsi_transport_iscsi ipmi_ssif syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev ipmi_si pcspkr sg drm_panel_orientation_quirks ipmi_devintf lpc_ich i2c_i801 ipmi_msghandler wmi rdma_ucm ib_ucm ib_uverbs acpi_cpufreq acpi_power_meter ib_umad rdma_cm ib_cm iw_cm ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul i2c_algo_bit crct10dif_common crc32c_intel e1000e ib_core ahci libahci ptp libata pps_core nfit libnvdimm [last unloaded: rdmavt] CPU: 4 PID: 146961 Comm: kworker/4:0H Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.0X.02.0117.040420182310 04/04/2018 Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1] Call Trace: [] dump_stack+0x19/0x1b [] __warn+0xd8/0x100 [] warn_slowpath_null+0x1d/0x20 [] hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1] [] hfi1_kdeth_eager_rcv+0x1dc/0x210 [hfi1] [] ? hfi1_kdeth_expected_rcv+0x1ef/0x210 [hfi1] [] kdeth_process_eager+0x35/0x90 [hfi1] [] handle_receive_interrupt_nodma_rtail+0x17a/0x2b0 [hfi1] [] receive_context_interrupt+0x23/0x40 [hfi1] [] __handle_irq_event_percpu+0x44/0x1c0 [] handle_irq_event_percpu+0x32/0x80 [] handle_irq_event+0x3c/0x60 [] handle_edge_irq+0x7f/0x150 [] handle_irq+0xe4/0x1a0 [] do_IRQ+0x4d/0xf0 [] common_interrupt+0x162/0x162 [] ? swiotlb_map_page+0x49/0x150 [] hfi1_verbs_send_dma+0x291/0xb70 [hfi1] [] ? hfi1_wait_kmem+0xf0/0xf0 [hfi1] [] hfi1_verbs_send+0x126/0x2b0 [hfi1] [] _hfi1_do_tid_send+0x1d3/0x320 [hfi1] [] process_one_work+0x17f/0x440 [] worker_thread+0x126/0x3c0 [] ? manage_workers.isra.25+0x2a0/0x2a0 [] kthread+0xd1/0xe0 [] ? insert_kthread_work+0x40/0x40 [] ret_from_fork_nospec_begin+0x7/0x21 [] ? insert_kthread_work+0x40/0x40 This patch fixes the issue by adjusting the resync_psn first if the flow generation has been advanced for a pending segment. Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet") Link: https://lore.kernel.org/r/20191219231920.51069.37147.stgit@awfm-01.aw.intel.com Cc: Reviewed-by: Mike Marciniszyn Signed-off-by: Kaike Wan Signed-off-by: Dennis Dalessandro Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hfi1/tid_rdma.c | 9 +++++++++ 1 file changed, 9 insertions(+) --- a/drivers/infiniband/hw/hfi1/tid_rdma.c +++ b/drivers/infiniband/hw/hfi1/tid_rdma.c @@ -4633,6 +4633,15 @@ void hfi1_rc_rcv_tid_rdma_ack(struct hfi */ fpsn = full_flow_psn(flow, flow->flow_state.spsn); req->r_ack_psn = psn; + /* + * If resync_psn points to the last flow PSN for a + * segment and the new segment (likely from a new + * request) starts with a new generation number, we + * need to adjust resync_psn accordingly. + */ + if (flow->flow_state.generation != + (resync_psn >> HFI1_KDETH_BTH_SEQ_SHIFT)) + resync_psn = mask_psn(fpsn - 1); flow->resync_npkts += delta_psn(mask_psn(resync_psn + 1), fpsn); /*