From: Jason Gunthorpe
To: Dennis Dalessandro
Cc: dledford@redhat.com, linux-rdma@vger.kernel.org, Mike Marciniszyn, stable@vger.kernel.org, Kaike Wan
Subject: Re: [PATCH for-rc] IB/hfi1: Adjust flow PSN with the correct resync_psn
Date: Fri, 3 Jan 2020 16:50:29 -0400
Message-ID: <20200103205029.GA12225@ziepe.ca>
In-Reply-To: <20191219231920.51069.37147.stgit@awfm-01.aw.intel.com>
References: <20191219231920.51069.37147.stgit@awfm-01.aw.intel.com>
X-Mailing-List: linux-rdma@vger.kernel.org

On Thu, Dec 19, 2019 at 06:19:20PM
-0500, Dennis Dalessandro wrote:
> From: Kaike Wan
>
> When a TID RDMA ACK to RESYNC request is received, the flow PSNs for
> pending TID RDMA WRITE segments will be adjusted with the next flow
> generation number, based on the resync_psn value extracted from the
> flow PSN of the TID RDMA ACK packet. The resync_psn value indicates
> the last flow PSN for which a TID RDMA WRITE DATA packet has been
> received by the responder and the requester should resend TID RDMA
> WRITE DATA packets, starting from the next flow PSN. However, if
> resync_psn points to the last flow PSN for a segment and the next
> segment flow PSN starts with a new generation number, use of the
> old resync_psn to adjust the flow PSN for the next segment will
> lead to miscalculation, resulting in WARN_ON and sge rewinding
> errors:
> [2419460.492485] WARNING: CPU: 4 PID: 146961 at /nfs/site/home/phcvs2/gitrepo/ifs-all/components/Drivers/tmp/rpmbuild/BUILD/ifs-kernel-updates-3.10.0_957.el7.x86_64/hfi1/tid_rdma.c:4764 hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
> [2419460.514565] Modules linked in: ib_ipoib(OE) hfi1(OE) rdmavt(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel ib_isert iscsi_target_mod target_core_mod aesni_intel lrw gf128mul glue_helper ablk_helper cryptd rpcrdma sunrpc opa_vnic ast ttm ib_iser libiscsi drm_kms_helper scsi_transport_iscsi ipmi_ssif syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev ipmi_si pcspkr sg drm_panel_orientation_quirks ipmi_devintf lpc_ich i2c_i801 ipmi_msghandler wmi rdma_ucm ib_ucm ib_uverbs acpi_cpufreq acpi_power_meter ib_umad rdma_cm ib_cm iw_cm ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul i2c_algo_bit crct10dif_common
> [2419460.594432] crc32c_intel e1000e ib_core ahci libahci ptp libata pps_core nfit libnvdimm [last unloaded: rdmavt]
> [2419460.605645] CPU: 4 PID: 146961 Comm: kworker/4:0H Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
> [2419460.619424] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.0X.02.0117.040420182310 04/04/2018
> [2419460.631062] Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
> [2419460.637423] Call Trace:
> [2419460.641044] [] dump_stack+0x19/0x1b
> [2419460.647980] [] __warn+0xd8/0x100
> [2419460.654023] [] warn_slowpath_null+0x1d/0x20
> [2419460.661025] [] hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
> [2419460.669333] [] hfi1_kdeth_eager_rcv+0x1dc/0x210 [hfi1]
> [2419460.677295] [] ? hfi1_kdeth_expected_rcv+0x1ef/0x210 [hfi1]
> [2419460.685693] [] kdeth_process_eager+0x35/0x90 [hfi1]
> [2419460.693394] [] handle_receive_interrupt_nodma_rtail+0x17a/0x2b0 [hfi1]
> [2419460.702745] [] receive_context_interrupt+0x23/0x40 [hfi1]
> [2419460.710963] [] __handle_irq_event_percpu+0x44/0x1c0
> [2419460.718659] [] handle_irq_event_percpu+0x32/0x80
> [2419460.726086] [] handle_irq_event+0x3c/0x60
> [2419460.732903] [] handle_edge_irq+0x7f/0x150
> [2419460.739710] [] handle_irq+0xe4/0x1a0
> [2419460.746091] [] do_IRQ+0x4d/0xf0
> [2419460.752040] [] common_interrupt+0x162/0x162
> [2419460.759029] [] ? swiotlb_map_page+0x49/0x150
> [2419460.766758] [] hfi1_verbs_send_dma+0x291/0xb70 [hfi1]
> [2419460.774637] [] ? hfi1_wait_kmem+0xf0/0xf0 [hfi1]
> [2419460.782080] [] hfi1_verbs_send+0x126/0x2b0 [hfi1]
> [2419460.789606] [] _hfi1_do_tid_send+0x1d3/0x320 [hfi1]
> [2419460.797298] [] process_one_work+0x17f/0x440
> [2419460.804292] [] worker_thread+0x126/0x3c0
> [2419460.811025] [] ? manage_workers.isra.25+0x2a0/0x2a0
> [2419460.818710] [] kthread+0xd1/0xe0
> [2419460.824751] [] ? insert_kthread_work+0x40/0x40
> [2419460.832013] [] ret_from_fork_nospec_begin+0x7/0x21
> [2419460.839611] [] ? insert_kthread_work+0x40/0x40
>
> This patch fixes the issue by adjusting the resync_psn first if the flow
> generation has been advanced for a pending segment.
>
> Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")
> Cc:
> Reviewed-by: Mike Marciniszyn
> Signed-off-by: Kaike Wan
> Signed-off-by: Dennis Dalessandro
> drivers/infiniband/hw/hfi1/tid_rdma.c | 9 +++++++++
> 1 file changed, 9 insertions(+)

Applied to for-rc, thanks

Jason
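
For readers following the quoted commit message, the miscalculation it describes can be shown with a small standalone sketch. This is not the driver code and not the applied patch. It assumes a flow PSN that packs a generation number above an 11-bit sequence field, and the names make_flow_psn, flow_generation, adjust_resync_psn and packets_to_resend are hypothetical helpers introduced only for illustration. PSN masking and generation wraparound are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, for illustration only: low 11 bits hold the
 * sequence number, the bits above hold the flow generation. */
#define SEQ_SHIFT 11u
#define SEQ_MASK  ((1u << SEQ_SHIFT) - 1u)

/* Hypothetical helper: build a flow PSN from generation and sequence. */
static inline uint32_t make_flow_psn(uint32_t generation, uint32_t seq)
{
	return (generation << SEQ_SHIFT) | (seq & SEQ_MASK);
}

/* Hypothetical helper: extract the generation from a flow PSN. */
static inline uint32_t flow_generation(uint32_t psn)
{
	return psn >> SEQ_SHIFT;
}

/*
 * Hypothetical helper: if the pending segment already uses a different
 * generation than resync_psn, resync_psn was the last PSN of the
 * previous segment. Move it to just before the first PSN of the
 * pending segment so later arithmetic stays within that segment.
 */
static inline uint32_t adjust_resync_psn(uint32_t resync_psn,
					 uint32_t seg_generation,
					 uint32_t seg_first_seq)
{
	uint32_t first_psn = make_flow_psn(seg_generation, seg_first_seq);

	if (flow_generation(resync_psn) != seg_generation)
		return first_psn - 1u;
	return resync_psn;
}

/* Number of packets still to resend: everything after resync_psn up
 * to and including the last PSN of the segment. */
static inline uint32_t packets_to_resend(uint32_t resync_psn,
					 uint32_t seg_last_psn)
{
	assert(seg_last_psn >= resync_psn);
	return seg_last_psn - resync_psn;
}
```

Under these assumptions, take a previous segment that ended at generation 5, sequence 9, and a pending segment covering generation 6, sequences 0 through 3. Subtracting the unadjusted resync_psn from the last PSN of the pending segment gives 2042 packets instead of 4. Calling adjust_resync_psn first gives 4.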