* crash in 4.14-rc1 with IPoIB
From: Johannes Thumshirn @ 2017-09-20 9:53 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg, dledford-H+wXaHxf7aLQT0dZR+AlfA

Hi folks,

I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
RDMA operation was possible:

hfi1 0000:ff:00.0: hfi1_1: send_idle_message: sending idle message 0x203
hfi1 0000:ff:00.0: hfi1_1: Switching to NO_DMA_RTAIL
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: (null)
PGD 0 P4D 0
Oops: 0010 [#1] SMP
Modules linked in: iptable_filter(E) af_packet(E) xt_nat(E) xt_tcpudp(E)
 iscsi_ibft(E) iscsi_boot_sysfs(E) iptable_nat(E) nf_conntrack_ipv4(E)
 nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) libcrc32c(E)
 ip_tables(E) x_tables(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E)
 ib_iser(E) libiscsi(E) scsi_transport_iscsi(E) ib_srpt(E)
 target_core_mod(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ib_srp(E)
 scsi_transport_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E)
 ib_umad(E) rdma_cm(E) configfs(E) ib_cm(E) iw_cm(E) mlx5_ib(E)
 intel_rapl(E) sha512_ssse3(E) skx_edac(E) sha512_generic(E)
 x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E)
 kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E)
 ghash_clmulni_intel(E) ipmi_ssif(E) pcbc(E) aesni_intel(E) mlx5_core(E)
 qat_c62x(E) aes_x86_64(E) intel_qat(E) mlxfw(E) joydev(E) hfi1(E) i40e(E)
 crypto_simd(E) devlink(E) rdmavt(E) ipmi_si(E) ptp(E) iTCO_wdt(E)
 dh_generic(E) glue_helper(E) iTCO_vendor_support(E) authenc(E) ib_core(E)
 pps_core(E) ipmi_devintf(E) mei_me(E) ioatdma(E) cryptd(E) lpc_ich(E)
 pcspkr(E) mfd_core(E) i2c_i801(E) shpchp(E) mei(E) dca(E)
 ipmi_msghandler(E) tpm_crb(E) nfit(E) libnvdimm(E) acpi_pad(E) sunrpc(E)
 btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E)
 hid_generic(E) usbhid(E) raid6_pq(E) sd_mod(E) sr_mod(E) cdrom(E)
 crc32c_intel(E) ast(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E)
 sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) xhci_pci(E) ahci(E)
 xhci_hcd(E) libahci(E) drm(E) usbcore(E) libata(E) wmi(E) button(E) sg(E)
 dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E)
 scsi_mod(E) efivarfs(E) autofs4(E)
CPU: 20 PID: 950 Comm: kworker/20:1H Tainted: G E 4.14.0-rc1-6.3-default-nvme-mpath #773
Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0412.020920172159 02/09/2017
Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
task: ffff882fce3f4b00 task.stack: ffffc9002422c000
RIP: 0010: (null)
RSP: 0018:ffffc9002422f990 EFLAGS: 00010206
RAX: ffff882fd0078000 RBX: ffff882fa0263000 RCX: ffffc9002422f998
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff882fd0078000
RBP: ffffc9002422fad0 R08: 0000000000000000 R09: ffff882fa0263080
R10: ffffffffa0964ca0 R11: 0000000000000000 R12: ffff8817dcea3700
R13: ffff882fa0263000 R14: 000000000000c000 R15: 000000000000c000
FS: 0000000000000000(0000) GS:ffff882fdd000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000017db346004 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 ? is_valid_mcast_lid.isra.23+0xfb/0x110 [ib_core]
 ib_attach_mcast+0x6f/0xa0 [ib_core]
 ipoib_mcast_attach+0x72/0x160 [ib_ipoib]
 ipoib_mcast_join_complete+0x354/0xb40 [ib_ipoib]
 mcast_work_handler+0x2ff/0x630 [ib_core]
 join_handler+0xf0/0x1e0 [ib_core]
 ib_sa_mcmember_rec_callback+0x54/0x80 [ib_core]
 recv_handler+0x3a/0x60 [ib_core]
 ib_mad_recv_done+0x43d/0xa20 [ib_core]
 __ib_process_cq+0x5d/0xb0 [ib_core]
 ib_cq_poll_work+0x20/0x60 [ib_core]
 process_one_work+0x138/0x370
 worker_thread+0x4d/0x3b0
 kthread+0x109/0x140
 ? rescuer_thread+0x320/0x320
 ? kthread_park+0x60/0x60
 ret_from_fork+0x25/0x30
Code: Bad RIP value.
RIP: (null) RSP: ffffc9002422f990
CR2: 0000000000000000
---[ end trace f3c2d0cdf0ebfb9c ]---

is_valid_mcast_lid.isra.23+0xfb/0x110

(gdb) l *(is_valid_mcast_lid+0xfb)
0x229b is in is_valid_mcast_lid (drivers/infiniband/core/verbs.c:1649).
1644            /* If QP state >= init, it is assigned to a port and we can check this
1645             * port only.
1646             */
1647            if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
1648                    if (attr.qp_state >= IB_QPS_INIT) {
1649                            if (qp->device->get_link_layer(qp->device, attr.port_num) !=
1650                                IB_LINK_LAYER_INFINIBAND)
1651                                    return true;
1652                            goto lid_check;
1653                    }
(gdb)

Byte,
        Johannes
--
Johannes Thumshirn                                          Storage
jthumshirn-l3A5Bk7waGM@public.gmane.org                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: crash in 4.14-rc1 with IPoIB
From: Sagi Grimberg @ 2017-09-20 10:37 UTC
To: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA

> Hi folks,
>
> I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
> setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
> RDMA operation was possible:

...

> is_valid_mcast_lid.isra.23+0xfb/0x110
>
> (gdb) l *(is_valid_mcast_lid+0xfb)
> 0x229b is in is_valid_mcast_lid (drivers/infiniband/core/verbs.c:1649).
> 1644            /* If QP state >= init, it is assigned to a port and we can check this
> 1645             * port only.
> 1646             */
> 1647            if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
> 1648                    if (attr.qp_state >= IB_QPS_INIT) {
> 1649                            if (qp->device->get_link_layer(qp->device, attr.port_num) !=
> 1650                                IB_LINK_LAYER_INFINIBAND)
> 1651                                    return true;
> 1652                            goto lid_check;
> 1653                    }
> (gdb)

Why doesn't ipoib use the generic rdma_port_get_link_layer?

Does this help?
--
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index ee9e27dc799b..f2c70afea238 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1646,7 +1646,7 @@ static bool is_valid_mcast_lid(struct ib_qp *qp, u16 lid)
 	 */
 	if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
 		if (attr.qp_state >= IB_QPS_INIT) {
-			if (qp->device->get_link_layer(qp->device, attr.port_num) !=
+			if (rdma_port_get_link_layer(qp->device, attr.port_num) !=
 			    IB_LINK_LAYER_INFINIBAND)
 				return true;
 			goto lid_check;
--
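A note on why this one-line change plausibly fixes the splat: the oops reports IP (null) with error code 0010 (an instruction fetch), which is consistent with a call through a NULL function pointer. That would happen if the driver leaves the optional get_link_layer callback unset, and rdma_port_get_link_layer, as far as I understand it, checks the callback before calling it and otherwise derives the link layer from the transport type. The user-space sketch below only illustrates that pattern; every type and function name in it is a hypothetical stand-in, not the kernel's definition.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel enums; not the real definitions. */
enum link_layer {
	LINK_LAYER_UNSPECIFIED,
	LINK_LAYER_INFINIBAND,
	LINK_LAYER_ETHERNET
};

enum transport {
	TRANSPORT_IB,
	TRANSPORT_IWARP
};

/* Hypothetical device with an optional callback that may be left NULL. */
struct device {
	enum transport transport;
	enum link_layer (*get_link_layer)(struct device *dev,
					  unsigned char port_num);
};

/*
 * Guarded lookup: use the driver callback only when it is set,
 * otherwise fall back to a default derived from the transport.
 * Calling dev->get_link_layer() directly would jump to address 0
 * for a device that does not provide the callback.
 */
enum link_layer port_get_link_layer(struct device *dev,
				    unsigned char port_num)
{
	if (dev->get_link_layer)
		return dev->get_link_layer(dev, port_num);

	switch (dev->transport) {
	case TRANSPORT_IB:
		return LINK_LAYER_INFINIBAND;
	case TRANSPORT_IWARP:
		return LINK_LAYER_ETHERNET;
	default:
		return LINK_LAYER_UNSPECIFIED;
	}
}

/* Example callback for a device that does implement the hook. */
enum link_layer always_ethernet(struct device *dev, unsigned char port_num)
{
	(void)dev;
	(void)port_num;
	return LINK_LAYER_ETHERNET;
}
```

With a device whose callback is NULL, port_get_link_layer() returns the transport default instead of faulting; with the callback set, the driver's answer wins.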
* Re: crash in 4.14-rc1 with IPoIB
From: Johannes Thumshirn @ 2017-09-20 10:57 UTC
To: Sagi Grimberg
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA

On Wed, Sep 20, 2017 at 01:37:28PM +0300, Sagi Grimberg wrote:
> --
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index ee9e27dc799b..f2c70afea238 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1646,7 +1646,7 @@ static bool is_valid_mcast_lid(struct ib_qp *qp, u16 lid)
>  	 */
>  	if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
>  		if (attr.qp_state >= IB_QPS_INIT) {
> -			if (qp->device->get_link_layer(qp->device, attr.port_num) !=
> +			if (rdma_port_get_link_layer(qp->device, attr.port_num) !=
>  			    IB_LINK_LAYER_INFINIBAND)
>  				return true;
>  			goto lid_check;
> --

w00000! You're my hero.

Tested-by: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
* Re: crash in 4.14-rc1 with IPoIB
From: Hal Rosenstock @ 2017-09-20 11:35 UTC
To: Sagi Grimberg, Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA

On 9/20/2017 6:37 AM, Sagi Grimberg wrote:
>> Hi folks,
>>
>> I wanted to try out Christoph's NVMe multipathing patchset on my NVMe
>> OmniPath setup and merged it into 4.14-rc1. On bootup I stumbled upon
>> that splat and no RDMA operation was possible:
>
> ...
>
>> is_valid_mcast_lid.isra.23+0xfb/0x110
>>
>> (gdb) l *(is_valid_mcast_lid+0xfb)
>> 0x229b is in is_valid_mcast_lid (drivers/infiniband/core/verbs.c:1649).
>> 1644            /* If QP state >= init, it is assigned to a port and we can check this
>> 1645             * port only.
>> 1646             */
>> 1647            if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
>> 1648                    if (attr.qp_state >= IB_QPS_INIT) {
>> 1649                            if (qp->device->get_link_layer(qp->device, attr.port_num) !=
>> 1650                                IB_LINK_LAYER_INFINIBAND)
>> 1651                                    return true;
>> 1652                            goto lid_check;
>> 1653                    }
>> (gdb)
>
> Why isn't ipoib uses the generic rdma_port_get_link_layer?
>
> Does this help?
> --
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index ee9e27dc799b..f2c70afea238 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1646,7 +1646,7 @@ static bool is_valid_mcast_lid(struct ib_qp *qp, u16 lid)
>  	 */
>  	if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
>  		if (attr.qp_state >= IB_QPS_INIT) {
> -			if (qp->device->get_link_layer(qp->device, attr.port_num) !=
> +			if (rdma_port_get_link_layer(qp->device, attr.port_num) !=
>  			    IB_LINK_LAYER_INFINIBAND)
>  				return true;
>  			goto lid_check;

There's another occurrence of qp->device->get_link_layer in that routine
just below this. Shouldn't that be replaced by rdma_port_get_link_layer
too?

-- Hal
* Re: crash in 4.14-rc1 with IPoIB
From: Sagi Grimberg @ 2017-09-20 11:51 UTC
To: Hal Rosenstock, Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA

Hey Hal! :)

> There's another occurrence of qp->device->get_link_layer in that routine
> just below this. Shouldn't that be replaced by rdma_port_get_link_layer
> too ?

You're absolutely correct!

Sending a formal patch now.
* Re: crash in 4.14-rc1 with IPoIB
From: Jason Gunthorpe @ 2017-09-20 16:32 UTC
To: Johannes Thumshirn
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg, dledford-H+wXaHxf7aLQT0dZR+AlfA

On Wed, Sep 20, 2017 at 11:53:39AM +0200, Johannes Thumshirn wrote:
> I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
> setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
> RDMA operation was possible:

I think this was already found and fixed a month ago?? The oops is
the same:

https://patchwork.kernel.org/patch/9932505/

Doug, one of the topics during the LPC was 'what to QA' - it
obviously causes QA problems if known bugs are left to sit on the
mailing list for a month :(

These are exactly the things that need to get to Linus faster to get
people on board the QA upstream train..

Jason
* Re: crash in 4.14-rc1 with IPoIB
From: Doug Ledford @ 2017-09-22 17:27 UTC
To: Jason Gunthorpe, Johannes Thumshirn
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Wed, 2017-09-20 at 10:32 -0600, Jason Gunthorpe wrote:
> On Wed, Sep 20, 2017 at 11:53:39AM +0200, Johannes Thumshirn wrote:
> > I wanted to try out Christoph's NVMe multipathing patchset on my
> > NVMe OmniPath setup and merged it into 4.14-rc1. On bootup I
> > stumbled upon that splat and no RDMA operation was possible:
>
> I think this was already found and fixed a month ago?? The oops is
> the same:
>
> https://patchwork.kernel.org/patch/9932505/
>
> Doug, one of the topics during the LPC was 'what to QA' - it
> obviously causes QA problems if known bugs are left to sit on the
> mailing list for a month :(

A few things:

1) It wasn't a month
2) I was out on well known, pre-announced PTO
3) I've got it now

I can't do much else about it.

> These are exactly the things that need to get to Linus faster to get
> people on board the QA upstream train..
>
> Jason

--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
* Re: crash in 4.14-rc1 with IPoIB
From: Jason Gunthorpe @ 2017-09-22 19:48 UTC
To: Doug Ledford
Cc: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, Sep 22, 2017 at 01:27:52PM -0400, Doug Ledford wrote:
> On Wed, 2017-09-20 at 10:32 -0600, Jason Gunthorpe wrote:
> > On Wed, Sep 20, 2017 at 11:53:39AM +0200, Johannes Thumshirn wrote:
> > > I wanted to try out Christoph's NVMe multipathing patchset on my
> > > NVMe OmniPath setup and merged it into 4.14-rc1. On bootup I
> > > stumbled upon that splat and no RDMA operation was possible:
> >
> > I think this was already found and fixed a month ago?? The oops is
> > the same:
> >
> > https://patchwork.kernel.org/patch/9932505/
> >
> > Doug, one of the topics during the LPC was 'what to QA' - it
> > obviously causes QA problems if known bugs are left to sit on the
> > mailing list for a month :(
>
> A few things:
>
> 1) It wasn't a month
> 2) I was out on well known, pre-announced PTO
> 3) I've got it now
>
> I can't do much else about it.

Just so we are talking about the same expectation..

The patch v1 was on Aug 30, you accepted it on Sep 20, as I write this
I don't see it on your k.o. I see it in your github tree, so I know it
is on the way.

If you push it to k.o at EOD today it will be ~27 days before it gets
into the hands of anyone doing QA based on your k.o tree.

If you send a PR on Monday it will be > 28 days before it gets into
the hands of anyone doing QA from Linus's tree.

I know this patch unavoidably overlaps with your PTO, but this is
still essentially an example of the topic we discussed at LPC..

As a concrete recommendation, pushing this kind of patch to your k.o
right away on the 20th and skipping the github 0day process might be
helpful..

Jason
* Re: crash in 4.14-rc1 with IPoIB
From: Leon Romanovsky @ 2017-09-22 20:43 UTC
To: Jason Gunthorpe
Cc: Doug Ledford, Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, Sep 22, 2017 at 01:48:34PM -0600, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2017 at 01:27:52PM -0400, Doug Ledford wrote:
> > On Wed, 2017-09-20 at 10:32 -0600, Jason Gunthorpe wrote:
> > > On Wed, Sep 20, 2017 at 11:53:39AM +0200, Johannes Thumshirn wrote:
> > > > I wanted to try out Christoph's NVMe multipathing patchset on my
> > > > NVMe OmniPath setup and merged it into 4.14-rc1. On bootup I
> > > > stumbled upon that splat and no RDMA operation was possible:
> > >
> > > I think this was already found and fixed a month ago?? The oops is
> > > the same:
> > >
> > > https://patchwork.kernel.org/patch/9932505/
> > >
> > > Doug, one of the topics during the LPC was 'what to QA' - it
> > > obviously causes QA problems if known bugs are left to sit on the
> > > mailing list for a month :(
> >
> > A few things:
> >
> > 1) It wasn't a month
> > 2) I was out on well known, pre-announced PTO
> > 3) I've got it now
> >
> > I can't do much else about it.
>
> Just so we are talking about the same expectation..
>
> The patch v1 was on Aug 30, you accepted it on Sep 20, as I write this
> I don't see it on your k.o. I see it in your github tree, so I know it
> is on the way.

Maybe my expectations are too high, but I don't see any difference
between 3 weeks and 4 weeks in this particular case.

>
> If you push it to k.o at EOD today it will be ~27 days before it gets
> into the hands of anyone doing QA based on your k.o tree.
>
> If you send a PR on Monday it will be > 28 days before it gets into
> the hands of anyone doing QA from Linus's tree.

Most of the time, QA doesn't run on the k.o -rc branch but directly on
Linus's tree. It is the best and the right way to test the whole kernel
and to find bugs in the coming release.

The focus is on a clean release, not on testing k.o -rc branches, and the
expectation is that -rc patches are small, localized and fix a specific
issue.

The -next branch is a completely different story.

>
> I know this patch unavoidably overlaps with your PTO, but this is
> still essentially an example of the topic we discussed at LPC..
>
> As a concrete recommendation, pushing this kind of patch to your k.o
> right away on the 20th and skipping the github 0day process might be
> helpful..
>
> Jason
* Re: crash in 4.14-rc1 with IPoIB
From: Doug Ledford @ 2017-09-22 21:06 UTC
To: Jason Gunthorpe
Cc: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, 2017-09-22 at 13:48 -0600, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2017 at 01:27:52PM -0400, Doug Ledford wrote:
> > On Wed, 2017-09-20 at 10:32 -0600, Jason Gunthorpe wrote:
> > > On Wed, Sep 20, 2017 at 11:53:39AM +0200, Johannes Thumshirn wrote:
> > > > I wanted to try out Christoph's NVMe multipathing patchset on my
> > > > NVMe OmniPath setup and merged it into 4.14-rc1. On bootup I
> > > > stumbled upon that splat and no RDMA operation was possible:
> > >
> > > I think this was already found and fixed a month ago?? The oops is
> > > the same:
> > >
> > > https://patchwork.kernel.org/patch/9932505/
> > >
> > > Doug, one of the topics during the LPC was 'what to QA' - it
> > > obviously causes QA problems if known bugs are left to sit on the
> > > mailing list for a month :(
> >
> > A few things:
> >
> > 1) It wasn't a month
> > 2) I was out on well known, pre-announced PTO
> > 3) I've got it now
> >
> > I can't do much else about it.
>
> Just so we are talking about the same expectation..
>
> The patch v1 was on Aug 30, you accepted it on Sep 20, as I write this
> I don't see it on your k.o. I see it in your github tree, so I know it
> is on the way.
>
> If you push it to k.o at EOD today it will be ~27 days before it gets
> into the hands of anyone doing QA based on your k.o tree.
>
> If you send a PR on Monday it will be > 28 days before it gets into
> the hands of anyone doing QA from Linus's tree.
>
> I know this patch unavoidably overlaps with your PTO, but this is
> still essentially an example of the topic we discussed at LPC..
>
> As a concrete recommendation, pushing this kind of patch to your k.o
> right away on the 20th and skipping the github 0day process might be
> helpful..

Sure, I get that, but I was already out on PTO on the 30th. What sucks
is that it landed right after I was out. But I plan to have the pull
request in before EOB today, so the difference between the 20th and
today is negligible. Especially since lots of people doing QA testing
prefer to take -rc tags; in that case, the difference is non-existent.
* Re: crash in 4.14-rc1 with IPoIB
From: Jason Gunthorpe @ 2017-09-22 21:17 UTC
To: Doug Ledford
Cc: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, Sep 22, 2017 at 05:06:26PM -0400, Doug Ledford wrote:

> Sure, I get that, but I was already out on PTO on the 30th. What sucks
> is that it landed right after I was out. But I plan to have the pull
> request in before EOB today, so the difference between the 20th and
> today is negligible. Especially since lots of people doing QA testing
> prefer to take -rc tags, in that case, the difference is non-existent.

My thinking was that people should test -rc, but if they have problems
they could grab your for-rc branch and check if their issue is already
fixed..

Jason
* Re: crash in 4.14-rc1 with IPoIB
From: Doug Ledford @ 2017-09-22 22:42 UTC
To: Jason Gunthorpe
Cc: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, 2017-09-22 at 15:17 -0600, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2017 at 05:06:26PM -0400, Doug Ledford wrote:
>
> > Sure, I get that, but I was already out on PTO on the 30th. What
> > sucks is that it landed right after I was out. But I plan to have
> > the pull request in before EOB today, so the difference between the
> > 20th and today is negligible. Especially since lots of people doing
> > QA testing prefer to take -rc tags, in that case, the difference is
> > non-existent.
>
> My thinking was that people should test -rc,

Great, with you here...

> but if they have problems
> they could grab your for-rc branch and check if their issue is
> already fixed..

They can do this too...

But if that still doesn't resolve their problem, a quick check of the
mailing list contents isn't out of the question either. In that case,
they would have found the solution to their problem. But, when you get
right down to it, only one person reported it in addition to the
original poster, so either other people saw the original post and
compensated in their own testing, or (the more likely I think), most
people don't start testing -rcs until after -rc2.

Which is why I try to set -rc2 as a milestone for several purposes.
For getting in the bulk of the known fixes, but also as a branching
point for for-next.
* Re: crash in 4.14-rc1 with IPoIB
From: Leon Romanovsky @ 2017-09-23 7:38 UTC
To: Doug Ledford
Cc: Jason Gunthorpe, Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, Sep 22, 2017 at 06:42:41PM -0400, Doug Ledford wrote:
> On Fri, 2017-09-22 at 15:17 -0600, Jason Gunthorpe wrote:
> > On Fri, Sep 22, 2017 at 05:06:26PM -0400, Doug Ledford wrote:
> >
> > > Sure, I get that, but I was already out on PTO on the 30th. What
> > > sucks is that it landed right after I was out. But I plan to have
> > > the pull request in before EOB today, so the difference between
> > > the 20th and today is negligible. Especially since lots of people
> > > doing QA testing prefer to take -rc tags, in that case, the
> > > difference is non-existent.
> >
> > My thinking was that people should test -rc,
>
> Great, with you here...
>
> > but if they have problems
> > they could grab your for-rc branch and check if their issue is
> > already fixed..
>
> They can do this too...
>
> But if that still doesn't resolve their problem, a quick check of the
> mailing list contents isn't out of the question either. In that case,
> they would have found the solution to their problem. But, when you get
> right down to it, only one person reported it in addition to the
> original poster, so either other people saw the original post and
> compensated in their own testing, or (the more likely I think), most
> people don't start testing -rcs until after -rc2.

I don't know about other people, but our testing of -rc starts on -rc1
and we are not waiting for -rc2. From my observation of netdev, they
also start to test -rc immediately.

Otherwise, what is the point of the week between -rc1 and -rc2?

> Which is why I try to set -rc2 as a milestone for several purposes.
> For getting in the bulk of the known fixes, but also as a branching
> point for for-next.
>
> --
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
>
* RE: crash in 4.14-rc1 with IPoIB [not found] ` <20170923073843.GX5788-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org> @ 2017-09-23 16:17 ` Estrin, Alex [not found] ` <F3529576D8E232409F431C309E29399336CD972A-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Estrin, Alex @ 2017-09-23 16:17 UTC (permalink / raw) To: Leon Romanovsky, Doug Ledford Cc: Jason Gunthorpe, Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg Hello, One minor note regarding the original commit 523633359224 that broke the core. It seems it was let through without trivial validation, otherwise it wouldn't pass the checkpatch. Thanks, Alex. > On Fri, Sep 22, 2017 at 06:42:41PM -0400, Doug Ledford wrote: > > On Fri, 2017-09-22 at 15:17 -0600, Jason Gunthorpe wrote: > > > On Fri, Sep 22, 2017 at 05:06:26PM -0400, Doug Ledford wrote: > > > > > > > Sure, I get that, but I was already out on PTO on the 30th. What > > > > sucks > > > > is that it landed right after I was out. But I plan to have the > > > > pull > > > > request in before EOB today, so the difference between the 20th and > > > > today is neglible. Especially since lots of people doing QA > > > > testing > > > > prefer to take -rc tags, in that case, the difference is non- > > > > existent. > > > > > > My thinking was that people should test -rc, > > > > Great, with you here... > > > > > but if they have problems > > > they could grab your for-rc branch and check if their issue is > > > already > > > fixed.. > > > > They can do this too... > > > > But if that still doesn't resolve their problem, a quick check of the > > mailing list contents isn't out of the question either. In that case, > > they would have found the solution to their problem.
But, when you get > > right down to it, only one person reported it in addition to the > > original poster, so either other people saw the original post and > > compensated in their own testing, or (the more likely I think), most > > people don't start testing -rcs until after -rc2. > > I don't know about other people, but our testing of -rc starts on -rc1 > and we are not waiting for -rc2. From my observe of netdev, they also > start to test -rc immediately. > > Otherwise, what is the point of the week between -rc1 and -rc2? > > > Which is why I try to set -rc2 as a milestone for several purposes. > > For getting in the bulk of the known fixes, but also as a branching > > point for for-next. > > > > -- > > Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > GPG KeyID: B826A3330E572FDD > > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: crash in 4.14-rc1 with IPoIB [not found] ` <F3529576D8E232409F431C309E29399336CD972A-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2017-09-23 17:29 ` Leon Romanovsky [not found] ` <20170923172935.GZ5788-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Leon Romanovsky @ 2017-09-23 17:29 UTC (permalink / raw) To: Estrin, Alex Cc: Doug Ledford, Jason Gunthorpe, Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg [-- Attachment #1: Type: text/plain, Size: 3737 bytes --] On Sat, Sep 23, 2017 at 04:17:10PM +0000, Estrin, Alex wrote: > Hello, > > One minor note regarding the original commit 523633359224 > that broke the core. > It seem it was let through without trivial validation, > otherwise it wouldn't pass the checkpatch. Can you be more specific? Are you referring to "WARNING: line over 80 characters" or to something else? If yes, I feel really bad for you and your workplace. Readability is a first priority for the submitted code. ➜ linux-rdma git:(rdma-rc) git fp -1 523633359224 -o /tmp/ /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch ➜ linux-rdma git:(rdma-rc) ./scripts/checkpatch.pl --strict /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch WARNING: line over 80 characters #45: FILE: drivers/infiniband/core/verbs.c:1584: + if (qp->device->get_link_layer(qp->device, attr.port_num) != total: 0 errors, 1 warnings, 0 checks, 62 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. > > Thanks, > Alex. 
> > > On Fri, Sep 22, 2017 at 06:42:41PM -0400, Doug Ledford wrote: > > > On Fri, 2017-09-22 at 15:17 -0600, Jason Gunthorpe wrote: > > > > On Fri, Sep 22, 2017 at 05:06:26PM -0400, Doug Ledford wrote: > > > > > > > > > Sure, I get that, but I was already out on PTO on the 30th. What > > > > > sucks > > > > > is that it landed right after I was out. But I plan to have the > > > > > pull > > > > > request in before EOB today, so the difference between the 20th and > > > > > today is neglible. Especially since lots of people doing QA > > > > > testing > > > > > prefer to take -rc tags, in that case, the difference is non- > > > > > existent. > > > > > > > > My thinking was that people should test -rc, > > > > > > Great, with you here... > > > > > > > but if they have problems > > > > they could grab your for-rc branch and check if their issue is > > > > already > > > > fixed.. > > > > > > They can do this too... > > > > > > But if that still doesn't resolve their problem, a quick check of the > > > mailing list contents isn't out of the question either. In that case, > > > they would have found the solution to their problem. But, when you get > > > right down to it, only one person reported it in addition to the > > > original poster, so either other people saw the original post and > > > compensated in their own testing, or (the more likely I think), most > > > people don't start testing -rcs until after -rc2. > > > > I don't know about other people, but our testing of -rc starts on -rc1 > > and we are not waiting for -rc2. From my observe of netdev, they also > > start to test -rc immediately. > > > > Otherwise, what is the point of the week between -rc1 and -rc2? > > > > > Which is why I try to set -rc2 as a milestone for several purposes. > > > For getting in the bulk of the known fixes, but also as a branching > > > point for for-next. 
> > > > > > -- > > > Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > > GPG KeyID: B826A3330E572FDD > > > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: crash in 4.14-rc1 with IPoIB [not found] ` <20170923172935.GZ5788-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org> @ 2017-09-23 19:20 ` Estrin, Alex [not found] ` <F3529576D8E232409F431C309E29399336CD9762-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Estrin, Alex @ 2017-09-23 19:20 UTC (permalink / raw) To: Leon Romanovsky Cc: Doug Ledford, Jason Gunthorpe, Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg > > Hello, > > > > One minor note regarding the original commit 523633359224 > > that broke the core. > > It seem it was let through without trivial validation, > > otherwise it wouldn't pass the checkpatch. > > Can you be more specific? Are you referring to "WARNING: line over 80 > characters" or to something else? If yes, I feel really bad for you and > your workplace. Please don't be. Keep doing a great job at your workplace, I will do the same at mine. > Readability is a first priority for the submitted code. I can agree with you on that, provided that easily readable submitted code does not introduce trivial bugs. > ➜ linux-rdma git:(rdma-rc) git fp -1 523633359224 -o /tmp/ > /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch > ➜ linux-rdma git:(rdma-rc) ./scripts/checkpatch.pl --strict /tmp/0001-IB-core-Fix- > the-validations-of-a-multicast-LID-in-at.patch > WARNING: line over 80 characters > #45: FILE: drivers/infiniband/core/verbs.c:1584: > + if (qp->device->get_link_layer(qp->device, attr.port_num) != > > total: 0 errors, 1 warnings, 0 checks, 62 lines checked > > NOTE: For some of the reported defects, checkpatch may be able to > mechanically convert to the typical style using --fix or --fix-inplace. > > /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch has style > problems, please review.
> > NOTE: If any of the errors are false positives, please report > them to the maintainer, see CHECKPATCH in MAINTAINERS. > > > > > > Thanks, > > Alex. > > > > > On Fri, Sep 22, 2017 at 06:42:41PM -0400, Doug Ledford wrote: > > > > On Fri, 2017-09-22 at 15:17 -0600, Jason Gunthorpe wrote: > > > > > On Fri, Sep 22, 2017 at 05:06:26PM -0400, Doug Ledford wrote: > > > > > > > > > > > Sure, I get that, but I was already out on PTO on the 30th. What > > > > > > sucks > > > > > > is that it landed right after I was out. But I plan to have the > > > > > > pull > > > > > > request in before EOB today, so the difference between the 20th and > > > > > > today is neglible. Especially since lots of people doing QA > > > > > > testing > > > > > > prefer to take -rc tags, in that case, the difference is non- > > > > > > existent. > > > > > > > > > > My thinking was that people should test -rc, > > > > > > > > Great, with you here... > > > > > > > > > but if they have problems > > > > > they could grab your for-rc branch and check if their issue is > > > > > already > > > > > fixed.. > > > > > > > > They can do this too... > > > > > > > > But if that still doesn't resolve their problem, a quick check of the > > > > mailing list contents isn't out of the question either. In that case, > > > > they would have found the solution to their problem. But, when you get > > > > right down to it, only one person reported it in addition to the > > > > original poster, so either other people saw the original post and > > > > compensated in their own testing, or (the more likely I think), most > > > > people don't start testing -rcs until after -rc2. > > > > > > I don't know about other people, but our testing of -rc starts on -rc1 > > > and we are not waiting for -rc2. From my observe of netdev, they also > > > start to test -rc immediately. > > > > > > Otherwise, what is the point of the week between -rc1 and -rc2? 
> > > > > > > Which is why I try to set -rc2 as a milestone for several purposes. > > > > For getting in the bulk of the known fixes, but also as a branching > > > > point for for-next. > > > > > > > > -- > > > > Doug Ledford <dledford@redhat.com> > > > > GPG KeyID: B826A3330E572FDD > > > > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 > 2FDD > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: crash in 4.14-rc1 with IPoIB [not found] ` <F3529576D8E232409F431C309E29399336CD9762-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2017-09-24 4:00 ` Leon Romanovsky [not found] ` <20170924040012.GA21110-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Leon Romanovsky @ 2017-09-24 4:00 UTC (permalink / raw) To: Estrin, Alex Cc: Doug Ledford, Jason Gunthorpe, Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg [-- Attachment #1: Type: text/plain, Size: 4889 bytes --] On Sat, Sep 23, 2017 at 07:20:53PM +0000, Estrin, Alex wrote: > > > Hello, > > > > > > One minor note regarding the original commit 523633359224 > > > that broke the core. > > > It seem it was let through without trivial validation, > > > otherwise it wouldn't pass the checkpatch. > > > > Can you be more specific? Are you referring to "WARNING: line over 80 > > characters" or to something else? If yes, I feel really bad for you and > > your workplace. > Please don't be. Keep doing a great job at your workplace, I will do the same at mine. > > > Readability is a first priority for the submitted code. > I can agree with you on that, considering easy readable submitted code > does not introduce a trivial bugs. It would be very helpful to everyone if you stopped making claims without any actual support. 1. Doug allows enough time to respond to patches, and neither you nor your colleagues spotted this "trivial bug" back then. 2. It fixed another "trivial bug" introduced by your colleague, which broke RoCE (one of the most popular fabrics in the stack), and we didn't cry over the internet about it. Before you rush to reply to me, please consult with Denny; he can give you a short update on how badly the recent OPA changes in AH and LIDs broke the stack and RoCE/IB devices.
> > > ➜ linux-rdma git:(rdma-rc) git fp -1 523633359224 -o /tmp/ > > /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch > > ➜ linux-rdma git:(rdma-rc) ./scripts/checkpatch.pl --strict /tmp/0001-IB-core-Fix- > > the-validations-of-a-multicast-LID-in-at.patch > > WARNING: line over 80 characters > > #45: FILE: drivers/infiniband/core/verbs.c:1584: > > + if (qp->device->get_link_layer(qp->device, attr.port_num) != > > > > total: 0 errors, 1 warnings, 0 checks, 62 lines checked > > > > NOTE: For some of the reported defects, checkpatch may be able to > > mechanically convert to the typical style using --fix or --fix-inplace. > > > > /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch has style > > problems, please review. > > > > NOTE: If any of the errors are false positives, please report > > them to the maintainer, see CHECKPATCH in MAINTAINERS. > > > > > > > > > > Thanks, > > > Alex. > > > > > > > On Fri, Sep 22, 2017 at 06:42:41PM -0400, Doug Ledford wrote: > > > > > On Fri, 2017-09-22 at 15:17 -0600, Jason Gunthorpe wrote: > > > > > > On Fri, Sep 22, 2017 at 05:06:26PM -0400, Doug Ledford wrote: > > > > > > > > > > > > > Sure, I get that, but I was already out on PTO on the 30th. What > > > > > > > sucks > > > > > > > is that it landed right after I was out. But I plan to have the > > > > > > > pull > > > > > > > request in before EOB today, so the difference between the 20th and > > > > > > > today is neglible. Especially since lots of people doing QA > > > > > > > testing > > > > > > > prefer to take -rc tags, in that case, the difference is non- > > > > > > > existent. > > > > > > > > > > > > My thinking was that people should test -rc, > > > > > > > > > > Great, with you here... > > > > > > > > > > > but if they have problems > > > > > > they could grab your for-rc branch and check if their issue is > > > > > > already > > > > > > fixed.. > > > > > > > > > > They can do this too... 
> > > > > > > > > > But if that still doesn't resolve their problem, a quick check of the > > > > > mailing list contents isn't out of the question either. In that case, > > > > > they would have found the solution to their problem. But, when you get > > > > > right down to it, only one person reported it in addition to the > > > > > original poster, so either other people saw the original post and > > > > > compensated in their own testing, or (the more likely I think), most > > > > > people don't start testing -rcs until after -rc2. > > > > > > > > I don't know about other people, but our testing of -rc starts on -rc1 > > > > and we are not waiting for -rc2. From my observe of netdev, they also > > > > start to test -rc immediately. > > > > > > > > Otherwise, what is the point of the week between -rc1 and -rc2? > > > > > > > > > Which is why I try to set -rc2 as a milestone for several purposes. > > > > > For getting in the bulk of the known fixes, but also as a branching > > > > > point for for-next. > > > > > > > > > > -- > > > > > Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > > > > GPG KeyID: B826A3330E572FDD > > > > > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 > > 2FDD > > > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: crash in 4.14-rc1 with IPoIB
  [not found] ` <20170924040012.GA21110-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-09-24 5:59 ` Sagi Grimberg
  0 siblings, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2017-09-24 5:59 UTC (permalink / raw)
To: Leon Romanovsky, Estrin, Alex
Cc: Doug Ledford, Jason Gunthorpe, Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig

Guys,

> It will be very helpful to everyone if you stop to throw claims without any actual support.
> 1. Doug allows enough time to respond on the patches and neither you and neither your
> colleagues didn't see such "trivial bug" back then.
> 2. It fixed another "trivial bug" introduced by your colleague which
> broke RoCE (one of the most popular fabric in the stack) and we didn't
> cry other the internet about it.

Please remove individual CC's from this correspondence. Also, please change the subject to something more suitable to the direction this discussion has taken.
* Re: crash in 4.14-rc1 with IPoIB
  [not found] ` <1506120161.120853.10.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-09-23 7:38 ` Leon Romanovsky
@ 2017-09-24 20:30 ` Jason Gunthorpe
  1 sibling, 0 replies; 19+ messages in thread
From: Jason Gunthorpe @ 2017-09-24 20:30 UTC (permalink / raw)
To: Doug Ledford
Cc: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, Sep 22, 2017 at 06:42:41PM -0400, Doug Ledford wrote:

> But if that still doesn't resolve their problem, a quick check of the
> mailing list contents isn't out of the question either. In that case,
> they would have found the solution to their problem. But, when you get
> right down to it, only one person reported it in addition to the

Well, this has happened twice in recent memory that several people came to the list hitting something that already had a posted fix. The port-number issue had more comments, I think we were up to 4? IIRC Laurance spent a long time bisecting it even.. So I'm not sure 'check the list' is working. I view it as a really good sign, it means that people are finally testing upstream, not just waiting for OFED to test! Based on the LPC comments the people doing QA can do a better job with some process help.

> compensated in their own testing, or (the more likely I think), most
> people don't start testing -rcs until after -rc2. Which is why I try
> to set -rc2 as a milestone for several purposes. For getting in the

Hrm, people need to test rc1 :|

Jason
Thread overview: 19+ messages
2017-09-20  9:53 crash in 4.14-rc1 with IPoIB Johannes Thumshirn
2017-09-20 10:37 ` Sagi Grimberg
2017-09-20 10:57 ` Johannes Thumshirn
2017-09-20 11:35 ` Hal Rosenstock
2017-09-20 11:51 ` Sagi Grimberg
2017-09-20 16:32 ` Jason Gunthorpe
2017-09-22 17:27 ` Doug Ledford
2017-09-22 19:48 ` Jason Gunthorpe
2017-09-22 20:43 ` Leon Romanovsky
2017-09-22 21:06 ` Doug Ledford
2017-09-22 21:17 ` Jason Gunthorpe
2017-09-22 22:42 ` Doug Ledford
2017-09-23  7:38 ` Leon Romanovsky
2017-09-23 16:17 ` Estrin, Alex
2017-09-23 17:29 ` Leon Romanovsky
2017-09-23 19:20 ` Estrin, Alex
2017-09-24  4:00 ` Leon Romanovsky
2017-09-24  5:59 ` Sagi Grimberg
2017-09-24 20:30 ` Jason Gunthorpe