* crash in 4.14-rc1 with IPoIB
From: Johannes Thumshirn @ 2017-09-20  9:53 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg,
	dledford-H+wXaHxf7aLQT0dZR+AlfA

Hi folks,

I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
RDMA operation was possible:


hfi1 0000:ff:00.0: hfi1_1: send_idle_message: sending idle message 0x203
hfi1 0000:ff:00.0: hfi1_1: Switching to NO_DMA_RTAIL
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP:           (null)
PGD 0 P4D 0
Oops: 0010 [#1] SMP
Modules linked in: iptable_filter(E) af_packet(E) xt_nat(E) xt_tcpudp(E) iscsi_ibft(E) iscsi_boot_sysfs(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) libcrc32c(E) ip_tables(E) x_tables(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) ib_iser(E) libiscsi(E) scsi_transport_iscsi(E) ib_srpt(E) target_core_mod(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ib_srp(E) scsi_transport_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) configfs(E) ib_cm(E) iw_cm(E) mlx5_ib(E) intel_rapl(E) sha512_ssse3(E) skx_edac(E) sha512_generic(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) ipmi_ssif(E) pcbc(E) aesni_intel(E) mlx5_core(E) qat_c62x(E) aes_x86_64(E) intel_qat(E) mlxfw(E) joydev(E) hfi1(E) i40e(E) crypto_simd(E) devlink(E) rdmavt(E) ipmi_si(E) ptp(E) iTCO_wdt(E) dh_generic(E) glue_helper(E) iTCO_vendor_support(E) authenc(E) ib_core(E) pps_core(E) ipmi_devintf(E) mei_me(E) ioatdma(E) cryptd(E) lpc_ich(E) pcspkr(E) mfd_core(E) i2c_i801(E) shpchp(E) mei(E) dca(E) ipmi_msghandler(E) tpm_crb(E) nfit(E) libnvdimm(E) acpi_pad(E) sunrpc(E) btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) hid_generic(E) usbhid(E) raid6_pq(E) sd_mod(E) sr_mod(E) cdrom(E) crc32c_intel(E) ast(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) xhci_pci(E) ahci(E) xhci_hcd(E) libahci(E) drm(E) usbcore(E) libata(E) wmi(E) button(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E)
CPU: 20 PID: 950 Comm: kworker/20:1H Tainted: G            E   4.14.0-rc1-6.3-default-nvme-mpath #773
 Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0412.020920172159 02/09/2017
 Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
 task: ffff882fce3f4b00 task.stack: ffffc9002422c000
 RIP: 0010:          (null)
 RSP: 0018:ffffc9002422f990 EFLAGS: 00010206
 RAX: ffff882fd0078000 RBX: ffff882fa0263000 RCX: ffffc9002422f998
 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff882fd0078000
 RBP: ffffc9002422fad0 R08: 0000000000000000 R09: ffff882fa0263080
 R10: ffffffffa0964ca0 R11: 0000000000000000 R12: ffff8817dcea3700
 R13: ffff882fa0263000 R14: 000000000000c000 R15: 000000000000c000
 FS:  0000000000000000(0000) GS:ffff882fdd000000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000017db346004 CR4: 00000000007606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  ? is_valid_mcast_lid.isra.23+0xfb/0x110 [ib_core]
  ib_attach_mcast+0x6f/0xa0 [ib_core]
  ipoib_mcast_attach+0x72/0x160 [ib_ipoib]
  ipoib_mcast_join_complete+0x354/0xb40 [ib_ipoib]
  mcast_work_handler+0x2ff/0x630 [ib_core]
  join_handler+0xf0/0x1e0 [ib_core]
  ib_sa_mcmember_rec_callback+0x54/0x80 [ib_core]
  recv_handler+0x3a/0x60 [ib_core]
  ib_mad_recv_done+0x43d/0xa20 [ib_core]
  __ib_process_cq+0x5d/0xb0 [ib_core]
  ib_cq_poll_work+0x20/0x60 [ib_core]
  process_one_work+0x138/0x370
  worker_thread+0x4d/0x3b0
  kthread+0x109/0x140
  ? rescuer_thread+0x320/0x320
  ? kthread_park+0x60/0x60
  ret_from_fork+0x25/0x30
 Code:  Bad RIP value.
 RIP:           (null) RSP: ffffc9002422f990
 CR2: 0000000000000000
 ---[ end trace f3c2d0cdf0ebfb9c ]---

is_valid_mcast_lid.isra.23+0xfb/0x110

(gdb) l *(is_valid_mcast_lid+0xfb)
0x229b is in is_valid_mcast_lid (drivers/infiniband/core/verbs.c:1649).
1644		/* If QP state >= init, it is assigned to a port and we can check this
1645		 * port only.
1646		 */
1647		if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
1648			if (attr.qp_state >= IB_QPS_INIT) {
1649				if (qp->device->get_link_layer(qp->device, attr.port_num) !=
1650				    IB_LINK_LAYER_INFINIBAND)
1651					return true;
1652				goto lid_check;
1653			}
(gdb) 
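The listing shows the two branches of this check for a QP that is already bound to a port: a non-InfiniBand link layer returns true immediately, and an InfiniBand link layer jumps to `lid_check`, which the listing does not show. The userspace sketch below models only that decision. It assumes the standard InfiniBand LID layout (0xC000 to 0xFFFE are multicast LIDs, 0xFFFF is the permissive LID); the function and macro names are hypothetical and are not ib_core's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed InfiniBand LID layout; names are hypothetical. */
#define SKETCH_MCAST_LID_BASE 0xC000u /* first multicast LID */
#define SKETCH_LID_PERMISSIVE 0xFFFFu /* permissive LID, never a group LID */

/*
 * Models the decision in the listing: a non-IB link layer (e.g. RoCE)
 * accepts any LID, while an IB link layer falls through to the range
 * check that "goto lid_check" jumps to.
 */
static bool sketch_mcast_lid_ok(bool link_is_infiniband, uint16_t lid)
{
	if (!link_is_infiniband)
		return true;
	return lid >= SKETCH_MCAST_LID_BASE && lid != SKETCH_LID_PERMISSIVE;
}
```

In the oops above, execution appears never to have reached either branch: the return address maps to line 1649, the indirect call that decides which branch to take.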

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn-l3A5Bk7waGM@public.gmane.org                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: crash in 4.14-rc1 with IPoIB
From: Sagi Grimberg @ 2017-09-20 10:37 UTC
  To: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig,
	dledford-H+wXaHxf7aLQT0dZR+AlfA

> Hi folks,
> 
> I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
> setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
> RDMA operation was possible:

...

> is_valid_mcast_lid.isra.23+0xfb/0x110
> 
> (gdb) l *(is_valid_mcast_lid+0xfb)
> 0x229b is in is_valid_mcast_lid (drivers/infiniband/core/verbs.c:1649).
> 1644		/* If QP state >= init, it is assigned to a port and we can check this
> 1645		 * port only.
> 1646		 */
> 1647		if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
> 1648			if (attr.qp_state >= IB_QPS_INIT) {
> 1649				if (qp->device->get_link_layer(qp->device, attr.port_num) !=
> 1650				    IB_LINK_LAYER_INFINIBAND)
> 1651					return true;
> 1652				goto lid_check;
> 1653			}
> (gdb)

Why doesn't ipoib use the generic rdma_port_get_link_layer?

Does this help?
--
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index ee9e27dc799b..f2c70afea238 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1646,7 +1646,7 @@ static bool is_valid_mcast_lid(struct ib_qp *qp, u16 lid)
          */
         if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
                 if (attr.qp_state >= IB_QPS_INIT) {
-                       if (qp->device->get_link_layer(qp->device, attr.port_num) !=
+                       if (rdma_port_get_link_layer(qp->device, attr.port_num) !=
                             IB_LINK_LAYER_INFINIBAND)
                                 return true;
                         goto lid_check;
--
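The patch replaces the raw indirect call with ib_core's rdma_port_get_link_layer() helper. As far as I understand that helper, it calls the driver's get_link_layer callback only when one is provided, and otherwise infers the link layer from the device's transport type, so it should not crash on a driver such as hfi1 that appears to leave the callback unset. The userspace sketch below illustrates that guard-then-fallback pattern; the struct, enums and function names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's link-layer and transport enums. */
enum sketch_link_layer { LL_UNSPECIFIED, LL_INFINIBAND, LL_ETHERNET };
enum sketch_transport { TRANSPORT_IB, TRANSPORT_IWARP, TRANSPORT_OTHER };

struct sketch_device {
	enum sketch_transport transport;
	/* Optional driver callback; may be NULL. */
	enum sketch_link_layer (*get_link_layer)(struct sketch_device *dev,
						 unsigned int port);
};

/*
 * Guarded lookup: call the driver only if it implements the callback,
 * otherwise derive a default from the transport type. Calling
 * dev->get_link_layer directly while it is NULL would jump to address 0,
 * which is consistent with the "RIP: (null)" in the oops.
 */
static enum sketch_link_layer
sketch_port_get_link_layer(struct sketch_device *dev, unsigned int port)
{
	assert(dev != NULL);
	if (dev->get_link_layer)
		return dev->get_link_layer(dev, port);

	switch (dev->transport) {
	case TRANSPORT_IB:
		return LL_INFINIBAND;
	case TRANSPORT_IWARP:
		return LL_ETHERNET;
	default:
		return LL_UNSPECIFIED;
	}
}

/* Example driver callback reporting Ethernet on every port (RoCE-like). */
static enum sketch_link_layer
sketch_always_ethernet(struct sketch_device *dev, unsigned int port)
{
	(void)dev;
	(void)port;
	return LL_ETHERNET;
}
```

Putting the NULL check in one shared helper, rather than at each call site, means a driver that omits the optional callback cannot crash any caller that goes through the helper.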

* Re: crash in 4.14-rc1 with IPoIB
From: Johannes Thumshirn @ 2017-09-20 10:57 UTC
  To: Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A,
	Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig,
	dledford-H+wXaHxf7aLQT0dZR+AlfA

On Wed, Sep 20, 2017 at 01:37:28PM +0300, Sagi Grimberg wrote:
> --
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index ee9e27dc799b..f2c70afea238 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1646,7 +1646,7 @@ static bool is_valid_mcast_lid(struct ib_qp *qp, u16 lid)
>          */
>         if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
>                 if (attr.qp_state >= IB_QPS_INIT) {
> -                       if (qp->device->get_link_layer(qp->device, attr.port_num) !=
> +                       if (rdma_port_get_link_layer(qp->device, attr.port_num) !=
>                             IB_LINK_LAYER_INFINIBAND)
>                                 return true;
>                         goto lid_check;
> --

w00000! You're my hero.

Tested-by: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>

* Re: crash in 4.14-rc1 with IPoIB
From: Hal Rosenstock @ 2017-09-20 11:35 UTC
  To: Sagi Grimberg, Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig,
	dledford-H+wXaHxf7aLQT0dZR+AlfA

On 9/20/2017 6:37 AM, Sagi Grimberg wrote:
>> Hi folks,
>>
>> I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
>> setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
>> RDMA operation was possible:
> 
> ...
> 
>> is_valid_mcast_lid.isra.23+0xfb/0x110
>>
>> (gdb) l *(is_valid_mcast_lid+0xfb)
>> 0x229b is in is_valid_mcast_lid (drivers/infiniband/core/verbs.c:1649).
>> 1644        /* If QP state >= init, it is assigned to a port and we can check this
>> 1645         * port only.
>> 1646         */
>> 1647        if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
>> 1648            if (attr.qp_state >= IB_QPS_INIT) {
>> 1649                if (qp->device->get_link_layer(qp->device, attr.port_num) !=
>> 1650                    IB_LINK_LAYER_INFINIBAND)
>> 1651                    return true;
>> 1652                goto lid_check;
>> 1653            }
>> (gdb)
> 
> Why doesn't ipoib use the generic rdma_port_get_link_layer?
> 
> Does this help?
> -- 
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index ee9e27dc799b..f2c70afea238 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1646,7 +1646,7 @@ static bool is_valid_mcast_lid(struct ib_qp *qp, u16 lid)
>          */
>         if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
>                 if (attr.qp_state >= IB_QPS_INIT) {
> -                       if (qp->device->get_link_layer(qp->device, attr.port_num) !=
> +                       if (rdma_port_get_link_layer(qp->device, attr.port_num) !=
>                             IB_LINK_LAYER_INFINIBAND)
>                                 return true;
>                         goto lid_check;

There's another occurrence of qp->device->get_link_layer in that routine
just below this. Shouldn't that be replaced by rdma_port_get_link_layer
too ?
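Hal's point is that a second raw get_link_layer call further down in the same routine would crash the same way on a device without the callback. The sketch below assumes, purely for illustration, that such a call sits in a loop counting non-InfiniBand ports, and shows that the loop needs the same NULL guard. The types and names are hypothetical, and the sketch assumes ports are numbered from 1 up to the port count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical link-layer values for this sketch. */
enum sketch_port_ll { PORT_LL_IB, PORT_LL_ETH };

struct sketch_multiport_device {
	unsigned int port_cnt; /* ports assumed numbered 1..port_cnt */
	/* Optional driver callback; may be NULL. */
	enum sketch_port_ll (*get_link_layer)(const struct sketch_multiport_device *dev,
					      unsigned int port);
};

/* Guarded lookup; a missing callback is treated as native InfiniBand. */
static enum sketch_port_ll
sketch_link_layer_of(const struct sketch_multiport_device *dev, unsigned int port)
{
	return dev->get_link_layer ? dev->get_link_layer(dev, port) : PORT_LL_IB;
}

/* Counts non-IB ports using only the guarded lookup, so a NULL
 * callback is never called. */
static unsigned int
sketch_count_non_ib_ports(const struct sketch_multiport_device *dev)
{
	unsigned int port, count = 0;

	assert(dev != NULL);
	for (port = 1; port <= dev->port_cnt; port++)
		if (sketch_link_layer_of(dev, port) != PORT_LL_IB)
			count++;
	return count;
}

/* Example callback: port 2 is Ethernet, every other port is InfiniBand. */
static enum sketch_port_ll
sketch_port2_ethernet(const struct sketch_multiport_device *dev, unsigned int port)
{
	(void)dev;
	return port == 2 ? PORT_LL_ETH : PORT_LL_IB;
}
```

Fixing only the first call site would move the crash rather than remove it, which is why both occurrences need the guarded form.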

-- Hal

* Re: crash in 4.14-rc1 with IPoIB
From: Sagi Grimberg @ 2017-09-20 11:51 UTC
  To: Hal Rosenstock, Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig,
	dledford-H+wXaHxf7aLQT0dZR+AlfA

Hey Hal! :)

> There's another occurrence of qp->device->get_link_layer in that routine
> just below this. Shouldn't that be replaced by rdma_port_get_link_layer
> too ?

You're absolutely correct!

Sending a formal patch now.

* Re: crash in 4.14-rc1 with IPoIB
From: Jason Gunthorpe @ 2017-09-20 16:32 UTC
  To: Johannes Thumshirn
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A,
	Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig,
	Sagi Grimberg, dledford-H+wXaHxf7aLQT0dZR+AlfA

On Wed, Sep 20, 2017 at 11:53:39AM +0200, Johannes Thumshirn wrote:
> I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
> setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
> RDMA operation was possible:

I think this was already found and fixed a month ago?? The oops is the same:

https://patchwork.kernel.org/patch/9932505/

Doug, one of the topics during the LPC was 'what to QA' - it
obviously causes QA problems if known bugs are left to sit on the
mailing list for a month :(

These are exactly the things that need to get to Linus faster to get
people on board the QA upstream train.

Jason

* Re: crash in 4.14-rc1 with IPoIB
From: Doug Ledford @ 2017-09-22 17:27 UTC
  To: Jason Gunthorpe, Johannes Thumshirn
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, leon-DgEjT+Ai2ygdnm+yROfE0A,
	Thomas Bogendoerfer, Bart Van Assche, Christoph Hellwig,
	Sagi Grimberg

On Wed, 2017-09-20 at 10:32 -0600, Jason Gunthorpe wrote:
> On Wed, Sep 20, 2017 at 11:53:39AM +0200, Johannes Thumshirn wrote:
> > I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
> > setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
> > RDMA operation was possible:
> 
> I think this was already found and fixed a month ago?? The oops is the same:
> 
> https://patchwork.kernel.org/patch/9932505/
> 
> Doug, one of the topics during the LPC was 'what to QA' - it
> obviously causes QA problems if known bugs are left to sit on the
> mailing list for a month :(

A few things:

1)  It wasn't a month
2)  I was out on well known, pre-announced PTO
3)  I've got it now

I can't do much else about it.

> These are exactly the things that need to get to Linus faster to get
> people on board the QA upstream train.
> 
> Jason
-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

* Re: crash in 4.14-rc1 with IPoIB
From: Jason Gunthorpe @ 2017-09-22 19:48 UTC
  To: Doug Ledford
  Cc: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, Sep 22, 2017 at 01:27:52PM -0400, Doug Ledford wrote:
> On Wed, 2017-09-20 at 10:32 -0600, Jason Gunthorpe wrote:
> > On Wed, Sep 20, 2017 at 11:53:39AM +0200, Johannes Thumshirn wrote:
> > > I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
> > > setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
> > > RDMA operation was possible:
> > 
> > I think this was already found and fixed a month ago?? The oops is the same:
> > 
> > https://patchwork.kernel.org/patch/9932505/
> > 
> > Doug, one of the topics during the LPC was 'what to QA' - it
> > obviously causes QA problems if known bugs are left to sit on the
> > mailing list for a month :(
> 
> A few things:
> 
> 1)  It wasn't a month
> 2)  I was out on well known, pre-announced PTO
> 3)  I've got it now
> 
> I can't do much else about it.

Just so we are talking about the same expectation..

Patch v1 was posted on Aug 30 and you accepted it on Sep 20. As I write
this I don't see it on your k.o tree; I see it in your github tree, so
I know it is on the way.

If you push it to k.o at EOD today it will be ~27 days before it gets
into the hands of anyone doing QA based on your k.o tree.

If you send a PR on Monday it will be > 28 days before it gets into
the hands of anyone doing QA from Linus's tree.

I know this patch unavoidably overlaps with your PTO, but this is
still essentially an example of the topic we discussed at LPC..

As a concrete recommendation, pushing this kind of patch to your k.o
right away on the 20th and skipping the github 0day process might be
helpful..

Jason

* Re: crash in 4.14-rc1 with IPoIB
From: Leon Romanovsky @ 2017-09-22 20:43 UTC
  To: Jason Gunthorpe
  Cc: Doug Ledford, Johannes Thumshirn,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, Sep 22, 2017 at 01:48:34PM -0600, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2017 at 01:27:52PM -0400, Doug Ledford wrote:
> > On Wed, 2017-09-20 at 10:32 -0600, Jason Gunthorpe wrote:
> > > On Wed, Sep 20, 2017 at 11:53:39AM +0200, Johannes Thumshirn wrote:
> > > > I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
> > > > setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
> > > > RDMA operation was possible:
> > >
> > > I think this was already found and fixed a month ago?? The oops is the same:
> > >
> > > https://patchwork.kernel.org/patch/9932505/
> > >
> > > Doug, one of the topics during the LPC was 'what to QA' - it
> > > obviously causes QA problems if known bugs are left to sit on the
> > > mailing list for a month :(
> >
> > A few things:
> >
> > 1)  It wasn't a month
> > 2)  I was out on well known, pre-announced PTO
> > 3)  I've got it now
> >
> > I can't do much else about it.
>
> Just so we are talking about the same expectation..
>
> Patch v1 was posted on Aug 30 and you accepted it on Sep 20. As I write
> this I don't see it on your k.o tree; I see it in your github tree, so
> I know it is on the way.

Maybe my expectations are too high, but I don't see any difference
between 3 weeks and 4 weeks in this particular case.

>
> If you push it to k.o at EOD today it will be ~27 days before it gets
> into the hands of anyone doing QA based on your k.o tree.
>
> If you send a PR on Monday it will be > 28 days before it gets into
> the hands of anyone doing QA from Linus's tree.

Most of the time, QA doesn't run on the k.o -rc branch but directly
on Linus's tree. That is the best and the right way to test the whole
kernel and to find bugs in the coming release. The focus is on a clean
release, not on testing k.o -rc branches, and the expectation is that
-rc patches are small, localized and fix a specific issue.

The -next branch is a completely different story.

>
> I know this patch unavoidably overlaps with your PTO, but this is
> still essentially an example of the topic we discussed at LPC..
>
> As a concrete recommendation, pushing this kind of patch to your k.o
> right away on the 20th and skipping the github 0day process might be
> helpful..
>
> Jason

* Re: crash in 4.14-rc1 with IPoIB
From: Doug Ledford @ 2017-09-22 21:06 UTC
  To: Jason Gunthorpe
  Cc: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, 2017-09-22 at 13:48 -0600, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2017 at 01:27:52PM -0400, Doug Ledford wrote:
> > On Wed, 2017-09-20 at 10:32 -0600, Jason Gunthorpe wrote:
> > > On Wed, Sep 20, 2017 at 11:53:39AM +0200, Johannes Thumshirn wrote:
> > > > I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
> > > > setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
> > > > RDMA operation was possible:
> > > 
> > > I think this was already found and fixed a month ago?? The oops is the same:
> > > 
> > > https://patchwork.kernel.org/patch/9932505/
> > > 
> > > Doug, one of the topics during the LPC was 'what to QA' - it
> > > obviously causes QA problems if known bugs are left to sit on the
> > > mailing list for a month :(
> > 
> > A few things:
> > 
> > 1)  It wasn't a month
> > 2)  I was out on well known, pre-announced PTO
> > 3)  I've got it now
> > 
> > I can't do much else about it.
> 
> Just so we are talking about the same expectation..
> 
> Patch v1 was posted on Aug 30 and you accepted it on Sep 20. As I write
> this I don't see it on your k.o tree; I see it in your github tree, so
> I know it is on the way.
> 
> If you push it to k.o at EOD today it will be ~27 days before it gets
> into the hands of anyone doing QA based on your k.o tree.
> 
> If you send a PR on Monday it will be > 28 days before it gets into
> the hands of anyone doing QA from Linus's tree.
> 
> I know this patch unavoidably overlaps with your PTO, but this is
> still essentially an example of the topic we discussed at LPC..
> 
> As a concrete recommendation, pushing this kind of patch to your k.o
> right away on the 20th and skipping the github 0day process might be
> helpful..

Sure, I get that, but I was already out on PTO on the 30th.  What sucks
is that it landed right after I was out.  But I plan to have the pull
request in before EOB today, so the difference between the 20th and
today is negligible.  Especially since lots of people doing QA testing
prefer to take -rc tags; in that case, the difference is non-existent.

* Re: crash in 4.14-rc1 with IPoIB
From: Jason Gunthorpe @ 2017-09-22 21:17 UTC
  To: Doug Ledford
  Cc: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, Sep 22, 2017 at 05:06:26PM -0400, Doug Ledford wrote:

> Sure, I get that, but I was already out on PTO on the 30th.  What sucks
> is that it landed right after I was out.  But I plan to have the pull
> request in before EOB today, so the difference between the 20th and
> today is negligible.  Especially since lots of people doing QA testing
> prefer to take -rc tags; in that case, the difference is non-existent.

My thinking was that people should test -rc, but if they have problems
they could grab your for-rc branch and check if their issue is already
fixed..

Jason

* Re: crash in 4.14-rc1 with IPoIB
From: Doug Ledford @ 2017-09-22 22:42 UTC
  To: Jason Gunthorpe
  Cc: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, 2017-09-22 at 15:17 -0600, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2017 at 05:06:26PM -0400, Doug Ledford wrote:
> 
> > Sure, I get that, but I was already out on PTO on the 30th.  What sucks
> > is that it landed right after I was out.  But I plan to have the pull
> > request in before EOB today, so the difference between the 20th and
> > today is negligible.  Especially since lots of people doing QA testing
> > prefer to take -rc tags; in that case, the difference is non-existent.
> 
> My thinking was that people should test -rc,

Great, with you here...

>  but if they have problems
> they could grab your for-rc branch and check if their issue is already
> fixed..

They can do this too...

But if that still doesn't resolve their problem, a quick check of the
mailing list contents isn't out of the question either.  In that case,
they would have found the solution to their problem.  But, when you get
right down to it, only one person reported it in addition to the
original poster, so either other people saw the original post and
compensated in their own testing, or (more likely, I think) most
people don't start testing -rcs until after -rc2.  That is why I try
to set -rc2 as a milestone for several purposes: for getting in the
bulk of the known fixes, but also as a branching point for for-next.

* Re: crash in 4.14-rc1 with IPoIB
From: Leon Romanovsky @ 2017-09-23  7:38 UTC
  To: Doug Ledford
  Cc: Jason Gunthorpe, Johannes Thumshirn,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, Sep 22, 2017 at 06:42:41PM -0400, Doug Ledford wrote:
> On Fri, 2017-09-22 at 15:17 -0600, Jason Gunthorpe wrote:
> > On Fri, Sep 22, 2017 at 05:06:26PM -0400, Doug Ledford wrote:
> >
> > > Sure, I get that, but I was already out on PTO on the 30th.  What sucks
> > > is that it landed right after I was out.  But I plan to have the pull
> > > request in before EOB today, so the difference between the 20th and
> > > today is negligible.  Especially since lots of people doing QA testing
> > > prefer to take -rc tags; in that case, the difference is non-existent.
> >
> > My thinking was that people should test -rc,
>
> Great, with you here...
>
> >  but if they have problems
> > they could grab your for-rc branch and check if their issue is already
> > fixed..
>
> They can do this too...
>
> But if that still doesn't resolve their problem, a quick check of the
> mailing list contents isn't out of the question either.  In that case,
> they would have found the solution to their problem.  But, when you get
> right down to it, only one person reported it in addition to the
> original poster, so either other people saw the original post and
> compensated in their own testing, or (the more likely I think), most
> people don't start testing -rcs until after -rc2.

I don't know about other people, but our testing of -rc starts on -rc1
and we don't wait for -rc2. From my observation of netdev, they also
start testing -rc immediately.

Otherwise, what is the point of the week between -rc1 and -rc2?

> Which is why I try to set -rc2 as a milestone for several purposes.
> For getting in the bulk of the known fixes, but also as a branching
> point for for-next.
>
> --
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: crash in 4.14-rc1 with IPoIB
       [not found]                             ` <20170923073843.GX5788-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-09-23 16:17                               ` Estrin, Alex
       [not found]                                 ` <F3529576D8E232409F431C309E29399336CD972A-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Estrin, Alex @ 2017-09-23 16:17 UTC (permalink / raw)
  To: Leon Romanovsky, Doug Ledford
  Cc: Jason Gunthorpe, Johannes Thumshirn,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg

Hello,

One minor note regarding the original commit 523633359224
that broke the core.
It seems it was let through without even trivial validation;
otherwise it wouldn't have passed checkpatch.

Thanks,
Alex.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: crash in 4.14-rc1 with IPoIB
       [not found]                                 ` <F3529576D8E232409F431C309E29399336CD972A-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2017-09-23 17:29                                   ` Leon Romanovsky
       [not found]                                     ` <20170923172935.GZ5788-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Leon Romanovsky @ 2017-09-23 17:29 UTC (permalink / raw)
  To: Estrin, Alex
  Cc: Doug Ledford, Jason Gunthorpe, Johannes Thumshirn,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg

[-- Attachment #1: Type: text/plain, Size: 3737 bytes --]

On Sat, Sep 23, 2017 at 04:17:10PM +0000, Estrin, Alex wrote:
> Hello,
>
> One minor note regarding the original commit 523633359224
> that broke the core.
> It seem it was let through without trivial validation,
> otherwise it wouldn't pass the checkpatch.

Can you be more specific? Are you referring to "WARNING: line over 80
characters" or to something else? If yes, I feel really bad for you and
your workplace.

Readability is the first priority for submitted code.

➜  linux-rdma git:(rdma-rc) git fp -1 523633359224 -o /tmp/
/tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch
➜  linux-rdma git:(rdma-rc) ./scripts/checkpatch.pl --strict /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch
WARNING: line over 80 characters
#45: FILE: drivers/infiniband/core/verbs.c:1584:
+			if (qp->device->get_link_layer(qp->device, attr.port_num) !=

total: 0 errors, 1 warnings, 0 checks, 62 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

/tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.


>
> Thanks,
> Alex.
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: crash in 4.14-rc1 with IPoIB
       [not found]                                     ` <20170923172935.GZ5788-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-09-23 19:20                                       ` Estrin, Alex
       [not found]                                         ` <F3529576D8E232409F431C309E29399336CD9762-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Estrin, Alex @ 2017-09-23 19:20 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Johannes Thumshirn,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg

> > Hello,
> >
> > One minor note regarding the original commit 523633359224
> > that broke the core.
> > It seem it was let through without trivial validation,
> > otherwise it wouldn't pass the checkpatch.
> 
> Can you be more specific? Are you referring to "WARNING: line over 80
> characters" or to something else? If yes, I feel really bad for you and
> your workplace.
Please don't be. Keep doing a great job at your workplace; I will do the same at mine.

> Readability is a first priority for the submitted code.
I can agree with you on that, provided that easily readable submitted code
does not introduce trivial bugs.

> ➜  linux-rdma git:(rdma-rc) git fp -1 523633359224 -o /tmp/
> /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch
> ➜  linux-rdma git:(rdma-rc) ./scripts/checkpatch.pl --strict /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch
> WARNING: line over 80 characters
> #45: FILE: drivers/infiniband/core/verbs.c:1584:
> +			if (qp->device->get_link_layer(qp->device, attr.port_num) !=
> 
> total: 0 errors, 1 warnings, 0 checks, 62 lines checked
> 
> NOTE: For some of the reported defects, checkpatch may be able to
>       mechanically convert to the typical style using --fix or --fix-inplace.
> 
> /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch has style problems, please review.
> 
> NOTE: If any of the errors are false positives, please report
>       them to the maintainer, see CHECKPATCH in MAINTAINERS.
> 
> 
> >
> > Thanks,
> > Alex.
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: crash in 4.14-rc1 with IPoIB
       [not found]                                         ` <F3529576D8E232409F431C309E29399336CD9762-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2017-09-24  4:00                                           ` Leon Romanovsky
       [not found]                                             ` <20170924040012.GA21110-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Leon Romanovsky @ 2017-09-24  4:00 UTC (permalink / raw)
  To: Estrin, Alex
  Cc: Doug Ledford, Jason Gunthorpe, Johannes Thumshirn,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg

[-- Attachment #1: Type: text/plain, Size: 4889 bytes --]

On Sat, Sep 23, 2017 at 07:20:53PM +0000, Estrin, Alex wrote:
> > > Hello,
> > >
> > > One minor note regarding the original commit 523633359224
> > > that broke the core.
> > > It seem it was let through without trivial validation,
> > > otherwise it wouldn't pass the checkpatch.
> >
> > Can you be more specific? Are you referring to "WARNING: line over 80
> > characters" or to something else? If yes, I feel really bad for you and
> > your workplace.
> Please don't be. Keep doing a great job at your workplace, I will do the same at mine.
>
> > Readability is a first priority for the submitted code.
> I can agree with you on that, considering easy readable submitted code
> does not introduce a trivial bugs.

It would be very helpful to everyone if you stopped throwing around claims without any actual support.
1. Doug allows enough time to respond to patches, and neither you nor your
colleagues saw such a "trivial bug" back then.
2. It fixed another "trivial bug", introduced by your colleague, which
broke RoCE (one of the most popular fabrics in the stack), and we didn't
cry all over the internet about it.

Before you rush to reply to me, please consult with Denny; he can
give you a short update on how badly the recent OPA changes in AH and
LIDs broke the stack and RoCE/IB devices.

>
> > ➜  linux-rdma git:(rdma-rc) git fp -1 523633359224 -o /tmp/
> > /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch
> > ➜  linux-rdma git:(rdma-rc) ./scripts/checkpatch.pl --strict /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch
> > WARNING: line over 80 characters
> > #45: FILE: drivers/infiniband/core/verbs.c:1584:
> > +			if (qp->device->get_link_layer(qp->device, attr.port_num) !=
> >
> > total: 0 errors, 1 warnings, 0 checks, 62 lines checked
> >
> > NOTE: For some of the reported defects, checkpatch may be able to
> >       mechanically convert to the typical style using --fix or --fix-inplace.
> >
> > /tmp/0001-IB-core-Fix-the-validations-of-a-multicast-LID-in-at.patch has style problems, please review.
> >
> > NOTE: If any of the errors are false positives, please report
> >       them to the maintainer, see CHECKPATCH in MAINTAINERS.
> >
> >
> > >
> > > Thanks,
> > > Alex.
> > >

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: crash in 4.14-rc1 with IPoIB
       [not found]                                             ` <20170924040012.GA21110-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-09-24  5:59                                               ` Sagi Grimberg
  0 siblings, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2017-09-24  5:59 UTC (permalink / raw)
  To: Leon Romanovsky, Estrin, Alex
  Cc: Doug Ledford, Jason Gunthorpe, Johannes Thumshirn,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig

Guys,

> It will be very helpful to everyone if you stop to throw claims without any actual support.
> 1. Doug allows enough time to respond on the patches and neither you and neither your
> colleagues didn't see such "trivial bug" back then.
> 2. It fixed another "trivial bug" introduced by your colleague which
> broke RoCE (one of the most popular fabric in the stack) and we didn't
> cry other the internet about it.

Please remove individual CC's from this correspondence.

Also, please change the subject to something more suitable to the
direction this discussion has taken.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: crash in 4.14-rc1 with IPoIB
       [not found]                         ` <1506120161.120853.10.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-09-23  7:38                           ` Leon Romanovsky
@ 2017-09-24 20:30                           ` Jason Gunthorpe
  1 sibling, 0 replies; 19+ messages in thread
From: Jason Gunthorpe @ 2017-09-24 20:30 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Johannes Thumshirn, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	leon-DgEjT+Ai2ygdnm+yROfE0A, Thomas Bogendoerfer,
	Bart Van Assche, Christoph Hellwig, Sagi Grimberg

On Fri, Sep 22, 2017 at 06:42:41PM -0400, Doug Ledford wrote:

> But if that still doesn't resolve their problem, a quick check of the
> mailing list contents isn't out of the question either.  In that case,
> they would have found the solution to their problem.  But, when you get
> right down to it, only one person reported it in addition to the

Well, twice in recent memory several people have come to the list
hitting something that already had a posted fix. The port-number issue
had more comments, I think we were up to 4? IIRC Laurance even spent a
long time bisecting it.. So I'm not sure 'check the list' is working.

I view it as a really good sign: it means that people are finally
testing upstream, not just waiting for OFED to test!

Based on the LPC comments, the people doing QA could do a better job
with some process help.

> compensated in their own testing, or (the more likely I think), most
> people don't start testing -rcs until after -rc2.  Which is why I try
> to set -rc2 as a milestone for several purposes.  For getting in the

Hrm, people need to test rc1 :|

Jason

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-09-24 20:30 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-20  9:53 crash in 4.14-rc1 with IPoIB Johannes Thumshirn
     [not found] ` <20170920095339.zhfymeyfbhiyepz5-qw2SdCWA0PpjqqEj2zc+bA@public.gmane.org>
2017-09-20 10:37   ` Sagi Grimberg
     [not found]     ` <7aac2d78-462b-c9ad-4443-9ec670a27b74-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-09-20 10:57       ` Johannes Thumshirn
2017-09-20 11:35       ` Hal Rosenstock
     [not found]         ` <be30c079-6513-627f-0276-6556e6f9eea5-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2017-09-20 11:51           ` Sagi Grimberg
2017-09-20 16:32   ` Jason Gunthorpe
     [not found]     ` <20170920163237.GD536-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-09-22 17:27       ` Doug Ledford
     [not found]         ` <1506101272.5172.11.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-09-22 19:48           ` Jason Gunthorpe
     [not found]             ` <20170922194834.GA26479-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-09-22 20:43               ` Leon Romanovsky
2017-09-22 21:06               ` Doug Ledford
     [not found]                 ` <1506114386.120853.2.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-09-22 21:17                   ` Jason Gunthorpe
     [not found]                     ` <20170922211727.GA2348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-09-22 22:42                       ` Doug Ledford
     [not found]                         ` <1506120161.120853.10.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-09-23  7:38                           ` Leon Romanovsky
     [not found]                             ` <20170923073843.GX5788-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-09-23 16:17                               ` Estrin, Alex
     [not found]                                 ` <F3529576D8E232409F431C309E29399336CD972A-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-09-23 17:29                                   ` Leon Romanovsky
     [not found]                                     ` <20170923172935.GZ5788-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-09-23 19:20                                       ` Estrin, Alex
     [not found]                                         ` <F3529576D8E232409F431C309E29399336CD9762-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-09-24  4:00                                           ` Leon Romanovsky
     [not found]                                             ` <20170924040012.GA21110-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-09-24  5:59                                               ` Sagi Grimberg
2017-09-24 20:30                           ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.