linux-kernel.vger.kernel.org archive mirror
* [RFC 0/1] net/mlx5: Failing fw tracer allocation on s390
@ 2020-04-07  8:01 Niklas Schnelle
  2020-04-07  8:01 ` [RFC 1/1] net/mlx5: Fix failing " Niklas Schnelle
  0 siblings, 1 reply; 4+ messages in thread
From: Niklas Schnelle @ 2020-04-07  8:01 UTC
  To: Saeed Mahameed
  Cc: netdev, David S . Miller, linux-kernel, Leon Romanovsky,
	Eran Ben Elisha, Moshe Shemesh, Niklas Schnelle

Hello,

I sent this message before, but it seems to have fallen through the cracks,
so, as this remains an issue, I wanted to ping you again.

On s390 the MLX5 driver generates the following stack trace when
initializing a device with support for firmware tracing.

[  331.531819] WARNING: CPU: 7 PID: 2156 at mm/page_alloc.c:4727 __alloc_pages_nodemask+0x25c/0x320
[  331.531820] Modules linked in: mlx5_core(+) mlxfw tls ptp pps_core s390_trng chsc_sch vfio_ccw vfio_mdev mdev eadm_sch vfio_iommu_type1 vfio sch_fq_codel ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_service_time pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 qeth_l2 sha512_s390 sha256_s390 sha1_s390 sha_common zfcp qeth scsi_transport_fc qdio ccwgroup scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[  331.531833] CPU: 7 PID: 2156 Comm: systemd-udevd Not tainted 5.4.0-14-generic #17-Ubuntu
[  331.531833] Hardware name: IBM 8562 GT2 A00 (LPAR)
[  331.531834] Krnl PSW : 0704c00180000000 00000000735d720c (__alloc_pages_nodemask+0x25c/0x320)
[  331.531836]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  331.531837] Krnl GPRS: 000000007418d687 0000000000040dc0 0000000000040dc0 000000000000000a
[  331.531837]            0000000000000000 0000000000000000 000000000000000a 000003ff8042607e
[  331.531838]            0000000000000dc0 00000000002203b0 000000000000000a 00000001c9480120
[  331.531838]            00000001ecda4400 0000000000000055 000003e001943680 000003e001943600
[  331.531844] Krnl Code: 00000000735d7200: a7212000            tmll    %r2,8192
		      00000000735d7204: a774ff87            brc     7,00000000735d7112
		     #00000000735d7208: a7f40001            brc     15,00000000735d720a
		     >00000000735d720c: a7890000            lghi    %r8,0
		      00000000735d7210: a7f4ff83            brc     15,00000000735d7116
		      00000000735d7214: a7180000            lhi     %r1,0
		      00000000735d7218: a7f4ff1b            brc     15,00000000735d704e
		      00000000735d721c: e31003400004        lg      %r1,832
[  331.531851] Call Trace:
[  331.531852] ([<0000000000000201>] 0x201)
[  331.531856]  [<00000000735a20c4>] kmalloc_order+0x34/0xb0
[  331.531856]  [<00000000735a2172>] kmalloc_order_trace+0x32/0xe0
[  331.531880]  [<000003ff8042607e>] mlx5_fw_tracer_create+0x3e/0x500 [mlx5_core]
[  331.531899]  [<000003ff803ffa88>] mlx5_init_once+0x148/0x3c0 [mlx5_core]
[  331.531917]  [<000003ff8040152a>] mlx5_load_one+0x7a/0x240 [mlx5_core]
[  331.531935]  [<000003ff804018d8>] init_one+0x1e8/0x310 [mlx5_core]
[  331.531939]  [<0000000073916e16>] local_pci_probe+0x56/0xc0
[  331.531941]  [<0000000073917ef2>] pci_device_probe+0x132/0x1e0
[  331.531942]  [<00000000739a1374>] really_probe+0xf4/0x460
[  331.531943]  [<00000000739a1a60>] driver_probe_device+0x130/0x190
[  331.531944]  [<00000000739a1dae>] device_driver_attach+0x7e/0xa0
[  331.531945]  [<00000000739a1e86>] __driver_attach+0xb6/0x180
[  331.531947]  [<000000007399eae2>] bus_for_each_dev+0x82/0xc0
[  331.531948]  [<00000000739a030a>] bus_add_driver+0x16a/0x260
[  331.531949]  [<00000000739a2b38>] driver_register+0x88/0x150
[  331.531967]  [<000003ff80362080>] init+0x80/0xb0 [mlx5_core]
[  331.531968]  [<00000000733648bc>] do_one_initcall+0x3c/0x200
[  331.531970]  [<0000000073495fc0>] do_init_module+0x70/0x270
[  331.531970]  [<00000000734983b2>] load_module+0x1142/0x1440
[  331.531971]  [<00000000734988e4>] __do_sys_finit_module+0xa4/0xf0
[  331.531973]  [<0000000073c54ec2>] system_call+0x2a6/0x2c8
[  331.531974] Last Breaking-Event-Address:
[  331.531975]  [<00000000735d7208>] __alloc_pages_nodemask+0x258/0x320
[  331.531975] ---[ end trace 5985b580c6dbfd3e ]---

This happens because on s390 FORCE_MAX_ZONEORDER is 9 instead of 11, so
such a large kzalloc() allocation will always fail. As the tracer is a debug
feature and failed allocations are checked, the device remains usable, but of
course we still want to enable this feature and get rid of the warning.

Looking at mlx5_fw_tracer_save_trace(), I think that since it is the driver,
not the device, that copies into the trace array, we can simply use
kvzalloc() instead. This patch prevents the above stack trace for us.

However, I haven't yet been able to test the tracing, as the devlink command
given in the initial commit fd1483fe1f9f ("net/mlx5: Add support for FW
reporter dump") always reports 'Object "health" not found' for me, even
after enabling it via

echo traceon > /sys/kernel/debug/tracing/events/mlx5/mlx5_fw/enable

Still, I wanted to get your comments on the proposed fix and perhaps a hint
on how to test this.

Best regards,
Niklas Schnelle

Niklas Schnelle (1):
  net/mlx5: Fix failing fw tracer allocation on s390

 drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.17.1



* [RFC 1/1] net/mlx5: Fix failing fw tracer allocation on s390
  2020-04-07  8:01 [RFC 0/1] net/mlx5: Failing fw tracer allocation on s390 Niklas Schnelle
@ 2020-04-07  8:01 ` Niklas Schnelle
  2020-04-08 19:36   ` Saeed Mahameed
  0 siblings, 1 reply; 4+ messages in thread
From: Niklas Schnelle @ 2020-04-07  8:01 UTC
  To: Saeed Mahameed
  Cc: netdev, David S . Miller, linux-kernel, Leon Romanovsky,
	Eran Ben Elisha, Moshe Shemesh, Niklas Schnelle

On s390, FORCE_MAX_ZONEORDER is 9 instead of 11, so a large kzalloc()
allocation such as the one done for the firmware tracer will always fail.

Looking at mlx5_fw_tracer_save_trace(), it is the driver itself that
copies the debug data into the trace array, so there is no need for the
allocation to be contiguous in physical memory. We can therefore use
kvzalloc() instead of kzalloc() and get rid of the large contiguous
allocation.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index c9c9b479bda5..5ce6ebbc7f10 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -935,7 +935,7 @@ struct mlx5_fw_tracer *mlx5_fw_tracer_create(struct mlx5_core_dev *dev)
 		return NULL;
 	}
 
-	tracer = kzalloc(sizeof(*tracer), GFP_KERNEL);
+	tracer = kvzalloc(sizeof(*tracer), GFP_KERNEL);
 	if (!tracer)
 		return ERR_PTR(-ENOMEM);
 
@@ -982,7 +982,7 @@ struct mlx5_fw_tracer *mlx5_fw_tracer_create(struct mlx5_core_dev *dev)
 	tracer->dev = NULL;
 	destroy_workqueue(tracer->work_queue);
 free_tracer:
-	kfree(tracer);
+	kvfree(tracer);
 	return ERR_PTR(err);
 }
 
@@ -1061,7 +1061,7 @@ void mlx5_fw_tracer_destroy(struct mlx5_fw_tracer *tracer)
 	mlx5_fw_tracer_destroy_log_buf(tracer);
 	flush_workqueue(tracer->work_queue);
 	destroy_workqueue(tracer->work_queue);
-	kfree(tracer);
+	kvfree(tracer);
 }
 
 static int fw_tracer_event(struct notifier_block *nb, unsigned long action, void *data)
-- 
2.17.1



* Re: [RFC 1/1] net/mlx5: Fix failing fw tracer allocation on s390
  2020-04-07  8:01 ` [RFC 1/1] net/mlx5: Fix failing " Niklas Schnelle
@ 2020-04-08 19:36   ` Saeed Mahameed
  0 siblings, 0 replies; 4+ messages in thread
From: Saeed Mahameed @ 2020-04-08 19:36 UTC
  To: schnelle, Feras Daoud, Alex Vesker
  Cc: Eran Ben Elisha, Moshe Shemesh, netdev, davem, leon, linux-kernel

On Tue, 2020-04-07 at 10:01 +0200, Niklas Schnelle wrote:
> On s390, FORCE_MAX_ZONEORDER is 9 instead of 11, so a large kzalloc()
> allocation such as the one done for the firmware tracer will always fail.
> 
> Looking at mlx5_fw_tracer_save_trace(), it is the driver itself that
> copies the debug data into the trace array, so there is no need for the
> allocation to be contiguous in physical memory. We can therefore use
> kvzalloc() instead of kzalloc() and get rid of the large contiguous
> allocation.
> 

This looks fine and very straightforward; I don't expect any issues
with this.

Please provide a proper "Fixes:" tag and resubmit to net without the
[RFC].

Thanks


* [RFC 0/1] net/mlx5: Failing fw tracer allocation on s390
@ 2020-03-18 16:44 Niklas Schnelle
  0 siblings, 0 replies; 4+ messages in thread
From: Niklas Schnelle @ 2020-03-18 16:44 UTC
  To: linux-kernel
  Cc: Saeed Mahameed, Leon Romanovsky, Eran Ben Elisha, Moshe Shemesh,
	Niklas Schnelle

Hi,

on s390 the MLX5 driver generates the following stack trace when
initializing a device with support for firmware tracing.

[  331.531813] ------------[ cut here ]------------
[  331.531819] WARNING: CPU: 7 PID: 2156 at mm/page_alloc.c:4727 __alloc_pages_nodemask+0x25c/0x320
[  331.531820] Modules linked in: mlx5_core(+) mlxfw tls ptp pps_core s390_trng chsc_sch vfio_ccw vfio_mdev mdev eadm_sch vfio_iommu_type1 vfio sch_fq_codel ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_service_time pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 qeth_l2 sha512_s390 sha256_s390 sha1_s390 sha_common zfcp qeth scsi_transport_fc qdio ccwgroup scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[  331.531833] CPU: 7 PID: 2156 Comm: systemd-udevd Not tainted 5.4.0-14-generic #17-Ubuntu
[  331.531833] Hardware name: IBM 8562 GT2 A00 (LPAR)
[  331.531834] Krnl PSW : 0704c00180000000 00000000735d720c (__alloc_pages_nodemask+0x25c/0x320)
[  331.531836]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  331.531837] Krnl GPRS: 000000007418d687 0000000000040dc0 0000000000040dc0 000000000000000a
[  331.531837]            0000000000000000 0000000000000000 000000000000000a 000003ff8042607e
[  331.531838]            0000000000000dc0 00000000002203b0 000000000000000a 00000001c9480120
[  331.531838]            00000001ecda4400 0000000000000055 000003e001943680 000003e001943600
[  331.531844] Krnl Code: 00000000735d7200: a7212000            tmll    %r2,8192
		      00000000735d7204: a774ff87            brc     7,00000000735d7112
		     #00000000735d7208: a7f40001            brc     15,00000000735d720a
		     >00000000735d720c: a7890000            lghi    %r8,0
		      00000000735d7210: a7f4ff83            brc     15,00000000735d7116
		      00000000735d7214: a7180000            lhi     %r1,0
		      00000000735d7218: a7f4ff1b            brc     15,00000000735d704e
		      00000000735d721c: e31003400004        lg      %r1,832
[  331.531851] Call Trace:
[  331.531852] ([<0000000000000201>] 0x201)
[  331.531856]  [<00000000735a20c4>] kmalloc_order+0x34/0xb0
[  331.531856]  [<00000000735a2172>] kmalloc_order_trace+0x32/0xe0
[  331.531880]  [<000003ff8042607e>] mlx5_fw_tracer_create+0x3e/0x500 [mlx5_core]
[  331.531899]  [<000003ff803ffa88>] mlx5_init_once+0x148/0x3c0 [mlx5_core]
[  331.531917]  [<000003ff8040152a>] mlx5_load_one+0x7a/0x240 [mlx5_core]
[  331.531935]  [<000003ff804018d8>] init_one+0x1e8/0x310 [mlx5_core]
[  331.531939]  [<0000000073916e16>] local_pci_probe+0x56/0xc0
[  331.531941]  [<0000000073917ef2>] pci_device_probe+0x132/0x1e0
[  331.531942]  [<00000000739a1374>] really_probe+0xf4/0x460
[  331.531943]  [<00000000739a1a60>] driver_probe_device+0x130/0x190
[  331.531944]  [<00000000739a1dae>] device_driver_attach+0x7e/0xa0
[  331.531945]  [<00000000739a1e86>] __driver_attach+0xb6/0x180
[  331.531947]  [<000000007399eae2>] bus_for_each_dev+0x82/0xc0
[  331.531948]  [<00000000739a030a>] bus_add_driver+0x16a/0x260
[  331.531949]  [<00000000739a2b38>] driver_register+0x88/0x150
[  331.531967]  [<000003ff80362080>] init+0x80/0xb0 [mlx5_core]
[  331.531968]  [<00000000733648bc>] do_one_initcall+0x3c/0x200
[  331.531970]  [<0000000073495fc0>] do_init_module+0x70/0x270
[  331.531970]  [<00000000734983b2>] load_module+0x1142/0x1440
[  331.531971]  [<00000000734988e4>] __do_sys_finit_module+0xa4/0xf0
[  331.531973]  [<0000000073c54ec2>] system_call+0x2a6/0x2c8
[  331.531974] Last Breaking-Event-Address:
[  331.531975]  [<00000000735d7208>] __alloc_pages_nodemask+0x258/0x320
[  331.531975] ---[ end trace 5985b580c6dbfd3e ]---

This is because on s390 FORCE_MAX_ZONEORDER is 9 instead of 11, so such a
large kzalloc() allocation will always fail. As the tracer is a debug
feature and failed allocations are checked, the device nonetheless remains
usable.

Looking at mlx5_fw_tracer_save_trace(), I think that since it is the driver,
not the device, that copies into the trace array, we can simply use
kvzalloc() instead. This patch prevents the above stack trace for us.

However, I haven't yet been able to test the tracing, as the devlink command
given in the initial commit fd1483fe1f9f ("net/mlx5: Add support for FW
reporter dump") always reports 'Object "health" not found' for me, even
after enabling it via

echo traceon > /sys/kernel/debug/tracing/events/mlx5/mlx5_fw/enable

Still, I wanted to get your comments on the proposed fix and perhaps a hint
on how to test this.

Best regards,
Niklas Schnelle

Niklas Schnelle (1):
  net/mlx5: Fix failing fw tracer allocation on s390

 drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.17.1



Thread overview: 4+ messages
2020-04-07  8:01 [RFC 0/1] net/mlx5: Failing fw tracer allocation on s390 Niklas Schnelle
2020-04-07  8:01 ` [RFC 1/1] net/mlx5: Fix failing " Niklas Schnelle
2020-04-08 19:36   ` Saeed Mahameed
  -- strict thread matches above, loose matches on Subject: below --
2020-03-18 16:44 [RFC 0/1] net/mlx5: Failing " Niklas Schnelle
