From: Joao Pinto <Joao.Pinto-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>
To: Majd Dibbiny <majd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Joao Pinto <Joao.Pinto-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>,
	Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Issue with Infiniband / MLX5 IB driver when running opensm
Date: Thu, 1 Jun 2017 19:40:21 +0100
Message-ID: <4bad8be6-4179-00e2-4ad9-7c2edad77810@synopsys.com>
In-Reply-To: <455d9539-8284-7e8d-fe8b-17035b511e9d-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>

Hello,

I am trying to bring up a ConnectX-5 Ex and I hit an issue when running
opensm while the InfiniBand cables are connected (one port cabled to the
other). Could you please give me a hint of what might be happening?

# ibstat
CA 'mlx5_0'
        CA type: MT4119
        Number of ports: 1
        Firmware version: 16.19.2244
        Hardware version: 0
        Node GUID: 0x248a0703009ad906
        System image GUID: 0x248a0703009ad906
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 56
                Base lid: 65535
                LMC: 0
                SM lid: 0
                Capability mask: 0x2651e848
                Port GUID: 0x248a0703009ad906
                Link layer: InfiniBand
CA 'mlx5_1'
        CA type: MT4119
        Number of ports: 1
        Firmware version: 16.19.2244
        Hardware version: 0
        Node GUID: 0x248a0703009ad907
        System image GUID: 0x248a0703009ad906
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 56
                Base lid: 65535
                LMC: 0
                SM lid: 0
                Capability mask: 0x2651e848
                Port GUID: 0x248a0703009ad907
                Link layer: InfiniBand
#
#
# which opensm
/usr/sbin/opensm
# opensm -g 0x248a0703009ad906 &
# -------------------------------------------------
OpenSM 3.3.20
Command Line Arguments:
 Guid <0x248a0703009ad906>
 Log File: /var/log/opensm.log
-------------------------------------------------
OpenSM 3.3.20

Entering DISCOVERING state

------------[ cut here ]------------
WARNING: CPU: 0 PID: 128 at drivers/infiniband/hw/mlx5/mad.c:263
mlx5_ib_process_mad+0x1a6/0x64c
Modules linked in:
CPU: 0 PID: 128 Comm: kworker/0:1H Not tainted
4.12.0-MLNX20170524-ge176cc5-dirty #22
Workqueue: ib-comp-wq ib_cq_poll_work

Stack Trace:
  arc_unwind_core.constprop.2+0xb4/0x100
  warn_slowpath_null+0x48/0xe4
  mlx5_ib_process_mad+0x1a6/0x64c
  ib_mad_recv_done+0x352/0xa7c
  ib_cq_poll_work+0x72/0x130
  process_one_work+0x1c8/0x390
  worker_thread+0x120/0x540
  kthread+0x116/0x13c
  ret_from_fork+0x18/0x1c
---[ end trace 942bc9d60690df3b ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 128 at mm/page_alloc.c:3689
__alloc_pages_nodemask+0x18ec/0x24e4
Modules linked in:
CPU: 0 PID: 128 Comm: kworker/0:1H Tainted: G        W
4.12.0-MLNX20170524-ge176cc5-dirty #22
Workqueue: ib-comp-wq ib_cq_poll_work

Stack Trace:
  arc_unwind_core.constprop.2+0xb4/0x100
  warn_slowpath_null+0x48/0xe4
  __alloc_pages_nodemask+0x18ec/0x24e4
  kmalloc_order+0x16/0x28
  alloc_mad_private+0x12/0x20
  ib_mad_recv_done+0x2bc/0xa7c
  ib_cq_poll_work+0x72/0x130
  process_one_work+0x1c8/0x390
  worker_thread+0x120/0x540
  kthread+0x116/0x13c
  ret_from_fork+0x18/0x1c
---[ end trace 942bc9d60690df3c ]---
BUG: Bad rss-counter state mm:9672c000 idx:1 val:11
BUG: Bad rss-counter state mm:9672c000 idx:3 val:84
BUG: non-zero nr_ptes on freeing mm: 3
Path: /bin/busybox
CPU: 0 PID: 82 Comm: klogd Tainted: G        W
4.12.0-MLNX20170524-ge176cc5-dirty #22
task: 8fe0e3c0 task.stack: 8fe02000

[ECR   ]: 0x00220100 => Invalid Read @ 0x00008088 by insn @ 0x8124babc
[EFA   ]: 0x00008088
[BLINK ]: __d_alloc+0x2c/0x1cc
[ERET  ]: kmem_cache_alloc+0x4c/0xe8
------------[ cut here ]------------
WARNING: CPU: 0 PID: 128 at kernel/workqueue.c:1080 worker_thread+0x120/0x540
Modules linked in:
CPU: 0 PID: 128 Comm: kworker/0:1H Tainted: G        W
4.12.0-MLNX20170524-ge176cc5-dirty #22
------------[ cut here ]------------
WARNING: CPU: 0 PID: 128 at kernel/workqueue.c:1436 __queue_work+0x3e2/0x3e8
workqueue: per-cpu pwq for ib-comp-wq on cpu0 has 0 refcnt
Modules linked in:
CPU: 0 PID: 128 Comm: kworker/0:1H Tainted: G        W
4.12.0-MLNX20170524-ge176cc5-dirty #22

Stack Trace:
  arc_unwind_core.constprop.2+0xb4/0x100
  warn_slowpath_fmt+0x6c/0x110
  __queue_work+0x3e2/0x3e8
  queue_work_on+0x40/0x48
  mlx5_cq_completion+0x62/0xd8
  mlx5_eq_int+0x2dc/0x3a8
  __handle_irq_event_percpu+0xb8/0x150
  handle_irq_event+0x44/0x8c
  handle_simple_irq+0x5c/0xa4
  generic_handle_irq+0x1c/0x2c
  dw_handle_msi_irq+0x5a/0xd4
  dw_chained_msi_isr+0x26/0x78
  generic_handle_irq+0x1c/0x2c
  dw_apb_ictl_handler+0x7e/0xf8
  __handle_domain_irq+0x56/0x98
  handle_interrupt_level1+0xcc/0xd8
---[ end trace 942bc9d60690df3d ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 128 at kernel/workqueue.c:1064 __queue_work+0x31c/0x3e8
Modules linked in:
CPU: 0 PID: 128 Comm: kworker/0:1H Tainted: G        W
4.12.0-MLNX20170524-ge176cc5-dirty #22

Stack Trace:
  arc_unwind_core.constprop.2+0xb4/0x100
  warn_slowpath_null+0x48/0xe4
  __queue_work+0x31c/0x3e8
  queue_work_on+0x40/0x48
  mlx5_cq_completion+0x62/0xd8
  mlx5_eq_int+0x2dc/0x3a8
  __handle_irq_event_percpu+0xb8/0x150
  handle_irq_event+0x44/0x8c
  handle_simple_irq+0x5c/0xa4
  generic_handle_irq+0x1c/0x2c
  dw_handle_msi_irq+0x5a/0xd4
  dw_chained_msi_isr+0x26/0x78
  generic_handle_irq+0x1c/0x2c
  dw_apb_ictl_handler+0x7e/0xf8
  __handle_domain_irq+0x56/0x98
  handle_interrupt_level1+0xcc/0xd8
---[ end trace 942bc9d60690df3e ]---

Stack Trace:
  arc_unwind_core.constprop.2+0xb4/0x100
  warn_slowpath_null+0x48/0xe4
  worker_thread+0x120/0x540
  kthread+0x116/0x13c
  ret_from_fork+0x18/0x1c
---[ end trace 942bc9d60690df3f ]---
[STAT32]: 0x00000406 : K         E2 E1
BTA: 0x8124ba86  SP: 0x8fe03dec  FP: 0x00000000
LPS: 0x81274348 LPE: 0x81274354 LPC: 0x00000000
r00: 0x00008088 r01: 0x014000c0 r02: 0x00008088
r03: 0x00001b1a r04: 0x00000000 r05: 0x00000806
r06: 0x9a19cea0 r07: 0x00000005 r08: 0x00000054
r09: 0x00000000 r10: 0x00000000 r11: 0x2000a038
r12: 0x00000000

Stack Trace:
  kmem_cache_alloc+0x4c/0xe8
  __d_alloc+0x2c/0x1cc
  d_alloc_parallel+0x46/0x3f8
  path_openat+0xd48/0x132c
  do_filp_open+0x44/0xc0
  SyS_openat+0x144/0x1d4
  EV_Trap+0x11c/0x120
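
For reference, the value passed to `opensm -g` above is one of the `Port GUID` values that ibstat reports. The sketch below lists each adapter's port GUID and logical port state from ibstat's text output. It is a minimal sketch that assumes the plain-text layout shown in this log: a `CA '<name>'` header per adapter, and `State:` printed before `Port GUID:` within each port. `list_ib_ports` is a hypothetical helper and is not part of infiniband-diags.

```shell
# list_ib_ports: hypothetical helper (not part of infiniband-diags).
# Reads ibstat text output on stdin and prints one line per port:
#   <CA name> <port number> <port GUID> <logical port state>
# Assumes the layout shown above, where State: precedes Port GUID:.
list_ib_ports() {
    awk '
        /^CA / {
            # Adapter header: CA followed by the quoted name; strip the quotes.
            ca = substr($2, 2, length($2) - 2)
        }
        /^[ \t]*Port [0-9]+:/ {
            # Port header such as "Port 1:"; keep the number only.
            port = $2
            sub(/:$/, "", port)
        }
        /^[ \t]*State:/ {
            # Logical port state (does not match the "Physical state:" line).
            state = $2
        }
        /^[ \t]*Port GUID:/ {
            # Port GUID is the last field we need, so emit the record here.
            print ca, port, $3, state
        }
    '
}
```

Run as `ibstat | list_ib_ports`. Against the ibstat output above it should print `mlx5_0 1 0x248a0703009ad906 Initializing` and `mlx5_1 1 0x248a0703009ad907 Initializing`. `ibstat -p` on its own prints only the port GUIDs. The sketch adds the CA name and state so you can see which port is still waiting for the subnet manager.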


Thank you and best regards,

Joao Pinto
