linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: General protection fault in iscsi_rx_thread_pre_handler
       [not found] ` <1421920228.7061.48.camel@haakon3.risingtidesystems.com>
@ 2015-01-22 15:56   ` Gavin Guo
  2015-01-22 17:35     ` Nicholas A. Bellinger
  0 siblings, 1 reply; 9+ messages in thread
From: Gavin Guo @ 2015-01-22 15:56 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-scsi, target-devel, linux-kernel

Hi Nicolas,

On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
<nab@linux-iscsi.org> wrote:
> Hi Gavin,
>
> On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
>> Hi all,
>>
>> The general protection fault screenshot is attached.
>>
>> Summary:
>> The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
>> the fault is in list_del of iscsi_del_ts_from_active_list. And it
>> looks like deleting the iscsi_thread_set *ts two times. The point to
>> delete including iscsi_get_ts_from_inactive_list, was also checked but
>> still can't find the clue. Really appreciate if anyone can provide any
>> idea on the bug.
>>
>> static void iscsi_del_ts_from_active_list(struct iscsi_thread_set *ts)
>> {
>> <...>
>> list_del(&ts->ts_list);
>> <...>
>> }
>> static inline void list_del(struct list_head *entry)
>> {
>> __list_del(entry->prev, entry->next);
>> entry->next = LIST_POISON1;
>> entry->prev = LIST_POISON2;
>> }
>>
>>
>> static inline void __list_del(struct list_head * prev, struct list_head * next)
>> {
>> next->prev = prev;
>> prev->next = next;
>> }
>>
>> According coredump is trace3.png. The %rdx is ts->ts_list->next
>> (0xdead000000100100, LIST_POISON1), %rax is ts->ts_list->prev
>> (0xdead000000200200, LIST_POISON2). When the “next->prev = prev;”
>> executes, it’s the instruction:
>>
>> 48 89 42 08 mov %rax,0x8(%rdx)
>>
>> The %rdx is the value (0xdead000000100100, LIST_POISON1). So, general
>> protection fault happened. List_del() is the one of the only three
>> points to set LIST_POISON1/2. The other two are hlist_bl_del() and
>> hlist_del(). The root cause has high possibility related to calling
>> __list_del() twice for deleting the ts->ts_list.
>>
>> Detailed analysis:
>>
>> 00000000000057a0 <iscsi_del_ts_from_active_list>:
>> __list_del():
>> /build/buildd/linux-3.13.0/drivers/target/iscsi/iscsi_target_tq.c:50
>> 57a0: e8 00 00 00 00 callq 57a5 <iscsi_del_ts_from_active_list+0
>> x5>
>> list_del():
>> 57a5: 55 push %rbp
>> 57a6: 48 89 e5 mov %rsp,%rbp
>> 57a9: 53 push %rbx
>> 57aa: 48 89 fb mov %rdi,%rbx <--iscsi_thread_set *ts
>> /build/buildd/linux-3.13.0/include/linux/spinlock.h:293
>> 57ad: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
>> 57b4: e8 00 00 00 00 callq 57b9 <iscsi_del_ts_from_active_list+0
>> x19>
>>
>> __list_del(entry->prev, entry->next);
>>
>> /build/buildd/linux-3.13.0/include/linux/list.h:106
>> 57b9: 48 8b 83 c8 00 00 00 mov 0xc8(%rbx),%rax <--ts->ts_list->prev
>> 57c0: 48 8b 93 c0 00 00 00 mov 0xc0(%rbx),%rdx <--ts->ts_list->next
>> iscsi_del_ts_from_active_list():
>> /build/buildd/linux-3.13.0/include/linux/spinlock.h:333
>> 57c7: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
>> /build/buildd/linux-3.13.0/include/linux/list.h:88
>> 57ce: 48 89 42 08 mov %rax,0x8(%rdx) ts->ts_list->next->prev = ts->ts_list->prev
>> spin_unlock():
>> /build/buildd/linux-3.13.0/include/linux/list.h:89
>> 57d2: 48 89 10 mov %rdx,(%rax) ts->ts_list->prev->next = ts->ts_list->next
>>
>> entry->next = LIST_POISON1;
>>
>> /build/buildd/linux-3.13.0/include/linux/list.h:107
>> 57d5: 48 b8 00 01 10 00 00 movabs $0xdead000000100100,%rax
>> 57dc: 00 ad de
>> iscsi_del_ts_from_active_list():
>> 57df: 48 89 83 c0 00 00 00 mov %rax,0xc0(%rbx)
>>
>> entry->prev = LIST_POISON2;
>>
>> iscsi_deallocate_thread_one():
>> /build/buildd/linux-3.13.0/include/linux/list.h:108
>> 57e6: 48 b8 00 02 20 00 00 movabs $0xdead000000200200,%rax
>> 57ed: 00 ad de
>> 57f0: 48 89 83 c8 00 00 00 mov %rax,0xc8(%rbx)
>>
>
> Thanks for your detailed analysis.
>
> A similar bug was reported off-list some months back by a person using
> iser-target + RoCE export on v3.12.y code.  Just to confirm, your
> environment is using traditional iscsi-target + TCP export, right..?

I am sorry that I'm not an expert of the field and already google RoCE
on the internet but still don't really know what RoCE is. However, I
can provide the informations.  We used iscsiadm on the initiator side
and lio_node and tcm_node commands to create the targets for
connection. I think it should be normal  iscsi-target using TCP
export.

>
> At the time, a different set of iser-target related changes ended up
> avoiding this issue on his particular setup, so we thought it was likely
> a race triggered by login failures specific to iser-target code.
>
> There was a untested patch (included inline below) to drop the legacy
> active_ts_list usage all-together, but IIRC he was not able to reproduce
> further so the patch didn't get picked up for mainline.
>
> If your able to reliability reproduce, please try with the following
> patch and let us know your progress.

Thanks for your time reading the mail. I'll let you know the result.

>
> Thank you,
>
> --nab
>
> From 33f211fcf0f4149b13de826dcbe204241f71b2e8 Mon Sep 17 00:00:00 2001
> From: Nicholas Bellinger <nab@linux-iscsi.org>
> Date: Thu, 22 Jan 2015 00:56:53 -0800
> Subject: [PATCH] iscsi-target: Drop problematic active_ts_list usage
>
> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> ---
>  drivers/target/iscsi/iscsi_target_tq.c | 28 +++++-----------------------
>  1 file changed, 5 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/target/iscsi/iscsi_target_tq.c b/drivers/target/iscsi/iscsi_target_tq.c
> index 601e9cc..bb2890e 100644
> --- a/drivers/target/iscsi/iscsi_target_tq.c
> +++ b/drivers/target/iscsi/iscsi_target_tq.c
> @@ -24,36 +24,22 @@
>  #include "iscsi_target_tq.h"
>  #include "iscsi_target.h"
>
> -static LIST_HEAD(active_ts_list);
>  static LIST_HEAD(inactive_ts_list);
> -static DEFINE_SPINLOCK(active_ts_lock);
>  static DEFINE_SPINLOCK(inactive_ts_lock);
>  static DEFINE_SPINLOCK(ts_bitmap_lock);
>
> -static void iscsi_add_ts_to_active_list(struct iscsi_thread_set *ts)
> -{
> -       spin_lock(&active_ts_lock);
> -       list_add_tail(&ts->ts_list, &active_ts_list);
> -       iscsit_global->active_ts++;
> -       spin_unlock(&active_ts_lock);
> -}
> -
>  static void iscsi_add_ts_to_inactive_list(struct iscsi_thread_set *ts)
>  {
> +       if (!list_empty(&ts->ts_list)) {
> +               WARN_ON(1);
> +               return;
> +       }
>         spin_lock(&inactive_ts_lock);
>         list_add_tail(&ts->ts_list, &inactive_ts_list);
>         iscsit_global->inactive_ts++;
>         spin_unlock(&inactive_ts_lock);
>  }
>
> -static void iscsi_del_ts_from_active_list(struct iscsi_thread_set *ts)
> -{
> -       spin_lock(&active_ts_lock);
> -       list_del(&ts->ts_list);
> -       iscsit_global->active_ts--;
> -       spin_unlock(&active_ts_lock);
> -}
> -
>  static struct iscsi_thread_set *iscsi_get_ts_from_inactive_list(void)
>  {
>         struct iscsi_thread_set *ts;
> @@ -66,7 +52,7 @@ static struct iscsi_thread_set *iscsi_get_ts_from_inactive_list(void)
>
>         ts = list_first_entry(&inactive_ts_list, struct iscsi_thread_set, ts_list);
>
> -       list_del(&ts->ts_list);
> +       list_del_init(&ts->ts_list);
>         iscsit_global->inactive_ts--;
>         spin_unlock(&inactive_ts_lock);
>
> @@ -204,8 +190,6 @@ static void iscsi_deallocate_extra_thread_sets(void)
>
>  void iscsi_activate_thread_set(struct iscsi_conn *conn, struct iscsi_thread_set *ts)
>  {
> -       iscsi_add_ts_to_active_list(ts);
> -
>         spin_lock_bh(&ts->ts_state_lock);
>         conn->thread_set = ts;
>         ts->conn = conn;
> @@ -397,7 +381,6 @@ struct iscsi_conn *iscsi_rx_thread_pre_handler(struct iscsi_thread_set *ts)
>
>         if (ts->delay_inactive && (--ts->thread_count == 0)) {
>                 spin_unlock_bh(&ts->ts_state_lock);
> -               iscsi_del_ts_from_active_list(ts);
>
>                 if (!iscsit_global->in_shutdown)
>                         iscsi_deallocate_extra_thread_sets();
> @@ -452,7 +435,6 @@ struct iscsi_conn *iscsi_tx_thread_pre_handler(struct iscsi_thread_set *ts)
>
>         if (ts->delay_inactive && (--ts->thread_count == 0)) {
>                 spin_unlock_bh(&ts->ts_state_lock);
> -               iscsi_del_ts_from_active_list(ts);
>
>                 if (!iscsit_global->in_shutdown)
>                         iscsi_deallocate_extra_thread_sets();
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: General protection fault in iscsi_rx_thread_pre_handler
  2015-01-22 15:56   ` General protection fault in iscsi_rx_thread_pre_handler Gavin Guo
@ 2015-01-22 17:35     ` Nicholas A. Bellinger
  2015-01-23  1:30       ` Gavin Guo
  0 siblings, 1 reply; 9+ messages in thread
From: Nicholas A. Bellinger @ 2015-01-22 17:35 UTC (permalink / raw)
  To: Gavin Guo; +Cc: linux-scsi, target-devel, linux-kernel

On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
> Hi Nicolas,
> 
> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
> <nab@linux-iscsi.org> wrote:
> > Hi Gavin,
> >
> > On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
> >> Hi all,
> >>
> >> The general protection fault screenshot is attached.
> >>
> >> Summary:
> >> The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
> >> the fault is in list_del of iscsi_del_ts_from_active_list. And it
> >> looks like deleting the iscsi_thread_set *ts two times. The point to
> >> delete including iscsi_get_ts_from_inactive_list, was also checked but
> >> still can't find the clue. Really appreciate if anyone can provide any
> >> idea on the bug.
> >>

<SNIP>

> >
> > Thanks for your detailed analysis.
> >
> > A similar bug was reported off-list some months back by a person using
> > iser-target + RoCE export on v3.12.y code.  Just to confirm, your
> > environment is using traditional iscsi-target + TCP export, right..?
> 
> I am sorry that I'm not an expert of the field and already google RoCE
> on the internet but still don't really know what RoCE is. However, I
> can provide the informations.  We used iscsiadm on the initiator side
> and lio_node and tcm_node commands to create the targets for
> connection. I think it should be normal  iscsi-target using TCP
> export.
> 

Yep, that would be traditional iscsi-target + TCP export.

> >
> > At the time, a different set of iser-target related changes ended up
> > avoiding this issue on his particular setup, so we thought it was likely
> > a race triggered by login failures specific to iser-target code.
> >
> > There was a untested patch (included inline below) to drop the legacy
> > active_ts_list usage all-together, but IIRC he was not able to reproduce
> > further so the patch didn't get picked up for mainline.
> >
> > If your able to reliability reproduce, please try with the following
> > patch and let us know your progress.
> 
> Thanks for your time reading the mail. I'll let you know the result.

Just curious, are you able to reliability reproduce this bug in a VM..?

--nab


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: General protection fault in iscsi_rx_thread_pre_handler
  2015-01-22 17:35     ` Nicholas A. Bellinger
@ 2015-01-23  1:30       ` Gavin Guo
  2015-01-30 22:53         ` Nicholas A. Bellinger
  0 siblings, 1 reply; 9+ messages in thread
From: Gavin Guo @ 2015-01-23  1:30 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-scsi, target-devel, linux-kernel

Hi Nicholas,

On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
<nab@linux-iscsi.org> wrote:
> On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
>> Hi Nicolas,
>>
>> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
>> <nab@linux-iscsi.org> wrote:
>> > Hi Gavin,
>> >
>> > On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
>> >> Hi all,
>> >>
>> >> The general protection fault screenshot is attached.
>> >>
>> >> Summary:
>> >> The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
>> >> the fault is in list_del of iscsi_del_ts_from_active_list. And it
>> >> looks like deleting the iscsi_thread_set *ts two times. The point to
>> >> delete including iscsi_get_ts_from_inactive_list, was also checked but
>> >> still can't find the clue. Really appreciate if anyone can provide any
>> >> idea on the bug.
>> >>
>
> <SNIP>
>
>> >
>> > Thanks for your detailed analysis.
>> >
>> > A similar bug was reported off-list some months back by a person using
>> > iser-target + RoCE export on v3.12.y code.  Just to confirm, your
>> > environment is using traditional iscsi-target + TCP export, right..?
>>
>> I am sorry that I'm not an expert of the field and already google RoCE
>> on the internet but still don't really know what RoCE is. However, I
>> can provide the informations.  We used iscsiadm on the initiator side
>> and lio_node and tcm_node commands to create the targets for
>> connection. I think it should be normal  iscsi-target using TCP
>> export.
>>
>
> Yep, that would be traditional iscsi-target + TCP export.
>
>> >
>> > At the time, a different set of iser-target related changes ended up
>> > avoiding this issue on his particular setup, so we thought it was likely
>> > a race triggered by login failures specific to iser-target code.
>> >
>> > There was a untested patch (included inline below) to drop the legacy
>> > active_ts_list usage all-together, but IIRC he was not able to reproduce
>> > further so the patch didn't get picked up for mainline.
>> >
>> > If your able to reliability reproduce, please try with the following
>> > patch and let us know your progress.
>>
>> Thanks for your time reading the mail. I'll let you know the result.
>
> Just curious, are you able to reliability reproduce this bug in a VM..?

Thanks for your caring, the machine is on the customer side, I've
asked and now waiting for their response.

>
> --nab
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: General protection fault in iscsi_rx_thread_pre_handler
  2015-01-23  1:30       ` Gavin Guo
@ 2015-01-30 22:53         ` Nicholas A. Bellinger
  2015-02-01  3:47           ` Gavin Guo
  0 siblings, 1 reply; 9+ messages in thread
From: Nicholas A. Bellinger @ 2015-01-30 22:53 UTC (permalink / raw)
  To: Gavin Guo; +Cc: linux-scsi, target-devel, linux-kernel

On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
> Hi Nicholas,
> 
> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
> <nab@linux-iscsi.org> wrote:
> > On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
> >> Hi Nicolas,
> >>
> >> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
> >> <nab@linux-iscsi.org> wrote:
> >> > Hi Gavin,
> >> >
> >> > On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
> >> >> Hi all,
> >> >>
> >> >> The general protection fault screenshot is attached.
> >> >>
> >> >> Summary:
> >> >> The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
> >> >> the fault is in list_del of iscsi_del_ts_from_active_list. And it
> >> >> looks like deleting the iscsi_thread_set *ts two times. The point to
> >> >> delete including iscsi_get_ts_from_inactive_list, was also checked but
> >> >> still can't find the clue. Really appreciate if anyone can provide any
> >> >> idea on the bug.
> >> >>
> >
> > <SNIP>
> >
> >> >
> >> > Thanks for your detailed analysis.
> >> >
> >> > A similar bug was reported off-list some months back by a person using
> >> > iser-target + RoCE export on v3.12.y code.  Just to confirm, your
> >> > environment is using traditional iscsi-target + TCP export, right..?
> >>
> >> I am sorry that I'm not an expert of the field and already google RoCE
> >> on the internet but still don't really know what RoCE is. However, I
> >> can provide the informations.  We used iscsiadm on the initiator side
> >> and lio_node and tcm_node commands to create the targets for
> >> connection. I think it should be normal  iscsi-target using TCP
> >> export.
> >>
> >
> > Yep, that would be traditional iscsi-target + TCP export.
> >
> >> >
> >> > At the time, a different set of iser-target related changes ended up
> >> > avoiding this issue on his particular setup, so we thought it was likely
> >> > a race triggered by login failures specific to iser-target code.
> >> >
> >> > There was a untested patch (included inline below) to drop the legacy
> >> > active_ts_list usage all-together, but IIRC he was not able to reproduce
> >> > further so the patch didn't get picked up for mainline.
> >> >
> >> > If your able to reliability reproduce, please try with the following
> >> > patch and let us know your progress.
> >>
> >> Thanks for your time reading the mail. I'll let you know the result.
> >
> > Just curious, are you able to reliability reproduce this bug in a VM..?
> 
> Thanks for your caring, the machine is on the customer side, I've
> asked and now waiting for their response.

Hi Gavin,

Just curious if there has been any update on this yet..?

--nab


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: General protection fault in iscsi_rx_thread_pre_handler
  2015-01-30 22:53         ` Nicholas A. Bellinger
@ 2015-02-01  3:47           ` Gavin Guo
  2015-02-03  0:28             ` Gavin Guo
  0 siblings, 1 reply; 9+ messages in thread
From: Gavin Guo @ 2015-02-01  3:47 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-scsi, target-devel, linux-kernel

Hi Nicholas,

On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger
<nab@linux-iscsi.org> wrote:
> On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
>> Hi Nicholas,
>>
>> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
>> <nab@linux-iscsi.org> wrote:
>> > On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
>> >> Hi Nicolas,
>> >>
>> >> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
>> >> <nab@linux-iscsi.org> wrote:
>> >> > Hi Gavin,
>> >> >
>> >> > On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
>> >> >> Hi all,
>> >> >>
>> >> >> The general protection fault screenshot is attached.
>> >> >>
>> >> >> Summary:
>> >> >> The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
>> >> >> the fault is in list_del of iscsi_del_ts_from_active_list. And it
>> >> >> looks like deleting the iscsi_thread_set *ts two times. The point to
>> >> >> delete including iscsi_get_ts_from_inactive_list, was also checked but
>> >> >> still can't find the clue. Really appreciate if anyone can provide any
>> >> >> idea on the bug.
>> >> >>
>> >
>> > <SNIP>
>> >
>> >> >
>> >> > Thanks for your detailed analysis.
>> >> >
>> >> > A similar bug was reported off-list some months back by a person using
>> >> > iser-target + RoCE export on v3.12.y code.  Just to confirm, your
>> >> > environment is using traditional iscsi-target + TCP export, right..?
>> >>
>> >> I am sorry that I'm not an expert of the field and already google RoCE
>> >> on the internet but still don't really know what RoCE is. However, I
>> >> can provide the informations.  We used iscsiadm on the initiator side
>> >> and lio_node and tcm_node commands to create the targets for
>> >> connection. I think it should be normal  iscsi-target using TCP
>> >> export.
>> >>
>> >
>> > Yep, that would be traditional iscsi-target + TCP export.
>> >
>> >> >
>> >> > At the time, a different set of iser-target related changes ended up
>> >> > avoiding this issue on his particular setup, so we thought it was likely
>> >> > a race triggered by login failures specific to iser-target code.
>> >> >
>> >> > There was a untested patch (included inline below) to drop the legacy
>> >> > active_ts_list usage all-together, but IIRC he was not able to reproduce
>> >> > further so the patch didn't get picked up for mainline.
>> >> >
>> >> > If your able to reliability reproduce, please try with the following
>> >> > patch and let us know your progress.
>> >>
>> >> Thanks for your time reading the mail. I'll let you know the result.
>> >
>> > Just curious, are you able to reliability reproduce this bug in a VM..?
>>
>> Thanks for your caring, the machine is on the customer side, I've
>> asked and now waiting for their response.
>
> Hi Gavin,
>
> Just curious if there has been any update on this yet..?
>
> --nab
>

Really thanks for your attention. I'm also currently waiting for the
customer's reply and will send the email again to ask for the result.
However, I think the symptom may be hard to replicate that's why the
customer didn't reply me for a long time. Thanks for your time again.

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: General protection fault in iscsi_rx_thread_pre_handler
  2015-02-01  3:47           ` Gavin Guo
@ 2015-02-03  0:28             ` Gavin Guo
  2015-02-12  7:16               ` Nicholas A. Bellinger
  0 siblings, 1 reply; 9+ messages in thread
From: Gavin Guo @ 2015-02-03  0:28 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-scsi, target-devel, linux-kernel

Hi Nicholas,

On Sun, Feb 1, 2015 at 11:47 AM, Gavin Guo <gavin.guo@canonical.com> wrote:
> Hi Nicholas,
>
> On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger
> <nab@linux-iscsi.org> wrote:
>> On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
>>> Hi Nicholas,
>>>
>>> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
>>> <nab@linux-iscsi.org> wrote:
>>> > On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
>>> >> Hi Nicolas,
>>> >>
>>> >> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
>>> >> <nab@linux-iscsi.org> wrote:
>>> >> > Hi Gavin,
>>> >> >
>>> >> > On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
>>> >> >> Hi all,
>>> >> >>
>>> >> >> The general protection fault screenshot is attached.
>>> >> >>
>>> >> >> Summary:
>>> >> >> The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
>>> >> >> the fault is in list_del of iscsi_del_ts_from_active_list. And it
>>> >> >> looks like deleting the iscsi_thread_set *ts two times. The point to
>>> >> >> delete including iscsi_get_ts_from_inactive_list, was also checked but
>>> >> >> still can't find the clue. Really appreciate if anyone can provide any
>>> >> >> idea on the bug.
>>> >> >>
>>> >
>>> > <SNIP>
>>> >
>>> >> >
>>> >> > Thanks for your detailed analysis.
>>> >> >
>>> >> > A similar bug was reported off-list some months back by a person using
>>> >> > iser-target + RoCE export on v3.12.y code.  Just to confirm, your
>>> >> > environment is using traditional iscsi-target + TCP export, right..?
>>> >>
>>> >> I am sorry that I'm not an expert of the field and already google RoCE
>>> >> on the internet but still don't really know what RoCE is. However, I
>>> >> can provide the informations.  We used iscsiadm on the initiator side
>>> >> and lio_node and tcm_node commands to create the targets for
>>> >> connection. I think it should be normal  iscsi-target using TCP
>>> >> export.
>>> >>
>>> >
>>> > Yep, that would be traditional iscsi-target + TCP export.
>>> >
>>> >> >
>>> >> > At the time, a different set of iser-target related changes ended up
>>> >> > avoiding this issue on his particular setup, so we thought it was likely
>>> >> > a race triggered by login failures specific to iser-target code.
>>> >> >
>>> >> > There was a untested patch (included inline below) to drop the legacy
>>> >> > active_ts_list usage all-together, but IIRC he was not able to reproduce
>>> >> > further so the patch didn't get picked up for mainline.
>>> >> >
>>> >> > If your able to reliability reproduce, please try with the following
>>> >> > patch and let us know your progress.
>>> >>
>>> >> Thanks for your time reading the mail. I'll let you know the result.
>>> >
>>> > Just curious, are you able to reliability reproduce this bug in a VM..?
>>>
>>> Thanks for your caring, the machine is on the customer side, I've
>>> asked and now waiting for their response.
>>
>> Hi Gavin,
>>
>> Just curious if there has been any update on this yet..?
>>
>> --nab
>>
>
> Really thanks for your attention. I'm also currently waiting for the
> customer's reply and will send the email again to ask for the result.
> However, I think the symptom may be hard to replicate that's why the
> customer didn't reply me for a long time. Thanks for your time again.
>
> Thanks,
> Gavin

Sorry for making you wait so long. I just got the response from the
customer, they said the general protection fault happened just 2 times
in the past and cannot be reliably reproduced. And I am now waiting
for the verification test.

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: General protection fault in iscsi_rx_thread_pre_handler
  2015-02-03  0:28             ` Gavin Guo
@ 2015-02-12  7:16               ` Nicholas A. Bellinger
  2015-02-16 10:52                 ` Gavin Guo
  0 siblings, 1 reply; 9+ messages in thread
From: Nicholas A. Bellinger @ 2015-02-12  7:16 UTC (permalink / raw)
  To: Gavin Guo; +Cc: linux-scsi, target-devel, linux-kernel

Hi Gavin,

On Tue, 2015-02-03 at 08:28 +0800, Gavin Guo wrote:
> Hi Nicholas,
> 
> On Sun, Feb 1, 2015 at 11:47 AM, Gavin Guo <gavin.guo@canonical.com> wrote:
> > Hi Nicholas,
> >
> > On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger
> > <nab@linux-iscsi.org> wrote:
> >> On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
> >>> Hi Nicholas,
> >>>
> >>> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
> >>> <nab@linux-iscsi.org> wrote:
> >>> > On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
> >>> >> Hi Nicolas,
> >>> >>
> >>> >> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger

<SNIP>

> >>> >> > At the time, a different set of iser-target related changes ended up
> >>> >> > avoiding this issue on his particular setup, so we thought it was likely
> >>> >> > a race triggered by login failures specific to iser-target code.
> >>> >> >
> >>> >> > There was a untested patch (included inline below) to drop the legacy
> >>> >> > active_ts_list usage all-together, but IIRC he was not able to reproduce
> >>> >> > further so the patch didn't get picked up for mainline.
> >>> >> >
> >>> >> > If your able to reliability reproduce, please try with the following
> >>> >> > patch and let us know your progress.
> >>> >>
> >>> >> Thanks for your time reading the mail. I'll let you know the result.
> >>> >
> >>> > Just curious, are you able to reliability reproduce this bug in a VM..?
> >>>
> >>> Thanks for your caring, the machine is on the customer side, I've
> >>> asked and now waiting for their response.
> >>
> >> Hi Gavin,
> >>
> >> Just curious if there has been any update on this yet..?
> >>
> >> --nab
> >>
> >
> > Really thanks for your attention. I'm also currently waiting for the
> > customer's reply and will send the email again to ask for the result.
> > However, I think the symptom may be hard to replicate that's why the
> > customer didn't reply me for a long time. Thanks for your time again.
> >
> > Thanks,
> > Gavin
> 
> Sorry for making you wait so long. I just got the response from the
> customer, they said the general protection fault happened just 2 times
> in the past and cannot be reliably reproduced. And I am now waiting
> for the verification test.
> 

Just a heads up that I'm planning to include this patch in the v3.20-rc1
PULL request.

Please let me know if you have any objections.

Thank you,

--nab


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: General protection fault in iscsi_rx_thread_pre_handler
  2015-02-12  7:16               ` Nicholas A. Bellinger
@ 2015-02-16 10:52                 ` Gavin Guo
  2015-02-16 10:56                   ` Gavin Guo
  0 siblings, 1 reply; 9+ messages in thread
From: Gavin Guo @ 2015-02-16 10:52 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-scsi, target-devel, linux-kernel

Hi Nicholas,

On Thu, Feb 12, 2015 at 3:16 PM, Nicholas A. Bellinger
<nab@linux-iscsi.org> wrote:
> Hi Gavin,
>
> On Tue, 2015-02-03 at 08:28 +0800, Gavin Guo wrote:
>> Hi Nicholas,
>>
>> On Sun, Feb 1, 2015 at 11:47 AM, Gavin Guo <gavin.guo@canonical.com> wrote:
>> > Hi Nicholas,
>> >
>> > On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger
>> > <nab@linux-iscsi.org> wrote:
>> >> On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
>> >>> Hi Nicholas,
>> >>>
>> >>> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
>> >>> <nab@linux-iscsi.org> wrote:
>> >>> > On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
>> >>> >> Hi Nicolas,
>> >>> >>
>> >>> >> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
>
> <SNIP>
>
>> >>> >> > At the time, a different set of iser-target related changes ended up
>> >>> >> > avoiding this issue on his particular setup, so we thought it was likely
>> >>> >> > a race triggered by login failures specific to iser-target code.
>> >>> >> >
>> >>> >> > There was a untested patch (included inline below) to drop the legacy
>> >>> >> > active_ts_list usage all-together, but IIRC he was not able to reproduce
>> >>> >> > further so the patch didn't get picked up for mainline.
>> >>> >> >
>> >>> >> > If your able to reliability reproduce, please try with the following
>> >>> >> > patch and let us know your progress.
>> >>> >>
>> >>> >> Thanks for your time reading the mail. I'll let you know the result.
>> >>> >
>> >>> > Just curious, are you able to reliability reproduce this bug in a VM..?
>> >>>
>> >>> Thanks for your caring, the machine is on the customer side, I've
>> >>> asked and now waiting for their response.
>> >>
>> >> Hi Gavin,
>> >>
>> >> Just curious if there has been any update on this yet..?
>> >>
>> >> --nab
>> >>
>> >
>> > Really thanks for your attention. I'm also currently waiting for the
>> > customer's reply and will send the email again to ask for the result.
>> > However, I think the symptom may be hard to replicate that's why the
>> > customer didn't reply me for a long time. Thanks for your time again.
>> >
>> > Thanks,
>> > Gavin
>>
>> Sorry for making you wait so long. I just got the response from the
>> customer, they said the general protection fault happened just 2 times
>> in the past and cannot be reliably reproduced. And I am now waiting
>> for the verification test.
>>
>
> Just a heads up that I'm planning to include this patch in the v3.20-rc1
> PULL request.
>
> Please let me know if you have any objections.
>
> Thank you,
>
> --nab
>

The bug

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: General protection fault in iscsi_rx_thread_pre_handler
  2015-02-16 10:52                 ` Gavin Guo
@ 2015-02-16 10:56                   ` Gavin Guo
  0 siblings, 0 replies; 9+ messages in thread
From: Gavin Guo @ 2015-02-16 10:56 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-scsi, target-devel, linux-kernel

Hi Nicholas,

On Mon, Feb 16, 2015 at 6:52 PM, Gavin Guo <gavin.guo@canonical.com> wrote:
> Hi Nicholas,
>
> On Thu, Feb 12, 2015 at 3:16 PM, Nicholas A. Bellinger
> <nab@linux-iscsi.org> wrote:
>> Hi Gavin,
>>
>> On Tue, 2015-02-03 at 08:28 +0800, Gavin Guo wrote:
>>> Hi Nicholas,
>>>
>>> On Sun, Feb 1, 2015 at 11:47 AM, Gavin Guo <gavin.guo@canonical.com> wrote:
>>> > Hi Nicholas,
>>> >
>>> > On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger
>>> > <nab@linux-iscsi.org> wrote:
>>> >> On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
>>> >>> Hi Nicholas,
>>> >>>
>>> >>> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
>>> >>> <nab@linux-iscsi.org> wrote:
>>> >>> > On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
>>> >>> >> Hi Nicolas,
>>> >>> >>
>>> >>> >> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
>>
>> <SNIP>
>>
>>> >>> >> > At the time, a different set of iser-target related changes ended up
>>> >>> >> > avoiding this issue on his particular setup, so we thought it was likely
>>> >>> >> > a race triggered by login failures specific to iser-target code.
>>> >>> >> >
>>> >>> >> > There was a untested patch (included inline below) to drop the legacy
>>> >>> >> > active_ts_list usage all-together, but IIRC he was not able to reproduce
>>> >>> >> > further so the patch didn't get picked up for mainline.
>>> >>> >> >
>>> >>> >> > If your able to reliability reproduce, please try with the following
>>> >>> >> > patch and let us know your progress.
>>> >>> >>
>>> >>> >> Thanks for your time reading the mail. I'll let you know the result.
>>> >>> >
>>> >>> > Just curious, are you able to reliability reproduce this bug in a VM..?
>>> >>>
>>> >>> Thanks for your caring, the machine is on the customer side, I've
>>> >>> asked and now waiting for their response.
>>> >>
>>> >> Hi Gavin,
>>> >>
>>> >> Just curious if there has been any update on this yet..?
>>> >>
>>> >> --nab
>>> >>
>>> >
>>> > Really thanks for your attention. I'm also currently waiting for the
>>> > customer's reply and will send the email again to ask for the result.
>>> > However, I think the symptom may be hard to replicate that's why the
>>> > customer didn't reply me for a long time. Thanks for your time again.
>>> >
>>> > Thanks,
>>> > Gavin
>>>
>>> Sorry for making you wait so long. I just got the response from the
>>> customer, they said the general protection fault happened just 2 times
>>> in the past and cannot be reliably reproduced. And I am now waiting
>>> for the verification test.
>>>
>>
>> Just a heads up that I'm planning to include this patch in the v3.20-rc1
>> PULL request.
>>
>> Please let me know if you have any objections.
>>
>> Thank you,
>>
>> --nab
>>
>
> The bug

Sorry, I mistakenly press the send button last time.

The bug doesn't appear after the customer upgraded the kernel with the
patch. Really thanks for your help. I'll keep you posted if the bug
appears again.

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-02-16 10:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CA+eFSM1iir5QYsUALigzwcCPiqDGaNKBVq22unydN+-uTxZD2Q@mail.gmail.com>
     [not found] ` <1421920228.7061.48.camel@haakon3.risingtidesystems.com>
2015-01-22 15:56   ` General protection fault in iscsi_rx_thread_pre_handler Gavin Guo
2015-01-22 17:35     ` Nicholas A. Bellinger
2015-01-23  1:30       ` Gavin Guo
2015-01-30 22:53         ` Nicholas A. Bellinger
2015-02-01  3:47           ` Gavin Guo
2015-02-03  0:28             ` Gavin Guo
2015-02-12  7:16               ` Nicholas A. Bellinger
2015-02-16 10:52                 ` Gavin Guo
2015-02-16 10:56                   ` Gavin Guo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).