stgt.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Handle access of a target that has been removed
@ 2015-10-13 18:10 Lee Duncan
  2015-10-17 12:57 ` FUJITA Tomonori
  0 siblings, 1 reply; 7+ messages in thread
From: Lee Duncan @ 2015-10-13 18:10 UTC (permalink / raw)
  To: stgt

Hello:

I recently got a report of a tgtd core dump from our opencloud
group. The stack trace showed that a strcmp against a NULL was causing
the failure:

----------------------------------------------------------------
Program terminated with signal 11, Segmentation fault.
(gdb) bt
#0  0x00007fa701817576 in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x0000000000408012 in target_find_by_name (
    name=0x6ac16f "iqn.2010-10.org.openstack:volume-e812c705-80bc-4064-a84c-5559cda8b1ca") at iscsi/target.c:216
#2  0x0000000000406042 in login_start (conn=0x6abea8) at iscsi/iscsid.c:478
#3  0x0000000000406e77 in cmnd_exec_login (conn=<optimized out>)
    at iscsi/iscsid.c:654
#4  cmnd_execute (conn=<optimized out>) at iscsi/iscsid.c:914
#5  iscsi_rx_handler (conn=0x6abea8) at iscsi/iscsid.c:2064
#6  0x0000000000409e98 in iscsi_tcp_event_handler (fd=<optimized out>, 
    events=1, data=0x63a480 <target_list>) at iscsi/iscsi_tcp.c:158
#7  0x0000000000418f1e in event_loop () at tgtd.c:272
#8  0x0000000000419405 in main (argc=1, argv=<optimized out>) at tgtd.c:438
----------------------------------------------------------------

It looks like target_find_by_name() uses tgt_targetname(), but doesn't
account for the fact that it can return a NULL when the target being
looked up does not (now) exist:

----------------------------------------------------------------
char *tgt_targetname(int tid)            
{               
        struct target *target;      
                        
        target = target_lookup(tid);
        if (!target)
                return NULL;
        
        return target->name;
}                                   
----------------------------------------------------------------

I propose the following patch to address this:

--- 
diff --git a/usr/iscsi/target.c b/usr/iscsi/target.c
index 7afd7fd0ba87..b0a36e7de7b4 100644
--- a/usr/iscsi/target.c
+++ b/usr/iscsi/target.c
@@ -369,9 +369,11 @@ void target_list_build(struct iscsi_connection *conn, char *addr, char *name)
 struct iscsi_target *target_find_by_name(const char *name)
 {
        struct iscsi_target *target;
+       char *tname;
 
        list_for_each_entry(target, &iscsi_targets_list, tlist) {
-               if (!strcmp(tgt_targetname(target->tid), name))
+               tname = tgt_targetname(target->tid);
+               if (tname && !strcmp(tname, name))
                        return target;
        }
 

-- 
Lee Duncan

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] Handle access of a target that has been removed
  2015-10-13 18:10 [PATCH] Handle access of a target that has been removed Lee Duncan
@ 2015-10-17 12:57 ` FUJITA Tomonori
  2015-10-19 15:56   ` Lee Duncan
  0 siblings, 1 reply; 7+ messages in thread
From: FUJITA Tomonori @ 2015-10-17 12:57 UTC (permalink / raw)
  To: lduncan; +Cc: stgt

On Tue, 13 Oct 2015 11:10:17 -0700
Lee Duncan <lduncan@suse.com> wrote:

> Hello:
> 
> I recently got a report of a tgtd core dump from our opencloud
> group. The stack trace showed that a strcmp against a NULL was causing
> the failure:
> 
> ----------------------------------------------------------------
> Program terminated with signal 11, Segmentation fault.
> (gdb) bt
> #0  0x00007fa701817576 in __strcmp_sse42 () from /lib64/libc.so.6
> #1  0x0000000000408012 in target_find_by_name (
>     name=0x6ac16f "iqn.2010-10.org.openstack:volume-e812c705-80bc-4064-a84c-5559cda8b1ca") at iscsi/target.c:216
> #2  0x0000000000406042 in login_start (conn=0x6abea8) at iscsi/iscsid.c:478
> #3  0x0000000000406e77 in cmnd_exec_login (conn=<optimized out>)
>     at iscsi/iscsid.c:654
> #4  cmnd_execute (conn=<optimized out>) at iscsi/iscsid.c:914
> #5  iscsi_rx_handler (conn=0x6abea8) at iscsi/iscsid.c:2064
> #6  0x0000000000409e98 in iscsi_tcp_event_handler (fd=<optimized out>, 
>     events=1, data=0x63a480 <target_list>) at iscsi/iscsi_tcp.c:158
> #7  0x0000000000418f1e in event_loop () at tgtd.c:272
> #8  0x0000000000419405 in main (argc=1, argv=<optimized out>) at tgtd.c:438
> ----------------------------------------------------------------
> 
> It looks like target_find_by_name() uses tgt_targetname(), but doesn't
> account for the fact that it can return a NULL when the target being
> looked up does not (now) exist:

Thanks for the patch. But I'm confused why this happens. target with
the same tid as iscsi_target must exist?

> ----------------------------------------------------------------
> char *tgt_targetname(int tid)            
> {               
>         struct target *target;      
>                         
>         target = target_lookup(tid);
>         if (!target)
>                 return NULL;
>         
>         return target->name;
> }                                   
> ----------------------------------------------------------------
> 
> I propose the following patch to address this:
> 
> --- 
> diff --git a/usr/iscsi/target.c b/usr/iscsi/target.c
> index 7afd7fd0ba87..b0a36e7de7b4 100644
> --- a/usr/iscsi/target.c
> +++ b/usr/iscsi/target.c
> @@ -369,9 +369,11 @@ void target_list_build(struct iscsi_connection *conn, char *addr, char *name)
>  struct iscsi_target *target_find_by_name(const char *name)
>  {
>         struct iscsi_target *target;
> +       char *tname;
>  
>         list_for_each_entry(target, &iscsi_targets_list, tlist) {
> -               if (!strcmp(tgt_targetname(target->tid), name))
> +               tname = tgt_targetname(target->tid);
> +               if (tname && !strcmp(tname, name))
>                         return target;
>         }
>  
> 
> -- 
> Lee Duncan
> --
> To unsubscribe from this list: send the line "unsubscribe stgt" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Handle access of a target that has been removed
  2015-10-17 12:57 ` FUJITA Tomonori
@ 2015-10-19 15:56   ` Lee Duncan
  2015-10-28  5:28     ` FUJITA Tomonori
  0 siblings, 1 reply; 7+ messages in thread
From: Lee Duncan @ 2015-10-19 15:56 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: stgt

On 10/17/2015 05:57 AM, FUJITA Tomonori wrote:
> On Tue, 13 Oct 2015 11:10:17 -0700
> Lee Duncan <lduncan@suse.com> wrote:
> 
>> Hello:
>>
>> I recently got a report of a tgtd core dump from our opencloud
>> group. The stack trace showed that a strcmp against a NULL was causing
>> the failure:
>>
>> ----------------------------------------------------------------
>> Program terminated with signal 11, Segmentation fault.
>> (gdb) bt
>> #0  0x00007fa701817576 in __strcmp_sse42 () from /lib64/libc.so.6
>> #1  0x0000000000408012 in target_find_by_name (
>>     name=0x6ac16f "iqn.2010-10.org.openstack:volume-e812c705-80bc-4064-a84c-5559cda8b1ca") at iscsi/target.c:216
>> #2  0x0000000000406042 in login_start (conn=0x6abea8) at iscsi/iscsid.c:478
>> #3  0x0000000000406e77 in cmnd_exec_login (conn=<optimized out>)
>>     at iscsi/iscsid.c:654
>> #4  cmnd_execute (conn=<optimized out>) at iscsi/iscsid.c:914
>> #5  iscsi_rx_handler (conn=0x6abea8) at iscsi/iscsid.c:2064
>> #6  0x0000000000409e98 in iscsi_tcp_event_handler (fd=<optimized out>, 
>>     events=1, data=0x63a480 <target_list>) at iscsi/iscsi_tcp.c:158
>> #7  0x0000000000418f1e in event_loop () at tgtd.c:272
>> #8  0x0000000000419405 in main (argc=1, argv=<optimized out>) at tgtd.c:438
>> ----------------------------------------------------------------
>>
>> It looks like target_find_by_name() uses tgt_targetname(), but doesn't
>> account for the fact that it can return a NULL when the target being
>> looked up does not (now) exist:
> 
> Thanks for the patch. But I'm confused why this happens. target with
> the same tid as iscsi_target must exist?
> 

No, I believe this is happening because the targets are getting
dynamically removed. But I will verify that. Because if tgt_targetname()
can return a NULL (as apparently it did this time), there are probably
other places in the code that need to check for that.

-- 
Lee Duncan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Handle access of a target that has been removed
  2015-10-19 15:56   ` Lee Duncan
@ 2015-10-28  5:28     ` FUJITA Tomonori
  2015-11-13 19:15       ` Lee Duncan
  0 siblings, 1 reply; 7+ messages in thread
From: FUJITA Tomonori @ 2015-10-28  5:28 UTC (permalink / raw)
  To: lduncan; +Cc: stgt

On Mon, 19 Oct 2015 08:56:06 -0700
Lee Duncan <lduncan@suse.com> wrote:

> On 10/17/2015 05:57 AM, FUJITA Tomonori wrote:
>> On Tue, 13 Oct 2015 11:10:17 -0700
>> Lee Duncan <lduncan@suse.com> wrote:
>> 
>>> Hello:
>>>
>>> I recently got a report of a tgtd core dump from our opencloud
>>> group. The stack trace showed that a strcmp against a NULL was causing
>>> the failure:
>>>
>>> ----------------------------------------------------------------
>>> Program terminated with signal 11, Segmentation fault.
>>> (gdb) bt
>>> #0  0x00007fa701817576 in __strcmp_sse42 () from /lib64/libc.so.6
>>> #1  0x0000000000408012 in target_find_by_name (
>>>     name=0x6ac16f "iqn.2010-10.org.openstack:volume-e812c705-80bc-4064-a84c-5559cda8b1ca") at iscsi/target.c:216
>>> #2  0x0000000000406042 in login_start (conn=0x6abea8) at iscsi/iscsid.c:478
>>> #3  0x0000000000406e77 in cmnd_exec_login (conn=<optimized out>)
>>>     at iscsi/iscsid.c:654
>>> #4  cmnd_execute (conn=<optimized out>) at iscsi/iscsid.c:914
>>> #5  iscsi_rx_handler (conn=0x6abea8) at iscsi/iscsid.c:2064
>>> #6  0x0000000000409e98 in iscsi_tcp_event_handler (fd=<optimized out>, 
>>>     events=1, data=0x63a480 <target_list>) at iscsi/iscsi_tcp.c:158
>>> #7  0x0000000000418f1e in event_loop () at tgtd.c:272
>>> #8  0x0000000000419405 in main (argc=1, argv=<optimized out>) at tgtd.c:438
>>> ----------------------------------------------------------------
>>>
>>> It looks like target_find_by_name() uses tgt_targetname(), but doesn't
>>> account for the fact that it can return a NULL when the target being
>>> looked up does not (now) exist:
>> 
>> Thanks for the patch. But I'm confused why this happens. target with
>> the same tid as iscsi_target must exist?
>> 
> 
> No, I believe this is happening because the targets are getting
> dynamically removed. But I will verify that. Because if tgt_targetname()
> can return a NULL (as apparently it did this time), there are probably
> other places in the code that need to check for that.

Ok, I'm still confused but applied the patch. Let's see if it would
help.

Thanks,

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Handle access of a target that has been removed
  2015-10-28  5:28     ` FUJITA Tomonori
@ 2015-11-13 19:15       ` Lee Duncan
  2015-11-26  5:06         ` Lee Duncan
  0 siblings, 1 reply; 7+ messages in thread
From: Lee Duncan @ 2015-11-13 19:15 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: stgt

On 10/28/2015 06:28 AM, FUJITA Tomonori wrote:
> On Mon, 19 Oct 2015 08:56:06 -0700
> Lee Duncan <lduncan@suse.com> wrote:
> 
>> On 10/17/2015 05:57 AM, FUJITA Tomonori wrote:
>>> On Tue, 13 Oct 2015 11:10:17 -0700
>>> Lee Duncan <lduncan@suse.com> wrote:
>>>
>>>> Hello:
>>>>
>>>> I recently got a report of a tgtd core dump from our opencloud
>>>> group. The stack trace showed that a strcmp against a NULL was causing
>>>> the failure:
>>>>
>>>> ----------------------------------------------------------------
>>>> Program terminated with signal 11, Segmentation fault.
>>>> (gdb) bt
>>>> #0  0x00007fa701817576 in __strcmp_sse42 () from /lib64/libc.so.6
>>>> #1  0x0000000000408012 in target_find_by_name (
>>>>     name=0x6ac16f "iqn.2010-10.org.openstack:volume-e812c705-80bc-4064-a84c-5559cda8b1ca") at iscsi/target.c:216
>>>> #2  0x0000000000406042 in login_start (conn=0x6abea8) at iscsi/iscsid.c:478
>>>> #3  0x0000000000406e77 in cmnd_exec_login (conn=<optimized out>)
>>>>     at iscsi/iscsid.c:654
>>>> #4  cmnd_execute (conn=<optimized out>) at iscsi/iscsid.c:914
>>>> #5  iscsi_rx_handler (conn=0x6abea8) at iscsi/iscsid.c:2064
>>>> #6  0x0000000000409e98 in iscsi_tcp_event_handler (fd=<optimized out>, 
>>>>     events=1, data=0x63a480 <target_list>) at iscsi/iscsi_tcp.c:158
>>>> #7  0x0000000000418f1e in event_loop () at tgtd.c:272
>>>> #8  0x0000000000419405 in main (argc=1, argv=<optimized out>) at tgtd.c:438
>>>> ----------------------------------------------------------------
>>>>
>>>> It looks like target_find_by_name() uses tgt_targetname(), but doesn't
>>>> account for the fact that it can return a NULL when the target being
>>>> looked up does not (now) exist:
>>>
>>> Thanks for the patch. But I'm confused why this happens. target with
>>> the same tid as iscsi_target must exist?
>>>
>>
>> No, I believe this is happening because the targets are getting
>> dynamically removed. But I will verify that. Because if tgt_targetname()
>> can return a NULL (as apparently it did this time), there are probably
>> other places in the code that need to check for that.
> 
> Ok, I'm still confused but applied the patch. Let's see if it would
> help.
> 
> Thanks,

I am still trying to look more deeply into how this could happen.

This bug was triggered when using cloud storage as the back-end for for
their target, and that cloud storage "goes away". (I am still trying to
determine what that actually means.) If so, I should be able to simulate
by allowing my back-end storage to go away.

As you say, let's see if we see any other instances of tgt_targetname()
returning null, in other spots in the code.

Thank you.
-- 
Lee Duncan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Handle access of a target that has been removed
  2015-11-13 19:15       ` Lee Duncan
@ 2015-11-26  5:06         ` Lee Duncan
  2015-11-27  5:56           ` FUJITA Tomonori
  0 siblings, 1 reply; 7+ messages in thread
From: Lee Duncan @ 2015-11-26  5:06 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: stgt

Hi:

Response below ...

On 11/13/2015 11:15 AM, Lee Duncan wrote:
> On 10/28/2015 06:28 AM, FUJITA Tomonori wrote:
>> On Mon, 19 Oct 2015 08:56:06 -0700
>> Lee Duncan <lduncan@suse.com> wrote:
>>
>>> On 10/17/2015 05:57 AM, FUJITA Tomonori wrote:
>>>> On Tue, 13 Oct 2015 11:10:17 -0700
>>>> Lee Duncan <lduncan@suse.com> wrote:
>>>>
>>>>> Hello:
>>>>>

>>>>> I recently got a report of a tgtd core dump from our opencloud
>>>>> group. The stack trace showed that a strcmp against a NULL was causing
>>>>> the failure:
>>>>>
>>>>> ...
>>>>>
>>>>> It looks like target_find_by_name() uses tgt_targetname(), but doesn't
>>>>> account for the fact that it can return a NULL when the target being
>>>>> looked up does not (now) exist:
>>>>
>>>> Thanks for theI supplied a patch you appliedI supplied a patch you applied patch. But I'm confused why this happens. target with
>>>> the same tid as iscsi_target must exist?
>>>>
>>>
>>> No, I believe this is happening because the targets are getting
>>> dynamically removed. But I will verify that. Because if tgt_targetname()
>>> can return a NULL (as apparently it did this time), there are probably
>>> other places in the code that need to check for that.
>>
>> Ok, I'm still confused but applied the patch. Let's see if it would
>> help.
>>
>> Thanks,
> 
> I am still trying to look more deeply into how this could happen.
> 
> This bug was triggered when using cloud storage as the back-end for for
> their target, and that cloud storage "goes away". (I am still trying to
> determine what that actually means.) If so, I should be able to simulate
> by allowing my back-end storage to go away.
> 
> As you say, let's see if we see any other instances of tgt_targetname()
> returning null, in other spots in the code.
> 

I thought that you might like to know that I finally tested this. I
created a test bed where I randomly created and deleted 4 target nodes
by changing the configuration files then running "tgt-admin --update
ALL -f".

At the same time, I randomly attempted to log out of then back into
all 4 targets.

It only takes a minute to hit the case the patch I sent fixes, where
target_find_by_name() dies when tgt_targetname() returns an unexpected
NULL. It was nice to validate that the patch was actually needed.

You might remember, though, that I was worried that other places where
tgt_targetname() where used might need to check for NULL as well. So
I tested for that.

Surprisingly, none of the other places that uses tgt_targetname() ever
got NULL back. I tested for several hours. So either I just didn't hit
the corner case needed to tickle such a bug, or that just doesn't
happen. I would love to investigate further, but I may not have the
time to get to it soon. In the mean time, it appears we are in good
shape.
-- 
Lee Duncan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Handle access of a target that has been removed
  2015-11-26  5:06         ` Lee Duncan
@ 2015-11-27  5:56           ` FUJITA Tomonori
  0 siblings, 0 replies; 7+ messages in thread
From: FUJITA Tomonori @ 2015-11-27  5:56 UTC (permalink / raw)
  To: lduncan; +Cc: stgt

Hi,

On Wed, 25 Nov 2015 21:06:14 -0800
Lee Duncan <lduncan@suse.com> wrote:

> I thought that you might like to know that I finally tested this. I

Yeah, thanks a lot for the input!

> created a test bed where I randomly created and deleted 4 target nodes
> by changing the configuration files then running "tgt-admin --update
> ALL -f".
> 
> At the same time, I randomly attempted to log out of then back into
> all 4 targets.
> 
> It only takes a minute to hit the case the patch I sent fixes, where
> target_find_by_name() dies when tgt_targetname() returns an unexpected
> NULL. It was nice to validate that the patch was actually needed.
> 
> You might remember, though, that I was worried that other places where
> tgt_targetname() where used might need to check for NULL as well. So
> I tested for that.
> 
> Surprisingly, none of the other places that uses tgt_targetname() ever
> got NULL back. I tested for several hours. So either I just didn't hit
> the corner case needed to tickle such a bug, or that just doesn't
> happen. I would love to investigate further, but I may not have the
> time to get to it soon. In the mean time, it appears we are in good
> shape.

I see. I try to think more.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-11-27  5:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-13 18:10 [PATCH] Handle access of a target that has been removed Lee Duncan
2015-10-17 12:57 ` FUJITA Tomonori
2015-10-19 15:56   ` Lee Duncan
2015-10-28  5:28     ` FUJITA Tomonori
2015-11-13 19:15       ` Lee Duncan
2015-11-26  5:06         ` Lee Duncan
2015-11-27  5:56           ` FUJITA Tomonori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).