* lockup in rb_get_reader_page
@ 2010-03-30 23:27 Jiaying Zhang
  2010-03-31  0:35 ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Jiaying Zhang @ 2010-03-30 23:27 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, Michael Rubin, David Sharp, linux-kernel

Hi Steven,

We recently saw some cpu lockups while running kernel tracing.
The problem started to happen after we synced up our ring buffer
code with the upstream lockless ring buffer change. It usually
took many hours of heavy trace load to hit this problem. When
the lockup happened, the problematic cpu seemed to be stuck in an
infinite loop, trying to grab the head_page in rb_get_reader_page().

We would like to check with you on whether this is a known issue.
If so, do we have a bug fix? If not, do you have any suggestions on
where we should check?

Thanks a lot!

Jiaying

* Re: lockup in rb_get_reader_page
  2010-03-30 23:27 lockup in rb_get_reader_page Jiaying Zhang
@ 2010-03-31  0:35 ` Steven Rostedt
  2010-03-31  4:01   ` Jiaying Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2010-03-31  0:35 UTC (permalink / raw)
  To: Jiaying Zhang
  Cc: Ingo Molnar, Michael Rubin, David Sharp, linux-kernel, Steven Rostedt

Hi Jiaying,


On Tue, 2010-03-30 at 16:27 -0700, Jiaying Zhang wrote:
> Hi Steven,
> 
> We recently saw some cpu lockups while running kernel tracing.
> The problem started to happen after we synced up our ring buffer
> code with the upstream lockless ring buffer change. It usually
> took many hours of heavy trace load to hit this problem. When
> the lockup happened, the problematic cpu seemed to be stuck in an
> infinite loop, trying to grab the head_page in rb_get_reader_page().
> 
> We would like to check with you on whether this is a known issue.
> If so, do we have a bug fix? If not, do you have any suggestions on
> where we should check?

I'm unaware of any problems with the ring buffer. I've been running the
lockless version for over a year now, and hammering it with very
intensive tracing.

Now, you see this on the reader side. There have been a few recent fixes
for problems that can occur when we have multiple readers. Ftrace usage
does not usually encounter multiple readers, so I have not had issues.
But Li Zefan had a stress test that did find and trigger the problems.

Are you using the latest ring buffer that is in Linus's tree? Are you
resetting the ring buffer while reading it?

I guess I need to know more exactly what you are doing to understand the
problem.

Thanks,

-- Steve



* Re: lockup in rb_get_reader_page
  2010-03-31  0:35 ` Steven Rostedt
@ 2010-03-31  4:01   ` Jiaying Zhang
  2010-04-01  0:28     ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Jiaying Zhang @ 2010-03-31  4:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Michael Rubin, David Sharp, linux-kernel, Steven Rostedt

Thanks a lot for your quick reply!

On Tue, Mar 30, 2010 at 5:35 PM, Steven Rostedt <srostedt@redhat.com> wrote:
> Hi Jiaying,
>
>
> On Tue, 2010-03-30 at 16:27 -0700, Jiaying Zhang wrote:
>> Hi Steven,
>>
>> We recently saw some cpu lockups while running kernel tracing.
>> The problem started to happen after we synced up our ring buffer
>> code with the upstream lockless ring buffer change. It usually
>> took many hours of heavy trace load to hit this problem. When
>> the lockup happened, the problematic cpu seemed to be stuck in an
>> infinite loop, trying to grab the head_page in rb_get_reader_page().
>>
>> We would like to check with you on whether this is a known issue.
>> If so, do we have a bug fix? If not, do you have any suggestions on
>> where we should check?
>
> I'm unaware of any problems with the ring buffer. I've been running the
> lockless version for over a year now, and hammering it with very
> intensive tracing.
>
> Now, you see this on the reader side. There have been a few recent fixes
> for problems that can occur when we have multiple readers. Ftrace usage
> does not usually encounter multiple readers, so I have not had issues.
> But Li Zefan had a stress test that did find and trigger the problems.
>
I looked at Li Zefan's patch, but I think it fixes a lockup issue in
ftrace. We are not using ftrace right now; we still use our own kernel
trace wrapper built on top of the ring buffer. We also only use one reader.
I suspect it is more likely a race between the trace reader and the
trace writer.

> Are you using the latest ring buffer that is in Linus's tree? Are you
> resetting the ring buffer while reading it?
>
No. We don't allow resetting the buffer while there is an active reader,
and I don't think we ever reset the ring buffer on the machines that hit
this problem.

> I guess I need to know more exactly what you are doing to understand the
> problem.
>
I am going to patch rb_get_reader_page() to print out some debugging
messages when it enters an infinite loop. I will keep you updated
if I find any interesting info.
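
Roughly, the instrumentation I have in mind looks like this inside the
retry loop (just a sketch, not the final patch; the dump of the page
pointers is elided):

	/* loops counts passes through the retry loop in rb_get_reader_page() */
	if (++loops > 100) {
		printk(KERN_WARNING "cpu %d may deadlock, loops %d\n",
		       cpu_buffer->cpu, loops);
		/* dump reader_page, head_page, tail_page, commit_page here */
	}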

Thanks a lot!

Jiaying

> Thanks,
>
> -- Steve
>
>
>

* Re: lockup in rb_get_reader_page
  2010-03-31  4:01   ` Jiaying Zhang
@ 2010-04-01  0:28     ` Steven Rostedt
  2010-04-01  1:14       ` Jiaying Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2010-04-01  0:28 UTC (permalink / raw)
  To: Jiaying Zhang
  Cc: Steven Rostedt, Ingo Molnar, Michael Rubin, David Sharp, linux-kernel

On Tue, 2010-03-30 at 21:01 -0700, Jiaying Zhang wrote:

> I am going to patch rb_get_reader_page() to print out some debugging
> messages when it enters an infinite loop. I will keep you updated
> if I find any interesting info.

I was just thinking about this some more. One restriction of the
ring_buffer is that a reader can not preempt a writer, which means that
a read can not happen in an interrupt or NMI. If a read happens there
and preempts a writer while the writer is moving the head page, the read
will go into an infinite loop waiting for the writer to finish its move.

You can only do reads from interrupt context if the writer disables
interrupts before reserving and committing the data.
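
Concretely, the write side would need to look something like this to make
reads from interrupt context safe (just a sketch of the usual pattern; the
"buffer" pointer and the payload are made up for the example, this is not
code from the tree):

	struct ring_buffer_event *event;
	unsigned long flags;
	u32 *entry;

	/*
	 * Keep the reserve/commit window atomic with respect to this
	 * CPU's interrupt handlers, so a read from interrupt context
	 * can never land in the middle of a head-page move.
	 */
	local_irq_save(flags);
	event = ring_buffer_lock_reserve(buffer, sizeof(*entry));
	if (event) {
		entry = ring_buffer_event_data(event);
		*entry = 42;		/* example payload */
		ring_buffer_unlock_commit(buffer, event);
	}
	local_irq_restore(flags);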

Are you doing reads from interrupt context?

-- Steve



* Re: lockup in rb_get_reader_page
  2010-04-01  0:28     ` Steven Rostedt
@ 2010-04-01  1:14       ` Jiaying Zhang
  2010-04-01  1:23         ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Jiaying Zhang @ 2010-04-01  1:14 UTC (permalink / raw)
  To: rostedt
  Cc: Steven Rostedt, Ingo Molnar, Michael Rubin, David Sharp, linux-kernel

On Wed, Mar 31, 2010 at 5:28 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2010-03-30 at 21:01 -0700, Jiaying Zhang wrote:
>
>> I am going to patch rb_get_reader_page() to print out some debugging
>> messages when it enters an infinite loop. I will keep you updated
>> if I find any interesting info.
>
> I was just thinking about this some more. One restriction of the
> ring_buffer is that a reader can not preempt a writer, which means that
> a read can not happen in an interrupt or NMI. If a read happens there
> and preempts a writer while the writer is moving the head page, the read
> will go into an infinite loop waiting for the writer to finish its move.
>
> You can only do reads from interrupt context if the writer disables
> interrupts before reserving and committing the data.
>
> Are you doing reads from interrupt context?
>
No. We are using a user-space reader that reads the ring buffer via a
syscall or ioctl. But even though we don't read from interrupt context,
as I understand it, we can still have a reader on cpu B reading cpu A's
buffer while the writer on cpu A is moving the head page. Will we have
any problem in that case?

Jiaying

> -- Steve
>
>
>

* Re: lockup in rb_get_reader_page
  2010-04-01  1:14       ` Jiaying Zhang
@ 2010-04-01  1:23         ` Steven Rostedt
  2010-04-02 23:10           ` Jiaying Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2010-04-01  1:23 UTC (permalink / raw)
  To: Jiaying Zhang
  Cc: Steven Rostedt, Ingo Molnar, Michael Rubin, David Sharp, linux-kernel

On Wed, 2010-03-31 at 18:14 -0700, Jiaying Zhang wrote:
> On Wed, Mar 31, 2010 at 5:28 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Are you doing reads from interrupt context?
> >
> No. We are using a user-space reader that reads the ring buffer via a
> syscall or ioctl. But even though we don't read from interrupt context,
> as I understand it, we can still have a reader on cpu B reading cpu A's
> buffer while the writer on cpu A is moving the head page. Will we have
> any problem in that case?

Nope, unless that write hangs for some reason ;-)

The read may spin waiting for a writer to finish, which is why it just can
not preempt a writer from interrupt context: it could then spin forever,
because it would be blocking the very writer it is waiting on. But if the
writer is on another CPU, then all is well. In fact, the ring buffer was
designed with that in mind.

-- Steve


* Re: lockup in rb_get_reader_page
  2010-04-01  1:23         ` Steven Rostedt
@ 2010-04-02 23:10           ` Jiaying Zhang
  2010-04-03 12:45             ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Jiaying Zhang @ 2010-04-02 23:10 UTC (permalink / raw)
  To: rostedt
  Cc: Steven Rostedt, Ingo Molnar, Michael Rubin, David Sharp, linux-kernel

I finally reproduced the problem again with an instrumented kernel. Here are
the debugging messages logged by the code I added right after the spin label
in rb_get_reader_page():

<4>cpu 0 may deadlock, loops 100
<4>reader pgoff 11 list ffff8100025c9c40 prev ffff8100025c9b00 next ffff8100025c9a41
<4>reader_page pgoff 11 list ffff8100025c9c40 prev ffff8100025c9b00 next ffff8100025c9a41
<4>head_page pgoff 11 list ffff8100025c9c40 prev ffff8100025c9b00 next ffff8100025c9a41
<4>tail_page pgoff 3 list ffff8100025c9a40 prev ffff8100025c9d00 next ffff8100025c9e40
<4>commit_page pgoff 3 list ffff8100025c9a40 prev ffff8100025c9d00 next ffff8100025c9e40
<4>page offset 4 list ffff8100025c9b00 prev ffff8100025c9c80 next ffff8100025c9d00
<4>page offset 12 list ffff8100025c9d00 prev ffff8100025c9b00 next ffff8100025c9a40
<4>page offset 3 list ffff8100025c9a40 prev ffff8100025c9d00 next ffff8100025c9e40
<4>page offset 17 list ffff8100025c9e40 prev ffff8100025c9a40 next ffff8100025c9d80
<4>page offset 14 list ffff8100025c9d80 prev ffff8100025c9e40 next ffff8100025c9cc0
<4>page offset 10 list ffff8100025c9cc0 prev ffff8100025c9d80 next ffff8100025c9b80
<4>page offset 7 list ffff8100025c9b80 prev ffff8100025c9cc0 next ffff8100025c9ac0
<4>page offset 5 list ffff8100025c9ac0 prev ffff8100025c9b80 next ffff8100025c9a00
<4>page offset 1 list ffff8100025c9a00 prev ffff8100025c9ac0 next ffff8100025c9d40
<4>page offset 13 list ffff8100025c9d40 prev ffff8100025c9a00 next ffff8100025c9e00
<4>page offset 15 list ffff8100025c9e00 prev ffff8100025c9d40 next ffff8100025c9a80
<4>page offset 2 list ffff8100025c9a80 prev ffff8100025c9e00 next ffff8100025c9bc0
<4>page offset 6 list ffff8100025c9bc0 prev ffff8100025c9a80 next ffff8100025c9dc0
<4>page offset 16 list ffff8100025c9dc0 prev ffff8100025c9bc0 next ffff8100025c99c0
<4>page offset 0 list ffff8100025c99c0 prev ffff8100025c9dc0 next ffff8100025c9c00
<4>page offset 8 list ffff8100025c9c00 prev ffff8100025c99c0 next ffff8100025c9c80
<4>page offset 9 list ffff8100025c9c80 prev ffff8100025c9c00 next ffff8100025c9b00

The page offset is the index we added in the buffer_page structure.
You can ignore this field. The interesting part here is that both
cpu_buffer->head_page and cpu_buffer->reader_page point to the
same buffer_page. I am not sure yet how we entered this situation,
but the problem is once we get here, we will be in an infinite loop.

At the beginning of the spin loop, we call rb_set_head_page() to grab
the head_page. In that function, we check whether a page is the head_page
with rb_is_head_page(). The problem is that rb_is_head_page() may
return RB_PAGE_MOVED if the head_page has changed to another
page, and that is what has happened as the above messages show.
Shouldn't we just return 0 in case that head_page has moved so that
we can move to the next page in the loop inside rb_set_head_page()?
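
For reference, the check I am referring to looks roughly like this in our
tree (paraphrased from memory, not a verbatim copy of the source):

	/*
	 * Sketch of rb_is_head_page(): the HEAD/UPDATE flag lives in the
	 * low bits of the previous page's next pointer.  If that pointer
	 * no longer points at the page we asked about, the function
	 * reports RB_PAGE_MOVED instead of a flag value.
	 */
	static int rb_is_head_page(struct ring_buffer_per_cpu *cpu_buffer,
				   struct buffer_page *page,
				   struct list_head *list)
	{
		unsigned long val = (unsigned long)list->next;

		if ((val & ~RB_FLAG_MASK) != (unsigned long)&page->list)
			return RB_PAGE_MOVED;

		return val & RB_FLAG_MASK;
	}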

Jiaying

On Wed, Mar 31, 2010 at 6:23 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 2010-03-31 at 18:14 -0700, Jiaying Zhang wrote:
>> On Wed, Mar 31, 2010 at 5:28 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> >
>> > Are you doing reads from interrupt context?
>> >
>> No. We are using a user-space reader that reads the ring buffer via a
>> syscall or ioctl. But even though we don't read from interrupt context,
>> as I understand it, we can still have a reader on cpu B reading cpu A's
>> buffer while the writer on cpu A is moving the head page. Will we have
>> any problem in that case?
>
> Nope, unless that write hangs for some reason ;-)
>
> The read may spin waiting for a writer to finish, which is why it just can
> not preempt a writer from interrupt context: it could then spin forever,
> because it would be blocking the very writer it is waiting on. But if the
> writer is on another CPU, then all is well. In fact, the ring buffer was
> designed with that in mind.
>
> -- Steve
>
>

* Re: lockup in rb_get_reader_page
  2010-04-02 23:10           ` Jiaying Zhang
@ 2010-04-03 12:45             ` Steven Rostedt
  2010-04-06 21:12               ` Jiaying Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2010-04-03 12:45 UTC (permalink / raw)
  To: Jiaying Zhang
  Cc: Steven Rostedt, Ingo Molnar, Michael Rubin, David Sharp, linux-kernel

On Fri, 2010-04-02 at 16:10 -0700, Jiaying Zhang wrote:

> The page offset is the index we added in the buffer_page structure.
> You can ignore this field. The interesting part here is that both
> cpu_buffer->head_page and cpu_buffer->reader_page point to the
> same buffer_page. I am not sure yet how we entered this situation,

You can ignore cpu_buffer->head_page; it is used as a reference and
is not part of the main algorithm. It is just there to tell the reader
where the last head page was.

> but the problem is once we get here, we will be in an infinite loop.

But yes, it should never point to the reader page, because the reader
controls the head_page __and__ the reader page.

> 
> At the beginning of the spin loop, we call rb_set_head_page() to grab
> the head_page. In that function, we check whether a page is the head_page
> with rb_is_head_page(). The problem is that rb_is_head_page() may
> return RB_PAGE_MOVED if the head_page has changed to another
> page, and that is what has happened as the above messages show.

I don't see where it said that.

If RB_PAGE_MOVED is returned in rb_set_head_page() then something is very
broken, because that is only returned when the reader has moved the head
page. And since we only allow one reader at a time (we have locks to
protect that), and rb_set_head_page() is only called by the reader, this
would mean another reader is reading the ring buffer.

I should add a:

	if ((ret = rb_is_head_page(cpu_buffer, page, page->list.prev))) {
		RB_WARN_ON(ret == RB_PAGE_MOVED);
		cpu_buffer->head_page = page;
		return page;
	}


> Shouldn't we just return 0 in case that head_page has moved so that
> we can move to the next page in the loop inside rb_set_head_page()?

No, when the reader moves the page, the RB_PAGE_MOVED forces the writer
to go into the conflict path (conflict between writer and reader).

-- Steve



* Re: lockup in rb_get_reader_page
  2010-04-03 12:45             ` Steven Rostedt
@ 2010-04-06 21:12               ` Jiaying Zhang
  0 siblings, 0 replies; 9+ messages in thread
From: Jiaying Zhang @ 2010-04-06 21:12 UTC (permalink / raw)
  To: rostedt
  Cc: Steven Rostedt, Ingo Molnar, Michael Rubin, David Sharp, linux-kernel

Hi Steven,

On Sat, Apr 3, 2010 at 5:45 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 2010-04-02 at 16:10 -0700, Jiaying Zhang wrote:
>
>> The page offset is the index we added in the buffer_page structure.
>> You can ignore this field. The interesting part here is that both
>> cpu_buffer->head_page and cpu_buffer->reader_page point to the
>> same buffer_page. I am not sure yet how we entered this situation,
>
> You can ignore cpu_buffer->head_page; it is used as a reference and
> is not part of the main algorithm. It is just there to tell the reader
> where the last head page was.
>
>> but the problem is once we get here, we will be in an infinite loop.
>
> But yes, it should never point to the reader page, because the reader
> controls the head_page __and__ the reader page.
>
>>
>> At the beginning of the spin loop, we call rb_set_head_page() to grab
>> the head_page. In that function, we check whether a page is the head_page
>> with rb_is_head_page(). The problem is that rb_is_head_page() may
>> return RB_PAGE_MOVED if the head_page has changed to another
>> page, and that is what has happened as the above messages show.
>
> I don't see where it said that.
>
> If RB_PAGE_MOVED is returned in rb_set_head_page() then something is very
> broken, because that is only returned when the reader has moved the head
> page. And since we only allow one reader at a time (we have locks to
> protect that), and rb_set_head_page() is only called by the reader, this
> would mean another reader is reading the ring buffer.
>
> I should add a:
>
>        if ((ret = rb_is_head_page(cpu_buffer, page, page->list.prev))) {
>                RB_WARN_ON(ret == RB_PAGE_MOVED);
>                cpu_buffer->head_page = page;
>                return page;
>        }
>
>
I added the RB_WARN_ON(ret == RB_PAGE_MOVED) in rb_set_head_page()
as you suggested, and I think it has helped me figure out the problem.

I saw a warning triggered by this WARN_ON this morning and realized
that although we are not doing reads from interrupt context, we sometimes
call ring_buffer_empty() from a timer interrupt handler that checks whether
new data has arrived in the trace buffer and, if so, wakes up the
user-space reader. ring_buffer_empty() calls rb_set_head_page(),
which can move the head_page. As far as I understand, it should be
ok to have ring_buffer_empty() preempt a writer, so I guess we should leave
that RB_WARN_ON out of rb_set_head_page(). The problem in our case
is that we use our own locking mechanism to guarantee a single reader
instead of using the cpu_buffer->reader_lock, so the reader is not
synchronized with ring_buffer_empty(). So when ring_buffer_empty() is
called while we are in the process of swapping the reader_page and
head_page, the head_page pointer can point to the old head, i.e., the new
reader_page, and we will enter an infinite loop.

I wrapped our rb_get_reader_page() calls with the cpu_buffer->reader_lock
spinlock, and it seems to have solved the problem.
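
In case the details are useful, the change on our side is essentially this
(a sketch of our wrapper code, not anything upstream; "reader" is just the
local variable our consuming path uses):

	unsigned long flags;

	/*
	 * Take the ring buffer's own reader_lock around the page swap so
	 * it is serialized with the other users of head_page, instead of
	 * relying only on our private reader lock.
	 */
	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
	reader = rb_get_reader_page(cpu_buffer);
	/* ... copy events out of the reader page ... */
	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);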

Thank you very much for the help!

Jiaying

>> Shouldn't we just return 0 in case that head_page has moved so that
>> we can move to the next page in the loop inside rb_set_head_page()?
>
> No, when the reader moves the page, the RB_PAGE_MOVED forces the writer
> to go into the conflict path (conflict between writer and reader).
>
> -- Steve
>
>
>

Thread overview: 9 messages
2010-03-30 23:27 lockup in rb_get_reader_page Jiaying Zhang
2010-03-31  0:35 ` Steven Rostedt
2010-03-31  4:01   ` Jiaying Zhang
2010-04-01  0:28     ` Steven Rostedt
2010-04-01  1:14       ` Jiaying Zhang
2010-04-01  1:23         ` Steven Rostedt
2010-04-02 23:10           ` Jiaying Zhang
2010-04-03 12:45             ` Steven Rostedt
2010-04-06 21:12               ` Jiaying Zhang
