From: Jens Axboe <axboe@kernel.dk>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Andrea Arcangeli <aarcange@redhat.com>, Jan Kara <jack@suse.cz>,
	dm-devel@redhat.com, linux-kernel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	kosaki.motohiro@jp.fujitsu.com, linux-fsdevel@vger.kernel.org,
	lwoodman@redhat.com, "Alasdair G. Kergon" <agk@redhat.com>
Subject: Re: [PATCH 0/4] Fix a crash when block device is read and block size is changed at the same time
Date: Tue, 25 Sep 2012 20:11:30 +0200
Message-ID: <5061F3D2.6050502@kernel.dk>
In-Reply-To: <5061F0E6.5000403@kernel.dk>

On 2012-09-25 19:59, Jens Axboe wrote:
> On 2012-09-25 19:49, Jeff Moyer wrote:
>> Jeff Moyer <jmoyer@redhat.com> writes:
>>
>>> Mikulas Patocka <mpatocka@redhat.com> writes:
>>>
>>>> Hi Jeff
>>>>
>>>> Thanks for testing.
>>>>
>>>> It would be interesting ... what happens if you take the patch 3, leave 
>>>> "struct percpu_rw_semaphore bd_block_size_semaphore" in "struct 
>>>> block_device", but remove any use of the semaphore from fs/block_dev.c? - 
>>>> will the performance be like unpatched kernel or like patch 3? It could be 
>>>> that the change in the alignment affects performance on your CPU too, just 
>>>> differently than on my CPU.
>>>
>>> It turns out to be exactly the same performance as with the 3rd patch
>>> applied, so I guess it does have something to do with cache alignment.
>>> Here is the patch (against vanilla) I ended up testing.  Let me know if
>>> I've botched it somehow.
>>>
>>> So, next up I'll play similar tricks to what you did (padding struct
>>> block_device in all kernels) to eliminate the differences due to
>>> structure alignment and provide a clear picture of what the locking
>>> effects are.
>>
>> After trying again with the same padding you used in the struct
>> bdev_inode, I see no performance differences between any of the
>> patches.  I tried bumping up the number of threads to saturate the
>> number of cpus on a single NUMA node on my hardware, but that resulted
>> in lower IOPS to the device, and hence consumption of less CPU time.
>> So, I believe my results to be inconclusive.
>>
>> After talking with Vivek about the problem, he had mentioned that it
>> might be worth investigating whether bd_block_size could be protected
>> using SRCU.  I looked into it, and the one thing I couldn't reconcile is
>> updating both the bd_block_size and the inode->i_blkbits at the same
>> time.  It would involve (afaiui) adding fields to both the inode and the
>> block_device data structures and using rcu_assign_pointer and
>> rcu_dereference to modify and access the fields, and both fields would
>> need to be protected by the same struct srcu_struct.  I'm not sure whether
>> that's a desirable approach.  When I started to implement it, it got
>> ugly pretty quickly.  What do others think?
>>
>> For now, my preference is to get the full patch set in.  I will continue
>> to investigate the performance impact of the data structure size changes
>> that I've been seeing.
>>
>> So, for the four patches:
>>
>> Acked-by: Jeff Moyer <jmoyer@redhat.com>
>>
>> Jens, can you have a look at the patch set?  We are seeing problem
>> reports of this in the wild[1][2].
> 
> I'll queue it up for 3.7. I can run my regular testing on the 8-way, it
> has a knack for showing scaling problems very nicely in aio/dio. As long
> as we're not adding per-inode cache line dirtying per IO (and the
> per-cpu rw sem looks OK), then I don't think there's too much to worry
> about.

I take that back. The series doesn't apply to my current tree. Not too
unexpected, since it's some weeks old. But more importantly, please send
this as a "real" patch series. I don't want to see two implementations
of rw semaphores. I think it's perfectly fine to first do a regular rw
sem, then a last patch adding the cache friendly variant from Eric and
converting to that.

In other words, get rid of 3/4.

-- 
Jens Axboe


Thread overview: 49+ messages
2012-06-28  3:04 Crash when IO is being submitted and block size is changed Mikulas Patocka
2012-06-28 11:15 ` Jan Kara
2012-06-28 15:44   ` Mikulas Patocka
2012-06-28 16:53     ` Jan Kara
2012-07-16  0:55   ` Mikulas Patocka
2012-07-17 19:19     ` Jeff Moyer
2012-07-19  2:27       ` Mikulas Patocka
2012-07-19 13:33         ` Jeff Moyer
2012-07-28 16:40           ` [PATCH 1/3] Fix " Mikulas Patocka
2012-07-28 16:41             ` [PATCH 2/3] Introduce percpu rw semaphores Mikulas Patocka
2012-07-28 16:42               ` [PATCH 3/3] blockdev: turn a rw semaphore into a percpu rw semaphore Mikulas Patocka
2012-07-28 20:44               ` [PATCH 2/3] Introduce percpu rw semaphores Eric Dumazet
2012-07-29  5:13                 ` [dm-devel] " Mikulas Patocka
2012-07-29 10:10                   ` Eric Dumazet
2012-07-29 18:36                     ` Eric Dumazet
2012-08-01 20:07                       ` Mikulas Patocka
2012-08-01 20:09                       ` [PATCH 4/3] " Mikulas Patocka
2012-08-31 18:40                         ` [PATCH 0/4] Fix a crash when block device is read and block size is changed at the same time Mikulas Patocka
2012-08-31 18:41                           ` [PATCH 1/4] Add a lock that will be needed by the next patch Mikulas Patocka
2012-08-31 18:42                             ` [PATCH 2/4] blockdev: fix a crash when block size is changed and I/O is issued simultaneously Mikulas Patocka
2012-08-31 18:43                               ` [PATCH 3/4] blockdev: turn a rw semaphore into a percpu rw semaphore Mikulas Patocka
2012-08-31 18:43                                 ` [PATCH 4/4] New percpu lock implementation Mikulas Patocka
2012-08-31 19:27                           ` [PATCH 0/4] Fix a crash when block device is read and block size is changed at the same time Mikulas Patocka
2012-08-31 20:11                             ` Jeff Moyer
2012-08-31 20:34                               ` Mikulas Patocka
2012-09-17 21:19                               ` Jeff Moyer
2012-09-17 21:19                                 ` Jeff Moyer
2012-09-18 17:04                                 ` Mikulas Patocka
2012-09-18 17:22                                   ` Jeff Moyer
2012-09-18 18:55                                     ` Mikulas Patocka
2012-09-18 18:58                                       ` Jeff Moyer
2012-09-18 20:11                                   ` Jeff Moyer
2012-09-25 17:49                                     ` Jeff Moyer
2012-09-25 17:59                                       ` Jens Axboe
2012-09-25 17:59                                         ` Jens Axboe
2012-09-25 18:11                                         ` Jens Axboe [this message]
2012-09-25 18:11                                           ` Jens Axboe
2012-09-25 22:49                                           ` [PATCH 1/2] " Mikulas Patocka
2012-09-26  5:48                                             ` Jens Axboe
2012-09-26  5:48                                               ` Jens Axboe
2012-11-16 22:02                                             ` Jeff Moyer
2012-09-25 22:50                                           ` [PATCH 2/2] " Mikulas Patocka
2012-09-25 22:58                                       ` [PATCH 0/4] " Mikulas Patocka
2012-09-26 13:47                                         ` Jeff Moyer
2012-09-26 14:35                                           ` Mikulas Patocka
2012-07-30 17:00                   ` [dm-devel] [PATCH 2/3] Introduce percpu rw semaphores Paul E. McKenney
2012-07-31  0:00                     ` Mikulas Patocka
2012-08-01 17:15                       ` Paul E. McKenney
2012-06-29  6:25 ` Crash when IO is being submitted and block size is changed Vyacheslav Dubeyko
