* A lot of flush requests to the backing device
@ 2021-11-05 11:21 Aleksei Zakharov
  2021-11-08  5:38 ` Dongdong Tao
  0 siblings, 1 reply; 6+ messages in thread

From: Aleksei Zakharov @ 2021-11-05 11:21 UTC (permalink / raw)
To: linux-bcache

Hi all,

I've used bcache a lot for the last three years, mostly in writeback mode with ceph, and I faced a strange behavior. When there's a heavy write load on the bcache device with a lot of fsync()/fdatasync() requests, the bcache device issues a lot of flush requests to the backing device. If the writeback rate is low, there might be hundreds of flush requests per second issued to the backing device. If the writeback rate grows, the latency of the flush requests increases. As a result, the latency of the bcache device increases and the application experiences higher disk latency. So this behavior of bcache slows down the application's I/O requests when the writeback rate becomes high.

This workload pattern with a lot of fsync()/fdatasync() requests is common for latency-sensitive applications, and it seems that this bcache behavior slows down this type of workload.

As I understand it, if a write request with REQ_PREFLUSH is issued to the bcache device, bcache issues a new empty write request with REQ_PREFLUSH to the backing device. What is the purpose of this behavior? It looks like it might be eliminated for better performance.

--
Regards,
Aleksei Zakharov
alexzzz.ru

^ permalink raw reply [flat|nested] 6+ messages in thread
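[Editor's sketch: the flush amplification Aleksei describes can be shown with a toy model. This is purely illustrative - the class and method names are invented for this sketch and are not bcache's actual implementation. The point it demonstrates: if every REQ_PREFLUSH write absorbed by the writeback cache also triggers an empty flush bio to the backing device, an fsync-heavy workload produces one backing-device flush per fsync, even when almost no data is being written back.]

```python
class Device:
    """Counts the requests a block device would receive."""
    def __init__(self, name):
        self.name = name
        self.writes = 0    # data writes
        self.flushes = 0   # flush (REQ_PREFLUSH) requests

    def submit(self, nbytes, preflush=False):
        if preflush:
            self.flushes += 1
        if nbytes:
            self.writes += 1

class CachedDevice:
    """Toy model of a bcache device in writeback mode."""
    def __init__(self, cache, backing):
        self.cache = cache
        self.backing = backing

    def write(self, nbytes, preflush=False):
        # The data itself lands on the cache device (writeback mode).
        self.cache.submit(nbytes, preflush=preflush)
        if preflush:
            # An *empty* flush request is additionally forwarded to the
            # backing device, even though no data was written there.
            self.backing.submit(0, preflush=True)

cache, backing = Device("ssd"), Device("hdd")
bdev = CachedDevice(cache, backing)

# An fsync()-heavy workload: 300 small writes, each followed by a flush.
for _ in range(300):
    bdev.write(4096)               # the write itself
    bdev.write(0, preflush=True)   # the flush issued by fsync()

print(backing.writes, backing.flushes)  # -> 0 300
```

The backing HDD sees zero data writes but 300 flushes per pass, matching the "hundreds of flush requests per second" observed in iostat when the writeback rate is minimal.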
* Re: A lot of flush requests to the backing device
  2021-11-05 11:21 A lot of flush requests to the backing device Aleksei Zakharov
@ 2021-11-08  5:38 ` Dongdong Tao
  2021-11-08  6:35   ` Kai Krakow
  2021-11-10 14:35   ` A lot of flush requests to the backing device Aleksei Zakharov
  0 siblings, 2 replies; 6+ messages in thread

From: Dongdong Tao @ 2021-11-08  5:38 UTC (permalink / raw)
To: Aleksei Zakharov; +Cc: linux-bcache

[Sorry for the Spam detection ... ]

Hi Aleksei,

This is a very interesting finding. I understand that ceph bluestore will issue fdatasync requests when it tries to flush data or metadata (via bluefs) to the OSD device, but I'm surprised to see how much pressure it can bring to the backing device.

May I know how you measure the number of flush requests per second that bcache sends to the backing device with the REQ_PREFLUSH flag? (ftrace on some bcache tracepoint?)

My understanding is that bcache doesn't need to wait for the flush requests to be completed by the backing device in order to finish the write request, since it uses a new "flush" bio for the backing device. So I don't think this will increase the fdatasync latency as long as the write can be performed in writeback mode. It does increase the read latency if the read I/O missed the cache.

Or maybe I am missing something. Let me know how you observed the latency increase at the bcache layer, as I would like to do some experiments as well.

Regards,
Dongdong

On Fri, Nov 5, 2021 at 7:21 PM Aleksei Zakharov <zakharov.a.g@yandex.ru> wrote:
>
> Hi all,
>
> I've used bcache a lot for the last three years, mostly in writeback mode with ceph, and I faced a strange behavior. When there's a heavy write load on the bcache device with a lot of fsync()/fdatasync() requests, the bcache device issues a lot of flush requests to the backing device. If the writeback rate is low, there might be hundreds of flush requests per second issued to the backing device.
>
> If the writeback rate grows, the latency of the flush requests increases. As a result, the latency of the bcache device increases and the application experiences higher disk latency. So this behavior of bcache slows down the application's I/O requests when the writeback rate becomes high.
>
> This workload pattern with a lot of fsync()/fdatasync() requests is common for latency-sensitive applications, and it seems that this bcache behavior slows down this type of workload.
>
> As I understand it, if a write request with REQ_PREFLUSH is issued to the bcache device, bcache issues a new empty write request with REQ_PREFLUSH to the backing device. What is the purpose of this behavior? It looks like it might be eliminated for better performance.
>
> --
> Regards,
> Aleksei Zakharov
> alexzzz.ru
* Re: A lot of flush requests to the backing device
  2021-11-08  5:38 ` Dongdong Tao
@ 2021-11-08  6:35   ` Kai Krakow
  2021-11-08  8:11     ` Coly Li
  1 sibling, 1 reply; 6+ messages in thread

From: Kai Krakow @ 2021-11-08  6:35 UTC (permalink / raw)
To: Dongdong Tao; +Cc: Aleksei Zakharov, linux-bcache

On Mon, Nov 8, 2021 at 06:38 Dongdong Tao <dongdong.tao@canonical.com> wrote:
>
> My understanding is that bcache doesn't need to wait for the flush
> requests to be completed by the backing device in order to finish
> the write request, since it uses a new "flush" bio for the backing
> device.

That's probably true for requests going to the writeback cache. But requests that bypass the cache must also pass the flush request to the backing device - otherwise it would violate transactional guarantees. bcache still guarantees the presence of the dirty data when it later replays all dirty data to the backing device (and it could probably reduce flushes here, flushing only just before removing the writeback log from its cache).

Personally, I've turned writeback caching off due to increasingly high latencies as seen by applications [1]. Writes may be slower throughput-wise, but overall latency is lower, which "feels" faster.

I wonder if maybe a lot of writes with flush requests bypass the cache...

That said, initial releases of bcache felt a lot smoother here. But I'd like to add that I only ever used it for desktop workflows; I never used ceph.

Regards,
Kai

[1]: And some odd behavior where bcache would detach dirty caches on caching-device problems, which happens for me sometimes at reboot just after bcache was detected (probably due to an SSD firmware hiccup: the device temporarily goes missing and re-appears) - and then all dirty data is lost and discarded. In consequence, on the next reboot, the cache mode is set to "none" and the devices need to be re-attached. But until then, the dirty data is long gone.
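[Editor's sketch: Kai's distinction between the two write paths can be made concrete with a small model. Again, this is purely illustrative - the function and names are invented here, not taken from bcache. The idea: a write that bypasses the cache must carry its REQ_PREFLUSH straight to the backing device or durability is lost, while a write absorbed by the writeback cache only needs the cache device flushed, because the dirty data is replayed to the backing device later.]

```python
def route_write(bypass, preflush):
    """Return (device, operation) pairs a single write turns into.

    Illustrative model of the two paths discussed above, not bcache code.
    """
    op = "write+flush" if preflush else "write"
    if bypass:
        # Data goes straight to the backing device; the flush must
        # travel with it to preserve transactional guarantees.
        return [("backing", op)]
    # Data lands in the writeback cache; flushing the cache device is
    # enough for durability, since dirty data is replayed (and can be
    # flushed) later during writeback.
    return [("cache", op)]

print(route_write(bypass=True, preflush=True))   # -> [('backing', 'write+flush')]
print(route_write(bypass=False, preflush=True))  # -> [('cache', 'write+flush')]
```

Under this model, only bypassed flush writes would need to touch the backing device on the fsync path, which is the optimization the thread is circling around.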
* Re: A lot of flush requests to the backing device
  2021-11-08  6:35   ` Kai Krakow
@ 2021-11-08  8:11     ` Coly Li
  2021-11-08 11:29       ` Latency, performance, detach behavior (was: A lot of flush requests to the backing device) Kai Krakow
  0 siblings, 1 reply; 6+ messages in thread

From: Coly Li @ 2021-11-08  8:11 UTC (permalink / raw)
To: Kai Krakow; +Cc: Aleksei Zakharov, Dongdong Tao, linux-bcache

On 11/8/21 2:35 PM, Kai Krakow wrote:
> On Mon, Nov 8, 2021 at 06:38 Dongdong Tao <dongdong.tao@canonical.com> wrote:
>> My understanding is that bcache doesn't need to wait for the flush
>> requests to be completed by the backing device in order to finish
>> the write request, since it uses a new "flush" bio for the backing
>> device.
> That's probably true for requests going to the writeback cache. But
> requests that bypass the cache must also pass the flush request to the
> backing device - otherwise it would violate transactional guarantees.
> bcache still guarantees the presence of the dirty data when it later
> replays all dirty data to the backing device (and it could probably
> reduce flushes here, flushing only just before removing the writeback
> log from its cache).
>
> Personally, I've turned writeback caching off due to increasingly high
> latencies as seen by applications [1]. Writes may be slower
> throughput-wise, but overall latency is lower, which "feels" faster.
>
> I wonder if maybe a lot of writes with flush requests bypass the cache...
>
> That said, initial releases of bcache felt a lot smoother here. But
> I'd like to add that I only ever used it for desktop workflows; I
> never used ceph.
>
> Regards,
> Kai
>
> [1]: And some odd behavior where bcache would detach dirty caches on
> caching-device problems, which happens for me sometimes at reboot just
> after bcache was detected (probably due to an SSD firmware hiccup: the
> device temporarily goes missing and re-appears) - and then all dirty
> data is lost and discarded. In consequence, on the next reboot, the
> cache mode is set to "none" and the devices need to be re-attached.
> But until then, the dirty data is long gone.

Just an off-topic question: when you experienced the above situation, what was the kernel version?

We recently had a bkey oversize regression triggered in Linux v5.12 or v5.13, which behaved quite similarly to the above description. The issue was fixed in Linux v5.13 by the following commits,

commit 1616a4c2ab1a ("bcache: remove bcache device self-defined readahead")
commit 41fe8d088e96 ("bcache: avoid oversized read request in cache missing code path")

Coly Li
* Latency, performance, detach behavior (was: A lot of flush requests to the backing device)
  2021-11-08  8:11     ` Coly Li
@ 2021-11-08 11:29       ` Kai Krakow
  0 siblings, 0 replies; 6+ messages in thread

From: Kai Krakow @ 2021-11-08 11:29 UTC (permalink / raw)
To: Coly Li; +Cc: Aleksei Zakharov, Dongdong Tao, linux-bcache

On Mon, Nov 8, 2021 at 09:11 Coly Li <colyli@suse.de> wrote:
> On 11/8/21 2:35 PM, Kai Krakow wrote:
> > [1]: And some odd behavior where bcache would detach dirty caches on
> > caching-device problems, which happens for me sometimes at reboot just
> > after bcache was detected (probably due to an SSD firmware hiccup: the
> > device temporarily goes missing and re-appears) - and then all dirty
> > data is lost and discarded. In consequence, on the next reboot, the
> > cache mode is set to "none" and the devices need to be re-attached.
> > But until then, the dirty data is long gone.
>
> Just an off-topic question: when you experienced the above situation,
> what was the kernel version?
> We recently had a bkey oversize regression triggered in Linux v5.12 or
> v5.13, which behaved quite similarly to the above description.
> The issue was fixed in Linux v5.13 by the following commits,

You mean exactly the above-mentioned situation? Or the latency problems?

I'm using LTS kernels, currently the 5.10 series, and usually I update as soon as possible. I haven't switched to 5.15 yet.

Latency problems: That's a long-standing issue, and it may be more related to how btrfs works on top of bcache. It has improved during the course of 5.10, probably due to changes in btrfs. But it seems that using bcache writeback causes more writeback blocking than it should, while without bcache writeback, dirty writeback takes longer but doesn't block the desktop as much. It may be related to the sometimes-varying latency performance of Samsung Evo SSD drives.
> commit 1616a4c2ab1a ("bcache: remove bcache device self-defined readahead")
> commit 41fe8d088e96 ("bcache: avoid oversized read request in cache
> missing code path")

Without having looked at the commits, this mostly sounds like it would affect latency and performance. So your question was probably NOT about the detach-on-error situation.

Just for completeness: That one isn't really a software problem (I'll just ditch Samsung on the next SSD swap, maybe going to Seagate Ironwolf instead, which was recommended by Zygo, who created bees and works on btrfs). I then expect that situation not to occur again; I never experienced it back when I used Crucial MX drives (which also had better latency behavior). Since using Samsung SSDs, I've lost parts of EFI more than once (2 MB were just zeroed out in vfat), which hasn't happened again since I turned TRIM off (some filesystems or even bcache seem to enable it, and the kernel doesn't blacklist the feature for my model). This also caused bcache to sometimes complain about a broken journal structure.

But well, this is not the lost-data-on-TRIM situation: Due to the nature of the problem, I cannot really pinpoint when it happened first. The problem is, usually on cold boots, that the SSD firmware detaches from SATA shortly after the power cycle and comes back; since I use fast-boot UEFI, that means it can happen when the kernel has already booted and bcache is loaded. This never happens on a running system, only during boot/POST. The problematic bcache commit introduced a behavior to detach errored caching backends, which in turn invalidates dirty cache data.

Looking at the cache status after such an incident, the cache mode of the detached members is set to "none" and they are no longer attached, but the cache device still has the same amount of data, so data of the detached device was not freed from the cache. But on re-attach, dirty data won't be replayed, dirty data stays 0, and btrfs tells me that expected transaction numbers are some 300 generations behind (which is usually not fixable; I was lucky this time because only one btrfs member had dirty data, and scrub fixed it). bcache still keeps its usage level (like 90%, or 860GB in my case), and it seems to just discard the old "stale" data from before the detach situation.

I still think that bcache should not detach backends when the cache device goes missing with dirty data - instead it must reply with I/O errors and/or go into read-only mode, until I either manually bring the cache back or decide to resolve the situation by declaring the dirty data as lost. Even simple RAID controllers do that: if the cache contents are lost or broken, they won't "auto fix" themselves by purging the cache; they halt on boot, telling me that I can either work without the device set or accept that the dirty data is lost. bcache should go into read-only mode and leave the cache attached but marked missing/errored, until I decide to either accept the data loss or resolve the situation with the missing cache device. Another work-around would be if I could instruct bcache to flush all dirty data during shutdown.

Regards,
Kai
* Re: A lot of flush requests to the backing device
  2021-11-08  5:38 ` Dongdong Tao
  2021-11-08  6:35   ` Kai Krakow
@ 2021-11-10 14:35   ` Aleksei Zakharov
  1 sibling, 0 replies; 6+ messages in thread

From: Aleksei Zakharov @ 2021-11-10 14:35 UTC (permalink / raw)
To: Dongdong Tao; +Cc: linux-bcache

> [Sorry for the Spam detection ... ]
>
> Hi Aleksei,
>
> This is a very interesting finding. I understand that ceph bluestore
> will issue fdatasync requests when it tries to flush data or metadata
> (via bluefs) to the OSD device, but I'm surprised to see how much
> pressure it can bring to the backing device.
> May I know how you measure the number of flush requests per second
> that bcache sends to the backing device with the REQ_PREFLUSH flag?
> (ftrace on some bcache tracepoint?)

That was easy: the writeback rate was minimal, and in iostat -xtd 1 output there were a lot of write requests to the backing device while bytes/s was too small for that number of writes. It was a relatively old kernel, so flushes were not yet reported separately in the block layer stats.

> My understanding is that bcache doesn't need to wait for the flush
> requests to be completed by the backing device in order to finish
> the write request, since it uses a new "flush" bio for the backing
> device.
> So I don't think this will increase the fdatasync latency as long as
> the write can be performed in writeback mode. It does increase the
> read latency if the read I/O missed the cache.

Hm, that might be true for the reads; I'll do some experiments. But I don't see any reason to send a flush request to the backing device if there's nothing to flush.

> Or maybe I am missing something. Let me know how you observed the
> latency increase at the bcache layer, as I would like to do some
> experiments as well.

I'll do some experiments and come back with more details on the issue in a week! I already quit that job and don't work with ceph anymore, but I'm still thinking about this interesting issue.
>
> Regards,
> Dongdong
>
> On Fri, Nov 5, 2021 at 7:21 PM Aleksei Zakharov <zakharov.a.g@yandex.ru> wrote:
>
>> Hi all,
>>
>> I've used bcache a lot for the last three years, mostly in writeback mode with ceph, and I faced a strange behavior. When there's a heavy write load on the bcache device with a lot of fsync()/fdatasync() requests, the bcache device issues a lot of flush requests to the backing device. If the writeback rate is low, there might be hundreds of flush requests per second issued to the backing device.
>>
>> If the writeback rate grows, the latency of the flush requests increases. As a result, the latency of the bcache device increases and the application experiences higher disk latency. So this behavior of bcache slows down the application's I/O requests when the writeback rate becomes high.
>>
>> This workload pattern with a lot of fsync()/fdatasync() requests is common for latency-sensitive applications, and it seems that this bcache behavior slows down this type of workload.
>>
>> As I understand it, if a write request with REQ_PREFLUSH is issued to the bcache device, bcache issues a new empty write request with REQ_PREFLUSH to the backing device. What is the purpose of this behavior? It looks like it might be eliminated for better performance.
>>
>> --
>> Regards,
>> Aleksei Zakharov
>> alexzzz.ru

--
Regards,
Aleksei Zakharov
alexzzz.ru
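[Editor's note: the indirect measurement Aleksei describes - inferring flushes from a high write IOPS count with disproportionately low bytes/s - is no longer necessary on newer kernels. Since Linux v5.5, /proc/diskstats reports two trailing per-device fields for flushes (requests completed and time spent flushing, in ms), per Documentation/admin-guide/iostats.rst, and recent iostat versions surface them as f/s and f_await. A small sketch of reading those fields; the sample line below is made up for illustration.]

```python
def parse_diskstats_line(line):
    """Parse one /proc/diskstats line into a small dict.

    Field layout follows the kernel's Documentation/admin-guide/iostats.rst:
    11 classic fields, 4 discard fields (v4.18+), 2 flush fields (v5.5+).
    """
    fields = line.split()
    stats = [int(x) for x in fields[3:]]
    info = {
        "name": fields[2],
        "writes": stats[4],          # writes completed
        "write_sectors": stats[6],   # sectors written
    }
    if len(stats) >= 17:             # kernel >= 5.5: flush fields present
        info["flushes"] = stats[15]  # flush requests completed
        info["flush_ms"] = stats[16] # time spent flushing, in ms
    return info

# Made-up sample line from a v5.5+ kernel (17 stat fields after the name).
sample = "8 16 sdb 12007 5 1632 1943 84130 939 143438 31484 0 24892 35189 0 0 0 0 1408 1853"
print(parse_diskstats_line(sample)["flushes"])  # -> 1408
```

Sampling this once per second for the backing device would directly show the flushes-per-second rate the thread discusses, without inferring it from write counters.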
end of thread, other threads: [~2021-11-10 14:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-05 11:21 A lot of flush requests to the backing device Aleksei Zakharov
2021-11-08  5:38 ` Dongdong Tao
2021-11-08  6:35   ` Kai Krakow
2021-11-08  8:11     ` Coly Li
2021-11-08 11:29       ` Latency, performance, detach behavior (was: A lot of flush requests to the backing device) Kai Krakow
2021-11-10 14:35 ` A lot of flush requests to the backing device Aleksei Zakharov