Date: Thu, 28 Jul 2016 17:50:14 -0700 (PDT)
From: Eric Wheeler
To: linux-block@vger.kernel.org
cc: dm-devel@redhat.com, linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, linux-bcache@vger.kernel.org
Subject: To add, or not to add, a bio REQ_ROTATIONAL flag

Hello all,

With the many SSD caching layers being developed (bcache, dm-cache, dm-writeboost, etc.), how could we flag a bio from userspace to indicate that the bio should preferably hit spinning disks instead of an SSD? Unnecessary promotions, evictions, and writeback increase the write burden on the caching layer and burn out SSDs too fast (TBW), thus requiring equipment replacement.

Is there already a mechanism for this that could be added to the various caching mechanisms' promote/demote/bypass logic?

For example, I would like to prevent backups from influencing the cache eviction logic. I wish neither to evict cache due to a bio from a backup process, nor to have a bio from the backup process cached on the SSD.

We would want IO that is somehow flagged this way to bypass block-layer caches and use the rotational disk, unless the referenced block already exists on the SSD.
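To make the intended policy concrete, here is a rough userspace model of the decision, not kernel code. The names `Request`, `prefer_rotational`, and `route()` are hypothetical and exist only for illustration; no such flag exists today. It also assumes that a flagged request that hits an already-cached block should be served from the SSD without updating the cache's recency state, which is my reading of "not influencing eviction".

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Toy stand-in for a bio; not the kernel structure."""
    sector: int
    prefer_rotational: bool = False  # hypothetical bypass hint


def route(req, cached_sectors):
    """Return (device, may_update_cache_state) for one request.

    device is "ssd" or "hdd"; the boolean says whether the cache is
    allowed to promote the block or bump its recency because of it.
    """
    if req.sector in cached_sectors:
        # The block already lives on the SSD, so it is served from there,
        # but a flagged request must not make it look "hotter".
        return ("ssd", not req.prefer_rotational)
    if req.prefer_rotational:
        # Flagged miss: go straight to the spinning disk, no promotion.
        return ("hdd", False)
    # Ordinary miss: served from the disk and eligible for promotion.
    return ("hdd", True)
```

A backup process would then issue only flagged requests, so it could read blocks that happen to be cached but would never cause a promotion or an eviction.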
There might be two cases here that would be ideal to unify without touching filesystem code:

  1) open() of a block device
  2) open() on a file such that a filesystem must flag the bio

I had considered writing something to detect FADV_SEQUENTIAL/FADV_NOREUSE or `ionice -c3` on a process hitting bcache and modifying check_should_bypass()/should_writeback() to behave accordingly.

However, just because FADV_SEQUENTIAL is flagged doesn't mean the cache should bypass. Filesystems can fragment, and while the file may be read sequentially, the blocks on which it resides may not be sequential. The same goes for higher-level block devices such as dm-thinp, where one might sequentially read a thin volume but its _tdata might not be in linear order.

This may imply that we need a new way to flag cache bypass from userspace that is neither io-priority nor fadvise driven.

So what are our options? What might be the best way to do this?

If fadvise is the better option, how can a block device driver look up the fadvise advice from a given bio struct? Can we add an FADV_NOSSD flag, since FADV_SEQUENTIAL may be insufficient? Are FADV_NOREUSE/FADV_DONTNEED reasonable candidates?

Perhaps ionice could be used, but the concept of "priority" doesn't exactly encompass the concept of cache bypass, so is something else needed?

Other ideas?

--
Eric Wheeler
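P.S. For reference, this is roughly what a backup-style reader can already express from userspace with the existing fadvise hints. It is a minimal sketch using Python's `os.posix_fadvise` wrapper; the helper name `read_with_hints` is made up for this example. As far as I can tell, these hints only influence readahead and the page cache and are not visible to a block-layer cache, which is the gap the questions above are about.

```python
import os


def read_with_hints(path, chunk_size=1 << 20):
    """Read a file once, front to back, and return the byte count."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 with length=0 applies the advice to the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_NOREUSE)
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        # Tell the kernel the cached pages are no longer needed.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Running the same reader under `ionice -c3` would additionally put it in the idle IO class, but that is a priority hint rather than a cache-bypass hint.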