From: Ming Lei <ming.lei@redhat.com>
To: Stephen Rust <srust@blockbridge.com>
Cc: Rob Townley <rob.townley@gmail.com>, Christoph Hellwig <hch@lst.de>,
    Jens Axboe <axboe@kernel.dk>, linux-block@vger.kernel.org,
    linux-rdma@vger.kernel.org, linux-scsi@vger.kernel.org,
    martin.petersen@oracle.com, target-devel@vger.kernel.org
Subject: Re: Data corruption in kernel 5.1+ with iSER attached ramdisk
Date: Thu, 28 Nov 2019 17:12:10 +0800
Message-ID: <20191128091210.GC15549@ming.t460p>
In-Reply-To: <CAAFE1bfsXsKGyw7SU_z4NanT+wmtuJT=XejBYbHHMCDQwm73sw@mail.gmail.com>

On Wed, Nov 27, 2019 at 11:14:46PM -0500, Stephen Rust wrote:
> Hi,
>
> Thanks for your reply.
>
> I agree it does seem surprising that the git bisect pointed to this
> particular commit when tracking down this issue.
>
> The ramdisk we export in LIO is a standard "brd" module ramdisk (ie:
> /dev/ram*). We configure it as a "block" backstore in LIO, not using the
> built-in LIO ramdisk.

Then it isn't strange any more, since the iblock code uses the bio interface.

> LIO configuration is as follows:
>
> o- backstores ......................................................... [...]
> | o- block ............................................. [Storage Objects: 1]
> | | o- Blockbridge-952f0334-2535-5fae-9581-6c6524165067
> | |     [/dev/ram-bb.952f0334-2535-5fae-9581-6c6524165067.cm2 (16.0MiB)
> | |      write-thru activated]
> | |   o- alua ............................................. [ALUA Groups: 1]
> | |     o- default_tg_pt_gp ................. [ALUA state: Active/optimized]
> | o- fileio ............................................ [Storage Objects: 0]
> | o- pscsi ............................................. [Storage Objects: 0]
> | o- ramdisk ........................................... [Storage Objects: 0]
> o- iscsi .............................................................
>                                                             [Targets: 1]
> | o- iqn.2009-12.com.blockbridge:rda:1:952f0334-2535-5fae-9581-6c6524165067:rda
> |                                                               [TPGs: 1]
> |   o- tpg1 ..................................... [no-gen-acls, auth per-acl]
> |     o- acls .................................................... [ACLs: 1]
> |     | o- iqn.1994-05.com.redhat:115ecc56a5c [mutual auth, Mapped LUNs: 1]
> |     |   o- mapped_lun0
> |     |       [lun0 block/Blockbridge-952f0334-2535-5fae-9581-6c6524165067 (rw)]
> |     o- luns .................................................... [LUNs: 1]
> |     | o- lun0 [block/Blockbridge-952f0334-2535-5fae-9581-6c6524165067
> |     |          (/dev/ram-bb.952f0334-2535-5fae-9581-6c6524165067.cm2)
> |     |          (default_tg_pt_gp)]
> |     o- portals .............................................. [Portals: 1]
> |       o- 0.0.0.0:3260 ........................................... [iser]
>
> iSER is the iSCSI extension for RDMA, and it is important to note that we
> have _only_ reproduced this when the writes occur over RDMA, with the
> target portal in LIO having enabled "iser". The iscsi client (using
> iscsiadm) connects to the target directly over iSER. We use the Mellanox
> ConnectX-5 Ethernet NICs (mlx5* module) for this purpose, which utilizes
> RoCE (RDMA over Converged Ethernet) instead of TCP.

I may get one machine with a Mellanox NIC. Is it easy to set up & reproduce
on just the local machine (both host and target on the same machine)?

> The identical ramdisk configuration using a TCP/IP target in LIO has _not_
> reproduced this issue for us.

Yeah, I just tried iblock over brd, and can't reproduce it.

> I installed bcc and used the stackcount tool to trace rd_execute_rw, but I
> suspect because we are not using the built-in LIO ramdisk this did not
> catch anything. Are there other function traces we can provide for you?

Please try to trace bio_add_page() a bit via 'bpftrace ./ilo.bt'.
[root@ktest-01 func]# cat ilo.bt
kprobe:iblock_execute_rw
{
        @start[tid] = 1;
}

kretprobe:iblock_execute_rw
{
        @start[tid] = 0;
}

kprobe:bio_add_page
/@start[tid]/
{
        printf("%d %d\n", arg2, arg3);
}

Thanks,
Ming
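[Editor's note on the local-reproduction question above: one way to get an RDMA-capable loopback without a Mellanox NIC is soft-RoCE (the rdma_rxe module). The sketch below is hypothetical and not from the thread; the target IQN, backstore name, netdev "eth0", and address 192.0.2.1 are all placeholders, and exact targetcli behavior (e.g. auto-creation of tpg1 and the default portal) varies by version.]

```shell
# Create a 16 MiB brd ramdisk and a soft-RoCE device on an existing netdev.
modprobe brd rd_nr=1 rd_size=16384          # rd_size is in KiB, so 16 MiB
modprobe rdma_rxe
rdma link add rxe0 type rxe netdev eth0     # "eth0" is a placeholder

# Export /dev/ram0 through LIO as an iblock backstore, then flip the
# default portal to iSER.
targetcli /backstores/block create name=rd0 dev=/dev/ram0
targetcli /iscsi create iqn.2019-11.test:rd0
targetcli /iscsi/iqn.2019-11.test:rd0/tpg1/luns create /backstores/block/rd0
targetcli /iscsi/iqn.2019-11.test:rd0/tpg1/portals/0.0.0.0:3260 enable_iser boolean=true

# Log in from the same host, forcing the iSER transport. 192.0.2.1 stands
# in for the IP assigned to the netdev backing rxe0 (plain 127.0.0.1 will
# not work for RDMA).
iscsiadm -m discovery -t st -p 192.0.2.1
iscsiadm -m node -T iqn.2019-11.test:rd0 -o update -n iface.transport_name -v iser
iscsiadm -m node -T iqn.2019-11.test:rd0 --login
```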
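[Editor's note: in the bpftrace script above, arg2 and arg3 are the third and fourth arguments of bio_add_page(), i.e. the segment length and its offset into the page. A small helper like the following (hypothetical, not from the thread) can post-process the trace output and flag the segments worth a closer look, since with multi-page bvecs (merged in 5.1) segments with sub-sector alignment or that span a page boundary are the natural suspects for this kind of corruption.]

```python
# Flag suspicious (len, offset) pairs from the bpftrace output above.
# Each data line is "<len> <offset>" printed by the bio_add_page kprobe.

PAGE_SIZE = 4096
SECTOR_SIZE = 512

def flag_segments(trace_lines):
    """Return (line_no, len, offset, reasons) for suspicious segments."""
    suspects = []
    for n, line in enumerate(trace_lines, 1):
        try:
            length, offset = map(int, line.split())
        except ValueError:
            continue  # skip bpftrace banner lines such as "Attaching 3 probes..."
        reasons = []
        if offset % SECTOR_SIZE:
            reasons.append("offset not sector-aligned")
        if length % SECTOR_SIZE:
            reasons.append("len not sector-aligned")
        if offset + length > PAGE_SIZE:
            reasons.append("crosses a page boundary")
        if reasons:
            suspects.append((n, length, offset, reasons))
    return suspects
```

For example, feeding it the lines "4096 0", "512 100", "4096 512" would flag the second (unaligned offset) and third (page-boundary crossing) segments while passing the first.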