Tech Insider					     Technology and Trends


			      USENET Archives

Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Posted-Date: Tue, 27 Nov 2001 21:36:11 GMT
Original-Date: 	Tue, 27 Nov 2001 21:44:29 +0100
From: f5ibh <f5...@db0bm.ampr.org>
Original-Message-Id: <200111272044.fARKiTv13653@db0bm.ampr.org>
To: linux-ker...@vger.kernel.org
Subject: 2.5.1-pre2 does not compile
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 27 Nov 2001 20:47:25 GMT
Message-ID: <fa.em81drv.1n647p8@ifi.uio.no>
Lines: 36

Hi,

I get the following error:

gcc -D__KERNEL__ -I/usr/src/kernel-sources-2.5.1-pre2/include -Wall
-Wstrict-prototypes -Wno-trigraphs -O2 -fomit-frame-pointer
-fno-strict-aliasing -fno-common -pipe -mpreferred-stack-boundary=2 -march=k6
-DMODULE -DMODVERSIONS -include
/usr/src/kernel-sources-2.5.1-pre2/include/linux/modversions.h   -c -o
aha1542.o aha1542.c
aha1542.c: In function `do_aha1542_intr_handle':
aha1542.c:423: `io_request_lock' undeclared (first use in this function)
aha1542.c:423: (Each undeclared identifier is reported only once
aha1542.c:423: for each function it appears in.)
aha1542.c: In function `aha1542_bus_reset':
aha1542.c:1479: `io_request_lock' undeclared (first use in this function)
aha1542.c: In function `aha1542_host_reset':
aha1542.c:1543: `io_request_lock' undeclared (first use in this function)
aha1542.c: At top level:
aha1542.c:114: warning: `setup_str' defined but not used
make[3]: *** [aha1542.o] Error 1
make[3]: Leaving directory `/usr/src/kernel-sources-2.5.1-pre2/drivers/scsi'
make[2]: *** [_modsubdir_scsi] Error 2
make[2]: Leaving directory `/usr/src/kernel-sources-2.5.1-pre2/drivers'
make[1]: *** [_mod_drivers] Error 2
make[1]: Leaving directory `/usr/src/kernel-sources-2.5.1-pre2'
make: *** [stamp-build] Error 2

-------
Regards
		Jean-Luc
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
X-Authentication-Warning: palladium.transmeta.com: 
mail set sender to n...@transmeta.com using -f
To: linux-ker...@vger.kernel.org
From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: 2.5.1-pre2 does not compile
Original-Date: 	Tue, 27 Nov 2001 20:50:09 +0000 (UTC)
Original-Message-ID: <9u0ua1$1g2$1@penguin.transmeta.com>
Original-References: <200111272044.fARKiTv13...@db0bm.ampr.org>
X-Complaints-To: news@transmeta.com
Original-NNTP-Posting-Date: 27 Nov 2001 20:55:23 GMT
Cache-Post-Path: palladium.transmeta.com!unkn...@penguin.transmeta.com
X-Cache: nntpcache 2.4.0b5 (see http://www.nntpcache.org/)
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Transmeta Corporation
Date: Tue, 27 Nov 2001 21:33:25 GMT
Message-ID: <fa.j5ejjiv.112kiit@ifi.uio.no>
References: <fa.em81drv.1n647p8@ifi.uio.no>
Lines: 32
Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news.tele.dk!small.news.tele.dk!213.131.157.171!newsfeed1.wineasy.se!
newsfeed1.ulv.nextra.no!nextra.com!uninett.no!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist

In article <200111272044.fARKiTv13...@db0bm.ampr.org>,
f5ibh  <f5...@db0bm.ampr.org> wrote:
>
>I get the following error:

Yes.

The next-generation block-layer support is starting to be merged into
the 2.5.x tree, and that breaks old drivers that haven't been updated to
the new locking.

In particular, there used to be _one_ lock for the whole IO system
("io_request_lock"), and these days it's a per-block-queue lock.

In many cases the fix is as simple as just replacing the
"io_request_lock" with "host->host_lock", but sometimes this is
complicated by the need to pass the right data structures down far
enough..

Many drivers have been converted (e.g. IDE, symbios, aic7xxx etc), but
many more have not (especially older SCSI drivers, in your case it's the
classic aha1542).

It will probably take some time until most drivers have been converted.
Tested patches are more than welcome,

		Linus

Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
From: Paul Mackerras <pau...@samba.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Original-Message-ID: <15364.3457.368582.994067@gargle.gargle.HOWL>
Original-Date: 	Wed, 28 Nov 2001 09:02:41 +1100 (EST)
To: torva...@transmeta.com (Linus Torvalds)
Cc: linux-ker...@vger.kernel.org
Subject: Re: 2.5.1-pre2 does not compile
In-Reply-To: <9u0ua1$1g2$1@penguin.transmeta.com>
Original-References: <200111272044.fARKiTv13...@db0bm.ampr.org>
	<9u0ua1$1g...@penguin.transmeta.com>
X-Mailer: VM 6.75 under Emacs 20.7.2
Reply-To: pau...@samba.org
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 27 Nov 2001 22:06:00 GMT
Message-ID: <fa.hnc6ogv.m56khj@ifi.uio.no>
References: <fa.j5ejjiv.112kiit@ifi.uio.no>
Lines: 26

Linus Torvalds writes:

> The next-generation block-layer support is starting to be merged into
> the 2.5.x tree, and that breaks old drivers that haven't been updated to
> the new locking.
> 
> In particular, there used to be _one_ lock for the whole IO system
> ("io_request_lock"), and these days it's a per-block-queue lock.
> 
> In many cases the fix is as simple as just replacing the
> "io_request_lock" with "host->host_lock", but sometimes this is
> complicated by the need to pass the right data structures down far
> enough..

Is there a description of the new block layer and its interface to
block device drivers somewhere?  That would be helpful, since Ben
Herrenschmidt and I are going to have to convert several
powermac-specific drivers.

Thanks,
Paul.

Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
newsfeeds.belnet.be!news.belnet.be!news1.ebone.net!news.ebone.net!
news.net.uni-c.dk!uninett.no!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Tue, 27 Nov 2001 23:09:55 +0100
Original-Message-Id: <200111272209.fARM9tk18991@ns.caldera.de>
From: Christoph Hellwig <h...@ns.caldera.de>
To: torva...@transmeta.com (Linus Torvalds)
Cc: linux-ker...@vger.kernel.org
Subject: Re: 2.5.1-pre2 does not compile
X-Newsgroups: caldera.lists.linux.kernel
In-Reply-To: <9u0ua1$1g2$1@penguin.transmeta.com>
User-Agent: tin/1.4.4-20000803 ("Vet for the Insane") (UNIX) (Linux/2.4.2 (i686))
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 27 Nov 2001 22:11:51 GMT
Message-ID: <fa.egnj7iv.1snet06@ifi.uio.no>
References: <fa.j5ejjiv.112kiit@ifi.uio.no>
Lines: 26

In article <9u0ua1$1g...@penguin.transmeta.com> you wrote:
> In many cases the fix is as simple as just replacing the
> "io_request_lock" with "host->host_lock", but sometimes this is
> complicated by the need to pass the right data structures down far
> enough..
>
> Many drivers have been converted (e.g. IDE, symbios, aic7xxx etc), but
> many more have not (especially older SCSI drivers, in your case it's the
> classic aha1542).
>
> It will probably take some time until most drivers have been converted.
> Tested patches are more than welcome,

While we are breaking SCSI anyway, would you take a patch to remove the
old-style (2.0) SCSI error handling completely, forcing drivers still
using it to be fixed?  Early 2.5 looks like a good time for that to me.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
X-Authentication-Warning: penguin.transmeta.com: torvalds owned process doing -bs
Original-Date: 	Tue, 27 Nov 2001 17:04:46 -0800 (PST)
From: Linus Torvalds <torva...@transmeta.com>
To: Paul Mackerras <pau...@samba.org>, Jens Axboe <ax...@suse.de>
cc: <linux-ker...@vger.kernel.org>
Subject: Re: 2.5.1-pre2 does not compile
In-Reply-To: <15364.3457.368582.994067@gargle.gargle.HOWL>
Original-Message-ID: <Pine.LNX.4.33.0111271701140.1629-100000@penguin.transmeta.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Wed, 28 Nov 2001 01:12:04 GMT
Message-ID: <fa.o9q3cvv.h405j3@ifi.uio.no>
References: <fa.hnc6ogv.m56khj@ifi.uio.no>
Lines: 19


On Wed, 28 Nov 2001, Paul Mackerras wrote:
>
> Is there a description of the new block layer and its interface to
> block device drivers somewhere?  That would be helpful, since Ben
> Herrenschmidt and I are going to have to convert several
> powermac-specific drivers.

Jens has something written up, which he sent to me as an introduction to
the patch. I'll send that out unless he does a cleaned-up version, but I'd
actually prefer for him to do the sending. Jens?

		Linus


Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
X-Authentication-Warning: penguin.transmeta.com: torvalds owned process doing -bs
Original-Date: 	Tue, 27 Nov 2001 16:29:17 -0800 (PST)
From: Linus Torvalds <torva...@transmeta.com>
To: Christoph Hellwig <h...@ns.caldera.de>
cc: <linux-ker...@vger.kernel.org>
Subject: Re: 2.5.1-pre2 does not compile
In-Reply-To: <200111272209.fARM9tk18991@ns.caldera.de>
Original-Message-ID: <Pine.LNX.4.33.0111271628430.1629-100000@penguin.transmeta.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Wed, 28 Nov 2001 00:37:45 GMT
Message-ID: <fa.ob9vdfv.jkc53a@ifi.uio.no>
References: <fa.egnj7iv.1snet06@ifi.uio.no>
Lines: 17


On Tue, 27 Nov 2001, Christoph Hellwig wrote:
>
> While we are breaking SCSI anyway, would you take a patch to remove the
> old-style (2.0) SCSI error handling completely, forcing drivers still
> using it to be fixed?  Early 2.5 looks like a good time for that to me.

I agree, that sounds like a good thing, and as I consider the block layer
to be one of the major pushes for 2.5.x it makes perfect sense.

		Linus


Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Wed, 28 Nov 2001 07:58:50 +0100
From: Jens Axboe <ax...@suse.de>
To: Linus Torvalds <torva...@transmeta.com>
Cc: Paul Mackerras <pau...@samba.org>, linux-ker...@vger.kernel.org
Subject: Re: 2.5.1-pre2 does not compile
Original-Message-ID: <20011128075850.D23858@suse.de>
Original-References: <15364.3457.368582.994...@gargle.gargle.HOWL> 
<Pine.LNX.4.33.0111271701140.1629-100...@penguin.transmeta.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.33.0111271701140.1629-100000@penguin.transmeta.com>
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Wed, 28 Nov 2001 07:00:52 GMT
Message-ID: <fa.e0hk34v.145ecqg@ifi.uio.no>
References: <fa.o9q3cvv.h405j3@ifi.uio.no>
Lines: 24

On Tue, Nov 27 2001, Linus Torvalds wrote:
> 
> On Wed, 28 Nov 2001, Paul Mackerras wrote:
> >
> > Is there a description of the new block layer and its interface to
> > block device drivers somewhere?  That would be helpful, since Ben
> > Herrenschmidt and I are going to have to convert several
> > powermac-specific drivers.
> 
> Jens has something written up, which he sent to me as an introduction to
> the patch. I'll send that out unless he does a cleaned-up version, but I'd
> actually prefer for him to do the sending. Jens?

No problem, I'll clean it up and send it out. I also planned on doing a
specific guide to converting drivers to exploit the new features.

-- 
Jens Axboe


Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Wed, 28 Nov 2001 13:20:00 +0100
From: Jens Axboe <ax...@suse.de>
To: Linus Torvalds <torva...@transmeta.com>
Cc: Paul Mackerras <pau...@samba.org>, linux-ker...@vger.kernel.org
Subject: bio write-up (was: Re: 2.5.1-pre2 does not compile)
Original-Message-ID: <20011128132000.T23858@suse.de>
Original-References: <15364.3457.368582.994...@gargle.gargle.HOWL> 
<Pine.LNX.4.33.0111271701140.1629-100...@penguin.transmeta.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.33.0111271701140.1629-100000@penguin.transmeta.com>
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Wed, 28 Nov 2001 12:23:30 GMT
Message-ID: <fa.duhc2cv.1656d28@ifi.uio.no>
References: <fa.o9q3cvv.h405j3@ifi.uio.no>
Lines: 348

On Tue, Nov 27 2001, Linus Torvalds wrote:
> 
> On Wed, 28 Nov 2001, Paul Mackerras wrote:
> >
> > Is there a description of the new block layer and its interface to
> > block device drivers somewhere?  That would be helpful, since Ben
> > Herrenschmidt and I are going to have to convert several
> > powermac-specific drivers.
> 
> Jens has something written up, which he sent to me as an introduction to
> the patch. I'll send that out unless he does a cleaned-up version, but I'd
> actually prefer for him to do the sending. Jens?

Ok, here's the stuff I sent Linus + some extra comments on how exactly
it differs for stuff like drivers.

--->
o io_request_lock removal. it's completely gone, although lots of SCSI
drivers still need to be fixed (with it gone they break during compile
though, which is what we want). locking is per-queue (q->queue_lock),
the locking semantics are the same though (locking is still imposed by
the block layer, grabbing the lock before request_fn execution -- I want
to keep it this way, because it means we can leave lots of older drivers
alone and they will automagically be SMP safe still. clever drivers are
free to drop the queue lock whenever they want, of course).

SCSI mid layer is in an ok state, low level drivers that currently work
are sym, sym2, aic7xxx_old, and aic7xxx. IDE should be completely ok,
basically it relies on a new ide_lock to keep the same serialization
that it currently has. Keeping a couple of disks busy in IDE didn't show
this as being a big bottleneck at all, I guess we can later optimize
this if we really want... The "other" block drivers like DAC960,
cpqarray, and cciss are fixed too although at least DAC960 may have
multi-page bio quirks. Not a big deal at this point since raw I/O is
the only user of that.


o main unit of I/O is the bio, not the buffer_head. this is the really
major change that requires updates to basically all block drivers. the
exception is drivers that only reference CURRENT and CURRENT->buffer
for transfers (henceforth called "old-style drivers"); they will still
work as usual. Any driver that currently traverses segments in a request
must be updated.

A bio has no virtual mapping of the data at all. This basically forces
highmem support on drivers, at least it is no different than doing low
memory I/O which is what I think matters. Of course bounce limits can be
set by the driver. A bio looks like this:

struct bio {
	sector_t		bi_sector;
	struct bio		*bi_next;	/* request queue link */
	atomic_t		bi_cnt;		/* pin count */
	kdev_t			bi_dev;		/* will be block device */
	struct bio_vec_list	*bi_io_vec;
	unsigned long		bi_flags;	/* status, command, etc */
	unsigned long		bi_rw;		/* bottom bits READ/WRITE,
						 * top bits priority
						 */
	int (*bi_end_io)(struct bio *bio, int nr_sectors);
	void			*bi_private;

	void (*bi_destructor)(struct bio *);	/* destructor */
};

Most of the above is self-explanatory, so I'll just brief you on some of
the differences from buffer_head. bi_io_vec is the actual io vector,
it's composed of these:

struct bio_vec {
	struct page	*bv_page;
	unsigned int	bv_len;
	unsigned int	bv_offset;
};

struct bio_vec_list {
	unsigned int	bvl_cnt;	/* how many bio_vec's */
	unsigned int	bvl_idx;	/* current index into bvl_vec */
	unsigned int	bvl_size;	/* total size in bytes */
	unsigned int	bvl_max;	/* max bvl_vecs we can hold, used
					   as index into pool */
	struct bio_vec	bvl_vec[0];	/* the iovec array */
};

The main reason for doing an array like this is that it enables easy
pre-clustering of pages for I/O submission. Stuff like read_cluster, raw
I/O, O_DIRECT etc can queue nice big portions of pages at once. And XFS
will really like it too. Also, the vm can pre-alloc a bio + veclist and
do its own merging if it wants to (again XFS, I know they do this on
Irix (well sort of, pass it down differently, the merging at the vm
layer is what I'm comparing it to)). The I/O scheduler still uses
bi_next to singly link merges at that level.
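
To make the array layout concrete, the following userspace sketch
allocates a bio_vec_list with room for a fixed number of entries and
queues page fragments into it. The two structs mirror the ones above
(using a C99 flexible array member instead of [0]); bvl_alloc() and
bvl_add_page() are hypothetical helpers made up for this example, and
bvl_max is treated here simply as a capacity count, which is an
assumption.

```c
#include <assert.h>
#include <stdlib.h>

struct page;	/* opaque here; the model never dereferences it */

struct bio_vec {
	struct page	*bv_page;
	unsigned int	 bv_len;
	unsigned int	 bv_offset;
};

struct bio_vec_list {
	unsigned int	bvl_cnt;	/* how many bio_vecs are in use */
	unsigned int	bvl_idx;	/* current index into bvl_vec */
	unsigned int	bvl_size;	/* total size in bytes */
	unsigned int	bvl_max;	/* capacity (assumption: a count) */
	struct bio_vec	bvl_vec[];	/* C99 flexible array member */
};

/* Hypothetical helper: one allocation covers the header and max vecs. */
static struct bio_vec_list *bvl_alloc(unsigned int max)
{
	struct bio_vec_list *bvl;

	bvl = calloc(1, sizeof(*bvl) + max * sizeof(struct bio_vec));
	if (bvl)
		bvl->bvl_max = max;
	return bvl;
}

/* Hypothetical helper: append one fragment; returns 0, or -1 if full. */
static int bvl_add_page(struct bio_vec_list *bvl, struct page *page,
			unsigned int len, unsigned int offset)
{
	struct bio_vec *bv;

	assert(bvl != NULL);
	if (bvl->bvl_cnt >= bvl->bvl_max)
		return -1;

	bv = &bvl->bvl_vec[bvl->bvl_cnt++];
	bv->bv_page = page;
	bv->bv_len = len;
	bv->bv_offset = offset;
	bvl->bvl_size += len;	/* keep the running byte total current */
	return 0;
}
```

A caller that wants to submit a cluster of pages would call
bvl_add_page() once per page until it returns -1, then hand the whole
list down in a single bio.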

The end_io callback now has a number of sectors argument, basically so
we don't have to loop around ending requests. No uptodate flag is passed
down, all that can be fitted in the bi_flags now. One bit of completion
info was never enough anyways.

bi_destructor is only there because a bio can originate from different
sources. Normally it will be gotten with bio_alloc(int gfp_mask, int
nr_iovecs) and the default destructor frees the necessary things on I/O
completion. A bio may come from other sources though, or it may be a
cloned or even a copied bio (e.g. loop can clone or copy a bio, similar
to network skb stuff, with bio_clone/bio_copy), or maybe someone is just
using kmalloc for allocating the bio.

Traversal of data segments in a request becomes even more hairy at this
point though, since a request still holds a list of bios which in turn
can hold a map of segments. I've always thought that drivers had too
intimate knowledge of how that works though, so I've moved this to
something like:

	rq_for_each_bio(bio, rq)
		bio_for_each_segment(bio_vec, bio, i)
			/* bio_vec is now current segment */

which I think we should have had a long time ago. Then we are also free
to change the implementation details later without having to change
drivers again :-). I should mention that ll_rw_blk provides a
blk_rq_map_sg helper to map a struct request to a scatterlist, so we can
rip the scatterlist build out of a lot of drivers. They can simply do

	nr_segments = blk_rq_map_sg(q, rq, scatterlist);
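
The nested rq_for_each_bio / bio_for_each_segment walk shown earlier can
be tried out in userspace with the sketch below. The macro bodies and
the nr_vecs/vecs fields are assumptions made for this model; the real
definitions in the tree may well differ, which is exactly why drivers
should go through the macros. rq_total_bytes() is a hypothetical
driver-side function that sums the length of every segment.

```c
#include <assert.h>
#include <stddef.h>

struct bio_vec {
	unsigned int	bv_len;
	unsigned int	bv_offset;
};

struct bio {
	struct bio	*bi_next;	/* next bio in the same request */
	unsigned int	 nr_vecs;	/* model-only: number of segments */
	struct bio_vec	*vecs;		/* model-only: segment array */
};

struct request {
	struct bio	*bio;		/* first bio in the request */
};

/* Model definitions; the real macros may be implemented differently. */
#define rq_for_each_bio(_bio, rq) \
	for ((_bio) = (rq)->bio; (_bio) != NULL; (_bio) = (_bio)->bi_next)

#define bio_for_each_segment(bvec, _bio, i) \
	for ((i) = 0; \
	     (i) < (_bio)->nr_vecs && ((bvec) = &(_bio)->vecs[(i)], 1); \
	     (i)++)

/* Driver-style walk: total the bytes in every segment of a request. */
static unsigned int rq_total_bytes(struct request *rq)
{
	struct bio *bio;
	struct bio_vec *bvec;
	unsigned int i, total = 0;

	assert(rq != NULL);
	rq_for_each_bio(bio, rq)
		bio_for_each_segment(bvec, bio, i)
			total += bvec->bv_len;

	return total;
}
```

Because the driver only names the iterator variables, the list and
array layout behind the macros can change without touching the loop.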

I can say a lot more about the bio stuff, but it's probably better if
you just start with commenting/flaming the above and we can go from
there :-). I'll move on to a few more things...

o head active queues are a thing of the past. We never should have
exposed the list implementation of requests to the drivers -- I've
changed this so drivers merely do

	rq = elv_next_request(q);

which is the next request that the I/O scheduler thinks should be
handled. That enables us to easily mark requests as active or not, so
the elevator knows not to touch it. Then this head-active crap is
handled transparently instead. Also, it enables us to mark more than the
first request as untouchable, something we've needed for a while.
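
A minimal userspace sketch of that idea follows, under the assumption
that "active" is tracked with a per-request flag. The started field,
struct request_queue as laid out here, and elv_try_back_merge() are all
hypothetical names for the model; only elv_next_request() is a name
from the text, and its body here is a guess at the behaviour described,
not the real implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct request {
	struct request	*next;		/* list link, hidden from drivers */
	unsigned long	 sector;	/* start sector */
	unsigned long	 nr_sectors;	/* length in sectors */
	bool		 started;	/* set once handed to the driver */
};

struct request_queue {
	struct request	*head;
};

/* Driver entry point: the only way to learn which request comes next. */
static struct request *elv_next_request(struct request_queue *q)
{
	struct request *rq;

	assert(q != NULL);
	rq = q->head;
	if (rq)
		rq->started = true;	/* elevator must leave it alone now */
	return rq;
}

/* Hypothetical elevator hook: try to back-merge sectors onto a request. */
static bool elv_try_back_merge(struct request *rq, unsigned long sector,
			       unsigned long nr_sectors)
{
	if (rq->started)
		return false;		/* already seen by the driver */
	if (rq->sector + rq->nr_sectors != sector)
		return false;		/* not contiguous with the tail */
	rq->nr_sectors += nr_sectors;
	return true;
}
```

Since the flag lives in each request rather than in "the head of the
list", any number of requests can be marked untouchable this way.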

...

I'll finish off for now with a status... Basically the tree is stable
for IDE and the SCSI stuff listed, same with cpqarray etc. loop probably
needs a bit of fixing, but not much. O_DIRECT is currently a kludge, the
kio blocks/bh stuff is a MESS that needs resolving real soon! Raw I/O is
almost there too, we can pull lots of megs/sec with very little sys time
now.

...

<---

Now, I guess what others are mainly interested in is either a) design of
this stuff or b) what to look out for when converting drivers. Wrt a),
read the source :-). For b, read on.

If you are maintaining an old-style driver that just uses CURRENT and
ignores clustered requests, you should be alright and not need any
changes. Maybe you are dropping the io_request_lock from your request_fn
strategy, if so then just replace that with &q->queue_lock instead. The
generic layer will automatically handle clustered requests, multi-page
bios, etc for you. For a low performance driver or hardware that is PIO
driven or just doesn't support scatter-gather, you can stop reading now.

Let's start by looking at how we transferred data in the 2.4 kernels.
There are two structures you need to know about. The first is the
buffer_head, this is the I/O "atom". For a block driver, the relevant
parts of this struct are:

	unsigned long b_rsector;	/* where */
	kdev_t b_rdev;			/* who */

	char *b_data;			/* what */

	struct buffer_head *b_reqnext;	/* next buffer */

and that is basically it. b_rsector is offset from b_rdev which includes
minor stuff for partitions. b_data is the virtual mapping of the buffer
that wants to be read or written. actually, b_data is the virtual
mapping of b_page and a possible offset, but now we are already getting
into 2.4 + block-highmem or 2.5 land. b_reqnext is the next pointer to a
chain of buffer_heads.

Most real-hardware drivers (i.e. not loop, raid, rd, etc) don't receive
the buffer_heads directly though. Instead they receive a list of struct
requests to handle. struct request contains lots of housekeeping info,
the interesting parts again for block drivers are:

	unsigned long sector;		/* like b_rsector */
	unsigned long nr_sectors;	/* total number of sectors */
	unsigned long current_nr_sectors; /* "current" ditto of above */
	unsigned int nr_segments;

	char *buffer;			/* like b_data */
	struct buffer_head *bh;		/* first bh in request */
	struct buffer_head *bhtail;	/* last bh in request */

So struct request ties a list of buffer_heads together for handing to
the driver. Performance drivers usually build a scatter-gather list of
these chunks and send it off to the hardware. nr_sectors contains the
total size of all the buffer_heads linked to the request,
current_nr_sectors only the size of the currently first bh in the list
(i.e. rq->bh). When I/O is ended on a request, the buffer_heads are
peeled off the list as I/O on them completes.
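
The bookkeeping just described can be modeled in a few lines of
userspace C. This is a simplified sketch, not the 2.4 source: the
structs carry only the fields needed here, b_size is assumed to hold
the buffer length in bytes, and end_first_bh() is a hypothetical
stand-in for the 2.4 completion path that finishes one buffer_head at a
time.

```c
#include <assert.h>
#include <stddef.h>

struct buffer_head {
	unsigned long		 b_rsector;	/* start sector */
	unsigned long		 b_size;	/* bytes in this buffer */
	struct buffer_head	*b_reqnext;	/* next buffer in request */
};

struct request {
	unsigned long		 sector;		/* current start */
	unsigned long		 nr_sectors;		/* total left */
	unsigned long		 current_nr_sectors;	/* in first bh */
	struct buffer_head	*bh;			/* first bh */
};

/*
 * Model of 2.4-style completion: finish the first buffer_head, peel it
 * off the list, and return 1 while more buffers remain, else 0.
 */
static int end_first_bh(struct request *rq)
{
	struct buffer_head *bh;

	assert(rq != NULL);
	bh = rq->bh;
	if (!bh)
		return 0;

	/* account for the sectors that just completed */
	rq->sector += rq->current_nr_sectors;
	rq->nr_sectors -= rq->current_nr_sectors;

	/* unlink the finished buffer and advance to the next one */
	rq->bh = bh->b_reqnext;
	bh->b_reqnext = NULL;

	/* 512-byte sectors: bytes >> 9 */
	rq->current_nr_sectors = rq->bh ? rq->bh->b_size >> 9 : 0;
	return rq->bh != NULL;
}
```

A driver in this model calls end_first_bh() once per completed buffer
until it returns 0, at which point the request itself is finished.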

Now let's take a look at how 2.5 does this differently. Struct
buffer_head now has no relevance for block drivers, it's purely a buffer
cache thing. Instead we deal with struct bio, which I explained the
layout of earlier in this mail. Struct request is basically the same,
except that now we link bio and biotail into it. For a sane block
driver, the request handling loop with bio will look something like:

	do {
		struct request *rq = elv_next_request(q);
		struct scatterlist *sg;
		struct my_driver_cmd *cmd;
		int nr_segments;

		/*
		 * no more to handle
		 */
		if (!rq)
			break;

		blkdev_dequeue_request(rq);

		/*
		 * for queuing, otherwise you will probably just
		 * have a sg structure allocated at init time (see ide)
		 */
		sg = my_driver_alloc_sg(rq->nr_segments);

		/*
		 * block layer helper to map a struct request into
		 * a scatter-gather list
		 */
		nr_segments = blk_rq_map_sg(q, rq, sg);		

		cmd = my_driver_alloc_cmd();

		/*
		 * init hardware command with data direction and sg list
		 */
		my_driver_setup_cmd(cmd, rq->cmd, sg);
		my_driver_queue_cmd(cmd);
	} while (1);

You are free to loop through the request segments yourself using the
technique described higher up of course, however I recommend using the
blk_rq_map_sg helper to handle the bulk of the mapping work. That will
(probably :-) also help you out later if the internals are changed
again.

blk_rq_map_sg will look at several queue properties to handle stuff
like:

- cluster contiguous segments into one, if wanted (QUEUE_FLAG_CLUSTER)
- don't allow a clustered segment to cross a 4GB mem boundary
- don't build bigger segments than q->max_segment_size

and probably more.
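
The three rules listed above can be illustrated with a small userspace
model. This is a sketch of the decision logic only, not the
blk_rq_map_sg implementation: struct seg, same_4g_window() and
map_sg_model() are hypothetical names, the cluster argument stands in
for QUEUE_FLAG_CLUSTER, and max_seg stands in for q->max_segment_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct seg {
	uint64_t	addr;	/* physical/bus address of the chunk */
	uint32_t	len;	/* length in bytes */
};

/* True if [addr, addr + len) stays inside one 4GB window. */
static bool same_4g_window(uint64_t addr, uint64_t len)
{
	return (addr >> 32) == ((addr + len - 1) >> 32);
}

/*
 * Coalesce 'in' into 'out' (which must have room for n entries) and
 * return the number of output segments.
 */
static int map_sg_model(const struct seg *in, int n, struct seg *out,
			bool cluster, uint32_t max_seg)
{
	int nsegs = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (cluster && nsegs > 0) {
			struct seg *prev = &out[nsegs - 1];
			uint64_t merged = (uint64_t)prev->len + in[i].len;

			/* merge only if adjacent, small enough, same window */
			if (prev->addr + prev->len == in[i].addr &&
			    merged <= max_seg &&
			    same_4g_window(prev->addr, merged)) {
				prev->len = (uint32_t)merged;
				continue;
			}
		}
		out[nsegs++] = in[i];	/* start a new segment */
	}
	return nsegs;
}
```

Two physically adjacent pages collapse into one entry when clustering
is on, unless the merged entry would exceed max_seg or straddle a 4GB
boundary, in which case they stay separate.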

Other changes that may affect you:

New queue property settings:

	blk_queue_bounce_limit(q, u64 dma_address)
		Enable I/O to highmem pages, dma_address being your
		limit. No highmem default.

	blk_queue_max_sectors(q, max_sectors)
		Maximum size request you can handle in units of 512 byte
		sectors. 255 default.

	blk_queue_max_segments(q, max_segments)
		Maximum segments you can handle in a request. 128
		default.

	blk_queue_max_segment_size(q, max_seg_size)
		Maximum size of a clustered segment, 64kB default.

New queue flags:

	QUEUE_FLAG_NOSPLIT
		The driver can handle a bio with more than one segment.
		ll_rw_blk will split bigger bios for you if needed (not
		actually implemented yet :-)

	QUEUE_FLAG_CLUSTER
		Explained above.

- struct request ->queue is no more. with the introduction of
  elv_next_request, you are no longer supposed to handle looping
  directly over the request list.

- end_that_request_first takes an additional number_of_sectors argument.
  It used to handle always just the first buffer_head in a request, now
  it will loop and handle as many sectors (on a bio-segment granularity)
  as you want.

- bh->b_end_io is bio->bi_end_io, but you probably want to use
  bio_endio(bio, uptodate, nr_sectors) instead.

- you can set max sector size, max segment size etc per queue now.
  drivers that used to define their own merge functions to handle things
  like this can now just use the blk_queue_* functions at blk_init_queue
  time.

- you no longer have to map a {partition, sector offset} into the
  correct absolute location, this is done by the block layer. so
  when you received a request like this before:

	rq->rq_dev = MKDEV(3, 5);	/* /dev/hda5 */
	rq->sector = 0;			/* first sector on hda5 */

  you will now see

	rq->rq_dev = MKDEV(3, 0);	/* /dev/hda */
	rq->sector = 123128;		/* offset from start of disk */

- As mentioned, there is no virtual mapping of a bio. For DMA, this is
  not a problem as you probably never will need a virtual mapping.
  Instead you want a bus mapping so you can ship it to the hardware. For
  PIO drivers (or drivers that need to revert to PIO transfer once in a
  while (IDE for example)), where the CPU is doing the actual data
  transfer for you, you do need a virtual mapping though. If you are
  supporting highmem I/O, you need to use bio_kmap and bio_kmap_irq to
  temporarily map a bio into the virtual address space. See how IDE
  handles this with ide_map_buffer.
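
The partition remapping item in the list above boils down to one
addition, which the following userspace sketch spells out. struct
partition_model and remap_to_disk() are hypothetical names; the start
sector 123128 is taken from the hda5 example above, while the
partition length used in any call and the bounds check itself are
assumptions added for the model.

```c
#include <assert.h>
#include <stddef.h>

struct partition_model {
	unsigned long	start_sect;	/* first sector on the whole disk */
	unsigned long	nr_sects;	/* partition length in sectors */
};

/*
 * Translate a partition-relative sector into an absolute sector on
 * the whole disk. Returns 0 on success, -1 if the range would run
 * past the end of the partition.
 */
static int remap_to_disk(const struct partition_model *p,
			 unsigned long sector, unsigned long nr_sectors,
			 unsigned long *disk_sector)
{
	assert(p != NULL && disk_sector != NULL);

	/* written to avoid overflow in sector + nr_sectors */
	if (sector > p->nr_sects || nr_sectors > p->nr_sects - sector)
		return -1;

	*disk_sector = p->start_sect + sector;
	return 0;
}
```

With start_sect = 123128, sector 0 of the partition comes out as
sector 123128 of the disk, matching the before/after example in the
list.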

I've lost track of what else there is to explain, so I'll stop now. If
you have problems converting a driver or questions in general, fire away.
Suparna@IBM has written lots of stuff about bio as it was a WIP, see

http://lse.sourceforge.net/io/bionotes.txt

it may not be completely up to date right now wrt multi-page bios etc, but
I know that is on its way :-)

-- 
Jens Axboe


The materials and information included in this website may only be used
for purposes such as criticism, review, private study, scholarship, or 
research.


Electronic mail:			       WorldWideWeb:
   tech-insider@outlook.com			  http://tech-insider.org/