From: Linus Torvalds <torva...@transmeta.com> Subject: 2.2.0-final Date: 1999/01/21 Message-ID: <fa.nm4t96v.1l7uk8s@ifi.uio.no>#1/1 X-Deja-AN: 435278601 Original-Date: Wed, 20 Jan 1999 23:10:42 -0800 (PST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.95.990120224340.23558G-100000@penguin.transmeta.com> To: Kernel Mailing List <linux-ker...@vger.rutgers.edu> X-Authentication-Warning: penguin.transmeta.com: torvalds owned process doing -bs Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu Hoya, there's now a 2.2.0-pre9 on ftp.kernel.org, and when you compile it it will call itself 2.2.0-final. The reason is fairly obvious: enough is enough, and I can't make pre-kernels forever, it just dilutes the whole idea. The only reason the tar-file is not called 2.2.0 is that I want to avoid having any embarrassing typos that cause it to not compile under reasonable configurations or something like that. Unreasonable configurations I no longer care about. Every program has bugs, and I'm sure there are still bugs in this. Get over it - we've done our best, and nobody ever believed that there wouldn't be 2.2.x kernels to fix problems as they come up, and delaying 2.2.0 forever is not an option. I have a wedding anniversary and a company party coming up, so I'm taking a few days off - when I get back I expect to take this current 2.2.0-final and just remove the "-final" from the Makefile, and that will be it. I suspect somebody _will_ find something embarrassing enough that I would fix it too, but let's basically avoid planning on that. In short, before you post a bug-report about 2.2.0-final, I'd like you to have the following simple guidelines: "Is this something Linus would be embarrassed enough about that he would wear a brown paper bag over his head for a month?" and "Is this something that normal people would ever really care deeply about?" If the answer to either question is "probably not", then please consider just politely discussing it as a curiosity on the kernel mailing lists rather than even sending email about it to me: I've been too busy the last few weeks, and I'd really appreciate it if I could just forget the worries of a release for a few days.. But if you find something hilariously stupid I did, feel free to share it with me, and we'll laugh about it together (and I'll avoid wearing the brown paper bag on my head during the month of February). Do we have a deal? I've seen people working on a 2.2.0 announcement, and I'm happy - I've been too busy to think straight, much less worry about details like that. If everything turns out ok, I'll have a few memorable bloopers in my mailbox but nothing worse than that, and I can sit down and actually read the announcement texts that people have been discussing. ObFeatures: - m68k sync - various minor driver fixes (irda, net drivers, scsi, video, isdn) - SGI Visual Workstation support - adjtimex update to the latest standards - vfat silly buglet fix - semaphores work on alpha again - drop the inline strstr() that gcc got wrong whatever we did - kswapd needed to be a bit more aggressive - minor TCP retransmission and delack fixes Until Monday, Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli <and...@e-mind.com> Subject: Re: 2.2.0-final Date: 1999/01/23 Message-ID: <fa.ivug2ev.12ga7bk@ifi.uio.no> X-Deja-AN: 436101764 Original-Date: Sat, 23 Jan 1999 21:56:20 +0100 (CET) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.96.990123210422.2856A-100000@laser.bogus> References: <fa.nm4t96v.1l7uk8s@ifi.uio.no> To: Linus Torvalds <torva...@transmeta.com> X-Sender: and...@laser.bogus Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig X-PgP-Public-Key-URL: http://e-mind.com/~andrea/aa.asc Organization: Internet mailing list MIME-Version: 1.0 Reply-To: Andrea Arcangeli <and...@e-mind.com> Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Wed, 20 Jan 1999, Linus Torvalds wrote: > In short, before you post a bug-report about 2.2.0-final, I'd like you to There are three things from me I think should go in before 2.2.0 real (maybe a normal user would be not too much worried by these two races, it depends also about the definition on `normal user' ;). The first is a fix for a potential swapout deadlock I discovered and fixed some day ago. See my email about the topic with the patch: On Mon, 18 Jan 1999, Andrea Arcangeli wrote: > > Date: Mon, 18 Jan 1999 21:26:05 +0100 (CET) > From: Andrea Arcangeli <and...@e-mind.com> > To: Zlatko Calusic <Zlatko.Calu...@CARNet.hr>, > "Stephen C. Tweedie" <s...@redhat.com>, > Linus Torvalds <torva...@transmeta.com> > Cc: Linux-MM List <linux...@kvack.org>, > Linux Kernel List <linux-ker...@vger.rutgers.edu> > Subject: Re: Removing swap lockmap... > > On 18 Jan 1999, Zlatko Calusic wrote: > > > I removed swap lockmap all together and, to my surprise, I can't > > produce any ill behaviour on my system, not even under very heavy > > swapping (in low memory condition). > > Looking at your patch (and so looking at the swap_lockmap code) I found a > potential deadlock in the current swap_lockmap handling: > > task A task B > ---------- ------------- > rw_swap_page_base() > > ...if (test_and_set_bit(lockmap)) > ... run_task_queue() > swap_after_unlock_page() > ... clear_bit(lockmap) > .... wakeup(&lock_queue) > ...sleep_on(&lock_queue); > deadlocked > > I think it will not harm too much because the window is not too big (but > not small) and because usually one of the process not yet deadlocked will > generate IO and will wakeup also the deadlocked process at I/O > completation time. A very lazy ;) but at the same time obviosly right > (that should not harm performances at all) fix could be to replace the > sleep_on() with a sleep_on_timeout(..., 1). > * patch snipped * > > I think we need the swap_lockmap in the shm case because without swap > cache a swapin could happen at the same time of the swapout because > find_in_swap_cache() won't work there. > > Andrea Arcangeli Here the fix: Index: page_io.c =================================================================== RCS file: /var/cvs/linux/mm/page_io.c,v retrieving revision 1.1.2.1 diff -u -r1.1.2.1 page_io.c --- page_io.c 1999/01/18 01:32:53 1.1.2.1 +++ linux/mm/page_io.c 1999/01/18 20:21:41 @@ -88,7 +88,7 @@ /* Make sure we are the only process doing I/O with this swap page. */ while (test_and_set_bit(offset,p->swap_lockmap)) { run_task_queue(&tq_disk); - sleep_on(&lock_queue); + sleep_on_timeout(&lock_queue, 1); } /* ---------------------------------------------------------------------- The second thing is the complete race fix for the disable/enable_bh(). It's obviously right. Here it is (against 2.2.0-pre8intestingforalan but should apply clean to your tree too): Index: linux/include/asm-i386/softirq.h diff -u linux/include/asm-i386/softirq.h:1.1.1.1 linux/include/asm-i386/softirq.h:1.1.2.2 --- linux/include/asm-i386/softirq.h:1.1.1.1 Mon Jan 18 02:27:17 1999 +++ linux/include/asm-i386/softirq.h Wed Jan 20 07:41:42 1999 @@ -9,24 +9,6 @@ #define get_active_bhs() (bh_mask & bh_active) #define clear_active_bhs(x) atomic_clear_mask((x),&bh_active) -extern inline void init_bh(int nr, void (*routine)(void)) -{ - bh_base[nr] = routine; - atomic_set(&bh_mask_count[nr], 0); - bh_mask |= 1 << nr; -} - -extern inline void remove_bh(int nr) -{ - bh_base[nr] = NULL; - bh_mask &= ~(1 << nr); -} - -extern inline void mark_bh(int nr) -{ - set_bit(nr, &bh_active); -} - #ifdef __SMP__ /* @@ -90,21 +72,49 @@ #endif /* SMP */ +extern inline void init_bh(int nr, void (*routine)(void)) +{ + bh_base[nr] = routine; + bh_mask_count[nr] = 0; + wmb(); + bh_mask |= 1 << nr; +} + +extern inline void remove_bh(int nr) +{ + bh_mask &= ~(1 << nr); + synchronize_bh(); + bh_base[nr] = NULL; +} + +extern inline void mark_bh(int nr) +{ + set_bit(nr, &bh_active); +} + /* * These use a mask count to correctly handle * nested disable/enable calls */ extern inline void disable_bh(int nr) { + unsigned long flags; + + spin_lock_irqsave(&bh_lock, flags); bh_mask &= ~(1 << nr); - atomic_inc(&bh_mask_count[nr]); + bh_mask_count[nr]++; + spin_unlock_irqrestore(&bh_lock, flags); synchronize_bh(); } extern inline void enable_bh(int nr) { - if (atomic_dec_and_test(&bh_mask_count[nr])) + unsigned long flags; + + spin_lock_irqsave(&bh_lock, flags); + if (!--bh_mask_count[nr]) bh_mask |= 1 << nr; + spin_unlock_irqrestore(&bh_lock, flags); } #endif /* __ASM_SOFTIRQ_H */ Index: linux/include/linux/interrupt.h diff -u linux/include/linux/interrupt.h:1.1.1.1 linux/include/linux/interrupt.h:1.1.2.1 --- linux/include/linux/interrupt.h:1.1.1.1 Mon Jan 18 02:27:09 1999 +++ linux/include/linux/interrupt.h Mon Jan 18 02:32:58 1999 @@ -17,7 +17,8 @@ extern volatile unsigned char bh_running; -extern atomic_t bh_mask_count[32]; +extern spinlock_t bh_lock; +extern int bh_mask_count[32]; extern unsigned long bh_active; extern unsigned long bh_mask; extern void (*bh_base[32])(void); Index: linux/kernel/softirq.c diff -u linux/kernel/softirq.c:1.1.1.1 linux/kernel/softirq.c:1.1.2.1 --- linux/kernel/softirq.c:1.1.1.1 Mon Jan 18 02:27:00 1999 +++ linux/kernel/softirq.c Mon Jan 18 02:32:52 1999 @@ -20,7 +20,8 @@ /* intr_count died a painless death... -DaveM */ -atomic_t bh_mask_count[32]; +spinlock_t bh_lock = SPIN_LOCK_UNLOCKED; +int bh_mask_count[32]; unsigned long bh_active = 0; unsigned long bh_mask = 0; void (*bh_base[32])(void); ---------------------------------------------------------------------- The third thing I disagree is to swapout in cluster when shrink_mmap() fails at priority == 6 (or whatever). shrink_mmap() that fails tell nothing about the state of the VM. We could be with 0 phys RAM but with some freeable cache but shrink_mmap could still fail at that stage. This has no trivial fix (I think my new nr_freeable pages balance level will fix it though) and luckily is mostly a performances issue (even if I think it's the cause of the VM slowdown after some day of usage). From a stableness point of view instead I think that the current try_to_free_pages() algorithm is not good because we should do only _one_ (and not count-- until swapout fail) swapout(), if nr_free_pages < freepages.min. This because low memory system SWAP_CLUSTER_MAX (aka 32) is very major than 10 (minimum of freepages.min). Here a patch: Index: vmscan.c =================================================================== RCS file: /var/cvs/linux/mm/vmscan.c,v retrieving revision 1.1.1.3 diff -u -r1.1.1.3 vmscan.c --- vmscan.c 1999/01/23 18:52:32 1.1.1.3 +++ linux/mm/vmscan.c 1999/01/23 20:53:11 @@ -487,6 +487,8 @@ while (swap_out(priority, gfp_mask)) { if (!--count) goto done; + if (nr_free_pages < freepages.min) + break; } shrink_dcache_memory(priority, gfp_mask); But NOTE, I _never_ tried this patch (nor tried compiled it), because I am testing my VM algorithm instead of 2.2.0 ones. Maybe it will harm a bit performances (not too much though) but looks to me strictly _needed_ to me for low memory machines. If somebody would try out the system w and w/o this patch after setting echo 10 >/proc/sys/vm/freepages it would be interesting. ------------ Busy-Linus can stop reading here (now ;) ---------------- BTW, I am running now with my new vm that take stable the number of freeable pages. This VM works greatly here. But I had to change all bh->b_count++ with bget(bh) and implementing bget() this way: extern inline unsigned int bget(struct buffer_head * bh) { buffer_get(bh); return ++bh->b_count; } where buffer_get() is this: extern inline void buffer_get(struct buffer_head *bh) { struct page * page = mem_map + MAP_NR(bh->b_data); switch (atomic_read(&page->count)) { case 1: atomic_inc(&page->count); nr_freeable_pages--; break; #if 1 /* PARANOID */ case 0: printk(KERN_ERR "buffer_get: page was unused!\n"); #endif } } And for b_count-- exists a bput(). Taking uptodate the file cache instead is been very easier (some line changed and nothing more). Lukily the only b_count++ or b_count-- are in buffer.c and in ext2fs, other fs has one or two b_count only. Seeems to works fine and stable here but I still need to do some test before release it. The only reason I developed nr_freeable_pages is because I want stable numbers. And to get stable numbers under swapout shrink_mmap retval is not enough because I could go sometime in the wrong direction doing the wrong work. But I can't trust the size of the cache or of the buffers as a balance factor because they could be all busy or all freeable... (as pointed out by Stephen). BTW, Stephen, having b_count == 0 (as I done) is a good approximation that the buffer is ready to be freed? I seen in buffer.c that it should be also unlocked, unprotected and clean to be freeable, but b_count looks like to be the most important thing, can a driver take locked/dirty/protected for an infinite time a buffer? If I rember well (not sure if we was talking about the same thing) also Rik suggested to have a nr_freeable_pages, I don't know if the reason he wanted it is my same one though. Comments from MM guys? Andrea Arcangeli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli <and...@e-mind.com> Subject: Re: 2.2.0-final Date: 1999/01/24 Message-ID: <fa.iq09e6v.1ag0o1a@ifi.uio.no>#1/1 X-Deja-AN: 436174787 Original-Date: Sun, 24 Jan 1999 02:51:55 +0100 (CET) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.96.990124024902.199B-100000@laser.bogus> References: <fa.ivug2ev.12ga7bk@ifi.uio.no> To: "Stephen C. Tweedie" <s...@redhat.com>, Rik van Riel <H.H.vanR...@phys.uu.nl> X-Sender: and...@laser.bogus Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig X-PgP-Public-Key-URL: http://e-mind.com/~andrea/aa.asc Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Sat, 23 Jan 1999, Andrea Arcangeli wrote: > where buffer_get() is this: Just for the record, I cut-and-pasted a wrong buffer_get() (due a last-minute wrong hack, I noticed it now when I powerup the machine now ;), the right one is this: extern inline void buffer_get(struct buffer_head *bh) { struct page * page = mem_map + MAP_NR(bh->b_data); switch (atomic_read(&page->count)) { case 1: nr_freeable_pages--; default: atomic_inc(&page->count); break; #if 1 /* PARANOID */ case 0: printk(KERN_ERR "buffer_get: page was unused!\n"); #endif } } Andrea Arcangeli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli <and...@e-mind.com> Subject: Re: 2.2.0-final Date: 1999/01/24 Message-ID: <fa.imvrd6v.1fguph3@ifi.uio.no>#1/1 X-Deja-AN: 436299803 Original-Date: Sun, 24 Jan 1999 14:16:35 +0100 (CET) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.96.990124141300.222A-100000@laser.bogus> References: <fa.ivug2ev.12ga7bk@ifi.uio.no> To: Linus Torvalds <torva...@transmeta.com> X-Sender: and...@laser.bogus Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig X-PgP-Public-Key-URL: http://e-mind.com/~andrea/aa.asc Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Sat, 23 Jan 1999, Andrea Arcangeli wrote: > On Wed, 20 Jan 1999, Linus Torvalds wrote: > > > In short, before you post a bug-report about 2.2.0-final, I'd like you to > > There are three things from me I think should go in before 2.2.0 real There's a fourth thing I forget to tell yesterday. If all pte are young we could not be able to swapout while with priority == 0 we must not care about CPU aging. I hope to have pointed out right and needed things, I don't want to spam you while you are busy... Here the fix: Index: vmscan.c =================================================================== RCS file: /var/cvs/linux/mm/vmscan.c,v retrieving revision 1.1.1.3 diff -u -r1.1.1.3 vmscan.c --- vmscan.c 1999/01/23 18:52:32 1.1.1.3 +++ linux/mm/vmscan.c 1999/01/24 13:12:30 @@ -325,7 +325,7 @@ * Think of swap_cnt as a "shadow rss" - it tells us which process * we want to page out (always try largest first). */ - counter = nr_tasks / (priority+1); + counter = (nr_tasks << 1) / (priority+1); if (counter < 1) counter = 1; if (counter > nr_tasks) Andrea Arcangeli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli <and...@e-mind.com> Subject: Re: 2.2.0-final Date: 1999/01/27 Message-ID: <fa.ir0ddev.1cgsp90@ifi.uio.no>#1/1 X-Deja-AN: 437356774 Original-Date: Sun, 24 Jan 1999 14:28:31 +0100 (CET) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.96.990124142635.476A-100000@laser.bogus> References: <fa.imvrd6v.1fguph3@ifi.uio.no> To: Linus Torvalds <torva...@transmeta.com> X-Sender: and...@laser.bogus Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig X-PgP-Public-Key-URL: http://e-mind.com/~andrea/aa.asc Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Sun, 24 Jan 1999, Andrea Arcangeli wrote: > Here the fix: Woops the fix was wrong, I forgot that there was a not needed check due me (just removed from some weeks here, and that's because I forget to remove it now ;): The complete fix is this. Excuse me... Index: vmscan.c =================================================================== RCS file: /var/cvs/linux/mm/vmscan.c,v retrieving revision 1.1.1.3 diff -u -r1.1.1.3 vmscan.c --- vmscan.c 1999/01/23 18:52:32 1.1.1.3 +++ vmscan.c 1999/01/24 13:26:24 @@ -325,11 +325,9 @@ * Think of swap_cnt as a "shadow rss" - it tells us which process * we want to page out (always try largest first). */ - counter = nr_tasks / (priority+1); + counter = (nr_tasks << 1) / (priority+1); if (counter < 1) counter = 1; - if (counter > nr_tasks) - counter = nr_tasks; for (; counter >= 0; counter--) { assign = 0; Andrea Arcangeli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds <torva...@transmeta.com> Subject: Linux-2.2.1 - the Brown Paper Bag release Date: 1999/01/28 Message-ID: <fa.lgrdt2v.c1e4i4@ifi.uio.no>#1/1 X-Deja-AN: 438068920 Original-Date: Thu, 28 Jan 1999 13:00:37 -0800 (PST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.3.95.990128125922.956A-100000@penguin.transmeta.com> To: Kernel Mailing List <linux-ker...@vger.rutgers.edu> X-Authentication-Warning: penguin.transmeta.com: torvalds owned process doing -bs Content-Type: TEXT/PLAIN; charset=US-ASCII X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu The subject says it all. We did have a few paper-bag-inducing bugs in 2.2.0, so there's a 2.2.1 out there now, just a few days after 2.2.0. Oh, well. These things happen, Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/