IRC, freenode, #hurd, 2012-04-23
<braunr> btw, i'm running a gnumach version using red-black trees for vm
map entries
<antrik> braunr: sounds fashionable ;-)
<youpi> braunr: with some perf improvement?
<braunr> looks promising for our ext2fs instances showing several thousand
map entries
<braunr> youpi: i'm not using it for lookups yet
<braunr> but the tree is there, maintained, used for both regular and copy
maps, and it doesn't crash
<youpi> good :)
<braunr> antrik: isn't it ? :)
<braunr> youpi: and the diff stat is like 50/15
<antrik> braunr: what's the goal of using the fashionable trees?
<braunr> antrik: speeding up lookups in address spaces
<antrik> braunr: so the idea is that if we have a heavily fragmented
address space, the performance penalty is smaller?
<braunr> yes
<antrik> OK
<antrik> I take it you gave up on attempts to actually decrease
fragmentation?...
<braunr> it's not as good as reducing fragmentation, which requires
implementing a powerful merge, but it's still better
<braunr> yes
<braunr> it's too messy for my brain :/
<antrik> I see
<antrik> oh
<braunr> it will add some overhead though
<youpi> I guess log(n) ?
<braunr> but if there is a significant performance gain, it'll be worth it
<braunr> yes
<braunr> i was more thinking about the memory overhead
<antrik> right now it's a linear list?
<youpi> I don't think we care nowadays :)
<braunr> antrik: yes
<antrik> ouch
<braunr> antrik: yes ... :>
<braunr> the original authors expected vm maps to have like 30 entries
<braunr> so they used a list for the maps, and a hash table for the
object/offset to physical page lookups
<braunr> there is a small lookup cache though, which is a nice optimization
<braunr> my code now uses it first, and falls back to the RB tree if the
hint didn't help
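To make the mechanism braunr describes concrete, here is a minimal sketch of a hint-first lookup with a balanced-tree fallback. The types and names are invented for illustration; this is not the gnumach code itself, and the red-black colouring bookkeeping is omitted.

    /* Illustrative sketch only: simplified types, invented names.  It shows
     * the strategy described above -- try the per-map hint (the last entry
     * found) first, and only fall back to a tree search when the hint does
     * not cover the requested address. */

    #include <stddef.h>

    struct map_entry {
        unsigned long     start;   /* first address covered by the entry */
        unsigned long     end;     /* first address past the entry */
        struct map_entry *left;    /* tree links (colour bookkeeping omitted) */
        struct map_entry *right;
    };

    struct map {
        struct map_entry *root;    /* tree of entries, keyed by address range */
        struct map_entry *hint;    /* last entry returned by a lookup */
    };

    /* Return the entry containing addr, or NULL if the address is unmapped. */
    static struct map_entry *
    map_lookup(struct map *map, unsigned long addr)
    {
        struct map_entry *entry = map->hint;

        /* Fast path: consecutive lookups tend to hit the same entry. */
        if (entry != NULL && addr >= entry->start && addr < entry->end)
            return entry;

        /* Slow path: O(log n) descent instead of the old O(n) list walk. */
        for (entry = map->root; entry != NULL; ) {
            if (addr < entry->start)
                entry = entry->left;
            else if (addr >= entry->end)
                entry = entry->right;
            else {
                map->hint = entry;  /* remember for the next lookup */
                return entry;
            }
        }

        return NULL;
    }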
<antrik> braunr: well, don't forget to check whether it actually *is* still
an optimisation, when using fashionable trees ;-)
<braunr> antrik: i checked that already :)
<braunr> i did the same in x15
<antrik> I see
<braunr> both bsd and linux use a similar technique
<braunr> (well, bsd actually uses what is done in mach :)
<antrik> (or perhaps the other way around... ;-) )
<braunr> i don't think so, as the bsd vm is really the mach vm
<braunr> but we don't care much
<antrik> oh, right... that part actually went full circle
<braunr> youpi: i have a patch ready for test on machines with significant
amounts of map entries (e.g. the buildds ..)
<braunr> youpi: i won't have time for tests tonight, are you interested ?
<braunr> (i've been running it for 15 minutes without any issue for now)
<youpi> I'd say post to the list
<braunr> ok
<youpi> braunr: your patch uses the rb tree for lookups, right?
<youpi> braunr: the buildd using rbtree seems swift
<youpi> but maybe it's just a psychological effect :)
<youpi> the chroot ext2fs already has 1392 lines in vminfo
<youpi> an rbtree can't hurt there :)
<youpi> braunr: it really seems faster
<youpi> the reboot might have helped too
<youpi> benchmarks shall say
<youpi> for now, I'll just let ironforge use it
<antrik> youpi: it's always fast after a reboot ;-)
<youpi> sure
<youpi> but still
<youpi> I mean
<youpi> *obviously* the reboot has helped
<youpi> but it might not be all
<youpi> at least it feels so
<youpi> and obviously only benchmarks can say
<antrik> the major benefit AIUI is rather that the slowdown happening over
time will be less noticeable
<youpi> "over time" is actually quite fast
<youpi> ext2 fills up very quickly when you build a package
<youpi> it went up to 1700 lines very quickly
<youpi> and stabilized around there
<antrik> yeah, it can be very fast under heavy load
<youpi> that's why I say the reboot doesn't seem to be all of it
<youpi> it's already not so fresh
<youpi> with 1700 lines in vminfo
<antrik> well, I don't know how much of the slowdown I'm experiencing
(after doing stuff under memory pressure) is actually related to VM map
fragmentation...
<antrik> might be all, might be none, might be something in between
<youpi> then try his patch
<antrik> guess I should play a bit with vminfo...
<antrik> well, currently I actually experience pretty little slowdown, as
for certain reasons (only indirectly related to the Hurd) I'm not running
mutt on this machine, so I don't really have memory pressure...
IRC, freenode, #hurd, 2012-04-24
<braunr> youpi: yes, it uses bst lookups
<braunr> youpi: FYI, the last time i checked, one ext2fs instance had 4k+
map entries, and another around 7.5k
<braunr> (on ironforge)
<youpi> braunr: $ sudo vminfo 624 | wc -l
<youpi> 22957
<youpi> there's no way it can not help :)
<braunr> youpi: 23k, that's really huge
IRC, freenode, #hurd, 2012-04-26
<braunr> youpi: any new numbers wrt the rbtree patch ?
<youpi> well, buildd times are not really accurate :)
<youpi> but what I can tell is that it managed to build qtwebkit fine
<braunr> ok
<youpi> so the patch is probably safe :)
<braunr> i'll commit it soon then
<youpi> I'd say feel free to, yes
<braunr> thanks
IRC, freenode, #hurd, 2012-04-27
<braunr> elmig: don't expect anything grand though, this patch is mostly
useful when address spaces get really fragmented, which mainly happens on
buildds
<braunr> (and it only speeds lookups, which isn't as good as reducing
fragmentation; things like fork still have to copy thousands of map
entries)
IRC, freenode, #hurdfr, 2012-06-02
<youpi> braunr: oh, an rbtree bug
<youpi> Assertion `diff != 0' failed in file "vm/vm_map.c", line 1002
<youpi> it's in rbtree_insert()
<youpi> vm_map_enter (vm/vm_map.c:1002).
<youpi> vm_map (vm/vm_user.c:373).
<youpi> syscall_vm_map (kern/ipc_mig.c:657).
<youpi> erf, I killed my debugger, I can't inspect any further
<youpi> the little I have left is that apparently target_map == 1, size ==
0, mask == 0
<youpi> anywhere = 1
<braunr> youpi: that most likely means some addresses overlap
<braunr> I'll take another look at the code tomorrow
<braunr> (it may well be a rare vm bug, the kind that crashes the kernel)
<braunr> (well, I mean, the kind that used to crash the kernel in a very
obscure way before the rbtree patch)
IRC, freenode, #hurd, 2012-07-15
<bddebian> I get errors in vm_map.c whenever I try to "mount" a CD
<bddebian> Hmm, this time it rebooted the machine
<bddebian> braunr: The translator set this time and the machine reboots
before I can get the full message about vm_map, but here is some of the
crap I get: http://paste.debian.net/179191/
<braunr> oh
<braunr> nice
<braunr> that may be the bug youpi saw with my redblack tree patch
<braunr> bddebian: assert(diff != 0); ?
<bddebian> Aye
<braunr> good
<braunr> it means we're trying to insert a vm_map_entry at a region in a
map which is already occupied
<bddebian> Oh
<braunr> and unlike the previous code, the tree actually checks that
<braunr> it has to
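A minimal sketch of why the assertion fires, with simplified types and invented names (not the actual gnumach comparison function): map entries are ordered by address range, and a comparison result of 0 means the entry being inserted collides with an entry already in the tree, which is exactly what `diff != 0' guards against.

    #include <assert.h>

    struct map_entry {
        unsigned long start;   /* first address covered by the entry */
        unsigned long end;     /* first address past the entry */
    };

    /* Order two entries by address range; 0 means the ranges collide. */
    static long
    map_entry_cmp(const struct map_entry *a, const struct map_entry *b)
    {
        if (a->end <= b->start)
            return -1;          /* a lies entirely below b */
        if (a->start >= b->end)
            return 1;           /* a lies entirely above b */
        return 0;               /* the ranges overlap */
    }

    /* During insertion, a check along these lines trips the report above. */
    static void
    check_insert(const struct map_entry *new_entry,
                 const struct map_entry *existing)
    {
        long diff = map_entry_cmp(new_entry, existing);

        assert(diff != 0);      /* "Assertion `diff != 0' failed" */
    }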
<braunr> so you just simply use the iso9660fs translator and it crashes ?
<bddebian> Well it used to on just trying to set the translator. This time
I was able to set the translator but as soon as I cd to the mount point I
get all that crap
<braunr> that's very good
<braunr> more test cases to fix the vm
IRC, freenode, #hurd, 2012-11-01
<youpi> braunr: Assertion `diff != 0' failed in file "vm/vm_map.c", line
1002
<youpi> that's in rbtree_insert
<braunr> youpi: the problem isn't the tree, it's the map entries
<braunr> some must overlap
<braunr> if you can inspect that, it would be helpful
<youpi> I have a kdb there
<youpi> it's within a port_name_to_task system call
<braunr> this assertion basically means there already is an item in the
tree where the new item is supposed to be inserted
<youpi> this port_name_to_task presence in the stack is odd
<braunr> it's in vm_map_enter
<youpi> there's a vm_map just after that (and the assembly trap code
before)
<youpi> I know
<youpi> I'm wondering about the caller
<braunr> do you have a way to inspect the inserted map entry ?
<youpi> I'm actually wondering whether I have the right kernel in gdb
<braunr> oh
<youpi> better
<youpi> with the right kernel :)
<youpi> 0x80039acf (syscall_vm_map)
(target_map=d48b6640,address=d3b63f90,size=0,mask=0,anywhere=1)
<youpi> size == 0 seems odd to me
<youpi> (same parameters for vm_map)
<braunr> right
<braunr> my code does assume an entry has a non null size
<braunr> (in the entry comparison function)
<braunr> EINVAL (since Linux 2.6.12) length was 0.
<braunr> that's a quick glance at mmap(2)
<braunr> might help track bugs from userspace (e.g. in exec .. :))
<braunr> posix says the same
<braunr> the gnumach manual isn't that precise
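For reference, a small userspace program showing the behaviour quoted above: POSIX, and Linux since 2.6.12, require mmap() with a zero length to fail with EINVAL. This illustrates the standard semantics being cited, not necessarily what the Hurd enforced at the time of this discussion.

    #include <errno.h>
    #include <stdio.h>
    #include <sys/mman.h>

    int
    main(void)
    {
        /* A zero-length anonymous mapping: POSIX says this must fail. */
        void *p = mmap(NULL, 0, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED && errno == EINVAL)
            printf("mmap(len = 0) rejected with EINVAL, as required\n");
        else
            printf("mmap(len = 0) was not rejected (returned %p)\n", p);

        return 0;
    }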
<youpi> I don't seem to manage to read the entry
<youpi> but I guess size==0 is the problem anyway
<mcsim> youpi, braunr: Is there another kernel fault? Was that in my
kernel?
<braunr> no that's another problem
<braunr> which became apparent following the addition of red black trees in
the vm_map code
<braunr> (but which was probably present long before)
<mcsim> braunr: BTW, do you know if there were some specific circumstances
that led to memory exhaustion in my code? Or did it just accumulate over
time?
<braunr> mcsim: i don't know
<mcsim> braunr: ok
IRC, freenode, #hurd, 2012-11-05
<tschwinge> braunr: I have now also hit the diff != 0 assertion error;
sitting in KDB, waiting for your commands.
<braunr> tschwinge: can you check the backtrace, have a look at the system
call and its parameters like youpi did ?
<tschwinge> If I manage to figure out how to do that... :-)
* tschwinge goes read scrollback.
<braunr> "trace" i suppose
<braunr> if running inside qemu, you can use the integrated gdb server
<tschwinge> braunr: No, hardware. And work intervened. And mobile phone
<-> laptop via bluetooth didn't work. But now:
<tschwinge> Pretty similar to Samuel's:
<tschwinge> Assert([...])
<tschwinge> vm_map_enter(0xc11de6c8, 0xc1785f94, 0, 0, 1)
<tschwinge> vm_map(0xc11de6c8, 0xc1785f94, 0, 0, 1)
<tschwinge> syscall_vm_map(1, 0x1024a88, 0, 0, 1)
<tschwinge> mach_call_call(1, 0x1024a88, 0, 0, 1)
<braunr> thanks
<braunr> same as youpi observed, the requested size for the mapping is 0
<braunr> tschwinge: thanks
<tschwinge> braunr: Anything else you'd like to see before I reboot?
<braunr> tschwinge: no, that's enough for now, and the other kind of info
i'd like are much more difficult to obtain
<braunr> if we still have the problem once a small patch to prevent null
size is applied, then it'll be worth looking more into it
<pinotree> isn't it possible to find out who called with that size?
<braunr> not easy, no
<braunr> it's also likely that the call that fails isn't the first one
<pinotree> ah sure
<pinotree> braunr: making mmap reject a zero length could help? posix says
such a size should be rejected straight away
<braunr> 17:09 < braunr> if we still have the problem once a small patch to
prevent null size is applied, then it'll be worth looking more into it
<braunr> that's the idea
<braunr> making faulty processes choke on it should work fine :)
<pinotree> «If len is zero, mmap() shall fail and no mapping shall be
established.»
<pinotree> braunr: should i cook up such patch for mmap?
<braunr> no, the change must be applied in gnumach
<pinotree> sure, but that could simplify such a condition in mmap (i.e.
avoiding calling io_map on a file)
<braunr> such calls are erroneous and rare, i don't see the need
<pinotree> ok
<braunr> i bet it comes from the exec server anyway :p
<tschwinge> braunr: Is the mmap with size 0 already a reproducible testcase
you can use for the diff != 0 assertion?
<tschwinge> Otherwise I'd have a reproducer now.
<braunr> tschwinge: i'm not sure but probably yes
<tschwinge> braunr: Otherwise, take GDB sources, then: gcc -fsplit-stack
gdb/testsuite/gdb.base/morestack.c && ./a.out
<tschwinge> I have not looked what exactly this does; I think -fsplit-stack
is not really implemented for us (needs something in libgcc we might not
have), is on my GCC TODO list already.
<braunr> tschwinge: interesting too :)
IRC, freenode, #hurd, 2012-11-19
<tschwinge> braunr: Hmm, I have now hit the diff != 0 GNU Mach assertion
failure during some GCC invocation (GCC testsuite) that does not relate
to -fsplit-stack (as the others before always have).
<tschwinge> Reproduced:
/media/erich/home/thomas/tmp/gcc/hurd/master.build/gcc/xgcc
-B/media/erich/home/thomas/tmp/gcc/hurd/master.build/gcc/
/home/thomas/tmp/gcc/hurd/master/gcc/testsuite/gcc.dg/torture/pr42878-1.c
-fno-diagnostics-show-caret -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects -fcompare-debug -S -o pr42878-1.s
<tschwinge> Will check whether it's the same backtrace in GNU Mach.
<tschwinge> Yes, same.
<braunr> tschwinge: as youpi seems quite busy these days, i'll cook a patch
and commit it directly
<tschwinge> braunr: Thanks! I have, by the way, confirmed that the
following is enough to trigger the issue: vm_map(mach_task_self(), 0, 0,
0, 1, 0, 0, 0, 0, 0, 0);
<tschwinge> ... and before the allocator patch, GNU Mach did accept that
and return 0 -- though I did not check what effect it actually has. (And
I don't think it has any useful one.) I'm also reading that as of lately
(Linux 2.6.12), mmap (length = 0) is to return EINVAL, which I think is
the foremost user of vm_map.
<pinotree> tschwinge: posix too says to return EINVAL for length = 0
<braunr> yes, we checked that earlier with youpi
id:"87sj8522zx.fsf@kepler.schwinge.homeip.net"
.
<braunr> tschwinge: well, actually your patch is what i had in mind
(although i'd like one in vm_map_enter to catch wrong kernel requests
too)
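A minimal sketch of the kind of guard meant here, with simplified types and an invented function name rather than the patch that was actually committed: reject a zero size before any map entry is created, so a faulty caller gets KERN_INVALID_ARGUMENT back instead of tripping the `diff != 0' assertion later.

    typedef int kern_return_t;
    #define KERN_SUCCESS           0
    #define KERN_INVALID_ARGUMENT  4   /* value used by Mach's kern_return.h */

    /* Simplified stand-in for the entry point of the kernel mapping path. */
    static kern_return_t
    map_enter_checked(unsigned long *address, unsigned long size)
    {
        /* A zero-size entry cannot be ordered against its neighbours in the
         * tree, so refuse the request up front. */
        if (size == 0)
            return KERN_INVALID_ARGUMENT;

        /* ... the normal mapping path would follow here ... */
        (void)address;
        return KERN_SUCCESS;
    }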
<braunr> tschwinge: i'll work on it tonight, and do some testing to make
sure we don't regress critical stuff (exec is another major direct user
of vm_map iirc)
<tschwinge> braunr: Oh, OK. :-)