/srv/irclogs.linaro.org/2018/06/27/#linaro-virtualization.txt

rthwd03:43
* ajb-linaro --test-sve=3 run is currently on sve-all-short/insn_frintn_z_p_z___INC.risu.bin08:54
rthajb-linaro, before you do too much more review of v5, I posted v6 last night.14:21
ajb-linarorth: ok I'll redo14:22
ajb-linarorth: btw I think the lulesh2 test passes now - although unlike himeno we don't see a sve related speed boost14:23
rthah, i haven't re-tested those in a while.14:23
rthshame about the speed, but I kind-of think that's expected for now.14:24
ajb-linarorth: btw the --test-sve=3 test set is still generating.....14:25
rthgeez14:26
ajb-linarocurrently on sve-all-short/insn_MOVS_orrs_p_p_pp_z__INC.risu.bin14:27
ajb-linarorth: I tried the armie emulator - but sadly that falls down on the lack of signal support14:28
rthajb-linaro, you're using FVP yes?  does it support VQ > 4?14:29
ajb-linarorth: FVP and I think it will do up to the full architectural limit14:30
ajb-linaro-C SVE.ScalableVectorExtension.veclen=3214:30
rththen generating --test-sve=16 would be good.  there are some code paths that are only exercised by vq > 4.14:31
ajb-linarounits of 64-bit14:31
rthcool.14:32
ajb-linarorth: will do - once this set is finished ;-)14:32
rth(fortunately there are not *that* many unexercised code paths; for the most part it's simply a loop that rolls more times.)14:34
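
[Editor's sketch] The numbers in the exchange above line up as follows: the FVP `veclen` parameter counts 64-bit units, while VQ counts 128-bit quadwords, so `veclen=32` is a 2048-bit vector, i.e. VQ 16. The helper names below are hypothetical and only illustrate the arithmetic. Reading the `--test-sve` argument as a VQ value is inferred from the conversation, not confirmed here.

```c
#include <assert.h>

/* Units taken from the discussion: FVP veclen is in 64-bit units,
 * and a VQ is one 128-bit quadword. */
enum { VECLEN_UNIT_BITS = 64, VQ_BITS = 128 };

/* Hypothetical helper: vector length in bits for a given FVP veclen. */
static unsigned veclen_to_bits(unsigned veclen)
{
    return veclen * VECLEN_UNIT_BITS;
}

/* Hypothetical helper: number of 128-bit quadwords (VQ) for a veclen. */
static unsigned veclen_to_vq(unsigned veclen)
{
    return veclen_to_bits(veclen) / VQ_BITS;
}
```

Under these assumptions, `veclen_to_vq(32)` gives 16, which matches the suggested `--test-sve=16` run, and any `veclen` above 8 gives VQ > 4.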
ajb-linarorth: do we have any test cases for ffr and no fault behaviour that actually exercise that?14:55
rthajb-linaro, that's what i did the work in cortex-strings for.14:55
rthajb-linaro, and now in glibc (not yet posted).14:56
ajb-linarorth: ahh good stuff14:56
ajb-linarorth: how can make check pass if I configure --with-sve and it's using my system qemu?15:00
rthajb-linaro, dunno.  i set binfmt_misc to point to my recently built qemu.15:05
rthit's irritating because cortex-strings uses libtool.15:05
rthso to run by hand you have to pull bits out of .libs/15:06
ajb-linarorth: ahh otherwise it just uses libc....15:07
rthajb-linaro, --disable-shared can help15:08
ajb-linarohmm I guess the library does a probe or something....15:12
ajb-linarolibtool - what is it good for - huh!15:15
* ajb-linaro continues humming the tune15:15
* ajb-linaro is once again glad QEMU never took the autotools route15:19
ajb-linarochecking whether the aarch64-linux-gnu-gcc linker (/usr/bin/aarch64-linux-gnu-ld -m elf_x86_64) supports shared libraries...15:19
ajb-linaro*sigh*15:19
ajb-linaroenv CC=aarch64-linux-gnu-gcc ./configure --host=aarch64-linux-gnu --with-sve --enable-shared and I now have tests that fail with SIGILL15:33
ajb-linaroand --disable-shared now actually does what it says15:34
ajb-linaroand finally15:39
ajb-linarofind tests/ -executable -and -xtype f | xargs -n 1 ~/lsrc/qemu/qemu.git/aarch64-linux-user/qemu-aarch6415:39
ajb-linarorth: so AIUI holding mmap_lock prevents pages being mapped/unmapped beneath us - but page_range_check ensures we'll never actually fault (as in QEMU getting a sigbus)15:51
rthcorrect.15:51
* ajb-linaro is mildly concerned if we did what that means for locking15:51
rththe lock means that nothing changes between the page_range_check and the actual memory access.15:52
ajb-linarorth: would a potential later optimisation be to skip the page_range_check and fixup if we did get a SIGBUS?15:54
ajb-linarorth: or would that run afoul of not having a perfect mapping between host/guest page sizes?15:55
rthin order to avoid the page_range_check, we'd need a sigsetjmp, and then longjmp from the signal handler.15:55
rththe overhead of sigsetjmp is not insignificant.15:56
rthit would be very difficult to balance.15:56
rthotoh, you also would not need the lock, so two things.15:56
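
[Editor's sketch] The sigsetjmp/longjmp alternative described above can be illustrated in isolation. This is a minimal sketch, assuming Linux with GCC or Clang (`__thread` is a compiler extension). It is not QEMU code: `try_load_u8`, `install_fault_handler` and `unmapped_page_faults` are hypothetical names. The access is attempted without any prior range check, and the signal handler jumps back to a per-thread recovery point when it faults.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Per-thread recovery point and "a probe is in flight" flag (hypothetical). */
static __thread sigjmp_buf fault_env;
static __thread volatile sig_atomic_t probing;

static void fault_handler(int sig)
{
    if (probing) {
        siglongjmp(fault_env, 1);   /* unwind back into try_load_u8() */
    }
    /* Not one of our probes: restore the default action and return,
     * so the faulting instruction re-executes and the process dies. */
    signal(sig, SIG_DFL);
}

static void install_fault_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fault_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);
}

/* Returns 0 and stores the byte on success, -1 if the access faulted. */
static int try_load_u8(const volatile unsigned char *addr, unsigned char *out)
{
    unsigned char value;

    /* savemask=1 so the signal mask is restored after the longjmp. */
    if (sigsetjmp(fault_env, 1) != 0) {
        probing = 0;
        return -1;
    }
    probing = 1;
    value = *addr;                  /* may raise SIGSEGV or SIGBUS */
    probing = 0;
    *out = value;
    return 0;
}

/* Usage example: returns 1 if a PROT_NONE page is reported as faulting. */
static int unmapped_page_faults(void)
{
    unsigned char byte = 0;
    void *page = mmap(NULL, 4096, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) {
        return -1;
    }
    int rc = try_load_u8(page, &byte);
    munmap(page, 4096);
    return rc == -1;
}
```

Every call pays for `sigsetjmp`, whether or not a fault happens. That cost is the overhead being weighed against the up-front check plus lock.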
ajb-linarorth: only if you have to restart - you could just have a thread flag and test it in the loop, it's not a "real" signal in that respect, anyway something to play with on a later day15:58
rthhuh?15:58
rthanyway, yea, something to play with later.15:59
ajb-linarorth: helper would set __thread lazy_signal_handling, signal handler would set __thread yeah_you_faulted but return (skipping instr), not restart, helper would check yeah_you_faulted and then work out why... , clearing lazy_signal_handling on exit...16:00
* ajb-linaro hand waves16:00
ajb-linarorth: anyway 2/65 now has a rb16:01
rthok.  that's certainly less overhead than sigsetjmp.16:01
* ajb-linaro gets ready for home16:01
rththe skip instr step is definitely non-trivial for x86.16:02
rthbut not impossible.16:02
rthindeed, if you're going to play those games, we might even get the signal handler to adjust the PC.16:03
rthwhat if: __get_user becomes an assembly routine which sets __thread jump_here_on_fault, which returns the usual error code.16:06
rthno need to parse x86 instructions.16:07
ajb-linarorth: you would probably special case it for cpu_ldst helpers.. anyway laters16:08
rthciao16:08
ajb-linarorth: nice and simple - I like it17:46
ajb-linarorth: so for FMLA I don't understand this 7 operands to pass these arguments "properly". Surely we need 3 Z regs and the preg (4) + desc (5) + status(6) - would there be anything wrong with a side-effects no-return value helper?19:21
rthajb-linaro, there are 4 Z regs for "properly".19:22
rthwhere Zd != Za != Zn != Zm.19:23
rthwhich one could generate with "proper" optimization of movprfx.19:23
ajb-linarorth: I'm missing something - the instruction specifies Zda = Zda + Zn * Zm19:25
* ajb-linaro looks at movprfx for enlightenment19:25
rthajb-linaro, yes, that should answer your question.19:25
ajb-linarorth: hmm sort of - is this like some sort of glorified software hinted register renaming performance hack? Move Zn to Zd but it's going to be replaced with the result of the next instruction so feel free to shortcut in silicon if you want?19:29
rthajb-linaro, yes.19:29
ajb-linarodo we even model movprfx as anything other than a copy?19:30
rthbut movprfx /z is something unique -- discard inactive elements and zero them.  you can't generate that any other way, really.19:30
rthajb-linaro, not yet.  there's a large comment about it in the movprfx patch.19:31
ajb-linaroI see in 27/36  TODO: The implementation so far could handle predicated merging movprfx.19:31
rththat's it.19:31
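
[Editor's sketch] The movprfx/FMLA relationship discussed above can be modelled with plain arrays. This is a toy model, not QEMU's helpers: all function names are hypothetical, a vector is an array of `double`, the predicate is an array of `bool`, and the multiply-add is not guaranteed to round as a fused operation. It shows why a "proper" form has four Z registers: `movprfx zd, za` followed by a merging `fmla zd, pg/m, zn, zm` behaves like a single operation with distinct Zd, Za, Zn and Zm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* movprfx zd, za: unpredicated copy. */
static void movprfx(double *zd, const double *za, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        zd[i] = za[i];
    }
}

/* movprfx zd, pg/z, za: copy active elements, zero inactive ones. */
static void movprfx_z(double *zd, const bool *pg, const double *za, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        zd[i] = pg[i] ? za[i] : 0.0;
    }
}

/* fmla zda, pg/m, zn, zm: Zda = Zda + Zn * Zm on active elements,
 * inactive elements of Zda are left unchanged (merging). */
static void fmla_merge(double *zda, const bool *pg,
                       const double *zn, const double *zm, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pg[i]) {
            zda[i] = zda[i] + zn[i] * zm[i];
        }
    }
}

/* Hypothetical combined form with four distinct registers. */
static void fmla_4reg(double *zd, const double *za, const bool *pg,
                      const double *zn, const double *zm, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        zd[i] = pg[i] ? za[i] + zn[i] * zm[i] : za[i];
    }
}
```

With `za = {1,2,3,4}`, `zn = {2,2,2,2}`, `zm = {10,10,10,10}` and `pg = {1,0,1,0}`, the copy-then-FMLA pair and the four-register form both produce `{21,2,23,4}`. Starting from `movprfx_z` instead produces `{21,0,23,0}`, the zeroing behaviour that a plain copy cannot express.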
ajb-linarorth: ok, maybe add a reference to potential movprfx optimisations in the FMLA patch? I'll see if I can come up with some words in the morning19:32
* ajb-linaro is called by food19:32
