On 12/8/19 12:42 PM, George Koehler wrote:
> On Thu, 05 Dec 2019 10:22:11 +0000
> Stuart Henderson <stu@spacehopper.org> wrote:
>
>> On 5 December 2019 01:15:09 Matthew Hull <castersupmode@verizon.net> wrote:
>>
>>> I'm interested in guile2 (because I do some programming in Scheme) and
>>> powerpc because I have a Mac Mini G4 with OpenBSD 6.5 installed.
>>>
>>> The package is marked broken for powerpc...
>>>
>>> Does the default make include "-g" or "-ggdb" flags? Would a build
>>> with -O0 -ggdb be a practical debugging option? If so, how could those
>>> flags be propagated "from the top"?
>>>
>> make clean=all
>> make DEBUG="-O0 -g" install
>>
>> Gdb in base is old and doesn't work too well - use a newer one from
>> packages: pkg_add gdb and use the "egdb" command.
> Hello Matt. For some reason, I didn't receive your mails. I did
> receive Stuart's reply, and other mails sent to ports@. This problem
> is at my end: I'm using GMail. I'm reading your mails through the
> archives at MARC.
>
> Your backtrace https://marc.info/?l=openbsd-ports&m=157566079007497&w=2
> shows where Guile crashes, but doesn't provide enough information to fix
> the problem. I have a PowerBook G4, so I have reproduced the crash and
> gotten more info, but still don't know the fix. My PowerBook5,4 runs a
> snapshot of OpenBSD macppc 6.6-current from a few days ago, with a ports
> tree from about 2 weeks ago, including lang/guile2 version 2.2.6p0.
> Your OpenBSD 6.5 would have lang/guile2 version 2.2.4p0.
>
> Your backtrace shows a crash at "vm-engine.c:573 NEXT (0);". I got the
> crash in the same place. The macro "NEXT (0);" has a part that reads
> ip[0]. In my crash, I can't access *ip, so ip[0] probably caused the
> crash by segfault.
>
> This code in vm-engine.c "call" assigns ip before doing "NEXT (0);":
>
>     if (SCM_LIKELY (SCM_PROGRAM_P (FP_REF (0))))
>       ip = SCM_PROGRAM_CODE (FP_REF (0));
>     else
>       ip = (scm_t_uint32 *) vm_apply_non_program_code;
>
>     APPLY_HOOK ();
>
>     NEXT (0);
>
> By looking at macro definitions, I concluded that "FP_REF (0)" gets
> (vp->fp - 1)->as_scm, a pointer to a Scheme object; and SCM_PROGRAM_*
> interpret the object as a scm_t_cell. I printed this cell in egdb.
>
> (gdb) print ip
> $19 = (scm_t_uint32 *) 0x33955378
> (gdb) print *ip
> Cannot access memory at address 0x33955378
> (gdb) print *(scm_t_cell *)((vp->fp - 1)->as_scm)
> $20 = {word_0 = 0x45, word_1 = 0x33955378}
>
> Here 0x45 is scm_tc7_program (so SCM_PROGRAM_P is true), and 0x33955378
> is the bad pointer that SCM_PROGRAM_CODE gets from word_1. Some code
> might put bad pointers in program objects. I modified guile to look for
> such code. I added a global "scm_t_uint32 aaa;" and added some checks
> like "aaa = *pointer". One such check crashed at vm-engine.c:1654
> "make-closure":
>
>     UNPACK_24 (op, dst);
>     offset = ip[1];
>     UNPACK_24 (ip[2], nfree);
>
>     // FIXME: Assert range of nfree?
>     SYNC_IP ();
>     closure = scm_inline_words (thread, scm_tc7_program | (nfree << 16),
>                                 nfree + 2);
>     aaa = *(ip + offset);
>     SCM_SET_CELL_WORD_1 (closure, ip + offset);
>     // FIXME: Elide these initializations?
>     for (n = 0; n < nfree; n++)
>       SCM_PROGRAM_FREE_VARIABLE_SET (closure, n, SCM_BOOL_F);
>     SP_SET (dst, closure);
>     NEXT (3);
>
> (gdb) print ip
> $12 = (scm_t_uint32 *) 0xcf1ea3b8
> (gdb) print offset
> $13 = -1005191168
> (gdb) print *(ip + offset)
> Cannot access memory at address 0xdf76a3b8
> (gdb) print ip[1]
> Cannot access memory at address 0xcf1ea3bc
>
> I can't read ip[1] in the core dump, but the program did read ip[1] in
> "offset = ip[1];" before the crash. The call to scm_inline_words(), to
> allocate the scm_tc7_program object, seems to have also freed the memory
> where ip points. This might be a problem with the garbage collector.
>
> I also can't read ip[0] and ip[3] in the core dump. If the program
> didn't run "aaa = *(ip + offset);", it would crash when "NEXT (3);"
> reads ip[3]. This doesn't make sense, because the original crash was
> not at this "NEXT (3);", but at that other "NEXT (0);". I seem to have
> changed the behavior of the garbage collector. I wonder if the GC scans
> global variables, and my added "aaa" caused the change.
>
> The garbage collector is from devel/boehm-gc version 7.6.0p3. I did
>
> $ cd /usr/ports/devel/boehm-gc
> $ make test
>
> and all 15 tests passed. If the garbage collector has a problem, these
> tests don't expose the problem. I still don't know how to fix the
> problem in Guile. --George
>
>
Thanks, George. This is good information. I'm traveling for the next 2
weeks, but I'm taking the G4 Mini with me in case I have time to work on
it. Thanks again for looking into it.