Saturday, November 02, 2024

Re: GHC on OpenBSD/arm64?

You seem to have made it pretty far through the build if you already have a functional cross-compiler. Maybe you'll get a clue where things go off the rails in your hello-world if you run that cross ghc with enough -v's to see what happens?

While it's possible that OpenBSD arm64 is special in ways that go beyond the combination of arm64 on other platforms and OpenBSD on x86, it's not very likely. Still, these two are important settings in lang/ghc:
* USE_NOEXECONLY
* USE_NOBTCFI

Both are set in the x86 port and are also specified with:
GHC_CC_OPTS = -Wl,--no-execute-only -Qunused-arguments -Wl,-z,nobtcfi
CONFIGURE_ENV += CONF_GCC_LINKER_OPTS_STAGE0="${GHC_CC_OPTS}" \
CONF_GCC_LINKER_OPTS_STAGE1="${GHC_CC_OPTS}" \
CONF_GCC_LINKER_OPTS_STAGE2="${GHC_CC_OPTS}" \
CONF_CC_OPTS_STAGE2="${GHC_CC_OPTS}"

If I were to guess, this instruction could be an indication that x-only is in the way. Try printing the value of x17 and see how the memory it points into is mapped.
->  0xa2c3a0 <+24>: ldur   w15, [x17, #-0x8]

If you are lucky, -Wl,--no-execute-only is all you need to produce a working binary.

configure has a flag to take care of the x-only problem: --disable-tables-next-to-code. I don't set it in the port because the GHC test suite has some tests that are broken in this mode.
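
One way to check the x-only theory on the target machine, sketched in shell: list the LOAD segments of the binary and flag any that are executable but not readable. The helper name exec_only_segments is made up for illustration, and the parsing assumes the wide output format of readelf -lW (type, offset, addresses, sizes, flags, alignment).

```shell
# Hypothetical helper: read `readelf -lW <binary>` output on stdin and print
# the virtual address of every LOAD segment whose flags are exactly "E"
# (executable, but neither readable nor writable).
exec_only_segments() {
  awk '$1 == "LOAD" {
    flags = ""
    # Fields 2-6 are offset, vaddr, paddr, filesz, memsz; the last field is
    # the alignment; whatever sits in between is the flags column.
    for (i = 7; i < NF; i++) flags = flags $i
    if (flags == "E") print $3
  }'
}

# Usage on the target machine (assumes readelf is installed):
#   readelf -lW ./hello | exec_only_segments
```

If this prints a segment that covers the faulting address, the ldur is reading data out of unreadable text, and relinking with -Wl,--no-execute-only is the first thing to try.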


‪On Thu, Oct 31, 2024 at 6:12 AM ‫حبيب محمد الأمين محمد الهـاد‬‎ <ha.alamin@gmail.com> wrote:‬
All the object files in the stage 2 build directory are aarch64, except
the GHC binary itself; now, I don't know why this happens, but this line
seems a little weird in hadrian/src/Rules/Program.hs:

-- For cross compiler, copy @stage0/bin/<pgm>@ to @stage1/bin/@.

Anyway, with the hello world program I mentioned I built using stage 1,
I ran it with LLDB in the OpenBSD/arm64 machine, and it segfaults here:

(lldb) run
Process 66875 launched: '/home/habib/hello' (aarch64)
Process 66875 stopped
* thread #1, stop reason = signal SIGSEGV
    frame #0: 0x0000000000a2c3a0 hello`stg_enter_info + 24
hello`stg_enter_info:
->  0xa2c3a0 <+24>: ldur   w15, [x17, #-0x8]
    0xa2c3a4 <+28>: sxtw   x15, w15
    0xa2c3a8 <+32>: mov    x14, #0x1a
    0xa2c3ac <+36>: cmp    x15, x14
(lldb)

I saw an old ticket for macOS where it errors on the line before the
ldur (which you can't see here), because it accesses the reserved x18
register, so I thought maybe something similar is going on with OpenBSD:
since GHC has not yet been tested on, or even specifically designed for,
the OpenBSD/arm64 compilation target, maybe there's some
OpenBSD-specific thing that hasn't been accounted for that the OpenBSD
devs know about.

If an OpenBSD dev wants to look at it, I can send the binary or an
assembly dump of it as an attachment, however you want. I can then let
the GHC developers know where their aarch64 output is wrong for OpenBSD.

I'll try using the LLVM backend to compile the hello world program,
since that obviously supports OpenBSD/arm64, so I don't see it
outputting incorrect assembly for a hello world program for that target.

I might or might not keep working on this today, but I'll try and get
back in with another progress update tomorrow evening.

Cheers,
Habib

On 30 Oct 2024, at 12:53, ⁨حبيب محمد الأمين محمد الهـاد⁩ <⁨ha.alamin@gmail.com⁩> wrote:

Progress update: stage 2 compiler built, kind of, this morning after 7
hours overnight. It turns out it built an amd64 binary.

Note: use --with-intree-gmp. Also, it by default doesn't find
libiconv, not even the version installed in the amd64 system;
anyway, I just pointed it to the one in the aarch64 sysroot using
--with-libiconv-{includes,libraries}, and that worked (so I don't even
understand how it built an amd64 binary).

Okay, but the build system built the libraries for stage 1 as a
dependency of this so that the stage 1 compiler could actually run, so
I was able to manually compile a hello world program with it this time
without a missing Prelude — passing in --target= to -opta, -optc,
-optl, and --sysroot to one of them as well — and I saw that it was an
aarch64 binary.
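
For checking which architecture each output file was really built for, here is a small sketch that reads the ELF header directly, so it doesn't depend on file(1) being available. The helper name elf_machine is made up; the only facts it relies on are that e_machine is a 16-bit field at byte offset 18, and that 62 means x86-64 and 183 means AArch64.

```shell
# Hypothetical helper: print the e_machine field of an ELF file as a name.
# The field is decoded as little-endian, which holds for both amd64 and
# aarch64 binaries.
elf_machine() {
  # Read bytes 18 and 19 as unsigned decimals, e.g. " 183   0".
  set -- $(od -An -tu1 -j18 -N2 "$1")
  case $(( $1 + $2 * 256 )) in
    62)  echo x86_64 ;;
    183) echo aarch64 ;;
    *)   echo "unknown ($(( $1 + $2 * 256 )))" ;;
  esac
}

# Usage: compare the stage 2 ghc binary against the hello world binary.
#   elf_machine ./hello
```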

I'm sure I'm passing the same arguments to the build of the stage 2
compiler, but knowing the build system, I'm sure there's some place
where it's getting missed. Still, it can't be a mixture of aarch64 and
amd64, so I need to figure out what's going on, because a lot of stuff
was complaining about not finding the sysroot and I'd fix that, or the
assembler wasn't understanding the instructions, so I specified to the
assembler that it's aarch64, and that fixed it, etc.

Anyway, when I scp'd that hello world binary to the OpenBSD/arm64 system
I took the sysroot from and ran it, it segfaulted.

Cheers,
Habib

On 29 Oct 2024, at 18:29, ⁨حبيب محمد الأمين محمد الهـاد⁩ <⁨ha.alamin@gmail.com⁩> wrote:

Just a correction to my previous message. I meant sigevent, not
timer_create; the timer_* stuff was just warnings, what's really missing
is struct sigevent, which seems to usually be defined in signal.h,
including in FreeBSD.

And it's not just header checking, the configure script actually checks
that a program with timer_create can compile and link, and it reports
yes (correctly), but can't run the test that it actually works (with
sigevent and everything plumbed in) when cross-compiling;

"checking for a working timer_create", emphasis on "working"
doesn't show up in the config.log but does in the configure script where
it writes a test program with sigevent, and the configure script makes
it obvious that it can't run this test when cross-compiling.

It optimistically assumes that timer_create works with sigevent and
all that when cross-compiling… which it doesn't. So I guess GHC's
configure script needs to be taught that this check fails when
cross-compiling for OpenBSD. I'll hack it for now to see if the build
otherwise works.
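
A possibly gentler hack than editing the script: autoconf checks of this kind usually store their verdict in a cache variable (a shell variable with _cv_ in its name), and a value given for that variable on the configure command line normally wins over the cross-compilation guess. The sketch below only locates candidate variables; the helper name is made up, and the real variable name has to be read from the configure script itself.

```shell
# Hypothetical helper: list the autoconf cache variables (names containing
# "_cv_") that appear on lines of a configure script mentioning a keyword.
cache_vars_for() {
  # $1 = keyword, $2 = path to the configure script
  grep -- "$1" "$2" | grep -o '[A-Za-z_][A-Za-z0-9_]*_cv_[A-Za-z0-9_]*' | sort -u
}

# Usage:
#   cache_vars_for timer_create path/to/configure
# If the check turns out to be cached, passing NAME=no to configure (NAME
# being whatever the command above printed) should select the fallback code.
```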

Cheers,
Habib

On 29 Oct 2024, at 14:49, ⁨حبيب محمد الأمين محمد الهـاد⁩ <⁨ha.alamin@gmail.com⁩> wrote:

Thanks Greg. Yeah, I saw you mention that in the GitLab and attempted it, but could not make heads or tails of it.

Anyway, my current obstacle is a lack of timer_create in OpenBSD. There's fallback code, but configure isn't recognising that the feature is missing (maybe it's only checking the headers signal.h and time.h, which both exist; I haven't dug in yet).

Cheers,
Habib

On 28 Oct 2024, at 18:36, Greg Steuck <greg@nest.cx> wrote:

Habib, I just wanted to tell you that I am following your progress, just don't have time yet to reproduce nor much insight to share. Thank you for digging into this, I know it's a hard slog.

One thing that occurred to me some time ago was to try comparing the runs upstream does in their working cross-environment to what we attempt. Their setup is heavily factored, so we can't easily extract the relevant command lines. Looking at their logs might be illuminating; running the same environment under Linux might also help. None of it is easy or fun.

‪On Sat, Oct 26, 2024 at 10:52 PM ‫حبيب محمد الأمين محمد الهـاد‬‎ <ha.alamin@gmail.com> wrote:‬
Progress report:

  - I managed to build the C parts of rts through Hadrian. It proved
    impossible to pass multiple options via LDFLAGS, as Hadrian doesn't
    allow spaces in arguments passed to tools. I fixed this with a
    bin/clang-with-sysroot-env shim.
  - There are some (C) libraries built without a Cabal file, like
    libffi, and there seems to be no way to pass arguments through
    Hadrian to control these builds.
    - Using $CONF_GCC_LINKER_OPTS_STAGE{0,1,2} seems to work and passes
      through arguments with spaces to both rts and libffi; I had tried
      $CONF_LD_LINKER_OPTS_STAGE{0,1,2} before, but the compiler binary
      is called as a linker, so only the former arguments to configure
      work to control flags to the linker (even for Clang). Anyway,
      I still seem to require the shim, despite it seeming to pass
      arguments to rts, but whatever, it gets me past the libffi build.
  - To get any further than that, to compile libffi, I had to edit the
    Hadrian code directly (it's part of the GHC source tree).
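
For anyone wanting to reproduce the shim approach, a minimal sketch of what such a wrapper could look like follows. This is not the actual bin/clang-with-sysroot-env: the variable names REAL_CC and TARGET are assumptions, and only the --target and --sysroot clang options are taken as given.

```shell
#!/bin/sh
# Sketch of a compiler shim: every flag that would need a space-separated
# list in LDFLAGS is added here, so the build system passes nothing extra.
# REAL_CC and TARGET are hypothetical names; SYSROOT must be exported.
clang_with_sysroot() {
  "${REAL_CC:-clang}" \
    --target="${TARGET:-aarch64-unknown-openbsd}" \
    --sysroot="${SYSROOT:?SYSROOT must be set}" \
    "$@"
}

# When installed as an executable script, end the file with:
#   clang_with_sysroot "$@"
```

Setting REAL_CC=echo prints the command line instead of running the compiler, which is a cheap way to see what the build system is really passing in.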

I'm now working on getting the Haskell parts of rts to compile
without "invalid instruction mnemonic" errors; so far,
with some more arguments via Hadrian to GHC of the form
-optl--host=aarch64-unknown-openbsd (and similar), I can get it to
compile a little bit instead of blowing up with the errors straight
away, but it does eventually spit out the same errors, just further
along in the build.

Cheers,
Habib

On 25 Oct 2024, at 15:48, ⁨حبيب محمد الأمين محمد الهـاد⁩ <⁨ha.alamin@gmail.com⁩> wrote:

For some reason, pretty much all the options aren't getting passed
through to cabal configure. These are the log messages for what
arguments get passed to cabal configure and ultimately to configure (rts
is the part of the build that fails with the missing libpthread and libm
I mentioned in my previous message):

# | Package 'rts' configuration flags: configure --distdir /home/habib/ghc-9.10.1/_build/stage1/rts --disable-executable-stripping --disable-library-stripping --disable-executable-stripping --disable-library-stripping --cabal-file rts/rts.cabal --ipid rts-1.0.2 --prefix ${pkgroot}/.. --htmldir ${pkgroot}/../../doc/html/libraries/rts-1.0.2 --with-ghc=/home/habib/ghc-9.10.1/_build/stage0/bin/aarch64-unknown-openbsd-ghc --with-ghc-pkg=/home/habib/ghc-9.10.1/_build/stage0/bin/aarch64-unknown-openbsd-ghc-pkg --with-gcc=clang --with-ar=/usr/local/bin/llvm-ar-13 --ghc-option=-no-global-package-db --ghc-option=-package-db=/home/habib/ghc-9.10.1/_build/stage1/inplace/package.conf.d --ghc-pkg-option=--global-package-db=/home/habib/ghc-9.10.1/_build/stage1/inplace/package.conf.d --enable-library-vanilla --disable-library-profiling --disable-library-for-ghci --disable-shared --with-ld=clang --with-alex=/usr/local/bin/alex --with-happy=/usr/local/bin/happy --configure-option=CFLAGS=-Qunused-arguments -iquote /home/habib/ghc-9.10.1/rts -Qunused-arguments --configure-option=LDFLAGS=--target=aarch64-unknown-openbsd --configure-option=--host=aarch64-unknown-openbsd --configure-option=--with-cc=clang --ghc-option=-ghcversion-file=rts/include/ghcversion.h --ghc-option=-ghcversion-file=rts/include/ghcversion.h --configure-option=LDFLAGS= --configure-option=CPPFLAGS= --extra-lib-dirs=/home/habib/ghc-9.10.1/aarch64-sysroot/usr/lib --extra-lib-dirs=/usr/lib --extra-include-dirs=/home/habib/ghc-9.10.1/aarch64-sysroot/usr/include --extra-include-dirs=/usr/include --ghc-option= --ghc-option= --ghc-option= --ghc-option= --ghc-option= -v3 --flags=-profiling -debug -dynamic threaded libm -librt -libdl -use-system-libffi libffi-adjustors need-pthread -libbfd -need-atomic -libdw -libnuma -libzstd -static-libzstd -leading-underscore -unregisterised tables-next-to-code -find-ptr -v2
[…]
Running: /bin/sh //home/habib/ghc-9.10.1/rts/configure '--with-compiler=ghc' '--prefix=${pkgroot}/..' 'CFLAGS=-Qunused-arguments -iquote /home/habib/ghc-9.10.1/rts -Qunused-arguments' 'LDFLAGS=--target=aarch64-unknown-openbsd' '--host=aarch64-unknown-openbsd' '--with-cc=clang' 'LDFLAGS=' 'CPPFLAGS=' 'CC=/usr/local/llvm13/bin/clang' '--host=aarch64-openbsd'

This is from running (on one line):

hadrian/build \
  --docs=none \
  --flavour=quickest \
  "stage0.*.cabal.configure.opts += --configure-option=LDFLAGS= --configure-option=--target=aarch64-unknown-openbsd" \
  "stage1.*.cabal.configure.opts += --configure-option=LDFLAGS=\"-syslibroot=$SYSROOT -L/usr/lib -L$SYSROOT/usr/lib\"" \
  "stage1.*.cabal.configure.opts += --configure-option=CPPFLAGS=\"-isysroot=$SYSROOT -I/usr/include -I$SYSROOT/usr/include\"" \
  "stage1.*.cabal.configure.opts += --extra-lib-dirs=$SYSROOT/usr/lib" \
  "stage1.*.cabal.configure.opts += --extra-lib-dirs=/usr/lib" \
  "stage1.*.cabal.configure.opts += --extra-include-dirs=$SYSROOT/usr/include" \
  "stage1.*.cabal.configure.opts += --extra-include-dirs=/usr/include" \
  "stage1.*.cabal.configure.opts += --ghc-option=\"-optl -L$SYSROOT/usr/lib\"" \
  "stage1.*.cabal.configure.opts += --ghc-option=\"-optl -L/usr/lib\"" \
  "stage1.*.cabal.configure.opts += --ghc-option=\"-optl -lpthread -lm\"" \
  "stage1.*.cabal.configure.opts += --ghc-option=\"-optc -I$SYSROOT/usr/include\"" \
  "stage1.*.cabal.configure.opts += --ghc-option=\"-optc -I/usr/include\"" \
  "stage1.*.cabal.configure.opts += -v3" \
  "stage1.*.ghc.hs.opts += \"-optl -L$SYSROOT/usr/lib\"" \
  "stage1.*.ghc.hs.opts += \"-optl -L/usr/lib\"" \
  "stage1.*.ghc.hs.opts += \"-optl -lpthread -lm\"" \
  "stage1.*.ghc.hs.opts += \"-optc -I$SYSROOT/usr/include\"" \
  "stage1.*.ghc.hs.opts += \"-optc -I/usr/include\"" \
  "stage1.*.ghc.c.opts += -I$SYSROOT/usr/include -I/usr/include" \
  "stage1.*.ghc.link.opts += -L$SYSROOT/usr/lib -L/usr/lib -lpthread -lm" \
  "stage1.*.cc.c.opts += -L$SYSROOT/usr/lib -L/usr/lib -lpthread -lm" \
  -VVVVVVVVVV --freeze1 \
  stage2:exe:ghc-bin

It seems like pretty much none of the arguments are getting passed
through (unlike with the stage0 compilation, where I verified that the
flags I was giving to Hadrian were getting passed through).

I'm gonna create a ticket on GHC's GitLab.

Cheers,
Habib

On 25 Oct 2024, at 13:20, ⁨حبيب محمد الأمين محمد الهـاد⁩ <⁨ha.alamin@gmail.com⁩> wrote:

What follows are my notes on where I'm at so far, how I got here, what
my next steps are, and some rambling and infodump that may be useful.

I managed to build the stage 1 compiler Wednesday. It seems to run as
expected, though I can't test actual compilation and whether it can
successfully output AArch64 binaries, as it (correctly) gives an error
about the lack of Prelude.

I spent yesterday trying to build the stage 2 compiler using the stage
1 compiler. By default, the configure tests fail at C compiler cannot
create executables, and the logs show this:

configure:2773: /usr/local/llvm13/bin/clang -Qunused-arguments -iquote /home/habib/ghc-9.10.1/rts -Qunused-arguments  --target=aarch64-unknown-openbsd conftest.c  >&5
ld: error: /tmp/conftest-0430b9.o is incompatible with /usr/lib/crt0.o

However, by setting `-syslibroot` and `-isysroot` (indirectly and with
difficulty via Hadrian) to a folder containing the `/usr/lib` and
`/usr/include` directories from an OpenBSD/arm64 installation, as well
as including the , I managed to fix that error (so I know it's having an
effect), but then I get an error like this:

Error: hadrian: Missing dependencies on foreign libraries:
* Missing (or bad) C libraries: m, pthread
[…]
If the libraries are already installed but in a non-standard
location then you can use the flags --extra-include-dirs=
and --extra-lib-dirs= to specify where they are.

(I then added those flags as well, but no dice.)

Well, the sysroot definitely contains a `libpthread.so.27.1` (and a
libm.so.*); does it need to be symlinked without the version as well?
Perhaps my tar of it from an OpenBSD/arm64 installation didn't properly
capture the symlinks on the original installation?
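
Both questions can be answered by looking at what the extracted tree holds. A sketch, with a made-up helper name: it lists every file matching one library name under the sysroot's usr/lib and marks symlinks, including ones whose target didn't survive the tar. As far as I know, OpenBSD ships only the versioned names and its linker resolves -lpthread to them, so a missing unversioned symlink would not by itself be the bug on an OpenBSD host.

```shell
# Hypothetical helper: report what a sysroot holds for one library name.
sysroot_lib_report() {
  # $1 = sysroot directory, $2 = library name without the "lib" prefix
  for f in "$1"/usr/lib/lib"$2".so*; do
    if [ -L "$f" ] && [ ! -e "$f" ]; then
      # A symlink whose target does not exist inside the extracted tree.
      echo "dangling symlink: ${f##*/}"
    elif [ -L "$f" ]; then
      echo "symlink: ${f##*/}"
    elif [ -e "$f" ]; then
      echo "file: ${f##*/}"
    fi
  done
}

# Usage:
#   sysroot_lib_report "$SYSROOT" pthread
#   sysroot_lib_report "$SYSROOT" m
```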

Anyway, I won't be able to work on this much today, but I'm gonna try
increasing the verbosity of configure as the error message above (I've
elided this part) suggests, though I've already increased Hadrian's
verbosity as far as I can, not sure if it forwards that to the tools it
runs. I'm also gonna try and check that I shouldn't have a libpthread.so
symlink without the version suffix, and try and pass in the sysroot
and/or include and lib directories through GHC's -optl and -optc flags.

I searched for a while yesterday if there were any known issues with
building GHC on OpenBSD failing due to pthreads, but nothing fruitful.

Sorry I couldn't trim this message further, I'm in a bit of a rush. I
just wanted to leave this here with some notes in case one of you knows
something about building GHC on OpenBSD w/ pthreads, or in general the
interaction between GHC, OpenBSD, and pthreads. Or to hopefully stop you
wasting your time if you're still stuck on the stage 1 compiler (sorry,
I meant to send the initial version of this message on Wednesday, but
some things prevented me).

BTW, before I forget, steps to get a seemingly-working stage 1 compiler
(it seems to run fine, but I can't test compilation as it fails to
find the Prelude module; AFAICT, there's a complicated interplay here
between the libraries/package dbs for each stage of the compiler, and
the compilers themselves, but suffice to say that I'm gonna take the
building of the stage 2 compiler as the confirmation that stage 1 can
successfully compile AArch64 binaries):

export PATH="/usr/local/llvm13/bin:$PATH"
# I don't recall why this is needed, but I'm fairly sure it is
export LD_LIBRARY_PATH=/usr/local/llvm13/lib
export SYSROOT="$PWD/aarch64-sysroot" # populate this yourself

cabal update

# This isn't necessary for stage 1 to build, but I have a hunch it was
# holding me back from building stage 2 (which I still haven't). The
# configure script picks up clang-13 and all that, but not these tools,
# and I rebuilt on a freshly extracted source tree with these exports,
# and it still worked to build stage 1 (though it crapped out halfway
# through; I don't recall the error message, but I remember thinking it
# seemed transient, so I just restarted it and it seemed to resume from
# the same file, and I got a running stage 1 build that still fails to
# find Prelude)
export \
AR=/usr/local/bin/llvm-ar-13 \
RANLIB=/usr/local/bin/llvm-ranlib-13 \
LD=/usr/local/bin/ld.lld-13 \
OBJDUMP=/usr/local/bin/llvm-objdump-13 \
NM=/usr/local/bin/llvm-nm-13

./configure --target=aarch64-unknown-openbsd

# I wipe out the LDFLAGS because it erroneously had a flag
# `--host=aarch64-unknown-openbsd`, which it gets from the target;
# there's some discussion in Greg's link about how that's wrong, and
# questioning why it does that, so I just wiped that flag; it added
# `--target` as well, but that gets added through other means, too, and
# I even add it myself as you can see, though not sure how needed it is.
hadrian/build \
  --docs=none \
  --flavour=quickest \
  "stage0.*.cabal.configure.opts += \
  --configure-option=LDFLAGS= \
  --configure-option=--target=aarch64-unknown-openbsd" \
  stage1:exe:ghc-bin

# I'm trying this monster command now to build stage 2, I've tried
# most of the different flags in there with different variations, but
# not all together, and some I haven't. I'm breaking it up into lines
# and paragraphs here, but I paste all of these commands on one line. I've frozen
# the stage 1 compiler just in case something I do rebuilds it (it takes
# 4 hours on my 4 core, 8gb amd64 QEMU machine being emulated on an M1
# Pro).
hadrian/build \
  --docs=none \
  --flavour=quickest \

  "stage0.*.cabal.configure.opts += \
  --configure-option=LDFLAGS= \
  --configure-option=--target=aarch64-unknown-openbsd" \

  "stage1.*.cabal.configure.opts += \
  --configure-option=LDFLAGS=\"-syslibroot=$SYSROOT -L/usr/lib -L$SYSROOT/usr/lib\" \
  --configure-option=CPPFLAGS=\"-isysroot=$SYSROOT -I/usr/include -I$SYSROOT/usr/include\" \

  --configure-option=--extra-lib-dirs=$SYSROOT/usr/lib \
  --configure-option=--extra-lib-dirs=/usr/lib \
  --configure-option=--extra-include-dirs=$SYSROOT/usr/include \
  --configure-option=--extra-include-dirs=/usr/include \

  --configure-option=--ghc-option=\"-optl -L$SYSROOT/usr/lib\" \
  --configure-option=--ghc-option=\"-optl -L/usr/lib\" \
  --configure-option=--ghc-option=\"-optl -lpthread\" \
  --configure-option=--ghc-option=\"-optc -I$SYSROOT/usr/include\" \
  --configure-option=--ghc-option=\"-optc -I/usr/include\" \

  --configure-option=-v3" \

  "stage1.*.ghc.c.opts += -I$SYSROOT/usr/include -I/usr/include" \
  "stage1.*.ghc.link.opts += -L$SYSROOT/usr/lib -L/usr/lib -lpthread" \
  "stage1.*.cc.c.opts += -L$SYSROOT/usr/lib -L/usr/lib -lpthread" \
  -VVVVVVVVVV --freeze1 \
  stage2:exe:ghc-bin

I only put them all together like that so you can see the different
contortions I'm testing out to get options passed to the compiler
and linker. sysroot should mean that -L/usr/lib should be treated as
$SYSROOT/usr/lib, but I'm not sure if it is or not, all I had to go
on were configure logs, no compiler logs (hopefully -v3 should change
that). I'm dumping all this on you guys in a disorganised fashion
because I can't work on this much today, so I hope my messy notes and
rambling will be useful.

(One last note I forgot to add: it may prove easier to use GHC 9.8.3 to
test whether the stage 1 compiler can successfully output binary files,
as the Cabal version from ports would be compatible with it, and so one
can more easily populate the package db to get Prelude?)

Cheers,
Habib

On 21 Oct 2024, at 04:01, ⁨حبيب محمد الأمين محمد الهـاد⁩ <⁨ha.alamin@gmail.com⁩> wrote:

I just finished a PR to get GHCup to build and run on OpenBSD (though
not necessarily support it w/ working build manifests and bindists yet),
so it'll be great to see GHC in ports on OpenBSD/arm64.

I'll hopefully start working on it this week. Lydia, feel free to share
any notes w/ myself and Greg if you make any progress or are banging
your head against a specific error.

Cheers,
Habib

On 19 Oct 2024, at 06:54, Greg Steuck <gnezdo@openbsd.org> wrote:

Lydia Sobot <chilledfrogs@disroot.org> writes:

Perfect. How far in are you?
Not very, still figuring out how exactly the pieces fit in together

FWIW, I made some effort in this area and it didn't go particularly
smoothly. Some info in https://gitlab.haskell.org/ghc/ghc/-/issues/24431







--
nest.cx is Gmail hosted, use PGP: https://pgp.key-server.io/0x0B1542BD8DF5A1B0
Fingerprint: 5E2B 2D0E 1E03 2046 BEC3  4D50 0B15 42BD 8DF5 A1B0