Friday, November 01, 2024

Re: [new] databases/arrow 18.0.0

On Fri, Nov 01, 2024 at 07:17:31PM +0100, Landry Breuil wrote:
> On Fri, Nov 01, 2024 at 10:10:35AM +0000, Stuart Henderson wrote:
> > On 2024/11/01 10:56, Landry Breuil wrote:
> > > hi,
> > >
> > > following thrift, here's the port for the c++ part of arrow:
> > > https://github.com/apache/arrow/blob/main/cpp/README.md
> > > it provides the parquet library for https://parquet.apache.org/.
> > >
> > > some open questions:
> > > - i've put the port in databases because for me it's sort of a database
> > > format: "The universal columnar format and multi-language toolbox for
> > > fast data interchange and in-memory analytics"
> > >
> > > but it can go into devel or textproc, i'm not settled on it. devel is
> > > already a bit crowded...
> >
> > databases sounds good
> >
> > > - the toplevel in https://github.com/apache/arrow/ has zero build goo,
> > > so from the same distfile one has to build by subdir (eg setting
> > > WRKDIST=${WRKDIR}/${DISTNAME}/cpp), hence the pkgname being arrow-cpp
> > > since i'm only interested in the c++ part.
> >
> > shouldn't that be WRKSRC=${WRKDIST}/cpp?
>
> yes, i'm always confused by the various variations..
>
> > > should i name the port databases/arrow-cpp ? databases/arrow/cpp in
> > > preparation for potential other ports for various bindings ?
> >
> > databases/arrow/cpp sounds a good plan to me. common parts can be
> > factored in Makefile.inc later when we find out what the common parts
> > are :)
>
> here's a new version that:
> - enables json support via textproc/rapidjson
> (CXXFLAGS=-I/usr/local/include was the missing key so that cmake finds
> rapidjson headers)
> - enables building tools & tests, lots of tests run fine:
> 89% tests passed, 9 tests failed out of 81
> (and i just realized some of the parquet test failures are only because
> i forgot to set PARQUET_TEST_DATA in the env)

and here's a third version fetching the test files from github via
DIST_TUPLE, and properly setting TEST_ENV so that more tests pass:

94% tests passed, 5 tests failed out of 81
The following tests FAILED:
23 - arrow-compute-scalar-temporal-test (Failed)
34 - arrow-io-file-test (Failed)
36 - arrow-utility-test (Failed)
39 - arrow-threading-utility-test (Failed)
78 - parquet-arrow-test (Failed)

feedback on that last version welcome, oks too. My plan is to enable
lerc in gtiff, and lerc/arrow/avif in gdal when updating to 3.10 in the
coming days.

Landry
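
For readers following the thread, the points discussed above (building
from the cpp subdirectory via WRKSRC, the extra include path for
rapidjson, test data fetched with DIST_TUPLE, and TEST_ENV) might
combine into a port Makefile fragment roughly like the one below. This
is only a sketch and not the Makefile that was posted: the distfile
name, the <commit> placeholders, the target directories (and the
assumption that they are relative to WRKDIST), and the choice of cmake
options are all assumptions. ARROW_TEST_DATA does not appear in the
thread and is included here as an assumption next to PARQUET_TEST_DATA.

```makefile
# Hypothetical fragment for databases/arrow/cpp, NOT the posted port.
# Everything marked "assumed" is illustrative only.
V =		18.0.0
# assumed distfile name
DISTNAME =	apache-arrow-${V}
PKGNAME =	arrow-cpp-${V}
CATEGORIES =	databases

# The top level of the tarball has no build system: build from cpp/.
WRKSRC =	${WRKDIST}/cpp

# Test data fetched from github next to the main distfile.
# Tuple form: template account project id targetdir
# <commit> is a placeholder; target dirs are assumed (relative to WRKDIST).
DIST_TUPLE +=	github apache arrow-testing <commit> testing
DIST_TUPLE +=	github apache parquet-testing <commit> \
		cpp/submodules/parquet-testing

MODULES =	devel/cmake
BUILD_DEPENDS =	textproc/rapidjson

# Lets cmake find the rapidjson headers; the thread used
# -I/usr/local/include, ${LOCALBASE} is the assumed equivalent here.
CXXFLAGS +=	-I${LOCALBASE}/include

# Assumed selection of Arrow cmake options for json, parquet,
# tools and tests.
CONFIGURE_ARGS =	-DARROW_JSON=ON \
			-DARROW_PARQUET=ON \
			-DARROW_BUILD_UTILITIES=ON \
			-DARROW_BUILD_TESTS=ON

# Point the test suite at the fetched data (paths assumed).
TEST_ENV =	ARROW_TEST_DATA=${WRKDIST}/testing/data \
		PARQUET_TEST_DATA=${WRKDIST}/cpp/submodules/parquet-testing/data
```

With something like this in place, "make test" would run the ctest
suite with both data directories set in the environment.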
