Friday, May 03, 2024

Re: UPDATE: libsamplerate

On May 03 20:03:43, hans@stare.cz wrote:
> > > > > > > On Apr 26 20:46:51, brad@comstyle.com wrote:
> > > > > > > > Implement SSE2 lrint() and lrintf() on amd64.
> > > > > > >
> > > > > > > I don't think this is worth the added complexity:
> > > > > > > seven more patches to have a different lrint()?
> > > > > > > Does it make the resampling noticeably better/faster?

BTW, this is what libm/arch/amd64/s_lrint.S says:

ENTRY(lrint)
	RETGUARD_SETUP(lrint, r11)
	cvtsd2si %xmm0, %rax
	RETGUARD_CHECK(lrint, r11)
	ret
END(lrint)

So isn't that already used anyway?
If so, what's the point of replacing lrint
with _mm_cvtsd_si32(_mm_load_sd(&x)) ?

Jan



> > > > https://github.com/libsndfile/libsndfile/pull/663
> > > > -> https://quick-bench.com/q/OabKT-gEOZ8CYDriy1JEwq1lEsg
> > > > where there's a huge difference in clang builds.
> > >
> > > Sorry, I don't understand at all how this concerns
> > > the OpenBSD port of libsamplerate: the Benchmark does not
> > > mention an OS or an architecture, so what is this being run on?
> > >
> > > Anyway, just running it (Run Benchmark) gives the result
> > > of cpu_time of 722.537 for BM_d2les_array (using lrint)
> > > and cpu_time of 0 for BM_d2les_array_sse2 (using psf_lrint),
> > > reporting a speedup ratio of 200,000,000.
> > >
> > > That's not an example of what I have in mind: a simple application
> > > of libsamplerate, sped up by the usage of the new SSE2 lrint.
>
> > OK, here is a test that's a modified version of what Stuart linked,
> > testing the performance of the lrint() itself (code below).
>
> A better test below, lrint()ing a random sequence.
> The SSE version is slower on every SSE2 machine I tried.
> Is that the case for you too?
>
> Jan
>
>
> #include <immintrin.h>
> #include <math.h>
> #include <stdlib.h>	/* calloc(), arc4random_buf() */
>
> /* The proposed replacement: SSE2 conversion, 32-bit result. */
> static inline int
> psf_lrint(double const x)
> {
> 	return _mm_cvtsd_si32(_mm_load_sd(&x));
> }
>
> /* Convert with libm's lrint(). */
> static void
> d2l(const double *src, long *dst, size_t len)
> {
> 	for (size_t i = 0; i < len; i++)
> 		dst[i] = lrint(src[i]);
> }
>
> /* Convert with the SSE2 intrinsic. */
> static void
> d2l_sse(const double *src, long *dst, size_t len)
> {
> 	for (size_t i = 0; i < len; i++)
> 		dst[i] = psf_lrint(src[i]);
> }
>
> int
> main(void)
> {
> 	size_t len = 1000 * 1000 * 100;
> 	double *src = NULL;
> 	long *dst = NULL;
>
> 	src = calloc(len, sizeof(double));
> 	dst = calloc(len, sizeof(long));
> 	if (src == NULL || dst == NULL)
> 		return 1;
>
> 	/* Fill the input with random bytes, i.e. arbitrary bit patterns. */
> 	arc4random_buf(src, len * sizeof(double));
> 	d2l_sse(src, dst, len);
> 	/*d2l(src, dst, len);*/
>
> 	free(dst);
> 	free(src);
> 	return 0;
> }
