OpenBSD Mail Box: UPDATE: dav1d 1.5.0

Here is an update to dav1d 1.5.0.

Upstream has created their own diffs to fix aarch64 for xonly, so it
works fine as is.

https://code.videolan.org/videolan/dav1d/-/commit/41511bf12ef3f7f0facf6e567849b342597bfbd6
https://code.videolan.org/videolan/dav1d/-/commit/2355eeb8f254a1c34dbb0241be5c70cdf6ed46d1
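
For context, the port patches dropped in the diff below replaced the
NEON jump tables, which stored 16-bit offsets next to the code and read
them back with ldrh, with tables of absolute addresses in .data.rel.ro,
since execute-only text cannot be read as data. A rough C model of that
dispatch follows. It is only a sketch, not upstream's implementation:
the handler names are hypothetical, and __builtin_clz is a GCC/Clang
builtin used here on the assumption that unsigned int is 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handlers standing in for the per-width code paths
 * (the 640b/320b/160b/80b/40b labels in the assembly). */
static int fill_w64(void) { return 64; }
static int fill_w32(void) { return 32; }
static int fill_w16(void) { return 16; }
static int fill_w8(void)  { return 8; }
static int fill_w4(void)  { return 4; }

typedef int (*width_fn)(void);

/* Table of absolute addresses, widest block first, like the .xword
 * tables in the removed patches. As const data it is kept apart from
 * the code, so reading it never touches execute-only pages. */
static width_fn const width_tbl[] = {
    fill_w64, fill_w32, fill_w16, fill_w8, fill_w4,
};

/* Mirrors "clz w3, w3; sub w3, w3, #25": width 64 -> 0, width 4 -> 4. */
static unsigned tbl_index(uint32_t w)
{
    /* Only power-of-two widths from 4 to 64 have a table entry. */
    assert(w >= 4 && w <= 64 && (w & (w - 1)) == 0);
    return (unsigned)__builtin_clz(w) - 25u;
}

/* Mirrors "ldr x5, [x5, w3, uxtw #3]; br x5": load a pointer, jump. */
static int dispatch(uint32_t w)
{
    return width_tbl[tbl_index(w)]();
}
```

Calling dispatch(16) selects the third entry, just as clz(16) - 25 = 2
picks the 160b label in the assembly.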

The amd64 patches for IBT need to be reapplied and updated. Could
someone with hardware to test on please look into this?

Upstream seems to be open to accepting IBT patches if they're updated.

Changes for 1.5.0 'Sonic':
--------------------------

1.5.0 is a major release of dav1d, that:
- WARNING: we removed some of the SSE2 optimizations, so if you care about
systems without SSSE3, you should be careful when updating!
- Add Arm OpenBSD run-time CPU feature
- Optimize index offset calculations for decode_coefs
- picture: copy HDR10+ and T35 metadata only to visible frames
- SSSE3 new optimizations for 6-tap (8bit and hbd)
- AArch64/SVE: Add HBD subpel filters using 128-bit SVE2
- AArch64: Add USMMLA implementation for 6-tap H/HV
- AArch64: Optimize Armv8.0 NEON for HBD horizontal filters and 6-tap filters
- Power9: Optimized ITX till 16x4.
- Loongarch: numerous optimizations
- RISC-V optimizations for pal, cdef_filter, ipred, mc_blend, mc_bdir, itx
- Allow playing videos in full-screen mode in dav1dplay

Changes for 1.4.3 'Road Runner':
--------------------------------

1.4.3 is a small release focused on security issues
- AArch64: Fix potential out of bounds access in DotProd H/HV filters
- cli: Prevent buffer over-read

Changes for 1.4.2 'Road Runner':
--------------------------------

1.4.2 is a small release of dav1d, improving notably ARM, AVX-512 and PowerPC
- AVX2 optimizations for 8-tap and new variants for 6-tap
- AVX-512 optimizations for 8-tap and new variants for 6-tap
- Improve entropy decoding on ARM64
- New ARM64 optimizations for convolutions based on DotProd extension
- New ARM64 optimizations for convolutions based on i8mm extension
- New ARM64 optimizations for subpel and prep filters for i8mm
- Misc improvements on existing ARM64 optimizations, notably for put/prep
- New PowerPC9 optimizations for loopfilter
- Support for macOS kperf API for benchmarking

Changes for 1.4.1 'Road Runner':
--------------------------------

1.4.1 is a small release of dav1d, improving notably ARM and RISC-V speed

- Optimizations for 6tap filters for NEON (ARM)
- More RISC-V optimizations for itx (4x8, 8x4, 4x16, 16x4, 8x16, 16x8)
- Reduction of binary size on ARM64, ARM32 and RISC-V
- Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter
- Msac optimizations

Changes for 1.4.0 'Road Runner':
--------------------------------

1.4.0 is a medium release of dav1d, focusing on new architecture support and optimizations

- AVX-512 optimizations for z1, z2, z3 in 8bit and high-bitdepth
- New architecture supported: loongarch
- Loongarch optimizations for 8bit
- New architecture supported: RISC-V
- RISC-V optimizations for itx
- Misc improvements in threading and in reducing binary size
- Fix potential integer overflow with extremely large frame sizes (CVE-2024-1580)

Changes for 1.3.0 'Tundra Peregrine Falcon (Calidus)':
------------------------------------------------------

1.3.0 is a medium release of dav1d, focusing on new APIs and memory usage reduction.

- Reduce memory usage in numerous places
- ABI break in Dav1dSequenceHeader, Dav1dFrameHeader, Dav1dContentLightLevel structures
- new API function to check the API version: dav1d_version_api()
- Rewrite of the SGR functions for ARM64 to be faster
- NEON implementation of save_tmvs for ARM32 and ARM64
- x86 palette DSP for pal_idx_finish function

Index: Makefile
===================================================================
RCS file: /cvs/ports/multimedia/dav1d/Makefile,v
retrieving revision 1.39
diff -u -p -u -p -r1.39 Makefile
--- Makefile 29 Feb 2024 14:33:39 -0000 1.39
+++ Makefile 1 Dec 2024 08:45:12 -0000
@@ -4,14 +4,13 @@ COMMENT= small and fast AV1 decoder
# /!\ DO NOT UPDATE WITHOUT RUNNING TESTS ON ARM64 (XONLY) and AMD64 (IBT) /!\ #
#################################################################################

-VER= 1.2.1
+VER= 1.5.0
DISTNAME= dav1d-${VER}
-REVISION= 3
CATEGORIES= multimedia
SITES= https://downloads.videolan.org/pub/videolan/dav1d/${VER}/
EXTRACT_SUFX= .tar.xz

-SHARED_LIBS= dav1d 2.3
+SHARED_LIBS= dav1d 3.0

HOMEPAGE= https://code.videolan.org/videolan/dav1d/

@@ -39,6 +38,7 @@ CONFIGURE_ARGS+=-Ddefault_library=both \
CONFIGURE_ARGS+=-Denable_asm=false
# XXX SIGBUS otherwise
CFLAGS+= -O1
+#CFLAGS+= -fno-slp-vectorize
.endif

.include <bsd.port.mk>
Index: distinfo
===================================================================
RCS file: /cvs/ports/multimedia/dav1d/distinfo,v
retrieving revision 1.18
diff -u -p -u -p -r1.18 distinfo
--- distinfo 11 Jun 2023 07:58:45 -0000 1.18
+++ distinfo 1 Dec 2024 08:45:12 -0000
@@ -1,2 +1,2 @@
-SHA256 (dav1d-1.2.1.tar.xz) = TjPrYexUx2ihbaDPj6CSi0xFk/X4BKPIh9SiHDGDQLI=
-SIZE (dav1d-1.2.1.tar.xz) = 873008
+SHA256 (dav1d-1.5.0.tar.xz) = FL1vUVeAjtmu3K++UN9onTBP1IEKwgvm7sGrA3Q2r9Y=
+SIZE (dav1d-1.5.0.tar.xz) = 1017040
Index: patches/patch-src_arm_64_filmgrain16_S
===================================================================
RCS file: patches/patch-src_arm_64_filmgrain16_S
diff -N patches/patch-src_arm_64_filmgrain16_S
--- patches/patch-src_arm_64_filmgrain16_S 24 Apr 2023 21:06:59 -0000 1.1
+++ /dev/null 1 Jan 1970 00:00:00 -0000
@@ -1,186 +0,0 @@
-Index: src/arm/64/filmgrain16.S
---- src/arm/64/filmgrain16.S.orig
-+++ src/arm/64/filmgrain16.S
-@@ -740,12 +740,12 @@ function generate_grain_\type\()_16bpc_neon, export=1
- add x4, x1, #FGD_AR_COEFFS_UV
- .endif
- add w9, w9, w15 // grain_scale_shift - bitdepth_min_8
-- adr x16, L(gen_grain_\type\()_tbl)
-+ adrp x16, L(gen_grain_\type\()_tbl)
-+ add x16, x16, :lo12: L(gen_grain_\type\()_tbl)
- ldr w17, [x1, #FGD_AR_COEFF_LAG]
- add w9, w9, #4
-- ldrh w17, [x16, w17, uxtw #1]
-+ ldr x16, [x16, w17, uxtw #3]
- dup v31.8h, w9 // 4 - bitdepth_min_8 + data->grain_scale_shift
-- sub x16, x16, w17, uxtw
- neg v31.8h, v31.8h
-
- .ifc \type, uv_444
-@@ -946,11 +946,13 @@ L(generate_grain_\type\()_lag3):
- AARCH64_VALIDATE_LINK_REGISTER
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(gen_grain_\type\()_tbl):
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag0)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag1)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag2)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag3)
-+ .xword L(generate_grain_\type\()_lag0)
-+ .xword L(generate_grain_\type\()_lag1)
-+ .xword L(generate_grain_\type\()_lag2)
-+ .xword L(generate_grain_\type\()_lag3)
-+ .popsection
- endfunc
- .endm
-
-@@ -991,12 +993,12 @@ function generate_grain_\type\()_16bpc_neon, export=1
- ldr w9, [x1, #FGD_GRAIN_SCALE_SHIFT]
- add x4, x1, #FGD_AR_COEFFS_UV
- add w9, w9, w15 // grain_scale_shift - bitdepth_min_8
-- adr x16, L(gen_grain_\type\()_tbl)
-+ adrp x16, L(gen_grain_\type\()_tbl)
-+ add x16, x16, :lo12: L(gen_grain_\type\()_tbl)
- ldr w17, [x1, #FGD_AR_COEFF_LAG]
- add w9, w9, #4
-- ldrh w17, [x16, w17, uxtw #1]
-+ ldr x16, [x16, w17, uxtw #3]
- dup v31.8h, w9 // 4 - bitdepth_min_8 + data->grain_scale_shift
-- sub x16, x16, w17, uxtw
- neg v31.8h, v31.8h
-
- cmp w13, #0
-@@ -1156,11 +1158,13 @@ L(generate_grain_\type\()_lag3):
- AARCH64_VALIDATE_LINK_REGISTER
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(gen_grain_\type\()_tbl):
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag0)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag1)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag2)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag3)
-+ .xword L(generate_grain_\type\()_lag0)
-+ .xword L(generate_grain_\type\()_lag1)
-+ .xword L(generate_grain_\type\()_lag2)
-+ .xword L(generate_grain_\type\()_lag3)
-+ .popsection
- endfunc
- .endm
-
-@@ -1306,19 +1310,18 @@ function fgy_32x32_16bpc_neon, export=1
- add_offset x5, w6, x10, x5, x9
-
- ldr w11, [sp, #88] // type
-- adr x13, L(fgy_loop_tbl)
-+ adrp x13, L(fgy_loop_tbl)
-+ add x13, x13, :lo12: L(fgy_loop_tbl)
-
- add x4, x12, #32*2 // grain_lut += BLOCK_SIZE * bx
- add x6, x14, x9, lsl #5 // grain_lut += grain_stride * BLOCK_SIZE * by
-
- tst w11, #1
-- ldrh w11, [x13, w11, uxtw #1]
-+ ldr x11, [x13, w11, uxtw #3]
-
- add x8, x16, x9, lsl #5 // grain_lut += grain_stride * BLOCK_SIZE * by
- add x8, x8, #32*2 // grain_lut += BLOCK_SIZE * bx
-
-- sub x11, x13, w11, uxtw
--
- b.eq 1f
- // y overlap
- dup v8.8h, v27.h[0]
-@@ -1481,11 +1484,13 @@ L(loop_\ox\oy):
- fgy 1, 0
- fgy 1, 1
-
-+ .pushsection .data.rel.ro, "aw"
- L(fgy_loop_tbl):
-- .hword L(fgy_loop_tbl) - L(loop_00)
-- .hword L(fgy_loop_tbl) - L(loop_01)
-- .hword L(fgy_loop_tbl) - L(loop_10)
-- .hword L(fgy_loop_tbl) - L(loop_11)
-+ .xword L(loop_00)
-+ .xword L(loop_01)
-+ .xword L(loop_10)
-+ .xword L(loop_11)
-+ .popsection
- endfunc
-
- // void dav1d_fguv_32x32_420_16bpc_neon(pixel *const dst,
-@@ -1589,11 +1594,12 @@ function fguv_32x32_\layout\()_16bpc_neon, export=1
- ldr w13, [sp, #112] // type
-
- movrel x16, overlap_coeffs_\sx
-- adr x14, L(fguv_loop_sx\sx\()_tbl)
-+ adrp x14, L(fguv_loop_sx\sx\()_tbl)
-+ add x14, x14, :lo12: L(fguv_loop_sx\sx\()_tbl)
-
- ld1 {v27.4h, v28.4h}, [x16] // overlap_coeffs
- tst w13, #1
-- ldrh w13, [x14, w13, uxtw #1]
-+ ldr x13, [x14, w13, uxtw #3]
-
- b.eq 1f
- // y overlap
-@@ -1601,8 +1607,6 @@ function fguv_32x32_\layout\()_16bpc_neon, export=1
- mov w9, #(2 >> \sy)
-
- 1:
-- sub x13, x14, w13, uxtw
--
- .if \sy
- movi v25.8h, #23
- movi v26.8h, #22
-@@ -1819,15 +1823,17 @@ L(fguv_loop_sx0_csfl\csfl\()_\ox\oy):
- AARCH64_VALIDATE_LINK_REGISTER
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(fguv_loop_sx0_tbl):
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_00)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_01)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_10)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_11)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_00)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_01)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_10)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_11)
-+ .xword L(fguv_loop_sx0_csfl0_00)
-+ .xword L(fguv_loop_sx0_csfl0_01)
-+ .xword L(fguv_loop_sx0_csfl0_10)
-+ .xword L(fguv_loop_sx0_csfl0_11)
-+ .xword L(fguv_loop_sx0_csfl1_00)
-+ .xword L(fguv_loop_sx0_csfl1_01)
-+ .xword L(fguv_loop_sx0_csfl1_10)
-+ .xword L(fguv_loop_sx0_csfl1_11)
-+ .popsection
- endfunc
-
- function fguv_loop_sx1_neon
-@@ -1985,13 +1991,15 @@ L(fguv_loop_sx1_csfl\csfl\()_\ox\oy):
- AARCH64_VALIDATE_LINK_REGISTER
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(fguv_loop_sx1_tbl):
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_00)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_01)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_10)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_11)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_00)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_01)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_10)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_11)
-+ .xword L(fguv_loop_sx1_csfl0_00)
-+ .xword L(fguv_loop_sx1_csfl0_01)
-+ .xword L(fguv_loop_sx1_csfl0_10)
-+ .xword L(fguv_loop_sx1_csfl0_11)
-+ .xword L(fguv_loop_sx1_csfl1_00)
-+ .xword L(fguv_loop_sx1_csfl1_01)
-+ .xword L(fguv_loop_sx1_csfl1_10)
-+ .xword L(fguv_loop_sx1_csfl1_11)
-+ .popsection
- endfunc
Index: patches/patch-src_arm_64_filmgrain_S
===================================================================
RCS file: patches/patch-src_arm_64_filmgrain_S
diff -N patches/patch-src_arm_64_filmgrain_S
--- patches/patch-src_arm_64_filmgrain_S 24 Apr 2023 21:06:59 -0000 1.1
+++ /dev/null 1 Jan 1970 00:00:00 -0000
@@ -1,186 +0,0 @@
-Index: src/arm/64/filmgrain.S
---- src/arm/64/filmgrain.S.orig
-+++ src/arm/64/filmgrain.S
-@@ -884,12 +884,12 @@ function generate_grain_\type\()_8bpc_neon, export=1
- .else
- add x4, x1, #FGD_AR_COEFFS_UV
- .endif
-- adr x16, L(gen_grain_\type\()_tbl)
-+ adrp x16, L(gen_grain_\type\()_tbl)
-+ add x16, x16, :lo12: L(gen_grain_\type\()_tbl)
- ldr w17, [x1, #FGD_AR_COEFF_LAG]
- add w9, w9, #4
-- ldrh w17, [x16, w17, uxtw #1]
-+ ldr x16, [x16, w17, uxtw #3]
- dup v31.8h, w9 // 4 + data->grain_scale_shift
-- sub x16, x16, w17, uxtw
- neg v31.8h, v31.8h
-
- .ifc \type, uv_444
-@@ -1076,11 +1076,13 @@ L(generate_grain_\type\()_lag3):
- AARCH64_VALIDATE_LINK_REGISTER
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(gen_grain_\type\()_tbl):
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag0)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag1)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag2)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag3)
-+ .xword L(generate_grain_\type\()_lag0)
-+ .xword L(generate_grain_\type\()_lag1)
-+ .xword L(generate_grain_\type\()_lag2)
-+ .xword L(generate_grain_\type\()_lag3)
-+ .popsection
- endfunc
- .endm
-
-@@ -1118,12 +1120,12 @@ function generate_grain_\type\()_8bpc_neon, export=1
- ldr w2, [x1, #FGD_SEED]
- ldr w9, [x1, #FGD_GRAIN_SCALE_SHIFT]
- add x4, x1, #FGD_AR_COEFFS_UV
-- adr x16, L(gen_grain_\type\()_tbl)
-+ adrp x16, L(gen_grain_\type\()_tbl)
-+ add x16, x16, :lo12: L(gen_grain_\type\()_tbl)
- ldr w17, [x1, #FGD_AR_COEFF_LAG]
- add w9, w9, #4
-- ldrh w17, [x16, w17, uxtw #1]
-+ ldr x16, [x16, w17, uxtw #3]
- dup v31.8h, w9 // 4 + data->grain_scale_shift
-- sub x16, x16, w17, uxtw
- neg v31.8h, v31.8h
-
- cmp w13, #0
-@@ -1273,11 +1275,13 @@ L(generate_grain_\type\()_lag3):
- AARCH64_VALIDATE_LINK_REGISTER
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(gen_grain_\type\()_tbl):
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag0)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag1)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag2)
-- .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag3)
-+ .xword L(generate_grain_\type\()_lag0)
-+ .xword L(generate_grain_\type\()_lag1)
-+ .xword L(generate_grain_\type\()_lag2)
-+ .xword L(generate_grain_\type\()_lag3)
-+ .popsection
- endfunc
- .endm
-
-@@ -1407,19 +1411,18 @@ function fgy_32x32_8bpc_neon, export=1
- add_offset x5, w6, x10, x5, x9
-
- ldr w11, [sp, #24] // type
-- adr x13, L(fgy_loop_tbl)
-+ adrp x13, L(fgy_loop_tbl)
-+ add x13, x13, :lo12: L(fgy_loop_tbl)
-
- add x4, x12, #32 // grain_lut += BLOCK_SIZE * bx
- add x6, x14, x9, lsl #5 // grain_lut += grain_stride * BLOCK_SIZE * by
-
- tst w11, #1
-- ldrh w11, [x13, w11, uxtw #1]
-+ ldr x11, [x13, w11, uxtw #3]
-
- add x8, x16, x9, lsl #5 // grain_lut += grain_stride * BLOCK_SIZE * by
- add x8, x8, #32 // grain_lut += BLOCK_SIZE * bx
-
-- sub x11, x13, w11, uxtw
--
- b.eq 1f
- // y overlap
- dup v6.16b, v27.b[0]
-@@ -1556,11 +1559,13 @@ L(loop_\ox\oy):
- fgy 1, 0
- fgy 1, 1
-
-+ .pushsection .data.rel.ro, "aw"
- L(fgy_loop_tbl):
-- .hword L(fgy_loop_tbl) - L(loop_00)
-- .hword L(fgy_loop_tbl) - L(loop_01)
-- .hword L(fgy_loop_tbl) - L(loop_10)
-- .hword L(fgy_loop_tbl) - L(loop_11)
-+ .xword L(loop_00)
-+ .xword L(loop_01)
-+ .xword L(loop_10)
-+ .xword L(loop_11)
-+ .popsection
- endfunc
-
- // void dav1d_fguv_32x32_420_8bpc_neon(pixel *const dst,
-@@ -1646,11 +1651,12 @@ function fguv_32x32_\layout\()_8bpc_neon, export=1
- ldr w13, [sp, #64] // type
-
- movrel x16, overlap_coeffs_\sx
-- adr x14, L(fguv_loop_sx\sx\()_tbl)
-+ adrp x14, L(fguv_loop_sx\sx\()_tbl)
-+ add x14, x14, :lo12: L(fguv_loop_sx\sx\()_tbl)
-
- ld1 {v27.8b, v28.8b}, [x16] // overlap_coeffs
- tst w13, #1
-- ldrh w13, [x14, w13, uxtw #1]
-+ ldr x13, [x14, w13, uxtw #3]
-
- b.eq 1f
- // y overlap
-@@ -1658,8 +1664,6 @@ function fguv_32x32_\layout\()_8bpc_neon, export=1
- mov w9, #(2 >> \sy)
-
- 1:
-- sub x13, x14, w13, uxtw
--
- .if \sy
- movi v25.16b, #23
- movi v26.16b, #22
-@@ -1849,15 +1853,17 @@ L(fguv_loop_sx0_csfl\csfl\()_\ox\oy):
- AARCH64_VALIDATE_LINK_REGISTER
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(fguv_loop_sx0_tbl):
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_00)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_01)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_10)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_11)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_00)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_01)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_10)
-- .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_11)
-+ .xword L(fguv_loop_sx0_csfl0_00)
-+ .xword L(fguv_loop_sx0_csfl0_01)
-+ .xword L(fguv_loop_sx0_csfl0_10)
-+ .xword L(fguv_loop_sx0_csfl0_11)
-+ .xword L(fguv_loop_sx0_csfl1_00)
-+ .xword L(fguv_loop_sx0_csfl1_01)
-+ .xword L(fguv_loop_sx0_csfl1_10)
-+ .xword L(fguv_loop_sx0_csfl1_11)
-+ .popsection
- endfunc
-
- function fguv_loop_sx1_neon
-@@ -1998,13 +2004,15 @@ L(fguv_loop_sx1_csfl\csfl\()_\ox\oy):
- AARCH64_VALIDATE_LINK_REGISTER
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(fguv_loop_sx1_tbl):
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_00)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_01)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_10)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_11)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_00)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_01)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_10)
-- .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_11)
-+ .xword L(fguv_loop_sx1_csfl0_00)
-+ .xword L(fguv_loop_sx1_csfl0_01)
-+ .xword L(fguv_loop_sx1_csfl0_10)
-+ .xword L(fguv_loop_sx1_csfl0_11)
-+ .xword L(fguv_loop_sx1_csfl1_00)
-+ .xword L(fguv_loop_sx1_csfl1_01)
-+ .xword L(fguv_loop_sx1_csfl1_10)
-+ .xword L(fguv_loop_sx1_csfl1_11)
-+ .popsection
- endfunc
Index: patches/patch-src_arm_64_ipred16_S
===================================================================
RCS file: patches/patch-src_arm_64_ipred16_S
diff -N patches/patch-src_arm_64_ipred16_S
--- patches/patch-src_arm_64_ipred16_S 13 Jul 2023 12:26:14 -0000 1.4
+++ /dev/null 1 Jan 1970 00:00:00 -0000
@@ -1,965 +0,0 @@
-Index: src/arm/64/ipred16.S
---- src/arm/64/ipred16.S.orig
-+++ src/arm/64/ipred16.S
-@@ -36,11 +36,11 @@
- function ipred_dc_128_16bpc_neon, export=1
- ldr w8, [sp]
- clz w3, w3
-- adr x5, L(ipred_dc_128_tbl)
-+ adrp x5, L(ipred_dc_128_tbl)
-+ add x5, x5, :lo12: L(ipred_dc_128_tbl)
- sub w3, w3, #25
-- ldrh w3, [x5, w3, uxtw #1]
-+ ldr x5, [x5, w3, uxtw #3]
- dup v0.8h, w8
-- sub x5, x5, w3, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- urshr v0.8h, v0.8h, #1
-@@ -106,12 +106,14 @@ function ipred_dc_128_16bpc_neon, export=1
- b.gt 64b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_dc_128_tbl):
-- .hword L(ipred_dc_128_tbl) - 640b
-- .hword L(ipred_dc_128_tbl) - 320b
-- .hword L(ipred_dc_128_tbl) - 160b
-- .hword L(ipred_dc_128_tbl) - 8b
-- .hword L(ipred_dc_128_tbl) - 4b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 8b
-+ .xword 4b
-+ .popsection
- endfunc
-
- // void ipred_v_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -120,11 +122,11 @@ endfunc
- // const int max_width, const int max_height);
- function ipred_v_16bpc_neon, export=1
- clz w3, w3
-- adr x5, L(ipred_v_tbl)
-+ adrp x5, L(ipred_v_tbl)
-+ add x5, x5, :lo12: L(ipred_v_tbl)
- sub w3, w3, #25
-- ldrh w3, [x5, w3, uxtw #1]
-+ ldr x5, [x5, w3, uxtw #3]
- add x2, x2, #2
-- sub x5, x5, w3, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -190,12 +192,14 @@ function ipred_v_16bpc_neon, export=1
- b.gt 64b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_v_tbl):
-- .hword L(ipred_v_tbl) - 640b
-- .hword L(ipred_v_tbl) - 320b
-- .hword L(ipred_v_tbl) - 160b
-- .hword L(ipred_v_tbl) - 80b
-- .hword L(ipred_v_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void ipred_h_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -204,11 +208,11 @@ endfunc
- // const int max_width, const int max_height);
- function ipred_h_16bpc_neon, export=1
- clz w3, w3
-- adr x5, L(ipred_h_tbl)
-+ adrp x5, L(ipred_h_tbl)
-+ add x5, x5, :lo12: L(ipred_h_tbl)
- sub w3, w3, #25
-- ldrh w3, [x5, w3, uxtw #1]
-+ ldr x5, [x5, w3, uxtw #3]
- sub x2, x2, #8
-- sub x5, x5, w3, uxtw
- mov x7, #-8
- add x6, x0, x1
- lsl x1, x1, #1
-@@ -292,12 +296,14 @@ function ipred_h_16bpc_neon, export=1
- b.gt 64b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_h_tbl):
-- .hword L(ipred_h_tbl) - 64b
-- .hword L(ipred_h_tbl) - 32b
-- .hword L(ipred_h_tbl) - 16b
-- .hword L(ipred_h_tbl) - 8b
-- .hword L(ipred_h_tbl) - 4b
-+ .xword 64b
-+ .xword 32b
-+ .xword 16b
-+ .xword 8b
-+ .xword 4b
-+ .popsection
- endfunc
-
- // void ipred_dc_top_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -306,11 +312,11 @@ endfunc
- // const int max_width, const int max_height);
- function ipred_dc_top_16bpc_neon, export=1
- clz w3, w3
-- adr x5, L(ipred_dc_top_tbl)
-+ adrp x5, L(ipred_dc_top_tbl)
-+ add x5, x5, :lo12: L(ipred_dc_top_tbl)
- sub w3, w3, #25
-- ldrh w3, [x5, w3, uxtw #1]
-+ ldr x5, [x5, w3, uxtw #3]
- add x2, x2, #2
-- sub x5, x5, w3, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -409,12 +415,14 @@ function ipred_dc_top_16bpc_neon, export=1
- b.gt 64b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_dc_top_tbl):
-- .hword L(ipred_dc_top_tbl) - 640b
-- .hword L(ipred_dc_top_tbl) - 320b
-- .hword L(ipred_dc_top_tbl) - 160b
-- .hword L(ipred_dc_top_tbl) - 80b
-- .hword L(ipred_dc_top_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void ipred_dc_left_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -425,13 +433,12 @@ function ipred_dc_left_16bpc_neon, export=1
- sub x2, x2, w4, uxtw #1
- clz w3, w3
- clz w7, w4
-- adr x5, L(ipred_dc_left_tbl)
-+ adrp x5, L(ipred_dc_left_tbl)
-+ add x5, x5, :lo12: L(ipred_dc_left_tbl)
- sub w3, w3, #20 // 25 leading bits, minus table offset 5
- sub w7, w7, #25
-- ldrh w3, [x5, w3, uxtw #1]
-- ldrh w7, [x5, w7, uxtw #1]
-- sub x3, x5, w3, uxtw
-- sub x5, x5, w7, uxtw
-+ ldr x3, [x5, w3, uxtw #3]
-+ ldr x5, [x5, w7, uxtw #3]
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -550,17 +557,19 @@ L(ipred_dc_left_w64):
- b.gt 1b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_dc_left_tbl):
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h64)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h32)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h16)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h8)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h4)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w64)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w32)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w16)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w8)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w4)
-+ .xword L(ipred_dc_left_h64)
-+ .xword L(ipred_dc_left_h32)
-+ .xword L(ipred_dc_left_h16)
-+ .xword L(ipred_dc_left_h8)
-+ .xword L(ipred_dc_left_h4)
-+ .xword L(ipred_dc_left_w64)
-+ .xword L(ipred_dc_left_w32)
-+ .xword L(ipred_dc_left_w16)
-+ .xword L(ipred_dc_left_w8)
-+ .xword L(ipred_dc_left_w4)
-+ .popsection
- endfunc
-
- // void ipred_dc_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -573,16 +582,15 @@ function ipred_dc_16bpc_neon, export=1
- clz w3, w3
- clz w6, w4
- dup v16.4s, w7 // width + height
-- adr x5, L(ipred_dc_tbl)
-+ adrp x5, L(ipred_dc_tbl)
-+ add x5, x5, :lo12: L(ipred_dc_tbl)
- rbit w7, w7 // rbit(width + height)
- sub w3, w3, #20 // 25 leading bits, minus table offset 5
- sub w6, w6, #25
- clz w7, w7 // ctz(width + height)
-- ldrh w3, [x5, w3, uxtw #1]
-- ldrh w6, [x5, w6, uxtw #1]
-+ ldr x3, [x5, w3, uxtw #3]
-+ ldr x5, [x5, w6, uxtw #3]
- neg w7, w7 // -ctz(width + height)
-- sub x3, x5, w3, uxtw
-- sub x5, x5, w6, uxtw
- ushr v16.4s, v16.4s, #1 // (width + height) >> 1
- dup v17.4s, w7 // -ctz(width + height)
- add x6, x0, x1
-@@ -795,17 +803,19 @@ L(ipred_dc_w64):
- b.gt 2b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_dc_tbl):
-- .hword L(ipred_dc_tbl) - L(ipred_dc_h64)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_h32)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_h16)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_h8)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_h4)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_w64)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_w32)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_w16)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_w8)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_w4)
-+ .xword L(ipred_dc_h64)
-+ .xword L(ipred_dc_h32)
-+ .xword L(ipred_dc_h16)
-+ .xword L(ipred_dc_h8)
-+ .xword L(ipred_dc_h4)
-+ .xword L(ipred_dc_w64)
-+ .xword L(ipred_dc_w32)
-+ .xword L(ipred_dc_w16)
-+ .xword L(ipred_dc_w8)
-+ .xword L(ipred_dc_w4)
-+ .popsection
- endfunc
-
- // void ipred_paeth_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -814,13 +824,13 @@ endfunc
- // const int max_width, const int max_height);
- function ipred_paeth_16bpc_neon, export=1
- clz w9, w3
-- adr x5, L(ipred_paeth_tbl)
-+ adrp x5, L(ipred_paeth_tbl)
-+ add x5, x5, :lo12: L(ipred_paeth_tbl)
- sub w9, w9, #25
-- ldrh w9, [x5, w9, uxtw #1]
-+ ldr x5, [x5, w9, uxtw #3]
- ld1r {v4.8h}, [x2]
- add x8, x2, #2
- sub x2, x2, #8
-- sub x5, x5, w9, uxtw
- mov x7, #-8
- add x6, x0, x1
- lsl x1, x1, #1
-@@ -934,12 +944,14 @@ function ipred_paeth_16bpc_neon, export=1
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_paeth_tbl):
-- .hword L(ipred_paeth_tbl) - 640b
-- .hword L(ipred_paeth_tbl) - 320b
-- .hword L(ipred_paeth_tbl) - 160b
-- .hword L(ipred_paeth_tbl) - 80b
-- .hword L(ipred_paeth_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void ipred_smooth_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -951,13 +963,13 @@ function ipred_smooth_16bpc_neon, export=1
- add x11, x10, w4, uxtw
- add x10, x10, w3, uxtw
- clz w9, w3
-- adr x5, L(ipred_smooth_tbl)
-+ adrp x5, L(ipred_smooth_tbl)
-+ add x5, x5, :lo12: L(ipred_smooth_tbl)
- sub x12, x2, w4, uxtw #1
- sub w9, w9, #25
-- ldrh w9, [x5, w9, uxtw #1]
-+ ldr x5, [x5, w9, uxtw #3]
- ld1r {v4.8h}, [x12] // bottom
- add x8, x2, #2
-- sub x5, x5, w9, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -1138,12 +1150,14 @@ function ipred_smooth_16bpc_neon, export=1
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_smooth_tbl):
-- .hword L(ipred_smooth_tbl) - 640b
-- .hword L(ipred_smooth_tbl) - 320b
-- .hword L(ipred_smooth_tbl) - 160b
-- .hword L(ipred_smooth_tbl) - 80b
-- .hword L(ipred_smooth_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void ipred_smooth_v_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -1154,13 +1168,13 @@ function ipred_smooth_v_16bpc_neon, export=1
- movrel x7, X(sm_weights)
- add x7, x7, w4, uxtw
- clz w9, w3
-- adr x5, L(ipred_smooth_v_tbl)
-+ adrp x5, L(ipred_smooth_v_tbl)
-+ add x5, x5, :lo12: L(ipred_smooth_v_tbl)
- sub x8, x2, w4, uxtw #1
- sub w9, w9, #25
-- ldrh w9, [x5, w9, uxtw #1]
-+ ldr x5, [x5, w9, uxtw #3]
- ld1r {v4.8h}, [x8] // bottom
- add x2, x2, #2
-- sub x5, x5, w9, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -1265,12 +1279,14 @@ function ipred_smooth_v_16bpc_neon, export=1
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_smooth_v_tbl):
-- .hword L(ipred_smooth_v_tbl) - 640b
-- .hword L(ipred_smooth_v_tbl) - 320b
-- .hword L(ipred_smooth_v_tbl) - 160b
-- .hword L(ipred_smooth_v_tbl) - 80b
-- .hword L(ipred_smooth_v_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void ipred_smooth_h_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -1281,12 +1297,12 @@ function ipred_smooth_h_16bpc_neon, export=1
- movrel x8, X(sm_weights)
- add x8, x8, w3, uxtw
- clz w9, w3
-- adr x5, L(ipred_smooth_h_tbl)
-+ adrp x5, L(ipred_smooth_h_tbl)
-+ add x5, x5, :lo12: L(ipred_smooth_h_tbl)
- add x12, x2, w3, uxtw #1
- sub w9, w9, #25
-- ldrh w9, [x5, w9, uxtw #1]
-+ ldr x5, [x5, w9, uxtw #3]
- ld1r {v5.8h}, [x12] // right
-- sub x5, x5, w9, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -1397,12 +1413,14 @@ function ipred_smooth_h_16bpc_neon, export=1
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_smooth_h_tbl):
-- .hword L(ipred_smooth_h_tbl) - 640b
-- .hword L(ipred_smooth_h_tbl) - 320b
-- .hword L(ipred_smooth_h_tbl) - 160b
-- .hword L(ipred_smooth_h_tbl) - 80b
-- .hword L(ipred_smooth_h_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- const padding_mask_buf
-@@ -1728,11 +1746,11 @@ endfunc
- // const int dx, const int max_base_x);
- function ipred_z1_fill1_16bpc_neon, export=1
- clz w9, w3
-- adr x8, L(ipred_z1_fill1_tbl)
-+ adrp x8, L(ipred_z1_fill1_tbl)
-+ add x8, x8, :lo12: L(ipred_z1_fill1_tbl)
- sub w9, w9, #25
-- ldrh w9, [x8, w9, uxtw #1]
-+ ldr x8, [x8, w9, uxtw #3]
- add x10, x2, w6, uxtw #1 // top[max_base_x]
-- sub x8, x8, w9, uxtw
- ld1r {v31.8h}, [x10] // padding
- mov w7, w5
- mov w15, #64
-@@ -1917,12 +1935,14 @@ function ipred_z1_fill1_16bpc_neon, export=1
- mov w3, w12
- b 169b
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_z1_fill1_tbl):
-- .hword L(ipred_z1_fill1_tbl) - 640b
-- .hword L(ipred_z1_fill1_tbl) - 320b
-- .hword L(ipred_z1_fill1_tbl) - 160b
-- .hword L(ipred_z1_fill1_tbl) - 80b
-- .hword L(ipred_z1_fill1_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- function ipred_z1_fill2_16bpc_neon, export=1
-@@ -2050,11 +2070,11 @@ endconst
- // const int dx, const int dy);
- function ipred_z2_fill1_16bpc_neon, export=1
- clz w10, w4
-- adr x9, L(ipred_z2_fill1_tbl)
-+ adrp x9, L(ipred_z2_fill1_tbl)
-+ add x9, x9, :lo12: L(ipred_z2_fill1_tbl)
- sub w10, w10, #25
-- ldrh w10, [x9, w10, uxtw #1]
-+ ldr x9, [x9, w10, uxtw #3]
- mov w8, #(1 << 6) // xpos = 1 << 6
-- sub x9, x9, w10, uxtw
- sub w8, w8, w6 // xpos -= dx
-
- movrel x11, increments
-@@ -2815,12 +2835,14 @@ function ipred_z2_fill1_16bpc_neon, export=1
- ldp d8, d9, [sp], 0x40
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_z2_fill1_tbl):
-- .hword L(ipred_z2_fill1_tbl) - 640b
-- .hword L(ipred_z2_fill1_tbl) - 320b
-- .hword L(ipred_z2_fill1_tbl) - 160b
-- .hword L(ipred_z2_fill1_tbl) - 80b
-- .hword L(ipred_z2_fill1_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- function ipred_z2_fill2_16bpc_neon, export=1
-@@ -3432,11 +3454,11 @@ endfunc
- // const int dy, const int max_base_y);
- function ipred_z3_fill1_16bpc_neon, export=1
- clz w9, w4
-- adr x8, L(ipred_z3_fill1_tbl)
-+ adrp x8, L(ipred_z3_fill1_tbl)
-+ add x8, x8, :lo12: L(ipred_z3_fill1_tbl)
- sub w9, w9, #25
-- ldrh w9, [x8, w9, uxtw #1]
-+ ldr x8, [x8, w9, uxtw #3]
- add x10, x2, w6, uxtw #1 // left[max_base_y]
-- sub x8, x8, w9, uxtw
- ld1r {v31.8h}, [x10] // padding
- mov w7, w5
- mov w15, #64
-@@ -3638,17 +3660,20 @@ function ipred_z3_fill1_16bpc_neon, export=1
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_z3_fill1_tbl):
-- .hword L(ipred_z3_fill1_tbl) - 640b
-- .hword L(ipred_z3_fill1_tbl) - 320b
-- .hword L(ipred_z3_fill1_tbl) - 160b
-- .hword L(ipred_z3_fill1_tbl) - 80b
-- .hword L(ipred_z3_fill1_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- function ipred_z3_fill_padding_neon, export=0
- cmp w3, #8
-- adr x8, L(ipred_z3_fill_padding_tbl)
-+ adrp x8, L(ipred_z3_fill_padding_tbl)
-+ add x8, x8, :lo12: L(ipred_z3_fill_padding_tbl)
- b.gt L(ipred_z3_fill_padding_wide)
- // w3 = remaining width, w4 = constant height
- mov w12, w4
-@@ -3659,10 +3684,11 @@ function ipred_z3_fill_padding_neon, export=0
- // power of two in the remaining width, and repeating.
- clz w9, w3
- sub w9, w9, #25
-- ldrh w9, [x8, w9, uxtw #1]
-- sub x9, x8, w9, uxtw
-+ ldr x9, [x8, w9, uxtw #3]
- br x9
-
-+20:
-+ AARCH64_VALID_JUMP_TARGET
- 2:
- st1 {v31.s}[0], [x0], x1
- subs w4, w4, #4
-@@ -3681,6 +3707,8 @@ function ipred_z3_fill_padding_neon, export=0
- mov w4, w12
- b 1b
-
-+40:
-+ AARCH64_VALID_JUMP_TARGET
- 4:
- st1 {v31.4h}, [x0], x1
- subs w4, w4, #4
-@@ -3699,10 +3727,11 @@ function ipred_z3_fill_padding_neon, export=0
- mov w4, w12
- b 1b
-
--8:
--16:
--32:
--64:
-+80:
-+160:
-+320:
-+640:
-+ AARCH64_VALID_JUMP_TARGET
- st1 {v31.8h}, [x0], x1
- subs w4, w4, #4
- st1 {v31.8h}, [x13], x1
-@@ -3723,13 +3752,15 @@ function ipred_z3_fill_padding_neon, export=0
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_z3_fill_padding_tbl):
-- .hword L(ipred_z3_fill_padding_tbl) - 64b
-- .hword L(ipred_z3_fill_padding_tbl) - 32b
-- .hword L(ipred_z3_fill_padding_tbl) - 16b
-- .hword L(ipred_z3_fill_padding_tbl) - 8b
-- .hword L(ipred_z3_fill_padding_tbl) - 4b
-- .hword L(ipred_z3_fill_padding_tbl) - 2b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .popsection
-
- L(ipred_z3_fill_padding_wide):
- // Fill a WxH rectangle with padding, with W > 8.
-@@ -3880,13 +3911,13 @@ function ipred_filter_\bpc\()bpc_neon
- add x6, x6, w5, uxtw
- ld1 {v16.8b, v17.8b, v18.8b, v19.8b}, [x6], #32
- clz w9, w3
-- adr x5, L(ipred_filter\bpc\()_tbl)
-+ adrp x5, L(ipred_filter\bpc\()_tbl)
-+ add x5, x5, :lo12: L(ipred_filter\bpc\()_tbl)
- ld1 {v20.8b, v21.8b, v22.8b}, [x6]
- sub w9, w9, #26
-- ldrh w9, [x5, w9, uxtw #1]
-+ ldr x5, [x5, w9, uxtw #3]
- sxtl v16.8h, v16.8b
- sxtl v17.8h, v17.8b
-- sub x5, x5, w9, uxtw
- sxtl v18.8h, v18.8b
- sxtl v19.8h, v19.8b
- add x6, x0, x1
-@@ -4160,11 +4191,13 @@ function ipred_filter_\bpc\()bpc_neon
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_filter\bpc\()_tbl):
-- .hword L(ipred_filter\bpc\()_tbl) - 320b
-- .hword L(ipred_filter\bpc\()_tbl) - 160b
-- .hword L(ipred_filter\bpc\()_tbl) - 80b
-- .hword L(ipred_filter\bpc\()_tbl) - 40b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
- .endm
-
-@@ -4184,11 +4217,11 @@ endfunc
- function pal_pred_16bpc_neon, export=1
- ld1 {v30.8h}, [x2]
- clz w9, w4
-- adr x6, L(pal_pred_tbl)
-+ adrp x6, L(pal_pred_tbl)
-+ add x6, x6, :lo12: L(pal_pred_tbl)
- sub w9, w9, #25
-- ldrh w9, [x6, w9, uxtw #1]
-+ ldr x6, [x6, w9, uxtw #3]
- movi v31.8h, #1, lsl #8
-- sub x6, x6, w9, uxtw
- br x6
- 40:
- AARCH64_VALID_JUMP_TARGET
-@@ -4357,12 +4390,14 @@ function pal_pred_16bpc_neon, export=1
- b.gt 64b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(pal_pred_tbl):
-- .hword L(pal_pred_tbl) - 640b
-- .hword L(pal_pred_tbl) - 320b
-- .hword L(pal_pred_tbl) - 160b
-- .hword L(pal_pred_tbl) - 80b
-- .hword L(pal_pred_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void ipred_cfl_128_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4373,12 +4408,12 @@ endfunc
- function ipred_cfl_128_16bpc_neon, export=1
- dup v31.8h, w7 // bitdepth_max
- clz w9, w3
-- adr x7, L(ipred_cfl_128_tbl)
-+ adrp x7, L(ipred_cfl_128_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_128_tbl)
- sub w9, w9, #26
-- ldrh w9, [x7, w9, uxtw #1]
-+ ldr x7, [x7, w9, uxtw #3]
- urshr v0.8h, v31.8h, #1
- dup v1.8h, w6 // alpha
-- sub x7, x7, w9, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- movi v30.8h, #0
-@@ -4510,12 +4545,14 @@ L(ipred_cfl_splat_w16):
- b.gt 1b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_128_tbl):
- L(ipred_cfl_splat_tbl):
-- .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w16)
-- .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w16)
-- .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w8)
-- .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w4)
-+ .xword L(ipred_cfl_splat_w16)
-+ .xword L(ipred_cfl_splat_w16)
-+ .xword L(ipred_cfl_splat_w8)
-+ .xword L(ipred_cfl_splat_w4)
-+ .popsection
- endfunc
-
- // void ipred_cfl_top_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4526,12 +4563,12 @@ endfunc
- function ipred_cfl_top_16bpc_neon, export=1
- dup v31.8h, w7 // bitdepth_max
- clz w9, w3
-- adr x7, L(ipred_cfl_top_tbl)
-+ adrp x7, L(ipred_cfl_top_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_top_tbl)
- sub w9, w9, #26
-- ldrh w9, [x7, w9, uxtw #1]
-+ ldr x7, [x7, w9, uxtw #3]
- dup v1.8h, w6 // alpha
- add x2, x2, #2
-- sub x7, x7, w9, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- movi v30.8h, #0
-@@ -4569,11 +4606,13 @@ function ipred_cfl_top_16bpc_neon, export=1
- dup v0.8h, v0.h[0]
- b L(ipred_cfl_splat_w16)
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_top_tbl):
-- .hword L(ipred_cfl_top_tbl) - 32b
-- .hword L(ipred_cfl_top_tbl) - 16b
-- .hword L(ipred_cfl_top_tbl) - 8b
-- .hword L(ipred_cfl_top_tbl) - 4b
-+ .xword 32b
-+ .xword 16b
-+ .xword 8b
-+ .xword 4b
-+ .popsection
- endfunc
-
- // void ipred_cfl_left_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4586,15 +4625,15 @@ function ipred_cfl_left_16bpc_neon, export=1
- sub x2, x2, w4, uxtw #1
- clz w9, w3
- clz w8, w4
-- adr x10, L(ipred_cfl_splat_tbl)
-- adr x7, L(ipred_cfl_left_tbl)
-+ adrp x10, L(ipred_cfl_splat_tbl)
-+ add x10, x10, :lo12: L(ipred_cfl_splat_tbl)
-+ adrp x7, L(ipred_cfl_left_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_left_tbl)
- sub w9, w9, #26
- sub w8, w8, #26
-- ldrh w9, [x10, w9, uxtw #1]
-- ldrh w8, [x7, w8, uxtw #1]
-+ ldr x9, [x10, w9, uxtw #3]
-+ ldr x7, [x7, w8, uxtw #3]
- dup v1.8h, w6 // alpha
-- sub x9, x10, w9, uxtw
-- sub x7, x7, w8, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- movi v30.8h, #0
-@@ -4636,11 +4675,13 @@ L(ipred_cfl_left_h32):
- dup v0.8h, v0.h[0]
- br x9
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_left_tbl):
-- .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h32)
-- .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h16)
-- .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h8)
-- .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h4)
-+ .xword L(ipred_cfl_left_h32)
-+ .xword L(ipred_cfl_left_h16)
-+ .xword L(ipred_cfl_left_h8)
-+ .xword L(ipred_cfl_left_h4)
-+ .popsection
- endfunc
-
- // void ipred_cfl_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4656,16 +4697,15 @@ function ipred_cfl_16bpc_neon, export=1
- clz w9, w3
- clz w6, w4
- dup v16.4s, w8 // width + height
-- adr x7, L(ipred_cfl_tbl)
-+ adrp x7, L(ipred_cfl_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_tbl)
- rbit w8, w8 // rbit(width + height)
- sub w9, w9, #22 // 26 leading bits, minus table offset 4
- sub w6, w6, #26
- clz w8, w8 // ctz(width + height)
-- ldrh w9, [x7, w9, uxtw #1]
-- ldrh w6, [x7, w6, uxtw #1]
-+ ldr x9, [x7, w9, uxtw #3]
-+ ldr x7, [x7, w6, uxtw #3]
- neg w8, w8 // -ctz(width + height)
-- sub x9, x7, w9, uxtw
-- sub x7, x7, w6, uxtw
- ushr v16.4s, v16.4s, #1 // (width + height) >> 1
- dup v17.4s, w8 // -ctz(width + height)
- add x6, x0, x1
-@@ -4789,15 +4829,17 @@ L(ipred_cfl_w32):
- dup v0.8h, v0.h[0]
- b L(ipred_cfl_splat_w16)
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_tbl):
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_h32)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_h16)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_h8)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_h4)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_w32)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_w16)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_w8)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_w4)
-+ .xword L(ipred_cfl_h32)
-+ .xword L(ipred_cfl_h16)
-+ .xword L(ipred_cfl_h8)
-+ .xword L(ipred_cfl_h4)
-+ .xword L(ipred_cfl_w32)
-+ .xword L(ipred_cfl_w16)
-+ .xword L(ipred_cfl_w8)
-+ .xword L(ipred_cfl_w4)
-+ .popsection
- endfunc
-
- // void cfl_ac_420_16bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -4806,14 +4848,14 @@ endfunc
- function ipred_cfl_ac_420_16bpc_neon, export=1
- clz w8, w5
- lsl w4, w4, #2
-- adr x7, L(ipred_cfl_ac_420_tbl)
-+ adrp x7, L(ipred_cfl_ac_420_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_420_tbl)
- sub w8, w8, #27
-- ldrh w8, [x7, w8, uxtw #1]
-+ ldr x7, [x7, w8, uxtw #3]
- movi v24.4s, #0
- movi v25.4s, #0
- movi v26.4s, #0
- movi v27.4s, #0
-- sub x7, x7, w8, uxtw
- sub w8, w6, w4 // height - h_pad
- rbit w9, w5 // rbit(width)
- rbit w10, w6 // rbit(height)
-@@ -4945,9 +4987,9 @@ L(ipred_cfl_ac_420_w8_hpad):
-
- L(ipred_cfl_ac_420_w16):
- AARCH64_VALID_JUMP_TARGET
-- adr x7, L(ipred_cfl_ac_420_w16_tbl)
-- ldrh w3, [x7, w3, uxtw #1]
-- sub x7, x7, w3, uxtw
-+ adrp x7, L(ipred_cfl_ac_420_w16_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_420_w16_tbl)
-+ ldr x7, [x7, w3, uxtw #3]
- br x7
-
- L(ipred_cfl_ac_420_w16_wpad0):
-@@ -5124,17 +5166,19 @@ L(ipred_cfl_ac_420_w16_hpad):
- lsl w6, w6, #2
- b L(ipred_cfl_ac_420_w4_calc_subtract_dc)
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_420_tbl):
-- .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w16)
-- .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w8)
-- .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w4)
-- .hword 0
-+ .xword L(ipred_cfl_ac_420_w16)
-+ .xword L(ipred_cfl_ac_420_w8)
-+ .xword L(ipred_cfl_ac_420_w4)
-+ .xword 0
-
- L(ipred_cfl_ac_420_w16_tbl):
-- .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad0)
-- .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad1)
-- .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad2)
-- .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad3)
-+ .xword L(ipred_cfl_ac_420_w16_wpad0)
-+ .xword L(ipred_cfl_ac_420_w16_wpad1)
-+ .xword L(ipred_cfl_ac_420_w16_wpad2)
-+ .xword L(ipred_cfl_ac_420_w16_wpad3)
-+ .popsection
- endfunc
-
- // void cfl_ac_422_16bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -5143,14 +5187,14 @@ endfunc
- function ipred_cfl_ac_422_16bpc_neon, export=1
- clz w8, w5
- lsl w4, w4, #2
-- adr x7, L(ipred_cfl_ac_422_tbl)
-+ adrp x7, L(ipred_cfl_ac_422_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_422_tbl)
- sub w8, w8, #27
-- ldrh w8, [x7, w8, uxtw #1]
-+ ldr x7, [x7, w8, uxtw #3]
- movi v24.4s, #0
- movi v25.4s, #0
- movi v26.4s, #0
- movi v27.4s, #0
-- sub x7, x7, w8, uxtw
- sub w8, w6, w4 // height - h_pad
- rbit w9, w5 // rbit(width)
- rbit w10, w6 // rbit(height)
-@@ -5251,9 +5295,9 @@ L(ipred_cfl_ac_422_w8_wpad):
-
- L(ipred_cfl_ac_422_w16):
- AARCH64_VALID_JUMP_TARGET
-- adr x7, L(ipred_cfl_ac_422_w16_tbl)
-- ldrh w3, [x7, w3, uxtw #1]
-- sub x7, x7, w3, uxtw
-+ adrp x7, L(ipred_cfl_ac_422_w16_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_422_w16_tbl)
-+ ldr x7, [x7, w3, uxtw #3]
- br x7
-
- L(ipred_cfl_ac_422_w16_wpad0):
-@@ -5372,17 +5416,19 @@ L(ipred_cfl_ac_422_w16_wpad3):
- mov v1.16b, v3.16b
- b L(ipred_cfl_ac_420_w16_hpad)
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_422_tbl):
-- .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w16)
-- .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w8)
-- .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w4)
-- .hword 0
-+ .xword L(ipred_cfl_ac_422_w16)
-+ .xword L(ipred_cfl_ac_422_w8)
-+ .xword L(ipred_cfl_ac_422_w4)
-+ .xword 0
-
- L(ipred_cfl_ac_422_w16_tbl):
-- .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad0)
-- .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad1)
-- .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad2)
-- .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad3)
-+ .xword L(ipred_cfl_ac_422_w16_wpad0)
-+ .xword L(ipred_cfl_ac_422_w16_wpad1)
-+ .xword L(ipred_cfl_ac_422_w16_wpad2)
-+ .xword L(ipred_cfl_ac_422_w16_wpad3)
-+ .popsection
- endfunc
-
- // void cfl_ac_444_16bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -5391,14 +5437,14 @@ endfunc
- function ipred_cfl_ac_444_16bpc_neon, export=1
- clz w8, w5
- lsl w4, w4, #2
-- adr x7, L(ipred_cfl_ac_444_tbl)
-+ adrp x7, L(ipred_cfl_ac_444_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_444_tbl)
- sub w8, w8, #26
-- ldrh w8, [x7, w8, uxtw #1]
-+ ldr x7, [x7, w8, uxtw #3]
- movi v24.4s, #0
- movi v25.4s, #0
- movi v26.4s, #0
- movi v27.4s, #0
-- sub x7, x7, w8, uxtw
- sub w8, w6, w4 // height - h_pad
- rbit w9, w5 // rbit(width)
- rbit w10, w6 // rbit(height)
-@@ -5507,10 +5553,11 @@ L(ipred_cfl_ac_444_w16_wpad):
-
- L(ipred_cfl_ac_444_w32):
- AARCH64_VALID_JUMP_TARGET
-- adr x7, L(ipred_cfl_ac_444_w32_tbl)
-- ldrh w3, [x7, w3, uxtw] // (w3>>1) << 1
-+ adrp x7, L(ipred_cfl_ac_444_w32_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_444_w32_tbl)
-+ lsr w3, w3, #1
-+ ldr x7, [x7, w3, uxtw #3] // (w3>>1) << 3
- lsr x2, x2, #1 // Restore the stride to one line increments
-- sub x7, x7, w3, uxtw
- br x7
-
- L(ipred_cfl_ac_444_w32_wpad0):
-@@ -5625,15 +5672,17 @@ L(ipred_cfl_ac_444_w32_hpad):
- lsl w6, w6, #3
- b L(ipred_cfl_ac_420_w4_calc_subtract_dc)
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_444_tbl):
-- .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w32)
-- .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w16)
-- .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w8)
-- .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w4)
-+ .xword L(ipred_cfl_ac_444_w32)
-+ .xword L(ipred_cfl_ac_444_w16)
-+ .xword L(ipred_cfl_ac_444_w8)
-+ .xword L(ipred_cfl_ac_444_w4)
-
- L(ipred_cfl_ac_444_w32_tbl):
-- .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad0)
-- .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad2)
-- .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad4)
-- .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad6)
-+ .xword L(ipred_cfl_ac_444_w32_wpad0)
-+ .xword L(ipred_cfl_ac_444_w32_wpad2)
-+ .xword L(ipred_cfl_ac_444_w32_wpad4)
-+ .xword L(ipred_cfl_ac_444_w32_wpad6)
-+ .popsection
- endfunc
Index: patches/patch-src_arm_64_ipred_S
===================================================================
RCS file: patches/patch-src_arm_64_ipred_S
diff -N patches/patch-src_arm_64_ipred_S
--- patches/patch-src_arm_64_ipred_S 13 Jul 2023 12:26:14 -0000 1.4
+++ /dev/null 1 Jan 1970 00:00:00 -0000
@@ -1,972 +0,0 @@
-Index: src/arm/64/ipred.S
---- src/arm/64/ipred.S.orig
-+++ src/arm/64/ipred.S
-@@ -34,11 +34,11 @@
- // const int max_width, const int max_height);
- function ipred_dc_128_8bpc_neon, export=1
- clz w3, w3
-- adr x5, L(ipred_dc_128_tbl)
-+ adrp x5, L(ipred_dc_128_tbl)
-+ add x5, x5, :lo12: L(ipred_dc_128_tbl)
- sub w3, w3, #25
-- ldrh w3, [x5, w3, uxtw #1]
-+ ldr x5, [x5, w3, uxtw #3]
- movi v0.16b, #128
-- sub x5, x5, w3, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -94,12 +94,14 @@ function ipred_dc_128_8bpc_neon, export=1
- b.gt 64b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_dc_128_tbl):
-- .hword L(ipred_dc_128_tbl) - 640b
-- .hword L(ipred_dc_128_tbl) - 320b
-- .hword L(ipred_dc_128_tbl) - 16b
-- .hword L(ipred_dc_128_tbl) - 8b
-- .hword L(ipred_dc_128_tbl) - 4b
-+ .xword 640b
-+ .xword 320b
-+ .xword 16b
-+ .xword 8b
-+ .xword 4b
-+ .popsection
- endfunc
-
- // void ipred_v_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -108,11 +110,11 @@ endfunc
- // const int max_width, const int max_height);
- function ipred_v_8bpc_neon, export=1
- clz w3, w3
-- adr x5, L(ipred_v_tbl)
-+ adrp x5, L(ipred_v_tbl)
-+ add x5, x5, :lo12: L(ipred_v_tbl)
- sub w3, w3, #25
-- ldrh w3, [x5, w3, uxtw #1]
-+ ldr x5, [x5, w3, uxtw #3]
- add x2, x2, #1
-- sub x5, x5, w3, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -172,12 +174,14 @@ function ipred_v_8bpc_neon, export=1
- b.gt 64b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_v_tbl):
-- .hword L(ipred_v_tbl) - 640b
-- .hword L(ipred_v_tbl) - 320b
-- .hword L(ipred_v_tbl) - 160b
-- .hword L(ipred_v_tbl) - 80b
-- .hword L(ipred_v_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void ipred_h_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -186,11 +190,11 @@ endfunc
- // const int max_width, const int max_height);
- function ipred_h_8bpc_neon, export=1
- clz w3, w3
-- adr x5, L(ipred_h_tbl)
-+ adrp x5, L(ipred_h_tbl)
-+ add x5, x5, :lo12: L(ipred_h_tbl)
- sub w3, w3, #25
-- ldrh w3, [x5, w3, uxtw #1]
-+ ldr x5, [x5, w3, uxtw #3]
- sub x2, x2, #4
-- sub x5, x5, w3, uxtw
- mov x7, #-4
- add x6, x0, x1
- lsl x1, x1, #1
-@@ -258,12 +262,14 @@ function ipred_h_8bpc_neon, export=1
- b.gt 64b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_h_tbl):
-- .hword L(ipred_h_tbl) - 64b
-- .hword L(ipred_h_tbl) - 32b
-- .hword L(ipred_h_tbl) - 16b
-- .hword L(ipred_h_tbl) - 8b
-- .hword L(ipred_h_tbl) - 4b
-+ .xword 64b
-+ .xword 32b
-+ .xword 16b
-+ .xword 8b
-+ .xword 4b
-+ .popsection
- endfunc
-
- // void ipred_dc_top_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -272,11 +278,11 @@ endfunc
- // const int max_width, const int max_height);
- function ipred_dc_top_8bpc_neon, export=1
- clz w3, w3
-- adr x5, L(ipred_dc_top_tbl)
-+ adrp x5, L(ipred_dc_top_tbl)
-+ add x5, x5, :lo12: L(ipred_dc_top_tbl)
- sub w3, w3, #25
-- ldrh w3, [x5, w3, uxtw #1]
-+ ldr x5, [x5, w3, uxtw #3]
- add x2, x2, #1
-- sub x5, x5, w3, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -363,12 +369,14 @@ function ipred_dc_top_8bpc_neon, export=1
- b.gt 64b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_dc_top_tbl):
-- .hword L(ipred_dc_top_tbl) - 640b
-- .hword L(ipred_dc_top_tbl) - 320b
-- .hword L(ipred_dc_top_tbl) - 160b
-- .hword L(ipred_dc_top_tbl) - 80b
-- .hword L(ipred_dc_top_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void ipred_dc_left_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -379,13 +387,12 @@ function ipred_dc_left_8bpc_neon, export=1
- sub x2, x2, w4, uxtw
- clz w3, w3
- clz w7, w4
-- adr x5, L(ipred_dc_left_tbl)
-+ adrp x5, L(ipred_dc_left_tbl)
-+ add x5, x5, :lo12: L(ipred_dc_left_tbl)
- sub w3, w3, #20 // 25 leading bits, minus table offset 5
- sub w7, w7, #25
-- ldrh w3, [x5, w3, uxtw #1]
-- ldrh w7, [x5, w7, uxtw #1]
-- sub x3, x5, w3, uxtw
-- sub x5, x5, w7, uxtw
-+ ldr x3, [x5, w3, uxtw #3]
-+ ldr x5, [x5, w7, uxtw #3]
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -489,17 +496,19 @@ L(ipred_dc_left_w64):
- b.gt 1b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_dc_left_tbl):
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h64)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h32)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h16)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h8)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h4)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w64)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w32)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w16)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w8)
-- .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w4)
-+ .xword L(ipred_dc_left_h64)
-+ .xword L(ipred_dc_left_h32)
-+ .xword L(ipred_dc_left_h16)
-+ .xword L(ipred_dc_left_h8)
-+ .xword L(ipred_dc_left_h4)
-+ .xword L(ipred_dc_left_w64)
-+ .xword L(ipred_dc_left_w32)
-+ .xword L(ipred_dc_left_w16)
-+ .xword L(ipred_dc_left_w8)
-+ .xword L(ipred_dc_left_w4)
-+ .popsection
- endfunc
-
- // void ipred_dc_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -512,16 +521,15 @@ function ipred_dc_8bpc_neon, export=1
- clz w3, w3
- clz w6, w4
- dup v16.8h, w7 // width + height
-- adr x5, L(ipred_dc_tbl)
-+ adrp x5, L(ipred_dc_tbl)
-+ add x5, x5, :lo12: L(ipred_dc_tbl)
- rbit w7, w7 // rbit(width + height)
- sub w3, w3, #20 // 25 leading bits, minus table offset 5
- sub w6, w6, #25
- clz w7, w7 // ctz(width + height)
-- ldrh w3, [x5, w3, uxtw #1]
-- ldrh w6, [x5, w6, uxtw #1]
-+ ldr x3, [x5, w3, uxtw #3]
-+ ldr x5, [x5, w6, uxtw #3]
- neg w7, w7 // -ctz(width + height)
-- sub x3, x5, w3, uxtw
-- sub x5, x5, w6, uxtw
- ushr v16.8h, v16.8h, #1 // (width + height) >> 1
- dup v17.8h, w7 // -ctz(width + height)
- add x6, x0, x1
-@@ -714,17 +722,19 @@ L(ipred_dc_w64):
- b.gt 2b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_dc_tbl):
-- .hword L(ipred_dc_tbl) - L(ipred_dc_h64)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_h32)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_h16)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_h8)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_h4)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_w64)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_w32)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_w16)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_w8)
-- .hword L(ipred_dc_tbl) - L(ipred_dc_w4)
-+ .xword L(ipred_dc_h64)
-+ .xword L(ipred_dc_h32)
-+ .xword L(ipred_dc_h16)
-+ .xword L(ipred_dc_h8)
-+ .xword L(ipred_dc_h4)
-+ .xword L(ipred_dc_w64)
-+ .xword L(ipred_dc_w32)
-+ .xword L(ipred_dc_w16)
-+ .xword L(ipred_dc_w8)
-+ .xword L(ipred_dc_w4)
-+ .popsection
- endfunc
-
- // void ipred_paeth_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -733,13 +743,13 @@ endfunc
- // const int max_width, const int max_height);
- function ipred_paeth_8bpc_neon, export=1
- clz w9, w3
-- adr x5, L(ipred_paeth_tbl)
-+ adrp x5, L(ipred_paeth_tbl)
-+ add x5, x5, :lo12: L(ipred_paeth_tbl)
- sub w9, w9, #25
-- ldrh w9, [x5, w9, uxtw #1]
-+ ldr x5, [x5, w9, uxtw #3]
- ld1r {v4.16b}, [x2]
- add x8, x2, #1
- sub x2, x2, #4
-- sub x5, x5, w9, uxtw
- mov x7, #-4
- add x6, x0, x1
- lsl x1, x1, #1
-@@ -899,12 +909,14 @@ function ipred_paeth_8bpc_neon, export=1
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_paeth_tbl):
-- .hword L(ipred_paeth_tbl) - 640b
-- .hword L(ipred_paeth_tbl) - 320b
-- .hword L(ipred_paeth_tbl) - 160b
-- .hword L(ipred_paeth_tbl) - 80b
-- .hword L(ipred_paeth_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void ipred_smooth_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -916,13 +928,13 @@ function ipred_smooth_8bpc_neon, export=1
- add x11, x10, w4, uxtw
- add x10, x10, w3, uxtw
- clz w9, w3
-- adr x5, L(ipred_smooth_tbl)
-+ adrp x5, L(ipred_smooth_tbl)
-+ add x5, x5, :lo12: L(ipred_smooth_tbl)
- sub x12, x2, w4, uxtw
- sub w9, w9, #25
-- ldrh w9, [x5, w9, uxtw #1]
-+ ldr x5, [x5, w9, uxtw #3]
- ld1r {v4.16b}, [x12] // bottom
- add x8, x2, #1
-- sub x5, x5, w9, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -1080,12 +1092,14 @@ function ipred_smooth_8bpc_neon, export=1
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_smooth_tbl):
-- .hword L(ipred_smooth_tbl) - 640b
-- .hword L(ipred_smooth_tbl) - 320b
-- .hword L(ipred_smooth_tbl) - 160b
-- .hword L(ipred_smooth_tbl) - 80b
-- .hword L(ipred_smooth_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void ipred_smooth_v_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -1096,13 +1110,13 @@ function ipred_smooth_v_8bpc_neon, export=1
- movrel x7, X(sm_weights)
- add x7, x7, w4, uxtw
- clz w9, w3
-- adr x5, L(ipred_smooth_v_tbl)
-+ adrp x5, L(ipred_smooth_v_tbl)
-+ add x5, x5, :lo12: L(ipred_smooth_v_tbl)
- sub x8, x2, w4, uxtw
- sub w9, w9, #25
-- ldrh w9, [x5, w9, uxtw #1]
-+ ldr x5, [x5, w9, uxtw #3]
- ld1r {v4.16b}, [x8] // bottom
- add x2, x2, #1
-- sub x5, x5, w9, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -1221,12 +1235,14 @@ function ipred_smooth_v_8bpc_neon, export=1
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_smooth_v_tbl):
-- .hword L(ipred_smooth_v_tbl) - 640b
-- .hword L(ipred_smooth_v_tbl) - 320b
-- .hword L(ipred_smooth_v_tbl) - 160b
-- .hword L(ipred_smooth_v_tbl) - 80b
-- .hword L(ipred_smooth_v_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void ipred_smooth_h_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -1237,12 +1253,12 @@ function ipred_smooth_h_8bpc_neon, export=1
- movrel x8, X(sm_weights)
- add x8, x8, w3, uxtw
- clz w9, w3
-- adr x5, L(ipred_smooth_h_tbl)
-+ adrp x5, L(ipred_smooth_h_tbl)
-+ add x5, x5, :lo12: L(ipred_smooth_h_tbl)
- add x12, x2, w3, uxtw
- sub w9, w9, #25
-- ldrh w9, [x5, w9, uxtw #1]
-+ ldr x5, [x5, w9, uxtw #3]
- ld1r {v5.16b}, [x12] // right
-- sub x5, x5, w9, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x5
-@@ -1367,12 +1383,14 @@ function ipred_smooth_h_8bpc_neon, export=1
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_smooth_h_tbl):
-- .hword L(ipred_smooth_h_tbl) - 640b
-- .hword L(ipred_smooth_h_tbl) - 320b
-- .hword L(ipred_smooth_h_tbl) - 160b
-- .hword L(ipred_smooth_h_tbl) - 80b
-- .hword L(ipred_smooth_h_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- const padding_mask_buf
-@@ -1653,11 +1671,11 @@ endfunc
- // const int dx, const int max_base_x);
- function ipred_z1_fill1_8bpc_neon, export=1
- clz w9, w3
-- adr x8, L(ipred_z1_fill1_tbl)
-+ adrp x8, L(ipred_z1_fill1_tbl)
-+ add x8, x8, :lo12: L(ipred_z1_fill1_tbl)
- sub w9, w9, #25
-- ldrh w9, [x8, w9, uxtw #1]
-+ ldr x8, [x8, w9, uxtw #3]
- add x10, x2, w6, uxtw // top[max_base_x]
-- sub x8, x8, w9, uxtw
- ld1r {v31.16b}, [x10] // padding
- mov w7, w5
- mov w15, #64
-@@ -1816,12 +1834,14 @@ function ipred_z1_fill1_8bpc_neon, export=1
- mov w3, w12
- b 169b
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_z1_fill1_tbl):
-- .hword L(ipred_z1_fill1_tbl) - 640b
-- .hword L(ipred_z1_fill1_tbl) - 320b
-- .hword L(ipred_z1_fill1_tbl) - 160b
-- .hword L(ipred_z1_fill1_tbl) - 80b
-- .hword L(ipred_z1_fill1_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- function ipred_z1_fill2_8bpc_neon, export=1
-@@ -1940,11 +1960,11 @@ endconst
- // const int dx, const int dy);
- function ipred_z2_fill1_8bpc_neon, export=1
- clz w10, w4
-- adr x9, L(ipred_z2_fill1_tbl)
-+ adrp x9, L(ipred_z2_fill1_tbl)
-+ add x9, x9, :lo12: L(ipred_z2_fill1_tbl)
- sub w10, w10, #25
-- ldrh w10, [x9, w10, uxtw #1]
-+ ldr x9, [x9, w10, uxtw #3]
- mov w8, #(1 << 6) // xpos = 1 << 6
-- sub x9, x9, w10, uxtw
- sub w8, w8, w6 // xpos -= dx
-
- movrel x11, increments
-@@ -2651,12 +2671,14 @@ function ipred_z2_fill1_8bpc_neon, export=1
- ldp d8, d9, [sp], 0x40
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_z2_fill1_tbl):
-- .hword L(ipred_z2_fill1_tbl) - 640b
-- .hword L(ipred_z2_fill1_tbl) - 320b
-- .hword L(ipred_z2_fill1_tbl) - 160b
-- .hword L(ipred_z2_fill1_tbl) - 80b
-- .hword L(ipred_z2_fill1_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- function ipred_z2_fill2_8bpc_neon, export=1
-@@ -3160,11 +3182,11 @@ endfunc
- function ipred_z3_fill1_8bpc_neon, export=1
- cmp w6, #64
- clz w9, w3
-- adr x8, L(ipred_z3_fill1_tbl)
-+ adrp x8, L(ipred_z3_fill1_tbl)
-+ add x8, x8, :lo12: L(ipred_z3_fill1_tbl)
- sub w9, w9, #25
-- ldrh w9, [x8, w9, uxtw #1]
-+ ldr x8, [x8, w9, uxtw #3]
- add x10, x2, w6, uxtw // left[max_base_y]
-- sub x8, x8, w9, uxtw
- movrel x11, increments
- ld1r {v31.16b}, [x10] // padding
- ld1 {v30.8h}, [x11] // increments
-@@ -3503,17 +3525,20 @@ L(ipred_z3_fill1_large_h16):
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_z3_fill1_tbl):
-- .hword L(ipred_z3_fill1_tbl) - 640b
-- .hword L(ipred_z3_fill1_tbl) - 320b
-- .hword L(ipred_z3_fill1_tbl) - 160b
-- .hword L(ipred_z3_fill1_tbl) - 80b
-- .hword L(ipred_z3_fill1_tbl) - 40b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- function ipred_z3_fill_padding_neon, export=0
- cmp w3, #16
-- adr x8, L(ipred_z3_fill_padding_tbl)
-+ adrp x8, L(ipred_z3_fill_padding_tbl)
-+ add x8, x8, :lo12: L(ipred_z3_fill_padding_tbl)
- b.gt L(ipred_z3_fill_padding_wide)
- // w3 = remaining width, w4 = constant height
- mov w12, w4
-@@ -3524,10 +3549,11 @@ function ipred_z3_fill_padding_neon, export=0
- // power of two in the remaining width, and repeating.
- clz w9, w3
- sub w9, w9, #25
-- ldrh w9, [x8, w9, uxtw #1]
-- sub x9, x8, w9, uxtw
-+ ldr x9, [x8, w9, uxtw #3]
- br x9
-
-+20:
-+ AARCH64_VALID_JUMP_TARGET
- 2:
- st1 {v31.h}[0], [x0], x1
- subs w4, w4, #4
-@@ -3546,6 +3572,8 @@ function ipred_z3_fill_padding_neon, export=0
- mov w4, w12
- b 1b
-
-+40:
-+ AARCH64_VALID_JUMP_TARGET
- 4:
- st1 {v31.s}[0], [x0], x1
- subs w4, w4, #4
-@@ -3564,7 +3592,8 @@ function ipred_z3_fill_padding_neon, export=0
- mov w4, w12
- b 1b
-
--8:
-+80:
-+ AARCH64_VALID_JUMP_TARGET
- st1 {v31.8b}, [x0], x1
- subs w4, w4, #4
- st1 {v31.8b}, [x13], x1
-@@ -3582,9 +3611,10 @@ function ipred_z3_fill_padding_neon, export=0
- mov w4, w12
- b 1b
-
--16:
--32:
--64:
-+160:
-+320:
-+640:
-+ AARCH64_VALID_JUMP_TARGET
- st1 {v31.16b}, [x0], x1
- subs w4, w4, #4
- st1 {v31.16b}, [x13], x1
-@@ -3605,13 +3635,15 @@ function ipred_z3_fill_padding_neon, export=0
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_z3_fill_padding_tbl):
-- .hword L(ipred_z3_fill_padding_tbl) - 64b
-- .hword L(ipred_z3_fill_padding_tbl) - 32b
-- .hword L(ipred_z3_fill_padding_tbl) - 16b
-- .hword L(ipred_z3_fill_padding_tbl) - 8b
-- .hword L(ipred_z3_fill_padding_tbl) - 4b
-- .hword L(ipred_z3_fill_padding_tbl) - 2b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .popsection
-
- L(ipred_z3_fill_padding_wide):
- // Fill a WxH rectangle with padding, with W > 16.
-@@ -3766,13 +3798,13 @@ function ipred_filter_8bpc_neon, export=1
- add x6, x6, w5, uxtw
- ld1 {v16.8b, v17.8b, v18.8b, v19.8b}, [x6], #32
- clz w9, w3
-- adr x5, L(ipred_filter_tbl)
-+ adrp x5, L(ipred_filter_tbl)
-+ add x5, x5, :lo12: L(ipred_filter_tbl)
- ld1 {v20.8b, v21.8b, v22.8b}, [x6]
- sub w9, w9, #26
-- ldrh w9, [x5, w9, uxtw #1]
-+ ldr x5, [x5, w9, uxtw #3]
- sxtl v16.8h, v16.8b
- sxtl v17.8h, v17.8b
-- sub x5, x5, w9, uxtw
- sxtl v18.8h, v18.8b
- sxtl v19.8h, v19.8b
- add x6, x0, x1
-@@ -3913,11 +3945,13 @@ function ipred_filter_8bpc_neon, export=1
- 9:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_filter_tbl):
-- .hword L(ipred_filter_tbl) - 320b
-- .hword L(ipred_filter_tbl) - 160b
-- .hword L(ipred_filter_tbl) - 80b
-- .hword L(ipred_filter_tbl) - 40b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- // void pal_pred_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -3926,11 +3960,11 @@ endfunc
- function pal_pred_8bpc_neon, export=1
- ld1 {v0.8h}, [x2]
- clz w9, w4
-- adr x6, L(pal_pred_tbl)
-+ adrp x6, L(pal_pred_tbl)
-+ add x6, x6, :lo12: L(pal_pred_tbl)
- sub w9, w9, #25
-- ldrh w9, [x6, w9, uxtw #1]
-+ ldr x6, [x6, w9, uxtw #3]
- xtn v0.8b, v0.8h
-- sub x6, x6, w9, uxtw
- add x2, x0, x1
- lsl x1, x1, #1
- br x6
-@@ -4008,12 +4042,14 @@ function pal_pred_8bpc_neon, export=1
- b.gt 64b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(pal_pred_tbl):
-- .hword L(pal_pred_tbl) - 64b
-- .hword L(pal_pred_tbl) - 32b
-- .hword L(pal_pred_tbl) - 16b
-- .hword L(pal_pred_tbl) - 8b
-- .hword L(pal_pred_tbl) - 4b
-+ .xword 64b
-+ .xword 32b
-+ .xword 16b
-+ .xword 8b
-+ .xword 4b
-+ .popsection
- endfunc
-
- // void ipred_cfl_128_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4022,12 +4058,12 @@ endfunc
- // const int16_t *ac, const int alpha);
- function ipred_cfl_128_8bpc_neon, export=1
- clz w9, w3
-- adr x7, L(ipred_cfl_128_tbl)
-+ adrp x7, L(ipred_cfl_128_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_128_tbl)
- sub w9, w9, #26
-- ldrh w9, [x7, w9, uxtw #1]
-+ ldr x7, [x7, w9, uxtw #3]
- movi v0.8h, #128 // dc
- dup v1.8h, w6 // alpha
-- sub x7, x7, w9, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x7
-@@ -4132,12 +4168,14 @@ L(ipred_cfl_splat_w16):
- b.gt 1b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_128_tbl):
- L(ipred_cfl_splat_tbl):
-- .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w16)
-- .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w16)
-- .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w8)
-- .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w4)
-+ .xword L(ipred_cfl_splat_w16)
-+ .xword L(ipred_cfl_splat_w16)
-+ .xword L(ipred_cfl_splat_w8)
-+ .xword L(ipred_cfl_splat_w4)
-+ .popsection
- endfunc
-
- // void ipred_cfl_top_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4146,12 +4184,12 @@ endfunc
- // const int16_t *ac, const int alpha);
- function ipred_cfl_top_8bpc_neon, export=1
- clz w9, w3
-- adr x7, L(ipred_cfl_top_tbl)
-+ adrp x7, L(ipred_cfl_top_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_top_tbl)
- sub w9, w9, #26
-- ldrh w9, [x7, w9, uxtw #1]
-+ ldr x7, [x7, w9, uxtw #3]
- dup v1.8h, w6 // alpha
- add x2, x2, #1
-- sub x7, x7, w9, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x7
-@@ -4186,11 +4224,13 @@ function ipred_cfl_top_8bpc_neon, export=1
- dup v0.8h, v2.h[0]
- b L(ipred_cfl_splat_w16)
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_top_tbl):
-- .hword L(ipred_cfl_top_tbl) - 32b
-- .hword L(ipred_cfl_top_tbl) - 16b
-- .hword L(ipred_cfl_top_tbl) - 8b
-- .hword L(ipred_cfl_top_tbl) - 4b
-+ .xword 32b
-+ .xword 16b
-+ .xword 8b
-+ .xword 4b
-+ .popsection
- endfunc
-
- // void ipred_cfl_left_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4201,15 +4241,15 @@ function ipred_cfl_left_8bpc_neon, export=1
- sub x2, x2, w4, uxtw
- clz w9, w3
- clz w8, w4
-- adr x10, L(ipred_cfl_splat_tbl)
-- adr x7, L(ipred_cfl_left_tbl)
-+ adrp x10, L(ipred_cfl_splat_tbl)
-+ add x10, x10, :lo12: L(ipred_cfl_splat_tbl)
-+ adrp x7, L(ipred_cfl_left_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_left_tbl)
- sub w9, w9, #26
- sub w8, w8, #26
-- ldrh w9, [x10, w9, uxtw #1]
-- ldrh w8, [x7, w8, uxtw #1]
-+ ldr x9, [x10, w9, uxtw #3]
-+ ldr x7, [x7, w8, uxtw #3]
- dup v1.8h, w6 // alpha
-- sub x9, x10, w9, uxtw
-- sub x7, x7, w8, uxtw
- add x6, x0, x1
- lsl x1, x1, #1
- br x7
-@@ -4248,11 +4288,13 @@ L(ipred_cfl_left_h32):
- dup v0.8h, v2.h[0]
- br x9
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_left_tbl):
-- .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h32)
-- .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h16)
-- .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h8)
-- .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h4)
-+ .xword L(ipred_cfl_left_h32)
-+ .xword L(ipred_cfl_left_h16)
-+ .xword L(ipred_cfl_left_h8)
-+ .xword L(ipred_cfl_left_h4)
-+ .popsection
- endfunc
-
- // void ipred_cfl_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4266,16 +4308,15 @@ function ipred_cfl_8bpc_neon, export=1
- clz w9, w3
- clz w6, w4
- dup v16.8h, w8 // width + height
-- adr x7, L(ipred_cfl_tbl)
-+ adrp x7, L(ipred_cfl_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_tbl)
- rbit w8, w8 // rbit(width + height)
- sub w9, w9, #22 // 26 leading bits, minus table offset 4
- sub w6, w6, #26
- clz w8, w8 // ctz(width + height)
-- ldrh w9, [x7, w9, uxtw #1]
-- ldrh w6, [x7, w6, uxtw #1]
-+ ldr x9, [x7, w9, uxtw #3]
-+ ldr x7, [x7, w6, uxtw #3]
- neg w8, w8 // -ctz(width + height)
-- sub x9, x7, w9, uxtw
-- sub x7, x7, w6, uxtw
- ushr v16.8h, v16.8h, #1 // (width + height) >> 1
- dup v17.8h, w8 // -ctz(width + height)
- add x6, x0, x1
-@@ -4392,15 +4433,17 @@ L(ipred_cfl_w32):
- dup v0.8h, v0.h[0]
- b L(ipred_cfl_splat_w16)
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_tbl):
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_h32)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_h16)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_h8)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_h4)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_w32)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_w16)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_w8)
-- .hword L(ipred_cfl_tbl) - L(ipred_cfl_w4)
-+ .xword L(ipred_cfl_h32)
-+ .xword L(ipred_cfl_h16)
-+ .xword L(ipred_cfl_h8)
-+ .xword L(ipred_cfl_h4)
-+ .xword L(ipred_cfl_w32)
-+ .xword L(ipred_cfl_w16)
-+ .xword L(ipred_cfl_w8)
-+ .xword L(ipred_cfl_w4)
-+ .popsection
- endfunc
-
- // void cfl_ac_420_8bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -4409,14 +4452,14 @@ endfunc
- function ipred_cfl_ac_420_8bpc_neon, export=1
- clz w8, w5
- lsl w4, w4, #2
-- adr x7, L(ipred_cfl_ac_420_tbl)
-+ adrp x7, L(ipred_cfl_ac_420_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_420_tbl)
- sub w8, w8, #27
-- ldrh w8, [x7, w8, uxtw #1]
-+ ldr x7, [x7, w8, uxtw #3]
- movi v16.8h, #0
- movi v17.8h, #0
- movi v18.8h, #0
- movi v19.8h, #0
-- sub x7, x7, w8, uxtw
- sub w8, w6, w4 // height - h_pad
- rbit w9, w5 // rbit(width)
- rbit w10, w6 // rbit(height)
-@@ -4555,9 +4598,9 @@ L(ipred_cfl_ac_420_w8_subtract_dc):
-
- L(ipred_cfl_ac_420_w16):
- AARCH64_VALID_JUMP_TARGET
-- adr x7, L(ipred_cfl_ac_420_w16_tbl)
-- ldrh w3, [x7, w3, uxtw #1]
-- sub x7, x7, w3, uxtw
-+ adrp x7, L(ipred_cfl_ac_420_w16_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_420_w16_tbl)
-+ ldr x7, [x7, w3, uxtw #3]
- br x7
-
- L(ipred_cfl_ac_420_w16_wpad0):
-@@ -4714,17 +4757,19 @@ L(ipred_cfl_ac_420_w16_hpad):
- lsl w6, w6, #1
- b L(ipred_cfl_ac_420_w8_calc_subtract_dc)
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_420_tbl):
-- .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w16)
-- .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w8)
-- .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w4)
-- .hword 0
-+ .xword L(ipred_cfl_ac_420_w16)
-+ .xword L(ipred_cfl_ac_420_w8)
-+ .xword L(ipred_cfl_ac_420_w4)
-+ .xword 0
-
- L(ipred_cfl_ac_420_w16_tbl):
-- .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad0)
-- .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad1)
-- .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad2)
-- .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad3)
-+ .xword L(ipred_cfl_ac_420_w16_wpad0)
-+ .xword L(ipred_cfl_ac_420_w16_wpad1)
-+ .xword L(ipred_cfl_ac_420_w16_wpad2)
-+ .xword L(ipred_cfl_ac_420_w16_wpad3)
-+ .popsection
- endfunc
-
- // void cfl_ac_422_8bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -4733,14 +4778,14 @@ endfunc
- function ipred_cfl_ac_422_8bpc_neon, export=1
- clz w8, w5
- lsl w4, w4, #2
-- adr x7, L(ipred_cfl_ac_422_tbl)
-+ adrp x7, L(ipred_cfl_ac_422_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_422_tbl)
- sub w8, w8, #27
-- ldrh w8, [x7, w8, uxtw #1]
-+ ldr x7, [x7, w8, uxtw #3]
- movi v16.8h, #0
- movi v17.8h, #0
- movi v18.8h, #0
- movi v19.8h, #0
-- sub x7, x7, w8, uxtw
- sub w8, w6, w4 // height - h_pad
- rbit w9, w5 // rbit(width)
- rbit w10, w6 // rbit(height)
-@@ -4831,9 +4876,9 @@ L(ipred_cfl_ac_422_w8_wpad):
-
- L(ipred_cfl_ac_422_w16):
- AARCH64_VALID_JUMP_TARGET
-- adr x7, L(ipred_cfl_ac_422_w16_tbl)
-- ldrh w3, [x7, w3, uxtw #1]
-- sub x7, x7, w3, uxtw
-+ adrp x7, L(ipred_cfl_ac_422_w16_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_422_w16_tbl)
-+ ldr x7, [x7, w3, uxtw #3]
- br x7
-
- L(ipred_cfl_ac_422_w16_wpad0):
-@@ -4936,17 +4981,19 @@ L(ipred_cfl_ac_422_w16_wpad3):
- mov v1.16b, v3.16b
- b L(ipred_cfl_ac_420_w16_hpad)
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_422_tbl):
-- .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w16)
-- .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w8)
-- .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w4)
-- .hword 0
-+ .xword L(ipred_cfl_ac_422_w16)
-+ .xword L(ipred_cfl_ac_422_w8)
-+ .xword L(ipred_cfl_ac_422_w4)
-+ .xword 0
-
- L(ipred_cfl_ac_422_w16_tbl):
-- .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad0)
-- .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad1)
-- .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad2)
-- .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad3)
-+ .xword L(ipred_cfl_ac_422_w16_wpad0)
-+ .xword L(ipred_cfl_ac_422_w16_wpad1)
-+ .xword L(ipred_cfl_ac_422_w16_wpad2)
-+ .xword L(ipred_cfl_ac_422_w16_wpad3)
-+ .popsection
- endfunc
-
- // void cfl_ac_444_8bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -4955,14 +5002,14 @@ endfunc
- function ipred_cfl_ac_444_8bpc_neon, export=1
- clz w8, w5
- lsl w4, w4, #2
-- adr x7, L(ipred_cfl_ac_444_tbl)
-+ adrp x7, L(ipred_cfl_ac_444_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_444_tbl)
- sub w8, w8, #26
-- ldrh w8, [x7, w8, uxtw #1]
-+ ldr x7, [x7, w8, uxtw #3]
- movi v16.8h, #0
- movi v17.8h, #0
- movi v18.8h, #0
- movi v19.8h, #0
-- sub x7, x7, w8, uxtw
- sub w8, w6, w4 // height - h_pad
- rbit w9, w5 // rbit(width)
- rbit w10, w6 // rbit(height)
-@@ -5083,9 +5130,10 @@ L(ipred_cfl_ac_444_w16_wpad):
-
- L(ipred_cfl_ac_444_w32):
- AARCH64_VALID_JUMP_TARGET
-- adr x7, L(ipred_cfl_ac_444_w32_tbl)
-- ldrh w3, [x7, w3, uxtw] // (w3>>1) << 1
-- sub x7, x7, w3, uxtw
-+ adrp x7, L(ipred_cfl_ac_444_w32_tbl)
-+ add x7, x7, :lo12: L(ipred_cfl_ac_444_w32_tbl)
-+ lsr w3, w3, #1
-+ ldr x7, [x7, w3, uxtw #3] // (w3>>1) << 3
- br x7
-
- L(ipred_cfl_ac_444_w32_wpad0):
-@@ -5231,15 +5279,17 @@ L(ipred_cfl_ac_444_w32_hpad):
- dup v4.8h, v4.h[0]
- b L(ipred_cfl_ac_420_w8_subtract_dc)
-
-+ .pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_444_tbl):
-- .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w32)
-- .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w16)
-- .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w8)
-- .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w4)
-+ .xword L(ipred_cfl_ac_444_w32)
-+ .xword L(ipred_cfl_ac_444_w16)
-+ .xword L(ipred_cfl_ac_444_w8)
-+ .xword L(ipred_cfl_ac_444_w4)
-
- L(ipred_cfl_ac_444_w32_tbl):
-- .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad0)
-- .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad2)
-- .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad4)
-- .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad6)
-+ .xword L(ipred_cfl_ac_444_w32_wpad0)
-+ .xword L(ipred_cfl_ac_444_w32_wpad2)
-+ .xword L(ipred_cfl_ac_444_w32_wpad4)
-+ .xword L(ipred_cfl_ac_444_w32_wpad6)
-+ .popsection
- endfunc
Index: patches/patch-src_arm_64_mc16_S
===================================================================
RCS file: patches/patch-src_arm_64_mc16_S
diff -N patches/patch-src_arm_64_mc16_S
--- patches/patch-src_arm_64_mc16_S 24 Apr 2023 21:06:59 -0000 1.1
+++ /dev/null 1 Jan 1970 00:00:00 -0000
@@ -1,523 +0,0 @@
-Index: src/arm/64/mc16.S
---- src/arm/64/mc16.S.orig
-+++ src/arm/64/mc16.S
-@@ -145,11 +145,11 @@ function \type\()_16bpc_neon, export=1
- dup v27.4s, w6
- neg v27.4s, v27.4s
- .endif
-- adr x7, L(\type\()_tbl)
-+ adrp x7, L(\type\()_tbl)
-+ add x7, x7, :lo12: L(\type\()_tbl)
- sub w4, w4, #24
- \type v4, v5, v0, v1, v2, v3
-- ldrh w4, [x7, x4, lsl #1]
-- sub x7, x7, w4, uxtw
-+ ldr x7, [x7, x4, lsl #3]
- br x7
- 40:
- AARCH64_VALID_JUMP_TARGET
-@@ -228,13 +228,15 @@ function \type\()_16bpc_neon, export=1
- b 128b
- 0:
- ret
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_tbl):
-- .hword L(\type\()_tbl) - 1280b
-- .hword L(\type\()_tbl) - 640b
-- .hword L(\type\()_tbl) - 32b
-- .hword L(\type\()_tbl) - 16b
-- .hword L(\type\()_tbl) - 80b
-- .hword L(\type\()_tbl) - 40b
-+ .xword 1280b
-+ .xword 640b
-+ .xword 32b
-+ .xword 16b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
- .endm
-
-@@ -247,12 +249,12 @@ bidir_fn mask, w7
- function w_mask_\type\()_16bpc_neon, export=1
- ldr w8, [sp]
- clz w9, w4
-- adr x10, L(w_mask_\type\()_tbl)
-+ adrp x10, L(w_mask_\type\()_tbl)
-+ add x10, x10, :lo12: L(w_mask_\type\()_tbl)
- dup v31.8h, w8 // bitdepth_max
- sub w9, w9, #24
- clz w8, w8 // clz(bitdepth_max)
-- ldrh w9, [x10, x9, lsl #1]
-- sub x10, x10, w9, uxtw
-+ ldr x10, [x10, x9, lsl #3]
- sub w8, w8, #12 // sh = intermediate_bits + 6 = clz(bitdepth_max) - 12
- mov w9, #PREP_BIAS*64
- neg w8, w8 // -sh
-@@ -541,13 +543,15 @@ function w_mask_\type\()_16bpc_neon, export=1
- add x12, x12, x1
- b.gt 161b
- ret
-+ .pushsection .data.rel.ro, "aw"
- L(w_mask_\type\()_tbl):
-- .hword L(w_mask_\type\()_tbl) - 1280b
-- .hword L(w_mask_\type\()_tbl) - 640b
-- .hword L(w_mask_\type\()_tbl) - 320b
-- .hword L(w_mask_\type\()_tbl) - 160b
-- .hword L(w_mask_\type\()_tbl) - 8b
-- .hword L(w_mask_\type\()_tbl) - 4b
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 8b
-+ .xword 4b
-+ .popsection
- endfunc
- .endm
-
-@@ -557,11 +561,11 @@ w_mask_fn 420
-
-
- function blend_16bpc_neon, export=1
-- adr x6, L(blend_tbl)
-+ adrp x6, L(blend_tbl)
-+ add x6, x6, :lo12: L(blend_tbl)
- clz w3, w3
- sub w3, w3, #26
-- ldrh w3, [x6, x3, lsl #1]
-- sub x6, x6, w3, uxtw
-+ ldr x6, [x6, x3, lsl #3]
- add x8, x0, x1
- br x6
- 40:
-@@ -673,15 +677,18 @@ function blend_16bpc_neon, export=1
- st1 {v0.8h, v1.8h, v2.8h, v3.8h}, [x0], x1
- b.gt 32b
- ret
-+ .pushsection .data.rel.ro, "aw"
- L(blend_tbl):
-- .hword L(blend_tbl) - 32b
-- .hword L(blend_tbl) - 160b
-- .hword L(blend_tbl) - 80b
-- .hword L(blend_tbl) - 40b
-+ .xword 32b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
- function blend_h_16bpc_neon, export=1
-- adr x6, L(blend_h_tbl)
-+ adrp x6, L(blend_h_tbl)
-+ add x6, x6, :lo12: L(blend_h_tbl)
- movrel x5, X(obmc_masks)
- add x5, x5, w4, uxtw
- sub w4, w4, w4, lsr #2
-@@ -689,8 +696,7 @@ function blend_h_16bpc_neon, export=1
- add x8, x0, x1
- lsl x1, x1, #1
- sub w7, w7, #24
-- ldrh w7, [x6, x7, lsl #1]
-- sub x6, x6, w7, uxtw
-+ ldr x6, [x6, x7, lsl #3]
- br x6
- 2:
- AARCH64_VALID_JUMP_TARGET
-@@ -835,26 +841,28 @@ function blend_h_16bpc_neon, export=1
- add x7, x7, w3, uxtw #1
- b.gt 321b
- ret
-+ .pushsection .data.rel.ro, "aw"
- L(blend_h_tbl):
-- .hword L(blend_h_tbl) - 1280b
-- .hword L(blend_h_tbl) - 640b
-- .hword L(blend_h_tbl) - 320b
-- .hword L(blend_h_tbl) - 16b
-- .hword L(blend_h_tbl) - 8b
-- .hword L(blend_h_tbl) - 4b
-- .hword L(blend_h_tbl) - 2b
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 16b
-+ .xword 8b
-+ .xword 4b
-+ .xword 2b
-+ .popsection
- endfunc
-
- function blend_v_16bpc_neon, export=1
-- adr x6, L(blend_v_tbl)
-+ adrp x6, L(blend_v_tbl)
-+ add x6, x6, :lo12: L(blend_v_tbl)
- movrel x5, X(obmc_masks)
- add x5, x5, w3, uxtw
- clz w3, w3
- add x8, x0, x1
- lsl x1, x1, #1
- sub w3, w3, #26
-- ldrh w3, [x6, x3, lsl #1]
-- sub x6, x6, w3, uxtw
-+ ldr x6, [x6, x3, lsl #3]
- br x6
- 20:
- AARCH64_VALID_JUMP_TARGET
-@@ -992,21 +1000,23 @@ function blend_v_16bpc_neon, export=1
- st1 {v4.8h, v5.8h, v6.8h}, [x8], x1
- b.gt 32b
- ret
-+ .pushsection .data.rel.ro, "aw"
- L(blend_v_tbl):
-- .hword L(blend_v_tbl) - 320b
-- .hword L(blend_v_tbl) - 160b
-- .hword L(blend_v_tbl) - 80b
-- .hword L(blend_v_tbl) - 40b
-- .hword L(blend_v_tbl) - 20b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .popsection
- endfunc
-
-
- // This has got the same signature as the put_8tap functions,
- // and assumes that x9 is set to (clz(w)-24).
- function put_neon
-- adr x10, L(put_tbl)
-- ldrh w9, [x10, x9, lsl #1]
-- sub x10, x10, w9, uxtw
-+ adrp x10, L(put_tbl)
-+ add x10, x10, :lo12: L(put_tbl)
-+ ldr x10, [x10, x9, lsl #3]
- br x10
-
- 2:
-@@ -1106,14 +1116,16 @@ function put_neon
- b.gt 128b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(put_tbl):
-- .hword L(put_tbl) - 128b
-- .hword L(put_tbl) - 64b
-- .hword L(put_tbl) - 32b
-- .hword L(put_tbl) - 16b
-- .hword L(put_tbl) - 80b
-- .hword L(put_tbl) - 4b
-- .hword L(put_tbl) - 2b
-+ .xword 128b
-+ .xword 64b
-+ .xword 32b
-+ .xword 16b
-+ .xword 80b
-+ .xword 4b
-+ .xword 2b
-+ .popsection
- endfunc
-
-
-@@ -1121,11 +1133,11 @@ endfunc
- // and assumes that x9 is set to (clz(w)-24), w7 to intermediate_bits and
- // x8 to w*2.
- function prep_neon
-- adr x10, L(prep_tbl)
-- ldrh w9, [x10, x9, lsl #1]
-+ adrp x10, L(prep_tbl)
-+ add x10, x10, :lo12: L(prep_tbl)
-+ ldr x10, [x10, x9, lsl #3]
- dup v31.8h, w7 // intermediate_bits
- movi v30.8h, #(PREP_BIAS >> 8), lsl #8
-- sub x10, x10, w9, uxtw
- br x10
-
- 40:
-@@ -1278,13 +1290,15 @@ function prep_neon
- b.gt 128b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(prep_tbl):
-- .hword L(prep_tbl) - 128b
-- .hword L(prep_tbl) - 64b
-- .hword L(prep_tbl) - 32b
-- .hword L(prep_tbl) - 16b
-- .hword L(prep_tbl) - 80b
-- .hword L(prep_tbl) - 40b
-+ .xword 128b
-+ .xword 64b
-+ .xword 32b
-+ .xword 16b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
-
-
-@@ -1563,16 +1577,16 @@ L(\type\()_8tap_h):
- add \xmx, x11, \mx, uxtw #3
- b.ne L(\type\()_8tap_hv)
-
-- adr x10, L(\type\()_8tap_h_tbl)
-+ adrp x10, L(\type\()_8tap_h_tbl)
-+ add x10, x10, :lo12: L(\type\()_8tap_h_tbl)
- dup v30.4s, w12 // 6 - intermediate_bits
-- ldrh w9, [x10, x9, lsl #1]
-+ ldr x10, [x10, x9, lsl #3]
- neg v30.4s, v30.4s // -(6-intermediate_bits)
- .ifc \type, put
- dup v29.8h, \bdmax // intermediate_bits
- .else
- movi v28.8h, #(PREP_BIAS >> 8), lsl #8
- .endif
-- sub x10, x10, w9, uxtw
- .ifc \type, put
- neg v29.8h, v29.8h // -intermediate_bits
- .endif
-@@ -1734,15 +1748,17 @@ L(\type\()_8tap_h):
- b.gt 81b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_h_tbl):
-- .hword L(\type\()_8tap_h_tbl) - 1280b
-- .hword L(\type\()_8tap_h_tbl) - 640b
-- .hword L(\type\()_8tap_h_tbl) - 320b
-- .hword L(\type\()_8tap_h_tbl) - 160b
-- .hword L(\type\()_8tap_h_tbl) - 80b
-- .hword L(\type\()_8tap_h_tbl) - 40b
-- .hword L(\type\()_8tap_h_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
-
-
- L(\type\()_8tap_v):
-@@ -1758,12 +1774,12 @@ L(\type\()_8tap_v):
- dup v30.4s, w12 // 6 - intermediate_bits
- movi v29.8h, #(PREP_BIAS >> 8), lsl #8
- .endif
-- adr x10, L(\type\()_8tap_v_tbl)
-- ldrh w9, [x10, x9, lsl #1]
-+ adrp x10, L(\type\()_8tap_v_tbl)
-+ add x10, x10, :lo12: L(\type\()_8tap_v_tbl)
-+ ldr x10, [x10, x9, lsl #3]
- .ifc \type, prep
- neg v30.4s, v30.4s // -(6-intermediate_bits)
- .endif
-- sub x10, x10, w9, uxtw
- br x10
-
- 20: // 2xN v
-@@ -2029,15 +2045,17 @@ L(\type\()_8tap_v):
- 0:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_v_tbl):
-- .hword L(\type\()_8tap_v_tbl) - 1280b
-- .hword L(\type\()_8tap_v_tbl) - 640b
-- .hword L(\type\()_8tap_v_tbl) - 320b
-- .hword L(\type\()_8tap_v_tbl) - 160b
-- .hword L(\type\()_8tap_v_tbl) - 80b
-- .hword L(\type\()_8tap_v_tbl) - 40b
-- .hword L(\type\()_8tap_v_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
-
- L(\type\()_8tap_hv):
- cmp \h, #4
-@@ -2048,16 +2066,16 @@ L(\type\()_8tap_hv):
- 4:
- add \xmy, x11, \my, uxtw #3
-
-- adr x10, L(\type\()_8tap_hv_tbl)
-+ adrp x10, L(\type\()_8tap_hv_tbl)
-+ add x10, x10, :lo12: L(\type\()_8tap_hv_tbl)
- dup v30.4s, w12 // 6 - intermediate_bits
-- ldrh w9, [x10, x9, lsl #1]
-+ ldr x10, [x10, x9, lsl #3]
- neg v30.4s, v30.4s // -(6-intermediate_bits)
- .ifc \type, put
- dup v29.4s, w13 // 6 + intermediate_bits
- .else
- movi v29.8h, #(PREP_BIAS >> 8), lsl #8
- .endif
-- sub x10, x10, w9, uxtw
- .ifc \type, put
- neg v29.4s, v29.4s // -(6+intermediate_bits)
- .endif
-@@ -2623,15 +2641,17 @@ L(\type\()_8tap_filter_8):
- uzp1 v24.8h, v27.8h, v28.8h // Ditto
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_hv_tbl):
-- .hword L(\type\()_8tap_hv_tbl) - 1280b
-- .hword L(\type\()_8tap_hv_tbl) - 640b
-- .hword L(\type\()_8tap_hv_tbl) - 320b
-- .hword L(\type\()_8tap_hv_tbl) - 160b
-- .hword L(\type\()_8tap_hv_tbl) - 80b
-- .hword L(\type\()_8tap_hv_tbl) - 40b
-- .hword L(\type\()_8tap_hv_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
- endfunc
-
-
-@@ -2665,16 +2685,16 @@ function \type\()_bilin_16bpc_neon, export=1
- L(\type\()_bilin_h):
- cbnz \my, L(\type\()_bilin_hv)
-
-- adr x10, L(\type\()_bilin_h_tbl)
-+ adrp x10, L(\type\()_bilin_h_tbl)
-+ add x10, x10, :lo12: L(\type\()_bilin_h_tbl)
- dup v31.8h, w11 // 4 - intermediate_bits
-- ldrh w9, [x10, x9, lsl #1]
-+ ldr x10, [x10, x9, lsl #3]
- neg v31.8h, v31.8h // -(4-intermediate_bits)
- .ifc \type, put
- dup v30.8h, \bdmax // intermediate_bits
- .else
- movi v29.8h, #(PREP_BIAS >> 8), lsl #8
- .endif
-- sub x10, x10, w9, uxtw
- .ifc \type, put
- neg v30.8h, v30.8h // -intermediate_bits
- .endif
-@@ -2832,29 +2852,31 @@ L(\type\()_bilin_h):
- b.gt 161b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_h_tbl):
-- .hword L(\type\()_bilin_h_tbl) - 1280b
-- .hword L(\type\()_bilin_h_tbl) - 640b
-- .hword L(\type\()_bilin_h_tbl) - 320b
-- .hword L(\type\()_bilin_h_tbl) - 160b
-- .hword L(\type\()_bilin_h_tbl) - 80b
-- .hword L(\type\()_bilin_h_tbl) - 40b
-- .hword L(\type\()_bilin_h_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
-
-
- L(\type\()_bilin_v):
- cmp \h, #4
-- adr x10, L(\type\()_bilin_v_tbl)
-+ adrp x10, L(\type\()_bilin_v_tbl)
-+ add x10, x10, :lo12: L(\type\()_bilin_v_tbl)
- .ifc \type, prep
- dup v31.8h, w11 // 4 - intermediate_bits
- .endif
-- ldrh w9, [x10, x9, lsl #1]
-+ ldr x10, [x10, x9, lsl #3]
- .ifc \type, prep
- movi v29.8h, #(PREP_BIAS >> 8), lsl #8
- neg v31.8h, v31.8h // -(4-intermediate_bits)
- .endif
-- sub x10, x10, w9, uxtw
- br x10
-
- 20: // 2xN v
-@@ -3030,27 +3052,29 @@ L(\type\()_bilin_v):
- 0:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_v_tbl):
-- .hword L(\type\()_bilin_v_tbl) - 1280b
-- .hword L(\type\()_bilin_v_tbl) - 640b
-- .hword L(\type\()_bilin_v_tbl) - 320b
-- .hword L(\type\()_bilin_v_tbl) - 160b
-- .hword L(\type\()_bilin_v_tbl) - 80b
-- .hword L(\type\()_bilin_v_tbl) - 40b
-- .hword L(\type\()_bilin_v_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
-
- L(\type\()_bilin_hv):
-- adr x10, L(\type\()_bilin_hv_tbl)
-+ adrp x10, L(\type\()_bilin_hv_tbl)
-+ add x10, x10, :lo12: L(\type\()_bilin_hv_tbl)
- dup v31.8h, w11 // 4 - intermediate_bits
-- ldrh w9, [x10, x9, lsl #1]
-+ ldr x10, [x10, x9, lsl #3]
- neg v31.8h, v31.8h // -(4-intermediate_bits)
- .ifc \type, put
- dup v30.4s, w12 // 4 + intermediate_bits
- .else
- movi v29.8h, #(PREP_BIAS >> 8), lsl #8
- .endif
-- sub x10, x10, w9, uxtw
- .ifc \type, put
- neg v30.4s, v30.4s // -(4+intermediate_bits)
- .endif
-@@ -3224,15 +3248,17 @@ L(\type\()_bilin_hv):
- 0:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_hv_tbl):
-- .hword L(\type\()_bilin_hv_tbl) - 1280b
-- .hword L(\type\()_bilin_hv_tbl) - 640b
-- .hword L(\type\()_bilin_hv_tbl) - 320b
-- .hword L(\type\()_bilin_hv_tbl) - 160b
-- .hword L(\type\()_bilin_hv_tbl) - 80b
-- .hword L(\type\()_bilin_hv_tbl) - 40b
-- .hword L(\type\()_bilin_hv_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
- endfunc
- .endm
-
Index: patches/patch-src_arm_64_mc_S
===================================================================
RCS file: patches/patch-src_arm_64_mc_S
diff -N patches/patch-src_arm_64_mc_S
--- patches/patch-src_arm_64_mc_S 24 Apr 2023 21:06:59 -0000 1.1
+++ /dev/null 1 Jan 1970 00:00:00 -0000
@@ -1,483 +0,0 @@
-Index: src/arm/64/mc.S
---- src/arm/64/mc.S.orig
-+++ src/arm/64/mc.S
-@@ -79,11 +79,11 @@ function \type\()_8bpc_neon, export=1
- .ifc \type, mask
- movi v31.16b, #256-2
- .endif
-- adr x7, L(\type\()_tbl)
-+ adrp x7, L(\type\()_tbl)
-+ add x7, x7, :lo12: L(\type\()_tbl)
- sub w4, w4, #24
-- ldrh w4, [x7, x4, lsl #1]
-+ ldr x7, [x7, x4, lsl #3]
- \type v4, v0, v1, v2, v3
-- sub x7, x7, w4, uxtw
- br x7
- 40:
- AARCH64_VALID_JUMP_TARGET
-@@ -192,13 +192,15 @@ function \type\()_8bpc_neon, export=1
- b 128b
- 0:
- ret
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_tbl):
-- .hword L(\type\()_tbl) - 1280b
-- .hword L(\type\()_tbl) - 640b
-- .hword L(\type\()_tbl) - 320b
-- .hword L(\type\()_tbl) - 16b
-- .hword L(\type\()_tbl) - 80b
-- .hword L(\type\()_tbl) - 40b
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 16b
-+ .xword 80b
-+ .xword 40b
-+ .popsection
- endfunc
- .endm
-
-@@ -210,10 +212,10 @@ bidir_fn mask
- .macro w_mask_fn type
- function w_mask_\type\()_8bpc_neon, export=1
- clz w8, w4
-- adr x9, L(w_mask_\type\()_tbl)
-+ adrp x9, L(w_mask_\type\()_tbl)
-+ add x9, x9, :lo12: L(w_mask_\type\()_tbl)
- sub w8, w8, #24
-- ldrh w8, [x9, x8, lsl #1]
-- sub x9, x9, w8, uxtw
-+ ldr x9, [x9, x8, lsl #3]
- mov w10, #6903
- dup v0.8h, w10
- .if \type == 444
-@@ -413,13 +415,15 @@ function w_mask_\type\()_8bpc_neon, export=1
- add x12, x12, x1
- b.gt 161b
- ret
-+ .pushsection .data.rel.ro, "aw"
- L(w_mask_\type\()_tbl):
-- .hword L(w_mask_\type\()_tbl) - 1280b
-- .hword L(w_mask_\type\()_tbl) - 640b
-- .hword L(w_mask_\type\()_tbl) - 320b
-- .hword L(w_mask_\type\()_tbl) - 160b
-- .hword L(w_mask_\type\()_tbl) - 8b
-- .hword L(w_mask_\type\()_tbl) - 4b
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 8b
-+ .xword 4b
-+ .popsection
- endfunc
- .endm
-
-@@ -429,11 +433,11 @@ w_mask_fn 420
-
-
- function blend_8bpc_neon, export=1
-- adr x6, L(blend_tbl)
-+ adrp x6, L(blend_tbl)
-+ add x6, x6, :lo12: L(blend_tbl)
- clz w3, w3
- sub w3, w3, #26
-- ldrh w3, [x6, x3, lsl #1]
-- sub x6, x6, w3, uxtw
-+ ldr x6, [x6, x3, lsl #3]
- movi v4.16b, #64
- add x8, x0, x1
- lsl x1, x1, #1
-@@ -535,15 +539,18 @@ function blend_8bpc_neon, export=1
- st1 {v27.16b, v28.16b}, [x8], x1
- b.gt 32b
- ret
-+ .pushsection .data.rel.ro, "aw"
- L(blend_tbl):
-- .hword L(blend_tbl) - 32b
-- .hword L(blend_tbl) - 16b
-- .hword L(blend_tbl) - 8b
-- .hword L(blend_tbl) - 4b
-+ .xword 32b
-+ .xword 16b
-+ .xword 8b
-+ .xword 4b
-+ .popsection
- endfunc
-
- function blend_h_8bpc_neon, export=1
-- adr x6, L(blend_h_tbl)
-+ adrp x6, L(blend_h_tbl)
-+ add x6, x6, :lo12: L(blend_h_tbl)
- movrel x5, X(obmc_masks)
- add x5, x5, w4, uxtw
- sub w4, w4, w4, lsr #2
-@@ -552,8 +559,7 @@ function blend_h_8bpc_neon, export=1
- add x8, x0, x1
- lsl x1, x1, #1
- sub w7, w7, #24
-- ldrh w7, [x6, x7, lsl #1]
-- sub x6, x6, w7, uxtw
-+ ldr x6, [x6, x7, lsl #3]
- br x6
- 2:
- AARCH64_VALID_JUMP_TARGET
-@@ -682,18 +688,21 @@ function blend_h_8bpc_neon, export=1
- add x7, x7, w3, uxtw
- b.gt 321b
- ret
-+ .pushsection .data.rel.ro, "aw"
- L(blend_h_tbl):
-- .hword L(blend_h_tbl) - 1280b
-- .hword L(blend_h_tbl) - 640b
-- .hword L(blend_h_tbl) - 320b
-- .hword L(blend_h_tbl) - 16b
-- .hword L(blend_h_tbl) - 8b
-- .hword L(blend_h_tbl) - 4b
-- .hword L(blend_h_tbl) - 2b
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 16b
-+ .xword 8b
-+ .xword 4b
-+ .xword 2b
-+ .popsection
- endfunc
-
- function blend_v_8bpc_neon, export=1
-- adr x6, L(blend_v_tbl)
-+ adrp x6, L(blend_v_tbl)
-+ add x6, x6, :lo12: L(blend_v_tbl)
- movrel x5, X(obmc_masks)
- add x5, x5, w3, uxtw
- clz w3, w3
-@@ -701,8 +710,7 @@ function blend_v_8bpc_neon, export=1
- add x8, x0, x1
- lsl x1, x1, #1
- sub w3, w3, #26
-- ldrh w3, [x6, x3, lsl #1]
-- sub x6, x6, w3, uxtw
-+ ldr x6, [x6, x3, lsl #3]
- br x6
- 20:
- AARCH64_VALID_JUMP_TARGET
-@@ -826,21 +834,23 @@ function blend_v_8bpc_neon, export=1
- st1 {v27.8b}, [x8], x1
- b.gt 32b
- ret
-+ .pushsection .data.rel.ro, "aw"
- L(blend_v_tbl):
-- .hword L(blend_v_tbl) - 320b
-- .hword L(blend_v_tbl) - 160b
-- .hword L(blend_v_tbl) - 80b
-- .hword L(blend_v_tbl) - 40b
-- .hword L(blend_v_tbl) - 20b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .popsection
- endfunc
-
-
- // This has got the same signature as the put_8tap functions,
- // and assumes that x8 is set to (clz(w)-24).
- function put_neon
-- adr x9, L(put_tbl)
-- ldrh w8, [x9, x8, lsl #1]
-- sub x9, x9, w8, uxtw
-+ adrp x9, L(put_tbl)
-+ add x9, x9, :lo12: L(put_tbl)
-+ ldr x9, [x9, x8, lsl #3]
- br x9
-
- 2:
-@@ -926,23 +936,25 @@ function put_neon
- b.gt 128b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(put_tbl):
-- .hword L(put_tbl) - 128b
-- .hword L(put_tbl) - 64b
-- .hword L(put_tbl) - 32b
-- .hword L(put_tbl) - 160b
-- .hword L(put_tbl) - 8b
-- .hword L(put_tbl) - 4b
-- .hword L(put_tbl) - 2b
-+ .xword 128b
-+ .xword 64b
-+ .xword 32b
-+ .xword 160b
-+ .xword 8b
-+ .xword 4b
-+ .xword 2b
-+ .popsection
- endfunc
-
-
- // This has got the same signature as the prep_8tap functions,
- // and assumes that x8 is set to (clz(w)-24), and x7 to w*2.
- function prep_neon
-- adr x9, L(prep_tbl)
-- ldrh w8, [x9, x8, lsl #1]
-- sub x9, x9, w8, uxtw
-+ adrp x9, L(prep_tbl)
-+ add x9, x9, :lo12: L(prep_tbl)
-+ ldr x9, [x9, x8, lsl #3]
- br x9
-
- 4:
-@@ -1058,13 +1070,15 @@ function prep_neon
- b.gt 128b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(prep_tbl):
-- .hword L(prep_tbl) - 1280b
-- .hword L(prep_tbl) - 640b
-- .hword L(prep_tbl) - 320b
-- .hword L(prep_tbl) - 160b
-- .hword L(prep_tbl) - 8b
-- .hword L(prep_tbl) - 4b
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 8b
-+ .xword 4b
-+ .popsection
- endfunc
-
-
-@@ -1370,9 +1384,9 @@ L(\type\()_8tap_h):
- add \xmx, x10, \mx, uxtw #3
- b.ne L(\type\()_8tap_hv)
-
-- adr x9, L(\type\()_8tap_h_tbl)
-- ldrh w8, [x9, x8, lsl #1]
-- sub x9, x9, w8, uxtw
-+ adrp x9, L(\type\()_8tap_h_tbl)
-+ add x9, x9, :lo12: L(\type\()_8tap_h_tbl)
-+ ldr x9, [x9, x8, lsl #3]
- br x9
-
- 20: // 2xN h
-@@ -1575,15 +1589,17 @@ L(\type\()_8tap_h):
- b.gt 161b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_h_tbl):
-- .hword L(\type\()_8tap_h_tbl) - 1280b
-- .hword L(\type\()_8tap_h_tbl) - 640b
-- .hword L(\type\()_8tap_h_tbl) - 320b
-- .hword L(\type\()_8tap_h_tbl) - 160b
-- .hword L(\type\()_8tap_h_tbl) - 80b
-- .hword L(\type\()_8tap_h_tbl) - 40b
-- .hword L(\type\()_8tap_h_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
-
-
- L(\type\()_8tap_v):
-@@ -1595,9 +1611,9 @@ L(\type\()_8tap_v):
- 4:
- add \xmy, x10, \my, uxtw #3
-
-- adr x9, L(\type\()_8tap_v_tbl)
-- ldrh w8, [x9, x8, lsl #1]
-- sub x9, x9, w8, uxtw
-+ adrp x9, L(\type\()_8tap_v_tbl)
-+ add x9, x9, :lo12: L(\type\()_8tap_v_tbl)
-+ ldr x9, [x9, x8, lsl #3]
- br x9
-
- 20: // 2xN v
-@@ -1901,15 +1917,17 @@ L(\type\()_8tap_v):
- 0:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_v_tbl):
-- .hword L(\type\()_8tap_v_tbl) - 1280b
-- .hword L(\type\()_8tap_v_tbl) - 640b
-- .hword L(\type\()_8tap_v_tbl) - 320b
-- .hword L(\type\()_8tap_v_tbl) - 160b
-- .hword L(\type\()_8tap_v_tbl) - 80b
-- .hword L(\type\()_8tap_v_tbl) - 40b
-- .hword L(\type\()_8tap_v_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
-
- L(\type\()_8tap_hv):
- cmp \h, #4
-@@ -1920,9 +1938,9 @@ L(\type\()_8tap_hv):
- 4:
- add \xmy, x10, \my, uxtw #3
-
-- adr x9, L(\type\()_8tap_hv_tbl)
-- ldrh w8, [x9, x8, lsl #1]
-- sub x9, x9, w8, uxtw
-+ adrp x9, L(\type\()_8tap_hv_tbl)
-+ add x9, x9, :lo12: L(\type\()_8tap_hv_tbl)
-+ ldr x9, [x9, x8, lsl #3]
- br x9
-
- 20:
-@@ -2444,15 +2462,17 @@ L(\type\()_8tap_filter_8):
- srshr v25.8h, v25.8h, #2
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_hv_tbl):
-- .hword L(\type\()_8tap_hv_tbl) - 1280b
-- .hword L(\type\()_8tap_hv_tbl) - 640b
-- .hword L(\type\()_8tap_hv_tbl) - 320b
-- .hword L(\type\()_8tap_hv_tbl) - 160b
-- .hword L(\type\()_8tap_hv_tbl) - 80b
-- .hword L(\type\()_8tap_hv_tbl) - 40b
-- .hword L(\type\()_8tap_hv_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
- endfunc
-
-
-@@ -2478,9 +2498,9 @@ function \type\()_bilin_8bpc_neon, export=1
- L(\type\()_bilin_h):
- cbnz \my, L(\type\()_bilin_hv)
-
-- adr x9, L(\type\()_bilin_h_tbl)
-- ldrh w8, [x9, x8, lsl #1]
-- sub x9, x9, w8, uxtw
-+ adrp x9, L(\type\()_bilin_h_tbl)
-+ add x9, x9, :lo12: L(\type\()_bilin_h_tbl)
-+ ldr x9, [x9, x8, lsl #3]
- br x9
-
- 20: // 2xN h
-@@ -2624,22 +2644,24 @@ L(\type\()_bilin_h):
- b.gt 161b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_h_tbl):
-- .hword L(\type\()_bilin_h_tbl) - 1280b
-- .hword L(\type\()_bilin_h_tbl) - 640b
-- .hword L(\type\()_bilin_h_tbl) - 320b
-- .hword L(\type\()_bilin_h_tbl) - 160b
-- .hword L(\type\()_bilin_h_tbl) - 80b
-- .hword L(\type\()_bilin_h_tbl) - 40b
-- .hword L(\type\()_bilin_h_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
-
-
- L(\type\()_bilin_v):
- cmp \h, #4
-- adr x9, L(\type\()_bilin_v_tbl)
-- ldrh w8, [x9, x8, lsl #1]
-- sub x9, x9, w8, uxtw
-+ adrp x9, L(\type\()_bilin_v_tbl)
-+ add x9, x9, :lo12: L(\type\()_bilin_v_tbl)
-+ ldr x9, [x9, x8, lsl #3]
- br x9
-
- 20: // 2xN v
-@@ -2810,22 +2832,24 @@ L(\type\()_bilin_v):
- 0:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_v_tbl):
-- .hword L(\type\()_bilin_v_tbl) - 1280b
-- .hword L(\type\()_bilin_v_tbl) - 640b
-- .hword L(\type\()_bilin_v_tbl) - 320b
-- .hword L(\type\()_bilin_v_tbl) - 160b
-- .hword L(\type\()_bilin_v_tbl) - 80b
-- .hword L(\type\()_bilin_v_tbl) - 40b
-- .hword L(\type\()_bilin_v_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
-
- L(\type\()_bilin_hv):
- uxtl v2.8h, v2.8b
- uxtl v3.8h, v3.8b
-- adr x9, L(\type\()_bilin_hv_tbl)
-- ldrh w8, [x9, x8, lsl #1]
-- sub x9, x9, w8, uxtw
-+ adrp x9, L(\type\()_bilin_hv_tbl)
-+ add x9, x9, :lo12: L(\type\()_bilin_hv_tbl)
-+ ldr x9, [x9, x8, lsl #3]
- br x9
-
- 20: // 2xN hv
-@@ -2975,15 +2999,17 @@ L(\type\()_bilin_hv):
- 0:
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_hv_tbl):
-- .hword L(\type\()_bilin_hv_tbl) - 1280b
-- .hword L(\type\()_bilin_hv_tbl) - 640b
-- .hword L(\type\()_bilin_hv_tbl) - 320b
-- .hword L(\type\()_bilin_hv_tbl) - 160b
-- .hword L(\type\()_bilin_hv_tbl) - 80b
-- .hword L(\type\()_bilin_hv_tbl) - 40b
-- .hword L(\type\()_bilin_hv_tbl) - 20b
-- .hword 0
-+ .xword 1280b
-+ .xword 640b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 0
-+ .popsection
- endfunc
- .endm
-
Index: patches/patch-src_arm_64_refmvs_S
===================================================================
RCS file: patches/patch-src_arm_64_refmvs_S
diff -N patches/patch-src_arm_64_refmvs_S
--- patches/patch-src_arm_64_refmvs_S 24 Apr 2023 21:06:59 -0000 1.1
+++ /dev/null 1 Jan 1970 00:00:00 -0000
@@ -1,40 +0,0 @@
-Index: src/arm/64/refmvs.S
---- src/arm/64/refmvs.S.orig
-+++ src/arm/64/refmvs.S
-@@ -34,13 +34,13 @@
- function splat_mv_neon, export=1
- ld1 {v3.16b}, [x1]
- clz w3, w3
-- adr x5, L(splat_tbl)
-+ adrp x5, L(splat_tbl)
-+ add x5, x5, :lo12: L(splat_tbl)
- sub w3, w3, #26
- ext v2.16b, v3.16b, v3.16b, #12
-- ldrh w3, [x5, w3, uxtw #1]
-+ ldr x3, [x5, w3, uxtw #3]
- add w2, w2, w2, lsl #1
- ext v0.16b, v2.16b, v3.16b, #4
-- sub x3, x5, w3, uxtw
- ext v1.16b, v2.16b, v3.16b, #8
- lsl w2, w2, #2
- ext v2.16b, v2.16b, v3.16b, #12
-@@ -81,11 +81,13 @@ function splat_mv_neon, export=1
- b.gt 1b
- ret
-
-+ .pushsection .data.rel.ro, "aw"
- L(splat_tbl):
-- .hword L(splat_tbl) - 320b
-- .hword L(splat_tbl) - 160b
-- .hword L(splat_tbl) - 80b
-- .hword L(splat_tbl) - 40b
-- .hword L(splat_tbl) - 20b
-- .hword L(splat_tbl) - 10b
-+ .xword 320b
-+ .xword 160b
-+ .xword 80b
-+ .xword 40b
-+ .xword 20b
-+ .xword 10b
-+ .popsection
- endfunc
Index: patches/patch-src_arm_cpu_c
===================================================================
RCS file: patches/patch-src_arm_cpu_c
diff -N patches/patch-src_arm_cpu_c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ patches/patch-src_arm_cpu_c 1 Dec 2024 08:45:12 -0000
@@ -0,0 +1,38 @@
+Provide dav1d_getauxval() wrapper for getauxval() and elf_aux_info()
+93f12c117a4e1c0cc2b129dcc52e84dbd9b84200
+
+Index: src/arm/cpu.c
+--- src/arm/cpu.c.orig
++++ src/arm/cpu.c
+@@ -43,15 +43,8 @@
+ #define HWCAP2_AARCH64_I8MM (1 << 13)
+
+ COLD unsigned dav1d_get_cpu_flags_arm(void) {
+-#if HAVE_GETAUXVAL
+- unsigned long hw_cap = getauxval(AT_HWCAP);
+- unsigned long hw_cap2 = getauxval(AT_HWCAP2);
+-#else
+- unsigned long hw_cap = 0;
+- unsigned long hw_cap2 = 0;
+- elf_aux_info(AT_HWCAP, &hw_cap, sizeof(hw_cap));
+- elf_aux_info(AT_HWCAP2, &hw_cap2, sizeof(hw_cap2));
+-

Sunday, December 01, 2024
