Monday, December 30, 2024

Re: acme-client challenges

Quick response, thank you!

On Tue, 31 Dec 2024 03:22:14 +0100, Kirill A. Korinsky wrote:
> On Tue, 31 Dec 2024 01:44:15 +0100,
> Amelia A Lewis <amyzing@talsever.com> wrote:
>>
>> $ doas acme-client -vvv simmonpatch.com
>> acme-client: /etc/ssl/private/leo-simmonpatch.com.key: loaded domain key
>> acme-client: /etc/acme/letsencrypt-staging-privkey.pem: loaded account
>> key
>> acme-client: https://acme-staging.api.letsencrypt.org/directory:
>> directories
>> acme-client: https://acme-staging.api.letsencrypt.org/directory: bad
>> comm
>> acme-client: bad exit: netproc(39958): 1
>>
>
> can you run host acme-staging.api.letsencrypt.org on the same machine?

$ host acme-staging.api.letsencrypt.org
Host acme-staging.api.letsencrypt.org not found: 3(NXDOMAIN)

Well, that seems dispositive.

$ dig acme-staging.api.letsencrypt.org

; <<>> dig 9.10.8-P1 <<>> acme-staging.api.letsencrypt.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 46315
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;acme-staging.api.letsencrypt.org. IN A

;; AUTHORITY SECTION:
letsencrypt.org. 1784 IN SOA owen.ns.cloudflare.com.
dns.cloudflare.com. 2360987238 10000 2400 604800 1800

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Dec 30 21:51:48 EST 2024
;; MSG SIZE rcvd: 123

Hmmmm. okay. That's a confirmation.



> I'm asking because as far as I understand the code at 3:20 am, the only case
> when you see 'bad comm' without additional errors / warnings is DNS related.

Ah. I went looking in netproc.c (after I sent the email; should've
looked first), and localized it to nreq() < 0, but couldn't focus
enough after a day working to figure which of the return -1's in there
were the bad ones. Which reminds me: I owe the devs an apology (but
"comm" is unnecessarily cryptic; google keeps suggesting that I must
mean "bad community" and while I suspect it ought to mean "bad command"
or "bad communication channel" or something like that, I'm not certain:
"bad connection" [sic]?). But it is an acme-client bit, and it was rude
of me to snark on it. Teach me to check the code before asking
questions.

> I mean that acme-client can't resolve acme-staging.api.letsencrypt.org.

Seems so. Neither acme-staging nor acme.

$ host -t ns letsencrypt.org
letsencrypt.org name server owen.ns.cloudflare.com.
letsencrypt.org name server vera.ns.cloudflare.com.

acme.api.letsencrypt.org is also nxdomain.

host acme{,-staging}.api.letsencrypt.org vera.ns.cloudflare.com ditto.

Pretty much nx-ed. Lemme try a couple things. Same from other server in
colo (they're sharing the same lookup mechanism, not dispositive).

Okay, trying host acme.api.letsencrypt.org $nameserver and substituting
the IP address of a nameserver selected from a list of public resolving
servers in unbound.conf (an older copy has a longer list than new
installs) remains consistent. NXDOMAIN from everywhere I can start
from. Is there a problem at le.o? Just me? But if it's just me, then
directly querying 8.8.8.8 or similar from my home machine _should_
resolve, no? Not understanding how it can be filtered for both my home
machines checking public resolvers and for the servers in colo, but not
the rest of the world.
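
For anyone wanting to repeat that cross-check without going through host(1) or the local resolver, here is a minimal Python sketch (standard library only). It sends the same A query straight to each resolver over UDP and reports the RCODE. The function names are made up for this sketch, and it ignores truncation, retries, response-ID checking and IPv6, so treat it as an illustration rather than a diagnostic tool.

```python
import socket
import struct

# Names for the response codes most likely to show up here (RFC 1035).
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record in RFC 1035 wire format."""
    # Header: id, flags (only RD set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def rcode_of(response):
    """Return the RCODE name from the low 4 bits of the header flags word."""
    if len(response) < 4:
        raise ValueError("response too short to contain a DNS header")
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, "RCODE%d" % code)


def ask(name, resolver, timeout=3.0):
    """Send one UDP query to `resolver` (an IPv4 address) and return the RCODE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver, 53))
        response, _ = sock.recvfrom(512)
    return rcode_of(response)


def survey(name, resolvers):
    """Ask every resolver in turn and collect the answers in a dict."""
    results = {}
    for resolver in resolvers:
        try:
            results[resolver] = ask(name, resolver)
        except (OSError, ValueError) as exc:  # timeouts are OSError too
            results[resolver] = "error: %s" % exc
    return results
```

Something like survey("acme-staging.api.letsencrypt.org", ["8.8.8.8", "1.1.1.1"]) (the addresses are only examples of public resolvers) should return the same RCODE from every resolver if the name really does not exist, as opposed to being filtered somewhere local.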

FWIW, letsencrypt.org responds positively, and I've been trying to
browse their docs since yesterday for something that looked like this
problem. But no one else has reported it? Or at least the available
hits for a search on 'acme-client "directories" "bad comm" -ai' turn
up nothing really pointing at DNS (or else I missed it).

But wait! I found some curl commands on the net that someone offered
for debugging, which downloaded json from a staging server, on my
machine.

$ host acme-staging-v02.api.letsencrypt.org
acme-staging-v02.api.letsencrypt.org is an alias for
staging.api.letsencrypt.org.
staging.api.letsencrypt.org is an alias for
56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com.
56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com has address
172.65.46.172
56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com has IPv6 address
2606:4700:60:0:f41b:d4fe:4325:6026

Ah, now *that's* encouraging.

$ host staging.api.letsencrypt.org
staging.api.letsencrypt.org is an alias for
56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com.
56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com has address
172.65.46.172
56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com has IPv6 address
2606:4700:60:0:f41b:d4fe:4325:6026

lessee, delete an 'acme-' ...

$ doas acme-client -vv simmonpatch.com
acme-client: /etc/acme/letsencrypt-staging-privkey.pem: loaded account
key
acme-client: /etc/ssl/private/leo-simmonpatch.com.key: loaded domain key
acme-client: https://staging.api.letsencrypt.org/directory: directories
acme-client: staging.api.letsencrypt.org: DNS: 172.65.46.172
acme-client: 172.65.46.172: tls_write: name
`staging.api.letsencrypt.org' not present in server certificate
acme-client: 172.65.46.172: tls_read: name
`staging.api.letsencrypt.org' not present in server certificate
acme-client: https://staging.api.letsencrypt.org/directory: bad comm
acme-client: bad exit: netproc(18286): 1

Progress? At least a different result, if not quite what I hoped for.
Apparently the main url may have changed as well? Because there's this:
$ host acme-v02.api.letsencrypt.org
acme-v02.api.letsencrypt.org is an alias for prod.api.letsencrypt.org.
prod.api.letsencrypt.org is an alias for
ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com.
ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com has address
172.65.32.248
ca80a1adb12a4fbdac5ffcbc944e9a61.pacloudflare.com has IPv6 address
2606:4700:60:0:f53d:5624:85c7:3a2c

Or I've triggered their famous throttling? But that ought to be per
requesting privkey (or since those can be deleted, per requesting IP).
I don't want to test against production; that's what staging is for.

It's true that the current production url on
letsencrypt.org/getting-started is acme-v02.api.letsencrypt.org, though,
so at least that should prolly be updated in acme-client? And at a
guess, they've made a change to all of those URLs? And it's possible
that acme-client is running against the v1 protocol instead of v2? Have
they shut it down? acme-v01 (as found in /etc/examples and /etc) is
also NXDOMAIN, for me.
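
One way to tell v1 from v2 without touching production is to look at the keys in the directory JSON itself. RFC 8555 (ACME v2) requires newNonce, newAccount, newOrder, revokeCert and keyChange; the older v1 directories used hyphenated keys such as new-reg and new-cert. The Python sketch below assumes the directory body has already been fetched by some other means (for instance the curl commands mentioned above). The function name and the sample documents are made up for illustration and are not real server output.

```python
import json

# Directory keys required by RFC 8555 (ACME v2).
V2_KEYS = {"newNonce", "newAccount", "newOrder", "revokeCert", "keyChange"}
# Hyphenated keys used by the older v1 directories.
V1_KEYS = {"new-reg", "new-authz", "new-cert", "revoke-cert"}


def acme_version(directory_json):
    """Guess the ACME protocol version from the keys of a directory document."""
    keys = set(json.loads(directory_json))
    if V2_KEYS <= keys:
        return "v2"
    if keys & V1_KEYS:
        return "v1"
    return "unknown"


# Hypothetical sample documents; the URLs are placeholders.
SAMPLE_V2 = json.dumps({
    "newNonce": "https://example.invalid/acme/new-nonce",
    "newAccount": "https://example.invalid/acme/new-acct",
    "newOrder": "https://example.invalid/acme/new-order",
    "revokeCert": "https://example.invalid/acme/revoke-cert",
    "keyChange": "https://example.invalid/acme/key-change",
    "meta": {},
})
SAMPLE_V1 = json.dumps({
    "new-reg": "https://example.invalid/acme/new-reg",
    "new-cert": "https://example.invalid/acme/new-cert",
})
```

With these samples, acme_version(SAMPLE_V2) returns "v2" and acme_version(SAMPLE_V1) returns "v1"; anything else comes back as "unknown".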

Thanks for the quick reply and pointers! Have you any idea what the
tls_write tls_read errors are? They're not triggering off pretend pear
x1 and bogus broccoli x2 are they?
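
One reading of those two lines, offered as a guess rather than an answer: the message complains about the name, not the issuer, which is what a hostname-verification failure looks like. That would happen if the certificate served at that address lists the acme-staging-v02 name but not the bare staging.api CNAME target. The Python sketch below shows the kind of comparison involved, using simplified RFC 6125 rules. The SAN list is an assumption for illustration and was not read from the real certificate, and the function names are made up.

```python
def name_matches(pattern, hostname):
    """Return True if a certificate dNSName `pattern` covers `hostname`.

    Simplified RFC 6125 rules: the comparison is case-insensitive and a
    '*' is honoured only as the entire left-most label.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # A wildcard stands in for exactly one label.
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def covered(hostname, san_names):
    """Return True if any name in the certificate's SAN list covers `hostname`."""
    return any(name_matches(pattern, hostname) for pattern in san_names)


# ASSUMPTION: what the staging certificate's SAN list might contain.
ASSUMED_SANS = ["acme-staging-v02.api.letsencrypt.org"]
```

Under that assumption, covered("acme-staging-v02.api.letsencrypt.org", ASSUMED_SANS) is True while covered("staging.api.letsencrypt.org", ASSUMED_SANS) is False, which would be consistent with the "not present in server certificate" errors after deleting the 'acme-'.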

Amy!
--
Amelia A. Lewis amyzing {at} talsever.com
Merchant, street girl, beggar, yeoman,
king or common, man or woman,
only two things make us human--
sorrow and love, sorrow and love ....
-- The Last Song of Sirit Byar
