Monday, March 02, 2020

Re: man to render pure text? (or a pipe in vi macros ?)

On Mon, Mar 02, 2020 at 06:25:47PM +0100, Ingo Schwarze wrote:
> Hi,
>
> Marc Chantreux wrote on Mon, Mar 02, 2020 at 11:49:31AM +0100:
>
> > coming from linux, i'm used to reading manpages
> > in a vi buffer so i can do much more than
> > reading the content.
>
> I have no idea what the "much more" refers to. The main effect is to
> lose tagging functionality. That is, compared to man(1) with the
> default pager, you cannot use the :t functionality to move to the
> place where a word is defined.
>
> > i basically use
> >
> > :r !man ls
> > or
> > !!sh (when the line content is "man ls")
>
> Yikes. I had no idea what either of these are doing and had to
> try them out. vi(1) contains so much bloat that is never really
> needed and doesn't belong in a text editor at all.
>
> > under openbsd, it seems man doesn't if stdout
> > is a tty.
>
> You mean, man(1) doesn't *imply col -b* if stdout is *not* a tty?
>
> > i dug through the man manual a little bit
> > without finding a solution, so i worked
> > around it:
> >
> > :r !man ls|fmt
>
> As others said, the normal way to strip backspace formatting is
>
> $ man ls | col -b
>
> It is documented in man(1) below the -c option and below EXAMPLES,
> and in mandoc(1) below "ASCII Output":
>
> https://man.openbsd.org/man.1#c
> https://man.openbsd.org/man.1#EXAMPLES
> https://man.openbsd.org/mandoc.1#ASCII_Output
>
> You find such stuff as follows:
>
> $ man -k 'Xr=col(1)'
> man(1) - display manual pages
> mandoc(1) - format manual pages
>
> The advantage of col(1) over fmt(1) is that it is guaranteed to not
> mess up line breaks.
>
> > now i would like a poor version of keyword
> > feature in openbsd vi. the linux version
> >
> > map K yw:E /tmp/vi.keyword.$$p!!xargs man
>
> You don't say what that is supposed to do.
>
> Under Debian Jessie, if i start "vim", then type
>
> :map K yw:E /tmp/vi.keyword.$$p!!xargs man <ENTER>
> als <ESC>
> K <ENTER>
>
> i get:
>
> Error detected while processing function netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore:
> line 30:
> E132: Function call depth is higher than 'maxfuncdepth'
> Press ENTER or type command to continue
>
> That doesn't seem useful to me.
>
> I also tried the same with OpenBSD vi(1) and it resulted in
>
> Usage: e[dit][!] [+cmd] [file].
>
> So, no idea what you are trying to do.
>
> > becomes
> >
> > map K yw:E /tmp/vi.keyword.$$p!!xargs -IX sh -c 'man X|fmt'
> >
> > which doesn't work as | separates 2 vi commands.
> >
> > i really would like to know one or both of these:
> >
> > * is there a way to ask man to deliver pure (non-formatted) text ?
>
> In 2014, i already wrote a patch to do that because the question
> came up repeatedly. But demand wasn't that high after all, so i
> never committed it. Now, i updated the patch to -current, see
> below.
>
> On the one hand, the UNIX philosophy is to have each tool do one
> thing well, then use pipes to connect tools as needed. Then again,
> arguably, you maybe shouldn't need another tool to just revert
> something that the first tool does. Why would *not* adding backspace
> formatting require a pipe to another program, rather than not adding
> it in the first place?
>
> Also, the patch that would be required is very small and straightforward.
>
> So, what do people think? Should i test the patch below in more
> depth and commit it? Or do people consider this bloat?
>
> Yours,
> Ingo
>
>
> Index: main.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/main.c,v
> retrieving revision 1.247
> diff -u -p -r1.247 main.c
> --- main.c 24 Feb 2020 21:15:05 -0000 1.247
> +++ main.c 2 Mar 2020 17:06:53 -0000
> @@ -158,6 +158,7 @@ main(int argc, char *argv[])
> /* Search options. */
>
> memset(&conf, 0, sizeof(conf));
> + conf.output.backspace = -1;
> conf_file = NULL;
> defpaths = auxpaths = NULL;
>
> @@ -373,6 +374,9 @@ main(int argc, char *argv[])
> return mandoc_msg_getrc();
> }
> }
> +
> + if (conf.output.backspace == -1)
> + conf.output.backspace = 1;
>
> /* Parse arguments. */
>
> Index: manconf.h
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/manconf.h,v
> retrieving revision 1.7
> diff -u -p -r1.7 manconf.h
> --- manconf.h 22 Nov 2018 11:30:15 -0000 1.7
> +++ manconf.h 2 Mar 2020 17:06:54 -0000
> @@ -1,6 +1,6 @@
> /* $OpenBSD: manconf.h,v 1.7 2018/11/22 11:30:15 schwarze Exp $ */
> /*
> - * Copyright (c) 2011, 2015, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
> + * Copyright (c) 2011,2015,2017,2018,2020 Ingo Schwarze <schwarze@openbsd.org>
> * Copyright (c) 2011 Kristaps Dzonsons <kristaps@bsd.lv>
> *
> * Permission to use, copy, modify, and distribute this software for any
> @@ -33,6 +33,7 @@ struct manoutput {
> char *tag;
> size_t indent;
> size_t width;
> + int backspace;
> int fragment;
> int mdoc;
> int noval;
> Index: mandoc.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/mandoc.1,v
> retrieving revision 1.166
> diff -u -p -r1.166 mandoc.1
> --- mandoc.1 15 Feb 2020 15:28:01 -0000 1.166
> +++ mandoc.1 2 Mar 2020 17:06:55 -0000
> @@ -284,6 +284,13 @@ The following
> .Fl O
> arguments are accepted:
> .Bl -tag -width Ds
> +.It Cm format Ns = Ns Cm none
> +No back-spaced encoding is used, neither for bold face and underlining
> +nor for character overstrikes. Only the last character of each
> +overstrike group is printed.
> +This has the same effect as piping the output through
> +.Xr col 1
> +.Fl bx .
> .It Cm indent Ns = Ns Ar indent
> The left margin for normal text is set to
> .Ar indent
> Index: manpath.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/manpath.c,v
> retrieving revision 1.28
> diff -u -p -r1.28 manpath.c
> --- manpath.c 10 Feb 2020 14:42:03 -0000 1.28
> +++ manpath.c 2 Mar 2020 17:06:57 -0000
> @@ -1,6 +1,6 @@
> /* $OpenBSD: manpath.c,v 1.28 2020/02/10 14:42:03 schwarze Exp $ */
> /*
> - * Copyright (c) 2011,2014,2015,2017-2019 Ingo Schwarze <schwarze@openbsd.org>
> + * Copyright (c) 2011,2014,2015,2017-2020 Ingo Schwarze <schwarze@openbsd.org>
> * Copyright (c) 2011 Kristaps Dzonsons <kristaps@bsd.lv>
> *
> * Permission to use, copy, modify, and distribute this software for any
> @@ -226,7 +226,7 @@ manconf_output(struct manoutput *conf, c
> {
> const char *const toks[] = {
> "includes", "man", "paper", "style", "indent", "width",
> - "tag", "fragment", "mdoc", "noval", "toc"
> + "format", "tag", "fragment", "mdoc", "noval", "toc"
> };
> const size_t ntoks = sizeof(toks) / sizeof(toks[0]);
>
> @@ -247,11 +247,11 @@ manconf_output(struct manoutput *conf, c
> }
> }
>
> - if (tok < 6 && *cp == '\0') {
> + if (tok < 7 && *cp == '\0') {
> mandoc_msg(MANDOCERR_BADVAL_MISS, 0, 0, "-O %s=?", toks[tok]);
> return -1;
> }
> - if (tok > 6 && tok < ntoks && *cp != '\0') {
> + if (tok > 7 && tok < ntoks && *cp != '\0') {
> mandoc_msg(MANDOCERR_BADVAL, 0, 0, "-O %s=%s", toks[tok], cp);
> return -1;
> }
> @@ -308,22 +308,43 @@ manconf_output(struct manoutput *conf, c
> "-O width=%s is %s", cp, errstr);
> return -1;
> case 6:
> + switch (conf->backspace) {
> + case 0:
> + oldval = mandoc_strdup("none");
> + break;
> + case 1:
> + oldval = mandoc_strdup("backspace");
> + break;
> + default:
> + if (strcmp(cp, "none") == 0) {
> + conf->backspace = 0;
> + return 0;
> + } else if (strcmp(cp, "backspace") == 0) {
> + conf->backspace = 1;
> + return 0;
> + }
> + mandoc_msg(MANDOCERR_BADVAL_BAD, 0, 0,
> + "-O format=%s", cp);
> + return -1;
> + }
> + break;
> + case 7:
> if (conf->tag != NULL) {
> oldval = mandoc_strdup(conf->tag);
> break;
> }
> conf->tag = mandoc_strdup(cp);
> return 0;
> - case 7:
> + case 8:
> conf->fragment = 1;
> return 0;
> - case 8:
> + case 9:
> conf->mdoc = 1;
> return 0;
> - case 9:
> + case 10:
> conf->noval = 1;
> return 0;
> - case 10:
> + case 11:
> conf->toc = 1;
> return 0;
> default:
> Index: term.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/term.c,v
> retrieving revision 1.141
> diff -u -p -r1.141 term.c
> --- term.c 3 Jun 2019 20:23:39 -0000 1.141
> +++ term.c 2 Mar 2020 17:07:04 -0000
> @@ -1,7 +1,7 @@
> /* $OpenBSD: term.c,v 1.141 2019/06/03 20:23:39 schwarze Exp $ */
> /*
> * Copyright (c) 2008, 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
> - * Copyright (c) 2010-2019 Ingo Schwarze <schwarze@openbsd.org>
> + * Copyright (c) 2010-2020 Ingo Schwarze <schwarze@openbsd.org>
> *
> * Permission to use, copy, modify, and distribute this software for any
> * purpose with or without fee is hereby granted, provided that the above
> @@ -795,24 +795,26 @@ encode1(struct termp *p, int c)
> f = (c == ASCII_HYPH || c > 127 || isgraph(c)) ?
> p->fontq[p->fonti] : TERMFONT_NONE;
>
> - if (p->flags & TERMP_BACKBEFORE) {
> - if (p->tcol->buf[p->col - 1] == ' ' ||
> - p->tcol->buf[p->col - 1] == '\t')
> - p->col--;
> - else
> + if (p->backspace) {
> + if (p->flags & TERMP_BACKBEFORE) {
> + if (p->tcol->buf[p->col - 1] == ' ' ||
> + p->tcol->buf[p->col - 1] == '\t')
> + p->col--;
> + else
> + p->tcol->buf[p->col++] = '\b';
> + p->flags &= ~TERMP_BACKBEFORE;
> + }
> + if (f == TERMFONT_UNDER || f == TERMFONT_BI) {
> + p->tcol->buf[p->col++] = '_';
> p->tcol->buf[p->col++] = '\b';
> - p->flags &= ~TERMP_BACKBEFORE;
> - }
> - if (f == TERMFONT_UNDER || f == TERMFONT_BI) {
> - p->tcol->buf[p->col++] = '_';
> - p->tcol->buf[p->col++] = '\b';
> - }
> - if (f == TERMFONT_BOLD || f == TERMFONT_BI) {
> - if (c == ASCII_HYPH)
> - p->tcol->buf[p->col++] = '-';
> - else
> - p->tcol->buf[p->col++] = c;
> - p->tcol->buf[p->col++] = '\b';
> + }
> + if (f == TERMFONT_BOLD || f == TERMFONT_BI) {
> + if (c == ASCII_HYPH)
> + p->tcol->buf[p->col++] = '-';
> + else
> + p->tcol->buf[p->col++] = c;
> + p->tcol->buf[p->col++] = '\b';
> + }
> }
> if (p->tcol->lastcol <= p->col || (c != ' ' && c != ASCII_NBRSP))
> p->tcol->buf[p->col] = c;
> @@ -839,7 +841,9 @@ encode(struct termp *p, const char *word
> adjbuf(p->tcol, p->col + 2 + (sz * 5));
>
> for (i = 0; i < sz; i++) {
> - if (ASCII_HYPH == word[i] ||
> + if (p->backspace == 0 && word[i] == '\b')
> + p->col--;
> + else if (word[i] == ASCII_HYPH ||
> isgraph((unsigned char)word[i]))
> encode1(p, word[i]);
> else {
> Index: term.h
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/term.h,v
> retrieving revision 1.75
> diff -u -p -r1.75 term.h
> --- term.h 4 Jan 2019 03:20:44 -0000 1.75
> +++ term.h 2 Mar 2020 17:07:04 -0000
> @@ -1,7 +1,7 @@
> /* $OpenBSD: term.h,v 1.75 2019/01/04 03:20:44 schwarze Exp $ */
> /*
> * Copyright (c) 2008, 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
> - * Copyright (c) 2011-2015, 2017, 2019 Ingo Schwarze <schwarze@openbsd.org>
> + * Copyright (c) 2011-2015,2017,2019,2020 Ingo Schwarze <schwarze@openbsd.org>
> *
> * Permission to use, copy, modify, and distribute this software for any
> * purpose with or without fee is hereby granted, provided that the above
> @@ -73,6 +73,7 @@ struct termp {
> size_t viscol; /* Chars on current line. */
> size_t trailspace; /* See term_flushln(). */
> size_t minbl; /* Minimum blanks before next field. */
> + int backspace; /* Use \b in output. */
> int synopsisonly; /* Print the synopsis only. */
> int mdocstyle; /* Imitate mdoc(7) output. */
> int ti; /* Temporary indent for one line. */
> Index: term_ascii.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/term_ascii.c,v
> retrieving revision 1.50
> diff -u -p -r1.50 term_ascii.c
> --- term_ascii.c 19 Jul 2019 21:45:37 -0000 1.50
> +++ term_ascii.c 2 Mar 2020 17:07:04 -0000
> @@ -1,7 +1,7 @@
> /* $OpenBSD: term_ascii.c,v 1.50 2019/07/19 21:45:37 schwarze Exp $ */
> /*
> * Copyright (c) 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
> - * Copyright (c) 2014, 2015, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
> + * Copyright (c) 2014, 2015, 2017-2020 Ingo Schwarze <schwarze@openbsd.org>
> *
> * Permission to use, copy, modify, and distribute this software for any
> * purpose with or without fee is hereby granted, provided that the above
> @@ -112,6 +112,8 @@ ascii_init(enum termenc enc, const struc
> }
> }
>
> + if (outopts->backspace)
> + p->backspace = 1;
> if (outopts->mdoc) {
> p->mdocstyle = 1;
> p->defindent = 5;
>


Hi,

I wanted to do a similar thing (mandoc to UTF-8 text) and used col -b.

I noticed that when processing the ASCII/UTF-8 output of mandoc(1) with col(1),
it also filters away other bytes, for example UTF-8 non-breaking spaces (\xc2\xa0).

To reproduce more simply:

OpenBSD:
printf 'test\xc2\xa0.\n' | col -b | hexdump -C
00000000 74 65 73 74 2e 0a |test..|

util-linux col uses wide characters and outputs:
00000000 74 65 73 74 c2 a0 2e 0a |test....|

NetBSD and other col implementations have a -p option,
which is specified in an older standard:

Technical Standard Commands and Utilities Issue 4, Version 2
page 200
https://pubs.opengroup.org/onlinepubs/9695969399/toc.pdf

The patch below adds -p to col (taken from NetBSD):


diff --git usr.bin/col/col.1 usr.bin/col/col.1
index cceebfec5db..f0f1e906992 100644
--- usr.bin/col/col.1
+++ usr.bin/col/col.1
@@ -41,7 +41,7 @@
.Nd filter reverse line feeds and backspaces from input
.Sh SYNOPSIS
.Nm col
-.Op Fl bfhx
+.Op Fl bfhpx
.Op Fl l Ar num
.Sh DESCRIPTION
.Nm
@@ -73,6 +73,12 @@ Buffer at least
.Ar num
lines in memory.
By default, 128 lines are buffered.
+.It Fl p
+Force unknown control sequences to be passed through unchanged.
+Normally,
+.Nm
+will filter out any control sequences from the input other than those
+recognized and interpreted by itself, which are listed below.
.It Fl x
Output multiple spaces instead of tabs.
.El
diff --git usr.bin/col/col.c usr.bin/col/col.c
index c3c51b4c630..8b59a2f09cf 100644
--- usr.bin/col/col.c
+++ usr.bin/col/col.c
@@ -92,6 +92,7 @@ int fine; /* if `fine' resolution (half lines) */
int max_bufd_lines; /* max # of half lines to keep in memory */
int nblank_lines; /* # blanks after last flushed line */
int no_backspaces; /* if not to output any backspaces */
+int pass_unknown_seqs; /* whether to pass unknown control sequences */

#define PUTC(ch) \
if (putchar(ch) == EOF) \
@@ -118,7 +119,8 @@ main(int argc, char *argv[])

max_bufd_lines = 256;
compress_spaces = 1; /* compress spaces into tabs */
- while ((opt = getopt(argc, argv, "bfhl:x")) != -1)
+ pass_unknown_seqs = 0; /* remove unknown escape sequences */
+ while ((opt = getopt(argc, argv, "bfhl:px")) != -1)
switch (opt) {
case 'b': /* do not output backspaces */
no_backspaces = 1;
@@ -136,6 +138,9 @@ main(int argc, char *argv[])
errx(1, "bad -l argument, %s: %s", errstr,
optarg);
break;
+ case 'p': /* pass unknown control sequences */
+ pass_unknown_seqs = 1;
+ break;
case 'x': /* do not compress spaces into tabs */
compress_spaces = 0;
break;
@@ -212,7 +217,8 @@ main(int argc, char *argv[])
addto_lineno(&cur_line, -2);
continue;
}
- continue;
+ if (!pass_unknown_seqs)
+ continue;
}

/* Must stuff ch in a line - are we at the right one? */
@@ -534,7 +540,7 @@ xreallocarray(void *p, size_t n, size_t size)
void
usage(void)
{
- (void)fprintf(stderr, "usage: col [-bfhx] [-l num]\n");
+ (void)fprintf(stderr, "usage: col [-bfhpx] [-l num]\n");
exit(1);
}


--
Kind regards,
Hiltjo
