Hello,
I've noticed something unexpected when copy-pasting UTF-8 characters in
xterm: xterm seems to change some of the characters into something
different but visually similar. Here's an example (using ksh):
$ uname -a
OpenBSD foo.my.domain 6.1 GENERIC#19 i386
$ ls
Thérèse
$ ls | od -c
0000000    T   h   e 314 201   r   e 314 200   s   e  \n
0000014
$ cp Thérèse Thérèse
I typed this cp command as follows: type 'cp ', press Tab so that ksh
auto-completes the first filename, type another space, then use the mouse
to copy the first filename and paste it into xterm as the second filename.
The cp command works without any error. The result is:
$ ls
Thérèse Thérèse
$ ls | od -c
0000000    T   h   e 314 201   r   e 314 200   s   e  \n   T   h 303 251
0000020    r 303 250   s   e  \n
0000026
Note that the two filenames look exactly the same but are actually different
byte sequences. So it looks like xterm is changing e 314 201 into 303 251
and e 314 200 into 303 250 when copy-pasting, which was rather a surprise
to me. I'm pretty sure the problem is with xterm, not with ksh, because
the same thing happens with bash (using a similar xterm and running bash
through ssh on a Linux machine).
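
For what it's worth, the two byte sequences correspond to the decomposed
and precomposed Unicode spellings of the same name: octal 314 201 and
314 200 are the UTF-8 encodings of the combining acute (U+0301) and
combining grave (U+0300) accents, while 303 251 and 303 250 encode the
single characters é (U+00E9) and è (U+00E8). Below is a minimal Python
sketch, assuming only the standard unicodedata module, that rebuilds both
names from the od output and checks that NFC normalization maps the first
onto the second. It only shows that the two names are canonically
equivalent; it does not prove which program performs the conversion.

```python
import unicodedata

# First filename, as shown by the first `ls | od -c`:
# a plain "e" followed by a combining accent (decomposed form).
decomposed = b"The\314\201re\314\200se".decode("utf-8")

# Second filename, as created by the pasted argument:
# single precomposed characters for the accented letters.
precomposed = b"Th\303\251r\303\250se".decode("utf-8")

# The two strings render identically but are not equal.
print(decomposed == precomposed)

# Normalizing to NFC composes base letter + accent into one code point;
# NFD does the reverse.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)

# Byte lengths match the od offsets once the trailing newline is added.
print(len(decomposed.encode("utf-8")), len(precomposed.encode("utf-8")))
```

The same check can be used to find filenames in a directory that differ
only by normalization form, by comparing the NFC form of each name.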
Is this normal / expected?
For info:
$ cat .Xdefaults
xterm*background: black
xterm*foreground: white
xterm*metaSendsEscape: true
xterm*multiScroll: true
xterm*saveLines: 256
xterm*scrollBar: true
xterm*scrollKey: true
xterm*scrollTtyOutput: false
xterm*utf8Title: true
xterm*utmpInhibit: true
xterm*visualBell: true
$ set | egrep -i utf
LC_CTYPE=en_US.UTF-8
XTERM_LOCALE=en_US.UTF-8
Thanks,
Philippe