by GalaxySnail on 12/20/23, 11:57 AM with 24 comments
by Per_Bothner on 12/23/23, 6:31 AM
by zokier on 12/23/23, 12:40 PM
by hollow-moe on 12/23/23, 1:55 PM
by ubercow13 on 12/23/23, 5:16 AM
If you create a file in Finder with a hangul name, by default ls in iTerm2 will display the name incorrectly, with the composite syllables expanded out into separate characters. There is an option in iTerm2 to 'normalise unicode' which corrects this issue if set to HFS or NFC (if I remember right). This mimics the behaviour of Terminal.app.
However, different interactive terminal apps seem to have different expectations of terminal behaviour for this form of unicode hangul. With the 'normalise' setting off, fzf will corrupt the display when showing hangul. But with it on, vim/neovim will corrupt the display when navigating lines with hangul text.
It seems there is no way to get consistent and user-friendly hangul handling in the terminal on macOS, but I'm not sure who's at fault (except Apple of course).
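For anyone wanting to see the difference those normalisation settings act on, here is a minimal Python sketch. It assumes the file name reaches the terminal in a decomposed (NFD-like) form, which is what the description above suggests; the sample string "한글" is just an illustration, not a name from the original report.

```python
import unicodedata

# "한글" written as two precomposed syllables (NFC form).
name_nfc = "\ud55c\uae00"

# Decomposed form: each syllable becomes three conjoining jamo.
# Assumption: this resembles what a decomposing file system hands to ls.
name_nfd = unicodedata.normalize("NFD", name_nfc)

print(len(name_nfc), [f"U+{ord(c):04X}" for c in name_nfc])
print(len(name_nfd), [f"U+{ord(c):04X}" for c in name_nfd])

# Normalising back to NFC recovers the precomposed syllables,
# which is roughly what a "normalise to NFC" terminal option would do.
assert unicodedata.normalize("NFC", name_nfd) == name_nfc
```

The two strings are canonically equivalent and should render the same. A program that assigns cells per code point rather than per syllable will lay them out differently, which may be part of why the terminal and the programs running in it disagree.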
by tetris11 on 12/23/23, 11:20 AM
There is literally nothing out there that can reliably do this. Par (https://manpages.debian.org/par) seems promising, but it still gets lost on some example texts.
by chrismorgan on 12/23/23, 5:27 PM
Well yeah, if your font doesn’t have the right glyphs, you’re going to get a mess. Indic text has a habit of being just straight illegible even in terminal emulators that handle this wcwidth, and I don’t read Arabic but I expect it tends to be particularly commonly illegible, not just overflowing all over the place but also being rendered left-to-right instead of right-to-left.
Fun fact, Indic text can accidentally end up more legible in terminal emulators that don’t support this wcwidth stuff, so long as they still do Indic script rendering. Take my name in the Telugu script: క్రిస్ is six code points long (letter ka, sign virama (which suppresses the -a inherent vowel), letter ra, vowel sign i, letter sa, sign virama) and normally rendered as two clusters (kri, s), but it’s actually perfectly valid (though uncommon) to write it as క్ రిస్ (I inserted a ZERO WIDTH NON-JOINER between the k and the ri, but HN turned it into a regular space which is very wrong :-( ), and separating the conjunct makes the character fit into a cell more reliably, because it won’t go so far up or down or occasionally sideways. I use Alacritty; it doesn’t do the full wcwidth thing in this case specifically because (if I recall correctly) it doesn’t want to do Indic script rendering for performance reasons. Now because of this unfortunate combination what I actually get there is the three cells, but with the inherent vowel still rendered on every consonant, even if replaced with another vowel sign or virama—so I basically get క్ రిస్ and కరస rendered on top of each other.
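The code point breakdown above can be checked with a short Python sketch. The variable names are made up for illustration, and the second string puts the ZERO WIDTH NON-JOINER back where HN replaced it with a space.

```python
import unicodedata

# The Telugu name as six code points: ka, virama, ra, vowel sign i, sa, virama.
name = "\u0c15\u0c4d\u0c30\u0c3f\u0c38\u0c4d"

for ch in name:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Variant with ZERO WIDTH NON-JOINER (U+200C) after the first virama,
# which asks the renderer not to form the k+ri conjunct.
with_zwnj = name[:2] + "\u200c" + name[2:]
print(len(name), len(with_zwnj))
```

This only shows the stored sequence. How many cells a terminal assigns to it, and whether it shapes the conjunct at all, depends on the emulator and the font.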
But returning to the original topic: yeah, all of this stuff falls over extremely often, because no one has fonts that handle everything properly, and the whole thing is just an exhibition of how the column/cell model is completely unsuited for a Unicode world, and we just keep layering patches upon patches to mitigate the harm, but it’s impossible to actually fix it: the whole thing needs burning down and replacing.
What I wish for is a control sequence that enables a non-cellular mode: which allows non-monospaced rendering if desired, starts doing BiDi text rendering, introduces a couple of varieties of flexible tab stop (since visual columnar alignment is still highly desirable and two of the main forms are still practical to achieve, loosely corresponding to CSS flex-wrap and grid), tweaks the behaviours of most cursor-affecting control sequences to use a grapheme cluster basis (and you’d need to do something about logical versus wrapped lines, programs might need to address both sometimes, though most should go logical), and ruins the notion of $COLUMNS.
Honestly, with minor tweaks to programs that do not-at-start-of-line visual alignment, there’s not all that much that would break: things like rustc output would suffer a little (you could only align the start of span underlines, not the end), and vertical splits in things like Vim and tmux are not practical, but that’s about all I can immediately think of. And so many more things would start working properly.
I think this is achievable and even practical, and I think the end goal is well worthwhile, but it would take quite a lot of effort, in specifying and in implementing. I’m curious if this catches anyone’s fancy. What I have in mind sounds similar to what Per_Bothner describes elsewhere in this thread, perhaps just more featureful (for good and ill!).
by divyaranjan1905 on 12/23/23, 11:50 AM