Issue 81029 - Broken display of some characters after adding combining diacritical marks
Summary: Broken display of some characters after adding combining diacritical marks
Status: ACCEPTED
Alias: None
Product: Writer
Classification: Application
Component: viewing
Version: OOo 2.2
Hardware: PC Windows XP
Importance: P4 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-25 20:44 UTC by bsb
Modified: 2017-05-20 10:44 UTC
CC List: 5 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Screenshot of garbled letters (55.79 KB, text/plain)
2007-08-27 15:22 UTC, bsb
File to show garbled letters (6.94 KB, text/plain)
2007-08-27 15:23 UTC, bsb
3 screenshots defective displays described in comment (426.54 KB, image/jpeg)
2013-12-10 15:29 UTC, Gerald Bettrdige

Description bsb 2007-08-25 20:44:22 UTC
Set the font to "Tahoma", a TrueType/OpenType Unicode font that comes with
Windows XP.
Enter some Cyrillic letters. After a letter, insert a combining grave accent
(U+0300) via "Insert - Special Character...". With some letters the mark
combines as expected. With other letters the "original" letter gets replaced
by a modified (accented) letter. However, the original letter is not erased
from the screen, and the modified letter is offset to the right.
This happens with U+0415 (Cyrillic capital letter Ie) and U+0418
(Cyrillic capital letter I) and the corresponding small letters. The reason is
that Tahoma includes glyphs for U+0400 (Cyrillic capital letter Ie with
grave) and U+040D (Cyrillic capital letter I with grave), so OOw decides to
change the combined symbols into one, presumably equivalent, symbol. This does
not happen with other "original" Cyrillic letters, because there are no
corresponding accented single glyphs in the font.
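
For anyone triaging this, a quick way to tell which letter-plus-grave pairs have a precomposed code point in Unicode at all is to run them through NFC normalization. The Python sketch below is only an illustration of that check; it says nothing about how Writer or the font actually selects glyphs, and the helper name compose() is made up for this example.

```python
import unicodedata

GRAVE = "\u0300"  # COMBINING GRAVE ACCENT


def compose(base: str, mark: str) -> str:
    """Return the NFC form of a base letter followed by a combining mark."""
    return unicodedata.normalize("NFC", base + mark)


# Ie and I have precomposed forms with grave; A (U+0410) does not.
for base in ("\u0415", "\u0418", "\u0410"):
    result = compose(base, GRAVE)
    codes = " ".join(f"U+{ord(ch):04X}" for ch in result)
    print(f"U+{ord(base):04X} + U+0300 -> {codes}")
```

A pair that collapses to one code point here is a candidate for the garbled display described above; a pair that stays at two code points has no precomposed form in Unicode.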
Comment 1 simos.bugzilla 2007-08-26 14:34:35 UTC
Could you please provide a screenshot that demonstrates the problem?
In addition, attach the sample ODT file that was used to generate this screenshot.
Comment 2 nmailhot 2007-08-27 09:21:33 UTC
Can you reproduce with DejaVu (http://dejavu.sourceforge.net/), a FLOSS font
everyone can test with regardless of their platform?
Comment 3 bsb 2007-08-27 15:22:37 UTC
Created attachment 47802
Screenshot of garbled letters
Comment 4 bsb 2007-08-27 15:23:44 UTC
Created attachment 47803
File to show garbled letters
Comment 5 bsb 2007-08-27 15:31:56 UTC
While creating the file to illustrate the bug, I realized that the font need
not be Tahoma, nor need the letters be Cyrillic. As you can see from my
attachment, I achieved the same effect with Times New Roman and Latin letters.
What matters is that the font contains the accented letter as a separate
glyph - then OOw replaces the combined characters with a single character.

Please note that it is important to use "Insert - Special character" from the 
menu. If you use "Edit - Repeat: Insert Special Character", then everything 
shows fine.

To reply to nmailhot - I can't do what you're asking. However, I believe that 
with this new development I just described you'd be able to reproduce this 
behaviour.
Comment 6 michael.ruess 2007-08-28 08:28:33 UTC
Reassigned to ES.
Comment 7 kpalagin 2008-01-26 18:43:41 UTC
Confirming with 2.4m4 on WinXP - as described.
Comment 8 eric.savary 2008-09-11 09:34:25 UTC
Reproduced with most of the combining diacritical marks in current version
(OOO300m5).
Comment 9 hdu@apache.org 2008-09-11 11:00:24 UTC
.
Comment 10 Gerald Bettrdige 2013-12-10 15:29:47 UTC
Created attachment 82075
3 screenshots defective displays described in comment

This bug can be a problem: I came across it when I wanted to make a t with a tilde to imitate a Latin abbreviation. So I’ve been investigating exactly how it happens (under Windows 7, OpenOffice 4.0.1) in the hope that the evidence might allow someone to correct the code.

It happens when you add the combining diacritic (hereafter CoD) with insert special character. You can clear it at once by selecting the messy bit (the font window then goes blank) and re-selecting the font.
It doesn’t happen if a) you type the CoD directly (of which more later) or b) you cut-and-paste it from a plain text box, as opposed to a rich text box, which carries font info with it. So you can cut-and-paste from Notepad but not from Wordpad.
Whether the font replaces the char with an existing glyph, or displays a composite glyph, seems to make no difference. Good tests of this are t tilde, which doesn’t exist as a separate glyph, and t caron, which always displays as ť (U+0165) and so has to be a substitute glyph.
If you deliberately make the CoD a different font from the letter – say Arial over Times Roman – and use insert special character, you get the composite character in Arial displaced over the unmodified character in Times Roman. Try this with t dot under by adding the CoD U+0323, then you can be sure the CoD isn’t hidden.
Now try the same thing but cut-and-paste the CoD from Wordpad. Use Times 60 pt in oo, and Arial 28 pt in red in Wordpad. You will find the same behaviour, but the red enables you to see clearly what comes from what. (Under Windows 7, Google Chrome seems to do odd things to the clipboard, so if it is running, Wordpad behaves like Notepad, with no format info.) However, while you can clear up the mess made by insert special character by selecting it and reselecting the font, that doesn’t work for a CoD pasted from Wordpad.
Finally, here is something really odd. Take a line where you have an unmodified char with the modified one beside it. At the beginning of the line insert a char like t̃ and you will find that the unmodified original letter now gets modified! But it only does this to the first mess, not to subsequent ones. All this is shown in the attached Latin screenshot; notice that on the second line ṭ has been added as the single character U+1E6D, but on the third as t plus U+0323, and only the latter affects the following mess. In LatinScreenshot2.jpg two of the lines have been selected and the font redefined, which tidies up everything except the mess imported from Wordpad.
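
Since ṭ entered as U+1E6D and ṭ entered as t plus U+0323 look identical on screen, it can help to dump the code points of a test string before comparing results. The following is a small Python sketch for that purpose; describe() is a hypothetical helper name, not anything from OpenOffice.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each code point in text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


single = "\u1E6D"  # precomposed t with dot below
pair = "t\u0323"   # t followed by COMBINING DOT BELOW

print(describe(single))
print(describe(pair))

# The two spellings are canonically equivalent: NFD of the single
# character yields exactly the two-character sequence.
print(unicodedata.normalize("NFD", single) == pair)
```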
So what happens if the font replaces the preceding character, but what you are adding is not a CoD? This behaviour is inherent in Korean fonts. There are relatively few letters in the Korean alphabet, but the letters of a syllable have to be displayed as a single glyph. A syllable consists of an initial consonant (which might be silent), a vowel, and an optional final consonant. Unicode defines the separate letters as Hangul Jamo, and the combined glyphs (over 10000 of them) as Hangul syllables. There’s a simple formula to get the combined glyph from the letters. To try this you need to set the language in the little window at the bottom to Korean, and choose Gungsuh or another Korean font as the Asian font. Then set the font to Gungsuh 40 pt. You need to know that h is U+1112, a is U+1161 and final n is U+11AB. The combined glyph han is U+D55C (in Hangul Ha).
If you insert the characters in order you will find the same sort of behaviour as with CoDs. If you insert a new han at the beginning of a line which already contains a badly displayed han, you will find it works correctly, but it doesn’t correct the existing mess. If you insert the combined glyph directly as U+D55C, twice, you can see the correct spacing. The attached asian screenshot shows, on successive lines, h: a: n: a messy ha: a messy han: two combined glyphs correctly spaced: and finally a messy han with one inserted before which has come out correctly.
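
For reference, the formula for getting the combined syllable from the separate letters can be sketched in Python as follows. The constants are those of the Unicode Hangul composition algorithm; the function name compose_hangul() is just for illustration, and the sketch does no range checking on its inputs.

```python
# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE = 0xAC00  # first precomposed syllable
L_BASE = 0x1100  # first leading consonant
V_BASE = 0x1161  # first vowel
T_BASE = 0x11A7  # one before the first trailing consonant
V_COUNT = 21
T_COUNT = 28


def compose_hangul(lead: str, vowel: str, tail: str = "") -> str:
    """Combine conjoining jamo into one precomposed Hangul syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(tail) - T_BASE if tail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


han = compose_hangul("\u1112", "\u1161", "\u11AB")  # h + a + final n
print(f"U+{ord(han):04X}")  # U+D55C
```

With the final consonant left out, the same call gives ha (U+D558), matching the "messy ha" step in the screenshot.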

There are two ways of avoiding this. One is to use a custom keyboard layout, for which you will need a layout editor. I use and am very happy with KbdEdit, but there are doubtless others. I’ve based my layout on Canadian Multilingual, which is provided by Windows and has dead keys for all the common accents (so you can type Pinyin directly, for example, without customising anything). But you can’t type CoDs without customising. The trick is to edit the dead keys so that not only does DK+space give the accent alone, but DK+= gives the associated CoD.
The second is to use a program to insert hex codes. I’ve cobbled one together in C# which, when it loses the focus, transfers the character of the hex code to the clipboard, so Ctrl+V will insert it. It works with surrogates too. The program is presented as a tiny window which stays on top. It would be nice if this were included in oo, but that’s a different topic.
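
The core of such a hex-code tool is small. The sketch below is in Python, not the C# program mentioned above, and it leaves out the clipboard and window handling entirely; hex_to_char() and utf16_units() are names invented for this example. It only shows the conversion from a hex code to a character, and how a code point above U+FFFF turns into a surrogate pair in UTF-16.

```python
def hex_to_char(code: str) -> str:
    """Convert a hex code such as '0303', 'U+0303' or '0x0303' to a character."""
    text = code.strip().upper()
    if text.startswith("U+") or text.startswith("0X"):
        text = text[2:]
    return chr(int(text, 16))


def utf16_units(ch: str) -> list[int]:
    """Return the UTF-16 code units of ch (two units means a surrogate pair)."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


tilde = hex_to_char("U+0303")  # COMBINING TILDE
print([f"{unit:04X}" for unit in utf16_units(tilde)])

clef = hex_to_char("1D11E")    # a character outside the Basic Multilingual Plane
print([f"{unit:04X}" for unit in utf16_units(clef)])
```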
Comment 11 Marcus 2017-05-20 10:44:48 UTC
Reset the assignee to the default "issues@openoffice.apache.org".