Coding for Right-To-Left text in DotNet (using VB.NET and Syriac)
I want to write some software to work with Unicode text that is entered and displayed right-to-left. However, I am NOT running Arabic Windows or any other special version: the software must work on 'normal' US/UK English Windows XP. My development environment is Visual Studio 2003.
It can be done, and I include a simple sample application, done in VB.NET for simplicity. But there are an awful lot of pitfalls along the way.
Special note: my interest is Syriac, but most of this will apply equally to Arabic or Hebrew. The fonts chosen will be different.
The best way to test all this out is to make sure that you can enter text in Microsoft Word. If it doesn't work there, you've done something wrong.
Microsoft provide two Syriac keyboards, but you have to install them specially, because Microsoft classes Syriac (like Arabic and Hebrew) as a "complex script".
A. Activate Language Bar (instructions cribbed from here)
B. Install East Asian and Complex Script Utilities (instructions cribbed from here)
Two Syriac keyboards are available in Windows, but you may have to install them from the Windows installation disk because Syriac is a complex script. Follow the instructions below.
C - Activate Keyboards
Back in the Settings window, you should see the new language or keyboard listed in the Input language menu. (Do not make the added language the default!)
You are now ready to input Syriac text.
This is about getting the right fonts. It will apply also to Arabic and Hebrew, but I have no information on what fonts should be obtained. So what follows is mainly about Syriac.
To enter Syriac text you will need a Windows font that (a) is a Unicode font and (b) includes characters for the Syriac letters. Some people imagine that every Unicode font contains every symbol defined by Unicode, but this is not so; only a few (e.g. Titus Cyberbit, Arial Unicode MS) attempt wide coverage. For Syriac, the letters can in any case appear in three different forms, depending on whether they are written in the ancient Estrangelo script (as in the Estrangelo Edessa font, which ships with Windows XP; use charmap to browse the characters it contains) or in the later West Syrian or East Syrian scripts. The letter Alap means the same in all three scripts (= A) and has the same Unicode value (its "code point", U+0710), but looks rather different in each of them.
A pack of fonts is available free online. These are the "Meltho" fonts.
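Because Windows Forms quietly falls back to another font when the one you ask for is missing, it can help to check at startup that the Syriac fonts are really installed. A minimal sketch, assuming a reference to System.Drawing.dll; the FontCheck module and its IsFontInstalled helper are hypothetical names, and the two font names are the ones mentioned in this article (Estrangelo Edessa ships with XP, Serto Jerusalem comes with Meltho).

```vbnet
Option Strict On
Imports System
Imports System.Drawing
Imports System.Drawing.Text

Module FontCheck
    ' Hypothetical helper: True if a font family with this name is installed.
    Public Function IsFontInstalled(ByVal familyName As String) As Boolean
        Using fonts As New InstalledFontCollection()
            For Each family As FontFamily In fonts.Families
                If String.Compare(family.Name, familyName, True) = 0 Then Return True
            Next
        End Using
        Return False
    End Function

    Sub Main()
        For Each name As String In New String() {"Estrangelo Edessa", "Serto Jerusalem"}
            Dim status As String = "missing"
            If IsFontInstalled(name) Then status = "installed"
            Console.WriteLine(name & ": " & status)
        Next
    End Sub
End Module
```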
The Microsoft Visual Keyboard is a utility which allows you to view the keyboard layout for each Input Locale within Microsoft Office applications. You will find it most useful to see what key will give what result.
You can download the utility onto your own computer from http://office.microsoft.com/downloads/2002/VkeyInst.aspx. Follow the posted instructions to install and use.
The Visual Keyboard can be opened from Start » All programs » Microsoft Office Tools » Microsoft Visual Keyboard. Switch to the appropriate keyboard in the Language Bar to see its layout. Keys highlighted in white are typically "hot keys" for adding accents.
The image below shows a sample layout window of a Hebrew keyboard as seen in the Microsoft Visual Keyboard.
It also allows you to enter text by clicking on those keys.
Start up Word. You should get an extra menu item "Meltho". (If you do not, then probably your security settings for macros are too high -- put them down to medium).
Now change your language to "Syriac" by left-clicking on the language bar hovering at the top right, and the keyboard to "Syriac phonetic" by left-clicking on the bit of the language bar to the right of the keyboard icon.
Now change the font in Word to Estrangelo Edessa, font size to 20. Click on the page, and hit MALKA. This should appear, right-to-left (i.e. as AKLAM). If it does not, you've done something wrong; go back and recheck your steps.
You can repeat this exercise in Notepad as well, and it will work.
The Meltho macros show layouts for the Estrangelo consonants, but most of us learn Serto first. Here are the key-mappings (case is important):
Character | Key | Character | Key |
alaph | a | mim | m |
beth | b | nun | n |
gamal | g | semkat | s |
dalat | d | ayin | i |
he | h | pe | p |
waw | w | tsade | x |
zayn | z | qop | q |
het | ; | resh | r |
tet | t | shin | v |
yod | y | taw | j |
kaph | k | underscore | L |
lamad | l | full stop | . |
end of paragraph | , | | |

Vowel | Key (above) | Key (below) |
ptaha (a) | Q | A |
zqafa (ā) | W | S |
rhboso (e) | E | D |
hsoso (i) | R | F |
usoso (u) | T | G |

Other mark | Key |
seyame | I |
qussaye | P |
rukkaka | : |
In Word, you can switch the keyboard from English to Syriac mid-sentence, and the direction of the characters will reverse. You can then switch back and forth as many times as you like.
Correct. It won't. You must use Word XP.
1.6.2 It works in Word XP on Windows XP but I can't see any diacritics
Mark Dickens writes:
I've discovered what the problem is. My wife had said to me, "It's probably some little box you haven't ticked somewhere" and indeed it was. So, FYI (in case you run across this problem again), in Word XP (and I assume later versions of Word, but not in Word 2000 or earlier versions), under Tools/Options there is a tab called Complex Scripts. On that tab, there is a section Show and under that is a box for Diacritics to select. Once I selected it, my diacritics are showing up fine.
As to how it got unselected or whether Microsoft Office just assumed I wouldn't want to see my diacritics and so installed itself with that box unselected, I will never know. One of life's little mysteries...
This is actually simple, but you will not find this out from any other source on the net. There are also some real limits on what you can do.
Trust me on this. You can spend as much time as you like on it, but you will NEVER get a RichTextBox to support right-to-left text entry and display the way Word and Notepad do. It doesn't work. You can set the RightToLeft property, you can set right alignment, you can mirror the control (which also doesn't work); nothing makes any difference.
Solution: use the TextBox control, with Multiline = True. This DOES work.
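Setting up such a box in code rather than in the designer might look like the following. A sketch only: SyriacEntryForm and syriacBox are hypothetical names, it needs references to System.Windows.Forms.dll and System.Drawing.dll, and the font (Estrangelo Edessa at 20 points, as in the Word test) is an example choice.

```vbnet
Option Strict On
Imports System
Imports System.Windows.Forms

Public Class SyriacEntryForm
    Inherits Form

    ' Hypothetical name for the Syriac input box.
    Private ReadOnly syriacBox As New TextBox()

    Public Sub New()
        Me.Text = "Syriac entry"
        syriacBox.Multiline = True                                    ' a plain TextBox, not a RichTextBox
        syriacBox.RightToLeft = System.Windows.Forms.RightToLeft.Yes  ' text starts at the right and flows leftwards
        syriacBox.ScrollBars = System.Windows.Forms.ScrollBars.Vertical
        syriacBox.Font = New System.Drawing.Font("Estrangelo Edessa", 20) ' a default font is substituted if missing
        syriacBox.Dock = DockStyle.Fill
        Me.Controls.Add(syriacBox)
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New SyriacEntryForm())
    End Sub
End Class
```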
Some controls can be flipped so that they display completely right-to-left. This isn't hard to do in VB.NET either. Microsoft document this here:
Here are the controls which do allow mirroring:
Control | Should not allow layout inheritance |
ListView | No |
Panel | Yes |
StatusBar | Yes |
TabControl | Yes |
TabPage | Yes |
ToolBar | No |
TreeView | No |
Form | Yes |
Splitter | Yes |
To mirror a form, add this to the top of the form.vb file:
```vbnet
Public Class Form1
    Inherits System.Windows.Forms.Form

    Const WS_EX_LAYOUTRTL As Integer = &H400000
    Const WS_EX_NOINHERITLAYOUT As Integer = &H100000

    Protected Overrides ReadOnly Property CreateParams() As System.Windows.Forms.CreateParams
        Get
            Dim CP As System.Windows.Forms.CreateParams = MyBase.CreateParams
            If Not MyBase.DesignMode Then
                CP.ExStyle = CP.ExStyle Or WS_EX_LAYOUTRTL ' Or WS_EX_NOINHERITLAYOUT
            End If
            Return CP
        End Get
    End Property

    '-- rest of the form's code goes here
End Class
```
The WS_EX_NOINHERITLAYOUT flag controls whether child controls inherit the mirroring: when it is set, they do not. I have left it commented out here, since I wanted to see the buttons reversed as well, not just a reversed title bar on the form.
To mirror a control, you need to subclass it and add something similar at the top. I created MyRichTextBoxClass1.vb, which started like this:
```vbnet
Imports System.ComponentModel

Public Class MyRichTextBoxClass1
    Inherits System.Windows.Forms.RichTextBox

    Const WS_EX_LAYOUTRTL As Integer = &H400000
    Const WS_EX_NOINHERITLAYOUT As Integer = &H100000

    Private _mirrored As Boolean = False

    <Description("Change to the right-to-left layout."), _
     DefaultValue(False), Localizable(True), _
     Category("Appearance"), Browsable(True)> _
    Public Property Mirrored() As Boolean
        Get
            Return _mirrored
        End Get
        Set(ByVal Value As Boolean)
            If _mirrored <> Value Then
                _mirrored = Value
                MyBase.OnRightToLeftChanged(EventArgs.Empty)
            End If
        End Set
    End Property

    Protected Overrides ReadOnly Property CreateParams() _
        As System.Windows.Forms.CreateParams
        Get
            Dim CP As System.Windows.Forms.CreateParams = _
                MyBase.CreateParams
            If Mirrored Then
                CP.ExStyle = CP.ExStyle Or WS_EX_LAYOUTRTL 'Or WS_EX_NOINHERITLAYOUT
            End If
            Return CP
        End Get
    End Property

    ' Rest of control code here
End Class
```
While this does NOT work for RichTextBoxes, since they won't mirror, it would work perfectly well for a TreeView control.
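Following the same pattern, a mirrored TreeView might look like this. It is a sketch: MirroredTreeView is a hypothetical name, and instead of raising RightToLeftChanged its Mirrored setter calls RecreateHandle(), so that Windows builds the control again and re-reads CreateParams with the new extended style.

```vbnet
Option Strict On
Imports System
Imports System.ComponentModel
Imports System.Windows.Forms

' Hypothetical subclass: a TreeView that can be mirrored right-to-left.
Public Class MirroredTreeView
    Inherits TreeView

    Private Const WS_EX_LAYOUTRTL As Integer = &H400000

    Private _mirrored As Boolean = False

    <Description("Lay the tree out right-to-left."), DefaultValue(False), _
     Category("Appearance"), Browsable(True)> _
    Public Property Mirrored() As Boolean
        Get
            Return _mirrored
        End Get
        Set(ByVal value As Boolean)
            If _mirrored <> value Then
                _mirrored = value
                ' Rebuild the native window so the new extended style takes effect.
                If IsHandleCreated Then RecreateHandle()
            End If
        End Set
    End Property

    Protected Overrides ReadOnly Property CreateParams() As System.Windows.Forms.CreateParams
        Get
            Dim cp As System.Windows.Forms.CreateParams = MyBase.CreateParams
            If _mirrored Then
                cp.ExStyle = cp.ExStyle Or WS_EX_LAYOUTRTL
            End If
            Return cp
        End Get
    End Property
End Class
```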
A .zip of the project and all its files is here. This will also work with VB.NET 2005, which is available for free download from the Microsoft site.
Just assign the value in the TextBox to a string; it will work fine! You can split it into characters using ToCharArray(). Here are some sample bits of code:
```vbnet
Public AscArray() As Char = {"A"c, "B"c, "G"c, "D"c, "H"c, "W"c, "Z"c, _
    "h"c, "t"c, "Y"c, "K"c, "L"c, "M"c, "N"c, "S"c, _
    "E"c, "P"c, "z"c, "Q"c, "R"c, "s"c, "T"c, _
    "'"c, ","c, "*"c}
Public IntArray() As Integer = {&H710, &H712, &H713, &H715, &H717, &H718, &H719, _
    &H71A, &H71B, &H71D, &H71F, &H720, &H721, &H722, &H723, _
    &H725, &H726, &H728, &H729, &H72A, &H72B, &H72C, _
    &H741, &H742, &H308}

'-- Take an ascii char in sedra encoding and return a syriac code point.
'-- The encoding is case-sensitive ("h" = het, "H" = he), so the case is not changed.
Private Function AscToSyriac(ByVal ch As Char) As Integer
    For i As Integer = 0 To AscArray.Length - 1
        If AscArray(i) = ch Then
            Return IntArray(i)
        End If
    Next i
    '-- If drops through
    Return AscW(ch)
End Function

'-- Take a syriac code point and return a sedra ascii character
Private Function SyriacToAsc(ByVal ch As Integer) As Char
    For i As Integer = 0 To IntArray.Length - 1
        If IntArray(i) = ch Then
            Return AscArray(i)
        End If
    Next i
    '-- If drops through
    Return ChrW(ch)
End Function

'-- List the code point of each character in hex, one per line
Public Sub dumpUnicode(ByVal mystr As String)
    For i As Integer = 0 To mystr.Length - 1
        If i > 0 Then BottomBox.Text &= vbCrLf
        BottomBox.Text &= "&H" & Hex(AscW(mystr.Chars(i)))
    Next i
End Sub
```
Use charmap to find the hex code for each character, then map an integer containing that value (I write these in hex, since that is what charmap shows me: &H0701 = hex 0701) to an ASCII character. In the example above I use two parallel arrays, one of ASCII characters (in the Sedra encoding) and one of code points, to convert in each direction. Internally I then process all the Unicode characters as ASCII, and convert them back when it is time to display them.
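If you are on VB 2005 or later, generic Dictionary objects can replace the linear search through the two parallel arrays. A sketch under that assumption: SedraMap is a hypothetical class, and only five entries of the mapping are filled in (extend them from the arrays above). The lookup is case-sensitive, like the Sedra scheme itself.

```vbnet
Option Strict On
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Public NotInheritable Class SedraMap
    ' Hypothetical lookup tables, built once from parallel arrays.
    Private Shared ReadOnly AsciiToCode As New Dictionary(Of Char, Integer)()
    Private Shared ReadOnly CodeToAscii As New Dictionary(Of Integer, Char)()

    Shared Sub New()
        ' A few entries only: alaph, beth, gamal, het, shin.
        Dim ascii() As Char = {"A"c, "B"c, "G"c, "h"c, "s"c}
        Dim codes() As Integer = {&H710, &H712, &H713, &H71A, &H72B}
        For i As Integer = 0 To ascii.Length - 1
            AsciiToCode.Add(ascii(i), codes(i))   ' Add throws on a duplicate key
            CodeToAscii.Add(codes(i), ascii(i))
        Next
    End Sub

    Public Shared Function AscToSyriac(ByVal ch As Char) As Integer
        Dim code As Integer
        If AsciiToCode.TryGetValue(ch, code) Then Return code
        Return AscW(ch)   ' pass unmapped characters through
    End Function

    Public Shared Function SyriacToAsc(ByVal code As Integer) As Char
        Dim ch As Char
        If CodeToAscii.TryGetValue(code, ch) Then Return ch
        Return ChrW(code)
    End Function
End Class
```

Because Dictionary.Add throws on a duplicate key, a letter or code point entered twice in the tables fails loudly (as a TypeInitializationException) the first time the class is used, instead of producing a silently wrong mapping.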
To split a string on spaces and @ signs:
```vbnet
Dim split As String() = mytext.Split(New Char() {" "c, "@"c})
```
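The same call as a small runnable function. Note the c suffix, which makes the separators Char literals rather than one-character strings (needed under Option Strict On); SplitWords and the sample text are hypothetical.

```vbnet
Option Strict On
Imports System

Module SplitDemo
    ' Hypothetical helper: split on spaces and @ signs.
    Public Function SplitWords(ByVal mytext As String) As String()
        Return mytext.Split(New Char() {" "c, "@"c})
    End Function

    Sub Main()
        For Each part As String In SplitWords("alpha beta@gamma")
            Console.WriteLine(part)   ' prints alpha, beta and gamma on separate lines
        Next
    End Sub
End Module
```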
StrReverse() works fine on the Unicode string, although the result may give you a shock when it is displayed. In Syriac, Arabic and the other scripts whose letters join up, a character's shape depends on the characters to its left and right, so reversed text is drawn with different letter forms. This is one reason why you cannot fake RTL in the RichTextBox.
ChrW converts your integer representation of a character back into a Char containing the Unicode symbol. To display it, append it to the TextBox's Text property (TextBox1 stands for whichever TextBox you are using):
```vbnet
TextBox1.Text = TextBox1.Text & mychar & vbCrLf
```
If you use the TextBox to handle RTL, the strings are stored internally in logical order (the order in which the characters were typed), so you won't need to do anything special; the right-to-left display is transparent to your code.
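To see that the string is held in logical (typing) order, you can build the consonants of malka (mim, lamadh, kaph, alaph) from their code points: character 0 is mim, the first letter typed, even though it is drawn at the right-hand end of the box. A sketch; LogicalOrderDemo and BuildMalka are hypothetical names.

```vbnet
Option Strict On
Imports System
Imports Microsoft.VisualBasic

Module LogicalOrderDemo
    ' Hypothetical helper: the consonants of malka as code points, in typing order.
    Public Function BuildMalka() As String
        Return ChrW(&H721) & ChrW(&H720) & ChrW(&H71F) & ChrW(&H710)   ' mim, lamadh, kaph, alaph
    End Function

    Sub Main()
        Dim word As String = BuildMalka()
        ' Index 0 is the first letter typed (mim), even though it is displayed at the right.
        Console.WriteLine("&H" & Hex(AscW(word.Chars(0))))   ' prints &H721
        Console.WriteLine(word.Length)                       ' prints 4
    End Sub
End Module
```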
I have adapted the above example to take Syriac text typed right-to-left in the left-hand box and transcribe it left-to-right in the right-hand box. It requires the Meltho fonts for the left-hand box (using Serto Jerusalem) and the free Titus Cyberbit Basic font for the right-hand one (only so that it can display any untranslated Syriac codes, and shin, which is transcribed as s with a caron, š).
Install the utility (it's very simple) under XP by downloading setup.msi and double-clicking on it. A more sophisticated version with a different interface can be downloaded as setupqs.msi.
Here's the code (laying out the form, assigning the font to each box, etc. were all done by clicking items in the IDE):
```vbnet
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim x As String = TextBox1.Text
    Dim y As String = ""
    Dim chararray As Char() = x.ToCharArray()
    '-- Walk along the string and process each character in turn
    For i As Integer = 0 To chararray.Length - 1
        ' display each character in hex
        'MsgBox(Hex(AscW(chararray(i))))
        '-- handle underscore (U+0331) for unpronounced letters as brackets:
        '-- look ahead one character to see if there is an underscore under the current character
        If i < chararray.Length - 1 Then
            If AscW(chararray(i + 1)) = &H331 Then
                y = y & "("
            End If
        End If
        '-- Process current character
        y = y & cvt(AscW(chararray(i)))
    Next i
    RichTextBox1.Text = y
End Sub

'-- Map one Syriac code point to its Latin transcription
Function cvt(ByVal c As Integer) As Char
    Select Case c
        Case &H710 : Return ChrW(&H2019) '-- alap
        Case &H712 : Return "b"c
        Case &H713 : Return "g"c
        Case &H715 : Return "d"c
        Case &H717 : Return "h"c
        Case &H718 : Return "w"c
        Case &H719 : Return "z"c
        Case &H71A : Return ChrW(&H1E25) '-- het
        Case &H71B : Return ChrW(&H1E6D) '-- tet
        Case &H71D : Return "y"c
        Case &H71F : Return "k"c
        Case &H720 : Return "l"c
        Case &H721 : Return "m"c
        Case &H722 : Return "n"c
        Case &H723 : Return "s"c
        Case &H725 : Return ChrW(&H2018) '-- ayin
        Case &H726 : Return "p"c
        Case &H728 : Return ChrW(&H1E63) '-- tsade
        Case &H729 : Return "q"c
        Case &H72A : Return "r"c
        Case &H72B : Return ChrW(&H161) '-- shin
        Case &H72C : Return "t"c
        Case &H730, &H731 : Return "a"c '-- ptaha
        Case &H733, &H734 : Return ChrW(&H101) '-- zqafa
        Case &H736, &H737 : Return "e"c '-- rhbasa
        Case &H73A, &H73B : Return "i"c '-- hbasa
        Case &H73D, &H73E : Return "u"c '-- esasa/usoso
        Case &H331 : Return ")"c
        Case Else : Return ChrW(c) '-- passthrough
    End Select
End Function
End Class
```
There is, thus, very little to it other than a bit of look-ahead for the underscore.
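The look-ahead can also be pulled out of the form into a plain function, which makes it easy to test. A sketch: Transcription, Transcribe and the cut-down Cvt are hypothetical, covering only a few letters (the full table is the cvt function above), and a StringBuilder replaces repeated string concatenation.

```vbnet
Option Strict On
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module Transcription
    Private Const MacronBelow As Integer = &H331   ' underscore mark under an unpronounced letter

    ' Cut-down version of cvt(): a few letters only, everything else passes through.
    Private Function Cvt(ByVal code As Integer) As String
        Select Case code
            Case &H710 : Return ChrW(&H2019).ToString()   ' alaph
            Case &H712 : Return "b"
            Case &H720 : Return "l"
            Case &H721 : Return "m"
            Case &H72B : Return ChrW(&H161).ToString()    ' shin
            Case MacronBelow : Return ")"
            Case Else : Return ChrW(code).ToString()
        End Select
    End Function

    Public Function Transcribe(ByVal text As String) As String
        Dim sb As New StringBuilder()
        For i As Integer = 0 To text.Length - 1
            ' Look ahead: if the next character is the underscore mark, open a bracket first.
            If i < text.Length - 1 AndAlso AscW(text.Chars(i + 1)) = MacronBelow Then
                sb.Append("("c)
            End If
            sb.Append(Cvt(AscW(text.Chars(i))))
        Next
        Return sb.ToString()
    End Function
End Module
```

For example, mim followed by lamadh carrying the underscore mark (U+0331) transcribes as "m(l)".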
Download syriactranscription.zip.
It is quite likely that you will only want to enter Syriac text in certain boxes in your application, while still using English menus and entering English text in other boxes. Your users will get very tired very quickly of switching keyboards, so you must do it for them.
Note: I have been unable to get this to work with more than one Syriac keyboard installed.
I created a class, clsPlatform, to do this. I ran KeyboardCheck() when the program started; it stores the IDs of the original keyboard and of the Syriac keyboard, and sets the public string ErrorMessage if there is a problem.
Then I had two more methods, ActivateOriginalKeyboard() and ActivateSyriacKeyboard(), which I called from GotFocus event handlers in my code (i.e. whenever a user clicked on a box to enter text, I called one of these).
```vbnet
Imports System.Collections
Imports System.Text

Public Class clsPlatform
    ' HKL values are handles, so they are declared As IntPtr (a VB.NET Long is 64-bit and would break these calls)
    Declare Function GetKeyboardLayoutList Lib "user32" (ByVal nBuff As Integer, ByVal lpList() As IntPtr) As Integer
    Declare Function ActivateKeyboardLayout Lib "user32" (ByVal hkl As IntPtr, ByVal flags As Integer) As IntPtr
    Declare Function GetKeyboardLayout Lib "user32" (ByVal idThread As Integer) As IntPtr
    Declare Unicode Function GetKeyboardLayoutName Lib "user32" Alias "GetKeyboardLayoutNameW" (ByVal pwszKLID As StringBuilder) As Boolean
    Declare Unicode Function GetLocaleInfo Lib "kernel32" Alias "GetLocaleInfoW" (ByVal Locale As Integer, ByVal LCType As Integer, ByVal lpLCData As StringBuilder, ByVal cchData As Integer) As Integer
    Declare Function IsValidLocale Lib "kernel32" (ByVal Locale As Integer, ByVal dwFlags As Integer) As Boolean

    Const LOCALE_SENGCOUNTRY As Integer = &H1002 '// English name of country
    Const LOCALE_SLANGUAGE As Integer = &H2 'localized name of language
    Const LCID_INSTALLED As Integer = &H1 '-- is locale present?
    Const KL_NAMELENGTH As Integer = 9 '-- 8 hex digits plus the terminating null

    Public ErrorMessage As String
    '-- Store these so can use when switching in editors
    Public OriginalKeyboardCode As IntPtr = IntPtr.Zero
    Public SyriacKeyboardCode As IntPtr = IntPtr.Zero

    Public Function KeyboardCheck() As ArrayList
        '-- Make sure Syriac installed. We do not care which Syriac keyboard the user uses
        Dim lLayouts(49) As IntPtr
        Dim retval As New ArrayList
        Dim buf As New StringBuilder(KL_NAMELENGTH)
        Dim SyriacFound As Boolean = False
        ErrorMessage = ""

        'Save current configuration
        OriginalKeyboardCode = GetKeyboardLayout(0)
        GetKeyboardLayoutName(buf)
        '-- The layout name is 8 hex digits; the last 4 are the locale id
        Dim layoutCode As Integer = Convert.ToInt32(buf.ToString().Substring(4, 4), 16)
        Dim layoutName As String = getLocale(layoutCode, LOCALE_SLANGUAGE)
        If Not layoutName.Contains("English") Then
            MsgBox("Your keyboard is currently not set to English, but to " & layoutName & _
                ". It will be reset to English.", MsgBoxStyle.Exclamation)
            OriginalKeyboardCode = IntPtr.Zero '-- the loop below picks the first English layout
        End If

        'Get the installed keyboard layouts (50 is the maximum supported for now)
        Dim count As Integer = GetKeyboardLayoutList(lLayouts.Length, lLayouts)

        For i As Integer = 0 To count - 1
            '--Activate the keyboard layout and get its name
            ActivateKeyboardLayout(lLayouts(i), 0)
            GetKeyboardLayoutName(buf)
            '-- 8 hex digits: the first 4 identify the layout variant, the last 4 are the locale id
            Dim klid As String = buf.ToString()
            layoutCode = Convert.ToInt32(klid.Substring(4, 4), 16)
            If Not IsValidLocale(layoutCode, LCID_INSTALLED) Then
                MsgBox("invalid locale " & i.ToString() & " &H" & Hex(layoutCode))
            End If
            layoutName = getLocale(layoutCode, LOCALE_SLANGUAGE)
            '-- Debugging aid:
            'MsgBox(i.ToString() & " " & Hex(lLayouts(i).ToInt64()) & vbCrLf & klid & vbCrLf & layoutName)
            If layoutName.Contains("Syriac") Then
                SyriacKeyboardCode = lLayouts(i)
                SyriacFound = True
            End If
            If layoutName.Contains("English") AndAlso OriginalKeyboardCode = IntPtr.Zero Then
                OriginalKeyboardCode = lLayouts(i)
            End If
            retval.Add("&H" & Hex(layoutCode) & ":" & layoutName)
        Next i

        'Restore current configuration
        ActivateOriginalKeyboard()
        If Not SyriacFound Then
            ErrorMessage = "The Syriac language and phonetic keyboard are not installed on your PC. Please correct this."
        End If
        Return retval
    End Function

    Private Function getLocale(ByVal localeId As Integer, ByVal reqInfo As Integer) As String
        '-- StringBuilder output stops at the terminating null, so no null-stripping is needed
        Dim buffer As New StringBuilder(256)
        If GetLocaleInfo(localeId, reqInfo, buffer, buffer.Capacity) = 0 Then
            MsgBox("Unable to get locale info")
        End If
        Return buffer.ToString()
    End Function

    Public Function ActivateOriginalKeyboard() As Long
        If OriginalKeyboardCode = IntPtr.Zero Then Return -1
        Return ActivateKeyboardLayout(OriginalKeyboardCode, 0).ToInt64()
    End Function

    Public Function ActivateSyriacKeyboard() As Long
        If SyriacKeyboardCode = IntPtr.Zero Then Return -1
        Return ActivateKeyboardLayout(SyriacKeyboardCode, 0).ToInt64()
    End Function
End Class
```
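Wiring the class into a form might look like this. A sketch, assuming clsPlatform (above) is in the same project; MainForm, EnglishBox and SyriacBox are hypothetical names. It uses the Enter event rather than GotFocus, since Microsoft's documentation recommends Enter and Leave over the low-level GotFocus/LostFocus events for ordinary controls.

```vbnet
Option Strict On
Imports System
Imports System.Windows.Forms

Public Class MainForm
    Inherits Form

    ' Hypothetical controls: one box for English, one for Syriac.
    Private ReadOnly EnglishBox As New TextBox()
    Private ReadOnly SyriacBox As New TextBox()
    ' Assumes the clsPlatform class above is in the same project.
    Private ReadOnly platform As New clsPlatform()

    Public Sub New()
        EnglishBox.Dock = DockStyle.Top
        SyriacBox.Dock = DockStyle.Fill
        SyriacBox.Multiline = True
        SyriacBox.RightToLeft = System.Windows.Forms.RightToLeft.Yes
        Controls.Add(SyriacBox)    ' add the Fill box first so the Top box keeps its own strip
        Controls.Add(EnglishBox)
        AddHandler EnglishBox.Enter, AddressOf EnglishBox_Enter
        AddHandler SyriacBox.Enter, AddressOf SyriacBox_Enter
    End Sub

    Protected Overrides Sub OnLoad(ByVal e As EventArgs)
        MyBase.OnLoad(e)
        platform.KeyboardCheck()   ' records the original and Syriac keyboard layouts
        If Not String.IsNullOrEmpty(platform.ErrorMessage) Then
            MessageBox.Show(platform.ErrorMessage)
        End If
    End Sub

    Private Sub EnglishBox_Enter(ByVal sender As Object, ByVal e As EventArgs)
        platform.ActivateOriginalKeyboard()
    End Sub

    Private Sub SyriacBox_Enter(ByVal sender As Object, ByVal e As EventArgs)
        platform.ActivateSyriacKeyboard()
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New MainForm())
    End Sub
End Class
```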
Another approach to this is to find out if a complex script is installed using the IsValidLocale function with the LCID_INSTALLED flag on any locale that requires complex script support, such as:
```c
BOOL fComplexScripts = IsValidLocale(MAKELCID(MAKELANGID(LANG_HEBREW, SUBLANG_DEFAULT), SORT_DEFAULT), LCID_INSTALLED);
```
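The same test from VB.NET, for Syriac. A sketch: &H45A is the Windows locale ID for Syriac (Syria) and LCID_INSTALLED is 1; ComplexScriptCheck and SyriacSupportInstalled are hypothetical names.

```vbnet
Option Strict On
Imports System

Module ComplexScriptCheck
    Private Const LCID_INSTALLED As Integer = &H1
    Private Const LCID_SYRIAC As Integer = &H45A   ' Syriac (Syria)

    Private Declare Function IsValidLocale Lib "kernel32" (ByVal Locale As Integer, ByVal dwFlags As Integer) As Boolean

    ' True if the Syriac locale (and so complex script support) is installed.
    Public Function SyriacSupportInstalled() As Boolean
        Return IsValidLocale(LCID_SYRIAC, LCID_INSTALLED)
    End Function

    Sub Main()
        Console.WriteLine("Syriac locale installed: " & SyriacSupportInstalled().ToString())
    End Sub
End Module
```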
A table of language identifiers is below. You can find the keyboard codes on your own machine by running regedit and looking under HKEY_CURRENT_USER\Keyboard Layout\Preload and HKEY_CURRENT_USER\Keyboard Layout\Substitutes.
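You can also list the Preload entries from code. A sketch using Microsoft.Win32.Registry (VB 2005 or later, for Using and List(Of String)); PreloadDump and ReadPreload are hypothetical names. Each value under the key is named "1", "2" and so on, and holds an eight-digit hex keyboard layout ID.

```vbnet
Option Strict On
Imports System
Imports System.Collections.Generic
Imports Microsoft.Win32

Module PreloadDump
    ' Hypothetical helper: one "name = layout ID" string per value under Keyboard Layout\Preload.
    Public Function ReadPreload() As String()
        Dim result As New List(Of String)()
        Using key As RegistryKey = Registry.CurrentUser.OpenSubKey("Keyboard Layout\Preload")
            If key IsNot Nothing Then
                For Each valueName As String In key.GetValueNames()
                    result.Add(valueName & " = " & CStr(key.GetValue(valueName)))
                Next
            End If
        End Using
        Return result.ToArray()
    End Function

    Sub Main()
        For Each line As String In ReadPreload()
            Console.WriteLine(line)
        Next
    End Sub
End Module
```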
Links
http://www.experts-exchange.com/Programming/Programming_Languages/Visual_Basic/VB_Controls/Q_21208243.html?query=Regional+Options&topics=94
http://www.experts-exchange.com/Programming/Programming_Languages/Visual_Basic/VB_Controls/Q_21409380.html
http://custom.programming-in.net/articles/art9-2.asp?lib=user32.dll (reference for VB.NET API calls; what was a Long in VB6 is an Integer in VB.NET)
http://vbnet.mvps.org/index.html?code/locale/localecountry.htm (material on country information)
Constructive feedback is welcomed to Roger Pearse.
Written 30th August 2006.
Updated 28th December 2006 with key strokes for Syriac and the transcription utility.
Updated 12th January 2007 with minimising the language bar and changing the keyboard in your code.
This page has been online since 30th August 2006.