User:Xiaoxiangquan/GoogleSummerOfCode/2011/Internationalization/design
Garlic's Internationalization Design
We use GNU Gettext to translate strings. Here is the standard workflow.
GNU Gettext Workflow
- First, a programmer writes code containing strings that should be translated into the end user's language. The original code looks like this:
ot->name = "Quit";
To use GNU gettext, the programmer rewrites it like this:
/* header for gettext */
#include <libintl.h>
...
ot->name = gettext("Quit");
- Next, the programmer runs the xgettext command, which scans the specified source files and gathers the string literals passed to gettext(). xgettext then generates a .pot file; the suffix stands for PO Template.
- A PO file is a translation file with entries like:
msgid "Wave"
msgstr "Vague"
This particular PO is for French and is usually named fr.po. Translators use the msgmerge command to merge the old PO with the newly updated POT, so that all new changes become visible in the PO. If the old PO had no "Quit" entry, it will now contain:
msgid "Quit"
msgstr ""
Translators can then open the PO and fill in the untranslated strings.
- Once the PO is translated, anyone can run the msgfmt command to compile it into an MO file, which replaces the old one used by the application. The repository maintainer can also commit the updated MO to trunk.
- At runtime, the call gettext("Quit") looks up the "Quit" entry in the MO and returns the translation mapped to it. Users therefore see a localized message, provided they have set the application's locale and language properly.
Note that if there is no matching entry, or the entry is empty, the call returns the original message.
As you can see, this workflow requires close cooperation between programmers and translators.
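The lookup rule in the last two steps can be modeled in a few lines of C: use the translation if a non-empty one exists, otherwise fall back to the original. This is only a sketch under our own assumptions. CatalogEntry and catalog_lookup() are hypothetical. libintl reads the binary MO format with its own lookup code; it does not search a C array with bsearch().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory catalog entry: one msgid/msgstr pair from a PO file. */
typedef struct CatalogEntry {
	const char *msgid;
	const char *msgstr;
} CatalogEntry;

/* bsearch() comparator: key is the msgid string, elem is a CatalogEntry. */
static int compare_entry(const void *key, const void *elem)
{
	const CatalogEntry *entry = elem;
	return strcmp((const char *)key, entry->msgid);
}

/*
 * Model of the lookup rule: return the translation when the catalog has a
 * non-empty msgstr for msgid, otherwise fall back to msgid itself.
 * The catalog must be sorted by msgid in strcmp() order.
 */
const char *catalog_lookup(const CatalogEntry *catalog, size_t count,
                           const char *msgid)
{
	const CatalogEntry *entry;

	assert(msgid != NULL && (catalog != NULL || count == 0));
	entry = bsearch(msgid, catalog, count, sizeof *catalog, compare_entry);
	if (entry != NULL && entry->msgstr[0] != '\0')
		return entry->msgstr;
	return msgid;
}
```

Take the sorted catalog {"Quit", ""}, {"Wave", "Vague"}. Looking up "Wave" yields "Vague". "Quit" (empty msgstr) and "Cancel" (no entry) come back unchanged.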
More about Gathering and Translating
In the process above, gettext(str) plays two roles: it marks the string to be gathered by the xgettext command, and it performs the actual translation at runtime. We can separate these two stages as follows:
#define Mark(msgid) msgid
#define Trans(msgid) gettext(msgid)
...
ot->name = Mark("Quit");
ot->name = Trans(ot->name);
In this style, Mark() does nothing. But if we add "Mark" as an xgettext keyword, the command will know that "Quit" should be gathered. At runtime, the Trans() call does the actual translation.
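The split is most useful for data that must be initialized before any translation can happen. The following minimal sketch assumes a hypothetical OperatorType struct and a hypothetical translate_operator_names() helper; neither is taken from Garlic. Strings are marked where the table is defined and translated in one pass at startup.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>

/* Gathering stage: expands to the bare literal, so it can appear in a static
 * initializer. xgettext must be run with --keyword=Mark to collect it. */
#define Mark(msgid) msgid
/* Translating stage: the runtime lookup. */
#define Trans(msgid) gettext(msgid)

/* Hypothetical operator type standing in for the ot->name example above. */
typedef struct OperatorType {
	const char *name;
} OperatorType;

/* Strings are marked here, but memory still holds the untranslated msgids. */
OperatorType operator_types[] = {
	{Mark("Quit")},
	{Mark("Save")},
};

/* Translate every name once, after the locale and text domain are set up. */
void translate_operator_names(OperatorType *types, size_t count)
{
	assert(types != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		types[i].name = Trans(types[i].name);
}
```

Call translate_operator_names(operator_types, 2) once the locale is set up, and each name is replaced with its translation. If there is no catalog for the current locale (for example in the default "C" locale), gettext() returns its argument, so the names stay "Quit" and "Save".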
Note that xgettext is not smart enough to find "Quit" through the Trans() call, because ot->name is not a string literal. Even the following is not recognized:
#define SubStr "DEF"
...
/* cannot recognize string constants concatenated with macro */
ot->name = gettext("ABC" SubStr "GHI");
Concatenation of plain string literals, on the other hand, is handled well:
/* xgettext works well to gather "ABCDEFGHI" */
ot->name = gettext("ABC" "DEF" "GHI");
Furthermore, format strings are safe only if the translated string contains the same number and types of format specifications as the original. Otherwise the program may misbehave or crash at runtime:
/* make sure the translated "Hello, %s!" contains "%s" */
printf( gettext("Hello, %s!"), gettext("world") );
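This rule can be checked mechanically. In practice msgfmt already does so: with --check-format (also enabled by -c), it reports translations of c-format entries whose format specifications do not match. The sketch below only illustrates the idea. format_signature() and format_specs_match() are hypothetical helpers, and they do not handle '*' widths or positional arguments such as %1$s.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append c to out if there is room, keeping out NUL-terminated. */
static void append(char *out, size_t out_size, size_t *len, char c)
{
	if (*len + 1 < out_size) {
		out[(*len)++] = c;
		out[*len] = '\0';
	}
}

/*
 * Write the "signature" of a printf-style format into out: the length
 * modifiers and conversion character of each specification, each followed
 * by ',' (e.g. "%5.2f of %lu" -> "f,lu,"). "%%" is a literal percent sign.
 */
static void format_signature(const char *fmt, char *out, size_t out_size)
{
	size_t len = 0;

	assert(out_size > 0);
	out[0] = '\0';
	while (*fmt != '\0') {
		if (*fmt++ != '%')
			continue;
		if (*fmt == '%') {              /* "%%" prints a literal '%' */
			fmt++;
			continue;
		}
		while (*fmt != '\0' && strchr("-+ #0123456789.", *fmt) != NULL)
			fmt++;                      /* flags, width, precision */
		while (*fmt != '\0' && strchr("hlLqjzt", *fmt) != NULL)
			append(out, out_size, &len, *fmt++);  /* length modifiers */
		if (*fmt == '\0')
			break;
		append(out, out_size, &len, *fmt++);      /* conversion character */
		append(out, out_size, &len, ',');
	}
}

/* Return 1 if msgstr uses the same conversions, in the same order, as msgid. */
int format_specs_match(const char *msgid, const char *msgstr)
{
	char a[128], b[128];

	format_signature(msgid, a, sizeof a);
	format_signature(msgstr, b, sizeof b);
	return strcmp(a, b) == 0;
}
```

For example, "Hello, %s!" and "Bonjour, %s !" match. A translation that drops %s, or turns %d into %ld, does not.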
Garlic's Gettext Framework
Supporting Files
Let's see the related file tree first:
/po: Contains all PO and POT files, as well as the update scripts.
    blender.pot: The PO Template.
    *.po: All translations.
    POTFILES.in: The list of all source files containing messages to be handled.
    update_pot.py: Script to update blender.pot.
    update_po.py: Script to update the *.po files.
    update_mo.py: Script to compile the *.po files.
/release/bin/.blender/locale: Contains all compiled translations.
    */LC_MESSAGES/blender.mo: The compiled translation for a single language.
Update POT
update_pot.py scans all the source files listed in POTFILES.in. The command it runs is similar to:
# --files-from specifies the file that lists the input source files
# --keyword specifies a function or macro name whose string argument is extracted
# --output specifies the output file
# --from-code specifies the encoding of the source files
xgettext --files-from=POTFILES.in \
    --keyword=_ --keyword=N_ \
    --output=blender.pot \
    --from-code=utf-8
By default, xgettext recognizes gettext() and its related functions in C sources, but not _(), so --keyword=_ is required here. Garlic also uses another keyword, N_(), described later.
Update PO
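update_po.py merges blender.pot into every PO file. Afterwards, all additions, deletions and modifications of entries are visible to translators in their PO files. Entries that disappeared from the POT are kept as obsolete (#~) entries, and new entries that closely resemble old ones are marked fuzzy. The command is similar to: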
# --lang specifies the target language
# zh_CN.po is the old PO
# blender.pot is the updated PO Template
msgmerge --update --lang=zh_CN zh_CN.po blender.pot
Update MO
update_mo.py compiles all the PO files into MO files and puts them under /release/bin/.blender/locale. The command is similar to:
# --statistics prints the translation status (translated, fuzzy and untranslated counts)
# zh_CN.po is the source PO
# -o specifies the target MO file path
msgfmt --statistics zh_CN.po -o LOCALE_DIR/zh_CN/LC_MESSAGES/blender.mo
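Each MO file goes to LOCALE_DIR/<lang>/LC_MESSAGES/blender.mo because that is the layout gettext searches. After bindtextdomain(domain, dir), it looks for dir/<locale>/LC_MESSAGES/<domain>.mo. It also tries less specific names, e.g. zh when zh_CN is missing. The helper below is a hypothetical illustration of that path and is not part of update_mo.py.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the path where gettext looks for a compiled catalog of the
 * LC_MESSAGES category:  <locale_dir>/<lang>/LC_MESSAGES/<domain>.mo
 * Returns 0 on success, -1 if lang contains '/' or the path does not fit.
 */
int mo_path(char *out, size_t out_size, const char *locale_dir,
            const char *lang, const char *domain)
{
	int n;

	assert(out != NULL && locale_dir != NULL && lang != NULL && domain != NULL);
	if (strchr(lang, '/') != NULL)  /* a language code is one path component */
		return -1;
	n = snprintf(out, out_size, "%s/%s/LC_MESSAGES/%s.mo",
	             locale_dir, lang, domain);
	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, mo_path(buf, sizeof buf, "locale", "zh_CN", "blender") produces "locale/zh_CN/LC_MESSAGES/blender.mo".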
The _() and N_() Macros
We supply a wrapper around gettext(), named BLF_gettext(), which performs the actual translation at runtime. It is in /source/blender/blenfont/intern/blf.c:
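#ifdef INTERNATIONAL
#include <libintl.h>
#endif

const char *BLF_gettext(const char *msgid)
{
#ifdef INTERNATIONAL
	/* gettext("") returns the catalog's PO header (metadata), not "",
	 * so NULL and empty strings are never passed to it. */
	if (msgid != NULL && msgid[0] != '\0')
		return gettext(msgid);
	return "";
#else
	/* without internationalization support, return the string untouched */
	return msgid;
#endif
}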
Since it is called thousands of times, and we programmers are lazy, a short macro _() is supplied:
#define _(msgid) BLF_gettext(msgid)
Please always use _() where possible. It keeps the code clearer, and translation-related details won't obscure the program logic.
However, sometimes we cannot call a function like BLF_gettext(). For example, consider this piece of code in /source/blender/editors/screen/screen_ops.c:
static EnumPropertyItem modal_items[] = {
{KM_MODAL_CANCEL, "CANCEL", 0, "Cancel", ""},
{KM_MODAL_APPLY, "APPLY", 0, "Apply", ""},
{KM_MODAL_STEP10, "STEP10", 0, "Steps on", ""},
{KM_MODAL_STEP10_OFF, "STEP10_OFF", 0, "Steps off", ""},
{0, NULL, 0, NULL, NULL}};
A function call is not allowed in these initializers, because a static array in C must be initialized with constant expressions. Yet we still need to tell xgettext what to collect. So we separate the gathering and translating stages as described above, using a macro that does nothing. See it in /source/blender/blenfont/BLF_api.h:
#define N_(msgid) msgid
With this macro we can rewrite the code as:
static EnumPropertyItem modal_items[] = {
{KM_MODAL_CANCEL, "CANCEL", 0, N_("Cancel"), ""},
{KM_MODAL_APPLY, "APPLY", 0, N_("Apply"), ""},
{KM_MODAL_STEP10, "STEP10", 0, N_("Steps on"), ""},
{KM_MODAL_STEP10_OFF, "STEP10_OFF", 0, N_("Steps off"), ""},
{0, NULL, 0, NULL, NULL}};
Now all the marked strings can be gathered into the POT. But where does the actual translation happen at runtime? Anywhere before the strings are used. In this example, it happens in the same function, where the array is passed on:
keymap= WM_modalkeymap_add( keyconf, "Standard Modal Map", modal_items);
We rewrite it as:
keymap= WM_modalkeymap_add( keyconf, "Standard Modal Map",
RNA_enum_items_gettexted(modal_items));
RNA_enum_items_gettexted() is defined in /source/blender/makesrna/intern/rna_access.c. It traverses an EnumPropertyItem array and translates every name and description field:
/* make every name and description field translated by gettext */
EnumPropertyItem* RNA_enum_items_gettexted(EnumPropertyItem *item)
{
if( item )
{
int i;
for(i=0; item[i].identifier; i++)
{
if( item[i].name )
item[i].name = _(item[i].name);
if( item[i].description )
item[i].description = _(item[i].description);
}
}
return item;
}
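RNA_enum_items_gettexted() overwrites name and description in place, so afterwards the array no longer holds the original msgids. This has a consequence not discussed in the design above. Switching the language later would pass already-translated text to gettext(), which has no entry for it. One possible alternative, sketched below under assumptions, leaves the marked source array untouched and writes the translations into a separate array. The EnumPropertyItem struct here is a stand-in that mirrors the five fields used in the initializers above. enum_items_translate_copy() is a hypothetical helper.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>

/* Stand-in for Blender's EnumPropertyItem, mirroring the five fields used
 * in the initializers above (an assumption, not the real header). */
typedef struct EnumPropertyItem {
	int value;
	const char *identifier;
	int icon;
	const char *name;
	const char *description;
} EnumPropertyItem;

/* Like BLF_gettext(), never pass NULL or "" to gettext();
 * here they are returned unchanged. */
static const char *tr(const char *msgid)
{
	return (msgid != NULL && msgid[0] != '\0') ? gettext(msgid) : msgid;
}

/*
 * Copy the NULL-identifier-terminated array src into dst (which must hold at
 * least as many items, terminator included), translating name and description.
 * src keeps its msgids, so the copy can be rebuilt after a language change.
 * Returns the number of items copied, excluding the terminator.
 */
size_t enum_items_translate_copy(const EnumPropertyItem *src,
                                 EnumPropertyItem *dst)
{
	size_t i;

	assert(src != NULL && dst != NULL);
	for (i = 0; src[i].identifier != NULL; i++) {
		dst[i] = src[i];
		dst[i].name = tr(src[i].name);
		dst[i].description = tr(src[i].description);
	}
	dst[i] = src[i];  /* copy the terminator item */
	return i;
}
```

This costs one extra array per enum. In exchange, rebuilding the translated copy after a language change always starts from the original msgids.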
Special makesrna Issues
Most strings outside /source/blender/makesrna can be handled easily with _(); both gathering and translating work well. Strings inside the makesrna folder are a different matter.
At build time, these files are linked into an executable named makesrna, which generates many rna_*_gen.c files in a system-dependent location; on Linux, it is [Build Path]/source/blender/makesrna/intern. These generated files are then compiled and linked into the final Blender application.
So there is no point in translating strings while makesrna runs, since any translation would be baked into the generated sources in a single language. We just use N_() to mark all the strings we are interested in. To do the actual translation, we need to access the data in the generated files at runtime, which can be done through extern declarations. See makesrna/RNA_access.h, for example:
...
/* Types */
extern BlenderRNA BLENDER_RNA;
extern StructRNA RNA_Action;
extern StructRNA RNA_ActionConstraint;
extern StructRNA RNA_ActionFCurves;
extern StructRNA RNA_ActionGroup;
...
Each of these StructRNAs contains a linked list of properties whose name and description fields should be translated. If a property is an enum property, its item field is an array of the EnumPropertyItems described above, so it also needs to be processed with RNA_enum_items_gettexted().
Finally, we implement RNA_types_init_gettext() in makesrna/intern/rna_access.c to perform this traversal.
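The traversal can be sketched as follows. The types below (StructSketch, PropertySketch, ItemSketch) are simplified, hypothetical stand-ins. The real StructRNA and PropertyRNA keep properties in a ListBase and carry many more fields, and this is not the actual RNA_types_init_gettext(). The sketch walks one struct's property list, translates each name and description, and descends into enum items the same way RNA_enum_items_gettexted() does.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the RNA types. */
typedef struct ItemSketch {
	const char *identifier;      /* NULL terminates the array */
	const char *name;
	const char *description;
} ItemSketch;

typedef struct PropertySketch {
	struct PropertySketch *next; /* singly linked list of properties */
	int is_enum;
	const char *name;
	const char *description;
	ItemSketch *items;           /* used only when is_enum is set */
} PropertySketch;

typedef struct StructSketch {
	const char *identifier;
	PropertySketch *properties;  /* head of the property list */
} StructSketch;

/* Translate *str in place unless it is NULL or empty; count translated fields. */
static void translate_field(const char **str, size_t *count)
{
	if (*str != NULL && (*str)[0] != '\0') {
		*str = gettext(*str);
		(*count)++;
	}
}

/* Translate the properties of one struct, including their enum items.
 * Returns the number of strings passed to gettext(). */
size_t struct_init_gettext(StructSketch *srna)
{
	size_t count = 0;

	assert(srna != NULL);
	for (PropertySketch *prop = srna->properties; prop != NULL; prop = prop->next) {
		translate_field(&prop->name, &count);
		translate_field(&prop->description, &count);
		if (prop->is_enum && prop->items != NULL) {
			for (ItemSketch *item = prop->items; item->identifier != NULL; item++) {
				translate_field(&item->name, &count);
				translate_field(&item->description, &count);
			}
		}
	}
	return count;
}
```

In the real code, the equivalent step would run for each extern StructRNA declared in RNA_access.h (RNA_Action, RNA_ActionConstraint, ...). It would have to run after the language is set, because gettext() translates according to the current locale.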
Language Selection
The application decides which translation to use based on both the locale and the language settings. The language options are listed in an EnumPropertyItem array named language_items in /source/blender/makesrna/intern/rna_userdef.c. The user's selection is stored in the global U.language, and the environment is set accordingly. All of this is done in /source/blender/blenfont/intern/blf_lang.c. The most important function, BLF_lang_set(), tries to set the locale and the LANG and LANGUAGE environment variables to the specified language.
This function also sets the domain binding for gettext:
void BLF_lang_set(const char *str)
{
...
if( str[0]!=0 )
{
BLI_setenv("LANG", str);
BLI_setenv("LANGUAGE", str);
}
...
setlocale(LC_ALL, str);
...
textdomain(DOMAIN_NAME);
bindtextdomain(DOMAIN_NAME, global_messagepath);
...
}
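For reference, here is a compilable sketch of the same steps outside Blender. lang_set_sketch() is hypothetical, and the DOMAIN_NAME value "blender" is an assumption based on the names blender.pot and blender.mo. POSIX setenv() stands in for Blender's BLI_setenv(), and the elided parts of BLF_lang_set() are not reproduced. The sketch also reports whether setlocale() succeeded, since it returns NULL for a locale that is not installed.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stdlib.h>

/* Assumption: the domain name matches blender.pot and blender.mo. */
#define DOMAIN_NAME "blender"

/*
 * Hypothetical stand-alone version of the steps shown in BLF_lang_set().
 * lang: a locale name such as "zh_CN", or "" to use the user's environment.
 * messagepath: directory containing <lang>/LC_MESSAGES/blender.mo.
 * Returns 1 if the C library accepted the locale, 0 if it is not available.
 */
int lang_set_sketch(const char *lang, const char *messagepath)
{
	int locale_ok;

	assert(lang != NULL && messagepath != NULL);
	if (lang[0] != '\0') {
		setenv("LANG", lang, 1);      /* overwrite any existing value */
		setenv("LANGUAGE", lang, 1);  /* priority list gettext consults */
	}
	/* setlocale() returns NULL when the locale is not available */
	locale_ok = setlocale(LC_ALL, lang) != NULL;

	textdomain(DOMAIN_NAME);
	bindtextdomain(DOMAIN_NAME, messagepath);
	return locale_ok;
}
```

One detail is worth knowing: GNU gettext ignores LANGUAGE when the locale is "C". A failed setlocale() call can therefore silently disable translation even though LANGUAGE is set.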
Font Loading
Blender draws its UI with OpenGL, so once the other issues are settled, the last problem is how to render international text, such as Chinese, in the UI. The main point is to load a suitable font when uiStyleInit() is called in /source/blender/editors/interface/interface_style.c. We found Unifont to be a good choice, since its glyphs cover nearly all commonly used Unicode characters.
However, the TTF file takes more than 10 MB. To keep Blender as lightweight as possible, we decided to gzip it, so the application has to decompress it at runtime. This is done by BLI_ungzip_to_mem() in /source/blender/blenlib/intern/fileops.c:
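#include <stdio.h>  /* EOF */
#include <zlib.h>   /* gzopen(), gzread(), gzclose() */

/*
 * Decompress the gzip file from_file into the memory at to_mem,
 * writing at most size bytes.
 *
 * Returns the decompressed size, or EOF on error.
 */
int BLI_ungzip_to_mem(const char *from_file, char *to_mem, const int size)
{
	gzFile gzfile;
	int readsize;

	gzfile = gzopen(from_file, "rb");
	if (gzfile == NULL)
		return EOF;  /* missing or unreadable file */

	readsize = gzread(gzfile, to_mem, size);
	if (readsize < 0)
		readsize = EOF;

	gzclose(gzfile);  /* release the zlib stream and the file handle */
	return readsize;
}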
The get_datatoc_bunifont_ttf() function in /source/blender/editors/datafiles calls this function and returns the font data it has read. uiStyleInit() then finishes loading the Unifont TTF:
font->blf_id= BLF_load_mem_unique(
"default",
(unsigned char *)get_datatoc_bunifont_ttf(),
datatoc_bunifont_ttf_size);
Garlic currently uses Unifont as the default font instead of the old bfont. The language selection interface must display almost every language, and a TTF covering a single language cannot do that well.
With these preparations in place, international text is now displayed properly.