Metrics are important information for localization projects. One of the most significant is word count, since most of the cost is based on how many words there are to translate.

There is currently no common way to count words. Each tool uses its own set of rules when estimating projects. The differences come mostly from whether certain characters (such as the apostrophe, dash, period, semicolon and colon) are treated as word breaks or not.

For example, the text segment:

abc abc-abc abc;abc abc:abc abc'abc abc=abc 123 @#$

gives the following results:

Tool                     Word Count
Catalyst 5 build 2040    13
DéjàVu 3.0.18            13
DOS wc utility           9
ForeignDesk 5.7.1        10
Rainbow 2.01 build 14    10
SDLX 4.0                 12
Trados 5 build 217       11
Word 2000                9
Wordfast 3.35b           12

In projects comprising dozens or hundreds of files, this type of difference leads to very significant discrepancies when estimating the scope of the work, its cost and even its duration.
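
To make the effect of word-break rules concrete, here is a minimal sketch in Python. The function count_words and the four rule sets are hypothetical, chosen only for illustration: they are not the rules of any tool listed above and are not meant to reproduce its figures. The sketch only shows that the same segment yields different counts as more characters are treated as word breaks.

```python
import re

SEGMENT = "abc abc-abc abc;abc abc:abc abc'abc abc=abc 123 @#$"

def count_words(text, break_chars=""):
    """Count tokens separated by whitespace or by any character in break_chars."""
    # Character class: whitespace plus the extra break characters, escaped.
    pattern = "[\\s" + re.escape(break_chars) + "]+"
    # re.split can return empty strings at the edges, so filter them out.
    return len([token for token in re.split(pattern, text) if token])

# Hypothetical rule sets (assumptions, not taken from any real tool).
RULES = {
    "whitespace only": "",
    "plus dash": "-",
    "plus dash, semicolon, colon": "-;:",
    "plus dash, semicolon, colon, apostrophe, equals": "-;:'=",
}

for name, chars in RULES.items():
    print(f"{name}: {count_words(SEGMENT, chars)}")
# whitespace only: 8
# plus dash: 9
# plus dash, semicolon, colon: 11
# plus dash, semicolon, colon, apostrophe, equals: 13
```

Even in this simplified model the count for a single short segment ranges from 8 to 13, and the sketch does not yet decide whether "123" or "@#$" should count as words at all.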

Another problem is how to deal with numbers: not only whether they should be counted as words, but also how to recognize formatted numbers. For example, the number "54,321" is written "54 321" in French, so a tool that breaks on spaces sees two tokens instead of one.
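
The number problem can be sketched the same way. The pattern NUMBER and the function count_words_number_aware below are hypothetical: they assume that digits grouped in threes and separated by a comma, a period, a space or a no-break space form a single number. This is one possible convention, not an established rule.

```python
import re

# Assumed convention: 1 to 3 digits, then groups of exactly 3 digits, each
# preceded by a comma, period, space, no-break space (U+00A0) or narrow
# no-break space (U+202F), with an optional decimal part. Plain digit runs
# such as "123" or "3.14" are handled by the second alternative.
NUMBER = re.compile(
    r"\d{1,3}(?:[ \u00a0\u202f.,]\d{3})+(?!\d)(?:[.,]\d+)?"
    r"|\d+(?:[.,]\d+)?"
)

def count_words_naive(text):
    """Count whitespace-separated tokens."""
    return len(text.split())

def count_words_number_aware(text):
    """Collapse each recognized number into one token, then count tokens."""
    collapsed = NUMBER.sub("0", text)
    return len(collapsed.split())

print(count_words_naive("54,321"))         # 1
print(count_words_naive("54 321"))         # 2
print(count_words_number_aware("54 321"))  # 1
```

The sketch is deliberately naive: it would also merge two unrelated numbers that happen to look like a grouped one, as in "3 120", which is the kind of ambiguity a common algorithm would have to settle.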

There is probably no definitively correct or incorrect way to decide exactly what the word count should be. What is important is that, at some point, the translation industry establish a common algorithm for word count and that tools (at least localization and translation tools) implement it.

Word counts are only part of a larger set of metrics. OSCAR, the LISA standards committee, has started a working group on this topic. The GILT Metrics (GMX) will most likely be developed into a tripartite specification covering Volume (GMX-V), Complexity (GMX-C) and Quality (GMX-Q).