sebix

Automatic hyphenation

2012-04-15, 17:46

Reasons for full justification

Some weeks ago I began to play around with full justification of text outside LaTeX. Fully justified text is used by professional typesetters and can be found in almost every newspaper. Full justification is necessary there, because otherwise the reader would lose orientation among all the narrow columns. It also appears more and more on the Internet, as it looks more professional and is far easier to read, in particular when paragraphs are narrow. This is the case on mobile devices and in multi-column layouts using HTML5.

Responsive Design

There’s an important link to responsive web design, as justification depends essentially on the display width. If a fully justified paragraph is hyphenated with the usual hyphen -, it works only for one specific text or box width and usually fails as soon as the rendering conditions change slightly. Not only in responsive design is all text expected to scale properly and to be displayed in a readable and consistent fashion, so a “hard hyphen” - is a very bad idea.

Some time ago I found the website of a professional marketing group that had a short (fully justified) paragraph at the top of its homepage. But they used just a normal hyphen -, which results in ugly gaps between the words when zooming in or out. This is of course not a good way to create justified paragraphs and can leave a bad impression!

Typography

I already mentioned TeX above, which has a great (maybe the best?) hyphenation algorithm. Anyway, it’s the best open one ;-). This algorithm uses syllable division based on language-specific patterns. Of course there are many exceptions to these common rules, but the TeX algorithm is very well engineered and fails only rarely, in special and unpredictable cases (in these cases you can help TeX by adding your own rules).
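To make the pattern idea concrete, here is a minimal sketch in Python. It is not TeX's implementation: the handful of patterns below is only an illustrative subset in the style of TeX's English patterns, and the function names and the `left_min`/`right_min` parameters are my own assumptions. Each pattern carries digits between letters; for every gap in the word the highest digit of all matching patterns wins, and an odd result marks a permitted break.

```python
# Illustrative subset of patterns (assumption: NOT TeX's full pattern file).
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split e.g. 'hy3ph' into the letters 'hyph' and gap scores [0, 0, 3, 0, 0]."""
    letters = ""
    scores = [0]  # scores[i] is the score of the gap before letters[i]
    for ch in pattern:
        if ch.isdigit():
            scores[-1] = int(ch)
        else:
            letters += ch
            scores.append(0)
    return letters, scores


PARSED = dict(parse_pattern(p) for p in PATTERNS)


def hyphenation_points(word, left_min=2, right_min=3):
    """Return the indices k where a break before word[k] is permitted."""
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)
    # Try every substring; where a pattern matches, keep the maximum per gap.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            pattern_scores = PARSED.get(padded[start:end])
            if pattern_scores:
                for offset, value in enumerate(pattern_scores):
                    scores[start + offset] = max(scores[start + offset], value)
    # The gap before word[k] sits at scores[k + 1] because of the leading dot.
    # Odd scores allow a break; left_min/right_min protect the word edges.
    return [k for k in range(left_min, len(word) - right_min + 1)
            if scores[k + 1] % 2 == 1]


def hyphenate(word, separator="-"):
    """Join the pieces of a word with the given separator."""
    pieces = []
    previous = 0
    for point in hyphenation_points(word):
        pieces.append(word[previous:point])
        previous = point
    pieces.append(word[previous:])
    return separator.join(pieces)


print(hyphenate("hyphenation"))  # hy-phen-ation
```

With a real pattern file the nested substring loop would normally be replaced by a trie lookup, but for a sketch the brute-force search keeps the scoring rule easy to see.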

A simple hyphenation algorithm with pattern detection can be implemented easily, but I only want perfect solutions. Professional typesetting programs have their (good) solutions and take their time to process all the data. So what are the solutions for the web?

Solutions for the Web

The problem is that most browsers don’t implement syllable division algorithms, so developers and web designers have to take care of this issue themselves. There are several solutions that enable automatic hyphenation on the web. Some are implemented in JavaScript and generate the hyphenation at run or render time in the browser. In my opinion this is the worst way to solve the problem: JavaScript is required, and the hyphenation is repeated for every rendering on every client (resulting in longer rendering times), so it is less efficient than other solutions.

Others generate so-called “soft hyphens” at run or compile time (on the server by a CGI language or locally by static compilers), which are embedded in the HTML source code as the &shy; entity. These entities are hints for the browser that signal the possible syllable divisions. The browser then knows where a break inside a word is possible and justifies the paragraph perfectly depending on the block width. This method is used by acrylamid via a filter and is therefore also enabled on this homepage. It is the best method for the time being, because it is supported by all browsers! The only downside of soft hyphens is that the text becomes quite unreadable in the source code (semantic expressions are of course not affected).
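As a rough illustration of the compile-time approach, the following Python sketch inserts soft hyphens into the text of an HTML fragment while leaving tags and entities untouched. It is not acrylamid's filter: `BREAKS` is a hypothetical two-word lookup table standing in for a real hyphenator, and `soften_word` and `soften_html` are names I made up for this example.

```python
import re

SOFT_HYPHEN = "\u00ad"  # the character that the HTML entity &shy; stands for

# Hypothetical break-point table (assumption); a real filter would ask a
# pattern-based hyphenator for these positions instead.
BREAKS = {"justification": [3, 5, 7, 9], "hyphenation": [2, 6]}

# Group 1: a tag or an entity (copied through unchanged).
# Group 2: a run of letters, i.e. a word inside the text.
TOKEN = re.compile(r"(<[^>]*>|&#?\w+;)|([^\W\d_]+)")


def soften_word(word):
    """Insert soft hyphens at the known break points of a single word."""
    points = BREAKS.get(word.lower(), [])
    starts = [0] + points
    ends = points + [len(word)]
    return SOFT_HYPHEN.join(word[a:b] for a, b in zip(starts, ends))


def soften_html(html):
    """Soften every word in the text, skipping tags and entities."""
    def replace(match):
        if match.group(1) is not None:
            return match.group(1)
        return soften_word(match.group(2))
    return TOKEN.sub(replace, html)


result = soften_html('<p class="justification">Full justification</p>')
print(result.replace(SOFT_HYPHEN, "&shy;"))
# <p class="justification">Full jus&shy;ti&shy;fi&shy;ca&shy;tion</p>
```

The regular expression is deliberately simple: it does not know about script or style elements, so a real filter should rely on a proper HTML parser rather than on this shortcut.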

CSS3

The best news is an upcoming feature of CSS3, called hyphens. It enables automatic hyphenation algorithms in the browsers, so that they can finally do the work they are supposed to do: display content properly. The feature is not yet fully implemented in every browser, so browser-specific properties have to be used. Additionally, browsers may not support all languages yet.

body {
  -moz-hyphens: auto;
  -ms-hyphens: auto;
  -o-hyphens: auto;
  -webkit-hyphens: auto;
  hyphens: auto;
}

For the time being, the feature is not at all reliably implemented! The best way remains adding soft hyphens until every browser has a good hyphenation algorithm. Nevertheless, I added this code to my stylesheet; maybe it helps for some renderings.

The current state of the art can be tested on a private test page. The specification is, as usual, at W3.org in the section CSS Text Level 3.

Creative Commons Attribution 3.0 Unported License