Introduction
The Russian ruble has a standard sign which looks like this: ₽.

It's short and understandable, so you would want to display it on your website if you're doing commerce in Russia. Unfortunately, this symbol is not widely adopted yet and the search engines you're optimizing for might not understand that you're selling something.
One way to solve this is to make a drop-in font which would contain a ligature, rendering all "руб" strings (standing for "ruble") as "₽". This way you could put the textual old-school representation of the currency in your markup, but visually it would look modern and slick.

The new font should only contain the ligature, so that it can be applied on top of the default font, using the default font (without the ligature) as a fallback.
Yes, this is not the easiest solution, but bear with me for the sake of the post. This is an interesting method because it provides an excuse to research how the fonts are made.
In this post, I will be dissecting and modifying one font in particular — Roboto by Google, but generally, the method should apply to any font.
Testing the water
After quick googling, skimming, and link following I have found a post by Roel Nieskens which is a guide about adding a custom ligature to a font. This is almost exactly what I need, I just need to remove everything else besides the ligature. This is a great starting point.

The post advises you to decode the font's TTF file into "TTX" using fonttools, a Python library that also contains some CLI utilities. A TTX file is an XML representation of the font. You can edit the XML and then encode it back into TTF.
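A minimal sketch of that round trip, assuming the ttx command-line tool from fonttools is installed and on the PATH. The file names and the helper functions are hypothetical; the only thing relied on is that ttx picks the conversion direction from the input file and accepts -o for the output path.

```python
import subprocess
from pathlib import Path


def ttx_command(source: Path, output: Path) -> list[str]:
    """Build a ttx invocation; ttx decides the direction from the input file."""
    return ["ttx", "-o", str(output), str(source)]


def convert(source: Path, output: Path) -> None:
    """Run ttx and raise CalledProcessError if the conversion fails."""
    subprocess.run(ttx_command(source, output), check=True)


def round_trip(ttf: Path, edited_ttf: Path) -> Path:
    """Decompile a TTF to TTX, leave room for manual edits, then compile it back."""
    ttx = ttf.with_suffix(".ttx")
    convert(ttf, ttx)          # TTF -> TTX (XML)
    # ... edit the XML file at this point ...
    convert(ttx, edited_ttf)   # TTX -> TTF
    return edited_ttf
```

Calling round_trip(Path("Roboto-Regular.ttf"), Path("Roboto-Edited.ttf")) would leave Roboto-Regular.ttx next to the original for inspection.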
As it turns out, OpenType fonts (those with an .otf or .ttf file extension) are structured into separate sections called tables. If you look at the TTX file of Roboto Regular, you'll see these tables as the direct children of the root element:
<?xml version="1.0" encoding="UTF-8"?>
<ttFont sfntVersion="\x00\x01\x00\x00" ttLibVersion="4.22">
<GlyphOrder><!-- content omitted --></GlyphOrder>
<head><!-- content omitted --></head>
<hhea><!-- content omitted --></hhea>
<maxp><!-- content omitted --></maxp>
<OS_2><!-- content omitted --></OS_2>
<hmtx><!-- content omitted --></hmtx>
<hdmx><!-- content omitted --></hdmx>
<cmap><!-- content omitted --></cmap>
<fpgm><!-- content omitted --></fpgm>
<prep><!-- content omitted --></prep>
<cvt><!-- content omitted --></cvt>
<loca><!-- content omitted --></loca>
<glyf><!-- content omitted --></glyf>
<name><!-- content omitted --></name>
<post><!-- content omitted --></post>
<gasp><!-- content omitted --></gasp>
<GDEF><!-- content omitted --></GDEF>
<GPOS><!-- content omitted --></GPOS>
<GSUB><!-- content omitted --></GSUB>
</ttFont>
Each table has a short, usually unintelligible name, and those names are not always consistent with each other: some are 4 letters long, one is 3 letters long; most are lower case, some are upper case. (The exception is GlyphOrder, which is not a table but a utility element generated by the TTF decompiler.) This inconsistency is an indicator of the long history of the format.
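To get the same overview without scrolling through a large file, the root element's children can be listed programmatically. This is a small sketch using Python's standard xml.etree.ElementTree; the TTX_SAMPLE string is a heavily shortened stand-in for a real dump, and table_tags is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for a real TTX dump (contents of the tables omitted).
TTX_SAMPLE = """
<ttFont ttLibVersion="4.22">
  <GlyphOrder/>
  <head/>
  <OS_2/>
  <cmap/>
  <glyf/>
  <GSUB/>
</ttFont>
"""


def table_tags(ttx_xml: str) -> list[str]:
    """Return the names of the table elements, skipping the GlyphOrder helper."""
    root = ET.fromstring(ttx_xml)
    return [child.tag for child in root if child.tag != "GlyphOrder"]


print(table_tags(TTX_SAMPLE))
```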
Some of the interesting tables, discovered with the help of Roel's post, are:
glyf
The glyf table contains information about how to draw glyphs, that is, pictures of characters. In the TTX file, the <glyf> element is filled with <TTGlyph> elements, one for each glyph in the font.
For example, this is the glyph for the comma:
<TTGlyph name="comma" xMin="29" yMin="-290" xMax="308" yMax="219">
<contour>
<pt x="134" y="-290" on="1"/>
<pt x="29" y="-218" on="1"/>
<pt x="123" y="-87" on="0"/>
<pt x="127" y="52" on="1"/>
<pt x="127" y="219" on="1"/>
<pt x="308" y="219" on="1"/>
<pt x="308" y="74" on="1"/>
<pt x="308" y="-27" on="0"/>
<pt x="209" y="-229" on="0"/>
</contour>
<instructions>
<assembly>
SVTCA[0] /* SetFPVectorToAxis */
PUSHB[ ] /* 1 value pushed */
9
MDAP[1] /* MoveDirectAbsPt */
PUSHB[ ] /* 2 values pushed */
4 5
PUSHB[ ] /* 1 value pushed */
10
CALL[ ] /* CallFunction */
IF[ ] /* If */
POP[ ] /* PopTopStack */
MDRP[11000] /* MoveDirectRelPt */
ELSE[ ] /* Else */
MIRP[10100] /* MoveIndirectRelPt */
EIF[ ] /* EndIf */
PUSHB[ ] /* 1 value pushed */
0
MDRP[10000] /* MoveDirectRelPt */
PUSHB[ ] /* 1 value pushed */
0
MDAP[1] /* MoveDirectAbsPt */
IUP[0] /* InterpolateUntPts */
IUP[1] /* InterpolateUntPts */
</assembly>
</instructions>
</TTGlyph>
As you can see, it contains the metrics of the glyph, its outline as a contour of points, and a hinting program in a language resembling assembly. Fortunately, we don't have to deal with this, because Roboto already contains all the glyphs we need. We won't add new glyphs.
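As a quick sanity check on how the attributes relate to the points, the sketch below reads the comma's contour and compares the extremes of its points with the declared xMin/yMin/xMax/yMax. It uses the standard xml.etree.ElementTree, omits the instructions, and the function names are hypothetical. It only shows that the numbers agree for this glyph, not how a font compiler computes bounds in general.

```python
import xml.etree.ElementTree as ET

# The comma glyph from above, with the hinting instructions left out.
COMMA = """
<TTGlyph name="comma" xMin="29" yMin="-290" xMax="308" yMax="219">
  <contour>
    <pt x="134" y="-290" on="1"/>
    <pt x="29" y="-218" on="1"/>
    <pt x="123" y="-87" on="0"/>
    <pt x="127" y="52" on="1"/>
    <pt x="127" y="219" on="1"/>
    <pt x="308" y="219" on="1"/>
    <pt x="308" y="74" on="1"/>
    <pt x="308" y="-27" on="0"/>
    <pt x="209" y="-229" on="0"/>
  </contour>
</TTGlyph>
"""

comma = ET.fromstring(COMMA)


def declared_bounds(glyph: ET.Element) -> tuple[int, ...]:
    """Bounding box as written in the TTGlyph attributes."""
    return tuple(int(glyph.get(key)) for key in ("xMin", "yMin", "xMax", "yMax"))


def point_bounds(glyph: ET.Element) -> tuple[int, ...]:
    """Bounding box of all contour points, on-curve and off-curve alike."""
    xs = [int(pt.get("x")) for pt in glyph.iter("pt")]
    ys = [int(pt.get("y")) for pt in glyph.iter("pt")]
    return (min(xs), min(ys), max(xs), max(ys))


def off_curve_count(glyph: ET.Element) -> int:
    """Points with on="0" are curve control points rather than outline points."""
    return sum(1 for pt in glyph.iter("pt") if pt.get("on") == "0")
```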
cmap
The cmap table assigns glyphs to characters (in the case of Roboto, to Unicode code points). Using this table, a renderer converts a string of text (which is usually a sequence of Unicode code points) into a series of glyphs to put one after another.
The structure of this table is pretty straightforward, though it appears twice in Roboto, apparently for better cross-platform support:
<cmap>
<tableVersion version="0"/>
<cmap_format_4 platformID="0" platEncID="3" language="0">
<map code="0x0" name="uni0000"/><!-- ???? -->
<map code="0x2" name="uni0002"/><!-- ???? -->
<map code="0xd" name="uni000D"/><!-- ???? -->
<map code="0x20" name="space"/><!-- SPACE -->
<map code="0x21" name="exclam"/><!-- EXCLAMATION MARK -->
<map code="0x22" name="quotedbl"/><!-- QUOTATION MARK -->
<map code="0x23" name="numbersign"/><!-- NUMBER SIGN -->
<!-- many more maps omitted -->
</cmap_format_4>
<cmap_format_4 platformID="3" platEncID="1" language="0">
<map code="0x0" name="uni0000"/><!-- ???? -->
<map code="0x2" name="uni0002"/><!-- ???? -->
<map code="0xd" name="uni000D"/><!-- ???? -->
<map code="0x20" name="space"/><!-- SPACE -->
<map code="0x21" name="exclam"/><!-- EXCLAMATION MARK -->
<map code="0x22" name="quotedbl"/><!-- QUOTATION MARK -->
<map code="0x23" name="numbersign"/><!-- NUMBER SIGN -->
<!-- many more maps omitted -->
</cmap_format_4>
</cmap>
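The lookup a renderer performs can be sketched in a few lines. The example below is an illustration, not a real text shaper: it uses the standard xml.etree.ElementTree, a cmap fragment reduced to the characters relevant for this post, and hypothetical helper names. Characters missing from the map fall back to .notdef, the conventional name of the "missing glyph".

```python
import xml.etree.ElementTree as ET

# A cmap fragment with only the characters this post cares about.
CMAP = """
<cmap>
  <tableVersion version="0"/>
  <cmap_format_4 platformID="3" platEncID="1" language="0">
    <map code="0x2e" name="period"/>
    <map code="0x431" name="uni0431"/>
    <map code="0x440" name="uni0440"/>
    <map code="0x443" name="uni0443"/>
    <map code="0x20bd" name="uni20BD"/>
  </cmap_format_4>
</cmap>
"""


def load_cmap(cmap_xml: str, platform_id: str = "3", encoding_id: str = "1") -> dict[int, str]:
    """Return {code point: glyph name} for the requested cmap subtable."""
    for subtable in ET.fromstring(cmap_xml):
        if subtable.get("platformID") == platform_id and subtable.get("platEncID") == encoding_id:
            return {int(m.get("code"), 16): m.get("name") for m in subtable.iter("map")}
    raise LookupError(f"no cmap subtable for platform {platform_id}, encoding {encoding_id}")


def text_to_glyphs(text: str, cmap: dict[int, str]) -> list[str]:
    """Map each character to a glyph name, using .notdef for unmapped characters."""
    return [cmap.get(ord(ch), ".notdef") for ch in text]


cmap = load_cmap(CMAP)
# "\u0440\u0443\u0431." is the Cyrillic string "руб." spelled with escapes.
print(text_to_glyphs("\u0440\u0443\u0431.", cmap))
```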
GSUB
This table contains rules for substitution of some glyphs with other glyphs (as we will later discover — some of the substitutions only apply in certain contexts or certain languages). That includes ligatures, each of which is a substitution of a sequence of glyphs with a single glyph.
This table is much more complicated than the previous ones, but if you search for the LigatureSubst element, you'll find ligature definitions deep inside of it. There are several of those; here is one example:
<Lookup index="8">
<LookupType value="4"/>
<LookupFlag value="0"/>
<!-- SubTableCount=1 -->
<LigatureSubst index="0">
<LigatureSet glyph="f">
<Ligature components="f,i" glyph="uniFB03"/>
<Ligature components="i" glyph="uniFB01"/>
</LigatureSet>
</LigatureSubst>
</Lookup>
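To make the semantics of such a lookup concrete, here is a simplified simulation of how a ligature substitution could be applied to a sequence of glyph names. It is a sketch under assumptions: a real shaper also honors lookup flags, features and scripts, and the function names are hypothetical. The assumption that the first matching Ligature in a set wins reflects my understanding that ligatures within a set are ordered by preference.

```python
import xml.etree.ElementTree as ET

LIGATURE_SUBST = """
<LigatureSubst index="0">
  <LigatureSet glyph="f">
    <Ligature components="f,i" glyph="uniFB03"/>
    <Ligature components="i" glyph="uniFB01"/>
  </LigatureSet>
</LigatureSubst>
"""


def load_ligatures(xml_text: str) -> dict[str, list[tuple[list[str], str]]]:
    """Return {first glyph: [(remaining glyphs, replacement glyph), ...]}."""
    sets = {}
    for lig_set in ET.fromstring(xml_text).iter("LigatureSet"):
        sets[lig_set.get("glyph")] = [
            (lig.get("components").split(","), lig.get("glyph"))
            for lig in lig_set.iter("Ligature")
        ]
    return sets


def apply_ligatures(glyphs: list[str], sets: dict) -> list[str]:
    """Scan left to right; the first ligature that matches at a position wins."""
    result = []
    i = 0
    while i < len(glyphs):
        for rest, replacement in sets.get(glyphs[i], []):
            if glyphs[i + 1 : i + 1 + len(rest)] == rest:
                result.append(replacement)
                i += 1 + len(rest)
                break
        else:
            # No ligature starts here: keep the glyph as-is.
            result.append(glyphs[i])
            i += 1
    return result


sets = load_ligatures(LIGATURE_SUBST)
print(apply_ligatures(["o", "f", "f", "i", "c", "e"], sets))
```

Note that the components attribute lists only the glyphs after the first one, which is why "f,i" combined with the set's glyph "f" describes the three-glyph "ffi" sequence.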
Entering the fight with bare hands
Let's add the ligatures by hand. Following Roel's advice, I have added two ligatures (one with a trailing period and one without) into one of the Lookup records of type 4. Type 4 means that the lookup contains ligatures. You can see that the lookup in the example above is also of type 4.
The ligature set looked like this:
<LigatureSet glyph="uni0440">
<Ligature components="uni0443,uni0431,period" glyph="uni20BD"/>
<Ligature components="uni0443,uni0431" glyph="uni20BD"/>
</LigatureSet>
The glyph attribute on the root element corresponds to the first glyph in the replaced sequence. Then each Ligature element specifies all the remaining glyphs in components and the glyph that they are replaced with in glyph.
- uni0440 is the name of the glyph for the Russian lower case letter "р",
- uni0443 is the letter "у",
- uni0431 is the letter "б",
- period is, well, a period,
- and uni20BD is the ruble symbol ₽.
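Writing such a set by hand is error-prone, so it can also be generated. The sketch below builds LigatureSet elements from plain strings with the standard xml.etree.ElementTree. The GLYPH_NAMES table is hard-coded for this example (in a real script it would come from the cmap table), the function name is hypothetical, and sorting longer sequences first is my assumption about the safest order, matching the hand-written set above.

```python
import xml.etree.ElementTree as ET

# Character -> glyph name, hard-coded here for the five glyphs involved.
GLYPH_NAMES = {
    "\u0440": "uni0440",  # Cyrillic er
    "\u0443": "uni0443",  # Cyrillic u
    "\u0431": "uni0431",  # Cyrillic be
    ".": "period",
    "\u20bd": "uni20BD",  # ruble sign
}


def build_ligature_sets(ligatures: list[tuple[str, str]]) -> list[ET.Element]:
    """Group ligatures by their first glyph and emit one LigatureSet per group."""
    by_first: dict[str, list[tuple[list[str], str]]] = {}
    for source, target in ligatures:
        first, *rest = [GLYPH_NAMES[ch] for ch in source]
        by_first.setdefault(first, []).append((rest, GLYPH_NAMES[target]))

    elements = []
    for first, entries in by_first.items():
        lig_set = ET.Element("LigatureSet", glyph=first)
        # Longer sequences first, so "rub." is tried before "rub".
        for rest, target in sorted(entries, key=lambda entry: len(entry[0]), reverse=True):
            ET.SubElement(lig_set, "Ligature", components=",".join(rest), glyph=target)
        elements.append(lig_set)
    return elements


ruble_sets = build_ligature_sets([
    ("\u0440\u0443\u0431", "\u20bd"),
    ("\u0440\u0443\u0431.", "\u20bd"),
])
print(ET.tostring(ruble_sets[0], encoding="unicode"))
```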
After compiling the TTX file back into TTF and loading it onto a test web page, you'll see that these ligatures... work! This was inspiringly simple. In fact, you can use the produced font as a replacement for the original Roboto to achieve the desired effect. But we're not looking for a font modification. We're looking for a drop-in addon, which can be easily applied on top of the original Roboto and just as easily disabled.
For that we need to make sure that the resulting font is as slim as possible, containing ideally only 5 glyphs and 2 ligatures.
Also, a note to myself: make sure to change the name of the font in the name table. Otherwise, Chrome DevTools doesn't distinguish between the original and the new font.

Securing the victory...?
To slim down the font we have to look through the TTX and delete everything that doesn't apply to the glyphs we have used. Some tables are simple to clean up: GlyphOrder, cmap, hmtx, hdmx and glyf are just lists with an item for each glyph. Other tables are small and seemingly only contain metadata, so they can be left unedited. The GDEF table seems to be pretty simple to clean up too.
I am trying to remove extra elements by hand to see if I can make it work. During the work, I get compilation errors from the ttx tool, because I have deleted a glyph but it is referenced in other parts of the font. Investigating one of these errors reveals a problem that slightly complicates things.
This is the glyph for the Russian letter "у" in the glyf table.
<TTGlyph name="uni0443" xMin="22" yMin="-437" xMax="944" yMax="1082">
<component glyphName="y" x="0" y="0" flags="0x204"/>
</TTGlyph>
As you can see, it doesn't contain the drawing of the glyph, it only copies the drawing of the Latin letter "y". But I have deleted the Latin "y" everywhere! I should have kept it and all the other glyphs which are referenced this way. Scratch everything, start over.
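Finding every glyph that must be kept is a small graph problem: follow the component references until no new glyph names turn up. Below is a sketch with the standard xml.etree.ElementTree. The GLYF sample is a trimmed-down stand-in in which outlines are omitted, and the component entry for uni0440 is an assumption modeled on the uni0443 record above; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Trimmed-down stand-in for the glyf table; outlines are omitted.
GLYF = """
<glyf>
  <TTGlyph name="p"><contour/></TTGlyph>
  <TTGlyph name="y"><contour/></TTGlyph>
  <TTGlyph name="period"><contour/></TTGlyph>
  <TTGlyph name="uni0431"><contour/></TTGlyph>
  <TTGlyph name="uni20BD"><contour/></TTGlyph>
  <TTGlyph name="uni0440">
    <component glyphName="p" x="0" y="0" flags="0x204"/>
  </TTGlyph>
  <TTGlyph name="uni0443" xMin="22" yMin="-437" xMax="944" yMax="1082">
    <component glyphName="y" x="0" y="0" flags="0x204"/>
  </TTGlyph>
</glyf>
"""


def component_map(glyf_xml: str) -> dict[str, list[str]]:
    """Return {glyph name: names of the glyphs it is composed of}."""
    root = ET.fromstring(glyf_xml)
    return {
        glyph.get("name"): [c.get("glyphName") for c in glyph.findall("component")]
        for glyph in root.findall("TTGlyph")
    }


def with_dependencies(wanted: list[str], components: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk over component references, preserving discovery order."""
    keep = list(wanted)
    queue = deque(wanted)
    while queue:
        for dependency in components.get(queue.popleft(), []):
            if dependency not in keep:
                keep.append(dependency)
                queue.append(dependency)  # its own components are checked later
    return keep


components = component_map(GLYF)
print(with_dependencies(["period", "uni0431", "uni0440", "uni0443", "uni20BD"], components))
```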
On top of this, the GPOS and GSUB tables appeared to have a complex structure that can't be easily slimmed down without proper understanding. What are those ScriptList and FeatureList elements? They don't reference any glyphs. Clearing them seems to break everything. Clearly, I'm missing the full picture.
After some unmotivated searching and deletion, I have managed to produce a font that compiles but... doesn't render. The ligatures no longer work and I have no idea why. I deleted something essential without knowing or noticing.
I need to understand a little more about what I'm doing and I need a script to do all this automatically and error-proof.
Doing the homework
Microsoft has an extensive reference for the OpenType format on their website and it was of huge help. Many of the search queries about font tables and features lead here.

In the sidebar there is a page for each table of the font, describing the structure of the table, attributes of its records, etc.
For example, among the two cmap records we have seen in Roboto, the first one is a generic Unicode 2.0 mapping, and the second one is specific to Windows. Their purpose can be deduced from the platform ID and the platform encoding ID.
<cmap>
<tableVersion version="0"/>
<cmap_format_4 platformID="0" platEncID="3" language="0">
<map code="0x0" name="uni0000"/><!-- ???? -->
<map code="0x2" name="uni0002"/><!-- ???? -->
<map code="0xd" name="uni000D"/><!-- ???? -->
<map code="0x20" name="space"/><!-- SPACE -->
<map code="0x21" name="exclam"/><!-- EXCLAMATION MARK -->
<map code="0x22" name="quotedbl"/><!-- QUOTATION MARK -->
<map code="0x23" name="numbersign"/><!-- NUMBER SIGN -->
<!-- many more maps omitted -->
</cmap_format_4>
<cmap_format_4 platformID="3" platEncID="1" language="0">
<map code="0x0" name="uni0000"/><!-- ???? -->
<map code="0x2" name="uni0002"/><!-- ???? -->
<map code="0xd" name="uni000D"/><!-- ???? -->
<map code="0x20" name="space"/><!-- SPACE -->
<map code="0x21" name="exclam"/><!-- EXCLAMATION MARK -->
<map code="0x22" name="quotedbl"/><!-- QUOTATION MARK -->
<map code="0x23" name="numbersign"/><!-- NUMBER SIGN -->
<!-- many more maps omitted -->
</cmap_format_4>
</cmap>
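The deduction can be written down as a small lookup table. The sketch below covers only the four Unicode-related ID pairs I am sure about from the OpenType cmap documentation; the wording of the descriptions is mine, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# (platformID, platEncID) -> meaning, limited to the common Unicode pairs.
ENCODINGS = {
    (0, 3): "Unicode platform, Unicode 2.0+ (BMP only)",
    (0, 4): "Unicode platform, Unicode 2.0+ (full repertoire)",
    (3, 1): "Windows platform, Unicode BMP",
    (3, 10): "Windows platform, Unicode full repertoire",
}

# The two subtable headers from Roboto, with the maps left out.
CMAP_HEADERS = """
<cmap>
  <tableVersion version="0"/>
  <cmap_format_4 platformID="0" platEncID="3" language="0"/>
  <cmap_format_4 platformID="3" platEncID="1" language="0"/>
</cmap>
"""


def describe_subtables(cmap_xml: str) -> list[str]:
    """Describe each cmap subtable by its platform and encoding IDs."""
    descriptions = []
    for subtable in ET.fromstring(cmap_xml):
        if subtable.get("platformID") is None:
            continue  # <tableVersion> is not a subtable
        key = (int(subtable.get("platformID")), int(subtable.get("platEncID")))
        descriptions.append(ENCODINGS.get(key, f"not covered by this sketch: {key}"))
    return descriptions


print(describe_subtables(CMAP_HEADERS))
```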

As for the GPOS and GSUB tables, they share a similar 4-level hierarchy, which is clearly described in the Microsoft reference too: scripts, language systems, features, and lookups.

Both of these tables list certain features of the font. GPOS lists features related to the positioning of the glyphs (like kerning), and GSUB lists features related to the substitution of glyphs. Each feature in these tables can be enabled for certain language systems and certain scripts, thus their place in the hierarchy.
Scripts
This level corresponds to the ScriptList element in TTX. It contains the list of scripts for which the font enables features. Each script is identified by a script tag: for example, latn stands for the Latin script, cyrl for Cyrillic, and grek for Greek. DFLT is the special tag for the default script, which is used to enable features for all scripts.
For example, Roboto enables "fi" and "fl" ligatures for Latin script, but not for other scripts.
Language Systems
Each script record is split into language system records, which usually correspond to languages. Language systems are also identified by tags: "FRA " stands for French and so on. There is a special record for the "default language system", which means "any language system".
Generally, most of Roboto's features are enabled in all language systems in all scripts. But there are exceptions. As a curious example, it replaces the "S with cedilla" glyph with "S with comma below" in the Romanian language system in Latin script. I think it does so for consistency because both spellings are used interchangeably in the Romanian language.

Features
Now, language system records list features that should be enabled for the given language system. Features are defined in a separate element of the TTX, FeatureList, and language systems reference features by their indices in the list.
Features are identified by tags as well, and OpenType supports a lot of features. "Ligatures" is not just one feature of the font, there are many features used for ligatures of different kinds. These are just some of the more generic ones:
- ccmp ("Glyph Composition/Decomposition") is used in Roboto to handle Unicode's combining characters, like accents. An accent glyph placed after the capital A, for example, produces a dedicated glyph of "A with an accent".
- clig ("Contextual Ligatures") is used for ligatures in certain contexts (i.e. surrounded or not surrounded by certain characters).
- dlig ("Discretionary Ligatures") covers ligatures that "may be used for special effects at user's preference".
- hlig ("Historical Ligatures") can be used to optionally give the font a historical, outdated look. It is not used in Roboto, but it's curious that font vendors have this ability.
- liga ("Standard Ligatures") is the most generic one. Microsoft docs list "ffl" as an example of a ligature that this feature can be used for. And Roboto uses it exactly for that: "ffi", "fi", "ffl" and "fl".
By the way, did you know that you can control which font features are enabled using CSS?
Lookups
Finally, the LookupList element of the TTX. Feature records are pretty shallow: they specify their function in the tag, but the data for the feature, that is, which glyphs to replace with which glyphs (in the GSUB table) or how to position the glyphs (in the GPOS table), is listed in lookups.
Lookups are not identified by tags; they are identified by numeric type IDs, and these IDs have different meanings in the GPOS and GSUB tables. As we have covered before, type 4 stands for non-contextual ligatures in GSUB, and for now this is the only type we need to know.
Generally, lookups are just lists of glyphs or substitutions, or ligatures. Each type has its own structure and since it's the lowest level of the hierarchy, it doesn't reference other parts of the font, except for glyphs, so it's usually easy to understand just by looking at the TTX.
Attempting a more civilized approach
After all that research, I have come up with a Python script that parses the TTX using lxml and produces a lightweight copy of the font containing the ligatures. I won't show the code itself, because it's not very tidy, but here is the general algorithm.
- Have the desired ligatures listed like this:
LIGATURES = [
("руб.", "₽"),
("руб", "₽"),
]
- Extract all the character code points used in the LIGATURES above.
print(used_codepoints)
# [1073, 8381, 1088, 1091, 46]
- Parse the cmap table in the font and convert code points from the previous step into glyph names.
print(used_glyphs)
# ['period', 'uni0431', 'uni0440', 'uni0443', 'uni20BD']
- Parse the glyf table and extract all the glyphs referenced by the glyphs above. Do this in a loop, to make sure transitive dependencies are covered. Now we have a definitive list of glyphs we need to keep in the font.
print(used_glyphs_with_dependencies)
# ['period', 'uni0431', 'uni0440', 'uni0443', 'uni20BD', 'p', 'y']
- Create the TTX file for the new font and populate the new GlyphOrder element with the used glyphs.
<GlyphOrder>
<GlyphID id="0" name="period"/>
<GlyphID id="1" name="uni0431"/>
<GlyphID id="2" name="uni0440"/>
<GlyphID id="3" name="uni0443"/>
<GlyphID id="4" name="uni20BD"/>
<GlyphID id="5" name="p"/>
<GlyphID id="6" name="y"/>
</GlyphOrder>
- Copy many of the tables as-is, because they don't reference glyphs and they are small anyway: head, hhea, maxp, OS_2, fpgm, prep, cvt, loca, post, gasp.
- Parse the hmtx table and only keep records for the glyphs we need. Same with hdmx, cmap, glyf.
<hmtx>
<mtx name="p" width="1149" lsb="140"/>
<mtx name="period" width="539" lsb="144"/>
<mtx name="uni0431" width="1132" lsb="97"/>
<mtx name="uni0440" width="1149" lsb="140"/>
<mtx name="uni0443" width="969" lsb="22"/>
<mtx name="uni20BD" width="1359" lsb="31"/>
<mtx name="y" width="969" lsb="22"/>
</hmtx>
<hdmx>
<hdmxData>
ppem: 9 ;
p: 5 ;
period: 2 ;
uni0431: 5 ;
uni0440: 5 ;
uni0443: 4 ;
uni20BD: 6 ;
y: 4 ;
</hdmxData>
</hdmx>
<cmap>
<tableVersion version="0"/>
<cmap_format_4 platformID="0" platEncID="3" language="0">
<map code="0x2e" name="period"/>
<map code="0x70" name="p"/>
<map code="0x79" name="y"/>
<map code="0x431" name="uni0431"/>
<map code="0x440" name="uni0440"/>
<map code="0x443" name="uni0443"/>
<map code="0x20bd" name="uni20BD"/>
</cmap_format_4>
<cmap_format_4 platformID="3" platEncID="1" language="0">
<map code="0x2e" name="period"/>
<map code="0x70" name="p"/>
<map code="0x79" name="y"/>
<map code="0x431" name="uni0431"/>
<map code="0x440" name="uni0440"/>
<map code="0x443" name="uni0443"/>
<map code="0x20bd" name="uni20BD"/>
</cmap_format_4>
</cmap>
<glyf>
<!-- omitted for brevity -->
</glyf>
- Parse the name table and replace "Roboto" with something along the lines of "Roboto-RubleLigature".
<name>
<namerecord nameID="0" platformID="3" platEncID="1" langID="0x409">
Copyright 2011 Google Inc. All Rights Reserved.
</namerecord>
<namerecord nameID="1" platformID="3" platEncID="1" langID="0x409">
Roboto-RubleLigature
</namerecord>
<namerecord nameID="2" platformID="3" platEncID="1" langID="0x409">
Regular
</namerecord>
<namerecord nameID="3" platformID="3" platEncID="1" langID="0x409">
Roboto-RubleLigature
</namerecord>
<namerecord nameID="4" platformID="3" platEncID="1" langID="0x409">
Roboto-RubleLigature
</namerecord>
<namerecord nameID="5" platformID="3" platEncID="1" langID="0x409">
Version 2.137; 2017
</namerecord>
<namerecord nameID="6" platformID="3" platEncID="1" langID="0x409">
Roboto-RubleLigature-Regular
</namerecord>
<namerecord nameID="7" platformID="3" platEncID="1" langID="0x409">
Roboto is a trademark of Google.
</namerecord>
<namerecord nameID="9" platformID="3" platEncID="1" langID="0x409">
Google
</namerecord>
<namerecord nameID="11" platformID="3" platEncID="1" langID="0x409">
Google.com
</namerecord>
<namerecord nameID="12" platformID="3" platEncID="1" langID="0x409">
Christian Robertson
</namerecord>
<namerecord nameID="13" platformID="3" platEncID="1" langID="0x409">
Licensed under the Apache License, Version 2.0
</namerecord>
<namerecord nameID="14" platformID="3" platEncID="1" langID="0x409">
http://www.apache.org/licenses/LICENSE-2.0
</namerecord>
</name>
- Traverse the trees of the GDEF and GPOS tables, removing elements that reference glyphs that we don't need. This produces a lot of empty elements, in particular empty and useless lookups in GPOS. But I'm not removing them, because lookups are referenced by their indices, and if we remove a lookup, the indices of its subsequent siblings will shift down. So it's easier to keep them for padding.
- Do not copy GSUB, but produce a new GSUB with nothing but the ligatures:
<GSUB>
<Version value="0x00010000"/>
<ScriptList>
<ScriptRecord index="0">
<ScriptTag value="DFLT"/>
<Script>
<DefaultLangSys>
<ReqFeatureIndex value="65535"/>
<FeatureIndex index="0" value="0"/>
</DefaultLangSys>
</Script>
</ScriptRecord>
</ScriptList>
<FeatureList>
<FeatureRecord index="0">
<FeatureTag value="liga"/>
<Feature>
<LookupListIndex index="0" value="0"/>
</Feature>
</FeatureRecord>
</FeatureList>
<LookupList>
<Lookup index="0">
<LookupType value="4"/>
<LookupFlag value="0"/>
<LigatureSubst index="0">
<LigatureSet glyph="uni0440">
<Ligature components="uni0443,uni0431,period" glyph="uni20BD"/>
<Ligature components="uni0443,uni0431" glyph="uni20BD"/>
</LigatureSet>
</LigatureSubst>
</Lookup>
</LookupList>
</GSUB>
- That's it. Close the TTX file.
The best part of this script is that the resulting TTX compiles into a perfectly functional font! It works just as intended and it weighs about 7.5 KB against 165 KB of the original Roboto. Maybe you could trim a few more bytes by removing the extra lookups and features in GPOS, but for now, it's perfect!
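Since the original script isn't shown, here is a rough sketch of two of its simpler steps, written from the description above rather than taken from the author's code: collecting the code points used by the ligatures, and pruning a per-glyph table down to the glyphs we keep. It uses the standard xml.etree.ElementTree instead of lxml, the function names are hypothetical, and the comma entry in the sample has placeholder metrics.

```python
import xml.etree.ElementTree as ET

# The same ligatures as in the first step, spelled with escapes:
# "\u0440\u0443\u0431" is the Cyrillic "руб", "\u20bd" is the ruble sign.
LIGATURES = [
    ("\u0440\u0443\u0431.", "\u20bd"),
    ("\u0440\u0443\u0431", "\u20bd"),
]

HMTX = """
<hmtx>
  <!-- the comma entry uses placeholder values; it only exists to be removed -->
  <mtx name="comma" width="0" lsb="0"/>
  <mtx name="period" width="539" lsb="144"/>
  <mtx name="uni0443" width="969" lsb="22"/>
  <mtx name="y" width="969" lsb="22"/>
</hmtx>
"""


def used_codepoints(ligatures: list[tuple[str, str]]) -> list[int]:
    """Every code point appearing in a ligature source or target, sorted."""
    return sorted({ord(ch) for source, target in ligatures for ch in source + target})


def prune_by_name(table: ET.Element, keep: set[str], attr: str = "name") -> ET.Element:
    """Remove direct children whose name attribute is not in the keep set."""
    for child in list(table):  # copy the list, since we mutate while iterating
        if child.get(attr) not in keep:
            table.remove(child)
    return table


print(used_codepoints(LIGATURES))
hmtx = prune_by_name(ET.fromstring(HMTX), {"period", "uni0443", "y"})
print([mtx.get("name") for mtx in hmtx])
```

The same prune_by_name call would work for any table whose children carry the glyph name in a single attribute; tables like hdmx, which store plain text, need their own handling.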

This same script can then be run on other variants of Roboto (medium, bold, thin, etc.) to have a ligature font for each weight. These fonts can then be linked to the page using the @font-face rule.
@font-face {
font-family: "Roboto RubleLigature";
font-weight: 400;
src: url("./Roboto-RubleLigature-Regular.ttf");
}
.ruble-ligature {
font-family: "Roboto RubleLigature", "Roboto", sans-serif;
}
Issues to keep in mind
Despite the fact that 3 or 4 characters are displayed to look like one character, the browser understands that they're not. This produces some interesting effects.
For one, you can select part of the glyph:

Because, to the browser, it looks like you're doing this:

This issue can be solved if you disable user selection for the elements where the ligature is used:
.ruble-ligature {
user-select: none;
}
Or, if you isolate the ligatures in separate inline elements, you can prevent the partial selection using user-select: all:
<p class="ruble-ligature">
Ligature applied: 1000 <span class="select-all">руб.</span>
</p>
<style>
.select-all {
user-select: all;
}
</style>
Another issue is that if you select and copy text with the ligature, the copied text will contain the original characters. But this shouldn't be a problem from the UX side. In fact, it can even be considered a feature ;)
Conclusion
If any professional font editor ever reads this post, they will probably be horrified by the atrocities I have committed. But I hope it can be an interesting read for people who like to look at the ins and outs of things we commonly use but don't usually bother to look inside of.
Cover photo by Mr Cup / Fabien Barral on Unsplash