Processing OpenOffice.org dictionary files using Lazarus

In this article we demonstrate how to process OpenOffice.org dictionary files using Lazarus, the Free Pascal (FPC) IDE. To use an OpenOffice.org dictionary, we first need to extract the AFF and DIC files from the OXT (OpenOffice.org extension) file. This is easily done with 7-Zip or any other common archiving utility: an OXT file is a ZIP archive, so it is enough to change the file extension from OXT to ZIP and extract the contents.
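The extraction step can also be scripted. The following is a minimal sketch in Python, not part of the article's Lazarus sample; the function name `extract_dictionary_files` and the file names in the usage comment are hypothetical. It relies on the fact that Python's `zipfile` module reads the archive by content, so the OXT file does not need to be renamed first.

```python
import zipfile
from pathlib import Path


def extract_dictionary_files(oxt_path, out_dir):
    """Copy every .aff and .dic member of an OXT archive into out_dir.

    Returns the sorted list of extracted file names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(oxt_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith((".aff", ".dic")):
                # Flatten any folder structure inside the archive.
                target = out_dir / Path(member).name
                target.write_bytes(archive.read(member))
                extracted.append(target.name)
    return sorted(extracted)


# Hypothetical usage (the archive name is an assumption):
#   extract_dictionary_files("dict-en.oxt", "dictionary")
```

Other files in the archive, such as the extension description, are skipped because only the AFF and DIC files are needed for the processing described here.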

To process the dictionary (DIC) file we need the affix table defined in the AFF file. The sample code implements a complete AFF and DIC file processor for the OpenOffice.org English (United States) dictionary.

The sample application processes the affix file according to the following rules.

An AFF file generally consists of conditional modules such as the following:

SFX T N 4
SFX T 0 st e
SFX T y iest [^aeiou]y
SFX T 0 est [aeiou]y
SFX T 0 est [^ey]

In the first line, "SFX" means suffix. In the en_US dictionary this field is either SFX (suffix) or PFX (prefix).

T is the name of the module. The same letter appears in the DIC file, which is how a DIC entry is linked to its rules in the AFF file.

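The digit 4 is the number of rules that follow in this module. The N before it is the cross-product flag (Y or N), which states whether these rules may be combined with prefix rules on the same word.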

Once the header has been read, each rule in the set must be tried against the given word.
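Reading a header and collecting the rules that belong to it might look like the sketch below. It is written in Python for illustration and is not the article's Lazarus code; the names `AffixHeader`, `parse_header` and `read_group` are hypothetical. The third header field (Y or N) is stored as a boolean cross-product flag, that is, whether the rules may be combined with affixes of the other kind.

```python
from dataclasses import dataclass


@dataclass
class AffixHeader:
    kind: str            # "SFX" or "PFX"
    flag: str            # module name, e.g. "T"
    cross_product: bool  # True for "Y", False for "N"
    count: int           # number of rule lines that follow


def parse_header(line):
    """Parse a header line such as 'SFX T N 4'."""
    kind, flag, cross, count = line.split()
    if kind not in ("SFX", "PFX"):
        raise ValueError("not an affix header: " + line)
    return AffixHeader(kind, flag, cross == "Y", int(count))


def read_group(lines):
    """Return (header, rule_lines) for a module that starts at lines[0]."""
    header = parse_header(lines[0])
    # The count in the header tells us how many rule lines to take.
    return header, lines[1:1 + header.count]


AFF_SAMPLE = """SFX T N 4
SFX T 0 st e
SFX T y iest [^aeiou]y
SFX T 0 est [aeiou]y
SFX T 0 est [^ey]""".splitlines()

header, rules = read_group(AFF_SAMPLE)
print(header.kind, header.flag, header.cross_product, header.count)
print(len(rules))
```

Using the count from the header, rather than scanning until the next header, keeps the reader simple and lets it detect a truncated module.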

Each rule in the set is decoded as follows, taking the first rule (SFX T 0 st e) as the example:

  • SFX : as described above, this marks a suffix rule.
  • 0 : the characters to strip from the end of the word before the suffix is added; here 0 means nothing is stripped.
  • st : the suffix to append to the given word.
  • e : the condition of the rule; here "e" means the rule applies only to words that end with the character "e".

An example of this rule: late > latest
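The decoding described above can be sketched as follows. This is an illustrative Python version, not the article's Lazarus implementation; `parse_rule`, `apply_suffix` and `superlatives` are hypothetical names. It assumes the condition field can be used directly as a regular expression anchored at the end of the word, which holds for the simple conditions in the module shown earlier.

```python
import re


def parse_rule(line):
    """Split a rule line into its fields; "0" stands for an empty string."""
    kind, flag, strip, affix, condition = line.split()[:5]
    return {
        "kind": kind,
        "flag": flag,
        "strip": "" if strip == "0" else strip,
        "affix": "" if affix == "0" else affix,
        "condition": condition,
    }


def apply_suffix(word, rule):
    """Return the derived word, or None if the rule's condition fails."""
    if not re.search(rule["condition"] + "$", word):
        return None
    strip = rule["strip"]
    if strip:
        if not word.endswith(strip):
            return None
        word = word[:-len(strip)]  # remove the characters to strip
    return word + rule["affix"]


RULES = [parse_rule(line) for line in (
    "SFX T 0 st e",
    "SFX T y iest [^aeiou]y",
    "SFX T 0 est [aeiou]y",
    "SFX T 0 est [^ey]",
)]


def superlatives(word):
    """Try every rule of module T and keep the ones that apply."""
    results = (apply_suffix(word, rule) for rule in RULES)
    return [r for r in results if r is not None]


print(superlatives("late"))   # ['latest']
print(superlatives("happy"))  # ['happiest']
print(superlatives("cold"))   # ['coldest']
```

The four conditions in this module are mutually exclusive, so at most one rule fires for any given word.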

Likewise, every applicable rule must be applied to the root word to generate all of its derived forms.

For example, the root word "happy" may have four derived forms: happier, happiest, happiness and unhappy.
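Generating all forms of a root word can be sketched as below, again in Python rather than the article's Lazarus code. Several things here are assumptions: the DIC entry `happy/URTP`, and the U, R and P modules, are written in the style of the en_US files and should be checked against the actual dictionary; the function names are hypothetical. Only module T is taken from the listing earlier in this article.

```python
import re

# Modules U, R and P are illustrative assumptions; T is from the article.
AFF_TEXT = """\
PFX U Y 1
PFX U 0 un .
SFX R Y 4
SFX R 0 r e
SFX R y ier [^aeiou]y
SFX R 0 er [aeiou]y
SFX R 0 er [^ey]
SFX T N 4
SFX T 0 st e
SFX T y iest [^aeiou]y
SFX T 0 est [aeiou]y
SFX T 0 est [^ey]
SFX P Y 3
SFX P y iness [^aeiou]y
SFX P 0 ness [aeiou]y
SFX P 0 ness [^y]
"""


def load_rules(aff_text):
    """Map each module name to a list of (kind, strip, affix, condition)."""
    groups = {}
    lines = iter(aff_text.splitlines())
    for line in lines:
        parts = line.split()
        if len(parts) == 4 and parts[0] in ("SFX", "PFX"):
            kind, flag, _cross, count = parts
            rules = groups.setdefault(flag, [])
            for _ in range(int(count)):  # read the announced rule lines
                _k, _f, strip, affix, cond = next(lines).split()[:5]
                rules.append((kind,
                              "" if strip == "0" else strip,
                              "" if affix == "0" else affix,
                              cond))
    return groups


def apply_rule(word, rule):
    """Return the derived word, or None if the rule does not apply."""
    kind, strip, affix, cond = rule
    if kind == "SFX":
        if not re.search(cond + "$", word) or not word.endswith(strip):
            return None
        return word[:len(word) - len(strip)] + affix
    # PFX: the condition and the stripped part apply to the word's start.
    if not re.match(cond, word) or not word.startswith(strip):
        return None
    return affix + word[len(strip):]


def expand(entry, groups):
    """Expand a DIC entry such as 'happy/URTP' into its derived forms."""
    word, _, flags = entry.partition("/")
    forms = []
    for flag in flags:
        for rule in groups.get(flag, []):
            derived = apply_rule(word, rule)
            if derived is not None:
                forms.append(derived)
    return forms


GROUPS = load_rules(AFF_TEXT)
print(expand("happy/URTP", GROUPS))
# ['unhappy', 'happier', 'happiest', 'happiness']
```

This sketch applies one rule at a time and ignores the cross-product flag, so it does not produce combined forms such as a prefix and a suffix on the same word.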

This sample application is developed in Lazarus with minimal system dependencies to demonstrate the decoding process described above. With some minor adjustments it can easily be built for Linux and Mac OS X as well.

The source code and binaries of this sample application are available for download here. The application is released under the terms and conditions of the GNU GPL version 3.0.

You can obtain more details about AFF and DIC files from the OpenOffice.org Lingucomponent Project.
