Skip to main content

Processing OpenOffice.org dictionary files using Lazarus

In this article we demonstrate how to processing OpenOffice.org dictionary files using Lazarus – FPC IDE. To use the OpenOffice.org dictionary first we need to extract AFF file and DIC files from the OXT (OpenOffice.org extension) file. This can be easily done using 7Zip or any other generally available file archiving utility. (Only thing we need to do is change OXT file extension to ZIP and extract the contents)

To process this dictionary (DIC) file we need to use affix table defined in the AFF file. In this sample code we implement the complete AFF and DIC file processor for English (United States) dictionary of the OpenOffice.org.

Our processing of this affix file in this sample application is based on the following rules,

AFF file generally consist with some conditional modules as follows,

SFX T N 4
SFX T 0 st e
SFX T y iest [^aeiou]y
SFX T 0 est [aeiou]y
SFX T 0 est [^ey]

In the first line "SFX" means suffix. In En(US) dictionary this may be either SFX or PFX.

T is the name of the module (and this helps us to establish the link between DIC and AFF file)

Digit 4 indicates the number of rules for the given condition.

Once read the conditional header you need to cross product the rule set with the given word.

Rule set of the given condition is decode as follows,

  • SFX : as previously described SFX is a suffix.
  • 0 : This indicates strip off character and in here 0 means NULL.
  • st : Suffix for the give word
  • e : This represents the logical part of the rule. In here "e" means target word might need to be end with the character "e".

Example for this rule is : Late > Latest

Likewise you need to apply all these rules to the root word and make all other possibilities for the word.

For example root word "happy" may have 4 forms, such as,
Happier, Happiest, Happiness and Unhappy.

This sample application is developed using Lazarus with minimum amount of system dependencies to demonstrate the above decoding process. With some minor adjustments this can be easily deployed to the Linux and Mac OS X also.

All the source codes and binaries of this sample application are available to download in here. This sample application is deployed under the terms and conditions of GNU GPL Version 3.0.

You can obtain more details about AFF and DIC files from the OpenOffice.org Lingucomponent Project.

Comments

Roulette Bets said…
I can consult you on this question. Together we can find the decision.

Popular posts from this blog

CD2003 - yet another simple FM radio receiver

In the last few days, we are looking for some simple FM radio receiver to integrate into one of our ongoing projects. For that, we try several FM radio receiver ICs including TDA7000, CD2003/TA2003/TA8164, CXA1019, and KA22429. Out of all those chips we select CD2003 (or TA2003/TA8164) based receiver for our project because of its simplicity and outstanding performance. Except to CD2003, Sony CXA1019 also perform well but we drop it because of its higher component count. We design our receiver based on Toshiba TA2003 datasheet and later we try TA8164 and CD2003 with the same circuit. Either CD2003 or TA8164 can directly replace TA2003 IC, and as per our observations, TA8164 gives excellent results out of those 3 chips. A prototype version of CD2003 FM radio receiver The PCB design and schematic which we used in our prototype project are available to download at google drive (including pin-outs of crystal filters and inductors ). Except for CD2003 IC, this receiver consist...

Arduino superheterodyne receiver

In this project, we extend the shortwave superheterodyne receiver we developed a few years ago . Like the previous design, this receiver operates on the traditional superheterodyne principle.  In this upgrade, we enhanced the local oscillator with Si5351 clock generator module and Arduino control circuit. Compared to the old design, this new receiver uses an improved version of an intermediate frequency amplifier with 3 I.F transformers. In this new design, we divide this receiver into several blocks, which include, mixer with a detector, a local oscillator, and an I.F amplifier. The I.F amplifier builds into one PCB. The filter stage, mixer, and detector stages place in another PCB. Prototype version of 455kHz I.F amplifier. In this prototype build, the Si5351 clock generator drives using an Arduino Uno board. With the given sketch, the user can tune and switch the shortwave meter bands using a rotary encoder. The supplied sketch support clock generation from 5205kHz (tuner f...

Experimental narrowband FM receiver for 2-meter band

This project is about MC3362 and ADF4351 based modularized, 2-meter narrow band FM receiver. In this design, the receiver splits into three modules as RF preamplifier, MC3362 tuner, and ADF4351 oscillator. The RF preamplifier builts around BF900 dual-gate MOSFET. The tuner stage builts using the popular MC3362 , low power narrowband FM receiver IC. For the oscillator, we use the ADF4351 DDS RF signal generator module. The core component of this receiver is MC3362 IC. This IC was designed by Motorola and is no longer in production, but this IC is still available to purchase in many online stores . The chip we used in this receiver was purchased from a local electronic component store for LKR 75 (USD 0.2). The RF preamplifier used in this receiver extracts from the N.Ganesan's (VU3GEK) LRR200, 2-meter band FM receiver project . Prototype version of the RF preamplifier. In this prototype, the above preamplifier was built as a module using a Manhattan construction technique. Th...