Processing OpenOffice.org dictionary files using Lazarus

In this article we demonstrate how to process OpenOffice.org dictionary files using Lazarus, the Free Pascal (FPC) IDE. To use an OpenOffice.org dictionary, we first need to extract the AFF and DIC files from the OXT (OpenOffice.org extension) file. This is easily done with 7-Zip or any other common archiving utility, because an OXT file is a ZIP archive: change the file extension from OXT to ZIP and extract the contents.
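The extraction step can also be scripted. The following is a minimal Python sketch, not part of the Lazarus sample application; the function name `extract_dictionary_files` and the file name in the usage comment are hypothetical. It relies only on the fact that an OXT file is a ZIP archive, so no renaming is needed.

```python
import zipfile
from pathlib import Path


def extract_dictionary_files(oxt_path, out_dir):
    """Copy every .aff and .dic member of an OXT archive into out_dir.

    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(oxt_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith((".aff", ".dic")):
                # Drop any folder prefix stored inside the archive.
                target = out_dir / Path(member).name
                target.write_bytes(archive.read(member))
                extracted.append(target)
    return extracted


# Usage (hypothetical file name):
#   extract_dictionary_files("dict-en.oxt", "dictionary")
```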

To process the dictionary (DIC) file we need the affix table defined in the AFF file. The sample code implements a complete AFF and DIC file processor for the OpenOffice.org English (United States) dictionary.

Our processing of the affix file in this sample application is based on the following rules.

An AFF file generally consists of rule blocks such as the following:

SFX T N 4
SFX T 0 st e
SFX T y iest [^aeiou]y
SFX T 0 est [aeiou]y
SFX T 0 est [^ey]

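In the first line, "SFX" means suffix. In the en_US dictionary this field is either SFX (suffix) or PFX (prefix).

T is the name (flag) of the block, and it establishes the link between the DIC and AFF files.

N is the cross-product flag (Y or N), which states whether the affixes of this block may be combined with affixes of the opposite kind.

The number 4 gives the number of rules that follow in this block.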
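As an illustration of the header layout described above, here is a small Python sketch that reads one block into a list of rules. It is not taken from the Lazarus sample; the names `AffixRule` and `parse_affix_block` are hypothetical, and the sketch assumes whitespace-separated fields with "0" standing for an empty strip or affix string.

```python
from typing import NamedTuple


class AffixRule(NamedTuple):
    kind: str       # "SFX" or "PFX"
    flag: str       # block name, e.g. "T"
    strip: str      # characters to remove ("" when the file says 0)
    affix: str      # characters to add
    condition: str  # pattern the root word must satisfy


def parse_affix_block(lines):
    """Parse a header line plus the rule lines that follow it."""
    kind, flag, cross, count = lines[0].split()[:4]
    rules = []
    for line in lines[1:1 + int(count)]:
        parts = line.split()
        strip = "" if parts[2] == "0" else parts[2]
        affix = "" if parts[3] == "0" else parts[3]
        # Assumption: a missing condition field means "any word".
        condition = parts[4] if len(parts) > 4 else "."
        rules.append(AffixRule(parts[0], parts[1], strip, affix, condition))
    return kind, flag, cross == "Y", rules


block = [
    "SFX T N 4",
    "SFX T 0 st e",
    "SFX T y iest [^aeiou]y",
    "SFX T 0 est [aeiou]y",
    "SFX T 0 est [^ey]",
]
kind, flag, cross_product, rules = parse_affix_block(block)
```

With the block shown earlier, `rules` holds four entries and `cross_product` is `False`, because the header carries N.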

After reading the block header, apply each rule in the set to the given word.

Each rule line, for example "SFX T 0 st e", is decoded as follows:

  • SFX : as previously described, SFX marks a suffix rule.
  • T : the name of the block this rule belongs to.
  • 0 : the characters to strip from the end of the word before adding the suffix. Here 0 means that nothing is stripped.
  • st : the suffix to append to the given word.
  • e : the condition of the rule. Here "e" means the rule applies only when the word ends with the character "e".

An example of this rule is: late > latest
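Applying one suffix rule can be sketched in a few lines of Python. This is an illustrative sketch rather than the code of the sample application; `apply_suffix` is a hypothetical name, and it assumes the condition field can be treated as a regular expression anchored at the end of the word.

```python
import re


def apply_suffix(word, strip, suffix, condition):
    """Return the derived word, or None when the rule does not apply."""
    # The condition must match the end of the root word.
    if not re.search("(?:" + condition + ")$", word):
        return None
    # The characters to strip must actually be present.
    if not word.endswith(strip):
        return None
    stem = word[:len(word) - len(strip)]
    return stem + suffix
```

For instance, `apply_suffix("late", "", "st", "e")` gives "latest", and `apply_suffix("happy", "y", "iest", "[^aeiou]y")` gives "happiest", while a rule whose condition fails returns `None`.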

Likewise, you need to apply all the rules to the root word to generate every possible form of it.

For example, the root word "happy" may have four derived forms: happier, happiest, happiness and unhappy.
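The expansion of a root word can be sketched as follows in Python. Everything here is an assumption made for illustration: the affix table is written by hand (only block T comes from this article), the entry "happy/TRPU" is a hypothetical DIC line and not copied from the real dictionary, and flags are assumed to be single characters written after a slash. The sketch also ignores the cross-product flag, so it never combines a prefix with a suffix.

```python
import re

# Hand-written table: flag -> list of (kind, strip, affix, condition).
# Blocks R, P and U are assumed for this example.
AFFIXES = {
    "T": [("SFX", "", "st", "e"), ("SFX", "y", "iest", "[^aeiou]y"),
          ("SFX", "", "est", "[aeiou]y"), ("SFX", "", "est", "[^ey]")],
    "R": [("SFX", "", "r", "e"), ("SFX", "y", "ier", "[^aeiou]y"),
          ("SFX", "", "er", "[aeiou]y"), ("SFX", "", "er", "[^ey]")],
    "P": [("SFX", "y", "iness", "[^aeiou]y"), ("SFX", "", "ness", "[aeiou]y"),
          ("SFX", "", "ness", "[^y]")],
    "U": [("PFX", "", "un", ".")],
}


def expand_entry(dic_line):
    """Return the root word followed by every form its flags produce."""
    word, _, flags = dic_line.strip().partition("/")
    forms = [word]
    for flag in flags:
        for kind, strip, affix, condition in AFFIXES.get(flag, []):
            if kind == "SFX":
                # Suffix: condition and strip are checked at the end.
                if re.search("(?:" + condition + ")$", word) and word.endswith(strip):
                    forms.append(word[:len(word) - len(strip)] + affix)
            else:
                # Prefix: condition and strip are checked at the start.
                if re.match(condition, word) and word.startswith(strip):
                    forms.append(affix + word[len(strip):])
    return forms
```

Under these assumptions, `expand_entry("happy/TRPU")` returns the root word plus happiest, happier, happiness and unhappy.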

This sample application was developed in Lazarus with minimal system dependencies to demonstrate the decoding process described above. With some minor adjustments it can also be built for Linux and Mac OS X.

All source code and binaries of this sample application are available for download here. The application is released under the terms and conditions of the GNU GPL version 3.0.

You can obtain more details about AFF and DIC files from the OpenOffice.org Lingucomponent Project.
