Friday, July 3, 2009

GIZA++ Issues

Peace be upon you, and the mercy of Allah and His blessings

While building GIZA++ for the first time, I hit a build break, which made me curious enough to look into the code :) and play with it a little to fix it.
However, when I later used Cygwin to compile the original, unmodified code, it compiled just fine, so my modification was not needed to build GIZA++ successfully.

1st Issue...

Despite that, I'd like to share the modification I made to get the code to build. I also found that the issue had been reported on the official GIZA++ website, where the author recommended using a different version of the gcc compiler - more info about the issue.

There are two definitions of a function, and the compiler gets confused about which one to call.
  1. Edit giza-pp\GIZA++-v2\collCounts.cpp and delete this function definition:

    template<class TRANSPAIR>
    double collectCountsOverNeighborhoodForSophisticatedModels(const MoveSwapMatrix<TRANSPAIR>&, LogProb, void*)
    {
      return 0.0;
    }

  2. Then, in the body of the remaining definition:

    template<class TRANSPAIR, class MODEL>
    double collectCountsOverNeighborhoodForSophisticatedModels(const MoveSwapMatrix<TRANSPAIR>& msc, LogProb normalized_ascore, MODEL* d5Table) {...}

    change the last argument of the following function call from:

    _collectCountsOverNeighborhoodForSophisticatedModels(…,…,…,…,d5Table);

    to:

    _collectCountsOverNeighborhoodForSophisticatedModels(…,…,…,…,(d5model*)d5Table);

This is my first step in working with GIZA++. I don't even know whether the changes I made to the source files will cause problems, but they seem logical to me :).

2nd Issue...

If you run "make clean" under Cygwin, the EXEs won't be removed: the Makefiles were written for Linux, where executables carry no .exe extension, so the clean rules don't list the .exe names. You just have to tell them.
  • Edit \giza-pp\GIZA++-v2\Makefile and, in "-rm -f snt2plain.out plain2snt.out snt2cooc.out GIZA++", change the trailing "GIZA++" to "GIZA++.exe"
  • Edit \giza-pp\mkcls-v2\Makefile and, in "-rm -f *.o mkcls", change "mkcls" to "mkcls.exe"
Now "make clean" works and deletes the EXEs.
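To see the effect outside the Makefile, here is a minimal shell sketch that replays the clean rule in a throwaway directory. The helper name clean_giza and the scratch directory are my own assumptions for illustration; the file names are the ones from the GIZA++-v2 rule above. As a small variation, the helper lists both "GIZA++" and "GIZA++.exe", so the same rule would work on Linux and Cygwin alike.

```shell
#!/bin/sh
# Hypothetical helper (not part of GIZA++): remove the build outputs
# whether or not the platform appended ".exe" to the executable name.
clean_giza() {
    dir=$1
    rm -f "$dir/snt2plain.out" "$dir/plain2snt.out" "$dir/snt2cooc.out" \
          "$dir/GIZA++" "$dir/GIZA++.exe"
}

# Demonstration in a scratch directory, so no real build is touched.
scratch=$(mktemp -d)
touch "$scratch/GIZA++.exe" "$scratch/plain2snt.out"

rm -f "$scratch/GIZA++"     # original rule: the name does not match GIZA++.exe
ls "$scratch"               # GIZA++.exe is still listed

clean_giza "$scratch"       # extended rule: both names are covered
ls "$scratch"               # nothing left to list
rmdir "$scratch"
```

The same idea applies to mkcls-v2: list "mkcls.exe" next to (or instead of) "mkcls" in its clean rule.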

Hope it was a useful post for you guys.

GIZA++ commands

Peace be upon you, and the mercy of Allah and His blessings

Today was the first time I tried running GIZA++. Most of the references on the internet already state what I'll cover in this post, but I'll try to say more about GIZA++ as I go further with it in coming posts, God willing.

So let me show how to train using GIZA++.
  • Assume you have a bin folder that contains all the output files produced by building GIZA++
  • Assume the parallel corpus files reside inside that bin folder, e.g. arabic.txt and english.txt
  • Execute the following commands under Cygwin...
  1. Convert the plain text to GIZA++ format
    Run: ./plain2snt.out english.txt arabic.txt
    Output: \bin\arabic.vcb
    \bin\arabic_english.snt
    \bin\english.vcb
    \bin\english_arabic.snt
  2. Generate the word-class files: word to class (classes) and class to words (cats)
    Run: ./mkcls -penglish.txt -Venglish.vcb.classes
    ./mkcls -parabic.txt -Varabic.vcb.classes
    Output: \bin\arabic.vcb.classes
    \bin\arabic.vcb.classes.cats
    \bin\english.vcb.classes
    \bin\english.vcb.classes.cats
    Note: Almost everywhere this step is described as optional. I tried running GIZA++ both with and without the classes files, and it worked fine either way.
  3. Finally run GIZA++
    Run: ./GIZA++ -T english.vcb -S arabic.vcb -C english_arabic.snt
    Or
    ./GIZA++ -T english.vcb -S arabic.vcb -C arabic_english.snt
    Output: A nice PPT file I found through Google :) describes the contents of all the output files and what each one is for.
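The three steps can be chained in a small script. This is only a sketch under my own assumptions: the function name train_giza and the RUN variable are hypothetical, the flags are exactly the ones shown in the steps above, and the GIZA++ call uses the second variant from step 3. By default it is a dry run that only prints the commands, so it is safe to try before the binaries or the corpus are in place.

```shell
#!/bin/sh
# Hypothetical wrapper around the three steps above (not shipped with GIZA++).
# RUN defaults to "echo", so the commands are printed instead of executed.
# Call it as:  RUN= train_giza arabic english   to run them for real.
train_giza() {
    src=$1              # base name of the source corpus, e.g. arabic
    tgt=$2              # base name of the target corpus, e.g. english
    run=${RUN-echo}

    # 1. Plain text -> vocabulary (.vcb) and sentence (.snt) files
    $run ./plain2snt.out "$tgt.txt" "$src.txt" || return 1

    # 2. Optional: word classes for each language
    $run ./mkcls "-p$tgt.txt" "-V$tgt.vcb.classes" || return 1
    $run ./mkcls "-p$src.txt" "-V$src.vcb.classes" || return 1

    # 3. Training / alignment
    $run ./GIZA++ -T "$tgt.vcb" -S "$src.vcb" -C "${src}_${tgt}.snt" || return 1
}

# Dry run: prints the four commands for the arabic/english example.
train_giza arabic english
```

Stopping at the first failing step (the "|| return 1" parts) is a design choice of the sketch: each step reads files written by the previous one, so there is little point in continuing after an error.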
That's all for now, see you in the next post :).

Monday, June 29, 2009

How to setup GIZA++ on Windows?


Peace be upon you, and the mercy of Allah and His blessings

In this post I'll show a pure Windows user how to build GIZA++ under Windows.

What is GIZA++?

GIZA++ is a statistical machine translation toolkit; one of the package's main features is text alignment.

Note: The version of GIZA++ I'm going to talk about was uploaded on Mar 20, 2009.

Installation Steps...

  1. Download GIZA++
  2. Extract the files to any location you want, using WinRAR for example.
  3. Edit \giza-pp\GIZA++-v2\Makefile, search for the -DBINARY_SEARCH_FOR_TTABLE flag and delete it - more info.
  4. Download Cygwin and choose "gcc-g++", "binutils" and "make" on the Select Packages page.
  5. Open the Cygwin Bash Shell, navigate to the place where you extracted the GIZA++ files, and then build the project using the make command.
  6. Output files...
    \giza-pp\GIZA++-v2\GIZA++.exe
    \giza-pp\GIZA++-v2\snt2plain.out
    \giza-pp\GIZA++-v2\plain2snt.out
    \giza-pp\GIZA++-v2\snt2cooc.out

    \giza-pp\mkcls-v2\mkcls.exe
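Steps 3 and 6 can also be done from the Cygwin shell. The two helpers below are a sketch under my own assumptions: the names strip_ttable_flag and check_giza_outputs are hypothetical, and the list of expected files is simply the one from step 6. The first helper keeps a .bak copy of the Makefile before removing the flag.

```shell
#!/bin/sh
# Hypothetical helper for step 3: remove the flag from a Makefile,
# keeping the untouched original next to it as <Makefile>.bak.
strip_ttable_flag() {
    makefile=$1
    cp "$makefile" "$makefile.bak" &&
    sed 's/ *-DBINARY_SEARCH_FOR_TTABLE//g' "$makefile.bak" > "$makefile"
}

# Hypothetical helper for step 6: print every expected output file that
# is missing under the given giza-pp directory; returns non-zero if any is.
check_giza_outputs() {
    root=$1
    status=0
    for f in GIZA++-v2/GIZA++.exe GIZA++-v2/snt2plain.out \
             GIZA++-v2/plain2snt.out GIZA++-v2/snt2cooc.out \
             mkcls-v2/mkcls.exe; do
        if [ ! -e "$root/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}
```

A typical session would call strip_ttable_flag on giza-pp/GIZA++-v2/Makefile, run make as in step 5, and then call check_giza_outputs giza-pp to confirm that all five files were produced.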

This is my first step in working with GIZA++. Stay tuned for coming posts about how to use GIZA++, God willing.

Hope you liked the post :).

Sunday, June 28, 2009

Hello World !

Peace be upon you, and the mercy of Allah and His blessings

Hey everyone!

My name is Mohammed Moussa. I've decided to do my master's thesis in Natural Language Processing, and most probably it will be on Arabic Statistical Machine Translation.

I'm a beginner in this field; I know almost nothing about it except that I love it :).

So I decided to create a technical blog to share my knowledge with you as soon as I have some :D.

Since I'm still learning NLP, my posts will most probably be about NLP in general and about the basics you need before starting to work in NLP, like statistics, digital communication concepts, AI concepts, tools installation and configuration, etc.

Let's see together where I'm going with this blog, as I don't know yet :).

Ahh, I almost forgot: I'm a Software Development Engineering Team Lead at NTP Software, and I'm doing my master's at the Arab Academy for Science, Technology & Maritime Transport.

Wish me luck and see you in the next post :).