
Friday, July 3, 2009

GIZA++ commands

Peace be upon you, and the mercy and blessings of Allah.

Today was my first time running GIZA++, and most of the references on the internet state what I'll state in this post. However, I'll try to say more about GIZA++ as I go further with it in coming posts, in sha' Allah.

So let me show how to train with GIZA++.
  • Assuming you have a bin folder that contains all the output files produced when you built GIZA++
  • Assuming the parallel corpus files reside inside that bin folder, e.g. arabic.txt and english.txt
  • Execute the following commands under Cygwin...
  1. Convert the plain text to GIZA++ format
    Run: ./plain2snt.out english.txt arabic.txt
    Output: \bin\arabic.vcb
    \bin\arabic_english.snt
    \bin\english.vcb
    \bin\english_arabic.snt
  2. Generate the word-class files: word to class (classes) and class to words (cats)
    Run: ./mkcls -penglish.txt -Venglish.vcb.classes
    ./mkcls -parabic.txt -Varabic.vcb.classes
    Output: \bin\arabic.vcb.classes
    \bin\arabic.vcb.classes.cats
    \bin\english.vcb.classes
    \bin\english.vcb.classes.cats
    Note: Almost everywhere this step is described as optional. I tried running GIZA++ both with and without the classes files, and it worked fine either way.
  3. Finally run GIZA++
    Run: ./GIZA++ -S english.vcb -T arabic.vcb -C english_arabic.snt
    Or
    ./GIZA++ -S arabic.vcb -T english.vcb -C arabic_english.snt
    Output: I found a nice PPT file through Google :) that describes the contents of all the output files and their jobs.
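The three steps above can be strung together in one small shell function. This is only a sketch under a few assumptions: `train_giza` and `GIZA_BIN` are names made up for illustration, the two text files sit in the current directory, and the first file is passed as the source side (-S) and the second as the target side (-T).

```shell
#!/bin/sh
# Sketch only: train_giza and GIZA_BIN are hypothetical names.
# The tools and flags (plain2snt.out, mkcls -p/-V, GIZA++ -S/-T/-C)
# are the ones used in the steps above.
GIZA_BIN=${GIZA_BIN:-.}   # assumption: folder holding the built binaries

train_giza() {
    src_txt=$1            # e.g. arabic.txt
    tgt_txt=$2            # e.g. english.txt
    src=${src_txt%.*}     # file name without its extension, e.g. arabic
    tgt=${tgt_txt%.*}     # e.g. english

    # Step 1: plain text -> .vcb vocabularies and .snt sentence files
    "$GIZA_BIN/plain2snt.out" "$src_txt" "$tgt_txt" || return 1

    # Step 2 (optional): word classes for each side
    "$GIZA_BIN/mkcls" "-p$src_txt" "-V$src.vcb.classes" || return 1
    "$GIZA_BIN/mkcls" "-p$tgt_txt" "-V$tgt.vcb.classes" || return 1

    # Step 3: train on the corpus whose first side is the source file
    "$GIZA_BIN/GIZA++" -S "$src.vcb" -T "$tgt.vcb" -C "${src}_${tgt}.snt"
}

# Usage (not executed here): train_giza arabic.txt english.txt
```

Calling `train_giza english.txt arabic.txt` instead would train in the other direction, on english_arabic.snt.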
That's all for now, see you next post :).

7 comments:

  1. May Allah bless you, brother.
    I'm a beginner in Giza++, and your article helped me a lot :)

    Thank you,
    Ahlam

  2. And peace be upon you, and the mercy and blessings of Allah. May Allah reward you with good on our behalf. It was really very useful.

  3. And peace be upon you, and the mercy and blessings of Allah. May Allah reward you with good on our behalf. It was really very helpful, especially the PPT file you linked at the end of the post. Once again, thank you very much, brother :))

  4. Peace be upon you, and the mercy of Allah,
    I hope someone will read this because I am really hopeless.
    Well, I am working on language processing too and I need to do an alignment using GIZA++. GIZA++ compiled without problems. I tried aligning two files, source.txt (written in French) and target.txt (written in English), and I got the alignment I wanted. But when I try it with source.txt (written in Arabic) and target.txt (written in an Arabic dialect), some files are generated with a size of 0 KB (such as user.ti.final, user.d4.final and some others). It is the first time I have used this tool and I don't understand why it is not working. (Is it because I am trying to align two texts written in the same language?)

  5. Obviously it is not a language problem: I tried with small texts (about 1 KB), src.txt (written in dialect) and trg.txt (written in Fus'ha), and it worked. So why is it not working with the real corpus (159 KB)?

  6. Well, nobody was able to help me, or maybe nobody read what I wrote, but in case someone faces the same problem: it has nothing to do with the language. The problem is in the corpus: it has to be organized with one sentence per line.

  7. I started to use GIZA++ with your help and it worked. Now I need to know how to install other tools related to Statistical Machine Translation. There are some articles on the internet, but they are not as clear as yours. So please write about them. I need that urgently.

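Following up on comment 6: since the corpus has to hold one sentence per line, with line N of one file matching line N of the other, a quick check before running plain2snt.out might save some time. The function below is a rough sketch, not part of GIZA++. `check_parallel` is a made-up name, and the default limit of 100 tokens per line is only an illustrative threshold for spotting lines that probably hold more than one sentence.

```shell
#!/bin/sh
# Sketch only: check_parallel is a hypothetical helper, not a GIZA++ tool.
# Usage: check_parallel SRC_FILE TGT_FILE [MAX_TOKENS]
check_parallel() {
    max_tokens=${3:-100}   # assumption: illustrative limit, adjust as needed

    # Both files must have the same number of lines
    src_lines=$(awk 'END { print NR }' "$1")
    tgt_lines=$(awk 'END { print NR }' "$2")
    if [ "$src_lines" -ne "$tgt_lines" ]; then
        echo "line counts differ: $src_lines vs $tgt_lines"
        return 1
    fi

    # Count lines in either file that are empty or suspiciously long
    bad=$(awk -v max="$max_tokens" \
        'NF == 0 || NF > max { n++ } END { print n + 0 }' "$1" "$2")
    if [ "$bad" -ne 0 ]; then
        echo "$bad line(s) are empty or longer than $max_tokens tokens"
        return 1
    fi

    echo "ok: $src_lines sentence pairs"
}

# Usage (not executed here): check_parallel arabic.txt english.txt
```

A non-zero return value suggests the files should be re-split into sentences before the conversion step.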