Arabic Natural Language Processing: GIZA++ commands

Friday, July 3, 2009

GIZA++ commands

El salamo 3alikom wa ra7amtoo ALLAH wa baraktoo (peace be upon you, and the mercy and blessings of Allah)

Today was my first time trying to run GIZA++, and most of the references on the internet say what I'll say in this post. However, I'll try to talk more about GIZA++ as I go further with it in coming posts, In Sha'a ALLAH (God willing).

So let me show how to train with GIZA++.

Assuming that you have a bin folder containing all the files produced when you built GIZA++,
and that the parallel corpus files (e.g. arabic.txt and english.txt, one sentence per line, with line i of each file forming a translation pair) reside inside that bin folder,
execute the following commands under Cygwin:
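Before converting anything, it helps to check that the two corpus files really are sentence-aligned, one sentence per line, which is exactly the problem one commenter below ran into. The following Python sketch uses hypothetical helpers (check_parallel_corpus and check_files are not part of GIZA++): they report differing line counts, empty lines, very long sentences and extreme length ratios. The limits max_len=100 and max_ratio=9 are assumptions in the spirit of common corpus cleaning for GIZA++, not values read from your build, so adjust them to your setup.

```python
from pathlib import Path

def check_parallel_corpus(src_lines, trg_lines, max_len=100, max_ratio=9.0):
    """Report suspicious lines in a sentence-aligned corpus (one sentence per line).

    Returns (line_number, message) tuples; line number 0 means a whole-file problem.
    max_len and max_ratio are assumed limits, not values read from GIZA++.
    """
    problems = []
    if len(src_lines) != len(trg_lines):
        problems.append((0, f"line counts differ: {len(src_lines)} vs {len(trg_lines)}"))
    for i, (src, trg) in enumerate(zip(src_lines, trg_lines), start=1):
        s_len, t_len = len(src.split()), len(trg.split())
        if s_len == 0 or t_len == 0:
            problems.append((i, "empty sentence"))
            continue  # the ratio check below would divide by zero
        if s_len > max_len or t_len > max_len:
            problems.append((i, f"too long: {s_len}/{t_len} tokens"))
        if max(s_len, t_len) / min(s_len, t_len) > max_ratio:
            problems.append((i, f"length ratio above {max_ratio}"))
    return problems

def check_files(src_path, trg_path, **limits):
    """Read two UTF-8 corpus files and run check_parallel_corpus on them."""
    src = Path(src_path).read_text(encoding="utf-8").splitlines()
    trg = Path(trg_path).read_text(encoding="utf-8").splitlines()
    return check_parallel_corpus(src, trg, **limits)
```

For example, check_files("arabic.txt", "english.txt") returns an empty list when nothing suspicious is found; a whole paragraph sitting on one line would show up as a "too long" entry.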

Convert the plain text to GIZA++ format
Run: ./plain2snt.out english.txt arabic.txt
Output: \bin\arabic.vcb
\bin\arabic_english.snt
\bin\english.vcb
\bin\english_arabic.snt
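To make the two output formats concrete, here is a minimal Python sketch of what plain2snt.out produces, under stated assumptions: each .vcb line is `<id> <word> <count>`, with IDs starting at 2 (0 is reserved for the NULL word), and each sentence pair in a .snt file takes three lines: an occurrence count (1 here), the source-side IDs, then the target-side IDs. All function names are hypothetical, and assigning IDs in first-seen order is an assumption that may not match plain2snt's own ordering. The sketch also includes a check_direction helper that tests whether a .snt file's IDs fit a given source/target vocabulary pair, a quick sanity check before choosing -S and -T for GIZA++.

```python
from collections import Counter

def build_vocab(sentences):
    """Assign word IDs (assumed first-seen order) and count occurrences."""
    vocab, counts = {}, Counter()
    for sentence in sentences:
        for tok in sentence.split():
            counts[tok] += 1
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2  # IDs start at 2; 0 is reserved for NULL
    return vocab, counts

def vcb_lines(vocab, counts):
    """Lines in the .vcb layout: '<id> <word> <count>', ordered by ID."""
    return [f"{i} {w} {counts[w]}" for w, i in sorted(vocab.items(), key=lambda kv: kv[1])]

def snt_lines(src_sents, trg_sents, src_vocab, trg_vocab):
    """Three lines per sentence pair: occurrence count, source IDs, target IDs."""
    out = []
    for src, trg in zip(src_sents, trg_sents):
        out.append("1")
        out.append(" ".join(str(src_vocab[w]) for w in src.split()))
        out.append(" ".join(str(trg_vocab[w]) for w in trg.split()))
    return out

def read_vcb_ids(lines):
    """Collect the word IDs from .vcb lines."""
    return {int(line.split()[0]) for line in lines if line.strip()}

def check_direction(snt, src_ids, trg_ids):
    """True if every source/target ID in the .snt lines exists in the given vocabularies."""
    for k in range(0, len(snt) - 2, 3):
        src = {int(x) for x in snt[k + 1].split()}
        trg = {int(x) for x in snt[k + 2].split()}
        if not src <= src_ids or not trg <= trg_ids:
            return False
    return True
```

A failing check_direction means the vocabularies were passed in the wrong order, but a passing one is only a necessary condition: two vocabularies of similar size can hide a swap.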
Generate the word class files: word-to-class (.classes) and class-to-words (.cats)
Run: ./mkcls -penglish.txt -Venglish.vcb.classes
./mkcls -parabic.txt -Varabic.vcb.classes
Output: \bin\arabic.vcb.classes
\bin\arabic.vcb.classes.cats
\bin\english.vcb.classes
\bin\english.vcb.classes.cats
Note: Almost everywhere, this step is described as optional. I tried running GIZA++ both with and without generating the class files, and it worked fine.
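If you do generate the class files, their relationship is simple: the .classes file maps each word to a class number, and the .cats file lists, for each class, the words in it. The sketch below assumes each .classes line is a word followed by whitespace and its class ID, and inverts it into the .cats view. classes_to_cats and format_cats are hypothetical helpers, and the output layout of format_cats is an approximation, not necessarily mkcls's exact format.

```python
from collections import defaultdict

def classes_to_cats(class_lines):
    """Invert word->class lines into a class->words mapping.

    Assumes each .classes line is '<word><whitespace><class id>'.
    """
    cats = defaultdict(list)
    for line in class_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, cls = parts
        cats[int(cls)].append(word)
    return dict(cats)

def format_cats(cats):
    """One line per class, e.g. '3:\tbook,car' (an approximation of the .cats layout)."""
    return [f"{cls}:\t{','.join(words)}" for cls, words in sorted(cats.items())]
```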
Finally run GIZA++
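Run: ./GIZA++ -S arabic.vcb -T english.vcb -C arabic_english.snt
Or
./GIZA++ -S english.vcb -T arabic.vcb -C english_arabic.snt
Note: -S is the source vocabulary and -T the target vocabulary, and the .snt file must match them: the language named first in its file name is the source.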
Output: I found a nice PPT file through Google :) that describes the contents and purpose of each output file.
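Of the output files, the *.A3.final file holds the final Viterbi alignment in a readable form. Assuming the usual three-line layout per sentence pair (a "# Sentence pair" header, the target sentence, then the source sentence with NULL first and each word followed by ({ ... }) listing the 1-based target positions it aligns to), a minimal Python parser could look like this. parse_a3 is a hypothetical helper and the sample words in the usage are made up; check the layout against your own output before relying on it.

```python
import re

# A source word followed by its aligned target positions, e.g. "qit ({ 2 })".
WORD_ALIGN = re.compile(r"(\S+) \(\{([\d ]*)\}\)")

def parse_a3(lines):
    """Yield (target_tokens, source_tokens, links) for each sentence pair.

    links are (source_pos, target_pos) pairs, both 1-based; NULL alignments are dropped.
    """
    for k in range(0, len(lines) - 2, 3):
        target = lines[k + 1].split()
        source, links = [], []
        for pos, (word, targets) in enumerate(WORD_ALIGN.findall(lines[k + 2])):
            if pos == 0:
                continue  # position 0 is the NULL word
            source.append(word)
            links.extend((pos, int(t)) for t in targets.split())
        yield target, source, links
```

Which language appears on which line depends on the -S/-T choice made when running GIZA++.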

That's all for now, see you in the next post :).

7 comments:

Ahlam, April 9, 2011 at 3:20 PM
Baraka Allah fek (may Allah bless you), brother.
I'm a beginner with GIZA++, and your article helped me a lot :)

Thank you,
Ahlam
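SaraTAS, March 22, 2012 at 1:46 PM
Wa3alykum esSalm wa Rahmatou'LLahi wa Barakatouhou (and upon you be peace, and the mercy and blessings of Allah)... Jazakum Allah 3anna Khayran (may Allah reward you with good on our behalf). It was really very useful.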
SaraTAS, March 22, 2012 at 1:52 PM
Wa3alykum esSalm wa Rahmatou'LLahi wa Barakatouhou (and upon you be peace, and the mercy and blessings of Allah)... Jazakum Allah 3anna Khayran (may Allah reward you with good on our behalf). It was really very helpful, especially the PPT file you linked at the end of the post. Once again, thank you very much, brother :))
SaraTAS, March 30, 2012 at 8:45 PM
eSslamou 3laykum Wa rahmatou'LLah (peace and the mercy of Allah be upon you),
I hope someone will read this, because I am really hopeless.
Well, I am working on language processing too, and I need to do an alignment using GIZA++. I compiled GIZA++ without problems and tried to align two files, source.txt (written in French) and target.txt (written in English), and I got the alignment I wanted. But when I try it with source.txt (written in Arabic) and target.txt (written in an Arabic dialect), some files are generated but are 0 KB (like user.ti.final, user.d4.final and some others). It is the first time I use this tool and I don't understand why it is not working. Is it because I'm trying to align two texts written in the same language?
SaraTAS, March 30, 2012 at 11:19 PM
Obviously it is not a language problem: I tried with small texts (about 1 KB), src.txt (written in dialect) and trg.txt (written in Fus'ha, i.e. Modern Standard Arabic), and it worked. So why is it not working with the real corpus (159 KB)?
SaraTAS, April 2, 2012 at 11:08 AM
Well, nobody was able to help me, or maybe nobody read what I wrote, but in case someone faces the same problem: it has nothing to do with the language. The problem is in the corpus; it has to be organised with one sentence per line.
Jassi, February 21, 2014 at 12:14 PM
I started using GIZA++ with your help, and it worked. Now I need to know how to install other tools related to Statistical Machine Translation. There are some articles on the internet, but they are not as clear as yours, so please write about it. I need it urgently.