Tuesday, October 2, 2012
Statistical Denormalization for Arabic Text
Friday, July 3, 2009
GIZA++ Issues
- Edit giza-pp\GIZA++-v2\collCounts.cpp and delete this function definition...
template<class TRANSPAIR>
double collectCountsOverNeighborhoodForSophisticatedModels(const MoveSwapMatrix
{
return 0.0;
}
- And at the definition...
template<class TRANSPAIR, class MODEL>
double collectCountsOverNeighborhoodForSophisticatedModels(const MoveSwapMatrix
Change the last parameter of the following function call…
from:
_collectCountsOverNeighborhoodForSophisticatedModels
to:
_collectCountsOverNeighborhoodForSophisticatedModels
- Edit \giza-pp\GIZA++-v2\Makefile and, in the line "-rm -f snt2plain.out plain2snt.out snt2cooc.out GIZA++", change "GIZA++" to "GIZA++.exe"
- Edit \giza-pp\mkcls-v2\Makefile and, in the line "-rm -f *.o mkcls", change "mkcls" to "mkcls.exe"
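The two Makefile edits above (appending .exe to the binary name on the "-rm -f" line) could also be scripted from the Cygwin shell. This is only a sketch: it assumes each "-rm -f" line ends exactly with the binary name, with no trailing spaces or Windows line endings. The function names are made up for illustration, and each call keeps a .bak copy of the original file.

```shell
# Hypothetical helpers: append ".exe" to the binary named at the end of the
# "-rm -f" line. In sed's basic regex syntax "+" is a literal character,
# so "GIZA++$" matches the text GIZA++ at the end of a line.
patch_giza_makefile() {
    sed -i.bak '/-rm -f/ s/ GIZA++$/ GIZA++.exe/' "$1"
}

patch_mkcls_makefile() {
    sed -i.bak '/-rm -f/ s/ mkcls$/ mkcls.exe/' "$1"
}
```

Running a helper a second time changes nothing, because the line then ends in ".exe" and the pattern no longer matches.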
GIZA++ commands
- Assume you have a bin folder that contains all the output files from building GIZA++
- Assume the parallel corpus files are inside that bin folder, e.g. arabic.txt and english.txt
- Execute the following commands under Cygwin...
- Convert the plain text to GIZA++ format
Run: ./plain2snt.out english.txt arabic.txt
Output: \bin\arabic.vcb
\bin\arabic_english.snt
\bin\english.vcb
\bin\english_arabic.snt
- Generate Word vs. Frequency (classes) and Frequency vs. Words (cats) files
Run: ./mkcls -penglish.txt -Venglish.vcb.classes
./mkcls -parabic.txt -Varabic.vcb.classes
Output: \bin\arabic.vcb.classes
\bin\arabic.vcb.classes.cats
\bin\english.vcb.classes
\bin\english.vcb.classes.cats
Note: Almost everywhere this step is described as optional. I tried running GIZA++ both with and without generating the classes files, and it worked fine either way.
- Finally run GIZA++
Run: ./GIZA++ -T english.vcb -S arabic.vcb -C english_arabic.snt
Or
./GIZA++ -T english.vcb -S arabic.vcb -C arabic_english.snt
Output: I found a nice PPT file through Google :) that describes the contents of all the output files and their jobs.
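Since the two corpus files are meant to be parallel, with sentence N of english.txt matching sentence N of arabic.txt, a quick sanity check before running plain2snt.out is to confirm that both files have the same number of lines. A minimal sketch; check_parallel is a hypothetical helper name and not part of GIZA++.

```shell
# Hypothetical helper: succeed only if both files have the same line count.
check_parallel() {
    # tr strips the padding some wc implementations add around the number
    src_lines=$(wc -l < "$1" | tr -d ' ')
    tgt_lines=$(wc -l < "$2" | tr -d ' ')
    if [ "$src_lines" -ne "$tgt_lines" ]; then
        echo "line count mismatch: $1 has $src_lines, $2 has $tgt_lines" >&2
        return 1
    fi
    echo "ok: $src_lines sentence pairs"
}
```

It could be chained in front of the conversion step, e.g. check_parallel english.txt arabic.txt && ./plain2snt.out english.txt arabic.txt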
Monday, June 29, 2009
How to setup GIZA++ on Windows?
Peace be upon you, and the mercy of ALLAH and His blessings.
In this post I'll show how to build GIZA++ under Windows, written for a pure Windows user.
What is GIZA++?
GIZA++ is a statistical machine translation toolkit; one of the package's main features is text alignment.
Note: The version of GIZA++ I'm going to talk about was uploaded on Mar 20, 2009.
Installation Steps...
- Download GIZA++
- Extract the files to any location you want, using WinRAR for example.
- Edit \giza-pp\GIZA++-v2\Makefile, search for DBINARY_SEARCH_FOR_TTABLE and delete it.
- Download Cygwin and choose "gcc-g++", "binutils" and "make" on the Select Packages screen.
- Open the Cygwin Bash Shell, navigate to the place where you extracted the GIZA++ files, and then build the project using the make command.
- Output files...
\giza-pp\GIZA++-v2\GIZA++.exe
\giza-pp\GIZA++-v2\snt2plain.out
\giza-pp\GIZA++-v2\plain2snt.out
\giza-pp\GIZA++-v2\snt2cooc.out
\giza-pp\mkcls-v2\mkcls.exe
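The Makefile edit in the steps above can also be done from the Cygwin shell instead of a text editor. A minimal sketch, assuming the flag appears in the Makefile as the compiler define -DBINARY_SEARCH_FOR_TTABLE; strip_ttable_flag is a hypothetical helper name, and the original file is kept as Makefile.bak.

```shell
# Hypothetical helper: remove the -DBINARY_SEARCH_FOR_TTABLE define
# (and the whitespace before it) from the given Makefile.
strip_ttable_flag() {
    sed -i.bak 's/[[:space:]]*-DBINARY_SEARCH_FOR_TTABLE//g' "$1"
}
```

From the folder where the files were extracted, it would be called as strip_ttable_flag giza-pp/GIZA++-v2/Makefile before running make.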
This is my first step in working with GIZA++. Stay tuned for upcoming posts about how to use GIZA++, In-Sha'a-ALLAH.
Hope you liked the post :).
Sunday, June 28, 2009
Hello World !
Hey everyone!
My name is Mohammed Moussa and I've decided to do my master's thesis in Natural Language Processing, most probably in Arabic Statistical Machine Translation.
I'm a beginner in this field; I know almost nothing about it except that I love it :).
So I decided to create a technical blog to share my knowledge with you as soon as I have some :D.
Since I'm still learning NLP, my posts will most probably be about NLP in general and about the basics you need before starting to work in NLP, such as statistics, digital communication concepts, AI concepts, tool installation and configuration, etc.
Let's see together where I'm going with this blog, as I don't know yet :).
Ahh, I almost forgot: I'm a Software Development Engineering Team Lead at NTP Software, and I'm doing my master's at the Arab Academy for Science, Technology & Maritime Transport.
Wish me luck and see you in the next post :).