Google Summer of Code 2008 Ideas
The following is our ideas page for Google Summer of Code 2008. At Translate.org.za we are passionate about language, and that passion is probably the most important requirement for any student working with us: you must be passionate about language and localisation.
Skills:
The skills vary depending on the task, but in general you will need to:
- Be able to program or translate (some projects require one, some both)
- Be multilingual
- Be passionate about language
Projects:
The projects that we have chosen fit into a few categories.
- Those that directly benefit South African languages
- Those that produce software or resources that are helpful for all languages
- Those that provide assistance to many other languages
For each project we provide sections that highlight:
- Grade: how hard the project is
- Impact: one of the categories listed above
- Description: some background and what needs to be done
- Poke the code: where to look in the code
- Further reading: some helpful places to look for further information
We trust that you will find something in this list that catches your fancy.
Project: Afrikaans spell checker improvements
Grade: Medium
Impact: Direct impact on the chosen language; if you choose a Bantu language, your work will be reusable by many future African spell checker creators.
Description: The Translate.org.za Afrikaans spell checker is based on MySpell. With the advances in spell checking found in Hunspell, we are able to create a much better Afrikaans spell checker that can easily handle agglutination (samestellings).
For this project you will need to do the following. You will need to review the existing Afrikaans word list and classify words according to their parts of speech. You will need to weed out bad spellings and will also need to work out methods of harvesting and adding new words. One method of obtaining new words would be to allow OpenOffice.org and Firefox users to submit their word lists to you for analysis.
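One way to triage submitted word lists is to count how many separate users submitted each word that is not yet in the Afrikaans list, and to review the most widely submitted words first. The Python sketch below assumes each submission has already been read into a list of words (for example, one word per line from a user's personal dictionary file); the function name and the `min_users` threshold are hypothetical choices, and every candidate would still need human review before being added.

```python
from collections import Counter
from typing import Iterable


def candidate_words(existing: set[str],
                    submissions: Iterable[Iterable[str]],
                    min_users: int = 2) -> list[str]:
    """Return words missing from `existing` that appear in at least
    `min_users` separate submissions, most widely submitted first."""
    counts = Counter()
    for words in submissions:
        # Count each word once per submission so one user cannot inflate it.
        unique = {w.strip() for w in words if w.strip()}
        counts.update(unique - existing)
    keep = [w for w, n in counts.items() if n >= min_users]
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(keep, key=lambda w: (-counts[w], w))
```

For example, with an existing list containing `huis` and submissions `["huis", "speltoetser"]` and `["speltoetser"]`, only `speltoetser` passes the default threshold.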
While the cleaning and updating is in progress, you will need to adapt Hunspell to add Afrikaans spelling rules. This might involve some coding in Hunspell and adding some features unique to Afrikaans.
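Hunspell configures compounding in its affix file (for example with the COMPOUNDFLAG and COMPOUNDMIN options), so the real work happens there and in the Hunspell code. To show the idea behind accepting samestellings, here is a toy Python checker, not Hunspell's algorithm: a word is accepted if it is in the lexicon or splits into a lexicon word, an optional linking letter and another accepted word. The linking letters `s` and `e`, the minimum part length of 3 (loosely mirroring COMPOUNDMIN) and the function names are assumptions for illustration.

```python
from functools import lru_cache


def make_checker(lexicon: set[str],
                 linkers: tuple[str, ...] = ("", "s", "e"),
                 min_part: int = 3):
    """Return a function that accepts lexicon words and compounds of them.

    A compound is a lexicon word (at least `min_part` letters) followed by
    an optional linking letter and another accepted word.
    """
    @lru_cache(maxsize=None)
    def accepts(word: str) -> bool:
        if word in lexicon:
            return True
        # Try every split point that leaves at least `min_part` letters
        # on each side.
        for i in range(min_part, len(word) - min_part + 1):
            head, rest = word[:i], word[i:]
            if head not in lexicon:
                continue
            for link in linkers:
                if rest.startswith(link) and accepts(rest[len(link):]):
                    return True
        return False

    return accepts
```

With a lexicon of `spel`, `toetser`, `woord` and `boek`, the checker accepts `speltoetser` and `woordeboek` (via the linking `e`). Because it accepts any such combination, it would also accept nonsense compounds; restricting which words may combine is exactly what the affix-file rules are for.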
Finally you will need to package and test the new spell checker.
You can replace Afrikaans with any other South African language. A language in the Nguni group would turn this into a Hard project because 1) Nguni languages are hard to spell check and 2) there are fewer validated word lists. A language in the Sotho group (Northern Sotho, Southern Sotho or Tswana) would be easier. By choosing one of these languages you would make your work reusable by many other Bantu languages (which cover most of Sub-Saharan Africa).
Poke the code:
Further reading:
Project: Porting Hunspell to Microsoft CSAPI
Grade: Hard
Impact: Every spell checker user on both Windows and Linux
Description: Microsoft Office uses the CSAPI (Common Spell Checking API). This API allows vendors to create spell checkers that work with the Microsoft Office suite. This project would require implementing a CSAPI v3 spell checker that uses Hunspell as its spell checking engine. Although Microsoft Office is not an Open Source product (obviously), there are a number of benefits in doing this work. It would allow users of Microsoft Office running on WINE to make use of platform spell checkers, creating a much better integrated feel. It would also be possible to create a spell checker that, although based on Hunspell, can query a spell checking server. This would benefit an organisation planning a migration to another office platform: it could eliminate spell checking from the equation and provide a smooth transition from Microsoft Office to, for instance, OpenOffice.org.
Your biggest hurdle is that, although CSAPI v1 was published in about 1997, there is currently no way of getting this documentation unless you sign an NDA and a Microsoft-related EULA. So you will need to reverse engineer the protocols; the best approach is to base your implementation on the work done on the Irish spell checker.
Your final result should be a CSAPI spell checker that an end user can easily install on Windows and that can retrieve and install any Hunspell spell checker. It should also work with WINE (and might even become part of WINE) and allow the use of platform spell checkers. You might want to document the interaction with the checker so that it can be used as a Windows platform spell checker.
Poke the code:
- Enchant/Hunspell spell checking server
Further reading:
Project: Porting Hunspell to the Mac OS X platform spell checker framework
Grade: Medium
Impact: All spell checker users on Mac products, and possibly also users of the OpenOffice.org port to Mac OS X
Description: This is similar to the CSAPI spell checker work above, except that you would work with Apple's better documented AppleSpell platform spell checker. This would involve making Hunspell work as a backend provider to AppleSpell.
For the brave this could be merged into the CSAPI work.
This work will probably help in porting OpenOffice.org to Mac OS X, although work would still need to be done on OpenOffice.org to ensure that it can use the OS X backend. By porting Hunspell to run as a backend you would have solved half of the problem.
Your solution should deliver a fully packaged version of Hunspell that works as an AppleSpell backend.
If you wish to go the whole way and work on the OpenOffice.org port itself, ensuring that the application can actually make use of the platform checker, then this would definitely be a Hard project.
Poke the code:
- As above for the CSAPI work
Further reading:
- As above for the CSAPI
- OpenOffice.org Mac OS X porting (spell checking)
- Further OOo spell checking documentation
Project: Decathlon translation quality review
Grade: Easy
Impact: Almost anyone translating important applications
Description: Translate.org.za is running a project we call Decathlon. This sees us working with 10 projects with the aim of helping them to increase the number of localisers and to improve the quality of their source text.
The projects were chosen against a number of criteria, but many are potentially high impact, both for end users and for the promotion of localisation and Open Source.
Your work would be to assist in the review of these projects, so you will need to be able to translate. If you see errors, you will need to create bug reports and work with the upstream programmers to get them fixed. Much of your work will therefore span a number of programming languages and projects, so you need broad, though not in-depth, skills across many programming languages. You will also need to be able to communicate with many different types of developers and work within their different bug reporting systems and paradigms.
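Part of a translation quality review is mechanical checking. As an illustration, the Python sketch below flags strings whose placeholders differ between the source text and the translation, a common cause of broken interface text. The two placeholder patterns and the function name are assumptions and real projects use many more syntaxes; the Translate Toolkit's pofilter already provides a wide range of checks like this, so in practice you would start there.

```python
import re
from collections import Counter

# Assumed placeholder syntaxes: printf-style (%s, %d, %1$s) and XML/DTD
# entity references such as &brandShortName;. Projects use many others.
PLACEHOLDER = re.compile(r"%(?:\d+\$)?[sdif]|&[A-Za-z][\w.-]*;")


def placeholder_problems(source: str, translation: str) -> list[str]:
    """Report placeholders that occur a different number of times in the
    source and in the translation."""
    src = Counter(PLACEHOLDER.findall(source))
    tgt = Counter(PLACEHOLDER.findall(translation))
    return [f"{ph}: source has {src[ph]}, translation has {tgt[ph]}"
            for ph in sorted(src.keys() | tgt.keys())
            if src[ph] != tgt[ph]]
```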
Poke the code:
Further reading:
Project: Translating Mozilla Firefox into a South African language
Grade: Easy
Impact: Depends on the language you choose, but your terminology list will be reusable. The tools you improve and the scripts you write will help all Mozilla localisers.
Description: Translate.org.za translated Mozilla Firefox version 1.5. The Afrikaans version has been kept up to date by volunteers, but most of the others have fallen out of date.
Your task will be to take the translation from its current state and fully translate it into your chosen language. You will need language expertise in that language (this means more than it simply being your mother tongue: we would require some demonstration of your passion for localisation or of your skills through your course of study). You will also be expected to program so that you can fix bugs in the Translate Toolkit, which converts Mozilla formats to PO; this requires Python skills.
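To give a feel for what the Mozilla-to-PO conversion involves, here is a heavily simplified Python sketch that turns one-line DTD entity declarations into untranslated PO entries. It is not the Translate Toolkit's code or API; the `#: file:key` location comment and the function names are choices made for this sketch, and real DTD files also contain comments, single-quoted values and multi-line declarations.

```python
import re

# Matches simple one-line entity declarations such as
#   <!ENTITY newTab.label "New Tab">
ENTITY = re.compile(r'<!ENTITY\s+([\w.-]+)\s+"([^"]*)"\s*>')


def po_escape(text: str) -> str:
    """Escape backslashes, quotes and newlines for a PO string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def dtd_to_po(dtd_text: str, filename: str) -> str:
    """Convert entity declarations to untranslated PO entries."""
    entries = []
    for key, value in ENTITY.findall(dtd_text):
        entries.append(
            f"#: {filename}:{key}\n"
            f'msgid "{po_escape(value)}"\n'
            'msgstr ""\n'
        )
    return "\n".join(entries)
```

A complete PO file would also need a header entry, and two entities with identical English text would produce duplicate `msgid`s, which gettext rejects unless they are distinguished (for example with `msgctxt`).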
You will need to deliver:
- Improved scripts to automate the upgrading of your translations
- A 100% translated version of Firefox in your chosen South African language
- A fully integrated build that you have begun to take through the Mozilla process and have, hopefully, completed
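The core of a script for upgrading translations is a merge: keep translations whose English source is unchanged, keep but flag as fuzzy those whose source has changed, and leave new strings untranslated. The Python sketch below assumes strings are matched by their Mozilla entity key and held in a hypothetical `Unit` record; GNU gettext's msgmerge performs a similar merge for PO files, matching on `msgid` rather than on keys.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical record for one translatable string."""
    source: str
    target: str = ""
    fuzzy: bool = False


def upgrade(old: dict[str, Unit], new_sources: dict[str, str]) -> dict[str, Unit]:
    """Carry translations from `old` into a new version keyed the same way."""
    result = {}
    for key, source in new_sources.items():
        prev = old.get(key)
        if prev is None or not prev.target:
            result[key] = Unit(source)  # new or still untranslated string
        elif prev.source == source:
            result[key] = Unit(source, prev.target, prev.fuzzy)  # unchanged
        else:
            # Source text changed: keep the old translation as a hint,
            # but flag it for review.
            result[key] = Unit(source, prev.target, fuzzy=True)
    return result
```

Strings whose keys have disappeared from the new version are simply dropped.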
Depending on your programming skill level, you can improve the Translate Toolkit to cover the missing file formats, such as XHTML and RDF, thus making the full Mozilla suite localisable in Gettext PO. If you do this work the grade will increase to Medium.
Poke the code:
- For any Translate Toolkit improvements see:
Further reading:
Project: Full set of free Venda fonts
Grade: Easy
Impact: All Venda speakers will be able to see their special characters ḓṱḽṋṅ in any free font
Description: Venda requires 5 additional characters to be displayed correctly. Most people either cannot type them or complain when they see a square box indicating a missing glyph. Most computer users are not aware of issues like missing fonts. The problem is so bad that the mother tongue name for Venda is often spelled incorrectly as Tshivenda instead of Tshivenḓa.
These special characters are only used by Venda and are part of Unicode, so a document or website in a Unicode-based encoding will be able to display them if the font contains the glyphs.
This task will involve adding the 5 glyphs (plus their capital versions, in all styles) to the various free fonts available on any Linux platform, including the fonts used by WINE. You will use the free FontForge program to develop and test the fonts. You will need to create the combined (precomposed) as well as the combining characters.
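Because each Venda letter has a precomposed Unicode code point and also a canonical decomposition into a base letter plus a combining mark, a font needs glyphs for both forms to display text however it was typed. The Python sketch below uses the standard `unicodedata` module to derive the required code points and to list those a font lacks. It assumes you already have the font's character map as a set of code points; the function names are hypothetical.

```python
import unicodedata

# The five Venda letters named above (ḓ ṱ ḽ ṋ ṅ); str.upper() gives capitals.
VENDA_LETTERS = "\u1e13\u1e71\u1e3d\u1e4b\u1e45"


def required_code_points() -> set[int]:
    """Precomposed Venda letters plus every character they decompose to
    (base letter and combining mark), so both encodings can be shown."""
    needed = set()
    for ch in VENDA_LETTERS + VENDA_LETTERS.upper():
        needed.add(ord(ch))
        needed.update(ord(c) for c in unicodedata.normalize("NFD", ch))
    return needed


def missing_glyphs(cmap: set[int]) -> list[str]:
    """Describe each required code point that the font's cmap lacks."""
    return [f"U+{cp:04X} {unicodedata.name(chr(cp))}"
            for cp in sorted(required_code_points() - cmap)]
```

With the fontTools library, for example, `set(TTFont(path).getBestCmap())` gives such a set. NFC normalisation composes `d` followed by U+032D COMBINING CIRCUMFLEX ACCENT BELOW into `ḓ` (U+1E13), which is why Tshivenḓa can be stored either way.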
Because this work spans a number of fonts, you will need to interact with many projects; some are quite active while others change at a snail's pace. We expect you to push hard for inclusion of your changes.
The end result should be that every font used on a Free Software platform has the Venda characters.
Poke the code:
Further reading:
Project: Template Project
Grade: Easy, Medium, Hard
Impact: Impact on the community, broader users
Description: Some description
Poke the code:
- Some code references
Further reading:
- Reading to help you