TRNLTK is a natural language processing library for Turkish.
Why another NLP library? From the project site:
I’ve inspected other approaches and saw that tracking down problems is very hard with them. For example, one approach is to create a suffix graph by defining which suffix can come after which other suffix. But with that approach it is impossible to get an overview of the graph, since there would be thousands of nodes and edges.
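To make the suffix-graph approach mentioned above concrete, here is a minimal toy sketch in Python. It is not TRNLTK's or Zemberek's code or API: the state names, the `SUFFIX_GRAPH` table, and the `parse` helper are all hypothetical, and the graph covers only three suffix forms that happen to fit the root "ev" (house). A real graph would also need every allomorph produced by vowel harmony and consonant changes, which is roughly how the node and edge counts grow into the thousands.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: each state lists (suffix form, next state) edges,
# i.e. which suffix may come after which other suffix.
SUFFIX_GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "ROOT": [("ler", "PLURAL"), ("im", "POSSESSIVE"), ("de", "CASE")],
    "PLURAL": [("im", "POSSESSIVE"), ("de", "CASE")],
    "POSSESSIVE": [("de", "CASE")],
    "CASE": [],  # nothing may follow a case suffix in this toy graph
}


def parse_suffixes(rest: str, state: str = "ROOT") -> List[List[str]]:
    """Return every path of states that consumes `rest` completely."""
    if not rest:
        return [[]]
    paths: List[List[str]] = []
    for form, next_state in SUFFIX_GRAPH[state]:
        if rest.startswith(form):
            # Follow the edge and try to parse the remaining characters.
            for tail in parse_suffixes(rest[len(form):], next_state):
                paths.append([next_state] + tail)
    return paths


def parse(word: str, root: str) -> List[List[str]]:
    """Parse `word` as `root` plus a suffix sequence allowed by the graph."""
    if not word.startswith(root):
        return []
    return parse_suffixes(word[len(root):])


if __name__ == "__main__":
    print(parse("evlerimde", "ev"))  # "in my houses"
    print(parse("evdeler", "ev"))    # invalid order: no edge leaves CASE
```

Even with four states the ordering rules are spread over several edge lists, which hints at why a full graph is hard to keep an overview of.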
You can dive into the project by checking the kick start.
TRNLTK was built because of the limitations of Zemberek, the most widely used Turkish NLP library.
The TRNLTK and Zemberek teams once tried to work together, but it did not work out because of time constraints. We’re good friends with the Akin brothers (the authors of Zemberek), and both teams are trying to provide a proper solution to Turkish NLP needs.
What makes TRNLTK different from Zemberek:
- It is highly customizable
- It is very easy to extend
- It is less hacky
- It is much easier to understand the underlying graph
- It is much easier to maintain
- Its morphological parser offers more parse results
Zemberek is better in these respects:
- The project is more actively developed
- Its performance is better
- It offers more tools, such as a very basic morphological disambiguator
- It has a bigger community