Kakao Brain has released Pororo, an integrated natural language processing framework that handles a wide range of natural language tasks, as open source. Pororo stands for Platform Of neuRal mOdels for natuRal language prOcessing, and you can think of it as serving a purpose similar to HuggingFace. Pororo is more optimized for Korean tasks, and it also has the advantage of supporting audio processing such as speech recognition.
Here is an example of using Pororo to perform a simple Korean MRC (machine reading comprehension) task. [Excerpt from the Pororo GitHub]
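A minimal sketch of such an MRC call, assuming the `Pororo(task="mrc", lang="ko")` factory and the `(answer, (start, end))` return shape shown in the Pororo README. The helper names `build_mrc`, `extract_answer`, and `demo` are hypothetical, and the Korean question and context are made up for illustration.

```python
from typing import Callable, Tuple

# Return shape assumed from the Pororo README: (answer_text, (start, end)).
MrcResult = Tuple[str, Tuple[int, int]]
MrcFn = Callable[[str, str], MrcResult]


def build_mrc(lang: str = "ko") -> MrcFn:
    """Create a Pororo MRC pipeline (hypothetical helper).

    Requires `pip install pororo`; the model is downloaded on first use.
    """
    # Imported lazily so the rest of this sketch works without pororo installed.
    from pororo import Pororo

    return Pororo(task="mrc", lang=lang)


def extract_answer(mrc: MrcFn, question: str, context: str) -> str:
    """Run the MRC callable and return only the answer text."""
    result = mrc(question, context)
    # The first element is assumed to be the answer string; the second is its span.
    return result[0]


def demo() -> None:
    """Ask one Korean question against a short, made-up context."""
    mrc = build_mrc()
    question = "포로로를 공개한 곳은?"  # "Who released Pororo?"
    context = (
        "카카오브레인은 통합 자연어 처리 프레임워크 포로로를 "
        "오픈소스로 공개했다."  # "Kakao Brain released the integrated NLP framework Pororo as open source."
    )
    print(extract_answer(mrc, question, context))
```

Calling `demo()` runs the real model. `extract_answer` accepts any callable with the same signature, so it can also be tried with a stand-in function before installing Pororo.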
According to Pororo's technical documentation (https://kakaobrain.github.io/pororo/), the main tasks currently included are:
- Text Classification
  - Automated Essay Scoring
  - Age Suitability Prediction
  - Natural Language Inference
  - Paraphrase Identification
  - Review Scoring
  - Semantic Textual Similarity
  - Sentence Embedding
  - Sentiment Analysis
  - Zero-shot Topic Classification
- Sequence Tagging
  - Contextualized Embedding
  - Dependency Parsing
  - Fill-in-the-blank
  - Machine Reading Comprehension
  - Named Entity Recognition
  - Part-of-Speech Tagging
  - Semantic Role Labeling
- Seq2Seq
  - Constituency Parsing
  - Grammatical Error Correction
  - Grapheme-to-Phoneme
  - Phoneme-to-Grapheme
  - Machine Translation
  - Paraphrase Generation
  - Question Generation
  - Text Summarization
  - Word Sense Disambiguation
- Misc
  - Automatic Speech Recognition
  - Image Captioning
  - Collocation
  - Lemmatization
  - Morphological Inflection
  - Optical Character Recognition
  - Speech Synthesis
  - Tokenization
  - Word Translation
  - Word Embedding
I haven't tested every one of them, but Pororo covers so many different tasks that I expect it to be helpful for research in many ways. Various open source projects have been developed for Korean natural language processing, but I think few frameworks have integrated several tasks into one. I look forward to continued improvement in Pororo's own performance, as well as to many third-party open source projects built on it.
Here is the Pororo GitHub link: