TEXT NORMALIZATION AND SPELL CORRECTION OF PUNJABI TEXT
Keywords:
Tokenization, Normalization, Neural Networks, Transformer, Deep Learning
Abstract
Text normalization is the practice of mapping non-standard words to a standardized, canonical form. Training a Punjabi language model for a grammar checker is a tedious task because little correct Punjabi data is available. Data collected from different sources may contain noisy text, spelling errors, and unwanted text, all of which require text normalization before the data is suitable for training a language model. In this paper we review various text normalization methods, including spelling correction, and present our framework for normalizing Punjabi text. We treat text normalization of Punjabi text as a neural machine translation problem. We propose a hybrid approach with two parts. The first is a deep learning-based encoder-decoder model, obtained by fine-tuning a transformer with a copy-input method, which performs text normalization and spelling correction of misspelled Punjabi words. The second is a statistical technique that can be used as pre-processing or post-processing to enhance the performance of the proposed model architecture. We trained and evaluated the proposed model on a prepared Punjabi parallel dataset of correct-incorrect word pairs. The experiments show that the proposed model achieves strong performance on various semiotic classes and outperforms other existing models in terms of accuracy.
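To make the idea of a statistical pre-processing or post-processing step concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the technique used in this paper: it assumes Unicode NFC normalization and a frequency-based corrector over single-character edits restricted to the Gurmukhi block. The function names (`normalize`, `build_lexicon`, `edits1`, `correct`) and the toy lexicon are hypothetical.

```python
import unicodedata
from collections import Counter

# Assigned code points of the Unicode Gurmukhi block (U+0A00..U+0A7F).
GURMUKHI = [chr(c) for c in range(0x0A00, 0x0A80) if unicodedata.name(chr(c), "")]


def normalize(text):
    """Apply Unicode NFC and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def build_lexicon(words):
    """Count normalized word forms from a list of (assumed correct) words."""
    return Counter(normalize(w) for w in words)


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in GURMUKHI]
    inserts = [a + c + b for a, b in splits for c in GURMUKHI]
    return set(deletes + transposes + replaces + inserts)


def correct(word, lexicon):
    """Return the most frequent lexicon word within one edit, else the input."""
    word = normalize(word)
    if word in lexicon:
        return word
    candidates = [w for w in edits1(word) if w in lexicon]
    return max(candidates, key=lexicon.__getitem__) if candidates else word


# Toy data (hypothetical): the word for "Punjabi" and a variant missing
# the vowel sign AA (U+0A3E).
CORRECT = "\u0A2A\u0A70\u0A1C\u0A3E\u0A2C\u0A40"
MISSPELLED = "\u0A2A\u0A70\u0A1C\u0A2C\u0A40"

lexicon = build_lexicon([CORRECT, CORRECT, CORRECT])
fixed = correct(MISSPELLED, lexicon)
print(fixed == CORRECT)  # True
```

A step of this kind could clean obvious single-character errors before text reaches a neural model, or repair model output afterwards. In practice the lexicon would be built from a large, trusted Punjabi corpus rather than a toy list.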











