AI / Machine Learning

Nepali Image Captioning

Modern Image-to-Nepali Text Generation with Transformers

Year · 2024–present
By Ayush Niroula

A deep learning system that automatically generates descriptive Nepali captions for images. The project implements a CNN-Transformer architecture: InceptionV3 extracts image features, and a custom-built Transformer decoder generates the caption text in the Devanagari script.
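To make the decoder side of this architecture concrete, here is a minimal, framework-free sketch of masked scaled dot-product attention, the core operation inside a Transformer decoder. It is not the project's code: the project is described as using TensorFlow/Keras with multi-head attention, whereas this sketch is single-head, pure Python, and uses hypothetical function names. It also assumes the decoder applies a causal mask, which is standard for Transformer decoders but is not stated on this page.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of equal-length float vectors, one per token.
    """
    depth = len(keys[0])
    mask = causal_mask(len(queries))
    outputs = []
    for i, query in enumerate(queries):
        # Similarity of this query to every key, scaled by sqrt(depth).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(depth)
            for key in keys
        ]
        # Future positions get -inf so their softmax weight becomes 0.
        scores = [
            s if mask[i][j] else float("-inf") for j, s in enumerate(scores)
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * value[c] for w, value in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = masked_self_attention(tokens, tokens, tokens)
```

Because of the mask, the first token can only attend to itself, so its output equals its own value vector. In a captioning decoder, the same idea keeps each generated word from seeing the words that come after it during training.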

Highlights

CNN-Transformer hybrid architecture that pairs InceptionV3 image features with a Multi-Head Attention decoder to map visual content to text
Custom Devanagari text processing engine for cleaning, tokenization, and handling of Nepali script nuances
GPU training pipeline with XLA compilation and mixed-precision (float16) arithmetic for faster training
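The Devanagari cleaning and tokenization step might look roughly like the sketch below. It is an illustration only, not the project's engine: the function names, the `<start>` and `<end>` markers, and the specific rules (drop everything outside the Devanagari block, drop dandas and Devanagari digits, split on whitespace) are all assumptions chosen for the example.

```python
import re
import unicodedata

# Keep the Devanagari block (U+0900..U+097F), whitespace, and the zero-width
# joiner / non-joiner (U+200C, U+200D), which affect how some Nepali
# conjuncts render. Everything else is treated as noise.
_NOT_DEVANAGARI = re.compile(r"[^\u0900-\u097F\u200C\u200D\s]")
# Danda, double danda, and Devanagari digits are inside the block but are
# assumed to be unwanted in caption text.
_PUNCT_AND_DIGITS = re.compile(r"[\u0964\u0965\u0966-\u096F]")
_WHITESPACE = re.compile(r"\s+")


def clean_caption(text):
    """Normalize a raw caption down to space-separated Devanagari words."""
    text = unicodedata.normalize("NFC", text)
    text = _NOT_DEVANAGARI.sub(" ", text)
    text = _PUNCT_AND_DIGITS.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


def tokenize(text, start="<start>", end="<end>"):
    """Split a cleaned caption into word tokens wrapped in boundary markers."""
    return [start] + clean_caption(text).split() + [end]


raw = "A dog: एउटा  कुकुर दौडिरहेको छ। 123"
cleaned = clean_caption(raw)
tokens = tokenize(raw)
```

Word-level splitting is the simplest choice here. A subword tokenizer would handle Nepali's rich inflection better, at the cost of a more involved pipeline.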

Tech stack

TensorFlow
Keras
Python
Transformers
InceptionV3
NLTK
NumPy

Source code
