AI / Machine Learning

Nepali Image Captioning

Modern Image-to-Nepali Text Generation with Transformers

Year · 2024–present · By Ayush Niroula

A deep learning system that automatically generates descriptive Nepali captions for images. It uses a CNN-Transformer architecture: a pretrained InceptionV3 network extracts image features, and a custom Transformer decoder generates the caption in Devanagari script.
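The encoder-decoder split described above can be sketched in Keras roughly as follows. This is a minimal illustration under stated assumptions, not the project's code. The following details are assumptions: the 299×299 input, flattening InceptionV3's final 8×8×2048 feature map into 64 feature vectors, learned positional embeddings, a single decoder block, and every size (`vocab_size`, `max_len`, `embed_dim`, `num_heads`, `ff_dim`). The names `build_encoder`, `TokenAndPositionEmbedding`, `DecoderBlock` and `build_captioner` are hypothetical. `use_causal_mask` requires TensorFlow 2.10 or later.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = 299  # InceptionV3's default input resolution


def build_encoder(weights="imagenet"):
    """Frozen InceptionV3 backbone that returns a sequence of image feature vectors."""
    base = keras.applications.InceptionV3(
        include_top=False, weights=weights, input_shape=(IMG_SIZE, IMG_SIZE, 3)
    )
    base.trainable = False
    # The last conv map is 8x8x2048 for a 299x299 input; flatten it to 64 vectors.
    feats = layers.Reshape((-1, base.output.shape[-1]))(base.output)
    return keras.Model(base.input, feats, name="inception_encoder")


class TokenAndPositionEmbedding(layers.Layer):
    """Word embeddings plus learned position embeddings (an assumed design)."""

    def __init__(self, vocab_size, max_len, embed_dim, **kwargs):
        super().__init__(**kwargs)
        self.tok = layers.Embedding(vocab_size, embed_dim)
        self.pos = layers.Embedding(max_len, embed_dim)

    def call(self, ids):
        positions = tf.range(tf.shape(ids)[-1])
        return self.tok(ids) + self.pos(positions)


class DecoderBlock(layers.Layer):
    """One Transformer decoder block: masked self-attention, cross-attention, FFN."""

    def __init__(self, embed_dim, num_heads, ff_dim, **kwargs):
        super().__init__(**kwargs)
        key_dim = embed_dim // num_heads
        self.self_attn = layers.MultiHeadAttention(num_heads, key_dim)
        self.cross_attn = layers.MultiHeadAttention(num_heads, key_dim)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim)]
        )
        self.norm1 = layers.LayerNormalization()
        self.norm2 = layers.LayerNormalization()
        self.norm3 = layers.LayerNormalization()

    def call(self, x, image_feats):
        # Causal mask: position t only attends to caption tokens 0..t.
        x = self.norm1(x + self.self_attn(x, x, use_causal_mask=True))
        # Cross-attention: caption positions query the image feature vectors.
        x = self.norm2(x + self.cross_attn(x, image_feats))
        return self.norm3(x + self.ffn(x))


def build_captioner(vocab_size=5000, max_len=30, embed_dim=256, num_heads=4,
                    ff_dim=512, weights="imagenet"):
    image = keras.Input((IMG_SIZE, IMG_SIZE, 3), name="image")
    token_ids = keras.Input((max_len,), dtype="int32", name="token_ids")
    # Project the 2048-d InceptionV3 features down to the decoder width.
    image_feats = layers.Dense(embed_dim)(build_encoder(weights)(image))
    x = TokenAndPositionEmbedding(vocab_size, max_len, embed_dim)(token_ids)
    x = DecoderBlock(embed_dim, num_heads, ff_dim)(x, image_feats)
    logits = layers.Dense(vocab_size)(x)  # next-token scores at every position
    return keras.Model([image, token_ids], logits, name="nepali_captioner")
```

In a setup like this, training would typically use teacher forcing: the decoder receives the caption shifted right and is scored against the caption shifted left. At inference, tokens are generated one at a time, starting from a start token.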

Highlights

  • CNN-Transformer hybrid architecture: InceptionV3 image features feed a Transformer decoder whose multi-head attention maps visual features to Nepali text
  • Custom Devanagari text-processing pipeline for cleaning and tokenizing Nepali captions and handling script-specific details
  • GPU training pipeline with XLA compilation and mixed precision (float16) to speed up training
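The exact cleaning rules are not described here, so the following plain-Python sketch only illustrates the kind of steps a Devanagari caption pipeline might involve. It assumes the following: NFC Unicode normalization; replacing the danda (।) and double danda (॥) with spaces; dropping characters outside the Devanagari block (U+0900 to U+097F) except the zero-width non-joiner and joiner; whitespace tokenization in place of an NLTK tokenizer; and hypothetical `<start>`, `<end>`, `<pad>` and `<unk>` tokens. `clean_caption`, `build_vocab` and `encode` are illustrative names.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical special tokens.
START, END, PAD, UNK = "<start>", "<end>", "<pad>", "<unk>"

_DANDA = re.compile(r"[\u0964\u0965]")  # । and ॥ sentence terminators
# Anything outside the Devanagari block, ZWNJ/ZWJ, or whitespace is dropped (an assumption).
_NON_DEVANAGARI = re.compile(r"[^\u0900-\u097F\u200C\u200D\s]")
_SPACES = re.compile(r"\s+")


def clean_caption(text):
    """Normalize a raw Nepali caption and wrap it in start/end tokens."""
    text = unicodedata.normalize("NFC", text)
    text = _DANDA.sub(" ", text)
    text = _NON_DEVANAGARI.sub(" ", text)
    text = _SPACES.sub(" ", text).strip()
    return f"{START} {text} {END}"


def build_vocab(captions, min_count=1):
    """Map tokens to ids; ids 0-3 are reserved for the special tokens."""
    counts = Counter(tok for c in captions for tok in c.split())
    words = sorted(w for w, n in counts.items() if n >= min_count and w not in (START, END))
    return {w: i for i, w in enumerate([PAD, UNK, START, END] + words)}


def encode(caption, vocab, max_len):
    """Convert a cleaned caption to a fixed-length id list (truncate, then pad)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in caption.split()][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))
```

For example, `clean_caption("एउटा केटा बलसँग खेल्दैछ।")` yields `"<start> एउटा केटा बलसँग खेल्दैछ <end>"`. Note that truncation in `encode` can cut off the end token for very long captions.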
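Mixed precision and XLA are both enabled through standard Keras settings. The sketch below uses a small hypothetical stand-in model rather than the captioner. The policy name `"mixed_float16"`, `dtype="float32"` on the output layer, and `jit_compile=True` in `compile` are real Keras options. Keeping the final logits in float32 follows TensorFlow's mixed-precision guidance for numerical stability. With this policy, `compile` and `fit` also apply loss scaling automatically.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def build_mixed_precision_model(vocab_size=5000, embed_dim=256):
    # Global policy: compute in float16, keep variables in float32.
    # It only affects layers created after this call.
    keras.mixed_precision.set_global_policy("mixed_float16")
    model = keras.Sequential([
        keras.Input((None,), dtype="int32"),
        layers.Embedding(vocab_size, embed_dim),
        layers.Dense(embed_dim, activation="relu"),
        # Keep the output logits in float32 for a numerically stable softmax/loss.
        layers.Dense(vocab_size, dtype="float32"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(1e-4),
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        jit_compile=True,  # compile the train/predict steps with XLA
    )
    return model
```

Because `set_global_policy` changes the default for every layer created afterwards, calling `keras.mixed_precision.set_global_policy("float32")` restores the default when building other models in the same process.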

Tech stack

  • TensorFlow
  • Keras
  • Python
  • Transformers
  • InceptionV3
  • NLTK
  • NumPy
