Introducing Embedding_Siyabasa

An Advanced Embeddings API for the Sinhala Language

Leverage our UgannA_SiyabasaV2 model to build applications that deeply understand the semantics and context of Sinhala text.

What Are Embeddings?

Text embeddings are numerical representations (vectors) of words and sentences. They capture semantic relationships in language, so that texts with similar meanings map to nearby points in vector space and machines can compare them by meaning rather than by exact wording. A model trained specifically on a single language can provide a more nuanced and accurate representation of that language than a broad multilingual model.

👑 රජ (king)

👸 බිසව (queen)

👨 පුරුෂයා (man)

👩 කාන්තාව (woman)

In vector space, the relationship between 'රජ' and 'බිසව' is analogous to the one between 'පුරුෂයා' and 'කාන්තාව'.
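
The analogy can be checked with simple vector arithmetic: subtract the vector for 'පුරුෂයා' from the one for 'රජ', add 'කාන්තාව', and look for the nearest remaining word. The sketch below uses made-up 3-dimensional vectors purely for illustration (real UgannA_SiyabasaV2 vectors have 300 dimensions and would come from the API). The names `toy_vectors`, `cosine`, and `analogy` are hypothetical.

```python
import math

# Toy 3-D vectors invented for illustration only; NOT real model output.
toy_vectors = {
    "රජ": [0.9, 0.8, 0.1],       # king
    "බිසව": [0.9, 0.1, 0.8],     # queen
    "පුරුෂයා": [0.1, 0.9, 0.1],  # man
    "කාන්තාව": [0.1, 0.1, 0.9],  # woman
    "ගස": [0.0, 0.5, 0.5],       # tree (unrelated distractor)
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# king - man + woman
result = analogy("රජ", "පුරුෂයා", "කාන්තාව", toy_vectors)
print(result)
```

With these toy values the nearest word is 'බිසව'. How closely real embeddings follow such analogies depends on the model and the training corpus.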

Technical Specifications

Factual, transparent details about our embedding model. No exaggerated claims.

Model Name: UgannA_SiyabasaV2

Architecture: FastText (300D)

Vocabulary Size: ~500,000 Sinhala words and sub-words

Language Focus: Sinhala only, optimized for linguistic nuance

Vector Dimensions: 300

Access Model: Free API via Hugging Face Spaces
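
The sub-word vocabulary comes from how FastText works: a word is represented by its character n-grams (plus the word itself), which lets the model build a vector even for a word it never saw in training. Below is a simplified sketch of the n-gram extraction step. The function name `char_ngrams` is hypothetical, the 3 to 6 length range is the FastText library default and may not match the settings used for UgannA_SiyabasaV2, and the hashing of n-grams into buckets that the real library performs is omitted.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """List character n-grams of a word, FastText-style.

    The word is wrapped in '<' and '>' boundary markers first, so that
    prefixes and suffixes yield distinct n-grams.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of length n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams

print(char_ngrams("රජ"))  # ['<රජ', 'රජ>', '<රජ>']
```

Python slices strings by Unicode code point, so a Sinhala consonant and its attached vowel sign count as separate characters in this sketch.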

Primary Use Cases

Integrate Siyabasa embeddings to power intelligent features in your applications.

Retrieval-Augmented Generation (RAG)

Enhance LLMs by grounding them in your private Sinhala knowledge base. Use our embeddings to find relevant documents for accurate, context-aware answers.
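
As a rough sketch of the grounding step, the snippet below assembles an LLM prompt from passages that an embedding-based retriever has already ranked. The function name `build_grounded_prompt`, the prompt wording, and the placeholder passages are all assumptions for illustration. They are not part of the Embedding_Siyabasa API.

```python
def build_grounded_prompt(question, passages, max_passages=3):
    """Combine a question with the top retrieved passages into one prompt.

    `passages` is assumed to be sorted from most to least relevant.
    """
    selected = passages[:max_passages]
    # Number each passage so the answer can refer back to its source.
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(selected, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Placeholder passages standing in for retrieved Sinhala documents.
passages = ["retrieved passage A", "retrieved passage B", "retrieved passage C", "retrieved passage D"]
prompt = build_grounded_prompt("example question", passages)
print(prompt)
```

Capping the number of passages keeps the prompt within the LLM's context window. The right cap depends on the LLM and on passage length.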

Semantic Search

Build search systems that understand user intent, not just keywords. Deliver more precise results by matching queries based on contextual meaning.
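
A minimal sketch of the ranking step behind semantic search: score every document vector against the query vector with cosine similarity and keep the best matches. The 2-dimensional vectors and the names `search` and `doc_vecs` are made up for illustration. In practice each vector would be a 300-dimensional embedding obtained from the API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query_vec, doc_vecs, top_k=2):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]

# Toy document vectors; real ones would be embeddings of Sinhala documents.
doc_vecs = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.7, 0.7],
    "doc_c": [0.0, 1.0],
}
results = search([0.9, 0.1], doc_vecs)
print(results)
```

A linear scan like this is fine for small collections. Larger ones usually call for an approximate nearest-neighbour index.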

Text Classification

Automate the categorization of Sinhala text. Ideal for sentiment analysis, topic modeling, content moderation, and customer support ticket routing.
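
One simple way to classify text with embeddings is a nearest-centroid classifier: average the vectors of the labelled examples for each class, then assign new text to the class whose average is most similar. The sketch below uses invented 2-dimensional vectors and hypothetical names (`centroid`, `classify`, `training`). It illustrates the idea and is not a method prescribed by the API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def classify(vec, centroids):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Toy labelled embeddings; real ones would come from labelled Sinhala text.
training = {
    "positive": [[0.9, 0.1], [0.8, 0.2]],
    "negative": [[0.1, 0.9], [0.2, 0.8]],
}
centroids = {label: centroid(vecs) for label, vecs in training.items()}
label = classify([0.7, 0.3], centroids)
print(label)
```

For harder tasks, the same embeddings can be fed to a trained classifier such as logistic regression.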

Free API

Simple, predictable API endpoints to integrate into your stack.

The Embedding_Siyabasa API provides a high-quality text embedding model designed specifically for the Sinhala language. Generate embeddings for Sinhala words, phrases, and sentences using our latest model, UgannA_SiyabasaV2. These language-specific embeddings power NLP tasks such as semantic search, text classification, and document clustering, and can deliver more accurate, meaning-aware results than traditional keyword-based approaches.

Get Started in Minutes

Follow these simple steps to start using the API.

1. Explore the API

Visit our Hugging Face Space to test the API directly in your browser. No API key or signup required.

2. Review Endpoints

Understand the simple request and response formats for the /embed endpoint to plan your integration.

3. Integrate Code

Copy our Python or JavaScript snippets to make API calls from your application backend or frontend.
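
As a rough idea of what an integration might look like, here is a sketch of a small Python client for the /embed endpoint using only the standard library. The exact request and response formats are not specified on this page, so several details are assumptions to be checked against the API console: the placeholder URL, the `{"text": ...}` request body, and the `"embedding"` response field. The function names are hypothetical as well.

```python
import json
import urllib.request

# Placeholder: replace with the real Space URL shown in the API console.
EMBED_URL = "https://YOUR-SPACE.hf.space/embed"
EXPECTED_DIM = 300  # vector dimensions listed in the specifications

def build_embed_request(text, url=EMBED_URL):
    """Build a POST request carrying the text as JSON (assumed format)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_embed_response(body):
    """Extract the vector from a JSON response (assumed 'embedding' field)."""
    vector = json.loads(body)["embedding"]
    if len(vector) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dimensions, got {len(vector)}")
    return vector

def embed(text, url=EMBED_URL, timeout=30):
    """Send the text to the endpoint and return its embedding vector."""
    request = build_embed_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embed_response(response.read())
```

Calling `embed("රජ")` would then return a list of 300 floats, provided the assumed formats match the real endpoint.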


Frequently Asked Questions

Is the Embedding_Siyabasa API completely free?

Yes, the API is hosted as a free, public service on Hugging Face Spaces. This is suitable for development, testing, and low-traffic applications. For high-volume, performance-critical enterprise use, please contact us to discuss dedicated deployment options.

What makes this different from large multilingual models?

Specialization. Our model is trained exclusively on a comprehensive Sinhala corpus. This language-specific focus allows it to capture the unique syntax, idioms, and contextual nuances of Sinhala more effectively than a general-purpose multilingual model.

Are there any rate limits for the free API?

We do not enforce a hard-coded rate limit on our side. However, public Hugging Face Spaces run on shared hardware, so usage is subject to the platform's fair-use policies and resource availability. If you anticipate high traffic, a dedicated instance is the recommended solution.

Can I use this API for a commercial project?

Absolutely. The model and the public API are available for both personal and commercial use without any licensing fees.