Introducing Embedding_Siyabasa
An Advanced Embeddings API for the Sinhala Language
Leverage our UgannA_SiyabasaV2 model to build applications that deeply understand the semantics and context of Sinhala text.
What Are Embeddings?
Text embeddings are numerical representations (vectors) of words and sentences. They capture semantic relationships in language: texts with similar meanings map to nearby vectors, so machines can compare meaning numerically rather than by exact wording. Models trained specifically on a single language can capture its nuances more accurately than broad, multilingual models.
රජ
බිසව
පුරුෂයා
කාන්තාව
In vector space, the relationship between 'රජ' (king) and 'බිසව' (queen) is analogous to the one between 'පුරුෂයා' (man) and 'කාන්තාව' (woman).
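The analogy can be made concrete with a little vector arithmetic. The sketch below uses toy 3-dimensional vectors invented purely for illustration (they are not output from UgannA_SiyabasaV2, whose vectors have 300 dimensions), plus an extra distractor word, 'ගස' (tree). The function names `cosine` and `analogy` are our own, not part of any API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# ASSUMPTION: toy vectors made up for this example, not real model output.
toy = {
    "රජ": [0.9, 0.8, 0.1],       # king
    "බිසව": [0.9, 0.1, 0.8],     # queen
    "පුරුෂයා": [0.1, 0.9, 0.1],   # man
    "කාන්තාව": [0.1, 0.2, 0.8],   # woman
    "ගස": [0.0, 0.1, 0.1],       # tree (distractor)
}

def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# "man is to king as woman is to ...?"
answer = analogy("පුරුෂයා", "රජ", "කාන්තාව", toy)
```

With these toy values the offset from 'රජ' to 'බිසව' matches the offset from 'පුරුෂයා' to 'කාන්තාව', which is what "analogous relationship" means geometrically. Real embeddings only approximate this.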
Technical Specifications
Factual, transparent details about our embedding model. No exaggerated claims.
Model Name: UgannA_SiyabasaV2
Architecture: FastText (300D)
Vocabulary Size: ~500,000 Sinhala words & sub-words
Language Focus: Sinhala only (optimized for linguistic nuance)
Vector Dimensions: 300
Access Model: Free API via Hugging Face Spaces
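For readers wondering what "sub-words" means here: FastText represents a word using its character n-grams in addition to the word itself, which lets it produce vectors for inflected or unseen word forms. The sketch below shows only the n-gram extraction step. It assumes the FastText library defaults of 3 to 6 characters; the settings actually used to train UgannA_SiyabasaV2 are not stated on this page, and `char_ngrams` is a name chosen for illustration.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers.

    ASSUMPTION: min_n=3 and max_n=6 are the FastText defaults, not
    confirmed settings for this model.
    """
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(wrapped) - n + 1)
    ]

# Python slices by Unicode code point, so a Sinhala letter and its
# vowel sign count as separate characters here.
ngrams = char_ngrams("පුරුෂයා")
```

In FastText, a word's vector is built from the vectors of these n-grams together with the vector for the whole word.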
Primary Use Cases
Integrate Siyabasa embeddings to power intelligent features in your applications.
Retrieval-Augmented Generation (RAG)
Enhance LLMs by grounding them in your private Sinhala knowledge base. Use our embeddings to find relevant documents for accurate, context-aware answers.
Semantic Search
Build search systems that understand user intent, not just keywords. Deliver more precise results by matching queries based on contextual meaning.
Text Classification
Automate the categorization of Sinhala text. Ideal for sentiment analysis, topic modeling, content moderation, and customer support ticket routing.
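Both semantic search and the retrieval step of RAG come down to the same operation: embed the query, embed the candidate texts, and rank by similarity. The following is a minimal sketch of that ranking step. The `embed` argument is a hypothetical callable that turns a string into a vector (for example, a thin wrapper around the API); it is an assumption of this example, not a function we ship.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_documents(query, documents, embed, top_k=3):
    """Return the top_k (score, document) pairs most similar to the query.

    `embed` is a hypothetical function: str -> list of floats.
    """
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

For RAG, the top-ranked documents would then be passed to the LLM as context. In a real system you would embed documents once and store the vectors rather than re-embedding on every query.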
Free API
Simple, predictable API endpoints to integrate into your stack.
The Embedding_Siyabasa API generates embeddings for Sinhala words, phrases, and sentences using our latest model, UgannA_SiyabasaV2. These language-specific embeddings support NLP tasks such as semantic search, text classification, and document clustering, where matching on meaning can deliver more relevant results than traditional keyword-based approaches.
Get Started in Minutes
Follow these simple steps to start using the API.
Explore the API
Visit our Hugging Face Space to test the API directly in your browser. No API key or signup required.
Review Endpoints
Understand the simple request and response formats for the /embed endpoint to plan your integration.
Integrate Code
Copy our Python or JavaScript snippets to make API calls from your application backend or frontend.
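The exact snippets and request format are shown on the Hugging Face Space. As a rough sketch of what a call to /embed might look like from Python, using only the standard library: the base URL below is a placeholder, and the JSON field names `text` and `embedding` are assumptions made for this example, so check them against the Space before use.

```python
import json
import urllib.request

# ASSUMPTION: placeholder URL; replace with the address of the Space.
BASE_URL = "https://example.invalid"

def build_embed_request(text, base_url=BASE_URL):
    """Build a POST request for the /embed endpoint.

    ASSUMPTION: the body shape {"text": ...} is a guess, not the documented format.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_embedding(body):
    """Extract the vector from a JSON response body.

    ASSUMPTION: the response carries the vector under an "embedding" key.
    """
    vector = json.loads(body)["embedding"]
    if len(vector) != 300:  # the model produces 300-dimensional vectors
        raise ValueError(f"expected 300 dimensions, got {len(vector)}")
    return [float(x) for x in vector]

def embed(text, base_url=BASE_URL, timeout=30):
    """Send the text to /embed and return its vector."""
    request = build_embed_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embedding(response.read().decode("utf-8"))

# Usage (requires the real Space URL):
# vector = embed("රජ", base_url="<your Space URL>")
```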
Frequently Asked Questions
Is the Embedding_Siyabasa API completely free?
Yes, the API is hosted as a free, public service on Hugging Face Spaces. It is suitable for development, testing, and low-traffic applications. For high-volume, performance-critical enterprise use, please contact us to discuss dedicated deployment options.
What makes this different from large multilingual models?
Specialization. Our model is trained exclusively on a comprehensive Sinhala corpus. This language-specific focus allows it to capture the unique syntax, idioms, and contextual nuances of Sinhala more effectively than a general-purpose multilingual model.
Are there any rate limits for the free API?
We do not enforce a hard-coded rate limit. However, public Hugging Face Spaces run on shared hardware, so usage is subject to the platform's fair-use policies and resource availability. If you anticipate high traffic, we recommend a dedicated instance.
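Because availability on shared hardware can vary, clients may want to retry failed calls instead of failing immediately. The sketch below shows one common pattern, exponential backoff. The helper name `call_with_backoff` and its default values are illustrative choices, not a recommendation specific to this service.

```python
import time

def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on network-style errors with doubling delays.

    Catches OSError, which covers urllib's URLError/HTTPError and timeouts.
    The `sleep` parameter is injectable so the helper can be tested
    without actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Any function that performs the API request can be wrapped this way, for example `call_with_backoff(lambda: my_request_function(text))`, where `my_request_function` stands for your own client code.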
Can I use this API for a commercial project?
Absolutely. The model and the public API are available for both personal and commercial use without any licensing fees.
