Colombo: The emerging AI platform Chat2Find has taken a major step forward in multilingual artificial intelligence with the release of its foundational Chat2Find Base model, now available via Hugging Face.
The Base model forms the backbone of Chat2Find’s upcoming open-weight model suite and is designed to support Sinhala, Tamil, and English, including naturally code-mixed variations such as Singlish and Tanglish. The release marks a significant milestone for locally grounded AI development, particularly in low-resource language ecosystems.
Built on a 255M+ Token Corpus
The Chat2Find Base model is trained through continual pre-training (CPT) on the recently released Chat2Find Corpus, a large-scale conversational dataset containing over 255 million tokens across nearly 280,000 records.
Unlike many global datasets, the corpus is derived from real user interactions, capturing authentic linguistic patterns, cultural nuances, and regional knowledge specific to South Asia. This gives the Base model a strong advantage in understanding local context, informal speech, and multilingual switching, areas where traditional models often struggle.
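For a rough sense of scale, the corpus figures quoted above imply an average record length of about 900 tokens. The short Python sketch below works through that arithmetic. The two constants are taken from the article's approximate wording ("over 255 million" and "nearly 280,000"), and the function name is purely illustrative, so the result should be read as an order-of-magnitude guide rather than an official statistic.

```python
# Back-of-envelope estimate from the figures quoted in the article.
# Both inputs are approximate, so the result is only a rough guide.
TOTAL_TOKENS = 255_000_000   # "over 255 million tokens" (treated as a lower bound)
TOTAL_RECORDS = 280_000      # "nearly 280,000 records" (treated as an upper bound)


def average_tokens_per_record(total_tokens: int, total_records: int) -> float:
    """Return the mean number of tokens per record (hypothetical helper)."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return total_tokens / total_records


if __name__ == "__main__":
    avg = average_tokens_per_record(TOTAL_TOKENS, TOTAL_RECORDS)
    print(f"Roughly {avg:.0f} tokens per record on average")
```

Because the token count is stated as a minimum and the record count as a maximum, the true average is likely somewhat higher than this estimate.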
Foundation for Advanced AI Models
Industry observers note that “base models” represent the pretrained core of AI systems, which can later be fine-tuned for tasks such as instruction-following or reasoning.
Chat2Find has confirmed that the Base model will be followed by specialized variants, including:
- Chat2Find Instruct – optimized for task execution and prompts
- Chat2Find Reasoning – focused on complex problem-solving
These models are expected to expand the capabilities of AI tools in education, business, and public services across Sri Lanka and the wider region.
The Base model can be accessed through Hugging Face and Lanka Data Net (local repository).
Boost for Sri Lanka’s AI Ecosystem
The release is being seen as a breakthrough for Sri Lanka’s AI landscape, where access to high-quality, locally relevant training data has historically been limited. By open-sourcing both the dataset and model components under permissive licensing, Chat2Find is enabling researchers, startups, and institutions to build next-generation applications tailored to regional needs.
Analysts say the initiative could position Sri Lanka as a regional hub for multilingual AI innovation, particularly in South Asian language technologies.
Looking Ahead
With the Base model now accessible to developers worldwide, attention is shifting to real-world deployments and fine-tuned applications. As global AI development increasingly moves toward open ecosystems, Chat2Find’s approach highlights the growing importance of localized data and inclusive language representation in shaping the future of artificial intelligence.

