Typesense Search Engine
Typesense is an open-source, typo-tolerant search engine that offers a blazing-fast and developer-friendly experience. Built from the ground up with simplicity, speed, and accuracy in mind, it serves as a modern alternative to more complex search systems like Elasticsearch.
Use Cases
- E-commerce platforms looking to provide fast and accurate product search with typo-tolerance.
- Knowledge bases and documentation sites requiring intuitive and quick search experiences.
- Mobile applications and websites needing real-time search capabilities without heavy infrastructure overhead.
What Makes Typesense Unique
- Typo Tolerance: Typesense automatically handles typos in search queries without requiring manual configuration.
- Lightning-Fast Performance: It delivers search results in milliseconds, even for large datasets.
- Easy Setup: Typesense is simple to install and manage, with a straightforward API for developers.
- Relevance Without Complexity: Unlike other search engines, Typesense does not require complicated scoring or ranking configurations to achieve accurate results.
- Open Source: Typesense is freely available under an open-source license, making it accessible for businesses of all sizes.
To learn more, visit the official Typesense Documentation.
Setting Up a Typesense Collection
This section introduces Typesense and shows how to interact with it from JavaScript.
Typesense is a fast, open-source search engine optimized for full-text search and designed to be easy to use. Sysadmins or developers interact with the Typesense server via its HTTP API or using official client libraries like the JavaScript client.
1. Setting Up Typesense with JavaScript
Connect to the Typesense Server:
const Typesense = require('typesense');
const client = new Typesense.Client({
  nodes: [
    {
      host: 'localhost', // Replace with your server's address
      port: 8108,
      protocol: 'http',
    },
  ],
  apiKey: '<your_api_key>',
  connectionTimeoutSeconds: 2,
});
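In practice the connection settings often come from the environment instead of being hard-coded. The following is a minimal sketch of that idea; the environment variable names (TYPESENSE_HOST, TYPESENSE_PORT, TYPESENSE_PROTOCOL, TYPESENSE_API_KEY) and the helper name buildClientConfig are illustrative assumptions, not names defined by Typesense.

```javascript
// Build a Typesense client configuration from environment-style settings.
// The variable names below are hypothetical; pick whatever fits your deployment.
function buildClientConfig(env) {
  if (!env.TYPESENSE_API_KEY) {
    throw new Error('TYPESENSE_API_KEY is required');
  }
  return {
    nodes: [
      {
        host: env.TYPESENSE_HOST || 'localhost',
        port: Number(env.TYPESENSE_PORT || 8108),
        protocol: env.TYPESENSE_PROTOCOL || 'http',
      },
    ],
    apiKey: env.TYPESENSE_API_KEY,
    connectionTimeoutSeconds: 2,
  };
}

// Usage (requires the typesense package to be installed):
//   const Typesense = require('typesense');
//   const client = new Typesense.Client(buildClientConfig(process.env));
```

Keeping the configuration in a plain function makes it easy to check without a running server.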
2. Creating a Collection
Collection Definition Format:
A collection in Typesense is defined by its schema. Here's an example schema for a books collection:
const schema = {
  name: 'books',
  fields: [
    { name: 'id', type: 'string' },
    { name: 'title', type: 'string' },
    { name: 'author', type: 'string' },
    { name: 'publication_year', type: 'int32' },
    { name: 'genres', type: 'string[]', facet: true }, // Facetable field
  ],
  default_sorting_field: 'publication_year',
};
Create the Collection:
client.collections().create(schema)
  .then(response => console.log('Collection created:', response))
  .catch(error => console.error('Error creating collection:', error));
3. What Happens When a Collection is Created
The Typesense server allocates space in its internal data directory.
The collection schema is stored, and Typesense prepares indexing structures based on the schema.
The collection is immediately ready for document insertion and querying.
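Because a new collection is usable as soon as it is created, the create, index, and search steps can be chained directly. The sketch below assumes `client` is a typesense-js client like the one configured earlier; the function name createIndexAndSearch is made up for illustration.

```javascript
// Create a collection, index one document, and search for it right away.
// `client` is expected to behave like a typesense-js Client instance.
async function createIndexAndSearch(client, schema, doc, query, queryBy) {
  const created = await client.collections().create(schema);
  await client.collections(schema.name).documents().create(doc);
  const results = await client.collections(schema.name).documents().search({
    q: query,
    query_by: queryBy,
  });
  // `found` is the number of matching documents reported by the server.
  return { collection: created.name, found: results.found };
}

// Usage:
//   createIndexAndSearch(client, schema,
//     { id: '1', title: 'Dune', author: 'Frank Herbert',
//       publication_year: 1965, genres: ['sci-fi'] },
//     'dune', 'title')
//     .then(console.log)
//     .catch(console.error);
```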
4. Testing That a Collection Exists
List All Collections:
client.collections().retrieve()
  .then(response => console.log('Collections:', response))
  .catch(error => console.error('Error retrieving collections:', error));
Check a Specific Collection:
client.collections('books').retrieve()
  .then(response => console.log('Collection details:', response))
  .catch(error => console.error('Collection not found:', error));
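A failed retrieve does not always mean the collection is missing; the server may be unreachable or the API key may be wrong. The sketch below separates those cases. It assumes that errors thrown by typesense-js carry the HTTP status in an `httpStatus` property; the helper name collectionExists is hypothetical.

```javascript
// Return true if the collection exists and false if the server answers 404.
// Assumption: typesense-js errors expose the HTTP status as `error.httpStatus`.
async function collectionExists(client, name) {
  try {
    await client.collections(name).retrieve();
    return true;
  } catch (error) {
    if (error && error.httpStatus === 404) {
      return false;
    }
    throw error; // Anything else (network failure, bad API key) is re-thrown.
  }
}

// Usage:
//   collectionExists(client, 'books')
//     .then(exists => console.log('books exists:', exists))
//     .catch(error => console.error('Could not check collection:', error));
```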
5. Deleting a Collection
To delete a collection (e.g., books):
client.collections('books').delete()
  .then(response => console.log('Collection deleted:', response))
  .catch(error => console.error('Error deleting collection:', error));
Typesense Collection Design and Organization
Typesense is a highly efficient search engine designed to handle structured data, making it ideal for use cases like website search. A collection in Typesense is a logical grouping of documents with a specific schema, similar to a table in a relational database.
Key Concepts in Typesense Collection Design
- Collections: A collection is the top-level data structure in Typesense, containing documents that follow a predefined schema.
- Documents: Each document represents a searchable record within a collection.
- Schema Design: A schema defines the structure of documents in a collection, including field names, types, and attributes like faceting, sorting, and indexing.
Best Practices for Organizing Collections
- Separate Collections by Entity Type: Group similar data into distinct collections. For example, a website might have collections for products, articles, and users.
- Design for Search Use Cases: Tailor each collection schema to meet specific search requirements. Include fields that users will search or filter by.
- Use Meaningful Field Names: Keep field names descriptive but concise to ensure clarity.
- Leverage Facets and Sorting: Define facetable and sortable fields to enable advanced filtering and sorting options.
Example: Setting Up a Website Search Engine
Let’s consider a website with an e-commerce store and a blog. We’ll create two collections: products for the e-commerce store and articles for the blog.
1. Collection Schema for Products
A schema along the lines of those shown in the Typesense docs:
const productSchema = {
  name: 'products',
  fields: [
    { name: 'id', type: 'string' },
    { name: 'name', type: 'string' },
    { name: 'description', type: 'string' },
    { name: 'price', type: 'float', facet: true },
    { name: 'category', type: 'string', facet: true },
    { name: 'rating', type: 'float', facet: true },
    { name: 'in_stock', type: 'bool', facet: true },
  ],
  default_sorting_field: 'price',
};
2. Collection Schema for Articles
A schema along the lines of those shown in the Typesense docs:
const articleSchema = {
  name: 'articles',
  fields: [
    { name: 'id', type: 'string' },
    { name: 'title', type: 'string' },
    { name: 'content', type: 'string' },
    { name: 'author', type: 'string', facet: true },
    { name: 'tags', type: 'string[]', facet: true },
    { name: 'published_date', type: 'int64', facet: true },
  ],
  default_sorting_field: 'published_date',
};
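Since published_date is an int64 field, dates have to be converted to numbers before indexing; a common convention is Unix time in seconds. The sketch below shows one way to do that conversion. The helper names and the shape of the input article (with a `publishedAt` property) are assumptions made for illustration.

```javascript
// Convert a Date or an ISO 8601 string to Unix time in whole seconds.
function toUnixSeconds(value) {
  const ms = value instanceof Date ? value.getTime() : Date.parse(value);
  if (Number.isNaN(ms)) {
    throw new RangeError(`Invalid date: ${value}`);
  }
  return Math.floor(ms / 1000);
}

// Map an application-side article (hypothetical shape) to a document
// that matches the articles schema.
function toArticleDocument(article) {
  return {
    id: String(article.id),
    title: article.title,
    content: article.content,
    author: article.author,
    tags: article.tags || [],
    published_date: toUnixSeconds(article.publishedAt),
  };
}
```

With numeric timestamps in place, the field can be used for sorting and for range filters.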
Steps to Set Up Collections
Create Collections: Use the following JavaScript code to create the collections:
client.collections().create(productSchema)
  .then(response => console.log('Products collection created:', response))
  .catch(error => console.error('Error creating products collection:', error));
client.collections().create(articleSchema)
  .then(response => console.log('Articles collection created:', response))
  .catch(error => console.error('Error creating articles collection:', error));
Index Documents: Add documents to the collections:
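client.collections('products').documents().create({
  id: '1',
  name: 'Wireless Mouse',
  description: 'A sleek wireless mouse with ergonomic design.',
  price: 25.99,
  category: 'Electronics',
  rating: 4.5,
  in_stock: true,
})
  .then(response => console.log('Document indexed:', response))
  .catch(error => console.error('Error indexing document:', error));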
Test Search Queries: Test the search functionality:
client.collections('products').documents().search({
  q: 'mouse',
  query_by: 'name,description',
})
  .then(response => console.log('Search results:', response))
  .catch(error => console.error('Error searching products:', error));
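The facetable and sortable fields in the products schema come into play through the filter_by, facet_by, and sort_by search parameters. The sketch below builds such a parameter object from a few user choices. The function name and its options are hypothetical, and values are inserted into the filter unescaped, so it assumes simple category names without special characters.

```javascript
// Build search parameters for the products collection.
// Assumes category values contain no characters that need escaping in filter_by.
function buildProductSearch({ q, category, maxPrice, inStockOnly = false }) {
  const filters = [];
  if (category) filters.push(`category:=${category}`);
  if (typeof maxPrice === 'number') filters.push(`price:<=${maxPrice}`);
  if (inStockOnly) filters.push('in_stock:true');

  const params = {
    q: q || '*', // '*' matches all documents
    query_by: 'name,description',
    facet_by: 'category',
    sort_by: '_text_match:desc,rating:desc',
  };
  if (filters.length > 0) {
    params.filter_by = filters.join(' && ');
  }
  return params;
}

// Usage:
//   client.collections('products').documents()
//     .search(buildProductSearch({ q: 'mouse', category: 'Electronics', maxPrice: 50 }))
//     .then(response => console.log(response.hits))
//     .catch(error => console.error('Error searching products:', error));
```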
Testing and Maintenance
- Verify Collection Setup: Use client.collections().retrieve() to list and verify collections.
- Monitor Performance: Ensure Typesense is performing well under load, especially if your website has high traffic.
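- Backup and Restore: Regularly back up the server's data, for example by creating a snapshot of the data-dir through the snapshot operation and archiving it, to avoid data loss in case of failure.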
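For backups, the typesense-js client exposes the server's snapshot operation through client.operations.perform. The sketch below wraps that call; the helper name createSnapshot is made up, and the snapshot path is a placeholder for a directory on the server's filesystem.

```javascript
// Ask the Typesense server to write a point-in-time snapshot to `snapshotPath`.
// The path refers to the server's filesystem, not the machine running this code.
async function createSnapshot(client, snapshotPath) {
  const result = await client.operations.perform('snapshot', {
    snapshot_path: snapshotPath,
  });
  if (!result || result.success !== true) {
    throw new Error('Snapshot was not confirmed by the server');
  }
  return result;
}

// Usage (placeholder path):
//   createSnapshot(client, '/tmp/typesense-data-snapshot')
//     .then(() => console.log('Snapshot created'))
//     .catch(error => console.error('Snapshot failed:', error));
```

The resulting directory can then be archived with whatever backup tooling is already in place.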
By organizing Typesense collections based on use cases and adhering to best practices in schema design, a sysadmin can build a powerful, efficient search engine for a website.
This setup ensures fast, relevant search results while enabling advanced filtering and sorting for a seamless user experience.
Typesense Docs Discuss Hierarchy
Source: typesense.org/docs/guide/organizing-collections.html#hierarchy
Hierarchy
"In Typesense, a cluster can have one or more nodes and each node stores an exact replica of the entire dataset you send to the cluster.
"A cluster can have one or more collections, and a collection can have many documents that share the same/similar structure (fields/attributes).
"For example, say you have CRM system that stores details of people and companies. To store this data in Typesense, you would create: (1) a collection called people that stores individual documents containing information about people (eg: attributes like name, title, company_name, etc) and (2) a collection called companies that stores individual documents containing information about companies (eg: attributes like name, location, num_employees, etc).
"Here's a visual representation of this hierarchy:
[Typesense Cluster] ===has-many===> [Collections] ===has-many===> [Documents] ===has-many===> [Attributes/Fields]
Think of a Collection as a Relational Database Table
The docs compare a collection to a relational database table. Accordingly, each collection should correspond to one type of document. For example:
- products collection to store all product records
- blog_articles collection to store all blog articles
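The CRM example quoted above could be expressed as two schemas, one per document type. This is only a sketch: the attribute names come from the quoted passage, while the field types and the facet setting are assumptions chosen for illustration.

```javascript
// One collection per document type, mirroring the CRM example.
// Field types are assumed; the quoted docs only name the attributes.
const peopleSchema = {
  name: 'people',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'title', type: 'string' },
    { name: 'company_name', type: 'string', facet: true },
  ],
};

const companiesSchema = {
  name: 'companies',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'location', type: 'string' },
    { name: 'num_employees', type: 'int32' },
  ],
  default_sorting_field: 'num_employees',
};

// Usage:
//   Promise.all([peopleSchema, companiesSchema].map(s => client.collections().create(s)))
//     .then(() => console.log('CRM collections created'))
//     .catch(error => console.error('Error creating CRM collections:', error));
```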
Syncing Data into Typesense
Syncing with Typesense ensures that your search index stays up-to-date with the latest data from your source systems.
This is crucial for delivering accurate and relevant search results to users, especially for dynamic applications like e-commerce sites, blogs, or real-time data platforms.
Regular synchronization minimizes discrepancies, enhances search performance, and ensures that newly added, updated, or deleted records are reflected promptly, providing a seamless and reliable search experience.
See the docs: typesense.org/docs/guide/syncing-data-into-typesense.html#sync-changes-in-bulk-periodically
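A periodic bulk sync can be sketched with the client's import method and the upsert action, which inserts new documents and updates existing ones by id. The helper names chunk and syncInBatches and the batch size are assumptions, and the sketch assumes import resolves to an array of per-document results with a `success` flag. Deleted source records are not handled here.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Upsert records into a collection batch by batch and return the number
// of documents the server reported as successfully imported.
async function syncInBatches(client, collectionName, records, batchSize = 100) {
  let imported = 0;
  for (const batch of chunk(records, batchSize)) {
    const results = await client
      .collections(collectionName)
      .documents()
      .import(batch, { action: 'upsert' });
    imported += results.filter(result => result.success).length;
  }
  return imported;
}
```

Running such a function on a schedule (for example, over the records changed since the last run) keeps the index close to the source data without one request per document.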
Collection Schema for the Webpages of a Website
This schema provides a solid foundation for enabling website search, allowing users to search by title, content, description, and keywords, and filter by URL, keywords, and last updated date.
Here's a schema definition for a collection that would enable a website to be searchable:
{
  "name": "website_search",
  "fields": [
    { "name": "title", "type": "string" },
    { "name": "content", "type": "string" },
    { "name": "url", "type": "string" },
    { "name": "description", "type": "string" },
    { "name": "keywords", "type": "string[]", "facet": true },
    { "name": "last_updated", "type": "int64" }
  ]
}
This schema defines a collection called `website_search` with six fields. Typesense field definitions have no separate tokenization or filter switches: every indexed `string` field is tokenized for full-text search, any indexed field can be used in `filter_by`, and `facet: true` additionally enables faceting.
1. `title`: The title of the webpage.
* Type: `string`
* Role: searched as full text (list it in `query_by`); can also be used in filters
2. `content`: The content of the webpage, ideally as plain text extracted from the HTML.
* Type: `string`
* Role: searched as full text; primarily used for searching, not filtering
3. `url`: The URL of the webpage.
* Type: `string`
* Role: returned with each hit and used for exact-match filtering; usually left out of `query_by`
4. `description`: A short description of the webpage (e.g., the meta description tag).
* Type: `string`
* Role: searched as full text; primarily used for searching, not filtering
5. `keywords`: A list of keywords associated with the webpage.
* Type: `string[]` (an array of strings)
* Facet: `true` (to allow faceting and filtering by keyword, with each keyword matched as a single unit via exact-match filters)
6. `last_updated`: The date and time when the webpage was last updated.
* Type: `int64` (a Unix timestamp, since Typesense has no dedicated date type)
* Role: used for range filters and sorting by update date
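The field roles described above translate into search parameters roughly as follows. This is a sketch only: the function name buildWebsiteSearch and the weights are illustrative choices, and the keyword is inserted into the filter unescaped, so it assumes simple keyword values.

```javascript
// Build search parameters for the website_search collection.
// Text fields are searched with title weighted highest; an optional keyword
// narrows the results with an exact-match filter.
function buildWebsiteSearch(q, keyword) {
  const params = {
    q,
    query_by: 'title,description,content',
    query_by_weights: '3,2,1', // one weight per query_by field (illustrative values)
  };
  if (keyword) {
    params.filter_by = `keywords:=${keyword}`;
  }
  return params;
}

// Usage:
//   client.collections('website_search').documents()
//     .search(buildWebsiteSearch('pricing', 'billing'))
//     .then(response => response.hits.forEach(hit => console.log(hit.document.url)))
//     .catch(error => console.error('Error searching website:', error));
```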