freeradiantbunny.org

Typesense Search Engine

Typesense is an open-source, typo-tolerant search engine that offers a blazing-fast and developer-friendly experience. Built from the ground up with simplicity, speed, and accuracy in mind, it serves as a modern alternative to more complex search systems like Elasticsearch.

Use Cases

What Makes Typesense Unique

To learn more, visit the official Typesense Documentation.

Setting Up a Typesense Collection

Here we try to understand Typesense and how to interact with it using JavaScript.

Typesense is a fast, open-source search engine optimized for full-text search and designed to be easy to use. Sysadmins or developers interact with the Typesense server via its HTTP API or using official client libraries like the JavaScript client.

1. Setting Up Typesense with JavaScript

Connect to the Typesense Server:

    const Typesense = require('typesense');

    const client = new Typesense.Client({
      nodes: [
        {
          host: 'localhost',   // Replace with your server's address
          port: 8108,
          protocol: 'http',
        },
      ],
      apiKey: '<your_api_key>',
      connectionTimeoutSeconds: 2,
    });
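Before creating collections, it can help to confirm that the server is reachable. The following is a minimal sketch, assuming the JavaScript client exposes `client.health.retrieve()` and that a healthy node answers with `{ ok: true }`. The helper name `isServerHealthy` is hypothetical and not part of the library.

```javascript
// Hypothetical helper: resolves to true when the Typesense node reports healthy.
// Assumes `client` is a Typesense.Client configured like the one above.
async function isServerHealthy(client) {
  try {
    const status = await client.health.retrieve();
    return status.ok === true;
  } catch (error) {
    // Connection refused or a timeout ends up here.
    console.error('Typesense health check failed:', error.message);
    return false;
  }
}
```

A caller might run `isServerHealthy(client)` once at startup and skip collection setup when it resolves to `false`.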

2. Creating a Collection

Collection Definition Format:

A collection in Typesense is defined by its schema. Here's an example schema for a books collection:

    const schema = {
      name: 'books',
      fields: [
        { name: 'id', type: 'string' },
        { name: 'title', type: 'string' },
        { name: 'author', type: 'string' },
        { name: 'publication_year', type: 'int32' },
        { name: 'genres', type: 'string[]', facet: true }, // Facetable field
      ],
      default_sorting_field: 'publication_year',
    };
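One detail worth checking before sending a schema to the server is that `default_sorting_field` names a numeric field (`int32`, `int64`, or `float`), which is what Typesense expects. The sketch below is a local pre-flight check only; `checkDefaultSortingField` is a hypothetical helper and not part of the Typesense client.

```javascript
// Hypothetical pre-flight check; not part of the Typesense client library.
const NUMERIC_TYPES = new Set(['int32', 'int64', 'float']);

// Returns a list of problems found; an empty list means the setting looks fine.
function checkDefaultSortingField(schema) {
  const fieldName = schema.default_sorting_field;
  if (fieldName === undefined) return []; // the setting is optional

  const field = schema.fields.find((f) => f.name === fieldName);
  if (!field) {
    return [`default_sorting_field '${fieldName}' is not declared in fields`];
  }
  if (!NUMERIC_TYPES.has(field.type)) {
    return [`default_sorting_field '${fieldName}' has type '${field.type}', expected a numeric type`];
  }
  return [];
}
```

For the books schema, `checkDefaultSortingField(schema)` returns an empty list because `publication_year` is an `int32` field.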

Create the Collection:

    client.collections().create(schema)
      .then(response => console.log('Collection created:', response))
      .catch(error => console.error('Error creating collection:', error));

3. What Happens When a Collection is Created

The Typesense server allocates space in its internal data directory.

The collection schema is stored, and Typesense prepares indexing structures based on the schema.

The collection is immediately ready for document insertion and querying.

4. Testing That a Collection Exists

List All Collections:

    client.collections().retrieve()
      .then(response => console.log('Collections:', response))
      .catch(error => console.error('Error retrieving collections:', error));
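The full listing can be noisy because each entry carries the whole schema. As a rough sketch, assuming each entry includes `name` and `num_documents` properties, the listing can be reduced to a short summary. `summarizeCollections` is a hypothetical helper name.

```javascript
// Hypothetical helper: reduce the collection listing to name + document count.
// Assumes each entry returned by retrieve() has `name` and `num_documents`.
async function summarizeCollections(client) {
  const collections = await client.collections().retrieve();
  return collections.map((c) => ({ name: c.name, documents: c.num_documents }));
}
```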

Check a Specific Collection:

    client.collections('books').retrieve()
      .then(response => console.log('Collection details:', response))
      .catch(error => console.error('Collection not found:', error));
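Note that the `catch` above fires for any failure, not only a missing collection. A slightly stricter sketch follows, assuming the client attaches the HTTP status code to thrown errors as `error.httpStatus`. The helper name `collectionExists` is hypothetical.

```javascript
// Hypothetical helper: true if the collection exists, false if the server says 404.
// Assumption: errors thrown by the client carry the HTTP status as `httpStatus`.
async function collectionExists(client, name) {
  try {
    await client.collections(name).retrieve();
    return true;
  } catch (error) {
    if (error.httpStatus === 404) return false;
    // Network failures, auth errors, etc. should not be read as "missing".
    throw error;
  }
}
```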

5. Deleting a Collection

To delete a collection (e.g., books):

    client.collections('books').delete()
      .then(response => console.log('Collection deleted:', response))
      .catch(error => console.error('Error deleting collection:', error));

Typesense Collection Design and Organization

Typesense is a highly efficient search engine designed to handle structured data, making it ideal for use cases like website search. A collection in Typesense is a logical grouping of documents with a specific schema, similar to a table in a relational database.

Key Concepts in Typesense Collection Design

Best Practices for Organizing Collections

Example: Setting Up a Website Search Engine

Let’s consider a website with an e-commerce section and a blog section. We’ll create two collections: products for the e-commerce store and articles for the blog.

1. Collection Schema for Products

The Typesense docs show this useful schema:

    const productSchema = {
      name: 'products',
      fields: [
        { name: 'id', type: 'string' },
        { name: 'name', type: 'string' },
        { name: 'description', type: 'string' },
        { name: 'price', type: 'float', facet: true },
        { name: 'category', type: 'string', facet: true },
        { name: 'rating', type: 'float', facet: true },
        { name: 'in_stock', type: 'bool', facet: true },
      ],
      default_sorting_field: 'price',
    };

2. Collection Schema for Articles

The Typesense docs show this useful schema:

    const articleSchema = {
      name: 'articles',
      fields: [
        { name: 'id', type: 'string' },
        { name: 'title', type: 'string' },
        { name: 'content', type: 'string' },
        { name: 'author', type: 'string', facet: true },
        { name: 'tags', type: 'string[]', facet: true },
        { name: 'published_date', type: 'int64', facet: true },
      ],
      default_sorting_field: 'published_date',
    };

Steps to Set Up Collections

Create Collections: Use the following JavaScript code to create the collections:

    client.collections().create(productSchema)
      .then(response => console.log('Products collection created:', response))
      .catch(error => console.error('Error creating products collection:', error));

    client.collections().create(articleSchema)
      .then(response => console.log('Articles collection created:', response))
      .catch(error => console.error('Error creating articles collection:', error));
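Running the creation code a second time fails because the collections already exist. The sketch below makes the step repeatable by treating a conflict as success. It assumes the client reports an existing collection as an error with `httpStatus` 409; `ensureCollections` is a hypothetical helper name.

```javascript
// Hypothetical helper: create each collection, tolerating ones that already exist.
// Assumption: an "already exists" response surfaces as error.httpStatus === 409.
async function ensureCollections(client, schemas) {
  const outcome = {};
  for (const schema of schemas) {
    try {
      await client.collections().create(schema);
      outcome[schema.name] = 'created';
    } catch (error) {
      if (error.httpStatus !== 409) throw error; // anything else is a real failure
      outcome[schema.name] = 'already exists';
    }
  }
  return outcome;
}
```

With the two schemas above, this could be called as `ensureCollections(client, [productSchema, articleSchema])`.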

Index Documents: Add documents to the collections:

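    client.collections('products').documents().create({
      id: '1',
      name: 'Wireless Mouse',
      description: 'A sleek wireless mouse with ergonomic design.',
      price: 25.99,
      category: 'Electronics',
      rating: 4.5,
      in_stock: true,
    })
      .then(response => console.log('Document indexed:', response))
      .catch(error => console.error('Error indexing document:', error));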
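Creating documents one at a time gets slow for a real catalog. Here is a sketch of a bulk import using the client's `documents().import()` with `action: 'upsert'`. Depending on the client version, partial failures may come back in the result array or be thrown as an error carrying `importResults`, so the sketch handles both; treat that detail as an assumption to verify. `importDocuments` is a hypothetical helper name.

```javascript
// Hypothetical helper: bulk-upsert documents and report how many succeeded.
async function importDocuments(client, collectionName, documents) {
  let results;
  try {
    results = await client
      .collections(collectionName)
      .documents()
      .import(documents, { action: 'upsert' });
  } catch (error) {
    // Assumption: some client versions throw on partial failure and attach
    // the per-document results as `error.importResults`.
    if (!Array.isArray(error.importResults)) throw error;
    results = error.importResults;
  }
  const failures = results.filter((r) => r.success === false);
  return { imported: results.length - failures.length, failures };
}
```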

Test Search Queries: Test the search functionality:

    client.collections('products').documents().search({
      q: 'mouse',
      query_by: 'name,description',
    })
      .then(response => console.log('Search results:', response))
      .catch(error => console.error('Error searching products:', error));
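The product schema marks several fields as facets, so a more realistic query would also filter, facet, and sort. The following is a sketch of how such search parameters could be assembled. `buildProductSearch` and its options (`maxPrice`, `inStockOnly`) are hypothetical names; `filter_by`, `facet_by`, and `sort_by` are standard Typesense search parameters.

```javascript
// Hypothetical helper: build search parameters for the products collection.
function buildProductSearch({ q, maxPrice, inStockOnly = false }) {
  const filters = [];
  if (typeof maxPrice === 'number') filters.push(`price:<=${maxPrice}`);
  if (inStockOnly) filters.push('in_stock:true');

  const params = {
    q,
    query_by: 'name,description',
    facet_by: 'category',   // return per-category counts alongside the hits
    sort_by: 'rating:desc', // best-rated products first
  };
  // Only send filter_by when there is something to filter on.
  if (filters.length > 0) params.filter_by = filters.join(' && ');
  return params;
}
```

The result can be passed straight to the search call, for example `client.collections('products').documents().search(buildProductSearch({ q: 'mouse', maxPrice: 50, inStockOnly: true }))`.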

Testing and Maintenance

By organizing Typesense collections based on use cases and adhering to best practices in schema design, a sysadmin can build a powerful, efficient search engine for a website.

This setup ensures fast, relevant search results while enabling advanced filtering and sorting for a seamless user experience.

Typesense Docs Discuss Hierarchy

Source: typesense.org/docs/guide/organizing-collections.html#hierarchy

Hierarchy

"In Typesense, a cluster can have one or more nodes and each node stores an exact replica of the entire dataset you send to the cluster.

"A cluster can have one or more collections, and a collection can have many documents that share the same/similar structure (fields/attributes).

"For example, say you have CRM system that stores details of people and companies. To store this data in Typesense, you would create: (1) a collection called people that stores individual documents containing information about people (eg: attributes like name, title, company_name, etc) and (2) a collection called companies that stores individual documents containing information about companies (eg: attributes like name, location, num_employees, etc).

"Here's a visual representation of this hierarchy:

    [Typesense Cluster] ===has-many===> [Collections] ===has-many===> [Documents] ===has-many===> [Attributes/Fields]
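Stepping outside the quoted docs for a moment, the CRM example could be sketched as two schemas. The attribute names come from the quote; the field types and the facet choice are assumptions made for illustration.

```javascript
// Sketch of the CRM example. Field names follow the quoted docs;
// the types and the facet on company_name are assumptions.
const peopleSchema = {
  name: 'people',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'title', type: 'string' },
    { name: 'company_name', type: 'string', facet: true },
  ],
};

const companiesSchema = {
  name: 'companies',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'location', type: 'string' },
    { name: 'num_employees', type: 'int32' },
  ],
};
```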
                                                                

Think of a Collection like a Relational Database Table

The docs compare a collection to a relational database table. In this way, collection definitions should correspond to types of documents. For example:

Syncing Data into Typesense

Syncing with Typesense ensures that your search index stays up-to-date with the latest data from your source systems.

This is crucial for delivering accurate and relevant search results to users, especially for dynamic applications like e-commerce sites, blogs, or real-time data platforms.

Regular synchronization minimizes discrepancies, enhances search performance, and ensures that newly added, updated, or deleted records are reflected promptly, providing a seamless and reliable search experience.

See the docs: typesense.org/docs/guide/syncing-data-into-typesense.html#sync-changes-in-bulk-periodically
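As a rough illustration of syncing changes in bulk periodically, the sketch below fetches records changed since the last run and upserts them in one import call. `syncChanges` and `fetchChangedSince` are hypothetical; the latter stands in for whatever query the source system offers (for example, rows with an `updated_at` later than a given time).

```javascript
// Hypothetical sync step. `fetchChangedSince(timestamp)` is a stand-in for a
// query against the source system and must resolve to an array of documents.
async function syncChanges(client, collectionName, fetchChangedSince, lastSyncedAt) {
  // Capture the start time first so changes made during the sync are
  // picked up by the next run instead of being skipped.
  const startedAt = Date.now();
  const changed = await fetchChangedSince(lastSyncedAt);
  if (changed.length > 0) {
    await client
      .collections(collectionName)
      .documents()
      .import(changed, { action: 'upsert' });
  }
  // The caller stores nextSyncFrom and passes it in as lastSyncedAt next time.
  return { synced: changed.length, nextSyncFrom: startedAt };
}
```

This could be run from a scheduler such as cron or `setInterval`. Deleted records are not covered here and would need their own handling, for example by deleting documents by id.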


Collection Schema for the Webpages of a Website

This schema provides a solid foundation for enabling website search, allowing users to search by title, content, description, and keywords, and filter by URL, keywords, and last updated date.

Here's a schema definition for a collection that would enable a website to be searchable:

    {
      "name": "website_search",
      "fields": [
        { "name": "title", "type": "string" },
        { "name": "content", "type": "string" },
        { "name": "url", "type": "string" },
        { "name": "description", "type": "string" },
        { "name": "keywords", "type": "string[]" },
        { "name": "last_updated", "type": "int64" }
      ]
    }

In Typesense, every field is indexed by default. Whether a field is searched or filtered is decided at query time: fields listed in `query_by` are searched as full text, and fields referenced in `filter_by` are used for filtering. Typesense has no dedicated date type, so `last_updated` is stored as a Unix timestamp in an `int64` field.

This schema defines a collection called `website_search` with six fields:

1. `title`: The title of the webpage.

* Type: `string`

* Search: include in `query_by` (to allow for full-text search)

* Filter: can also be used in `filter_by` (to allow for filtering by title)

2. `content`: The HTML content of the webpage.

* Type: `string`

* Search: include in `query_by` (to allow for full-text search)

* Filter: not intended for `filter_by` (since this field is primarily used for searching, not filtering)

3. `url`: The URL of the webpage.

* Type: `string`

* Search: leave out of `query_by` (since URLs are matched as whole values)

* Filter: use an exact match (`:=`) in `filter_by` (to allow for filtering by URL)

4. `description`: A short description of the webpage (e.g., the meta description tag).

* Type: `string`

* Search: include in `query_by` (to allow for full-text search)

* Filter: not intended for `filter_by` (since this field is primarily used for searching, not filtering)

5. `keywords`: A list of keywords associated with the webpage.

* Type: `string[]` (an array of strings)

* Search: can be included in `query_by` (to allow for searching by keyword)

* Filter: use an exact match (`:=`) in `filter_by` (since individual keywords should be treated as a single unit)

6. `last_updated`: The date and time when the webpage was last updated.

* Type: `int64` (a Unix timestamp)

* Search: leave out of `query_by` (since dates and times are not full text)

* Filter: use a comparison or range in `filter_by` (to allow for filtering by update date)
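To show how these fields would be used at query time, here is a sketch that builds search parameters for the `website_search` collection: full-text search over title, description, and content, with an optional keyword filter. `buildWebsiteSearch` is a hypothetical helper, and the weights are arbitrary values chosen for illustration.

```javascript
// Hypothetical helper: search parameters for the website_search collection.
function buildWebsiteSearch(q, keywords = []) {
  const params = {
    q,
    query_by: 'title,description,content',
    query_by_weights: '3,2,1', // assumption: title matches matter most
  };
  if (keywords.length > 0) {
    // Exact match against any of the given keywords. Values containing commas
    // or other special characters would need escaping, which is not handled here.
    params.filter_by = `keywords:=[${keywords.join(',')}]`;
  }
  return params;
}
```

For example, `buildWebsiteSearch('typesense', ['search', 'docs'])` produces a `filter_by` of `keywords:=[search,docs]`, and the result can be passed to `client.collections('website_search').documents().search(...)`.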