freeradiantbunny.org

whoosh search

Whoosh is a fast, feature-rich, and easy-to-use full-text search library implemented in Python. It's designed to handle indexing and searching of textual content efficiently. Whoosh is lightweight, making it a good choice for smaller projects, and does not require external dependencies such as databases or external search engines like Elasticsearch or Solr.

Key Features of Whoosh:

Use Cases:

Limitations:

Whoosh Set-up

When setting up a search engine using the Whoosh library, you'll need to create and organize a few files. These files include Python scripts for managing the index and handling search requests, as well as HTML templates for interacting with the search engine via a web browser. Below is a guide on how to structure these files and where the code snippets fit.

See: whoosh documentation

Directory Structure

Here's an example of a typical directory structure for your project:


      my_search_engine/
      │
      ├── app.py                # Flask app to handle search requests and indexing
      ├── indexdir/             # Directory where the Whoosh index is stored
      │
      ├── templates/            # Folder for HTML templates
      │   └── search.html       # Search form template for the webpage
      │
      └── static/               # Folder for static files like CSS, JS, etc. (optional)

Step-by-Step Guide to Creating and Running Files

1. Create the Flask Application (`app.py`)

The primary Python script will handle the creation of the Whoosh index, querying it, and displaying results. You can use the `Flask` framework for handling web requests.

Here is how the code fits into `app.py`:

      from flask import Flask, render_template, request
      from whoosh import index
      from whoosh.fields import Schema, TEXT, ID
      from whoosh.qparser import QueryParser
      import os

      app = Flask(__name__)

      # Define schema for Whoosh index (create this only once)
      schema = Schema(title=TEXT(stored=True), content=TEXT(stored=True), path=ID(stored=True))

      # Directory to store the index
      index_dir = "indexdir"

      # Function to create the index (only run once to initialize the index)
      def create_index():
          if not os.path.exists(index_dir):
              os.mkdir(index_dir)
          ix = index.create_in(index_dir, schema)
          writer = ix.writer()
          writer.add_document(title="First Document", content="This is the content of the first document.", path="/a")
          writer.add_document(title="Second Document", content="Content of the second document goes here.", path="/b")
          writer.commit()

      # Uncomment to create the index if not already created (run once)
      # create_index()

      # Route to display the search form and show results
      @app.route("/", methods=["GET", "POST"])
      def search():
          results = []
          if request.method == "POST":
              query_string = request.form["query"]
              ix = index.open_dir(index_dir)
              qp = QueryParser("content", schema=ix.schema)
              query = qp.parse(query_string)
              with ix.searcher() as searcher:
                  # Copy the stored fields out of each hit; hits cannot be
                  # read once the searcher has been closed
                  results = [hit.fields() for hit in searcher.search(query)]
          return render_template("search.html", results=results)

      if __name__ == "__main__":
          app.run(debug=True)

Explanation of `app.py`:

- Flask App: This is a simple web server that serves the search page and displays search results.

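- Whoosh Index: The `create_index` function creates the `indexdir` directory if it does not already exist and builds the Whoosh index inside it. Calling `index.create_in()` on a directory that already holds an index clears that index, which is why the function should be run only once.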

- Query Parsing: When the form is submitted, the search term is parsed, and the Whoosh index is queried for relevant results.

- Template Rendering: The search results are passed to the HTML template (`search.html`) for display.

2. Create the HTML Template (`templates/search.html`)

This file defines the structure of the web page with a search form and a place to display results. Flask automatically looks for HTML files in a `templates/` directory.


    <form method="POST">
        <input type="text" name="query" placeholder="Search..." />
        <button type="submit">Search</button>
    </form>
    <h1>Search Results</h1>
    <ul>
        {% for result in results %}
        <li>{{ result['title'] }} - <a href="{{ result['path'] }}">Link</a></li>
        {% endfor %}
    </ul>

Explanation of `search.html`:

- Form: A simple HTML form is used to collect search input from the user.

- Displaying Results: If there are any search results, they are displayed in a list format. Each result shows the document title, and the user can click on a link to the `path` of the document.
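To see what the loop in the template produces, the sketch below renders the same `for` block directly with Jinja2, the template engine Flask uses. The inlined template string and the sample `results` list are assumptions for illustration; in the real app, Flask loads `templates/search.html` and Whoosh supplies the results.

```python
from jinja2 import Template

# Same loop as templates/search.html, inlined so the sketch needs no files.
template = Template(
    "<ul>\n"
    "{% for result in results %}"
    "<li>{{ result['title'] }} - <a href=\"{{ result['path'] }}\">Link</a></li>\n"
    "{% endfor %}"
    "</ul>"
)

# Hypothetical results, shaped like the stored fields of two Whoosh hits.
results = [
    {"title": "First Document", "path": "/a"},
    {"title": "Second Document", "path": "/b"},
]

html = template.render(results=results)
empty_html = template.render(results=[])  # no hits: the list is simply empty
print(html)
```

With an empty `results` list the loop body is skipped, so the page shows the heading and an empty list rather than an error.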

3. Create the Whoosh Index (`indexdir/`)

- indexdir: This is the directory where the Whoosh index will be stored. You do not need to manually create any files inside `indexdir`; the Whoosh library will handle it when you run the `create_index` function.

- Creating the Index: The `create_index` function in `app.py` creates the directory if it does not already exist, builds the index in `indexdir/`, and adds the documents passed to `writer.add_document()`. Run it once before the first search, because `index.open_dir()` fails if the directory does not yet contain an index.

4. Running the Application

1. Run the Flask App:

After you have created the files as described above, you can run your Flask app by running the following command:

    python app.py

2. Access the Web Interface:

- Open a browser and go to `http://127.0.0.1:5000/` to see the search form.

- Enter a query in the form to search the documents in the Whoosh index.

5. How to Update the Index

- To add new documents, you can modify the `create_index` function or write a separate indexing script, depending on your needs. Because `index.create_in()` clears an existing index, a script that adds to an index should open it with `index.open_dir()`, call `writer.add_document()` for each new document, and finish with `writer.commit()`.

Additional Considerations:

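- Deployment: The `debug=True` setting and Flask's built-in server are meant for development. In a production environment, turn debug mode off, run the app behind a WSGI server, and make sure the server process can read and write the `indexdir/` directory.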

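- Index Management: To change a document that is already indexed, use `writer.update_document()`, which replaces the existing document that has the same value in a field declared with `unique=True` in the schema. To remove documents, use `writer.delete_document()` (by document number) or `writer.delete_by_term()` (by field value). Changes take effect after `writer.commit()`.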

- Search Optimization: You may need to experiment with different parsers and query types (e.g., `QueryParser`, `MultifieldParser`, `FuzzyTerm`) to improve the relevance of search results.

This setup provides a basic Whoosh search engine integrated with a Flask web application. From here, you can expand the system, add more sophisticated search features, or integrate it with a more complex database.