
Show HN: PHP-fts – Full-text search engine in pure PHP, no extensions

A lightweight, extension-free full-text search engine for PHP combines trigram indexing with inverted indexes and BM25 scoring; in the author's benchmark, a 10,000-document index answers queries in a median of 3.2 ms. The implementation needs no external libraries or services, only PHP and the filesystem, which makes it a candidate for resource-constrained web applications.

PHP-fts is a self-contained full-text search engine written entirely in PHP 8.1+, with no extensions, no external services, and no dependencies beyond the filesystem. It uses trigram indexing, BM25+IDF scoring, and inverted indexes to deliver search over tens of thousands of documents in milliseconds — all from a single Composer package.

What it does

PHP-fts indexes documents (arrays of strings, ints, floats, bools, and string arrays) into a set of flat files on disk. At search time, it breaks the query into trigrams, looks them up in a fixed-size index (~810 KB), and ranks results using BM25 with IDF weighting, the ranking function that Lucene and Elasticsearch use by default. The engine supports field boosting, combined AND/OR filters, range comparisons, array containment checks, and soft deletes with compaction.
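To make that pipeline concrete, here is a minimal in-memory sketch of trigram extraction, an inverted index, and BM25 ranking. It is not PHP-fts's code: the function names (trigrams, buildIndex, bm25Search), the word-boundary padding, the ASCII-only normalization, and the k1 and b values are all assumptions chosen for illustration, and the real engine stores its index in flat files rather than arrays.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers for illustration; none of these names come from PHP-fts.

/** Split text into overlapping 3-character substrings (trigrams), word by word. */
function trigrams(string $text): array
{
    // Lowercase, then collapse every run of non-alphanumeric characters into one space.
    $norm = trim((string) preg_replace('/[^a-z0-9]+/', ' ', strtolower($text)));
    $grams = [];
    foreach (explode(' ', $norm) as $word) {
        if ($word === '') {
            continue;
        }
        $padded = ' ' . $word . ' '; // padding marks the word boundaries
        for ($i = 0, $n = strlen($padded) - 2; $i < $n; $i++) {
            $grams[] = substr($padded, $i, 3);
        }
    }
    return $grams;
}

/** Build an inverted index: trigram => [docId => count], plus the trigram count of each document. */
function buildIndex(array $docs): array
{
    $postings = [];
    $lengths = [];
    foreach ($docs as $id => $text) {
        $grams = trigrams($text);
        $lengths[$id] = count($grams);
        foreach ($grams as $g) {
            $postings[$g][$id] = ($postings[$g][$id] ?? 0) + 1;
        }
    }
    return ['postings' => $postings, 'lengths' => $lengths];
}

/** Rank documents with BM25, treating each trigram as a term (k1 and b are assumed defaults). */
function bm25Search(array $index, string $query, float $k1 = 1.2, float $b = 0.75): array
{
    $n = count($index['lengths']);
    if ($n === 0) {
        return [];
    }
    $avgLen = array_sum($index['lengths']) / $n;
    $scores = [];
    foreach (array_unique(trigrams($query)) as $g) {
        $posting = $index['postings'][$g] ?? [];
        $df = count($posting);
        if ($df === 0) {
            continue; // trigram not in the corpus, contributes nothing
        }
        $idf = log(1 + ($n - $df + 0.5) / ($df + 0.5)); // rarer trigrams weigh more
        foreach ($posting as $id => $tf) {
            $norm = $tf + $k1 * (1 - $b + $b * $index['lengths'][$id] / $avgLen);
            $scores[$id] = ($scores[$id] ?? 0.0) + $idf * $tf * ($k1 + 1) / $norm;
        }
    }
    arsort($scores); // highest score first, document ids preserved as keys
    return $scores;
}

$index = buildIndex([
    1 => 'Brown leather shoe',
    2 => 'Blue canvas sneaker',
    3 => 'Leather wallet',
]);
$ranked = bm25Search($index, 'lether shoe'); // "lether" is a deliberate typo
echo array_key_first($ranked), PHP_EOL;      // prints 1
```

Because a misspelled word still shares most of its trigrams with the correct spelling, the typo query above still ranks the leather shoe first. That is the usual reason to index trigrams instead of whole words.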

Who it's for

The author explicitly positions PHP-fts as a replacement for dedicated search services only when those aren't feasible: shared hosting (OVH, Infomaniak, o2switch), a small VPS, or any environment where you want zero infrastructure overhead. If you have Elasticsearch, Meilisearch, or Typesense and the infrastructure to run them, use those. PHP-fts is for datasets in the hundreds to tens of thousands of documents, indexed offline or on a schedule.

Performance

Benchmarks on a shared Linux hosting plan (PHP 8.3) show:

  • Insertion: 10,000 documents in ~63 seconds (single insert) or ~29 seconds (bulk insert). Bulk insert is consistently ~2× faster because it acquires a single lock for the entire batch.
  • Index size: ~21.7 MB for 10,000 documents; ~106 MB for 50,000.
  • Search: median 3.2 ms, P95 12.5 ms, P99 22.9 ms on 10,000 documents with 200 queries (including typos and out-of-corpus terms).
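The gap between single and bulk insertion is easier to see with a toy writer that counts lock acquisitions. This is a sketch under stated assumptions, not PHP-fts's storage layer: the LockedLog class, its JSON-lines format, and the use of flock() are all hypothetical stand-ins for whatever the engine actually does.

```php
<?php
declare(strict_types=1);

// Hypothetical append-only writer for illustration; PHP-fts's real file format is not shown in the article.
final class LockedLog
{
    public int $lockCount = 0;

    public function __construct(private string $path)
    {
    }

    /** Open the file, take an exclusive lock, run $write, then release and close. */
    private function withLock(callable $write): void
    {
        $fh = fopen($this->path, 'ab');
        if ($fh === false) {
            throw new RuntimeException("cannot open {$this->path}");
        }
        try {
            if (!flock($fh, LOCK_EX)) {
                throw new RuntimeException('cannot acquire lock');
            }
            $this->lockCount++;
            $write($fh);
            fflush($fh);
            flock($fh, LOCK_UN);
        } finally {
            fclose($fh);
        }
    }

    /** One open/lock/flush/unlock cycle per document. */
    public function insert(array $doc): void
    {
        $this->withLock(fn($fh) => fwrite($fh, json_encode($doc) . "\n"));
    }

    /** One open/lock/flush/unlock cycle for the whole batch. */
    public function insertBulk(array $docs): void
    {
        $this->withLock(function ($fh) use ($docs): void {
            foreach ($docs as $doc) {
                fwrite($fh, json_encode($doc) . "\n");
            }
        });
    }
}

$docs = array_map(fn(int $i) => ['id' => $i, 'title' => "Doc $i"], range(1, 100));

$single = new LockedLog(tempnam(sys_get_temp_dir(), 'fts'));
foreach ($docs as $doc) {
    $single->insert($doc);
}

$bulk = new LockedLog(tempnam(sys_get_temp_dir(), 'fts'));
$bulk->insertBulk($docs);

echo "single: {$single->lockCount} locks, bulk: {$bulk->lockCount} lock", PHP_EOL;
// prints "single: 100 locks, bulk: 1 lock"
```

Each lock cycle carries fixed costs (open, lock, flush, close) that the batch pays only once, which is consistent with the roughly 2× difference reported above.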

How to use it

Install it with Composer (composer require ols/php-fts), then index and search:

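<?php

// Load Composer's autoloader so the SearchEngine class can be resolved.
require __DIR__ . '/vendor/autoload.php';

use Ols\PhpFts\SearchEngine;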

$engine = new SearchEngine();
$engine->open('./search_data');

$docId = $engine->insert([
    'title' => 'Brown leather shoe',
    'description' => 'Elegant city shoe in soft leather',
    'price' => 129.90,
    'stock' => 42,
    'active' => true,
    'category' => 'Shoes',
    'brand' => 'Adidas',
    'tags' => ['summer', 'luxury', 'city'],
]);

$results = $engine->search('leather shoe', limit: 20, boosts: ['title' => 3.0, 'description' => 1.0]);

foreach ($results as $result) {
    echo $result['document']['title'] . ' — score: ' . $result['score'] . PHP_EOL;
}

$engine->close();

Filters support and/or groups with operators =, !=, >, >=, <, <=, in, not in, contains, not contains. Updates are atomic (soft delete + re-insert in a single lock). Compaction rebuilds index files and removes deleted documents.
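The article does not show the filter syntax, so the following is only a sketch of how nested and/or groups over those operators can be evaluated against a document. The array shape (['and' => [...]], ['or' => [...]], and [field, operator, value] leaves) and the matches() function are hypothetical and should not be read as PHP-fts's API.

```php
<?php
declare(strict_types=1);

// Hypothetical filter format for illustration only; PHP-fts's real syntax may differ.
// A node is ['and' => [nodes]], ['or' => [nodes]], or a leaf [field, operator, value].
function matches(array $doc, array $node): bool
{
    if (isset($node['and'])) {
        foreach ($node['and'] as $child) {
            if (!matches($doc, $child)) {
                return false; // one failing child fails the whole group
            }
        }
        return true;
    }
    if (isset($node['or'])) {
        foreach ($node['or'] as $child) {
            if (matches($doc, $child)) {
                return true; // one passing child is enough
            }
        }
        return false;
    }

    [$field, $op, $value] = $node;
    if (!array_key_exists($field, $doc)) {
        // Assumption: a missing field only satisfies the negated operators.
        return in_array($op, ['!=', 'not in', 'not contains'], true);
    }
    $actual = $doc[$field];

    return match ($op) {
        '=' => $actual == $value,
        '!=' => $actual != $value,
        '>' => $actual > $value,
        '>=' => $actual >= $value,
        '<' => $actual < $value,
        '<=' => $actual <= $value,
        'in' => in_array($actual, $value),
        'not in' => !in_array($actual, $value),
        'contains' => is_array($actual) && in_array($value, $actual),
        'not contains' => !(is_array($actual) && in_array($value, $actual)),
        default => throw new InvalidArgumentException("unknown operator: $op"),
    };
}

$doc = [
    'title' => 'Brown leather shoe',
    'price' => 129.90,
    'stock' => 42,
    'active' => true,
    'brand' => 'Adidas',
    'tags' => ['summer', 'luxury', 'city'],
];

// price <= 150 AND active = true AND (brand in [Nike, Puma] OR tags contains luxury)
$filter = ['and' => [
    ['price', '<=', 150],
    ['active', '=', true],
    ['or' => [
        ['brand', 'in', ['Nike', 'Puma']],
        ['tags', 'contains', 'luxury'],
    ]],
]];

var_dump(matches($doc, $filter)); // bool(true)
```

The recursion mirrors the group structure: and groups stop at the first failing child, or groups at the first passing one, and leaves compare a single field.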

Tradeoffs

PHP-fts is not suitable for heavy concurrent write load, datasets in the millions of documents, geo search, or multi-tenant isolation. Nor is it designed for indexing at request time: insertion is an offline operation best run via a scheduled job.

Bottom line

PHP-fts fills a narrow but real gap: full-text search on shared hosting or minimal infrastructure where installing Elasticsearch or Meilisearch is impossible or impractical. It delivers solid relevance ranking, flexible filtering, and predictable performance for datasets up to tens of thousands of documents — all without a single extension or external service.
