Releases: pat/thinking-sphinx
v3.3.0
Upgrading
There are no breaking changes in this release - upgrading should be a painless process (but do let me know if that's not the case).
A big thank you to all contributors of this release - in particular, Julio Monteiro and Asaf Bartov.
New Features
Running the ts:generate task loads model instances in batches of 1000. You can customise this globally by setting the batch_size option in your config/thinking_sphinx.yml file per environment.
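To illustrate what a batch size means in practice, here is a minimal plain-Ruby sketch. It is not how Thinking Sphinx loads records internally: the each_in_batches helper and the sample ids are hypothetical, and the default of 1000 simply mirrors the value mentioned above.

```ruby
# Hypothetical helper: walks a collection in fixed-size batches, so that
# only `batch_size` items are handled at a time.
def each_in_batches(records, batch_size: 1000)
  records.each_slice(batch_size) { |batch| yield batch }
end

ids = (1..2500).to_a
batch_sizes = []
each_in_batches(ids) { |batch| batch_sizes << batch.size }
# batch_sizes now holds the size of each batch that was processed
```

A smaller batch size keeps fewer objects in memory at once, at the cost of more round trips.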
Also, if you prefer to have data persisted to your real-time indices after the database transaction is committed, the callback helper works with after_commit just like it does with after_save - though you should only use one! Also, if you're using after_commit, that means you can't wrap tests that involve Sphinx in transactions.
class Article < ActiveRecord::Base
  # ...
  after_commit ThinkingSphinx::RealTime.callback_for(:article)
  # ...
end
Changes to behaviour
- Memoize the default primary keys per context to improve performance.
- Added custom exception class for invalid database adapters, rather than relying on Ruby's default exceptions.
- Sort engine paths for loading indices to ensure they're consistent.
- The ts:start and ts:stop rake tasks default to verbose, and respect Rake's quiet and silent flags, so those are the recommended approach for getting the output you desire.
- Use Riddle's reworked command interface for interacting with Sphinx's command-line tools.
- Delta indexing is now quiet by default (rather than verbose).
- Only toggle the delta value if the record has changed or is new (rather than on every single save call).
Bug Fixes
- Ensure custom primary key columns are handled consistently (Julio Monteiro).
- Fixed handling of multiple field tokens in wildcarding logic.
- Improved Rails 5 / JRuby support.
- Check search query length and raise an exception if the query is too long for Sphinx.
- Don't load ActiveRecord earlier than necessary. This avoids loading Rails out of order, which caused problems with Rails 5.
- Load indices before deleting index files, to ensure the files are actually found and deleted.
- Add an explicit source method in the SQLQuery Builder instead of relying on method missing, thus avoiding any global methods named 'source' (Asaf Bartov).
v3.2.0
Upgrading
There are no breaking changes in this release - upgrading should be a painless process (but do let me know if that's not the case).
A big thank you to all contributors of this release, which has been a while coming (it's been almost a year since 3.1.4). Andrey Novikov, Nathaneal Gray, Mattia Gheda, Roman Usherenko, Jonathan del Strother, Chance Downs, Andrew Roth, @arrtchiu, Brandon Dewitt: your commits and feedback are greatly appreciated!
New Features
Much like the existing suspended deltas feature, you can now suspend/resume all Thinking Sphinx callbacks using ThinkingSphinx::Callbacks.suspend! and ThinkingSphinx::Callbacks.resume!. This will disable all attribute update callbacks, delta callbacks, real-time update callbacks, and object deletion callbacks. This is particularly useful for unit tests.
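One way to use this in tests is to wrap the suspend/resume pair in a helper that always resumes, even when the wrapped code raises. The following is only a sketch: without_search_callbacks and FakeCallbacks are hypothetical names, and the stand-in class exists purely so the example runs without the gem loaded. In a real app you would pass ThinkingSphinx::Callbacks instead.

```ruby
# Hypothetical helper: suspends callbacks for the duration of a block and
# always resumes them afterwards, even if the block raises.
# `callbacks` is any object responding to suspend! and resume!.
def without_search_callbacks(callbacks)
  callbacks.suspend!
  yield
ensure
  callbacks.resume!
end

# Stand-in object so this sketch runs without Thinking Sphinx loaded.
class FakeCallbacks
  attr_reader :log

  def initialize
    @log = []
  end

  def suspend!
    @log << :suspend
  end

  def resume!
    @log << :resume
  end
end

fake = FakeCallbacks.new
without_search_callbacks(fake) { fake.log << :work }
# fake.log records the order in which things happened
```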
Since Thinking Sphinx was first built, the indexing approach has been to process all of the indices in a single indexer call. It is now possible to opt for a different approach: to call indexer for each index, one by one:
# This can go in an initialiser:
ThinkingSphinx::Configuration.instance.indexing_strategy = \
  ThinkingSphinx::IndexingStrategies::OneAtATime
# or, the default is:
ThinkingSphinx::Configuration.instance.indexing_strategy = \
  ThinkingSphinx::IndexingStrategies::AllAtOnce
You can give ThinkingSphinx::Configuration.instance.indexing_strategy anything you like that responds to call and expects an array of index options, and yields index names. You can see the implementations of the two approaches here.
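As a rough sketch of that contract, a custom strategy only needs a call method that takes an array and yields names. The SortedOneAtATime module below is hypothetical, and it assumes the array entries can be treated as index names. The bundled strategies may handle more cases (such as an empty array) than this does.

```ruby
# Hypothetical strategy: yields each requested index name on its own, in
# alphabetical order, so indexer would be invoked once per index in a
# predictable sequence.
module SortedOneAtATime
  def self.call(indices = [])
    indices.map(&:to_s).sort.each { |name| yield name }
  end
end

# In an initialiser, with Thinking Sphinx loaded, this would be assigned as:
#   ThinkingSphinx::Configuration.instance.indexing_strategy = SortedOneAtATime

yielded = []
SortedOneAtATime.call(%w[user_core article_core]) { |name| yielded << name }
# yielded now lists the names in the order they were handed to the block
```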
Andrey Novikov has given you the ability to use the environment variable NODETACH when running rake ts:start, and that keeps Sphinx around as a foreground process.
Nathaneal Gray has added a :primary_key option when defining indices, in case you want something different to your model for Sphinx.
Mattia Gheda has added rand_seed as an allowed SELECT clause option.
@arrtchiu has added the ability to define Sphinx's MySQL SSL options on a per-index basis (via the set_property method within an index definition).
JSON attributes are now supported for real-time indices. Also, there's a new exception type ThinkingSphinx::OutOfBoundsError for when search queries are requesting results outside of their pagination bounds.
Changes to behaviour
- Only use ERB to parse the YAML file if ERB is loaded.
- Disable deletion callbacks when real-time indices are in place and all other real-time callbacks are disabled.
- Reduce memory/object usage for model references (Jonathan del Strother).
- Use Sphinx's bulk insert ability (Chance Downs).
- Reset delta column before core indexing begins (reverting behaviour introduced in 3.1.0). See issue #958 for further discussion.
- Don't populate search results when requesting just the count values (Andrew Roth).
- Improved error messages for duplicate property names and missing columns.
Bug Fixes
- Improved handling of marshalled/demarshalled search results.
- Make preload_indices thread-safe.
- Handle quoting of namespaced tables (Roman Usherenko).
- Fix stale id handling for multiple search contexts (Jonathan del Strother).
- Fixed JRuby compatibility with camelCase method names (Brandon Dewitt).
- Fixed mysql2 compatibility for memory references (Roman Usherenko).
- Ensure SQL table aliases are reliable for SQL-backed index queries.
v3.1.4
Contributor Code of Conduct
This is the first release since I've added a Contributor Code of Conduct to the project. There haven't been any problems in the past, but I like being upfront about this. By participating in this project, you agree to abide by its terms.
Upgrading
If you're upgrading from v3.1.3 and you're not yet using Sphinx 2.2.x, then you'll probably want to add the charset_type setting to config/thinking_sphinx.yml for each of your environments - Thinking Sphinx used to specify a default of 'utf-8', but Sphinx now insists on UTF-8 and ignores the setting (and will print a warning).
Also, if you're using polymorphic associations within your index definitions and you're using Rails 3.2, you're going to have to upgrade Rails to use this version of Thinking Sphinx. It's just too painful to manage all the different ActiveRecord behaviours. Sorry.
And of course if you're using something older than v3.1.3, reading the earlier release notes is highly recommended.
New Features
If you're using MySQL and SQL-backed indices, and you want to use the GROUP BY shortcut to speed things up, you can now specify minimal_group_by? in config/thinking_sphinx.yml (per environment) instead of needing to call set_property in each index definition.
For those unfamiliar with this setting: MySQL is often configured by default to not care if you leave off columns from the GROUP BY clause even when you have aggregations. If you enable this, it'll group by only your primary key, along with any columns you specify yourself using the group_by method in index definitions.
The other new feature of this release is courtesy of Daniel Vandersluis: proper JSON attribute support, which is automatically detected when tied to JSON database columns. Fancy.
Changes to behaviour
- Removing sql_query_info setting, as it's no longer used by Sphinx (nor is it actually used by Thinking Sphinx).
- Remove default charset_type - no longer required for Sphinx 2.2.
- Remove polymorphic association and (unofficial) HABTM query support (when related to Thinking Sphinx) when ActiveRecord 3.2 is involved.
Bug Fixes
- Bug fix for association creation (with polymorphic fields/attributes).
- More consistent with escaping table names.
- Handle database settings reliably, now that ActiveRecord 4.2 uses strings all the time.
- Don't try to delete guard files if they don't exist (@exAspArk).
- Kaminari expects prev_page to be available.
v3.1.3
Upgrading
There's no modification required if you're upgrading from v3.1.2, though running rake ts:regenerate is recommended if you're using real-time indices. Of course, if you're using something older than v3.1.2, reading the earlier release notes is highly recommended.
This is the first release to properly support Rails 4.2.
New Features
Two new features, both related to using Thinking Sphinx with multiple data sources (in particular, different PostgreSQL schemas via the Apartment gem):
Allow for custom IndexSet classes
If you want to change which indices are returned in different situations, you can set a custom class:
ThinkingSphinx::Configuration.instance.index_set_class = TenantIndexSet
Allow for custom offset references
Because Sphinx requires all document ids to be unique - even across different indices - they're generated via a unique offset combined with model primary keys. Normally, Thinking Sphinx will use the same offset calculation if you have more than one index for a given model - as they're likely the same record.
However, if you're using the Apartment gem, then this is probably not the case - you have identical tables in different schemas, with different sets of overlapping primary keys. So, there's a need for indices for each Apartment tenant on one model to be considered as separate. The :offset_option when defining an index will sort this out.
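To see why overlapping primary keys are a problem, here is a small arithmetic sketch. It assumes document ids are computed as primary_key * multiplier + offset. That formula, the multiplier of 10, and the offsets are illustrative assumptions, not a statement of how Thinking Sphinx actually calculates ids.

```ruby
# Assumed scheme (illustrative only): each index gets a small unique offset,
# and document ids are primary_key * MULTIPLIER + offset.
MULTIPLIER = 10

def document_id(primary_key, offset)
  primary_key * MULTIPLIER + offset
end

# Two tenants each have a record with primary key 42. If both indices share
# one offset, both records end up with the same document id.
shared_offset = 0
colliding = [document_id(42, shared_offset), document_id(42, shared_offset)]

# With a separate offset per tenant, the document ids no longer clash.
distinct = [document_id(42, 0), document_id(42, 1)]
```

Under this assumed scheme, giving each tenant's index its own offset is what keeps the ids unique across schemas.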
Here is a gist covering both of these new features.
Changes to behaviour
- Add bigint support for real-time indices, and use bigints for the sphinx_internal_id attribute (mapped to model primary keys) (Chance Downs).
- Convert raw Sphinx results to an array when querying (Bryan Ricker).
- Load Railtie if Rails::Railtie is defined, instead of just Rails (Andrew Cone).
- Log excerpt SphinxQL queries just like the search queries.
Bug Fixes
- Don't double-up on STI filtering, already handled by Rails.
- Don't load ActiveRecord early - fixes a warning in Rails 4.2.
- Use reflect_on_association instead of reflections, to stick to the public ActiveRecord::Base API.
- Generate de-polymorphised associations properly for Rails 4.2
v3.1.2
Upgrading
There's no modification required if you're upgrading from v3.1.1. Of course, if you're using something older than that, reading the earlier release notes is highly recommended.
New Features
Nothing massive, but a few helpful new things, in order of when they were committed:
Cast Sphinx document ids as 64-bit integers
To ensure document ids are reliably 64-bit integers (aka bigints), set big_document_ids to true either via set_property in a specific index or in config/thinking_sphinx.yml for each appropriate environment.
Real-time index callbacks accept blocks
This is useful when your callback refers to multiple objects via an association and you want to ensure certain data is available by preloading:
ThinkingSphinx::RealTime.callback_for(:post) { |user| user.posts.includes(:category) }
Rake task for Sphinx status
ts:status lets you know if Sphinx is running or not.
Allow binlog_path to be blank
Courtesy of @uhlenbrock, this allows you to disable binlog files if you're not using real-time indices - just set binlog_path to a blank string for each environment in config/thinking_sphinx.yml.
Custom location paths for index files
If you want to change where specific indices are located, instead of all of them, you can supply a :path option to ThinkingSphinx::Index.define. This will be the directory where the index files will be stored, and an absolute path is expected.
Changes to behaviour
- rebuild task uses clear between stopping the daemon and indexing.
- Default the Capistrano TS Rails environment to use rails_env, and then fall back to stage.
- Paginate records by 1000 results at a time when flagging as deleted.
- Log indices that aren't processed due to guard files existing.
- Raise an exception when a populated search query is modified (as it can't be requeried).
- regenerate task now only deletes index files for real-time indices.
Bug Fixes
- Clear connections when raising connection errors.
- Some association fixes for Rails 4.1.
- Models with more than one index have correct facet counts (using Sphinx 2.1.x or newer).
- Field weights and other search options are now respected from set_property.
- Convert database setting keys to symbols for consistency with Rails (@dimko).
- Use STI base class for polymorphic association replacements.
- Don't update real-time indices for objects that are not persisted (Chance Downs).
- Ensure indexing guard files are removed when an exception is raised (Bobby Uhlenbrock).
v3.1.1
Upgrading
There's no modification required if you're upgrading from v3.1.0. Of course, if you're using something older than that, reading the earlier release notes is highly recommended.
New Features
Sphinx v2.2
This release has the beginnings of support for Sphinx v2.2, including the common options section. This is disabled by default (as it won't work with earlier versions of Sphinx), but if you're keen to give it a spin, add the following to each environment in config/thinking_sphinx.yml:
common_sphinx_configuration: true
At some point, this will become the default behaviour (likely Thinking Sphinx v3.2.0), but we're a while away from that.
Disabling distributed indices
If you want to disable the automatically generated distributed indices, set distributed_indices: false in each environment in config/thinking_sphinx.yml.
Testing with real-time indices
ThinkingSphinx::Test is now in a position for proper use with real-time indices. Here's how I use it with RSpec (with the relevant examples tagged with :search => true):
RSpec.configure do |config|
  config.before(:each) do
    if example.metadata[:search]
      ThinkingSphinx::Test.init
      ThinkingSphinx::Test.start :index => false
    end
    ThinkingSphinx::Configuration.instance.settings['real_time_callbacks'] = !!example.metadata[:search]
  end
  config.after(:each) do
    if example.metadata[:search]
      ThinkingSphinx::Test.stop
      ThinkingSphinx::Test.clear
    end
  end
end
The setting for disabling real-time callbacks can be used anywhere, of course - but keep in mind this could lead to your model data being out of sync with Sphinx.
HABTM MVAs with query/ranged-query sources
Previously this wasn't supported at all - now, it's only partially supported, for the foreign keys of single HABTM associations (you can't drill further through associations):
has genres.id, :as => :genre_ids, :source => :query
The association/column reference above is slightly misleading - it will actually use the genre_id column in the HABTM join table (thus, avoiding unnecessary joins). You still cannot use the :source option with columns in other tables accessed through HABTM associations.
Changes to behaviour
- All indices now respond to a public attributes method.
- Log real-time index updates (Demian Ferreiro).
- Alias group and count columns for easier referencing in other clauses.
- Capistrano tasks use thinking_sphinx_rails_env (defaults to standard environment) (Robert Coleman).
- Raise an exception when a referenced column does not exist.
- Connection error messages now mention Sphinx, instead of just MySQL.
- Include full statements when query execution errors are raised (uglier, but more useful when debugging).
Bug Fixes
- Improved handling of association searches with real-time indices, including via has_many :through associations (Rob Anderton).
- Fixing wildcarding of Unicode strings.
- Handle JDBC connection errors appropriately (Adam Hutchison).
- Only expand log directory if it exists.
- :thinking_sphinx_roles is now used consistently in Capistrano v3 tasks.
- :populate option is now respected for single-model searches.
- Don't send unicode null characters to real-time Sphinx indices.
- Avoid null values in MVA query/ranged-query sources.
- respond_to? works reliably with masks (Konstantin Burnaev).
- Always use connection options for connection information.
- Don't presume all indices for a model have delta pairs, even if one does.
- Don't instantiate blank strings (via inheritance type columns) as constants.
- Don't apply attribute-only updates to real-time indices.
v3.1.0
New Features
Thinking Sphinx v3.1.0 is the first v3 release to support JRuby. You'll need the jdbc-mysql gem as well, and then it'll be smooth sailing. However, Rails 3.1 and MRI 1.9.2 are no longer supported - please upgrade to 3.2 and 1.9.3 (or 2.0.0/2.1.0) respectively.
Upgrading
Sphinx Versions
Thinking Sphinx now expects Sphinx v2.1.2 or newer by default. If you're using v2.1.2, or something newer than that, then you should not make any of the changes listed in this section.
However, if you're using Sphinx 2.1.1 or earlier, you'll want to add these lines to an initializer:
ThinkingSphinx::Middlewares::DEFAULT.insert_after(
ThinkingSphinx::Middlewares::Inquirer, ThinkingSphinx::Middlewares::UTF8
)
ThinkingSphinx::Middlewares::RAW_ONLY.insert_after(
ThinkingSphinx::Middlewares::Inquirer, ThinkingSphinx::Middlewares::UTF8
)
And add the following setting to config/thinking_sphinx.yml:
development:
  utf8: false
# repeat for each environment as necessary
If you're using Sphinx 2.0.x, you'll also need to put the following in an initializer:
ThinkingSphinx::SphinxQL.variables!
Custom SELECT Statements
If you're sending through custom SELECT statements via the :select option in search calls, please note that you'll need to supply * or specific column names to have them returned (the * is no longer supplied by default if you're setting something custom). So:
Article.search 'pancakes', :select => 'weight() as w'
# becomes
Article.search 'pancakes', :select => '*, weight() as w'
If you don't want to return all the columns/attributes, but you do want ActiveRecord objects instantiated in your search results, you'll need to include the sphinx_internal_id and sphinx_internal_class columns. It's also worth noting that any attribute you refer to in other parts of the query (for example, the ORDER clause) must exist in your SELECT clause.
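If you are building such a reduced SELECT clause by hand, a small helper can make sure the two internal columns are never forgotten. This is a sketch only: minimal_select and INTERNAL_COLUMNS are hypothetical names, not part of Thinking Sphinx.

```ruby
# Hypothetical helper: builds a :select value that always keeps the two
# internal columns needed to instantiate ActiveRecord objects, followed by
# any extra expressions you pass in.
INTERNAL_COLUMNS = %w[sphinx_internal_id sphinx_internal_class].freeze

def minimal_select(*expressions)
  (INTERNAL_COLUMNS + expressions).join(', ')
end

select = minimal_select('weight() as w')
# select can then be passed as the :select option of a search call
```

With the gem loaded, the result would be used along the lines of Article.search 'pancakes', :select => select. Because w appears in the SELECT clause, it could also be referred to when ordering.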
Capistrano
Capistrano v3 is now supported, and there are now cap tasks for real-time indices (thinking_sphinx:generate and thinking_sphinx:regenerate). There's no longer any automatic symlinking of directories - it's recommended that pid, index and configuration files are all located in the shared directory permanently, using something like the following in your config/thinking_sphinx.yml file:
production:
  pid_file: /path/to/app/shared/tmp/searchd.pid
  indices_location: /path/to/app/shared/db/sphinx
  configuration_file: /path/to/app/shared/production.sphinx.conf
Also: previously, thinking_sphinx:index and thinking_sphinx:start would automatically run after deploy:cold. This is no longer the case, partially because the behaviour is different with real-time indices, and partially because it's better for you to have control over those decisions instead.
New features
- Set custom database settings within the index definition, using the set_database method within an index definition block. You can either pass in a database settings hash (like what would exist in database.yml), or an environment name which corresponds to a known database configuration.
- All delta records can have their core pairs marked as deleted after a suspended delta (use ThinkingSphinx::Deltas.suspend_and_update instead of ThinkingSphinx::Deltas.suspend).
- Pass through :delta_options to delta processors in index definitions (Timo Virkkala).
- Track what's being indexed, and don't double-up while indexing is running. Single indices (e.g. deltas) can be processed while a full index is happening, though.
- Persistent connections can be disabled if you wish (ThinkingSphinx::Connection.persistent = false).
- :group option within :sql options in a search call is passed through to the underlying ActiveRecord relation (Siarhei Hanchuk).
- Capistrano recipe now includes tasks for realtime indices.
- Wildcard/starring can be applied directly to strings using ThinkingSphinx::Query.wildcard('pancakes'), and escaping via ThinkingSphinx::Query.escape('pancakes').
- Adding max_predicted_time search option (Sphinx 2.2.x).
- Support for Sphinx 2.2.x's HAVING and GROUP N BY SphinxQL options (via :having and :group_best options respectively).
- JRuby support (with Sphinx 2.1 or newer).
- Support for Capistrano v3 (Alexander Tipugin).
Changes to behaviour
- Provide a distributed index per model that covers both core and delta indices.
- Reset the delta column to true after core indexing is completed, instead of before, and don't filter out delta records from the core source.
- Insist on at least * for SphinxQL SELECT statements.
- MRI 1.9.2 is no longer supported.
- Rails 3.1 is no longer supported.
- Sphinx functions are now the default, instead of the legacy special variables (in line with Sphinx 2.1.x).
- UTF-8 forced encoding is now disabled by default (in line with Sphinx 2.1.x).
- Capistrano recipe no longer automatically adds thinking_sphinx:index and thinking_sphinx:start to be run after deploy:cold.
- Auto-wildcard/starring (via :star => true) now treats escaped characters as word separators.
- Geodist calculation is now prepended to the SELECT statement, so it can be referred to by other dynamic attributes (order matters in SELECT statements).
- Extracting join generation into its own gem: Joiner.
- Updating Riddle requirement to >= 1.5.10.
Bug fixes
- Don't split function calls when casting timestamps (Timo Virkkala).
- Separate per_page/max_matches values are respected in facet searches (Timo Virkkala).
- Track indices on parent STI models when marking documents as deleted.
- Blank STI values are converted to the parent class in Sphinx index data (Jonathan Greenberg).
- Destroy callbacks are ignored for non-persisted objects.
- Indices will be detected in Rails engines upon configuration.
v3.0.6
Upgrading
From this point onwards, Thinking Sphinx requires Sphinx v2.0.5 or newer.
If you're using Sphinx 2.1.1 or newer, you should add the following to an initialiser:
ThinkingSphinx::SphinxQL.functions!
Sphinx 2.1.x releases no longer support special variables with the @ prefix - instead, there are equivalent functions. The code above switches Thinking Sphinx to use the functions instead.
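If you have hand-written :select or :order strings that still use the old variables, they will need the same treatment. The sketch below shows one way to rewrite them. The mapping covers only three variables and is an assumption based on the Sphinx 2.1 function names, and modernise_select is a hypothetical helper rather than anything provided by Thinking Sphinx.

```ruby
# Assumed mapping from legacy @-prefixed special variables to the
# equivalent Sphinx 2.1 functions (not an exhaustive list).
VARIABLE_TO_FUNCTION = {
  '@weight'  => 'weight()',
  '@groupby' => 'groupby()',
  '@count'   => 'count(*)'
}.freeze

# Hypothetical helper: swaps any known special variable in a clause for its
# function equivalent, leaving unknown @-prefixed tokens untouched.
def modernise_select(clause)
  clause.gsub(/@\w+/) { |variable| VARIABLE_TO_FUNCTION.fetch(variable, variable) }
end

updated = modernise_select('*, @weight as w')
```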
If you're using Sphinx 2.1.2 or newer, you'll also want to add the following to your initializer (as 2.1.2 now returns strings as UTF-8 properly, so conversion isn't required):
ThinkingSphinx::Middlewares::DEFAULT.delete ThinkingSphinx::Middlewares::UTF8
And in your config/thinking_sphinx.yml file:
development:
  utf8: true
# repeat for each environment as necessary
All of these changes will become the default behaviour in Thinking Sphinx v3.1.0.
New features
- MySQL users can enable a minimal GROUP BY statement, to speed up queries: set_property :minimal_group_by? => true.
- search_for_ids can now be chained onto scoped search calls.
- Added ability to switch between Sphinx special variables and the equivalent functions. Sphinx 2.1.x requires the latter, and that behaviour will become the default in Thinking Sphinx 3.1.0.
- Added ability to disable UTF-8 forced encoding, now that Sphinx 2.1.2 returns UTF-8 strings by default. This will be disabled by default in Thinking Sphinx 3.1.0.
- Added new search options in Sphinx 2.1.x.
- skip_time_zone setting is now available per environment via config/thinking_sphinx.yml to avoid the sql_query_pre time zone command.
- Raise an error if no indices match the search criteria (Bryan Ricker).
Changes to behaviour
- Rake's silent mode is respected for indexing (@endoscient).
- Insist on the log directory existing, to ensure correct behaviour for symlinked paths. (Michael Pearson).
- Realtime fields and attributes now accept symbols as well as column objects, and fields can be sortable (with a _sort suffix for the matching attribute).
- Automatically load Riddle's Sphinx 2.0.5 compatibility changes.
- Don't clobber custom :select options for facet searches (Timo Virkkala).
- Sphinx connection failures now have their own class, ThinkingSphinx::ConnectionError, instead of the standard Mysql2::Error.
- Always use DISTINCT in group concatenation.
- Have tests index UTF-8 characters where appropriate (Pedro Cunha).
- Separated directory preparation from data generation for real-time index (re)generation tasks.
- Updating Riddle dependency to be >= 1.5.9.
Bug fixes
- Cast every column to a timestamp for timestamp attributes with multiple columns.
- Don't use Sphinx ordering if SQL order option is supplied to a search.
- Custom middleware and mask options now function correctly with model-scoped searches.
- Suspended deltas now no longer update core indices as well.
- Use alphabetical ordering for index paths consistently (@grin).
- Convert very small floats to fixed format for geo-searches.