Fields are named, and you can limit your searches to a single field (e.g. search through "title" only)
or a subset of fields (e.g. "title" and "abstract" only). The Sphinx index format generally supports up to
256 fields. However, up to version 2.0.1-beta, indexes were limited to 32 fields because of
certain complications in the matching engine. Full support for up to 256 fields was added in version
2.0.2-beta.
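As a rough illustration of limiting a search to named fields, the sketch below builds query strings that use the field operators of the extended query syntax (@field and @(field1,field2)). The helper name field_query is hypothetical and not part of Sphinx; it only assembles a string, and it assumes the query is later run in an extended matching mode.

```python
def field_query(keywords, fields=None):
    """Build an extended-syntax query string limited to the given fields.

    field_query is a hypothetical helper for illustration, not a Sphinx API.
    """
    if not fields:
        # No field limit: the keywords are matched against all fields.
        return keywords
    if len(fields) == 1:
        # Single-field limit, e.g. "@title hello world".
        return "@%s %s" % (fields[0], keywords)
    # Multi-field limit, e.g. "@(title,abstract) hello world".
    return "@(%s) %s" % (",".join(fields), keywords)


print(field_query("hello world", ["title"]))
print(field_query("hello world", ["title", "abstract"]))
```

The resulting string is what would be passed as the query text; the field names must match the fields declared for the index.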
Note that the original contents of the fields are not stored in the Sphinx index. The text that you
send to Sphinx gets processed, and a full-text index (a special data structure that enables quick
searches for a keyword) gets built from that text. But the original text contents are then simply
discarded. Sphinx assumes that you store those contents elsewhere anyway.
Moreover, it is impossible to fully reconstruct the original text, because the specific whitespace,
capitalization, punctuation, etc. will all be lost during indexing. It is theoretically possible to
partially reconstruct a given document from the Sphinx full-text index, but that would be a slow
process (especially if the CRC dictionary is used, which does not even store the original keywords
and works with their hashes instead).
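To make the point about discarded text concrete, here is a toy inverted index in Python. It is only a sketch of the general idea and does not reflect the actual Sphinx index format; the functions tokenize, build_index, and reconstruct are hypothetical names, and the tokenization rule (lowercase, split on anything that is not a letter or digit) is a simplifying assumption.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on non-alphanumeric runs; the original case,
    # punctuation, and whitespace are lost at this point.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(docs):
    # Toy structure: keyword -> {doc_id: [positions]}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return dict(index)


def reconstruct(index, doc_id):
    # Partial reconstruction: has to scan every keyword in the index,
    # and can only recover the normalized keywords in order.
    slots = {}
    for word, postings in index.items():
        for pos in postings.get(doc_id, []):
            slots[pos] = word
    return " ".join(slots[p] for p in sorted(slots))


docs = {1: "Hello,   World!", 2: "hello again"}
index = build_index(docs)
print(index["hello"])
print(reconstruct(index, 1))
```

Looking up a keyword is a single dictionary access, whereas rebuilding a document walks the whole index and still cannot recover the original capitalization, punctuation, or spacing.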
3.3. Attributes
Attributes are additional values associated with each document that can be used to perform
additional filtering and sorting during search.
It is often desirable to process full-text search results based not only on the matching
document ID and its rank, but on a number of other per-document values as well. For instance, one
might need to sort news search results by date and then relevance, or search through products within
a specified price range, or limit a blog search to posts made by selected users, or group results by
month. To do that efficiently, Sphinx lets you attach a number of additional attributes to each
document and store their values in the full-text index. The stored values can then be used to filter,
sort, or group full-text matches.
Attributes, unlike the fields, are not full-text indexed. They are stored in the index, but it is not
possible to search them as full-text, and attempting to do so results in an error.
For example, it is impossible to use the extended matching mode expression @column 1 to match
documents where column is 1, if column is an attribute, and this is still true even if the numeric
digits are normally indexed.
Attributes can be used for filtering, though, to restrict returned rows, as well as sorting or result
grouping; it is entirely possible to sort results purely based on attributes, and ignore the search
relevance tools. Additionally, attributes are returned from the search daemon, while the indexed text
is not.
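The following Python sketch mimics, on a small in-memory list, what filtering, sorting, and grouping by attributes amount to. It does not call Sphinx; the match data, the attribute names author_id and post_date, and the helper functions are all made up for illustration.

```python
from collections import Counter
from datetime import date

# Made-up full-text matches: document ID, relevance weight, and attributes.
matches = [
    {"id": 1, "weight": 3, "author_id": 7, "post_date": date(2011, 1, 15)},
    {"id": 2, "weight": 5, "author_id": 9, "post_date": date(2011, 1, 20)},
    {"id": 3, "weight": 1, "author_id": 7, "post_date": date(2011, 2, 3)},
    {"id": 4, "weight": 5, "author_id": 7, "post_date": date(2011, 2, 3)},
]


def filter_by(rows, attr, allowed):
    # Keep only the matches whose attribute value is in the allowed set.
    return [r for r in rows if r[attr] in allowed]


def sort_by_date_then_weight(rows):
    # Newest first; ties on date are broken by higher relevance weight.
    return sorted(rows, key=lambda r: (r["post_date"], r["weight"]), reverse=True)


def count_by_month(rows):
    # Group by (year, month) of post_date and count matches per group.
    return Counter((r["post_date"].year, r["post_date"].month) for r in rows)


by_author = filter_by(matches, "author_id", {7})
ordered_ids = [r["id"] for r in sort_by_date_then_weight(by_author)]
month_counts = count_by_month(by_author)
print(ordered_ids)
print(dict(month_counts))
```

None of these operations touch the indexed text; they work only on the per-document attribute values that accompany each match.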
A good example of attribute usage would be a forum posts table. Assume that only the title and content
fields need to be full-text searchable, but that sometimes it is also required to limit the search to a
certain author or a sub-forum (i.e. search only those rows that have some specific values of the
author_id or forum_id columns in the SQL table); or to sort matches by the post_date column; or to
group matching posts by month of the post_date and calculate per-group match counts.
This can be achieved by specifying all the mentioned columns (excluding title and content, which are
full-text fields) as attributes, indexing them, and then using API calls to set up filtering, sorting, and
grouping. Here is an example.
Example sphinx.conf part:
...
sql_query = SELECT id, title, content, \
author_id, forum_id, post_date FROM my_forum_posts