You can search for any word or
phrase on a Web site by typing the word or phrase
into a query form and clicking the button that
executes the query (for example, the Execute Query
button on the sample query form).
Searches produce a list of files
that contain the word or phrase, no matter where
it appears in the text. The following rules
apply when formulating queries:
Consecutive words are
treated as a phrase; they must appear in
the same order within a matching
document.
Queries are
case-insensitive, so you can type your
query in uppercase or lowercase.
You can search for any
word except for those in the exception
list (for English, this includes a,
an, and, as,
and other common words), which are
ignored during a search.
Words in the exception
list are treated as placeholders in
phrase and proximity queries. For
example, if you searched for Word
for Windows, the results could give
you Word for Windows and
Word and Windows, because for
is a noise word and appears in the
exception list.
Punctuation marks such as
the period (.), colon (:), semicolon (;),
and comma (,) are ignored during a
search.
To use specially treated
characters such as &, |, ^, #, @, $,
(, and ) in a query, enclose your query in
quotation marks (").
To search for a word or
phrase containing quotation marks,
enclose the entire phrase in quotation
marks and then double the quotation marks
around the word or words you want to
surround with quotes. For example,
"World-Wide Web or ""Web"""
searches for World-Wide Web or "Web".
You can insert Boolean operators (AND,
OR, and NOT)
and the proximity
operator (NEAR) to
specify additional search information.
The wildcard
character (*) can match words with a
given prefix. The query esc* matches the
terms ESC,
escape, and so on.
Free-text queries
can be specified without regard to query
syntax.
Vector
space queries can be specified.
ActiveX™ (OLE) and
file attribute property
value queries can be issued.
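The quoting and noise-word rules above can be sketched in a few lines of Python. This is only an illustration, not Index Server's implementation: the names `quote_query`, `tokenize`, and `phrase_matches` are hypothetical, the noise-word set is a tiny sample rather than the real exception list, and the sketch assumes that a noise word in the phrase matches any single word in the document.

```python
import re

# Assumption: a tiny illustrative subset of the English exception list.
NOISE_WORDS = {"a", "an", "and", "as", "for"}

def quote_query(text):
    """Wrap text in quotation marks, doubling any embedded quotation marks."""
    return '"' + text.replace('"', '""') + '"'

def tokenize(text):
    """Lowercase the text and keep only runs of letters, digits, and apostrophes.

    This makes matching case-insensitive and drops punctuation such as
    periods, colons, semicolons, and commas.
    """
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase_matches(phrase, document):
    """Return True if the words of the phrase appear consecutively, in order.

    Assumption: a noise word in the phrase acts as a placeholder that
    matches any single word at that position.
    """
    want = tokenize(phrase)
    have = tokenize(document)
    for start in range(len(have) - len(want) + 1):
        window = have[start:start + len(want)]
        if all(w in NOISE_WORDS or w == h for w, h in zip(want, window)):
            return True
    return False
```

With these definitions, `phrase_matches("Word for Windows", "Try Word and Windows today.")` is true because `for` is treated as a placeholder, while the reversed word order `"Windows for Word"` does not match.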
Boolean and proximity operators
can create a more precise query.
To Search For | Example | Results
Both terms in the same page | access and basic (or: access & basic) | Pages with both the words access and basic
Either term in a page | cgi or isapi (or: cgi | isapi) | Pages with the words cgi or isapi
The first term without the second term | access and not basic (or: access & ! basic) | Pages with the word access but not basic
Pages not matching a property value | not @size = 100 (or: ! @size = 100) | Pages that are not 100 bytes
Both terms in the same page, close together | excel near project (or: excel ~ project) | Pages with the word excel near the word project
Hints:
You can add parentheses
to nest expressions within a query. The
expressions in parentheses are evaluated
before the rest of the query.
Use double quotes
(") to indicate that a Boolean or NEAR
operator keyword should be ignored in
your query. For example, "Abbott and
Costello" will match pages with the
phrase, not pages that match the Boolean
expression. In addition to being an
operator, the word and is a
noise word in English.
The NEAR
operator is similar to the AND
operator in that NEAR
returns a match if both words being
searched for are in the same page.
However, the NEAR
operator differs from AND
because the rank assigned by NEAR
depends on the proximity of words. That
is, the rank of a page with the
searched-for words closer together is
greater than or equal to the rank of a
page where the words are farther apart.
If the searched-for words are more than
50 words apart, they are not considered
near enough, and the page is assigned a
rank of zero.
The NOT
operator can be used only after an AND
operator in content queries; it can be
used only to exclude pages that match a
previous content restriction. For
property value queries, the NOT
operator can be used apart from the AND
operator.
The AND
operator has a higher precedence than OR.
For example, the first three queries are
equivalent, but the fourth is not:
a AND b OR c
c OR a AND b
c OR (a AND b)
(c OR a) AND b
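The precedence rule can be checked with a small evaluator. The following is a minimal sketch, assuming a page is represented as a set of words and that queries use only the English keywords AND, OR, NOT, and parentheses; the function name `evaluate` is hypothetical and the parser is not how Index Server itself works.

```python
import re

def evaluate(query, page_words):
    """Evaluate a Boolean content query against the words of one page.

    AND binds tighter than OR, and NOT is accepted only directly after AND,
    mirroring the hints above. Assumes a well-formed query.
    """
    tokens = re.findall(r"[()]|[^\s()]+", query)
    words = {w.lower() for w in page_words}
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            take()
            right = parse_and()      # always parse, even if result is already True
            result = result or right
        return result

    def parse_and():
        result = parse_term()
        while peek() == "AND":
            take()
            negate = peek() == "NOT"
            if negate:
                take()
            right = parse_term()     # always parse, even if result is already False
            result = result and (not right if negate else right)
        return result

    def parse_term():
        tok = take()
        if tok == "(":
            result = parse_or()
            take()                   # consume the closing parenthesis
            return result
        return tok.lower() in words

    return parse_or()
```

For a page that contains only the word c, the first three queries in the hint evaluate to true and `(c OR a) AND b` evaluates to false, which shows why the fourth query is different.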
Note: The
symbols (&, |, !, ~) and the English keywords
AND, OR, NOT,
and NEAR work the same way in
all languages supported by Index Server.
Localized keywords are also available when the
browser locale is set to one of the following six
languages:
Language | Keywords
German | UND, ODER, NICHT, NAH
French | ET, OU, SANS, PRES
Spanish | Y, O, NO, CERCA
Dutch | EN, OF, NIET, NABIJ
Swedish | OCH, ELLER, INTE, NÄRA
Italian | E, O, NO, VICINO
Note: The
NEAR operator can be applied only to words or
phrases.
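The ranking behavior described for NEAR can be illustrated with a small function. This is a sketch under stated assumptions: the text only says that closer words rank at least as high as farther ones and that words more than 50 apart get a rank of zero, so the linear falloff used here, the 0 to 1000 scale, and the name `near_rank` are illustrative choices and not the formula Index Server uses.

```python
MAX_DISTANCE = 50   # from the text: farther apart than this means rank zero
MAX_RANK = 1000     # assumption: top of the rank scale

def near_rank(words, first, second):
    """Return an illustrative NEAR rank for two terms in a list of page words.

    Assumption: rank falls off linearly with the smallest distance (in words)
    between any occurrence of the two terms.
    """
    lowered = [w.lower() for w in words]
    pos_a = [i for i, w in enumerate(lowered) if w == first.lower()]
    pos_b = [i for i, w in enumerate(lowered) if w == second.lower()]
    if not pos_a or not pos_b:
        return 0                      # one of the terms is missing
    distance = min(abs(a - b) for a in pos_a for b in pos_b)
    if distance > MAX_DISTANCE:
        return 0                      # not considered near enough
    rank = MAX_RANK * (MAX_DISTANCE + 1 - distance) // MAX_DISTANCE
    return min(MAX_RANK, rank)
```

Adjacent terms receive the highest rank, and the rank never increases as the terms move apart.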
Wildcard
operators help you find pages containing words
similar to a given word.
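As a rough illustration of prefix matching with the wildcard character, the sketch below assumes that the asterisk appears only at the end of the term and that matching is case-insensitive, as stated in the query rules. The helper name `wildcard_match` is hypothetical.

```python
def wildcard_match(pattern, word):
    """Return True if word matches pattern, where a trailing * means 'any suffix'."""
    pattern, word = pattern.lower(), word.lower()
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern
```

For example, `wildcard_match("esc*", "escape")` and `wildcard_match("esc*", "ESC")` are both true, while `wildcard_match("esc*", "rescue")` is false.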
The
query engine finds pages that best match the
words and phrases in a free-text query. This is
done by automatically finding pages that match
the meaning, not the exact wording, of the query.
Boolean, proximity, and wildcard operators are
ignored within a free-text query. Free-text
queries are prefixed with $contents.
The query engine supports vector
space queries. Vector queries return pages that
match a list of words and phrases. The rank of
each page indicates how well the page matched the
query.
To Search For | Example | Results
Pages that contain specific words | light, bulb | Files with words that best match the words being searched for
Pages that contain weighted prefixes, words, and phrases | invent*, light[50], bulb[10], "light bulb"[400] | Files that contain words prefixed by invent, the words light, bulb, and the phrase light bulb (the terms are weighted)
Components in vector
queries are separated by commas.
Components in vector
queries can be weighted by using the
[weight] syntax.
Pages returned by vector
queries do not necessarily match every
term in the query.
Vector queries work best
when the results are sorted by rank.
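The comma-separated, optionally weighted component syntax can be parsed as sketched below. This is an illustration only: the function name `parse_vector_query` is hypothetical, and because the text does not say what weight an unweighted component receives, the sketch reports it as `None` rather than guessing a default.

```python
import re

def parse_vector_query(query):
    """Split a vector query into (term, weight) pairs.

    Commas separate components, a trailing [n] sets the weight, and
    commas inside a quoted phrase are not treated as separators.
    """
    components = []
    for part in re.findall(r'(?:"[^"]*"|[^,])+', query):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r'(.*?)\s*(?:\[(\d+)\])?', part)
        term = match.group(1).strip('"')
        weight = int(match.group(2)) if match.group(2) else None
        components.append((term, weight))
    return components
```

Applied to the weighted example in the table, this yields the prefix `invent*` with no explicit weight, `light` with weight 50, `bulb` with weight 10, and the phrase `light bulb` with weight 400.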
With property value queries, you
can find files that have property values that
match given criteria. The properties over which
you can query include basic file information like
file name and file size, and ActiveX properties
including the document summary (information) that
is stored in files created by ActiveX-aware
applications.
There are two types of property
queries:
Relational
property queries consist of an
at character (@), a property name,
a relational
operator, and a property value.
For example, to find all of the files
larger than one million bytes, issue the
query @size > 1000000.
Regular expression
property queries consist of a number
sign (#), a property name, and a regular
expression for the property value.
For example, to find all of the
video (.avi) files, issue the query
#filename *.avi. Regular expressions will
never match the special properties
contents (#contents) and all (#all).
Properties that are not retrievable at
query time cannot be used in # queries.
These include HTML META properties not
stored in the property cache.
Property names are preceded by
either the at (@) or number sign (#)
character. Use @ for relational queries, and #
for regular expression queries.
If no property name is specified,
@contents is assumed.
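The prefix rules can be summarized in a short helper that splits a query into its prefix character, property name, and remainder. This is a sketch, not part of Index Server: the name `split_property_query` is hypothetical, and lowercasing the property name is an assumption based on the examples on this page using both @DocAuthor and @docauthor.

```python
import re

def split_property_query(query):
    """Return (prefix, property_name, rest) for a single query term.

    '@' marks a relational query and '#' a regular expression query.
    If no property name is given, @contents is assumed.
    """
    query = query.strip()
    match = re.match(r"([@#])(\w+)\s*(.*)", query)
    if match:
        prefix, name, rest = match.groups()
        return prefix, name.lower(), rest
    return "@", "contents", query
```

For example, `split_property_query("apple tree")` returns `("@", "contents", "apple tree")`, which reflects the default described above.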
Properties available for all
files include:
Property Name | Description
All | Matches words, phrases, and any property
Contents | Words and phrases in the file
Filename | Name of the file
Size | File size
Write | Last time the file was modified
ActiveX property values can also
be used in queries. Web sites with files created
by most ActiveX-aware applications can be queried
for these properties:
Property Name | Description
DocTitle | Title of the document
DocSubject | Subject of the document
DocAuthor | The document's author
DocKeywords | Keywords for the document
DocComments | Comments about the document
For a complete list of property
names, see the List
of Property Names later on this page.
Relational operators are used in
relational property queries.
To Search For | Example | Results
Property values in relation to a fixed value | @size < 100; @size <= 100; @size = 100; @size != 100; @size >= 100; @size > 100 | Files whose size matches the query
Property values with all of a set of bits on | @attrib ^a 0x820 | Compressed files with the archive bit on
Property values with some of a set of bits on | @attrib ^s 0x20 | Files with the archive bit on
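The two bitwise operators correspond to ordinary mask tests. The sketch below shows the arithmetic only; the helper names are hypothetical, and the constants are the standard Win32 file attribute values (archive is 0x20 and compressed is 0x800, which together give the 0x820 used in the example).

```python
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_COMPRESSED = 0x800

def all_bits_on(value, mask):
    """Equivalent of ^a: every bit in the mask is set in the value."""
    return value & mask == mask

def some_bits_on(value, mask):
    """Equivalent of ^s: at least one bit in the mask is set in the value."""
    return value & mask != 0
```

A file whose attributes are only the archive bit passes `some_bits_on(attrib, 0x820)` but fails `all_bits_on(attrib, 0x820)`, because the compressed bit is missing.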
To Search For | Example | Results
A specific value | @DocAuthor = Bill Barnes | Files authored by Bill Barnes
Values beginning with a prefix | #DocAuthor George* | Files whose author property begins with George
Files with any of a set of extensions | #filename *.|(exe|,dll|,sys|) | Files with .exe, .dll, or .sys extensions
Files modified after a certain date | @write > 96/2/14 10:00:00 | Files modified after February 14, 1996 at 10:00 GMT
Files modified after a relative date | @write > -1d2h | Files modified in the last 26 hours
Vectors matching a vector | @vectorprop = { 10, 15, 20 } | ActiveX documents with a vectorprop value of { 10, 15, 20 }
Vectors where each value matches a criterion | @vectorprop >^a 15 | ActiveX documents with a vectorprop value in which all values in the vector are greater than 15
Vectors where at least one value matches a criterion | @vectorprop =^s 15 | ActiveX documents with a vectorprop value in which at least one value is 15
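The vector comparisons in the last two rows amount to applying a relational operator to every element and then requiring all or some of the results to be true. The following is a minimal sketch with a hypothetical function name, `vector_matches`; it models only the comparison logic, not how Index Server stores or retrieves vector properties.

```python
import operator

# Relational operators as written in queries, mapped to Python comparisons.
OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}

def vector_matches(values, op, quantifier, operand):
    """Compare a single operand against each value of a vector property.

    quantifier is "^a" (all values must match) or "^s" (some value must match).
    """
    results = [OPS[op](v, operand) for v in values]
    if quantifier == "^a":
        return all(results)
    if quantifier == "^s":
        return any(results)
    raise ValueError("quantifier must be ^a or ^s")
```

For the vector { 10, 15, 20 }, `vector_matches([10, 15, 20], "=", "^s", 15)` is true, while `vector_matches([10, 15, 20], ">", "^a", 15)` is false because not every value exceeds 15.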
Be sure to use the pound
(#) character before the property name
when using a regular expression in a
property value, and an at (@)
character otherwise. The equal (=)
relational operator is assumed for
regular-expression queries.
File name (#filename) is
the only property that efficiently
supports regular expressions with
wildcards to the left of text.
Date and time values are
of the form yyyy/mm/dd hh:mm:ss
or yyyy-mm-dd hh:mm:ss. The
first two characters of the year and the
entire time can be omitted. If you omit
the first two characters of the year,
a value of 29 or less is interpreted as a
year in the 2000s, and a value of 30 or
greater is interpreted as a year in the
1900s (for example, 96 means 1996). All dates
and times are in Greenwich Mean Time
(GMT).
Dates and times relative
to the current time can be expressed with
a minus (-) character followed by zero or
more pairs of an integer and a time unit.
Time units are expressed as: (y) for
years, (m) for months, (w) for weeks, (d)
for days, (h) for hours, (n) for minutes,
and (s) for seconds. A three-digit
millisecond value can optionally be
specified after the seconds value in date
expressions, for example, 1997/12/8
10:10:03:452.
Currency values are of
the form x.y, where x
is the whole value amount and y
is the fractional amount. There is no
assumption about units.
Boolean values are (t) or
(true) for TRUE and (f)
or (false) for FALSE.
Vectors (VT_VECTOR) are
expressed as an opening brace ({),
followed by a comma-separated list of
values, then a closing brace (}).
Single-value expressions
that are compared against vectors are
expressed as a relational
operator, then a (^a) for all of
or a (^s) for some of.
Numeric values can be in
decimal or hexadecimal (preceded by 0x).
The contents
property does not support relational
operators. If a relational operator is
specified, no results will be found. For
example, @contents Microsoft will find
documents containing Microsoft, but
@contents=Microsoft will
find none.
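The two date rules above (relative offsets and two-digit years) can be sketched as follows. The function names are hypothetical, and because the text does not say how long a month or a year is in a relative expression, the sketch assumes 30 and 365 days; treat those two values as placeholders.

```python
import re
from datetime import timedelta

# Seconds per unit. Assumption: a month is 30 days and a year is 365 days.
UNIT_SECONDS = {
    "y": 365 * 86400, "m": 30 * 86400, "w": 7 * 86400,
    "d": 86400, "h": 3600, "n": 60, "s": 1,
}

def parse_relative(expr):
    """Convert a relative date such as -1d2h into a timedelta before now."""
    if not expr.startswith("-"):
        raise ValueError("relative dates start with a minus character")
    body = expr[1:]
    pairs = re.findall(r"(\d+)([ymwdhns])", body)
    if "".join(number + unit for number, unit in pairs) != body:
        raise ValueError("expected integer and time-unit pairs")
    seconds = sum(int(number) * UNIT_SECONDS[unit] for number, unit in pairs)
    return timedelta(seconds=seconds)

def expand_year(two_digit_year):
    """Apply the pivot rule: 29 or less is 20xx, 30 or greater is 19xx."""
    if two_digit_year <= 29:
        return 2000 + two_digit_year
    return 1900 + two_digit_year
```

The expression `-1d2h` from the earlier table parses to 26 hours, and `expand_year(96)` gives 1996, matching the February 14, 1996 example.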
Regular expressions in property
queries are defined as follows:
Any character except
asterisk (*), period (.), question mark
(?), and vertical bar (|) defaults to
matching just itself.
Regular expressions can
be enclosed in matching quotes ("),
and must be enclosed in quotes if they
contain a space ( ) or closing
parenthesis ()).
The characters *, ., and
? behave as they behave in Windows; they
match any number of characters, match (.)
or end of string, and match any one
character, respectively.
The character | is an
escape character. After |, the following
characters have special meaning:
( opens a group. Must be
followed by a matching ).
) closes a group. Must be
preceded by a matching (.
[ opens a character
class. Must be followed by a matching
(un-escaped) ].
{ opens a counted match.
Must be followed by a matching }.
} closes a counted match.
Must be preceded by a matching {.
, separates OR
clauses.
* matches zero or more
occurrences of the preceding expression.
? matches zero or one
occurrences of the preceding expression.
+ matches one or more
occurrences of the preceding expression.
Anything else, including
|, matches itself.
Between square brackets
([]) the following characters have
special meaning:
^ matches everything but
following classes. Must be the first
character.
] matches ]. May only be
preceded by ^, otherwise it closes the
class.
- range operator.
Preceded and followed by normal
characters.
Anything else matches
itself (or begins or ends a range at
itself).
Between curly braces ({})
the following syntax applies:
|{m|} matches exactly m
occurrences of the preceding expression.
(0 < m < 256).
|{m,|} matches at least m
occurrences of the preceding expression.
(1 < m < 256).
|{m,n|} matches between m
and n occurrences of the
preceding expression, inclusive. (0 <
m < 256, 0 < n < 256).
To match *, ., and ?,
enclose them in brackets (for example,
|[*]sample will match
*sample).
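To make the escape rules more concrete, the sketch below translates a subset of this syntax into a Python regular expression. It is only an approximation for experimenting with patterns: the name `to_python_regex` is hypothetical, the handling of a bare period as "a literal period or end of string" is an interpretation of the rule above, the pattern is assumed to be well formed, and Python 3.7 or later is assumed for the behavior of `re.escape`.

```python
import re

def to_python_regex(pattern):
    """Translate a subset of the property-query regex syntax to Python syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "|" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            i += 2
            if nxt in "(){}*?+":
                out.append(nxt)              # group, counted match, or quantifier
            elif nxt == ",":
                out.append("|")              # OR between clauses
            elif nxt == "[":
                j = i                        # find the end of the character class
                if j < len(pattern) and pattern[j] == "^":
                    j += 1
                if j < len(pattern) and pattern[j] == "]":
                    j += 1                   # a leading ] is a literal
                end = pattern.index("]", j)
                body = pattern[i:end].replace("\\", "\\\\")
                out.append("[" + body + "]")
                i = end + 1
            else:
                out.append(re.escape(nxt))   # anything else matches itself
            continue
        if ch == "*":
            out.append(".*")                 # any number of characters
        elif ch == "?":
            out.append(".")                  # any one character
        elif ch == ".":
            out.append(r"(?:\.|$)")          # a period or end of string
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)
```

For instance, the pattern `*.|(exe|,dll|,sys|)` becomes `.*(?:\.|$)(exe|dll|sys)`, which can then be tested with `re.fullmatch` against candidate file names.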
Example | Results
@size > 1000000 | Pages larger than one million bytes
@write > 95/12/23 | Pages modified after the date
Apple tree | Pages with the phrase apple tree
"apple tree" | Same as above
@contents apple tree | Same as above
Microsoft and @size > 1000000 | Pages with the word Microsoft that are larger than one million bytes
"microsoft and @size > 1000000" | Pages with the phrase specified (not the same as above)
#filename *.avi | Video files (the # prefix is used because the query contains a regular expression)
@attrib ^s 32 | Pages with the archive attribute bit on
@docauthor = John Smith | Pages with the given author
$contents why is the sky blue? | Pages that match the query
@size < 100 & #filename *.gif | Graphics Interchange Format (GIF) files less than 100 bytes in size
These properties are always
available for queries. Additional properties may
also be available depending on the configuration
of the Web server.
Friendly Name | Datatype | Description
A_HRef | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML HREF. This property name was created for Microsoft® Site Server and corresponds with the Index Server property name HtmlHRef. Can be queried but not retrieved.
Access | VT_FILETIME | Last time file was accessed.
All | (not applicable) | Searches every property for a string. Can be queried but not retrieved.
AllocSize | DBTYPE_I8 | Size of disk allocation for file.
Attrib | DBTYPE_UI4 | File attributes. Documented in Win32 SDK.
ClassId | DBTYPE_GUID | Class ID of object, for example, WordPerfect, Word, and so on.
Characterization | DBTYPE_WSTR|DBTYPE_BYREF | Characterization, or abstract, of document. Computed by Index Server.
Contents | (not applicable) | Main contents of file. Can be queried but not retrieved.
Create | VT_FILETIME | Time file was created.
Directory | DBTYPE_WSTR|DBTYPE_BYREF | Physical path to the file, not including the file name.
DocAppName | DBTYPE_WSTR|DBTYPE_BYREF | Name of application that created the file.
DocAuthor | DBTYPE_WSTR|DBTYPE_BYREF | Author of document.
DocByteCount | DBTYPE_I4 | Number of bytes in a document.
DocCategory | DBTYPE_STR|DBTYPE_BYREF | Type of document such as a memo, schedule, or whitepaper.
DocCharCount | DBTYPE_I4 | Number of characters in document.
DocComments | DBTYPE_WSTR|DBTYPE_BYREF | Comments about document.
DocCompany | DBTYPE_STR|DBTYPE_BYREF | Name of the company for which the document was written.
DocCreatedTm | VT_FILETIME | Time document was created.
DocEditTime | VT_FILETIME | Total time spent editing document.
DocHiddenCount | DBTYPE_I4 | Number of hidden slides in a Microsoft® PowerPoint document.
DocKeywords | DBTYPE_WSTR|DBTYPE_BYREF | Document keywords.
DocLastAuthor | DBTYPE_WSTR|DBTYPE_BYREF | Most recent user who edited document.
DocLastPrinted | VT_FILETIME | Time document was last printed.
DocLastSavedTm | VT_FILETIME | Time document was last saved.
DocLineCount | DBTYPE_I4 | Number of lines contained in a document.
DocManager | DBTYPE_STR|DBTYPE_BYREF | Name of the manager of the document's author.
DocNoteCount | DBTYPE_I4 | Number of pages with notes in a PowerPoint document.
DocPageCount | DBTYPE_I4 | Number of pages in document.
DocParaCount | DBTYPE_I4 | Number of paragraphs in a document.
DocPartTitles | DBTYPE_STR|DBTYPE_VECTOR | Names of document parts. For example, in Excel part titles are the names of spreadsheets, in PowerPoint slide titles, and in Word for Windows the names of the documents in the master document.
DocPresentationTarget | DBTYPE_STR|DBTYPE_BYREF | Target format (35mm, printer, video, and so on) for a presentation in PowerPoint.
DocRevNumber | DBTYPE_WSTR|DBTYPE_BYREF | Current version number of document.
DocSlideCount | DBTYPE_I4 | Number of slides in a PowerPoint document.
DocSubject | DBTYPE_WSTR|DBTYPE_BYREF | Subject of document.
DocTemplate | DBTYPE_WSTR|DBTYPE_BYREF | Name of template for document.
DocTitle | DBTYPE_WSTR|DBTYPE_BYREF | Title of document.
DocWordCount | DBTYPE_I4 | Number of words in document.
FileIndex | DBTYPE_I8 | Unique ID of file.
FileName | DBTYPE_WSTR|DBTYPE_BYREF | Name of file.
HitCount | DBTYPE_I4 | Number of hits (words matching query) in file.
HtmlHRef | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML HREF. Can be queried but not retrieved.
HtmlHeading1 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H1. Can be queried but not retrieved.
HtmlHeading2 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H2. Can be queried but not retrieved.
HtmlHeading3 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H3. Can be queried but not retrieved.
HtmlHeading4 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H4. Can be queried but not retrieved.
HtmlHeading5 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H5. Can be queried but not retrieved.
HtmlHeading6 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H6. Can be queried but not retrieved.
Img_Alt | DBTYPE_WSTR|DBTYPE_BYREF | Alternate text for <IMG> tags. Can be queried but not retrieved.
Path | DBTYPE_WSTR|DBTYPE_BYREF | Full physical path to file, including file name.
Rank | DBTYPE_I4 | Rank of row. Ranges from 0 to 1000. Larger numbers indicate better matches.
RankVector | DBTYPE_I4|DBTYPE_VECTOR | Ranks of individual components of a vector query.
ShortFileName | DBTYPE_WSTR|DBTYPE_BYREF | Short (8.3) file name.
Size | DBTYPE_I8 | Size of file, in bytes.
USN | DBTYPE_I8 | Update Sequence Number. NTFS drives only.
VPath | DBTYPE_WSTR|DBTYPE_BYREF | Full virtual path to file, including file name. If more than one possible path, then the best match for the specific query is chosen.
WorkId | DBTYPE_I4 | Internal ID for file. Used within Index Server.
Write | VT_FILETIME | Last time file was written.