Note:- This document pertains to the installation & configuration of site Search version 2.0. Previous versions cannot be configured using the instructions provided below. If you are currently running a version lower than version 2.0 it is strongly recommended that you upgrade.
This document is divided into two parts. The first part explains installation of site Search version 2.0 and gives a detailed explanation of the variables to be configured for running site Search. The second part explains the "error messages" outputted by site Search when run with the value of the variable "$check_script" set to "1".
Installation:-
Like any other "CGI" application, site Search has to be installed
in the "HTTP" server's "cgi-bin" directory. Uncompress and untar the file
and copy it to the server's cgi-bin directory which normally would be a
directory called "cgi-bin" under the server's base directory. Make
sure that the permissions are world executable (initially set). It is recommended
that site Search is resident in its own directory under the "cgi-bin" directory.
Configuration:-
Though site Search has 22 options to be configured, it is relatively
simple to get the program up and going. Given below is the list of 22
options with their possible values and what the values mean. It is recommended
that you understand and configure each variable in the order in which it
is listed below. Open the file "site_Search.conf" and configure each option
as follows.
$check_script Possible values "0" or "1". The first time you install, configure and run this script on your server, set the value to "1". This lets site Search call a module to check the validity of your options in the "site_Search.conf" file. If site Search finds any errors in your configuration (invalid directories, etc.), it will ask you to rectify them. Once site Search okays your configuration, you can set the value to "0" and proceed. Though this module does only very preliminary checks, it eliminates the need to do a repetitive "error-check". You should always first run the script with the value set to "1" after any major change in the configuration.
$desc1/$desc2 Possible values "0" or "1". When site Search displays the output of a search, it displays, among other things, a description of each file that contained the search term(s). The description can either be included from a separate file, "site_Search.desc", or extracted from the <META NAME="DESCRIPTION"> tag of each file that was recorded as a hit. If you want the description to be included from the "site_Search.desc" file, set the value of $desc1 to "1"; otherwise set it to "0". If $desc1 is "0", site Search will extract the description from the <META> tag. However, if the value of $desc2 is also set to "0", no description is output. To create a description file, read the section "Creating a description file" in the FAQ.html file. Note:- If both of the variables ($desc1 and $desc2) are set to "1", $desc2 supersedes $desc1.
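As a rough sketch of the combination described above, the following fragment asks for descriptions taken from the <META> tag only. It assumes that the flags are written as quoted strings in "site_Search.conf" and that $desc2 = "1" is what selects the <META> tag, which is inferred from the text rather than stated outright.

```perl
# Hypothetical excerpt from site_Search.conf; the quoting style is an assumption.
$desc1 = "0";   # do not read descriptions from site_Search.desc
$desc2 = "1";   # assumed: take descriptions from <META NAME="DESCRIPTION">
```

Setting both variables to "0" would, per the description above, suppress descriptions altogether.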
$base_path This variable identifies the complete directory path to your server's base "HTML" directory, in other words the base directory that contains the files you want site Search to search. It has to be an absolute path. A typical entry would look something like,
$base_path = "/home/apache_1.2.4/public_html";
@directories_to_avoid This variable identifies directories within the "$base_path" that you want avoided when site Search gathers the files to be searched. You can provide more than one directory but remember to separate each entry with a comma. A typical entry would look like,
@directories_to_avoid = ("$base_path/scratch","$base_path/icons");
@files_to_avoid This variable identifies particular files within the "$base_path" that you want avoided when site Search gathers the files to be searched. You can provide more than one file but remember to give a complete path name, as there could be a file with the same name in more than one directory. A typical entry would look like,
@files_to_avoid = ("$base_path/robots.txt","$base_path/some_dir/robots.txt");
@files_to_include This variable identifies any specific file or files that you would like to be included explicitly in the search. You can provide more than one file but remember to give a complete path name, as there could be a file with the same name in more than one directory. A typical entry would look like,
@files_to_include = ("$base_path/myfile.htm","$base_path/myfile.txt");
@filetypes This variable identifies the particular types of files that you want site Search to search. You can provide more than one filetype but remember to separate each entry with a comma. A typical entry would look like,
@filetypes = (".html",".shtml",".txt");
$multi_display Possible values "0" or "1". As a web-administrator, if you want to provide the end-user with the facility of viewing the search output page-wise (for example "5 Hits per Page" or "10 Hits per Page"), set the value to "1". If the value is "0", the end-user always gets all of the search output on one page (which can be very frustrating if, for instance, there are over 100 hits).
$scratch If you are going to set the value of "$multi_display" to "1", site Search requires a temporary directory within the "$base_path" to store the results of a search query. Remember that you have to create this directory yourself and that it must be within the "$base_path". A typical entry would look like,
$scratch = "scratch";
where "scratch" is the name of the directory that you have created under "$base_path". You may also want to look at the FAQ.html for questions regarding this variable.
$clean_scratch Possible values "0" or "1". Since site Search uses the scratch directory for storing temporary files when the value of "$multi_display" is set to "1", temporary files can accumulate in this directory. If you want site Search to clean the directory, set the value of "$clean_scratch" to "1". To understand how site Search decides which files to delete, see the next option.
$empty_scratch Possible values: any valid number, without quotes. If you set the value of the "$clean_scratch" variable to "1", site Search cleans the scratch directory specified by the "$scratch" variable. This variable decides the age of the files to be deleted. The value you provide is interpreted in days. For example, providing the value 5 would delete all files that have not been accessed for 5 days. If users are using the search engine a lot, a large number of files will accumulate in the temporary directory; in that case set the value low, say 0.25, which stands for a quarter of a day. A good value is 0.5, which is half a day.
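The four paging-related options are easiest to read side by side. The fragment below is only a sketch of how they might be set together; it assumes that the "0"/"1" flags are written as quoted strings, and the values shown are illustrative rather than defaults shipped with site Search.

```perl
# Hypothetical excerpt from site_Search.conf; values are illustrative only.
$multi_display = "1";        # page the search output
$scratch       = "scratch";  # directory you created yourself under $base_path
$clean_scratch = "1";        # let site Search delete old temporary files
$empty_scratch = 0.5;        # age threshold in days (a number, no quotes)
```

With $multi_display set to "0", the remaining three options would presumably have no effect, since no temporary files are written.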
$record_usage Possible values "0" or "1". Setting
the value of this variable to "1" records the use of site Search. The record
is maintained in a file called "site_Search.usage". As a web-administrator,
you can better maintain your web-site if you know the kind of information
users seek from your web-site. Each record is written in the following
format,
system date|ip address of the host using site Search|machine name|search term(s)
$title_for_search_page This variable provides the <TITLE> tag of the search output. A typical entry would look like,
$title_for_search_page = "My WEB-SITE's Search Results";
$name_of_URI This variable identifies the proper WWW address of your web-site. A typical entry would look like,
$name_of_URI = "http://www.agl.uh.edu";
$path_to_searchform This variable identifies the URL address of the search FORM that users would use to search your web-site. A typical entry would look like,
$path_to_searchform = "http://www.agl.uh.edu/search.html";
$output_hits Possible values "0" or "1". If the value is set to "1", site Search will display the number of occurrences of each search term within every file in the output. You may also want to look at the FAQ.html for questions regarding this variable.
$output_line Possible values "0" or "1". If the value is set to "1", site Search will display the first line containing the search term along with the lines before and after it. This gives the user a sense of what to expect when they actually look at the file. You may also want to look at the FAQ.html for questions regarding this variable.
$smart Possible values "0" or "1". If you set the value of "$output_line" to "1", site Search searches each file for the first line containing the search term(s). The search pattern used to find these lines is specified by the "$smart" variable. If the value is set to "1", it does a smart search. However, you may want to set this to "0", as I have found the output lines to be quite satisfactory even when it does a seemingly "dumb" search.
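The three output options above interact, so a combined sketch may help. It assumes the flags are written as quoted strings in "site_Search.conf"; the particular values follow the suggestion in the text and are not documented defaults.

```perl
# Hypothetical excerpt from site_Search.conf; values are illustrative only.
$output_hits = "1";  # show how often each search term occurs in a file
$output_line = "1";  # show the first matching line with its neighbouring lines
$smart       = "0";  # plain line search, as suggested above
```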
NOTE:-The next three options can be understood by looking at a screen snapshot of the search results under a browser.
$output_form Possible values "0" or "1". If you set the value to "1", the FORM displayed at the end of the output in the above screen is included. This FORM lets the end-user search only the hits on the output page. This can help narrow down the possibilities to a minimum. To disable this, set the value to "0".
$form_to_use Possible values "internal" or the complete UNIX path name of the HTML FORM you want to include. This variable is useful only if the value of "$output_form" is "1". Either the FORM is generated by site Search, if the value is "internal", or you can display your own custom FORM. Remember to remove tags like <HTML>,</HTML>,<TITLE>,</TITLE>, as this form is going to be inserted into the HTML file generated by site Search.
$program_to_use This variable identifies the CGI program you would be using for the form. This can be site Search itself or, if you want, some other custom program of your choice. The value of this variable should be the complete HTTP path to the program. For example,
$program_to_use = "http://www.mysite.com/cgi-bin/site_Search.cgi";
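Taken together, the three FORM options might be set as in the sketch below. It assumes the flags are written as quoted strings in "site_Search.conf", and it reuses the placeholder URL from the example above.

```perl
# Hypothetical excerpt from site_Search.conf; values are illustrative only.
$output_form    = "1";         # append a search-within-results FORM to the output
$form_to_use    = "internal";  # or the complete UNIX path to your own HTML FORM
$program_to_use = "http://www.mysite.com/cgi-bin/site_Search.cgi";  # placeholder URL
```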
Error-1:- The configuration file used by site Search cannot be located in the current directory, i.e. the directory in which site Search resides. Either the file has been accidentally deleted or it does not have read permissions. You can either install site Search again or create your own "site_Search.conf" file using the options listed above.
Error-2:- The "cgi-lib.pl" library used by site Search is not available. site Search uses "cgi-lib.pl" to parse the contents of the FORM, though future versions won't require the library. This library has to be downloaded from the "cgi-lib.pl" Home-Page and installed in your perl library directory. To get the perl library directory, type "perl -V" at the command line.
Error-3:- You have set the value of $desc1 to "1", which means you would like site Search to load a file called "site_Search.desc" from the directory in which site Search resides. To understand the meaning of the "$desc1" variable, refer to part 1. Either the file is not present or it does not have read permissions.
Error-4:- The "$base_path" variable you specified in the "site_Search.conf" file does not seem to be a valid directory or does not have read permissions. Check the variable "$base_path" in the "site_Search.conf" file.
Error-5:- No filetypes have been specified to search in the "@filetypes" variable. Refer to part 1 to understand the meaning of this variable and configure it accordingly.
Error-6:- The scratch directory specified by the "$scratch" variable either does not exist or does not have read, write and/or execute permissions. Also, the scratch directory has to be within the "$base_path".
Error-7:- The variable "$record_usage" was set to "1" in the "site_Search.conf" file, which instructs site Search to record its usage. The usage is recorded in a file called "site_Search.usage" within the directory in which site Search resides. Either such a file is already present and does not have write permissions, or the permissions of the site Search directory are causing a conflict.
Error-8:- A valid HTTP address has to be specified for the "$path_to_searchform" variable in the "site_Search.conf" file, for example "http://www.mysite.com/search.html".
Error-9:- You specified a FORM to be included in the search output, but the FORM cannot be found at the location given by the "$form_to_use" variable. The FORM has to be a valid HTML FORM. If you want the internal FORM to be used instead, set the value of the variable to "internal".