Update the Search Index for Single Language Datasets

If you use only single-language datasets, you can update the search index by running the BuildSXV4SearchIndex script, which is located in the SuperADMIN program data directory. For example, if you installed to the default location on Windows, the script is located at C:\ProgramData\STR\SuperADMIN\MetaData\MetaDataUtilities\BuildSXV4SearchIndex.bat

Step 1 - Update databases.txt

This is a text file that specifies which SuperSTAR datasets you want to include in the index. It is located in the same directory as the indexing script.

You need to update this file so that it contains a list of all the datasets on your deployment that you want to be indexed.

You should include all your datasets in the index. If you have datasets that not all users have access to, then you should still include these in your index. SuperADMIN will automatically take care of permissions when a user searches in SuperWEB2. The search results will automatically be filtered so that they only contain results from datasets and fields that the user has permission to access.

You can either update databases.txt manually or use the createdatabaselist command in SuperADMIN.

The file must use the following format (if you use the SuperADMIN command it will generate a list of all of your SuperSTAR datasets in this format):

<dataset_id>|<display_name>|<full_path_to_SXV4>

Where:

<dataset_id>	The ID of the dataset in the SuperSTAR catalogue.
<display_name>	The dataset display name from the SuperSTAR catalogue.
<full_path_to_SXV4>	The full path to the .sxv4 file that contains the dataset but without the .sxv4 file extension.

For example, the shipped databases.txt file is as follows. This would instruct the batch process to index the sample People and Retail Banking datasets:

people|people|C:\ProgramData\STR\SuperSERVER SA\databases\People
bank|bank|C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking
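
Before you run the script, it can help to check that every line follows this format. The following is a minimal Python sketch and is not part of SuperSTAR: the names DatasetEntry, parse_databases_line and parse_databases_text are hypothetical, and the sketch assumes that none of the three fields contains a | character.

```python
from typing import NamedTuple


class DatasetEntry(NamedTuple):
    dataset_id: str
    display_name: str
    sxv4_path: str  # full path, without the .sxv4 extension


def parse_databases_line(line: str) -> DatasetEntry:
    """Split one databases.txt line into its three pipe-separated fields."""
    parts = [part.strip() for part in line.strip().split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 pipe-separated fields, got {len(parts)}: {line!r}")
    dataset_id, display_name, sxv4_path = parts
    if not dataset_id or not display_name or not sxv4_path:
        raise ValueError(f"empty field in line: {line!r}")
    if sxv4_path.lower().endswith(".sxv4"):
        raise ValueError(f"path must omit the .sxv4 extension: {line!r}")
    return DatasetEntry(dataset_id, display_name, sxv4_path)


def parse_databases_text(text: str) -> list[DatasetEntry]:
    """Parse every non-blank line of the contents of a databases.txt file."""
    return [parse_databases_line(line) for line in text.splitlines() if line.strip()]
```

To use it, read databases.txt into a string and pass it to parse_databases_text; a ValueError identifies the first line that does not match the format.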

When you have finished editing databases.txt, save the file.

It is recommended that you save this file in the same location as the shipped file. If you save it to a different location, you must update SET DB_FILE_LIST="databases.txt" in BuildSXV4SearchIndex so that it contains the full path to the new location of databases.txt before you run the script.

Step 2 (Linux Only) - Set the Index and Destination Directories

On Linux, you will need to edit some of the variables in the indexing script before you can run it:

INDEX_DIR: set this to the full path of the MetaDataUtilities directory where the indexing script is located. For example: export INDEX_DIR=/home/str/superadmin/Metadata/MetaDataUtilities
DESTINATION_FOLDER: set this to the full path to the location where you want to generate the index files. This will need to match the configuration in SuperADMIN, which is set to use the meta_search_index directory in the SuperADMIN program data directory by default. See the next step for more details on this variable.

Step 3 - (Optional) Configure Incremental Or Full Updates

By default, the search index will be incrementally updated. This minimises the time required to build the index. However, you can configure a full rebuild of the index if required.

If running a full update, check that you have plenty of available disk space. The exact amount required to store the final index will depend on the size of your dataset catalogue, but can be several gigabytes for a large catalogue. In addition, the indexing process will create some temporary files while it builds the index. These will be cleaned up automatically at the end of the process, but may cause the index to temporarily grow to around double its final size before these files are cleaned up.

If you encounter disk space errors during the indexing, you should either increase the size of the disk or use the DESTINATION_FOLDER setting to change the location of the index to a partition that has sufficient disk space.
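
A quick free-space check before a full update can help avoid these errors. The following is a minimal Python sketch and is not part of SuperSTAR: the function name has_room_for_index is hypothetical, and the expected index size is an estimate that you supply yourself. It applies the "around double" rule of thumb described above.

```python
import shutil


def has_room_for_index(existing_dir: str, expected_index_bytes: int) -> bool:
    """Return True if the disk holding existing_dir has at least twice the
    expected final index size free, leaving room for temporary files.

    existing_dir must already exist; pass a directory on the partition
    where the index will be generated.
    """
    free_bytes = shutil.disk_usage(existing_dir).free
    return free_bytes >= 2 * expected_index_bytes
```

Because the indexing script creates the destination directory itself, pass the parent directory (or any existing directory on the same partition) rather than the destination directory.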

The settings to select an incremental or full update are located in BuildSXV4SearchIndex:

INCREMENTAL

Determines whether the update is incremental or full.

true

(Default). Run an incremental update:

Datasets listed in databases.txt are added to the index if not already present.
Datasets listed in remove_databases.txt are removed from the search index if present.
Any previously indexed datasets that are not listed in either file are retained in the index.

false

Run a full update. The index is fully wiped and all datasets referenced in databases.txt are re-indexed from scratch.

If there is no existing search index, a new index is always created, regardless of this setting.

If performing an incremental update, make sure you do not have datasets listed simultaneously in both databases.txt and remove_databases.txt.
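
One way to check for this before an incremental update is to compare the dataset IDs in the two files. The following is a minimal Python sketch and is not part of SuperSTAR: the function name find_conflicting_ids is hypothetical, and the sketch assumes that IDs are compared exactly (including case).

```python
def find_conflicting_ids(databases_text: str, remove_text: str) -> set[str]:
    """Return the dataset IDs that appear in both files' contents."""
    # databases.txt: the ID is the first pipe-separated field on each line.
    to_index = {
        line.split("|", 1)[0].strip()
        for line in databases_text.splitlines()
        if line.strip()
    }
    # remove_databases.txt: one dataset ID per line.
    to_remove = {line.strip() for line in remove_text.splitlines() if line.strip()}
    return to_index & to_remove
```

An empty result means that no dataset is listed in both files.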

SKIP_EXISTING

Determines whether to skip indexing for any datasets that already exist in the index. This setting only takes effect if INCREMENTAL is true.

true	(Default). Skip re-indexing for any datasets that already exist in the index (only new datasets will be added to the index).
false	Existing datasets will be re-indexed (alongside those newly added).
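
The combined effect of INCREMENTAL and SKIP_EXISTING can be modelled with set operations. The following is a minimal Python sketch of the behaviour described above, not the script's actual implementation: the function name plan_update and its parameters are hypothetical, and raising an error when a dataset appears in both lists is a choice made for this sketch.

```python
from typing import Optional


def plan_update(
    existing: Optional[set[str]],
    listed: set[str],
    removed: set[str],
    incremental: bool = True,
    skip_existing: bool = True,
) -> tuple[set[str], set[str]]:
    """Return (dataset IDs in the index afterwards, dataset IDs indexed in this run).

    existing: IDs already in the index, or None if there is no index yet.
    listed:   IDs in databases.txt.
    removed:  IDs in remove_databases.txt.
    """
    if existing is None or not incremental:
        # Full build: start from nothing and index everything in databases.txt.
        return set(listed), set(listed)
    if listed & removed:
        # The documentation says to avoid this; the sketch refuses to guess.
        raise ValueError(f"listed in both files: {sorted(listed & removed)}")
    final = (existing - removed) | listed
    to_index = listed - existing if skip_existing else set(listed)
    return final, to_index
```

With both settings at their defaults, only datasets that are new to the index are processed, and anything already indexed but not mentioned in either file is kept.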

Step 4 - Check the Index Location in BuildSXV4SearchIndex

There is a setting in the script (DESTINATION_FOLDER) that instructs it where to generate the index files. There is also a setting in SuperADMIN that determines where it will look for the generated index when a user performs a search (you can check this using the command gc search indexDirectory).

By default, these are both set to the meta_search_index directory in the SuperADMIN program data directory (by default, C:\ProgramData\STR\SuperADMIN\server\meta_search_index on Windows). The predefined index that is supplied with SuperSTAR (which covers the sample Retail Banking and People databases) is located in this directory.

Step 5 - Check the Thread Count Setting

By default, the indexing script is configured to use all the available cores/threads on the machine running the indexing process. This will provide maximum possible performance/speed when building the index, but may cause issues if that machine is being used for other purposes (for example if the machine is currently running SuperSERVER to process tabulation queries, as this will delay or block the processing of those requests while the index is being built).

You can change this setting according to your requirements by modifying the value of INDEXING_THREAD_COUNT in the BuildSXV4SearchIndex script. Choose one of the following values:

(Default). The indexing process will use one thread per core across all the available cores on the machine running the indexing process. This will provide the fastest possible index update, but may block other activities on that machine (such as preventing or delaying SuperSERVER from running tabulations).

A negative integer value

Use the available cores minus the absolute number specified. For example, setting INDEXING_THREAD_COUNT=-1 means use all the available cores except for 1 (i.e. 1 core will be left free for other activity).

You should set this to a value whose absolute value is less than the number of available cores. If the absolute value is equal to or greater than the number of available cores, the indexing process will use 1 core.

A positive integer value

Use the specified number of threads for indexing. For example, setting INDEXING_THREAD_COUNT=2 means use exactly 2 threads/cores for the indexing process.
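
The rules above can be summarised as a small calculation. The following is a minimal Python sketch of the described behaviour, not the script's actual implementation: the function name effective_thread_count is hypothetical, None stands in for the default setting, and the sketch rejects 0 because that value is not described above.

```python
from typing import Optional


def effective_thread_count(setting: Optional[int], available_cores: int) -> int:
    """Return the number of indexing threads implied by INDEXING_THREAD_COUNT."""
    if available_cores < 1:
        raise ValueError("available_cores must be at least 1")
    if setting is None:
        # Default: one thread per available core.
        return available_cores
    if setting < 0:
        # Leave abs(setting) cores free, but never drop below 1 thread.
        return max(available_cores + setting, 1)
    if setting > 0:
        # Use exactly the requested number of threads.
        return setting
    raise ValueError("a setting of 0 is not covered by this sketch")
```

For example, on a machine with 8 cores, a setting of -1 gives 7 threads, and a setting of -8 or lower gives 1 thread.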

Step 6 - Run the Indexing Process

Once you have checked all the settings, run BuildSXV4SearchIndex to update the search index and check the results in SuperWEB2.

Tips and Tricks

Deleting Specific Datasets from the Search Index

If you need to remove specific datasets from the index, you can do so using the remove_databases.txt file:

Use a text editor to add the dataset IDs of each dataset you want to remove from the index, one per line, to remove_databases.txt.
Make sure these datasets are not also listed in databases.txt.
Make sure INCREMENTAL and SKIP_EXISTING are both set to true in BuildSXV4SearchIndex.
Run BuildSXV4SearchIndex.
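
The first two steps can be prepared together. The following is a minimal Python sketch and is not part of SuperSTAR: the function name stage_removals is hypothetical. It works on file contents held as strings, so writing the results back to databases.txt and remove_databases.txt is left to you.

```python
def stage_removals(databases_text: str, ids_to_remove: list[str]) -> tuple[str, str]:
    """Return (new databases.txt contents, remove_databases.txt contents)."""
    # De-duplicate while keeping the order in which the IDs were given.
    remove_ids = list(dict.fromkeys(id_.strip() for id_ in ids_to_remove if id_.strip()))
    # Keep only the databases.txt lines whose ID (first field) is not being removed.
    kept_lines = [
        line
        for line in databases_text.splitlines()
        if line.strip() and line.split("|", 1)[0].strip() not in remove_ids
    ]
    new_databases_text = "".join(line + "\n" for line in kept_lines)
    remove_text = "".join(id_ + "\n" for id_ in remove_ids)  # one ID per line
    return new_databases_text, remove_text
```

Dropping the removed IDs from the databases.txt contents ensures that no dataset ends up listed in both files.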

Updating to a New Location to Minimise Index Downtime

By applying incremental updates, you can avoid the extended downtime associated with a full update. However, there may be situations where a full update is desirable. Normally, when you run the script to fully update the index, the first thing it does is remove the existing index, so the index is unavailable to users until it has been rebuilt.

Because it may take some time to rebuild the index (particularly if you have a large number of datasets), you may want to take the following steps to keep the old index available until the new index is created:

Update the DESTINATION_FOLDER setting in the BuildSXV4SearchIndex script to point to a new location.
Run the script.
When the script finishes, update SuperADMIN to use the new location.

This will ensure that there is no downtime of the search index and that users can continue to search (using the old index files) while you are doing the update.

The following steps assume you are going to build the index to a new location to avoid downtime. If you are not concerned about search being unavailable, then you can simply execute the indexing script using the default output location. Once it has completed, log in to SuperWEB2 and check that search now includes all your datasets.

To update the index location and rebuild the index:

Open BuildSXV4SearchIndex in a text editor.
Check the DESTINATION_FOLDER setting. By default, this is set as follows:
```
set DESTINATION_FOLDER="%SA_PROGRAM_DATA%\server\meta_search_index"
```
This indicates that the search index files will be saved to the meta_search_index directory in the SuperADMIN program data directory (by default, C:\ProgramData\STR\SuperADMIN\server\meta_search_index).
Update the location to a new directory. For example:
```
set DESTINATION_FOLDER="%SA_PROGRAM_DATA%\server\meta_search_index_july_2014"
```
You do not need to create this directory; the index script will automatically create it when it runs. Note that on Linux the script does not define the variable %SA_PROGRAM_DATA%, so you will need to specify the full path instead.
Save your changes to the file.
Run BuildSXV4SearchIndex, and wait for it to finish indexing your datasets.
Go to SuperADMIN and use the following command to update the index location to your new location:
```
> gc search indexDirectory value "meta_search_index_july_2014"
```
Go to SuperWEB2 and check that search now includes all your datasets.

Note about Indexing when SuperADMIN and SuperSERVER are Located on Separate Machines

If you have installed SuperADMIN Server and SuperSERVER on separate machines, then the process will be slightly different because you will have to generate the index on the SuperSERVER machine and then copy the completed index files back to the SuperADMIN machine. This is because the indexing scripts are installed as part of SuperADMIN Server, while the SXV4s (which need to be read during the indexing process) will be installed on the SuperSERVER machine.

Follow the process above, with the following differences:

Generate the databases.txt file on the SuperADMIN machine.
Copy the entire MetaDataUtilities directory from the SuperADMIN machine to the SuperSERVER machine.
Follow the steps above to configure paths and other settings in the indexing script. Make sure you set the indexing location in the script to a suitable location somewhere on the SuperSERVER machine.
When the indexing process finishes, copy the completed index directory back to the SuperADMIN machine.
Use the gc command in SuperADMIN to update the index location to the new set of index files you have copied to the SuperADMIN machine.