Update the Search Index for Single Language Datasets
If you are only using single language datasets, then you can update the search index by running the script BuildSXV4SearchIndex, which is located in the SuperADMIN program data directory. For example, on Windows if you install to the default location, it will be located at C:\ProgramData\STR\SuperADMIN\MetaData\MetaDataUtilities\BuildSXV4SearchIndex.bat
Step 1 - Update databases.txt
This is a text file that specifies which SuperSTAR datasets you want to include in the index. It is located in the same directory as the indexing script.
You need to update this file so that it contains a list of all the datasets on your deployment that you want to be indexed.
You should include all your datasets in the index. If you have datasets that not all users have access to, then you should still include these in your index. SuperADMIN will automatically take care of permissions when a user searches in SuperWEB2. The search results will automatically be filtered so that they only contain results from datasets and fields that the user has permission to access.
You can either update databases.txt manually or use the createdatabaselist command in SuperADMIN.
The file must use the following format (if you use the SuperADMIN command it will generate a list of all of your SuperSTAR datasets in this format):
<dataset_id>|<display_name>|<full_path_to_SXV4>
Where:
<dataset_id> | The ID of the dataset in the SuperSTAR catalogue. |
---|---|
<display_name> | The dataset display name from the SuperSTAR catalogue. |
<full_path_to_SXV4> | The full path to the .sxv4 file that contains the dataset but without the .sxv4 file extension. |
For example, the shipped databases.txt file is as follows. This would instruct the batch process to index the sample People and Retail Banking datasets:
people|people|C:\ProgramData\STR\SuperSERVER SA\databases\People
bank|bank|C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking
When you have finished editing databases.txt, save the file.
We recommend saving this file in the same location as the standard shipped file. If you save the file to a different location, you must update the SET DB_FILE_LIST="databases.txt" line in BuildSXV4SearchIndex before running the script so that it contains the full path to the new location of the databases.txt file.
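Before running the indexing script, it can help to confirm that databases.txt is well formed. The following is a minimal sketch for a Linux deployment, not a tool shipped with SuperSTAR: the function name check_database_list is hypothetical, and the choice to ignore blank lines is an assumption about how the indexer treats them.

```shell
#!/bin/sh
# Hypothetical helper, not part of SuperSTAR: checks that every line of a
# databases.txt-style file has the form
#   <dataset_id>|<display_name>|<full_path_to_SXV4>
# and that the referenced .sxv4 file exists on this machine.
# Prints one message per problem and returns non-zero if any line fails.
check_database_list() {
    list_file=$1
    status=0
    line_no=0
    while IFS= read -r line || [ -n "$line" ]; do
        line_no=$((line_no + 1))
        # Drop carriage returns in case the file was edited on Windows.
        line=$(printf '%s' "$line" | tr -d '\r')
        # Assumption: blank lines are ignored rather than reported.
        if [ -z "$line" ]; then
            continue
        fi
        # Exactly two pipe separators give the three expected fields.
        pipes=$(( $(printf '%s' "$line" | tr -cd '|' | wc -c) ))
        if [ "$pipes" -ne 2 ]; then
            echo "line $line_no: expected 3 pipe-separated fields"
            status=1
            continue
        fi
        id=${line%%|*}
        path=${line##*|}
        if [ -z "$id" ] || [ -z "$path" ]; then
            echo "line $line_no: empty dataset ID or path"
            status=1
            continue
        fi
        # The list stores the path without the extension, so add it back here.
        if [ ! -f "$path.sxv4" ]; then
            echo "line $line_no: $path.sxv4 not found"
            status=1
        fi
    done < "$list_file"
    return $status
}
```

For example, after defining the function you might run check_database_list databases.txt from the MetaDataUtilities directory and only start the indexing script if it returns zero.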
Step 2 (Linux Only) - Set the Index and Destination Directories
On Linux, you will need to edit some of the variables in the indexing script before you can run it:
- INDEX_DIR: set this to the full path of the MetaDataUtilities directory where the indexing script is located. For example: export INDEX_DIR=/home/str/superadmin/Metadata/MetaDataUtilities
- DESTINATION_FOLDER: set this to the full path to the location where you want to generate the index files. This will need to match the configuration in SuperADMIN, which is set to use the meta_search_index directory in the SuperADMIN program data directory by default. See the next step for more details on this variable.
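As an illustration, the two variables might look like this once edited. The INDEX_DIR value is the example path given above; the DESTINATION_FOLDER value is an assumption about where the SuperADMIN program data directory sits on a Linux install, so substitute the path used by your own deployment.

```shell
# Directory that contains the indexing script and databases.txt.
export INDEX_DIR=/home/str/superadmin/Metadata/MetaDataUtilities

# Where the index files are written. ASSUMED path: it must match the
# index directory configured in SuperADMIN for your deployment.
export DESTINATION_FOLDER=/home/str/superadmin/server/meta_search_index
```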
Step 3 - (Optional) Configure Incremental Or Full Updates
By default, the search index will be incrementally updated. This minimises the time required to build the index. However, you can configure a full rebuild of the index if required.
If running a full update, check that you have plenty of available disk space. The exact amount required to store the final index will depend on the size of your dataset catalogue, but can be several gigabytes for a large catalogue. In addition, the indexing process will create some temporary files while it builds the index. These will be cleaned up automatically at the end of the process, but may cause the index to temporarily grow to around double its final size before these files are cleaned up.
If you encounter disk space errors during the indexing, you should either increase the size of the disk or use the DESTINATION_FOLDER setting to change the location of the index to a partition that has sufficient disk space.
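Because the index can temporarily grow to around double its final size, one way to avoid a failed run on Linux is to check the free space on the target partition first. The sketch below is a hypothetical pre-flight check (the function name has_free_space is not part of SuperSTAR) that relies only on the standard df and awk utilities.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds if the filesystem that holds
# directory $1 has at least $2 kilobytes available.
has_free_space() {
    dir=$1
    required_kb=$2
    # df -Pk prints a header line and then one POSIX-format data line;
    # column 4 of that line is the available space in 1024-byte blocks.
    available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$available_kb" -ge "$required_kb" ]
}
```

The directory passed in must already exist, so if the index directory has not been created yet, pass its parent instead. For an index you expect to reach about 5 GB (an illustrative figure only), you would allow for double that: has_free_space /home/str/superadmin/server $((2 * 5 * 1024 * 1024)).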
The settings to select an incremental or full update are located in BuildSXV4SearchIndex:
INCREMENTAL | Determines whether the update is incremental or full. If there is no existing search index, then a new index will always be created, regardless of this setting. If performing an incremental update, make sure you do not have datasets listed simultaneously in both databases.txt and remove_databases.txt. |
---|---|
SKIP_EXISTING | Determines whether to skip indexing for any datasets that already exist in the index. This setting only takes effect if |
Step 4 - Check the Index Location in BuildSXV4SearchIndex
There is a setting in the script (DESTINATION_FOLDER) that tells it where to generate the index files. There is also a setting in SuperADMIN that determines where it will look for the generated index when a user performs a search (you can check this using the command gc search indexDirectory).
By default, these are both set to the meta_search_index directory in the SuperADMIN program data directory (by default, C:\ProgramData\STR\SuperADMIN\server\meta_search_index on Windows). The predefined index that is supplied with SuperSTAR (which covers the sample Retail Banking and People databases) is located in this directory.
Step 5 - Check the Thread Count Setting
By default, the indexing script is configured to use all the available cores/threads on the machine running the indexing process. This will provide maximum possible performance/speed when building the index, but may cause issues if that machine is being used for other purposes (for example if the machine is currently running SuperSERVER to process tabulation queries, as this will delay or block the processing of those requests while the index is being built).
You can change this setting according to your requirements by modifying the value of INDEXING_THREAD_COUNT in the BuildSXV4SearchIndex script. Choose one of the following values:
0 | (Default). The indexing process will use one thread per core across all the available cores on the machine running the indexing process. This will provide the fastest possible index update, but may block other activities on that machine (such as preventing or delaying SuperSERVER from running tabulations). |
---|---|
A negative integer value | Use the available cores minus the absolute number specified. For example, setting You should set this to a value that is less than the number of available cores. If you accidentally set this to a value that is equal to or greater than the available cores, then the indexing process will use 1 core. |
A positive integer value | Use the specified number of threads for indexing. For example, setting |
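The rules in the table can be summarised as a small calculation. This is a sketch of the documented behaviour only, not the indexer's actual code; the function name effective_threads is hypothetical, and the behaviour for a positive value larger than the number of cores is not described above, so the sketch simply returns the value as given.

```shell
#!/bin/sh
# Hypothetical illustration of how INDEXING_THREAD_COUNT maps to a thread count.
# Usage: effective_threads SETTING CORES
effective_threads() {
    setting=$1
    cores=$2
    if [ "$setting" -eq 0 ]; then
        # 0 (default): one thread per available core.
        echo "$cores"
    elif [ "$setting" -gt 0 ]; then
        # Positive value: use exactly the number of threads specified.
        echo "$setting"
    else
        # Negative value: available cores minus the absolute value,
        # falling back to 1 if that would leave no cores.
        remaining=$((cores + setting))
        if [ "$remaining" -lt 1 ]; then
            remaining=1
        fi
        echo "$remaining"
    fi
}
```

For example, effective_threads -2 8 prints 6, while effective_threads -10 8 prints 1. On Linux, the number of online cores can usually be obtained with getconf _NPROCESSORS_ONLN.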
Step 6 - Run the Indexing Process
Once you have checked all the settings, run BuildSXV4SearchIndex to update the search index and check the results in SuperWEB2.
Tips and Tricks
Deleting Specific Datasets from the Search Index
If you need to remove specific datasets from the index, you can do so using the remove_databases.txt file:
- Use a text editor to add the dataset IDs of each dataset you want to remove from the index, one per line, to remove_databases.txt.
- Make sure these datasets are not also listed in databases.txt.
- Make sure INCREMENTAL and SKIP_EXISTING are both set to true in BuildSXV4SearchIndex.
- Run BuildSXV4SearchIndex.
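Since a dataset must not appear in both databases.txt and remove_databases.txt, a quick cross-check before running the script can catch mistakes. The following is a minimal sketch using standard awk; the function name find_conflicting_ids is hypothetical and not part of SuperSTAR.

```shell
#!/bin/sh
# Hypothetical helper: prints every dataset ID that appears in both the
# databases.txt-style file ($1) and the remove_databases.txt-style file ($2).
# Returns non-zero if at least one such ID is found.
find_conflicting_ids() {
    db_list=$1
    remove_list=$2
    conflicts=$(awk -F'|' '
        { sub(/\r$/, "") }    # tolerate Windows line endings
        # While reading the first file given (the removal list), remember each ID.
        NR == FNR { if ($0 != "") remove[$0] = 1; next }
        # In the second file, field 1 is the dataset ID.
        ($1 in remove) { print $1 }
    ' "$remove_list" "$db_list")
    if [ -n "$conflicts" ]; then
        printf '%s\n' "$conflicts"
        return 1
    fi
    return 0
}
```

Running find_conflicting_ids databases.txt remove_databases.txt in the MetaDataUtilities directory prints nothing and returns zero when the two lists are disjoint.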
Updating to a New Location to Minimise Index Downtime
By applying incremental updates, you can avoid the extended downtime associated with a full update. However, there may be situations when a full update is desirable. Normally, when you run the script to fully update the index, the first thing it does is remove the existing index, meaning that the index will be unavailable to users until it is rebuilt.
Because it may take some time to rebuild the index (particularly if you have a large number of datasets), you may want to take the following steps to keep the old index available until the new index is created:
- Update the DESTINATION_FOLDER setting in the BuildSXV4SearchIndex script to point to a new location.
- Run the script.
- When the script finishes, update SuperADMIN to use the new location.
This will ensure that there is no downtime of the search index and that users can continue to search (using the old index files) while you are doing the update.
To update the index location and rebuild the index:
- Open BuildSXV4SearchIndex in a text editor.
- Check the DESTINATION_FOLDER setting. By default, this is set as follows:
  set DESTINATION_FOLDER="%SA_PROGRAM_DATA%\server\meta_search_index"
  This indicates that the search index files will be saved to the meta_search_index directory in the SuperADMIN program data directory (by default, C:\ProgramData\STR\SuperADMIN\server\meta_search_index).
- Update the location to a new directory. For example:
  set DESTINATION_FOLDER="%SA_PROGRAM_DATA%\server\meta_search_index_july_2014"
  You do not need to create this directory; the index script will automatically create it when it runs. Note that if you are running the script on Linux, the script does not define the variable %SA_PROGRAM_DATA%, so you will need to specify the full path instead.
- Save your changes to the file.
- Run BuildSXV4SearchIndex, and wait for it to finish indexing your datasets.
- Go to SuperADMIN and use the following command to update the index location to your new location:
  > gc search indexDirectory value "meta_search_index_july_2014"
- Go to SuperWEB2 and check that search now includes all your datasets.
Note about Indexing when SuperADMIN and SuperSERVER are Located on Separate Machines
If you have installed SuperADMIN Server and SuperSERVER on separate machines, then the process will be slightly different because you will have to generate the index on the SuperSERVER machine and then copy the completed index files back to the SuperADMIN machine. This is because the indexing scripts are installed as part of SuperADMIN Server, while the SXV4s (which need to be read during the indexing process) will be installed on the SuperSERVER machine.
Follow the process above, with the following differences:
- Generate the databases.txt file on the SuperADMIN machine.
- Copy the entire MetaDataUtilities directory from the SuperADMIN machine to the SuperSERVER machine.
- Follow the steps above to configure paths and other settings in the indexing script. Make sure you set the indexing location in the script to a suitable location somewhere on the SuperSERVER machine.
- When the indexing process finishes, copy the completed index directory back to the SuperADMIN machine.
- Use the gc command in SuperADMIN to update the index location to the new set of index files you have copied to the SuperADMIN machine.