Update the Search Index for Multilingual Datasets
If you are using the Metadata Server, either because you have set up multilingual datasets or you are using a relational database to store metadata, then you need to use the BuildMetadataSearchIndex script to keep the search index updated whenever there is a change to either the catalogue or the external multilingual/metadata database. If you are not using Metadata Server, follow the process for Single Language Datasets instead.
To update the search index you need to run the script BuildMetadataSearchIndex, which is located in the SuperADMIN program data directory. For example, on Windows if you installed to the default location, it will be located at C:\ProgramData\STR\SuperADMIN\MetaData\MetaDataUtilities\BuildMetadataSearchIndex.bat
Step 1 - Update databases.txt
This is a text file that specifies which SuperSTAR datasets you want to include in the index. It is located in the same directory as the indexing script.
You need to update this file so that it contains a list of all the datasets on your deployment that you want to be indexed.
You should include all your datasets in the index. If you have datasets that not all users have access to, then you should still include these in your index. SuperADMIN will automatically take care of permissions when a user searches in SuperWEB2. The search results will automatically be filtered so that they only contain results from datasets and fields that the user has permission to access.
You can either update databases.txt manually or use the createdatabaselist command in SuperADMIN.
The file must use the following format (if you use the SuperADMIN command it will generate a list of all of your SuperSTAR datasets in this format):
<dataset_id>|<display_name>|<full_path_to_SXV4>
Where:
Field: | Description: |
---|---|
<dataset_id> | The ID of the dataset in the SuperSTAR catalogue. |
<display_name> | The dataset display name from the SuperSTAR catalogue. |
<full_path_to_SXV4> | The full path to the .sxv4 file that contains the dataset, but without the .sxv4 file extension. |
For example, the shipped databases.txt file is as follows. This would instruct the batch process to index the sample People and Retail Banking datasets:
people|people|C:\ProgramData\STR\SuperSERVER SA\databases\People
bank|bank|C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking
When you have finished editing databases.txt, save the file.
It is recommended that you save this file in the same location as the standard shipped file. If you save the file to a different location, you must update the following line in BuildMetadataSearchIndex before running the script so that it contains the full path to the new location of the databases.txt file:
SET DB_FILE_LIST="databases.txt"
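Before running the indexing script, it can be useful to check that every line of databases.txt follows the three-field format described above. The following is a minimal sketch, not part of SuperSTAR: the function names parse_database_line and check_database_list are hypothetical, and it assumes that display names do not themselves contain the | character.

```python
def parse_database_line(line):
    """Split one databases.txt line into (dataset_id, display_name, path).

    Assumption: fields never contain the '|' separator themselves.
    """
    parts = [part.strip() for part in line.strip().split("|")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            "expected <dataset_id>|<display_name>|<full_path_to_SXV4>: %r" % line
        )
    dataset_id, display_name, sxv4_path = parts
    # The path must be given without the .sxv4 file extension.
    if sxv4_path.lower().endswith(".sxv4"):
        raise ValueError("path must not include the .sxv4 extension: %r" % line)
    return dataset_id, display_name, sxv4_path


def check_database_list(text):
    """Parse every non-blank line of a databases.txt file's contents."""
    return [parse_database_line(line) for line in text.splitlines() if line.strip()]
```

To use it, read the file contents (for example with open("databases.txt").read()) and pass the text to check_database_list; a ValueError identifies the first line that does not match the format.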
Step 2 - Configure BuildMetadataSearchIndex
Before you can run the indexing script, you need to configure some settings so that it can connect to your external metadata database.
Open BuildMetadataSearchIndex in a text editor. Modify the following lines:
Setting: | Make This Change: |
---|---|
INDEX_DIR | (Linux Only): Set this to the full path of the MetaDataUtilities directory where the indexing script is located. For example: export INDEX_DIR=/home/str/superadmin/Metadata/MetaDataUtilities |
DESTINATION_FOLDER | (Linux Only): Set this to the full path of the location where you want to generate the index files. This must match the configuration in SuperADMIN, which by default uses the meta_search_index directory in the SuperADMIN program data directory. See Step 4 for more details on this setting. |
DB_DRIVER_CLASS | Add the details of the JDBC database driver to use to connect to your external database. For example, to use the Microsoft SQL Server JDBC driver, set the driver class as follows: SET DB_DRIVER_CLASS="com.microsoft.sqlserver.jdbc.SQLServerDriver" |
DB_DRIVER_LOCATION | Add the full path to the location of the JDBC driver (jar file) on your system. For example: SET DB_DRIVER_LOCATION="C:\Drivers\mssql-jdbc-7.2.2.jre11.jar" |
DB_URL | Add the connection string the script will use to connect to your metadata database. For example, for SQL Server the connection string is similar to the following: SET DB_URL="jdbc:sqlserver://MYSERVER;databaseName=Metadata;user=mydbuser;password=myuserpassword;" Replace the server and user details with the appropriate values for your system. In this example, MYSERVER is the server name, Metadata is the database name, and mydbuser and myuserpassword are the database user name and password. |
REPOSITORY | Add the repository ID. This must be the same value that you used when you created the external metadata database. For example: SET REPOSITORY=metadatadbid |
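As a quick pre-flight check, the settings in the table above can be read back from the script and compared against the list of values that must be filled in. The sketch below is illustrative only: it assumes the Windows form of the script, where each setting appears on its own line as SET NAME=value (optionally quoted) as in the examples above, and the helper names read_settings and missing_settings are hypothetical.

```python
import re

# Settings from the table above that apply on every platform.
REQUIRED = ("DB_DRIVER_CLASS", "DB_DRIVER_LOCATION", "DB_URL", "REPOSITORY")

# Matches lines such as: SET DB_URL="jdbc:sqlserver://MYSERVER;..."
SET_LINE = re.compile(r"^\s*set\s+(\w+)=(.*)$", re.IGNORECASE)


def read_settings(script_text):
    """Collect NAME -> value pairs from SET lines, dropping surrounding quotes."""
    settings = {}
    for line in script_text.splitlines():
        match = SET_LINE.match(line)
        if match:
            settings[match.group(1).upper()] = match.group(2).strip().strip('"')
    return settings


def missing_settings(settings):
    """Return the required setting names that are absent or empty."""
    return [name for name in REQUIRED if not settings.get(name)]
```

An empty list from missing_settings means all four database settings have a value; it does not verify that the driver or connection string actually works.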
Step 3 - (Optional) Configure Incremental or Full Updates
By default, the search index will be incrementally updated. This minimises the time required to build the index. However, you can configure a full rebuild of the index if required.
If running a full update, check that you have plenty of available disk space. The exact amount required to store the final index will depend on the size of your dataset catalogue, but can be several gigabytes for a large catalogue. In addition, the indexing process will create some temporary files while it builds the index. These will be cleaned up automatically at the end of the process, but may cause the index to temporarily grow to around double its final size before these files are cleaned up.
If you encounter disk space errors during the indexing, you should either increase the size of the disk or use the DESTINATION_FOLDER setting to change the location of the index to a partition that has sufficient disk space.
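Because a full rebuild may temporarily need around double the final index size, you may want to check free space on the target partition before starting. The following is a rough sketch using Python's standard shutil.disk_usage; the function name has_room_for_full_rebuild and the expected_index_bytes estimate are hypothetical, and you would need to supply your own estimate of the final index size.

```python
import shutil


def has_room_for_full_rebuild(destination, expected_index_bytes, headroom_factor=2.0):
    """Return True if the partition holding `destination` has enough free space.

    `destination` must be an existing directory on the target partition.
    `expected_index_bytes` is your own estimate of the final index size.
    `headroom_factor` allows for temporary files (about double the final size).
    """
    free_bytes = shutil.disk_usage(destination).free
    return free_bytes >= expected_index_bytes * headroom_factor
```

For example, has_room_for_full_rebuild("D:\\indexes", 4 * 1024**3) checks for roughly 8 GB of free space on the partition that contains D:\indexes.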
The settings to select an incremental or full update are located in BuildMetadataSearchIndex:
Setting: | Description: |
---|---|
INCREMENTAL | Determines whether the update is incremental or full. If there is no existing search index, a new index will be created even if this setting is true. If performing an incremental update, make sure you do not have datasets listed simultaneously in both databases.txt and remove_databases.txt. |
SKIP_EXISTING | Determines whether to skip indexing for any datasets that already exist in the index. This setting only takes effect in … |
Step 4 - Check the Index Location in BuildMetadataSearchIndex
There is a setting in the indexing script (DESTINATION_FOLDER) that instructs it where to generate the index files. There is also a setting in SuperADMIN that determines where it will look for the generated index when a user performs a search (you can check this using the command gc search indexDirectory).
By default, these are both set to the meta_search_index directory in the SuperADMIN program data directory (by default, C:\ProgramData\STR\SuperADMIN\server\meta_search_index on Windows). The predefined index that is supplied with SuperSTAR (which covers the sample Retail Banking and People datasets) is located in this directory.
Step 5 - Check the Thread Count Setting
By default, the indexing script is configured to use all the available cores/threads on the machine running the indexing process. This will provide maximum possible performance/speed when building the index, but may cause issues if that machine is being used for other purposes (for example if the machine is currently running SuperSERVER to process tabulation queries, as this will delay or block the processing of those requests while the index is being built).
You can change this setting according to your requirements by modifying the value of INDEXING_THREAD_COUNT in the BuildMetadataSearchIndex script. Choose one of the following values:
Value: | Behaviour: |
---|---|
0 | (Default). The indexing process will use one thread per core across all the available cores on the machine running the indexing process. This will provide the fastest possible index update, but may block other activities on that machine (such as preventing or delaying SuperSERVER from running tabulations). |
A negative integer value | Use the available cores minus the absolute number specified. For example, on a machine with 8 cores, setting INDEXING_THREAD_COUNT=-2 means use 6 cores for the indexing process. The absolute number should be less than the number of available cores. If you accidentally set an absolute number that is equal to or greater than the available cores, then the indexing process will use 1 core. |
A positive integer value | Use the specified number of threads for indexing. For example, setting INDEXING_THREAD_COUNT=2 means use exactly 2 threads/cores for the indexing process. |
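The three cases in the table can be summarised as a small function. This is only an illustration of the rules as described above, not the indexing script's actual implementation; the name resolve_thread_count is hypothetical.

```python
import os
from typing import Optional


def resolve_thread_count(setting: int, available_cores: Optional[int] = None) -> int:
    """Return the number of threads implied by an INDEXING_THREAD_COUNT value."""
    cores = available_cores if available_cores is not None else (os.cpu_count() or 1)
    if setting == 0:
        # Default: one thread per available core.
        return cores
    if setting < 0:
        # Available cores minus the absolute value, but never fewer than 1.
        return max(cores + setting, 1)
    # Positive value: use exactly that many threads.
    return setting
```

For instance, on a machine with 8 cores, resolve_thread_count(-2, 8) gives 6, while resolve_thread_count(-8, 8) and resolve_thread_count(-10, 8) both fall back to 1.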
Step 6 - Run the Indexing Process
Once you have checked all the settings, run BuildMetadataSearchIndex to update the search index and check the results in SuperWEB2.
Tips and Tricks
Deleting Specific Datasets from the Search Index
If you need to remove specific datasets from the index, you can do so using the remove_databases.txt file:
- Use a text editor to add the dataset ID of each dataset you want to remove from the index, one per line, to remove_databases.txt.
- Make sure these datasets are not also listed in databases.txt.
- Make sure INCREMENTAL and SKIP_EXISTING are both set to true in BuildMetadataSearchIndex.
- Run BuildMetadataSearchIndex.
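Since a dataset must not appear in both databases.txt and remove_databases.txt, a short check for overlapping IDs can catch mistakes before you run the script. The sketch below assumes the file formats described on this page (pipe-separated lines in databases.txt, one dataset ID per line in remove_databases.txt) and compares IDs exactly as written; the function name conflicting_ids is hypothetical.

```python
def conflicting_ids(databases_text, remove_text):
    """Return the sorted dataset IDs that appear in both files' contents."""
    indexed = {
        line.split("|", 1)[0].strip()
        for line in databases_text.splitlines()
        if line.strip()
    }
    removed = {line.strip() for line in remove_text.splitlines() if line.strip()}
    return sorted(indexed & removed)
```

An empty list means no dataset is listed in both files. Whether SuperSTAR treats dataset IDs as case-sensitive is not covered here, so this comparison is deliberately literal.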
Updating to a New Location to Minimise Index Downtime
By applying incremental updates, you can avoid the extended downtime associated with a full update. However, there may be situations when a full update is desirable. Normally, when you run the script to fully update the index, the first thing it does is remove the existing index, meaning that the index will be unavailable to users until it is rebuilt.
As it may take some time to rebuild the index (particularly if you have a large number of datasets), you may want to take the following steps to maintain the availability of the old index until the new index is created:
- Update the DESTINATION_FOLDER setting in the BuildMetadataSearchIndex script to point to a new location.
- Run the script.
- When the script finishes, update SuperADMIN to use the new location.
This will ensure that there is no downtime of the search index and that users can continue to search (using the old index files) while you are doing the update.
To update the index location and rebuild the index:
- Open BuildMetadataSearchIndex in a text editor.
- Check the DESTINATION_FOLDER setting. By default, this is set as follows:
set DESTINATION_FOLDER="%SA_PROGRAM_DATA%\server\meta_search_index"
This indicates that the search index files will be saved to the meta_search_index directory in the SuperADMIN program data directory (by default, C:\ProgramData\STR\SuperADMIN\server\meta_search_index).
- Update the location to a new directory. For example:
set DESTINATION_FOLDER="%SA_PROGRAM_DATA%\server\meta_search_index_multilingual_july_2014"
You do not need to create this directory; the index script will automatically create it when it runs. Note that if you are running the script on Linux, the script does not define the variable %SA_PROGRAM_DATA%, so you will need to specify the full path instead.
- Save your changes to the file.
- Run BuildMetadataSearchIndex, and wait for it to finish indexing your datasets.
- Go to SuperADMIN and use the following command to update the index location to your new location:
> gc search indexDirectory value "meta_search_index_multilingual_july_2014"
- Go to SuperWEB2 and check that search now includes all your datasets and languages.
Note about Indexing when SuperADMIN and SuperSERVER are Located on Separate Machines
If you have installed SuperADMIN Server and SuperSERVER on separate machines, then the process will be slightly different because you will have to generate the index on the SuperSERVER machine and then copy the completed index files back to the SuperADMIN machine. This is because the indexing scripts are installed as part of SuperADMIN Server, while the SXV4s (which need to be read during the indexing process) will be installed on the SuperSERVER machine.
Follow the process above, with the following differences:
- Generate the databases.txt file on the SuperADMIN machine.
- Copy the entire MetaDataUtilities directory from the SuperADMIN machine to the SuperSERVER machine.
- Follow the steps above to configure paths and other settings in the indexing script. Make sure you set the indexing location in the script to a suitable location somewhere on the SuperSERVER machine.
- When the indexing process finishes, copy the completed index directory back to the SuperADMIN machine.
- Use the gc command in SuperADMIN to update the index location to the new set of index files you have copied to the SuperADMIN machine.