Update the Search Index for Multilingual Datasets

If you are using the Metadata Server, either because you have set up multilingual datasets or because you store metadata in a relational database, you need to run the BuildMetadataSearchIndex script to update the search index whenever either the catalogue or the external multilingual/metadata database changes. If you are not using the Metadata Server, follow the process for Single Language Datasets instead.

When multilingual datasets and search are both configured, search queries look through the names of fields and values in the currently selected dataset language.

To update the search index you need to run the script BuildMetadataSearchIndex, which is located in the SuperADMIN program data directory. For example, on Windows if you installed to the default location, it will be located at C:\ProgramData\STR\SuperADMIN\MetaData\MetaDataUtilities\BuildMetadataSearchIndex.bat

Step 1 - Update databases.txt

This is a text file that specifies which SuperSTAR datasets you want to include in the index. It is located in the same directory as the indexing script.

You need to update this file so that it contains a list of all the datasets on your deployment that you want to be indexed.

You should include all your datasets in the index. If you have datasets that not all users have access to, then you should still include these in your index. SuperADMIN will automatically take care of permissions when a user searches in SuperWEB2. The search results will automatically be filtered so that they only contain results from datasets and fields that the user has permission to access.

You can either update databases.txt manually or use the createdatabaselist command in SuperADMIN.

The file must use the following format (if you use the SuperADMIN command it will generate a list of all of your SuperSTAR datasets in this format):

<dataset_id>|<display_name>|<full_path_to_SXV4>

Where:

<dataset_id>	The ID of the dataset in the SuperSTAR catalogue.
<display_name>	The dataset display name from the SuperSTAR catalogue.
<full_path_to_SXV4>	The full path to the .sxv4 file that contains the dataset but without the .sxv4 file extension.

For example, the shipped databases.txt file is as follows. This would instruct the batch process to index the sample People and Retail Banking datasets:

people|people|C:\ProgramData\STR\SuperSERVER SA\databases\People
bank|bank|C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking
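
Before running the indexing script, it can help to check that every line in databases.txt follows the three-field format described above. The following is a minimal Python sketch of such a check, not part of SuperSTAR. The names `DatasetEntry` and `parse_database_list` are hypothetical, and the sketch assumes that blank lines are ignored and that display names never contain the `|` character.

```python
from typing import List, NamedTuple


class DatasetEntry(NamedTuple):
    dataset_id: str
    display_name: str
    sxv4_path: str  # full path, without the .sxv4 extension


def parse_database_list(text: str) -> List[DatasetEntry]:
    """Parse databases.txt content; raise ValueError on a malformed line."""
    entries = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        parts = line.split("|")
        if len(parts) != 3 or not all(p.strip() for p in parts):
            raise ValueError(
                f"line {number}: expected <dataset_id>|<display_name>|<full_path_to_SXV4>"
            )
        dataset_id, display_name, path = (p.strip() for p in parts)
        if path.lower().endswith(".sxv4"):
            raise ValueError(f"line {number}: omit the .sxv4 extension from the path")
        entries.append(DatasetEntry(dataset_id, display_name, path))
    return entries


# The shipped example file, with backslashes escaped for Python.
SHIPPED = (
    "people|people|C:\\ProgramData\\STR\\SuperSERVER SA\\databases\\People\n"
    "bank|bank|C:\\ProgramData\\STR\\SuperSERVER SA\\databases\\RetailBanking\n"
)
```

Calling `parse_database_list(SHIPPED)` returns two entries, one for each sample dataset. A line with a missing field, or a path that still ends in .sxv4, raises a `ValueError` that names the offending line number.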

When you have finished editing databases.txt, save the file.

It is recommended that you save this file in the same location as the standard shipped file. If you want to save the file to a different location, you must update the SET DB_FILE_LIST="databases.txt" line in BuildMetadataSearchIndex before running the script so that it contains the full path to the new location of databases.txt.

Step 2 - Configure BuildMetadataSearchIndex

Before you can run the indexing script, you need to configure some settings so that it can connect to your external metadata database.

You can use the same JDBC driver that you used to create the external metadata database (when you ran BuildMetadataTemplate.bat during the Metadata Server set up process).

Open BuildMetadataSearchIndex in a text editor. Modify the following lines:

Setting:	Make This Change:
INDEX_DIR	(Linux Only): Set this to the full path of the MetaDataUtilities directory where the indexing script is located. For example: `export INDEX_DIR=/home/str/superadmin/Metadata/MetaDataUtilities`
DESTINATION_FOLDER	(Linux Only): Set this to the full path of the location where you want to generate the index files. This must match the configuration in SuperADMIN, which by default uses the meta_search_index directory in the SuperADMIN program data directory. See Step 4 for more details on this setting.
DB_DRIVER_CLASS	Add the details of the JDBC database driver to use to connect to your external database. For example, to use the Microsoft SQL Server JDBC driver, set the driver class as follows:
	SET DB_DRIVER_CLASS="com.microsoft.sqlserver.jdbc.SQLServerDriver"
DB_DRIVER_LOCATION	Add the full path to the location of the JDBC driver (jar file) on your system. For example:
	SET DB_DRIVER_LOCATION="C:\Drivers\mssql-jdbc-7.2.2.jre11.jar"
DB_URL	Add the connection string the script will use to connect to your metadata database. For example, for SQL Server the connection string is similar to the following:
	SET DB_URL="jdbc:sqlserver://MYSERVER;databaseName=Metadata;user=mydbuser;password=myuserpassword;"
	Replace the server and user details with the appropriate values for your system. In this example: MYSERVER is the server name. Metadata is the name of the metadata database you created in the previous step. mydbuser is the user account to connect to the database with. myuserpassword is that user's password.
REPOSITORY	Add the repository ID. This must be the same value that you used when you created the external metadata database. For example:
	SET REPOSITORY=metadatadbid
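
The DB_URL value is easy to mistype because it packs four values into one string. As an illustration only, the following Python sketch assembles a URL with the same shape as the SQL Server example above. The function names are hypothetical and the sketch does not attempt any escaping: it rejects values that contain `;`, `{` or `}` instead.

```python
def sqlserver_jdbc_url(server: str, database: str, user: str, password: str) -> str:
    """Assemble a URL in the same shape as the DB_URL example above."""
    for name, value in (("server", server), ("database", database),
                        ("user", user), ("password", password)):
        # Values containing ';', '{' or '}' would need escaping,
        # which this sketch does not attempt.
        if not value or any(ch in value for ch in ";{}"):
            raise ValueError(f"unsupported {name} value")
    return (f"jdbc:sqlserver://{server};databaseName={database};"
            f"user={user};password={password};")


def db_url_line(url: str) -> str:
    """Format the URL as a Windows SET line like the one shown in the table."""
    return f'SET DB_URL="{url}"'
```

With the example values from the table (MYSERVER, Metadata, mydbuser, myuserpassword), `db_url_line(sqlserver_jdbc_url(...))` reproduces the SET DB_URL line shown above.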

Step 3 - (Optional) Configure Incremental or Full Updates

By default, the search index will be incrementally updated. This minimises the time required to build the index. However, you can configure a full rebuild of the index if required.

If running a full update, check that you have plenty of available disk space. The exact amount required to store the final index will depend on the size of your dataset catalogue, but can be several gigabytes for a large catalogue. In addition, the indexing process will create some temporary files while it builds the index. These will be cleaned up automatically at the end of the process, but may cause the index to temporarily grow to around double its final size before these files are cleaned up.

If you encounter disk space errors during the indexing, you should either increase the size of the disk or use the DESTINATION_FOLDER setting to change the location of the index to a partition that has sufficient disk space.
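
A rough pre-flight check can catch a shortage of disk space before a long rebuild starts. The Python sketch below is not part of the indexing script. It assumes you can estimate the final index size yourself, and it uses a peak factor of 2.0 based on the "around double" guidance above. The function names are hypothetical.

```python
import shutil


def has_room_for_full_rebuild(free_bytes: int, expected_index_bytes: int,
                              peak_factor: float = 2.0) -> bool:
    """True if free space covers the assumed peak size of the index during a rebuild."""
    return free_bytes >= expected_index_bytes * peak_factor


def free_bytes_at(path: str) -> int:
    """Free space on the partition that holds `path` (the path must already exist)."""
    return shutil.disk_usage(path).free
```

For example, `has_room_for_full_rebuild(free_bytes_at("."), 3 * 1024 ** 3)` checks whether the current partition has room for an index expected to reach 3 GiB. Replace `"."` with an existing directory on the partition that will hold DESTINATION_FOLDER.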

The settings to select an incremental or full update are located in BuildMetadataSearchIndex:

INCREMENTAL

Determines whether the update is incremental or full.

true

(Default). Run an incremental update:

Datasets listed in databases.txt are added to the index if not already present.
Datasets listed in remove_databases.txt are removed from the existing search index if present.
Any previously indexed datasets that are not listed in either file are retained in the index.

false

Run a full update. The index is fully wiped and all datasets referenced in databases.txt are re-indexed from scratch.

If there is no existing search index, a new index is created regardless of this setting.

If performing an incremental update, make sure you do not have datasets listed simultaneously in both databases.txt and remove_databases.txt.
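
One way to guard against this is to compare the two files before running the script. The following Python sketch is an illustration only. It assumes that databases.txt uses the pipe-delimited format from Step 1, that remove_databases.txt holds one dataset ID per line (as described under Tips and Tricks), and that IDs are compared exactly, including case. The function names are hypothetical.

```python
from typing import List


def ids_in_databases_txt(text: str) -> List[str]:
    """Dataset IDs from databases.txt lines (<dataset_id>|<display_name>|<path>)."""
    return [line.split("|", 1)[0].strip() for line in text.splitlines() if line.strip()]


def ids_in_remove_txt(text: str) -> List[str]:
    """Dataset IDs from remove_databases.txt, one ID per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def conflicting_ids(databases_text: str, remove_text: str) -> List[str]:
    """IDs that appear in both files, sorted for stable output."""
    listed = set(ids_in_databases_txt(databases_text))
    removed = set(ids_in_remove_txt(remove_text))
    return sorted(listed & removed)
```

An empty result means the two files do not overlap. Any ID returned should be deleted from one of the two files before you run an incremental update.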

SKIP_EXISTING

Determines whether to skip indexing for any datasets that already exist in the index. This setting only takes effect if INCREMENTAL is true.

true	(Default). Skip re-indexing for any datasets that already exist in the index (only new datasets will be added to the index).
false	Existing datasets will be re-indexed (alongside those newly added).
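
The interaction between INCREMENTAL and SKIP_EXISTING can be summarised as a small model. The Python sketch below restates the behaviour described in this step; it is not the script's actual implementation. The function `plan_update` and its result keys are hypothetical, and the sketch assumes that a full update ignores remove_databases.txt because the index is wiped anyway.

```python
from typing import Dict, Iterable, Set


def plan_update(existing: Iterable[str], listed: Iterable[str], to_remove: Iterable[str],
                incremental: bool = True, skip_existing: bool = True) -> Dict[str, Set[str]]:
    """Model which dataset IDs are indexed, dropped, or retained by one run.

    existing:  IDs already in the search index
    listed:    IDs in databases.txt
    to_remove: IDs in remove_databases.txt
    """
    existing, listed, to_remove = set(existing), set(listed), set(to_remove)
    if not incremental:
        # Full update: the index is wiped and only databases.txt is re-indexed.
        return {"index": listed, "drop": existing - listed, "retain": set()}
    if listed & to_remove:
        raise ValueError("dataset listed in both databases.txt and remove_databases.txt")
    # SKIP_EXISTING only matters for datasets that are already in the index.
    index = listed - existing if skip_existing else listed
    drop = to_remove & existing
    retain = existing - drop - index
    return {"index": index, "drop": drop, "retain": retain}
```

For example, with people and bank already indexed, bank and census listed in databases.txt, and people listed in remove_databases.txt, the default settings index only census, drop people, and retain bank untouched.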

Step 4 - Check the Index Location in BuildMetadataSearchIndex

There is a setting in the indexing script (DESTINATION_FOLDER) that instructs it where to generate the index files. There is also a setting in SuperADMIN that determines where it will look for the generated index when a user performs a search (you can check this using the command gc search indexDirectory).

By default, these are both set to the meta_search_index directory in the SuperADMIN program data directory (by default, C:\ProgramData\STR\SuperADMIN\server\meta_search_index on Windows). The predefined index that is supplied with SuperSTAR (which covers the sample Retail Banking and People datasets) is located in this directory.

Step 5 - Check the Thread Count Setting

By default, the indexing script is configured to use all the available cores/threads on the machine running the indexing process. This will provide maximum possible performance/speed when building the index, but may cause issues if that machine is being used for other purposes (for example if the machine is currently running SuperSERVER to process tabulation queries, as this will delay or block the processing of those requests while the index is being built).

You can change this setting according to your requirements by modifying the value of INDEXING_THREAD_COUNT in the BuildMetadataSearchIndex script. Choose one of the following values:

(Default). The indexing process will use one thread per core across all the available cores on the machine running the indexing process. This will provide the fastest possible index update, but may block other activities on that machine (such as preventing or delaying SuperSERVER from running tabulations).

A negative integer value

Use the available cores minus the absolute number specified. For example, setting INDEXING_THREAD_COUNT=-1 means use all the available cores except for 1 (i.e. 1 core will be left free for other activity).

The absolute value you specify should be less than the number of available cores. If you accidentally specify an absolute value that is equal to or greater than the number of available cores, the indexing process will use 1 core.

A positive integer value

Use the specified number of threads for indexing. For example, setting INDEXING_THREAD_COUNT=2 means use exactly 2 threads/cores for the indexing process.
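
The rules above can be expressed as a short function. This Python sketch models the documented behaviour only and is not taken from the indexing script. The name `resolve_thread_count` is hypothetical, and `None` stands in for the default setting. The sketch deliberately rejects 0 because the rules above do not describe that value.

```python
from typing import Optional


def resolve_thread_count(setting: Optional[int], available_cores: int) -> int:
    """Number of threads the documented rules imply for a given setting."""
    if available_cores < 1:
        raise ValueError("available_cores must be at least 1")
    if setting is None:
        return available_cores  # default: one thread per available core
    if setting < 0:
        remaining = available_cores + setting  # cores minus the absolute value
        return remaining if remaining >= 1 else 1  # falls back to 1 core
    if setting > 0:
        return setting  # use exactly this many threads
    raise ValueError("0 is not described above, so this sketch does not guess its meaning")
```

On an 8-core machine, a setting of -1 resolves to 7 threads, -8 or -12 resolves to 1 thread, and 2 resolves to 2 threads. In practice you could pass `os.cpu_count()` as `available_cores`, although the value the script itself detects may differ.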

Step 6 - Run the Indexing Process

Once you have checked all the settings, run BuildMetadataSearchIndex to update the search index and check the results in SuperWEB2.

Tips and Tricks

Deleting Specific Datasets from the Search Index

If you need to remove specific datasets from the index, you can do so using the remove_databases.txt file:

Use a text editor to add the dataset IDs of each dataset you want to remove from the index, one per line, to remove_databases.txt.
Make sure these datasets are not also listed in databases.txt.
Make sure INCREMENTAL and SKIP_EXISTING are both set to true in BuildMetadataSearchIndex.
Run BuildMetadataSearchIndex.

Updating to a New Location to Minimise Index Downtime

Applying incremental updates avoids the extended downtime associated with a full update. However, there may be situations when a full update is desirable. Normally, when you run the script to fully update the index, the first thing it does is remove the existing index, which means that the index is unavailable to users until it is rebuilt.

Because it may take some time to rebuild the index (particularly if you have a large number of datasets), you may want to take the following steps to keep the old index available until the new index is created:

Update the DESTINATION_FOLDER setting in the BuildMetadataSearchIndex script to point to a new location.
Run the script.
When the script finishes, update SuperADMIN to use the new location.

This will ensure that there is no downtime of the search index and that users can continue to search (using the old index files) while you are doing the update.

The following steps assume you are going to build the index to a new location to avoid downtime. If you are not concerned about search being unavailable, then you can simply execute the indexing script. Once it has completed, log in to SuperWEB2 and check that search now includes all your datasets.

To update the index location and rebuild the index:

Open BuildMetadataSearchIndex in a text editor.
Check the DESTINATION_FOLDER setting. By default, this is set as follows:
```
set DESTINATION_FOLDER="%SA_PROGRAM_DATA%\server\meta_search_index"
```
This indicates that the search index files will be saved to the meta_search_index directory in the SuperADMIN program data directory (by default, C:\ProgramData\STR\SuperADMIN\server\meta_search_index).
Update the location to a new directory. For example:
```
set DESTINATION_FOLDER="%SA_PROGRAM_DATA%\server\meta_search_index_multilingual_july_2014"
```
You do not need to create this directory; the index script will automatically create it when it runs. Please note that if you are running the script on Linux then the script does not define the variable %SA_PROGRAM_DATA% so you will need to specify the full path instead.
Save your changes to the file.
Run BuildMetadataSearchIndex, and wait for it to finish indexing your datasets.
Go to SuperADMIN and use the following command to update the index location to your new location:
```
> gc search indexDirectory value "meta_search_index_multilingual_july_2014"
```
Go to SuperWEB2 and check that search now includes all your datasets and languages.
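
If you rebuild to a new location regularly, the directory name has to match in two places: the DESTINATION_FOLDER line in the script and the gc command in SuperADMIN. The following Python sketch generates both from a single name so that they cannot drift apart. It is an illustration only. The function names and the month-and-year naming scheme are assumptions modelled on the example above, and the generated DESTINATION_FOLDER line uses the Windows form (on Linux you would specify a full path instead).

```python
import datetime

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def dated_index_name(prefix: str, day: datetime.date) -> str:
    """Directory name such as meta_search_index_multilingual_july_2014."""
    return f"{prefix}_{MONTHS[day.month - 1]}_{day.year}"


def destination_folder_line(name: str) -> str:
    """The line to place in BuildMetadataSearchIndex (Windows form)."""
    return f'set DESTINATION_FOLDER="%SA_PROGRAM_DATA%\\server\\{name}"'


def gc_command(name: str) -> str:
    """The SuperADMIN command to run once indexing has finished."""
    return f'gc search indexDirectory value "{name}"'
```

For example, `dated_index_name("meta_search_index_multilingual", datetime.date(2014, 7, 1))` produces the directory name used in the steps above, and passing that name to the other two functions gives the matching script line and SuperADMIN command.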

Note about Indexing when SuperADMIN and SuperSERVER are Located on Separate Machines

If you have installed SuperADMIN Server and SuperSERVER on separate machines, then the process will be slightly different because you will have to generate the index on the SuperSERVER machine and then copy the completed index files back to the SuperADMIN machine. This is because the indexing scripts are installed as part of SuperADMIN Server, while the SXV4s (which need to be read during the indexing process) will be installed on the SuperSERVER machine.

Follow the process above, with the following differences:

Generate the databases.txt file on the SuperADMIN machine.
Copy the entire MetaDataUtilities directory from the SuperADMIN machine to the SuperSERVER machine.
Follow the steps above to configure paths and other settings in the indexing script. Make sure you set the indexing location in the script to a suitable location somewhere on the SuperSERVER machine.
When the indexing process finishes, copy the completed index directory back to the SuperADMIN machine.
Use the gc command in SuperADMIN to update the index location to the new set of index files you have copied to the SuperADMIN machine.