Crawl config.xml

The file crawl_config.xml in the \teamweaverIS-backend\WEB-INF\conf directory of your TeamWeaverIS backend allows you to group the repositories to be crawled (cf. repo_config.xml) into "crawls", which can be triggered by the TeamWeaverIS backend.

At least one crawl must be defined before any data source can be crawled. Defining multiple crawls makes sense if you want to treat groups of repositories differently, e.g. one group of repositories that is crawled only once a week while another is crawled every night.

Example crawl_config.xml

<?xml version="1.0" encoding="UTF-8"?>
<crawl_config>
	<crawlInfo>
		<crawlId>0</crawlId>
		<crawlName>nightly crawl</crawlName>
		<updateFrequency>3600000</updateFrequency>
		<repositoryIds>
			<repositoryId>1</repositoryId>
			<repositoryId>2</repositoryId>
			<repositoryId>3</repositoryId>
		</repositoryIds>
	</crawlInfo>
</crawl_config>
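
To illustrate the grouping described above, the following is a minimal sketch of a crawl_config.xml that defines two crawls. The crawl names, the second crawl id and the assignment of repository ids are hypothetical; the repository ids have to match ids defined in your own repo_config.xml. The <updateFrequency> element is left out on the assumption that "OPTIONAL" means it may be omitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<crawl_config>
	<!-- Hypothetical crawl 0: repositories that should be crawled every night -->
	<crawlInfo>
		<crawlId>0</crawlId>
		<crawlName>nightly crawl</crawlName>
		<repositoryIds>
			<repositoryId>1</repositoryId>
			<repositoryId>2</repositoryId>
		</repositoryIds>
	</crawlInfo>
	<!-- Hypothetical crawl 1: repositories that should be crawled once a week -->
	<crawlInfo>
		<crawlId>1</crawlId>
		<crawlName>weekly crawl</crawlName>
		<repositoryIds>
			<repositoryId>3</repositoryId>
		</repositoryIds>
	</crawlInfo>
</crawl_config>
```

With such a file, "crawl.bat 0" would start the nightly group and "crawl.bat 1" the weekly group. Since <updateFrequency> is currently not used, the schedule itself presumably has to come from outside, for example from a scheduled task that calls crawl.bat; this page does not confirm that.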

Documentation of parameters

  • crawl_config.xml contains one <crawlInfo> entry for each crawl that can be started from the command line (e.g. "crawl.bat 0" starts the crawl with id 0)
  • The parameters inside the <crawlInfo> element are as follows:
    • <crawlId>0</crawlId> - unique numeric id of the crawl, used to reference the crawl on the command line
    • <crawlName>nightly crawl</crawlName> - human-readable description for logging/documentation purposes
    • <updateFrequency>3600000</updateFrequency> - OPTIONAL; this parameter is currently not used
    • <repositoryIds><repositoryId>1</repositoryId><repositoryId>2</repositoryId><repositoryId>3</repositoryId></repositoryIds> - insert one <repositoryId> element for each repository to be included in that crawl (reference each repository by the id defined in repo_config.xml)
This page was last modified on 17 June 2009, at 09:23.