Enterprise Search v1.0
(c) 2004, Innerprise.
http://www.innerprise.net/

============
INSTALLATION
============

Full Install: See included readme.txt

Update: Simply copy the new spider.exe over your old spider.exe in your Enterprise Search directory (C:\Program Files\Innerprise\Enterprise Search\ by default).

===================
BUG FIXES & CHANGES
===================

-2004-

July 21
(MS SQL)
- Fixed bug with cached documents that was caused by the last update.

July 19
(MS SQL)
- Fixed bug that prevented the spider from adding documents to the docs table for some users.
- Fixed bug with cached documents that removed class and other elements from the anchor tags.

April 5
(MS SQL/MySQL)
- Character set feature is now working again.
(MySQL)
- Fixed bug which prevented adding URLs from the command line.

March 10
(MS SQL/MySQL)
- Reports indicate that exception #00402619 is now fixed in this build.

March 1
(MS SQL/MySQL)
- Fixed bug that prevented the spider from closing properly when using command-line arguments.
- Exception #00402619 should now be fixed.
- New command-line parameter (/suppress) suppresses displaying the error report when a thread crashes.

Feb 6
(MS SQL/MySQL)
- Fixed bug with scheduler that caused collection configurations to be written with the wrong collection's settings.

Jan 25
(MS SQL)
- Improved speed when following links is enabled.

Jan 13
(MS SQL/MySQL)
- Improved error reporting to provide more useful information to help track down thread crashes.
- Added new option to specify whether or not to index documents that do not contain a file extension (Following tab in the options).

Jan 8
(MS SQL/MySQL)
- Fixed support for Microsoft Word documents.
- Added error reporting. When a thread crashes, you can optionally send an error report with important information.

Jan 7
(MS SQL)
- Fixed occasional thread crashing when Follow Links is enabled.
(MySQL)
- Fixed thread crashing when re-visiting indexed documents.
(MS SQL/MySQL)
- Fixed UTF8 support.

-2003-

Dec 3
(MS SQL)
- Fixed "Duplicate key was ignored" error.

Dec 2
(MS SQL)
- Fixed "Parameter 'Description' not found" error.
- Fixed broken import when connecting to a networked or remote SQL Server.
- When re-visiting error URLs, they are now kept in the error table if they fail again instead of being removed.
- When indexing Word documents, the actual title of the document is now used as the Title instead of the filename.
- %7e is now converted to ~ within URLs.

Sep 11
(MS SQL)
- Improved speed when importing URLs into the Todo list by using BULK INSERT.
(MS SQL/MySQL)
- Fixed bug that prevented keyword filters from being found.

Sep 5
(MS SQL/MySQL)
- Added command-line option for adding filters to the domain filter list. +, -, and ? are now supported. Example: spider.exe +http://www.mydomain.com/*.
- Added an option for pausing the spider for a user-defined amount of time between connections. This setting can be found in the options under the Connection tab.
(MS SQL)
- Added an option that, if enabled, instructs the spider to automatically start incremental population of the Full-Text Index once it has finished collecting documents. This option can be found in the options under the Indexing tab.

Aug 16
(MS SQL/MySQL)
- Fixed bug introduced in last update which caused the spider to occasionally freeze.

Aug 15
(MS SQL/MySQL)
- Domain filters are now checked before visiting URLs. Previously, domain filters were only checked when adding newly found URLs to the Todo list. This change allows you to change filters without needing to clear the Todo list and start over.
(MS SQL)
- Improved speed when following URLs.

Aug 13
(MS SQL/MySQL)
- New feature allowing trial users to enter a license code directly into the spider immediately after ordering instead of waiting for a key file.

Aug 7
(MySQL)
- Improved speed when following links.

Aug 5
(MySQL)
- Fixed bug that prevented documents from being cached.

*** All MSSQL Below ***

Apr 4
- Fixed bug which caused http:// to be added to URLs already containing http:// when importing a list of URLs.

Apr 3
- Fixed bug with following links.
- Spider now prevents URLs with # characters in them from being added to the Todo list.
- Fixed bug with domain and keyword pattern matching.

Apr 1
- Spider now re-starts threads if they crash.
- Fixed case-sensitive SQL statement used for importing URLs into the Todo list.
- Fixed bug related to the site's robots.txt. ES checked not only the path of the URL, but also the domain, for the value specified in a Disallow line of the robots.txt.
- Increased speed when verifying keywords specified under the Keywords tab in the options.

Feb 3
- Fixed bug which prevented URLs from being followed from pages that were filtered by keywords.

Jan 28
- Indexed PDF docs now contain the actual title from the document itself.
- Support added for indexing Excel, PowerPoint, and WordPerfect documents. See converter page for more information.

Jan 14
- Fixed bug in collection command-line parameter.
- Spider now converts &amp; in URLs to & to allow the spider to follow them correctly.

Jan 12
- New command-line parameter allowing selection of the collection to use upon startup. See updated PDF Admin Guide for details.
- New command-line parameter allowing URLs to be blocked by adding them to the Filter URLs list (under Domains tab). See updated PDF Admin Guide for details.
- Fixed bug which allowed text to run together when removing HTML code. This bug occurred when a space was not placed between HTML elements.
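
Illustration (not part of the product): the URL handling described in the Apr 4, Apr 3, and Jan 14 entries above can be sketched roughly as follows. This is only a guess at the behavior from the notes, not the spider's actual code. The function name prepare_url is hypothetical, and treating https:// the same as http:// is an assumption; the notes mention only http://.

```python
from typing import Optional


def prepare_url(raw: str) -> Optional[str]:
    """Hypothetical helper: clean a URL before it goes on the Todo list.

    Returns the cleaned URL, or None if the URL should be skipped.
    """
    url = raw.strip()

    # Apr 3: URLs containing '#' are not added to the Todo list.
    if not url or "#" in url:
        return None

    # Jan 14: HTML-escaped ampersands are converted back to '&'
    # so that query strings can be followed correctly.
    url = url.replace("&amp;", "&")

    # Apr 4: only prepend http:// when no scheme is present
    # (accepting https:// here is an assumption).
    if not url.lower().startswith(("http://", "https://")):
        url = "http://" + url

    return url


if __name__ == "__main__":
    for sample in (
        "www.mydomain.com/list?a=1&amp;b=2",
        "http://www.mydomain.com/",
        "http://www.mydomain.com/page#top",
    ):
        print(sample, "->", prepare_url(sample))
```

A URL that already starts with http:// passes through unchanged, which is the case the Apr 4 fix addresses.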