ES.NET 2004
©2004-2006, Innerprise
http://www.innerprise.com/

============
INSTALLATION
============

Before installing, uninstall any previously installed builds. You do not need to remove any collections or the database. ES.NET 2004 will automatically detect the existing database and skip over the creation process.

===================
BUG FIXES & CHANGES
===================

02/08/2006 Release (2.0 Build 2230):
- Fixed bug that could cause the collection creation screen to throw an error.

01/30/2006 Release (2.0 Build 2221):
- Fixed bug with Content Filtering. When filtering using the "Allow" tags, URLs found in the page to visit were limited to that section of the page. This was causing the spider to miss links on the page.

01/25/2006 Release (2.0 Build 2216):
- Fixed issue with indexing documents with a file size greater than specified in the settings. This happened only when the file size could not be obtained.

01/13/2006 Release (2.0 Build 2203):
- Fixed performance bug that would greatly decrease performance as the number of URLs to visit grew. A data type mismatch caused a table scan rather than a clustered index seek.

10/28/2005 Release (2.0 Build 2127):
- Fixed bug with user defined domain extensions. Under certain circumstances user defined domain extensions were being ignored.
- Fixed bug with displaying status when the spider is in continuous crawl mode and there are only a few URLs to visit. The interface would report idle.
- Description length in esearch look and feel configuration files is now working. To take advantage of this setting, the PerformSearch stored procedure needs updating to increase the amount of text returned for the summary. Simply run the included PerformSearch.sql code through Query Analyzer to update your stored procedure.

10/26/2005 Release (2.0 Build 2125):
- Fixed issue with losing database name when saving configuration settings in the Search Application.
- Fixed exception #3000 bug with the search script when the parameter "col" is not found in the query string. "col" is not needed when "id" is used.

09/30/2005 Release (2.0 Build 2099):
- Fixed configuration file locking issue when saving the Starting page without any URLs specified. File would remain locked by the .NET process.
- Fixed issue with obeying the robots.txt.
- Fixed issue with determining file extension. Under certain circumstances, the spider would extract the wrong file extension from URLs.

09/16/2005 Release (2.0 Build 2085):
- Search now properly handles non ASCII #34 quote characters. These characters were causing a parsing issue with the search query, which could result in incorrect results.
- Fixed bug with basic search boxes in the search application. The collection name wasn't being passed properly in the query string, resulting in no results found.
- Searches now default to AND rather than NEAR. Using NEAR was resulting in missing some search results if the keywords were far apart from each other.

09/06/2005 Release (2.0 Build 2075):
- Second fix for the invalid license key error. This fix has been tested on additional machines using various regional settings.

08/17/2005 Release (2.0 Build 2055):
- The spider now properly detects the type of document when the document is launched from another document type. For example, an ASPX page launching a PDF document.

08/16/2005 Release (2.0 Build 2054):
- Fixed "license key invalid" bug that some users experienced when using either a trial key or licensed key.

08/09/2005 Release (2.0 Build 2047):
- HTML codes for ASCII characters are now converted to the actual ASCII characters.
- Fixed issue with extracting comment tags. Comment tags were not being added to the database for searching.

08/04/2005 Release (2.0 Build 2042):
- Fixed bug that prevented legitimate license keys from being accepted under certain regional settings.

07/25/2005 Release (2.0 Build 2032):
- Fixed scheduler bug.
  Starting URLs were not being added to the Todo list when the scheduled collection is started.

07/21/2005 Release (2.0 Build 2028):
- Fixed scheduler bug. Schedules could be created, but could not be modified.
- Fixed bug when starting scheduled collections that contain a space in the collection name. Collections could not be started.

06/29/2005 Release (2.0 Build 2006):
- Fixed date bug that affected a small number of users when saving schedules.
- Fixed bug introduced in a previous build that prevented indexing documents on local and network file systems.

06/20/2005 Release (2.0 Build 1997):
- You can now change the text that's displayed to the user when search results are omitted as well as when no results are found. The text can be edited in the web.config.
- Fixed bug with indexing documents that contain unicode characters when using MySQL.

06/14/2005 Release (2.0 Build 1991):
- Fixed bug when getting the next URL to visit when re-visiting indexed documents and Friendly Crawling is enabled (MySQL only).
- Fixed object reference bug when revisiting sites when no URLs are provided on the Starting screen.
- Added option to disable caching of search results within the ES.NET Web Application. This option can be found on the Settings screen. When caching is enabled, cache expires after 5 minutes.

04/18/2005 Release (2.0 Build 1934):
- Fixed bug with collections that have spaces in them (MySQL only). Activity page was throwing an Object Reference error.
- Fixed bug with mime types that begin with "text/" but aren't "text/html". The spider should have processed them as HTML, but instead tried to locate a converter for them. This caused them to be indexed incorrectly.
- When creating a collection, the hyphen is now an ignored character.
- Fixed bug with finishing the installation process. An exception was being thrown about a non-existent paths.xsd.
- Fixed bug with showing cached link when cache does exist.
- Added new option to allow caching of all documents -- even ones that do not provide any kind of formatting when converted to text. This option is called "CacheAll" and currently must be edited directly in your collection's .xml file. True = cache all, False = do not cache all.

04/15/2005 Release (2.0 Build 1931):
- Fixed bug with redirecting outside of the active domain when Stay In Site is enabled. This caused the spider to crawl and index unwanted pages.
- Major changes to the document conversion support. ES.NET 2004 can now detect IFilters installed on your system. It will attempt to use an external .exe converter before checking the registry for an IFilter. A new file called mime.txt, located in your ES.NET 2004 directory, allows you to specify an .exe converter based on mime type.
- Fixed bug with crawling UNC paths. If your UNC path requires authentication, you will need to run the ES.NET 2004 service under a user account with access.
- Fixed bug with crawling local file system. Previously, ES.NET wasn't checking the document extensions before indexing them. This caused it to index executables and other unwanted files.
- More document converters supported: http://www.innerprise.com/esnet-filetypes.asp. Added file types include: Windows Media Audio, Windows Media Video, Compiled HTML Help, Visio, and built-in support for XML.
- When a cached copy of a page is unavailable, the cached link is no longer shown for that search result.
- Improved XML indexing.

04/05/2005 Release (2.0 Build 1922):
- Fixed bug that prevented schedules from running when using MySQL.
- Fixed bug with "The process cannot access the file" errors when reading/writing to the collection configuration files.

03/31/2005 Release (2.0 Build 1916):
- Fixed bug when converting DOC files. DOC files were being converted to empty files.
- Fixed bug with Title tags that contain elements within them.

03/18/2005 Release (2.0 Build 1903):
- Fixed bug with searches containing a single quote.
  In addition to code changes within the Web App and Search App, a change to the PerformSearch stored procedure is required. See the very bottom of this document (this is only for MSSQL users).
- Fixed bug when searches contain a space before and after a minus or plus sign.
- Fixed bug when searching only ignored characters.
- Fixed AM/PM bug when creating schedules.
- Fixed bug when Title tag spans multiple lines.

01/21/2005 Release (2.0 Build 1847):
- Fixed bug with documents containing umlauts. These characters were being dropped.
- More improvements in converting dynamic links into static links for cached pages.
- Improved keyword highlighting. More than 4 search terms are now highlighted.

01/05/2005 Release (2.0 Build 1831):
- Fixed bug in the search script that prevented passing of the search within parameter while paging through the results.
- Fixed bug with the Allow Content Filtering that was preventing document tags from being extracted.
- Fixed support for both Basic and NTLM authentication.
- Improved conversion of dynamic links into static links for cached pages.
- Fixed keyword highlighting bug that caused HTML tags to be highlighted.

10/29/2004 Release (2.0 Build 1763):
- Fixed bug with checking filters when finding new URLs.
- Fixed bug with checking filters when indexing documents.

10/18/2004 Release (2.0 Build 1752):
- Fixed installation bug that some users have experienced.

10/13/2004 Release (2.0 Build 1747):
- Fixed bug with scheduler that caused the schedule monitoring thread to crash, preventing schedules from running.
- Fixed bug with customizing the search script that randomly lost the session, requiring logging in again.
- Spider now assumes document is text if it doesn't recognize the content type or file extension.
- Spider now grabs URLs without requiring href= or src= tags.
- Fixed display issue with activity screen for Firefox Web browsers.
- Many small bug fixes.
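
The 10/13/2004 change that lets the spider grab URLs without href= or src= can be pictured with a small sketch. This is not ES.NET's code; it is a hypothetical Python illustration (the name find_urls and the pattern itself are assumptions) of scanning raw page text for absolute URLs instead of relying on tag attributes.

```python
import re
from typing import List

# Hypothetical pattern: an absolute http(s) URL runs until whitespace,
# a quote character, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def find_urls(text: str) -> List[str]:
    """Return every absolute URL found in text, in order, without duplicates."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    page = 'Visit http://example.com/a or <a href="http://example.com/b">b</a>'
    print(find_urls(page))
```

A pattern this loose also captures trailing punctuation (for example a sentence-ending period), so a real crawler would likely need extra cleanup before adding results to its list of URLs to visit.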

07/23/2004 Release (2.0.1665.17502):
- Fixed bug with creating a collection if a database user other than 'sa' was used. This only affected Microsoft SQL Server.
- Added support for "Integrated Windows authentication".

07/22/2004 Release (2.0.1664.11897):
- Fixed login bug when connecting to a remote database.
- Fixed bug with editing a user when using MSSQL Server as the database.

07/19/2004 Release (2.0.1660.16681):
- Added support for crawling local file systems. Paths should be specified as: file:///:/. Example for crawling D:\: file:///d:/
- Fixed bug that caused some URLs with spaces in them to be truncated.
- Fixed bug with needing to restart service when doing a new install.

07/17/2004 Release (2.0.1657.1056):
- Added support for MySQL (version 4.1 required).
- Fixed bug with reading robots.txt that caused the spider to think the entire site was disallowed if "Disallow" had no value.

07/06/2004 Release (2.0.1648.28367):
- Fixed bug with trial key generation.

07/02/2004 Release (2.0.1644.17344):

NOTE:
----------------------------------------------------------------------
A few customers were given an update after the 4/16 release that added support for "Integrated Windows authentication". This feature is turned off in this release as there are still problems that need to be worked out with it. If you were given that special release, do not update to this new one.
----------------------------------------------------------------------

- Faster crawling when using keyword filters.
- Fixed display problem with Search Application search results in the Firefox Web browser.
- Fixed non-working omit feature of sample Search Application search script.
- Added omit option to Web Application search.
- Added support for two new file types: DWG and DWF. IFilters are required: http://www.innerprise.net/esnet-filetypes.asp
- Fixed log descriptions. Log descriptions were often showing "OK" instead of an error reason when a document could not be indexed.
- Fixed bug that prevented URLs from being added to the errors table. This bug caused a looping problem.
- ES.NET Service now only cleans up temp table while there's at least one collection running. This eliminates the constant database activity.
- Fixed bug which caused a .NET error screen when attempting to go to the key.aspx from the login page (to update an expired key).
- Fixed duplicate single quote problem. Prior to adding a document to the database, each single quote was doubled (required by the SQL Server). This wasn't a problem until the code was changed to use parameters (which don't require doubled quotes). This caused the stored document to contain two single quotes instead of just one.
- Fixed problems with bookmarks in URLs. They are now removed.
- Many small bug fixes.

04/16/2004 Release (2.0.1567.13554):
- Added new maximum documents limitation on the Following screen. This feature allows you to specify the maximum number of documents the collection should index. The collection will automatically stop once the value has been reached. Use 0 (zero) to allow an unlimited number of documents.
- Fixed bug that prevented large PDF documents from being indexed.
- Fixed bug with port numbers in URLs.

04/05/2004 Release (2.0.1556.18246):
- Fixed arithmetic bug with search results when start value is empty.

04/05/2004 Release (2.0.1555.34475):
- Fixed bug with search results when return value is empty.
- Fixed AM/PM problem with scheduler.
- Added partial support for ES v1.0 converters. To use, simply download the converter from the converters page: http://www.innerprise.net/es-filetypes.aspx.
- New content screen. This screen allows you to control which parts of a document can be indexed by setting tags within the documents.
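
The duplicate single quote fix listed under the 07/02/2004 release comes down to escaping a value by hand and then also passing it as a query parameter. The sketch below reproduces that effect under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the docs table and the store_and_read helper are hypothetical names, not part of ES.NET.

```python
import sqlite3


def store_and_read(text: str, pre_escape: bool) -> str:
    """Insert text through a parameterized query and return what was stored.

    When pre_escape is True, single quotes are doubled first, which is only
    appropriate when the value is concatenated into the SQL string itself.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE docs (body TEXT)")
        value = text.replace("'", "''") if pre_escape else text
        # The parameter is passed as data, so no quote escaping is needed here.
        conn.execute("INSERT INTO docs (body) VALUES (?)", (value,))
        return conn.execute("SELECT body FROM docs").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(store_and_read("it's", pre_escape=True))   # doubled quote is stored
    print(store_and_read("it's", pre_escape=False))  # original text is stored
```

With parameters, the value travels separately from the SQL text, so any manual doubling ends up in the stored document, which matches the symptom described in the release note.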

03/24/2004 Release (2.0.1543.37164):
- Username is now not changeable.
- Updated user management screens.
- Missing noise.txt with Web application is now included.
- Added confirmation when emptying tables.

03/22/2004 Release (2.0.1542.26676):
- Table changes. All are required.
  - es_queued has a new column (SQL code at bottom of this page.)
  - es_users has changed (SQL code at bottom of this page.)
  - New table: es_uc (SQL code at bottom of this page.)
- The administrator account is now stored in the SQL database. The Web application will automatically move the administrator to the database if the users table is empty. This login is shared between the search script and the Web application.
- Fixed hard coded link in cached to page that should have been the link for the currently selected cache page.
- Searches now check for, and remove, words that are considered ignore words from the SQL Server. The included noise.txt is read when searches are performed to filter those words.
- Meta refresh added to the Activity screen, Collections screen, and Full Text screen.
- Added the following to the options screen:
  - SMTP Username
  - SMTP Password
  - SMTP Port Number
  - Meta Refresh Time (zero = no refresh)
- When creating a new collection, you can press the enter key instead of having to click the button.
- The search script no longer uses the template.ascx file. Instead the settings (font, color, etc) are displayed using CSS. Items that are marked to not display are filtered out before displaying the results.
- The top bar of the search script's result page is now customizable. You can change the background color, text color, text size, and text font face.

03/03/2004 Release (2.0.1523.26878):
- Revisit status accuracy has now been fixed.
- Fixed bug with cached PDF, DOC, XLS, RTF, and MP3 documents that corrupted the data.
- Changes were made to the key system. New keys will need to be issued.
  If you're a licensed customer and haven't received a new key via email, please contact support@innerprise.net. There shouldn't be any more changes after this.

03/02/2004 Release (2.0.1522.25093):
- Internal search now remembers the last collection you were viewing (if any) and automatically selects it from the collection list to search.
- Help has been updated, though still incomplete.
- Fixed encoding problem with the search results.

02/29/2004 Release (2.0.1520.13523):
- Another update to the PerformSearch stored procedure. This update greatly speeds up queries by eliminating the second temp table.

02/28/2004 Release:
- Proxy server support is now working.
- Searching within specific fields is now working. If updating, you must replace your existing PerformCount and PerformSearch stored procedures with the ones in the sql/procedures.sql script.
- Web admin help has been changed to integrate the PDF Administrator Guide. It replaces the help topics, which became less useful due to the quick help popups.
- Updated search script to display cached documents and search within specific fields.
- User is now notified when searches are omitted.
- Fixed problem with Title tags where there was other text within them: instead of <title>.

02/27/2004 Release:
- Continuous crawling is now working.
- Migrate is now working. Currently it requires the Enterprise Search v1.0 database login password to be blank. Schedules and domain/keyword filters are not migrated.

02/26/2004 Release:
- Fixed bug with creating a new collection.
- Fixed bug with URLs with query strings.
- Documents are now cached like Google.

02/25/2004 Release:
- Fixed bug with RegEx extraction when Meta tag name element came after content.
- URLs can now be added to the Todo list through the maintenance screen even if they have been indexed or visited.
- Cookie support is now working.
- Proxy server support should now be working.

02/24/2004 Release:
- Fixed known bugs with International dates.
- Fixed bug with URL extraction.
- Fixed bug that was preventing Keyword filters from being saved.
- Fixed bug that was enabling the wrong page extensions.

02/23/2004 Release:
- Starting URLs are now converted to the right format when saved. For example, if the starting URL specified is www.mysite.com it's automatically converted to http://www.mysite.com/.
- Disallow URLs with query strings is now working.
- Disallow FrontPage special directories is now working.
- Disallow documents within CGI-BIN is now working.
- Pages without extensions are now added to the Todo list automatically.
- Fixed progress bar when revisiting documents. Progress showed the wrong percentage.
- IFilter support is now working. All titles will show up as Untitled at this point. To index PDF files, the Adobe PDF IFilter will need to be installed: http://www.adobe.com/support/salesdocs/1043a.htm.
- When a collection is deleted, all schedules associated with that collection are now removed.
- Fixed bug that was preventing deleting collections with spaces in their names.
- Fixed bug that would replace a collection if the new collection's name was the same as another collection's short name.
- Added basic SQL code checking to the create new collection box and add/remove URL. This is to prevent users from attempting to execute SQL code.

02/22/2004 Release:
- Following filter (on filters screen) is now being checked. All of the filters are now using RegEx so they must be changed to resolve incompatibilities. Both indexing and following should be changed to:
  ?http://.*/.* -http://.*/.*[(].* -http://.*/.*[)].* -file://.* -mailto:.* -telnet://.* -gopher://.*
- Fixed bug with cached robots.txt. If the cached value was empty due to the site not having a robots.txt, the spider would re-request it each time.
- Page Rank page has been removed temporarily. The current internal rank value based on several factors will remain active and influence the search result order.
  Page Rank and SPAM options will make their way back into the product. For most customers this is of little value.
- File & OS log levels now working.
- Email notifications for severe errors are now working. Currently, this is limited to the SQL Server connection being lost.
- International date is now supported within SQL Statements.
- Corrected and verified SQL Statements support case sensitive SQL Server installations.
- Search application Queries screen has been updated to the Web application queries screen look and feel.
- Search application Activity screen has been updated to the Web application queries screen look and feel.
- Search boxes on the Search application now work. Certain fields will remain disabled until they are working.

02/20/2004 Release:
- Fixed problem with the last run date and time. When the spider updated the number of running threads it would set the last run date to the current date and time. This causes problems if other collections are started, because threads are decreased to allow the new collection to run, which then updates the database with the current time.
- Scheduler is now working.
- Schedules can now be deleted and updated.
- User is now warned if adding a schedule that overlaps another schedule for the selected collection.
- Fixed problem with collection hyperlinks on the activity page when the collection has a space in it.
- Fixed problem with starting collections with spaces in them.
- Confirmation now required when deleting a collection or schedule.

02/19/2004 Release:
- Improved URL extraction.
- Fixed cache problem with the Web Application search. The search wasn't being encrypted, which can cause problems when a search with special characters is used.
- Last Updated element of the collections list now displays one of the following: n Minutes Ago, n Hours Ago, n Days Ago, or Long Date.
- New element added to collections. A progress bar is now shown when collections are being revisited or crawled.
- Schedules can now be created and viewed. Schedules do not work yet though.
- Changes were made to the es_scheduler table. See SQL Code at very bottom. Schedules cannot be added until the changes are made.
- Scheduler quick help topics are incomplete and are not included.
- New Statistic added to the Web Application. The total number of searches is now displayed.

02/18/2004 Release:
- Fixed minor problem with collection list on the activity.aspx page.
- Collection names on the collection list are now hyperlinks. Clicking them will take you directly to the collections tab with the collection automatically selected.
- Updated search result font type and size in the Web Application.
- Added help icons to most of the features in the Web Application. When clicked, a new window is shown with information about the item that the help icon was next to.
- The Query Report has been updated to the new look and feel within the Web Application. It still needs to be updated in the Search app. It also doesn't provide paging yet, so all results are returned on the same page.
- Updating is a little easier now. When uninstalling the Web Application it's not necessary to remove the esnet directory.
- Built-in search now automatically fills in values if a search was performed.

02/17/2004 Release:
- New User Interface (Web Application & Search).
- Activity now adds the following status messages: Revisiting & Hibernating.

02/16/2004 Release:
- Revisit is now working.
- Collections now cannot be created if they already exist. They must be deleted first.
- Queries are now removed when Empty Table "All" is selected.
- Queries are now removed when a table is deleted.
- URLs can now be removed using the Maintenance screen.

02/14/2004 Release:
- Users screen now hides "New User" and "Delete User" buttons since v2.0 won't include that functionality.
- Fixed bug which caused automatic logout when the proxy settings were saved.
- A default template.ascx file is now included with the search script -- it does not need to be generated before searches can be performed.
- Fixed "Arithmetic Overflow Exception".
- Help topics updated: Activity, Settings, Users, About, Collections.

================
SQL CODE CHANGES
================

04/?/2005:

PerformSearch Modification:

Replace line:
END,SUBSTRING([key],8,charindex(''/'',[key],8)-8)

With:
END, CASE WHEN CHARINDEX(''/'',[key], 8)>0 THEN SUBSTRING([key],8,charindex(''/'',[key],8)-8) ELSE '''' END

03/09/2005:

PerformSearch Modification:

After the following line:
DECLARE @sql nvarchar(4000)

Add the following new line:
SELECT @OrigQuery = Replace(@OrigQuery, CHAR(39), CHAR(39)+CHAR(39)), @Query=Replace(@Query, CHAR(39), CHAR(39)+CHAR(39)), @FirstTerm=Replace(@FirstTerm, CHAR(39), CHAR(39)+CHAR(39))

03/22/2004:

Modified es_queued:

DROP TABLE [es_queued]
CREATE TABLE [es_queued] (
    [collectionid] [int] NOT NULL ,
    [status] [int] NOT NULL ,
    [userid] [int] NOT NULL ,
    CONSTRAINT [PK_es_queued] PRIMARY KEY CLUSTERED ( [collectionid] ) ON [PRIMARY]
) ON [PRIMARY]
GO

Modified es_users:

DROP TABLE [es_users]
CREATE TABLE [es_users] (
    [userid] [int] IDENTITY (10000, 1) NOT NULL ,
    [email] [varchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [name] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [login] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [password] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [admin] [bit] NOT NULL CONSTRAINT [DF_es_users_admin] DEFAULT (0),
    [create] [bit] NOT NULL ,
    [delete] [bit] NOT NULL ,
    CONSTRAINT [PK_es_users1]
    PRIMARY KEY CLUSTERED ( [userid] ) ON [PRIMARY]
) ON [PRIMARY]
GO

New table es_uc:

CREATE TABLE [es_uc] (
    [userid] [int] NOT NULL ,
    [collectionid] [int] NOT NULL
) ON [PRIMARY]
GO

02/19/2004:

CREATE TABLE [es_scheduler] (
    [scheduleid] [int] IDENTITY (1, 1) NOT NULL ,
    [start] [datetime] NOT NULL ,
    [stop] [datetime] NOT NULL ,
    [command] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [frequency] [int] NOT NULL ,
    [collectionid] [int] NOT NULL ,
    CONSTRAINT [PK_es_scheduler] PRIMARY KEY CLUSTERED ( [scheduleid] ) ON [PRIMARY]
) ON [PRIMARY]
GO
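
The two PerformSearch modifications above can be read as small string operations. The sketch below mirrors them in Python purely for illustration; the function names are hypothetical and the real logic lives in the T-SQL stored procedure. The first function follows the guarded CASE expression: without the guard, CHARINDEX returns 0 when no "/" follows position 8, which makes the SUBSTRING length negative. The second mirrors the CHAR(39) replacement, which doubles each single quote.

```python
def host_from_key(key: str) -> str:
    """Return the text between position 8 and the next '/', or '' if none.

    T-SQL positions are 1-based, so position 8 is Python index 7,
    the first character after "http://".
    """
    start = 7
    slash = key.find("/", start)
    if slash == -1:
        # Corresponds to the ELSE '' branch of the CASE expression.
        return ""
    return key[start:slash]


def double_single_quotes(text: str) -> str:
    """Double every single quote, like Replace(x, CHAR(39), CHAR(39)+CHAR(39))."""
    return text.replace("'", "''")


if __name__ == "__main__":
    print(host_from_key("http://www.example.com/page.html"))
    print(host_from_key("http://www.example.com"))
    print(double_single_quotes("O'Brien"))
```

Returning an empty string for keys without a trailing path keeps the query running for such rows instead of failing on them, which appears to be the intent of the 04/?/2005 change.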