Text analytics

Text analytics is a search engine for IUCLID data and attachments. It is installed separately from, but connected to, an instance of IUCLID. It allows rapid and sophisticated searching of all IUCLID fields including the text content of attachments. Searches can be carried out on both structured data such as picklists and dates, and on unstructured data such as free text fields and attachments.

Text analytics indexes all the information from IUCLID dossiers using Elasticsearch, which provides high search performance. Text analytics also carries out optical character recognition (OCR) on scans within attachments.

Significant changes have been made to the user interface since the previous version (3.4.0). See the release notes for a summary, and the user manual for a detailed description.

Hardware requirements for a large IUCLID database are presented below, assuming that the WildFly server and the Elasticsearch server are installed on the same host.

  • CPU: 6
  • RAM: 16 GB (4 GB WildFly server, 4 GB Elasticsearch server, 8 GB OS and Tesseract)
  • HDD: 50 GB
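
As a quick sanity check, the figures in the list above can be written down and compared against a candidate host. The following is only a minimal sketch: the names RAM_BUDGET_GB, REQUIRED, total_ram_gb and shortfalls are hypothetical and are not part of Text Analytics or IUCLID, and reading "CPU: 6" as six CPU cores is an assumption.

```python
# Figures copied from the requirements list; all names are illustrative only.
RAM_BUDGET_GB = {
    "wildfly_server": 4,
    "elasticsearch_server": 4,
    "os_and_tesseract": 8,
}

# Assumption: "CPU: 6" is interpreted as six CPU cores.
REQUIRED = {"cpus": 6, "ram_gb": 16, "disk_gb": 50}


def total_ram_gb(budget=RAM_BUDGET_GB):
    """Sum the per-component RAM allocations."""
    return sum(budget.values())


def shortfalls(cpus, ram_gb, disk_gb, required=REQUIRED):
    """Return the names of resources that fall below the stated figures."""
    available = {"cpus": cpus, "ram_gb": ram_gb, "disk_gb": disk_gb}
    return [name for name, need in required.items() if available[name] < need]


if __name__ == "__main__":
    # The three RAM allocations add up to the 16 GB total.
    print(total_ram_gb())
    # A host with 4 cores, 16 GB RAM and 100 GB disk is short on CPU only.
    print(shortfalls(cpus=4, ram_gb=16, disk_gb=100))
```

The values passed to shortfalls are supplied by hand here; how they are measured on a real host depends on the operating system and is left out of the sketch.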

The IUCLID 6 Server and its database can also be hosted on the same server. The hardware requirements vary according to the amount of data managed by IUCLID.

Known issue: In a simple search, parts of a word are matched in addition to the whole word, which causes unexpected extra hits. The workaround is to place words in double quotes, like "this". For more detail, see the user manual. This is due to be corrected in the next version of Text Analytics.
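
If search strings are prepared outside the user interface, the double-quote workaround could be applied with a small helper such as the one below. This is a hedged sketch: quote_terms is a hypothetical function, not something provided by Text Analytics, and it only handles single words separated by whitespace. How Text Analytics combines several quoted words in one search is not covered here; see the user manual.

```python
def quote_terms(query):
    """Wrap each whitespace-separated word in double quotes.

    Hypothetical helper: words that are already fully quoted are kept
    as they are, and stray quote characters are stripped before quoting.
    """
    quoted = []
    for word in query.split():
        if len(word) >= 2 and word.startswith('"') and word.endswith('"'):
            quoted.append(word)  # already quoted, leave unchanged
            continue
        bare = word.strip('"')
        if bare:  # skip tokens that consist only of quote characters
            quoted.append('"' + bare + '"')
    return " ".join(quoted)


if __name__ == "__main__":
    print(quote_terms("this"))
    print(quote_terms('skin "test"'))
```

Note that a quoted phrase containing spaces is split into separately quoted words by this sketch, which may not be what is wanted for phrase searches.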

Documentation

The download of Text Analytics is stopped until further notice. More information is available here.