Baidu Research Introduces EICopilot: An Intelligent Agent-based Chatbot to Retrieve and Interpret Enterprise Information from Massive Graph Databases

Originally published on MarkTechPost.

Jan 31, 2025 - 09:58

Knowledge graphs have seen heavy use in enterprise settings lately, holding data that ranges from legal persons to registered capital and shareholder details. Despite their utility, graphs have been criticized for requiring intricate text-based queries and manual exploration, both of which hinder the extraction of pertinent information.

With the strides made in natural language processing and generative AI in recent years, LLMs have been used to perform complex queries and summarization, drawing on their language comprehension and exploration abilities. This article discusses recent research that uses language models to streamline information extraction from graph databases.

Researchers from Baidu presented “EICopilot,” an agent-based solution that streamlines the search, exploration, and summarization of corporate data stored in knowledge graph databases, so that insights about enterprises can be obtained efficiently. To appreciate the work, consider the scale of data EICopilot handles. A typical graph dataset of this kind consists of hundreds of millions of nodes, tens of billions of edges, hundreds of billions of attributes, and millions of subgraphs (company communities) representing a country’s registered corporations, organizations, and companies.

EICopilot is an LLM-based chatbot built around a novel data preprocessing pipeline that optimizes database queries. The authors first gather real-world company-related queries from general-purpose search engines. After collection, they reserve a set of representative queries as a seed dataset and write a search script for each one in the Gremlin language for the graph dataset. Finally, they systematically annotate and augment these queries and scripts to form a vector database that enhances search accuracy. EICopilot uses this vector database to generate search spaces in real time for effective retrieval and exploration of graphs.
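The idea of pairing seed questions with Gremlin scripts and retrieving them by similarity can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the seed questions, the company names, the graph schema inside the Gremlin strings, and the bag-of-words "embedding" are all hypothetical stand-ins for the annotated data and learned embeddings a production system would use.

```python
import math
import re
from collections import Counter

# Hypothetical seed pairs: a question and a hand-written Gremlin script.
# Vertex labels, edge labels, and property names are invented for illustration.
SEED_PAIRS = [
    ("who are the shareholders of Acme Corp",
     "g.V().has('company','name','Acme Corp').in('holds_shares_in').values('name')"),
    ("what is the registered capital of Globex Ltd",
     "g.V().has('company','name','Globex Ltd').values('registered_capital')"),
    ("who is the legal person of Initech Inc",
     "g.V().has('company','name','Initech Inc').out('legal_person').values('name')"),
]

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# The "vector database": each seed question stored next to its vector and script.
INDEX = [(embed(question), question, script) for question, script in SEED_PAIRS]

def retrieve(query, k=2):
    """Return the k seed (question, script) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[0]), reverse=True)
    return [(question, script) for _, question, script in ranked[:k]]

if __name__ == "__main__":
    for question, script in retrieve("shareholders of Acme Corp", k=1):
        print(question)
        print(script)
```

The retrieved scripts would then serve as candidate templates for the language model, rather than being executed directly.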

In addition to the data preprocessing pipeline, EICopilot employs a comprehensive reasoning pipeline that combines Chain-of-Thought (CoT) prompting and In-Context Learning (ICL) to produce more accurate query responses.
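To make the combination of CoT and ICL concrete, the sketch below assembles a prompt that shows the model worked examples (ICL) and asks it to reason step by step before writing a query (CoT). The article does not give the actual prompts used by EICopilot, so the wording, the `build_prompt` helper, and the example content are all assumptions made for illustration.

```python
def build_prompt(question, examples):
    """Assemble a few-shot prompt with step-by-step reasoning.

    `examples` is a list of (question, reasoning, gremlin_script) tuples,
    for instance pairs retrieved from a vector database of seed queries.
    The prompt wording here is hypothetical.
    """
    parts = [
        "You translate questions about companies into Gremlin graph queries.",
        "Reason step by step before giving the final query.",
        "",
    ]
    # In-context learning: show solved examples before the new question.
    for ex_question, ex_reasoning, ex_script in examples:
        parts += [
            f"Question: {ex_question}",
            f"Reasoning: {ex_reasoning}",
            f"Gremlin: {ex_script}",
            "",
        ]
    # Chain-of-thought: leave the reasoning slot open for the model to fill.
    parts += [f"Question: {question}", "Reasoning:"]
    return "\n".join(parts)

if __name__ == "__main__":
    demo_examples = [(
        "who are the shareholders of Acme Corp",
        "Find the company vertex by name, then follow incoming shareholding edges.",
        "g.V().has('company','name','Acme Corp').in('holds_shares_in').values('name')",
    )]
    print(build_prompt("who holds shares in Globex Ltd", demo_examples))
```

The resulting string would be sent to whichever LLM the system uses; the model call itself is left out because the article does not specify it.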

The authors also point out that, in vector database query matching, the entity name in a query tends to carry more weight than the query's intent. To combat this, they propose a novel query masking strategy that masks entity names in queries before matching. In this way, EICopilot aims to ensure that queries are understood in their full complexity and executed with greater precision and relevance to user intent.
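The effect of entity names on similarity matching, and how masking counters it, can be shown with a toy example. The sketch below is an assumption-laden illustration rather than the authors' method: the entity list, the `[ENTITY]` placeholder, the intent labels, and the bag-of-words similarity are all hypothetical, and a real system would detect entities with a recognizer instead of a fixed list.

```python
import math
import re
from collections import Counter

# Hypothetical list of known entity names (a real system would detect these).
KNOWN_ENTITIES = ["Acme Corp", "Initech Inc"]

# Hypothetical seed questions mapped to an intent label.
SEEDS = {
    "who are the shareholders of Acme Corp": "shareholder_lookup",
    "who is the legal person of Initech Inc": "legal_person_lookup",
}

def mask_entities(text, placeholder="[ENTITY]"):
    """Replace every known entity name with a neutral placeholder."""
    for name in KNOWN_ENTITIES:
        text = re.sub(re.escape(name), placeholder, text, flags=re.IGNORECASE)
    return text

def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def best_intent(query, masked):
    """Return the intent of the seed question most similar to the query."""
    prepare = mask_entities if masked else (lambda text: text)
    q = embed(prepare(query))
    best_seed = max(SEEDS, key=lambda seed: cosine(q, embed(prepare(seed))))
    return SEEDS[best_seed]

if __name__ == "__main__":
    query = "list the shareholders of Initech Inc"
    print("unmasked:", best_intent(query, masked=False))
    print("masked:  ", best_intent(query, masked=True))
```

In this toy setup the unmasked query is pulled toward the seed that shares the company name, while the masked query matches the seed that shares the intent.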

The authors provide an extensive empirical analysis and real-world experiments that validate the utility of the proposed framework. They obtained data from Baidu’s internal data platform and processed it rigorously to construct a dataset of pairs, each consisting of a natural-language query and its corresponding graph database query. The authors introduce a length complexity score based on the traversal length of the query. Based on this score, each query was categorized as simple, moderate, or complex. To assess the performance of