Quantitative Natural Conversations: Blending LLMs and SQL for Augmented Generation
The next generation of analytical tools will enable users to ask their data questions in their own words.
The advent of large language models (LLMs) like GPT-3, PaLM, and Anthropic’s Claude has sparked a revolution in natural language AI. These models can understand complex language, reason about concepts, and generate coherent text, enabling conversational interfaces for a wide range of applications. However, the vast majority of enterprise data resides in structured SQL databases like PostgreSQL, Snowflake, and BigQuery, and seamlessly accessing and analyzing this data through natural conversation remains challenging.
Recent research has proposed techniques to enhance the integration of LLMs with SQL databases, with a focus on cross-domain and cross-compositional generalization. For example, Arora et al. have devised an algorithm to sample a diverse set of few-shot examples covering all SQL clauses and operators to effectively prompt the LLM. Their domain adaptation approach adapts these few-shot examples to the target database via semantic similarity. Further, the least-to-most prompting technique decomposes few-shot examples into sub-questions with intermediate representations to improve compositional generalization.
https://arxiv.org/abs/2308.02582
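As a rough illustration of the diverse sampling idea, here is a greedy construct-coverage loop in Python. This is a simplified sketch of the concept, not the paper’s actual algorithm, and `extract_constructs` relies on naive keyword matching:

```python
# Greedily pick few-shot examples until every SQL construct seen in the
# pool is covered. A toy sketch of diverse example sampling.

def extract_constructs(sql):
    """Naive construct detection by keyword substring matching
    (e.g. 'JOIN' also matches inside other words; fine for a sketch)."""
    keywords = ["JOIN", "GROUP BY", "ORDER BY", "HAVING", "WHERE",
                "LIMIT", "UNION", "COUNT", "SUM", "AVG"]
    upper = sql.upper()
    return {kw for kw in keywords if kw in upper}

def sample_diverse_examples(pool):
    """Choose (nl, sql) pairs that each add uncovered SQL constructs,
    richest examples first."""
    covered, chosen = set(), []
    for nl, sql in sorted(pool, key=lambda p: -len(extract_constructs(p[1]))):
        new = extract_constructs(sql) - covered
        if new:
            chosen.append((nl, sql))
            covered |= new
    return chosen
```

A redundant example (one whose constructs are already covered) is simply skipped, keeping the prompt short while still spanning the SQL surface area.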
In this article, we explore the promise and challenges of integrating large language models with SQL databases.
The motivation is simple — imagine analysts or business users conversing with data in plain English without needing to write SQL queries.
This could democratize data access, accelerate analytics, and unlock new possibilities for data-driven automation.
However, effectively achieving this human-like semantic interaction between LLMs and SQL requires overcoming key challenges around query correctness, security, and performance.
We’ll discuss techniques like careful prompting, validation loops, and role-based access control that can pave the path towards production LLM-SQL integration.
Combining the conversational prowess of LLMs with the analytical richness of SQL could transform how humans interact with data. The future where data conversations sound as natural as human conversations may not be far away!
Now let’s dive deeper into the specifics.
Why Integrate LLMs with SQL Databases
There are several motivations for integrating LLMs with SQL:
Natural language interface — Rather than writing complex SQL queries, users could interact with the database conversationally in plain English. For example, they could simply ask “What were the top selling products last month?” rather than writing out JOIN and GROUP BY clauses. This makes data access far more intuitive and lowers the barrier to working with analytical data.
Democratization of data — By enabling natural language interfaces, access to data can be democratized to users without technical SQL skills. Business users, analysts, and even executives could directly get valuable insights through natural conversations, allowing data-driven decision making across the organization.
Augmented analytics — LLMs have shown prowess in digesting information and producing insights by generating summaries, visualizations, and narratives. Integrating them with SQL allows automatically augmenting query results with generated texts, charts, and graphs that highlight key trends and findings. This could accelerate analytics workflows.
Workflow automation — With conversational interfaces, LLMs can integrate into automated workflows through voice commands or conversational bots. Common analytical workflows could be triggered simply by speaking to a bot; for example, a weekly sales report could be generated by saying “Run my weekly sales analysis”. Bots with LLM-driven SQL access can become intelligent assistants for data analysts.
Enhanced collaboration — By combining NLP capabilities with SQL access, LLMs can enable cross-functional collaboration on data analysis. Stakeholders can discuss datasets and exchange findings through natural dialogue. The LLM acts as the interface to SQL, translating conversations to queries and sharing insights.
Multi-modal analysis — LLM interfaces allow seamlessly combining SQL data with knowledge graphs, vector databases, and other sources. This enables multi-modal analysis for more comprehensive insights.
The end goal is to make interacting with data as easy and intuitive as human conversation. Democratization, automation, and collaboration are all enhanced when the natural language prowess of LLMs is coupled with the analytical richness of SQL databases.
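To make the natural language interface idea concrete, here is the kind of SQL such a layer would need to produce for “What were the top selling products last month?”. The schema, sample data, and date range below are hypothetical, and in a real system the query would come from the LLM rather than being hand-written:

```python
import sqlite3

# Hypothetical schema and data for the example question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE order_items (
        product_id INTEGER REFERENCES products(id),
        quantity   INTEGER,
        order_date TEXT
    );
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO order_items VALUES
        (1, 5, '2024-05-03'), (1, 2, '2024-05-12'), (2, 4, '2024-05-20');
""")

# "What were the top selling products last month?" translated to SQL.
# The date range is hard-coded here; an LLM would resolve "last month"
# from the conversation's context.
top_products = conn.execute("""
    SELECT p.name, SUM(oi.quantity) AS units
    FROM order_items oi
    JOIN products p ON p.id = oi.product_id
    WHERE oi.order_date BETWEEN '2024-05-01' AND '2024-05-31'
    GROUP BY p.name
    ORDER BY units DESC
""").fetchall()
```

The gap between the one-line question and the multi-clause query above is exactly what the LLM is asked to bridge.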
Augmenting SQL with External Data
So far we’ve focused on the interface between LLMs and SQL databases. But the data stored in enterprise SQL systems is only one piece of the overall data puzzle. Critical knowledge also resides in knowledge graphs, vector databases, external APIs, and more.
LLMs provide a unique opportunity to seamlessly blend and augment data from disparate sources.
An LLM could query a SQL database and then enhance the results by calling out to a knowledge graph to fetch related entities. It could run similarity searches across an enterprise vector database to find relevant documents. External APIs can provide supplementary data to augment the SQL results.
For example, when querying a sales database, the LLM could enhance the geo-analysis by overlaying results on maps using a geospatial API. When analyzing customer churn, it could pull in vectorized account profiles to uncover common patterns. Insights from SQL queries can become more powerful when blended with data from other sources.
To enable this, the LLM query interface should allow orchestrating calls to various internal and external data sources. The LLM can then leverage its strong language understanding capabilities to analyze the aggregated data and synthesize integrated insights through automatically generated summaries, visualizations, and narratives.
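The orchestration described above can be sketched as a thin layer that runs the SQL query and then enriches each row from a second source. Everything here is a stub standing in for real database and API clients (the function names and data are illustrative):

```python
# Toy orchestration layer: run the SQL query, then augment each row
# with data from a geospatial source. Both sources are stubs.

def sql_source(query):
    # Stub: pretend this hit the sales warehouse.
    return [{"region": "EMEA", "sales": 120}, {"region": "APAC", "sales": 95}]

def geo_api(region):
    # Stub: pretend this called a geospatial API for coordinates.
    coords = {"EMEA": (50.1, 8.7), "APAC": (1.35, 103.8)}
    return coords.get(region)

def orchestrate(query):
    rows = sql_source(query)
    for row in rows:
        row["coords"] = geo_api(row["region"])  # augment the SQL result
    return rows
```

An LLM sitting on top of such a layer can then summarize or visualize the blended rows rather than the raw SQL output alone.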
Augmenting SQL with multi-modal data not only enhances individual queries but also unlocks the possibility of building virtual knowledge graphs that connect structured databases with unstructured knowledge, positioning organizations to extract far more value from their enterprise data.
SQL + Vector Querying
Method 1 — SQLAutoVectorQueryEngine:
Combines individual SQL and vector query engines into an SQLAutoVectorQueryEngine that can retrieve information from both data stores.
Demonstrated on city data with a SQLite database and Pinecone vector index.
Routes queries to appropriate engine and combines results.
Method 2 — PGVector in SQLAlchemy:
Enables semantic search in PostgreSQL by storing embeddings as vector columns.
Allows expressive text-to-SQL queries with both filters and similarity search.
Custom text-to-SQL prompt is specialized for PGVector syntax.
https://docs.llamaindex.ai/en/latest/examples/query_engine/pgvector_sql_query_engine.html
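The PGVector approach boils down to emitting SQL that mixes an ordinary filter with similarity ordering via pgvector’s `<->` (L2 distance) operator. The helper below only builds such a query string; the table and column names are hypothetical:

```python
# Build a PGVector-style query: a metadata filter plus similarity
# ordering on an `embedding` vector column. String construction only;
# executing it would require a PostgreSQL database with pgvector.

def build_pgvector_query(table, filter_clause, query_embedding, k=5):
    vec_literal = "[" + ",".join(f"{x:g}" for x in query_embedding) + "]"
    return (
        f"SELECT * FROM {table} "
        f"WHERE {filter_clause} "
        f"ORDER BY embedding <-> '{vec_literal}' "
        f"LIMIT {k};"
    )
```

A text-to-SQL prompt specialized for this syntax teaches the LLM to place semantic constraints in the ORDER BY clause and structured constraints in the WHERE clause.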
Method 3 — OpenAI Function API:
Uses OpenAI’s function calling API and an agent for retrieval across SQL and vector stores instead of custom retrievers.
Defines function for auto-retrieval from vectors. Initializes agent with SQL and vector tools.
Shows potential for joint querying, but needs more robustness analysis.
https://docs.llamaindex.ai/en/stable/examples/agent/openai_agent_query_cookbook.html
Key Challenges:
Query routing logic and limitations of cross-database retrieval.
Performance and quality of semantic search capabilities.
Scaling robustness as complexity increases.
Future Research:
Hybrid interfaces for simplicity and expressiveness.
Techniques like paraphrasing to enhance prompting.
Incorporating database metadata and statistics.
Expanding to other data sources like knowledge graphs.
Challenges in LLM-SQL Integration
However, effectively integrating LLMs with SQL brings forth some unique challenges:
Hallucination — Since LLMs generate text probabilistically, they may hallucinate queries referring to non-existent database artifacts like tables or columns not present in the actual database. This can lead to invalid SQL and incorrect results.
Context length limits — Large language models have a limited context window length they can operate on. But database metadata including schema definitions and column details can far exceed these token limits. This makes it hard for LLMs to ground themselves in full database context.
Query errors — LLMs may successfully understand the intent but still generate logically incorrect SQL queries leading to wrong results. Suboptimal data sampling and prompting can contribute to such errors. Lack of formal reasoning makes error-free SQL generation difficult.
Cost and performance — As query complexity increases, the computational cost for generating and validating SQL through LLMs also increases. Heavy LLM usage leads to high and unpredictable costs. Performance may also suffer due to inference latencies.
Security and governance — With conversational interfaces, users are not explicitly specifying queries but stating intents. Safeguarding data access and proper governance becomes very challenging.
Explainability — It can be hard to explain why an LLM generated a particular SQL query from a conversational natural language intent. Lack of explainability makes debugging failures harder.
Monitoring and testing — With dynamic SQL generation, it is difficult to continuously monitor, test, and validate the SQL interfaces. Production readiness requires extensive testing.
Potential Solutions
Here are some potential strategies to address these challenges:
Informative prompting — Carefully prep the LLM by describing the database schema, sample data rows, valid SQL queries, and business logic. This grounds the LLM in reality to avoid hallucination errors. Techniques like Generic Prompting can help create representative prompts covering SQL constructs.
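An informative prompt can be assembled mechanically from schema metadata and a few sample rows, so the model sees the real tables before generating SQL. The prompt format below is just one illustrative choice:

```python
# Build a schema-grounded prompt from table metadata and sample rows.

def build_prompt(schema, samples, question):
    """schema: {table: [columns]}; samples: {table: [row tuples]}."""
    lines = ["You are a SQL assistant. Only use the tables below.", ""]
    for table, columns in schema.items():
        lines.append(f"Table {table}({', '.join(columns)})")
        for row in samples.get(table, [])[:3]:  # include a few sample rows
            lines.append(f"  sample: {row}")
    lines += ["", f"Question: {question}", "SQL:"]
    return "\n".join(lines)
```

Because the prompt enumerates the only tables and columns that exist, references to anything else become much easier to detect and much less likely to be generated.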
Progressive prompting — Breaking down a complex conversational intent into logical sub-queries helps the LLM learn step-by-step. Intermediate steps explain the intent and data needs for each sub-query. This improves compositional generalization, which helps tackle query errors.
Syntax validation — Generated SQL can be validated for syntactic correctness, and checked against the schema, before execution. In case of errors, the LLM can be re-prompted with an explanation of the error so it can make corrections. This catches invalid SQL before it ever reaches the database.
The LLM generates a SQL query from the input “Show total sales for each product last month”. Before execution, the query is validated using a SQL parser which catches that the query mistakenly referenced the non-existent “sales” table. The LLM is then re-prompted to correct the query.
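That validation step can be prototyped with the standard library alone: asking SQLite to plan a query with EXPLAIN surfaces syntax errors and missing tables without executing anything. This is a minimal sketch; a production system would validate against the real warehouse’s dialect:

```python
import sqlite3

# Validate generated SQL by asking SQLite to plan it. Errors (such as
# a reference to a non-existent table) are returned as text that can
# be fed back to the LLM for a correction round.

def validate_sql(conn, query):
    """Return None if the query plans cleanly, else the error message."""
    try:
        conn.execute("EXPLAIN " + query)
        return None
    except sqlite3.Error as exc:
        return str(exc)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product TEXT, amount REAL, month TEXT)")

ok_error = validate_sql(conn, "SELECT product, SUM(amount) FROM orders GROUP BY product")
bad_error = validate_sql(conn, "SELECT * FROM sales")  # 'sales' does not exist
```

The error string (“no such table” and similar) is precisely the kind of feedback the re-prompting loop needs.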
User feedback loops — Gathering ongoing user feedback on query quality and using it to further fine-tune prompts can enhance LLM performance over time through continuous learning.
Users rate the relevance of retrieved results for queries like “Which products had the best sales last quarter?” This feedback is used to further fine-tune prompt engineering and train the LLM to better ground its queries.
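A feedback loop can start as something very simple: record ratings per prompt variant and promote the best performer. The class below is a minimal stand-in for that bookkeeping, not a full fine-tuning pipeline:

```python
from collections import defaultdict

# Track user ratings per prompt variant and surface the best average,
# a toy version of feedback-driven prompt selection.

class FeedbackTracker:
    def __init__(self):
        self.scores = defaultdict(list)

    def record(self, prompt_id, rating):
        """rating: e.g. a 1-5 relevance score from the user."""
        self.scores[prompt_id].append(rating)

    def best_prompt(self):
        return max(self.scores,
                   key=lambda p: sum(self.scores[p]) / len(self.scores[p]))
```

Over time, the same signal can feed actual fine-tuning rather than just variant selection.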
Hybrid interfaces — Allowing seamless switching between conversational and more accurate query interfaces provides a fallback and bridges the gap as LLM interfaces mature.
The conversational interface allows asking “Which are my top customers?” in plain English. However, for more complex analytic needs, users can seamlessly switch to a GUI query builder interface for crafting accurate SQL queries.
Usage monitoring — Tracking computational usage statistics allows preventing excessive costs by throttling requests if needed. Caching also helps.
Based on the volume and complexity of queries, the LLM compute usage is tracked. If excessive costs are projected due to spikes in usage, throttling can restrict non-critical queries to control expenses.
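A minimal cost guard might track tokens spent against a budget and refuse non-critical queries once the threshold is crossed. The numbers and policy below are illustrative assumptions:

```python
# Toy cost guard: refuse non-critical queries once a token budget
# would be exceeded; critical queries always pass.

class UsageGuard:
    def __init__(self, token_budget):
        self.token_budget = token_budget
        self.tokens_used = 0

    def allow(self, estimated_tokens, critical=False):
        if critical or self.tokens_used + estimated_tokens <= self.token_budget:
            self.tokens_used += estimated_tokens
            return True
        return False  # throttled
```

Pairing such a guard with a result cache for repeated questions keeps both latency and spend predictable.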
Access control — Integrating with existing access control frameworks to manage security rather than directly exposing the LLM. Queries can be validated against user credentials.
Rather than directly interacting with the LLM, users interface through a middleware layer. This validates if the user has appropriate data access permissions before allowing the query for “Show avg order value by customer geography” to be passed to the LLM.
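The middleware check can be sketched as: extract the tables a generated query touches, then compare them against the user’s role. The table extraction here is deliberately naive (a keyword scan) and the role mapping is hypothetical:

```python
# Role-based gate in front of the LLM/database: a query is allowed only
# if every table it touches is permitted for the user's role.

ROLE_TABLES = {
    "analyst": {"orders", "customers"},
    "intern": {"orders"},
}

def tables_in(query):
    """Naive table extraction: the word following FROM or JOIN."""
    words = query.replace(",", " ").split()
    return {words[i + 1].lower() for i, w in enumerate(words[:-1])
            if w.upper() in ("FROM", "JOIN")}

def authorize(role, query):
    return tables_in(query) <= ROLE_TABLES.get(role, set())
```

A real deployment would parse the SQL properly and integrate with the existing access control framework, but the shape of the check is the same.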
Domain Adaptation — Adapting generic few-shot prompts to the target database schema helps ground the LLM in new databases.
```python
def adapt_to_target(generic_prompt, target_schema):
    """Rewrite each generic few-shot example against the target schema.
    generate_similar_sql and generate_nl are placeholders for the
    actual adaptation models."""
    adapted_prompt = []
    for query_pair in generic_prompt:
        adapted_sql = generate_similar_sql(query_pair.sql, target_schema)
        nl = generate_nl(adapted_sql)
        adapted_prompt.append(QueryPair(nl, adapted_sql))
    return adapted_prompt
```
Least-to-Most Prompting — Progressively decomposing queries helps improve compositional generalization.
```python
def decompose_prompt(adapted_prompt):
    """Decompose each few-shot example into sub-questions with
    intermediate SQL. decompose_nl and generate_intermediate_sql are
    placeholders for the actual decomposition models."""
    decomposed_prompt = []
    for query_pair in adapted_prompt:
        nl_questions = decompose_nl(query_pair.nl)
        sql_reps = generate_intermediate_sql(nl_questions)
        decomposed = [QueryRep(nl, sql) for nl, sql in zip(nl_questions, sql_reps)]
        decomposed_prompt.append(decomposed)
    return decomposed_prompt
```
The end goal should be seamless, secure, performant LLM query interfaces on par with human analysts. A combination of careful training, validation, and hybrid interfaces is needed to enable enterprise adoption.
Conclusion
The promise of analyzing enterprise data through intuitive conversations is an exciting one. However, effectively integrating large language models with production SQL databases requires overcoming key challenges around correctness, security, costs, and governance.
With careful prompting strategies, validation loops, and hybrid interfaces, LLMs can provide value in making data access more intuitive without compromising accuracy, performance, or control. Techniques like domain adaptation, least-to-most prompting, and progressive decomposition also show promise in improving the robustness of natural language SQL interfaces.
As LLMs continue maturing at a rapid pace in capabilities, their integration into the data access stack becomes more viable. In the future, conversational analytics bots overseeing data workflows could become an enterprise reality. Democratization of data through language interfaces can usher in an era of cross-functional collaboration at new levels.
However, responsible and controlled integration remains crucial for enterprise adoption, and the technology must earn trust through rigorous testing and validation. With a prudent approach, the synergies between conversational AI and SQL analytics offer immense opportunities to transform how humans interact with data. The path forward is challenging, but the possibilities make it a rewarding journey that could reshape the future of enterprise analytics and data-driven decision making.