Goglides Dev 🌱

Mallikarjun for goglides

Posted on • Updated on

How to query the database with natural language using LLM: A Comprehensive Guide

Today we will deep dive into learning how to query databases using natural language.

Project outline:

  1. Introduction
  2. Background
  3. Tools and Technologies
  4. Project Components
  5. How It Works
  6. Benefits
  7. Challenges and Considerations
  8. Future Directions
  9. Questions asked
  10. Conclusion

Project Workflow:
Image description

The complete source code of this project is available at Github

1. Introduction:

This project aims to bridge the gap between SQL databases and end users. Any professional who knows the natural language can ask any complex query on the database which gives us the final answer without having to worry about remembering SQL commands.

2. Background:

You don't have to go for an SQL developer to query your database. it is just a natural language query that can fetch your data seamlessly.

3. Tools and Technologies:

Tools and technologies used are Streamlit to take input from users, Langchain for integration, and OpenAI models like GPT-3.5-Turbo for converting prompt and natural language queries to SQL queries.

4. Project Components:

A). Streamlit: Streamlit is the open-source Python framework designed to quickly build and share AI/Data science web applications. we can also deploy the data science web app
using streamlit.

B). OpenAI API: OpenAI models are powerful and efficient in generating and understanding natural language texts. it is based on GPT (Generative pre-trained transformers) architecture. We have used the GPT-3.5-turbo model in our project.

C). Langchain: Langchain is an open-source LLM framework for LLM applications like RAG, Query database, web scraping, chatbots, etc. We have utilized SQL database agent toolkit from langchain.

5. How It Works:

Streamlit interface takes inputs like dbname, Host, username, password, and natural language query as shown below.

import streamlit as st
input_text = st.text_input("Enter your question here:","")
db_user = st.text_input("UserName:","root")
db_password = st.text_input("Password:","root")
db_host = st.text_input("Host:","localhost")
db_name = st.text_input("DBName:","bankdata")
Enter fullscreen mode Exit fullscreen mode

These inputs are sent to the SQL database agent toolkit function in langchain which establishes a connection with the MySQL database.

from langchain.sql_database import SQLDatabase
db = SQLDatabase.from_uri(f"mysql+pymysql://{db_user}:{db_password}@{db_host}/{db_name}")
print("Database connection successful!")
Enter fullscreen mode Exit fullscreen mode

LLM uses sql agent to understand the query, interpret it, and convert it to an SQL command. This command is run on the MySQL database that responds with an output, which is shown on the streamlit interface.

from langchain.agents import create_sql_agent
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", api_key=api_key)
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent_executor = create_sql_agent(llm=llm,toolkit=toolkit,
agent_executor.run(input) #Input query from streamlit
Enter fullscreen mode Exit fullscreen mode

6. Benefits:

Users don't have to remember complex SQL commands, they can simply bypass them and use natural language instead.

7. Challenges and Considerations:

While writing the query, we have to be very specific about what we want. unspecified queries lead to abrupt results.

8. Future Directions:

I have implemented this project on MySQL database, we can extend it to any database of our choice. we need to modify the corresponding SQL function in langchain that's all.

9. Questions Asked:

Q1. What is the maximum transaction in database table?

The maximum transaction value in the creditcard table is 25691.2.
Enter fullscreen mode Exit fullscreen mode

Q2. What is the average amount in database table?

The average amount in the creditcard table is 88.35
Enter fullscreen mode Exit fullscreen mode

Q3. For the given database table, count no of one's and zero's?

There are 568630 zeros and 984 ones in the "creditcard" table.
Enter fullscreen mode Exit fullscreen mode

10. Conclusion:

This project is significant in connecting end users with databases without having to worry about remembering complex SQL queries. Not only the MySQL database, it can also be extended to other databases seamlessly. It is evident that if we aim to use the full potential of the LLM application, need to write queries with the utmost precision.

Top comments (0)