The arrival of Generative AI in the database world is set to change how we interact with and use data. These models can understand natural-language questions and generate complex SQL commands, making database access more intuitive and productive: routine tasks can be automated, query accuracy improved, and more sophisticated data analysis made possible. Integrating Generative AI into databases could transform data management, enabling dynamic, intelligent applications and better decision-making, and the integration technologies covered below give businesses a practical way to start leveraging AI for efficient data use.

In this post, we will create a proof of concept by connecting ChatGPT to our local Oracle database. We will test the model’s ability to analyze the data structure, create the necessary queries, retrieve the information, and integrate the outcome into a natural language response. This demonstration aims to showcase the practical application of Generative AI in enhancing database interactions and data-driven decision-making.

Hands-on

To cook this recipe we will need the following ingredients:

  • A Bowl of Python. The fundamental base ingredient to run our code.
  • A Cup of Jupyter Notebook. Essential for interactively testing and refining our code, offering a convenient and user-friendly interface.
  • A Pinch of API_KEYs.
    • OPENAI_API_KEY. Your secret spice to access the selected language model from OpenAI.
    • LANGCHAIN_API_KEY. The optional ingredient to enhance the flavor of this recipe, allowing for better request tracing in LangSmith.
  • A Handful of LangChain Libraries. Crucial components to facilitate seamless integration and processing.
  • A Scoop of python-oracledb Library. We will use the new oracledb module for database connectivity, which supersedes the older cx_Oracle driver.
  • A Jar of Oracle Database. Either local or remote. In this case, we are using a local Oracle Database 21c Express Edition running over Docker.

After putting on our apron, we start our recipe by installing the necessary libraries and obtaining the required API keys. The LANGCHAIN_API_KEY is optional but will be useful for tracking the activity of our requests in LangSmith.

The API keys must be kept completely private, so the best option to avoid including them in the code is to store them as environment variables in the profile of the operating system we are using. In this case, we define the variables in the .bash_profile of our Linux home directory.

%%capture --no-stderr
%pip install -qU langchain langchain-community langchain-openai langgraph oracledb
import getpass
import os

# Ask for the OpenAI key only if it is not already set as an environment variable.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")

# Optional: comment out the block below to opt out of LangSmith tracing in this notebook.
if not os.environ.get("LANGCHAIN_API_KEY"):
    os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangChain API key: ")
    os.environ["LANGCHAIN_TRACING_V2"] = "true"

Now we establish and test the connection to our database. Note the get_usable_table_names function, which returns the tables accessible with the privileges of the specified user. It is advisable to fine-tune this user's permissions so that a mistake by the LLM cannot affect our data.

from langchain_community.utilities import SQLDatabase
import oracledb

#db_uri = "oracle+oracledb://[user]:[pwd]@[host]:[port]?service_name=[service_name]"
db_uri = "oracle+oracledb://hr:hr@localhost:1521?service_name=xepdb1"
db = SQLDatabase.from_uri(db_uri)

print(db.dialect)
print(db.get_usable_table_names())
db.run("SELECT * FROM departments")

oracle
['countries', 'departments', 'emp_temp', 'employees', 'j_purchaseorder', 'job_history', 'jobs', 'locations', 'regions', 'sales']

"[(10, 'Administration', 200, 1700), (20, 'Marketing', 201, 1800), (30, 'Purchasing', 114, 1700), (40, 'Human Resources', 203, 2400), (50, 'Shipping', 121, 1500), (60, 'IT', 103, 1400), (70, 'Public Relations', 204, 2700), ........]"
from langchain_openai import ChatOpenAI
from langchain_community.agent_toolkits import SQLDatabaseToolkit

# Select your favorite model and properties.
#llm = ChatOpenAI(model="gpt-3.5-turbo-0125")
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
#    max_tokens=None,
#    timeout=None,
#    max_retries=2,
)

toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()
tools
[QuerySQLDataBaseTool(description="Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.", db=),
InfoSQLDatabaseTool(description='Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3', db=),
ListSQLDatabaseTool(db=),
QuerySQLCheckerTool(description='Use this tool to double check if your query is correct before executing it. Always use this tool before executing a query with sql_db_query!', db=, llm=ChatOpenAI(client=, async_client=, model_name='gpt-4o', temperature=0.0, openai_api_key=SecretStr('**********'), openai_proxy=''), llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['dialect', 'query'], template='\n{query}\nDouble check the {dialect} query above for common mistakes, including:\n- Using NOT IN with NULL values\n- Using UNION when UNION ALL should have been used\n- Using BETWEEN for exclusive ranges\n- Data type mismatch in predicates\n- Properly quoting identifiers\n- Using the correct number of arguments for functions\n- Casting to the correct data type\n- Using the proper columns for joins\n\nIf there are any of the above mistakes, rewrite the query. If there are no mistakes, just reproduce the original query.\n\nOutput the final SQL query only.\n\nSQL Query: '), llm=ChatOpenAI(client=, async_client=, model_name='gpt-4o', temperature=0.0, openai_api_key=SecretStr('**********'), openai_proxy='')))]
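
Before wiring everything together, it can be instructive to invoke one of these tools on its own; a minimal sketch, using the tool names shown in the listing above:

# Call the table-listing tool directly; its input string is ignored.
list_tool = next(t for t in tools if t.name == "sql_db_list_tables")
print(list_tool.invoke(""))
# countries, departments, emp_temp, employees, ...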

We configure our model and provide precise instructions on what we expect from it and what it should avoid.

from langchain_core.messages import SystemMessage

SQL_PREFIX = """You are an agent designed to interact with an Oracle SQL database.
Given an input question, create a syntactically correct Oracle SQL query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most 5 results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
Only use the below tools. Only use the information returned by the below tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP, TRUNCATE etc.) to the database.

To start you should ALWAYS look at the tables in the database to see what you can query. 
Do NOT skip this step.
When possible try to join tables to extend the retrieved information.
Then you should query the schema of the most relevant tables."""

system_message = SystemMessage(content=SQL_PREFIX)
system_message.pretty_print()

Subsequently, we create the agent that will manage the execution of our queries and pass it the LLM, the tools, and the instructions we configured earlier.

from langchain_core.messages import HumanMessage
from langgraph.prebuilt import create_react_agent

agent_executor = create_react_agent(llm, tools, messages_modifier=system_message)
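
Note that the name of this keyword argument has changed across langgraph releases; if the call above fails on a newer version, the same agent can be created with the renamed argument (as used in one of the reader comments below):

# On newer langgraph releases the keyword was renamed:
# agent_executor = create_react_agent(llm, tools, state_modifier=system_message)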

Once our agent is defined, the hard work (which isn’t really that hard) is done. Now we can start playing with our data and see how our natural language request is transformed into an SQL query and how the results are expressed in a corresponding response.

for s in agent_executor.stream(
    {"messages": [HumanMessage(content="Which are the three departments with the most employees?. Show only the department's name")]}
):
    print(s)
    print("----")

Next, we will look at each of the steps executed and shown below. First, the agent examines the available tables. To do this, it uses the sql_db_list_tables tool, which returns a list of tables. From this list, the agent is interested in two tables: departments and employees. It uses the sql_db_schema tool to obtain the DDL of these tables along with some sample rows. Finally, our agent is able to write the SQL statement that, when executed, returns the requested data, and the LLM integrates that data into the expected response.

The sequence of steps is a bit long and tedious, but it is worth spending a few minutes to analyze it thoroughly and see how the agent, the model, the toolkit, and the database interact.

{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_lZmqZ1aZcqFbOSoXKFextj5e', 'function': {'arguments': '{}', 'name': 'sql_db_list_tables'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 12, 'prompt_tokens': 576, 'total_tokens': 588}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_5e6c71d4a8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-3fd3af3f-dc8e-4ce4-9249-dc1d70c68726-0', tool_calls=[{'name': 'sql_db_list_tables', 'args': {}, 'id': 'call_lZmqZ1aZcqFbOSoXKFextj5e'}], usage_metadata={'input_tokens': 576, 'output_tokens': 12, 'total_tokens': 588})]}}
----
{'tools': {'messages': [ToolMessage(content='countries, departments, emp_temp, employees, j_purchaseorder, job_history, jobs, locations, regions, sales', name='sql_db_list_tables', tool_call_id='call_lZmqZ1aZcqFbOSoXKFextj5e')]}}
----
{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_8O7wQzKWC5b2Q1Vtt9kqB3Wb', 'function': {'arguments': '{\n    "table_names": "departments, employees"\n}', 'name': 'sql_db_schema'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 23, 'prompt_tokens': 621, 'total_tokens': 644}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3e7d703517', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-48d69be3-3e93-476b-a848-c593474c0e65-0', tool_calls=[{'name': 'sql_db_schema', 'args': {'table_names': 'departments, employees'}, 'id': 'call_8O7wQzKWC5b2Q1Vtt9kqB3Wb'}], usage_metadata={'input_tokens': 621, 'output_tokens': 23, 'total_tokens': 644})]}}
----
{'tools': {'messages': [ToolMessage(content='\nCREATE TABLE departments (\n\tdepartment_id NUMBER(4, 0) NOT NULL, \n\tdepartment_name VARCHAR(30 CHAR) NOT NULL, \n\tmanager_id NUMBER(6, 0), \n\tlocation_id NUMBER(4, 0), \n\tCONSTRAINT dept_id_pk PRIMARY KEY (department_id), \n\tCONSTRAINT dept_loc_fk FOREIGN KEY(location_id) REFERENCES locations (location_id), \n\tCONSTRAINT dept_mgr_fk FOREIGN KEY(manager_id) REFERENCES employees (employee_id)\n)\n\n/*\n3 rows from departments table:\ndepartment_id\tdepartment_name\tmanager_id\tlocation_id\n10\tAdministration\t200\t1700\n20\tMarketing\t201\t1800\n30\tPurchasing\t114\t1700\n*/\n\n\nCREATE TABLE employees (\n\temployee_id NUMBER(6, 0) NOT NULL, \n\tfirst_name VARCHAR(20 CHAR), \n\tlast_name VARCHAR(25 CHAR) NOT NULL, \n\temail VARCHAR(25 CHAR) NOT NULL, \n\tphone_number VARCHAR(20 CHAR), \n\thire_date DATE NOT NULL, \n\tjob_id VARCHAR(10 CHAR) NOT NULL, \n\tsalary NUMBER(8, 2), \n\tcommission_pct NUMBER(2, 2), \n\tmanager_id NUMBER(6, 0), \n\tdepartment_id NUMBER(4, 0), \n\tCONSTRAINT emp_emp_id_pk PRIMARY KEY (employee_id), \n\tCONSTRAINT emp_dept_fk FOREIGN KEY(department_id) REFERENCES departments (department_id), \n\tCONSTRAINT emp_job_fk FOREIGN KEY(job_id) REFERENCES jobs (job_id), \n\tCONSTRAINT emp_manager_fk FOREIGN KEY(manager_id) REFERENCES employees (employee_id), \n\tCONSTRAINT emp_salary_min CHECK (salary > 0)\n)\n\n/*\n3 rows from employees table:\nemployee_id\tfirst_name\tlast_name\temail\tphone_number\thire_date\tjob_id\tsalary\tcommission_pct\tmanager_id\tdepartment_id\n199\tDouglas\tGrant\tDGRANT\t1.650.555.0164\t2018-01-13 00:00:00\tSH_CLERK\t2600\tNone\t124\t50\n200\tJennifer\tWhalen\tJWHALEN\t1.515.555.0165\t2013-09-17 00:00:00\tAD_ASST\t4400\tNone\t101\t10\n201\tMichael\tMartinez\tMMARTINE\t1.515.555.0166\t2014-02-17 00:00:00\tMK_MAN\t13000\tNone\t100\t20\n*/', name='sql_db_schema', tool_call_id='call_8O7wQzKWC5b2Q1Vtt9kqB3Wb')]}}
----
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/langchain_community/utilities/sql_database.py:316: SAWarning: Cannot correctly sort tables; there are unresolvable cycles between tables "departments, employees", which is usually caused by mutually dependent foreign key constraints. Foreign key constraints involving these tables will not be considered; this warning may raise an error in a future release.
metadata_table_names = [tbl.name for tbl in self._metadata.sorted_tables]
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/langchain_community/utilities/sql_database.py:328: SAWarning: Cannot correctly sort tables; there are unresolvable cycles between tables "departments, employees", which is usually caused by mutually dependent foreign key constraints. Foreign key constraints involving these tables will not be considered; this warning may raise an error in a future release.
for tbl in self._metadata.sorted_tables
{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_pP95j1gGdKgkTjTwUjy1oxHu', 'function': {'arguments': '{"query":"SELECT d.department_name\\nFROM departments d\\nJOIN employees e ON d.department_id = e.department_id\\nGROUP BY d.department_name\\nORDER BY COUNT(e.employee_id) DESC\\nFETCH FIRST 3 ROWS ONLY;"}', 'name': 'sql_db_query_checker'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 59, 'prompt_tokens': 1168, 'total_tokens': 1227}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3e7d703517', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-46a825f7-06e0-4ff7-851a-2cd56e5e8d08-0', tool_calls=[{'name': 'sql_db_query_checker', 'args': {'query': 'SELECT d.department_name\nFROM departments d\nJOIN employees e ON d.department_id = e.department_id\nGROUP BY d.department_name\nORDER BY COUNT(e.employee_id) DESC\nFETCH FIRST 3 ROWS ONLY;'}, 'id': 'call_pP95j1gGdKgkTjTwUjy1oxHu'}], usage_metadata={'input_tokens': 1168, 'output_tokens': 59, 'total_tokens': 1227})]}}
----
{'tools': {'messages': [ToolMessage(content='```sql\nSELECT d.department_name\nFROM departments d\nJOIN employees e ON d.department_id = e.department_id\nGROUP BY d.department_name\nORDER BY COUNT(e.employee_id) DESC\nFETCH FIRST 3 ROWS ONLY;\n```', name='sql_db_query_checker', tool_call_id='call_pP95j1gGdKgkTjTwUjy1oxHu')]}}
----
{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_nie8HJ0pA3HtT3JbtIvXNS47', 'function': {'arguments': '{"query":"SELECT d.department_name\\nFROM departments d\\nJOIN employees e ON d.department_id = e.department_id\\nGROUP BY d.department_name\\nORDER BY COUNT(e.employee_id) DESC\\nFETCH FIRST 3 ROWS ONLY;"}', 'name': 'sql_db_query'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 56, 'prompt_tokens': 1285, 'total_tokens': 1341}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3e7d703517', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-7e9bb19b-473c-4b46-b1ed-18bfe4eedfd3-0', tool_calls=[{'name': 'sql_db_query', 'args': {'query': 'SELECT d.department_name\nFROM departments d\nJOIN employees e ON d.department_id = e.department_id\nGROUP BY d.department_name\nORDER BY COUNT(e.employee_id) DESC\nFETCH FIRST 3 ROWS ONLY;'}, 'id': 'call_nie8HJ0pA3HtT3JbtIvXNS47'}], usage_metadata={'input_tokens': 1285, 'output_tokens': 56, 'total_tokens': 1341})]}}
----
{'tools': {'messages': [ToolMessage(content='Error: (oracledb.exceptions.DatabaseError) ORA-00933: SQL command not properly ended\nHelp: https://docs.oracle.com/error-help/db/ora-00933/\n[SQL: SELECT d.department_name\nFROM departments d\nJOIN employees e ON d.department_id = e.department_id\nGROUP BY d.department_name\nORDER BY COUNT(e.employee_id) DESC\nFETCH FIRST 3 ROWS ONLY;]\n(Background on this error at: https://sqlalche.me/e/20/4xp6)', name='sql_db_query', tool_call_id='call_nie8HJ0pA3HtT3JbtIvXNS47')]}}
----
{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_AybECd9b5qFg4QwdssSIoBQT', 'function': {'arguments': '{"query":"SELECT d.department_name\\nFROM departments d\\nJOIN employees e ON d.department_id = e.department_id\\nGROUP BY d.department_name\\nORDER BY COUNT(e.employee_id) DESC\\nFETCH FIRST 3 ROWS ONLY"}', 'name': 'sql_db_query'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 57, 'prompt_tokens': 1460, 'total_tokens': 1517}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3e7d703517', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-3b430236-5ffb-4a65-b903-4a2dfe04adab-0', tool_calls=[{'name': 'sql_db_query', 'args': {'query': 'SELECT d.department_name\nFROM departments d\nJOIN employees e ON d.department_id = e.department_id\nGROUP BY d.department_name\nORDER BY COUNT(e.employee_id) DESC\nFETCH FIRST 3 ROWS ONLY'}, 'id': 'call_AybECd9b5qFg4QwdssSIoBQT'}], usage_metadata={'input_tokens': 1460, 'output_tokens': 57, 'total_tokens': 1517})]}}
----
{'tools': {'messages': [ToolMessage(content="[('Shipping',), ('Sales',), ('Purchasing',)]", name='sql_db_query', tool_call_id='call_AybECd9b5qFg4QwdssSIoBQT')]}}
----
{'agent': {'messages': [AIMessage(content='The three departments with the most employees are:\n\n1. Shipping\n2. Sales\n3. Purchasing', response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 1540, 'total_tokens': 1561}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_9cb5d38cf7', 'finish_reason': 'stop', 'logprobs': None}, id='run-4b672689-2bda-4601-8271-ed06b771262c-0', usage_metadata={'input_tokens': 1540, 'output_tokens': 21, 'total_tokens': 1561})]}}
----

As you can see from the previous output, the agent is capable of interpreting and correcting errors in the SQL statement: the trailing semicolon caused an ORA-00933 error, so the agent removed it and retried. In short, it is following one of the instructions we passed earlier in the system_message: "You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again".
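
You can reproduce the problem outside the agent by sending a statement with a trailing semicolon through the driver; a minimal sketch (the query is just an example):

# Oracle rejects the trailing semicolon when the statement goes through the driver.
try:
    db.run("SELECT department_name FROM departments FETCH FIRST 1 ROWS ONLY;")
except Exception as exc:
    print(exc)  # ORA-00933: SQL command not properly ended

# The same statement without the semicolon runs fine.
print(db.run("SELECT department_name FROM departments FETCH FIRST 1 ROWS ONLY"))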

Let’s try another question.

for s in agent_executor.stream(
    {"messages": [HumanMessage(content="what's the salary of John?. List all of them including the full name and salary")]}
):
    print(s)
{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_HDdiG0vrxLUmqrbn72qUzafI', 'function': {'arguments': '{}', 'name': 'sql_db_list_tables'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 12, 'prompt_tokens': 577, 'total_tokens': 589}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3e7d703517', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-997c5a45-d8ae-4c6b-bc33-9d4e25fed3ce-0', tool_calls=[{'name': 'sql_db_list_tables', 'args': {}, 'id': 'call_HDdiG0vrxLUmqrbn72qUzafI'}], usage_metadata={'input_tokens': 577, 'output_tokens': 12, 'total_tokens': 589})]}}
{'tools': {'messages': [ToolMessage(content='countries, departments, emp_temp, employees, j_purchaseorder, job_history, jobs, locations, regions, sales', name='sql_db_list_tables', tool_call_id='call_HDdiG0vrxLUmqrbn72qUzafI')]}}
{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_lPxHV4CKJF17KBnllw1RD1k5', 'function': {'arguments': '{"table_names":"employees"}', 'name': 'sql_db_schema'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 622, 'total_tokens': 638}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3e7d703517', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-dc5647c7-a9a1-4048-8b5a-db00f41dd48e-0', tool_calls=[{'name': 'sql_db_schema', 'args': {'table_names': 'employees'}, 'id': 'call_lPxHV4CKJF17KBnllw1RD1k5'}], usage_metadata={'input_tokens': 622, 'output_tokens': 16, 'total_tokens': 638})]}}
{'tools': {'messages': [ToolMessage(content='\nCREATE TABLE employees (\n\temployee_id NUMBER(6, 0) NOT NULL, \n\tfirst_name VARCHAR(20 CHAR), \n\tlast_name VARCHAR(25 CHAR) NOT NULL, \n\temail VARCHAR(25 CHAR) NOT NULL, \n\tphone_number VARCHAR(20 CHAR), \n\thire_date DATE NOT NULL, \n\tjob_id VARCHAR(10 CHAR) NOT NULL, \n\tsalary NUMBER(8, 2), \n\tcommission_pct NUMBER(2, 2), \n\tmanager_id NUMBER(6, 0), \n\tdepartment_id NUMBER(4, 0), \n\tCONSTRAINT emp_emp_id_pk PRIMARY KEY (employee_id), \n\tCONSTRAINT emp_dept_fk FOREIGN KEY(department_id) REFERENCES departments (department_id), \n\tCONSTRAINT emp_job_fk FOREIGN KEY(job_id) REFERENCES jobs (job_id), \n\tCONSTRAINT emp_manager_fk FOREIGN KEY(manager_id) REFERENCES employees (employee_id), \n\tCONSTRAINT emp_salary_min CHECK (salary > 0)\n)\n\n/*\n3 rows from employees table:\nemployee_id\tfirst_name\tlast_name\temail\tphone_number\thire_date\tjob_id\tsalary\tcommission_pct\tmanager_id\tdepartment_id\n199\tDouglas\tGrant\tDGRANT\t1.650.555.0164\t2018-01-13 00:00:00\tSH_CLERK\t2600\tNone\t124\t50\n200\tJennifer\tWhalen\tJWHALEN\t1.515.555.0165\t2013-09-17 00:00:00\tAD_ASST\t4400\tNone\t101\t10\n201\tMichael\tMartinez\tMMARTINE\t1.515.555.0166\t2014-02-17 00:00:00\tMK_MAN\t13000\tNone\t100\t20\n*/', name='sql_db_schema', tool_call_id='call_lPxHV4CKJF17KBnllw1RD1k5')]}}
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/langchain_community/utilities/sql_database.py:316: SAWarning: Cannot correctly sort tables; there are unresolvable cycles between tables "departments, employees", which is usually caused by mutually dependent foreign key constraints. Foreign key constraints involving these tables will not be considered; this warning may raise an error in a future release.
metadata_table_names = [tbl.name for tbl in self._metadata.sorted_tables]
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/langchain_community/utilities/sql_database.py:328: SAWarning: Cannot correctly sort tables; there are unresolvable cycles between tables "departments, employees", which is usually caused by mutually dependent foreign key constraints. Foreign key constraints involving these tables will not be considered; this warning may raise an error in a future release.
for tbl in self._metadata.sorted_tables
{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_QbBqhzNWa3p9IBWO9V4ZRwHz', 'function': {'arguments': '{"query":"SELECT first_name, last_name, salary FROM employees WHERE first_name = \'John\'"}', 'name': 'sql_db_query_checker'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 32, 'prompt_tokens': 1021, 'total_tokens': 1053}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3e7d703517', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-f4c8dc31-516f-4281-b4fe-7ae36c4ca453-0', tool_calls=[{'name': 'sql_db_query_checker', 'args': {'query': "SELECT first_name, last_name, salary FROM employees WHERE first_name = 'John'"}, 'id': 'call_QbBqhzNWa3p9IBWO9V4ZRwHz'}], usage_metadata={'input_tokens': 1021, 'output_tokens': 32, 'total_tokens': 1053})]}}
{'tools': {'messages': [ToolMessage(content="```sql\nSELECT first_name, last_name, salary FROM employees WHERE first_name = 'John'\n```", name='sql_db_query_checker', tool_call_id='call_QbBqhzNWa3p9IBWO9V4ZRwHz')]}}
{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_wdLK9uNo3n9B6JQrepnVjrej', 'function': {'arguments': '{"query":"SELECT first_name, last_name, salary FROM employees WHERE first_name = \'John\'"}', 'name': 'sql_db_query'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 31, 'prompt_tokens': 1084, 'total_tokens': 1115}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3e7d703517', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-a4e14038-99b8-4b49-87ca-9f9656da35db-0', tool_calls=[{'name': 'sql_db_query', 'args': {'query': "SELECT first_name, last_name, salary FROM employees WHERE first_name = 'John'"}, 'id': 'call_wdLK9uNo3n9B6JQrepnVjrej'}], usage_metadata={'input_tokens': 1084, 'output_tokens': 31, 'total_tokens': 1115})]}}
{'tools': {'messages': [ToolMessage(content="[('John', 'Chen', Decimal('8200')), ('John', 'Seo', Decimal('2700')), ('John', 'Singh', Decimal('14000'))]", name='sql_db_query', tool_call_id='call_wdLK9uNo3n9B6JQrepnVjrej')]}}
{'agent': {'messages': [AIMessage(content='Here are the salaries for employees named John:\n\n1. **John Chen**: $8200\n2. **John Seo**: $2700\n3. **John Singh**: $14000', response_metadata={'token_usage': {'completion_tokens': 42, 'prompt_tokens': 1160, 'total_tokens': 1202}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3e7d703517', 'finish_reason': 'stop', 'logprobs': None}, id='run-6db4d640-576b-435c-904d-b45f664f0ff0-0', usage_metadata={'input_tokens': 1160, 'output_tokens': 42, 'total_tokens': 1202})]}}

Obviously, we don’t want to overwhelm our users with the entire sequence of steps, so the following function will only display the final response.

def show_final_message(response):
    """Print the last message from the response"""
    message = response["messages"][-1]
    if isinstance(message, tuple):
        print(message)
    else:
        message.pretty_print()

Let’s see this in action.

show_final_message(agent_executor.invoke(
    {"messages": [HumanMessage(content="What's John Chen's salary?")]}
    )
)

================================== Ai Message ==================================

John Chen's salary is 8200.
show_final_message(agent_executor.invoke(
    {"messages": [HumanMessage(content="What's the highest salary for accountants?.")]}
    )
)

show_final_message(agent_executor.invoke(
    {"messages": [HumanMessage(content="What's the salary and position of Daniel?.")]}
    )
)
================================== Ai Message ==================================

The highest salary for accountants is 9000.


================================== Ai Message ==================================

Daniel's salary is 9000, and his position is Accountant.

LangSmith trace

As we mentioned at the beginning, the LANGCHAIN_API_KEY is optional, but using it allows us to analyze the trace of the entire process in much greater detail, which makes debugging and optimizing our code easier. It is also a great aid for learning and understanding this technology.

LangSmith is the platform responsible for managing all this tracing information. Among its many advantages, it can monitor the cost and the elapsed time of each request.
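
If you want these traces grouped under their own project rather than the default one, setting the LANGCHAIN_PROJECT environment variable before running the agent is enough (the project name below is arbitrary):

import os

# Group this notebook's traces under a dedicated LangSmith project.
os.environ["LANGCHAIN_PROJECT"] = "oracle-sql-agent-lab"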

To wrap up

As a first step for our lab, it’s not bad at all. However, there are still some details to consider.

  • Memory: At the moment, our agent has no memory. It cannot remember the previous conversation, so its context is incomplete (see the sketch after this list).
  • Privacy: Our employees would be a bit upset if we revealed their personal data or salary information to other companies. This information needs to be obfuscated to comply with even the most basic privacy requirements. For this lab, we have used the data from Oracle’s sample HR schema.
  • Semantic Search: Currently, the search we perform is purely literal, with no semantic component. We can significantly improve it by using vector embeddings.
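
For the memory point, for example, langgraph already ships an in-memory checkpointer that can be plugged into the agent; a minimal sketch (the thread_id value is arbitrary):

from langgraph.checkpoint.memory import MemorySaver

# A checkpointer keeps the conversation state between invocations of the same thread.
agent_with_memory = create_react_agent(
    llm, tools, messages_modifier=system_message, checkpointer=MemorySaver()
)

config = {"configurable": {"thread_id": "hr-session-1"}}
show_final_message(agent_with_memory.invoke(
    {"messages": [HumanMessage(content="What's John Chen's salary?")]}, config
))
# The follow-up question is answered with the previous exchange as context.
show_final_message(agent_with_memory.invoke(
    {"messages": [HumanMessage(content="And which department does he work in?")]}, config
))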

We will address each of these points and improve our agent in future posts. Subscribe to receive updates on them.

5 thoughts on “RAG with Langchain and Oracle Database”
  1. Hi Carlos,
    First of all, lot of thanks for making such an amazing document for RAG with Oracle Database.
    I am facing issue while implementing the same thing. Can you please help-

    In the code snippet, while calling this – agent_executor.stream() in the for loop
    I am getting this error in my windows vm.
    ConnectTimeout: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

    from langchain_core.messages import HumanMessage
    from langgraph.prebuilt import create_react_agent

    agent_executor = create_react_agent(llm, tools, state_modifier=system_message)

    for s in agent_executor.stream(
        {"messages": [HumanMessage(content="Which are the three departments with the most employees?. Show only the department's name")]}
    ):
        print(s)
        print("----")

    1. Hi Biplab,

      It seems like a connectivity issue. Please check the firewall, proxy, or network settings of the VM. Additionally, you can use the Langsmith platform to verify that the requests are reaching the LLM.

      Cheers

  2. Hi, I have multiple tables in my Oracle DB. I want the LLM to use only one specific table; where can I make that change? I tried it in the prompt itself, but it's very slow.
