How to query an Excel file like a database (no formulas)
Stop fighting VLOOKUP and pivot tables. A practical 2026 guide to treating .xlsx files as queryable data — multi-sheet joins, plain-English questions, instant charts.
By The iDBQuery Team
Most people who live in Excel don't actually want a spreadsheet. They want answers from data that happens to live in spreadsheets. Pivot tables, VLOOKUP, and INDEX-MATCH are workarounds for the fact that Excel was built to display values in cells, not to be queried.
This post is about treating an .xlsx file as a database — without exporting, without re-formatting, without writing SQL. It works in 2026 because of three things that didn't exist five years ago: schema inference on arbitrary spreadsheets, plain-English-to-SQL across heterogeneous columns, and federation across multiple sheets in the same workbook.
What "querying Excel like a database" actually means
Concretely: you point a tool at an .xlsx file and it does this in under 10 seconds:
- Reads every sheet (`Sheet1`, `Q4 Orders`, `Customer Master`, etc.)
- Infers the schema for each — column names, data types, primary key candidates
- Detects relationships between sheets (e.g. `Customer Master.id` matches `Orders.customer_id`)
- Lets you ask plain-English questions that return as charts, tables, or both
The result: "show me top 10 customers by Q4 revenue, joined with their region from the master sheet" works without writing a single VLOOKUP.
iDBQuery's Excel integration handles this for .xlsx, .xls, and .xlsm files up to 100 MB. Multi-sheet workbooks with merged cells, formula-derived columns, and named ranges all work.
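To make the first two steps concrete, here's a minimal sketch in pandas: load every sheet, read off the inferred column types, and flag primary-key candidates. This is an illustrative approximation, not iDBQuery's actual pipeline; the file name and the uniqueness heuristic are assumptions.

```python
import pandas as pd

# sheet_name=None loads every sheet into a dict: sheet name -> DataFrame.
# "sales_close.xlsx" is a placeholder file name.
sheets = pd.read_excel("sales_close.xlsx", sheet_name=None)

for name, df in sheets.items():
    types = {col: str(dtype) for col, dtype in df.dtypes.items()}
    # Crude primary-key candidates: fully populated, all-unique columns.
    pk_candidates = [c for c in df.columns if df[c].notna().all() and df[c].is_unique]
    print(name, types, pk_candidates)
```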
The four things that always break with naive approaches
If you've tried "just upload the file to ChatGPT and ask," you've hit at least two of these:
1. Multi-sheet joins
ChatGPT loads sheets independently. Asking "join the Orders sheet with the Customers sheet on customer_id" works the first time, but the relationship is forgotten by the next question. Anything stateful — a multi-step analysis or a saved report — collapses.
The fix: a tool that builds a persistent schema graph at upload time. Once iDBQuery knows that Orders.customer_id references Customers.id, every subsequent question reuses that relationship without you re-stating it.
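A schema graph can start from a simple containment check: a column is a foreign-key candidate when all of its non-null values appear in a unique column of another sheet. A naive sketch (reusing the `sheets` dict from the earlier snippet), not iDBQuery's actual algorithm:

```python
# Hypothetical relationship detection via value containment. Orders.customer_id
# becomes a candidate edge to Customers.id if every non-null value appears in
# a unique Customers.id column. Naive O(sheets^2 * columns^2) sketch.
def detect_relationships(sheets):
    edges = []
    for src_name, src in sheets.items():
        for col in src.columns:
            values = set(src[col].dropna())
            if not values:
                continue
            for dst_name, dst in sheets.items():
                if dst_name == src_name:
                    continue
                for dst_col in dst.columns:
                    if dst[dst_col].is_unique and values <= set(dst[dst_col].dropna()):
                        edges.append((f"{src_name}.{col}", f"{dst_name}.{dst_col}"))
    return edges  # persist this graph so every later question can reuse it
```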
2. Mixed-type columns
Excel lets a single column hold strings, numbers, dates, and #N/A errors. Naive readers either crash, drop the rows, or coerce everything to text — which means you lose the ability to do SUM, AVG, or date-range filters.
The fix: type-inference per cell with explicit handling for the common Excel-isms — empty strings, error values, dates stored as integers, numbers stored as text with thousand separators. Production tools normalize these on import without losing data.
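Here's what that normalization might look like in pandas. The 90% threshold and the error-value list are illustrative assumptions:

```python
import pandas as pd

def normalize_column(col: pd.Series) -> pd.Series:
    """Illustrative per-column cleanup for common Excel-isms (sketch)."""
    s = col.replace({"#N/A": None, "#DIV/0!": None, "": None})
    # Numbers stored as text, possibly with thousand separators ("1,234.50").
    as_num = pd.to_numeric(
        s.astype(str).str.replace(",", "", regex=False), errors="coerce"
    )
    # If the column is mostly numeric, keep the numeric version (assumed cutoff).
    if as_num.notna().sum() >= 0.9 * max(s.notna().sum(), 1):
        return as_num
    return s

# Dates stored as integers count days from Excel's epoch (1899-12-30):
serials = pd.Series([45292, 45321])
print(pd.to_datetime(serials, unit="D", origin="1899-12-30"))
```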
3. Header rows that aren't on row 1
Real spreadsheets have title rows, merged-cell headers, blank rows separating sections, and totals at the bottom. Naive parsers assume row 1 is the header and break on everything else.
The fix: header detection that looks for the first row where most cells are non-empty and the row below is consistent in type. Imperfect — sometimes you have to manually point at the header row — but right >90% of the time on real-world files.
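In code, that heuristic fits in a dozen lines. A sketch, with the 80% fill threshold as an assumed cutoff:

```python
def find_header_row(rows: list[list]) -> int:
    """Return the index of the first plausible header row (sketch).

    Heuristic: mostly non-empty, all-text cells, followed by a row that
    contains at least one non-text value, i.e. data rather than a title.
    """
    for i in range(len(rows) - 1):
        cells = [c for c in rows[i] if c not in (None, "")]
        if not cells or len(cells) < 0.8 * len(rows[i]):
            continue  # too sparse to be a header row
        if not all(isinstance(c, str) for c in cells):
            continue  # headers should be text labels
        if any(not isinstance(c, str) for c in rows[i + 1] if c is not None):
            return i  # the next row carries typed data
    return 0  # fall back to the first row
```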
4. Pivot tables and formula columns
Pivot tables aren't data; they're views over data. Formula columns (=B2*C2) need to be evaluated, not stored as text.
The fix: read the underlying calculated values from the .xlsx file's cached results (Excel stores both the formula and the last-calculated result), then expose them as regular columns. Pivot tables get unrolled back into the source data automatically.
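openpyxl, for instance, exposes those cached results directly: `data_only=True` returns the value Excel last computed rather than the formula string. The file and cell below are placeholders:

```python
from openpyxl import load_workbook

# data_only=True reads the value Excel cached on last save, not the formula.
# (If the file was never recalculated in Excel, the cache can be empty.)
wb = load_workbook("sales_close.xlsx", data_only=True)
ws = wb["Orders"]
print(ws["G2"].value)  # e.g. 1234.5 for a line-total column, not "=E2*F2"
```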
A real workflow: monthly sales close
You get the sales close workbook on the 3rd of every month. Three sheets:
- `Orders` — every line item, ~12,000 rows, columns: `order_id, date, customer_id, sku, qty, unit_price`
- `Customer Master` — 800 rows, columns: `id, company_name, region, account_manager`
- `Returns` — ~600 rows, columns: `order_id, return_date, reason, refund_amount`
The CFO wants: net revenue by region for the month, top 10 customers by net revenue, returns rate by SKU, and a comparison vs the previous month.
In Excel that's an afternoon of pivot tables, lookups, and chart formatting. With iDBQuery, you upload the file once and ask:
"Net revenue by region for last month, with prior-month comparison"
The AI generates this internally:
```sql
SELECT
  cm.region,
  SUM(o.qty * o.unit_price) - COALESCE(SUM(r.refund_amount), 0) AS net_revenue
  /* prior-month variant in a CTE … */
FROM "Orders" o
JOIN "Customer Master" cm ON cm.id = o.customer_id
LEFT JOIN "Returns" r ON r.order_id = o.order_id
WHERE o.date >= '2026-04-01' AND o.date < '2026-05-01'
GROUP BY cm.region;
```
…and returns a bar chart with prior-month comparison overlaid. Then "top 10 customers" reuses the same join. Then "returns rate by SKU" pulls from the third sheet. Each question takes 2–4 seconds; the whole analysis is done in 10 minutes instead of 4 hours.
Save the four-question sequence as a report and next month it re-runs against the new file with one click.
When chat-with-Excel is the wrong tool
It's wrong when:
- The "data" is actually a financial model with thousands of formula cells. Models built in Excel are programs, not data — the value is in the structure. A chat interface treats them as data and loses the program. Keep the model in Excel; export the outputs to query.
- The file changes structure every month. If column names, sheet names, or row layouts shift, schema inference breaks. Either standardize the export or use a tool with explicit schema mapping.
- The dataset is over ~100 MB or millions of rows. At that point you've outgrown Excel as a storage layer. Move to a real database (MySQL, Postgres, SQLite) and connect the same way; a minimal migration sketch follows this list.
- You need real-time updates. Excel files are snapshots. If the source data updates every minute, you want a live database connection, not file uploads.
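For the outgrown-Excel case, the one-time migration is short. A sketch using pandas and SQLite; file and table names are placeholders:

```python
import sqlite3
import pandas as pd

# One-time migration: copy every sheet into a SQLite database, then point
# your query tool at sales.db instead of the workbook.
sheets = pd.read_excel("sales_close.xlsx", sheet_name=None)
con = sqlite3.connect("sales.db")
for name, df in sheets.items():
    df.to_sql(name, con, if_exists="replace", index=False)
print(pd.read_sql('SELECT COUNT(*) AS n FROM "Orders"', con))
con.close()
```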
What about CSVs?
Same story, simpler. CSV upload auto-detects delimiters (comma, tab, semicolon, pipe), quote characters, encoding (UTF-8, UTF-16, Latin-1), and column types. No header-row guessing — the first row of a CSV is treated as the header.
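Python's standard library does a serviceable version of this detection on its own. A sketch with `csv.Sniffer`; the file name is a placeholder:

```python
import csv

# Sniff the delimiter and quoting style from a sample, then parse the file.
with open("export.csv", newline="", encoding="utf-8") as f:
    dialect = csv.Sniffer().sniff(f.read(8192), delimiters=",;\t|")
    f.seek(0)
    reader = csv.reader(f, dialect)
    header = next(reader)  # first row is treated as the header
print(dialect.delimiter, header)
```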
Security: what happens to your file after upload
Three things to verify before uploading anything sensitive:
- Where is the file stored? Files should land in per-account isolated storage, not pooled across tenants.
- Is the data sent to the LLM? It should not be. The LLM gets the schema (column names + 5 sample rows) plus your prompt — never the full dataset (see the sketch after this list). Generated SQL runs against your file inside the tool's infrastructure.
- Can you delete the file? "Delete" should mean hard-delete cascade — the file, the cached schema, the embeddings, the chat history. Some tools soft-delete, which is not deletion.
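To make the second point concrete, here is roughly what a schema-plus-samples payload could look like. An illustrative sketch, not iDBQuery's actual wire format:

```python
import pandas as pd

def schema_context(sheets: dict[str, pd.DataFrame], sample_rows: int = 5) -> dict:
    # Only column names, types, and a handful of sample rows go to the model;
    # the full dataset never leaves the tool's infrastructure.
    return {
        name: {
            "columns": {col: str(dtype) for col, dtype in df.dtypes.items()},
            "samples": df.head(sample_rows).to_dict("records"),
        }
        for name, df in sheets.items()
    }
```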
iDBQuery's security page documents exactly what we send to the model and what we don't.
Conclusion
In 2026, .xlsx files are queryable like databases. Multi-sheet joins, mixed-type columns, header detection, and formula values all work without manual reformatting. You upload, you ask, you get charts. Pivot tables become a museum exhibit.
Try it: drop your monthly close workbook into iDBQuery and ask a question. Free tier handles up to 3 sources and 5 saved reports — no card required.
FAQ
Does it work with .xls (the old Excel format)? Yes. All three formats are supported: .xls (binary), .xlsx (zipped XML), and .xlsm (macro-enabled). Macro code is ignored; only the data is read.
What if my workbook has 30 sheets? All sheets are loaded and queryable. The AI picks the relevant sheets based on your question, so you don't have to remember which sheet has what. For workbooks with 100+ sheets, performance degrades — split into multiple files.
Can I update the file and re-run my saved questions? Yes. Re-upload the file (same name) and saved reports re-run against the new data. Schema changes between versions are detected and surfaced before the report runs.
Does it preserve cell formatting (bold, colors, number formats)? No — formatting is presentation, not data. Currency formats and date formats are inferred from cell values. Visual formatting is dropped.
Can I join two separate Excel files? Yes — multi-source projects let you upload N files and join across all of them in one chat. No data movement; the join happens in-process.
Is there a row limit? 100 MB file size limit, which is roughly 1–2M rows depending on column count and content. Above that, switch to a real database connection.
What about charts that already exist in my workbook? They're ignored on import. Re-create them from chat prompts — typically faster than the original took to build, and they update automatically when you re-upload the file.