skip to content
Pradeep Chhetri

Software Engineer. Writing about databases, infrastructure, and distributed systems.

Main navigation

  • Home
  • Blog
  • TIL
GitHub LinkedIn RSS

DuckDB: Query SQLite and PostgreSQL Data

October 4, 2020 · TIL

Tags:
  • duckdb
  • postgresql
  • sqlite

Introduction

DuckDB can query data stored in SQLite and PostgreSQL using SQLiteScanner and PostgresScanner respectively. DuckDB claims reads via DuckDB client will be faster than reads via their respective datastore client. Let’s test it out.

Note:

  • This is not any kind of full-blown performance analysis, we just trying one random query and comparing its query duration.
  • To prevent performance difference due to indexes, we are not utilizing indexes at all.
  • We didn’t tweak any of the PRAGMA or configuration variables and used the default configuration for all three datastores.
  • We ran all the commands in the same physical host.

Let’s load one month of NYC Taxi Dataset (14M rows) into SQLite(v3.33.0) and try to query it via SQLite and DuckDB.

Querying SQLite data from SQLite

sqlite> SELECT
   ...>    COUNT(DISTINCT vendor_name) AS count_vendor,
   ...>    COUNT(DISTINCT Passenger_Count) AS count_passenger,
   ...>    COUNT(DISTINCT Trip_Distance) AS count_trip,
   ...>    COUNT(DISTINCT Rate_Code) AS count_code,
   ...>    COUNT(DISTINCT store_and_forward) AS count_store,
   ...>    COUNT(DISTINCT Payment_Type) AS count_payment
   ...> FROM taxi;
4|8|5748|1|3|6
Run Time: real 10.936 user 9.542476 sys 0.840094

Querying SQLite data from DuckDB

duckdb> LOAD 'build/release/sqlite_scanner.duckdb_extension';
duckdb> CALL sqlite_attach('sqlite3.db');
duckdb> SELECT
>    COUNT(DISTINCT vendor_name) AS count_vendor,
>    COUNT(DISTINCT Passenger_Count) AS count_passenger,
>    COUNT(DISTINCT Trip_Distance) AS count_trip,
>    COUNT(DISTINCT Rate_Code) AS count_code,
>    COUNT(DISTINCT store_and_forward) AS count_store,
>    COUNT(DISTINCT Payment_Type) AS count_payment
> FROM taxi;
┌──────────────┬─────────────────┬────────────┬────────────┬─────────────┬───────────────┐
│ count_vendor │ count_passenger │ count_trip │ count_code │ count_store │ count_payment │
├──────────────┼─────────────────┼────────────┼────────────┼─────────────┼───────────────┤
│ 4            │ 8               │ 5748       │ 1          │ 3           │ 6             │
└──────────────┴─────────────────┴────────────┴────────────┴─────────────┴───────────────┘
Run Time: real 7.127 user 12.567104 sys 22.495830
Query ViaQuery Duration (sec)
SQLite11
DuckDB7

Let’s load same one month of NYC Taxi Dataset into PostgreSQL(v14.3) and try to query it via PostgreSQL and DuckDB.

Querying PostgreSQL data from PostgreSQL

postgres> SELECT
   COUNT(DISTINCT "vendor_name") AS count_vendor,
   COUNT(DISTINCT "Passenger_Count") AS count_passenger,
   COUNT(DISTINCT "Trip_Distance") AS count_trip,
   COUNT(DISTINCT "Rate_Code") AS count_code,
   COUNT(DISTINCT "store_and_forward") AS count_store,
   COUNT(DISTINCT "Payment_Type") AS count_payment
FROM taxi;
 count_vendor | count_passenger | count_trip | count_code | count_store | count_payment
--------------+-----------------+------------+------------+-------------+---------------
            4 |               8 |       5748 |          1 |           3 |             6
(1 row)

Time: 27258.238 ms (00:27.258)

Querying PostgreSQL data from DuckDB

$ ./duckdb/build/release/duckdb
v0.3.5-dev258 fe433d317
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
duckdb> LOAD 'build/release/postgres_scanner.duckdb_extension';
duckdb> CALL POSTGRES_ATTACH('host=127.0.0.1 port=5432 user=postgres dbname=postgres');
duckdb> SELECT
>    COUNT(DISTINCT vendor_name) AS count_vendor,
>    COUNT(DISTINCT Passenger_Count) AS count_passenger,
>    COUNT(DISTINCT Trip_Distance) AS count_trip,
>    COUNT(DISTINCT Rate_Code) AS count_code,
>    COUNT(DISTINCT store_and_forward) AS count_store,
>    COUNT(DISTINCT Payment_Type) AS count_payment
> FROM taxi;
┌──────────────┬─────────────────┬────────────┬────────────┬─────────────┬───────────────┐
│ count_vendor │ count_passenger │ count_trip │ count_code │ count_store │ count_payment │
├──────────────┼─────────────────┼────────────┼────────────┼─────────────┼───────────────┤
│ 4            │ 8               │ 5748       │ 1          │ 3           │ 6             │
└──────────────┴─────────────────┴────────────┴────────────┴─────────────┴───────────────┘
Run Time: real 3.472 user 5.938940 sys 0.657260
Query ViaQuery Duration (sec)
PostgreSQL27
DuckDB3.5

Conclusion

Overall DuckDB claims were correct and it is already proving itself faster than well known databases.

Need to debug more

I was expecting PostgreSQL to be fastest among all three but it proved to be slowest for the query we took.

© 2026 Pradeep Chhetri