CREATE OR REPLACE TABLE test
(
id UInt64,
Name TEXT,
updatedAt DateTime DEFAULT now()
)
ENGINE = MergeTree
ORDER BY id;
通过 IMPORT FOREIGN SCHEMA 创建的外部表:
CREATE TABLE test
(
id BIGINT NOT NULL,
"Name" TEXT NOT NULL,
"updatedAt" TIMESTAMPTZ NOT NULL
);
查询时需正确使用双引号:
SELECT id, "Name", "updatedAt" FROM test;
若需创建名称不同或全小写(大小写不敏感)的对象,请使用 CREATE FOREIGN TABLE。
[root@bigdata ~]# docker run -d --network host --name clickhouse -p 8123:8123 -p 9000:9000 --ulimit nofile=262144:262144 clickhouse
Unable to find image 'clickhouse:latest' locally
latest: Pulling from library/clickhouse
7e49dc6156b0: Pull complete
66a9f27340b0: Pull complete
4ea1943fd65c: Pull complete
c4e91a1791f4: Pull complete
b95838ab0602: Pull complete
e70e29afad90: Pull complete
596aaa9e4cb7: Pull complete
077cf31b66a6: Pull complete
Digest: sha256:0c1acee29c905829331544ec71342ed4346a6383da60f0c488f68b6b45009010
Status: Downloaded newer image for clickhouse:latest
WARNING: Published ports are discarded when using host network mode
22ec632560f4d7ddafc04610623ceb80e9588ac5fa702c13eff57b15ad8578ac
[root@bigdata ~]# docker exec -it clickhouse clickhouse-client
ClickHouse client version 25.11.2.24 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 25.11.2.
Warnings:
* Delay accounting is not enabled, OSIOWaitMicroseconds will not be gathered. You can enable it using `sudo sh -c 'echo 1 > /proc/sys/kernel/task_delayacct'` or by using sysctl.
* Linux transparent hugepages are set to "always". Check /sys/kernel/mm/transparent_hugepage/enabled
* Available disk space for data at server startup is too low (1GiB): /var/lib/clickhouse/
* Available disk space for logs at server startup is too low (1GiB): /var/log/clickhouse-server
1. 简单关联查询(替代 dictGet)
taxi=# SELECT
count(1) AS total,
"Borough" AS borough_name
FROM taxi.trips
JOIN taxi.taxi_zone_dictionary
ON trips.pickup_nyct2010_gid = toUInt64(taxi.taxi_zone_dictionary."LocationID")
WHERE pickup_nyct2010_gid > 0 -- 过滤无效 gid
AND dropoff_nyct2010_gid IN (132, 138) -- 目的地为机场
GROUP BY "Borough"
ORDER BY total DESC;
total | borough_name
-------+---------------
7053 | Manhattan
6828 | Brooklyn
4458 | Queens
2670 | Bronx
554 | Staten Island
53 | EWR
(6 行记录)
时间:48.449 毫秒
说明:该查询结果与前文 dictGet 示例一致(不含 Unknown 行),底层 ClickHouse 会自动优化为字典查询,JOIN 语法更符合 SQL 开发者习惯。
2. 查看关联查询计划
taxi=# explain SELECT
count(1) AS total,
"Borough"
FROM taxi.trips
JOIN taxi.taxi_zone_dictionary
ON trips.pickup_nyct2010_gid = toUInt64(taxi.taxi_zone_dictionary."LocationID")
WHERE pickup_nyct2010_gid > 0
AND dropoff_nyct2010_gid IN (132, 138)
GROUP BY "Borough"
ORDER BY total DESC;
查询计划
-----------------------------------------------------------------------
外部扫描 (cost=1.00..5.10 rows=1000 width=40)
Relations: 聚合 on ((trips) INNER JOIN (taxi_zone_dictionary))
(2 行记录)
时间:2.012 毫秒
结果表明:关联查询已完全下推至 ClickHouse 执行,PostgreSQL 仅接收最终结果。
3. 复杂关联查询(筛选高小费行程)
查询小费金额前 1000 的行程,并关联字典获取下车点行政区信息:
taxi=# SELECT *
FROM taxi.trips
JOIN taxi.taxi_zone_dictionary
ON trips.dropoff_nyct2010_gid = taxi.taxi_zone_dictionary."LocationID"
WHERE tip_amount > 0
ORDER BY tip_amount DESC
LIMIT 1000;
注意事项
避免使用 SELECT *:查询时应明确指定所需列,减少数据传输量,提升性能。
数据类型兼容性:关联查询时需确保关联字段类型一致(如使用 toUInt64 转换类型),避免类型不匹配导致查询失败。
字典与表关联:ClickHouse 字典本质是内存中的维度表,关联查询性能优于普通表 JOIN,适合高频访问的维度数据。