diff --git a/docs/en/antalya/swarm.md b/docs/en/antalya/swarm.md
new file mode 100644
index 000000000000..a26f9de26e0a
--- /dev/null
+++ b/docs/en/antalya/swarm.md
@@ -0,0 +1,73 @@
+# Antalya branch
+
+## Swarm
+
+### Differences from the upstream version
+
+#### `storage_type` argument in object storage functions
+
+In upstream ClickHouse, there are several table functions to read Iceberg tables from different storage backends, such as `icebergLocal`, `icebergS3`, `icebergAzure`, `icebergHDFS`, their cluster variants, the `iceberg` function as a synonym for `icebergS3`, and table engines like `IcebergLocal`, `IcebergS3`, `IcebergAzure`, and `IcebergHDFS`.
+
+In the Antalya branch, the `iceberg` table function and the `Iceberg` table engine unify all these variants by using a new named argument, `storage_type`, which can be one of `local`, `s3`, `azure`, or `hdfs`.
+
+Old syntax examples:
+
+```sql
+SELECT * FROM icebergS3('http://minio1:9000/root/table_data', 'minio', 'minio123', 'Parquet');
+SELECT * FROM icebergAzureCluster('mycluster', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
+CREATE TABLE mytable ENGINE=IcebergHDFS('/table_data', 'Parquet');
+```
+
+New syntax examples:
+
+```sql
+SELECT * FROM iceberg(storage_type='s3', 'http://minio1:9000/root/table_data', 'minio', 'minio123', 'Parquet');
+SELECT * FROM icebergCluster('mycluster', storage_type='azure', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
+CREATE TABLE mytable ENGINE=Iceberg('/table_data', 'Parquet', storage_type='hdfs');
+```
+
+Also, if a named collection is used to store access parameters, the field `storage_type` can be included in the same named collection:
+
+```xml
+<named_collections>
+    <s3>
+        <url>http://minio1:9001/root/</url>
+        <access_key_id>minio</access_key_id>
+        <secret_access_key>minio123</secret_access_key>
+        <storage_type>s3</storage_type>
+    </s3>
+</named_collections>
+```
+
+```sql
+SELECT * FROM iceberg(s3, filename='table_data');
+```
+
+By default, `storage_type` is `'s3'` to maintain backward compatibility.
+
+
+#### `object_storage_cluster` setting
+
+The new setting `object_storage_cluster` controls whether the single-node or the cluster variant of a table function reading from object storage is used (e.g., `s3`, `azure`, `iceberg`, with cluster variants `s3Cluster`, `azureCluster`, `icebergCluster`).
+
+Old syntax examples:
+
+```sql
+SELECT * FROM s3Cluster('myCluster', 'http://minio1:9001/root/data/{clickhouse,database}/*', 'minio', 'minio123', 'CSV',
+    'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))');
+SELECT * FROM icebergAzureCluster('mycluster', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
+```
+
+New syntax examples:
+
+```sql
+SELECT * FROM s3('http://minio1:9001/root/data/{clickhouse,database}/*', 'minio', 'minio123', 'CSV',
+    'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))')
+    SETTINGS object_storage_cluster='myCluster';
+SELECT * FROM icebergAzure('http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet')
+    SETTINGS object_storage_cluster='myCluster';
+```
+
+This setting also applies to table engines and can be used with tables managed by an Iceberg Catalog.
+
+Note: Upstream ClickHouse has introduced analogous settings, such as `parallel_replicas_for_cluster_engines` and `cluster_for_parallel_replicas`. Since version 25.10, these settings work with table engines. It is possible that the `object_storage_cluster` setting will be deprecated in the future.
diff --git a/docs/en/engines/table-engines/integrations/iceberg.md b/docs/en/engines/table-engines/integrations/iceberg.md
index af50e1099e4d..532d2f0eef31 100644
--- a/docs/en/engines/table-engines/integrations/iceberg.md
+++ b/docs/en/engines/table-engines/integrations/iceberg.md
@@ -324,6 +324,62 @@ SETTINGS iceberg_metadata_staleness_ms=120000
 
 **Note**: Current expectation is that metadata cache size is sufficient to hold the latest metadata snapshot in full for all active tables, if asynchronous prefetching is enabled.
 
+## Altinity Antalya branch
+
+### Specify storage type in arguments
+
+Only in the Altinity Antalya branch does the `Iceberg` table engine support all storage types. The storage type can be specified using the named argument `storage_type`. Supported values are `s3`, `azure`, `hdfs`, and `local`.
+
+```sql
+CREATE TABLE iceberg_table_s3
+    ENGINE = Iceberg(storage_type='s3', url, [, NOSIGN | access_key_id, secret_access_key, [session_token]], format, [,compression])
+
+CREATE TABLE iceberg_table_azure
+    ENGINE = Iceberg(storage_type='azure', connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression])
+
+CREATE TABLE iceberg_table_hdfs
+    ENGINE = Iceberg(storage_type='hdfs', path_to_table, [,format] [,compression_method])
+
+CREATE TABLE iceberg_table_local
+    ENGINE = Iceberg(storage_type='local', path_to_table, [,format] [,compression_method])
+```
+
+### Specify storage type in named collection
+
+Only in the Altinity Antalya branch can `storage_type` be included as part of a named collection. This allows for centralized configuration of storage settings.
+
+```xml
+<clickhouse>
+    <named_collections>
+        <iceberg_conf>
+            <url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
+            <access_key_id>test</access_key_id>
+            <secret_access_key>test</secret_access_key>
+            <format>auto</format>
+            <structure>auto</structure>
+            <storage_type>s3</storage_type>
+        </iceberg_conf>
+    </named_collections>
+</clickhouse>
+```
+
+```sql
+CREATE TABLE iceberg_table ENGINE=Iceberg(iceberg_conf, filename = 'test_table')
+```
+
+The default value for `storage_type` is `s3`.
+
+### The `object_storage_cluster` setting
+
+Only in the Altinity Antalya branch is an alternative syntax for the `Iceberg` table engine available. This syntax allows execution on a cluster when the `object_storage_cluster` setting is non-empty and contains a cluster name.
+
+```sql
+CREATE TABLE iceberg_table_s3
+    ENGINE = Iceberg(storage_type='s3', url, [, NOSIGN | access_key_id, secret_access_key, [session_token]], format, [,compression]);
+
+SELECT * FROM iceberg_table_s3 SETTINGS object_storage_cluster='cluster_simple';
+```
+
 ## See also {#see-also}
 
 - [iceberg table function](/sql-reference/table-functions/iceberg.md)
diff --git a/docs/en/sql-reference/distribution-on-cluster.md b/docs/en/sql-reference/distribution-on-cluster.md
new file mode 100644
index 000000000000..3a9835e23856
--- /dev/null
+++ b/docs/en/sql-reference/distribution-on-cluster.md
@@ -0,0 +1,23 @@
+# Task distribution in *Cluster family functions
+
+## Task distribution algorithm
+
+Table functions such as `s3Cluster`, `azureBlobStorageCluster`, `hdfsCluster`, `icebergCluster`, and table engines like `S3`, `Azure`, `HDFS`, and `Iceberg` with the setting `object_storage_cluster` distribute tasks across all cluster nodes, or across a subset limited by the `object_storage_max_nodes` setting, which caps the number of nodes involved in processing a distributed query; the subset is selected randomly for each query.
+
+A single task corresponds to processing one source file.
+
+For each file, one cluster node is selected as the primary node using a consistent Rendezvous Hashing algorithm. This algorithm guarantees that:
+ * The same node is consistently selected as primary for each file, as long as the cluster remains unchanged.
+ * When the cluster changes (nodes added or removed), only files assigned to the affected nodes change their primary node assignment.
+
+This improves cache efficiency by minimizing data movement among nodes.
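The primary-node selection described above can be sketched in a few lines. This is an illustrative Python sketch, not the actual ClickHouse implementation; the hash function and the node names are assumptions made for the example:

```python
import hashlib

def primary_node(file_path, nodes):
    """Rendezvous hashing: score every (node, file) pair and pick the
    highest-scoring node as the file's primary node."""
    def score(node):
        digest = hashlib.sha256(f"{node}|{file_path}".encode()).digest()
        return int.from_bytes(digest, "big")
    return max(nodes, key=score)

nodes = ["node1", "node2", "node3"]
files = [f"part_{i}.parquet" for i in range(16)]

before = {f: primary_node(f, nodes) for f in files}
# Drop node3: only the files whose primary was node3 are reassigned,
# so the file caches on node1 and node2 stay warm.
after = {f: primary_node(f, [n for n in nodes if n != "node3"]) for f in files}

moved = [f for f in files if before[f] != after[f]]
assert all(before[f] == "node3" for f in moved)
```

Because each file's winner is computed independently per node, removing a node can only reassign the files that node owned, and adding a node only claims the files it now wins; this is exactly the stability property the two guarantees above describe.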
+
+## `lock_object_storage_task_distribution_ms` setting
+
+Each node begins by processing the files for which it is the primary node. After completing its assigned files, a node may take tasks from other nodes, either immediately or after waiting `lock_object_storage_task_distribution_ms` milliseconds, if the primary node does not request new files during that interval. The default value of `lock_object_storage_task_distribution_ms` is 500 milliseconds. This setting balances caching efficiency against workload redistribution when nodes are imbalanced.
+
+## `SYSTEM STOP SWARM MODE` command
+
+If a node needs to shut down gracefully, the command `SYSTEM STOP SWARM MODE` prevents the node from receiving new tasks for *Cluster-family queries. The node finishes processing the files already assigned to it, after which it can shut down safely without errors.
+
+Receiving new tasks can be resumed with the command `SYSTEM START SWARM MODE`.
diff --git a/docs/en/sql-reference/table-functions/azureBlobStorageCluster.md b/docs/en/sql-reference/table-functions/azureBlobStorageCluster.md
index 4db1dbb594c6..b67ea0efe2e2 100644
--- a/docs/en/sql-reference/table-functions/azureBlobStorageCluster.md
+++ b/docs/en/sql-reference/table-functions/azureBlobStorageCluster.md
@@ -54,6 +54,20 @@ SELECT count(*) FROM azureBlobStorageCluster(
 
 See [azureBlobStorage](/sql-reference/table-functions/azureBlobStorage#using-shared-access-signatures-sas-sas-tokens) for examples.
 
+## Altinity Antalya branch
+
+### `object_storage_cluster` setting
+
+Only in the Altinity Antalya branch is an alternative syntax for the `azureBlobStorageCluster` table function available. It allows the `azureBlobStorage` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over Azure Blob Storage across a ClickHouse cluster.
+
+```sql
+SELECT count(*) FROM azureBlobStorage(
+    'http://azurite1:10000/devstoreaccount1', 'testcontainer', 'test_cluster_count.csv', 'devstoreaccount1',
+    'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'CSV',
+    'auto', 'key UInt64')
+SETTINGS object_storage_cluster='cluster_simple'
+```
+
 ## Related {#related}
 
 - [AzureBlobStorage engine](../../engines/table-engines/integrations/azureBlobStorage.md)
diff --git a/docs/en/sql-reference/table-functions/deltalakeCluster.md b/docs/en/sql-reference/table-functions/deltalakeCluster.md
index f01b40d5ce6f..865399172ff3 100644
--- a/docs/en/sql-reference/table-functions/deltalakeCluster.md
+++ b/docs/en/sql-reference/table-functions/deltalakeCluster.md
@@ -45,6 +45,17 @@ A table with the specified structure for reading data from cluster in the specif
 - `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
 - `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.
 
+## Altinity Antalya branch
+
+### `object_storage_cluster` setting
+
+Only in the Altinity Antalya branch is an alternative syntax for the `deltaLakeCluster` table function available. It allows the `deltaLake` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over Delta Lake storage across a ClickHouse cluster.
+
+```sql
+SELECT count(*) FROM deltaLake(url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
+SETTINGS object_storage_cluster='cluster_simple'
+```
+
 ## Related {#related}
 
 - [deltaLake engine](engines/table-engines/integrations/deltalake.md)
diff --git a/docs/en/sql-reference/table-functions/hdfsCluster.md b/docs/en/sql-reference/table-functions/hdfsCluster.md
index 74a526a2de7b..bfc4ab30fd5b 100644
--- a/docs/en/sql-reference/table-functions/hdfsCluster.md
+++ b/docs/en/sql-reference/table-functions/hdfsCluster.md
@@ -60,6 +60,18 @@ FROM hdfsCluster('cluster_simple', 'hdfs://hdfs1:9000/{some,another}_dir/*', 'TS
 If your listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.
 :::
 
+## Altinity Antalya branch
+
+### `object_storage_cluster` setting
+
+Only in the Altinity Antalya branch is an alternative syntax for the `hdfsCluster` table function available. It allows the `hdfs` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over HDFS across a ClickHouse cluster.
+
+```sql
+SELECT count(*)
+FROM hdfs('hdfs://hdfs1:9000/{some,another}_dir/*', 'TSV', 'name String, value UInt32')
+SETTINGS object_storage_cluster='cluster_simple'
+```
+
 ## Related {#related}
 
 - [HDFS engine](../../engines/table-engines/integrations/hdfs.md)
diff --git a/docs/en/sql-reference/table-functions/hudiCluster.md b/docs/en/sql-reference/table-functions/hudiCluster.md
index 3f44a369d062..1087ef51cb84 100644
--- a/docs/en/sql-reference/table-functions/hudiCluster.md
+++ b/docs/en/sql-reference/table-functions/hudiCluster.md
@@ -43,6 +43,18 @@ A table with the specified structure for reading data from cluster in the specif
 - `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
 - `_etag` — The etag of the file. Type: `LowCardinality(String)`.
If the etag is unknown, the value is `NULL`.
 
+## Altinity Antalya branch
+
+### `object_storage_cluster` setting
+
+Only in the Altinity Antalya branch is an alternative syntax for the `hudiCluster` table function available. It allows the `hudi` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over Hudi storage across a ClickHouse cluster.
+
+```sql
+SELECT *
+FROM hudi(url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
+SETTINGS object_storage_cluster='cluster_simple'
+```
+
 ## Related {#related}
 
 - [Hudi engine](engines/table-engines/integrations/hudi.md)
diff --git a/docs/en/sql-reference/table-functions/iceberg.md b/docs/en/sql-reference/table-functions/iceberg.md
index a4917c286e7e..c0c27b384429 100644
--- a/docs/en/sql-reference/table-functions/iceberg.md
+++ b/docs/en/sql-reference/table-functions/iceberg.md
@@ -649,6 +649,47 @@ GRANT ALTER TABLE ON my_iceberg_table TO my_user;
 
 - The catalog's own authorization (REST catalog auth, AWS Glue IAM, etc.) is enforced independently when ClickHouse updates the metadata
 :::
 
+## Altinity Antalya branch
+
+### Specify storage type in arguments
+
+Only in the Altinity Antalya branch does the `iceberg` table function support all storage types. The storage type can be specified using the named argument `storage_type`. Supported values are `s3`, `azure`, `hdfs`, and `local`.
+
+```sql
+iceberg(storage_type='s3', url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method])
+
+iceberg(storage_type='azure', connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method])
+
+iceberg(storage_type='hdfs', path_to_table, [,format] [,compression_method])
+
+iceberg(storage_type='local', path_to_table, [,format] [,compression_method])
+```
+
+### Specify storage type in named collection
+
+Only in the Altinity Antalya branch can `storage_type` be included as part of a named collection. This allows for centralized configuration of storage settings.
+
+```xml
+<clickhouse>
+    <named_collections>
+        <iceberg_conf>
+            <url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
+            <access_key_id>test</access_key_id>
+            <secret_access_key>test</secret_access_key>
+            <format>auto</format>
+            <structure>auto</structure>
+            <storage_type>s3</storage_type>
+        </iceberg_conf>
+    </named_collections>
+</clickhouse>
+```
+
+```sql
+iceberg(named_collection[, option=value [,..]])
+```
+
+The default value for `storage_type` is `s3`.
+
 ## See Also {#see-also}
 
 * [Iceberg engine](/engines/table-engines/integrations/iceberg.md)
diff --git a/docs/en/sql-reference/table-functions/icebergCluster.md b/docs/en/sql-reference/table-functions/icebergCluster.md
index d3ce33579d3e..f1db4c4f44ac 100644
--- a/docs/en/sql-reference/table-functions/icebergCluster.md
+++ b/docs/en/sql-reference/table-functions/icebergCluster.md
@@ -50,6 +50,81 @@ SELECT * FROM icebergS3Cluster('cluster_simple', 'http://test.s3.amazonaws.com/c
 
 - `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
 - `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.
 
+## Altinity Antalya branch
+
+### `icebergLocalCluster` table function
+
+Only in the Altinity Antalya branch, the `icebergLocalCluster` table function is available. It is designed for distributed cluster queries when Iceberg data is stored on shared network storage mounted at a local path. The path must be identical on all replicas.
+
+```sql
+icebergLocalCluster(cluster_name, path_to_table, [,format] [,compression_method])
+```
+
+### Specify storage type in function arguments
+
+Only in the Altinity Antalya branch, the `icebergCluster` table function supports all storage backends. The storage backend can be specified using the named argument `storage_type`. Valid values include `s3`, `azure`, `hdfs`, and `local`.
+
+```sql
+icebergCluster(storage_type='s3', cluster_name, url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method])
+
+icebergCluster(storage_type='azure', cluster_name, connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method])
+
+icebergCluster(storage_type='hdfs', cluster_name, path_to_table, [,format] [,compression_method])
+
+icebergCluster(storage_type='local', cluster_name, path_to_table, [,format] [,compression_method])
+```
+
+### Specify storage type in a named collection
+
+Only in the Altinity Antalya branch, `storage_type` can be part of a named collection.
+
+```xml
+<clickhouse>
+    <named_collections>
+        <iceberg_conf>
+            <url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
+            <access_key_id>test</access_key_id>
+            <secret_access_key>test</secret_access_key>
+            <format>auto</format>
+            <structure>auto</structure>
+            <storage_type>s3</storage_type>
+        </iceberg_conf>
+    </named_collections>
+</clickhouse>
+```
+
+```sql
+icebergCluster(iceberg_conf[, option=value [,..]])
+```
+
+The default value for `storage_type` is `s3`.
+
+### `object_storage_cluster` setting
+
+Only in the Altinity Antalya branch, an alternative syntax for the `icebergCluster` table function is available. It allows the `iceberg` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over Iceberg tables across a ClickHouse cluster.
+ +```sql +icebergS3(url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name' + +icebergAzure(connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name' + +icebergHDFS(path_to_table, [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name' + +icebergLocal(path_to_table, [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name' + +icebergS3(option=value [,..]) SETTINGS object_storage_cluster='cluster_name' + +iceberg(storage_type='s3', url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name' + +iceberg(storage_type='azure', connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name' + +iceberg(storage_type='hdfs', path_to_table, [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name' + +iceberg(storage_type='local', path_to_table, [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name' + +iceberg(iceberg_conf[, option=value [,..]]) SETTINGS object_storage_cluster='cluster_name' +``` + **See Also** - [Iceberg engine](/engines/table-engines/integrations/iceberg.md) diff --git a/docs/en/sql-reference/table-functions/s3Cluster.md b/docs/en/sql-reference/table-functions/s3Cluster.md index 2e6af0273ba0..f0cce77a0b49 100644 --- a/docs/en/sql-reference/table-functions/s3Cluster.md +++ b/docs/en/sql-reference/table-functions/s3Cluster.md @@ -91,6 +91,23 @@ Users can use the same approaches as document for the s3 function [here](/sql-re For details on optimizing the performance of the s3 function see [our detailed guide](/integrations/s3/performance). 
+## Altinity Antalya branch
+
+### `object_storage_cluster` setting
+
+Only in the Altinity Antalya branch is an alternative syntax for the `s3Cluster` table function available. It allows the `s3` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over S3 storage across a ClickHouse cluster.
+
+```sql
+SELECT * FROM s3(
+    'http://minio1:9001/root/data/{clickhouse,database}/*',
+    'minio',
+    'ClickHouse_Minio_P@ssw0rd',
+    'CSV',
+    'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))'
+) ORDER BY (name, value, polygon)
+SETTINGS object_storage_cluster='cluster_simple'
+```
+
 ## Related {#related}
 
 - [S3 engine](../../engines/table-engines/integrations/s3.md)
diff --git a/src/Analyzer/FunctionNode.cpp b/src/Analyzer/FunctionNode.cpp
index 306f64db3bae..52ef493021fd 100644
--- a/src/Analyzer/FunctionNode.cpp
+++ b/src/Analyzer/FunctionNode.cpp
@@ -12,6 +12,7 @@
 #include 
 #include 
+#include 
 #include 
@@ -164,6 +165,13 @@ void FunctionNode::dumpTreeImpl(WriteBuffer & buffer, FormatState & format_state
         buffer << '\n' << std::string(indent + 2, ' ') << "WINDOW\n";
         getWindowNode()->dumpTreeImpl(buffer, format_state, indent + 4);
     }
+
+    if (!settings_changes.empty())
+    {
+        buffer << '\n' << std::string(indent + 2, ' ') << "SETTINGS";
+        for (const auto & change : settings_changes)
+            buffer << fmt::format(" {}={}", change.name, fieldToString(change.value));
+    }
 }
 
 bool FunctionNode::isEqualImpl(const IQueryTreeNode & rhs, CompareOptions compare_options) const
@@ -171,7 +179,7 @@ bool FunctionNode::isEqualImpl(const IQueryTreeNode & rhs, CompareOptions compar
     const auto & rhs_typed = assert_cast(rhs);
     if (function_name != rhs_typed.function_name || isAggregateFunction() != rhs_typed.isAggregateFunction()
         || isOrdinaryFunction() != rhs_typed.isOrdinaryFunction() || isWindowFunction() != rhs_typed.isWindowFunction()
-        || nulls_action != rhs_typed.nulls_action)
+        || nulls_action !=
rhs_typed.nulls_action || settings_changes != rhs_typed.settings_changes) return false; /// is_operator is ignored here because it affects only AST formatting @@ -206,6 +214,17 @@ void FunctionNode::updateTreeHashImpl(HashState & hash_state, CompareOptions com hash_state.update(isWindowFunction()); hash_state.update(nulls_action); + hash_state.update(settings_changes.size()); + for (const auto & change : settings_changes) + { + hash_state.update(change.name.size()); + hash_state.update(change.name); + + const auto & value_dump = change.value.dump(); + hash_state.update(value_dump.size()); + hash_state.update(value_dump); + } + /// is_operator is ignored here because it affects only AST formatting if (!compare_options.compare_types) @@ -230,6 +249,7 @@ QueryTreeNodePtr FunctionNode::cloneImpl() const result_function->nulls_action = nulls_action; result_function->wrap_with_nullable = wrap_with_nullable; result_function->is_operator = is_operator; + result_function->settings_changes = settings_changes; return result_function; } @@ -292,6 +312,14 @@ ASTPtr FunctionNode::toASTImpl(const ConvertToASTOptions & options) const function_ast->window_definition = window_node->toAST(new_options); } + if (!settings_changes.empty()) + { + auto settings_ast = make_intrusive(); + settings_ast->changes = settings_changes; + settings_ast->is_standalone = false; + function_ast->arguments->children.push_back(settings_ast); + } + return function_ast; } diff --git a/src/Analyzer/FunctionNode.h b/src/Analyzer/FunctionNode.h index c0005016def6..0ec99c9ab40c 100644 --- a/src/Analyzer/FunctionNode.h +++ b/src/Analyzer/FunctionNode.h @@ -10,6 +10,7 @@ #include #include #include +#include namespace DB { @@ -204,6 +205,18 @@ class FunctionNode final : public IQueryTreeNode wrap_with_nullable = true; } + /// Get settings changes passed to table function + const SettingsChanges & getSettingsChanges() const + { + return settings_changes; + } + + /// Set settings changes passed as last argument to 
table function
+    void setSettingsChanges(SettingsChanges settings_changes_)
+    {
+        settings_changes = std::move(settings_changes_);
+    }
+
     void dumpTreeImpl(WriteBuffer & buffer, FormatState & format_state, size_t indent) const override;
 
 protected:
@@ -228,6 +241,8 @@ class FunctionNode final : public IQueryTreeNode
     static constexpr size_t arguments_child_index = 1;
     static constexpr size_t window_child_index = 2;
     static constexpr size_t children_size = window_child_index + 1;
+
+    SettingsChanges settings_changes;
 };
 
 }
diff --git a/src/Analyzer/FunctionSecretArgumentsFinderTreeNode.h b/src/Analyzer/FunctionSecretArgumentsFinderTreeNode.h
index 8bcb6e147420..e4f63192c95b 100644
--- a/src/Analyzer/FunctionSecretArgumentsFinderTreeNode.h
+++ b/src/Analyzer/FunctionSecretArgumentsFinderTreeNode.h
@@ -71,8 +71,14 @@ class FunctionTreeNodeImpl : public AbstractFunction
     {
     public:
         explicit ArgumentsTreeNode(const QueryTreeNodes * arguments_) : arguments(arguments_) {}
-        size_t size() const override { return arguments ? arguments->size() : 0; }
-        std::unique_ptr at(size_t n) const override { return std::make_unique(arguments->at(n).get()); }
+        size_t size() const override
+        { /// size without skipped indexes
+            return arguments ?
arguments->size() - skippedSize() : 0; + } + std::unique_ptr at(size_t n) const override + { /// n is relative index, some can be skipped + return std::make_unique(arguments->at(getRealIndex(n)).get()); + } private: const QueryTreeNodes * arguments = nullptr; }; diff --git a/src/Analyzer/QueryTreeBuilder.cpp b/src/Analyzer/QueryTreeBuilder.cpp index d11fecf3011a..37c0d0472429 100644 --- a/src/Analyzer/QueryTreeBuilder.cpp +++ b/src/Analyzer/QueryTreeBuilder.cpp @@ -762,7 +762,12 @@ QueryTreeNodePtr QueryTreeBuilder::buildExpression(const ASTPtr & expression, co { const auto & function_arguments_list = function->arguments->as()->children; for (const auto & argument : function_arguments_list) - function_node->getArguments().getNodes().push_back(buildExpression(argument, context)); + { + if (const auto * ast_set = argument->as()) + function_node->setSettingsChanges(ast_set->changes); + else + function_node->getArguments().getNodes().push_back(buildExpression(argument, context)); + } } if (function->isWindowFunction()) diff --git a/src/Analyzer/Resolve/QueryAnalyzer.cpp b/src/Analyzer/Resolve/QueryAnalyzer.cpp index f55197b602ef..38ad97fc0294 100644 --- a/src/Analyzer/Resolve/QueryAnalyzer.cpp +++ b/src/Analyzer/Resolve/QueryAnalyzer.cpp @@ -4011,6 +4011,7 @@ void QueryAnalyzer::resolveTableFunction(QueryTreeNodePtr & table_function_node, { auto table_function_node_to_resolve_typed = std::make_shared(table_function_argument_function_name); table_function_node_to_resolve_typed->getArgumentsNode() = table_function_argument_function->getArgumentsNode(); + table_function_node_to_resolve_typed->setSettingsChanges(table_function_argument_function->getSettingsChanges()); QueryTreeNodePtr table_function_node_to_resolve = std::move(table_function_node_to_resolve_typed); if (table_function_argument_function_name == "view") diff --git a/src/Common/ProfileEvents.cpp b/src/Common/ProfileEvents.cpp index 9872e039abfa..147c6610146f 100644 --- a/src/Common/ProfileEvents.cpp +++ 
b/src/Common/ProfileEvents.cpp @@ -360,6 +360,11 @@ M(IcebergTrivialCountOptimizationApplied, "Trivial count optimization applied while reading from Iceberg", ValueType::Number) \ M(IcebergVersionHintUsed, "Number of times version-hint.text has been used.", ValueType::Number) \ M(IcebergMinMaxIndexPrunedFiles, "Number of skipped files by using MinMax index in Iceberg", ValueType::Number) \ + M(IcebergAvroFileParsing, "Number of times avro metadata files have been parsed.", ValueType::Number) \ + M(IcebergAvroFileParsingMicroseconds, "Time spent for parsing avro metadata files for Iceberg tables.", ValueType::Microseconds) \ + M(IcebergJsonFileParsing, "Number of times json metadata files have been parsed.", ValueType::Number) \ + M(IcebergJsonFileParsingMicroseconds, "Time spent for parsing json metadata files for Iceberg tables.", ValueType::Microseconds) \ + \ M(JoinBuildTableRowCount, "Total number of rows in the build table for a JOIN operation.", ValueType::Number) \ M(JoinProbeTableRowCount, "Total number of rows in the probe table for a JOIN operation.", ValueType::Number) \ M(JoinResultRowCount, "Total number of rows in the result of a JOIN operation.", ValueType::Number) \ @@ -688,8 +693,10 @@ The server successfully detected this situation and will download merged part fr M(S3DeleteObjects, "Number of S3 API DeleteObject(s) calls.", ValueType::Number) \ M(S3CopyObject, "Number of S3 API CopyObject calls.", ValueType::Number) \ M(S3ListObjects, "Number of S3 API ListObjects calls.", ValueType::Number) \ + M(S3ListObjectsMicroseconds, "Time of S3 API ListObjects execution.", ValueType::Microseconds) \ M(S3HeadObject, "Number of S3 API HeadObject calls.", ValueType::Number) \ M(S3GetObjectTagging, "Number of S3 API GetObjectTagging calls.", ValueType::Number) \ + M(S3HeadObjectMicroseconds, "Time of S3 API HeadObject execution.", ValueType::Microseconds) \ M(S3CreateMultipartUpload, "Number of S3 API CreateMultipartUpload calls.", ValueType::Number) \ 
M(S3UploadPartCopy, "Number of S3 API UploadPartCopy calls.", ValueType::Number) \ M(S3UploadPart, "Number of S3 API UploadPart calls.", ValueType::Number) \ @@ -744,6 +751,7 @@ The server successfully detected this situation and will download merged part fr M(AzureCopyObject, "Number of Azure blob storage API CopyObject calls", ValueType::Number) \ M(AzureDeleteObjects, "Number of Azure blob storage API DeleteObject(s) calls.", ValueType::Number) \ M(AzureListObjects, "Number of Azure blob storage API ListObjects calls.", ValueType::Number) \ + M(AzureListObjectsMicroseconds, "Time of Azure blob storage API ListObjects execution.", ValueType::Microseconds) \ M(AzureGetProperties, "Number of Azure blob storage API GetProperties calls.", ValueType::Number) \ M(AzureCreateContainer, "Number of Azure blob storage API CreateContainer calls.", ValueType::Number) \ \ diff --git a/src/Core/Settings.cpp b/src/Core/Settings.cpp index 0cbb8921adc8..acd8b34b07ce 100644 --- a/src/Core/Settings.cpp +++ b/src/Core/Settings.cpp @@ -7480,6 +7480,25 @@ Always ignore ON CLUSTER clause for DDL queries with replicated databases. )", 0) \ DECLARE(UInt64, archive_adaptive_buffer_max_size_bytes, 8 * DBMS_DEFAULT_BUFFER_SIZE, R"( Limits the maximum size of the adaptive buffer used when writing to archive files (for example, tar archives)", 0) \ + DECLARE(Timezone, iceberg_timezone_for_timestamptz, "UTC", R"( +Timezone for Iceberg timestamptz field. + +Possible values: + +- Any valid timezone, e.g. `Europe/Berlin`, `UTC` or `Zulu` +- `` (empty value) - use session timezone + +Default value is `UTC`. +)", 0) \ + DECLARE(Timezone, iceberg_partition_timezone, "", R"( +Time zone by which partitioning of Iceberg tables was performed. +Possible values: + +- Any valid timezone, e.g. `Europe/Berlin`, `UTC` or `Zulu` +- `` (empty value) - use server or session timezone + +Default value is empty. 
+)", 0) \ \ /* ####################################################### */ \ /* ########### START OF EXPERIMENTAL FEATURES ############ */ \ @@ -7631,6 +7650,15 @@ Source SQL dialect for the polyglot transpiler (e.g. 'sqlite', 'mysql', 'postgre )", EXPERIMENTAL) \ DECLARE(Bool, enable_adaptive_memory_spill_scheduler, false, R"( Trigger processor to spill data into external storage adpatively. grace join is supported at present. +)", EXPERIMENTAL) \ + DECLARE(String, object_storage_cluster, "", R"( +Cluster to make distributed requests to object storages with alternative syntax. +)", EXPERIMENTAL) \ + DECLARE(UInt64, object_storage_max_nodes, 0, R"( +Limit for hosts used for request in object storage cluster table functions - azureBlobStorageCluster, s3Cluster, hdfsCluster, etc. +Possible values: +- Positive integer. +- 0 — All hosts in cluster. )", EXPERIMENTAL) \ DECLARE(Bool, allow_experimental_delta_kernel_rs, true, R"( Allow experimental delta-kernel-rs implementation. @@ -7712,6 +7740,12 @@ If the number of set bits in a runtime bloom filter exceeds this ratio the filte )", EXPERIMENTAL) \ DECLARE(Bool, rewrite_in_to_join, false, R"( Rewrite expressions like 'x IN subquery' to JOIN. This might be useful for optimizing the whole query with join reordering. +)", EXPERIMENTAL) \ + DECLARE(Bool, object_storage_remote_initiator, false, R"( +Execute request to object storage as remote on one of object_storage_cluster nodes. +)", EXPERIMENTAL) \ + DECLARE(String, object_storage_remote_initiator_cluster, "", R"( +Cluster to choose remote initiator, when `object_storage_remote_initiator` is true. When empty, `object_storage_cluster` is used. )", EXPERIMENTAL) \ \ /** Experimental timeSeries* aggregate functions. 
*/ \ diff --git a/src/Core/SettingsChangesHistory.cpp b/src/Core/SettingsChangesHistory.cpp index ac6d42e957f6..dac022fdee3a 100644 --- a/src/Core/SettingsChangesHistory.cpp +++ b/src/Core/SettingsChangesHistory.cpp @@ -39,6 +39,9 @@ const VersionToSettingsChangesMap & getSettingsChangesHistory() /// controls new feature and it's 'true' by default, use 'false' as previous_value). /// It's used to implement `compatibility` setting (see https://github.com/ClickHouse/ClickHouse/issues/35972) /// Note: please check if the key already exists to prevent duplicate entries. + addSettingsChanges(settings_changes_history, "26.3.1.20001.altinityantalya", + { + }); addSettingsChanges(settings_changes_history, "26.3", { {"allow_experimental_polyglot_dialect", false, false, "New setting to enable the polyglot SQL transpiler dialect."}, @@ -108,12 +111,12 @@ const VersionToSettingsChangesMap & getSettingsChangesHistory() }); addSettingsChanges(settings_changes_history, "26.1.3.20001.altinityantalya", { - // {"iceberg_partition_timezone", "", "", "New setting."}, + {"iceberg_partition_timezone", "", "", "New setting."}, // {"s3_propagate_credentials_to_other_storages", false, false, "New setting"}, // {"export_merge_tree_part_filename_pattern", "", "{part_name}_{checksum}", "New setting"}, // {"use_parquet_metadata_cache", false, true, "Enables cache of parquet file metadata."}, // {"input_format_parquet_use_metadata_cache", true, false, "Obsolete. 
No-op"}, // https://github.com/Altinity/ClickHouse/pull/586 - // {"object_storage_remote_initiator_cluster", "", "", "New setting."}, + {"object_storage_remote_initiator_cluster", "", "", "New setting."}, // {"iceberg_metadata_staleness_ms", 0, 0, "New setting allowing using cached metadata version at READ operations to prevent fetching from remote catalog"}, }); addSettingsChanges(settings_changes_history, "26.1", @@ -200,7 +203,6 @@ const VersionToSettingsChangesMap & getSettingsChangesHistory() {"insert_select_deduplicate", Field{"auto"}, Field{"auto"}, "New setting"}, {"output_format_pretty_named_tuples_as_json", false, true, "New setting to control whether named tuples in Pretty format are output as JSON objects"}, {"deduplicate_insert_select", "enable_even_for_bad_queries", "enable_even_for_bad_queries", "New setting, replace insert_select_deduplicate"}, - }); addSettingsChanges(settings_changes_history, "25.11", { @@ -299,15 +301,15 @@ const VersionToSettingsChangesMap & getSettingsChangesHistory() }); addSettingsChanges(settings_changes_history, "25.8.16.20001.altinityantalya", { - // {"allow_experimental_database_iceberg", false, true, "Turned ON by default for Antalya."}, - // {"allow_experimental_database_unity_catalog", false, true, "Turned ON by default for Antalya."}, - // {"allow_experimental_database_glue_catalog", false, true, "Turned ON by default for Antalya."}, - // {"allow_database_iceberg", false, true, "Turned ON by default for Antalya (alias)."}, - // {"allow_database_unity_catalog", false, true, "Turned ON by default for Antalya (alias)."}, - // {"allow_database_glue_catalog", false, true, "Turned ON by default for Antalya (alias)."}, + {"allow_experimental_database_iceberg", false, true, "Turned ON by default for Antalya."}, + {"allow_experimental_database_unity_catalog", false, true, "Turned ON by default for Antalya."}, + {"allow_experimental_database_glue_catalog", false, true, "Turned ON by default for Antalya."}, + 
{"allow_database_iceberg", false, true, "Turned ON by default for Antalya (alias)."}, + {"allow_database_unity_catalog", false, true, "Turned ON by default for Antalya (alias)."}, + {"allow_database_glue_catalog", false, true, "Turned ON by default for Antalya (alias)."}, // {"input_format_parquet_use_metadata_cache", true, true, "New setting, turned ON by default"}, // https://github.com/Altinity/ClickHouse/pull/586 - // {"iceberg_timezone_for_timestamptz", "UTC", "UTC", "New setting."}, - // {"object_storage_remote_initiator", false, false, "New setting."}, + {"iceberg_timezone_for_timestamptz", "UTC", "UTC", "New setting."}, + {"object_storage_remote_initiator", false, false, "New setting."}, // {"allow_experimental_iceberg_read_optimization", true, true, "New setting."}, // {"object_storage_cluster_join_mode", "allow", "allow", "New setting"}, // {"lock_object_storage_task_distribution_ms", 500, 500, "New setting."}, @@ -327,8 +329,8 @@ const VersionToSettingsChangesMap & getSettingsChangesHistory() // {"export_merge_tree_partition_system_table_prefer_remote_information", true, true, "New setting."}, // {"export_merge_tree_part_throw_on_pending_mutations", true, true, "New setting."}, // {"export_merge_tree_part_throw_on_pending_patch_parts", true, true, "New setting."}, - // {"object_storage_cluster", "", "", "Antalya: New setting"}, - // {"object_storage_max_nodes", 0, 0, "Antalya: New setting"}, + {"object_storage_cluster", "", "", "Antalya: New setting"}, + {"object_storage_max_nodes", 0, 0, "Antalya: New setting"}, }); addSettingsChanges(settings_changes_history, "25.8", { diff --git a/src/Databases/DataLake/Common.cpp b/src/Databases/DataLake/Common.cpp index 681dd957b43f..8946d3412d70 100644 --- a/src/Databases/DataLake/Common.cpp +++ b/src/Databases/DataLake/Common.cpp @@ -61,14 +61,14 @@ std::vector splitTypeArguments(const String & type_str) return args; } -DB::DataTypePtr getType(const String & type_name, bool nullable, const String & prefix) 
+DB::DataTypePtr getType(const String & type_name, bool nullable, DB::ContextPtr context, const String & prefix) { String name = trim(type_name); if (name.starts_with("array<") && name.ends_with(">")) { String inner = name.substr(6, name.size() - 7); - return std::make_shared(getType(inner, nullable)); + return std::make_shared(getType(inner, nullable, context)); } if (name.starts_with("map<") && name.ends_with(">")) @@ -79,7 +79,7 @@ DB::DataTypePtr getType(const String & type_name, bool nullable, const String & if (args.size() != 2) throw DB::Exception(DB::ErrorCodes::DATALAKE_DATABASE_ERROR, "Invalid data type {}", type_name); - return std::make_shared(getType(args[0], false), getType(args[1], nullable)); + return std::make_shared(getType(args[0], false, context), getType(args[1], nullable, context)); } if (name.starts_with("struct<") && name.ends_with(">")) @@ -101,13 +101,13 @@ DB::DataTypePtr getType(const String & type_name, bool nullable, const String & String full_field_name = prefix.empty() ? field_name : prefix + "." + field_name; field_names.push_back(full_field_name); - field_types.push_back(getType(field_type, nullable, full_field_name)); + field_types.push_back(getType(field_type, nullable, context, full_field_name)); } return std::make_shared(field_types, field_names); } - return nullable ? DB::makeNullable(DB::Iceberg::IcebergSchemaProcessor::getSimpleType(name)) - : DB::Iceberg::IcebergSchemaProcessor::getSimpleType(name); + return nullable ? 
DB::makeNullable(DB::Iceberg::IcebergSchemaProcessor::getSimpleType(name, context)) + : DB::Iceberg::IcebergSchemaProcessor::getSimpleType(name, context); } std::pair parseTableName(const std::string & name) diff --git a/src/Databases/DataLake/Common.h b/src/Databases/DataLake/Common.h index cd4b6214e343..9b0dd7c626a6 100644 --- a/src/Databases/DataLake/Common.h +++ b/src/Databases/DataLake/Common.h @@ -2,6 +2,7 @@ #include #include +#include namespace DataLake { @@ -10,7 +11,7 @@ String trim(const String & str); std::vector splitTypeArguments(const String & type_str); -DB::DataTypePtr getType(const String & type_name, bool nullable, const String & prefix = ""); +DB::DataTypePtr getType(const String & type_name, bool nullable, DB::ContextPtr context, const String & prefix = ""); /// Parse a string, containing at least one dot, into a two substrings: /// A.B.C.D.E -> A.B.C.D and E, where diff --git a/src/Databases/DataLake/DataLakeConstants.h b/src/Databases/DataLake/DataLakeConstants.h index 0b228bf310ec..372cc92a6631 100644 --- a/src/Databases/DataLake/DataLakeConstants.h +++ b/src/Databases/DataLake/DataLakeConstants.h @@ -8,6 +8,7 @@ namespace DataLake { static constexpr auto DATABASE_ENGINE_NAME = "DataLakeCatalog"; +static constexpr auto DATABASE_ALIAS_NAME = "Iceberg"; static constexpr std::string_view FILE_PATH_PREFIX = "file:/"; /// Some catalogs (Unity or Glue) may store not only Iceberg/DeltaLake tables but other kinds of "tables" diff --git a/src/Databases/DataLake/DatabaseDataLake.cpp b/src/Databases/DataLake/DatabaseDataLake.cpp index d140a03d6fa0..127eb51cfcb3 100644 --- a/src/Databases/DataLake/DatabaseDataLake.cpp +++ b/src/Databases/DataLake/DatabaseDataLake.cpp @@ -61,6 +61,7 @@ namespace DatabaseDataLakeSetting extern const DatabaseDataLakeSettingsString oauth_server_uri; extern const DatabaseDataLakeSettingsBool oauth_server_use_request_body; extern const DatabaseDataLakeSettingsBool vended_credentials; + extern const 
DatabaseDataLakeSettingsString object_storage_cluster; extern const DatabaseDataLakeSettingsString aws_access_key_id; extern const DatabaseDataLakeSettingsString aws_secret_access_key; extern const DatabaseDataLakeSettingsString region; @@ -295,7 +296,7 @@ std::shared_ptr DatabaseDataLake::getCatalog() const return catalog_impl; } -std::shared_ptr DatabaseDataLake::getConfiguration( +StorageObjectStorageConfigurationPtr DatabaseDataLake::getConfiguration( DatabaseDataLakeStorageType type, DataLakeStorageSettingsPtr storage_settings) const { @@ -515,7 +516,7 @@ StoragePtr DatabaseDataLake::tryGetTableImpl(const String & name, ContextPtr con auto [namespace_name, table_name] = DataLake::parseTableName(name); - if (!catalog->tryGetTableMetadata(namespace_name, table_name, table_metadata)) + if (!catalog->tryGetTableMetadata(namespace_name, table_name, context_, table_metadata)) return nullptr; if (ignore_if_not_iceberg && !table_metadata.isDefaultReadableTable()) return nullptr; @@ -650,7 +651,7 @@ StoragePtr DatabaseDataLake::tryGetTableImpl(const String & name, ContextPtr con /// with_table_structure = false: because there will be /// no table structure in table definition AST. 
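The catalog changes in this patch thread a `DB::ContextPtr` through `getTableMetadata`/`tryGetTableMetadata`, and the throwing variant is consistently implemented on top of the non-throwing one (as in `GlueCatalog`, `HiveCatalog`, and `UnityCatalog`). A minimal standalone sketch of that pattern, with `Context`, `TableMetadata`, and `DemoCatalog` as simplified stand-ins rather than the real ClickHouse types:

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Stand-ins for the real ClickHouse types (simplified for illustration).
struct Context {};
using ContextPtr = std::shared_ptr<const Context>;
struct TableMetadata { std::string location; };

class ICatalog
{
public:
    virtual ~ICatalog() = default;

    // Non-throwing lookup: returns false when the table is unknown.
    virtual bool tryGetTableMetadata(
        const std::string & ns, const std::string & table,
        ContextPtr context, TableMetadata & result) const = 0;

    // Throwing variant built on top of the non-throwing one,
    // forwarding the context the same way the catalogs in this diff do.
    void getTableMetadata(
        const std::string & ns, const std::string & table,
        ContextPtr context, TableMetadata & result) const
    {
        if (!tryGetTableMetadata(ns, table, context, result))
            throw std::runtime_error("No response from catalog for " + ns + "." + table);
    }
};

class DemoCatalog : public ICatalog
{
public:
    bool tryGetTableMetadata(
        const std::string & ns, const std::string & table,
        ContextPtr, TableMetadata & result) const override
    {
        if (ns != "db" || table != "events")
            return false;
        result.location = "s3://bucket/db/events";
        return true;
    }
};
```

Keeping the `try*` method as the single virtual entry point means each concrete catalog implements the lookup once, and the throwing wrapper stays uniform across backends.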
- StorageObjectStorageConfiguration::initialize(*configuration, args, context_copy, /* with_table_structure */false); + configuration->initialize(args, context_copy, /* with_table_structure */false); const auto & query_settings = context_->getSettingsRef(); @@ -662,50 +663,34 @@ StoragePtr DatabaseDataLake::tryGetTableImpl(const String & name, ContextPtr con const auto is_secondary_query = context_->getClientInfo().query_kind == ClientInfo::QueryKind::SECONDARY_QUERY; - if (can_use_parallel_replicas && !is_secondary_query) - { - auto storage_id = StorageID(getDatabaseName(), name); - auto storage_cluster = std::make_shared( - parallel_replicas_cluster_name, - configuration, - configuration->createObjectStorage(context_copy, /* is_readonly */ false, catalog->getCredentialsConfigurationCallback(storage_id)), - storage_id, - columns, - ConstraintsDescription{}, - nullptr, - context_, - /// Use is_table_function = true, - /// because this table is actually stateless like a table function. - /* is_table_function */true); - - storage_cluster->startup(); - return storage_cluster; - } + std::string cluster_name = configuration->isClusterSupported() ? 
settings[DatabaseDataLakeSetting::object_storage_cluster].value : ""; - bool can_use_distributed_iterator = - context_->getClientInfo().collaborate_with_initiator && - can_use_parallel_replicas; + if (cluster_name.empty() && can_use_parallel_replicas && !is_secondary_query) + cluster_name = parallel_replicas_cluster_name; - return std::make_shared( + auto storage_cluster = std::make_shared( + cluster_name, configuration, configuration->createObjectStorage(context_copy, /* is_readonly */ false, catalog->getCredentialsConfigurationCallback(StorageID(getDatabaseName(), name))), - context_copy, StorageID(getDatabaseName(), name), /* columns */columns, /* constraints */ConstraintsDescription{}, - /* comment */"", + /* partition_by */nullptr, + /* order_by */nullptr, + context_copy, + /* comment */ "", getFormatSettings(context_copy), LoadingStrictnessLevel::CREATE, getCatalog(), /* if_not_exists*/true, /* is_datalake_query*/true, - /* distributed_processing */can_use_distributed_iterator, - /* partition_by */nullptr, - /* order_by */nullptr, /// Use is_table_function = true, /// because this table is actually stateless like a table function. 
/* is_table_function */true, /* lazy_init */true); + + storage_cluster->startup(); + return storage_cluster; } void DatabaseDataLake::dropTable( /// NOLINT @@ -854,7 +839,7 @@ ASTPtr DatabaseDataLake::getCreateDatabaseQueryImpl() const ASTPtr DatabaseDataLake::getCreateTableQueryImpl( const String & name, - ContextPtr /* context_ */, + ContextPtr context_, bool throw_on_error) const { auto catalog = getCatalog(); @@ -862,7 +847,7 @@ ASTPtr DatabaseDataLake::getCreateTableQueryImpl( const auto [namespace_name, table_name] = DataLake::parseTableName(name); - if (!catalog->tryGetTableMetadata(namespace_name, table_name, table_metadata)) + if (!catalog->tryGetTableMetadata(namespace_name, table_name, context_, table_metadata)) { if (throw_on_error) throw Exception(ErrorCodes::CANNOT_GET_CREATE_TABLE_QUERY, "Table `{}` doesn't exist", name); @@ -955,6 +940,11 @@ void registerDatabaseDataLake(DatabaseFactory & factory) throw Exception(ErrorCodes::BAD_ARGUMENTS, "Engine `{}` must have arguments", database_engine_name); } + if (database_engine_name == "Iceberg" && catalog_type != DatabaseDataLakeCatalogType::ICEBERG_REST) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Engine `Iceberg` supports only the `rest` catalog type"); + } + for (auto & engine_arg : engine_args) engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, args.context); @@ -1051,6 +1041,7 @@ args.uuid); }; factory.registerDatabase("DataLakeCatalog", create_fn, { .supports_arguments = true, .supports_settings = true }); + factory.registerDatabase("Iceberg", create_fn, { .supports_arguments = true, .supports_settings = true }); } } diff --git a/src/Databases/DataLake/DatabaseDataLake.h b/src/Databases/DataLake/DatabaseDataLake.h index 4dc72224eb9c..efaf9f9db316 100644 --- a/src/Databases/DataLake/DatabaseDataLake.h +++ b/src/Databases/DataLake/DatabaseDataLake.h @@ -84,7 +84,7 @@ class DatabaseDataLake final : public IDatabase,
WithContext void validateSettings(); std::shared_ptr getCatalog() const; - std::shared_ptr getConfiguration( + StorageObjectStorageConfigurationPtr getConfiguration( DatabaseDataLakeStorageType type, DataLakeStorageSettingsPtr storage_settings) const; diff --git a/src/Databases/DataLake/GlueCatalog.cpp b/src/Databases/DataLake/GlueCatalog.cpp index 426ea0f688ab..50c23dbe8314 100644 --- a/src/Databases/DataLake/GlueCatalog.cpp +++ b/src/Databases/DataLake/GlueCatalog.cpp @@ -283,6 +283,7 @@ bool GlueCatalog::existsTable(const std::string & database_name, const std::stri bool GlueCatalog::tryGetTableMetadata( const std::string & database_name, const std::string & table_name, + DB::ContextPtr /* context_ */, TableMetadata & result) const { Aws::Glue::Model::GetTableRequest request; @@ -376,7 +377,7 @@ bool GlueCatalog::tryGetTableMetadata( column_type = "timestamptz"; } - schema.push_back({column.GetName(), getType(column_type, can_be_nullable)}); + schema.push_back({column.GetName(), getType(column_type, can_be_nullable, getContext())}); } result.setSchema(schema); } @@ -398,9 +399,10 @@ bool GlueCatalog::tryGetTableMetadata( void GlueCatalog::getTableMetadata( const std::string & database_name, const std::string & table_name, + DB::ContextPtr context_, TableMetadata & result) const { - if (!tryGetTableMetadata(database_name, table_name, result)) + if (!tryGetTableMetadata(database_name, table_name, context_, result)) { throw DB::Exception( DB::ErrorCodes::DATALAKE_DATABASE_ERROR, @@ -509,7 +511,7 @@ GlueCatalog::ObjectStorageWithPath GlueCatalog::createObjectStorageForEarlyTable auto storage_settings = std::make_shared(); storage_settings->loadFromSettingsChanges(settings.allChanged()); auto configuration = std::make_shared(storage_settings); - DB::StorageObjectStorageConfiguration::initialize(*configuration, args, getContext(), false); + configuration->initialize(args, getContext(), false); auto object_storage = configuration->createObjectStorage(getContext(), 
true, {}); diff --git a/src/Databases/DataLake/GlueCatalog.h b/src/Databases/DataLake/GlueCatalog.h index 34c01ffc1da2..8d92650b2d98 100644 --- a/src/Databases/DataLake/GlueCatalog.h +++ b/src/Databases/DataLake/GlueCatalog.h @@ -46,11 +46,13 @@ class GlueCatalog final : public ICatalog, private DB::WithContext void getTableMetadata( const std::string & database_name, const std::string & table_name, + DB::ContextPtr context_, TableMetadata & result) const override; bool tryGetTableMetadata( const std::string & database_name, const std::string & table_name, + DB::ContextPtr context_, TableMetadata & result) const override; std::optional getStorageType() const override diff --git a/src/Databases/DataLake/HiveCatalog.cpp b/src/Databases/DataLake/HiveCatalog.cpp index b86f70dfc4b5..bc6ff5244749 100644 --- a/src/Databases/DataLake/HiveCatalog.cpp +++ b/src/Databases/DataLake/HiveCatalog.cpp @@ -121,13 +121,21 @@ bool HiveCatalog::existsTable(const std::string & namespace_name, const std::str return true; } -void HiveCatalog::getTableMetadata(const std::string & namespace_name, const std::string & table_name, TableMetadata & result) const +void HiveCatalog::getTableMetadata( + const std::string & namespace_name, + const std::string & table_name, + DB::ContextPtr context_, + TableMetadata & result) const { - if (!tryGetTableMetadata(namespace_name, table_name, result)) + if (!tryGetTableMetadata(namespace_name, table_name, context_, result)) throw DB::Exception(DB::ErrorCodes::DATALAKE_DATABASE_ERROR, "No response from iceberg catalog"); } -bool HiveCatalog::tryGetTableMetadata(const std::string & namespace_name, const std::string & table_name, TableMetadata & result) const +bool HiveCatalog::tryGetTableMetadata( + const std::string & namespace_name, + const std::string & table_name, + DB::ContextPtr context_, + TableMetadata & result) const { Apache::Hadoop::Hive::Table table; @@ -155,7 +163,7 @@ bool HiveCatalog::tryGetTableMetadata(const std::string & namespace_name, 
const auto columns = table.sd.cols; for (const auto & column : columns) { - schema.push_back({column.name, getType(column.type, true)}); + schema.push_back({column.name, getType(column.type, true, context_)}); } result.setSchema(schema); } diff --git a/src/Databases/DataLake/HiveCatalog.h b/src/Databases/DataLake/HiveCatalog.h index 29b4e6ce6c63..0fba0e132486 100644 --- a/src/Databases/DataLake/HiveCatalog.h +++ b/src/Databases/DataLake/HiveCatalog.h @@ -38,9 +38,17 @@ class HiveCatalog final : public ICatalog, private DB::WithContext bool existsTable(const std::string & namespace_name, const std::string & table_name) const override; - void getTableMetadata(const std::string & namespace_name, const std::string & table_name, TableMetadata & result) const override; - - bool tryGetTableMetadata(const std::string & namespace_name, const std::string & table_name, TableMetadata & result) const override; + void getTableMetadata( + const std::string & namespace_name, + const std::string & table_name, + DB::ContextPtr context_, + TableMetadata & result) const override; + + bool tryGetTableMetadata( + const std::string & namespace_name, + const std::string & table_name, + DB::ContextPtr context_, + TableMetadata & result) const override; std::optional getStorageType() const override; diff --git a/src/Databases/DataLake/ICatalog.cpp b/src/Databases/DataLake/ICatalog.cpp index 85d701d86840..e2170c038e52 100644 --- a/src/Databases/DataLake/ICatalog.cpp +++ b/src/Databases/DataLake/ICatalog.cpp @@ -102,33 +102,44 @@ void TableMetadata::setLocation(const std::string & location_) auto pos_to_path = location_.substr(pos_to_bucket).find('/'); if (pos_to_path == std::string::npos) - throw DB::Exception(DB::ErrorCodes::NOT_IMPLEMENTED, "Unexpected location format: {}", location_); - - pos_to_path = pos_to_bucket + pos_to_path; - - location_without_path = location_.substr(0, pos_to_path); - path = location_.substr(pos_to_path + 1); - - /// For Azure ABFSS format: 
abfss://container@account.dfs.core.windows.net/path - /// The bucket (container) is the part before '@', not the whole string before '/' - String bucket_part = location_.substr(pos_to_bucket, pos_to_path - pos_to_bucket); - auto at_pos = bucket_part.find('@'); - if (at_pos != std::string::npos) { - /// Azure ABFSS format: extract container (before @) and account (after @) - bucket = bucket_part.substr(0, at_pos); - azure_account_with_suffix = bucket_part.substr(at_pos + 1); - LOG_TEST(getLogger("TableMetadata"), - "Parsed Azure location - container: {}, account: {}, path: {}", - bucket, azure_account_with_suffix, path); + if (storage_type_str == "s3://") + { // empty path is allowed for AWS S3Table + location_without_path = location_; + path.clear(); + bucket = location_.substr(pos_to_bucket); + } + else + throw DB::Exception(DB::ErrorCodes::NOT_IMPLEMENTED, "Unexpected location format: {}", location_); } else { - /// Standard format (S3, GCS, etc.) - bucket = bucket_part; - LOG_TEST(getLogger("TableMetadata"), - "Parsed location without path: {}, path: {}", - location_without_path, path); + pos_to_path = pos_to_bucket + pos_to_path; + + location_without_path = location_.substr(0, pos_to_path); + path = location_.substr(pos_to_path + 1); + + /// For Azure ABFSS format: abfss://container@account.dfs.core.windows.net/path + /// The bucket (container) is the part before '@', not the whole string before '/' + String bucket_part = location_.substr(pos_to_bucket, pos_to_path - pos_to_bucket); + auto at_pos = bucket_part.find('@'); + if (at_pos != std::string::npos) + { + /// Azure ABFSS format: extract container (before @) and account (after @) + bucket = bucket_part.substr(0, at_pos); + azure_account_with_suffix = bucket_part.substr(at_pos + 1); + LOG_TEST(getLogger("TableMetadata"), + "Parsed Azure location - container: {}, account: {}, path: {}", + bucket, azure_account_with_suffix, path); + } + else + { + /// Standard format (S3, GCS, etc.) 
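The `setLocation` change above allows an empty path for `s3://bucket` style locations (AWS S3Tables) while keeping the special handling of Azure `abfss://container@account/path`. A self-contained sketch of the same parsing rules, with `ParsedLocation` and `parseLocation` as illustrative names (the real code lives in `TableMetadata` and handles more storage prefixes):

```cpp
#include <stdexcept>
#include <string>

// Parsed pieces of an object storage location (simplified sketch).
struct ParsedLocation
{
    std::string bucket;                    // S3 bucket / Azure container
    std::string path;                      // key prefix inside the bucket, may be empty
    std::string azure_account_with_suffix; // only set for abfss:// locations
};

ParsedLocation parseLocation(const std::string & location)
{
    // Find the scheme prefix, e.g. "s3://" or "abfss://".
    auto scheme_end = location.find("://");
    if (scheme_end == std::string::npos)
        throw std::invalid_argument("Unexpected location format: " + location);

    std::string scheme = location.substr(0, scheme_end + 3); // includes "://"
    size_t pos_to_bucket = scheme_end + 3;

    ParsedLocation result;
    auto pos_to_path = location.find('/', pos_to_bucket);
    if (pos_to_path == std::string::npos)
    {
        // An empty path is allowed for "s3://bucket" (AWS S3Tables style).
        if (scheme == "s3://")
        {
            result.bucket = location.substr(pos_to_bucket);
            return result;
        }
        throw std::invalid_argument("Unexpected location format: " + location);
    }

    std::string bucket_part = location.substr(pos_to_bucket, pos_to_path - pos_to_bucket);
    result.path = location.substr(pos_to_path + 1);

    // Azure ABFSS: "abfss://container@account.dfs.core.windows.net/path" —
    // the container is the part before '@', the account comes after it.
    auto at_pos = bucket_part.find('@');
    if (at_pos != std::string::npos)
    {
        result.bucket = bucket_part.substr(0, at_pos);
        result.azure_account_with_suffix = bucket_part.substr(at_pos + 1);
    }
    else
    {
        result.bucket = bucket_part; // standard S3/GCS style
    }
    return result;
}
```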
+ bucket = bucket_part; + LOG_TEST(getLogger("TableMetadata"), + "Parsed location without path: {}, path: {}", + location_without_path, path); + } } } diff --git a/src/Databases/DataLake/ICatalog.h b/src/Databases/DataLake/ICatalog.h index e3333c1c58cd..1e67123447e2 100644 --- a/src/Databases/DataLake/ICatalog.h +++ b/src/Databases/DataLake/ICatalog.h @@ -10,6 +10,14 @@ #include #include +namespace DB +{ + +class Context; +using ContextPtr = std::shared_ptr; + +} + namespace DataLake { @@ -158,6 +166,7 @@ class ICatalog virtual void getTableMetadata( const std::string & namespace_name, const std::string & table_name, + DB::ContextPtr context, TableMetadata & result) const = 0; /// Get table metadata in the given namespace. @@ -165,6 +174,7 @@ class ICatalog virtual bool tryGetTableMetadata( const std::string & namespace_name, const std::string & table_name, + DB::ContextPtr context, TableMetadata & result) const = 0; /// Get storage type, where Iceberg tables' data is stored. diff --git a/src/Databases/DataLake/PaimonRestCatalog.cpp b/src/Databases/DataLake/PaimonRestCatalog.cpp index c53859ee9f40..63e33e993e45 100644 --- a/src/Databases/DataLake/PaimonRestCatalog.cpp +++ b/src/Databases/DataLake/PaimonRestCatalog.cpp @@ -467,7 +467,7 @@ bool PaimonRestCatalog::existsTable(const String & database_name, const String & return true; } -bool PaimonRestCatalog::tryGetTableMetadata(const String & database_name, const String & table_name, TableMetadata & result) const +bool PaimonRestCatalog::tryGetTableMetadata(const String & database_name, const String & table_name, DB::ContextPtr /*context_*/, TableMetadata & result) const { try { @@ -593,9 +593,9 @@ Poco::JSON::Object::Ptr PaimonRestCatalog::requestRest( return json.extract(); } -void PaimonRestCatalog::getTableMetadata(const String & database_name, const String & table_name, TableMetadata & result) const +void PaimonRestCatalog::getTableMetadata(const String & database_name, const String & table_name, DB::ContextPtr 
context_, TableMetadata & result) const { - if (!tryGetTableMetadata(database_name, table_name, result)) + if (!tryGetTableMetadata(database_name, table_name, context_, result)) { throw DB::Exception(DB::ErrorCodes::DATALAKE_DATABASE_ERROR, "No response from paimon rest catalog"); } diff --git a/src/Databases/DataLake/PaimonRestCatalog.h b/src/Databases/DataLake/PaimonRestCatalog.h index 78713832e288..c81722c63964 100644 --- a/src/Databases/DataLake/PaimonRestCatalog.h +++ b/src/Databases/DataLake/PaimonRestCatalog.h @@ -89,9 +89,9 @@ class PaimonRestCatalog final : public ICatalog, private DB::WithContext bool existsTable(const String & database_name, const String & table_name) const override; - void getTableMetadata(const String & database_name, const String & table_name, TableMetadata & result) const override; + void getTableMetadata(const String & database_name, const String & table_name, DB::ContextPtr context_, TableMetadata & result) const override; - bool tryGetTableMetadata(const String & database_name, const String & table_name, TableMetadata & result) const override; + bool tryGetTableMetadata(const String & database_name, const String & table_name, DB::ContextPtr /*context_*/, TableMetadata & result) const override; std::optional getStorageType() const override { return storage_type; } diff --git a/src/Databases/DataLake/RestCatalog.cpp b/src/Databases/DataLake/RestCatalog.cpp index b71744e9660e..ddbd14252a7b 100644 --- a/src/Databases/DataLake/RestCatalog.cpp +++ b/src/Databases/DataLake/RestCatalog.cpp @@ -807,17 +807,18 @@ DB::Names RestCatalog::parseTables(DB::ReadBuffer & buf, const std::string & bas bool RestCatalog::existsTable(const std::string & namespace_name, const std::string & table_name) const { TableMetadata table_metadata; - return tryGetTableMetadata(namespace_name, table_name, table_metadata); + return tryGetTableMetadata(namespace_name, table_name, getContext(), table_metadata); } bool RestCatalog::tryGetTableMetadata( const 
std::string & namespace_name, const std::string & table_name, + DB::ContextPtr context_, TableMetadata & result) const { try { - return getTableMetadataImpl(namespace_name, table_name, result); + return getTableMetadataImpl(namespace_name, table_name, context_, result); } catch (const DB::Exception & ex) { @@ -829,15 +830,17 @@ bool RestCatalog::tryGetTableMetadata( void RestCatalog::getTableMetadata( const std::string & namespace_name, const std::string & table_name, + DB::ContextPtr context_, TableMetadata & result) const { - if (!getTableMetadataImpl(namespace_name, table_name, result)) + if (!getTableMetadataImpl(namespace_name, table_name, context_, result)) throw DB::Exception(DB::ErrorCodes::DATALAKE_DATABASE_ERROR, "No response from iceberg catalog"); } bool RestCatalog::getTableMetadataImpl( const std::string & namespace_name, const std::string & table_name, + DB::ContextPtr context_, TableMetadata & result) const { LOG_DEBUG(log, "Checking table {} in namespace {}", table_name, namespace_name); @@ -898,8 +901,8 @@ bool RestCatalog::getTableMetadataImpl( if (result.requiresSchema()) { // int format_version = metadata_object->getValue("format-version"); - auto schema_processor = DB::Iceberg::IcebergSchemaProcessor(); - auto id = DB::IcebergMetadata::parseTableSchema(metadata_object, schema_processor, log); + auto schema_processor = DB::Iceberg::IcebergSchemaProcessor(context_); + auto id = DB::IcebergMetadata::parseTableSchema(metadata_object, schema_processor, context_, log); auto schema = schema_processor.getClickhouseTableSchemaById(id); result.setSchema(*schema); } diff --git a/src/Databases/DataLake/RestCatalog.h b/src/Databases/DataLake/RestCatalog.h index 0837b5b88e9d..8ebb843930a1 100644 --- a/src/Databases/DataLake/RestCatalog.h +++ b/src/Databases/DataLake/RestCatalog.h @@ -55,11 +55,13 @@ class RestCatalog : public ICatalog, public DB::WithContext void getTableMetadata( const std::string & namespace_name, const std::string & table_name, + 
DB::ContextPtr context_, TableMetadata & result) const override; bool tryGetTableMetadata( const std::string & namespace_name, const std::string & table_name, + DB::ContextPtr context_, TableMetadata & result) const override; std::optional getStorageType() const override; @@ -152,6 +154,7 @@ class RestCatalog : public ICatalog, public DB::WithContext bool getTableMetadataImpl( const std::string & namespace_name, const std::string & table_name, + DB::ContextPtr context_, TableMetadata & result) const; Config loadConfig(); diff --git a/src/Databases/DataLake/UnityCatalog.cpp b/src/Databases/DataLake/UnityCatalog.cpp index 886425162e9b..7b1f1e00795f 100644 --- a/src/Databases/DataLake/UnityCatalog.cpp +++ b/src/Databases/DataLake/UnityCatalog.cpp @@ -92,9 +92,10 @@ DB::Names UnityCatalog::getTables() const void UnityCatalog::getTableMetadata( const std::string & namespace_name, const std::string & table_name, + DB::ContextPtr context_, TableMetadata & result) const { - if (!tryGetTableMetadata(namespace_name, table_name, result)) + if (!tryGetTableMetadata(namespace_name, table_name, context_, result)) throw DB::Exception(DB::ErrorCodes::DATALAKE_DATABASE_ERROR, "No response from unity catalog"); } @@ -160,6 +161,7 @@ void UnityCatalog::getCredentials(const std::string & table_id, TableMetadata & bool UnityCatalog::tryGetTableMetadata( const std::string & schema_name, const std::string & table_name, + DB::ContextPtr /* context_ */, TableMetadata & result) const { auto full_table_name = warehouse + "." + schema_name + "." 
+ table_name; diff --git a/src/Databases/DataLake/UnityCatalog.h b/src/Databases/DataLake/UnityCatalog.h index 85c8a57a579b..3dfe71a78fac 100644 --- a/src/Databases/DataLake/UnityCatalog.h +++ b/src/Databases/DataLake/UnityCatalog.h @@ -34,11 +34,13 @@ class UnityCatalog final : public ICatalog, private DB::WithContext void getTableMetadata( const std::string & namespace_name, const std::string & table_name, + DB::ContextPtr context_, TableMetadata & result) const override; bool tryGetTableMetadata( const std::string & schema_name, const std::string & table_name, + DB::ContextPtr context_, TableMetadata & result) const override; std::optional getStorageType() const override { return std::nullopt; } diff --git a/src/Disks/DiskObjectStorage/ObjectStorages/AzureBlobStorage/AzureObjectStorage.cpp b/src/Disks/DiskObjectStorage/ObjectStorages/AzureBlobStorage/AzureObjectStorage.cpp index 3363efbbbc2f..0aaf8564f5bf 100644 --- a/src/Disks/DiskObjectStorage/ObjectStorages/AzureBlobStorage/AzureObjectStorage.cpp +++ b/src/Disks/DiskObjectStorage/ObjectStorages/AzureBlobStorage/AzureObjectStorage.cpp @@ -22,6 +22,7 @@ #include #include #include +#include namespace CurrentMetrics @@ -34,6 +35,7 @@ namespace CurrentMetrics namespace ProfileEvents { extern const Event AzureListObjects; + extern const Event AzureListObjectsMicroseconds; extern const Event DiskAzureListObjects; extern const Event AzureDeleteObjects; extern const Event DiskAzureDeleteObjects; @@ -85,6 +87,7 @@ class AzureIteratorAsync final : public IObjectStorageIteratorAsync ProfileEvents::increment(ProfileEvents::AzureListObjects); if (client->IsClientForDisk()) ProfileEvents::increment(ProfileEvents::DiskAzureListObjects); + ProfileEventTimeIncrement watch(ProfileEvents::AzureListObjectsMicroseconds); chassert(batch.empty()); auto blob_list_response = client->ListBlobs(options); @@ -192,7 +195,15 @@ void AzureObjectStorage::listObjects(const std::string & path, RelativePathsWith else options.PageSizeHint = 
settings.get()->list_object_keys_size; - for (auto blob_list_response = client_ptr->ListBlobs(options); blob_list_response.HasPage(); blob_list_response.MoveToNextPage()) + AzureBlobStorage::ListBlobsPagedResponse blob_list_response; + + auto list_blobs = [&]()->void + { + ProfileEventTimeIncrement watch(ProfileEvents::AzureListObjectsMicroseconds); + blob_list_response = client_ptr->ListBlobs(options); + }; + + for (list_blobs(); blob_list_response.HasPage(); blob_list_response.MoveToNextPage()) { ProfileEvents::increment(ProfileEvents::AzureListObjects); if (client_ptr->IsClientForDisk()) diff --git a/src/Disks/DiskObjectStorage/ObjectStorages/S3/S3ObjectStorage.cpp b/src/Disks/DiskObjectStorage/ObjectStorages/S3/S3ObjectStorage.cpp index b4edbae24a4b..303c58efeb73 100644 --- a/src/Disks/DiskObjectStorage/ObjectStorages/S3/S3ObjectStorage.cpp +++ b/src/Disks/DiskObjectStorage/ObjectStorages/S3/S3ObjectStorage.cpp @@ -33,6 +33,7 @@ #include #include #include +#include #include #include @@ -40,6 +41,7 @@ namespace ProfileEvents { extern const Event S3ListObjects; + extern const Event S3ListObjectsMicroseconds; extern const Event DiskS3DeleteObjects; extern const Event DiskS3ListObjects; } @@ -148,7 +150,12 @@ class S3IteratorAsync final : public IObjectStorageIteratorAsync ProfileEvents::increment(ProfileEvents::S3ListObjects); ProfileEvents::increment(ProfileEvents::DiskS3ListObjects); - auto outcome = client->ListObjectsV2(*request); + Aws::S3::Model::ListObjectsV2Outcome outcome; + + { + ProfileEventTimeIncrement watch(ProfileEvents::S3ListObjectsMicroseconds); + outcome = client->ListObjectsV2(*request); + } /// Outcome failure will be handled on the caller side. 
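The `ProfileEventTimeIncrement` guard used in these hunks is an RAII timer: it measures the wall time of the enclosing scope and adds it to a profile counter on destruction, so each `ListObjects`/`ListBlobs` request is timed without explicit start/stop calls. A minimal standalone version of the idea, where `ScopedTimer` and the atomic counter are illustrative stand-ins for the ClickHouse types:

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Global microsecond counter, a stand-in for a ProfileEvents counter
// such as AzureListObjectsMicroseconds.
std::atomic<uint64_t> list_objects_microseconds{0};

// RAII guard: on destruction, adds the elapsed time of its scope to the counter.
class ScopedTimer
{
public:
    explicit ScopedTimer(std::atomic<uint64_t> & counter_)
        : counter(counter_), start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer()
    {
        auto elapsed = std::chrono::steady_clock::now() - start;
        counter += std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
    }

private:
    std::atomic<uint64_t> & counter;
    std::chrono::steady_clock::time_point start;
};

// Example "ListObjects" call: only the request itself sits inside the timed
// scope, mirroring how the diff wraps client->ListObjectsV2(...) in a small block.
void listObjectsOnce()
{
    ScopedTimer watch(list_objects_microseconds);
    std::this_thread::sleep_for(std::chrono::milliseconds(5)); // simulated network call
}
```

Scoping the guard to just the request (rather than the whole loop body) is what keeps the counter from including result processing time.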
if (outcome.IsSuccess()) @@ -321,7 +328,11 @@ void S3ObjectStorage::listObjects(const std::string & path, RelativePathsWithMet ProfileEvents::increment(ProfileEvents::S3ListObjects); ProfileEvents::increment(ProfileEvents::DiskS3ListObjects); - outcome = client.get()->ListObjectsV2(request); + { + ProfileEventTimeIncrement watch(ProfileEvents::S3ListObjectsMicroseconds); + outcome = client.get()->ListObjectsV2(request); + } + throwIfError(outcome); auto result = outcome.GetResult(); diff --git a/src/Disks/DiskType.cpp b/src/Disks/DiskType.cpp index bf4506b4cbf6..ddc42cd07dc3 100644 --- a/src/Disks/DiskType.cpp +++ b/src/Disks/DiskType.cpp @@ -10,7 +10,7 @@ namespace ErrorCodes extern const int LOGICAL_ERROR; } -MetadataStorageType metadataTypeFromString(const String & type) +MetadataStorageType metadataTypeFromString(const std::string & type) { auto check_type = Poco::toLower(type); if (check_type == "local") @@ -58,25 +58,7 @@ String DataSourceDescription::name() const case DataSourceType::RAM: return "memory"; case DataSourceType::ObjectStorage: - { - switch (object_storage_type) - { - case ObjectStorageType::S3: - return "s3"; - case ObjectStorageType::HDFS: - return "hdfs"; - case ObjectStorageType::Azure: - return "azure_blob_storage"; - case ObjectStorageType::Local: - return "local_blob_storage"; - case ObjectStorageType::Web: - return "web"; - case ObjectStorageType::None: - return "none"; - case ObjectStorageType::Max: - throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected object storage type: Max"); - } - } + return DB::toString(object_storage_type); } } @@ -86,4 +68,45 @@ String DataSourceDescription::toString() const name(), description, is_encrypted, is_cached, zookeeper_name); } +ObjectStorageType objectStorageTypeFromString(const std::string & type) +{ + auto check_type = Poco::toLower(type); + if (check_type == "s3") + return ObjectStorageType::S3; + if (check_type == "hdfs") + return ObjectStorageType::HDFS; + if (check_type == 
"azure_blob_storage" || check_type == "azure") + return ObjectStorageType::Azure; + if (check_type == "local_blob_storage" || check_type == "local") + return ObjectStorageType::Local; + if (check_type == "web") + return ObjectStorageType::Web; + if (check_type == "none") + return ObjectStorageType::None; + + throw Exception(ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG, + "Unknown object storage type: {}", type); +} + +std::string toString(ObjectStorageType type) +{ + switch (type) + { + case ObjectStorageType::S3: + return "s3"; + case ObjectStorageType::HDFS: + return "hdfs"; + case ObjectStorageType::Azure: + return "azure_blob_storage"; + case ObjectStorageType::Local: + return "local_blob_storage"; + case ObjectStorageType::Web: + return "web"; + case ObjectStorageType::None: + return "none"; + case ObjectStorageType::Max: + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected object storage type: Max"); + } +} + } diff --git a/src/Disks/DiskType.h b/src/Disks/DiskType.h index 9018cd605481..835c1341775b 100644 --- a/src/Disks/DiskType.h +++ b/src/Disks/DiskType.h @@ -36,7 +36,10 @@ enum class MetadataStorageType : uint8_t Memory, }; -MetadataStorageType metadataTypeFromString(const String & type); +MetadataStorageType metadataTypeFromString(const std::string & type); + +ObjectStorageType objectStorageTypeFromString(const std::string & type); +std::string toString(ObjectStorageType type); struct DataSourceDescription { diff --git a/src/IO/ReadBufferFromS3.cpp b/src/IO/ReadBufferFromS3.cpp index 74e654c5160b..b46a2d55c7f0 100644 --- a/src/IO/ReadBufferFromS3.cpp +++ b/src/IO/ReadBufferFromS3.cpp @@ -501,6 +501,12 @@ Aws::S3::Model::GetObjectResult ReadBufferFromS3::sendRequest(size_t attempt, si log, "Read S3 object. Bucket: {}, Key: {}, Version: {}, Offset: {}", bucket, key, version_id.empty() ? "Latest" : version_id, range_begin); } + else + { + LOG_TEST( + log, "Read S3 object. Bucket: {}, Key: {}, Version: {}", + bucket, key, version_id.empty() ? 
"Latest" : version_id);
+    }
 
     ProfileEvents::increment(ProfileEvents::S3GetObject);
     if (client_ptr->isClientForDisk())
diff --git a/src/IO/S3/Client.cpp b/src/IO/S3/Client.cpp
index 27ecbb2a1dff..5df6a86d4327 100644
--- a/src/IO/S3/Client.cpp
+++ b/src/IO/S3/Client.cpp
@@ -455,7 +455,7 @@ Model::HeadObjectOutcome Client::HeadObject(HeadObjectRequest & request) const
     auto bucket_uri = getURIForBucket(bucket);
     if (!bucket_uri)
     {
-        if (auto maybe_error = updateURIForBucketForHead(bucket); maybe_error.has_value())
+        if (auto maybe_error = updateURIForBucketForHead(bucket, request.GetKey()); maybe_error.has_value())
             return *maybe_error;
 
         if (auto region = getRegionForBucket(bucket); !region.empty())
@@ -672,7 +672,6 @@ Client::doRequest(RequestType & request, RequestFn request_fn) const
     if (auto uri = getURIForBucket(bucket); uri.has_value())
         request.overrideURI(std::move(*uri));
 
-    bool found_new_endpoint = false;
     // if we found correct endpoint after 301 responses, update the cache for future requests
     SCOPE_EXIT(
@@ -1041,12 +1040,15 @@ std::optional Client::getURIFromError(const Aws::S3::S3Error & error) c
 }
 
 // Do a list request because head requests don't have body in response
-std::optional Client::updateURIForBucketForHead(const std::string & bucket) const
+// S3 Tables don't support ListObjects, so this is a workaround: use GetObject instead
+std::optional Client::updateURIForBucketForHead(const std::string & bucket, const std::string & key) const
 {
-    ListObjectsV2Request req;
+    GetObjectRequest req;
     req.SetBucket(bucket);
-    req.SetMaxKeys(1);
-    auto result = ListObjectsV2(req);
+    req.SetKey(key);
+    req.SetRange("bytes=0-1");
+    auto result = GetObject(req);
+
     if (result.IsSuccess())
         return std::nullopt;
     return result.GetError();
diff --git a/src/IO/S3/Client.h b/src/IO/S3/Client.h
index 61ee4ead3dc0..4a65239a8582 100644
--- a/src/IO/S3/Client.h
+++ b/src/IO/S3/Client.h
@@ -285,7 +285,7 @@ class Client : private Aws::S3::S3Client
     void updateURIForBucket(const
std::string & bucket, S3::URI new_uri) const;
     std::optional getURIFromError(const Aws::S3::S3Error & error) const;
-    std::optional updateURIForBucketForHead(const std::string & bucket) const;
+    std::optional updateURIForBucketForHead(const std::string & bucket, const std::string & key) const;
 
     std::optional getURIForBucket(const std::string & bucket) const;
diff --git a/src/IO/S3/URI.cpp b/src/IO/S3/URI.cpp
index b8150e740144..5ab8e5cfd724 100644
--- a/src/IO/S3/URI.cpp
+++ b/src/IO/S3/URI.cpp
@@ -191,10 +191,72 @@ URI::URI(const std::string & uri_, bool allow_archive_path_syntax, bool keep_pre
     validateKey(key, uri);
 }
 
+bool URI::isAWSRegion(std::string_view region)
+{
+    /// List from https://docs.aws.amazon.com/general/latest/gr/s3.html
+    static const std::unordered_set regions = {
+        "us-east-2",
+        "us-east-1",
+        "us-west-1",
+        "us-west-2",
+        "af-south-1",
+        "ap-east-1",
+        "ap-south-2",
+        "ap-southeast-3",
+        "ap-southeast-5",
+        "ap-southeast-4",
+        "ap-south-1",
+        "ap-northeast-3",
+        "ap-northeast-2",
+        "ap-southeast-1",
+        "ap-southeast-2",
+        "ap-east-2",
+        "ap-southeast-7",
+        "ap-northeast-1",
+        "ca-central-1",
+        "ca-west-1",
+        "eu-central-1",
+        "eu-west-1",
+        "eu-west-2",
+        "eu-south-1",
+        "eu-west-3",
+        "eu-south-2",
+        "eu-north-1",
+        "eu-central-2",
+        "il-central-1",
+        "mx-central-1",
+        "me-south-1",
+        "me-central-1",
+        "sa-east-1",
+        "us-gov-east-1",
+        "us-gov-west-1"
+    };
+
+    /// 's3-us-west-2' is a legacy region format for S3 storage, equal to 'us-west-2'
+    /// See https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#VirtualHostingBackwardsCompatibility
+    if (region.substr(0, 3) == "s3-")
+        region = region.substr(3);
+
+    return regions.contains(region);
+}
+
 void URI::addRegionToURI(const std::string & region)
 {
     if (auto pos = endpoint.find(".amazonaws.com"); pos != std::string::npos)
+    {
+        if (pos > 0)
+        { /// Check if the region is already in the endpoint to avoid adding it a second time
+            auto prev_pos = endpoint.find_last_of("/.", pos - 1);
+            if
(prev_pos == std::string::npos) + prev_pos = 0; + else + ++prev_pos; + std::string_view endpoint_region = std::string_view(endpoint).substr(prev_pos, pos - prev_pos); + if (isAWSRegion(endpoint_region)) + return; + } endpoint = endpoint.substr(0, pos) + "." + region + endpoint.substr(pos); + } } void URI::validateBucket(const String & bucket, const Poco::URI & uri) diff --git a/src/IO/S3/URI.h b/src/IO/S3/URI.h index fa259b9de451..fd45baa39774 100644 --- a/src/IO/S3/URI.h +++ b/src/IO/S3/URI.h @@ -44,6 +44,10 @@ struct URI static void validateBucket(const std::string & bucket, const Poco::URI & uri); static void validateKey(const std::string & key, const Poco::URI & uri); + + /// Returns true if 'region' string is an AWS S3 region + /// https://docs.aws.amazon.com/general/latest/gr/s3.html + static bool isAWSRegion(std::string_view region); }; } diff --git a/src/IO/S3/getObjectInfo.cpp b/src/IO/S3/getObjectInfo.cpp index a31c5f7add33..b54ce9cf0b6c 100644 --- a/src/IO/S3/getObjectInfo.cpp +++ b/src/IO/S3/getObjectInfo.cpp @@ -1,6 +1,7 @@ #include #include #include +#include #if USE_AWS_S3 @@ -15,6 +16,7 @@ namespace ProfileEvents extern const Event S3GetObject; extern const Event S3GetObjectTagging; extern const Event S3HeadObject; + extern const Event S3HeadObjectMicroseconds; extern const Event DiskS3GetObject; extern const Event DiskS3GetObjectTagging; extern const Event DiskS3HeadObject; @@ -35,6 +37,7 @@ namespace ProfileEvents::increment(ProfileEvents::S3HeadObject); if (client.isClientForDisk()) ProfileEvents::increment(ProfileEvents::DiskS3HeadObject); + ProfileEventTimeIncrement watch(ProfileEvents::S3HeadObjectMicroseconds); S3::HeadObjectRequest req; req.SetBucket(bucket); diff --git a/src/IO/S3Common.cpp b/src/IO/S3Common.cpp index 6d49fb90ef59..d281d5102e39 100644 --- a/src/IO/S3Common.cpp +++ b/src/IO/S3Common.cpp @@ -19,14 +19,6 @@ #include -namespace ProfileEvents -{ - extern const Event S3GetObjectMetadata; - extern const Event S3HeadObject; - 
extern const Event DiskS3GetObjectMetadata; - extern const Event DiskS3HeadObject; -} - namespace DB { diff --git a/src/Interpreters/Cluster.cpp b/src/Interpreters/Cluster.cpp index 9dfae5c4dcf9..95f0559748fc 100644 --- a/src/Interpreters/Cluster.cpp +++ b/src/Interpreters/Cluster.cpp @@ -740,9 +740,9 @@ void Cluster::initMisc() } } -std::unique_ptr Cluster::getClusterWithReplicasAsShards(const Settings & settings, size_t max_replicas_from_shard) const +std::unique_ptr Cluster::getClusterWithReplicasAsShards(const Settings & settings, size_t max_replicas_from_shard, size_t max_hosts) const { - return std::unique_ptr{ new Cluster(ReplicasAsShardsTag{}, *this, settings, max_replicas_from_shard)}; + return std::unique_ptr{ new Cluster(ReplicasAsShardsTag{}, *this, settings, max_replicas_from_shard, max_hosts)}; } std::unique_ptr Cluster::getClusterWithSingleShard(size_t index) const @@ -791,7 +791,7 @@ void shuffleReplicas(std::vector & replicas, const Settings & } -Cluster::Cluster(Cluster::ReplicasAsShardsTag, const Cluster & from, const Settings & settings, size_t max_replicas_from_shard) +Cluster::Cluster(Cluster::ReplicasAsShardsTag, const Cluster & from, const Settings & settings, size_t max_replicas_from_shard, size_t max_hosts) { if (from.addresses_with_failover.empty()) throw Exception(ErrorCodes::LOGICAL_ERROR, "Cluster is empty"); @@ -813,6 +813,7 @@ Cluster::Cluster(Cluster::ReplicasAsShardsTag, const Cluster & from, const Setti if (address.is_local) info.local_addresses.push_back(address); + addresses_with_failover.emplace_back(Addresses({address})); auto pool = ConnectionPoolFactory::instance().get( static_cast(settings[Setting::distributed_connections_pool_size]), @@ -836,9 +837,6 @@ Cluster::Cluster(Cluster::ReplicasAsShardsTag, const Cluster & from, const Setti info.per_replica_pools = {std::move(pool)}; info.default_database = address.default_database; - addresses_with_failover.emplace_back(Addresses{address}); - - 
slot_to_shard.insert(std::end(slot_to_shard), info.weight, shards_info.size()); shards_info.emplace_back(std::move(info)); } }; @@ -860,10 +858,37 @@ Cluster::Cluster(Cluster::ReplicasAsShardsTag, const Cluster & from, const Setti secret = from.secret; name = from.name; + constrainShardInfoAndAddressesToMaxHosts(max_hosts); + + for (size_t i = 0; i < shards_info.size(); ++i) + slot_to_shard.insert(std::end(slot_to_shard), shards_info[i].weight, i); + initMisc(); } +void Cluster::constrainShardInfoAndAddressesToMaxHosts(size_t max_hosts) +{ + if (max_hosts == 0 || shards_info.size() <= max_hosts) + return; + + pcg64_fast gen{randomSeed()}; + std::shuffle(shards_info.begin(), shards_info.end(), gen); + shards_info.resize(max_hosts); + + AddressesWithFailover addresses_with_failover_; + + UInt32 shard_num = 0; + for (auto & shard_info : shards_info) + { + addresses_with_failover_.push_back(addresses_with_failover[shard_info.shard_num - 1]); + shard_info.shard_num = ++shard_num; + } + + addresses_with_failover.swap(addresses_with_failover_); +} + + Cluster::Cluster(Cluster::SubclusterTag, const Cluster & from, const std::vector & indices) { for (size_t index : indices) diff --git a/src/Interpreters/Cluster.h b/src/Interpreters/Cluster.h index b00cf5738f4b..b20f989da2d2 100644 --- a/src/Interpreters/Cluster.h +++ b/src/Interpreters/Cluster.h @@ -276,7 +276,7 @@ class Cluster std::unique_ptr getClusterWithMultipleShards(const std::vector & indices) const; /// Get a new Cluster that contains all servers (all shards with all replicas) from existing cluster as independent shards. - std::unique_ptr getClusterWithReplicasAsShards(const Settings & settings, size_t max_replicas_from_shard = 0) const; + std::unique_ptr getClusterWithReplicasAsShards(const Settings & settings, size_t max_replicas_from_shard = 0, size_t max_hosts = 0) const; /// Returns false if cluster configuration doesn't allow to use it for cross-replication. 
/// NOTE: true does not mean, that it's actually a cross-replication cluster. @@ -302,7 +302,7 @@ class Cluster /// For getClusterWithReplicasAsShards implementation struct ReplicasAsShardsTag {}; - Cluster(ReplicasAsShardsTag, const Cluster & from, const Settings & settings, size_t max_replicas_from_shard); + Cluster(ReplicasAsShardsTag, const Cluster & from, const Settings & settings, size_t max_replicas_from_shard, size_t max_hosts); void addShard( const Settings & settings, @@ -313,6 +313,9 @@ class Cluster UInt32 weight = 1, bool internal_replication = false); + /// Reduce size of cluster to max_hosts + void constrainShardInfoAndAddressesToMaxHosts(size_t max_hosts); + /// Inter-server secret String secret; diff --git a/src/Interpreters/ClusterDiscovery.cpp b/src/Interpreters/ClusterDiscovery.cpp index 98f57c31e9e5..e150da22cb1e 100644 --- a/src/Interpreters/ClusterDiscovery.cpp +++ b/src/Interpreters/ClusterDiscovery.cpp @@ -290,17 +290,32 @@ Strings ClusterDiscovery::getNodeNames(zkutil::ZooKeeperPtr & zk, auto callback = get_nodes_callbacks.find(cluster_name); if (callback == get_nodes_callbacks.end()) { - auto watch_dynamic_callback = std::make_shared([ - cluster_name, - my_clusters_to_update = clusters_to_update, - my_discovery_paths_need_update = multicluster_discovery_paths[zk_root_index - 1].need_update - ](auto) - { - my_discovery_paths_need_update->store(true); - my_clusters_to_update->set(cluster_name); - }); - auto res = get_nodes_callbacks.insert(std::make_pair(cluster_name, watch_dynamic_callback)); - callback = res.first; + if (zk_root_index > 0) + { + auto watch_dynamic_callback = std::make_shared([ + cluster_name, + my_clusters_to_update = clusters_to_update, + my_discovery_paths_need_update = multicluster_discovery_paths[zk_root_index - 1].need_update + ](auto) + { + my_discovery_paths_need_update->store(true); + my_clusters_to_update->set(cluster_name); + }); + auto res = get_nodes_callbacks.insert(std::make_pair(cluster_name, 
watch_dynamic_callback));
+                callback = res.first;
+            }
+            else
+            { // zk_root_index == 0 for static clusters
+                auto watch_dynamic_callback = std::make_shared([
+                    cluster_name,
+                    my_clusters_to_update = clusters_to_update
+                ](auto)
+                {
+                    my_clusters_to_update->set(cluster_name);
+                });
+                auto res = get_nodes_callbacks.insert(std::make_pair(cluster_name, watch_dynamic_callback));
+                callback = res.first;
+            }
         }
         nodes = zk->getChildrenWatch(getShardsListPath(zk_root), &stat, callback->second);
     }
diff --git a/src/Interpreters/IcebergMetadataLog.cpp b/src/Interpreters/IcebergMetadataLog.cpp
index 2df936e77253..c0552c14e0c2 100644
--- a/src/Interpreters/IcebergMetadataLog.cpp
+++ b/src/Interpreters/IcebergMetadataLog.cpp
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -80,7 +81,7 @@ void IcebergMetadataLogElement::appendToBlock(MutableColumns & columns) const
 
 void insertRowToLogTable(
     const ContextPtr & local_context,
-    String row,
+    std::function get_row,
     IcebergMetadataLogLevel row_log_level,
     const String & table_path,
     const String & file_path,
@@ -108,7 +109,7 @@ void insertRowToLogTable(
             .content_type = row_log_level,
             .table_path = table_path,
             .file_path = file_path,
-            .metadata_content = row,
+            .metadata_content = get_row(),
             .row_in_file = row_in_file,
             .pruning_status = pruning_status});
 }
diff --git a/src/Interpreters/IcebergMetadataLog.h b/src/Interpreters/IcebergMetadataLog.h
index 0a86cf921083..be1114c4a847 100644
--- a/src/Interpreters/IcebergMetadataLog.h
+++ b/src/Interpreters/IcebergMetadataLog.h
@@ -26,9 +26,11 @@ struct IcebergMetadataLogElement
     void appendToBlock(MutableColumns & columns) const;
 };
 
+/// Here a `get_row` function is used instead of a `row` string so the string is computed only when required:
+/// inside `insertRowToLogTable` the code can exit immediately after the `iceberg_metadata_log_level` setting check.
void insertRowToLogTable( const ContextPtr & local_context, - String row, + std::function get_row, IcebergMetadataLogLevel row_log_level, const String & table_path, const String & file_path, diff --git a/src/Interpreters/InterpreterCreateQuery.cpp b/src/Interpreters/InterpreterCreateQuery.cpp index 8cffff74a894..8540f1893900 100644 --- a/src/Interpreters/InterpreterCreateQuery.cpp +++ b/src/Interpreters/InterpreterCreateQuery.cpp @@ -2026,8 +2026,7 @@ bool InterpreterCreateQuery::doCreateTable(ASTCreateQuery & create, auto table_function_ast = create.as_table_function->ptr(); auto table_function = TableFunctionFactory::instance().get(table_function_ast, getContext()); - if (!table_function->canBeUsedToCreateTable()) - throw Exception(ErrorCodes::BAD_ARGUMENTS, "Table function '{}' cannot be used to create a table", table_function->getName()); + table_function->validateUseToCreateTable(); /// In case of CREATE AS table_function() query we should use global context /// in storage creation because there will be no query context on server startup diff --git a/src/Interpreters/InterpreterInsertQuery.cpp b/src/Interpreters/InterpreterInsertQuery.cpp index a254a0ad9ea5..69281ec1f95d 100644 --- a/src/Interpreters/InterpreterInsertQuery.cpp +++ b/src/Interpreters/InterpreterInsertQuery.cpp @@ -814,6 +814,9 @@ std::optional InterpreterInsertQuery::distributedWriteIntoReplica if (!src_storage_cluster) return {}; + if (src_storage_cluster->getClusterName(local_context).empty()) + return {}; + if (!isInsertSelectTrivialEnoughForDistributedExecution(query)) return {}; diff --git a/src/Parsers/ASTSetQuery.cpp b/src/Parsers/ASTSetQuery.cpp index f4fe077280ac..968e4f4ee569 100644 --- a/src/Parsers/ASTSetQuery.cpp +++ b/src/Parsers/ASTSetQuery.cpp @@ -131,7 +131,8 @@ void ASTSetQuery::formatImpl(WriteBuffer & ostr, const FormatSettings & format, return true; } - if (DataLake::DATABASE_ENGINE_NAME == state.create_engine_name) + if (DataLake::DATABASE_ENGINE_NAME == 
state.create_engine_name + || DataLake::DATABASE_ALIAS_NAME == state.create_engine_name) { if (DataLake::SETTINGS_TO_HIDE.contains(change.name)) { diff --git a/src/Parsers/FunctionSecretArgumentsFinder.h b/src/Parsers/FunctionSecretArgumentsFinder.h index 8a3ef97422e8..e2b9a957f833 100644 --- a/src/Parsers/FunctionSecretArgumentsFinder.h +++ b/src/Parsers/FunctionSecretArgumentsFinder.h @@ -3,9 +3,12 @@ #include #include #include +#include +#include #include #include #include +#include namespace DB @@ -29,6 +32,21 @@ class AbstractFunction virtual ~Arguments() = default; virtual size_t size() const = 0; virtual std::unique_ptr at(size_t n) const = 0; + void skipArgument(size_t n) { skipped_indexes.insert(n); } + void unskipArguments() { skipped_indexes.clear(); } + size_t getRealIndex(size_t n) const + { + for (auto idx : skipped_indexes) + { + if (n < idx) + break; + ++n; + } + return n; + } + size_t skippedSize() const { return skipped_indexes.size(); } + private: + std::set skipped_indexes; }; virtual ~AbstractFunction() = default; @@ -77,14 +95,15 @@ class FunctionSecretArgumentsFinder { if (index >= function->arguments->size()) return; + auto real_index = function->arguments->getRealIndex(index); if (!result.count) { - result.start = index; + result.start = real_index; result.are_named = argument_is_named; } - chassert(index >= result.start); /// We always check arguments consecutively + chassert(real_index >= result.start); /// We always check arguments consecutively chassert(result.replacement.empty()); /// We shouldn't use replacement with masking other arguments - result.count = index + 1 - result.start; + result.count = real_index + 1 - result.start; if (!argument_is_named) result.are_named = false; } @@ -102,8 +121,16 @@ class FunctionSecretArgumentsFinder { findMongoDBSecretArguments(); } + else if (function->name() == "iceberg") + { + findIcebergFunctionSecretArguments(/* is_cluster_function= */ false); + } + else if (function ->name() == 
"icebergCluster") + { + findIcebergFunctionSecretArguments(/* is_cluster_function= */ true); + } else if ((function->name() == "s3") || (function->name() == "cosn") || (function->name() == "oss") || - (function->name() == "deltaLake") || (function->name() == "hudi") || (function->name() == "iceberg") || + (function->name() == "deltaLake") || (function->name() == "hudi") || (function->name() == "gcs") || (function->name() == "icebergS3") || (function->name() == "paimon") || (function->name() == "paimonS3")) { @@ -112,7 +139,7 @@ class FunctionSecretArgumentsFinder } else if ((function->name() == "s3Cluster") || (function ->name() == "hudiCluster") || (function ->name() == "deltaLakeCluster") || (function ->name() == "deltaLakeS3Cluster") || - (function ->name() == "icebergS3Cluster") || (function ->name() == "icebergCluster")) + (function ->name() == "icebergS3Cluster")) { /// s3Cluster('cluster_name', 'url', 'aws_access_key_id', 'aws_secret_access_key', ...) findS3FunctionSecretArguments(/* is_cluster_function= */ true); @@ -270,6 +297,12 @@ class FunctionSecretArgumentsFinder findSecretNamedArgument("secret_access_key", 1); return; } + if (is_cluster_function && isNamedCollectionName(1)) + { + /// s3Cluster(cluster, named_collection, ..., secret_access_key = 'secret_access_key', ...) 
+ findSecretNamedArgument("secret_access_key", 2); + return; + } findSecretNamedArgument("secret_access_key", url_arg_idx); @@ -277,6 +310,7 @@ class FunctionSecretArgumentsFinder /// s3('url', NOSIGN, 'format' [, 'compression'] [, extra_credentials(..)] [, headers(..)]) /// s3('url', 'format', 'structure' [, 'compression'] [, extra_credentials(..)] [, headers(..)]) size_t count = excludeS3OrURLNestedMaps(); + if ((url_arg_idx + 3 <= count) && (count <= url_arg_idx + 4)) { String second_arg; @@ -341,6 +375,48 @@ class FunctionSecretArgumentsFinder markSecretArgument(url_arg_idx + 4); } + std::string findIcebergStorageType(bool is_cluster_function) + { + std::string storage_type = "s3"; + + size_t count = function->arguments->size(); + if (!count) + return storage_type; + + auto storage_type_idx = findNamedArgument(&storage_type, "storage_type"); + if (storage_type_idx != -1) + { + storage_type = Poco::toLower(storage_type); + function->arguments->skipArgument(storage_type_idx); + } + else if (isNamedCollectionName(is_cluster_function ? 1 : 0)) + { + std::string collection_name; + if (function->arguments->at(is_cluster_function ? 
1 : 0)->tryGetString(&collection_name, true)) + { + NamedCollectionPtr collection = NamedCollectionFactory::instance().tryGet(collection_name); + if (collection && collection->has("storage_type")) + { + storage_type = Poco::toLower(collection->get("storage_type")); + } + } + } + + return storage_type; + } + + void findIcebergFunctionSecretArguments(bool is_cluster_function) + { + auto storage_type = findIcebergStorageType(is_cluster_function); + + if (storage_type == "s3") + findS3FunctionSecretArguments(is_cluster_function); + else if (storage_type == "azure") + findAzureBlobStorageFunctionSecretArguments(is_cluster_function); + + function->arguments->unskipArguments(); + } + bool maskAzureConnectionString(ssize_t url_arg_idx, bool argument_is_named = false, size_t start = 0) { String url_arg; @@ -364,7 +440,7 @@ class FunctionSecretArgumentsFinder if (RE2::Replace(&url_arg, account_key_pattern, "AccountKey=[HIDDEN]\\1")) { chassert(result.count == 0); /// We shouldn't use replacement with masking other arguments - result.start = url_arg_idx; + result.start = function->arguments->getRealIndex(url_arg_idx); result.are_named = argument_is_named; result.count = 1; result.replacement = url_arg; @@ -375,7 +451,7 @@ class FunctionSecretArgumentsFinder if (RE2::Replace(&url_arg, sas_signature_pattern, "SharedAccessSignature=[HIDDEN]\\1")) { chassert(result.count == 0); /// We shouldn't use replacement with masking other arguments - result.start = url_arg_idx; + result.start = function->arguments->getRealIndex(url_arg_idx); result.are_named = argument_is_named; result.count = 1; result.replacement = url_arg; @@ -534,6 +610,7 @@ class FunctionSecretArgumentsFinder void findTableEngineSecretArguments() { const String & engine_name = function->name(); + if (engine_name == "ExternalDistributed") { /// ExternalDistributed('engine', 'host:port', 'database', 'table', 'user', 'password') @@ -551,10 +628,13 @@ class FunctionSecretArgumentsFinder { findMongoDBSecretArguments(); } + 
else if (engine_name == "Iceberg") + { + findIcebergTableEngineSecretArguments(); + } else if ((engine_name == "S3") || (engine_name == "COSN") || (engine_name == "OSS") || (engine_name == "DeltaLake") || (engine_name == "Hudi") - || (engine_name == "Iceberg") || (engine_name == "IcebergS3") - || (engine_name == "S3Queue")) + || (engine_name == "IcebergS3") || (engine_name == "S3Queue")) { /// S3('url', ['aws_access_key_id', 'aws_secret_access_key',] ...) findS3TableEngineSecretArguments(); @@ -563,7 +643,7 @@ class FunctionSecretArgumentsFinder { findURLSecretArguments(); } - else if (engine_name == "AzureBlobStorage" || engine_name == "AzureQueue") + else if (engine_name == "AzureBlobStorage" || engine_name == "AzureQueue" || engine_name == "IcebergAzure") { findAzureBlobStorageTableEngineSecretArguments(); } @@ -681,6 +761,18 @@ class FunctionSecretArgumentsFinder markSecretArgument(2); } + void findIcebergTableEngineSecretArguments() + { + auto storage_type = findIcebergStorageType(0); + + if (storage_type == "s3") + findS3TableEngineSecretArguments(); + else if (storage_type == "azure") + findAzureBlobStorageTableEngineSecretArguments(); + + function->arguments->unskipArguments(); + } + void findDatabaseEngineSecretArguments() { const String & engine_name = function->name(); @@ -697,7 +789,7 @@ class FunctionSecretArgumentsFinder /// S3('url', 'access_key_id', 'secret_access_key') findS3DatabaseSecretArguments(); } - else if (engine_name == "DataLakeCatalog") + else if (engine_name == "DataLakeCatalog" || engine_name == "Iceberg") { findDataLakeCatalogSecretArguments(); } diff --git a/src/Parsers/FunctionSecretArgumentsFinderAST.h b/src/Parsers/FunctionSecretArgumentsFinderAST.h index 86211b3a299c..42d6ffc806c0 100644 --- a/src/Parsers/FunctionSecretArgumentsFinderAST.h +++ b/src/Parsers/FunctionSecretArgumentsFinderAST.h @@ -54,10 +54,13 @@ class FunctionAST : public AbstractFunction { public: explicit ArgumentsAST(const ASTs * arguments_) : 
arguments(arguments_) {}
-    size_t size() const override { return arguments ? arguments->size() : 0; }
+    size_t size() const override
+    { /// size without skipped indexes
+        return arguments ? arguments->size() - skippedSize() : 0;
+    }
     std::unique_ptr at(size_t n) const override
-    {
-        return std::make_unique(arguments->at(n).get());
+    { /// n is a relative index; some arguments can be skipped
+        return std::make_unique(arguments->at(getRealIndex(n)).get());
     }
 private:
     const ASTs * arguments = nullptr;
diff --git a/src/Processors/QueryPlan/ReadFromObjectStorageStep.cpp b/src/Processors/QueryPlan/ReadFromObjectStorageStep.cpp
index ee08f83b46d8..827c53b3ed6b 100644
--- a/src/Processors/QueryPlan/ReadFromObjectStorageStep.cpp
+++ b/src/Processors/QueryPlan/ReadFromObjectStorageStep.cpp
@@ -132,7 +132,7 @@ void ReadFromObjectStorageStep::initializePipeline(QueryPipelineBuilder & pipeli
     size_t output_ports = pipe.numOutputPorts();
     const bool parallelize_output = context->getSettingsRef()[Setting::parallelize_output_from_storages];
     if (parallelize_output
-        && FormatFactory::instance().checkParallelizeOutputAfterReading(configuration->format, context)
+        && FormatFactory::instance().checkParallelizeOutputAfterReading(configuration->getFormat(), context)
         && output_ports > 0 && output_ports < max_num_streams)
         pipe.resize(max_num_streams);
diff --git a/src/Server/TCPHandler.cpp b/src/Server/TCPHandler.cpp
index 85ad3a977a4c..dfcd5ffca2bc 100644
--- a/src/Server/TCPHandler.cpp
+++ b/src/Server/TCPHandler.cpp
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -34,7 +35,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
diff --git a/src/Storages/HivePartitioningUtils.cpp b/src/Storages/HivePartitioningUtils.cpp
index 86084717dd8e..060f04474e98 100644
--- a/src/Storages/HivePartitioningUtils.cpp
+++ b/src/Storages/HivePartitioningUtils.cpp
@@ -210,9 +210,9 @@ HivePartitionColumnsWithFileColumnsPair setupHivePartitioningForObjectStorage(
 *
Otherwise, in case `use_hive_partitioning=1`, we can keep the old behavior of extracting it from the sample path. * And if the schema was inferred (not specified in the table definition), we need to enrich it with the path partition columns */ - if (configuration->partition_strategy && configuration->partition_strategy_type == PartitionStrategyFactory::StrategyType::HIVE) + if (configuration->getPartitionStrategy() && configuration->getPartitionStrategyType() == PartitionStrategyFactory::StrategyType::HIVE) { - hive_partition_columns_to_read_from_file_path = configuration->partition_strategy->getPartitionColumns(); + hive_partition_columns_to_read_from_file_path = configuration->getPartitionStrategy()->getPartitionColumns(); sanityCheckSchemaAndHivePartitionColumns(hive_partition_columns_to_read_from_file_path, columns, /* check_contained_in_schema */true); } else if (context->getSettingsRef()[Setting::use_hive_partitioning]) @@ -226,7 +226,7 @@ HivePartitionColumnsWithFileColumnsPair setupHivePartitioningForObjectStorage( sanityCheckSchemaAndHivePartitionColumns(hive_partition_columns_to_read_from_file_path, columns, /* check_contained_in_schema */false); } - if (configuration->partition_columns_in_data_file) + if (configuration->getPartitionColumnsInDataFile()) { file_columns = columns.getAllPhysical(); } diff --git a/src/Storages/IStorage.h b/src/Storages/IStorage.h index f0084630324d..ed69b0365ae6 100644 --- a/src/Storages/IStorage.h +++ b/src/Storages/IStorage.h @@ -71,6 +71,9 @@ using ConditionSelectivityEstimatorPtr = std::shared_ptr; + class ActionsDAG; /** Storage. Describes the table. Responsible for @@ -434,6 +437,7 @@ class IStorage : public std::enable_shared_from_this, public TypePromo size_t /*max_block_size*/, size_t /*num_streams*/); +public: /// Should we process blocks of data returned by the storage in parallel /// even when the storage returned only one stream of data for reading? 
/// It is beneficial, for example, when you read from a file quickly, @@ -444,7 +448,6 @@ class IStorage : public std::enable_shared_from_this, public TypePromo /// useless). virtual bool parallelizeOutputAfterReading(ContextPtr) const { return !isSystemStorage(); } -public: /// Other version of read which adds reading step to query plan. /// Default implementation creates ReadFromStorageStep and uses usual read. /// Can be called after `shutdown`, but not after `drop`. diff --git a/src/Storages/IStorageCluster.cpp b/src/Storages/IStorageCluster.cpp index c6c69c0f21bc..b98779260112 100644 --- a/src/Storages/IStorageCluster.cpp +++ b/src/Storages/IStorageCluster.cpp @@ -1,5 +1,8 @@ #include +#include +#include + #include #include #include @@ -12,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -20,6 +24,10 @@ #include #include #include +#include +#include +#include +#include #include #include @@ -34,13 +42,15 @@ namespace Setting extern const SettingsBool async_query_sending_for_remote; extern const SettingsBool async_socket_for_remote; extern const SettingsBool skip_unavailable_shards; - extern const SettingsBool parallel_replicas_local_plan; - extern const SettingsString cluster_for_parallel_replicas; extern const SettingsNonZeroUInt64 max_parallel_replicas; + extern const SettingsUInt64 object_storage_max_nodes; + extern const SettingsBool object_storage_remote_initiator; + extern const SettingsString object_storage_remote_initiator_cluster; } namespace ErrorCodes { + extern const int NOT_IMPLEMENTED; extern const int ALL_CONNECTION_TRIES_FAILED; } @@ -131,22 +141,29 @@ void IStorageCluster::read( SelectQueryInfo & query_info, ContextPtr context, QueryProcessingStage::Enum processed_stage, - size_t /*max_block_size*/, - size_t /*num_streams*/) + size_t max_block_size, + size_t num_streams) { + auto cluster_name_from_settings = getClusterName(context); + + if (!isClusterSupported() || cluster_name_from_settings.empty()) + { + 
readFallBackToPure(query_plan, column_names, storage_snapshot, query_info, context, processed_stage, max_block_size, num_streams); + return; + } + updateConfigurationIfNeeded(context); storage_snapshot->check(column_names); - updateBeforeRead(context); - auto cluster = getCluster(context); + const auto & settings = context->getSettingsRef(); /// Calculate the header. This is significant, because some columns could be thrown away in some cases like query with count(*) SharedHeader sample_block; ASTPtr query_to_send = query_info.query; - if (context->getSettingsRef()[Setting::allow_experimental_analyzer]) + if (settings[Setting::allow_experimental_analyzer]) { sample_block = InterpreterSelectQueryAnalyzer::getSampleBlock(query_info.query, context, SelectQueryOptions(processed_stage)); } @@ -159,6 +176,31 @@ void IStorageCluster::read( updateQueryToSendIfNeeded(query_to_send, storage_snapshot, context); + /// In case the current node is not supposed to initiate the clustered query, + /// send this query to a remote initiator using the `remote` table function + if (settings[Setting::object_storage_remote_initiator]) + { + /// Rewrites queries of the form: + /// Input: SELECT * FROM iceberg(...) SETTINGS object_storage_cluster='swarm', object_storage_remote_initiator=1 + /// Output: SELECT * FROM remote('remote_host', icebergCluster('swarm', ...)
+ /// Where `remote_host` is a random host from the cluster which will execute the query + /// This means the initiator node belongs to the same cluster that will execute the query + /// In case remote_initiator_cluster_name is set, the initiator might be set to a different cluster + auto remote_initiator_cluster_name = settings[Setting::object_storage_remote_initiator_cluster].value; + if (remote_initiator_cluster_name.empty()) + remote_initiator_cluster_name = cluster_name_from_settings; + auto remote_initiator_cluster = getClusterImpl(context, remote_initiator_cluster_name); + auto storage_and_context = convertToRemote(remote_initiator_cluster, context, remote_initiator_cluster_name, query_to_send); + auto src_distributed = std::dynamic_pointer_cast(storage_and_context.storage); + auto modified_query_info = query_info; + modified_query_info.cluster = src_distributed->getCluster(); + auto new_storage_snapshot = storage_and_context.storage->getStorageSnapshot(storage_snapshot->metadata, storage_and_context.context); + storage_and_context.storage->read(query_plan, column_names, new_storage_snapshot, modified_query_info, storage_and_context.context, processed_stage, max_block_size, num_streams); + return; + } + + auto cluster = getClusterImpl(context, cluster_name_from_settings, isObjectStorage() ? 
settings[Setting::object_storage_max_nodes] : 0); + RestoreQualifiedNamesVisitor::Data data; data.distributed_table = DatabaseAndTableWithAlias(*getTableExpression(query_info.query->as(), 0)); data.remote_table.database = context->getCurrentDatabase(); @@ -186,6 +228,95 @@ void IStorageCluster::read( query_plan.addStep(std::move(reading)); } +IStorageCluster::RemoteCallVariables IStorageCluster::convertToRemote( + ClusterPtr cluster, + ContextPtr context, + const std::string & cluster_name_from_settings, + ASTPtr query_to_send) +{ + /// TODO: Allow to use secret for remote queries + if (!cluster->getSecret().empty()) + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Can't convert query to remote when cluster uses secret"); + + auto host_addresses = cluster->getShardsAddresses(); + if (host_addresses.empty()) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Empty cluster {}", cluster_name_from_settings); + + pcg64 rng(randomSeed()); + size_t shard_num = rng() % host_addresses.size(); + auto shard_addresses = host_addresses[shard_num]; + /// After getClusterImpl each shard must have exactly 1 replica + if (shard_addresses.size() != 1) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Size of shard {} in cluster {} is not equal to 1", shard_num, cluster_name_from_settings); + std::string host_name; + Poco::URI::decode(shard_addresses[0].toString(), host_name); + + LOG_INFO(log, "Chose remote initiator '{}'", host_name); + + bool secure = shard_addresses[0].secure == Protocol::Secure::Enable; + std::string remote_function_name = secure ?
"remoteSecure" : "remote"; + + /// Clean object_storage_remote_initiator setting to avoid infinite remote call + auto new_context = Context::createCopy(context); + new_context->setSetting("object_storage_remote_initiator", false); + new_context->setSetting("object_storage_remote_initiator_cluster", String("")); + + auto * select_query = query_to_send->as(); + if (!select_query) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Expected SELECT query"); + + auto query_settings = select_query->settings(); + if (query_settings) + { + auto & settings_ast = query_settings->as(); + if (settings_ast.changes.removeSetting("object_storage_remote_initiator") && settings_ast.changes.empty()) + { + select_query->setExpression(ASTSelectQuery::Expression::SETTINGS, {}); + } + } + + ASTTableExpression * table_expression = extractTableExpressionASTPtrFromSelectQuery(query_to_send); + if (!table_expression) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Can't find table expression"); + + boost::intrusive_ptr remote_query; + + if (shard_addresses[0].user_specified) + { // with user/password for clsuter access remote query is executed from this user, add it in query parameters + remote_query = makeASTFunction(remote_function_name, + make_intrusive(host_name), + table_expression->table_function, + make_intrusive(shard_addresses[0].user), + make_intrusive(shard_addresses[0].password)); + } + else + { // without specified user/password remote query is executed from default user + remote_query = makeASTFunction(remote_function_name, make_intrusive(host_name), table_expression->table_function); + } + + table_expression->table_function = remote_query; + + auto remote_function = TableFunctionFactory::instance().get(remote_query, new_context); + + auto storage = remote_function->execute(query_to_send, new_context, remote_function_name); + + return RemoteCallVariables{storage, new_context}; +} + +SinkToStoragePtr IStorageCluster::write( + const ASTPtr & query, + const StorageMetadataPtr & 
metadata_snapshot, + ContextPtr context, + bool async_insert) +{ + auto cluster_name_from_settings = getClusterName(context); + + if (cluster_name_from_settings.empty()) + return writeFallBackToPure(query, metadata_snapshot, context, async_insert); + + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method write is not supported by storage {}", getName()); +} + void ReadFromCluster::initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &) { const Scalars & scalars = context->hasQueryContext() ? context->getQueryContext()->getScalars() : Scalars{}; @@ -278,9 +409,9 @@ ContextPtr ReadFromCluster::updateSettings(const Settings & settings) return new_context; } -ClusterPtr IStorageCluster::getCluster(ContextPtr context) const +ClusterPtr IStorageCluster::getClusterImpl(ContextPtr context, const String & cluster_name_, size_t max_hosts) { - return context->getCluster(cluster_name)->getClusterWithReplicasAsShards(context->getSettingsRef()); + return context->getCluster(cluster_name_)->getClusterWithReplicasAsShards(context->getSettingsRef(), /* max_replicas_from_shard */ 0, max_hosts); } } diff --git a/src/Storages/IStorageCluster.h b/src/Storages/IStorageCluster.h index 3248b26b8c5e..ac266bf82da7 100644 --- a/src/Storages/IStorageCluster.h +++ b/src/Storages/IStorageCluster.h @@ -30,10 +30,16 @@ class IStorageCluster : public IStorage SelectQueryInfo & query_info, ContextPtr context, QueryProcessingStage::Enum processed_stage, - size_t /*max_block_size*/, - size_t /*num_streams*/) override; + size_t max_block_size, + size_t num_streams) override; - ClusterPtr getCluster(ContextPtr context) const; + SinkToStoragePtr write( + const ASTPtr & query, + const StorageMetadataPtr & metadata_snapshot, + ContextPtr context, + bool async_insert) override; + + ClusterPtr getCluster(ContextPtr context) const { return getClusterImpl(context, cluster_name); } /// Query is needed for pruning by virtual columns (_file, _path) virtual 
RemoteQueryExecutor::Extension getTaskIteratorExtension( @@ -51,13 +57,53 @@ class IStorageCluster : public IStorage bool supportsOptimizationToSubcolumns() const override { return false; } bool supportsTrivialCountOptimization(const StorageSnapshotPtr &, ContextPtr) const override { return true; } + const String & getOriginalClusterName() const { return cluster_name; } + virtual String getClusterName(ContextPtr /* context */) const { return getOriginalClusterName(); } + protected: - virtual void updateBeforeRead(const ContextPtr &) {} virtual void updateQueryToSendIfNeeded(ASTPtr & /*query*/, const StorageSnapshotPtr & /*storage_snapshot*/, const ContextPtr & /*context*/) {} virtual void updateConfigurationIfNeeded(ContextPtr /* context */) {} + struct RemoteCallVariables + { + StoragePtr storage; + ContextPtr context; + }; + + RemoteCallVariables convertToRemote( + ClusterPtr cluster, + ContextPtr context, + const std::string & cluster_name_from_settings, + ASTPtr query_to_send); + + virtual void readFallBackToPure( + QueryPlan & /* query_plan */, + const Names & /* column_names */, + const StorageSnapshotPtr & /* storage_snapshot */, + SelectQueryInfo & /* query_info */, + ContextPtr /* context */, + QueryProcessingStage::Enum /* processed_stage */, + size_t /* max_block_size */, + size_t /* num_streams */) + { + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method readFallBackToPure is not supported by storage {}", getName()); + } + + virtual SinkToStoragePtr writeFallBackToPure( + const ASTPtr & /*query*/, + const StorageMetadataPtr & /*metadata_snapshot*/, + ContextPtr /*context*/, + bool /*async_insert*/) + { + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method writeFallBackToPure is not supported by storage {}", getName()); + } + private: + static ClusterPtr getClusterImpl(ContextPtr context, const String & cluster_name_, size_t max_hosts = 0); + + virtual bool isClusterSupported() const { return true; } + LoggerPtr log; String cluster_name; }; diff --git 
a/src/Storages/ObjectStorage/Azure/Configuration.cpp b/src/Storages/ObjectStorage/Azure/Configuration.cpp index f4abe3ac2c29..23aa8307bf7d 100644 --- a/src/Storages/ObjectStorage/Azure/Configuration.cpp +++ b/src/Storages/ObjectStorage/Azure/Configuration.cpp @@ -65,6 +65,7 @@ const std::unordered_set optional_configuration_keys = { "partition_columns_in_data_file", "client_id", "tenant_id", + "storage_type", }; void StorageAzureConfiguration::check(ContextPtr context) @@ -208,10 +209,6 @@ void AzureStorageParsedArguments::fromNamedCollection(const NamedCollection & co String connection_url; String container_name; - std::optional account_name; - std::optional account_key; - std::optional client_id; - std::optional tenant_id; if (collection.has("connection_string")) connection_url = collection.get("connection_string"); @@ -392,16 +389,10 @@ void AzureStorageParsedArguments::fromAST(ASTs & engine_args, ContextPtr context std::unordered_map engine_args_to_idx; - String connection_url = checkAndGetLiteralArgument(engine_args[0], "connection_string/storage_account_url"); String container_name = checkAndGetLiteralArgument(engine_args[1], "container"); blob_path = checkAndGetLiteralArgument(engine_args[2], "blobpath"); - std::optional account_name; - std::optional account_key; - std::optional client_id; - std::optional tenant_id; - collectCredentials(extra_credentials, client_id, tenant_id, context); auto is_format_arg = [] (const std::string & s) -> bool @@ -451,8 +442,7 @@ void AzureStorageParsedArguments::fromAST(ASTs & engine_args, ContextPtr context auto sixth_arg = checkAndGetLiteralArgument(engine_args[5], "partition_strategy/structure"); if (magic_enum::enum_contains(sixth_arg, magic_enum::case_insensitive)) { - partition_strategy_type - = magic_enum::enum_cast(sixth_arg, magic_enum::case_insensitive).value(); + partition_strategy_type = magic_enum::enum_cast(sixth_arg, magic_enum::case_insensitive).value(); } else { @@ -572,8 +562,7 @@ void 
AzureStorageParsedArguments::fromAST(ASTs & engine_args, ContextPtr context auto eighth_arg = checkAndGetLiteralArgument(engine_args[7], "partition_strategy/structure"); if (magic_enum::enum_contains(eighth_arg, magic_enum::case_insensitive)) { - partition_strategy_type - = magic_enum::enum_cast(eighth_arg, magic_enum::case_insensitive).value(); + partition_strategy_type = magic_enum::enum_cast(eighth_arg, magic_enum::case_insensitive).value(); } else { @@ -825,6 +814,26 @@ void StorageAzureConfiguration::initializeFromParsedArguments(const AzureStorage StorageObjectStorageConfiguration::initializeFromParsedArguments(parsed_arguments); blob_path = parsed_arguments.blob_path; connection_params = parsed_arguments.connection_params; + account_name = parsed_arguments.account_name; + account_key = parsed_arguments.account_key; + client_id = parsed_arguments.client_id; + tenant_id = parsed_arguments.tenant_id; +} + +ASTPtr StorageAzureConfiguration::createArgsWithAccessData() const +{ + auto arguments = make_intrusive(); + + arguments->children.push_back(make_intrusive(connection_params.endpoint.storage_account_url)); + arguments->children.push_back(make_intrusive(connection_params.endpoint.container_name)); + arguments->children.push_back(make_intrusive(blob_path.path)); + if (account_name && account_key) + { + arguments->children.push_back(make_intrusive(*account_name)); + arguments->children.push_back(make_intrusive(*account_key)); + } + + return arguments; } void StorageAzureConfiguration::addStructureAndFormatToArgsIfNeeded( @@ -832,13 +841,13 @@ void StorageAzureConfiguration::addStructureAndFormatToArgsIfNeeded( { if (disk) { - if (format == "auto") + if (getFormat() == "auto") { ASTs format_equal_func_args = {make_intrusive("format"), make_intrusive(format_)}; auto format_equal_func = makeASTFunction("equals", std::move(format_equal_func_args)); args.push_back(format_equal_func); } - if (structure == "auto") + if (getStructure() == "auto") { ASTs 
structure_equal_func_args = {make_intrusive("structure"), make_intrusive(structure_)}; auto structure_equal_func = makeASTFunction("equals", std::move(structure_equal_func_args)); diff --git a/src/Storages/ObjectStorage/Azure/Configuration.h b/src/Storages/ObjectStorage/Azure/Configuration.h index 70e60e502799..41c9795b1a78 100644 --- a/src/Storages/ObjectStorage/Azure/Configuration.h +++ b/src/Storages/ObjectStorage/Azure/Configuration.h @@ -76,6 +76,11 @@ struct AzureStorageParsedArguments : private StorageParsedArguments Path blob_path; AzureBlobStorage::ConnectionParams connection_params; + + std::optional account_name; + std::optional account_key; + std::optional client_id; + std::optional tenant_id; }; class StorageAzureConfiguration : public StorageObjectStorageConfiguration @@ -125,6 +130,7 @@ class StorageAzureConfiguration : public StorageObjectStorageConfiguration onelake_client_secret = client_secret_; onelake_tenant_id = tenant_id_; } + ASTPtr createArgsWithAccessData() const override; protected: void fromDisk(const String & disk_name, ASTs & args, ContextPtr context, bool with_structure) override; @@ -136,14 +142,21 @@ class StorageAzureConfiguration : public StorageObjectStorageConfiguration Path blob_path; Paths blobs_paths; AzureBlobStorage::ConnectionParams connection_params; - DiskPtr disk; + + std::optional account_name; + std::optional account_key; + std::optional client_id; + std::optional tenant_id; String onelake_client_id; String onelake_client_secret; String onelake_tenant_id; + DiskPtr disk; + void initializeFromParsedArguments(const AzureStorageParsedArguments & parsed_arguments); }; + } #endif diff --git a/src/Storages/ObjectStorage/DataLakes/Common/AvroForIcebergDeserializer.cpp b/src/Storages/ObjectStorage/DataLakes/Common/AvroForIcebergDeserializer.cpp index 3c07bbb8ab4f..c27bb2ac7117 100644 --- a/src/Storages/ObjectStorage/DataLakes/Common/AvroForIcebergDeserializer.cpp +++ 
b/src/Storages/ObjectStorage/DataLakes/Common/AvroForIcebergDeserializer.cpp @@ -15,6 +15,7 @@ #include #include #include +#include namespace DB::ErrorCodes { @@ -22,6 +23,12 @@ namespace DB::ErrorCodes extern const int INCORRECT_DATA; } +namespace ProfileEvents +{ + extern const Event IcebergAvroFileParsing; + extern const Event IcebergAvroFileParsingMicroseconds; +} + namespace DB::Iceberg { @@ -33,6 +40,9 @@ try : buffer(std::move(buffer_)) , manifest_file_path(manifest_file_path_) { + ProfileEvents::increment(ProfileEvents::IcebergAvroFileParsing); + ProfileEventTimeIncrement watch(ProfileEvents::IcebergAvroFileParsingMicroseconds); + auto manifest_file_reader = std::make_unique(std::make_unique(*buffer)); diff --git a/src/Storages/ObjectStorage/DataLakes/DataLakeConfiguration.h b/src/Storages/ObjectStorage/DataLakes/DataLakeConfiguration.h index 5697b586a5ea..931bd1901d75 100644 --- a/src/Storages/ObjectStorage/DataLakes/DataLakeConfiguration.h +++ b/src/Storages/ObjectStorage/DataLakes/DataLakeConfiguration.h @@ -9,6 +9,7 @@ #include #include +#include #include #include #include @@ -19,11 +20,17 @@ #include #include #include -#include +#include #include #include #include #include +#include +#include +#include +#include +#include + #include #include #include @@ -55,30 +62,32 @@ namespace ErrorCodes namespace DataLakeStorageSetting { - extern DataLakeStorageSettingsDatabaseDataLakeCatalogType storage_catalog_type; - extern DataLakeStorageSettingsString object_storage_endpoint; - extern DataLakeStorageSettingsString storage_aws_access_key_id; - extern DataLakeStorageSettingsString storage_aws_secret_access_key; - extern DataLakeStorageSettingsString storage_region; - extern DataLakeStorageSettingsString storage_aws_role_arn; - extern DataLakeStorageSettingsString storage_aws_role_session_name; - extern DataLakeStorageSettingsString storage_catalog_url; - extern DataLakeStorageSettingsString storage_warehouse; - extern DataLakeStorageSettingsString 
storage_catalog_credential; - - extern DataLakeStorageSettingsString storage_auth_scope; - extern DataLakeStorageSettingsString storage_auth_header; - extern DataLakeStorageSettingsString storage_oauth_server_uri; - extern DataLakeStorageSettingsBool storage_oauth_server_use_request_body; + extern const DataLakeStorageSettingsDatabaseDataLakeCatalogType storage_catalog_type; + extern const DataLakeStorageSettingsString object_storage_endpoint; + extern const DataLakeStorageSettingsString storage_aws_access_key_id; + extern const DataLakeStorageSettingsString storage_aws_secret_access_key; + extern const DataLakeStorageSettingsString storage_region; + extern const DataLakeStorageSettingsString storage_aws_role_arn; + extern const DataLakeStorageSettingsString storage_aws_role_session_name; + extern const DataLakeStorageSettingsString storage_catalog_url; + extern const DataLakeStorageSettingsString storage_warehouse; + extern const DataLakeStorageSettingsString storage_catalog_credential; + extern const DataLakeStorageSettingsString storage_auth_scope; + extern const DataLakeStorageSettingsString storage_auth_header; + extern const DataLakeStorageSettingsString storage_oauth_server_uri; + extern const DataLakeStorageSettingsBool storage_oauth_server_use_request_body; + extern const DataLakeStorageSettingsString iceberg_metadata_file_path; } template concept StorageConfiguration = std::derived_from; -template +template class DataLakeConfiguration : public BaseStorageConfiguration, public std::enable_shared_from_this { public: + DataLakeConfiguration() {} + explicit DataLakeConfiguration(DataLakeStorageSettingsPtr settings_) : settings(settings_) {} bool isDataLakeConfiguration() const override { return true; } @@ -92,6 +101,7 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl auto result = BaseStorageConfiguration::getRawPath().path; return StorageObjectStorageConfiguration::Path(result.ends_with('/') ? 
result : result + "/"); } + void setRawPath(const StorageObjectStorageConfiguration::Path & path) override { BaseStorageConfiguration::setRawPath(path); } void update(ObjectStoragePtr object_storage, ContextPtr local_context) override { @@ -133,13 +143,13 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl bool supportsDelete() const override { - assertInitialized(); + assertInitializedDL(); return current_metadata->supportsDelete(); } bool supportsParallelInsert() const override { - assertInitialized(); + assertInitializedDL(); return current_metadata->supportsParallelInsert(); } @@ -150,25 +160,25 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl std::shared_ptr catalog, const std::optional & format_settings) override { - assertInitialized(); + assertInitializedDL(); current_metadata->mutate(commands, shared_from_this(), context, storage_id, metadata_snapshot, catalog, format_settings); } void checkMutationIsPossible(const MutationCommands & commands) override { - assertInitialized(); + assertInitializedDL(); current_metadata->checkMutationIsPossible(commands); } void checkAlterIsPossible(const AlterCommands & commands) override { - assertInitialized(); + assertInitializedDL(); current_metadata->checkAlterIsPossible(commands); } void alter(const AlterCommands & params, ContextPtr context) override { - assertInitialized(); + assertInitializedDL(); current_metadata->alter(params, context); } @@ -182,7 +192,7 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl std::optional tryGetTableStructureFromMetadata(ContextPtr local_context) const override { - assertInitialized(); + assertInitializedDL(); if (auto schema = current_metadata->getTableSchema(local_context); !schema.empty()) return ColumnsDescription(std::move(schema)); return std::nullopt; @@ -195,7 +205,7 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl std::optional 
totalRows(ContextPtr local_context) override { - assertInitialized(); + assertInitializedDL(); return current_metadata->totalRows(local_context); } @@ -206,38 +216,38 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl std::optional totalBytes(ContextPtr local_context) override { - assertInitialized(); + assertInitializedDL(); return current_metadata->totalBytes(local_context); } bool isDataSortedBySortingKey(StorageMetadataPtr metadata_snapshot, ContextPtr local_context) const override { - assertInitialized(); + assertInitializedDL(); return current_metadata->isDataSortedBySortingKey(metadata_snapshot, local_context); } std::shared_ptr getInitialSchemaByPath(ContextPtr local_context, ObjectInfoPtr object_info) const override { - assertInitialized(); + assertInitializedDL(); return current_metadata->getInitialSchemaByPath(local_context, object_info); } std::shared_ptr getSchemaTransformer(ContextPtr local_context, ObjectInfoPtr object_info) const override { - assertInitialized(); + assertInitializedDL(); return current_metadata->getSchemaTransformer(local_context, object_info); } std::optional getTableStateSnapshot(ContextPtr context) const override { - assertInitialized(); + assertInitializedDL(); return current_metadata->getTableStateSnapshot(context); } std::unique_ptr buildStorageMetadataFromState( const DataLakeTableStateSnapshot & state, ContextPtr context) const override { - assertInitialized(); + assertInitializedDL(); auto metadata = current_metadata->buildStorageMetadataFromState(state, context); if (metadata) LOG_TEST(log, "Built storage metadata from state with columns: {}", @@ -247,13 +257,13 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl bool shouldReloadSchemaForConsistency(ContextPtr context) const override { - assertInitialized(); + assertInitializedDL(); return current_metadata->shouldReloadSchemaForConsistency(context); } IDataLakeMetadata * getExternalMetadata() override { - 
assertInitialized(); + assertInitializedDL(); return current_metadata.get(); } @@ -261,7 +271,7 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl bool supportsWrites() const override { - assertInitialized(); + assertInitializedDL(); return current_metadata->supportsWrites(); } @@ -272,7 +282,7 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl StorageMetadataPtr storage_metadata, ContextPtr context) override { - assertInitialized(); + assertInitializedDL(); return current_metadata->iterate(filter_dag, callback, list_batch_size, storage_metadata, context); } @@ -284,7 +294,7 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl /// because the code will be removed ASAP anyway) DeltaLakePartitionColumns getDeltaLakePartitionColumns() const { - assertInitialized(); + assertInitializedDL(); const auto * delta_lake_metadata = dynamic_cast(current_metadata.get()); if (delta_lake_metadata) return delta_lake_metadata->getPartitionColumns(); @@ -294,18 +304,18 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl void modifyFormatSettings(FormatSettings & settings_, const Context & local_context) const override { - assertInitialized(); + assertInitializedDL(); current_metadata->modifyFormatSettings(settings_, local_context); } ColumnMapperPtr getColumnMapperForObject(ObjectInfoPtr object_info) const override { - assertInitialized(); + assertInitializedDL(); return current_metadata->getColumnMapperForObject(object_info); } ColumnMapperPtr getColumnMapperForCurrentSchema(StorageMetadataPtr storage_metadata_snapshot, ContextPtr context) const override { - assertInitialized(); + assertInitializedDL(); return current_metadata->getColumnMapperForCurrentSchema(storage_metadata_snapshot, context); } @@ -376,7 +386,7 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl bool optimize(const StorageMetadataPtr & metadata_snapshot, 
ContextPtr context, const std::optional & format_settings) override { - assertInitialized(); + assertInitializedDL(); return current_metadata->optimize(metadata_snapshot, context, format_settings); } @@ -404,6 +414,44 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl #endif } + bool isClusterSupported() const override { return is_cluster_supported; } + + ASTPtr createArgsWithAccessData() const override + { + auto res = BaseStorageConfiguration::createArgsWithAccessData(); + + auto iceberg_metadata_file_path = (*settings)[DataLakeStorageSetting::iceberg_metadata_file_path]; + + if (iceberg_metadata_file_path.changed) + { + auto * arguments = res->template as(); + if (!arguments) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Arguments are not an expression list"); + + bool has_settings = false; + + for (auto & arg : arguments->children) + { + if (auto * settings_ast = arg->template as()) + { + has_settings = true; + settings_ast->changes.setSetting("iceberg_metadata_file_path", iceberg_metadata_file_path.value); + break; + } + } + + if (!has_settings) + { + boost::intrusive_ptr settings_ast = make_intrusive(); + settings_ast->is_standalone = false; + settings_ast->changes.setSetting("iceberg_metadata_file_path", iceberg_metadata_file_path.value); + arguments->children.push_back(settings_ast); + } + } + + return res; + } + private: const DataLakeStorageSettingsPtr settings; ObjectStoragePtr ready_object_storage; @@ -421,8 +469,9 @@ class DataLakeConfiguration : public BaseStorageConfiguration, public std::enabl } } - void assertInitialized() const + void assertInitializedDL() const { + BaseStorageConfiguration::assertInitialized(); if (!current_metadata) throw Exception(ErrorCodes::LOGICAL_ERROR, "Metadata is not initialized"); } @@ -456,18 +505,388 @@ using StorageS3IcebergConfiguration = DataLakeConfiguration; #endif -#if USE_AZURE_BLOB_STORAGE +# if USE_AZURE_BLOB_STORAGE using StorageAzureIcebergConfiguration = 
DataLakeConfiguration; using StorageAzurePaimonConfiguration = DataLakeConfiguration; #endif -#if USE_HDFS +# if USE_HDFS using StorageHDFSIcebergConfiguration = DataLakeConfiguration; using StorageHDFSPaimonConfiguration = DataLakeConfiguration; #endif using StorageLocalIcebergConfiguration = DataLakeConfiguration; -using StorageLocalPaimonConfiguration = DataLakeConfiguration; +using StorageLocalPaimonConfiguration = DataLakeConfiguration; + +/// Class detects storage type by `storage_type` parameter if exists +/// and uses appropriate implementation - S3, Azure, HDFS or Local +class StorageIcebergConfiguration : public StorageObjectStorageConfiguration, public std::enable_shared_from_this +{ + friend class StorageObjectStorageConfiguration; + +public: + StorageIcebergConfiguration() {} + + explicit StorageIcebergConfiguration(DataLakeStorageSettingsPtr settings_) : settings(settings_) {} + + void initialize( + ASTs & engine_args, + ContextPtr local_context, + bool with_table_structure, + const StorageID * table_id = nullptr) override + { + createDynamicConfiguration(engine_args, local_context); + getImpl().initialize(engine_args, local_context, with_table_structure, table_id); + } + + ObjectStorageType getType() const override { return getImpl().getType(); } + + std::string getTypeName() const override { return getImpl().getTypeName(); } + std::string getEngineName() const override { return getImpl().getEngineName(); } + std::string getNamespaceType() const override { return getImpl().getNamespaceType(); } + + Path getRawPath() const override { return getImpl().getRawPath(); } + void setRawPath(const Path & path) override { getImpl().setRawPath(path); } + const String & getRawURI() const override { return getImpl().getRawURI(); } + const Path & getPathForRead() const override { return getImpl().getPathForRead(); } + Path getPathForWrite(const std::string & partition_id) const override { return getImpl().getPathForWrite(partition_id); } + + void 
setPathForRead(const Path & path) override { getImpl().setPathForRead(path); } + + const Paths & getPaths() const override { return getImpl().getPaths(); } + void setPaths(const Paths & paths) override { getImpl().setPaths(paths); } + + String getDataSourceDescription() const override { return getImpl().getDataSourceDescription(); } + String getNamespace() const override { return getImpl().getNamespace(); } + + StorageObjectStorageQuerySettings getQuerySettings(const ContextPtr & context) const override + { return getImpl().getQuerySettings(context); } + + void addStructureAndFormatToArgsIfNeeded( + ASTs & args, const String & structure_, const String & format_, ContextPtr context, bool with_structure) override + { getImpl().addStructureAndFormatToArgsIfNeeded(args, structure_, format_, context, with_structure); } + + bool isNamespaceWithGlobs() const override { return getImpl().isNamespaceWithGlobs(); } + + bool isArchive() const override { return getImpl().isArchive(); } + bool isPathInArchiveWithGlobs() const override { return getImpl().isPathInArchiveWithGlobs(); } + std::string getPathInArchive() const override { return getImpl().getPathInArchive(); } + + void check(ContextPtr context) override { getImpl().check(context); } + void validateNamespace(const String & name) const override { getImpl().validateNamespace(name); } + + ObjectStoragePtr createObjectStorage(ContextPtr context, bool is_readonly, CredentialsConfigurationCallback refresh_credentials_callback) override + { return getImpl().createObjectStorage(context, is_readonly, refresh_credentials_callback); } + bool isStaticConfiguration() const override { return getImpl().isStaticConfiguration(); } + + bool isDataLakeConfiguration() const override { return getImpl().isDataLakeConfiguration(); } + + bool supportsTotalRows(ContextPtr context, ObjectStorageType storage_type) const override { return getImpl().supportsTotalRows(context, storage_type); } + std::optional totalRows(ContextPtr context) override { 
return getImpl().totalRows(context); } + bool supportsTotalBytes(ContextPtr context, ObjectStorageType storage_type) const override { return getImpl().supportsTotalBytes(context, storage_type); } + std::optional totalBytes(ContextPtr context) override { return getImpl().totalBytes(context); } + bool isDataSortedBySortingKey(StorageMetadataPtr storage_metadata, ContextPtr context) const override + { return getImpl().isDataSortedBySortingKey(storage_metadata, context); } + + IDataLakeMetadata * getExternalMetadata() override { return getImpl().getExternalMetadata(); } + + std::shared_ptr getInitialSchemaByPath(ContextPtr context, ObjectInfoPtr object_info) const override + { return getImpl().getInitialSchemaByPath(context, object_info); } + + std::shared_ptr getSchemaTransformer(ContextPtr context, ObjectInfoPtr object_info) const override + { return getImpl().getSchemaTransformer(context, object_info); } + + void modifyFormatSettings(FormatSettings & settings_, const Context & context) const override + { getImpl().modifyFormatSettings(settings_, context); } + + void addDeleteTransformers( + ObjectInfoPtr object_info, + QueryPipelineBuilder & builder, + const std::optional & format_settings, + FormatParserSharedResourcesPtr parser_shared_resources, + ContextPtr local_context) const override + { getImpl().addDeleteTransformers(object_info, builder, format_settings, parser_shared_resources, local_context); } + + ReadFromFormatInfo prepareReadingFromFormat( + ObjectStoragePtr object_storage, + const Strings & requested_columns, + const StorageSnapshotPtr & storage_snapshot, + bool supports_subset_of_columns, + bool supports_tuple_elements, + ContextPtr local_context, + const PrepareReadingFromFormatHiveParams & hive_parameters) override + { + return getImpl().prepareReadingFromFormat( + object_storage, + requested_columns, + storage_snapshot, + supports_subset_of_columns, + supports_tuple_elements, + local_context, + hive_parameters); + } + + void setSchemaHash(const 
String & hash) override { getImpl().setSchemaHash(hash); } + + void initPartitionStrategy(ASTPtr partition_by, const ColumnsDescription & columns, ContextPtr context) override + { getImpl().initPartitionStrategy(partition_by, columns, context); } + + std::optional getTableStateSnapshot(ContextPtr local_context) const override { return getImpl().getTableStateSnapshot(local_context); } + std::unique_ptr buildStorageMetadataFromState(const DataLakeTableStateSnapshot & state, ContextPtr local_context) const override + { return getImpl().buildStorageMetadataFromState(state, local_context); } + bool shouldReloadSchemaForConsistency(ContextPtr local_context) const override { return getImpl().shouldReloadSchemaForConsistency(local_context); } + std::optional tryGetTableStructureFromMetadata(ContextPtr local_context) const override + { return getImpl().tryGetTableStructureFromMetadata(local_context); } + + bool supportsFileIterator() const override { return getImpl().supportsFileIterator(); } + bool supportsParallelInsert() const override { return getImpl().supportsParallelInsert(); } + bool supportsWrites() const override { return getImpl().supportsWrites(); } + + bool supportsPartialPathPrefix() const override { return getImpl().supportsPartialPathPrefix(); } + + ObjectIterator iterate( + const ActionsDAG * filter_dag, + IDataLakeMetadata::FileProgressCallback callback, + size_t list_batch_size, + StorageMetadataPtr storage_metadata, + ContextPtr context) override + { + return getImpl().iterate(filter_dag, callback, list_batch_size, storage_metadata, context); + } + + void update( + ObjectStoragePtr object_storage_ptr, + ContextPtr context) override + { + getImpl().update(object_storage_ptr, context); + } + void lazyInitializeIfNeeded(ObjectStoragePtr object_storage, ContextPtr local_context) override + { return getImpl().lazyInitializeIfNeeded(object_storage, local_context); } + + void create( + ObjectStoragePtr object_storage, + ContextPtr local_context, + const 
std::optional & columns, + ASTPtr partition_by, + ASTPtr order_by, + bool if_not_exists, + std::shared_ptr catalog, + const StorageID & table_id_) override + { + getImpl().create(object_storage, local_context, columns, partition_by, order_by, if_not_exists, catalog, table_id_); + } + + SinkToStoragePtr write( + SharedHeader sample_block, + const StorageID & table_id, + ObjectStoragePtr object_storage, + const std::optional & format_settings, + ContextPtr context, + std::shared_ptr catalog) override + { + return getImpl().write(sample_block, table_id, object_storage, format_settings, context, catalog); + } + + bool supportsDelete() const override { return getImpl().supportsDelete(); } + void mutate(const MutationCommands & commands, + ContextPtr context, + const StorageID & storage_id, + StorageMetadataPtr metadata_snapshot, + std::shared_ptr catalog, + const std::optional & format_settings) override + { + getImpl().mutate(commands, context, storage_id, metadata_snapshot, catalog, format_settings); + } + void checkMutationIsPossible(const MutationCommands & commands) override { getImpl().checkMutationIsPossible(commands); } + + void checkAlterIsPossible(const AlterCommands & commands) override { getImpl().checkAlterIsPossible(commands); } + + void alter(const AlterCommands & params, ContextPtr context) override { getImpl().alter(params, context); } + + const DataLakeStorageSettings & getDataLakeSettings() const override { return getImpl().getDataLakeSettings(); } + + ASTPtr createArgsWithAccessData() const override + { + return getImpl().createArgsWithAccessData(); + } + + void fromNamedCollection(const NamedCollection & collection, ContextPtr context) override + { getImpl().fromNamedCollection(collection, context); } + void fromAST(ASTs & args, ContextPtr context, bool with_structure) override + { getImpl().fromAST(args, context, with_structure); } + void fromDisk(const String & disk_name, ASTs & args, ContextPtr context, bool with_structure) override + { 
getImpl().fromDisk(disk_name, args, context, with_structure); }
+
+    /// Finds the `storage_type` argument and removes it from `args` if present.
+    /// Returns the detected storage type.
+    ObjectStorageType extractDynamicStorageType(ASTs & args, ContextPtr context, ASTPtr * type_arg, bool cluster_name_first) const override
+    {
+        static const auto * const storage_type_name = "storage_type";
+
+        {
+            auto args_copy = args;
+            if (cluster_name_first)
+            {
+                // Remove the cluster name from args to avoid confusing it with a named collection name
+                args_copy.erase(args_copy.begin());
+            }
+
+            if (auto named_collection = tryGetNamedCollectionWithOverrides(args_copy, context))
+            {
+                if (named_collection->has(storage_type_name))
+                {
+                    return objectStorageTypeFromString(named_collection->get<String>(storage_type_name));
+                }
+            }
+        }
+
+        auto type_it = args.end();
+
+        /// S3 by default for backward compatibility:
+        /// Iceberg without storage_type == IcebergS3
+        ObjectStorageType type = ObjectStorageType::S3;
+
+        for (auto arg_it = args.begin(); arg_it != args.end(); ++arg_it)
+        {
+            const auto * type_ast_function = (*arg_it)->as<ASTFunction>();
+
+            if (type_ast_function && type_ast_function->name == "equals"
+                && type_ast_function->arguments && type_ast_function->arguments->children.size() == 2)
+            {
+                auto * name = type_ast_function->arguments->children[0]->as<ASTIdentifier>();
+
+                if (name && name->name() == storage_type_name)
+                {
+                    if (type_it != args.end())
+                    {
+                        throw Exception(
+                            ErrorCodes::BAD_ARGUMENTS,
+                            "DataLake can have only one key-value argument: storage_type='type'.");
+                    }
+
+                    auto * value = type_ast_function->arguments->children[1]->as<ASTLiteral>();
+
+                    if (!value)
+                    {
+                        throw Exception(
+                            ErrorCodes::BAD_ARGUMENTS,
+                            "DataLake parameter 'storage_type' has wrong type, string literal expected.");
+                    }
+
+                    if (value->value.getType() != Field::Types::String)
+                    {
+                        throw Exception(
+                            ErrorCodes::BAD_ARGUMENTS,
+                            "DataLake parameter 'storage_type' has wrong value type, string expected.");
+                    }
+
+                    type =
objectStorageTypeFromString(value->value.safeGet()); + + type_it = arg_it; + } + } + } + + if (type_it != args.end()) + { + if (type_arg) + *type_arg = *type_it; + args.erase(type_it); + } + + return type; + } + + const String & getFormat() const override { return getImpl().getFormat(); } + const String & getCompressionMethod() const override { return getImpl().getCompressionMethod(); } + const String & getStructure() const override { return getImpl().getStructure(); } + + PartitionStrategyFactory::StrategyType getPartitionStrategyType() const override { return getImpl().getPartitionStrategyType(); } + bool getPartitionColumnsInDataFile() const override { return getImpl().getPartitionColumnsInDataFile(); } + std::shared_ptr getPartitionStrategy() const override { return getImpl().getPartitionStrategy(); } + + void setFormat(const String & format_) override { getImpl().setFormat(format_); } + void setCompressionMethod(const String & compression_method_) override { getImpl().setCompressionMethod(compression_method_); } + void setStructure(const String & structure_) override { getImpl().setStructure(structure_); } + + void setPartitionStrategyType(PartitionStrategyFactory::StrategyType partition_strategy_type_) override + { getImpl().setPartitionStrategyType(partition_strategy_type_); } + void setPartitionColumnsInDataFile(bool partition_columns_in_data_file_) override + { getImpl().setPartitionColumnsInDataFile(partition_columns_in_data_file_); } + void setPartitionStrategy(const std::shared_ptr & partition_strategy_) override + { getImpl().setPartitionStrategy(partition_strategy_); } + + void assertInitialized() const override { getImpl().assertInitialized(); } + + ColumnMapperPtr getColumnMapperForObject(ObjectInfoPtr obj) const override { return getImpl().getColumnMapperForObject(obj); } + + ColumnMapperPtr getColumnMapperForCurrentSchema(StorageMetadataPtr storage_metadata_snapshot, ContextPtr context) const override + { return 
getImpl().getColumnMapperForCurrentSchema(storage_metadata_snapshot, context); }
+
+    std::shared_ptr getCatalog(ContextPtr context, bool is_attach) const override
+    { return getImpl().getCatalog(context, is_attach); }
+
+    bool optimize(const StorageMetadataPtr & metadata_snapshot, ContextPtr context, const std::optional<FormatSettings> & format_settings) override
+    { return getImpl().optimize(metadata_snapshot, context, format_settings); }
+
+    bool supportsPrewhere() const override { return getImpl().supportsPrewhere(); }
+
+    void drop(ContextPtr context) override { getImpl().drop(context); }
+
+protected:
+    void createDynamicConfiguration(ASTs & args, ContextPtr context)
+    {
+        ObjectStorageType type = extractDynamicStorageType(args, context, nullptr, false);
+        createDynamicStorage(type);
+    }
+
+private:
+    inline StorageObjectStorageConfiguration & getImpl() const
+    {
+        if (!impl)
+            throw Exception(ErrorCodes::LOGICAL_ERROR, "Dynamic DataLake storage not initialized");
+
+        return *impl;
+    }
+
+    void createDynamicStorage(ObjectStorageType type)
+    {
+        if (impl)
+        {
+            if (impl->getType() == type)
+                return;
+
+            throw Exception(ErrorCodes::LOGICAL_ERROR, "Can't change DataLake engine storage");
+        }
+
+        switch (type)
+        {
+# if USE_AWS_S3
+            case ObjectStorageType::S3:
+                impl = std::make_unique<StorageS3IcebergConfiguration>(settings);
+                break;
+# endif
+# if USE_AZURE_BLOB_STORAGE
+            case ObjectStorageType::Azure:
+                impl = std::make_unique<StorageAzureIcebergConfiguration>(settings);
+                break;
+# endif
+# if USE_HDFS
+            case ObjectStorageType::HDFS:
+                impl = std::make_unique<StorageHDFSIcebergConfiguration>(settings);
+                break;
+# endif
+            case ObjectStorageType::Local:
+                impl = std::make_unique<StorageLocalIcebergConfiguration>(settings);
+                break;
+            default:
+                throw Exception(ErrorCodes::LOGICAL_ERROR, "Unsupported DataLake storage {}", type);
+        }
+    }
+
+    StorageObjectStorageConfigurationPtr impl;
+    DataLakeStorageSettingsPtr settings;
+};
 #endif
 
 #if USE_PARQUET
@@ -479,7 +898,7 @@ using StorageS3DeltaLakeConfiguration = DataLakeConfiguration;
 #endif
 
-using StorageLocalDeltaLakeConfiguration = DataLakeConfiguration;
+using StorageLocalDeltaLakeConfiguration = DataLakeConfiguration; #endif diff --git a/src/Storages/ObjectStorage/DataLakes/DataLakeStorageSettings.h b/src/Storages/ObjectStorage/DataLakes/DataLakeStorageSettings.h index 560b4fb88ebc..59ece70828ae 100644 --- a/src/Storages/ObjectStorage/DataLakes/DataLakeStorageSettings.h +++ b/src/Storages/ObjectStorage/DataLakes/DataLakeStorageSettings.h @@ -60,6 +60,9 @@ The period in milliseconds to asynchronously prefetch the latest metadata snapsh )", 0) \ DECLARE(Bool, iceberg_use_version_hint, false, R"( Get latest metadata path from version-hint.text file. +)", 0) \ + DECLARE(String, object_storage_cluster, "", R"( +Cluster for distributed requests )", 0) \ DECLARE(NonZeroUInt64, iceberg_format_version, 2, R"( Metadata format version. diff --git a/src/Storages/ObjectStorage/DataLakes/DeltaLakeMetadataDeltaKernel.cpp b/src/Storages/ObjectStorage/DataLakes/DeltaLakeMetadataDeltaKernel.cpp index 201f2ca83f09..19d17b07dcc9 100644 --- a/src/Storages/ObjectStorage/DataLakes/DeltaLakeMetadataDeltaKernel.cpp +++ b/src/Storages/ObjectStorage/DataLakes/DeltaLakeMetadataDeltaKernel.cpp @@ -116,7 +116,7 @@ DeltaLakeMetadataDeltaKernel::DeltaLakeMetadataDeltaKernel( : log(getLogger("DeltaLakeMetadata")) , kernel_helper(DB::getKernelHelper(configuration_.lock(), object_storage_)) , object_storage(object_storage_) - , format_name(configuration_.lock()->format) + , format_name(configuration_.lock()->getFormat()) /// TODO: Supports size limit, not just elements limit. /// TODO: Support weight function (by default weight = 1 for all elements). /// TODO: Add a setting for cache size. 
@@ -642,8 +642,8 @@ SinkToStoragePtr DeltaLakeMetadataDeltaKernel::write( context, sample_block, format_settings, - configuration->format, - configuration->compression_method); + configuration->getFormat(), + configuration->getCompressionMethod()); } return std::make_shared( @@ -653,8 +653,8 @@ SinkToStoragePtr DeltaLakeMetadataDeltaKernel::write( context, sample_block, format_settings, - configuration->format, - configuration->compression_method); + configuration->getFormat(), + configuration->getCompressionMethod()); } void DeltaLakeMetadataDeltaKernel::logMetadataFiles(ContextPtr context) const diff --git a/src/Storages/ObjectStorage/DataLakes/HudiMetadata.cpp b/src/Storages/ObjectStorage/DataLakes/HudiMetadata.cpp index 81fd17be94ac..9869fd20c315 100644 --- a/src/Storages/ObjectStorage/DataLakes/HudiMetadata.cpp +++ b/src/Storages/ObjectStorage/DataLakes/HudiMetadata.cpp @@ -89,11 +89,11 @@ HudiMetadata::HudiMetadata(ObjectStoragePtr object_storage_, StorageObjectStorag : WithContext(context_) , object_storage(object_storage_) , table_path(configuration_->getPathForRead().path) - , format(configuration_->format) + , format(configuration_->getFormat()) { } -Strings HudiMetadata::getDataFiles(const ActionsDAG *) const +Strings HudiMetadata::getDataFiles() const { if (data_files.empty()) data_files = getDataFilesImpl(); @@ -101,13 +101,13 @@ Strings HudiMetadata::getDataFiles(const ActionsDAG *) const } ObjectIterator HudiMetadata::iterate( - const ActionsDAG * filter_dag, + const ActionsDAG * /* filter_dag */, FileProgressCallback callback, size_t /* list_batch_size */, StorageMetadataPtr /* storage_metadata_snapshot*/, ContextPtr /* context */) const { - return createKeysIterator(getDataFiles(filter_dag), object_storage, callback); + return createKeysIterator(getDataFiles(), object_storage, callback); } } diff --git a/src/Storages/ObjectStorage/DataLakes/HudiMetadata.h b/src/Storages/ObjectStorage/DataLakes/HudiMetadata.h index d2700f405fc8..b941a84a3747 100644 
--- a/src/Storages/ObjectStorage/DataLakes/HudiMetadata.h +++ b/src/Storages/ObjectStorage/DataLakes/HudiMetadata.h @@ -65,7 +65,7 @@ class HudiMetadata final : public IDataLakeMetadata, private WithContext mutable Strings data_files; Strings getDataFilesImpl() const; - Strings getDataFiles(const ActionsDAG * filter_dag) const; + Strings getDataFiles() const; }; } diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/ChunkPartitioner.cpp b/src/Storages/ObjectStorage/DataLakes/Iceberg/ChunkPartitioner.cpp index 26a2256fe5f8..c61c566c1744 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/ChunkPartitioner.cpp +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/ChunkPartitioner.cpp @@ -18,6 +18,7 @@ namespace DB namespace Setting { extern const SettingsUInt64 iceberg_insert_max_partitions; + extern const SettingsTimezone iceberg_partition_timezone; } namespace ErrorCodes @@ -53,7 +54,7 @@ ChunkPartitioner::ChunkPartitioner( auto & factory = FunctionFactory::instance(); - auto transform_and_argument = Iceberg::parseTransformAndArgument(transform_name); + auto transform_and_argument = Iceberg::parseTransformAndArgument(transform_name, context->getSettingsRef()[Setting::iceberg_partition_timezone]); if (!transform_and_argument) throw Exception(ErrorCodes::BAD_ARGUMENTS, "Unknown transform {}", transform_name); @@ -67,6 +68,7 @@ ChunkPartitioner::ChunkPartitioner( result_data_types.push_back(function->getReturnType(columns_for_function)); functions.push_back(function); function_params.push_back(transform_and_argument->argument); + function_time_zones.push_back(transform_and_argument->time_zone); columns_to_apply.push_back(column_name); } } @@ -105,6 +107,14 @@ ChunkPartitioner::partitionChunk(const Chunk & chunk) arguments.push_back(ColumnWithTypeAndName(const_column->clone(), type, "#")); } arguments.push_back(name_to_column[columns_to_apply[transform_ind]]); + if (function_time_zones[transform_ind].has_value()) + { + auto type = std::make_shared(); + auto 
column_value = ColumnString::create(); + column_value->insert(*function_time_zones[transform_ind]); + auto const_column = ColumnConst::create(std::move(column_value), chunk.getNumRows()); + arguments.push_back(ColumnWithTypeAndName(const_column->clone(), type, "PartitioningTimezone")); + } auto result = functions[transform_ind]->build(arguments)->execute(arguments, std::make_shared(), chunk.getNumRows(), false); functions_columns.push_back(result); diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/ChunkPartitioner.h b/src/Storages/ObjectStorage/DataLakes/Iceberg/ChunkPartitioner.h index 4c42e037174a..f77b27a1b15e 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/ChunkPartitioner.h +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/ChunkPartitioner.h @@ -39,6 +39,7 @@ class ChunkPartitioner std::vector functions; std::vector> function_params; + std::vector> function_time_zones; std::vector columns_to_apply; std::vector result_data_types; diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp b/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp index 88207f8f253b..4b1fe96eb30c 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp @@ -180,13 +180,14 @@ Iceberg::PersistentTableComponents IcebergMetadata::initializePersistentTableCom } } return PersistentTableComponents{ - .schema_processor = std::make_shared(), + .schema_processor = std::make_shared(context_), .metadata_cache = cache_ptr, .format_version = format_version, .table_location = table_location, .metadata_compression_method = compression_method, .table_path = configuration->getPathForRead().path, .table_uuid = table_uuid, + .common_namespace = configuration->getNamespace(), }; } @@ -213,7 +214,7 @@ IcebergMetadata::IcebergMetadata( , object_storage(std::move(object_storage_)) , persistent_components(initializePersistentTableComponents(configuration_, cache_ptr, 
context_)) , data_lake_settings(configuration_->getDataLakeSettings()) - , write_format(configuration_->format) + , write_format(configuration_->getFormat()) { /// TODO: for now it's okay to start/stop the task via constructor/destructor. Once refactored, we'd need to plumb startup/shutdown and schedule the task from there if (cache_ptr && data_lake_settings[DataLakeStorageSetting::iceberg_metadata_async_prefetch_period_ms] != 0) @@ -288,13 +289,16 @@ void IcebergMetadata::backgroundMetadataPrefetcherThread() } Int32 IcebergMetadata::parseTableSchema( - const Poco::JSON::Object::Ptr & metadata_object, IcebergSchemaProcessor & schema_processor, LoggerPtr metadata_logger) + const Poco::JSON::Object::Ptr & metadata_object, + IcebergSchemaProcessor & schema_processor, + ContextPtr context_, + LoggerPtr metadata_logger) { const auto format_version = metadata_object->getValue(f_format_version); if (format_version == 2) { auto [schema, current_schema_id] = parseTableSchemaV2Method(metadata_object); - schema_processor.addIcebergTableSchema(schema); + schema_processor.addIcebergTableSchema(schema, context_); return current_schema_id; } else @@ -302,7 +306,7 @@ Int32 IcebergMetadata::parseTableSchema( try { auto [schema, current_schema_id] = parseTableSchemaV1Method(metadata_object); - schema_processor.addIcebergTableSchema(schema); + schema_processor.addIcebergTableSchema(schema, context_); return current_schema_id; } catch (const Exception & first_error) @@ -312,7 +316,7 @@ Int32 IcebergMetadata::parseTableSchema( try { auto [schema, current_schema_id] = parseTableSchemaV2Method(metadata_object); - schema_processor.addIcebergTableSchema(schema); + schema_processor.addIcebergTableSchema(schema, context_); LOG_WARNING( metadata_logger, "Iceberg table schema was parsed using v2 specification, but it was impossible to parse it using v1 " @@ -336,7 +340,10 @@ Int32 IcebergMetadata::parseTableSchema( } Poco::JSON::Object::Ptr traverseMetadataAndFindNecessarySnapshotObject( - 
Poco::JSON::Object::Ptr metadata_object, Int64 snapshot_id, IcebergSchemaProcessorPtr schema_processor) + Poco::JSON::Object::Ptr metadata_object, + Int64 snapshot_id, + IcebergSchemaProcessorPtr schema_processor, + ContextPtr local_context) { if (!metadata_object->has(f_snapshots)) throw Exception(ErrorCodes::ICEBERG_SPECIFICATION_VIOLATION, "No snapshot set found in metadata for iceberg file"); @@ -344,7 +351,7 @@ Poco::JSON::Object::Ptr traverseMetadataAndFindNecessarySnapshotObject( for (UInt32 j = 0; j < schemas->size(); ++j) { auto schema = schemas->getObject(j); - schema_processor->addIcebergTableSchema(schema); + schema_processor->addIcebergTableSchema(schema, local_context); } Poco::JSON::Object::Ptr current_snapshot = nullptr; auto snapshots = metadata_object->get(f_snapshots).extract(); @@ -413,7 +420,11 @@ IcebergDataSnapshotPtr IcebergMetadata::createIcebergDataSnapshotFromSnapshotJSO IcebergDataSnapshotPtr IcebergMetadata::getIcebergDataSnapshot(Poco::JSON::Object::Ptr metadata_object, Int64 snapshot_id, ContextPtr local_context) const { - auto object = traverseMetadataAndFindNecessarySnapshotObject(metadata_object, snapshot_id, persistent_components.schema_processor); + auto object = traverseMetadataAndFindNecessarySnapshotObject( + metadata_object, + snapshot_id, + persistent_components.schema_processor, + local_context); if (!object) throw Exception(ErrorCodes::ICEBERG_SPECIFICATION_VIOLATION, "No snapshot found for id `{}`", snapshot_id); @@ -498,7 +509,7 @@ IcebergMetadata::getStateImpl(const ContextPtr & local_context, Poco::JSON::Obje } else { - auto schema_id = parseTableSchema(metadata_object, *persistent_components.schema_processor, log); + auto schema_id = parseTableSchema(metadata_object, *persistent_components.schema_processor, local_context, log); if (!metadata_object->has(f_current_snapshot_id)) { return {nullptr, schema_id}; @@ -521,9 +532,10 @@ IcebergMetadata::getState(const ContextPtr & local_context, const String & metad auto 
metadata_object = getMetadataJSONObject( metadata_path, object_storage, persistent_components.metadata_cache, local_context, log, persistent_components.metadata_compression_method, persistent_components.table_uuid); + auto dump_metadata = [&]()->String { return dumpMetadataObjectToString(metadata_object); }; insertRowToLogTable( local_context, - dumpMetadataObjectToString(metadata_object), + dump_metadata, DB::IcebergMetadataLogLevel::Metadata, persistent_components.table_path, metadata_path, @@ -551,14 +563,16 @@ std::shared_ptr IcebergMetadata::getInitialSchemaByPath(Conte : nullptr; } -std::shared_ptr IcebergMetadata::getSchemaTransformer(ContextPtr, ObjectInfoPtr object_info) const +std::shared_ptr IcebergMetadata::getSchemaTransformer(ContextPtr context_, ObjectInfoPtr object_info) const { IcebergDataObjectInfo * iceberg_object_info = dynamic_cast(object_info.get()); if (!iceberg_object_info) return nullptr; return (iceberg_object_info->info.underlying_format_read_schema_id != iceberg_object_info->info.schema_id_relevant_to_iterator) ? 
persistent_components.schema_processor->getSchemaTransformationDagByIds( - iceberg_object_info->info.underlying_format_read_schema_id, iceberg_object_info->info.schema_id_relevant_to_iterator) + context_, + iceberg_object_info->info.underlying_format_read_schema_id, + iceberg_object_info->info.schema_id_relevant_to_iterator) : nullptr; } @@ -805,7 +819,7 @@ Iceberg::IcebergDataSnapshotPtr IcebergMetadata::getRelevantDataSnapshotFromTabl if (!table_state_snapshot.snapshot_id.has_value()) return nullptr; Poco::JSON::Object::Ptr snapshot_object = traverseMetadataAndFindNecessarySnapshotObject( - metadata_object, *table_state_snapshot.snapshot_id, persistent_components.schema_processor); + metadata_object, *table_state_snapshot.snapshot_id, persistent_components.schema_processor, local_context); return createIcebergDataSnapshotFromSnapshotJSON(snapshot_object, *table_state_snapshot.snapshot_id, local_context); } diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.h b/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.h index f53ebd0d2659..394cc2d62b84 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.h +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.h @@ -78,7 +78,10 @@ class IcebergMetadata : public IDataLakeMetadata std::shared_ptr getSchemaTransformer(ContextPtr local_context, ObjectInfoPtr object_info) const override; static Int32 parseTableSchema( - const Poco::JSON::Object::Ptr & metadata_object, Iceberg::IcebergSchemaProcessor & schema_processor, LoggerPtr metadata_logger); + const Poco::JSON::Object::Ptr & metadata_object, + Iceberg::IcebergSchemaProcessor & schema_processor, + ContextPtr context_, + LoggerPtr metadata_logger); bool supportsUpdate() const override { return true; } bool supportsWrites() const override { return true; } diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp b/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp index 
bd557f8c6197..4f4acb4974bc 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp @@ -632,7 +632,7 @@ IcebergStorageSink::IcebergStorageSink( , table_id(table_id_) , persistent_table_components(persistent_table_components_) , data_lake_settings(configuration_->getDataLakeSettings()) - , write_format(configuration_->format) + , write_format(configuration_->getFormat()) , blob_storage_type_name(configuration_->getTypeName()) , blob_storage_namespace_name(configuration_->getNamespace()) { diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFileIterator.cpp b/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFileIterator.cpp index 9142b458d816..e87819d2e73b 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFileIterator.cpp +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFileIterator.cpp @@ -6,6 +6,7 @@ #include #include +#include #include #include @@ -14,6 +15,7 @@ #include #include +#include #include #include #include @@ -34,6 +36,11 @@ namespace DB::ErrorCodes extern const int BAD_ARGUMENTS; } +namespace DB::Setting +{ + extern const SettingsTimezone iceberg_partition_timezone; +} + namespace ProfileEvents { extern const Event IcebergPartitionPrunedFiles; @@ -231,9 +238,10 @@ std::shared_ptr ManifestFileIterator::create( std::shared_ptr filter_dag_, Int32 table_snapshot_schema_id_) { + auto dump_metadata = [&]()->String { return manifest_file_deserializer_->getMetadataContent(); }; insertRowToLogTable( context_, - manifest_file_deserializer_->getMetadataContent(), + dump_metadata, DB::IcebergMetadataLogLevel::ManifestFileMetadata, common_path_, path_to_manifest_file_, @@ -278,7 +286,7 @@ std::shared_ptr ManifestFileIterator::create( const Poco::JSON::Object::Ptr & schema_object = json.extract(); Int32 manifest_schema_id = schema_object->getValue(f_schema_id); - schema_processor.addIcebergTableSchema(schema_object); + 
schema_processor.addIcebergTableSchema(schema_object, context_); PartitionSpecification partition_spec_vec; for (size_t i = 0; i != partition_specification->size(); ++i) @@ -296,7 +304,7 @@ std::shared_ptr ManifestFileIterator::create( auto transform_name = partition_specification_field->getValue(f_partition_transform); auto partition_name = partition_specification_field->getValue(f_partition_name); partition_spec_vec.emplace_back(source_id, transform_name, partition_name); - auto partition_ast = getASTFromTransform(transform_name, numeric_column_name); + auto partition_ast = getASTFromTransform(transform_name, numeric_column_name, context_->getSettingsRef()[Setting::iceberg_partition_timezone]); /// Unsupported partition key expression if (partition_ast == nullptr) continue; @@ -383,9 +391,10 @@ ProcessedManifestFileEntryPtr ManifestFileIterator::processRow(size_t row_index) if (parsed_entry->status == ManifestEntryStatus::DELETED) { + auto dump_metadata = [&]()->String { return manifest_file_deserializer->getContent(row_index); }; insertRowToLogTable( context, - manifest_file_deserializer->getContent(row_index), + dump_metadata, DB::IcebergMetadataLogLevel::ManifestFileEntry, common_path, path_to_manifest_file, @@ -497,9 +506,10 @@ ProcessedManifestFileEntryPtr ManifestFileIterator::processRow(size_t row_index) const ManifestFilesPruner * current_pruner = getOrCreatePruner(entry->resolved_schema_id); pruning_status = current_pruner->canBePruned(entry, hyperrectangles); } + auto dump_metadata = [&]()->String { return manifest_file_deserializer->getContent(row_index); }; insertRowToLogTable( context, - manifest_file_deserializer->getContent(row_index), + dump_metadata, DB::IcebergMetadataLogLevel::ManifestFileEntry, common_path, path_to_manifest_file, diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFilesPruning.cpp b/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFilesPruning.cpp index 3e85798df6fa..8d5f11271529 100644 --- 
a/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFilesPruning.cpp +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFilesPruning.cpp @@ -27,9 +27,9 @@ using namespace DB; namespace DB::Iceberg { -DB::ASTPtr getASTFromTransform(const String & transform_name_src, const String & column_name) +DB::ASTPtr getASTFromTransform(const String & transform_name_src, const String & column_name, const String & time_zone) { - auto transform_and_argument = parseTransformAndArgument(transform_name_src); + auto transform_and_argument = parseTransformAndArgument(transform_name_src, time_zone); if (!transform_and_argument) { LOG_WARNING(&Poco::Logger::get("Iceberg Partition Pruning"), "Cannot parse iceberg transform name: {}.", transform_name_src); @@ -48,6 +48,13 @@ DB::ASTPtr getASTFromTransform(const String & transform_name_src, const String & return makeASTFunction( transform_and_argument->transform_name, make_intrusive(*transform_and_argument->argument), make_intrusive(column_name)); } + if (transform_and_argument->time_zone) + { + return makeASTFunction( + transform_and_argument->transform_name, + make_intrusive(column_name), + make_intrusive(*transform_and_argument->time_zone)); + } return makeASTFunction(transform_and_argument->transform_name, make_intrusive(column_name)); } diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFilesPruning.h b/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFilesPruning.h index daa91a6a0bb3..422e514bb99c 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFilesPruning.h +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFilesPruning.h @@ -31,7 +31,7 @@ namespace DB::Iceberg struct ProcessedManifestFileEntry; class ManifestFileIterator; -DB::ASTPtr getASTFromTransform(const String & transform_name_src, const String & column_name); +DB::ASTPtr getASTFromTransform(const String & transform_name_src, const String & column_name, const String & time_zone); /// Prune specific data files based on 
manifest content class ManifestFilesPruner diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/PersistentTableComponents.h b/src/Storages/ObjectStorage/DataLakes/Iceberg/PersistentTableComponents.h index 7a1aa97afc61..376e2ef536d2 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/PersistentTableComponents.h +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/PersistentTableComponents.h @@ -20,6 +20,7 @@ struct PersistentTableComponents const CompressionMethod metadata_compression_method; const String table_path; const std::optional table_uuid; + const String common_namespace; }; } diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/SchemaProcessor.cpp b/src/Storages/ObjectStorage/DataLakes/Iceberg/SchemaProcessor.cpp index f97dbe9c1e94..4efa9b46615a 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/SchemaProcessor.cpp +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/SchemaProcessor.cpp @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -32,6 +33,8 @@ #include #include #include +#include +#include #include @@ -46,6 +49,10 @@ extern const int LOGICAL_ERROR; extern const int BAD_ARGUMENTS; } +namespace Setting +{ +extern const SettingsTimezone iceberg_timezone_for_timestamptz; +} namespace { @@ -144,7 +151,7 @@ namespace Iceberg std::string IcebergSchemaProcessor::default_link{}; -void IcebergSchemaProcessor::addIcebergTableSchema(Poco::JSON::Object::Ptr schema_ptr) +void IcebergSchemaProcessor::addIcebergTableSchema(Poco::JSON::Object::Ptr schema_ptr, ContextPtr context_) { std::lock_guard lock(mutex); @@ -167,7 +174,7 @@ void IcebergSchemaProcessor::addIcebergTableSchema(Poco::JSON::Object::Ptr schem auto name = field->getValue(f_name); bool required = field->getValue(f_required); current_full_name = name; - auto type = getFieldType(field, f_type, required, current_full_name, true); + auto type = getFieldType(field, f_type, context_, required, current_full_name, true); 
clickhouse_schema->push_back(NameAndTypePair{name, type}); clickhouse_types_by_source_ids[{schema_id, field->getValue(f_id)}] = NameAndTypePair{current_full_name, type}; clickhouse_ids_by_source_names[{schema_id, current_full_name}] = field->getValue(f_id); @@ -221,7 +228,7 @@ NamesAndTypesList IcebergSchemaProcessor::tryGetFieldsCharacteristics(Int32 sche return fields; } -DataTypePtr IcebergSchemaProcessor::getSimpleType(const String & type_name) +DataTypePtr IcebergSchemaProcessor::getSimpleType(const String & type_name, ContextPtr context_) { if (type_name == f_boolean) return DataTypeFactory::instance().get("Bool"); @@ -240,7 +247,10 @@ DataTypePtr IcebergSchemaProcessor::getSimpleType(const String & type_name) if (type_name == f_timestamp) return std::make_shared(6); if (type_name == f_timestamptz) - return std::make_shared(6, "UTC"); + { + std::string timezone = context_->getSettingsRef()[Setting::iceberg_timezone_for_timestamptz]; + return std::make_shared(6, timezone); + } if (type_name == f_string || type_name == f_binary) return std::make_shared(); if (type_name == f_uuid) @@ -265,21 +275,25 @@ DataTypePtr IcebergSchemaProcessor::getSimpleType(const String & type_name) } DataTypePtr -IcebergSchemaProcessor::getComplexTypeFromObject(const Poco::JSON::Object::Ptr & type, String & current_full_name, bool is_subfield_of_root) +IcebergSchemaProcessor::getComplexTypeFromObject( + const Poco::JSON::Object::Ptr & type, + String & current_full_name, + ContextPtr context_, + bool is_subfield_of_root) { String type_name = type->getValue(f_type); if (type_name == f_list) { bool element_required = type->getValue("element-required"); - auto element_type = getFieldType(type, f_element, element_required); + auto element_type = getFieldType(type, f_element, context_, element_required); return std::make_shared(element_type); } if (type_name == f_map) { - auto key_type = getFieldType(type, f_key, true); + auto key_type = getFieldType(type, f_key, context_, true); auto 
value_required = type->getValue("value-required"); - auto value_type = getFieldType(type, f_value, value_required); + auto value_type = getFieldType(type, f_value, context_, value_required); return std::make_shared(key_type, value_type); } @@ -303,7 +317,7 @@ IcebergSchemaProcessor::getComplexTypeFromObject(const Poco::JSON::Object::Ptr & (current_full_name += ".").append(element_names.back()); scope_guard guard([&] { current_full_name.resize(current_full_name.size() - element_names.back().size() - 1); }); - element_types.push_back(getFieldType(field, f_type, required, current_full_name, true)); + element_types.push_back(getFieldType(field, f_type, context_, required, current_full_name, true)); TSA_SUPPRESS_WARNING_FOR_WRITE(clickhouse_types_by_source_ids) [{schema_id, field->getValue(f_id)}] = NameAndTypePair{current_full_name, element_types.back()}; @@ -312,7 +326,7 @@ IcebergSchemaProcessor::getComplexTypeFromObject(const Poco::JSON::Object::Ptr & } else { - element_types.push_back(getFieldType(field, f_type, required)); + element_types.push_back(getFieldType(field, f_type, context_, required)); } } @@ -323,16 +337,21 @@ IcebergSchemaProcessor::getComplexTypeFromObject(const Poco::JSON::Object::Ptr & } DataTypePtr IcebergSchemaProcessor::getFieldType( - const Poco::JSON::Object::Ptr & field, const String & type_key, bool required, String & current_full_name, bool is_subfield_of_root) + const Poco::JSON::Object::Ptr & field, + const String & type_key, + ContextPtr context_, + bool required, + String & current_full_name, + bool is_subfield_of_root) { if (field->isObject(type_key)) - return getComplexTypeFromObject(field->getObject(type_key), current_full_name, is_subfield_of_root); + return getComplexTypeFromObject(field->getObject(type_key), current_full_name, context_, is_subfield_of_root); auto type = field->get(type_key); if (type.isString()) { const String & type_name = type.extract(); - auto data_type = getSimpleType(type_name); + auto data_type = 
getSimpleType(type_name, context_); return required ? data_type : makeNullable(data_type); } @@ -362,7 +381,11 @@ bool IcebergSchemaProcessor::allowPrimitiveTypeConversion(const String & old_typ // Ids are passed only for error logging purposes std::shared_ptr IcebergSchemaProcessor::getSchemaTransformationDag( - const Poco::JSON::Object::Ptr & old_schema, const Poco::JSON::Object::Ptr & new_schema, Int32 old_id, Int32 new_id) + const Poco::JSON::Object::Ptr & old_schema, + const Poco::JSON::Object::Ptr & new_schema, + ContextPtr context_, + Int32 old_id, + Int32 new_id) { std::unordered_map> old_schema_entries; auto old_schema_fields = old_schema->get(f_fields).extract(); @@ -374,7 +397,7 @@ std::shared_ptr IcebergSchemaProcessor::getSchemaTransformationDag( size_t id = field->getValue(f_id); auto name = field->getValue(f_name); bool required = field->getValue(f_required); - old_schema_entries[id] = {field, &dag->addInput(name, getFieldType(field, f_type, required))}; + old_schema_entries[id] = {field, &dag->addInput(name, getFieldType(field, f_type, context_, required))}; } auto new_schema_fields = new_schema->get(f_fields).extract(); for (size_t i = 0; i != new_schema_fields->size(); ++i) @@ -383,7 +406,7 @@ std::shared_ptr IcebergSchemaProcessor::getSchemaTransformationDag( size_t id = field->getValue(f_id); auto name = field->getValue(f_name); bool required = field->getValue(f_required); - auto type = getFieldType(field, f_type, required); + auto type = getFieldType(field, f_type, context_, required); auto old_node_it = old_schema_entries.find(id); if (old_node_it != old_schema_entries.end()) { @@ -393,7 +416,7 @@ std::shared_ptr IcebergSchemaProcessor::getSchemaTransformationDag( || field->getObject(f_type)->getValue(f_type) == "list" || field->getObject(f_type)->getValue(f_type) == "map")) { - auto old_type = getFieldType(old_json, "type", required); + auto old_type = getFieldType(old_json, "type", context_, required); auto transform = 
std::make_shared(std::vector{type}, std::vector{old_type}, old_json, field); old_node = &dag->addFunction(transform, std::vector{old_node}, name); @@ -423,7 +446,7 @@ std::shared_ptr IcebergSchemaProcessor::getSchemaTransformationDag( } else if (allowPrimitiveTypeConversion(old_type, new_type)) { - node = &dag->addCast(*old_node, getFieldType(field, f_type, required), name, nullptr); + node = &dag->addCast(*old_node, getFieldType(field, f_type, context_, required), name, nullptr); } outputs.push_back(node); } @@ -449,7 +472,10 @@ std::shared_ptr IcebergSchemaProcessor::getSchemaTransformationDag( return dag; } -std::shared_ptr IcebergSchemaProcessor::getSchemaTransformationDagByIds(Int32 old_id, Int32 new_id) +std::shared_ptr IcebergSchemaProcessor::getSchemaTransformationDagByIds( + ContextPtr context_, + Int32 old_id, + Int32 new_id) { if (old_id == new_id) return nullptr; @@ -468,7 +494,7 @@ std::shared_ptr IcebergSchemaProcessor::getSchemaTransformatio throw Exception(ErrorCodes::BAD_ARGUMENTS, "Schema with schema-id {} is unknown", new_id); return transform_dags_by_ids[{old_id, new_id}] - = getSchemaTransformationDag(old_schema_it->second, new_schema_it->second, old_id, new_id); + = getSchemaTransformationDag(old_schema_it->second, new_schema_it->second, context_, old_id, new_id); } Poco::JSON::Object::Ptr IcebergSchemaProcessor::getIcebergTableSchemaById(Int32 id) const diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/SchemaProcessor.h b/src/Storages/ObjectStorage/DataLakes/Iceberg/SchemaProcessor.h index 93601ec35f14..66034639aaa2 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/SchemaProcessor.h +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/SchemaProcessor.h @@ -74,16 +74,18 @@ ColumnMapperPtr createColumnMapper(Poco::JSON::Object::Ptr schema_object); * } * } */ -class IcebergSchemaProcessor +class IcebergSchemaProcessor : private WithContext { static std::string default_link; using Node = ActionsDAG::Node; public: - void 
addIcebergTableSchema(Poco::JSON::Object::Ptr schema_ptr); + explicit IcebergSchemaProcessor(ContextPtr context_) : WithContext(context_) {} + + void addIcebergTableSchema(Poco::JSON::Object::Ptr schema_ptr, ContextPtr context_); std::shared_ptr getClickhouseTableSchemaById(Int32 id); - std::shared_ptr getSchemaTransformationDagByIds(Int32 old_id, Int32 new_id); + std::shared_ptr getSchemaTransformationDagByIds(ContextPtr context_, Int32 old_id, Int32 new_id); NameAndTypePair getFieldCharacteristics(Int32 schema_version, Int32 source_id) const; std::optional tryGetFieldCharacteristics(Int32 schema_version, Int32 source_id) const; NamesAndTypesList tryGetFieldsCharacteristics(Int32 schema_id, const std::vector & source_ids) const; @@ -91,7 +93,7 @@ class IcebergSchemaProcessor Poco::JSON::Object::Ptr getIcebergTableSchemaById(Int32 id) const; bool hasClickhouseTableSchemaById(Int32 id) const; - static DataTypePtr getSimpleType(const String & type_name); + static DataTypePtr getSimpleType(const String & type_name, ContextPtr context_); static std::unordered_map traverseSchema(Poco::JSON::Array::Ptr schema); @@ -111,10 +113,15 @@ class IcebergSchemaProcessor std::unordered_map schema_id_by_snapshot TSA_GUARDED_BY(mutex); NamesAndTypesList getSchemaType(const Poco::JSON::Object::Ptr & schema); - DataTypePtr getComplexTypeFromObject(const Poco::JSON::Object::Ptr & type, String & current_full_name, bool is_subfield_of_root); + DataTypePtr getComplexTypeFromObject( + const Poco::JSON::Object::Ptr & type, + String & current_full_name, + ContextPtr context_, + bool is_subfield_of_root); DataTypePtr getFieldType( const Poco::JSON::Object::Ptr & field, const String & type_key, + ContextPtr context_, bool required, String & current_full_name = default_link, bool is_subfield_of_root = false); @@ -123,7 +130,11 @@ class IcebergSchemaProcessor const Node * getDefaultNodeForField(const Poco::JSON::Object::Ptr & field); std::shared_ptr getSchemaTransformationDag( - const 
Poco::JSON::Object::Ptr & old_schema, const Poco::JSON::Object::Ptr & new_schema, Int32 old_id, Int32 new_id); + const Poco::JSON::Object::Ptr & old_schema, + const Poco::JSON::Object::Ptr & new_schema, + ContextPtr context_, + Int32 old_id, + Int32 new_id); mutable SharedMutex mutex; }; diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/StatelessMetadataFileGetter.cpp b/src/Storages/ObjectStorage/DataLakes/Iceberg/StatelessMetadataFileGetter.cpp index 80b61b2fa728..c55e215cc95d 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/StatelessMetadataFileGetter.cpp +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/StatelessMetadataFileGetter.cpp @@ -165,9 +165,10 @@ ManifestFileCacheKeys getManifestList( ManifestFileCacheKeys manifest_file_cache_keys; + auto dump_metadata = [&]()->String { return manifest_list_deserializer.getMetadataContent(); }; insertRowToLogTable( local_context, - manifest_list_deserializer.getMetadataContent(), + dump_metadata, DB::IcebergMetadataLogLevel::ManifestListMetadata, persistent_table_components.table_path, filename, @@ -210,9 +211,10 @@ ManifestFileCacheKeys getManifestList( manifest_file_cache_keys.emplace_back( manifest_file_name, manifest_length, added_sequence_number, added_snapshot_id.safeGet(), content_type); + auto dump_row_metadata = [&]()->String { return manifest_list_deserializer.getContent(i); }; insertRowToLogTable( local_context, - manifest_list_deserializer.getContent(i), + dump_row_metadata, DB::IcebergMetadataLogLevel::ManifestListEntry, persistent_table_components.table_path, filename, diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp b/src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp index cb58006b471c..1a92d923be29 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp @@ -54,6 +54,7 @@ #include #include +#include using namespace DB; @@ -81,12 +82,15 @@ namespace DB::DataLakeStorageSetting namespace 
ProfileEvents { extern const Event IcebergVersionHintUsed; + extern const Event IcebergJsonFileParsing; + extern const Event IcebergJsonFileParsingMicroseconds; } namespace DB::Setting { extern const SettingsUInt64 iceberg_metadata_staleness_ms; extern const SettingsUInt64 output_format_compression_level; + extern const SettingsTimezone iceberg_partition_timezone; } /// Hard to imagine a hint file larger than 10 MB @@ -135,8 +139,9 @@ static Iceberg::MetadataFileWithInfo getMetadataFileAndVersion(const std::string } String version_str; /// v.metadata.json + /// v-.metadata.json - generated by FileNamesGenerator::generateMetadataName with use_uuid_in_metadata flag if (file_name.starts_with('v')) - version_str = String(file_name.begin() + 1, file_name.begin() + file_name.find_first_of('.')); + version_str = String(file_name.begin() + 1, file_name.begin() + file_name.find_first_of(".-")); /// -.metadata.json else version_str = String(file_name.begin(), file_name.begin() + file_name.find_first_of('-')); @@ -249,27 +254,31 @@ bool writeMetadataFileAndVersionHint( } -std::optional parseTransformAndArgument(const String & transform_name_src) +std::optional parseTransformAndArgument(const String & transform_name_src, const String & time_zone) { std::string transform_name = Poco::toLower(transform_name_src); + std::optional time_zone_opt; + if (!time_zone.empty()) + time_zone_opt = time_zone; + if (transform_name == "year" || transform_name == "years") - return TransformAndArgument{"toYearNumSinceEpoch", std::nullopt}; + return TransformAndArgument{"toYearNumSinceEpoch", std::nullopt, time_zone_opt}; if (transform_name == "month" || transform_name == "months") - return TransformAndArgument{"toMonthNumSinceEpoch", std::nullopt}; + return TransformAndArgument{"toMonthNumSinceEpoch", std::nullopt, time_zone_opt}; if (transform_name == "day" || transform_name == "date" || transform_name == "days" || transform_name == "dates") - return TransformAndArgument{"toRelativeDayNum", 
std::nullopt}; + return TransformAndArgument{"toRelativeDayNum", std::nullopt, time_zone_opt}; if (transform_name == "hour" || transform_name == "hours") - return TransformAndArgument{"toRelativeHourNum", std::nullopt}; + return TransformAndArgument{"toRelativeHourNum", std::nullopt, time_zone_opt}; if (transform_name == "identity") - return TransformAndArgument{"identity", std::nullopt}; + return TransformAndArgument{"identity", std::nullopt, std::nullopt}; if (transform_name == "void") - return TransformAndArgument{"tuple", std::nullopt}; + return TransformAndArgument{"tuple", std::nullopt, std::nullopt}; if (transform_name.starts_with("truncate") || transform_name.starts_with("bucket")) { @@ -293,11 +302,11 @@ std::optional parseTransformAndArgument(const String & tra if (transform_name.starts_with("truncate")) { - return TransformAndArgument{"icebergTruncate", argument}; + return TransformAndArgument{"icebergTruncate", argument, std::nullopt}; } else if (transform_name.starts_with("bucket")) { - return TransformAndArgument{"icebergBucket", argument}; + return TransformAndArgument{"icebergBucket", argument, std::nullopt}; } } return std::nullopt; @@ -433,6 +442,9 @@ Poco::JSON::Object::Ptr getMetadataJSONObject( return json_str; }; + ProfileEvents::increment(ProfileEvents::IcebergJsonFileParsing); + ProfileEventTimeIncrement watch(ProfileEvents::IcebergJsonFileParsingMicroseconds); + String metadata_json_str; if (metadata_cache && table_uuid.has_value()) metadata_json_str = metadata_cache->getOrSetTableMetadata( @@ -1284,7 +1296,8 @@ KeyDescription getSortingKeyDescriptionFromMetadata(Poco::JSON::Object::Ptr meta auto column_name = source_id_to_column_name[source_id]; int direction = field->getValue(f_direction) == "asc" ? 
1 : -1; auto iceberg_transform_name = field->getValue(f_transform); - auto clickhouse_transform_name = parseTransformAndArgument(iceberg_transform_name); + auto clickhouse_transform_name = parseTransformAndArgument(iceberg_transform_name, + local_context->getSettingsRef()[Setting::iceberg_partition_timezone]); String full_argument; if (clickhouse_transform_name->transform_name != "identity") { @@ -1293,7 +1306,10 @@ KeyDescription getSortingKeyDescriptionFromMetadata(Poco::JSON::Object::Ptr meta { full_argument += std::to_string(*clickhouse_transform_name->argument) + ", "; } - full_argument += column_name + ")"; + full_argument += column_name; + if (clickhouse_transform_name->time_zone) + full_argument += ", '" + *clickhouse_transform_name->time_zone + "'"; + full_argument += ")"; } else { diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h b/src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h index 486be68a597c..91c13b8d17a9 100644 --- a/src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h +++ b/src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h @@ -54,9 +54,14 @@ struct TransformAndArgument { String transform_name; std::optional argument; + /// When an Iceberg table is partitioned by time, the partition split can be computed in a different timezone + /// (UTC in most cases). This timezone can be set with the `iceberg_partition_timezone` setting; its value is stored in this member. + /// When an Iceberg partition condition is converted to a ClickHouse function in `parseTransformAndArgument`, + /// `time_zone` is added as the second argument to functions like `toRelativeDayNum`, `toYearNumSinceEpoch`, etc. 
+ std::optional time_zone; }; -std::optional parseTransformAndArgument(const String & transform_name_src); +std::optional parseTransformAndArgument(const String & transform_name_src, const String & time_zone); Poco::JSON::Object::Ptr getMetadataJSONObject( const String & metadata_file_path, diff --git a/src/Storages/ObjectStorage/HDFS/Configuration.cpp b/src/Storages/ObjectStorage/HDFS/Configuration.cpp index bc1b214e65e4..273f6cacf03d 100644 --- a/src/Storages/ObjectStorage/HDFS/Configuration.cpp +++ b/src/Storages/ObjectStorage/HDFS/Configuration.cpp @@ -236,6 +236,14 @@ void StorageHDFSConfiguration::addStructureAndFormatToArgsIfNeeded( { addStructureAndFormatToArgsIfNeededHDFS(args, structure_, format_, context, with_structure); } + +ASTPtr StorageHDFSConfiguration::createArgsWithAccessData() const +{ + auto arguments = make_intrusive(); + arguments->children.push_back(make_intrusive(url + path.path)); + return arguments; +} + } #endif diff --git a/src/Storages/ObjectStorage/HDFS/Configuration.h b/src/Storages/ObjectStorage/HDFS/Configuration.h index c52567f4cc72..79c23bbb81b9 100644 --- a/src/Storages/ObjectStorage/HDFS/Configuration.h +++ b/src/Storages/ObjectStorage/HDFS/Configuration.h @@ -81,6 +81,8 @@ class StorageHDFSConfiguration : public StorageObjectStorageConfiguration void addStructureAndFormatToArgsIfNeeded( ASTs & args, const String & structure_, const String & format_, ContextPtr context, bool with_structure) override; + ASTPtr createArgsWithAccessData() const override; + private: void initializeFromParsedArguments(const HDFSStorageParsedArguments & parsed_arguments); void setURL(const std::string & url_); diff --git a/src/Storages/ObjectStorage/Local/Configuration.cpp b/src/Storages/ObjectStorage/Local/Configuration.cpp index 2d88b97dfebf..1a8dc9c75551 100644 --- a/src/Storages/ObjectStorage/Local/Configuration.cpp +++ b/src/Storages/ObjectStorage/Local/Configuration.cpp @@ -126,4 +126,21 @@ void 
StorageLocalConfiguration::fromNamedCollection(const NamedCollection & coll initializeFromParsedArguments(parsed_arguments); paths = {path}; } + +ASTPtr StorageLocalConfiguration::createArgsWithAccessData() const +{ + auto arguments = make_intrusive(); + + arguments->children.push_back(make_intrusive(path.path)); + if (getFormat() != "auto") + arguments->children.push_back(make_intrusive(getFormat())); + if (getStructure() != "auto") + arguments->children.push_back(make_intrusive(getStructure())); + if (getCompressionMethod() != "auto") + arguments->children.push_back(make_intrusive(getCompressionMethod())); + + return arguments; +} + + } diff --git a/src/Storages/ObjectStorage/Local/Configuration.h b/src/Storages/ObjectStorage/Local/Configuration.h index f7033ea1e0b1..453ad5519b1f 100644 --- a/src/Storages/ObjectStorage/Local/Configuration.h +++ b/src/Storages/ObjectStorage/Local/Configuration.h @@ -82,6 +82,8 @@ class StorageLocalConfiguration : public StorageObjectStorageConfiguration void addStructureAndFormatToArgsIfNeeded(ASTs &, const String &, const String &, ContextPtr, bool) override { } + ASTPtr createArgsWithAccessData() const override; + protected: void fromAST(ASTs & args, ContextPtr context, bool with_structure) override; void fromDisk(const String & disk_name_, ASTs & args, ContextPtr context, bool with_structure) override; diff --git a/src/Storages/ObjectStorage/ReadBufferIterator.cpp b/src/Storages/ObjectStorage/ReadBufferIterator.cpp index 97d1c4730c12..7c653ca28a84 100644 --- a/src/Storages/ObjectStorage/ReadBufferIterator.cpp +++ b/src/Storages/ObjectStorage/ReadBufferIterator.cpp @@ -39,8 +39,8 @@ ReadBufferIterator::ReadBufferIterator( , read_keys(read_keys_) , prev_read_keys_size(read_keys_.size()) { - if (configuration->format != "auto") - format = configuration->format; + if (configuration->getFormat() != "auto") + format = configuration->getFormat(); } SchemaCache::Key ReadBufferIterator::getKeyForSchemaCache(const ObjectInfo & 
object_info, const String & format_name) const @@ -143,7 +143,7 @@ std::unique_ptr ReadBufferIterator::recreateLastReadBuffer() auto impl = createReadBuffer(current_object_info->relative_path_with_metadata, object_storage, context, getLogger("ReadBufferIterator")); - const auto compression_method = chooseCompressionMethod(current_object_info->getFileName(), configuration->compression_method); + const auto compression_method = chooseCompressionMethod(current_object_info->getFileName(), configuration->getCompressionMethod()); const auto zstd_window = static_cast(context->getSettingsRef()[Setting::zstd_window_log_max]); return wrapReadBufferWithCompressionMethod(std::move(impl), compression_method, zstd_window); @@ -260,13 +260,13 @@ ReadBufferIterator::Data ReadBufferIterator::next() using ObjectInfoInArchive = StorageObjectStorageSource::ArchiveIterator::ObjectInfoInArchive; if (const auto * object_info_in_archive = dynamic_cast(current_object_info.get())) { - compression_method = chooseCompressionMethod(filename, configuration->compression_method); + compression_method = chooseCompressionMethod(filename, configuration->getCompressionMethod()); const auto & archive_reader = object_info_in_archive->archive_reader; read_buf = archive_reader->readFile(object_info_in_archive->path_in_archive, /*throw_on_not_found=*/true); } else { - compression_method = chooseCompressionMethod(filename, configuration->compression_method); + compression_method = chooseCompressionMethod(filename, configuration->getCompressionMethod()); read_buf = createReadBuffer( current_object_info->relative_path_with_metadata, object_storage, getContext(), getLogger("ReadBufferIterator")); } diff --git a/src/Storages/ObjectStorage/S3/Configuration.cpp b/src/Storages/ObjectStorage/S3/Configuration.cpp index 7f3a80792991..8cbfb882163c 100644 --- a/src/Storages/ObjectStorage/S3/Configuration.cpp +++ b/src/Storages/ObjectStorage/S3/Configuration.cpp @@ -106,6 +106,7 @@ static const std::unordered_set 
optional_configuration_keys = "partition_strategy", "partition_columns_in_data_file", "storage_class_name", + "storage_type", /// Private configuration options "role_arn", /// for extra_credentials "role_session_name", /// for extra_credentials @@ -632,6 +633,7 @@ void S3StorageParsedArguments::fromAST(ASTs & args, ContextPtr context, bool wit compression_method = compression_method_value.value(); } + if (auto partition_strategy_value = getFromPositionOrKeyValue("partition_strategy", args, engine_args_to_idx, key_value_args); partition_strategy_value.has_value()) { @@ -1027,6 +1029,31 @@ void StorageS3Configuration::addStructureAndFormatToArgsIfNeeded( addStructureAndFormatToArgsIfNeededS3( args, structure_, format_, context, with_structure, S3StorageParsedArguments::getMaxNumberOfArguments(with_structure)); } + +ASTPtr StorageS3Configuration::createArgsWithAccessData() const +{ + auto arguments = make_intrusive(); + + arguments->children.push_back(make_intrusive(url.uri_str)); + if (s3_settings->auth_settings[S3AuthSetting::no_sign_request]) + { + arguments->children.push_back(make_intrusive("NOSIGN")); + } + else + { + arguments->children.push_back(make_intrusive(s3_settings->auth_settings[S3AuthSetting::access_key_id].value)); + arguments->children.push_back(make_intrusive(s3_settings->auth_settings[S3AuthSetting::secret_access_key].value)); + if (!s3_settings->auth_settings[S3AuthSetting::session_token].value.empty()) + arguments->children.push_back(make_intrusive(s3_settings->auth_settings[S3AuthSetting::session_token].value)); + if (getFormat() != "auto") + arguments->children.push_back(make_intrusive(getFormat())); + if (!getCompressionMethod().empty()) + arguments->children.push_back(make_intrusive(getCompressionMethod())); + } + + return arguments; +} + } #endif diff --git a/src/Storages/ObjectStorage/S3/Configuration.h b/src/Storages/ObjectStorage/S3/Configuration.h index d8624f241a93..0ebcb616dff2 100644 --- 
a/src/Storages/ObjectStorage/S3/Configuration.h +++ b/src/Storages/ObjectStorage/S3/Configuration.h @@ -142,6 +142,8 @@ class StorageS3Configuration : public StorageObjectStorageConfiguration ContextPtr context, bool with_structure) override; + ASTPtr createArgsWithAccessData() const override; + static bool collectCredentials(ASTPtr maybe_credentials, S3::S3AuthSettings & auth_settings_, ContextPtr local_context); S3::URI url; diff --git a/src/Storages/ObjectStorage/StorageObjectStorage.cpp b/src/Storages/ObjectStorage/StorageObjectStorage.cpp index 7a55d1ade4ad..5fb659ffd359 100644 --- a/src/Storages/ObjectStorage/StorageObjectStorage.cpp +++ b/src/Storages/ObjectStorage/StorageObjectStorage.cpp @@ -66,6 +66,11 @@ String StorageObjectStorage::getPathSample(ContextPtr context) if (context->getSettingsRef()[Setting::use_hive_partitioning]) local_distributed_processing = false; + const auto path = configuration->getRawPath(); + + if (!configuration->isArchive() && !path.hasGlobs() && !local_distributed_processing) + return path.path; + auto file_iterator = StorageObjectStorageSource::createFileIterator( configuration, query_settings, @@ -84,11 +89,6 @@ String StorageObjectStorage::getPathSample(ContextPtr context) /// not for actual data reading, so do not emit ProfileEvents. 
     file_iterator->setEmitProfileEvents(false);
-    const auto path = configuration->getRawPath();
-
-    if (!configuration->isArchive() && !path.hasGlobs() && !local_distributed_processing)
-        return path.path;
-
     if (auto file = file_iterator->next(0))
         return file->getPath();
     return "";
@@ -105,13 +105,15 @@ StorageObjectStorage::StorageObjectStorage(
     std::optional format_settings_,
     LoadingStrictnessLevel mode,
     std::shared_ptr catalog_,
-    bool if_not_exists_,
+    bool /*if_not_exists_*/,
     bool is_datalake_query,
     bool distributed_processing_,
     ASTPtr partition_by_,
-    ASTPtr order_by_,
+    ASTPtr /*order_by_*/,
     bool is_table_function_,
-    bool lazy_init)
+    bool lazy_init,
+    bool updated_configuration,
+    std::optional sample_path_)
     : IStorage(table_id_)
     , configuration(configuration_)
     , object_storage(object_storage_)
@@ -123,9 +125,9 @@ StorageObjectStorage::StorageObjectStorage(
     , storage_id(table_id_)
 {
     configuration->initPartitionStrategy(partition_by_, columns_in_table_or_function_definition, context);
-    const bool need_resolve_columns_or_format = columns_in_table_or_function_definition.empty() || (configuration->format == "auto");
+    const bool need_resolve_columns_or_format = columns_in_table_or_function_definition.empty() || (configuration->getFormat() == "auto");
     const bool need_resolve_sample_path = context->getSettingsRef()[Setting::use_hive_partitioning]
-        && !configuration->partition_strategy
+        && !configuration->getPartitionStrategy()
         && !configuration->isDataLakeConfiguration();
     const bool do_lazy_init = lazy_init && !need_resolve_columns_or_format && !need_resolve_sample_path;
@@ -136,24 +138,16 @@ StorageObjectStorage::StorageObjectStorage(
         is_datalake_query,
         columns_in_table_or_function_definition.toString(true));
     bool is_delta_lake_cdf = context->getSettingsRef()[Setting::delta_lake_snapshot_start_version] != -1
-        || context->getSettingsRef()[Setting::delta_lake_snapshot_start_version] != -1;
+        || context->getSettingsRef()[Setting::delta_lake_snapshot_end_version] != -1;
     if (!is_table_function && is_delta_lake_cdf)
     {
         throw Exception(ErrorCodes::BAD_ARGUMENTS, "Delta lake CDF is allowed only for deltaLake table function");
     }
-    if (!is_table_function && !columns_in_table_or_function_definition.empty() && !is_datalake_query && mode == LoadingStrictnessLevel::CREATE)
-    {
-        LOG_DEBUG(log, "Creating new storage with specified columns");
-        configuration->create(
-            object_storage, context, columns_in_table_or_function_definition, partition_by_, order_by_, if_not_exists_, catalog, storage_id);
-    }
-
-    bool updated_configuration = false;
     try
     {
-        if (!do_lazy_init)
+        if (!do_lazy_init && !updated_configuration)
         {
             if (is_table_function)
                 configuration->lazyInitializeIfNeeded(object_storage, context);
@@ -173,7 +167,7 @@ StorageObjectStorage::StorageObjectStorage(
         tryLogCurrentException(log, /*start of message = */ "", LogsLevel::warning);
     }
-    std::string sample_path;
+    std::string sample_path = sample_path_.value_or("");
     ColumnsDescription columns{columns_in_table_or_function_definition};
@@ -182,7 +176,7 @@ StorageObjectStorage::StorageObjectStorage(
     if (configuration->isDataLakeConfiguration())
         throw Exception(ErrorCodes::BAD_ARGUMENTS, "The _schema_hash placeholder is not supported for DataLake engines");
-    if (configuration->partition_strategy_type == PartitionStrategyFactory::StrategyType::HIVE)
+    if (configuration->getPartitionStrategyType() == PartitionStrategyFactory::StrategyType::HIVE)
         throw Exception(ErrorCodes::BAD_ARGUMENTS, "The _schema_hash placeholder is not supported with hive partition strategy");
     if (columns.empty())
@@ -192,7 +186,7 @@ StorageObjectStorage::StorageObjectStorage(
     }
     if (need_resolve_columns_or_format)
-        resolveSchemaAndFormat(columns, configuration->format, object_storage, configuration, format_settings, sample_path, context);
+        resolveSchemaAndFormat(columns, object_storage, configuration, format_settings, sample_path, context);
     else
         validateSupportedColumns(columns, *configuration);
@@ -200,7 +194,7 @@ StorageObjectStorage::StorageObjectStorage(
     /// FIXME: We need to call getPathSample() lazily on select
     /// in case it failed to be initialized in constructor.
-    if (updated_configuration && sample_path.empty() && need_resolve_sample_path && !configuration->partition_strategy)
+    if (updated_configuration && sample_path.empty() && need_resolve_sample_path && !configuration->getPartitionStrategy())
     {
         try
         {
@@ -232,7 +226,7 @@ StorageObjectStorage::StorageObjectStorage(
         sample_path);
     }
-    bool format_supports_prewhere = FormatFactory::instance().checkIfFormatSupportsPrewhere(configuration->format, context, format_settings);
+    bool format_supports_prewhere = FormatFactory::instance().checkIfFormatSupportsPrewhere(configuration->getFormat(), context, format_settings);
     /// TODO: Known problems with datalake prewhere:
     /// * If the iceberg table went through schema evolution, columns read from file may need to
@@ -288,14 +282,16 @@ StorageObjectStorage::StorageObjectStorage(
     metadata.setConstraints(constraints_);
     metadata.setComment(comment);
-    if (configuration->partition_strategy)
-        metadata.partition_key = configuration->partition_strategy->getPartitionKeyDescription();
+    if (configuration->getPartitionStrategy())
+    {
+        metadata.partition_key = configuration->getPartitionStrategy()->getPartitionKeyDescription();
+    }
     setVirtuals(VirtualColumnUtils::getVirtualsForFileLikeStorage(
         metadata.columns,
         context,
         format_settings,
-        configuration->partition_strategy_type,
+        configuration->getPartitionStrategyType(),
         sample_path));
     setInMemoryMetadata(metadata);
@@ -308,17 +304,17 @@ String StorageObjectStorage::getName() const
 bool StorageObjectStorage::prefersLargeBlocks() const
 {
-    return FormatFactory::instance().checkIfOutputFormatPrefersLargeBlocks(configuration->format);
+    return FormatFactory::instance().checkIfOutputFormatPrefersLargeBlocks(configuration->getFormat());
 }
 bool StorageObjectStorage::parallelizeOutputAfterReading(ContextPtr context) const
 {
-    return FormatFactory::instance().checkParallelizeOutputAfterReading(configuration->format, context);
+    return FormatFactory::instance().checkParallelizeOutputAfterReading(configuration->getFormat(), context);
 }
 bool StorageObjectStorage::supportsSubsetOfColumns(const ContextPtr & context) const
 {
-    return FormatFactory::instance().checkIfFormatSupportsSubsetOfColumns(configuration->format, context, format_settings);
+    return FormatFactory::instance().checkIfFormatSupportsSubsetOfColumns(configuration->getFormat(), context, format_settings);
 }
 bool StorageObjectStorage::supportsPrewhere() const
@@ -429,8 +425,7 @@ void StorageObjectStorage::read(
         configuration->update(object_storage, local_context);
     }
-
-    if (configuration->partition_strategy && configuration->partition_strategy_type != PartitionStrategyFactory::StrategyType::HIVE)
+    if (configuration->getPartitionStrategy() && configuration->getPartitionStrategyType() != PartitionStrategyFactory::StrategyType::HIVE)
     {
         throw Exception(ErrorCodes::NOT_IMPLEMENTED,
             "Reading from a partitioned {} storage is not implemented yet",
@@ -561,7 +556,7 @@ SinkToStoragePtr StorageObjectStorage::write(
     /// Not a data lake, just raw object storage
-    if (configuration->partition_strategy)
+    if (configuration->getPartitionStrategy())
     {
         return std::make_shared(object_storage, configuration, format_settings, sample_block, local_context);
     }
@@ -579,8 +574,8 @@ SinkToStoragePtr StorageObjectStorage::write(
         format_settings,
         sample_block,
         local_context,
-        configuration->format,
-        configuration->compression_method);
+        configuration->getFormat(),
+        configuration->getCompressionMethod());
 }
 bool StorageObjectStorage::optimize(
@@ -686,7 +681,7 @@ ColumnsDescription StorageObjectStorage::resolveSchemaFromData(
 {
     ObjectInfos read_keys;
     auto iterator = createReadBufferIterator(object_storage, configuration, format_settings, read_keys, context);
-    auto schema = readSchemaFromFormat(configuration->format, format_settings, *iterator, context);
+    auto schema = readSchemaFromFormat(configuration->getFormat(), format_settings, *iterator, context);
     sample_path = iterator->getLastFilePath();
     return schema;
 }
@@ -707,7 +702,7 @@ std::string StorageObjectStorage::resolveFormatFromData(
 std::pair StorageObjectStorage::resolveSchemaAndFormatFromData(
     const ObjectStoragePtr & object_storage,
-    const StorageObjectStorageConfigurationPtr & configuration,
+    StorageObjectStorageConfigurationPtr & configuration,
     const std::optional & format_settings,
     std::string & sample_path,
     const ContextPtr & context)
@@ -716,13 +711,13 @@ std::pair StorageObjectStorage::resolveSchemaAn
     auto iterator = createReadBufferIterator(object_storage, configuration, format_settings, read_keys, context);
     auto [columns, format] = detectFormatAndReadSchema(format_settings, *iterator, context);
     sample_path = iterator->getLastFilePath();
-    configuration->format = format;
+    configuration->setFormat(format);
     return std::pair(columns, format);
 }
 void StorageObjectStorage::addInferredEngineArgsToCreateQuery(ASTs & args, const ContextPtr & context) const
 {
-    configuration->addStructureAndFormatToArgsIfNeeded(args, "", configuration->format, context, /*with_structure=*/false);
+    configuration->addStructureAndFormatToArgsIfNeeded(args, "", configuration->getFormat(), context, /*with_structure=*/false);
 }
 SchemaCache & StorageObjectStorage::getSchemaCache(const ContextPtr & context, const std::string & storage_engine_name)
diff --git a/src/Storages/ObjectStorage/StorageObjectStorage.h b/src/Storages/ObjectStorage/StorageObjectStorage.h
index 403bed005124..a8f4fc5cfdbc 100644
--- a/src/Storages/ObjectStorage/StorageObjectStorage.h
+++ b/src/Storages/ObjectStorage/StorageObjectStorage.h
@@ -53,7 +53,9 @@ class StorageObjectStorage : public IStorage
         ASTPtr partition_by_ = nullptr,
         ASTPtr order_by_ = nullptr,
         bool is_table_function_ = false,
-        bool lazy_init = false);
+        bool lazy_init = false,
+        bool updated_configuration = false, // avoid double update configuration from cluster and local versions
+        std::optional sample_path_ = std::nullopt);
     String getName() const override;
@@ -125,7 +127,7 @@ class StorageObjectStorage : public IStorage
     static std::pair resolveSchemaAndFormatFromData(
         const ObjectStoragePtr & object_storage,
-        const StorageObjectStorageConfigurationPtr & configuration,
+        StorageObjectStorageConfigurationPtr & configuration,
         const std::optional & format_settings,
         std::string & sample_path,
         const ContextPtr & context);
diff --git a/src/Storages/ObjectStorage/StorageObjectStorageCluster.cpp b/src/Storages/ObjectStorage/StorageObjectStorageCluster.cpp
index 9af7bb7c6fdb..54ffca4eb3a2 100644
--- a/src/Storages/ObjectStorage/StorageObjectStorageCluster.cpp
+++ b/src/Storages/ObjectStorage/StorageObjectStorageCluster.cpp
@@ -8,9 +8,16 @@
 #include
 #include
+#include
+#include
+#include
+#include
+#include
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -26,6 +33,10 @@ namespace Setting
     extern const SettingsBool use_hive_partitioning;
     extern const SettingsBool cluster_function_process_archive_on_multiple_nodes;
     extern const SettingsObjectStorageGranularityLevel cluster_table_function_split_granularity;
+    extern const SettingsBool parallel_replicas_for_cluster_engines;
+    extern const SettingsString object_storage_cluster;
+    extern const SettingsInt64 delta_lake_snapshot_start_version;
+    extern const SettingsInt64 delta_lake_snapshot_end_version;
 }
 namespace ErrorCodes
@@ -38,6 +49,14 @@ String StorageObjectStorageCluster::getPathSample(ContextPtr context)
     auto query_settings = configuration->getQuerySettings(context);
     /// We don't want to throw an exception if there are no files with specified path.
     query_settings.throw_on_zero_files_match = false;
+
+    if (!configuration->isArchive())
+    {
+        const auto & path = configuration->getPathForRead();
+        if (!path.hasGlobs())
+            return path.path;
+    }
+
     auto file_iterator = StorageObjectStorageSource::createFileIterator(
         configuration,
         query_settings,
@@ -50,11 +69,14 @@ String StorageObjectStorageCluster::getPathSample(ContextPtr context)
         {}, // virtual_columns
         {}, // hive_columns
         nullptr, // read_keys
-        {} // file_progress_callback
+        {}, // file_progress_callback
+        false, // ignore_archive_globs
+        true // skip_object_metadata
     );
     if (auto file = file_iterator->next(0))
         return file->getPath();
+
     return "";
 }
@@ -66,29 +88,82 @@ StorageObjectStorageCluster::StorageObjectStorageCluster(
     const ColumnsDescription & columns_in_table_or_function_definition,
     const ConstraintsDescription & constraints_,
     const ASTPtr & partition_by,
+    const ASTPtr & order_by,
     ContextPtr context_,
-    bool is_table_function)
+    const String & comment_,
+    std::optional format_settings_,
+    LoadingStrictnessLevel mode_,
+    std::shared_ptr catalog,
+    bool if_not_exists,
+    bool is_datalake_query,
+    bool is_table_function,
+    bool lazy_init)
     : IStorageCluster(
         cluster_name_, table_id_, getLogger(fmt::format("{}({})", configuration_->getEngineName(), table_id_.table_name)))
     , configuration{configuration_}
     , object_storage(object_storage_)
+    , cluster_name_in_settings(false)
 {
     configuration->initPartitionStrategy(partition_by, columns_in_table_or_function_definition, context_);
-    /// We allow exceptions to be thrown on update(),
-    /// because Cluster engine can only be used as table function,
-    /// so no lazy initialization is allowed.
-    configuration->update(object_storage, context_);
+
+    const bool need_resolve_columns_or_format = columns_in_table_or_function_definition.empty() || (configuration->getFormat() == "auto");
+    const bool do_lazy_init = lazy_init && !need_resolve_columns_or_format && catalog;
+
+    auto log = getLogger("StorageObjectStorageCluster");
+
+    bool is_delta_lake_cdf = context_->getSettingsRef()[Setting::delta_lake_snapshot_start_version] != -1
+        || context_->getSettingsRef()[Setting::delta_lake_snapshot_end_version] != -1;
+
+    if (!is_table_function && is_delta_lake_cdf)
+    {
+        throw Exception(ErrorCodes::BAD_ARGUMENTS, "Delta lake CDF is allowed only for deltaLake table function");
+    }
+
+    if (!is_table_function && !columns_in_table_or_function_definition.empty() && !is_datalake_query && mode_ == LoadingStrictnessLevel::CREATE)
+    {
+        LOG_DEBUG(log, "Creating new storage with specified columns");
+        configuration->create(
+            object_storage, context_, columns_in_table_or_function_definition, partition_by, order_by, if_not_exists, catalog, table_id_);
+    }
+
+    bool updated_configuration = false;
+    try
+    {
+        if (!do_lazy_init)
+        {
+            if (is_table_function)
+                configuration->lazyInitializeIfNeeded(object_storage, context_);
+            else
+                configuration->update(object_storage, context_);
+            updated_configuration = true;
+        }
+    }
+    catch (...)
+    {
+        // If we don't have format or schema yet, we can't ignore failed configuration update,
+        // because relevant configuration is crucial for format and schema inference
+        if (mode_ <= LoadingStrictnessLevel::CREATE || need_resolve_columns_or_format)
+        {
+            throw;
+        }
+        tryLogCurrentException(log);
+    }
     ColumnsDescription columns{columns_in_table_or_function_definition};
     std::string sample_path;
-    resolveSchemaAndFormat(columns, configuration->format, object_storage, configuration, {}, sample_path, context_);
+    if (need_resolve_columns_or_format)
+        resolveSchemaAndFormat(columns, object_storage, configuration, {}, sample_path, context_);
+    else
+        validateSupportedColumns(columns, *configuration);
     configuration->check(context_);
-    if (sample_path.empty()
-        && context_->getSettingsRef()[Setting::use_hive_partitioning]
-        && !configuration->isDataLakeConfiguration()
-        && !configuration->partition_strategy)
+    if (updated_configuration && sample_path.empty()
+        && context_->getSettingsRef()[Setting::use_hive_partitioning]
+        && !configuration->isDataLakeConfiguration()
+        && !configuration->getPartitionStrategy())
+    {
         sample_path = getPathSample(context_);
+    }
     /// Not grabbing the file_columns because it is not necessary to do it here.
     std::tie(hive_partition_columns_to_read_from_file_path, std::ignore) = HivePartitioningUtils::setupHivePartitioningForObjectStorage(
@@ -101,7 +176,8 @@ StorageObjectStorageCluster::StorageObjectStorageCluster(
     StorageInMemoryMetadata metadata;
     metadata.setColumns(columns);
-    if (is_table_function && configuration->isDataLakeConfiguration())
+
+    if (!do_lazy_init && is_table_function && configuration->isDataLakeConfiguration())
     {
         /// For datalake table functions, always pin the current snapshot version so that
         /// query execution uses the same snapshot as query analysis (logical-race fix).
@@ -123,10 +199,45 @@ StorageObjectStorageCluster::StorageObjectStorageCluster(
         metadata.columns,
         context_,
         /* format_settings */std::nullopt,
-        configuration->partition_strategy_type,
+        configuration->getPartitionStrategyType(),
         sample_path));
     setInMemoryMetadata(metadata);
+
+    const auto can_use_parallel_replicas = !cluster_name_.empty()
+        && context_->getSettingsRef()[Setting::parallel_replicas_for_cluster_engines]
+        && context_->canUseTaskBasedParallelReplicas()
+        && !context_->isDistributed();
+
+    bool can_use_distributed_iterator =
+        context_->getClientInfo().collaborate_with_initiator &&
+        can_use_parallel_replicas;
+
+    pure_storage = std::make_shared(
+        configuration,
+        object_storage,
+        context_,
+        getStorageID(),
+        IStorageCluster::getInMemoryMetadata().getColumns(),
+        IStorageCluster::getInMemoryMetadata().getConstraints(),
+        comment_,
+        format_settings_,
+        mode_,
+        catalog,
+        if_not_exists,
+        is_datalake_query,
+        /* distributed_processing */can_use_distributed_iterator,
+        partition_by,
+        order_by,
+        /* is_table_function */is_table_function,
+        /* lazy_init */lazy_init,
+        updated_configuration,
+        sample_path);
+
+    auto virtuals_ = getVirtualsPtr();
+    if (virtuals_)
+        pure_storage->setVirtuals(*virtuals_);
+    pure_storage->setInMemoryMetadata(IStorageCluster::getInMemoryMetadata());
 }
 std::string StorageObjectStorageCluster::getName() const
@@ -136,6 +247,8 @@ std::string StorageObjectStorageCluster::getName() const
 std::optional StorageObjectStorageCluster::totalRows(ContextPtr query_context) const
 {
+    if (pure_storage)
+        return pure_storage->totalRows(query_context);
     configuration->lazyInitializeIfNeeded(
         object_storage, query_context);
@@ -144,17 +257,148 @@ std::optional StorageObjectStorageCluster::totalRows(ContextPtr query_co
 std::optional StorageObjectStorageCluster::totalBytes(ContextPtr query_context) const
 {
+    if (pure_storage)
+        return pure_storage->totalBytes(query_context);
     configuration->lazyInitializeIfNeeded(
         object_storage, query_context);
     return configuration->totalBytes(query_context);
 }
+void StorageObjectStorageCluster::updateQueryForDistributedEngineIfNeeded(ASTPtr & query, ContextPtr context)
+{
+    // Change table engine on table function for distributed request
+    // CREATE TABLE t (...) ENGINE=IcebergS3(...)
+    // SELECT * FROM t
+    // change on
+    // SELECT * FROM icebergS3(...)
+    // to execute on cluster nodes
+
+    auto * select_query = query->as();
+    if (!select_query || !select_query->tables())
+        return;
+
+    auto * tables = select_query->tables()->as();
+
+    if (tables->children.empty())
+        throw Exception(
+            ErrorCodes::LOGICAL_ERROR,
+            "Expected SELECT query from table with engine {}, got '{}'",
+            configuration->getEngineName(), query->formatForLogging());
+
+    auto * table_expression = tables->children[0]->as()->table_expression->as();
+
+    if (!table_expression)
+        return;
+
+    if (!table_expression->database_and_table_name)
+        return;
+
+    auto & table_identifier_typed = table_expression->database_and_table_name->as();
+
+    auto table_alias = table_identifier_typed.tryGetAlias();
+
+    auto storage_engine_name = configuration->getEngineName();
+    if (storage_engine_name == "Iceberg")
+    {
+        switch (configuration->getType())
+        {
+            case ObjectStorageType::S3:
+                storage_engine_name = "IcebergS3";
+                break;
+            case ObjectStorageType::Azure:
+                storage_engine_name = "IcebergAzure";
+                break;
+            case ObjectStorageType::HDFS:
+                storage_engine_name = "IcebergHDFS";
+                break;
+            default:
+                throw Exception(
+                    ErrorCodes::LOGICAL_ERROR,
+                    "Can't find table function for engine {}",
+                    storage_engine_name
+                );
+        }
+    }
+
+    static std::unordered_map engine_to_function = {
+        {"S3", "s3"},
+        {"Azure", "azureBlobStorage"},
+        {"HDFS", "hdfs"},
+        {"Iceberg", "iceberg"},
+        {"IcebergS3", "icebergS3"},
+        {"IcebergAzure", "icebergAzure"},
+        {"IcebergHDFS", "icebergHDFS"},
+        {"IcebergLocal", "icebergLocal"},
+        {"DeltaLake", "deltaLake"},
+        {"DeltaLakeS3", "deltaLakeS3"},
+        {"DeltaLakeAzure", "deltaLakeAzure"},
+        {"DeltaLakeLocal", "deltaLakeLocal"},
+        {"Hudi", "hudi"},
+        {"COSN", "cosn"},
+        {"GCS", "gcs"},
+        {"OSS", "oss"},
+    };
+
+    auto p = engine_to_function.find(storage_engine_name);
+    if (p == engine_to_function.end())
+    {
+        throw Exception(
+            ErrorCodes::LOGICAL_ERROR,
+            "Can't find table function for engine {}",
+            storage_engine_name
+        );
+    }
+
+    std::string table_function_name = p->second;
+
+    auto function_ast = make_intrusive();
+    function_ast->name = table_function_name;
+
+    auto cluster_name = getClusterName(context);
+
+    if (cluster_name.empty())
+    {
+        throw Exception(
+            ErrorCodes::LOGICAL_ERROR,
+            "Can't be here without cluster name, no cluster name in query {}",
+            query->formatForLogging());
+    }
+
+    function_ast->arguments = configuration->createArgsWithAccessData();
+    function_ast->children.push_back(function_ast->arguments);
+    function_ast->setAlias(table_alias);
+
+    ASTPtr function_ast_ptr(function_ast);
+
+    table_expression->database_and_table_name = nullptr;
+    table_expression->table_function = function_ast_ptr;
+    table_expression->children[0] = function_ast_ptr;
+
+    auto settings = select_query->settings();
+    if (settings)
+    {
+        auto & settings_ast = settings->as();
+        settings_ast.changes.insertSetting("object_storage_cluster", cluster_name);
+    }
+    else
+    {
+        auto settings_ast_ptr = make_intrusive();
+        settings_ast_ptr->is_standalone = false;
+        settings_ast_ptr->changes.setSetting("object_storage_cluster", cluster_name);
+        select_query->setExpression(ASTSelectQuery::Expression::SETTINGS, std::move(settings_ast_ptr));
+    }
+
+    cluster_name_in_settings = true;
+}
+
 void StorageObjectStorageCluster::updateQueryToSendIfNeeded(
     ASTPtr & query,
     const DB::StorageSnapshotPtr & storage_snapshot,
     const ContextPtr & context)
 {
+    updateQueryForDistributedEngineIfNeeded(query, context);
+
     auto * table_function = extractTableFunctionFromSelectQuery(query);
     if (!table_function)
         return;
@@ -177,6 +421,9 @@ void StorageObjectStorageCluster::updateQueryToSendIfNeeded(
             configuration->getEngineName());
     }
+    ASTPtr object_storage_type_arg;
+    configuration->extractDynamicStorageType(args, context, &object_storage_type_arg, !cluster_name_in_settings);
+
     ASTPtr settings_temporary_storage = nullptr;
     for (auto it = args.begin(); it != args.end(); ++it)
     {
@@ -189,19 +436,75 @@ void StorageObjectStorageCluster::updateQueryToSendIfNeeded(
         }
     }
-    if (!endsWith(table_function->name, "Cluster"))
-        configuration->addStructureAndFormatToArgsIfNeeded(args, structure, configuration->format, context, /*with_structure=*/true);
+    if (cluster_name_in_settings || !endsWith(table_function->name, "Cluster"))
+    {
+        configuration->addStructureAndFormatToArgsIfNeeded(args, structure, configuration->getFormat(), context, /*with_structure=*/true);
+
+        /// Convert to old-style *Cluster table function.
+        /// This allows to use old clickhouse versions in cluster.
+        static std::unordered_map function_to_cluster_function = {
+            {"s3", "s3Cluster"},
+            {"azureBlobStorage", "azureBlobStorageCluster"},
+            {"hdfs", "hdfsCluster"},
+            {"iceberg", "icebergCluster"},
+            {"icebergS3", "icebergS3Cluster"},
+            {"icebergAzure", "icebergAzureCluster"},
+            {"icebergHDFS", "icebergHDFSCluster"},
+            {"icebergLocal", "icebergLocalCluster"},
+            {"deltaLake", "deltaLakeCluster"},
+            {"deltaLakeS3", "deltaLakeS3Cluster"},
+            {"deltaLakeAzure", "deltaLakeAzureCluster"},
+            {"hudi", "hudiCluster"},
+            {"paimonS3", "paimonS3Cluster"},
+            {"paimonAzure", "paimonAzureCluster"},
+        };
+
+        auto p = function_to_cluster_function.find(table_function->name);
+        if (p == function_to_cluster_function.end())
+        {
+            throw Exception(
+                ErrorCodes::LOGICAL_ERROR,
+                "Can't find cluster variant for table function {}",
+                table_function->name);
+        }
+
+        table_function->name = p->second;
+
+        auto cluster_name = getClusterName(context);
+        auto cluster_name_arg = make_intrusive(cluster_name);
+        args.insert(args.begin(), cluster_name_arg);
+
+        auto * select_query = query->as();
+        if (!select_query)
+            throw Exception(
+                ErrorCodes::LOGICAL_ERROR,
+                "Expected SELECT query from table function {}",
+                configuration->getEngineName());
+
+        auto settings = select_query->settings();
+        if (settings)
+        {
+            auto & settings_ast = settings->as();
+            if (settings_ast.changes.removeSetting("object_storage_cluster") && settings_ast.changes.empty())
+            {
+                select_query->setExpression(ASTSelectQuery::Expression::SETTINGS, {});
+            }
+            /// No throw if not found - `object_storage_cluster` can be global setting.
+        }
+    }
     else
     {
         ASTPtr cluster_name_arg = args.front();
         args.erase(args.begin());
-        configuration->addStructureAndFormatToArgsIfNeeded(args, structure, configuration->format, context, /*with_structure=*/true);
+        configuration->addStructureAndFormatToArgsIfNeeded(args, structure, configuration->getFormat(), context, /*with_structure=*/true);
         args.insert(args.begin(), cluster_name_arg);
     }
     if (settings_temporary_storage)
     {
         args.insert(args.end(), std::move(settings_temporary_storage));
     }
+    if (object_storage_type_arg)
+        args.insert(args.end(), object_storage_type_arg);
 }
 void StorageObjectStorageCluster::updateExternalDynamicMetadataIfExists(ContextPtr query_context)
@@ -230,6 +533,9 @@ void StorageObjectStorageCluster::updateExternalDynamicMetadataIfExists(ContextP
     }
     setInMemoryMetadata(new_metadata);
+
+    if (pure_storage)
+        pure_storage->setInMemoryMetadata(IStorageCluster::getInMemoryMetadata());
 }
 RemoteQueryExecutor::Extension StorageObjectStorageCluster::getTaskIteratorExtension(
@@ -259,7 +565,7 @@ RemoteQueryExecutor::Extension StorageObjectStorageCluster::getTaskIteratorExten
     {
         iterator = std::make_shared(
             std::move(iterator),
-            configuration->format,
+            configuration->getFormat(),
             object_storage,
             local_context
         );
@@ -295,5 +601,401 @@ RemoteQueryExecutor::Extension StorageObjectStorageCluster::getTaskIteratorExten
     return RemoteQueryExecutor::Extension{ .task_iterator = std::move(callback) };
 }
+void StorageObjectStorageCluster::readFallBackToPure(
+    QueryPlan & query_plan,
+    const Names & column_names,
+    const StorageSnapshotPtr & storage_snapshot,
+    SelectQueryInfo & query_info,
+    ContextPtr context,
+    QueryProcessingStage::Enum processed_stage,
+    size_t max_block_size,
+    size_t num_streams)
+{
+    pure_storage->read(query_plan, column_names, storage_snapshot, query_info, context, processed_stage, max_block_size, num_streams);
+}
+
+bool StorageObjectStorageCluster::isClusterSupported() const
+{
+    return configuration->isClusterSupported();
+}
+
+SinkToStoragePtr StorageObjectStorageCluster::writeFallBackToPure(
+    const ASTPtr & query,
+    const StorageMetadataPtr & metadata_snapshot,
+    ContextPtr context,
+    bool async_insert)
+{
+    return pure_storage->write(query, metadata_snapshot, context, async_insert);
+}
+
+String StorageObjectStorageCluster::getClusterName(ContextPtr context) const
+{
+    /// StorageObjectStorageCluster is always created for cluster or non-cluster variants.
+    /// User can specify cluster name in table definition or in setting `object_storage_cluster`
+    /// only for several queries. When it specified in both places, priority is given to the query setting.
+    /// When it is empty, non-cluster realization is used.
+
+    if (!isClusterSupported())
+        return "";
+
+    auto cluster_name_from_settings = context->getSettingsRef()[Setting::object_storage_cluster].value;
+    if (cluster_name_from_settings.empty())
+        cluster_name_from_settings = getOriginalClusterName();
+    return cluster_name_from_settings;
+}
+
+QueryProcessingStage::Enum StorageObjectStorageCluster::getQueryProcessingStage(
+    ContextPtr context, QueryProcessingStage::Enum to_stage, const StorageSnapshotPtr & storage_snapshot, SelectQueryInfo & query_info) const
+{
+    /// Full query if fall back to pure storage.
+    if (getClusterName(context).empty())
+        return QueryProcessingStage::Enum::FetchColumns;
+
+    /// Distributed storage.
+    return IStorageCluster::getQueryProcessingStage(context, to_stage, storage_snapshot, query_info);
+}
+
+std::optional StorageObjectStorageCluster::distributedWrite(
+    const ASTInsertQuery & query,
+    ContextPtr context)
+{
+    if (getClusterName(context).empty())
+        return pure_storage->distributedWrite(query, context);
+    return IStorageCluster::distributedWrite(query, context);
+}
+
+void StorageObjectStorageCluster::drop()
+{
+    if (pure_storage)
+    {
+        pure_storage->drop();
+        return;
+    }
+    IStorageCluster::drop();
+}
+
+void StorageObjectStorageCluster::dropInnerTableIfAny(bool sync, ContextPtr context)
+{
+    if (getClusterName(context).empty())
+    {
+        pure_storage->dropInnerTableIfAny(sync, context);
+        return;
+    }
+    IStorageCluster::dropInnerTableIfAny(sync, context);
+}
+
+void StorageObjectStorageCluster::truncate(
+    const ASTPtr & query,
+    const StorageMetadataPtr & metadata_snapshot,
+    ContextPtr local_context,
+    TableExclusiveLockHolder & lock_holder)
+{
+    /// Full query if fall back to pure storage.
+    if (getClusterName(local_context).empty())
+    {
+        pure_storage->truncate(query, metadata_snapshot, local_context, lock_holder);
+        return;
+    }
+
+    throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Truncate is not supported by storage {}", getName());
+}
+
+void StorageObjectStorageCluster::checkTableCanBeRenamed(const StorageID & new_name) const
+{
+    if (pure_storage)
+        pure_storage->checkTableCanBeRenamed(new_name);
+    IStorageCluster::checkTableCanBeRenamed(new_name);
+}
+
+void StorageObjectStorageCluster::rename(const String & new_path_to_table_data, const StorageID & new_table_id)
+{
+    if (pure_storage)
+        pure_storage->rename(new_path_to_table_data, new_table_id);
+    IStorageCluster::rename(new_path_to_table_data, new_table_id);
+}
+
+void StorageObjectStorageCluster::renameInMemory(const StorageID & new_table_id)
+{
+    if (pure_storage)
+        pure_storage->renameInMemory(new_table_id);
+    IStorageCluster::renameInMemory(new_table_id);
+}
+
+void StorageObjectStorageCluster::alter(const AlterCommands & params, ContextPtr context, AlterLockHolder & alter_lock_holder)
+{
+    if (getClusterName(context).empty())
+    {
+        pure_storage->alter(params, context, alter_lock_holder);
+        setInMemoryMetadata(pure_storage->getInMemoryMetadata());
+        return;
+    }
+    IStorageCluster::alter(params, context, alter_lock_holder);
+    pure_storage->setInMemoryMetadata(IStorageCluster::getInMemoryMetadata());
+}
+
+void StorageObjectStorageCluster::addInferredEngineArgsToCreateQuery(ASTs & args, const ContextPtr & context) const
+{
+    configuration->addStructureAndFormatToArgsIfNeeded(args, "", configuration->getFormat(), context, /*with_structure=*/false);
+}
+
+StorageMetadataPtr StorageObjectStorageCluster::getInMemoryMetadataPtr(bool bypass_metadata_cache) const
+{
+    if (pure_storage)
+        return pure_storage->getInMemoryMetadataPtr(bypass_metadata_cache);
+    return IStorageCluster::getInMemoryMetadataPtr(bypass_metadata_cache);
+}
+
+IDataLakeMetadata * StorageObjectStorageCluster::getExternalMetadata(ContextPtr query_context)
+{
+    if (getClusterName(query_context).empty())
+        return pure_storage->getExternalMetadata(query_context);
+
+    configuration->update(
+        object_storage,
+        query_context);
+
+    return configuration->getExternalMetadata();
+}
+
+void StorageObjectStorageCluster::checkAlterIsPossible(const AlterCommands & commands, ContextPtr context) const
+{
+    if (getClusterName(context).empty())
+    {
+        pure_storage->checkAlterIsPossible(commands, context);
+        return;
+    }
+    IStorageCluster::checkAlterIsPossible(commands, context);
+}
+
+void StorageObjectStorageCluster::checkMutationIsPossible(const MutationCommands & commands, const Settings & settings) const
+{
+    if (pure_storage)
+    {
+        pure_storage->checkMutationIsPossible(commands, settings);
+        return;
+    }
+    IStorageCluster::checkMutationIsPossible(commands, settings);
+}
+
+Pipe StorageObjectStorageCluster::alterPartition(
+    const StorageMetadataPtr & metadata_snapshot,
+    const PartitionCommands & commands,
+    ContextPtr context)
+{
+    if (getClusterName(context).empty())
+        return pure_storage->alterPartition(metadata_snapshot, commands, context);
+    return IStorageCluster::alterPartition(metadata_snapshot, commands, context);
+}
+
+void StorageObjectStorageCluster::checkAlterPartitionIsPossible(
+    const PartitionCommands & commands,
+    const StorageMetadataPtr & metadata_snapshot,
+    const Settings & settings,
+    ContextPtr context) const
+{
+    if (getClusterName(context).empty())
+    {
+        pure_storage->checkAlterPartitionIsPossible(commands, metadata_snapshot, settings, context);
+        return;
+    }
+    IStorageCluster::checkAlterPartitionIsPossible(commands, metadata_snapshot, settings, context);
+}
+
+bool StorageObjectStorageCluster::optimize(
+    const ASTPtr & query,
+    const StorageMetadataPtr & metadata_snapshot,
+    const ASTPtr & partition,
+    bool final,
+    bool deduplicate,
+    const Names & deduplicate_by_columns,
+    bool cleanup,
+    ContextPtr context)
+{
+    if (getClusterName(context).empty())
+        return pure_storage->optimize(query, metadata_snapshot, partition, final, deduplicate, deduplicate_by_columns, cleanup, context);
+    return IStorageCluster::optimize(query, metadata_snapshot, partition, final, deduplicate, deduplicate_by_columns, cleanup, context);
+}
+
+QueryPipeline StorageObjectStorageCluster::updateLightweight(const MutationCommands & commands, ContextPtr context)
+{
+    if (getClusterName(context).empty())
+        return pure_storage->updateLightweight(commands, context);
+    return IStorageCluster::updateLightweight(commands, context);
+}
+
+void StorageObjectStorageCluster::mutate(const MutationCommands & commands, ContextPtr context)
+{
+    if (getClusterName(context).empty())
+    {
+        pure_storage->mutate(commands, context);
+        return;
+    }
+    IStorageCluster::mutate(commands, context);
+}
+
+CancellationCode StorageObjectStorageCluster::killMutation(const String & mutation_id)
+{
+    if (pure_storage)
+        return pure_storage->killMutation(mutation_id);
+    return IStorageCluster::killMutation(mutation_id);
+}
+
+void StorageObjectStorageCluster::waitForMutation(const String & mutation_id, bool wait_for_another_mutation)
+{
+    if (pure_storage)
+    {
+        pure_storage->waitForMutation(mutation_id, wait_for_another_mutation);
+        return;
+    }
+    IStorageCluster::waitForMutation(mutation_id, wait_for_another_mutation);
+}
+
+void StorageObjectStorageCluster::setMutationCSN(const String & mutation_id, UInt64 csn)
+{
+    if (pure_storage)
+    {
+        pure_storage->setMutationCSN(mutation_id, csn);
+        return;
+    }
+    IStorageCluster::setMutationCSN(mutation_id, csn);
+}
+
+CancellationCode StorageObjectStorageCluster::killPartMoveToShard(const UUID & task_uuid)
+{
+    if (pure_storage)
+        return pure_storage->killPartMoveToShard(task_uuid);
+    return IStorageCluster::killPartMoveToShard(task_uuid);
+}
+
+void StorageObjectStorageCluster::startup()
+{
+    if (pure_storage)
+    {
+        pure_storage->startup();
+        return;
+    }
+    IStorageCluster::startup();
+}
+
+void StorageObjectStorageCluster::shutdown(bool is_drop)
+{
+    if (pure_storage)
+    {
+        pure_storage->shutdown(is_drop);
+        return;
+    }
+    IStorageCluster::shutdown(is_drop);
+}
+
+void StorageObjectStorageCluster::flushAndPrepareForShutdown()
+{
+    if (pure_storage)
+    {
+        pure_storage->flushAndPrepareForShutdown();
+        return;
+    }
+    IStorageCluster::flushAndPrepareForShutdown();
+}
+
+ActionLock StorageObjectStorageCluster::getActionLock(StorageActionBlockType action_type)
+{
+    if (pure_storage)
+        return pure_storage->getActionLock(action_type);
+    return IStorageCluster::getActionLock(action_type);
+}
+
+void StorageObjectStorageCluster::onActionLockRemove(StorageActionBlockType action_type)
+{
+    if (pure_storage)
+    {
+        pure_storage->onActionLockRemove(action_type);
+        return;
+    }
+    IStorageCluster::onActionLockRemove(action_type);
+}
+
+bool StorageObjectStorageCluster::supportsDelete() const
+{
+    if (pure_storage)
+        return pure_storage->supportsDelete();
+    return IStorageCluster::supportsDelete();
+}
+
+bool StorageObjectStorageCluster::supportsParallelInsert() const
+{
+    if (pure_storage)
+        return pure_storage->supportsParallelInsert();
+    return IStorageCluster::supportsParallelInsert();
+}
+
+bool StorageObjectStorageCluster::prefersLargeBlocks() const
+{
+    if (pure_storage)
+        return pure_storage->prefersLargeBlocks();
+    return IStorageCluster::prefersLargeBlocks();
+}
+
+bool StorageObjectStorageCluster::supportsPartitionBy() const
+{
+    if (pure_storage)
+        return pure_storage->supportsPartitionBy();
+    return IStorageCluster::supportsPartitionBy();
+}
+
+bool StorageObjectStorageCluster::supportsSubcolumns() const
+{
+    if (pure_storage)
+        return pure_storage->supportsSubcolumns();
+    return IStorageCluster::supportsSubcolumns();
+}
+
+bool StorageObjectStorageCluster::supportsTrivialCountOptimization(const StorageSnapshotPtr & snapshot, ContextPtr context) const
+{
+    if (pure_storage)
+        return pure_storage->supportsTrivialCountOptimization(snapshot, context);
+    return IStorageCluster::supportsTrivialCountOptimization(snapshot, context);
+}
+
+bool StorageObjectStorageCluster::supportsPrewhere() const
+{
+    if (pure_storage)
+        return pure_storage->supportsPrewhere();
+    return IStorageCluster::supportsPrewhere();
+}
+
+bool StorageObjectStorageCluster::canMoveConditionsToPrewhere() const
+{
+    if (pure_storage)
+        return pure_storage->canMoveConditionsToPrewhere();
+    return IStorageCluster::canMoveConditionsToPrewhere();
+}
+
+std::optional StorageObjectStorageCluster::supportedPrewhereColumns() const
+{
+    if (pure_storage)
+        return pure_storage->supportedPrewhereColumns();
+    return IStorageCluster::supportedPrewhereColumns();
+}
+
+IStorageCluster::ColumnSizeByName StorageObjectStorageCluster::getColumnSizes() const
+{
+    if (pure_storage)
+        return pure_storage->getColumnSizes();
+    return IStorageCluster::getColumnSizes();
+}
+
+bool StorageObjectStorageCluster::parallelizeOutputAfterReading(ContextPtr context) const
+{
+    if (pure_storage)
+        return pure_storage->parallelizeOutputAfterReading(context);
+    return IStorageCluster::parallelizeOutputAfterReading(context);
+}
+
+Pipe StorageObjectStorageCluster::executeCommand(const String & command_name, const ASTPtr & args, ContextPtr context)
+{
+    if (pure_storage)
+        return pure_storage->executeCommand(command_name, args, context);
+    return IStorageCluster::executeCommand(command_name, args, context);
+}
+
+}
diff --git a/src/Storages/ObjectStorage/StorageObjectStorageCluster.h b/src/Storages/ObjectStorage/StorageObjectStorageCluster.h
index 012278264531..4a88ba75f7b2 100644
--- a/src/Storages/ObjectStorage/StorageObjectStorageCluster.h
+++ b/src/Storages/ObjectStorage/StorageObjectStorageCluster.h
@@ -18,8 +18,16 @@ class StorageObjectStorageCluster : public IStorageCluster
     const ColumnsDescription & columns_in_table_or_function_definition,
     const ConstraintsDescription & constraints_,
     const ASTPtr & partition_by,
+    const ASTPtr & order_by,
     ContextPtr context_,
-    bool is_table_function_
= false); + const String & comment_, + std::optional format_settings_, + LoadingStrictnessLevel mode_, + std::shared_ptr catalog, + bool if_not_exists, + bool is_datalake_query, + bool is_table_function_ = false, + bool lazy_init = false); std::string getName() const override; @@ -34,20 +42,160 @@ class StorageObjectStorageCluster : public IStorageCluster std::optional totalRows(ContextPtr query_context) const override; std::optional totalBytes(ContextPtr query_context) const override; + void setClusterNameInSettings(bool cluster_name_in_settings_) { cluster_name_in_settings = cluster_name_in_settings_; } + + String getClusterName(ContextPtr context) const override; + + QueryProcessingStage::Enum getQueryProcessingStage(ContextPtr, QueryProcessingStage::Enum, const StorageSnapshotPtr &, SelectQueryInfo &) const override; + + std::optional distributedWrite( + const ASTInsertQuery & query, + ContextPtr context) override; + + void drop() override; + + void dropInnerTableIfAny(bool sync, ContextPtr context) override; + + void truncate( + const ASTPtr & query, + const StorageMetadataPtr & metadata_snapshot, + ContextPtr local_context, + TableExclusiveLockHolder &) override; + + void checkTableCanBeRenamed(const StorageID & new_name) const override; + + void rename(const String & new_path_to_table_data, const StorageID & new_table_id) override; + + void renameInMemory(const StorageID & new_table_id) override; + + void alter(const AlterCommands & params, ContextPtr context, AlterLockHolder & alter_lock_holder) override; + + void addInferredEngineArgsToCreateQuery(ASTs & args, const ContextPtr & context) const override; + + IDataLakeMetadata * getExternalMetadata(ContextPtr query_context); + + StorageMetadataPtr getInMemoryMetadataPtr(bool bypass_metadata_cache = false) const override; + + void checkAlterIsPossible(const AlterCommands & commands, ContextPtr context) const override; + + void checkMutationIsPossible(const MutationCommands & commands, const Settings & 
settings) const override; + + Pipe alterPartition( + const StorageMetadataPtr & metadata_snapshot, + const PartitionCommands & commands, + ContextPtr context) override; + + void checkAlterPartitionIsPossible( + const PartitionCommands & commands, + const StorageMetadataPtr & metadata_snapshot, + const Settings & settings, + ContextPtr context) const override; + + bool optimize( + const ASTPtr & query, + const StorageMetadataPtr & metadata_snapshot, + const ASTPtr & partition, + bool final, + bool deduplicate, + const Names & deduplicate_by_columns, + bool cleanup, + ContextPtr context) override; + + QueryPipeline updateLightweight(const MutationCommands & commands, ContextPtr context) override; + + void mutate(const MutationCommands & commands, ContextPtr context) override; + + Pipe executeCommand(const String & command_name, const ASTPtr & args, ContextPtr context) override; + + CancellationCode killMutation(const String & mutation_id) override; + + void waitForMutation(const String & mutation_id, bool wait_for_another_mutation) override; + + void setMutationCSN(const String & mutation_id, UInt64 csn) override; + + CancellationCode killPartMoveToShard(const UUID & task_uuid) override; + + void startup() override; + + void shutdown(bool is_drop = false) override; + + void flushAndPrepareForShutdown() override; + + ActionLock getActionLock(StorageActionBlockType action_type) override; + + void onActionLockRemove(StorageActionBlockType action_type) override; void updateExternalDynamicMetadataIfExists(ContextPtr query_context) override; + bool supportsDelete() const override; + + bool supportsParallelInsert() const override; + + bool prefersLargeBlocks() const override; + + bool supportsPartitionBy() const override; + + bool supportsSubcolumns() const override; + + bool supportsTrivialCountOptimization(const StorageSnapshotPtr &, ContextPtr) const override; + + /// Things required for PREWHERE. 
+ bool supportsPrewhere() const override; + bool canMoveConditionsToPrewhere() const override; + std::optional supportedPrewhereColumns() const override; + ColumnSizeByName getColumnSizes() const override; + + bool parallelizeOutputAfterReading(ContextPtr context) const override; + + bool isObjectStorage() const override { return true; } + private: void updateQueryToSendIfNeeded( ASTPtr & query, const StorageSnapshotPtr & storage_snapshot, const ContextPtr & context) override; + bool isClusterSupported() const override; + + void readFallBackToPure( + QueryPlan & query_plan, + const Names & column_names, + const StorageSnapshotPtr & storage_snapshot, + SelectQueryInfo & query_info, + ContextPtr context, + QueryProcessingStage::Enum processed_stage, + size_t max_block_size, + size_t num_streams) override; + + SinkToStoragePtr writeFallBackToPure( + const ASTPtr & query, + const StorageMetadataPtr & metadata_snapshot, + ContextPtr context, + bool async_insert) override; + + /* + In case the table was created with the `object_storage_cluster` setting, + modify the AST query object so that it uses the table function implementation + by mapping the engine name to the table function name and setting `object_storage_cluster`. + For a table like + CREATE TABLE table ENGINE=S3(...) SETTINGS object_storage_cluster='cluster' + this converts the query + SELECT * FROM table + to + SELECT * FROM s3(...) SETTINGS object_storage_cluster='cluster' + to make a distributed request over cluster 'cluster'. 
+ */ + void updateQueryForDistributedEngineIfNeeded(ASTPtr & query, ContextPtr context); + const String engine_name; - const StorageObjectStorageConfigurationPtr configuration; + StorageObjectStorageConfigurationPtr configuration; const ObjectStoragePtr object_storage; NamesAndTypesList virtual_columns; NamesAndTypesList hive_partition_columns_to_read_from_file_path; + bool cluster_name_in_settings; + + /// Non-clustered storage to fall back to the pure (single-node) implementation if needed + std::shared_ptr pure_storage; }; } diff --git a/src/Storages/ObjectStorage/StorageObjectStorageConfiguration.cpp b/src/Storages/ObjectStorage/StorageObjectStorageConfiguration.cpp index fe768a7b037f..253af498ae88 100644 --- a/src/Storages/ObjectStorage/StorageObjectStorageConfiguration.cpp +++ b/src/Storages/ObjectStorage/StorageObjectStorageConfiguration.cpp @@ -90,70 +90,69 @@ bool StorageObjectStorageConfiguration::shouldReloadSchemaForConsistency(Context void StorageObjectStorageConfiguration::initialize( - StorageObjectStorageConfiguration & configuration_to_initialize, ASTs & engine_args, ContextPtr local_context, bool with_table_structure, const StorageID * table_id) { std::string disk_name; - if (configuration_to_initialize.isDataLakeConfiguration()) + if (isDataLakeConfiguration()) { - const auto & storage_settings = configuration_to_initialize.getDataLakeSettings(); + const auto & storage_settings = getDataLakeSettings(); disk_name = storage_settings[DataLakeStorageSetting::disk].changed ? 
storage_settings[DataLakeStorageSetting::disk].value : ""; } if (!disk_name.empty()) - configuration_to_initialize.fromDisk(disk_name, engine_args, local_context, with_table_structure); + fromDisk(disk_name, engine_args, local_context, with_table_structure); else if (auto named_collection = tryGetNamedCollectionWithOverrides(engine_args, local_context, true, nullptr, table_id)) - configuration_to_initialize.fromNamedCollection(*named_collection, local_context); + fromNamedCollection(*named_collection, local_context); else - configuration_to_initialize.fromAST(engine_args, local_context, with_table_structure); + fromAST(engine_args, local_context, with_table_structure); - if (configuration_to_initialize.isNamespaceWithGlobs()) + if (isNamespaceWithGlobs()) throw Exception(ErrorCodes::BAD_ARGUMENTS, - "Expression can not have wildcards inside {} name", configuration_to_initialize.getNamespaceType()); + "Expression can not have wildcards inside {} name", getNamespaceType()); - if (configuration_to_initialize.isDataLakeConfiguration()) + if (isDataLakeConfiguration()) { - if (configuration_to_initialize.partition_strategy_type != PartitionStrategyFactory::StrategyType::NONE) + if (getPartitionStrategyType() != PartitionStrategyFactory::StrategyType::NONE) { throw Exception(ErrorCodes::BAD_ARGUMENTS, "The `partition_strategy` argument is incompatible with data lakes"); } } - else if (configuration_to_initialize.partition_strategy_type == PartitionStrategyFactory::StrategyType::NONE) + else if (getPartitionStrategyType() == PartitionStrategyFactory::StrategyType::NONE) { - if (configuration_to_initialize.getRawPath().hasPartitionWildcard()) + if (getRawPath().hasPartitionWildcard()) { // Promote to wildcard in case it is not data lake to make it backwards compatible - configuration_to_initialize.partition_strategy_type = PartitionStrategyFactory::StrategyType::WILDCARD; + setPartitionStrategyType(PartitionStrategyFactory::StrategyType::WILDCARD); } } - if 
(configuration_to_initialize.format == "auto") + if (format == "auto") { - if (configuration_to_initialize.isDataLakeConfiguration()) + if (isDataLakeConfiguration()) { - configuration_to_initialize.format = "Parquet"; + format = "Parquet"; } else { - configuration_to_initialize.format + format = FormatFactory::instance() - .tryGetFormatFromFileName(configuration_to_initialize.isArchive() ? configuration_to_initialize.getPathInArchive() : configuration_to_initialize.getRawPath().path) + .tryGetFormatFromFileName(isArchive() ? getPathInArchive() : getRawPath().path) .value_or("auto"); } } else - FormatFactory::instance().checkFormatName(configuration_to_initialize.format); + FormatFactory::instance().checkFormatName(format); /// It might be changed on `StorageObjectStorageConfiguration::initPartitionStrategy` /// We shouldn't set path for disk setup because path prefix is already set in used object_storage. if (disk_name.empty()) - configuration_to_initialize.read_path = configuration_to_initialize.getRawPath(); + read_path = getRawPath(); - configuration_to_initialize.initialized = true; + initialized = true; } String StorageObjectStorageConfiguration::computeSchemaHash(const ColumnsDescription & columns) diff --git a/src/Storages/ObjectStorage/StorageObjectStorageConfiguration.h b/src/Storages/ObjectStorage/StorageObjectStorageConfiguration.h index a1b95973a5c6..fecdffdc3777 100644 --- a/src/Storages/ObjectStorage/StorageObjectStorageConfiguration.h +++ b/src/Storages/ObjectStorage/StorageObjectStorageConfiguration.h @@ -83,8 +83,7 @@ class StorageObjectStorageConfiguration using Paths = std::vector; /// Initialize configuration from either AST or NamedCollection. - static void initialize( - StorageObjectStorageConfiguration & configuration_to_initialize, + virtual void initialize( ASTs & engine_args, ContextPtr local_context, bool with_table_structure, @@ -107,11 +106,11 @@ class StorageObjectStorageConfiguration /// Raw URI, specified by a user. 
Used in permission check. virtual const String & getRawURI() const = 0; - const Path & getPathForRead() const; + virtual const Path & getPathForRead() const; // Path used for writing, it should not be globbed and might contain a partition key - Path getPathForWrite(const std::string & partition_id = "") const; + virtual Path getPathForWrite(const std::string & partition_id = "") const; - void setPathForRead(const Path & path) + virtual void setPathForRead(const Path & path) { read_path = path; } @@ -133,10 +132,10 @@ class StorageObjectStorageConfiguration virtual void addStructureAndFormatToArgsIfNeeded( ASTs & args, const String & structure_, const String & format_, ContextPtr context, bool with_structure) = 0; - bool isNamespaceWithGlobs() const; + virtual bool isNamespaceWithGlobs() const; virtual bool isArchive() const { return false; } - bool isPathInArchiveWithGlobs() const; + virtual bool isPathInArchiveWithGlobs() const; virtual std::string getPathInArchive() const; virtual void check(ContextPtr context); @@ -181,9 +180,9 @@ class StorageObjectStorageConfiguration const PrepareReadingFromFormatHiveParams & hive_parameters); static String computeSchemaHash(const ColumnsDescription & columns); - void setSchemaHash(const String & hash); + virtual void setSchemaHash(const String & hash); - void initPartitionStrategy(ASTPtr partition_by, const ColumnsDescription & columns, ContextPtr context); + virtual void initPartitionStrategy(ASTPtr partition_by, const ColumnsDescription & columns, ContextPtr context); virtual std::optional getTableStateSnapshot(ContextPtr local_context) const; virtual std::unique_ptr buildStorageMetadataFromState(const DataLakeTableStateSnapshot & state, ContextPtr local_context) const; @@ -262,6 +261,49 @@ class StorageObjectStorageConfiguration throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method getDataLakeSettings() is not implemented for configuration type {}", getTypeName()); } + /// Create arguments for table function with path and 
access parameters + virtual ASTPtr createArgsWithAccessData() const + { + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method createArgsWithAccessData is not supported by storage {}", getEngineName()); + } + + virtual void fromNamedCollection(const NamedCollection & collection, ContextPtr context) = 0; + virtual void fromAST(ASTs & args, ContextPtr context, bool with_structure) = 0; + virtual void fromDisk(const String & /*disk_name*/, ASTs & /*args*/, ContextPtr /*context*/, bool /*with_structure*/) + { + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "method fromDisk is not implemented"); + } + + virtual ObjectStorageType extractDynamicStorageType(ASTs & /* args */, ContextPtr /* context */, ASTPtr * /* type_arg */, bool /* cluster_name_first */) const + { return ObjectStorageType::None; } + + virtual const String & getFormat() const { return format; } + virtual const String & getCompressionMethod() const { return compression_method; } + virtual const String & getStructure() const { return structure; } + + virtual PartitionStrategyFactory::StrategyType getPartitionStrategyType() const { return partition_strategy_type; } + virtual bool getPartitionColumnsInDataFile() const { return partition_columns_in_data_file; } + virtual std::shared_ptr getPartitionStrategy() const { return partition_strategy; } + + virtual void setFormat(const String & format_) { format = format_; } + virtual void setCompressionMethod(const String & compression_method_) { compression_method = compression_method_; } + virtual void setStructure(const String & structure_) { structure = structure_; } + + virtual void setPartitionStrategyType(PartitionStrategyFactory::StrategyType partition_strategy_type_) + { + partition_strategy_type = partition_strategy_type_; + } + virtual void setPartitionColumnsInDataFile(bool partition_columns_in_data_file_) + { + partition_columns_in_data_file = partition_columns_in_data_file_; + } + virtual void setPartitionStrategy(const std::shared_ptr & 
partition_strategy_) + { + partition_strategy = partition_strategy_; + } + + virtual void assertInitialized() const; + virtual ColumnMapperPtr getColumnMapperForObject(ObjectInfoPtr /**/) const { return nullptr; } virtual ColumnMapperPtr getColumnMapperForCurrentSchema(StorageMetadataPtr /**/, ContextPtr /**/) const { return nullptr; } @@ -281,6 +323,9 @@ class StorageObjectStorageConfiguration virtual void drop(ContextPtr) {} + virtual bool isClusterSupported() const { return true; } + +private: String format = "auto"; String compression_method = "auto"; String structure = "auto"; @@ -292,14 +337,6 @@ class StorageObjectStorageConfiguration protected: void initializeFromParsedArguments(const StorageParsedArguments & parsed_arguments); - virtual void fromNamedCollection(const NamedCollection & collection, ContextPtr context) = 0; - virtual void fromAST(ASTs & args, ContextPtr context, bool with_structure) = 0; - virtual void fromDisk(const String & /*disk_name*/, ASTs & /*args*/, ContextPtr /*context*/, bool /*with_structure*/) - { - throw Exception(ErrorCodes::NOT_IMPLEMENTED, "method fromDisk is not implemented"); - } - - void assertInitialized() const; bool initialized = false; String schema_hash; diff --git a/src/Storages/ObjectStorage/StorageObjectStorageSettings.h b/src/Storages/ObjectStorage/StorageObjectStorageSettings.h index 1314b7d87c3d..180d71ea0c8a 100644 --- a/src/Storages/ObjectStorage/StorageObjectStorageSettings.h +++ b/src/Storages/ObjectStorage/StorageObjectStorageSettings.h @@ -68,7 +68,17 @@ struct StorageObjectStorageSettings using StorageObjectStorageSettingsPtr = std::shared_ptr; +// clang-format off + +#define STORAGE_OBJECT_STORAGE_RELATED_SETTINGS(DECLARE, ALIAS) \ + DECLARE(String, object_storage_cluster, "", R"( +Cluster used for distributed requests. +)", 0) \ + +// clang-format on + #define LIST_OF_STORAGE_OBJECT_STORAGE_SETTINGS(M, ALIAS) \ + STORAGE_OBJECT_STORAGE_RELATED_SETTINGS(M, ALIAS) \ LIST_OF_ALL_FORMAT_SETTINGS(M, ALIAS) } diff 
--git a/src/Storages/ObjectStorage/StorageObjectStorageSink.cpp b/src/Storages/ObjectStorage/StorageObjectStorageSink.cpp index 2e4fee714f6c..c629bf899f09 100644 --- a/src/Storages/ObjectStorage/StorageObjectStorageSink.cpp +++ b/src/Storages/ObjectStorage/StorageObjectStorageSink.cpp @@ -142,7 +142,7 @@ PartitionedStorageObjectStorageSink::PartitionedStorageObjectStorageSink( std::optional format_settings_, SharedHeader sample_block_, ContextPtr context_) - : PartitionedSink(configuration_->partition_strategy, context_, sample_block_) + : PartitionedSink(configuration_->getPartitionStrategy(), context_, sample_block_) , object_storage(object_storage_) , configuration(configuration_) , query_settings(configuration_->getQuerySettings(context_)) @@ -177,8 +177,8 @@ SinkPtr PartitionedStorageObjectStorageSink::createSinkForPartition(const String format_settings, std::make_shared(partition_strategy->getFormatHeader()), context, - configuration->format, - configuration->compression_method); + configuration->getFormat(), + configuration->getCompressionMethod()); } } diff --git a/src/Storages/ObjectStorage/StorageObjectStorageSource.cpp b/src/Storages/ObjectStorage/StorageObjectStorageSource.cpp index 2a13d2d09603..7bc70ad1e2ff 100644 --- a/src/Storages/ObjectStorage/StorageObjectStorageSource.cpp +++ b/src/Storages/ObjectStorage/StorageObjectStorageSource.cpp @@ -503,7 +503,7 @@ void StorageObjectStorageSource::addNumRowsToCache(const ObjectInfo & object_inf { const auto cache_key = getKeyForSchemaCache( getUniqueStoragePathIdentifier(*configuration, object_info), - object_info.getFileFormat().value_or(configuration->format), + object_info.getFileFormat().value_or(configuration->getFormat()), format_settings, read_context); schema_cache.addNumRows(cache_key, num_rows); @@ -583,7 +583,7 @@ StorageObjectStorageSource::ReaderHolder StorageObjectStorageSource::createReade const auto cache_key = getKeyForSchemaCache( getUniqueStoragePathIdentifier(*configuration, 
*object_info), - object_info->getFileFormat().value_or(configuration->format), + object_info->getFileFormat().value_or(configuration->getFormat()), format_settings, context_); @@ -620,13 +620,13 @@ StorageObjectStorageSource::ReaderHolder StorageObjectStorageSource::createReade CompressionMethod compression_method; if (const auto * object_info_in_archive = dynamic_cast(object_info.get())) { - compression_method = chooseCompressionMethod(configuration->getPathInArchive(), configuration->compression_method); + compression_method = chooseCompressionMethod(configuration->getPathInArchive(), configuration->getCompressionMethod()); const auto & archive_reader = object_info_in_archive->archive_reader; read_buf = archive_reader->readFile(object_info_in_archive->path_in_archive, /*throw_on_not_found=*/true); } else { - compression_method = chooseCompressionMethod(object_info->getFileName(), configuration->compression_method); + compression_method = chooseCompressionMethod(object_info->getFileName(), configuration->getCompressionMethod()); read_buf = createReadBuffer(object_info->relative_path_with_metadata, object_storage, context_, log); } @@ -660,7 +660,7 @@ StorageObjectStorageSource::ReaderHolder StorageObjectStorageSource::createReade "Reading object '{}', size: {} bytes, with format: {}", object_info->getPath(), object_info->getObjectMetadata()->size_bytes, - object_info->getFileFormat().value_or(configuration->format)); + object_info->getFileFormat().value_or(configuration->getFormat())); bool use_native_reader_v3 = format_settings.has_value() ? 
format_settings->parquet.use_native_reader_v3 @@ -668,12 +668,12 @@ StorageObjectStorageSource::ReaderHolder StorageObjectStorageSource::createReade InputFormatPtr input_format; if (context_->getSettingsRef()[Setting::use_parquet_metadata_cache] && use_native_reader_v3 - && (object_info->getFileFormat().value_or(configuration->format) == "Parquet") + && (object_info->getFileFormat().value_or(configuration->getFormat()) == "Parquet") && !object_info->getObjectMetadata()->etag.empty()) { const std::optional object_with_metadata = object_info->relative_path_with_metadata; input_format = FormatFactory::instance().getInputWithMetadata( - object_info->getFileFormat().value_or(configuration->format), + object_info->getFileFormat().value_or(configuration->getFormat()), *read_buf, initial_header, context_, @@ -692,7 +692,7 @@ StorageObjectStorageSource::ReaderHolder StorageObjectStorageSource::createReade else { input_format = FormatFactory::instance().getInput( - object_info->getFileFormat().value_or(configuration->format), + object_info->getFileFormat().value_or(configuration->getFormat()), *read_buf, initial_header, context_, diff --git a/src/Storages/ObjectStorage/Utils.cpp b/src/Storages/ObjectStorage/Utils.cpp index f1cb422e2cd2..542329f76944 100644 --- a/src/Storages/ObjectStorage/Utils.cpp +++ b/src/Storages/ObjectStorage/Utils.cpp @@ -60,14 +60,13 @@ std::optional checkAndGetNewFileOnInsertIfNeeded( void resolveSchemaAndFormat( ColumnsDescription & columns, - std::string & format, ObjectStoragePtr object_storage, - const StorageObjectStorageConfigurationPtr & configuration, + StorageObjectStorageConfigurationPtr & configuration, std::optional format_settings, std::string & sample_path, const ContextPtr & context) { - if (format == "auto") + if (configuration->getFormat() == "auto") { if (configuration->isDataLakeConfiguration()) { @@ -89,21 +88,23 @@ void resolveSchemaAndFormat( if (columns.empty()) { - if (format == "auto") + if (configuration->getFormat() == 
"auto") { + std::string format; std::tie(columns, format) = StorageObjectStorage::resolveSchemaAndFormatFromData( object_storage, configuration, format_settings, sample_path, context); + configuration->setFormat(format); } else { - chassert(!format.empty()); + chassert(!configuration->getFormat().empty()); columns = StorageObjectStorage::resolveSchemaFromData(object_storage, configuration, format_settings, sample_path, context); } } } - else if (format == "auto") + else if (configuration->getFormat() == "auto") { - format = StorageObjectStorage::resolveFormatFromData(object_storage, configuration, format_settings, sample_path, context); + configuration->setFormat(StorageObjectStorage::resolveFormatFromData(object_storage, configuration, format_settings, sample_path, context)); } validateSupportedColumns(columns, *configuration); diff --git a/src/Storages/ObjectStorage/Utils.h b/src/Storages/ObjectStorage/Utils.h index 3045c8ec74f4..5cc48a5d581d 100644 --- a/src/Storages/ObjectStorage/Utils.h +++ b/src/Storages/ObjectStorage/Utils.h @@ -16,9 +16,8 @@ std::optional checkAndGetNewFileOnInsertIfNeeded( void resolveSchemaAndFormat( ColumnsDescription & columns, - std::string & format, ObjectStoragePtr object_storage, - const StorageObjectStorageConfigurationPtr & configuration, + StorageObjectStorageConfigurationPtr & configuration, std::optional format_settings, std::string & sample_path, const ContextPtr & context); diff --git a/src/Storages/ObjectStorage/registerStorageObjectStorage.cpp b/src/Storages/ObjectStorage/registerStorageObjectStorage.cpp index e573919b7608..0861ac676b4d 100644 --- a/src/Storages/ObjectStorage/registerStorageObjectStorage.cpp +++ b/src/Storages/ObjectStorage/registerStorageObjectStorage.cpp @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -42,11 +43,20 @@ namespace // LocalObjectStorage is only supported for Iceberg Datalake operations where Avro format is required. 
For regular file access, use FileStorage instead. #if USE_AWS_S3 || USE_AZURE_BLOB_STORAGE || USE_HDFS || USE_AVRO -std::shared_ptr +StoragePtr createStorageObjectStorage(const StorageFactory::Arguments & args, StorageObjectStorageConfigurationPtr configuration) { const auto context = args.getLocalContext(); - StorageObjectStorageConfiguration::initialize(*configuration, args.engine_args, context, false, &args.table_id); + + std::string cluster_name; + + if (args.storage_def->settings) + { + if (const auto * value = args.storage_def->settings->changes.tryGet("object_storage_cluster")) + cluster_name = value->safeGet(); + } + + configuration->initialize(args.engine_args, context, false, &args.table_id); // Use format settings from global server context + settings from // the SETTINGS clause of the create query. Settings from current @@ -77,24 +87,26 @@ createStorageObjectStorage(const StorageFactory::Arguments & args, StorageObject ContextMutablePtr context_copy = Context::createCopy(args.getContext()); Settings settings_copy = args.getLocalContext()->getSettingsCopy(); context_copy->setSettings(settings_copy); - return std::make_shared( + return std::make_shared( + cluster_name, configuration, // We only want to perform write actions (e.g. create a container in Azure) when the table is being created, // and we want to avoid it when we load the table after a server restart. configuration->createObjectStorage(context, /* is_readonly */ args.mode != LoadingStrictnessLevel::CREATE, std::nullopt), - context_copy, /// Use global context. args.table_id, args.columns, args.constraints, + partition_by, + order_by, + context_copy, /// Use global context. 
args.comment, format_settings, args.mode, configuration->getCatalog(context, args.query.attach), args.query.if_not_exists, - /* is_datalake_query*/ false, - /* distributed_processing */ false, - partition_by, - order_by); + /* is_datalake_query */ false, + /* is_table_function */ false, + /* lazy_init */ false); } #endif @@ -241,9 +253,8 @@ void registerStorageIceberg(StorageFactory & factory) } } else -#if USE_AWS_S3 - configuration = std::make_shared(storage_settings); -#endif + configuration = std::make_shared(storage_settings); + if (configuration == nullptr) { throw Exception(ErrorCodes::BAD_ARGUMENTS, "This storage configuration is not available at this build"); @@ -386,7 +397,7 @@ void registerStorageIceberg(StorageFactory & factory) #if USE_PARQUET && USE_DELTA_KERNEL_RS void registerStorageDeltaLake(StorageFactory & factory) { -#if USE_AWS_S3 +# if USE_AWS_S3 factory.registerStorage( DeltaLakeDefinition::storage_engine_name, [&](const StorageFactory::Arguments & args) diff --git a/src/Storages/ObjectStorageQueue/StorageObjectStorageQueue.cpp b/src/Storages/ObjectStorageQueue/StorageObjectStorageQueue.cpp index 59db55cc013b..fc74f4e6872e 100644 --- a/src/Storages/ObjectStorageQueue/StorageObjectStorageQueue.cpp +++ b/src/Storages/ObjectStorageQueue/StorageObjectStorageQueue.cpp @@ -310,12 +310,12 @@ StorageObjectStorageQueue::StorageObjectStorageQueue( validateSettings(*queue_settings_, is_attach); object_storage = configuration->createObjectStorage(context_, /* is_readonly */true, std::nullopt); - FormatFactory::instance().checkFormatName(configuration->format); + FormatFactory::instance().checkFormatName(configuration->getFormat()); configuration->check(context_); ColumnsDescription columns{columns_}; std::string sample_path; - resolveSchemaAndFormat(columns, configuration->format, object_storage, configuration, format_settings, sample_path, context_); + resolveSchemaAndFormat(columns, object_storage, configuration, format_settings, sample_path, 
context_); configuration->check(context_); bool is_path_with_hive_partitioning = false; @@ -372,7 +372,7 @@ StorageObjectStorageQueue::StorageObjectStorageQueue( zk_path, *queue_settings_, storage_metadata.getColumns(), - configuration_->format, + configuration_->getFormat(), context_, is_attach, log); @@ -523,7 +523,7 @@ void StorageObjectStorageQueue::renameInMemory(const StorageID & new_table_id) bool StorageObjectStorageQueue::supportsSubsetOfColumns(const ContextPtr & context_) const { - return FormatFactory::instance().checkIfFormatSupportsSubsetOfColumns(configuration->format, context_, format_settings); + return FormatFactory::instance().checkIfFormatSupportsSubsetOfColumns(configuration->getFormat(), context_, format_settings); } class ReadFromObjectStorageQueue : public SourceStepWithFilter diff --git a/src/Storages/ObjectStorageQueue/StorageObjectStorageQueue.h b/src/Storages/ObjectStorageQueue/StorageObjectStorageQueue.h index 5a251e7b7e16..740bf7c22db1 100644 --- a/src/Storages/ObjectStorageQueue/StorageObjectStorageQueue.h +++ b/src/Storages/ObjectStorageQueue/StorageObjectStorageQueue.h @@ -58,7 +58,7 @@ class StorageObjectStorageQueue : public IStorage, WithContext void renameInMemory(const StorageID & new_table_id) override; - const auto & getFormatName() const { return configuration->format; } + const auto & getFormatName() const { return configuration->getFormat(); } const fs::path & getZooKeeperPath() const { return zk_path; } diff --git a/src/Storages/ObjectStorageQueue/registerQueueStorage.cpp b/src/Storages/ObjectStorageQueue/registerQueueStorage.cpp index eba3d81a0e3d..d3c80fe42028 100644 --- a/src/Storages/ObjectStorageQueue/registerQueueStorage.cpp +++ b/src/Storages/ObjectStorageQueue/registerQueueStorage.cpp @@ -48,7 +48,7 @@ StoragePtr createQueueStorage(const StorageFactory::Arguments & args) throw Exception(ErrorCodes::BAD_ARGUMENTS, "External data source must have arguments"); auto configuration = std::make_shared(); - 
StorageObjectStorageConfiguration::initialize(*configuration, args.engine_args, args.getContext(), false, &args.table_id);
+    configuration->initialize(args.engine_args, args.getContext(), false, &args.table_id);

     // Use format settings from global server context + settings from
     // the SETTINGS clause of the create query. Settings from current
diff --git a/src/Storages/StorageDistributed.cpp b/src/Storages/StorageDistributed.cpp
index 155587764a75..844eacd1e0e0 100644
--- a/src/Storages/StorageDistributed.cpp
+++ b/src/Storages/StorageDistributed.cpp
@@ -884,6 +884,7 @@ QueryTreeNodePtr buildQueryTreeDistributed(SelectQueryInfo & query_info,
     auto table_function_node = std::make_shared<TableFunctionNode>(remote_table_function_node.getFunctionName());
     table_function_node->getArgumentsNode() = remote_table_function_node.getArgumentsNode();
+    table_function_node->setSettingsChanges(remote_table_function_node.getSettingsChanges());

     if (table_expression_modifiers)
         table_function_node->setTableExpressionModifiers(*table_expression_modifiers);
@@ -1387,7 +1388,8 @@ std::optional<QueryPipeline> StorageDistributed::distributedWrite(const ASTInser
     }
     if (auto src_storage_cluster = std::dynamic_pointer_cast<IStorageCluster>(src_storage))
     {
-        return distributedWriteFromClusterStorage(*src_storage_cluster, query, local_context);
+        if (!src_storage_cluster->getClusterName(local_context).empty())
+            return distributedWriteFromClusterStorage(*src_storage_cluster, query, local_context);
     }

     return {};
diff --git a/src/Storages/System/StorageSystemIcebergHistory.cpp b/src/Storages/System/StorageSystemIcebergHistory.cpp
index 778c2691fc36..f7ca27e65955 100644
--- a/src/Storages/System/StorageSystemIcebergHistory.cpp
+++ b/src/Storages/System/StorageSystemIcebergHistory.cpp
@@ -15,7 +15,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -57,7 +57,7 @@ void StorageSystemIcebergHistory::fillData([[maybe_unused]] MutableColumns & res
     const auto access = context_copy->getAccess();

-    auto add_history_record =
[&](const DatabaseTablesIteratorPtr & it, StorageObjectStorage * object_storage)
+    auto add_history_record = [&](const DatabaseTablesIteratorPtr & it, StorageObjectStorageCluster * object_storage)
     {
         if (!access->isGranted(AccessType::SHOW_TABLES, it->databaseName(), it->name()))
             return;
@@ -106,7 +106,7 @@ void StorageSystemIcebergHistory::fillData([[maybe_unused]] MutableColumns & res
                 // Table was dropped while acquiring the lock, skipping table
                 continue;

-            if (auto * object_storage_table = dynamic_cast<StorageObjectStorage *>(storage.get()))
+            if (auto * object_storage_table = dynamic_cast<StorageObjectStorageCluster *>(storage.get()))
             {
                 add_history_record(iterator, object_storage_table);
             }
diff --git a/src/Storages/extractTableFunctionFromSelectQuery.cpp b/src/Storages/extractTableFunctionFromSelectQuery.cpp
index 57302036c889..064f538eeae7 100644
--- a/src/Storages/extractTableFunctionFromSelectQuery.cpp
+++ b/src/Storages/extractTableFunctionFromSelectQuery.cpp
@@ -9,7 +9,7 @@
 namespace DB
 {

-ASTFunction * extractTableFunctionFromSelectQuery(ASTPtr & query)
+ASTTableExpression * extractTableExpressionASTPtrFromSelectQuery(ASTPtr & query)
 {
     auto * select_query = query->as<ASTSelectQuery>();
     if (!select_query || !select_query->tables())
@@ -17,10 +17,36 @@ ASTFunction * extractTableFunctionFromSelectQuery(ASTPtr & query)
     auto * tables = select_query->tables()->as<ASTTablesInSelectQuery>();
     auto * table_expression = tables->children[0]->as<ASTTablesInSelectQueryElement>()->table_expression->as<ASTTableExpression>();

-    if (!table_expression->table_function)
+    return table_expression;
+}
+
+ASTPtr extractTableFunctionASTPtrFromSelectQuery(ASTPtr & query)
+{
+    auto table_expression = extractTableExpressionASTPtrFromSelectQuery(query);
+    return table_expression ? table_expression->table_function : nullptr;
+}
+
+ASTPtr extractTableASTPtrFromSelectQuery(ASTPtr & query)
+{
+    auto table_expression = extractTableExpressionASTPtrFromSelectQuery(query);
+    return table_expression ? table_expression->database_and_table_name : nullptr;
+}
+
+ASTFunction * extractTableFunctionFromSelectQuery(ASTPtr & query)
+{
+    auto table_function_ast = extractTableFunctionASTPtrFromSelectQuery(query);
+    if (!table_function_ast)
         return nullptr;
-    return table_expression->table_function->as<ASTFunction>();
+    return table_function_ast->as<ASTFunction>();
+}
+
+ASTExpressionList * extractTableFunctionArgumentsFromSelectQuery(ASTPtr & query)
+{
+    auto * table_function = extractTableFunctionFromSelectQuery(query);
+    if (!table_function)
+        return nullptr;
+    return table_function->arguments->as<ASTExpressionList>();
 }

 }
diff --git a/src/Storages/extractTableFunctionFromSelectQuery.h b/src/Storages/extractTableFunctionFromSelectQuery.h
index c69cc7ce6c52..2a845477df82 100644
--- a/src/Storages/extractTableFunctionFromSelectQuery.h
+++ b/src/Storages/extractTableFunctionFromSelectQuery.h
@@ -1,12 +1,17 @@
 #pragma once

 #include
-#include
 #include
+#include

 namespace DB
 {

+struct ASTTableExpression;
+ASTTableExpression * extractTableExpressionASTPtrFromSelectQuery(ASTPtr & query);
+ASTPtr extractTableFunctionASTPtrFromSelectQuery(ASTPtr & query);
+ASTPtr extractTableASTPtrFromSelectQuery(ASTPtr & query);
 ASTFunction * extractTableFunctionFromSelectQuery(ASTPtr & query);
+ASTExpressionList * extractTableFunctionArgumentsFromSelectQuery(ASTPtr & query);

 }
diff --git a/src/TableFunctions/ITableFunction.h b/src/TableFunctions/ITableFunction.h
index 913aeb14843f..f47b6ce4d1bb 100644
--- a/src/TableFunctions/ITableFunction.h
+++ b/src/TableFunctions/ITableFunction.h
@@ -78,7 +78,7 @@ class ITableFunction : public std::enable_shared_from_this<ITableFunction>
     virtual bool supportsReadingSubsetOfColumns(const ContextPtr &) { return true; }

-    virtual bool canBeUsedToCreateTable() const { return true; }
+    virtual void validateUseToCreateTable() const {}

     // INSERT INTO TABLE FUNCTION ...
PARTITION BY
     // Set partition by expression so `ITableFunctionObjectStorage` can construct a proper representation
diff --git a/src/TableFunctions/ITableFunctionCluster.h b/src/TableFunctions/ITableFunctionCluster.h
index 5345e1a0f0db..920f271f0535 100644
--- a/src/TableFunctions/ITableFunctionCluster.h
+++ b/src/TableFunctions/ITableFunctionCluster.h
@@ -16,6 +16,7 @@ namespace ErrorCodes
     extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
     extern const int CLUSTER_DOESNT_EXIST;
     extern const int LOGICAL_ERROR;
+    extern const int BAD_ARGUMENTS;
 }

 /// Base class for *Cluster table functions that require cluster_name for the first argument.
@@ -46,9 +47,13 @@ class ITableFunctionCluster : public Base
             throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected table function name: {}", table_function->name);
     }

-    bool canBeUsedToCreateTable() const override { return false; }
     bool isClusterFunction() const override { return true; }

+    void validateUseToCreateTable() const override
+    {
+        throw Exception(ErrorCodes::BAD_ARGUMENTS, "Table function '{}' cannot be used to create a table", getName());
+    }
+
 protected:
     void parseArguments(const ASTPtr & ast, ContextPtr context) override
     {
@@ -70,9 +75,11 @@ class ITableFunctionCluster : public Base
         /// Cluster name is always the first
         cluster_name = checkAndGetLiteralArgument<String>(args[0], "cluster_name");
-
-        if (!context->tryGetCluster(cluster_name))
-            throw Exception(ErrorCodes::CLUSTER_DOESNT_EXIST, "Requested cluster '{}' not found", cluster_name);
+        /// The cluster-existence check is intentionally skipped here.
+        /// In a query like
+        ///     remote('remote_host', xxxCluster('remote_cluster', ...))
+        /// 'remote_cluster' may be defined only on 'remote_host',
+        /// so if the cluster does not exist, the query fails later.

         /// Just cut the first arg (cluster_name) and try to parse other table function arguments as is
         args.erase(args.begin());
diff --git a/src/TableFunctions/TableFunctionObjectStorage.cpp b/src/TableFunctions/TableFunctionObjectStorage.cpp
index
c555834b7207..0b9a976efa6b 100644 --- a/src/TableFunctions/TableFunctionObjectStorage.cpp +++ b/src/TableFunctions/TableFunctionObjectStorage.cpp @@ -194,7 +194,7 @@ template ColumnsDescription TableFunctionObjectStorage< Definition, Configuration, is_data_lake>::getActualTableStructure(ContextPtr context, bool is_insert_query) const { - if (configuration->structure == "auto") + if (configuration->getStructure() == "auto") { auto storage = getObjectStorage(context, !is_insert_query); configuration->lazyInitializeIfNeeded(object_storage, context); @@ -203,7 +203,6 @@ ColumnsDescription TableFunctionObjectStorage< ColumnsDescription columns; resolveSchemaAndFormat( columns, - configuration->format, std::move(storage), configuration, /* format_settings */std::nullopt, @@ -220,7 +219,7 @@ ColumnsDescription TableFunctionObjectStorage< return columns; } - return parseColumnsListFromString(configuration->structure, context); + return parseColumnsListFromString(configuration->getStructure(), context); } template @@ -234,8 +233,8 @@ StoragePtr TableFunctionObjectStorage:: chassert(configuration); ColumnsDescription columns; - if (configuration->structure != "auto") - columns = parseColumnsListFromString(configuration->structure, context); + if (configuration->getStructure() != "auto") + columns = parseColumnsListFromString(configuration->getStructure(), context); else if (!structure_hint.empty()) columns = structure_hint; else if (!cached_columns.empty()) @@ -263,8 +262,15 @@ StoragePtr TableFunctionObjectStorage:: columns, ConstraintsDescription{}, partition_by, + /* order_by */ nullptr, context, - /* is_table_function */true); + /* comment */ String{}, + /* format_settings */ std::nullopt, /// No format_settings + /* mode */ LoadingStrictnessLevel::CREATE, + configuration->getCatalog(context, /* attach */ false), + /* if_not_exists */ false, + /* is_datalake_query*/ false, + /* is_table_function */ true); storage->startup(); return storage; @@ -314,16 +320,7 @@ void 
registerTableFunctionObjectStorage(TableFunctionFactory & factory) { UNUSED(factory); #if USE_AWS_S3 - factory.registerFunction>( - { - .description=R"(The table function can be used to read the data stored on AWS S3.)", - .examples{{S3Definition::name, "SELECT * FROM s3(url, access_key_id, secret_access_key)", ""}}, - .category = FunctionDocumentation::Category::TableFunction - }, - {.allow_readonly = false} - ); - - factory.registerFunction>( + factory.registerFunction>( { .description=R"(The table function can be used to read the data stored on GCS.)", .examples{{GCSDefinition::name, "SELECT * FROM gcs(url, access_key_id, secret_access_key)", ""}}, @@ -332,7 +329,7 @@ void registerTableFunctionObjectStorage(TableFunctionFactory & factory) {.allow_readonly = false} ); - factory.registerFunction>( + factory.registerFunction>( { .description=R"(The table function can be used to read the data stored on COSN.)", .examples{{COSNDefinition::name, "SELECT * FROM cosn(url, access_key_id, secret_access_key)", ""}}, @@ -341,7 +338,7 @@ void registerTableFunctionObjectStorage(TableFunctionFactory & factory) {.allow_readonly = false} ); - factory.registerFunction>( + factory.registerFunction>( { .description=R"(The table function can be used to read the data stored on OSS.)", .examples{{OSSDefinition::name, "SELECT * FROM oss(url, access_key_id, secret_access_key)", ""}}, @@ -350,54 +347,28 @@ void registerTableFunctionObjectStorage(TableFunctionFactory & factory) {.allow_readonly = false} ); #endif - -#if USE_AZURE_BLOB_STORAGE - factory.registerFunction>( - { - .description=R"(The table function can be used to read the data stored on Azure Blob Storage.)", - .examples{ - { - AzureDefinition::name, - "SELECT * FROM azureBlobStorage(connection_string|storage_account_url, container_name, blobpath, " - "[account_name, account_key, format, compression, structure])", "" - }}, - .category = FunctionDocumentation::Category::TableFunction - }, - {.allow_readonly = false} - ); 
-#endif -#if USE_HDFS - factory.registerFunction>( - { - .description=R"(The table function can be used to read the data stored on HDFS virtual filesystem.)", - .examples{ - { - HDFSDefinition::name, - "SELECT * FROM hdfs(url, format, compression, structure])", "" - }}, - .category = FunctionDocumentation::Category::TableFunction - }, - {.allow_readonly = false} - ); -#endif } #if USE_AZURE_BLOB_STORAGE -template class TableFunctionObjectStorage; -template class TableFunctionObjectStorage; +template class TableFunctionObjectStorage; +template class TableFunctionObjectStorage; #endif #if USE_AWS_S3 -template class TableFunctionObjectStorage; -template class TableFunctionObjectStorage; -template class TableFunctionObjectStorage; -template class TableFunctionObjectStorage; -template class TableFunctionObjectStorage; +template class TableFunctionObjectStorage; +template class TableFunctionObjectStorage; +template class TableFunctionObjectStorage; +template class TableFunctionObjectStorage; +template class TableFunctionObjectStorage; #endif #if USE_HDFS -template class TableFunctionObjectStorage; -template class TableFunctionObjectStorage; +template class TableFunctionObjectStorage; +template class TableFunctionObjectStorage; +#endif + +#if USE_AVRO +template class TableFunctionObjectStorage; #endif #if USE_AVRO @@ -443,44 +414,6 @@ template class TableFunctionObjectStorage; #endif -#if USE_AVRO -void registerTableFunctionIceberg(TableFunctionFactory & factory) -{ -#if USE_AWS_S3 - factory.registerFunction( - {.description = R"(The table function can be used to read the Iceberg table stored on S3 object store. 
Alias to icebergS3)", - .examples{{IcebergDefinition::name, "SELECT * FROM iceberg(url, access_key_id, secret_access_key)", ""}}, - .category = FunctionDocumentation::Category::TableFunction}, - {.allow_readonly = false}); - factory.registerFunction( - {.description = R"(The table function can be used to read the Iceberg table stored on S3 object store.)", - .examples{{IcebergS3Definition::name, "SELECT * FROM icebergS3(url, access_key_id, secret_access_key)", ""}}, - .category = FunctionDocumentation::Category::TableFunction}, - {.allow_readonly = false}); - -#endif -#if USE_AZURE_BLOB_STORAGE - factory.registerFunction( - {.description = R"(The table function can be used to read the Iceberg table stored on Azure object store.)", - .examples{{IcebergAzureDefinition::name, "SELECT * FROM icebergAzure(url, access_key_id, secret_access_key)", ""}}, - .category = FunctionDocumentation::Category::TableFunction}, - {.allow_readonly = false}); -#endif -#if USE_HDFS - factory.registerFunction( - {.description = R"(The table function can be used to read the Iceberg table stored on HDFS virtual filesystem.)", - .examples{{IcebergHDFSDefinition::name, "SELECT * FROM icebergHDFS(url)", ""}}, - .category = FunctionDocumentation::Category::TableFunction}, - {.allow_readonly = false}); -#endif - factory.registerFunction( - {.description = R"(The table function can be used to read the Iceberg table stored locally.)", - .examples{{IcebergLocalDefinition::name, "SELECT * FROM icebergLocal(filename)", ""}}, - .category = FunctionDocumentation::Category::TableFunction}, - {.allow_readonly = false}); -} -#endif - #if USE_AVRO void registerTableFunctionPaimon(TableFunctionFactory & factory) @@ -523,28 +456,6 @@ void registerTableFunctionPaimon(TableFunctionFactory & factory) #if USE_PARQUET && USE_DELTA_KERNEL_RS void registerTableFunctionDeltaLake(TableFunctionFactory & factory) { -#if USE_AWS_S3 - factory.registerFunction( - {.description = R"(The table function can be used to read 
the DeltaLake table stored on S3, alias of deltaLakeS3.)", - .examples{{DeltaLakeDefinition::name, "SELECT * FROM deltaLake(url, access_key_id, secret_access_key)", ""}}, - .category = FunctionDocumentation::Category::TableFunction}, - {.allow_readonly = false}); - - factory.registerFunction( - {.description = R"(The table function can be used to read the DeltaLake table stored on S3.)", - .examples{{DeltaLakeS3Definition::name, "SELECT * FROM deltaLakeS3(url, access_key_id, secret_access_key)", ""}}, - .category = FunctionDocumentation::Category::TableFunction}, - {.allow_readonly = false}); -#endif - -#if USE_AZURE_BLOB_STORAGE - factory.registerFunction( - {.description = R"(The table function can be used to read the DeltaLake table stored on Azure object store.)", - .examples{{DeltaLakeAzureDefinition::name, "SELECT * FROM deltaLakeAzure(connection_string|storage_account_url, container_name, blobpath, \"\n" - " \"[account_name, account_key, format, compression, structure])", ""}}, - .category = FunctionDocumentation::Category::TableFunction}, - {.allow_readonly = false}); -#endif // Register the new local Delta Lake table function factory.registerFunction( {.description = R"(The table function can be used to read the DeltaLake table stored locally.)", @@ -554,33 +465,15 @@ void registerTableFunctionDeltaLake(TableFunctionFactory & factory) } #endif -#if USE_AWS_S3 -void registerTableFunctionHudi(TableFunctionFactory & factory) -{ - factory.registerFunction( - {.description = R"(The table function can be used to read the Hudi table stored on object store.)", - .examples{{HudiDefinition::name, "SELECT * FROM hudi(url, access_key_id, secret_access_key)", ""}}, - .category = FunctionDocumentation::Category::TableFunction}, - {.allow_readonly = false}); -} -#endif - void registerDataLakeTableFunctions(TableFunctionFactory & factory) { UNUSED(factory); -#if USE_AVRO - registerTableFunctionIceberg(factory); -#endif - -#if USE_AVRO - 
registerTableFunctionPaimon(factory); -#endif #if USE_PARQUET && USE_DELTA_KERNEL_RS registerTableFunctionDeltaLake(factory); #endif -#if USE_AWS_S3 - registerTableFunctionHudi(factory); +#if USE_AVRO + registerTableFunctionPaimon(factory); #endif } } diff --git a/src/TableFunctions/TableFunctionObjectStorage.h b/src/TableFunctions/TableFunctionObjectStorage.h index b3a0a682746b..2c0e7af6f796 100644 --- a/src/TableFunctions/TableFunctionObjectStorage.h +++ b/src/TableFunctions/TableFunctionObjectStorage.h @@ -25,10 +25,12 @@ struct S3StorageSettings; struct AzureStorageSettings; struct HDFSStorageSettings; -template +template class TableFunctionObjectStorage : public ITableFunction { public: + using Configuration = StorageConfiguration; + static constexpr auto name = Definition::name; using Settings = typename std::conditional_t< is_data_lake, @@ -37,15 +39,16 @@ class TableFunctionObjectStorage : public ITableFunction String getName() const override { return name; } - bool hasStaticStructure() const override { return configuration->structure != "auto"; } + bool hasStaticStructure() const override { return configuration->getStructure() != "auto"; } - bool needStructureHint() const override { return configuration->structure == "auto"; } + bool needStructureHint() const override { return configuration->getStructure() == "auto"; } void setStructureHint(const ColumnsDescription & structure_hint_) override { structure_hint = structure_hint_; } bool supportsReadingSubsetOfColumns(const ContextPtr & context) override { - return configuration->format != "auto" && FormatFactory::instance().checkIfFormatSupportsSubsetOfColumns(configuration->format, context); + return configuration->getFormat() != "auto" + && FormatFactory::instance().checkIfFormatSupportsSubsetOfColumns(configuration->getFormat(), context); } std::unordered_set getVirtualsToCheckBeforeUsingStructureHint() const override @@ -55,7 +58,7 @@ class TableFunctionObjectStorage : public ITableFunction virtual void 
parseArgumentsImpl(ASTs & args, const ContextPtr & context) { - StorageObjectStorageConfiguration::initialize(*getConfiguration(context), args, context, true); + getConfiguration(context)->initialize(args, context, true); } static void updateStructureAndFormatArgumentsIfNeeded( @@ -67,8 +70,8 @@ class TableFunctionObjectStorage : public ITableFunction if constexpr (is_data_lake) { Configuration configuration(createEmptySettings()); - if (configuration.format == "auto") - configuration.format = "Parquet"; /// Default format of data lakes. + if (configuration.getFormat() == "auto") + configuration.setFormat("Parquet"); /// Default format of data lakes. configuration.addStructureAndFormatToArgsIfNeeded(args, structure, format, context, /*with_structure=*/true); } @@ -110,21 +113,22 @@ class TableFunctionObjectStorage : public ITableFunction }; #if USE_AWS_S3 -using TableFunctionS3 = TableFunctionObjectStorage; +using TableFunctionS3 = TableFunctionObjectStorage; #endif #if USE_AZURE_BLOB_STORAGE -using TableFunctionAzureBlob = TableFunctionObjectStorage; +using TableFunctionAzureBlob = TableFunctionObjectStorage; #endif #if USE_HDFS -using TableFunctionHDFS = TableFunctionObjectStorage; +using TableFunctionHDFS = TableFunctionObjectStorage; #endif #if USE_AVRO +using TableFunctionIceberg = TableFunctionObjectStorage; + # if USE_AWS_S3 -using TableFunctionIceberg = TableFunctionObjectStorage; using TableFunctionIcebergS3 = TableFunctionObjectStorage; # endif # if USE_AZURE_BLOB_STORAGE @@ -149,13 +153,13 @@ using TableFunctionPaimonHDFS = TableFunctionObjectStorage; #endif #if USE_PARQUET && USE_DELTA_KERNEL_RS -#if USE_AWS_S3 +# if USE_AWS_S3 using TableFunctionDeltaLake = TableFunctionObjectStorage; using TableFunctionDeltaLakeS3 = TableFunctionObjectStorage; -#endif -#if USE_AZURE_BLOB_STORAGE +# endif +# if USE_AZURE_BLOB_STORAGE using TableFunctionDeltaLakeAzure = TableFunctionObjectStorage; -#endif +# endif // New alias for local Delta Lake table function using 
TableFunctionDeltaLakeLocal = TableFunctionObjectStorage; #endif diff --git a/src/TableFunctions/TableFunctionObjectStorageCluster.cpp b/src/TableFunctions/TableFunctionObjectStorageCluster.cpp index 53153178923d..4f813e516f9a 100644 --- a/src/TableFunctions/TableFunctionObjectStorageCluster.cpp +++ b/src/TableFunctions/TableFunctionObjectStorageCluster.cpp @@ -31,8 +31,9 @@ StoragePtr TableFunctionObjectStorageClusterstructure != "auto") - columns = parseColumnsListFromString(configuration->structure, context); + + if (configuration->getStructure() != "auto") + columns = parseColumnsListFromString(configuration->getStructure(), context); else if (!Base::structure_hint.empty()) columns = Base::structure_hint; else if (!cached_columns.empty()) @@ -79,8 +80,16 @@ StoragePtr TableFunctionObjectStorageClusterstartup(); @@ -144,47 +153,53 @@ void registerTableFunctionIcebergCluster(TableFunctionFactory & factory) {.allow_readonly = false} ); -#if USE_AWS_S3 factory.registerFunction( { - .description = R"(The table function can be used to read the Iceberg table stored on store from disk in parallel for many nodes in a specified cluster.)", - .examples{{IcebergClusterDefinition::name, "SELECT * FROM icebergCluster(cluster) SETTINGS disk = 'disk'", ""},{IcebergClusterDefinition::name, "SELECT * FROM icebergCluster(cluster, url, [, NOSIGN | access_key_id, secret_access_key, [session_token]], format, [,compression])", ""}}, + .description = R"(The table function can be used to read the Iceberg table stored on any object store in parallel for many nodes in a specified cluster.)", + .examples{ +# if USE_AWS_S3 + {"icebergCluster", "SELECT * FROM icebergCluster(cluster, url, [, NOSIGN | access_key_id, secret_access_key, [session_token]], format, [,compression], storage_type='s3')", ""}, +# endif +# if USE_AZURE_BLOB_STORAGE + {"icebergCluster", "SELECT * FROM icebergCluster(cluster, connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, 
format, compression], storage_type='azure')", ""}, +# endif +# if USE_HDFS + {"icebergCluster", "SELECT * FROM icebergCluster(cluster, uri, [format], [structure], [compression_method], storage_type='hdfs')", ""}, +# endif + }, .category = FunctionDocumentation::Category::TableFunction }, - {.allow_readonly = false} - ); + {.allow_readonly = false}); +# if USE_AWS_S3 factory.registerFunction( { .description = R"(The table function can be used to read the Iceberg table stored on S3 object store in parallel for many nodes in a specified cluster.)", .examples{{IcebergS3ClusterDefinition::name, "SELECT * FROM icebergS3Cluster(cluster, url, [, NOSIGN | access_key_id, secret_access_key, [session_token]], format, [,compression])", ""}}, .category = FunctionDocumentation::Category::TableFunction }, - {.allow_readonly = false} - ); -#endif + {.allow_readonly = false}); +# endif -#if USE_AZURE_BLOB_STORAGE +# if USE_AZURE_BLOB_STORAGE factory.registerFunction( { .description = R"(The table function can be used to read the Iceberg table stored on Azure object store in parallel for many nodes in a specified cluster.)", .examples{{IcebergAzureClusterDefinition::name, "SELECT * FROM icebergAzureCluster(cluster, connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression])", ""}}, .category = FunctionDocumentation::Category::TableFunction }, - {.allow_readonly = false} - ); -#endif + {.allow_readonly = false}); +# endif -#if USE_HDFS +# if USE_HDFS factory.registerFunction( { .description = R"(The table function can be used to read the Iceberg table stored on HDFS virtual filesystem in parallel for many nodes in a specified cluster.)", .examples{{IcebergHDFSClusterDefinition::name, "SELECT * FROM icebergHDFSCluster(cluster, uri, [format], [structure], [compression_method])", ""}}, .category = FunctionDocumentation::Category::TableFunction }, - {.allow_readonly = false} - ); -#endif + {.allow_readonly = false}); +# endif } void 
registerTableFunctionPaimonCluster(TableFunctionFactory & factory) diff --git a/src/TableFunctions/TableFunctionObjectStorageCluster.h b/src/TableFunctions/TableFunctionObjectStorageCluster.h index 58acf48d4f2a..26faefc2c5c2 100644 --- a/src/TableFunctions/TableFunctionObjectStorageCluster.h +++ b/src/TableFunctions/TableFunctionObjectStorageCluster.h @@ -12,8 +12,6 @@ namespace DB class Context; -class StorageS3Settings; -class StorageAzureBlobSettings; class StorageS3Configuration; class StorageAzureConfiguration; @@ -47,21 +45,25 @@ class TableFunctionObjectStorageCluster : public ITableFunctionClusterstructure != "auto"; } - bool needStructureHint() const override { return Base::getConfiguration(getQueryOrGlobalContext())->structure == "auto"; } + bool hasStaticStructure() const override { return Base::getConfiguration(getQueryOrGlobalContext())->getStructure() != "auto"; } + bool needStructureHint() const override { return Base::getConfiguration(getQueryOrGlobalContext())->getStructure() == "auto"; } void setStructureHint(const ColumnsDescription & structure_hint_) override { Base::structure_hint = structure_hint_; } }; #if USE_AWS_S3 -using TableFunctionS3Cluster = TableFunctionObjectStorageCluster; +using TableFunctionS3Cluster = TableFunctionObjectStorageCluster; #endif #if USE_AZURE_BLOB_STORAGE -using TableFunctionAzureBlobCluster = TableFunctionObjectStorageCluster; +using TableFunctionAzureBlobCluster = TableFunctionObjectStorageCluster; #endif #if USE_HDFS -using TableFunctionHDFSCluster = TableFunctionObjectStorageCluster; +using TableFunctionHDFSCluster = TableFunctionObjectStorageCluster; +#endif + +#if USE_AVRO +using TableFunctionIcebergCluster = TableFunctionObjectStorageCluster; #endif #if USE_AVRO @@ -70,7 +72,6 @@ using TableFunctionIcebergLocalCluster = TableFunctionObjectStorageCluster; -using TableFunctionIcebergCluster = TableFunctionObjectStorageCluster; #endif #if USE_AVRO && USE_AZURE_BLOB_STORAGE @@ -95,7 +96,7 @@ using 
TableFunctionPaimonHDFSCluster = TableFunctionObjectStorageCluster; using TableFunctionDeltaLakeS3Cluster = TableFunctionObjectStorageCluster; #endif diff --git a/src/TableFunctions/TableFunctionObjectStorageClusterFallback.cpp b/src/TableFunctions/TableFunctionObjectStorageClusterFallback.cpp new file mode 100644 index 000000000000..1cf854d68409 --- /dev/null +++ b/src/TableFunctions/TableFunctionObjectStorageClusterFallback.cpp @@ -0,0 +1,454 @@ +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace Setting +{ + extern const SettingsString object_storage_cluster; +} + +namespace ErrorCodes +{ + extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; + extern const int BAD_ARGUMENTS; +} + +struct S3ClusterFallbackDefinition +{ + static constexpr auto name = "s3"; + static constexpr auto storage_engine_name = "S3"; + static constexpr auto storage_engine_cluster_name = "S3Cluster"; +}; + +struct AzureClusterFallbackDefinition +{ + static constexpr auto name = "azureBlobStorage"; + static constexpr auto storage_engine_name = "Azure"; + static constexpr auto storage_engine_cluster_name = "AzureBlobStorageCluster"; +}; + +struct HDFSClusterFallbackDefinition +{ + static constexpr auto name = "hdfs"; + static constexpr auto storage_engine_name = "HDFS"; + static constexpr auto storage_engine_cluster_name = "HDFSCluster"; +}; + +struct IcebergClusterFallbackDefinition +{ + static constexpr auto name = "iceberg"; + static constexpr auto storage_engine_name = "UNDEFINED"; + static constexpr auto storage_engine_cluster_name = "IcebergCluster"; +}; + +struct IcebergS3ClusterFallbackDefinition +{ + static constexpr auto name = "icebergS3"; + static constexpr auto storage_engine_name = "S3"; + static constexpr auto storage_engine_cluster_name = "IcebergS3Cluster"; +}; + +struct IcebergAzureClusterFallbackDefinition +{ + static constexpr auto name = "icebergAzure"; + static constexpr auto storage_engine_name = "Azure"; + static constexpr auto 
storage_engine_cluster_name = "IcebergAzureCluster"; +}; + +struct IcebergHDFSClusterFallbackDefinition +{ + static constexpr auto name = "icebergHDFS"; + static constexpr auto storage_engine_name = "HDFS"; + static constexpr auto storage_engine_cluster_name = "IcebergHDFSCluster"; +}; + +struct IcebergLocalClusterFallbackDefinition +{ + static constexpr auto name = "icebergLocal"; + static constexpr auto storage_engine_name = "Local"; + static constexpr auto storage_engine_cluster_name = "IcebergLocalCluster"; +}; + +struct DeltaLakeClusterFallbackDefinition +{ + static constexpr auto name = "deltaLake"; + static constexpr auto storage_engine_name = "S3"; + static constexpr auto storage_engine_cluster_name = "DeltaLakeS3Cluster"; +}; + +struct DeltaLakeS3ClusterFallbackDefinition +{ + static constexpr auto name = "deltaLakeS3"; + static constexpr auto storage_engine_name = "S3"; + static constexpr auto storage_engine_cluster_name = "DeltaLakeS3Cluster"; +}; + +struct DeltaLakeAzureClusterFallbackDefinition +{ + static constexpr auto name = "deltaLakeAzure"; + static constexpr auto storage_engine_name = "Azure"; + static constexpr auto storage_engine_cluster_name = "DeltaLakeAzureCluster"; +}; + +struct HudiClusterFallbackDefinition +{ + static constexpr auto name = "hudi"; + static constexpr auto storage_engine_name = "S3"; + static constexpr auto storage_engine_cluster_name = "HudiS3Cluster"; +}; + +template +void TableFunctionObjectStorageClusterFallback::parseArgumentsImpl(ASTs & args, const ContextPtr & context) +{ + if (args.empty()) + throw Exception( + ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, + "The function {} should have arguments. 
The first argument must be the cluster name and the rest are the arguments of " + "corresponding table function", + getName()); + + const auto & settings = context->getSettingsRef(); + + is_cluster_function = !settings[Setting::object_storage_cluster].value.empty() && typename Base::Configuration().isClusterSupported(); + + if (is_cluster_function) + { + ASTPtr cluster_name_arg = make_intrusive(settings[Setting::object_storage_cluster].value); + args.insert(args.begin(), cluster_name_arg); + BaseCluster::parseArgumentsImpl(args, context); + args.erase(args.begin()); + } + else + BaseSimple::parseArgumentsImpl(args, context); // NOLINT(bugprone-parent-virtual-call) +} + +template +StoragePtr TableFunctionObjectStorageClusterFallback::executeImpl( + const ASTPtr & ast_function, + ContextPtr context, + const std::string & table_name, + ColumnsDescription cached_columns, + bool is_insert_query) const +{ + if (is_cluster_function) + { + auto result = BaseCluster::executeImpl(ast_function, context, table_name, cached_columns, is_insert_query); + if (auto storage = typeid_cast>(result)) + storage->setClusterNameInSettings(true); + return result; + } + else + return BaseSimple::executeImpl(ast_function, context, table_name, cached_columns, is_insert_query); // NOLINT(bugprone-parent-virtual-call) +} + +template +void TableFunctionObjectStorageClusterFallback::validateUseToCreateTable() const +{ + if (is_cluster_function) + throw Exception( + ErrorCodes::BAD_ARGUMENTS, + "Table function '{}' cannot be used to create a table in cluster mode", + getName()); +} + +#if USE_AWS_S3 +using TableFunctionS3ClusterFallback = TableFunctionObjectStorageClusterFallback; +#endif + +#if USE_AZURE_BLOB_STORAGE +using TableFunctionAzureClusterFallback = TableFunctionObjectStorageClusterFallback; +#endif + +#if USE_HDFS +using TableFunctionHDFSClusterFallback = TableFunctionObjectStorageClusterFallback; +#endif + +#if USE_AVRO +using TableFunctionIcebergClusterFallback = 
TableFunctionObjectStorageClusterFallback; +using TableFunctionIcebergLocalClusterFallback = TableFunctionObjectStorageClusterFallback; +#endif + +#if USE_AVRO && USE_AWS_S3 +using TableFunctionIcebergS3ClusterFallback = TableFunctionObjectStorageClusterFallback; +#endif + +#if USE_AVRO && USE_AZURE_BLOB_STORAGE +using TableFunctionIcebergAzureClusterFallback = TableFunctionObjectStorageClusterFallback; +#endif + +#if USE_AVRO && USE_HDFS +using TableFunctionIcebergHDFSClusterFallback = TableFunctionObjectStorageClusterFallback; +#endif + +#if USE_AWS_S3 && USE_PARQUET && USE_DELTA_KERNEL_RS +using TableFunctionDeltaLakeClusterFallback = TableFunctionObjectStorageClusterFallback; +using TableFunctionDeltaLakeS3ClusterFallback = TableFunctionObjectStorageClusterFallback; +#endif + +#if USE_AZURE_BLOB_STORAGE && USE_PARQUET && USE_DELTA_KERNEL_RS +using TableFunctionDeltaLakeAzureClusterFallback = TableFunctionObjectStorageClusterFallback; +#endif + +#if USE_AWS_S3 +using TableFunctionHudiClusterFallback = TableFunctionObjectStorageClusterFallback; +#endif + +void registerTableFunctionObjectStorageClusterFallback(TableFunctionFactory & factory) +{ + UNUSED(factory); +#if USE_AWS_S3 + factory.registerFunction( + { + .description=R"(The table function can be used to read the data stored on S3 in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + {"s3", "SELECT * FROM s3(url, format, structure)", ""}, + {"s3", "SELECT * FROM s3(url, format, structure) SETTINGS object_storage_cluster='cluster'", ""} + }, + .category = FunctionDocumentation::Category::TableFunction + }, + {.allow_readonly = false} + ); +#endif + +#if USE_AZURE_BLOB_STORAGE + factory.registerFunction( + { + .description=R"(The table function can be used to read the data stored on Azure Blob Storage in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + { + "azureBlobStorage", + "SELECT * FROM
azureBlobStorage(connection_string|storage_account_url, container_name, blobpath, " + "[account_name, account_key, format, compression, structure])", "" + }, + { + "azureBlobStorage", + "SELECT * FROM azureBlobStorage(connection_string|storage_account_url, container_name, blobpath, " + "[account_name, account_key, format, compression, structure]) " + "SETTINGS object_storage_cluster='cluster'", "" + }, + }, + .category = FunctionDocumentation::Category::TableFunction + }, + {.allow_readonly = false} + ); +#endif + +#if USE_HDFS + factory.registerFunction( + { + .description=R"(The table function can be used to read the data stored on the HDFS virtual filesystem in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + { + "hdfs", + "SELECT * FROM hdfs(url, format, compression, structure)", "" + }, + { + "hdfs", + "SELECT * FROM hdfs(url, format, compression, structure) " + "SETTINGS object_storage_cluster='cluster'", "" + }, + }, + .category = FunctionDocumentation::Category::TableFunction + }, + {.allow_readonly = false} + ); +#endif + +#if USE_AVRO + factory.registerFunction( + { + .description=R"(The table function can be used to read Iceberg tables stored on different object stores in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + { + "iceberg", + "SELECT * FROM iceberg(url, access_key_id, secret_access_key, storage_type='s3')", "" + }, + { + "iceberg", + "SELECT * FROM iceberg(url, access_key_id, secret_access_key, storage_type='s3') " + "SETTINGS object_storage_cluster='cluster'", "" + }, + { + "iceberg", + "SELECT * FROM iceberg(url, access_key_id, secret_access_key, storage_type='azure')", "" + }, + { + "iceberg", + "SELECT * FROM iceberg(url, storage_type='hdfs') SETTINGS object_storage_cluster='cluster'", "" + }, + }, + .category = FunctionDocumentation::Category::TableFunction + }, + {.allow_readonly = false} + ); + + factory.registerFunction( + { + .description=R"(The table function
can be used to read an Iceberg table stored on a shared disk in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + { + "icebergLocal", + "SELECT * FROM icebergLocal(filename)", "" + }, + { + "icebergLocal", + "SELECT * FROM icebergLocal(filename) " + "SETTINGS object_storage_cluster='cluster'", "" + }, + }, + .category = FunctionDocumentation::Category::TableFunction + }, + {.allow_readonly = false} + ); +#endif + +#if USE_AVRO && USE_AWS_S3 + factory.registerFunction( + { + .description=R"(The table function can be used to read an Iceberg table stored on the S3 object store in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + { + "icebergS3", + "SELECT * FROM icebergS3(url, access_key_id, secret_access_key)", "" + }, + { + "icebergS3", + "SELECT * FROM icebergS3(url, access_key_id, secret_access_key) " + "SETTINGS object_storage_cluster='cluster'", "" + }, + }, + .category = FunctionDocumentation::Category::TableFunction + }, + {.allow_readonly = false} + ); +#endif + +#if USE_AVRO && USE_AZURE_BLOB_STORAGE + factory.registerFunction( + { + .description=R"(The table function can be used to read an Iceberg table stored on the Azure object store in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + { + "icebergAzure", + "SELECT * FROM icebergAzure(url, access_key_id, secret_access_key)", "" + }, + { + "icebergAzure", + "SELECT * FROM icebergAzure(url, access_key_id, secret_access_key) " + "SETTINGS object_storage_cluster='cluster'", "" + }, + }, + .category = FunctionDocumentation::Category::TableFunction + }, + {.allow_readonly = false} + ); +#endif + +#if USE_AVRO && USE_HDFS + factory.registerFunction( + { + .description=R"(The table function can be used to read an Iceberg table stored on the HDFS virtual filesystem in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + { + "icebergHDFS", + "SELECT * FROM icebergHDFS(url)", "" + }, + {
+ "icebergHDFS", + "SELECT * FROM icebergHDFS(url) SETTINGS object_storage_cluster='cluster'", "" + }, + }, + .category = FunctionDocumentation::Category::TableFunction + }, + {.allow_readonly = false} + ); +#endif + +#if USE_PARQUET && USE_DELTA_KERNEL_RS +# if USE_AWS_S3 + factory.registerFunction( + { + .description=R"(The table function can be used to read a DeltaLake table stored on an object store in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + { + "deltaLake", + "SELECT * FROM deltaLake(url, access_key_id, secret_access_key)", "" + }, + { + "deltaLake", + "SELECT * FROM deltaLake(url, access_key_id, secret_access_key) " + "SETTINGS object_storage_cluster='cluster'", "" + }, + }, + .category = FunctionDocumentation::Category::TableFunction + }, + {.allow_readonly = false} + ); + factory.registerFunction( + { + .description=R"(The table function can be used to read a DeltaLake table stored on an object store in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + { + "deltaLakeS3", + "SELECT * FROM deltaLakeS3(url, access_key_id, secret_access_key)", "" + }, + { + "deltaLakeS3", + "SELECT * FROM deltaLakeS3(url, access_key_id, secret_access_key) " + "SETTINGS object_storage_cluster='cluster'", "" + }, + }, + .category = FunctionDocumentation::Category::TableFunction + }, + {.allow_readonly = false} + ); +# endif +# if USE_AZURE_BLOB_STORAGE + factory.registerFunction( + { + .description=R"(The table function can be used to read a DeltaLake table stored on an object store in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + { + "deltaLakeAzure", + "SELECT * FROM deltaLakeAzure(url, access_key_id, secret_access_key)", "" + }, + { + "deltaLakeAzure", + "SELECT * FROM deltaLakeAzure(url, access_key_id, secret_access_key) " + "SETTINGS object_storage_cluster='cluster'", "" + }, + }, + .category = FunctionDocumentation::Category::TableFunction + }, +
{.allow_readonly = false} + ); +# endif +#endif + +#if USE_AWS_S3 + factory.registerFunction( + { + .description=R"(The table function can be used to read a Hudi table stored on an object store in parallel from many nodes in a specified cluster or from a single node.)", + .examples{ + { + "hudi", + "SELECT * FROM hudi(url, access_key_id, secret_access_key)", "" + }, + { + "hudi", + "SELECT * FROM hudi(url, access_key_id, secret_access_key) SETTINGS object_storage_cluster='cluster'", "" + }, + }, + .category = FunctionDocumentation::Category::TableFunction + }, + {.allow_readonly = false} + ); +#endif +} + +} diff --git a/src/TableFunctions/TableFunctionObjectStorageClusterFallback.h b/src/TableFunctions/TableFunctionObjectStorageClusterFallback.h new file mode 100644 index 000000000000..d81acac2be50 --- /dev/null +++ b/src/TableFunctions/TableFunctionObjectStorageClusterFallback.h @@ -0,0 +1,49 @@ +#pragma once +#include "config.h" +#include + +namespace DB +{ + +/** +* Class implementing the s3/hdfs/azureBlobStorage(...) table functions, +* which allow using either the simple or the distributed variant of a function based on settings. +* If the setting `object_storage_cluster` is empty, +* the simple single-host variant is used; if the setting is not empty, the cluster variant is used. +* `SELECT * FROM s3('s3://...', ...) SETTINGS object_storage_cluster='cluster'` +* is equivalent to +* `SELECT * FROM s3Cluster('cluster', 's3://...', ...)` +*/ + +template +class TableFunctionObjectStorageClusterFallback : public Base +{ +public: + using BaseCluster = Base; + using BaseSimple = BaseCluster::Base; + + static constexpr auto name = Definition::name; + + String getName() const override { return name; } + + void validateUseToCreateTable() const override; + +private: + const char * getStorageEngineName() const override + { + return is_cluster_function ?
Definition::storage_engine_cluster_name : Definition::storage_engine_name; + } + + StoragePtr executeImpl( + const ASTPtr & ast_function, + ContextPtr context, + const std::string & table_name, + ColumnsDescription cached_columns, + bool is_insert_query) const override; + + void parseArgumentsImpl(ASTs & args, const ContextPtr & context) override; + + bool is_cluster_function = false; +}; + +} diff --git a/src/TableFunctions/TableFunctionRemote.h b/src/TableFunctions/TableFunctionRemote.h index e58d30cf48df..498339231153 100644 --- a/src/TableFunctions/TableFunctionRemote.h +++ b/src/TableFunctions/TableFunctionRemote.h @@ -26,6 +26,8 @@ class TableFunctionRemote : public ITableFunction bool needStructureConversion() const override { return false; } + void setRemoteTableFunction(ASTPtr remote_table_function_ptr_) { remote_table_function_ptr = remote_table_function_ptr_; } + private: StoragePtr executeImpl(const ASTPtr & ast_function, ContextPtr context, const std::string & table_name, ColumnsDescription cached_columns, bool is_insert_query) const override; diff --git a/src/TableFunctions/registerTableFunctions.cpp b/src/TableFunctions/registerTableFunctions.cpp index 28404a1f246f..d582acfef094 100644 --- a/src/TableFunctions/registerTableFunctions.cpp +++ b/src/TableFunctions/registerTableFunctions.cpp @@ -72,6 +72,7 @@ void registerTableFunctions() registerTableFunctionObjectStorage(factory); registerTableFunctionObjectStorageCluster(factory); registerDataLakeTableFunctions(factory); + registerTableFunctionObjectStorageClusterFallback(factory); registerDataLakeClusterTableFunctions(factory); #if USE_YTSAURUS diff --git a/src/TableFunctions/registerTableFunctions.h b/src/TableFunctions/registerTableFunctions.h index 440a13635d4b..8c67c02cb3b8 100644 --- a/src/TableFunctions/registerTableFunctions.h +++ b/src/TableFunctions/registerTableFunctions.h @@ -73,6 +73,7 @@ void registerTableFunctionExplain(TableFunctionFactory & factory); void 
registerTableFunctionObjectStorage(TableFunctionFactory & factory); void registerTableFunctionObjectStorageCluster(TableFunctionFactory & factory); void registerDataLakeTableFunctions(TableFunctionFactory & factory); +void registerTableFunctionObjectStorageClusterFallback(TableFunctionFactory & factory); void registerDataLakeClusterTableFunctions(TableFunctionFactory & factory); void registerTableFunctionTimeSeries(TableFunctionFactory & factory); diff --git a/tests/integration/compose/docker_compose_iceberg_rest_catalog.yml b/tests/integration/compose/docker_compose_iceberg_rest_catalog.yml index c69a89f6fa58..34de4ffed21b 100644 --- a/tests/integration/compose/docker_compose_iceberg_rest_catalog.yml +++ b/tests/integration/compose/docker_compose_iceberg_rest_catalog.yml @@ -12,15 +12,15 @@ services: - AWS_SECRET_ACCESS_KEY=password - AWS_REGION=us-east-1 ports: - - 8080:8080 - - 10002:10000 - - 10003:10001 + - ${SPARK_ICEBERG_EXTERNAL_PORT:-8080}:8080 + - ${SPARK_ICEBERG_EXTERNAL_PORT_2:-10002}:10000 + - ${SPARK_ICEBERG_EXTERNAL_PORT_3:-10003}:10001 stop_grace_period: 5s cpus: 3 rest: image: tabulario/iceberg-rest:1.6.0 ports: - - 8182:8181 + - ${ICEBERG_REST_EXTERNAL_PORT:-8182}:8181 environment: - AWS_ACCESS_KEY_ID=minio - AWS_SECRET_ACCESS_KEY=ClickHouse_Minio_P@ssw0rd diff --git a/tests/integration/helpers/cluster.py b/tests/integration/helpers/cluster.py index 69d541c495c9..aade584171a8 100644 --- a/tests/integration/helpers/cluster.py +++ b/tests/integration/helpers/cluster.py @@ -686,6 +686,9 @@ def __init__( self.minio_secret_key = minio_secret_key self.spark_session = None + self.spark_iceberg_external_port = 8080 + self.spark_iceberg_external_port_2 = 10002 + self.spark_iceberg_external_port_3 = 10003 self.with_iceberg_catalog = False self.iceberg_rest_catalog_port = 8182 self.with_glue_catalog = False @@ -900,6 +903,8 @@ def __init__( self._letsencrypt_pebble_api_port = 14000 self._letsencrypt_pebble_management_port = 15000 + 
self.iceberg_rest_external_port = 8182 + self.docker_client: docker.DockerClient = None self.is_up = False self.env = os.environ.copy() @@ -1717,6 +1722,10 @@ def setup_hms_catalog_cmd(self, instance, env_variables, docker_compose_yml_dir) def setup_iceberg_catalog_cmd( self, instance, env_variables, docker_compose_yml_dir, extra_parameters=None ): + env_variables["ICEBERG_REST_EXTERNAL_PORT"] = str(self.iceberg_rest_external_port) + env_variables["SPARK_ICEBERG_EXTERNAL_PORT"] = str(self.spark_iceberg_external_port) + env_variables["SPARK_ICEBERG_EXTERNAL_PORT_2"] = str(self.spark_iceberg_external_port_2) + env_variables["SPARK_ICEBERG_EXTERNAL_PORT_3"] = str(self.spark_iceberg_external_port_3) self.with_iceberg_catalog = True file_name = "docker_compose_iceberg_rest_catalog.yml" if extra_parameters is not None and extra_parameters["docker_compose_file_name"] != "": diff --git a/tests/integration/helpers/iceberg_utils.py b/tests/integration/helpers/iceberg_utils.py index 02f8832ba1b6..7548f61d7a48 100644 --- a/tests/integration/helpers/iceberg_utils.py +++ b/tests/integration/helpers/iceberg_utils.py @@ -236,8 +236,12 @@ def get_creation_expression( table_function=False, use_version_hint=False, run_on_cluster=False, + object_storage_cluster=False, explicit_metadata_path="", additional_settings = [], + storage_type_as_arg=False, + storage_type_in_named_collection=False, + cluster_name_as_literal=True, **kwargs, ): settings_array = list(additional_settings) @@ -248,6 +252,9 @@ def get_creation_expression( if use_version_hint: settings_array.append("iceberg_use_version_hint = true") + if object_storage_cluster: + settings_array.append(f"object_storage_cluster = '{object_storage_cluster}'") + if partition_by: partition_by = "PARTITION BY " + partition_by @@ -264,6 +271,24 @@ def get_creation_expression( else: settings_expression = "" + cluster_name = "'cluster_simple'" if cluster_name_as_literal else "cluster_simple" + + storage_arg = storage_type + engine_part = "" + 
if (storage_type_in_named_collection): + storage_arg += "_with_type" + elif (storage_type_as_arg): + storage_arg += f", storage_type='{storage_type}'" + else: + if (storage_type == "s3"): + engine_part = "S3" + elif (storage_type == "azure"): + engine_part = "Azure" + elif (storage_type == "hdfs"): + engine_part = "HDFS" + elif (storage_type == "local"): + engine_part = "Local" + if_not_exists_prefix = "" if if_not_exists: if_not_exists_prefix = "IF NOT EXISTS" @@ -276,16 +301,16 @@ def get_creation_expression( if run_on_cluster: assert table_function - return f"icebergS3Cluster('cluster_simple', s3, filename = 'var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/')" + return f"iceberg{engine_part}Cluster({cluster_name}, {storage_arg}, filename = 'var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/')" else: if table_function: - return f"icebergS3(s3, filename = 'var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/')" + return f"iceberg{engine_part}({storage_arg}, filename = 'var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/')" else: return ( f""" DROP TABLE IF EXISTS {table_name}; CREATE TABLE {if_not_exists_prefix} {table_name} {schema} - ENGINE=IcebergS3(s3, filename = 'var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/') + ENGINE=Iceberg{engine_part}({storage_arg}, filename = 'var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/') {order_by} {partition_by} {settings_expression}; @@ -296,19 +321,19 @@ def get_creation_expression( if run_on_cluster: assert table_function return f""" - icebergAzureCluster('cluster_simple', azure, container = 
'{cluster.azure_container_name}', storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}) + iceberg{engine_part}Cluster({cluster_name}, {storage_arg}, container = '{cluster.azure_container_name}', storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}) """ else: if table_function: return f""" - icebergAzure(azure, container = '{cluster.azure_container_name}', storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}) + iceberg{engine_part}({storage_arg}, container = '{cluster.azure_container_name}', storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}) """ else: return ( f""" DROP TABLE IF EXISTS {table_name}; CREATE TABLE {if_not_exists_prefix} {table_name} {schema} - ENGINE=IcebergAzure(azure, container = {cluster.azure_container_name}, storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}) + ENGINE=Iceberg{engine_part}({storage_arg}, container = {cluster.azure_container_name}, storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}) {order_by} {partition_by} {settings_expression} @@ -319,19 +344,19 @@ def get_creation_expression( if run_on_cluster: assert table_function return f""" - icebergLocalCluster('cluster_simple', local, path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}', format={format}) + 
iceberg{engine_part}Cluster({cluster_name}, {storage_arg}, path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}) """ else: if table_function: return f""" - icebergLocal(local, path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}', format={format}) + iceberg{engine_part}({storage_arg}, path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}', format={format}) """ else: return ( f""" DROP TABLE IF EXISTS {table_name}; CREATE TABLE {if_not_exists_prefix} {table_name} {schema} - ENGINE=IcebergLocal(local, path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}', format={format}) + ENGINE=Iceberg{engine_part}({storage_arg}, path = '/var/lib/clickhouse/user_files/iceberg_data/default/{table_name}/', format={format}) {order_by} {partition_by} {settings_expression} @@ -422,16 +447,43 @@ def create_iceberg_table( run_on_cluster=False, format="Parquet", order_by="", + object_storage_cluster=False, **kwargs, ): if 'output_format_parquet_use_custom_encoder' in kwargs: node.query( - get_creation_expression(storage_type, table_name, cluster, schema, format_version, partition_by, if_not_exists, compression_method, format, order_by, run_on_cluster = run_on_cluster, **kwargs), + get_creation_expression( + storage_type, + table_name, + cluster, + schema, + format_version, + partition_by, + if_not_exists, + compression_method, + format, + order_by, + run_on_cluster=run_on_cluster, + object_storage_cluster=object_storage_cluster, + **kwargs), settings={"output_format_parquet_use_custom_encoder" : 0, "output_format_parquet_parallel_encoding" : 0} ) else: node.query( - get_creation_expression(storage_type, table_name, cluster, schema, format_version, partition_by, if_not_exists, compression_method, format, order_by, run_on_cluster=run_on_cluster, **kwargs), + get_creation_expression( + storage_type, + table_name, + cluster, + schema, + format_version, + partition_by, + if_not_exists, + 
compression_method, + format, + order_by, + run_on_cluster=run_on_cluster, + object_storage_cluster=object_storage_cluster, + **kwargs), ) diff --git a/tests/integration/test_database_iceberg/configs/iceberg_partition_timezone.xml b/tests/integration/test_database_iceberg/configs/iceberg_partition_timezone.xml new file mode 100644 index 000000000000..40aebd33c515 --- /dev/null +++ b/tests/integration/test_database_iceberg/configs/iceberg_partition_timezone.xml @@ -0,0 +1,7 @@ + + + + UTC + + + diff --git a/tests/integration/test_database_iceberg/configs/timezone.xml b/tests/integration/test_database_iceberg/configs/timezone.xml new file mode 100644 index 000000000000..269e52ef2247 --- /dev/null +++ b/tests/integration/test_database_iceberg/configs/timezone.xml @@ -0,0 +1,3 @@ + + Asia/Istanbul + \ No newline at end of file diff --git a/tests/integration/test_database_iceberg/test.py b/tests/integration/test_database_iceberg/test.py index c75dfce46d5a..ca8b1f2275d5 100644 --- a/tests/integration/test_database_iceberg/test.py +++ b/tests/integration/test_database_iceberg/test.py @@ -74,6 +74,8 @@ DEFAULT_SORT_ORDER = SortOrder(SortField(source_id=2, transform=IdentityTransform())) +AVAILABLE_ENGINES = ["DataLakeCatalog", "Iceberg"] + def list_namespaces(): response = requests.get(f"{BASE_URL_LOCAL}/namespaces") @@ -124,7 +126,7 @@ def generate_record(): def create_clickhouse_iceberg_database( - started_cluster, node, name, additional_settings={} + started_cluster, node, name, additional_settings={}, engine='DataLakeCatalog' ): settings = { "catalog_type": "rest", @@ -139,7 +141,7 @@ def create_clickhouse_iceberg_database( DROP DATABASE IF EXISTS {name}; SET allow_database_iceberg=true; SET write_full_path_in_iceberg_metadata=1; -CREATE DATABASE {name} ENGINE = DataLakeCatalog('{BASE_URL}', 'minio', '{minio_secret_key}') +CREATE DATABASE {name} ENGINE = {engine}('{BASE_URL}', 'minio', '{minio_secret_key}') SETTINGS {",".join((k+"="+repr(v) for k, v in 
settings.items()))} """ ) @@ -218,7 +220,8 @@ def started_cluster(): cluster.shutdown() -def test_list_tables(started_cluster): +@pytest.mark.parametrize("engine", AVAILABLE_ENGINES) +def test_list_tables(started_cluster, engine): node = started_cluster.instances["node1"] root_namespace = f"clickhouse_{uuid.uuid4()}" @@ -249,7 +252,7 @@ def test_list_tables(started_cluster): for namespace in [namespace_1, namespace_2]: assert len(catalog.list_tables(namespace)) == 0 - create_clickhouse_iceberg_database(started_cluster, node, CATALOG_NAME) + create_clickhouse_iceberg_database(started_cluster, node, CATALOG_NAME, engine=engine) tables_list = "" for table in namespace_1_tables: @@ -284,7 +287,8 @@ def test_list_tables(started_cluster): ) -def test_many_namespaces(started_cluster): +@pytest.mark.parametrize("engine", AVAILABLE_ENGINES) +def test_many_namespaces(started_cluster, engine): node = started_cluster.instances["node1"] root_namespace_1 = f"A_{uuid.uuid4()}" root_namespace_2 = f"B_{uuid.uuid4()}" @@ -305,7 +309,7 @@ def test_many_namespaces(started_cluster): for table in tables: create_table(catalog, namespace, table) - create_clickhouse_iceberg_database(started_cluster, node, CATALOG_NAME) + create_clickhouse_iceberg_database(started_cluster, node, CATALOG_NAME, engine=engine) for namespace in namespaces: for table in tables: @@ -317,7 +321,8 @@ def test_many_namespaces(started_cluster): ) -def test_select(started_cluster): +@pytest.mark.parametrize("engine", AVAILABLE_ENGINES) +def test_select(started_cluster, engine): node = started_cluster.instances["node1"] test_ref = f"test_list_tables_{uuid.uuid4()}" @@ -345,7 +350,7 @@ def test_select(started_cluster): df = pa.Table.from_pylist(data) table.append(df) - create_clickhouse_iceberg_database(started_cluster, node, CATALOG_NAME) + create_clickhouse_iceberg_database(started_cluster, node, CATALOG_NAME, engine=engine) expected = DEFAULT_CREATE_TABLE.format(CATALOG_NAME, namespace, table_name) assert expected == 
node.query( @@ -359,7 +364,8 @@ def test_select(started_cluster): assert int(node.query(f"SELECT count() FROM system.iceberg_history WHERE table = '{namespace}.{table_name}' and database = '{CATALOG_NAME}'").strip()) == 1 -def test_hide_sensitive_info(started_cluster): +@pytest.mark.parametrize("engine", AVAILABLE_ENGINES) +def test_hide_sensitive_info(started_cluster, engine): node = started_cluster.instances["node1"] test_ref = f"test_hide_sensitive_info_{uuid.uuid4()}" @@ -377,6 +383,7 @@ def test_hide_sensitive_info(started_cluster): node, CATALOG_NAME, additional_settings={"catalog_credential": "SECRET_1"}, + engine=engine, ) assert "SECRET_1" not in node.query(f"SHOW CREATE DATABASE {CATALOG_NAME}") @@ -385,11 +392,13 @@ def test_hide_sensitive_info(started_cluster): node, CATALOG_NAME, additional_settings={"auth_header": "SECRET_2"}, + engine=engine, ) assert "SECRET_2" not in node.query(f"SHOW CREATE DATABASE {CATALOG_NAME}") -def test_tables_with_same_location(started_cluster): +@pytest.mark.parametrize("engine", AVAILABLE_ENGINES) +def test_tables_with_same_location(started_cluster, engine): node = started_cluster.instances["node1"] test_ref = f"test_tables_with_same_location_{uuid.uuid4()}" @@ -420,7 +429,7 @@ def record(key): df = pa.Table.from_pylist(data) table_2.append(df) - create_clickhouse_iceberg_database(started_cluster, node, CATALOG_NAME) + create_clickhouse_iceberg_database(started_cluster, node, CATALOG_NAME, engine=engine) assert 'aaa\naaa\naaa' == node.query(f"SELECT symbol FROM {CATALOG_NAME}.`{namespace}.{table_name}`").strip() assert 'bbb\nbbb\nbbb' == node.query(f"SELECT symbol FROM {CATALOG_NAME}.`{namespace}.{table_name_2}`").strip() @@ -542,6 +551,52 @@ def test_timestamps(started_cluster): assert node.query(f"SHOW CREATE TABLE {CATALOG_NAME}.`{root_namespace}.{table_name}`") == f"CREATE TABLE {CATALOG_NAME}.`{root_namespace}.{table_name}`\\n(\\n `timestamp` Nullable(DateTime64(6)),\\n `timestamptz` Nullable(DateTime64(6, 
\\'UTC\\'))\\n)\\nENGINE = Iceberg(\\'http://minio:9000/warehouse-rest/data/\\', \\'minio\\', \\'[HIDDEN]\\')\n" assert node.query(f"SELECT * FROM {CATALOG_NAME}.`{root_namespace}.{table_name}`") == "2024-01-01 12:00:00.000000\t2024-01-01 12:00:00.000000\n" + # Berlin - UTC+1 in winter + # Istanbul - UTC+3 in winter + + # 'UTC' is the default value, so the response is equal to the query above + assert node.query(f""" + SELECT * FROM {CATALOG_NAME}.`{root_namespace}.{table_name}` + SETTINGS iceberg_timezone_for_timestamptz='UTC' + """) == "2024-01-01 12:00:00.000000\t2024-01-01 12:00:00.000000\n" + # Timezone from the setting + assert node.query(f""" + SELECT * FROM {CATALOG_NAME}.`{root_namespace}.{table_name}` + SETTINGS iceberg_timezone_for_timestamptz='Europe/Berlin' + """) == "2024-01-01 12:00:00.000000\t2024-01-01 13:00:00.000000\n" + # An empty value means the session timezone, which is also 'UTC' by default + assert node.query(f""" + SELECT * FROM {CATALOG_NAME}.`{root_namespace}.{table_name}` + SETTINGS iceberg_timezone_for_timestamptz='' + """) == "2024-01-01 12:00:00.000000\t2024-01-01 12:00:00.000000\n" + # If the session timezone is used, `timestamptz` is not changed and stays 'UTC' by default + assert node.query(f""" + SELECT * FROM {CATALOG_NAME}.`{root_namespace}.{table_name}` + SETTINGS session_timezone='Asia/Istanbul' + """) == "2024-01-01 15:00:00.000000\t2024-01-01 12:00:00.000000\n" + # The setting `iceberg_timezone_for_timestamptz` does not affect the `timestamp` column + assert node.query(f""" + SELECT * FROM {CATALOG_NAME}.`{root_namespace}.{table_name}` + SETTINGS session_timezone='Asia/Istanbul', iceberg_timezone_for_timestamptz='Europe/Berlin' + """) == "2024-01-01 15:00:00.000000\t2024-01-01 13:00:00.000000\n" + # Empty value: the non-default session timezone is used + assert node.query(f""" + SELECT * FROM {CATALOG_NAME}.`{root_namespace}.{table_name}` + SETTINGS session_timezone='Asia/Istanbul', iceberg_timezone_for_timestamptz='' + """) == "2024-01-01 15:00:00.000000\t2024-01-01
15:00:00.000000\n" + # Invalid timezone + assert "Invalid time zone: Foo/Bar" in node.query_and_get_error(f""" + SELECT * FROM {CATALOG_NAME}.`{root_namespace}.{table_name}` + SETTINGS iceberg_timezone_for_timestamptz='Foo/Bar' + """) + + assert node.query(f"SHOW CREATE TABLE {CATALOG_NAME}.`{root_namespace}.{table_name}` SETTINGS iceberg_timezone_for_timestamptz='UTC'") == f"CREATE TABLE {CATALOG_NAME}.`{root_namespace}.{table_name}`\\n(\\n `timestamp` Nullable(DateTime64(6)),\\n `timestamptz` Nullable(DateTime64(6, \\'UTC\\'))\\n)\\nENGINE = Iceberg(\\'http://minio:9000/warehouse-rest/data/\\', \\'minio\\', \\'[HIDDEN]\\')\n" + assert node.query(f"SHOW CREATE TABLE {CATALOG_NAME}.`{root_namespace}.{table_name}` SETTINGS iceberg_timezone_for_timestamptz='Europe/Berlin'") == f"CREATE TABLE {CATALOG_NAME}.`{root_namespace}.{table_name}`\\n(\\n `timestamp` Nullable(DateTime64(6)),\\n `timestamptz` Nullable(DateTime64(6, \\'Europe/Berlin\\'))\\n)\\nENGINE = Iceberg(\\'http://minio:9000/warehouse-rest/data/\\', \\'minio\\', \\'[HIDDEN]\\')\n" + + assert node.query(f"SELECT timezoneOf(timestamptz) FROM {CATALOG_NAME}.`{root_namespace}.{table_name}` LIMIT 1") == "UTC\n" + assert node.query(f"SELECT timezoneOf(timestamptz) FROM {CATALOG_NAME}.`{root_namespace}.{table_name}` LIMIT 1 SETTINGS iceberg_timezone_for_timestamptz='UTC'") == "UTC\n" + assert node.query(f"SELECT timezoneOf(timestamptz) FROM {CATALOG_NAME}.`{root_namespace}.{table_name}` LIMIT 1 SETTINGS iceberg_timezone_for_timestamptz='Europe/Berlin'") == "Europe/Berlin\n" + def test_insert(started_cluster): node = started_cluster.instances["node1"] diff --git a/tests/integration/test_database_iceberg/test_partition_timezone.py b/tests/integration/test_database_iceberg/test_partition_timezone.py new file mode 100644 index 000000000000..1a43f3481ad7 --- /dev/null +++ b/tests/integration/test_database_iceberg/test_partition_timezone.py @@ -0,0 +1,186 @@ +import glob +import json +import logging +import os +import 
random +import time +import uuid +from datetime import datetime, timedelta + +import pyarrow as pa +import pytest +import requests +import urllib3 +import pytz +from minio import Minio +from pyiceberg.catalog import load_catalog +from pyiceberg.partitioning import PartitionField, PartitionSpec, UNPARTITIONED_PARTITION_SPEC +from pyiceberg.schema import Schema +from pyiceberg.table.sorting import SortField, SortOrder +from pyiceberg.transforms import DayTransform, IdentityTransform +from pyiceberg.types import ( + DoubleType, + LongType, + FloatType, + NestedField, + StringType, + StructType, + TimestampType, + TimestamptzType +) +from pyiceberg.table.sorting import UNSORTED_SORT_ORDER + +from helpers.cluster import ClickHouseCluster, ClickHouseInstance, is_arm +from helpers.config_cluster import minio_secret_key, minio_access_key +from helpers.s3_tools import get_file_contents, list_s3_objects, prepare_s3_bucket +from helpers.test_tools import TSV, csv_compare +from helpers.config_cluster import minio_secret_key + +ICEBERG_PORT = 8183 + +BASE_URL = "http://rest:8181/v1" +BASE_URL_LOCAL = f"http://localhost:{ICEBERG_PORT}/v1" +BASE_URL_LOCAL_RAW = f"http://localhost:{ICEBERG_PORT}" + +CATALOG_NAME = "demo" + +DEFAULT_PARTITION_SPEC = PartitionSpec( + PartitionField( + source_id=1, field_id=1000, transform=DayTransform(), name="datetime_day" + ) +) +DEFAULT_SORT_ORDER = SortOrder(SortField(source_id=1, transform=DayTransform())) +DEFAULT_SCHEMA = Schema( + NestedField(field_id=1, name="datetime", field_type=TimestampType(), required=False), + NestedField(field_id=2, name="value", field_type=LongType(), required=False), +) + + +@pytest.fixture(scope="module") +def started_cluster(): + try: + cluster = ClickHouseCluster(__file__) + cluster.iceberg_rest_external_port = ICEBERG_PORT + cluster.spark_iceberg_external_port = 10004 + cluster.spark_iceberg_external_port_2 = 10005 + cluster.spark_iceberg_external_port_3 = 10006 + cluster.add_instance( + "node1", + 
main_configs=["configs/timezone.xml", "configs/cluster.xml"], + user_configs=["configs/iceberg_partition_timezone.xml"], + stay_alive=True, + with_iceberg_catalog=True, + with_zookeeper=True, + ) + + logging.info("Starting cluster...") + cluster.start() + + # TODO: properly wait for container + time.sleep(10) + + yield cluster + + finally: + cluster.shutdown() + + +def load_catalog_impl(started_cluster): + return load_catalog( + CATALOG_NAME, + **{ + "uri": BASE_URL_LOCAL_RAW, + "type": "rest", + "s3.endpoint": f"http://{started_cluster.get_instance_ip('minio')}:9000", + "s3.access-key-id": minio_access_key, + "s3.secret-access-key": minio_secret_key, + }, + ) + + +def create_table( + catalog, + namespace, + table, + schema=DEFAULT_SCHEMA, + partition_spec=DEFAULT_PARTITION_SPEC, + sort_order=DEFAULT_SORT_ORDER, +): + return catalog.create_table( + identifier=f"{namespace}.{table}", + schema=schema, + location=f"s3://warehouse-rest/data", + partition_spec=partition_spec, + sort_order=sort_order, + ) + + +def create_clickhouse_iceberg_database( + node, name, additional_settings={}, engine='DataLakeCatalog' +): + settings = { + "catalog_type": "rest", + "warehouse": "demo", + "storage_endpoint": "http://minio:9000/warehouse-rest", + } + + settings.update(additional_settings) + + node.query( + f""" +DROP DATABASE IF EXISTS {name}; +SET allow_database_iceberg=true; +SET write_full_path_in_iceberg_metadata=1; +CREATE DATABASE {name} ENGINE = {engine}('{BASE_URL}', 'minio', '{minio_secret_key}') +SETTINGS {",".join((k+"="+repr(v) for k, v in settings.items()))} + """ + ) + show_result = node.query(f"SHOW DATABASE {name}") + assert minio_secret_key not in show_result + assert "HIDDEN" in show_result + + +def test_partition_timezone(started_cluster): + catalog = load_catalog_impl(started_cluster) + namespace = f"timezone_ns_{uuid.uuid4()}" + table_name = f"tz_table__{uuid.uuid4()}" + catalog.create_namespace(namespace) + table = create_table( + catalog, + namespace, + 
table_name, + ) + + # catalog accepts data in UTC + data = [{"datetime": datetime(2024, 1, 1, 20, 0), "value": 1}, # partition 20240101 + {"datetime": datetime(2024, 1, 1, 23, 0), "value": 2}, # partition 20240101 + {"datetime": datetime(2024, 1, 2, 2, 0), "value": 3}] # partition 20240102 + df = pa.Table.from_pylist(data) + table.append(df) + + node = started_cluster.instances["node1"] + create_clickhouse_iceberg_database(node, CATALOG_NAME) + + # server timezone is Asia/Istanbul (UTC+3) + assert node.query(f""" + SELECT datetime, value + FROM {CATALOG_NAME}.`{namespace}.{table_name}` + ORDER BY datetime + """, timeout=10) == TSV( + [ + ["2024-01-01 23:00:00.000000", 1], + ["2024-01-02 02:00:00.000000", 2], + ["2024-01-02 05:00:00.000000", 3], + ]) + + # partitioning works correctly + assert node.query(f""" + SELECT datetime, value + FROM {CATALOG_NAME}.`{namespace}.{table_name}` + WHERE datetime >= '2024-01-02 00:00:00' + ORDER BY datetime + """, timeout=10) == TSV( + [ + ["2024-01-02 02:00:00.000000", 2], + ["2024-01-02 05:00:00.000000", 3], + ]) diff --git a/tests/integration/test_mask_sensitive_info/test.py b/tests/integration/test_mask_sensitive_info/test.py index 96b60a0d53dc..24e97e98aec5 100644 --- a/tests/integration/test_mask_sensitive_info/test.py +++ b/tests/integration/test_mask_sensitive_info/test.py @@ -3,6 +3,7 @@ import string import pytest +import uuid from helpers.cluster import ClickHouseCluster from helpers.test_tools import TSV @@ -248,6 +249,8 @@ def test_create_table(): azure_account_name = "devstoreaccount1" azure_account_key = "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==" + table_suffix = uuid.uuid4().hex + table_engines = [ f"MySQL('mysql80:3306', 'mysql_db', 'mysql_table', 'mysql_user', '{password}')", f"PostgreSQL('postgres1:5432', 'postgres_db', 'postgres_table', 'postgres_user', '{password}')", @@ -279,11 +282,13 @@ def test_create_table():
f"IcebergS3('http://minio1:9001/root/data/test11.csv.gz', 'minio', '{password}')", "DNS_ERROR", ), + ( + f"Iceberg(storage_type='s3', 'http://minio1:9001/root/data/test11.csv.gz', 'minio', '{password}')", + "DNS_ERROR", + ), f"AzureBlobStorage('{azure_conn_string}', 'cont', 'test_simple.csv', 'CSV')", f"AzureBlobStorage('{azure_conn_string}', 'cont', 'test_simple_1.csv', 'CSV', 'none')", - f"AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_2.csv', '{azure_account_name}', '{azure_account_key}')", - f"AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_3.csv', '{azure_account_name}', '{azure_account_key}', 'CSV')", - f"AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_4.csv', '{azure_account_name}', '{azure_account_key}', 'CSV', 'none')", + f"AzureQueue('{azure_conn_string}', 'cont', '*', 'CSV') SETTINGS mode = 'unordered'", f"AzureQueue('{azure_conn_string}', 'cont', '*', 'CSV', 'none') SETTINGS mode = 'unordered'", f"AzureQueue('{azure_conn_string}', 'cont', '*', 'CSV') SETTINGS mode = 'unordered', after_processing = 'move', after_processing_move_connection_string = '{azure_conn_string}', after_processing_move_container = 'chprocessed'", @@ -294,6 +299,21 @@ def test_create_table(): f"AzureBlobStorage('BlobEndpoint=https://my-endpoint/;SharedAccessSignature=sp=r&st=2025-09-29T14:58:11Z&se=2025-09-29T00:00:00Z&spr=https&sv=2022-11-02&sr=c&sig=SECRET%SECRET%SECRET%SECRET', 'exampledatasets', 'example.csv')", "STD_EXCEPTION", ), + + f"AzureBlobStorage(named_collection_2, connection_string = '{azure_conn_string}', container = 'cont', blob_path = 'test_simple_7.csv', format = 'CSV')", + f"AzureBlobStorage(named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_8.csv', account_name = '{azure_account_name}', account_key = '{azure_account_key}')", + f"AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_3.csv', '{azure_account_name}', 
'{azure_account_key}')", + f"AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_4.csv', '{azure_account_name}', '{azure_account_key}', 'CSV')", + f"AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_5.csv', '{azure_account_name}', '{azure_account_key}', 'CSV', 'none')", + f"IcebergAzure('{azure_conn_string}', 'cont', 'test_simple_0_{table_suffix}.csv')", + f"IcebergAzure('{azure_storage_account_url}', 'cont', 'test_simple_1_{table_suffix}.csv', '{azure_account_name}', '{azure_account_key}')", + f"IcebergAzure(named_collection_2, connection_string = '{azure_conn_string}', container = 'cont', blob_path = 'test_simple_2_{table_suffix}.csv', format = 'CSV')", + f"IcebergAzure(named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_3_{table_suffix}.csv', account_name = '{azure_account_name}', account_key = '{azure_account_key}')", + f"Iceberg(storage_type='azure', '{azure_conn_string}', 'cont', 'test_simple_4_{table_suffix}.csv')", + f"Iceberg(storage_type='azure', '{azure_storage_account_url}', 'cont', 'test_simple_5_{table_suffix}.csv', '{azure_account_name}', '{azure_account_key}')", + f"Iceberg(storage_type='azure', named_collection_2, connection_string = '{azure_conn_string}', container = 'cont', blob_path = 'test_simple_6_{table_suffix}.csv', format = 'CSV')", + f"Iceberg(storage_type='azure', named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_7_{table_suffix}.csv', account_name = '{azure_account_name}', account_key = '{azure_account_key}')", + f"Kafka() SETTINGS kafka_broker_list = '127.0.0.1', kafka_topic_list = 'topic', kafka_group_name = 'group', kafka_format = 'JSONEachRow', kafka_security_protocol = 'sasl_ssl', kafka_sasl_mechanism = 'PLAIN', kafka_sasl_username = 'user', kafka_sasl_password = '{password}', format_avro_schema_registry_url = 'http://schema_user:{password}@'", f"Kafka() SETTINGS 
kafka_broker_list = '127.0.0.1', kafka_topic_list = 'topic', kafka_group_name = 'group', kafka_format = 'JSONEachRow', kafka_security_protocol = 'sasl_ssl', kafka_sasl_mechanism = 'PLAIN', kafka_sasl_username = 'user', kafka_sasl_password = '{password}', format_avro_schema_registry_url = 'http://schema_user:{password}@domain.com'", f"S3('http://minio1:9001/root/data/test5.csv.gz', 'CSV', access_key_id = 'minio', secret_access_key = '{password}', compression_method = 'gzip')", @@ -304,7 +324,7 @@ def test_create_table(): ] def make_test_case(i): - table_name = f"table{i}" + table_name = f"table{i}_{table_suffix}" table_engine = table_engines[i] error = None if isinstance(table_engine, tuple): @@ -323,18 +343,18 @@ def make_test_case(i): for toggle, secret in enumerate(["[HIDDEN]", password]): assert ( - node.query(f"SHOW CREATE TABLE table0 {show_secrets}={toggle}") - == "CREATE TABLE default.table0\\n(\\n `x` Int32\\n)\\n" + node.query(f"SHOW CREATE TABLE table0_{table_suffix} {show_secrets}={toggle}") + == f"CREATE TABLE default.table0_{table_suffix}\\n(\\n `x` Int32\\n)\\n" "ENGINE = MySQL(\\'mysql80:3306\\', \\'mysql_db\\', " f"\\'mysql_table\\', \\'mysql_user\\', \\'{secret}\\')\n" ) assert node.query( - f"SELECT create_table_query, engine_full FROM system.tables WHERE name = 'table0' {show_secrets}={toggle}" + f"SELECT create_table_query, engine_full FROM system.tables WHERE name = 'table0_{table_suffix}' {show_secrets}={toggle}" ) == TSV( [ [ - "CREATE TABLE default.table0 (`x` Int32) ENGINE = MySQL(\\'mysql80:3306\\', \\'mysql_db\\', " + f"CREATE TABLE default.table0_{table_suffix} (`x` Int32) ENGINE = MySQL(\\'mysql80:3306\\', \\'mysql_db\\', " f"\\'mysql_table\\', \\'mysql_user\\', \\'{secret}\\')", f"MySQL(\\'mysql80:3306\\', \\'mysql_db\\', \\'mysql_table\\', \\'mysql_user\\', \\'{secret}\\')", ], @@ -344,7 +364,7 @@ def make_test_case(i): create_table_statement_counter = 0 def generate_create_table_numbered(tail): nonlocal create_table_statement_counter 
- result = f"CREATE TABLE table{create_table_statement_counter} {tail}" + result = f"CREATE TABLE table{create_table_statement_counter}_{table_suffix} {tail}" create_table_statement_counter += 1 return result @@ -375,11 +395,9 @@ def generate_create_table_numbered(tail): generate_create_table_numbered("(`x` int) ENGINE = S3Queue('http://minio1:9001/root/data/', 'CSV') SETTINGS mode = 'ordered', after_processing = 'move', after_processing_move_uri = 'http://minio1:9001/chprocessed', after_processing_move_access_key_id = 'minio', after_processing_move_secret_access_key = '[HIDDEN]'"), generate_create_table_numbered("(`x` int) ENGINE = Iceberg('http://minio1:9001/root/data/test11.csv.gz', 'minio', '[HIDDEN]')"), generate_create_table_numbered("(`x` int) ENGINE = IcebergS3('http://minio1:9001/root/data/test11.csv.gz', 'minio', '[HIDDEN]')"), + generate_create_table_numbered("(`x` int) ENGINE = Iceberg(storage_type = 's3', 'http://minio1:9001/root/data/test11.csv.gz', 'minio', '[HIDDEN]')"), generate_create_table_numbered(f"(`x` int) ENGINE = AzureBlobStorage('{masked_azure_conn_string}', 'cont', 'test_simple.csv', 'CSV')"), generate_create_table_numbered(f"(`x` int) ENGINE = AzureBlobStorage('{masked_azure_conn_string}', 'cont', 'test_simple_1.csv', 'CSV', 'none')"), - generate_create_table_numbered(f"(`x` int) ENGINE = AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_2.csv', '{azure_account_name}', '[HIDDEN]')"), - generate_create_table_numbered(f"(`x` int) ENGINE = AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_3.csv', '{azure_account_name}', '[HIDDEN]', 'CSV')"), - generate_create_table_numbered(f"(`x` int) ENGINE = AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_4.csv', '{azure_account_name}', '[HIDDEN]', 'CSV', 'none')"), generate_create_table_numbered(f"(`x` int) ENGINE = AzureQueue('{masked_azure_conn_string}', 'cont', '*', 'CSV') SETTINGS mode = 'unordered'"), generate_create_table_numbered(f"(`x` 
int) ENGINE = AzureQueue('{masked_azure_conn_string}', 'cont', '*', 'CSV', 'none') SETTINGS mode = 'unordered'"), generate_create_table_numbered(f"(`x` int) ENGINE = AzureQueue('{masked_azure_conn_string}', 'cont', '*', 'CSV') SETTINGS mode = 'unordered', after_processing = 'move', after_processing_move_connection_string = '{masked_azure_conn_string}', after_processing_move_container = 'chprocessed'",), @@ -387,6 +405,19 @@ def generate_create_table_numbered(tail): generate_create_table_numbered(f"(`x` int) ENGINE = AzureQueue('{azure_storage_account_url}', 'cont', '*', '{azure_account_name}', '[HIDDEN]', 'CSV') SETTINGS mode = 'unordered'"), generate_create_table_numbered(f"(`x` int) ENGINE = AzureQueue('{azure_storage_account_url}', 'cont', '*', '{azure_account_name}', '[HIDDEN]', 'CSV', 'none') SETTINGS mode = 'unordered'"), generate_create_table_numbered(f"(`x` int) ENGINE = AzureBlobStorage('{masked_sas_conn_string}', 'exampledatasets', 'example.csv')"), + generate_create_table_numbered(f"(`x` int) ENGINE = AzureBlobStorage(named_collection_2, connection_string = '{masked_azure_conn_string}', container = 'cont', blob_path = 'test_simple_7.csv', format = 'CSV')"), + generate_create_table_numbered(f"(`x` int) ENGINE = AzureBlobStorage(named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_8.csv', account_name = '{azure_account_name}', account_key = '[HIDDEN]')"), + generate_create_table_numbered(f"(`x` int) ENGINE = AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_3.csv', '{azure_account_name}', '[HIDDEN]')"), + generate_create_table_numbered(f"(`x` int) ENGINE = AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_4.csv', '{azure_account_name}', '[HIDDEN]', 'CSV')"), + generate_create_table_numbered(f"(`x` int) ENGINE = AzureBlobStorage('{azure_storage_account_url}', 'cont', 'test_simple_5.csv', '{azure_account_name}', '[HIDDEN]', 'CSV', 'none')"), + 
generate_create_table_numbered(f"(`x` int) ENGINE = IcebergAzure('{masked_azure_conn_string}', 'cont', 'test_simple_0_{table_suffix}.csv')"), + generate_create_table_numbered(f"(`x` int) ENGINE = IcebergAzure('{azure_storage_account_url}', 'cont', 'test_simple_1_{table_suffix}.csv', '{azure_account_name}', '[HIDDEN]')"), + generate_create_table_numbered(f"(`x` int) ENGINE = IcebergAzure(named_collection_2, connection_string = '{masked_azure_conn_string}', container = 'cont', blob_path = 'test_simple_2_{table_suffix}.csv', format = 'CSV')"), + generate_create_table_numbered(f"(`x` int) ENGINE = IcebergAzure(named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_3_{table_suffix}.csv', account_name = '{azure_account_name}', account_key = '[HIDDEN]')"), + generate_create_table_numbered(f"(`x` int) ENGINE = Iceberg(storage_type = 'azure', '{masked_azure_conn_string}', 'cont', 'test_simple_4_{table_suffix}.csv')"), + generate_create_table_numbered(f"(`x` int) ENGINE = Iceberg(storage_type = 'azure', '{azure_storage_account_url}', 'cont', 'test_simple_5_{table_suffix}.csv', '{azure_account_name}', '[HIDDEN]')"), + generate_create_table_numbered(f"(`x` int) ENGINE = Iceberg(storage_type = 'azure', named_collection_2, connection_string = '{masked_azure_conn_string}', container = 'cont', blob_path = 'test_simple_6_{table_suffix}.csv', format = 'CSV')"), + generate_create_table_numbered(f"(`x` int) ENGINE = Iceberg(storage_type = 'azure', named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_7_{table_suffix}.csv', account_name = '{azure_account_name}', account_key = '[HIDDEN]')"), generate_create_table_numbered("(`x` int) ENGINE = Kafka SETTINGS kafka_broker_list = '127.0.0.1', kafka_topic_list = 'topic', kafka_group_name = 'group', kafka_format = 'JSONEachRow', kafka_security_protocol = 'sasl_ssl', kafka_sasl_mechanism = 'PLAIN', kafka_sasl_username 
= 'user', kafka_sasl_password = '[HIDDEN]', format_avro_schema_registry_url = 'http://schema_user:[HIDDEN]@'"), generate_create_table_numbered("(`x` int) ENGINE = Kafka SETTINGS kafka_broker_list = '127.0.0.1', kafka_topic_list = 'topic', kafka_group_name = 'group', kafka_format = 'JSONEachRow', kafka_security_protocol = 'sasl_ssl', kafka_sasl_mechanism = 'PLAIN', kafka_sasl_username = 'user', kafka_sasl_password = '[HIDDEN]', format_avro_schema_registry_url = 'http://schema_user:[HIDDEN]@domain.com'"), generate_create_table_numbered("(`x` int) ENGINE = S3('http://minio1:9001/root/data/test5.csv.gz', 'CSV', access_key_id = 'minio', secret_access_key = '[HIDDEN]', compression_method = 'gzip')"), @@ -514,9 +545,22 @@ def test_table_functions(): f"azureBlobStorage(named_collection_2, connection_string = '{azure_conn_string}', container = 'cont', blob_path = 'test_simple_7.csv', format = 'CSV')", f"azureBlobStorage(named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_8.csv', account_name = '{azure_account_name}', account_key = '{azure_account_key}')", f"iceberg('http://minio1:9001/root/data/test11.csv.gz', 'minio', '{password}')", - f"gcs('http://minio1:9001/root/data/test11.csv.gz', 'minio', '{password}')", + f"iceberg(named_collection_2, url = 'http://minio1:9001/root/data/test4.csv', access_key_id = 'minio', secret_access_key = '{password}')", f"icebergS3('http://minio1:9001/root/data/test11.csv.gz', 'minio', '{password}')", + f"icebergS3(named_collection_2, url = 'http://minio1:9001/root/data/test4.csv', access_key_id = 'minio', secret_access_key = '{password}')", + f"icebergAzure('{azure_conn_string}', 'cont', 'test_simple.csv')", + f"icebergAzure('{azure_storage_account_url}', 'cont', 'test_simple.csv', '{azure_account_name}', '{azure_account_key}')", f"icebergAzure('{azure_storage_account_url}', 'cont', 'test_simple_6.csv', '{azure_account_name}', '{azure_account_key}', 'CSV', 'none', 'auto')", + 
f"icebergAzure(named_collection_2, connection_string = '{azure_conn_string}', container = 'cont', blob_path = 'test_simple_7.csv', format = 'CSV')", + f"icebergAzure(named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_8.csv', account_name = '{azure_account_name}', account_key = '{azure_account_key}')", + f"iceberg(storage_type='s3', 'http://minio1:9001/root/data/test11.csv.gz', 'minio', '{password}')", + f"iceberg(storage_type='s3', named_collection_2, url = 'http://minio1:9001/root/data/test4.csv', access_key_id = 'minio', secret_access_key = '{password}')", + f"iceberg(storage_type='azure', '{azure_conn_string}', 'cont', 'test_simple.csv')", + f"iceberg(storage_type='azure', '{azure_storage_account_url}', 'cont', 'test_simple.csv', '{azure_account_name}', '{azure_account_key}')", + f"iceberg(storage_type='azure', '{azure_storage_account_url}', 'cont', 'test_simple_6.csv', '{azure_account_name}', '{azure_account_key}', 'CSV', 'none', 'auto')", + f"iceberg(storage_type='azure', named_collection_2, connection_string = '{azure_conn_string}', container = 'cont', blob_path = 'test_simple_7.csv', format = 'CSV')", + f"iceberg(storage_type='azure', named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_8.csv', account_name = '{azure_account_name}', account_key = '{azure_account_key}')", + f"gcs('http://minio1:9001/root/data/test11.csv.gz', 'minio', '{password}')", f"deltaLakeAzure('{azure_storage_account_url}', 'cont', 'test_simple_6.csv', '{azure_account_name}', '{azure_account_key}', 'CSV', 'none', 'auto')" if has_delta_lake else (f"deltaLakeAzure('{azure_storage_account_url}', 'cont', 'test_simple_6.csv', '{azure_account_name}', '{azure_account_key}', 'CSV', 'none', 'auto')", "UNKNOWN_FUNCTION"), f"hudi('http://minio1:9001/root/data/test7.csv', 'minio', '{password}')", f"arrowFlight('arrowflight1:5006', 'dataset', 'arrowflight_user', 
'{password}')", @@ -607,16 +651,29 @@ def make_test_case(i): f"CREATE TABLE tablefunc37 (`x` int) AS azureBlobStorage(named_collection_2, connection_string = '{masked_azure_conn_string}', container = 'cont', blob_path = 'test_simple_7.csv', format = 'CSV')", f"CREATE TABLE tablefunc38 (`x` int) AS azureBlobStorage(named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_8.csv', account_name = '{azure_account_name}', account_key = '[HIDDEN]')", "CREATE TABLE tablefunc39 (`x` int) AS iceberg('http://minio1:9001/root/data/test11.csv.gz', 'minio', '[HIDDEN]')", - "CREATE TABLE tablefunc40 (`x` int) AS gcs('http://minio1:9001/root/data/test11.csv.gz', 'minio', '[HIDDEN]')", + "CREATE TABLE tablefunc40 (`x` int) AS iceberg(named_collection_2, url = 'http://minio1:9001/root/data/test4.csv', access_key_id = 'minio', secret_access_key = '[HIDDEN]')", "CREATE TABLE tablefunc41 (`x` int) AS icebergS3('http://minio1:9001/root/data/test11.csv.gz', 'minio', '[HIDDEN]')", - f"CREATE TABLE tablefunc42 (`x` int) AS icebergAzure('{azure_storage_account_url}', 'cont', 'test_simple_6.csv', '{azure_account_name}', '[HIDDEN]', 'CSV', 'none', 'auto')", - f"CREATE TABLE tablefunc43 (`x` int) AS deltaLakeAzure('{azure_storage_account_url}', 'cont', 'test_simple_6.csv', '{azure_account_name}', '[HIDDEN]', 'CSV', 'none', 'auto')", - "CREATE TABLE tablefunc44 (`x` int) AS hudi('http://minio1:9001/root/data/test7.csv', 'minio', '[HIDDEN]')", - "CREATE TABLE tablefunc45 (`x` int) AS arrowFlight('arrowflight1:5006', 'dataset', 'arrowflight_user', '[HIDDEN]')", - "CREATE TABLE tablefunc46 (`x` int) AS arrowFlight(named_collection_1, host = 'arrowflight1', port = 5006, dataset = 'dataset', username = 'arrowflight_user', password = '[HIDDEN]')", - "CREATE TABLE tablefunc47 (`x` int) AS arrowflight(named_collection_1, host = 'arrowflight1', port = 5006, dataset = 'dataset', username = 'arrowflight_user', password = '[HIDDEN]')", - "CREATE 
TABLE tablefunc48 (`x` int) AS url('https://username:[HIDDEN]@domain.com/path', 'CSV')", - "CREATE TABLE tablefunc49 (`x` int) AS redis('localhost', 'key', 'key Int64', 0, '[HIDDEN]')", + "CREATE TABLE tablefunc42 (`x` int) AS icebergS3(named_collection_2, url = 'http://minio1:9001/root/data/test4.csv', access_key_id = 'minio', secret_access_key = '[HIDDEN]')", + f"CREATE TABLE tablefunc43 (`x` int) AS icebergAzure('{masked_azure_conn_string}', 'cont', 'test_simple.csv')", + f"CREATE TABLE tablefunc44 (`x` int) AS icebergAzure('{azure_storage_account_url}', 'cont', 'test_simple.csv', '{azure_account_name}', '[HIDDEN]')", + f"CREATE TABLE tablefunc45 (`x` int) AS icebergAzure('{azure_storage_account_url}', 'cont', 'test_simple_6.csv', '{azure_account_name}', '[HIDDEN]', 'CSV', 'none', 'auto')", + f"CREATE TABLE tablefunc46 (`x` int) AS icebergAzure(named_collection_2, connection_string = '{masked_azure_conn_string}', container = 'cont', blob_path = 'test_simple_7.csv', format = 'CSV')", + f"CREATE TABLE tablefunc47 (`x` int) AS icebergAzure(named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_8.csv', account_name = '{azure_account_name}', account_key = '[HIDDEN]')", + "CREATE TABLE tablefunc48 (`x` int) AS iceberg(storage_type = 's3', 'http://minio1:9001/root/data/test11.csv.gz', 'minio', '[HIDDEN]')", + "CREATE TABLE tablefunc49 (`x` int) AS iceberg(storage_type = 's3', named_collection_2, url = 'http://minio1:9001/root/data/test4.csv', access_key_id = 'minio', secret_access_key = '[HIDDEN]')", + f"CREATE TABLE tablefunc50 (`x` int) AS iceberg(storage_type = 'azure', '{masked_azure_conn_string}', 'cont', 'test_simple.csv')", + f"CREATE TABLE tablefunc51 (`x` int) AS iceberg(storage_type = 'azure', '{azure_storage_account_url}', 'cont', 'test_simple.csv', '{azure_account_name}', '[HIDDEN]')", + f"CREATE TABLE tablefunc52 (`x` int) AS iceberg(storage_type = 'azure', '{azure_storage_account_url}', 
'cont', 'test_simple_6.csv', '{azure_account_name}', '[HIDDEN]', 'CSV', 'none', 'auto')", + f"CREATE TABLE tablefunc53 (`x` int) AS iceberg(storage_type = 'azure', named_collection_2, connection_string = '{masked_azure_conn_string}', container = 'cont', blob_path = 'test_simple_7.csv', format = 'CSV')", + f"CREATE TABLE tablefunc54 (`x` int) AS iceberg(storage_type = 'azure', named_collection_2, storage_account_url = '{azure_storage_account_url}', container = 'cont', blob_path = 'test_simple_8.csv', account_name = '{azure_account_name}', account_key = '[HIDDEN]')", + "CREATE TABLE tablefunc55 (`x` int) AS gcs('http://minio1:9001/root/data/test11.csv.gz', 'minio', '[HIDDEN]')", + f"CREATE TABLE tablefunc56 (`x` int) AS deltaLakeAzure('{azure_storage_account_url}', 'cont', 'test_simple_6.csv', '{azure_account_name}', '[HIDDEN]', 'CSV', 'none', 'auto')", + "CREATE TABLE tablefunc57 (`x` int) AS hudi('http://minio1:9001/root/data/test7.csv', 'minio', '[HIDDEN]')", + "CREATE TABLE tablefunc58 (`x` int) AS arrowFlight('arrowflight1:5006', 'dataset', 'arrowflight_user', '[HIDDEN]')", + "CREATE TABLE tablefunc59 (`x` int) AS arrowFlight(named_collection_1, host = 'arrowflight1', port = 5006, dataset = 'dataset', username = 'arrowflight_user', password = '[HIDDEN]')", + "CREATE TABLE tablefunc60 (`x` int) AS arrowflight(named_collection_1, host = 'arrowflight1', port = 5006, dataset = 'dataset', username = 'arrowflight_user', password = '[HIDDEN]')", + "CREATE TABLE tablefunc61 (`x` int) AS url('https://username:[HIDDEN]@domain.com/path', 'CSV')", + "CREATE TABLE tablefunc62 (`x` int) AS redis('localhost', 'key', 'key Int64', 0, '[HIDDEN]')", ], must_not_contain=[password], ) diff --git a/tests/integration/test_s3_cluster/configs/cluster.xml b/tests/integration/test_s3_cluster/configs/cluster.xml index 84e6afd12f71..7f3dab539985 100644 --- a/tests/integration/test_s3_cluster/configs/cluster.xml +++ b/tests/integration/test_s3_cluster/configs/cluster.xml @@ -20,6 +20,20 @@ + 
+ + + + s0_0_1 + 9000 + + + s0_1_0 + 9000 + + + + @@ -49,6 +63,77 @@ + + + + c2.s0_0_0 + 9000 + + + c2.s0_0_1 + 9000 + + + + + + + + s0_0_1 + 9000 + foo + bar + + + s0_1_0 + 9000 + foo + bar + + + + + + baz + + + s0_0_1 + 9000 + foo + + + s0_1_0 + 9000 + foo + + + + + + + + s0_0_0 + 9000 + + + s0_0_1 + 9000 + + + s0_1_0 + 9000 + + + c2.s0_0_0 + 9000 + + + c2.s0_0_1 + 9000 + + + + cluster_simple diff --git a/tests/integration/test_s3_cluster/configs/hidden_clusters.xml b/tests/integration/test_s3_cluster/configs/hidden_clusters.xml new file mode 100644 index 000000000000..8816cca1c79b --- /dev/null +++ b/tests/integration/test_s3_cluster/configs/hidden_clusters.xml @@ -0,0 +1,20 @@ + + + + + + s0_0_1 + 9000 + foo + bar + + + s0_1_0 + 9000 + foo + bar + + + + + diff --git a/tests/integration/test_s3_cluster/configs/users.xml b/tests/integration/test_s3_cluster/configs/users.xml index 4b6ba057ecb1..95d2d329cac0 100644 --- a/tests/integration/test_s3_cluster/configs/users.xml +++ b/tests/integration/test_s3_cluster/configs/users.xml @@ -5,5 +5,9 @@ default 1 + + bar + default + diff --git a/tests/integration/test_s3_cluster/test.py b/tests/integration/test_s3_cluster/test.py index 76b8f0df2881..e017e3326029 100644 --- a/tests/integration/test_s3_cluster/test.py +++ b/tests/integration/test_s3_cluster/test.py @@ -2,6 +2,8 @@ import logging import os import shutil +import uuid + import time from email.errors import HeaderParseError @@ -91,6 +93,22 @@ def started_cluster(): macros={"replica": "replica1", "shard": "shard2"}, with_zookeeper=True, ) + cluster.add_instance( + "c2.s0_0_0", + main_configs=["configs/cluster.xml", "configs/named_collections.xml", "configs/hidden_clusters.xml"], + user_configs=["configs/users.xml"], + macros={"replica": "replica1", "shard": "shard1"}, + with_zookeeper=True, + stay_alive=True, + ) + cluster.add_instance( + "c2.s0_0_1", + main_configs=["configs/cluster.xml", "configs/named_collections.xml", "configs/hidden_clusters.xml"], + 
user_configs=["configs/users.xml"], + macros={"replica": "replica2", "shard": "shard1"}, + with_zookeeper=True, + stay_alive=True, + ) logging.info("Starting cluster...") cluster.start() @@ -234,6 +252,21 @@ def test_wrong_cluster(started_cluster): assert "not found" in error + error = node.query_and_get_error( + f""" + SELECT count(*) from s3( + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', + 'minio', '{minio_secret_key}', 'CSV', 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') + UNION ALL + SELECT count(*) from s3( + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', + 'minio', '{minio_secret_key}', 'CSV', 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') + SETTINGS object_storage_cluster = 'non_existing_cluster' + """ + ) + + assert "not found" in error + def test_ambiguous_join(started_cluster): node = started_cluster.instances["s0_0_0"] @@ -252,6 +285,20 @@ def test_ambiguous_join(started_cluster): ) assert "AMBIGUOUS_COLUMN_NAME" not in result + result = node.query( + f""" + SELECT l.name, r.value from s3( + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') as l + JOIN s3( + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') as r + ON l.name = r.name + SETTINGS object_storage_cluster = 'cluster_simple' + """ + ) + assert "AMBIGUOUS_COLUMN_NAME" not in result + def test_skip_unavailable_shards(started_cluster): node = started_cluster.instances["s0_0_0"] @@ -267,6 +314,17 @@ def test_skip_unavailable_shards(started_cluster): assert result == "10\n" + result = node.query( + f""" + SELECT count(*) from s3( + 'http://minio1:9001/root/data/clickhouse/part1.csv', + 'minio', '{minio_secret_key}', 'CSV', 'name String, value UInt32, polygon 
Array(Array(Tuple(Float64, Float64)))') + SETTINGS skip_unavailable_shards = 1, object_storage_cluster = 'cluster_non_existent_port' + """ + ) + + assert result == "10\n" + def test_unset_skip_unavailable_shards(started_cluster): # Although skip_unavailable_shards is not set, cluster table functions should always skip unavailable shards. @@ -282,6 +340,17 @@ def test_unset_skip_unavailable_shards(started_cluster): assert result == "10\n" + result = node.query( + f""" + SELECT count(*) from s3( + 'http://minio1:9001/root/data/clickhouse/part1.csv', + 'minio', '{minio_secret_key}', 'CSV', 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') + SETTINGS object_storage_cluster = 'cluster_non_existent_port' + """ + ) + + assert result == "10\n" + def test_distributed_insert_select_with_replicated(started_cluster): first_replica_first_shard = started_cluster.instances["s0_0_0"] @@ -462,6 +531,18 @@ def test_cluster_format_detection(started_cluster): assert result == expected_result + result = node.query( + f"SELECT * FROM s3('http://minio1:9001/root/data/generated/*', 'minio', '{minio_secret_key}') order by c1, c2 SETTINGS object_storage_cluster = 'cluster_simple'" + ) + + assert result == expected_result + + result = node.query( + f"SELECT * FROM s3('http://minio1:9001/root/data/generated/*', 'minio', '{minio_secret_key}', auto, 'a String, b UInt64') order by a, b SETTINGS object_storage_cluster = 'cluster_simple'" + ) + + assert result == expected_result + def test_cluster_default_expression(started_cluster): node = started_cluster.instances["s0_0_0"] @@ -509,3 +590,374 @@ def test_cluster_default_expression(started_cluster): ) assert result == expected_result + + result = node.query( + f"SELECT * FROM s3('http://minio1:9001/root/data/data{{1,2,3}}', 'minio', '{minio_secret_key}', 'JSONEachRow', 'id UInt32, date Date DEFAULT 18262') order by id SETTINGS object_storage_cluster = 'cluster_simple'" + ) + + assert result == expected_result + + result 
= node.query( + f"SELECT * FROM s3('http://minio1:9001/root/data/data{{1,2,3}}', 'minio', '{minio_secret_key}', 'auto', 'id UInt32, date Date DEFAULT 18262') order by id SETTINGS object_storage_cluster = 'cluster_simple'" + ) + + assert result == expected_result + + result = node.query( + f"SELECT * FROM s3('http://minio1:9001/root/data/data{{1,2,3}}', 'minio', '{minio_secret_key}', 'JSONEachRow', 'id UInt32, date Date DEFAULT 18262', 'auto') order by id SETTINGS object_storage_cluster = 'cluster_simple'" + ) + + assert result == expected_result + + result = node.query( + f"SELECT * FROM s3('http://minio1:9001/root/data/data{{1,2,3}}', 'minio', '{minio_secret_key}', 'auto', 'id UInt32, date Date DEFAULT 18262', 'auto') order by id SETTINGS object_storage_cluster = 'cluster_simple'" + ) + + assert result == expected_result + + result = node.query( + "SELECT * FROM s3(test_s3_with_default) order by id SETTINGS object_storage_cluster = 'cluster_simple'" + ) + + assert result == expected_result + + +def test_distributed_s3_table_engine(started_cluster): + node = started_cluster.instances["s0_0_0"] + + resp_def = node.query( + f""" + SELECT * from s3Cluster( + 'cluster_simple', + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') ORDER BY (name, value, polygon) + """ + ) + + node.query("DROP TABLE IF EXISTS single_node"); + node.query( + f""" + CREATE TABLE single_node + (name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))) + ENGINE=S3('http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV') + """ + ) + query_id_engine_single_node = str(uuid.uuid4()) + resp_engine_single_node = node.query( + """ + SELECT * FROM single_node ORDER BY (name, value, polygon) + """, + query_id = query_id_engine_single_node + ) + assert resp_def == resp_engine_single_node + + node.query("DROP TABLE IF EXISTS 
distributed"); + node.query( + f""" + CREATE TABLE distributed + (name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))) + ENGINE=S3('http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV') + SETTINGS object_storage_cluster='cluster_simple' + """ + ) + query_id_engine_distributed = str(uuid.uuid4()) + resp_engine_distributed = node.query( + """ + SELECT * FROM distributed ORDER BY (name, value, polygon) + """, + query_id = query_id_engine_distributed + ) + assert resp_def == resp_engine_distributed + + node.query("SYSTEM FLUSH LOGS ON CLUSTER 'cluster_simple'") + + hosts_engine_single_node = node.query( + f""" + SELECT uniq(hostname) + FROM clusterAllReplicas('cluster_simple', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id_engine_single_node}' + """ + ) + assert int(hosts_engine_single_node) == 1 + hosts_engine_distributed = node.query( + f""" + SELECT uniq(hostname) + FROM clusterAllReplicas('cluster_simple', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id_engine_distributed}' + """ + ) + assert int(hosts_engine_distributed) == 3 + + +def test_cluster_hosts_limit(started_cluster): + node = started_cluster.instances["s0_0_0"] + + query_id_def = str(uuid.uuid4()) + resp_def = node.query( + f""" + SELECT * from s3Cluster( + 'cluster_simple', + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') ORDER BY (name, value, polygon) + """, + query_id = query_id_def + ) + + # object_storage_max_nodes is greater than the number of hosts in the cluster + query_id_4_hosts = str(uuid.uuid4()) + resp_4_hosts = node.query( + f""" + SELECT * from s3Cluster( + 'cluster_simple', + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') ORDER 
BY (name, value, polygon) + SETTINGS object_storage_max_nodes=4 + """, + query_id = query_id_4_hosts + ) + assert resp_def == resp_4_hosts + + # object_storage_max_nodes is equal to the number of hosts in the cluster + query_id_3_hosts = str(uuid.uuid4()) + resp_3_hosts = node.query( + f""" + SELECT * from s3Cluster( + 'cluster_simple', + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') ORDER BY (name, value, polygon) + SETTINGS object_storage_max_nodes=3 + """, + query_id = query_id_3_hosts + ) + assert resp_def == resp_3_hosts + + # object_storage_max_nodes is less than the number of hosts in the cluster + query_id_2_hosts = str(uuid.uuid4()) + resp_2_hosts = node.query( + f""" + SELECT * from s3Cluster( + 'cluster_simple', + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') ORDER BY (name, value, polygon) + SETTINGS object_storage_max_nodes=2 + """, + query_id = query_id_2_hosts + ) + assert resp_def == resp_2_hosts + + node.query("SYSTEM FLUSH LOGS ON CLUSTER 'cluster_simple'") + + hosts_def = node.query( + f""" + SELECT uniq(hostname) + FROM clusterAllReplicas('cluster_simple', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id_def}' AND query_id!='{query_id_def}' + """ + ) + assert int(hosts_def) == 3 + + hosts_4 = node.query( + f""" + SELECT uniq(hostname) + FROM clusterAllReplicas('cluster_simple', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id_4_hosts}' AND query_id!='{query_id_4_hosts}' + """ + ) + assert int(hosts_4) == 3 + + hosts_3 = node.query( + f""" + SELECT uniq(hostname) + FROM clusterAllReplicas('cluster_simple', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id_3_hosts}' AND query_id!='{query_id_3_hosts}' + """ + ) + assert int(hosts_3) 
== 3 + + hosts_2 = node.query( + f""" + SELECT uniq(hostname) + FROM clusterAllReplicas('cluster_simple', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id_2_hosts}' AND query_id!='{query_id_2_hosts}' + """ + ) + assert int(hosts_2) == 2 + + +def test_object_storage_remote_initiator(started_cluster): + node = started_cluster.instances["s0_0_0"] + + # Simple cluster + query_id = uuid.uuid4().hex + result = node.query( + f""" + SELECT * from s3Cluster( + 'cluster_remote', + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') ORDER BY (name, value, polygon) + SETTINGS object_storage_remote_initiator=1 + """, + query_id = query_id, + ) + + assert result is not None + + node.query("SYSTEM FLUSH LOGS ON CLUSTER 'cluster_all'") + queries = node.query( + f""" + SELECT count() + FROM clusterAllReplicas('cluster_all', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id}' + FORMAT TSV + """ + ).splitlines() + + # initial node + describe table + remote initiator + 2 subqueries on replicas + assert queries == ["5"] + + # Cluster with dots in the host names + query_id = uuid.uuid4().hex + result = node.query( + f""" + SELECT * from s3Cluster( + 'cluster_with_dots', + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') ORDER BY (name, value, polygon) + SETTINGS object_storage_remote_initiator=1 + """, + query_id = query_id, + ) + + assert result is not None + + node.query("SYSTEM FLUSH LOGS ON CLUSTER 'cluster_all'") + queries = node.query( + f""" + SELECT count() + FROM clusterAllReplicas('cluster_all', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id}' + FORMAT TSV + """ + ).splitlines() + + # initial node + describe table + remote initiator + 2 subqueries on 
replicas + assert queries == ["5"] + + users = node.query( + f""" + SELECT DISTINCT hostname, user + FROM clusterAllReplicas('cluster_all', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id}' + ORDER BY ALL + FORMAT TSV + """ + ).splitlines() + + assert users == ["c2.s0_0_0\tdefault", + "c2.s0_0_1\tdefault", + "s0_0_0\tdefault"] + + # Cluster with user and password + query_id = uuid.uuid4().hex + result = node.query( + f""" + SELECT * from s3Cluster( + 'cluster_with_username_and_password', + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') ORDER BY (name, value, polygon) + SETTINGS object_storage_remote_initiator=1 + """, + query_id = query_id, + ) + + assert result is not None + + node.query("SYSTEM FLUSH LOGS ON CLUSTER 'cluster_all'") + queries = node.query( + f""" + SELECT count() + FROM clusterAllReplicas('cluster_all', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id}' + FORMAT TSV + """ + ).splitlines() + + # initial node + describe table + remote initiator + 2 subqueries on replicas + assert queries == ["5"] + + users = node.query( + f""" + SELECT DISTINCT hostname, user + FROM clusterAllReplicas('cluster_all', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id}' + ORDER BY ALL + FORMAT TSV + """ + ).splitlines() + + assert users == ["s0_0_0\tdefault", + "s0_0_1\tfoo", + "s0_1_0\tfoo"] + + # Cluster with secret + query_id = uuid.uuid4().hex + result = node.query_and_get_error( + f""" + SELECT * from s3Cluster( + 'cluster_with_secret', + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') ORDER BY (name, value, polygon) + SETTINGS object_storage_remote_initiator=1 + """, + query_id = query_id, + ) + + assert "Can't convert query to 
remote when cluster uses secret" in result + + # Different cluster for remote initiator and query execution + # with `hidden_cluster_with_username_and_password` existed only in `cluster_with_dots` nodes + query_id = uuid.uuid4().hex + + result = node.query( + f""" + SELECT * from s3( + 'http://minio1:9001/root/data/{{clickhouse,database}}/*', 'minio', '{minio_secret_key}', 'CSV', + 'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))') ORDER BY (name, value, polygon) + SETTINGS + object_storage_remote_initiator=1, + object_storage_cluster='hidden_cluster_with_username_and_password', + object_storage_remote_initiator_cluster='cluster_with_dots' + """, + query_id = query_id, + ) + + assert result is not None + + node.query("SYSTEM FLUSH LOGS ON CLUSTER 'cluster_all'") + queries = node.query( + f""" + SELECT count() + FROM clusterAllReplicas('cluster_all', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id}' + FORMAT TSV + """ + ).splitlines() + + # initial node + describe table + remote initiator + 2 subqueries on replicas + assert queries == ["5"] + + users = node.query( + f""" + SELECT DISTINCT hostname, user + FROM clusterAllReplicas('cluster_all', system.query_log) + WHERE type='QueryFinish' AND initial_query_id='{query_id}' + ORDER BY ALL + FORMAT TSV + """ + ).splitlines() + + # Random host from 'cluster_with_dots' for remote query + assert users[0] in ["c2.s0_0_0\tdefault", "c2.s0_0_1\tdefault"] + assert users[1:] == ["s0_0_0\tdefault", + "s0_0_1\tfoo", + "s0_1_0\tfoo"] diff --git a/tests/integration/test_storage_iceberg_no_spark/configs/config.d/named_collections.xml b/tests/integration/test_storage_iceberg_no_spark/configs/config.d/named_collections.xml index 516e4ba63a3a..7dfec41b2df8 100644 --- a/tests/integration/test_storage_iceberg_no_spark/configs/config.d/named_collections.xml +++ b/tests/integration/test_storage_iceberg_no_spark/configs/config.d/named_collections.xml @@ -11,5 +11,19 @@ + + 
http://minio1:9001/root/ + minio + ClickHouse_Minio_P@ssw0rd + s3 + + + devstoreaccount1 + Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw== + azure + + + local + diff --git a/tests/integration/test_storage_iceberg_with_spark/configs/config.d/named_collections.xml b/tests/integration/test_storage_iceberg_with_spark/configs/config.d/named_collections.xml index 516e4ba63a3a..7dfec41b2df8 100644 --- a/tests/integration/test_storage_iceberg_with_spark/configs/config.d/named_collections.xml +++ b/tests/integration/test_storage_iceberg_with_spark/configs/config.d/named_collections.xml @@ -11,5 +11,19 @@ + + http://minio1:9001/root/ + minio + ClickHouse_Minio_P@ssw0rd + s3 + + + devstoreaccount1 + Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw== + azure + + + local + diff --git a/tests/integration/test_storage_iceberg_with_spark/test_cluster_table_function.py b/tests/integration/test_storage_iceberg_with_spark/test_cluster_table_function.py index 39dae6e5fbd9..424373b32f86 100644 --- a/tests/integration/test_storage_iceberg_with_spark/test_cluster_table_function.py +++ b/tests/integration/test_storage_iceberg_with_spark/test_cluster_table_function.py @@ -18,9 +18,30 @@ from helpers.config_cluster import minio_secret_key +def count_secondary_subqueries(started_cluster, query_id, expected, comment): + for node_name, replica in started_cluster.instances.items(): + cluster_secondary_queries = ( + replica.query( + f""" + SELECT count(*) FROM system.query_log + WHERE + type = 'QueryFinish' + AND NOT is_initial_query + AND initial_query_id='{query_id}' + """ + ) + .strip() + ) + + logging.info( + f"[{node_name}] cluster_secondary_queries {comment}: {cluster_secondary_queries}" + ) + assert int(cluster_secondary_queries) == expected + @pytest.mark.parametrize("format_version", ["1", "2"]) @pytest.mark.parametrize("storage_type", ["s3", "azure", "local"]) -def 
test_cluster_table_function(started_cluster_iceberg_with_spark, format_version, storage_type): +@pytest.mark.parametrize("cluster_name_as_literal", [True, False]) +def test_cluster_table_function(started_cluster_iceberg_with_spark, format_version, storage_type, cluster_name_as_literal): instance = started_cluster_iceberg_with_spark.instances["node1"] spark = started_cluster_iceberg_with_spark.spark_session @@ -78,59 +99,159 @@ def add_df(mode): # Regular Query only node1 table_function_expr = get_creation_expression( - storage_type, TABLE_NAME, started_cluster_iceberg_with_spark, table_function=True + storage_type, TABLE_NAME, started_cluster_iceberg_with_spark, table_function=True, cluster_name_as_literal=cluster_name_as_literal ) select_regular = ( instance.query(f"SELECT * FROM {table_function_expr}").strip().split() ) + def make_query_from_function( + run_on_cluster=False, + alt_syntax=False, + remote=False, + storage_type_as_arg=False, + storage_type_in_named_collection=False, + ): + expr = get_creation_expression( + storage_type, + TABLE_NAME, + started_cluster_iceberg_with_spark, + table_function=True, + run_on_cluster=run_on_cluster, + storage_type_as_arg=storage_type_as_arg, + storage_type_in_named_collection=storage_type_in_named_collection, + cluster_name_as_literal=cluster_name_as_literal, + ) + query_id = str(uuid.uuid4()) + settings = "SETTINGS object_storage_cluster='cluster_simple'" if (alt_syntax and not run_on_cluster) else "" + if remote: + query = f"SELECT * FROM remote('node2', {expr}) {settings}" + else: + query = f"SELECT * FROM {expr} {settings}" + response = instance.query(query, query_id=query_id).strip().split() + return response, query_id + # Cluster Query with node1 as coordinator - table_function_expr_cluster = get_creation_expression( - storage_type, - TABLE_NAME, - started_cluster_iceberg_with_spark, - table_function=True, + select_cluster, query_id_cluster = make_query_from_function(run_on_cluster=True) + + # Cluster Query with 
node1 as coordinator with alternative syntax + select_cluster_alt_syntax, query_id_cluster_alt_syntax = make_query_from_function( + run_on_cluster=True, + alt_syntax=True) + + # Cluster Query with node1 as coordinator and storage type as arg + select_cluster_with_type_arg, query_id_cluster_with_type_arg = make_query_from_function( run_on_cluster=True, + storage_type_as_arg=True, ) - select_cluster = ( - instance.query(f"SELECT * FROM {table_function_expr_cluster}").strip().split() + + # Cluster Query with node1 as coordinator and storage type in named collection + select_cluster_with_type_in_nc, query_id_cluster_with_type_in_nc = make_query_from_function( + run_on_cluster=True, + storage_type_in_named_collection=True, + ) + + # Cluster Query with node1 as coordinator and storage type as arg, alternative syntax + select_cluster_with_type_arg_alt_syntax, query_id_cluster_with_type_arg_alt_syntax = make_query_from_function( + storage_type_as_arg=True, + alt_syntax=True, ) + # Cluster Query with node1 as coordinator and storage type in named collection, alternative syntax + select_cluster_with_type_in_nc_alt_syntax, query_id_cluster_with_type_in_nc_alt_syntax = make_query_from_function( + storage_type_in_named_collection=True, + alt_syntax=True, + ) + + #select_remote_cluster, _ = make_query_from_function(run_on_cluster=True, remote=True) + + def make_query_from_table(alt_syntax=False): + query_id = str(uuid.uuid4()) + settings = "SETTINGS object_storage_cluster='cluster_simple'" if alt_syntax else "" + response = ( + instance.query( + f"SELECT * FROM {TABLE_NAME} {settings}", + query_id=query_id, + ) + .strip() + .split() + ) + return response, query_id + + create_iceberg_table(storage_type, instance, TABLE_NAME, started_cluster_iceberg_with_spark, object_storage_cluster='cluster_simple') + select_cluster_table_engine, query_id_cluster_table_engine = make_query_from_table() + + #select_remote_cluster = ( + # instance.query(f"SELECT * FROM 
remote('node2',{table_function_expr_cluster})") + # .strip() + # .split() + #) + + instance.query(f"DROP TABLE IF EXISTS `{TABLE_NAME}` SYNC") + + create_iceberg_table(storage_type, instance, TABLE_NAME, started_cluster_iceberg_with_spark) + select_pure_table_engine, query_id_pure_table_engine = make_query_from_table() + select_pure_table_engine_cluster, query_id_pure_table_engine_cluster = make_query_from_table(alt_syntax=True) + + create_iceberg_table(storage_type, instance, TABLE_NAME, started_cluster_iceberg_with_spark, storage_type_as_arg=True) + select_pure_table_engine_with_type_arg, query_id_pure_table_engine_with_type_arg = make_query_from_table() + select_pure_table_engine_cluster_with_type_arg, query_id_pure_table_engine_cluster_with_type_arg = make_query_from_table(alt_syntax=True) + + create_iceberg_table(storage_type, instance, TABLE_NAME, started_cluster_iceberg_with_spark, storage_type_in_named_collection=True) + select_pure_table_engine_with_type_in_nc, query_id_pure_table_engine_with_type_in_nc = make_query_from_table() + select_pure_table_engine_cluster_with_type_in_nc, query_id_pure_table_engine_cluster_with_type_in_nc = make_query_from_table(alt_syntax=True) + # Simple size check assert len(select_regular) == 600 assert len(select_cluster) == 600 + assert len(select_cluster_alt_syntax) == 600 + assert len(select_cluster_table_engine) == 600 + #assert len(select_remote_cluster) == 600 + assert len(select_cluster_with_type_arg) == 600 + assert len(select_cluster_with_type_in_nc) == 600 + assert len(select_cluster_with_type_arg_alt_syntax) == 600 + assert len(select_cluster_with_type_in_nc_alt_syntax) == 600 + assert len(select_pure_table_engine) == 600 + assert len(select_pure_table_engine_cluster) == 600 + assert len(select_pure_table_engine_with_type_arg) == 600 + assert len(select_pure_table_engine_cluster_with_type_arg) == 600 + assert len(select_pure_table_engine_with_type_in_nc) == 600 + assert 
len(select_pure_table_engine_cluster_with_type_in_nc) == 600 # Actual check assert select_cluster == select_regular + assert select_cluster_alt_syntax == select_regular + assert select_cluster_table_engine == select_regular + #assert select_remote_cluster == select_regular + assert select_cluster_with_type_arg == select_regular + assert select_cluster_with_type_in_nc == select_regular + assert select_cluster_with_type_arg_alt_syntax == select_regular + assert select_cluster_with_type_in_nc_alt_syntax == select_regular + assert select_pure_table_engine == select_regular + assert select_pure_table_engine_cluster == select_regular + assert select_pure_table_engine_with_type_arg == select_regular + assert select_pure_table_engine_cluster_with_type_arg == select_regular + assert select_pure_table_engine_with_type_in_nc == select_regular + assert select_pure_table_engine_cluster_with_type_in_nc == select_regular # Check query_log for replica in started_cluster_iceberg_with_spark.instances.values(): replica.query("SYSTEM FLUSH LOGS") - for node_name, replica in started_cluster_iceberg_with_spark.instances.items(): - cluster_secondary_queries = ( - replica.query( - f""" - SELECT query, type, is_initial_query, read_rows, read_bytes FROM system.query_log - WHERE - type = 'QueryStart' AND - positionCaseInsensitive(query, '{storage_type}Cluster') != 0 AND - position(query, '{TABLE_NAME}') != 0 AND - position(query, 'system.query_log') = 0 AND - NOT is_initial_query - """ - ) - .strip() - .split("\n") - ) - - logging.info( - f"[{node_name}] cluster_secondary_queries: {cluster_secondary_queries}" - ) - assert len(cluster_secondary_queries) == 1 + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_cluster, 1, "table function") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_cluster_alt_syntax, 1, "table function alt syntax") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_cluster_table_engine, 1, "cluster 
table engine") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_cluster_with_type_arg, 1, "table function with storage type in args") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_cluster_with_type_in_nc, 1, "table function with storage type in named collection") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_cluster_with_type_arg_alt_syntax, 1, "table function with storage type in args alt syntax") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_cluster_with_type_in_nc_alt_syntax, 1, "table function with storage type in named collection alt syntax") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_pure_table_engine, 0, "table engine") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_pure_table_engine_cluster, 1, "table engine with cluster setting") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_pure_table_engine_with_type_arg, 0, "table engine with storage type in args") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_pure_table_engine_cluster_with_type_arg, 1, "table engine with cluster setting with storage type in args") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_pure_table_engine_with_type_in_nc, 0, "table engine with storage type in named collection") + count_secondary_subqueries(started_cluster_iceberg_with_spark, query_id_pure_table_engine_cluster_with_type_in_nc, 1, "table engine with cluster setting with storage type in named collection") - # write 3 times - assert int(instance.query(f"SELECT count() FROM {table_function_expr_cluster}")) == 100 * 3 @pytest.mark.parametrize("format_version", ["1", "2"]) @pytest.mark.parametrize("storage_type", ["s3", "azure"]) diff --git a/tests/integration/test_storage_iceberg_with_spark/test_minmax_pruning_with_null.py 
b/tests/integration/test_storage_iceberg_with_spark/test_minmax_pruning_with_null.py index ceb630acbd73..93ba2f765914 100644 --- a/tests/integration/test_storage_iceberg_with_spark/test_minmax_pruning_with_null.py +++ b/tests/integration/test_storage_iceberg_with_spark/test_minmax_pruning_with_null.py @@ -9,7 +9,10 @@ ) @pytest.mark.parametrize("storage_type", ["s3", "azure", "local"]) -def test_minmax_pruning_with_null(started_cluster_iceberg_with_spark, storage_type): +@pytest.mark.parametrize("run_on_cluster", [False, True]) +def test_minmax_pruning_with_null(started_cluster_iceberg_with_spark, storage_type, run_on_cluster): + if run_on_cluster and storage_type == "local": + pytest.skip("Local storage is not supported on cluster") instance = started_cluster_iceberg_with_spark.instances["node1"] spark = started_cluster_iceberg_with_spark.spark_session TABLE_NAME = "test_minmax_pruning_with_null" + storage_type + "_" + get_uuid_str() @@ -21,6 +24,7 @@ def execute_spark_query(query: str): storage_type, TABLE_NAME, query, + additional_nodes=["node2", "node3"] if storage_type=="local" else [], ) execute_spark_query( @@ -79,7 +83,7 @@ def execute_spark_query(query: str): ) creation_expression = get_creation_expression( - storage_type, TABLE_NAME, started_cluster_iceberg_with_spark, table_function=True + storage_type, TABLE_NAME, started_cluster_iceberg_with_spark, table_function=True, run_on_cluster=run_on_cluster ) def check_validity_and_get_prunned_files(select_expression): diff --git a/tests/integration/test_storage_iceberg_with_spark/test_partition_pruning.py b/tests/integration/test_storage_iceberg_with_spark/test_partition_pruning.py index 6ade42e72537..4c6a6b4c7bd7 100644 --- a/tests/integration/test_storage_iceberg_with_spark/test_partition_pruning.py +++ b/tests/integration/test_storage_iceberg_with_spark/test_partition_pruning.py @@ -9,7 +9,7 @@ @pytest.mark.parametrize( "storage_type, run_on_cluster", - [("s3", False), ("s3", True), ("azure", False), 
("local", False), ("local", True)], + [("s3", False), ("s3", True), ("azure", False), ("azure", True), ("local", False), ("local", True)], ) def test_partition_pruning(started_cluster_iceberg_with_spark, storage_type, run_on_cluster): instance = started_cluster_iceberg_with_spark.instances["node1"] diff --git a/tests/integration/test_storage_iceberg_with_spark/test_types.py b/tests/integration/test_storage_iceberg_with_spark/test_types.py index 7f63df522db1..1dd605098279 100644 --- a/tests/integration/test_storage_iceberg_with_spark/test_types.py +++ b/tests/integration/test_storage_iceberg_with_spark/test_types.py @@ -86,3 +86,49 @@ def test_types(started_cluster_iceberg_with_spark, format_version, storage_type) ["e", "Nullable(Bool)"], ] ) + + # Test storage type as function argument + table_function_expr = get_creation_expression( + storage_type, + TABLE_NAME, + started_cluster_iceberg_with_spark, + table_function=True, + storage_type_as_arg=True, + ) + assert ( + instance.query(f"SELECT a, b, c, d, e FROM {table_function_expr}").strip() + == "123\tstring\t2000-01-01\t['str1','str2']\ttrue" + ) + + assert instance.query(f"DESCRIBE {table_function_expr} FORMAT TSV") == TSV( + [ + ["a", "Nullable(Int32)"], + ["b", "Nullable(String)"], + ["c", "Nullable(Date32)"], + ["d", "Array(Nullable(String))"], + ["e", "Nullable(Bool)"], + ] + ) + + # Test storage type as field in named collection + table_function_expr = get_creation_expression( + storage_type, + TABLE_NAME, + started_cluster_iceberg_with_spark, + table_function=True, + storage_type_in_named_collection=True, + ) + assert ( + instance.query(f"SELECT a, b, c, d, e FROM {table_function_expr}").strip() + == "123\tstring\t2000-01-01\t['str1','str2']\ttrue" + ) + + assert instance.query(f"DESCRIBE {table_function_expr} FORMAT TSV") == TSV( + [ + ["a", "Nullable(Int32)"], + ["b", "Nullable(String)"], + ["c", "Nullable(Date32)"], + ["d", "Array(Nullable(String))"], + ["e", "Nullable(Bool)"], + ] + ) diff --git 
a/tests/queries/0_stateless/01625_constraints_index_append.reference b/tests/queries/0_stateless/01625_constraints_index_append.reference index b68b514ca8bd..bf6f37328286 100644 --- a/tests/queries/0_stateless/01625_constraints_index_append.reference +++ b/tests/queries/0_stateless/01625_constraints_index_append.reference @@ -13,14 +13,14 @@ Prewhere info Prewhere filter Prewhere filter column: less(multiply(2, b), 100) - Filter column: and(equals(a, 0), indexHint(greater(plus(i, 40), 0))) (removed) + Filter column: and(indexHint(greater(plus(i, 40), 0)), equals(a, 0)) (removed) Prewhere info Prewhere filter Prewhere filter column: equals(a, 0) Prewhere info Prewhere filter Prewhere filter column: less(a, 0) (removed) - Filter column: and(greaterOrEquals(a, 0), indexHint(greater(plus(i, 40), 0))) (removed) + Filter column: and(indexHint(greater(plus(i, 40), 0)), greaterOrEquals(a, 0)) (removed) Prewhere info Prewhere filter Prewhere filter column: greaterOrEquals(a, 0)