CloudDataCatalogSearchCatalogOperator

Google

Searches Data Catalog for multiple resources, such as entries and tags, that match a query.

Last Updated: Feb. 25, 2023

Access Instructions

Install the Google provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
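As a rough illustration of the two steps above, the sketch below imports the operator from the Google provider package and instantiates it. The task ID, project ID, and query are hypothetical placeholders, and the import is kept inside the helper function only so the snippet loads on machines where the provider is not installed; in a real DAG file the import would normally sit at the top.

```python
# Hypothetical arguments: replace the project ID and query with your own.
SEARCH_ARGS = {
    "task_id": "search_catalog",
    # Dict form of the Scope protobuf message.
    "scope": {"include_project_ids": ["my-project"]},
    "query": "name:orders",
}


def build_search_task(dag=None):
    """Create the search task; requires apache-airflow-providers-google."""
    # Imported lazily so this sketch can be loaded without the provider.
    from airflow.providers.google.cloud.operators.datacatalog import (
        CloudDataCatalogSearchCatalogOperator,
    )

    return CloudDataCatalogSearchCatalogOperator(dag=dag, **SEARCH_ARGS)
```

Calling `build_search_task(dag)` inside a DAG definition would attach the task to that DAG; the remaining parameters keep their defaults.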

Parameters

scope (required): The scope of this search request. If a dict is provided, it must be of the same form as the protobuf message Scope.
query (required): The query string in search query syntax. The query must be non-empty. Query strings can be as simple as “x” or more qualified, such as name:x, column:x, or description:y. Note: Query tokens need to have a minimum of 3 characters for substring matching to work correctly. See Data Catalog Search Syntax for more information.
page_size: The maximum number of resources contained in the underlying API response. If page streaming is performed per-resource, this parameter does not affect the return value. If page streaming is performed per-page, this determines the maximum number of resources in a page.
order_by: Specifies the ordering of results. The currently supported case-sensitive choices are: relevance (descending only); last_access_timestamp [asc|desc] (defaults to descending if the direction is not specified); last_modified_timestamp [asc|desc] (defaults to descending if the direction is not specified). If order_by is not specified, results are ordered by relevance, descending.
retry: A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.
timeout: The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata: Additional metadata that is provided to the method.
gcp_conn_id: Optional. The connection ID used to connect to Google Cloud. Defaults to ‘google_cloud_default’.
impersonation_chain: Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account (templated).
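Because the order_by choices are case-sensitive and relevance supports only descending order, a small pre-flight check can catch typos before the task runs. The helper below is a hypothetical sketch, not part of the provider. It accepts only the bare string `relevance` for relevance ordering, which is a conservative assumption about the accepted syntax.

```python
# Fields that accept an optional asc/desc direction, per the parameter list.
SORTABLE_FIELDS = {"last_access_timestamp", "last_modified_timestamp"}


def check_order_by(order_by):
    """Return True if order_by looks like one of the documented choices."""
    if order_by is None:
        # Unspecified: the search defaults to relevance, descending.
        return True
    parts = order_by.split(" ")
    if parts == ["relevance"]:
        return True
    if parts[0] in SORTABLE_FIELDS:
        # Direction is optional; when given it must be exactly asc or desc.
        return len(parts) == 1 or (len(parts) == 2 and parts[1] in {"asc", "desc"})
    return False
```

For example, `check_order_by("last_modified_timestamp asc")` passes, while `check_order_by("Relevance")` fails because the comparison is case-sensitive.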

Documentation

Searches Data Catalog for multiple resources, such as entries and tags, that match a query.

This does not return the complete resource, only the resource identifier and high level fields. Clients can subsequently call Get methods.
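Since each result carries only an identifier and a few high-level fields, a downstream task would typically collect the identifiers and fetch the full resources with Get calls. The sketch below assumes the results reach the downstream task as a list of dicts with a `relative_resource_name` key (the field name used by the Data Catalog search result message); the helper name is hypothetical, and the result shape should be confirmed against the provider version in use.

```python
def entry_names(search_results):
    """Collect unique resource names from search results, keeping their order."""
    names = []
    for result in search_results:
        # Assumed key; results without it are skipped.
        name = result.get("relative_resource_name")
        if name and name not in names:
            names.append(name)
    return names
```

The returned names could then be passed to whichever Get operator or client call retrieves the complete resource.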

Note that searches do not have full recall. There may be results that match your query but are not returned, even in subsequent pages of results. These missing results may vary across repeated calls to search. Do not rely on this method if you need to guarantee full recall.

See also

For more information on how to use this operator, take a look at the guide: Search resources