airflow.providers.amazon.aws.operators.comprehend
Module Contents
Classes
ComprehendBaseOperator – This is the base operator for Comprehend Service operators (not supposed to be used directly in DAGs).
ComprehendStartPiiEntitiesDetectionJobOperator – Create an Amazon Comprehend PII entities detection job for a collection of documents.
ComprehendCreateDocumentClassifierOperator – Create an Amazon Comprehend document classifier that can categorize documents.
- class airflow.providers.amazon.aws.operators.comprehend.ComprehendBaseOperator(input_data_config, output_data_config, data_access_role_arn, language_code, **kwargs)
Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.comprehend.ComprehendHook]
This is the base operator for Comprehend Service operators (not supposed to be used directly in DAGs).
- Parameters
input_data_config (dict) – The input properties for a PII entities detection job. (templated)
output_data_config (dict) – Provides configuration parameters for the output of PII entity detection jobs. (templated)
data_access_role_arn (str) – The Amazon Resource Name (ARN) of the IAM role that grants Amazon Comprehend read access to your input data. (templated)
language_code (str) – The language of the input documents. (templated)
- class airflow.providers.amazon.aws.operators.comprehend.ComprehendStartPiiEntitiesDetectionJobOperator(input_data_config, output_data_config, mode, data_access_role_arn, language_code, start_pii_entities_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)
Bases: ComprehendBaseOperator
Create an Amazon Comprehend PII entities detection job for a collection of documents.
See also
For more information on how to use this operator, take a look at the guide: Create an Amazon Comprehend Start PII Entities Detection Job
- Parameters
input_data_config (dict) – The input properties for a PII entities detection job. (templated)
output_data_config (dict) – Provides configuration parameters for the output of PII entity detection jobs. (templated)
mode (str) – Specifies whether the output provides the locations (offsets) of PII entities or a file in which PII entities are redacted. If you set the mode parameter to ONLY_REDACTION, you must provide a RedactionConfig in start_pii_entities_kwargs.
data_access_role_arn (str) – The Amazon Resource Name (ARN) of the IAM role that grants Amazon Comprehend read access to your input data. (templated)
language_code (str) – The language of the input documents. (templated)
start_pii_entities_kwargs (dict[str, Any] | None) – Any optional parameters to pass to the job. If JobName is not provided in start_pii_entities_kwargs, the operator will create one.
wait_for_completion (bool) – Whether to wait for the job to stop. (default: True)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20)
deferrable (bool) – If True, the operator will wait asynchronously for the job to stop. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
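The parameters above can be assembled as in the following minimal sketch. It is illustrative only: the bucket name, the role ARN, the task_id, and the helper names build_pii_job_kwargs and make_pii_task are hypothetical. The keys inside the dictionaries (S3Uri, InputFormat, RedactionConfig, PiiEntityTypes, MaskMode) are assumed to follow the Amazon Comprehend StartPiiEntitiesDetectionJob request, which this page does not spell out.

```python
from typing import Any


def build_pii_job_kwargs(bucket: str, role_arn: str, mode: str = "ONLY_REDACTION") -> dict[str, Any]:
    """Assemble keyword arguments for ComprehendStartPiiEntitiesDetectionJobOperator."""
    extra: dict[str, Any] = {}
    if mode == "ONLY_REDACTION":
        # Redaction mode requires a RedactionConfig in start_pii_entities_kwargs.
        # The entity types and mask mode below are example values (assumption).
        extra["RedactionConfig"] = {
            "PiiEntityTypes": ["NAME", "ADDRESS"],
            "MaskMode": "REPLACE_WITH_PII_ENTITY_TYPE",
        }
    return {
        "task_id": "start_pii_entities_detection_job",  # hypothetical task id
        "input_data_config": {
            "S3Uri": f"s3://{bucket}/input/",
            "InputFormat": "ONE_DOC_PER_LINE",
        },
        "output_data_config": {"S3Uri": f"s3://{bucket}/redacted/"},
        "mode": mode,
        "data_access_role_arn": role_arn,
        "language_code": "en",
        "start_pii_entities_kwargs": extra,
        "wait_for_completion": True,
        "waiter_delay": 60,
        "waiter_max_attempts": 20,
    }


def make_pii_task(**kwargs: Any):
    """Create the operator; call this inside a DAG definition."""
    # Imported lazily so the helper above can be inspected without Airflow installed.
    from airflow.providers.amazon.aws.operators.comprehend import (
        ComprehendStartPiiEntitiesDetectionJobOperator,
    )

    return ComprehendStartPiiEntitiesDetectionJobOperator(**kwargs)


# Hypothetical bucket and role ARN, for illustration only.
pii_kwargs = build_pii_job_kwargs(
    "my-example-bucket",
    "arn:aws:iam::123456789012:role/ComprehendAccess",
)
```

Inside a DAG, make_pii_task(**pii_kwargs) would then create the task. JobName is left out of start_pii_entities_kwargs here, so the operator creates one.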
- class airflow.providers.amazon.aws.operators.comprehend.ComprehendCreateDocumentClassifierOperator(document_classifier_name, input_data_config, mode, data_access_role_arn, language_code, fail_on_warnings=False, output_data_config=None, document_classifier_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), aws_conn_id='aws_default', **kwargs)
Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.comprehend.ComprehendHook]
Create an Amazon Comprehend document classifier that can categorize documents.
Provide a set of training documents that are labeled with the categories.
See also
For more information on how to use this operator, take a look at the guide: Create an Amazon Comprehend Document Classifier
- Parameters
document_classifier_name (str) – The name of the document classifier. (templated)
input_data_config (dict[str, Any]) – Specifies the format and location of the input data for the job. (templated)
mode (str) – Indicates the mode in which the classifier will be trained. (templated)
data_access_role_arn (str) – The Amazon Resource Name (ARN) of the IAM role that grants Amazon Comprehend read access to your input data. (templated)
language_code (str) – The language of the input documents. You can specify any of the languages supported by Amazon Comprehend. All documents must be in the same language. (templated)
fail_on_warnings (bool) – If set to True, the document classifier training job will raise an error when the status is TRAINED_WITH_WARNING. (default: False)
output_data_config (dict[str, Any] | None) – Specifies the location for the output files from a custom classifier job. This parameter is required for a request that creates a native document model. (templated)
document_classifier_kwargs (dict[str, Any] | None) – Any optional parameters to pass to the document classifier. (templated)
wait_for_completion (bool) – Whether to wait for the job to stop. (default: True)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20)
deferrable (bool) – If True, the operator will wait asynchronously for the job to stop. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id (str | None) – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
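As a rough illustration of the parameters above, the sketch below builds the keyword arguments for a classifier task and estimates how long the operator would keep polling. The classifier name, bucket, role ARN, task_id, and the helper names max_wait_seconds and build_classifier_kwargs are hypothetical. The dictionary keys (DataFormat, S3Uri, VersionName) and the MULTI_CLASS mode are assumed to follow the Amazon Comprehend CreateDocumentClassifier request. The polling estimate assumes the waiter sleeps waiter_delay seconds between each of waiter_max_attempts checks, so it is an approximate upper bound and not an exact figure.

```python
from typing import Any


def max_wait_seconds(waiter_delay: int = 60, waiter_max_attempts: int = 20) -> int:
    """Approximate upper bound on polling time, in seconds (assumes one sleep per attempt)."""
    return waiter_delay * waiter_max_attempts


def build_classifier_kwargs(name: str, bucket: str, role_arn: str) -> dict[str, Any]:
    """Assemble keyword arguments for ComprehendCreateDocumentClassifierOperator."""
    return {
        "task_id": "create_document_classifier",  # hypothetical task id
        "document_classifier_name": name,
        "input_data_config": {
            "DataFormat": "COMPREHEND_CSV",
            "S3Uri": f"s3://{bucket}/training/labels.csv",
        },
        "mode": "MULTI_CLASS",
        "data_access_role_arn": role_arn,
        "language_code": "en",
        # Leave False to accept a classifier that ends in TRAINED_WITH_WARNING.
        "fail_on_warnings": False,
        "output_data_config": {"S3Uri": f"s3://{bucket}/classifier-output/"},
        "document_classifier_kwargs": {"VersionName": "v1"},
        "wait_for_completion": True,
        # Training can be slow, so this example polls for longer than the defaults.
        "waiter_delay": 60,
        "waiter_max_attempts": 120,
    }


# Hypothetical names, for illustration only.
classifier_kwargs = build_classifier_kwargs(
    "example-classifier",
    "my-example-bucket",
    "arn:aws:iam::123456789012:role/ComprehendAccess",
)

default_budget = max_wait_seconds()  # 60 * 20
example_budget = max_wait_seconds(
    classifier_kwargs["waiter_delay"],
    classifier_kwargs["waiter_max_attempts"],
)
```

With the defaults, the operator polls for roughly 20 minutes before giving up. The dictionary can be passed as ComprehendCreateDocumentClassifierOperator(**classifier_kwargs) inside a DAG.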