{ "cells": [ { "cell_type": "markdown", "id": "b0ee9a28", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "[![AWS SDK for pandas](_static/logo.png \"AWS SDK for pandas\")](https://github.com/aws/aws-sdk-pandas)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3a2a7b51", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# 37 - Glue Data Quality" ] }, { "cell_type": "markdown", "id": "42724a76", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ " AWS Glue Data Quality helps you evaluate and monitor the quality of your data." ] }, { "attachments": {}, "cell_type": "markdown", "id": "a9fd6b59-9c79-4016-82b4-ed4f3c6946ee", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Create test data\n", "\n", "First, let's start by creating test data, writing it to S3, and registering it in the Glue Data Catalog." ] }, { "cell_type": "code", "execution_count": null, "id": "5237eed8", "metadata": { "collapsed": false, "pycharm": { "is_executing": true, "name": "#%%\n" } }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "import awswrangler as wr\n", "\n", "glue_database = \"aws_sdk_pandas\"\n", "glue_table = \"my_glue_table\"\n", "path = \"s3://BUCKET_NAME/my_glue_table/\"\n", "\n", "df = pd.DataFrame({\"c0\": [0, 1, 2], \"c1\": [0, 1, 2], \"c2\": [0, 0, 0]})\n", "wr.s3.to_parquet(df, path, dataset=True, database=glue_database, table=glue_table, partition_cols=[\"c2\"])" ] }, { "cell_type": "markdown", "id": "13ba6d9d", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Start with recommended data quality rules" ] }, { "attachments": {}, "cell_type": "markdown", "id": "81fa434e", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "AWS Glue Data Quality can recommend a set of data quality rules so you can get started quickly.\n", "\n", "Note: Running Glue Data Quality recommendation and evaluation tasks requires an IAM role. This role must trust the Glue principal and allow permissions to various resources including the Glue table and the S3 bucket where your data is stored. Moreover, data quality IAM actions must be granted. To find out more, check [Authorization](https://docs.aws.amazon.com/glue/latest/dg/data-quality-authorization.html)." ] }, { "cell_type": "code", "execution_count": 7, "id": "a1983e63", "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rule_typeparameterexpression
0RowCountNonebetween 1 and 6
1IsComplete\"c0\"None
2Uniqueness\"c0\"> 0.95
3ColumnValues\"c0\"<= 2
4IsComplete\"c1\"None
5Uniqueness\"c1\"> 0.95
6ColumnValues\"c1\"<= 2
7IsComplete\"c2\"None
8ColumnValues\"c2\"in [\"0\"]
\n", "
" ], "text/plain": [ " rule_type parameter expression\n", "0 RowCount None between 1 and 6\n", "1 IsComplete \"c0\" None\n", "2 Uniqueness \"c0\" > 0.95\n", "3 ColumnValues \"c0\" <= 2\n", "4 IsComplete \"c1\" None\n", "5 Uniqueness \"c1\" > 0.95\n", "6 ColumnValues \"c1\" <= 2\n", "7 IsComplete \"c2\" None\n", "8 ColumnValues \"c2\" in [\"0\"]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_ruleset = \"ruleset_1\"\n", "iam_role_arn = \"arn:aws:iam::...\" # IAM role assumed by the Glue Data Quality job to access resources\n", "\n", "df_recommended_ruleset = wr.data_quality.create_recommendation_ruleset( # Creates a recommended ruleset\n", " name=first_ruleset,\n", " database=glue_database,\n", " table=glue_table,\n", " iam_role_arn=iam_role_arn,\n", " number_of_workers=2,\n", ")\n", "\n", "df_recommended_ruleset" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a34270e9", "metadata": {}, "source": [ "## Update the recommended rules\n", "\n", "Recommended rulesets are not perfect and you are likely to modify them or create your own." ] }, { "cell_type": "code", "execution_count": 17, "id": "78f01dba", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rule_typeparameterexpression
0RowCountNonebetween 1 and 6
1IsComplete\"c0\"None
2Uniqueness\"c0\"> 0.95
3ColumnValues\"c0\"<= 2
4IsComplete\"c1\"None
5Uniqueness\"c1\"> 0.95
6ColumnValues\"c1\"<= 2
7IsComplete\"c2\"None
8ColumnValues\"c2\"in [0, 1, 2]
9Uniqueness\"c2\"> 0.95
\n", "
" ], "text/plain": [ " rule_type parameter expression\n", "0 RowCount None between 1 and 6\n", "1 IsComplete \"c0\" None\n", "2 Uniqueness \"c0\" > 0.95\n", "3 ColumnValues \"c0\" <= 2\n", "4 IsComplete \"c1\" None\n", "5 Uniqueness \"c1\" > 0.95\n", "6 ColumnValues \"c1\" <= 2\n", "7 IsComplete \"c2\" None\n", "8 ColumnValues \"c2\" in [0, 1, 2]\n", "9 Uniqueness \"c2\" > 0.95" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Append and update rules\n", "df_updated_ruleset = df_recommended_ruleset.append(\n", " {\"rule_type\": \"Uniqueness\", \"parameter\": '\"c2\"', \"expression\": \"> 0.95\"}, ignore_index=True\n", ")\n", "\n", "df_updated_ruleset.at[8, \"expression\"] = \"in [0, 1, 2]\"\n", "\n", "# Update the existing ruleset (upsert)\n", "wr.data_quality.update_ruleset(\n", " name=first_ruleset,\n", " df_rules=df_updated_ruleset,\n", " mode=\"upsert\", # update existing or insert new rules to the ruleset\n", ")\n", "\n", "wr.data_quality.get_ruleset(name=first_ruleset)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "52ee5851", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Run a data quality task\n", "\n", "The ruleset can now be evaluated against the data. A cluster with 2 workers is used for the run. It returns a report with `PASS`/`FAIL` results for each rule." ] }, { "cell_type": "code", "execution_count": 20, "id": "a3aeec39", "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameDescriptionResultResultIdEvaluationMessage
0Rule_1RowCount between 1 and 6PASSdqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5NaN
1Rule_2IsComplete \"c0\"PASSdqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5NaN
2Rule_3Uniqueness \"c0\" > 0.95PASSdqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5NaN
3Rule_4ColumnValues \"c0\" <= 2PASSdqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5NaN
4Rule_5IsComplete \"c1\"PASSdqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5NaN
5Rule_6Uniqueness \"c1\" > 0.95PASSdqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5NaN
6Rule_7ColumnValues \"c1\" <= 2PASSdqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5NaN
7Rule_8IsComplete \"c2\"PASSdqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5NaN
8Rule_9ColumnValues \"c2\" in [0,1,2]PASSdqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5NaN
9Rule_10Uniqueness \"c2\" > 0.95FAILdqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5Value: 0.0 does not meet the constraint requir...
\n", "
" ], "text/plain": [ " Name Description Result \\\n", "0 Rule_1 RowCount between 1 and 6 PASS \n", "1 Rule_2 IsComplete \"c0\" PASS \n", "2 Rule_3 Uniqueness \"c0\" > 0.95 PASS \n", "3 Rule_4 ColumnValues \"c0\" <= 2 PASS \n", "4 Rule_5 IsComplete \"c1\" PASS \n", "5 Rule_6 Uniqueness \"c1\" > 0.95 PASS \n", "6 Rule_7 ColumnValues \"c1\" <= 2 PASS \n", "7 Rule_8 IsComplete \"c2\" PASS \n", "8 Rule_9 ColumnValues \"c2\" in [0,1,2] PASS \n", "9 Rule_10 Uniqueness \"c2\" > 0.95 FAIL \n", "\n", " ResultId \\\n", "0 dqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5 \n", "1 dqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5 \n", "2 dqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5 \n", "3 dqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5 \n", "4 dqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5 \n", "5 dqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5 \n", "6 dqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5 \n", "7 dqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5 \n", "8 dqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5 \n", "9 dqresult-be413b527c0e5520ad843323fecd9cf2e2edbdd5 \n", "\n", " EvaluationMessage \n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "5 NaN \n", "6 NaN \n", "7 NaN \n", "8 NaN \n", "9 Value: 0.0 does not meet the constraint requir... " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.data_quality.evaluate_ruleset(\n", " name=first_ruleset,\n", " iam_role_arn=iam_role_arn,\n", " number_of_workers=2,\n", ")" ] }, { "cell_type": "markdown", "id": "170cd16d", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Create ruleset from Data Quality Definition Language definition" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e973a4fd", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "The Data Quality Definition Language (DQDL) is a domain specific language that you can use to define Data Quality rules. For the full syntax reference, see [DQDL](https://docs.aws.amazon.com/glue/latest/dg/dqdl.html)." ] }, { "cell_type": "code", "execution_count": 21, "id": "f5209459", "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "second_ruleset = \"ruleset_2\"\n", "\n", "dqdl_rules = (\n", " \"Rules = [\"\n", " \"RowCount between 1 and 6,\"\n", " 'IsComplete \"c0\",'\n", " 'Uniqueness \"c0\" > 0.95,'\n", " 'ColumnValues \"c0\" <= 2,'\n", " 'IsComplete \"c1\",'\n", " 'Uniqueness \"c1\" > 0.95,'\n", " 'ColumnValues \"c1\" <= 2,'\n", " 'IsComplete \"c2\",'\n", " 'ColumnValues \"c2\" <= 1'\n", " \"]\"\n", ")\n", "\n", "wr.data_quality.create_ruleset(\n", " name=second_ruleset,\n", " database=glue_database,\n", " table=glue_table,\n", " dqdl_rules=dqdl_rules,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9b791ace", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Create or update a ruleset from a data frame" ] }, { "attachments": {}, "cell_type": "markdown", "id": "74fddded", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "AWS SDK for pandas also enables you to create or update a ruleset from a pandas data frame." ] }, { "cell_type": "code", "execution_count": 24, "id": "f1d1905c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rule_typeparameterexpression
0RowCountNonebetween 2 and 8
1ColumnCorrelation\"c0\" \"c1\"> 0.8
2Uniqueness\"c0\"> 0.95
\n", "
" ], "text/plain": [ " rule_type parameter expression\n", "0 RowCount None between 2 and 8\n", "1 ColumnCorrelation \"c0\" \"c1\" > 0.8\n", "2 Uniqueness \"c0\" > 0.95" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "third_ruleset = \"ruleset_3\"\n", "\n", "df_rules = pd.DataFrame(\n", " {\n", " \"rule_type\": [\"RowCount\", \"ColumnCorrelation\", \"Uniqueness\"],\n", " \"parameter\": [None, '\"c0\" \"c1\"', '\"c0\"'],\n", " \"expression\": [\"between 2 and 8\", \"> 0.8\", \"> 0.95\"],\n", " }\n", ")\n", "\n", "wr.data_quality.create_ruleset(\n", " name=third_ruleset,\n", " df_rules=df_rules,\n", " database=glue_database,\n", " table=glue_table,\n", ")\n", "\n", "wr.data_quality.get_ruleset(name=third_ruleset)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c846421e", "metadata": {}, "source": [ "## Get multiple rulesets" ] }, { "cell_type": "code", "execution_count": 25, "id": "b8b1d0b0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rule_typeparameterexpressionruleset
0RowCountNonebetween 1 and 6ruleset_1
1IsComplete\"c0\"Noneruleset_1
2Uniqueness\"c0\"> 0.95ruleset_1
3ColumnValues\"c0\"<= 2ruleset_1
4IsComplete\"c1\"Noneruleset_1
5Uniqueness\"c1\"> 0.95ruleset_1
6ColumnValues\"c1\"<= 2ruleset_1
7IsComplete\"c2\"Noneruleset_1
8ColumnValues\"c2\"in [0, 1, 2]ruleset_1
9Uniqueness\"c2\"> 0.95ruleset_1
0RowCountNonebetween 1 and 6ruleset_2
1IsComplete\"c0\"Noneruleset_2
2Uniqueness\"c0\"> 0.95ruleset_2
3ColumnValues\"c0\"<= 2ruleset_2
4IsComplete\"c1\"Noneruleset_2
5Uniqueness\"c1\"> 0.95ruleset_2
6ColumnValues\"c1\"<= 2ruleset_2
7IsComplete\"c2\"Noneruleset_2
8ColumnValues\"c2\"<= 1ruleset_2
0RowCountNonebetween 2 and 8ruleset_3
1ColumnCorrelation\"c0\" \"c1\"> 0.8ruleset_3
2Uniqueness\"c0\"> 0.95ruleset_3
\n", "
" ], "text/plain": [ " rule_type parameter expression ruleset\n", "0 RowCount None between 1 and 6 ruleset_1\n", "1 IsComplete \"c0\" None ruleset_1\n", "2 Uniqueness \"c0\" > 0.95 ruleset_1\n", "3 ColumnValues \"c0\" <= 2 ruleset_1\n", "4 IsComplete \"c1\" None ruleset_1\n", "5 Uniqueness \"c1\" > 0.95 ruleset_1\n", "6 ColumnValues \"c1\" <= 2 ruleset_1\n", "7 IsComplete \"c2\" None ruleset_1\n", "8 ColumnValues \"c2\" in [0, 1, 2] ruleset_1\n", "9 Uniqueness \"c2\" > 0.95 ruleset_1\n", "0 RowCount None between 1 and 6 ruleset_2\n", "1 IsComplete \"c0\" None ruleset_2\n", "2 Uniqueness \"c0\" > 0.95 ruleset_2\n", "3 ColumnValues \"c0\" <= 2 ruleset_2\n", "4 IsComplete \"c1\" None ruleset_2\n", "5 Uniqueness \"c1\" > 0.95 ruleset_2\n", "6 ColumnValues \"c1\" <= 2 ruleset_2\n", "7 IsComplete \"c2\" None ruleset_2\n", "8 ColumnValues \"c2\" <= 1 ruleset_2\n", "0 RowCount None between 2 and 8 ruleset_3\n", "1 ColumnCorrelation \"c0\" \"c1\" > 0.8 ruleset_3\n", "2 Uniqueness \"c0\" > 0.95 ruleset_3" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.data_quality.get_ruleset(name=[first_ruleset, second_ruleset, third_ruleset])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e8024a77", "metadata": {}, "source": [ "## Evaluate Data Quality for a given partition\n", "\n", "A data quality evaluation run can be limited to specific partition(s) by leveraging the `pushDownPredicate` expression in the `additional_options` argument " ] }, { "cell_type": "code", "execution_count": 26, "id": "332d2e95", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameDescriptionResultResultIdEvaluationMessage
0Rule_1RowCount between 2 and 8PASSdqresult-f676cfe0345aa93f492e3e3c3d6cf1ad99b84dc6NaN
1Rule_2ColumnCorrelation \"c0\" \"c1\" > 0.8FAILdqresult-f676cfe0345aa93f492e3e3c3d6cf1ad99b84dc6Value: 0.5 does not meet the constraint requir...
2Rule_3Uniqueness \"c0\" > 0.95PASSdqresult-f676cfe0345aa93f492e3e3c3d6cf1ad99b84dc6NaN
\n", "
" ], "text/plain": [ " Name Description Result \\\n", "0 Rule_1 RowCount between 2 and 8 PASS \n", "1 Rule_2 ColumnCorrelation \"c0\" \"c1\" > 0.8 FAIL \n", "2 Rule_3 Uniqueness \"c0\" > 0.95 PASS \n", "\n", " ResultId \\\n", "0 dqresult-f676cfe0345aa93f492e3e3c3d6cf1ad99b84dc6 \n", "1 dqresult-f676cfe0345aa93f492e3e3c3d6cf1ad99b84dc6 \n", "2 dqresult-f676cfe0345aa93f492e3e3c3d6cf1ad99b84dc6 \n", "\n", " EvaluationMessage \n", "0 NaN \n", "1 Value: 0.5 does not meet the constraint requir... \n", "2 NaN " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({\"c0\": [2, 0, 1], \"c1\": [1, 0, 2], \"c2\": [1, 1, 1]})\n", "wr.s3.to_parquet(df, path, dataset=True, database=glue_database, table=glue_table, partition_cols=[\"c2\"])\n", "\n", "wr.data_quality.evaluate_ruleset(\n", " name=third_ruleset,\n", " iam_role_arn=iam_role_arn,\n", " number_of_workers=2,\n", " additional_options={\n", " \"pushDownPredicate\": \"(c2 == '1')\",\n", " },\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "awswrangler-v9JnknIF-py3.8", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "vscode": { "interpreter": { "hash": "83297b058d59ee0acd247586c837429190a8258f15c0eea6234359f5557dde51" } } }, "nbformat": 4, "nbformat_minor": 5 }