AWS Big Data Blog

Category: Announcements

Introducing self-managed data sources for Amazon OpenSearch Ingestion

Enterprise customers increasingly adopt Amazon OpenSearch Ingestion (OSI) to bring data into Amazon OpenSearch Service for various use cases. These include petabyte-scale log analytics, real-time streaming, security analytics, and searching semi-structured key-value or document data. OSI makes it simple, with straightforward integrations, to ingest data from many AWS services, including Amazon DynamoDB, Amazon Simple Storage […]

Amazon DataZone announces custom blueprints for AWS services

Last week, we announced the general availability of custom AWS service blueprints, a new feature in Amazon DataZone allowing you to customize your Amazon DataZone project environments to use existing AWS Identity and Access Management (IAM) roles and AWS services to embed the service into your existing processes. In this post, we share how this […]

Introducing AWS Glue usage profiles for flexible cost control

AWS Glue is a serverless data integration service that enables you to run extract, transform, and load (ETL) workloads on your data in a scalable and serverless manner. One of the main advantages of using a cloud platform is its flexibility; you can provision compute resources when you actually need them. However, with this ease […]

Introducing support for Apache Kafka on Raft mode (KRaft) with Amazon MSK clusters

Organizations are adopting Apache Kafka and Amazon Managed Streaming for Apache Kafka (Amazon MSK) to capture and analyze data in real time. Amazon MSK helps you build and run production applications on Apache Kafka without needing Kafka infrastructure management expertise or having to deal with the complex overhead associated with setting up and running Apache […]

Introducing blueprint discovery and other UI enhancements for Amazon OpenSearch Ingestion

Amazon OpenSearch Ingestion is a fully managed serverless pipeline that allows you to ingest, filter, transform, enrich, and route data to an Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection. OpenSearch Ingestion is capable of ingesting data from a wide variety of sources and has a rich ecosystem of built-in processors to take care […]

Amazon DocumentDB zero-ETL integration with Amazon OpenSearch Service is now available

Today, we are announcing the general availability of Amazon DocumentDB (with MongoDB compatibility) zero-ETL integration with Amazon OpenSearch Service. Amazon DocumentDB provides native text search and vector search capabilities. With Amazon OpenSearch Service, you can perform advanced search analytics, such as fuzzy search, synonym search, cross-collection search, and multilingual search, on Amazon DocumentDB data. Zero-ETL […]

Safely remove Kafka brokers from Amazon MSK provisioned clusters

Today, we are announcing broker removal capability for Amazon Managed Streaming for Apache Kafka (Amazon MSK) provisioned clusters, which lets you remove multiple brokers from your provisioned clusters. You can now reduce your cluster’s storage and compute capacity by removing sets of brokers, with no availability impact, data durability risk, or disruption to your data streaming […]

Introducing Amazon MWAA support for the Airflow REST API and web server auto scaling

Apache Airflow is a popular platform for enterprises looking to orchestrate complex data pipelines and workflows. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service that streamlines the setup and operation of secure and highly available Airflow environments in the cloud. In this post, we’re excited to introduce two new features that […]

Introducing Amazon Q data integration in AWS Glue

Today, we’re excited to announce general availability of Amazon Q data integration in AWS Glue. Amazon Q data integration, a new generative AI-powered capability of Amazon Q Developer, enables you to build data integration pipelines using natural language. This reduces the time and effort you need to learn, build, and run data integration jobs using […]

Introducing Amazon MWAA larger environment sizes

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service for Apache Airflow that streamlines the setup and operation of the infrastructure to orchestrate data pipelines in the cloud. Customers use Amazon MWAA to manage the scalability, availability, and security of their Apache Airflow environments. As they design more intensive, complex, and ever-growing […]