Best practices for importing and exporting data

This page provides best practices for importing and exporting data with Cloud SQL. For step-by-step instructions for importing data into Cloud SQL, see Importing Data. For step-by-step instructions for exporting your data, whether it is in Cloud SQL or an instance you manage, see Exporting Data.

Best practices for importing and exporting

The following are best practices to consider when importing and exporting data:

Don't use Cloud Storage Requester Pays buckets

You cannot use a Cloud Storage bucket that has Requester Pays enabled for imports and exports from Cloud SQL.

Compress data to reduce cost

Cloud SQL supports importing and exporting both compressed and uncompressed files. Compression can save significant storage space on Cloud Storage and reduce your storage costs, especially when you are exporting large instances.

When you export a BAK file, use a .gz file extension to compress the data. When you import a file with an extension of .gz, it is decompressed automatically.

Reduce long-running import and export processes

Imports into Cloud SQL and exports out of Cloud SQL can take a long time to complete, depending on the size of the data being processed. This can have the following impacts:

  • You can't stop a long-running Cloud SQL instance operation.
  • You can perform only one import or export operation at a time for each instance, and a long-running import or export blocks other operations, such as daily automated backups.

You can decrease the amount of time it takes to complete each operation by using the Cloud SQL import or export functionality with smaller batches of data.

For whole database migrations, you generally should use BAK files rather than SQL files for imports. Generally, importing from a SQL file takes much longer than importing from a BAK file.

Use SqlPackage for importing and exporting data

You can import and export data in Cloud SQL by using SqlPackage. It enables you to export a SQL database, including database schema and user data, to a BACPAC file (.bacpac) and to import the schema and table data from a BACPAC file into a new user database.

SqlPackage uses your credentials to connect to SQL Server to perform database imports and exports. It makes migrations available for all Cloud SQL users. To perform import and export operations, you must have the following:

  • A workstation that is connected to your instance, where you can run SqlPackage. To learn more about connectivity options, see About connection options.

  • SqlPackage installed on your system. To learn more about downloading and installing SqlPackage, see the Microsoft documentation.

  • Credentials set up to access your instance. To learn more about setting up credentials, see How to authenticate to Cloud SQL.

Examples

Import

To import data to a database AdventureWorks2017, run the following command:

c:\Program Files\Microsoft SQL Server\160\DAC\bin>SqlPackage
/Action:Import /tsn:myTargetServer /tdn:AdventureWorks2017
/tu:myUsername /sf:mySourceFile
/TargetTrustServerCertificate:True /tp:myPassword

Here,

  • mySourceFile is a source file that you want to use as the source of action from local storage. If you use this parameter, no other source parameter is valid.
  • myTargetServer is the name of the server hosting the target database.
  • myUsername is the SQL Server username that you want to use to access the target database.
  • myPassword is your password in the credentials.

To learn more, see the Microsoft documentation.

Export

To export data from a database AdventureWorks2017, run the following command:

c:\Program Files\Microsoft SQL Server\160\DAC\bin>SqlPackage
/Action:Export /TargetFile:"myTargetFile"
/ssn:mySourceServer /su:myUsername /sdn:AdventureWorks2017
/SourceTrustServerCertificate:True /sp:myPassword

Here,

  • myTargetFile is the target file (a .dacpac file) that you want to use as the target of action instead of a database. If you use this parameter, no other target parameter is valid. This parameter is invalid for actions that only support database targets.
  • myUsername is the SQL Server username that you want to use to access the source database.
  • mySourceServer is the name of the server hosting the source database.
  • myPassword is your password in the credentials.

To learn more, see the Microsoft documentation.

Use the bcp utility for importing and exporting data

Another option to import and export data in Cloud SQL is using the bulk copy program (bcp) utility. By using the bcp utility, you can export data from a SQL Server database into a data file and import data from a data file into a SQL Server database. The bcp utility uses your credentials to connect to SQL Server to perform database imports and exports. It makes transfers available for all Cloud SQL users. To perform import and export operations, you must have the following:

  • A workstation where you can run the bcp utility, and that has connectivity to your Cloud SQL instance. To learn more about connectivity options, see About connection options.

  • The bcp utility installed on your system. To learn more about downloading and installing bcp, see the Microsoft documentation.

  • Credentials set up to access your instance. To learn more about setting up credentials, see How to authenticate to Cloud SQL.

Examples

Import

To import data from the person.csv file to the Person table of the AdventureWorks2017 database, run the following command:

bcp Person.Person in "person.csv" -d AdventureWorks2017 -U myLoginID -S myServer

Here,

  • myLoginID is the login ID used to connect to SQL Server.
  • myServer is the instance of SQL Server to which you want to connect. If you don't specify a server, the bcp utility connects to the default instance of SQL Server on the local computer.

To learn more, see the Microsoft documentation.

Export

To export data from the Person table of the AdventureWorks2017 database to the person.dat file, run the following command:

bcp Person.Person out "person.dat" -U myLoginID -S myServer -d AdventureWorks2017

Here,

  • myLoginID is the login ID used to connect to SQL Server.
  • myServer is the instance of SQL Server to which you want to connect. If you don't specify a server, the bcp utility connects to the default instance of SQL Server on the local computer.

To learn more, see the Microsoft documentation.

Use bulk insert for importing data

Bulk insert lets you import data into your Cloud SQL for SQL Server database from a file stored in Cloud Storage.

This section describes the following:

Required roles and permissions

To configure bulk insert, you need the following:

  • The CONTROL permission on the database where you want to import the data.
  • An HMAC access key and a secret mapped to an IAM account with the following permissions:

    • storage.buckets.get
    • storage.objects.create and storage.multipartUploads.create for writing error logs and examples of bad data.

    Alternatively, you can also use following roles:

    • Storage Object Viewer
    • Storage Object Creator for writing error logs and examples of bad data.

To use bulk insert, you need the following:

  • The EXECUTE permission on the msdb.dbo.gcloudsql_bulk_insert stored procedure. Cloud SQL creates the stored procedure after bulk insert is enabled on the instance. Cloud SQL grants the EXECUTE permission to the sqlserver admin account by default.
  • The INSERT permission on the object where you want to import the data.

For more information on creating users for bulk insert, see Create and manage users.

Considerations when using bulk insert

This section has recommendations for handling security, performance, and reliability on instances while using bulk insert.

Security

Cloud SQL encrypts and stores the HMAC access key and secret in an instance as a database scoped credential. Their values cannot be accessed after they are saved. You can delete the key and secret from an instance by dropping the database scoped credential using a T-SQL command. If you take any backup while the key and secret are stored on the instance, then that backup would contain that key and secret. You can also render the key invalid by deactivating and deleting the HMAC key.

The following operations can inadvertently transfer the access key and secret and make them available:

  • Cloning the instance: the key and secret are available on the cloned instance.
  • Creating a read replica: the key and secret are available on the created read replica.
  • Restoring from a backup: the key and secret are available on the instance restored from a backup.

We recommend that you drop the key and secret from the target instance after performing these operations.

Bulk insert can write data that it can't parse to a file stored in a Cloud Storage bucket. If you want to protect data that bulk insert has access to, then configure VPC service controls.

Performance

We recommend doing the following to mitigate performance impacts while using bulk insert:

  • Test and set an appropriate value for @batchsize because by default, all data is imported in a single batch.
  • For large inserts, disable indexes temporarily to speed up data insertion.
  • If possible, use the @tablock option because this can reduce contention and increase data load performance.
  • Use the @ordercolumnsjson parameter to specify data sorted in the order of the clustered index. This helps with better instance performance.
Reliability

We recommend doing the following to mitigate impact on instance reliability while using bulk insert:

  • If a failure occurs and @batchsize is used, this can lead to partially loaded data. You might need to manually clean up this data on your instance.
  • Use the @errorfile option to keep a log of errors and examples of bad data detected during the load process. This makes it easier to identify rows that have failed to load.

Perform bulk insert

You can perform the bulk insert operation using the following stored procedure:

msdb.dbo.gcloudsql_bulk_insert

For more information, see Stored procedure for using bulk insert.

Example: Import data from a file in Cloud Storage and specify an error file
1. Enable bulk insert

To enable bulk insert on your instance, enable the cloud sql enable bulk insert flag.

gcloud sql instances patch INSTANCE_NAME --database-flags="cloud sql enable bulk insert"=on

Replace INSTANCE_NAME with the name of the instance that you want to use for bulk insert.

For more information, see configure database flags.

After you enable this flag on your instance, Cloud SQL installs the bulk insert stored procedure on your instance and gives the sqlserver admin account permissions to execute.

2. Create an HMAC key

You require an HMAC key to access your Cloud Storage bucket. We recommend that you create an HMAC key for a service account and grant the service account permissions to the buckets that you want to use for bulk insert. For more information and security considerations, see Considerations when using bulk insert.

3. Create sample data to import
  1. Using a text editor, create a file with ANSI or UTF-16 encoding that has the following sample data. Save the file in your Cloud Storage bucket and name it as bulkinsert.bcp, for example.

    1,Elijah,Johnson,1962-03-21
    2,Anya,Smith,1982-01-15
    3,Daniel,Jones,1990-05-21
    
  2. Create a format file using the following sample data. Save the file in your Cloud Storage bucket and name it as bulkinsert.fmt, for example. For more information about XML and non-XML format files in SQL Server, see Create a Format File.

    13.0
    4
    1       SQLCHAR             0       7       ","      1     PersonID               ""
    2       SQLCHAR             0       25      ","      2     FirstName            SQL_Latin1_General_CP1_CI_AS
    3       SQLCHAR             0       30      ","      3     LastName            SQL_Latin1_General_CP1_CI_AS
    4       SQLCHAR             0       11      "\r\n"   4     BirthDate             ""
    
4. Execute the stored procedure
  1. Connect to your instance using the sqlserver user and create a sample database and table for bulk insert.

    USE MASTER
    GO
    -- create test database
    DROP DATABASE IF EXISTS bulktest
    CREATE DATABASE bulktest
    GO
    
    -- create table to insert
    USE bulktest;
    GO
    CREATE TABLE dbo.myfirstimport(
    PersonID smallint,
    FirstName varchar(25),
    LastName varchar(30),
    BirthDate Date
    );
    
  2. Create a database master key, a database scoped credential, and an external data source. Set the identity as S3 Access Key.

      -- create master key
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'YourStrongPassword1';
    
    -- create database scoped credential
    CREATE DATABASE SCOPED CREDENTIAL GCSCredential
    WITH IDENTITY = 'S3 Access Key',
    SECRET = '<Access key>:<Secret>';
    
    --create external data source
    CREATE EXTERNAL DATA SOURCE GCSStorage
    WITH ( TYPE = BLOB_STORAGE,
    LOCATION = 's3://storage.googleapis.com/bulk-insert-demo/'
    , CREDENTIAL = GCSCredential
    );
    
    CREATE EXTERNAL DATA SOURCE GCSStorageError
    WITH ( TYPE = BLOB_STORAGE,
    LOCATION = 's3://storage.googleapis.com/bulk-insert-demo/'
    , CREDENTIAL = GCSCredential
    );
    
  3. Execute the bulk insert stored procedure to import the sample data.

    EXEC msdb.dbo.gcloudsql_bulk_insert
    @database = 'bulktest',
    @schema = 'dbo',
    @object = 'myfirstimport',
    @file = 's3://storage.googleapis.com/bulk-insert-demo/bulkinsert.bcp',
    @formatfile = 's3://storage.googleapis.com/bulk-insert-demo/bulkinsert.fmt',
    @fieldquote = '"',
    @formatfiledatasource = 'GCSStorage',
    @ROWTERMINATOR = '0x0A',
    @fieldterminator = ',',
    @datasource ='GCSStorage',
    @errorfiledatasource = 'GCSStorageError',
    @errorfile = 's3://storage.googleapis.com/oom-data/bulkinsert/bulkinsert_sampleimport.log',
    @ordercolumnsjson =
    '[{"name": "PersonID","order": " asc "},{"name": "BirthDate","order": "asc"}]'
    
    

View the imported data

You can view the imported data by using one of the following methods:

  • Run the following query:

    SELECT * FROM dbo.myfirstimport
    
  • Cloud SQL adds a record of this procedure to the SQL error log. You can view this in Cloud Logging. You can also view this in the SQL error log data on SQL Server Management Studio (SSMS).

Disable bulk insert

To disable bulk insert, remove the cloud sql enable bulk insert flag:

  gcloud sql instances patch INSTANCE_NAME --database-flags="cloudsql enable bulk insert"=off
  

Replace INSTANCE_NAME with the name of the instance where you want to remove bulk insert.

Alternatively, you can run the following command to clear all database flags:

  gcloud sql instances patch INSTANCE_NAME --clear-database-flags
  

Replace INSTANCE_NAME with the name of the instance where you want to remove bulk insert.

Use striped import and export

When you perform a striped import or export, you reduce the time it takes for the operation to complete, and enable databases larger than 5 TB to be imported and exported. For more information, see Export and import using BAK files.

Verify the imported database

After an import operation is complete, connect to your database and run the appropriate database commands to make sure the contents are correct. For example, connect and list the databases, tables, and specific entries.

Known limitations

For a list of known limitations, see Issues with importing and exporting data.

Automating export operations

Although Cloud SQL doesn't provide a built-in way to automate database exports, you can build your own automation tool using several Google Cloud components. To learn more, see this tutorial.

Troubleshooting

Troubleshooting import operations

Issue Troubleshooting
HTTP Error 409: Operation failed because another operation was already in progress. There is already a pending operation for your instance. Only one operation is allowed at a time. Try your request after the current operation is complete.
The import operation is taking too long. Too many active connections can interfere with import operations.

Close unused operations. Check the CPU and memory usage of your Cloud SQL instance to make sure there are plenty of resources available. The best way to ensure maximum resources for the import is to restart the instance before beginning the operation.

A restart:

  • Closes all connections.
  • Ends any tasks that may be consuming resources.
An import operation can fail when one or more users referenced in the dump file don't exist. Before importing a dump file, all the database users who own objects or were granted permissions on objects in the dumped database must exist in the target database. If they don't, the import operation fails to recreate the objects with the original ownership or permissions.

Create the database users before importing.

LSN mismatch The order of the import of transaction log backups is incorrect or the transaction log chain is broken.
Import the transaction log backups in the same order as that in the backup set table.
StopAt too early This error indicates that the first log in the transaction log file is after the StopAt timestamp. For example, if the first log in the transaction log file is at 2023-09-01T12:00:00 and the StopAt field has value of 2023-09-01T11:00:00, then Cloud SQL returns this error.
Ensure that you use the correct StopAt timestamp and the correct transaction log file.

Troubleshooting export operations

Issue Troubleshooting
HTTP Error 409: Operation failed because another operation was already in progress. There is already a pending operation for your instance. Only one operation is allowed at a time. Try your request after the current operation is complete.
HTTP Error 403: The service account does not have the required permissions for the bucket. Ensure that the bucket exists and the service account for the Cloud SQL instance (which is performing the export) has the Storage Object Creator role (roles/storage.objectCreator) to allow export to the bucket. See IAM roles for Cloud Storage.
You want exports to be automated. Cloud SQL does not provide a way to automate exports.

You could build your own automated export system using Google Cloud products such as Cloud Scheduler, Pub/Sub, and Cloud Run functions, similar to this article on automating backups.

What's next