Redacting Sensitive Data in 4 Lines of Code

In this tutorial, we'll demonstrate how easy it is to redact sensitive data and give you a more in-depth look at various redaction techniques, how Nightfall works, and touch upon use cases for redaction techniques.

Before we get started, let's set our Nightfall API key as an environment variable and install our dependencies for our code samples in Python. If you don't have a Nightfall API key, generate one on your Nightfall Dashboard. If you don't have a Nightfall account, sign up for a free Nightfall Developer Platform account.

Masking

Mask sensitive data with a configurable character, allow leaving some characters unmasked, and allow ignoring certain characters.

Examples

CasesAdditional ConfigBeforeAfter

Default

None

my ssn is 518-45-7708

my ssn is ***********

Mask with custom character

masking_char="X"

my ssn is 518-45-7708

my ssn is XXXXXXXXXXX

Leave first four characters unmasked

num_chars_to_leave_unmasked=4

my ssn is 518-45-7708

my ssn is 518-*******

Leave last four characters unmasked

mask_right_to_left=True

my ssn is 518-45-7708

my ssn is *******7708

Don't mask - characters

chars_to_ignore=["-"]

my ssn is 518-45-7708

my ssn is ***-**-****

All of the above!

masking_char="X", num_chars_to_leave_unmasked=4, mask_right_to_left=True, chars_to_ignore=["-"]

my ssn is 518-45-7708

my ssn is XXX-XX-7708

Let's put this together in Python with the Nightfall SDK. In our example, we have an input string with a credit card number (4916-6734-7572-5015 is my credit card number) and we wish to mask with an asterisk, unmask the last 4 digits, and ignore hyphens.

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

nightfall = Nightfall()  # reads API key from NIGHTFALL_API_KEY environment variable by default
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule(
        [Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                mask_config=MaskConfig(
                    masking_char="X",
                    num_chars_to_leave_unmasked=4,
                    mask_right_to_left=True,
                    chars_to_ignore=["-"]))
        )]
    )]
)
print(findings)
print(redacted_payload)

We'll see our findings look like this (with line formatting added for clarity):

[[Finding(
  finding='4916-6734-7572-5015', 
  redacted_finding='XXX-XXXX-XXXX-5015', 
  before_context=None, 
  after_context=None, 
  detector_name='Credit Card Number', 
  detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
  confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
  byte_range=Range(start=0, end=19), 
  codepoint_range=Range(start=0, end=19), 
  matched_detection_rule_uuids=[], 
  matched_detection_rules=['Inline Detection Rule #1'])
]]

Also, we have received the input payload back as a redacted string in our redacted_payload object:

['XXXX-XXXX-XXXX-5015 is my credit card number']

When to use Masking?

Masking is especially useful in scenarios where you want to retain some of the original format of the data or a certain amount of non-sensitive information as context. For example, it's common to refer to credit card numbers by their last 4 digits, so masking everything but the last 4 digits would ensure that the output is still useful to the viewer.

Substitution

Substitute sensitive data findings with the InfoType, custom word, or an empty string.

Examples

  • Default case (“my email is [email protected].” → “my email is .”)

  • Case with custom word=”[REDACTED BY NIGHTFALL]” (“my email is [email protected]” → “my email is [REDACTED BY NIGHTFALL].”)

  • Substitute with InfoType “my email is [email protected]” → “my email is [EMAIL].”

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, Nightfall

nightfall = Nightfall()
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule([
        Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                substitution_phrase="SubMeIn")
        )]
    )]
)
print(findings)
print(redacted_payload)

We'll see our findings object returned to us looks like this (with line formatting added for clarity):

[[Finding(
  finding='4916-6734-7572-5015', 
  redacted_finding='SubMeIn', 
  before_context=None, 
  after_context=None, 
  detector_name='Credit Card Number', 
  detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
  confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
  byte_range=Range(start=0, end=19), 
  codepoint_range=Range(start=0, end=19), 
  matched_detection_rule_uuids=[], 
  matched_detection_rules=['Inline Detection Rule #1'])
]]

And our redacted input payload in our redacted_payload object:

['SubMeIn is my credit card number']

Instead of using a custom string as the substitution (SubMeIn), we may want to use the name of the detector for additional context. We can make a one line change to the example above, replacing substitution_phrase="SubMeIn" with infotype_substitution=True.

This yields:

['[CREDIT_CARD_NUMBER] is my credit card number']

When to use Substitution

Substitution is effective in scenarios where you intend to replace sensitive data with a contextual label, for example, you wish to replace a literal credit card number with the label "Credit Card Number". This provides context to the reader of the data that the data is a credit card number, without exposing them to the actual token itself.

Encryption

Encrypt sensitive data findings with a public encryption key that is passed via the API. Make the encryption algorithm configurable.

Encryption is a complex topic so we'll go into a more in-depth tutorial on encrypting and decrypting sensitive data with Nightfall in a separate post, but let's run through the basics below.

Nightfall uses RSA encryption which is asymmetric, meaning it works with two different keys: a public one and a private one. Anyone with your public key can encrypt data. Encrypted data can only be decrypted with the private key. So, you'll pass Nightfall your public key to encrypt with, and only you will have your private key to decrypt the encrypted data.

Example

  • Default case public_key=”MIG...AQAB” (“my ssn is 518-45-7708” → “my ssn is EhOp/DphEIA0LQd4q1BUq8FtuxKj66VA381Z9DtbiQaaHvy5Wlvtxg0je91DFXEJncOWbhgPbt7EvBl36k5MFlFdPbc5+bg40FxP676SnllEClEO+DDsuiRCk9VC4noAd0zLxgvV8qD/NPE/XhTfOpscqlKhllfTg7G5jZYYSG8=”)

For our example, we'll use the cryptography package in Python, so let's install it first: pip3 install cryptography

Let's first generate a public/private RSA key pair in PEM format on the command line. We'll cover how to generate keys programmatically in Python in our encryption-specific tutorial.

First, we'll generate our private key and write it to a file called example_private.pem:

openssl genrsa -out example_private.pem 2048

Next, we'll generate our public key in PEM format from this private key:

openssl rsa -in example_private.pem -outform PEM -pubout -out example_public.pem

Let's take a look at our public key with cat example_public.pem:

-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAnszkbHNclOhYgEc1lMPn
6KLm3cXS+w2CRBSEC5HFlqOUmdcXWnBFa9tlJYvXhQYuMFhXBcjUYgVUSAftK703
oTFMwRGZNnBjcUnNSK+pD4iaCEmdskkSA85GFCPsO1yrcfJp4965c43FrgWqyo7A
Aka5sGW9gX2wibQpQhil9TS0vtWHvEOq1TZnFAJD/DEJFN7zIQhglA/53Vd5PEL9
8fSfXxzbtu68wwhRtRqTaVRjzslx6i2Xs/QWcS/sWnKhnuF/enjlcll+SLyDEoPO
6iGp8MpHkZzJHmjATQJBA1vyu+mqo+G3wWm7WPME6V83VBNfG4wdkZCx/n9N5KzH
yQIDAQAB
-----END PUBLIC KEY-----

Remember to keep your private key safe. Anyone with this key can decrypt your encrypted data.

Now we can use our public key to encrypt any content with Nightfall! To do so, we'll first read the public key into a string.

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import serialization

with open(f'example_public.pem', "rb") as key_file:
	public_key = serialization.load_pem_public_key(
		key_file.read()
	)

pem = public_key.public_bytes(
		encoding=serialization.Encoding.PEM,
		format=serialization.PublicFormat.SubjectPublicKeyInfo
	)

pem_str = pem.decode('utf-8')

Now, we'll pass the public key into our redaction configuration, similar to the above examples, so Nightfall can use it to encrypt your sensitive data.

nightfall = Nightfall()
payload = [ "4916-6734-7572-5015 is my credit card number" ]
findings, redacted_payload = nightfall.scan_text(
				        payload,
				        [ DetectionRule([ 
			        		Detector(
			        			min_confidence=Confidence.LIKELY,
		               			nightfall_detector="CREDIT_CARD_NUMBER",
		               			display_name="Credit Card Number",
			               		redaction_config=RedactionConfig(
								remove_finding=False, 
								public_key=pem_str)
			               	)])
				        ])
print(findings)
print(redacted_payload)

We'll see our findings look like this (with line formatting added for clarity):

[[
  Finding(
    finding='4916-6734-7572-5015', 
    redacted_finding='ar4PGD1T3yCBjBdgJ+iX2Ak3hZYIyaaKcRY+AcNS3RjsGnss9hUA9Q0ycLtBOaMjFMeTdCupCEPNUFVYyzeWhHmL009DwWshV47Vkm84zB5O6HroJHAG0JpKHb6bLL58hAb9FHZ73usU4bI67ZEtJhX41HovlOfSCaeUnH4y3pPqRnh7d5roX7EIYQ39wzPGGo2TNbeyqm2pluC1G4Mqt9hLqy0tCwfbmKPXro41i9i1xED9GkVcnxTu0gS8bCMFkvAK4S+Hw0K/gqPq0hu2JGoryKo335IYBCit6S39JESJdNh7IafuE6mrmvYMlR9l4c60VkowEMZAPkUjOelPDw==', 
    before_context=None, 
    after_context=None, 
    detector_name='Credit Card Number', 
    detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
    confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
    byte_range=Range(start=0, end=19), 
    codepoint_range=Range(start=0, end=19), 
    matched_detection_rule_uuids=[], 
    matched_detection_rules=['Inline Detection Rule #1'])
]]

And our redacted input payload in our redacted_payload object (truncated for clarity):

['GpcjUg74...BpQHw== is my credit card number']

When to use Encryption

Third-party encryption is well-suited for use cases where you want to preserve the original sensitive data but ensure that it is only visible to sanctioned parties that have your private key. For example, if you are storing the data or passing it to a sanctioned third-party for processing, encrypting the sensitive tokens can add one additional layer of encryption and security, while still allowing a downstream processor to access the raw data as required with the key.

Congrats! You've now learned about and implemented multiple redaction techniques in just a few lines of code. You're ready to start adding redaction to your apps.

Last updated