Masking Package

Introduction

The Masking Package is a Python library designed to provide various masking operations on data columns. It supports hashing, fake data generation, and other masking techniques to ensure data privacy and security.

Table of Content

Development:

Installation

To install the Masking Package, use the following command:

pip install masking@git+"https://github.com/d-one/masking"

Usage

In the following we provide an example of how to use the masking module to hash the entire content of an entry.

Hashing operation

Here is a basic example of how to use the package

import pandas as pd
from masking.mask.pipeline import MaskDataFramePipeline
from masking.mask.operations.operation_hash import HashOperation

# Sample DataFrame
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'ssn': ['123-45-6789', '987-65-4321', '555-55-5555']}
df = pd.DataFrame(data)

# Configuration for masking
config = {
        "name": {'masking_operation': HashOperation(col_name="name",
                                                    secret="my_secret")},
        }

# Create and apply the masking pipeline
pipeline = MaskDataFramePipeline(config)
masked_df = pipeline(df)

print(masked_df)

Using Spacy models

The module makes use of Presidio, a framework introduced by Microsoft, to enable entity recognition. This latter leverages Spacy as a NLP framework to do entity recognition.

Before using the masking module, make sure to have downloaded the necessary spacy model: open up a terminal in your project and execute the following line of code

python -m spacy download en_core_web_trf

You can make sure to have installed the spacy model correctly by running the following code:

import spacy

# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_trf")

# Process whole documents
text = ("When Sebastian Thrun started working on self-driving cars at "
        "Google in 2007, few people outside of the company took him "
        "seriously. “I can tell you very senior CEOs of major American "
        "car companies would shake my hand and turn away because I wasn’t "
        "worth talking to,” said Thrun, in an interview with Recode earlier "
        "this week.")
doc = nlp(text)

# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)

Once the model is correctly installed, one can make use of it for masking in the following manner:

import pandas as pd
from masking.mask.pipeline import MaskDataFramePipeline
from masking.mask.operations.operation_presidio import MaskPresidio
from masking.utils.presidio_handler import PresidioMultilingualAnalyzer

analyzer = PresidioMultilingualAnalyzer(
    models={"en": "en_core_web_trf"}
).analyzer

# Sample DataFrame
data = {'description': ['James Bond was a very clever inspector.']}
df = pd.DataFrame(data)

# Configuration for masking
config = {
        "name": {'masking_operation': MaskPresidio(col_name="name", secret="my_secret")},
        }

# Create and apply the masking pipeline
pipeline = MaskDataFramePipeline(config)
masked_df = pipeline(df)

print(masked_df)

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
.vscode		.vscode
data		data
dist		dist
docs		docs
scripts		scripts
src/masking		src/masking
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
Makefile		Makefile
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Masking Package

Introduction

Table of Content

Installation

Usage

Hashing operation

Using Spacy models

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

d-one/masking

Folders and files

Latest commit

History

Repository files navigation

Masking Package

Introduction

Table of Content

Installation

Usage

Hashing operation

Using Spacy models

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages