PII
Detect and protect personally identifiable information in AI systems using Galileo’s PII Metric to identify sensitive data and implement appropriate safeguards.
PII Detection identifies spans of personally identifiable information (PII) within a sample, in both the input and the output.
This metric is particularly valuable for identifying sensitive personal data that may require special handling or protection. Detecting PII is essential for compliance with privacy regulations and protecting user information.
Calculation Method
PII detection is computed in four steps:
Model Foundation
A specialized Small Language Model (SLM) trained on proprietary datasets forms the core of the detection system, enabling accurate identification of various PII types.
Content Analysis
The system performs comprehensive scanning of both input and output text, utilizing pattern recognition and contextual analysis to identify potential PII occurrences.
Classification Process
Each detected PII instance is systematically categorized by its specific type (e.g., SSN, email, address) and assigned a confidence score based on the detection certainty.
Visual Reporting
Results are displayed through an interactive interface that highlights PII instances directly in the text, making it easy to identify and review sensitive information locations.
To highlight which parts of the text were detected as PII, click on the icon next to the PII metric value. The type of PII detected along with the model’s confidence will be shown on the input or output text.
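As a rough illustration of the output described above, the sketch below models a detected instance as a type, character offsets, and a confidence score, then renders the text with inline labels. The `PIISpan` class, its field names, and the `annotate` helper are hypothetical and are not part of Galileo’s API; the offsets and the score are made-up sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PIISpan:
    """Hypothetical representation of one detected PII instance."""
    start: int         # inclusive character offset
    end: int           # exclusive character offset
    pii_type: str      # e.g. "email", "ssn"
    confidence: float  # detection certainty in [0, 1]


def annotate(text: str, spans: list[PIISpan], min_confidence: float = 0.5) -> str:
    """Wrap each sufficiently confident span in [[text|type confidence]] markers."""
    out = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.confidence < min_confidence or span.start < cursor:
            continue  # skip low-confidence spans and spans overlapping a kept one
        out.append(text[cursor:span.start])
        out.append(f"[[{text[span.start:span.end]}|{span.pii_type} {span.confidence:.2f}]]")
        cursor = span.end
    out.append(text[cursor:])
    return "".join(out)


sample = "Contact jane@example.com today."
spans = [PIISpan(start=8, end=24, pii_type="email", confidence=0.97)]
print(annotate(sample, spans))
# prints: Contact [[jane@example.com|email 0.97]] today.
```

The `min_confidence` threshold is an assumed knob for illustration; raising it hides less certain detections from the rendered text.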
PII Categories Detected
The current model detects the following precisely defined categories:
Account Information: Bank account numbers, Bank Identification Code (BIC) and International Bank Account Number (IBAN).
Address: A physical address. Must contain at least a street name and number, and may contain extra elements such as city, zip code, state, etc.
Credit Card: Credit card number (can be full or last 4 digits), Card Verification Value (CVV) and expiration date.
Date of Birth: The day, month and year a person was born. The context should make it clear that it’s someone’s birthdate.
Email: An email address.
Name: A person’s full name. It must consist of at least a first and last name to be considered PII.
Network Information: IPv4, IPv6 and MAC addresses.
Password: A password.
Phone Number: A phone number.
Social Security Number (SSN): A US Social Security Number.
Username: A username.
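The categories above can be mirrored in application code so that downstream handling is keyed on a fixed set of labels. The sketch below is illustrative only: the `PIICategory` enum and its string values are hypothetical names, not the labels Galileo returns, and `looks_like_network_info` is a simple rule-based check for the Network Information category alone. It is not how the detection model works.

```python
import ipaddress
import re
from enum import Enum


class PIICategory(Enum):
    """Hypothetical labels mirroring the documented categories."""
    ACCOUNT_INFORMATION = "account_information"
    ADDRESS = "address"
    CREDIT_CARD = "credit_card"
    DATE_OF_BIRTH = "date_of_birth"
    EMAIL = "email"
    NAME = "name"
    NETWORK_INFORMATION = "network_information"
    PASSWORD = "password"
    PHONE_NUMBER = "phone_number"
    SSN = "ssn"
    USERNAME = "username"


# Six colon- or hyphen-separated hex octets, e.g. 00:1A:2B:3C:4D:5E
_MAC_RE = re.compile(r"^(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}$")


def looks_like_network_info(value: str) -> bool:
    """Return True if value parses as an IPv4, IPv6, or MAC address."""
    if _MAC_RE.match(value):
        return True
    try:
        ipaddress.ip_address(value)  # accepts both IPv4 and IPv6 literals
        return True
    except ValueError:
        return False


for candidate in ["192.168.0.1", "::1", "00:1A:2B:3C:4D:5E", "hello"]:
    print(candidate, looks_like_network_info(candidate))
```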
Optimizing Your AI System
Addressing PII in Your System
When PII is detected in your system, consider these approaches:
Implement data redaction: Automatically mask or remove PII before processing or storing data.
Create PII handling policies: Develop clear guidelines for how different types of PII should be processed.
Set up user consent flows: Ensure users understand when and how their PII might be used.
Establish data retention policies: Define how long different types of PII should be stored.
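The redaction approach can be sketched as follows. This is a minimal example that assumes detected spans are already available as non-overlapping `(start, end, pii_type)` tuples of character offsets; the `redact` function, the tuple layout, and the placeholder format are assumptions made for illustration, not a Galileo API.

```python
def redact(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each (start, end, pii_type) span with a [PII_TYPE] placeholder."""
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for start, end, pii_type in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + f"[{pii_type.upper()}]" + text[end:]
    return text


message = "Email jane@example.com or call 555-0100."
detected = [(6, 22, "email"), (31, 39, "phone_number")]  # sample offsets
print(redact(message, detected))
# prints: Email [EMAIL] or call [PHONE_NUMBER].
```

Keeping the type in the placeholder preserves some context for downstream processing while removing the sensitive value itself.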
Best Practices
Real-time PII Detection
Implement PII detection as part of your input validation pipeline to catch sensitive information before processing.
Data Minimization
Only collect and process the minimum amount of PII necessary for your application’s functionality.
Secure Storage
When PII must be stored, ensure it’s properly encrypted and access is strictly controlled.
Regular Audits
Periodically review your system for unintended PII exposure or collection.
Automatically identify PII occurrences in any part of the workflow (user input, chains, model output, etc.) and respond by implementing guardrails or other preventative measures. This helps ensure compliance with privacy regulations such as GDPR and CCPA.
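One possible shape for such a guardrail is sketched below. It assumes a detector is supplied as a plain callable that returns the PII type labels found in a piece of text; the `Detector` signature, the `guard` function, the `BLOCKED_TYPES` policy, and the stand-in `fake_detector` are all hypothetical and would be replaced by whatever detection call and policy your system uses.

```python
from typing import Callable

# A detector takes text and returns the PII type labels it found.
# This signature is a placeholder, not a real library interface.
Detector = Callable[[str], list[str]]

BLOCKED_TYPES = {"ssn", "credit_card", "password"}  # example policy


class PIIBlockedError(Exception):
    """Raised when a workflow step contains a PII type the policy forbids."""


def guard(step_name: str, text: str, detect: Detector) -> str:
    """Return text unchanged unless it contains a blocked PII type."""
    found = set(detect(text)) & BLOCKED_TYPES
    if found:
        raise PIIBlockedError(f"{step_name}: blocked PII types {sorted(found)}")
    return text


def fake_detector(text: str) -> list[str]:
    """Stand-in detector for this example; flags the literal token 'SSN'."""
    return ["ssn"] if "SSN" in text else []


print(guard("user_input", "What is the weather today?", fake_detector))
try:
    guard("model_output", "Her SSN is on file.", fake_detector)
except PIIBlockedError as err:
    print(err)
```

Calling the same `guard` at each step (user input, intermediate chain output, model output) keeps the policy in one place; whether to block, redact, or only log is a design choice that depends on the application.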