Open Source · Live Product · GDPR · Privacy

Lastik — Data De-identification Tool

From market gap to live open-source product: designing a privacy-first PII anonymizer for the AI era

Context

As AI assistants became daily tools for knowledge workers, a new risk emerged in plain sight: professionals routinely copy-paste sensitive documents — contracts, CVs, medical records — directly into ChatGPT or Copilot without realizing their data is transmitted to external servers. No simple, browser-based tool existed to prevent this. Lastik was designed to close that gap.

The core architectural constraint is GDPR compliance: zero data transmission means all processing must run client-side in the browser. This rules out server-side NLP models and dictates the technology choice — TypeScript, Next.js, React, with PII detection via pattern matching (regex). The trade-off is deliberate: slightly lower accuracy on edge cases in exchange for an absolute privacy guarantee. No data ever leaves the device. This constraint introduces a cross-cutting acceptance criterion for all use cases: verified absence of outbound network requests during every stage of processing (see Use Cases).
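
The regex-based, client-side approach can be sketched as follows. This is an illustrative TypeScript sketch, not Lastik's actual rule set: the patterns and type names are assumptions, production patterns need to be far more robust, and person names in particular require dictionaries or heuristics beyond plain regex.

```typescript
// Illustrative client-side PII detection via pattern matching.
// Patterns are simplified assumptions, not Lastik's real rules.

type PiiType = "EMAIL" | "PHONE" | "IBAN" | "DOB";

interface Detection {
  type: PiiType;
  value: string;
  start: number; // character offset in the source text
  end: number;
}

const PATTERNS: Record<PiiType, RegExp> = {
  EMAIL: /[\w.+-]+@[\w-]+\.[\w.]+/g,
  PHONE: /\+?\d[\d\s-]{7,14}\d/g,
  IBAN: /\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b/g,
  DOB: /\b\d{2}[./-]\d{2}[./-]\d{4}\b/g,
};

function detectPii(text: string): Detection[] {
  const detections: Detection[] = [];
  for (const [type, pattern] of Object.entries(PATTERNS) as [PiiType, RegExp][]) {
    for (const match of text.matchAll(pattern)) {
      detections.push({
        type,
        value: match[0],
        start: match.index!,
        end: match.index! + match[0].length,
      });
    }
  }
  // Sort by position so a review sidebar can list entities in document order.
  return detections.sort((a, b) => a.start - b.start);
}
```

Because everything runs in the page, this trade-off holds: a regex engine never phones home, whereas an NLP model good enough for edge cases would typically have to.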

Problem

Market Overview

The scale of the problem is significant. As of early 2025, ChatGPT alone has 400 million monthly active users (Backlinko, 2025), with the combined global AI assistant user base exceeding 1 billion MAU across ChatGPT, Gemini, Copilot, and Claude (DataStudios, 2025).

A significant portion of these users work with personally identifiable information daily:

  • ~40–50 million HR and recruiting professionals worldwide handle CVs, employment contracts, compensation data, and background checks (BLS, 2024)
  • ~20 million lawyers globally process contracts, NDAs, litigation documents, and deposition transcripts (SpotDraft, 2025)

Research shows the problem is already widespread and growing:

  • 77% of employees paste sensitive company data — including PII and payment card information — into generative AI tools, often via unmanaged personal accounts (eSecurity Planet)
  • 38% of workers share sensitive information with AI tools without their employer’s knowledge (CybSafe)
  • ~40% of files uploaded to AI tools contain PII or payment card (PCI) data (Kiteworks)

Real-world consequences are already materializing: in 2023, Samsung engineers leaked semiconductor source code and internal meeting transcripts into ChatGPT — the company subsequently banned all generative AI tools company-wide (Bloomberg). In December 2024, Italy’s data protection authority fined OpenAI €15 million for GDPR violations related to ChatGPT data processing (The Hacker News).

User Personas

Agnes
34 · HR Specialist

Processes 20–30 job applications per week at a mid-size company. Uses ChatGPT daily for CV screening and draft generation alongside Google Docs, Notion, and an ATS. Wants to move faster with AI — but is aware of GDPR obligations and fears a compliance incident. Not technical: cannot run Python scripts or configure APIs.

  • Screen CVs and contracts with AI assistance
  • Share documents with colleagues without exposing PII
  • Stay compliant without slowing down her workflow
John
29 · Job Applicant

Submits a CV with his full PII: name, date of birth, phone, email, home address, IBAN. Has no visibility into how Agnes processes his data. He is the data subject under GDPR — his rights are directly at risk if his document is shared with an AI service without proper safeguards.

  • Apply for a position and share personal documents
  • Trust that his data is handled responsibly
  • Expect GDPR-compliant processing by the employer

Competitive Landscape

Most professionals today handle PII the only way they know how: Find & Replace in Word or Google Docs — substituting names one by one, with no systematic tracking, no reversibility, and no guarantee nothing was missed. It works until it doesn’t.

Dedicated tools exist, but they are built for developers and enterprises — not for Agnes.

| Tool | Browser / No setup | Free | Open Source | Review before masking |
|---|---|---|---|---|
| Lastik | ✅ | ✅ | ✅ | ✅ |
| Microsoft Presidio | ❌ requires Python / Docker | ✅ | ✅ | ❌ |
| AWS Comprehend | ❌ requires AWS account | ❌ pay-per-use | ❌ | ❌ |
| Private AI | ❌ requires API key | ❌ trial only | ❌ | ❌ |
| Nightfall | ❌ requires enterprise sales | ❌ ~$75K/yr | ❌ | ❌ |

No major vendor offers a zero-friction, browser-based, no-account PII anonymizer for knowledge workers. Lastik targets this gap directly.

Threat Analysis

To understand what Lastik solves, we need to locate exactly where data exposure occurs in a typical workflow.

Scenario A — without Lastik · PII transmitted to an external server

```mermaid
sequenceDiagram
    participant J as John (Applicant)
    participant A as Agnes (HR)
    participant AI as AI Assistant

    rect rgb(255, 243, 243)
        J->>A: Submits CV — name, DOB, IBAN, phone
        Note over A,AI: 👇 THREAT POINT — PII leaves the browser here
        A->>AI: Pastes full CV text into ChatGPT
    end
```

Scenario B — with Lastik · Zero transmission to any server

```mermaid
sequenceDiagram
    participant J as John (Applicant)
    participant A as Agnes (HR)
    participant AI as AI Assistant

    rect rgb(240, 253, 250)
        J->>A: Submits CV — name, DOB, IBAN, phone
        Note over A: 👇 Data cleaned HERE
        A->>A: Paste into Lastik · detect & mask PII locally
        A->>AI: Masked text only: PERSON_1, DOB_1, IBAN_1
        AI->>A: AI analysis — no PII exposed
        A->>A: Reverse replacement via rules file
    end
```

The threat is not in the document itself — it is in the copy-paste step before AI submission. Once text is pasted into ChatGPT or Copilot, it travels to an external server. Lastik intercepts at exactly this point, processing everything locally before any network request is made.
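
The interception step — detect locally, substitute numbered tokens, keep a reversible mapping — might look like this in outline. The token naming follows the PERSON_1 / IBAN_1 convention from the diagram; everything else is a simplified assumption, not Lastik's actual implementation.

```typescript
// Simplified sketch: replace detected PII values with numbered tokens
// and record the token -> original mapping for later reversal.

type Rules = Record<string, string>; // token -> original value

function mask(
  text: string,
  detections: { type: string; value: string }[],
): { masked: string; rules: Rules } {
  const rules: Rules = {};
  const counters: Record<string, number> = {};
  let masked = text;
  for (const d of detections) {
    // Reuse the token if the same value was already seen.
    let token = Object.keys(rules).find((t) => rules[t] === d.value);
    if (!token) {
      counters[d.type] = (counters[d.type] ?? 0) + 1;
      token = `${d.type}_${counters[d.type]}`;
      rules[token] = d.value;
    }
    masked = masked.split(d.value).join(token); // replace all occurrences
  }
  return { masked, rules };
}
```

Both the detections and the rules map live only in page memory, which is why a refresh without an exported rules file loses the mapping (see S-03).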

Scenarios

S-01. De-identification

As Agnes, I want to hide all PII in a document before sending it to John or to ChatGPT so that I prevent a data leak and avoid fines or issues during an audit of my work.

S-02. Reverse replacement

As Agnes, I want to restore all personal data to its original form in the document so that I don’t have to retype anything or manually find every place where a token needs to be replaced.

S-03. Rules export

As Agnes, I want to export the masking rules so that I can return to my work later without losing everything if the browser refreshes, the computer restarts, or the power goes out.
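
A rules export of this kind can be as simple as a JSON file mapping tokens to original values. The structure below is an assumed illustration, not Lastik's documented format; the parse check mirrors the corrupted-file case handled in UC-02.

```typescript
// Assumed rules-file structure; the real Lastik export format may differ.
interface RulesFile {
  version: number;
  createdAt: string;             // ISO timestamp
  rules: Record<string, string>; // e.g. "PERSON_1" -> "John Smith"
}

function exportRules(rules: Record<string, string>): string {
  const file: RulesFile = {
    version: 1,
    createdAt: new Date().toISOString(),
    rules,
  };
  return JSON.stringify(file, null, 2);
}

function importRules(json: string): Record<string, string> {
  const parsed = JSON.parse(json) as RulesFile;
  if (typeof parsed.rules !== "object" || parsed.rules === null) {
    throw new Error("Invalid rules file"); // corrupted or hand-edited file
  }
  return parsed.rules;
}
```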

S-04. Manual rule creation

As Agnes, I want to create custom replacements for entities the engine didn’t detect so that nothing is missed — without having to fix each occurrence by hand.

S-05. Color-coded review

As Agnes, I want to see all detected entities highlighted before confirming so that I don’t accidentally mask something I need or miss something I shouldn’t send.

S-06. Device-local processing

As John, I want my data to stay on my device and not be sent anywhere so that I have full control over my personal data and don’t risk it leaking to a third party.

Scope

V1.0 covers the minimum viable masking flow — the smallest set of features that delivers value independently. V1.1 adds polish and public launch. V2.0 closes the full de-identification cycle with rules export and reverse replacement. Release boundaries reflect the initial planning phase and may differ from actual delivery. Source code is available on GitHub.

| Feature | v1.0 | v1.1 | v2.0 |
|---|---|---|---|
| RAW mode | ✅ | | |
| Interactive review | ✅ | | |
| PII types: name, email, phone, DOB, IBAN, card | ✅ | | |
| Color-coded highlighting | ✅ | | |
| Copy controls | ✅ | | |
| Manual entity labeling | ✅ | | |
| Mobile UX | | ✅ | |
| GDPR pseudonymisation messaging | | ✅ | |
| SEO & public launch | | ✅ | |
| Rules export | | | ✅ |
| Reverse replacement | | | ✅ |

Use Cases

UC-01 — De-identification

Use Case ID: UC-01
Use Case Name: De-identify a document before sharing with an AI assistant
Actor: Agnes (HR Specialist)
Preconditions: Agnes has received John’s CV as plain text from a source document. Lastik is open in the browser at lastik.chassaji.com. No account, login, or installation is required.

Main Flow:
1. Agnes copies John’s CV text from the source document.
2. Agnes pastes the text into the Lastik editor panel.
3. Lastik automatically detects PII entities and highlights them by type: name, DOB, email, phone, IBAN.
4. Agnes reviews the highlighted entities in the right-hand sidebar (S-05).
5. Agnes adjusts any incorrect or missed detections manually (S-04), or confirms the rules as-is.
6. Agnes clicks Copy masked text (S-01).
7. Agnes pastes the masked text into ChatGPT.

Postconditions: Masked text is in Agnes’s clipboard. John’s PII has not left the device. Agnes is now interacting with the AI assistant without GDPR risk.

Alternative Flow A — Manual Entity: At step 5, Agnes highlights a text fragment the engine missed (e.g., an internal employee ID), assigns a label, and confirms. The entity is added to the masking rules.
Alternative Flow B — No PII Detected: If no entities are highlighted after step 3, Agnes may manually create rules (S-04) or verify that no masking is needed and proceed with the original text.
Exception 1 — Page Refreshed: All session data and unsaved rules are lost. If rules were not exported (S-03), the masking cannot be reversed.
Exception 2 — No Internet Connection: The page cannot load and the tool is unavailable.
Related Scenarios: S-01, S-04, S-05, S-06

UC-02 — Reverse Replacement

Use Case ID: UC-02
Use Case Name: Restore original values from masked AI output
Actor: Agnes (HR Specialist)
Preconditions: UC-01 was completed and the masking rules file was exported (S-03). Agnes has received the AI response containing masked tokens such as PERSON_1, IBAN_1. Lastik is open in the browser.

Main Flow:
1. Agnes switches Lastik to reverse replacement mode.
2. Agnes copies the AI response text from the source.
3. Agnes pastes the text into the editor panel.
4. Agnes loads the previously exported rules file.
5. Lastik matches tokens to their original values and highlights them.
6. Agnes reviews the restored text.
7. Agnes clicks Copy original text (S-02).
8. Agnes uses the restored response in her workflow.

Postconditions: Agnes has the AI analysis with real data restored. The full de-identification cycle is closed: mask → AI → restore.

Alternative Flow A — Edited Rules File: Before step 4, Agnes opens the rules file and edits the replacement values. At step 5, Lastik performs substitutions according to the modified rules.
Exception 1 — Page Refreshed: All session data is lost. Agnes must reload the tool and the rules file before proceeding.
Exception 2 — No Internet Connection: The page cannot load and the tool is unavailable.
Exception 3 — Corrupted or Invalid File: The rules file cannot be parsed due to incorrect format or manual editing errors. Agnes must re-export the rules file from UC-01.
Exception 4 — Token Not Found: The AI paraphrased or omitted a token. The corresponding rule is displayed but no substitution is performed for that value.
Related Scenarios: S-02, S-03, S-06
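
Reverse replacement, including the token-not-found case from Exception 4, can be sketched as follows. This is an illustrative outline assuming a token-to-value rules mapping, not Lastik's actual code.

```typescript
// Sketch: swap tokens in the AI response back to their original values,
// reporting tokens the AI paraphrased away (Exception 4).

function unmask(
  maskedText: string,
  rules: Record<string, string>,
): { restored: string; unusedTokens: string[] } {
  let restored = maskedText;
  const unusedTokens: string[] = [];
  // Longest tokens first, so PERSON_10 is never clobbered by PERSON_1.
  const entries = Object.entries(rules).sort((a, b) => b[0].length - a[0].length);
  for (const [token, original] of entries) {
    if (restored.includes(token)) {
      restored = restored.split(token).join(original);
    } else {
      unusedTokens.push(token); // rule shown to the user, no substitution made
    }
  }
  return { restored, unusedTokens };
}
```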

Acceptance Criterion (cross-cutting, both UC-01 and UC-02)

Chrome DevTools Network tab must show zero outbound requests during all stages of processing — masking, rule generation, and substitution. This is a direct consequence of the client-side architecture and the GDPR zero-transmission constraint defined in Context.
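
One way to make this criterion checkable in code is an in-page tripwire that wraps fetch and records any outbound call made while processing runs. This is an illustrative sketch, not part of Lastik — the DevTools Network tab remains the stated verification method, and a complete check would also cover XMLHttpRequest, sendBeacon, and WebSocket.

```typescript
// Illustrative tripwire for the zero-transmission criterion:
// wrap fetch so any outbound request during processing is recorded.

function createNetworkTripwire() {
  const requests: string[] = [];
  const originalFetch = globalThis.fetch;
  globalThis.fetch = ((input: any, init?: any) => {
    requests.push(String(input)); // record the URL of every outbound call
    return originalFetch(input, init);
  }) as typeof fetch;
  return {
    requests, // must stay empty across masking, rule generation, substitution
    restore: () => { globalThis.fetch = originalFetch; },
  };
}
```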

Result

Who It’s Built For

1B+
AI assistant users globally (DataStudios, 2025)
40–50M
HR & recruiting professionals worldwide (BLS, 2024)
~20M
lawyers globally processing PII documents daily (SpotDraft, 2025)

These professionals work with PII-heavy documents every day and increasingly rely on AI assistants to process them — but have no simple, safe way to do it. Lastik was built for them: no Python, no API keys, no enterprise contract.

What Was Delivered

  • Live product — no account, no installation, works instantly in any browser
  • Full de-identification cycle — mask → AI interaction → export rules → reverse replacement
  • 6 PII types detected — names, dates of birth, phone numbers, emails, IBANs, credit cards
  • Zero data transmission — all processing runs locally on the device, never leaves it
  • GDPR Article 4(5) pseudonymisation — by design, not as an afterthought
  • Reversibility and audit trail — capabilities Word’s Find & Replace cannot provide
  • Open source on GitHub