Open Source · Live Product · GDPR · Privacy

Lastik — Data De-identification Tool

From market gap to live open-source product: designing a privacy-first PII anonymizer for the AI era

Context

As AI assistants became daily tools for knowledge workers, a new risk emerged in plain sight: professionals routinely copy-paste sensitive documents — contracts, CVs, medical records — directly into ChatGPT or Copilot without realizing their data is transmitted to external servers. No simple, browser-based tool existed to prevent this. Lastik was designed to close that gap.

The core architectural constraint is GDPR compliance: zero data transmission means all processing must run client-side in the browser. This rules out server-side NLP models and dictates the technology choice — TypeScript, Next.js, React, with PII detection via pattern matching (regex). The trade-off is deliberate: slightly lower accuracy on edge cases in exchange for an absolute privacy guarantee. No data ever leaves the device. This constraint introduces a cross-cutting acceptance criterion for all use cases: verified absence of outbound network requests during every stage of processing (see Use Cases).
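
The regex-based, client-side approach can be sketched as follows. This is an illustrative TypeScript sketch, not Lastik's actual rule set: the patterns and type names are assumptions, production patterns need to be far more robust, and person names in particular require dictionaries or heuristics beyond plain regex.

```typescript
// Illustrative client-side PII detection via pattern matching.
// Patterns are simplified assumptions, not Lastik's real rules.

type PiiType = "EMAIL" | "PHONE" | "IBAN" | "DOB";

interface Detection {
  type: PiiType;
  value: string;
  start: number; // character offset in the source text
  end: number;
}

const PATTERNS: Record<PiiType, RegExp> = {
  EMAIL: /[\w.+-]+@[\w-]+\.[\w.]+/g,
  PHONE: /\+?\d[\d\s-]{7,14}\d/g,
  IBAN: /\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b/g,
  DOB: /\b\d{2}[./-]\d{2}[./-]\d{4}\b/g,
};

function detectPii(text: string): Detection[] {
  const detections: Detection[] = [];
  for (const [type, pattern] of Object.entries(PATTERNS) as [PiiType, RegExp][]) {
    for (const match of text.matchAll(pattern)) {
      detections.push({
        type,
        value: match[0],
        start: match.index!,
        end: match.index! + match[0].length,
      });
    }
  }
  // Sort by position so a review sidebar can list entities in document order.
  return detections.sort((a, b) => a.start - b.start);
}
```

Because everything runs in the page, this trade-off holds: a regex engine never phones home, whereas an NLP model good enough for edge cases would typically have to.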

Problem

Market Overview

The scale of the problem is significant. As of early 2025, ChatGPT alone has 400 million monthly active users (Backlinko, 2025), with the combined global AI assistant user base exceeding 1 billion MAU across ChatGPT, Gemini, Copilot, and Claude (DataStudios, 2025).

A significant portion of these users work with personally identifiable information daily:

  • ~40–50 million HR and recruiting professionals worldwide handle CVs, employment contracts, compensation data, and background checks (BLS, 2024)
  • ~20 million lawyers globally process contracts, NDAs, litigation documents, and deposition transcripts (SpotDraft, 2025)

Research shows the problem is already widespread and growing:

  • 77% of employees paste sensitive company data — including PII and payment card information — into generative AI tools, often via unmanaged personal accounts (eSecurity Planet)
  • 38% of workers share sensitive information with AI tools without their employer’s knowledge (CybSafe)
  • ~40% of files uploaded to AI tools contain PII or payment card (PCI) data (Kiteworks)

Real-world consequences are already materializing: in 2023, Samsung engineers leaked semiconductor source code and internal meeting transcripts into ChatGPT — the company subsequently banned all generative AI tools company-wide (Bloomberg). In December 2024, Italy’s data protection authority fined OpenAI €15 million for GDPR violations related to ChatGPT data processing (The Hacker News).

User Personas

Agnes
34 · HR Specialist

Processes 20–30 job applications per week at a mid-size company. Uses ChatGPT daily for CV screening and draft generation alongside Google Docs, Notion, and an ATS. Wants to move faster with AI — but is aware of GDPR obligations and fears a compliance incident. Not technical: cannot run Python scripts or configure APIs.

  • Screen CVs and contracts with AI assistance
  • Share documents with colleagues without exposing PII
  • Stay compliant without slowing down her workflow
John
29 · Job Applicant

Submits a CV with his full PII: name, date of birth, phone, email, home address, IBAN. Has no visibility into how Agnes processes his data. He is the data subject under GDPR — his rights are directly at risk if his document is shared with an AI service without proper safeguards.

  • Apply for a position and share personal documents
  • Trust that his data is handled responsibly
  • Expect GDPR-compliant processing by the employer

Competitive Landscape

Most professionals today handle PII the only way they know how: Find & Replace in Word or Google Docs — substituting names one by one, with no systematic tracking, no reversibility, and no guarantee nothing was missed. It works until it doesn’t.

Dedicated tools exist, but they are built for developers and enterprises — not for Agnes.

| Tool | Browser / No setup | Free | Open Source | Review before masking |
|---|---|---|---|---|
| Lastik | ✅ | ✅ | ✅ | ✅ |
| Microsoft Presidio | ❌ requires Python / Docker | ✅ | ✅ | ❌ |
| AWS Comprehend | ❌ requires AWS account | ❌ pay-per-use | ❌ | ❌ |
| Private AI | ❌ requires API key | ❌ trial only | ❌ | ❌ |
| Nightfall | ❌ requires enterprise sales | ❌ ~$75K/yr | ❌ | ❌ |

No major vendor offers a zero-friction, browser-based, no-account PII anonymizer for knowledge workers. Lastik targets this gap directly.

Threat Analysis

To understand what Lastik solves, we need to locate exactly where data exposure occurs in a typical workflow.

Scenario A — without Lastik · PII transmitted to an external server

```mermaid
sequenceDiagram
    participant J as John (Applicant)
    participant A as Agnes (HR)
    participant AI as AI Assistant

    rect rgb(255, 243, 243)
        J->>A: Submits CV — name, DOB, IBAN, phone
        Note over A,AI: 👇 THREAT POINT — PII leaves the browser here
        A->>AI: Pastes full CV text into ChatGPT
    end
```

Scenario B — with Lastik · Zero transmission to any server

```mermaid
sequenceDiagram
    participant J as John (Applicant)
    participant A as Agnes (HR)
    participant AI as AI Assistant

    rect rgb(240, 253, 250)
        J->>A: Submits CV — name, DOB, IBAN, phone
        Note over A: 👇 Data cleaned HERE
        A->>A: Paste into Lastik · detect & mask PII locally
        A->>AI: Masked text only: PERSON_1, DOB_1, IBAN_1
        AI->>A: AI analysis — no PII exposed
        A->>A: Reverse replacement via rules file
    end
```

The threat is not in the document itself — it is in the copy-paste step before AI submission. Once text is pasted into ChatGPT or Copilot, it travels to an external server. Lastik intercepts at exactly this point, processing everything locally before any network request is made.
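
The interception step — detect locally, substitute numbered tokens, keep a reversible mapping — might look like this in outline. The token naming follows the PERSON_1 / IBAN_1 convention from the diagram; everything else is a simplified assumption, not Lastik's actual implementation.

```typescript
// Simplified sketch: replace detected PII values with numbered tokens
// and record the token -> original mapping for later reversal.

type Rules = Record<string, string>; // token -> original value

function mask(
  text: string,
  detections: { type: string; value: string }[],
): { masked: string; rules: Rules } {
  const rules: Rules = {};
  const counters: Record<string, number> = {};
  let masked = text;
  for (const d of detections) {
    // Reuse the token if the same value was already seen.
    let token = Object.keys(rules).find((t) => rules[t] === d.value);
    if (!token) {
      counters[d.type] = (counters[d.type] ?? 0) + 1;
      token = `${d.type}_${counters[d.type]}`;
      rules[token] = d.value;
    }
    masked = masked.split(d.value).join(token); // replace all occurrences
  }
  return { masked, rules };
}
```

Both the detections and the rules map live only in page memory, which is why a refresh without an exported rules file loses the mapping (see S-03).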

Scenarios

S-01. De-identification

As Agnes, I want to hide all PII in a document before sending it to John or to ChatGPT so that I prevent a data leak and avoid fines or issues during an audit of my work.

S-02. Reverse replacement

As Agnes, I want to restore all personal data to its original form in the document so that I don’t have to retype anything or manually find every place where a token needs to be replaced.

S-03. Rules export

As Agnes, I want to export the masking rules so that I can return to my work later without losing everything if the browser refreshes, the computer restarts, or the power goes out.
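
A rules export of this kind can be as simple as a JSON file mapping tokens to original values. The structure below is an assumed illustration, not Lastik's documented format; the parse check mirrors the corrupted-file case handled in UC-02.

```typescript
// Assumed rules-file structure; the real Lastik export format may differ.
interface RulesFile {
  version: number;
  createdAt: string;             // ISO timestamp
  rules: Record<string, string>; // e.g. "PERSON_1" -> "John Smith"
}

function exportRules(rules: Record<string, string>): string {
  const file: RulesFile = {
    version: 1,
    createdAt: new Date().toISOString(),
    rules,
  };
  return JSON.stringify(file, null, 2);
}

function importRules(json: string): Record<string, string> {
  const parsed = JSON.parse(json) as RulesFile;
  if (typeof parsed.rules !== "object" || parsed.rules === null) {
    throw new Error("Invalid rules file"); // corrupted or hand-edited file
  }
  return parsed.rules;
}
```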

S-04. Manual rule creation

As Agnes, I want to create custom replacements for entities the engine didn’t detect so that nothing is missed — without having to fix each occurrence by hand.

S-05. Color-coded review

As Agnes, I want to see all detected entities highlighted before confirming so that I don’t accidentally mask something I need or miss something I shouldn’t send.

S-06. Device-local processing

As John, I want my data to stay on my device and not be sent anywhere so that I have full control over my personal data and don’t risk it leaking to a third party.

Scope

V1.0 covers the minimum viable masking flow — the smallest set of features that delivers value independently. V1.1 adds polish and public launch. V2.0 closes the full de-identification cycle with rules export and reverse replacement. Release boundaries reflect the initial planning phase and may differ from actual delivery. Source code is available on GitHub.

| Feature | v1.0 | v1.1 | v2.0 |
|---|---|---|---|
| RAW mode | ✅ | | |
| Interactive review | ✅ | | |
| PII types: name, email, phone, DOB, IBAN, card | ✅ | | |
| Color-coded highlighting | ✅ | | |
| Copy controls | ✅ | | |
| Manual entity labeling | ✅ | | |
| Mobile UX | | ✅ | |
| GDPR pseudonymisation messaging | | ✅ | |
| SEO & public launch | | ✅ | |
| Rules export | | | ✅ |
| Reverse replacement | | | ✅ |

Use Cases

UC-01 — De-identification

Use Case ID: UC-01
Use Case Name: De-identify a document before sharing with an AI assistant
Actor: Agnes (HR Specialist)
Preconditions: Agnes has received John’s CV as plain text from a source document. Lastik is open in the browser at lastik.chassaji.com. No account, login, or installation is required.

Main Flow:
1. Agnes copies John’s CV text from the source document.
2. Agnes pastes the text into the Lastik editor panel.
3. Lastik automatically detects PII entities and highlights them by type: name, DOB, email, phone, IBAN.
4. Agnes reviews the highlighted entities in the right-hand sidebar (S-05).
5. Agnes adjusts any incorrect or missed detections manually (S-04), or confirms the rules as-is.
6. Agnes clicks Copy masked text (S-01).
7. Agnes pastes the masked text into ChatGPT.

Postconditions: Masked text is in Agnes’s clipboard. John’s PII has not left the device. Agnes is now interacting with the AI assistant without GDPR risk.

Alternative Flow A — Manual Entity: At step 5, Agnes highlights a text fragment the engine missed (e.g., an internal employee ID), assigns a label, and confirms. The entity is added to the masking rules.
Alternative Flow B — No PII Detected: If no entities are highlighted after step 3, Agnes may manually create rules (S-04) or verify that no masking is needed and proceed with the original text.
Exception 1 — Page Refreshed: All session data and unsaved rules are lost. If rules were not exported (S-03), the masking cannot be reversed.
Exception 2 — No Internet Connection: The page cannot load and the tool is unavailable.
Related Scenarios: S-01, S-04, S-05, S-06

UC-02 — Reverse Replacement

Use Case ID: UC-02
Use Case Name: Restore original values from masked AI output
Actor: Agnes (HR Specialist)
Preconditions: UC-01 was completed and the masking rules file was exported (S-03). Agnes has received the AI response containing masked tokens such as PERSON_1, IBAN_1. Lastik is open in the browser.

Main Flow:
1. Agnes switches Lastik to reverse replacement mode.
2. Agnes copies the AI response text from the source.
3. Agnes pastes the text into the editor panel.
4. Agnes loads the previously exported rules file.
5. Lastik matches tokens to their original values and highlights them.
6. Agnes reviews the restored text.
7. Agnes clicks Copy original text (S-02).
8. Agnes uses the restored response in her workflow.

Postconditions: Agnes has the AI analysis with real data restored. The full de-identification cycle is closed: mask → AI → restore.

Alternative Flow A — Edited Rules File: Before step 4, Agnes opens the rules file and edits the replacement values. At step 5, Lastik performs substitutions according to the modified rules.
Exception 1 — Page Refreshed: All session data is lost. Agnes must reload the tool and the rules file before proceeding.
Exception 2 — No Internet Connection: The page cannot load and the tool is unavailable.
Exception 3 — Corrupted or Invalid File: The rules file cannot be parsed due to incorrect format or manual editing errors. Agnes must re-export the rules file from UC-01.
Exception 4 — Token Not Found: The AI paraphrased or omitted a token. The corresponding rule is displayed but no substitution is performed for that value.
Related Scenarios: S-02, S-03, S-06
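
Reverse replacement, including the token-not-found case from Exception 4, can be sketched as follows. This is an illustrative outline assuming a token-to-value rules mapping, not Lastik's actual code.

```typescript
// Sketch: swap tokens in the AI response back to their original values,
// reporting tokens the AI paraphrased away (Exception 4).

function unmask(
  maskedText: string,
  rules: Record<string, string>,
): { restored: string; unusedTokens: string[] } {
  let restored = maskedText;
  const unusedTokens: string[] = [];
  // Longest tokens first, so PERSON_10 is never clobbered by PERSON_1.
  const entries = Object.entries(rules).sort((a, b) => b[0].length - a[0].length);
  for (const [token, original] of entries) {
    if (restored.includes(token)) {
      restored = restored.split(token).join(original);
    } else {
      unusedTokens.push(token); // rule shown to the user, no substitution made
    }
  }
  return { restored, unusedTokens };
}
```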

Acceptance Criterion (cross-cutting, both UC-01 and UC-02)

Chrome DevTools Network tab must show zero outbound requests during all stages of processing — masking, rule generation, and substitution. This is a direct consequence of the client-side architecture and the GDPR zero-transmission constraint defined in Context.
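
One way to make this criterion checkable in code is an in-page tripwire that wraps fetch and records any outbound call made while processing runs. This is an illustrative sketch, not part of Lastik — the DevTools Network tab remains the stated verification method, and a complete check would also cover XMLHttpRequest, sendBeacon, and WebSocket.

```typescript
// Illustrative tripwire for the zero-transmission criterion:
// wrap fetch so any outbound request during processing is recorded.

function createNetworkTripwire() {
  const requests: string[] = [];
  const originalFetch = globalThis.fetch;
  globalThis.fetch = ((input: any, init?: any) => {
    requests.push(String(input)); // record the URL of every outbound call
    return originalFetch(input, init);
  }) as typeof fetch;
  return {
    requests, // must stay empty across masking, rule generation, substitution
    restore: () => { globalThis.fetch = originalFetch; },
  };
}
```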

Result

Who It’s Built For

1B+
AI assistant users globally (DataStudios, 2025)
40–50M
HR & recruiting professionals worldwide (BLS, 2024)
~20M
lawyers globally processing PII documents daily (SpotDraft, 2025)

These professionals work with PII-heavy documents every day and increasingly rely on AI assistants to process them — but have no simple, safe way to do it. Lastik was built for them: no Python, no API keys, no enterprise contract.

What Was Delivered

  • Live product — no account, no installation, works instantly in any browser
  • Full de-identification cycle — mask → AI interaction → export rules → reverse replacement
  • 6 PII types detected — names, dates of birth, phone numbers, emails, IBANs, credit cards
  • Zero data transmission — all processing runs locally on the device, never leaves it
  • GDPR Article 4(5) pseudonymisation — by design, not as an afterthought
  • Reversibility and audit trail — capabilities Word’s Find & Replace cannot provide
  • Open source on GitHub