What personal data does an AI chatbot actually collect from your prospects?
An AI chatbot deployed on a UK higher education institution's website starts processing personal data from the very first interaction — before a prospect types their name. IP address, session timestamp, typed messages, pages visited: each of these is personal data under UK GDPR. Understanding exactly what your chatbot collects is the foundation of lawful deployment.
72% of questions put to school chatbots are automatable FAQ queries, 21% require institutional context, and 7% require a human agent. (Source: Automated classification across 12,000 Skolbot conversations, 2025.) Each of those conversations is a data processing event. And the efficiency case for chatbots is compelling: an AI chatbot responds in 3 seconds, 24/7, against an average of 47 hours by email and 72 hours via a contact form (Source: Skolbot mystery shopping audit, 2025, 80 UK institutions). Capturing that advantage requires a lawful data collection framework.
A typical AI chatbot on a UK school or business school website collects the following categories of personal data, sketched as a data model after the list:
- Conversation data — message text, timestamps, language used, session duration
- Voluntarily provided identifiers — first name, surname, email address, phone number (when the prospect supplies them to request follow-up)
- Interest data — programme(s) enquired about, target level of study, mode of study (full-time, part-time, degree apprenticeship)
- Demographic data — nationality, age, country of residence (if collected via an embedded form)
- Technical data — IP address, device type, browser, session identifier
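As a rough illustration, those categories map onto a single session record. A minimal sketch in TypeScript, assuming hypothetical field names rather than a real Skolbot schema; note that every field here, including the technical ones, is personal data under UK GDPR:

```typescript
// Schematic data model for one chatbot session, mirroring the five
// categories above. Field names are illustrative, not a real schema.
interface ChatbotSession {
  // Conversation data
  messages: { text: string; sentAt: string }[];
  language: string;
  durationSeconds: number;
  // Voluntarily provided identifiers (all optional by design)
  firstName?: string;
  surname?: string;
  email?: string;
  phone?: string;
  // Interest data
  programmes: string[];
  studyLevel?: string;
  studyMode?: "full-time" | "part-time" | "degree-apprenticeship";
  // Demographic data (only via an embedded form, where justified)
  nationality?: string;
  countryOfResidence?: string;
  // Technical data
  ipAddress: string;
  deviceType: string;
  browser: string;
  sessionId: string;
}
```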
The ICO's guidance on AI and data protection is clear that AI systems which process personal data must comply with UK GDPR in full. There is no exemption for chatbots. For the broader legal framework governing your institution's data activities, see our complete GDPR guide for schools.
Lawful bases under UK GDPR for each data category
UK GDPR Article 6 provides six lawful bases for processing personal data. Four are directly relevant to school chatbot deployments. The ICO's guidance on lawfulness in AI confirms that the lawful basis must be identified and documented before processing begins — not retrospectively.
Consent (Article 6(1)(a)) — The individual has given a freely given, specific, informed, and unambiguous indication of agreement. This is the appropriate basis for marketing follow-up communications (nurture emails, open day invitations, newsletters) triggered by a chatbot interaction. Consent must be obtained via an active mechanism (an unticked opt-in box), timestamped, and stored as evidence. Withdrawal must be as easy as giving it.
Performance of pre-contractual measures (Article 6(1)(b)) — Processing is necessary to take steps at the request of the individual before entering a contract. A prospect asking for programme fees and entry requirements — and providing an email address to receive that information — falls within this basis. The processing of their contact details to deliver the requested documentation is covered.
Legitimate interests (Article 6(1)(f)) — Processing is necessary for the legitimate interests of the controller or a third party, provided those interests are not overridden by the individual's rights. Analysing anonymised conversation logs to improve chatbot quality, or retaining minimal interaction data to handle follow-up queries, can fall under this basis — provided a documented legitimate interests assessment (LIA) has been completed. The ICO's AI guidance emphasises that legitimate interests requires genuine balancing, not a rubber stamp.
Legal obligation (Article 6(1)(c)) — Rarely applicable to the chatbot itself, but relevant if data collected must be retained for regulatory traceability (for example, evidence that pre-contractual disclosures were made to UCAS applicants).
For special category data — health information, data implying ethnic origin — Article 9 requires an additional, more demanding legal basis. See the section below.
The data minimisation principle: only collect what you need
Data minimisation (Article 5(1)(c) UK GDPR) is the principle most frequently violated in chatbot deployments at UK institutions. It requires that data collected is adequate, relevant, and limited to what is necessary in relation to the purposes for which it is processed.
In practice, for your institution's chatbot:
- The chatbot must not require an email address to answer a question about programmes. Email becomes necessary only when the prospect explicitly requests follow-up communication or a document.
- Nationality must not be collected as a default field. It becomes relevant only when the prospect is seeking information about international student fees, visa requirements, or UCAS pathways specific to their country of origin.
- Exact date of birth is not necessary if the chatbot only needs to determine whether the prospect is a sixth-form student or a graduate. A study level or approximate age band is sufficient.
- Full conversation logs must not be retained indefinitely. The retention purpose — service improvement, CRM transfer, compliance audit — must be defined before the chatbot goes live, and retention periods must be documented in your record of processing activities.
The ICO has stated repeatedly that privacy by design is not optional for AI systems: data minimisation must be built into the system architecture, not bolted on after deployment.
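What "built into the system architecture" can look like in practice: a minimal sketch, assuming a hypothetical collection layer in which every field is default-deny and only a documented trigger unlocks a request for it. All names below are illustrative:

```typescript
// Default-deny collection layer: a field is only ever requested once a
// documented trigger is satisfied. Names are illustrative, not a real API.
interface SessionContext {
  requestedFollowUp: boolean;   // prospect asked for follow-up communication
  requestedDocument: boolean;   // prospect asked for a prospectus or similar
  requestedCallback: boolean;   // prospect asked to be phoned
  topic: string;                // current conversation topic
}

type Field = "email" | "phone" | "nationality" | "ageBand";

const collectionTriggers: Record<Field, (ctx: SessionContext) => boolean> = {
  email: (ctx) => ctx.requestedFollowUp || ctx.requestedDocument,
  phone: (ctx) => ctx.requestedCallback,
  nationality: (ctx) =>
    ctx.topic === "international-fees" || ctx.topic === "visa-requirements",
  ageBand: (ctx) => ctx.topic === "programme-routing",
};

function mayRequest(field: Field, ctx: SessionContext): boolean {
  // No satisfied trigger means the field is never requested.
  return collectionTriggers[field](ctx);
}
```

A side benefit of this design is that the trigger table doubles as documentation: it states, in one place, why each field is ever requested.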
Special category data: what to watch for in chatbot conversations
Some data that flows through a school chatbot interface falls within UK GDPR's special category provisions (Article 9). Processing special category data is prohibited by default unless a specific exception applies and an Article 9 condition is met alongside the Article 6 lawful basis.
Nationality and ethnic origin — Nationality is not itself a special category, but it can reveal or imply racial or ethnic origin, which is. A chatbot that collects nationality to segment prospects (for example, distinguishing "home" from "international" students for fee-quoting purposes) must assess whether that processing risks inferring ethnic origin. Where the risk exists, explicit consent (Article 9(2)(a)) is required.
Health and disability data — Prospective students routinely seek information about Disabled Students' Allowance (DSA), exam adjustments, mental health support, or accessibility provisions. When they do, they may disclose health data. The chatbot must be configured to redirect these conversations to a designated adviser or a secure form, rather than logging the health-related content in standard conversation records. If any storage is technically unavoidable, the relevant field must be masked before archiving.
Financial and social circumstances — Questions about bursaries, hardship funds, or fee waivers may reveal a prospect's socio-economic situation. This is not a special category under Article 9, but it is confidential data requiring a robust lawful basis and appropriate security measures.
Practical rule: configure your chatbot to detect sensitive topics and trigger a redirection response rather than collecting the information within the standard conversational interface.
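A minimal sketch of that practical rule, assuming a hypothetical message-handling hook; the keyword list is illustrative and would need tuning against real conversations:

```typescript
// Sensitive-topic routing: redirect instead of answering, and suppress
// logging of the message body. Keywords and handler names are illustrative.
const SENSITIVE_PATTERNS: RegExp[] = [
  /\b(disability|disabilities|dsa|exam adjustments?|accessibility)\b/i,
  /\b(mental health|counselling|anxiety|depression)\b/i,
  /\b(bursary|bursaries|hardship fund|fee waiver)\b/i,
];

function isSensitive(message: string): boolean {
  return SENSITIVE_PATTERNS.some((pattern) => pattern.test(message));
}

// Placeholder for the normal FAQ answering path.
function answerFromFaq(message: string): string {
  return `FAQ answer for: ${message}`;
}

function handleMessage(message: string): { reply: string; log: boolean } {
  if (isSensitive(message)) {
    return {
      reply:
        "For the best support on this, our student support team would like " +
        "to speak with you directly. Would you like to leave your contact details?",
      log: false, // the message body never enters standard conversation records
    };
  }
  return { reply: answerFromFaq(message), log: true };
}
```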
Data table: types, lawful bases, retention periods
| Data type | Lawful basis | Retention period | Notes |
|---|---|---|---|
| Conversation logs (anonymised) | Legitimate interests (Art. 6(1)(f)) | <12 months | Service improvement — anonymisation required |
| Email + name (qualified prospect) | Pre-contractual measures (Art. 6(1)(b)) or consent | 12–24 months from last active engagement | Automated purging required; align with UCAS cycle |
| Phone number | Consent (Art. 6(1)(a)) | 12–24 months from last active engagement | Specific consent required for telephone marketing |
| Programme(s) of interest | Legitimate interests (Art. 6(1)(f)) | Linked to prospect profile | Document in record of processing activities |
| Nationality | Consent or pre-contractual measures | Linked to prospect profile | Assess ethnic origin inference risk before collecting |
| Age / level of study | Legitimate interests (Art. 6(1)(f)) | Linked to prospect profile | Necessary for routing to correct programme |
| IP address (non-anonymised) | Legitimate interests (Art. 6(1)(f)) | <13 months | Anonymise where possible; document LIA |
| Health or disability data | Explicit consent (Art. 9(2)(a)) | Strictly necessary | Do not collect in standard chatbot interface |
| Technical session logs | Legitimate interests (Art. 6(1)(f)) | <3 months | Security and debugging purposes only |
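The retention column above only works if something enforces it. A minimal sketch of a scheduled purge job, assuming a hypothetical store interface; the periods mirror the table:

```typescript
// Automated retention enforcement, intended to run as a scheduled job
// (e.g. nightly) rather than an annual manual review. Store methods are
// hypothetical; periods mirror the retention table.
const RETENTION_DAYS = {
  anonymisedLogs: 365,   // <12 months
  prospectProfile: 730,  // up to 24 months from last engagement
  technicalLogs: 90,     // <3 months
  rawIpAddresses: 395,   // <13 months
} as const;

function cutoff(days: number): Date {
  return new Date(Date.now() - days * 24 * 60 * 60 * 1000);
}

async function purgeExpired(store: {
  deleteAnonymisedLogsBefore(d: Date): Promise<number>;
  deleteProfilesInactiveSince(d: Date): Promise<number>;
  deleteTechnicalLogsBefore(d: Date): Promise<number>;
  deleteRawIpsBefore(d: Date): Promise<number>;
}): Promise<void> {
  await store.deleteAnonymisedLogsBefore(cutoff(RETENTION_DAYS.anonymisedLogs));
  await store.deleteProfilesInactiveSince(cutoff(RETENTION_DAYS.prospectProfile));
  await store.deleteTechnicalLogsBefore(cutoff(RETENTION_DAYS.technicalLogs));
  await store.deleteRawIpsBefore(cutoff(RETENTION_DAYS.rawIpAddresses));
}
```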
When does your chatbot trigger a DPIA obligation?
Article 35 UK GDPR requires a Data Protection Impact Assessment (DPIA) for any processing likely to result in a high risk to individuals' rights and freedoms. The ICO maintains a list of processing types that automatically require a DPIA; several apply directly to school chatbot deployments.
Your chatbot likely requires a DPIA if one or more of the following apply:
- Large-scale processing — a chatbot handling thousands of conversations per month on a busy institution website readily meets the ICO's large-scale threshold
- Profiling — if the chatbot scores or classifies prospects (warm/cold, programme affinity, likelihood to apply) based on their interactions, that is profiling under Article 4(4)
- Special category data — processing nationality with ethnic origin risk, health data, or socio-economic data (even if not Article 9 category)
- Data transfers outside the UK — if your chatbot provider uses servers or language model infrastructure hosted in the US, EU, or elsewhere outside the UK, standard safeguards (International Data Transfer Agreements, adequacy decisions) must be verified
The DPIA must document: a description of the processing and its purposes; an assessment of necessity and proportionality; identified risks; and the mitigation measures implemented. If the DPIA concludes that a high residual risk cannot be mitigated, the ICO must be consulted before processing begins.
For the full list of technical and organisational measures your institution should have in place, our GDPR audit checklist for schools covers all 20 points, including the chatbot DPIA obligation.
Implementing compliant data collection: the chatbot interface
GDPR compliance for a chatbot operates at three levels: transparency before first interaction, lawful consent mechanisms, and data subject rights.
Transparency: the opening message
Before processing any conversation data, your chatbot must display an opening message that includes:
- The identity of the data controller (your institution's legal name)
- The purpose of the processing (answering enquiries, routing prospectus requests)
- A link to the full privacy notice
- A clear statement that the user is interacting with an AI system, not a human — required under the EU AI Act (Article 50) where the chatbot reaches EU users, and consistent with ICO expectations for transparent AI deployment
A compliant example: "Hi, I'm [Bot Name], [Institution]'s AI assistant. I can answer questions about our programmes and admissions. Your messages are processed in line with our [privacy policy]. To exercise your rights, contact dpo@[institution].ac.uk."
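As a sketch, the four elements can be driven from a single configuration object, so the disclosure cannot drift out of sync with the privacy notice. Property and function names below are hypothetical, not a documented widget option set:

```typescript
// Opening-message configuration covering the four transparency elements.
interface OpeningConfig {
  controller: string;        // data controller's legal name
  purpose: string;           // plain-language processing purpose
  privacyNoticeUrl: string;  // link to the full privacy notice
  botName: string;           // name that signals an AI system
}

function renderOpening(cfg: OpeningConfig): string {
  return (
    `Hi, I'm ${cfg.botName}, ${cfg.controller}'s AI assistant. ` +
    `I help with ${cfg.purpose}. Your messages are processed in line with ` +
    `our privacy policy: ${cfg.privacyNoticeUrl}`
  );
}
```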
Consent mechanism for contact data
When the chatbot collects an email address or phone number, an active consent mechanism must precede that collection (a record sketch in code follows the list):
- An unticked opt-in checkbox per marketing purpose (nurture emails, open day invitations, prospectus mailing)
- Separate consent for each distinct purpose — bundled consent is not valid under UK GDPR
- Timestamp and stored record of consent as proof of compliance
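A minimal sketch of what such a consent record might look like, assuming a hypothetical persistence layer; the point is granularity and evidence, one timestamped record per purpose:

```typescript
// Per-purpose consent record: one record per purpose, never bundled,
// stored with a timestamp as proof of compliance. Names are illustrative.
type MarketingPurpose =
  | "nurture-emails"
  | "open-day-invitations"
  | "prospectus-mailing";

interface ConsentRecord {
  sessionId: string;
  purpose: MarketingPurpose; // separate record per distinct purpose
  granted: boolean;          // checkbox starts unticked; only explicit ticks recorded
  grantedAt: string;         // ISO 8601 timestamp, retained as evidence
  noticeVersion: string;     // which privacy notice the user saw
}

function recordConsent(
  sessionId: string,
  purpose: MarketingPurpose,
  noticeVersion: string
): ConsentRecord {
  return {
    sessionId,
    purpose,
    granted: true,
    grantedAt: new Date().toISOString(),
    noticeVersion,
  };
}
```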
Data subject rights: access, rectification, erasure
The Data (Use and Access) Act 2025, now law in the UK, reinforces data subject rights alongside UK GDPR. Prospects must be able to exercise their rights via a clearly signposted mechanism. Standard practice: display the DPO's contact email in the chatbot interface and in the privacy notice. Erasure requests must be fulfilled within one calendar month and must cover all systems: chatbot logs, CRM, email platform, anonymised analytics where re-identification is possible.
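One way to make "all systems" operational is to fan a single erasure request out to every connected store and refuse to report success unless all of them confirm. A minimal sketch with hypothetical interfaces:

```typescript
// Erasure propagation: one request fans out to chatbot logs, CRM, email
// platform, and analytics. Interfaces are illustrative.
interface ErasableStore {
  name: string;
  eraseSubject(email: string): Promise<void>;
}

async function handleErasureRequest(
  email: string,
  stores: ErasableStore[]
): Promise<void> {
  const results = await Promise.allSettled(
    stores.map((s) => s.eraseSubject(email))
  );
  // A failed erasure in any one system leaves the request unfulfilled:
  // surface it rather than silently reporting success.
  const failed = results
    .map((r, i) => (r.status === "rejected" ? stores[i].name : null))
    .filter((n): n is string => n !== null);
  if (failed.length > 0) {
    throw new Error(`Erasure incomplete in: ${failed.join(", ")}`);
  }
}
```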
For the cookie compliance obligations that attach to the chatbot widget itself, see our guide on cookie consent and GDPR compliance for UK schools.
Test Skolbot on your institution in 30 seconds
FAQ
Does our chatbot need a DPIA if it only answers FAQ questions?
Not automatically — but it cannot be ruled out without assessment. The DPIA obligation is triggered by high risk to individuals, not by the complexity or sensitivity of the chatbot's outputs. A chatbot that handles thousands of conversations per month, records IP addresses, and passes prospect data to a CRM may meet the large-scale and profiling thresholds even if the conversations themselves are limited to FAQ topics. The practical step: run through the ICO's DPIA screening criteria. If two or more criteria are met, a DPIA is required.
How long can we retain chatbot conversation logs?
The UK GDPR storage limitation principle (Article 5(1)(e)) requires that data is retained no longer than necessary for the stated purpose. For service improvement using anonymised logs: up to 12 months is a defensible standard. For qualified prospect profiles (email address provided): 12 to 24 months from the date of last meaningful engagement, aligned with the UCAS application cycle. Technical session logs (debugging, security) should not be retained beyond 3 months. These periods must be documented in your record of processing activities and enforced by automated deletion — not by an annual manual review that rarely happens in practice.
Can the chatbot pass data to the CRM without collecting fresh consent?
It depends on the original lawful basis. If the chatbot collected contact data under pre-contractual measures or legitimate interests for the purpose of processing an enquiry, transferring that data to the CRM for admissions follow-up is generally compatible with the original purpose. However, if you intend to use the CRM data for marketing campaigns unrelated to the original enquiry — retargeting, promotional emails, partner communications — a specific consent must have been collected at the point of chatbot interaction before that transfer takes place. Document the CRM transfer purpose explicitly in your record of processing activities.
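A small sketch of that gate, with illustrative field names: the CRM sync derives the permitted uses from the lawful basis and the consents captured at the chatbot, rather than assuming marketing is allowed by default:

```typescript
// Purpose-compatibility gate for the CRM sync. Field names are illustrative.
interface ProspectRecord {
  email: string;
  lawfulBasis: "pre-contractual" | "legitimate-interests" | "consent";
  marketingConsents: string[]; // purposes consented to at point of collection
}

function allowedCrmUses(p: ProspectRecord): string[] {
  // Admissions follow-up is compatible with the original enquiry purpose.
  const uses = ["admissions-follow-up"];
  // Unrelated marketing uses require a consent captured at the chatbot.
  return uses.concat(p.marketingConsents);
}
```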
How do we meet the AI transparency requirement in the chatbot interface?
Under the EU AI Act (Article 50, applicable from August 2026 for limited-risk AI systems, and relevant to UK institutions whose chatbots reach EU users) and ICO expectations for transparent AI, users must be told they are interacting with an AI system when this is not obvious. For a text-based chatbot on a school website, a clear bot name (e.g. "Skolbot — AI Assistant"), a disclosure in the opening message, and a visual indicator (bot avatar, distinct colour) collectively satisfy the obligation. This AI transparency requirement overlaps with UK GDPR transparency obligations: a single well-drafted opening message can address both simultaneously, covering the controller identity, the processing purpose, the AI disclosure, and the privacy notice link.
What happens if a prospect discloses sensitive information — health, disability, financial hardship — in the chatbot?
Configure the chatbot to detect sensitive topic keywords (disability support, mental health, bursary eligibility, DSA) and respond with a redirect: "For the best support on this, our student support team would like to speak with you directly. Would you like to leave your contact details?" Do not log the sensitive content in standard conversation records. If full session logging is technically unavoidable, implement field masking or redaction before the log is archived. The objective is to prevent health, disability, or socio-economic data from entering a marketing CRM without an adequate legal basis — a compliance exposure the ICO has specifically flagged in its AI guidance for the education sector.
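A minimal sketch of the masking step, assuming archiving happens through a single function the redaction can sit in front of; the patterns are illustrative:

```typescript
// Field masking before a log is archived, for cases where full session
// logging cannot be switched off. Patterns are illustrative.
const REDACT_PATTERNS: RegExp[] = [
  /[^.]*\b(dsa|disabled students'? allowance)\b[^.]*\./gi,
  /[^.]*\b(mental health|anxiety|depression)\b[^.]*\./gi,
];

function redactForArchive(transcript: string): string {
  // Replace each sentence containing a sensitive term with a fixed marker;
  // the original text never reaches the archive or any CRM export.
  return REDACT_PATTERNS.reduce(
    (text, pattern) => text.replace(pattern, "[REDACTED: sensitive topic]"),
    transcript
  );
}
```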