
Your Jira Backlog Is Now Training Data, Unless You Said Otherwise

Source: hackernews

Atlassian flipped a default that a lot of teams missed: data collection for AI training is now on unless an administrator explicitly turns it off. This is the kind of change that gets buried in a policy update email, generates 451 upvotes on Hacker News, and then quietly affects millions of Confluence pages and Jira tickets while the affected organizations are still debating whether to act.

The mechanism itself is not complicated. Atlassian Intelligence, their AI product suite covering Rovo, smart summaries in Confluence, and the various AI-assisted features across Jira and Bitbucket, requires training data. Their products sit on top of enormous amounts of enterprise knowledge. Flipping the default from opt-in to opt-out is the fastest way to dramatically expand the training corpus without having to negotiate with each customer individually.

What Is Actually in That Data

The reason this matters more for Atlassian than for, say, a consumer product is the specific nature of what lives in Jira and Confluence. These are not general-purpose note-taking apps. They are where organizations store:

  • Unpatched security vulnerabilities tracked before a fix ships
  • Internal post-mortems documenting what broke and why
  • Roadmap discussions that amount to unannounced product strategy
  • HR and performance-related tickets routed through project management workflows
  • Architectural decision records for systems that have not been publicly described
  • Acquisition due diligence notes, legal discussions, and compliance findings

Confluence in particular functions as the institutional memory for many companies. Pages accrete over years. They contain frank internal analysis that would never be written if the author thought it would leave the company’s systems. When Atlassian collects this content as training data, even in aggregate and nominally anonymized form, the sensitivity profile is substantially different from a company harvesting the text of public GitHub READMEs.

Anonymization is also not as clean as the term implies. Research on re-identification attacks has repeatedly shown that sufficiently specific text, even without explicit identifiers, can be traced back to its source. A detailed Confluence page describing a specific system architecture at a specific company is not generic content; it is identifiable even after stripping usernames and org names.
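A minimal sketch of why stripping names falls short, in Python. The page text, the company and user names, and the list of quasi-identifiers are all invented for illustration, and nothing here describes how Atlassian actually processes data.

```python
import re


def naive_redact(text: str, identifiers: list[str]) -> str:
    """Replace each explicit identifier with a placeholder, ignoring case."""
    for ident in identifiers:
        text = re.sub(re.escape(ident), "[REDACTED]", text, flags=re.IGNORECASE)
    return text


# Hypothetical Confluence-style page content.
page = (
    "Acme Corp billing service: alice.ng migrated the ledger from a "
    "14-node Cassandra ring to CockroachDB after the March outage."
)

redacted = naive_redact(page, ["Acme Corp", "alice.ng"])

# Hypothetical quasi-identifiers: details that are not names but that,
# taken together, could point to one specific organization.
quasi_identifiers = ["14-node Cassandra ring", "CockroachDB", "March outage"]
surviving = [q for q in quasi_identifiers if q in redacted]

print(redacted)
print("Still present after redaction:", surviving)
```

The names are gone, but every architectural detail survives. Anyone who knows which company ran that migration can still tie the text back to it.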

The Default-On Pattern in Enterprise SaaS

Atlassian is not doing anything novel here. Slack drew significant criticism in 2024 when it emerged that its global training setting was opt-out, and its privacy language had to be clarified after customer backlash. Zoom faced a similar situation when its terms appeared to allow training on meeting content without explicit consent. Adobe Firefly’s training terms triggered a wave of concern among creative professionals worried about proprietary artwork entering model training.

The pattern is consistent: a company needs data, it already holds user data, it updates its terms to permit using that data, and it enrolls users by default. The opt-out mechanism exists and is technically accessible, which gives legal cover. Whether most administrators will find it, understand it, and act on it is a different question.

For enterprise products, this dynamic is particularly sharp because the person who agreed to Atlassian’s terms of service is usually not the person whose work is being collected. An IT administrator signed the contract. The engineers, product managers, and executives whose words fill those Confluence pages had no direct say.

What “Training” Means in Practice

It is worth being specific about what AI training on your data actually means, because the phrase covers a wide range of technical realities with very different implications.

At one end, Atlassian could be using customer data to train a shared global model. If that model encodes information from your internal documents, there is at least a theoretical path by which a prompt from a different customer could elicit something derived from your data. This is the scenario that generates the most alarm, and it is also the one that is hardest to fully rule out with current interpretability tools.

At the other end, they could be using aggregated signals, things like which AI suggestions users accepted or rejected, to improve suggestion quality without exposing raw document content. This is far less sensitive, even though it technically counts as “using data to train AI.”

Atlassian’s documentation on Atlassian Intelligence describes using data to improve their models, but it does not publicly detail what architectural isolation exists between customer tenants in the training pipeline, at least not in enough depth for a security team to make a confident risk assessment. That opacity is itself a problem for enterprise buyers who have data governance obligations.

The GDPR Dimension

For European organizations, the opt-out default creates a compliance tension that goes beyond preference. Under GDPR, training AI models on personal data requires a lawful basis. Atlassian’s position, shared by most large SaaS vendors, is that their terms of service and privacy policy establish legitimate interest or contractual necessity as that basis. Data protection regulators across the EU have not reached a unified position on whether that framing holds for AI training specifically.

Atlassian does offer data residency controls, allowing customers to pin certain data to specific regions. Whether that residency guarantee extends to the AI training pipeline, or whether data is moved to a training environment that operates under different geographic constraints, is the kind of detail that matters for GDPR compliance and is not always clearly addressed in vendor documentation.

Organizations that have signed a Data Processing Agreement with Atlassian should review whether it specifically addresses AI training data use. Many DPAs signed before 2022 predate the current wave of AI feature development and may not account for this use case.

What Organizations Should Actually Do

If you run Atlassian products in an environment with sensitive data, the immediate action is to find the opt-out setting. In Atlassian’s admin console, under Privacy and Security settings, there is a toggle controlling data use for AI model training. Turning it off is not complicated once you know it is there. The problem is that the path to finding it is not surfaced prominently.

Beyond the immediate toggle, this is a reasonable occasion to audit what your organization’s Atlassian instance actually contains. Many Confluence spaces accumulate years of content that was never intended to be permanent institutional knowledge. Old pages with sensitive details that were relevant during a specific project but serve no ongoing purpose are a liability, both for AI training and for general data hygiene.
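One way to start such an audit is a simple filter over an export of page records. The Python sketch below assumes you already have those records in hand. The Page fields, the keyword list, and the two-year threshold are hypothetical choices, not part of any Atlassian API.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Page:
    # Hypothetical shape of an exported page record.
    title: str
    space: str
    last_modified: date
    body: str


# Illustrative keywords only; a real audit would tune these to the organization.
SENSITIVE_TERMS = ("password", "vulnerability", "post-mortem", "due diligence")


def flag_for_review(pages, today, max_age_days=730):
    """Return (title, matched_terms) for pages that are both stale and sensitive-looking."""
    cutoff = today - timedelta(days=max_age_days)
    flagged = []
    for page in pages:
        text = page.body.lower()
        hits = [term for term in SENSITIVE_TERMS if term in text]
        if page.last_modified < cutoff and hits:
            flagged.append((page.title, hits))
    return flagged


pages = [
    Page("Q3 incident post-mortem", "ENG", date(2020, 5, 1),
         "Post-mortem: root cause was an expired cert."),
    Page("Team lunch schedule", "OPS", date(2019, 1, 10), "Pizza on Fridays."),
    Page("Open vulnerability tracker", "SEC", date(2025, 1, 15),
         "Unpatched vulnerability in auth service."),
]

flagged = flag_for_review(pages, today=date(2025, 6, 1))
for title, hits in flagged:
    print(f"Review: {title} (matched: {', '.join(hits)})")
```

Keyword matching is crude and will miss plenty, but it gives a review queue to start from. Whether a flagged page is archived, restricted, or deleted is still a human decision.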

For teams evaluating Atlassian products or negotiating renewals, it is now reasonable to ask specifically about AI training data controls during procurement. Enterprise vendors have become accustomed to answering SOC 2 and ISO 27001 questions; AI training data governance should be added to that checklist.

The Broader Shift

What Atlassian did is not an isolated decision. It reflects a structural pressure on every major SaaS company: AI features are now a competitive requirement, quality AI features require training data, and the most accessible training data is the data users have already given you. The incentives all point in the same direction, which is why the opt-out default pattern will keep appearing.

The organizations best positioned to handle this are the ones that treat data governance as an ongoing operational concern rather than a procurement checkbox. Knowing what data you have, where it lives, and what your vendors are permitted to do with it is foundational, and the current wave of AI training controversies is making the cost of neglecting that foundation visible in a way it was not a few years ago.

The opt-out exists. Use it if you need it. But the more durable response is building the internal process to catch this kind of default change before it spends several months collecting data while your team is looking elsewhere.
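As a rough illustration of what such a process could look like, the Python sketch below compares two snapshots of vendor settings and reports what changed. The setting names are made up, and how the snapshots get captured (an admin export, a manual checklist) is left open.

```python
def changed_settings(previous: dict, current: dict) -> dict:
    """Map each setting that was added, removed, or changed to its (old, new) pair."""
    changes = {}
    for key in sorted(previous.keys() | current.keys()):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical snapshots recorded a week apart.
last_week = {"ai_training_data_use": "off", "data_residency": "EU"}
this_week = {
    "ai_training_data_use": "on",
    "data_residency": "EU",
    "ai_feature_previews": "on",
}

drift = changed_settings(last_week, this_week)
for name, (old, new) in drift.items():
    print(f"{name}: {old!r} -> {new!r}")
```

A setting that appears for the first time shows up with an old value of None. That is worth flagging on its own, since a new default is exactly the kind of change this article describes.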
