
Thirty Billion Images and One Line in the Terms of Service

Source: hackernews

When Popular Science reported that Pokemon Go players had collectively contributed 30 billion images to train delivery robots, most of the commentary landed on the consent angle: players didn’t know, the terms of service were vague, and a game company quietly sold vision data to an autonomous vehicle company. That framing isn’t wrong, but it skips past the more interesting question: why was this data so technically valuable in the first place, and why would a delivery robot company need 30 billion crowd-sourced street-level photos when they could have just driven a car around with a camera rig?

The answer is in the localization problem, and it explains why Niantic’s dataset was worth building a business around.

The Problem Delivery Robots Have to Solve

A sidewalk delivery robot from a company like Serve Robotics or Starship Technologies has a navigation problem that is distinct from the one a self-driving car faces. Cars operate on roads that are mapped exhaustively, with lane geometry encoded in HD map formats like OpenDRIVE. GPS accuracy on a road is usually sufficient to know which lane you’re in, and the sensor suite (lidar, radar, cameras) handles dynamic obstacles.

Sidewalks are harder. The built environment changes constantly: new scaffolding, seasonal plantings, outdoor seating that appears in summer and disappears in fall, temporary signage, construction barriers. A map built in January may be substantially wrong by March. And GPS accuracy in urban canyons, surrounded by tall buildings that bounce satellite signals, degrades to 5-10 meters, an error larger than the entire width of most sidewalks.

The approach that scales is visual localization: the robot builds a geometric model of what it expects the world to look like from a given position, compares that to what its cameras actually see, and uses the comparison to estimate its precise pose. This is sometimes called Visual Place Recognition (VPR) or, in Niantic’s specific implementation, a Visual Positioning System (VPS).

The technical pipeline looks roughly like this:

Multiple 2D images of the same location (from multiple viewpoints and times)
  → Feature extraction: SIFT, ORB, or learned descriptors like SuperPoint
  → Feature matching across overlapping image pairs
  → Pose estimation via PnP solvers
  → Bundle adjustment to minimize reprojection error
  → Sparse point cloud with associated 3D feature locations

The output is a compressed geometric representation of the scene that can be stored on a server. When the robot encounters a location, it extracts features from its live camera feed, matches them against the stored map, and estimates its pose from those correspondences. Good implementations achieve centimeter-level accuracy, which is what a robot navigating next to a curb actually needs.
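To make the last stages of that pipeline concrete, here is a minimal numpy sketch of the quantity that PnP solvers and bundle adjustment both minimize: the reprojection error between where the stored map predicts features should appear in the image and where the live camera actually sees them. The intrinsics, points, and poses below are invented toy values for illustration, not anything from a real VPS.

```python
import numpy as np

def project(points_3d, R, t, K):
    """Pinhole projection of 3D world points into 2D pixel coordinates."""
    cam = (R @ points_3d.T).T + t       # world frame -> camera frame
    uvw = (K @ cam.T).T                 # apply camera intrinsics
    return uvw[:, :2] / uvw[:, 2:3]     # perspective divide

def reprojection_error(points_3d, observed_2d, R, t, K):
    """Mean pixel distance between predicted and observed feature
    locations: the objective PnP and bundle adjustment minimize."""
    predicted = project(points_3d, R, t, K)
    return np.linalg.norm(predicted - observed_2d, axis=1).mean()

# Toy intrinsics and a few 3D feature points from a sparse point cloud.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
points = np.array([[0.0, 0.0, 5.0], [1.0, 0.5, 6.0], [-0.5, 1.0, 4.0]])

# Ground-truth pose: identity rotation, small sideways translation.
R_true, t_true = np.eye(3), np.array([0.1, 0.0, 0.0])
obs = project(points, R_true, t_true, K)  # what the camera "sees"

# The true pose explains the observations; a wrong guess does not.
print(reprojection_error(points, obs, R_true, t_true, K))         # ~0.0
print(reprojection_error(points, obs, np.eye(3), np.zeros(3), K))  # pixels off
```

A real localizer wraps this objective in a RANSAC loop over candidate 2D-3D matches and then refines all poses and points jointly, but the error being driven to zero is the same.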

The critical constraint is coverage. A visual map is only useful if it covers the location where you are trying to localize. Building that coverage with a dedicated sensor vehicle is expensive and slow. Building it with tens of millions of players who are already walking around with cameras in their pockets, photographing the same Pokestop from slightly different angles and in different lighting conditions, is not.

What Niantic Was Actually Collecting

Niantic’s route to this dataset began before Pokemon Go. Ingress, the predecessor game launched in 2012, built its landmark system from player-submitted locations. Pokemon Go inherited and massively expanded this with the “Wayfarer” program, where players could submit real-world locations as Pokestop or Gym candidates by photographing them and describing them. By 2020, Niantic had begun explicitly asking players to scan these locations with their phones, producing short video walkthroughs that captured the location from multiple angles.

These scans were the raw material for Niantic’s Lightship Visual Positioning System, announced as a developer platform in 2021. The pitch to developers was that they could anchor AR content to real-world locations with centimeter precision, using Niantic’s pre-built point cloud database. The underlying infrastructure is a set of georeferenced 3D reconstructions built from player scans.

The number 30 billion images is not outlandish given the scale. Pokemon Go peaked at over 230 million monthly active players in 2016 and has maintained tens of millions of active players since. Even a small fraction of those players contributing periodic scans accumulates quickly. For comparison, the ImageNet dataset that catalyzed the deep learning era of computer vision contains about 14 million labeled images. LAION-5B, one of the largest open image-text datasets used to train diffusion models, contains 5.85 billion pairs. Thirty billion is genuinely large.
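As a back-of-envelope check on that claim (every number below is an assumption chosen for illustration, not a reported figure), even modest participation rates compound quickly once a contribution is a video scan rather than a single photo:

```python
# Illustrative back-of-envelope only; all figures are assumptions.
active_players = 50_000_000          # assume tens of millions active
scanning_fraction = 0.02             # assume 2% ever contribute scans
frames_per_scan = 300                # assume ~10 s of video at 30 fps
scans_per_contributor_per_year = 10  # assume occasional participation
years = 5

images = (active_players * scanning_fraction
          * frames_per_scan * scans_per_contributor_per_year * years)
print(f"{images:.2e}")  # → 1.50e+10
```

Under these assumptions, a 2% participation rate alone yields on the order of ten billion frames in five years, so 30 billion requires only moderately higher engagement than this sketch assumes.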

The geographic distribution matters too. Niantic’s players are not concentrated in the corridors that a fleet operator would naturally survey first. They are in small towns, secondary streets, college campuses, park paths, and residential neighborhoods. That long-tail coverage of places a sensor car would never bother to drive through is precisely what makes the dataset commercially interesting to a delivery company trying to operate at national scale.

Here is what the data flow looked like from a user perspective. A player opened Pokemon Go, walked to a nearby location marked as a Pokestop, and triggered a scan to earn in-game rewards. Niantic’s terms of service, which the player agreed to at account creation, included language broadly licensing any content submitted to Niantic for use in connection with its services and business purposes. The scan uploaded, the player earned their reward, and the interaction was complete.

That ToS language is standard boilerplate. It does not say: the images you contribute may be processed to build a 3D reconstruction of public spaces, which may be licensed to third-party commercial operators of autonomous delivery robots. No player received a notification that their scan had been incorporated into a point cloud database and sublicensed to an automotive company. The terms permitted it; the disclosure did not describe it.

This is structurally different from how earlier crowdsourced data projects handled the same question. reCAPTCHA, when Luis von Ahn introduced it at Carnegie Mellon and later at Google, disclosed the dual-use purpose upfront: you are solving a challenge to prove you’re human, and in doing so you are helping digitize text. The social contract was stated. Users could evaluate whether they found it acceptable.

Waze, when it launched, was explicit that aggregated GPS traces from drivers contributed to the map. OpenStreetMap’s entire value proposition is voluntary contribution to a shared geographic commons, with clear licensing that contributors understand before they add data. These are not perfect models, but they represent a different relationship with contributors than “broad commercial license buried in terms of service.”

The European GDPR’s Article 5 establishes the principle of purpose limitation: personal data collected for one specified purpose cannot be further processed in a manner incompatible with that purpose without fresh consent or another legal basis. Whether visual scans of public locations constitute personal data under GDPR is a genuinely contested question. The scans may incidentally capture faces, license plates, private property details, and biometric-adjacent features of how a person moves through space. The Court of Justice of the EU has tended toward broad interpretations of personal data, and location data linked to a user account has been treated as personal data in multiple rulings.

The question of whether “helping Niantic build a visual positioning system for AR games” and “training autonomous delivery robots for a third-party commercial operator” are compatible purposes is the kind of thing that takes years to litigate and rarely produces a clear answer before the commercial fact is established.

The Broader Pattern

Pokemon Go is a vivid instance of a pattern that runs through the last decade of AI development: consumer applications that provide genuine value to users while simultaneously operating as data collection pipelines whose outputs are commercially valuable in ways users never anticipate.

Duolingo’s language learners contributed translations that trained machine translation systems. Waze and Google Maps users contributed mobility data that informed urban planning decisions and commercial real estate valuations. Captcha-solving contributed to OCR and autonomous vehicle pedestrian recognition. The difference in each case is how much the purpose was disclosed and whether users had meaningful alternatives.

The issue is not that these applications collected data, or even that the data was used commercially. It is that the described purpose and the actual use were far enough apart that a reasonable user, if told the actual use in advance, might have made different choices about participation, or at least would have made an informed one.

For the delivery robotics case, the asymmetry is particularly stark because the value extracted is so clearly defined and commercially significant. Niantic built a spatial data business on top of its game. The game provided the distribution, the reward structure, and the motivation. The players provided the labor. The resulting dataset became a commercial asset licensed to industrial buyers.

What Comes Next

The mechanics of this story will repeat. Any application that motivates users to photograph and describe the physical world, to trace routes through space, to annotate images, or to generate any kind of labeled data will face the same structural question: at what point does the data use diverge far enough from the disclosed purpose that the consent framework requires revision?

Regulatory pressure is moving toward more granular disclosure requirements. The EU AI Act, which came into force in 2024, includes transparency obligations for certain types of AI training data. Individual member states have begun enforcement actions under existing GDPR provisions that touch on data sourced from consumer applications. The US lacks a comprehensive federal framework, but state-level privacy laws, particularly in California and Colorado, are beginning to require more specific disclosure of secondary data uses.

The technical and business incentives point in the opposite direction. Granular disclosure reduces conversion rates. Users who understand exactly what data they’re contributing and how it will be used may contribute less of it. Game mechanics that implicitly reward data contribution are worth real money to the companies that operate them. The gap between what terms of service permit and what a reasonable user understands will persist until there is a regulatory cost to maintaining it.

The thirty billion images exist. The point clouds have been built. The robots are navigating. The question the story leaves open is what terms the next 30 billion images will be collected under, and whether the consent architecture will look any different when they are.
