Excessive data exposure consistently ranks in the top three of OWASP's top ten API security threats, year after year, which speaks to how prevalent it is and how important it is to address in your own APIs.
In 2020, 91% of enterprise security officials reported an API security incident. Major companies like PayPal, Facebook, and Equifax have seen massive losses to their reputation and company value due to such data breaches.
But what is excessive data exposure? And what are the best ways to protect your APIs from falling victim to a cyber-attack targeting this vulnerability?
What is Excessive Data Exposure and Why Is It on the OWASP Top 10 List?
Excessive data exposure occurs when an application, via API response, returns more information than necessary for a user to perform a specific action.
When web and mobile apps regularly rely on API calls that return more information to the user than necessary, those responses expose unfiltered data that an attacker can take advantage of to gain sensitive information.
To illustrate how excessive data exposure occurs and the harm it can do, suppose an e-store owner wants to pull customers' names and locations to use in a marketing campaign. Here's what such an API request might look like:
https://yourestore.com/api/v1/customers/show?customer_id=123
The API would then pull the entire object from the database, including the information you're looking for:
{
"id": 123,
"username": "user123",
"real_name": "John Doe",
"location": "San Antonio, TX",
"phone_number": "321-123-4565",
"address": "514 W Commerce St, San Antonio, TX, USA",
"creditCard": "2342 3424 5323 1234",
"CVV": "123",
"validUntil": "2030"
}
Excessive data exposure then occurs when the API returns too much information, instead of filtering only the fields required, which should look like this:
{
"id": 123,
"real_name": "John Doe",
"location": "San Antonio, TX"
}
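The filtering above has to happen on the server. A minimal sketch of one way to do it, using an explicit allow-list so any field not named simply never leaves the database layer (field names mirror the example above; the `project` helper is an assumption, not a real APIsec or framework function):

```python
# Hypothetical server-side field projection applied before serialization.
# Anything not in the allow-list is dropped, so new database columns
# never leak into responses by accident.

CUSTOMER_PUBLIC_FIELDS = {"id", "real_name", "location"}

def project(record: dict, allowed: set) -> dict:
    """Return only the explicitly allow-listed fields of a record."""
    return {k: v for k, v in record.items() if k in allowed}

customer = {
    "id": 123,
    "username": "user123",
    "real_name": "John Doe",
    "location": "San Antonio, TX",
    "creditCard": "2342 3424 5323 1234",
}

response = project(customer, CUSTOMER_PUBLIC_FIELDS)
# response now contains only id, real_name, and location
```

The key design choice is the allow-list: a deny-list ("strip creditCard and CVV") fails open when a new sensitive column is added, while an allow-list fails closed.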
When API developers mistakenly assume that data that isn't visible isn't susceptible, companies open their APIs to sensitive data exposure, which can result in identity theft, fraud, and even leaked trade secrets.
With over 155.8 million individuals in the US affected by data breaches in 2020 alone, preventing sensitive data exposure has been a major focus for OWASP, which works to help developers understand that hidden data is still highly vulnerable to attackers.
Read More: How to Secure an API: Best Practices - APIsec
API Excessive Data Exposure vs Generic Information Leakage
Before diving into prevention, it's worth drawing a clear line between API-specific excessive data exposure and generic information leakage. Conflating the two leads to incomplete testing and a false sense of coverage.
Generic information leakage covers configuration and hygiene failures, debug endpoints left exposed, stack traces in error responses, server banners revealing version details. These are relatively straightforward to catch through standard disclosure testing. API excessive data exposure is a different problem entirely. The endpoint works exactly as built, authentication succeeds, and the server returns a 200. The problem is that the response carries far more fields than the client needs or is entitled to see.
What makes this harder to catch is that most of that excess data is invisible to the end user. A mobile app displays a name and profile photo, while the underlying API response carries date of birth, phone number, internal account flags, and payment identifiers that the app simply never renders.
Here are the ways this becomes a serious data exploitation risk:
- Excess fields in responses are invisible to end users but fully readable to anyone intercepting or inspecting API traffic
- Paired with predictable object identifiers, over-returned data gives attackers a systematic way to harvest sensitive records across an entire user base
- Every response looks legitimate, every status code is 200, and the breach of sensitive data happens without triggering a single authorization failure
Field-Level Authorization: The Root Cause Most Teams Miss
Most discussions of excessive data exposure focus on what gets returned. The more useful question is why, and the answer is almost always field-level authorization that either doesn't exist or isn't enforced consistently across roles and tenants.
Field-level authorization failure means the API checks whether a user can access an endpoint, but never checks whether that user is entitled to each field within the response. A viewer role calling GET /api/v1/users/me should receive name, email, and profile photo. If the resolver returns the full user object (internal flags, admin notes, linked payment methods, and historical access logs) because no field-level filter exists for that role, the information leakage is silent and continuous.
Here are the ways this failure manifests across different access contexts:
- Role mismatches: a standard user receives fields documented as admin-only because the serialization layer returns the full object regardless of who's requesting it
- Tenant bleed in multi-tenant environments: a response for Tenant A includes metadata fields from Tenant B's configuration because field filtering doesn't account for tenant context in shared infrastructure
- Attribute inheritance gaps: nested objects inherit the access control of their parent rather than enforcing their own, so accessing a user profile surfaces all embedded sub-objects regardless of their individual sensitivity
These failures don't show up as 403s or authentication errors. They show up as extra fields in otherwise legitimate responses, which is exactly why they survive multiple security audits undetected.
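One way to make field-level authorization explicit is a per-role field policy that the serialization layer consults on every response. This is a hedged sketch, not a prescribed implementation; the policy table, role names, and field names are illustrative assumptions:

```python
# Hypothetical field-level authorization policy: each role maps to the
# exact set of fields it may receive; anything not listed is dropped.

FIELD_POLICY = {
    "viewer": {"name", "email", "photo_url"},
    "admin": {"name", "email", "photo_url", "internal_flags", "admin_notes"},
}

def authorize_fields(obj: dict, role: str) -> dict:
    """Filter a response object down to the fields the role is entitled to."""
    allowed = FIELD_POLICY.get(role, set())  # unknown roles get nothing
    return {k: v for k, v in obj.items() if k in allowed}

user = {
    "name": "Ada",
    "email": "ada@example.com",
    "photo_url": "/p/1.png",
    "internal_flags": ["beta"],
    "admin_notes": "VIP",
}
```

Because the policy lives in one place rather than in each handler, a viewer calling the endpoint can never receive `internal_flags`, no matter which code path built the object.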
When Valid Access Becomes a Vulnerability
Excessive data exposure doesn't always require an authorization failure. Some of the most damaging scenarios occur when a user accesses their own data through a valid, authenticated request, and the response includes data that belongs to someone else or was never meant to be client-facing.
Here are the ways horizontal data exposure surfaces in real API environments:
- A call to GET /api/v1/account/summary returns the authenticated user's details alongside aggregate fields that inadvertently include data derived from other users in the same cohort
- A search endpoint accepting user-supplied filter parameters returns records from across the full dataset rather than scoping results to the authenticated user's own objects
- An /api/v1/orders/recent endpoint returns a correctly scoped list but includes embedded fields on each order (supplier IDs, internal cost margins, fulfillment partner details) that belong to the vendor's data model, not the customer's
No privilege escalation is required, and no obvious trace is left in logs. The user made a valid request, received a 200, and the API behaved exactly as coded. The fact that the response contained fields the user was never supposed to see is only visible at the field level, which is why automated response schema validation matters more than endpoint-level access control testing alone.
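The search-endpoint scenario above can be avoided by applying ownership scoping server-side before any client-supplied filter is honored. A minimal sketch under assumed data shapes (the in-memory `ORDERS` list and filter names are hypothetical):

```python
# Hypothetical sketch: scope a search to the authenticated user's own
# records, regardless of what filter parameters the client supplies.

ORDERS = [
    {"order_id": 1, "owner_id": 7, "total": 40},
    {"order_id": 2, "owner_id": 9, "total": 99},
]

def search_orders(authenticated_user_id: int, client_filters: dict) -> list:
    # Ownership scoping is applied first, server-side, and cannot be
    # overridden by anything the client puts in client_filters.
    scoped = [o for o in ORDERS if o["owner_id"] == authenticated_user_id]
    min_total = client_filters.get("min_total", 0)
    return [o for o in scoped if o["total"] >= min_total]
```

Even if a client sends `owner_id=9` as a filter parameter, it is never consulted: the scope comes from the authenticated session, not from the request.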
How To Protect Your APIs Against Excessive Data Exposure

Thankfully, there are measures you can take to protect your APIs from exposing sensitive data unnecessarily.
When you stop your APIs from sending excessive data, it becomes much more challenging for cybercriminals to gain access to anything you don't want them to see. These six tips will go a long way to locking down your data from those with malicious intent.
1. Restrict the Client from Performing Data Filtering
Delegating data filtering to the client is a shortcut hackers are more than willing to take advantage of to steal your sensitive data.
The golden rule is simple: never leave data filtering to the client when dealing with sensitive user information.
Accessing raw, unfiltered information is the gold standard for cybercriminals, so you need to take full control of your sensitive data from start to finish to actually protect it.
Instead of giving away entire data objects, craft specific API responses to all of the most common API calls to limit the flow of data to only fields necessary to complete a specific action. If absolutely necessary to return sensitive data, consider masking the data.
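If a sensitive value genuinely must appear in a response, masking can limit what an interceptor learns. A simple illustrative helper (the function name and visible-character count are assumptions, not a standard):

```python
# Hypothetical masking helper: reveal only the last few characters of a
# sensitive value, replacing the rest with asterisks.

def mask(value: str, visible: int = 4) -> str:
    """Mask all but the last `visible` characters of a value."""
    chars = value.replace(" ", "")  # normalize spacing first
    return "*" * (len(chars) - visible) + chars[-visible:]

masked_card = mask("2342 3424 5323 1234")
# masked_card == "************1234"
```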
2. Control & Minimize Returns In Your API Responses
As we mentioned before, reviewing your most common use cases and trimming the data your API responses contain down to the bare minimum is the best way to avoid excessive data exposure.
Every response and every data field must be treated as a potential vulnerability, because that is exactly what it is.
Minimizing the returns not only lowers the attack surface but also shields your full data set, making it harder for attackers to get a complete understanding of the systems being used and discover critical vulnerabilities.
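One common way to enforce minimized returns is an explicit response model per use case, so the response can only ever contain the fields the model declares. This sketch uses a plain dataclass; real projects often use a serialization framework, and the type and field names here are illustrative:

```python
# Hypothetical explicit response model: the response is constructed from
# named fields, so extra database columns are simply never copied over.

from dataclasses import dataclass, asdict

@dataclass
class CustomerSummary:
    id: int
    real_name: str
    location: str

def to_summary(row: dict) -> dict:
    """Build the minimal response; unknown keys in `row` are ignored."""
    return asdict(CustomerSummary(
        id=row["id"],
        real_name=row["real_name"],
        location=row["location"],
    ))
```

Adding a new column to the database now requires a deliberate change to `CustomerSummary` before it can appear in a response, which is the failure mode you want.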
GraphQL Over-Fetching and Schema Exposure
GraphQL APIs introduce a specific variant of excessive data exposure that REST APIs largely avoid. Because GraphQL allows clients to request any combination of fields the schema exposes, the attack surface for data over-return scales directly with how much the schema exposes and how consistently field-level authorization is enforced on each field.
Here are the ways GraphQL amplifies this risk in practice:
- Unrestricted field selection: a query requesting every field on a User type returns everything the schema defines, regardless of whether the requesting user is entitled to each field. Without field-level authorization in every resolver, sensitive fields surface alongside public ones
- Schema introspection as a reconnaissance tool: with introspection enabled in production, an attacker can download the complete schema before writing a single data query, turning introspection itself into an information leakage tool
- Nested object over-fetching: GraphQL's relationship traversal allows a query on a public object to walk into deeply nested private ones, returning sensitive financial or personal data if nested resolvers don't enforce their own authorization independently
Disabling introspection removes the reconnaissance capability but doesn't fix the underlying problem. Every field in a GraphQL schema needs its own authorization check independent of the query entry point, and that needs to be tested explicitly rather than assumed from endpoint-level access controls.
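A library-agnostic sketch of what per-resolver authorization can look like. The `requires_role` decorator and the dict-shaped `info` context are assumptions for illustration; real GraphQL servers (graphene, ariadne, Apollo) each provide their own middleware hooks for this:

```python
# Hypothetical per-field authorization guard for GraphQL-style resolvers.
# Every field resolver checks authorization independently, even when it
# is reached through a deeply nested query path.

from functools import wraps

def requires_role(role: str):
    def decorator(resolver):
        @wraps(resolver)
        def guarded(obj, info):
            if role not in info.get("roles", []):
                return None  # or raise an authorization error
            return resolver(obj, info)
        return guarded
    return decorator

@requires_role("admin")
def resolve_internal_flags(user, info):
    """Sensitive field: only admins may read it, whatever the entry point."""
    return user["internal_flags"]
```

The point of the pattern is that the check travels with the field, not with the query root, so nested traversal cannot bypass it.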
What Over-Returned Data Actually Costs
Mapping specific over-returned field types to their business consequences helps teams prioritize remediation based on actual risk, connecting common overexposed field types to real data exploitation scenarios that APIsec can validate against.
3. Encrypt Data During Transit and at Rest
Encrypting data during transit with TLS (the successor to SSL) or secure transfer protocols like FTPS significantly reduces the likelihood of third parties gaining access to sensitive data even if they manage to intercept an API response.
Instead of capturing valuable, potentially sensitive data, hackers get a combination of random numbers and symbols that will remain completely useless without a specific key required to decode them.
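On the client side, Python's standard library already enforces sensible TLS defaults; a short sketch showing how a properly configured context refuses unvalidated connections:

```python
# Enforcing TLS on the client side with Python's stdlib.
# ssl.create_default_context() requires certificate validation and
# hostname checking, so an interceptor cannot present a bogus certificate.

import ssl

context = ssl.create_default_context()
assert context.verify_mode == ssl.CERT_REQUIRED
assert context.check_hostname is True

# Pass `context` to http.client.HTTPSConnection or urllib.request to
# guarantee responses are encrypted in transit and the peer is verified.
```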
4. Automate API Security Monitoring
An API is a complex system, so it’s not uncommon to see new vulnerabilities pop up as a direct result of patching up API security loopholes in the first place.
The problem is that even if you have inspected every part of your code to protect your API, you need a full API security check every time you update your build, release a new feature, or fix a few bugs.
Doing that manually is not a realistic option as each API test takes a considerable amount of resources to execute.
Automated API security testing tools completely eliminate those issues by leveraging the power of AI to monitor APIs around the clock across hundreds of potential vulnerabilities.
APIsec allows you to do just that, providing automated, comprehensive, and continuous API testing to keep your API protected.
Are you ready to give it a try? Get in touch with our team today to get a free vulnerability assessment.
Shift-Left Enforcement: Catching Over-Exposure Before Deployment
Encryption and response minimization address excessive data exposure after it's been built. Shift-left enforcement addresses it before over-returning fields ever make it into a deployed API, which is where fixing it is fastest and cheapest.
The core idea is straightforward: define what each endpoint is permitted to return, enforce that definition at the schema level, and validate it automatically on every deployment rather than discovering violations through post-production disclosure testing. Here are the ways schema governance and CI/CD integration prevent excessive data exposure from being introduced in the first place:
- Schema-level field classification: fields carrying PII, financial data, or internal metadata are tagged at the schema level during design, driving automated validation rules downstream rather than relying on developers remembering to filter them in every resolver
- Response schema comparison in CI/CD: APIsec maintains a baseline of expected response schemas per endpoint per role. Every deployment is compared against that baseline, and any newly introduced field triggers a review gate before the change reaches production
- Role-based variance detection on new endpoints: when a new endpoint is added, APIsec automatically runs role comparison tests against it, flagging any case where the response field set doesn't vary appropriately between privilege levels
- Sensitive field regression testing: fields classified as sensitive are tracked across releases. If a refactor or serialization change causes a previously suppressed field to reappear in a response, the regression is caught in the pipeline rather than discovered after the fact
This turns excessive data exposure from a production discovery problem into a development-time enforcement problem, closing the gap between what the schema intends to expose and what the API actually returns across every role, tenant, and deployment cycle.
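The baseline-comparison idea above can be sketched in a few lines. This is a hypothetical CI gate, not APIsec's actual implementation; the baseline structure, endpoint path, and role names are assumptions:

```python
# Hypothetical CI/CD gate: compare a deployment's actual response fields
# against a per-endpoint, per-role baseline and flag any new field.

BASELINE = {
    ("/api/v1/users/me", "viewer"): {"name", "email", "photo_url"},
}

def check_response(endpoint: str, role: str, response: dict) -> set:
    """Return the set of unexpected fields; an empty set means the gate passes."""
    expected = BASELINE.get((endpoint, role), set())
    return set(response) - expected

new_fields = check_response(
    "/api/v1/users/me", "viewer",
    {"name": "Ada", "email": "a@b.c", "photo_url": "/p.png", "internal_flags": []},
)
# new_fields == {"internal_flags"} -> block the deployment for review
```

Wired into a pipeline, a non-empty result fails the build, so a refactor that re-exposes a suppressed field never reaches production unreviewed.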
FAQs
1. How to handle large data in a web API?
Use pagination, server-side filtering, streaming, and efficient query design to avoid large payloads and slow responses.
2. What role does field-level authorization play in mitigating data overexposure in APIs?
Field-level authorization ensures users only receive attributes they’re allowed to see, preventing leaks even when endpoint access is legitimate.
3. Can fuzzing tools detect excessive data exposure in APIs automatically?
Fuzzers can reveal inconsistent or expanded responses, but detecting overexposure often requires logic-aware testing rather than random mutations.
4. Why is relying on generic serialization (e.g., to_json()) risky for data exposure?
Generic serialization dumps entire objects, exposing internal fields that were never intended for clients.
5. How does classifying and labeling sensitive fields reduce the risk of excessive data exposure?
Classification helps teams enforce stricter rules, masking, and access controls for fields carrying financial, personal, or regulated data.
6. What is the trade-off between response minimization and API usability for clients?
Minimal responses reduce exposure but may require more client calls; usability increases with richer responses but expands the attack surface.

