AI Transparency Statement
v1.0.0
April 27, 2026
Voluntary Disclosure Aligned with California AB 2013
About This Statement
GraphIQ is the world's largest B2B knowledge graph. We are a data layer, not a generative AI model developer.
California Assembly Bill 2013 applies to developers of generative AI systems. GraphIQ does not train or release foundation models. We publish this statement as a voluntary disclosure.
Our goal is simple: to help customers who build AI or GTM systems on top of our data meet their own AB 2013 obligations. In this document, we share how the GraphIQ knowledge graph is built, sourced, and maintained.
This document follows the twelve disclosure categories outlined in AB 2013. We keep it current as our sources and practices evolve.
About GraphIQ
GraphIQ is based in Los Altos, California. We operate a continuously growing, self-updating knowledge graph of business information.
The graph connects 300M+ organizations, 351M+ people, 740M+ news articles, 575M+ locations, and billions of capability tags.
We use natural language processing to parse over 300,000 news articles per day. Every fact in the graph is attributed to its source.
GraphIQ is GDPR compliant and offers CCPA rights management for individuals whose information appears in the graph.
As of this writing, customers gain access to the GraphIQ knowledge graph through four methods: seat license, CRM connector, API, and MCP Server.
1. Origin: Sources and Owners of the Datasets
GraphIQ aggregates publicly available business data, licensed datasets, and information provided by customers for their own use. We fully attribute every fact in the graph to its underlying source, similar to a search engine.
Core source categories include:
Business directories and registries: Bloomberg, Dun & Bradstreet, Crunchbase, LinkedIn (public profile data), S&P Global, Yellow Pages, Better Business Bureau, The Org, Wikipedia.
Verified corporate websites: hundreds of millions of public company domains.
News and media: Reuters, Bloomberg, Al Jazeera, BBC, CNN, The New York Times, The Guardian, Yahoo, and thousands of regional and trade publications.
Government and trade databases: NAICS, ITA, USPTO, SEC EDGAR, SAM.gov, U.S. Customs.
Industry ontologies: PubChem, ChEMBL, Wikidata, NAICS, NACE, SIC, ISIC, WCO/HS.
Customer-provided inputs (for customer-specific enrichment only, not added to the public graph).
Source ownership varies. Government databases are public domain. Commercial directories and news feeds are owned by their publishers. We access each source under the terms that apply to it. We crawl a website only where its robots.txt file permits it.
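For customers who run their own crawlers alongside GraphIQ data, the robots.txt check described above can be sketched as follows. This is a minimal illustration using Python's standard library, not a description of GraphIQ's crawler. The user agent name, the sample rules, and the may_fetch helper are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name; this statement does not publish a real user agent.
USER_AGENT = "ExampleCrawler"


def may_fetch(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules only: everything is allowed except the /private/ path.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(may_fetch(SAMPLE_ROBOTS, "https://example.com/about"))         # True
    print(may_fetch(SAMPLE_ROBOTS, "https://example.com/private/team"))  # False
```

In a live crawler the rules would be fetched from each site's /robots.txt before any page request, and a disallowed URL would simply be skipped.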
2. Purpose Alignment: How Datasets Support the Intended Function
GraphIQ is GTM infrastructure. Customers use the graph for sales prospecting, supplier discovery, market development, company monitoring, company intelligence, and more.
Each source maps to a specific function:
Business directories and corporate websites build the firmographic backbone: name, industry, revenue range, employee count, locations.
News feeds keep the graph fresh. NLP extracts events, people, product launches, and M&A activity in near real time.
Government and trade databases provide canonical identifiers, industry codes, and filings.
Industry ontologies let customers filter on capabilities, technologies, certifications, and product taxonomies.
People data supports outreach, account mapping, and contact enrichment.
The graph does not generate text, images, or decisions on its own. AI-adjacent features (NLP extraction, embedding-based lookalike search, optional summarization) exist to make the underlying data more useful, not to create new content.
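To make "embedding-based lookalike search" concrete, the sketch below ranks candidate companies by cosine similarity to a seed company's vector. It is an assumption-laden illustration: the company names, the three-dimensional vectors, and the function names are invented for the example, and this statement does not specify which similarity measure or embedding model GraphIQ uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookalikes(seed, candidates, top_k=2):
    """Return the top_k (name, score) pairs most similar to the seed vector."""
    scored = [(name, cosine_similarity(seed, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings; real embeddings would have hundreds of dimensions.
CANDIDATES = {
    "Acme Robotics": [0.9, 0.1, 0.0],
    "Beta Bakery": [0.0, 0.2, 0.9],
    "Gamma Automation": [0.8, 0.3, 0.1],
}

if __name__ == "__main__":
    for name, score in lookalikes([1.0, 0.2, 0.0], CANDIDATES):
        print(f"{name}: {score:.3f}")
```

The search retrieves and ranks existing records. It does not produce new content, which is consistent with the distinction drawn above.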
3. Data Volume
Volumes as of the most recent graph refresh:
Organizations: 300M+
People: 351M+ (approximately 65M with verified business contact information)
News articles: 740M+ (growing by 240,000+ per day)
Locations: 575M+
Capability, technology, certification, and product tags: billions
Pre-defined job titles: 47,471 plus custom keyword support
These figures are approximate and grow continuously. GraphIQ typically updates the knowledge graph three times per quarter.
4. Data Types
The graph contains both labeled (structured) and unlabeled (raw text) data.
Labeled organization records include name, aliases, industry codes, revenue range, employee count, headquarters, subsidiaries, parent organizations, investors, suppliers, customers, competitors, partners, and capability tags.
Labeled people records include name, title, seniority, department, employment history, education, skills, certifications, and monthly-verified business contact information (work email, business phone, LinkedIn URL).
Labeled news records include sentiment, summary, mentioned companies and people, event type (for example, product launch, M&A, executive change), author, publication, language, region, and timestamp.
Labeled location records include address, geo-coordinates, facility type, and the people associated with that site.
Unlabeled data consists of raw web page and news article text ingested before NLP extraction. This content is used to build labeled records and is not redistributed verbatim in the product.
5. Intellectual Property Status
The graph includes a mix of public domain, publicly available third-party, and licensed content.
Public domain: government filings, SEC EDGAR, USPTO, SAM.gov, NAICS/SIC/NACE/ISIC codes.
Publicly available third-party content: news articles, corporate websites, and business directory pages. These may be subject to copyright held by the publisher.
Licensed content: commercial data feeds obtained under written agreements.
Trademarks and patents: we surface trademark and patent references sourced from USPTO and equivalent public registries.
GraphIQ does not reproduce copyrighted articles in full. We extract facts and short summaries with a link back to the original publisher.
6. Commercial Arrangements
Datasets enter the graph through a combination of:
Open web ingestion of publicly available pages.
Paid licenses with commercial data providers and news aggregators.
Purchased data subscriptions for specific coverage areas.
Free access to government and public-domain databases.
7. Personal Information (CCPA)
The graph contains personal information as defined by the California Consumer Privacy Act.
Specifically, it includes professional contact data for business contacts: name, job title, employer, work email, business phone number, LinkedIn URL, employment history, and work-related skills and certifications.
GraphIQ honors CCPA rights. California residents can request access, correction, or deletion, or opt out of sale or sharing, through the process described in our Privacy Policy at https://graphiq.ai/legal/privacy-policy.
We do not include sensitive personal information such as Social Security numbers, government IDs, financial account numbers, precise geolocation of individuals, or health data.
8. Aggregate Consumer Information (CCPA)
The graph includes aggregate consumer information as defined by the CCPA.
Examples include employee count ranges, departmental headcount breakdowns, revenue brackets, location-level staffing summaries, and industry-level benchmarks. These are aggregated from the underlying records and are not tied to any specific individual.
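As a rough illustration of how aggregates such as employee count ranges and departmental headcounts can be derived without carrying individual identities forward, consider the sketch below. The bracket boundaries, labels, and record fields are assumptions chosen for the example. They are not GraphIQ's published brackets or schema.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bracket boundaries: each edge is the first value of the next bracket.
BRACKET_EDGES = [11, 51, 201, 1001]
BRACKET_LABELS = ["1-10", "11-50", "51-200", "201-1000", "1001+"]


def employee_range(count: int) -> str:
    """Map an exact employee count to a coarse range label."""
    return BRACKET_LABELS[bisect_right(BRACKET_EDGES, count)]


def department_headcount(people: list) -> dict:
    """Count people per department; names are read but never copied to the output."""
    return dict(Counter(person["department"] for person in people))


if __name__ == "__main__":
    staff = [
        {"name": "Person A", "department": "Sales"},
        {"name": "Person B", "department": "Sales"},
        {"name": "Person C", "department": "Engineering"},
    ]
    print(employee_range(len(staff)))
    print(department_headcount(staff))
```

The outputs contain only a range label and per-department totals, so nothing in them points back to a specific individual.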
9. Data Processing: Cleaning, Processing, and Modifications
Every record in the graph is processed before it is exposed to customers. Processing steps include:
Entity resolution: deduplicating and merging records that refer to the same organization, person, or location across sources.
Natural language processing: parsing news articles and web content to extract organizations, people, events, product launches, M&A activity, and sentiment.
Relationship mapping: linking entities into the graph (parent/subsidiary, employer/employee, supplier/customer, partner, investor).
Quality scoring: ranking facts by source reliability, recency, and corroboration.
Source attribution: attaching every fact to its original source document(s).
Embedding generation: producing vector representations to power lookalike search and semantic filtering.
Contact verification: monthly re-verification of business emails and phone numbers.
Normalization: standardizing job titles, industry codes, currencies, and addresses.
Processing is intended to make the graph accurate, connected, and queryable. It is not intended to alter the substance of the underlying public record.
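Three of the steps listed above (entity resolution, quality scoring, and source attribution) can be sketched together. This is a simplified illustration under stated assumptions, not GraphIQ's pipeline: the Fact fields, the suffix-stripping match key, the scoring formula, and the sample URLs are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One sourced claim about an entity (hypothetical schema)."""
    entity_name: str
    attribute: str
    value: str
    source_url: str
    source_reliability: float  # assumed 0..1 rating of the source
    observed_on: date


LEGAL_SUFFIXES = re.compile(r"\b(inc|llc|ltd|corp|co)\b\.?", re.IGNORECASE)


def resolution_key(name: str) -> str:
    """Crude match key: lowercase, drop legal suffixes and punctuation."""
    stripped = LEGAL_SUFFIXES.sub("", name.lower())
    return re.sub(r"[^a-z0-9]+", " ", stripped).strip()


def quality_score(fact: Fact, corroborating_sources: int, today: date) -> float:
    """Combine source reliability, recency, and corroboration (assumed formula)."""
    age_days = (today - fact.observed_on).days
    recency = 1.0 / (1.0 + age_days / 365.0)
    corroboration = min(corroborating_sources, 3) / 3.0
    return fact.source_reliability * recency * corroboration


def resolve(facts: list, today: date) -> dict:
    """Merge facts about the same entity and keep the best value with its sources."""
    grouped = defaultdict(list)
    for fact in facts:
        grouped[(resolution_key(fact.entity_name), fact.attribute)].append(fact)

    resolved = {}
    for key, group in grouped.items():
        def score(candidate: Fact) -> float:
            # Count distinct sources that report the same value as the candidate.
            support = {g.source_url for g in group if g.value == candidate.value}
            return quality_score(candidate, len(support), today)

        best = max(group, key=score)
        # Attribution: every source that backs the chosen value is retained.
        sources = sorted({g.source_url for g in group if g.value == best.value})
        resolved[key] = {"value": best.value, "sources": sources}
    return resolved
```

A production system would use far more robust matching than suffix stripping (for example domains, addresses, and registry identifiers), but the shape is the same: group records that refer to one entity, score competing values, and keep the source list attached to whatever value is exposed.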
10. Collection Timeframe
Collection is ongoing.
News ingestion runs continuously and refreshes throughout each day. Contact information is re-verified on a monthly cycle. Firmographic and relationship data is updated as new public information becomes available.
GraphIQ records the earliest collection date of the data currently retained in the graph. Historical coverage depth varies by source.
11. Usage Dates: When Datasets Were First Used
GraphIQ has operated its knowledge graph in production since January 2024.
Major source categories were first integrated as follows:
Business directories and corporate websites: January 2024.
News and media feeds: July 2024.
Government and trade databases: February 2026.
Industry ontologies: January 2024.
Capability and technology tagging: January 2024.
Alumni databases: April 2026.
New sources are added on a rolling basis. This statement will be updated as material additions occur.
12. Synthetic Data
The graph itself is not built from synthetic data. Every organization, person, location, and news article in the graph originates from a real source.
If we decide to use synthetic data, we will update this statement.
Contact and Updates
Questions about this statement can be directed to privacy@graphiq.ai.
We review and refresh this statement on a regular cadence and whenever our data practices change in a material way.