HData Team · April 21, 2026 · 5 min read

How File Selection Shapes Your Regulatory AI Results


Guidance on Document Selection

Many people assume that prompting AI with more information will produce better results. In regulatory research, that instinct can actually work against your goals.

HData’s Regulatory AI can process up to 400 files in a single session, which is far more than general-purpose AI tools. But the guide HData recently published, Regulatory AI: A Practical Guide for Powerful Results, makes a point worth sitting with: selecting more documents is not always the best approach. Knowing how to match your file selection set to your research goal is one of the most important skills you can develop, and it’s one that pays off every time you open a session.

Why Document Selection Matters

Regulatory AI is grounded. It only draws from documents you’ve selected, whether those are filings in HData’s Library catalogs or files you’ve uploaded or integrated directly. That is a deliberate design choice, not a limitation. Grounding ensures every response is attributable to specific documents, page numbers, and passages. You can trace any answer back to its source to ensure accuracy.

But grounding also means your file selection defines the boundaries of what Regulatory AI can include in its responses. The quality of your results depends as much on which documents you bring into a session as on the prompts you write.

That is why you should select documents strategically rather than simply loading files up to the 400-file limit. Weigh relevance over volume in relation to your research goal. Before starting a Regulatory AI session, refine your search until you have the most relevant files. Use HData’s built-in search and filter tools, such as Boolean modifiers, to narrow results by docket, filing entity, or date. A well-filtered, targeted document set consistently produces more precise and useful results than a large, loosely defined one.

The guide frames document selection around a simple principle: fewer files for depth, more files for breadth.

The Depth-Breadth Spectrum

You can think of file selection as a spectrum with four zones:

1–50 files: Focused analysis

When you need to go deep on a specific issue, such as a single witness's testimony, a contested cost-recovery proposal, or a specific section of a rate case filing, keep your document set tight. Regulatory AI produces more precise, detailed answers when it is not scanning loosely related material.

Use case example: 10 files selected from Florida Public Service Commission docket No. 20250029 for 2025.

Prompt: “Analyze these witness testimonies and exhibits for gaps, unsupported assumptions, or inconsistencies.”

Result: Regulatory AI identified several significant gaps, unsupported assumptions, and inconsistencies across multiple areas of the case, with citations linking directly to the relevant passages in each file. The following screenshot shows an excerpt of the response.

[Screenshot: excerpt of the Regulatory AI response]

50–150 files: Targeted comparison

When you need to compare how several utilities or parties address the same issue, this range gives Regulatory AI enough material to surface meaningful differences without diluting the analysis.

Use case example: 115 files selected under the California Public Utilities Commission in 2024 and 2025.

Prompt: “Acting as an intervenor, summarize what the Sierra Club as an intervenor argued in 2024 and 2025 before the Commission.”

Result: Regulatory AI identified the Sierra Club as an active intervenor in numerous proceedings, organizing its findings by advocacy topics (e.g., environmental protection, ratepayer interests, and clean energy policies) and regulatory area (e.g., gas infrastructure, utility rate design, electrification projects, and cost of capital proceedings). The following screenshot shows an excerpt of the response.

[Screenshot: excerpt of the Regulatory AI response]

150–300 files: Broad pattern analysis

For research that spans multiple proceedings, utilities, or time periods, a range of 150–300 files is appropriate: for example, identifying trends in IRP filings or tracking how a particular issue has evolved across dockets. The guide recommends breaking large research tasks into regional or topical subsets and using follow-up prompts to home in on recurring findings.

Use case example: 245 files selected covering program tariff applications that Gas and Electric Company submitted to the Wisconsin Public Service Commission in 2024.

Prompt: “From the perspective of a utility, analyze the major themes, contested issues, and outcomes across program tariff applications to the Commission.”

Result: Regulatory AI segmented findings into thematic sections with detailed subsections and outcomes, providing a structured foundation for deeper follow-up research. The following screenshot shows an excerpt of the response.

[Screenshot: excerpt of the Regulatory AI response]

300–400 files: Wide-sweep research

Industry-wide benchmarking, such as comparing how utilities across 20 or more states approach storm hardening cost recovery, is a common use for the high end of the file range. Keep in mind that Regulatory AI will cite only the documents that directly support your prompt, not necessarily all 400.

Use case example: 400 files selected covering utility cost recovery strategies in Pennsylvania.

Prompt: “Compare how utilities approached the issue of cost recovery.”

Result: Regulatory AI found that utilities employ diverse strategies for cost recovery, ranging from traditional base rate cases to specialized surcharge mechanisms, with significant variation in approach across regions when facing opposition from consumer advocates and regulatory bodies concerned about rate impacts and procedural compliance. The following screenshot shows an excerpt of the response.

[Screenshot: excerpt of the Regulatory AI response]
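The four zones above amount to a simple mapping from planned file count to research mode. The sketch below is a hypothetical planning helper, not part of HData's product: the zone names and ranges come from the guide, while the function itself is purely illustrative.

```python
# Hypothetical helper: map a planned file count to the depth-breadth zone
# described in HData's guide. Zone names and boundaries come from the guide;
# this function is illustrative only and is not an HData API.

ZONES = [
    (50, "Focused analysis"),         # 1-50 files
    (150, "Targeted comparison"),     # 50-150 files
    (300, "Broad pattern analysis"),  # 150-300 files
    (400, "Wide-sweep research"),     # 300-400 files
]

def recommended_zone(file_count: int) -> str:
    """Return the research zone for a given number of selected files."""
    if not 1 <= file_count <= 400:
        raise ValueError("Regulatory AI sessions support 1-400 files")
    for upper_bound, zone in ZONES:
        if file_count <= upper_bound:
            return zone
    return ZONES[-1][1]

print(recommended_zone(10))   # Focused analysis
print(recommended_zone(245))  # Broad pattern analysis
```

The boundaries overlap in the guide's prose (e.g., "50–150"); the sketch resolves ties by assigning a boundary count to the narrower zone, which matches the guide's preference for tighter document sets.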

A Practical Tip for Broad Research

When working across a large document set, the guide highlights that you shouldn’t ask one sweeping question. Instead, you get better results by breaking the research into regional or topical subsets. For example, rather than asking "What are common themes in recent Midwest utility IRPs?" across 300 files at once, you should run separate prompts for Illinois utilities, Indiana utilities, and so on. Once themes emerge across subsets, you can use follow-up prompts to go deeper on the ones that matter most. This keeps responses focused, citations traceable, and analysis actionable, regardless of how many files are in your session.
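The subset strategy above can be sketched as plain data manipulation: group filings by state, then build one narrower prompt per subset. The filing records and prompt wording below are hypothetical examples; actual sessions are run inside the HData interface, not through this code.

```python
# Illustrative sketch of the subset strategy: rather than one sweeping prompt
# over hundreds of files, group filings by state and generate one focused
# prompt per subset. All filenames and the prompt template are hypothetical.
from collections import defaultdict

filings = [
    {"file": "ameren_irp_2024.pdf", "state": "Illinois"},
    {"file": "comed_irp_2024.pdf", "state": "Illinois"},
    {"file": "nipsco_irp_2024.pdf", "state": "Indiana"},
    {"file": "aes_indiana_irp_2024.pdf", "state": "Indiana"},
]

# Partition the document set into regional subsets.
by_state = defaultdict(list)
for filing in filings:
    by_state[filing["state"]].append(filing["file"])

# One targeted prompt per subset keeps responses focused and citations traceable.
prompts = [
    f"What are the common themes in recent {state} utility IRPs?"
    for state in sorted(by_state)
]
for prompt in prompts:
    print(prompt)
```

Once themes emerge across the per-state runs, follow-up prompts can drill into the ones that recur, as the guide recommends.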

The guide is also clear about what happens when the spectrum is ignored: if a question spans too many documents or requires too much output, responses can be cut off or incomplete. When that happens, the fix is almost always to narrow the prompt and the document set, not to add more files.

Download the Guide

Document selection is one section of a broader framework in Regulatory AI: A Practical Guide for Powerful Results. The full guide covers goal-setting, prompting principles and frameworks, and use case prompt examples.

Download the guide here.

Not yet using HData's Regulatory AI? Request a demo to see what it can do for your team.

About HData

HData is a regulatory technology leader focused on making regulatory information accessible and actionable. It combines expansive data libraries with purpose-built AI and automation that powers compliance reporting, research, analytics, and operational intelligence across public-sector and regulated markets. HData’s solutions help government agencies, utilities, energy professionals, advocates, advisory firms, and energy technology companies achieve regulatory data readiness, reduce risk, improve workflows, surface actionable insights, and make decisions with confidence. Visit www.hdata.com to learn more and request a demo.