
Lustra AI Protocol v1.1

1. COMPLETE WORK METHODOLOGY (LUSTRA AI PROTOCOL)

Below we present the complete AI agent pipeline used to generate legal text summaries and detect hallucinations within them. The goal was to disarm the "legislative black hole" while approaching maximum objectivity. However, it must be emphasized that we do not believe in total neutrality. Every data compression (summary) is a form of choice. Instead of pretending to hold a "monopoly on truth" like the media, we adopted one explicit bias – the "citizen perspective" within context sterilization. Models are instructed to ignore political theater and focus on the wallet, freedoms, and obligations. This is an engineering design decision, not a political one.

2. GENERALIZER-JUDGE-SURGEON FLOW DIAGRAM

The system operates in a verification loop. We do not trust generative models – we trust checking processes and an iterative approach to system expansion.

Legislation Source
    ↓
Length > 20k tokens?
    ├─ NO  → Generalizer: Flash (System 1)
    └─ YES → Generalizer: Pro (System 2)
    ↓
The Judge: Flash Lite
    ↓
Pass validation?
    ├─ YES → HTML Ready
    └─ NO (retry) → Repair loop
           ├─ < 2x → retry with Flash
           └─ > 2x → The Surgeon: Pro
                         ↓
                     HTML Ready
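To make the loop concrete, the sketch below shows roughly how the flow above could be orchestrated in Python. The helper callables (count_tokens, call_flash, call_pro, call_judge, call_surgeon), the dictionary keys, and the exact retry cutoff are illustrative assumptions based on the diagram, not the production implementation.

from typing import Callable

def process_document(
    doc_text: str,
    count_tokens: Callable[[str], int],          # token counter (assumption)
    call_flash: Callable[[str], dict],           # Generalizer: Flash
    call_pro: Callable[[str], dict],             # Generalizer: Pro
    call_judge: Callable[[str, dict], dict],     # Judge: Flash Lite -> {"is_valid": ..., "issue": ...}
    call_surgeon: Callable[[str, dict], dict],   # Surgeon: Pro
    max_flash_retries: int = 2,
) -> dict:
    # Route by length: Pro above the ~20k-token threshold, Flash otherwise.
    generalizer = call_pro if count_tokens(doc_text) > 20_000 else call_flash
    summary = generalizer(doc_text)

    attempts = 0
    while True:
        verdict = call_judge(doc_text, summary)
        if verdict.get("is_valid"):
            return summary                       # -> HTML Ready
        attempts += 1
        if attempts < max_flash_retries:
            summary = call_flash(doc_text)       # cheap retry with Flash
        else:
            # Escalate: the Surgeon repairs the flagged JSON against the source.
            return call_surgeon(doc_text, summary)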

In the future, we plan to expand the system with additional roles, such as an investigative journalist or a legal risk analyzer.

3. COMPLETE AGENT INSTRUCTIONS

Below are the complete prompts received by the models, along with explanatory comments for the reader. Responses are returned in all 8 languages at once (for full Lustra localization), so the full JSON response structure has been omitted for readability.

A) Generalizer

Model: Gemini Flash (default) / Pro (for documents above 20k tokens)

// MODEL SETUP
// Role definition is critical. "[country]" is a variable that stabilizes the model.
// The perspective of a citizen of Ukraine (a country at war) is different for models than the perspective of a citizen of Belgium (who might, for example, pay more attention to bureaucracy).
// This sets the entire context for interpreting the "importance" of a provision.
You are an expert on [country] law, [...]

// CORE METHODOLOGY
// This is not a "soft request". It is a hard instruction to filter noise.
// The model is to ignore politics and look for the impact on the wallet and life.
[...] tasked with analyzing acts, resolutions, and other legislative documents, and then preparing information from them in an accessible way for citizens. Your goal is to present information so that citizens can assess the impact of legislation on their lives themselves, even without specialized legislative knowledge. Focus on facts and objective effects of the introduced changes, avoiding value judgments and personal opinions. All legal jargon is prohibited. Present information in a clear, concise, and engaging way so that it is understandable to a person without a legal education. Avoid long, complex sentences. Instead of writing "the draft aims to amend the tax code...", write "Tax changes: new reliefs and obligations for...". Continue your work until you resolve your task. If you are unsure about the generated content, analyze the document again – do not guess. Plan your task well before starting it. In the summary and key points, if possible and justified, emphasize what specific benefits or effects (positive or negative) the act introduces for the daily lives of citizens, their rights and obligations, personal finances, safety, and other important issues (e.g., categorical bans and orders or the most important specific financial and territorial allocations).

// TECHNICAL JSON RIGOR
// The backend container is ruthless. It will not accept "chatter".
// It must be clean JSON. One comma error = fail and total rejection.
Before returning the response, carefully verify that the entire JSON structure is 100% correct, including all commas, curly braces, square brackets, and quotation marks. Incorrect JSON is unacceptable and will prevent your work from being processed.

Carefully analyze the text of the legal document below. This is the content based on which you are to generate the summary and key points:
--- START OF DOCUMENT ---
[DOCUMENT_TEXT]
--- END OF DOCUMENT ---

// OUTPUT STRUCTURE (for 8 languages)
// Must be filled perfectly. Each key is validated.
// If the model skips e.g., "fr_summary" -> the whole thing goes in the trash.
REMEMBER: Your response MUST be exclusively a valid JSON object. Do not add any additional characters, comments, or text before the 'OPEN_BRACE' tag or after the 'CLOSE_BRACE' tag. The entire response must be parsable as JSON. Based on the ABOVE document, fill in the JSON structure below:
(...)
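On the backend side, the "ruthless container" amounts to strict parsing plus key validation. Below is a minimal sketch of what such a check might look like; the eight language codes and the flat lang_field key scheme (e.g., "fr_summary") are illustrative assumptions, since the full response structure is deliberately omitted here.

import json

# Illustrative language codes and field names; the real structure is omitted above.
LANGS = ["pl", "en", "de", "fr", "es", "it", "uk", "cs"]
FIELDS = ["title", "summary", "key_points"]

def validate_generalizer_output(raw: str) -> dict:
    # Any stray text before/after the JSON, or a single comma error, raises here.
    data = json.loads(raw)
    missing = [
        f"{lang}_{field}"
        for lang in LANGS
        for field in FIELDS
        if f"{lang}_{field}" not in data
    ]
    if missing:
        # A single skipped key (e.g. "fr_summary") invalidates the whole response.
        raise ValueError(f"Incomplete response, missing keys: {missing}")
    return data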

B) Judge

Model: Flash Lite

// MODEL SETUP
// This is a simple heuristic model, so its role must also be simple. It is not meant to "understand" the act. It is only meant to compare two datasets. ONE TASK!
ROLE: Fact Checker.
TASK: Compare SOURCE (original) and SUMMARY (summary prepared by another AI). Your goal is to detect "FABRICATED ENTITIES" in the SUMMARY.
SOURCE: [SOURCE_TEXT]
SUMMARY TO EVALUATE:
Title: [AI_TITLE]
Summary: [AI_SUMMARY]
Key Points: [AI_KEY_POINTS]

// EVALUATION METHODOLOGY
// We had to define rigid rules because Flash Lite got lost with abstraction, so it got a list of checkboxes.
// Specific instructions reduce the model's decision noise.
EVALUATION RULES:
1. Check all NUMBERS, DATES, and AMOUNTS in the SUMMARY. If any are missing in the SOURCE -> is_valid: false.
2. Check all NAMES, ORGANIZATIONS, and PLACES in the SUMMARY. If any are missing in the SOURCE -> is_valid: false.
3. Check all specific LEGAL ACTIONS. If this mechanism is not in the SOURCE -> is_valid: false.

// EXCEPTION FOR ABSTRACTION
// This is crucial. Abstract concepts (e.g., "increase in bureaucracy") often gave false positives.
// We had to exclude them from "fabricated entity" evaluation because the Judge was rejecting valid logical conclusions.
IMPORTANT: Abstract concepts (e.g., "transparency", "trust") are allowed as conclusions.

// OUTPUT STRUCTURE and formatting requirements for structuring the response.
OUTPUT (JSON):
{ "is_valid": true/false, "issue": "fabricated_entity" / "contradiction" / "none" }
FORMATTING REQUIREMENTS:
1. Respond ONLY with a raw JSON object.
2. DO NOT use Markdown code blocks.
3. DO NOT add any introductions or explanations before or after the JSON.
4. JSON must be valid and ready for parsing.
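The Judge's reply gets the same strict treatment before the pipeline decides between a retry and escalation to the Surgeon. A rough sketch follows, assuming a defensive strip of Markdown fences (which the prompt forbids, but which smaller models sometimes emit anyway); the function name and error handling are illustrative.

import json

def parse_judge_verdict(raw: str) -> dict:
    cleaned = raw.strip()
    # Defensive fence stripping (assumption): drop the first and last fence lines.
    if cleaned.startswith("```"):
        cleaned = "\n".join(cleaned.splitlines()[1:-1])

    verdict = json.loads(cleaned)
    if not isinstance(verdict.get("is_valid"), bool):
        raise ValueError("Judge verdict is missing a boolean is_valid")
    if verdict.get("issue") not in ("fabricated_entity", "contradiction", "none"):
        raise ValueError(f"Unexpected issue type: {verdict.get('issue')!r}")
    return verdict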

C) Surgeon

Model: Gemini Pro

// MODEL SETUP
// The Surgeon is not for writing. He is for cutting out the cancer (hallucinations).
// Must maintain consistency with the "Citizen Bias" imposed by the Generalizer.
// For this reason, we must switch to aggressive grounding.
You are a LEGISLATIVE SURGEON. Your task is to audit and repair the summary (JSON) regarding compliance with the source text (SOURCE).

// MEGA IMPORTANT. He cannot add information. If he did – we increase the risk of hallucination, and he no longer has a judge above him. Incomplete summaries are better than false ones.
FUNDAMENTAL RULE: "NO NEW INFORMATION". The summary can only transform information contained in the SOURCE (shorten, translate, summarize). It cannot generate new information that is not in the SOURCE.

// VERIFICATION PROCEDURE (WORKFLOW)
// We force a "Sentence-by-Sentence" thought process on the model.
VERIFICATION PROCEDURE (perform for every sentence in JSON):
Ask yourself: "Can I point to a specific fragment in the SOURCE that confirms this statement?"
IF THE ANSWER IS "YES": The information is confirmed by a quote, synonym, or mathematical result from data in the text. DECISION: Leave unchanged.
IF THE ANSWER IS "NO": The information is not in the text (it is a hallucination, the model's external knowledge, overinterpretation, or unnecessary extrapolation). DECISION: Remove this information or change it so that it has coverage in the text.
IF THE ANSWER IS "IT DEPENDS": The text is unclear, and the summary is "guessing" (e.g., giving a specific example for a general term). DECISION: Be safe. Remove the guessing. Use terminology from the text.

// RISK CATEGORIES
// Estimated based on the Generalizer's previous errors.
// We give him a "roadmap" of where the mines usually lie.
RISK CATEGORIES (special attention):
Dates (effective start vs funding start).
Numbers (specific amounts must result from the text).
Entities (who does what).
Scope (what the act covers and what it does not).

INPUT:
--- SOURCE_TEXT START ---
[SOURCE_TEXT]
--- SOURCE_TEXT END ---
--- FLAGGED_JSON START ---
[FLAGGED_JSON]
--- FLAGGED_JSON END ---

OUTPUT: Exclusively a repaired JSON object consistent with the structure:
(...)
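Wiring the Surgeon in is mostly template filling plus one more strict parse. The sketch below assumes the prompt above is stored as a template with [SOURCE_TEXT] and [FLAGGED_JSON] placeholders and that call_pro is a thin wrapper around the Pro model; the final key comparison is only an illustrative structural proxy for the "NO NEW INFORMATION" rule, which in practice concerns content, not keys.

import json
from typing import Callable

def run_surgeon(
    surgeon_template: str,              # the prompt above, with placeholders
    source_text: str,
    flagged_json: dict,
    call_pro: Callable[[str], str],     # hypothetical wrapper around the Pro model
) -> dict:
    prompt = (
        surgeon_template
        .replace("[SOURCE_TEXT]", source_text)
        .replace("[FLAGGED_JSON]", json.dumps(flagged_json, ensure_ascii=False))
    )
    repaired = json.loads(call_pro(prompt))   # the output must again be pure JSON

    # Cheap structural proxy for "NO NEW INFORMATION": the repaired object
    # must not introduce keys that the flagged summary did not already have.
    unexpected = set(repaired) - set(flagged_json)
    if unexpected:
        raise ValueError(f"Surgeon added unexpected keys: {unexpected}")
    return repaired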

4. EMPIRICAL CONCLUSIONS

As the system developed, decisions had to be made based on observation rather than on model parameters alone. This produced some interesting insights.

A) Why use a "dumber" model?

Initial tests on Polish law, audited by Claude and ChatGPT, showed that Gemini Flash generated the best summaries. Close behind was the Pro model, which sometimes received a slightly lower rating due to drawing far-reaching conclusions or skipping certain details. Paradoxically, the model's "thinking" contributed to slightly lower content quality. Conclusion? When Pro gets a short and simple text, it starts philosophizing. As a result, it can skip key facts it considers too obvious. It also loses JSON structure (forgets to close the brace) much more often. For simple summaries, the choice is obvious.

B) So what is Pro for?

The trouble starts with longer documents. Here, the Pro model takes a clear lead, while weaker models show a much higher tendency to hallucinate. These observations are also consistent with published research (Lost in the Middle). This is why the Pro model is used from the start for longer documents.

C) Effectiveness

Based on empirical experience (hundreds of trials, different parliaments), the anti-hallucination rate is >99%. Estimating this rate more precisely would require additional financial outlays (thousands or tens of thousands of tests with the strongest models from competing AI firms), so the figure should be taken with a grain of salt. Most hallucinations are minor errors, e.g., using the word "human" instead of "MP" in a summary. The system's main problem is therefore not hallucinations but excessive generalization: sometimes the Generalizer produces a correct summary yet misses a critical point that is highly relevant to public debate. This is a compromise the system currently accepts, being focused on the total elimination of hallucinations. Boredom is safer than a lie.