autotailor: Add validation for variable names, rule IDs, and profile IDs #2342
autotailor: Add validation for variable names, rule IDs, and profile IDs #2342ggbecker wants to merge 2 commits intoOpenSCAP:mainfrom
Conversation
Add DataStreamValidator class that validates all IDs against the SCAP datastream before generating tailoring XML. This prevents silent failures from invalid variable names, rule IDs, group IDs, or profile IDs. Key features: - Parses datastream to extract valid profiles, values, rules, and groups - Validates IDs before use in Profile and Tailoring classes - Provides fuzzy matching suggestions for typos using difflib - Generates clear error messages with suggestions - Add --no-validate flag for performance-critical use cases Performance: - ~227ms overhead on 20MB datastream (validation enabled by default) - ~33ms with --no-validate flag (7x faster) - Validation prevents compliance drift and silent failures Fixes issue where autotailor accepted arbitrary variable names without validation, creating invalid XML that fails at evaluation time.
Add comprehensive unit tests for the new validation feature: - test_datastream_validator: Tests validator with valid and invalid IDs for profiles, values, rules, and groups - test_profile_with_validator: Tests Profile class integration with validator, ensuring invalid IDs are rejected - test_validator_suggestions: Tests fuzzy matching suggestions for typos in ID names All tests pass and verify that: - Valid IDs are accepted - Invalid IDs are rejected with clear error messages - Similar valid IDs are suggested for typos - Validation integrates properly with Profile class
|
| root = tree.getroot() | ||
|
|
||
| # Register namespaces | ||
| namespaces = { |
There was a problem hiding this comment.
This can be a global variable. It could be used all over the script.
| self.group_ids = set() | ||
| self._parse_datastream() | ||
|
|
||
| def _parse_datastream(self): |
There was a problem hiding this comment.
Split this method to more smaller methods.
| help="Skip validation of IDs against the datastream. This significantly speeds up " | ||
| "execution on large datastreams but may produce invalid tailoring files if incorrect " |
There was a problem hiding this comment.
| help="Skip validation of IDs against the datastream. This significantly speeds up " | |
| "execution on large datastreams but may produce invalid tailoring files if incorrect " | |
| help="Skip validation of IDs against the data stream. This significantly speeds up " | |
| "execution on large data streams but may produce invalid tailoring files if incorrect " |
| return msg | ||
|
|
||
| def validate_profile(self, profile_id): | ||
| """Validate a profile ID exists in the datastream.""" |
There was a problem hiding this comment.
It's always a "data stream", never a "datastream" or even something worse like "DataStream".
| if args.json_tailoring: | ||
| t.import_json_tailoring(args.json_tailoring) | ||
| try: | ||
| t.import_json_tailoring(args.json_tailoring) |
There was a problem hiding this comment.
Is the JSON tailoring also validated?
| help="Use local path for the benchmark href instead of absolute file:// URI. " | ||
| "Absolute paths are converted to basename, relative paths are preserved.") | ||
| parser.add_argument( | ||
| "--no-validate", action="store_true", |
There was a problem hiding this comment.
Add this option to the man page.
| if profile_id not in self.profile_ids: | ||
| raise ValueError(self._create_validation_error("Profile", profile_id, self.profile_ids)) | ||
|
|
||
| def validate_value(self, value_id): |
There was a problem hiding this comment.
It would be amazing if the tool would also verify validity of selectors used in the -V CLI option.
For example:
utils/autotailor -V xccdf_org.ssgproject.content_value_var_selinux_state=OOO /usr/share/xml/scap/ssg/content/ssg-fedora-ds.xml cis
This command should error because the XCCDF Value doesn't contain selector OOO. But currently it isn't checked and the command doesn't error and creates invalid tailoring.



Fixes: https://redhat.atlassian.net/browse/RHEL-143568