Thanks for the pull request, @jesperhodge! Once you've gone through the following steps, feel free to tag the maintainers in a comment and let them know that your changes are ready for engineering review.

🔘 **Get product approval.** If you haven't already, check this list to see if your contribution needs to go through the product review process.

🔘 **Provide context.** To help your reviewers and other members of the community understand the purpose and larger context of your changes, add as much of the following information to the PR description as you can.

🔘 **Get a green build.** If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

**Where can I find more information?** If you'd like more details on all aspects of the review process for open source pull requests (OSPRs), check out the linked resources.

**When can I expect my changes to be merged?** Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on several factors.

💡 As a result, it may take up to several weeks or months to complete a review and merge your PR.
Pull request overview
This PR fixes tag rename behavior so that updating a tag’s value with only capitalization changes (e.g., Science → science) is treated as a valid update rather than a duplicate-value error. This aligns the tagging API behavior with the Course Authoring taxonomy UI rename flow.
Changes:
- Add a regression test covering capitalization-only tag rename via the taxonomy tags update endpoint.
- Update tag value uniqueness validation to exclude the tag being updated from the “already exists” check.
Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| `tests/openedx_tagging/test_views.py` | Adds a test ensuring capitalization-only renames succeed (HTTP 200) and return the updated value. |
| `src/openedx_tagging/rest_api/v1/serializers.py` | Adjusts uniqueness validation to avoid flagging the tag itself as a duplicate during updates. |
```python
taxonomy_id = context.get("taxonomy_id")
original_tag = Tag.objects.filter(taxonomy_id=taxonomy_id, value=original_value).first() if original_value else None
tag_id = original_tag.pk if original_tag else None
if taxonomy_id is not None:
    # Check if tag value already exists within this taxonomy. If so, raise a validation error.
    queryset = Tag.objects.filter(taxonomy_id=taxonomy_id, value=value)

    # Don't compare tag against itself when validating its updated value.
    if tag_id:
        queryset = queryset.exclude(pk=tag_id)
```
`validate_tag_value` now does an extra DB query (`.first()`) to resolve the original tag before running the uniqueness check. If you want to keep renames as a single-query validation, you can either fetch only the pk via `values_list('pk', flat=True)` (and only when `taxonomy_id` is present) or fold the self-exclusion into the existing uniqueness query so it doesn't need the extra lookup.
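The folded, single-query variant can be illustrated standalone. This is only a sketch: it uses the stdlib `sqlite3` module in place of the Django ORM, and `value_taken` is a hypothetical helper (not from the openedx_tagging codebase) that does both the case-insensitive duplicate check and the self-exclusion in one query.

```python
import sqlite3

# Stand-in for the Tag table of one taxonomy; the real code uses the
# Django ORM, so this only illustrates the shape of the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (pk INTEGER PRIMARY KEY, taxonomy_id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO tag VALUES (?, ?, ?)",
    [(1, 10, "Science"), (2, 10, "History")],
)

def value_taken(conn, taxonomy_id, new_value, original_value=None):
    """Return True if another tag in the taxonomy already uses new_value.

    One query does both the case-insensitive duplicate check and the
    self-exclusion: the row currently holding original_value (the tag
    being renamed) is skipped, so a capitalization-only rename passes.
    """
    row = conn.execute(
        "SELECT 1 FROM tag"
        " WHERE taxonomy_id = ? AND value = ? COLLATE NOCASE"
        " AND (? IS NULL OR value <> ?)",
        (taxonomy_id, new_value, original_value, original_value),
    ).fetchone()
    return row is not None

# Renaming "Science" -> "science" is not a collision:
print(value_taken(conn, 10, "science", original_value="Science"))  # False
# Creating a brand-new "science" alongside "Science" is:
print(value_taken(conn, 10, "science"))  # True
```

The trade-off is the same one the comment describes: one round trip instead of a lookup plus a filtered check.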
On a related note, do we even need this parallel validation in the serializer? The API code and models are already doing the same validation elsewhere in the code, so I'm wondering if we can just catch those errors and report them correctly (as 400 errors), or if we really need this extra validation logic here to do that.
Yes, user input validation logic belongs in the view, not in the model. This validator was added in another issue because of missing validation on the model: `ValidationError`s coming from the model did not catch duplicate tag updates in the first place, which caused an `IntegrityError` from the database, something that should never happen if you have proper validations. So it was necessary to add this additional validation.

As to where to put the validation: adding more validations to the model, while still letting invalid requests reach the model and run whatever other code is part of instantiating it, invites security risks. Having user input validation in the model is in general already a red flag to me.

And there's no such thing as too much validation. If you want additional validations in the model, that is fine, but it does not replace the need for (possibly the same) validations in the view/serializer.
> The API code and models are already doing the same validation elsewhere in the code
No they are not: #536
> `validate_tag_value` now does an extra DB query (`.first()`) to resolve the original tag before running the uniqueness check. If you want to keep renames as a single-query validation, you can either fetch only the pk via `values_list('pk', flat=True)` (and only when `taxonomy_id` is present) or fold the self-exclusion into the existing uniqueness query so it doesn't need the extra lookup.
@bradenmacdonald @mgwozdz-unicon Regarding copilot's review:
An extra DB query is intended and fine, and makes no significant difference performance-wise.
> allowing requests to pass through to the model even though they are not considered valid, and then running any other code that might be part of instantiating the model, invites security risks.
Sure, in a theoretical sense, but this is not sensitive code. And generally this is the way Django is designed to work (user input validation explicitly involves model validation):
- https://docs.djangoproject.com/en/6.0/ref/models/instances/#validating-objects
- https://docs.djangoproject.com/en/6.0/topics/forms/modelforms/#validation-on-a-modelform
> Having user input validations in the model is in general already a red flag to me.
I understand that, but I tend to approach this from the opposite perspective. Data flows via REST API -> Python API -> ORM model -> database. But some operations work directly on the database, others use the ORM models directly, others use the Python API, and finally, many requests go through the REST API. I tend to keep all validation as "low level" as possible (i.e. in the database itself) because then it applies to 100% of changes. The higher level your validation is (e.g. in the serializer), the higher the chance of someone bypassing it and getting invalid data into your system.

That said, you're absolutely right that there's nothing wrong with too much validation; but in the spirit of keeping things consistent, any high-level validation should try to share code definitions like `RESERVED_TAG_CHARS` etc. with the low-level validations, so we're sure the same rules are being applied consistently.

Django and DRF both try to support this approach: `ModelForm` generates validators automatically from the model validators and also calls the model validation explicitly, and DRF's `ModelSerializer` generates fields (and validators) from your model fields/validators automatically. (Although they do recommend replacing them with explicit serializers that you write manually.)
Feel free to leave your validation code in, especially the part that uses `RESERVED_TAG_CHARS`, but I personally don't think we need the check for duplicate values, as it's easier to just call `full_clean()` and use Django's check. I guess I'm OK with either approach though.
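The "just call `full_clean()` and report its errors as a 400" idea can be sketched without a full Django setup. Everything here is a hypothetical stand-in: `ModelValidationError` plays the role of `django.core.exceptions.ValidationError` raised by `full_clean()`, and `update_tag_view` plays the role of the DRF view; the duplicate rule itself is assumed to live at the model level.

```python
class ModelValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError."""
    def __init__(self, messages):
        super().__init__(messages)
        self.messages = messages

def full_clean(new_value, other_values):
    """Stand-in for the model's full_clean(): reject duplicate values
    (case-insensitively, per the rule discussed in this thread)."""
    if new_value.lower() in {v.lower() for v in other_values}:
        raise ModelValidationError([f"Tag value '{new_value}' already exists."])

def update_tag_view(new_value, other_values):
    """Run model validation and translate failures into a 400 response,
    instead of letting them surface later as a 500 (e.g. an IntegrityError)."""
    try:
        full_clean(new_value, other_values)
    except ModelValidationError as exc:
        return 400, {"detail": exc.messages}
    return 200, {"value": new_value}

status, body = update_tag_view("science", other_values=["History"])
print(status)  # 200
status, body = update_tag_view("science", other_values=["Science"])
print(status)  # 400
```

The point of the pattern is the error translation, not the rule itself: the single source of truth for the rule stays at the model layer, and the view only decides the HTTP status.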
Okay. Since this ticket is just a fix for the validation code and not an addition of validation code, removing it here would be out of scope anyway.
Some comments for posterity:
Any API endpoint is sensitive code and needs to be guarded against security breaches.
In terms of relying on the perspective to keep validation as "low level" as possible, I don't see that as acceptable. This is against security best practices. As stated in the OWASP documentation, "Input validation should happen as early as possible in the data flow." (https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html)
As for Django model serialization and validation, they are an additional layer of possible validation and security, but not a replacement. The standard in the DRF world is also to use the validation methods provided by the View Serializers.
Haha, well, they are supposed to be. Let's try to keep the code DRY and not have two totally separate validation code paths. Please see efa9128 for how to fix the existing validation so it runs everywhere, and results in 400 errors as appropriate in the API without the need to add any new redundant validation at the serializer level.
As I have outlined above, this is against security best practices. If you want to consolidate validation, you can think about extending DRF's View Serializer validations, but I would prefer not to remove any validations from the View level. That is setting a bad pattern inviting security flaws.
If you like to keep things DRY:
What do you think about extracting the core logic (like `RESERVED_TAG_CHARS` or the regex patterns) into a shared utility? That way, both the serializer and the model use the exact same rule, but execute it at their respective, appropriate layers.
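A minimal sketch of that extraction might look like the following. Note the assumptions: the contents of `RESERVED_TAG_CHARS` shown here are made up for illustration (the real set lives in openedx_tagging and may differ), and `check_tag_value` is a hypothetical helper, not an existing function.

```python
# One shared rule definition, callable from both the model's clean()
# and the serializer's validate_value(), so the layers cannot drift.
RESERVED_TAG_CHARS = frozenset({";", ":", "|"})  # assumed contents, for illustration

def check_tag_value(value):
    """Return a list of rule violations; an empty list means the value is valid."""
    errors = []
    if not value.strip():
        errors.append("Tag value cannot be blank.")
    bad = sorted(c for c in RESERVED_TAG_CHARS if c in value)
    if bad:
        errors.append(f"Tag value contains reserved characters: {''.join(bad)}")
    return errors

print(check_tag_value("Science"))       # []
print(len(check_tag_value("A|B")))      # 1
```

The model's `clean()` and the serializer's `validate_value()` would each call `check_tag_value` and raise their own layer's exception type from the returned messages.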
`RESERVED_TAG_CHARS` is already being shared among the layers, and yeah, there is lots of other cleanup and validation (like trimming whitespace) that could be moved into a shared utility, but it's way out of scope for this PR. Let's just focus on solving the bug.
> I would prefer not to remove any validations from the View level
It was only added last week, and it introduced a bug because it does the validation wrong, whereas the existing validation code was already correct but was simply not being reported correctly as 400 error rather than a 500 error.
This is really not a security-sensitive context*, so my preference is to focus on implementing it correctly, and avoiding duplicate logic is one way to do that. However, your perspective is completely valid. I'm fine with moving forward with your approach as long as you update the documentation of your validator to clarify that we're trying to validate early and replicate the model validation which will still run later. I don't want developers being confused about what validation happens where nor which is the "true" validation and which is extra input validation.
* Tags are not used for access control, taxonomies are typically scoped to a single organization, these are simple string values, and there's no XSS/code/SQL/HTML injection risk.
Just so I'm clear: is the idea that we don't allow tags that differ just by case, i.e. I couldn't have both "Science" and "science" at the same time, but we do want to allow changing case, so "science" can be renamed to "Science"?
Hi @ormsbee,

That is correct, at least according to Jenna's remarks in UAT. But we should not check that the new value of tag A is different from the old value of A: that leads to unexpected validation errors, such as the case here where someone renames a tag and gets an error. Regardless of whether they should get an error at all (according to UAT and common sense, no), throwing an error that says the new tag value is not unique, when in fact it is unique and merely similar to the tag's old value, would be incorrect.
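The agreed-on semantics can be pinned down in a small ORM-free sketch. `is_duplicate` and the in-memory `tags` dict are illustrative stand-ins, with the case-insensitive duplicate rule taken from the discussion above.

```python
def is_duplicate(tags, new_value, updating_pk=None):
    """True if new_value collides, case-insensitively, with ANOTHER tag.

    tags maps pk -> value for one taxonomy. The tag being renamed
    (updating_pk) is skipped, so "Science" -> "science" is a valid
    rename even though the two values compare equal ignoring case.
    """
    for pk, value in tags.items():
        if pk == updating_pk:
            continue  # never compare a tag against itself
        if value.lower() == new_value.lower():
            return True
    return False

tags = {1: "Science", 2: "History"}
print(is_duplicate(tags, "science", updating_pk=1))  # False: rename allowed
print(is_duplicate(tags, "science"))                 # True: would clash with "Science"
```

This captures both halves of the question: two tags differing only by case cannot coexist, but a single tag may change its own capitalization.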
Fixes openedx/modular-learning#274
Testing Instructions