Skip to content

OSPOOL-158: Add NRP OSPool EP image builds#308

Open
brianhlin wants to merge 34 commits intoopensciencegrid:mainfrom
brianhlin:OSPOOL-158.nrp-ospool-ep
Open

OSPOOL-158: Add NRP OSPool EP image builds#308
brianhlin wants to merge 34 commits intoopensciencegrid:mainfrom
brianhlin:OSPOOL-158.nrp-ospool-ep

Conversation

@brianhlin
Copy link
Copy Markdown
Member

I bet this will fail, at the very least because the Harbor user for pushing images probably doesn't have permissions to push to the osg-htc project.

@brianhlin brianhlin requested a review from a team as a code owner April 17, 2026 22:58
Copilot AI review requested due to automatic review settings April 17, 2026 22:58
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new osg-htc/nrp-ospool-ep container image (and supporting init/entrypoint scripts) intended for NRP OSPool EP usage, and updates the GitHub Actions container build workflow to detect images under osg-htc/ in addition to opensciencegrid/.

Changes:

  • Introduces a new nrp-ospool-ep image (Dockerfile + build-config) with custom entrypoint behavior and HTCondor/pilot configuration scripts.
  • Adds wrapper scripts for singularity/apptainer to strip --pid/-p and adjusts runtime validation scripts.
  • Updates .github/workflows/build-containers.yml to include osg-htc/... image directories when building the image list.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
osg-htc/nrp-ospool-ep/Dockerfile Defines the new EP-derived image and wires in runtime/init scripts.
osg-htc/nrp-ospool-ep/build-config.json Declares build matrix inputs (OS/series/repos) for the new image.
osg-htc/nrp-ospool-ep/scripts/entrypoint.sh New entrypoint wrapper to run image-config hooks and supervise the pilot process.
osg-htc/nrp-ospool-ep/scripts/check_master.sh Adds condor_master monitoring/diagnostics for termination/restart handling.
osg-htc/nrp-ospool-ep/scripts/singularity_npid.sh Wrapper to drop --pid/-p and rewrite related flags.
osg-htc/nrp-ospool-ep/scripts/apptainer_npid.sh Wrapper to drop --pid/-p and rewrite related flags.
osg-htc/nrp-ospool-ep/scripts/01_token.sh Exposes the pilot token as an env var for osgvo-pilot.
osg-htc/nrp-ospool-ep/scripts/01_no_condor_host.sh Unsets CONDOR_HOST to satisfy osgvo-pilot expectations.
osg-htc/nrp-ospool-ep/scripts/02_validate_singularity.sh Adds a basic singularity execution validation at init time.
osg-htc/nrp-ospool-ep/scripts/02_validate_apptainer.sh Adds a basic apptainer execution validation at init time.
osg-htc/nrp-ospool-ep/scripts/11_set_OSGInstitutionID.sh Attempts to derive OSG_INSTITUTION_ID from k8s node labels.
osg-htc/nrp-ospool-ep/scripts/19_set_resources.sh Sets advertised CPU/memory/disk/GPU slot/resource config.
osg-htc/nrp-ospool-ep/scripts/20_advertise_glidein.sh Advertises glidein attribute(s) into the pilot config.
osg-htc/nrp-ospool-ep/scripts/20_advertise_k8s_domain.sh Advertises k8s pod/domain/namespace/physical host into the pilot config.
osg-htc/nrp-ospool-ep/scripts/21_advertise_k8s_provisioner.sh Advertises k8s provisioner name/type into the pilot config.
osg-htc/nrp-ospool-ep/scripts/22_set_requirements.sh Adds matching/provisioning requirements expressions into the pilot config.
.github/workflows/build-containers.yml Expands image discovery to include osg-htc/... directories.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +44 to +52
if [ $npids -gt 1 ]; then
echo "condor_master restrated" 1>&2
break
fi

if [ $nprocs -ne 1 ]; then
echo "condor_master not running" 1>&2
break
fi
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_master.sh computes nprocs once (line 25) but then uses it inside the monitoring loop (line 49) without recomputing it after each sleep. This means the loop may never detect a restarted/dead condor_master based on process state. Recompute the process check inside the loop (and consider using ps -p "$orgpid" or similar to avoid grep-based matching).

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +20
echo "condor_master restrated at first step" 1>&2
exit 1
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in error message: "restrated" should be "restarted" (appears in multiple messages). Fixing this will make logs easier to search and understand.

Copilot uses AI. Check for mistakes.
trap 'echo signal received!; kill $(jobs -p); wait' SIGINT SIGTERM

export HOME=/pilot
su osg -p -c "/usr/local/sbin/entrypoint.osg.sh $@" &
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

su ... -c "/usr/local/sbin/entrypoint.osg.sh $@" builds a shell command by interpolating the container's arguments into a string. This is fragile (breaks on spaces/quoting) and can allow shell metacharacters in args to be interpreted by the shell invoked by su -c. Prefer passing arguments without re-parsing (e.g., use su's ability to pass additional args to the shell, or switch to runuser/gosu/setpriv so you can exec the target with a proper "$@" argument vector).

Suggested change
su osg -p -c "/usr/local/sbin/entrypoint.osg.sh $@" &
su osg -p -s /bin/bash -c 'exec /usr/local/sbin/entrypoint.osg.sh "$@"' -- "$@" &

Copilot uses AI. Check for mistakes.
Comment thread .github/workflows/build-containers.yml
# else do nothing, let Condor figure it out

if [ -f "/usr/bin/apptainer" ]; then
# only test for apptainer functionality if singularity is present
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 02_validate_apptainer.sh, the comment says "only test for apptainer functionality if singularity is present" but the script checks for /usr/bin/apptainer. Update the comment to avoid confusion during debugging.

Suggested change
# only test for apptainer functionality if singularity is present
# only test for apptainer functionality if apptainer is present

Copilot uses AI. Check for mistakes.
Comment on lines +62 to +66
env ACCEPT_JOBS_FOR_HOURS=24

# keep default idle time low, as we may over-provision certain kinds of resources
# but others may be waiting
env ACCEPT_IDLE_MINUTES=20
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dockerfile uses the instruction env (lowercase) to set environment variables. Dockerfile instructions are case-insensitive in some tooling, but env is not a valid Dockerfile directive in the standard parser; this will fail builds on common builders. Use ENV for these variables so the image builds reliably.

Suggested change
env ACCEPT_JOBS_FOR_HOURS=24
# keep default idle time low, as we may over-provision certain kinds of resources
# but others may be waiting
env ACCEPT_IDLE_MINUTES=20
ENV ACCEPT_JOBS_FOR_HOURS=24
# keep default idle time low, as we may over-provision certain kinds of resources
# but others may be waiting
ENV ACCEPT_IDLE_MINUTES=20

Copilot uses AI. Check for mistakes.
tail -100 /pilot/log/MasterLog

echo "=== tail /pilot/log/StartLog"
tail -100 /pilotl/log/StartLog
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diagnostics section tails tail -100 /pilotl/log/StartLog but the rest of the script uses /pilot/log/.... The /pilotl path looks like a typo and will cause the StartLog tail to fail, reducing the usefulness of the debug output.

Suggested change
tail -100 /pilotl/log/StartLog
tail -100 /pilot/log/StartLog

Copilot uses AI. Check for mistakes.
su osg -p -c "/usr/local/sbin/entrypoint.osg.sh $@" &
myproc=$!

# protection in case it does not terminate oby itself when condor dies or restarts
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment typo: "does not terminate oby itself" should be "does not terminate by itself".

Suggested change
# protection in case it does not terminate oby itself when condor dies or restarts
# protection in case it does not terminate by itself when condor dies or restarts

Copilot uses AI. Check for mistakes.
# unless it is already set
#
if [ "x${OSG_INSTITUTION_ID}" == "x" ]; then
OSG_INSTITUTION_ID=`/usr/sbin/kubectl get node ${PHYSICAL_HOSTNAME} -L nautilus.io/OSGInstitutionID | tail -1 | awk '{print $6}'`
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubectl get node ... | awk '{print $6}' relies on a fixed column number in human-readable output, which is brittle (kubectl output columns can change with versions/flags). Prefer a structured output format (e.g., -o jsonpath=... for the label value) so OSGInstitutionID extraction is stable.

Suggested change
OSG_INSTITUTION_ID=`/usr/sbin/kubectl get node ${PHYSICAL_HOSTNAME} -L nautilus.io/OSGInstitutionID | tail -1 | awk '{print $6}'`
OSG_INSTITUTION_ID=`/usr/sbin/kubectl get node ${PHYSICAL_HOSTNAME} -o jsonpath="{.metadata.labels['nautilus\.io/OSGInstitutionID']}"`

Copilot uses AI. Check for mistakes.
@brianhlin brianhlin force-pushed the OSPOOL-158.nrp-ospool-ep branch from a29deba to 34374da Compare April 17, 2026 23:11
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

#!/bin/bash

#
# osgvo pilot does not like is CONDOR_HOST is set
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment typo: "does not like is CONDOR_HOST" should be "does not like if CONDOR_HOST".

Suggested change
# osgvo pilot does not like is CONDOR_HOST is set
# osgvo pilot does not like if CONDOR_HOST is set

Copilot uses AI. Check for mistakes.
su osg -p -c "/usr/local/sbin/entrypoint.osg.sh $@" &
myproc=$!

# protection in case it does not terminate oby itself when condor dies or restarts
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "oby" → "on its own" / "by itself".

Suggested change
# protection in case it does not terminate oby itself when condor dies or restarts
# protection in case it does not terminate by itself when condor dies or restarts

Copilot uses AI. Check for mistakes.
Comment on lines +7 to +8
export TOKEN=`cat /etc/condor/tokens.d/prp-wn.token`

Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If /etc/condor/tokens.d/prp-wn.token is missing/unreadable, this will silently export an empty TOKEN and continue, which can lead to confusing pilot failures later. Add an explicit existence/readability check and fail fast with a clear error (and prefer $(...) over backticks).

Suggested change
export TOKEN=`cat /etc/condor/tokens.d/prp-wn.token`
token_file=/etc/condor/tokens.d/prp-wn.token
if [ ! -r "$token_file" ]; then
echo "ERROR: required token file '$token_file' is missing or unreadable" >&2
exit 1
fi
export TOKEN=$(cat "$token_file")

Copilot uses AI. Check for mistakes.
Comment on lines 33 to 58
@@ -54,7 +54,7 @@ jobs:
images=$(git diff --name-only \
"$BASE" \
"$GITHUB_SHA" |
egrep "^$ORG_DIR/" |
grep -E "$ORG_DIR_REGEX" |
cut -d/ -f -2 |
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow now includes osg-htc/... images in the build list, but the push job still uses the same Harbor registry (hub.opensciencegrid.org) and robot credentials for all contexts. If context is used as the registry repo name, this will attempt to push into an osg-htc namespace/project that the current robot account may not be authorized for; consider adding per-org registry/credentials routing or excluding osg-htc/* from the push steps until permissions are in place.

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +46
echo "condor_master restrated" 1>&2
break
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in error message: "restrated" → "restarted".

Copilot uses AI. Check for mistakes.
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 10

# Add kubectl, to be able to interact with the k8s cluster
RUN curl -L "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" -o /usr/sbin/kubectl && \
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This downloads kubectl from the internet at build time using the moving stable.txt pointer and without checksum/signature verification. Pin the kubectl version (and verify the SHA256, or install via the distro package manager where possible) to reduce supply-chain risk and make builds reproducible.

Suggested change
RUN curl -L "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" -o /usr/sbin/kubectl && \
ARG KUBECTL_VERSION=v1.30.2
ARG KUBECTL_SHA256=4310270c87a4f80637ef6fa9af9d1f703d8d43b7efd2d7661f7902dc0e05251d
RUN curl -fL "https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl" -o /usr/sbin/kubectl && \
echo "${KUBECTL_SHA256} /usr/sbin/kubectl" | sha256sum -c - && \

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +20
echo "condor_master restrated at first step" 1>&2
exit 1
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in error message: "restrated" → "restarted".

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants