During a Single Node OpenShift (SNO) installation, the PXE boot process failed with the following error:

```text
error: ../../grub-core/net/tftp.c:254:file /var/lib/tftpboot/rhcos/kernel not found.
```
This occurred because the RHCOS kernel and initramfs files had either not been downloaded correctly or were missing entirely from the TFTP directory.
Changes made (`download_files.yaml`):
Key improvements:
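```yaml
- name: Downloading CoreOS kernel
  get_url:
    url: ""            # RHCOS kernel URL for the target release
    dest: /var/lib/tftpboot/rhcos/kernel
    mode: "0644"       # quoted so YAML keeps it as an octal string
    owner: dnsmasq
    group: dnsmasq
    force: true        # re-download even if a (possibly truncated) file exists
    timeout: 300       # seconds per attempt
  register: kernel_download
  retries: 3           # retry transient network failures
  delay: 10            # seconds between retries
  until: kernel_download is succeeded

- name: Verify kernel was downloaded
  stat:
    path: /var/lib/tftpboot/rhcos/kernel
  register: kernel_stat
  # Fail if the file is missing or smaller than ~5 MB (likely a truncated download)
  failed_when: not kernel_stat.stat.exists or kernel_stat.stat.size < 5000000
```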
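When debugging outside Ansible, you can reproduce the same download-with-retry and size check by hand. The shell sketch below assumes `curl` is available on the bastion. `download_with_retry` and `verify_min_size` are hypothetical helpers, not part of the playbooks. The defaults (3 attempts, 10-second delay, 300-second timeout, 5,000,000-byte minimum) simply mirror the task parameters above.

```bash
# Hypothetical helpers mirroring the get_url retries/until loop and the
# failed_when size check in download_files.yaml (not part of the playbooks).

# download_with_retry URL DEST [ATTEMPTS] [DELAY_SECONDS]
download_with_retry() {
  local url=$1 dest=$2 attempts=${3:-3} delay=${4:-10}
  local i
  for ((i = 1; i <= attempts; i++)); do
    # --fail: treat HTTP errors (e.g. 404) as failures instead of saving the error page
    # --max-time 300: per-attempt limit, like timeout: 300 in the task
    if curl --fail --location --silent --show-error --max-time 300 \
        --output "$dest" "$url"; then
      return 0
    fi
    echo "attempt $i/$attempts failed for $url" >&2
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# verify_min_size PATH MIN_BYTES
# Fails if PATH is missing or smaller than MIN_BYTES.
verify_min_size() {
  local path=$1 min=$2 size
  if [[ ! -f $path ]]; then
    echo "missing: $path" >&2
    return 1
  fi
  size=$(wc -c < "$path")
  size=${size//[[:space:]]/}   # strip padding some wc implementations add
  if (( size < min )); then
    echo "too small: $path ($size bytes < $min)" >&2
    return 1
  fi
  return 0
}
```

For example, you might run `download_with_retry "$KERNEL_URL" /var/lib/tftpboot/rhcos/kernel && verify_min_size /var/lib/tftpboot/rhcos/kernel 5000000`. Here `KERNEL_URL` is a placeholder for the kernel URL of your RHCOS release.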
Verification tasks (`verify_rhcos_files.yaml`)

Purpose: Verify that all RHCOS files are present and accessible before the installation proceeds.
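For a quick check by hand on the bastion, the sketch below checks file-level presence only. `check_files_present` is a hypothetical helper. It tests that each file exists, is readable by the current user (always true for root), and is non-empty. It does not test access through the TFTP server itself.

```bash
# check_files_present PATH...
# Prints one status line per path and succeeds only if every path is
# an existing, readable, non-empty regular file.
check_files_present() {
  local path problems=0
  for path in "$@"; do
    if [[ ! -f $path ]]; then
      echo "MISSING:    $path"
      problems=$((problems + 1))
    elif [[ ! -r $path ]]; then
      echo "UNREADABLE: $path"
      problems=$((problems + 1))
    elif [[ ! -s $path ]]; then
      echo "EMPTY:      $path"
      problems=$((problems + 1))
    else
      echo "OK:         $path"
    fi
  done
  [[ $problems -eq 0 ]]
}
```

A typical call is `check_files_present /var/lib/tftpboot/rhcos/kernel "$INITRAMFS_PATH" "$ROOTFS_PATH"`. `INITRAMFS_PATH` and `ROOTFS_PATH` are placeholders for wherever your setup stores those files.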
Checks performed:
Benefits:
A verification step now runs after the file downloads in `main.yaml`:
```yaml
- name: Download OCP files
  include_tasks: download_files.yaml
  when: day2_workers is not defined

- name: Verify RHCOS files are present and accessible
  include_tasks: verify_rhcos_files.yaml
  when: day2_workers is not defined
```
For reference, typical RHCOS file sizes for ppc64le:
Files significantly smaller than these values indicate incomplete downloads.
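To compare the files on disk against reference sizes, you can print a small report in the same `Label: True/False (N.N MB)` style as the verification failure message. This is a sketch with assumptions:

- `report_size` is a hypothetical helper.
- The minimum byte counts are yours to choose.
- "MB" is computed here as 1024 × 1024 bytes, which may differ from how the playbook rounds.

```bash
# report_size LABEL PATH MIN_BYTES
# Prints "LABEL: True (N.N MB)" if PATH exists and is at least MIN_BYTES,
# otherwise "LABEL: False (N.N MB)" (0.0 MB for a missing file).
# Returns non-zero when the check fails.
report_size() {
  local label=$1 path=$2 min=$3 bytes=0 ok=True
  if [[ -f $path ]]; then
    bytes=$(wc -c < "$path")
    bytes=${bytes//[[:space:]]/}
  fi
  if [[ ! -f $path ]] || (( bytes < min )); then
    ok=False
  fi
  # Assumption: MB = 1024 * 1024 bytes, one decimal place
  printf '%s: %s (%s MB)\n' "$label" "$ok" \
    "$(awk -v b="$bytes" 'BEGIN { printf "%.1f", b / 1048576 }')"
  [[ $ok == True ]]
}
```

For the kernel, `report_size Kernel /var/lib/tftpboot/rhcos/kernel 5000000` reuses the same 5,000,000-byte threshold as `download_files.yaml`.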
Run the playbook as usual:

```bash
cd ansible-bastion
ansible-playbook -i inventory playbooks/main.yaml
```
The enhanced automation will:
If you have already encountered the missing kernel error, use one of the following options:
Option 1: Re-run the services setup
```bash
cd ansible-bastion
ansible-playbook -i inventory playbooks/step-1-setup-services.yaml
```
This will re-download and verify all files.
Option 2: Fix the files manually, then continue

```bash
# Fix the files manually (see sno-quick-fix.md)
# Then continue from where you left off
cd ansible-bastion
ansible-playbook -i inventory playbooks/step-3-netboot-nodes.yaml
```
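If the download or verification keeps failing, check network access to the mirror, free disk space, and SELinux denials:

```bash
# Confirm the RHCOS mirror is reachable
curl -I https://mirror.openshift.com/pub/openshift-v4/ppc64le/dependencies/rhcos/

# If behind a proxy, set environment variables
export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080

# Check free space where the RHCOS files are stored
df -h /var/lib/tftpboot
df -h /var/www/html

# Look for recent SELinux denials (run as root)
ausearch -m avc -ts recent
```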
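You can wrap these checks in small functions so they run the same way every time. This sketch rests on a few assumptions:

- The function names are hypothetical.
- The free-space threshold is whatever you consider safe for the RHCOS images.
- The proxy is taken from the `http_proxy`/`https_proxy` variables exported above, which `curl` honors.

```bash
# check_mirror URL: succeeds if a HEAD request to URL does not return an
# HTTP error (>= 400). curl picks up http_proxy/https_proxy from the environment.
check_mirror() {
  curl --head --fail --silent --show-error --max-time 30 "$1" > /dev/null
}

# check_free_space DIR MIN_MB: succeeds if DIR's filesystem has at least
# MIN_MB mebibytes available.
check_free_space() {
  local dir=$1 min_mb=$2 avail_kb
  # -P: POSIX one-line-per-filesystem output; -k: 1024-byte blocks;
  # column 4 of the data line is "Available"
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [[ -n $avail_kb ]] || return 1
  echo "$dir: $(( avail_kb / 1024 )) MB available"
  (( avail_kb / 1024 >= min_mb ))
}

# show_selinux_denials: lists recent AVC denials. ausearch exits non-zero
# when nothing matches (or when it cannot read the audit log).
show_selinux_denials() {
  if ! command -v ausearch > /dev/null; then
    echo "ausearch not installed; skipping SELinux check"
    return 0
  fi
  ausearch -m avc -ts recent || echo "no recent AVC denials found"
}
```

For example: `check_mirror https://mirror.openshift.com/pub/openshift-v4/ppc64le/dependencies/rhcos/ && check_free_space /var/lib/tftpboot 2048 && show_selinux_denials`. Here 2048 MB is an arbitrary example threshold.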
The verification task will show exactly which file is missing or too small:
```text
FAILED - RETRYING: Verify all RHCOS files are present
fatal: [localhost]: FAILED! => {
"assertion": "tftp_kernel.stat.exists",
"msg": "RHCOS files are missing or too small. Please check:
- Kernel: False (0.0 MB)
- Initramfs: True (125.5 MB)
- Rootfs: True (450.2 MB)"
}
```
In this example, the kernel is missing from the TFTP directory, while the initramfs and rootfs are present.
These fixes prevent the issue by:
To test the fixes without running a full installation:
```bash
# Test just the services setup
cd ansible-bastion
ansible-playbook -i inventory playbooks/step-1-setup-services.yaml --tags download

# Check the verification output
# You should see:
# - "All RHCOS files verified successfully"
# - File sizes for each component
# - "TFTP access verified"
```
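Instead of reading the output by eye, you can capture the run and scan it for the expected success messages. This is a minimal sketch, assuming the messages appear verbatim in the `ansible-playbook` output. `check_verification_log` is a hypothetical helper.

```bash
# check_verification_log LOGFILE
# Succeeds only if every expected success message appears in LOGFILE.
check_verification_log() {
  local log=$1 msg missing=0
  local expected=(
    "All RHCOS files verified successfully"
    "TFTP access verified"
  )
  for msg in "${expected[@]}"; do
    # -F: match the message as a fixed string, not a regex
    if grep -qF -- "$msg" "$log"; then
      echo "found:   $msg"
    else
      echo "missing: $msg"
      missing=1
    fi
  done
  return "$missing"
}
```

One way to use it: `set -o pipefail; ansible-playbook -i inventory playbooks/step-1-setup-services.yaml --tags download | tee verify.log && check_verification_log verify.log`. `pipefail` keeps a failed playbook run from being masked by `tee`.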
Potential enhancements for consideration:
The automation now includes:
These changes ensure that RHCOS files are properly downloaded and accessible before attempting PXE boot, preventing the “kernel not found” error.