How to check Ext4 errors in Nutanix Cluster CVM
This article explains how to check for ext4 file system issues on a Controller VM (CVM) in a Nutanix cluster.
Follow the steps below to check for ext4 file system errors.
1. The NCC health check fs_inconsistency_check verifies whether any CVM (Controller VM) in the cluster is experiencing filesystem inconsistencies by checking for EXT4-fs error/warning messages in dmesg and scanning tune2fs output for all mounted disks.
Note: Starting with NCC 4.x, this check generates alert A3038 after 1 concurrent failure across scheduled intervals.
Or
2. Run the NCC health check by clicking "Run NCC Check" in the Health section.
Check whether the outcome is Pass, Warning, or Critical.
If the check results in a PASS, there are no filesystem inconsistencies detected. No action needs to be taken.
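Option 1 above can also be run from the command line of a CVM. The following is a minimal sketch, assuming the check is exposed under the system_checks module of NCC (adjust the path if your NCC version differs). The wrapper name run_fs_check is illustrative only, and the guard simply makes the snippet fail cleanly when it is run somewhere other than a CVM.

```shell
#!/usr/bin/env bash
# Sketch: run the filesystem inconsistency check from a CVM shell.
# Assumption: the check is located under "health_checks system_checks".
run_fs_check() {
    if ! command -v ncc >/dev/null 2>&1; then
        echo "ncc not found: run this from a Nutanix CVM as the nutanix user" >&2
        return 127
    fi
    ncc health_checks system_checks fs_inconsistency_check
}

# Runs the check on a CVM; elsewhere it only prints the hint above.
run_fs_check || true
```

The status printed at the end of the NCC run is what the PASS, WARN and FAIL descriptions in this article refer to.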
For Status: WARN
Note: From NCC 4.5.0, the severity is changed to FAIL. The end user sees a "Critical" alert in the UI and a "FAIL" status on the CLI when the check fails.
1. Open a PuTTY (SSH) session to the IP address of the impacted CVM and log in.
2. Run the command below to find the mount point that has an error. The impacted disk reports an error count greater than zero.
3. The following command collects the needed information from all of the CVMs and should show the disk SN/mount point for each device. It connects to each CVM over SSH so that the checks run on that CVM and not only on the local one:
nutanix@CVM:~$ for j in `svmips`; do echo "=== CVM $j ==="; ssh -q $j 'for i in $(sudo blkid | awk -F : "!/xfs|iso9660/{print \$1}"); do echo $i; sudo tune2fs -l $i | egrep "Filesystem state|Last checked|Maximum|error|orphan|mounted"; done'; done
Examples for a good result:
/dev/sdb1
Last mounted on: <not available>
Filesystem state: clean
Maximum mount count: -1
Last checked: Mon Feb 27 12:09:08 2023
Last checked: Mon Feb 27 12:09:14 2023
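To compare real output against the good result above, it can help to filter it for anything that is not clean. The sketch below is a hypothetical helper (flag_fs_problems is not a Nutanix tool). It reads the output of the tune2fs loop on standard input and assumes that each device block starts with a /dev/ line, as in the example. Any state other than "clean" and any "First error time" or "Last error time" line is treated as suspicious.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "echo $i; tune2fs -l $i | egrep ..." output on stdin
# and prints one line per finding for devices that do not look clean.
flag_fs_problems() {
    awk '
        # A line starting with /dev/ names the device for the lines that follow.
        /^\/dev\// { dev = $0; next }

        # Anything other than exactly "clean" is reported.
        /^Filesystem state:/ {
            state = $0
            sub(/^Filesystem state:[ \t]*/, "", state)
            sub(/[ \t]+$/, "", state)
            if (state != "clean") print dev ": state is \"" state "\""
            next
        }

        # Error timestamps recorded in the superblock are passed through.
        /^(First|Last) error time:/ { print dev ": " $0 }
    '
}

# Typical use: pipe the tune2fs loop from this article into the helper, e.g.
#   for i in ...; do echo $i; sudo tune2fs -l $i | egrep ...; done | flag_fs_problems
```

No output means that every device reported a clean state and no recorded error times.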
To check the disks on the affected CVM only, run this command:
nutanix@CVM:~$ for i in `sudo blkid | awk -F : '!/xfs|iso9660/{print $1}'`;do echo $i;sudo tune2fs -l $i | egrep 'Filesystem state|Last checked|Maximum|error|orphan';done
If the above command reports a file system state other than clean, or lists error entries, the disk has an issue. Either open a case with the hardware vendor or raise a trouble ticket with Nutanix for further analysis. As a workaround, you can try rebooting the CVM to see whether this rebuilds the file system, or unmount and re-mount the impacted disk so that the system formats the disk and builds the file system again.
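When opening a case, it helps to state which device is logging errors. As a rough sketch of the dmesg side of the NCC check, the hypothetical helper below (count_ext4_msgs is an illustrative name) counts EXT4-fs error and warning lines per device. It assumes the usual kernel message form "EXT4-fs error (device sdX): ..." and ignores other EXT4-fs messages such as mount notices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts EXT4-fs error/warning lines per device
# from kernel log text supplied on stdin. Output format: "<device>: <count>".
count_ext4_msgs() {
    grep -Eo 'EXT4-fs (error|warning) \(device [^)]+\)' \
        | sed -E 's/.*\(device ([^)]+)\)/\1/' \
        | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Typical use on the impacted CVM (sudo may be needed to read the kernel log):
#   sudo dmesg | count_ext4_msgs
```

A device that appears in this list with a non-zero count is a reasonable candidate to include in the ticket, together with the tune2fs output for the same device.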
If the above steps are not clear, refer to Nutanix KB article 8514.