Common Issues and Solutions

James McMullan edited this page Aug 7, 2024 · 25 revisions

HPCC4J users should review this list of common issues and solutions before engaging the development team. If a specific issue isn't addressed in this list, feel free to report it to the development team via our Jira system (make sure to specify the project name "JAPI" when creating a new Jira issue).

Incompatible Wrapper Types Java build-time error

  • Axis2-generated client code can unexpectedly change class names in rare situations.
  • Axis2-generated stub code representing repeating (array) fields may use the naming scheme <fieldName>_type<X>,
  • where X is a counter that is incremented for each distinct field in the service that shares the same name. The index assigned to a given field can fluctuate.
  • For instance, in wsclient prior to 9.0.x, org/hpccsystems/ws/client/wrappers/gen/wsworkunits/Activities_type0Wrapper.java represented a container of WUDetailsActivityInfoWrapper, but as of 9.0.x it represents a container of WUResponseScopeWrapper instead.
  • Work-around: update your code to match the new wrapper class naming.
  • Track the proper solution here: https://track.hpccsystems.com/browse/JAPI-516
DFSClient read fails: "Read failure for X copy locations"...

DFSClient read fails: "Read failure for X copy locations"... "createRemoteActivity: unathorized"

  • DFSClient actions rely on properly configured "ClusterGroup" Keys on the target HPCC Systems cluster.
  • Users should contact the HPCC cluster admin and ensure the Keys section is correctly configured.
  • Sample Keys configuration element:
<Keys>
    <ClusterGroup keyPairName="mythor" name="mythor"/>
    <ClusterGroup keyPairName="mythor" name="myroxie"/>
    <ClusterGroup keyPairName="mythor" name="hthor__myeclagent"/>
    <KeyPair name="mythor" privateKey="/home/hpcc/certificate/key.pem" publicKey="/home/hpcc/certificate/public.key.pem"/>
</Keys>
  • Common issues include: invalid key pairs, key pairs that are inconsistent across nodes, invalid key pair paths, and no key pair associated with the target ClusterGroup.
Create access denied for scope

'Create access denied for scope' error received from ESP

  • This situation usually arises when the caller does not have appropriate file scope access.
  • Contact the administrator responsible for the target HPCC cluster.
  • It is possible that file permissions are not shared across environments. See the conversation in this Jira issue: https://track.hpccsystems.com/browse/JAPI-456
Code: '8029' Assertion Error

Assertion Errors: Such as: Code: '8029' Message: 'ERROR: cmd=RFCStreamRead, error=Internal Error (3000, assert(required <= maxAvailable()) failed - file: rtlcommon.hpp, line 137) '

  • These are generally caused by a mismatch between the provided 'source record layout' and the record layout on disk.
  • Note that this is the 'source record layout', not the projected record layout.
  • The first step in debugging this issue should be to use the DFSClient FileUtility to read the individual partitions that are failing. If the FileUtility is able to read the file parts, then the issue is likely in the client's meta-data handling code.
Unable to read next record:

Unable to read next record: java.util.NoSuchElementException: Fatal read error: Failed to parse next record: Error while parsing field: vault_rid of type: INTEGER:

  • This error represents a misalignment within the read stream, connectivity issues to dafilesrv, or a problem with the file itself.
  • The first step in debugging this issue should be to use the FileUtility to read the problematic file parts. If the FileUtility is able to read the file, this points to a misalignment in the read stream. This has occurred in the past due to invalid record layout meta-data and / or a single reader being accessed from multiple threads without a synchronization mechanism.
  • If the FileUtility fails to read the file, the next debugging step should be to check the dafilesrv logs for potential crashes / restarts and to verify that the file can be read in its entirety from within HPCC.
Missing records when reading is finished

  • If an error is encountered during the read process, such as a network connectivity issue, a partial read of a file will occur.
  • However, an exception will be thrown when this occurs. If an exception is thrown while reading a file, the read should be restarted via HPCCRemoteFileReader's resume support. See: HPCCRemoteFileReader.getFileReadResumeInfo().
  • Note: Internally, if a connectivity issue occurs, the reader will attempt to re-establish the connection 3 times before it gives up and throws an error.
  • If resuming the read also fails, the records read so far should be discarded and the file part in question should be read from the beginning.
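The recovery order described above (resume first, then discard and restart) can be sketched in plain Java. This is only an illustration under stated assumptions: `PartSource` and `readPart` are hypothetical names, not HPCC4j APIs, and a record index stands in for the resume information that HPCCRemoteFileReader.getFileReadResumeInfo() returns in real code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the resume-then-restart policy; not HPCC4j code.
class ResumableRead {
    // Stand-in for reading one file part. Appends records to 'out' starting
    // at 'startRecord' and throws a RuntimeException if the read breaks.
    interface PartSource<T> {
        void readFrom(long startRecord, List<T> out);
    }

    static <T> List<T> readPart(PartSource<T> source) {
        List<T> records = new ArrayList<>();
        try {
            source.readFrom(0, records);
            return records;
        } catch (RuntimeException firstFailure) {
            // Partial read: keep what we have and try to resume below.
        }
        try {
            // Resume from the position reached before the failure.
            source.readFrom(records.size(), records);
        } catch (RuntimeException resumeFailure) {
            // Resume failed too: discard everything and read from the start.
            // A failure here propagates to the caller.
            records.clear();
            source.readFrom(0, records);
        }
        return records;
    }
}
```

The key point is that partially read records are only kept when the resume succeeds; otherwise they are thrown away so the caller never sees a part with gaps or duplicates.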
Long type overflow error message

  • HPCC supports unsigned8 values, but Java does not have a native type that covers the entire unsigned 64-bit range.
  • However, most of the time unsigned8 values are used as unique IDs,
  • in which case it is acceptable for those IDs to be negative. By default we therefore store unsigned8 values in a Java Long and warn the user when a value overflows and becomes negative.
  • There is an option to treat unsigned8's as BigIntegers
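To make the overflow behavior concrete, here is a small standard-library sketch showing that a negative Long still carries the full unsigned8 bit pattern. `Unsigned8Demo` is an illustrative class name, not part of HPCC4j; only `java.lang.Long` and `java.math.BigInteger` are used.

```java
import java.math.BigInteger;

// Illustrative helper, not part of HPCC4j: interprets the 64 bits held in a
// signed Java long as an unsigned8 value.
class Unsigned8Demo {
    // Unsigned decimal text for the 64 bits stored in 'raw'.
    static String toUnsignedText(long raw) {
        return Long.toUnsignedString(raw);
    }

    // Widens the same 64 bits into a non-negative BigInteger.
    static BigInteger toBigInteger(long raw) {
        return new BigInteger(Long.toUnsignedString(raw));
    }

    // True when the unsigned value exceeds Long.MAX_VALUE, which is exactly
    // when it appears negative as a signed long.
    static boolean overflowsSignedLong(long raw) {
        return raw < 0;
    }

    public static void main(String[] args) {
        long raw = -1L; // bit pattern of the largest unsigned8 value, 2^64 - 1
        System.out.println(toUnsignedText(raw));
        System.out.println(toBigInteger(raw));
        // Unsigned ordering: -1L is larger than 1L when both are read as unsigned.
        System.out.println(Long.compareUnsigned(raw, 1L) > 0);
    }
}
```

Because equality on the raw long is unaffected by the sign, IDs stored this way remain safe to compare and use as map keys; only ordering and printing need the unsigned helpers.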
Authentication Issue

Authentication Issue: Received ERROR from Thor node (x.y.z.w): Code: '-11' Message: 'ERROR: cmd=RFCStreamRead, error=RFSERR_Unknown (4294967285, createRemoteActivity: unathorized)

  • Usually caused by bad user credentials; it can also be caused by misconfigured Keys (see the Keys Issue entry below).
Keys Issue

Keys Issue: EspException: Audience: user Source: CSoapResponseBinding Message: 2020-08-11 17:32:32 GMT: error:00000000:lib(0):func(0):reason(0) - CLoadedPrivateKeyFromFile: failed to open private key Received ERROR from Thor node (10.173.14.200): Code: '-11' Message: 'ERROR: cmd=RFCStreamRead, error=RFSERR_Unknown (4294967285, createRemoteActivity: unathorized)_ : Failure to read hpcc file part>thor_data400::in::uccv2::20190809::dnb::debtor-400 copy locations: {10.173.14.200, 10.173.14.101} :7601

  • Generally caused by misconfigured keys in the HPCC config for the Thor cluster being accessed.
Timeout issue

Timeout issue: java.lang.AssertionError: Failed to write file with error: Row service returned error code: '-7' Message: 'ERROR: cmd=RFCStreamGeneral, error=RFSERR_Unknown (4294967289, createRemoteActivity: authorization expired)

  • This is caused by the read / write token expiring before it is used.
  • Increasing the timeout is not recommended. The best solution is to warm-start the connections to each file part before reading / writing, so that the tokens are used to start the connections before they expire.
ReadTimeout Issue

ReadTimeout Issue: Error while attempting to read row service response

  • This issue occurs when the row service doesn't respond within the expected timeout for a request (default 15s).
  • An issue on the HPCC platform side caused this to occur on bare metal for versions 9.6.4-9.6.24, and on Kubernetes clusters or row services configured with SSL for versions up to 9.8.8.
  • If your cluster is not running one of the affected versions above, try increasing the socket operation timeout used in HPCCRemoteFileReader / HPCCRemoteFileWriter.
SSL Issue

  • If the cluster uses HTTPS with a self-signed certificate, any computer using HPCC4j to communicate with the cluster will need to add the SSL certificate to its trusted key store:
  • keytool -import -storepass changeit -noprompt -alias hpccsystems -file /home/hpcc/certificate/certificate.pem -keystore /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts
Certificate for X.Y.Z.V doesn't match any of the subject alternative names: [some.domain]

Certificate for <X.Y.Z.V> doesn't match any of the subject alternative names: [some.domain]

  • SSL-based communication between WsClient/DFSClient/Spark-HPCC and HPCC can be affected by invalid security certificates or a mismatched connection target. This error message signifies a mismatch between the certificate SAN and the connection target.
  • Ensure the connection target address is covered by the certificate SAN.
  • This could require connecting with the appropriate hostname and/or adding that hostname to the SAN section of the certificate.
Timeouts during Writing or Reading

  • When the initial request to begin reading or writing an HPCC file is sent, an access token is created that expires after a few minutes by default.
  • A connection to each of the nodes in the remote cluster must be opened before the access token expires.
  • An HpccRemoteFileReader or HpccRemoteFileWriter should be created for each DataPartition before any records are read / written. This establishes the needed connections before the access token expires.
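The ordering described above (open everything first, process afterwards) can be sketched generically. `WarmStart.openAll` is a hypothetical helper, not an HPCC4j API; in real code the `opener` function would construct an HpccRemoteFileReader or HpccRemoteFileWriter for each DataPartition, with constructor arguments that depend on your record layout and are therefore left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Generic sketch, not HPCC4j API: open one handle per partition up front,
// and only then start the (potentially slow) record processing.
class WarmStart {
    // Eagerly applies 'opener' to every partition, in order, so that each
    // connection is established while the access token is still valid.
    static <P, R> List<R> openAll(List<P> partitions, Function<P, R> opener) {
        List<R> handles = new ArrayList<>(partitions.size());
        for (P partition : partitions) {
            handles.add(opener.apply(partition));
        }
        return handles;
    }
}
```

The pattern to avoid is the lazy variant, where the reader for partition N is only created after partitions 0..N-1 have been fully processed; on large files that delay can easily exceed the token lifetime. A production version would also close any handles already opened if a later open fails.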
Sporadic "Unable to parse next record" errors while reading

  • Since this error occurs sporadically, the file being read is not corrupt; some other issue is causing the record parser state to become invalid.
  • This can occur if an HpccRemoteFileReader is accessed from multiple threads
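If several worker threads really must share one reader, access has to be serialized so that the "is there another record" check and the fetch happen as one step. The following is a minimal sketch, assuming the reader is consumed through the standard `java.util.Iterator` interface and never yields null records; `SynchronizedRecordSource` is a hypothetical wrapper, not an HPCC4j class.

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper, not an HPCC4j class: serializes access to a single
// record iterator so hasNext() and next() run as one atomic step.
class SynchronizedRecordSource<T> {
    private final Iterator<T> delegate;

    SynchronizedRecordSource(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    // Returns the next record, or empty once the underlying reader is
    // exhausted. Assumes records are never null.
    synchronized Optional<T> poll() {
        return delegate.hasNext() ? Optional.of(delegate.next()) : Optional.empty();
    }

    // Demo: drains one shared source from several threads and returns the
    // total number of records seen across all of them.
    static int drainConcurrently(Iterator<Integer> records, int threads) {
        SynchronizedRecordSource<Integer> source = new SynchronizedRecordSource<>(records);
        AtomicInteger seen = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                while (source.poll().isPresent()) {
                    seen.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while draining", e);
            }
        }
        return seen.get();
    }
}
```

A simpler design that avoids the lock entirely is to give each thread its own reader for its own file part, so no reader is ever shared.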
Missing function signatures on JDK 8

  • Prior to 9.2.x the HPCC4j libraries were compiled for Java 8 but linked against the JDK version of the development environment on our build server, which was JDK 11. This generally doesn't cause any issues, but it can cause a problem if a function signature changed between JDK 8 and 11; ByteBuffer is one example of a class whose function signatures changed. If you encounter this issue, the easiest solution is to upgrade to version 9.2.x.
DocumentBuilder Feature Secure Processing not supported

  • The HPCC4j project requires certain security features to be available within the javax.xml.parsers.* implementation to avoid security vulnerabilities. If your project overrides the default javax.xml.parsers implementation, it will need to provide an implementation that supports the standard security features. It is also possible that a conflict between dependencies causes this issue, in which case excluding the conflicting dependency should resolve it.
  • Example pom configuration that excludes a conflicting dependency coming from Spark:
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.binary.version}</artifactId>
        <version>${spark.runtime.version}</version>
        <exclusions>
            <exclusion>
                <groupId>xerces</groupId>
                <artifactId>xercesImpl</artifactId>
            </exclusion>
            <exclusion>
                <groupId>xerces</groupId>
                <artifactId>xmlParserAPIs</artifactId>
            </exclusion>
        </exclusions>
        <scope>provided</scope>
    </dependency>
Signature Verification Failed or Failure to read / write individual file parts

  • Writing or reading individual file parts may fail if the signing keys are invalid on a particular node or nodes. This is typically due to an issue that occurs during installation, and can be corrected by updating the keys on the problematic nodes or all keys across the HPCC cluster.
Publish failed due to missing file parts

  • This could be caused by a problem such as the signature verification failure above, but it could also be a problem on the client application side where an error caused a particular file part to not be written.
  • The HPCC4j logs contain file read / write stop messages that can be used to debug if file parts are being correctly opened and closed. This should be the first place to start when debugging this issue. Additional information such as record layout used to write the file parts is also available when increasing the log level to trace.
HTTPS hostname wrong: should be <some_correct_domain.net>

  • This issue could be due to a mismatch between the hostname used for the connection and the domain name listed on the SSL certificate of the HPCC Systems cluster, or it could be related to a domain name that is invalid.
  • Note that some versions of Java 8 incorrectly treated underscores in domain names as invalid characters. Upgrading to a patched version of the Java runtime should correct this issue.
Landing Zone Regression: Affecting Versions: 9.0.0-9.0.52, 9.2.0-9.2.30, 9.4.0-9.4.4

  • A regression in ESP caused some HPCC4j features to begin failing in the above versions. The features that were affected are those that utilize landing zones such as the WSFileIO and FileSpray APIs. Upgrading the HPCC platform version to the latest point version will address these issues. Note: When running the 9.4.8+ HPCC4j unit tests against versions within the above range expect test failures.
DFSClient File Read Failure on Indexes

  • A bug in the HPCC platform dafilesrv process causes the process to crash when HPCC4j attempts to read the TLK file part of some index files. The TLK file part is used to do localized filtering of partitions within HPCC4j, but it is not needed to successfully read the entire index file. An option was added in HPCC4j 9.4.8 to turn off the TLK prefetch and allow the file read to continue without using the TLK. This prevents the dafilesrv process from crashing and allows the entire index to be read without error. Note: When the TLK feature is turned off, filtering will not occur and the entire dataset will be returned regardless of any provided filter. See DFSIndexTest.tlkBypassTest for example usage.