feat: support schedule only executes once in random one cluster when it's deployed in k8s clusters mode. #61

Open · wants to merge 9 commits into master
Conversation


@oneWalker oneWalker commented Nov 12, 2024

Checklist
  • npm test passes with github actions
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)
Description of change

Support scheduled tasks that execute only once across instances when the application is deployed in k8s cluster mode.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a locking mechanism for distributed scheduling in clustered environments using Redis.
    • Added new configuration options for cluster settings, including properties for enabling clustering and Redis configuration.
  • Bug Fixes

    • Enhanced locking mechanisms to prevent concurrent executions in clustered environments.
  • Documentation

    • Updated README.md with detailed instructions for configuring Redis clustering and new settings.
  • Chores

    • Added ioredis as a new dependency for Redis functionality.
    • Introduced mocha for improved testing capabilities.


coderabbitai bot commented Nov 12, 2024

Walkthrough

The changes in this pull request introduce new clustering capabilities to the egg-schedule project by integrating Redis for scheduling management. Modifications include updates to the README.md for configuration instructions, the addition of a Redis client in agent.js, and new configuration properties in config/config.default.js. The AllStrategy and WorkerStrategy classes have been updated to implement asynchronous handlers with locking mechanisms using Redis. Additionally, the package.json file has been updated to include the ioredis dependency and a testing framework.
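As a rough illustration of the flow described above, here is a minimal sketch of how a strategy handler might guard task dispatch with a Redis lock. It uses the ioredis API, but the key name, TTL, and helper names are illustrative, not the exact code in this PR.

```js
const Redis = require('ioredis');

// Illustrative only: dispatch a task at most once per tick across all
// agent instances, assuming `redis` points at a shared Redis server.
async function handleTick(redis, taskKey, sendTask) {
  const lockKey = `schedule-lock:${taskKey}`; // hypothetical key layout
  // SET ... EX ... NX is atomic: only one instance gets 'OK', the rest get null.
  const acquired = await redis.set(lockKey, process.pid, 'EX', 60, 'NX');
  if (!acquired) return; // another instance already holds the lock

  try {
    await sendTask(); // dispatch the schedule to workers
  } finally {
    await redis.del(lockKey); // release so the next tick can run elsewhere
  }
}
```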

Changes

| File | Change Summary |
| --- | --- |
| README.md | Added new configuration section for clustering, detailing cluster object and Redis settings. |
| config/config.default.js | Introduced new properties under config.schedule: cluster, enable, lockType, lockedTtl, redis, client, port, host, password, db, default, and prefix. |
| agent.js | Imported RedisLock, added lockManager, and modified agent.beforeStart to utilize Redis locking. |
| lib/strategy/all.js | Updated handler() method to be asynchronous and added logic for Redis locking mechanism. |
| lib/strategy/worker.js | Renamed class to AllStrategy, updated handler() method to be asynchronous and added logic for Redis locking mechanism. |
| lib/lock/base.js | Introduced LockManager class with methods for managing locks. |
| lib/lock/redis_lock.js | Added RedisLock class extending LockManager, implementing Redis-based lock management. |
| package.json | Added "ioredis": "^5.4.1" as a dependency and "mocha": "^10.8.2" as a development dependency. |

Poem

🐇 In the garden of code, changes bloom bright,
Redis now dances, bringing clusters to light.
With locks in the night, our tasks shall not clash,
Scheduling’s sweet rhythm, a well-timed flash!
So hop along, friends, let’s celebrate this feat,
In the world of egg-schedule, our work is complete! 🌼

Warning

There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.

🔧 eslint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

warning [email protected]: This version is no longer supported. Please see https://eslint.org/version-support for other options.
warning eslint > @humanwhocodes/[email protected]: Use @eslint/config-array instead
warning eslint > @humanwhocodes/config-array > @humanwhocodes/[email protected]: Use @eslint/object-schema instead
warning eslint > file-entry-cache > flat-cache > [email protected]: Rimraf versions prior to v4 are no longer supported
warning eslint > file-entry-cache > flat-cache > rimraf > [email protected]: Glob versions prior to v9 are no longer supported
warning eslint > file-entry-cache > flat-cache > rimraf > glob > [email protected]: This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.
error [email protected]: The engine "node" is incompatible with this module. Expected version "^14 || ^16 || ^17 || ^18 || ^19". Got "22.9.0"
error Found incompatible module.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 16

🧹 Outside diff range and nitpick comments (4)
agent.js (1)

15-20: Add Redis health check mechanism.

Implement a health check mechanism to ensure Redis connection stability.

Consider implementing a periodic health check:

    if (
      agent.config.schedule?.cluster?.enable &&
      agent.config.schedule?.cluster?.redis
    ) {
      agent.redisClient = new Redis(agent.config.schedule.cluster.redis);
+      
+      // Periodic health check
+      const healthCheck = async () => {
+        try {
+          await agent.redisClient.ping();
+        } catch (err) {
+          agent.logger.error('[egg-schedule] Redis health check failed:', err);
+        }
+      };
+      
+      const HEALTH_CHECK_INTERVAL = 30000; // 30 seconds
+      const timer = setInterval(healthCheck, HEALTH_CHECK_INTERVAL);
+      
+      // Clean up timer on agent close
+      agent.beforeClose(async () => {
+        clearInterval(timer);
+      });
    }
README.md (2)

277-279: Clarify prefix and default configuration usage

The documentation should explain:

  1. The purpose and format of the default setting
  2. How the prefix is used in Redis key generation

Add the following explanation:

- `default`: The unique identifier for your application (e.g., 'my-app-name'). Used to prevent key conflicts when multiple applications share the same Redis instance.
- `prefix`: The prefix used for Redis lock keys. The final key format will be `{prefix}:{default}:{schedule-name}`.
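For illustration, a small sketch of how such a key could be assembled, assuming the `{prefix}:{default}:{schedule-name}` layout suggested above (the helper name is hypothetical):

```js
// Hypothetical helper mirroring the suggested key layout.
function buildLockKey(config, scheduleName) {
  const { prefix = 'schedule', default: appId = 'default' } = config;
  return `${prefix}:${appId}:${scheduleName}`; // e.g. "schedule:my-app:clean_cache"
}
```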

264-279: Add troubleshooting section for cluster mode

The documentation should include common issues and solutions when using cluster mode.

Add the following section:

#### Troubleshooting Cluster Mode

Common issues and solutions:

1. **Multiple Task Executions**
   - Verify Redis connection settings
   - Check if lock TTL matches your task duration
   - Ensure clock synchronization across nodes

2. **Tasks Not Executing**
   - Confirm Redis connectivity from all nodes
   - Verify Redis credentials and permissions
   - Check for Redis connection timeouts

3. **Lock Not Released**
   - Implement proper error handling in tasks
   - Consider setting up lock timeouts
   - Monitor Redis memory usage
lib/strategy/worker.js (1)

12-20: Use the path module for cross-platform path handling

Using string manipulation for paths may cause issues on different operating systems (e.g., Windows uses backslashes). Use the Node.js path module to handle paths reliably across platforms.

Apply this diff:

+ const path = require('path');
  const projectName =
    curConfig.default === 'default'
-     ? this.agent.baseDir.split('/').pop()
+     ? path.basename(this.agent.baseDir)
      : curConfig.default;
  const prefix = curConfig.prefix;
- lockedKey = `${projectName}-${prefix}-${this.key.replace(
-   this.agent.baseDir,
-   ''
- )}`;
+ const relativeKey = path.relative(this.agent.baseDir, this.key);
+ lockedKey = `${projectName}-${prefix}-${relativeKey}`;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 78db046 and cf4792b.

📒 Files selected for processing (6)
  • README.md (1 hunks)
  • agent.js (2 hunks)
  • config/config.default.js (1 hunks)
  • lib/strategy/all.js (1 hunks)
  • lib/strategy/worker.js (1 hunks)
  • package.json (2 hunks)
🔇 Additional comments (4)
config/config.default.js (2)

16-28: Verify k8s deployment resilience

The cluster configuration needs additional consideration for k8s deployments:

  1. Ensure Redis connection is resilient to pod restarts
  2. Consider adding Redis sentinel support for HA deployments
  3. Document how the scheduler behaves during pod initialization and termination

Let's verify the k8s deployment considerations:

#!/bin/bash
# Check for k8s lifecycle handling
echo "Checking for pod lifecycle handlers:"
rg "beforeExit|beforeClose|beforeStop" -A 5

# Check for Redis connection error handling
echo "Checking Redis error handling:"
ast-grep --pattern 'catch ($_) {
  $$$
}'

Consider documenting k8s deployment best practices in README.md, including:

  1. Recommended pod anti-affinity rules
  2. Resource requests/limits for Redis
  3. Network policy recommendations

27-28: Document the purpose of new schedule properties

The new properties default and prefix lack documentation explaining their purpose and impact on the scheduling system.

Let's verify how these values are used:

Please add JSDoc comments explaining:

  1. The purpose of each property
  2. How they affect the Redis-based locking mechanism
  3. Whether they can be customized and any constraints on their values
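A sketch of what such comments could look like, assuming the semantics of `default` and `prefix` described elsewhere in this review (wording and values are illustrative):

```js
// config/config.default.js (sketch)
/**
 * @property {string} default - identifier used in lock keys; when left as
 *   'default', the project directory name is used instead.
 * @property {string} prefix - prefix prepended to every schedule lock key so
 *   that multiple applications can share one Redis instance without clashes.
 */
exports.schedule = {
  default: 'default',
  prefix: 'schedule',
};
```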
agent.js (1)

5-5: LGTM! Verify ioredis dependency.

The choice of ioredis is appropriate for Redis cluster support.

✅ Verification successful

ioredis dependency is properly declared in package.json.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify ioredis dependency is properly declared
grep -r "ioredis" package.json

Length of output: 55

package.json (1)

29-29: Verify ioredis compatibility

The addition of ioredis for cluster management is appropriate, but let's verify the version compatibility.

✅ Verification successful

ioredis@^5.4.1 is compatible with the project.

  • Supported Node.js version: >=14.17.0
  • No known security vulnerabilities.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check ioredis version compatibility and security advisories

# Check npm for latest versions and compatibility
npm view ioredis@^5.4.1 engines.node

# Check for known vulnerabilities
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: NPM, package: "ioredis") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 391


Script:

#!/bin/bash
# Description: Retrieve the project's Node.js version

# Check the Node.js version specified in package.json
if [ -f package.json ]; then
  jq -r '.engines.node // empty' package.json
fi

# If not found in package.json, check for a .nvmrc file
if [ -f .nvmrc ]; then
  cat .nvmrc
fi

Length of output: 101

Comment on lines 16 to 26
cluster: {
  enable: false,
  redis: {
    client: {
      port: 6379, // Redis port
      host: '127.0.0.1', // Redis host
      password: 'auth',
      db: 0,
    },
  },
},
@coderabbitai coderabbitai bot Nov 12, 2024

⚠️ Potential issue

Security and k8s deployment concerns in Redis configuration

The Redis configuration has several issues that need to be addressed:

  1. Hardcoded credentials in default configuration pose a security risk
  2. The default host 127.0.0.1 may not work in k8s clusters
  3. Missing important Redis connection settings (timeout, retry policy, TLS)

Consider this revised configuration:

     cluster: {
       enable: false,
       redis: {
         client: {
-          port: 6379, // Redis port
-          host: '127.0.0.1', // Redis host
-          password: 'auth',
-          db: 0,
+          port: process.env.REDIS_PORT || 6379,
+          host: process.env.REDIS_HOST || '127.0.0.1',
+          password: process.env.REDIS_PASSWORD,
+          db: process.env.REDIS_DB || 0,
+          connectTimeout: 5000,
+          retryStrategy: (times) => Math.min(times * 1000, 3000),
+          tls: process.env.REDIS_TLS === 'true' ? {} : undefined,
         },
       },
     },

Additionally:

  1. Document these environment variables in README.md
  2. Consider using k8s secrets for Redis credentials
  3. Add validation for these configuration values

oneWalker (Author)

egg-plugin is a special case. If you want to use process.env, you should set it in config.{default}.js in your own project.

coderabbitai bot

@oneWalker, thank you for the clarification. I understand that in egg-plugin development, it's not appropriate to use process.env in the plugin's config/config.default.js. Instead, environment-specific configurations should be set in config.{env}.js in the user's project.
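As a rough sketch of that pattern, an application could override the plugin defaults in its own environment config, for example config/config.prod.js (file name, values, and environment variable names are illustrative):

```js
// config/config.prod.js in the user's application (not the plugin)
module.exports = () => {
  const config = {};
  config.schedule = {
    cluster: {
      enable: true,
      redis: {
        client: {
          host: process.env.REDIS_HOST,
          port: Number(process.env.REDIS_PORT || 6379),
          password: process.env.REDIS_PASSWORD,
          db: 0,
        },
      },
    },
  };
  return config;
};
```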


✏️ Learnings added
Learnt from: oneWalker
PR: eggjs/egg-schedule#61
File: config/config.default.js:16-26
Timestamp: 2024-11-12T19:37:36.347Z
Learning: In egg-plugin development, avoid using `process.env` in the plugin's `config/config.default.js` file. Environment-specific configurations should be set in the user's project `config/config.{env}.js` files.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

agent.js Outdated
Comment on lines 15 to 20
if (
  agent.config.schedule?.cluster?.enable &&
  agent.config.schedule?.cluster?.redis
) {
  agent.redisClient = new Redis(agent.config.schedule.cluster.redis);
}

⚠️ Potential issue

Add Redis connection cleanup.

The Redis client connection should be properly closed when the agent stops to prevent resource leaks.

Add cleanup code in the agent's lifecycle:

+ agent.beforeClose(async () => {
+   if (agent.redisClient) {
+     await agent.redisClient.quit();
+     agent.redisClient = null;
+   }
+ });

⚠️ Potential issue

Add error handling for Redis connection.

The Redis client initialization lacks error handling which could cause silent failures in production.

Apply this diff to add proper error handling:

    if (
      agent.config.schedule?.cluster?.enable &&
      agent.config.schedule?.cluster?.redis
    ) {
-      agent.redisClient = new Redis(agent.config.schedule.cluster.redis);
+      try {
+        agent.redisClient = new Redis(agent.config.schedule.cluster.redis);
+        agent.redisClient.on('error', err => {
+          agent.logger.error('[egg-schedule] Redis client error:', err);
+        });
+      } catch (err) {
+        agent.logger.error('[egg-schedule] Failed to initialize Redis client:', err);
+        throw err;
+      }
    }

🛠️ Refactor suggestion

Add Redis configuration validation and connection options.

The Redis configuration should be validated and include important connection options.

Apply this diff to add validation and connection options:

    if (
      agent.config.schedule?.cluster?.enable &&
      agent.config.schedule?.cluster?.redis
    ) {
+      const redisConfig = agent.config.schedule.cluster.redis;
+      if (!redisConfig.host || !redisConfig.port) {
+        throw new Error('[egg-schedule] Redis configuration must include host and port');
+      }
+      
+      const redisOptions = {
+        retryStrategy: times => Math.min(times * 100, 3000),
+        maxRetriesPerRequest: 3,
+        enableReadyCheck: true,
+        ...redisConfig,
+      };
+      
-      agent.redisClient = new Redis(agent.config.schedule.cluster.redis);
+      agent.redisClient = new Redis(redisOptions);
    }

@@ -36,7 +37,8 @@
   "egg-mock": "^5.3.0",
   "egg-tracer": "^1.1.0",
   "eslint": "^8.29.0",
-  "eslint-config-egg": "^12.1.0"
+  "eslint-config-egg": "^12.1.0",
+  "mocha": "^10.8.2"
@coderabbitai coderabbitai bot Nov 12, 2024

⚠️ Potential issue

Fix incorrect mocha version number

The specified mocha version ^10.8.2 is invalid as the latest stable version of mocha is 10.2.0. This could be causing the npm test failures mentioned in the PR description.

Apply this diff to fix the version:

-    "mocha": "^10.8.2"
+    "mocha": "^10.2.0"

Additionally, since the project already uses egg-bin for testing, please verify if adding mocha directly is necessary.


oneWalker (Author)

CI won't pass in the Node 14 environment without the mocha module.

coderabbitai bot

@oneWalker, thank you for the clarification. I've noted that the CI in Node 14 environments requires the mocha module to pass.


✏️ Learnings added
Learnt from: oneWalker
PR: eggjs/egg-schedule#61
File: package.json:41-41
Timestamp: 2024-11-12T19:35:13.420Z
Learning: In the `egg-schedule` project, adding `mocha` as a devDependency is necessary for the CI to pass in Node.js 14 environments, even when `egg-bin` is used for testing.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

README.md Outdated
Comment on lines 264 to 279
// add it when making sure it only running in one cluster.
cluster: {
  enable: true,
  // add redis for redis lock
  redis: {
    client: {
      port: 6379, // Redis port
      host: '127.0.0.1', // Redis host
      password: 'auth',
      db: 0,
    },
  },
},
// the prefix for lockedKey for redis lock
default: 'default', // default schedule name, like project-name.
prefix: 'schedule', // default schedule prefix

🛠️ Refactor suggestion

Document cluster mode behavior and configuration

The current documentation lacks essential information about the cluster mode. Consider adding:

  1. Explanation of how the Redis-based locking mechanism works
  2. Impact on task execution in a Kubernetes environment
  3. Best practices for high availability setups

Add the following section before the configuration example:

### Cluster Mode

When running in a Kubernetes cluster or any distributed environment, you might want to ensure that scheduled tasks execute only once across all instances. The cluster mode uses Redis for distributed locking to prevent duplicate task execution.

#### How it works
- When a task is scheduled to run, the scheduler attempts to acquire a Redis lock
- Only one instance in the cluster will successfully acquire the lock and execute the task
- The lock is automatically released after task completion or on failure

#### Configuration
The cluster mode requires Redis for coordination. Here's a complete configuration example:
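A configuration sketch along those lines, based on the options shown in this PR (values are placeholders):

```js
// config/config.default.js of the application (a sketch)
exports.schedule = {
  cluster: {
    enable: true,
    lockType: 'redis',       // use the Redis-based lock
    redis: {
      client: {
        host: '127.0.0.1',
        port: 6379,
        password: 'auth',
        db: 0,
      },
    },
  },
  default: 'my-app',         // application identifier used in lock keys
  prefix: 'schedule',        // prefix for lock keys
};
```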

README.md Outdated
Comment on lines 264 to 276
// add it when making sure it only running in one cluster.
cluster: {
  enable: true,
  // add redis for redis lock
  redis: {
    client: {
      port: 6379, // Redis port
      host: '127.0.0.1', // Redis host
      password: 'auth',
      db: 0,
    },
  },
},

⚠️ Potential issue

Enhance security and flexibility of Redis configuration

The Redis configuration example shows hardcoded credentials, which is not a recommended practice. Consider:

  1. Using environment variables for sensitive data
  2. Adding SSL/TLS configuration options
  3. Documenting connection retry strategies

Here's an improved configuration example:

 cluster: {
   enable: true,
   redis: {
     client: {
-      port: 6379,
-      host: '127.0.0.1',
-      password: 'auth',
-      db: 0,
+      port: process.env.REDIS_PORT || 6379,
+      host: process.env.REDIS_HOST || '127.0.0.1',
+      password: process.env.REDIS_PASSWORD,
+      db: process.env.REDIS_DB || 0,
+      tls: process.env.REDIS_TLS === 'true' ? {} : undefined,
+      retryStrategy: (times) => Math.min(times * 50, 2000),
     },
   },
 },

Comment on lines 12 to 15
const projectName =
  curConfig.default === 'default'
    ? this.agent.baseDir.split('/').pop()
    : curConfig.default;

🛠️ Refactor suggestion

Use the path module for reliable path handling across platforms.

The current code uses string manipulation to extract the project name from this.agent.baseDir, which may cause issues on Windows systems due to different path separators. Using path.basename() from Node.js's path module ensures consistent behavior across all operating systems.

Import the path module and modify the code:

+ const path = require('path');

      const projectName =
        curConfig.default === 'default'
-         ? this.agent.baseDir.split('/').pop()
+         ? path.basename(this.agent.baseDir)
          : curConfig.default;

Committable suggestion skipped: line range outside the PR's diff.

if (await this.agent.redisClient.get(lockedKey)) {
  isLocked = true;
}
await this.agent.redisClient.set(lockedKey, true);

⚠️ Potential issue

Set an expiration on the Redis lock key to prevent stale locks

If the process crashes before deleting the lock key, the lock may remain indefinitely, blocking future executions. Consider setting an expiration time on the Redis key to avoid stale locks.

Apply this diff:

- await this.agent.redisClient.set(lockedKey, true, 'NX');
+ const lockExpirationTime = 60; // lock expires in 60 seconds
+ const lockSet = await this.agent.redisClient.set(lockedKey, true, 'NX', 'EX', lockExpirationTime);

Ensure that lockExpirationTime is set to an appropriate value based on the expected execution time of the task.

Also applies to: 31-33

Comment on lines 31 to 33
if (curConfig.cluster.enable) {
  await this.agent.redisClient.del(lockedKey);
}

🛠️ Refactor suggestion

Consistently use optional chaining for curConfig.cluster

To prevent potential runtime errors when curConfig or curConfig.cluster is undefined, use optional chaining when accessing curConfig.cluster.enable.

Apply this diff:

- if (curConfig.cluster.enable) {
+ if (curConfig?.cluster?.enable) {

Comment on lines 21 to 24
if (await this.agent.redisClient.get(lockedKey)) {
  isLocked = true;
}
await this.agent.redisClient.set(lockedKey, true);
@coderabbitai coderabbitai bot Nov 12, 2024

⚠️ Potential issue

Use atomic Redis operations to prevent race conditions

Currently, there is a potential race condition between the GET and SET operations. To ensure that only one process can acquire the lock at a time, use an atomic Redis SET operation with the NX (set if not exists) option.

Apply this diff to fix the issue:

- if (await this.agent.redisClient.get(lockedKey)) {
-   isLocked = true;
- }
- await this.agent.redisClient.set(lockedKey, true);
+ const lockSet = await this.agent.redisClient.set(lockedKey, true, 'NX');
+ if (!lockSet) {
+   isLocked = true;
+ }

oneWalker (Author)

It ensures the set operation across clusters only takes effect when lockedKey is not already occupied.

coderabbitai bot

@oneWalker, understood. Thank you for the clarification.

Comment on lines 27 to 33
if (!isLocked) {
  this.sendOne();
}

if (curConfig.cluster.enable) {
  await this.agent.redisClient.del(lockedKey);
}
@coderabbitai coderabbitai bot Nov 12, 2024

⚠️ Potential issue

Ensure Redis lock key is released even if an exception occurs

To prevent the lock from remaining if an exception occurs during task execution, wrap the critical section in a try...finally block. This ensures that the lock key is deleted even if an error is thrown.

Apply this diff:

if (!isLocked) {
-  this.sendOne();
+  try {
+    await this.sendOne();
+  } finally {
+    if (curConfig?.cluster?.enable) {
+      await this.agent.redisClient.del(lockedKey);
+    }
+  }
}

- if (curConfig?.cluster?.enable) {
-   await this.agent.redisClient.del(lockedKey);
- }

oneWalker (Author)

Only when cluster mode is enabled do we need to use Redis to acquire the lock.

coderabbitai bot

@oneWalker, understood. Thank you for the clarification that we should use Redis for locking only when cluster mode is enabled.


✏️ Learnings added
Learnt from: oneWalker
PR: eggjs/egg-schedule#61
File: lib/strategy/worker.js:27-33
Timestamp: 2024-11-12T19:21:57.336Z
Learning: In the `WorkerStrategy` class in `lib/strategy/worker.js`, ensure that Redis locks are used only when cluster mode is enabled.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@oneWalker oneWalker changed the title feat: support schedule only executes once when it's deployed in k8s clusters mode. feat: support schedule only executes once in random one cluster when it's deployed in k8s clusters mode. Nov 12, 2024
@fengmk2 (Member) commented Nov 15, 2024

This needs to be abstracted to see how to implement it. Right now it has a hard dependency on Redis, but not every company builds on Redis; there are other distributed lock implementations.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 15

🧹 Outside diff range and nitpick comments (10)
lib/strategy/all.js (1)

6-23: Consider a more flexible locking mechanism design.

As mentioned in the PR feedback, the current implementation might be too tightly coupled with Redis. Consider implementing a more abstract locking interface that could support different backends (Redis, etcd, Zookeeper, etc.).

Suggestions:

  1. Create an abstract LockProvider interface
  2. Allow configuration of different lock providers
  3. Implement provider-specific logic in separate classes
  4. Use dependency injection to inject the configured provider

This would make the implementation more flexible and maintainable for different deployment scenarios.
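A minimal sketch of that direction, with an abstract provider injected into the strategy; all names here (LockProvider, RedisLockProvider, createProvider) are illustrative rather than code from this PR:

```js
// Abstract contract every backend must fulfil.
class LockProvider {
  async tryAcquire(key, ttl) { throw new Error('not implemented'); }
  async release(key) { throw new Error('not implemented'); }
}

// One possible backend; others (etcd, ZooKeeper, ...) implement the same contract.
class RedisLockProvider extends LockProvider {
  constructor(redis) { super(); this.redis = redis; }
  async tryAcquire(key, ttl) {
    return (await this.redis.set(key, '1', 'EX', ttl, 'NX')) === 'OK';
  }
  async release(key) { await this.redis.del(key); }
}

// The strategy would only depend on the abstract interface.
function createProvider(config, redis) {
  if (config.lockType === 'redis') return new RedisLockProvider(redis);
  throw new Error(`Unsupported lock type: ${config.lockType}`);
}
```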

lib/strategy/worker.js (1)

5-6: Rename the file to match the class name

The file is named worker.js but contains AllStrategy class. Consider renaming the file to all.js to maintain consistency between file names and class names.

lib/lock/base.js (2)

1-2: Remove redundant 'use strict' directive

The 'use strict' directive is unnecessary in ES modules as they are strict by default.

-'use strict';
-
🧰 Tools
🪛 Biome

[error] 1-1: Redundant use strict directive.

The entire contents of JavaScript modules are automatically in strict mode, with no statement needed to initiate it.
Safe fix: Remove the redundant use strict directive.

(lint/suspicious/noRedundantUseStrict)


3-38: Consider implementing a provider pattern for lock implementations

Based on the PR feedback about Redis dependency, consider enhancing the architecture to support multiple lock providers:

  1. Different organizations might use different distributed lock solutions (Redis, etcd, Zookeeper, etc.)
  2. Some might prefer using k8s native solutions like leader election

Consider implementing a factory pattern:

// lib/lock/factory.js
class LockFactory {
  static createLock(type, agent) {
    switch (type) {
      case 'redis':
        return new RedisLock(agent);
      case 'etcd':
        return new EtcdLock(agent);
      case 'kubernetes':
        return new KubernetesLock(agent);
      default:
        throw new Error(`Unsupported lock type: ${type}`);
    }
  }
}

This would allow users to choose their preferred locking mechanism through configuration:

// config/config.default.js
exports.schedule = {
  lockStrategy: 'redis', // or 'etcd', 'kubernetes'
  // strategy specific config...
};
README.md (1)

264-281: Improve overall documentation structure for cluster mode

The cluster configuration would benefit from better integration into the overall documentation:

  1. Add a "Cluster Mode" section in the Overview
  2. Include a troubleshooting guide for common cluster issues
  3. Add examples of monitoring and debugging cluster deployments

Consider adding these sections before the Configuration section:

## Cluster Mode

When deploying the application in a clustered environment (e.g., Kubernetes), you may want to ensure that scheduled tasks execute only once across all instances. The cluster mode uses Redis for distributed locking to prevent duplicate task execution.

### How it works
- When a task is scheduled to run, the scheduler attempts to acquire a Redis lock
- Only one instance will successfully acquire the lock and execute the task
- The lock is automatically released after task completion or timeout

### Troubleshooting

Common issues and solutions:

1. **Duplicate task execution**
   - Verify Redis connection settings
   - Check `lockedTtl` against task duration
   - Ensure consistent clock synchronization

2. **Tasks not executing**
   - Verify Redis connectivity
   - Check Redis key patterns
   - Monitor lock acquisition attempts
lib/lock/redis_lock.js (5)

1-1: Remove redundant 'use strict' directive

The 'use strict' directive is redundant in ECMAScript modules, as modules are always in strict mode.

Apply this diff to remove the redundant directive:

- 'use strict';
🧰 Tools
🪛 Biome

[error] 1-1: Redundant use strict directive.

The entire contents of JavaScript modules are automatically in strict mode, with no statement needed to initiate it.
Safe fix: Remove the redundant use strict directive.

(lint/suspicious/noRedundantUseStrict)


6-9: Handle Redis client errors gracefully

When initializing the Redis client, consider adding error handling to manage potential connection issues with Redis. This ensures that the application can handle connection failures without crashing.

Example:

constructor(agent) {
  super(agent);
  this.client = new Redis(this.options);

  this.client.on('error', err => {
    this.logger.error(`[egg-schedule] Redis client error: ${err.message}`);
  });
}

16-31: Consider decoupling lock acquisition timeout from lock expiration time

Currently, the expiredTime parameter is used both for the lock's expiration time and the maximum duration to attempt acquiring the lock in the acquire method. This could limit flexibility if different durations are needed for each.

Consider introducing separate parameters for lock TTL and acquisition timeout:

async acquire(
  lockedKey,
  lockTtl = this.agent.config.schedule.cluster.lockedTtl,
  acquireTimeout = 5000 // Default acquisition timeout in milliseconds
) {
  const start = Date.now();
  while (Date.now() - start < acquireTimeout) {
    if (await this.tryAcquire(lockedKey, lockTtl)) {
      return true;
    }
    // Sleep as before
    const randomSleepTime = Math.random() * 900 + 100;
    await new Promise(resolve => setTimeout(resolve, randomSleepTime));
  }
  return false;
}

27-28: Reconsider random sleep time strategy

Using a random sleep time between retries can reduce lock contention, but it may introduce unnecessary delays. Consider implementing an exponential backoff strategy to optimize the retry intervals.

Example:

let retryCount = 0;
const maxRetryDelay = 1000; // Maximum delay of 1 second
while (Date.now() - start < acquireTimeout) {
  if (await this.tryAcquire(lockedKey, lockTtl)) {
    return true;
  }
  const retryDelay = Math.min(100 * 2 ** retryCount, maxRetryDelay);
  await new Promise(resolve => setTimeout(resolve, retryDelay));
  retryCount++;
}

5-70: Consider abstracting the lock mechanism to support multiple backends

The current implementation introduces a strong dependency on Redis for distributed locking. As noted in the PR comments, not all organizations use Redis. To increase flexibility, consider abstracting the lock mechanism to support multiple backends (e.g., Redis, ZooKeeper, etcd).

This can be achieved by defining a common interface or base class for lock managers and allowing users to configure the desired backend.

Example:

  • Define a generic LockManager interface.
  • Implement specific lock managers like RedisLockManager, EtcdLockManager, etc.
  • Allow configuration to select the lock manager type.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between cf4792b and 2f46c2c.

📒 Files selected for processing (7)
  • README.md (1 hunks)
  • agent.js (2 hunks)
  • config/config.default.js (1 hunks)
  • lib/lock/base.js (1 hunks)
  • lib/lock/redis_lock.js (1 hunks)
  • lib/strategy/all.js (1 hunks)
  • lib/strategy/worker.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • config/config.default.js
🧰 Additional context used
📓 Learnings (1)
lib/strategy/worker.js (1)
Learnt from: oneWalker
PR: eggjs/egg-schedule#61
File: lib/strategy/worker.js:27-33
Timestamp: 2024-11-12T19:21:57.482Z
Learning: In the `WorkerStrategy` class in `lib/strategy/worker.js`, ensure that Redis locks are used only when cluster mode is enabled.
🪛 Biome
lib/lock/base.js

[error] 1-1: Redundant use strict directive.

The entire contents of JavaScript modules are automatically in strict mode, with no statement needed to initiate it.
Safe fix: Remove the redundant use strict directive.

(lint/suspicious/noRedundantUseStrict)

lib/lock/redis_lock.js

[error] 1-1: Redundant use strict directive.

The entire contents of JavaScript modules are automatically in strict mode, with no statement needed to initiate it.
Safe fix: Remove the redundant use strict directive.

(lint/suspicious/noRedundantUseStrict)

🔇 Additional comments (3)
lib/strategy/all.js (1)

6-8: LGTM! Good use of async/await and optional chaining.

The method signature and initialization are well implemented with proper error handling through optional chaining.

agent.js (1)

15-22: ⚠️ Potential issue

Improve lock manager initialization with better error handling and cleanup.

The lock manager initialization needs proper error handling, configuration validation, and cleanup.

The previous review comments about Redis connection handling are still valid:

  1. Add Redis connection cleanup
  2. Add error handling for Redis connection
  3. Add Redis configuration validation

Additionally, consider this improved implementation:

    if (agent?.config?.schedule?.cluster?.enable) {
-      if (
-        agent.config.schedule.cluster.lockType === 'redis' &&
-        agent.config.schedule.cluster.redis
-      ) {
-        agent.lockManager = new RedisLock(agent);
-      }
+      const { lockType, redis: redisConfig } = agent.config.schedule.cluster;
+      
+      try {
+        agent.lockManager = createLockManager(lockType, agent);
+        
+        // Register cleanup
+        agent.beforeClose(async () => {
+          if (agent.lockManager) {
+            await agent.lockManager.destroy();
+            agent.lockManager = null;
+          }
+        });
+      } catch (err) {
+        agent.logger.error('[egg-schedule] Failed to initialize lock manager:', err);
+        throw err;
+      }
    }
README.md (1)

270-278: Add cluster-specific Redis configuration guidance

While security improvements were previously suggested, the Redis configuration also needs cluster-specific guidance:

  1. Document Redis connection pool settings for cluster deployments
  2. Add retry/reconnection strategies for network issues
  3. Include high availability configuration examples

Add the following configuration examples:

     redis: {
       client: {
+        // Connection pool settings for cluster deployments
+        maxRetriesPerRequest: 3,
+        enableReadyCheck: true,
+        // High availability configuration
+        sentinels: [
+          { host: 'sentinel-1', port: 26379 },
+          { host: 'sentinel-2', port: 26379 }
+        ],
+        name: 'mymaster',
       },
     },

const curConfig = this.agent?.config?.schedule;
let lockedKey = '';
if (curConfig?.cluster?.enable) {
  lockedKey = this.key.replace(this.agent.baseDir, '');

🛠️ Refactor suggestion

Use path manipulation for reliable lock key generation.

The current lock key generation using string replace is fragile and may not work correctly across platforms or when the base directory path appears multiple times in the key.

Consider using path manipulation:

-      lockedKey = this.key.replace(this.agent.baseDir, '');
+      const path = require('path');
+      lockedKey = path.relative(this.agent.baseDir, this.key);

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +17 to +19
if (canBeLocked) {
  this.sendAll();
}

⚠️ Potential issue

Add error handling and make execution asynchronous.

The execution block needs proper error handling and should handle potential async operations in sendAll().

Consider adding try-catch and making it async:

     if (canBeLocked) {
-      this.sendAll();
+      try {
+        await this.sendAll();
+      } catch (error) {
+        this.agent.logger.error('[egg-schedule] Failed to execute scheduled task:', error);
+        throw error; // Re-throw to ensure lock is released in the finally block
+      }
     }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +12 to +14
if (!(await this.agent.lockManager.tryAcquire(lockedKey))) {
  canBeLocked = false;
}

⚠️ Potential issue

Add error handling and timeout for lock acquisition.

The lock acquisition needs proper error handling and a timeout to prevent indefinite waiting.

Consider adding try-catch and timeout:

-      if (!(await this.agent.lockManager.tryAcquire(lockedKey))) {
-        canBeLocked = false;
-      }
+      try {
+        const timeout = 5000; // 5 seconds timeout
+        if (!(await this.agent.lockManager.tryAcquire(lockedKey, timeout))) {
+          canBeLocked = false;
+        }
+      } catch (error) {
+        this.agent.logger.error('[egg-schedule] Failed to acquire lock:', error);
+        canBeLocked = false;
+      }

Comment on lines +21 to +23
if (curConfig?.cluster?.enable) {
  await this.agent.lockManager.release(lockedKey);
}

⚠️ Potential issue

Ensure lock release with proper error handling.

The lock release should be in a finally block and include error handling to ensure proper cleanup.

Restructure the code:

-    if (curConfig?.cluster?.enable) {
-      await this.agent.lockManager.release(lockedKey);
-    }
+    try {
+      if (canBeLocked) {
+        await this.sendAll();
+      }
+    } finally {
+      if (curConfig?.cluster?.enable) {
+        try {
+          await this.agent.lockManager.release(lockedKey);
+        } catch (error) {
+          this.agent.logger.error('[egg-schedule] Failed to release lock:', error);
+        }
+      }
+    }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +17 to +19
if (canBeLocked) {
  this.sendOne();
}

⚠️ Potential issue

Add error handling and ensure proper async execution

The task execution block lacks error handling and might not properly handle async operations.

Apply this diff:

     if (canBeLocked) {
-      this.sendOne();
+      try {
+        await this.sendOne();
+      } catch (error) {
+        this.agent.logger.error('[egg-schedule] Task execution failed:', error);
+        throw error; // Re-throw to trigger lock release in the finally block
+      }
     }

Comment on lines +11 to +37
/**
 * Require a lock from lock manager
 *
 * @param {string} _lockedKey - The key to lock
 */
async acquire() {
  // Implementation here
}

/**
 * Release a lock from lock manager
 *
 * @param {string} _lockedKey - The key to unlock
 */
async release() {
  // Implementation here
}

/**
 * Try to acquire without waiting
 *
 * @param {string} _lockedKey - The key to try to lock
 */
async tryAcquire() {
  // Use _lockedKey in the implementation
  // Implementation here
}

🛠️ Refactor suggestion

Enhance abstract methods documentation and contract

The abstract methods need improvements:

  1. Methods should be truly abstract
  2. JSDoc is incomplete
  3. Parameters are defined but unused

Consider this improved implementation:

   /**
    * Require a lock from lock manager
    *
-   * @param {string} _lockedKey - The key to lock
+   * @param {string} lockedKey - The key to lock
+   * @returns {Promise<boolean>} True if lock was acquired, false otherwise
+   * @throws {Error} If lock acquisition fails
+   * @abstract
    */
-  async acquire() {
-    // Implementation here
-  }
+  async acquire(lockedKey) {
+    throw new Error('Method not implemented');
+  }

   /**
    * Release a lock from lock manager
    *
-   * @param {string} _lockedKey - The key to unlock
+   * @param {string} lockedKey - The key to unlock
+   * @returns {Promise<void>}
+   * @throws {Error} If lock release fails
+   * @abstract
    */
-  async release() {
-    // Implementation here
-  }
+  async release(lockedKey) {
+    throw new Error('Method not implemented');
+  }

   /**
    * Try to acquire without waiting
    *
-   * @param {string} _lockedKey - The key to try to lock
+   * @param {string} lockedKey - The key to try to lock
+   * @returns {Promise<boolean>} True if lock was acquired, false if already locked
+   * @throws {Error} If lock attempt fails
+   * @abstract
    */
-  async tryAcquire() {
-    // Use _lockedKey in the implementation
-    // Implementation here
-  }
+  async tryAcquire(lockedKey) {
+    throw new Error('Method not implemented');
+  }

Additionally, consider adding implementation guidance in the class JSDoc:

/**
 * Base class for lock management implementations.
 * 
 * Implementations must:
 * 1. Handle distributed locking mechanism
 * 2. Ensure atomic lock operations
 * 3. Handle lock expiration
 * 4. Implement proper error handling
 * 
 * @abstract
 */

Comment on lines +4 to +9
constructor(agent) {
this.agent = agent;
this.client = null;
this.options = null;
this.prefixKey = `${this.agent?.config?.schedule?.default === 'default' ? this.agent.name : this.agent?.config?.schedule?.default}-${this.agent?.config?.schedule?.prefix}`;
}

🛠️ Refactor suggestion

Improve constructor's type safety and validation

The constructor needs several improvements:

  1. The prefixKey construction is prone to null reference errors
  2. Missing parameter validation
  3. Lacks comprehensive type definitions

Consider this improved implementation:

+/**
+ * @typedef {Object} Agent
+ * @property {Object} config
+ * @property {Object} config.schedule
+ * @property {string} config.schedule.default
+ * @property {string} config.schedule.prefix
+ * @property {string} name
+ */

+/**
+ * Base class for lock management
+ */
 module.exports = class LockManager {
+  /**
+   * @param {Agent} agent - The agent instance
+   * @throws {Error} If agent or required configuration is missing
+   */
   constructor(agent) {
+    if (!agent) {
+      throw new Error('Agent is required');
+    }
     this.agent = agent;
     this.client = null;
     this.options = null;
-    this.prefixKey = `${this.agent?.config?.schedule?.default === 'default' ? this.agent.name : this.agent?.config?.schedule?.default}-${this.agent?.config?.schedule?.prefix}`;
+    const scheduleConfig = agent.config?.schedule;
+    if (!scheduleConfig?.prefix) {
+      throw new Error('Schedule prefix configuration is required');
+    }
+    const defaultValue = scheduleConfig.default || 'default';
+    const prefix = defaultValue === 'default' ? agent.name : defaultValue;
+    this.prefixKey = `${prefix}-${scheduleConfig.prefix}`;
   }
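
For illustration, with the README defaults shown further down (default: 'default', prefix: 'schedule') and a hypothetical agent named my-app, the constructor above would produce:

// hypothetical values, for illustration only
const LockManager = require('./base');
const agent = {
  name: 'my-app',
  config: { schedule: { default: 'default', prefix: 'schedule' } },
};
const manager = new LockManager(agent);
console.log(manager.prefixKey); // 'my-app-schedule'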

README.md Outdated
Comment on lines 279 to 281
//the prefix for lockedKey for redis lock
default: 'default', // default schedule name,like project-name.
prefix: 'schedule', // default schedule prefix

🛠️ Refactor suggestion

Document Redis key patterns and management

The default and prefix configuration properties need better documentation to help users understand:

  1. The resulting Redis key pattern
  2. Impact on monitoring and debugging
  3. Considerations for multi-application environments

Add detailed documentation:

-  //the prefix for lockedKey for redis lock
-  default: 'default', // default schedule name,like project-name.
-  prefix: 'schedule', // default schedule prefix
+  // Redis key generation configuration
+  // The final Redis key pattern will be: `${prefix}:${default}:${scheduleName}`
+  // Example: "schedule:my-app:clean_cache"
+  default: 'default', // Application identifier, recommended to use your project name
+  prefix: 'schedule', // Global prefix for all schedule locks
+
+  // Example resulting Redis keys:
+  // - schedule:my-app:clean_cache
+  // - schedule:my-app:sync_data

Comment on lines +37 to +45
try {
lockedKey = `${this.prefixKey}-${lockedKey}`;
await this.client.del(lockedKey);
} catch (err) {
this.logger.error(
`[egg-schedule] ${this.key} release lock error: ${err.message}`
);
}
}

⚠️ Potential issue

Ensure locks are released only by the owner

Currently, the release method deletes the lock key without checking whether the releasing instance still owns the lock. In a distributed setup the lock may already have expired and been re-acquired by another process, so an unconditional DEL can remove a lock that now belongs to someone else. Consider adding a check so that only the owner that acquired the lock can release it.

One approach is to store a unique identifier (e.g., a UUID) when setting the lock and check this identifier when releasing the lock.

Example:

// In tryAcquire
this.lockValue = `${this.key}-${Date.now()}`;
const result = await this.client.set(lockedKey, this.lockValue, 'PX', expiredTime, 'NX');

// In release
async release(lockedKey) {
  try {
    lockedKey = `${this.prefixKey}-${lockedKey}`;
    const lockValue = await this.client.get(lockedKey);
    if (lockValue === this.lockValue) {
      await this.client.del(lockedKey);
    }
  } catch (err) {
    this.logger.error(
      `[egg-schedule] ${this.key} release lock error: ${err.message}`
    );
  }
}

This ensures that a lock is only released by the process that acquired it.
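
Note that a plain GET followed by DEL still leaves a small window in which the lock can expire and be re-acquired between the two calls. The pattern recommended in the Redis documentation is to perform the check-and-delete in a single Lua script, which ioredis can run via eval. A sketch, assuming this.lockValue was stored at acquisition time as above:

// module-level constant: atomic "delete only if I still own it"
const RELEASE_SCRIPT = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
else
  return 0
end`;

// inside the RedisLock class
async release(lockedKey) {
  lockedKey = `${this.prefixKey}-${lockedKey}`;
  try {
    // 1 key, then the owner token as the script argument
    await this.client.eval(RELEASE_SCRIPT, 1, lockedKey, this.lockValue);
  } catch (err) {
    this.logger.error(
      `[egg-schedule] ${this.key} release lock error: ${err.message}`
    );
  }
}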

@oneWalker
Author

This needs to be abstracted a bit to work out the implementation. Right now it depends hard on Redis, but not every company runs on Redis, and there are other distributed-lock implementations.
@fengmk2 I've put together an implementation, please take a look 👀. Hooks are reserved for the other lock types.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
README.md (2)

270-278: Add Redis connection pool and cluster configuration options

The Redis configuration should include additional options for production deployments:

  1. Connection pool settings for better performance
  2. Redis cluster/sentinel configuration

Enhance the Redis configuration:

     redis: {
       client: {
         port: 6379,
         host: '127.0.0.1',
         password: 'auth',
         db: 0,
+        // Connection Pool
+        maxRetriesPerRequest: 3,
+        enableReadyCheck: true,
+        connectionName: 'egg-schedule',
+        // Cluster/Sentinel Support
+        sentinels: process.env.REDIS_SENTINELS ? JSON.parse(process.env.REDIS_SENTINELS) : null,
+        sentinelPassword: process.env.REDIS_SENTINEL_PASSWORD,
+        name: process.env.REDIS_SENTINEL_NAME,
       },
     },

279-281: Document key collision prevention strategies

The key pattern configuration should address potential naming conflicts when multiple applications share the same Redis instance.

Add key collision prevention guidance:

-  // The final Redis key pattern will be: `${prefix}-${default}-${scheduleName}`
+  // The final Redis key pattern will be: `${prefix}:${default}:${scheduleName}`
+  // Key Collision Prevention:
+  // 1. Use unique application names in the 'default' field
+  // 2. Consider including environment (dev/staging/prod) in the prefix
+  // 3. Example: "schedule:myapp-prod:clean_cache"
   default: 'default', // default schedule name,like project-name.
   prefix: 'schedule', // default schedule prefix
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2f46c2c and 176ec2c.

📒 Files selected for processing (3)
  • README.md (1 hunks)
  • lib/strategy/all.js (1 hunks)
  • lib/strategy/worker.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/strategy/all.js
🧰 Additional context used
📓 Learnings (1)
lib/strategy/worker.js (1)
Learnt from: oneWalker
PR: eggjs/egg-schedule#61
File: lib/strategy/worker.js:27-33
Timestamp: 2024-11-12T19:21:57.482Z
Learning: In the `WorkerStrategy` class in `lib/strategy/worker.js`, ensure that Redis locks are used only when cluster mode is enabled.
🔇 Additional comments (4)
lib/strategy/worker.js (4)

7-10: LGTM! Good use of optional chaining and variable initialization

The initialization of the handler method is well-structured with:

  • Proper async/await usage
  • Safe configuration access using optional chaining
  • Clear variable scoping

18-20: ⚠️ Potential issue

Add proper async/await and error handling for task execution

The task execution needs to:

  1. Properly await the async operation
  2. Include error handling

Apply this diff:

 if (canBeLocked) {
-  this.sendOne();
+  try {
+    await this.sendOne();
+  } catch (error) {
+    this.agent.logger.error('[egg-schedule] Task execution failed:', error);
+    throw error;  // Ensure the error propagates for proper lock cleanup
+  }
 }

Likely invalid or redundant comment.


4-4: Verify the impact of class rename from WorkerStrategy to AllStrategy

The class has been renamed from WorkerStrategy to AllStrategy. This change might affect existing code that imports and uses this class.

Also applies to: 6-6


22-24: ⚠️ Potential issue

Ensure reliable lock cleanup with proper error handling

The lock release logic should:

  1. Be in a try-finally block to guarantee cleanup
  2. Include error handling
  3. Only attempt release if lock was acquired

Restructure the entire handler method with this diff:

 async handler() {
   let canBeLocked = true;
   const curConfig = this.agent?.config?.schedule;
   let lockedKey = '';
+  let lockAcquired = false;

   if (curConfig?.cluster?.enable) {
     try {
       lockedKey = path.relative(this.agent.baseDir, this.key)
         .replace(/[^a-zA-Z0-9]/g, '_');
       
       if (await this.agent.lockManager.tryAcquire(lockedKey, { ttl: 60000 })) {
+        lockAcquired = true;
+      } else {
         canBeLocked = false;
       }
     } catch (error) {
       this.agent.logger.error('[egg-schedule] Failed to acquire lock:', error);
       canBeLocked = false;
     }
   }

   try {
     if (canBeLocked) {
       await this.sendOne();
     }
   } finally {
     if (curConfig?.cluster?.enable && lockAcquired) {
       try {
         await this.agent.lockManager.release(lockedKey);
       } catch (error) {
         this.agent.logger.error('[egg-schedule] Failed to release lock:', error);
       }
     }
   }
 }

Likely invalid or redundant comment.

Comment on lines +11 to +16
if (curConfig?.cluster?.enable) {
lockedKey = path.relative(this.agent.baseDir, this.key);
if (!(await this.agent.lockManager.tryAcquire(lockedKey))) {
canBeLocked = false;
}
}

🛠️ Refactor suggestion

Enhance lock acquisition robustness and error handling

The lock acquisition logic needs improvements in several areas:

  1. Lock key construction should handle edge cases
  2. Lock acquisition should include error handling
  3. Consider adding a TTL to prevent stale locks

Apply this diff:

 if (curConfig?.cluster?.enable) {
-  lockedKey = path.relative(this.agent.baseDir, this.key);
-  if (!(await this.agent.lockManager.tryAcquire(lockedKey))) {
-    canBeLocked = false;
+  try {
+    // Ensure safe lock key construction
+    lockedKey = path.relative(this.agent.baseDir, this.key)
+      .replace(/[^a-zA-Z0-9]/g, '_');  // sanitize key
+    
+    // Add TTL to prevent stale locks
+    if (!(await this.agent.lockManager.tryAcquire(lockedKey, { ttl: 60000 }))) {
+      canBeLocked = false;
+    }
+  } catch (error) {
+    this.agent.logger.error('[egg-schedule] Failed to acquire lock:', error);
+    canBeLocked = false;
   }
 }
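
For context on the ttl option used above: the lock manager has to honor it when setting the key, which with ioredis is a single SET ... PX ... NX call. A sketch of a TTL-aware tryAcquire (the exact signature in this PR may differ):

// inside RedisLock — illustrative, not the PR's exact implementation
async tryAcquire(lockedKey, { ttl = 6000 } = {}) {
  const key = `${this.prefixKey}-${lockedKey}`;
  // unique owner token so release can verify ownership (see the earlier comment)
  this.lockValue = `${this.key}-${process.pid}-${Date.now()}`;
  // SET key value PX ttl NX — succeeds only if the key does not already exist
  const result = await this.client.set(key, this.lockValue, 'PX', ttl, 'NX');
  return result === 'OK';
}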

Comment on lines +264 to +269
// add it when making sure it only running in one cluster.
cluster: {
enable: false, // default is false, true is for running
lockType: 'redis', //use redis as lock. reserve for other lock type
lockedTtl: 6000, // the automatic expired time for lock, avoid lock leak
// add redis for redis lock

🛠️ Refactor suggestion

Document failure scenarios and high availability considerations

The cluster configuration documentation should include:

  1. How the system behaves during Redis failures
  2. Recovery mechanisms for lock acquisition failures
  3. Recommendations for high availability setups

Add the following section:

   cluster: {
     enable: false, // default is false, true is for running
     lockType: 'redis', //use redis as lock. reserve for other lock type
     lockedTtl: 6000, // the automatic expired time for lock, avoid lock leak
+    // Failure Handling:
+    // - If Redis is temporarily unavailable, tasks will execute on all nodes
+    // - If a node crashes while holding a lock, the lock expires after lockedTtl
+    // High Availability:
+    // - Recommended to use Redis sentinel or cluster for HA
+    // - Consider setting lockedTtl based on your task's typical execution time
   },
