
multiple optimization #21

Merged
lvjing2 merged 10 commits into main from youji-dev
Nov 26, 2024

Conversation

@lvjing2
Contributor

@lvjing2 lvjing2 commented Nov 14, 2024

Summary by CodeRabbit

  • New Features

    • Enhanced error handling and context management in the Module Deployment Controller.
    • Introduction of new methods for better deployment management and related functionalities.
    • Comprehensive test suites established for both MQTT and HTTP modules, focusing on lifecycle events.
    • Transition from Pod to Deployment in Kubernetes configuration for improved management of replicas.
    • New dependency added for improved logging capabilities.
  • Bug Fixes

    • Improved clarity and consistency in method signatures and parameter handling across modules.
  • Documentation

    • Updated comments and descriptions for constants and methods to reflect operational modes and naming conventions.
  • Refactor

    • Significant restructuring of metadata handling and method signatures across various modules to improve clarity and maintainability.
    • Renamed and restructured several types and fields in the model for enhanced specificity.
    • Removed obsolete module_tunnel.go file and its associated functionalities.
  • Chores

    • Updated dependency version for github.com/koupleless/virtual-kubelet.
    • Added new test cases and improved test suite organization for better coverage and clarity.

@coderabbitai

coderabbitai bot commented Nov 14, 2024

Walkthrough

The changes in this pull request involve updates to various Go files, primarily focusing on dependency version updates, significant refactoring of tunnel management, and enhancements to data structures and method signatures. The go.mod file reflects a dependency version change, while multiple files related to HTTP and MQTT tunnels have undergone structural modifications, including renaming methods for clarity and adjusting parameter types. Additionally, the ModuleDeploymentController has been streamlined by removing unnecessary fields and improving context management across its methods. Overall, these changes aim to enhance code clarity and maintainability.

Changes

File Path Change Summary
go.mod Added github.com/go-logr/logr v1.4.2; updated github.com/koupleless/virtual-kubelet from v0.3.4 to v0.3.5-0.20241125132309-00fba45cbde1.
module_tunnels/koupleless_http_tunnel/http_tunnel.go Refactored HttpTunnel struct; renamed methods for clarity; updated parameters to derive nodeID from initData or nodeName; adjusted internal data structures.
module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go Refactored MqttTunnel struct; renamed methods for clarity; updated parameters to derive nodeID from nodeInfo.Metadata.Name.
cmd/module-controller/main.go Introduced startTunnels function for tunnel management; enhanced error handling; modified initialization sequence for controllers.
common/model/consts.go Updated comments for MQTT topics; replaced LabelKeyOfArkletPort with LabelKeyOfTunnelPort.
common/model/model.go Renamed and restructured types and fields for clarity; introduced new fields in BaseMetadata and BaseStatus.
common/utils/utils.go Renamed functions to align with new data structures; updated internal logic for consistency.
common/utils/utils_test.go Updated test cases to reflect renamed functions and new data structures.
controller/module_deployment_controller/module_deployment_controller.go Removed tunnels field; added context parameters to several methods; introduced new methods for fetching related deployments.
controller/module_deployment_controller/module_deployment_controller_test.go Updated test cases to include context parameters in method calls.
module_tunnels/koupleless_http_tunnel/ark_service/ark_service.go Renamed arkletPort parameter to port in QueryAllBiz method.
module_tunnels/koupleless_http_tunnel/ark_service/model.go Renamed BaseID field to BaseIdentity in ArkResponse; removed BaseID from other response structures.
module_tunnels/module_tunnel.go Removed file defining module_tunnels package, including QueryBaseline type and ModuleTunnel interface.
suite/http/base_http_lifecycle_test.go Transitioned test suite from MQTT to HTTP; updated test contexts and mock objects accordingly.
suite/http/mock_http_base.go Updated MockHttpBase struct to use BaseMetadata; modified methods to reflect new metadata structure.
suite/http/suite_test.go Established a comprehensive testing framework for the Module Controller; integrated various components necessary for tests.
suite/mqtt/base_mqtt_lifecycle_test.go Introduced tests for MQTT base lifecycle events; focused on online and offline states.
suite/mqtt/mock_mqtt_base.go Updated MockMQTTBase struct and methods to use BaseMetadata; streamlined metadata handling.
suite/mqtt/suite_test.go Restructured test suite for MQTT module; simplified initialization and error handling.
suite/http/module_http_lifecycle_test.go Updated test suite for HTTP; modified mock object initialization and assertions.
suite/mqtt/module_deployment_controller_suite_test.go Updated mock object initialization; adjusted deployment specifications and assertions.
samples/module_controller_pod.yaml Changed resource type from Pod to Deployment; updated metadata and spec sections for deployment management.

Possibly related PRs

Poem

🐇 In tunnels deep where data flows,
A rabbit hops where code now grows.
With names anew and paths so bright,
We celebrate this coding flight!
From MQTT to HTTP's grace,
Our changes bring a happy face! 🌟

Warning

Rate limit exceeded

@lvjing2 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 20 minutes and 59 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 8ca3ef2 and 241bd3f.



@codecov

codecov bot commented Nov 14, 2024

Codecov Report

Attention: Patch coverage is 84.88064% with 57 lines in your changes missing coverage. Please review.

Project coverage is 79.38%. Comparing base (1c07ba4) to head (241bd3f).
Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
...loyment_controller/module_deployment_controller.go 70.73% 28 Missing and 8 partials ⚠️
...dule_tunnels/koupleless_http_tunnel/http_tunnel.go 79.59% 10 Missing ⚠️
...dule_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go 92.39% 6 Missing and 1 partial ⚠️
suite/http/mock_http_base.go 88.88% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #21      +/-   ##
==========================================
- Coverage   81.29%   79.38%   -1.92%     
==========================================
  Files          12       12              
  Lines        1834     1887      +53     
==========================================
+ Hits         1491     1498       +7     
- Misses        287      323      +36     
- Partials       56       66      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go (3)

Line range hint 273-280: Address TODO comments regarding PodKey assignment

There are commented-out PodKey assignments in both success handlers. This needs to be addressed to ensure proper pod key management.

Would you like me to help implement the PodKey assignment logic?

Also applies to: 282-289


301-306: Add error handling for JSON marshaling operations

The StartBiz and StopBiz methods don't check for JSON marshaling errors. While unlikely to fail for valid container objects, it's good practice to handle these errors.

Consider applying this pattern:

-	installBizRequestBytes, _ := json.Marshal(bizModel)
+	installBizRequestBytes, err := json.Marshal(bizModel)
+	if err != nil {
+		return fmt.Errorf("failed to marshal biz model: %w", err)
+	}

Also applies to: 309-314


293-294: Consider using QoS 1 for critical operations

Currently, all MQTT publish operations use QoS 0 (at most once delivery). While this is acceptable for health checks and status queries, consider using QoS 1 (at least once delivery) for critical operations like installing and uninstalling biz modules, to ensure message delivery.

Critical operations that should use QoS 1:

  • StartBiz (CommandInstallBiz)
  • StopBiz (CommandUnInstallBiz)

Also applies to: 297-298, 301-306, 309-314
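As a sketch of how this could look, assuming a `Pub(topic, qos, payload)` client method like the one the mock bases expose, and with the command names taken from the constants mentioned above (the stand-in types here are hypothetical, not the real tunnel code), QoS selection could be centralized in one helper:

```go
package main

import "fmt"

// MQTT QoS levels: 0 = at most once, 1 = at least once.
const (
	qosAtMostOnce  byte = 0
	qosAtLeastOnce byte = 1
)

// publisher is a stand-in for the real MQTT client's Pub method.
type publisher interface {
	Pub(topic string, qos byte, payload []byte)
}

// qosFor picks the delivery guarantee by command criticality:
// install/uninstall must arrive, health and status queries may be lossy.
func qosFor(command string) byte {
	switch command {
	case "installBiz", "uninstallBiz":
		return qosAtLeastOnce
	default:
		return qosAtMostOnce
	}
}

// printClient is a trivial publisher that just logs each publish.
type printClient struct{}

func (printClient) Pub(topic string, qos byte, payload []byte) {
	fmt.Printf("pub %s qos=%d\n", topic, qos)
}

func main() {
	var c publisher = printClient{}
	// Critical command: delivered at least once.
	c.Pub("koupleless_dev/base-1/installBiz", qosFor("installBiz"), []byte("{}"))
	// Lossy health probe: at most once is fine.
	c.Pub("koupleless_dev/base-1/health", qosFor("queryHealth"), []byte("{}"))
}
```

Routing all publishes through one `qosFor`-style helper keeps the delivery-guarantee policy in a single place instead of scattering literal `0`/`1` arguments across `StartBiz` and `StopBiz`.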

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1c07ba4 and a66509b.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • go.mod (1 hunks)
  • module_tunnels/koupleless_http_tunnel/http_tunnel.go (2 hunks)
  • module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go (10 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go

[warning] 49-49: module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go#L49
Added line #L49 was not covered by tests


[warning] 123-123: module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go#L123
Added line #L123 was not covered by tests

🔇 Additional comments (6)
go.mod (2)

12-12: Verify handling of breaking changes in callback names.

The update includes breaking changes in callback names (OnNodeDiscovered → OnBaseDiscovered, OnNodeStatusDataArrived → OnBaseStatusArrived). Ensure all usages are updated.

✅ Verification successful

The search results show no occurrences of the old callback names (OnNodeDiscovered and OnNodeStatusDataArrived) in the codebase, while the new callback names (OnBaseDiscovered and OnBaseStatusArrived) are properly used in both MQTT and HTTP tunnel implementations.

✅ Callback names are already updated correctly

The codebase shows no instances of the old callback names and correctly uses the new callback names (OnBaseDiscovered, OnBaseStatusArrived) in all relevant tunnel implementations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for any remaining uses of old callback names that need updating

# Search for old callback names
echo "Searching for old callback names that might need updating..."
rg "OnNodeDiscovered|OnNodeStatusDataArrived" -t go

# Search for new callback names to verify migration
echo "Verifying migration to new callback names..."
rg "OnBaseDiscovered|OnBaseStatusArrived" -t go

Length of output: 1333


12-12: Verify the use of pseudo-version instead of a stable release.

The dependency has been updated from a stable release v0.3.4 to a pseudo-version v0.3.5-0.20241114123757-27017ee5f1c0. While this allows tracking a specific commit, it's generally recommended to use stable releases in production unless there's a specific need for unreleased changes.

✅ Verification successful



Use of pseudo-version is justified but requires careful testing

The dependency update to v0.3.5-0.20241114123757-27017ee5f1c0 is necessary as it contains critical fixes for data consistency and the codebase already implements the renamed callback functions (OnBaseDiscovered, OnBaseStatusArrived) in both HTTP and MQTT tunnels. Since the latest stable release is still v0.3.4 and these changes are required, using this specific commit is acceptable.

  • The commit fixes known pods data consistency issues
  • The callback function renames are properly implemented in:
    • module_tunnels/koupleless_http_tunnel/http_tunnel.go
    • module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the commit exists and check its changes

# Fetch the specific commit information
gh api repos/koupleless/virtual-kubelet/commits/27017ee5f1c0 --jq '.commit.message'

# Check if there's a newer stable release that includes these changes
gh api repos/koupleless/virtual-kubelet/releases --jq '.[0].tag_name'

Length of output: 193


Script:

#!/bin/bash
# Check for direct imports of virtual-kubelet packages
rg "github.com/koupleless/virtual-kubelet" --type go -l

# Look for any references to the callback functions that were renamed
rg "OnNodeDiscovered|OnNodeStatusDataArrived|OnBaseDiscovered|OnBaseStatusArrived" --type go -A 2

Length of output: 2538

module_tunnels/koupleless_http_tunnel/http_tunnel.go (2)

41-42: LGTM! Callback field types updated consistently.

The callback field types have been updated from node-centric to base-centric naming, maintaining consistency with the virtual-kubelet dependency update.


95-95: LGTM! Method signature updated to match new types.

The RegisterCallback method signature has been updated to use the new base-centric callback types, aligning with the struct field changes.

Let's verify that all callers of RegisterCallback have been updated:

module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go (2)

36-40: LGTM! Improved naming convention for callback types

The renaming from Node to Base in callback types provides better clarity about their purpose and maintains consistency across the codebase.


49-49: Add test coverage for critical paths

The following code paths lack test coverage:

  1. GetBizUniqueKey method which is crucial for business identity
  2. Node reconnection logic in the MQTT connection handler

Consider adding test cases to verify:

  • Business identity generation with various container configurations
  • MQTT reconnection behavior with existing nodes

Would you like me to help generate test cases for these scenarios?

Also applies to: 122-123
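A table-driven test along these lines could cover the identity-generation path. The real GetBizUniqueKey takes a container, so the helper below is a hypothetical stand-in that only assumes the identity combines biz name and version, which is the usual arklet convention:

```go
package main

import "fmt"

// bizUniqueKey is a stand-in for GetBizUniqueKey; the real method takes a
// container object, but the identity is assumed to join name and version.
func bizUniqueKey(name, version string) string {
	return fmt.Sprintf("%s:%s", name, version)
}

func main() {
	// Table-driven cases covering distinct names, versions, and empties.
	cases := []struct{ name, version, want string }{
		{"biz1", "1.0.0", "biz1:1.0.0"},
		{"biz1", "2.0.0", "biz1:2.0.0"}, // same name, new version => new key
		{"", "", ":"},                   // empty config still deterministic
	}
	for _, c := range cases {
		got := bizUniqueKey(c.name, c.version)
		if got != c.want {
			panic(fmt.Sprintf("bizUniqueKey(%q, %q) = %q, want %q",
				c.name, c.version, got, c.want))
		}
	}
	fmt.Println("all cases passed")
}
```

The reconnection path is harder to cover and would need a mock broker like the suite's MockMQTTBase; the key assertion there is that an already-known node does not get re-registered after reconnect.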

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 49-49: module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go#L49
Added line #L49 was not covered by tests


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 19

🧹 Outside diff range and nitpick comments (24)
controller/module_deployment_controller/module_deployment_controller_test.go (1)

16-16: Consider adding test cases for context behavior

While the context parameters have been added correctly, consider adding test cases to verify:

  1. Context cancellation handling
  2. Context timeout scenarios
  3. Context propagation to downstream operations

This would ensure the new context handling is thoroughly tested.

Example test case:

func TestDeploymentHandlerContextCancellation(t *testing.T) {
    c := &ModuleDeploymentController{
        runtimeStorage: NewRuntimeInfoStore(),
        updateToken:    make(chan interface{}),
    }
    
    ctx, cancel := context.WithCancel(context.Background())
    cancel() // Cancel context immediately
    
    // Verify handler behaves correctly with cancelled context
    c.deploymentAddHandler(ctx, nil)
    assert.Equal(t, 0, len(c.runtimeStorage.peerDeploymentMap))
}

Also applies to: 20-20

common/model/model.go (1)

13-17: LGTM! Consider enhancing documentation.

The renaming to BaseMetadata and field restructuring improves clarity and multi-cluster support. The struct is well-organized with proper JSON tags.

Consider adding more detailed documentation for each field:

 // BaseMetadata contains basic identifying information
 type BaseMetadata struct {
-    Identity    string `json:"identity"`
-    Version     string `json:"version"`     // Version identifier
-    ClusterName string `json:"clusterName"` // ClusterName of the resource communicate with base
+    // Identity uniquely identifies the resource within its cluster
+    Identity    string `json:"identity"`
+    // Version identifier of the resource
+    Version     string `json:"version"`
+    // ClusterName identifies which cluster this resource belongs to
+    ClusterName string `json:"clusterName"`
 }
module_tunnels/koupleless_http_tunnel/ark_service/ark_service.go (1)

74-79: Consider maintaining consistent parameter naming across methods.

While the implementation is correct, the parameter renaming from arkletPort to port in QueryAllBiz creates an inconsistency with other similar methods (InstallBiz, UninstallBiz, Health) which still use arkletPort.

Consider either:

  1. Keeping arkletPort for consistency with other methods, or
  2. Updating all methods to use port for a unified naming scheme:
- func (h *Service) InstallBiz(ctx context.Context, req InstallBizRequest, baseIP string, arkletPort int)
+ func (h *Service) InstallBiz(ctx context.Context, req InstallBizRequest, baseIP string, port int)

- func (h *Service) UninstallBiz(ctx context.Context, req UninstallBizRequest, baseIP string, arkletPort int)
+ func (h *Service) UninstallBiz(ctx context.Context, req UninstallBizRequest, baseIP string, port int)

- func (h *Service) Health(ctx context.Context, baseIP string, arkletPort int)
+ func (h *Service) Health(ctx context.Context, baseIP string, port int)
suite/module_mqtt_lifecycle_test.go.bak (1)

33-36: Consider enhancing the error message for better debugging.

Good addition of structured error logging. However, the error message could be more descriptive to aid in debugging.

Consider this improvement:

-log.G(ctx).Error(err, "get vnode error")
+log.G(ctx).Error(err, "failed to get vnode", "nodeName", nodeName)

This change adds context about which node failed to be retrieved, making it easier to diagnose issues.

suite/mock_http_base.go (2)

185-188: Fix inconsistent indentation in struct initialization

While the field updates are correct, the indentation in the struct initialization is inconsistent.

-						BizName:    b.Metadata.Identity,
-						BizState:   b.CurrState,
-						BizVersion: b.Metadata.Version,
+                       BizName:    b.Metadata.Identity,
+                       BizState:   b.CurrState,
+                       BizVersion: b.Metadata.Version,

68-68: Consider documenting mock server behavior

Since this is a mock implementation used for testing, consider adding comments describing the expected behavior and test scenarios this mock server is designed to handle. This would help other developers understand its purpose and limitations.

common/utils/utils.go (1)

121-126: Consider making tech stack label configurable

The function hardcodes "java" as the tech stack label. Consider making this configurable to support other tech stacks in the future.

-func ConvertBaseMetadataToBaselineQuery(data model.BaseMetadata) model.QueryBaselineRequest {
+func ConvertBaseMetadataToBaselineQuery(data model.BaseMetadata, techStack string) model.QueryBaselineRequest {
     return model.QueryBaselineRequest{
         Identity:    data.Identity,
         ClusterName: data.ClusterName,
         Version:     data.Version,
         CustomLabels: map[string]string{
-            model.LabelKeyOfTechStack: "java",
+            model.LabelKeyOfTechStack: techStack,
         },
     }
 }
suite/mock_mqtt_base.go (3)

19-24: LGTM! Consider enhancing the field comment.

The field renaming from ID to Name and the introduction of BaseMetadata improve code clarity. However, the comment "node name or id" could be more specific.

Consider updating the comment to be more precise:

-	Name         string // node name or id
+	Name         string // Unique identifier for the node instance

255-258: Reduce duplication by extracting topic format strings.

There's significant duplication in MQTT topic formatting across multiple methods. Consider extracting these into constants or helper methods.

+const (
+    baseHealthTopic = "koupleless_%s/%s/base/health"
+    baseBizTopic = "koupleless_%s/%s/base/biz"
+    baseHeartTopic = "koupleless_%s/%s/base/heart"
+    baseQueryBaselineTopic = "koupleless_%s/%s/base/queryBaseline"
+)
+
+func (b *MockMQTTBase) formatTopic(topicFormat string) string {
+    return fmt.Sprintf(topicFormat, b.Env, b.BaseMetadata.Identity)
+}

 func (b *MockMQTTBase) SendInvalidMessage() {
-    b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/health", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
-    b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/biz", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
-    b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/heart", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
-    b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/queryBaseline", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
+    b.client.Pub(b.formatTopic(baseHealthTopic), 1, []byte(""))
+    b.client.Pub(b.formatTopic(baseBizTopic), 1, []byte(""))
+    b.client.Pub(b.formatTopic(baseHeartTopic), 1, []byte(""))
+    b.client.Pub(b.formatTopic(baseQueryBaselineTopic), 1, []byte(""))

Also applies to: 262-265, 269-270


Line range hint 209-251: Improve error handling in message processing methods.

The message processing methods (processInstallBiz, processUnInstallBiz) should handle JSON unmarshaling errors and validate the request data.

 func (b *MockMQTTBase) processInstallBiz(msg []byte) {
 	b.Lock()
 	defer b.Unlock()
 	request := ark.BizModel{}
-	json.Unmarshal(msg, &request)
+	if err := json.Unmarshal(msg, &request); err != nil {
+		// Consider logging the error or sending an error response
+		return
+	}
+	if request.BizName == "" || request.BizVersion == "" {
+		// Consider logging the error or sending an error response
+		return
+	}
 	identity := getBizIdentity(request)
 	_, has := b.BizInfos[identity]
 	// ... rest of the method
common/utils/utils_test.go (1)

275-284: Consider adding edge cases to test suite

While the test cases cover the basic scenarios for the new BaseStatus structure, consider adding the following edge cases:

  1. Empty BaseMetadata fields
  2. Invalid state values
  3. Zero/negative port numbers

Also applies to: 303-311
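A minimal table-driven sketch of those edge cases, using a trimmed stand-in for BaseStatus and a hypothetical validate helper (field names and state values are assumptions, not the real model):

```go
package main

import "fmt"

// baseStatus is a minimal stand-in for model.BaseStatus.
type baseStatus struct {
	Identity string
	Version  string
	State    string
	Port     int
}

// validate is a hypothetical checker illustrating the edge cases worth testing.
func validate(s baseStatus) error {
	if s.Identity == "" || s.Version == "" {
		return fmt.Errorf("empty metadata field")
	}
	switch s.State {
	case "ACTIVATED", "DEACTIVATED":
		// known states
	default:
		return fmt.Errorf("invalid state %q", s.State)
	}
	if s.Port <= 0 || s.Port > 65535 {
		return fmt.Errorf("port %d out of range", s.Port)
	}
	return nil
}

func main() {
	cases := []struct {
		name string
		in   baseStatus
		ok   bool
	}{
		{"valid", baseStatus{"base-1", "1.0.0", "ACTIVATED", 1238}, true},
		{"empty metadata", baseStatus{"", "", "ACTIVATED", 1238}, false},
		{"invalid state", baseStatus{"base-1", "1.0.0", "UNKNOWN", 1238}, false},
		{"negative port", baseStatus{"base-1", "1.0.0", "ACTIVATED", -1}, false},
	}
	for _, c := range cases {
		err := validate(c.in)
		if (err == nil) != c.ok {
			panic(fmt.Sprintf("%s: unexpected result %v", c.name, err))
		}
	}
	fmt.Println("all edge cases behave as expected")
}
```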

suite/suite_test.go (2)

109-109: Consider making the HTTP tunnel port configurable

The httpTunnel is initialized with a hardcoded port 7777. To enhance flexibility and maintainability, consider making the port configurable via environment variables or configuration files.


119-119: Use consistent error handling across the setup

Consider replacing the panic call with Expect(err).NotTo(HaveOccurred()) to maintain consistency with the error handling pattern used elsewhere in the test setup.

cmd/module-controller/main.go (4)

208-211: Simplify panic error construction.

When creating error messages within panic, it's more idiomatic to use fmt.Errorf for formatted messages, and plain errors.New when no formatting is needed, instead of wrapping fmt.Sprintf with errors.New.

Apply this diff to simplify the panic calls:

    if startFailedCount > 0 {
-       panic(errors.New(fmt.Sprintf("failed to start %d tunnels", startFailedCount)))
+       panic(fmt.Errorf("failed to start %d tunnels", startFailedCount))
    } else if successTunnelCount == 0 {
-       panic(errors.New(fmt.Sprintf("successfully started 0 tunnels")))
+       panic(errors.New("successfully started 0 tunnels"))
    }

197-200: Include tunnel key properly in error logs.

When logging the error for a failed tunnel start, the tunnel key is passed as a separate argument, which may not be included in the log message as intended. It's better to include it using structured logging.

Apply this diff to include the tunnel key in the log:

    if err != nil {
-       log.G(ctx).WithError(err).Error("failed to start tunnel", t.Key())
+       log.G(ctx).WithError(err).WithField("tunnel", t.Key()).Error("failed to start tunnel")
        startFailedCount++
    }

212-214: Handle multiple tunnels or clarify the intention.

Currently, even if multiple tunnels are started successfully, only the first tunnel is returned. This could lead to issues if the application expects to use all started tunnels.

Consider updating the code to handle multiple tunnels properly or document that only the first tunnel will be used. For example:

-    // we only using one tunnel for now
-    return tunnels[0]
+    // Currently, only the first tunnel is used. Update this logic if multiple tunnels need to be supported.
+    return tunnels[0]

169-170: Maintain consistent casing in clientID variable names.

The variable clientId in the startTunnels function uses a lowercase 'd', while clientID in main uses an uppercase 'D'. For consistency and readability, it is recommended to use the same casing throughout the code.

Apply this diff to standardize the variable name:

- func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
+ func startTunnels(ctx context.Context, clientID string, env string, mgr manager.Manager,
        moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) tunnel.Tunnel {

    // Update all usages within the function
-       err := t.Start(ctx, clientId, env)
+       err := t.Start(ctx, clientID, env)

Also applies to: 197-197

module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go (1)

14-14: Use a descriptive alias instead of utils2

Aliasing the package as utils2 may reduce code readability. Consider using a more meaningful alias that reflects the package's purpose, such as vkutils.

Apply this diff to update the alias:

-import (
    	...
-	utils2 "github.com/koupleless/virtual-kubelet/common/utils"
+	vkutils "github.com/koupleless/virtual-kubelet/common/utils"
    	...
controller/module_deployment_controller/module_deployment_controller.go (6)

223-225: Consistent logging: Replace logrus with log.G(ctx).

In GetRelatedDeploymentsByNode, logrus.Warnf is used, which is inconsistent with the rest of the code that uses log.G(ctx). For consistent logging and context-aware logging, replace logrus with log.G(ctx).

Apply this diff:

-		logrus.Warnf("failed to get cluster name of node %s", node.Name)
+		log.G(ctx).Warnf("failed to get cluster name of node %s", node.Name)

235-237: Consistent logging: Use log.G(ctx) for error logging.

When logging errors, use log.G(ctx).WithError(err).Error(...) instead of logrus.WithError(err).Error(...) to maintain consistency and context awareness.

Apply this diff:

-		logrus.WithError(err).Error("failed to list deployments")
+		log.G(ctx).WithError(err).Error("failed to list deployments")

Line range hint 269-279: Potential deadlock due to buffered channel in updateDeploymentReplicas.

Using <-mdc.updateToken and mdc.updateToken <- nil with a buffered channel size of 1 can cause deadlocks if multiple goroutines access updateDeploymentReplicas. This could block the function indefinitely.

Consider using synchronization primitives like mutexes or redesigning the channel usage to prevent deadlocks.
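A sketch of the mutex variant, with a trimmed, hypothetical stand-in for the controller (the real type holds much more state): a sync.Mutex with defer cannot strand the token the way an acquire/release pair around a channel can if the critical section panics.

```go
package main

import (
	"fmt"
	"sync"
)

// controller is a trimmed stand-in for ModuleDeploymentController;
// the mutex replaces the updateToken channel.
type controller struct {
	mu       sync.Mutex
	replicas map[string]int32
}

// updateDeploymentReplicas serializes updates with Lock/defer Unlock,
// so the lock is released even if the body panics mid-update.
func (c *controller) updateDeploymentReplicas(name string, replicas int32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.replicas[name] = replicas
}

func main() {
	c := &controller{replicas: map[string]int32{}}
	var wg sync.WaitGroup
	// Many goroutines updating the same deployment concurrently:
	// the mutex keeps the map access safe without any token bookkeeping.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int32) {
			defer wg.Done()
			c.updateDeploymentReplicas("deploy-a", n)
		}(int32(i))
	}
	wg.Wait()
	fmt.Println("deployments tracked:", len(c.replicas))
}
```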


286-291: Improve error message clarity when cluster name is missing.

The warning message "failed to get cluster name of deployment %s, skip to update replicas" could be more informative.

Consider specifying why the cluster name is missing or provide guidance on resolving the issue.


310-335: Simplify and optimize the getClusterNameFromDeployment function.

The function getClusterNameFromDeployment can be refactored for clarity and efficiency by:

  • Returning early when a match is found.
  • Avoiding unnecessary nested conditions.
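The early-return shape can be illustrated on a trimmed, hypothetical stand-in for the affinity match expressions the real function walks (the type and label key below are assumptions, not the actual model):

```go
package main

import "fmt"

// matchExpr is a trimmed stand-in for a node-affinity match expression.
type matchExpr struct {
	Key    string
	Values []string
}

// clusterNameFromExprs returns as soon as it finds an expression whose key is
// the cluster-name label, instead of nesting conditions and carrying a result
// variable to the end of the function.
func clusterNameFromExprs(exprs []matchExpr, clusterLabelKey string) string {
	for _, e := range exprs {
		if e.Key != clusterLabelKey || len(e.Values) == 0 {
			continue // guard clause: skip non-matching expressions early
		}
		return e.Values[0] // early return: first match wins
	}
	return ""
}

func main() {
	exprs := []matchExpr{
		{Key: "topology.kubernetes.io/zone", Values: []string{"z1"}},
		{Key: "vnode.cluster-name", Values: []string{"cluster-a"}},
	}
	fmt.Println(clusterNameFromExprs(exprs, "vnode.cluster-name"))
}
```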

338-342: Simplify the getClusterNameFromNode function.

The function can be simplified as it only checks for the existence of a label.

Apply this diff:

 func getClusterNameFromNode(node *corev1.Node) string {
-	if clusterName, has := node.Labels[vkModel.LabelKeyOfVNodeClusterName]; has {
-		return clusterName
-	}
-	return ""
+	return node.Labels[vkModel.LabelKeyOfVNodeClusterName]
 }

A lookup on a missing map key already yields the empty string, so both the branch and the explicit empty return can be dropped without changing behavior.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a66509b and f97acee.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (20)
  • cmd/module-controller/main.go (3 hunks)
  • common/model/consts.go (1 hunks)
  • common/model/model.go (1 hunks)
  • common/utils/utils.go (5 hunks)
  • common/utils/utils_test.go (10 hunks)
  • controller/module_deployment_controller/module_deployment_controller.go (7 hunks)
  • controller/module_deployment_controller/module_deployment_controller_test.go (2 hunks)
  • go.mod (1 hunks)
  • module_tunnels/koupleless_http_tunnel/ark_service/ark_service.go (1 hunks)
  • module_tunnels/koupleless_http_tunnel/ark_service/model.go (1 hunks)
  • module_tunnels/koupleless_http_tunnel/http_tunnel.go (15 hunks)
  • module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go (9 hunks)
  • module_tunnels/module_tunnel.go (0 hunks)
  • suite/base_lifecycle_test.go.bak (2 hunks)
  • suite/mock_http_base.go (7 hunks)
  • suite/mock_mqtt_base.go (9 hunks)
  • suite/module_deployment_controller_suite_test.go (5 hunks)
  • suite/module_http_lifecycle_test.go.bak (1 hunks)
  • suite/module_mqtt_lifecycle_test.go.bak (2 hunks)
  • suite/suite_test.go (2 hunks)
💤 Files with no reviewable changes (1)
  • module_tunnels/module_tunnel.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • go.mod
  • module_tunnels/koupleless_http_tunnel/http_tunnel.go
🔇 Additional comments (32)
controller/module_deployment_controller/module_deployment_controller_test.go (1)

4-4: LGTM: Good addition of context handling

The addition of context import and initialization using context.Background() is a good practice for proper context propagation in tests.

Also applies to: 15-15

module_tunnels/koupleless_http_tunnel/ark_service/model.go (1)

29-30: Verify API compatibility for the BaseIdentity field rename

The rename from BaseID to BaseIdentity improves clarity, but changing the JSON field name from "baseID" to "baseIdentity" could break API compatibility for existing clients.

Let's verify the consistency of this change across the codebase:

✅ Verification successful

Rename from BaseID to BaseIdentity is safe and consistent

Based on the git history and codebase analysis:

  1. The change is part of a recent commit that consistently removes BaseID from all response structures
  2. The field was only present in the model definitions and not used in any JSON unmarshaling operations
  3. The change is backwards compatible as:
    • The field was removed from QueryAllBizResponse and HealthResponse
    • In ArkResponse, it was renamed with proper documentation
  4. All usages in the codebase (mock_mqtt_base.go, mock_http_base.go, etc.) have been updated to use the new structure
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to baseID and verify consistent usage of baseIdentity

# Check for any remaining "baseID" references in Go files
echo "Checking for remaining baseID references:"
rg -t go "baseID" -A 2 -B 2

# Check for new baseIdentity usage
echo -e "\nVerifying baseIdentity usage:"
rg -t go "baseIdentity" -A 2 -B 2

# Check for potential API clients that might be affected
echo -e "\nChecking for potential API client usage:"
rg -t go "json\.Unmarshal.*ArkResponse" -A 5 -B 5

Length of output: 781


Script:

#!/bin/bash
# Let's try a more comprehensive search to understand the API usage and changes

# Check git history for the specific file to see the changes
echo "Checking git history for the file:"
git log -p module_tunnels/koupleless_http_tunnel/ark_service/model.go

# Search for any HTTP handlers or API endpoints that might use this response
echo -e "\nChecking for HTTP handlers using ArkResponse:"
rg -t go "ArkResponse" -A 5 -B 5

# Look for test files that might be using this model
echo -e "\nChecking test files for ArkResponse usage:"
fd -e test.go -x rg -l "ArkResponse" {}

Length of output: 16045

common/model/model.go (1)

20-27: Verify the impact of replacing BaseID with Port field.

The struct changes look good overall, with improved naming and logical grouping. However, we should verify the impact of replacing BaseID with Port.

✅ Verification successful

Let me gather more context about the Port field usage and BaseID references.


Based on the comprehensive results, I can now provide the final response:

Port field is correctly integrated and used throughout the codebase

The Port field in BaseStatus is properly integrated and used across the codebase:

  • It's used for arklet service communication in HTTP tunnel implementation
  • It's properly converted to/from NodeInfo via LabelKeyOfTunnelPort
  • Test cases validate the Port field usage and conversion
  • No remaining references to BaseID exist except in a test utility function that's unrelated to the struct change
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining references to BaseID that might need updating
rg "BaseID" --type go

# Search for Port field usage to ensure consistent implementation
ast-grep --pattern 'BaseStatus { $$$Port$$$}'

Length of output: 140


Script:

#!/bin/bash
# Check the test file content to understand BaseID usage
cat common/utils/utils_test.go

# Search for Port field usage in method calls and assignments
ast-grep --pattern '$_.$_.Port'

# Look for any Port-related function calls or configurations
rg "Port" -A 3 --type go

Length of output: 28396

common/model/consts.go (2)

56-68: LGTM! Clear documentation of MQTT topic modes.

The updated comments explicitly specify whether each topic operates in broadcast or p2p mode, improving clarity and preventing potential misuse.


76-77: Verify the label key change impact across the codebase.

The replacement of LabelKeyOfArkletPort with LabelKeyOfTunnelPort looks good, but we should verify that all references have been updated accordingly.

suite/module_mqtt_lifecycle_test.go.bak (2)

20-23: LGTM! Good improvement in node name handling.

The introduction of a dedicated nodeName variable improves code maintainability and consistency by:

  • Computing the formatted node name once and reusing it
  • Reducing the risk of inconsistent node name formatting across the test
  • Making the code more DRY by eliminating repeated calls to utils.FormatNodeName

Line range hint 1-124: Well-structured and comprehensive test coverage.

The test suite effectively covers the complete module lifecycle including:

  • Base node initialization and readiness
  • Module pod deployment and state transitions
  • Error scenarios and invalid message handling
  • Proper cleanup

The use of BDD style with appropriate timeouts and assertions makes the tests both readable and reliable.

suite/base_lifecycle_test.go.bak (1)

181-181: Verify consistent metadata handling across base implementations

The change from direct ID field to Metadata.Name suggests a structural change in how base identifiers are managed. Let's verify this approach is consistently implemented.

✅ Verification successful

Let me gather more information about the metadata structure and its usage.


Let me gather one final piece of information to understand the node name to ID extraction logic.


Based on the evidence gathered from the shell script executions, I can now provide a final response:

The change to use Metadata.Name is consistent with the codebase's approach

The codebase consistently uses Metadata.Name for node identification, which is then processed through ExtractNodeIDFromNodeName utility function across different components:

  • In MQTT tunnel implementation for node registration and topic management
  • In HTTP tunnel implementation for base status tracking
  • In utility functions for node identity management

The change from direct ID field to Metadata.Name aligns with the existing patterns in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for base ID usage patterns
rg -A 2 'Metadata.Name' --type go
rg -A 2 'ID\s*=' --type go

# Search for base metadata structure definitions
ast-grep --pattern 'type $_Base struct {
  $$$
  Metadata $_
  $$$
}'

Length of output: 1973


Script:

#!/bin/bash
# Search for metadata type definition and usage
ast-grep --pattern 'type Metadata struct {
  $$$
}'

# Look for base-related structs and their fields
ast-grep --pattern 'type $_ struct {
  $$$
  Base $_
  $$$
}'

# Search for direct name field usage
rg -A 2 '\.Name\s*=' --type go

# Search for ID field usage in base-related contexts
rg -B 2 -A 2 'Base.*ID' --type go

Length of output: 7390


Script:

#!/bin/bash
# Look for the node name to ID extraction implementation
ast-grep --pattern 'func ExtractNodeIDFromNodeName($_) $_'

# Search for any direct ID field usage in base-related files
rg -B 2 -A 2 '\bID\b' suite/base_lifecycle_test.go

Length of output: 182

suite/mock_http_base.go (3)

23-23: LGTM: Metadata type update is consistent

The change from model.Metadata to model.BaseMetadata aligns with the broader metadata restructuring across the codebase.


162-166: LGTM: Heartbeat message structure properly updated

The heartbeat message structure correctly uses the new BaseMetadata type and includes all necessary fields.


248-248: LGTM: Response structure updates are consistent

The change from BaseID to BaseIdentity is applied consistently in both install and uninstall response structures.

Also applies to: 273-273

suite/module_deployment_controller_suite_test.go (3)

21-24: LGTM: Well-structured mock initialization with cluster context.

The addition of an explicit clusterName variable and its consistent usage in mock initialization improves test clarity and maintainability.


117-117: LGTM: Consistent deployment configuration patterns.

The changes to Deployment2 maintain consistency with Deployment1, following the same patterns for:

  • JAR-based image naming
  • NodeSelector configuration with cluster name and tech stack

Also applies to: 127-129


55-55: Verify deployment specifications match production requirements.

The changes include:

  1. Updated image name to specify JAR artifact
  2. Modified NodeSelector to include cluster name and tech stack

Please verify:

  1. The .jar suffix in image name aligns with your container registry conventions
  2. The NodeSelector changes match your virtual node labeling strategy

Also applies to: 65-67

✅ Verification successful

Based on the results, I can now generate the final response:

Image naming and node selector patterns are consistently implemented

The verification shows:

  • .jar suffix is consistently used across test files and matches the implementation pattern
  • LabelKeyOfVNodeClusterName is properly integrated in both test suites and the controller implementation, confirming the virtual node labeling strategy
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for consistent image naming patterns across the codebase
rg -g '*.{yaml,yml,go}' 'Image:.*\.jar"'

# Verify NodeSelector label usage
rg -g '*.go' 'LabelKeyOfVNodeClusterName'

Length of output: 1666

common/utils/utils.go (5)

22-23: LGTM! Function rename maintains consistency

The renaming from GetBaseIDFromTopic to GetBaseIdentityFromTopic aligns well with the broader terminology changes in the codebase.


98-99: LGTM! Clean function rename

The function maintains its correct resource calculation logic while adopting the new naming convention.


206-210: LGTM! Improved error handling and logging

The function has been simplified while maintaining robust error handling and adding structured logging fields.


72-95: Verify state mapping completeness

While the binary state mapping (ACTIVATED to NodeStateActivated, everything else to NodeStateDeactivated) is straightforward, consider whether there are other valid states that should be handled explicitly.

✅ Verification successful

Let me search for the actual state definitions to get a complete picture.


Let me try a different approach to find the state definitions.


Let me check the model definition to understand the complete state enumeration.


Let me check the vkModel package for the NodeState type definition.


Based on the evidence gathered from the codebase:

Binary state mapping is correct and complete

The state mapping in ConvertBaseStatusToNodeInfo is correctly implemented as a binary state system:

  • From the mock files and tests, we can see that "ACTIVATED" and "DEACTIVATED" are the only states used
  • The base model (BaseStatus) defines state as a simple string without any enumeration constraints
  • The conversion logic properly handles both cases: "ACTIVATED" maps to NodeStateActivated, and all other values default to NodeStateDeactivated
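A minimal sketch of the binary mapping described above, using local stand-ins for the vkModel NodeState constants (the constant values here are assumptions):

```go
package main

import "fmt"

type NodeState string

// Stand-ins for vkModel.NodeStateActivated / NodeStateDeactivated.
const (
	NodeStateActivated   NodeState = "Activated"
	NodeStateDeactivated NodeState = "Deactivated"
)

// mapState mirrors the conversion: only the exact string "ACTIVATED" maps to
// the activated state; every other value defaults to deactivated.
func mapState(baseState string) NodeState {
	if baseState == "ACTIVATED" {
		return NodeStateActivated
	}
	return NodeStateDeactivated
}

func main() {
	fmt.Println(mapState("ACTIVATED"), mapState("DEACTIVATED"), mapState("unknown"))
}
```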
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other possible state values in the codebase
rg -g '!{vendor,.*}' -i "state.*=.*" --type go

Length of output: 926


Script:

#!/bin/bash
# Search for NodeState type definition and constants
ast-grep --pattern 'type NodeState'

# Search for possible state values in model package
rg -g '!{vendor,.*}' "NodeState.*=" --type go

Length of output: 85


Script:

#!/bin/bash
# Search for state-related constants and type definitions
rg -g '!{vendor,.*}' -B 3 -A 3 'NodeState' --type go

# Also search for any state-related strings in the model package
fd -e go . | xargs grep -l "State.*string"

Length of output: 1494


Script:

#!/bin/bash
# Check the model file for state definitions
cat common/model/model.go

Length of output: 2644


Script:

#!/bin/bash
# Find and check the vkModel package
fd -e go -g "**/vk*model*.go" . | xargs cat

Length of output: 45


225-245: Consider defining port constant and verify field mapping

Two suggestions for improvement:

  1. The hardcoded fallback port 1238 should be defined as a constant
  2. Verify that all required fields from NodeInfo are properly mapped to BaseStatus
+const defaultTunnelPort = 1238
+
 func ConvertBaseStatusFromNodeInfo(initData vkModel.NodeInfo) model.BaseStatus {
     portStr := initData.CustomLabels[model.LabelKeyOfTunnelPort]
     port, err := strconv.Atoi(portStr)
     if err != nil {
         logrus.Errorf("failed to parse port %s from node info", portStr)
-        port = 1238
+        port = defaultTunnelPort
     }
✅ Verification successful

We need to find the NodeInfo struct definition to verify the field mapping. Let me search for it.


Let me try a different approach to find the NodeInfo struct definition.


Let me search for the NodeInfo struct in the virtual-kubelet repository since it's imported from there.


Let me also check the NetworkInfo and NodeMetadata structs to complete the verification.


Based on the code analysis, I can now provide the final response:

The field mapping is correct, but the fallback port should be a constant

The field mapping between NodeInfo and BaseStatus is accurate and complete:

  • NodeInfo.Metadata.Name -> BaseStatus.BaseMetadata.Identity (via ExtractNodeIDFromNodeName)
  • NodeInfo.Metadata.Version -> BaseStatus.BaseMetadata.Version
  • NodeInfo.Metadata.ClusterName -> BaseStatus.BaseMetadata.ClusterName
  • NodeInfo.NetworkInfo.NodeIP -> BaseStatus.LocalIP
  • NodeInfo.NetworkInfo.HostName -> BaseStatus.LocalHostName
  • NodeInfo.CustomLabels[LabelKeyOfTunnelPort] -> BaseStatus.Port
  • NodeInfo.State -> BaseStatus.State

The only improvement needed is to define the fallback port as a constant:

const defaultTunnelPort = 1238
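A runnable sketch of the overall mapping with the constant applied. The struct shapes and the tunnel-port label key are simplified stand-ins for the real vkModel.NodeInfo and model.BaseStatus, and the real conversion derives the identity via ExtractNodeIDFromNodeName rather than copying the name verbatim:

```go
package main

import (
	"fmt"
	"strconv"
)

const defaultTunnelPort = 1238 // fallback when the label is missing or malformed

// Simplified stand-ins for the real structs.
type NodeInfo struct {
	Name, Version, ClusterName, NodeIP, HostName string
	CustomLabels                                 map[string]string
}

type BaseStatus struct {
	Identity, Version, ClusterName, LocalIP, LocalHostName string
	Port                                                   int
}

func convertBaseStatusFromNodeInfo(n NodeInfo) BaseStatus {
	port, err := strconv.Atoi(n.CustomLabels["base.koupleless.io/tunnel-port"]) // assumed label key
	if err != nil {
		port = defaultTunnelPort
	}
	return BaseStatus{
		Identity:      n.Name, // real code: ExtractNodeIDFromNodeName(n.Name)
		Version:       n.Version,
		ClusterName:   n.ClusterName,
		LocalIP:       n.NodeIP,
		LocalHostName: n.HostName,
		Port:          port,
	}
}

func main() {
	s := convertBaseStatusFromNodeInfo(NodeInfo{
		Name:         "base-1",
		Version:      "1.0.0",
		CustomLabels: map[string]string{}, // label missing: falls back to defaultTunnelPort
	})
	fmt.Println(s.Identity, s.Port) // base-1 1238
}
```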
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for NodeInfo and BaseStatus struct definitions to verify field mapping
rg -A 10 "type (NodeInfo|BaseStatus) struct" --type go

Length of output: 910


Script:

#!/bin/bash
# Search for NodeInfo struct definition in vk model
fd -e go | xargs rg -A 15 "type NodeInfo struct"

Length of output: 50


Script:

#!/bin/bash
# Search for NodeInfo struct using ast-grep
ast-grep --pattern 'type NodeInfo struct {
  $$$
}'

# Also search for any imports of vkModel to find the package
rg "vkModel" --type go -B 2

Length of output: 18770


Script:

#!/bin/bash
# Clone and search virtual-kubelet repo for NodeInfo struct
git clone https://github.com/koupleless/virtual-kubelet.git /tmp/vk
cd /tmp/vk && rg -A 15 "type NodeInfo struct" --type go

Length of output: 1178


Script:

#!/bin/bash
# Search for NetworkInfo and NodeMetadata structs in virtual-kubelet
cd /tmp/vk && rg -A 10 "type (NetworkInfo|NodeMetadata) struct" --type go

# Also check for NodeState type/const
cd /tmp/vk && rg "type NodeState|const.*NodeState" --type go

Length of output: 1044

common/utils/utils_test.go (1)

8-8: Verify virtual-kubelet dependency update

The import path has been updated to use the new virtual-kubelet package. This aligns with the PR's update to the virtual-kubelet dependency.

✅ Verification successful

Import path update is consistent across the codebase

The import path change to github.com/koupleless/virtual-kubelet is verified and consistent. The dependency is properly defined in go.mod at version v0.3.5-0.20241119083342-61c32865bec2, and there are no remaining references to the old import path github.com/virtual-kubelet/virtual-kubelet/node. All files in the codebase are using the new import path correctly.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the virtual-kubelet dependency version in go.mod

# Check the version in go.mod
rg -A 1 "github.com/koupleless/virtual-kubelet" go.mod

Length of output: 165


Script:

#!/bin/bash
# Check if there are any other files still using the old import path
rg "github.com/virtual-kubelet/virtual-kubelet/node" -l

# Check all current imports of virtual-kubelet to ensure consistency
rg "github.com/koupleless/virtual-kubelet" -l

Length of output: 736

suite/suite_test.go (7)

5-5: LGTM!

The fmt package is appropriately imported to support string formatting in panic messages.


14-14: LGTM!

The vnode_controller package is correctly imported to utilize its functionalities.


101-104: LGTM!

moduleDeploymentController is correctly initialized and set up with the manager.


108-108: LGTM!

The mqttTunnel is correctly initialized with the necessary parameters.


111-114: LGTM!

Collecting tunnels into a slice for iteration simplifies tunnel management.


133-134: LGTM!

vnodeController is correctly set up with the manager.


125-130: Verify that vnodeController functions correctly with a single tunnel

The vnodeController is now initialized with only tunnels[0]. Ensure that this change is intentional and that vnodeController operates correctly with a single tunnel instead of multiple tunnels.

Run the following script to verify the NewVNodeController function signature and usage:

✅ Verification successful

Let me gather more information about the VNodeController implementation to verify this change.


Based on the search results, I can now provide a final response as there's clear evidence about the tunnel usage in the codebase.

VNodeController is designed to work with a single tunnel

The codebase explicitly shows that using a single tunnel is the intended design:

  • In cmd/module-controller/main.go, the comment "we only using one tunnel for now" and the implementation confirms this is by design
  • The NewVNodeController function consistently accepts a single tunnel parameter across all usage sites
  • The system supports multiple tunnel types (HTTP/MQTT) but is designed to use only one at runtime based on configuration
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that NewVNodeController expects a single tunnel parameter.

# Search for the definition of NewVNodeController
rg 'func NewVNodeController\(' -A 5

# Expected output: Function definition showing it accepts one tunnel parameter.

# Search for all calls to NewVNodeController
rg 'NewVNodeController\(' -A 3

# Expected output: Instances where `NewVNodeController` is called, to verify consistent usage.

Length of output: 610


Script:

#!/bin/bash
# Search for the VNodeController struct and its initialization
ast-grep --pattern 'type VNodeController struct {
  $$$
}'

# Search for any tunnel-related fields or methods
rg 'tunnel' -g '*.go' -B 2 -A 2

Length of output: 15347

controller/module_deployment_controller/module_deployment_controller.go (5)

119-119: Pass context to updateDeploymentReplicas consistently.

The updateDeploymentReplicas method now accepts a ctx parameter. Ensure that all calls to this method include the context to maintain context propagation and cancellation capabilities.


124-132: Context propagation improved in vnode event handlers.

The vnode event handlers now pass the context ctx to their respective handler functions (vnodeCreateHandler, vnodeUpdateHandler, vnodeDeleteHandler). This enhancement ensures better context management throughout the event handling process.


144-149: Deployment event handlers updated with context parameter.

The deploymentAddHandler and deploymentUpdateHandler functions now include a ctx parameter, aligning them with standard context propagation practices.


50-53: Ensure all instantiations reflect the removed tunnels parameter.

The NewModuleDeploymentController function no longer accepts the tunnels parameter. Verify that all invocations of this constructor across the codebase have been updated to match the new signature to prevent instantiation errors.

Run the following script to confirm all instantiations are updated:

✅ Verification successful

Let me check the actual implementation of these calls to ensure they match the new signature.


All constructor calls are correctly updated to match the new signature

All three occurrences of NewModuleDeploymentController in the codebase are using the new signature with only the env parameter:

  • suite/suite_test.go: NewModuleDeploymentController(env)
  • cmd/module-controller/main.go: NewModuleDeploymentController(env)
  • The constructor definition itself in controller/module_deployment_controller/module_deployment_controller.go

No issues found as all calls have been properly updated to reflect the removed tunnels parameter.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all calls to NewModuleDeploymentController and check their parameters.

# Search for the constructor usage.
rg 'NewModuleDeploymentController\s*\('

Length of output: 426


Script:

#!/bin/bash
# Get context for each call to verify the parameters
rg -B2 -A2 'NewModuleDeploymentController\s*\(' --context-separator='---'

Length of output: 1250


166-184: 🛠️ Refactor suggestion

Accept a caller-supplied context instead of using context.Background().

The QueryContainerBaseline method has no ctx parameter, so it falls back to context.Background(), which discards any cancellation signals or deadlines a caller could provide. Add a context.Context parameter and thread it through to the cache lookup and the logger.

Apply this diff to accept and use the caller's context:

-func (mdc *ModuleDeploymentController) QueryContainerBaseline(req model.QueryBaselineRequest) []corev1.Container {
+func (mdc *ModuleDeploymentController) QueryContainerBaseline(ctx context.Context, req model.QueryBaselineRequest) []corev1.Container {
     labelMap := map[string]string{
         vkModel.LabelKeyOfEnv:              mdc.env,
         vkModel.LabelKeyOfVNodeVersion:     req.Version,
         vkModel.LabelKeyOfVNodeClusterName: req.ClusterName,
     }
     for key, value := range req.CustomLabels {
         labelMap[key] = value
     }
     relatedDeploymentsByNode := appsv1.DeploymentList{}
-    err := mdc.cache.List(context.Background(), &relatedDeploymentsByNode, &client.ListOptions{
+    err := mdc.cache.List(ctx, &relatedDeploymentsByNode, &client.ListOptions{
         LabelSelector: labels.SelectorFromSet(labelMap),
     })
     if err != nil {
-        log.G(context.Background()).WithError(err).Error("failed to list deployments")
+        log.G(ctx).WithError(err).Error("failed to list deployments")
         return []corev1.Container{}
     }

Likely invalid or redundant comment.


 nodeID := "test-http-base"
-mockBase := NewMockHttpBase("test-base", "1.0.0", nodeID, env, 1238)
+mockBase := NewMockHttpBase(nodeID, "1.0.0", env, 1238)

💡 Codebase verification

Constructor parameters mismatch detected

The constructor NewMockHttpBase is defined with 5 parameters:

func NewMockHttpBase(name, clusterName, version, env string, port int)

But it's being called with only 4 parameters:

mockBase := NewMockHttpBase(nodeID, "1.0.0", env, 1238)

The clusterName parameter is missing in the constructor call, which will cause compilation errors.

🔗 Analysis chain

Verify consistency of MockHttpBase constructor usage.

The constructor call has been updated to remove the hardcoded "test-base" parameter and reorder parameters. Let's verify this change is consistent across the codebase.


The results show a discrepancy between the constructor usage and its actual definition. Let me verify the exact constructor signature and its usage.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other usages of NewMockHttpBase to ensure consistent parameter ordering

# Search for all NewMockHttpBase constructor calls
echo "Searching for NewMockHttpBase constructor calls:"
rg "NewMockHttpBase\(" --type go

# Search for MockHttpBase struct definition to verify the constructor signature
echo -e "\nSearching for MockHttpBase definition:"
ast-grep --pattern 'type MockHttpBase struct {
  $$$
}'

Length of output: 1008


Script:

#!/bin/bash
# Get the full constructor implementation and its usage in the test file
echo "Constructor implementation:"
rg "func NewMockHttpBase" -A 5 --type go

echo -e "\nConstructor usage in test:"
rg "mockBase :=" suite/module_http_lifecycle_test.go.bak -B 2 -A 2

Length of output: 780

 type BizOperationResponse struct {
 	Command string `json:"command"` // Operation command executed
-	BizName string `json:"bizName"` // Name of the business
+	BizName string `json:"bizName"` // ClusterName of the business

🛠️ Refactor suggestion

Consider renaming BizName field to match its purpose.

The field name BizName representing a ClusterName could be confusing. Consider renaming for clarity:

-    BizName    string                  `json:"bizName"`    // ClusterName of the business
+    ClusterName string                  `json:"bizName"`    // ClusterName of the business

Note: Keep the JSON tag as "bizName" for backward compatibility if needed.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-BizName string `json:"bizName"` // ClusterName of the business
+ClusterName string `json:"bizName"` // ClusterName of the business

Comment on lines +41 to +42
Identity string `json:"identity"` // Identity base to filter by
ClusterName string `json:"name"` // ClusterName to filter by

⚠️ Potential issue

Fix inconsistent JSON tag naming.

The ClusterName field uses the JSON tag "name" which is inconsistent with the clusterName tag used in BaseMetadata. This could lead to confusion and maintenance issues.

-    ClusterName  string            `json:"name"`         // ClusterName to filter by
+    ClusterName  string            `json:"clusterName"`  // ClusterName to filter by
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 Identity string `json:"identity"` // Identity base to filter by
-ClusterName string `json:"name"` // ClusterName to filter by
+ClusterName string `json:"clusterName"` // ClusterName to filter by


 mqttNodeID := "test-mqtt-base"
-mockMqttBase := NewMockMqttBase("test-mqtt-base", "1.0.0", mqttNodeID, env)
+mockMqttBase := NewMockMqttBase(mqttNodeID, "1.0.0", env)

💡 Codebase verification

Constructor signatures mismatch with usage

The mock constructors are being called incorrectly in the test file:

  • NewMockMqttBase is called with 3 args (mqttNodeID, "1.0.0", env) but its signature requires 4 args (baseName, clusterName, version, env string)
  • NewMockHttpBase is called with 4 args (httpNodeID, "1.0.0", env, 1238) but its signature requires 5 args (name, clusterName, version, env string, port int)

The test file needs to be updated to provide the missing clusterName parameter to both constructors.

🔗 Analysis chain

Verify mock constructors match their declarations

The mock object instantiations look correct, but let's verify the constructor signatures match their implementations.

Also applies to: 22-22

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for mock constructors to verify signatures
ast-grep --pattern 'func NewMockMqttBase($_) $_'
ast-grep --pattern 'func NewMockHttpBase($_) $_'

Length of output: 864


Script:

#!/bin/bash
# Let's check the actual usage in the test file
cat suite/base_lifecycle_test.go.bak

Length of output: 5533


Script:

#!/bin/bash
# Let's check the HTTP mock implementation as well
cat suite/mock_http_base.go

Length of output: 6447

Comment on lines +32 to +39
 func NewMockHttpBase(name, clusterName, version, env string, port int) *MockHttpBase {
 	return &MockHttpBase{
-		ID:        id,
 		Env:       env,
 		CurrState: "ACTIVATED",
-		Metadata: model.Metadata{
-			Name:    name,
-			Version: version,
+		Metadata: model.BaseMetadata{
+			Identity:    name,
+			ClusterName: clusterName,
+			Version:     version,

💡 Codebase verification

Update test files to include clusterName parameter in NewMockHttpBase calls

The constructor calls in test files need to be updated as they're using the old signature with 4 parameters (name, version, env, port), while the new signature requires 5 parameters (name, clusterName, version, env, port). Found calls that need updating in:

  • suite/module_http_lifecycle_test.go.bak
  • suite/base_lifecycle_test.go.bak
🔗 Analysis chain

Verify all constructor calls are updated with clusterName parameter

The constructor changes look good, properly initializing the new BaseMetadata structure.


Let me search for usages of this constructor in test files as well, since it's a mock object.


Let me try one more search to find any potential usages in the entire codebase, including any string literals that might contain this constructor name:


Let me try one final search with corrected syntax:


Let me try one last search with simpler syntax:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for all NewMockHttpBase calls to ensure they include the clusterName parameter
rg "NewMockHttpBase\(" --type go -A 1 -B 1

Length of output: 227


Script:

#!/bin/bash
# Search for all NewMockHttpBase calls including in test files
rg "NewMockHttpBase\(" --type go --type-add 'go:*.go' -A 2 -B 2

Length of output: 316


Script:

#!/bin/bash
# Search for any mention of NewMockHttpBase in the codebase
rg "NewMockHttpBase" --type-add 'all:*.*' --type all

Length of output: 112


Script:

#!/bin/bash
# Search for any mention of NewMockHttpBase in all files
rg "NewMockHttpBase" --type-add 'all:*' -t all -A 2 -B 2

Length of output: 116


Script:

#!/bin/bash
# Search for any mention of NewMockHttpBase in all files
rg "NewMockHttpBase" -A 2 -B 2

Length of output: 993

Comment on lines +308 to +309
nodeID := utils2.ExtractNodeIDFromNodeName(nodeName)
return mqttTunnel.mqttClient.Pub(utils.FormatArkletCommandTopic(mqttTunnel.env, nodeID, model.CommandInstallBiz), mqtt.Qos0, installBizRequestBytes)
⚠️ Potential issue

Handle errors returned by json.Marshal

Ignoring the error from json.Marshal(bizModel) may lead to issues if serialization fails. Proper error handling ensures robustness.

Apply this diff to handle the error:

-	installBizRequestBytes, _ := json.Marshal(bizModel)
+	installBizRequestBytes, err := json.Marshal(bizModel)
+	if err != nil {
+		log.G(ctx).WithError(err).Error("Failed to marshal bizModel")
+		return err
+	}
 	nodeID := utils2.ExtractNodeIDFromNodeName(nodeName)
 	return mqttTunnel.mqttClient.Pub(...)

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +312 to +318
func (mqttTunnel *MqttTunnel) StopBiz(ctx context.Context, nodeName, _ string, container *corev1.Container) error {
	bizModel := utils.TranslateCoreV1ContainerToBizModel(container)
	unInstallBizRequestBytes, _ := json.Marshal(bizModel)
	logger := log.G(ctx).WithField("bizName", bizModel.BizName).WithField("bizVersion", bizModel.BizVersion)
	logger.Info("UninstallModule")
	nodeID := utils2.ExtractNodeIDFromNodeName(nodeName)
	return mqttTunnel.mqttClient.Pub(utils.FormatArkletCommandTopic(mqttTunnel.env, nodeID, model.CommandUnInstallBiz), mqtt.Qos0, unInstallBizRequestBytes)
⚠️ Potential issue

Handle errors returned by json.Marshal

Similarly, in StopBiz, the error from json.Marshal(bizModel) should not be ignored to prevent potential issues.

Apply this diff to handle the error:

-	unInstallBizRequestBytes, _ := json.Marshal(bizModel)
 	logger := log.G(ctx).WithField("bizName", bizModel.BizName).WithField("bizVersion", bizModel.BizVersion)
+	unInstallBizRequestBytes, err := json.Marshal(bizModel)
+	if err != nil {
+		logger.WithError(err).Error("Failed to marshal bizModel")
+		return err
+	}
 	logger.Info("UninstallModule")
 	nodeID := utils2.ExtractNodeIDFromNodeName(nodeName)
 	return mqttTunnel.mqttClient.Pub(...)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (mqttTunnel *MqttTunnel) StopBiz(ctx context.Context, nodeName, _ string, container *corev1.Container) error {
	bizModel := utils.TranslateCoreV1ContainerToBizModel(container)
	unInstallBizRequestBytes, _ := json.Marshal(bizModel)
	logger := log.G(ctx).WithField("bizName", bizModel.BizName).WithField("bizVersion", bizModel.BizVersion)
	logger.Info("UninstallModule")
	nodeID := utils2.ExtractNodeIDFromNodeName(nodeName)
	return mqttTunnel.mqttClient.Pub(utils.FormatArkletCommandTopic(mqttTunnel.env, nodeID, model.CommandUnInstallBiz), mqtt.Qos0, unInstallBizRequestBytes)
func (mqttTunnel *MqttTunnel) StopBiz(ctx context.Context, nodeName, _ string, container *corev1.Container) error {
	bizModel := utils.TranslateCoreV1ContainerToBizModel(container)
	logger := log.G(ctx).WithField("bizName", bizModel.BizName).WithField("bizVersion", bizModel.BizVersion)
	unInstallBizRequestBytes, err := json.Marshal(bizModel)
	if err != nil {
		logger.WithError(err).Error("Failed to marshal bizModel")
		return err
	}
	logger.Info("UninstallModule")
	nodeID := utils2.ExtractNodeIDFromNodeName(nodeName)
	return mqttTunnel.mqttClient.Pub(utils.FormatArkletCommandTopic(mqttTunnel.env, nodeID, model.CommandUnInstallBiz), mqtt.Qos0, unInstallBizRequestBytes)

Comment on lines +201 to +203
func (mdc *ModuleDeploymentController) vnodeCreateHandler(ctx context.Context, vnode *corev1.Node) {
relatedDeploymentsByNode := mdc.GetRelatedDeploymentsByNode(ctx, vnode)
mdc.updateDeploymentReplicas(ctx, relatedDeploymentsByNode)
⚠️ Potential issue

Potential concurrency issues with runtimeStorage.

In vnodeCreateHandler, updates to runtimeStorage may not be thread-safe. If runtimeStorage is accessed concurrently, consider using synchronization mechanisms like mutexes to prevent data races.
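The suggested guard can be sketched as follows; the real runtimeStorage layout is not shown in this PR, so the map of node name to related deployment names below is an assumed stand-in:

```go
package main

import (
	"fmt"
	"sync"
)

// runtimeStorage is a hypothetical stand-in for the controller's storage: a map
// from vnode name to related deployment names, guarded by an RWMutex so that
// concurrent vnode create/delete handlers cannot race on the map.
type runtimeStorage struct {
	mu              sync.RWMutex
	nodeDeployments map[string][]string
}

func newRuntimeStorage() *runtimeStorage {
	return &runtimeStorage{nodeDeployments: make(map[string][]string)}
}

func (s *runtimeStorage) SetRelatedDeployments(node string, deployments []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nodeDeployments[node] = deployments
}

func (s *runtimeStorage) RelatedDeployments(node string) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.nodeDeployments[node]
}

func main() {
	s := newRuntimeStorage()
	var wg sync.WaitGroup
	// Simulate concurrent vnode create handlers writing into the storage.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.SetRelatedDeployments(fmt.Sprintf("vnode-%d", i), []string{"deployment-a"})
		}(i)
	}
	wg.Wait()
	fmt.Println(len(s.RelatedDeployments("vnode-3")))
}
```

Running this under `go run -race` would also make any remaining unsynchronized access visible.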

Comment on lines +345 to +371
func (mdc *ModuleDeploymentController) getReadyNodeCount(ctx context.Context, clusterName string) (int, error) {
nodeList := &corev1.NodeList{}
err := mdc.cache.List(ctx, nodeList, &client.ListOptions{
LabelSelector: labels.SelectorFromSet(map[string]string{vkModel.LabelKeyOfVNodeClusterName: clusterName}),
})

if err != nil {
logrus.WithError(err).Errorf("failed to list nodes of cluster %s", clusterName)
return 0, err
}

readyNodeCnt := 0
for _, node := range nodeList.Items {
if node.Status.Phase == corev1.NodeRunning {
for _, cond := range node.Status.Conditions {
if cond.Type == corev1.NodeReady {
if cond.Status == corev1.ConditionTrue {
readyNodeCnt++
break
}
}
}
}
}

return readyNodeCnt, nil
}
🛠️ Refactor suggestion

Optimize getReadyNodeCount for performance in large clusters.

Iterating over all nodes might be inefficient for large clusters. Consider applying additional filters or label selectors to limit the nodes processed.
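Independent of server-side filtering, the counting loop stays flat if the readiness check is extracted into a predicate. The node/nodeCondition types below are simplified stand-ins for the corev1 types, so this compiles without the Kubernetes dependencies:

```go
package main

import "fmt"

// Simplified stand-ins for corev1.Node and corev1.NodeCondition; the real code
// imports k8s.io/api/core/v1 instead.
type nodeCondition struct {
	Type   string
	Status string
}

type node struct {
	Phase      string
	Conditions []nodeCondition
}

// isNodeReady replaces the triple-nested loop from getReadyNodeCount with a
// single predicate: a node counts only if it is Running and has Ready=True.
func isNodeReady(n node) bool {
	if n.Phase != "Running" {
		return false
	}
	for _, cond := range n.Conditions {
		if cond.Type == "Ready" && cond.Status == "True" {
			return true
		}
	}
	return false
}

func main() {
	nodes := []node{
		{Phase: "Running", Conditions: []nodeCondition{{Type: "Ready", Status: "True"}}},
		{Phase: "Running", Conditions: []nodeCondition{{Type: "Ready", Status: "False"}}},
		{Phase: "Pending"},
	}
	readyNodeCnt := 0
	for _, n := range nodes {
		if isNodeReady(n) {
			readyNodeCnt++
		}
	}
	fmt.Println(readyNodeCnt) // only the first node is counted
}
```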

Comment on lines +229 to +237
err := mdc.cache.List(ctx, &appsv1.DeploymentList{}, &client.ListOptions{
LabelSelector: labels.SelectorFromSet(map[string]string{
model.LabelKeyOfVPodDeploymentStrategy: string(model.VPodDeploymentStrategyPeer),
}),
})

if err != nil {
logrus.WithError(err).Error("failed to list deployments")
return matchedDeployments
⚠️ Potential issue

Correct the cache.List call to use the correct variable.

There's a bug where a new DeploymentList is created in the cache.List call instead of using the deploymentList variable.

Apply this diff to fix the issue:

 deploymentList := appsv1.DeploymentList{}
-	err := mdc.cache.List(ctx, &appsv1.DeploymentList{}, &client.ListOptions{
+	err := mdc.cache.List(ctx, &deploymentList, &client.ListOptions{
 		LabelSelector: labels.SelectorFromSet(map[string]string{
 			model.LabelKeyOfVPodDeploymentStrategy: string(model.VPodDeploymentStrategyPeer),
 		}),
 	})

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 10

🧹 Outside diff range and nitpick comments (18)
common/model/model.go (1)

20-27: Consider enhancing documentation for State field.

The struct is well-designed, but the State field's documentation could be more specific about possible values.

-	State         string       `json:"state"`         // Current state of the base
+	State         string       `json:"state"`         // Current state of the base (e.g., "running", "stopped", etc.)
suite/suite_test.go (2)

89-96: Enhance error message for MQTT server failure.

While the error handling is good, consider making the error message more descriptive.

-			log.G(ctx).Error("failed to start mqtt server")
+			log.G(ctx).Error("failed to start mqtt server on port 1883")

113-124: Improve error handling for tunnel initialization.

While the tunnel initialization is correct, consider enhancing the error handling:

  1. The panic message could include more context about why the tunnel failed to start
  2. Consider logging the actual error details before panic
-			panic(fmt.Sprintf("failed to start tunnel %s", t.Key()))
+			log.G(ctx).WithError(err).Errorf("failed to start tunnel %s: detailed error", t.Key())
+			panic(fmt.Sprintf("failed to start tunnel %s: %v", t.Key(), err))
cmd/module-controller/main.go (1)

169-170: Add function documentation.

This function would benefit from a proper Go doc comment explaining its purpose, parameters, return value, and possible error conditions.

Add documentation like:

+// startTunnels initializes and starts the configured tunnels (MQTT and/or HTTP).
+// It returns the first successfully started tunnel or panics if no tunnels could be started.
+// Parameters:
+//   ctx: context for tunnel operations
+//   clientId: unique identifier for the client
+//   env: environment name (dev, prod, etc.)
+//   mgr: controller manager instance
+//   moduleDeploymentController: controller for module deployments
+// Returns:
+//   tunnel.Tunnel: the first successfully started tunnel
 func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
 	moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) tunnel.Tunnel {
common/utils/utils.go (6)

22-23: Consider improving input validation and error handling.

The function could benefit from better input validation and error handling to distinguish between different error cases.

Consider this improvement:

-func GetBaseIdentityFromTopic(topic string) string {
+func GetBaseIdentityFromTopic(topic string) (string, error) {
+	if topic == "" {
+		return "", fmt.Errorf("empty topic")
+	}
 	fileds := strings.Split(topic, "/")
 	if len(fileds) < 2 {
-		return ""
+		return "", fmt.Errorf("invalid topic format: %s", topic)
 	}
-	return fileds[1]
+	return fileds[1], nil
}
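For illustration, the proposed error-returning variant behaves as follows; the `koupleless_<env>/<identity>/...` topic layout is inferred from the MQTT topic formats used elsewhere in this review:

```go
package main

import (
	"fmt"
	"strings"
)

// GetBaseIdentityFromTopic, as suggested above: the identity is the second
// segment of a topic such as "koupleless_dev/base-1/base/health". Returning an
// error distinguishes an empty topic from a malformed one.
func GetBaseIdentityFromTopic(topic string) (string, error) {
	if topic == "" {
		return "", fmt.Errorf("empty topic")
	}
	fields := strings.Split(topic, "/")
	if len(fields) < 2 {
		return "", fmt.Errorf("invalid topic format: %s", topic)
	}
	return fields[1], nil
}

func main() {
	id, err := GetBaseIdentityFromTopic("koupleless_dev/base-1/base/health")
	fmt.Println(id, err) // base-1 <nil>

	_, err = GetBaseIdentityFromTopic("no-slashes")
	fmt.Println(err != nil) // true
}
```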

72-95: Add port range validation.

While the port check handles zero values, it should validate against the full range of usable port numbers (1-65535).

Consider this improvement:

 if data.Port != 0 {
+	if data.Port < 1 || data.Port > 65535 {
+		log.G(context.TODO()).Warnf("Invalid port number: %d, using default", data.Port)
+		data.Port = 1238 // default port
+	}
 	labels[model.LabelKeyOfTunnelPort] = strconv.Itoa(data.Port)
 }

Line range hint 98-120: Consider making the tech stack configurable.

The tech stack is currently hardcoded to "java". This might limit the function's reusability for other tech stacks.

Consider passing the tech stack as a parameter or deriving it from the input data:

-func ConvertHealthDataToNodeStatus(data ark.HealthData) vkModel.NodeStatusData {
+func ConvertHealthDataToNodeStatus(data ark.HealthData, techStack string) vkModel.NodeStatusData {
 	// ... existing code ...
 	CustomLabels: map[string]string{
-		model.LabelKeyOfTechStack: "java",
+		model.LabelKeyOfTechStack: techStack,
 	},

Line range hint 121-130: Extract duplicated tech stack label into a constant.

The tech stack label "java" is duplicated across multiple functions. This should be extracted into a shared constant.

Consider this improvement:

+const (
+	defaultTechStack = "java"
+)

 func ConvertBaseMetadataToBaselineQuery(data model.BaseMetadata) model.QueryBaselineRequest {
 	return model.QueryBaselineRequest{
 		Identity:    data.Identity,
 		ClusterName: data.ClusterName,
 		Version:     data.Version,
 		CustomLabels: map[string]string{
-			model.LabelKeyOfTechStack: "java",
+			model.LabelKeyOfTechStack: defaultTechStack,
 		},
 	}
 }

Line range hint 206-222: Add context cancellation check.

The function should respect context cancellation to ensure proper cleanup.

Consider this improvement:

 func OnBaseUnreachable(ctx context.Context, nodeName string, k8sClient client.Client) {
+	if ctx.Err() != nil {
+		return
+	}
 	// base not ready, delete from api server
 	node := corev1.Node{}
 	err := k8sClient.Get(ctx, client.ObjectKey{Name: nodeName}, &node)

225-246: Extract default port value into a constant.

The default port value (1238) should be defined as a constant at the package level for better maintainability.

Consider this improvement:

+const (
+	defaultTunnelPort = 1238
+)

 func ConvertBaseStatusFromNodeInfo(initData vkModel.NodeInfo) model.BaseStatus {
 	portStr := initData.CustomLabels[model.LabelKeyOfTunnelPort]
 	port, err := strconv.Atoi(portStr)
 	if err != nil {
-		logrus.Errorf("failed to parse port %s from node info", portStr)
-		port = 1238
+		logrus.Errorf("failed to parse port %s from node info, using default port %d", portStr, defaultTunnelPort)
+		port = defaultTunnelPort
 	}
suite/mock_mqtt_base.go (1)

255-258: Consider extracting topic paths to constants

Multiple methods use the same topic path format strings. Consider extracting these to constants to improve maintainability and reduce duplication.

Example refactor:

+const (
+    BaseHealthTopicFormat = "koupleless_%s/%s/base/health"
+    BaseBizTopicFormat = "koupleless_%s/%s/base/biz"
+    BaseHeartTopicFormat = "koupleless_%s/%s/base/heart"
+    BaseQueryBaselineTopicFormat = "koupleless_%s/%s/base/queryBaseline"
+)

 func (b *MockMQTTBase) SendInvalidMessage() {
-    b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/health", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
+    b.client.Pub(fmt.Sprintf(BaseHealthTopicFormat, b.Env, b.BaseMetadata.Identity), 1, []byte(""))
     // Apply similar changes to other topic strings
 }

Also applies to: 262-265, 269-270

module_tunnels/koupleless_http_tunnel/http_tunnel.go (4)

27-52: LGTM! Good architectural improvements.

The changes demonstrate good practices:

  1. Explicit interface assertion ensures type safety
  2. Clear field naming improves code readability
  3. Dependency injection of moduleDeploymentController follows SOLID principles

Consider documenting the purpose of each field with comments, especially for the callback functions, to improve maintainability.


Line range hint 196-207: Improve server shutdown handling.

The deferred shutdown is inside a goroutine, which means it might not execute if the main goroutine exits first. Consider moving the shutdown logic to the main context cancellation handler.

 go func() {
     if err := server.ListenAndServe(); err != nil {
         logger.WithError(err).Error("error starting http base discovery server")
     }
-    defer func() {
-        shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
-        defer cancel()
-        if err := server.Shutdown(shutdownCtx); err != nil {
-            logger.WithError(err).Error("error shutting down http server")
-        }
-    }()
 }()
+
+go func() {
+    <-ctx.Done()
+    shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+    defer cancel()
+    if err := server.Shutdown(shutdownCtx); err != nil {
+        logger.WithError(err).Error("error shutting down http server")
+    }
+}()

Line range hint 278-311: Enhance error handling in QueryAllBizStatusData.

The retry mechanism is good, but error details are lost when retrying. Consider logging the error before retrying and implementing a retry limit to prevent infinite retries.

 defer func() {
     h.queryAllBizLock.Unlock()
     if h.queryAllBizDataOutdated {
+        logger.WithError(err).Info("Retrying query due to outdated data")
         go h.QueryAllBizStatusData(ctx, nodeName)
     }
 }()

Line range hint 314-371: Reduce code duplication in StartBiz and StopBiz.

Both methods share similar setup and error handling logic. Consider extracting common code into a helper function.

+func (h *HttpTunnel) executeBizOperation(ctx context.Context, nodeName string, container *corev1.Container, 
+    operation func(ctx context.Context, request interface{}, ip string, port int) (model.GenericArkResponse, error),
+    command model.BizOperationCommand) error {
+    nodeID := utils2.ExtractNodeIDFromNodeName(nodeName)
+    h.Lock()
+    baseStatus, ok := h.nodeIdToBaseStatusMap[nodeID]
+    h.Unlock()
+    if !ok {
+        return errors.New("network info not found")
+    }
+
+    bizModel := utils.TranslateCoreV1ContainerToBizModel(container)
+    logger := log.G(ctx).WithField("bizName", bizModel.BizName).WithField("bizVersion", bizModel.BizVersion)
+    logger.Info(string(command))
+
+    bizOperationResponse := model.BizOperationResponse{
+        Command:    command,
+        BizName:    bizModel.BizName,
+        BizVersion: bizModel.BizVersion,
+    }
+
+    response, err := operation(ctx, bizModel, baseStatus.LocalIP, baseStatus.Port)
+    bizOperationResponse.Response = response
+    h.bizOperationResponseCallback(nodeID, bizOperationResponse)
+
+    return err
+}
module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go (1)

Line range hint 271-291: Verify onOneBizDataArrived is not nil before use

The callback is used without checking if it's nil, which could cause a panic.

Apply this diff:

+	if mqttTunnel.onOneBizDataArrived == nil {
+		return
+	}
 	if data.Data.Command == model.CommandInstallBiz && data.Data.Response.Code == "SUCCESS" {
 		mqttTunnel.onOneBizDataArrived(nodeName, vkModel.BizStatusData{
common/utils/utils_test.go (1)

539-544: Use %v for safer error message formatting

For better robustness when formatting error messages, use %v instead of direct string interpolation.

Apply this diff:

-			t.Errorf("ConvertHealthDataToNodeStatus(%s) = %s; want %s", tc.input, result, tc.expected)
+			t.Errorf("ConvertHealthDataToNodeStatus(%v) = %v; want %v", tc.input, result, tc.expected)
controller/module_deployment_controller/module_deployment_controller.go (1)

295-295: Fix typo in log message

There's a typo in the log message: "relicas" should be "replicas".

Apply this diff:

-logrus.WithError(err).Errorf("failed to get nodes of cluster %s, skip to update relicas", clusterName)
+logrus.WithError(err).Errorf("failed to get nodes of cluster %s, skip to update replicas", clusterName)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f97acee and 90d7e3d.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (20)
  • cmd/module-controller/main.go (3 hunks)
  • common/model/consts.go (1 hunks)
  • common/model/model.go (1 hunks)
  • common/utils/utils.go (5 hunks)
  • common/utils/utils_test.go (10 hunks)
  • controller/module_deployment_controller/module_deployment_controller.go (7 hunks)
  • controller/module_deployment_controller/module_deployment_controller_test.go (2 hunks)
  • go.mod (1 hunks)
  • module_tunnels/koupleless_http_tunnel/ark_service/ark_service.go (1 hunks)
  • module_tunnels/koupleless_http_tunnel/ark_service/model.go (1 hunks)
  • module_tunnels/koupleless_http_tunnel/http_tunnel.go (15 hunks)
  • module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go (9 hunks)
  • module_tunnels/module_tunnel.go (0 hunks)
  • suite/base_lifecycle_test.go.bak (2 hunks)
  • suite/mock_http_base.go (7 hunks)
  • suite/mock_mqtt_base.go (9 hunks)
  • suite/module_deployment_controller_suite_test.go (5 hunks)
  • suite/module_http_lifecycle_test.go.bak (1 hunks)
  • suite/module_mqtt_lifecycle_test.go.bak (2 hunks)
  • suite/suite_test.go (3 hunks)
💤 Files with no reviewable changes (1)
  • module_tunnels/module_tunnel.go
🚧 Files skipped from review as they are similar to previous changes (9)
  • common/model/consts.go
  • controller/module_deployment_controller/module_deployment_controller_test.go
  • go.mod
  • module_tunnels/koupleless_http_tunnel/ark_service/ark_service.go
  • module_tunnels/koupleless_http_tunnel/ark_service/model.go
  • suite/base_lifecycle_test.go.bak
  • suite/module_deployment_controller_suite_test.go
  • suite/module_http_lifecycle_test.go.bak
  • suite/module_mqtt_lifecycle_test.go.bak
🔇 Additional comments (22)
common/model/model.go (3)

13-17: LGTM! Well-structured metadata type.

The BaseMetadata struct is well-designed with clear field names and consistent JSON tags.


33-33: Previous comment about renaming BizName field remains valid.


41-42: Previous comment about JSON tag inconsistency remains valid.

Additionally, let's verify the impact of this inconsistency across the codebase.

#!/bin/bash
# Description: Check for inconsistent usage of clusterName/name JSON tags
# and their potential impact on unmarshaling logic

# Search for structs with clusterName or name JSON tags
rg -U 'json:"(clusterName|name)".*\n.*struct' 

# Search for unmarshal operations that might be affected
rg -A 3 'json.Unmarshal.*QueryBaselineRequest'
suite/suite_test.go (2)

5-5: LGTM: Import changes are appropriate.

The added imports support the enhanced error handling and updated package structure.

Also applies to: 14-14


106-106: LGTM: Simplified controller initialization.

The controller initialization is cleaner and maintains proper error handling.

suite/mock_http_base.go (6)

32-39: LGTM: Constructor changes align with BaseMetadata structure

The constructor properly initializes the new BaseMetadata structure with Identity, ClusterName, and Version fields.


68-68: LGTM: Improved comment accuracy

The comment now better reflects the component being mocked.


162-166: LGTM: HeartBeat message properly structured

The BaseStatus message structure is correctly initialized with all required fields from the new BaseMetadata.


185-187: LGTM: Health message fields properly updated

The health message correctly uses the new field names (Identity instead of Name) from BaseMetadata.


248-248: LGTM: Response structs consistently updated

The response structs now consistently use BaseIdentity instead of BaseID, aligning with the new metadata structure.

Also applies to: 273-273


23-23: Verify all references to Metadata field are updated

The change from model.Metadata to model.BaseMetadata is a breaking change. While the file's internal references have been updated, we should verify there are no external references to this field.

✅ Verification successful

Change from model.Metadata to model.BaseMetadata appears safe

Based on the code analysis:

  1. The Metadata field is only used internally within mock_http_base.go
  2. The only direct usage found is in getHeartBeatMsg() method where it's correctly using the new BaseMetadata type
  3. No external references to MockHttpBase.Metadata were found in other files
  4. The struct is only used for testing purposes as indicated by its name and location
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any direct access to MockHttpBase.Metadata
rg "MockHttpBase.*\.Metadata\." --type go -A 2 -B 2

Length of output: 53


Script:

#!/bin/bash
# Let's try a broader search for MockHttpBase and Metadata references
rg "MockHttpBase.*Metadata" --type go -A 2 -B 2

# Also search for any usage of the struct itself
rg "MockHttpBase" --type go -A 2 -B 2

# And let's check the actual struct definition and its usages using ast-grep
ast-grep --pattern 'type MockHttpBase struct {
  $$$
}'

# Search for any test files that might use this mock
fd ".*_test\.go$" --exec grep -l "MockHttpBase"

Length of output: 3019

suite/mock_mqtt_base.go (3)

19-25: LGTM: Struct field updates improve clarity

The renaming of ID to Name and the introduction of BaseMetadata improve code clarity and maintain consistency with the broader codebase changes.


74-75: Error handling needed for MQTT subscriptions

The previous review comment about adding error handling for MQTT subscriptions is still applicable.


105-110: LGTM: Heartbeat message structure properly updated

The heartbeat message structure has been correctly updated to use the new BaseStatus type and properly includes all required BaseMetadata fields.

module_tunnels/koupleless_http_tunnel/http_tunnel.go (1)

74-92: Verify thread safety of nodeIdToBaseStatusMap initialization.

While the methods properly use mutex for synchronization, ensure that nodeIdToBaseStatusMap is initialized before these methods are called. A race condition could occur if these methods are called before Start().

✅ Verification successful

Let me search for the initialization of HttpTunnel to verify the thread safety.


Let me try a different pattern to search for the initialization.


Thread safety is properly handled - no action needed

The map initialization is correctly done in the Start method before any concurrent access can occur. The NewHttpTunnel constructor doesn't initialize the map, but the Start method initializes it before starting the base discovery goroutine. All subsequent accesses to the map in RegisterNode and UnRegisterNode are properly protected by mutex locks.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for all accesses to nodeIdToBaseStatusMap
rg -A 2 "nodeIdToBaseStatusMap" 

# Search for potential concurrent calls to RegisterNode/UnRegisterNode
ast-grep --pattern 'go func() { 
  $$$
  RegisterNode($$$)
  $$$
}'

Length of output: 2472


Script:

#!/bin/bash
# Search for HttpTunnel initialization
ast-grep --pattern 'func NewHttpTunnel($$$) *HttpTunnel {
  $$$
}'

# Also search for Start method
ast-grep --pattern 'func (h *HttpTunnel) Start($$$) {
  $$$
}'

Length of output: 133


Script:

#!/bin/bash
# Search for HttpTunnel initialization with different pattern
rg -A 10 "func NewHttpTunnel"

# Search for Start method
rg -A 10 "func.*Start\("

# Search for map initialization
rg "nodeIdToBaseStatusMap.*make\("

Length of output: 4205

module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go (3)

68-69: Mutex usage appears unnecessary

The Lock/Unlock calls in both RegisterNode and UnRegisterNode don't protect any shared resources.

Also applies to: 82-83


126-136: LGTM! Improved error handling and cleanup

Good improvements:

  • Proper error handling for MQTT connection
  • Clean disconnection on context cancellation

303-309: Handle JSON marshaling errors

The error from json.Marshal is ignored in both StartBiz and StopBiz methods.

Also applies to: 312-318

common/utils/utils_test.go (3)

267-269: Test function name needs to be updated

The test function name doesn't match the function it's testing. The function has been renamed to ConvertBaseStatusToNodeInfo but the test is still named TestTranslateHeartBeatDataToNodeInfo.


603-603: Use %v for safe string formatting

The error message uses %s for formatting which could panic if the input contains format specifiers.


629-634: Document magic number usage

The assertion uses a magic number 1238 which appears to be a default port number.

controller/module_deployment_controller/module_deployment_controller.go (1)

191-191: 🛠️ Refactor suggestion

Correct log formatting and use ctx

In the log statement, replace context.Background() with ctx and fix the format specifier to %v to correctly display the containers.

Apply this diff:

-log.G(context.Background()).Infof("query base line got: %", containers)
+log.G(ctx).Infof("query base line got: %v", containers)

Likely invalid or redundant comment.

Comment on lines +130 to +139
vnodeController, err := vnode_controller.NewVNodeController(&model.BuildVNodeControllerConfig{
ClientID: clientID,
Env: env,
VPodIdentity: model2.ComponentModule,
VNodeWorkerNum: 4,
}, tunnels[0])
Expect(err).ToNot(HaveOccurred())

err = vnodeController.SetupWithManager(ctx, k8sManager)

⚠️ Potential issue

Add validation for tunnel selection.

The code assumes the tunnels slice is non-empty when accessing tunnels[0]. Consider adding validation to prevent potential runtime panic.

+	if len(tunnels) == 0 {
+		err = fmt.Errorf("no tunnels available for VNode controller")
+		log.G(ctx).Error(err)
+		panic(err)
+	}
 	vnodeController, err := vnode_controller.NewVNodeController(&model.BuildVNodeControllerConfig{
 		ClientID:       clientID,
 		Env:            env,
 		VPodIdentity:   model2.ComponentModule,
 		VNodeWorkerNum: 4,
 	}, tunnels[0])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
vnodeController, err := vnode_controller.NewVNodeController(&model.BuildVNodeControllerConfig{
ClientID: clientID,
Env: env,
VPodIdentity: model2.ComponentModule,
VNodeWorkerNum: 4,
}, tunnels[0])
Expect(err).ToNot(HaveOccurred())
err = vnodeController.SetupWithManager(ctx, k8sManager)
if len(tunnels) == 0 {
err = fmt.Errorf("no tunnels available for VNode controller")
log.G(ctx).Error(err)
panic(err)
}
vnodeController, err := vnode_controller.NewVNodeController(&model.BuildVNodeControllerConfig{
ClientID: clientID,
Env: env,
VPodIdentity: model2.ComponentModule,
VNodeWorkerNum: 4,
}, tunnels[0])
Expect(err).ToNot(HaveOccurred())
err = vnodeController.SetupWithManager(ctx, k8sManager)

Comment on lines +139 to +141
tunnel := startTunnels(ctx, clientID, env, mgr, mdc)

vc, err := vnode_controller.NewVNodeController(&rcc, tunnel)
🛠️ Refactor suggestion

Add error handling for tunnel initialization.

The tunnel returned by startTunnels is used directly without checking if it's valid. While startTunnels will panic on failure, it's better to handle errors gracefully at this level for better error reporting and cleanup.

Consider this approach:

-	tunnel := startTunnels(ctx, clientID, env, mgr, mdc)
-
-	vc, err := vnode_controller.NewVNodeController(&rcc, tunnel)
+	tunnel, err := startTunnels(ctx, clientID, env, mgr, mdc)
+	if err != nil {
+		log.G(ctx).Error(err, "failed to initialize tunnels")
+		os.Exit(1)
+	}
+
+	vc, err := vnode_controller.NewVNodeController(&rcc, tunnel)

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +207 to +214
if startFailedCount > 0 {
panic(errors.New(fmt.Sprintf("failed to start %d tunnels", startFailedCount)))
} else if successTunnelCount == 0 {
panic(errors.New(fmt.Sprintf("successfully started 0 tunnels")))
}
// we only using one tunnel for now
return tunnels[0]
}
⚠️ Potential issue

Improve error handling and tunnel selection.

The current implementation has several issues:

  1. Using panic for error handling is not idiomatic Go
  2. Accessing tunnels[0] without checking slice length is unsafe
  3. The comment about using only one tunnel suggests potential technical debt

Consider this more robust approach:

-	if startFailedCount > 0 {
-		panic(errors.New(fmt.Sprintf("failed to start %d tunnels", startFailedCount)))
-	} else if successTunnelCount == 0 {
-		panic(errors.New(fmt.Sprintf("successfully started 0 tunnels")))
-	}
-	// we only using one tunnel for now
-	return tunnels[0]
+	if startFailedCount > 0 {
+		return nil, fmt.Errorf("failed to start %d out of %d tunnels", 
+			startFailedCount, len(tunnels))
+	}
+	if successTunnelCount == 0 {
+		return nil, errors.New("no tunnels were configured to start")
+	}
+	if len(tunnels) == 0 {
+		return nil, errors.New("no tunnels were initialized")
+	}
+	// TODO: Define criteria for selecting which tunnel to use when multiple are available
+	return tunnels[0], nil

Comment on lines +31 to +38
 func NewMockMqttBase(baseName, clusterName, version, env string) *MockMQTTBase {
 	return &MockMQTTBase{
-		ID:        id,
 		Name:      baseName,
 		Env:       env,
 		CurrState: "ACTIVATED",
-		Metadata: model.Metadata{
-			Name:    name,
-			Version: version,
+		BaseMetadata: model.BaseMetadata{
+			ClusterName: clusterName,
+			Version:     version,
⚠️ Potential issue

Initialize BaseMetadata.Identity field

The BaseMetadata.Identity field is used throughout the code for MQTT topics but is not initialized in the constructor. This could lead to empty topic paths.

Add the Identity initialization:

 BaseMetadata: model.BaseMetadata{
   ClusterName: clusterName,
   Version:     version,
+  Identity:    baseName,
 },
Comment on lines +55 to +61
func NewHttpTunnel(env string, kubeClient client.Client, moduleDeploymentController *module_deployment_controller.ModuleDeploymentController, port int) HttpTunnel {
return HttpTunnel{
env: env,
kubeClient: kubeClient,
moduleDeploymentController: moduleDeploymentController,
Port: port,
}
🛠️ Refactor suggestion

Consider returning a pointer to HttpTunnel.

The constructor currently returns a value type. Since HttpTunnel contains mutex and maps, it should be used as a pointer type to avoid copying these sync primitives.

-func NewHttpTunnel(env string, kubeClient client.Client, moduleDeploymentController *module_deployment_controller.ModuleDeploymentController, port int) HttpTunnel {
-	return HttpTunnel{
+func NewHttpTunnel(env string, kubeClient client.Client, moduleDeploymentController *module_deployment_controller.ModuleDeploymentController, port int) *HttpTunnel {
+	return &HttpTunnel{

Comment on lines +43 to +49
func NewMqttTunnel(env string, kubeClient client.Client, moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) MqttTunnel {
return MqttTunnel{
env: env,
kubeClient: kubeClient,
moduleDeploymentController: moduleDeploymentController,
}
}
🛠️ Refactor suggestion

Return a pointer from the constructor

The constructor returns a value but all methods have pointer receivers. Following Go idioms, the constructor should return a pointer to avoid unnecessary copying.

Apply this diff:

-func NewMqttTunnel(env string, kubeClient client.Client, moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) MqttTunnel {
-	return MqttTunnel{
+func NewMqttTunnel(env string, kubeClient client.Client, moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) *MqttTunnel {
+	return &MqttTunnel{
 		env:                        env,
 		kubeClient:                 kubeClient,
 		moduleDeploymentController: moduleDeploymentController,
 	}
 }

Comment on lines +175 to +186
baselineContainers := mqttTunnel.moduleDeploymentController.QueryContainerBaseline(utils.ConvertBaseMetadataToBaselineQuery(data.Data))
go func() {
baselineBizs := make([]ark.BizModel, 0)
for _, container := range baselineContainers {
baselineBizs = append(baselineBizs, utils.TranslateCoreV1ContainerToBizModel(&container))
}
baselineBytes, _ := json.Marshal(baselineBizs)
err = mqttTunnel.mqttClient.Pub(utils.FormatBaselineResponseTopic(mqttTunnel.env, nodeID), mqtt.Qos1, baselineBytes)
if err != nil {
logrus.WithError(err).Errorf("Error publishing baselineContainers response data")
}
}()
⚠️ Potential issue

Add error handling for JSON marshaling

The baseline response marshaling ignores potential errors which could lead to sending invalid data.

Apply this diff:

-		baselineBytes, _ := json.Marshal(baselineBizs)
+		baselineBytes, err := json.Marshal(baselineBizs)
+		if err != nil {
+			logrus.WithError(err).Error("Failed to marshal baseline response")
+			return
+		}

continue
}

if int32(sameClusterNodeCount) != *deployment.Spec.Replicas {
⚠️ Potential issue

Check for nil before dereferencing deployment.Spec.Replicas

deployment.Spec.Replicas is a pointer and may be nil. Dereferencing it without checking can cause a runtime panic. Ensure it's not nil before comparison.

Apply this diff to handle possible nil value:

- if int32(sameClusterNodeCount) != *deployment.Spec.Replicas {
+ if deployment.Spec.Replicas == nil || int32(sameClusterNodeCount) != *deployment.Spec.Replicas {

Alternatively, decide on a default value or handling strategy when deployment.Spec.Replicas is nil.


Comment on lines +177 to +183
err := mdc.cache.List(context.Background(), &relatedDeploymentsByNode, &client.ListOptions{
LabelSelector: labels.SelectorFromSet(labelMap),
})
if err != nil {
log.G(context.Background()).WithError(err).Error("failed to list deployments")
return []corev1.Container{}
}
🛠️ Refactor suggestion

Use ctx instead of context.Background() for context management

In QueryContainerBaseline, using context.Background() ignores cancellation signals and deadlines from the parent context. Replace context.Background() with the passed ctx parameter to properly handle context propagation.

Apply this diff to fix the issue:

 err := mdc.cache.List(
-    context.Background(),
+    ctx,
     &relatedDeploymentsByNode,
     &client.ListOptions{
         LabelSelector: labels.SelectorFromSet(labelMap),
     },
 )

Similarly, update the error logging:

 if err != nil {
-    log.G(context.Background()).WithError(err).Error("failed to list deployments")
+    log.G(ctx).WithError(err).Error("failed to list deployments")
     return []corev1.Container{}
 }

Comment on lines +358 to +368
if node.Status.Phase == corev1.NodeRunning {
for _, cond := range node.Status.Conditions {
if cond.Type == corev1.NodeReady {
if cond.Status == corev1.ConditionTrue {
readyNodeCnt++
break
}
}
}
}
}
🛠️ Refactor suggestion

Simplify node readiness check by removing deprecated node.Status.Phase

The node.Status.Phase field is deprecated and should not be used to determine node readiness. Instead, rely solely on the NodeReady condition.

Apply this diff to simplify the readiness check:

-    if node.Status.Phase == corev1.NodeRunning {
         for _, cond := range node.Status.Conditions {
             if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
                 readyNodeCnt++
                 break
             }
         }
-    }

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (13)
common/utils/utils.go (1)

203-207: Consider enhancing error handling with specific error types.

The error handling and logging look good. Consider creating custom error types for different failure scenarios to make error handling more specific in the calling code.

suite/mock_mqtt_base.go (2)

104-109: Add error handling for JSON marshaling.

The json.Marshal call in message handling methods should include error handling to prevent silent failures.

-	msgBytes, _ := json.Marshal(msg)
+	msgBytes, err := json.Marshal(msg)
+	if err != nil {
+		// Consider logging the error
+		return nil
+	}

254-257: Consider extracting topic construction logic.

Topic string construction is duplicated across multiple methods. Consider extracting this into helper methods to improve maintainability and reduce duplication.

Example refactoring:

func (b *MockMQTTBase) getBaseTopic(subTopic string) string {
    return fmt.Sprintf("koupleless_%s/%s/base/%s", b.Env, b.BaseMetadata.Identity, subTopic)
}

Usage:

-b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/health", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
+b.client.Pub(b.getBaseTopic("health"), 1, []byte(""))

Also applies to: 261-264, 268-269, 273-277

module_tunnels/koupleless_http_tunnel/http_tunnel.go (3)

Line range hint 257-367: Extract common nodeID lookup logic

The pattern of extracting nodeID and looking up baseStatus is repeated across multiple methods. Consider extracting this into a helper method to reduce duplication and improve maintainability.

func (h *HttpTunnel) getBaseStatus(nodeName string) (string, model.BaseStatus, error) {
    h.Lock()
    defer h.Unlock()
    nodeID := utils2.ExtractNodeIDFromNodeName(nodeName)
    baseStatus, ok := h.nodeIdToBaseStatusMap[nodeID]
    if !ok {
        return "", model.BaseStatus{}, fmt.Errorf("network info not found for node %s", nodeName)
    }
    return nodeID, baseStatus, nil
}

Line range hint 278-290: Improve error handling in QueryAllBizStatusData

The retry logic using queryAllBizDataOutdated could benefit from:

  1. A maximum retry limit to prevent infinite retries
  2. Exponential backoff between retries
  3. Context cancellation check before retrying

Would you like me to provide an implementation with these improvements?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 278-278: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L278
Added line #L278 was not covered by tests


Line range hint 314-367: Add test coverage for business operations

The business operation methods (StartBiz and StopBiz) lack test coverage. These methods handle critical module lifecycle operations and should be thoroughly tested.

Would you like me to help generate test cases for these methods?

controller/module_deployment_controller/module_deployment_controller.go (7)

9-9: Use a clearer alias for imported package

The alias errors2 might be confusing. Consider using apierrors for better clarity.

Apply this diff to rename the alias:

- errors2 "k8s.io/apimachinery/pkg/api/errors"
+ apierrors "k8s.io/apimachinery/pkg/api/errors"

170-170: Address the TODO comment regarding deployment labels

There's a TODO comment indicating that labels should be added to deployments by the module controller. Ensuring that these labels are correctly set is important for proper functioning.

Would you like assistance in implementing the addition of these labels?


236-237: Handle errors appropriately when listing deployments fails

Currently, when an error occurs while listing deployments, you log the error and return an empty list. Consider propagating the error to the caller for better error handling.

Apply this diff to return the error:

      if err != nil {
          logrus.WithError(err).Error("failed to list deployments")
-         return matchedDeployments
+         return nil, err
      }

Ensure that the function signature and callers are updated accordingly.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 236-237: controller/module_deployment_controller/module_deployment_controller.go#L236-L237
Added lines #L236 - L237 were not covered by tests


289-290: Fix typo in log message

There's a typo in the log message: "skip to update relicas" should be "skip updating replicas".

Apply this diff to correct the typo:

-     logrus.Warnf("failed to get cluster name of deployment %s, skip to update relicas", deployment.Name)
+     logrus.Warnf("failed to get cluster name of deployment %s, skipping update of replicas", deployment.Name)
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 289-290: controller/module_deployment_controller/module_deployment_controller.go#L289-L290
Added lines #L289 - L290 were not covered by tests


295-296: Improve error message clarity

The error message "failed to get nodes of cluster %s, skip to update replicas" could be clearer. Consider specifying the action being skipped due to the error.

Apply this diff to enhance the message:

-     logrus.WithError(err).Errorf("failed to get nodes of cluster %s, skip to update replicas", clusterName)
+     logrus.WithError(err).Errorf("Failed to get nodes of cluster %s; skipping update of deployment replicas", clusterName)
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 295-296: controller/module_deployment_controller/module_deployment_controller.go#L295-L296
Added lines #L295 - L296 were not covered by tests


342-342: Remove unnecessary return statement

The return statement after the empty line at line 342 seems unnecessary since the function returns immediately afterward.

Consider removing the redundant line for cleaner code.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 342-342: controller/module_deployment_controller/module_deployment_controller.go#L342
Added line #L342 was not covered by tests


352-354: Propagate errors instead of logging and returning zero

When failing to list nodes in getReadyNodeCount, you log the error and return zero. Consider propagating the error to the caller for better error handling.

Ensure the function signature and callers handle the error appropriately.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 352-354: controller/module_deployment_controller/module_deployment_controller.go#L352-L354
Added lines #L352 - L354 were not covered by tests

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 90d7e3d and 779f804.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (8)
  • common/model/model.go (1 hunks)
  • common/utils/utils.go (5 hunks)
  • common/utils/utils_test.go (9 hunks)
  • controller/module_deployment_controller/module_deployment_controller.go (7 hunks)
  • go.mod (1 hunks)
  • module_tunnels/koupleless_http_tunnel/http_tunnel.go (17 hunks)
  • suite/mock_mqtt_base.go (10 hunks)
  • suite/module_deployment_controller_suite_test.go (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • common/model/model.go
  • common/utils/utils_test.go
  • go.mod
  • suite/module_deployment_controller_suite_test.go
🧰 Additional context used
🪛 GitHub Check: codecov/patch
controller/module_deployment_controller/module_deployment_controller.go

[warning] 178-180: controller/module_deployment_controller/module_deployment_controller.go#L178-L180
Added lines #L178 - L180 were not covered by tests


[warning] 224-226: controller/module_deployment_controller/module_deployment_controller.go#L224-L226
Added lines #L224 - L226 were not covered by tests


[warning] 236-237: controller/module_deployment_controller/module_deployment_controller.go#L236-L237
Added lines #L236 - L237 were not covered by tests


[warning] 289-290: controller/module_deployment_controller/module_deployment_controller.go#L289-L290
Added lines #L289 - L290 were not covered by tests


[warning] 295-296: controller/module_deployment_controller/module_deployment_controller.go#L295-L296
Added lines #L295 - L296 were not covered by tests


[warning] 312-313: controller/module_deployment_controller/module_deployment_controller.go#L312-L313
Added lines #L312 - L313 were not covered by tests


[warning] 321-329: controller/module_deployment_controller/module_deployment_controller.go#L321-L329
Added lines #L321 - L329 were not covered by tests


[warning] 334-334: controller/module_deployment_controller/module_deployment_controller.go#L334
Added line #L334 was not covered by tests


[warning] 342-342: controller/module_deployment_controller/module_deployment_controller.go#L342
Added line #L342 was not covered by tests


[warning] 352-354: controller/module_deployment_controller/module_deployment_controller.go#L352-L354
Added lines #L352 - L354 were not covered by tests

module_tunnels/koupleless_http_tunnel/http_tunnel.go

[warning] 75-75: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L75
Added line #L75 was not covered by tests


[warning] 80-81: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L80-L81
Added lines #L80 - L81 were not covered by tests


[warning] 83-83: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L83
Added line #L83 was not covered by tests


[warning] 88-88: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L88
Added line #L88 was not covered by tests


[warning] 91-92: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L91-L92
Added lines #L91 - L92 were not covered by tests


[warning] 96-97: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L96-L97
Added lines #L96 - L97 were not covered by tests


[warning] 106-106: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L106
Added line #L106 was not covered by tests


[warning] 149-149: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L149
Added line #L149 was not covered by tests


[warning] 158-158: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L158
Added line #L158 was not covered by tests


[warning] 160-160: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L160
Added line #L160 was not covered by tests


[warning] 172-173: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L172-L173
Added lines #L172 - L173 were not covered by tests


[warning] 175-175: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L175
Added line #L175 was not covered by tests


[warning] 182-184: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L182-L184
Added lines #L182 - L184 were not covered by tests


[warning] 218-218: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L218
Added line #L218 was not covered by tests


[warning] 257-257: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L257
Added line #L257 was not covered by tests


[warning] 259-260: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L259-L260
Added lines #L259 - L260 were not covered by tests


[warning] 266-266: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L266
Added line #L266 was not covered by tests


[warning] 278-278: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L278
Added line #L278 was not covered by tests

🔇 Additional comments (11)
common/utils/utils.go (4)

22-23: LGTM! Good naming improvement.

The rename from GetBaseIDFromTopic to GetBaseIdentityFromTopic improves clarity and maintains consistency with the codebase terminology.


72-95: Verify state mapping completeness.

The conversion logic looks good, but please verify that all possible states from BaseStatus are properly mapped to corresponding NodeState values. Currently, only "ACTIVATED" is explicitly handled.


98-99: LGTM! Consistent with new naming convention.

The rename from TranslateHealthDataToNodeStatus to ConvertHealthDataToNodeStatus maintains consistency with the new naming pattern.


121-126: LGTM! Clean and straightforward conversion.

The function properly maps all required fields from BaseMetadata to QueryBaselineRequest.

suite/mock_mqtt_base.go (2)

19-37: LGTM! Clean struct refactoring.

The replacement of the ID field with BaseMetadata and its initialization in the constructor is well-implemented. The BaseMetadata.Identity field is properly initialized, addressing a previous concern.


73-74: Previous comment about MQTT subscription error handling still applies.

module_tunnels/koupleless_http_tunnel/http_tunnel.go (2)

27-53: Verify complete implementation of tunnel.Tunnel interface

The type assertion has been updated from module_tunnels.ModuleTunnel to tunnel.Tunnel. Please ensure all required methods of the new interface are implemented correctly.


55-61: Previous review comment about returning pointer type is still valid

controller/module_deployment_controller/module_deployment_controller.go (3)

299-299: Check for nil before dereferencing deployment.Spec.Replicas

deployment.Spec.Replicas is a pointer and may be nil. Dereferencing it without checking can cause a runtime panic. Ensure it's not nil before comparison.


358-368: Avoid using deprecated node.Status.Phase

The node.Status.Phase field is deprecated and should not be used. Instead, rely solely on the NodeReady condition to check for node readiness.


375-377: ⚠️ Potential issue

Incorrect use of client.MergeFrom

The patch variable is created using client.MergeFrom(old), but old is set after calling deployment.DeepCopy(). This might not produce the expected patch.

Ensure that old represents the original state before any modifications.

Apply this diff to correct the order:

-     old := deployment.DeepCopy()
      patch := client.MergeFrom(old)
+     old := deployment.DeepCopy()

Likely invalid or redundant comment.

Comment on lines +222 to +242
 // ConvertBaseStatusFromNodeInfo extracts network info from node info
 func ConvertBaseStatusFromNodeInfo(initData vkModel.NodeInfo) model.BaseStatus {
 	portStr := initData.CustomLabels[model.LabelKeyOfTunnelPort]

 	port, err := strconv.Atoi(portStr)
 	if err != nil {
 		logrus.Errorf("failed to parse port %s from node info", portStr)
 		port = 1238
 	}

-	return model.NetworkInfo{
+	return model.BaseStatus{
+		BaseMetadata: model.BaseMetadata{
+			Identity:    utils.ExtractNodeIDFromNodeName(initData.Metadata.Name),
+			Version:     initData.Metadata.Version,
+			ClusterName: initData.Metadata.ClusterName,
+		},

 		LocalIP:       initData.NetworkInfo.NodeIP,
 		LocalHostName: initData.NetworkInfo.HostName,
-		ArkletPort:    port,
+		Port:          port,
+		State:         string(initData.State),
🛠️ Refactor suggestion

Improve port handling with configuration-based default.

While the conversion logic is correct, the hardcoded fallback port (1238) should be replaced with a configurable default value. Consider:

  1. Making the default port a constant
  2. Adding validation for the port range
  3. Logging a warning instead of error when using the default port
+ const DefaultTunnelPort = 1238

 func ConvertBaseStatusFromNodeInfo(initData vkModel.NodeInfo) model.BaseStatus {
     portStr := initData.CustomLabels[model.LabelKeyOfTunnelPort]
 
     port, err := strconv.Atoi(portStr)
     if err != nil {
-        logrus.Errorf("failed to parse port %s from node info", portStr)
-        port = 1238
+        logrus.Warnf("Invalid port %s from node info, using default port %d", portStr, DefaultTunnelPort)
+        port = DefaultTunnelPort
     }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +74 to +97
 // RegisterNode is called when a new node starts
 func (h *HttpTunnel) RegisterNode(ctx context.Context, initData vkModel.NodeInfo) {
 	h.Lock()
 	defer h.Unlock()

 	// check base network info, if not exist, extract from initData
-	_, has := h.nodeNetworkInfoOfNodeID[nodeID]
+	nodeID := utils2.ExtractNodeIDFromNodeName(initData.Metadata.Name)
+	_, has := h.nodeIdToBaseStatusMap[nodeID]
 	if !has {
-		h.nodeNetworkInfoOfNodeID[nodeID] = utils.ExtractNetworkInfoFromNodeInfoData(initData)
+		h.nodeIdToBaseStatusMap[nodeID] = utils.ConvertBaseStatusFromNodeInfo(initData)
 	}
 }

-// OnNodeStop is called when a node stops
-func (h *HttpTunnel) OnNodeStop(ctx context.Context, nodeID string) {
+// UnRegisterNode is called when a node stops
+func (h *HttpTunnel) UnRegisterNode(ctx context.Context, nodeName string) {
 	h.Lock()
 	defer h.Unlock()
-	delete(h.nodeNetworkInfoOfNodeID, nodeID)
+	nodeID := utils2.ExtractNodeIDFromNodeName(nodeName)
+	delete(h.nodeIdToBaseStatusMap, nodeID)
 }

 // OnNodeNotReady is called when a node is not ready
-func (h *HttpTunnel) OnNodeNotReady(ctx context.Context, info vkModel.UnreachableNodeInfo) {
-	utils.OnBaseUnreachable(ctx, info, h.env, h.Client)
+func (h *HttpTunnel) OnNodeNotReady(ctx context.Context, nodeName string) {
+	utils.OnBaseUnreachable(ctx, nodeName, h.kubeClient)
⚠️ Potential issue

Add test coverage for node registration methods

The node registration methods (RegisterNode, UnRegisterNode, and OnNodeNotReady) lack test coverage. These are critical paths that should be thoroughly tested to ensure reliable node management.

Would you like me to help generate test cases for these methods?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 75-75: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L75
Added line #L75 was not covered by tests


[warning] 80-81: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L80-L81
Added lines #L80 - L81 were not covered by tests


[warning] 83-83: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L83
Added line #L83 was not covered by tests


[warning] 88-88: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L88
Added line #L88 was not covered by tests


[warning] 91-92: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L91-L92
Added lines #L91 - L92 were not covered by tests


[warning] 96-97: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L96-L97
Added lines #L96 - L97 were not covered by tests

Comment on lines +80 to +83
	nodeID := utils2.ExtractNodeIDFromNodeName(initData.Metadata.Name)
	_, has := h.nodeIdToBaseStatusMap[nodeID]
	if !has {
-		h.nodeNetworkInfoOfNodeID[nodeID] = utils.ExtractNetworkInfoFromNodeInfoData(initData)
+		h.nodeIdToBaseStatusMap[nodeID] = utils.ConvertBaseStatusFromNodeInfo(initData)

⚠️ Potential issue

Potential race condition in RegisterNode

The check-then-set operation on nodeIdToBaseStatusMap between the has check and the assignment is not atomic. Consider combining these operations to avoid potential race conditions.

-	_, has := h.nodeIdToBaseStatusMap[nodeID]
-	if !has {
-		h.nodeIdToBaseStatusMap[nodeID] = utils.ConvertBaseStatusFromNodeInfo(initData)
-	}
+	if _, has := h.nodeIdToBaseStatusMap[nodeID]; !has {
+		h.nodeIdToBaseStatusMap[nodeID] = utils.ConvertBaseStatusFromNodeInfo(initData)
+	}

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 80-81: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L80-L81
Added lines #L80 - L81 were not covered by tests


[warning] 83-83: module_tunnels/koupleless_http_tunnel/http_tunnel.go#L83
Added line #L83 was not covered by tests

}
}

return ""

⚠️ Potential issue

Ensure all code paths return a value

The function getClusterNameFromDeployment may not return a value in all code paths, which can lead to issues. Review the logic to ensure that a default value is returned when no cluster name is found.

Consider returning an explicit error or a default value to handle this case.
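A sketch of the fallback-on-every-path shape; `matchExpr`, the label key string, and `defaultClusterName` are simplified stand-ins for the controller's real node-selector types and constants, not the actual implementation.

```go
package main

import "fmt"

// matchExpr mirrors the shape of a NodeSelectorRequirement.
type matchExpr struct {
	Key    string
	Values []string
}

// explicit sentinel for "no cluster found" instead of an implicit return
const defaultClusterName = ""

func clusterNameFrom(exprs []matchExpr, labelKey string) string {
	for _, expr := range exprs {
		// guard len(Values) to avoid an index-out-of-range panic
		if expr.Key == labelKey && len(expr.Values) > 0 {
			return expr.Values[0]
		}
	}
	return defaultClusterName // every code path returns explicitly
}

func main() {
	exprs := []matchExpr{{Key: "cluster-name", Values: []string{"cluster-a"}}}
	fmt.Println(clusterNameFrom(exprs, "cluster-name"))
	fmt.Println(clusterNameFrom(nil, "cluster-name") == defaultClusterName)
}
```

Callers can then compare against the sentinel (or receive an error) rather than guessing whether an empty string was deliberate.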

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 334-334: controller/module_deployment_controller/module_deployment_controller.go#L334
Added line #L334 was not covered by tests

Comment on lines +178 to +180
log.G(context.Background()).WithError(err).Error("failed to list deployments")
return []corev1.Container{}
}

🛠️ Refactor suggestion

Handle context properly in error logging

In the error logging, context.Background() is used instead of the passed ctx. This may lead to missing context information, such as cancellation signals or deadlines.

Apply this diff to use the proper context:

-     log.G(context.Background()).WithError(err).Error("failed to list deployments")
+     log.G(ctx).WithError(err).Error("failed to list deployments")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
log.G(context.Background()).WithError(err).Error("failed to list deployments")
return []corev1.Container{}
}
log.G(ctx).WithError(err).Error("failed to list deployments")
return []corev1.Container{}
}
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 178-180: controller/module_deployment_controller/module_deployment_controller.go#L178-L180
Added lines #L178 - L180 were not covered by tests

Comment on lines +224 to +226
logrus.Warnf("failed to get cluster name of node %s", node.Name)
return matchedDeployments
}

⚠️ Potential issue

Improve error handling when cluster name is missing

When the cluster name is empty, you log a warning but continue execution. Consider returning an error or handling this case explicitly to prevent unexpected behavior.

Apply this diff to return early when the cluster name is missing:

      if clusterName == "" {
          logrus.Warnf("failed to get cluster name of node %s", node.Name)
+         return nil
-         return matchedDeployments
      }

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 224-226: controller/module_deployment_controller/module_deployment_controller.go#L224-L226
Added lines #L224 - L226 were not covered by tests

Comment on lines +321 to +332
if deployment.Spec.Template.Spec.Affinity != nil &&
deployment.Spec.Template.Spec.Affinity.NodeAffinity != nil &&
deployment.Spec.Template.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil &&
deployment.Spec.Template.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms != nil {
for _, term := range deployment.Spec.Template.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms {
for _, expr := range term.MatchExpressions {
if expr.Key == vkModel.LabelKeyOfVNodeClusterName {
return expr.Values[0]
}
}
}
}

⚠️ Potential issue

Check for empty expr.Values before accessing

Accessing expr.Values[0] without verifying that expr.Values is not empty can cause an index out of range panic. Ensure that expr.Values has at least one element before accessing it.

Apply this diff to add a length check:

      for _, expr := range term.MatchExpressions {
          if expr.Key == vkModel.LabelKeyOfVNodeClusterName {
+             if len(expr.Values) > 0 {
                  return expr.Values[0]
+             }
          }
      }
📝 Committable suggestion


Suggested change
if deployment.Spec.Template.Spec.Affinity != nil &&
deployment.Spec.Template.Spec.Affinity.NodeAffinity != nil &&
deployment.Spec.Template.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil &&
deployment.Spec.Template.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms != nil {
for _, term := range deployment.Spec.Template.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms {
for _, expr := range term.MatchExpressions {
if expr.Key == vkModel.LabelKeyOfVNodeClusterName {
return expr.Values[0]
}
}
}
}
if deployment.Spec.Template.Spec.Affinity != nil &&
deployment.Spec.Template.Spec.Affinity.NodeAffinity != nil &&
deployment.Spec.Template.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil &&
deployment.Spec.Template.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms != nil {
for _, term := range deployment.Spec.Template.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms {
for _, expr := range term.MatchExpressions {
if expr.Key == vkModel.LabelKeyOfVNodeClusterName {
if len(expr.Values) > 0 {
return expr.Values[0]
}
}
}
}
}
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 321-329: controller/module_deployment_controller/module_deployment_controller.go#L321-L329
Added lines #L321 - L329 were not covered by tests

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

🧹 Outside diff range and nitpick comments (20)
suite/http/module_http_lifecycle_test.go (2)

Line range hint 27-41: Consider enhancing node readiness check

The node readiness check implementation is correct, but could be made more robust:

 Eventually(func() bool {
     err := k8sClient.Get(ctx, types.NamespacedName{
         Name: utils.FormatNodeName(nodeID, env),
     }, vnode)
+    if err != nil {
+        return false
+    }
     vnodeReady := false
     for _, cond := range vnode.Status.Conditions {
         if cond.Type == v1.NodeReady {
             vnodeReady = cond.Status == v1.ConditionTrue
             break
         }
     }
-    return err == nil && vnodeReady
+    return vnodeReady
 }, time.Second*20, time.Second).Should(BeTrue())

Line range hint 1-89: Consider adding more test scenarios

The test covers basic lifecycle operations but could benefit from additional test cases:

  1. Error scenarios (e.g., network failures)
  2. Concurrent module operations
  3. Resource limit validations
  4. Edge cases in state transitions

Would you like me to help create additional test scenarios?

suite/mqtt/base_mqtt_lifecycle_test.go (3)

40-40: Define timeout constants

The magic numbers time.Second*50 for timeouts should be defined as constants at the package level for better maintainability and consistency.

const (
    nodeReadinessTimeout = 50 * time.Second
    nodeReadinessInterval = time.Second
)

Also applies to: 73-73, 84-84


20-20: Document mock behavior expectations

Add comments explaining the expected behavior of NewMockMqttBase and its interaction patterns to help other developers understand the test setup.

// NewMockMqttBase creates a mock MQTT base that simulates node registration
// and status updates. It supports controlled shutdown through Unreachable() and Exit() methods.

14-89: Consider adding negative test cases

The test suite covers the happy path for both deactivation and unreachable scenarios, but consider adding test cases for error conditions such as:

  • Network interruptions during node registration
  • Invalid node states
  • Concurrent status updates

Would you like me to help generate these additional test cases?
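As one concrete starting point for the concurrent-update case, the sketch below races thirty goroutines on a shared status and checks the final value stays consistent; `baseStatus` and the state strings are local stand-ins for the suite's real types, not the actual mock.

```go
package main

import (
	"fmt"
	"sync"
)

// baseStatus is a local stand-in for the mock base's status.
type baseStatus struct {
	mu    sync.Mutex
	state string
}

func (b *baseStatus) set(s string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.state = s
}

func main() {
	b := &baseStatus{}
	states := []string{"ACTIVATED", "DEACTIVATED", "UNREACHABLE"}
	var wg sync.WaitGroup
	for i := 0; i < 30; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			b.set(states[i%len(states)]) // concurrent status updates
		}(i)
	}
	wg.Wait()
	// whichever write landed last, the state must be a valid one
	valid := map[string]bool{"ACTIVATED": true, "DEACTIVATED": true, "UNREACHABLE": true}
	fmt.Println("final state valid:", valid[b.state])
}
```

Running such a test under `go test -race` would also surface unsynchronized access in the real mock.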

suite/mqtt/module_mqtt_lifecycle_test.go (1)

Line range hint 26-124: Consider splitting test scenarios into smaller, focused contexts

The current test file covers multiple scenarios in a single context. Consider reorganizing the tests into smaller, more focused contexts such as:

  • Node Registration
  • Pod Lifecycle
  • Error Handling
  • Node Shutdown

This would improve test maintainability and make failures easier to diagnose.

suite/mqtt/module_deployment_controller_suite_test.go (3)

117-117: Consider documenting the cluster naming pattern.

The use of clusterName + "-2" suggests a pattern for multi-cluster deployments. Consider:

  • Documenting this naming convention
  • Adding constants for common suffixes if this is a repeated pattern

Also applies to: 127-129


165-165: Consider standardizing timeout durations.

The timeout duration is set to 20 minutes which seems excessive for this test case. Consider:

  • Standardizing timeout durations across tests
  • Using constants for common timeout values
  • Documenting why such a long timeout is needed

Also applies to: 173-173


241-241: Consider documenting timeout increase.

The timeout has been increased from 20 to 30 seconds. Please document:

  • The reason for this specific increase
  • Any observed issues that led to this change
suite/http/base_http_lifecycle_test.go (1)

Line range hint 26-43: Refactor repeated code into a helper function

The block of code that checks if the vnode is ready is repeated multiple times. Consider extracting this logic into a helper function to improve maintainability and reduce duplication.

For example, you could create a helper function:

func waitForVnodeReady(ctx context.Context, nodeID, env string) bool {
	vnode := &v1.Node{}
	err := k8sClient.Get(ctx, types.NamespacedName{
		Name: utils.FormatNodeName(nodeID, env),
	}, vnode)
	vnodeReady := false
	for _, cond := range vnode.Status.Conditions {
		if cond.Type == v1.NodeReady {
			vnodeReady = cond.Status == v1.ConditionTrue
			break
		}
	}
	return err == nil && vnodeReady
}

Then replace the repeated code with:

Eventually(func() bool {
	return waitForVnodeReady(ctx, httpNodeID, env)
}, time.Second*50, time.Second).Should(BeTrue())

Also applies to: 59-76, 93-110

suite/mqtt/suite_test.go (1)

86-93: Implement context cancellation for graceful shutdown

Consider handling context cancellation to allow the MQTT server to shut down gracefully when the context is canceled. Currently, the server may continue running even if the rest of the test suite is terminated.

suite/http/suite_test.go (4)

39-39: Remove unused variable mqttTunnel.

The variable mqttTunnel declared on line 39 is not used in the code. Unused variables can lead to confusion and should be removed to keep the code clean.

🧰 Tools
🪛 golangci-lint

39-39: var mqttTunnel is unused

(unused)


6-13: Clarify import aliases for better readability.

The import aliases model2 (line 6) and model (line 12) are similar, which may cause confusion when reading the code. Consider renaming them to more descriptive aliases that identify the source package of each model to enhance code clarity.


85-89: Avoid using panic in goroutines.

In the goroutine starting at line 83, if mqttServer.Serve() returns an error, panic(err) is called. Panicking inside goroutines may not propagate the error appropriately. Consider sending the error through a channel or handling it gracefully.
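A minimal version of the channel-based alternative; `serve` simulates the failing `mqttServer.Serve()` call, and the error string is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// serve stands in for mqttServer.Serve() failing at startup.
func serve() error { return errors.New("listen: address already in use") }

func main() {
	errCh := make(chan error, 1)
	go func() {
		errCh <- serve() // surface the error; never panic inside the goroutine
	}()
	if err := <-errCh; err != nil {
		// in the Ginkgo suite this would be Expect(err).ToNot(HaveOccurred())
		fmt.Println("mqtt server failed to start:", err)
	}
}
```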


143-182: Add comments to the prepareModulePod function.

Providing comments for the prepareModulePod function will help other developers understand its purpose and usage. Consider adding a brief description above the function definition.

suite/http/mock_http_base.go (3)

60-60: Unused clientID parameter in Start method

The Start method signature now includes a new parameter clientID, but this parameter is not used within the method body. Consider removing it if it's unnecessary, or implement its usage if it's meant to serve a purpose.


162-167: Handle the error returned by json.Marshal

In the getHeartBeatMsg method, the error returned by json.Marshal(msg) is currently ignored. Handling this error will help in identifying serialization issues during runtime.

Apply this diff to handle the potential error:

 func (b *MockHttpBase) getHeartBeatMsg() []byte {
     msg := model.BaseStatus{
         BaseMetadata:  b.Metadata,
         LocalIP:       "127.0.0.1",
         LocalHostName: "localhost",
         Port:          b.port,
         State:         b.CurrState,
     }
-    msgBytes, _ := json.Marshal(msg)
+    msgBytes, err := json.Marshal(msg)
+    if err != nil {
+        logrus.Errorf("error marshaling heartbeat message: %v", err)
+        return nil
+    }
     return msgBytes
 }

279-281: Add validation in getBizIdentity function

To prevent malformed identity strings, consider validating that bizModel.BizName and bizModel.BizVersion are not empty before formatting.

suite/mqtt/mock_mqtt_base.go (2)

268-269: Avoid Manual JSON Construction in SendFailedMessage

Constructing JSON strings manually using fmt.Sprintf is error-prone and can lead to invalid JSON. Instead, use json.Marshal to ensure correct formatting.

Refactor the code:

data := map[string]interface{}{
    "publishTimestamp": time.Now().UnixMilli(),
    "data": map[string]interface{}{
        "code": "",
    },
}
msgBytes, err := json.Marshal(data)
if err != nil {
    // Handle error
    log.Errorf("Failed to marshal failed message: %v", err)
    return
}
b.client.Pub(topic, qos, msgBytes)

Line range hint 312-334: Handle Errors When Publishing Messages in SetBizState

The publish operation in SetBizState does not handle potential errors from b.client.Pub. Unhandled errors could mean that important state updates are not communicated.

Consider capturing and handling the error:

if err := b.client.Pub(topic, qos, respBytes); err != nil {
    // Handle error
    log.Errorf("Failed to publish biz state: %v", err)
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 779f804 and 98930a5.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
  • go.mod (1 hunks)
  • suite/http/base_http_lifecycle_test.go (5 hunks)
  • suite/http/mock_http_base.go (8 hunks)
  • suite/http/module_http_lifecycle_test.go (2 hunks)
  • suite/http/suite_test.go (1 hunks)
  • suite/mqtt/base_mqtt_lifecycle_test.go (1 hunks)
  • suite/mqtt/mock_mqtt_base.go (11 hunks)
  • suite/mqtt/module_deployment_controller_suite_test.go (9 hunks)
  • suite/mqtt/module_mqtt_lifecycle_test.go (2 hunks)
  • suite/mqtt/suite_test.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod
🧰 Additional context used
🪛 golangci-lint
suite/http/suite_test.go

39-39: var mqttTunnel is unused

(unused)

🔇 Additional comments (13)
suite/http/module_http_lifecycle_test.go (1)

19-21: Verify the test configuration parameters

The addition of clusterName parameter and its usage in NewMockHttpBase looks correct, but please ensure that:

  1. The test cluster name "test-cluster-name" aligns with other test configurations
  2. The port number 1238 doesn't conflict with other tests running in parallel
suite/mqtt/module_mqtt_lifecycle_test.go (3)

1-1: LGTM! Package name change improves code organization

The package name change from 'suite' to 'mqtt' better reflects the module's responsibility and improves code organization.

Also applies to: 5-6


20-22: LGTM! Improved test configuration setup

The introduction of clusterName and the use of utils.FormatNodeName make the test configuration more explicit and maintainable.


34-37: LGTM! Improved error handling

The addition of error logging for vnode retrieval failures enhances debugging capabilities.

suite/mqtt/module_deployment_controller_suite_test.go (2)

1-1: LGTM! Good improvements to initialization.

The changes improve code maintainability by:

  • Moving to a more specific package name
  • Extracting cluster name to a variable
  • Consistently using the cluster name in mock base initialization

Also applies to: 21-24


55-55: Verify image naming convention and node selector changes.

The changes include:

  1. Adding .jar extension to image name
  2. Updated node selector with cluster name

Please verify:

  • If the .jar extension in image names is a new standard
  • If all dependent systems support this image naming convention
  • If the cluster name in node selector aligns with the virtual kubelet configuration

Also applies to: 65-67

suite/http/base_http_lifecycle_test.go (1)

111-111: Verify that updating Metadata.Identity correctly changes the base ID

By setting mockHttpBase.Metadata.Identity = "changed-base-id", ensure that this operation effectively changes the base ID in the system. Verify that all dependent components handle this change appropriately and that there are no unintended side effects.

suite/mqtt/suite_test.go (1)

1-1: Verify package name change impact

The package name has been changed to mqtt. Ensure that all references to this package are updated accordingly throughout the codebase to prevent import issues.

suite/http/suite_test.go (2)

130-132: Pass the correct context to k8sManager.Start.

In the goroutine starting at line 129, k8sManager.Start(ctx) is called with the main context. If ctx is canceled, it may prematurely stop the manager before the test completes. Consider using a separate context or ensuring the context lifespan is appropriate for the test duration.


99-105: ⚠️ Potential issue

Check the error returned by SetupWithManager method.

On lines 102-105, after calling moduleDeploymentController.SetupWithManager(ctx, k8sManager), the error err is not being checked. This could lead to unhandled errors affecting the test setup. Please add error handling as demonstrated:

+ Expect(err).ToNot(HaveOccurred())

Likely invalid or redundant comment.

suite/http/mock_http_base.go (1)

32-39: Ensure all callers of NewMockHttpBase are updated with the new clusterName parameter

The constructor NewMockHttpBase now includes an additional parameter clusterName. Please verify that all invocations of this function are updated to pass the new clusterName argument to prevent any runtime errors.

suite/mqtt/mock_mqtt_base.go (2)

69-69: Ensure Unique ClientID to Prevent Connection Conflicts

In the Start method, the ClientID is set using the clientID parameter. MQTT protocol requires each client to have a unique identifier. If two clients use the same ClientID, they can interfere with each other's connections.

Please verify that the clientID provided is unique across all clients to avoid unexpected disconnections or subscription issues.
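A simple way to guarantee uniqueness in tests is to derive the ClientID from the node ID plus a short random suffix; `uniqueClientID` is a hypothetical helper, not part of the suite.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniqueClientID appends four random bytes (hex-encoded) to the node ID
// so two test clients cannot collide on the broker.
func uniqueClientID(nodeID string) string {
	buf := make([]byte, 4)
	_, _ = rand.Read(buf)
	return fmt.Sprintf("%s-%s", nodeID, hex.EncodeToString(buf))
}

func main() {
	a := uniqueClientID("mqtt-node-1")
	b := uniqueClientID("mqtt-node-1")
	fmt.Println(a, b, a != b)
}
```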


254-257: Confirm Intentional Empty Message Payloads in SendInvalidMessage

The SendInvalidMessage function publishes empty message payloads to several topics. If this is intended for testing invalid message handling, ensure that downstream systems are prepared to handle empty payloads without adverse effects.

Context("pod install", func() {
It("base should become a ready vnode eventually", func() {
-go mockBase.Start(ctx)
+go mockBase.Start(ctx, clientID)

⚠️ Potential issue

Verify clientID initialization

The clientID variable is being used in mockBase.Start() but its initialization is not visible in the provided code. Ensure that clientID is properly initialized before this line to avoid potential nil reference issues.

Comment on lines +28 to +40
Eventually(func() bool {
err := k8sClient.Get(ctx, types.NamespacedName{
Name: utils.FormatNodeName(mqttNodeID, env),
}, vnode)
vnodeReady := false
for _, cond := range vnode.Status.Conditions {
if cond.Type == v1.NodeReady {
vnodeReady = cond.Status == v1.ConditionTrue
break
}
}
return err == nil && vnodeReady
}, time.Second*50, time.Second).Should(BeTrue())

🛠️ Refactor suggestion

Extract node readiness check into a helper function

This node readiness check logic is duplicated in both test contexts. Consider extracting it into a helper function to improve maintainability and reduce duplication.

func waitForNodeReady(ctx context.Context, k8sClient client.Client, nodeName string) func() bool {
    return func() bool {
        vnode := &v1.Node{}
        err := k8sClient.Get(ctx, types.NamespacedName{
            Name: nodeName,
        }, vnode)
        if err != nil {
            return false
        }
        for _, cond := range vnode.Status.Conditions {
            if cond.Type == v1.NodeReady {
                return cond.Status == v1.ConditionTrue
            }
        }
        return false
    }
}

Comment on lines +45 to +51
Eventually(func() bool {
vnode := &v1.Node{}
err := k8sClient.Get(ctx, types.NamespacedName{
Name: utils.FormatNodeName(mqttNodeID, env),
}, vnode)
return errors.IsNotFound(err)
}, time.Second*50, time.Second).Should(BeTrue())

🛠️ Refactor suggestion

Extract node not found check into a helper function

The node not found check logic is duplicated. Consider extracting it into a helper function.

func waitForNodeNotFound(ctx context.Context, k8sClient client.Client, nodeName string) func() bool {
    return func() bool {
        vnode := &v1.Node{}
        err := k8sClient.Get(ctx, types.NamespacedName{
            Name: nodeName,
        }, vnode)
        return errors.IsNotFound(err)
    }
}

Also applies to: 78-84

return errors.IsNotFound(err)
}, time.Second*50, time.Second).Should(BeTrue())

mockMqttBase.Exit()

🛠️ Refactor suggestion

Consider using AfterEach for cleanup

The mockMqttBase.Exit() call should be moved to an AfterEach block to ensure cleanup happens even if tests fail.

AfterEach(func() {
    mockMqttBase.Exit()
})


Context("mqtt base online and unreachable finally", func() {
It("base should become a ready vnode eventually", func() {
time.Sleep(time.Second)

🛠️ Refactor suggestion

Remove duplicate sleep

This is another instance of an arbitrary sleep that should be handled more deterministically.

Same recommendation as the previous sleep removal.

Comment on lines +124 to +125
err = vnodeController.SetupWithManager(ctx, k8sManager)


⚠️ Potential issue

Handle error for vnodeController.SetupWithManager.

The error returned by vnodeController.SetupWithManager(ctx, k8sManager) is not being handled. Unchecked errors can cause issues during the test execution. Please ensure to handle the error appropriately:

+ err = vnodeController.SetupWithManager(ctx, k8sManager)
+ Expect(err).ToNot(HaveOccurred())
📝 Committable suggestion


Suggested change
err = vnodeController.SetupWithManager(ctx, k8sManager)
err = vnodeController.SetupWithManager(ctx, k8sManager)
Expect(err).ToNot(HaveOccurred())

mqttServer = mqtt.New(nil)

// Allow all connections.
_ = mqttServer.AddHook(new(auth.AllowHook), nil)

⚠️ Potential issue

Handle error returned by AddHook method.

On line 75, the error returned by mqttServer.AddHook(new(auth.AllowHook), nil) is not being checked. Ignoring errors may lead to unexpected behavior. Consider handling the error as shown below:

Committable suggestion skipped: line range outside the PR's diff.
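Since the committable suggestion was not rendered, a hedged sketch of the intended check follows; `addHook` is a local stand-in for the server's `AddHook`, which returns an error, and the error condition here is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// addHook stands in for the MQTT server's AddHook; the point is only the
// shape of the error check that the suite currently discards with `_ =`.
func addHook(name string) error {
	if name == "" {
		return errors.New("hook must have a name")
	}
	return nil
}

func main() {
	if err := addHook("auth.AllowHook"); err != nil {
		// in the Ginkgo suite: Expect(err).ToNot(HaveOccurred())
		fmt.Println("failed to add hook:", err)
		return
	}
	fmt.Println("hook registered")
}
```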

Comment on lines +73 to +74
client.Subscribe(fmt.Sprintf("koupleless_%s/%s/+", b.Env, b.BaseMetadata.Identity), 1, b.processCommand)
client.Subscribe(fmt.Sprintf("koupleless_%s/%s/base/baseline", b.Env, b.BaseMetadata.Identity), 1, b.processBaseline)

⚠️ Potential issue

Handle Errors from MQTT Subscription Tokens

The calls to client.Subscribe do not check for errors that might occur during subscription. Failing to handle these errors can lead to missed subscriptions without any notification.

Consider handling the errors as follows:

token := client.Subscribe(topic, qos, callback)
if token.Wait() && token.Error() != nil {
    // Handle error, e.g., log it or retry subscription
    log.Errorf("Failed to subscribe to %s: %v", topic, token.Error())
    return token.Error()
}

Comment on lines 104 to 109
 		PublishTimestamp: time.Now().UnixMilli(),
-		Data: model.HeartBeatData{
+		Data: model.BaseStatus{
+			BaseMetadata:  b.BaseMetadata,
 			LocalIP:       "127.0.0.1",
 			LocalHostName: "localhost",
 			State:         b.CurrState,

⚠️ Potential issue

Check for Errors When Marshaling JSON in getHeartBeatMsg

In the getHeartBeatMsg function, the error returned by json.Marshal(msg) is being ignored. Ignoring marshalling errors can result in nil or invalid message payloads being sent.

Modify the code to handle the error:

msgBytes, err := json.Marshal(msg)
if err != nil {
    // Handle error, e.g., log the error or return it
    log.Errorf("Failed to marshal heartbeat message: %v", err)
    return nil
}

Comment on lines 273 to 275
PublishTimestamp: time.Now().UnixMilli(),
-	Data: b.Metadata,
+	Data: b.BaseMetadata,
})

⚠️ Potential issue

Handle Errors from JSON Marshaling in QueryBaseline

In the QueryBaseline method, the error from json.Marshal is ignored. If marshalling fails, queryBaselineBytes could be nil, leading to a runtime error when publishing.

Modify the code to handle the error:

queryBaselineBytes, err := json.Marshal(...)
if err != nil {
    // Handle error, e.g., log the error or return
    log.Errorf("Failed to marshal baseline query: %v", err)
    return
}

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
suite/http/base_http_lifecycle_test.go (1)

Line range hint 26-46: Consider optimizing test execution time and reliability

A few suggestions to improve the test suite:

  1. The 50-second timeout in Eventually blocks seems excessive for most operations. Consider reducing it unless there's a specific reason for such a long wait.
  2. The time.Sleep(time.Second) calls suggest potential race conditions. Consider making the tests more deterministic by properly waiting for specific conditions instead.

Example of a more deterministic approach:

Eventually(func() bool {
    // Check for specific preconditions here
    return preconditionsMet
}, time.Second*5, time.Millisecond*100).Should(BeTrue())

Also applies to: 59-79, 93-113

suite/http/mock_http_base.go (1)

278-280: Consider enhancing the getBizIdentity function.

While the function is correctly implemented, consider adding validation for empty fields and documentation about the format.

+// getBizIdentity returns a unique identifier for a business model in the format "bizName:bizVersion"
 func getBizIdentity(bizModel ark.BizModel) string {
+    if bizModel.BizName == "" || bizModel.BizVersion == "" {
+        return ""
+    }
     return fmt.Sprintf("%s:%s", bizModel.BizName, bizModel.BizVersion)
 }
module_tunnels/koupleless_http_tunnel/http_tunnel.go (1)

27-53: LGTM! Good architectural improvements.

The changes improve the codebase architecture through:

  1. Standardization by implementing the tunnel.Tunnel interface
  2. Better field naming (e.g., nodeIdToBaseStatusMap)
  3. Improved separation of concerns with the moduleDeploymentController field
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 98930a5 and 1c6da51.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • go.mod (1 hunks)
  • module_tunnels/koupleless_http_tunnel/http_tunnel.go (18 hunks)
  • suite/http/base_http_lifecycle_test.go (5 hunks)
  • suite/http/mock_http_base.go (8 hunks)
  • suite/http/module_http_lifecycle_test.go (2 hunks)
  • suite/mqtt/base_mqtt_lifecycle_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • go.mod
  • suite/http/module_http_lifecycle_test.go
  • suite/mqtt/base_mqtt_lifecycle_test.go
🔇 Additional comments (11)
suite/http/base_http_lifecycle_test.go (2)

Line range hint 1-12: LGTM! Package name change aligns with HTTP functionality


111-113: LGTM! Improved identity management through metadata

The change to use Metadata.Identity instead of direct ID modification is a good improvement that provides better structure and maintainability.

suite/http/mock_http_base.go (5)

22-22: Verify the impact of metadata structure changes.

The change from model.Metadata to model.BaseMetadata and the updated field structure (Identity instead of Name) represents a breaking change. Ensure that all consumers of this struct have been updated accordingly.

Also applies to: 35-38


59-67: Review the usage of clientID parameter.

The Start method has been updated to accept a clientID parameter, but it's not being used within the method implementation. Consider either:

  1. Using the parameter in the implementation
  2. Documenting why it's needed
  3. Removing it if unnecessary

161-165: LGTM: Improved status message structure.

The transition to model.BaseStatus with explicit field mapping improves code clarity and type safety.


184-186: LGTM: Consistent metadata field usage.

The health message correctly uses the new metadata fields, maintaining consistency with the struct changes.


247-247: LGTM: Consistent identity field usage.

The response structures correctly use BaseIdentity field, aligned with the metadata changes.

Also applies to: 272-272

module_tunnels/koupleless_http_tunnel/http_tunnel.go (4)

55-61: Previous comment about returning a pointer type is still valid.


74-97: Previous comments about test coverage and race conditions are still valid.


Line range hint 314-367: LGTM! Well-structured business operations.

The implementation demonstrates:

  1. Consistent error handling
  2. Proper lock management
  3. Appropriate logging

172-183: 🛠️ Refactor suggestion

Improve error handling in baseline query.

Consider these improvements:

  1. Handle JSON marshaling errors
  2. Add logging for baseline query operations
- jsonData, _ := json.Marshal(baselineBizs)
+ jsonData, err := json.Marshal(baselineBizs)
+ if err != nil {
+   logger.WithError(err).Error("failed to marshal baseline response")
+   w.WriteHeader(http.StatusInternalServerError)
+   w.Write([]byte(err.Error()))
+   return
+ }

Likely invalid or redundant comment.

Comment on lines +18 to +20
clusterName := "test-cluster-name"
httpNodeID := "test-http-base"
mockHttpBase := NewMockHttpBase("test-http-base", "1.0.0", httpNodeID, env, 1238)

Context("mqtt base online and deactive finally", func() {
	It("base should become a ready vnode eventually", func() {
		time.Sleep(time.Second)

		go mockMqttBase.Start(ctx)
		vnode := &v1.Node{}
		Eventually(func() bool {
			err := k8sClient.Get(ctx, types.NamespacedName{
				Name: utils.FormatNodeName(mqttNodeID, env),
			}, vnode)
			vnodeReady := false
			for _, cond := range vnode.Status.Conditions {
				if cond.Type == v1.NodeReady {
					vnodeReady = cond.Status == v1.ConditionTrue
					break
				}
			}
			return err == nil && vnodeReady
		}, time.Second*50, time.Second).Should(BeTrue())
	})

	It("base offline with deactive message and finally exit", func() {
		mockMqttBase.Exit()
		Eventually(func() bool {
			vnode := &v1.Node{}
			err := k8sClient.Get(ctx, types.NamespacedName{
				Name: utils.FormatNodeName(mqttNodeID, env),
			}, vnode)
			return errors.IsNotFound(err)
		}, time.Second*50, time.Second).Should(BeTrue())
	})
})

Context("mqtt base online and unreachable finally", func() {
	It("base should become a ready vnode eventually", func() {
		time.Sleep(time.Second)

		go mockMqttBase.Start(ctx)
		vnode := &v1.Node{}
		Eventually(func() bool {
			err := k8sClient.Get(ctx, types.NamespacedName{
				Name: utils.FormatNodeName(mqttNodeID, env),
			}, vnode)
			vnodeReady := false
			for _, cond := range vnode.Status.Conditions {
				if cond.Type == v1.NodeReady {
					vnodeReady = cond.Status == v1.ConditionTrue
					break
				}
			}
			return err == nil && vnodeReady
		}, time.Second*50, time.Second).Should(BeTrue())
	})

	It("base unreachable finally exit", func() {
		mockMqttBase.Unreachable()
		Eventually(func() bool {
			vnode := &v1.Node{}
			err := k8sClient.Get(ctx, types.NamespacedName{
				Name: utils.FormatNodeName(mqttNodeID, env),
			}, vnode)
			return errors.IsNotFound(err)
		}, time.Second*50, time.Second).Should(BeTrue())

		mockMqttBase.Exit()
	})
})
mockHttpBase := NewMockHttpBase(httpNodeID, clusterName, "1.0.0", env, 1238)

⚠️ Potential issue

Define missing clientID variable

The clientID variable is used in multiple mockHttpBase.Start() calls but is not defined. This issue was previously reported and needs to be addressed.

Add the following before line 18:

clientID := "test-client-id"

@lvjing2 lvjing2 force-pushed the youji-dev branch 2 times, most recently from c228e20 to 1741b44 Compare November 20, 2024 14:05
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
suite/http/module_http_lifecycle_test.go (1)

Line range hint 29-41: Consider extracting node readiness check into a helper function

The node readiness check logic is complex enough to warrant its own helper function, which would improve readability and reusability across tests.

Consider refactoring to:

func isNodeReady(ctx context.Context, client client.Client, nodeName string) bool {
    vnode := &v1.Node{}
    err := client.Get(ctx, types.NamespacedName{
        Name: nodeName,
    }, vnode)
    if err != nil {
        return false
    }
    
    for _, cond := range vnode.Status.Conditions {
        if cond.Type == v1.NodeReady {
            return cond.Status == v1.ConditionTrue
        }
    }
    return false
}

Then use it in the test:

Eventually(func() bool {
    return isNodeReady(ctx, k8sClient, utils.FormatNodeName(nodeID, env))
}, time.Second*50, time.Second).Should(BeTrue())
suite/mqtt/suite_test.go (2)

146-146: Consider replacing fixed sleep with proper shutdown synchronization

The hard-coded 15-second sleep makes tests slower and might be unreliable. Consider:

  1. Using a WaitGroup or channel to properly synchronize shutdown
  2. Adding a comment explaining why this delay is necessary if it must be retained

Line range hint 1-31: Consider improving test organization and documentation

The test suite would benefit from:

  1. Helper functions to encapsulate setup logic
  2. Documentation of prerequisites and environment variables
  3. Constants for configuration values instead of hard-coded strings

Example improvement:

const (
    defaultMQTTBroker = "localhost"
    defaultMQTTPort   = "1883"
    defaultUsername   = "test"
    // ... other constants
)

// setupMQTTEnvironment configures required environment variables
func setupMQTTEnvironment() {
    envVars := map[string]string{
        "MQTT_BROKER": defaultMQTTBroker,
        "MQTT_PORT":   defaultMQTTPort,
        // ... other variables
    }
    for k, v := range envVars {
        os.Setenv(k, v)
    }
}
suite/http/mock_http_base.go (1)

279-281: Add documentation for the helper function

Consider adding a docstring to explain the business identity format and its usage:

+// getBizIdentity returns a unique identifier for a business model in the format "bizName:bizVersion"
 func getBizIdentity(bizModel ark.BizModel) string {
     return fmt.Sprintf("%s:%s", bizModel.BizName, bizModel.BizVersion)
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1c6da51 and 1741b44.

📒 Files selected for processing (5)
  • common/model/model.go (1 hunks)
  • suite/http/mock_http_base.go (9 hunks)
  • suite/http/module_http_lifecycle_test.go (3 hunks)
  • suite/http/suite_test.go (1 hunks)
  • suite/mqtt/suite_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • common/model/model.go
  • suite/http/suite_test.go
🔇 Additional comments (11)
suite/http/module_http_lifecycle_test.go (4)

1-1: LGTM: Package name change aligns with restructuring

The package name change from 'suite' to 'http' better reflects the module's functionality and organization.


19-21: LGTM: Improved base initialization with cluster context

The addition of clusterName and its usage in NewMockHttpBase improves the test context by properly simulating a cluster environment.


27-27: Verify clientID initialization

The clientID variable is being used in mockBase.Start() but its initialization is not visible in the provided code.


41-41: Verify increased timeout duration

The timeout for node readiness check has been increased from 20s to 50s. While this might fix flaky tests, it could also mask underlying performance issues.

Consider:

  1. Adding a comment explaining why 50s is needed
  2. Investigating whether performance optimizations are possible
  3. Adding metrics to track actual node readiness time
suite/mqtt/suite_test.go (2)

104-106: LGTM!

The module deployment controller initialization is properly handled with appropriate error checking.


111-119: Previous review comment is still applicable

The issues identified in the previous review regarding error handling and code structure haven't been addressed.

suite/http/mock_http_base.go (5)

22-22: Verify the impact of Metadata type change

The change from model.Metadata to model.BaseMetadata is a breaking change. Ensure that this aligns with the updated virtual-kubelet dependency and that all consumers of this mock implementation have been updated accordingly.


31-38: LGTM: Constructor changes improve clarity

The addition of clusterName parameter and the updated metadata initialization using Identity and ClusterName provide better clarity and consistency.


162-166: LGTM: Improved status reporting

The updated heartbeat message structure provides more comprehensive status information while maintaining consistency with the metadata changes.


185-185: LGTM: Consistent naming convention

The update to use Identity instead of Name maintains consistency with the metadata changes.


248-248: LGTM: Consistent response structure updates

The update to use BaseIdentity in responses maintains consistency across the codebase.

Also applies to: 273-273

Comment on lines +87 to +95
ctx := context.Background()
go func() {
err := mqttServer.Serve()
if err != nil {
log.G(ctx).Error("failed to start mqtt server")
panic(err)
}
}()

🛠️ Refactor suggestion

Improve MQTT server error handling

The current error handling in the goroutine could be enhanced:

  1. Consider using a channel to communicate server startup status
  2. Pass context to Serve() for proper cancellation
+serverErrCh := make(chan error, 1)
 go func() {
-    err := mqttServer.Serve()
+    err := mqttServer.Serve(ctx)
     if err != nil {
         log.G(ctx).Error("failed to start mqtt server")
-        panic(err)
+        serverErrCh <- err
+        return
     }
+    serverErrCh <- nil
 }()
+
+if err := <-serverErrCh; err != nil {
+    panic(fmt.Sprintf("MQTT server failed to start: %v", err))
+}

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)
suite/mqtt/suite_test.go (2)

145-147: Consider making the teardown wait time configurable

The hardcoded 15-second sleep duration could lead to flaky tests if cleanup takes longer. Consider:

  1. Making this duration configurable via environment variable
  2. Adding a comment explaining why this specific duration is needed
  3. Implementing a more deterministic way to verify cleanup completion
+const defaultTeardownWaitTime = 15 * time.Second
+
 var _ = AfterSuite(func() {
     By("tearing down the suite environment")
     mqttServer.Close()
     cancel()
     testEnv.Stop()
-    time.Sleep(15 * time.Second)
+    // Wait for MQTT server and test environment to clean up completely
+    waitTime := defaultTeardownWaitTime
+    if envWait := os.Getenv("TEST_TEARDOWN_WAIT"); envWait != "" {
+        if d, err := time.ParseDuration(envWait); err == nil {
+            waitTime = d
+        }
+    }
+    time.Sleep(waitTime)
 })

Line range hint 53-61: Add cleanup for environment variables

Environment variables set in BeforeSuite should be cleaned up in AfterSuite to prevent affecting other tests.

 var _ = BeforeSuite(func() {
+    // Store original env values for restoration
+    originalEnv := map[string]string{
+        "MQTT_BROKER": os.Getenv("MQTT_BROKER"),
+        "MQTT_PORT": os.Getenv("MQTT_PORT"),
+        "MQTT_USERNAME": os.Getenv("MQTT_USERNAME"),
+        "MQTT_PASSWORD": os.Getenv("MQTT_PASSWORD"),
+        "MQTT_CLIENT_PREFIX": os.Getenv("MQTT_CLIENT_PREFIX"),
+        "ENABLE_MODULE_REPLICAS_SYNC_WITH_BASE": os.Getenv("ENABLE_MODULE_REPLICAS_SYNC_WITH_BASE"),
+    }

     os.Setenv("MQTT_BROKER", "localhost")
     os.Setenv("MQTT_PORT", "1883")
     os.Setenv("MQTT_USERNAME", "test")
     os.Setenv("MQTT_PASSWORD", "")
     os.Setenv("MQTT_CLIENT_PREFIX", "suite-test")
     os.Setenv("ENABLE_MODULE_REPLICAS_SYNC_WITH_BASE", "true")
+
+    // Store env map for cleanup in AfterSuite
+    testContextKey := "original_env"
+    GinkgoT().Setenv(testContextKey, originalEnv)
 })

 var _ = AfterSuite(func() {
     By("tearing down the suite environment")
+    // Restore original env values
+    if originalEnv, ok := GinkgoT().Env(testContextKey).(map[string]string); ok {
+        for k, v := range originalEnv {
+            if v == "" {
+                os.Unsetenv(k)
+            } else {
+                os.Setenv(k, v)
+            }
+        }
+    }
     mqttServer.Close()
     cancel()
     testEnv.Stop()
     time.Sleep(15 * time.Second)
 })
suite/mqtt/mock_mqtt_base.go (2)

85-90: Consider goroutine cleanup.

The heart beat upload goroutine might continue running even after the MQTT client is disconnected. Consider adding a cleanup mechanism.

 // Start a new goroutine to upload node heart beat data every 10 seconds
-go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
+heartbeatCtx, cancel := context.WithCancel(ctx)
+defer cancel()
+go utils.TimedTaskWithInterval(heartbeatCtx, time.Second*10, func(ctx context.Context) {
     log.G(ctx).Info("upload node heart beat data from node ", b.BaseMetadata.Identity)
     b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/heart", b.Env, b.BaseMetadata.Identity), 1, b.getHeartBeatMsg())
 })

Line range hint 289-321: Consider validating state transitions.

The SetBizState method allows any state transition without validation. Consider implementing a state machine to ensure only valid transitions are allowed.

+var validStateTransitions = map[string][]string{
+    "RESOLVED": {"ACTIVATED", "FAILED"},
+    "ACTIVATED": {"DEACTIVATED", "FAILED"},
+    "DEACTIVATED": {"RESOLVED"},
+    "FAILED": {"RESOLVED"},
+}

 func (b *MockMQTTBase) SetBizState(bizIdentity, state, reason, message string) {
     b.Lock()
     defer b.Unlock()
     info := b.BizInfos[bizIdentity]
+    if validStates, exists := validStateTransitions[info.BizState]; exists {
+        isValidTransition := false
+        for _, validState := range validStates {
+            if state == validState {
+                isValidTransition = true
+                break
+            }
+        }
+        if !isValidTransition {
+            log.G(context.Background()).Errorf("Invalid state transition from %s to %s", info.BizState, state)
+            return
+        }
+    }
     info.BizState = state
     // ... rest of the method
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1741b44 and 2ee99be.

📒 Files selected for processing (5)
  • common/model/model.go (1 hunks)
  • suite/http/mock_http_base.go (11 hunks)
  • suite/http/suite_test.go (1 hunks)
  • suite/mqtt/mock_mqtt_base.go (11 hunks)
  • suite/mqtt/suite_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • common/model/model.go
  • suite/http/suite_test.go
🔇 Additional comments (8)
suite/mqtt/suite_test.go (2)

104-108: LGTM!

The module deployment controller setup follows best practices with proper error handling and context usage.


111-119: Reusing existing review comment

The previous review comment about improving error handling and removing the unnecessary else block is still applicable.

suite/http/mock_http_base.go (5)

24-24: LGTM: Metadata structure changes look good!

The refactoring from model.Metadata to model.BaseMetadata and the addition of Identity and ClusterName fields are properly implemented in both the struct and constructor.

Also applies to: 33-46


140-148: LGTM: Well-structured heartbeat implementation

The periodic heartbeat implementation is well-structured with proper error handling and logging.


174-178: LGTM: Clean transition to BaseStatus

The migration from HeartBeatData to BaseStatus is properly implemented with all required fields.


197-199: LGTM: Identity field properly updated

The health message correctly uses the new Identity field from BaseMetadata.


291-293: LGTM: Clean utility function

The getBizIdentity function is well-focused and properly encapsulates the identity string formatting logic.

suite/mqtt/mock_mqtt_base.go (1)

21-26: LGTM! Well-structured refactoring of metadata handling.

The consolidation of metadata fields into BaseMetadata improves code organization and maintainability.

Also applies to: 32-40


func (b *MockHttpBase) Start(ctx context.Context) error {
func (b *MockHttpBase) Start(ctx context.Context, clientID string) error {
select {

💡 Codebase verification

clientID parameter is used in MQTT tunnel but missing in HTTP implementation

The clientID parameter is used in other implementations, particularly in the MQTT tunnel where it's used for client identification. The HTTP tunnel implementation (MockHttpBase) should also utilize this parameter for consistency.

Key findings:

  • In module_tunnels/koupleless_mqtt_tunnel/mqtt_tunnel.go, clientID is used for MQTT client configuration
  • In module_tunnels/koupleless_http_tunnel/http_tunnel.go, clientID is received but not utilized
  • The parameter is part of the tunnel interface contract as seen in both HTTP and MQTT implementations
🔗 Analysis chain

Verify the usage of clientID parameter

The clientID parameter is added to the method signature but not used within the implementation. This might indicate missing functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other usages of clientID to understand its purpose
rg -A 5 "clientID.*Start\(" 

Length of output: 29


Script:

#!/bin/bash
# Search for any usage of clientID in the codebase
rg -A 5 "clientID"

# Search for Start method implementations in interfaces
ast-grep --pattern 'type $_ interface {
  $$$
  Start($$$) $$$
  $$$
}'

Length of output: 9296

Comment on lines +262 to +265
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/health", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/biz", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/heart", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/queryBaseline", b.Env, b.BaseMetadata.Identity), 1, []byte(""))

⚠️ Potential issue

Check client state before publishing messages.

The SendInvalidMessage method publishes messages without verifying if the client is connected, which could lead to nil pointer dereference.

 func (b *MockMQTTBase) SendInvalidMessage() {
+    if b.client == nil {
+        log.G(context.Background()).Error("MQTT client is not initialized")
+        return
+    }
     b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/health", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
     b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/biz", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
     b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/heart", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
     b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/queryBaseline", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/health", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/biz", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/heart", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/queryBaseline", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
func (b *MockMQTTBase) SendInvalidMessage() {
if b.client == nil {
log.G(context.Background()).Error("MQTT client is not initialized")
return
}
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/health", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/biz", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/heart", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/queryBaseline", b.Env, b.BaseMetadata.Identity), 1, []byte(""))
}

@lvjing2 lvjing2 force-pushed the youji-dev branch 3 times, most recently from 9557ee0 to 078a6c6 Compare November 21, 2024 07:59
@coderabbitai coderabbitai bot left a comment

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 10

🧹 Outside diff range and nitpick comments (4)
suite/http/module_http_lifecycle_test.go (1)

19-21: Consider parameterizing test configuration

The hardcoded cluster name might limit test flexibility. Consider extracting test configuration into a separate structure or using test parameters for better maintainability and reusability.

-	clusterName := "test-cluster-name"
-	mockBase := NewMockHttpBase(nodeID, clusterName, "1.0.0", env, 1238)
+	type testConfig struct {
+		nodeID      string
+		clusterName string
+		version     string
+		env         string
+		port        int
+	}
+	cfg := testConfig{
+		nodeID:      "test-http-base",
+		clusterName: "test-cluster-name",
+		version:     "1.0.0",
+		env:         env,
+		port:        1238,
+	}
+	mockBase := NewMockHttpBase(cfg.nodeID, cfg.clusterName, cfg.version, cfg.env, cfg.port)
suite/http/suite_test.go (1)

122-161: Add documentation for the helper function.

The prepareModulePod function would benefit from documentation explaining its purpose and parameters:

+// prepareModulePod creates a Pod configured for module testing with appropriate labels and tolerations.
+// Parameters:
+//   name: The name of the pod
+//   namespace: The namespace where the pod will be created
+//   nodeName: The name of the node where the pod should be scheduled
 func prepareModulePod(name, namespace, nodeName string) v1.Pod {
suite/http/mock_http_base.go (1)

Line range hint 93-105: Enhance error handling in HTTP handlers

The handlers should return proper error responses when request body processing fails.

 if err != nil {
     logrus.Errorf("error reading body: %s", err)
+    w.WriteHeader(http.StatusBadRequest)
+    json.NewEncoder(w).Encode(map[string]string{
+        "code":    "ERROR",
+        "message": "Failed to read request body",
+    })
     return
 }

Also applies to: 112-124

suite/mqtt/mock_mqtt_base.go (1)

85-90: Consider implementing graceful shutdown for the timed task

The heartbeat task should be properly cleaned up when the context is canceled or the exit signal is received.

 go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
+    select {
+    case <-ctx.Done():
+        return
+    case <-b.exit:
+        return
+    default:
         log.G(ctx).Info("upload node heart beat data from node ", b.BaseMetadata.Identity)
         b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/heart", b.Env, b.BaseMetadata.Identity), 1, b.getHeartBeatMsg())
+    }
 })
🛑 Comments failed to post (10)
suite/http/suite_test.go (3)

105-109: 🛠️ Refactor suggestion

Improve error handling in goroutine.

The error handling in the goroutine could lead to test failures being missed or unclear failure messages:

 	go func() {
+		defer GinkgoRecover()
 		err = k8sManager.Start(ctx)
-		log.G(ctx).WithError(err).Error("k8sManager Start")
-		Expect(err).ToNot(HaveOccurred())
+		Expect(err).ToNot(HaveOccurred(), "failed to start manager")
 	}()
📝 Committable suggestion


	go func() {
		defer GinkgoRecover()
		err = k8sManager.Start(ctx)
		Expect(err).ToNot(HaveOccurred(), "failed to start manager")
	}()

84-90: 🛠️ Refactor suggestion

Improve error handling for tunnel start.

The current error handling uses a panic, which is not ideal for test setup. Consider using the Ginkgo assertion framework consistently:

-	if err != nil {
-		log.G(ctx).WithError(err).Error("failed to start tunnel", httpTunnel.Key())
-		panic(fmt.Sprintf("failed to start tunnel %s", httpTunnel.Key()))
-	} else {
-		log.G(ctx).Info("Tunnel started: ", httpTunnel.Key())
-	}
+	Expect(err).ToNot(HaveOccurred(), fmt.Sprintf("failed to start tunnel %s", httpTunnel.Key()))
+	log.G(ctx).Info("Tunnel started: ", httpTunnel.Key())
📝 Committable suggestion


	err = httpTunnel.Start(ctx, clientID, env)
	Expect(err).ToNot(HaveOccurred(), fmt.Sprintf("failed to start tunnel %s", httpTunnel.Key()))
	log.G(ctx).Info("Tunnel started: ", httpTunnel.Key())

114-120: 🛠️ Refactor suggestion

Replace hardcoded sleep with proper synchronization.

Using a fixed sleep duration of 15 seconds is fragile and could make tests flaky. Consider using proper synchronization mechanisms:

var _ = AfterSuite(func() {
	By("tearing down the suite environment")
	cancel()
+	// Create a channel to signal completion
+	done := make(chan struct{})
+	go func() {
+		defer close(done)
		testEnv.Stop()
+	}()
-	time.Sleep(15 * time.Second)
+	// Wait with timeout
+	select {
+	case <-done:
+		log.G(ctx).Info("suite for http stopped!")
+	case <-time.After(30 * time.Second):
+		log.G(ctx).Error("timeout waiting for suite to stop")
+	}
-	log.G(ctx).Info("suite for http stopped!")
})
📝 Committable suggestion


var _ = AfterSuite(func() {
	By("tearing down the suite environment")
	cancel()
	// Create a channel to signal completion
	done := make(chan struct{})
	go func() {
		defer close(done)
		testEnv.Stop()
	}()
	// Wait with timeout
	select {
	case <-done:
		log.G(ctx).Info("suite for http stopped!")
	case <-time.After(30 * time.Second):
		log.G(ctx).Error("timeout waiting for suite to stop")
	}
})
suite/mqtt/suite_test.go (3)

113-119: 🛠️ Refactor suggestion

Simplify error handling and improve error messages

The error handling can be simplified and made more informative.

Apply this diff to improve the implementation:

 err = mqttTunnel.Start(ctx, clientID, env)
 if err != nil {
-    log.G(ctx).WithError(err).Error("failed to start tunnel", mqttTunnel.Key())
-    panic(fmt.Sprintf("failed to start tunnel %s", mqttTunnel.Key()))
-} else {
-    log.G(ctx).Info("Tunnel started: ", mqttTunnel.Key())
+    panic(fmt.Sprintf("failed to start tunnel %s: %v", mqttTunnel.Key(), err))
 }
+log.G(ctx).Info("Tunnel started: ", mqttTunnel.Key())
📝 Committable suggestion


	err = mqttTunnel.Start(ctx, clientID, env)
	if err != nil {
		panic(fmt.Sprintf("failed to start tunnel %s: %v", mqttTunnel.Key(), err))
	}
	log.G(ctx).Info("Tunnel started: ", mqttTunnel.Key())

88-94: 🛠️ Refactor suggestion

Enhance MQTT server goroutine with proper context handling and error propagation

The current server startup implementation could be improved for better error handling and graceful shutdown.

Apply this diff to enhance the implementation:

+serverErrCh := make(chan error, 1)
 go func() {
+    serverCtx, serverCancel := context.WithCancel(ctx)
+    defer serverCancel()
-    err := mqttServer.Serve()
+    err := mqttServer.Serve(serverCtx)
     if err != nil {
         log.G(ctx).Error("failed to start mqtt server")
-        panic(err)
+        serverErrCh <- err
+        return
     }
+    serverErrCh <- nil
 }()
+
+// Wait for server startup status
+if err := <-serverErrCh; err != nil {
+    panic(fmt.Sprintf("MQTT server failed to start: %v", err))
+}
📝 Committable suggestion


	serverErrCh := make(chan error, 1)
	go func() {
		serverCtx, serverCancel := context.WithCancel(ctx)
		defer serverCancel()
		err := mqttServer.Serve(serverCtx)
		if err != nil {
			log.G(ctx).Error("failed to start mqtt server")
			serverErrCh <- err
			return
		}
		serverErrCh <- nil
	}()

	// Wait for server startup status
	if err := <-serverErrCh; err != nil {
		panic(fmt.Sprintf("MQTT server failed to start: %v", err))
	}

146-149: 💡 Codebase verification

Arbitrary sleep duration found in multiple test suites

Based on the initial search results, the same 15-second sleep pattern appears in both HTTP and MQTT test suites after testEnv.Stop(). This consistent usage across test suites suggests it's a deliberate choice, but it's still not an ideal practice as:

  • Fixed sleep durations can make tests flaky or unnecessarily slow
  • It may hide race conditions or improper cleanup
  • There's no documentation explaining the necessity of this specific duration

Consider:

  • Implementing proper shutdown synchronization using channels or wait groups
  • If the sleep is absolutely necessary, add comments explaining why 15 seconds specifically is required
  • Consider extracting the duration to a named constant with documentation
🔗 Analysis chain

Review the necessity of arbitrary sleep duration

The 15-second sleep after testEnv.Stop() seems arbitrary and could mask potential cleanup issues. Consider:

  1. Implementing proper cleanup synchronization
  2. Using wait groups or channels to ensure proper teardown
  3. Adding comments explaining why this specific duration is needed

Let's check if this sleep is used in other test suites:


Let me search for any related test environment setup or configuration that might explain this sleep duration.


Let me try to find the test environment implementation to understand its shutdown behavior.


Let me try one last search to find any test-related environment setup or teardown patterns.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for similar sleep patterns in test files
rg -A 2 -B 2 'testEnv.Stop\(\)' --type go

Length of output: 558


Script:

#!/bin/bash
# Search for test environment setup and configuration
rg -A 5 -B 5 'NewTestEnvironment' --type go

Length of output: 43


Script:

#!/bin/bash
# Search for test environment implementation
fd -e go -x rg -l "type TestEnvironment" {}

Length of output: 47


Script:

#!/bin/bash
# Look for test environment related code and setup/teardown patterns
rg -A 5 "func TestMain|func SetupTest|func BeforeSuite|func AfterSuite" --type go

Length of output: 83

suite/http/mock_http_base.go (1)

131-139: 🛠️ Refactor suggestion

Improve heartbeat implementation

Several improvements can be made to the heartbeat implementation:

  1. Make the heartbeat URL and interval configurable
  2. Handle context cancellation properly
  3. Extract heartbeat sending logic to reduce duplication
+const (
+    defaultHeartbeatURL      = "http://127.0.0.1:7777/heartbeat"
+    defaultHeartbeatInterval = time.Second * 10
+)
+
+func (b *MockHttpBase) sendHeartbeat(ctx context.Context) error {
+    _, err := http.Post(defaultHeartbeatURL, "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
+    if err != nil {
+        return fmt.Errorf("error calling heartbeat: %w", err)
+    }
+    return nil
+}
+
-go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
-    log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)
-
-    _, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
-    if err != nil {
-        logrus.Errorf("error calling heartbeat: %s", err)
-    }
-})
+go func() {
+    ticker := time.NewTicker(defaultHeartbeatInterval)
+    defer ticker.Stop()
+    
+    for {
+        select {
+        case <-ctx.Done():
+            return
+        case <-ticker.C:
+            log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)
+            if err := b.sendHeartbeat(ctx); err != nil {
+                logrus.Error(err)
+            }
+        }
+    }
+}()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

	const (
		defaultHeartbeatURL      = "http://127.0.0.1:7777/heartbeat"
		defaultHeartbeatInterval = time.Second * 10
	)

	func (b *MockHttpBase) sendHeartbeat(ctx context.Context) error {
		_, err := http.Post(defaultHeartbeatURL, "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
		if err != nil {
			return fmt.Errorf("error calling heartbeat: %w", err)
		}
		return nil
	}

	go func() {
		ticker := time.NewTicker(defaultHeartbeatInterval)
		defer ticker.Stop()
		
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)
				if err := b.sendHeartbeat(ctx); err != nil {
					logrus.Error(err)
				}
			}
		}
	}()
suite/mqtt/mock_mqtt_base.go (1)

213-213: 🛠️ Refactor suggestion

Implement a safe publish wrapper function

Multiple MQTT publish operations throughout the file lack error handling and client state verification. Consider implementing a wrapper function to centralize these safety checks.

+func (b *MockMQTTBase) safePub(topic string, qos byte, payload []byte) error {
+    if b.client == nil {
+        return fmt.Errorf("MQTT client is not initialized")
+    }
+    return b.client.Pub(topic, qos, payload)
+}

 func (b *MockMQTTBase) processHealth() {
-    b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/health", b.Env, b.BaseMetadata.Identity), 1, b.getHealthMsg())
+    if err := b.safePub(fmt.Sprintf("koupleless_%s/%s/base/health", b.Env, b.BaseMetadata.Identity), 1, b.getHealthMsg()); err != nil {
+        log.G(context.Background()).Errorf("Failed to publish health message: %v", err)
+    }
 }

This pattern should be applied to all publish operations in the file.

Also applies to: 258-258, 262-265, 269-272, 276-277, 285-285, 320-320, 342-342, 346-346

suite/http/base_http_lifecycle_test.go (2)

20-20: ⚠️ Potential issue

Undefined variable env in NewMockHttpBase initialization

The variable env is used in the NewMockHttpBase initialization but is not defined in this context. Please ensure that env is properly defined or imported before use.


30-30: ⚠️ Potential issue

Undefined variable env in utils.FormatNodeName function calls

The variable env is used in multiple calls to utils.FormatNodeName, but env is not defined in this context. Please ensure that env is properly defined or imported before use.

Also applies to: 48-48, 61-61, 79-79, 94-94, 114-114


@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
suite/mqtt/suite_test.go (3)

111-119: Optimize error handling and logging in tunnel setup

Consider these improvements:

  1. Use fmt.Errorf for error wrapping
  2. Combine the log and panic into a single error handling block
 err = mqttTunnel.Start(ctx, clientID, env)
 if err != nil {
-    log.G(ctx).WithError(err).Error("failed to start tunnel", mqttTunnel.Key())
-    panic(fmt.Sprintf("failed to start tunnel %s", mqttTunnel.Key()))
+    err = fmt.Errorf("failed to start tunnel %s: %w", mqttTunnel.Key(), err)
+    log.G(ctx).Error(err)
+    panic(err)
 } else {
     log.G(ctx).Info("Tunnel started: ", mqttTunnel.Key())
 }

121-126: Consider parameterizing VNode worker count

The VNodeWorkerNum is hardcoded to 4. Consider making this configurable through environment variables or test parameters for better flexibility in testing different scenarios.

+const defaultVNodeWorkerNum = 4
+
 vnodeController, err := vnode_controller.NewVNodeController(&model.BuildVNodeControllerConfig{
     ClientID:       clientID,
     Env:            env,
     VPodIdentity:   model2.ComponentModule,
-    VNodeWorkerNum: 4,
+    VNodeWorkerNum: getEnvInt("VNODE_WORKER_NUM", defaultVNodeWorkerNum),
 }, &mqttTunnel)

146-149: Document the purpose of sleep in teardown

A 15-second sleep has been added after stopping the test environment. Please add a comment explaining why this delay is necessary to help future maintainers understand its purpose.

 cancel()
 testEnv.Stop()
+// Sleep to ensure all resources are properly cleaned up
+// TODO: Replace with proper resource cleanup confirmation
 time.Sleep(15 * time.Second)
 log.G(ctx).Info("suite for mqtt stopped!")
🛑 Comments failed to post (1)
suite/mqtt/suite_test.go (1)

135-137: 🛠️ Refactor suggestion

Improve error handling in manager start

The current implementation logs the error after it occurs but before the assertion. This could lead to unclear test failures.

 go func() {
     err = k8sManager.Start(ctx)
-    log.G(ctx).WithError(err).Error("k8sManager Start")
-    Expect(err).ToNot(HaveOccurred())
+    if err != nil {
+        log.G(ctx).WithError(err).Error("k8sManager Start failed")
+        Expect(err).ToNot(HaveOccurred())
+    }
 }()

Committable suggestion skipped: line range outside the PR's diff.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (2)
suite/mqtt/suite_test.go (2)

108-112: Consider using a separate context for setup operations

The shared context ctx could be cancelled before the manager starts. Consider using a separate context with timeout for setup operations:

+setupCtx, setupCancel := context.WithTimeout(context.Background(), 30*time.Second)
+defer setupCancel()
-err = moduleDeploymentController.SetupWithManager(ctx, k8sManager)
+err = moduleDeploymentController.SetupWithManager(setupCtx, k8sManager)

129-130: Consider making VNodeWorkerNum configurable

The worker number is hardcoded to 4. Consider making it configurable via environment variable for better flexibility in different environments:

+workerNum := 4
+if envNum := os.Getenv("VNODE_WORKER_NUM"); envNum != "" {
+    if num, err := strconv.Atoi(envNum); err == nil && num > 0 {
+        workerNum = num
+    }
+}
 VNodeWorkerNum: workerNum,
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 078a6c6 and 38393c1.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • go.mod (1 hunks)
  • suite/http/base_http_lifecycle_test.go (1 hunks)
  • suite/http/mock_http_base.go (12 hunks)
  • suite/http/suite_test.go (1 hunks)
  • suite/mqtt/module_deployment_controller_suite_test.go (10 hunks)
  • suite/mqtt/suite_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • go.mod
  • suite/http/base_http_lifecycle_test.go
  • suite/http/suite_test.go
🔇 Additional comments (10)
suite/mqtt/suite_test.go (1)

89-95: Previous review comment about MQTT server error handling is still applicable

The concerns raised in the previous review about improving MQTT server error handling have not been addressed.

suite/http/mock_http_base.go (4)

33-46: LGTM: Constructor changes align with new metadata structure

The constructor properly initializes the new BaseMetadata structure and reachable field. The changes reflect the broader restructuring of metadata handling across the codebase.


57-57: Implement clientID parameter usage

The clientID parameter is added but not utilized in the implementation. This parameter is used in other tunnel implementations (e.g., MQTT) for client identification.


161-165: LGTM: Improved heartbeat message structure

The update to use BaseStatus with proper field initialization improves the code structure and type safety.


278-280: LGTM: Well-structured helper function

The getBizIdentity helper function improves code organization and reduces duplication.

suite/mqtt/module_deployment_controller_suite_test.go (5)

Line range hint 1-24: LGTM! Package organization and initialization improvements.

The changes improve code organization and reduce hardcoding by:

  • Moving to a more specific package name
  • Introducing a configurable clusterName variable
  • Consistently using the cluster name in mock initializations

208-217: Remove redundant test case.

This test case appears to be redundant with the subsequent test "mock base 2 exit and replicas should be 1 finally". Both test cases verify the same behavior.


165-165: Review test stability and timing.

The increased timeout durations (20s -> 30s) might indicate potential test stability issues:

  1. Consider investigating why longer timeouts are needed
  2. Look for potential race conditions or slow operations
  3. Consider adding logging to track operation durations

Also applies to: 177-177, 205-205, 241-241

✅ Verification successful

The output shows a consistent pattern of timeouts and timing-related operations across test files. Let me analyze the specific operations to understand if the increased timeouts are justified.


The longer timeouts are justified and appropriate for the operations being tested

The timeouts align with the nature of operations being tested:

  • 50s timeouts are used for node readiness checks and cleanup verification
  • 30-40s timeouts are used for pod lifecycle operations (running state, termination)
  • 20s timeouts are used for simpler operations like checking mock states

The increased timeouts are not indicative of test stability issues, but rather reflect the expected duration of Kubernetes operations:

  • Node condition changes
  • Pod lifecycle state transitions
  • Resource cleanup and garbage collection

The sleep operations present are intentional and minimal:

  • 1s sleeps for test setup synchronization
  • 15s sleeps for suite cleanup
  • No arbitrary waits in the actual test assertions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for similar timeout patterns and potential stability issues

# Search for timeout patterns in tests
echo "Checking timeout patterns..."
rg "time\.Second\*\d+" --type go -A 1 | grep -E "Second\*|Should"

# Look for other timing-related patterns
echo "Checking for potential timing issues..."
rg "Eventually|Consistently" --type go -A 2

Length of output: 11184


Script:

#!/bin/bash
# Check the context of operations with longer timeouts (30s and above)
echo "Checking operations with longer timeouts..."
rg -B 5 "time\.Second\*(30|40|50)" --type go

# Check if there are any sleep operations that might indicate timing issues
echo "Checking for sleep operations..."
rg "time\.Sleep" --type go -A 2

Length of output: 10015


Line range hint 117-129: Verify the cluster name suffix pattern.

The use of clusterName + "-2" for the second deployment's node selector appears to be a hardcoded pattern. Consider if this should be more configurable or documented.

✅ Verification successful

Let me gather more context about the cluster name usage pattern.


Let me check the test context to understand if this is a test-specific pattern.


The cluster name suffix pattern is test-specific and not a concern

The suffix pattern clusterName + "-2" appears only in test files and is used specifically for testing multiple deployment scenarios. This is a valid test setup pattern where:

  • The first deployment uses the base cluster name
  • The second deployment uses a "-2" suffix to simulate a different target cluster
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for similar cluster name suffix patterns

# Search for hardcoded cluster name suffixes
echo "Checking cluster name suffix patterns..."
rg -A 2 "clusterName \+.*\"" --type go

Length of output: 397


Script:

#!/bin/bash
# Search for cluster name related patterns in test files
echo "Checking cluster name patterns in test files..."
rg -B 5 -A 2 "LabelKeyOfVNodeClusterName.*clusterName" --type go

# Search for actual cluster name label key definition
echo -e "\nChecking cluster name label key definition..."
rg "LabelKeyOfVNodeClusterName.*=.*" --type go

# Search for other cluster name usages
echo -e "\nChecking other cluster name usages..."
rg "clusterName.*[-+]" --type go

Length of output: 3245


Script:

#!/bin/bash
# Get more context around the test file
echo "Checking the test context..."
rg -B 10 -A 10 "test-cluster-name" suite/mqtt/module_deployment_controller_suite_test.go

# Check if there are any deployment-related constants or configurations
echo -e "\nChecking for deployment configurations..."
rg -l "type.*Deployment" --type go

Length of output: 1111


Line range hint 55-67: Verify consistency of deployment configuration changes.

The changes to image naming convention and node selection strategy should be verified across the codebase to ensure consistent implementation.

✅ Verification successful

Let me generate additional verification scripts to check the test files for a complete picture of the changes.


Let me try one more verification script with corrected regex to check the NodeSelector patterns.


Changes to deployment configuration are consistent with new strategy

Based on the verification results:

  1. The .jar suffix is consistently used across all test files and deployment configurations
  2. The LabelKeyOfVNodeName has been completely removed from the controller and test files
  3. LabelKeyOfVNodeClusterName is consistently used in both test configurations and the controller implementation
  4. The node selection strategy consistently uses tech stack, version, and cluster name labels
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistency of image naming and node selection changes

# Check for jar suffix usage in deployment configurations
echo "Checking image naming convention..."
rg -l "Image:.+\.jar" --type yaml --type go

# Check for consistent node selector label usage
echo "Checking node selector label usage..."
rg "LabelKeyOfVNodeName" --type go
rg "LabelKeyOfVNodeClusterName" --type go

Length of output: 1444


Script:

#!/bin/bash
# Check test files for deployment configurations
echo "Checking test files for deployment configurations..."
rg -A 5 "Image:.+\.jar" suite/mqtt/module_deployment_controller_suite_test.go suite/mqtt/suite_test.go suite/http/suite_test.go common/utils/utils_test.go

# Check for any remaining references to VNodeName in deployment configurations
echo "Checking for VNodeName references in deployment configurations..."
rg -A 5 "NodeSelector.*{" suite/mqtt/module_deployment_controller_suite_test.go

Length of output: 3301


Script:

#!/bin/bash
# Check NodeSelector patterns in test files
echo "Checking NodeSelector patterns..."
rg -B 2 -A 5 "NodeSelector:" suite/mqtt/module_deployment_controller_suite_test.go

# Check for any VNodeName references in the controller
echo "Checking VNodeName usage in controller..."
rg "VNodeName" controller/module_deployment_controller/module_deployment_controller.go

Length of output: 857


go func() {
	err = k8sManager.Start(ctx)
	log.G(ctx).WithError(err).Error("k8sManager Start")

🛠️ Refactor suggestion

Improve cleanup process

Several improvements can be made to the cleanup process:

  1. Remove redundant error logging before assertion
  2. Make sleep duration configurable
  3. Add graceful shutdown attempt
-log.G(ctx).WithError(err).Error("k8sManager Start")
 Expect(err).ToNot(HaveOccurred())

+const defaultCleanupTimeout = 15 * time.Second
+
+func getCleanupTimeout() time.Duration {
+    if val := os.Getenv("CLEANUP_TIMEOUT"); val != "" {
+        if d, err := time.ParseDuration(val); err == nil {
+            return d
+        }
+    }
+    return defaultCleanupTimeout
+}
+
 var _ = AfterSuite(func() {
     By("tearing down the suite environment")
+    shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), getCleanupTimeout())
+    defer shutdownCancel()
+    
+    // Attempt graceful shutdown of MQTT server
+    if err := mqttServer.Close(); err != nil {
+        log.G(ctx).WithError(err).Warn("Error during MQTT server shutdown")
+    }
-    mqttServer.Close()
     cancel()
+    
+    // Wait for resources to clean up
+    <-shutdownCtx.Done()
-    time.Sleep(15 * time.Second)
     testEnv.Stop()
     log.G(ctx).Info("suite for mqtt stopped!")
 })

Also applies to: 150-153

Comment on lines +115 to +123
mqttTunnel = koupleless_mqtt_tunnel.NewMqttTunnel(env, k8sManager.GetClient(), moduleDeploymentController)

err = mqttTunnel.Start(ctx, clientID, env)
if err != nil {
	log.G(ctx).WithError(err).Error("failed to start tunnel", mqttTunnel.Key())
	panic(fmt.Sprintf("failed to start tunnel %s", mqttTunnel.Key()))
} else {
	log.G(ctx).Info("Tunnel started: ", mqttTunnel.Key())
}

🛠️ Refactor suggestion

Improve tunnel initialization error handling

The current implementation has several issues:

  1. Redundant logging of tunnel key in error case
  2. No timeout on tunnel start operation
  3. Inconsistent error handling pattern

Consider this improvement:

+startCtx, startCancel := context.WithTimeout(ctx, 30*time.Second)
+defer startCancel()
-err = mqttTunnel.Start(ctx, clientID, env)
+err = mqttTunnel.Start(startCtx, clientID, env)
 if err != nil {
-    log.G(ctx).WithError(err).Error("failed to start tunnel", mqttTunnel.Key())
-    panic(fmt.Sprintf("failed to start tunnel %s", mqttTunnel.Key()))
+    return fmt.Errorf("failed to start tunnel %s: %w", mqttTunnel.Key(), err)
 }
-else {
-    log.G(ctx).Info("Tunnel started: ", mqttTunnel.Key())
-}
+log.G(ctx).Info("Tunnel started: ", mqttTunnel.Key())

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +69 to +73
 if b.reachable {
 	w.WriteHeader(http.StatusOK)
 	w.Header().Set("Content-Type", "application/json")
 	w.Write(b.getHealthMsg())
-default:
+} else {

⚠️ Potential issue

Fix HTTP response header order and improve error handling

  1. Response headers must be set before calling WriteHeader
  2. Error responses should return proper error status codes and messages

Example fix for the health endpoint (apply similar pattern to other handlers):

 mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
+    w.Header().Set("Content-Type", "application/json")
     if b.reachable {
         w.WriteHeader(http.StatusOK)
-        w.Header().Set("Content-Type", "application/json")
         w.Write(b.getHealthMsg())
     } else {
+        http.Error(w, `{"error":"service unavailable"}`, http.StatusServiceUnavailable)
-        w.WriteHeader(http.StatusBadGateway)
     }
 })

Also applies to: 79-83, 89-101, 108-120


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (5)
suite/http/base_http_lifecycle_test.go (1)

24-25: Replace time.Sleep with more reliable test synchronization

Using time.Sleep for test synchronization can make tests flaky. Consider using channels, context cancellation, or test framework features for better synchronization.

Example approach using a channel:

done := make(chan struct{})
go func() {
    mockHttpBase.Start(ctx, clientID)
    close(done)
}()
<-done

Also applies to: 55-56, 89-90

suite/mqtt/suite_test.go (2)

42-42: Consider adding timeout context for the entire test suite

While the cancel context is good for cleanup, consider wrapping it with a timeout to ensure the test suite doesn't hang indefinitely:

-var ctx, cancel = context.WithCancel(context.Background())
+var ctx, cancel = context.WithTimeout(context.Background(), 5*time.Minute)

102-113: Make metrics server port configurable

Consider making the metrics server port configurable through environment variables for better flexibility in different test environments:

+const defaultMetricsPort = "8080"
+
+func getMetricsPort() string {
+    if port := os.Getenv("METRICS_PORT"); port != "" {
+        return port
+    }
+    return defaultMetricsPort
+}

 k8sManager, err := ctrl.NewManager(cfg, ctrl.Options{
     Scheme: scheme.Scheme,
     Metrics: metricsserver.Options{
-        BindAddress: ":8080",
+        BindAddress: ":" + getMetricsPort(),
     },
 })
suite/http/mock_http_base.go (1)

278-280: Consider adding input validation to getBizIdentity

The function should validate that neither BizName nor BizVersion is empty to prevent malformed identities.

 func getBizIdentity(bizModel ark.BizModel) string {
+    if bizModel.BizName == "" || bizModel.BizVersion == "" {
+        return ""
+    }
     return fmt.Sprintf("%s:%s", bizModel.BizName, bizModel.BizVersion)
 }
suite/mqtt/module_deployment_controller_suite_test.go (1)

205-206: Review test timeout increases.

The timeout increases (20s → 30s) might indicate potential test flakiness. Consider:

  1. Investigating why longer timeouts are needed
  2. Adding logging to track operation durations
  3. Implementing retry mechanisms instead of longer timeouts

Also applies to: 241-241

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 38393c1 and fab46bd.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
  • go.mod (1 hunks)
  • suite/http/base_http_lifecycle_test.go (1 hunks)
  • suite/http/mock_http_base.go (12 hunks)
  • suite/http/module_http_lifecycle_test.go (3 hunks)
  • suite/http/suite_test.go (1 hunks)
  • suite/mqtt/module_deployment_controller_suite_test.go (10 hunks)
  • suite/mqtt/suite_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • go.mod
  • suite/http/module_http_lifecycle_test.go
  • suite/http/suite_test.go
🔇 Additional comments (12)
suite/http/base_http_lifecycle_test.go (1)

1-12: LGTM! Imports are well-organized and appropriate.

suite/mqtt/suite_test.go (4)

Line range hint 1-31: LGTM: Package name and imports reflect focused MQTT testing

The package rename from 'suite' to 'mqtt' better represents the test suite's specific focus. Import changes align well with the codebase restructuring.


89-95: Refer to existing review comment about MQTT server error handling

This code segment has known issues that were already addressed in a previous review comment. Please refer to the existing comment about improving MQTT server error handling with proper channels and context management.


115-123: Refer to existing review comment about tunnel initialization

This code segment has known issues that were already addressed in a previous review comment. Please refer to the existing comment about improving tunnel initialization with proper timeouts and error handling.


150-153: Refer to existing review comment about cleanup process

This code segment has known issues that were already addressed in a previous review comment. Please refer to the existing comment about improving the cleanup process with configurable timeouts and graceful shutdown.

suite/http/mock_http_base.go (4)

57-57: Implement clientID parameter usage

The clientID parameter is added but not utilized in the implementation. This parameter is used in other implementations (e.g., MQTT tunnel) for client identification.


69-73: Fix HTTP response header order

Response headers must be set before calling WriteHeader.


247-247: Add error handling for JSON marshaling

Both install and uninstall responses ignore potential JSON marshaling errors.

Also applies to: 272-272


150-151: Consider improving the shutdown sequence

The 2-second sleep before shutdown could make tests slower and less reliable.

suite/mqtt/module_deployment_controller_suite_test.go (3)

1-1: LGTM! Package organization and initialization changes look good.

The changes improve code maintainability by:

  • Moving to a more specific package name
  • Centralizing cluster name configuration
  • Consistent mock initialization pattern

Also applies to: 21-24


208-217: Remove redundant test case.

This test case appears to be redundant with the subsequent test "mock base 2 exit and replicas should be 1 finally". Consider removing this test or clarifying its unique purpose.


55-55: Verify image naming convention and cluster name configuration.

While the changes are logically consistent, please verify:

  1. The .jar suffix in image names aligns with your container registry conventions
  2. The cluster name differentiation pattern (appending "-2") is the intended approach

Also applies to: 65-67, 117-117, 127-129

✅ Verification successful

Image naming convention and cluster naming pattern are consistently used in test files

The verification shows that:

  1. The .jar suffix is consistently used across all test files for container images, appearing in test suites and utility tests
  2. The cluster name differentiation pattern (appending "-2") is only used in test scenarios and appears to be intentional for testing multiple deployments
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for consistent image naming patterns across the codebase
rg -g '*.{yaml,yml,go}' '\.jar"' -A 2 -B 2

# Check for cluster name patterns
rg -g '*.go' 'clusterName.*-2"'

Length of output: 3533

Comment on lines +23 to +41
	It("base should become a ready vnode eventually", func() {
		time.Sleep(time.Second)

		go mockHttpBase.Start(ctx, clientID)
		Eventually(func() bool {
			vnode := &v1.Node{}
			err := k8sClient.Get(ctx, types.NamespacedName{
				Name: utils.FormatNodeName(httpNodeID, env),
			}, vnode)
			vnodeReady := false
			for _, cond := range vnode.Status.Conditions {
				if cond.Type == v1.NodeReady {
					vnodeReady = cond.Status == v1.ConditionTrue
					break
				}
			}
			return err == nil && vnodeReady
		}, time.Second*50, time.Second).Should(BeTrue())
	})

🛠️ Refactor suggestion

Refactor duplicate node readiness check logic

The node readiness check logic is duplicated across multiple test cases. Consider extracting it into a helper function for better maintainability.

Create a helper function:

func waitForNodeReady(ctx context.Context, k8sClient client.Client, nodeName string) bool {
    vnode := &v1.Node{}
    err := k8sClient.Get(ctx, types.NamespacedName{
        Name: nodeName,
    }, vnode)
    if err != nil {
        return false
    }
    for _, cond := range vnode.Status.Conditions {
        if cond.Type == v1.NodeReady {
            return cond.Status == v1.ConditionTrue
        }
    }
    return false
}

Then use it in the tests:

Eventually(func() bool {
    return waitForNodeReady(ctx, k8sClient, utils.FormatNodeName(httpNodeID, env))
}, time.Second*50, time.Second).Should(BeTrue())

Also applies to: 54-72, 88-106

Comment on lines +22 to +121
Context("http base online and deactive finally", func() {
	It("base should become a ready vnode eventually", func() {
		time.Sleep(time.Second)

		go mockHttpBase.Start(ctx, clientID)
		Eventually(func() bool {
			vnode := &v1.Node{}
			err := k8sClient.Get(ctx, types.NamespacedName{
				Name: utils.FormatNodeName(httpNodeID, env),
			}, vnode)
			vnodeReady := false
			for _, cond := range vnode.Status.Conditions {
				if cond.Type == v1.NodeReady {
					vnodeReady = cond.Status == v1.ConditionTrue
					break
				}
			}
			return err == nil && vnodeReady
		}, time.Second*50, time.Second).Should(BeTrue())
	})

	It("base offline with deactive message and finally exit", func() {
		mockHttpBase.Exit()
		Eventually(func() bool {
			vnode := &v1.Node{}
			err := k8sClient.Get(ctx, types.NamespacedName{
				Name: utils.FormatNodeName(httpNodeID, env),
			}, vnode)
			return errors.IsNotFound(err)
		}, time.Second*50, time.Second).Should(BeTrue())
	})

	It("base should become a ready vnode eventually", func() {
		time.Sleep(time.Second)

		go mockHttpBase.Start(ctx, clientID)
		vnode := &v1.Node{}
		Eventually(func() bool {
			err := k8sClient.Get(ctx, types.NamespacedName{
				Name: utils.FormatNodeName(httpNodeID, env),
			}, vnode)
			vnodeReady := false
			for _, cond := range vnode.Status.Conditions {
				if cond.Type == v1.NodeReady {
					vnodeReady = cond.Status == v1.ConditionTrue
					break
				}
			}
			return err == nil && vnodeReady
		}, time.Second*50, time.Second).Should(BeTrue())
	})

	It("base unreachable finally exit", func() {
		mockHttpBase.reachable = false
		Eventually(func() bool {
			vnode := &v1.Node{}
			err := k8sClient.Get(ctx, types.NamespacedName{
				Name: utils.FormatNodeName(httpNodeID, env),
			}, vnode)
			return errors.IsNotFound(err)
		}, time.Second*50, time.Second).Should(BeTrue())

		mockHttpBase.Exit()
	})
})

It("base should become a ready vnode eventually", func() {
	time.Sleep(time.Second)
	mockHttpBase.reachable = true
	go mockHttpBase.Start(ctx, clientID)
	vnode := &v1.Node{}
	Eventually(func() bool {
		err := k8sClient.Get(ctx, types.NamespacedName{
			Name: utils.FormatNodeName(httpNodeID, env),
		}, vnode)
		vnodeReady := false
		for _, cond := range vnode.Status.Conditions {
			if cond.Type == v1.NodeReady {
				vnodeReady = cond.Status == v1.ConditionTrue
				break
			}
		}
		return err == nil && vnodeReady
	}, time.Second*50, time.Second).Should(BeTrue())
})

It("base id changed finally exit", func() {
	mockHttpBase.Exit()

	Eventually(func() bool {
		vnode := &v1.Node{}
		err := k8sClient.Get(ctx, types.NamespacedName{
			Name: utils.FormatNodeName(httpNodeID, env),
		}, vnode)
		return errors.IsNotFound(err)
	}, time.Second*50, time.Second).Should(BeTrue())

})

})

🛠️ Refactor suggestion

Add test cleanup to prevent state leakage

The tests modify shared state (mockHttpBase) but don't clean up between tests. This could lead to test interference.

Add cleanup using Ginkgo's BeforeEach and AfterEach:

var mockHttpBase *MockHttpBase

BeforeEach(func() {
    mockHttpBase = NewMockHttpBase(httpNodeID, clusterName, "1.0.0", env, 1238)
    mockHttpBase.reachable = true
})

AfterEach(func() {
    mockHttpBase.Exit()
    // Wait for node to be deleted
    Eventually(func() bool {
        vnode := &v1.Node{}
        err := k8sClient.Get(ctx, types.NamespacedName{
            Name: utils.FormatNodeName(httpNodeID, env),
        }, vnode)
        return errors.IsNotFound(err)
    }, time.Second*50, time.Second).Should(BeTrue())
})

Comment on lines +140 to 142
log.G(ctx).WithError(err).Error("k8sManager Start")
Expect(err).ToNot(HaveOccurred())
}()

🛠️ Refactor suggestion

Remove redundant error logging before assertion

The error logging before the assertion is redundant and could make test failures harder to debug. Consider removing it:

 go func() {
     err = k8sManager.Start(ctx)
-    log.G(ctx).WithError(err).Error("k8sManager Start")
     Expect(err).ToNot(HaveOccurred())
 }()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
log.G(ctx).WithError(err).Error("k8sManager Start")
Expect(err).ToNot(HaveOccurred())
}()
err = k8sManager.Start(ctx)
Expect(err).ToNot(HaveOccurred())
}()

Comment on lines +127 to +135
// Start a new goroutine to upload node heart beat data every 10 seconds
go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
	log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)

	_, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
	if err != nil {
		logrus.Errorf("error calling heartbeat: %s", err)
	}
})

⚠️ Potential issue

Remove duplicate heartbeat POST request

The heartbeat POST request is duplicated - once in the goroutine and once immediately after. This could cause unnecessary load.

Remove the duplicate POST request:

 // Start a new goroutine to upload node heart beat data every 10 seconds
 go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
     log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)
     _, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
     if err != nil {
         logrus.Errorf("error calling heartbeat: %s", err)
     }
 })

-// Remove duplicate POST request
-_, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
-if err != nil {
-    logrus.Errorf("error calling heartbeat: %s", err)
-    return err
-}

Also applies to: 136-141

Comment on lines 19 to 31

type MockHttpBase struct {
	sync.Mutex
	ID        string
	Env       string
	CurrState string
-	Metadata  model.Metadata
+	Metadata  model.BaseMetadata
	port      int
	BizInfos  map[string]ark.ArkBizInfo
	Baseline  []ark.BizModel

	exit      chan struct{}
-	reachable chan struct{}
+	reachable bool
}

⚠️ Potential issue

Consider adding mutex protection for the reachable field

The reachable field is accessed from multiple goroutines (HTTP handlers and heartbeat) but lacks mutex protection. While there's a mutex in the struct, it's not used for this field.

Add mutex protection when accessing the reachable field:

 func (b *MockHttpBase) setReachable(value bool) {
+    b.Lock()
+    defer b.Unlock()
     b.reachable = value
 }

 func (b *MockHttpBase) isReachable() bool {
+    b.Lock()
+    defer b.Unlock()
     return b.reachable
 }

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (3)
suite/http/base_http_lifecycle_test.go (1)

27-40: Optimize polling intervals in Eventually blocks

The current 50-second timeout (polled every second) is excessive for unit tests and could slow down the test suite significantly.

Consider reducing the timeout and polling interval:

-}, time.Second*50, time.Second).Should(BeTrue())
+}, time.Second*5, time.Millisecond*100).Should(BeTrue())

Also applies to: 59-71, 93-105

suite/mqtt/suite_test.go (1)

108-113: Enhance error context in controller setup

Consider providing more context in error handling to aid debugging:

 moduleDeploymentController, err := module_deployment_controller.NewModuleDeploymentController(env)
-Expect(err).ToNot(HaveOccurred())
+Expect(err).ToNot(HaveOccurred(), "failed to create module deployment controller")

 err = moduleDeploymentController.SetupWithManager(ctx, k8sManager)
-Expect(err).ToNot(HaveOccurred())
+Expect(err).ToNot(HaveOccurred(), "failed to setup module deployment controller with manager")
suite/mqtt/module_deployment_controller_suite_test.go (1)

205-206: Review test stability and timeout configuration.

The test timeouts have been increased:

  • From 10s to 20s for baseline query
  • From 20s to 30s for final replica check

This might indicate potential test stability issues. Consider:

  1. Investigating why longer timeouts are needed
  2. Adding logging to track operation durations
  3. Optimizing the underlying operations if possible

Also applies to: 241-241

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fab46bd and 6240598.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • go.mod (1 hunks)
  • suite/http/base_http_lifecycle_test.go (1 hunks)
  • suite/http/mock_http_base.go (12 hunks)
  • suite/http/suite_test.go (1 hunks)
  • suite/mqtt/module_deployment_controller_suite_test.go (10 hunks)
  • suite/mqtt/suite_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • go.mod
  • suite/http/suite_test.go
🔇 Additional comments (12)
suite/http/base_http_lifecycle_test.go (2)

16-20: Missing required variable definitions

The issues regarding undefined env and clientID variables have been previously reported and are still valid.


22-121: Test cleanup and code duplication issues

The previously reported issues regarding test cleanup and duplicate node readiness check logic are still valid.

suite/mqtt/suite_test.go (5)

Line range hint 1-31: LGTM: Package and imports look good

The package name change to 'mqtt' and import modifications align well with the module's functionality.


89-95: Refer to existing comment about MQTT server error handling

A previous review already suggested improvements for this section regarding proper error handling and context management.


115-123: Refer to existing comment about tunnel initialization

A previous review already suggested improvements for this section regarding timeout and error handling patterns.


150-153: Refer to existing comment about cleanup process

A previous review already suggested improvements for this section regarding configurable timeouts and graceful shutdown.


Line range hint 89-153: Verify potential race conditions in test suite

The test suite involves multiple goroutines and shared resources. Let's verify there are no potential race conditions.

✅ Verification successful

No concerning race conditions detected in the test suite

The test suite implementation follows proper synchronization patterns:

  • Shared resources (mqttServer, k8sManager, etc.) are properly initialized in BeforeSuite
  • Context cancellation is used for coordinated shutdown
  • Cache synchronization is waited for using WaitForCacheSync
  • Proper cleanup is performed in AfterSuite with ordered shutdown sequence
  • The two goroutines (mqtt server and k8s manager) operate on independent resources
  • Resource access is protected by the underlying frameworks (mqtt server and controller-runtime)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential race conditions in test files

# Look for shared resources accessed across goroutines
echo "Checking for shared resource access patterns..."
rg -A 5 "go func" suite/mqtt/suite_test.go

# Look for mutex usage
echo "Checking for synchronization mechanisms..."
rg "sync\." suite/mqtt/suite_test.go

# Look for channel usage
echo "Checking for channel-based communication..."
rg "make\(chan" suite/mqtt/suite_test.go

Length of output: 693


Script:

#!/bin/bash
# Let's gather more context about shared resources and their usage

# Check the full context of suite_test.go to understand resource initialization and cleanup
echo "=== Full test suite context ==="
cat suite/mqtt/suite_test.go

# Check for any test-specific synchronization utilities
echo "=== Test synchronization utilities ==="
rg "Wait|Done|Stop|Close|Sync" suite/mqtt/suite_test.go

# Look for resource initialization
echo "=== Resource initialization patterns ==="
rg "var .* =" suite/mqtt/suite_test.go

Length of output: 5839
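The shutdown pattern the verification describes — goroutines blocking on a shared context, stopped by cancelling it and then waited on — can be sketched in isolation. The worker names and the `runAndStop` helper are illustrative, not the suite's actual identifiers:

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// runAndStop models the suite's teardown: each goroutine (like the mqtt
// server and k8s manager) blocks on a shared context, AfterSuite cancels
// that context, and wg.Wait ensures every goroutine has exited before the
// test process continues. Returns how many workers acknowledged shutdown.
func runAndStop(names []string) int {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	var mu sync.Mutex
	stopped := 0
	for range names {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-ctx.Done() // block until the shared context is cancelled
			mu.Lock()
			stopped++
			mu.Unlock()
		}()
	}
	cancel()  // AfterSuite: cancel the shared context
	wg.Wait() // ordered shutdown: wait for every goroutine to exit
	return stopped
}

func main() {
	fmt.Println(runAndStop([]string{"mqttServer", "k8sManager"})) // 2
}
```

Because `wg.Wait` happens after `cancel`, no goroutine can outlive the teardown, which is why no extra mutexes or channels are needed in the suite itself.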

suite/http/mock_http_base.go (2)

33-46: LGTM: Constructor properly handles new BaseMetadata structure

The changes correctly initialize the BaseMetadata structure with the new required fields.


278-280: LGTM: Clean helper function implementation

The getBizIdentity helper function provides a clean way to generate consistent business identifiers.
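For readers without the file open, a helper of this shape can be sketched as follows — the `name:version` format is an assumption for illustration; the real implementation in suite/http/mock_http_base.go may combine the fields differently:

```go
package main

import "fmt"

// getBizIdentity builds a single stable identifier from a biz name and
// version, so every call site formats the identity the same way.
// The separator and field order here are assumed, not taken from the PR.
func getBizIdentity(bizName, bizVersion string) string {
	return fmt.Sprintf("%s:%s", bizName, bizVersion)
}

func main() {
	fmt.Println(getBizIdentity("biz1", "0.0.1")) // biz1:0.0.1
}
```

Centralizing the format in one function is what makes the identifiers consistent across install, uninstall, and query paths.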

suite/mqtt/module_deployment_controller_suite_test.go (3)

1-1: LGTM! Package name and initialization changes look good.

The changes improve code organization and reduce hardcoding by:

  • Moving to a more specific package name that aligns with the MQTT context
  • Introducing a clusterName variable for better maintainability
  • Making mock initialization more flexible

Also applies to: 21-24


165-165: Define and document the clientID parameter.

The clientID parameter is used in Start() calls but its source and purpose remain unclear. This issue was previously flagged and needs to be addressed.

Also applies to: 177-177


55-55: Verify deployment configuration changes across the codebase.

The deployment configurations have been updated with:

  • Image names now include .jar suffix
  • Updated node selector labels
  • Different cluster names for different deployments

Let's verify these changes are consistent across the codebase.

Also applies to: 65-67, 117-117, 127-129

✅ Verification successful

Let me gather more information about the image name patterns and their consistency.


Deployment configuration changes are consistent across the codebase

The verification confirms:

  • All image references consistently use the .jar suffix in test files
  • LabelKeyOfVNodeClusterName is properly implemented in the controller logic for node selection and cluster management
  • The old LabelKeyOfVNodeName label has been completely removed
  • Different cluster names (with "-2" suffix) are handled correctly in the deployment controller
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify deployment configuration consistency

# Check for image name patterns
echo "Checking image name patterns..."
rg -l "\.jar\"" --type yaml --type go

# Check for label usage
echo "Checking label key usage..."
rg "LabelKeyOfVNodeClusterName" --type go -A 2

Length of output: 2531


Script:

#!/bin/bash
# Check for all image references in test files and deployment configurations
echo "Checking all image references..."
rg "Image:\s+['\"].*['\"]" --type go --type yaml -A 1

# Check if there are any remaining references to LabelKeyOfVNodeName
echo "Checking for old label key..."
rg "LabelKeyOfVNodeName" --type go

Length of output: 1245


Context("http base online and deactive finally", func() {
It("base should become a ready vnode eventually", func() {
time.Sleep(time.Second)

🛠️ Refactor suggestion

Avoid using Sleep in tests

Using time.Sleep in tests can make them flaky and slow. Consider using Gomega's built-in async testing capabilities instead.

Replace sleep statements with proper test setup:

-time.Sleep(time.Second)
+// If waiting for a specific condition, use Eventually
+Eventually(func() bool {
+    // Check your condition here
+    return true
+}, time.Second*5, time.Millisecond*100).Should(BeTrue())

Also applies to: 55-55, 89-89

Message: "",
ErrorStackTrace: "",
BaseID: b.ID,
BaseIdentity: b.Metadata.Identity,

🛠️ Refactor suggestion

Add error handling for JSON unmarshaling in request processing

Both install and uninstall handlers ignore potential JSON unmarshaling errors.

 func (b *MockHttpBase) processInstallBiz(msg []byte) []byte {
     b.Lock()
     defer b.Unlock()
     request := ark_service.InstallBizRequest{}
-    json.Unmarshal(msg, &request)
+    if err := json.Unmarshal(msg, &request); err != nil {
+        logrus.Errorf("failed to unmarshal install request: %v", err)
+        response := ark_service.ArkResponse{
+            Code:           "ERROR",
+            Message:        "Invalid request format",
+            BaseIdentity:   b.Metadata.Identity,
+        }
+        respBytes, _ := json.Marshal(response)
+        return respBytes
+    }

Also applies to: 272-272

Comment on lines +161 to 169
msg := model.BaseStatus{
	BaseMetadata:  b.Metadata,
	LocalIP:       "127.0.0.1",
	LocalHostName: "localhost",
	Port:          b.port,
	State:         b.CurrState,
	MasterBizInfo: b.Metadata,
	NetworkInfo: model.NetworkInfo{
		LocalIP:       "127.0.0.1",
		LocalHostName: "localhost",
		ArkletPort:    b.port,
	},
}
msgBytes, _ := json.Marshal(msg)
return msgBytes

🛠️ Refactor suggestion

Add error handling for JSON marshaling

The function ignores potential JSON marshaling errors.

 func (b *MockHttpBase) getHeartBeatMsg() []byte {
     msg := model.BaseStatus{
         BaseMetadata:  b.Metadata,
         LocalIP:       "127.0.0.1",
         LocalHostName: "localhost",
         Port:          b.port,
         State:         b.CurrState,
     }
-    msgBytes, _ := json.Marshal(msg)
+    msgBytes, err := json.Marshal(msg)
+    if err != nil {
+        logrus.Errorf("failed to marshal heartbeat message: %v", err)
+        return []byte(`{"error":"internal server error"}`)
+    }
     return msgBytes
 }
📝 Committable suggestion


Suggested change
msg := model.BaseStatus{
BaseMetadata: b.Metadata,
LocalIP: "127.0.0.1",
LocalHostName: "localhost",
Port: b.port,
State: b.CurrState,
MasterBizInfo: b.Metadata,
NetworkInfo: model.NetworkInfo{
LocalIP: "127.0.0.1",
LocalHostName: "localhost",
ArkletPort: b.port,
},
}
msgBytes, _ := json.Marshal(msg)
return msgBytes
msg := model.BaseStatus{
BaseMetadata: b.Metadata,
LocalIP: "127.0.0.1",
LocalHostName: "localhost",
Port: b.port,
State: b.CurrState,
}
msgBytes, err := json.Marshal(msg)
if err != nil {
logrus.Errorf("failed to marshal heartbeat message: %v", err)
return []byte(`{"error":"internal server error"}`)
}
return msgBytes

Comment on lines +128 to +135
go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
	log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)

	_, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
	if err != nil {
		logrus.Errorf("error calling heartbeat: %s", err)
	}
})

🛠️ Refactor suggestion

Improve error handling in heartbeat goroutine

The heartbeat goroutine should handle context cancellation and implement proper error handling.

 go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
     log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)
-    _, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
+    req, err := http.NewRequestWithContext(ctx, "POST", "http://127.0.0.1:7777/heartbeat", bytes.NewBuffer(b.getHeartBeatMsg()))
     if err != nil {
-        logrus.Errorf("error calling heartbeat: %s", err)
+        logrus.Errorf("error creating heartbeat request: %v", err)
+        return
+    }
+    req.Header.Set("Content-Type", "application/json")
+    resp, err := http.DefaultClient.Do(req)
+    if err != nil {
+        logrus.Errorf("error sending heartbeat: %v", err)
+        return
+    }
+    defer resp.Body.Close()
+    if resp.StatusCode != http.StatusOK {
+        logrus.Errorf("unexpected status code from heartbeat: %d", resp.StatusCode)
     }
 })
📝 Committable suggestion


Suggested change
go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)
_, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
if err != nil {
logrus.Errorf("error calling heartbeat: %s", err)
}
})
go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)
req, err := http.NewRequestWithContext(ctx, "POST", "http://127.0.0.1:7777/heartbeat", bytes.NewBuffer(b.getHeartBeatMsg()))
if err != nil {
logrus.Errorf("error creating heartbeat request: %v", err)
return
}
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
logrus.Errorf("error sending heartbeat: %v", err)
return
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
logrus.Errorf("unexpected status code from heartbeat: %d", resp.StatusCode)
}
})

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
common/utils/utils.go (1)

Line range hint 203-220: Fix error variable in deletion check.

There's a bug in the error handling where the wrong error variable is used in the deletion check.

Apply this fix:

 deleteErr := k8sClient.Delete(ctx, &node)
-if deleteErr != nil && !apiErrors.IsNotFound(err) {
+if deleteErr != nil && !apiErrors.IsNotFound(deleteErr) {
     logger.WithError(deleteErr).Info("delete base node failed")
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6240598 and b6f59a3.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • cmd/module-controller/main.go (4 hunks)
  • common/utils/utils.go (5 hunks)
  • go.mod (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod
🔇 Additional comments (7)
cmd/module-controller/main.go (2)

144-148: ⚠️ Potential issue

Improve tunnel initialization error handling

The tunnel initialization lacks proper error handling:

  1. startTunnels panics on failure, and the panic is never recovered
  2. No error checking on the returned tunnel

Consider this safer approach:

-   tunnel := startTunnels(ctx, clientID, env, mgr, mdc)
+   tunnel, err := startTunnels(ctx, clientID, env, mgr, mdc)
+   if err != nil {
+       log.G(ctx).Error(err, "failed to initialize tunnels")
+       os.Exit(1)
+   }

    vc, err := vnode_controller.NewVNodeController(&rcc, tunnel)
    if err != nil {
        log.G(ctx).Error(err, "unable to set up VNodeController")
-       return
+       os.Exit(1)
    }

Likely invalid or redundant comment.


105-106: Review logger configuration choice

Using NullLogSink will suppress all controller-runtime logs. Consider using a proper logger implementation to maintain observability of the controller's operations.

✅ Verification successful

Let me gather more context about the logging setup and its usage.


Using NullLogSink for controller-runtime is intentional and correct

The code uses a dual logging strategy where:

  1. Controller-runtime logs are suppressed using NullLogSink
  2. Application's actual logging is handled through a custom logger (logrus) which is properly configured and actively used throughout the codebase for error reporting, info messages, and debugging

The extensive logging calls found in the code confirm that important operational logs are not being suppressed, they're just using a different logging system.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are other logger configurations in the codebase
rg -B 2 -A 2 'SetLogger|NullLogSink' --type go

Length of output: 1050


Script:

#!/bin/bash
# Check for logging-related imports and configurations
rg -B 2 -A 2 'import.*".*log.*"' --type go

# Check for any logging calls in the module controller
rg -B 2 -A 2 'log\.' cmd/module-controller/

Length of output: 4543

common/utils/utils.go (5)

22-23: LGTM! Function rename improves clarity.

The renaming from GetBaseIDFromTopic to GetBaseIdentityFromTopic better reflects its purpose while maintaining the same correct functionality.


72-95: LGTM! Clean model conversion with proper state handling.

The function successfully:

  • Converts between the new model types
  • Properly maps activation states
  • Correctly handles port information for tunneling

98-99: LGTM! Consistent with new naming convention.

The rename from TranslateHealthDataToNodeStatus to ConvertHealthDataToNodeStatus aligns with the new naming pattern while preserving the correct functionality.


121-127: LGTM! Clean model conversion.

The function correctly maps the BaseMetadata fields to QueryBaselineRequest while following the new naming convention.


222-243: LGTM! Clean model conversion with proper field mapping.

The function correctly maps all required fields from NodeInfo to BaseStatus.

Comment on lines +132 to 143
mdc, err := module_deployment_controller.NewModuleDeploymentController(env)
if err != nil {
-	log.G(ctx).Error(err, "unable to set up VNodeController")
+	log.G(ctx).Error(err, "unable to set up module_deployment_controller")
	return
}

-err = vc.SetupWithManager(ctx, mgr)
+err = mdc.SetupWithManager(ctx, mgr)
if err != nil {
	log.G(ctx).WithError(err).Error("unable to setup vnode controller")
	return
}


⚠️ Potential issue

Standardize error handling and fix error message

The error handling is inconsistent with other parts of the code:

  1. Some errors trigger os.Exit(1) while others just return
  2. The error message incorrectly mentions "vnode controller" instead of "module deployment controller"

Apply this diff to fix these issues:

    mdc, err := module_deployment_controller.NewModuleDeploymentController(env)
    if err != nil {
-       log.G(ctx).Error(err, "unable to set up module_deployment_controller")
-       return
+       log.G(ctx).Error(err, "unable to set up module deployment controller")
+       os.Exit(1)
    }

    err = mdc.SetupWithManager(ctx, mgr)
    if err != nil {
-       log.G(ctx).WithError(err).Error("unable to setup vnode controller")
-       return
+       log.G(ctx).WithError(err).Error("unable to setup module deployment controller")
+       os.Exit(1)
    }
📝 Committable suggestion


Suggested change
mdc, err := module_deployment_controller.NewModuleDeploymentController(env)
if err != nil {
log.G(ctx).Error(err, "unable to set up VNodeController")
log.G(ctx).Error(err, "unable to set up module_deployment_controller")
return
}
err = vc.SetupWithManager(ctx, mgr)
err = mdc.SetupWithManager(ctx, mgr)
if err != nil {
log.G(ctx).WithError(err).Error("unable to setup vnode controller")
return
}
mdc, err := module_deployment_controller.NewModuleDeploymentController(env)
if err != nil {
log.G(ctx).Error(err, "unable to set up module deployment controller")
os.Exit(1)
}
err = mdc.SetupWithManager(ctx, mgr)
if err != nil {
log.G(ctx).WithError(err).Error("unable to setup module deployment controller")
os.Exit(1)
}

Comment on lines +174 to +219
func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
	moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) tunnel.Tunnel {
	// Initialize tunnels based on configuration
	tunnels := make([]tunnel.Tunnel, 0)

	mqttTunnelEnable := utils.GetEnv("ENABLE_MQTT_TUNNEL", "false")
	if mqttTunnelEnable == "true" {
		mqttTl := koupleless_mqtt_tunnel.NewMqttTunnel(env, mgr.GetClient(), moduleDeploymentController)
		tunnels = append(tunnels, &mqttTl)
	}

	httpTunnelEnable := utils.GetEnv("ENABLE_HTTP_TUNNEL", "false")
	if httpTunnelEnable == "true" {
		httpTunnelListenPort, err := strconv.Atoi(utils.GetEnv("HTTP_TUNNEL_LISTEN_PORT", "7777"))

		if err != nil {
			log.G(ctx).WithError(err).Error("failed to parse HTTP_TUNNEL_LISTEN_PORT, set default port 7777")
			httpTunnelListenPort = 7777
		}

		httpTl := koupleless_http_tunnel.NewHttpTunnel(env, mgr.GetClient(), moduleDeploymentController, httpTunnelListenPort)
		tunnels = append(tunnels, &httpTl)
	}

	// Start all tunnels
	successTunnelCount := 0
	startFailedCount := 0
	for _, t := range tunnels {
		err := t.Start(ctx, clientId, env)
		if err != nil {
			log.G(ctx).WithError(err).Error("failed to start tunnel", t.Key())
			startFailedCount++
		} else {
			log.G(ctx).Info("Tunnel started: ", t.Key())
			successTunnelCount++
		}
	}

	if startFailedCount > 0 {
		panic(errors.New(fmt.Sprintf("failed to start %d tunnels", startFailedCount)))
	} else if successTunnelCount == 0 {
		panic(errors.New(fmt.Sprintf("successfully started 0 tunnels")))
	}
	// we only using one tunnel for now
	return tunnels[0]
}

⚠️ Potential issue

Refactor startTunnels for better error handling and safety

Several issues need attention:

  1. Using panic for error handling is not idiomatic Go
  2. Potential nil pointer dereference in t.Key()
  3. Unsafe array access at tunnels[0]
  4. Comment suggests technical debt

Consider this safer implementation:

 func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
-    moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) tunnel.Tunnel {
+    moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) (tunnel.Tunnel, error) {
     tunnels := make([]tunnel.Tunnel, 0)

     mqttTunnelEnable := utils.GetEnv("ENABLE_MQTT_TUNNEL", "false")
     if mqttTunnelEnable == "true" {
         mqttTl := koupleless_mqtt_tunnel.NewMqttTunnel(env, mgr.GetClient(), moduleDeploymentController)
         tunnels = append(tunnels, &mqttTl)
     }

     httpTunnelEnable := utils.GetEnv("ENABLE_HTTP_TUNNEL", "false")
     if httpTunnelEnable == "true" {
         httpTunnelListenPort, err := strconv.Atoi(utils.GetEnv("HTTP_TUNNEL_LISTEN_PORT", "7777"))
         if err != nil {
             log.G(ctx).WithError(err).Error("failed to parse HTTP_TUNNEL_LISTEN_PORT, set default port 7777")
             httpTunnelListenPort = 7777
         }

         httpTl := koupleless_http_tunnel.NewHttpTunnel(env, mgr.GetClient(), moduleDeploymentController, httpTunnelListenPort)
         tunnels = append(tunnels, &httpTl)
     }

     if len(tunnels) == 0 {
         return nil, fmt.Errorf("no tunnels configured to start")
     }

     // Start all tunnels
     var startedTunnel tunnel.Tunnel
     for _, t := range tunnels {
         if t == nil {
             continue
         }
         err := t.Start(ctx, clientId, env)
         if err != nil {
             log.G(ctx).WithError(err).Error("failed to start tunnel", t.Key())
             continue
         }
         log.G(ctx).Info("Tunnel started: ", t.Key())
         startedTunnel = t
         break
     }

     if startedTunnel == nil {
         return nil, fmt.Errorf("failed to start any tunnel")
     }

     return startedTunnel, nil
 }

This implementation:

  1. Returns errors instead of panicking
  2. Checks for nil before calling methods
  3. Safely handles tunnel selection
  4. Provides clear error messages

Consider defining a strategy for handling multiple tunnels in the future, as the current implementation only uses the first successful tunnel.

📝 Committable suggestion


Suggested change
func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) tunnel.Tunnel {
// Initialize tunnels based on configuration
tunnels := make([]tunnel.Tunnel, 0)
mqttTunnelEnable := utils.GetEnv("ENABLE_MQTT_TUNNEL", "false")
if mqttTunnelEnable == "true" {
mqttTl := koupleless_mqtt_tunnel.NewMqttTunnel(env, mgr.GetClient(), moduleDeploymentController)
tunnels = append(tunnels, &mqttTl)
}
httpTunnelEnable := utils.GetEnv("ENABLE_HTTP_TUNNEL", "false")
if httpTunnelEnable == "true" {
httpTunnelListenPort, err := strconv.Atoi(utils.GetEnv("HTTP_TUNNEL_LISTEN_PORT", "7777"))
if err != nil {
log.G(ctx).WithError(err).Error("failed to parse HTTP_TUNNEL_LISTEN_PORT, set default port 7777")
httpTunnelListenPort = 7777
}
httpTl := koupleless_http_tunnel.NewHttpTunnel(env, mgr.GetClient(), moduleDeploymentController, httpTunnelListenPort)
tunnels = append(tunnels, &httpTl)
}
// Start all tunnels
successTunnelCount := 0
startFailedCount := 0
for _, t := range tunnels {
err := t.Start(ctx, clientId, env)
if err != nil {
log.G(ctx).WithError(err).Error("failed to start tunnel", t.Key())
startFailedCount++
} else {
log.G(ctx).Info("Tunnel started: ", t.Key())
successTunnelCount++
}
}
if startFailedCount > 0 {
panic(errors.New(fmt.Sprintf("failed to start %d tunnels", startFailedCount)))
} else if successTunnelCount == 0 {
panic(errors.New(fmt.Sprintf("successfully started 0 tunnels")))
}
// we only using one tunnel for now
return tunnels[0]
}
func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) (tunnel.Tunnel, error) {
tunnels := make([]tunnel.Tunnel, 0)
mqttTunnelEnable := utils.GetEnv("ENABLE_MQTT_TUNNEL", "false")
if mqttTunnelEnable == "true" {
mqttTl := koupleless_mqtt_tunnel.NewMqttTunnel(env, mgr.GetClient(), moduleDeploymentController)
tunnels = append(tunnels, &mqttTl)
}
httpTunnelEnable := utils.GetEnv("ENABLE_HTTP_TUNNEL", "false")
if httpTunnelEnable == "true" {
httpTunnelListenPort, err := strconv.Atoi(utils.GetEnv("HTTP_TUNNEL_LISTEN_PORT", "7777"))
if err != nil {
log.G(ctx).WithError(err).Error("failed to parse HTTP_TUNNEL_LISTEN_PORT, set default port 7777")
httpTunnelListenPort = 7777
}
httpTl := koupleless_http_tunnel.NewHttpTunnel(env, mgr.GetClient(), moduleDeploymentController, httpTunnelListenPort)
tunnels = append(tunnels, &httpTl)
}
if len(tunnels) == 0 {
return nil, fmt.Errorf("no tunnels configured to start")
}
// Start all tunnels
var startedTunnel tunnel.Tunnel
for _, t := range tunnels {
if t == nil {
continue
}
err := t.Start(ctx, clientId, env)
if err != nil {
log.G(ctx).WithError(err).Error("failed to start tunnel", t.Key())
continue
}
log.G(ctx).Info("Tunnel started: ", t.Key())
startedTunnel = t
break
}
if startedTunnel == nil {
return nil, fmt.Errorf("failed to start any tunnel")
}
return startedTunnel, nil
}

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (3)
samples/module_controller_pod.yaml (1)

1-5: LGTM! Consider updating documentation.

The transition from Pod to Deployment is a good improvement that enables better availability and management capabilities.

Consider updating the README or deployment documentation to reflect this architectural change, especially noting the high-availability benefits of running multiple replicas.

common/utils/utils.go (2)

72-95: Consider using constants for state strings.

The function correctly converts BaseStatus to NodeInfo, but the hardcoded string "ACTIVATED" could be replaced with a constant for better maintainability.

+ const (
+    StateActivated = "ACTIVATED"
+ )

 func ConvertBaseStatusToNodeInfo(data model.BaseStatus, env string) vkModel.NodeInfo {
     state := vkModel.NodeStateDeactivated
-    if strings.EqualFold(data.State, "ACTIVATED") {
+    if strings.EqualFold(data.State, StateActivated) {
         state = vkModel.NodeStateActivated
     }

Line range hint 203-219: Consider consistent error handling in delete operation.

The error check in the delete operation uses the wrong error variable:

if deleteErr != nil && !apiErrors.IsNotFound(err)

should be:

if deleteErr != nil && !apiErrors.IsNotFound(deleteErr)
 func OnBaseUnreachable(ctx context.Context, nodeName string, k8sClient client.Client) {
     node := corev1.Node{}
     err := k8sClient.Get(ctx, client.ObjectKey{Name: nodeName}, &node)
     logger := log.G(ctx).WithField("nodeName", nodeName).WithField("func", "OnNodeNotReady")
     if err == nil {
         logger.Info("DeleteBaseNode")
         deleteErr := k8sClient.Delete(ctx, &node)
-        if deleteErr != nil && !apiErrors.IsNotFound(err) {
+        if deleteErr != nil && !apiErrors.IsNotFound(deleteErr) {
             logger.WithError(deleteErr).Info("delete base node failed")
         }
     } else if apiErrors.IsNotFound(err) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b6f59a3 and 073d4c3.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • cmd/module-controller/main.go (4 hunks)
  • common/utils/utils.go (5 hunks)
  • go.mod (1 hunks)
  • samples/module_controller_pod.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod
🧰 Additional context used
🪛 yamllint (1.29.0-1)
samples/module_controller_pod.yaml

[error] 55-55: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (6)
samples/module_controller_pod.yaml (2)

16-24: Verify the new image version compatibility.

The image version has been updated to module-controller-v2:v2.1.2. Please ensure:

  1. The version change is documented in the changelog
  2. The new version is compatible with the current infrastructure
  3. The image has been tested in a staging environment

7-14: Verify if multiple replicas are appropriate for this controller.

Running 3 replicas of a controller pod might lead to race conditions or conflicts if the controller isn't designed for concurrent operation. Please verify:

  1. Is the controller stateless?
  2. Does it implement leader election?
  3. Can it handle concurrent operations safely?
✅ Verification successful

Let me search for leader election implementation in the controller code.


Let me check the manager configuration and controller initialization to understand how the controller is set up.


Let me check one final aspect - how the controller processes events and if there's any synchronization mechanism in place.


Running multiple replicas of this controller is safe and recommended

The controller is designed to handle concurrent operations safely through multiple mechanisms:

  • The RuntimeInfoStore uses proper synchronization with RWMutex for thread-safe access to shared state
  • The controller uses a cache with proper synchronization (WaitForCacheSync)
  • The controller maintains internal state in a thread-safe manner through its RuntimeInfoStore

Running multiple replicas provides:

  • High availability in case of node failures
  • Better load distribution for processing events
  • Failover capability without service interruption
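
Should single-writer semantics ever be required instead, controller-runtime's built-in leader election can restrict reconciliation to one replica while the others stand by. A configuration sketch (the LeaderElectionID and namespace below are illustrative, not taken from this repo):

```go
import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

func newManager() (manager.Manager, error) {
	// With LeaderElection enabled, only the replica holding the lease runs
	// the manager's runnables; the other replicas wait for failover.
	return ctrl.NewManager(ctrl.GetConfigOrDie(), manager.Options{
		LeaderElection:          true,
		LeaderElectionID:        "module-controller-leader", // illustrative ID
		LeaderElectionNamespace: "default",                  // illustrative namespace
	})
}
```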
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for leader election implementation
rg -A 5 "leaderelection|NewLeaderElectionConfig|LeaderElectionConfiguration"

# Search for potential concurrent operation handling
ast-grep --pattern 'import (
  $$$
  "k8s.io/client-go/tools/leaderelection"
  $$$
)'

Length of output: 28713


Script:

#!/bin/bash
# Search for leader election configuration in main.go and controller files
rg -A 10 "NewManager|cmd.AddFlags|LeaderElection" cmd/module-controller/main.go

# Search for controller implementation details
ast-grep --pattern 'type ModuleDeploymentController struct {
  $$$
}'

Length of output: 1410


Script:

#!/bin/bash
# Check manager setup in main.go
rg -A 15 "mgr, err := ctrl.NewManager\(" cmd/module-controller/main.go

# Check controller initialization
rg -A 10 "func NewModuleDeploymentController|func \(c \*ModuleDeploymentController\) Reconcile" controller/module_deployment_controller/

Length of output: 1441


Script:

#!/bin/bash
# Check for synchronization mechanisms in the controller
rg -A 10 "mutex|Lock|Sync|concurrent" controller/module_deployment_controller/

# Check the runtime store implementation
rg -A 10 "type RuntimeInfoStore struct" controller/module_deployment_controller/runtime_info_store.go

Length of output: 8711

common/utils/utils.go (4)

22-23: LGTM! Clear and descriptive function name.

The renaming from GetBaseIDFromTopic to GetBaseIdentityFromTopic improves clarity while maintaining the same functionality.


98-99: LGTM! Function name aligns with new conventions.

The renaming from TranslateHealthDataToNodeStatus to ConvertHealthDataToNodeStatus maintains consistency with other conversion functions.


121-127: LGTM! Simple and correct conversion logic.

The function properly maps BaseMetadata fields to QueryBaselineRequest.


232-243: LGTM! Clean conversion of NodeInfo to BaseStatus.

The function correctly maps all required fields from NodeInfo to BaseStatus.

Note: The port handling improvement suggestion from the previous review is still applicable.

Comment on lines +25 to +55
env:
- name: ENABLE_MODULE_DEPLOYMENT_CONTROLLER
value: "true"
- name: ENABLE_MQTT_TUNNEL # mqtt tunnel flag
value: "true" # true means enable; if the mqtt tunnel is enabled, configure the env vars below
- name: MQTT_BROKER # mqtt broker url
value: YOUR_MQTT_BROKER_ENDPOINT
- name: MQTT_PORT # mqtt port
value: YOUR_MQTT_PORT
- name: MQTT_USERNAME # mqtt username
value: YOUR_MQTT_USERNAME
- name: MQTT_PASSWORD # mqtt password
value: YOUR_MQTT_PASSWORD
- name: MQTT_CLIENT_PREFIX # mqtt client prefix
value: YOUR_MQTT_CLIENT_PREFIX
- name: ENABLE_HTTP_TUNNEL # http tunnel flag
value: "true" # true means enable; if the http tunnel is enabled, you can optionally configure the env below
- name: HTTP_TUNNEL_LISTEN_PORT # Module Controller HTTP Tunnel listen port, use 7777 for default
value: "7777"
- name: REPORT_HOOKS # error report hooks, only support dingTalk robot webhook
value: YOUR_REPORT_HOOKS
- name: ENV # module controller env, will set to vnode labels
value: YOUR_ENV
- name: IS_CLUSTER # cluster flag, if true, will use cluster configure to start virtual kubelet
value: "false"
- name: WORKLOAD_MAX_LEVEL # cluster configure, means max workload level for workload calculation in virtual kubelet, default is 3
value: "3"
- name: ENABLE_MODULE_DEPLOYMENT_CONTROLLER # module deployment controller flag, if true, will start deployment controller to modify module deployment's replicas and baseline
value: "true"
- name: VNODE_WORKER_NUM # VNode concurrent module processing thread number, set to 1 to indicate single thread
value: "8"
(No newline at end of file)

⚠️ Potential issue

Critical: Address security and configuration issues.

Several issues need attention:

  1. The manifest contains placeholder values (YOUR_*) which might accidentally be deployed
  2. ENABLE_MODULE_DEPLOYMENT_CONTROLLER is defined twice (lines 26 and 52)
  3. Missing newline at end of file

Recommendations:

  1. Move sensitive configurations to Kubernetes Secrets
  2. Remove the duplicate environment variable
  3. Add validation for boolean values

Apply this diff to fix the immediate issues:

           env:
-            - name: ENABLE_MODULE_DEPLOYMENT_CONTROLLER
-              value: "true"
             - name: MQTT_BROKER # mqtt broker url
-              value: YOUR_MQTT_BROKER_ENDPOINT
+              valueFrom:
+                secretKeyRef:
+                  name: mqtt-credentials
+                  key: broker
             - name: MQTT_USERNAME # mqtt username
-              value: YOUR_MQTT_USERNAME
+              valueFrom:
+                secretKeyRef:
+                  name: mqtt-credentials
+                  key: username
             - name: MQTT_PASSWORD # mqtt password
-              value: YOUR_MQTT_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: mqtt-credentials
+                  key: password

Also, add a newline at the end of the file.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 yamllint (1.29.0-1)

[error] 55-55: no new line character at the end of file

(new-line-at-end-of-file)

Comment on lines +105 to +106
ctrl.SetLogger(logr.New(log2.NullLogSink{}))


⚠️ Potential issue

Reconsider using NullLogSink for controller-runtime logging

Using NullLogSink will suppress all controller-runtime framework logs, which could make debugging issues in production more difficult. Consider using a proper logger configuration that allows for appropriate log levels.

-ctrl.SetLogger(logr.New(log2.NullLogSink{}))
+ctrl.SetLogger(log2.NewFunctionLogger(
+    func(prefix, args string) {
+        log.G(ctx).V(4).Info(prefix + " " + args)
+    },
+))

Committable suggestion skipped: line range outside the PR's diff.
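
An alternative that keeps framework logs without writing a custom sink is controller-runtime's bundled zap integration (a sketch; the dev-mode setting is an illustrative choice, and log level can be tuned further with additional zap options):

```go
import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func configureLogger() {
	// zap.New builds a logr.Logger; UseDevMode(false) selects the
	// production JSON encoder instead of the console encoder.
	ctrl.SetLogger(zap.New(zap.UseDevMode(false)))
}
```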

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
samples/module_controller_pod.yaml (2)

7-14: Consider adding deployment strategy and resource management.

The deployment configuration could be enhanced with:

  1. An explicit update strategy for controlled rollouts
  2. Pod disruption budget for availability guarantees
  3. Justification for the 3 replicas setting

Add these configurations:

 spec:
   replicas: 3
+  strategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxSurge: 1
+      maxUnavailable: 0
   selector:
     matchLabels:
       app: module-controller

Also consider adding a PodDisruptionBudget:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: module-controller-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: module-controller

55-55: Add newline at end of file.

Add a newline character at the end of the file to comply with POSIX standards.

🧰 Tools
🪛 yamllint (1.29.0-1)

[error] 55-55: no new line character at the end of file

(new-line-at-end-of-file)

cmd/module-controller/main.go (1)

174-219: Add graceful shutdown handling for tunnels

The startTunnels function initializes tunnels but doesn't provide a mechanism for graceful shutdown. This could lead to resource leaks or incomplete cleanup when the application terminates.

Consider implementing a shutdown mechanism:

 func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
     moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) tunnel.Tunnel {
+    // Create a cleanup function for graceful shutdown
+    cleanup := func(tunnels []tunnel.Tunnel) {
+        for _, t := range tunnels {
+            if t != nil {
+                if err := t.Stop(); err != nil {
+                    log.G(ctx).WithError(err).Error("failed to stop tunnel", t.Key())
+                }
+            }
+        }
+    }
+
+    // Register cleanup with the manager
+    if err := mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
+        <-ctx.Done()
+        cleanup(tunnels)
+        return nil
+    })); err != nil {
+        panic(fmt.Errorf("failed to register tunnel cleanup: %v", err))
+    }
common/utils/utils.go (2)

72-95: Consider using constants for state strings

The function looks good overall, but consider defining constants for the state strings to prevent typos and improve maintainability.

+const (
+    StateActivated   = "ACTIVATED"
+    StateDeactivated = "DEACTIVATED"
+)

 func ConvertBaseStatusToNodeInfo(data model.BaseStatus, env string) vkModel.NodeInfo {
     state := vkModel.NodeStateDeactivated
-    if strings.EqualFold(data.State, "ACTIVATED") {
+    if strings.EqualFold(data.State, StateActivated) {
         state = vkModel.NodeStateActivated
     }

203-207: Consider adding context timeout

While the error handling is good, consider adding a timeout to the context for the Kubernetes client operations to prevent hanging in case of API server issues.

 func OnBaseUnreachable(ctx context.Context, nodeName string, k8sClient client.Client) {
+    ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
+    defer cancel()
     // base not ready, delete from api server
     node := corev1.Node{}
     err := k8sClient.Get(ctx, client.ObjectKey{Name: nodeName}, &node)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 073d4c3 and a295e86.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • cmd/module-controller/main.go (4 hunks)
  • common/utils/utils.go (5 hunks)
  • go.mod (1 hunks)
  • samples/module_controller_pod.yaml (1 hunks)
  • suite/http/base_http_lifecycle_test.go (1 hunks)
  • suite/mqtt/base_mqtt_lifecycle_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • go.mod
  • suite/http/base_http_lifecycle_test.go
  • suite/mqtt/base_mqtt_lifecycle_test.go
🧰 Additional context used
🪛 yamllint (1.29.0-1)
samples/module_controller_pod.yaml

[error] 55-55: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (6)
samples/module_controller_pod.yaml (2)

25-55: ⚠️ Potential issue

Critical: Move sensitive data to Kubernetes Secrets.

Several critical issues need attention:

  1. Sensitive credentials are exposed in plain text
  2. ENABLE_MODULE_DEPLOYMENT_CONTROLLER is defined twice (lines 26 and 52)
  3. Placeholder values (YOUR_*) could accidentally be deployed

Create a secret for sensitive data:

apiVersion: v1
kind: Secret
metadata:
  name: module-controller-config
type: Opaque
stringData:
  mqtt_broker: "your-broker"
  mqtt_username: "your-username"
  mqtt_password: "your-password"
  report_hooks: "your-hooks"

Then update the deployment:

           env:
             - name: ENABLE_MODULE_DEPLOYMENT_CONTROLLER
               value: "true"
             - name: MQTT_BROKER
-              value: YOUR_MQTT_BROKER_ENDPOINT
+              valueFrom:
+                secretKeyRef:
+                  name: module-controller-config
+                  key: mqtt_broker
             - name: MQTT_USERNAME
-              value: YOUR_MQTT_USERNAME
+              valueFrom:
+                secretKeyRef:
+                  name: module-controller-config
+                  key: mqtt_username
-            - name: ENABLE_MODULE_DEPLOYMENT_CONTROLLER
-              value: "true"

Also, consider using a ConfigMap for non-sensitive configuration values.

🧰 Tools
🪛 yamllint (1.29.0-1)

[error] 55-55: no new line character at the end of file

(new-line-at-end-of-file)


1-5: Verify namespace and deployment naming strategy.

While converting to a Deployment is good for high availability, consider:

  1. Using a dedicated namespace instead of default for better isolation
  2. Clarifying the purpose of the -pre suffix in the deployment name
✅ Verification successful

Using default namespace aligns with project conventions

The namespace usage in this deployment follows the established pattern in the project. The default namespace is consistently used across multiple module-related resources (module-daemonset, module-deployment). The -pre suffix appears to be a new pattern, possibly for staging deployments, while the main controller in quick-start uses no suffix.

  • The namespace choice is consistent with:
    • samples/module/module_daemonset.yaml
    • samples/module/module_deployment.yaml
    • Other service accounts and bindings in the project
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are other resources in the default namespace that might conflict
kubectl get all -n default -o json | jq '.items[] | select(.metadata.name | contains("module-controller"))'

# Check if there are any namespace conventions in place
find . -type f -name "*.yaml" -exec grep -l "namespace:" {} \;

Length of output: 549


Script:

#!/bin/bash
# Check namespace usage patterns in the repository
rg "namespace:" -A 2 -B 2 --type yaml

# Check for any deployment naming patterns
rg "name: .*module-controller.*" -A 2 -B 2 --type yaml

Length of output: 3655

common/utils/utils.go (4)

22-23: LGTM! Function rename improves clarity

The renaming from GetBaseIDFromTopic to GetBaseIdentityFromTopic better reflects its purpose while maintaining the same correct functionality.


98-99: LGTM! Function rename maintains consistency

The renaming from TranslateHealthDataToNodeStatus to ConvertHealthDataToNodeStatus aligns with the new naming convention while preserving the correct functionality.


121-126: LGTM! Clean and correct conversion logic

The function properly maps the BaseMetadata fields to the QueryBaselineRequest structure.


222-242: LGTM! Function rename and return type change improve clarity

The renaming and restructuring of the function better reflects its purpose and responsibilities.

Note: The hardcoded port value concern was previously raised and is still applicable.

Comment on lines +16 to +24
serviceAccountName: virtual-kubelet
containers:
- name: module-controller
image: serverless-registry.cn-shanghai.cr.aliyuncs.com/opensource/release/module-controller-v2:v2.1.2
imagePullPolicy: Always
resources:
limits:
cpu: "1000m"
memory: "400Mi"

🛠️ Refactor suggestion

Add container health checks and resource requests.

The container configuration should include:

  1. Resource requests alongside limits
  2. Liveness and readiness probes
  3. Security context

Add these configurations:

       containers:
         - name: module-controller
           image: serverless-registry.cn-shanghai.cr.aliyuncs.com/opensource/release/module-controller-v2:v2.1.2
           imagePullPolicy: Always
           resources:
             limits:
               cpu: "1000m"
               memory: "400Mi"
+            requests:
+              cpu: "500m"
+              memory: "200Mi"
+          livenessProbe:
+            httpGet:
+              path: /healthz
+              port: 7777
+            initialDelaySeconds: 15
+            periodSeconds: 20
+          readinessProbe:
+            httpGet:
+              path: /readyz
+              port: 7777
+            initialDelaySeconds: 5
+            periodSeconds: 10
+          securityContext:
+            runAsNonRoot: true
+            runAsUser: 1000
+            readOnlyRootFilesystem: true

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +146 to 156
vc, err := vnode_controller.NewVNodeController(&rcc, tunnel)
if err != nil {
log.G(ctx).Error(err, "unable to set up module_deployment_controller")
log.G(ctx).Error(err, "unable to set up VNodeController")
return
}

err = mdc.SetupWithManager(ctx, mgr)
err = vc.SetupWithManager(ctx, mgr)
if err != nil {
log.G(ctx).WithError(err).Error("unable to setup vnode controller")
return
}

⚠️ Potential issue

Standardize error handling in VNode controller setup

The error handling in this section is inconsistent. Some errors trigger a return while others call os.Exit(1). This inconsistency could lead to unpredictable shutdown behavior.

Apply this diff to standardize error handling:

    vc, err := vnode_controller.NewVNodeController(&rcc, tunnel)
    if err != nil {
        log.G(ctx).Error(err, "unable to set up VNodeController")
-       return
+       os.Exit(1)
    }

    err = vc.SetupWithManager(ctx, mgr)
    if err != nil {
        log.G(ctx).WithError(err).Error("unable to setup vnode controller")
-       return
+       os.Exit(1)
    }

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (7)
suite/mqtt/base_mqtt_lifecycle_test.go (1)

94-102: Consider improving test stability

The 2-minute timeout suggests potential test flakiness. Consider:

  1. Adding logging to track state changes
  2. Implementing a more deterministic way to detect node unreachability
  3. Documenting why such a long timeout is necessary
 mockMqttBase.reachable = false
+// Add logging to track state changes
+t.Logf("Setting node %s as unreachable", mqttNodeID)
 Eventually(func() bool {
     vnode := &v1.Node{}
     err := k8sClient.Get(ctx, types.NamespacedName{
         Name: utils.FormatNodeName(mqttNodeID, env),
     }, vnode)
+    if err != nil {
+        t.Logf("Node get error: %v", err)
+    }
     return errors.IsNotFound(err)
-}, time.Minute*2, time.Second).Should(BeTrue())
+}, time.Minute*2, time.Second*5).Should(BeTrue(), "Node should be removed after becoming unreachable")
suite/http/base_http_lifecycle_test.go (2)

30-42: Optimize test timeouts and polling intervals

The Eventually blocks use long timeouts (50s) with 1s polling intervals. Consider:

  1. Reducing timeout values for faster test execution
  2. Using shorter polling intervals for quicker feedback
  3. Using consistent timeout values across similar checks
const (
    defaultTimeout  = 10 * time.Second
    defaultInterval = 100 * time.Millisecond
)

Eventually(func() bool {
    // Your check logic
}, defaultTimeout, defaultInterval).Should(BeTrue())

Also applies to: 76-88, 107-119


24-134: Improve test descriptions and organization

The test descriptions could be more specific about what they're testing. Consider:

  1. Using more descriptive Context blocks to group related tests
  2. Making It blocks more specific about the behavior being tested
  3. Adding comments to explain complex test scenarios

Example structure:

Context("when HTTP base is started", func() {
    Context("and is healthy", func() {
        It("should register as a ready node in kubernetes", func() {
            // Test logic
        })
    })
    
    Context("when becoming unreachable", func() {
        It("should be removed from kubernetes", func() {
            // Test logic
        })
        
        It("should re-register when becoming reachable again", func() {
            // Test logic
        })
    })
})
suite/http/mock_http_base.go (2)

33-44: Add parameter validation in constructor

The constructor accepts multiple string parameters but doesn't validate them for empty values. Consider adding validation to ensure required fields are not empty.

 func NewMockHttpBase(name, clusterName, version, env string, port int) *MockHttpBase {
+    if name == "" || clusterName == "" {
+        panic("name and clusterName cannot be empty")
+    }
+    if port <= 0 {
+        panic("port must be positive")
+    }
     return &MockHttpBase{
         Env:       env,
         CurrState: "ACTIVATED",

279-281: Add input validation to getBizIdentity

The helper function should validate its input to prevent formatting invalid identifiers.

 func getBizIdentity(bizModel ark.BizModel) string {
+    if bizModel.BizName == "" || bizModel.BizVersion == "" {
+        return ""
+    }
     return fmt.Sprintf("%s:%s", bizModel.BizName, bizModel.BizVersion)
 }
suite/mqtt/mock_mqtt_base.go (2)

21-26: Consider adding mutex protection for BaseMetadata

The BaseMetadata field is accessed from multiple goroutines (heart beat routine and message processing routines) but isn't protected by the struct's mutex. While the current usage might be read-only, it's better to be explicit about thread safety.

Consider protecting BaseMetadata access with the existing mutex:

 type MockMQTTBase struct {
 	sync.Mutex
 	Env          string
 	CurrState    string
 	BaseMetadata model.BaseMetadata
 	BizInfos     map[string]ark.ArkBizInfo
 	Baseline     []ark.BizModel
 	client       *mqtt.Client
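
A minimal stdlib sketch of the mutex-protected access pattern suggested here (the Metadata stand-in type and the getter/setter names are assumptions, not the project's API):

```go
package main

import (
	"fmt"
	"sync"
)

// Metadata is a stand-in for model.BaseMetadata.
type Metadata struct {
	Name    string
	Version string
}

type MockBase struct {
	sync.Mutex
	metadata Metadata // unexported: forces access through the guarded methods
}

// SetMetadata and GetMetadata serialize access so the heartbeat goroutine
// and message handlers never observe a half-written struct.
func (b *MockBase) SetMetadata(m Metadata) {
	b.Lock()
	defer b.Unlock()
	b.metadata = m
}

func (b *MockBase) GetMetadata() Metadata {
	b.Lock()
	defer b.Unlock()
	return b.metadata
}

func main() {
	base := &MockBase{}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			base.SetMetadata(Metadata{Name: "base", Version: fmt.Sprint(i)})
			_ = base.GetMetadata() // safe concurrent read
		}(i)
	}
	wg.Wait()
	fmt.Println(base.GetMetadata().Name) // base
}
```

Returning Metadata by value from GetMetadata hands the caller a copy, so the lock only needs to be held for the read itself.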

Line range hint 277-309: Add nil checks in SetBizState

The method accesses info.BizName and info.BizVersion without verifying if the bizIdentity exists in the map. This could lead to a nil pointer dereference.

Add validation:

 func (b *MockMQTTBase) SetBizState(bizIdentity, state, reason, message string) {
     b.Lock()
     defer b.Unlock()
     info, exists := b.BizInfos[bizIdentity]
+    if !exists {
+        log.G(context.Background()).Errorf("Biz with identity %s not found", bizIdentity)
+        return
+    }
     info.BizState = state
     info.BizStateRecords = append(info.BizStateRecords, ark.ArkBizStateRecord{
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a295e86 and b753500.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • go.mod (1 hunks)
  • suite/http/base_http_lifecycle_test.go (1 hunks)
  • suite/http/mock_http_base.go (12 hunks)
  • suite/mqtt/base_mqtt_lifecycle_test.go (5 hunks)
  • suite/mqtt/mock_mqtt_base.go (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod
🔇 Additional comments (4)
suite/mqtt/base_mqtt_lifecycle_test.go (4)

Line range hint 1-22: LGTM: Initialization and setup changes look good

The package restructuring and addition of cluster name parameter are well-implemented. The imports and initialization are properly organized.


Line range hint 76-90: Code duplication in node readiness check

The node readiness check logic is duplicated across multiple test cases.

Also applies to: 108-122


125-134: Cleanup should be in AfterEach

The cleanup logic using mockMqttBase.Exit() should be moved to an AfterEach block.


28-41: Verify lease duration constants

The node lease duration (model.NodeLeaseDurationSeconds) is used for leader verification. Please verify if this aligns with the Kubernetes node lease duration defaults and your system requirements.

✅ Verification successful

Let me search for the actual constant definition since the previous search only showed its usage.


Let me try one more search strategy to locate this constant, focusing on the model package and any configuration files.


I notice that the constant is imported from github.com/koupleless/virtual-kubelet/model. Since this is an external dependency, and we can't verify its exact value in the current repository, I can now generate the final response.

Review lease duration constant usage

The NodeLeaseDurationSeconds constant is imported from the external dependency github.com/koupleless/virtual-kubelet/model. Since this is using a standardized Kubernetes lease mechanism from a well-established project, the duration constant should be properly aligned with Kubernetes node lease defaults.

  • The constant is consistently used across both HTTP and MQTT lifecycle tests
  • The implementation follows the standard leader election pattern using Kubernetes lease objects
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check the NodeLeaseDurationSeconds constant definition
rg -A 1 "NodeLeaseDurationSeconds.*=.*" 

# Check if this duration is referenced consistently across the codebase
rg "NodeLeaseDurationSeconds"

Length of output: 345


Script:

#!/bin/bash
# Search for the constant definition in model package
ast-grep --pattern 'const NodeLeaseDurationSeconds = $_'

# Also try searching in case it's defined as a variable
ast-grep --pattern 'var NodeLeaseDurationSeconds = $_'

# Search for any file containing NodeLeaseDurationSeconds in its name or content
fd NodeLeaseDurationSeconds

# Let's also check for any lease-related constants or configurations
rg -i "lease.*duration|duration.*lease" 

Length of output: 461


Script:

#!/bin/bash
# Find all Go files in model directory
fd -e go . model/

# Search for any duration-related constants in Go files
rg "const.*Duration|var.*Duration" -t go

# Check for any configuration files that might define lease durations
fd -e yaml -e json . config/

# Look for imports of the model package to trace its location
rg "\".*model\"" -t go

Length of output: 2382

Comment on lines +71 to +89
It("base should become a ready vnode eventually", func() {
time.Sleep(time.Second)

go mockHttpBase.Start(ctx, clientID)
vnode := &v1.Node{}
Eventually(func() bool {
err := k8sClient.Get(ctx, types.NamespacedName{
Name: utils.FormatNodeName(httpNodeID, env),
}, vnode)
vnodeReady := false
for _, cond := range vnode.Status.Conditions {
if cond.Type == v1.NodeReady {
vnodeReady = cond.Status == v1.ConditionTrue
break
}
}
return err == nil && vnodeReady
}, time.Second*50, time.Second).Should(BeTrue())
})

🛠️ Refactor suggestion

Consolidate duplicate test cases

The test cases "base should become a ready vnode eventually" appear twice with very similar logic. Consider consolidating them and using different test cases to cover specific scenarios:

Context("http base lifecycle", func() {
    It("should become ready when started", func() {
        // Your existing ready check logic
    })

    It("should recover after being marked unreachable", func() {
        // First mark as unreachable
        mockHttpBase.reachable = false
        // Then recover
        mockHttpBase.reachable = true
        // Verify recovery
    })
})

Also applies to: 102-120

Comment on lines +128 to +136
go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
if b.reachable {
log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)
_, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
if err != nil {
logrus.Errorf("error calling heartbeat: %s", err)
}
}
})

🛠️ Refactor suggestion

Improve heartbeat goroutine implementation

The heartbeat goroutine needs improvements:

  1. Add proper context cancellation handling
  2. Add timeout for HTTP requests
  3. Add backoff mechanism for retries
 go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
     if b.reachable {
-        log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)
-        _, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
-        if err != nil {
-            logrus.Errorf("error calling heartbeat: %s", err)
-        }
+        reqCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
+        defer cancel()
+        
+        req, err := http.NewRequestWithContext(reqCtx, "POST", "http://127.0.0.1:7777/heartbeat", bytes.NewBuffer(b.getHeartBeatMsg()))
+        if err != nil {
+            log.G(ctx).Errorf("failed to create heartbeat request: %v", err)
+            return
+        }
+        
+        req.Header.Set("Content-Type", "application/json")
+        resp, err := http.DefaultClient.Do(req)
+        if err != nil {
+            log.G(ctx).Errorf("failed to send heartbeat: %v", err)
+            return
+        }
+        defer resp.Body.Close()
+        
+        if resp.StatusCode != http.StatusOK {
+            log.G(ctx).Errorf("unexpected status code from heartbeat: %d", resp.StatusCode)
+        }
     }
 })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
if b.reachable {
log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)
_, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
if err != nil {
logrus.Errorf("error calling heartbeat: %s", err)
}
}
})
go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
if b.reachable {
reqCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
req, err := http.NewRequestWithContext(reqCtx, "POST", "http://127.0.0.1:7777/heartbeat", bytes.NewBuffer(b.getHeartBeatMsg()))
if err != nil {
log.G(ctx).Errorf("failed to create heartbeat request: %v", err)
return
}
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
log.G(ctx).Errorf("failed to send heartbeat: %v", err)
return
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
log.G(ctx).Errorf("unexpected status code from heartbeat: %d", resp.StatusCode)
}
}
})

Comment on lines +76 to +82
// Start a new goroutine to upload node heart beat data every 10 seconds
go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
if b.reachable {
log.G(ctx).Info("upload node heart beat data from node ", b.BaseMetadata.Identity)
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/heart", b.Env, b.BaseMetadata.Identity), 1, b.getHeartBeatMsg())
}
})

⚠️ Potential issue

Protect reachable flag access in the heartbeat routine

The reachable flag is read from the heartbeat goroutine without mutex protection, which could lead to a data race with writers on other goroutines.

Add mutex protection:

 go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
+    b.Lock()
+    reachable := b.reachable
+    b.Unlock()
-    if b.reachable {
+    if reachable {
         log.G(ctx).Info("upload node heart beat data from node ", b.BaseMetadata.Identity)
         b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/heart", b.Env, b.BaseMetadata.Identity), 1, b.getHeartBeatMsg())
     }
 })
📝 Committable suggestion


Suggested change
// Start a new goroutine to upload node heart beat data every 10 seconds
go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
if b.reachable {
log.G(ctx).Info("upload node heart beat data from node ", b.BaseMetadata.Identity)
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/heart", b.Env, b.BaseMetadata.Identity), 1, b.getHeartBeatMsg())
}
})
// Start a new goroutine to upload node heart beat data every 10 seconds
go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
b.Lock()
reachable := b.reachable
b.Unlock()
if reachable {
log.G(ctx).Info("upload node heart beat data from node ", b.BaseMetadata.Identity)
b.client.Pub(fmt.Sprintf("koupleless_%s/%s/base/heart", b.Env, b.BaseMetadata.Identity), 1, b.getHeartBeatMsg())
}
})
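The snapshot-under-lock pattern from the diff can be sketched standalone — the `base` type here is a reduced stand-in for the mock, which already embeds `sync.Mutex`:

```go
package main

import (
	"fmt"
	"sync"
)

// base mirrors the mock's embedded mutex plus reachable flag.
type base struct {
	sync.Mutex
	reachable bool
}

func (b *base) setReachable(v bool) {
	b.Lock()
	defer b.Unlock()
	b.reachable = v
}

// isReachable takes a locked snapshot, keeping the critical section short.
func (b *base) isReachable() bool {
	b.Lock()
	defer b.Unlock()
	return b.reachable
}

func main() {
	b := &base{}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ { // writers racing with readers, as tests do
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			b.setReachable(i%2 == 0)
			_ = b.isReachable()
		}(i)
	}
	wg.Wait()
	b.setReachable(true)
	fmt.Println(b.isReachable())
}
```

Run under `go test -race` (or `go run -race`) to confirm the unguarded version races while this one does not.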

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (8)
suite/mqtt/base_mqtt_lifecycle_test.go (2)

94-94: Consider reducing the timeout duration

The 2-minute timeout in the unreachability test might make the test suite unnecessarily slow. Consider if a shorter duration would be sufficient, or if the test could be restructured to complete faster.

-			}, time.Minute*2, time.Second).Should(BeTrue())
+			}, time.Second*30, time.Second).Should(BeTrue())

Also applies to: 102-102


94-94: Encapsulate reachability state changes

Direct manipulation of mockMqttBase.reachable could be better encapsulated using methods like SetReachable() and SetUnreachable().

Consider adding these methods to the MockMqttBase type:

func (m *MockMqttBase) SetReachable() {
    m.reachable = true
}

func (m *MockMqttBase) SetUnreachable() {
    m.reachable = false
}

Then update the test cases:

-			mockMqttBase.reachable = false
+			mockMqttBase.SetUnreachable()
-			mockMqttBase.reachable = true
+			mockMqttBase.SetReachable()

Also applies to: 107-107

suite/http/base_http_lifecycle_test.go (3)

18-22: Consider using a more robust test context

The current context initialization doesn't include timeout or cancellation handling. For better test reliability, consider using a context with timeout:

-ctx := context.Background()
+ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
+defer cancel()

93-100: Standardize timeout values across tests

The timeout in this test (2 minutes) differs from others (50 seconds). Consider using a consistent timeout value defined as a constant:

+const (
+    defaultTestTimeout = 2 * time.Minute
+    defaultPollInterval = time.Second
+)

-}, time.Minute*2, time.Second).Should(BeTrue())
+}, defaultTestTimeout, defaultPollInterval).Should(BeTrue())

122-133: Add error logging for better test debugging

The error check in the Eventually block silently ignores the actual error. Consider logging it for better debugging:

 Eventually(func() bool {
     vnode := &v1.Node{}
     err := k8sClient.Get(ctx, types.NamespacedName{
         Name: utils.FormatNodeName(httpNodeID, env),
     }, vnode)
+    if err != nil && !errors.IsNotFound(err) {
+        GinkgoT().Logf("Unexpected error while checking node: %v", err)
+    }
     return errors.IsNotFound(err)
 }, time.Second*50, time.Second).Should(BeTrue())
suite/http/mock_http_base.go (3)

33-46: Add parameter validation in constructor

The constructor accepts multiple string parameters but doesn't validate them for empty values. Consider adding validation to ensure robust initialization.

 func NewMockHttpBase(name, clusterName, version, env string, port int) *MockHttpBase {
+    if name == "" || clusterName == "" || version == "" {
+        logrus.Error("invalid parameters: name, clusterName, and version must not be empty")
+        return nil
+    }
+    if port <= 0 {
+        logrus.Error("invalid port number")
+        return nil
+    }
     return &MockHttpBase{
         Env:       env,
         CurrState: "ACTIVATED",

185-188: Add nil check for Metadata access

The code directly accesses Metadata fields without checking if they're properly initialized. While unlikely in normal operation, defensive programming would be beneficial.

Since these are fields inside a struct literal, the nil check cannot go at the access site; validate b.Metadata in the constructor (as suggested for NewMockHttpBase above) so these accesses are always safe:

                        BizName:    b.Metadata.Identity,
                        BizState:   b.CurrState,
                        BizVersion: b.Metadata.Version,

279-281: Enhance getBizIdentity function robustness

The function should validate input parameters and handle potential edge cases.

 func getBizIdentity(bizModel ark.BizModel) string {
+    if bizModel.BizName == "" || bizModel.BizVersion == "" {
+        logrus.Warn("invalid biz model: empty name or version")
+        return ""
+    }
     return fmt.Sprintf("%s:%s", bizModel.BizName, bizModel.BizVersion)
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b753500 and d53c98d.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • go.mod (1 hunks)
  • suite/http/base_http_lifecycle_test.go (1 hunks)
  • suite/http/mock_http_base.go (12 hunks)
  • suite/mqtt/base_mqtt_lifecycle_test.go (5 hunks)
  • suite/mqtt/mock_mqtt_base.go (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • go.mod
  • suite/mqtt/mock_mqtt_base.go
🔇 Additional comments (3)
suite/mqtt/base_mqtt_lifecycle_test.go (3)

Line range hint 1-14: LGTM! Package name change improves organization

The package name change from 'suite' to 'mqtt' better reflects the test's scope and responsibility.


21-22: LGTM! Improved test setup with cluster name

The addition of the clusterName parameter aligns with the broader changes in the codebase and improves the test configuration clarity.


31-41: LGTM! Improved leader election check using lease

The new lease-based approach for leader election is more robust and follows Kubernetes best practices.

Comment on lines +30 to +42
Eventually(func() bool {
lease := &v12.Lease{}
err := k8sClient.Get(ctx, types.NamespacedName{
Name: utils.FormatNodeName(httpNodeID, env),
Namespace: v1.NamespaceNodeLease,
}, lease)

isLeader := err == nil &&
*lease.Spec.HolderIdentity == clientID &&
!time.Now().After(lease.Spec.RenewTime.Time.Add(time.Second*model.NodeLeaseDurationSeconds))

return isLeader
}, time.Second*50, time.Second).Should(BeTrue())

🛠️ Refactor suggestion

Standardize lease verification logic across test cases

The lease verification logic is only present in this test case but should be consistent across all node readiness checks. Consider extracting this into a helper function:

func waitForLeaseOwnership(ctx context.Context, k8sClient client.Client, nodeName, clientID string) bool {
    lease := &v12.Lease{}
    err := k8sClient.Get(ctx, types.NamespacedName{
        Name:      nodeName,
        Namespace: v1.NamespaceNodeLease,
    }, lease)
    
    return err == nil &&
        lease.Spec.HolderIdentity != nil &&
        *lease.Spec.HolderIdentity == clientID &&
        !time.Now().After(lease.Spec.RenewTime.Time.Add(time.Second*model.NodeLeaseDurationSeconds))
}

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (7)
suite/mqtt/base_mqtt_lifecycle_test.go (2)

Line range hint 1-22: Define the clientID variable.

The clientID is used in multiple places but its definition is not visible in the code. Ensure it's properly defined at the test suite level.

Add the clientID definition near other test variables:

var _ = Describe("Base Lifecycle Test", func() {
    ctx := context.Background()
    mqttNodeID := "test-mqtt-base"
    clusterName := "test-cluster-name"
+   clientID := "test-client-id"  // Add this line
    mockMqttBase := NewMockMqttBase(mqttNodeID, clusterName, "1.0.0", env)

31-41: LGTM: Improved leader election check using lease.

The new lease-based leader check is a more reliable approach for determining node leadership.

Consider extracting the timeout values into constants for better maintainability:

+const (
+    nodeLeaseTimeout = 50 * time.Second
+    pollInterval = time.Second
+)
suite/http/base_http_lifecycle_test.go (2)

30-42: Improve lease verification error reporting

The lease verification should include descriptive error messages to help debug test failures.

 Eventually(func() bool {
     lease := &v12.Lease{}
     err := k8sClient.Get(ctx, types.NamespacedName{
         Name:      utils.FormatNodeName(httpNodeID, env),
         Namespace: v1.NamespaceNodeLease,
     }, lease)

     isLeader := err == nil &&
         *lease.Spec.HolderIdentity == clientID &&
         !time.Now().After(lease.Spec.RenewTime.Time.Add(time.Second*model.NodeLeaseDurationSeconds))

     return isLeader
-}, time.Second*50, time.Second).Should(BeTrue())
+}, time.Second*50, time.Second).Should(BeTrue(), "Failed to verify lease ownership")

93-99: Adjust timeout for unreachability test

The 2-minute timeout for checking node unreachability seems excessive and could slow down the test suite.

-}, time.Minute*2, time.Second).Should(BeTrue())
+}, time.Second*60, time.Second).Should(BeTrue(), "Node should be marked as not found after becoming unreachable")
suite/http/mock_http_base.go (1)

279-281: Add input validation to getBizIdentity

The function assumes valid input parameters but should validate them to prevent formatting issues with empty or invalid values.

 func getBizIdentity(bizModel ark.BizModel) string {
+    if bizModel.BizName == "" || bizModel.BizVersion == "" {
+        logrus.Warn("invalid business model: empty name or version")
+        return ""
+    }
     return fmt.Sprintf("%s:%s", bizModel.BizName, bizModel.BizVersion)
 }
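A runnable sketch of the validated formatter — `bizModel` here is a reduced stand-in for `ark.BizModel`, and the warning log is elided:

```go
package main

import "fmt"

// bizModel is a reduced stand-in for ark.BizModel.
type bizModel struct {
	BizName    string
	BizVersion string
}

// getBizIdentity formats "name:version", returning "" when either field
// is missing, as the suggested validation does.
func getBizIdentity(m bizModel) string {
	if m.BizName == "" || m.BizVersion == "" {
		return ""
	}
	return fmt.Sprintf("%s:%s", m.BizName, m.BizVersion)
}

func main() {
	fmt.Println(getBizIdentity(bizModel{BizName: "demo", BizVersion: "1.0.0"}))
	fmt.Println(getBizIdentity(bizModel{}) == "")
}
```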
suite/mqtt/mock_mqtt_base.go (2)

Line range hint 167-181: Improve goroutine management in command processing

The command processing spawns goroutines without context propagation or cancellation mechanism. This could lead to goroutine leaks if the parent context is cancelled.

Consider passing the context to the processing functions:

-			go b.processHealth()
+			go b.processHealth(ctx)
-			go b.processInstallBiz(msg.Payload())
+			go b.processInstallBiz(ctx, msg.Payload())
-			go b.processUnInstallBiz(msg.Payload())
+			go b.processUnInstallBiz(ctx, msg.Payload())
-			go b.processQueryAllBiz()
+			go b.processQueryAllBiz(ctx)

Line range hint 276-331: Refactor SetBizState for better maintainability

The method is complex and handles multiple responsibilities:

  1. State management
  2. Response generation
  3. Publishing updates

Additionally, JSON marshaling errors are not handled.

Consider breaking down the method and handling errors:

func (b *MockMQTTBase) SetBizState(bizIdentity, state, reason, message string) {
    b.Lock()
    defer b.Unlock()
    
    if err := b.updateBizState(bizIdentity, state, reason, message); err != nil {
        log.G(context.Background()).Errorf("Failed to update biz state: %v", err)
        return
    }
    
    if state == "ACTIVATED" {
        if err := b.publishBizOperationResponse(bizIdentity); err != nil {
            log.G(context.Background()).Errorf("Failed to publish operation response: %v", err)
        }
    }
    
    if err := b.publishSimpleBizInfo(); err != nil {
        log.G(context.Background()).Errorf("Failed to publish simple biz info: %v", err)
    }
}

func (b *MockMQTTBase) updateBizState(bizIdentity, state, reason, message string) error {
    // State update logic
}

func (b *MockMQTTBase) publishBizOperationResponse(bizIdentity string) error {
    // Response publishing logic with error handling
}

func (b *MockMQTTBase) publishSimpleBizInfo() error {
    // Simple biz info publishing logic with error handling
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d53c98d and 4d663dc.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • go.mod (1 hunks)
  • suite/http/base_http_lifecycle_test.go (1 hunks)
  • suite/http/mock_http_base.go (12 hunks)
  • suite/http/module_http_lifecycle_test.go (4 hunks)
  • suite/mqtt/base_mqtt_lifecycle_test.go (5 hunks)
  • suite/mqtt/mock_mqtt_base.go (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • go.mod
  • suite/http/module_http_lifecycle_test.go
🔇 Additional comments (6)
suite/mqtt/base_mqtt_lifecycle_test.go (1)

Line range hint 43-57: Extract duplicated node checks into helper functions.

The node readiness check logic is duplicated across multiple test cases. Please refer to the previous suggestions about extracting these into helper functions:

  1. waitForNodeReady for node readiness checks
  2. waitForNodeNotFound for node removal checks

These helper functions will improve code maintainability and reduce duplication.

Also applies to: 77-90, 109-122

suite/mqtt/mock_mqtt_base.go (5)

18-44: LGTM! Well-structured metadata refactoring

The refactoring of metadata handling using BaseMetadata improves code organization and maintainability.


Line range hint 55-82: Skip generating new comments as existing review comments are still valid

The previous review comments about MQTT subscription error handling and race conditions in the heart beat routine are still applicable.


Line range hint 102-113: Skip generating new comment as existing review comment is still valid

The previous review comment about handling JSON marshaling errors is still applicable.


249-253: Skip generating new comment as existing review comment is still valid

The previous review comment about checking client state before publishing messages is still applicable.


268-274: Skip generating new comment as existing review comment is still valid

The previous review comment about handling JSON marshaling errors is still applicable.

Comment on lines +94 to +102
 			mockMqttBase.reachable = false
 			Eventually(func() bool {
 				vnode := &v1.Node{}

 				err := k8sClient.Get(ctx, types.NamespacedName{
-					Name: utils.FormatNodeName(httpNodeID, env),
+					Name: utils.FormatNodeName(mqttNodeID, env),
 				}, vnode)
 				return errors.IsNotFound(err)
-			}, time.Second*50, time.Second).Should(BeTrue())
-
-			mockHttpBase.Exit()
+			}, time.Minute*2, time.Second).Should(BeTrue())

🛠️ Refactor suggestion

Consider encapsulating reachability control and standardizing timeouts.

  1. Direct manipulation of mockMqttBase.reachable breaks encapsulation. Consider adding methods to control reachability.
  2. This test uses a 2-minute timeout while others use 50 seconds. Document the reason for this difference or standardize the timeouts.

Suggested encapsulation:

-mockMqttBase.reachable = false
+mockMqttBase.SetReachable(false)  // Add this method to MockMqttBase

Also consider standardizing the timeout or documenting the reason for the difference:

-}, time.Minute*2, time.Second).Should(BeTrue())
+}, nodeUnreachableTimeout, pollInterval).Should(BeTrue())  // Use a named constant

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +24 to +134
Context("http base online and deactive finally", func() {
It("base should become a ready vnode eventually", func() {
time.Sleep(time.Second)

go mockHttpBase.Start(ctx, clientID)

Eventually(func() bool {
lease := &v12.Lease{}
err := k8sClient.Get(ctx, types.NamespacedName{
Name: utils.FormatNodeName(httpNodeID, env),
Namespace: v1.NamespaceNodeLease,
}, lease)

isLeader := err == nil &&
*lease.Spec.HolderIdentity == clientID &&
!time.Now().After(lease.Spec.RenewTime.Time.Add(time.Second*model.NodeLeaseDurationSeconds))

return isLeader
}, time.Second*50, time.Second).Should(BeTrue())

Eventually(func() bool {
vnode := &v1.Node{}
err := k8sClient.Get(ctx, types.NamespacedName{
Name: utils.FormatNodeName(httpNodeID, env),
}, vnode)
vnodeReady := false
for _, cond := range vnode.Status.Conditions {
if cond.Type == v1.NodeReady {
vnodeReady = cond.Status == v1.ConditionTrue
break
}
}
return err == nil && vnodeReady
}, time.Second*50, time.Second).Should(BeTrue())
})

It("base offline with deactive message and finally exit", func() {
mockHttpBase.Exit()
Eventually(func() bool {
vnode := &v1.Node{}
err := k8sClient.Get(ctx, types.NamespacedName{
Name: utils.FormatNodeName(httpNodeID, env),
}, vnode)
return errors.IsNotFound(err)
}, time.Second*50, time.Second).Should(BeTrue())
})

It("base should become a ready vnode eventually", func() {
time.Sleep(time.Second)

go mockHttpBase.Start(ctx, clientID)
vnode := &v1.Node{}
Eventually(func() bool {
err := k8sClient.Get(ctx, types.NamespacedName{
Name: utils.FormatNodeName(httpNodeID, env),
}, vnode)
vnodeReady := false
for _, cond := range vnode.Status.Conditions {
if cond.Type == v1.NodeReady {
vnodeReady = cond.Status == v1.ConditionTrue
break
}
}
return err == nil && vnodeReady
}, time.Second*50, time.Second).Should(BeTrue())
})

It("base unreachable finally exit", func() {
mockHttpBase.reachable = false
Eventually(func() bool {
vnode := &v1.Node{}
err := k8sClient.Get(ctx, types.NamespacedName{
Name: utils.FormatNodeName(httpNodeID, env),
}, vnode)
return errors.IsNotFound(err)
}, time.Minute*2, time.Second).Should(BeTrue())
})

It("reachable base should become a ready vnode eventually", func() {
time.Sleep(time.Second)
mockHttpBase.reachable = true
go mockHttpBase.Start(ctx, clientID)
vnode := &v1.Node{}
Eventually(func() bool {
err := k8sClient.Get(ctx, types.NamespacedName{
Name: utils.FormatNodeName(httpNodeID, env),
}, vnode)
vnodeReady := false
for _, cond := range vnode.Status.Conditions {
if cond.Type == v1.NodeReady {
vnodeReady = cond.Status == v1.ConditionTrue
break
}
}
return err == nil && vnodeReady
}, time.Second*50, time.Second).Should(BeTrue())
})

It("base finally exit", func() {
mockHttpBase.Exit()

Eventually(func() bool {
vnode := &v1.Node{}
err := k8sClient.Get(ctx, types.NamespacedName{
Name: utils.FormatNodeName(httpNodeID, env),
}, vnode)
return errors.IsNotFound(err)
}, time.Second*50, time.Second).Should(BeTrue())

})
})

🛠️ Refactor suggestion

Restructure test cases for better organization

The test suite structure could be improved for better readability and maintenance.

Context("http base lifecycle", func() {
    Context("when base comes online", func() {
        It("should become a ready vnode with valid lease", func() {
            // Combine the lease and node readiness checks
        })
    })

    Context("when base goes offline", func() {
        It("should be removed when explicitly deactivated", func() {
            // Test explicit Exit() case
        })

        It("should be removed when becoming unreachable", func() {
            // Test unreachability case
        })
    })

    Context("when base recovers", func() {
        It("should become ready again after being marked reachable", func() {
            // Test recovery case
        })
    })
})


var _ = Describe("Base Lifecycle Test", func() {

ctx := context.Background()

🛠️ Refactor suggestion

Add context cancellation for proper cleanup

The background context should be cancelled after tests complete to prevent resource leaks.

-ctx := context.Background()
+var cancel context.CancelFunc
+var ctx context.Context
+
+BeforeEach(func() {
+    ctx, cancel = context.WithCancel(context.Background())
+})
+
+AfterEach(func() {
+    cancel()
+})

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +151 to +152
time.Sleep(2 * time.Second)
server.Shutdown(ctx)

🛠️ Refactor suggestion

Improve shutdown context handling

The parent context is used for server shutdown, which could be already cancelled. Create a separate timeout context for shutdown.

-time.Sleep(2 * time.Second)
-server.Shutdown(ctx)
+shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 5*time.Second)
+defer shutdownCancel()
+if err := server.Shutdown(shutdownCtx); err != nil {
+    logrus.Errorf("error during server shutdown: %v", err)
+}
📝 Committable suggestion


Suggested change
time.Sleep(2 * time.Second)
server.Shutdown(ctx)
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 5*time.Second)
defer shutdownCancel()
if err := server.Shutdown(shutdownCtx); err != nil {
logrus.Errorf("error during server shutdown: %v", err)
}

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
suite/http/base_http_lifecycle_test.go (2)

99-99: Standardize timeout durations

The timeout duration of 2 minutes is inconsistent with the 50-second timeout used in other test cases. Consider standardizing the timeout duration across all test cases for consistency.

-}, time.Minute*2, time.Second).Should(BeTrue())
+}, time.Second*50, time.Second).Should(BeTrue())

25-25: Improve test case descriptions

The test descriptions could be more descriptive to better communicate their purpose. Consider using BDD-style descriptions that explain the scenario and expected outcome.

-It("base should become a ready vnode eventually", func() {
+It("should register as a ready virtual node when started", func() {

-It("base offline with deactive message and finally exit", func() {
+It("should remove the virtual node when explicitly deactivated", func() {

-It("base should become a ready vnode eventually", func() {
+It("should re-register as a ready virtual node after restart", func() {

-It("base unreachable finally exit", func() {
+It("should remove the virtual node when becoming unreachable", func() {

-It("reachable base should become a ready vnode eventually", func() {
+It("should re-register as a ready virtual node when becoming reachable again", func() {

-It("base finally exit", func() {
+It("should clean up resources when exiting", func() {

Also applies to: 60-60, 71-71, 91-91, 102-102, 122-122

suite/http/mock_http_base.go (2)

185-188: Add nil checks for Metadata access

The code accesses b.Metadata fields without checking if they're initialized. While the constructor does initialize these fields, defensive programming would be beneficial for maintainability.

ark.MasterBizInfo is an external type, so a validation hook cannot be added to the literal itself; validate b.Metadata in the constructor instead and leave the literal unchanged:

 MasterBizInfo: ark.MasterBizInfo{
 	BizName:    b.Metadata.Identity,
 	BizState:   b.CurrState,
 	BizVersion: b.Metadata.Version,
 },

279-281: Add input validation to getBizIdentity

The function should validate its input parameters to prevent malformed identifiers.

 func getBizIdentity(bizModel ark.BizModel) string {
+	if bizModel.BizName == "" || bizModel.BizVersion == "" {
+		logrus.Warn("Invalid bizModel: empty name or version")
+		return ""
+	}
 	return fmt.Sprintf("%s:%s", bizModel.BizName, bizModel.BizVersion)
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4d663dc and 645abcc.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • go.mod (1 hunks)
  • suite/http/base_http_lifecycle_test.go (1 hunks)
  • suite/http/mock_http_base.go (12 hunks)
  • suite/http/module_http_lifecycle_test.go (4 hunks)
  • suite/mqtt/base_mqtt_lifecycle_test.go (5 hunks)
  • suite/mqtt/mock_mqtt_base.go (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • go.mod
  • suite/http/module_http_lifecycle_test.go
  • suite/mqtt/base_mqtt_lifecycle_test.go
  • suite/mqtt/mock_mqtt_base.go
🔇 Additional comments (1)
suite/http/mock_http_base.go (1)

33-46: Verify constructor signature consistency across the codebase

The constructor signature has been updated to use clusterName instead of id. Let's ensure this change is consistent with other implementations.

✅ Verification successful

Constructor signatures are consistent across mock implementations

The verification shows that both mock constructors (NewMockHttpBase and NewMockMqttBase) follow the same pattern using clusterName parameter, maintaining consistency in the codebase. The signatures align well with each other, with NewMockHttpBase having an additional port parameter which is specific to HTTP implementation needs.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other NewMock* constructors to verify signature consistency
ast-grep --pattern 'func New$_(name string, $$$) $_'

Length of output: 54


Script:

#!/bin/bash
# Let's try a more specific search for NewMock constructors
rg "func NewMock.*\(" -A 2

Length of output: 455


clusterName := "test-cluster-name"
httpNodeID := "test-http-base"
mockHttpBase := NewMockHttpBase(httpNodeID, clusterName, "1.0.0", env, 1238)

⚠️ Potential issue

Move mock initialization into test setup

Initializing mockHttpBase at the suite level could lead to state sharing between tests. This initialization should be moved into a BeforeEach block to ensure each test starts with a fresh instance.

-mockHttpBase := NewMockHttpBase(httpNodeID, clusterName, "1.0.0", env, 1238)
+var mockHttpBase *MockHttpBase
+
+BeforeEach(func() {
+    mockHttpBase = NewMockHttpBase(httpNodeID, clusterName, "1.0.0", env, 1238)
+})

Committable suggestion skipped: line range outside the PR's diff.

go utils.TimedTaskWithInterval(ctx, time.Second*10, func(ctx context.Context) {
	if b.reachable {
		log.G(ctx).Info("upload node heart beat data from node ", b.Metadata.Identity)
		_, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))

🛠️ Refactor suggestion

Avoid hardcoding URLs

The heartbeat endpoint URL is hardcoded. Consider making it configurable through constructor parameters or environment variables.

-			_, err := http.Post("http://127.0.0.1:7777/heartbeat", "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))
+			_, err := http.Post(fmt.Sprintf("http://%s:%d/heartbeat", b.heartbeatHost, b.heartbeatPort), "application/json", bytes.NewBuffer(b.getHeartBeatMsg()))

Add these fields to the struct:

 type MockHttpBase struct {
     // ... existing fields ...
+    heartbeatHost string
+    heartbeatPort int
 }

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
suite/mqtt/suite_test.go (3)

102-104: Consider extracting magic numbers into constants

The metrics server port :8080 should be defined as a constant at the package level for better maintainability and reusability.

 const (
     clientID = "suite-test"
     env      = "suite"
+    metricsPort = ":8080"
 )

 // Later in the code
 Metrics: metricsserver.Options{
-    BindAddress: ":8080",
+    BindAddress: metricsPort,
 },

Line range hint 57-58: Security: Consider strengthening MQTT authentication

The test environment uses an empty MQTT password and lacks TLS configuration. While this might be acceptable for tests, consider:

  1. Adding a comment explaining why secure configuration is skipped in tests
  2. Ensuring production code has proper security measures

Line range hint 48-51: Consider adding negative test cases

The test suite would benefit from additional test cases covering:

  1. MQTT connection failures
  2. Authentication failures
  3. Network timeout scenarios

Would you like me to help generate these additional test cases?

cmd/module-controller/main.go (1)

215-219: Use fmt.Errorf instead of errors.New(fmt.Sprintf()).

Using fmt.Errorf is more idiomatic and concise.

Apply this diff to improve error creation:

 	if startFailedCount > 0 {
-		panic(errors.New(fmt.Sprintf("failed to start %d tunnels", startFailedCount)))
+		panic(fmt.Errorf("failed to start %d tunnels", startFailedCount))
 	} else if successTunnelCount == 0 {
-		panic(errors.New(fmt.Sprintf("successfully started 0 tunnels")))
+		panic(fmt.Errorf("successfully started 0 tunnels"))
 	}
🧰 Tools
🪛 golangci-lint (1.61.0)

216-216: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)


218-218: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 645abcc and 4c9703c.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • cmd/module-controller/main.go (4 hunks)
  • go.mod (1 hunks)
  • suite/http/suite_test.go (1 hunks)
  • suite/mqtt/suite_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • go.mod
  • suite/http/suite_test.go
🧰 Additional context used
🪛 golangci-lint (1.61.0)
cmd/module-controller/main.go

216-216: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)


218-218: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)

🔇 Additional comments (2)
suite/mqtt/suite_test.go (2)

Line range hint 1-31: LGTM: Package and import changes are well-structured

The package rename to 'mqtt' better reflects the test suite's purpose, and the import updates align with the module restructuring.


42-42: LGTM: Well-structured context management

The addition of context with cancel function provides proper resource management capabilities.

Comment on lines +177 to +179
func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) tunnel.Tunnel {
// Initialize tunnels based on configuration

🛠️ Refactor suggestion

Add early validation for tunnel configuration.

The function should validate that at least one tunnel type is enabled before proceeding.

Add this validation at the start of the function:

 func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
 	moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) tunnel.Tunnel {
+	mqttEnabled := utils.GetEnv("ENABLE_MQTT_TUNNEL", "false") == "true"
+	httpEnabled := utils.GetEnv("ENABLE_HTTP_TUNNEL", "false") == "true"
+	if !mqttEnabled && !httpEnabled {
+		panic(fmt.Errorf("no tunnel types are enabled, set ENABLE_MQTT_TUNNEL=true or ENABLE_HTTP_TUNNEL=true"))
+	}

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +105 to +108
// TODO: should support to set from parameter
kubeConfig.QPS = 100
kubeConfig.Burst = 200
ctrl.SetLogger(logr.New(log2.NullLogSink{}))

🛠️ Refactor suggestion

Consider making QPS and Burst configurable via environment variables.

The hardcoded QPS and Burst values might not be suitable for all environments. As noted in the TODO comment, these should be configurable parameters.

Apply this diff to make these values configurable:

-	// TODO: should support to set from parameter
-	kubeConfig.QPS = 100
-	kubeConfig.Burst = 200
+	qps, err := strconv.ParseFloat(utils.GetEnv("KUBE_QPS", "100"), 32)
+	if err != nil {
+		log.G(ctx).WithError(err).Error("failed to parse KUBE_QPS, using default value 100")
+		qps = 100
+	}
+	burst, err := strconv.Atoi(utils.GetEnv("KUBE_BURST", "200"))
+	if err != nil {
+		log.G(ctx).WithError(err).Error("failed to parse KUBE_BURST, using default value 200")
+		burst = 200
+	}
+	kubeConfig.QPS = float32(qps)
+	kubeConfig.Burst = burst

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (7)
suite/http/suite_test.go (4)

38-38: Consider moving context creation to BeforeSuite.

Global context variables can make tests less isolated and harder to maintain. Consider moving the context creation to BeforeSuite and storing it in a suite-scoped variable.

-var ctx, cancel = context.WithCancel(context.Background())
+var (
+    ctx context.Context
+    cancel context.CancelFunc
+)

Then in BeforeSuite:

ctx, cancel = context.WithCancel(context.Background())

82-84: Simplify error handling pattern.

The error handling pattern is inconsistent with the rest of the code. Consider using the same inline error checking pattern:

-err = moduleDeploymentController.SetupWithManager(ctx, k8sManager)
-
-Expect(err).ToNot(HaveOccurred())
+Expect(moduleDeploymentController.SetupWithManager(ctx, k8sManager)).ToNot(HaveOccurred())

109-113: Improve error handling in goroutine.

The error logging before the assertion could mask the actual error. Consider removing the log line since Ginkgo will handle the error reporting:

 go func() {
     err = k8sManager.Start(ctx)
-    log.G(ctx).WithError(err).Error("k8sManager Start")
     Expect(err).ToNot(HaveOccurred())
 }()

126-165: Consider enhancing the helper function with validation and configurability.

The function could be improved in several ways:

  1. Add validation for required parameters
  2. Make container configuration more flexible

Example enhancement:

-func prepareModulePod(name, namespace, nodeName string) v1.Pod {
+func prepareModulePod(name, namespace, nodeName string) (v1.Pod, error) {
+    if name == "" || namespace == "" || nodeName == "" {
+        return v1.Pod{}, fmt.Errorf("name, namespace, and nodeName are required")
+    }
+
     return v1.Pod{
         ObjectMeta: metav1.ObjectMeta{
             Name:      name,
             Namespace: namespace,
             Labels: map[string]string{
                 model.LabelKeyOfComponent: model2.ComponentModule,
             },
         },
         Spec: v1.PodSpec{
             NodeName: nodeName,
             Containers: []v1.Container{
                 {
-                    Name:  "biz",
-                    Image: "suite-biz.jar",
+                    Name:  name + "-container",
+                    Image: utils.GetEnv("TEST_BIZ_IMAGE", "suite-biz.jar"),
                     Env: []v1.EnvVar{
                         {
                             Name:  "BIZ_VERSION",
-                            Value: "0.0.1",
+                            Value: utils.GetEnv("TEST_BIZ_VERSION", "0.0.1"),
                         },
                     },
                 },
             },
             Tolerations: []v1.Toleration{
                 // ... existing tolerations ...
             },
         },
-    }
+    }, nil
 }
suite/mqtt/suite_test.go (2)

42-42: Consider adding a timeout to the context

While the context with cancel is good for cleanup, consider adding a timeout to ensure the test suite doesn't run indefinitely in case of issues.

-var ctx, cancel = context.WithCancel(context.Background())
+var ctx, cancel = context.WithTimeout(context.Background(), 5*time.Minute)

108-113: Enhance error handling in controller setup

Consider providing more context in error cases to aid debugging.

 moduleDeploymentController, err := module_deployment_controller.NewModuleDeploymentController(env)
-Expect(err).ToNot(HaveOccurred())
+Expect(err).ToNot(HaveOccurred(), "failed to create module deployment controller")

 err = moduleDeploymentController.SetupWithManager(ctx, k8sManager)
-Expect(err).ToNot(HaveOccurred())
+Expect(err).ToNot(HaveOccurred(), "failed to setup module deployment controller with manager")
cmd/module-controller/main.go (1)

215-219: Use fmt.Errorf instead of errors.New(fmt.Sprintf).

Using fmt.Errorf is more idiomatic and concise than combining errors.New with fmt.Sprintf.

Apply this diff:

-       panic(errors.New(fmt.Sprintf("failed to start %d tunnels", startFailedCount)))
+       panic(fmt.Errorf("failed to start %d tunnels", startFailedCount))
-       panic(errors.New(fmt.Sprintf("successfully started 0 tunnels")))
+       panic(fmt.Errorf("successfully started 0 tunnels"))
🧰 Tools
🪛 golangci-lint (1.61.0)

216-216: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)


218-218: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4c9703c and e857dde.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • cmd/module-controller/main.go (4 hunks)
  • go.mod (1 hunks)
  • suite/http/suite_test.go (1 hunks)
  • suite/mqtt/suite_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod
🧰 Additional context used
🪛 golangci-lint (1.61.0)
cmd/module-controller/main.go

216-216: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)


218-218: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)

🔇 Additional comments (6)
suite/http/suite_test.go (1)

104-104: Handle error from vnodeController.SetupWithManager.

The error returned by vnodeController.SetupWithManager is not being checked.

suite/mqtt/suite_test.go (4)

Line range hint 1-31: LGTM: Package and imports are well-organized

The package name change to "mqtt" better reflects the test suite's purpose, and imports are properly organized.


89-95: Reusing existing review comment about MQTT server error handling

The previous review comment about improving MQTT server error handling using channels is still applicable here.


115-123: Reusing existing review comment about tunnel initialization

The previous review comment about improving tunnel initialization error handling is still applicable here.


150-153: Reusing existing review comment about cleanup process

The previous review comment about improving the cleanup process with configurable timeouts and graceful shutdown is still applicable here.

cmd/module-controller/main.go (1)

177-179: 🛠️ Refactor suggestion

Add validation for tunnel configuration.

The function should validate that at least one tunnel type is enabled before proceeding with initialization.

Add this validation at the start of the function:

 func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
     moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) tunnel.Tunnel {
+    mqttEnabled := utils.GetEnv("ENABLE_MQTT_TUNNEL", "false") == "true"
+    httpEnabled := utils.GetEnv("ENABLE_HTTP_TUNNEL", "false") == "true"
+    if !mqttEnabled && !httpEnabled {
+        return nil, fmt.Errorf("no tunnel types are enabled, set ENABLE_MQTT_TUNNEL=true or ENABLE_HTTP_TUNNEL=true")
+    }

Likely invalid or redundant comment.

By("tearing down the suite environment")
cancel()
testEnv.Stop()
time.Sleep(15 * time.Second)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider replacing fixed sleep with a more reliable mechanism.

Using a fixed sleep duration of 15 seconds could make tests either flaky (if too short) or unnecessarily slow (if too long). Consider using a polling mechanism or wait condition instead.

Example approach:

// Define a timeout duration
timeout := time.After(30 * time.Second)
ticker := time.NewTicker(100 * time.Millisecond)
defer ticker.Stop()

// Poll until condition is met or timeout
for {
    select {
    case <-timeout:
        Fail("Timed out waiting for cleanup")
    case <-ticker.C:
        if /* check condition */ {
            return
        }
    }
}

Comment on lines +88 to +94
err = httpTunnel.Start(ctx, clientID, env)
if err != nil {
	log.G(ctx).WithError(err).Error("failed to start tunnel", httpTunnel.Key())
	panic(fmt.Sprintf("failed to start tunnel %s", httpTunnel.Key()))
} else {
	log.G(ctx).Info("Tunnel started: ", httpTunnel.Key())
}

🛠️ Refactor suggestion

Simplify error handling using Ginkgo's Expect.

The current error handling uses a mix of panic and logging. Consider using Ginkgo's built-in error handling:

-err = httpTunnel.Start(ctx, clientID, env)
-if err != nil {
-    log.G(ctx).WithError(err).Error("failed to start tunnel", httpTunnel.Key())
-    panic(fmt.Sprintf("failed to start tunnel %s", httpTunnel.Key()))
-} else {
-    log.G(ctx).Info("Tunnel started: ", httpTunnel.Key())
-}
+Expect(httpTunnel.Start(ctx, clientID, env)).ToNot(HaveOccurred(), 
+    fmt.Sprintf("failed to start tunnel %s", httpTunnel.Key()))
+log.G(ctx).Info("Tunnel started: ", httpTunnel.Key())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
err = httpTunnel.Start(ctx, clientID, env)
if err != nil {
	log.G(ctx).WithError(err).Error("failed to start tunnel", httpTunnel.Key())
	panic(fmt.Sprintf("failed to start tunnel %s", httpTunnel.Key()))
} else {
	log.G(ctx).Info("Tunnel started: ", httpTunnel.Key())
}
Expect(httpTunnel.Start(ctx, clientID, env)).ToNot(HaveOccurred(),
	fmt.Sprintf("failed to start tunnel %s", httpTunnel.Key()))
log.G(ctx).Info("Tunnel started: ", httpTunnel.Key())

Comment on lines +220 to +222
// we only using one tunnel for now
return tunnels[0]
}

⚠️ Potential issue

Add safety checks before accessing tunnels slice.

The comment suggests only using one tunnel, but accessing tunnels[0] without checking the slice length could panic.

Apply this diff:

-    // we only using one tunnel for now
-    return tunnels[0]
+    if len(tunnels) == 0 {
+        return nil, fmt.Errorf("no tunnels were initialized")
+    }
+    // TODO: Define criteria for selecting which tunnel to use when multiple are available
+    return tunnels[0], nil
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
	// we only using one tunnel for now
	return tunnels[0]
}
	if len(tunnels) == 0 {
		return nil, fmt.Errorf("no tunnels were initialized")
	}
	// TODO: Define criteria for selecting which tunnel to use when multiple are available
	return tunnels[0], nil
}

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
cmd/module-controller/main.go (1)

215-219: Simplify error creation using fmt.Errorf

The current error creation using errors.New(fmt.Sprintf(...)) is unnecessarily verbose.

Apply this diff to simplify the error creation:

-	if startFailedCount > 0 {
-		panic(errors.New(fmt.Sprintf("failed to start %d tunnels", startFailedCount)))
-	} else if successTunnelCount == 0 {
-		panic(errors.New(fmt.Sprintf("successfully started 0 tunnels")))
-	}
+	if startFailedCount > 0 {
+		panic(fmt.Errorf("failed to start %d tunnels", startFailedCount))
+	} else if successTunnelCount == 0 {
+		panic(fmt.Errorf("successfully started 0 tunnels"))
+	}
🧰 Tools
🪛 golangci-lint (1.61.0)

216-216: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)


218-218: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e857dde and 8ca3ef2.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • cmd/module-controller/main.go (4 hunks)
  • go.mod (1 hunks)
  • suite/http/suite_test.go (1 hunks)
  • suite/mqtt/suite_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • go.mod
  • suite/http/suite_test.go
🧰 Additional context used
🪛 golangci-lint (1.61.0)
cmd/module-controller/main.go

216-216: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)


218-218: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)

🔇 Additional comments (5)
suite/mqtt/suite_test.go (5)

Line range hint 1-31: LGTM: Package and import changes align with module restructuring

The package rename and import updates properly reflect the refactoring of the MQTT module structure.


108-113: LGTM: Clean module deployment controller setup

The initialization and setup of the module deployment controller follows good practices with proper error handling.


115-123: Reusing existing review comment

A previous review comment already addresses the improvements needed in the tunnel initialization error handling.


140-142: Reusing existing review comment

A previous review comment already addresses the redundant error logging before assertion.


150-153: Reusing existing review comment

A previous review comment already addresses the improvements needed in the cleanup process.

Comment on lines +89 to +95
go func() {
	err := mqttServer.Serve()
	if err != nil {
		log.G(ctx).Error("failed to start mqtt server")
		panic(err)
	}
}()

⚠️ Potential issue

Enhance MQTT server initialization with proper error handling

The current implementation has several issues that could lead to reliability problems:

  1. No way to confirm successful server startup
  2. Startup errors are not surfaced to the caller
  3. Panic in goroutine is an anti-pattern

Apply this improvement:

+serverErrCh := make(chan error, 1)
 go func() {
     err := mqttServer.Serve()
     if err != nil {
         log.G(ctx).Error("failed to start mqtt server")
-        panic(err)
+        serverErrCh <- err
+        return
     }
+    serverErrCh <- nil
 }()
+
+select {
+case err := <-serverErrCh:
+    if err != nil {
+        Fail(fmt.Sprintf("MQTT server failed to start: %v", err))
+    }
+case <-time.After(5 * time.Second):
+    Fail("MQTT server startup timed out")
+}

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +204 to +213
for _, t := range tunnels {
	err := t.Start(ctx, clientId, env)
	if err != nil {
		log.G(ctx).WithError(err).Error("failed to start tunnel", t.Key())
		startFailedCount++
	} else {
		log.G(ctx).Info("Tunnel started: ", t.Key())
		successTunnelCount++
	}
}

⚠️ Potential issue

Add synchronization for concurrent tunnel operations

The tunnel initialization loop increments the shared counters successTunnelCount and startFailedCount without synchronization. The loop runs sequentially today, so there is no race yet, but if tunnel startup is ever moved into goroutines these unsynchronized increments would become a data race.

Apply this diff to add proper synchronization:

+	var mu sync.Mutex
 	for _, t := range tunnels {
 		err := t.Start(ctx, clientId, env)
 		if err != nil {
+			mu.Lock()
 			log.G(ctx).WithError(err).Error("failed to start tunnel", t.Key())
 			startFailedCount++
+			mu.Unlock()
 		} else {
+			mu.Lock()
 			log.G(ctx).Info("Tunnel started: ", t.Key())
 			successTunnelCount++
+			mu.Unlock()
 		}
 	}

Also, add the following import:

+import "sync"

Committable suggestion skipped: line range outside the PR's diff.

@lvjing2 lvjing2 changed the title from update vk to multiple optimization Nov 25, 2024
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (6)
suite/http/suite_test.go (2)

38-38: Consider localizing context management

Defining context with cancel globally could lead to potential issues with test isolation and cleanup. Consider moving the context creation into the BeforeSuite function and storing it in a global variable there.

-var ctx, cancel = context.WithCancel(context.Background())
+var ctx context.Context
+var cancel context.CancelFunc

Then in BeforeSuite:

ctx, cancel = context.WithCancel(context.Background())

126-165: Add validation and documentation to helper function

The helper function could benefit from:

  1. Input validation for required fields
  2. GoDoc documentation explaining the purpose and requirements
  3. Constants for hardcoded values like container name and image
+// prepareModulePod creates a test Pod configured for module deployment testing.
+// Parameters:
+//   name: The name of the pod
+//   namespace: The namespace where the pod will be created
+//   nodeName: The name of the node where the pod should be scheduled
+// Returns: A Pod object configured with necessary labels and tolerations
 func prepareModulePod(name, namespace, nodeName string) v1.Pod {
+    if name == "" || namespace == "" || nodeName == "" {
+        panic("name, namespace, and nodeName are required")
+    }
+
+    const (
+        containerName = "biz"
+        containerImage = "suite-biz.jar"
+        bizVersionEnv = "BIZ_VERSION"
+        bizVersion = "0.0.1"
+    )
+
     return v1.Pod{
         // ... rest of the implementation
     }
 }
suite/mqtt/suite_test.go (2)

Line range hint 42-47: Consider adding a timeout to the context

While adding context with cancel is good, consider adding a timeout to prevent tests from hanging indefinitely.

-var ctx, cancel = context.WithCancel(context.Background())
+var ctx, cancel = context.WithTimeout(context.Background(), 5*time.Minute)

128-130: Consider making VNodeWorkerNum configurable

The worker number is currently hardcoded to 4. Consider making it configurable through environment variables for better flexibility in different test scenarios.

+const defaultWorkerNum = 4
+
+func getVNodeWorkerNum() int {
+    if val := os.Getenv("VNODE_WORKER_NUM"); val != "" {
+        if num, err := strconv.Atoi(val); err == nil {
+            return num
+        }
+    }
+    return defaultWorkerNum
+}

 vnodeController, err := vnode_controller.NewVNodeController(&model.BuildVNodeControllerConfig{
     ClientID:       clientID,
     Env:            env,
     VPodType:       model2.ComponentModule,
-    VNodeWorkerNum: 4,
+    VNodeWorkerNum: getVNodeWorkerNum(),
 }, &mqttTunnel)
cmd/module-controller/main.go (1)

215-219: Simplify error creation using fmt.Errorf.

Replace errors.New(fmt.Sprintf(...)) with fmt.Errorf(...) for better readability and efficiency.

Apply this diff:

-    if startFailedCount > 0 {
-        panic(errors.New(fmt.Sprintf("failed to start %d tunnels", startFailedCount)))
-    } else if successTunnelCount == 0 {
-        panic(errors.New(fmt.Sprintf("successfully started 0 tunnels")))
-    }
+    if startFailedCount > 0 {
+        panic(fmt.Errorf("failed to start %d tunnels", startFailedCount))
+    } else if successTunnelCount == 0 {
+        panic(fmt.Errorf("successfully started 0 tunnels"))
+    }
🧰 Tools
🪛 golangci-lint (1.61.0)

216-216: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)


218-218: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)

suite/mqtt/module_deployment_controller_suite_test.go (1)

65-67: Clarify the intention behind different cluster names.

While the node selector changes are good, there's an unexplained difference in cluster names:

  • deployment1 uses clusterName
  • deployment2 uses clusterName + "-2"

Please add a comment explaining why different cluster names are used, as this affects the test's readability.

Also applies to: 127-129

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8ca3ef2 and 241bd3f.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • cmd/module-controller/main.go (4 hunks)
  • go.mod (1 hunks)
  • suite/http/suite_test.go (1 hunks)
  • suite/mqtt/module_deployment_controller_suite_test.go (10 hunks)
  • suite/mqtt/suite_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod
🧰 Additional context used
🪛 golangci-lint (1.61.0)
cmd/module-controller/main.go

216-216: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)


218-218: S1028: should use fmt.Errorf(...) instead of errors.New(fmt.Sprintf(...))

(gosimple)

🔇 Additional comments (9)
suite/http/suite_test.go (3)

88-94: Simplify error handling using Ginkgo's Expect

This is a duplicate of a previous review comment. The error handling should be simplified using Ginkgo's Expect instead of panic.


104-104: Handle error from vnodeController.SetupWithManager

This is a duplicate of a previous review comment. The error returned by vnodeController.SetupWithManager needs to be handled.


122-122: Replace fixed sleep with polling mechanism

This is a duplicate of a previous review comment. The fixed sleep duration should be replaced with a more reliable polling mechanism.

suite/mqtt/suite_test.go (5)

Line range hint 1-31: LGTM: Package and import changes align with module restructuring

The package rename to 'mqtt' better reflects the test suite's purpose, and the import updates properly align with the codebase restructuring.


89-95: Reusing existing review comment

The previous review comment about enhancing MQTT server initialization with proper error handling is still applicable to this code segment.


102-104: LGTM: Metrics configuration properly updated

The metrics server configuration correctly uses the new API structure.


115-123: Reusing existing review comment

The previous review comment about improving tunnel initialization error handling is still applicable to this code segment.


150-153: Reusing existing review comment

The previous review comment about improving the cleanup process is still applicable to this code segment.

suite/mqtt/module_deployment_controller_suite_test.go (1)

1-1: LGTM! Package organization and mock setup improvements.

The changes improve code organization by moving to a more specific package name and centralizing the cluster name configuration.

Also applies to: 21-24

Comment on lines +109 to +113
go func() {
	err = k8sManager.Start(ctx)
	log.G(ctx).WithError(err).Error("k8sManager Start")
	Expect(err).ToNot(HaveOccurred())
}()

🛠️ Refactor suggestion

Improve error handling in goroutine

The error handling in the goroutine could be improved. Consider using a channel to propagate errors back to the test suite.

+errChan := make(chan error, 1)
 go func() {
-    err = k8sManager.Start(ctx)
-    log.G(ctx).WithError(err).Error("k8sManager Start")
-    Expect(err).ToNot(HaveOccurred())
+    if err := k8sManager.Start(ctx); err != nil {
+        log.G(ctx).WithError(err).Error("k8sManager Start")
+        errChan <- err
+        return
+    }
+    errChan <- nil
 }()
+
+// Wait for manager to start or timeout
+select {
+case err := <-errChan:
+    Expect(err).ToNot(HaveOccurred())
+case <-time.After(30 * time.Second):
+    Fail("Timeout waiting for manager to start")
+}

Comment on lines +147 to +148
tunnel := startTunnels(ctx, clientID, env, mgr, mdc)


⚠️ Potential issue

Improve error handling for tunnel initialization.

The startTunnels function uses panic for error handling, but this should be handled gracefully at the caller level. Consider modifying the function to return an error instead of panicking.

Apply this diff:

-func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
-    moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) tunnel.Tunnel {
+func startTunnels(ctx context.Context, clientId string, env string, mgr manager.Manager,
+    moduleDeploymentController *module_deployment_controller.ModuleDeploymentController) (tunnel.Tunnel, error) {

    // ... rest of the function ...
-    if startFailedCount > 0 {
-        panic(errors.New(fmt.Sprintf("failed to start %d tunnels", startFailedCount)))
-    } else if successTunnelCount == 0 {
-        panic(errors.New(fmt.Sprintf("successfully started 0 tunnels")))
-    }
+    if startFailedCount > 0 {
+        return nil, fmt.Errorf("failed to start %d tunnels", startFailedCount)
+    } else if successTunnelCount == 0 {
+        return nil, fmt.Errorf("successfully started 0 tunnels")
+    }
    return tunnels[0], nil
}

// And at the caller:
-tunnel := startTunnels(ctx, clientID, env, mgr, mdc)
+tunnel, err := startTunnels(ctx, clientID, env, mgr, mdc)
+if err != nil {
+    log.G(ctx).Error(err, "failed to initialize tunnels")
+    os.Exit(1)
+}

Committable suggestion skipped: line range outside the PR's diff.


It("one node online and deployment replicas should be 1", func() {
-go mockBase.Start(ctx)
+go mockBase.Start(ctx, clientID)

⚠️ Potential issue

Define the source of clientID variable.

The clientID parameter is used in Start method calls but its source is undefined. Please:

  1. Define the clientID variable
  2. Document its purpose and expected values

Also applies to: 177-177

return err == nil && *depFromKubernetes.Spec.Replicas == 0
}, time.Second*20, time.Second).Should(BeTrue())
}, time.Minute*2, time.Second).Should(BeTrue())
})

💡 Codebase verification

Extended timeout is inconsistent with other similar operations

The 2-minute timeout in the "mock base exit" test case is notably longer than other similar operations in the same file, which consistently use 20-second timeouts. This specific test waits for replicas to become zero after a base exit, which should not take significantly longer than other replica scaling operations.

  • The test "mock base exit and replicas should be 0 finally" uses time.Minute*2
  • All other replica scaling operations use time.Second*20
  • Both operations are checking for the same type of condition (replica count changes)

Consider aligning this timeout with other similar operations (20 seconds) or document why this specific case needs a longer timeout.

🔗 Analysis chain

Investigate the need for extended timeout.

The timeout has been significantly increased to 2 minutes. This might indicate:

  1. Potential performance issues in the test setup
  2. Slow cleanup operations

Please investigate if this extended timeout is masking underlying issues that should be addressed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other timeout changes in test files
# This helps identify if this is an isolated change or part of a broader pattern

# Search for timeout changes in test files
rg -A 2 'time\.(Minute|Second).*Should\(BeTrue\)' --type go

Length of output: 61


Script:

#!/bin/bash
# Let's try a broader search to understand the context of timeouts in test files
# First, let's look for the specific file and its timeout patterns
rg -B 2 -A 2 'time\.Minute.*2.*Second' suite/mqtt/module_deployment_controller_suite_test.go

# Then, let's look for similar patterns in other test files to understand the timeout conventions
rg -l 'Eventually.*time\.Minute' --type go

# Also search for the specific test context
rg -B 5 -A 5 'time\.Minute.*2.*Second' suite/mqtt/module_deployment_controller_suite_test.go

Length of output: 660


Script:

#!/bin/bash
# Let's understand the broader context of this test case and what it's waiting for
rg -B 15 -A 5 'time\.Minute.*2.*Second' suite/mqtt/module_deployment_controller_suite_test.go

# Also search for other timeout patterns in the same file to compare
rg 'Eventually.*time\.' suite/mqtt/module_deployment_controller_suite_test.go

# Look for any comments or descriptions about timing requirements
rg -B 5 'Describe|Context|It.*deployment' suite/mqtt/module_deployment_controller_suite_test.go

Length of output: 2062

@lvjing2 lvjing2 merged commit 9731d90 into main Nov 26, 2024
@lvjing2 lvjing2 deleted the youji-dev branch November 26, 2024 08:35
@coderabbitai coderabbitai bot mentioned this pull request Aug 19, 2025