-
Notifications
You must be signed in to change notification settings - Fork 465
swarm - switch to handoff node only after current node stops #1147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
| logger.debug("reason=<%s> | stopping execution", reason) | ||
| break | ||
|
|
||
| # Get current node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed a few inline comments because I felt the code was already self explanatory.
src/strands/multiagent/swarm.py
Outdated
| self.state.node_history.append(current_node) | ||
|
|
||
| # After self.state add current node, swarm state finish updating, we persist here | ||
| self.hooks.invoke_callbacks(AfterNodeCallEvent(self, current_node.node_id, invocation_state)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reiterate, setting self.state.current_node = handoff_node in the handoff tool means that AfterNodeCallEvent is emitted with a current node_id that does not match the self.state.current_node.node_id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for supporting interrupts, we can't have self.state.current_node update to the handoff node if the current node is interrupted.
|
|
||
| registry.add_callback(MultiAgentInitializedEvent, lambda event: self.initialize_multi_agent(event.source)) | ||
| registry.add_callback(AfterNodeCallEvent, lambda event: self.sync_multi_agent(event.source)) | ||
| registry.add_callback(BeforeNodeCallEvent, lambda event: self.sync_multi_agent(event.source)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say we have successfully executed one node and are now executing the handoff node. If we crash on the handoff node, we would be left in different states depending on which event we persist on:
AfterNodeCallEvent: Current node is not set to the handoff node in session because the handoff node hasn't yet emitted itsAfterNodeCallEvent. This means if we resume after crashing on the handoff node, we will be starting again from the first node.BeforeNodeCallEvent: Current node is set to the handoff node in session because the handoff node already emitted itsBeforeNodeCallEvent. This means if we resume after crashing on the handoff node, we will be starting again from the handoff node.
In short, persisting on AfterNodeCallEvent only made sense when setting the current node to the handoff in the handoff tool.
…nto swarm-handoff-delayed
| # After self.state add current node, swarm state finish updating, we persist here | ||
| await self.hooks.invoke_callbacks_async( | ||
| AfterNodeCallEvent(self, current_node.node_id, invocation_state) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reiterate, setting self.state.current_node = handoff_node in the handoff tool means that AfterNodeCallEvent is emitted with a current node_id that does not match the self.state.current_node.node_id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for supporting interrupts, we can't have self.state.current_node update to the handoff node if the current node is interrupted.
Description
Set the handoff node to current only after the current node finishes. Currently, we make the switch in the middle of the current node execution. It is important to fix this for a few reasons:
AfterNodeCallEventwith the current node id andstate.current_nodeset to the handoff node. This is going to cause customer confusion.Related Issues
#204
Documentation PR
Implementation detail
Type of Change
Bug fix
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepare: Relying on existing unit testshatch test tests_integ/test_multiagent_swarm.pyChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.