Commit 1e1f404

shayne-fletcher authored and meta-codesync[bot] committed
system_actor: audit tests (meta-pytorch#1999)
Summary:
Pull Request resolved: meta-pytorch#1999

This change applies the same V0->V1 test audit to `system_actor.rs` that we did for `proc_actor.rs`. I went through all 7 tests in this file and tagged each one as V0-specific, with no V1 equivalent: they all exercise `SystemActor`-centric behavior that does not exist in the V1 design (world-level supervision state, world orchestration via hosts joining and `UpsertWorld`, snapshot filtering, and `ReportingRouter`/dynamic address updates).

The new comments spell out what each test validates in V0 and why there is no direct V1 analog. Where relevant, they point at the closest V1 mechanism (for example, undeliverable handling is now covered by `actor_mesh::test_undeliverable_message_return`).

There are no behavior changes here; this is purely test documentation and migration context for the remaining V0 multiprocess tests in `system_actor.rs`.

Reviewed By: pzhan9

Differential Revision: D87890512

fbshipit-source-id: 389e9070de428de20e6b79b87ecfdcdced989703
1 parent 9d2cf9e commit 1e1f404

1 file changed: +52 -0 lines changed

hyperactor_multiprocess/src/system_actor.rs

Lines changed: 52 additions & 0 deletions
@@ -1890,6 +1890,14 @@ mod tests {
         }
     }

+    // V0-specific test - no V1 equivalent. Unit test for
+    // SystemSupervisionState which tracks proc health and failed
+    // actors centrally at world level. Tests heartbeat timeout
+    // detection (marks procs expired if no heartbeat within timeout)
+    // and failed actor aggregation. V1 does not have centralized
+    // supervision state - V1 uses local supervision where actors
+    // handle ActorSupervisionEvent locally rather than reporting to a
+    // central SystemActor for world-level health monitoring.
     #[tokio::test]
     async fn test_supervision_state() {
         let mut sv = SystemSupervisionState::new(Duration::from_secs(1));
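
Reviewer note: the heartbeat-timeout detection and failed-actor aggregation named in the comment above can be illustrated with a minimal, self-contained sketch. This is not the `SystemSupervisionState` implementation; `SupervisionTable`, `Health`, and all method names are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical health status for a proc (not a hyperactor type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Alive,
    Expired,
}

/// Hypothetical stand-in for a central, world-level supervision table.
pub struct SupervisionTable {
    timeout: Duration,
    last_heartbeat: HashMap<String, Instant>,
    failed_actors: HashMap<String, Vec<String>>,
}

impl SupervisionTable {
    pub fn new(timeout: Duration) -> Self {
        Self {
            timeout,
            last_heartbeat: HashMap::new(),
            failed_actors: HashMap::new(),
        }
    }

    /// Record a heartbeat from `proc_id` observed at `now`.
    pub fn heartbeat(&mut self, proc_id: &str, now: Instant) {
        self.last_heartbeat.insert(proc_id.to_string(), now);
    }

    /// Aggregate a failed actor under the proc that reported it.
    pub fn report_failure(&mut self, proc_id: &str, actor: &str) {
        self.failed_actors
            .entry(proc_id.to_string())
            .or_default()
            .push(actor.to_string());
    }

    /// A proc is expired when its last heartbeat is older than the
    /// timeout. Returns `None` for procs that never sent a heartbeat.
    pub fn health(&self, proc_id: &str, now: Instant) -> Option<Health> {
        let last = *self.last_heartbeat.get(proc_id)?;
        if now.saturating_duration_since(last) > self.timeout {
            Some(Health::Expired)
        } else {
            Some(Health::Alive)
        }
    }

    /// Failed actors recorded for `proc_id` (empty if none).
    pub fn failed_actors(&self, proc_id: &str) -> &[String] {
        self.failed_actors
            .get(proc_id)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

With a one-second timeout, a proc queried 500 ms after its last heartbeat reports `Alive`, and the same proc queried two seconds after reports `Expired`.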
@@ -1989,6 +1997,16 @@ mod tests {
         );
     }

+    // V0-specific test - no V1 equivalent. Tests SystemActor world
+    // orchestration where hosts can join before world is created.
+    // Flow: hosts send Join messages → queued by SystemActor →
+    // UpsertWorld defines world topology → SystemActor sends
+    // SpawnProc messages telling each host which procs to spawn.
+    // Verifies correct proc assignment across hosts. V1 does not have
+    // this orchestration model - V1 uses coordinated ProcMesh
+    // allocation where meshes are allocated in one operation, not
+    // assembled from hosts independently joining a central
+    // SystemActor.
     #[tracing_test::traced_test]
     #[tokio::test]
     async fn test_host_join_before_world() {
@@ -2064,6 +2082,14 @@ mod tests {
         }
     }

+    // V0-specific test - no V1 equivalent. Tests SystemActor world
+    // orchestration where world is created before hosts join (reverse
+    // order of test_host_join_before_world). Flow: UpsertWorld
+    // defines topology → hosts send Join messages → SystemActor
+    // immediately sends SpawnProc messages. Tests that join order
+    // doesn't matter. V1 does not have this orchestration model - V1
+    // uses coordinated ProcMesh allocation where meshes are allocated
+    // in one operation.
     #[tokio::test]
     async fn test_host_join_after_world() {
         // Spins up a new world with 2 hosts, with 3 procs each.
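
Reviewer note: the two join-order flows described in the comments above (joins queued until `UpsertWorld`, versus immediate assignment when the world already exists) can be sketched as follows. This is a hedged illustration only. `Coordinator`, `Shape`, and `SpawnAssignment` are hypothetical names, and assigning host ranks in join order with contiguous proc ranks is an assumption of this sketch, not something the commit states.

```rust
use std::collections::HashMap;

/// Hypothetical spawn instruction: which proc ranks a host should start.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SpawnAssignment {
    pub host: String,
    pub proc_ranks: Vec<usize>,
}

/// Hypothetical world shape: number of hosts and procs per host.
#[derive(Debug, Clone, Copy)]
pub struct Shape {
    pub num_hosts: usize,
    pub procs_per_host: usize,
}

/// Hypothetical central coordinator that tolerates either ordering.
#[derive(Default)]
pub struct Coordinator {
    shapes: HashMap<String, Shape>,
    /// Hosts per world in join order; the index is the host rank
    /// (an assumption of this sketch).
    joined: HashMap<String, Vec<String>>,
}

impl Coordinator {
    /// Contiguous proc ranks for the host at `host_rank`; hosts beyond
    /// the world's shape get no assignment.
    fn assign(shape: Shape, host_rank: usize, host: &str) -> Option<SpawnAssignment> {
        if host_rank >= shape.num_hosts {
            return None;
        }
        let start = host_rank * shape.procs_per_host;
        Some(SpawnAssignment {
            host: host.to_string(),
            proc_ranks: (start..start + shape.procs_per_host).collect(),
        })
    }

    /// A host joins `world`. If the shape is already known, the
    /// assignment is returned immediately; otherwise the join is queued.
    pub fn join(&mut self, world: &str, host: &str) -> Option<SpawnAssignment> {
        let hosts = self.joined.entry(world.to_string()).or_default();
        hosts.push(host.to_string());
        let rank = hosts.len() - 1;
        let shape = *self.shapes.get(world)?;
        Self::assign(shape, rank, host)
    }

    /// Define the world's shape and return assignments for every host
    /// that has joined so far. Calling this again re-emits them; a real
    /// implementation would need to track what was already sent.
    pub fn upsert_world(&mut self, world: &str, shape: Shape) -> Vec<SpawnAssignment> {
        self.shapes.insert(world.to_string(), shape);
        self.joined
            .get(world)
            .map(|hosts| {
                hosts
                    .iter()
                    .enumerate()
                    .filter_map(|(rank, host)| Self::assign(shape, rank, host))
                    .collect()
            })
            .unwrap_or_default()
    }
}
```

For a world of 2 hosts with 3 procs each, both orderings yield the same assignments in this sketch: the first host gets proc ranks 0 through 2 and the second gets 3 through 5.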
@@ -2138,6 +2164,12 @@ mod tests {
         }
     }

+    // V0-specific test - no V1 equivalent. Unit test for
+    // SystemSnapshotFilter which filters worlds by name and labels
+    // when querying SystemActor. Tests world_matches() and
+    // labels_match() logic. V1 does not have SystemActor or
+    // SystemSnapshot - V1 uses mesh-based iteration and state queries
+    // instead.
     #[test]
     fn test_snapshot_filter() {
         let test_world = World::new(
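
Reviewer note: the name-and-label filtering that the comment above attributes to `SystemSnapshotFilter` can be sketched roughly as below. The matching rules are assumptions of this sketch (exact name match, an empty name list matches every world, and every filter label must be present with an equal value); the real `world_matches()` and `labels_match()` may behave differently. `SnapshotFilter` and `WorldInfo` are hypothetical types.

```rust
use std::collections::HashMap;

/// Hypothetical summary of a world: its name and labels.
pub struct WorldInfo {
    pub name: String,
    pub labels: HashMap<String, String>,
}

/// Hypothetical filter; not the hyperactor `SystemSnapshotFilter`.
#[derive(Default)]
pub struct SnapshotFilter {
    /// World names to include. Empty means "any world" (assumption).
    pub worlds: Vec<String>,
    /// Labels a world must carry, with equal values (assumption).
    pub labels: HashMap<String, String>,
}

impl SnapshotFilter {
    /// True if no names are listed or `name` is one of them.
    pub fn world_matches(&self, name: &str) -> bool {
        self.worlds.is_empty() || self.worlds.iter().any(|w| w == name)
    }

    /// True if every filter label appears in `labels` with the same value.
    pub fn labels_match(&self, labels: &HashMap<String, String>) -> bool {
        self.labels.iter().all(|(k, v)| labels.get(k) == Some(v))
    }

    /// A world passes only when both the name and label checks pass.
    pub fn matches(&self, world: &WorldInfo) -> bool {
        self.world_matches(&world.name) && self.labels_match(&world.labels)
    }
}
```

Under these assumed rules, a default (empty) filter matches every world, and a filter naming a different world or requiring a label value the world does not carry rejects it.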
@@ -2176,6 +2208,13 @@ mod tests {
         ));
     }

+    // V0-specific test - no V1 equivalent. Tests SystemActor
+    // supervision behavior when mailbox server crashes: undeliverable
+    // messages are handled AND system supervision detects the
+    // unhealthy world state. V1 does not have SystemActor or world
+    // supervision. V1 undeliverable message handling (without
+    // supervision) is tested in
+    // hyperactor_mesh/src/v1/actor_mesh.rs::test_undeliverable_message_return.
     #[tokio::test]
     async fn test_undeliverable_message_return() {
         // System can't send a message to a remote actor because the
@@ -2349,6 +2388,13 @@ mod tests {
         ));
     }

+    // V0-specific test - no V1 equivalent. Tests SystemActor stop
+    // when system is empty (no worlds). Sends SystemMessage::Stop to
+    // central SystemActor which coordinates shutdown of all worlds.
+    // V1 does not have a central SystemActor - V1 uses mesh-level
+    // stop operations (ProcMesh::stop(), HostMesh::shutdown()) where
+    // you stop individual meshes rather than a system-wide
+    // coordinator.
     #[tokio::test]
     async fn test_stop_fast() -> Result<()> {
         let server_handle = System::serve(
@@ -2380,6 +2426,12 @@ mod tests {
         Ok(())
     }

+    // V0-specific test - no V1 equivalent. Tests ReportingRouter's
+    // UpdateAddress behavior in simnet mode. When messages are sent,
+    // post_update_address() sends MailboxAdminMessage::UpdateAddress
+    // to update address caches with simnet source routing info. V1
+    // does not have ReportingRouter or dynamic address updates - V1
+    // uses static/direct addressing.
     #[tokio::test]
     async fn test_update_sim_address() {
         simnet::start();
