ui/phase-25: backend turn-cutoff guard + auto-pause + UI sync protocol
Backend now owns the turn-cutoff and pause guards the order tab relies on: the scheduler flips runtime_status between generation_in_progress and running around every engine tick, a failed tick auto-pauses the game through OnRuntimeSnapshot, and a new game.paused notification kind fans out alongside game.turn.ready. The user-games handlers reject submits with HTTP 409 turn_already_closed or game_paused depending on the runtime state. UI delegates auto-sync to a new OrderQueue: offline detection, single retry on reconnect, conflict / paused classification. OrderDraftStore surfaces conflictBanner / pausedBanner runes, clears them on local mutation or on a game.turn.ready push via resetForNewTurn. The order tab renders the matching banners and the new conflict per-row badge; i18n bundles cover en + ru. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+25
-7
@@ -333,20 +333,38 @@ cannot guarantee.
|
||||
| `runtime.image_pull_failed` | admin email | `game_id`, `image_ref` |
|
||||
| `runtime.container_start_failed` | admin email | `game_id` |
|
||||
| `runtime.start_config_invalid` | admin email | `game_id`, `reason` |
|
||||
| `game.turn.ready` | push | `game_id`, `turn` |
|
||||
| `game.paused` | push | `game_id`, `turn`, `reason` |
|
||||
|
||||
Admin-channel kinds (`runtime.*`) deliver email to
|
||||
`BACKEND_NOTIFICATION_ADMIN_EMAIL`; when the variable is empty, those
|
||||
routes land in `notification_routes` with `status='skipped'` and the
|
||||
operator log line records the configuration miss.
|
||||
|
||||
`game.turn.ready` is emitted by `lobby.Service.OnRuntimeSnapshot`
|
||||
(`backend/internal/lobby/runtime_hooks.go`) whenever the engine's
|
||||
`current_turn` advances. The intent targets every active membership
|
||||
of the game, uses idempotency key `turn-ready:<game_id>:<turn>`, and
|
||||
carries the JSON payload `{game_id, turn}`. The catalog routes it
|
||||
through the push channel only — per-turn email would be spam — so
|
||||
`game.turn.ready` and `game.paused` are emitted by
|
||||
`lobby.Service.OnRuntimeSnapshot`
|
||||
(`backend/internal/lobby/runtime_hooks.go`):
|
||||
|
||||
- `game.turn.ready` fires whenever the engine's `current_turn`
|
||||
advances. Idempotency key `turn-ready:<game_id>:<turn>`, JSON
|
||||
payload `{game_id, turn}`.
|
||||
- `game.paused` fires whenever the same hook flips the game
|
||||
`running → paused` because a runtime snapshot landed with
|
||||
`engine_unreachable` / `generation_failed`. Idempotency key
|
||||
`paused:<game_id>:<turn>`, JSON payload
|
||||
`{game_id, turn, reason}` (reason carries the runtime status
|
||||
that triggered the transition). The runtime scheduler
|
||||
(`backend/internal/runtime/scheduler.go`) forwards the failing
|
||||
snapshot through `Service.publishFailureSnapshot` so a single
|
||||
failing tick reliably reaches lobby.
|
||||
|
||||
Both kinds target every active membership and route through the
|
||||
push channel only — per-turn / per-pause email would be spam — so
|
||||
the UI's signed `SubscribeEvents` stream
|
||||
(`ui/frontend/src/api/events.svelte.ts`) is the sole delivery path.
|
||||
(`ui/frontend/src/api/events.svelte.ts`) is the sole delivery
|
||||
path. The order tab consumes them via
|
||||
`OrderDraftStore.resetForNewTurn` / `markPaused`
|
||||
(`ui/docs/sync-protocol.md`).
|
||||
|
||||
The remaining `game.*` (`game.started`, `game.generation.failed`,
|
||||
`game.finished`) and `mail.dead_lettered` are reserved kinds without
|
||||
|
||||
@@ -110,6 +110,7 @@ const (
|
||||
NotificationLobbyRaceNamePending = "lobby.race_name.pending"
|
||||
NotificationLobbyRaceNameExpired = "lobby.race_name.expired"
|
||||
NotificationGameTurnReady = "game.turn.ready"
|
||||
NotificationGamePaused = "game.paused"
|
||||
)
|
||||
|
||||
// Deps aggregates every collaborator the lobby Service depends on.
|
||||
|
||||
@@ -37,6 +37,7 @@ func (s *Service) OnRuntimeSnapshot(ctx context.Context, gameID uuid.UUID, snaps
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
transitionedToPaused := false
|
||||
if next, transition := nextStatusFromSnapshot(updated.Status, snapshot); transition {
|
||||
switch next {
|
||||
case GameStatusFinished:
|
||||
@@ -53,12 +54,18 @@ func (s *Service) OnRuntimeSnapshot(ctx context.Context, gameID uuid.UUID, snaps
|
||||
return err
|
||||
}
|
||||
updated = rec
|
||||
if next == GameStatusPaused {
|
||||
transitionedToPaused = true
|
||||
}
|
||||
}
|
||||
}
|
||||
s.deps.Cache.PutGame(updated)
|
||||
if merged.CurrentTurn > prevTurn {
|
||||
s.publishTurnReady(ctx, gameID, merged.CurrentTurn)
|
||||
}
|
||||
if transitionedToPaused {
|
||||
s.publishGamePaused(ctx, gameID, merged.CurrentTurn, snapshot.RuntimeStatus)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
@@ -106,6 +113,56 @@ func (s *Service) publishTurnReady(ctx context.Context, gameID uuid.UUID, turn i
|
||||
}
|
||||
}
|
||||
|
||||
// publishGamePaused fans out a `game.paused` notification to every
|
||||
// active member of the game when the lobby flips the game to
|
||||
// `paused` in reaction to a runtime snapshot (typically a failed
|
||||
// turn generation). The intent is best-effort: a publisher failure
|
||||
// is logged at warn level and does not abort the snapshot
|
||||
// bookkeeping. Idempotency is anchored on (game_id, turn) so a
|
||||
// repeated `generation_failed` snapshot for the same turn collapses
|
||||
// into a single notification at the notification.Submit boundary.
|
||||
//
|
||||
// reason carries the raw runtime status that triggered the pause
|
||||
// (`engine_unreachable` / `generation_failed`); the UI displays a
|
||||
// status-agnostic banner today but the payload is preserved so a
|
||||
// future revision of the order tab can differentiate.
|
||||
func (s *Service) publishGamePaused(ctx context.Context, gameID uuid.UUID, turn int32, reason string) {
|
||||
memberships, err := s.deps.Store.ListMembershipsForGame(ctx, gameID)
|
||||
if err != nil {
|
||||
s.deps.Logger.Warn("game-paused notification: list memberships failed",
|
||||
zap.String("game_id", gameID.String()),
|
||||
zap.Int32("turn", turn),
|
||||
zap.Error(err))
|
||||
return
|
||||
}
|
||||
recipients := make([]uuid.UUID, 0, len(memberships))
|
||||
for _, m := range memberships {
|
||||
if m.Status != MembershipStatusActive {
|
||||
continue
|
||||
}
|
||||
recipients = append(recipients, m.UserID)
|
||||
}
|
||||
if len(recipients) == 0 {
|
||||
return
|
||||
}
|
||||
intent := LobbyNotification{
|
||||
Kind: NotificationGamePaused,
|
||||
IdempotencyKey: fmt.Sprintf("paused:%s:%d", gameID, turn),
|
||||
Recipients: recipients,
|
||||
Payload: map[string]any{
|
||||
"game_id": gameID.String(),
|
||||
"turn": turn,
|
||||
"reason": reason,
|
||||
},
|
||||
}
|
||||
if pubErr := s.deps.Notification.PublishLobbyEvent(ctx, intent); pubErr != nil {
|
||||
s.deps.Logger.Warn("game-paused notification failed",
|
||||
zap.String("game_id", gameID.String()),
|
||||
zap.Int32("turn", turn),
|
||||
zap.Error(pubErr))
|
||||
}
|
||||
}
|
||||
|
||||
// OnGameFinished completes the game lifecycle: marks the game as
|
||||
// `finished`, evaluates capable-finish per active member, and
|
||||
// transitions reservation rows to either `pending_registration`
|
||||
@@ -278,13 +335,28 @@ func mergeRuntimeSnapshot(prev, next RuntimeSnapshot) RuntimeSnapshot {
|
||||
// nextStatusFromSnapshot maps the runtime-reported runtime status into
|
||||
// a lobby status transition. Returns (next, true) when the lobby
|
||||
// status must change; (current, false) otherwise.
|
||||
//
|
||||
// The map intentionally distinguishes the pre-running boot path
|
||||
// (`starting → start_failed`) from the in-flight failure path
|
||||
// (`running → paused`). Paused games can be resumed by the admin via
|
||||
// the explicit `/resume` transition; the runtime keeps the engine
|
||||
// container alive, the scheduler short-circuits ticks while paused,
|
||||
// and any user-games command/order is rejected by the order handler
|
||||
// with `turn_already_closed` until the game resumes.
|
||||
func nextStatusFromSnapshot(currentStatus string, snapshot RuntimeSnapshot) (string, bool) {
|
||||
switch snapshot.RuntimeStatus {
|
||||
case "running":
|
||||
if currentStatus == GameStatusStarting {
|
||||
return GameStatusRunning, true
|
||||
}
|
||||
case "engine_unreachable", "start_failed", "generation_failed":
|
||||
case "engine_unreachable", "generation_failed":
|
||||
if currentStatus == GameStatusStarting {
|
||||
return GameStatusStartFailed, true
|
||||
}
|
||||
if currentStatus == GameStatusRunning {
|
||||
return GameStatusPaused, true
|
||||
}
|
||||
case "start_failed":
|
||||
if currentStatus == GameStatusStarting {
|
||||
return GameStatusStartFailed, true
|
||||
}
|
||||
|
||||
@@ -0,0 +1,127 @@
|
||||
package lobby
|
||||
|
||||
import "testing"
|
||||
|
||||
// TestNextStatusFromSnapshot covers the pure status-mapping function
|
||||
// that drives `OnRuntimeSnapshot`'s lifecycle transitions. The Phase
|
||||
// 25 contribution is the `running → paused` branch on
|
||||
// `engine_unreachable` / `generation_failed`: the order handler relies
|
||||
// on the `paused` game status to reject late submits with
|
||||
// `turn_already_closed`.
|
||||
func TestNextStatusFromSnapshot(t *testing.T) {
|
||||
t.Parallel()
|
||||
|
||||
tests := []struct {
|
||||
name string
|
||||
currentStatus string
|
||||
runtimeStatus string
|
||||
wantStatus string
|
||||
wantTransit bool
|
||||
}{
|
||||
{
|
||||
name: "starting then running flips to running",
|
||||
currentStatus: GameStatusStarting,
|
||||
runtimeStatus: "running",
|
||||
wantStatus: GameStatusRunning,
|
||||
wantTransit: true,
|
||||
},
|
||||
{
|
||||
name: "running on running snapshot does not transit",
|
||||
currentStatus: GameStatusRunning,
|
||||
runtimeStatus: "running",
|
||||
wantStatus: GameStatusRunning,
|
||||
wantTransit: false,
|
||||
},
|
||||
{
|
||||
name: "starting then engine_unreachable flips to start_failed",
|
||||
currentStatus: GameStatusStarting,
|
||||
runtimeStatus: "engine_unreachable",
|
||||
wantStatus: GameStatusStartFailed,
|
||||
wantTransit: true,
|
||||
},
|
||||
{
|
||||
name: "starting then generation_failed flips to start_failed",
|
||||
currentStatus: GameStatusStarting,
|
||||
runtimeStatus: "generation_failed",
|
||||
wantStatus: GameStatusStartFailed,
|
||||
wantTransit: true,
|
||||
},
|
||||
{
|
||||
name: "running then engine_unreachable flips to paused",
|
||||
currentStatus: GameStatusRunning,
|
||||
runtimeStatus: "engine_unreachable",
|
||||
wantStatus: GameStatusPaused,
|
||||
wantTransit: true,
|
||||
},
|
||||
{
|
||||
name: "running then generation_failed flips to paused",
|
||||
currentStatus: GameStatusRunning,
|
||||
runtimeStatus: "generation_failed",
|
||||
wantStatus: GameStatusPaused,
|
||||
wantTransit: true,
|
||||
},
|
||||
{
|
||||
name: "paused stays paused on repeated failed snapshot",
|
||||
currentStatus: GameStatusPaused,
|
||||
runtimeStatus: "generation_failed",
|
||||
wantStatus: GameStatusPaused,
|
||||
wantTransit: false,
|
||||
},
|
||||
{
|
||||
name: "starting then start_failed flips to start_failed",
|
||||
currentStatus: GameStatusStarting,
|
||||
runtimeStatus: "start_failed",
|
||||
wantStatus: GameStatusStartFailed,
|
||||
wantTransit: true,
|
||||
},
|
||||
{
|
||||
name: "running ignores start_failed",
|
||||
currentStatus: GameStatusRunning,
|
||||
runtimeStatus: "start_failed",
|
||||
wantStatus: GameStatusRunning,
|
||||
wantTransit: false,
|
||||
},
|
||||
{
|
||||
name: "running on finished flips to finished",
|
||||
currentStatus: GameStatusRunning,
|
||||
runtimeStatus: "finished",
|
||||
wantStatus: GameStatusFinished,
|
||||
wantTransit: true,
|
||||
},
|
||||
{
|
||||
name: "finished stays finished on finished snapshot",
|
||||
currentStatus: GameStatusFinished,
|
||||
runtimeStatus: "finished",
|
||||
wantStatus: GameStatusFinished,
|
||||
wantTransit: false,
|
||||
},
|
||||
{
|
||||
name: "cancelled stays cancelled on finished snapshot",
|
||||
currentStatus: GameStatusCancelled,
|
||||
runtimeStatus: "finished",
|
||||
wantStatus: GameStatusCancelled,
|
||||
wantTransit: false,
|
||||
},
|
||||
{
|
||||
name: "paused on stopped snapshot flips to finished",
|
||||
currentStatus: GameStatusPaused,
|
||||
runtimeStatus: "stopped",
|
||||
wantStatus: GameStatusFinished,
|
||||
wantTransit: true,
|
||||
},
|
||||
}
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
got, transit := nextStatusFromSnapshot(tt.currentStatus, RuntimeSnapshot{
|
||||
RuntimeStatus: tt.runtimeStatus,
|
||||
})
|
||||
if got != tt.wantStatus {
|
||||
t.Errorf("status = %q, want %q", got, tt.wantStatus)
|
||||
}
|
||||
if transit != tt.wantTransit {
|
||||
t.Errorf("transit = %v, want %v", transit, tt.wantTransit)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
@@ -18,6 +18,7 @@ const (
|
||||
KindRuntimeContainerStartFailed = "runtime.container_start_failed"
|
||||
KindRuntimeStartConfigInvalid = "runtime.start_config_invalid"
|
||||
KindGameTurnReady = "game.turn.ready"
|
||||
KindGamePaused = "game.paused"
|
||||
)
|
||||
|
||||
// CatalogEntry describes the per-kind delivery policy: which channels
|
||||
@@ -99,6 +100,9 @@ var catalog = map[string]CatalogEntry{
|
||||
KindGameTurnReady: {
|
||||
Channels: []string{ChannelPush},
|
||||
},
|
||||
KindGamePaused: {
|
||||
Channels: []string{ChannelPush},
|
||||
},
|
||||
}
|
||||
|
||||
// LookupCatalog returns the per-kind policy and a boolean reporting
|
||||
@@ -128,5 +132,6 @@ func SupportedKinds() []string {
|
||||
KindRuntimeContainerStartFailed,
|
||||
KindRuntimeStartConfigInvalid,
|
||||
KindGameTurnReady,
|
||||
KindGamePaused,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -40,6 +40,7 @@ func TestCatalogChannels(t *testing.T) {
|
||||
KindRuntimeContainerStartFailed: {ChannelEmail},
|
||||
KindRuntimeStartConfigInvalid: {ChannelEmail},
|
||||
KindGameTurnReady: {ChannelPush},
|
||||
KindGamePaused: {ChannelPush},
|
||||
}
|
||||
for kind, want := range expect {
|
||||
entry, ok := LookupCatalog(kind)
|
||||
|
||||
@@ -20,8 +20,14 @@ import (
|
||||
// other consumer reads the payload — adopting the FB encoder would
|
||||
// require a new TS notification stub set and the regen tooling for
|
||||
// `pkg/schema/fbs/notification.fbs` without buying anything.
|
||||
//
|
||||
// `game.paused` (Phase 25) follows the same JSON-friendly contract:
|
||||
// payload is `{game_id, turn, reason}` consumed by the same in-game
|
||||
// shell layout, so there is no value in dragging a FB schema in for
|
||||
// one consumer.
|
||||
var jsonFriendlyKinds = map[string]bool{
|
||||
KindGameTurnReady: true,
|
||||
KindGamePaused: true,
|
||||
}
|
||||
|
||||
// TestBuildClientPushEventCoversCatalog asserts that every catalog kind
|
||||
@@ -77,6 +83,11 @@ func TestBuildClientPushEventCoversCatalog(t *testing.T) {
|
||||
"game_id": gameID.String(),
|
||||
"turn": int32(7),
|
||||
}},
|
||||
{"game paused", KindGamePaused, map[string]any{
|
||||
"game_id": gameID.String(),
|
||||
"turn": int32(7),
|
||||
"reason": "generation_failed",
|
||||
}},
|
||||
}
|
||||
|
||||
seenKinds := map[string]bool{}
|
||||
|
||||
@@ -606,7 +606,7 @@ CREATE TABLE notifications (
|
||||
'lobby.race_name.expired',
|
||||
'runtime.image_pull_failed', 'runtime.container_start_failed',
|
||||
'runtime.start_config_invalid',
|
||||
'game.turn.ready'
|
||||
'game.turn.ready', 'game.paused'
|
||||
))
|
||||
);
|
||||
|
||||
|
||||
@@ -42,4 +42,23 @@ var (
|
||||
// ErrShutdown means the runtime service has stopped accepting
|
||||
// work because the parent context was cancelled.
|
||||
ErrShutdown = errors.New("runtime: shutting down")
|
||||
|
||||
// ErrTurnAlreadyClosed reports that the runtime is currently
|
||||
// producing a turn — runtime status is `generation_in_progress`
|
||||
// — and the engine is not accepting writes for the closing
|
||||
// turn. Handlers map this to HTTP 409 with httperr code
|
||||
// `turn_already_closed`; the UI shows a conflict banner and
|
||||
// waits for the next `game.turn.ready` push.
|
||||
ErrTurnAlreadyClosed = errors.New("runtime: turn already closed")
|
||||
|
||||
// ErrGamePaused reports that the game is not in a state that
|
||||
// accepts user-games commands or orders: the runtime row
|
||||
// carries `paused = true`, or the runtime status lands on any
|
||||
// terminal value (`engine_unreachable`, `generation_failed`,
|
||||
// `stopped`, `finished`, `removed`), or the game has not yet
|
||||
// finished bootstrapping (`starting`). Handlers map this to
|
||||
// HTTP 409 with httperr code `game_paused`; the UI surfaces a
|
||||
// pause banner and waits for an admin resume or a fresh
|
||||
// snapshot.
|
||||
ErrGamePaused = errors.New("runtime: game paused")
|
||||
)
|
||||
|
||||
@@ -0,0 +1,82 @@
|
||||
package runtime
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"testing"
|
||||
)
|
||||
|
||||
// TestOrdersAcceptStatus pins down the Phase 25 pre-check that
|
||||
// gates the user-games command/order handlers against the runtime
|
||||
// record. The decision must distinguish a turn cutoff (engine is
|
||||
// producing) from a paused game so the UI can surface the right
|
||||
// banner; all other non-running runtime statuses collapse into
|
||||
// `ErrGamePaused`.
|
||||
func TestOrdersAcceptStatus(t *testing.T) {
|
||||
t.Parallel()
|
||||
|
||||
tests := []struct {
|
||||
name string
|
||||
rec RuntimeRecord
|
||||
want error
|
||||
}{
|
||||
{
|
||||
name: "running and not paused accepts orders",
|
||||
rec: RuntimeRecord{Status: RuntimeStatusRunning, Paused: false},
|
||||
want: nil,
|
||||
},
|
||||
{
|
||||
name: "running but paused returns game paused",
|
||||
rec: RuntimeRecord{Status: RuntimeStatusRunning, Paused: true},
|
||||
want: ErrGamePaused,
|
||||
},
|
||||
{
|
||||
name: "generation in progress returns turn already closed",
|
||||
rec: RuntimeRecord{Status: RuntimeStatusGenerationInProgress},
|
||||
want: ErrTurnAlreadyClosed,
|
||||
},
|
||||
{
|
||||
name: "generation failed returns game paused",
|
||||
rec: RuntimeRecord{Status: RuntimeStatusGenerationFailed},
|
||||
want: ErrGamePaused,
|
||||
},
|
||||
{
|
||||
name: "engine unreachable returns game paused",
|
||||
rec: RuntimeRecord{Status: RuntimeStatusEngineUnreachable},
|
||||
want: ErrGamePaused,
|
||||
},
|
||||
{
|
||||
name: "stopped returns game paused",
|
||||
rec: RuntimeRecord{Status: RuntimeStatusStopped},
|
||||
want: ErrGamePaused,
|
||||
},
|
||||
{
|
||||
name: "finished returns game paused",
|
||||
rec: RuntimeRecord{Status: RuntimeStatusFinished},
|
||||
want: ErrGamePaused,
|
||||
},
|
||||
{
|
||||
name: "removed returns game paused",
|
||||
rec: RuntimeRecord{Status: RuntimeStatusRemoved},
|
||||
want: ErrGamePaused,
|
||||
},
|
||||
{
|
||||
name: "starting returns game paused",
|
||||
rec: RuntimeRecord{Status: RuntimeStatusStarting},
|
||||
want: ErrGamePaused,
|
||||
},
|
||||
{
|
||||
name: "paused takes precedence over generation in progress",
|
||||
rec: RuntimeRecord{Status: RuntimeStatusGenerationInProgress, Paused: true},
|
||||
want: ErrGamePaused,
|
||||
},
|
||||
}
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
got := OrdersAcceptStatus(tt.rec)
|
||||
if !errors.Is(got, tt.want) {
|
||||
t.Errorf("OrdersAcceptStatus = %v, want %v", got, tt.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
@@ -7,6 +7,7 @@ import (
|
||||
"time"
|
||||
|
||||
"galaxy/backend/internal/dockerclient"
|
||||
"galaxy/backend/internal/engineclient"
|
||||
"galaxy/cronutil"
|
||||
|
||||
"github.com/google/uuid"
|
||||
@@ -213,6 +214,22 @@ func (sch *Scheduler) loop(ctx context.Context, rec RuntimeRecord, done chan str
|
||||
|
||||
// tick runs one engine /admin/turn call under the per-game mutex,
|
||||
// publishes the resulting snapshot, and clears `skip_next_tick`.
|
||||
//
|
||||
// Phase 25 wraps the engine call between two runtime-status flips so
|
||||
// the backend order handler can reject late submits while the engine
|
||||
// is producing:
|
||||
//
|
||||
// - before `Engine.Turn`: runtime status moves to
|
||||
// `generation_in_progress`; the loop's running-only guard tolerates
|
||||
// this because the flip back happens inside the same tick.
|
||||
// - on success: runtime status moves back to `running` (unless the
|
||||
// engine reports `finished`, in which case `publishSnapshot` has
|
||||
// already promoted the row to `finished`).
|
||||
// - on error: runtime status moves to `generation_failed` (engine
|
||||
// validation failure) or `engine_unreachable` (transport / 5xx).
|
||||
// The matching snapshot is forwarded to lobby through
|
||||
// `publishFailureSnapshot` so lobby can flip the game to `paused`
|
||||
// and emit `game.paused`.
|
||||
func (sch *Scheduler) tick(ctx context.Context, rec RuntimeRecord) error {
|
||||
mu := sch.svc.gameLock(rec.GameID)
|
||||
if !mu.TryLock() {
|
||||
@@ -224,10 +241,24 @@ func (sch *Scheduler) tick(ctx context.Context, rec RuntimeRecord) error {
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
if _, err := sch.svc.transitionRuntimeStatus(ctx, rec.GameID, RuntimeStatusGenerationInProgress, ""); err != nil {
|
||||
sch.svc.completeOperation(ctx, op, err)
|
||||
return err
|
||||
}
|
||||
state, err := sch.svc.deps.Engine.Turn(ctx, rec.EngineEndpoint)
|
||||
if err != nil {
|
||||
sch.svc.completeOperation(ctx, op, err)
|
||||
_, _ = sch.svc.transitionRuntimeStatus(ctx, rec.GameID, RuntimeStatusEngineUnreachable, "")
|
||||
failureStatus := RuntimeStatusEngineUnreachable
|
||||
if errors.Is(err, engineclient.ErrEngineValidation) {
|
||||
failureStatus = RuntimeStatusGenerationFailed
|
||||
}
|
||||
_, _ = sch.svc.transitionRuntimeStatus(ctx, rec.GameID, failureStatus, "down")
|
||||
if pubErr := sch.svc.publishFailureSnapshot(ctx, rec.GameID, failureStatus); pubErr != nil {
|
||||
sch.svc.deps.Logger.Warn("publish failure snapshot to lobby",
|
||||
zap.String("game_id", rec.GameID.String()),
|
||||
zap.String("runtime_status", failureStatus),
|
||||
zap.Error(pubErr))
|
||||
}
|
||||
// On engine unreachable, also clear skip_next_tick so the next
|
||||
// real tick can start fresh.
|
||||
_ = sch.clearSkipFlag(ctx, rec.GameID)
|
||||
@@ -244,6 +275,12 @@ func (sch *Scheduler) tick(ctx context.Context, rec RuntimeRecord) error {
|
||||
sch.svc.completeOperation(ctx, op, err)
|
||||
return err
|
||||
}
|
||||
if !state.Finished {
|
||||
// `publishSnapshot` patches CurrentTurn / EngineHealth but does
|
||||
// not reset the status column; reopen the orders window here so
|
||||
// the next loop iteration finds the runtime back in `running`.
|
||||
_, _ = sch.svc.transitionRuntimeStatus(ctx, rec.GameID, RuntimeStatusRunning, "ok")
|
||||
}
|
||||
sch.svc.completeOperation(ctx, op, nil)
|
||||
_ = sch.clearSkipFlag(ctx, rec.GameID)
|
||||
return nil
|
||||
|
||||
@@ -257,6 +257,57 @@ func (s *Service) ResolvePlayerMapping(ctx context.Context, gameID, userID uuid.
|
||||
return s.deps.Store.LoadPlayerMapping(ctx, gameID, userID)
|
||||
}
|
||||
|
||||
// CheckOrdersAccept verifies that the runtime is in a state that
|
||||
// accepts user-games commands and orders. It is called by the user
|
||||
// game-proxy handlers (`Commands`, `Orders`) before forwarding to
|
||||
// engine, so the backend's turn-cutoff and pause guards run before
|
||||
// network traffic leaves the host. The decision itself lives in the
|
||||
// pure helper `OrdersAcceptStatus` so it can be unit-tested without
|
||||
// constructing a full Service.
|
||||
//
|
||||
// A missing runtime row is surfaced as `ErrNotFound` so the handler
|
||||
// keeps its existing 404 behaviour.
|
||||
func (s *Service) CheckOrdersAccept(ctx context.Context, gameID uuid.UUID) error {
|
||||
rec, err := s.GetRuntime(ctx, gameID)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
return OrdersAcceptStatus(rec)
|
||||
}
|
||||
|
||||
// OrdersAcceptStatus inspects a runtime record and returns the
|
||||
// matching sentinel for the user-games order/command pre-check:
|
||||
//
|
||||
// - `runtime_status = generation_in_progress` → `ErrTurnAlreadyClosed`.
|
||||
// The cron-driven `Scheduler.tick` has flipped the row before
|
||||
// calling the engine. The order window reopens once the tick
|
||||
// completes successfully.
|
||||
//
|
||||
// - `runtime_status ∈ {engine_unreachable, generation_failed,
|
||||
// stopped, finished, removed, starting}` → `ErrGamePaused`.
|
||||
// The game is not in a state that accepts writes; the lobby
|
||||
// state machine has either already flipped the game to
|
||||
// `paused` / `finished` or is still bootstrapping.
|
||||
//
|
||||
// - `runtime.Paused = true` → `ErrGamePaused`. The lobby admin
|
||||
// paused the game explicitly.
|
||||
//
|
||||
// - `runtime_status = running` and `Paused = false` → nil
|
||||
// (forward).
|
||||
func OrdersAcceptStatus(rec RuntimeRecord) error {
|
||||
if rec.Paused {
|
||||
return ErrGamePaused
|
||||
}
|
||||
switch rec.Status {
|
||||
case RuntimeStatusRunning:
|
||||
return nil
|
||||
case RuntimeStatusGenerationInProgress:
|
||||
return ErrTurnAlreadyClosed
|
||||
default:
|
||||
return ErrGamePaused
|
||||
}
|
||||
}
|
||||
|
||||
// EngineEndpoint returns the engine endpoint URL for gameID. Used by
|
||||
// the user game-proxy handlers.
|
||||
func (s *Service) EngineEndpoint(ctx context.Context, gameID uuid.UUID) (string, error) {
|
||||
@@ -812,6 +863,33 @@ func (s *Service) publishSnapshot(ctx context.Context, gameID uuid.UUID, state r
|
||||
return nil
|
||||
}
|
||||
|
||||
// publishFailureSnapshot forwards a runtime-failure observation to
|
||||
// lobby so the game lifecycle can react (e.g. flipping `running` to
|
||||
// `paused` on `engine_unreachable` / `generation_failed` per Phase
|
||||
// 25). The snapshot carries the unchanged `current_turn` because no
|
||||
// new turn has been produced; lobby uses the turn number to anchor
|
||||
// the `game.paused` idempotency key.
|
||||
//
|
||||
// The call is best-effort: lobby errors are returned to the caller
|
||||
// (the scheduler tick) so the warn-level logging stays in one place.
|
||||
// A missing runtime cache entry (e.g. the row was just removed by
|
||||
// the reconciler) collapses into a silent no-op.
|
||||
func (s *Service) publishFailureSnapshot(ctx context.Context, gameID uuid.UUID, runtimeStatus string) error {
|
||||
if s.deps.Lobby == nil {
|
||||
return nil
|
||||
}
|
||||
rec, ok := s.deps.Cache.GetRuntime(gameID)
|
||||
if !ok {
|
||||
return nil
|
||||
}
|
||||
return s.deps.Lobby.OnRuntimeSnapshot(ctx, gameID, LobbySnapshot{
|
||||
CurrentTurn: rec.CurrentTurn,
|
||||
RuntimeStatus: runtimeStatus,
|
||||
EngineHealth: "down",
|
||||
ObservedAt: s.deps.Now().UTC(),
|
||||
})
|
||||
}
|
||||
|
||||
// transitionRuntimeStatus updates the status / engine_health columns
|
||||
// and refreshes the cache.
|
||||
func (s *Service) transitionRuntimeStatus(ctx context.Context, gameID uuid.UUID, status, health string) (RuntimeRecord, error) {
|
||||
|
||||
@@ -60,6 +60,10 @@ func (h *UserGamesHandlers) Commands() gin.HandlerFunc {
|
||||
return
|
||||
}
|
||||
ctx := c.Request.Context()
|
||||
if err := h.runtime.CheckOrdersAccept(ctx, gameID); err != nil {
|
||||
respondGameProxyError(c, h.logger, "user games commands", ctx, err)
|
||||
return
|
||||
}
|
||||
mapping, err := h.runtime.ResolvePlayerMapping(ctx, gameID, userID)
|
||||
if err != nil {
|
||||
respondGameProxyError(c, h.logger, "user games commands", ctx, err)
|
||||
@@ -105,6 +109,10 @@ func (h *UserGamesHandlers) Orders() gin.HandlerFunc {
|
||||
return
|
||||
}
|
||||
ctx := c.Request.Context()
|
||||
if err := h.runtime.CheckOrdersAccept(ctx, gameID); err != nil {
|
||||
respondGameProxyError(c, h.logger, "user games orders", ctx, err)
|
||||
return
|
||||
}
|
||||
mapping, err := h.runtime.ResolvePlayerMapping(ctx, gameID, userID)
|
||||
if err != nil {
|
||||
respondGameProxyError(c, h.logger, "user games orders", ctx, err)
|
||||
@@ -257,6 +265,12 @@ func respondGameProxyError(c *gin.Context, logger *zap.Logger, op string, ctx co
|
||||
switch {
|
||||
case errors.Is(err, runtime.ErrNotFound):
|
||||
httperr.Abort(c, http.StatusNotFound, httperr.CodeNotFound, "no runtime mapping for this user/game")
|
||||
case errors.Is(err, runtime.ErrTurnAlreadyClosed):
|
||||
httperr.Abort(c, http.StatusConflict, httperr.CodeTurnAlreadyClosed,
|
||||
"turn already closed; orders are not accepted while the engine is producing")
|
||||
case errors.Is(err, runtime.ErrGamePaused):
|
||||
httperr.Abort(c, http.StatusConflict, httperr.CodeGamePaused,
|
||||
"game is paused; orders are not accepted until it resumes")
|
||||
case errors.Is(err, runtime.ErrConflict):
|
||||
httperr.Abort(c, http.StatusConflict, httperr.CodeConflict, err.Error())
|
||||
default:
|
||||
|
||||
@@ -23,6 +23,22 @@ const (
|
||||
CodeMethodNotAllowed = "method_not_allowed"
|
||||
CodeInternalError = "internal_error"
|
||||
CodeServiceUnavailable = "service_unavailable"
|
||||
|
||||
// CodeTurnAlreadyClosed marks a user-games command or order rejection
|
||||
// caused by the backend's turn-cutoff guard: the request arrived
|
||||
// after the active turn started generating (runtime status
|
||||
// `generation_in_progress` / `generation_failed` / `engine_unreachable`)
|
||||
// and the engine no longer accepts writes for the closing turn. The
|
||||
// caller is expected to wait for the next `game.turn.ready` push and
|
||||
// resubmit against the new turn.
|
||||
CodeTurnAlreadyClosed = "turn_already_closed"
|
||||
|
||||
// CodeGamePaused marks a user-games command or order rejection caused
|
||||
// by the lobby-side game lifecycle: the game is in `paused`,
|
||||
// `finished`, or any other status that does not accept writes. The
|
||||
// caller is expected to wait for the game to resume before
|
||||
// resubmitting.
|
||||
CodeGamePaused = "game_paused"
|
||||
)
|
||||
|
||||
// Body stores the inner `error` object of the standard envelope.
|
||||
|
||||
@@ -2314,9 +2314,10 @@ components:
|
||||
type: string
|
||||
description: |
|
||||
Stable machine-readable failure marker. The closed set is
|
||||
`not_implemented`, `invalid_request`, `unauthorized`, `not_found`,
|
||||
`conflict`, `method_not_allowed`, `internal_error`,
|
||||
`service_unavailable`.
|
||||
`not_implemented`, `invalid_request`, `unauthorized`,
|
||||
`forbidden`, `not_found`, `conflict`, `method_not_allowed`,
|
||||
`internal_error`, `service_unavailable`,
|
||||
`turn_already_closed`, `game_paused`.
|
||||
enum:
|
||||
- not_implemented
|
||||
- invalid_request
|
||||
@@ -2327,6 +2328,8 @@ components:
|
||||
- method_not_allowed
|
||||
- internal_error
|
||||
- service_unavailable
|
||||
- turn_already_closed
|
||||
- game_paused
|
||||
message:
|
||||
type: string
|
||||
description: Human-readable client-safe failure description.
|
||||
|
||||
Reference in New Issue
Block a user