---
type: task
---

## Problem

There are high-level but serious problems with how garage layout management
is currently done.

In general the strategy around layout management is that each host only
modifies the parts of the cluster layout related to itself, and never touches
the applied roles of other hosts. This works well in every case except one: a
host removing one or more of its own allocations.

There are two separate issues which must be dealt with, each partially
related to the other.

### Draining of garage data

When a garage node is removed from the cluster it first goes into the
"draining" state, so that the other nodes in the cluster can ensure that the
replication factor for each piece of data is met before the node is
decommissioned.

While a node is in the draining state it cannot be used for S3 API calls,
because the bucket credentials are no longer present on it.
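
For reference, this is roughly how draining plays out with garage's own CLI
(a sketch; `<node-id>` and the layout version are placeholders):

```
# Mark the node for removal from the cluster layout.
garage layout remove <node-id>

# Apply the new layout. The node then counts as draining, and stays that
# way until the other nodes have re-replicated its data.
garage layout apply --version <new-version>
```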

### Configuration change on restart

For hosts whose configuration is managed by `daemon.yml`, it is not
necessarily known upon restart that a garage node ever existed. The host
can't inspect the cluster layout because it won't have a garage instance
running, and even if it could, it wouldn't be able to bring up a garage node
to properly drain the old allocations.

## Invalid Solutions

One solution which is tempting but ultimately NOT viable is to make all hosts
run at least one garage instance, with hosts that have no storage allocations
running a "gateway" instance. This won't work, though, because it would
require all hosts to open up the RPC port on their firewall, and firewall
management requires extra user involvement.

Another previous solution was to run an "orphan remover" process on each
host, where the host would compare the garage cluster layout to the expected
layout based on the bootstrap data in the common bucket, and remove from the
layout any hosts which shouldn't be there and don't have a garage instance to
remove themselves with. This had a bunch of unresolvable race conditions, and
it didn't account for draining besides.

## Possible Solution

The solution seems to be that the host must maintain two views of its garage
allocations: the last known allocation state, and the desired allocation
state.

The last known state needs to contain the state each allocation was in
(healthy or draining), along with its directories and capacity. It should be
updated any time the host performs an action which changes it: modifying the
cluster layout to add a new instance or move an existing one to draining, or
actually removing an instance which is done draining.

The desired state is essentially the network configuration as it is now. It
will be used along with the last known state to decide which actions to take.
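
As a rough sketch, the two views might look something like the following in
Go (all of these names are hypothetical, not existing code):

```go
// AllocationKnownState records a single allocation as the host last knew
// it, including whether it has been moved to draining.
type AllocationKnownState struct {
	RPCPort  int    // also identifies the allocation within the layout
	DataDir  string
	MetaDir  string
	Capacity int
	Draining bool // true once the layout change draining it was applied
}

// KnownState is persisted locally, and updated every time the host
// performs an action which changes the cluster layout.
type KnownState struct {
	Allocations []AllocationKnownState
}

// DesiredAllocation mirrors an allocation as given in the network
// configuration; it has no notion of draining.
type DesiredAllocation struct {
	RPCPort  int
	DataDir  string
	MetaDir  string
	Capacity int
}
```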

There are a few details to note with this solution:

- There will need to be a worker which periodically checks the last known
  state for any nodes which are draining, and removes any which have finished
  draining (see the first sketch after this list).
- When the host starts up it should _always_ use the last known state, and
  only once started up should it go on to apply the desired configuration.
- When choosing an admin endpoint to use, the last known state should be
  consulted, even though this might result in unexpected behavior from the
  user's perspective (since the user only knows about the desired state).
  This applies to RPC endpoints as well.
- The last/desired states need to be checked for conflicts, with an error
  emitted in the event that there is one (either returned from SetConfig or
  Load); see the second sketch after this list. Conflicts include a new
  allocation using the same directory as an old one (identified by RPC port),
  or two allocations using the same RPC port.
- The nebula firewall must base its opened ports on the last known state
  rather than on the desired state.
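
A minimal sketch of the draining worker from the first bullet, assuming the
hypothetical types above plus hypothetical helpers (`loadKnownState`,
`allocIsDrained`, `removeFromLayout`, `saveKnownState`):

```go
import (
	"context"
	"log"
	"time"
)

// drainWorker periodically checks the last known state for draining
// allocations, and fully removes any which have finished draining.
func drainWorker(ctx context.Context) {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}

		known, err := loadKnownState()
		if err != nil {
			log.Printf("loading last known state: %v", err)
			continue
		}

		var remaining []AllocationKnownState
		for _, alloc := range known.Allocations {
			if alloc.Draining && allocIsDrained(ctx, alloc) {
				// The node holds no more data; remove it from the cluster
				// layout and drop it from the last known state.
				if err := removeFromLayout(ctx, alloc); err != nil {
					log.Printf("removing allocation on RPC port %d: %v",
						alloc.RPCPort, err)
					remaining = append(remaining, alloc) // retry next tick
				}
				continue
			}
			remaining = append(remaining, alloc)
		}

		known.Allocations = remaining
		if err := saveKnownState(known); err != nil {
			log.Printf("saving last known state: %v", err)
		}
	}
}
```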
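
And a sketch of the conflict check from the fourth bullet, again using the
hypothetical types above; it would be called from both SetConfig and Load:

```go
import "fmt"

// checkConflicts returns an error if the desired allocations conflict with
// each other or with the last known state.
func checkConflicts(known KnownState, desired []DesiredAllocation) error {
	// Index the last known allocations by data directory, so that a
	// desired allocation reusing a directory can be detected.
	knownByDir := map[string]AllocationKnownState{}
	for _, alloc := range known.Allocations {
		knownByDir[alloc.DataDir] = alloc
	}

	seenPorts := map[int]bool{}
	for _, d := range desired {
		// Two allocations may not use the same RPC port.
		if seenPorts[d.RPCPort] {
			return fmt.Errorf("two allocations use RPC port %d", d.RPCPort)
		}
		seenPorts[d.RPCPort] = true

		// A new allocation (different RPC port) may not reuse the data
		// directory of an old one.
		if old, ok := knownByDir[d.DataDir]; ok && old.RPCPort != d.RPCPort {
			return fmt.Errorf(
				"allocation on RPC port %d uses directory %q, which belongs to the allocation on RPC port %d",
				d.RPCPort, d.DataDir, old.RPCPort,
			)
		}
	}
	return nil
}
```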