79 changes: 61 additions & 18 deletions docs/bgp.md
@@ -10,7 +10,7 @@ pod IPs, service IPs, etc.).

This is the default mode. All nodes in the cluster form iBGP peering relationships with the rest of the nodes, forming a
full node-to-node mesh. Each node advertises the pod CIDR allocated to it to its peers (the rest of the nodes in
the cluster). There is no configuration required in this mode. All the nodes in the cluster are associated with the
the cluster). There is no configuration required in this mode. All the nodes in the cluster are associated with the
private ASN 64512 implicitly (which can be configured with the `--cluster-asn` flag), and the use of iBGP is transparent
to users. This mode is suitable for public cloud environments or small cluster deployments.

@@ -30,7 +30,7 @@ kubectl annotate node <kube-node> "kube-router.io/node.asn=64512"

Only nodes within the same ASN form a full mesh. Two nodes with different ASNs never peer.

### Route-Reflector setup Without Full Mesh
### Route-Reflector setup Without Full Mesh

This model supports the common scheme of using a Route Reflector Server node to concentrate peering from client peers.
This has the big advantage of not needing full mesh, and will scale better. In this mode kube-router expects each node
@@ -75,11 +75,45 @@ For example:

### Node Specific External BGP Peers

Alternatively, each node can be configured with one or more node specific BGP peers. Information regarding node specific
BGP peer is read from node API object annotations:
Each node can be configured with one or more node specific BGP peers using the `kube-router.io/peers` node annotation.
Previously, these settings were configured using individual `kube-router.io/peer.*` annotations.
While these individual annotations are still supported, they're now deprecated and
will be removed in a future release.

#### Using Consolidated Annotation

The `kube-router.io/peers` annotation accepts peer configurations in YAML format with the following fields:

- `remoteip` (required): The IP address of the peer
- `remoteasn` (required): The ASN of the peer
- `localip` (optional): Local IP address to use for this peer connection
- `password` (optional): Base64 encoded password for BGP authentication
- `port` (optional): BGP port (defaults to 179 if not specified)

```shell
kubectl annotate node <kube-node> \
  kube-router.io/peers="$(cat <<'EOF'
- remoteip: 192.168.1.99
  remoteasn: 65000
  password: U2VjdXJlUGFzc3dvcmQK
- remoteip: 192.168.1.100
  remoteasn: 65000
  password: U2VjdXJlUGFzc3dvcmQK
EOF
)"
```
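To make the required/optional split in the field list concrete, here is a minimal Go sketch of the checks a consumer of the annotation might apply to each entry. The `peerEntry` type and `checkRequired` function are hypothetical names used only for illustration; they are not kube-router's API, and treating a zero `remoteasn` as "missing" is a simplifying assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// peerEntry is a hypothetical stand-in for one item of the
// kube-router.io/peers list; only the required fields are modelled.
type peerEntry struct {
	RemoteIP  string
	RemoteASN uint32
}

// checkRequired returns an error when a required field is missing or
// when remoteip does not parse as an IP address.
func checkRequired(p peerEntry) error {
	if p.RemoteIP == "" {
		return errors.New("remoteip is required")
	}
	if net.ParseIP(p.RemoteIP) == nil {
		return fmt.Errorf("%s is not a valid IP address", p.RemoteIP)
	}
	if p.RemoteASN == 0 { // assumption: zero means the field was omitted
		return errors.New("remoteasn is required")
	}
	return nil
}

func main() {
	peers := []peerEntry{
		{RemoteIP: "192.168.1.99", RemoteASN: 65000},
		{RemoteIP: "192.168.1.100"}, // remoteasn omitted on purpose
	}
	for _, p := range peers {
		fmt.Println(p.RemoteIP, checkRequired(p))
	}
}
```

The first entry passes and the second is rejected because `remoteasn` is absent.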

#### Using Individual Annotations (Deprecated)

> **NOTE:** The individual peer annotations listed below are deprecated in favor of the consolidated `kube-router.io/peers`
> annotation. They are maintained for backward compatibility but will be removed in a future release.

Node-specific BGP peer configs can also be set via individual node API object annotations:

- `kube-router.io/peer.ips`
- `kube-router.io/peer.asns`
- `kube-router.io/peer.passwords`
- `kube-router.io/peer.localips`

For example, users can annotate node object with below commands:

@@ -106,26 +140,23 @@ kubectl annotate node <kube-node> "kube-router.io/path-prepend.repeat-n=5"

### BGP Peer Local IP configuration

In some setups it might be desirable to set a local IP address used for connecting external BGP peers. This can be
accomplished on nodes with annotations:
In some setups it might be desirable to set a local IP address used for connecting external BGP peers.

- `kube-router.io/peer.localips`

If set, this must be a list with a local IP address for each peer, or left empty to use nodeIP.
When using the `kube-router.io/peers` annotation, specify the `localip` field for each peer as shown in the
[Node Specific External BGP Peers](#node-specific-external-bgp-peers) section above.

Example:
When using individual annotations, you can specify the local IP address using `kube-router.io/peer.localips`:

```shell
kubectl annotate node <kube-node> "kube-router.io/peer.localips=10.1.1.1,10.1.1.2"
```

This will instruct kube-router to use IP `10.1.1.1` for first BGP peer as a local address, and use `10.1.1.2`for the
second.
If set, this must be a list with a local IP address for each peer, or left empty to use nodeIP.

### BGP Peer Password Authentication

The examples above have assumed there is no password authentication with BGP peer routers. If you need to use a password
for peering, you can use the `--peer-router-passwords` command-line option, the `kube-router.io/peer.passwords` node
If you need to use a password for peering with BGP peer routers, you can configure it using the `kube-router.io/peers`
annotation, the `--peer-router-passwords` command-line option, the deprecated `kube-router.io/peer.passwords` node
annotation, or the `--peer-router-passwords-file` command-line option.

#### Base64 Encoding Passwords
@@ -142,7 +173,15 @@ U2VjdXJlUGFzc3dvcmQ=

#### Password Configuration Examples

In this CLI flag example the first router (192.168.1.99) uses a password, while the second (192.168.1.100) does not.
**Using the consolidated annotation (recommended):**

When using the `kube-router.io/peers` annotation, specify the `password` field with a base64 encoded password for each
peer that requires authentication. See the
[Node Specific External BGP Peers](#node-specific-external-bgp-peers) section for an example.
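As a rough illustration of what "base64 encoded password for each peer" means in practice, the Go sketch below decodes a per-peer list using only the standard library. The helper name `decodePeerPasswords` is hypothetical, and treating an empty entry as "no authentication" is an assumption carried over from the blank-item convention of the flag and annotation forms.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodePeerPasswords decodes one base64 password per peer.
// An empty entry is kept as an empty string (assumed to mean "no password").
func decodePeerPasswords(encoded []string) ([]string, error) {
	decoded := make([]string, len(encoded))
	for i, item := range encoded {
		if item == "" {
			continue
		}
		raw, err := base64.StdEncoding.DecodeString(item)
		if err != nil {
			return nil, fmt.Errorf("password for peer %d: %w", i, err)
		}
		decoded[i] = string(raw)
	}
	return decoded, nil
}

func main() {
	// The sample value used in these docs decodes to "SecurePassword"
	// followed by a trailing newline.
	pws, err := decodePeerPasswords([]string{"U2VjdXJlUGFzc3dvcmQK", ""})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", pws)
}
```

Printing with `%q` makes any trailing whitespace in a decoded password visible, which can help when a peer rejects authentication unexpectedly.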

**Using CLI flags:**

In this example the first router (192.168.1.99) uses a password, while the second (192.168.1.100) does not:

```sh
--peer-router-ips="192.168.1.99,192.168.1.100"
@@ -152,14 +191,18 @@ In this CLI flag example the first router (192.168.1.99) uses a password, while

Note the comma indicating the end of the first password.

Here's the same example but configured as node annotations:
**Using individual annotations (deprecated):**

Here's the same example but configured with individual node annotations:

```shell
kubectl annotate node <kube-node> "kube-router.io/peer.ips=192.168.1.99,192.168.1.100"
kubectl annotate node <kube-node> "kube-router.io/peer.asns=65000,65000"
kubectl annotate node <kube-node> "kube-router.io/peer.passwords=U2VjdXJlUGFzc3dvcmQK,"
```

**Using a password file:**

Finally, to include peer passwords as a file you would run kube-router with the following option:

```shell
@@ -168,8 +211,8 @@ Finally, to include peer passwords as a file you would run kube-router with the
--peer-router-passwords-file="/etc/kube-router/bgp-passwords.conf"
```

The password file, closely follows the syntax of the command-line and node annotation options.
Here, the first peer IP (192.168.1.99) would be configured with a password, while the second would not.
The password file closely follows the syntax of the command-line and node annotation options.
Here, the first peer IP (192.168.1.99) would be configured with a password, while the second would not:

```sh
U2VjdXJlUGFzc3dvcmQK,
3 changes: 2 additions & 1 deletion go.mod
@@ -9,6 +9,8 @@ require (
github.com/ccoveille/go-safecast v1.6.1
github.com/coreos/go-iptables v0.8.0
github.com/docker/docker v28.4.0+incompatible
github.com/goccy/go-yaml v1.18.0
github.com/google/go-cmp v0.7.0
github.com/hashicorp/go-version v1.7.0
github.com/moby/ipvs v1.1.0
github.com/onsi/ginkgo v1.16.5
@@ -65,7 +67,6 @@ require (
github.com/go-openapi/swag v0.23.0 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/google/gnostic-models v0.7.0 // indirect
github.com/google/go-cmp v0.7.0 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/hashicorp/hcl v1.0.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
2 changes: 2 additions & 0 deletions go.sum
@@ -95,6 +95,8 @@ github.com/go-task/slim-sprig/v3 v3.0.0 h1:sUs3vkvUymDpBKi3qH1YSqBQk9+9D/8M2mN1v
github.com/go-task/slim-sprig/v3 v3.0.0/go.mod h1:W848ghGpv3Qj3dhTPRyJypKRiqCdHZiAzKg9hl15HA8=
github.com/go-test/deep v1.1.0 h1:WOcxcdHcvdgThNXjw0t76K42FXTU7HpNQWHpA2HHNlg=
github.com/go-test/deep v1.1.0/go.mod h1:5C2ZWiW0ErCdrYzpqxLbTX7MG14M9iiw8DgHncVwcsE=
github.com/goccy/go-yaml v1.18.0 h1:8W7wMFS12Pcas7KU+VVkaiCng+kG8QiFeFwzFb+rwuw=
github.com/goccy/go-yaml v1.18.0/go.mod h1:XBurs7gK8ATbW4ZPGKgcbrY1Br56PdM69F7LkFRi1kA=
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
205 changes: 205 additions & 0 deletions pkg/bgp/peer_config.go
@@ -0,0 +1,205 @@
package bgp

import (
	"errors"
	"fmt"
	"net"
	"strconv"

	"github.com/cloudnativelabs/kube-router/v2/pkg/options"
	"github.com/cloudnativelabs/kube-router/v2/pkg/utils"
	"github.com/goccy/go-yaml"
)

type PeerConfig struct {
	LocalIP   *string             `yaml:"localip"`
	Password  *utils.Base64String `yaml:"password"`
	Port      *uint32             `yaml:"port"`
	RemoteASN *uint32             `yaml:"remoteasn"`
	RemoteIP  *net.IP             `yaml:"remoteip"`
}

func (p *PeerConfig) UnmarshalYAML(raw []byte) error {
	tmp := struct {
		LocalIP   *string             `yaml:"localip"`
		Password  *utils.Base64String `yaml:"password"`
		Port      *uint32             `yaml:"port"`
		RemoteASN *uint32             `yaml:"remoteasn"`
		RemoteIP  string              `yaml:"remoteip"`
	}{}

	if err := yaml.Unmarshal(raw, &tmp); err != nil {
		return fmt.Errorf("failed to unmarshal peer config: %w", err)
	}

	p.LocalIP = tmp.LocalIP
	p.Password = tmp.Password
	p.Port = tmp.Port
	p.RemoteASN = tmp.RemoteASN

	if tmp.RemoteIP != "" {
		ip := net.ParseIP(tmp.RemoteIP)
		if ip == nil {
			return fmt.Errorf("%s is not a valid IP address", tmp.RemoteIP)
		}
		p.RemoteIP = &ip
	}
	return nil
}

type PeerConfigs []PeerConfig

func (p PeerConfigs) LocalIPs() []string {
	localIPs := make([]string, 0)
	for _, cfg := range p {
		if cfg.LocalIP != nil {
			localIPs = append(localIPs, *cfg.LocalIP)
		}
	}
	return localIPs
}

// Returns b64 decoded passwords
func (p PeerConfigs) Passwords() []string {
	passwords := make([]string, 0)
	for _, cfg := range p {
		if cfg.Password != nil {
			passwords = append(passwords, string(*cfg.Password))
		}
	}
	return passwords
}

func (p PeerConfigs) Ports() []uint32 {
	ports := make([]uint32, 0)
	for _, cfg := range p {
		if cfg.Port != nil {
			ports = append(ports, *cfg.Port)
		}
	}
	return ports
}

func (p PeerConfigs) RemoteASNs() []uint32 {
	asns := make([]uint32, 0)
	for _, cfg := range p {
		if cfg.RemoteASN != nil {
			asns = append(asns, *cfg.RemoteASN)
		}
	}
	return asns
}

func (p PeerConfigs) RemoteIPs() []net.IP {
	remoteIPs := make([]net.IP, 0)
	for _, cfg := range p {
		if cfg.RemoteIP != nil {
			remoteIPs = append(remoteIPs, *cfg.RemoteIP)
		}
	}
	return remoteIPs
}

func (p PeerConfigs) RemoteIPStrings() []string {
	remoteIPs := make([]string, 0)
	for _, cfg := range p {
		if cfg.RemoteIP != nil {
			remoteIPs = append(remoteIPs, cfg.RemoteIP.String())
		}
	}
	return remoteIPs
}

func (p *PeerConfigs) UnmarshalYAML(raw []byte) error {
	type tmpPeerConfigs PeerConfigs
	tmp := (*tmpPeerConfigs)(p)

	if err := yaml.Unmarshal(raw, tmp); err != nil {
		return err
	}

	return p.Validate()
}

func (p PeerConfigs) Validate() error {
	return validatePeerConfigs(p.RemoteIPStrings(), p.RemoteASNs(), p.Ports(), p.Passwords(), p.LocalIPs(), "")
}

func NewPeerConfigs(
	remoteIPs []string,
	remoteASNs []uint32,
	ports []uint32,
	b64EncodedPasswords []string,
	localIPs []string,
	localAddress string,
) (PeerConfigs, error) {
	if err := validatePeerConfigs(remoteIPs, remoteASNs, ports, b64EncodedPasswords, localIPs, localAddress); err != nil {
		return nil, err
	}

	peerCfgs := make(PeerConfigs, len(remoteIPs))
	for i, remoteIP := range remoteIPs {
		ip := net.ParseIP(remoteIP)
		if ip == nil {
			return nil, fmt.Errorf("invalid IP address: %s", remoteIP)
		}
		peerCfgs[i].RemoteIP = &ip
		peerCfgs[i].RemoteASN = &remoteASNs[i]

		if len(ports) != 0 {
			peerCfgs[i].Port = &ports[i]
		}

		if len(b64EncodedPasswords) != 0 {
			pw := utils.Base64String(b64EncodedPasswords[i])
			peerCfgs[i].Password = &pw
		}

		if len(localIPs) != 0 && localIPs[i] != "" {
			peerCfgs[i].LocalIP = &localIPs[i]
		}
	}

	return peerCfgs, nil
}

func validatePeerConfigs(
	remoteIPs []string,
	remoteASNs []uint32,
	ports []uint32,
	b64EncodedPasswords []string,
	localIPs []string,
	localAddress string,
) error {
	if len(remoteIPs) != len(remoteASNs) {
		return errors.New("invalid peer router config, the number of IPs and ASN numbers must be equal")
	}
	if len(remoteIPs) != len(b64EncodedPasswords) && len(b64EncodedPasswords) != 0 {
		return errors.New("invalid peer router config. The number of passwords should either be zero, or " +
			"one per peer router. Use blank items if a router doesn't expect a password. Example: \"pass,,pass\" " +
			"OR [\"pass\",\"\",\"pass\"]")
	}
	if len(remoteIPs) != len(ports) && len(ports) != 0 {
		return fmt.Errorf("invalid peer router config. The number of ports should either be zero, or "+
			"one per peer router. If blank items are used, it will default to standard BGP port, %s. "+
			"Example: \"port,,port\" OR [\"port\",\"\",\"port\"]", strconv.Itoa(options.DefaultBgpPort))
Collaborator


If we return an error here, then we wouldn't actually use the default option right? I believe that returning an error here would mean that the BGP server would fail to start in startBGPServer() right?

It's possible that I misunderstood the code path here, but if I'm right, then we may want to change the helpers above to have an else statement so that we always have a default. Then remove the default blurb from this message, as at this point we'll already have the default.

Alternatively we could turn these into a warn or an info log rather than an error.

Author

@catherinetcai, Nov 18, 2025


I looked back at the original implementation and while I think this validation causes there to be an earlier exit, the end result seems to ultimately be the same.

Going to just write out my interpretation of the original logic since it's very likely I'm just reading this incorrectly:

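  • We grab the list of peer ports [here](https://github.com/cloudnativelabs/kube-router/blob/00b46196001975939be6c1672f42a1f8e7990cc9/pkg/controllers/routing/network_routes_controller.go#L1134-L1148)
  • We then pass that list of peer ports (along with everything else) to [newGlobalPeers](https://github.com/cloudnativelabs/kube-router/blob/00b46196001975939be6c1672f42a1f8e7990cc9/pkg/controllers/routing/bgp_peers.go#L303-L307). This is where I lifted the validation logic from. This does the same length check and returns an error if the length of the ports doesn't match the length of IPs.
  • If an error is returned from newGlobalPeers, we end up [calling StopBgp](https://github.com/cloudnativelabs/kube-router/blob/00b46196001975939be6c1672f42a1f8e7990cc9/pkg/controllers/routing/network_routes_controller.go#L1199-L1206)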

I wonder based off of this comment if the original implementation had a bug and was supposed to actually look like this:

// Default to default BGP port if port annotation is not found
var peerPorts = make([]uint32, len(peerIPs))

for i := range peerPorts {
  peerPorts[i] = options.DefaultBgpPort
}

Or - the validation was supposed to warn (like you pointed out) and not actually return an error, since later in newGlobalPeers, we do construct the Peer structs with a default port.


Anyway - I think this might be a moot point since I don't think this was desirable behavior to begin with, so I'll update this MR to follow what you were suggesting 😄

Author


Ah never mind. You can ignore me. I'm just apparently arguing with myself now. The thing I was missing again was that the original implementation meant you had peer ports in the form of 179,,179 to make it pass the length check. Sorry - I'm finally picking up what you're putting down now.

Author


Alright, I added a few more unit test cases to the PeerConfig/PeerConfigs and I think I've gotten the logic to be correctly backwards compatible now by doing the following on PeerConfigs:

  • Ports() returns the default port if it's not set on a PeerConfig
  • LocalIPs() will return an empty string if it's not set on a PeerConfig

This means that the validation should now behave correctly when it comes to the length checks and won't return an error.
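For readers following the thread, the accessor behaviour described in this comment could look roughly like the sketch below. It is not the code from the updated PR: the lower-case types, the `defaultBGPPort` constant (assumed to be 179, matching the documented default) and the trimmed-down struct are illustrative assumptions.

```go
package main

import "fmt"

// defaultBGPPort is assumed to match the documented default BGP port.
const defaultBGPPort uint32 = 179

// peerConfig is a reduced, hypothetical version of PeerConfig.
type peerConfig struct {
	Port    *uint32
	LocalIP *string
}

type peerConfigs []peerConfig

// Ports returns exactly one port per peer, substituting the default
// when a peer does not set one, so length checks always line up.
func (p peerConfigs) Ports() []uint32 {
	ports := make([]uint32, 0, len(p))
	for _, cfg := range p {
		if cfg.Port != nil {
			ports = append(ports, *cfg.Port)
		} else {
			ports = append(ports, defaultBGPPort)
		}
	}
	return ports
}

// LocalIPs returns one entry per peer, using "" when no local IP is set.
func (p peerConfigs) LocalIPs() []string {
	ips := make([]string, 0, len(p))
	for _, cfg := range p {
		if cfg.LocalIP != nil {
			ips = append(ips, *cfg.LocalIP)
		} else {
			ips = append(ips, "")
		}
	}
	return ips
}

func main() {
	custom := uint32(1179)
	cfgs := peerConfigs{{Port: &custom}, {}}
	fmt.Println(cfgs.Ports(), len(cfgs.LocalIPs()))
}
```

Because both accessors always return one element per peer, a length comparison against the list of remote IPs can no longer fail merely because some peers omitted an optional field.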

	}
	if len(remoteIPs) != len(localIPs) && len(localIPs) != 0 {
		return fmt.Errorf("invalid peer router config. The number of localIPs should either be zero, or "+
			"one per peer router. If blank items are used, it will default to nodeIP, %s. "+
			"Example: \"10.1.1.1,,10.1.1.2\" OR [\"10.1.1.1\",\"\",\"10.1.1.2\"]", localAddress)
	}

	for _, asn := range remoteASNs {
		if (asn < 1 || asn > 23455) &&
			(asn < 23457 || asn > 63999) &&
			(asn < 64512 || asn > 65534) &&
			(asn < 131072 || asn > 4199999999) &&
			(asn < 4200000000 || asn > 4294967294) {
			return fmt.Errorf("reserved ASN number \"%d\" for global BGP peer",
				asn)
		}
	}

	return nil
}
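The chained ASN condition above is easier to reason about when read as "reject anything outside these five ranges". A standalone sketch of the same check, with the hypothetical helper name `isAllowedPeerASN` and the range boundaries copied from the loop above, might look like this:

```go
package main

import "fmt"

// isAllowedPeerASN reports whether asn falls inside one of the ranges
// accepted by the validation loop; everything else is treated as reserved.
func isAllowedPeerASN(asn uint32) bool {
	switch {
	case asn >= 1 && asn <= 23455:
	case asn >= 23457 && asn <= 63999:
	case asn >= 64512 && asn <= 65534:
	case asn >= 131072 && asn <= 4199999999:
	case asn >= 4200000000 && asn <= 4294967294:
	default:
		return false
	}
	return true
}

func main() {
	for _, asn := range []uint32{0, 23456, 64000, 64512, 65000, 65535} {
		fmt.Println(asn, isAllowedPeerASN(asn))
	}
}
```

Values such as 0, 23456, 64000 through 64511, 65535 through 131071 and 4294967295 fall in the gaps between the ranges and are therefore rejected.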