Exploding Large Go Structs

May 3

One of the more useful ways of examining a piece of software is to look at the marginal cost of changing it.

Consider this program. Given an employee ID, it collects information about them from an HR portal, a chat program, and a site where employees discuss their favorite ice cream flavors. It prints that information to stdout:

/*somecompany/profile/main.go

Usage: go run main.go EMPLOYEE
Example: go run main.go @jturner

This program aggregates employee profile information from our HR portal, Chat, and our internal Ice Cream Preferences service and prints this information to stdout.
*/
import (
    "fmt"
    "os"

    "somecompany/profile/hrportal"
    "somecompany/profile/icecream"
    "somecompany/profile/config"
    "somecompany/profile/chat"
)

func main() {
    conf := config.Load()
    employee := os.Args(1)
    out := lookupProfile(conf, employee)
    fmt.Println(out)
}

func lookupProfile(conf config.Config, employee string) string {
    hrClient := getHRClient(conf)
    chatClient := getChatClient(conf)
    iceCreamClient := getIceCreamClient(conf)

    hrInfo := hrClient.GetInfo(employee)
    chatInfo := chatClient.GetInfo(employee)
    iceCreamInfo := iceCreamClient.GetInfo(employee)
    return combine(hrInfo, chatInfo, iceCreamInfo)
}

func getHRClient(conf config.Config) *hrportal.Client {
    return &hrportal.Client{Username: conf.Username, password: conf.Password}
}

func getChatClient(conf config.Config) *chat.Client {
    return &chat.Client{Token: conf.ChatToken}
}

func getIceCreamClient(conf config.Config) *icecream.Client {
    return &icecream.Client{Username: conf.IceCreamUsername, Password: conf.IceCreamPassword}
}

func combine(hr, chat, iceCreamPref string) string {
    fmtString := `
    HR Profile:
        %s
    Chat Profile Information:
        %s
    Ice Cream Preferences:
        %s
    `
    return fmt.Sprintf(fmtString, hr, chat, iceCreamPref)
}

What's the most likely way that we'd need to change this program? I'd bet that it would be adding a new data source to pull employee information from. To add a new data source using the currently established pattern, we would:

Add new fields to the Config struct containing auth information for the new data source
Add a getNewDataSource(config.Config) *DataSourceClient function
Extract the fields from that Config struct that our new data source needs and use them to construct the client
Call the getNewDataSource function from the lookupProfile function
Add a call to GetInfo or a similar function on the new client
Add that new info to the combine function

I'd quantify the marginal cost of adding a new data source as the sum of:

Adding a new thing that depends on the Config struct
Adding a dependency to lookupProfile
Some amount of code that will have to change if we decide to rethink how we add data sources
However you want to quantify the fact that we made lookupProfile harder to test

This cost is probably acceptable if you aren't going to add new data sources often, but it gets steep quickly. Adding a single new data source like this is easy. At ten new data sources the Config will probably start to be very large and bugs in lookupProfile will become hard to fix. At 100 new data sources you'll probably have to start greping the fields in Config, and the fact that you can't easily test lookupProfile will become an impediment to confidently shipping code. At 1000... you get the idea.

If we wanted to reduce the marginal cost of adding a new data source, there are two things we need to change: we need to loosen the coupling with the Config struct, and we need to make lookupProfile easier to test. We can do this by first "exploding" the Config struct, and then rethinking how we handle dependencies in lookupProfile.

Exploding the Config Struct

"Exploding" a struct is my term for refactoring functions to explicitly accept a subset of a struct's fields instead of the whole struct. I call it "exploding" because in my mind we're taking a large struct, blowing it into pieces, and then distributing those pieces to the things that need them. If this makes no sense to you, it's because explaining one's mental models is hard, and brains are weird sometimes. Just roll with it, and feel free to come up with your own term for this pattern. Or if you prefer, there's almost certainly already a name for it in refactoring books, and you should feel free to seek out and use that instead.

Especially in cases where a struct is passed through different callers, I find it helpful to start by writing down the "tree" of those function calls. This allows me to start "exploding" the struct at the leafs of the tree, and then follow compiler error messages until I get to the root of the tree where the struct is instantiated. For the config, that would be something like:

main (config created)
|-- lookupProfile(config)
|-- |-- getIceCreamClient(config) <- start here...
|-- |-- getChatClient(config) <- ...or here...
|-- |-- getHRClient(config) <- ...or here

To "explode" the struct for a function, simply change the function arguments from the struct to the struct fields that the function needs. For getIceCreamClient, we would do:

// Before
func getIceCreamClient(conf config.Config) *icecream.Client {
    return &icecream.Client{Username: conf.IceCreamUsername, Password: conf.IceCreamPassword}
}

// After
func getIceCreamClient(username, password string) *icecream.Client {
    return &icecream.Client{Username: username, Password: password}
}

And then do the same for the other leafs:

// Before
func getChatClient(conf config.Config) *chat.Client {
    return &chat.Client{Token: conf.ChatToken}
}

// After
func getChatClient(token string) *chat.Client {
    return &chat.Client{Token: token}
}

// Before
func getHRClient(conf config.Config) *hrportal.Client {
    return &hrportal.Client{Username: conf.Username, password: conf.Password}
}

// After
func getHRClient(username, password string) *hrportal.Client {
    return &hrportal.Client{Username: username, password: password}
}

Now the compiler is yelling at us. Let's make it happy by updating the call sites of those functions.

// Before
func lookupProfile(conf config.Config, employee string) string {
    hrClient := getHRClient(conf)
    chatClient := getChatClient(conf)
    iceCreamClient := getIceCreamClient(conf)
    // snip
}

// After
func lookupProfile(conf config.Config, employee string) string {
    hrClient := getHRClient(conf.HRUsername, conf.HRPassword)
    chatClient := getChatClient(conf.ChatToken)
    iceCreamClient := getIceCreamClient(conf.IceCreamUsername, conf.IceCreamPassword)
    // snip
}

We want to "explode" the Config all the way up the tree until it's only referenced where it's created, so let's update lookupProfile next:

// Before
func lookupProfile(conf config.Config, employee string) string {
    hrClient := getHRClient(conf.HRUsername, conf.HRPassword)
    chatClient := getChatClient(conf.ChatToken)
    iceCreamClient := getIceCreamClient(conf.IceCreamUsername, conf.IceCreamPassword)
    // snip
}

// After
func lookupProfile(hrUsername, hrPassword, chatToken, iceCreamUsername, iceCreamPassword, employee string) string {
    hrClient := getHRClient(hrUsername, hrPassword)
    chatClient := getChatClient(chatToken)
    iceCreamClient := getIceCreamClient(iceCreamUsername, iceCreamPassword)
    // snip
}

And update the call site:

// Before
func main() {
    conf := config.Load()
    employee := os.Args(1)
    out := lookupProfile(conf, employee)
    fmt.Println(out)
}

// After
func main() {
    conf := config.Load()
    employee := os.Args(1)
    out := lookupProfile(conf.HRUsername, conf.HRPassword, conf.ChatToken, conf.IceCreamUsername, conf.IceCreamPassword, employee)
    fmt.Println(out)
}

We're now left with this code:

func main() {
    conf := config.Load()
    employee := os.Args(1)
    out := lookupProfile(conf.HRUsername, conf.HRPassword, conf.ChatToken, conf.IceCreamUsername, conf.IceCreamPassword, employee)
    fmt.Println(out)
}

func lookupProfile(hrUsername, hrPassword, chatToken, iceCreamUsername, iceCreamPassword, employee string) string {
    hrClient := getHRClient(hrUsername, hrPassword)
    chatClient := getChatClient(chatToken)
    iceCreamClient := getIceCreamClient(iceCreamUsername, iceCreamPassword)

    hrInfo := hrClient.GetInfo(employee)
    chatInfo := chatClient.GetInfo(employee)
    iceCreamInfo := iceCreamClient.GetInfo(employee)
    return combine(hrInfo, chatInfo, iceCreamInfo)
}

func getHRClient(username, password string) *hrportal.Client {
    return &hrportal.Client{Username: username, password: password}
}

func getChatClient(token string) *chat.Client {
    return &chat.Client{Token: token}
}

func getIceCreamClient(username, password string) *icecream.Client {
    return &icecream.Client{Username: username, Password: password}
}

func combine(hr, chat, iceCreamPref string) string {
    fmtString := `
    HR Profile:
        %s
    Chat Profile Information:
        %s
    Ice Cream Preferences:
        %s
    `
    return fmt.Sprintf(fmtString, hr, chat, iceCreamPref)
}

Reassessing

With every refactor, we want to ask "what have we improved?" and "what are the tradeoffs that we've made?"

What have we improved? The biggest improvement is that the dependencies of our functions are clearer. Most of the functions that accepted a Config didn't actually need the Config, they just needed some strings from it. This refactor has made that fact more explicit, which in turn has mode those functions more portable.

What trade offs have we made? The biggest one is that most of our functions now require more parameters, which makes them slightly more cumbersome to type. In particular, lookupProfile has expanded from two arguments to six. Worse, it has traded taking a richer object for a bunch of primitives, a code smell sometimes referred to as primitive obsession. So while this refactor has reduced the marginal cost of adding a new data source by removing the pattern of everything depending on Config, it has made our code worse from the view of lookupProfile. And, we still can't test lookupProfile easily. Let's do another iteration and see if we can't improve it some more.

Inverting dependencies

lookupProfile depends on three clients, one for each of the data sources it pulls information from. In a small program like this, creating those dependencies inside of the function that depends on them is fine. It'd be perfectly reasonable to back up a step, change the signature to be lookupProfile(conf config.Config, employee string), and move on with our lives. However, if we ever wanted to test lookupProfile without making real API calls, we'd have to mock out the behavior of getHRClient, etc. Go (thankfully) makes messing with the internals of a function extremely difficult. The normal way that you work around this in Go is using a technique that I've heard referred to as both "dependency inversion" and "dependency injection." I prefer the former because "dependency injection" is often associated with Dependency Injection (DI) frameworks in languages like Java, which are ~~great for making your code incomprehensible~~ a different topic than the technique discussed here. "Dependency inversion" says that if our function, class, struct, etc. depends on something, that dependency should be supplied by the caller. We can apply that technique here by having lookupProfile accept the clients that it needs rather than constructing them:

import (
    "fmt"
    "os"

    "somecompany/profile/hrportal"
    "somecompany/profile/chat"
    "somecompany/profile/config"
    "somecompany/profile/icecream"
)

func main() {
    employee := os.Args(1)
    conf := config.Load()
    hrPortalClient := &hrportal.Client{Username: conf.HRUsername, Password: conf.HRPassword}
    chatClient := &chat.Client{conf.ChatToken}
    iceCreamClient := &icecream.Client{conf.IceCreamUsername, conf.IceCreamPassword}
    out := lookupProfile(hrPortalClient, chatClient, iceCreamClient, employee)
    fmt.Println(out)
}

func lookupProfile(hrClient *hrportal.Client, chatClient *chat.Client, iceCreamClient *icecream.Client, employee string) {
    hrInfo := hrClient.GetInfo(employee)
    chatInfo := chatClient.GetInfo(employee)
    iceCreamInfo := iceCreamClient.GetInfo(employee)
    return combine(hrInfo, chatInfo, iceCreamInfo)
}

Reassessing again

What have we improved? lookupProfile is now testable with mock clients. More importantly, removing the dependency creation code makes it clear that each client is doing the same thing. This can be useful information if we're trying to add a new client. It even hints that an InfoGetter interface might be useful in the future. Removing this code also reveals that lookupProfile might not be necessary. If we don't need the portability that a function provides then it's an unnecessary layer of indirection.

The code that most of us work in every day is likely to be quite a bit more complex than this example. Even so, the heuristic is the same: a large struct depended on by many functions, none of which use the whole struct, and some of which only pass the struct to other functions. The next time you see code like that, try exploding the struct and see if the result looks better. If it doesn't, that's what git reset is for. If you're not sure, ask another maintainer of the codebase. After all, evolving a codebase is a team sport.

John Turner

Exploding Large Go Structs

Exploding the Config Struct

Reassessing

Inverting dependencies

Reassessing again

Just Send The Message

Using C4 Diagrams to Model Reliability