Practical SOA / microservices - Hydration - Part 2
In part 1 we saw how we can use hydration, a modern form of edge side include, to bring together data from multiple services. In this part we'll look at the code. This isn't an end-to-end solution that you can just drop into your system. It's just something to get you headed in the right direction.
Responses
We actually have to consider four distinct types of responses:
- cached + hydrated
- uncached + hydrated
- cached + unhydrated
- uncached + gunhydrated
The request that stands apart from the others is uncached+unhydrated. It's unique because it's the only one that doesn't need us to explicitly read the data into memory -- we can pipe one end of to the other. I'm actually not going to show how to do that because it adds quite a bit of complexity, and we can simply use the cached+unhydrated model response type. We might pay the price of an extra copy, but that's nothing. (To be clear, I'd have to think hard about how to implement this cleanly, I've never handled this case specifically, and it's never been a problem.)
First we want a Response
interface:
type Response interface {
Header() http.Header
Status() int
Body() []byte
}
This is the type that actually gets returned out of the middleware chain to the main handler. A very simple handler might look like:
func handler(output http.ResponseWriter, req *http.Request) {
response, err := Proxy(req)
if err != nil {
log.Println(err.Error())
output.WriteHeader(500)
return
}
for k, v := range response.Header() {
response.Header()[k] = v
}
body := response.Body()
output.Header()["Content-Length"] = []string{strconv.Itoa(len(body))}
output.WriteHeader(response.Status())
output.Write(body)
}
The proxy would really be a chain of middlewares which handle logs and maybe authentication, caching and so on. I won't actually show the caching here, but I'll make it so that data is cacheable. To keep this relatively simple though, we'll just have a single "middleware" called Proxy
:
func Proxy(req *http.Request) (Response, error) {
upstreamRequest := createRequest(req)
response, err := http.DefaultClient.Do(upstreamRequest)
if err != nil {
return nil, err
}
readResponse := NewNormalResponse(response)
hydrateField := response.Header.Get("X-Hydrate")
if len(hydrateField) == 0 {
return readResponse, nil
}
return NewHydrateResponse(readResponse, hydrateField)
}
From the above code, all we need to do is complete NewNormalResponse
and NewHydrateResponse
functions and we're done. (createRequest
converts the incoming *http.Request
into the request to send to your service; it's your routing logic.)
NormalResponse
I'll start with handling the case where we aren't hydrating the data. This works for both cachable and uncacheable requests. The NormalResponse
implementation is simple:
type NormalResponse struct {
status int
header http.Header
body []byte
}
func (r *NormalResponse) Header() http.Header {
return r.header
}
func (r *NormalResponse) Status() int {
return r.status
}
func (r *NormalResponse) Body() []byte {
return r.body
}
And converting our service's response into NormalResponse
is mostly a matter of copying values:
func NewNormalResponse(response *http.Response) Response {
var body []byte
length := response.ContentLength
if length > 0 {
body = make([]byte, length)
io.ReadFull(response.Body, body)
} else if length == -1 {
buffer := bytes.NewBuffer(make([]byte, 0, 16384))
io.Copy(buffer, response.Body)
body = buffer.Bytes()
// if we're going to cache this request
// we should consider trimming the buffer
}
response.Body.Close()
return &NormalResponse{
status: response.StatusCode,
header: response.Header,
body: body,
}
}
There's a comment in the block where the content length isn't known. This something which has frustrated me before.
We now have a NormalResponse
which handles the two cases, cached and uncached, where we aren't doing hydration. We can store this object in memory, or not, and return it to our handler.
HydrateResponse
Finally, we get to the point of this post: creating a response object that we can both cache and hydrate. As a reminder, we want to turn the following JSON into something that lets us efficiently resolve the references on the fly:
{
"page": 1,
"total": 54,
"results": [
{
"!ref": {
"id": "9001p",
"type": "product"
}
},
{
"!ref": {
"id": "322p",
"type": "product"
}
},
...
]
}
Because references can be deeply nested, there's really only one way to do this: don't parse this into JSON. It might seem like a hack, but I guarantee you that the code ends up being simpler and considerably faster.
What we want to do is break this data into parts
. There's two types of parts
: LiteralPart
and ReferencePart
type LiteralPart []byte
type ReferencePart struct {
id string
t string
}
Before we jump into how to create these parts, I want to show you how it gets sent to the user. Our Response
interface demands a method, Body
, which exposes the response as a []byte
. Here's the implementation for a HydrateRespone
:
func (res *HydrateResponse) Body() []byte {
// should use a pre-allocated buffer pool
// to minimize the amount of allocations
buffer := bytes.NewBuffer(make([]byte, 0, 16384))
for _, part := range res.parts {
buffer.Write(part.Render())
}
return buffer.Bytes()
}
Essentially, each part knows how to render itself. The response just glues them together. Speaking of the response, here's the structure:
type HydrateResponse struct {
status int
header http.Header
parts []Part
}
From the two above snippets, we know that the Part
interface looks like:
type Part interface {
Render() []byte
}
Let's keep going down this road. Here's the Render
code for our two part types:
func (p LiteralPart) Render() []byte {
return p
}
func (p *ReferencePart) Render() []byte {
return Get(p.id, p.t)
}
Get
is a placeholder. Maybe you're holding the objects in memory, or maybe in Redis or a relational database. For testing purposes, this is what I used:
func Get(id, t string) []byte {
return []byte(`"id":"12321p"`)
}
Notice the data isn't enclosed in braces. I know, I know. It sucks. It makes the parsing a lot easier though ... definitely fixable if you're so inclined.
There's only one piece left: creating our Parts
. To re-cap though, the plan is to take our response, split it into alternating LiteralPart
and ReferencePart
. I say alternating because most of the time you'll have have some actual data, followed by a reference, followed by more data, then a reference and so on. You'll never have two LiteralParts
in a row, but you could have two or more ReferenceParts
in a row.
Here's the code:
func NewHydrateResponse(res Response, fieldName string) (Response, error) {
position := 0
body := res.Body()
needle := []byte("\"" + fieldName)
parts := make([]Part, 0, 10)
for {
index := bytes.Index(body, needle)
if index == -1 {
parts = append(parts, LiteralPart(body[position:]))
break
}
parts = append(parts, LiteralPart(body[position:index]))
body = body[index:]
start := bytes.IndexRune(body, '{')
if start == -1 {
//should probably at least log an error
continue
}
end := bytes.IndexRune(body, '}')
if end == -1 {
//should probably at least log an error
continue
}
end += 1
var ref map[string]string
if err := json.Unmarshal(body[start:end], &ref); err != nil {
return nil, err
}
parts = append(parts, &ReferencePart{ref["id"], ref["type"]})
body = body[end:]
}
return &HydrateResponse{
status: res.Status(),
header: res.Header(),
parts: parts,
}, nil
}
I don't plan on explaining this in any great detail. It's moving forward through the data looking for the index of !ref
(or whatever value you returned in the X-Hydrate
header). If you picked field with a '{' in it, you'll break the above code (it could be fixed, but just don't pick a field name with that inside of it!).
Once we've found a !ref
we look for the next '{' and '}'. We grab that content and parse it. This is a little heavier than I like, but it does mean that we can embedded data in our reference. Here we're doing an id
and type
, but we could embed even more (not a nested object though, else that throws off the parser which is looking for the next '}').
Whether or not this response is cached, you need to go through the same logic. Again, you could optimize the non-cached path by hydrating while parsing. Meh.
Conclusion
I've put some working code up on Github.
If anyone else is putting this in production, I'd love to know more about it. It's super fun stuff.