Practical SOA / microservices - Hydration - Part 2

Oct 18, 2014

In part 1 we saw how we can use hydration, a modern form of edge side include, to bring together data from multiple services. In this part we'll look at the code. This isn't an end-to-end solution that you can just drop into your system. It's just something to get you headed in the right direction.

Responses

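We actually have to consider four distinct types of responses, one for each combination of cached or uncached and hydrated or unhydrated: cached+hydrated, cached+unhydrated, uncached+hydrated and uncached+unhydrated.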

The response that stands apart from the others is uncached+unhydrated. It's unique because it's the only one that doesn't need us to explicitly read the data into memory -- we can pipe one end to the other. I'm actually not going to show how to do that because it adds quite a bit of complexity, and we can simply use the cached+unhydrated response type instead. We might pay the price of an extra copy, but that's negligible. (To be clear, I'd have to think hard about how to implement this cleanly. I've never handled this case specifically, and it's never been a problem.)

First we want a Response interface:

type Response interface {
  Header() http.Header
  Status() int
  Body() []byte
}

This is the type that actually gets returned out of the middleware chain to the main handler. A very simple handler might look like:

func handler(output http.ResponseWriter, req *http.Request) {
  response, err := Proxy(req)
  if err != nil {
    log.Println(err.Error())
    output.WriteHeader(500)
    return
  }
  // copy the upstream headers onto the outgoing response
  for k, v := range response.Header() {
    output.Header()[k] = v
  }
  body := response.Body()
  output.Header()["Content-Length"] = []string{strconv.Itoa(len(body))}
  output.WriteHeader(response.Status())
  output.Write(body)
}

The proxy would really be a chain of middlewares which handle logs and maybe authentication, caching and so on. I won't actually show the caching here, but I'll make it so that data is cacheable. To keep this relatively simple though, we'll just have a single "middleware" called Proxy:

func Proxy(req *http.Request) (Response, error) {
  upstreamRequest := createRequest(req)
  response, err := http.DefaultClient.Do(upstreamRequest)
  if err != nil {
    return nil, err
  }
  readResponse := NewNormalResponse(response)
  hydrateField := response.Header.Get("X-Hydrate")
  if len(hydrateField) == 0 {
    return readResponse, nil
  }
  return NewHydrateResponse(readResponse, hydrateField)
}

From the above code, all we need to do is write the NewNormalResponse and NewHydrateResponse functions and we're done. (createRequest converts the incoming *http.Request into the request to send to your service; it's your routing logic.)

NormalResponse

I'll start with handling the case where we aren't hydrating the data. This works for both cacheable and uncacheable requests. The NormalResponse implementation is simple:

type NormalResponse struct {
  status int
  header http.Header
  body   []byte
}

func (r *NormalResponse) Header() http.Header {
  return r.header
}

func (r *NormalResponse) Status() int {
  return r.status
}

func (r *NormalResponse) Body() []byte {
  return r.body
}

And converting our service's response into NormalResponse is mostly a matter of copying values:

func NewNormalResponse(response *http.Response) Response {
  var body []byte
  length := response.ContentLength
  if length > 0 {
    body = make([]byte, length)
    // read errors are ignored to keep the example short;
    // a real proxy should check them
    io.ReadFull(response.Body, body)
  } else if length == -1 {
    // unknown length: let the buffer grow as we read
    buffer := bytes.NewBuffer(make([]byte, 0, 16384))
    io.Copy(buffer, response.Body)
    body = buffer.Bytes()
    // if we're going to cache this response
    // we should consider trimming the buffer
  }
  response.Body.Close()
  return &NormalResponse{
    status: response.StatusCode,
    header: response.Header,
    body:   body,
  }
}

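There's a comment in the block where the content length isn't known. The slice returned by buffer.Bytes() keeps the buffer's full capacity, which can be much larger than the data itself, so caching it holds on to all of that memory. This is something which has frustrated me before.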
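One way to act on that comment is to copy the body into a slice that's exactly as big as the data before putting it in the cache. This is just a sketch; trimBody is a name I made up, and whether the extra copy is worth it depends on how long you keep responses around.

```go
package main

// trimBody returns the body in a slice whose capacity matches its length,
// so a cached response doesn't keep a mostly empty buffer alive.
func trimBody(body []byte) []byte {
	if len(body) == cap(body) {
		return body // nothing to reclaim
	}
	trimmed := make([]byte, len(body))
	copy(trimmed, body)
	return trimmed
}
```

In NewNormalResponse you'd call it as body = trimBody(buffer.Bytes()), but only on the path where the response is actually going to be cached.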

We now have a NormalResponse which handles the two cases, cached and uncached, where we aren't doing hydration. We can store this object in memory, or not, and return it to our handler.

HydrateResponse

Finally, we get to the point of this post: creating a response object that we can both cache and hydrate. As a reminder, we want to turn the following JSON into something that lets us efficiently resolve the references on the fly:

{
  "page": 1,
  "total": 54,
  "results": [
    {
      "!ref": {
        "id": "9001p",
        "type": "product"
      }
    },
    {
      "!ref": {
        "id": "322p",
        "type": "product"
      }
    },
    ...
  ]
}

Because references can be deeply nested, there's really only one way to do this: don't parse this as JSON. It might seem like a hack, but I guarantee you that the code ends up being simpler and considerably faster.

What we want to do is break this data into parts. There are two types of parts: LiteralPart and ReferencePart.

type LiteralPart []byte

type ReferencePart struct {
  id string
  t string
}

Before we jump into how to create these parts, I want to show you how they get sent to the user. Our Response interface demands a method, Body, which exposes the response as a []byte. Here's the implementation for a HydrateResponse:

func (res *HydrateResponse) Body() []byte {
  // should use a pre-allocated buffer pool
  // to minimize the amount of allocations
  buffer := bytes.NewBuffer(make([]byte, 0, 16384))
  for _, part := range res.parts {
    buffer.Write(part.Render())
  }
  return buffer.Bytes()
}

Essentially, each part knows how to render itself. The response just glues them together. Speaking of the response, here's the structure:

type HydrateResponse struct {
  status int
  header http.Header
  parts []Part
}

From the two above snippets, we know that the Part interface looks like:

type Part interface {
  Render() []byte
}
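The comment in Body mentions a buffer pool. Here's a rough sketch of that idea using sync.Pool from the standard library. The names bufferPool, renderParts and literal are mine, and literal only exists so the example stands on its own.

```go
package main

import (
	"bytes"
	"sync"
)

// Part mirrors the interface shown above.
type Part interface {
	Render() []byte
}

// literal is a stand-in Part, only here to keep the sketch self-contained.
type literal []byte

func (l literal) Render() []byte { return l }

// bufferPool hands out reusable scratch buffers so that rendering doesn't
// allocate a fresh 16KB buffer for every response.
var bufferPool = sync.Pool{
	New: func() interface{} {
		return bytes.NewBuffer(make([]byte, 0, 16384))
	},
}

// renderParts glues the parts together in a pooled buffer.
func renderParts(parts []Part) []byte {
	buffer := bufferPool.Get().(*bytes.Buffer)
	buffer.Reset()
	defer bufferPool.Put(buffer)

	for _, part := range parts {
		buffer.Write(part.Render())
	}
	// Copy the result out: the buffer's memory gets reused once it goes
	// back to the pool, so we can't return buffer.Bytes() directly.
	out := make([]byte, buffer.Len())
	copy(out, buffer.Bytes())
	return out
}
```

Note that this still allocates the final slice. To avoid that as well, the Response interface would need a way to write straight to the http.ResponseWriter instead of returning a []byte.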

Let's keep going down this road. Here's the Render code for our two part types:

func (p LiteralPart) Render() []byte {
  return p
}

func (p *ReferencePart) Render() []byte {
  return Get(p.id, p.t)
}

Get is a placeholder. Maybe you're holding the objects in memory, or maybe in Redis or a relational database. For testing purposes, this is what I used:

func Get(id, t string) []byte {
  return []byte(`"id":"12321p"`)
}

Notice the data isn't enclosed in braces. I know, I know. It sucks. It makes the parsing a lot easier though ... definitely fixable if you're so inclined.
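If you're holding the objects in memory, Get could be as simple as a map behind a lock. This is a sketch under that assumption; Put, the key format and the brace stripping are all my own choices, not something the rest of the code depends on.

```go
package main

import (
	"bytes"
	"sync"
)

var (
	mu      sync.RWMutex
	objects = make(map[string][]byte)
)

// Put stores a JSON object under its type and id. The outer braces are
// stripped up front so that Get can return the bytes without any work.
func Put(id, t string, object []byte) {
	trimmed := bytes.TrimSpace(object)
	if len(trimmed) >= 2 && trimmed[0] == '{' && trimmed[len(trimmed)-1] == '}' {
		trimmed = trimmed[1 : len(trimmed)-1]
	}
	stored := append([]byte(nil), trimmed...) // copy, the caller may reuse object
	mu.Lock()
	objects[t+":"+id] = stored
	mu.Unlock()
}

// Get returns the brace-less body of an object, or nil when it's unknown.
func Get(id, t string) []byte {
	mu.RLock()
	defer mu.RUnlock()
	return objects[t+":"+id]
}
```

A missing object renders as nothing, which leaves an empty {} in the output because the surrounding braces come from the literal parts.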

There's only one piece left: creating our Parts. To recap though, the plan is to take our response and split it into alternating LiteralParts and ReferenceParts. I say alternating because most of the time you'll have some actual data, followed by a reference, followed by more data, then a reference and so on. You'll never have two LiteralParts in a row, but you could have two or more ReferenceParts in a row.

Here's the code:

func NewHydrateResponse(res Response, fieldName string) (Response, error) {
  body := res.Body()
  needle := []byte("\"" + fieldName)
  parts := make([]Part, 0, 10)
  for {
    index := bytes.Index(body, needle)
    if index == -1 {
      // no more references, the rest is plain data
      parts = append(parts, LiteralPart(body))
      break
    }
    // find the reference's braces, starting at the field name
    rest := body[index:]
    start := bytes.IndexByte(rest, '{')
    if start == -1 {
      // should probably at least log an error
      // (keep the rest as-is; a bare continue would loop forever)
      parts = append(parts, LiteralPart(body))
      break
    }
    end := bytes.IndexByte(rest[start:], '}')
    if end == -1 {
      // should probably at least log an error
      parts = append(parts, LiteralPart(body))
      break
    }
    end += start + 1

    var ref map[string]string
    if err := json.Unmarshal(rest[start:end], &ref); err != nil {
      return nil, err
    }
    parts = append(parts, LiteralPart(body[:index]))
    parts = append(parts, &ReferencePart{ref["id"], ref["type"]})
    body = rest[end:]
  }

  return &HydrateResponse{
    status: res.Status(),
    header: res.Header(),
    parts: parts,
  }, nil
}

I don't plan on explaining this in any great detail. It's moving forward through the data looking for the index of !ref (or whatever value you returned in the X-Hydrate header). If you picked a field name with a '{' in it, you'll break the above code (it could be fixed, but just don't pick a field name with that inside of it!).

Once we've found a !ref we look for the next '{' and '}'. We grab that content and parse it. This is a little heavier than I like, but it does mean that we can embed data in our reference. Here we're doing an id and type, but we could embed even more (not a nested object though, else that throws off the parser which is looking for the next '}').
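If you did want nested objects inside a reference, the search for the next '}' would have to become a search for the matching '}'. Here's a sketch of how that might look. matchingBrace is a hypothetical helper, not part of the code above, and you'd also need to unmarshal into something other than map[string]string for nested values to parse.

```go
package main

// matchingBrace returns the index just past the '}' that closes the '{'
// at data[start], or -1 if there is no match. Braces inside JSON strings
// are ignored.
func matchingBrace(data []byte, start int) int {
	if start < 0 || start >= len(data) || data[start] != '{' {
		return -1
	}
	depth := 0
	inString := false
	for i := start; i < len(data); i++ {
		c := data[i]
		if inString {
			switch c {
			case '\\':
				i++ // skip the escaped character
			case '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return i + 1
			}
		}
	}
	return -1
}
```

In NewHydrateResponse the returned index would take the place of end. It's a bit more work per reference, but the parsing only happens once for a cached response.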

Whether or not this response is cached, you need to go through the same logic. Again, you could optimize the non-cached path by hydrating while parsing. Meh.

Conclusion

I've put some working code up on GitHub.

If anyone else is putting this in production, I'd love to know more about it. It's super fun stuff.