Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ETag-caching (and AWS SDK v2) support #287

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

rtyley
Copy link
Member

@rtyley rtyley commented Aug 4, 2023

This change adds these improvements:

  • Facia data is only re-downloaded & re-parsed if the S3 content has changed (thanks to ETag-caching)
  • AWS SDK v2: the FAPI client itself now has a fapi-s3-sdk-v2 artifact (I think this can replace Update to AWS SDK v2 #286, @fredex42?)

The ETag-caching library itself is also being used in DotCom PROD, introduced with guardian/frontend#26338.

Usage

Example PR consuming this updated version of the FAPI client: https://github.com/guardian/ophan/pull/5506.

Updated FAPI artifact layout

To use FAPI with the new AWS SDK v2 support, users must now have a dependency on two FAPI artifacts:

  • fapi-s3-sdk-v2
  • fapi-client-playXX

Due to needing to support the matrix of:

  • AWS SDK v1 & v2
  • Play-JSON 2.7, 2.8, and eventually 2.9

...it's best not to try to produce an artifact that corresponds to every single possible combination! Consequently, this change provides artifacts that are specific to the different versions of AWS SDK (or at least, could do - if AWS SDK v1 was moved out of common code), and artifacts that are specific to the different versions of Play-JSON, and allow the user to combine them as needed. A similar approach was used with guardian/play-secret-rotation: guardian/play-secret-rotation#8

In order for the different artifacts to have interfaces they can use to join together and become a single useful Facia client, there is a fapi-client-core artifact. Any code that doesn't depend on either the actual AWS SDK version, or the JSON classes, (which isn't much!) can live in there. In particular, we have:

  • com.gu.facia.client.ApiClient, an existing type that is now a trait, with 2 implementations - one that uses the existing com.gu.facia.client.S3Client abstraction on S3 behaviour, and another with ETag-based caching.
  • com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour, a new trait that exposes just enough interface to allow the conditional fetching used for ETag-based caching, but doesn't tie you to any specific version of the AWS SDK.

This PR is still draft, but I wanted to put it on your radar @fredex42 !

Noisy logging associated with absent collection JSON

See guardian/etag-caching#32 for an issue around excessive logging that currently occurs with this PR.

@rtyley rtyley force-pushed the add-etag-caching-support branch 11 times, most recently from 49f9a7f to 9b755d7 Compare August 10, 2023 11:54
@rtyley rtyley changed the title WIP: Add ETag caching support Add ETag-caching (and AWS SDK v2) support Aug 10, 2023
Comment on lines -13 to -15
trait S3Client {
def get(bucket: String, path: String): Future[FaciaResult]
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the com.gu.facia.client.S3Client trait still exists, it's just moved to the fapi-client-core artifact - couldn't delete it without badly breaking binary compatability for existing users!

rtyley added a commit to guardian/etag-caching that referenced this pull request Sep 6, 2023
This is a hack, and it probably won't be able to fully conceal that fact.

The facia-scala-client, in normal use, is asked for collections that don't
yet have collection JSON, and may never have it (apparently, if a collection
has never been edited, its collection JSON won't exist). In the past, this
would be handled by generating a `FaciaNotFound` response for
`com.gu.facia.client.S3Client.get()` which `ApiClient` would cheerfully
_not_ throw an exception for:

https://github.com/guardian/facia-scala-client/blob/f9830fd286d409662be850ff441792ec03c1c2e2/facia-json/src/main/scala/com/gu/facia/client/ApiClient.scala#L25-L27

...so encountering the missing resource would not be logged as an exception,
**which is the desired behaviour**, see guardian/facia-scala-client#32

However, as my attempt (guardian/facia-scala-client#287)
to introduce ETag-caching & AWS SDK v2 support rejects the `com.gu.facia.client.S3Client`
trait, in favour of adapting `com.gu.facia.client.ApiClient`, I need to provide
that same exception-supressing behaviour.

The task is complicated because currently `ETagCache` holds a scaffeine cache with
**value** type `ETaggedData[V]` - there's no successful response for the cache loader
to return for keys that can denote 'resource missing' - any successful value must have
an ETag for one thing, as `ETaggedData` requires it, but when a resource is missing,
we have no ETag.

This commit tried to deal with the problem by replacing the `S3Exception` with a
`CancellationException`, which caffeine special-cases as exception to _not_ log:

ben-manes/caffeine@1e52b10

...however this gets even yuckier than we might have thought, because it means that
`facia-scala-client`, as the consumer of `etag-caching`, has to know that it must
now try to catch `CancellationException`, rather than `S3Exception`, when it is
trying to do it's _own_ suppression of exception-logging.
@rtyley
Copy link
Member Author

rtyley commented Sep 13, 2023

As discussed at standup, I'm going to do a release of facia-scala-client as it currently is (v4.0.6), and get that in use, before introducing this new update.

rtyley added a commit to guardian/editors-picks-uploader that referenced this pull request Sep 13, 2023
This is a small upgrade, catching up with the recent dependency updates of
https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6,
before the more extensive update in guardian/facia-scala-client#287
is introduced.

This update has already been tested in Ophan's PromotionPoller with
guardian/ophan#5540, successfully deployed
to Prodution.

I've switched from `fapi-client-play27` to `fapi-client-play28` - the
`editors-picks-uploader` is an AWS Lambda, it doesn't actually run
the Play framework, so specifying a higher Play version doesn't really
affect it! The `fapi-client` uses `play-json` to do its JSON-parsing
(that's all that the `-play2x` suffix indicates), so we might as well use
the latest version that's available (allowing `facia-scala-client` to drop
support for `play27` at some future point).
rtyley added a commit to guardian/story-packages that referenced this pull request Sep 13, 2023
This is a small upgrade, catching up with the recent dependency updates of
https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6,
before the more extensive update in guardian/facia-scala-client#287
is introduced.

This update has already been tested in Ophan's PromotionPoller with
guardian/ophan#5540, successfully deployed
to Prodution.
rtyley added a commit to guardian/facia-tool that referenced this pull request Sep 13, 2023
This is a small upgrade, catching up with the recent dependency updates of
https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6,
before the more extensive update in guardian/facia-scala-client#287
is introduced.

This update has already been tested in Ophan's PromotionPoller with
guardian/ophan#5540, successfully deployed
to Prodution.
rtyley added a commit to guardian/frontend that referenced this pull request Sep 13, 2023
This is a small upgrade, catching up with the recent dependency updates of
https://github.com/guardian/facia-scala-client/releases/tag/v4.0.6,
before the more extensive update in guardian/facia-scala-client#287
is introduced.

This update has already been tested in Ophan's PromotionPoller with
guardian/ophan#5540, successfully deployed
to Prodution.

Note that as https://github.com/guardian/frontend is _already_
using Play 2.8, it should probably be using `fapi-client-play28`,
rather than `fapi-client-play27`, so I've also updated that.
rtyley added a commit to guardian/etag-caching that referenced this pull request Jan 5, 2024
This is a hack, and it probably won't be able to fully conceal that fact.

The facia-scala-client, in normal use, is asked for collections that don't
yet have collection JSON, and may never have it (apparently, if a collection
has never been edited, its collection JSON won't exist). In the past, this
would be handled by generating a `FaciaNotFound` response for
`com.gu.facia.client.S3Client.get()` which `ApiClient` would cheerfully
_not_ throw an exception for:

https://github.com/guardian/facia-scala-client/blob/f9830fd286d409662be850ff441792ec03c1c2e2/facia-json/src/main/scala/com/gu/facia/client/ApiClient.scala#L25-L27

...so encountering the missing resource would not be logged as an exception,
**which is the desired behaviour**, see guardian/facia-scala-client#32

However, as my attempt (guardian/facia-scala-client#287)
to introduce ETag-caching & AWS SDK v2 support rejects the `com.gu.facia.client.S3Client`
trait, in favour of adapting `com.gu.facia.client.ApiClient`, I need to provide
that same exception-supressing behaviour.

The task is complicated because currently `ETagCache` holds a scaffeine cache with
**value** type `ETaggedData[V]` - there's no successful response for the cache loader
to return for keys that can denote 'resource missing' - any successful value must have
an ETag for one thing, as `ETaggedData` requires it, but when a resource is missing,
we have no ETag.

This commit tried to deal with the problem by replacing the `S3Exception` with a
`CancellationException`, which caffeine special-cases as exception to _not_ log:

ben-manes/caffeine@1e52b10

...however this gets even yuckier than we might have thought, because it means that
`facia-scala-client`, as the consumer of `etag-caching`, has to know that it must
now try to catch `CancellationException`, rather than `S3Exception`, when it is
trying to do it's _own_ suppression of exception-logging.
This change adds these improvements:

* Facia data is only re-downloaded & re-parsed if the S3 content has
  _changed_, thanks to ETag-caching - see https://github.com/guardian/etag-caching .
  This library has already been used in DotCom PROD with guardian/frontend#26338
* AWS SDK v2: the FAPI client itself now has a `fapi-s3-sdk-v2` artifact.

An example PR consuming this updated version of the FAPI client is at:

guardian/ophan#5506

Updated FAPI artifact layout
----------------------------

To use FAPI with the new AWS SDK v2 support, users must now have a
dependency on *two* FAPI artifacts:

* `fapi-s3-sdk-v2`
* `fapi-client-playXX`

Due to needing to support the matrix of:

* AWS SDK v1 & v2
* Play-JSON 2.7, 2.8, and eventually 2.9

...it's best not to try to produce an artifact that corresponds to
every single combination of those! Consequently, we provide an
artifacts that are specific to the different versions of AWS SDK
(or at least, could do - if AWS SDK v1 was moved out of common code),
and artifacts that are specific to the different versions of
Play-JSON, and allow the user to combine them as needed. A
similar approach was used with `guardian/play-secret-rotation`:

guardian/play-secret-rotation#8

In order for the different artifacts to have interfaces they can
use to join together and become a single useful Facia client, we have
a `fapi-client-core` artifact. Any code that doesn't depend on the
JSON classes, or the actual AWS SDK version (which isn't much!), can
live in there. In particular, we have:

* `com.gu.facia.client.ApiClient`, an existing type that is now a
  trait, with 2 implementations - one that uses the existing
  `com.gu.facia.client.S3Client` abstraction on S3 behaviour
* `com.gu.facia.client.etagcaching.fetching.S3FetchBehaviour`,
  a new trait that exposes just enough interface to allow the
  conditional fetching used for ETag-based caching, but doesn't
  tie you to any specific version of the AWS SDK.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant