
aggregation_keys created before the size was limited can break sql indexes via dumps #17156

Open
MTRNord opened this issue May 5, 2024 · 4 comments


@MTRNord
Contributor

MTRNord commented May 5, 2024

Description

When dumping and reimporting, Postgres rebuilds its indexes, and btree index rows are size limited. At some point someone sent an event with ASCII art in its aggregation_key, which landed in my public.event_relations table. That value is 3497 characters long.

As a result, this happens on any dump + import:

ERROR: index row size 2920 exceeds btree version 4 maximum 2704 for index "event_relations_relates"
DETAIL: Index row references tuple (9835,12) in relation "event_relations".
TIP: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
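For reference, the definition of the index named in the error can be pulled straight from the Postgres catalog (standard Postgres, nothing Synapse-specific):

```sql
-- Show the definition of the index referenced by the error message.
SELECT indexdef
FROM pg_indexes
WHERE indexname = 'event_relations_relates';
```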

Steps to reproduce

  • Have a string in your database that is longer than the btree index row limit (in my case, a 3497-character aggregation_key).

Homeserver

matrix.midnightthoughts.space

Synapse Version

v1.106.0

Installation Method

Docker (matrixdotorg/synapse)

Database

Postgres 15

Workers

Single process

Platform

Kubernetes with a pg cluster

Configuration

No response

Relevant log output

This happened on import of a pg_dump:



ERROR: index row size 2920 exceeds btree version 4 maximum 2704 for index "event_relations_relates"
DETAIL: Index row references tuple (9835,12) in relation "event_relations".
TIP: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.


Anything else that would be useful to know?

It would be a good idea to have a migration removing these faulty long rows, as they break indexes, which in the best case lowers performance and in the worst case breaks backups in a way where the dump has to be manually changed and reimported, the row manually deleted, and the index recreated before it works.

The query I used to find it was `select event_id, relates_to_id, relation_type, aggregation_key, length(aggregation_key) from public.event_relations where length(aggregation_key) >= 2000 ORDER BY length(aggregation_key) DESC LIMIT 10;`
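A rough sketch of the kind of cleanup I mean (untested; the 500-character threshold is only a placeholder and should be replaced by whatever limit current Synapse enforces):

```sql
-- Sketch only: remove relation rows whose aggregation_key is longer than a
-- placeholder threshold (500 here; substitute the limit Synapse actually enforces).
DELETE FROM event_relations
WHERE length(aggregation_key) > 500;

-- Rebuild the affected index afterwards so the oversized entries are gone.
REINDEX INDEX event_relations_relates;
```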
@reivilibre
Contributor

reivilibre commented May 9, 2024

The obvious question here is: how did this get accepted into the original database in the first place?

Edit: I guess the server you dumped from has a larger page size? I have never configured that in Postgres, but I imagine it is possible.
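(If it helps, the block size is a compile-time option, so it would only differ if the build was configured that way; it can be checked with a stock Postgres command:)

```sql
-- Read-only preset parameter; reports the compiled-in page size in bytes (8192 by default).
SHOW block_size;
```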

@MTRNord
Contributor Author

MTRNord commented May 9, 2024

I think it might have just been silently broken the whole time, as the index was probably created before the event was received. A reindex on the database before the dump, with the same page-size settings, actually does work too. I assume Postgres just skips broken rows at runtime to prevent downtime? It only became an issue when dumping and reimporting on a fresh server.
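One way to double-check that theory (just an idea, using the stock amcheck contrib extension rather than anything Synapse provides) would be to verify the index against the heap before dumping:

```sql
-- amcheck ships with Postgres as a contrib extension.
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Check btree invariants and, with the second argument true, verify that
-- every heap tuple has a corresponding index entry.
SELECT bt_index_check('event_relations_relates'::regclass, true);
```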

@MTRNord
Contributor Author

MTRNord commented May 9, 2024

Oh, and since I forgot to link to it: matrix-org/synapse#12101 already limits this key. So this issue is only about historic data people may have received before that PR. It can't happen again on a version with that PR, afaik.

@reivilibre
Contributor

> I assume Postgres just skips broken rows at runtime to prevent downtime?

I would be kind of surprised if it did this, to be honest; it seems uncharacteristic.
