Lo de Raúl

OpenSearch .keyword sub-fields have a hard 32,766-byte limit

I was indexing legal documents in OpenSearch and started getting errors like this:

type=illegal_argument_exception, reason=Document contains at least one immense term
in field="content.keyword" (whose UTF8 encoding is longer than the max length 32766)

The problem: the mapping had content as a text field with a keyword sub-field, added almost by default and easy to miss. Any document whose content is longer than 32,766 bytes once UTF-8 encoded fails to index. Because the error is marked retriable: False, SQS retries won’t help: the message fails the same way on every attempt and ends up in the DLQ.
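The limit is in bytes, not characters, so non-ASCII text hits it sooner than the character count suggests. A minimal sketch of a pre-indexing check, assuming you have the field value as a Python string (exceeds_term_limit and MAX_TERM_BYTES are names I made up for illustration, not part of any client library):

```python
MAX_TERM_BYTES = 32766  # the max length quoted in the error message


def exceeds_term_limit(text: str, limit: int = MAX_TERM_BYTES) -> bool:
    """Return True if the UTF-8 encoding of text is longer than limit bytes."""
    return len(text.encode("utf-8")) > limit


print(exceeds_term_limit("a" * 32766))       # False: exactly at the limit
print(exceeds_term_limit("a" * 32767))       # True: one byte over
print(exceeds_term_limit("\u00e9" * 20000))  # True: 20,000 chars but 40,000 bytes
```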

The fix depends on whether you actually need exact matching on that field:

If you don’t need exact matching (most common for large content fields): remove the keyword sub-field entirely.

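If you need it but want to avoid failures, add ignore_above. Careful: ignore_above counts characters, while the limit is in bytes, and a UTF-8 character can take up to 4 bytes. So 32766 is only safe for pure-ASCII content; for anything else use 32766 / 4 = 8191. Values longer than that are skipped for the keyword sub-field only, and the rest of the document still indexes:

"keyword": {
  "type": "keyword",
  "ignore_above": 8191
}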
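Since these sub-fields are easy to miss, it can help to scan a mapping for keyword fields that have no ignore_above. This is a rough sketch that assumes you already have the mapping as a Python dict in the usual "properties" / "fields" layout; risky_keyword_fields is a hypothetical helper, and the sample mapping is made up:

```python
def risky_keyword_fields(properties: dict, prefix: str = "") -> list:
    """Return dotted paths of keyword fields that do not set ignore_above."""
    found = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        # A top-level keyword field without a cap
        if spec.get("type") == "keyword" and "ignore_above" not in spec:
            found.append(path)
        # Sub-fields such as content.keyword live under "fields"
        for sub_name, sub_spec in spec.get("fields", {}).items():
            if sub_spec.get("type") == "keyword" and "ignore_above" not in sub_spec:
                found.append(f"{path}.{sub_name}")
        # Object fields nest their children under "properties"
        if "properties" in spec:
            found.extend(risky_keyword_fields(spec["properties"], prefix=f"{path}."))
    return found


# Made-up mapping for illustration
mapping = {
    "properties": {
        "title": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        },
        "content": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword"}},
        },
    }
}

print(risky_keyword_fields(mapping["properties"]))  # ['content.keyword']
```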

If you’re truncating in code, leave ~66 bytes of buffer — encode to UTF-8, slice at 32,700 bytes, then decode with errors='ignore'.
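A minimal sketch of that truncation, assuming the value is a Python str (truncate_utf8 and SAFE_BYTES are illustrative names). Slicing the bytes can cut a multi-byte character in half, which is why the decode uses errors='ignore' to drop the dangling partial character:

```python
SAFE_BYTES = 32700  # 32,766 minus roughly 66 bytes of buffer


def truncate_utf8(text: str, max_bytes: int = SAFE_BYTES) -> str:
    """Truncate text so its UTF-8 encoding is at most max_bytes long."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The slice may end in the middle of a multi-byte character;
    # errors="ignore" discards the incomplete trailing bytes.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


short = truncate_utf8("a" * 40000)
print(len(short.encode("utf-8")))  # 32700

# 3 two-byte characters (6 bytes) cut at 5 bytes: the half character is dropped
print(len(truncate_utf8("\u00e9" * 3, max_bytes=5)))  # 2
```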

The limit is the same in Elasticsearch: it comes from the maximum term length in Lucene, which both engines are built on.