Our repair jobs had been failing for a long period (more than 14 days).
Today I manually started a repair job with
nodetool repair -pr. Afterwards, it looks like we lost some data from one table.
Question: Is it theoretically possible to lose data after a repair job? If so, what can be done to avoid it?
You should not lose data with repair. If anything, you could gain back records that were deleted (resurrected zombie records).
One scenario where data might appear to be "lost" is when a node that was missing a tombstone receives it from another replica during repair. The deleted cell then disappears from that node, which is the correct result, not lost data. If your client consistency level (CL) was low, say ONE, and your reads were served by the node that still had the data (but was missing the tombstone), it might look as though the cell vanished all of a sudden. Again, that is the correct value.
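A minimal Python sketch of the tombstone case, assuming a simplified last-write-wins model: each replica holds one version of a single cell, `None` stands for a tombstone, and the higher write timestamp wins. The names `Cell`, `reconcile`, and `read` are hypothetical and are not Cassandra APIs; this is an illustration, not how Cassandra implements reconciliation internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None models a tombstone (a deleted cell)
    timestamp: int        # write timestamp, e.g. microseconds since the epoch


def reconcile(a: Cell, b: Cell) -> Cell:
    """Simplified last-write-wins merge of two replicas' versions of one cell."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Assumed tie-break for this sketch: a tombstone beats a live value;
    # if both are live, b is kept (arbitrary choice here).
    return a if a.value is None else b


def read(cell: Cell) -> Optional[str]:
    """What a client sees: a tombstoned cell reads as absent (None)."""
    return cell.value


# Replica A never received the delete; replica B did.
replica_a = Cell(value="v1", timestamp=100)
replica_b = Cell(value=None, timestamp=200)

print("read at ONE from A before repair:", read(replica_a))

# Repair makes both replicas hold the winning version.
merged = reconcile(replica_a, replica_b)
replica_a = replica_b = merged

print("read at ONE from A after repair:", read(replica_a))
```

Before the merge, a CL ONE read served by replica A returns "v1"; afterwards it returns nothing, because the newer tombstone correctly wins.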
Another scenario where things might appear to be "lost" is if the clocks on your nodes ever drifted out of sync, so that some cells in your cluster carry incorrect write timestamps. Because the version with the newest timestamp wins, repair can then propagate an older value over a newer one when it syncs the replicas.
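A rough Python sketch of the clock-skew case, under the same simplified last-write-wins assumption: only timestamps are compared, and each write reached just one replica (e.g. because the other was down). The names `Cell` and `newest` and the 60-second skew are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp taken from the writer's (possibly skewed) clock


def newest(a: Cell, b: Cell) -> Cell:
    """Last-write-wins as repair sees it: only the timestamps are compared.
    Ties keep a (an arbitrary choice in this sketch)."""
    return a if a.timestamp >= b.timestamp else b


SKEW = 60_000_000  # hypothetical: node 1's clock runs 60 s (in microseconds) ahead

real_time_first_write = 1_000_000_000
real_time_second_write = real_time_first_write + 5_000_000  # 5 s later in real time

# The first write is stamped by the skewed node, the second by a node with a correct clock.
replica_1 = Cell("old", real_time_first_write + SKEW)
replica_2 = Cell("new", real_time_second_write)

merged = newest(replica_1, replica_2)
print(merged.value)
```

Here repair keeps "old", because its skewed timestamp is larger, even though "new" was written later in real time, so the later write looks lost.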
That's all I can think of off the top of my head.