What about error status reporting? One year ago, the Linux Filesystem, Storage, and Memory-Management Summit (LSFMM) included a session on error reporting, in which the kernel's handling of writeback errors was described as "a mess"; errors could easily be lost so that no application would ever see them.
. . .
Ted Ts'o, for his part, explained why the affected pages are marked clean after an I/O error occurs; in short, the most common cause of I/O errors, by far, is a user pulling out a USB drive at the wrong time. If some process was copying a lot of data to that drive, the result will be an accumulation of dirty pages that can never be written back, perhaps to the point that the system as a whole runs out of memory for anything else. So those pages cannot be kept dirty if the user wants the system to remain usable after such an event.
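One consequence for applications using buffered I/O is that a failed fsync() cannot safely be retried: since the pages have been marked clean, a second call may report success even though the data never reached the device. The following is a minimal Python sketch of that reasoning, not PostgreSQL's actual code; the function name `durable_write` and the choice to raise `RuntimeError` are assumptions made for illustration.

```python
import os


def durable_write(path, data):
    """Write data to path and fsync it, treating any failure as final.

    Hypothetical helper: the caller is expected to redo the write from
    its own copy of the data instead of calling fsync() a second time.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # An error from fsync() may be reported only once; the kernel may
        # already have marked the affected pages clean.
        os.fsync(fd)
    except OSError as exc:
        # Do not retry fsync() here: it could succeed without the data
        # ever having been written back.
        raise RuntimeError(f"write of {path!r} is not durable") from exc
    finally:
        os.close(fd)
    return len(data)
```

The important design choice is what the code does not do: there is no retry loop around `os.fsync()`. Recovery, if any, has to start from data the application still holds.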
Both Chinner and Ts'o, along with others, said that the proper solution is for PostgreSQL to move to direct I/O (DIO) instead. Using DIO gives a greater level of control over writeback and I/O in general; that includes access to information on exactly which I/O operations might have failed. Andres Freund, like a number of other PostgreSQL developers, has acknowledged that DIO is the best long-term solution. But he also noted that getting there is "a metric ton of work" that isn't going to happen anytime soon. Meanwhile, he said, there are other programs (he mentioned dpkg) that are also affected by this behavior.
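To give a rough idea of what direct I/O looks like from user space, here is a small Linux-only Python sketch; it is an illustration under stated assumptions and says nothing about how PostgreSQL would actually implement it. The helper name `write_direct` and the 4096-byte block size are assumptions; real code would need to query the alignment requirements of the underlying device and filesystem. An anonymous `mmap` region is used because it is page-aligned, which O_DIRECT typically requires of the buffer.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment and transfer size; device-dependent in practice


def write_direct(path, payload, block=BLOCK):
    """Write one zero-padded block to path, bypassing the page cache.

    Hypothetical helper. Any I/O error is raised as OSError by the
    os.write() call that issued the I/O, so the caller knows which
    operation failed.
    """
    if len(payload) > block:
        raise ValueError("payload does not fit in one block")
    buf = mmap.mmap(-1, block)  # anonymous mapping: page-aligned, zero-filled
    try:
        buf.write(payload)
        # O_DIRECT bypasses the page cache; O_DSYNC asks that the write
        # not return until the data is on stable storage.
        flags = os.O_WRONLY | os.O_CREAT | os.O_DIRECT | os.O_DSYNC
        fd = os.open(path, flags, 0o644)
        try:
            return os.write(fd, buf)
        finally:
            os.close(fd)
    finally:
        buf.close()
```

Not every filesystem accepts O_DIRECT; where it is unsupported, the `os.open()` call typically fails with EINVAL. The alignment bookkeeping, even in this toy form, hints at why a full conversion was described as a large amount of work.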