dburtsev (dburtsev) wrote,

PostgreSQL's fsync() surprise

Craig Ringer first reported the problem to the pgsql-hackers mailing list at the end of March. In short, PostgreSQL assumes that a successful call to fsync() indicates that all data written since the last successful call made it safely to persistent storage. But that is not what the kernel actually does. When a buffered I/O write fails due to a hardware-level error, filesystems will respond differently, but that behavior usually includes discarding the data in the affected pages and marking them as being clean. So a read of the blocks that were just written will likely return something other than the data that was written.
What about error status reporting? One year ago, the Linux Filesystem, Storage, and Memory-Management Summit (LSFMM) included a session on error reporting, wherein it was described as "a mess"; errors could easily be lost so that no application would ever see them.
. . .
Ted Ts'o, instead, explained why the affected pages are marked clean after an I/O error occurs; in short, the most common cause of I/O errors, by far, is a user pulling out a USB drive at the wrong time. If some process was copying a lot of data to that drive, the result will be an accumulation of dirty pages in memory, perhaps to the point that the system as a whole runs out of memory for anything else. So those pages cannot be kept if the user wants the system to remain usable after such an event.

Both Chinner and Ts'o, along with others, said that the proper solution is for PostgreSQL to move to direct I/O (DIO) instead. Using DIO gives a greater level of control over writeback and I/O in general; that includes access to information on exactly which I/O operations might have failed. Andres Freund, like a number of other PostgreSQL developers, has acknowledged that DIO is the best long-term solution. But he also noted that getting there is "a metric ton of work" that isn't going to happen anytime soon. Meanwhile, he said, there are other programs (he mentioned dpkg) that are also affected by this behavior.
Tags: it

  • Post a new comment


    default userpic
    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.