Hatred's Log Place

DON'T PANIC!

Apr 16, 2018 - 5 minute read - linux

Падение Xorg после выхода из suspend

Замечено на ядрах 4.15. Проявляется не всегда.

В логах замечено следующее, накануне сего события:

Apr 13 21:56:33 localhost kernel: Restarting tasks ... done.
Apr 13 21:56:33 localhost kernel: EXT4-fs error (device sda6): __ext4_get_inode_loc:4619: inode #271265: block 1049210: comm nmbd: unable to read itable block
Apr 13 21:56:33 localhost kernel: Buffer I/O error on dev sda6, logical block 0, lost sync page write
Apr 13 21:56:33 localhost kernel: EXT4-fs error (device sda6) in ext4_reserve_inode_write:5754: IO failure
Apr 13 21:56:33 localhost kernel: EXT4-fs (sda6): previous I/O error to superblock detected
Apr 13 21:56:33 localhost kernel: Buffer I/O error on dev sda6, logical block 0, lost sync page write
Apr 13 21:56:33 localhost kernel: EXT4-fs error (device sda6): __ext4_get_inode_loc:4619: inode #271265: block 1049210: comm nmbd: unable to read itable block
Apr 13 21:56:33 localhost kernel: EXT4-fs (sda6): previous I/O error to superblock detected
Apr 13 21:56:33 localhost kernel: Buffer I/O error on dev sda6, logical block 0, lost sync page write
Apr 13 21:56:33 localhost kernel: EXT4-fs error (device sda6) in ext4_reserve_inode_write:5754: IO failure
Apr 13 21:56:33 localhost kernel: EXT4-fs (sda6): previous I/O error to superblock detected
Apr 13 21:56:33 localhost kernel: Buffer I/O error on dev sda6, logical block 0, lost sync page write
Apr 13 21:56:33 localhost kernel: EXT4-fs error (device sda6) in ext4_orphan_add:2819: IO failure
Apr 13 21:56:33 localhost kernel: EXT4-fs (sda6): previous I/O error to superblock detected
Apr 13 21:56:33 localhost kernel: Buffer I/O error on dev sda6, logical block 0, lost sync page write
Apr 13 21:56:33 localhost kernel: EXT4-fs error (device sda6): __ext4_get_inode_loc:4619: inode #271265: block 1049210: comm nmbd: unable to read itable block
Apr 13 21:56:33 localhost kernel: EXT4-fs (sda6): previous I/O error to superblock detected
Apr 13 21:56:33 localhost kernel: Buffer I/O error on dev sda6, logical block 0, lost sync page write
Apr 13 21:56:33 localhost kernel: EXT4-fs error (device sda6) in ext4_reserve_inode_write:5754: IO failure
Apr 13 21:56:33 localhost kernel: EXT4-fs (sda6): previous I/O error to superblock detected
Apr 13 21:56:33 localhost kernel: Buffer I/O error on dev sda6, logical block 0, lost sync page write
Apr 13 21:56:33 localhost kernel: PM: suspend exit
...
Apr 13 21:56:35 localhost kernel: WARNING: CPU: 2 PID: 688 at fs/buffer.c:1108 mark_buffer_dirty+0xe9/0x100
Apr 13 21:56:35 localhost kernel: Modules linked in: veth hid_generic uhid hid algif_hash algif_skcipher af_alg cmac rfcomm ccm ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c crc32c_generic br_netfilter bridge stp llc overlay bnep btusb btrtl intel_rapl btbcm btintel x86_pkg_temp_thermal intel_powerclamp bluetooth ecdh_generic kvm_intel uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev kvm media tun irqbypass snd_hda_codec_realtek crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic coretemp pcbc aesni_intel aes_x86_64 crypto_simd joydev glue_helper mousedev cryptd msr iTCO_wdt mei_wdt intel_cstate intel_uncore
Apr 13 21:56:35 localhost kernel:  arc4 iwldvm tpm_tis tpm_tis_core mac80211 tpm intel_rapl_perf iwlwifi iTCO_vendor_support wmi_bmof snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm e1000e cfg80211 mei_me psmouse i2c_i801 thinkpad_acpi shpchp snd_timer ptp mei lpc_ich pps_core nvram rfkill fuse battery input_leds snd ac rtc_cmos wmi soundcore evdev mac_hid vboxpci(O) vboxnetflt(O) vboxnetadp(O) vboxdrv(O) acpi_call(O) parport_pc ppdev lp parport sg crypto_user ip_tables x_tables ext4 crc16 mbcache jbd2 fscrypto sr_mod cdrom sd_mod serio_raw atkbd libps2 ahci sdhci_pci xhci_pci libahci ehci_pci sdhci xhci_hcd ehci_hcd libata crc32c_intel led_class scsi_mod mmc_core usbcore usb_common i8042 serio i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm intel_agp intel_gtt agpgart
Apr 13 21:56:35 localhost kernel: CPU: 2 PID: 688 Comm: jbd2/sda6-8 Tainted: G           O     4.15.14-1-MANJARO #1
Apr 13 21:56:35 localhost kernel: Hardware name: LENOVO 2392AQU/2392AQU, BIOS G4ETB0WW (2.70 ) 09/26/2017
Apr 13 21:56:35 localhost kernel: RIP: 0010:mark_buffer_dirty+0xe9/0x100
Apr 13 21:56:35 localhost kernel: RSP: 0018:ffffb4a18215bcd8 EFLAGS: 00010246
Apr 13 21:56:35 localhost kernel: RAX: 0000000000a20828 RBX: ffffa300bfd02750 RCX: ffffa2fea42ed8e8
Apr 13 21:56:35 localhost kernel: RDX: ffffa300bfd02750 RSI: ffffa2fea42edf00 RDI: ffffa300bfd02750
Apr 13 21:56:35 localhost kernel: RBP: ffffa300bfd02750 R08: 0000000000000000 R09: ffffa300ad083fc0
Apr 13 21:56:35 localhost kernel: R10: 0000000000000000 R11: 0000000000000228 R12: ffffa300bcdfd388
Apr 13 21:56:35 localhost kernel: R13: ffffa2fea42ed780 R14: ffffa300a9361e00 R15: ffffa300bfd02752
Apr 13 21:56:35 localhost kernel: FS:  0000000000000000(0000) GS:ffffa300de280000(0000) knlGS:0000000000000000
Apr 13 21:56:35 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 13 21:56:35 localhost kernel: CR2: 00007f4688041160 CR3: 00000002de00a005 CR4: 00000000001606e0
Apr 13 21:56:35 localhost kernel: Call Trace:
Apr 13 21:56:35 localhost kernel:  __jbd2_journal_refile_buffer+0xa3/0xc0 [jbd2]
Apr 13 21:56:35 localhost kernel:  jbd2_journal_commit_transaction+0x128e/0x18b0 [jbd2]
Apr 13 21:56:35 localhost kernel:  ? sched_clock_cpu+0xe/0xd0
Apr 13 21:56:35 localhost kernel:  ? kjournald2+0xc0/0x270 [jbd2]
Apr 13 21:56:35 localhost kernel:  kjournald2+0xc0/0x270 [jbd2]
Apr 13 21:56:35 localhost kernel:  ? wait_woken+0x80/0x80
Apr 13 21:56:35 localhost kernel:  ? commit_timeout+0x10/0x10 [jbd2]
Apr 13 21:56:35 localhost kernel:  kthread+0x113/0x130
Apr 13 21:56:35 localhost kernel:  ? kthread_create_on_node+0x70/0x70
Apr 13 21:56:35 localhost kernel:  ret_from_fork+0x35/0x40
Apr 13 21:56:35 localhost kernel: Code: c0 48 89 c5 74 2c 48 89 c6 48 89 df 31 d2 e8 7f fd ff ff 48 89 df e8 07 77 fb ff 48 8b 7d 00 be 04 00 00 00 5b 5d e9 67 7d ff ff <0f> 0b e9 25 ff ff ff 48 89 df eb bb 90 66 2e 0f 1f 84 00 00 00 
Apr 13 21:56:35 localhost kernel: ---[ end trace c1b2e1f90bb5b2b0 ]---

Погуглив по строке __jbd2_journal_refile_buffer+0xa3/0xc0 [jbd2] обнаружил:

Рекомендуемый WA: установить параметр scan для модуля scsi_mod в значение sync.

Чревато: увеличение времени выхода из сна, примерно на секунду. Для меня не критично.

Суть: работа по сканированию будет выполнена из того же потока, где выполняет процедура выхода из сна и не возникнет состояния гонки.

Если модуль вкомпилирован в ядро:

  • в параметры ядра нужно передать из загрузчика: scsi_mod.scan=sync
  • например, через grub (grub2):
    • редактируем /etc/default/grub

    • исправляем строку:

      GRUB_CMDLINE_LINUX_DEFAULT="... scsi_mod.scan=sync ..."
      
    • сохраняем и выполняем update-grub

    • перезагружаемся

Если модуль отдельно, то

  • создаём файл /etc/modprobe.d/scsi_mod.conf
  • в него помещаем:
    # WA:
    # - https://bugzilla.redhat.com/show_bug.cgi?id=1562982
    # - https://bugzilla.redhat.com/show_bug.cgi?id=1553979
    options scsi_mod scan=sync
    
  • и сразу на рабочей системе исправляем параметры налету:
    echo sync > /sys/module/scsi_mod/parameters/scan
    # или
    echo sync | sudo tee /sys/module/scsi_mod/parameters/scan
    

Перезагружаться не обязательно.

И ждём исправления в апстриме.