  1. Nov 02, 2022
  2. Nov 01, 2022
  3. Oct 30, 2022
    • udp: Update reuse->has_conns under reuseport_lock. · 43d51092
      Kuniyuki Iwashima authored
      commit 69421bf98482d089e50799f45e48b25ce4a8d154 upstream.
      
      When we call connect() for a UDP socket in a reuseport group, we have
      to update sk->sk_reuseport_cb->has_conns to 1.  Otherwise, the kernel
      could wrongly select an unconnected socket for packets sent to the
      connected socket.
      
      However, the current way of setting has_conns is illegal and can
      trigger that problem.  reuseport_has_conns() changes has_conns under
      rcu_read_lock(), which upgrades the RCU reader to an updater.  It
      must therefore do the update under the updater's lock, reuseport_lock,
      but currently it does not.
      
      Because of this, the race below can leave has_conns unset, resulting
      in the wrong socket selection.  To avoid the race, let's split the
      reader and updater with proper locking.
      
       cpu1                               cpu2
      +----+                             +----+
      
      __ip[46]_datagram_connect()        reuseport_grow()
      .                                  .
      |- reuseport_has_conns(sk, true)   |- more_reuse = __reuseport_alloc(more_socks_size)
      |  .                               |
      |  |- rcu_read_lock()
      |  |- reuse = rcu_dereference(sk->sk_reuseport_cb)
      |  |
      |  |                               |  /* reuse->has_conns == 0 here */
      |  |                               |- more_reuse->has_conns = reuse->has_conns
      |  |- reuse->has_conns = 1         |  /* more_reuse->has_conns SHOULD BE 1 HERE */
      |  |                               |
      |  |                               |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
      |  |                               |                     more_reuse)
      |  `- rcu_read_unlock()            `- kfree_rcu(reuse, rcu)
      |
      |- sk->sk_state = TCP_ESTABLISHED
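A minimal userspace sketch of the locking split described above, assuming a pthread mutex as a stand-in for reuseport_lock and a plain struct in place of struct sock_reuseport (all names here are hypothetical, not kernel code). It shows why the race goes away: the setter and the grow path hold the same lock, so has_conns can no longer be set on the old group between the copy and the pointer swap. Unlike the kernel, the reader here also takes the lock instead of using RCU.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for struct sock_reuseport. */
struct reuse_group {
	bool has_conns;
	unsigned int max_socks;
};

/* Plays the role of reuseport_lock in this model. */
static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;
static struct reuse_group *group;

int group_init(unsigned int max_socks)
{
	group = calloc(1, sizeof(*group));
	if (!group)
		return -1;
	group->max_socks = max_socks;
	return 0;
}

/* Reader side: the kernel uses RCU here; this model simply takes the lock. */
bool group_has_conns(void)
{
	bool ret;

	pthread_mutex_lock(&group_lock);
	ret = group->has_conns;
	pthread_mutex_unlock(&group_lock);
	return ret;
}

/* Updater side, analogous to reuseport_has_conns_set(): write under the lock. */
void group_set_has_conns(void)
{
	pthread_mutex_lock(&group_lock);
	group->has_conns = true;
	pthread_mutex_unlock(&group_lock);
}

/*
 * Analogous to reuseport_grow(): copy and swap happen under the same lock,
 * so a concurrent group_set_has_conns() lands either before the copy
 * (and is copied) or after the swap (on the new group).
 */
int group_grow(void)
{
	struct reuse_group *more, *old;

	pthread_mutex_lock(&group_lock);
	old = group;
	more = calloc(1, sizeof(*more));
	if (!more) {
		pthread_mutex_unlock(&group_lock);
		return -1;
	}
	more->max_socks = old->max_socks * 2;
	more->has_conns = old->has_conns;
	group = more;
	pthread_mutex_unlock(&group_lock);

	/* Readers take the lock in this model; the kernel defers this with kfree_rcu(). */
	free(old);
	return 0;
}
```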
      
      Note the likely(reuse) in reuseport_has_conns_set() is always true,
      but we put the test there for ease of review.  [0]
      
      For the record, sk_reuseport_cb is usually changed under lock_sock().
      The only exception is the reuseport_grow() & TCP reqsk migration case:

        1) A TCP listener is shutdown() and moved into the latter part of
           reuse->socks[] to migrate reqsk.

        2) A new listen() overflows reuse->socks[] and calls reuseport_grow().

        3) reuse->max_socks overflows u16 with the new listener.

        4) reuseport_grow() pops the old shutdown()ed listener from the array
           and updates its sk->sk_reuseport_cb to NULL without lock_sock().

      A shutdown()ed TCP socket's sk->sk_reuseport_cb can thus be changed
      without lock_sock(), but reuseport_has_conns_set() is called only for
      UDP under lock_sock(), so likely(reuse) can never be false in
      reuseport_has_conns_set().
      
      [0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/

      Fixes: acdcecc6 ("udp: correct reuseport selection with connected sockets")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: /proc/pid/smaps_rollup: fix no vma's null-deref · a50ed2d2
      Seth Jenkins authored
      
      Commit 258f669e ("mm: /proc/pid/smaps_rollup: convert to single value
      seq_file") introduced a NULL pointer dereference in show_smaps_rollup()
      when the task has no VMAs.
      
      Fixes: 258f669e ("mm: /proc/pid/smaps_rollup: convert to single value seq_file")
      Signed-off-by: Seth Jenkins <sethjenkins@google.com>
      Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
      Tested-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init() · 31b15706
      Yu Kuai authored
      
      commit 285febabac4a16655372d23ff43e89ff6f216691 upstream.
      
      commit 8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is
      initialized") moves wbt_set_write_cache() before rq_qos_add(), which
      is wrong because wbt_rq_qos() is still NULL at that point, so the call
      has no effect and 'rwb->wc' keeps its initial value of 1.

      Fix the problem by removing wbt_set_write_cache() and setting 'rwb->wc'
      directly. Note that this patch also removes the redundant setting of
      'rwb->wc'.
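The ordering problem can be modeled in a few lines of C. This is a hypothetical reduction, not the wbt code: a setter that reaches the rq_wb through the queue is a silent no-op until the equivalent of rq_qos_add() has run, so the default of 1 survives, while assigning the field directly does not depend on registration order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for struct rq_wb and the queue. */
struct rq_wb {
	bool wc; /* write cache enabled? */
};

struct queue {
	bool write_cache;  /* models QUEUE_FLAG_WC */
	struct rq_wb *qos; /* NULL until "rq_qos_add()" registers it */
};

/* Models wbt_set_write_cache(): reaches rwb through the queue, so it
 * silently does nothing if called before registration. */
void set_wc_via_queue(struct queue *q, bool wc)
{
	if (q->qos)
		q->qos->wc = wc;
}

/* Order after 8c5035dfbb94: the setter runs before registration. */
void init_buggy(struct queue *q, struct rq_wb *rwb)
{
	rwb->wc = true;                      /* default */
	set_wc_via_queue(q, q->write_cache); /* no-op: q->qos is still NULL */
	q->qos = rwb;                        /* "rq_qos_add()" */
}

/* Fixed: set the field directly from the queue flag, then register. */
void init_fixed(struct queue *q, struct rq_wb *rwb)
{
	rwb->wc = q->write_cache;
	q->qos = rwb;
}
```

For a queue without a write cache, init_buggy() leaves wc at 1 while init_fixed() leaves it at 0.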
      
      Fixes: 8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20221009101038.1692875-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: core: Add SD card quirk for broken discard · e2f9b62e
      Avri Altman authored
      
      commit 07d2872bf4c864eb83d034263c155746a2fb7a3b upstream.
      
      Some SD cards from Sandisk that are SDA-6.0 compliant report that they
      support discard, while they actually don't. This might cause mke2fs to
      fail while trying to format the card and revert it to read-only mode.

      To fix this problem, let's add a card quirk (MMC_QUIRK_BROKEN_SD_DISCARD)
      to indicate that we should fall back to the legacy erase command
      instead.
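The quirk boils down to a flag check when choosing the erase argument. The sketch below uses made-up names (QUIRK_BROKEN_SD_DISCARD, enum erase_arg, pick_erase_arg()) rather than the real mmc core identifiers; it only illustrates the fallback decision.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quirk bit, for illustration only. */
#define QUIRK_BROKEN_SD_DISCARD (1u << 0)

enum erase_arg {
	ARG_ERASE,   /* legacy erase command */
	ARG_DISCARD, /* SD discard */
};

/* Trust the card's discard claim only if no quirk marks it as broken. */
enum erase_arg pick_erase_arg(uint32_t quirks, bool card_claims_discard)
{
	if (card_claims_discard && !(quirks & QUIRK_BROKEN_SD_DISCARD))
		return ARG_DISCARD;
	return ARG_ERASE;
}
```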
      
      Signed-off-by: Avri Altman <avri.altman@wdc.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20220928095744.16455-1-avri.altman@wdc.com
      [Ulf: Updated the commit message]
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Makefile.debug: re-enable debug info for .S files · 3a260e98
      Nick Desaulniers authored
      
      This is _not_ an upstream commit and just for 5.10.y only. It is based
      on commit 32ef9e5054ec0321b9336058c58ec749e9c6b0fe upstream.
      
      Alexey reported that the fraction of unknown filename instances in
      kallsyms grew from ~0.3% to ~10% recently; Bill and Greg tracked it down
      to assembler defined symbols, which regressed as a result of:
      
      commit b8a90923 ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
      
      In that commit, I allude to restoring debug info for assembler defined
      symbols in a follow up patch, but it seems I forgot to do so in
      
      commit a66049e2 ("Kbuild: make DWARF version a choice")
      
      Fixes: b8a90923 ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/Kconfig: Drop check for -mabi=ms for CONFIG_EFI_STUB · 6ab2287b
      Nathan Chancellor authored
      
      commit 33806e7cb8d50379f55c3e8f335e91e1b359dc7b upstream.
      
      A recent change in LLVM made CONFIG_EFI_STUB unselectable because it no
      longer pretends to support -mabi=ms, breaking the dependency in
      Kconfig. Lack of CONFIG_EFI_STUB can prevent kernels from booting via
      EFI in certain circumstances.
      
      This check was added by
      
        8f24f8c2 ("efi/libstub: Annotate firmware routines as __efiapi")
      
      to ensure that __attribute__((ms_abi)) was available, as -mabi=ms is
      not actually used in any cflags.
      
      According to the GCC documentation, this attribute has been supported
      since GCC 4.4.7. The kernel currently requires GCC 5.1 so this check is
      not necessary; even when that change landed in 5.6, the kernel required
      GCC 4.9 so it was unnecessary then as well.
      
      Clang supports __attribute__((ms_abi)) for all versions that are
      supported for building the kernel so no additional check is needed.
      Remove the 'depends on' line altogether to allow CONFIG_EFI_STUB to be
      selected when CONFIG_EFI is enabled, regardless of compiler.
      
      Fixes: 8f24f8c2 ("efi/libstub: Annotate firmware routines as __efiapi")
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://github.com/llvm/llvm-project/commit/d1ad006a8f64bdc17f618deffa9e7c91d82c444d
      [nathan: Fix conflict due to lack of c6dbd3e5e69c in older trees]
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPI: video: Force backlight native for more TongFang devices · 67dafece
      Werner Sembach authored
      commit 3dbc80a3e4c55c4a5b89ef207bed7b7de36157b4 upstream.
      
      This commit is very different from the upstream commit! It fixes the same
      issue by adding more quirks, rather than the general fix from the 6.1
      kernel, because the general fix from the 6.1 kernel is part of a larger
      refactoring of the backlight code which is not suitable for the stable
      series.
      
      As described in "ACPI: video: Drop NL5x?U, PF4NU1F and PF5?U??
      acpi_backlight=native quirks" (10212754a0d2), the upstream commit "ACPI:
      video: Make backlight class device registration a separate step (v2)"
      (3dbc80a3e4c5) makes these quirks unnecessary. However, as mentioned in
      this bugtracker ticket (https://bugzilla.kernel.org/show_bug.cgi?id=215683#c17),
      the upstream fix is part of a larger patchset that is overall too complex
      for stable.
      
      The TongFang GKxNRxx, GMxNGxx, GMxZGxx, and GMxRGxx / TUXEDO
      Stellaris/Polaris Gen 1-4 have the same problem as the Clevo NL5xRU and
      NL5xNU / TUXEDO Aura 15 Gen1 and Gen2:
      They have working native and video interfaces for the screen backlight.
      However, the default detection mechanism first registers the video
      interface before unregistering it again and switching to the native
      interface during boot. For some reason, this results in a dangling SBIOS
      request for a backlight change, causing the backlight to switch to ~2%
      once per boot on the first power cord connect or disconnect event.
      Setting the native interface explicitly circumvents this buggy behaviour
      by avoiding the unregistering process.
      
      Reviewed-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hv_netvsc: Fix race between VF offering and VF association message from host · dcaf6313
      Gaurav Kohli authored
      
      commit 365e1ececb2905f94cc10a5817c5b644a32a3ae2 upstream.
      
      During VM boot, the VF registration call may arrive before the VF
      association message sent from the host to the VM.

      This can break the netvsc VF path. To prevent this, block VF
      registration until the VF bind message arrives from the host.
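One way to express "block VF registration until the bind message arrives" is a flag guarded by a condition variable with a bounded wait. This is a userspace sketch with hypothetical names; it does not reproduce the netvsc code, which would use kernel primitives rather than pthreads, and the timeout handling is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-device state. */
struct vf_state {
	pthread_mutex_t lock;
	pthread_cond_t bound_cv;
	bool bound; /* set once the host's VF association/bind message arrived */
};

void vf_state_init(struct vf_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->bound_cv, NULL);
	s->bound = false;
}

/* Host-message path: record the bind and wake any waiting registration. */
void vf_on_bind_msg(struct vf_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->bound = true;
	pthread_cond_broadcast(&s->bound_cv);
	pthread_mutex_unlock(&s->lock);
}

/* Registration path: wait (bounded) for the bind message before proceeding. */
int vf_register(struct vf_state *s, long timeout_ms)
{
	struct timespec deadline;
	bool bound;
	int rc = 0;

	/* Default condition variables time out against CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&s->lock);
	while (!s->bound && rc == 0) /* loop also absorbs spurious wakeups */
		rc = pthread_cond_timedwait(&s->bound_cv, &s->lock, &deadline);
	bound = s->bound;
	pthread_mutex_unlock(&s->lock);

	return bound ? 0 : -ETIMEDOUT;
}
```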
      
      Cc: stable@vger.kernel.org
      Fixes: 00d7ddba ("hv_netvsc: pair VF based on serial number")
      Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel/pt: Relax address filter validation · da54c5f4
      Adrian Hunter authored
      
      commit c243cecb58e3905baeace8827201c14df8481e2a upstream.
      
      The only requirement for 64-bit address filters is that they are
      canonical addresses. Otherwise, any address range is allowed, including
      user space addresses.
      
      That can be useful for tracing virtual machine guests because address
      filtering can be used to advantage in place of current privilege level
      (CPL) filtering.
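For reference, a canonical-address check can be written as below. It assumes `va_bits` implemented virtual-address bits (48 with 4-level paging); the helper names are hypothetical and this is not the perf/x86/intel/pt code. A user space address such as 0x400000 passes, which is the relaxation described above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An address is canonical for an implementation with va_bits virtual
 * address bits when bits 63..(va_bits - 1) are all equal (all 0 or all 1).
 * Valid for 1 <= va_bits <= 64.
 */
bool addr_is_canonical(uint64_t addr, unsigned int va_bits)
{
	uint64_t top = addr >> (va_bits - 1);
	uint64_t all_ones = UINT64_MAX >> (va_bits - 1);

	return top == 0 || top == all_ones;
}

/* Relaxed rule: both ends canonical, whether user or kernel space. */
bool filter_is_valid(uint64_t start, uint64_t end, unsigned int va_bits)
{
	return addr_is_canonical(start, va_bits) &&
	       addr_is_canonical(end, va_bits);
}
```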
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220131072453.2839535-2-adrian.hunter@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • riscv: topology: fix default topology reporting · 79c3482f
      Conor Dooley authored
      
      commit fbd92809997a391f28075f1c8b5ee314c225557c upstream.
      
      RISC-V has no sane defaults to fall back on when there is no cpu-map
      in the devicetree.
      Without sane defaults, the package, core and thread IDs are all set to
      -1. This causes user-visible inaccuracies for tools like hwloc/lstopo
      which rely on the sysfs cpu topology files to detect a system's
      topology.
      
      On a PolarFire SoC, which should have 4 harts with a thread each,
      lstopo currently reports:
      
      Machine (793MB total)
        Package L#0
          NUMANode L#0 (P#0 793MB)
          Core L#0
            L1d L#0 (32KB) + L1i L#0 (32KB) + PU L#0 (P#0)
            L1d L#1 (32KB) + L1i L#1 (32KB) + PU L#1 (P#1)
            L1d L#2 (32KB) + L1i L#2 (32KB) + PU L#2 (P#2)
            L1d L#3 (32KB) + L1i L#3 (32KB) + PU L#3 (P#3)
      
      Adding calls to store_cpu_topology() in {boot,smp} hart bringup code
      results in the correct topology being reported:
      
      Machine (793MB total)
        Package L#0
          NUMANode L#0 (P#0 793MB)
          L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
          L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
          L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
          L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
      
      CC: stable@vger.kernel.org # 456797da792f: arm64: topology: move store_cpu_topology() to shared code
      Fixes: 03f11f03 ("RISC-V: Parse cpu topology during boot.")
      Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
      Link: https://github.com/open-mpi/hwloc/issues/536
      Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
      Reviewed-by: Atish Patra <atishp@rivosinc.com>
      Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: topology: move store_cpu_topology() to shared code · a6e77073
      Conor Dooley authored
      
      commit 456797da792fa7cbf6698febf275fe9b36691f78 upstream.
      
      arm64's method of defining a default cpu topology requires only minimal
      changes to apply to RISC-V also. The current arm64 implementation exits
      early in a uniprocessor configuration by reading MPIDR & claiming that
      uniprocessor can rely on the default values.
      
      This appears to be a hangover from prior to '3102bc0e ("arm64:
      topology: Stop using MPIDR for topology information")', because the
      current code just assigns default values for multiprocessor systems.

      With the MPIDR references removed, store_cpu_topology() can be moved to
      the common arch_topology code.
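The default assignment can be sketched as follows. This is a hypothetical userspace model, not the arch_topology code, and it assumes the shared defaults are: each CPU without firmware-provided topology gets its own core ID, the package ID of its NUMA node and no thread ID. That is consistent with the corrected "one Core per PU" lstopo output in the RISC-V commit above.

```c
#define NR_CPUS_MODEL 8

/* Hypothetical, reduced copy of the per-CPU topology record. */
struct cpu_topo {
	int thread_id;
	int core_id;
	int package_id;
};

static struct cpu_topo topo[NR_CPUS_MODEL];

/* Everything starts as -1 ("unknown"), which is what lstopo saw on RISC-V. */
void reset_topology(void)
{
	for (int i = 0; i < NR_CPUS_MODEL; i++)
		topo[i] = (struct cpu_topo){ -1, -1, -1 };
}

/* Hypothetical stand-in for cpu_to_node(): a single NUMA node. */
static int model_cpu_to_node(unsigned int cpu)
{
	(void)cpu;
	return 0;
}

/* Fill defaults for one CPU unless firmware (e.g. a DT cpu-map) already did. */
void store_topology_defaults(unsigned int cpu)
{
	struct cpu_topo *t = &topo[cpu];

	if (t->package_id != -1)
		return;
	t->thread_id = -1;     /* no SMT information */
	t->core_id = (int)cpu; /* one core per CPU */
	t->package_id = model_cpu_to_node(cpu);
}
```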
      
      Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Atish Patra <atishp@rivosinc.com>
      Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: dts: qcom: sc7180-trogdor: Fixup modem memory region · cb1024d8
      Sibi Sankar authored
      
      commit ef9a5d18 upstream.
      
      The modem firmware memory requirements are 32M and 140M on no-lte
      and lte SKUs respectively, so fix up the modem memory region to
      reflect the requirements.
      
      Reviewed-by: Evan Green <evgreen@chromium.org>
      Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
      Link: https://lore.kernel.org/r/1602786476-27833-1-git-send-email-sibis@codeaurora.org
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Stephen Boyd <swboyd@chromium.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fcntl: fix potential deadlocks for &fown_struct.lock · f687e211
      Desmond Cheong Zhi Xi authored
      
      [ Upstream commit f671a691 ]
      
      Syzbot reports a potential deadlock in do_fcntl:
      
      ========================================================
      WARNING: possible irq lock inversion dependency detected
      5.12.0-syzkaller #0 Not tainted
      --------------------------------------------------------
      syz-executor132/8391 just changed the state of lock:
      ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: f_getown_ex fs/fcntl.c:211 [inline]
      ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: do_fcntl+0x8b4/0x1200 fs/fcntl.c:395
      but this lock was taken by another, HARDIRQ-safe lock in the past:
       (&dev->event_lock){-...}-{2:2}
      
      and interrupts could create inverse lock ordering between them.
      
      other info that might help us debug this:
      Chain exists of:
        &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
      
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&f->f_owner.lock);
                                     local_irq_disable();
                                     lock(&dev->event_lock);
                                     lock(&new->fa_lock);
        <Interrupt>
          lock(&dev->event_lock);
      
       *** DEADLOCK ***
      
      This happens because there is a lock hierarchy of
      &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
      from the following call chain:
      
        input_inject_event():
          spin_lock_irqsave(&dev->event_lock,...);
          input_handle_event():
            input_pass_values():
              input_to_handler():
                evdev_events():
                  evdev_pass_values():
                    spin_lock(&client->buffer_lock);
                    __pass_event():
                      kill_fasync():
                        kill_fasync_rcu():
                          read_lock(&fa->fa_lock);
                          send_sigio():
                            read_lock_irqsave(&fown->lock,...);
      
      However, since &dev->event_lock is HARDIRQ-safe, interrupts have to be
      disabled while grabbing &f->f_owner.lock, otherwise we invert the lock
      hierarchy.
      
      Hence, we replace calls to read_lock/read_unlock on &f->f_owner.lock
      with read_lock_irq/read_unlock_irq.
      
      Reported-and-tested-by: <syzbot+e6d5398a02c516ce5e70@syzkaller.appspotmail.com>
      Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • fcntl: make F_GETOWN(EX) return 0 on dead owner task · b1efc196
      Pavel Tikhomirov authored
      
      [ Upstream commit cc4a3f88 ]
      
      Currently there is no way to differentiate a file whose owner is alive
      from a file whose owner is dead but whose pid has been reused. That's
      why CRIU can't actually know whether it needs to restore the file owner:
      if it restores the owner but the actual owner was dead, this can
      introduce unexpected signals to the "false" owner (which reused the
      pid).

      Let's change the API so that F_GETOWN(EX) returns 0 if the actual
      owner is already dead. This comports with the POSIX spec, which
      states that a PID of 0 indicates that no signal will be sent.
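From userspace the change can be observed with plain fcntl(2). In the sketch below (helper names are hypothetical), owner_round_trip() shows the ordinary case, and owner_after_child_exit() makes a child the owner and then reaps it; on a kernel carrying this change F_GETOWN is expected to return 0 there, while older kernels report the stale pid.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set the SIGIO owner of fd to pid and read it back with F_GETOWN. */
int owner_round_trip(int fd, pid_t pid)
{
	if (fcntl(fd, F_SETOWN, pid) == -1)
		return -1;
	return fcntl(fd, F_GETOWN);
}

/*
 * Make a short-lived child the owner, reap it, then query the owner.
 * Returns F_GETOWN's result, or -1 on setup failure.
 */
int owner_after_child_exit(int fd)
{
	pid_t child = fork();
	int rc;

	if (child < 0)
		return -1;
	if (child == 0)
		_exit(0);
	/* The child may already be a zombie; its pid stays valid until reaped. */
	rc = fcntl(fd, F_SETOWN, child);
	waitpid(child, NULL, 0); /* reap: the owner task is now gone */
	if (rc == -1)
		return -1;
	return fcntl(fd, F_GETOWN);
}
```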
      
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Andrei Vagin <avagin@gmail.com>
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Stable-dep-of: f671a691 ("fcntl: fix potential deadlocks for &fown_struct.lock")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf: Skip and warn on unknown format 'configN' attrs · ca4c4983
      Rob Herring authored
      
      [ Upstream commit e552b7be12ed62357df84392efa525ecb01910fb ]
      
      If the kernel exposes a new perf_event_attr field in a format attr, perf
      will return an error stating the specified PMU can't be found. For
      example, a format attr with 'config3:0-63' causes an error as config3 is
      unknown to perf. This causes a compatibility issue between a newer
      kernel and an older perf tool.

      Before this change, with a kernel adding 'config3', I get:
      
        $ perf record -e arm_spe// -- true
        event syntax error: 'arm_spe//'
                             \___ Cannot find PMU `arm_spe'. Missing kernel support?
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list
        available events
      
      After this change, I get:
      
        $ perf record -e arm_spe// -- true
        WARNING: 'arm_spe_0' format 'inv_event_filter' requires 'perf_event_attr::config3' which is not supported by this version of perf!
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.091 MB perf.data ]
      
      To support unknown configN formats, rework the YACC implementation to
      pass any config[0-9]+ format to perf_pmu__new_format() to handle with a
      warning.
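A rough C sketch of the relaxed rule: accept any config[0-9]* term, and warn and skip config words this tool does not know instead of failing. The real change lives in perf's YACC grammar and perf_pmu__new_format(); the function names and the KNOWN_CONFIG_WORDS limit below are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: this tool knows config, config1 and config2. */
#define KNOWN_CONFIG_WORDS 3

/*
 * Parse a format term such as "config:8-15" or "config3:0-63".
 * Returns the config word index (0 for plain "config"), or -1 if the
 * term is not of the form config[0-9]*:lo[-hi].
 */
int parse_config_term(const char *term, unsigned int *lo, unsigned int *hi)
{
	const char *p;
	char *end;
	long word = 0;

	if (strncmp(term, "config", 6) != 0)
		return -1;
	p = term + 6;
	if (isdigit((unsigned char)*p)) {
		word = strtol(p, &end, 10);
		p = end;
	}
	if (*p != ':' || !isdigit((unsigned char)p[1]))
		return -1;
	*lo = (unsigned int)strtoul(p + 1, &end, 10);
	*hi = *lo;
	if (*end == '-')
		*hi = (unsigned int)strtoul(end + 1, &end, 10);
	if (*end != '\0')
		return -1;
	return (int)word;
}

/* Returns 1 if the format is usable, 0 if it was skipped (with a warning). */
int accept_format(const char *pmu, const char *name, const char *term)
{
	unsigned int lo, hi;
	int word = parse_config_term(term, &lo, &hi);

	if (word < 0)
		return 0;
	if (word >= KNOWN_CONFIG_WORDS) {
		fprintf(stderr, "WARNING: '%s' format '%s' requires "
			"'perf_event_attr::config%d' which is not supported "
			"by this version of perf!\n", pmu, name, word);
		return 0;
	}
	return 1;
}
```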
      
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20220914-arm-perf-tool-spe1-2-v2-v4-1-83c098e6212e@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf pmu: Validate raw event with sysfs exported format bits · dea47fef
      Jin Yao authored
      
      [ Upstream commit e4064776 ]
      
      A raw PMU event (eventsel+umask) in the form of rNNN is supported
      by perf, but perf does not check the validity of the raw encoding.

      For example, bits 16 and 17 are not valid on KBL, but perf doesn't
      report a warning when encoding with these bits.
      
      Before:
      
        # ./perf stat -e cpu/r031234/ -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
                         0      cpu/r031234/
      
               1.003798924 seconds time elapsed
      
      It may silently measure the wrong event!
      
      The kernel supported bits have been exported through
      /sys/devices/<pmu>/format/. Perf collects the information to
      'struct perf_pmu_format' and links it to 'pmu->format' list.
      
      The 'struct perf_pmu_format' has a bitmap which records the
      valid bits for this format. For example,
      
        root@kbl-ppc:/sys/devices/cpu/format# cat umask
        config:8-15
      
      The valid bits (bit8-bit15) are recorded in bitmap of format 'umask'.
      
      We collect the valid bits of all formats, save them to a local variable
      'masks' and invert it. Now '~masks' represents all invalid bits.
      
      bits = config & ~masks;
      
      The set bits in 'bits' indicate the invalid bits used in config.
      Finally we use bitmap_scnprintf to report the invalid bits.
      
      Some architectures may not export supported bits through sysfs,
      so if masks is 0, perf_pmu__warn_invalid_config directly returns.
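The mask computation described above can be sketched for a single 64-bit config word. This is not perf's implementation (perf keeps a bitmap per format and prints bit ranges with bitmap_scnprintf); it uses a plain uint64_t, prints a hex mask, and struct fmt_range is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One sysfs format term, e.g. "umask" -> "config:8-15" (hypothetical struct). */
struct fmt_range {
	const char *name;
	unsigned int lo, hi;
};

/* Bits lo..hi inclusive, assuming 0 <= lo <= hi <= 63. */
uint64_t range_mask(unsigned int lo, unsigned int hi)
{
	unsigned int width = hi - lo + 1;
	uint64_t m = width >= 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;

	return m << lo;
}

/* Bits of config not covered by any format; 0 if no format bits exported. */
uint64_t invalid_config_bits(uint64_t config, const struct fmt_range *fmts,
			     size_t n)
{
	uint64_t masks = 0;

	for (size_t i = 0; i < n; i++)
		masks |= range_mask(fmts[i].lo, fmts[i].hi);
	if (masks == 0) /* PMU exports nothing: skip the check */
		return 0;
	return config & ~masks;
}

void warn_invalid_config(const char *event, uint64_t config,
			 const struct fmt_range *fmts, size_t n)
{
	uint64_t bits = invalid_config_bits(config, fmts, n);

	if (bits)
		fprintf(stderr, "WARNING: event '%s' not valid (bits 0x%llx of "
			"config '%llx' not supported by kernel)!\n", event,
			(unsigned long long)bits, (unsigned long long)config);
}
```

With an assumed KBL-like format list (event 0-7, umask 8-15, edge 18, pc 19, any 21, inv 23, cmask 24-31), config 0x31234 leaves bits 16-17 set and config 0xf01234 leaves bits 20 and 22 set, in line with the warnings shown in the examples.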
      
      After:
      
      Single event without name:
      
        # ./perf stat -e cpu/r031234/ -a -- sleep 1
        WARNING: event 'N/A' not valid (bits 16-17 of config '31234' not supported by kernel)!
      
         Performance counter stats for 'system wide':
      
                         0      cpu/r031234/
      
               1.001597373 seconds time elapsed
      
      Multiple events with names:
      
        # ./perf stat -e cpu/rf01234,name=aaa/,cpu/r031234,name=bbb/ -a -- sleep 1
        WARNING: event 'aaa' not valid (bits 20,22 of config 'f01234' not supported by kernel)!
        WARNING: event 'bbb' not valid (bits 16-17 of config '31234' not supported by kernel)!
      
         Performance counter stats for 'system wide':
      
                         0      aaa
                         0      bbb
      
               1.001573787 seconds time elapsed
      
      Warnings are reported for invalid bits.
      
      Co-developed-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210310051138.12154-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Stable-dep-of: e552b7be12ed ("perf: Skip and warn on unknown format 'configN' attrs")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • riscv: always honor the CONFIG_CMDLINE_FORCE when parsing dtb · 86e995f9
      Wenting Zhang authored
      
      [ Upstream commit 10f6913c548b32ecb73801a16b120e761c6957ea ]
      
      When CONFIG_CMDLINE_FORCE is enabled, the cmdline provided by
      CONFIG_CMDLINE is always used. This allows CONFIG_CMDLINE to be
      used regardless of the result of device tree scanning.

      This especially fixes the case where a device tree without the
      chosen node is supplied to the kernel. In such cases,
      early_init_dt_scan would return true. But inside
      early_init_dt_scan_chosen, the cmdline won't be updated as there
      is no chosen node in the device tree. As a result, CONFIG_CMDLINE
      is not copied into boot_command_line even if CONFIG_CMDLINE_FORCE
      is enabled. This commit allows boot_command_line to be properly
      updated in this situation.
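Reduced to its decision, the fix makes CONFIG_CMDLINE_FORCE independent of what the DT scan found. The sketch below is a hypothetical model: its parameters stand in for the chosen-node bootargs (NULL when there is no chosen node), CONFIG_CMDLINE and CONFIG_CMDLINE_FORCE, and it deliberately leaves out the non-forced fallback logic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns the string boot_command_line would be filled from (hypothetical). */
const char *pick_boot_cmdline(const char *dt_cmdline,
			      const char *config_cmdline, bool force)
{
	if (force)
		return config_cmdline; /* FORCE: ignore the DT scan result */
	return dt_cmdline;            /* may be NULL; fallbacks not modeled */
}
```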
      
      Fixes: 8fd6e05c ("arch: riscv: support kernel command line forcing when no DTB passed")
      Signed-off-by: Wenting Zhang <zephray@outlook.com>
      Reviewed-by: Björn Töpel <bjorn@kernel.org>
      Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
      Link: https://lore.kernel.org/r/PSBPR04MB399135DFC54928AB958D0638B1829@PSBPR04MB3991.apcprd04.prod.outlook.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • riscv: Add machine name to kernel boot log and stack dump output · 0e4c06ae
      Kefeng Wang authored
      
      [ Upstream commit 46ad48e8 ]
      
      Add the machine name to the kernel boot-up log, and install
      the machine name into the stack dump output for DT boot mode.
      
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Atish Patra <atish.patra@wdc.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Stable-dep-of: 10f6913c548b ("riscv: always honor the CONFIG_CMDLINE_FORCE when parsing dtb")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mmc: sdhci-tegra: Use actual clock rate for SW tuning correction · 7fba4a38
      Prathamesh Shete authored
      
      [ Upstream commit b78870e7f41534cc719c295d1f8809aca93aeeab ]
      
      Ensure the tegra_host member "curr_clk_rate" holds the actual clock rate
      instead of the requested clock rate, for proper use during the tuning
      correction algorithm. The actual clk rate may differ from the requested
      clk frequency depending on the parent clock source set. The tuning
      correction algorithm depends on certain parameters which are sensitive
      to the current clk rate. If the requested host clk rate is used instead
      of the actual clock rate, the tuning correction algorithm may end up
      applying an invalid correction, which could result in errors.
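To see why the requested and actual rates can differ, consider a simple integer divider. The model below is an assumption made for illustration (real Tegra clocks go through the common clock framework, where the achieved rate is what clk_get_rate() reports); it stores the achieved rate in curr_clk_rate, which is what the fix ensures.

```c
/*
 * Hypothetical integer-divider clock: the hardware can only produce
 * parent_hz / div for a whole div >= 1, so use the smallest divider whose
 * output does not exceed the request. requested_hz must be non-zero.
 */
unsigned long achieved_rate(unsigned long parent_hz, unsigned long requested_hz)
{
	unsigned long div = (parent_hz + requested_hz - 1) / requested_hz; /* ceil */

	if (div == 0) /* only when parent_hz == 0 */
		div = 1;
	return parent_hz / div;
}

struct host_model {
	unsigned long curr_clk_rate; /* what the tuning correction reads */
};

void set_clock(struct host_model *h, unsigned long parent_hz,
	       unsigned long requested_hz)
{
	/* Record what the hardware actually runs at, not requested_hz. */
	h->curr_clk_rate = achieved_rate(parent_hz, requested_hz);
}
```

For example, requesting 200 MHz from a 408 MHz parent yields a divider of 3 and an actual rate of 136 MHz.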
      
      Fixes: ea8fc595 ("mmc: tegra: update hw tuning process")
      Signed-off-by: Aniruddha TVS Rao <anrao@nvidia.com>
      Signed-off-by: Prathamesh Shete <pshete@nvidia.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Thierry Reding <treding@nvidia.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20221006130622.22900-4-pshete@nvidia.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xen/gntdev: Accommodate VMA splitting · 3c6a888e
      M. Vefa Bicakci authored
      [ Upstream commit 5c13a4a0291b30191eff9ead8d010e1ca43a4d0c ]
      
      Prior to this commit, the gntdev driver code did not handle the
      following scenario correctly with paravirtualized (PV) Xen domains:
      
      * User process sets up a gntdev mapping composed of two grant mappings
        (i.e., two pages shared by another Xen domain).
      * User process munmap()s one of the pages.
      * User process munmap()s the remaining page.
      * User process exits.
      
      In the scenario above, the user process would cause the kernel to log
      the following messages in dmesg for the first munmap(), and the second
      munmap() call would result in similar log messages:
      
        BUG: Bad page map in process doublemap.test  pte:... pmd:...
        page:0000000057c97bff refcount:1 mapcount:-1 \
          mapping:0000000000000000 index:0x0 pfn:...
        ...
        page dumped because: bad pte
        ...
        file:gntdev fault:0x0 mmap:gntdev_mmap [xen_gntdev] readpage:0x0
        ...
        Call Trace:
         <TASK>
         dump_stack_lvl+0x46/0x5e
         print_bad_pte.cold+0x66/0xb6
         unmap_page_range+0x7e5/0xdc0
         unmap_vmas+0x78/0xf0
         unmap_region+0xa8/0x110
         __do_munmap+0x1ea/0x4e0
         __vm_munmap+0x75/0x120
         __x64_sys_munmap+0x28/0x40
         do_syscall_64+0x38/0x90
         entry_SYSCALL_64_after_hwframe+0x61/0xcb
         ...
      
      For each munmap() call, the Xen hypervisor (if built with CONFIG_DEBUG)
      would print out the following and trigger a general protection fault in
      the affected Xen PV domain:
      
        (XEN) d0v... Attempt to implicitly unmap d0's grant PTE ...
        (XEN) d0v... Attempt to implicitly unmap d0's grant PTE ...
      
      As of this writing, the gntdev_grant_map structure's vma field (referred
      to as map->vma below) is mainly used for checking the start and end
      addresses of mappings. However, with split VMAs, these may change, and
      there could be more than one VMA associated with a gntdev mapping.
      Hence, remove the use of map->vma and rely on map->pages_vm_start for
      the original start address and on (map->count << PAGE_SHIFT) for the
      original mapping size. Let the invalidate() and find_special_page()
      hooks use these.
      
      Also, given that there can be multiple VMAs associated with a gntdev
      mapping, move the "mmu_interval_notifier_remove(&map->notifier)" call to
      the end of gntdev_put_map, so that the MMU notifier is only removed
      after the closing of the last remaining VMA.
      
      Finally, use an atomic to prevent inadvertent gntdev mapping re-use,
      instead of using the map->live_grants atomic counter and/or the map->vma
      pointer (the latter of which is now removed). This prevents the
      userspace from mmap()'ing (with MAP_FIXED) a gntdev mapping over the
      same address range as a previously set up gntdev mapping. This scenario
      can be summarized with the following call-trace, which was valid prior
      to this commit:
      
        mmap
          gntdev_mmap
        mmap (repeat mmap with MAP_FIXED over the same address range)
          gntdev_invalidate
            unmap_grant_pages (sets 'being_removed' entries to true)
              gnttab_unmap_refs_async
          unmap_single_vma
          gntdev_mmap (maps the shared pages again)
        munmap
          gntdev_invalidate
            unmap_grant_pages
              (no-op because 'being_removed' entries are true)
          unmap_single_vma (For PV domains, Xen reports that a granted page
            is being unmapped and triggers a general protection fault in the
            affected domain, if Xen was built with CONFIG_DEBUG)
      
      The fix for this last scenario could be worth its own commit, but we
      opted for a single commit, because removing the gntdev_grant_map
      structure's vma field requires guarding the entry to gntdev_mmap(), and
      the live_grants atomic counter is not sufficient on its own to prevent
      the mmap() over a pre-existing mapping.
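      The sketch below is a heavily reduced model of the three ideas above:
      - Range checks are based on pages_vm_start and count rather than on a
        single VMA.
      - Each VMA holds a reference, so the notifier is removed only with the
        last one.
      - An atomic flag refuses a second mmap() of the same map.

      Everything here is hypothetical and far simpler than the real driver:
      demo_grant_map, the helper names and the 4 KiB page size. The real
      driver must also coordinate with the MMU notifier and the hypervisor.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assume 4 KiB pages */

/* Hypothetical, heavily reduced stand-in for struct gntdev_grant_map. */
struct demo_grant_map {
	uintptr_t pages_vm_start;	/* start of the original mapping */
	unsigned int count;		/* number of granted pages */
	atomic_int users;		/* owner + one per live VMA */
	atomic_bool in_use;		/* set by the first successful mmap() */
	bool notifier_registered;
};

/* Page index of addr within the original mapping, or -1 if outside it.
 * Uses pages_vm_start and count, so VMA splits do not matter. */
static long demo_find_page_index(const struct demo_grant_map *map, uintptr_t addr)
{
	uintptr_t end = map->pages_vm_start +
			((uintptr_t)map->count << DEMO_PAGE_SHIFT);

	if (addr < map->pages_vm_start || addr >= end)
		return -1;
	return (long)((addr - map->pages_vm_start) >> DEMO_PAGE_SHIFT);
}

/* mmap(): refuse to map the same grant map twice (e.g. MAP_FIXED over it). */
static bool demo_mmap(struct demo_grant_map *map, uintptr_t start)
{
	if (atomic_exchange(&map->in_use, true))
		return false;
	map->pages_vm_start = start;
	map->notifier_registered = true;
	atomic_fetch_add(&map->users, 1);
	return true;
}

/* A VMA split or duplication takes another reference. */
static void demo_vma_open(struct demo_grant_map *map)
{
	atomic_fetch_add(&map->users, 1);
}

/* Drop a reference; the notifier is removed only with the last one. */
static void demo_put_map(struct demo_grant_map *map)
{
	if (atomic_fetch_sub(&map->users, 1) == 1)
		map->notifier_registered = false;
}
```

      In this model, the two munmap() calls from the scenario above each drop
      one VMA reference. The notifier stays registered until the owner drops
      the final reference.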
      
      Link: https://github.com/QubesOS/qubes-issues/issues/7631
      Fixes: ab31523c ("xen/gntdev: allow usermode to map granted pages")
      Cc: stable@vger.kernel.org
      Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Link: https://lore.kernel.org/r/20221002222006.2077-3-m.v.b@runbox.com
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Juergen Gross
      xen: assume XENFEAT_gnttab_map_avail_bits being set for pv guests · 5232411f
      Juergen Gross authored
      
      [ Upstream commit 30dcc56b ]
      
      XENFEAT_gnttab_map_avail_bits is always set in Xen 4.0 and newer.
      Remove code that assumes it might be zero.
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Link: https://lore.kernel.org/r/20210730071804.4302-4-jgross@suse.com
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Stable-dep-of: 5c13a4a0291b ("xen/gntdev: Accommodate VMA splitting")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Steven Rostedt (Google)
      tracing: Do not free snapshot if tracer is on cmdline · ea82edad
      Steven Rostedt (Google) authored
      [ Upstream commit a541a9559bb0a8ecc434de01d3e4826c32e8bb53 ]
      
      The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the
      snapshot buffer at boot up for later use. The ftrace_boot_snapshot option
      in particular requires the snapshot to be allocated, because it takes a
      snapshot at the end of boot up so that the traces that happened during
      boot are not lost when user space takes over.
      
      When a tracer is registered (started), there's a path that checks whether
      it requires the snapshot buffer. If it does not, and the buffer was
      allocated, it will do a synchronization and free the snapshot buffer.
      
      This is only required if the previous tracer was using the buffer for
      "max latency" snapshots (like the irqsoff tracer and friends), as it
      needs to make sure all max snapshots are complete before freeing. It does
      not make sense to free it if the previous tracer was not using it and the
      snapshot was allocated by the cmdline parameters. That basically defeats
      the point of allocating it in the first place!
      
      Note that the allocated snapshot worked fine for trace events alone, but
      failed when a tracer was enabled on the cmdline.
      
      Further investigation shows that this goes back even further, and it
      does not require a tracer on the cmdline to fail. Simply enable snapshots
      and then enable a tracer, and the snapshot will be removed.
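      The rule argued for above reduces to a small predicate. This is a
      sketch of that rule only. The function and parameter names are
      hypothetical, and this is not the actual code of tracing_set_tracer().

```c
#include <stdbool.h>

/* Hypothetical predicate: should installing a new tracer free the snapshot
 * buffer? Only when the outgoing tracer used it for max-latency snapshots
 * and the incoming tracer does not need it. A buffer allocated on the
 * command line while a non-latency tracer was active is therefore kept. */
static bool demo_should_free_snapshot(bool snapshot_allocated,
				      bool prev_tracer_uses_max,
				      bool new_tracer_uses_max)
{
	return snapshot_allocated && prev_tracer_uses_max &&
	       !new_tracer_uses_max;
}
```

      For example, switching from the nop tracer to the function tracer keeps
      a cmdline-allocated snapshot. Switching away from a latency tracer frees
      it.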
      
      Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Fixes: 45ad21ca ("tracing: Have trace_array keep track if snapshot buffer is allocated")
      Reported-by: Ross Zwisler <zwisler@kernel.org>
      Tested-by: Ross Zwisler <zwisler@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sunliming
      tracing: Simplify conditional compilation code in tracing_set_tracer() · bd6af07e
      sunliming authored
      [ Upstream commit f4b0d318097e45cbac5e14976f8bb56aa2cef504 ]
      
      Two conditional compilation directives "#ifdef CONFIG_TRACER_MAX_TRACE"
      are used consecutively, with no other code in between. Simplify the
      conditional compilation code and use only one "#ifdef CONFIG_TRACER_MAX_TRACE".
      
      Link: https://lkml.kernel.org/r/20220602140613.545069-1-sunliming@kylinos.cn
      Signed-off-by: sunliming <sunliming@kylinos.cn>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Stable-dep-of: a541a9559bb0 ("tracing: Do not free snapshot if tracer is on cmdline")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Dario Binacchi
      dmaengine: mxs: use platform_driver_register · 4e3a15ca
      Dario Binacchi authored
      
      [ Upstream commit 26696d4657167112a1079f86cba1739765c1360e ]
      
      Driver registration fails on the imx8mn SoC because its supplier, the
      clock control module, is probed later than the subsys initcall level.
      This driver uses platform_driver_probe(), which is not compatible with
      deferred probing: the driver won't be probed again later if the probe
      function fails because the clock is not available at that time.
      
      This patch replaces platform_driver_probe() with
      platform_driver_register(), which allows the driver to be probed again
      later, once the clock control module is available.
      
      The __init annotation has been dropped because it is not compatible with
      deferred probing: the probe code may now run after init, and possibly
      more than once, so its memory cannot be freed.
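      The difference between a one-shot probe and a registered driver can be
      modeled in a few lines. This is a hypothetical sketch of the idea, not
      the driver core's implementation. A probe that lacks its clock returns
      -EPROBE_DEFER. Only a driver that stays registered gets retried when the
      supplier appears. The demo_* names are illustrative. 517 is the value
      the kernel uses for EPROBE_DEFER.

```c
#include <stdbool.h>

#define DEMO_EPROBE_DEFER 517	/* same value as the kernel's EPROBE_DEFER */

/* Hypothetical model of how a driver core treats a probe that needs a
 * supplier (here: a clock) that is not ready yet. */
struct demo_driver {
	bool retry_allowed;	/* false models platform_driver_probe() */
	bool deferred;
	bool bound;
};

static int demo_probe(bool clock_ready)
{
	return clock_ready ? 0 : -DEMO_EPROBE_DEFER;
}

static void demo_try_probe(struct demo_driver *drv, bool clock_ready)
{
	int ret;

	if (drv->bound)
		return;
	ret = demo_probe(clock_ready);
	if (ret == 0) {
		drv->bound = true;
		drv->deferred = false;
	} else if (ret == -DEMO_EPROBE_DEFER && drv->retry_allowed) {
		drv->deferred = true;	/* park it for a later retry */
	}
}

/* Called when a new supplier, such as the clock controller, appears. */
static void demo_supplier_ready(struct demo_driver *drv)
{
	if (drv->deferred)
		demo_try_probe(drv, true);
}
```

      In this model, the one-shot driver never binds on a platform where the
      clock controller probes late. The registered driver binds as soon as
      the clock becomes available.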
      
      Fixes: a580b8c5 ("dmaengine: mxs-dma: add dma support for i.MX23/28")
      Co-developed-by: Michael Trimarchi <michael@amarulasolutions.com>
      Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
      Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
      Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20220921170556.1055962-1-dario.binacchi@amarulasolutions.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Fabio Estevam
      dmaengine: mxs-dma: Remove the unused .id_table · 1da5d249
      Fabio Estevam authored
      
      [ Upstream commit cc2afb0d ]
      
      The mxs-dma driver is only used by DT platforms and the .id_table
      is unused.
      
      Get rid of it to simplify the code.
      
      Signed-off-by: Fabio Estevam <festevam@gmail.com>
      Link: https://lore.kernel.org/r/20201123193051.17285-1-festevam@gmail.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Stable-dep-of: 26696d465716 ("dmaengine: mxs: use platform_driver_register")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Dmitry Osipenko
      drm/virtio: Use appropriate atomic state in virtio_gpu_plane_cleanup_fb() · 1414e9bf
      Dmitry Osipenko authored
      
      [ Upstream commit 4656b3a26a9e9fe5f04bfd2ab55b066266ba7f4d ]
      
      Make virtio_gpu_plane_cleanup_fb() clean up the state that the DRM core
      asks it to clean up, rather than the plane's current state. Normally the
      older atomic state is cleaned up, but the newer state could also be
      cleaned up in the case of aborted commits.
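      The plane's cleanup_fb hook receives a plane state argument. The fix
      amounts to trusting that argument instead of the plane's current state.
      The sketch below uses hypothetical minimal stand-ins for the DRM plane
      objects and a fake framebuffer refcount to show why the distinction
      matters.

```c
/* Hypothetical minimal stand-ins for DRM plane objects. */
struct demo_fb { int refs; };
struct demo_plane_state { struct demo_fb *fb; };
struct demo_plane { struct demo_plane_state *state; };	/* current state */

static void demo_fb_put(struct demo_fb *fb)
{
	if (fb)
		fb->refs--;
}

/* cleanup_fb(): release resources of the state the core passes in (the old
 * state after a commit, the new one after an aborted commit). Reading
 * plane->state instead would release the wrong framebuffer. */
static void demo_plane_cleanup_fb(struct demo_plane *plane,
				  struct demo_plane_state *state)
{
	(void)plane;	/* deliberately not consulted */
	demo_fb_put(state->fb);
}
```

      After a successful commit, plane->state already points at the new
      state. Cleaning up through it would drop the new framebuffer's reference
      and leak the old one.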
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20220630200726.1884320-6-dmitry.osipenko@collabora.com
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Jerry Snitselaar
      iommu/vt-d: Clean up si_domain in the init_dmars() error path · d74196bb
      Jerry Snitselaar authored
      
      [ Upstream commit 620bf9f981365c18cc2766c53d92bf8131c63f32 ]
      
      A splat from kmem_cache_destroy() was seen with a kernel prior to
      commit ee2653bbe89d ("iommu/vt-d: Remove domain and devinfo mempool")
      when there was a failure in init_dmars(), because the iommu_domain
      cache still had objects. While the mempool code is now gone, there is
      still a leak of the si_domain memory if init_dmars() fails. So clean up
      si_domain in the init_dmars() error path.
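      The fix follows the usual kernel goto-unwind pattern. Once an object is
      allocated, every later failure must exit through a label that releases
      it. The sketch below is hypothetical. demo_init stands in for
      init_dmars(). A counter of live allocations makes the leak observable.

```c
#include <stdlib.h>

static int demo_live_allocs;	/* tracks outstanding allocations */

static void *demo_alloc(void)
{
	void *p = malloc(16);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

/* Hypothetical init_dmars()-style routine: once si_domain exists, every
 * later failure must unwind through a label that frees it. */
static int demo_init(void **si_domain_out, int fail_later)
{
	void *si_domain = demo_alloc();

	if (!si_domain)
		return -1;
	if (fail_later)		/* stands in for any later setup step failing */
		goto free_si_domain;
	*si_domain_out = si_domain;
	return 0;

free_si_domain:
	demo_free(si_domain);
	return -1;
}
```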
      
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Fixes: 86080ccc ("iommu/vt-d: Allocate si_domain in init_dmars()")
      Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
      Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Charlotte Tan
      iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check() · ef11e8ec
      Charlotte Tan authored
      [ Upstream commit 5566e68d829f5d87670d5984c1c2ccb4c518405f ]
      
      arch_rmrr_sanity_check() warns if the RMRR is not covered by an ACPI
      Reserved region, but it seems like it should accept an NVS region as
      well. The ACPI spec
      https://uefi.org/specs/ACPI/6.5/15_System_Address_Map_Interfaces.html
      uses similar wording for "Reserved" and "NVS" region types; for NVS
      regions it says "This range of addresses is in use or reserved by the
      system and must not be used by the operating system."
      
      There is an old comment on this mailing list that also suggests NVS
      regions should pass the arch_rmrr_sanity_check() test:
      
       The warnings come from arch_rmrr_sanity_check() since it checks whether
       the region is E820_TYPE_RESERVED. However, if the purpose of the check
       is to detect RMRR has regions that may be used by OS as free memory,
       isn't  E820_TYPE_NVS safe, too?
      
      This patch overlaps with another proposed patch that would add the region
      type to the log since sometimes the bug reporter sees this log on the
      console but doesn't know to include the kernel log:
      
      https://lore.kernel.org/lkml/20220611204859.234975-3-atomlin@redhat.com/
      
      Here's an example of the "Firmware Bug" apparent false positive (wrapped
      for line length):
      
       DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR
             [0x000000006f760000-0x000000006f762fff], contact BIOS vendor for
             fixes
       DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR
             [0x000000006f760000-0x000000006f762fff]
      
      This is the snippet from the e820 table:
      
       BIOS-e820: [mem 0x0000000068bff000-0x000000006ebfefff] reserved
       BIOS-e820: [mem 0x000000006ebff000-0x000000006f9fefff] ACPI NVS
       BIOS-e820: [mem 0x000000006f9ff000-0x000000006fffefff] ACPI data
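      The relaxed check can be sketched against the e820 snippet above: an
      RMRR passes if the region covering it is Reserved or ACPI NVS. The sketch
      makes several hypothetical assumptions:
      - The table is a plain array of demo_e820_entry.
      - Only the region types shown above are modeled.
      - A single entry must cover the whole RMRR.

      It is not the kernel's E820 code, which can also handle ranges spanning
      several entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of E820 region types. */
enum demo_e820_type {
	DEMO_E820_RAM,
	DEMO_E820_RESERVED,
	DEMO_E820_ACPI,		/* "ACPI data" */
	DEMO_E820_NVS,		/* "ACPI NVS" */
};

struct demo_e820_entry {
	uint64_t start;
	uint64_t end;		/* inclusive, as printed in the BIOS-e820 lines */
	enum demo_e820_type type;
};

/* Accept an RMRR [start, end] if a single entry covers it and that entry is
 * Reserved or, with this change, ACPI NVS. */
static bool demo_rmrr_ok(const struct demo_e820_entry *tbl, size_t n,
			 uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++) {
		if (start >= tbl[i].start && end <= tbl[i].end)
			return tbl[i].type == DEMO_E820_RESERVED ||
			       tbl[i].type == DEMO_E820_NVS;
	}
	return false;
}
```

      With the three entries above, the reported RMRR
      [0x6f760000-0x6f762fff] falls entirely inside the ACPI NVS entry, so it
      is accepted.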
      
      Fixes: f036c7fa ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
      Cc: Will Mortensen <will@extrahop.com>
      Link: https://lore.kernel.org/linux-iommu/64a5843d-850d-e58c-4fc2-0a0eeeb656dc@nec.com/
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216443
      Signed-off-by: Charlotte Tan <charlotte@extrahop.com>
      Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
      Link: https://lore.kernel.org/r/20220929044449.32515-1-charlotte@extrahop.com
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Felix Riemann
      net: phy: dp83822: disable MDI crossover status change interrupt · 35c92435
      Felix Riemann authored
      
      [ Upstream commit 7f378c03aa4952507521174fb0da7b24a9ad0be6 ]
      
      If the cable is disconnected, the PHY seems to toggle between MDI and
      MDI-X modes. With the MDI crossover status interrupt active, this causes
      roughly 10 interrupts per second.
      
      As the crossover status isn't checked by the driver, the interrupt can
      be disabled to reduce the interrupt load.
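      The change boils down to leaving one source out of the interrupt-enable
      mask. The bit positions below are hypothetical and do not reflect the
      DP83822 register layout. The sketch only shows how masking a source the
      driver never inspects removes the roughly 10 interrupts per second from
      MDI/MDI-X toggling.

```c
#include <stdint.h>

/* Hypothetical interrupt-enable bits; these are NOT the DP83822's real
 * register bit positions. */
#define DEMO_INT_LINK_STATUS	(1u << 0)
#define DEMO_INT_AUTONEG_DONE	(1u << 1)
#define DEMO_INT_MDI_XOVER	(1u << 2)

/* Enable only the sources the driver actually acts on. */
#define DEMO_INT_ENABLE_MASK	(DEMO_INT_LINK_STATUS | DEMO_INT_AUTONEG_DONE)

/* Count how many of the given status events would raise an interrupt. */
static unsigned int demo_count_irqs(uint16_t enable_mask,
				    const uint16_t *events, unsigned int n)
{
	unsigned int irqs = 0;

	for (unsigned int i = 0; i < n; i++)
		if (events[i] & enable_mask)
			irqs++;
	return irqs;
}
```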
      
      Fixes: 87461f7a ("net: phy: DP83822 initial driver submission")
      Signed-off-by: Felix Riemann <felix.riemann@sma.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20221018104755.30025-1-svc.sw.rte.linux@sma.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Eric Dumazet
      net: sched: fix race condition in qdisc_graft() · 7aa3d623
      Eric Dumazet authored
      
      [ Upstream commit ebda44da44f6f309d302522b049f43d6f829f7aa ]
      
      We had one syzbot report [1] in the syzbot queue for a while.
      I was waiting for more occurrences and/or a repro but
      Dmitry Vyukov spotted the issue right away.
      
      <quoting Dmitry>
      qdisc_graft() drops reference to qdisc in notify_and_destroy
      while it's still assigned to dev->qdisc
      </quoting>
      
      Indeed, RCU rules are clear when replacing a data structure.
      The visible pointer (dev->qdisc in this case) must be updated
      to the new object _before_ the RCU grace period is started
      (qdisc_put(old) in this case).
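      The ordering rule can be illustrated with C11 atomics: publish the new
      object, then drop the reference to the old one. This is not RCU and not
      the actual qdisc_graft() code. It is a single-threaded, hypothetical
      sketch of the order of operations. demo_dev_qdisc stands in for
      dev->qdisc. The release store is in the spirit of rcu_assign_pointer().

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a qdisc. */
struct demo_qdisc {
	int refcnt;
	int id;
};

/* Stand-in for dev->qdisc, the pointer that readers look up. */
static _Atomic(struct demo_qdisc *) demo_dev_qdisc;
static int demo_freed;

static struct demo_qdisc *demo_qdisc_new(int id)
{
	struct demo_qdisc *q = malloc(sizeof(*q));

	if (q) {
		q->refcnt = 1;
		q->id = id;
	}
	return q;
}

static void demo_qdisc_put(struct demo_qdisc *q)
{
	/* The kernel defers the actual free past an RCU grace period. */
	if (q && --q->refcnt == 0) {
		demo_freed++;
		free(q);
	}
}

/* Graft in the required order: publish the new object first, then drop
 * the reference on the old one, which new readers can no longer find. */
static void demo_graft(struct demo_qdisc *new_q)
{
	struct demo_qdisc *old = atomic_exchange_explicit(&demo_dev_qdisc, new_q,
							   memory_order_acq_rel);

	demo_qdisc_put(old);
}
```

      Reversing the two steps in demo_graft() is the shape of the bug. The
      old object would be queued for freeing while demo_dev_qdisc still
      pointed at it.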
      
      [1]
      BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
      Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027
      
      CPU: 0 PID: 21027 Comm: syz-executor.4 Not tainted 6.0.0-rc3-syzkaller-00363-g7726d4c3e60b #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      print_address_description mm/kasan/report.c:317 [inline]
      print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
      kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
      __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
      __tcf_qdisc_find net/sched/cls_api.c:1051 [inline]
      tc_new_tfilter+0x34f/0x2200 net/sched/cls_api.c:2018
      rtnetlink_rcv_msg+0x955/0xca0 net/core/rtnetlink.c:6081
      netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
      __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f5efaa89279
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f5efbc31168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f5efab9bf80 RCX: 00007f5efaa89279
      RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
      RBP: 00007f5efaae32e9 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007f5efb0cfb1f R14: 00007f5efbc31300 R15: 0000000000022000
      </TASK>
      
      Allocated by task 21027:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track mm/kasan/common.c:45 [inline]
      set_alloc_info mm/kasan/common.c:437 [inline]
      ____kasan_kmalloc mm/kasan/common.c:516 [inline]
      ____kasan_kmalloc mm/kasan/common.c:475 [inline]
      __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
      kmalloc_node include/linux/slab.h:623 [inline]
      kzalloc_node include/linux/slab.h:744 [inline]
      qdisc_alloc+0xb0/0xc50 net/sched/sch_generic.c:938
      qdisc_create_dflt+0x71/0x4a0 net/sched/sch_generic.c:997
      attach_one_default_qdisc net/sched/sch_generic.c:1152 [inline]
      netdev_for_each_tx_queue include/linux/netdevice.h:2437 [inline]
      attach_default_qdiscs net/sched/sch_generic.c:1170 [inline]
      dev_activate+0x760/0xcd0 net/sched/sch_generic.c:1229
      __dev_open+0x393/0x4d0 net/core/dev.c:1441
      __dev_change_flags+0x583/0x750 net/core/dev.c:8556
      rtnl_configure_link+0xee/0x240 net/core/rtnetlink.c:3189
      rtnl_newlink_create net/core/rtnetlink.c:3371 [inline]
      __rtnl_newlink+0x10b8/0x17e0 net/core/rtnetlink.c:3580
      rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3593
      rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
      netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
      __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 21020:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track+0x21/0x30 mm/kasan/common.c:45
      kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
      ____kasan_slab_free mm/kasan/common.c:367 [inline]
      ____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
      kasan_slab_free include/linux/kasan.h:200 [inline]
      slab_free_hook mm/slub.c:1754 [inline]
      slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1780
      slab_free mm/slub.c:3534 [inline]
      kfree+0xe2/0x580 mm/slub.c:4562
      rcu_do_batch kernel/rcu/tree.c:2245 [inline]
      rcu_core+0x7b5/0x1890 kernel/rcu/tree.c:2505
      __do_softirq+0x1d3/0x9c6 kernel/softirq.c:571
      
      Last potentially related work creation:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
      call_rcu+0x99/0x790 kernel/rcu/tree.c:2793
      qdisc_put+0xcd/0xe0 net/sched/sch_generic.c:1083
      notify_and_destroy net/sched/sch_api.c:1012 [inline]
      qdisc_graft+0xeb1/0x1270 net/sched/sch_api.c:1084
      tc_modify_qdisc+0xbb7/0x1a00 net/sched/sch_api.c:1671
      rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
      netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
      __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Second to last potentially related work creation:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
      kvfree_call_rcu+0x74/0x940 kernel/rcu/tree.c:3322
      neigh_destroy+0x431/0x630 net/core/neighbour.c:912
      neigh_release include/net/neighbour.h:454 [inline]
      neigh_cleanup_and_release+0x1f8/0x330 net/core/neighbour.c:103
      neigh_del net/core/neighbour.c:225 [inline]
      neigh_remove_one+0x37d/0x460 net/core/neighbour.c:246
      neigh_forced_gc net/core/neighbour.c:276 [inline]
      neigh_alloc net/core/neighbour.c:447 [inline]
      ___neigh_create+0x18b5/0x29a0 net/core/neighbour.c:642
      ip6_finish_output2+0xfb8/0x1520 net/ipv6/ip6_output.c:125
      __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
      ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
      NF_HOOK_COND include/linux/netfilter.h:296 [inline]
      ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
      dst_output include/net/dst.h:451 [inline]
      NF_HOOK include/linux/netfilter.h:307 [inline]
      NF_HOOK include/linux/netfilter.h:301 [inline]
      mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820
      mld_send_cr net/ipv6/mcast.c:2121 [inline]
      mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2653
      process_one_work+0x991/0x1610 kernel/workqueue.c:2289
      worker_thread+0x665/0x1080 kernel/workqueue.c:2436
      kthread+0x2e4/0x3a0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
      
      The buggy address belongs to the object at ffff88802065e000
      which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 56 bytes inside of
      1024-byte region [ffff88802065e000, ffff88802065e400)
      
      The buggy address belongs to the physical page:
      page:ffffea0000819600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20658
      head:ffffea0000819600 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000010200 0000000000000000 dead000000000001 ffff888011841dc0
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 3523, tgid 3523 (sshd), ts 41495190986, free_ts 41417713212
      prep_new_page mm/page_alloc.c:2532 [inline]
      get_page_from_freelist+0x109b/0x2ce0 mm/page_alloc.c:4283
      __alloc_pages+0x1c7/0x510 mm/page_alloc.c:5515
      alloc_pages+0x1a6/0x270 mm/mempolicy.c:2270
      alloc_slab_page mm/slub.c:1824 [inline]
      allocate_slab+0x27e/0x3d0 mm/slub.c:1969
      new_slab mm/slub.c:2029 [inline]
      ___slab_alloc+0x7f1/0xe10 mm/slub.c:3031
      __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3118
      slab_alloc_node mm/slub.c:3209 [inline]
      __kmalloc_node_track_caller+0x2f2/0x380 mm/slub.c:4955
      kmalloc_reserve net/core/skbuff.c:358 [inline]
      __alloc_skb+0xd9/0x2f0 net/core/skbuff.c:430
      alloc_skb_fclone include/linux/skbuff.h:1307 [inline]
      tcp_stream_alloc_skb+0x38/0x580 net/ipv4/tcp.c:861
      tcp_sendmsg_locked+0xc36/0x2f80 net/ipv4/tcp.c:1325
      tcp_sendmsg+0x2b/0x40 net/ipv4/tcp.c:1483
      inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      sock_write_iter+0x291/0x3d0 net/socket.c:1108
      call_write_iter include/linux/fs.h:2187 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x9e9/0xdd0 fs/read_write.c:578
      ksys_write+0x1e8/0x250 fs/read_write.c:631
      page last free stack trace:
      reset_page_owner include/linux/page_owner.h:24 [inline]
      free_pages_prepare mm/page_alloc.c:1449 [inline]
      free_pcp_prepare+0x5e4/0xd20 mm/page_alloc.c:1499
      free_unref_page_prepare mm/page_alloc.c:3380 [inline]
      free_unref_page+0x19/0x4d0 mm/page_alloc.c:3476
      __unfreeze_partials+0x17c/0x1a0 mm/slub.c:2548
      qlink_free mm/kasan/quarantine.c:168 [inline]
      qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
      kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:294
      __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447
      kasan_slab_alloc include/linux/kasan.h:224 [inline]
      slab_post_alloc_hook mm/slab.h:727 [inline]
      slab_alloc_node mm/slub.c:3243 [inline]
      slab_alloc mm/slub.c:3251 [inline]
      __kmem_cache_alloc_lru mm/slub.c:3258 [inline]
      kmem_cache_alloc+0x267/0x3b0 mm/slub.c:3268
      kmem_cache_zalloc include/linux/slab.h:723 [inline]
      alloc_buffer_head+0x20/0x140 fs/buffer.c:2974
      alloc_page_buffers+0x280/0x790 fs/buffer.c:829
      create_empty_buffers+0x2c/0xee0 fs/buffer.c:1558
      ext4_block_write_begin+0x1004/0x1530 fs/ext4/inode.c:1074
      ext4_da_write_begin+0x422/0xae0 fs/ext4/inode.c:2996
      generic_perform_write+0x246/0x560 mm/filemap.c:3738
      ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:270
      ext4_file_write_iter+0x44a/0x1660 fs/ext4/file.c:679
      call_write_iter include/linux/fs.h:2187 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x9e9/0xdd0 fs/read_write.c:578
      
      Fixes: af356afa ("net_sched: reintroduce dev->qdisc for use by sch_api")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221018203258.2793282-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Yang Yingliang
      net: hns: fix possible memory leak in hnae_ae_register() · 2974f3b3
      Yang Yingliang authored
      
      [ Upstream commit ff2f5ec5d009844ec28f171123f9e58750cef4bf ]
      
      When injecting a fault while probing the module: if device_register()
      fails, the refcount of the kobject is not decreased to 0, so the name
      allocated in dev_set_name() is leaked. Fix this by calling put_device(),
      so that the name can be freed in the callback function kobject_cleanup().
      
      unreferenced object 0xffff00c01aba2100 (size 128):
        comm "systemd-udevd", pid 1259, jiffies 4294903284 (age 294.152s)
        hex dump (first 32 bytes):
          68 6e 61 65 30 00 00 00 18 21 ba 1a c0 00 ff ff  hnae0....!......
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<0000000034783f26>] slab_post_alloc_hook+0xa0/0x3e0
          [<00000000748188f2>] __kmem_cache_alloc_node+0x164/0x2b0
          [<00000000ab0743e8>] __kmalloc_node_track_caller+0x6c/0x390
          [<000000006c0ffb13>] kvasprintf+0x8c/0x118
          [<00000000fa27bfe1>] kvasprintf_const+0x60/0xc8
          [<0000000083e10ed7>] kobject_set_name_vargs+0x3c/0xc0
          [<000000000b87affc>] dev_set_name+0x7c/0xa0
          [<000000003fd8fe26>] hnae_ae_register+0xcc/0x190 [hnae]
          [<00000000fe97edc9>] hns_dsaf_ae_init+0x9c/0x108 [hns_dsaf]
          [<00000000c36ff1eb>] hns_dsaf_probe+0x548/0x748 [hns_dsaf]
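      The underlying rule is that the reference set up by device_register()
      outlives a failed registration. The caller must drop it with
      put_device() so the release path frees the name. The sketch below
      models that with a hypothetical refcounted demo_device and a counter of
      live name allocations. None of these names are kernel APIs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted device, loosely modeled on struct device. */
struct demo_device {
	int refcount;
	char *name;	/* allocated by demo_set_name(), like dev_set_name() */
};

static int demo_names_live;	/* outstanding name allocations */

static int demo_set_name(struct demo_device *dev, const char *name)
{
	dev->name = malloc(strlen(name) + 1);
	if (!dev->name)
		return -1;
	strcpy(dev->name, name);
	demo_names_live++;
	return 0;
}

/* Dropping the last reference runs the release path, which frees the name. */
static void demo_put_device(struct demo_device *dev)
{
	if (--dev->refcount == 0) {
		free(dev->name);
		dev->name = NULL;
		demo_names_live--;
	}
}

/* device_register()-like: the reference it sets up survives failure. */
static int demo_register(struct demo_device *dev, int inject_fault)
{
	dev->refcount = 1;
	return inject_fault ? -1 : 0;
}

static int demo_ae_register(struct demo_device *dev, int inject_fault)
{
	int ret;

	if (demo_set_name(dev, "hnae0"))
		return -1;
	ret = demo_register(dev, inject_fault);
	if (ret) {
		demo_put_device(dev);	/* the fix: drop the ref, freeing the name */
		return ret;
	}
	return 0;
}
```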
      
      Fixes: 6fe6611f ("net: add Hisilicon Network Subsystem hnae framework support")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20221018122451.1749171-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Pieter Jansen van Vuuren
      sfc: include vport_id in filter spec hash and equal() · 3032e316
      Pieter Jansen van Vuuren authored
      
      [ Upstream commit c2bf23e4a5af37a4d77901d9ff14c50a269f143d ]
      
      Filters on different vports are qualified by different implicit MACs and/or
      VLANs, so shouldn't be considered equal even if their other match fields
      are identical.
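      A hash table keyed on filter specs is only correct if hash() and equal()
      cover the same fields. The sketch below assumes a simplified, hypothetical
      struct filter_spec (the real efx_filter_spec has more fields) and uses
      FNV-1a purely for illustration; it shows vport_id participating in both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified filter spec; the real efx_filter_spec differs. */
struct filter_spec {
	uint32_t vport_id;   /* implicitly qualifies the MAC/VLAN matches */
	uint16_t outer_vid;
	uint8_t loc_mac[6];
	uint16_t ether_type;
};

/* FNV-1a over a byte range, chained through 'h'. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
	const uint8_t *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Hash field by field (not the raw struct) so padding bytes never matter. */
static uint32_t filter_spec_hash(const struct filter_spec *s)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, &s->vport_id, sizeof(s->vport_id)); /* the added field */
	h = fnv1a(h, &s->outer_vid, sizeof(s->outer_vid));
	h = fnv1a(h, s->loc_mac, sizeof(s->loc_mac));
	h = fnv1a(h, &s->ether_type, sizeof(s->ether_type));
	return h;
}

/* equal() must compare every field that hash() covers, including vport_id. */
static bool filter_spec_equal(const struct filter_spec *a,
			      const struct filter_spec *b)
{
	return a->vport_id == b->vport_id &&
	       a->outer_vid == b->outer_vid &&
	       memcmp(a->loc_mac, b->loc_mac, sizeof(a->loc_mac)) == 0 &&
	       a->ether_type == b->ether_type;
}
```

      Two specs that differ only in vport_id then compare unequal, so a filter
      on one vport can no longer be mistaken for a duplicate of another.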
      
      Fixes: 7c460d9b ("sfc: Extend and abstract efx_filter_spec to cover Huntington/EF10")
      Co-developed-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
      Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
      Link: https://lore.kernel.org/r/20221018092841.32206-1-pieter.jansen-van-vuuren@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: sched: sfb: fix null pointer access issue when sfb_init() fails · ded86c41
      Zhengchao Shao authored
      
      [ Upstream commit 2a3fc78210b9f0e85372a2435368962009f480fc ]
      
      When the default qdisc is sfb and the qdisc of a dev_queue fails to
      initialize during mqprio_init(), sfb_reset() is invoked to clean up
      resources. In this case q->qdisc is NULL, which causes a general
      protection fault (GPF).
      
      The process is as follows:
      qdisc_create_dflt()
      	sfb_init()
      		tcf_block_get()          --->failed, q->qdisc is NULL
      	...
      	qdisc_put()
      		...
      		sfb_reset()
      			qdisc_reset(q->qdisc)    --->q->qdisc is NULL
      				ops = qdisc->ops
      
      The call trace is as follows:
      general protection fault, probably for non-canonical address
      0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
      RIP: 0010:qdisc_reset+0x2b/0x6f0
      Call Trace:
      <TASK>
      sfb_reset+0x37/0xd0
      qdisc_reset+0xed/0x6f0
      qdisc_destroy+0x82/0x4c0
      qdisc_put+0x9e/0xb0
      qdisc_create_dflt+0x2c3/0x4a0
      mqprio_init+0xa71/0x1760
      qdisc_create+0x3eb/0x1000
      tc_modify_qdisc+0x408/0x1720
      rtnetlink_rcv_msg+0x38e/0xac0
      netlink_rcv_skb+0x12d/0x3a0
      netlink_unicast+0x4a2/0x740
      netlink_sendmsg+0x826/0xcc0
      sock_sendmsg+0xc5/0x100
      ____sys_sendmsg+0x583/0x690
      ___sys_sendmsg+0xe8/0x160
      __sys_sendmsg+0xbf/0x160
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f2164122d04
      </TASK>
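      The failure mode is a reset path that assumes init completed. A minimal
      userspace model of the guard follows, assuming hypothetical types
      parent_q and child_q in place of the sfb private data and its child
      Qdisc; it is a sketch of the idea, not the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct Qdisc. */
struct child_q {
	int qlen;
};

struct parent_q {
	struct child_q *child; /* stays NULL if init fails part-way */
	int resets_of_child;
};

static struct child_q the_child;

/* Init may fail before the child is attached, as when tcf_block_get() fails. */
static int parent_init(struct parent_q *q, int fail_early)
{
	q->child = NULL;
	q->resets_of_child = 0;
	if (fail_early)
		return -1; /* error: child never assigned */
	q->child = &the_child;
	return 0;
}

static void child_reset(struct child_q *c)
{
	c->qlen = 0; /* would dereference NULL without the caller's check */
}

/* The reset path can run after a failed init, so it must not assume
 * that the child exists. */
static void parent_reset(struct parent_q *q)
{
	if (q->child) {
		child_reset(q->child);
		q->resets_of_child++;
	}
}
```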
      
      Fixes: e13e02a3 ("net_sched: SFB flow scheduler")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: sched: delete duplicate cleanup of backlog and qlen · 305aa36b
      Zhengchao Shao authored
      
      [ Upstream commit c19d893fbf3f2f8fa864ae39652c7fee939edde2 ]
      
      qdisc_reset() clears qdisc->q.qlen and qdisc->qstats.backlog _after_
      calling qdisc->ops->reset, so there is no need to clear them again in
      the scheduler-specific reset functions.
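      A minimal model of that split, with hypothetical qdisc_model and
      qdisc_ops_model types standing in for struct Qdisc and struct
      Qdisc_ops: the generic reset runs the scheduler's hook and then clears
      the shared counters itself.

```c
#include <assert.h>

/* Hypothetical model of the generic/specific reset split; not kernel code. */
struct qdisc_model;

struct qdisc_ops_model {
	void (*reset)(struct qdisc_model *q);
};

struct qdisc_model {
	const struct qdisc_ops_model *ops;
	unsigned int qlen;    /* stands in for qdisc->q.qlen */
	unsigned int backlog; /* stands in for qdisc->qstats.backlog */
	int private_state;    /* scheduler-specific data */
};

/* Specific reset: clears only its own private state. */
static void my_sched_reset(struct qdisc_model *q)
{
	q->private_state = 0;
}

static const struct qdisc_ops_model my_sched_ops = {
	.reset = my_sched_reset,
};

/* Generic reset: calls the ops hook first, then clears the shared
 * counters, so the hook does not need to clear them itself. */
static void qdisc_reset_model(struct qdisc_model *q)
{
	if (q->ops->reset)
		q->ops->reset(q);
	q->qlen = 0;
	q->backlog = 0;
}
```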
      
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Link: https://lore.kernel.org/r/20220824005231.345727-1-shaozhengchao@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Stable-dep-of: 2a3fc78210b9 ("net: sched: sfb: fix null pointer access issue when sfb_init() fails")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: sched: cake: fix null pointer access issue when cake_init() fails · ae48bee2
      Zhengchao Shao authored
      
      [ Upstream commit 51f9a8921ceacd7bf0d3f47fa867a64988ba1dcb ]
      
      When the default qdisc is cake and the qdisc of a dev_queue fails to
      initialize during mqprio_init(), cake_reset() is invoked to clean up
      resources. In this case q->tins is NULL, which causes a general
      protection fault (GPF).
      
      The process is as follows:
      qdisc_create_dflt()
      	cake_init()
      		q->tins = kvcalloc(...)        --->failed, q->tins is NULL
      	...
      	qdisc_put()
      		...
      		cake_reset()
      			...
      			cake_dequeue_one()
      				b = &q->tins[...]   --->q->tins is NULL
      
      The call trace is as follows:
      general protection fault, probably for non-canonical address
      0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      RIP: 0010:cake_dequeue_one+0xc9/0x3c0
      Call Trace:
      <TASK>
      cake_reset+0xb1/0x140
      qdisc_reset+0xed/0x6f0
      qdisc_destroy+0x82/0x4c0
      qdisc_put+0x9e/0xb0
      qdisc_create_dflt+0x2c3/0x4a0
      mqprio_init+0xa71/0x1760
      qdisc_create+0x3eb/0x1000
      tc_modify_qdisc+0x408/0x1720
      rtnetlink_rcv_msg+0x38e/0xac0
      netlink_rcv_skb+0x12d/0x3a0
      netlink_unicast+0x4a2/0x740
      netlink_sendmsg+0x826/0xcc0
      sock_sendmsg+0xc5/0x100
      ____sys_sendmsg+0x583/0x690
      ___sys_sendmsg+0xe8/0x160
      __sys_sendmsg+0xbf/0x160
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f89e5122d04
      </TASK>
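      Here the missing state is an allocated array rather than a child qdisc.
      A hypothetical userspace model (the cake_like and tin types are
      illustrative only) of a reset that returns early when the allocation
      never happened:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a qdisc with a dynamically allocated tin array. */
struct tin {
	int backlog;
};

struct cake_like {
	struct tin *tins; /* NULL if allocation failed in init */
	unsigned int tin_cnt;
};

static int cake_like_init(struct cake_like *q, unsigned int n, int fail_alloc)
{
	q->tin_cnt = 0;
	q->tins = fail_alloc ? NULL : calloc(n, sizeof(*q->tins));
	if (!q->tins)
		return -1;
	q->tin_cnt = n;
	return 0;
}

/* Bail out early when init never allocated the tins, rather than
 * indexing into a NULL array. */
static void cake_like_reset(struct cake_like *q)
{
	if (!q->tins)
		return;
	for (unsigned int i = 0; i < q->tin_cnt; i++)
		q->tins[i].backlog = 0;
}

static void cake_like_destroy(struct cake_like *q)
{
	free(q->tins);
	q->tins = NULL;
	q->tin_cnt = 0;
}
```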
      
      Fixes: 046f6fd5 ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nvme-hwmon: kmalloc the NVME SMART log buffer · 2008ad08
      Serge Semin authored
      
      [ Upstream commit c94b7f9bab22ac504f9153767676e659988575ad ]
      
      Recent commit 52fde2c07da6 ("nvme: set dma alignment to dword") has
      caused a regression on our platform.
      
      It turned out that the nvme_get_log() call corrupted the nvme_hwmon_data
      structure instance. In particular, the nvme_hwmon_data.ctrl pointer was
      overwritten either with zeros or with garbage. After some research we
      discovered that the problem happened before the actual NVME DMA
      execution, during the buffer mapping. Since our platform is
      DMA-noncoherent, the mapping implies cache-line invalidations or
      write-backs depending on the DMA direction. Fetching the NVME SMART log
      is a device-to-memory transfer, so the cache lines covering the buffer
      were invalidated during the mapping. Since the log buffer isn't
      cache-line aligned, the invalidation also discarded the neighbouring
      data, which turned out to be the other fields of the nvme_hwmon_data
      structure surrounding the buffer.
      
      To fix this, we need to make sure that the whole log buffer lies within
      a cache-line-aligned memory region, so that the cache invalidation
      doesn't involve the adjacent data. One option to guarantee that is to
      kmalloc the DMA buffer [1]. Since the rest of the NVME core driver
      prefers that method, it has been chosen to fix this problem too.
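      The alignment argument can be sketched in userspace. This assumes a
      64-byte cache line and a 512-byte log buffer (both hypothetical values
      here), and uses C11 aligned_alloc() as a stand-in for kmalloc(), whose
      DMA-safe minimum alignment is what the fix relies on. The struct and
      function names are illustrative, not the driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64u  /* assumed line size; platform-dependent */
#define LOG_SIZE   512u /* hypothetical log buffer size */

/* Buffer embedded in a larger struct: its start need not be line-aligned,
 * so invalidating the lines it covers may also discard 'ctrl'. */
struct hwmon_data_embedded {
	void *ctrl;
	unsigned char log[LOG_SIZE];
};

/* Buffer allocated separately, as in the fix. */
struct hwmon_data_separate {
	void *ctrl;
	unsigned char *log;
};

static int is_line_aligned(const void *p)
{
	return ((uintptr_t)p % CACHE_LINE) == 0;
}

/* Does [p, p + len) share a cache line with memory outside it? */
static int shares_cache_line(const void *p, size_t len)
{
	uintptr_t end = (uintptr_t)p + len;

	return !is_line_aligned(p) || (end % CACHE_LINE) != 0;
}

static unsigned char *alloc_dma_buffer(size_t len)
{
	/* aligned_alloc() requires size to be a multiple of the alignment. */
	size_t rounded = (len + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;

	return aligned_alloc(CACHE_LINE, rounded);
}

static int hwmon_data_init(struct hwmon_data_separate *d, void *ctrl)
{
	d->ctrl = ctrl;
	d->log = alloc_dma_buffer(LOG_SIZE);
	return d->log ? 0 : -1;
}
```

      In the embedded layout the log starts right after the ctrl pointer, so
      the first cache line of the buffer can also hold ctrl; the separate
      allocation owns whole lines.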
      
      Note that after deeper research we found out that the denoted commit
      wasn't the root cause of the problem. It merely revealed the bug by
      activating DMA-based fetching of the NVME SMART log in the NVME hwmon
      driver. The problem has been present since the driver's initial
      commit.
      
      [1] Documentation/core-api/dma-api-howto.rst
      
      Fixes: 400b6a7b ("nvme: Add hardware monitoring support")
      Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nvme-hwmon: consistently ignore errors from nvme_hwmon_init · 770b7e3a
      Christoph Hellwig authored
      
      [ Upstream commit 6b8cf94005187952f794c0c4ed3920a1e8accfa3 ]
      
      An NVMe controller works perfectly fine even when the hwmon
      initialization fails. Handle this case consistently by no longer
      returning errors from nvme_hwmon_init() unless they come from a
      controller reset.
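      A sketch of the intended policy for an optional feature, assuming a
      hypothetical enum of failure causes. This classification is
      illustrative only and does not reproduce how the driver detects a
      controller reset; the -EIO below is a placeholder, not the value the
      driver returns.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of the optional hwmon setup step. */
enum hwmon_step_result {
	HWMON_OK,
	HWMON_NO_MEMORY,
	HWMON_LOG_READ_FAILED,
	HWMON_REGISTER_FAILED,
	HWMON_CTRL_RESET, /* failure caused by a controller reset */
};

/* Map a setup failure to what the caller sees: only a controller reset
 * is reported; any other failure leaves the controller usable. */
static int hwmon_init_model(enum hwmon_step_result r)
{
	switch (r) {
	case HWMON_CTRL_RESET:
		return -EIO; /* placeholder errno chosen for this sketch */
	default:
		return 0;    /* hwmon is optional: swallow the error */
	}
}
```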
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
      Stable-dep-of: c94b7f9bab22 ("nvme-hwmon: kmalloc the NVME SMART log buffer")
      Signed-off-by: Sasha Levin <sashal@kernel.org>