  1. Aug 12, 2021
  2. Jul 19, 2021
    • Minchan Kim
      selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC · f3837182
      Minchan Kim authored
      
      [ Upstream commit 648f2c61 ]
      
      In the field, we have seen many allocation failures from the call
      path below.
      
      06-03 13:29:12.999 1010315 31557 31557 W Binder  : 31542_2: page allocation failure: order:0, mode:0x800(GFP_NOWAIT), nodemask=(null),cpuset=background,mems_allowed=0
      ...
      ...
      06-03 13:29:12.999 1010315 31557 31557 W Call trace:
      06-03 13:29:12.999 1010315 31557 31557 W         : dump_backtrace.cfi_jt+0x0/0x8
      06-03 13:29:12.999 1010315 31557 31557 W         : dump_stack+0xc8/0x14c
      06-03 13:29:12.999 1010315 31557 31557 W         : warn_alloc+0x158/0x1c8
      06-03 13:29:12.999 1010315 31557 31557 W         : __alloc_pages_slowpath+0x9d8/0xb80
      06-03 13:29:12.999 1010315 31557 31557 W         : __alloc_pages_nodemask+0x1c4/0x430
      06-03 13:29:12.999 1010315 31557 31557 W         : allocate_slab+0xb4/0x390
      06-03 13:29:12.999 1010315 31557 31557 W         : ___slab_alloc+0x12c/0x3a4
      06-03 13:29:12.999 1010315 31557 31557 W         : kmem_cache_alloc+0x358/0x5e4
      06-03 13:29:12.999 1010315 31557 31557 W         : avc_alloc_node+0x30/0x184
      06-03 13:29:12.999 1010315 31557 31557 W         : avc_update_node+0x54/0x4f0
      06-03 13:29:12.999 1010315 31557 31557 W         : avc_has_extended_perms+0x1a4/0x460
      06-03 13:29:12.999 1010315 31557 31557 W         : selinux_file_ioctl+0x320/0x3d0
      06-03 13:29:12.999 1010315 31557 31557 W         : __arm64_sys_ioctl+0xec/0x1fc
      06-03 13:29:12.999 1010315 31557 31557 W         : el0_svc_common+0xc0/0x24c
      06-03 13:29:12.999 1010315 31557 31557 W         : el0_svc+0x28/0x88
      06-03 13:29:12.999 1010315 31557 31557 W         : el0_sync_handler+0x8c/0xf0
      06-03 13:29:12.999 1010315 31557 31557 W         : el0_sync+0x1a4/0x1c0
      ..
      ..
      06-03 13:29:12.999 1010315 31557 31557 W SLUB    : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
      06-03 13:29:12.999 1010315 31557 31557 W cache   : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
      06-03 13:29:12.999 1010315 31557 31557 W node 0  : slabs: 57, objs: 2907, free: 0
      06-03 13:29:12.999 1010161 10686 10686 W SLUB    : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
      06-03 13:29:12.999 1010161 10686 10686 W cache   : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
      06-03 13:29:12.999 1010161 10686 10686 W node 0  : slabs: 57, objs: 2907, free: 0
      06-03 13:29:12.999 1010161 10686 10686 W SLUB    : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
      06-03 13:29:12.999 1010161 10686 10686 W cache   : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
      06-03 13:29:12.999 1010161 10686 10686 W node 0  : slabs: 57, objs: 2907, free: 0
      06-03 13:29:12.999 1010161 10686 10686 W SLUB    : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
      06-03 13:29:12.999 1010161 10686 10686 W cache   : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
      06-03 13:29:12.999 1010161 10686 10686 W node 0  : slabs: 57, objs: 2907, free: 0
      06-03 13:29:13.000 1010161 10686 10686 W SLUB    : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
      06-03 13:29:13.000 1010161 10686 10686 W cache   : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
      06-03 13:29:13.000 1010161 10686 10686 W node 0  : slabs: 57, objs: 2907, free: 0
      06-03 13:29:13.000 1010161 10686 10686 W SLUB    : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
      06-03 13:29:13.000 1010161 10686 10686 W cache   : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
      06-03 13:29:13.000 1010161 10686 10686 W node 0  : slabs: 57, objs: 2907, free: 0
      06-03 13:29:13.000 1010161 10686 10686 W SLUB    : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
      06-03 13:29:13.000 1010161 10686 10686 W cache   : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
      06-03 13:29:13.000 1010161 10686 10686 W node 0  : slabs: 57, objs: 2907, free: 0
      06-03 13:29:13.000 10230 30892 30892 W SLUB    : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
      06-03 13:29:13.000 10230 30892 30892 W cache   : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
      06-03 13:29:13.000 10230 30892 30892 W node 0  : slabs: 57, objs: 2907, free: 0
      06-03 13:29:13.000 10230 30892 30892 W SLUB    : Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
      06-03 13:29:13.000 10230 30892 30892 W cache   : avc_node, object size: 72, buffer size: 80, default order: 0, min order: 0
      
      Based on [1], SELinux tolerates memory allocation failures in the AVC,
      so also pass __GFP_NOWARN to suppress these warnings.
      
      [1] 476accbe ("selinux: use GFP_NOWAIT in the AVC kmem_caches")
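
      To make the flag combination concrete, here is a minimal userspace
      model, not kernel code. DEMO_NOWAIT, DEMO_NOWARN, demo_alloc() and
      demo_cache_decision() are hypothetical stand-ins for GFP_NOWAIT,
      __GFP_NOWARN, the avc_node cache and avc_alloc_node(); the only
      assumption carried over is that a failed allocation is harmless
      because the caller simply skips caching the decision.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdio.h>

      /* Hypothetical flag bits that only mimic the roles of GFP_NOWAIT and
       * __GFP_NOWARN; the values are not the kernel's. */
      #define DEMO_NOWAIT 0x1u
      #define DEMO_NOWARN 0x2u
      #define DEMO_POOL_SIZE 2

      static int pool_used;       /* objects handed out so far */
      static int warnings_logged; /* failure warnings printed so far */

      /* Fixed-capacity pool standing in for the avc_node cache. */
      static void *demo_alloc(unsigned int flags)
      {
      	static long pool[DEMO_POOL_SIZE];

      	if (pool_used < DEMO_POOL_SIZE)
      		return &pool[pool_used++];

      	/* Allocation failed: warn unless the caller opted out. */
      	if (!(flags & DEMO_NOWARN)) {
      		fprintf(stderr, "allocation failure, flags=0x%x\n", flags);
      		warnings_logged++;
      	}
      	return NULL;
      }

      /* Caching a decision is an optimization, so a failure is skipped. */
      static bool demo_cache_decision(unsigned int flags)
      {
      	return demo_alloc(flags) != NULL; /* true if it was cached */
      }
      ```

      Once the pool is exhausted, calls that pass DEMO_NOWARN fail quietly,
      while calls without it still print a warning.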
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      [PM: subj fix, line wraps, normalized commit refs]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  3. May 14, 2021
    • Paul Moore
      selinux: add proper NULL termination to the secclass_map permissions · 4c0ddc87
      Paul Moore authored
      
      commit e4c82eaf upstream.
      
      This patch adds the missing NULL termination to the "bpf" and
      "perf_event" object class permission lists.
      
      This missing NULL termination should really only affect the tools
      under scripts/selinux, with the most important being genheaders.c,
      although in practice this has not been an issue on any of my dev/test
      systems.  If the problem were to manifest itself it would likely
      result in bogus permissions added to the end of the object class;
      thankfully with no access control checks using these bogus
      permissions and no policies defining these permissions the impact
      would likely be limited to some noise about undefined permissions
      during policy load.
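
      A sketch of why the terminator matters, assuming a hypothetical
      demo_class_mapping laid out as a name plus a fixed-size,
      NULL-terminated permission array (the 32 + 1 bound and the permission
      names are illustrative). A consumer walks the list until it hits NULL;
      without the terminator it would keep reading whatever follows. In a
      layout like this one, C zero-fills the unspecified trailing elements
      of a partially initialized array, so a short list still ends in NULL;
      an explicit NULL keeps the terminator from depending on that detail.

      ```c
      #include <stddef.h>

      /* Hypothetical class entry: 32 permission bits plus a NULL slot. */
      #define DEMO_MAX_PERMS 32

      struct demo_class_mapping {
      	const char *name;
      	const char *perms[DEMO_MAX_PERMS + 1];
      };

      static const struct demo_class_mapping demo_map[] = {
      	{ "bpf", { "map_create", "map_read", "map_write", "prog_load",
      		   "prog_run", NULL } },
      	{ NULL, { NULL } }, /* end of the class list */
      };

      /* Count permissions by walking to the NULL terminator, bounded so a
       * missing terminator cannot run off the end of the array. */
      static size_t demo_count_perms(const struct demo_class_mapping *m)
      {
      	size_t n = 0;

      	while (n < DEMO_MAX_PERMS && m->perms[n] != NULL)
      		n++;
      	return n;
      }
      ```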
      
      Cc: stable@vger.kernel.org
      Fixes: ec27c356 ("selinux: bpf: Add selinux check for eBPF syscall operations")
      Fixes: da97e184 ("perf_event: Add support for LSM and SELinux checks")
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. Apr 14, 2021
    • Ondrej Mosnacek
      selinux: fix race between old and new sidtab · a28124e8
      Ondrej Mosnacek authored

      commit 9ad6e9cb upstream.
      
      Since commit 1b8b31a2 ("selinux: convert policy read-write lock to
      RCU"), there is a small window during policy load where the new policy
      pointer has already been installed, but some threads may still be
      holding the old policy pointer in their read-side RCU critical sections.
      This means that there may be conflicting attempts to add a new SID entry
      to both tables via sidtab_context_to_sid().
      
      See also (and the rest of the thread):
      https://lore.kernel.org/selinux/CAFqZXNvfux46_f8gnvVvRYMKoes24nwm2n3sPbMjrB8vKTW00g@mail.gmail.com/
      
      Fix this by installing the new policy pointer under the old sidtab's
      spinlock along with marking the old sidtab as "frozen". Then, if an
      attempt to add new entry to a "frozen" sidtab is detected, make
      sidtab_context_to_sid() return -ESTALE to indicate that a new policy
      has been installed and that the caller will have to abort the policy
      transaction and try again after re-taking the policy pointer (which is
      guaranteed to be a newer policy). This requires adding a retry-on-ESTALE
      logic to all callers of sidtab_context_to_sid(), but fortunately these
      are easy to determine and aren't that many.
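
      The retry-on-ESTALE idea can be modelled in userspace as below. This
      is a single-threaded sketch with hypothetical names (demo_sidtab,
      demo_install_policy(), demo_lookup()); the kernel version relies on
      RCU and the old sidtab's spinlock, both omitted here. The caller
      starts with a policy pointer it took earlier and, on -ESTALE,
      re-reads the current policy and tries again.

      ```c
      #include <errno.h>
      #include <stdbool.h>

      /* Hypothetical, single-threaded model of the frozen-sidtab scheme. */
      struct demo_sidtab {
      	bool frozen;  /* set once a newer policy has been installed */
      	int next_sid; /* next SID to hand out */
      };

      struct demo_policy {
      	struct demo_sidtab sidtab;
      };

      static struct demo_policy *demo_current_policy;

      /* Freeze the old table and publish the new policy. In the kernel fix
       * both steps happen under the old sidtab's spinlock. */
      static void demo_install_policy(struct demo_policy *newp)
      {
      	if (demo_current_policy)
      		demo_current_policy->sidtab.frozen = true;
      	demo_current_policy = newp;
      }

      static int demo_context_to_sid(struct demo_sidtab *s, int *sid)
      {
      	if (s->frozen)
      		return -ESTALE; /* a newer policy exists; caller retries */
      	*sid = s->next_sid++;
      	return 0;
      }

      /* 'held' is a pointer taken earlier and may be stale. The newest
       * policy is never frozen, so the loop terminates in this model. */
      static int demo_lookup(struct demo_policy *held, int *sid, int *retries)
      {
      	struct demo_policy *p = held;
      	int rc;

      	*retries = 0;
      	while ((rc = demo_context_to_sid(&p->sidtab, sid)) == -ESTALE) {
      		(*retries)++;
      		p = demo_current_policy; /* re-take the policy pointer */
      	}
      	return rc;
      }
      ```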
      
      This seems to be the simplest solution for this problem, even if it
      looks somewhat ugly. Note that other places in the kernel (e.g.
      do_mknodat() in fs/namei.c) use similar stale-retry patterns, so I think
      it's reasonable.
      
      Cc: stable@vger.kernel.org
      Fixes: 1b8b31a2 ("selinux: convert policy read-write lock to RCU")
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ondrej Mosnacek
      selinux: fix cond_list corruption when changing booleans · fd75d73a
      Ondrej Mosnacek authored
      
      commit d8f5f0ea upstream.
      
      Currently, duplicate_policydb_cond_list() first copies the whole
      conditional avtab and then tries to link to the correct entries in
      cond_dup_av_list() using avtab_search(). However, since the conditional
      avtab may contain multiple entries with the same key, this approach
      often fails to find the right entry, potentially leading to wrong rules
      being activated/deactivated when booleans are changed.
      
      To fix this, instead start with an empty conditional avtab and add the
      individual entries one-by-one while building the new av_lists. This
      approach leads to the correct result, since each entry is present in the
      av_lists exactly once.
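
      The difference between the two duplication strategies can be shown
      with a toy table in which two entries share a key. Everything here
      (demo_avtab, demo_search(), demo_insert()) is hypothetical and far
      simpler than the real avtab; the point is only that a lookup by key
      always lands on the first match, while inserting entries as the list
      is built gives each list node its own entry.

      ```c
      #include <stddef.h>

      #define DEMO_MAX 8

      struct demo_entry {
      	int key;   /* stands in for the (source, target, class) key */
      	int perms; /* the rule's permission bits */
      };

      struct demo_avtab {
      	struct demo_entry e[DEMO_MAX];
      	size_t n;
      };

      static struct demo_entry *demo_search(struct demo_avtab *t, int key)
      {
      	for (size_t i = 0; i < t->n; i++)
      		if (t->e[i].key == key)
      			return &t->e[i]; /* first match only */
      	return NULL;
      }

      static struct demo_entry *demo_insert(struct demo_avtab *t,
      				      struct demo_entry v)
      {
      	if (t->n >= DEMO_MAX)
      		return NULL;
      	t->e[t->n] = v;
      	return &t->e[t->n++];
      }

      /* Old approach: copy the whole table, then relink by key lookup. */
      static void demo_dup_by_search(const struct demo_avtab *src,
      			       struct demo_entry **list, size_t len,
      			       struct demo_avtab *dst, struct demo_entry **out)
      {
      	*dst = *src;
      	for (size_t i = 0; i < len; i++)
      		out[i] = demo_search(dst, list[i]->key);
      }

      /* Fixed approach: start empty, insert each listed entry once. */
      static void demo_dup_by_insert(struct demo_entry **list, size_t len,
      			       struct demo_avtab *dst, struct demo_entry **out)
      {
      	dst->n = 0;
      	for (size_t i = 0; i < len; i++)
      		out[i] = demo_insert(dst, *list[i]);
      }
      ```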
      
      The issue can be reproduced with Fedora policy as follows:
      
          # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
          allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
          allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True
          # setsebool ftpd_anon_write=off ftpd_connect_all_unreserved=off ftpd_connect_db=off ftpd_full_access=off
      
      On fixed kernels, the sesearch output is the same after the setsebool
      command:
      
          # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
          allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
          allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True
      
      While on the broken kernels, it will be different:
      
          # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
          allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
          allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
          allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
      
      While there, also simplify the computation of nslots. This changes the
      nslots values for nrules 2 or 3 to just two slots instead of 4, which
      makes the sequence more consistent.
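
      The following sketch compares two bucket-count heuristics. Both are
      reconstructions chosen to match the behavior described above (the new
      one gives 2 slots for nrules 2 or 3, where the old one gives 4), not
      verbatim copies of the kernel's avtab code; DEMO_MAX_BUCKETS is an
      assumed cap.

      ```c
      #include <stdint.h>

      #define DEMO_MAX_BUCKETS (1u << 16) /* assumed upper bound */

      /* Older style: shift = bit length of nrules, minus 2 when above 2. */
      static uint32_t demo_nslot_old(uint32_t nrules)
      {
      	uint32_t shift = 0, work = nrules, nslot;

      	while (work) {
      		work >>= 1;
      		shift++;
      	}
      	if (shift > 2)
      		shift -= 2;
      	nslot = 1u << shift;
      	return nslot > DEMO_MAX_BUCKETS ? DEMO_MAX_BUCKETS : nslot;
      }

      /* Simplified style: at least 2 buckets; from nrules >= 8 on, about
       * 2 to 4 rules per bucket. */
      static uint32_t demo_nslot_new(uint32_t nrules)
      {
      	uint32_t shift = 1, work = nrules >> 3, nslot;

      	while (work) {
      		work >>= 1;
      		shift++;
      	}
      	nslot = 1u << shift;
      	return nslot > DEMO_MAX_BUCKETS ? DEMO_MAX_BUCKETS : nslot;
      }
      ```

      With these definitions, nrules = 2 or 3 gives 4 slots under the old
      rule but 4 rules give only 2, which is the inconsistency the new rule
      removes; for nrules of 4 and above the two agree.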
      
      Cc: stable@vger.kernel.org
      Fixes: c7c556f1 ("selinux: refactor changing booleans")
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ondrej Mosnacek
      selinux: make nslot handling in avtab more robust · 4f29b08e
      Ondrej Mosnacek authored
      
      commit 442dc00f upstream.
      
      1. Make sure all fields are initialized in avtab_init().
      2. Slightly refactor avtab_alloc() to use the above fact.
      3. Use h->nslot == 0 as a sentinel in the access functions to prevent
         dereferencing h->htable when it's not allocated.
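
      A minimal sketch of the sentinel idea, using a hypothetical
      demo_avtab whose fields only loosely follow the list above:
      initialization zeroes every field, so a lookup can test nslot == 0
      before it ever dereferences htable.

      ```c
      #include <stdint.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical, simplified table. */
      struct demo_avtab {
      	int **htable;   /* bucket array, NULL until allocated */
      	uint32_t nel;   /* number of stored elements */
      	uint32_t nslot; /* number of buckets; 0 means "not allocated" */
      	uint32_t mask;  /* nslot - 1 */
      };

      static void demo_avtab_init(struct demo_avtab *h)
      {
      	memset(h, 0, sizeof(*h)); /* every field starts out defined */
      }

      /* nslot is expected to be 0 or a power of two. */
      static int demo_avtab_alloc(struct demo_avtab *h, uint32_t nslot)
      {
      	if (nslot == 0)
      		return 0; /* stay empty; lookups hit the sentinel */
      	h->htable = calloc(nslot, sizeof(*h->htable));
      	if (!h->htable)
      		return -1;
      	h->nslot = nslot;
      	h->mask = nslot - 1;
      	return 0;
      }

      static int *demo_avtab_search(const struct demo_avtab *h, uint32_t key)
      {
      	if (h->nslot == 0)
      		return NULL; /* sentinel: never index a missing htable */
      	return h->htable[key & h->mask]; /* bucket head, or NULL */
      }
      ```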
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  5. Mar 30, 2021
    • Ondrej Mosnacek
      selinux: fix variable scope issue in live sidtab conversion · 19c9967e
      Ondrej Mosnacek authored
      
      commit 6406887a upstream.
      
      Commit 02a52c5c ("selinux: move policy commit after updating
      selinuxfs") moved the selinux_policy_commit() call out of
      security_load_policy() into sel_write_load(), which caused a subtle yet
      rather serious bug.
      
      The problem is that security_load_policy() passes a reference to the
      convert_params local variable to sidtab_convert(), which stores it in
      the sidtab, where it may be accessed until the policy is swapped over
      and RCU synchronized. Before 02a52c5c, selinux_policy_commit() was
      called directly from security_load_policy(), so the convert_params
      pointer remained valid all the way until the old sidtab was destroyed,
      but now that's no longer the case and calls to sidtab_context_to_sid()
      on the old sidtab after security_load_policy() returns may cause invalid
      memory accesses.
      
      This can be easily triggered using the stress test from commit
      ee1a84fd ("selinux: overhaul sidtab to fix bug and improve
      performance"):
      ```
      function rand_cat() {
      	echo $(( $RANDOM % 1024 ))
      }
      
      function do_work() {
      	while true; do
      		echo -n "system_u:system_r:kernel_t:s0:c$(rand_cat),c$(rand_cat)" \
      			>/sys/fs/selinux/context 2>/dev/null || true
      	done
      }
      
      do_work >/dev/null &
      do_work >/dev/null &
      do_work >/dev/null &
      
      while load_policy; do echo -n .; sleep 0.1; done
      
      kill %1
      kill %2
      kill %3
      ```
      
      Fix this by allocating the temporary sidtab convert structures
      dynamically and passing them among the
      selinux_policy_{load,cancel,commit} functions.
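
      The ownership change can be sketched as follows. The types and
      functions are hypothetical; what matters is that the parameters live
      on the heap, so the pointer stored in the old table stays valid after
      the load function returns, and that a single finish step (standing in
      for both commit and cancel) detaches and frees them.

      ```c
      #include <stdlib.h>

      /* Hypothetical stand-ins; only the ownership pattern matters. */
      struct demo_convert_params {
      	int generation; /* data the old table may consult until commit */
      };

      struct demo_sidtab {
      	struct demo_convert_params *convert; /* borrowed pointer */
      };

      struct demo_load_state {
      	struct demo_convert_params *convert; /* owned until finish */
      };

      static struct demo_sidtab demo_old_sidtab;

      /* Allocate on the heap rather than on this function's stack, so the
       * pointer handed to the old table outlives the call. */
      static int demo_policy_load(struct demo_load_state *st, int generation)
      {
      	st->convert = malloc(sizeof(*st->convert));
      	if (!st->convert)
      		return -1;
      	st->convert->generation = generation;
      	demo_old_sidtab.convert = st->convert;
      	return 0;
      }

      /* Commit or cancel: detach the borrowed pointer, then free it. */
      static void demo_policy_finish(struct demo_load_state *st)
      {
      	demo_old_sidtab.convert = NULL;
      	free(st->convert);
      	st->convert = NULL;
      }
      ```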
      
      Fixes: 02a52c5c ("selinux: move policy commit after updating selinuxfs")
      Cc: stable@vger.kernel.org
      Tested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
      Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      [PM: merge fuzz in security.h and services.c]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ondrej Mosnacek
      selinux: don't log MAC_POLICY_LOAD record on failed policy load · 9731e08a
      Ondrej Mosnacek authored
      
      commit 519dad3b upstream.
      
      If sel_make_policy_nodes() fails, we should jump to 'out', not 'out1',
      as the latter would incorrectly log an MAC_POLICY_LOAD audit record,
      even though the policy hasn't actually been reloaded. The 'out1' jump
      label now becomes unused and can be removed.
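
      The label choice is easiest to see in a reduced version of the error
      path. demo_write_load() and demo_make_nodes() are hypothetical; the
      sketch only shows that jumping past the audit step on failure means
      no record is emitted for a load that did not happen.

      ```c
      #include <errno.h>
      #include <stdbool.h>

      static int demo_audit_records; /* MAC_POLICY_LOAD-like records */

      /* Hypothetical step standing in for sel_make_policy_nodes(). */
      static int demo_make_nodes(bool fail)
      {
      	return fail ? -ENOMEM : 0;
      }

      static int demo_write_load(bool fail_nodes)
      {
      	int rc;

      	rc = demo_make_nodes(fail_nodes);
      	if (rc)
      		goto out; /* failure: skip the audit record entirely */

      	/* ... commit the new policy here ... */
      	demo_audit_records++; /* reached only after a real reload */
      out:
      	return rc;
      }
      ```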
      
      Fixes: 02a52c5c ("selinux: move policy commit after updating selinuxfs")
      Cc: stable@vger.kernel.org
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. Mar 04, 2021
  7. Dec 30, 2020
  8. Nov 13, 2020
  9. Oct 05, 2020
  10. Sep 15, 2020
  11. Sep 11, 2020
  12. Aug 31, 2020
  13. Aug 27, 2020
  14. Aug 26, 2020
    • Dan Carpenter
      selinux: fix error handling bugs in security_load_policy() · 0256b0aa
      Dan Carpenter authored
      
      There are a few bugs in the error handling for security_load_policy().
      
      1) If the newpolicy->sidtab allocation fails then it leads to a NULL
         dereference.  Also the error code was not set to -ENOMEM on that
         path.
      2) If policydb_read() failed then we call policydb_destroy() twice
         which means we call kvfree(p->sym_val_to_name[i]) twice.
      3) If policydb_load_isids() failed then we call sidtab_destroy() twice
         and that results in a double free in the sidtab_destroy_tree()
         function because entry.ptr_inner and entry.ptr_leaf are not set to
         NULL.
      
      One thing that makes this code nice to deal with is that none of the
      functions return partially allocated data.  In other words, the
      policydb_read() either allocates everything successfully or it frees
      all the data it allocates.  It never returns a mix of allocated and
      not allocated data.
      
      I re-wrote this to only free the successfully allocated data which
      avoids the double frees.  I also re-ordered selinux_policy_free() so
      it's in the reverse order of the allocation function.
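
      The "free only what was allocated, in reverse order" style can be
      sketched with a goto ladder. The three stages and the fail_at
      fault-injection parameter are hypothetical, and the objects are plain
      malloc() buffers rather than a real sidtab or policydb.

      ```c
      #include <errno.h>
      #include <stdlib.h>

      /* Hypothetical three-stage load; fail_at picks a stage to fail
       * (0 = no failure). */
      struct demo_policy {
      	void *sidtab;
      	void *policydb;
      	void *isids;
      };

      static int demo_frees; /* free() calls made on error paths */

      static void *demo_step(int stage, int fail_at)
      {
      	return stage == fail_at ? NULL : malloc(16);
      }

      static int demo_load(struct demo_policy *p, int fail_at)
      {
      	p->sidtab = demo_step(1, fail_at);
      	if (!p->sidtab)
      		return -ENOMEM; /* nothing allocated yet */

      	p->policydb = demo_step(2, fail_at);
      	if (!p->policydb)
      		goto err_sidtab;

      	p->isids = demo_step(3, fail_at);
      	if (!p->isids)
      		goto err_policydb;

      	return 0;

      	/* Unwind in reverse order, freeing each object exactly once. */
      err_policydb:
      	free(p->policydb);
      	demo_frees++;
      err_sidtab:
      	free(p->sidtab);
      	demo_frees++;
      	return -ENOMEM;
      }

      /* Free a fully loaded policy in reverse order of allocation. */
      static void demo_policy_free(struct demo_policy *p)
      {
      	free(p->isids);
      	free(p->policydb);
      	free(p->sidtab);
      }
      ```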
      
      Fixes: c7c556f1 ("selinux: refactor changing booleans")
      Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      [PM: partially merged by hand due to merge fuzz]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  15. Aug 25, 2020
  16. Aug 24, 2020
  17. Aug 23, 2020
  18. Aug 21, 2020
    • Peter Enderborg
      selinux: add basic filtering for audit trace events · 30969bc8
      Peter Enderborg authored
      
      This patch adds further attributes to the event. These attributes are
      helpful to understand the context of the message and can be used
      to filter the events.
      
      There are three common items: source context, target context and tclass.
      There are also items describing the outcome of the operation performed.
      
      An event is similar to:
                 <...>-1309  [002] ....  6346.691689: selinux_audited:
             requested=0x4000000 denied=0x4000000 audited=0x4000000
             result=-13
             scontext=system_u:system_r:cupsd_t:s0-s0:c0.c1023
             tcontext=system_u:object_r:bin_t:s0 tclass=file
      
      On systems where many denials occur, it is useful to apply a filter.
      A filter is a logical expression written to the event's filter
      file. Example:
       echo "tclass==\"file\" " > events/avc/selinux_audited/filter
      
      With this filter, only events with tclass=file are reported.
      
      The trace can also have extra properties. Adding the user stack
      can be done with
         echo 1 > options/userstacktrace
      
      Now the output will be
               runcon-1365  [003] ....  6960.955530: selinux_audited:
           requested=0x4000000 denied=0x4000000 audited=0x4000000
           result=-13
           scontext=system_u:system_r:cupsd_t:s0-s0:c0.c1023
           tcontext=system_u:object_r:bin_t:s0 tclass=file
                runcon-1365  [003] ....  6960.955560: <user stack trace>
       =>  <00007f325b4ce45b>
       =>  <00005607093efa57>
      
      Signed-off-by: Peter Enderborg <peter.enderborg@sony.com>
      Reviewed-by: Thiébaud Weksteen <tweek@google.com>
      Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • Thiébaud Weksteen
      selinux: add tracepoint on audited events · dd816621
      Thiébaud Weksteen authored
      The audit data currently captures which process and which target
      is responsible for a denial. There is no data on where exactly in the
      process that call occurred. Debugging can be made easier by being able to
      reconstruct the unified kernel and userland stack traces [1]. Add a
      tracepoint on the SELinux denials which can then be used by userland
      (e.g. perf).
      
      Although this patch could be added manually by each OS developer to
      troubleshoot a denial, adding it to the kernel streamlines the
      developers' workflow.
      
      It is possible to use perf for monitoring the event:
        # perf record -e avc:selinux_audited -g -a
        ^C
        # perf report -g
        [...]
            6.40%     6.40%  audited=800000 tclass=4
                     |
                        __libc_start_main
                        |
                        |--4.60%--__GI___ioctl
                        |          entry_SYSCALL_64
                        |          do_syscall_64
                        |          __x64_sys_ioctl
                        |          ksys_ioctl
                        |          binder_ioctl
                        |          binder_set_nice
                        |          can_nice
                        |          capable
                        |          security_capable
                        |          cred_has_capability.isra.0
                        |          slow_avc_audit
                        |          common_lsm_audit
                        |          avc_audit_post_callback
                        |          avc_audit_post_callback
                        |
      
      It is also possible to use the ftrace interface:
        # echo 1 > /sys/kernel/debug/tracing/events/avc/selinux_audited/enable
        # cat /sys/kernel/debug/tracing/trace
        tracer: nop
        entries-in-buffer/entries-written: 1/1   #P:8
        [...]
        dmesg-3624  [001] 13072.325358: selinux_denied: audited=800000 tclass=4
      
      The tclass value can be mapped to a class by searching
      security/selinux/flask.h. The audited value is a bit field of the
      permissions described in security/selinux/av_permissions.h for the
      corresponding class.
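
      Decoding the audited bit field amounts to testing each bit and looking
      up its name. The permission table below is hypothetical; real names
      and bit positions come from the generated headers for the class in
      question.

      ```c
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      /* Hypothetical permission names for one class, indexed by bit. */
      static const char *const demo_file_perms[] = {
      	"ioctl", "read", "write", "create", "getattr",
      };

      /* Write the names of the bits set in 'audited' into buf (len >= 1),
       * separated by spaces and truncated to fit. */
      static void demo_decode_perms(uint32_t audited, char *buf, size_t len)
      {
      	size_t nperms = sizeof(demo_file_perms) / sizeof(demo_file_perms[0]);

      	buf[0] = '\0';
      	for (size_t bit = 0; bit < nperms; bit++) {
      		if (!(audited & (UINT32_C(1) << bit)))
      			continue;
      		if (buf[0] != '\0')
      			strncat(buf, " ", len - strlen(buf) - 1);
      		strncat(buf, demo_file_perms[bit], len - strlen(buf) - 1);
      	}
      }
      ```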
      
      [1] https://source.android.com/devices/tech/debug/native_stack_dump
      
      Signed-off-by: Thiébaud Weksteen <tweek@google.com>
      Suggested-by: Joel Fernandes <joelaf@google.com>
      Reviewed-by: Peter Enderborg <peter.enderborg@sony.com>
      Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • Daniel Burgener
      selinux: Create new booleans and class dirs out of tree · 0eea6091
      Daniel Burgener authored
      
      In order to avoid concurrency issues around selinuxfs resource availability
      during policy load, we first create new directories out of tree for
      reloaded resources, then swap them in, and finally delete the old versions.
      
      This fix focuses on concurrency in each of the two subtrees swapped, and
      not concurrency between the trees.  This means that it is still possible
      that subsequent reads of, e.g., the booleans directory and the class directory
      during a policy load could see the old state for one and the new for the other.
      The problem of ensuring that policy loads are fully atomic from the perspective
      of userspace is larger than what is dealt with here.  This commit focuses on
      ensuring that the directories' contents always match either the new or the old
      policy state from the perspective of userspace.
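
      A single-threaded model of the create, swap, delete order is shown
      below. demo_dir and demo_reload_dir() are hypothetical, and the
      synchronization the kernel needs around the swap is left out; the
      sketch only shows that the old directory is released after the new
      one is already in place, and stays in place if building the new one
      fails.

      ```c
      #include <stdlib.h>

      /* Hypothetical model: a "directory" is a heap-allocated snapshot. */
      struct demo_dir {
      	int policy_generation; /* which policy this describes */
      	int nentries;
      };

      static struct demo_dir *demo_live_dir; /* what readers look at */

      /* Build the replacement out of tree, swap it in, then delete the
       * old version. */
      static int demo_reload_dir(int generation, int nentries)
      {
      	struct demo_dir *fresh, *old;

      	fresh = malloc(sizeof(*fresh));
      	if (!fresh)
      		return -1; /* old directory left untouched */
      	fresh->policy_generation = generation;
      	fresh->nentries = nentries;

      	old = demo_live_dir;
      	demo_live_dir = fresh; /* the swap */
      	free(old);             /* cleanup of the previous version */
      	return 0;
      }
      ```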
      
      In the previous implementation, on policy load /sys/fs/selinux is updated
      by deleting the previous contents of
      /sys/fs/selinux/{class,booleans} and then recreating them.  This means
      that there is a period of time when the contents of these directories do not
      exist which can cause race conditions as userspace relies on them for
      information about the policy.  In addition, it means that error recovery in
      the event of failure is challenging.
      
      In order to demonstrate the race condition that this series fixes, you
      can use the following commands:
      
          while true; do cat /sys/fs/selinux/class/service/perms/status >/dev/null; done &
          while true; do load_policy; done;
      
      In the existing code, this will display errors fairly often as the class
      lookup fails.  (In normal operation from systemd, this would result in a
      permission check which would be allowed or denied based on policy settings
      around unknown object classes.) After applying this patch series, such
      error messages should no longer appear.
      
      Signed-off-by: Daniel Burgener <dburgener@linux.microsoft.com>
      Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • Daniel Burgener
      selinux: Standardize string literal usage for selinuxfs directory names · 613ba187
      Daniel Burgener authored
      
      Switch class and policy_capabilities directory names to be referred to with
      global constants, consistent with booleans directory name.  This will allow
      for easy consistency of naming in future development.
      
      Signed-off-by: Daniel Burgener <dburgener@linux.microsoft.com>
      Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • Daniel Burgener
      selinux: Refactor selinuxfs directory populating functions · 66ec384a
      Daniel Burgener authored
      
      Make sel_make_bools and sel_make_classes take the specific elements of
      selinux_fs_info that they need rather than the entire struct.
      
      This will allow a future patch to pass temporary elements that are not in
      the selinux_fs_info struct to these functions so that the original elements
      can be preserved until we are ready to perform the switch over.
      
      Signed-off-by: Daniel Burgener <dburgener@linux.microsoft.com>
      Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • Daniel Burgener
      selinux: Create function for selinuxfs directory cleanup · aeecf4a3
      Daniel Burgener authored
      
      Separating the cleanup from the creation will simplify two things in
      future patches in this series.  First, the creation can be made generic,
      to create directories not tied to the selinux_fs_info structure.  Second,
      we will ultimately want to reorder creation and deletion so that the
      deletions aren't performed until the new directory structures have already
      been moved into place.
      
      Signed-off-by: Daniel Burgener <dburgener@linux.microsoft.com>
      Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • Stephen Smalley
      selinux: permit removing security.selinux xattr before policy load · 9530a3e0
      Stephen Smalley authored
      
      Currently SELinux always denies attempts to remove the security.selinux
      xattr, even when permissive or no policy is loaded.  This was originally
      motivated by the view that all files should be labeled, even if that label
      is unlabeled_t, and we shouldn't permit files that were once labeled to
      have their labels removed entirely.  This however prevents removing
      SELinux xattrs in the case where one "disables" SELinux by not loading
      a policy (e.g. a system where runtime disable is removed and selinux=0
      was not specified).  Allow removing the xattr before SELinux is
      initialized.  We could conceivably permit it even after initialization
      if permissive, or introduce a separate permission check here.
      
      Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  19. Aug 20, 2020
  20. Aug 19, 2020
  21. Aug 18, 2020
    • Stephen Smalley
      selinux: refactor changing booleans · c7c556f1
      Stephen Smalley authored
      Refactor the logic for changing SELinux policy booleans in a similar
      manner to the refactoring of policy load, thereby reducing the
      size of the critical section when the policy write-lock is held
      and making it easier to convert the policy rwlock to RCU in the
      future.  Instead of directly modifying the policydb in place, modify
      a copy and then swap it into place through a single pointer update.
      Only fully copy the portions of the policydb that are affected by
      boolean changes to avoid the full cost of a deep policydb copy.
      Introduce another level of indirection for the sidtab since changing
      booleans does not require updating the sidtab, unlike policy load.
      While we are here, create a common helper for notifying
      other kernel components and userspace of a policy change and call it
      from both security_set_bools() and selinux_policy_commit().
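
      The copy-then-swap scheme can be sketched as follows, assuming a
      hypothetical demo_policy with a large shared part and a small boolean
      array. Only the booleans are copied; the shared part is referenced by
      pointer, and the new version is published with a single pointer
      assignment. Freeing the old version is left to the caller, which in a
      concurrent setting would first have to wait until no reader can still
      hold it.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdlib.h>
      #include <string.h>

      #define DEMO_NBOOLS 4

      /* Hypothetical policy: a large read-only part shared between
       * versions and a small boolean array copied on every change. */
      struct demo_policy {
      	const int *rules;        /* shared, never modified in place */
      	bool bools[DEMO_NBOOLS]; /* private to each version */
      };

      static struct demo_policy *demo_active;

      /* Copy only the booleans, change the copy, then publish it with one
       * pointer update. Returns the previous version, or NULL on error. */
      static struct demo_policy *demo_set_bool(size_t idx, bool value)
      {
      	struct demo_policy *old = demo_active, *newp;

      	if (idx >= DEMO_NBOOLS)
      		return NULL;
      	newp = malloc(sizeof(*newp));
      	if (!newp)
      		return NULL;
      	newp->rules = old->rules; /* shared, not deep-copied */
      	memcpy(newp->bools, old->bools, sizeof(newp->bools));
      	newp->bools[idx] = value;
      	demo_active = newp;       /* the single pointer update */
      	return old;
      }
      ```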
      
      Based on an old (2004) patch by Kaigai Kohei [1] to convert the policy
      rwlock to RCU that was deferred at the time since it did not
      significantly improve performance and introduced complexity. Peter
      Enderborg later submitted a patch series to convert to RCU [2] that
      would have made changing booleans a much more expensive operation
      by requiring a full policydb_write();policydb_read(); sequence to
      deep copy the entire policydb and also had concerns regarding
      atomic allocations.
      
      This change is now simplified by the earlier work to encapsulate
      policy state in the selinux_policy struct and to refactor
      policy load.  After this change, the last major obstacle to
      converting the policy rwlock to RCU is likely the sidtab live
      convert support.
      
      [1] https://lore.kernel.org/selinux/6e2f9128-e191-ebb3-0e87-74bfccb0767f@tycho.nsa.gov/
      [2] https://lore.kernel.org/selinux/20180530141104.28569-1-peter.enderborg@sony.com/
      
      Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • Stephen Smalley
      selinux: move policy commit after updating selinuxfs · 02a52c5c
      Stephen Smalley authored
      
      With the refactoring of the policy load logic in the security
      server from the previous change, it is now possible to split out
      the committing of the new policy from security_load_policy() and
      perform it only after successful updating of selinuxfs.  Change
      security_load_policy() to return the newly populated policy
      data structures to the caller, export selinux_policy_commit()
      for external callers, and introduce selinux_policy_cancel() to
      provide a way to cancel the policy load in the event of an error
      during updating of the selinuxfs directory tree.  Further, rework
      the interfaces used by selinuxfs to get information from the policy
      when creating the new directory tree to take and act upon the
      new policy data structure rather than the current/active policy.
      Update selinuxfs to use these updated and new interfaces.  While
      we are here, stop re-creating the policy_capabilities directory
      on each policy load since it does not depend on the policy, and
      stop trying to create the booleans and classes directories during
      the initial creation of selinuxfs since no information is available
      until first policy load.
      
      After this change, a failure while updating the booleans and class
      directories will cause the entire policy load to be canceled, leaving
      the original policy intact, and policy load notifications to userspace
      will only happen after a successful completion of updating those
      directories.  This does not (yet) provide full atomicity with respect
      to the updating of the directory trees themselves.
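The load/commit/cancel split described above can be sketched in user-space C. This is a minimal sketch under stated assumptions, not the kernel code: `struct policy`, `load_policy()`, `policy_commit()`, `policy_cancel()`, `update_fs()` and `reload_policy()` are hypothetical stand-ins that only loosely mirror security_load_policy(), selinux_policy_commit() and selinux_policy_cancel(), and all locking is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the policy state (sidtab, policydb,
 * class/permission mapping) that the kernel keeps together. */
struct policy {
	int seqno;
	char name[32];
};

/* The active policy; NULL until the first successful load. */
static struct policy *active_policy;

/* Phase 1: build a new policy without touching the active one. */
static struct policy *load_policy(const char *name, int seqno)
{
	struct policy *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	strncpy(p->name, name, sizeof(p->name) - 1);
	p->seqno = seqno;
	return p;
}

/* Phase 2a: install the new policy and free the old one. */
static void policy_commit(struct policy *newpolicy)
{
	struct policy *old = active_policy;

	assert(newpolicy != NULL);
	active_policy = newpolicy;
	free(old);
}

/* Phase 2b: discard the new policy; the active one is left intact. */
static void policy_cancel(struct policy *newpolicy)
{
	free(newpolicy);
}

/* Hypothetical stand-in for rebuilding the selinuxfs booleans and
 * class directories from the *new* policy; fails when fail != 0. */
static int update_fs(const struct policy *newpolicy, int fail)
{
	(void)newpolicy;
	return fail ? -1 : 0;
}

/* Load, update the filesystem view, then commit or cancel. */
int reload_policy(const char *name, int seqno, int fs_fails)
{
	struct policy *newpolicy = load_policy(name, seqno);

	if (!newpolicy)
		return -1;
	if (update_fs(newpolicy, fs_fails) != 0) {
		policy_cancel(newpolicy);
		return -1;
	}
	policy_commit(newpolicy);
	/* A load notification to userspace would be sent only here. */
	return 0;
}

const struct policy *current_policy(void)
{
	return active_policy;
}
```

The active pointer changes only after every fallible step has succeeded, so a failed directory update leaves the original policy in place, which is the behavior the message describes.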
      
      Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • Stephen Smalley's avatar
      selinux: encapsulate policy state, refactor policy load · 46169802
      Stephen Smalley authored
      
      Encapsulate the policy state in its own structure (struct
      selinux_policy) that is separately allocated but referenced from the
      selinux_ss structure.  The policy state includes the SID table
      (particularly the context structures), the policy database, and the
      mapping between the kernel classes/permissions and the policy values.
      Refactor the security server portion of the policy load logic to
      cleanly separate loading of the new structures from committing the new
      policy.  Unify the initial policy load and reload code paths as much
      as possible, avoiding duplicated code.  Make sure we are taking the
      policy read-lock prior to any dereferencing of the policy.  Move the
      copying of the policy capability booleans into the state structure
      outside of the policy write-lock because they are separate from the
      policy and are read outside of any policy lock; possibly they should
      be using at least READ_ONCE/WRITE_ONCE or smp_load_acquire/store_release.
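One of the options floated for the capability booleans, acquire/release ordering, can be illustrated with C11 atomics in user space. This is a hedged sketch, not the kernel code: `policycap[]`, the two enum values, `set_policycaps()` and `policycap_enabled()` are hypothetical, and `memory_order_release`/`memory_order_acquire` are used only as analogues of smp_store_release()/smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical capability indices, for illustration only. */
enum { CAP_OPEN_PERMS, CAP_NETWORK_PEER_CONTROLS, NCAPS };

/* Flags read without any policy lock held. */
static atomic_bool policycap[NCAPS];

/* Writer (policy load): publish each value with release ordering,
 * the user-space analogue of smp_store_release(). */
void set_policycaps(const bool *newcaps)
{
	for (int i = 0; i < NCAPS; i++)
		atomic_store_explicit(&policycap[i], newcaps[i],
				      memory_order_release);
}

/* Reader (permission checks): load with acquire ordering,
 * the user-space analogue of smp_load_acquire(). */
bool policycap_enabled(int cap)
{
	assert(cap >= 0 && cap < NCAPS);
	return atomic_load_explicit(&policycap[cap], memory_order_acquire);
}
```

Ordering each flag individually prevents torn or compiler-cached reads, but it does not make the whole set change atomically: a reader checking several flags during a load can still observe a mix of old and new values.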
      
      These changes simplify the policy loading logic, reduce the size of
      the critical section while holding the policy write-lock, and should
      facilitate future changes to e.g. refactor the entire policy reload
      logic including the selinuxfs code to make the updating of the policy
      and the selinuxfs directory tree atomic and/or to convert the policy
      read-write lock to RCU.
      
      Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • Stephen Smalley's avatar
      scripts/selinux,selinux: update mdp to enable policy capabilities · 339949be
      Stephen Smalley authored
      
      Presently mdp does not enable any SELinux policy capabilities
      in the dummy policy it generates. Thus, policies derived from
      it will by default lack various features commonly used in modern
      policies such as open permission, extended socket classes, network
      peer controls, etc.  Split the policy capability definitions out into
      their own headers so that we can include them into mdp without pulling in
      other kernel headers, and extend mdp to generate policycap statements for the
      policy capabilities known to the kernel.  Policy authors may wish to
      selectively remove some of these from the generated policy.
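The mdp change amounts to emitting one `policycap` statement per capability name known to the kernel. A minimal sketch, assuming a hypothetical `policycap_names[]` array holding an illustrative subset of names and a hypothetical `write_policycaps()` helper; per the message, mdp itself obtains the full list by including the split-out kernel header.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative subset only; not the kernel's complete list. */
static const char *const policycap_names[] = {
	"network_peer_controls",
	"open_perms",
	"extended_socket_class",
};

#define N_POLICYCAPS (sizeof(policycap_names) / sizeof(policycap_names[0]))

/* Write one "policycap <name>;" line per known capability.
 * Returns the number of statements written, or -1 on error. */
int write_policycaps(FILE *fout)
{
	assert(fout != NULL);
	for (size_t i = 0; i < N_POLICYCAPS; i++) {
		if (fprintf(fout, "policycap %s;\n", policycap_names[i]) < 0)
			return -1;
	}
	return (int)N_POLICYCAPS;
}
```

A policy author who does not want a given capability would delete the corresponding line from the generated output.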
      
      Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  22. Jul 19, 2020
    • Adrian Reber's avatar
      capabilities: Introduce CAP_CHECKPOINT_RESTORE · 124ea650
      Adrian Reber authored
      This patch introduces CAP_CHECKPOINT_RESTORE, a new capability facilitating
      checkpoint/restore for non-root users.
      
      Over the last few years, the CRIU (Checkpoint/Restore In Userspace) team has
      been asked numerous times if it is possible to checkpoint/restore a
      process as non-root. The answer usually was: 'almost'.
      
      The main blocker to restoring a process as non-root was controlling the
      PID of the restored process. This feature, available via the clone3
      system call or via /proc/sys/kernel/ns_last_pid, is unfortunately
      guarded by CAP_SYS_ADMIN.
      
      In the past two years, requests for non-root checkpoint/restore have
      increased due to the following use cases:
      * Checkpoint/Restore in an HPC environment in combination with a
        resource manager distributing jobs where users are always running as
        non-root. There is a desire to provide a way to checkpoint and
        restore long-running jobs.
      * Container migration as non-root.
      * We have been in contact with JVM developers who are integrating
        CRIU into a Java VM to decrease the startup time. These
        checkpoint/restore applications are not meant to be running with
        CAP_SYS_ADMIN.
      
      We have seen the following workarounds:
      * Use a setuid wrapper around CRIU:
        See https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c
      * Use a setuid helper that writes to ns_last_pid.
        Unfortunately, this helper delegation technique is impossible to use
        with clone3, and is thus prone to races.
        See https://github.com/twosigma/set_ns_last_pid
      * Cycle through PIDs with fork() until the desired PID is reached:
        This has been demonstrated to work with cycling rates of 100,000 PIDs/s
        See https://github.com/twosigma/set_ns_last_pid
      * Patch out the CAP_SYS_ADMIN check from the kernel
      * Run the desired application in a new user and PID namespace to provide
        a local CAP_SYS_ADMIN for controlling PIDs. This technique has limited
        use in typical container environments (e.g., Kubernetes) as /proc is
        typically protected with read-only layers (e.g., /proc/sys) for
        hardening purposes. Read-only layers prevent additional /proc mounts
        (due to proc's SB_I_USERNS_VISIBLE property), making the use of new
        PID namespaces limited as certain applications need access to /proc
        matching their PID namespace.
      
      The introduced capability makes it possible to:
      * Control PIDs when the current user is CAP_CHECKPOINT_RESTORE capable
        for the corresponding PID namespace via ns_last_pid/clone3.
      * Open files in /proc/pid/map_files when the current user is
        CAP_CHECKPOINT_RESTORE capable in the root namespace, useful for
        recovering files that are unreachable via the file system such as
        deleted files, or memfd files.
      
      See the corresponding selftest for an example with clone3().
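A non-root checkpoint/restore tool might first check whether it holds the new capability at all. The sketch below is only a coarse user-space pre-check under stated assumptions: `has_effective_cap()` is a hypothetical helper built on the raw capget(2) system call, the fallback value 40 is assumed for headers older than Linux 5.9, and it reports only the calling thread's effective set in its own user namespace. The kernel's real checks are made against the relevant PID namespace (ns_last_pid/clone3) or the root namespace (map_files), as described above.

```c
#define _GNU_SOURCE /* for syscall() in strict C modes */
#include <assert.h>
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef CAP_CHECKPOINT_RESTORE
#define CAP_CHECKPOINT_RESTORE 40 /* assumed fallback for older headers */
#endif

/* Return 1 if cap is in the calling thread's effective set,
 * 0 if it is not, and -1 if capget() fails. */
int has_effective_cap(int cap)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0, /* 0 selects the calling thread */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	assert(cap >= 0 && cap < 32 * _LINUX_CAPABILITY_U32S_3);
	if (syscall(SYS_capget, &hdr, data) != 0)
		return -1;
	/* Capabilities are split across 32-bit words. */
	return (data[cap / 32].effective >> (cap % 32)) & 1;
}
```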
      
      Signed-off-by: Adrian Reber <areber@redhat.com>
      Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
      Reviewed-by: Serge Hallyn <serge@hallyn.com>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Link: https://lore.kernel.org/r/20200719100418.2112740-2-areber@redhat.com
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
  23. Jul 09, 2020
    • Ondrej Mosnacek's avatar
      selinux: complete the inlining of hashtab functions · 54b27f92
      Ondrej Mosnacek authored
      
      Move (most of) the definitions of hashtab_search() and hashtab_insert()
      to the header file. In combination with the previous patch, this avoids
      calling the callbacks indirectly by function pointers and allows for
      better optimization, leading to a drastic performance improvement of
      these operations.
      
      With this patch, I measured a speedup in the following areas (measured
      on x86_64 F32 VM with 4 CPUs):
        1. Policy load (`load_policy`) - takes ~150 ms instead of ~230 ms.
        2. `chcon -R unconfined_u:object_r:user_tmp_t:s0:c381,c519 /tmp/linux-src`
           where /tmp/linux-src is an extracted linux-5.7 source tarball -
           takes ~522 ms instead of ~576 ms. This is because of many
           symtab_search() calls in string_to_context_struct() when there are
           many categories specified in the context.
        3. `stress-ng --msg 1 --msg-ops 10000000` - takes 12.41 s instead of
           13.95 s (consumes 18.6 s of kernel CPU time instead of 21.6 s).
           This is thanks to security_transition_sid() being ~43% faster after
           this patch.
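Expressed as relative reductions in run time, the reported approximate figures (policy load, chcon, stress-ng wall time, stress-ng kernel CPU time) work out as follows; this is plain arithmetic on the numbers above:

```latex
\[
\frac{230 - 150}{230} \approx 34.8\%,\quad
\frac{576 - 522}{576} \approx 9.4\%,\quad
\frac{13.95 - 12.41}{13.95} \approx 11.0\%,\quad
\frac{21.6 - 18.6}{21.6} \approx 13.9\%
\]
```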
      
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • Ondrej Mosnacek's avatar
      selinux: prepare for inlining of hashtab functions · 24def7bb
      Ondrej Mosnacek authored
      
      Refactor searching and inserting into hashtabs to pave the way for
      converting hashtab_search() and hashtab_insert() to inline functions in
      the next patch. This will avoid indirect calls and allow the compiler to
      better optimize individual callers, leading to a significant performance
      improvement.
      
      In order to avoid the indirect calls, the key hashing and comparison
      callbacks need to be extracted from the hashtab struct and passed
      directly to hashtab_search()/_insert() by the callers so that the
      callback address is always known at compile time. The kernel's
      rhashtable library (<linux/rhashtable*.h>) does the same thing.
      
      This of course makes the hashtab functions slightly easier to misuse by
      passing a wrong callback set, but unfortunately there is no better way
      to implement a hash table that is both generic and efficient in C. This
      patch tries to somewhat mitigate this by only calling the hashtab
      functions in the same file where the corresponding callbacks are
      defined (wrapping them into more specialized functions as needed).
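The callback-passing scheme can be sketched in user-space C. This is a simplified, hypothetical model rather than the SELinux hashtab: the structures are reduced to essentials, chains are unsorted, `str_hash()` uses FNV-1a purely for illustration, and `symtab_search()`/`symtab_insert()` stand in for the specialized wrappers the message mentions. The key point is that the generic functions are `static inline` and always receive a constant key-params struct, so the compiler can resolve and inline the callbacks instead of calling through function pointers stored in the table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct hashtab_node {
	const void *key;
	void *datum;
	struct hashtab_node *next;
};

struct hashtab {
	struct hashtab_node **htable;
	uint32_t size; /* number of buckets */
};

/* Callbacks are passed per call rather than stored in struct hashtab. */
struct hashtab_key_params {
	uint32_t (*hash)(const void *key);
	int (*cmp)(const void *key1, const void *key2);
};

static inline void *hashtab_search(const struct hashtab *h, const void *key,
				   struct hashtab_key_params key_params)
{
	if (!h->size)
		return NULL;
	uint32_t hvalue = key_params.hash(key) % h->size;
	for (const struct hashtab_node *cur = h->htable[hvalue]; cur; cur = cur->next)
		if (key_params.cmp(key, cur->key) == 0)
			return cur->datum;
	return NULL;
}

static inline int hashtab_insert(struct hashtab *h, const void *key, void *datum,
				 struct hashtab_key_params key_params)
{
	assert(h != NULL);
	if (!h->size || hashtab_search(h, key, key_params))
		return -1; /* no buckets, or duplicate key */
	struct hashtab_node *node = malloc(sizeof(*node));
	if (!node)
		return -1;
	uint32_t hvalue = key_params.hash(key) % h->size;
	node->key = key;
	node->datum = datum;
	node->next = h->htable[hvalue];
	h->htable[hvalue] = node;
	return 0;
}

/* String callbacks, defined in the same file as their only callers. */
static uint32_t str_hash(const void *key)
{
	uint32_t hv = 2166136261u; /* FNV-1a, illustrative choice */
	for (const unsigned char *p = key; *p; p++) {
		hv ^= *p;
		hv *= 16777619u;
	}
	return hv;
}

static int str_cmp(const void *key1, const void *key2)
{
	return strcmp(key1, key2);
}

static const struct hashtab_key_params str_key_params = {
	.hash = str_hash,
	.cmp = str_cmp,
};

/* Specialized wrappers: the only places the generic functions are called. */
void *symtab_search(const struct hashtab *h, const char *name)
{
	return hashtab_search(h, name, str_key_params);
}

int symtab_insert(struct hashtab *h, const char *name, void *datum)
{
	return hashtab_insert(h, name, datum, str_key_params);
}
```

This also shows the trade-off the message mentions: passing the wrong params struct to `hashtab_search()` would still compile but look keys up with the wrong hash, which is why the generic functions are only called from the file that defines the matching callbacks.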
      
      Note that this patch doesn't bring any benefit without also moving the
      definitions of hashtab_search() and -_insert() to the header file, which
      is done in a follow-up patch for easier review of the hashtab.c changes
      in this patch.
      
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>