- Aug 21, 2020
-
-
Miaohe Lin authored
[ Upstream commit ce787a5a ] We should fput() file iff FDPUT_FPUT is set. So we should set fput_needed accordingly. Fixes: 00e188ef ("sockfd_lookup_light(): switch to fdget^W^Waway from fget_light") Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- May 27, 2020
-
-
R. Parameswaran authored
commit 57240d00 upstream. The MTU overhead calculation in L2TP device set-up merged via commit b784e7eb needs to be adjusted to lock the tunnel socket while referencing the sub-data structures to derive the socket's IP overhead. Reported-by:
Guillaume Nault <g.nault@alphalink.fr> Tested-by:
Guillaume Nault <g.nault@alphalink.fr> Signed-off-by:
R. Parameswaran <rparames@brocade.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Cc: Giuliano Procida <gprocida@google.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
R. Parameswaran authored
commit 113c3075 upstream. A new function, kernel_sock_ip_overhead(), is provided to calculate the cumulative overhead imposed by the IP Header and IP options, if any, on a socket's payload. The new function returns an overhead of zero for sockets that do not belong to the IPv4 or IPv6 address families. This is used in the L2TP code path to compute the total outer IP overhead on the L2TP tunnel socket when calculating the default MTU for Ethernet pseudowires. Signed-off-by:
R. Parameswaran <rparames@brocade.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Giuliano Procida <gprocida@google.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Jan 23, 2020
-
-
Arnd Bergmann authored
commit 9d7bf41f upstream. Unlike the normal SIOCOUTQ, SIOCOUTQNSD was never handled in compat mode. Add it to the common socket compat handler along with similar ones. Fixes: 2f4e1b39 ("tcp: ioctl type SIOCOUTQNSD returns amount of data not sent") Cc: Eric Dumazet <edumazet@google.com> Cc: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Mar 23, 2019
-
-
Andreas Gruenbacher authored
commit 971df15b upstream. The standard return value for unsupported attribute names is -EOPNOTSUPP, as opposed to undefined but supported attributes (-ENODATA). Also, fail for attribute names like "system.sockprotonameXXX" and simplify the code a bit. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> [removes a build warning on 4.4.y - gregkh] Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Nov 10, 2018
-
-
Wenwen Wang authored
[ Upstream commit b6168562 ] In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch statement to see whether it is necessary to pre-process the ethtool structure, because, as mentioned in the comment, the structure ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc' is allocated through compat_alloc_user_space(). One thing to note here is that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is partially determined by 'rule_cnt', which is actually acquired from the user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through get_user(). After 'rxnfc' is allocated, the data in the original user-space buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(), including the 'rule_cnt' field. However, after this copy, no check is re-enforced on 'rxnfc->rule_cnt'. So it is possible that a malicious user race to change the value in the 'compat_rxnfc->rule_cnt' between these two copies. Through this way, the attacker can bypass the previous check on 'rule_cnt' and inject malicious data. This can cause undefined behavior of the kernel and introduce potential security risk. This patch avoids the above issue via copying the value acquired by get_user() to 'rxnfc->rule_cn', if 'ethcmd' is ETHTOOL_GRXCLSRLALL. Signed-off-by:
Wenwen Wang <wang6495@umn.edu> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Aug 06, 2018
-
-
Jeremy Cline authored
commit c8e8cd57 upstream. 'call' is a user-controlled value, so sanitize the array index after the bounds check to avoid speculating past the bounds of the 'nargs' array. Found with the help of Smatch: net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue 'nargs' [r] (local cap) Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Signed-off-by:
Jeremy Cline <jcline@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Feb 03, 2018
-
-
Alexei Starovoitov authored
[ upstream commit 290af866 ] The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715. A quote from goolge project zero blog: "At this point, it would normally be necessary to locate gadgets in the host kernel code that can be used to actually leak data by reading from an attacker-controlled location, shifting and masking the result appropriately and then using the result of that as offset to an attacker-controlled address for a load. But piecing gadgets together and figuring out which ones work in a speculation context seems annoying. So instead, we decided to use the eBPF interpreter, which is built into the host kernel - while there is no legitimate way to invoke it from inside a VM, the presence of the code in the host kernel's text section is sufficient to make it usable for the attack, just like with ordinary ROP gadgets." To make attacker job harder introduce BPF_JIT_ALWAYS_ON config option that removes interpreter from the kernel in favor of JIT-only mode. So far eBPF JIT is supported by: x64, arm64, arm32, sparc64, s390, powerpc64, mips64 The start of JITed program is randomized and code page is marked as read-only. In addition "constant blinding" can be turned on with net.core.bpf_jit_harden v2->v3: - move __bpf_prog_ret0 under ifdef (Daniel) v1->v2: - fix init order, test_bpf and cBPF (Daniel's feedback) - fix offloaded bpf (Jakub's feedback) - add 'return 0' dummy in case something can invoke prog->bpf_func - retarget bpf tree. For bpf-next the patch would need one extra hunk. It will be sent when the trees are merged back to net-next Considered doing: int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT; but it seems better to land the patch as-is and in bpf-next remove bpf_jit_enable global variable from all JITs, consolidate in one place and remove this jit_init() function. Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Dec 20, 2017
-
-
Alexander Potapenko authored
[ Upstream commit 9f138fa6 ] KMSAN reports a use of uninitialized memory in put_cmsg() because msg.msg_flags in recvfrom haven't been initialized properly. The flag values don't affect the result on this path, but it's still a good idea to initialize them explicitly. Signed-off-by:
Alexander Potapenko <glider@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <alexander.levin@verizon.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Feb 26, 2017
-
-
Maxime Jayat authored
[ Upstream commit e623a9e9 ] Commit 34b88a68 ("net: Fix use after free in the recvmmsg exit path"), changed the exit path of recvmmsg to always return the datagrams variable and modified the error paths to set the variable to the error code returned by recvmsg if necessary. However in the case sock_error returned an error, the error code was then ignored, and recvmmsg returned 0. Change the error path of recvmmsg to correctly return the error code of sock_error. The bug was triggered by using recvmmsg on a CAN interface which was not up. Linux 4.6 and later return 0 in this case while earlier releases returned -ENETDOWN. Fixes: 34b88a68 ("net: Fix use after free in the recvmmsg exit path") Signed-off-by:
Maxime Jayat <maxime.jayat@mobile-devices.fr> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Nov 21, 2016
-
-
Soheil Hassas Yeganeh authored
[ Upstream commit 3023898b ] Do not send the next message in sendmmsg for partial sendmsg invocations. sendmmsg assumes that it can continue sending the next message when the return value of the individual sendmsg invocations is positive. It results in corrupting the data for TCP, SCTP, and UNIX streams. For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream of "aefgh" if the first sendmsg invocation sends only the first byte while the second sendmsg goes through. Datagram sockets either send the entire datagram or fail, so this patch affects only sockets of type SOCK_STREAM and SOCK_SEQPACKET. Fixes: 228e548e ("net: Add sendmmsg socket system call") Signed-off-by:
Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
Neal Cardwell <ncardwell@google.com> Acked-by:
Maciej Żenczykowski <maze@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Apr 20, 2016
-
-
Arnaldo Carvalho de Melo authored
[ Upstream commit 34b88a68 ] The syzkaller fuzzer hit the following use-after-free: Call Trace: [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295 [<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261 [< inline >] SYSC_recvmmsg net/socket.c:2281 [<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270 [<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185 And, as Dmitry rightly assessed, that is because we can drop the reference and then touch it when the underlying recvmsg calls return some packets and then hit an error, which will make recvmmsg to set sock->sk->sk_err, oops, fix it. Reported-and-Tested-by:
Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Fixes: a2e27255 ("net: Introduce recvmmsg socket syscall") http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Dec 30, 2015
-
-
Nicolai Stange authored
Commit ceb5d58b ("net: fix sock_wake_async() rcu protection") from the current 4.4 release cycle introduced a new flags member in struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA from struct socket's flags member into that new place. Unfortunately, the new flags field is never initialized properly, at least not for the struct socket_wq instance created in sock_alloc_inode(). One particular issue I encountered because of this is that my GNU Emacs failed to draw anything on my desktop -- i.e. what I got is a transparent window, including the title bar. Bisection lead to the commit mentioned above and further investigation by means of strace told me that Emacs is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is reproducible 100% of times and the fact that properly initializing the struct socket_wq ->flags fixes the issue leads me to the conclusion that somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags, preventing my Emacs from receiving any SIGIO's due to data becoming available and it got stuck. Make sock_alloc_inode() set the newly created struct socket_wq's ->flags member to zero. Fixes: ceb5d58b ("net: fix sock_wake_async() rcu protection") Signed-off-by:
Nicolai Stange <nicstange@gmail.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Dec 15, 2015
-
-
tadeusz.struk@intel.com authored
msg_iocb needs to be initialized on the recv/recvfrom path. Otherwise afalg will wrongly interpret it as an async call. Cc: stable@vger.kernel.org Reported-by:
Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by:
Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Dec 01, 2015
-
-
Eric Dumazet authored
Dmitry provided a syzkaller (http://github.com/google/syzkaller ) triggering a fault in sock_wake_async() when async IO is requested. Said program stressed af_unix sockets, but the issue is generic and should be addressed in core networking stack. The problem is that by the time sock_wake_async() is called, we should not access the @flags field of 'struct socket', as the inode containing this socket might be freed without further notice, and without RCU grace period. We already maintain an RCU protected structure, "struct socket_wq" so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it is the safe route. It also reduces number of cache lines needing dirtying, so might provide a performance improvement anyway. In followup patches, we might move remaining flags (SOCK_NOSPACE, SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket' being mostly read and let it being shared between cpus. Reported-by:
Dmitry Vyukov <dvyukov@google.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This patch is a cleanup to make following patch easier to review. Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA from (struct socket)->flags to a (struct socket_wq)->flags to benefit from RCU protection in sock_wake_async() To ease backports, we rename both constants. Two new helpers, sk_set_bit(int nr, struct sock *sk) and sk_clear_bit(int net, struct sock *sk) are added so that following patch can change their implementation. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Sep 29, 2015
-
-
Viresh Kumar authored
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there is no need to do that again from its callers. Drop it. Acked-by:
Neil Horman <nhorman@tuxdriver.com> Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by:
Jiri Kosina <jkosina@suse.cz>
-
- May 11, 2015
-
-
Eric W. Biederman authored
This is long overdue, and is part of cleaning up how we allocate kernel sockets that don't reference count struct net. Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
There is no need for tun to do the weird network namespace refcounting. The existing network namespace refcounting in tfile has almost exactly the same lifetime. So rewrite the code to use the struct sock network namespace refcounting and remove the unnecessary hand rolled network namespace refcounting and the unncesary tfile->net. This change allows the tun code to directly call sock_put bypassing sock_release and making SOCK_EXTERNALLY_ALLOCATED unnecessary. Remove the now unncessary tun_release so that if anything tries to use the sock_release code path the kernel will oops, and let us know about the bug. The macvtap code already uses it's internal socket this way. Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Apr 15, 2015
-
-
David Howells authored
socket inodes and sunrpc filesystems - inodes owned by that code Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Apr 12, 2015
-
-
Al Viro authored
All places outside of core VFS that checked ->read and ->write for being NULL or called the methods directly are gone now, so NULL {read,write} with non-NULL {read,write}_iter will do the right thing in all cases. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Apr 11, 2015
-
-
Al Viro authored
convert open-coded instances Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
it's equal to iov_iter_count(&msg->msg_iter) in all cases Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Apr 09, 2015
-
-
Al Viro authored
For kernel_sendmsg() that eliminates the need to play with setfs(); for kernel_recvmsg() it does *not* - a couple of callers are using it with non-NULL ->msg_control, which would be treated as userland address on recvmsg side of things. In all cases we are really setting a kvec-backed iov_iter, though. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Mar 23, 2015
-
-
tadeusz.struk@intel.com authored
Add support for async operations. Signed-off-by:
Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Mar 20, 2015
-
-
Al Viro authored
Cc: stable@vger.kernel.org # v3.19 Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Mar 13, 2015
-
-
Christoph Hellwig authored
The AIO interface is fairly complex because it tries to allow filesystems to always work async and then wakeup a synchronous caller through aio_complete. It turns out that basically no one was doing this to avoid the complexity and context switches, and we've already fixed up the remaining users and can now get rid of this case. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
There is no need to pass the total request length in the kiocb, as we already get passed in through the iov_iter argument. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Mar 02, 2015
-
-
Ying Xue authored
After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by:
Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by:
Ying Xue <ying.xue@windriver.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eyal Birger authored
Commit 97775007 ("af_packet: add interframe drop cmsg (v6)") unionized skb->mark and skb->dropcount in order to allow recording of the socket drop count while maintaining struct sk_buff size. skb->dropcount was introduced since there was no available room in skb->cb[] in packet sockets. However, its introduction led to the inability to export skb->mark, or any other aliased field to userspace if so desired. Moving the dropcount metric to skb->cb[] eliminates this problem at the expense of 4 bytes less in skb->cb[] for protocol families using it. Signed-off-by:
Eyal Birger <eyal.birger@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Feb 04, 2015
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Jan 29, 2015
-
-
Christoph Hellwig authored
The sock_iocb structure is allocate on stack for each read/write-like operation on sockets, and contains various fields of which only the embedded msghdr and sometimes a pointer to the scm_cookie is ever used. Get rid of the sock_iocb and put a msghdr directly on the stack and pass the scm_cookie explicitly to netlink_mmap_sendmsg. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jan 27, 2015
-
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jan 18, 2015
-
-
Nicolas Dichtel authored
This field already contains the length of the iovec, no need to calculate it again. Suggested-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jan 15, 2015
-
-
Nicolas Dichtel authored
Better to use available helpers. Signed-off-by:
Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Dec 19, 2014
-
-
Al Viro authored
Reported-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Dec 11, 2014
-
-
Al Viro authored
As it is, default ->i_fop has NULL ->open() (along with all other methods). The only case where it matters is reopening (via procfs symlink) a file that didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned to something sane (default would fail on read/write/ioctl/etc.). Unfortunately, such case exists - alloc_file() users, especially anon_get_file() ones. There we have tons of opened files of very different kinds sharing the same inode. As the result, attempt to reopen those via procfs succeeds and you get a descriptor you can't do anything with. Moreover, in case of sockets we set ->i_fop that will only be used on such reopen attempts - and put a failing ->open() into it to make sure those do not succeed. It would be simpler to put such ->open() into default ->i_fop and leave it unchanged both for anon inode (as we do anyway) and for socket ones. Result: * everything going through do_dentry_open() works as it used to * sock_no_open() kludge is gone * attempts to reopen anon-inode files fail as they really ought to * ditto for aio_private_file() * ditto for perfmon - this one actually tried to imitate sock_no_open() trick, but failed to set ->i_fop, so in the current tree reopens succeed and yield completely useless descriptor. Intent clearly had been to fail with -ENXIO on such reopens; now it actually does. * everything else that used alloc_file() keeps working - it has ->i_fop set for its inodes anyway Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-