Skip to content

Conversation

@Jo-stfc
Copy link
Collaborator

@Jo-stfc Jo-stfc commented May 20, 2025

V5.8.2patched

Jo-stfc and others added 30 commits May 12, 2025 15:27
The access mode values for files are not the same for all platforms.
The detection of a read only request here assumes that O_RDONLY is 0,
which is not mandated by the POSIX standaed.

Linux:
  #define O_ACCMODE       00000003
  #define O_RDONLY        00000000
  #define O_WRONLY        00000001
  #define O_RDWR          00000002

But this is also allowed:
  #define O_RDONLY	0x0001	/* Open read-only.  */
  #define O_WRONLY	0x0002	/* Open write-only.  */
  #define O_RDWR	(O_RDONLY|O_WRONLY) /* Open for reading and writing. */
  #define O_ACCMODE	O_RDWR	/* Mask for file access modes.  */

And on systems using this option the detection of a read only request
as implemented here is is broken and the cache test fails.

This commit fixes the detection so that it is done in the intended way
that works everywhere.
The test binaries are in ${CMAKE_BINARY_DIR}/bin which is already in PATH.
"man 2 open" says:

The argument flags must include one of the following access modes:
O_RDONLY, O_WRONLY, or O_RDWR.

So O_APPEND on its own is not allowed.

The change here is based on "man fdopen":

              ┌──────────────┬───────────────────────────────┐
              │ fopen() mode │ open() flags                  │
              ├──────────────┼───────────────────────────────┤
              │      r       │ O_RDONLY                      │
              ├──────────────┼───────────────────────────────┤
              │      w       │ O_WRONLY | O_CREAT | O_TRUNC  │
              ├──────────────┼───────────────────────────────┤
              │      a       │ O_WRONLY | O_CREAT | O_APPEND │
              ├──────────────┼───────────────────────────────┤
              │      r+      │ O_RDWR                        │
              ├──────────────┼───────────────────────────────┤
              │      w+      │ O_RDWR | O_CREAT | O_TRUNC    │
              ├──────────────┼───────────────────────────────┤
              │      a+      │ O_RDWR | O_CREAT | O_APPEND   │
              └──────────────┴───────────────────────────────┘
…as already been scanned during the initial namspace scan (thanks to bbockelm for tracing it down).
When using `X-Transfer-Status`, there were three subtle HTTP protocol
errors:
- The trailer contained a `\n` which is not a permitted character.
  This caused clients to view the last CRLF as leftover data on
  an idle TCP channel.
- A trailer was sent but no `Trailer` header was sent in the
  response.
- When the server said that the connection would be kept alive but
  an error occurred, it closed the TCP connection unexpectedly on
  the client.  The server now keeps the TCP connection open if
  keepalive is enabled, provided the HTTP response was sent cleanly.
This unit test (including an XrdOss wrapper to inject failures)
tests the behavior of the HTTP server when a read error occurs. The
server should either abruptly close the TCP connection or, if
Trailers are enabled, send an error message and keep the connection
alive.
Fixes lost response when an error occurs before the resend
of a requeued request. e.g. a requeued message can happen when
the client has been redirected or told to wait and resend a
request later by kxr_wait. An error before resend could be that
the stream connect is lost or the request times-out before resend.
This makes the code more readable and clears up potential thread safety
issues since correctness is undefined/unclear when some variable access
is racy but other isn't.

Additionally, since the order doesn't need to be sequentially consistent,
this removes any fencing instructions or memory barriers on x86.
There's no concurrency control to prevent a parallel Redrive from
executing while the object it is operating on is being destroyed.

If this line of code can be reached then it can potentially cause
corruption.  Hopefully, by loudly logging, we can observe if it
happens in practice.
Signed-off-by: Rahul Chauhan <rahul.chauhan@cern.ch>
Add permission denied error code for destination

Signed-off-by: Rahul Chauhan <rahul.chauhan@cern.ch>
The HTTP status text is not stored in the varible but
mapped from the status code in XrdHttpProtocol::StartSimpleResp.
This varible stores the detailed error message to be sent to the user
beyond the information provided by the http error codes.
This fixes xrdcp --recursive dir1/ root://server//dir/subdir, when
subdir/ does not exist at the remote end. However, due to how the code
currently handles directories by scanning them and creating a list of
files to copy, having similar behavior to plain cp (i.e. having the
contents of dir1/ appear in root://server//dir/subdir instead of in
root://server//dir/subdir/dir1/) is not easy to achieve without causing
a change in behavior also when copying a mix of files and directories
into the same destination. For now, to achieve this, the alternative is
to run the copy as xrdcp --recursive dir1/* root://server//dir/subdir,
which then puts the contents of dir1/ directly into subdir/ at the
destination.

Fixes: xrootd#2501
As soon as the block is freed, it may cause the file to be marked
as "inactive" (and thus eligible for deletion).

If a file sync is desired, then we must delay decreasing the block's
reference count until after the sync is scheduled to ensure it is still
alive when the sync scheduling increases the reference count on the file.

This change eliminates a use-after-free segfault observed on my
local stress test which would trigger about 50% of the time (10 clients
reading unique 3-byte files in a caching setup would crash after 20-30
seconds).
dynamic-entropy and others added 30 commits September 9, 2025 22:24
Move operations from a manager should succed
after redirection
- Add tests for rename operation with HTTP
  - MOVE operation is currently fails
	after redirection
  - Thus the tests assert against http code 401;
    a fix would change it to 201 instead
  - For redirection from meta-manager; move failes
	for a manager node
…peration; this allows rename after redirections
uint16_t to time_t. Since time_t can be 64 bits the Python bindings
need to use unsigned long long.
We test for overflow, but now timeouts use time_t, which may be
64 bits wide, so use 2^65 to force an overflow in that case too.
Reverts commit eb4295e as the hack
introduced there is no longer needed after the current fix.

Fixes: xrootd#2357
This adds support to get/put void* values in the client environment.
Since pointers are only defined during runtime, no support for shell
import is provided for pointers.

Issue: xrootd#2522
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants