Re: pthread_cancel and cancelation points still broken in Mac OS X 10.5 Leopard?


  • Subject: Re: pthread_cancel and cancelation points still broken in Mac OS X 10.5 Leopard?
  • From: "Per Mildner" <email@hidden>
  • Date: Fri, 23 Nov 2007 11:00:34 +0100
  • Organization: SICS


Cancellation passes VSTH test suite conformance testing, and is
therefore compliant with the letter of the specification.

I do not think any tests could verify conformance, for the same reason no tests can verify the absence of bugs.


In general, it is implementation defined where uninterruptible blocking takes place in a given kernel implementation.

Perhaps if the blocking is brief, or at least finite. Clearly arbitrary system calls are not allowed to take forever in general.


SUSv3 requires that a thread suspended in one of the functions designated as cancellation points should be awakened when a request is made (by some other thread) to cancel it.

The following quote, from SUSv3, seems pretty clear to me in saying that blocking is _not_ allowed to go on indefinitely:

"If a thread has cancelability enabled and a cancellation request is made with the thread as a target while the thread is suspended at a cancellation point, the thread shall be awakened and the cancellation request shall be acted upon. "

That is, the blocking system call is not allowed to block forever.

This is the whole point of pthread cancelation. Otherwise there would be no need for kernel support at all and all user code could just precede the relevant calls with a call to pthread_testcancel().

If your attempt to explicitly cancel is made on a thread whose state is not blocked in Mach with Mach thread state UNINT, then the action will be immediate; otherwise, whatever blocking operation is in progress must succeed prior to cancellation.

Then, as far as I can tell, such a blocking operation cannot be performed by the SUSv3-designated cancellable functions if the blocking can go on forever.


Another part to this is that it is unspecified in POSIX or the SUS whether notification operations are synchronous or isochronous or asynchronous. So, for example, you will not get notification of some events until you run up to the user/kernel boundary. An example of this is a signal received during a blocking operation with UNINT set on a thread, and UNINT not being cleared until the operation has completed. In this case, the operation (say, read that was waiting on a disk I/O as a result of a page fault) will complete successfully, rather than throwing an EINTR error, yet the signal handler also fires.

The issue here is blocking system calls that may/will _never_ complete. Also, you seem to be saying that SUSv3 allows a process to block in such a way that it cannot be killed by a signal from another process. In either case, thread cancelation is separate from the signal feature.
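For contrast with the UNINT case described above, here is a minimal sketch of the ordinary interruptible case: a thread blocked in read() on an empty pipe receives a signal whose handler was installed without SA_RESTART, and read() fails with EINTR. The function and struct names are hypothetical, and the choice of SIGUSR1 and a 1 ms resend interval are assumptions made only for this illustration. It shows signal delivery only, not cancellation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran = 0;

static void on_usr1(int sig) { (void)sig; handler_ran = 1; }

struct eintr_demo {
    int fd;            /* read end of a pipe nobody writes to */
    atomic_int done;   /* set once read() has returned */
    int saw_eintr;     /* 1 if read() failed with EINTR */
};

static void *eintr_reader(void *arg)
{
    struct eintr_demo *d = arg;
    char buf;
    ssize_t n = read(d->fd, &buf, 1);   /* blocks until a signal arrives */

    d->saw_eintr = (n == -1 && errno == EINTR);
    atomic_store(&d->done, 1);
    return NULL;
}

/* Returns 1 if the handler ran and read() reported EINTR. */
int signal_interrupts_pipe_read(void)
{
    struct sigaction sa;
    struct eintr_demo d;
    struct timespec pause = { 0, 1000000 };  /* 1 ms between attempts */
    pthread_t thread;
    int fds[2], rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                   /* no SA_RESTART: read() is not restarted */
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    rc = pipe(fds);
    assert(rc == 0);
    d.fd = fds[0];
    d.saw_eintr = 0;
    atomic_init(&d.done, 0);

    rc = pthread_create(&thread, NULL, eintr_reader, &d);
    assert(rc == 0);

    /* Resend until the reader reports back, so a signal that lands
       before the thread has entered read() does not hang the demo. */
    while (!atomic_load(&d.done)) {
        pthread_kill(thread, SIGUSR1);
        nanosleep(&pause, NULL);
    }
    rc = pthread_join(thread, NULL);
    assert(rc == 0);
    close(fds[0]);
    close(fds[1]);
    return d.saw_eintr && handler_ran;
}
```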


This type of operation is the same thing you expect on Tru64 UNIX

Luckily we no longer support Tru64 :-)

So effectively, cancellability in what you've posted as your example is the same as having lost the scheduler race between the events preceding the cancel and the join (i.e., this is exactly how the code would behave had the read completed before the cancellation was attempted).

The point is that the read will _not_ complete. The worker thread is waiting, forever, for input to appear, since no input will ever become available.
Perhaps the example would have been clearer if the worker thread read from a locally created pipe where no-one writes.
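A minimal sketch of that pipe-based variant follows. It is not the submitted test program: the function names are hypothetical, and the one-second sleep is an assumption that, as in the original test, only gives the worker time to block. On a system where a blocked read() is woken by pthread_cancel, the function should return 1. On a system showing the behaviour reported in this thread, the pthread_join() call would be expected never to return.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Worker: blocks in read() on a pipe that never receives data. */
static void *silent_pipe_reader(void *arg)
{
    int fd = *(int *)arg;
    char buf;
    ssize_t res;

    /* Will not terminate unless cancelled or read() fails. */
    do {
        res = read(fd, &buf, 1);
    } while (res != -1);
    return NULL;
}

/* Returns 1 if the blocked reader was cancelled, 0 otherwise. */
int cancel_reader_on_silent_pipe(void)
{
    int fds[2];
    pthread_t thread;
    void *retval = NULL;
    int rc;

    rc = pipe(fds);                 /* the write end stays open and unused */
    assert(rc == 0);
    rc = pthread_create(&thread, NULL, silent_pipe_reader, &fds[0]);
    assert(rc == 0);

    sleep(1);                       /* let the worker reach the blocking read */

    rc = pthread_cancel(thread);
    assert(rc == 0);
    rc = pthread_join(thread, &retval);
    assert(rc == 0);

    close(fds[0]);
    close(fds[1]);
    return retval == PTHREAD_CANCELED;
}
```

Because nothing ever writes to the pipe, the outcome does not depend on what happens to be available on standard input.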


Note that the sleep(10) in the example is only to make it likely, in the test code, that the worker thread really reaches the point when it blocks within read(). The sleep() is not used in place of proper synchronization or some such.

Without seeing the rest of your code, it's impossible to be more specific about the race window

This is not a race condition, it has nothing to do with the fairness or not of the scheduler.


I happen to know the person who will catch this bug report, and unless you have included a full example that can be compiled to reproduce the problem,

The submitted code reproduces the problem.

Regards,

it will bounce back to you as "insufficient information" to get one. But in any case, I expect the code is at fault here.

-- Terry

On Nov 22, 2007, at 11:11 AM, Per Mildner <email@hidden> wrote:


Sorry if this was the wrong forum for this question; it was not intended as a complaint or bug report. I hoped some kernel hacker would be able to shed some light on the technical background to the implementation of this new and important feature.


I have filed Bug ID# 5610812 for this issue.

Regards,

On Nov 22, 2007, at 7:17 PM, Michael Smith wrote:

Per,

Whilst your complaint may well be legitimate, this is not a bug-reporting forum.

Please file a bug (bugreporter.apple.com) and include the bug number when discussing the matter here or (ideally) in any other forum.

= Mike

On Nov 22, 2007, at 2:41 AM, Per Mildner wrote:

My tests indicate that pthread_cancel still cannot interrupt blocking system calls like read(2) in Intel Mac OS X 10.5.1, despite the claim that it is now UNIX 2003 compliant. A brief look at the Darwin 9.0 kernel sources seems to confirm this.

The Single Unix Specification v3 (SUSv3) defines cancelation points in
http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_09.html#tag_02_09_05_02

The critical part, and the reason why pthread_testcancel is not sufficient is "If a thread has cancelability enabled and a cancellation request is made with the thread as a target while the thread is suspended at a cancellation point, the thread shall be awakened and the cancellation request shall be acted upon.". That is, pthread_cancel should wake (a well defined list of) blocking system calls.

The below test program shows that a blocking read() is _not_ awakened by pthread_cancel, making the cancelation handling in Leopard no improvement over earlier OS releases and in violation of the SUSv3 standard.

The test program starts a worker thread that loops reading from standard input. The main thread then sleeps for a while to ensure that the worker thread has blocked in read(2) and the main thread then cancels the worker thread. This program terminates on Linux and Solaris but not on Intel Leopard 10.5.1.

From a cursory glance at xnu-1228, e.g. sys_generic.c, it looks as if the system calls that SUSv3 requires to be cancelation points are all implemented something like the following template:

int read(...) {
    pthread_testcancel();
    return read_nocancel(...);
}

That is, if a cancel is pending when the system call is entered then the call will be canceled but once the real system call is blocking it will not react to cancel requests.
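The effect of that template can be mimicked in user space. The sketch below is an emulation under stated assumptions, not the xnu code: it stands in for read_nocancel() by disabling cancellation around an ordinary read(), and every name in it is hypothetical. A second pipe tells the main thread that the worker is already past the entry check, so the cancel request necessarily arrives while the worker is blocked.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

struct entry_check_demo {
    int data_fd;     /* read end of the data pipe */
    int ready_fd;    /* write end of the pipe that announces "about to block" */
    int bytes_read;  /* bytes consumed before the cancel took effect */
};

static void *entry_check_worker(void *arg)
{
    struct entry_check_demo *d = arg;
    int oldstate, ignored, announced = 0;
    char buf;
    ssize_t n;

    for (;;) {
        pthread_testcancel();  /* the only place a pending cancel is noticed */

        /* Assumption: emulate read_nocancel() by blocking with
           cancellation disabled. */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
        if (!announced) {
            n = write(d->ready_fd, "r", 1);
            assert(n == 1);
            announced = 1;
        }
        n = read(d->data_fd, &buf, 1);
        pthread_setcancelstate(oldstate, &ignored);

        if (n == 1)
            d->bytes_read++;
    }
    return NULL;  /* not reached */
}

/* Returns how many bytes the worker read after the cancel was requested. */
int bytes_read_after_cancel_request(void)
{
    int data[2], ready[2], rc;
    struct entry_check_demo d;
    pthread_t thread;
    void *retval = NULL;
    char c;
    ssize_t n;

    rc = pipe(data);
    assert(rc == 0);
    rc = pipe(ready);
    assert(rc == 0);
    d.data_fd = data[0];
    d.ready_fd = ready[1];
    d.bytes_read = 0;

    rc = pthread_create(&thread, NULL, entry_check_worker, &d);
    assert(rc == 0);
    n = read(ready[0], &c, 1);      /* worker is now past the entry check */
    assert(n == 1);

    rc = pthread_cancel(thread);    /* request stays pending */
    assert(rc == 0);
    n = write(data[1], "x", 1);     /* only now can the worker's read() return */
    assert(n == 1);

    rc = pthread_join(thread, &retval);
    assert(rc == 0 && retval == PTHREAD_CANCELED);

    close(data[0]);
    close(data[1]);
    close(ready[0]);
    close(ready[1]);
    return d.bytes_read;
}
```

The worker consumes the byte that was written after the cancel request and is cancelled only on its next pass through pthread_testcancel(). This matches the transcript, where the program blocks "until more input is available".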

Transcript:
bash$ gcc cancel_bug.c -Wall && ./a.out
Created thread, sleeping
calling read..<press return>
.called read()==1
calling read..<press return>
.called read()==1
calling read..cancelling read()
<blocks forever or until more input is available>

This is with Xcode 3.0. Compiling with -D_APPLE_C_SOURCE does not make a difference.


/* cancel_bug.c */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static int exit_status = EXIT_FAILURE;

#define CHECK(X) do { if ((X) == 0) { fprintf(stderr, "%s:%d CHECK FAILED\n", __FILE__, (int)__LINE__); } } while (0)

void *
func(void *arg)
{
  char buf;
  int oldstate;
  ssize_t res;

  exit_status = EXIT_SUCCESS;
  /* redundant */
  CHECK(pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &oldstate) == 0);
  /* Will not terminate unless cancelled or read error */
  do
    {
      fprintf(stderr, "calling read.."); fflush(stderr);
      res = read(STDIN_FILENO, &buf, 1);
      fprintf(stderr, ".called read()==%ld\n", (long)res); fflush(stderr);
    }
  while (res != -1);

  return NULL;
}

int
main(void)
{
  pthread_t thread;
  void *retval;

  CHECK(pthread_create(&thread, NULL, func, NULL) == 0);
  fprintf(stderr, "Created thread, sleeping\n"); fflush(stderr);
  sleep(10);                 /* ensure thread reaches blocking read */
  fprintf(stderr, "cancelling read()\n"); fflush(stderr);
  CHECK(pthread_cancel(thread) == 0);
  CHECK(pthread_join(thread, &retval) == 0);
  CHECK(PTHREAD_CANCELED == retval);

  return exit_status;
}

So, have Apple and the Open Group conformance testers made a huge mistake or am I missing something?

Regards,

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden



