Sleeping / Waiting in a CUDA Thread
I'm trying to write CUDA code to calculate the longest common subsequence. I can't work out how to make a thread sleep until the dependencies needed to calculate its cell are satisfied, i.e.
    // Ignore the spurious maths here and the messy data structures; I'm planning
    // ahead for strings bigger than the GPU blocks. i & j are correct though.
    int real_i = blockDim.x * blockIdx.x + threadIdx.x;
    int real_j = blockDim.y * (max_offset - blockIdx.x) + threadIdx.y;

    char i_char = seq1[real_i];
    char j_char = seq2[real_j];

    // i & j = 1 to length
    if ((real_i > 0 && real_j > 0) && (real_i < sequence_length && real_j < sequence_length))
    {
        printf("i: %d, j: %d\n", real_i, real_j);
        printf("I need to wait for dependancy @ i: %d j: %d , i: %d j: %d\n",
               real_i, (real_j - 1), real_i - 1, real_j);
        printf("is true? %d\n",
               (depend[sequence_length * real_i + (real_j - 1)] &&
                depend[sequence_length * (real_i - 1) + real_j]));

        // Wait for the dependencies to be satisfied.
        // This is the code that needs to hang.
        while ((depend[sequence_length * real_i + (real_j - 1)] == false) &&
               (depend[sequence_length * (real_i - 1) + real_j] == false))
        {
        }

        if (i_char == j_char)
            c[sequence_length * real_i + real_j] =
                (c[sequence_length * (real_i - 1) + (real_j - 1)]) + 1;
        else
            c[sequence_length * real_i + real_j] =
                max(c[sequence_length * real_i + (real_j - 1)],
                    c[sequence_length * (real_i - 1) + real_j]);

        // Setting these to true should allow other threads to break past the while block.
        depend[sequence_length * real_i + (real_j - 1)] = true;
        depend[sequence_length * (real_i - 1) + real_j] = true;
    }
So the thread should hang on the while loop until its dependencies have been satisfied by other threads, before moving on to the calculation code.
I know the 'first' thread has its dependencies satisfied, as it prints
    real i 1, real j 1
    I need to wait for dependancy @ i: 1 j: 0 , i: 0 j: 1
    is true? 1
which, once it has finished its calculation, sets the cells in the dependency matrix to true, allowing 2 more threads past the while loop, and the kernel moves on from there.
However, if I uncomment the while loop, the whole system hangs for ~10 seconds and I get

    the launch timed out and was terminated

Any suggestions?
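It's a bad idea to sleep; it's better to wait on a condition variable or mutex.
On a GPU every conditional statement is extremely expensive. If you can, try to parallelize the code. To make sure the code has finished in all threads of a block, you can use __syncthreads() (it only synchronizes the threads within one block).
If you still want to use it, the easiest solution is to add a mutex, but it's a bad idea.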
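
To make the __syncthreads() suggestion concrete, here is a minimal sketch of one way to avoid spinning on a flag: fill the LCS table one anti-diagonal at a time, with a barrier between diagonals. It is not the asker's code. The names lcs_wavefront and lcs_length, the (n+1) x (m+1) table layout, the block size of 256, and the single-block launch are all assumptions made for illustration, and error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <string>

// Hypothetical kernel: a single block fills the (n+1) x (m+1) LCS table c,
// one anti-diagonal at a time. Row 0 and column 0 of c must already be zero.
__global__ void lcs_wavefront(const char* a, const char* b,
                              int n, int m, int* c) {
    const int w = m + 1;  // row width of the table
    for (int d = 2; d <= n + m; ++d) {  // cells (i, j) with i + j == d
        const int i_min = max(1, d - m);
        const int i_max = min(n, d - 1);
        // Block-stride loop, so a diagonal may be longer than blockDim.x.
        for (int i = i_min + threadIdx.x; i <= i_max; i += blockDim.x) {
            const int j = d - i;
            if (a[i - 1] == b[j - 1])
                c[i * w + j] = c[(i - 1) * w + (j - 1)] + 1;
            else
                c[i * w + j] = max(c[i * w + (j - 1)], c[(i - 1) * w + j]);
        }
        // Every thread reaches this barrier once per diagonal, so diagonal d
        // is complete and visible before any thread starts diagonal d + 1.
        __syncthreads();
    }
}

// Hypothetical host wrapper (error checking omitted for brevity).
int lcs_length(const std::string& s1, const std::string& s2) {
    const int n = static_cast<int>(s1.size());
    const int m = static_cast<int>(s2.size());
    if (n == 0 || m == 0) return 0;

    char* d_a = nullptr;
    char* d_b = nullptr;
    int* d_c = nullptr;
    const size_t table_bytes = sizeof(int) * (n + 1) * (m + 1);

    cudaMalloc(&d_a, n);
    cudaMalloc(&d_b, m);
    cudaMalloc(&d_c, table_bytes);
    cudaMemcpy(d_a, s1.data(), n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, s2.data(), m, cudaMemcpyHostToDevice);
    cudaMemset(d_c, 0, table_bytes);  // zeroes the boundary row and column

    // One block only: __syncthreads() does not synchronize across blocks.
    lcs_wavefront<<<1, 256>>>(d_a, d_b, n, m, d_c);

    // This blocking copy on the default stream waits for the kernel to finish.
    int result = 0;
    cudaMemcpy(&result, d_c + n * (m + 1) + m, sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return result;
}
```

Calling lcs_length("AGGTAB", "GXTXAYB") should return 4 (the subsequence "GTAB"). No thread ever waits on a flag here: every cell on a diagonal depends only on the two previous diagonals, which the barrier guarantees are finished. For strings too large for a single block to handle well, one option is to launch one kernel per diagonal instead, since a kernel boundary on the same stream acts as a barrier for the whole grid.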