Sleeping / Waiting in a CUDA Thread


I'm trying to write CUDA code to calculate the longest common subsequence. I can't work out how to make a thread sleep until the dependencies needed to calculate its cell are satisfied:

i.e.

    // Ignore the spurious maths here, very messy data structures. Planning ahead
    // for strings bigger than GPU blocks. i & j are correct though.
    int real_i = blockDim.x * blockIdx.x + threadIdx.x;
    int real_j = blockDim.y * (max_offset - blockIdx.x) + threadIdx.y;

    char i_char = seq1[real_i];
    char j_char = seq2[real_j];

    // i & j = 1 to length
    if ((real_i > 0 && real_j > 0) && (real_i < sequence_length && real_j < sequence_length)) {

        printf("i: %d, j: %d\n", real_i, real_j);
        printf("I need to wait for dependency @ i: %d j: %d, and i: %d j: %d\n", real_i, (real_j - 1), real_i - 1, real_j);
        printf("Is this true? %d\n", (depend[sequence_length * real_i + (real_j - 1)] && depend[sequence_length * (real_i - 1) + real_j]));

        // Wait for the dependencies to be satisfied
        // This is what I need the code to hang on
        while ((depend[sequence_length * real_i + (real_j - 1)] == false) && (depend[sequence_length * (real_i - 1) + real_j] == false)) {
        }

        if (i_char == j_char)
            c[sequence_length * real_i + real_j] = (c[sequence_length * (real_i - 1) + (real_j - 1)]) + 1;
        else
            c[sequence_length * real_i + real_j] = max(c[sequence_length * real_i + (real_j - 1)], c[sequence_length * (real_i - 1) + real_j]);

        // Setting these to true should allow other threads to break past the while block
        depend[sequence_length * real_i + (real_j - 1)] = true;
        depend[sequence_length * (real_i - 1) + real_j] = true;
    }

So a thread should hang on the while loop until its dependencies have been satisfied by other threads, before moving on to the calculation code.

I know the 'first' thread has its dependencies satisfied, because it prints

    real i 1, real j 1
    I need to wait for dependency @ i: 1 j: 0, and i: 0 j: 1
    Is this true? 1

Once it has finished its calculation, it sets the cells in the dependency matrix to true, allowing 2 more threads past the while loop, and the kernel moves on from there.

However, if I uncomment the while loop, the whole system hangs for ~10 seconds and I get

    the launch timed out and was terminated

Any suggestions?

It's a bad idea to sleep; it's better to wait on a condition variable or a mutex.

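On a GPU, conditional statements are expensive, especially when threads in the same warp take different branches. If you can, try to parallelize the code so that threads do not have to wait on each other. To make sure the code has finished in all threads of a block, you can use __syncthreads().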
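As a rough illustration of the __syncthreads() suggestion, here is a minimal sketch, assuming both strings are small enough for a single block to fill the whole table. Instead of spinning on a flag, every thread works on one anti-diagonal (all cells with the same i + j), then the block meets at a barrier before starting the next one. The names lcs_single_block and lcs_length_single_block are made up for this example, the table uses an extra zero row and column rather than the question's layout, and error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <string>

// Hypothetical kernel: ONE block fills the (n+1) x (m+1) LCS table,
// one anti-diagonal at a time. Must be launched with a single block.
__global__ void lcs_single_block(const char* a, int n, const char* b, int m, int* c)
{
    const int w = m + 1;  // row stride of the table

    // Row 0 and column 0 form the zero boundary.
    for (int k = threadIdx.x; k <= n; k += blockDim.x) c[k * w] = 0;
    for (int k = threadIdx.x; k <= m; k += blockDim.x) c[k] = 0;
    __syncthreads();

    // Cells with i + j == d only read cells from diagonals d - 1 and d - 2.
    for (int d = 2; d <= n + m; ++d) {
        const int i_lo = max(1, d - m);
        const int i_hi = min(n, d - 1);
        for (int i = i_lo + threadIdx.x; i <= i_hi; i += blockDim.x) {
            const int j = d - i;
            if (a[i - 1] == b[j - 1])
                c[i * w + j] = c[(i - 1) * w + (j - 1)] + 1;
            else
                c[i * w + j] = max(c[i * w + (j - 1)], c[(i - 1) * w + j]);
        }
        // Every thread reaches this barrier, because the loop over d is
        // the same for all threads in the block.
        __syncthreads();
    }
}

// Hypothetical host wrapper; error checking omitted for brevity.
int lcs_length_single_block(const std::string& a, const std::string& b)
{
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    if (n == 0 || m == 0) return 0;

    char* d_a = nullptr;
    char* d_b = nullptr;
    int* d_c = nullptr;
    const size_t table_bytes = sizeof(int) * (size_t)(n + 1) * (size_t)(m + 1);
    cudaMalloc(&d_a, n);
    cudaMalloc(&d_b, m);
    cudaMalloc(&d_c, table_bytes);
    cudaMemcpy(d_a, a.data(), n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), m, cudaMemcpyHostToDevice);

    lcs_single_block<<<1, 256>>>(d_a, n, d_b, m, d_c);

    // The bottom-right cell holds the LCS length; this copy also waits
    // for the kernel to finish.
    int result = 0;
    cudaMemcpy(&result, d_c + (size_t)n * (m + 1) + m, sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return result;
}
```

Calling lcs_length_single_block("AGGTAB", "GXTXAYB") should give 4 (the subsequence GTAB). Because __syncthreads() only synchronizes the threads of one block, this sketch does not carry over to a multi-block grid as written.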

If you still want to use waiting, the easiest solution is to add a mutex, but it's a bad idea.
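For strings bigger than one block, one possible alternative to a mutex or spin-wait is to let kernel launches act as the synchronization points: launches issued to the same stream run in order, so a kernel that fills one anti-diagonal can safely read the diagonals written by earlier launches. The following is only a sketch under that assumption. The names lcs_diagonal and lcs_length_by_diagonal are hypothetical, the table uses an extra zero row and column rather than the question's layout, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <string>

// Hypothetical kernel: each thread fills one cell of anti-diagonal d
// (the cells with i + j == d). It only reads diagonals d - 1 and d - 2,
// which earlier launches have already completed.
__global__ void lcs_diagonal(const char* a, int n, const char* b, int m,
                             int* c, int d)
{
    const int w = m + 1;  // row stride of the (n+1) x (m+1) table
    const int i = max(1, d - m) + blockIdx.x * blockDim.x + threadIdx.x;
    if (i > min(n, d - 1)) return;  // past the end of this diagonal
    const int j = d - i;

    if (a[i - 1] == b[j - 1])
        c[i * w + j] = c[(i - 1) * w + (j - 1)] + 1;
    else
        c[i * w + j] = max(c[i * w + (j - 1)], c[(i - 1) * w + j]);
}

// Hypothetical host wrapper; error checking omitted for brevity.
int lcs_length_by_diagonal(const std::string& a, const std::string& b)
{
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    if (n == 0 || m == 0) return 0;

    char* d_a = nullptr;
    char* d_b = nullptr;
    int* d_c = nullptr;
    const size_t table_bytes = sizeof(int) * (size_t)(n + 1) * (size_t)(m + 1);
    cudaMalloc(&d_a, n);
    cudaMalloc(&d_b, m);
    cudaMalloc(&d_c, table_bytes);
    cudaMemcpy(d_a, a.data(), n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), m, cudaMemcpyHostToDevice);
    cudaMemset(d_c, 0, table_bytes);  // zero boundary row and column

    const int threads = 256;
    for (int d = 2; d <= n + m; ++d) {
        const int cells = std::min(n, d - 1) - std::max(1, d - m) + 1;
        const int blocks = (cells + threads - 1) / threads;
        // Same (default) stream, so this launch starts only after the
        // previous diagonal has finished.
        lcs_diagonal<<<blocks, threads>>>(d_a, n, d_b, m, d_c, d);
    }

    // The bottom-right cell holds the LCS length; this copy also waits
    // for the last kernel to finish.
    int result = 0;
    cudaMemcpy(&result, d_c + (size_t)n * (m + 1) + m, sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return result;
}
```

This issues n + m - 1 small launches, so launch overhead may dominate for short strings. The trade-off is that no thread ever waits inside a kernel, which avoids the kind of hang described in the question.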

