.cu question #1285

@JinD123

__device__ float do_separable_conv_y(float pixels[CY][CCX], int H, int W, bool sq = false) { // at this point the 42x42 tile has become 42x32
    auto block = cg::this_thread_block();
    int local_y = block.thread_index().y + 5; // the +5 locates the center pixel of this thread's filter window; the thread is unchanged, only the pixel the window is centered on shifts
    int local_x = block.thread_index().x + 5; // why is there a +5 on x here??
    float val = 0.0f;
    // if (threadIdx.x == 31 && blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0) {
    //     printf("local_y:%d,local_x:%d\n", local_y, local_x);
    // }

    val += G_00 * pixels[local_y - 5][local_x];
    val += G_01 * pixels[local_y - 4][local_x];
    val += G_02 * pixels[local_y - 3][local_x];
    val += G_03 * pixels[local_y - 2][local_x];
    val += G_04 * pixels[local_y - 1][local_x];
    val += G_05 * pixels[local_y    ][local_x];
    val += G_06 * pixels[local_y + 1][local_x];
    val += G_07 * pixels[local_y + 2][local_x];
    val += G_08 * pixels[local_y + 3][local_x];
    val += G_09 * pixels[local_y + 4][local_x];
    val += G_10 * pixels[local_y + 5][local_x];

    return val;
}

In the `do_separable_conv_y` function, the shared-memory input array `pixels` has dimensions `[CY][CCX]`, which is $42 \times 32$. Since this function performs a vertical (column-wise) convolution, why is the column index computed as `int local_x = block.thread_index().x + 5;`?

The valid column indices are $0$ to $31$, so doesn't adding 5 make some threads access columns $32$ through $36$, which is out of bounds? Shouldn't the calculation simply be `int local_x = block.thread_index().x;`? (In practice, the code runs without reporting an error.)
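To make the index arithmetic in the question concrete, here is a small host-side sketch in plain C++ (it should also compile unchanged inside a `.cu` file). It assumes a $32 \times 32$ thread block, which the issue implies but does not state, and uses the $42 \times 32$ tile size given above. The constants and helper names are hypothetical and are not taken from the repository. It also computes the flat row-major offset that `pixels[row][col]` typically resolves to. That may be why no error is observed: a column index past the end of a row usually lands in the next row of the same array instead of trapping, although the language does not guarantee this behavior.

```cpp
#include <cassert>

// ASSUMED values: a 32x32 thread block and the 42x32 tile from the question.
constexpr int BLOCK_X = 32;  // assumed blockDim.x
constexpr int BLOCK_Y = 32;  // assumed blockDim.y
constexpr int CY = 42;       // rows of the shared tile
constexpr int CCX = 32;      // columns of the shared tile
constexpr int HALO = 5;      // radius of the 11-tap filter

// Largest column index used when local_x = threadIdx.x + x_offset.
constexpr int max_column(int x_offset) { return (BLOCK_X - 1) + x_offset; }

// Largest row index: local_y = threadIdx.y + HALO, and the last tap reads local_y + HALO.
constexpr int max_row() { return (BLOCK_Y - 1) + HALO + HALO; }

// Flat row-major offset that pixels[row][col] is usually compiled to.
constexpr int flat_offset(int row, int col) { return row * CCX + col; }

// Number of threadIdx.x values whose column index falls past the end of a row.
inline int threads_past_row_end(int x_offset) {
    int count = 0;
    for (int tx = 0; tx < BLOCK_X; ++tx) {
        if (tx + x_offset >= CCX) ++count;
    }
    return count;
}

// With no x offset, every access stays inside the CY x CCX tile.
inline void check_zero_offset_in_bounds() {
    assert(max_column(0) < CCX);
    assert(max_row() < CY);
    assert(flat_offset(max_row(), max_column(0)) < CY * CCX);
}
```

Under these assumptions, `max_column(5)` is 36 and `threads_past_row_end(5)` is 5, matching the columns $32$ to $36$ named in the question. The furthest flat offset with the `+ 5` is `flat_offset(41, 36) = 1348`, while the tile holds only `42 * 32 = 1344` floats, so only the last few reads fall outside the array itself. Running the kernel under `compute-sanitizer` would be one way to confirm whether those reads are reported.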
