Appendix B
In this section, we show our implementations of BiCGStab [33]. This iterative Krylov-subspace method requires a matrix A and a left-side vector b, and returns an approximate vector solution x of the system Ax = b. The algorithm (2) presents the general steps of the BiCGStab method.
Algorith 2 BiCGStab
1: r0 ← b − Ax0
2: r*0 arbitrary
3: p0 ←r0
4: for j= 0, 1... jmax do:
5: αj ← (rj, r*0)/(Apj, r*0)
6: sj ← rj − αj Apj
7: wj ← (Asj, sj)/(Asj, Asj)
8: xj+1 ← xj + αj pj + wj sj
9: rj+1 ←sj− wjAsj
10: βj ← (rj+1, r*0)/(rj, r*0) × αj /wj
11: pj+1 ← rj+1 + βj ( pj + wj Apj )
12: end for
The corresponding CUDA implementation used in this work is presented below.
init_res = cublasDnrm2(N,d_r,1);
if(init_res!=0.0) init_res=1./init_res;
else return false;cusparseDcsrmv(handle,CUSPARSE_OPERATION_NON_TRANSPOSE, N, N,-1.,
descr, d_val, d_row, d_col, d_x, 1., d_r);
res = cublasDnrm2(N,d_r,1);
res *= init_res;
if (res < tol) return true;cublasDcopy(N,d_r,1,d_rtilde,1);
while ( res > tol && k <= max_iter) {
ri_0 = cublasDdot(N,d_rtilde,1, d_r,1);
if (ri_0 == PrecType(0)) {
return false;
}if (k!=1){
if (omega == 0.) {
return false;}
beta = (ri_0 / r_ant) * (alpha / omega);
cublasDaxpy(N,-omega,d_Ap,1,d_p,1);
cublasDscal(N,beta,d_p,1);
cublasDaxpy(N,PrecType(1),d_r,1,d_p,1);
} else {
cublasDcopy(N,d_r,1, d_p,1);
}
cusparseDcsrmv(handle,CUSPARSE_OPERATION_NON_TRANSPOSE, N, N, 1.,
descr, d_val, d_row, d_col, d_p, 0., d_Ap);
alpha = ri_0 / cublasDdot(N,d_Ap,1, d_rtilde,1);
cublasDcopy(N,d_r,1,d_s,1);
cublasDaxpy(N,-alpha,d_Ap,1,d_s,1);
if ( cublasDnrm2(N,d_s,1)*init_res < tol ) {
cublasDaxpy(N,alpha,d_p,1,d_x,1);
return true;
}
cusparseDcsrmv(handle,CUSPARSE_OPERATION_NON_TRANSPOSE, N, N, 1.,
descr, d_val, d_row, d_col, d_s, 0., d_As);
omega = cublasDdot(N,d_As,1,d_s,1) / cublasDdot(N,d_As,1,d_As,1);
cublasDaxpy(N,alpha,d_p,1,d_x,1);
cublasDaxpy(N,omega,d_s,1,d_x,1);
cublasDcopy(N,d_s,1,d_r,1);
cublasDaxpy(N,-omega,d_As,1,d_r,1);
r_ant = ri_0;
res=cublasDnrm2(N,d_r,1)*init_res;
k++;
}
if(res < tol && k <= max_iter) return true;
else return max_iter+1;
The original matrix A was stored in the d_val, d_row and d_col arrays (format CRS), the left-side vector b in the d_r array, and the vector solution x was stored in the d_x array. For the PETSc implementation, only minimal changes must be done, changing for example, line with the matrix vector product in CUDA
cusparseDcsrmv(handle,CUSPARSE_OPERATION_NON_TRANSPOSE, N, N,-1.,
descr, d_val, d_row, d_col, d_x, 1., d_r);
by the line
ierr = MatMult(A,x,r);
where A, x, r are PETSc matrix and vectors.