INTRODUCTION
Percutaneous nephrolithotomy is now the first-line treatment option for large, complex stones and staghorn calculi, greatly reducing the need for open surgery.1-2
Several studies have identified significant predictors of stone-free rate after percutaneous nephrolithotomy, with stone size, number, location, and pyelocaliceal system anatomy as the suggested predictors.3-4 Nevertheless, a significant predictor alone is not a predictive tool. Some authors have developed scoring systems to standardize the terminology used to describe stone complexity. The Guy’s stone score,5 S.T.O.N.E. nephrolithometry,6 CROES nomogram,7 and S-ReSC scoring system8 have recently been externally validated, and they all effectively predicted stone-free rate after percutaneous nephrolithotomy.9-16 However, none of them has gained wide acceptance or implementation in clinical practice.17
There are many benefits to having a standardized method of predicting the stone-free rate after percutaneous nephrolithotomy. The primary aim of our study was to compare the most popular scoring systems, describe their advantages to identify the most accurate scale, and propose its standardized use. Our study is the first comparison of the four scoring systems in the same patient group.
MATERIALS AND METHODS
We analyzed a total of 188 patients who underwent percutaneous nephrolithotomy for kidney stones between October 2010 and July 2015 at a tertiary care referral center. Patients with incomplete data (n=36) were excluded from the study. All procedures were performed by a single experienced endourologist. The patients were placed in the prone position and received general anesthesia, and the surgical technique was carried out as previously described.18-19
A preoperative non-contrast computed tomography scan was utilized in all patients to evaluate stone characteristics (stone burden, laterality, location, number, and density). Stone burden was estimated using the following formula: length x width x pi x 0.25, where pi is a mathematical constant taken as 3.14.20 A junior urology resident from our institution reviewed all images and calculated the corresponding Guy’s,5 S.T.O.N.E.,6 CROES,7 and S-ReSC8 scores. We compared and correlated the scores with preoperative and postoperative data. Each scoring system was categorized according to its original description, but the CROES nomogram score and stone burden were arbitrarily categorized for better statistical analysis.
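The stone burden formula above can be written out as a short Python sketch. The function names are hypothetical, and the handling of values that fall exactly on a category boundary is an assumption; the category ranges are those used for stone burden in Table 2.

```python
def stone_burden_mm2(length_mm: float, width_mm: float, pi: float = 3.14) -> float:
    """Stone burden as length x width x pi x 0.25, with pi taken as 3.14."""
    return length_mm * width_mm * pi * 0.25


def burden_category(burden_mm2: float) -> str:
    """Assign a burden to the ranges used in Table 2.

    Boundary handling (<= 500, <= 1000) is an assumption made for
    illustration; the text does not specify it.
    """
    if burden_mm2 <= 500:
        return "1-500"
    if burden_mm2 <= 1000:
        return "501-1000"
    return ">1000"


# Example: a 20 mm x 10 mm stone gives a burden of about 157 mm2.
example_burden = stone_burden_mm2(20, 10)
example_category = burden_category(example_burden)
```

With this formula a 20 mm x 10 mm stone falls in the lowest category (1-500 mm2).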
The demographic data and length of hospitalization were available from our prospective percutaneous nephrolithotomy database. Postoperative complication data were collected from the contemporaneous electronic patient records, radiologic imaging findings, and paper case notes, and were graded using the modified Clavien classification system.21 Patients had a non-contrast computed tomography scan between the first and third month follow-up visit. Postoperative stone-free rate was defined using the strict criterion of absolute absence of residual stone.
Statistical analyses were performed with the Statistical Package for the Social Sciences version 20 (SPSS, Chicago, IL, USA). Categorical variables were presented as numbers and percentages and compared with the chi-square test. The Kruskal-Wallis test was used for the statistical analysis of the ordinal variables to assess the categories of the scoring systems. Continuous variables were presented as means and standard deviations and compared with an independent sample t test.
Correlation analyses were evaluated using the Pearson correlation coefficient (r). The area under the curve (AUC), calculated from the receiver operating characteristic (ROC) curve, was used to assess the predictive ability of the different scoring systems. The AUCs were compared using the online calculator of significance of difference between areas under two independent ROC curves from the website http://vassarstats.net/roc_comp.html accessed on November 30th, 2015. Statistical significance was considered at a two-tailed p value <0.05.
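As an illustration of how two AUCs from independent ROC curves can be compared, the sketch below uses the standard-error approximation of Hanley and McNeil (1982) followed by a two-tailed z test. It is an assumption that the online calculator cited above works this way; the text only names the calculator. The function names and the example group sizes passed in are illustrative.

```python
import math


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def compare_independent_aucs(auc_a: float, auc_b: float, n_pos: int, n_neg: int):
    """Return (z, two-tailed p) for the difference between two AUCs.

    Treats the two curves as independent, which is the assumption
    stated by the calculator's name.
    """
    se_a = hanley_mcneil_se(auc_a, n_pos, n_neg)
    se_b = hanley_mcneil_se(auc_b, n_pos, n_neg)
    z = (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p value
    return z, p


# Illustrative call with two AUC values and hypothetical group sizes.
z_stat, p_value = compare_independent_aucs(0.80, 0.70, n_pos=60, n_neg=90)
```

A p value below 0.05 from such a test would indicate a significant difference between the two AUCs under the threshold stated above.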
RESULTS
We identified 152 patients who underwent percutaneous nephrolithotomy between 2010 and 2015 at a single tertiary care referral center and who met the study inclusion criteria. Table 1 shows the demographic and preoperative characteristics of the patients. The overall stone-free rate was 57.9%, with the strict criterion of absolute absence of residual stone. The postoperative complication rate was 39.5%, with Clavien grade I in 37 patients, Clavien grade II in 14 patients, Clavien grade III in 3 patients, and Clavien grade IV in 6 patients. There were no deaths.
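Table 1. Demographic and preoperative characteristics of the patients, by stone-free status.

Characteristic | Stone-free | Not stone-free | p |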
---|---|---|---|
Number of patients (%) | 88 (57.9) | 64 (42.1) | |
Mean age ± SD (years) | 49.48 ± 14.1 | 47.65 ± 13 | 0.418a |
Sex (%) | |||
Male | 34 (38.6) | 20 (31.2) | 0.34b |
Female | 54 (61.4) | 44 (68.8) | |
Laterality (%) | |||
Right | 43 (48.9) | 25 (39.1) | 0.23b |
Left | 45 (51.1) | 39 (60.9) | |
Mean BMI ± SD (kg/m2) | 28.54 ± 5.6 | 27.21 ± 4.8 | 0.161a |
Mean Hounsfield Units ± SD | 840.3 ± 335.3 | 893.7 ± 275.7 | 0.298a |
Mean number of stones ± SD | 2.1 ± 1.6 | 3.3 ± 2.3 | 0.001*a |
Mean operative time ± SD (min) | 167 ± 67.34 | 245.1 ± 100.08 | <0.001*a |
Mean length of hospital stay ± SD (days) | 3.4 ± 3.6 | 4.4 ± 3.4 | 0.081a |
Number of staghorn stones (%) | 16 (18.2) | 20 (54.7) | <0.0001*a |
Multiple locations (%) | 36 (81.8) | 8 (18.2) | <0.0001*a |
Mean stone burden ± SD (mm2) | 411.5 ± 362.7 | 895.6 ± 693.2 | <0.0001*a |
Mean Guy’s stone score ± SD | 1.99 ± 1.07 | 3.23 ± 0.95 | <0.001*a |
Mean S.T.O.N.E. score ± SD | 6.94 ± 1.53 | 9.42 ± 1.79 | <0.001*a |
Mean CROES score ± SD | 177.34 ± 50.46 | 127.28 ± 59.76 | <0.001*a |
Mean S-ReSC score ± SD | 2.88 ± 1.69 | 5.03 ± 2.42 | <0.001*a |
* Statistical significance <0.05; a Compared with an independent sample t test; b Compared with the chi-square test.
In patients that were stone-free and in those with residual stones, the mean Guy’s stone score was 1.99 and 3.23, the mean S.T.O.N.E. score was 6.94 and 9.42, the mean CROES score was 177.34 and 127.28, and the mean S-ReSC score was 2.88 and 5.03, respectively (p<0.001 each). In patients that presented with any complications and in those with no complications, the mean Guy’s stone score was 2.6 and 2.4 (p=0.38), the mean S.T.O.N.E. score was 8.3 and 7.6 (p=0.069), the mean CROES score was 146.2 and 164.8 (p=0.061), and the mean S-ReSC score was 4.2 and 3.33 (p=0.016), respectively.
Table 2 shows the stone-free rate of each of the four scoring systems. The S.T.O.N.E., CROES, and S-ReSC groups were significantly associated with the stone-free rate and complication rate. The Guy’s stone score was associated with the stone-free rate, but not the complication rate. Each scale correlated with operative time and with length of hospital stay: Guy’s stone score (r=0.41, p<0.001) (r=0.22, p=0.007), S.T.O.N.E. score (r=0.50, p<0.001) (r=0.33, p<0.001), CROES score (r=0.40, p<0.001) (r=0.27, p<0.001), and S-ReSC score (r=0.35, p<0.001) (r=0.20, p=0.012), respectively. Stone burden also correlated with operative time and hospital stay duration (r=0.41, p<0.0001) (r=0.41, p=0.022).
Table 2. Stone-free and complication rates by category of each scoring system and of stone burden.

Scoring system | No. stone-free / Total No. (%) | p c | No. complications / Total No. (%) | p c |
---|---|---|---|---|
CROES | | | | |
0-100 | 3/29 (10.3) | | 18/29 (62) | |
101-200 | 53/82 (64.6) | <0.0001* | 27/82 (32.9) | 0.020* |
201 or greater | 32/41 (78) | | 15/41 (36.5) | |
S.T.O.N.E. (category) | | | | |
4-5 | 14/15 (93.3) | | 1/15 (6.6) | |
6-8 | 56/74 (75.7) | <0.0001* | 30/74 (40.5) | 0.020* |
9-13 | 18/63 (28.6) | | 29/63 (46) | |
Guy’s (grade) | | | | |
I | 36/38 (94.7) | | 14/38 (36.8) | |
II | 32/49 (65.3) | <0.0001* | 16/49 (32.6) | 0.499 |
III | 5/14 (35.7) | | 7/14 (50) | |
IV | 15/51 (29.4) | | 23/51 (45) | |
S-ReSC (category) | | | | |
1-2 | 47/57 (82.4) | | 17/57 (29.8) | |
3-4 | 27/47 (57.4) | <0.0001* | 15/47 (31.9) | 0.005* |
5-9 | 14/48 (29.1) | | 28/48 (58.3) | |
Stone burden (mm2) | | | | |
1-500 | 64/87 (73.5) | | 32/87 (36.7) | |
501-1000 | 21/38 (55.2) | <0.0001* | 10/38 (26.3) | 0.004* |
>1000 | 3/27 (11.1) | | 18/27 (66.6) | |
* Statistical significance <0.05; c Compared with the Kruskal-Wallis test.
Table 3 and Figure 1 show the AUC and ROC curves for each scoring system and for stone burden in relation to the stone-free rate. All scoring systems had similar accuracy and none was more predictive for stone-free rate than another. There was no significant difference in the AUC between the four scoring systems (p=0.2). However, the Guy’s stone score had the greatest AUC for predicting the stone-free rate. Table 4 and Figure 2 show the AUC and ROC curves in relation to the complication rates. All the scoring systems had poor predictive capacity for complications and only the S-ReSC score had a statistically significant AUC (p=0.007).
Table 3. AUC of each scoring system and of stone burden for predicting the stone-free rate.

Scoring system | AUC | 95% CI | p |
---|---|---|---|
Guy’s | 0.791 | 0.71-0.86 | <0.0001* |
S.T.O.N.E. | 0.767 | 0.69-0.84 | <0.0001* |
CROES | 0.722 | 0.64-0.80 | <0.0001* |
S-ReSC | 0.746 | 0.66-0.85 | <0.0001* |
Stone burden (categorized) | 0.724 | 0.63-0.80 | <0.0001* |
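
For an ordinal score, the AUC equals the probability that a randomly chosen patient with residual stones has a higher score category than a randomly chosen stone-free patient, counting ties as one half. The sketch below applies this pair-counting (Mann-Whitney) form to the Guy’s grade counts in Table 2; the not-stone-free counts are derived as total minus stone-free. This is an illustrative reconstruction with a hypothetical function name, not a description of the SPSS procedure actually used.

```python
def auc_from_ordinal_counts(pos_counts, neg_counts):
    """AUC for an ordinal score from per-category counts.

    pos_counts[i] / neg_counts[i]: patients with / without the outcome
    in category i, with categories ordered from lowest to highest score.
    Returns P(score_pos > score_neg) + 0.5 * P(tie).
    """
    concordant = 0
    ties = 0
    neg_below = 0  # outcome-negative patients in strictly lower categories
    for pos, neg in zip(pos_counts, neg_counts):
        concordant += pos * neg_below
        ties += pos * neg
        neg_below += neg
    return (concordant + 0.5 * ties) / (sum(pos_counts) * sum(neg_counts))


# Guy's grades I-IV from Table 2 (outcome = residual stones).
guys_total = [38, 49, 14, 51]
guys_stone_free = [36, 32, 5, 15]
guys_residual = [t - s for t, s in zip(guys_total, guys_stone_free)]

guys_auc = auc_from_ordinal_counts(guys_residual, guys_stone_free)
print(round(guys_auc, 3))  # 0.791
```

The result agrees with the AUC of 0.791 reported for the Guy’s stone score, which is consistent with, though does not by itself confirm, the AUCs having been computed on the categorized scores.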

Table 4. AUC of each scoring system and of stone burden for predicting complications.

Scoring system | AUC | 95% CI | p |
---|---|---|---|
Guy’s | 0.550 | 0.45-0.64 | 0.300 |
S.T.O.N.E. | 0.591 | 0.50-0.68 | 0.058 |
CROES | 0.579 | 0.48-0.67 | 0.100 |
S-ReSC | 0.630 | 0.53-0.72 | 0.007* |
Stone burden (categorized) | 0.570 | 0.47-0.66 | 0.147 |
DISCUSSION
Multiple attempts to identify significant predictors of stone-free rate after percutaneous nephrolithotomy have been made since the procedure became the first-line surgical treatment for kidney stones.1-2 Tefekli et al. divided stones into simple or complex calculi, according to their location in the renal pelvis and calices.22
Nevertheless, that division is not enough to standardize the terminology of the complexity of the procedure. Several scoring systems for predicting the stone-free rate have recently been published, but none of them has gained widespread acceptance or implementation in reported clinical practice.
The potential benefit of a standardized method for predicting the stone-free rate after percutaneous nephrolithotomy is reported across different case series.23-24 The clear advantages of a widely accepted scoring system include more accurate preoperative patient counseling, surgical planning, and outcome evaluation, as well as uniform academic reporting. It could also aid in resource management, in referring complex cases to specialized centers,25 or even in deciding whether to use adjunct techniques, such as combining the procedure with ureterorenoscopy. The ideal scoring system must be simple and reproducible, and must estimate the success of the procedure with a high degree of accuracy. The Guy’s stone score, S.T.O.N.E. nephrolithometry, CROES nomogram, and S-ReSC scoring system have recently been externally validated as predictors of stone-free rate after percutaneous nephrolithotomy, using preoperative non-contrast computed tomography.9,15-17 However, our study provides the first comparison of the four scoring systems in the same cohort, with the strict criterion of absolute absence of residual stone in the non-contrast computed tomography study. As our results show, those four scoring systems were significantly associated with the stone-free rate after percutaneous nephrolithotomy (p<0.0001), which has also been demonstrated by the original articles describing the same scales.5-8 Those results have recently been replicated in several studies, but there is much discrepancy among them regarding the predictive accuracy of each scoring system.5,15-16 This may be because each system was constructed from the patient population analyzed, resulting in an intrinsic bias favoring predictive efficacy. In our study, the Guy’s stone score showed the greatest AUC, reaching 0.791, followed by the S.T.O.N.E., S-ReSC, and CROES scores, respectively.
Sfoungaristos et al.9 described a very similar ROC curve for the Guy’s stone score in their study, with a high AUC of 0.796 and excellent statistical significance (p<0.001).
Another relevant factor that could be involved in the inter-study discrepancy is the absence of a standardized definition for stone-free rate after percutaneous nephrolithotomy. Many of the studies applied the criterion suggested by Opondo et al.24 of no stones visible, or the presence of clinically insignificant residual fragments <4 mm, for treatment success. Nonetheless, residual stone size does not always correlate with clinical significance.26-27 Moreover, different imaging methods were employed in some of the studies. Abdominal X-ray was the most commonly used method for stone-free evaluation, which is inferior to non-contrast computed tomography imaging for residual fragment assessment.28 Sensitivity was reported at close to 70%, and with a cut-off level of <4 mm, it reached 85.7%, whereas non-contrast computed tomography had almost 100% sensitivity and has been accepted as the gold standard.29 We used the strict criterion of absolute absence of residual stone, because our experience, together with the evidence acquired from several reports, has shown that a significant number of patients with residual fragments will experience a stone-related event during the postoperative period,26-30 reaching up to 46%. Of those events, renal colic is the most common, followed by stone regrowth, increasing the need for additional intervention. Gokce et al.29 reported an absolute stone-free rate of 54.9%, using non-contrast computed tomography as the imaging control and applying the same strict criterion that we used. Their results were slightly lower than ours of almost 58%.
All scoring systems had similar predictive accuracy: we found no significant difference among the AUCs (p=0.2), and none was more predictive than another for stone-free rate. Noureldin et al. showed the same results, comparing the AUCs of the Guy’s stone score and the S.T.O.N.E. score (p=0.6).31 It is up to the urologist to consider the specific characteristics of each method to decide which should be used as the standard in clinical practice and academic reporting.
All the scoring systems analyzed in the present work had significant correlation with operative time and length of hospital stay, concurring with the results of most of the published articles.8,31-32 However, the correlation between the S-ReSC score and length of hospital stay had not been described until now. Another important quality is the excellent inter-observer agreement of the four scales, which makes them very reliable instruments. The CROES nomogram was developed from a large prospective study that included 2,806 patients from 96 centers.7 Nevertheless, we feel it is a very complicated scoring system, compared with the others, making its everyday application a challenge. Furthermore, its capacity to predict stone-free rate was no more accurate than that of stone burden alone,17 which has been described in several works as a significant predictor of stone-free rate.33-34 We, too, demonstrated the direct correlation of stone burden with operative time and length of hospital stay. However, stone burden is commonly expressed as the largest diameter, which is potentially inaccurate, because of the complex shape of large kidney stones.
In a multi-center study of 850 patients, S.T.O.N.E. nephrolithometry was significantly associated with the complication rate,12 a finding that we also observed, in contrast to the study by Okhunov et al.,6 in which it could not be correlated with post-percutaneous nephrolithotomy complications (p=0.09). That scale is relatively easy to use and showed a high degree of accuracy in our study. However, it consists of variables that are obtained specifically from non-contrast computed tomography images and requires specific software to calculate the different variables,16 representing an important disadvantage for centers that do not have the necessary resources. On the other hand, the Guy’s stone score was initially developed using preoperative plain abdominal X-ray, which is the most common assessment method. Moreover, that scale presented the best predictive ability for stone-free rate when calculated using non-contrast computed tomography. We found no significant association with postoperative complications (p=0.49), results that are consistent with other studies.6,11,16,21 In contrast, Vicentini et al.13 reported a statistically significant positive association. The Guy’s stone score was based on results in the literature and expert opinion, which we believe gives it a much more practical value, providing a simple, reproducible, and accurate method. Its main disadvantage is the poor agreement among reviewers when grading patients with partial versus complete staghorn stones.5
In addition to all the above, the S-ReSC score is a recent, but not widely used, grading scale that has presented excellent stone-free rate predictive capacity. Our results showed no statistical difference between the different AUCs analyzed. This scoring system was developed under the hypothesis that complex stone distribution was the most powerful predictor of treatment success, counting the number of sites involved.8 However, stone distribution is closely related to stone size and number.15 The S-ReSC scoring system does not require any particular software and is extremely simple, requiring approximately 15 seconds to score, according to the original description.8 Additionally, it showed a significant ability to predict complications, something not previously described. Its greatest advantage is its almost perfect inter-observer reliability, with a correlation coefficient of 0.949 (95% CI: 0.922-0.969, p<0.001),15 the highest reported among all the assessment scales. In our opinion, these facts make it the ideal scoring system for standardized daily use and for homogenizing criteria.
The main limitation of our study was its retrospective design, but the data were evaluated by an experienced endourologist to substantiate the clinical information. In addition, inter-observer reliability could not be analyzed, because the complexity grading was performed by a single junior urology resident from our institution. It is up to the urologic community to eventually decide which evidence-based scale is the most suitable. The need for a standardized method continues to grow, as experience increases and studies continue to be published.
CONCLUSION
The four scoring systems analyzed were significant predictors of stone-free rate and they had similar ROC curves and AUCs, with no significant differences. However, the Guy’s stone score demonstrated the best predictive capacity and the S-ReSC scoring system proved to be superior for predicting complications.