Introduction
Acute appendicitis (AA) is one of the most common surgical pathologies in emergency departments, with a lifetime risk of 7% to 9%1-3. It is more frequent among children and young adults. Its incidence appears to be influenced by factors such as sex, age, ethnicity, and season of the year2,4. However, most patients who present with acute right iliac fossa pain (RIFP) do not have appendicitis, and the differential diagnosis can be difficult and remains a clinical challenge3.
AA has traditionally been considered a progressive disease whose natural evolution is toward perforation; early diagnosis and treatment are therefore necessary to reduce morbidity and mortality5,6, and surgery is the gold standard of treatment7,8. The etiology and pathogenesis of AA remain poorly understood, and predicting whether it will evolve toward a mild or a complicated form is difficult, although recent studies suggest that this could be predicted with a logistic regression model6. At present, two types of appendicitis are thought to exist: uncomplicated (non-perforated) and complicated (perforated) appendicitis6. Regardless of the presentation, the principle of early diagnosis and treatment remains valid8-11.
Diagnosis is based on clinical assessment together with laboratory and imaging tests12.
Ultrasonography (US), abdominal computed tomography (CT), and magnetic resonance imaging (MRI) are the imaging modalities most commonly used to reduce the negative appendectomy rate, which has been reported to be as high as 15%13.
US is noninvasive, does not expose the patient to ionizing radiation, and has a reported sensitivity of 71% to 94% and a specificity of 81% to 98%14,15. It is effective for confirming appendicitis but not for excluding it, and it is operator dependent3. Inconclusive findings, due either to non-visualization of the appendix or to specific technical difficulties (obese or pregnant patients), make second-line imaging studies necessary. Abdominal CT for suspected appendicitis has sensitivity and specificity rates of 76% to 100% and 83% to 100%, respectively15,16. However, radiation exposure is a concern, particularly in children and pregnant women16, in whom MRI is an alternative5.
There is a clear lack of uniformity among guidelines on the diagnosis and management of AA7. In an attempt to standardize the diagnostic approach, clinical prediction rules (CPRs) have been introduced, seeking to provide a more objective approach to the diagnosis of RIFP and to avoid unnecessary operations12,17.
CPRs combine clinical and biochemical variables to achieve greater diagnostic value than the individual variables provide. Since the initial proposal by Alvarado18, approximately 12 CPRs have become available for the diagnosis of AA12.
The most widely tested CPR, introduced in 1986, is the Alvarado score8. It has proven very effective at "ruling out" appendicitis, with an overall sensitivity and specificity of 96% and 81%, respectively17. However, successive scales have added new variables and weightings to the eight variables of the original Alvarado score, progressively increasing the complexity of CPRs and, therefore, making them less efficient to use12.
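To illustrate how a CPR turns clinical findings into a number, the following Python sketch computes the Alvarado score from its eight items. The weights and thresholds (temperature ≥ 37.3°C, leukocytes > 10,000/µL, neutrophils > 75%) are those commonly cited for the original description and are our assumption here; they should be checked against the original publication. The class and function names are hypothetical and are not part of this study.

```python
from dataclasses import dataclass


@dataclass
class AlvaradoFindings:
    """Hypothetical container for the eight Alvarado (MANTRELS) items."""
    migration_to_rif: bool   # migration of pain to the right iliac fossa
    anorexia: bool
    nausea_vomiting: bool
    rif_tenderness: bool     # tenderness in the right iliac fossa
    rebound: bool            # rebound tenderness
    temperature_c: float     # body temperature in degrees Celsius
    wbc_per_ul: float        # leukocytes per microliter
    neutrophil_pct: float    # neutrophils as a percentage of leukocytes


def alvarado_score(f: AlvaradoFindings) -> int:
    """Return the Alvarado score (0-10); thresholds as commonly cited (assumed)."""
    score = 0
    score += 1 if f.migration_to_rif else 0
    score += 1 if f.anorexia else 0
    score += 1 if f.nausea_vomiting else 0
    score += 2 if f.rif_tenderness else 0       # weighted 2 points
    score += 1 if f.rebound else 0
    score += 1 if f.temperature_c >= 37.3 else 0
    score += 2 if f.wbc_per_ul > 10_000 else 0  # leukocytosis, weighted 2 points
    score += 1 if f.neutrophil_pct > 75 else 0  # shift to the left
    return score


example = AlvaradoFindings(True, True, False, True, False, 37.8, 12_500, 80)
print(alvarado_score(example))  # prints 8
```

Each later CPR follows the same pattern with more items or different weights, which is the growth in complexity described above.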
Nevertheless, these scores appear useful for classifying patients as having a low, medium, or high likelihood of AA, and they help identify the cases in which imaging studies are needed14. Patients with RIFP can be evaluated systematically and efficiently with CPRs, but simplifying these tools could make them more useful and easier to apply.
The aim of this study was to validate the effectiveness of the currently available CPRs in establishing a correct diagnosis and to develop a new, simplified, and efficient scoring system.
Methods
A retrospective observational study was conducted. The clinical records of 458 patients evaluated for suspected AA from January 2010 to December 2016 were reviewed. All patients underwent surgery by an open (n = 25) or laparoscopic (n = 433) approach. Informed consent was obtained before surgery from all patients or their legal guardians.
Diagnostic confirmation was obtained through the anatomopathological report, which indicated AA by the presence of inflammatory cells (leukocytes, lymphocytes, or plasma cells) in the surgical specimen or indicated negative appendectomy (NA) in the absence of these cells19.
The information collected included demographic and personal data, clinical features and laboratory data at admission, operative reports, and postoperative outcomes. From these data, the Alvarado, Raja Isteri Pengiran Anak Saleha appendicitis (RIPASA), appendicitis inflammatory response (AIR), and adult appendicitis score (AAS) scores were calculated for each patient.
The data were entered into an anonymized database created in Microsoft Excel (Microsoft Corporation, Redmond, WA, USA) and analyzed using IBM SPSS Statistics version 20.0 (IBM Corporation, Armonk, NY, USA). In the descriptive analysis, quantitative variables are reported as median and interquartile range (IQR), and qualitative variables as frequencies and percentages of the total number of patients (N, %). Associations between qualitative variables were analyzed with the Pearson chi-square (χ2) test, and quantitative variables were compared with the nonparametric Mann–Whitney U test. The diagnostic efficiency of each scale was assessed with receiver operating characteristic (ROC) curves, calculating the area under the curve (AUC) for each scale. The scores were then stratified into low, medium, or high probability of AA according to the cut-off points established in the literature for the Alvarado17,18, RIPASA20, AIR21, and AAS22 scales.
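The AUC used to compare the scales has a simple interpretation that links it to the Mann–Whitney test mentioned above. As a minimal sketch, assuming the nonparametric (empirical) AUC, it is the probability that a randomly chosen AA patient scores higher than a randomly chosen NA patient, with ties counted as one half; this equals the Mann–Whitney U statistic divided by the product of the group sizes. The function name and the example scores are hypothetical, and this is not a description of how SPSS computes the AUC internally.

```python
from typing import Sequence


def empirical_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Nonparametric AUC: P(score of an AA patient > score of an NA patient),
    counting ties as 1/2. Equals U / (len(pos) * len(neg)), where U is the
    Mann-Whitney statistic for the AA group."""
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("both groups need at least one patient")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores, not study data.
aa_scores = [6, 7, 8, 5]
na_scores = [4, 5, 6]
print(round(empirical_auc(aa_scores, na_scores), 3))  # prints 0.833
```

An AUC of 0.5 means the score does no better than chance, and 1.0 means every AA patient scores above every NA patient.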
To develop a new score, the Hospital Medina del Campo score (HMC score), a univariate logistic regression was performed for each variable included in the items of the scales. In addition, the white blood cell count (WBCC) was categorized to establish a cut-off point: a univariate binary logistic regression was performed for each decile of the leukocyte count and considered significant if p < 0.05, and the lowest value fulfilling this condition was taken as the cut-off point because it included the greatest number of patients. Finally, the variables with p < 0.1 in the univariate analysis were included in a multivariate logistic regression analysis using "the enter method" and "the Wald method."
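The WBCC cut-off search can be sketched in Python as follows. For a single binary predictor, the maximum-likelihood estimate of the logistic coefficient B equals the log odds ratio of the 2×2 table, and its Wald standard error is sqrt(1/a + 1/b + 1/c + 1/d), so no regression library is needed. The function names, the use of `statistics.quantiles` to define the deciles, and skipping cut-offs with an empty cell are our assumptions; SPSS may define deciles differently.

```python
import math
import statistics
from typing import List, Optional, Sequence


def wald_p_binary(exposed: Sequence[bool], outcome: Sequence[bool]) -> Optional[float]:
    """Two-sided Wald p-value of B in a univariate logistic regression of
    outcome (AA) on one binary predictor, using the 2x2 table identities."""
    pairs = list(zip(exposed, outcome))
    a = sum(1 for e, o in pairs if e and o)          # at/above cut-off, AA
    b = sum(1 for e, o in pairs if e and not o)      # at/above cut-off, NA
    c = sum(1 for e, o in pairs if not e and o)      # below cut-off, AA
    d = sum(1 for e, o in pairs if not e and not o)  # below cut-off, NA
    if min(a, b, c, d) == 0:
        return None  # an empty cell: the MLE does not exist, so skip
    beta = math.log((a * d) / (b * c))               # log odds ratio = B
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)    # Wald standard error
    return math.erfc(abs(beta / se) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def lowest_significant_cutoff(wbcc: List[float], aa: List[bool],
                              alpha: float = 0.05) -> Optional[float]:
    """Test the indicator WBCC >= d for each decile d, lowest first, and
    return the first d with p < alpha (None if no decile qualifies)."""
    for cut in statistics.quantiles(wbcc, n=10):  # nine decile cut points, ascending
        p = wald_p_binary([w >= cut for w in wbcc], aa)
        if p is not None and p < alpha:
            return cut
    return None


# Synthetic data, not study data: AA becomes more frequent as WBCC rises.
wbcc = [1000.0 * i for i in range(1, 101)]
aa = [i > 40 or i % 4 == 0 for i in range(1, 101)]
print(lowest_significant_cutoff(wbcc, aa))
```

Scanning from the lowest decile upward implements the stated preference for the lowest significant value, which keeps the largest number of patients above the cut-off.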
With the variables that reached statistical significance, a new AA diagnostic probability scale was designed and its AUC was calculated. To score the new scale, the B coefficient of each variable (Wald method) was multiplied by 10 and rounded to the nearest whole number (Table 1).
Variable | Coefficient β | Score | p |
---|---|---|---|
Anorexia | 0.825 | 8 | 0.039 |
WBCC ≥ 8,275/µL | 1.640 | 16 | 0.001 |
NTF > 75% | 1.157 | 12 | 0.002 |
Pain migration to RIF | 0.861 | 9 | 0.021 |
Pain evolution < 48 h | 0.745 | 7 | 0.028 |
Temperature > 37°C and < 39°C | −0.873 | −9 | 0.013 |
Multivariate logistic regression coefficients (Wald method) and the points assigned to each variable in the HMC score. NTF: neutrophilia; RIF: right iliac fossa; WBCC: white blood cell count; HMC: Hospital Medina del Campo
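The conversion from coefficients to points can be checked with a short Python sketch. It multiplies each B in Table 1 by 10 and rounds to the nearest whole number; the dictionary keys are hypothetical labels for the table rows.

```python
# B coefficients from Table 1 (keys are hypothetical labels for the rows).
COEFFICIENTS = {
    "anorexia": 0.825,
    "wbcc_ge_8275": 1.640,
    "neutrophilia_gt_75": 1.157,
    "pain_migration_rif": 0.861,
    "pain_lt_48h": 0.745,
    "temp_37_to_39": -0.873,
}


def points_from_coefficients(coefs: dict, factor: int = 10) -> dict:
    """Points = B x factor, rounded to the nearest whole number."""
    return {name: round(b * factor) for name, b in coefs.items()}


print(points_from_coefficients(COEFFICIENTS))
```

Truncating instead of rounding (`int(b * 10)`) would give 11, 8, and −8 points for neutrophilia, pain migration, and temperature, so the points in Table 1 are consistent with rounding.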
This retrospective study used data from clinical records. To guarantee appropriate handling and confidentiality of the information, the data were processed anonymously in accordance with Spanish Organic Law 15/1999 of December 13 on Personal Data Protection (LOPD). All methods were performed in accordance with the guidelines and regulations established by the Declaration of Helsinki (1964, revised in 1983) on biomedical research in humans and Spanish Royal Decree 1090/2015 of December 4, which regulates clinical trials with drugs, the Research Ethics Committees with drugs, and the Spanish Registry of Clinical Studies.
Ethical approval through the Clinical Trials and Ethics Committee of Valladolid University was granted in January 2017.
Results
We analyzed 458 patients who fulfilled the inclusion criteria (abdominal pain with suspected AA and appendectomy). Of these, 404 (88.2%) had histological confirmation of appendicitis and 54 (11.8%) had a normal appendix. In 36 patients the appendix appeared normal intraoperatively; however, in ten of them (27.8%) the histological report confirmed AA. The median age of all patients was 31 years (IQR: 18.0-48.0 years). There was a male predominance (266 patients, 58.1%), and 60.9% of the patients with histologically confirmed appendicitis were male (p < 0.001). US was performed in all cases and was suggestive of AA in 260 (60%) patients, including 243 patients with histological confirmation of AA (sensitivity 63.8% and specificity 67.3%).
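Sensitivity and specificity, as reported here for US, follow from a 2×2 table of test result versus histological diagnosis. A minimal Python sketch is shown below; the function name and the counts in the example are hypothetical and are not study data.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from a 2x2 table
    (test result versus histological diagnosis)."""
    return {
        "sensitivity": tp / (tp + fn),  # share of AA cases with a positive test
        "specificity": tn / (tn + fp),  # share of NA cases with a negative test
        "ppv": tp / (tp + fp),          # share of positive tests confirmed as AA
        "npv": tn / (tn + fn),          # share of negative tests confirmed as NA
    }


# Hypothetical counts for illustration only.
m = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)
print({k: round(v, 3) for k, v in m.items()})
```

Sensitivity and specificity depend only on the AA and NA columns, respectively, whereas the predictive values also depend on the proportion of AA in the cohort.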
All four scales were applied to the entire cohort, and all showed significantly higher scores in patients with AA than in those with NA (Table 2).
Score | Total (n = 458) | NA (n = 54; 11.8%) | AA (n = 404; 88.2%) | p |
---|---|---|---|---|
Alvarado | 6.00 (5.00-8.00) | 5.00 (4.00-6.00) | 6.00 (5.00-8.00) | < 0.001 |
RIPASA | 7.50 (6.50-9.00) | 7.00 (5.50-8.00) | 7.50 (6.50-9.00) | < 0.001 |
AIR | 5.00 (4.00-7.00) | 4.00 (2.00-5.00) | 5.00 (4.00-7.00) | < 0.001 |
AAS | 11.00 (9.00-13.00) | 9.00 (7.00-11.00) | 11.00 (9.00-13.50) | < 0.001 |
Values are median (IQR). AA: acute appendicitis (histologically confirmed); NA: negative appendectomy (no histological support for AA); RIPASA: Raja Isteri Pengiran Anak Saleha appendicitis; AIR: appendicitis inflammatory response; AAS: adult appendicitis score
The AUC of each CPR for the diagnosis of AA is shown in Table 3, and the ROC curves are shown in Figure 1.
Score | AUC | 95% CI | p |
---|---|---|---|
Alvarado | 0.74 | 0.67-0.80 | < 0.001 |
RIPASA | 0.63 | 0.56-0.71 | < 0.001 |
AIR | 0.70 | 0.62-0.78 | < 0.001 |
AAS | 0.70 | 0.62-0.78 | < 0.001 |
RIPASA: Raja Isteri Pengiran Anak Saleha appendicitis, AIR: appendicitis inflammatory response, AAS: adult appendicitis score, CI: confidence interval, AUC: area under the curve
Of the four CPRs, the Alvarado score was the most accurate at high scores, assigning a high probability of AA to 206 patients, with diagnostic confirmation in 96.6%. In addition, the Alvarado score placed the fewest patients in the intermediate-probability group (37.6%). In the low-probability group, the AAS was the most efficient, with 81.91% confirmed cases of AA.
The multivariate analysis identified the following variables as independent predictors of a confirmed diagnosis of AA: anorexia (odds ratio [OR] = 2.28; p = 0.039), WBCC ≥ 8,275 leukocytes/µL (OR = 5.16; p < 0.001), neutrophilia (NTF) > 75% (OR = 3.18; p = 0.002), pain migration to the RIF (OR = 2.37; p = 0.021), and abdominal pain of < 48 h duration (OR = 2.11; p = 0.028).
In contrast, a temperature between 37°C and 39°C was associated with a lower likelihood of AA than a temperature outside that range (OR = 0.42; p = 0.013).
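Because the model is a logistic regression, each odds ratio reported above is exp(B) of the corresponding coefficient in Table 1. A minimal check in Python (the dictionary keys are hypothetical labels for the table rows):

```python
import math

# B coefficients from Table 1 (keys are hypothetical labels for the rows).
COEFFICIENTS = {
    "anorexia": 0.825,
    "wbcc_ge_8275": 1.640,
    "neutrophilia_gt_75": 1.157,
    "pain_migration_rif": 0.861,
    "pain_lt_48h": 0.745,
    "temp_37_to_39": -0.873,
}

# OR = exp(B): a positive B raises the odds of AA, a negative B lowers them.
odds_ratios = {name: math.exp(b) for name, b in COEFFICIENTS.items()}
for name, value in odds_ratios.items():
    print(f"{name}: OR = {value:.2f}")
```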
The new CPR built with these six variables stratified our cohort into three risk levels: low probability (≤ 25 points), 24.9% of patients; medium probability (26-40 points), 47.9%; and high probability (≥ 41 points), 27% (Fig. 2). The AUC was 0.81 (95% CI: 0.74-0.87; p < 0.001), with a sensitivity of 60.91% (95% CI: 53.85-67.98) and a specificity of 90% (95% CI: 79.45-100).
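The following Python sketch shows how the HMC score and its three probability levels could be computed for a single patient, using the points in Table 1 and the cut-offs above. It assumes axillary temperature in °C, leukocytes per µL, and strict inequalities for the temperature range as written in Table 1; the class and function names are hypothetical, and the sketch is not a validated clinical tool.

```python
from dataclasses import dataclass

# Points per item as listed in Table 1.
POINTS = {"anorexia": 8, "wbcc": 16, "neutrophilia": 12,
          "migration": 9, "pain_lt_48h": 7, "temp_37_39": -9}


@dataclass
class HMCInputs:
    """Hypothetical container for the six HMC items of one patient."""
    anorexia: bool
    wbcc_per_ul: float        # leukocytes per microliter (assumed unit)
    neutrophil_pct: float     # neutrophils, % of leukocytes
    pain_migration_rif: bool  # pain migrated to the right iliac fossa
    pain_duration_h: float    # hours since onset of abdominal pain
    axillary_temp_c: float    # axillary temperature, degrees Celsius


def hmc_score(x: HMCInputs) -> int:
    """Sum the points of the items present (possible range -9 to 52)."""
    score = 0
    if x.anorexia:
        score += POINTS["anorexia"]
    if x.wbcc_per_ul >= 8275:
        score += POINTS["wbcc"]
    if x.neutrophil_pct > 75:
        score += POINTS["neutrophilia"]
    if x.pain_migration_rif:
        score += POINTS["migration"]
    if x.pain_duration_h < 48:
        score += POINTS["pain_lt_48h"]
    if 37 < x.axillary_temp_c < 39:  # strict bounds, as in Table 1
        score += POINTS["temp_37_39"]
    return score


def hmc_probability(score: int) -> str:
    """Map a score to the three probability levels reported for the cohort."""
    if score <= 25:
        return "low"
    if score <= 40:
        return "medium"
    return "high"


patient = HMCInputs(True, 12_000, 82, True, 24, 37.5)
s = hmc_score(patient)
print(s, hmc_probability(s))  # prints 43 high
```

Because all points are integers, the three ranges (≤ 25, 26-40, ≥ 41) cover every possible score without gaps.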
Discussion
To improve the effectiveness of the diagnostic process, the ideal scoring system should work as an effective and accurate tool that accelerates and improves the decision-making process and simultaneously reduces the need for complementary imaging studies22.
The aim of this study was to validate the effectiveness of the most commonly used CPRs and to develop a new streamlined and efficient scoring system.
In this regard, the most efficient of the CPRs evaluated was the Alvarado score, as confirmed in multiple previous studies3,17. This score stratifies the risk of patients with RIFP using eight variables. The other CPRs showed lower diagnostic efficiency despite evaluating a larger number of variables.
The newly developed CPR (HMC score) includes six variables: anorexia, abdominal pain of < 48 h duration, pain migration to the RIF, WBCC ≥ 8,275 leukocytes/µL, NTF > 75%, and axillary temperature between 37°C and 39°C. The score performs well as a predictor of AA, with an area under the ROC curve of 0.81 (p < 0.001), and showed better diagnostic performance than the other scales (Fig. 3).
It comprises three symptoms and three clinical data items, which are easily identifiable by the patient and the evaluator, respectively. The HMC score is simpler (with fewer items) than the previous scores (Alvarado, RIPASA, AIR, and AAS) and eliminates subjective data, such as the degree of guarding or rebound tenderness on abdominal examination (AIR and AAS), as well as data that are not always recorded in the patient's medical records.
This score established a cut-off point for the leukocyte count. Although individual or combined laboratory tests have been shown to have limited specificity for predicting AA, their simultaneous negativity practically rules out the diagnosis23. In a prospective study of 1032 patients, Lau24 concluded that simultaneous elevation of the WBCC and the neutrophil percentage increased the diagnostic specificity for AA. In another study, Atema25 found that a WBC count > 20,000 combined with symptoms lasting more than 48 h had a positive predictive value of 100%.
Among patients with AA, the reported sensitivity and specificity of the leukocyte count range from 60% to 87% and from 53% to 100%, respectively26, with different cut-off points: 11,000 leukocytes/µL in the study by Bilic27 and 10,400 leukocytes/µL in that by Narci28. Our cut-off point of 8,275 leukocytes/µL increased the sensitivity of the test, and when combined with NTF > 75%, the specificity also increased. The percentage of neutrophils by itself is considered the best diagnostic marker for AA and is also related to its severity25.
Another contribution of the HMC scale concerns body temperature. Fever is included in most RIFP diagnostic scales (Alvarado, RIPASA, and AIR). However, many authors consider the predictive value of fever for AA to be limited29,30. Andersson31, in a study of 496 patients, showed that a temperature > 37.7°C had a sensitivity of 70% and a specificity of 65% for the diagnosis of AA. In a later study, Andersson found that the mean temperature in nonsurgical abdominal pathology was 37.7°C and that only its persistence on serial physical examinations indicated complicated AA32. Therefore, temperature alone is of limited usefulness3,29. In our scale, an axillary temperature between 37°C and 39°C was associated with a lower risk of AA. For that reason, and in agreement with these authors, our data support the idea that temperature is not a good predictor of AA. In patients evaluated for RIFP, fever should alert clinicians to the possibility of other intra-abdominal conditions, such as acute gastroenteritis and pelvic inflammatory disease.
It is well established that the diagnostic approach to RIFP is conditioned by patient characteristics such as age and sex2,4. In the overall cohort of female patients, the HMC scale had an AUC of 0.84 (0.77-0.90; p < 0.001), higher than that of the other CPRs. The difference was even more marked in women aged 15 to 64 years, with an AUC of 0.86 (0.78-0.93; p < 0.001). The diagnostic approach in women of childbearing age is particularly difficult because gynecological symptoms overlap with those of AA, which increases the NA rate through diagnostic errors33. It has been postulated that CPRs fail to evaluate this subgroup properly because they cannot adequately exclude gynecological pathologies. In fact, a specific diagnostic scale has been developed for the management of acute abdominal pain in women of reproductive age34.
When we applied the HMC score to women aged 15 to 64 years, it was highly successful in diagnosing AA: of the 44 patients in this subgroup with an HMC score ≥ 41, only one had a recorded diagnosis of NA, an improvement over the data reported by other authors35. However, female patients with a score ≤ 25 had the highest rate of NA (20 of 44). These results agree with other studies that also found high NA rates in women of childbearing age29,36 and support the early use of imaging tests in these patients37.
Another group of patients with specific characteristics is the pediatric group. In this subgroup, the diagnosis of AA is challenging both because of nonsurgical conditions that mimic appendicitis and because of the difficulty of taking a history from and examining these patients14. The rate of diagnostic errors increases as age decreases, and children under 3 years of age have up to 5 times the risk of complicated AA38. Although we could not provide data on patients under 5 years of age, our results show that NA was more frequent in pubescent girls aged 10 to 14 years (60% in our cohort), similar to the findings of Güller in a retrospective study of 7452 cases39.
The HMC scale proved to be an acceptable predictor of AA in pediatric patients, with an AUC of 0.74 (0.59-0.90; p = 0.019), a result not achieved with the other scales. All pediatric patients with a high score on this scale had confirmed AA, which could have avoided the use of US, a conclusion similar to that of Blitman, who applied the Alvarado score14. On the other hand, authors such as Fleischman40 showed that low scores on appendicitis scales in children had good sensitivity for ruling out AA and, therefore, for safely avoiding diagnostic imaging and unnecessary radiation.
Consequently, we believe that imaging tests improve diagnostic accuracy, avoid errors and delays in definitive treatment, and should be performed in doubtful cases (intermediate scores), followed by CT when needed, a strategy supported by other authors41,42.
Finally, in elderly patients the AA rate is approximately 10%, although these figures are increasing as the population ages43. Comorbidities, the insidious onset of the disease, and diagnostic delay, together with the high perforation rate, make AA a condition with high morbidity and mortality in elderly patients44. The diagnostic scales for AA were designed in young populations, so their effectiveness in the elderly is not well documented45. For all these reasons, and like other authors44, we recommend the early use of imaging tests in these patients, especially when clinical data are inconclusive.
In our study, 11.1% of the patients were in this age group, with only 3 cases of NA. None of the previously published CPRs discriminated significantly between AA and NA in this group. In contrast, the HMC scale was statistically significant, with the best AUC for elderly patients of all the scores (0.86), suggesting that it is also a good predictive model for these patients. However, this sample size seems too small for meaningful comparison with other published data.
The major weaknesses of this study are its retrospective nature, which increases the potential for bias, and its single-center design. Among its strengths, all patients were treated by a small number of surgeons with a good level of uniformity in criteria, and the clinical data were complete in more than 95% of cases. The new score requires validation, which is currently under way in our center.
Conclusion
We conclude that diagnostic probability scales for AA are useful tools for evaluating patients with RIFP, facilitating the diagnostic approach in emergency settings and saving time and unnecessary tests. In probable or inconclusive cases in which the diagnosis rests on clinical data, the diagnostic accuracy for AA can be increased with US and CT studies.
In Western countries, access to imaging studies is relatively easy; however, this frequently overloads radiology services, as in our center. The implementation and proper use of these tools in emergency services can help select the patients who truly need further evaluation with complementary imaging studies.
Finally, our data allow us to affirm that the HMC score improves the diagnostic effectiveness in the population groups studied with respect to the other scales that have been evaluated and previously validated and supported by the literature.