1 Description of the dataset

The myeloma dataset is composed by clinical information on 16 covariates of 65 patients with multiple myeloma Krall et al. (1975). This dataset was analysed in two different ways. The approach of the following report all the covariates were considered to the Cox regression model and subsequently reduced using the stepwise method in order to decrease the data dimensionality.

myeloma<-read.table("./myeloma.txt",sep="")

head(myeloma)

##   PID time status  hgb platelets infection age sex logwbc frac logbm
## 1   1 1.25      1  9.4         1         0  67   1  3.663    1 1.954
## 2   2 1.25      1 12.0         1         1  38   1  3.987    1 1.954
## 3   3 2.00      1  9.8         1         1  81   1  3.875    1 2.000
## 4   4 2.00      1 11.3         0         0  75   1  3.806    1 1.255
## 5   5 2.00      1  5.1         0         0  57   1  3.724    1 2.000
## 6   6 3.00      1  6.7         1         1  46   2  4.476    0 1.934
##   pctlymph pctmyel.cell proteinuria protein.urine total.serum.prot
## 1        0            0          12             2               11
## 2        0            0          20             1                6
## 3        0            0           2             1               11
## 4       13           52           0             2                9
## 5        0            0           3             1               10
## 6        3            5          12             2                9
##   serum.globin serum.calcium logbun
## 1            7            10 2.2175
## 2            3            18 1.9393
## 3            8            15 1.5185
## 4            6            12 1.7482
## 5            6             9 1.3010
## 6            3            10 1.5441

colnames(myeloma)

##  [1] "PID"              "time"             "status"          
##  [4] "hgb"              "platelets"        "infection"       
##  [7] "age"              "sex"              "logwbc"          
## [10] "frac"             "logbm"            "pctlymph"        
## [13] "pctmyel.cell"     "proteinuria"      "protein.urine"   
## [16] "total.serum.prot" "serum.globin"     "serum.calcium"   
## [19] "logbun"

dim(myeloma)

## [1] 65 19

Some transformations of the covariates have to be done.

myeloma$logbun=10^myeloma$logbun

Select the covariates that are going to be used in the study.

data.survival=myeloma[,-c(1,2,3)]
names(data.survival)[16]="bun"

The covariate matrix is given by:

head(data.survival)

##    hgb platelets infection age sex logwbc frac logbm pctlymph pctmyel.cell
## 1  9.4         1         0  67   1  3.663    1 1.954        0            0
## 2 12.0         1         1  38   1  3.987    1 1.954        0            0
## 3  9.8         1         1  81   1  3.875    1 2.000        0            0
## 4 11.3         0         0  75   1  3.806    1 1.255       13           52
## 5  5.1         0         0  57   1  3.724    1 2.000        0            0
## 6  6.7         1         1  46   2  4.476    0 1.934        3            5
##   proteinuria protein.urine total.serum.prot serum.globin serum.calcium
## 1          12             2               11            7            10
## 2          20             1                6            3            18
## 3           2             1               11            8            15
## 4           0             2                9            6            12
## 5           3             1               10            6             9
## 6          12             2                9            3            10
##         bun
## 1 165.00610
## 2  86.95609
## 3  32.99894
## 4  56.00154
## 5  19.99862
## 6  35.00258

And the matrix for the time and status is given by:

data.time.status=myeloma[,c(2,3)]
head(data.time.status)

##   time status
## 1 1.25      1
## 2 1.25      1
## 3 2.00      1
## 4 2.00      1
## 5 2.00      1
## 6 3.00      1

2 Survival analysis

The Cox’s proportional hazard was performed in all variables.

2.1 Cox Regression model

Let’s start the analysis for the Cox regression model.

library(survival)

Fit the Cox proportional hazards to the dataset. By the results obtained below, for a 5% level of significance, covariates proteinuria, protein.urine and bun are statistically significant for the Cox’s regression model.

fit.cox.all<-coxph(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival)
fit.cox.all

## Call:
## coxph(formula = Surv(data.time.status$time, data.time.status$status) ~ 
##     ., data = data.survival)
## 
##                       coef exp(coef)  se(coef)     z       p
## hgb              -0.138865  0.870345  0.079713 -1.74 0.08150
## platelets        -1.101926  0.332231  0.672115 -1.64 0.10111
## infection         0.793635  2.211420  0.497769  1.59 0.11085
## age              -0.000915  0.999085  0.022527 -0.04 0.96760
## sex              -0.706409  0.493413  0.455751 -1.55 0.12114
## logwbc            1.574200  4.826879  0.875547  1.80 0.07218
## frac              0.369597  1.447151  0.443359  0.83 0.40449
## logbm             1.056089  2.875103  0.650821  1.62 0.10465
## pctlymph         -0.014365  0.985738  0.037530 -0.38 0.70190
## pctmyel.cell      0.019179  1.019364  0.015886  1.21 0.22732
## proteinuria       0.104580  1.110244  0.040772  2.56 0.01032
## protein.urine     1.280076  3.596911  0.466501  2.74 0.00607
## total.serum.prot  0.260820  1.297994  0.169249  1.54 0.12331
## serum.globin     -0.188870  0.827894  0.167378 -1.13 0.25915
## serum.calcium     0.020696  1.020911  0.139537  0.15 0.88209
## bun               0.020981  1.021203  0.005816  3.61 0.00031
## 
## Likelihood ratio test=40  on 16 df, p=0.00077
## n= 65, number of events= 48

Now let us apply the stepwise function in order to reduce the model.

fit.cox.1

## Call:
## coxph(formula = Surv(data.time.status$time, data.time.status$status) ~ 
##     hgb + sex + proteinuria + protein.urine + total.serum.prot + 
##         bun, data = data.survival)
## 
##                      coef exp(coef) se(coef)     z       p
## hgb              -0.13581   0.87301  0.06349 -2.14   0.032
## sex              -0.51191   0.59935  0.36297 -1.41   0.158
## proteinuria       0.06575   1.06796  0.02819  2.33   0.020
## protein.urine     0.92502   2.52192  0.38124  2.43   0.015
## total.serum.prot  0.13296   1.14220  0.06887  1.93   0.054
## bun               0.02360   1.02388  0.00551  4.29 1.8e-05
## 
## Likelihood ratio test=28.9  on 6 df, p=6.27e-05
## n= 65, number of events= 48

From the results above, the model with the lowest AIC contains the following covariates: hgb, sex, proteinuria, protein.urine, total.serum.prot and bun. The model with those covariates is the one that is going to be analysed.

The covariates matrix will be

data.survival=data.survival[,c(1,5,11,12,13,16)]
head(data.survival)

##    hgb sex proteinuria protein.urine total.serum.prot       bun
## 1  9.4   1          12             2               11 165.00610
## 2 12.0   1          20             1                6  86.95609
## 3  9.8   1           2             1               11  32.99894
## 4 11.3   1           0             2                9  56.00154
## 5  5.1   1           3             1               10  19.99862
## 6  6.7   2          12             2                9  35.00258

For the Cox’s regression model we have

fit.cox<-coxph(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival)
fit.cox

## Call:
## coxph(formula = Surv(data.time.status$time, data.time.status$status) ~ 
##     ., data = data.survival)
## 
##                      coef exp(coef) se(coef)     z       p
## hgb              -0.13581   0.87301  0.06349 -2.14   0.032
## sex              -0.51191   0.59935  0.36297 -1.41   0.158
## proteinuria       0.06575   1.06796  0.02819  2.33   0.020
## protein.urine     0.92502   2.52192  0.38124  2.43   0.015
## total.serum.prot  0.13296   1.14220  0.06887  1.93   0.054
## bun               0.02360   1.02388  0.00551  4.29 1.8e-05
## 
## Likelihood ratio test=28.9  on 6 df, p=6.27e-05
## n= 65, number of events= 48

Covariates hgb, proteinuria, protein.urine and bun, for a 5% level of significance, statistically significant.

Next, see if the hypothesis of proportional hazards is not violated. By the results obtained, the proportional hazards hypothesis is not violated.

pph<-cox.zph(fit.cox, transform="km", global=TRUE)
pph

##                      rho  chisq     p
## hgb              -0.0197 0.0202 0.887
## sex               0.0403 0.0877 0.767
## proteinuria       0.0828 0.3503 0.554
## protein.urine     0.1067 0.6607 0.416
## total.serum.prot  0.1757 1.1636 0.281
## bun              -0.0510 0.1434 0.705
## GLOBAL                NA 1.8513 0.933

2.2 Cox Robust

In order to see if the results obtained before are consistent, a robust version of the Cox regression model is going to be performed. First call the following package.

library(coxrobust)

To see the results obtained by fitting the robust Cox using exponential weights with a 5% level of trimming, do the following:

fit.coxrobust<-coxr(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival,trunc=0.95, f.weight="exponential",singular.ok=T,model=F)
fit.coxrobust

## 
## Call:
## coxr(formula = Surv(data.time.status$time, data.time.status$status) ~     ., data = data.survival, trunc = 0.95, f.weight = "exponential",     singular.ok = T, model = F)
## 
## Partial likelihood estimator
##                     coef exp(coef) se(coef)        p
## hgb              -0.1343     0.874  0.06363 3.49e-02
## sex              -0.5553     0.574  0.36574 1.29e-01
## proteinuria       0.0707     1.073  0.02868 1.38e-02
## protein.urine     0.9736     2.647  0.38470 1.14e-02
## total.serum.prot  0.1391     1.149  0.06846 4.22e-02
## bun               0.0258     1.026  0.00579 8.75e-06
## 
## Wald test=31.2 on 6 df, p=2.37e-05
## 
## Robust estimator
##                     coef exp(coef) se(coef)        p
## hgb              -0.1381     0.871  0.09986 0.166545
## sex              -0.5462     0.579  0.63915 0.392784
## proteinuria       0.0603     1.062  0.05479 0.271273
## protein.urine     0.8603     2.364  0.52034 0.098277
## total.serum.prot  0.1221     1.130  0.06308 0.052902
## bun               0.0249     1.025  0.00736 0.000723
## 
## Extended Wald test=18 on 6 df, p=0.00615

From the results, only covariate bun is statistically significant for a 5% level of significance. The results are not consistent with the ones obtained from the Cox’s proportional hazards.

2.3 Cox Robust based on Heritier

Another proposal of the robust Cox is given by Heritier (2009), and is going to be performed next.

filepath="../functions/"
source(paste(filepath,"Chapter7_functions.r",sep=""))

The fitted model using exponential weights with 5% level of trimming is given bellow. Notice that the results show that all covariates are significant for a 5% level of significant.

fit.coxrobust.H=rcoxph(data.time.status$time,data.time.status$status,data.survival,wt.type="exponential",quant=.95)
fit.coxrobust.H$coefficients

##                     estimate         SE         z p.value
## hgb              -0.13393895 0.07532391 -1.778173 0.07537
## sex              -0.53088710 0.41507710 -1.279008 0.20089
## proteinuria       0.05862697 0.02831727  2.070361 0.03841
## protein.urine     0.86260830 0.40406690  2.134816 0.03277
## total.serum.prot  0.12324233 0.06439164  1.913949 0.05562
## bun               0.02502957 0.00387232  6.463716 0.00000

Only covariates proteinuria, protein.urine and bun are statistically significant.

The next figure shows that observations 40, 44 and 48 are identified as influential observations in the sense that the weight given for each one is the lowest among the others.

In the next section it is going to be evaluated if the observations identified before are or not influential.

3 Outlier detection methods for survival analisys

In order to identify which of the observations might be outliers some methods for outlier detection for survival analysis are going to be performed.

3.1 Residuals

The first method used for outlier detection is the residuals: martingale and deviance.

3.1.1 Martingale

The results regarding the martingale residuals are shown bellow. Observation 40, 44 and 48 presented the lowest values for the martingale residual.

res.mart.2<- resid(fit.cox,type="martingale")

head(sort(res.mart.2))

##        44        48        40        64        46        57 
## -2.203739 -1.933192 -1.820856 -1.316503 -1.244034 -1.110623

3.1.2 Deviance

The results regarding the martingale residuals are shown bellow. Observations 3, 2, 5 and 15 presented the highest absolut value.

res.dev.2<- resid(fit.cox,type="deviance")

head(sort(abs(res.dev.2),decreasing = TRUE))

##        3        2        5       15        4       12 
## 2.273811 2.219232 2.158446 2.156097 1.874950 1.691678

3.2 Concordance c-index

The outlier detection method for the concordance c-index is based on the definition of an outlier as “an observation that when absent from the data, will likely decrease the prediction error of the fitted model”. The concordance c-index is used as a test statistics that is sensitive to the presence of outliers, i.e., the larger the number of outliers, the lower the performance of the model adjusted. The concordance c-index measures how well predicted values are concordant with the rank ordered response variables. Two algorithms based on this are going to be performed next. First let’s call the package to perform this outlier detection method.

library("BCSOD")

The input of this algorithm is:

the time vector;
the status vector;
the covariate matrix.

3.2.1 BHT

The main idea behind the bootstrap hypothesis test (BHT) for outlier detection is to perform B hypothesis tests for the concordance variation on bootstrap samples without the target observation i.

Notice that the selected method for this algorithm is “bht”, and the number of bootstrapped selected was 1000.

In the following output we can see the top observations with the lowest p-values.

outliers_bht1000[1:10,]

##    id   exp infl       max p-value
## 15 15 0.04173028 0.1553587   0.117
## 17 17 0.03305233 0.1497043   0.202
## 40 40 0.03388816 0.1667668   0.202
## 23 23 0.03421315 0.1818862   0.204
## 3   3 0.03002469 0.1451865   0.214
## 44 44 0.03295515 0.1779616   0.214
## 35 35 0.03057466 0.1582950   0.216
## 50 50 0.02761639 0.1605236   0.221
## 18 18 0.03093624 0.1627225   0.230
## 21 21 0.02998733 0.1571565   0.234

The next command give us only the vectors that are needed, the id and the p-values

bht<-outliers_bht1000[,c(1,4)]# only the id and the p-value

Here in increasing order

bht.order<-bht[order(bht$id),]
bht.order[1:10,] # it has the id in increasing order

##    id p-value
## 1   1   0.356
## 2   2   0.317
## 3   3   0.214
## 4   4   0.310
## 5   5   0.245
## 6   6   0.299
## 7   7   0.353
## 8   8   0.355
## 9   9   0.272
## 10 10   0.318

3.2.2 DBHT

The dual bootstrap hypothesis testing (DBHT) is an improvement of the BHT described before. The BHT removes one observation from the dataset, and then evaluates the impact of each removal on the concordance c-index. Notice that the model has less observations than the original dataset, and therefore the concordance c-index has the tendency to increase, which may hamper the evaluation of the statistical significance and may increase the number of false positives.

The input is the same as before. Notice that B.N corresponds to the number of observations

Here are the top observations with lowest p-values.

outliers_dbht1000[1:10,]

##               pvalues
##  [1,] 15 5.105849e-82
##  [2,] 23 5.150567e-29
##  [3,] 40 1.893182e-23
##  [4,] 17 8.135095e-18
##  [5,] 57 1.865634e-15
##  [6,] 21 1.172692e-13
##  [7,] 44 8.269036e-13
##  [8,] 35 2.873929e-12
##  [9,] 18 2.791437e-10
## [10,]  3 4.585048e-09

dbht<-as.data.frame(outliers_dbht1000)# convert to a dataframe (more easy to handle)
names(dbht)[1]="id" # change the name of the 1st) column
dbht.order<-dbht[order(dbht$id),]
dbht.order[1:10,]

##    id      pvalues
## 65  1 1.000000e+00
## 48  2 9.986505e-01
## 10  3 4.585048e-09
## 52  4 9.999125e-01
## 22  5 1.104598e-01
## 55  6 9.999836e-01
## 62  7 1.000000e+00
## 63  8 1.000000e+00
## 36  9 6.431752e-01
## 59 10 1.000000e+00

3.3 Censored quantile regression

Censored quantile regression was introduced by Portnoy (2003) as an alternative to the Cox’s regression model for survival data with censored observations. Due to the fact that the Cox’s regression model assumes proportional hazards, the censored quantile regression can provide more flexibility. The alternative method proposed by Portnoy (2003) is a generalization of the Kaplan-Meier one sample estimator.

Here two algorithm are considered. the results an command lines to obtain the outliers based on this theory are presented next. First call the package OutlierDC, and make sure that the covariate matrix is in the form of a data frame.

library(OutlierDC)

## Loading required package: quantreg

## Loading required package: SparseM

## 
## Attaching package: 'SparseM'

## The following object is masked from 'package:base':
## 
##     backsolve

## 
## Attaching package: 'quantreg'

## The following object is masked from 'package:survival':
## 
##     untangle.specials

## Loading required package: Formula

##

## Package OutlierDC (0.3-0) loaded.

data.survival<-as.data.frame(data.survival)

3.3.1 Residuals

The residual algorithm arises from the outlier detection algorithm for the Cox’s regression model for censored data proposed by Nardi (1999) but now based on quantile regression.

Notice that the odc function all covariates must be written.

odcR_myeloma3<-odc(formula=Surv((data.time.status$time),data.time.status$status)~ hgb + sex + proteinuria + protein.urine + total.serum.prot + bun, data = data.survival,method="residual",rq.model="Wang",k_r=0.5)

## Please wait... 
## Done.

odcR_myeloma3

## 
##      Outlier Detection for Censored Data
## 
##  Call: odc(formula = Surv((data.time.status$time), data.time.status$status) ~ 
##     hgb + sex + proteinuria + protein.urine + total.serum.prot + 
##         bun, data = data.survival, method = "residual", rq.model = "Wang", 
##     k_r = 0.5)
##  Algorithm: Residual-based algorithm (residual) 
##  Model: Locally weighted censored quantile regression (Wang) 
##  Value for cut-off k_r:  0.5 
##  # of outliers detected:  15 
## 
##  Outliers detected:
##    times delta (Intercept)  hgb sex proteinuria protein.urine
## 35    32     1           1 10.6   1           1             2
## 39    41     1           1  5.0   2           0             2
## 40    51     1           1  7.7   1           4             1
## 41    52     1           1 10.1   2           4             1
## 42    54     1           1  9.0   1           2             1
## 43    58     1           1 12.1   2          22             1
##    total.serum.prot bun residual sigma Outlier
## 35               12  21     15.8   7.8       *
## 39                7  14     27.9   7.8       *
## 40               12  37     17.9   7.8       *
## 41               10  10      9.7   7.8       *
## 42                8  18     14.4   7.8       *
## 43                7  16     13.2   7.8       *
## 
##  6 of all 15 outliers were displayed.

The top outliers are:

head(sort(odcR_myeloma3@score,decreasing = TRUE))

##       44       65       48       46       47       64 
## 52.57250 51.46428 49.91432 44.60249 42.99268 35.44830

Put in form of a vector the results obtained before

odR.vec<-as.vector(odcR_myeloma3@score)

Define a matrix with the id and the values for the residual algorithm.

n<-dim(data.survival)[1]#dimension of the data
id<-as.vector(seq(1:n))# vector with the id
odR.matrix<-as.data.frame(cbind(id,odR.vec))

3.3.2 Score

The score algorithm arises from the outlier detection algorithm for the Cox’s regression model for censored data proposed by Nardi (1999) but now based on quantile regression.

odcS_myeloma3<-odc(formula=Surv((data.time.status$time),data.time.status$status)~ hgb + sex + proteinuria + protein.urine + total.serum.prot + bun, data = data.survival,method="score",rq.model="Wang",h=0.05)

## Please wait... 
## Done.

odcS_myeloma3

## 
##      Outlier Detection for Censored Data
## 
##  Call: odc(formula = Surv((data.time.status$time), data.time.status$status) ~ 
##     hgb + sex + proteinuria + protein.urine + total.serum.prot + 
##         bun, data = data.survival, method = "score", rq.model = "Wang", 
##     h = 0.05)
##  Algorithm: Scoring algorithm (score) 
##  Model: Locally weighted censored quantile regression (Wang) 
##  Value for cut-off k_s:   
##  # of outliers detected:  0 
## 
##  Top 6 outlying scores:
##    times delta (Intercept)  hgb sex proteinuria protein.urine
## 40    51     1           1  7.7   1           4             1
## 46    88     1           1 10.6   2          21             1
## 39    41     1           1    5   2           0             2
## 44    66     1           1  6.6   1           0             2
## 48    92     1           1   11   2           4             1
## 57    13     0           1  4.9   2           0             2
##    total.serum.prot bun score Outlier
## 40               12  37   3.7        
## 46                6  15   3.4        
## 39                7  14   2.7        
## 44                5  28   2.7        
## 48                9  27   2.1        
## 57               10  46     2

The top outliers are:

head(sort(odcS_myeloma3@score,decreasing = TRUE))

##       40       46       39       44       48       57 
## 3.700180 3.398639 2.732555 2.656039 2.105830 2.031864

Put in form of a vector the results obtained before

odS.vec<-as.vector(odcS_myeloma3@score)

Define a matrix with the id and the values for the residual algorithm.

n<-dim(data.survival)[1]#dimension of the data
id<-as.vector(seq(1:n))# vector with the id
odS.matrix<-as.data.frame(cbind(id,odS.vec))

4 Rank Product test

In order to obtain a global result concerning each one of the techniques performed in the previous section, a rank product test is going to be used. First a rank matrix is going to be obtained based on the ranks of each outliers detection method.

res.mart.vec<-as.vector(res.mart.2)
rank_martingale<-rank(res.mart.vec,ties.method = "first")

res.dev.vec<-as.vector(abs(res.dev.2))
rank_deviance<-rank(-res.dev.vec,ties.method = "first")

rank_bht<-rank((bht.order$`p-value`),ties.method = "first")

rank_dbht<-rank(dbht.order$pvalues,ties.method = "first")

rank_residual<-rank(-(odR.matrix$odR.vec),ties.method ="first")

rank_score<-rank(-(odS.matrix$odS.vec),ties.method ="first")

Now let us obtain based on the ranks above the rank product matrix. First let’s consider a rank matrix, with all the ranks obtained by each method.

rankMat<-cbind(rank_martingale,rank_deviance,rank_bht,rank_dbht,rank_residual,rank_score)

In second obtain the rank product, which is obtained based on the input of the rank matrix.

RankProduct = function(rankMat)
{
  return(apply(rankMat, 1, prod))
}

rankproduct<-RankProduct(rankMat)

4.1 p-values

In order to obtain the p-values the algorithm proposed by Heskes (2014) was used. Notice that the p-values are based on the geometric mean of upper and lower bounds, defined recursively.

filepath="../functions/"
source(paste(filepath,"Heskes_pvalues.R",sep=""))

The input to obtain the p-values is the following:

rho: the rank product matrix;
the number of observations, n;
the number of methods used, k.

rho=rankproduct
n<-dim(data.survival)[1]
k<-dim(rankMat)[2]

The p-values are obtained by the following, where Delta option is the geometric mean.

pvalues<-as.vector(rankprodbounds(rho,n,k,Delta ='geometric'))

In order to obtain the observations with the lowest p-values we can do the lines:

pvalues.matrix<-cbind(pvalues)
which(pvalues.matrix<0.01) ## pvalues < to 1%

## [1] 40 44 57

which(pvalues.matrix<0.05) ## pvalues < to 5%

## [1] 15 35 40 44 46 48 57 64 65

4.2 q-values

Another key issue when performing Rank product tests is related with the multiple testing problem. In fact, since many observations are tested, type-I errors (false positives) will increase.

The FDR, which is the expected proportion of false positives among all tests that are significant, sorts in an ascendant order the p-values and divides them by their percentile rank. The measure used to determine the FDR is the q-value. For the p-value: 0.05 implies that 5% of all tests will result in false positives, instead, for the q-value: 0.05 implies that 5% of significant tests will result in false positives. In this sense the q-value is able to control the number of false discoveries in those tests. For this reason it has the ability of finding truly significant results.

The q-value package must be installed.

library(qvalue)

Here the only input that is need is the p-values obtained before for each observation.

qobj<-qvalue(pvalues)

qvalues<-qobj$qvalues
which(qvalues<0.01)

## [1] 40 44

which(qvalues<0.05)

## [1] 40 44

In order to resume all the information, let us get an id vector and a data frame that combines the ranks the pvalues and the qvalues.

id<-as.vector(seq(1:n))# vector with the id
myeloma.outliers<-as.data.frame(cbind(id,rank_martingale,rank_deviance,rank_bht,rank_dbht,rank_residual,rank_score,pvalues,qvalues))

Next we can obtain the top 10 outlier observations, with the qvalues sorted in increasing order.

sort.myeloma.outliers <- myeloma.outliers[order(qvalues) , ]
sort.myeloma.outliers[1:10,]

##    id rank_martingale rank_deviance rank_bht rank_dbht rank_residual
## 44 44               1            15        6         7             1
## 40 40               3            23        3         3            10
## 57 57               6            11       11         5            16
## 15 15              62             4        1         1            65
## 48 48               2            22       24        41             3
## 64 64               4             7       21        25             6
## 35 35              10            37        7         8            11
## 46 46               5            30       60        58             4
## 65 65               7            12       51        64             2
## 3   3              65             1        5        10            64
##    rank_score      pvalues     qvalues
## 44          4 2.118336e-05 0.001376919
## 40          1 6.903560e-05 0.002243657
## 57          6 5.820630e-03 0.126113654
## 15         53 1.306892e-02 0.141579921
## 48          5 1.027446e-02 0.141579921
## 64          9 1.225983e-02 0.141579921
## 35          7 2.210230e-02 0.205235603
## 46          2 4.667962e-02 0.349565208
## 65          8 4.840134e-02 0.349565208
## 3          51 8.953652e-02 0.523564170

Notice observations 40, and 44 are the two with the lowest qvalues, leading us to conclude that they are influential observations and should taken for further study.

Outlier detection in survival analysis (All techniques)

IDMEC Team, 2017

Dataset1-myeloma