1 Description of the dataset

The myeloma dataset is composed by clinical information on 16 covariates of 65 patients with multiple myeloma Krall et al. (1975). This dataset was analysed in two different ways. The approach of the following report all the covariates were considered to the Cox regression model and subsequently reduced using the stepwise method in order to decrease the data dimensionality. Lets read the dataset. Do not forget to change the filepath for the correct directory of your computer.

myeloma<-read.table("./myeloma.txt",sep = "")
head(myeloma)
##   PID time status  hgb platelets infection age sex logwbc frac logbm
## 1   1 1.25      1  9.4         1         0  67   1  3.663    1 1.954
## 2   2 1.25      1 12.0         1         1  38   1  3.987    1 1.954
## 3   3 2.00      1  9.8         1         1  81   1  3.875    1 2.000
## 4   4 2.00      1 11.3         0         0  75   1  3.806    1 1.255
## 5   5 2.00      1  5.1         0         0  57   1  3.724    1 2.000
## 6   6 3.00      1  6.7         1         1  46   2  4.476    0 1.934
##   pctlymph pctmyel.cell proteinuria protein.urine total.serum.prot
## 1        0            0          12             2               11
## 2        0            0          20             1                6
## 3        0            0           2             1               11
## 4       13           52           0             2                9
## 5        0            0           3             1               10
## 6        3            5          12             2                9
##   serum.globin serum.calcium logbun
## 1            7            10 2.2175
## 2            3            18 1.9393
## 3            8            15 1.5185
## 4            6            12 1.7482
## 5            6             9 1.3010
## 6            3            10 1.5441
colnames(myeloma)
##  [1] "PID"              "time"             "status"          
##  [4] "hgb"              "platelets"        "infection"       
##  [7] "age"              "sex"              "logwbc"          
## [10] "frac"             "logbm"            "pctlymph"        
## [13] "pctmyel.cell"     "proteinuria"      "protein.urine"   
## [16] "total.serum.prot" "serum.globin"     "serum.calcium"   
## [19] "logbun"
dim(myeloma)
## [1] 65 19

Some transformations of the covariates have to be done.

myeloma$logbun=10^myeloma$logbun

Select the covariates that are going to be used in the study.

data.survival=myeloma[,-c(1,2,3)]
names(data.survival)[16]="bun" 

The covariate matrix is given by:

head(data.survival)
##    hgb platelets infection age sex logwbc frac logbm pctlymph pctmyel.cell
## 1  9.4         1         0  67   1  3.663    1 1.954        0            0
## 2 12.0         1         1  38   1  3.987    1 1.954        0            0
## 3  9.8         1         1  81   1  3.875    1 2.000        0            0
## 4 11.3         0         0  75   1  3.806    1 1.255       13           52
## 5  5.1         0         0  57   1  3.724    1 2.000        0            0
## 6  6.7         1         1  46   2  4.476    0 1.934        3            5
##   proteinuria protein.urine total.serum.prot serum.globin serum.calcium
## 1          12             2               11            7            10
## 2          20             1                6            3            18
## 3           2             1               11            8            15
## 4           0             2                9            6            12
## 5           3             1               10            6             9
## 6          12             2                9            3            10
##         bun
## 1 165.00610
## 2  86.95609
## 3  32.99894
## 4  56.00154
## 5  19.99862
## 6  35.00258

And the matrix for the time and status is given by:

data.time.status=myeloma[,c(2,3)]
head(data.time.status)
##   time status
## 1 1.25      1
## 2 1.25      1
## 3 2.00      1
## 4 2.00      1
## 5 2.00      1
## 6 3.00      1

2 Survival analysis

The Cox’s proportional hazard was performed in all variables.

2.1 Cox Regression model

Let’s start the analysis for the Cox regression model.

library(survival)

Fit the Cox proportional hazards to the dataset. By the results obtained below, for a 5% level of significance, covariates proteinuria, protein.urine and bun are statistically significant for the Cox’s regression model.

fit.cox.all<-coxph(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival)
fit.cox.all
## Call:
## coxph(formula = Surv(data.time.status$time, data.time.status$status) ~ 
##     ., data = data.survival)
## 
##                       coef exp(coef)  se(coef)     z       p
## hgb              -0.138865  0.870345  0.079713 -1.74 0.08150
## platelets        -1.101926  0.332231  0.672115 -1.64 0.10111
## infection         0.793635  2.211420  0.497769  1.59 0.11085
## age              -0.000915  0.999085  0.022527 -0.04 0.96760
## sex              -0.706409  0.493413  0.455751 -1.55 0.12114
## logwbc            1.574200  4.826879  0.875547  1.80 0.07218
## frac              0.369597  1.447151  0.443359  0.83 0.40449
## logbm             1.056089  2.875103  0.650821  1.62 0.10465
## pctlymph         -0.014365  0.985738  0.037530 -0.38 0.70190
## pctmyel.cell      0.019179  1.019364  0.015886  1.21 0.22732
## proteinuria       0.104580  1.110244  0.040772  2.56 0.01032
## protein.urine     1.280076  3.596911  0.466501  2.74 0.00607
## total.serum.prot  0.260820  1.297994  0.169249  1.54 0.12331
## serum.globin     -0.188870  0.827894  0.167378 -1.13 0.25915
## serum.calcium     0.020696  1.020911  0.139537  0.15 0.88209
## bun               0.020981  1.021203  0.005816  3.61 0.00031
## 
## Likelihood ratio test=40  on 16 df, p=0.00077
## n= 65, number of events= 48

Now let us apply the stepwise function in order to reduce the model.

fit.cox.1
## Call:
## coxph(formula = Surv(data.time.status$time, data.time.status$status) ~ 
##     hgb + sex + proteinuria + protein.urine + total.serum.prot + 
##         bun, data = data.survival)
## 
##                      coef exp(coef) se(coef)     z       p
## hgb              -0.13581   0.87301  0.06349 -2.14   0.032
## sex              -0.51191   0.59935  0.36297 -1.41   0.158
## proteinuria       0.06575   1.06796  0.02819  2.33   0.020
## protein.urine     0.92502   2.52192  0.38124  2.43   0.015
## total.serum.prot  0.13296   1.14220  0.06887  1.93   0.054
## bun               0.02360   1.02388  0.00551  4.29 1.8e-05
## 
## Likelihood ratio test=28.9  on 6 df, p=6.27e-05
## n= 65, number of events= 48

From the results above, the model with the lowest AIC contains the following covariates: hgb, sex, proteinuria, protein.urine, total.serum.prot and bun. The model with those covariates is the one that is going to be analysed.

The covariates matrix will be

data.survival=data.survival[,c(1,5,11,12,13,16)]
head(data.survival)
##    hgb sex proteinuria protein.urine total.serum.prot       bun
## 1  9.4   1          12             2               11 165.00610
## 2 12.0   1          20             1                6  86.95609
## 3  9.8   1           2             1               11  32.99894
## 4 11.3   1           0             2                9  56.00154
## 5  5.1   1           3             1               10  19.99862
## 6  6.7   2          12             2                9  35.00258

For the Cox’s regression model we have

fit.cox<-coxph(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival)
fit.cox
## Call:
## coxph(formula = Surv(data.time.status$time, data.time.status$status) ~ 
##     ., data = data.survival)
## 
##                      coef exp(coef) se(coef)     z       p
## hgb              -0.13581   0.87301  0.06349 -2.14   0.032
## sex              -0.51191   0.59935  0.36297 -1.41   0.158
## proteinuria       0.06575   1.06796  0.02819  2.33   0.020
## protein.urine     0.92502   2.52192  0.38124  2.43   0.015
## total.serum.prot  0.13296   1.14220  0.06887  1.93   0.054
## bun               0.02360   1.02388  0.00551  4.29 1.8e-05
## 
## Likelihood ratio test=28.9  on 6 df, p=6.27e-05
## n= 65, number of events= 48

Covariates hgb, proteinuria, protein.urine and bun, for a 5% level of significance, statistically significant.

Next, see if the hypothesis of proportional hazards is not violated. By the results obtained, the proportional hazards hypothesis is not violated.

pph<-cox.zph(fit.cox, transform="km", global=TRUE)
pph
##                      rho  chisq     p
## hgb              -0.0197 0.0202 0.887
## sex               0.0403 0.0877 0.767
## proteinuria       0.0828 0.3503 0.554
## protein.urine     0.1067 0.6607 0.416
## total.serum.prot  0.1757 1.1636 0.281
## bun              -0.0510 0.1434 0.705
## GLOBAL                NA 1.8513 0.933

2.2 Cox Robust

In order to see if the results obtained before are consistent, a robust version of the Cox regression model is going to be performed. First call the following package.

library(coxrobust)

To see the results obtained by fitting the robust Cox using exponential weights with a 5% level of trimming, do the following:

fit.coxrobust<-coxr(Surv(data.time.status$time,data.time.status$status) ~ .,data=data.survival,trunc=0.95, f.weight="exponential",singular.ok=T,model=F)
fit.coxrobust
## 
## Call:
## coxr(formula = Surv(data.time.status$time, data.time.status$status) ~     ., data = data.survival, trunc = 0.95, f.weight = "exponential",     singular.ok = T, model = F)
## 
## Partial likelihood estimator
##                     coef exp(coef) se(coef)        p
## hgb              -0.1343     0.874  0.06363 3.49e-02
## sex              -0.5553     0.574  0.36574 1.29e-01
## proteinuria       0.0707     1.073  0.02868 1.38e-02
## protein.urine     0.9736     2.647  0.38470 1.14e-02
## total.serum.prot  0.1391     1.149  0.06846 4.22e-02
## bun               0.0258     1.026  0.00579 8.75e-06
## 
## Wald test=31.2 on 6 df, p=2.37e-05
## 
## Robust estimator
##                     coef exp(coef) se(coef)        p
## hgb              -0.1381     0.871  0.09986 0.166545
## sex              -0.5462     0.579  0.63915 0.392784
## proteinuria       0.0603     1.062  0.05479 0.271273
## protein.urine     0.8603     2.364  0.52034 0.098277
## total.serum.prot  0.1221     1.130  0.06308 0.052902
## bun               0.0249     1.025  0.00736 0.000723
## 
## Extended Wald test=18 on 6 df, p=0.00615

From the results, only covariate bun is statistically significant for a 5% level of significance. The results are not consistent with the ones obtained from the Cox’s proportional hazards.

2.3 Cox Robust based on Heritier

Another proposal of the robust Cox is given by Heritier (2009), and is going to be performed next.

filepath="../functions/"
source(paste(filepath,"Chapter7_functions.r",sep=""))

The fitted model using exponential weights with 5% level of trimming is given bellow. Notice that the results show that all covariates are significant for a 5% level of significant.

fit.coxrobust.H=rcoxph(data.time.status$time,data.time.status$status,data.survival,wt.type="exponential",quant=.95)
fit.coxrobust.H$coefficients
##                     estimate         SE         z p.value
## hgb              -0.13393895 0.07532391 -1.778173 0.07537
## sex              -0.53088710 0.41507710 -1.279008 0.20089
## proteinuria       0.05862697 0.02831727  2.070361 0.03841
## protein.urine     0.86260830 0.40406690  2.134816 0.03277
## total.serum.prot  0.12324233 0.06439164  1.913949 0.05562
## bun               0.02502957 0.00387232  6.463716 0.00000

Only covariates proteinuria, protein.urine and bun are statistically significant.

The next figure shows that observations 40, 44 and 48 are identified as influential observations in the sense that the weight given for each one is the lowest among the others.

In the next section it is going to be evaluated if the observations identified before are or not influential.

3 Outlier detection methods for survival analisys

In order to identify which of the observations might be outliers some methods for outlier detection for survival analysis are going to be performed.

3.1 Residuals

The first method used for outlier detection is the residuals: deviance.

3.1.1 Deviance

The results regarding the martingale residuals are shown bellow. Observations 3, 2, 5 and 15 presented the highest absolut value.

res.dev.2<- resid(fit.cox,type="deviance")

head(sort(abs(res.dev.2),decreasing = TRUE))
##        3        2        5       15        4       12 
## 2.273811 2.219232 2.158446 2.156097 1.874950 1.691678

3.2 Concordance c-index

The outlier detection method for the concordance c-index is based on the definition of an outlier as “an observation that when absent from the data, will likely decrease the prediction error of the fitted model”. The concordance c-index is used as a test statistics that is sensitive to the presence of outliers, i.e., the larger the number of outliers, the lower the performance of the model adjusted. The concordance c-index measures how well predicted values are concordant with the rank ordered response variables. Two algorithms based on this are going to be performed next. First let’s call the package to perform this outlier detection method.

library("BCSOD")

The input of this algorithm is:

  • the time vector;
  • the status vector;
  • the covariate matrix.

3.2.1 DBHT

The dual bootstrap hypothesis testing (DBHT) is an improvement of the BHT described before. The BHT removes one observation from the dataset, and then evaluates the impact of each removal on the concordance c-index. Notice that the model has less observations than the original dataset, and therefore the concordance c-index has the tendency to increase, which may hamper the evaluation of the statistical significance and may increase the number of false positives.

The input is the same as before. Notice that B.N corresponds to the number of observations

Here are the top observations with lowest p-values.

outliers_dbht1000[1:10,]
##               pvalues
##  [1,] 15 5.105849e-82
##  [2,] 23 5.150567e-29
##  [3,] 40 1.893182e-23
##  [4,] 17 8.135095e-18
##  [5,] 57 1.865634e-15
##  [6,] 21 1.172692e-13
##  [7,] 44 8.269036e-13
##  [8,] 35 2.873929e-12
##  [9,] 18 2.791437e-10
## [10,]  3 4.585048e-09
dbht<-as.data.frame(outliers_dbht1000)# convert to a dataframe (more easy to handle)
names(dbht)[1]="id" # change the name of the 1st) column
dbht.order<-dbht[order(dbht$id),]
dbht.order[1:10,]
##    id      pvalues
## 65  1 1.000000e+00
## 48  2 9.986505e-01
## 10  3 4.585048e-09
## 52  4 9.999125e-01
## 22  5 1.104598e-01
## 55  6 9.999836e-01
## 62  7 1.000000e+00
## 63  8 1.000000e+00
## 36  9 6.431752e-01
## 59 10 1.000000e+00

3.3 Censored quantile regression

Censored quantile regression was introduced by Portnoy (2003) as an alternative to the Cox’s regression model for survival data with censored observations. Due to the fact that the Cox’s regression model assumes proportional hazards, the censored quantile regression can provide more flexibility. The alternative method proposed by Portnoy (2003) is a generalization of the Kaplan-Meier one sample estimator.

Here two algorithm are considered. the results an command lines to obtain the outliers based on this theory are presented next. First call the package OutlierDC, and make sure that the covariate matrix is in the form of a data frame.

library(OutlierDC)
## Loading required package: quantreg
## Loading required package: SparseM
## 
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
## 
##     backsolve
## 
## Attaching package: 'quantreg'
## The following object is masked from 'package:survival':
## 
##     untangle.specials
## Loading required package: Formula
## 
## Package OutlierDC (0.3-0) loaded.
data.survival<-as.data.frame(data.survival)

3.3.1 Score

The score algorithm arises from the outlier detection algorithm for the Cox’s regression model for censored data proposed by Nardi (1999) but now based on quantile regression.

odcS_myeloma3<-odc(formula=Surv((data.time.status$time),data.time.status$status)~ hgb + sex + proteinuria + protein.urine + total.serum.prot + bun, data = data.survival,method="score",rq.model="Wang",h=0.05)
## Please wait... 
## Done.
odcS_myeloma3
## 
##      Outlier Detection for Censored Data
## 
##  Call: odc(formula = Surv((data.time.status$time), data.time.status$status) ~ 
##     hgb + sex + proteinuria + protein.urine + total.serum.prot + 
##         bun, data = data.survival, method = "score", rq.model = "Wang", 
##     h = 0.05)
##  Algorithm: Scoring algorithm (score) 
##  Model: Locally weighted censored quantile regression (Wang) 
##  Value for cut-off k_s:   
##  # of outliers detected:  0 
## 
##  Top 6 outlying scores:
##    times delta (Intercept)  hgb sex proteinuria protein.urine
## 40    51     1           1  7.7   1           4             1
## 46    88     1           1 10.6   2          21             1
## 39    41     1           1    5   2           0             2
## 44    66     1           1  6.6   1           0             2
## 48    92     1           1   11   2           4             1
## 57    13     0           1  4.9   2           0             2
##    total.serum.prot bun score Outlier
## 40               12  37   3.7        
## 46                6  15   3.4        
## 39                7  14   2.7        
## 44                5  28   2.7        
## 48                9  27   2.1        
## 57               10  46     2

The top outliers are:

head(sort(odcS_myeloma3@score,decreasing = TRUE))
##       40       46       39       44       48       57 
## 3.700180 3.398639 2.732555 2.656039 2.105830 2.031864

Put in form of a vector the results obtained before

odS.vec<-as.vector(odcS_myeloma3@score)

Define a matrix with the id and the values for the residual algorithm.

n<-dim(data.survival)[1]#dimension of the data
id<-as.vector(seq(1:n))# vector with the id
odS.matrix<-as.data.frame(cbind(id,odS.vec))

4 Rank Product test

In order to obtain a global result concerning each one of the techniques performed in the previous section, a rank product test is going to be used. First a rank matrix is going to be obtained based on the ranks of each outliers detection method.

res.dev.vec<-as.vector(abs(res.dev.2))
rank_deviance<-rank(-res.dev.vec,ties.method = "first") 
rank_dbht<-rank(dbht.order$pvalues,ties.method = "first")
rank_score<-rank(-(odS.matrix$odS.vec),ties.method ="first")

Now let us obtain based on the ranks above the rank product matrix. First let’s consider a rank matrix, with all the ranks obtained by each method.

rankMat<-cbind(rank_deviance,rank_dbht,rank_score)

In second obtain the rank product, which is obtained based on the input of the rank matrix.

RankProduct = function(rankMat)
{
  return(apply(rankMat, 1, prod))
}

rankproduct<-RankProduct(rankMat)

4.1 p-values

In order to obtain the p-values the algorithm proposed by Heskes (2014) was used. Notice that the p-values are based on the geometric mean of upper and lower bounds, defined recursively.

filepath="../functions/"
source(paste(filepath,"Heskes_pvalues.R",sep=""))

The input to obtain the p-values is the following:

  • rho: the rank product matrix;
  • the number of observations, n;
  • the number of methods used, k.
rho=rankproduct
n<-dim(data.survival)[1]
k<-dim(rankMat)[2]

The p-values are obtained by the following, where Delta option is the geometric mean.

pvalues<-as.vector(rankprodbounds(rho,n,k,Delta ='geometric'))

In order to obtain the observations with the lowest p-values we can do the lines:

pvalues.matrix<-cbind(pvalues)
which(pvalues.matrix<0.01) ## pvalues < to 1%
## [1] 40
which(pvalues.matrix<0.05) ## pvalues < to 5%
## [1]  3 15 23 40 44 57

4.2 q-values

Another key issue when performing Rank product tests is related with the multiple testing problem. In fact, since many observations are tested, type-I errors (false positives) will increase.

The FDR, which is the expected proportion of false positives among all tests that are significant, sorts in an ascendant order the p-values and divides them by their percentile rank. The measure used to determine the FDR is the q-value. For the p-value: 0.05 implies that 5% of all tests will result in false positives, instead, for the q-value: 0.05 implies that 5% of significant tests will result in false positives. In this sense the q-value is able to control the number of false discoveries in those tests. For this reason it has the ability of finding truly significant results.

The q-value package must be installed.

library(qvalue)

Here the only input that is need is the p-values obtained before for each observation.

qobj<-qvalue(pvalues)
qvalues<-qobj$qvalues
which(qvalues<0.01)
## integer(0)
which(qvalues<0.05)
## integer(0)

In order to resume all the information, let us get an id vector and a data frame that combines the ranks the pvalues and the qvalues.

id<-as.vector(seq(1:n))# vector with the id
myeloma.outliers<-as.data.frame(cbind(id,rank_deviance,rank_dbht,rank_score,pvalues,qvalues))

Next we can obtain the top 10 outlier observations, with the qvalues sorted in increasing order.

sort.myeloma.outliers <- myeloma.outliers[order(qvalues) , ]
sort.myeloma.outliers[1:10,]
##    id rank_deviance rank_dbht rank_score     pvalues   qvalues
## 40 40            23         3          1 0.003377528 0.1809949
## 15 15             4         1         53 0.012610919 0.3378967
## 3   3             1        10         51 0.031832523 0.3411680
## 44 44            15         7          4 0.026132000 0.3411680
## 57 57            11         5          6 0.020331831 0.3411680
## 23 23             8         2         46 0.045681602 0.4079974
## 64 64             7        25          9 0.091776738 0.7025898
## 5   5             3        22         45 0.154110803 0.7148786
## 17 17            13         4         57 0.153873652 0.7148786
## 21 21            10         6         56 0.169025847 0.7148786

5 Analysis of the Results

Let us look for the original dataset.

head(myeloma)
##   PID time status  hgb platelets infection age sex logwbc frac logbm
## 1   1 1.25      1  9.4         1         0  67   1  3.663    1 1.954
## 2   2 1.25      1 12.0         1         1  38   1  3.987    1 1.954
## 3   3 2.00      1  9.8         1         1  81   1  3.875    1 2.000
## 4   4 2.00      1 11.3         0         0  75   1  3.806    1 1.255
## 5   5 2.00      1  5.1         0         0  57   1  3.724    1 2.000
## 6   6 3.00      1  6.7         1         1  46   2  4.476    0 1.934
##   pctlymph pctmyel.cell proteinuria protein.urine total.serum.prot
## 1        0            0          12             2               11
## 2        0            0          20             1                6
## 3        0            0           2             1               11
## 4       13           52           0             2                9
## 5        0            0           3             1               10
## 6        3            5          12             2                9
##   serum.globin serum.calcium    logbun
## 1            7            10 165.00610
## 2            3            18  86.95609
## 3            8            15  32.99894
## 4            6            12  56.00154
## 5            6             9  19.99862
## 6            3            10  35.00258
summary(myeloma[,-c(1,3,5,8)])
##       time            hgb         infection           age       
##  Min.   : 1.25   Min.   : 4.9   Min.   :0.0000   Min.   :27.00  
##  1st Qu.: 7.00   1st Qu.: 8.8   1st Qu.:0.0000   1st Qu.:51.00  
##  Median :15.00   Median :10.2   Median :0.0000   Median :60.00  
##  Mean   :24.01   Mean   :10.2   Mean   :0.1538   Mean   :59.69  
##  3rd Qu.:35.00   3rd Qu.:12.0   3rd Qu.:0.0000   3rd Qu.:67.00  
##  Max.   :92.00   Max.   :14.6   Max.   :1.0000   Max.   :82.00  
##      logwbc           frac            logbm          pctlymph     
##  Min.   :3.362   Min.   :0.0000   Min.   :0.477   Min.   : 0.000  
##  1st Qu.:3.644   1st Qu.:1.0000   1st Qu.:1.362   1st Qu.: 1.000  
##  Median :3.732   Median :1.0000   Median :1.633   Median : 6.000  
##  Mean   :3.769   Mean   :0.7538   Mean   :1.580   Mean   : 6.738  
##  3rd Qu.:3.875   3rd Qu.:1.0000   3rd Qu.:1.857   3rd Qu.:10.000  
##  Max.   :4.954   Max.   :1.0000   Max.   :3.398   Max.   :24.000  
##   pctmyel.cell    proteinuria   protein.urine   total.serum.prot
##  Min.   : 0.00   Min.   : 0.0   Min.   :1.000   Min.   : 4.0    
##  1st Qu.:11.00   1st Qu.: 0.0   1st Qu.:1.000   1st Qu.: 7.0    
##  Median :35.00   Median : 1.0   Median :2.000   Median : 9.0    
##  Mean   :30.35   Mean   : 3.6   Mean   :1.646   Mean   : 8.6    
##  3rd Qu.:49.00   3rd Qu.: 4.0   3rd Qu.:2.000   3rd Qu.:10.0    
##  Max.   :68.00   Max.   :27.0   Max.   :2.000   Max.   :17.0    
##   serum.globin    serum.calcium       logbun       
##  Min.   : 2.000   Min.   : 7.00   Min.   :  6.001  
##  1st Qu.: 3.000   1st Qu.: 9.00   1st Qu.: 13.999  
##  Median : 5.000   Median :10.00   Median : 20.999  
##  Mean   : 5.215   Mean   :10.12   Mean   : 33.522  
##  3rd Qu.: 7.000   3rd Qu.:10.00   3rd Qu.: 37.000  
##  Max.   :12.000   Max.   :18.00   Max.   :171.989

Now let us see the values of the covariates regarding the observations that were considered outliers, 40 and 44.

myeloma[c(40),]
##    PID time status hgb platelets infection age sex logwbc frac logbm
## 40  40   51      1 7.7         0         0  74   1  3.415    1 1.041
##    pctlymph pctmyel.cell proteinuria protein.urine total.serum.prot
## 40       10           50           4             1               12
##    serum.globin serum.calcium   logbun
## 40            8            13 36.99985