How to code a multiparameter log

2018-06-27 08:18:06

I would like to estimate power of the following problem. I am interested in comparing two groups that both follow Weibull distribution. So, group A has two parameters (shape par = a1,scale par = b1) and two parameters has group B (a2, b2). By simulating random variables from distribution of interest (for example assuming different scale and shape parameters, ie a1=1.5*a2, and b1=b2*0.5; or either differences between groups are just in either shape or scale parameters), apply log-likelihood ratio test to test if a1=a2 and b1=b2 (or eg a1=a1, when we know that b1=b2), and estimate power of the test.

The questions would be what are log-likelihoods for the full models, and how to code it in R when a)having exact data, and b) for interval-censored data ?

That is, for reduced model (when a1=a2,b1=b2) log-likelihoods for exact and interval-censored data are:

LL.reduced.exact <- function(par,data){sum(log(dweibull(data,shape=par[1],scale=par[2])))};
LL.reduced.interval.censored<-function(par, data.lower, data.upper) {sum(log((1-pweibull(data.lower, par[1], par[2])) – (1-pweibull(data.upper, par[1],par[2]))))}

What is it for the full model, when a1!=a2, b1!=b2, taking into account two different observational schemes, ie when 4 parameters have to be estimated (or, in case when interested in looking at diferences in shape parameters, 3 parameters have to be estimated)?

Is it possible to estimate it buy building two log-likelihoods for separate groups and add it together (ie LL.full<-LL.group1+LL.group2 )?

Regarding log-likelihood for interval-censored data, censoring is non-informative and all observations are interval-censored. Any better ideas how to perform this task will be appreciated.

Please, find the R Code for exact data below to illustrate the problem. Thank you very much in advance.

R Code:    
# n (sample size) = 500
# sim (number of simulations) = 1000
# alpha  = .05
# Parameters of Weibull distributions: 
   #group 1: a1=1, b1=20
   #group 2: a2=1*1.5 b2=b1

n=500
sim=1000
alpha=.05
a1=1
b1=20
a2=a1*1.5
b2=b1
#OR: a1=1, b1=20, a2=a1*1.5, b2=b1*0.5 

# the main question is how to build this log-likelihood model, when a1!=a2, and b1=b2
# (or a1!=a2, and b1!=b2)
LL.full<-????? 
LL.reduced <- function(par,data){sum(log(dweibull(data,shape=par[1],scale=par[2])))}

LR.test<-function(red,full,df) {
lrt<-(-2)*(red-full)
pvalue<-1-pchisq(lrt,df)
return(data.frame(lrt,pvalue))
}

rejections<-NULL

for (i in 1:sim) {

RV1<-rweibull (n, a1, b1)
RV2<-rweibull (n, a2, b2)
RV.Total<-c(RV1, RV2)

par.start<-c(1, 15)

mle.full<- ????????????  
mle.reduced<-optim(par.start, LL, data=RV.Total, control=list(fnscale=-1))

LL.full<-????? 
LL.reduced<-mle.reduced$value

LRT<-LR.test(LL.reduced, LL.full, 1)

rejections1<-ifelse(LRT$pvalue<alpha,1,0)
rejections<-c(rejections, rejections1)
}

table(rejections)
sum(table(rejections)[[2]])/sim   # estimated power

Yes, you can sum the log-likelihoods for the two groups (if they were calculated separately). Just like you would sum the log-likelihoods for a vector of observations, where each observation has different generative parameters.

I prefer to think in terms of one large vector (ie, of the shape parameter) which contains values that vary according to the structure of the covariates (ie, group membership). In a linear model context, this vector could equal the linear predictor (once appropriately transformed by link function): the dot product of the design matrix and the vector of regression coefficients.

Here is a (non-functionalized) example:

## setup true values
nobs = 50 ## number of observations
a1 = 1  ## shape for first group
b1 = 2  ## scale parameter for both groups
beta = c(a1, a1 * 1.5)  ## vector of linear coefficients (group shapes)

## model matrix for full, null models
mm_full = cbind(grp1 = rep(c(1,0), each = nobs), grp2 = rep(c(0,1), each = nobs))
mm_null = cbind(grp1 = rep(1, nobs*2))

## shape parameter vector for the full, null models
shapes_full = mm_full %*% beta ## different shape parameters by group (full model)
shapes_null = mm_null %*% beta[1] ## same shape parameter for all obs
scales = rep(b1, length(shapes_full)) ## scale parameters the same for both groups

## simulate response from full model
response = rweibull(length(shapes_full), shapes_full, scales)

## the log likelihood for the full, null models:
LL_full = sum(dweibull(response, shapes_full, scales, log = T)) 
LL_null = sum(dweibull(response, shapes_null, scales, log = T)) 

## likelihood ratio test
LR_test = function(LL_null, LL_full, df) {
    LR = -2 * (LL_null - LL_full) ## test statistic
    pchisq(LR, df = df, ncp = 0, lower = F) ## probability of test statistic under central chi-sq distribution
    }
LR_test(LL_null, LL_full, 1) ## 1 degrees freedom (1 parameter added)

To write a log-likelihood function to find the MLE of a Weibull model where the shape parameter(s) are some linear function of covariates, you could use the same approach:

## (negative) log-likelihood function
LL_weibull = function(par, data, mm, inv_link_fun = function(.) .){
    P = ncol(mm) ## number of regression coefficients
    N = nrow(mm) ## number of observations
    shapes = inv_link_fun(mm %*% par[1:P]) ## shape vector (possibly transformed)
    scales = rep(par[P+1], N) ## scale vector
    -sum(dweibull(data, shape = shapes, scale = scales, log = T)) ## negative log likelihood
    }

Then your power simulation might look like this:

## function to simulate data, perform LRT
weibull_sim = function(true_shapes, true_scales, mm_full, mm_null){
    ## simulate response
    response = rweibull(length(true_shapes), true_shapes, true_scales)

    ## find MLE
    mle_full = optim(par = rep(1, ncol(mm_full)+1), fn = LL_weibull, data = response, mm = mm_full) 
    mle_null = optim(par = rep(1, ncol(mm_null)+1), fn = LL_weibull, data = response, mm = mm_null)

    ## likelihood ratio test
    df = ncol(mm_full) - ncol(mm_null)
    return(LR_test(-mle_null$value, -mle_full$value, df))
    }

## run simulations
nsim = 1000
pvals = sapply(1:nsim, function(.) weibull_sim(shapes_full, scales, mm_full, mm_null) )

## calculate power
alpha = 0.05
power = sum(pvals < alpha) / nsim

An identity link works fine in the above example, but for more complex models some sort of transformation might be required.

And you don't have to use linear algebra in the log-likelihood function -- obviously, you can construct the vector of shapes in any way you see fit (as long as you explicitly index the appropriate generative parameters in the vector par ).

Interval-censored data

The cumulative distribution function F(T) of the Weibull distribution ( pweibull in R) gives the probability of failure before time T. So, if an observation is interval censored between times T[0] and T[1], the probability that the object fails between T[0] and T[1] is F(T[1]) - F(T[0]) : the probability that the object failed before T[1] minus the probability that it failed before T[0] (the integral of the PDF between T[0] and T[1]). Because the Weibull CDF is already implemented in R it is trivial to modify the likelihood function above:

LL_ic_weibull <- function(par, data, mm){
    ## 'data' has two columns, left and right times of censoring interval
    P = ncol(mm) ## number of regression coefficients
    shapes = mm %*% par[1:P]
    scales = par[P+1]
    -sum(log(pweibull(data[,2], shape = shapes, scale = scales) - pweibull(data[,1], shape = shapes, scale = scales)))
    }

Or if you don't want to use a model matrix, etc., and just restrict yourself to indexing the shape parameter vector by groups, you could do something like:

LL_ic_weibull2 <- function(par, data, nobs){
    ## 'data' has two columns, left and right times of censoring interval
    ## 'nobs' is a vector that contains the num. observations for each group (grp1, grp2, ...)
    P = length(nobs) ## number of regression coefficients
    shapes = rep(par[1:P], nobs)
    scales = par[P+1]
    -sum(log(pweibull(data[,2], shape = shapes, scale = scales) - pweibull(data[,1], shape = shapes, scale = scales)))
    }

Test that both functions give the same solution:

## generate intervals from simulated response (above)
left = ifelse(response - 0.2 < 0, 0, response - 0.2)
right = response + 0.2
response_ic = cbind(left, right)

## find MLE w/ first LL function (model matrix)
mle_ic_full = optim(par = c(1,1,3), fn = LL_ic_weibull, data = response_ic, mm = mm_full)
mle_ic_null = optim(par = c(1,3), fn = LL_ic_weibull, data = response_ic, mm = mm_null)

## find MLE w/ second LL function (groups only)
nobs_per_group = apply(mm_full, 2, sum) ## just contains number of observations per group
nobs_one_group = nrow(mm_null) ## one group so only one value
mle_ic_full2 = optim(par = c(1,1,3), fn = LL_ic_weibull2, data = response_ic, nobs = nobs_per_group)
mle_ic_null2 = optim(par = c(1,3), fn = LL_ic_weibull2, data = response_ic, nobs = nobs_one_group)

链接地址: http://www.djcxy.com/p/76450.html

上一篇: 哪一个是错的？ Msdn或Headerfile

下一篇: 如何编写多参数日志