# Lexicalized Probabilistic Context-Free Grammars

## Weaknesses of PCFGs as Parsing Models

$$
\begin{aligned}
p(a) ={}& q(S \rightarrow NP\;VP)\,q(NP \rightarrow NNS)\,q(VP \rightarrow VP\;PP)\,q(VP \rightarrow VBD\;NP) \\
& \times q(PP \rightarrow IN\;NP)\,q(NP \rightarrow NNS)\,q(NP \rightarrow DT\;NN) \dots \\
p(b) ={}& q(S \rightarrow NP\;VP)\,q(NP \rightarrow NNS)\,q(VP \rightarrow VBD\;NP)\,q(NP \rightarrow NP\;PP) \\
& \times q(NP \rightarrow NNS)\,q(PP \rightarrow IN\;NP)\,q(NP \rightarrow DT\;NN)
\end{aligned}
$$
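The two products above share every factor except one: parse (a) uses $VP \rightarrow VP\;PP$ where parse (b) uses $NP \rightarrow NP\;PP$. The sketch below checks this numerically. The rule probabilities are made-up values chosen only for illustration, and it assumes the elided terms in $p(a)$ are the same in both parses.

```python
from math import prod

# Hypothetical rule probabilities, chosen only for illustration.
q = {
    "S -> NP VP": 1.0,
    "NP -> NNS": 0.3,
    "NP -> DT NN": 0.5,
    "NP -> NP PP": 0.2,
    "VP -> VBD NP": 0.6,
    "VP -> VP PP": 0.4,
    "PP -> IN NP": 1.0,
}

# Rules used by each parse, in the order they appear in the two products.
rules_a = ["S -> NP VP", "NP -> NNS", "VP -> VP PP", "VP -> VBD NP",
           "PP -> IN NP", "NP -> NNS", "NP -> DT NN"]
rules_b = ["S -> NP VP", "NP -> NNS", "VP -> VBD NP", "NP -> NP PP",
           "NP -> NNS", "PP -> IN NP", "NP -> DT NN"]

p_a = prod(q[r] for r in rules_a)
p_b = prod(q[r] for r in rules_b)

# Every shared factor cancels, so only the two attachment rules remain.
ratio = p_a / p_b
expected = q["VP -> VP PP"] / q["NP -> NP PP"]
print(ratio, expected)
```

Because the ratio depends only on two unlexicalized rule probabilities, a plain PCFG makes the same attachment decision whatever the words in the sentence are.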

## Lexicalized PCFGs

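A lexicalized PCFG in Chomsky normal form is defined by the following components:

• $N$ is the set of non-terminal symbols.
• $\Sigma$ is the vocabulary, i.e. the set of terminal symbols.
• $R$ is a set of rules, each taking one of the following three forms:
$$
\begin{aligned}
& X(h) \rightarrow_1 Y_1(h)\;Y_2(m) && \text{where } X, Y_1, Y_2 \in N,\; h, m \in \Sigma \\
& X(h) \rightarrow_2 Y_1(m)\;Y_2(h) && \text{where } X, Y_1, Y_2 \in N,\; h, m \in \Sigma \\
& X(h) \rightarrow h && \text{where } X \in N,\; h \in \Sigma
\end{aligned}
$$
In the first form the headword $h$ is inherited from the left child, in the second form from the right child; $m$ is the headword of the other child.
• For each rule $r$, $q(r)$ denotes its probability, subject to
$$\sum_{r \in R:\, LHS(r)=X(h)} q(r)=1$$
for every $X \in N$, $h \in \Sigma$, where $LHS(r)$ is the left-hand side of rule $r$.
• $\gamma(X,h)$, for $X \in N$, $h \in \Sigma$, is the probability that the root of a tree is $X$ with headword $h$, so that $\sum_{X \in N,\, h \in \Sigma}\gamma(X,h)=1$.

The probability of a parse tree built from rules $r_1, \dots, r_N$ in $R$, with $r_1$ the rule at the root (here $N$ counts the rules in the tree, not the non-terminal set), is then:
$$\gamma(LHS(r_1)) \prod_{i=1}^{N}q(r_i)$$
This formula gives the probability of a lexicalized parse tree such as the one in the figure.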
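As a minimal sketch of the parse-tree probability formula, the code below multiplies $\gamma$ for the root by $q$ for every rule in a toy tree. The `Rule` type, its field names, the example sentence, and all parameter values are hypothetical and do not reproduce the tree in the figure.

```python
from math import prod
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    """One lexicalized rule; the field names are hypothetical."""
    lhs: str                        # non-terminal X
    head: str                       # headword h
    arrow: int                      # 1 or 2 for binary rules, 0 for X(h) -> h
    children: tuple = ()            # (Y1, Y2) for binary rules
    modifier: Optional[str] = None  # headword m of the non-head child


def tree_probability(rules, gamma, q):
    """gamma(LHS(r_1)) times the product of q(r_i); rules[0] must be the root rule."""
    root = rules[0]
    return gamma[(root.lhs, root.head)] * prod(q[r] for r in rules)


# Toy tree for "dogs barked": S(barked) ->2 NP(dogs) VP(barked), plus two lexical rules.
r1 = Rule("S", "barked", 2, ("NP", "VP"), "dogs")
r2 = Rule("NP", "dogs", 0)
r3 = Rule("VP", "barked", 0)

# Made-up parameter values, not estimated from any corpus.
gamma = {("S", "barked"): 0.01}
q = {r1: 0.05, r2: 0.4, r3: 0.3}

p_tree = tree_probability([r1, r2, r3], gamma, q)
print(p_tree)
```

Storing rules as hashable tuples lets them serve directly as dictionary keys for $q$; a real parser would more likely compute $q$ on demand from smoothed estimates than store one entry per lexicalized rule.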

## Parameter Estimation in Lexicalized PCFGs

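$$
\begin{aligned}
& q(S(\text{examined}) \rightarrow_2 NP(\text{lawyer})\;VP(\text{examined})) \\
&= P(R=S \rightarrow_2 NP\;VP,\; M=\text{lawyer} \mid X=S,\; H=\text{examined}) \\
&= P(R=S \rightarrow_2 NP\;VP \mid X=S,\; H=\text{examined}) \\
&\quad \times P(M=\text{lawyer} \mid R=S \rightarrow_2 NP\;VP,\; X=S,\; H=\text{examined})
\end{aligned}
$$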

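$$
\begin{aligned}
q_{ML}(S \rightarrow_2 NP\;VP \mid S, \text{examined}) &= \frac{\mathrm{count}(R=S \rightarrow_2 NP\;VP,\; X=S,\; H=\text{examined})}{\mathrm{count}(X=S,\; H=\text{examined})} \\
q_{ML}(S \rightarrow_2 NP\;VP \mid S) &= \frac{\mathrm{count}(R=S \rightarrow_2 NP\;VP,\; X=S)}{\mathrm{count}(X=S)}
\end{aligned}
$$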
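Both maximum-likelihood estimates above are ratios of counts. The sketch below computes them from a handful of hypothetical $(X, H, R)$ events; the event list, the function names, and the convention of returning 0 when the denominator is zero are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical (X, H, R) events read off a treebank: parent non-terminal,
# its headword, and the unlexicalized rule used to expand it.
events = [
    ("S", "examined", "S ->2 NP VP"),
    ("S", "examined", "S ->2 NP VP"),
    ("S", "examined", "S ->2 PP VP"),
    ("S", "questioned", "S ->2 NP VP"),
    ("S", "questioned", "S ->2 NP VP"),
]

count_xhr = Counter(events)
count_xh = Counter((x, h) for x, h, _ in events)
count_xr = Counter((x, r) for x, _, r in events)
count_x = Counter(x for x, _, _ in events)


def q_ml_lexical(rule, x, h):
    """count(R, X, H) / count(X, H); returns 0.0 if (X, H) was never seen (assumed convention)."""
    denom = count_xh[(x, h)]
    return count_xhr[(x, h, rule)] / denom if denom else 0.0


def q_ml_backoff(rule, x):
    """count(R, X) / count(X); ignores the headword entirely."""
    denom = count_x[x]
    return count_xr[(x, rule)] / denom if denom else 0.0


print(q_ml_lexical("S ->2 NP VP", "S", "examined"))
print(q_ml_backoff("S ->2 NP VP", "S"))
print(q_ml_lexical("S ->2 NP VP", "S", "slept"))  # headword never seen with S
```

The last call shows why the lexicalized estimate cannot be used alone: a headword that never occurred with $S$ in training leaves it with no information, while the headword-free estimate is still well defined.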

$$
\begin{aligned}
& P(R=S \rightarrow_2 NP\;VP \mid X=S,\; H=\text{examined}) \\
&= \lambda_1\, q_{ML}(S \rightarrow_2 NP\;VP \mid S, \text{examined}) + (1-\lambda_1)\, q_{ML}(S \rightarrow_2 NP\;VP \mid S)
\end{aligned}
$$

$$
\begin{aligned}
& P(M=\text{lawyer} \mid R=S \rightarrow_2 NP\;VP,\; X=S,\; H=\text{examined}) \\
&= P(M=\text{lawyer} \mid R=S \rightarrow_2 NP\;VP,\; H=\text{examined})
\end{aligned}
$$

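$$
\begin{aligned}
q_{ML}(\text{lawyer} \mid S \rightarrow_2 NP\;VP, \text{examined}) &= \frac{\mathrm{count}(M=\text{lawyer},\; R=S \rightarrow_2 NP\;VP,\; H=\text{examined})}{\mathrm{count}(R=S \rightarrow_2 NP\;VP,\; H=\text{examined})} \\
q_{ML}(\text{lawyer} \mid S \rightarrow_2 NP\;VP) &= \frac{\mathrm{count}(M=\text{lawyer},\; R=S \rightarrow_2 NP\;VP)}{\mathrm{count}(R=S \rightarrow_2 NP\;VP)}
\end{aligned}
$$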

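$$
\begin{aligned}
& P(M=\text{lawyer} \mid R=S \rightarrow_2 NP\;VP,\; H=\text{examined}) \\
&= \lambda_2\, q_{ML}(\text{lawyer} \mid S \rightarrow_2 NP\;VP, \text{examined}) + (1-\lambda_2)\, q_{ML}(\text{lawyer} \mid S \rightarrow_2 NP\;VP)
\end{aligned}
$$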
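Putting the pieces of this section together, the rule probability is the product of two interpolated estimates: one for the rule given the parent and headword, one for the modifier word given the rule and headword. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the $\lambda$ values and maximum-likelihood inputs are made up, and how the $\lambda$ weights are chosen is not covered here.

```python
def interpolate(lam, specific, general):
    """lam * specific + (1 - lam) * general, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("interpolation weight must lie in [0, 1]")
    return lam * specific + (1.0 - lam) * general


def smoothed_rule_prob(lam1, lam2, q_rule_lex, q_rule, q_mod_lex, q_mod):
    """Smoothed q(X(h) -> ...) as P(R | X, H) * P(M | R, H)."""
    p_rule = interpolate(lam1, q_rule_lex, q_rule)  # lexicalized vs. headword-free rule estimate
    p_mod = interpolate(lam2, q_mod_lex, q_mod)     # headword-specific vs. headword-free modifier estimate
    return p_rule * p_mod


# Made-up inputs: the pair (lawyer, examined) was never observed, so the
# lexicalized modifier estimate is 0, yet the smoothed value stays positive.
q_smoothed = smoothed_rule_prob(
    lam1=0.5, lam2=0.5,
    q_rule_lex=0.6, q_rule=0.8,
    q_mod_lex=0.0, q_mod=0.02,
)
print(q_smoothed)
```

Interpolating each factor separately means a zero count in one lexicalized estimate does not force the whole rule probability to zero, as long as the corresponding less specific estimate is positive.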