## CMPSCI 240: "Reasoning About Uncertainty" - Lecture 17: Naive Bayes Classification

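The hypotheses are a set of $k$ partitioning events $H_1, \ldots, H_k$.

We have priors for the hypotheses, $P[H_1], P[H_2], \ldots, P[H_k]$. We have observed data $D$ and likelihoods $P[D \mid H_i]$. Use Bayes' rule to update the hypothesis probabilities based on the data:

$$P[H_i \mid D] = \frac{P[D \mid H_i] \times P[H_i]}{P[D]} = \frac{P[D \mid H_i] \times P[H_i]}{\sum_{j=1}^{k} P[D \mid H_j] \times P[H_j]}$$

If we have $m$ pieces of data $D_1, \ldots, D_m$, then

$$P[H_i \mid D_1 \cap \cdots \cap D_m] = \frac{P[D_1 \cap \cdots \cap D_m \mid H_i] \times P[H_i]}{P[D_1 \cap \cdots \cap D_m]}$$

We often do not have enough information to calculate the joint likelihood $P[D_1 \cap \cdots \cap D_m \mid H_i]$ directly, so we make some independence assumptions.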
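The single-observation update above can be sketched in a few lines of Python. This is only an illustration: the function name `posterior` and the three-hypothesis numbers are hypothetical, not taken from the lecture.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P[H_i | D] for every hypothesis using Bayes' rule.

    priors[i] is P[H_i]; likelihoods[i] is P[D | H_i].
    """
    # Numerators of Bayes' rule: P[D | H_i] * P[H_i]
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # P[D], by the law of total probability over the partition H_1..H_k
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical priors and likelihoods for k = 3 hypotheses (illustrative only)
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(3, 10), Fraction(6, 10)]
post = posterior(priors, likelihoods)
print(post)
```

Exact fractions are used so that the posteriors visibly sum to 1; floats would work equally well.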

Definition. Given three events $A$, $B$, and $C$, we say that $A$ and $B$ are independent conditioned on $C$ if $P[A \cap B \mid C] = P[A \mid C] \times P[B \mid C]$.

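Example. Throw a fair 5-sided die. Define the events $A = \{1, 2, 5\}$, $B = \{1, 3, 5\}$, and $C = \{1, 2, 3, 4\}$. Then $A$ and $B$ are not independent:

$$P[A \cap B] = \frac{2}{5} \neq \frac{3}{5} \times \frac{3}{5} = P[A] \times P[B]$$

But $A$ and $B$ are independent conditioned on $C$:

$$P[A \cap B \mid C] = \frac{1}{4} = \frac{1}{2} \times \frac{1}{2} = P[A \mid C] \times P[B \mid C]$$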
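The die example can be checked by direct enumeration. The sketch below assumes a fair die (each face has probability 1/5); the helper name `prob` is made up for this illustration.

```python
from fractions import Fraction

# Sample space of a fair 5-sided die (fairness is an assumption)
OUTCOMES = frozenset({1, 2, 3, 4, 5})


def prob(event, given=OUTCOMES):
    """P[event | given] under a uniform distribution on OUTCOMES."""
    return Fraction(len(event & given), len(given))


A = frozenset({1, 2, 5})
B = frozenset({1, 3, 5})
C = frozenset({1, 2, 3, 4})

# Unconditional independence: is P[A and B] equal to P[A] * P[B]?
independent = prob(A & B) == prob(A) * prob(B)
# Conditional independence: is P[A and B | C] equal to P[A | C] * P[B | C]?
independent_given_c = prob(A & B, C) == prob(A, C) * prob(B, C)

print(independent, independent_given_c)  # False True
```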
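For any events $A_1, \ldots, A_m$, the chain rule gives

$$P[A_1 \cap \cdots \cap A_m] = P[A_1] \times P[A_2 \mid A_1] \times P[A_3 \mid A_1 \cap A_2] \times \cdots \times P[A_m \mid A_1 \cap \cdots \cap A_{m-1}]$$

If the events are independent, this simplifies to

$$P[A_1 \cap \cdots \cap A_m] = P[A_1] \times P[A_2] \times P[A_3] \times \cdots \times P[A_m]$$

We say the events are independent conditioned on $C$ if

$$P[A_1 \cap A_2 \cap \cdots \cap A_m \mid C] = P[A_1 \mid C] \times \cdots \times P[A_m \mid C]$$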
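We say events $D_1, D_2, \ldots, D_m$ are independent given the hypotheses $H_1, \ldots, H_k$ if for each $H_i$

$$P[D_1 \cap D_2 \cap \cdots \cap D_m \mid H_i] = P[D_1 \mid H_i] \times P[D_2 \mid H_i] \times \cdots \times P[D_m \mid H_i]$$

If the observations $D_1, \ldots, D_m$ are independent given the hypotheses, $P[H_i \mid D_1 \cap D_2 \cap \cdots \cap D_m]$ simplifies to

$$P[H_i \mid D_1 \cap \cdots \cap D_m] = \frac{P[D_1 \mid H_i] \times \cdots \times P[D_m \mid H_i] \times P[H_i]}{\sum_{j=1}^{k} P[D_1 \mid H_j] \times \cdots \times P[D_m \mid H_j] \times P[H_j]}$$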
Example of Bayesian Reasoning with Independent Data
You see a blue parrot that you hypothesize is a Norwegian Blue. Your birdwatching book says:

- Only 10% of blue parrots are Norwegian Blues.
- Norwegian Blues spend 60% of their time lying down.
- Other blue parrots spend only 20% of their time lying down.
- 80% of Norwegian Blues have lovely plumage.
- 20% of other blue parrots have lovely plumage.

If we see a blue parrot that is lying down (data $D_1$) and has lovely plumage (data $D_2$), what is the probability that it is a Norwegian Blue? Let $H_1$ be the hypothesis that it is a Norwegian Blue and let $H_2$ be the hypothesis that it is not. Assuming the data are independent given the hypotheses,

$$P[H_1 \mid D_1 \cap D_2] = \frac{0.1 \times (0.6 \times 0.8)}{0.1 \times (0.6 \times 0.8) + 0.9 \times (0.2 \times 0.2)} = \frac{0.048}{0.084} = \frac{4}{7} \approx 0.57$$
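The parrot calculation can be reproduced with a small sketch of the update under the independence assumption. The function name `naive_posterior` and the data layout are illustrative choices; the numbers are the ones from the birdwatching book above.

```python
from fractions import Fraction
from math import prod


def naive_posterior(priors, likelihoods):
    """Posterior over hypotheses when the data are independent given each H_i.

    priors[i] is P[H_i]; likelihoods[i][j] is P[D_j | H_i].
    """
    # P[D_1 | H_i] * ... * P[D_m | H_i] * P[H_i] for each hypothesis
    joint = [p * prod(ls) for p, ls in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# H_1 = Norwegian Blue, H_2 = other blue parrot
priors = [Fraction(1, 10), Fraction(9, 10)]
# Columns: D_1 = lying down, D_2 = lovely plumage
likelihoods = [
    [Fraction(6, 10), Fraction(8, 10)],  # given H_1
    [Fraction(2, 10), Fraction(2, 10)],  # given H_2
]

post = naive_posterior(priors, likelihoods)
print(post[0])  # 4/7
```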
Suppose you have an email and you want to know if it is spam.
You can compute various "features" of the email. The presence or absence of each feature is then a piece of observed data, e.g., the presence or absence of particular words.
You have access to a lot of previously labeled emails.
How can you compute the probability that this email is spam?
How can you classify this email (spam vs. not spam)?
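You have hypotheses (spam and not spam) with associated priors: $P[\text{spam}]$ and $P[\text{not spam}] = 1 - P[\text{spam}]$.
You have $m$ pieces of data $D_1, \ldots, D_m$ about the email, e.g., whether each of $m$ particular words is present.
If you assume $D_1, \ldots, D_m$ are independent given the hypotheses, and you can compute $P[D_j \mid \text{spam}]$ and $P[D_j \mid \text{not spam}]$, then

$$P[\text{spam} \mid D_1 \cap \cdots \cap D_m] = \frac{P[D_1 \mid \text{spam}] \times \cdots \times P[D_m \mid \text{spam}] \times P[\text{spam}]}{P[D_1 \mid \text{spam}] \times \cdots \times P[D_m \mid \text{spam}] \times P[\text{spam}] + P[D_1 \mid \text{not spam}] \times \cdots \times P[D_m \mid \text{not spam}] \times P[\text{not spam}]}$$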
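The task of classifying an email is the same as "choosing the best hypothesis" (spam vs. not spam) for that email.
The email can be classified by computing

$$P[D_1 \mid \text{spam}] \times \cdots \times P[D_m \mid \text{spam}] \times P[\text{spam}] \quad \text{and} \quad P[D_1 \mid \text{not spam}] \times \cdots \times P[D_m \mid \text{not spam}] \times P[\text{not spam}]$$

In other words, compute likelihood × prior for each hypothesis (spam vs. not spam) and choose the hypothesis with the greater value. The denominator in Bayes' rule is the same for both hypotheses, so it does not affect the comparison.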
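The comparison of likelihood × prior can be sketched as follows. The function name `classify`, the dictionary layout, and the probabilities in the usage lines are all hypothetical and chosen only for illustration.

```python
from math import prod


def classify(priors, likelihoods):
    """Pick the hypothesis with the largest likelihood * prior.

    priors maps each label to P[label]; likelihoods maps each label
    to the list [P[D_1 | label], ..., P[D_m | label]].
    """
    scores = {
        label: priors[label] * prod(likelihoods[label])
        for label in priors
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical numbers for an email with m = 2 observed pieces of data
priors = {"spam": 0.5, "not spam": 0.5}
likelihoods = {"spam": [0.2, 0.5], "not spam": [0.1, 0.3]}

label, scores = classify(priors, likelihoods)
print(label)  # spam
```

The scores are not probabilities, because they are not divided by the common denominator; only their ordering matters for classification.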
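How can we compute the priors and likelihoods?
We have access to a lot of previously labeled emails, so we can estimate them by counting:

$$P[\text{spam}] \approx \frac{\#\text{ spam emails}}{\#\text{ total number of emails}}, \qquad P[D_j \mid \text{spam}] \approx \frac{\#\text{ spam emails with } D_j}{\#\text{ spam emails}}$$

The estimates for not spam are computed in the same way from the emails labeled not spam.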
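A useful heuristic is to smooth the data, so that a feature that never appears in one class of the labeled emails does not receive an estimated probability of exactly zero.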
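The counting estimates can be sketched as below. The smoothing rule shown is add-one (Laplace) smoothing, which is an assumption here: the lecture's exact smoothing formula is not given in this text. The function names, the `alpha` parameter, and the toy email list are likewise hypothetical.

```python
from fractions import Fraction


def estimate_prior(emails):
    """Estimate P[spam] as (# spam emails) / (# total emails)."""
    spam_count = sum(1 for _, is_spam in emails if is_spam)
    return Fraction(spam_count, len(emails))


def estimate_likelihood(emails, word, is_spam, alpha=0):
    """Estimate P[word present | class] by counting.

    alpha = 0 gives the plain count ratio. alpha = 1 gives add-one
    smoothing (an assumed choice, not necessarily the lecture's):
    one is added for each of the two outcomes, present and absent.
    """
    in_class = [words for words, label in emails if label == is_spam]
    with_word = sum(1 for words in in_class if word in words)
    return Fraction(with_word + alpha, len(in_class) + 2 * alpha)


# Toy labeled data: (set of words in the email, is_spam)
emails = [
    ({"luxury", "save"}, True),
    ({"brands"}, True),
    ({"luxury", "brands"}, True),
    ({"meeting"}, False),
    ({"save", "meeting"}, False),
]

print(estimate_prior(emails))                               # 3/5
print(estimate_likelihood(emails, "luxury", False))          # 0
print(estimate_likelihood(emails, "luxury", False, alpha=1))  # 1/4
```

Without smoothing, "luxury" never appears in the not-spam emails, so any email containing it would get a not-spam score of zero regardless of its other features.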
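Suppose I know that 80% of my email is spam. I have 3 features: luxury, brands, and save. For each email, I will therefore have 3 pieces of data: the presence or absence of each one of these features. I know

$$P[\text{luxury} \mid \text{spam}] = 0.4, \quad P[\text{brands} \mid \text{spam}] = 0.3, \quad P[\text{save} \mid \text{spam}] = 0.4$$

$$P[\text{luxury} \mid \neg\text{spam}] = 0.01, \quad P[\text{brands} \mid \neg\text{spam}] = 0.2, \quad P[\text{save} \mid \neg\text{spam}] = 0.1$$

Suppose an email includes luxury, brands, and save. Should it be classified as spam or not spam? Spam, since

$$P[\text{luxury} \cap \text{brands} \cap \text{save} \mid \text{spam}] \times P[\text{spam}] = P[\text{luxury} \mid \text{spam}] \times P[\text{brands} \mid \text{spam}] \times P[\text{save} \mid \text{spam}] \times P[\text{spam}] = 0.4 \times 0.3 \times 0.4 \times 0.8 = 0.0384$$

is greater than

$$P[\text{luxury} \cap \text{brands} \cap \text{save} \mid \neg\text{spam}] \times P[\neg\text{spam}] = P[\text{luxury} \mid \neg\text{spam}] \times P[\text{brands} \mid \neg\text{spam}] \times P[\text{save} \mid \neg\text{spam}] \times P[\neg\text{spam}] = 0.01 \times 0.2 \times 0.1 \times 0.2 = 0.00004$$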
Now use the same prior and feature likelihoods as in the previous example, together with the probabilities that each feature is absent:

$$P[\neg\text{luxury} \mid \text{spam}] = 0.6, \quad P[\neg\text{brands} \mid \text{spam}] = 0.7, \quad P[\neg\text{save} \mid \text{spam}] = 0.6$$

$$P[\neg\text{luxury} \mid \neg\text{spam}] = 0.99, \quad P[\neg\text{brands} \mid \neg\text{spam}] = 0.8, \quad P[\neg\text{save} \mid \neg\text{spam}] = 0.9$$
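Suppose an email includes brands and save but not luxury. Should it be classified as spam or not spam? Still spam, since

$$P[(\neg\text{luxury}) \cap \text{brands} \cap \text{save} \mid \text{spam}] \times P[\text{spam}] = 0.6 \times 0.3 \times 0.4 \times 0.8 = 0.0576$$

is greater than

$$P[(\neg\text{luxury}) \cap \text{brands} \cap \text{save} \mid \neg\text{spam}] \times P[\neg\text{spam}] = 0.99 \times 0.2 \times 0.1 \times 0.2 = 0.00396$$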
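Both spam examples can be reproduced with a short sketch that uses the probabilities stated above. The names `PRIOR`, `P_WORD`, `score`, and `classify` are illustrative, and an absent feature contributes the complement of its presence probability.

```python
from math import prod

# Prior and feature likelihoods from the example above
PRIOR = {"spam": 0.8, "not spam": 0.2}
P_WORD = {
    "spam": {"luxury": 0.4, "brands": 0.3, "save": 0.4},
    "not spam": {"luxury": 0.01, "brands": 0.2, "save": 0.1},
}


def score(label, present):
    """Likelihood * prior for one hypothesis.

    present is the set of feature words found in the email; a word
    that is absent contributes 1 - P[word | label].
    """
    factors = [
        p if word in present else 1 - p
        for word, p in P_WORD[label].items()
    ]
    return PRIOR[label] * prod(factors)


def classify(present):
    """Return the label with the larger likelihood * prior."""
    return max(PRIOR, key=lambda label: score(label, present))


print(classify({"luxury", "brands", "save"}))  # spam
print(classify({"brands", "save"}))            # spam
```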

Source: http://www.cs.umass.edu/~mcgregor/240F11/lec25.pdf
