
News feed


In the News feed you find updates to pages, schedules, and posts from teachers (when they also need to reach previously registered students).

April 2016
under HT 2015 mladv15

Hedvig Kjellström created the page 13 August 2015

Teacher Hedvig Kjellström changed the permissions 14 September 2015

It can thus be read by everyone and changed by teachers.
Teacher commented 3 November 2015

Assignment 1 has now been published on the course homepage, under HT 2015 mladv15 > Assignments. Good luck with the assignment, and see you all tomorrow!

Teacher commented 4 November 2015

This message is for PhD students taking the course.

PhD students do not receive letter grades (for any of their courses). The PhD baseline for a Pass grade is set to:

  • All tasks on Assignment 1 (A)
  • All tasks on Assignment 2 (A)
  • The compulsory requirements on the project (E)

There are no time limits on assignments other than the final deadline of April 1, 2016. The project should, however, be presented together with the others on January 18, 2016.

commented 4 November 2015

Hello! I have a question regarding the first assignment's first question -- you write \(\mathbf{x}_i^j\), which looks like it would be a vector (on account of the bold face), but I'm assuming the superscript j is supposed to be the jth component, so it would be a scalar, no?

Teacher commented 4 November 2015

Well spotted, Ludvig, the copy-paste devil strikes again. Just before, and in the left-hand side of, Eq. 2 you can skip the superscript j; it doesn't mean anything.

commented 5 November 2015

Hi Teachers,

Is it possible to have access to Assignment 2 right away? I have some personal reasons why I would want to attempt that earlier if possible.

commented 5 November 2015

Hello! Regarding question 7 in assignment 1, what does representability of a model mean?

Teacher commented 6 November 2015

Hi Gabriela,

That is indeed an interesting question, and rather hard to answer without giving you the answer to the question ;-). Think of it like this: we build models to represent data, so representability can be interpreted as the capability of a model to represent data. How, then, do a non-parametric and a parametric approach differ with respect to the data they can represent? I know this is a bit vague, but think about how these two classes of models differ and I think you will be able to give a really good answer.

Teacher commented 6 November 2015

Hi Akshaya, Assignment 2 is not ready yet; as soon as we are done with it, it will be published.

A user has removed their comment
commented 9 November 2015

Hi Carl!

in Question 11 - visualise the prior distribution of W - is it a multivariate Gaussian we should plot, or norm.pdf? I'm confused.

commented 9 November 2015

Hi, another notation question... In equation (33): the way this is written, the dimensions don't conform if A is 10x2 and x is 100x2. Should it instead read Ax'? Then the result will be 10x100, and not 100x10 as it says in the next paragraph. The latter makes more sense: inferring the lower-dimensional representation of 100 ten-dimensional observations, rather than of just 10 one-hundred-dimensional observations...? Thanks

commented 9 November 2015

In question 5, are we supposed to interpret the meaning behind a cube-like prior with vertices on the axes?

commented 9 November 2015

Hi Carl,

in question 6 we are supposed to derive the posterior from the likelihood and the prior. As the likelihood is composed of a product of Gaussian distributions, the derivation will include a sum of terms in the exponent. Due to this, I get a sum term in the mean and covariance matrix of the posterior. On the one hand this feels very strange (the mean and covariance potentially grow with more pairs of points xi and yi), but on the other hand correct, because the likelihood consists of a product of Gaussians. I am confused...

Best regards,

Leo

Teacher commented 9 November 2015

Hi all,

Let's get cracking on these questions.

Q5. Think like this: what is the characteristic difference between an L1 and an L2 distance? If the "cost" is associated with L1 or L2, which points/parameters will have the same cost, and which will have a different cost? From this I think you should be able to figure out the question. One clue is to draw iso-surfaces of the different distance functions.

Q6. I am also a little bit confused by your question. So, yes, you are supposed to multiply the prior with the likelihood and write the expression for the posterior. There will be a sum term in the exponent, but you can write this as a matrix product instead, just as we did on Friday. I am not sure if this answers your question fully, but I hope that it gets you some of the way.

Q33. Erik, you are indeed completely correct. The output should be 100x10, and currently the dimensions in the calculations will not match. So, if you do Y' = A*x', which means Y = x*A', then the dimensionality will be as it should.

Q11. The way to plot this is to use colour to encode the actual probability value, just as I did in the lecture notes. You can do this really simply by creating a function that returns the probability of the prior, then doing a nested for loop over the two parameters and visualising the result as an image.
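
A minimal Python sketch of that recipe (the prior here is a placeholder 2-D standard normal; swap in the prior from the assignment and adjust the grid range to taste):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import multivariate_normal

    # Placeholder prior over W = (w0, w1); replace with the assignment's prior.
    prior = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))

    w0 = np.linspace(-3, 3, 200)
    w1 = np.linspace(-3, 3, 200)
    p = np.zeros((len(w1), len(w0)))
    for i, b in enumerate(w1):        # nested loop over the two parameters
        for j, a in enumerate(w0):
            p[i, j] = prior.pdf([a, b])

    # Colour encodes the probability value.
    plt.imshow(p, extent=[-3, 3, -3, 3], origin='lower')
    plt.xlabel('$w_0$')
    plt.ylabel('$w_1$')
    plt.show()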

commented 11 November 2015

In the first assignment, section 2.3, it says "Think about how this relates to the latent space models that you worked on in the first part of the course, where you used discrete latent states to represent continuous data." and I'm wondering what I've missed? Is the first assignment not the first? Does it assume that I have prior knowledge?

Teacher commented 11 November 2015

Hi Carl,

Yes, that is me being sloppy; this sentence should have been removed when the assignment changed from being the second (last year) to the first (this year). So do not worry about this now; it is nothing essential, but maybe you can relate to it later when you are taking Jens's course. Sorry about this.

commented 12 November 2015

Is the prior for X missing under the integral in Eq. 23?

commented 12 November 2015

I don't get what you are asking for in Question 27: all these sums should be 1 by construction... Thanks

Teacher commented 12 November 2015

Hlynur, yes p(X) is missing in that equation.

Erik, if I answer like this ;-) is that clear enough =).
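
For reference, Eq. 23 with the missing prior reinstated (as confirmed above) reads

$$p(Y|W) = \int p(Y|X,W)\,p(X)\,dX$$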

commented 12 November 2015

Hi, I think I have spotted a mistake in the derivations of exercise 1. The resulting posterior distribution over W should have the inverse of the covariance matrix it currently has, because what is found by "completing the square" is the inverse of the actual covariance matrix (S^-1). Maybe this helps someone with the lab.

commented 12 November 2015

In the derivation of the posterior of W, there's a line saying y'xW = W'x'y, or something like that; I'm not seeing why that is.

commented 12 November 2015

In 2.1, it is stated \(y_i\) is a \(D\)-dimensional vector and \(x_i\) is a \(q\)-dimensional vector. This means \(W\) in eq. (5)  has to be a \(D \times q\) matrix in order to make the matrix multiplication correct.

Is \(W_0\) also a \(D \times q\) matrix?

Teacher commented 13 November 2015

Aitor: yes, that is indeed true, I've lost the ^{-1} on the last line. The PDF that I uploaded has been updated with the correct derivation, so if you re-download it I think it should be fine.

Ludvig: this follows from the rules of the transpose; you get the result by repeatedly applying (AB)' = B'A'. A good reference for these tricks, and lots of others, is [1], which is free online.

[1] K. Petersen and M. Pedersen, “The matrix cookbook,” Technical University of Denmark, 2006.

Oscar: That is indeed true, and if you work with that you are going to have to model really tricky covariances. However, think about this assumption: the output dimensions are conditionally independent given the input. If you make this assumption it all falls back to a simple 1-D problem for each dimension, and you have a W_0 matrix which is q x 1.
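
A short worked step for Ludvig's transpose question, assuming (as in the 1-D output case above, with y an N x 1 vector, X an N x q matrix and W a q x 1 vector) that the expression is a scalar and therefore equals its own transpose:

$$\mathbf{y}^\top X W = \left(\mathbf{y}^\top X W\right)^\top = W^\top X^\top \mathbf{y}$$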

commented 13 November 2015

What I mean is not the previous mistake when finding the mean, but the part where it is written that p(W|Y,X) is proportional to N(mean, cov) at the bottom of that page. The cov that is written there is the inverse of the actual covariance, so it should be cov^{-1}.

commented 13 November 2015

I'm confused. Are we supposed to write a report and also present it orally? Will the oral presentation be carried out at the deadline date or some time after? Some clarification is needed.

/Robin

Teacher commented 13 November 2015

Hi Robin, you should submit a report with all your findings. Normally we will not have any oral examination, but in specific cases it might still be needed, if we find that it is not feasible to set a grade based on the results in the report. If we feel that an oral examination is needed, we will contact you and we will decide on a date that suits both of us.

Teacher commented 13 November 2015

Aitor: ah, now I understand. Yes, there is something fishy going on here: when I identify the S matrix I am actually identifying the S^{-1} matrix, and that error follows through the derivation. Right now I believe it is only an inverse on the covariance that is missing, but I need to go through this in more detail just to be sure. I will do that and update the derivation again.

commented 14 November 2015

Regarding the oral exam after Assignment 2 on December 17, is there any possibility to do it through Skype if you get chosen? I know many students, including myself, will be spending the holidays in our home countries. Flying before the 17th can save quite a lot of money. 

Teacher commented 14 November 2015

Hi Kristofer, absolutely; if that were the case we would make sure to find a solution that works for everyone. Do not worry about this; think of it more as a catch-all: we as teachers want to keep the option open to call someone in for an oral examination if we think there is something we need to clear up. So just go ahead and book your tickets.

commented 15 November 2015

Equation 23 in the assignment:

$$p(Y|W) = \int p(Y|X,W)dX$$

Will the left hand side not integrate to > 1? The integrand on the right hand side should be multiplied by p(X), right?

commented 15 November 2015

Ok, I just saw the previous question about this, but the pdf wasn't updated so I thought it hadn't been noticed yet.

commented 16 November 2015

In the 2.4.1 practical, are we only supposed to learn the linear mapping (the one the matrix A does when we generate the data)? Or do we need to learn the non-linear mapping too (the one the function f_nonlin does)?

Teacher commented 16 November 2015

Hi John, you are completely right; you only need to (and only can) get the output of the non-linear mapping back under the linear assumption. If you want to recover the "true" underlying parameter you will need a non-linear method, and that we do not do in this assignment. Hope this helps.

commented 16 November 2015

So just to clarify, the X in question 21, question 20 and so on corresponds to the left hand side of equation 32 and not 31? i.e.

$$\bar{X} = f_{\text{non-lin}}(x_i), \quad x_i \in [0, 4\pi]$$

Teacher commented 16 November 2015

The X in question 20 refers to the equation above it, i.e. eq. 28, which has to do with why it is simpler to marginalise out f than X. The X in question 21 is what you learn from performing the optimisation.

A user has removed their comment
A user has removed their comment
commented 16 November 2015

Suggestion for the assignments:
As there is only one Carl and many questions, it seems suboptimal to have Carl answer the same questions more than once.
Let students post questions on Bilda prior to the exercise, or write them up on the board at the beginning of (and during) an exercise.
Then other students can "like" the questions that they want answered. If you write the questions on a blackboard, students can vote by putting marks after each question.
The questions with the most likes or marks will be answered by Carl.
This provides answers to the largest number of students. It also provides Carl with information about what students know and don't know. If digitalised, through Bilda or similar systems, this data can be used to further develop the course.

commented 16 November 2015

There can be only one.

commented 16 November 2015

I'm getting stuck on question 19. Is there any point in continuing with the other questions even if I cannot answer that one?

Teacher commented 17 November 2015

Erik, that is a very good suggestion, though it is a bit tricky to set up now. But if you post things here I will try to answer them. I have already collected enough data that the posterior over which questions are challenging is low entropy, so I will try to write a couple of general answers here right now. Just give me a bit and I'll try to answer some questions here.

Teacher commented 17 November 2015

Ok, I have now written a little help on the questions that most of you seem to get stuck on; I hope that it helps. You can download the PDF here.

Also, I have uploaded the derivations from Friday. They will be no help at all for the assignment, but those of you who are keen to look through what we actually did can find them on the lecture page.

commented 17 November 2015

The supplied code for the index in part 3 seems to have errors. It is trying to subtract each model's evidence from every other model's, but in the paper they calculate the distance as the sum of differences between the evidences for two data sets. I cannot get the supplied code to run. Can anyone else?

commented 17 November 2015

In the matrix cookbook
http://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
it feels like eq. (43) and eq. (57) are doing the same thing, but one gives a scalar and the other a square matrix. How do we know when to use which?

Teacher commented 17 November 2015

Hlynur: they are actually the same thing; 57 is a special case where you calculate each element of the matrix at the same time, while in 43 you also need to place a "denominator" to show what you are taking the derivative with respect to. Makes sense?

Carl: how do you pass the evidence? There is a line in the comments at the beginning of the code that already does the sum I think you are talking about, so the difference that is computed in the code will actually be the difference between the evidence for two data sets, summed over each model, so I think it is correct. Anyhow, there isn't really a right or wrong way of doing this, as is stated in the paper, so if we all use the code that I supplied it makes it easy for me to compare results. Does this clarify things?

#       evidence = np.zeros([num_models,num_data_sets])
#       index = create_index_set(np.sum(evidence,axis=0))

Zlatan for president!
commented 18 November 2015

I don't really understand the index algorithm in the paper for the last practical.

The distance function they are using is not symmetrical; is that the point? When building the N set, which should be the points (datasets, I guess?) that are closest to L (the last dataset we picked), should it be the distance from L to the other datasets or the other way around? These will differ, since the distance function is not symmetrical.

And if the N set is nonempty, it says the point in N furthest from L should be picked. In your code you have an argmin(), but shouldn't that be argmax() if the furthest point is to be picked?

commented 18 November 2015

I'm stuck on the last part of Practical 2 (Q21). I have defined the function and the derivative, but I'm not sure how to get to x, what the shape of x is, or how to plot it. The question says we should find "the single line x", but it seems more like we want to find the two-dimensional x' that results from the non-linear mapping from R1 -> R2, since the calculations of the gradient assume a linear mapping. What do you mean by a single line?

Right now I'm also getting precision-loss errors while optimizing; do you have any ideas about what the problem could be?

commented 18 November 2015

To avoid precision-loss errors while optimizing, add a very small amount of white noise to the data, i.e. make sure you always have some variability in your data.
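
A one-line version of this trick in numpy (the jitter scale 1e-6 is an arbitrary placeholder, and the data matrix is hypothetical):

    import numpy as np

    X = np.random.randn(100, 10)              # placeholder for your data matrix
    X = X + 1e-6 * np.random.randn(*X.shape)  # tiny white-noise jitter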

A user has removed their comment
commented 19 November 2015

My optimization runs a varying number of iterations but always ends with "desired error not necessarily achieved due to precision loss". I have added noise to the A matrix, which did not result in any improvement.
Is there any other cause you can think of?
Furthermore, assuming I do get an optimal W, I do not see how to convert it to an optimal f(x), where f is the non-linear function, since this W is not a square matrix and is not invertible.

commented 19 November 2015

Use the pseudo-inverse.
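
A sketch of that suggestion, assuming a model of the form y = W f(x) with a learned non-square W, so that f(x) is recovered in the least-squares sense via the Moore-Penrose pseudo-inverse:

    import numpy as np

    W = np.random.randn(10, 2)    # placeholder for the learned linear map
    y = np.random.randn(10)       # one observed 10-D point
    f_x = np.linalg.pinv(W) @ y   # least-squares estimate of the 2-D f(x)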

commented 19 November 2015

Where should we submit the assignment?

commented 19 November 2015

The email address is mentioned at the beginning of the assignment text. Correct me if I'm wrong.

commented 19 November 2015

Correct! Thank you Robin

Teacher commented 19 November 2015

Assignment 2 Part II is now uploaded on the web pages. You can thus get started with Tasks 2.4-2.6 already now if you like, and then take on Tasks 2.1-2.3 when they are uploaded later today.

Best,

/Hedvig

Teacher commented 20 November 2015

Now the full Assignment 2 is online! If you downloaded Part II yesterday, throw that version away and download the full Assignment 2 as of today, Friday, 14.15.

Good luck!

/Hedvig

commented 21 November 2015

Question 2.1, on the 6th line: "+" should rather be "-", right?

Teacher commented 22 November 2015

Yes that's true. Thanks for pointing this out. 

Jens

Teacher commented 22 November 2015

This is now corrected in the pdf file!

Teacher commented 23 November 2015

Master students: For time management reasons, we will not correct late assignments until the late deadline, April 1, 2016. Thus, wait until then with the hand-in of your late Assignment 1!

PhD students: We will, as agreed upon earlier, correct your assignments when you hand them in. The requirement for a Pass grade is all tasks, i.e., the same requirement as an A for Master students.

Best,

/Hedvig

Teacher commented 23 November 2015

The grades for the first assignment are now up on RAPP.

I had a lovely weekend reading through your reports; there was some very impressive work that you should be proud of. I would have loved to sit down with all of you and discuss your work, but sadly there is no time for this. Tomorrow I will come to the lecture and hand back your reports in the break.

commented 23 November 2015

Carl Henrik for president

Teacher commented 24 November 2015

In 2.1, assume that all outcomes have a positive probability. This removes some pathological cases, which you may not have considered anyway :-). 

Best,

Jens

Teacher commented 25 November 2015

The whitening in Task 2.5 requires you to do singular value decomposition (SVD) to get the eigenvectors and eigenvalues of the data. For a concise description, see e.g. the Wikipedia page on PCA. Note that this is the deterministic version of PCA, not PPCA as you studied in Assignment 1, and that you compute a closed-form solution using SVD to find the eigenvectors and eigenvalues of the data.

I can recommend the Python function numpy.linalg.svd or the Matlab function svd.

Cheers,

/Hedvig
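
A minimal sketch of such SVD-based whitening in numpy (the data layout and sizes are placeholders, with data points as rows):

    import numpy as np

    X = np.random.randn(500, 2)      # placeholder: rows are data points
    Xc = X - X.mean(axis=0)          # centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Columns of Vt.T are the eigenvectors of the covariance;
    # s**2 / (N - 1) are the corresponding eigenvalues.
    N = Xc.shape[0]
    X_white = Xc @ Vt.T / (s / np.sqrt(N - 1))
    # Sanity check: np.cov(X_white.T) should be close to the identity.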

commented 25 November 2015

Can we use R for the implementation tasks in the second assignment?

commented 25 November 2015

Could we have the .tex files for the assignments as well?

Teacher commented 25 November 2015

You can definitely use R! Really nice if students try different languages.

We prefer not to give out the source tex files - but there are tons of other tex examples to get inspiration from on the web!

All the best,

/Hedvig

commented 26 November 2015

In task 2.2, does each table have its own die with its own categorical distribution, or do the two table classes T and T' have just two categorical distributions (one each), with all the member tables of a class using that class's distribution? I don't see how the two table classes are particularly meaningful if every table has its own distribution. The assignment text is ambiguous. I smell HMM.

Teacher commented 26 November 2015

This is a good question. As it is formulated, each table has its own categorical. However, when you generate data, you don't necessarily have to have different categorical distributions across the tables of each class. It is of course true that the formulation doesn't make a whole lot of sense, but it is an assignment, and assignments are often somewhat contrived in order to get a problem of an appropriate difficulty.

Best,

Jens

commented 26 November 2015

In task 2.1, the interpretation of the influences (denoted by "+" and "-") is essentially comparing the probabilities of outcomes ("0" or "1") of the parents (it says which parent has probably generated a certain child). In question 1, however, I feel that, besides comparing outcomes of parents, I need to compare outcomes of a child ("0" or "1") for a given parent as well. What I mean is: if the outcome of a parent is "1" and the connection between this parent and a child has a "+", I would like to conclude that it is more probable that this child has an outcome "1" than "0". But from the interpretation of the influences shown in the assignment I can't conclude that (the interpretation doesn't compare the probabilities for different outcomes of children). I tried to derive it mathematically from the given rule, but I didn't reach any insightful result. I must be missing something. Could you help me with this? :)

commented 30 November 2015

Hi Hedvig!

In 2.5, the description of the data in the assignment doesn't seem to coincide with figures a, b, c and d. That is, the assignment says: "Figure 2(c) shows two individual, independent signals over time.... represented as a point s_i in a 2D space, see Figure 2(b)." But figures 2(a) and 2(b) show the mixed signals (the observed ones). In the citation above it sounds like the latent signals are being described. Am I correct, or have I simply misunderstood the data?

Teacher commented 30 November 2015

You are completely correct, Elizaveta - I can only blame my workload... :)

A version of Assignment 2 where Task 2.5 has been corrected is now uploaded.

Best,

/Hedvig

commented 2 December 2015

Do we have to write down the code in the report for question 5?

commented 2 December 2015

For questions 7 and 8 in section 2.3, are we allowed to use built-in libraries and just describe the algorithms?

Or do we have to implement the algorithms ourselves?

Teacher commented 2 December 2015

Yes, you should implement them yourselves. The main reason is that I strongly suspect that the wanted functionality cannot be found in standard packages.

Best,

Jens

commented 2 December 2015

In 2.4, we're supposed to find the EM parameters for the model of a given player. Is it then OK to assume N graphical models in 2.2 and have the algorithms output the probabilities of tables for each player separately in 2.3?

commented 3 December 2015

I also have a question about EM. I understand it as: I should calculate \(p(Z|X,\theta)\) for all possible combinations of Z, i.e. the latent, single dice throws. That grows exponentially; is this ok, or did I think wrong somewhere? My likelihood increases and converges, so it seems to work, it just scales really badly :)

Teacher commented 3 December 2015

Carl: These problems have been designed not to be solvable using standard methods, thereby giving you an opportunity to develop a deeper understanding. So the right approach is to try to understand how they differ from a problem that can be solved using the standard methods, and how a standard method needs to be tweaked or extended in order to solve the problem at hand.

Hlynur: No, the table parameters are the same for all the players.

Isac: It is not ok to get an exponential running time. An exponential running time, i.e., one caused by examining combinations of hidden variables, can typically be evaded by noticing that it is sufficient to compute a marginal, in this case of \(p(Z|X,\theta)\).

Best,

Jens

commented 4 December 2015

In questions 7 and 8, what does z mean? In the instructions it says that z is the player's own dice, but then in question 7 it seems that z is a table. Can you help me with that? :)

Teacher commented 4 December 2015

Thanks for this excellent observation, Gabriela. As you suggest, I actually meant the tables in questions 7 and 8. That interpretation also renders the problems easier, so I suggest that you all use it. That is, interpret each Zi in p(Z1, ..., ZK|s1, . . . , sK, Θ) as a table; or better, use some other variable, say Qi, and let p(Q1, ..., QK|s1, . . . , sK, Θ) be the probability of the table sequence Q1, ..., QK given the observations s1, . . . , sK and the parameters Θ.

Best,

Jens

commented 7 December 2015

Hi Jens! I'm wondering about the indices in question 7. First, regarding the given table sequence z_1, ..., z_n: is n the same N as in the casino description (the number of players), or is it just any n?
And for the dice sum sequence s_1, ..., s_K: I'm assuming K is the same as above, i.e. the number of tables visited per player?

Is there a relation between n and K in question 7? For example, should n < K?

Best regards,

Sofia

Teacher commented 7 December 2015

The n in z_1, ..., z_n should be K, and, yes, K is the number of tables visited.

Best,

Jens

A user has removed their comment
A user has removed their comment
commented 9 December 2015

I'm still struggling with the EM algorithm. The Q-term is supposed to be the expectation, or the weighted sum, over all generating distributions. But what are the generating distributions, really? And given some X and Z, this really leaves no distribution (over S anyway), since there is only one S = X + Z, right? So letting only Z be the latent variable at least leaves a distribution over the X's. But still, how many are they? Are they not all the combinations of assignments of the _vector_ Z, that is, exponentially many? And I really don't see how to avoid considering all combinations of generating distributions.

Thanks

Teacher commented 9 December 2015

I assume that you use X_k and Z_k for the table's and the player's dice at step k. Notice that those are the hidden variables. So you should first assume that those are available to you and write an expression for the complete likelihood. Then you should consider the expected complete log-likelihood, that is, over X and Z (which are X_1,...,X_K and Z_1,...,Z_K, respectively) given S (all the sums). As you write, there are exponentially many X and Z, but notice that this has been the case in each example of the EM algorithm we have considered. However, in the end (after the standard steps), we have been left with marginals over single variables or pairs of variables, corresponding to the present X_i and Z_i variables.

Again, first make sure that you can express the likelihood for complete data. 

Best,

Jens

commented 9 December 2015

Follow up question:

The innermost product looks like this for the complete data:

$$\log \left[ p(z_{nk})p(x_{nk})p(s_{nk}|x_{nk}, z_{nk}) \right] = \log \left[ p(z_{nk})p(x_{nk})I(s_{nk} = x_{nk} + z_{nk}) \right]$$

Is the complete data assumed to be more or less likely? Or rather, is it a problem that you can get \(\log(0)\) when the indicator function is zero?
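
To make the tractability point from the earlier answer concrete: for a single observed sum s, the posterior over the two dice lives on a 6x6 grid, so the E-step marginal needs no exponential enumeration. A hedged sketch, with uniform placeholder categoricals:

    import numpy as np

    p_x = np.full(6, 1/6)   # placeholder categorical for the table die
    p_z = np.full(6, 1/6)   # placeholder categorical for the player's die
    s = 7                   # one observed dice sum

    # Unnormalised p(x=a, z=b | s) ∝ p(x=a) p(z=b) I(s = a + b)
    r = np.array([[p_x[a - 1] * p_z[b - 1] * float(a + b == s)
                   for b in range(1, 7)] for a in range(1, 7)])
    r /= r.sum()            # 6x6 responsibility table, O(36) work per sum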

Teacher commented 9 December 2015

Getting log(0) is usually not a concern. Apart from that comment, it is hard to completely understand your question, and I cannot answer it without risking leading you in an incorrect direction. Sorry.

Best,

Jens

Teacher commented 9 December 2015

I have added some tips about Task 2.6 to the Assignments page in the course homepages. In short:

- Start asap, running times are very large!

- Read the tips before starting!

Best of luck,

/Hedvig, and Zheng who designed the task

commented 10 December 2015

Hi Jens! I'm trying to maximize the expected probability of the latent variables given some observed data, and wind up with this expression:

$$\frac{\partial E \log p(\mathbf{x}, \mathbf{y} | \mathbf{s})}{\partial p(x_{nk})} = \sum_{nk}^{NK} \sum_{a,b=1}^6 \frac{r_{nk}^{ab}}{p(x_{nk})} = 0$$

Where \(r_{nk}^{ab} = p(X_k = a, Y_n = b | s_{nk})\) is the responsibility of dice throw pair a and b for the nk'th dice sum. I'm not sure how to continue from here - I seem to recall from the lectures that you were supposed to get some kind of frequency here, but I don't really see how that would work out.

Any hints?

Teacher commented 10 December 2015

No, not really. It is hard to provide any guidance concerning the path forward here. It is odd, though, that you are taking derivatives. Also, where are the new parameters in the expression for the log-likelihood?

Do you have an expression for the complete likelihood, i.e., assuming that all variables are known to you?

Best,

Jens

commented 10 December 2015

I do have an expression \(p(\mathbf{s, x, y})\) which I rewrote to the log likelihood of the complete data, and then took the expectation of the latent variables given the observations, that is, \(E_{p(\mathbf{x, y} | \mathbf{s})} \log p(\mathbf{s, x, y})\).

Teacher commented 10 December 2015

Then you should proceed from there, and that should not give the previous expression. 

Best,

Jens

commented 10 December 2015

I should clarify concerning the parameters -- the expected log probability  is taken with the current estimate, so \(E_{p(\mathbf{x, y} | \mathbf{s}, \theta')} \log p(\mathbf{s, x, y| \theta})\) where the prime indicates current estimate.

commented 10 December 2015

Hi again Jens! Can I post the expression I got for the expected complete log-likelihood? I've looked at how others do EM for continuous distributions, and they generally seem to take derivatives of the kind I showed above -- perhaps that's a mistake. I guess you're saying: take that expression, and instead of differentiating it, reason about how to maximize it in a similar way to what you did for Baum-Welch?

Teacher commented 10 December 2015

No, please don't post it, and not the final solution either :). Yes, I believe you should be able to do it in the way I did it for Baum-Welch. As always, try to get expressions, solutions, etcetera that are similar to those for the most similar problem.

Best,

Jens

A user has removed their comment
commented 11 December 2015

In question 7, does theta contain any information about the probabilities of switching between primed/unprimed states, or do we have to somehow learn these probabilities before calculating the sought output?

Teacher commented 12 December 2015

The transition probabilities are specified by the sentence: "In the k:th step, if the previous table visited was T_{k-1}, the player visits T_k with probability 1/4 and T'_k with probability 3/4, and if the previous table visited was T'_{k-1}, the player visits T'_k with probability 1/4 and T_k with probability 3/4." So you know them.

Best,

Jens
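
Written as a transition matrix between the two table classes (rows: the class of table k-1, columns: the class of table k), the quoted sentence gives

$$P = \begin{pmatrix} p(T_k \mid T_{k-1}) & p(T'_k \mid T_{k-1}) \\ p(T_k \mid T'_{k-1}) & p(T'_k \mid T'_{k-1}) \end{pmatrix} = \begin{pmatrix} 1/4 & 3/4 \\ 3/4 & 1/4 \end{pmatrix}$$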

commented 12 December 2015

Hi Hedvig,

I have a question regarding LDA. I have plenty of words such as "at", "in", "and" etc. among the top 20 in each topic for K=3. I wonder if that is expected. Is there any remedy for this, in case this kind of output is indeed expected?

Best,

Polina

Teacher commented 12 December 2015

Yes, this is correct! Think about why this is and write it in the report (not here). Test also with more topics. Compare classification accuracies for different numbers of topics. Is 3 enough, or does it get better with more? Why? Again, write this in the report, not here in the forum. Good luck! :)

commented 12 December 2015

Thanks for the quick reply. Yep, I can think of a number of reasons! Btw, could you maybe give us a hint about how to infer \(\theta_{m, test}\) from \(\beta\)?

A user has removed their comment
commented 13 December 2015

Hi Hedvig!

About question 16, I didn't quite understand what you mean by "Are the correctly classified documents more typical for their class?". Could you elaborate on it? I think I may be confusing the three labels (classes) that we have with the topics from our K.

Teacher commented 13 December 2015

The topic representation can be seen as a low-dimensional representation of the data. In addition, the data also comes from three classes. Each training doc has one class label, and the task is to classify each test doc as the majority vote among the k training docs that are closest in the topic space.
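
A sketch of that classification rule, assuming per-document topic distributions theta_train (n_train x K) and theta_test (n_test x K) and a numpy array labels_train of class labels (all names are hypothetical):

    import numpy as np
    from collections import Counter

    def knn_classify(theta_test, theta_train, labels_train, k=5):
        preds = []
        for t in theta_test:
            d = np.linalg.norm(theta_train - t, axis=1)  # distances in topic space
            nearest = np.argsort(d)[:k]                  # k closest training docs
            # Majority vote among the labels of the k nearest docs.
            preds.append(Counter(labels_train[nearest]).most_common(1)[0][0])
        return np.array(preds)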

Teacher commented 13 December 2015

Polina, I have removed the statement "from Beta"; that was confusing. You should infer theta, the topic distribution, of the test doc in the regular way; see the literature. (It of course involves beta, but that is not needed info here.)

I have uploaded a new pdf file, where these things have been made clearer.

commented 13 December 2015

Thanks! Should one use a Gibbs sampler on the test set; is that what you mean?

Teacher commented 13 December 2015

No, you should not change beta! You should simply infer the z corresponding to each word w in the test document. That you do from beta. But read the literature!
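
One simple reading of that instruction, as a hedged sketch only (beta stays fixed; the variable names and sizes are placeholders, and the LDA literature gives the proper inference):

    import numpy as np

    beta = np.random.dirichlet(np.ones(1000), size=3)  # placeholder (K=3, V=1000)
    test_doc = np.random.randint(0, 1000, size=120)    # placeholder word indices

    z = beta[:, test_doc].argmax(axis=0)               # most probable z per word, from beta
    theta = np.bincount(z, minlength=beta.shape[0]).astype(float)
    theta /= theta.sum()                               # topic distribution of the test doc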

A user has removed their comment
Teacher commented 14 December 2015

Assignment 2 is now updated to the correct version. There are very minor changes, so if you were fine with the old formulation of Task 2.6, you do not have to download the new one.

commented 15 December 2015

Is equation 3 in 2.7 correct? For the first denominator, shouldn't there at least be a parenthesis, like for the second denominator? Likewise for the beta expression.

Teacher commented 15 December 2015

Thanks Kaj! The equations in task 2.7 have all been updated with that; a new version of the assignment has been uploaded!

commented 15 December 2015

Hi Hedvig!

In question 2.7, if we have detailed the derivation for the first term, do we also need to detail it for the second term, given that we follow exactly the same steps?

Thank you!

Salma

Teacher commented 15 December 2015

Yes, you need to derive the whole expression.

commented 16 December 2015

Hi Jens!

Some questions ask us to "provide an implementation", but the instructions say not to include source code. To clarify, would you like us to include source code?

If yes, should source code be inline in the report, or emailed as e.g. python files?

Best,

Joel

Teacher commented 16 December 2015

Yes, that's confusing. Follow the instructions and skip the source code.

Best,

Jens

Teacher commented 22 December 2015

Your reports on Assignment 2 have now been corrected and the results reported into Rapp. Good work, everyone! We realize you are really busy this time of year and hope that the learning experience was worth the effort; we saw a lot of interesting and insightful answers in the reports.

After New Year you will be able to pick up your report - more about how later.

If you got an F, the report features comments on what to correct before handing it in again to get an E. As mentioned above, you have until April 1, 2016 to complete the assignment.

But before that, a Merry Christmas and a Happy New Year - or simply a relaxing holiday! :)

Hedvig

commented 4 January 2016

How do we pick up our reports?

Teacher commented 7 January 2016

Those of you who passed can pick them up on the 18th in connection with the presentations. The students who did not pass have received an email from me with information on how to get their reports back.

All the best,

/Hedvig

Teacher commented 22 April 2016

The late hand-ins of assignments, April 1, have now been corrected and reported into Rapp. If your result is not in Rapp, let me know. They will be reported into Ladok in a couple of days.

commented 22 April 2016

I seem to be missing the result of the first assignment (the second one is reported in Rapp)

Teacher commented 22 April 2016

Joakim, it has been sent to Carl Henrik - you only sent it to me, and I discovered that yesterday. So all is fine with it, and you will get the results soon.

commented 22 April 2016

Okay great, thanks!

commented 22 April 2016

I sent the first assignment to Carl Henrik. Was that wrong? Should I have sent both to you, Hedvig?

Teacher commented 23 April 2016

Hi All,

For those of you who are waiting to get Assignment 1 ticked off, I'm still in the process of going through them. They will be reported into RAPP by the end of next week.

 
January 2016
under HT 2015 mladv15

Hedvig Kjellström created the page 13 August 2015

Teacher Hedvig Kjellström changed the permissions 14 September 2015

It can thus be read by everyone and changed by teachers.
commented 7 November 2015

Fri 6 Nov: Is one page missing from the derivations? The last page is numbered 10 but there are only 9 pages. Looking at each one, it seems page 3 is missing. Or you skipped a number after page 2...

Teacher commented 7 November 2015

Thank you Kristófer,

I have simply made an error in the numbering; page 3 is non-existent.

Cheers,

Carl Henrik

Teacher commented 7 November 2015

Also, I still have not scanned the derivation of the Gaussian marginal, which I did not do yesterday in the interest of time and discussion. I do not have access to a scanner right now but will post it as soon as I do.

commented 11 November 2015

Hi Carl!

On slide 106 in lecture 3 (the one about GPs) there is an equation for the posterior. The mean is k(x_star, X).transpose dot K(X,X)^-1 dot f. But if k(x_star, X) has size 200 x 7 then I can't really get the matrices' sizes to match... Also, according to Wikipedia there is no transpose on this one, k(x_star, X) -> so is the transpose a typo, or am I missing something here?

Thanks :)

Teacher commented 11 November 2015

Hi, you are not missing anything at all; it's just all about getting the dimensions to match up. The dimensions in your multiplications should be

[200x7][7x7][7x1] = [200x1], so according to my matrix above the equation, k(x_star, X) has to have dimension 200x7, which means that there should be no transpose.

Sorry about this; hopefully it's clear now.
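
A numpy sketch of those shapes (the kernel choice and the sizes, 7 training and 200 test points, are just the ones from the example above):

    import numpy as np

    def rbf(A, B, ell=1.0):
        # Squared-exponential kernel between two sets of 1-D inputs.
        return np.exp(-0.5 * (A - B.T) ** 2 / ell ** 2)

    X = np.random.randn(7, 1)                        # training inputs
    f = np.random.randn(7, 1)                        # training targets
    X_star = np.linspace(-3, 3, 200).reshape(-1, 1)  # test inputs

    K = rbf(X, X)              # (7, 7)
    k_star = rbf(X_star, X)    # (200, 7), no transpose needed
    mean = k_star @ np.linalg.solve(K + 1e-8 * np.eye(7), f)  # (200, 1)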

commented 11 November 2015

Thanks :)

commented 12 November 2015

Could you perhaps upload the font demo code from yesterday? Thanks

Teacher commented 12 November 2015

Hi,

The code that I am using for this is a package called GPy, which you can download from https://github.com/SheffieldML/GPy . The model that I am using is a Bayesian GP-LVM model; under models, BGPLVM, you can see how to run it. However, I am not allowed to share the data for the fonts, as the one who created them wants to keep it to himself. Sorry about that, but you can try some other interesting data included in the GPy package, for example motion capture data.

commented 12 November 2015

I'm a bit curious about the oral assessments mentioned in the schedule: "Selected oral assessments during Friday 20 nov (see Assignments)".

How will this happen? When do you know if you have been selected for one?

Asking here since I can't find anything about it on the Assignments page, and I'm not in Sweden on that date...

A user has removed their comment
A user has removed their comment
Teacher commented 15 November 2015

We have removed the formulation "selected oral assessments"; due to the large number of students we have decided to switch to an entirely text-format examination of the two assignments.

Teacher commented 16 November 2015

Due to (or rather thanks to - it is great!) the large number of students, we have had to rebook lecture halls. Please look at the updated schedule on the course web pages, HT 2015 mladv15 > Schedule and course plan, and take note of the new rooms, starting Tuesday Nov 24.

We did not get larger rooms for some of the lectures; these are marked with boldface and "(small room)". These rooms fit around 60 students, which means that we will have to find an ad-hoc solution, e.g., using tables to sit on. We will sort it out - everyone will fit! The project presentations will be arranged in separate sessions so that fewer than 60 students are present at the same time.

All the best,

/Hedvig

Teacher commented 2 December 2015

On Tuesday at 12-13, Jens will hold a help session in room 1448 at Lindstedtsv 3, floor 4. It is the room with an entrance directly from the stairway, directly opposite the entrance to the computer halls.

This session will be useful for quite open questions where you need to discuss things. While waiting for this session, first try to pose your questions on the home page, so that all students can see the answers!

commented 8 December 2015

Was the help session moved somewhere else?

commented 8 December 2015

E3

commented 8 December 2015

Sorry, E31

commented 10 December 2015

Hi Hedvig,

I am not sure, but I think you said that you were open to suggestions for the content of the next class, scheduled on 15 December. I had a discussion with a few course mates. We thought it might be a good idea if you could talk about the recently concluded NIPS conference in Montreal.

I heard there was an overwhelming response from academia as well as industry this year. We would love to hear about the recent developments in the field of machine learning, especially work related to the content taught in this course. What skills do you think might prove to be crucial in academia, as well as in industry, in the near future?

Thank you

Teacher commented 10 December 2015

That sounds like a really nice idea!

What do other students think about this?

Best,

/Hedvig

commented 10 December 2015

I like the idea!

commented 10 December 2015

Sounds like a great idea, really interested to hear about the frontier of the field :)

commented 13 December 2015

It sounds great!

Teacher commented 13 December 2015

Ok then, let us decide that the session on Tuesday is spent looking at the developments of the ML state of the art! I will give a short overview of the discussions during the last few years and up to now. We can then have a short look at three papers and how they relate to this discussion. Best, Hedvig (on my way home from NIPS)

Teacher commented 13 January 2016

Now the detailed schedule for the project presentations is posted on the Schedule and Course Plan page. We are looking forward to seeing your presentations on Monday between 14 and 18, and to receiving your reports via email on Monday at 12 noon!

Carl Henrik will not be able to make it unfortunately, but Jens and Hedvig are there to listen to you.

Please make sure that Hedvig has your slides on her computer before the start of your session - bring them on a stick to the lecture hall. Very strict 10-minute time limits apply, since we are no fewer than 21 groups!

 
under HT 2015 mladv15

Hedvig Kjellström created the page 13 August 2015

Teacher Hedvig Kjellström changed the permissions 14 September 2015

It can thus be read by everyone and changed by teachers.
commented 19 November 2015

Is the oral presentation on January 18 or 19? The dates are not consistent in the text: it says 19 under the written report and 18 under the oral presentation.

/Robin

Teacher commented 26 November 2015

It should be the 18th - the 19th was last year. Sorry about that!

Teacher commented 26 November 2015

Hi everyone, please visit the Project page and look at the group assignments!

All the best,

/Hedvig

Teacher commented 27 November 2015

A few additions to the project group list; give me a shout if I missed you!

commented 28 November 2015

I cannot seem to find the list of scientific articles; is it in some place other than specified above?

Teacher commented 30 November 2015

Look at the literature list, page mladv15 > Literature and examination.

commented 30 November 2015

Is it? I can't find it either.

Teacher commented 30 November 2015

Ah, the project paper - sorry. That you will find using Google...

Teacher commented 4 December 2015

Papers have now been assigned to groups. The students in each group are jointly responsible for driving the project, asking the supervisor for advice, and making sure that all group members contribute to both the implementation of the method and to the writing of the report.

For issues relating to the collaboration in the group (rather than the project itself), contact Hedvig.

Good luck! We are greatly looking forward to the project results!

/Hedvig

commented 13 January 2016

We're having a discussion about whether to include criticism solely of the algorithmic method, or also of the scientific method.

Teacher commented 13 January 2016

Both are interesting to discuss, I think!

 
November 2015
under HT 2015 mladv15

Schedule administrator created the event 10 April 2015

changed the permissions 30 April 2015

It can thus be read by everyone and changed by teachers.
Schedule administrator removed the event 13 November 2015
Schedule administrator edited 16 November 2015

Q3E3

 
under HT 2015 mladv15

Schedule administrator created the event 11 March 2015
Schedule administrator edited 11 April 2015

Lecture/Exercise

changed the permissions 30 April 2015

It can thus be read by everyone and changed by teachers.
Schedule administrator removed the event 13 November 2015
Schedule administrator edited 16 November 2015

L5K2

 
under HT 2015 mladv15

Schedule administrator created the event 11 March 2015

changed the permissions 30 April 2015

It can thus be read by everyone and changed by teachers.
Schedule administrator removed the event 13 November 2015
Schedule administrator edited 16 November 2015

V3M2

 
under HT 2015 mladv15

Schedule administrator created the event 11 March 2015

changed the permissions 30 April 2015

It can thus be read by everyone and changed by teachers.
Schedule administrator removed the event 13 November 2015
Schedule administrator edited 16 November 2015

Q31E3

 
under HT 2015 mladv15

Schedule administrator created the event 14 August 2015
Schedule administrator removed the event 13 November 2015
Schedule administrator edited 16 November 2015

V3B3

 
under HT 2015 mladv15

Schedule administrator created the event 11 March 2015

changed the permissions 30 April 2015

It can thus be read by everyone and changed by teachers.
Schedule administrator removed the event 13 November 2015
Schedule administrator edited 16 November 2015

Q34E3

 
under HT 2015 mladv15

Schedule administrator created the event 11 March 2015

changed the permissions 30 April 2015

It can thus be read by everyone and changed by teachers.
Schedule administrator removed the event 13 November 2015
Schedule administrator edited 16 November 2015

V35B1