News feed
Do you have questions about the course?
If you are registered on a current course offering, see the course room in Canvas. You will find the right course room under "Courses" in the personal menu.
If you are not registered, see the course memo for DD2434 or contact your student office, study counsellor, or education office.
In the News feed you will find updates to pages and the schedule, as well as posts from teachers (when they also need to reach previously registered students).
This message is for PhD students taking the course.
PhD students do not receive grades (for any of their courses). The PhD baseline for a Pass grade is set to:
- All tasks on Assignment 1 (A)
- All tasks on Assignment 2 (A)
- The compulsory requirements on the project (E)
There are no time limits on the assignments other than the final deadline of April 1, 2016. The project should, however, be presented together with the others on January 18, 2016.
Hello! I have a question regarding the first assignment's first question -- you write \(\mathbf{x}_i^j\), which looks like it would be a vector (on account of the bold face), but I'm assuming the superscript j is supposed to be the jth component, so it would be a scalar, no?
Well spotted, Ludvig, the copy-paste devil strikes again. Just before and in the left-hand side of Eq. 2 you can skip the superscript j; it doesn't mean anything.
Hi Teachers,
Is it possible to have access to Assignment 2 right away? I have some personal reasons why I would want to attempt that earlier if possible.
Hello! Regarding question 7 in assignment 1, what does representability of a model mean?
Hi Gabriela,
That is indeed an interesting question and rather hard to answer without giving you the answer to the question ;-). Think of it like this: we build models to represent data, i.e. representability can be interpreted as the capability of a model to represent data. So, how do a non-parametric and a parametric approach differ with respect to the data they can represent? I know this is a bit vague, but think about how these two classes of models differ and I think you will be able to give a really good answer.
Hi Akshaya, Assignment 2 is not ready yet; it will be published as soon as we are done with it.
Hi Carl!
In Question 11 - visualise the prior distribution of W - is it a multivariate Gaussian we should plot, or norm.pdf? ... I'm confused.
Hi, another notation question... In equation (33): the way this is written, the dimensions do not match if A is 10x2 and x is 100x2. Should it instead read Ax'? Then Y will be 10x100, and not 100x10 like it says in the next paragraph, which kind of makes more sense - inferring the lower-dimensional representation of 100 ten-dimensional observations rather than of just 10 one-hundred-dimensional observations...? Thanks
In question 5, are we supposed to interpret the meaning behind a cube-y prior with vertices on axes?
Hi Carl,
in question 6 we are supposed to derive the posterior from the likelihood and the prior. As the likelihood is a product of Gaussian distributions, the derivation includes a sum of terms in the exponent. Due to this, I get a sum term in the mean and covariance matrix of the posterior. This feels on the one hand very strange (the mean and covariance potentially grow with more pairs of points xi and yi) but at the same time correct, because the likelihood consists of a product of Gaussians. I am confused...
Best regards,
Leo
Hi all,
Let's get cracking on these questions.
Q5. Think like this: what is the characteristic difference between an L1 and an L2 distance? If the "cost" is associated with L1 or L2, which points/parameters will have the same cost, and which a different cost? From this I think you should be able to figure out the question. One clue is to draw iso-surfaces of the different distance functions.
Q6. I am also a little bit confused by your question. So, yes, you are supposed to multiply the prior with the likelihood and write the expression for the posterior. There will be a sum term in the exponent, but you can write this as a matrix product instead, just as we did on Friday. I am not sure if this answers your question fully, but I hope it gets you some of the way.
Eq. 33. Erik, you are indeed completely correct. The output should be 100x10 and currently the dimensions of the calculations will not match. So, if you do Y' = A*x', which means Y = x*A', then the dimensionality will be as it should.
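As a quick sanity check of those dimensions, a minimal numpy sketch (the variable names and random values are placeholders, not the assignment's actual data):

import numpy as np
x = np.random.randn(100, 2)   # 100 latent points in 2-D
A = np.random.randn(10, 2)    # maps the 2-D latent space to the 10-D data space
Y = x @ A.T                   # same as (A @ x.T).T
print(Y.shape)                # (100, 10): 100 ten-dimensional observations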
Q11. The way to plot this is to use colour to encode the actual probability value, just as I did in the lecture notes. You can do this really simply by creating a function that returns the probability of the prior, doing a nested for loop over the two parameters, and visualising the result as an image.
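A minimal sketch of that kind of plot, assuming a zero-mean 2-D Gaussian as a placeholder prior (the actual prior and parameter ranges come from the assignment):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

prior = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))  # placeholder prior over the two parameters
w0 = np.linspace(-3, 3, 200)
w1 = np.linspace(-3, 3, 200)
density = np.zeros((len(w1), len(w0)))
for i, a in enumerate(w0):        # nested loop over the two parameters
    for j, b in enumerate(w1):
        density[j, i] = prior.pdf([a, b])
plt.imshow(density, origin='lower', extent=[w0[0], w0[-1], w1[0], w1[-1]])
plt.xlabel('w_0')
plt.ylabel('w_1')
plt.show()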
In the first assignment, section 2.3, it says "Think about how this relates to the latent space models that you worked on in the first part of the course, where you used discrete latent states to represent continuous data." and I'm wondering what I've missed? Is the first assignment not the first? Does it assume that I have prior knowledge?
Hi Carl,
Yes, that is me being sloppy; this sentence should have been removed when the assignment changed from being the second (last year) to the first (this year). So do not worry about this now, it is nothing essential, but maybe you can relate to it later when you are doing Jens' course. Sorry about this.
Is the prior for X missing under the integral in Eq. 23?
I don't get what you are asking for in Question 27: all these sums should be 1 by construction... Thanks
Hlynur, yes p(X) is missing in that equation.
Erik, if I answer like this ;-) is that clear enough =).
Hi, I think I have spotted a mistake in the derivations of exercise 1. The resulting posterior distribution over W should have the inverse of the covariance matrix it currently has, because what is found by "completing the square" is the inverse of the actual covariance matrix (S^-1). Maybe this helps someone with the lab.
In the derivation of the posterior of W, there's a line saying y'xW = W'x'y or something like that; I'm not seeing why that is.
In 2.1, it is stated \(y_i\) is a \(D\)-dimensional vector and \(x_i\) is a \(q\)-dimensional vector. This means \(W\) in eq. (5) has to be a \(D \times q\) matrix in order to make the matrix multiplication correct.
Is \(W_0\) also a \(D \times q\) matrix?
Aitor: yes, that is indeed true, I lost the ^{-1} on the last line. The PDF that I uploaded has been updated with the correct derivation, so if you re-download it I think it should be fine.
Ludvig: For the rules of the transpose, you get this result by repeatedly applying (AB)' = B'A', and by noting that the term in question is a scalar (a 1x1 quantity), so it equals its own transpose. A good reference for these tricks, and lots of others, is [1], which is free online.
[1] K. Petersen and M. Pedersen, “The matrix cookbook,” Technical University of Denmark, 2006.
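Spelled out, and assuming (as in this derivation) that the term is 1x1:
$$\mathbf{y}^\top X \mathbf{w} = \left(\mathbf{y}^\top X \mathbf{w}\right)^\top = \mathbf{w}^\top X^\top \mathbf{y}.$$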
Oscar: That is indeed true, and if you work with that you are going to have to model really tricky covariances. However, think about this assumption: the output dimensions are conditionally independent given the input. If you make this assumption it all falls back to a simple 1-D problem for each dimension, and you have a W_0 matrix which is q x 1.
What I mean is not the previous mistake in finding the mean, but the part where it is written P(W|Y,X) proportional to N(mean, cov) at the bottom of that page. The cov that is written there is the inverse of the actual covariance, so it should be cov^{-1}.
I'm confused. Are we supposed to write a report and also present it orally? Will the oral presentation be carried out on the deadline date or some time after? Some clarification is needed.
/Robin
Hi Robin, you should submit a report with all your findings. Normally we will not have any oral examination but in specific cases this still might be needed if we find that it is not feasible to set a grade based on the results in the report. If we feel that an oral examination is needed then we will contact you and we will decide on a date that suits both of us.
Aitor: ah, now I understand. Yes, there is something fishy going on here: when I identify the S matrix I am actually identifying the S^{-1} matrix, and that error follows on through the derivation. Right now I believe that there is only an inverse missing on the covariance, but I need to go through this in more detail just to be sure. I will do that and update the derivation again.
Regarding the oral exam after Assignment 2 on December 17, is there any possibility to do it through Skype if you get chosen? I know many students, including myself, will be spending the holidays in our home countries. Flying before the 17th can save quite a lot of money.
Hi Kristofer, absolutely. If that were the case then we would make sure to find a solution that works for everyone. Do not worry about this; think of it more as a catch-all, "that we as teachers want to keep the option open to call someone for an oral examination if we think there is something we need to clear up". So you just go ahead and book your tickets.
Equation 23 in the assignment:
$$p(Y|W) = \int p(Y|X,W)dX$$
Won't the left-hand side integrate to more than 1? The integrand on the right-hand side should be multiplied by p(X), right?
OK, I just saw the previous question about this, but the PDF wasn't updated so I thought it hadn't been noticed yet.
In the 2.4.1 practical, are we only supposed to learn the linear mapping (the one that the matrix A does when we generate the data)? Or do we need to learn the non-linear mapping too (the one that the function f_nonlin does)?
Hi John, you are completely right, you only need/can get the output of the non-linear mapping back from the linear assumption. If you want to recover the "true" underlying parameter you will need a non-linear method and that we do not do in this assignment. Hope this helps.
So just to clarify, the X in question 21, question 20 and so on corresponds to the left hand side of equation 32 and not 31? i.e.
$$\bar{X} = f_{\mathrm{nonlin}}(x_i), \quad x_i \in [0, 4\pi]$$
The X in question 20 refers to the equation above, i.e. eq. 28, which has to do with why it is simpler to marginalise out f than X. The X in question 21, is what you learn from performing the optimisation.
Suggestion for the assignments:
As there is only one Carl and many questions it seems less optimal to have Carl answer the same questions more than once.
Let student post questions on bilda prior to the exercise or write them up on the board in the beginning of (and during) an exercise.
Then other students can "like" the questions that they want answered. If you write the questions on a blackboard, students can vote by putting marks after each question.
Questions with most likes or marks will be answered by Carl.
This provides answers to the largest number of students. It also provides Carl with information about what students know and don't know. If digitalised, through Bilda or similar systems, this data can be used to further develop the course.
There can be only one.
I'm getting stuck on question 19. Is there any point to continuing and doing other questions even if I can not answer that one?
Erik, that is a very good suggestion, though it is a bit tricky to set up now. But if you post things here I will try to answer them. I have already collected enough data that the posterior over which questions are challenging has low entropy, so I will try to write a couple of general answers here right now. Just give me a bit and I'll try to answer some questions here.
Ok, I have now written a little help on the questions that most of you seem to get stuck on, I hope that this helps. You can download the PDF here.
Also, I have uploaded the derivations from Friday. They will be no help at all for the assignment, but for those of you who are keen to look through what we actually did, you can find them on the lecture page.
The supplied code for the index in part 3 seems to have errors. It is trying to subtract each model's evidence from every other model's, but in the paper they calculate the distance as the sum of differences between evidences for two data sets. I cannot get the supplied code to run. Can anyone else?
In the matrix cookbook
http://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
it feels like eq. (43) and eq. (57) are doing the same thing, but one gives a scalar and the other a square matrix. How do we know when to use which?
Hlynur: they are actually the same thing; 57 is a special case where you calculate each element of the matrix at the same time, while in 43 you also need to place a "denominator" to show what you are taking the derivative with respect to. Does that make sense?
Carl: how do you pass the evidence? There is a line in the comment at the beginning of the code that already does the sum I think you are talking about, so the difference that is computed in the code will actually be the difference between the evidence for two data sets summed over each model, so I think it is correct. Anyhow, there isn't really a right or wrong way of doing this, as is stated in the paper, so if we all use the code that I supplied then it makes it easy for me to compare the results. Does this clarify things?
# evidence = np.zeros([num_models, num_data_sets])
# index = create_index_set(np.sum(evidence, axis=0))
Zlatan for president!
I don't really understand the index algorithm in the paper for the last practical.
The distance function they are using is not symmetric; is that the point? When getting the N set, which should be the points (datasets, I guess?) that are closest to L (the last dataset we picked), should it be the distance from L to the other datasets or the other way around? These will differ since the distance function is not symmetric.
And if the N set is nonempty, it says the furthest point in N from L should be picked. In your code, you have an argmin(), but shouldn't that be argmax() if the furthest point is to be picked?
I'm stuck on the last part of Practical 2 (Q21). I have defined the function and the derivative, but I'm not sure how to get to x, what the shape of x is, or how to plot it. The question says we should find "the single line x", but it seems more like we want to find the two-dimensional x' that results from the non-linear mapping from R1 -> R2, since the calculations of the gradient assume a linear mapping. What do you mean by a single line?
Right now I'm also getting precision loss errors while optimizing, do you have any ideas about what the problem could be?
To avoid precision loss errors while optimizing, add a very small amount of white noise to the data, i.e. make sure you always have some variability in your data.
My optimization runs different numbers of iterations but always ends with "desired error not necessarily achieved due to precision loss.". I have added noise to the A matrix, which did not result in any improvement.
Is there any other cause you can think of?
Furthermore, assuming I do get an optimal W, I do not see how to convert this to an optimal f(x), where f is the non-linear function, since this W is not a square matrix and is not invertible.
Use the pseudo-inverse.
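A minimal sketch of that suggestion with numpy (the shapes and variable names are placeholders, not the assignment's actual variables):

import numpy as np
W = np.random.randn(10, 2)        # placeholder for the learned, non-square mapping
Y = np.random.randn(100, 10)      # placeholder observations, one row per point
X_rec = Y @ np.linalg.pinv(W).T   # recover the 2-D representation, since Y is approximately X_rec @ W.T
print(X_rec.shape)                # (100, 2)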
Where should we submit the assignment?
The email address is mentioned at the beginning of the assignment text. Correct me if I'm wrong.
Correct! Thank you Robin
Assignment 2 Part II is now uploaded on the web pages. You can thus get started with Tasks 2.4-2.6 already now if you like, and then take on Tasks 2.1-2.3 when they are uploaded later today.
Best,
/Hedvig
Now the full Assignment 2 is online! If you downloaded part II yesterday, throw this version away and download the full Assignment 2 as of today Friday, 14.15.
Good luck!
/Hedvig
Question 2.1, on the 6th line, "+" should rather be "-" right?
Yes that's true. Thanks for pointing this out.
Jens
This is now corrected in the pdf file!
Master students: For time management reasons, we will not correct late assignments until the late deadline, April 1, 2016. Thus, wait until then with the hand-in of your late Assignment 1!
PhD students: We will, as agreed upon earlier, correct your assignments when you hand them in. The requirement for a Pass grade is all tasks, i.e., the same requirement as an A for Master students.
Best,
/Hedvig
The grades for the first assignment are now up on RAPP.
I had a lovely weekend reading through your reports; there was some very impressive work that you should be very proud of. I would have loved to sit down with all of you and have a discussion about your work, but sadly there is no time for this. Tomorrow I will come to the lecture and hand back your reports in the break.
Carl Henrik for president
In 2.1, assume that all outcomes have a positive probability. This removes some pathological cases, which you may not have considered anyway :-).
Best,
Jens
The whitening in Task 2.5 requires you to do a singular value decomposition (SVD) to get the eigenvectors and eigenvalues of the data. For a concise description, see e.g. the Wikipedia page on PCA. Note that this is the deterministic version of PCA, not PPCA as you studied in Assignment 1, and that you compute a closed-form solution using SVD to find the eigenvectors and eigenvalues of the data.
I can recommend the Python function numpy.linalg.svd or the Matlab function svd.
Cheers,
/Hedvig
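For those who want a starting point, a minimal sketch of such a whitening step with numpy.linalg.svd (a rough illustration under the usual centring convention, not the official solution; the data here is a placeholder):

import numpy as np
X = np.random.randn(500, 2)               # placeholder data, one sample per row
Xc = X - X.mean(axis=0)                   # centre the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals = s**2 / (Xc.shape[0] - 1)        # eigenvalues of the sample covariance
X_white = Xc @ Vt.T / np.sqrt(eigvals)    # project onto the eigenvectors and rescale to unit variance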
Can we use R for the implementation tasks in the second assignment?
Could we have the .tex files for the assignments as well?
You can definitely use R! Really nice if students try different languages.
We prefer to not give out the source tex files - but there are tons of other tex examples to get inspiration from on the web!
All the best,
/Hedvig
In task 2.2, does each table have its own dice with its own categorical distribution, or do the two table classes T and T' have just two categorical distributions (one each), with all the member tables of a class using that class's distribution? I don't see how the two table classes are particularly meaningful if every table has its own distribution. The assignment text is ambiguous. I smell HMM.
This is a good question. As it is formulated, each table has its own categorical distribution. However, when you generate data, you don't necessarily have to use different categorical distributions across the tables of each class. It is of course true that the formulation doesn't make a whole lot of sense, but it is an assignment, and assignments are often somewhat contrived in order to get a problem of an appropriate difficulty.
Best,
Jens
In task 2.1, the interpretation of the influences (denoted by "+" and "-") essentially compares the probabilities of the outcomes ("0" or "1") of the parents (it says which parent has probably generated a certain child). In question 1, however, I feel that, besides comparing outcomes of parents, I need to compare the outcomes of a child ("0" or "1") for a given parent as well. What I mean is: if the outcome of a parent is "1" and the connection between this parent and a child has a "+", I would like to conclude that it is more probable that this child has outcome "1" than "0". But from the interpretation of the influences given in the assignment I can't conclude that (the interpretation doesn't compare the probabilities of different outcomes of children). I tried to derive it mathematically from the given rule, but I didn't reach any insightful result. I must be missing something. Could you help me with this? :)
Hi Hedvig!
In 2.5, the description of the data in the assignment doesn't seem to coincide with Figures a, b, c and d. That is, in the assignment it says: "Figure 2(c) shows two individual, independent signals over time.... represented as a point s_i in a 2D space, see Figure 2(b)." But Figures 2(a) and 2(b) show the mixed signals (the observed ones). In the citation above it sounds like the latent signals are being described. Am I correct or have I simply misunderstood the data?
You are completely correct, Elizaveta - I can only blame my workload... :)
A version of Assignment 2 where the Task 2.5 has been corrected is now uploaded.
Best,
/Hedvig
Do we have to write down the code in the report for question 5?
For questions 7 and 8 in section 2.3, are we allowed to use built-in libraries and just describe the algorithms?
Or do we have to implement the algorithms ourselves?
Yes, you should implement them yourselves. The main reason is that I strongly suspect that the wanted functionality cannot be found in standard packages.
Best,
Jens
How about this? http://mathworks.com/help/stats/hidden-markov-models-hmm.html
In 2.4, we're supposed to find the EM parameters for the model of a given player. Is it then OK to assume N graphical models in 2.2 and have the algorithms output the probabilities of tables for each player separately in 2.3?
I also have a question about EM. I understand it as if I should calculate \(p(Z|X,\theta)\) for all possible combinations of Z, i.e. the latent, single dice throws. That grows exponentially; is this OK or did I think wrong somewhere? My likelihood increases and converges, so it seems to work, it just scales really badly :)
Carl: These problems have been designed in order to not be solvable using standard methods and, thereby, give you an opportunity to develop a deeper understanding. So the right approach is to try to understand how they differ from a problem that can be solved using the standard methods and how a standard method needs to be tweaked or extended in order to solve the problem at hand.
Hlynur: No, the table parameters are the same for all the players.
Isac: It is not OK to get exponential running time. Exponential running time, e.g. caused by examining all combinations of hidden variables, can typically be avoided by noticing that it is sufficient to compute a marginal, in this case of \(p(Z|X,\theta)\).
Best,
Jens
In questions 7 and 8, what does z mean? In the instructions it says that z is the player's own dice, but then in question 7 it seems that z is a table. Can you help me with that? :)
Thanks for this excellent observation, Gabriela. As you suggest, I actually meant the tables in questions 7 and 8. That interpretation also renders the problems easier, so I suggest that you all use it. That is, interpret each \(Z_i\) in \(p(Z_1, \ldots, Z_K \mid s_1, \ldots, s_K, \Theta)\) as a table, or better, use some other variable, say \(Q_i\), and let \(p(Q_1, \ldots, Q_K \mid s_1, \ldots, s_K, \Theta)\) be the probability of the table sequence \(Q_1, \ldots, Q_K\) given the observations \(s_1, \ldots, s_K\) and the parameters \(\Theta\).
Best,
Jens
Hi Jens! I'm wondering about the indices in question 7. First, regarding the given table sequence \(z_1, \ldots, z_n\): is n the same N as in the casino description (the number of players), or is it just any n?
And for the dice-sum sequence \(s_1, \ldots, s_K\): I'm assuming K is the same as above, i.e. the number of tables visited per player?
Is there a relation between n and K in question 7, for example should n < K?
Mvh,
Sofia
The n in \(z_1, \ldots, z_n\) should be K and, yes, K is the number of tables visited.
Best,
Jens
I'm still struggling with the EM algorithm. The Q-term is supposed to be the expectation, or the weighted sum over all generating distributions. But what are the generating distributions really? And given some X and Z, this really leaves no distribution (over S anyway) since there is only one S = X + Z, right? So letting only Z be the latent variable at least leaves a distribution over X's. But still, how many are they? Are they not all the combinations of assignments of the _vector_ Z, that is, exponentially many? And I really don't see how to avoid considering all combinations of generating distributions.
Thanks
I assume that you use X_k and Z_k for the table's and the player's dice at step k. Notice that those are the hidden variables. So you should first assume that those are available to you and write an expression for the complete likelihood. Then you should consider the expected complete log-likelihood, that is, over X and Z (which are X_1,...,X_K and Z_1,...,Z_K, respectively) given S (all the sums). As you write, there are exponentially many X and Z, but notice that this has been the case in every example of the EM algorithm we have considered. However, in the end (after the standard steps), we have been left with marginals over single variables or pairs of variables, corresponding here to the X_i and Z_i variables.
Again, first make sure that you can express the likelihood for complete data.
Best,
Jens
Follow up question:
The innermost product looks like this for the complete data:
$$\log \left[ p(z_{nk})p(x_{nk})p(s_{nk}|x_{nk}, z_{nk}) \right] = \log \left[ p(z_{nk})p(x_{nk})I(s_{nk} = x_{nk} + z_{nk}) \right]$$
Is the complete data assumed to be more or less likely, or rather, is it a problem that you can get \(\log(0)\) when the indicator function is zero?
Getting log(0) is usually not a concern. Apart from that comment, it is hard to completely understand your question and to answer it without risking leading you in an incorrect direction. Sorry.
Best,
Jens
I have added some tips about Task 2.6 to the Assignments page in the course homepages. In short:
- Start asap, running times are very large!
- Read the tips before starting!
Best of luck,
/Hedvig, and Zheng who designed the task
Hi Jens! I'm trying to maximize the expected probability of the latent variables given some observed data, and wind up with this expression:
$$\frac{\partial E \log p(\mathbf{x}, \mathbf{y} | \mathbf{s})}{\partial p(x_{nk})} = \sum_{nk}^{NK} \sum_{a,b=1}^6 \frac{r_{nk}^{ab}}{p(x_{nk})} = 0$$
Where \(r_{nk}^{ab} = p(X_k = a, Y_n = b | s_{nk})\) is the responsibility of the dice-throw pair a and b for the nk'th dice sum. I'm not sure how to continue from here; I seem to recall from the lectures that you were supposed to get some kind of frequency here, but I don't really see how that would work out.
Any hints?
No, not really. It is hard to provide any guidance concerning the path forward here. It is odd, though, that you are taking derivatives. Also, where are the new parameters in the expression for the log-likelihood?
Do you have an expression for the complete likelihood, i.e., assuming that all variables are known to you?
Best,
Jens
I do have an expression \(p(\mathbf{s, x, y})\) which I rewrote to the log likelihood of the complete data, and then took the expectation of the latent variables given the observations, that is, \(E_{p(\mathbf{x, y} | \mathbf{s})} \log p(\mathbf{s, x, y})\).
Then you should proceed from there, and that should not give the previous expression.
Best,
Jens
I should clarify concerning the parameters -- the expected log probability is taken with the current estimate, so \(E_{p(\mathbf{x, y} | \mathbf{s}, \theta')} \log p(\mathbf{s, x, y| \theta})\) where the prime indicates current estimate.
Hi again Jens! Can I post the expression I got for the expected complete log-likelihood? I've looked at how others do EM for continuous distributions, and they generally seem to take derivatives of the kind I showed above; perhaps that's a mistake. I guess you're saying to take that expression and, instead of differentiating it, reason about how to maximize it in a similar way to how you did for Baum-Welch?
No, please don't post it, and not the final solution either :). Yes, I believe that you should be able to do it in the way I did it for Baum-Welch. As always, try to get expressions, solutions etcetera that are similar to those for the most similar problem.
Best,
Jens
In question 7, does theta contain any information of the probabilities of switching between primed/unprimed states or do we have to somehow learn these probabilities before calculating the sought output?
The transition probabilities are specified by the sentence: "In the k:th step, if the previous table visited was \(T_{k-1}\), the player visits \(T_k\) with probability 1/4 and \(T'_k\) with probability 3/4, and if the previous table visited was \(T'_{k-1}\), the player visits \(T'_k\) with probability 1/4 and \(T_k\) with probability 3/4." So you know them.
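Written as a transition matrix over the two table classes (unprimed and primed), this is just the sentence above restated:
$$\begin{pmatrix} p(T_k \mid T_{k-1}) & p(T'_k \mid T_{k-1}) \\ p(T_k \mid T'_{k-1}) & p(T'_k \mid T'_{k-1}) \end{pmatrix} = \begin{pmatrix} 1/4 & 3/4 \\ 3/4 & 1/4 \end{pmatrix}$$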
Best,
Jens
Hi Hedvig,
I have a question regarding LDA. I have plenty of words such as "at", "in", "and" etc. among the top 20 in each topic for K=3. I wonder if that is expected. Is there any remedy for this in case this kind of output is indeed expected?
Best,
Polina
Yes, this is correct! Think about why this is and write it in the report (not here). Also test with more topics. Compare classification accuracies for different numbers of topics. Is 3 enough, or does it get better with more? Why? Again, write this in the report, not here in the forum. Good luck! :)
Thanks for the quick reply. Yep, I can think of a number of reasons! Btw, could you maybe give us a hint regarding how to infer \(\theta_{m, test}\) from \(\beta\)?
Hi Hedvig!
About question 16, I didn't quite understand what you mean by "Are the correctly classified documents more typical for their class?". Could you elaborate on it? I think I may be confusing the three labels (classes) that we have with the K topics.
The topic representation can be seen as a low-dimensional representation of the data. In addition to that the data also comes from three classes. Each training doc has 1 class label, and the task is to classify the test doc as the majority vote among the k training docs that are closest in the topic space.
Polina, I have removed the statement "from Beta", that was confusing. You should infer theta, the topic distribution, of the test doc in the regular way, see the literature. (It of course involves beta but that is not needed info here.)
I have uploaded a new pdf file, where these things have been made clearer.
Thanks! Should one use Gibbs sampler on the test set, is that what you mean?
No, you should not change beta! you should simply infer the z corresponding to each word w in the test document. That you do from beta. But read the literature!
Assignment 2 is now updated to the correct version. There are very minor changes, so if you were fine with the old formulation of Task 2.6, you do not have to download the new one.
Is equation 3 in 2.7 correct? For the first denominator, shouldn't there at least be a parenthesis like for the second denominator? Likewise for the beta expression.
Thanks Kaj! The equations in task 2.7 have all been updated with that, a new version of the assignment has been uploaded!
Hi Hedvig!
In question 2.7, if we have detailed the derivation for the first term, do we also need to detail it for the second term, given that we follow exactly the same steps?
Thank you!
Salma
Yes, you need to derive the whole expression.
Hi Jens!
Some questions ask us to "provide an implementation", but the instructions say not to include source code. To clarify, would you like us to include source code?
If yes, should source code be inline in the report, or emailed as e.g. python files?
Best,
Joel
Yes, that's confusing. Follow the instructions and skip the source code.
Best,
Jens
Your reports on Assignment 2 have now been corrected and are reported into Rapp. Good work everyone! We realize you are really busy this time of year and hope that the learning experience was worth the effort - we saw a lot of interesting and insightful answers in the reports.
After New Year you will be able to pick up your report - more about how later.
If you got an F, the report features comments on what to correct before handing it in again to get an E. As mentioned above, you have until April 1, 2016 to complete the assignment.
But before that, a Merry Christmas and a Happy New Year - or simply a relaxing holiday! :)
Hedvig
How do we pick up our reports?
Those of you who passed can pick them up on the 18th in connection to the presentations. The students that did not pass have got an email from me with information on how to get their reports back.
All the best,
/Hedvig
The late hand-ins of assignments, April 1, have now been corrected and reported into Rapp. If your result is not in Rapp, let me know. They will be reported into Ladok in a couple of days.
I seem to be missing the result of the first assignment (the second one is reported in Rapp)
Joakim, it is sent to Carl Henrik - you only sent it to me, and I discovered that yesterday. So all is fine with it, and you will get the results soon.
Okay great, thanks!
I sent the first assignment to Carl Henrik. Was it wrong? Should I have sent both to you Hedvig?
Hi All,
For those of you who are waiting to get Assignment 1 ticked off, I'm still in the process of going through them. They will be reported into RAPP by the end of next week.
Fri 6 Nov: Is one page missing from the derivations? The last page is numbered 10 but there are only 9 pages. Looking at each one, it seems page 3 is missing. Or you skipped a number after page 2...
Thank you Kristófer,
I have simply made an error in the numbering, 3 is non-existent.
Cheers,
Carl Henrik
Also, I still have not scanned the derivation of the Gaussian marginal which I did not do yesterday in the interest of time and discussion. I do not have access to a scanner right now but will post them as soon as I do.
Hi Carl!
On slide 106 in lecture 3 (the one about GPs) there is an equation for the posterior: the mean is k(x_star, X)' K(X,X)^{-1} f. But if k(x_star, X) has size 200 x 7 then I can't really get the matrix sizes to match... Also, according to Wikipedia there is no transpose on k(x_star, X), so is the transpose a typo or am I missing something here?
Thanks :)
Hi, you are not missing anything at all; it's just about getting the dimensions to match up. The dimensions in your multiplications should be
[200x7][7x7][7x1] = [200x1], so according to my matrix above the equation, k(x_star, X) has to have dimension 200x7, which means that there should be no transpose.
Sorry about this, and hopefully it's clear now.
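A quick numpy check of those dimensions (placeholder matrices, just to illustrate the shapes, not actual GP code):

import numpy as np
k_star_X = np.random.randn(200, 7)           # k(x_star, X): 200 test points vs 7 training points
K_XX = np.eye(7)                             # K(X, X): placeholder 7x7 training covariance
f = np.random.randn(7, 1)                    # training targets
mean = k_star_X @ np.linalg.solve(K_XX, f)   # k(x_star, X) K(X,X)^{-1} f, no transpose needed
print(mean.shape)                            # (200, 1)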
Thanks :)
Could you perhaps upload the font demo code from yesterday? Thanks
Hi,
The code that I am using for this is a package called GPy which you can download from here https://github.com/SheffieldML/GPy . The model that I am using is a Bayesian GP-LVM model, under models, BGPLVM you can see how to run it. However, I am not allowed to share the data for the fonts as the one who created them wants to keep it to himself. Sorry about that, but you can try some other interesting data inside the GPy package, for example motion capture data.
I'm a bit curious about the oral assessments mentioned in the schedule, "Selected oral assessments during Friday 20 nov (see Assignments)"
How will this happen? When do you know if you have been selected for one?
Asking here since I can't find anything about it on the assignments page, and I'm not in Sweden at that date..
We have removed the formulation "selected oral assessments"; due to the large number of students, we have decided to go over to an entirely text-based examination of the two assignments.
Due to (or rather thanks to - it is great!) the large number of students, we have had to rebook lecture halls. Please look at the updated schedule on the course web pages, HT 2015 mladv15 > Schedule and course plan, and take note of the new rooms, starting Tuesday Nov 24.
We did not get larger rooms for some of the lectures; these are marked with boldface and "(small room)". These rooms fit around 60 students, which means that we will have to do an ad-hoc solution, e.g., use tables to sit on. We will sort it out - everyone will fit! The project presentations will be arranged in separate sessions so that only <60 students are present at the same time.
All the best,
/Hedvig
On Tuesday at 12-13, Jens will hold a help session in room 1448 on Lindstedtsv 3, floor 4. It is the room with entrance directly from the stairway, directly opposite the entrance to the computer halls.
This session will be useful for quite open questions where you need to discuss things. While waiting for this session, first try to pose your questions on the home page, so that all students can see the answers!
Was the help session moved somewhere else?
E3
Sorry, E31
Hi Hedvig,
I am not sure, but I think you said that you were open to suggestions for the content of the next class, scheduled for 15 December. I had a discussion with a few course mates. We thought it might be a good idea if you could talk about the recently concluded NIPS conference in Montreal.
I heard there was an overwhelming response from academia as well as industry this year. We would love to hear about the recent developments in the field of machine learning, especially work related to the content taught in this course. What skills do you think might prove to be crucial for academia as well as industry in the near future?
Thank you
That sounds like a really nice idea!
What do other students think about this?
Best,
/Hedvig
I like the idea!
Sounds like a great idea, really interested to hear about the frontier of the field :)
It sounds great!
OK then, let us decide that the session on Tuesday is spent looking at developments in the ML state of the art! I will give a short overview of the discussions during the last few years and up to now. We can then have a short look at three papers and how they relate to this discussion. Best, Hedvig (on my way home from NIPS)
The detailed schedule for the project presentations is now published on the Schedule and Course Plan page. We are looking forward to seeing your presentations on Monday between 14 and 18, and to receiving your reports via email on Monday at 12 noon!
Carl Henrik will not be able to make it unfortunately, but Jens and Hedvig are there to listen to you.
Please make sure that Hedvig has your slides on her computer before the start of your session - bring them on a USB stick to the lecture hall. A very strict 10-minute time limit applies, since we are no fewer than 21 groups!
Is the oral presentation on January 18 or 19? The dates are not consistent in the text: it says 19 under written report and 18 under oral presentation.
/Robin
It should be the 18th - the 19th was last year. Sorry about that!
Hi everyone, please visit the Project page and look at the group assignments!
All the best,
/Hedvig
A few additions to the project group list, give me a shout if I missed you!
I cannot seem to find the list of scientific articles; is it somewhere other than specified above?
Look at the literature list, page mladv15 > Literature and examination.
Is it? I can't find it either.
Ah, the project paper - sorry. That you will find using Google...
Papers have now been assigned to groups. The students in each group are jointly responsible for driving the project, asking the supervisor for advice, and making sure that all group members contribute to both the implementation of the method and to the writing of the report.
For issues relating to the collaboration in the group (rather than the project itself), contact Hedvig.
Good luck! We are greatly looking forward to the project results!
/Hedvig
We're having a discussion about whether to include criticism solely of the algorithmic method or also of the scientific method.
Both are interesting to discuss, I think!
Assignment 1 has now been published on the course homepage, under HT 2015 mladv15 > Assignments. Good luck with the assignment, and see you all tomorrow!