How to Train a Continuous Bag-of-Words Model?

I previously mentioned that you need context words and center words to train your continuous bag-of-words model. The question now is, how do you actually get these words? Let's dive in.

Previously, you cleaned and tokenized the corpus, and you now have this clean corpus as an array of words, or tokens. Now, I will show you how to extract center words and their context words, which will serve as examples to train the continuous bag-of-words model. Here's the code to do this in Python.

The get_windows function takes two arguments: words, which is an array of words, or tokens, although I'll stick with the term words here, and the context half-size, stored in the variable C, which is the number of words to be taken on each side of the center word. This was 2 in the previous video, for a total window size of 5.

The function initializes a counter with the index of the first word that has enough words before it. In the working example, "I am happy because I am learning," the tokenized array would be [I, am, happy, because, I, am, learning], where "I" has index 0, "am" has index 1, and so on. For a context half-size of 2, where the context words are the two words before and after the center word, the first center word that can be used is "happy." Its index is 2, which is equal to the context half-size. I then start a loop at this index, which will run until it reaches the last possible center word, that is, the last word that has two words after it, stopping just before it reaches the index corresponding to the number of words minus the context half-size.
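The code shown on screen is not included in the transcript, so here is a minimal sketch of what a get_windows function like the one described above might look like. It assumes the function yields (context words, center word) pairs one at a time; the details (such as yielding versus returning a list) may differ from the course's actual implementation.

def get_windows(words, C):
    """Yield (context_words, center_word) pairs from a list of tokens.

    C is the context half-size: the number of words taken on each
    side of the center word.
    """
    i = C  # index of the first word with enough words before it
    # Loop until the last word that still has C words after it,
    # i.e. stop just before index len(words) - C.
    while i < len(words) - C:
        center_word = words[i]
        # C words before the center word plus C words after it
        context_words = words[(i - C):i] + words[(i + 1):(i + C + 1)]
        yield context_words, center_word
        i += 1

Running it on the working example with a context half-size of 2:

tokens = ['i', 'am', 'happy', 'because', 'i', 'am', 'learning']
for context, center in get_windows(tokens, 2):
    print(context, center)
# ['i', 'am', 'because', 'i'] happy
# ['am', 'happy', 'i', 'am'] because
# ['happy', 'because', 'am', 'learning'] i

As described, the first usable center word is "happy" at index 2, and the loop stops after the last word that still has two words following it.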
