List comprehension in Our_Tokenizer class and episode flow

When the class is first created we use:

```python
import en_core_web_sm

class Our_Tokenizer:
  def __init__(self):
    # load spacy tokenizer/language model
    self.nlp = en_core_web_sm.load()
    self.nlp.max_length = 4500000 # increase max number of characters that spacy can process (default = 1,000,000)
  def __call__(self, document):
    tokens = self.nlp(document)
    simplified_tokens = [str.lower(token.lemma_) for token in tokens]
    return simplified_tokens
```

This issue relates to this line:
```python
simplified_tokens = [str.lower(token.lemma_) for token in tokens]
```

Using a list comprehension like this makes the code shorter, but then we have to explain list comprehensions to learners. Not the worst thing.
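For what it's worth, the note for learners could be quite small. Here is a rough sketch of the kind of side-by-side example it might include; the `lemmas` list is made-up data standing in for spaCy's `token.lemma_` values, not something from the episode.

```python
# Made-up lemmas standing in for the token.lemma_ values spaCy would return.
lemmas = ["The", "Cat", "Sit"]

# For-loop version: start with an empty list and append one item at a time.
lowered_loop = []
for lemma in lemmas:
    lowered_loop.append(lemma.lower())

# List comprehension version: the same loop written as a single expression.
lowered_comp = [lemma.lower() for lemma in lemmas]

print(lowered_loop)
print(lowered_comp)
```

Both versions build the same list, so the note only needs to show that the comprehension is a compact way of writing the loop.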

However, when we incorporate stop words into the class, we use a for-loop:

```python
    simplified_tokens = []    
    for token in tokens:
        if not token.is_stop and not token.is_punct:
            simplified_tokens.append(str.lower(token.lemma_))
```

Then we switch back to a more complex list comprehension later:

```python
    simplified_tokens = [
      token for token in tokens
      if not token.is_stop
      and not token.is_punct
      and token.pos_ in {"ADJ", "ADV", "INTJ", "NOUN", "VERB"}
    ]
```

We should either stick with list comprehension (and include a brief note about what that is) or stick to a for-loop approach throughout this episode. 
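To make the two options concrete, here is a minimal sketch of the stop word filter written both ways. `FakeToken` is a hypothetical stand-in I made up so the example runs without spaCy; it only mimics the `lemma_`, `is_stop`, and `is_punct` attributes used in the snippets above.

```python
from dataclasses import dataclass


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy token (not part of the episode)."""
    lemma_: str
    is_stop: bool = False
    is_punct: bool = False


def simplify_with_loop(tokens):
    # For-loop style: append only tokens that pass the filter.
    simplified_tokens = []
    for token in tokens:
        if not token.is_stop and not token.is_punct:
            simplified_tokens.append(token.lemma_.lower())
    return simplified_tokens


def simplify_with_comprehension(tokens):
    # List comprehension style: same filter, written as one expression.
    return [
        token.lemma_.lower()
        for token in tokens
        if not token.is_stop and not token.is_punct
    ]


example_tokens = [
    FakeToken("the", is_stop=True),
    FakeToken("Cat"),
    FakeToken("sit"),
    FakeToken(".", is_punct=True),
]

print(simplify_with_loop(example_tokens))
print(simplify_with_comprehension(example_tokens))
```

Either style gives the same result here, so whichever one the episode picks could be used for all three versions of the class.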


