Can you help me understand this R question?
Provide an “R” script that includes code and explanatory #comments for the following steps:
Load the full 2018-2020 workspace.
1) Choose a set of key words or phrases that are useful for your team project and use GloVe word embeddings to find additional synonyms within the corpus. List any new words as a #comment
2) Generate a frequency table showing the appearances per document of your key words/phrases using the dfm_select or dfm_lookup function.
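A minimal sketch of step 2, assuming the quanteda package is installed. The two short texts and the key words "wage" and "union" are hypothetical stand-ins; the real corpus would come from the course workspace.

```r
library(quanteda)

# Toy documents standing in for the course corpus (assumption, not real data)
txt <- c(
  doc1 = "The minimum wage is too low. Wages must rise.",
  doc2 = "The union supports a higher wage for every worker."
)

# Tokenize and build a document-feature matrix (dfm lowercases by default)
toks  <- tokens(txt, remove_punct = TRUE)
dfmat <- dfm(toks)

# Option A: keep only the key-word columns, one column per word form
selected <- dfm_select(dfmat, pattern = c("wage", "wages", "union"))

# Option B: collapse word forms into dictionary keys, one column per concept
dict   <- dictionary(list(wage = c("wage", "wages"), union = "union*"))
looked <- dfm_lookup(dfmat, dictionary = dict)

# Frequency table: one row per document, one column per key
freq_table <- convert(looked, to = "data.frame")
print(freq_table)
```

dfm_select keeps each spelling separate, while dfm_lookup sums the variants under one key, which is usually the easier table to read.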
3) Use the kwic function to extract a text window around one of your key words or phrases and combine the pre- and post- windows.
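One possible way to approach step 3, again assuming quanteda and using made-up example sentences. The column name window_text is a hypothetical choice.

```r
library(quanteda)

# Toy documents standing in for the course corpus (assumption, not real data)
txt <- c(
  doc1 = "Workers in this state deserve a living wage that keeps pace with rent.",
  doc2 = "Wages have stagnated for a decade."
)
toks <- tokens(txt, remove_punct = TRUE)

# Keyword-in-context: 3 tokens on each side of any token starting with "wage"
# (glob matching and case-insensitivity are the kwic defaults)
kw    <- kwic(toks, pattern = "wage*", window = 3)
kw_df <- as.data.frame(kw)

# Combine the pre- and post- windows into one string per match
kw_df$window_text <- trimws(paste(kw_df$pre, kw_df$post))
print(kw_df[, c("docname", "keyword", "window_text")])
```

The combined window_text column can then be passed to whichever readability, lexical diversity, or sentiment function is chosen for step 4.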
4) Choose one of the following and analyze the text windows: readability, lexical diversity, or one of the sentiment analysis approaches.
5) Write a few sentences at the end about what this analysis shows you (include this as # comments at the end of your R script).
6) # Please use the stringr and regex syntax to view all instances of “wage,”
# “wages,” “Wage,” and “Wages” in the second document.
# Which member of Congress utters the word “wage” (or a variant thereof) in this document?
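A hedged sketch for Problem 6, assuming the stringr package. The string doc2 is an invented stand-in for the second document; in the real script it would be pulled from the workspace text object.

```r
library(stringr)

# Invented stand-in for the second document (assumption, not real data)
doc2 <- paste(
  "Mr. SANDERS. Wages have not kept up. A living wage matters.",
  "Ms. WARREN. The minimum Wage is too low; wages must rise.",
  sep = "\n"
)

# [Ww] accepts either capitalization, s? makes the plural optional,
# and \\b keeps the match from firing inside longer words
wage_pattern <- "\\b[Ww]ages?\\b"

# All matches, in order of appearance
wage_matches <- str_extract_all(doc2, wage_pattern)[[1]]
print(wage_matches)

# Lines that contain a match, which shows who is speaking
doc_lines  <- str_split(doc2, "\n")[[1]]
wage_lines <- doc_lines[str_detect(doc_lines, wage_pattern)]
print(wage_lines)
```

For an interactive look at the matches in context, str_view(doc2, wage_pattern) highlights them in the console or viewer.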
7) # Extract all instances of “wage” (and its variants) that occur within 50
# characters of the word “living” in the first 50 documents. Save these matches
# to an object named “living_wage”.
# How many matches did you find?
# Which of the first 10 documents has the highest number of matches?
# What were the phrases captured by the regex?
# Now run the code again, expanding the window to 100, 200, and 500 characters.
# Does the regex find any additional phrases? If so, what are they?
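A rough sketch for Problem 7, assuming stringr. The helper build_pattern and the three toy documents are hypothetical. "Within N characters" is read here as at most N characters between the two words, in either order.

```r
library(stringr)

# Toy documents standing in for the first 50 documents (assumption)
docs <- c(
  "We need a living wage for every worker.",
  "Wages are stagnant while the cost of living keeps rising.",
  "This bill concerns highway funding."
)

# Hypothetical helper: builds the proximity regex for a given window size.
# dotall = TRUE lets "." cross line breaks; ignore_case covers Wage/Wages.
build_pattern <- function(window) {
  regex(
    sprintf(
      "\\bliving\\b.{0,%d}?\\bwages?\\b|\\bwages?\\b.{0,%d}?\\bliving\\b",
      window, window
    ),
    ignore_case = TRUE, dotall = TRUE
  )
}

# One list element per document, holding the matched phrases
living_wage <- str_extract_all(docs, build_pattern(50))

matches_per_doc <- lengths(living_wage)
print(sum(matches_per_doc))        # total number of matches
print(which.max(matches_per_doc))  # first document with the most matches
print(unlist(living_wage))         # the captured phrases

# Re-run with wider windows and compare the match counts
for (w in c(100, 200, 500)) {
  print(sum(lengths(str_extract_all(docs, build_pattern(w)))))
}
```

On the real data, docs would be replaced by the first 50 elements of the text object, and matches_per_doc[1:10] answers the question about the first 10 documents.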
8) # Use the kwic command to extract a 100-token window around the regex you wrote
# for Problem 7. Save this kwic object as “lw_window” and convert it into a data
# frame named “df_lw_window”.
# What are the dimensions of this data frame?
9) As you may have noticed, many words are split in half by a hyphen followed
# by at least one space (“- ”). This is a function of how the PDF documents were
# originally formatted and the difficulty of converting these documents to plain
# text.
# Write a regex to replace all occurrences of this break in the first 10 documents
# in the cr_txt object and create a new data frame named “cr_txt_cleaned.”
# Then check the text to make sure that you have performed this replacement
# properly.
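A minimal sketch for Problem 9, assuming stringr and assuming the text is a character vector. The vector cr_txt_sample is invented; the structure of the real cr_txt object is not shown in the question.

```r
library(stringr)

# Invented stand-in for the first documents of cr_txt (assumption)
cr_txt_sample <- c(
  "The mini- mum wage in- crease was debated.",
  "Mem- bers of Congress spoke at length."
)

# A letter, a hyphen, one or more whitespace characters, then a lowercase
# letter: drop the hyphen and the whitespace, keep the letters on each side
break_pattern <- "([A-Za-z])-\\s+([a-z])"
cleaned_text  <- str_replace_all(cr_txt_sample, break_pattern, "\\1\\2")

cr_txt_cleaned <- data.frame(
  doc_id = seq_along(cleaned_text),
  text   = cleaned_text,
  stringsAsFactors = FALSE
)

# Check: no split words should remain after the replacement
print(any(str_detect(cr_txt_cleaned$text, break_pattern)))
print(cr_txt_cleaned$text)
```

One caveat: a genuine compound that happens to break at its hyphen (for example "well- known") would also be joined, so spot-checking the output is still worthwhile.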
10) We will be leveraging the Congressional Record’s relatively uniform
# structure to split the text at the beginning of a speaker’s statement. These
# transitions (1) start with a new line, (2) include the word “Mr.” or “Ms.”, and
# (3) list the name of the speaker in ALL CAPS.
# Write a regex to match and extract all instances of a new speaker in the first
# 10 documents. Save these matches in an object named “speakers” and convert
# this list object into a data frame.
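A sketch of one way to write the speaker regex for Problem 10, assuming stringr. The sample text, the data frame name speakers_df, and the inclusion of "Mrs." are assumptions. Names with lowercase letters (such as "McCONNELL") would need a looser pattern.

```r
library(stringr)

# Invented stand-in for the first documents (assumption, not real data)
docs <- c(
  "  Mr. SANDERS. Mr. President, I rise today.\n  Ms. WARREN. I thank the Senator.",
  "  Mr. VAN HOLLEN. Workers deserve better."
)

# multiline = TRUE makes ^ match at the start of every line.
# Title, then one or more ALL-CAPS name parts, then a look-ahead for the period.
speaker_pattern <- regex(
  "^[ \\t]*(?:Mr|Ms|Mrs)\\.\\s+[A-Z]{2,}(?:[ '-][A-Z]{2,})*(?=\\.)",
  multiline = TRUE
)

# List with one character vector of matches per document
speakers <- str_extract_all(docs, speaker_pattern)

# Flatten the list into a data frame, keeping track of the source document
speakers_df <- data.frame(
  doc_id  = rep(seq_along(speakers), lengths(speakers)),
  speaker = str_trim(unlist(speakers)),
  stringsAsFactors = FALSE
)
print(speakers_df)
```

"Mr. President" in the first line is not captured, both because it is not at the start of a line and because "President" is not in all caps.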
11) Segment the text by speaker
12) Merge in covariates of interest (State, Minimum Wage, Union)
13) Troubleshoot failed merges by harmonizing member names.
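Steps 12 and 13 could be sketched in base R as follows. Both data frames, their column names, and the word counts are made up for illustration; the real covariate file would also carry the Minimum Wage and Union columns.

```r
# Toy speech-level data, as it might look after segmenting by speaker (assumption)
speeches <- data.frame(
  speaker = c("Mr. SANDERS", "Ms. WARREN", "Mr. VAN HOLLEN"),
  n_words = c(120, 85, 60),
  stringsAsFactors = FALSE
)

# Toy covariate table with differently formatted names (assumption)
covariates <- data.frame(
  member = c("Sanders", "Warren", "Van Hollen "),
  state  = c("VT", "MA", "MD"),
  stringsAsFactors = FALSE
)

# Naive merge: every row fails because the name formats differ
naive  <- merge(speeches, covariates, by.x = "speaker", by.y = "member", all.x = TRUE)
failed <- naive$speaker[is.na(naive$state)]
print(failed)

# Harmonize: strip the title, trim whitespace, upper-case both sides
speeches$key   <- toupper(trimws(sub("^(Mr|Ms|Mrs)\\.\\s+", "", speeches$speaker)))
covariates$key <- toupper(trimws(covariates$member))

# Merge again on the shared key; all.x = TRUE keeps unmatched speeches visible
merged <- merge(speeches, covariates, by = "key", all.x = TRUE)
print(merged[, c("speaker", "state", "n_words")])
```

Keeping all.x = TRUE and listing the rows with missing covariates after each pass makes it easy to see which names still need a manual fix.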
14) Choose at least one text comparison method and generate a comparison between subsets of your text (for example, speeches by Republicans versus speeches by Democrats and Independents).
15) Write a few sentences at the end about what this comparison shows you (include this as # comments at the end of your R script).
Requirements: R Data Script – 15 Questions