📰 A model that predicts shipping-industry economic trends from news articles

soykeepgoing/shipping-sentiment-index

Shipping Sentiment Index

(2021~2022) Shipping Sentiment Index: a news-data-based index for forecasting shipping-industry business conditions
Update: 2022-04-28

About this project

  • Project name: Development of a nowcasting index for shipping-industry business conditions using news data
  • Project purpose: training activity for the 2021 Public Big Data Internship
  • Project period: September 2021 ~ February 2022
  • Participants: 1

Overview

Goal

  • (Purpose) Apply text mining and sentiment analysis to news data to compute a news-data index and use it for forecasting business conditions.
  • (Need) Quickly grasp the current state and direction of change of the real economy in the relatively unfamiliar shipping sector, and help economic agents prepare agile responses.

Flow

Detail Function

Analysis Sentimental

(1) Crawling and Merging
File location: Developing-CurrentForecastIndex-for-ShippingIndustry/1. Analysis Sentimental/(1) Crawling & Merging/

  • To build the training data for the model, the following data were crawled from Naver and Daum.
    News searched: most-viewed news / most-commented news (March 2021 ~ November 2021)
    • News date
    • News title
    • News body
    • News URL
    • Sentiment rating counts shown below the article
      • Like (좋아요)
      • Moved (감동이에요)
      • Sad (슬퍼요)
      • Angry (화가 나요)
  • The sentiment score of each article is computed as follows.
    • Positive ratings (Like, Moved) - negative ratings (Sad, Angry)
    • If the result is positive, the article is tagged 1 (positive); if negative, 0 (negative).
  • After crawling, all articles are merged, and also merged by month.
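
The labeling rule above can be sketched as a small function. This is only an illustration: the function name and argument names are hypothetical, and the README does not say what happens when positive and negative ratings are equal, so the sketch assumes such articles are left unlabeled.

```python
from typing import Optional


def sentiment_label(like: int, moved: int, sad: int, angry: int) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None for a tie.

    The score is (Like + Moved) - (Sad + Angry), as described above.
    Returning None for a tie is an assumption; the README does not cover it.
    """
    score = (like + moved) - (sad + angry)
    if score > 0:
        return 1
    if score < 0:
        return 0
    return None


# Hypothetical reaction counts for three articles
labels = [
    sentiment_label(like=10, moved=2, sad=3, angry=1),
    sentiment_label(like=1, moved=0, sad=5, angry=2),
    sentiment_label(like=1, moved=1, sad=1, angry=1),
]
print(labels)
```

Dropping tied articles keeps the training labels strictly binary, which matches the sigmoid output used by the classifier.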

(2) Modeling
Preprocessing

```python
from konlpy.tag import Mecab


def common_word_list(common_num, neg, pos):
    """Return the words that are among the common_num most frequent in both classes.

    neg and pos are expected to be collections.Counter objects holding the
    token frequencies of the negative and positive articles.
    """
    negative_word = {word for word, _ in neg.most_common(common_num)}
    positive_word = {word for word, _ in pos.most_common(common_num)}
    common_list = list(negative_word & positive_word)
    print(common_list)
    print('common_list length:', len(common_list))
    return common_list


mecab = Mecab()
stopwords = ['했', '있', '으로', '로', '것', '씨', '말', '도', '는', '다', '의', '가', '이', '은', '수', '에서', '한', '에', '하', '고', '을', '를', '인', '듯', '과', '와', '네', '들', '듯', '지', '임', '게', '만', '겜', '되', '음', '면']

# Morphological analysis of the Sentence column (each value becomes a list of morphemes)
train_data['tokenized'] = train_data['Sentence'].apply(mecab.morphs)
# Remove tokens that are in the stopword list
train_data['tokenized'] = train_data['tokenized'].apply(lambda x: [item for item in x if item not in stopwords])
# Keep only tokens of length 2 or more
train_data['tokenized'] = train_data['tokenized'].apply(lambda x: [item for item in x if len(item) > 1])
# Remove tokens common to both classes; common_list is the return value of common_word_list()
train_data['tokenized'] = train_data['tokenized'].apply(lambda x: [item for item in x if item not in common_list])
```
  • The article body text was preprocessed according to the following criteria.
    • (1) Remove stopwords consisting of particles, endings, and the like
    • (2) Remove words shorter than 2 characters
    • (3) Build a common word list (function common_word_list) and remove the words in it

Integer encoding and padding

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

### Integer encoding ###
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train)  # build the vocabulary and assign an index to each word

# total_cnt: number of distinct words; rare_cnt: number of words whose frequency is
# below the threshold. Both are computed from the fitted tokenizer (see the notes below).
vocab_size = total_cnt - rare_cnt + 2  # size of the vocabulary actually used

tokenizer = Tokenizer(vocab_size, oov_token='OOV')  # re-create the tokenizer with the new vocab_size
tokenizer.fit_on_texts(X_train)
X_train = tokenizer.texts_to_sequences(X_train)  # encode X_train (X_test is encoded the same way)


### Padding ###
def below_threshold_len(max_len, nested_list):
    """Print the percentage of samples whose length is at most max_len.

    max_len is chosen by comparing it with the maximum and average sample length.
    """
    count = 0
    for sentence in nested_list:
        if len(sentence) <= max_len:
            count = count + 1
    print('Percentage of samples with length <= %s: %s' % (max_len, (count / len(nested_list)) * 100))


max_len = 1000
below_threshold_len(max_len, X_train)
X_train = pad_sequences(X_train, maxlen=max_len)
```
  • Set the range for integer encoding.
    • Compute the total number of words (total cnt) and the number of rare words (rare cnt) whose frequency is below the threshold.
  • Padding
    • Vary max_len, check the proportion of samples covered (function below_threshold_len), then apply pad_sequences.
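
As a rough illustration of how total cnt, rare cnt, and vocab_size relate, the sketch below counts words with plain Python instead of the Keras Tokenizer. The function name, the toy documents, and the threshold value are all hypothetical; the README does not state the threshold that was used. The "+ 2" most likely accounts for the padding index 0 and the OOV token, which the Keras Tokenizer places at index 1 when oov_token is set.

```python
from collections import Counter


def vocab_size_from_counts(tokenized_docs, threshold):
    """Return (total_cnt, rare_cnt, vocab_size) for a list of token lists."""
    counts = Counter(token for doc in tokenized_docs for token in doc)
    total_cnt = len(counts)  # number of distinct words
    rare_cnt = sum(1 for c in counts.values() if c < threshold)  # words rarer than the threshold
    vocab_size = total_cnt - rare_cnt + 2  # +2 for the padding index and the OOV token
    return total_cnt, rare_cnt, vocab_size


# Toy, already-tokenized documents (hypothetical)
docs = [
    ["freight", "rate", "rise"],
    ["freight", "rate", "fall"],
    ["port", "freight"],
]
total_cnt, rare_cnt, vocab_size = vocab_size_from_counts(docs, threshold=2)
print(total_cnt, rare_cnt, vocab_size)  # prints: 5 3 4
```

Here "rise", "fall", and "port" each occur once, so with a threshold of 2 only "freight" and "rate" remain in the vocabulary.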

Model creation

```python
from tensorflow.keras.layers import Embedding, Dense, LSTM, Bidirectional
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

embedding_dim = 100
hidden_units = 128

# Embedding -> bidirectional LSTM -> single sigmoid unit for binary sentiment
model = Sequential()
model.add(Embedding(vocab_size, embedding_dim))
model.add(Bidirectional(LSTM(hidden_units)))
model.add(Dense(1, activation='sigmoid'))

# Stop early when the validation loss stops improving; keep the best model by validation accuracy
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=4)
mc = ModelCheckpoint('best_model.h5', monitor='val_acc', mode='max', verbose=1, save_best_only=True)

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = model.fit(X_train, y_train, epochs=15, callbacks=[es, mc], batch_size=256, validation_split=0.2)

loaded_model = load_model('best_model.h5')
print("Test accuracy: %.4f" % (loaded_model.evaluate(X_test, y_test)[1]))
```
  • Requires TensorFlow; the Keras layers, models, and callbacks used are imported at the top of the snippet.
  • The data were trained with a Bidirectional LSTM.
  • Loss: 0.4597 // Accuracy: 0.8141

Handling Shipping News

(1) Crawling
File location: Developing-CurrentForecastIndex-for-ShippingIndustry/2. Handling Shipping News/Crawling_뉴스데이터_Shipping.ipynb

  • To build the input data for the sentiment classifier, news data were crawled from the bigkinds site.
  • The crawled data are as follows.
    • Search keywords: 해운업 (shipping business), 해운산업 (shipping industry), 해운경기 (shipping market conditions), 해운업계 (shipping sector)
    • Search period: January 2000 ~ November 2021
    • News title
    • News date
    • News body
    • News URL

(2) Topic Modeling
For each of the 80 topics, the top 25 related words were extracted; after a coherence check, the NMF topics were adopted.
LDA Topic Modeling

```python
# Required packages
from gensim import corpora, models
from gensim.models.coherencemodel import CoherenceModel
from gensim.models.ldamodel import LdaModel
from gensim.corpora.dictionary import Dictionary
from gensim.test.utils import common_texts
from gensim.test.utils import datapath

# gensim's bundled demo: build a dictionary and corpus from common_texts and fit an LdaModel
common_dictionary = Dictionary(common_texts)
common_corpus = [common_dictionary.doc2bow(text) for text in common_texts]
lda = LdaModel(common_corpus, num_topics=80)

# Extract words from document (the news data) to build the corpus
data_word = [[word for word in x.split(' ')] for x in document]
id2word = corpora.Dictionary(data_word)
texts = data_word
corpus = [id2word.doc2bow(text) for text in texts]
print("Corpus Ready")

# Fit LDA on the news corpus
num_topics = 80
num_words = 25  # number of related words returned per topic (top 25, as described above)
lda = LdaModel(corpus=corpus, id2word=id2word, num_topics=num_topics)
print("lda done, please wait")

# Output: collect the top words of each topic
word_dict = {}
for i in range(num_topics):
    words = lda.show_topic(i, topn=num_words)
    word_dict['Topic # ' + '{:02d}'.format(i + 1)] = [w[0] for w in words]
print("Result_out")
```
  • LDA topic modeling with the gensim package
  • Documents are classified into 80 topics

NMF Topic Modeling

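```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.decomposition import NMF
from sklearn.preprocessing import normalize

# Build the count vectors
vectorizer = CountVectorizer(analyzer='word')
x_counts = vectorizer.fit_transform(text)

# Convert counts to TF-IDF and L2-normalize each document
transformer = TfidfTransformer(smooth_idf=False)
x_tfidf = transformer.fit_transform(x_counts)
xtfidf_norm = normalize(x_tfidf, norm='l2', axis=1)
print("xtfidf_norm Ready")

model = NMF(n_components=80, init='nndsvd')
model.fit(xtfidf_norm)  # fit NMF on the normalized TF-IDF matrix
print("Model Ready")

# components_df is not defined in the original snippet; it is assumed here to hold one row
# per topic and one column per vocabulary word (get_feature_names_out needs scikit-learn >= 1.0).
components_df = pd.DataFrame(model.components_, columns=vectorizer.get_feature_names_out())

# Output: the 25 highest-weighted words of each topic
for topic in range(components_df.shape[0]):
    tmp = components_df.iloc[topic]
    print(f'For topic {topic+1} the words with the highest value are:')
    print(tmp.nlargest(25))
```
  • NMF topic modeling with sklearn
  • Likewise, documents are classified into 80 topics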

Calculating Index

(1) Topic Count

  • For the news data from January 2000 to October 2021, the related words of each topic extracted with NMF are counted.
  • To compute the monthly index, the number of topic words in each news item is tallied by day.
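
A minimal sketch of the daily tally, assuming each article is already tokenized and each topic is represented by a set of its related words. The function name, topic names, dates, and tokens are hypothetical and not taken from the repository.

```python
from collections import defaultdict


def daily_topic_counts(articles, topic_words):
    """Count, per day and per topic, how many tokens are related words of that topic.

    articles: iterable of (date_string, list_of_tokens)
    topic_words: dict mapping a topic name to a set of related words
    """
    counts = defaultdict(lambda: defaultdict(int))
    for date, tokens in articles:
        for topic, words in topic_words.items():
            counts[date][topic] += sum(1 for token in tokens if token in words)
    return {date: dict(per_topic) for date, per_topic in counts.items()}


# Hypothetical topics and articles
topic_words = {
    "freight": {"freight", "container"},
    "shipbuilding": {"shipyard", "order"},
}
articles = [
    ("2021-10-01", ["freight", "container", "port"]),
    ("2021-10-01", ["shipyard", "freight"]),
    ("2021-10-02", ["order", "order"]),
]
daily = daily_topic_counts(articles, topic_words)
print(daily)
```

Monthly totals can then be obtained by summing the daily dictionaries that share the same year-month prefix.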

(2) Sentimental Index
Daily Sentimental

```python
# Function that predicts the sentiment score of one news item
def sentiment_predict(new_sentence):
    # new_sentence should be preprocessed the same way as the training data
    encoded = tokenizer.texts_to_sequences([new_sentence])  # integer encoding
    pad_new = pad_sequences(encoded, maxlen=max_len)  # padding
    score = float(loaded_model.predict(pad_new)[0][0])  # predicted probability of the positive class
    return score
```
  • The function above predicts the positive and negative scores of each news item.
  • After prediction, the positive - negative value, which serves as a simple sentiment score, is appended.

Monthly Sentimental

```python
import numpy as np

LSenti = []
pre = 0  # position of the first news item of the current month
for i in LCount:  # LCount: number of news items in each month
    df_tmp = df_Sentimental[pre:pre + i]
    # The monthly sentiment index is based on the mean of the daily sentiment scores
    MeanPos = np.mean(df_tmp['Pos'].tolist())
    MeanNeg = np.mean(df_tmp['Neg'].tolist())
    SentiIndex = (MeanPos - MeanNeg) * 100
    LSenti.append(round(SentiIndex, 1))
    pre += i  # advance to the first news item of the next month
```
  • The daily predicted sentiment scores were converted to a monthly index as follows.
    • (1) For each month, average the positive and negative scores over that month's news items.
    • (2) Compute the sentiment index as (mean positive - mean negative) * 100.

(3) Index

  • The news-data index formulas were designed following prior research.
    • Combined index 1: sentiment index * share (%) of the 10 related words of the top 20 topics by topic weight
    • Combined index 2: sentiment index * share (%) of the topic words of the top 20 topics by inter-topic correlation
    • Combined index 3: sentiment index * share (%) of the topic words of the top 20 topics by topic-production correlation
      Here, the water transport production index (Statistics Korea) was used as the reference for correlation with shipping production.
  • Of the three indices, combined index 3, which showed the strongest correlation (in absolute value) with the actual indicator, was selected as the news-data index.
    • Actual indicator: the industrial production index for Korea published by the OECD
    • Correlation coefficients between each index and the actual indicator
      Combined index (1)   Combined index (2)   Combined index (3)
      -0.295               -0.343               -0.493
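
The combination and comparison steps can be sketched as follows. This assumes the combined index is the element-wise product of the monthly sentiment index and the monthly topic-word share; the README does not say whether the share is rescaled first, which would not change the correlation anyway. The function names and all numbers are hypothetical, not project data.

```python
import math


def combined_index(sentiment, topic_share_pct):
    """Element-wise product of the monthly sentiment index and the topic-word share (%)."""
    return [s * w for s, w in zip(sentiment, topic_share_pct)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Hypothetical monthly series
sentiment = [12.0, -4.5, 3.0, 8.5]
share_pct = [20.0, 35.0, 25.0, 30.0]
production = [101.2, 99.8, 100.5, 102.1]

index = combined_index(sentiment, share_pct)
print(index)
print(pearson(index, production))
```

Computing this coefficient once per candidate index gives the kind of comparison summarized in the table above.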

Data Analysis

(1) Data Set Ready

  • Before building the models, a dataset of real shipping-industry indicators, to be used in the nowcasting model alongside the news data, was assembled.
  • Following Stopford (2008), Chen et al. (2015), and Choi, Kim and Han (2018), indicators were collected for the supply, demand, freight rate and price, and economic conditions segments of the shipping market. Because the goal is to forecast Korea's shipping production index, KOSPI, CLI (KOREA), and others were added, giving 26 real indicators in total.
  • Because these are monthly time series, they were made stationary with Bpanel (library tseries).
    • trans code 3 was used, which corresponds to the year-over-year growth rate.

(2) Modeling

  • The nowcasting model proposed by Domenico Giannone (2008) was used to forecast shipping-industry business conditions.
  • Three models were built to evaluate forecasting performance.
    • Autoregressive model
    • Dynamic factor model: real indicators only
    • Nowcasting model: real indicators + news-data index
  • RMSE and MAE of each model
    Metric   Autoregressive model   Dynamic factor model   Nowcasting model
    RMSE     0.06661                0.03762                0.03754
    MAE      0.04841                0.02794                0.02784
  • In this comparison, the nowcasting model (real indicators + news-data index) performed best.
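
For reference, the two error metrics in the table can be computed as in the sketch below. The function names and sample values are illustrative only; the project's forecasts were produced in R and are not reproduced here.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical year-over-year growth rates and forecasts
actual = [0.02, -0.01, 0.04, 0.03]
forecast = [0.01, 0.00, 0.05, 0.01]
print(rmse(actual, forecast), mae(actual, forecast))
```

RMSE penalizes large misses more heavily than MAE, so reporting both shows whether a model's advantage comes from fewer large errors or from a uniformly closer fit.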

Environment

  • Python (3.7.3)
  • R (4.1.2)
  • JupyterNotebook
