Sophia He's Data Blog

Articles by Sophia He

Credit Risk Modeling in R

Sat 20 July 2019
By Sophia He

Introduction and EDA

Components of expected loss (EL):
- Probability of default (PD)
- Exposure at default (EAD): the amount of the loan that still needs to be repaid at the time of default
- Loss given default (LGD): the amount of loss if there is a default (expressed as a percentage of …
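The three components combine multiplicatively, EL = PD × EAD × LGD. The following is a minimal sketch of that formula in Python (Python rather than R, to keep one language for the examples on this page). The loan figures are made up for illustration, and the helper name expected_loss is hypothetical.

```python
def expected_loss(prob_default, exposure, loss_given_default):
    """Expected loss of one exposure: EL = PD * EAD * LGD.

    prob_default: probability of default (PD), between 0 and 1
    exposure: exposure at default (EAD), in currency units
    loss_given_default: LGD as a fraction of EAD, between 0 and 1
    """
    if not (0.0 <= prob_default <= 1.0 and 0.0 <= loss_given_default <= 1.0):
        raise ValueError("PD and LGD must lie in [0, 1]")
    return prob_default * exposure * loss_given_default


# Hypothetical loans as (PD, EAD, LGD) tuples
loans = [(0.02, 10_000, 0.6), (0.10, 5_000, 0.4)]

# Expected losses add up across loans, so the portfolio EL is the sum
portfolio_el = sum(expected_loss(p, e, l) for p, e, l in loans)
print(portfolio_el)
```

For the first loan, EL = 0.02 × 10,000 × 0.6 = 120. For the second, 0.10 × 5,000 × 0.4 = 200, so the portfolio total is 320.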


Tagged as: Finance, R

Statistical Thinking in Python (Part 2)

Parameter estimation by optimization

nohitter_times is a NumPy array of length 251: array([ 843, 1613, 1101, 215, 684, 814, ...])

```python
import numpy as np

# Seed random number generator
np.random.seed(42)

# Compute mean no-hitter time: tau
tau = np.mean(nohitter_times)

# Draw out of an exponential distribution with parameter tau: inter_nohitter_time
inter_nohitter_time = np.random …
```
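For an exponential model, the value of tau that maximizes the likelihood is the sample mean, which makes np.mean a natural estimate. The sketch below checks this numerically by evaluating the log-likelihood over a grid of tau values. The data are synthetic stand-ins for nohitter_times (251 draws with an arbitrarily chosen mean of 760), not the real no-hitter data, and exp_log_likelihood is a hypothetical helper.

```python
import numpy as np


def exp_log_likelihood(tau, data):
    """Log-likelihood of i.i.d. data under an exponential distribution with mean tau.

    tau may be a scalar or a NumPy array of candidate values (broadcasts).
    """
    data = np.asarray(data, dtype=float)
    return -data.size * np.log(tau) - data.sum() / tau


rng = np.random.default_rng(42)
# Synthetic stand-in for nohitter_times (assumption: true mean of 760)
data = rng.exponential(scale=760.0, size=251)

# Evaluate the log-likelihood on a fine grid of candidate tau values
taus = np.linspace(300.0, 1500.0, 2001)
log_liks = exp_log_likelihood(taus, data)
best_tau = taus[np.argmax(log_liks)]

print(f"sample mean: {data.mean():.1f}, grid optimum: {best_tau:.1f}")
```

Setting the derivative of the log-likelihood, -n/tau + sum(x)/tau², to zero gives tau = sum(x)/n, so the grid optimum should land within one grid step of the sample mean.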

Tagged as: Statistics, Python

Statistical Thinking in Python (Part 1)

Graphical exploratory data analysis

Histogram

```python
# Import NumPy and plotting modules
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Set default Seaborn style
sns.set()

# Compute number of data points: n_data
n_data = len(versicolor_petal_length)

# Number of bins is the square root of number of data points: n_bins
n_bins = np.sqrt(n_data)

# Convert …
```
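Because a bin count must be an integer, the float returned by np.sqrt has to be converted. The sketch below uses NumPy only (plotting left out). It rounds up with np.ceil, which is an assumption since the post's conversion step is cut off; truncating with int() is also common. It cross-checks the result against NumPy's built-in "sqrt" bin estimator. The 50 values are synthetic stand-ins for versicolor_petal_length, and sqrt_rule_bins is a hypothetical helper.

```python
import numpy as np


def sqrt_rule_bins(data):
    """Number of histogram bins from the square-root rule, rounded up to an integer."""
    return int(np.ceil(np.sqrt(len(data))))


rng = np.random.default_rng(0)
# Synthetic stand-in for versicolor_petal_length: 50 made-up measurements
petal_length = rng.normal(loc=4.3, scale=0.5, size=50)

n_bins = sqrt_rule_bins(petal_length)
counts, edges = np.histogram(petal_length, bins=n_bins)

# NumPy also ships the square-root rule as a named bin estimator
_, sqrt_edges = np.histogram(petal_length, bins="sqrt")

print(n_bins, len(edges) - 1, len(sqrt_edges) - 1)
```

With matplotlib, the same integer can be passed directly as plt.hist(petal_length, bins=n_bins).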

Tagged as: Statistics, Python

Set up a blog in Pelican

Fri 10 May 2019
By Sophia He
```
source activate pyspark_env
pip install pelican markdown
```
Create a new GitHub repo named sophia-li-he.github.io. Then change to the folder where you want to save your blog and clone the repo:
```
git clone https://github.com/sophia-li-he/sophia-li-he.github.io.git
```
Change to the new directory:
```
cd sophia-li-he.github.io …
```
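Pelican reads its settings from a Python module, pelicanconf.py, which pelican-quickstart normally generates. A minimal sketch for this repo might look like the following. The timezone and pagination values are placeholders, not the author's actual configuration.

```python
# pelicanconf.py: a minimal sketch of Pelican settings (placeholder values marked below)
AUTHOR = "Sophia He"
SITENAME = "Sophia He's Data Blog"

# GitHub Pages serves a repo named <user>.github.io at https://<user>.github.io
SITEURL = "https://sophia-li-he.github.io"

PATH = "content"        # folder holding the Markdown posts
OUTPUT_PATH = "output"  # folder where the generated HTML is written

TIMEZONE = "America/New_York"  # placeholder: set to your own timezone
DEFAULT_LANG = "en"
DEFAULT_PAGINATION = 10        # placeholder: number of posts per index page
```

From the repo root, running pelican content should then render the Markdown files in content/ into HTML under output/.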

Tagged as: Python

Data Science Project Cheat Sheet

Data Profile

Summary info

```python
def singlevalue(my_series):
    """
    input: pd.Series
    output: the number of values that occur exactly once
    """
    # Count each value once, then count how many values have a frequency of 1
    counts = my_series.value_counts()
    return int((counts == 1).sum())

def df_explore(data_explore):
    """
    input: pandas DataFrame
    output:
    1. the shape of the data frame
    2. the number of unique values in each column
    3. the …
```
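The df_explore docstring is cut off, so its full output is not known. As a hedged sketch of the same kind of data profile, the hypothetical helper profile_columns below builds one summary row per column on a toy DataFrame. Each row lists the dtype, the number of distinct values, the number of values seen exactly once, and the number of missing values.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column profile: dtype, distinct values, single-occurrence values, missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(),  # NaN is not counted as a value
        # Values whose frequency is exactly 1 (value_counts drops NaN by default)
        "n_single": df.apply(lambda s: int((s.value_counts() == 1).sum())),
        "n_missing": df.isna().sum(),
    })


# Toy data for illustration only
df = pd.DataFrame({
    "city": ["A", "B", "B", "C", None],
    "amount": [10, 20, 20, 30, 40],
})

print(df.shape)
profile = profile_columns(df)
print(profile)
```

Returning a DataFrame instead of printing keeps the profile easy to sort or filter, for example to flag columns where every value is unique.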

Tagged as : Python
