Articles by Sophia He

  1. Credit Risk Modeling in R

    Introduction and EDA

    Components of expected loss (EL):
    - Probability of default (PD)
    - Exposure at default (EAD): the amount of the loan that still needs to be repaid at the time of default
    - Loss given default (LGD): the amount of loss if there is a default (expressed as a percentage of …
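The three components combine multiplicatively into the standard formula EL = PD × EAD × LGD. The post itself works in R; the sketch below uses Python to match the other code on this page, and the function name expected_loss and the loan figures are hypothetical, chosen only to illustrate the arithmetic.

```python
def expected_loss(pd_, ead, lgd):
    """Expected loss EL = PD * EAD * LGD (hypothetical helper).

    pd_: probability of default, between 0 and 1
    ead: exposure at default, in currency units
    lgd: loss given default as a fraction of EAD, between 0 and 1
    """
    if not 0 <= pd_ <= 1 or not 0 <= lgd <= 1:
        raise ValueError("PD and LGD must be between 0 and 1")
    return pd_ * ead * lgd


# Hypothetical loan: 2% chance of default, 10,000 still owed, 40% lost on default
print(expected_loss(0.02, 10_000, 0.40))
```

For a portfolio, the same product is computed per loan and summed, which is why EL is additive across loans while PD alone is not.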

    Tagged as : Finance R
  2. Statistical Thinking in Python (Part 2)

    Parameter estimation by optimization

    nohitter_times is an array of 251 values: array([ 843, 1613, 1101, 215, 684, 814, ...])

    # Import NumPy
    import numpy as np
    
    # Seed random number generator
    np.random.seed(42)
    
    # Compute mean no-hitter time: tau
    tau = np.mean(nohitter_times)
    
    # Draw samples from an exponential distribution with parameter tau: inter_nohitter_time
    inter_nohitter_time = np.random …
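Using the sample mean as tau is not arbitrary: for an exponential model, the mean is exactly the value of tau that maximizes the log-likelihood. The sketch below checks that numerically. It uses the newer np.random.default_rng Generator API instead of np.random.seed, and the data are a hypothetical simulated stand-in for nohitter_times (251 draws with an assumed mean of 750), not the real no-hitter record.

```python
import numpy as np


def exp_log_likelihood(tau, data):
    """Log-likelihood of data under an exponential distribution with mean tau."""
    # pdf(x) = exp(-x / tau) / tau for x >= 0, summed in log space
    return -len(data) * np.log(tau) - np.sum(data) / tau


rng = np.random.default_rng(42)
# Hypothetical stand-in for nohitter_times: 251 simulated waiting times
data = rng.exponential(scale=750.0, size=251)

# Closed-form optimum for the exponential model: the sample mean
tau_hat = np.mean(data)

# Numerical check: scan candidate tau values and keep the best one
taus = np.linspace(0.5 * tau_hat, 1.5 * tau_hat, 1001)
log_liks = np.array([exp_log_likelihood(t, data) for t in taus])
tau_best = taus[np.argmax(log_liks)]

print(tau_hat, tau_best)
```

The grid search is only a sanity check; because the log-likelihood has a single peak at the sample mean, tau_best lands within one grid step of tau_hat.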
    Tagged as : Statistics Python
  3. Statistical Thinking in Python (Part 1)

    Graphical exploratory data analysis

    Histogram
    # Import NumPy and plotting modules
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # Set default Seaborn style
    sns.set()
    
    # Compute number of data points: n_data
    n_data = len(versicolor_petal_length)
    
    # Number of bins is the square root of number of data points: n_bins
    n_bins = np.sqrt(n_data)
    
    # Convert …
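np.sqrt returns a float, while a histogram needs a whole number of bins. A minimal NumPy-only sketch of the square-root rule follows; the helper sqrt_rule_bins and the simulated sample (50 normal draws with an assumed mean of 4.3 and standard deviation of 0.5) are hypothetical stand-ins for versicolor_petal_length.

```python
import numpy as np


def sqrt_rule_bins(data):
    """Number of histogram bins via the square-root rule, truncated to an int."""
    return int(np.sqrt(len(data)))


# Hypothetical sample standing in for versicolor_petal_length
rng = np.random.default_rng(0)
sample = rng.normal(loc=4.3, scale=0.5, size=50)

n_bins = sqrt_rule_bins(sample)  # sqrt(50) is about 7.07, truncated to 7
counts, edges = np.histogram(sample, bins=n_bins)
print(n_bins, counts.sum(), len(edges))  # 7 50 8
```

The same integer can be passed as bins=n_bins to plt.hist to draw the plot; every point falls in some bin, so the counts sum to the sample size.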
    Tagged as : Statistics Python
  4. Set up a blog in Pelican

    source activate pyspark_env
    pip install pelican markdown
    

    Create a new GitHub repo named sophia-li-he.github.io. Then change to the folder where you want to save your blog and clone the repo:

    git clone https://github.com/sophia-li-he/sophia-li-he.github.io.git
    

    Change to the new directory

    cd sophia-li-he.github.io …
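Pelican reads its settings from a plain Python file, pelicanconf.py, which pelican-quickstart generates in the project folder. A minimal sketch of such a file is shown below; SITENAME and TIMEZONE are placeholder assumptions, and SITEURL simply follows from the repo name above. Quickstart typically leaves SITEURL empty here for local previews and sets the public URL in publishconf.py instead.

```python
# pelicanconf.py: minimal sketch; SITENAME and TIMEZONE are placeholders
AUTHOR = "Sophia He"
SITENAME = "My Blog"  # hypothetical site title
SITEURL = "https://sophia-li-he.github.io"  # GitHub Pages URL for this repo name

PATH = "content"  # folder holding the Markdown posts
TIMEZONE = "UTC"  # placeholder; use your own tz database name
DEFAULT_LANG = "en"
```

With this file in place, running pelican content builds the site into the output folder.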
    Tagged as : Python
  5. Data Science Project Cheat Sheet

    Data Profile

    Summary info

    import pandas as pd
    
    def singlevalue(my_series):
        """
        input: pd.Series
        output: the number of distinct values that occur exactly once
        """
        counts = my_series.value_counts()
        return int((counts == 1).sum())
    
    def df_explore(data_explore):
        """
        input: pandas DataFrame
        output:
        1. the shape of the data frame
        2. the number of unique values in each column
        3. the …
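A compact way to get similar per-column summary info is to build one small table with pandas built-ins. The helper summarize_columns below is hypothetical (it is not the author's df_explore, whose full output list is cut off above), and the toy DataFrame is illustrative data.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column of df.

    n_unique: number of distinct non-null values
    n_single: number of values that occur exactly once
    n_null:   number of missing entries
    """
    rows = {}
    for col in df.columns:
        counts = df[col].value_counts()  # frequency of each non-null value
        rows[col] = {
            "n_unique": df[col].nunique(),
            "n_single": int((counts == 1).sum()),
            "n_null": int(df[col].isna().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


df = pd.DataFrame({"a": [1, 1, 2, 3], "b": ["x", "y", "y", "y"]})
print(df.shape)  # (4, 2)
summary = summarize_columns(df)
print(summary)
```

Returning a DataFrame rather than printing makes the profile easy to sort, for example by n_null, when a data set has many columns.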
    Tagged as : Python
