WENYUAN LI / HOMEPAGE, by Wenyuan (Vincent) Li (PhD), wenyuanli@fb.com

A workflow for Git and GitHub (2018-09-09)

<p>This is a <a href="http://www.matiasz.com/2015/06/30/a-workflow-for-git-and-github/">re-blog</a> from my
labmate <a href="http://www.matiasz.com/2015/06/30/a-workflow-for-git-and-github/">Nicholas J. Matiasz</a>.
It aims to help people get familiar with a Git/GitHub workflow.</p>
<p>Although version control is useful for all software development,
team-based development particularly highlights the need for tools like Git and GitHub.
Team-based development can get complicated, though, when each teammate has unique
preferences for version control tools and workflows. In this post,
I present the Git/GitHub workflow that I currently use, and I share my impressions from
using it daily. In my experience, this workflow offers a sweet spot with respect to the
time it requires and the benefit it delivers. This workflow makes me more productive,
and—when done right—it blurs the line between version control and project management.
Version control tools and workflows are essential parts of developers’ work,
and they should be included in all computer science curricula.</p>
<h2 id="gitgithub-workflow">Git/GitHub workflow</h2>
<ol>
<li>On GitHub, create an <a href="https://guides.github.com/features/issues/"><em>issue</em></a> (e.g., #17) for a specific feature, bugfix, etc.</li>
<li><code class="language-plaintext highlighter-rouge">$ git checkout master</code></li>
<li><code class="language-plaintext highlighter-rouge">$ git pull</code></li>
<li><code class="language-plaintext highlighter-rouge">$ git checkout -b &lt;branch_name&gt;</code> (e.g., 17_fix_login_form)</li>
<li>Commit code with <a href="https://chris.beams.io/posts/git-commit/"><em>descriptive commit messages</em></a>.</li>
<li><code class="language-plaintext highlighter-rouge">$ git push origin &lt;branch_name&gt;</code></li>
<li>On GitHub, create a <a href="https://help.github.com/articles/about-pull-requests/"><em>pull request</em></a>.</li>
<li>In the pull request’s description, write “Resolves #&lt;issue_number&gt;.”</li>
<li>On GitHub, <a href="https://help.github.com/articles/merging-a-pull-request/"><em>merge</em></a> the pull request.</li>
</ol>
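<p>For concreteness, steps 2 through 6 above can be run as a single shell session. The issue number, branch name, commit message, and file edits below are illustrative:</p>

```shell
# Assuming issue #17 ("fix login form") was just created on GitHub:
git checkout master
git pull                               # sync local master with origin
git checkout -b 17_fix_login_form      # branch name starts with the issue number
# ...edit files to fix the bug...
git add -A
git commit -m "Fix login form validation"
git push origin 17_fix_login_form      # then open the pull request on GitHub
```

<p>After the push, GitHub’s interface will offer to open a pull request from the new branch.</p>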
<h3 id="notes">Notes</h3>
<p>This workflow improves some psychological aspects of programming.
By creating an issue on GitHub before you start to code, you are forced to
articulate every planned task—an absolute requirement for technical work.
As one benefit of this strategy, when you create an issue, you define a
clear scope for your work. Defining a clear scope helps you to avoid straying
from your goal while you code. As I navigate through a repository,
I’m sometimes guilty of thinking, “Oh, hey—that’s broken, too.” A few minutes
later, I’m changing some CSS when I originally intended to fix a database query.
Before writing any code, you also benefit from describing your plan in words.
If you can’t explain what you’re doing in plain language, you shouldn’t start
to code.</p>
<p>To help you convey your ideas, GitHub’s interface allows you to paste images
and code snippets into each description box. I often use this feature because
a picture or code snippet will sometimes remind me of a task faster than words
alone. GitHub’s text fields can also parse <a href="https://help.github.com/categories/writing-on-github/"><em>GitHub Flavored Markdown</em></a>, a simple
syntax for styling your text (e.g., with bold or italics).</p>
<p>Note that I like to start each branch name with its corresponding issue’s number.
This way, I immediately know every branch’s goal, even if I haven’t worked on
it for a while. Similarly, writing “Resolves #&lt;issue_number&gt;” in each pull
request’s description explicitly ties the pull request to its issue. This latter
method has a convenient side effect: GitHub will <a href="https://blog.github.com/2013-05-14-closing-issues-via-pull-requests/"><em>automatically close</em></a> the
issue once you merge the pull request.</p>
<p>Following this workflow yields an elaborate history of your development activity.
Such a history removes part of the burden of having to remember where everything
is in your codebase. Here’s an example: One of my projects uses JavaScript
for zooming on an SVG element in HTML. Some time ago, I created a branch and
pull request to adjust the minimum and maximum levels of zoom allowed for this
element. If, months or even years later, I want to change these zoom levels,
I don’t have to spend time figuring out how I first did it. I can just search
my repository with the term “zoom,” and GitHub will direct me to the exact
commit that recorded the change. GitHub’s intuitive visualizations of changes
between commits will even direct me to the exact line(s) of the file that
I changed. This situation happens frequently; my workflow helps me to avoid
the frustration of solving the same problem twice.</p>
<p>Now that I’ve used this workflow for a while, it feels taboo for me to switch
to a new task on an existing branch. I prefer not to muddy my commit logs.
If I want to switch tasks, I need to create a new branch. But I can’t name
my branch until I know the issue number. And I won’t know the issue number
until I create an issue. For those (hopefully fleeting) moments of laziness
that developers know well, motivation is built right into this workflow.
To follow it is to practice hygienic version control.</p>

Tensorflow Template: A proposal for good practice using tensorflow (2018-09-07)

<p>This article serves as a proposal that people can
use to quickly prototype machine learning models in
Tensorflow. You can find the code <a href="https://github.com/Wenyuan-Vincent-Li/Tensorflow_template">here</a>.
If you find it useful, don’t forget to star
the repo so that more people can find it.</p>
<p>The principle behind this design is to isolate each
stage of machine learning modeling so that modifying
one module does not affect the others. In other words, people
can easily use the same model on their own dataset,
or use the same dataset with different models.</p>
<p>This project is more a proposal than a definitive guide.
However, I feel that it covers most of the cases I encounter
in my own machine learning coding.</p>
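<p>As a minimal sketch of this isolation principle (all names here are illustrative, not the template’s actual API), a training loop can depend on just two small interfaces, so datasets and models can be swapped independently:</p>

```python
def train(dataset_fn, model_fn, num_steps):
    """Train for num_steps; knows nothing about the concrete dataset or model.

    dataset_fn: returns an iterator of (input, target) pairs.
    model_fn:   returns an object with an update(x, y) method.
    """
    data = dataset_fn()
    model = model_fn()
    for _ in range(num_steps):
        x, y = next(data)
        model.update(x, y)
    return model
```

<p>Because the loop touches only these two interfaces, changing the dataset or the model never requires touching the training code.</p>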
<h2 id="the-overall-folder-architecture">The overall folder architecture</h2>
<p><img src="/files/Overflow.png" alt="Whole process for machine learning" /></p>
<ol>
<li>
<p><strong>Dataset:</strong> used to store and explore the dataset for your own problem.
You can convert your dataset to tfrecord here.</p>
</li>
<li>
<p><strong>Inputpipeline:</strong> used to read the data from tfrecord or other sources and parse
the data to feed into the NN. The returned values usually are: iterator, input_data,
and target (if it is for supervised learning).</p>
</li>
<li>
<p><strong>Model:</strong> used to create the NN model.</p>
</li>
<li>
<p><strong>Training:</strong> used to train the NN model.</p>
</li>
<li>
<p><strong>Testing:</strong> used to test the model and post-process the results.</p>
</li>
<li>
<p><strong>Deploy:</strong> used to freeze the model and serve it through the tensorflow API.</p>
</li>
</ol>
<p>Other folders and files are:</p>
<ul>
<li><strong>Plots:</strong> folder that stores figures.</li>
<li><strong>README file:</strong> GitHub markdown describing the repo.</li>
<li><strong>.gitignore:</strong> specifies which files git should ignore during synchronization.</li>
</ul>
<h2 id="more-details">More Details</h2>
<ol>
<li><strong>Dataset:</strong>
<ul>
<li><strong>utils.py:</strong> provides a variety of functions that can be used for
general data processing, including functions that convert image data and csv files
to tfrecord.</li>
<li><strong>utils_dataset_spec.py:</strong> stores the pre-processing functions
that are specific to the dataset.</li>
</ul>
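<p>As a flavor of what the tfrecord conversion in <strong>utils.py</strong> might look like, here is a sketch that packs one image/label pair into a <code class="language-plaintext highlighter-rouge">tf.train.Example</code>. The helper name and feature keys are illustrative, not the template’s actual API:</p>

```python
import numpy as np
import tensorflow as tf

def image_to_example(image, label):
    """Pack one (image, label) pair into a tf.train.Example proto."""
    feature = {
        "image": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image.tobytes()])),
        "shape": tf.train.Feature(
            int64_list=tf.train.Int64List(value=list(image.shape))),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[int(label)])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Writing a whole dataset is then one loop:
#   with tf.io.TFRecordWriter("train.tfrecord") as writer:
#       for image, label in pairs:
#           writer.write(image_to_example(image, label).SerializeToString())
```

<p>Storing the shape alongside the raw bytes lets the input pipeline reshape the decoded tensor later.</p>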
</li>
<li><strong>Inputpipeline:</strong>
<ul>
<li><strong>ProstateDataSet.py:</strong> creates a dataset object. It should be modified for your
own dataset. A typical data input pipeline: read in the data, parse the data,
preprocess the data, shuffle and repeat the data, batch the data up, and
make the data iterator.</li>
<li><strong>inputpipeline.py:</strong> provides some functions that can be used in the
data input pipeline.</li>
<li><strong>input_source.py:</strong> shows several examples of input sources that tensorflow can
use, such as input from numpy, input from numpy as placeholder, input from
tfrecord, etc.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">def</span> <span class="nf">input_from_numpy</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">label</span><span class="p">):</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">convert_to_tensor</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">int32</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"image"</span><span class="p">)</span>
<span class="n">label</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">convert_to_tensor</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">dtype</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">int32</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"label"</span><span class="p">)</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">Dataset</span><span class="p">.</span><span class="n">from_tensor_slices</span><span class="p">(</span>
<span class="p">{</span><span class="s">"input"</span><span class="p">:</span> <span class="n">image</span><span class="p">,</span>
<span class="s">"target"</span><span class="p">:</span> <span class="n">label</span><span class="p">})</span>
<span class="k">return</span> <span class="n">dataset</span>
<span class="k">def</span> <span class="nf">input_from_numpy_as_placeholder</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">label</span><span class="p">):</span>
<span class="n">input_placeholder</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">placeholder</span><span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="n">dtype</span><span class="p">,</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="n">target_placeholder</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">placeholder</span><span class="p">(</span><span class="n">label</span><span class="p">.</span><span class="n">dtype</span><span class="p">,</span> <span class="n">label</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">Dataset</span><span class="p">.</span><span class="n">from_tensor_slices</span><span class="p">((</span><span class="n">input_placeholder</span><span class="p">,</span> \
<span class="n">target_placeholder</span><span class="p">))</span>
<span class="k">return</span> <span class="n">dataset</span>
<span class="k">def</span> <span class="nf">input_from_tfrecord</span><span class="p">():</span>
<span class="n">filenames</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">placeholder</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">string</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="bp">None</span><span class="p">])</span>
<span class="c1"># make filenames as placeholder for training and validating purpose
</span> <span class="n">dataset</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">TFRecordDataset</span><span class="p">(</span><span class="n">filenames</span><span class="p">)</span>
<span class="k">return</span> <span class="n">dataset</span>
</code></pre></div> </div>
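<p>A typical pipeline then chains these stages together. Here is a minimal sketch with the <code class="language-plaintext highlighter-rouge">tf.data</code> API (the toy arrays, buffer size, and batch size are illustrative):</p>

```python
import numpy as np
import tensorflow as tf

# Toy data standing in for parsed tfrecord contents.
images = np.arange(8, dtype=np.float32).reshape(8, 1)
labels = np.arange(8, dtype=np.int64)

dataset = (tf.data.Dataset.from_tensor_slices({"input": images, "target": labels})
           .shuffle(buffer_size=8)   # shuffle within a buffer
           .repeat(2)                # repeat for 2 epochs
           .batch(4))                # batch the data up
```

<p>Each stage returns a new dataset, so stages can be added or removed without disturbing the rest of the pipeline.</p>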
</li>
<li><strong>Model:</strong>
<ul>
<li><strong>model_base.py:</strong> provides a series of building blocks that you might
use in your NN, such as relu, leakyrelu, a fully_connected layer, etc. The
model_base object is inherited by the main NN model.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">def</span> <span class="nf">_relu</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">_leakyrelu</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">leak</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"lrelu"</span><span class="p">):</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">name_scope</span><span class="p">(</span><span class="n">name</span><span class="p">):</span>
<span class="n">f1</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">leak</span><span class="p">)</span>
<span class="n">f2</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">leak</span><span class="p">)</span>
<span class="k">return</span> <span class="n">f1</span> <span class="o">*</span> <span class="n">x</span> <span class="o">+</span> <span class="n">f2</span> <span class="o">*</span> <span class="n">tf</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre></div> </div>
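<p>The <code class="language-plaintext highlighter-rouge">_leakyrelu</code> block above computes leaky ReLU in closed form: for leak = 0.2, f1·x + f2·|x| equals x when x ≥ 0 and 0.2·x when x &lt; 0. A NumPy version makes the identity easy to check:</p>

```python
import numpy as np

def leakyrelu(x, leak=0.2):
    # Same algebraic form as the _leakyrelu building block.
    f1 = 0.5 * (1 + leak)
    f2 = 0.5 * (1 - leak)
    return f1 * x + f2 * np.abs(x)
```

<p>This form avoids a branch; it is algebraically identical to <code class="language-plaintext highlighter-rouge">max(x, leak * x)</code>.</p>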
<ul>
<li><strong>VGG_16.py:</strong> constructs the main model. A forward_pass method should be
implemented within this object.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">def</span> <span class="nf">forward_pass</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">name_scope</span><span class="p">(</span><span class="s">'Conv_Block_0'</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_conv_batch_relu</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">filters</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_filters</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> \
<span class="n">kernel_size</span> <span class="o">=</span> <span class="mi">3</span><span class="p">,</span> <span class="n">strides</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
<span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_max_pool</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">pool_size</span> <span class="o">=</span> <span class="mi">2</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">name_scope</span><span class="p">(</span><span class="s">'Conv_Block_1'</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_conv_batch_relu</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">filters</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_filters</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> \
<span class="n">kernel_size</span> <span class="o">=</span> <span class="mi">3</span><span class="p">,</span> <span class="n">strides</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">name_scope</span><span class="p">(</span><span class="s">'Fully_Connected'</span><span class="p">):</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">name_scope</span><span class="p">(</span><span class="s">'Tensor_Flatten'</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">shape</span> <span class="o">=</span> <span class="p">[</span><span class="bp">self</span><span class="p">.</span><span class="n">_batch_size</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_fully_connected</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">_num_classes</span><span class="p">)</span>
<span class="k">return</span> <span class="n">x</span>
</code></pre></div> </div>
</li>
<li><strong>Training:</strong>
<ul>
<li><strong>Saver.py:</strong> creates a saver object that saves and restores training weights
in tensorflow.</li>
<li><strong>Summary.py:</strong> creates a summary object that stores data from the training
process. Data including scalars, images, histograms, graphs, etc. can be visualized
in tensorboard.</li>
<li><strong>train_base.py:</strong> a base class that can be inherited by Train.py.
It includes different optimizers, metrics, etc.</li>
<li><strong>Train.py:</strong> the main script that trains the model.</li>
<li><strong>utils.py:</strong> utility functions used by other training functions.</li>
</ul>
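<p>As a flavor of the metrics that <strong>train_base.py</strong> might provide, here is a NumPy sketch of batch accuracy (the function name is illustrative, not the template’s actual API):</p>

```python
import numpy as np

def batch_accuracy(logits, labels):
    """Fraction of examples whose argmax prediction matches the label."""
    preds = np.argmax(logits, axis=1)
    return float(np.mean(preds == labels))
```

<p>Keeping metrics like this in the base class lets Train.py and Evaler.py share one implementation.</p>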
</li>
<li><strong>Testing:</strong>
<ul>
<li><strong>eval_base.py:</strong> a base class that can be inherited by Evaler.py.
It includes different metrics, etc.</li>
<li><strong>Evaler.py:</strong> the main script that evaluates the model.</li>
<li><strong>utils.py:</strong> utility functions used by other evaluation
functions.</li>
</ul>
</li>
<li><strong>Deploy:</strong>
<ul>
<li><strong>deploy_base.py:</strong> a base class that can be inherited by Deploy.py.
It includes import_meta_graph, extend_meta_graph, freeze_mode,
etc.</li>
<li><strong>Deploy.py:</strong> the main script that deploys the model.</li>
<li><strong>construct_deploy_model.py:</strong> constructs the model for production use, with a placeholder
as the input interface.</li>
<li><strong>model_inspect.py:</strong> functions that can be used to inspect your trained ckpt file.</li>
</ul>
</li>
</ol>
<h3 id="examples-of-using-this-template">Examples of using this template:</h3>
<h3 id="useful-links">Useful links:</h3>
<p><a href="https://blog.metaflow.fr/tensorflow-a-proposal-of-good-practices-for-files-folders-and-models-architecture-f23171501ae3">TensorFlow: A proposal of good practices for files, folders and models architecture</a></p>
<p><a href="https://guides.github.com/features/mastering-markdown/">Mastering markdown in GitHub</a></p>