For the 751 course project, you need to form a team of 4 students to
conduct a focused research-oriented project during the second half of the
semester. Given the focus in the
course on hardware and software co-designed solutions for embedded
applications, each project should include both hardware and software
implementation and evaluation targeted at an embedded application or
application domain.
Requirements:
·
You need to work in teams of approximately 4 students. Smaller or larger teams must adjust the
scope of their work to match the size of the team.
·
If your proposed project does not include both a
hardware and a software component, you must discuss this with me before
the proposals are due, and must justify that your project still maintains
sufficient scope and is relevant to this course.
·
You must submit a written project proposal of up to two pages through the
learn@uw dropbox by October
13, 2015. The proposal must include
the names of all team members, a summary of the proposed topic and a research
plan that outlines how you will accomplish your goals.
·
You must submit a 2-3 page progress report by midnight on November 17,
using the learn@uw dropbox. This must report your current progress
on achieving the goals you set in your initial proposal, and how you will
recover or reset your goals if you are falling behind.
·
Project findings will be presented orally during class time on Dec. 1 and
Dec. 3.
·
Written project reports that fully document your activities and findings
are due on Dec. 15, 2015 using the learn@uw dropbox. Report guidelines are provided here.
·
The project report must also include a statement of work that identifies
the contributions of each individual on the team. This statement of work must reflect a
team consensus and must be signed by all team members. I recommend that you structure this
statement as a table with a row for each project milestone, a column for each
team member, and the percentage contribution of each team member to each
milestone in the entries in the table.
Here are some guidelines for the suggested default project topic and
structure. You can deviate from
these, but make sure you discuss your plans with me well in advance of the Oct.
13 proposal deadline:
·
Pick an important, performance- and/or power-constrained embedded
application (some ideas are below, but this really is wide open, and it is up
to you to find something you are interested in or familiar with), characterize
the application to find where it spends most of its execution time, then
propose a hardware accelerator for that task. The accelerator could be a new compound
instruction with matching execution unit, or it could be a stand-alone
accelerator, or it could be reconfigurable logic, or anywhere in between these
options. Read the assigned papers
carefully for ideas and directions.
Next, design the hardware, rewrite the application to use the hardware,
and build a performance and/or power model to evaluate your design.
·
Note that given the scope required for the project, you will most likely have to cut
some corners in your model development. That is to say, I do not expect a
conference-paper-level evaluation of the hardware or the software. Of course, the more detailed, the
better, but a cycle-accurate timing model coupled with an accurate power model
is probably not a realistic goal for a semester project. More likely, you will develop a
“spreadsheet” model that captures the time spent in the accelerated
and non-accelerated portions of the application, accounts for all significant
overheads (e.g. invoking the accelerator, communicating operands to it and
results from it via some interconnect or through the memory hierarchy, etc.),
and then estimates performance benefits and power savings by subtracting that
portion from the baseline and adding back in the estimates you have developed
through hardware design, modeling, and analysis.
·
Important embedded applications include all and any of the ones mentioned
in the papers or in lecture, as well as many others. Your approach to accelerating an
application need not be novel, and the application need not be novel. However, my standard for evaluating your
efforts will be inversely proportional to the novelty of each. That is to say, I will apply the least
stringent standard to a novel application (one not studied before) accelerated
with a novel approach (one not previously proposed for other applications),
since you will have the most difficulty in evaluating it. At the other extreme, if you repeat a
prior proposal (pre-existing application with a previously-proposed accelerator),
I will hold you to a much higher standard of evaluation. Note that what you choose to do is up to
you: in many ways, repeating a prior proposal is much safer, since you can
simply mimic the evaluation and ideas from the prior paper, and you should have
a very good sense of where you will end up by the end of the semester. However, I encourage you to take a risk
in this project, as you will be evaluated based on effort, not on results, and
the probability that a novel application and/or novel approach can lead to a
publication at a workshop or conference is, of course, much higher.
·
Possible applications of interest include:
1.
Media-related: encoders, decoders, document rendering (PDF or
HTML->printer, etc.)
2.
Computer vision: object recognition, optical flow, image understanding,
etc. Look at the OpenCV library as a starting point, pick a task, and devise
an accelerator to streamline this task.
3.
Mobile web browsing (JavaScript acceleration,
HTML5)
4.
Mobile UI acceleration (the user interface thread is a significant power
consumer in smartphones).
5.
Embedded sensing platforms.
Look for scenarios where sensors require significant processing
(typically some kind of signal processing) to interpret sensory data, or are
extremely power-constrained (so must minimize power for analysis and/or for
transmission to communicate sensed results wirelessly).
6.
Software radio. For novel
directions, look into cognitive radio (a combination of sensing the
current RF environment and then adapting channel allocation, etc. to that
context). What can be done in hardware
(for less power) that is currently done in software in SDR?
7.
GPS processing. Enabling GPS
typically drains smartphone batteries very quickly. GPS signals are encoded using CDMA and
require solving the trilateration problem to derive location. Typical implementations include a fairly
high-performance DSP to perform these tasks. Can we accelerate these tasks and reduce
power consumption? One interesting place to start looking at this might
be the gnss-sdr project, which
uses a cheap TV tuner USB card as the receiver for a software radio. By tuning
it to the right (GPS) frequency the raw signal is captured and the GPS signal
can be decoded from it using GNU Radio running on a laptop.
8.
Sensor fusion and compressed sensing. GPS power can be reduced by only
triggering GPS samples when the accelerometer and/or gyroscope indicate
movement. Look at an implementation
of this kind of sensor fusion and find opportunities to accelerate for better
performance or reduced power.
9.
Dead reckoning. The
combination of accelerometer data, gyroscope data, and compass data can be used
to estimate location in the absence of a GPS signal (e.g. inside a building). Are there opportunities for acceleration
in this domain?
10. Security. Preventing various side-channel attacks
in a power/energy-efficient manner.
Implementing security protocols and ciphers efficiently.
·
Possible accelerator architectures include:
1.
Accelerator architectures based on novel instructions and execution
units, in the same vein as the Xtensa reading.
2.
Standalone accelerators that are connected at the memory bus or to the
cache hierarchy.
3.
Reconfigurable accelerators based on FPGA technology.
4.
Enhancements of existing approaches like VLIW.
5.
Accelerators based on neural concepts (see readings [35] or [38]).
·
Related possibilities that are somewhat vague but may give you some
ideas:
1.
Investigate challenges and opportunities of deploying near-threshold
computing (NTC) in embedded scenarios.
2.
Look at opportunities created by new memory technologies (e.g.
phase-change memory, or even new 3D packaging like Micron’s Hybrid Memory
Cube) for embedded systems.
3.
Recent work in cache replacement policies may or may not be relevant for
embedded applications. Study these
with embedded workloads and devise new/better policies. Start with the infrastructure
from the 2009 contest: http://www.jilp.org/jwac-1/
4.
Recent work in branch prediction may have relevance to embedded
systems. Explore these using the
2010 or 2014 contest infrastructure and winning proposals: http://www.jilp.org/jwac-1/ http://www.jilp.org/cbp2014/
5.
Recent work in DRAM memory schedulers may have relevance to embedded
systems. Explore these using the
2012 contest infrastructure: http://www.cs.utah.edu/~rajeev/jwac12
6.
Compression for caches, scratchpads, buses: are there new and unexplored
possibilities in this domain?
Arithmetic coding vs. Huffman coding? Etc.
·
Note that you are largely on your own for tool support, so you should
probably rely on pre-existing familiarity with simulation and synthesis tools
from prior research or coursework (e.g. ECE 551, CS 537, etc.), and assemble
teams that have a diverse set of skills.
·
We also have access to Xilinx Zynq FPGA boards,
which include ARM cores and FPGA resources; these are an excellent platform for
prototyping attached accelerator ideas.
These are just suggestions. I
welcome your ideas; please speak with me after class or during office hours to
refine them further. I prefer that you come up with your own ideas of what you
are interested in.
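
As a rough illustration of the “spreadsheet” model described in the guidelines above, here is a minimal Python sketch. It subtracts the accelerated portion from the baseline and adds back the accelerator estimate plus overheads. All names and numbers are hypothetical placeholders, not measurements. The energy estimate assumes the CPU draws a constant power whenever it is active (including during invocation and data transfer) and negligible power while the accelerator runs. Your own model will likely need different assumptions.

```python
from dataclasses import dataclass


@dataclass
class AcceleratorEstimate:
    """Per-run estimates for one accelerated kernel (all values hypothetical)."""
    baseline_total_s: float     # measured runtime of the unmodified application
    baseline_kernel_s: float    # part of that runtime spent in the kernel to accelerate
    accel_kernel_s: float       # estimated kernel time on the accelerator
    invoke_overhead_s: float    # estimated cost of invoking the accelerator
    transfer_overhead_s: float  # estimated cost of moving operands and results
    cpu_power_w: float          # assumed constant CPU power while active
    accel_power_w: float        # assumed constant accelerator power while active


def cpu_active_time(e: AcceleratorEstimate) -> float:
    """Time the CPU is busy: non-accelerated code plus all overheads."""
    untouched = e.baseline_total_s - e.baseline_kernel_s
    return untouched + e.invoke_overhead_s + e.transfer_overhead_s


def accelerated_time(e: AcceleratorEstimate) -> float:
    """Baseline minus the kernel, plus accelerator time and overheads."""
    return cpu_active_time(e) + e.accel_kernel_s


def speedup(e: AcceleratorEstimate) -> float:
    return e.baseline_total_s / accelerated_time(e)


def baseline_energy_j(e: AcceleratorEstimate) -> float:
    return e.cpu_power_w * e.baseline_total_s


def accelerated_energy_j(e: AcceleratorEstimate) -> float:
    # Assumption: the CPU is idle at negligible power while the accelerator runs.
    return e.cpu_power_w * cpu_active_time(e) + e.accel_power_w * e.accel_kernel_s


# Made-up example inputs: an 80% hot spot, accelerated 8x, with 1 s of overhead.
example = AcceleratorEstimate(
    baseline_total_s=10.0,
    baseline_kernel_s=8.0,
    accel_kernel_s=1.0,
    invoke_overhead_s=0.5,
    transfer_overhead_s=0.5,
    cpu_power_w=2.0,
    accel_power_w=0.5,
)

if __name__ == "__main__":
    print(f"accelerated time: {accelerated_time(example):.2f} s")
    print(f"speedup:          {speedup(example):.2f}x")
    print(f"energy saved:     {baseline_energy_j(example) - accelerated_energy_j(example):.2f} J")
```

In this made-up example the kernel itself runs 8x faster, but the overall speedup is only 2.5x because the non-accelerated code and the overheads remain. That is why the model must account for all significant overheads.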